1
Martinka J, Pederzoli M, Barbatti M, Dral PO, Pittner J. A simple approach to rotationally invariant machine learning of a vector quantity. J Chem Phys 2024; 161:174104. [PMID: 39484894; DOI: 10.1063/5.0230176] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/22/2024] [Accepted: 10/14/2024] [Indexed: 11/03/2024]
Abstract
Unlike the energy, which is a scalar property, vector and tensor properties pose an additional challenge for machine learning (ML) prediction: achieving proper invariance (covariance) with respect to molecular rotation. For the energy gradients needed in molecular dynamics (MD), this symmetry is fulfilled automatically when the analytic derivative of the energy is taken, because the energy is a scalar invariant (provided properly invariant molecular descriptors are used). However, if a property cannot be obtained by differentiation, other appropriate methods must be applied to retain covariance. Several approaches have been suggested to treat this issue. For nonadiabatic couplings and polarizabilities, for example, it was possible to construct virtual quantities from which these tensorial properties are obtained by differentiation, which guarantees covariance. Another possible solution is to build rotational equivariance into the design of the neural network employed in the model. Here, we propose a simpler alternative technique that requires neither the construction of auxiliary properties nor special equivariant ML techniques. We suggest a three-step approach based on the molecular tensor of inertia. In the first step, the molecule is rotated to its principal axes using the eigenvectors of this tensor. In the second step, the ML procedure predicts the vector property relative to this orientation, based on a training set in which all vector properties were expressed in the same coordinate system. In the third step, the ML estimate of the vector property is transformed back to the original orientation. This rotate-predict-rotate (RPR) procedure should thus guarantee proper covariance of a vector property and is trivially extensible to tensors such as the polarizability.
The RPR procedure has the advantage that accurate models can be trained very quickly for thousands of molecular configurations, which may be beneficial where many training sets are required (e.g., in active learning). We have implemented the RPR technique using the MLatom and Newton-X programs for ML and MD, respectively, and assessed it on the dipole moment along MD trajectories of 1,2-dichloroethane.
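The three RPR steps can be illustrated with a short NumPy sketch. It is a minimal illustration under stated assumptions, not the authors' MLatom implementation, whose details the abstract does not give: the function names are hypothetical, `model` stands for any trained, non-equivariant regressor, and the sign convention for the principal axes is a hypothetical choice, since eigenvectors are defined only up to sign. That convention (atom 0 has a non-negative coordinate on the first two axes; the third axis is their cross product) breaks down when atom 0 lies in a nodal plane of an axis or when principal moments are (nearly) degenerate.

```python
import numpy as np


def principal_frame(coords, masses):
    """Return the centre of mass and a matrix V whose columns are principal axes.

    Hypothetical sign convention (not taken from the paper): the first two
    axes are flipped so that atom 0 has a non-negative coordinate along them,
    and the third axis is their cross product (right-handed frame).
    """
    com = masses @ coords / masses.sum()
    r = coords - com
    s = np.einsum("i,ij,ik->jk", masses, r, r)  # sum_i m_i r_ij r_ik
    inertia = np.trace(s) * np.eye(3) - s        # molecular tensor of inertia
    _, vecs = np.linalg.eigh(inertia)            # columns: axes, ascending moments
    v1, v2 = vecs[:, 0], vecs[:, 1]
    if r[0] @ v1 < 0:
        v1 = -v1
    if r[0] @ v2 < 0:
        v2 = -v2
    return com, np.column_stack([v1, v2, np.cross(v1, v2)])


def lab_to_body(vector, V):
    """Express a lab-frame vector (e.g. a training dipole) in the principal frame."""
    return V.T @ vector


def rpr_predict(coords, masses, model):
    """Rotate-predict-rotate for a vector property such as the dipole moment."""
    com, V = principal_frame(coords, masses)
    body = (coords - com) @ V  # step 1: geometry expressed in the principal frame
    vec_body = model(body)     # step 2: prediction in the principal frame
    return V @ vec_body        # step 3: rotate the prediction back to the lab frame


# Usage with a deliberately non-equivariant stand-in "model" (a fixed linear map).
rng = np.random.default_rng(0)
coords = rng.normal(size=(8, 3))         # toy 8-atom geometry
masses = rng.uniform(1.0, 35.0, size=8)  # toy atomic masses
W = rng.normal(size=(3, 24))


def model(body):
    return W @ body.ravel()


q, _ = np.linalg.qr(rng.normal(size=(3, 3)))
R = q if np.linalg.det(q) > 0 else -q    # random proper rotation (det = +1)

mu = rpr_predict(coords, masses, model)
mu_rotated = rpr_predict(coords @ R.T, masses, model)
print(np.allclose(mu_rotated, R @ mu))   # covariance check
```

Because the same frame is used to express the training targets (via `lab_to_body`) and to rotate predictions back, rotating the input geometry by R rotates the prediction by R, even though the stand-in model itself knows nothing about rotations.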
Affiliation(s)
- Jakub Martinka
- J. Heyrovský Institute of Physical Chemistry, Academy of Sciences of the Czech Republic, v.v.i., Dolejškova 3, 18223 Prague 8, Czech Republic
- Department of Physical and Macromolecular Chemistry, Faculty of Sciences, Charles University, Hlavova 8, 12843 Prague 2, Czech Republic
- Marek Pederzoli
- J. Heyrovský Institute of Physical Chemistry, Academy of Sciences of the Czech Republic, v.v.i., Dolejškova 3, 18223 Prague 8, Czech Republic
- Mario Barbatti
- Aix Marseille University, CNRS, ICR, Marseille, France
- Institut Universitaire de France, 75231 Paris, France
- Pavlo O Dral
- State Key Laboratory of Physical Chemistry of Solid Surfaces, College of Chemistry and Chemical Engineering, and Fujian Provincial Key Laboratory of Theoretical and Computational Chemistry, Xiamen University, Xiamen, Fujian 361005, China
- Institute of Physics, Faculty of Physics, Astronomy and Informatics, Nicolaus Copernicus University in Toruń, ul. Grudziądzka 5, 87-100 Toruń, Poland
- Jiří Pittner
- J. Heyrovský Institute of Physical Chemistry, Academy of Sciences of the Czech Republic, v.v.i., Dolejškova 3, 18223 Prague 8, Czech Republic
2
Zhang Y, Jiang B. Universal machine learning for the response of atomistic systems to external fields. Nat Commun 2023; 14:6424. [PMID: 37827998; PMCID: PMC10570356; DOI: 10.1038/s41467-023-42148-y] [Citation(s) in RCA: 23] [Impact Index Per Article: 11.5] [Received: 04/16/2023] [Accepted: 10/01/2023] [Indexed: 10/14/2023]
Abstract
Machine-learned interatomic potentials have enabled efficient and accurate molecular simulations of closed systems. However, external fields, which can greatly change chemical structure and/or reactivity, have seldom been included in current machine learning models. This work proposes a universal field-induced recursively embedded atom neural network (FIREANN) model, which integrates a pseudo-field-vector-dependent feature into the atomic descriptors to represent system-field interactions with rigorous rotational equivariance. This "all-in-one" approach correlates various response properties, such as the dipole moment and polarizability, with the field-dependent potential energy in a single model, making it well suited to spectroscopic and dynamics simulations of molecular and periodic systems in the presence of electric fields. For periodic systems in particular, we find that FIREANN can overcome the intrinsic multiple-value issue of the polarization by training on atomic forces only. These results validate the universality and capability of the FIREANN method for efficient first-principles modeling of complicated systems in strong external fields.
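The link between a field-dependent energy and the response properties can be illustrated with the standard relations mu = -dE/dF and alpha = -d2E/dF2, which follow from the expansion E(F) = E0 - mu0·F - (1/2) F·alpha0·F (one common sign convention). The sketch below recovers both quantities from an energy function by central finite differences. The quadratic `toy_energy` and its parameters are hypothetical stand-ins for a trained field-dependent model; FIREANN itself is not reproduced here.

```python
import numpy as np

# Hypothetical parameters of a toy field-dependent energy (atomic units assumed).
E0 = -1.0
MU0 = np.array([0.3, -0.1, 0.5])
ALPHA0 = np.array([[2.0, 0.1, 0.0],
                   [0.1, 3.0, 0.2],
                   [0.0, 0.2, 4.0]])


def toy_energy(field):
    """Stand-in for an ML energy E(F) = E0 - mu0.F - 1/2 F.alpha0.F."""
    return E0 - MU0 @ field - 0.5 * field @ ALPHA0 @ field


def dipole(energy, field, h=1e-4):
    """Dipole moment mu = -dE/dF from central differences."""
    grad = np.zeros(3)
    for k in range(3):
        step = h * np.eye(3)[k]
        grad[k] = (energy(field + step) - energy(field - step)) / (2 * h)
    return -grad


def polarizability(energy, field, h=1e-3):
    """Polarizability alpha = d(mu)/dF = -d2E/dF2, by differencing the dipole."""
    alpha = np.zeros((3, 3))
    for k in range(3):
        step = h * np.eye(3)[k]
        alpha[:, k] = (dipole(energy, field + step)
                       - dipole(energy, field - step)) / (2 * h)
    return alpha


zero_field = np.zeros(3)
print(dipole(toy_energy, zero_field))          # approximately MU0
print(polarizability(toy_energy, zero_field))  # approximately ALPHA0
```

In a single field-dependent model, both properties therefore come from derivatives of one energy surface rather than from separately trained models.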
Affiliation(s)
- Yaolong Zhang
- Key Laboratory of Precision and Intelligent Chemistry, Department of Chemical Physics, Key Laboratory of Surface and Interface Chemistry and Energy Catalysis of Anhui Higher Education Institutes, University of Science and Technology of China, Hefei, Anhui, 230026, China
- École Polytechnique Fédérale de Lausanne, 1015, Lausanne, Switzerland
- Bin Jiang
- Key Laboratory of Precision and Intelligent Chemistry, Department of Chemical Physics, Key Laboratory of Surface and Interface Chemistry and Energy Catalysis of Anhui Higher Education Institutes, University of Science and Technology of China, Hefei, Anhui, 230026, China
- Hefei National Laboratory, University of Science and Technology of China, Hefei, 230088, China
3
Fan J, Qian C, Zhou S. Machine Learning Spectroscopy Using a 2-Stage, Generalized Constituent Contribution Protocol. Research (Washington, D.C.) 2023; 6:0115. [PMID: 37287889; PMCID: PMC10243197; DOI: 10.34133/research.0115] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/27/2022] [Accepted: 03/20/2023] [Indexed: 06/09/2023]
Abstract
A corrected group contribution (CGC)-molecule contribution (MC)-Bayesian neural network (BNN) protocol for accurate prediction of absorption spectra is presented. By combining BNN with CGC methods, the full absorption spectra of various molecules are obtained accurately and efficiently using only a small training dataset. With a small training sample (<100), the first stage of the protocol accurately predicts the wavelength of maximum absorption for single molecules; by contrast, previously reported machine learning (ML) methods require >1,000 samples to ensure prediction accuracy. Furthermore, with <500 samples, the mean square error in the prediction of full ultraviolet spectra falls below 2%; for comparison, ML models trained on molecular SMILES require a much larger dataset (>2,000) to achieve comparable accuracy. Moreover, by employing an MC method designed specifically for CGC that properly interprets the mixing rule, the spectra of mixtures are obtained with high accuracy. The origins of the protocol's good performance are discussed in detail. Because such a constituent contribution protocol combines chemical principles with data-driven tools, it will likely prove efficient for solving molecular-property-related problems in wider fields.
Affiliation(s)
- Jinming Fan
- College of Chemical and Biological Engineering, Zhejiang Provincial Key Laboratory of Advanced Chemical Engineering Manufacture Technology, Zhejiang University, 310027 Hangzhou, P. R. China
- Institute of Zhejiang University - Quzhou, Zheda Rd. #99, 324000 Quzhou, P. R. China
- Chao Qian
- College of Chemical and Biological Engineering, Zhejiang Provincial Key Laboratory of Advanced Chemical Engineering Manufacture Technology, Zhejiang University, 310027 Hangzhou, P. R. China
- Institute of Zhejiang University - Quzhou, Zheda Rd. #99, 324000 Quzhou, P. R. China
- Shaodong Zhou
- College of Chemical and Biological Engineering, Zhejiang Provincial Key Laboratory of Advanced Chemical Engineering Manufacture Technology, Zhejiang University, 310027 Hangzhou, P. R. China
- Institute of Zhejiang University - Quzhou, Zheda Rd. #99, 324000 Quzhou, P. R. China
4
Boodaghidizaji M, Milind Athalye S, Thakur S, Esmaili E, Verma MS, Ardekani AM. Characterizing viral samples using machine learning for Raman and absorption spectroscopy. Microbiologyopen 2022; 11:e1336. [PMID: 36479629; PMCID: PMC9721089; DOI: 10.1002/mbo3.1336] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/26/2022] [Revised: 10/31/2022] [Accepted: 10/31/2022] [Indexed: 12/12/2022]
Abstract
Machine learning methods can serve as robust techniques that provide invaluable information for analyzing biological samples in the pharmaceutical industry, such as predicting the concentration of viral particles of interest. Here, we used both convolutional neural networks (CNNs) and random forests (RFs) to predict the concentration of samples containing measles, mumps, rubella, and varicella-zoster viruses (ProQuad®) from Raman and absorption spectroscopy. We prepared Raman and absorption spectra data sets with known concentration values, then used the Raman and absorption signals individually and together to train RFs and CNNs. We demonstrated that both RFs and CNNs can make predictions with R² values as high as 0.95. We proposed two different networks that jointly use the Raman and absorption spectra, and our results demonstrated that concatenating the Raman and absorption data increases prediction accuracy compared with using either the Raman or the absorption spectrum alone. We further verified the advantage of joint Raman-absorption data with principal component analysis. Furthermore, our method can be extended to characterize properties other than concentration, such as the type of viral particles.
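Using the two spectroscopies jointly amounts to building one feature vector per sample from both spectra. The NumPy sketch below shows one way to concatenate them, followed by principal component analysis computed via singular value decomposition. It assumes each spectrum is stored as a row of intensities on a fixed grid; the per-block standardisation is an assumption rather than the authors' documented preprocessing, and the random arrays are placeholders, not data from the study. The resulting matrix could be passed to a random forest or CNN regressor, which is not shown.

```python
import numpy as np


def standardise(block):
    """Zero-mean, unit-variance scaling of each spectral channel (column)."""
    std = block.std(axis=0)
    return (block - block.mean(axis=0)) / np.where(std > 0, std, 1.0)


def joint_features(raman, absorption):
    """Concatenate per-sample Raman and absorption spectra into one feature row.

    Each block is standardised separately (an assumption) so that neither
    spectrum dominates merely because of its intensity scale.
    """
    return np.hstack([standardise(raman), standardise(absorption)])


def pca_scores(features, n_components=2):
    """Project samples onto the leading principal components (PCA via SVD)."""
    centred = features - features.mean(axis=0)
    u, s, _ = np.linalg.svd(centred, full_matrices=False)
    explained = s**2 / np.sum(s**2)  # fraction of variance per component
    return u[:, :n_components] * s[:n_components], explained[:n_components]


rng = np.random.default_rng(0)
raman = rng.normal(size=(40, 500))       # placeholder: 40 samples x 500 Raman shifts
absorption = rng.normal(size=(40, 120))  # placeholder: 40 samples x 120 wavelengths
X = joint_features(raman, absorption)    # shape (40, 620)
scores, explained = pca_scores(X)
print(X.shape, scores.shape, explained)
```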
Affiliation(s)
- Shreya Milind Athalye
- Department of Agricultural and Biological Engineering, Purdue University, West Lafayette, Indiana, USA
- Sukirt Thakur
- School of Mechanical Engineering, Purdue University, West Lafayette, Indiana, USA
- Ehsan Esmaili
- School of Mechanical Engineering, Purdue University, West Lafayette, Indiana, USA
- Mohit S. Verma
- Department of Agricultural and Biological Engineering, Purdue University, West Lafayette, Indiana, USA
- Weldon School of Biomedical Engineering, Purdue University, West Lafayette, Indiana, USA
- Birck Nanotechnology Center, Purdue University, West Lafayette, Indiana, USA
- Arezoo M. Ardekani
- School of Mechanical Engineering, Purdue University, West Lafayette, Indiana, USA
5
Ren H, Zhang Q, Wang Z, Zhang G, Liu H, Guo W, Mukamel S, Jiang J. Machine learning recognition of protein secondary structures based on two-dimensional spectroscopic descriptors. Proc Natl Acad Sci U S A 2022; 119:e2202713119. [PMID: 35476517; PMCID: PMC9171355; DOI: 10.1073/pnas.2202713119] [Citation(s) in RCA: 7] [Impact Index Per Article: 2.3] [Received: 02/15/2022] [Accepted: 03/28/2022] [Indexed: 11/29/2022]
Abstract
Discriminating protein secondary structures is crucial for understanding protein function. In general, spectroscopic data cannot be inverted to yield the structure. We present a machine learning protocol that uses two-dimensional UV (2DUV) spectra as pattern recognition descriptors, aiming at automated protein secondary structure determination from spectroscopic features. Accurate secondary structure recognition is obtained for homologous (97%) and nonhomologous (91%) protein segments randomly selected from simulated model datasets. The advantage of 2DUV descriptors over one-dimensional linear absorption and circular dichroism spectra lies in the cross-peak information, which reflects interactions between local regions of the protein. Because 2DUV measurements are ultrafast (∼200 fs), they could be used in the future to probe conformational variations in the course of protein dynamics.
Affiliation(s)
- Hao Ren
- School of Materials Science and Engineering, China University of Petroleum (East China), Qingdao 266580, Shandong, China
- Qian Zhang
- School of Materials Science and Engineering, China University of Petroleum (East China), Qingdao 266580, Shandong, China
- Zhengjie Wang
- School of Materials Science and Engineering, China University of Petroleum (East China), Qingdao 266580, Shandong, China
- Guozhen Zhang
- School of Chemistry and Materials Science, University of Science and Technology of China, Hefei 230026, Anhui, China
- Hongzhang Liu
- School of Materials Science and Engineering, China University of Petroleum (East China), Qingdao 266580, Shandong, China
- Wenyue Guo
- School of Materials Science and Engineering, China University of Petroleum (East China), Qingdao 266580, Shandong, China
- Shaul Mukamel
- Department of Chemistry and Physics & Astronomy, University of California, Irvine, CA 92697
- Jun Jiang
- School of Chemistry and Materials Science, University of Science and Technology of China, Hefei 230026, Anhui, China
6
Wang Y, Zhao L, Zhou X, Zhang J, Jiang J, Dong H. Global Fold Switching of the RafH Protein: Diverse Structures with a Conserved Pathway. J Phys Chem B 2022; 126:2979-2989. [PMID: 35438983; DOI: 10.1021/acs.jpcb.1c10965] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 01/27/2023]
Abstract
It is generally believed that a protein's sequence uniquely determines its structure, which is the basis of its biological function. However, in RfaH, a representative metamorphic protein, a single amino acid sequence encodes two distinct native-state structures. Its C-terminal domain (CTD) either adopts an all-α-helical configuration and packs tightly against its N-terminal domain (NTD), or it dissociates from the NTD, transforms into an all-β-barrel fold, and attaches to the ribosome, leaving the NTD exposed to bind RNA polymerases. In this way, the RfaH protein couples transcription and translation. Although previous studies have provided a preliminary understanding of its function, the full course of the conformational change of RfaH-CTD at the atomic level remains elusive. We used teDA2, a feature-space-based enhanced sampling protocol, to explore the transformation of RfaH-CTD. We found that it undergoes a large-scale structural rearrangement, with characteristic spectra as the fingerprint, and a global unfolding transition in which a tighter, energetically moderate molten-globule-like nucleus forms in between. The formation of this nucleus limits the possible intermediate conformations, facilitates the formation of secondary and tertiary structures, and thus ensures efficient transformation. The key features along the transition path disclosed in this work are likely associated with the evolution of RfaH, such that encoding a single sequence into multiple folds with distinct biological functions is energetically unhindered.
Affiliation(s)
- Yiqiao Wang
- Kuang Yaming Honors School, Nanjing University, Nanjing 210023, China
- School of Physics, National Laboratory of Solid State Microstructure, and Collaborative Innovation Center of Advanced Microstructures, Nanjing University, Nanjing 210093, China
- Luyuan Zhao
- Hefei National Laboratory for Physical Sciences at the Microscale, Collaborative Innovation Center of Chemistry for Energy Materials, School of Chemistry and Materials Science, University of Science and Technology of China, Hefei 230026, Anhui, China
- Xuejie Zhou
- Kuang Yaming Honors School, Nanjing University, Nanjing 210023, China
- Jian Zhang
- School of Physics, National Laboratory of Solid State Microstructure, and Collaborative Innovation Center of Advanced Microstructures, Nanjing University, Nanjing 210093, China
- Institute for Brain Sciences, Nanjing University, Nanjing 210023, China
- Jun Jiang
- Hefei National Laboratory for Physical Sciences at the Microscale, Collaborative Innovation Center of Chemistry for Energy Materials, School of Chemistry and Materials Science, University of Science and Technology of China, Hefei 230026, Anhui, China
- Hao Dong
- Kuang Yaming Honors School, Nanjing University, Nanjing 210023, China
- Institute for Brain Sciences, Nanjing University, Nanjing 210023, China
- State Key Laboratory of Analytical Chemistry for Life Science, Nanjing University, Nanjing 210023, China
- Engineering Research Center of Protein and Peptide Medicine of Ministry of Education, Nanjing University, Nanjing 210023, China
7
Meuwly M. Atomistic Simulations for Reactions and Vibrational Spectroscopy in the Era of Machine Learning─Quo Vadis? J Phys Chem B 2022; 126:2155-2167. [PMID: 35286087; DOI: 10.1021/acs.jpcb.2c00212] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Indexed: 11/29/2022]
Abstract
Atomistic simulations using accurate energy functions can provide molecular-level insight into the functional motions of molecules in the gas phase and in the condensed phase. This Perspective outlines the present status of the field, drawing on the efforts of others and on some of our own work, and discusses open questions and future prospects. The combination of physics-based long-range representations using multipolar charge distributions with kernel representations for the bonded interactions is shown to provide realistic models for exploring the infrared spectroscopy of molecules in solution. For reactions, empirical models connecting dedicated energy functions for the reactant and product states allow statistically meaningful sampling of conformational space, whereas machine-learned energy functions are superior in accuracy. Combining physics-based models with machine-learning techniques and integrating them into all-purpose molecular simulation software provides a unique opportunity to bring such dynamics simulations closer to reality.
Affiliation(s)
- Markus Meuwly
- Department of Chemistry, University of Basel, Klingelbergstrasse 80, 4056 Basel, Switzerland
8
Zhao L, Zhang J, Zhang Y, Ye S, Zhang G, Chen X, Jiang B, Jiang J. Accurate Machine Learning Prediction of Protein Circular Dichroism Spectra with Embedded Density Descriptors. JACS Au 2021; 1:2377-2384. [PMID: 34977905; PMCID: PMC8715543; DOI: 10.1021/jacsau.1c00449] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.5] [Received: 10/08/2021] [Indexed: 05/08/2023]
Abstract
A data-driven approach to simulating circular dichroism (CD) spectra is appealing for fast protein secondary structure determination, yet predicting electric and magnetic transition dipole moments poses a substantial barrier to this goal. To address this problem, we designed a new machine learning (ML) protocol in which ordinary, purely geometry-based descriptors are replaced with embedded density descriptors, and electric and magnetic transition dipole moments are predicted with an accuracy comparable to first-principles calculations. The ML model not only simulates protein CD spectra nearly 4 orders of magnitude faster than conventional first-principles simulation but also yields CD spectra in good agreement with experiment. Finally, we predicted a series of CD spectra of the Trp-cage protein associated with continuous changes in protein configuration along its folding path, showing the potential of our ML model to support real-time CD spectroscopy studies of protein dynamics.
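Both transition moments are needed because the Rosenfeld rotatory strength of a transition 0 → k, R_k = Im(μ_0k · m_k0), sets the sign and intensity of that transition's CD band. The sketch below computes R_k and a Gaussian-broadened stick spectrum in arbitrary units. It assumes real electronic wavefunctions, so the magnetic transition moment is purely imaginary and only its imaginary part is passed in. The line width, the omission of the usual energy-dependent prefactor, and the example numbers are illustrative assumptions, and the sketch does not reproduce how the paper assembles protein spectra.

```python
import numpy as np


def rotatory_strengths(mu, m_imag):
    """Rosenfeld rotatory strengths R_k = Im(mu_0k . m_k0).

    mu: real electric transition dipoles, shape (n, 3).
    m_imag: imaginary parts of the magnetic transition dipoles, shape (n, 3)
    (purely imaginary for real wavefunctions), so Im(mu . m) = mu . Im(m).
    """
    return np.einsum("kj,kj->k", mu, m_imag)


def cd_spectrum(grid, energies, strengths, sigma=0.2):
    """Sum of unit-area Gaussians weighted by R_k (arbitrary units)."""
    diff = grid[:, None] - energies[None, :]
    g = np.exp(-0.5 * (diff / sigma) ** 2) / (sigma * np.sqrt(2.0 * np.pi))
    return g @ strengths


# Illustrative values only: two transitions (energies in eV, moments in a.u.).
energies = np.array([6.5, 7.2])
mu = np.array([[0.5, 0.0, 0.0], [0.0, 0.4, 0.1]])
m_imag = np.array([[0.3, 0.1, 0.0], [0.0, -0.5, 0.0]])
strengths = rotatory_strengths(mu, m_imag)  # 0.15 and -0.2: bands of opposite sign
grid = np.linspace(5.0, 9.0, 401)
spectrum = cd_spectrum(grid, energies, strengths)
```

An error in either predicted moment changes R_k directly, which is why accurate prediction of both is the bottleneck for ML-based CD spectra.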
Affiliation(s)
- Luyuan Zhao
- Hefei National Laboratory for Physical Sciences at the Microscale, Collaborative Innovation Center of Chemistry for Energy Materials, School of Chemistry and Materials Science, University of Science and Technology of China, Hefei, Anhui 230026, P. R. China
- Jinxiao Zhang
- Guangxi Key Laboratory of Electrochemical and Magneto-chemical Functional Materials, College of Chemistry and Bioengineering, Guilin University of Technology, Guilin 541006, P. R. China
- Yaolong Zhang
- Hefei National Laboratory for Physical Sciences at the Microscale, Collaborative Innovation Center of Chemistry for Energy Materials, School of Chemistry and Materials Science, University of Science and Technology of China, Hefei, Anhui 230026, P. R. China
- Sheng Ye
- School of Artificial Intelligence, Anhui University, Hefei, Anhui 230601, P. R. China
- Guozhen Zhang
- Hefei National Laboratory for Physical Sciences at the Microscale, Collaborative Innovation Center of Chemistry for Energy Materials, School of Chemistry and Materials Science, University of Science and Technology of China, Hefei, Anhui 230026, P. R. China
- Xin Chen
- Gusu Laboratory of Materials, Suzhou, Jiangsu 215123, P. R. China
- Bin Jiang
- Hefei National Laboratory for Physical Sciences at the Microscale, Collaborative Innovation Center of Chemistry for Energy Materials, School of Chemistry and Materials Science, University of Science and Technology of China, Hefei, Anhui 230026, P. R. China
- Jun Jiang
- Hefei National Laboratory for Physical Sciences at the Microscale, Collaborative Innovation Center of Chemistry for Energy Materials, School of Chemistry and Materials Science, University of Science and Technology of China, Hefei, Anhui 230026, P. R. China