1
Manchev YT, Popelier PLA. Impact of Derivative Observations on Gaussian Process Machine Learning Potentials: A Direct Comparison of Three Modeling Approaches. J Chem Theory Comput 2025; 21:5490-5500. [PMID: 40408763 DOI: 10.1021/acs.jctc.5c00344] [Indexed: 05/25/2025]
Abstract
Machine learning (ML) potentials have become a well-established tool for providing inexpensive, yet quantum-mechanically accurate, atomistic simulations. Here, we extend our current modeling procedure, based on Gaussian process regression, to include derivative observations in the ML models. We directly compare three system-energy modeling approaches based on quantum-mechanically derived quantities: (i) atomic energies, (ii) total system energy, and (iii) total system energy with derivative observations. We find that modeling the total energy with derivative observations performs best across the board, achieving chemical accuracy with less training data. In some cases, both energy and force errors are around an order of magnitude lower when derivative observations are added to the models. We then discuss the multiple advantages of the proposed modeling method, such as improved data set availability and the ability to easily include dispersion interactions. Additionally, we discuss use cases of the new modeling approach in the ML force field FFLUX.
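To illustrate what conditioning a Gaussian process on derivative observations involves, here is a minimal one-dimensional sketch. It is not the FFLUX implementation. It assumes a squared-exponential kernel with fixed hyperparameters (`sigma`, `ell`), a zero prior mean, and toy data in which sin x plays the role of the energy and cos x its derivative. The covariances between values and derivatives come from differentiating the kernel; because forces are negative energy gradients, force observations would enter the same way with a sign flip. The names `se_blocks` and `gp_predict` are hypothetical.

```python
import numpy as np

def se_blocks(a, b, sigma=1.0, ell=1.0):
    """Covariance blocks of a squared-exponential GP for 1D inputs a (rows), b (cols).

    Returns cov(f, f), cov(f, f'), cov(f', f), cov(f', f') where f' = df/dx.
    """
    r = a[:, None] - b[None, :]                     # r_ij = a_i - b_j
    k = sigma**2 * np.exp(-0.5 * r**2 / ell**2)     # cov(f(a_i), f(b_j))
    k_f_df = k * r / ell**2                         # dk/db_j
    k_df_f = -k * r / ell**2                        # dk/da_i
    k_df_df = k * (1.0 / ell**2 - r**2 / ell**4)    # d2k/(da_i db_j)
    return k, k_f_df, k_df_f, k_df_df

def gp_predict(x_train, y, x_test, dy=None, sigma=1.0, ell=1.0, noise=1e-8):
    """Posterior mean of f at x_test, optionally also conditioning on derivatives dy."""
    k, k_f_df, k_df_f, k_df_df = se_blocks(x_train, x_train, sigma, ell)
    ks, ks_df, _, _ = se_blocks(x_test, x_train, sigma, ell)
    if dy is None:
        K, Ks, t = k, ks, y                          # values only
    else:
        K = np.block([[k, k_f_df], [k_df_f, k_df_df]])  # joint value/derivative covariance
        Ks = np.hstack([ks, ks_df])
        t = np.concatenate([y, dy])
    K = K + noise * np.eye(len(t))                   # small jitter for numerical stability
    alpha = np.linalg.solve(K, t)
    return Ks @ alpha

if __name__ == "__main__":
    x = np.array([0.0, 1.5, 3.0])                    # toy training inputs
    xt = np.linspace(0.0, 3.0, 31)                   # test inputs
    e_only = gp_predict(x, np.sin(x), xt)
    with_d = gp_predict(x, np.sin(x), xt, dy=np.cos(x))
    print("max error, values only:     ", np.abs(e_only - np.sin(xt)).max())
    print("max error, with derivatives:", np.abs(with_d - np.sin(xt)).max())
```

In this toy setting the derivative-conditioned prediction stays closer to sin x between the three training points. This only illustrates the mechanism and does not reproduce the paper's molecular results.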
Affiliation(s)
- Yulian T Manchev
- Department of Chemistry, The University of Manchester, Manchester M13 9PL, Great Britain
- Paul L A Popelier
- Department of Chemistry, The University of Manchester, Manchester M13 9PL, Great Britain
2
Xia J, Zhang Y, Jiang B. The evolution of machine learning potentials for molecules, reactions and materials. Chem Soc Rev 2025; 54:4790-4821. [PMID: 40227021 DOI: 10.1039/d5cs00104h] [Indexed: 04/15/2025]
Abstract
Recent years have witnessed the rapid development of machine learning potentials (MLPs) and their widespread applications in chemistry, physics, and materials science. By faithfully fitting discrete ab initio data to continuous and symmetry-preserving mathematical forms, MLPs have enabled accurate and efficient large-scale atomistic simulations from first principles. In this review, we provide an overview of the evolution of MLPs over the past two decades and focus on the state-of-the-art MLPs proposed in the last few years for molecules, reactions, and materials. We discuss some representative applications of MLPs and the trend of developing universal potentials across a variety of systems. Finally, we outline a list of open challenges and opportunities in the development and applications of MLPs.
Affiliation(s)
- Junfan Xia
- State Key Laboratory of Precision and Intelligent Chemistry, University of Science and Technology of China, Hefei, Anhui 230026, China.
- School of Chemistry and Materials Science, Department of Chemical Physics, University of Science and Technology of China, Hefei, Anhui 230026, China
- Yaolong Zhang
- Department of Chemistry and Chemical Biology, Center for Computational Chemistry, University of New Mexico, Albuquerque, New Mexico 87131, USA
- Bin Jiang
- State Key Laboratory of Precision and Intelligent Chemistry, University of Science and Technology of China, Hefei, Anhui 230026, China.
- School of Chemistry and Materials Science, Department of Chemical Physics, University of Science and Technology of China, Hefei, Anhui 230026, China
- Hefei National Laboratory, University of Science and Technology of China, Hefei, 230088, China
3
Lee Y, Chen X, Gericke SM, Li M, Zakharov DN, Head AR, Yang JC, Alexandrova AN. Machine-Learning-Driven Exploration of Surface Reconstructions of Reduced Rutile TiO2. Angew Chem Int Ed Engl 2025:e202501017. [PMID: 40261805 DOI: 10.1002/anie.202501017] [Received: 01/13/2025] [Revised: 03/28/2025] [Accepted: 04/22/2025] [Indexed: 04/24/2025]
Abstract
Titanium dioxide (TiO2) is widely used as a catalyst support due to its stability, tunable electronic properties, and surface oxygen vacancies, which are crucial for catalytic processes such as the reverse water-gas shift (RWGS) reaction. Reduced TiO2 surfaces undergo complex surface reconstructions that endow unique properties but are computationally challenging to describe. In this study, we utilize machine-learning interatomic potentials (MLIPs) integrated with an active-learning workflow to efficiently explore reduced rutile TiO2 surfaces. This approach enabled the prediction of a phase diagram as a function of oxygen chemical potential, revealing a variety of reconstructed phases, including a previously unreported subsurface shear plane structure. We further investigate the electronic properties of these surfaces and validate our results by comparing experimental and theoretical high-resolution transmission electron microscopy (HRTEM) images. Our findings provide new insights into how extreme surface reductions influence the structural and electronic properties of TiO2, with potential implications for catalyst design.
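Surface phase diagrams versus oxygen chemical potential are commonly built with ab initio thermodynamics: each candidate slab's surface free energy is a linear function of Δμ_O, and the lowest one is stable at each Δμ_O. The sketch below assumes the usual expression γ(Δμ_O) = [E_slab − N_Ti g_TiO2 − (N_O − 2N_Ti)(μ_O^ref + Δμ_O)] / A, with g_TiO2 the bulk TiO2 energy per formula unit and μ_O^ref typically ½E(O2). It neglects vibrational and configurational terms, and the paper's reference states and normalization may differ. `SlabPhase`, `surface_energy`, and `stable_phase` are hypothetical names, and the numbers in the usage block are purely illustrative.

```python
from dataclasses import dataclass

@dataclass
class SlabPhase:
    name: str
    e_slab: float  # total slab energy (eV), e.g. from an MLIP or DFT relaxation
    n_ti: int      # number of Ti atoms in the slab
    n_o: int       # number of O atoms in the slab
    area: float    # area used for normalization (Å^2); pass 2*A for a symmetric two-sided slab

def surface_energy(phase, g_tio2, mu_o_ref, d_mu_o):
    """Surface free energy (eV/Å^2) at oxygen chemical potential mu_o_ref + d_mu_o.

    g_tio2: bulk TiO2 energy per formula unit (eV).
    mu_o_ref: oxygen reference energy, commonly 0.5 * E(O2) (eV).
    Vibrational and configurational contributions are neglected.
    """
    excess_o = phase.n_o - 2 * phase.n_ti   # negative for reduced (O-deficient) slabs
    mu_o = mu_o_ref + d_mu_o
    return (phase.e_slab - phase.n_ti * g_tio2 - excess_o * mu_o) / phase.area

def stable_phase(phases, g_tio2, mu_o_ref, d_mu_o):
    """Return the candidate with the lowest surface free energy at d_mu_o."""
    return min(phases, key=lambda p: surface_energy(p, g_tio2, mu_o_ref, d_mu_o))

if __name__ == "__main__":
    # Purely illustrative numbers, not results from the paper.
    phases = [SlabPhase("stoichiometric", -207.0, 8, 16, 10.0),
              SlabPhase("reduced", -193.2, 8, 14, 10.0)]
    for d_mu in (0.0, -1.0, -2.0, -2.5):
        print(d_mu, stable_phase(phases, g_tio2=-26.0, mu_o_ref=-4.9, d_mu_o=d_mu).name)
```

Because a reduced slab has N_O < 2N_Ti, its γ decreases as Δμ_O becomes more negative, which is why reduced reconstructions take over under oxygen-poor conditions while stoichiometric slabs give horizontal lines.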
Affiliation(s)
- Yonghyuk Lee
- Chemistry and Biochemistry, University of California Los Angeles, 607 Charles E. Young Drive East, Los Angeles, California, 90095, USA
- Xiaobo Chen
- Department of Chemical and Petroleum Engineering, University of Pittsburgh, 940 Benedum Hall, Pittsburgh, Pennsylvania, 15261, USA
- Sabrina M Gericke
- Center for Functional Nanomaterials, Brookhaven National Laboratory, Bldg. 735, Upton, New York, 11973, USA
- Meng Li
- Center for Functional Nanomaterials, Brookhaven National Laboratory, Bldg. 735, Upton, New York, 11973, USA
- Dmitri N Zakharov
- Center for Functional Nanomaterials, Brookhaven National Laboratory, Bldg. 735, Upton, New York, 11973, USA
- Ashley R Head
- Center for Functional Nanomaterials, Brookhaven National Laboratory, Bldg. 735, Upton, New York, 11973, USA
- Judith C Yang
- Department of Chemical and Petroleum Engineering, University of Pittsburgh, 940 Benedum Hall, Pittsburgh, Pennsylvania, 15261, USA
- Center for Functional Nanomaterials, Brookhaven National Laboratory, Bldg. 735, Upton, New York, 11973, USA
- Anastassia N Alexandrova
- Chemistry and Biochemistry, University of California Los Angeles, 607 Charles E. Young Drive East, Los Angeles, California, 90095, USA
4
Huang Q, Li Y, Zhu L, Yu W. Hierarchical Deep Potential with Structure Constraints for Efficient Coarse-Grained Modeling. J Chem Inf Model 2025; 65:3203-3214. [PMID: 40119793 DOI: 10.1021/acs.jcim.4c02042] [Indexed: 03/24/2025]
Abstract
Coarse-grained molecular dynamics is a powerful approach for simulating large-scale systems by reducing the number of degrees of freedom. Nonetheless, the development of accurate coarse-grained force fields remains challenging, particularly for complex systems, such as polymers. In this study, we introduce a novel framework, hierarchical deep potential with structure constraints (HDP-SC), designed to construct coarse-grained force fields for polymer materials. Our methodology integrates a prior energy term obtained through direct Boltzmann inversion with a deep neural network potential, which is trained using hierarchical bead environment descriptors. This framework facilitates the reproduction of structural distributions and the potential of mean force, thus enhancing the accuracy and efficiency of the coarse-grained model. We validate our approach using polystyrene systems, demonstrating that the HDP-SC model not only successfully reproduces the structural properties of these systems but also remains applicable at larger scales. Our findings underscore the promise of machine learning-based techniques in advancing the development of coarse-grained force fields for polymer materials.
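Direct Boltzmann inversion, named in the abstract as the source of the prior energy term, converts a target distribution sampled in reference simulations into a potential of mean force via U(x) = −k_B T ln P(x), up to an additive constant. The minimal sketch below handles one coarse-grained coordinate. It assumes histogram density estimation, kJ/mol units, and an optional 1/r² Jacobian correction for pair distances. It is not the HDP-SC code, and `boltzmann_inversion` is a hypothetical name.

```python
import numpy as np

KB_KJ_PER_MOL_K = 0.0083144626  # Boltzmann constant in kJ/(mol K)

def boltzmann_inversion(samples, bins, temperature, pair_distance=False):
    """Return bin centers and U(x) = -kB T ln P(x), shifted so that min(U) = 0.

    samples: values of one CG coordinate (e.g. a bond length) sampled in a
    reference simulation. Empty bins are set to NaN because ln(0) is undefined.
    """
    hist, edges = np.histogram(samples, bins=bins, density=True)
    centers = 0.5 * (edges[:-1] + edges[1:])
    p = hist.astype(float)
    if pair_distance:
        # Divide out the r^2 shell-volume factor (constant 4*pi dropped) so U
        # reflects the interaction rather than the geometric growth of shells.
        p = p / centers**2
    u = np.full(centers.shape, np.nan)
    occupied = p > 0
    u[occupied] = -KB_KJ_PER_MOL_K * temperature * np.log(p[occupied])
    u -= np.nanmin(u)
    return centers, u

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x = rng.normal(1.0, 0.1, 200_000)   # toy Gaussian "bond length" samples
    centers, u = boltzmann_inversion(x, bins=50, temperature=300.0)
    print("minimum of U near x =", centers[np.nanargmin(u)])
```

For Gaussian samples the inverted potential is approximately harmonic, U ≈ k_B T (x − μ)² / (2s²), which is a convenient sanity check. Bond-angle distributions would need a sin θ Jacobian instead of r².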
Affiliation(s)
- Qi Huang
- National Key Laboratory of Materials for Integrated Circuits, Shanghai Institute of Microsystem and Information Technology, Chinese Academy of Sciences, Shanghai 200050, China
- College of Materials Science and Optoelectronic Technology, University of Chinese Academy of Sciences, Beijing 100049, China
- Yedi Li
- National Key Laboratory of Materials for Integrated Circuits, Shanghai Institute of Microsystem and Information Technology, Chinese Academy of Sciences, Shanghai 200050, China
- Lei Zhu
- National Key Laboratory of Materials for Integrated Circuits, Shanghai Institute of Microsystem and Information Technology, Chinese Academy of Sciences, Shanghai 200050, China
- College of Materials Science and Optoelectronic Technology, University of Chinese Academy of Sciences, Beijing 100049, China
- Wenjie Yu
- National Key Laboratory of Materials for Integrated Circuits, Shanghai Institute of Microsystem and Information Technology, Chinese Academy of Sciences, Shanghai 200050, China
- College of Materials Science and Optoelectronic Technology, University of Chinese Academy of Sciences, Beijing 100049, China
5
Liu S, Yang Q, Zhang L, Luo S. Highly Precise Prediction of Micro- and Supra-pKa Based on 3D Descriptors Integrating Non-Covalent Interactions. Angew Chem Int Ed Engl 2025; 64:e202424069. [PMID: 39904757 DOI: 10.1002/anie.202424069] [Received: 12/09/2024] [Revised: 02/02/2025] [Accepted: 02/04/2025] [Indexed: 02/06/2025]
Abstract
Accurate pKa prediction is crucial for understanding proton dissociation in complex molecular systems. However, existing models often face challenges in addressing subtle stereoelectronic effects and conformational flexibility. This study presents H-SPOC, a localized 3D descriptor that captures covalent and non-covalent interactions and incorporates solvent effects to predict site-specific pKa values accurately. H-SPOC was validated on multiple benchmark datasets, including SAMPL6, SAMPL7, and SAMPL8, where it outperformed state-of-the-art methods. H-SPOC also proved versatile across various applications, including non-equilibrium conformations of 2-acetoxybenzoic acid (the main component of aspirin), glycine's microstate distributions, and the stereoelectronic anomalies of Janus Sponge and Meldrum's acid. It addressed challenging supra-pKa predictions in crystalline environments and accurately correlated pKa with reaction rates, selectivity, tautomerism, and pharmacokinetic properties. With its chemically intuitive design and computational efficiency, H-SPOC provides an efficient framework for rapid and precise micro- and supra-pKa predictions, offering significant potential in drug discovery, catalysis, and materials science.
Affiliation(s)
- Siyuan Liu
- Center of Basic Molecular Science, Department of Chemistry, Tsinghua University, Beijing, 100084, China
- Qi Yang
- Center of Basic Molecular Science, Department of Chemistry, Tsinghua University, Beijing, 100084, China
- Haihe Laboratory of Sustainable Chemical Transformations, Tianjin, 300192, China
- Long Zhang
- Center of Basic Molecular Science, Department of Chemistry, Tsinghua University, Beijing, 100084, China
- Sanzhong Luo
- Center of Basic Molecular Science, Department of Chemistry, Tsinghua University, Beijing, 100084, China
6
Croitoru A, Kumar A, Lambry JC, Lee J, Sharif S, Yu W, MacKerell AD, Aleksandrov A. Increasing the Accuracy and Robustness of the CHARMM General Force Field with an Expanded Training Set. J Chem Theory Comput 2025; 21:3044-3065. [PMID: 40033678 PMCID: PMC11938330 DOI: 10.1021/acs.jctc.5c00046] [Indexed: 03/05/2025]
Abstract
Small molecule empirical force fields (FFs), including the CHARMM General Force Field (CGenFF), are designed to have wide coverage of organic molecules and to rapidly assign parameters to molecules not explicitly included in the FF. Assignment of parameters to new molecules in CGenFF is based on a trained bond-angle-dihedral charge increment linear interpolation scheme for the partial atomic charges, along with bonded parameters assigned by analogy using a rules-based penalty score scheme associated with atom types and chemical connectivity. Accordingly, the accuracy of CGenFF depends on the extent of the training set of available parameters. In the present study, that training set is extended by 1390 molecules selected to represent connectivities new to the CGenFF training compounds. Quantum mechanical (QM) data for optimized geometries, bond, valence angle, and dihedral angle potential energy scans, interactions with water, molecular dipole moments, and electrostatic potentials were used as target data. The resultant bonded parameters and partial atomic charges were used to train a new version of the CGenFF program, v5.0, which was used to generate parameters for a validation set of molecules, including drug-like molecules approved by the FDA, which were then benchmarked against both experimental and QM data. CGenFF v5.0 shows overall improvements with respect to QM intramolecular geometries, vibrations, dihedral potential energy scans, dipole moments, and interactions with water. Tests of pure solvent properties of 216 molecules show small improvements versus the previous release, CGenFF v2.5.1, reflecting the high quality of the Lennard-Jones parameters that were explicitly optimized during the initial optimization of both the CGenFF and the CHARMM36 force field.
CGenFF v5.0 represents an improvement that is anticipated to more accurately model intramolecular geometries and strain energies as well as noncovalent interactions of drug-like and other organic molecules.
Affiliation(s)
- Anastasia Croitoru
- Laboratoire d’Optique et Biosciences (CNRS UMR7645, INSERM U1182), Ecole Polytechnique, Institut polytechnique de Paris, F-91128 Palaiseau, France
- Department of Pharmaceutical Sciences, School of Pharmacy, University of Maryland, 20 Penn Street, Baltimore, Maryland 21201, USA
- Anmol Kumar
- Department of Pharmaceutical Sciences, School of Pharmacy, University of Maryland, 20 Penn Street, Baltimore, Maryland 21201, USA
- Jean-Christophe Lambry
- Laboratoire d’Optique et Biosciences (CNRS UMR7645, INSERM U1182), Ecole Polytechnique, Institut polytechnique de Paris, F-91128 Palaiseau, France
- Jihyeon Lee
- Department of Pharmaceutical Sciences, School of Pharmacy, University of Maryland, 20 Penn Street, Baltimore, Maryland 21201, USA
- Suliman Sharif
- Department of Pharmaceutical Sciences, School of Pharmacy, University of Maryland, 20 Penn Street, Baltimore, Maryland 21201, USA
- Wenbo Yu
- Department of Pharmaceutical Sciences, School of Pharmacy, University of Maryland, 20 Penn Street, Baltimore, Maryland 21201, USA
- Alexander D. MacKerell
- Department of Pharmaceutical Sciences, School of Pharmacy, University of Maryland, 20 Penn Street, Baltimore, Maryland 21201, USA
- Alexey Aleksandrov
- Laboratoire d’Optique et Biosciences (CNRS UMR7645, INSERM U1182), Ecole Polytechnique, Institut polytechnique de Paris, F-91128 Palaiseau, France
7
Lei YK, Yagi K, Sugita Y. Efficient Training of Neural Network Potentials for Chemical and Enzymatic Reactions by Continual Learning. J Chem Theory Comput 2025; 21:2695-2711. [PMID: 40065732 PMCID: PMC11912204 DOI: 10.1021/acs.jctc.4c01393] [Received: 10/16/2024] [Revised: 02/04/2025] [Accepted: 02/05/2025] [Indexed: 03/19/2025]
Abstract
Machine learning (ML) methods have emerged as an efficient surrogate for high-level electronic structure theory, offering precision and computational efficiency. However, the vast conformational and chemical space remains challenging when constructing a general force field. Training data sets typically cover only a limited region of this space, resulting in poor extrapolation performance. Traditional strategies must address this problem by training models from scratch using old and new data sets. In addition, model transferability is crucial for general force field construction. Existing ML force fields, designed for closed systems with no external environmental potential, exhibit limited transferability to complex condensed phase systems such as enzymatic reactions, resulting in inferior performance and high memory costs. Our ML/MM model, based on the Taylor expansion of the electrostatic operator, showed high transferability between reactions in several simple solvents. This work extends the strategy to enzymatic reactions to explore the transferability between more complex heterogeneous environments. In addition, we also apply continual learning strategies based on memory data sets to enable autonomous and on-the-fly training on a continuous stream of new data. By combining these two methods, we can efficiently construct a force field that can be applied to chemical reactions in various environmental media.
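The abstract does not say how its memory data sets are maintained. One generic rehearsal scheme, assumed here purely for illustration, keeps a fixed-size reservoir sample of past training configurations and mixes a few of them into every batch of new data so that previously learned regions are not forgotten. `ReservoirMemory` and its parameters are hypothetical, and the model update itself is left out.

```python
import random

class ReservoirMemory:
    """Fixed-capacity memory holding a uniform random sample of all items seen so far."""

    def __init__(self, capacity, seed=0):
        self.capacity = capacity
        self.items = []
        self.seen = 0
        self.rng = random.Random(seed)

    def add(self, item):
        """Reservoir sampling (Algorithm R): every item seen has equal chance to be kept."""
        self.seen += 1
        if len(self.items) < self.capacity:
            self.items.append(item)
        else:
            j = self.rng.randrange(self.seen)   # uniform integer in [0, seen)
            if j < self.capacity:
                self.items[j] = item

    def replay_batch(self, new_batch, n_replay):
        """New data plus up to n_replay stored items, sampled without replacement."""
        k = min(n_replay, len(self.items))
        return list(new_batch) + self.rng.sample(self.items, k)

if __name__ == "__main__":
    memory = ReservoirMemory(capacity=5)
    stream = [[10 * c + i for i in range(3)] for c in range(4)]  # toy data chunks
    for chunk in stream:
        batch = memory.replay_batch(chunk, n_replay=2)  # train the model on `batch` here
        print(batch)
        for item in chunk:
            memory.add(item)
```

Replaying stored samples bounds memory use regardless of stream length, which is the main design motivation for rehearsal over retraining from scratch on all accumulated data.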
Affiliation(s)
- Yao-Kun Lei
- Theoretical Molecular Science Laboratory, RIKEN Cluster for Pioneering Research, Wako, Saitama 351-0198, Japan
- Computational Biophysics Research Team, RIKEN Center for Computational Science, Kobe, Hyogo 650-0047, Japan
- RIKEN Interdisciplinary Theoretical and Mathematical Sciences Program (iTHEMS), Wako, Saitama 351-0198, Japan
- Kiyoshi Yagi
- Theoretical Molecular Science Laboratory, RIKEN Cluster for Pioneering Research, Wako, Saitama 351-0198, Japan
- Computational Biophysics Research Team, RIKEN Center for Computational Science, Kobe, Hyogo 650-0047, Japan
- Department of Chemistry, Institute of Pure and Applied Sciences, University of Tsukuba, 1-1-1 Tennodai, Tsukuba, Ibaraki 305-8571, Japan
- Yuji Sugita
- Theoretical Molecular Science Laboratory, RIKEN Cluster for Pioneering Research, Wako, Saitama 351-0198, Japan
- Computational Biophysics Research Team, RIKEN Center for Computational Science, Kobe, Hyogo 650-0047, Japan
- RIKEN Interdisciplinary Theoretical and Mathematical Sciences Program (iTHEMS), Wako, Saitama 351-0198, Japan
- Laboratory for Biomolecular Function Simulation, RIKEN Center for Biosystems Dynamics Research, Kobe, Hyogo 650-0047, Japan
8
Ghukasyan T, Altunyan V, Bughdaryan A, Aghajanyan T, Smbatyan K, Papoian GA, Petrosyan G. Smart distributed data factory volunteer computing platform for active learning-driven molecular data acquisition. Sci Rep 2025; 15:7122. [PMID: 40016468 PMCID: PMC11868574 DOI: 10.1038/s41598-025-90981-6] [Received: 10/31/2024] [Accepted: 02/17/2025] [Indexed: 03/01/2025]
Abstract
This paper presents the smart distributed data factory (SDDF), an AI-driven distributed computing platform designed to address challenges in drug discovery by creating comprehensive datasets of molecular conformations and their properties. SDDF uses volunteer computing, leveraging the processing power of personal computers worldwide to accelerate quantum chemistry (DFT) calculations. To tackle the vast chemical space and limited high-quality data, SDDF employs an ensemble of machine learning (ML) models to predict molecular properties and selectively choose the most challenging data points for further DFT calculations. The platform also generates new molecular conformations using molecular dynamics with the forces derived from these models. SDDF makes several contributions: the volunteer computing platform for DFT calculations; an active learning framework for constructing a dataset of molecular conformations; a large public dataset of diverse ENAMINE molecules with calculated energies; an ensemble of ML models for accurate energy prediction. The energy dataset was generated to validate the SDDF approach of reducing the need for extensive calculations. With its strict scaffold split, the dataset can be used for training and benchmarking energy models. By combining active learning, distributed computing, and quantum chemistry, SDDF offers a scalable, cost-effective solution for developing accurate molecular models and ultimately accelerating drug discovery.
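Choosing "the most challenging data points" with an ensemble is commonly done by query by committee: the candidates on which ensemble members disagree most are sent for reference (here DFT) calculations. The minimal sketch below assumes the standard deviation of predicted energies as the disagreement measure and a fixed batch size `k`. SDDF's actual selection criterion may differ, and `select_by_committee` is a hypothetical name.

```python
import numpy as np

def select_by_committee(predictions, k):
    """Pick the k candidates with the largest ensemble disagreement.

    predictions: array-like of shape (n_models, n_candidates) holding each
    ensemble member's predicted energy for every candidate conformation.
    Returns (indices of the selected candidates, per-candidate spread).
    """
    predictions = np.asarray(predictions, dtype=float)
    spread = predictions.std(axis=0)               # disagreement per candidate
    order = np.argsort(-spread, kind="stable")     # largest spread first
    return order[:k], spread

if __name__ == "__main__":
    # Toy predictions from a three-member ensemble for four candidates.
    preds = [[0.0, 1.0, 2.0, 3.0],
             [0.0, 1.5, 2.0, 5.0],
             [0.0, 0.5, 2.0, 1.0]]
    chosen, spread = select_by_committee(preds, k=2)
    print("send to DFT:", chosen.tolist(), "spread:", spread.round(3).tolist())
```

The ensemble mean can serve as the prediction itself. The spread is only a proxy for the true error, so selected points are still verified with the reference method before being added to the dataset.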
9
Airas J, Zhang B. Scaling Graph Neural Networks to Large Proteins. J Chem Theory Comput 2025; 21:2055-2066. [PMID: 39913331 PMCID: PMC11904306 DOI: 10.1021/acs.jctc.4c01420] [Indexed: 02/26/2025]
Abstract
Graph neural network (GNN) architectures have emerged as promising force field models, exhibiting high accuracy in predicting complex energies and forces based on atomic identities and Cartesian coordinates. To expand the applicability of GNNs, and machine learning force fields more broadly, optimizing their computational efficiency is critical, especially for large biomolecular systems in classical molecular dynamics simulations. In this study, we address key challenges in existing GNN benchmarks by introducing a dataset, DISPEF, which comprises large, biologically relevant proteins. DISPEF includes 207,454 proteins with sizes up to 12,499 atoms and features diverse chemical environments, spanning folded and disordered regions. The implicit solvation free energies, used as training targets, represent a particularly challenging case due to their many-body nature, providing a stringent test for evaluating the expressiveness of machine learning models. We benchmark the performance of seven GNNs on DISPEF, emphasizing the importance of directly accounting for long-range interactions to enhance model transferability. Additionally, we present a novel multiscale architecture, termed Schake, which delivers transferable and computationally efficient energy and force predictions for large proteins. Our findings offer valuable insights and tools for advancing GNNs in protein modeling applications.
Affiliation(s)
- Bin Zhang
- Department of Chemistry, Massachusetts Institute of Technology, Cambridge, MA, USA
10
Poltavsky I, Puleva M, Charkin-Gorbulin A, Fonseca G, Batatia I, Browning NJ, Chmiela S, Cui M, Frank JT, Heinen S, Huang B, Käser S, Kabylda A, Khan D, Müller C, Price AJA, Riedmiller K, Töpfer K, Ko TW, Meuwly M, Rupp M, Csányi G, Anatole von Lilienfeld O, Margraf JT, Müller KR, Tkatchenko A. Crash testing machine learning force fields for molecules, materials, and interfaces: molecular dynamics in the TEA challenge 2023. Chem Sci 2025; 16:3738-3754. [PMID: 39911337 PMCID: PMC11791520 DOI: 10.1039/d4sc06530a] [Received: 09/26/2024] [Accepted: 12/25/2024] [Indexed: 02/07/2025]
Abstract
We present the second part of the rigorous evaluation of modern machine learning force fields (MLFFs) within the TEA Challenge 2023. This study provides an in-depth analysis of the performance of MACE, SO3krates, sGDML, SOAP/GAP, and FCHL19* in modeling molecules, molecule-surface interfaces, and periodic materials. We compare observables obtained from molecular dynamics (MD) simulations using different MLFFs under identical conditions. Where applicable, density-functional theory (DFT) or experiment serves as a reference to reliably assess the performance of the ML models. In the absence of DFT benchmarks, we conduct a comparative analysis based on results from various MLFF architectures. Our findings indicate that, at the current stage of MLFF development, the choice of ML model is in the hands of the practitioner. When a problem falls within the scope of a given MLFF architecture, the resulting simulations exhibit weak dependency on the specific architecture used. Instead, emphasis should be placed on developing complete, reliable, and representative training datasets. Nonetheless, long-range noncovalent interactions remain challenging for all MLFF models, necessitating special caution in simulations of physical systems where such interactions are prominent, such as molecule-surface interfaces. The findings presented here reflect the state of MLFF models as of October 2023.
Affiliation(s)
- Igor Poltavsky
- Department of Physics and Materials Science, University of Luxembourg L-1511 Luxembourg Luxembourg
- Mirela Puleva
- Department of Physics and Materials Science, University of Luxembourg L-1511 Luxembourg Luxembourg
- Institute for Advanced Studies, University of Luxembourg Campus Belval L-4365 Esch-sur-Alzette Luxembourg
- Anton Charkin-Gorbulin
- Department of Physics and Materials Science, University of Luxembourg L-1511 Luxembourg Luxembourg
- Laboratory for Chemistry of Novel Materials, University of Mons B-7000 Mons Belgium
- Grégory Fonseca
- Department of Physics and Materials Science, University of Luxembourg L-1511 Luxembourg Luxembourg
- Ilyes Batatia
- Department of Engineering, University of Cambridge Trumpington Street Cambridge CB2 1PZ UK
- Stefan Chmiela
- Machine Learning Group, Technical University Berlin Berlin Germany
- BIFOLD, Berlin Institute for the Foundations of Learning and Data Berlin Germany
- Mengnan Cui
- Fritz-Haber-Institut der Max-Planck-Gesellschaft Berlin Germany
- J Thorben Frank
- Machine Learning Group, Technical University Berlin Berlin Germany
- BIFOLD, Berlin Institute for the Foundations of Learning and Data Berlin Germany
- Stefan Heinen
- Vector Institute for Artificial Intelligence Toronto ON M5S 1M1 Canada
- Bing Huang
- Wuhan University, Department of Chemistry and Molecular Sciences 430072 Wuhan China
- Silvan Käser
- Department of Chemistry, University of Basel Klingelbergstrasse 80 CH-4056 Basel Switzerland
- Adil Kabylda
- Department of Physics and Materials Science, University of Luxembourg L-1511 Luxembourg Luxembourg
- Danish Khan
- Vector Institute for Artificial Intelligence Toronto ON M5S 1M1 Canada
- Chemical Physics Theory Group, Department of Chemistry, University of Toronto St. George Campus Toronto ON Canada
- Carolin Müller
- Friedrich-Alexander-Universität Erlangen-Nürnberg, Computer-Chemistry-Center Nägelsbachstraße 25 91052 Erlangen Germany
- Alastair J A Price
- Department of Chemistry, University of Toronto St. George campus Toronto ON Canada
- Acceleration Consortium, University of Toronto 80 St George St Toronto ON M5S 3H6 Canada
- Kai Riedmiller
- Heidelberg Institute for Theoretical Studies Heidelberg Germany
- Kai Töpfer
- Department of Chemistry, University of Basel Klingelbergstrasse 80 CH-4056 Basel Switzerland
- Tsz Wai Ko
- Department of NanoEngineering, University of California San Diego 9500 Gilman Dr, Mail Code 0448 La Jolla CA 92093-0448 USA
- Markus Meuwly
- Department of Chemistry, University of Basel Klingelbergstrasse 80 CH-4056 Basel Switzerland
- Matthias Rupp
- Luxembourg Institute of Science and Technology (LIST) L-4362 Esch-sur-Alzette Luxembourg
- Gábor Csányi
- Department of Engineering, University of Cambridge Trumpington Street Cambridge CB2 1PZ UK
- O Anatole von Lilienfeld
- Machine Learning Group, Technical University Berlin Berlin Germany
- BIFOLD, Berlin Institute for the Foundations of Learning and Data Berlin Germany
- Vector Institute for Artificial Intelligence Toronto ON M5S 1M1 Canada
- Department of Chemistry, University of Toronto St. George campus Toronto ON Canada
- Acceleration Consortium, University of Toronto 80 St George St Toronto ON M5S 3H6 Canada
- Department of Materials Science and Engineering, University of Toronto St. George campus Toronto ON Canada
- Department of Physics, University of Toronto, St. George campus Toronto ON Canada
- Johannes T Margraf
- University of Bayreuth, Bavarian Center for Battery Technology (BayBatt) Bayreuth Germany
- Klaus-Robert Müller
- Machine Learning Group, Technical University Berlin Berlin Germany
- BIFOLD, Berlin Institute for the Foundations of Learning and Data Berlin Germany
- University of Bayreuth, Bavarian Center for Battery Technology (BayBatt) Bayreuth Germany
- Department of Artificial Intelligence, Korea University Seoul South Korea
- Max Planck Institut für Informatik Saarbrücken Germany
- Google DeepMind Berlin Germany
- Alexandre Tkatchenko
- Department of Physics and Materials Science, University of Luxembourg L-1511 Luxembourg Luxembourg
- Institute for Advanced Studies, University of Luxembourg Campus Belval L-4365 Esch-sur-Alzette Luxembourg
11
Poltavsky I, Charkin-Gorbulin A, Puleva M, Fonseca G, Batatia I, Browning NJ, Chmiela S, Cui M, Frank JT, Heinen S, Huang B, Käser S, Kabylda A, Khan D, Müller C, Price AJA, Riedmiller K, Töpfer K, Ko TW, Meuwly M, Rupp M, Csányi G, von Lilienfeld OA, Margraf JT, Müller KR, Tkatchenko A. Crash testing machine learning force fields for molecules, materials, and interfaces: model analysis in the TEA Challenge 2023. Chem Sci 2025; 16:3720-3737. [PMID: 39935506 PMCID: PMC11809572 DOI: 10.1039/d4sc06529h] [Received: 09/26/2024] [Accepted: 12/25/2024] [Indexed: 02/13/2025]
Abstract
Atomistic simulations are routinely employed in academia and industry to study the behavior of molecules, materials, and their interfaces. Central to these simulations are force fields (FFs), whose development is challenged by intricate interatomic interactions at different spatio-temporal scales and the vast expanse of chemical space. Machine learning (ML) FFs, trained on quantum-mechanical energies and forces, have shown the capacity to achieve sub-kcal mol⁻¹ Å⁻¹ accuracy while maintaining computational efficiency. The TEA Challenge 2023 rigorously evaluated commonly used MLFFs across diverse applications, highlighting their strengths and weaknesses. Participants trained their models using provided datasets, and the results were systematically analyzed to assess the ability of MLFFs to reproduce potential energy surfaces, handle incomplete reference data, manage multi-component systems, and model complex periodic structures. This publication describes the datasets, outlines the proposed challenges, and presents a detailed analysis of the accuracy, stability, and efficiency of the MACE, SO3krates, sGDML, SOAP/GAP, and FCHL19* architectures in molecular dynamics simulations. The models are those of the MLFF developers who participated in the TEA Challenge 2023. All results presented correspond to the state of the ML architectures as of October 2023. A comprehensive analysis of the molecular dynamics results obtained with different MLFFs will be presented in the second part of this manuscript.
Affiliation(s)
- Igor Poltavsky
  - Department of Physics and Materials Science, University of Luxembourg, L-1511 Luxembourg, Luxembourg
- Anton Charkin-Gorbulin
  - Department of Physics and Materials Science, University of Luxembourg, L-1511 Luxembourg, Luxembourg
  - Laboratory for Chemistry of Novel Materials, University of Mons, B-7000 Mons, Belgium
- Mirela Puleva
  - Department of Physics and Materials Science, University of Luxembourg, L-1511 Luxembourg, Luxembourg
  - Institute for Advanced Studies, University of Luxembourg, Campus Belval, L-4365 Esch-sur-Alzette, Luxembourg
- Grégory Fonseca
  - Department of Physics and Materials Science, University of Luxembourg, L-1511 Luxembourg, Luxembourg
- Ilyes Batatia
  - Department of Engineering, University of Cambridge, Trumpington Street, Cambridge CB2 1PZ, UK
- Stefan Chmiela
  - Machine Learning Group, Technical University Berlin, Berlin, Germany
  - BIFOLD, Berlin Institute for the Foundations of Learning and Data, Berlin, Germany
- Mengnan Cui
  - Fritz-Haber-Institut der Max-Planck-Gesellschaft, Berlin, Germany
- J Thorben Frank
  - Machine Learning Group, Technical University Berlin, Berlin, Germany
  - BIFOLD, Berlin Institute for the Foundations of Learning and Data, Berlin, Germany
- Stefan Heinen
  - Vector Institute for Artificial Intelligence, Toronto, ON M5S 1M1, Canada
- Bing Huang
  - Wuhan University, Department of Chemistry and Molecular Sciences, 430072 Wuhan, China
- Silvan Käser
  - Department of Chemistry, University of Basel, Klingelbergstrasse 80, CH-4056 Basel, Switzerland
- Adil Kabylda
  - Department of Physics and Materials Science, University of Luxembourg, L-1511 Luxembourg, Luxembourg
- Danish Khan
  - Vector Institute for Artificial Intelligence, Toronto, ON M5S 1M1, Canada
  - Chemical Physics Theory Group, Department of Chemistry, University of Toronto, St George Campus, Toronto, ON, Canada
- Carolin Müller
  - Friedrich-Alexander-Universität Erlangen-Nürnberg, Computer-Chemistry-Center, Nägelsbachstraße 25, 91052 Erlangen, Germany
- Alastair J A Price
  - Department of Chemistry, University of Toronto, St George Campus, Toronto, ON, Canada
  - Acceleration Consortium, University of Toronto, 80 St George St, Toronto, ON M5S 3H6, Canada
- Kai Riedmiller
  - Heidelberg Institute for Theoretical Studies, Heidelberg, Germany
- Kai Töpfer
  - Department of Chemistry, University of Basel, Klingelbergstrasse 80, CH-4056 Basel, Switzerland
- Tsz Wai Ko
  - Department of NanoEngineering, University of California San Diego, 9500 Gilman Dr, Mail Code 0448, La Jolla, CA 92093-0448, USA
- Markus Meuwly
  - Department of Chemistry, University of Basel, Klingelbergstrasse 80, CH-4056 Basel, Switzerland
- Matthias Rupp
  - Luxembourg Institute of Science and Technology (LIST), L-4362 Esch-sur-Alzette, Luxembourg
- Gábor Csányi
  - Department of Engineering, University of Cambridge, Trumpington Street, Cambridge CB2 1PZ, UK
- O Anatole von Lilienfeld
  - Machine Learning Group, Technical University Berlin, Berlin, Germany
  - BIFOLD, Berlin Institute for the Foundations of Learning and Data, Berlin, Germany
  - Vector Institute for Artificial Intelligence, Toronto, ON M5S 1M1, Canada
  - Department of Chemistry, University of Toronto, St George Campus, Toronto, ON, Canada
  - Acceleration Consortium, University of Toronto, 80 St George St, Toronto, ON M5S 3H6, Canada
  - Department of Materials Science and Engineering, University of Toronto, St George Campus, Toronto, ON, Canada
  - Department of Physics, University of Toronto, St George Campus, Toronto, ON, Canada
- Johannes T Margraf
  - University of Bayreuth, Bavarian Center for Battery Technology (BayBatt), Bayreuth, Germany
- Klaus-Robert Müller
  - Machine Learning Group, Technical University Berlin, Berlin, Germany
  - BIFOLD, Berlin Institute for the Foundations of Learning and Data, Berlin, Germany
  - Department of Artificial Intelligence, Korea University, Seoul, South Korea
  - Max Planck Institut für Informatik, Saarbrücken, Germany
  - Google DeepMind, Berlin, Germany
- Alexandre Tkatchenko
  - Department of Physics and Materials Science, University of Luxembourg, L-1511 Luxembourg, Luxembourg
  - Institute for Advanced Studies, University of Luxembourg, Campus Belval, L-4365 Esch-sur-Alzette, Luxembourg
12
Chen J, Gao Q, Huang M, Yu K. Application of modern artificial intelligence techniques in the development of organic molecular force fields. Phys Chem Chem Phys 2025; 27:2294-2319. [PMID: 39820957 DOI: 10.1039/d4cp02989e] [Indexed: 01/19/2025]
Abstract
The molecular force field (FF) determines the accuracy of molecular dynamics (MD) and is one of the major bottlenecks that limit the application of MD in molecular design. Recently, artificial intelligence (AI) techniques, such as machine-learning potentials (MLPs), have been rapidly reshaping the landscape of MD. Meanwhile, organic molecular systems feature unique characteristics and require more careful treatment in model construction, optimization, and validation. While an accurate and generic organic molecular force field is still missing, significant progress has been made with the help of AI, pointing to a promising future. In this review, we provide an overview of the various types of AI techniques used in molecular FF development and discuss both the advantages and weaknesses of these methodologies. We show how AI methods provide unprecedented capabilities in many tasks such as potential fitting, atom typification, and automatic optimization. At the same time, more effort is needed to improve the transferability of the models, develop more comprehensive databases, and establish more standardized validation procedures. With these discussions, we hope to inspire more efforts to solve the existing problems, eventually leading to the birth of next-generation generic organic FFs.
Affiliation(s)
- Junmin Chen
  - Institute of Materials Research (IMR), Tsinghua Shenzhen International Graduate School, Tsinghua University, Shenzhen, 518055, China
  - Tsinghua-Berkeley Shenzhen Institute (TBSI), Tsinghua Shenzhen International Graduate School, Tsinghua University, Shenzhen, 518055, China
- Qian Gao
  - Institute of Materials Research (IMR), Tsinghua Shenzhen International Graduate School, Tsinghua University, Shenzhen, 518055, China
- Miaofei Huang
  - Institute of Materials Research (IMR), Tsinghua Shenzhen International Graduate School, Tsinghua University, Shenzhen, 518055, China
- Kuang Yu
  - Institute of Materials Research (IMR), Tsinghua Shenzhen International Graduate School, Tsinghua University, Shenzhen, 518055, China
  - Tsinghua-Berkeley Shenzhen Institute (TBSI), Tsinghua Shenzhen International Graduate School, Tsinghua University, Shenzhen, 518055, China
13
Esders M, Schnake T, Lederer J, Kabylda A, Montavon G, Tkatchenko A, Müller KR. Analyzing Atomic Interactions in Molecules as Learned by Neural Networks. J Chem Theory Comput 2025; 21:714-729. [PMID: 39792788 PMCID: PMC11780731 DOI: 10.1021/acs.jctc.4c01424] [Received: 10/22/2024] [Revised: 12/30/2024] [Accepted: 01/02/2025] [Indexed: 01/12/2025]
Abstract
While machine learning (ML) models have been able to achieve unprecedented accuracies across various prediction tasks in quantum chemistry, it is now apparent that accuracy on a test set alone does not guarantee robust chemical modeling, such as stable molecular dynamics (MD). To go beyond accuracy, we use explainable artificial intelligence (XAI) techniques to develop a general analysis framework for atomic interactions and apply it to the SchNet and PaiNN neural network models. We compare these interactions with a set of fundamental chemical principles to understand how well the models have learned the underlying physicochemical concepts from the data. We focus on the strength of the interactions for different atomic species and on how predictions for intensive and extensive quantum molecular properties are made, and we analyze the decay and many-body nature of the interactions with interatomic distance. Models that deviate too far from known physical principles produce unstable MD trajectories, even when they have very high energy and force prediction accuracy. We also suggest further improvements to the ML architectures to better account for the polynomial decay of atomic interactions.
Affiliation(s)
- Malte Esders
  - BIFOLD, Berlin Institute for the Foundations of Learning and Data, 10587 Berlin, Germany
  - Machine Learning Group, Berlin Institute of Technology, 10587 Berlin, Germany
- Thomas Schnake
  - BIFOLD, Berlin Institute for the Foundations of Learning and Data, 10587 Berlin, Germany
  - Machine Learning Group, Berlin Institute of Technology, 10587 Berlin, Germany
- Jonas Lederer
  - BIFOLD, Berlin Institute for the Foundations of Learning and Data, 10587 Berlin, Germany
  - Machine Learning Group, Berlin Institute of Technology, 10587 Berlin, Germany
- Adil Kabylda
  - Department of Physics and Materials Science, University of Luxembourg, L-1511 Luxembourg City, Luxembourg
- Grégoire Montavon
  - BIFOLD, Berlin Institute for the Foundations of Learning and Data, 10587 Berlin, Germany
  - Machine Learning Group, Berlin Institute of Technology, 10587 Berlin, Germany
  - Department of Mathematics and Computer Science, Free University of Berlin, 14195 Berlin, Germany
- Alexandre Tkatchenko
  - Department of Physics and Materials Science, University of Luxembourg, L-1511 Luxembourg City, Luxembourg
- Klaus-Robert Müller
  - BIFOLD, Berlin Institute for the Foundations of Learning and Data, 10587 Berlin, Germany
  - Machine Learning Group, Berlin Institute of Technology, 10587 Berlin, Germany
  - Google DeepMind, 10963 Berlin, Germany
  - Department of Artificial Intelligence, Korea University, 136-713 Seoul, Korea
  - Max Planck Institute for Informatics, 66123 Saarbrücken, Germany
14
David R, de la Puente M, Gomez A, Anton O, Stirnemann G, Laage D. ArcaNN: automated enhanced sampling generation of training sets for chemically reactive machine learning interatomic potentials. Digital Discovery 2025; 4:54-72. [PMID: 39553851 PMCID: PMC11563209 DOI: 10.1039/d4dd00209a] [Received: 06/30/2024] [Accepted: 10/21/2024] [Indexed: 11/19/2024]
Abstract
The emergence of artificial intelligence is profoundly impacting computational chemistry, particularly through machine-learning interatomic potentials (MLIPs). Unlike traditional potential energy surface representations, MLIPs overcome the conventional computational scaling limitations by offering an effective combination of accuracy and efficiency for calculating atomic energies and forces to be used in molecular simulations. These MLIPs have significantly enhanced molecular simulations across various applications, including large-scale simulations of materials, interfaces, chemical reactions, and beyond. Despite these advances, the construction of training datasets, a critical component for the accuracy of MLIPs, has not received proportional attention, especially in the context of chemical reactivity, which depends on rare barrier-crossing events that are not easily included in the datasets. Here we address this gap by introducing ArcaNN, a comprehensive framework designed for generating training datasets for reactive MLIPs. ArcaNN employs a concurrent learning approach combined with advanced sampling techniques to ensure an accurate representation of high-energy geometries. The framework integrates automated processes for iterative training, exploration, new configuration selection, and energy and force labeling, all while ensuring reproducibility and documentation. We demonstrate ArcaNN's capabilities through two paradigm reactions: a nucleophilic substitution and a Diels-Alder reaction. These examples showcase its effectiveness, the uniformly low error of the resulting MLIP everywhere along the chemical reaction coordinate, and its potential for broad applications in reactive molecular dynamics. Finally, we provide guidelines for assessing the quality of MLIPs in reactive systems.
Affiliation(s)
- Rolf David
  - PASTEUR, Département de Chimie, École Normale Supérieure, PSL University, Sorbonne Université, CNRS, 75005 Paris, France
- Miguel de la Puente
  - PASTEUR, Département de Chimie, École Normale Supérieure, PSL University, Sorbonne Université, CNRS, 75005 Paris, France
- Axel Gomez
  - PASTEUR, Département de Chimie, École Normale Supérieure, PSL University, Sorbonne Université, CNRS, 75005 Paris, France
- Olaia Anton
  - PASTEUR, Département de Chimie, École Normale Supérieure, PSL University, Sorbonne Université, CNRS, 75005 Paris, France
- Guillaume Stirnemann
  - PASTEUR, Département de Chimie, École Normale Supérieure, PSL University, Sorbonne Université, CNRS, 75005 Paris, France
- Damien Laage
  - PASTEUR, Département de Chimie, École Normale Supérieure, PSL University, Sorbonne Université, CNRS, 75005 Paris, France
15
Thiemann FL, O'Neill N, Kapil V, Michaelides A, Schran C. Introduction to machine learning potentials for atomistic simulations. J Phys Condens Matter 2024; 37:073002. [PMID: 39577092 DOI: 10.1088/1361-648x/ad9657] [Received: 07/16/2024] [Accepted: 11/22/2024] [Indexed: 11/24/2024]
Abstract
Machine learning potentials have revolutionised the field of atomistic simulations in recent years and are becoming a mainstay in the toolbox of computational scientists. This paper aims to provide an overview of, and introduction to, machine learning potentials and their practical application to scientific problems. We provide a systematic guide for developing machine learning potentials, reviewing chemical descriptors, regression models, data generation and validation approaches. We begin with an emphasis on the earlier generation of models, such as high-dimensional neural network potentials and Gaussian approximation potentials, to provide historical perspective and guide the reader towards the understanding of recent developments, which are discussed in detail thereafter. Furthermore, we refer to relevant expert reviews, open-source software, and practical examples, further lowering the barrier to exploring these methods. The paper ends with selected showcase examples, highlighting the capabilities of machine learning potentials and how they can be applied to push the boundaries in atomistic simulations.
Affiliation(s)
- Fabian L Thiemann
  - IBM Research Europe, Daresbury, Warrington WA4 4AD, United Kingdom
  - Cavendish Laboratory, Department of Physics, University of Cambridge, Cambridge CB3 0HE, United Kingdom
- Niamh O'Neill
  - Cavendish Laboratory, Department of Physics, University of Cambridge, Cambridge CB3 0HE, United Kingdom
  - Yusuf Hamied Department of Chemistry, University of Cambridge, Lensfield Road, Cambridge CB2 1EW, United Kingdom
  - Lennard-Jones Centre, University of Cambridge, Trinity Ln, Cambridge CB2 1TN, United Kingdom
- Venkat Kapil
  - Yusuf Hamied Department of Chemistry, University of Cambridge, Lensfield Road, Cambridge CB2 1EW, United Kingdom
  - Lennard-Jones Centre, University of Cambridge, Trinity Ln, Cambridge CB2 1TN, United Kingdom
  - Department of Physics and Astronomy, University College London, London, United Kingdom
  - Thomas Young Centre and London Centre for Nanotechnology, London, United Kingdom
- Angelos Michaelides
  - Yusuf Hamied Department of Chemistry, University of Cambridge, Lensfield Road, Cambridge CB2 1EW, United Kingdom
  - Lennard-Jones Centre, University of Cambridge, Trinity Ln, Cambridge CB2 1TN, United Kingdom
- Christoph Schran
  - Cavendish Laboratory, Department of Physics, University of Cambridge, Cambridge CB3 0HE, United Kingdom
  - Lennard-Jones Centre, University of Cambridge, Trinity Ln, Cambridge CB2 1TN, United Kingdom
16
Hou YF, Zhang Q, Dral PO. Surprising Dynamics Phenomena in the Diels-Alder Reaction of C60 Uncovered with AI. J Org Chem 2024; 89:15041-15047. [PMID: 39358911 DOI: 10.1021/acs.joc.4c01763] [Indexed: 10/04/2024]
Abstract
We performed an extensive artificial intelligence-accelerated quasi-classical molecular dynamics investigation of the time-resolved mechanism of the Diels-Alder reaction of fullerene C60 with 2,3-dimethyl-1,3-butadiene. In a substantial fraction (10%) of reactive trajectories, the larger C60 noncovalently attracts the 2,3-dimethyl-1,3-butadiene long before the barrier, so that the diene undergoes a series of complex motions, including roaming, somersaults, twisting, and twisting somersaults around the fullerene, until it aligns itself to pass over the barrier. These complicated processes could easily be missed in typical quantum chemical simulations with shorter and fewer trajectories. After the barrier is passed, the bonds take longer to form than in the simplest prototypical Diels-Alder reaction of ethene with 1,3-butadiene, despite high similarities in transition states and barrier widths evaluated with intrinsic reaction coordinate (IRC) calculations. C60 is mainly responsible for these differences, as its reaction with 1,3-butadiene is similar to the reaction with 2,3-dimethyl-1,3-butadiene, the only substantial difference being that the extra methyl groups double the probability of the prolonged alignment phase in dynamics. These additional calculations of C60 with 1,3-butadiene could be performed more easily via active learning by reusing the data generated for the other two reactions, showing the potential for larger-scale exploration of the effects of different substrates in the same types of reactions.
Affiliation(s)
- Yi-Fan Hou
  - State Key Laboratory of Physical Chemistry of Solid Surfaces, College of Chemistry and Chemical Engineering, Fujian Provincial Key Laboratory of Theoretical and Computational Chemistry, and Innovation Laboratory for Sciences and Technologies of Energy Materials of Fujian Province (IKKEM), Xiamen University, Xiamen, Fujian 361005, China
- Quanhao Zhang
  - State Key Laboratory of Physical Chemistry of Solid Surfaces, College of Chemistry and Chemical Engineering, Fujian Provincial Key Laboratory of Theoretical and Computational Chemistry, and Innovation Laboratory for Sciences and Technologies of Energy Materials of Fujian Province (IKKEM), Xiamen University, Xiamen, Fujian 361005, China
- Pavlo O Dral
  - State Key Laboratory of Physical Chemistry of Solid Surfaces, College of Chemistry and Chemical Engineering, Fujian Provincial Key Laboratory of Theoretical and Computational Chemistry, and Innovation Laboratory for Sciences and Technologies of Energy Materials of Fujian Province (IKKEM), Xiamen University, Xiamen, Fujian 361005, China
  - Institute of Physics, Faculty of Physics, Astronomy, and Informatics, Nicolaus Copernicus University in Toruń, ul. Grudziądzka 5, Toruń 87-100, Poland
17
Xu W, Xu H, Zhu M, Wen J. Ultrafast dynamics in spatially confined photoisomerization: accelerated simulations through machine learning models. Phys Chem Chem Phys 2024; 26:25994-26003. [PMID: 39370956 DOI: 10.1039/d4cp01497a] [Indexed: 10/08/2024]
Abstract
This study explores photoresponsive host-guest systems, highlighting the intricate interplay between confined spaces and photosensitive guest molecules. Conducting nonadiabatic molecular dynamics (NAMD) simulations based on electronic structure calculations for such large systems remains a formidable challenge. By leveraging machine learning (ML) as an accelerator for NAMD simulations, we analytically constructed excited-state potential energy surfaces along relevant collective variables to investigate photoisomerization processes efficiently. Combining the quantum mechanics/molecular mechanics (QM/MM) methodology with ML-based NAMD simulations, we elucidated the reaction pathways and identified the key degrees of freedom as reaction coordinates leading to conical intersections. We developed an ML-based nonadiabatic dynamics model to compare the excited-state dynamics of the guest molecule, benzopyran, in the gas phase and within the confined space of cucurbit[5]uril. This comparative analysis was designed to determine the influence of the environment on the photoisomerization rate of the guest molecule. The results underscore the effectiveness of ML models in simulating trajectory evolution at low cost. This research offers a practical approach to accelerating NAMD simulations of photochemical reactions in large-scale systems, with potential applications to other host-guest complexes.
Affiliation(s)
- Weijia Xu
  - State Key Laboratory for Modification of Chemical Fibers and Polymer Materials, College of Materials Science and Engineering, Donghua University, Shanghai 201620, China
- Haoyang Xu
  - State Key Laboratory for Modification of Chemical Fibers and Polymer Materials, College of Materials Science and Engineering, Donghua University, Shanghai 201620, China
- Meifang Zhu
  - State Key Laboratory for Modification of Chemical Fibers and Polymer Materials, College of Materials Science and Engineering, Donghua University, Shanghai 201620, China
- Jin Wen
  - State Key Laboratory for Modification of Chemical Fibers and Polymer Materials, College of Materials Science and Engineering, Donghua University, Shanghai 201620, China
18
Zaporozhets I, Musil F, Kapil V, Clementi C. Accurate nuclear quantum statistics on machine-learned classical effective potentials. J Chem Phys 2024; 161:134102. [PMID: 39352405 DOI: 10.1063/5.0226764] [Received: 07/03/2024] [Accepted: 09/13/2024] [Indexed: 10/03/2024]
Abstract
The contribution of nuclear quantum effects (NQEs) to the properties of various hydrogen-bonded systems, including biomolecules, is increasingly recognized. Despite the development of many acceleration techniques, the computational overhead of incorporating NQEs in complex systems is sizable, particularly at low temperatures. In this work, we leverage deep learning and multiscale coarse-graining techniques to mitigate the computational burden of path integral molecular dynamics (PIMD). In particular, we employ a machine-learned potential to accurately represent corrections to classical potentials, thereby significantly reducing the computational cost of simulating NQEs. We validate our approach using four distinct systems: Morse potential, Zundel cation, single water molecule, and bulk water. Our framework allows us to accurately compute position-dependent static properties, as demonstrated by the excellent agreement obtained between the machine-learned potential and computationally intensive PIMD calculations, even in the presence of strong NQEs. This approach opens the way to the development of transferable machine-learned potentials capable of accurately reproducing NQEs in a wide range of molecular systems.
Affiliation(s)
- Iryna Zaporozhets
  - Department of Physics, Freie Universität Berlin, Arnimallee 12, 14195 Berlin, Germany
  - Department of Chemistry, Rice University, Houston, Texas 77005, USA
  - Center for Theoretical Biological Physics, Rice University, Houston, Texas 77005, USA
- Félix Musil
  - Department of Physics, Freie Universität Berlin, Arnimallee 12, 14195 Berlin, Germany
- Venkat Kapil
  - Yusuf Hamied Department of Chemistry, University of Cambridge, Lensfield Road, Cambridge CB2 1EW, United Kingdom
  - Department of Physics and Astronomy, University College, London WC1E 6BT, United Kingdom
  - Thomas Young Centre and London Centre for Nanotechnology, London WC1E 6BT, United Kingdom
- Cecilia Clementi
  - Department of Physics, Freie Universität Berlin, Arnimallee 12, 14195 Berlin, Germany
  - Department of Chemistry, Rice University, Houston, Texas 77005, USA
  - Center for Theoretical Biological Physics, Rice University, Houston, Texas 77005, USA
19
Isamura BK, Popelier PLA. Transfer learning of hyperparameters for fast construction of anisotropic GPR models: design and application to the machine-learned force field FFLUX. Phys Chem Chem Phys 2024; 26:23677-23691. [PMID: 39224929 PMCID: PMC11369757 DOI: 10.1039/d4cp01862a] [Received: 05/03/2024] [Accepted: 08/22/2024] [Indexed: 09/04/2024]
Abstract
The polarisable machine-learned force field FFLUX requires pre-trained anisotropic Gaussian process regression (GPR) models of atomic energies and multipole moments to propagate unbiased molecular dynamics simulations. The outcome of FFLUX simulations is highly dependent on the predictive accuracy of the underlying models, whose training entails determining the optimal set of model hyperparameters. Unfortunately, traditional direct learning (DL) procedures do not scale well on this task, especially when the hyperparameter search is initiated from a (set of) random guess solution(s). Additionally, the complexity of the hyperparameter space (HS) increases with the number of geometrical input features, at least for anisotropic kernels, making the optimization of hyperparameters even more challenging. In this study, we propose a transfer learning (TL) protocol that accelerates the training process of anisotropic GPR models by facilitating access to promising regions of the HS. The protocol is based on a seeding-relaxation mechanism in which an excellent guess solution is identified by rapidly building one or several small source models over a subset of the target training set before readjusting the previous guess over the entire set. We demonstrate this protocol by building and assessing DL and TL models of atomic energies and charges in various conformations of benzene, ethanol, formic acid dimer and the drug fomepizole. Our experiments suggest that TL models can be built one order of magnitude faster while preserving the quality of their DL analogs. Most importantly, when deployed in FFLUX simulations, TL models compete with or even outperform their DL analogs when it comes to performing FFLUX geometry optimization and computing harmonic vibrational modes.
Affiliation(s)
- Bienfait K Isamura
  - Department of Chemistry, The University of Manchester, Manchester, M13 9PL, UK
- Paul L A Popelier
  - Department of Chemistry, The University of Manchester, Manchester, M13 9PL, UK
20
Hou YF, Zhang L, Zhang Q, Ge F, Dral PO. Physics-Informed Active Learning for Accelerating Quantum Chemical Simulations. J Chem Theory Comput 2024. [PMID: 39264419 DOI: 10.1021/acs.jctc.4c00821] [Indexed: 09/13/2024]
Abstract
Quantum chemical simulations can be greatly accelerated by constructing machine learning potentials, which is often done using active learning (AL). The usefulness of the constructed potentials is often limited by the high effort required and their insufficient robustness in the simulations. Here, we introduce an end-to-end AL protocol for constructing robust, data-efficient potentials with an affordable investment of time and resources and minimal human intervention. Our AL protocol is based on the physics-informed sampling of training points, automatic selection of initial data, uncertainty quantification, and convergence monitoring. The versatility of this protocol is shown in our implementation of quasi-classical molecular dynamics for simulating vibrational spectra, the conformer search of a key biochemical molecule, and the time-resolved mechanism of the Diels-Alder reaction. These investigations took us days instead of weeks of pure quantum chemical calculations on a high-performance computing cluster.
Affiliation(s)
- Yi-Fan Hou
  - State Key Laboratory of Physical Chemistry of Solid Surfaces, College of Chemistry and Chemical Engineering, Fujian Provincial Key Laboratory of Theoretical and Computational Chemistry, and Innovation Laboratory for Sciences and Technologies of Energy Materials of Fujian Province (IKKEM), Xiamen University, Xiamen, Fujian 361005, China
- Lina Zhang
  - State Key Laboratory of Physical Chemistry of Solid Surfaces, College of Chemistry and Chemical Engineering, Fujian Provincial Key Laboratory of Theoretical and Computational Chemistry, and Innovation Laboratory for Sciences and Technologies of Energy Materials of Fujian Province (IKKEM), Xiamen University, Xiamen, Fujian 361005, China
- Quanhao Zhang
  - State Key Laboratory of Physical Chemistry of Solid Surfaces, College of Chemistry and Chemical Engineering, Fujian Provincial Key Laboratory of Theoretical and Computational Chemistry, and Innovation Laboratory for Sciences and Technologies of Energy Materials of Fujian Province (IKKEM), Xiamen University, Xiamen, Fujian 361005, China
- Fuchun Ge
  - State Key Laboratory of Physical Chemistry of Solid Surfaces, College of Chemistry and Chemical Engineering, Fujian Provincial Key Laboratory of Theoretical and Computational Chemistry, and Innovation Laboratory for Sciences and Technologies of Energy Materials of Fujian Province (IKKEM), Xiamen University, Xiamen, Fujian 361005, China
- Pavlo O Dral
  - State Key Laboratory of Physical Chemistry of Solid Surfaces, College of Chemistry and Chemical Engineering, Fujian Provincial Key Laboratory of Theoretical and Computational Chemistry, and Innovation Laboratory for Sciences and Technologies of Energy Materials of Fujian Province (IKKEM), Xiamen University, Xiamen, Fujian 361005, China
  - Institute of Physics, Faculty of Physics, Astronomy, and Informatics, Nicolaus Copernicus University in Toruń, ul. Grudziądzka 5, Toruń 87-100, Poland
21
Tu NTP, Williamson S, Johnson ER, Rowley CN. Modeling Intermolecular Interactions with Exchange-Hole Dipole Moment Dispersion Corrections to Neural Network Potentials. J Phys Chem B 2024; 128:8290-8302. [PMID: 39166778 DOI: 10.1021/acs.jpcb.4c02882] [Indexed: 08/23/2024]
Abstract
Neural network potentials (NNPs) are an innovative approach for calculating the potential energy and forces of a chemical system. In principle, these methods are capable of modeling large systems with an accuracy approaching that of a high-level ab initio calculation, but at a much smaller computational cost. Because they are trained on density-functional theory (DFT) data and neglect long-range interactions, some classes of NNPs require an additional term to include London dispersion physics. In this Perspective, we discuss the requirements for a dispersion model for use with an NNP, focusing on the MLXDM (Machine Learned eXchange-Hole Dipole Moment) model developed by our groups. This model is based on the DFT-based XDM dispersion correction, which calculates interatomic dispersion coefficients in terms of atomic moments and polarizabilities, both of which can be approximated effectively using neural networks.
Affiliation(s)
- Siri Williamson
- Department of Chemistry, Carleton University, Ottawa, Ontario K1S 5B6, Canada
- Erin R Johnson
- Department of Chemistry, Dalhousie University, Halifax, Nova Scotia B3H 4J3, Canada
22
Jin Y, Perez-Lemus GR, Zubieta Rico PF, de Pablo JJ. Improving Machine Learned Force Fields for Complex Fluids through Enhanced Sampling: A Liquid Crystal Case Study. J Phys Chem A 2024; 128:7257-7268. [PMID: 39150905 DOI: 10.1021/acs.jpca.4c01546] [Indexed: 08/18/2024]
Abstract
Machine learned force fields offer the potential for faster execution times while retaining the accuracy of traditional DFT calculations, making them promising candidates for molecular simulations in cases where reliable classical force fields are not available. Some of the challenges associated with machine learned force fields include simulation stability over extended periods of time and ensuring that the statistical and dynamical properties of the underlying simulated systems are correctly captured. In this work, we propose a systematic training pipeline for such force fields that leads to improved model quality, compared to that achieved by traditional data generation and training approaches. That pipeline relies on the use of enhanced sampling techniques, and it is demonstrated here in the context of a liquid crystal, which exemplifies many of the challenges that are encountered in fluids and materials with complex free energy landscapes. Our results indicate that, whereas the majority of traditional machine learned force field training approaches lead to molecular dynamics simulations that are only stable over hundred-picosecond trajectories, our approach allows for stable simulations over tens of nanoseconds for organic molecular systems comprising thousands of atoms.
Affiliation(s)
- Yezhi Jin
- Pritzker School of Molecular Engineering, University of Chicago, Chicago, Illinois 60637-1476, United States
- Gustavo R Perez-Lemus
- Pritzker School of Molecular Engineering, University of Chicago, Chicago, Illinois 60637-1476, United States
- Pablo F Zubieta Rico
- Pritzker School of Molecular Engineering, University of Chicago, Chicago, Illinois 60637-1476, United States
- Juan J de Pablo
- Pritzker School of Molecular Engineering, University of Chicago, Chicago, Illinois 60637-1476, United States
23
Williams CD, Kalayan J, Burton NA, Bryce RA. Stable and accurate atomistic simulations of flexible molecules using conformationally generalisable machine learned potentials. Chem Sci 2024; 15:12780-12795. [PMID: 39148799 PMCID: PMC11323334 DOI: 10.1039/d4sc01109k] [Received: 02/16/2024] [Accepted: 07/07/2024] [Indexed: 08/17/2024]
Abstract
Computational simulation methods based on machine learned potentials (MLPs) promise to revolutionise shape prediction of flexible molecules in solution, but their widespread adoption has been limited by the way in which training data is generated. Here, we present an approach which allows the key conformational degrees of freedom to be properly represented in reference molecular datasets. MLPs trained on these datasets using a global descriptor scheme are generalisable in conformational space, providing quantum chemical accuracy for all conformers. These MLPs are capable of propagating long, stable molecular dynamics trajectories, an attribute that has remained a challenge. We deploy the MLPs in obtaining converged conformational free energy surfaces for flexible molecules via well-tempered metadynamics simulations; this approach provides a hitherto inaccessible route to accurately computing the structural, dynamical and thermodynamical properties of a wide variety of flexible molecular systems. It is further demonstrated that MLPs must be trained on reference datasets with complete coverage of conformational space, including in barrier regions, to achieve stable molecular dynamics trajectories.
Affiliation(s)
- Christopher D Williams
- Division of Pharmacy and Optometry, School of Health Sciences, Faculty of Biology, Medicine and Health, The University of Manchester Oxford Road Manchester M13 9PL UK
- Jas Kalayan
- Science and Technologies Facilities Council (STFC), Daresbury Laboratory Keckwick Lane, Daresbury Warrington WA4 4AD UK
- Neil A Burton
- Department of Chemistry, School of Natural Sciences, Faculty of Science and Engineering, The University of Manchester Oxford Road Manchester M13 9PL UK
- Richard A Bryce
- Division of Pharmacy and Optometry, School of Health Sciences, Faculty of Biology, Medicine and Health, The University of Manchester Oxford Road Manchester M13 9PL UK
24
Bone RA, Chung MKJ, Ponder JW, Riccardi D, Muzny C, Sundararaman R, Schwarz K. A new method to calculate broadband dielectric spectra of solvents from molecular dynamics simulations demonstrated with polarizable force fields. J Chem Phys 2024; 161:064306. [PMID: 39132799 PMCID: PMC11324330 DOI: 10.1063/5.0217883] [Received: 05/07/2024] [Accepted: 07/26/2024] [Indexed: 08/13/2024]
Abstract
Simulating the dielectric spectra of solvents requires the nuanced definition of inter- and intra-molecular forces. Non-polarizable force fields, while thoroughly benchmarked for dielectric applications, do not capture all the spectral features of solvents, such as water. Conversely, polarizable force fields have been largely untested in the context of dielectric spectroscopy but include charge and dipole fluctuations that contribute to intermolecular interactions. We benchmark non-polarizable force fields and the polarizable force fields AMOEBA03 and HIPPO for liquid water and find that the polarizable force fields can capture all the experimentally observed spectral features with varying degrees of accuracy. However, the non-polarizable force fields miss at least one peak. To diagnose this deficiency, we decompose the liquid water spectra from polarizable force fields at multiple temperatures into static and induced dipole contributions and find that the peak originates from induced dipole contributions. Broadening our inquiry to other solvents parameterized with the AMOEBA09 force field, we demonstrate good agreement between the experimental and simulated dielectric spectra of methanol and formamide. To produce these spectra, we develop a new computational approach to calculate the dielectric spectrum via the fluctuation dissipation theorem. This method minimizes the error in both the low and high frequency portions of the spectrum, improving the overall accuracy of the simulated spectrum and broadening the computed frequency range.
Affiliation(s)
- Moses K. J. Chung
- Department of Chemistry, Washington University in St. Louis, St. Louis, Missouri 63130, USA
- Jay W. Ponder
- Department of Chemistry, Washington University in St. Louis, St. Louis, Missouri 63130, USA
- Demian Riccardi
- Material Measurement Laboratory, National Institute of Standards and Technology, 325 Broadway, Boulder, Colorado 80305, USA
- Chris Muzny
- Material Measurement Laboratory, National Institute of Standards and Technology, 325 Broadway, Boulder, Colorado 80305, USA
- Ravishankar Sundararaman
- Department of Materials Science and Engineering, Rensselaer Polytechnic Institute, 110 8th St., Troy, New York 12180, USA
- Kathleen Schwarz
- Material Measurement Laboratory, National Institute of Standards and Technology, 100 Bureau Dr., Gaithersburg, Maryland 20899, USA
25
Litman Y, Kapil V, Feldman YMY, Tisi D, Begušić T, Fidanyan K, Fraux G, Higer J, Kellner M, Li TE, Pós ES, Stocco E, Trenins G, Hirshberg B, Rossi M, Ceriotti M. i-PI 3.0: A flexible and efficient framework for advanced atomistic simulations. J Chem Phys 2024; 161:062504. [PMID: 39140447 DOI: 10.1063/5.0215869] [Received: 04/26/2024] [Accepted: 07/11/2024] [Indexed: 08/15/2024]
Abstract
Atomic-scale simulations have progressed tremendously over the past decade, largely thanks to the availability of machine-learning interatomic potentials. These potentials combine the accuracy of electronic structure calculations with the ability to reach extensive length and time scales. The i-PI package facilitates integrating the latest developments in this field with advanced modeling techniques thanks to a modular software architecture based on inter-process communication through a socket interface. The choice of Python for implementation facilitates rapid prototyping but can add computational overhead. In this new release, we carefully benchmarked and optimized i-PI for several common simulation scenarios, making such overhead negligible when i-PI is used to model systems up to tens of thousands of atoms using widely adopted machine learning interatomic potentials, such as Behler-Parrinello, DeePMD, and MACE neural networks. We also present the implementation of several new features, including an efficient algorithm to model bosonic and fermionic exchange, a framework for uncertainty quantification to be used in conjunction with machine-learning potentials, a communication infrastructure that allows for deeper integration with electronic-driven simulations, and an approach to simulate coupled photon-nuclear dynamics in optical or plasmonic cavities.
Affiliation(s)
- Yair Litman
- Y. Hamied Department of Chemistry, University of Cambridge, Lensfield Road, Cambridge CB2 1EW, United Kingdom
- Venkat Kapil
- Y. Hamied Department of Chemistry, University of Cambridge, Lensfield Road, Cambridge CB2 1EW, United Kingdom
- Department of Physics and Astronomy, University College London, 17-19 Gordon St, London WC1H 0AH, United Kingdom
- Thomas Young Centre and London Centre for Nanotechnology, 19 Gordon St, London WC1H 0AH, United Kingdom
- Davide Tisi
- Laboratory of Computational Science and Modeling, Institut des Matériaux, École Polytechnique Fédérale de Lausanne, 1015 Lausanne, Switzerland
- Tomislav Begušić
- Div. of Chemistry and Chemical Engineering, California Institute of Technology, Pasadena, California 91125, USA
- Karen Fidanyan
- MPI for the Structure and Dynamics of Matter, Hamburg, Germany
- Guillaume Fraux
- Laboratory of Computational Science and Modeling, Institut des Matériaux, École Polytechnique Fédérale de Lausanne, 1015 Lausanne, Switzerland
- Jacob Higer
- School of Physics, Tel Aviv University, Tel Aviv 6997801, Israel
- Matthias Kellner
- Laboratory of Computational Science and Modeling, Institut des Matériaux, École Polytechnique Fédérale de Lausanne, 1015 Lausanne, Switzerland
- Tao E Li
- Department of Physics and Astronomy, University of Delaware, Newark, Delaware 19716, USA
- Eszter S Pós
- MPI for the Structure and Dynamics of Matter, Hamburg, Germany
- Elia Stocco
- MPI for the Structure and Dynamics of Matter, Hamburg, Germany
- George Trenins
- MPI for the Structure and Dynamics of Matter, Hamburg, Germany
- Barak Hirshberg
- School of Chemistry, Tel Aviv University, Tel Aviv 6997801, Israel
- Mariana Rossi
- MPI for the Structure and Dynamics of Matter, Hamburg, Germany
- Michele Ceriotti
- Laboratory of Computational Science and Modeling, Institut des Matériaux, École Polytechnique Fédérale de Lausanne, 1015 Lausanne, Switzerland
26
Frank JT, Unke OT, Müller KR, Chmiela S. A Euclidean transformer for fast and stable machine learned force fields. Nat Commun 2024; 15:6539. [PMID: 39107296 PMCID: PMC11303804 DOI: 10.1038/s41467-024-50620-6] [Received: 08/25/2023] [Accepted: 07/10/2024] [Indexed: 08/10/2024]
Abstract
Recent years have seen vast progress in the development of machine learned force fields (MLFFs) based on ab-initio reference calculations. Despite achieving low test errors, the reliability of MLFFs in molecular dynamics (MD) simulations is facing growing scrutiny due to concerns about instability over extended simulation timescales. Our findings suggest a potential connection between robustness to cumulative inaccuracies and the use of equivariant representations in MLFFs, but the computational cost associated with these representations can limit this advantage in practice. To address this, we propose a transformer architecture called SO3KRATES that combines sparse equivariant representations (Euclidean variables) with a self-attention mechanism that separates invariant and equivariant information, eliminating the need for expensive tensor products. SO3KRATES achieves a unique combination of accuracy, stability, and speed that enables insightful analysis of quantum properties of matter on extended time and system size scales. To showcase this capability, we generate stable MD trajectories for flexible peptides and supramolecular structures with hundreds of atoms. Furthermore, we investigate the potential energy surface (PES) topology for medium-sized chainlike molecules (e.g., small peptides) by exploring thousands of minima. Remarkably, SO3KRATES demonstrates the ability to strike a balance between the conflicting demands of stability and the emergence of new minimum-energy conformations beyond the training data, which is crucial for realistic exploration tasks in the field of biochemistry.
Affiliation(s)
- J Thorben Frank
- Machine Learning Group, TU Berlin, Berlin, Germany
- BIFOLD, Berlin Institute for the Foundations of Learning and Data, Berlin, Germany
- Klaus-Robert Müller
- Machine Learning Group, TU Berlin, Berlin, Germany.
- BIFOLD, Berlin Institute for the Foundations of Learning and Data, Berlin, Germany.
- Google DeepMind, Berlin, Germany.
- Department of Artificial Intelligence, Korea University, Seoul, Korea.
- Max Planck Institut für Informatik, Saarbrücken, Germany.
- Stefan Chmiela
- Machine Learning Group, TU Berlin, Berlin, Germany.
- BIFOLD, Berlin Institute for the Foundations of Learning and Data, Berlin, Germany.
27
Margraf JT. Neural graph distance embedding for molecular geometry generation. J Comput Chem 2024; 45:1784-1790. [PMID: 38655845 DOI: 10.1002/jcc.27349] [Received: 11/30/2023] [Revised: 03/05/2024] [Accepted: 03/08/2024] [Indexed: 04/26/2024]
Abstract
This article introduces neural graph distance embedding (nGDE), a method for generating 3D molecular geometries. Leveraging a graph neural network trained on the OE62 dataset of molecular geometries, nGDE predicts interatomic distances based on molecular graphs. These distances are then used in multidimensional scaling to produce 3D geometries, which are subsequently refined with standard bioorganic force fields. The machine learning-based graph distance introduced herein is found to be an improvement over the conventional shortest path distances used in graph drawing. Comparative analysis with a state-of-the-art distance geometry method demonstrates nGDE's competitive performance, particularly its robustness in handling polycyclic molecules, which are a challenge for existing methods.
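The step that turns a matrix of interatomic distances into 3D coordinates via multidimensional scaling can be sketched with classical (Torgerson) MDS. This is a generic illustration under stated assumptions: the paper may use a different MDS variant, the graph neural network that predicts the distances is not shown, and `pairwise_distances` and `classical_mds` are hypothetical helper names. A distance matrix computed from random coordinates stands in for nGDE's predicted distances.

```python
import numpy as np


def pairwise_distances(X: np.ndarray) -> np.ndarray:
    """Euclidean distance matrix for an (n, d) array of coordinates."""
    diff = X[:, None, :] - X[None, :, :]
    return np.linalg.norm(diff, axis=-1)


def classical_mds(D: np.ndarray, dim: int = 3) -> np.ndarray:
    """Embed points in `dim` dimensions from a symmetric distance matrix D.

    Coordinates are recovered only up to translation, rotation and reflection.
    """
    D = np.asarray(D, dtype=float)
    n = D.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n          # centering matrix
    B = -0.5 * J @ (D ** 2) @ J                  # double-centered Gram matrix
    eigvals, eigvecs = np.linalg.eigh(B)         # eigenvalues in ascending order
    top = np.argsort(eigvals)[::-1][:dim]        # indices of the `dim` largest
    scale = np.sqrt(np.clip(eigvals[top], 0.0, None))  # clip small negatives from noise
    return eigvecs[:, top] * scale


rng = np.random.default_rng(0)
X = rng.normal(size=(8, 3))          # stand-in for a molecule's atom positions
D = pairwise_distances(X)            # stand-in for predicted distances
Y = classical_mds(D, dim=3)
print(np.allclose(pairwise_distances(Y), D))  # True for exact Euclidean distances
```

With noisy predicted distances, B acquires negative eigenvalues and the embedding is only approximate, which fits the abstract's use of a subsequent force-field refinement. Because MDS cannot distinguish mirror images, stereochemistry would need separate handling.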
Affiliation(s)
- Johannes T Margraf
- Bavarian Center for Battery Technology (BayBatt), University of Bayreuth, Bayreuth, Germany
28
Biriukov D, Vácha R. Pathways to a Shiny Future: Building the Foundation for Computational Physical Chemistry and Biophysics in 2050. ACS Phys Chem Au 2024; 4:302-313. [PMID: 39069976 PMCID: PMC11274290 DOI: 10.1021/acsphyschemau.4c00003] [Received: 01/07/2024] [Revised: 03/15/2024] [Accepted: 03/18/2024] [Indexed: 07/30/2024]
Abstract
In the last quarter-century, the field of molecular dynamics (MD) has undergone a remarkable transformation, propelled by substantial enhancements in software, hardware, and underlying methodologies. In this Perspective, we contemplate the future trajectory of MD simulations and what they might look like in the year 2050. We spotlight the pivotal role of artificial intelligence (AI) in shaping the future of MD and the broader field of computational physical chemistry. We outline critical strategies and initiatives that are essential for the seamless integration of such technologies. Our discussion delves into topics such as multiscale modeling, adept management of the ever-increasing data deluge, the establishment of centralized simulation databases, and the autonomous refinement, cross-validation, and self-expansion of these repositories. The successful implementation of these advancements requires scientific transparency, a cautiously optimistic approach to interpreting AI-driven simulations and their analysis, and a mindset that prioritizes knowledge-motivated research alongside AI-enhanced big data exploration. While history reminds us that the trajectory of technological progress can be unpredictable, this Perspective offers guidance on preparedness and proactive measures, aiming to steer future advancements in the most beneficial and successful direction.
Affiliation(s)
- Denys Biriukov
- CEITEC – Central European Institute of Technology, Masaryk University, Kamenice 753/5, 625 00 Brno, Czech Republic
- National Centre for Biomolecular Research, Faculty of Science, Masaryk University, Kamenice 753/5, 625 00 Brno, Czech Republic
- Robert Vácha
- CEITEC – Central European Institute of Technology, Masaryk University, Kamenice 753/5, 625 00 Brno, Czech Republic
- National Centre for Biomolecular Research, Faculty of Science, Masaryk University, Kamenice 753/5, 625 00 Brno, Czech Republic
- Department of Condensed Matter Physics, Faculty of Science, Masaryk University, Kotlářská 267/2, 611 37 Brno, Czech Republic
29
Slootman E, Poltavsky I, Shinde R, Cocomello J, Moroni S, Tkatchenko A, Filippi C. Accurate Quantum Monte Carlo Forces for Machine-Learned Force Fields: Ethanol as a Benchmark. J Chem Theory Comput 2024; 20:6020-6027. [PMID: 39003522 PMCID: PMC11270822 DOI: 10.1021/acs.jctc.4c00498] [Received: 04/15/2024] [Revised: 05/31/2024] [Accepted: 06/03/2024] [Indexed: 07/15/2024]
Abstract
Quantum Monte Carlo (QMC) is a powerful method to calculate accurate energies and forces for molecular systems. In this work, we demonstrate how we can obtain accurate QMC forces for the fluxional ethanol molecule at room temperature by using either multideterminant Jastrow-Slater wave functions in variational Monte Carlo or just a single determinant in diffusion Monte Carlo. The excellent performance of our protocols is assessed against high-level coupled cluster calculations on a diverse set of representative configurations of the system. Finally, we train machine-learning force fields on the QMC forces and compare them to models trained on coupled cluster reference data, showing that a force field based on the diffusion Monte Carlo forces with a single determinant can faithfully reproduce coupled cluster power spectra in molecular dynamics simulations.
Affiliation(s)
- E. Slootman
- MESA+ Institute for Nanotechnology, University of Twente, P.O. Box 217, 7500 AE Enschede, The Netherlands
- I. Poltavsky
- Department of Physics and Materials Science, University of Luxembourg, L-1511 Luxembourg City, Luxembourg
- R. Shinde
- MESA+ Institute for Nanotechnology, University of Twente, P.O. Box 217, 7500 AE Enschede, The Netherlands
- J. Cocomello
- MESA+ Institute for Nanotechnology, University of Twente, P.O. Box 217, 7500 AE Enschede, The Netherlands
- S. Moroni
- CNR-IOM DEMOCRITOS, Istituto Officina dei Materiali, and SISSA Scuola Internazionale Superiore di Studi Avanzati, Via Bonomea 265, I-34136 Trieste, Italy
- A. Tkatchenko
- Department of Physics and Materials Science, University of Luxembourg, L-1511 Luxembourg City, Luxembourg
- C. Filippi
- MESA+ Institute for Nanotechnology, University of Twente, P.O. Box 217, 7500 AE Enschede, The Netherlands
30
Medrano Sandonas L, Van Rompaey D, Fallani A, Hilfiker M, Hahn D, Perez-Benito L, Verhoeven J, Tresadern G, Kurt Wegner J, Ceulemans H, Tkatchenko A. Dataset for quantum-mechanical exploration of conformers and solvent effects in large drug-like molecules. Sci Data 2024; 11:742. [PMID: 38972891 PMCID: PMC11228031 DOI: 10.1038/s41597-024-03521-8] [Received: 03/18/2024] [Accepted: 06/13/2024] [Indexed: 07/09/2024]
Abstract
We here introduce the Aquamarine (AQM) dataset, an extensive quantum-mechanical (QM) dataset that contains the structural and electronic information of 59,783 low- and high-energy conformers of 1,653 molecules with a total number of atoms ranging from 2 to 92 (mean: 50.9), and containing up to 54 (mean: 28.2) non-hydrogen atoms. To gain insights into the solvent effects as well as collective dispersion interactions for drug-like molecules, we have performed QM calculations supplemented with a treatment of many-body dispersion (MBD) interactions of structures and properties in the gas phase and implicit water. Thus, AQM contains over 40 global and local physicochemical properties (including ground-state and response properties) per conformer computed at the tightly converged PBE0+MBD level of theory for gas-phase molecules, whereas PBE0+MBD with the modified Poisson-Boltzmann (MPB) model of water was used for solvated molecules. By addressing both molecule-solvent and dispersion interactions, the AQM dataset can serve as a challenging benchmark for state-of-the-art machine learning methods for property modeling and de novo generation of large (solvated) molecules with pharmaceutical and biological relevance.
Affiliation(s)
- Leonardo Medrano Sandonas
- Department of Physics and Materials Science, University of Luxembourg, L-1511, Luxembourg City, Luxembourg.
- Institute for Materials Science and Max Bergmann Center of Biomaterials, TU Dresden, 01062, Dresden, Germany.
- Dries Van Rompaey
- Drug Discovery Data Sciences (D3S), Janssen Pharmaceutica NV, Turnhoutseweg 30, 2340, Beerse, Belgium.
- Alessio Fallani
- Department of Physics and Materials Science, University of Luxembourg, L-1511, Luxembourg City, Luxembourg
- Drug Discovery Data Sciences (D3S), Janssen Pharmaceutica NV, Turnhoutseweg 30, 2340, Beerse, Belgium
- Mathias Hilfiker
- Department of Physics and Materials Science, University of Luxembourg, L-1511, Luxembourg City, Luxembourg
- David Hahn
- Computational Chemistry, Janssen Pharmaceutica NV, Turnhoutseweg 30, 2340, Beerse, Belgium
- Laura Perez-Benito
- Computational Chemistry, Janssen Pharmaceutica NV, Turnhoutseweg 30, 2340, Beerse, Belgium
- Jonas Verhoeven
- Drug Discovery Data Sciences (D3S), Janssen Pharmaceutica NV, Turnhoutseweg 30, 2340, Beerse, Belgium
- Gary Tresadern
- Computational Chemistry, Janssen Pharmaceutica NV, Turnhoutseweg 30, 2340, Beerse, Belgium
- Joerg Kurt Wegner
- Drug Discovery Data Sciences (D3S), Janssen Pharmaceutica NV, Turnhoutseweg 30, 2340, Beerse, Belgium
- Drug Discovery Data Sciences (D3S), Johnson & Johnson Innovative Medicine, 301 Binney Street, MA 02142, Cambridge, USA
- Hugo Ceulemans
- Drug Discovery Data Sciences (D3S), Janssen Pharmaceutica NV, Turnhoutseweg 30, 2340, Beerse, Belgium
- Alexandre Tkatchenko
- Department of Physics and Materials Science, University of Luxembourg, L-1511, Luxembourg City, Luxembourg.
31
Aldossary A, Campos-Gonzalez-Angulo JA, Pablo-García S, Leong SX, Rajaonson EM, Thiede L, Tom G, Wang A, Avagliano D, Aspuru-Guzik A. In Silico Chemical Experiments in the Age of AI: From Quantum Chemistry to Machine Learning and Back. Adv Mater 2024; 36:e2402369. [PMID: 38794859 DOI: 10.1002/adma.202402369] [Received: 02/15/2024] [Revised: 04/28/2024] [Indexed: 05/26/2024]
Abstract
Computational chemistry is an indispensable tool for understanding molecules and predicting chemical properties. However, traditional computational methods face significant challenges due to the difficulty of solving the Schrödinger equation and the increasing computational cost with the size of the molecular system. In response, there has been a surge of interest in leveraging artificial intelligence (AI) and machine learning (ML) techniques for in silico experiments. Integrating AI and ML into computational chemistry increases the scalability and speed of the exploration of chemical space. However, challenges remain, particularly regarding the reproducibility and transferability of ML models. This review highlights the evolution of ML in learning from, complementing, or replacing traditional computational chemistry for energy and property predictions. It traces the path from models trained entirely on numerical data toward the ideal model that incorporates or learns the physical laws of quantum mechanics. This paper also reviews existing computational methods and ML models and their intertwining, outlines a roadmap for future research, and identifies areas for improvement and innovation. Ultimately, the goal is to develop AI architectures capable of predicting accurate and transferable solutions to the Schrödinger equation, thereby revolutionizing in silico experiments within chemistry and materials science.
Affiliation(s)
- Abdulrahman Aldossary
- Department of Chemistry, University of Toronto, 80 St. George Street, Toronto, ON, M5S 3H6, Canada
- Sergio Pablo-García
- Department of Chemistry, University of Toronto, 80 St. George Street, Toronto, ON, M5S 3H6, Canada
- Department of Computer Science, University of Toronto, 40 St. George Street, Toronto, ON, M5S 2E4, Canada
- Shi Xuan Leong
- Department of Chemistry, University of Toronto, 80 St. George Street, Toronto, ON, M5S 3H6, Canada
- Ella Miray Rajaonson
- Department of Chemistry, University of Toronto, 80 St. George Street, Toronto, ON, M5S 3H6, Canada
- Vector Institute for Artificial Intelligence, 661 University Ave. Suite 710, Toronto, ON, M5G 1M1, Canada
- Luca Thiede
- Department of Computer Science, University of Toronto, 40 St. George Street, Toronto, ON, M5S 2E4, Canada
- Vector Institute for Artificial Intelligence, 661 University Ave. Suite 710, Toronto, ON, M5G 1M1, Canada
- Gary Tom
- Department of Chemistry, University of Toronto, 80 St. George Street, Toronto, ON, M5S 3H6, Canada
- Vector Institute for Artificial Intelligence, 661 University Ave. Suite 710, Toronto, ON, M5G 1M1, Canada
- Andrew Wang
- Department of Chemistry, University of Toronto, 80 St. George Street, Toronto, ON, M5S 3H6, Canada
- Davide Avagliano
- Chimie ParisTech, PSL University, CNRS, Institute of Chemistry for Life and Health Sciences (iCLeHS UMR 8060), Paris, F-75005, France
- Alán Aspuru-Guzik
- Department of Chemistry, University of Toronto, 80 St. George Street, Toronto, ON, M5S 3H6, Canada
- Department of Computer Science, University of Toronto, 40 St. George Street, Toronto, ON, M5S 2E4, Canada
- Vector Institute for Artificial Intelligence, 661 University Ave. Suite 710, Toronto, ON, M5G 1M1, Canada
- Department of Materials Science & Engineering, University of Toronto, 184 College St., Toronto, ON, M5S 3E4, Canada
- Department of Chemical Engineering & Applied Chemistry, University of Toronto, 200 College St., Toronto, ON, M5S 3E5, Canada
- Lebovic Fellow, Canadian Institute for Advanced Research (CIFAR), 66118 University Ave., Toronto, M5G 1M1, Canada
- Acceleration Consortium, 80 St George St, Toronto, M5S 3H6, Canada
32
Fu W, Mo Y, Xiao Y, Liu C, Zhou F, Wang Y, Zhou J, Zhang YJ. Enhancing Molecular Energy Predictions with Physically Constrained Modifications to the Neural Network Potential. J Chem Theory Comput 2024; 20:4533-4544. [PMID: 38828925 DOI: 10.1021/acs.jctc.3c01181] [Indexed: 06/05/2024]
Abstract
Exclusively prioritizing the precision of energy prediction frequently proves inadequate in satisfying multifaceted requirements. A heightened focus is warranted on assessing the rationality of potential energy curves predicted by machine learning-based force fields (MLFFs), alongside evaluating the pragmatic utility of these MLFFs. This study introduces SWANI, an optimized neural network potential stemming from the ANI framework. Through the incorporation of supplementary physical constraints, SWANI aligns more cohesively with chemical expectations, yielding rational potential energy profiles. It also exhibits superior predictive precision compared with that of the ANI model. Additionally, a comprehensive comparison is conducted between SWANI and a prominent graph neural network-based model. The findings indicate that SWANI outperforms the latter, particularly for molecules exceeding the dimensions of the training set. This outcome underscores SWANI's exceptional capacity for generalization and its proficiency in handling larger molecular systems.
Affiliation(s)
- Weiqiang Fu
- Beijing StoneWise Technology Co., Ltd., Haidian Street 15, Haidian District, Beijing 100080, China
- Yujie Mo
- Beijing StoneWise Technology Co., Ltd., Haidian Street 15, Haidian District, Beijing 100080, China
- Yi Xiao
- Beijing StoneWise Technology Co., Ltd., Haidian Street 15, Haidian District, Beijing 100080, China
- Chang Liu
- Beijing StoneWise Technology Co., Ltd., Haidian Street 15, Haidian District, Beijing 100080, China
- Feng Zhou
- Beijing StoneWise Technology Co., Ltd., Haidian Street 15, Haidian District, Beijing 100080, China
- Yang Wang
- Beijing StoneWise Technology Co., Ltd., Haidian Street 15, Haidian District, Beijing 100080, China
- Jielong Zhou
- Beijing StoneWise Technology Co., Ltd., Haidian Street 15, Haidian District, Beijing 100080, China
- Yingsheng J Zhang
- Beijing StoneWise Technology Co., Ltd., Haidian Street 15, Haidian District, Beijing 100080, China
33
Pelaez RP, Simeon G, Galvelis R, Mirarchi A, Eastman P, Doerr S, Thölke P, Markland TE, De Fabritiis G. TorchMD-Net 2.0: Fast Neural Network Potentials for Molecular Simulations. J Chem Theory Comput 2024; 20:4076-4087. [PMID: 38743033 DOI: 10.1021/acs.jctc.4c00253] [Indexed: 05/16/2024]
Abstract
Achieving a balance between computational speed, prediction accuracy, and universal applicability in molecular simulations has been a persistent challenge. This paper presents substantial advancements in TorchMD-Net software, a pivotal step forward in the shift from conventional force fields to neural network-based potentials. The evolution of TorchMD-Net into a more comprehensive and versatile framework is highlighted, incorporating cutting-edge architectures such as TensorNet. This transformation is achieved through a modular design approach, encouraging customized applications within the scientific community. The most notable enhancement is a significant improvement in computational efficiency, achieving a very remarkable acceleration in the computation of energy and forces for TensorNet models, with performance gains ranging from 2× to 10× over previous, nonoptimized, iterations. Other enhancements include highly optimized neighbor search algorithms that support periodic boundary conditions and smooth integration with existing molecular dynamics frameworks. Additionally, the updated version introduces the capability to integrate physical priors, further enriching its application spectrum and utility in research. The software is available at https://github.com/torchmd/torchmd-net.
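The abstract highlights neighbor search with periodic boundary conditions. The sketch below shows only the basic idea: a brute-force O(N²) pair search in an orthorhombic box under the minimum-image convention. It is not TorchMD-Net's optimized implementation, and the function names are hypothetical. It assumes the cutoff is at most half the shortest box edge, so that only the nearest periodic image of a neighbor can fall within range.

```python
import math
from typing import List, Sequence, Tuple

Vec3 = Tuple[float, float, float]


def minimum_image(delta: float, box_length: float) -> float:
    """Wrap one displacement component into [-L/2, L/2] (orthorhombic box)."""
    return delta - box_length * round(delta / box_length)


def neighbor_pairs(
    positions: Sequence[Vec3], box: Vec3, cutoff: float
) -> List[Tuple[int, int, float]]:
    """Return (i, j, distance) for every pair i < j closer than `cutoff`
    under periodic boundary conditions, using O(N^2) brute-force checks."""
    if cutoff > 0.5 * min(box):
        raise ValueError("cutoff must not exceed half the shortest box edge")
    pairs = []
    n = len(positions)
    for i in range(n):
        for j in range(i + 1, n):
            # Minimum-image displacement, component by component.
            d = [
                minimum_image(positions[j][k] - positions[i][k], box[k])
                for k in range(3)
            ]
            r = math.sqrt(d[0] ** 2 + d[1] ** 2 + d[2] ** 2)
            if r < cutoff:
                pairs.append((i, j, r))
    return pairs


if __name__ == "__main__":
    # Atoms 0 and 1 sit near opposite faces, so they are neighbors through the boundary.
    pos = [(0.5, 5.0, 5.0), (9.5, 5.0, 5.0), (5.0, 5.0, 5.0)]
    print(neighbor_pairs(pos, (10.0, 10.0, 10.0), cutoff=3.0))
```

Production codes typically replace the double loop with cell lists or parallel GPU kernels. The brute-force form is used here because its correctness is easy to check by hand.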
Affiliation(s)
- Raul P Pelaez
  - Computational Science Laboratory, Universitat Pompeu Fabra, Barcelona Biomedical Research Park (PRBB), C Dr. Aiguader 88, 08003 Barcelona, Spain
- Guillem Simeon
  - Computational Science Laboratory, Universitat Pompeu Fabra, Barcelona Biomedical Research Park (PRBB), C Dr. Aiguader 88, 08003 Barcelona, Spain
- Raimondas Galvelis
  - Computational Science Laboratory, Universitat Pompeu Fabra, Barcelona Biomedical Research Park (PRBB), C Dr. Aiguader 88, 08003 Barcelona, Spain
  - Acellera Labs, C Dr Trueta 183, 08005 Barcelona, Spain
- Antonio Mirarchi
  - Computational Science Laboratory, Universitat Pompeu Fabra, Barcelona Biomedical Research Park (PRBB), C Dr. Aiguader 88, 08003 Barcelona, Spain
- Peter Eastman
  - Department of Chemistry, Stanford University, Stanford, California 94305, United States
- Stefan Doerr
  - Acellera Labs, C Dr Trueta 183, 08005 Barcelona, Spain
- Thomas E Markland
  - Department of Chemistry, Stanford University, Stanford, California 94305, United States
- Gianni De Fabritiis
  - Computational Science Laboratory, Universitat Pompeu Fabra, Barcelona Biomedical Research Park (PRBB), C Dr. Aiguader 88, 08003 Barcelona, Spain
  - Acellera Labs, C Dr Trueta 183, 08005 Barcelona, Spain
  - Institució Catalana de Recerca i Estudis Avançats (ICREA), Passeig Lluis Companys 23, 08010 Barcelona, Spain

34
Pelaez RP, Simeon G, Galvelis R, Mirarchi A, Eastman P, Doerr S, Thölke P, Markland TE, De Fabritiis G. TorchMD-Net 2.0: Fast Neural Network Potentials for Molecular Simulations. arXiv 2024:arXiv:2402.17660v3. [PMID: 38463504 PMCID: PMC10925388] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 03/12/2024]
Abstract
Achieving a balance between computational speed, prediction accuracy, and universal applicability in molecular simulations has been a persistent challenge. This paper presents substantial advancements in the TorchMD-Net software, a pivotal step forward in the shift from conventional force fields to neural network-based potentials. The evolution of TorchMD-Net into a more comprehensive and versatile framework is highlighted, incorporating cutting-edge architectures such as TensorNet. This transformation is achieved through a modular design approach, encouraging customized applications within the scientific community. The most notable enhancement is a significant improvement in computational efficiency, achieving a very remarkable acceleration in the computation of energy and forces for TensorNet models, with performance gains ranging from 2x to 10x over previous, non-optimized, iterations. Other enhancements include highly optimized neighbor search algorithms that support periodic boundary conditions and smooth integration with existing molecular dynamics frameworks. Additionally, the updated version introduces the capability to integrate physical priors, further enriching its application spectrum and utility in research. The software is available at https://github.com/torchmd/torchmd-net.
Affiliation(s)
- Raul P Pelaez
  - Computational Science Laboratory, Universitat Pompeu Fabra, Barcelona Biomedical Research Park (PRBB), C Dr. Aiguader 88, 08003 Barcelona, Spain
- Guillem Simeon
  - Computational Science Laboratory, Universitat Pompeu Fabra, Barcelona Biomedical Research Park (PRBB), C Dr. Aiguader 88, 08003 Barcelona, Spain
- Raimondas Galvelis
  - Computational Science Laboratory, Universitat Pompeu Fabra, Barcelona Biomedical Research Park (PRBB), C Dr. Aiguader 88, 08003 Barcelona, Spain
  - Acellera Labs, C Dr Trueta 183, 08005, Barcelona, Spain
- Antonio Mirarchi
  - Computational Science Laboratory, Universitat Pompeu Fabra, Barcelona Biomedical Research Park (PRBB), C Dr. Aiguader 88, 08003 Barcelona, Spain
- Peter Eastman
  - Department of Chemistry, Stanford University, Stanford, CA 94305, USA
- Stefan Doerr
  - Acellera Labs, C Dr Trueta 183, 08005, Barcelona, Spain
- Thomas E Markland
  - Department of Chemistry, Stanford University, Stanford, CA 94305, USA
- Gianni De Fabritiis
  - Computational Science Laboratory, Universitat Pompeu Fabra, Barcelona Biomedical Research Park (PRBB), C Dr. Aiguader 88, 08003 Barcelona, Spain
  - Acellera Labs, C Dr Trueta 183, 08005, Barcelona, Spain
  - Institució Catalana de Recerca i Estudis Avançats (ICREA), Passeig Lluis Companys 23, 08010 Barcelona, Spain

35
Wan K, He J, Shi X. Construction of High Accuracy Machine Learning Interatomic Potential for Surface/Interface of Nanomaterials-A Review. Adv Mater 2024; 36:e2305758. [PMID: 37640376 DOI: 10.1002/adma.202305758] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/15/2023] [Revised: 08/24/2023] [Indexed: 08/31/2023]
Abstract
The inherent discontinuity and unique dimensional attributes of nanomaterial surfaces and interfaces bestow them with various exceptional properties. These properties, however, also introduce difficulties for both experimental and computational studies. The advent of machine learning interatomic potentials (MLIPs) addresses some of the limitations associated with empirical force fields, presenting a valuable avenue for accurate simulations of these surfaces/interfaces of nanomaterials. Central to this approach is the idea of capturing the relationship between system configuration and potential energy, leveraging the proficiency of machine learning (ML) to precisely approximate high-dimensional functions. This review offers an in-depth examination of MLIP principles and their execution and elaborates on their applications in the realm of nanomaterial surface and interface systems. The prevailing challenges faced by this potent methodology are also discussed.
Affiliation(s)
- Kaiwei Wan
  - Laboratory of Theoretical and Computational Nanoscience, National Center for Nanoscience and Technology, Chinese Academy of Sciences, Beijing, 100190, China
  - University of Chinese Academy of Sciences, No. 19A Yuquan Road, Beijing, 100049, China
- Jianxin He
  - Laboratory of Theoretical and Computational Nanoscience, National Center for Nanoscience and Technology, Chinese Academy of Sciences, Beijing, 100190, China
  - University of Chinese Academy of Sciences, No. 19A Yuquan Road, Beijing, 100049, China
- Xinghua Shi
  - Laboratory of Theoretical and Computational Nanoscience, National Center for Nanoscience and Technology, Chinese Academy of Sciences, Beijing, 100190, China
  - University of Chinese Academy of Sciences, No. 19A Yuquan Road, Beijing, 100049, China

36
Chen M, Jiang X, Zhang L, Chen X, Wen Y, Gu Z, Li X, Zheng M. The emergence of machine learning force fields in drug design. Med Res Rev 2024; 44:1147-1182. [PMID: 38173298 DOI: 10.1002/med.22008] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/19/2023] [Revised: 11/29/2023] [Accepted: 12/05/2023] [Indexed: 01/05/2024]
Abstract
In the field of molecular simulation for drug design, traditional molecular mechanics force fields and quantum chemical theories have been instrumental but limited in terms of scalability and computational efficiency. To overcome these limitations, machine learning force fields (MLFFs) have emerged as a powerful tool capable of balancing accuracy with efficiency. MLFFs rely on the relationship between molecular structures and potential energy, bypassing the need for a preconceived notion of interaction representations. Their accuracy depends on the machine learning models used, and the quality and volume of training data sets. With recent advances in equivariant neural networks and high-quality datasets, MLFFs have significantly improved their performance. This review explores MLFFs, emphasizing their potential in drug design. It elucidates MLFF principles, provides development and validation guidelines, and highlights successful MLFF implementations. It also addresses potential challenges in developing and applying MLFFs. The review concludes by illuminating the path ahead for MLFFs, outlining the challenges to be overcome and the opportunities to be harnessed. This inspires researchers to embrace MLFFs in their investigations as a new tool to perform molecular simulations in drug design.
Affiliation(s)
- Mingan Chen
  - Drug Discovery and Design Center, State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, Shanghai, China
  - School of Physical Science and Technology, ShanghaiTech University, Shanghai, China
  - Lingang Laboratory, Shanghai, China
- Xinyu Jiang
  - Drug Discovery and Design Center, State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, Shanghai, China
  - School of Pharmacy, University of Chinese Academy of Sciences, Beijing, China
- Lehan Zhang
  - Drug Discovery and Design Center, State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, Shanghai, China
  - School of Pharmacy, University of Chinese Academy of Sciences, Beijing, China
- Xiaoxu Chen
  - Drug Discovery and Design Center, State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, Shanghai, China
  - School of Pharmacy, University of Chinese Academy of Sciences, Beijing, China
  - School of Pharmaceutical Science and Technology, Hangzhou Institute for Advanced Study, UCAS, Hangzhou, China
- Yiming Wen
  - Drug Discovery and Design Center, State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, Shanghai, China
  - School of Pharmacy, University of Chinese Academy of Sciences, Beijing, China
  - School of Pharmaceutical Science and Technology, Hangzhou Institute for Advanced Study, UCAS, Hangzhou, China
- Zhiyong Gu
  - Drug Discovery and Design Center, State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, Shanghai, China
  - School of Pharmacy, University of Chinese Academy of Sciences, Beijing, China
  - School of Pharmaceutical Science and Technology, Hangzhou Institute for Advanced Study, UCAS, Hangzhou, China
- Xutong Li
  - Drug Discovery and Design Center, State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, Shanghai, China
  - School of Pharmacy, University of Chinese Academy of Sciences, Beijing, China
- Mingyue Zheng
  - Drug Discovery and Design Center, State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, Shanghai, China
  - School of Pharmacy, University of Chinese Academy of Sciences, Beijing, China
  - School of Pharmaceutical Science and Technology, Hangzhou Institute for Advanced Study, UCAS, Hangzhou, China

37
Zhai Y, Rashmi R, Palos E, Paesani F. Many-body interactions and deep neural network potentials for water. J Chem Phys 2024; 160:144501. [PMID: 38587225 DOI: 10.1063/5.0203682] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/15/2024] [Accepted: 03/23/2024] [Indexed: 04/09/2024]
Abstract
We present a detailed assessment of deep neural network potentials developed within the Deep Potential Molecular Dynamics (DeePMD) framework and trained on the MB-pol data-driven many-body potential energy function. Specific focus is directed at the ability of DeePMD-based potentials to correctly reproduce the accuracy of MB-pol across various water systems. Analyses of bulk and interfacial properties as well as many-body interactions characteristic of water elucidate inherent limitations in the transferability and predictive accuracy of DeePMD-based potentials. These limitations can be traced back to an incomplete implementation of the "nearsightedness of electronic matter" principle, which may be common throughout machine learning potentials that do not include a proper representation of self-consistently determined long-range electric fields. These findings provide further support for the "short-blanket dilemma" faced by DeePMD-based potentials, highlighting the challenges in achieving a balance between computational efficiency and a rigorous, physics-based representation of the properties of water. Finally, we believe that our study contributes to the ongoing discourse on the development and application of machine learning models in simulating water systems, offering insights that could guide future improvements in the field.
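MB-pol, the reference potential in this study, is built on the many-body expansion of the energy. In this expansion the total energy of a cluster is a sum of 1-body, 2-body, 3-body and higher contributions. Each n-body term is what remains of an n-molecule energy after all lower-order terms are subtracted. The sketch below extracts those contributions by inclusion-exclusion from any function that returns the energy of a subset of monomers. The toy energy function is hypothetical and contains exactly 1-, 2- and 3-body parts, so the recovered 4-body contribution should be zero. It is not MB-pol or a DeePMD model.

```python
from itertools import combinations
from typing import Callable, Dict, FrozenSet, Iterable


def nbody_terms(
    monomers: Iterable[int],
    energy: Callable[[FrozenSet[int]], float],
    max_order: int,
) -> Dict[int, float]:
    """Sum of all n-body contributions for n = 1..max_order.

    The n-body term of a subset S is energy(S) minus the n-body terms of all
    proper non-empty subsets of S (inclusion-exclusion). The cost grows
    combinatorially, so this is only practical for small clusters.
    """
    units = list(monomers)
    delta: Dict[FrozenSet[int], float] = {}  # n-body term of each subset
    totals: Dict[int, float] = {}
    for n in range(1, max_order + 1):
        totals[n] = 0.0
        for combo in combinations(units, n):
            # All lower-order subsets were computed in earlier passes.
            lower = sum(
                delta[frozenset(sub)]
                for k in range(1, n)
                for sub in combinations(combo, k)
            )
            key = frozenset(combo)
            delta[key] = energy(key) - lower
            totals[n] += delta[key]
    return totals


def toy_energy(cluster: FrozenSet[int]) -> float:
    """Hypothetical cluster energy: -10 per monomer, -1 per pair, +0.1 per triple."""
    n = len(cluster)
    pairs = n * (n - 1) // 2
    triples = n * (n - 1) * (n - 2) // 6
    return -10.0 * n - 1.0 * pairs + 0.1 * triples


if __name__ == "__main__":
    terms = nbody_terms(range(4), toy_energy, max_order=4)
    print(terms)  # the 4-body entry is ~0 because toy_energy has no 4-body part
```

Truncating `max_order` at 2 or 3 mirrors how many-body potentials are usually organized: low-order terms are fitted explicitly and higher orders are approximated or neglected.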
Affiliation(s)
- Yaoguang Zhai
  - Department of Chemistry and Biochemistry, University of California San Diego, La Jolla, California 92093, USA
  - Department of Computer Science and Engineering, University of California San Diego, La Jolla, California 92093, USA
- Richa Rashmi
  - Department of Chemistry and Biochemistry, University of California San Diego, La Jolla, California 92093, USA
- Etienne Palos
  - Department of Chemistry and Biochemistry, University of California San Diego, La Jolla, California 92093, USA
- Francesco Paesani
  - Department of Chemistry and Biochemistry, University of California San Diego, La Jolla, California 92093, USA
  - Materials Science and Engineering, University of California San Diego, La Jolla, California 92093, USA
  - Halicioğlu Data Science Institute, University of California San Diego, La Jolla, California 92093, USA
  - San Diego Supercomputer Center, University of California San Diego, La Jolla, California 92093, USA

38
Unke OT, Stöhr M, Ganscha S, Unterthiner T, Maennel H, Kashubin S, Ahlin D, Gastegger M, Medrano Sandonas L, Berryman JT, Tkatchenko A, Müller KR. Biomolecular dynamics with machine-learned quantum-mechanical force fields trained on diverse chemical fragments. Sci Adv 2024; 10:eadn4397. [PMID: 38579003 PMCID: PMC11809612 DOI: 10.1126/sciadv.adn4397] [Citation(s) in RCA: 24] [Impact Index Per Article: 24.0] [Received: 12/10/2023] [Accepted: 02/29/2024] [Indexed: 04/07/2024]
Abstract
The GEMS method enables molecular dynamics simulations of large heterogeneous systems at ab initio quality.
Affiliation(s)
- Oliver T. Unke
  - Google DeepMind, Tucholskystraße 2, 10117 Berlin, Germany and Brandschenkestrasse 110, 8002 Zürich, Switzerland
  - Machine Learning Group, Technische Universität Berlin, 10587 Berlin, Germany
  - DFG Cluster of Excellence “Unifying Systems in Catalysis” (UniSysCat), Technische Universität Berlin, 10623 Berlin, Germany
- Martin Stöhr
  - Department of Physics and Materials Science, University of Luxembourg, L-1511 Luxembourg City, Luxembourg
- Stefan Ganscha
  - Google DeepMind, Tucholskystraße 2, 10117 Berlin, Germany and Brandschenkestrasse 110, 8002 Zürich, Switzerland
- Thomas Unterthiner
  - Google DeepMind, Tucholskystraße 2, 10117 Berlin, Germany and Brandschenkestrasse 110, 8002 Zürich, Switzerland
- Hartmut Maennel
  - Google DeepMind, Tucholskystraße 2, 10117 Berlin, Germany and Brandschenkestrasse 110, 8002 Zürich, Switzerland
- Sergii Kashubin
  - Google DeepMind, Tucholskystraße 2, 10117 Berlin, Germany and Brandschenkestrasse 110, 8002 Zürich, Switzerland
- Daniel Ahlin
  - Google DeepMind, Tucholskystraße 2, 10117 Berlin, Germany and Brandschenkestrasse 110, 8002 Zürich, Switzerland
- Michael Gastegger
  - Machine Learning Group, Technische Universität Berlin, 10587 Berlin, Germany
  - DFG Cluster of Excellence “Unifying Systems in Catalysis” (UniSysCat), Technische Universität Berlin, 10623 Berlin, Germany
  - BASLEARN, TU Berlin/BASF Joint Lab for Machine Learning, Technische Universität Berlin, 10587 Berlin, Germany
- Leonardo Medrano Sandonas
  - Department of Physics and Materials Science, University of Luxembourg, L-1511 Luxembourg City, Luxembourg
- Joshua T. Berryman
  - Department of Physics and Materials Science, University of Luxembourg, L-1511 Luxembourg City, Luxembourg
- Alexandre Tkatchenko
  - Department of Physics and Materials Science, University of Luxembourg, L-1511 Luxembourg City, Luxembourg
- Klaus-Robert Müller
  - Google DeepMind, Tucholskystraße 2, 10117 Berlin, Germany and Brandschenkestrasse 110, 8002 Zürich, Switzerland
  - Machine Learning Group, Technische Universität Berlin, 10587 Berlin, Germany
  - Department of Artificial Intelligence, Korea University, Anam-dong, Seongbuk-gu, Seoul 02841, Korea
  - Max Planck Institute for Informatics, Stuhlsatzenhausweg, 66123 Saarbrücken, Germany
  - BIFOLD, Berlin Institute for the Foundations of Learning and Data, Berlin, Germany

39
Käser S, Meuwly M. Numerical Accuracy Matters: Applications of Machine Learned Potential Energy Surfaces. J Phys Chem Lett 2024:3419-3424. [PMID: 38506827 DOI: 10.1021/acs.jpclett.3c03405] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 03/21/2024]
Abstract
The role of numerical accuracy in training and evaluating neural network-based potential energy surfaces is examined for different experimental observables. For observables that require third- and fourth-order derivatives of the potential energy with respect to Cartesian coordinates, single-precision arithmetic, as typically used in ML-based approaches, is insufficient and leads to roughness of the underlying PES, as is explicitly demonstrated. Increasing the numerical accuracy to double precision gives a smooth PES with higher-order derivatives that are numerically stable and yield meaningful anharmonic frequencies and tunneling splittings, as is demonstrated for H2CO and malonaldehyde. For molecular dynamics simulations, however, which require only first-order derivatives, single-precision arithmetic appears to be sufficient.
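The precision argument can be illustrated without a neural network. The sketch below evaluates a smooth 1D model potential, rounds each energy to IEEE single precision (emulated with Python's `struct` module), and estimates the fourth derivative with a central finite-difference stencil. Finite differences are a stand-in chosen for this illustration; the paper itself examines neural-network PESs. The potential, the step size and the function names are assumptions. Only the energies are rounded to single precision, and the stencil arithmetic stays in double precision.

```python
import struct
from typing import Callable


def to_float32(x: float) -> float:
    """Round a Python float (IEEE binary64) to the nearest binary32 value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


def model_energy(x: float) -> float:
    """Hypothetical smooth 1D potential; its exact fourth derivative is 24."""
    return x ** 4


def fourth_derivative(f: Callable[[float], float], x: float, h: float) -> float:
    """Central five-point stencil for f''''(x).

    For a quartic the stencil has no truncation error, so any deviation
    from 24 comes from rounding in the evaluated energies."""
    return (f(x + 2 * h) - 4 * f(x + h) + 6 * f(x) - 4 * f(x - h) + f(x - 2 * h)) / h ** 4


if __name__ == "__main__":
    h, exact = 1e-2, 24.0
    d64 = fourth_derivative(model_energy, 1.0, h)
    d32 = fourth_derivative(lambda x: to_float32(model_energy(x)), 1.0, h)
    print(f"double-precision energies: error {abs(d64 - exact):.1e}")
    print(f"single-precision energies: error {abs(d32 - exact):.1e}")
```

The stencil divides by h⁴, so a rounding error of order 10⁻⁷ in each single-precision energy is amplified by a factor of order 10⁸. First derivatives divide by only h, which is consistent with the abstract's observation that molecular dynamics tolerates single precision.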
Affiliation(s)
- Silvan Käser
  - Department of Chemistry, University of Basel, Klingelbergstrasse 80, CH-4056 Basel, Switzerland
- Markus Meuwly
  - Department of Chemistry, University of Basel, Klingelbergstrasse 80, CH-4056 Basel, Switzerland

40
Dral PO. AI in computational chemistry through the lens of a decade-long journey. Chem Commun (Camb) 2024; 60:3240-3258. [PMID: 38444290 DOI: 10.1039/d4cc00010b] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 03/07/2024]
Abstract
This article gives a perspective on the progress of AI tools in computational chemistry through the lens of the author's decade-long contributions put in the wider context of the trends in this rapidly expanding field. This progress over the last decade is tremendous: while a decade ago we had a glimpse of what was to come through many proof-of-concept studies, now we witness the emergence of many AI-based computational chemistry tools that are mature enough to make faster and more accurate simulations increasingly routine. Such simulations in turn allow us to validate and even revise experimental results, deepen our understanding of the physicochemical processes in nature, and design better materials, devices, and drugs. The rapid introduction of powerful AI tools gives rise to unique challenges and opportunities that are discussed in this article too.
Affiliation(s)
- Pavlo O Dral
  - State Key Laboratory of Physical Chemistry of Solid Surfaces, College of Chemistry and Chemical Engineering, Fujian Provincial Key Laboratory of Theoretical and Computational Chemistry, and Innovation Laboratory for Sciences and Technologies of Energy Materials of Fujian Province (IKKEM), Xiamen University, Xiamen, Fujian 361005, China

41
Horn KP, Vazquez-Salazar LI, Koch CP, Meuwly M. Improving potential energy surfaces using measured Feshbach resonance states. Sci Adv 2024; 10:eadi6462. [PMID: 38427733 PMCID: PMC10906917 DOI: 10.1126/sciadv.adi6462] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/10/2023] [Accepted: 01/29/2024] [Indexed: 03/03/2024]
Abstract
The structure and dynamics of a molecular system are governed by its potential energy surface (PES), representing the total energy as a function of the nuclear coordinates. Obtaining accurate potential energy surfaces is limited by the exponential scaling of Hilbert space, restricting quantitative predictions of experimental observables from first principles to small molecules with just a few electrons. Here, we present an explicitly physics-informed approach for improving and assessing the quality of families of PESs by modifying them through linear coordinate transformations based on experimental data. We demonstrate this "morphing" of the PES for the He-H2+ complex using recent comprehensive Feshbach resonance (FR) measurements for reference PESs at three different levels of quantum chemistry. In all cases, the positions and intensities of peaks in the energy distributions are improved. We find these observables to be mainly sensitive to the long-range part of the PES.
Affiliation(s)
- Karl P. Horn
  - Dahlem Center for Complex Quantum Systems and Fachbereich Physik, Freie Universität Berlin, Arnimallee 14, D-14195 Berlin, Germany
- Christiane P. Koch
  - Dahlem Center for Complex Quantum Systems and Fachbereich Physik, Freie Universität Berlin, Arnimallee 14, D-14195 Berlin, Germany
- Markus Meuwly
  - Department of Chemistry, University of Basel, Klingelbergstrasse 80, CH-4056 Basel, Switzerland

42
Brooks CL, MacKerell AD, Post CB, Nilsson L. Biomolecular dynamics in the 21st century. Biochim Biophys Acta Gen Subj 2024; 1868:130534. [PMID: 38065235 PMCID: PMC10842176 DOI: 10.1016/j.bbagen.2023.130534] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/26/2023] [Revised: 11/28/2023] [Accepted: 11/29/2023] [Indexed: 01/03/2024]
Abstract
The relevance of motions in biological macromolecules has been clear since the early structural analyses of proteins by X-ray crystallography. Computer simulations have been applied to provide a deeper understanding of the dynamics of biological macromolecules since 1976, and are now a standard tool in many labs working on the structure and function of biomolecules. In this mini-review we highlight some areas of current interest and active development for simulations, in particular all-atom molecular dynamics simulations.
Affiliation(s)
- Charles L Brooks
  - University of Michigan, Department of Chemistry, Ann Arbor, MI 48109, USA
- Carol B Post
  - Purdue University, Department of Medicinal Chemistry and Molecular Pharmacology, West Lafayette, IN 47907-2091, USA
- Lennart Nilsson
  - Karolinska Institutet, Department of Biosciences and Nutrition, SE-14183 Huddinge, Sweden

43
Bonfà P, Onuorah IJ, Lang F, Timrov I, Monacelli L, Wang C, Sun X, Petracic O, Pizzi G, Marzari N, Blundell SJ, De Renzi R. Magnetostriction-Driven Muon Localization in an Antiferromagnetic Oxide. Phys Rev Lett 2024; 132:046701. [PMID: 38335330 DOI: 10.1103/physrevlett.132.046701] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/02/2023] [Revised: 09/18/2023] [Accepted: 11/20/2023] [Indexed: 02/12/2024]
Abstract
Magnetostriction results from the coupling between magnetic and elastic degrees of freedom. Though it is associated with a relatively small energy, we show that it plays an important role in determining the site of an implanted muon, so that the energetically favorable site can switch on crossing a magnetic phase transition. This surprising effect is demonstrated in the cubic rocksalt antiferromagnet MnO, which undergoes a magnetostriction-driven rhombohedral distortion at the Néel temperature T_N = 118 K. Above T_N, the muon becomes delocalized around a network of equivalent sites, but below T_N the distortion lifts the degeneracy between these equivalent sites. Our first-principles simulations based on Hubbard-corrected density-functional theory and molecular dynamics are consistent with the experimental data and help to resolve a long-standing puzzle regarding muon data on MnO, as well as having wider applicability to other magnetic oxides.
Affiliation(s)
- Pietro Bonfà
  - Dipartimento di Scienze Matematiche, Fisiche e Informatiche, Università di Parma, I-43124 Parma, Italy
- Ifeanyi John Onuorah
  - Dipartimento di Scienze Matematiche, Fisiche e Informatiche, Università di Parma, I-43124 Parma, Italy
- Franz Lang
  - ISIS Neutron and Muon Source, STFC Rutherford Appleton Laboratory, Chilton, Didcot OX11 0QX, United Kingdom
- Iurii Timrov
  - Theory and Simulation of Materials (THEOS), and National Centre for Computational Design and Discovery of Novel Materials (MARVEL), École Polytechnique Fédérale de Lausanne, 1015 Lausanne, Switzerland
- Lorenzo Monacelli
  - Theory and Simulation of Materials (THEOS), and National Centre for Computational Design and Discovery of Novel Materials (MARVEL), École Polytechnique Fédérale de Lausanne, 1015 Lausanne, Switzerland
- Chennan Wang
  - Laboratory for Muon Spin Spectroscopy, Paul Scherrer Institute, CH-5232 Villigen, Switzerland
- Xiao Sun
  - Jülich Centre for Neutron Science JCNS-2 and Peter Grünberg Institute PGI-4, JARA-FIT, Forschungszentrum Jülich GmbH, 52425 Jülich, Germany
- Oleg Petracic
  - Jülich Centre for Neutron Science JCNS-2 and Peter Grünberg Institute PGI-4, JARA-FIT, Forschungszentrum Jülich GmbH, 52425 Jülich, Germany
- Giovanni Pizzi
  - Laboratory for Materials Simulations (LMS), Paul Scherrer Institut (PSI), CH-5232 Villigen PSI, Switzerland
- Nicola Marzari
  - Theory and Simulation of Materials (THEOS), and National Centre for Computational Design and Discovery of Novel Materials (MARVEL), École Polytechnique Fédérale de Lausanne, 1015 Lausanne, Switzerland
  - Laboratory for Materials Simulations (LMS), Paul Scherrer Institut (PSI), CH-5232 Villigen PSI, Switzerland
- Stephen J Blundell
  - Department of Physics, University of Oxford, Clarendon Laboratory, Oxford OX1 3PU, United Kingdom
- Roberto De Renzi
  - Dipartimento di Scienze Matematiche, Fisiche e Informatiche, Università di Parma, I-43124 Parma, Italy

44
Gelžinytė E, Öeren M, Segall MD, Csányi G. Transferable Machine Learning Interatomic Potential for Bond Dissociation Energy Prediction of Drug-like Molecules. J Chem Theory Comput 2024; 20:164-177. [PMID: 38108269 PMCID: PMC10782450 DOI: 10.1021/acs.jctc.3c00710] [Citation(s) in RCA: 11] [Impact Index Per Article: 11.0] [Received: 06/28/2023] [Revised: 11/30/2023] [Accepted: 11/30/2023] [Indexed: 12/19/2023]
Abstract
We present a transferable MACE interatomic potential that is applicable to open- and closed-shell drug-like molecules containing hydrogen, carbon, and oxygen atoms. Including an accurate description of radical species extends the scope of possible applications to bond dissociation energy (BDE) prediction, for example, in the context of cytochrome P450 (CYP) metabolism. The transferability of the MACE potential was validated on the COMP6 data set, containing only closed-shell molecules, where it reaches better accuracy than the readily available general ANI-2x potential. MACE achieves similar accuracy on two CYP metabolism-specific data sets, which include open- and closed-shell structures. This model enables us to calculate the aliphatic C-H BDE, which allows us to compare reaction energies of hydrogen abstraction, which is the rate-limiting step of the aliphatic hydroxylation reaction catalyzed by CYPs. On the "CYP 3A4" data set, MACE achieves a BDE RMSE of 1.37 kcal/mol and better prediction of BDE ranks than alternatives: the semiempirical AM1 and GFN2-xTB methods and the ALFABET model that directly predicts bond dissociation enthalpies. Finally, we highlight the smoothness of the MACE potential over paths of sp3 C-H bond elongation and show that a minimal extension is enough for the MACE model to start finding reasonable minimum energy paths of methoxy radical-mediated hydrogen abstraction. Altogether, this work lays the ground for further extensions of scope in terms of chemical elements, (CYP-mediated) reaction classes and modeling the full reaction paths, not only BDEs.
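A homolytic C-H bond dissociation energy follows from three energies, BDE = E(R•) + E(H•) - E(R-H). The RMSE quoted above is the usual root-mean-square deviation from reference values. The sketch below shows both calculations as a minimal illustration. It assumes electronic energies only, with no zero-point or thermal corrections (a bond dissociation enthalpy would include them). The function names and all numbers in the usage lines are hypothetical, not data from the paper.

```python
import math
from typing import Sequence


def bond_dissociation_energy(e_parent: float, e_radical: float, e_hydrogen: float) -> float:
    """Homolytic R-H -> R* + H* dissociation energy from electronic energies.

    All inputs must share one unit (e.g. kcal/mol); a positive result means
    breaking the bond costs energy. No ZPE or thermal corrections are applied."""
    return e_radical + e_hydrogen - e_parent


def rmse(predicted: Sequence[float], reference: Sequence[float]) -> float:
    """Root-mean-square error between paired predictions and reference values."""
    if len(predicted) != len(reference) or not predicted:
        raise ValueError("need two non-empty sequences of equal length")
    squared = [(p - r) ** 2 for p, r in zip(predicted, reference)]
    return math.sqrt(sum(squared) / len(squared))


if __name__ == "__main__":
    # Made-up relative energies in kcal/mol, for illustration only.
    bde = bond_dissociation_energy(e_parent=-250.0, e_radical=-140.0, e_hydrogen=-10.0)
    print(bde)  # 100.0
    print(rmse([98.5, 101.0, 95.2], [97.0, 101.5, 96.0]))
```

Because the hydrogen-atom energy is the same for every C-H site in a molecule, ranking sites by BDE amounts to ranking the radical energies. That may be why rank accuracy is reported alongside the RMSE.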
Affiliation(s)
- Elena Gelžinytė
  - Engineering Laboratory, University of Cambridge, Trumpington Street, Cambridge CB2 1PZ, U.K.
- Mario Öeren
  - Optibrium Limited, Cambridge Innovation Park, Denny End Road, Cambridge CB25 9GL, U.K.
- Matthew D. Segall
  - Optibrium Limited, Cambridge Innovation Park, Denny End Road, Cambridge CB25 9GL, U.K.
- Gábor Csányi
  - Engineering Laboratory, University of Cambridge, Trumpington Street, Cambridge CB2 1PZ, U.K.

45
Wang Y, Wang T, Li S, He X, Li M, Wang Z, Zheng N, Shao B, Liu TY. Enhancing geometric representations for molecules with equivariant vector-scalar interactive message passing. Nat Commun 2024; 15:313. [PMID: 38182565 PMCID: PMC10770089 DOI: 10.1038/s41467-023-43720-2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/04/2023] [Accepted: 11/16/2023] [Indexed: 01/07/2024]
Abstract
Geometric deep learning has been revolutionizing the molecular modeling field. Although state-of-the-art neural network models are approaching ab initio accuracy for molecular property prediction, their applications, such as drug discovery and molecular dynamics (MD) simulation, have been hindered by insufficient utilization of geometric information and high computational costs. Here we propose an equivariant geometry-enhanced graph neural network called ViSNet, which elegantly extracts geometric features and efficiently models molecular structures with low computational costs. Our proposed ViSNet outperforms state-of-the-art approaches on multiple MD benchmarks, including MD17, revised MD17 and MD22, and achieves excellent chemical property prediction on the QM9 and Molecule3D datasets. Furthermore, a series of simulations and case studies shows that ViSNet can efficiently explore the conformational space and provide reasonable interpretability to map geometric representations to molecular structures.
Affiliation(s)
- Yusong Wang
- Microsoft Research AI4Science, 100080, Beijing, China
- National Key Laboratory of Human-Machine Hybrid Augmented Intelligence, National Engineering Research Center for Visual Information and Applications, and Institute of Artificial Intelligence and Robotics, Xi'an Jiaotong University, 710049, Xi'an, China
- Tong Wang
- Microsoft Research AI4Science, 100080, Beijing, China
- Shaoning Li
- Microsoft Research AI4Science, 100080, Beijing, China
- Xinheng He
- Microsoft Research AI4Science, 100080, Beijing, China
- The CAS Key Laboratory of Receptor Research and State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, 201203, Shanghai, China
- University of Chinese Academy of Sciences, 100049, Beijing, China
- Mingyu Li
- Microsoft Research AI4Science, 100080, Beijing, China
- Medicinal Chemistry and Bioinformatics Center, School of Medicine, Shanghai Jiao Tong University, Shanghai, 200025, China
- Zun Wang
- Microsoft Research AI4Science, 100080, Beijing, China
- Nanning Zheng
- National Key Laboratory of Human-Machine Hybrid Augmented Intelligence, National Engineering Research Center for Visual Information and Applications, and Institute of Artificial Intelligence and Robotics, Xi'an Jiaotong University, 710049, Xi'an, China
- Bin Shao
- Microsoft Research AI4Science, 100080, Beijing, China
- Tie-Yan Liu
- Microsoft Research AI4Science, 100080, Beijing, China
46
Fonseca G, Poltavsky I, Tkatchenko A. Force Field Analysis Software and Tools (FFAST): Assessing Machine Learning Force Fields under the Microscope. J Chem Theory Comput 2023; 19:8706-8717. [PMID: 38011895 PMCID: PMC10720330 DOI: 10.1021/acs.jctc.3c00985] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/08/2023] [Revised: 11/06/2023] [Accepted: 11/07/2023] [Indexed: 11/29/2023]
Abstract
As the sophistication of machine learning force fields (MLFF) increases to match the complexity of extended molecules and materials, so does the need for tools to properly analyze and assess the practical performance of MLFFs. To go beyond average error metrics and obtain a complete picture of a model's applicability and limitations, we developed FFAST (force field analysis software and tools): a cross-platform software package designed to provide detailed insight into a model's performance and limitations, complete with an easy-to-use graphical user interface. The software allows the user to gauge the performance of any molecular force field, such as popular state-of-the-art MLFF models, on various popular data set types, providing general prediction error overviews, outlier detection mechanisms, atom-projected errors, and more. It has a 3D visualizer to find and picture problematic configurations, atoms, or clusters in a large data set. In this paper, the MACE and NequIP models are applied to two data sets of interest [stachyose and docosahexaenoic acid (DHA)] to illustrate the use cases of the software. Using it, we found that carbons and oxygens involved in or near glycosidic bonds inside the stachyose molecule present increased prediction errors. In addition, prediction errors on DHA rise as the molecule folds, especially for the carboxylic group at the edge of the molecule. We emphasize the need for a systematic assessment of MLFF models to ensure their successful application to the study of the dynamics of molecules and materials.
Affiliation(s)
- Gregory Fonseca
- Department of Physics and Materials Science, University of Luxembourg, Luxembourg City L-1511, Luxembourg
- Igor Poltavsky
- Department of Physics and Materials Science, University of Luxembourg, Luxembourg City L-1511, Luxembourg
- Alexandre Tkatchenko
- Department of Physics and Materials Science, University of Luxembourg, Luxembourg City L-1511, Luxembourg
47
Kanhaiya K, Nathanson M, In 't Veld PJ, Zhu C, Nikiforov I, Tadmor EB, Choi YK, Im W, Mishra RK, Heinz H. Accurate Force Fields for Atomistic Simulations of Oxides, Hydroxides, and Organic Hybrid Materials up to the Micrometer Scale. J Chem Theory Comput 2023; 19:8293-8322. [PMID: 37962992 DOI: 10.1021/acs.jctc.3c00750] [Citation(s) in RCA: 8] [Impact Index Per Article: 4.0] [Indexed: 11/16/2023]
Abstract
The simulation of metals, oxides, and hydroxides can accelerate the design of therapeutics, alloys, catalysts, cement-based materials, ceramics, bioinspired composites, and glasses. Here we introduce the INTERFACE force field (IFF) and surface models for α-Al2O3, α-Cr2O3, α-Fe2O3, NiO, CaO, MgO, β-Ca(OH)2, β-Mg(OH)2, and β-Ni(OH)2. The force field parameters are nonbonded, including atomic charges for Coulomb interactions, Lennard-Jones (LJ) potentials for van der Waals interactions with 12-6 and 9-6 options, and harmonic bond stretching for hydroxide ions. The models outperform DFT calculations and earlier atomistic models (Pedone, ReaxFF, UFF, CLAYFF) by up to 2 orders of magnitude in reliability, compatibility, and interpretability due to a quantitative representation of chemical bonding consistent with other compounds across the periodic table and curated experimental data for validation. The IFF models exhibit average deviations of 0.2% in lattice parameters, <10% in surface energies (to the extent known), and 6% in bulk moduli relative to experiments. The parameters and models can be used with existing parameters for solvents, inorganic compounds, organic compounds, biomolecules, and polymers in IFF, CHARMM, CVFF, AMBER, OPLS-AA, PCFF, and COMPASS, to simulate bulk oxides, hydroxides, electrolyte interfaces, and multiphase, biological, and organic hybrid materials at length scales from atoms to micrometers. The nonbonded character of the models also enables the analysis of mixed oxides, glasses, and certain chemical reactions, and well-performing nonbonded models for silica phases, SiO2, are introduced. Automated model building is available in the CHARMM-GUI Nanomaterial Modeler. We illustrate applications of the models to predict the structure of mixed oxides and energy barriers of ion migration, as well as binding energies of water and organic molecules, in outstanding agreement with experimental data and calculations at the CCSD(T) level. Examples of model building for hydrated, pH-sensitive oxide surfaces to simulate solid-electrolyte interfaces are discussed.
Affiliation(s)
- Krishan Kanhaiya
- Department of Chemical and Biological Engineering, University of Colorado at Boulder, Boulder, Colorado 80309, United States
- Michael Nathanson
- Department of Chemical and Biological Engineering, University of Colorado at Boulder, Boulder, Colorado 80309, United States
- Pieter J In 't Veld
- BASF SE, Molecular Modeling & Drug Discovery, Carl Bosch Str. 38, 67056 Ludwigshafen, Germany
- Cheng Zhu
- Department of Chemical and Biological Engineering, University of Colorado at Boulder, Boulder, Colorado 80309, United States
- Ilia Nikiforov
- Department of Aerospace Engineering and Mechanics, University of Minnesota, Minneapolis, Minnesota 55455, United States
- Ellad B Tadmor
- Department of Aerospace Engineering and Mechanics, University of Minnesota, Minneapolis, Minnesota 55455, United States
- Yeol Kyo Choi
- Department of Biological Sciences, Lehigh University, Bethlehem, Pennsylvania 18015, United States
- Wonpil Im
- Department of Biological Sciences, Lehigh University, Bethlehem, Pennsylvania 18015, United States
- Ratan K Mishra
- BASF SE, Molecular Modeling & Drug Discovery, Carl Bosch Str. 38, 67056 Ludwigshafen, Germany
- Hendrik Heinz
- Department of Chemical and Biological Engineering, University of Colorado at Boulder, Boulder, Colorado 80309, United States
48
Plé T, Lagardère L, Piquemal JP. Force-field-enhanced neural network interactions: from local equivariant embedding to atom-in-molecule properties and long-range effects. Chem Sci 2023; 14:12554-12569. [PMID: 38020379 PMCID: PMC10646944 DOI: 10.1039/d3sc02581k] [Citation(s) in RCA: 8] [Impact Index Per Article: 4.0] [Received: 05/22/2023] [Accepted: 10/03/2023] [Indexed: 12/01/2023]
Abstract
We introduce FENNIX (Force-Field-Enhanced Neural Network InteraXions), a hybrid approach between machine learning and force fields. We leverage state-of-the-art equivariant neural networks to predict local energy contributions and multiple atom-in-molecule properties that are then used as geometry-dependent parameters for physically motivated energy terms, which account for long-range electrostatics and dispersion. Using high-accuracy ab initio data (small organic molecules/dimers), we trained a first version of the model. Exhibiting accurate gas-phase energy predictions, FENNIX is transferable to the condensed phase. It is able to produce stable molecular dynamics simulations of water, including nuclear quantum effects, that predict accurate liquid properties. The extrapolating power of the hybrid physically driven machine learning FENNIX approach is exemplified by computing (i) the solvated alanine dipeptide free energy landscape and (ii) the reactive dissociation of small molecules.
Affiliation(s)
- Thomas Plé
- Sorbonne Université, LCT, UMR 7616 CNRS F-75005 Paris France thomas.ple@sorbonne-université louis.lagardere@sorbonne-université jean-philip.piquemal@sorbonne-université
- Louis Lagardère
- Sorbonne Université, LCT, UMR 7616 CNRS F-75005 Paris France thomas.ple@sorbonne-université louis.lagardere@sorbonne-université jean-philip.piquemal@sorbonne-université
- Jean-Philip Piquemal
- Sorbonne Université, LCT, UMR 7616 CNRS F-75005 Paris France thomas.ple@sorbonne-université louis.lagardere@sorbonne-université jean-philip.piquemal@sorbonne-université
49
Lederer J, Gastegger M, Schütt KT, Kampffmeyer M, Müller KR, Unke OT. Automatic identification of chemical moieties. Phys Chem Chem Phys 2023; 25:26370-26379. [PMID: 37750554 PMCID: PMC10548786 DOI: 10.1039/d3cp03845a] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 08/11/2023] [Accepted: 08/18/2023] [Indexed: 09/27/2023]
Abstract
In recent years, the prediction of quantum mechanical observables with machine learning methods has become increasingly popular. Message-passing neural networks (MPNNs) solve this task by constructing atomic representations, from which the properties of interest are predicted. Here, we introduce a method to automatically identify chemical moieties (molecular building blocks) from such representations, enabling a variety of applications beyond property prediction, which otherwise rely on expert knowledge. The required representation can either be provided by a pretrained MPNN, or be learned from scratch using only structural information. Beyond the data-driven design of molecular fingerprints, the versatility of our approach is demonstrated by enabling the selection of representative entries in chemical databases, the automatic construction of coarse-grained force fields, as well as the identification of reaction coordinates.
Affiliation(s)
- Jonas Lederer
- Berlin Institute of Technology (TU Berlin), 10587 Berlin, Germany
- BIFOLD - Berlin Institute for the Foundations of Learning and Data, Germany
- Michael Gastegger
- Berlin Institute of Technology (TU Berlin), 10587 Berlin, Germany
- BIFOLD - Berlin Institute for the Foundations of Learning and Data, Germany
- Kristof T Schütt
- Berlin Institute of Technology (TU Berlin), 10587 Berlin, Germany
- BIFOLD - Berlin Institute for the Foundations of Learning and Data, Germany
- Michael Kampffmeyer
- Department of Physics and Technology, UiT The Arctic University of Norway, 9019 Tromsø, Norway
- Klaus-Robert Müller
- Berlin Institute of Technology (TU Berlin), 10587 Berlin, Germany
- BIFOLD - Berlin Institute for the Foundations of Learning and Data, Germany
- Google DeepMind, Germany
- Department of Artificial Intelligence, Korea University, Seoul 136-713, Korea
- Max Planck Institut für Informatik, 66123 Saarbrücken, Germany
- Oliver T Unke
- Berlin Institute of Technology (TU Berlin), 10587 Berlin, Germany
- BIFOLD - Berlin Institute for the Foundations of Learning and Data, Germany
- Google DeepMind, Germany
50
Wang T, He X, Li M, Shao B, Liu TY. AIMD-Chig: Exploring the conformational space of a 166-atom protein Chignolin with ab initio molecular dynamics. Sci Data 2023; 10:549. [PMID: 37607915 PMCID: PMC10444755 DOI: 10.1038/s41597-023-02465-9] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 02/13/2023] [Accepted: 08/11/2023] [Indexed: 08/24/2023]
Abstract
Molecular dynamics (MD) simulations have revolutionized the modeling of biomolecular conformations and provided unprecedented insight into molecular interactions. Due to the prohibitive computational overhead of ab initio simulation for large biomolecules, dynamic modeling of proteins is generally limited to molecular mechanics force fields, which suffer from low accuracy and ignore electronic effects. Here, we report AIMD-Chig, an MD dataset including 2 million conformations of the 166-atom protein Chignolin sampled at the density functional theory (DFT) level with 7,763,146 CPU hours. In total, 10,000 conformations were initialized, covering the whole conformational space of Chignolin, including folded, unfolded, and metastable states. Ab initio simulations were driven by M06-2X/6-31G* with a Berendsen thermostat at 340 K. We report coordinates, energies, and forces for each conformation. AIMD-Chig brings DFT-level conformational space exploration from small organic molecules to real-world proteins. It can serve as a benchmark for developing machine learning potentials for proteins and facilitate the exploration of protein dynamics with ab initio accuracy.
Affiliation(s)
- Tong Wang
- Microsoft Research AI4Science, Beijing, China
- Xinheng He
- Microsoft Research AI4Science, Beijing, China
- Work done during an internship at Microsoft Research AI4Science, Beijing, China
- State Key Laboratory of Drug Research and CAS Key Laboratory of Receptor Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, Shanghai, China
- University of Chinese Academy of Sciences, Beijing, China
- Mingyu Li
- Microsoft Research AI4Science, Beijing, China
- Work done during an internship at Microsoft Research AI4Science, Beijing, China
- Department of Pathophysiology, Key Laboratory of Cell Differentiation and Apoptosis of Chinese Ministry of Education, Shanghai Jiao Tong University, School of Medicine, Shanghai, China
- Bin Shao
- Microsoft Research AI4Science, Beijing, China
- Tie-Yan Liu
- Microsoft Research AI4Science, Beijing, China