1
Kovács DP, Moore JH, Browning NJ, Batatia I, Horton JT, Pu Y, Kapil V, Witt WC, Magdău IB, Cole DJ, Csányi G. MACE-OFF: Short-Range Transferable Machine Learning Force Fields for Organic Molecules. J Am Chem Soc 2025; 147:17598-17611. [PMID: 40387214 PMCID: PMC12123624 DOI: 10.1021/jacs.4c07099] [Received: 05/24/2024] [Revised: 04/30/2025] [Accepted: 05/02/2025] [Indexed: 05/20/2025]
Abstract
Classical empirical force fields have dominated biomolecular simulations for over 50 years. Although widely used in drug discovery, crystal structure prediction, and biomolecular dynamics, they generally lack the accuracy and transferability required for first-principles predictive modeling. In this paper, we introduce MACE-OFF, a series of short-range transferable force fields for organic molecules created using state-of-the-art machine learning technology and first-principles reference data computed with a high level of quantum mechanical theory. MACE-OFF demonstrates the remarkable capabilities of short-range models by accurately predicting a wide variety of gas- and condensed-phase properties of molecular systems. It produces accurate, easy-to-converge dihedral torsion scans of unseen molecules as well as reliable descriptions of molecular crystals and liquids, including quantum nuclear effects. We further demonstrate the capabilities of MACE-OFF by determining free energy surfaces in explicit solvent as well as the folding dynamics of peptides and nanosecond simulations of a fully solvated protein. These developments enable first-principles simulations of molecular systems for the broader chemistry community at high accuracy and relatively low computational cost.
Affiliation(s)
- J. Harry Moore
  - Engineering Laboratory, University of Cambridge, Cambridge CB2 1PZ, U.K.
  - Ångström AI, 2325 Third Street, San Francisco, California 94107, United States
- Ilyes Batatia
  - Engineering Laboratory, University of Cambridge, Cambridge CB2 1PZ, U.K.
- Joshua T. Horton
  - School of Natural and Environmental Sciences, Newcastle University, Newcastle upon Tyne NE1 7RU, U.K.
- Yixuan Pu
  - Department of Physics and Astronomy, University College, London WC1E 6BT, U.K.
- Venkat Kapil
  - Department of Physics and Astronomy, University College, London WC1E 6BT, U.K.
  - Yusuf Hamied Department of Chemistry, University of Cambridge, Lensfield Road, Cambridge CB2 1EW, U.K.
  - Thomas Young Centre and London Centre for Nanotechnology, London WC1E 6BT, U.K.
- William C. Witt
  - Department of Materials Science and Metallurgy, University of Cambridge, 27 Charles Babbage Road, Cambridge CB3 0FS, U.K.
- Ioan-Bogdan Magdău
  - School of Natural and Environmental Sciences, Newcastle University, Newcastle upon Tyne NE1 7RU, U.K.
- Daniel J. Cole
  - School of Natural and Environmental Sciences, Newcastle University, Newcastle upon Tyne NE1 7RU, U.K.
- Gábor Csányi
  - Engineering Laboratory, University of Cambridge, Cambridge CB2 1PZ, U.K.
  - Ångström AI, 2325 Third Street, San Francisco, California 94107, United States
2
Della Pia F, Shi BX, Kapil V, Zen A, Alfè D, Michaelides A. Accurate and efficient machine learning interatomic potentials for finite temperature modelling of molecular crystals. Chem Sci 2025:d5sc01325a. [PMID: 40417296 PMCID: PMC12101462 DOI: 10.1039/d5sc01325a] [Received: 02/19/2025] [Accepted: 05/08/2025] [Indexed: 05/27/2025] Open
Abstract
As with many parts of the natural sciences, machine learning interatomic potentials (MLIPs) are revolutionizing the modelling of molecular crystals. However, challenges remain for the accurate and efficient calculation of sublimation enthalpies, a key thermodynamic quantity measuring the stability of a molecular crystal. Specifically, two key stumbling blocks are: (i) the need for thousands of ab initio quality reference structures to generate training data; and (ii) the sometimes unreliable nature of density functional theory, the main technique for generating such data. Exploiting recent developments in foundation models for chemistry and materials science alongside accurate quantum diffusion Monte Carlo benchmarks offers a promising path forward. Herein, we demonstrate the generation of MLIPs capable of describing molecular crystals at finite temperature and pressure with sub-chemical accuracy, using as few as ∼200 data structures; an order of magnitude improvement over the current state-of-the-art. We apply this framework to compute the sublimation enthalpies of the X23 dataset, accounting for anharmonicity and nuclear quantum effects, achieving sub-chemical accuracy with respect to experiment. Importantly, we show that our framework can be generalized to crystals of pharmaceutical relevance, including paracetamol and aspirin. Nuclear quantum effects are also accurately captured as shown for the case of squaric acid. By enabling accurate modelling at ambient conditions, this work paves the way for deeper insights into pharmaceutical and biological systems.
Affiliation(s)
- Flaviano Della Pia
  - Yusuf Hamied Department of Chemistry, University of Cambridge Cambridge CB2 1EW UK
- Benjamin X Shi
  - Yusuf Hamied Department of Chemistry, University of Cambridge Cambridge CB2 1EW UK
- Venkat Kapil
  - Yusuf Hamied Department of Chemistry, University of Cambridge Cambridge CB2 1EW UK
  - Department of Physics and Astronomy, University College London London UK
  - Thomas Young Centre and London Centre for Nanotechnology, University College London London WC1E 6BT UK
- Andrea Zen
  - Dipartimento di Fisica Ettore Pancini, Università di Napoli Federico II Monte S. Angelo I-80126 Napoli Italy
  - Department of Earth Sciences, University College London London WC1E 6BT UK
- Dario Alfè
  - Thomas Young Centre and London Centre for Nanotechnology, University College London London WC1E 6BT UK
  - Dipartimento di Fisica Ettore Pancini, Università di Napoli Federico II Monte S. Angelo I-80126 Napoli Italy
  - Department of Earth Sciences, University College London London WC1E 6BT UK
- Angelos Michaelides
  - Yusuf Hamied Department of Chemistry, University of Cambridge Cambridge CB2 1EW UK
3
Song G, Yang W. NepoIP/MM: Toward Accurate Biomolecular Simulation with a Machine Learning/Molecular Mechanics Model Incorporating Polarization Effects. J Chem Theory Comput 2025. [PMID: 40397856 DOI: 10.1021/acs.jctc.5c00372] [Indexed: 05/23/2025]
Abstract
Machine learning force fields offer the ability to simulate biomolecules with quantum mechanical accuracy while significantly reducing computational costs, attracting a growing amount of attention in biophysics. Meanwhile, by leveraging the efficiency of molecular mechanics in modeling solvent molecules and long-range interactions, a hybrid machine learning/molecular mechanics (ML/MM) model offers a more realistic approach to describing complex biomolecular systems in solution. However, multiscale models with electrostatic embedding require accounting for the polarization of the ML region induced by the MM environment. To address this, we adapt the state-of-the-art NequIP architecture into a polarizable ML force field, NepoIP, enabling the modeling of polarization effects based on the external electrostatic potential. We found that the nanosecond MD simulations based on NepoIP/MM are stable for the periodic solvated dipeptide system, and the converged sampling shows excellent agreement with the reference QM/MM level. Moreover, we show that a single NepoIP model can be transferable across different MM force fields, as well as an extremely different MM environment of water and proteins, laying the foundation for developing a general ML biomolecular force field to be used in ML/MM with electrostatic embedding.
Affiliation(s)
- Ge Song
  - Department of Chemistry, Duke University, Durham, North Carolina 27708, United States
- Weitao Yang
  - Department of Chemistry and Department of Physics, Duke University, Durham, North Carolina 27708, United States
4
Xia J, Zhang Y, Jiang B. The evolution of machine learning potentials for molecules, reactions and materials. Chem Soc Rev 2025; 54:4790-4821. [PMID: 40227021 DOI: 10.1039/d5cs00104h] [Indexed: 04/15/2025]
Abstract
Recent years have witnessed the fast development of machine learning potentials (MLPs) and their widespread applications in chemistry, physics, and materials science. By fitting discrete ab initio data faithfully to continuous and symmetry-preserving mathematical forms, MLPs have enabled accurate and efficient atomistic simulations on a large scale from first principles. In this review, we provide an overview of the evolution of MLPs in the past two decades and focus on the state-of-the-art MLPs proposed in the last few years for molecules, reactions, and materials. We discuss some representative applications of MLPs and the trend of developing universal potentials across a variety of systems. Finally, we outline a list of open challenges and opportunities in the development and applications of MLPs.
Affiliation(s)
- Junfan Xia
  - State Key Laboratory of Precision and Intelligent Chemistry, University of Science and Technology of China, Hefei, Anhui 230026, China
  - School of Chemistry and Materials Science, Department of Chemical Physics, University of Science and Technology of China, Hefei, Anhui 230026, China
- Yaolong Zhang
  - Department of Chemistry and Chemical Biology, Center for Computational Chemistry, University of New Mexico, Albuquerque, New Mexico 87131, USA
- Bin Jiang
  - State Key Laboratory of Precision and Intelligent Chemistry, University of Science and Technology of China, Hefei, Anhui 230026, China
  - School of Chemistry and Materials Science, Department of Chemical Physics, University of Science and Technology of China, Hefei, Anhui 230026, China
  - Hefei National Laboratory, University of Science and Technology of China, Hefei, 230088, China
5
Wang L, Tricard N, Chen Z, Deng S. Progress in computational methods and mechanistic insights on the growth of carbon nanotubes. Nanoscale 2025; 17:11812-11863. [PMID: 40275725 DOI: 10.1039/d4nr05487c] [Indexed: 04/26/2025]
Abstract
Carbon nanotubes (CNTs), as a promising nanomaterial with broad applications across various fields, are continuously attracting significant research attention. Despite substantial progress in understanding their growth mechanisms, synthesis methods, and post-processing techniques, two major goals remain challenging: achieving property-targeted growth and efficient mass production. Recent advancements in computational methods, driven by increased computational resources, the development of platforms, and the refinement of theoretical models, have significantly deepened our understanding of the mechanisms underlying CNT growth. This review aims to comprehensively examine the latest computational techniques that shed light on various aspects of CNT synthesis. The first part of this review focuses on progress in computational methods. Beginning with atomistic simulation approaches, we introduce the fundamentals and advancements in density functional theory (DFT), molecular dynamics (MD) simulations, and kinetic Monte Carlo (kMC) simulations. We discuss the applicability and limitations of each method in studying mechanisms of CNT growth. Then, the focus shifts to multiscale modeling approaches, where we demonstrate the coupling of atomic-scale simulations with reactor-scale multiphase flow models. Given that CNT growth inherently spans multiple temporal and spatial scales, the development and application of multiscale modeling techniques are poised to become a central focus of future computational research in this field. Furthermore, this review emphasizes the growing role played by machine learning in CNT growth research. Compared with traditional physics-based simulation methods, data-driven machine learning approaches have rapidly emerged in recent years, revolutionizing research paradigms from molecular simulation to experimental design.
In the second part of this review, we highlight the latest advancements in CNT growth mechanisms and synthesis methods achieved through computational techniques. These include novel findings across fundamental growth stages, i.e., from nucleation to elongation and ultimately termination. We also examine the dynamic behaviors of catalyst nanoparticles and chirality-controlled growth processes, emphasizing how these insights contribute to advancing the field. Finally, in the concluding section, we propose future directions for advancements of computational approaches toward deeper understanding of CNT growth mechanisms and better support of CNT manufacturing.
Affiliation(s)
- Linzheng Wang
  - Department of Mechanical Engineering, Massachusetts Institute of Technology, Cambridge, 02139, MA, USA
- Nicolas Tricard
  - Department of Mechanical Engineering, Massachusetts Institute of Technology, Cambridge, 02139, MA, USA
- Zituo Chen
  - Department of Mechanical Engineering, Massachusetts Institute of Technology, Cambridge, 02139, MA, USA
- Sili Deng
  - Department of Mechanical Engineering, Massachusetts Institute of Technology, Cambridge, 02139, MA, USA
6
Vornweg JR, Maier TM, Jacob CR. The density-based many-body expansion for poly-peptides and proteins. Phys Chem Chem Phys 2025; 27:8719-8730. [PMID: 40235457 DOI: 10.1039/d5cp00727e] [Indexed: 04/17/2025]
Abstract
Fragmentation schemes enable the efficient quantum-chemical treatment of large biomolecular systems, and provide an ideal starting point for the development of accurate machine-learning potentials for proteins. Here, we present a fragment-based method that only uses calculations for single amino acids and their dimers, and is able to reduce the fragmentation error in total energies to ca. 1 kJ mol⁻¹ per amino acid for polypeptides and proteins across different structural motifs. This is achieved by combining a two-body extension of the molecular fractionation with conjugate caps (MFCC) scheme with the density-based many-body expansion (db-MBE), thus extending the applicability of the db-MBE from molecular clusters to polypeptides and proteins.
Affiliation(s)
- Johannes R Vornweg
  - Technische Universität Braunschweig, Institute of Physical and Theoretical Chemistry, Gaußstraße 17, 38106 Braunschweig, Germany
- Toni M Maier
  - Technische Universität Braunschweig, Institute of Physical and Theoretical Chemistry, Gaußstraße 17, 38106 Braunschweig, Germany
- Christoph R Jacob
  - Technische Universität Braunschweig, Institute of Physical and Theoretical Chemistry, Gaußstraße 17, 38106 Braunschweig, Germany
7
Anstine DM, Zubatyuk R, Isayev O. AIMNet2: a neural network potential to meet your neutral, charged, organic, and elemental-organic needs. Chem Sci 2025:d4sc08572h. [PMID: 40342914 PMCID: PMC12057637 DOI: 10.1039/d4sc08572h] [Received: 12/19/2024] [Accepted: 04/21/2025] [Indexed: 05/11/2025] Open
Abstract
Machine learned interatomic potentials (MLIPs) are reshaping computational chemistry practices because of their ability to drastically exceed the accuracy-length/time scale tradeoff. Despite this attraction, the benefits of such efficiency are only impactful when an MLIP uniquely enables insight into a target system or is broadly transferable outside of the training dataset. In this work, we present the 2nd generation of our atoms-in-molecules neural network potential (AIMNet2), which is applicable to species composed of up to 14 chemical elements in both neutral and charged states, making it a valuable method for modeling the majority of non-metallic compounds. Using an exhaustive dataset of 2 × 10⁷ hybrid DFT level of theory quantum chemical calculations, AIMNet2 combines ML-parameterized short-range and physics-based long-range terms to attain generalizability that reaches from simple organics to diverse molecules with "exotic" element-organic bonding. We show that AIMNet2 outperforms semi-empirical GFN2-xTB and is on par with reference density functional theory for interaction energy contributions, conformer search tasks, torsion rotation profiles, and molecular-to-macromolecular geometry optimization. Overall, the demonstrated chemical coverage and computational efficiency of AIMNet2 is a significant step toward providing access to MLIPs that avoid the crucial limitation of curating additional quantum chemical data and retraining with each new application.
Affiliation(s)
- Dylan M Anstine
  - Department of Chemistry, Mellon College of Science, Carnegie Mellon University Pittsburgh Pennsylvania 15213 USA
- Roman Zubatyuk
  - Department of Chemistry, Mellon College of Science, Carnegie Mellon University Pittsburgh Pennsylvania 15213 USA
- Olexandr Isayev
  - Department of Chemistry, Mellon College of Science, Carnegie Mellon University Pittsburgh Pennsylvania 15213 USA
8
Tiefenbacher MX, Bachmair B, Chen CG, Westermayr J, Marquetand P, Dietschreit JCB, González L. Excited-state nonadiabatic dynamics in explicit solvent using machine learned interatomic potentials. Digital Discovery 2025:d5dd00044k. [PMID: 40352439 PMCID: PMC12060776 DOI: 10.1039/d5dd00044k] [Received: 01/30/2025] [Accepted: 04/11/2025] [Indexed: 05/14/2025]
Abstract
Excited-state nonadiabatic simulations with quantum mechanics/molecular mechanics (QM/MM) are essential to understand photoinduced processes in explicit environments. However, the high computational cost of the underlying quantum chemical calculations limits their application in combination with trajectory surface hopping methods. Here, we use FieldSchNet, a machine-learned interatomic potential capable of incorporating electric field effects into the electronic states, to replace traditional QM/MM electrostatic embedding with its ML/MM counterpart for nonadiabatic excited state trajectories. The developed method is applied to furan in water, including five coupled singlet states. Our results demonstrate that with sufficiently curated training data, the ML/MM model reproduces the electronic kinetics and structural rearrangements of QM/MM surface hopping reference simulations. Furthermore, we identify performance metrics that provide robust and interpretable validation of model accuracy.
Affiliation(s)
- Maximilian X Tiefenbacher
  - Research Platform on Accelerating Photoreaction Discovery (ViRAPID), University of Vienna Währinger Straße 17 1090 Vienna Austria
  - Vienna Doctoral School in Chemistry, University of Vienna Währinger Straße 42 1090 Vienna Austria
- Brigitta Bachmair
  - Research Platform on Accelerating Photoreaction Discovery (ViRAPID), University of Vienna Währinger Straße 17 1090 Vienna Austria
  - Vienna Doctoral School in Chemistry, University of Vienna Währinger Straße 42 1090 Vienna Austria
  - Institute of Theoretical Chemistry, Faculty of Chemistry, University of Vienna Währinger Straße 17 1090 Vienna Austria
- Cheng Giuseppe Chen
  - Institute of Theoretical Chemistry, Faculty of Chemistry, University of Vienna Währinger Straße 17 1090 Vienna Austria
  - Department of Chemistry, Sapienza University of Rome Piazzale Aldo Moro, 5 Rome 00185 Italy
- Julia Westermayr
  - Wilhelm-Ostwald Institute, University of Leipzig Linnéstraße 2 04103 Leipzig Germany
  - Center for Scalable Data Analytics and Artificial Intelligence (ScaDS.AI) Dresden/Leipzig, Humboldtstraße 25 04105 Leipzig Germany
- Philipp Marquetand
  - Research Platform on Accelerating Photoreaction Discovery (ViRAPID), University of Vienna Währinger Straße 17 1090 Vienna Austria
  - Institute of Theoretical Chemistry, Faculty of Chemistry, University of Vienna Währinger Straße 17 1090 Vienna Austria
- Johannes C B Dietschreit
  - Institute of Theoretical Chemistry, Faculty of Chemistry, University of Vienna Währinger Straße 17 1090 Vienna Austria
- Leticia González
  - Research Platform on Accelerating Photoreaction Discovery (ViRAPID), University of Vienna Währinger Straße 17 1090 Vienna Austria
  - Institute of Theoretical Chemistry, Faculty of Chemistry, University of Vienna Währinger Straße 17 1090 Vienna Austria
9
Bonollo G, Trèves G, Komarov D, Mansoor S, Moroni E, Colombo G. Advancing Molecular Simulations: Merging Physical Models, Experiments, and AI to Tackle Multiscale Complexity. J Phys Chem Lett 2025; 16:3606-3615. [PMID: 40179097 PMCID: PMC12010417 DOI: 10.1021/acs.jpclett.5c00652] [Received: 03/03/2025] [Revised: 03/28/2025] [Accepted: 04/01/2025] [Indexed: 04/05/2025]
Abstract
Proteins and protein complexes form adaptable networks that regulate essential biochemical pathways and define cell phenotypes through dynamic mechanisms and interactions. Advances in structural biology and molecular simulations have revealed how protein systems respond to changes in their environments, such as ligand binding, stress conditions, or perturbations like mutations and post-translational modifications, influencing signal transduction and cellular phenotypes. Here, we discuss how computational approaches, ranging from molecular dynamics (MD) simulations to AI-driven methods, are instrumental in studying protein dynamics from isolated molecules to large assemblies. These techniques elucidate conformational landscapes, ligand-binding mechanisms, and protein-protein interactions and are starting to support the construction of multiscale realistic representations of highly complex systems, ranging up to whole cell models. With cryo-electron microscopy, cryo-electron tomography, and AlphaFold accelerating the structural characterization of protein networks, we suggest that integrating AI and Machine Learning with multiscale MD methods will enhance fundamental understanding for systems of ever-increasing complexity, ushering in exciting possibilities for predictive modeling of the behavior of cell compartments or even whole cells. These advances are indeed transforming biophysics and chemical biology, offering new opportunities to study biomolecular mechanisms at atomic resolution.
Affiliation(s)
- Giorgio Bonollo
  - Department of Chemistry, University of Pavia, via Taramelli 12, 27100 Pavia, Italy
- Gauthier Trèves
  - Department of Chemistry, University of Pavia, via Taramelli 12, 27100 Pavia, Italy
- Denis Komarov
  - Department of Chemistry, University of Pavia, via Taramelli 12, 27100 Pavia, Italy
- Samman Mansoor
  - Department of Chemistry, University of Pavia, via Taramelli 12, 27100 Pavia, Italy
- Elisabetta Moroni
  - National Research Council of Italy (CNR) - Institute of Chemical Sciences and Technologies (SCITEC), via Mario Bianco 9, 20131 Milano, Italy
- Giorgio Colombo
  - Department of Chemistry, University of Pavia, via Taramelli 12, 27100 Pavia, Italy
10
Zajac JWP, Muralikrishnan P, Tohidian I, Zeng X, Heldt CL, Perry SL, Sarupria S. Flipping out: role of arginine in hydrophobic interactions and biological formulation design. Chem Sci 2025; 16:6780-6792. [PMID: 40110519 PMCID: PMC11915020 DOI: 10.1039/d4sc08672d] [Received: 12/23/2024] [Accepted: 03/09/2025] [Indexed: 03/22/2025] Open
Abstract
Arginine has been a mainstay in biological formulation development for decades. To date, the way arginine modulates protein stability has been widely studied and debated. Here, we employed a hydrophobic polymer to decouple hydrophobic effects from other interactions relevant to protein folding. While existing hypotheses for the effects of arginine can generally be categorized as either direct or indirect, our results indicate that direct and indirect mechanisms of arginine co-exist and oppose each other. At low concentrations, arginine was observed to stabilize hydrophobic polymer folding via a sidechain-dominated direct mechanism, while at high concentrations, arginine stabilized polymer folding via a backbone-dominated indirect mechanism. Upon introducing partially charged polymer sites, arginine destabilized polymer folding. Further, we found arginine-induced destabilization of a model virus similar to direct-mechanism destabilization of the charged polymer and concentration-dependent stabilization of a model protein similar to the indirect mechanism of hydrophobic polymer stabilization. These findings highlight the modular nature of the widely used additive arginine, with relevance in the information-driven design of stable biological formulations.
Affiliation(s)
- Jonathan W P Zajac
  - Department of Chemistry, University of Minnesota Minneapolis MN 55455 USA
  - Chemical Theory Center, University of Minnesota Minneapolis MN 55455 USA
- Praveen Muralikrishnan
  - Department of Chemical Engineering and Materials Science, University of Minnesota Minneapolis MN 55455 USA
  - Chemical Theory Center, University of Minnesota Minneapolis MN 55455 USA
- Idris Tohidian
  - Department of Chemical Engineering, Michigan Technological University Houghton MI 49931 USA
- Xianci Zeng
  - Department of Chemical Engineering, University of Massachusetts Amherst MA 01003 USA
- Caryn L Heldt
  - Department of Chemical Engineering, Michigan Technological University Houghton MI 49931 USA
- Sarah L Perry
  - Department of Chemical Engineering, University of Massachusetts Amherst MA 01003 USA
- Sapna Sarupria
  - Department of Chemistry, University of Minnesota Minneapolis MN 55455 USA
  - Chemical Theory Center, University of Minnesota Minneapolis MN 55455 USA
11
Eberhart ME, Alexandrova AN, Ajmera P, Bím D, Chaturvedi SS, Vargas S, Wilson TR. Methods for Theoretical Treatment of Local Fields in Proteins and Enzymes. Chem Rev 2025; 125:3772-3813. [PMID: 39993955 DOI: 10.1021/acs.chemrev.4c00471] [Indexed: 02/26/2025]
Abstract
Electric fields generated by protein scaffolds are crucial in enzymatic catalysis. This review surveys theoretical approaches for detecting, analyzing, and comparing electric fields, electrostatic potentials, and their effects on the charge density within enzyme active sites. Pioneering methods like the empirical valence bond approach rely on evaluating ionic and covalent resonance forms influenced by the field. Strategies employing polarizable force fields also facilitate field detection. The vibrational Stark effect connects computational simulations to experimental Stark spectroscopy, enabling direct comparisons. We highlight how protein dynamics induce fluctuations in local fields, influencing enzyme activity. Recent techniques assess electric fields throughout the active site volume rather than only at specific bonds, and machine learning helps relate these global fields to reactivity. Quantum theory of atoms in molecules captures the entire electron density landscape, providing a chemically intuitive perspective on field-driven catalysis. Overall, these methodologies show protein-generated fields are highly dynamic and heterogeneous, and understanding both aspects is critical for elucidating enzyme mechanisms. This holistic view empowers rational enzyme engineering by tuning electric fields, promising new avenues in drug design, biocatalysis, and industrial applications. Future directions include incorporating electric fields as explicit design targets to enhance catalytic performance and biochemical functionalities.
Affiliation(s)
- Mark E Eberhart
  - Chemistry Department, Colorado School of Mines, 1500 Illinois Street, Golden, Colorado 80401, United States
- Anastassia N Alexandrova
  - Department of Chemistry and Biochemistry, University of California, Los Angeles, Los Angeles, California 90095, United States
- Pujan Ajmera
  - Department of Chemistry and Biochemistry, University of California, Los Angeles, Los Angeles, California 90095, United States
- Daniel Bím
  - Department of Physical Chemistry, University of Chemistry and Technology, Prague 166 28, Czech Republic
- Shobhit S Chaturvedi
  - Department of Chemistry and Biochemistry, University of California, Los Angeles, Los Angeles, California 90095, United States
- Santiago Vargas
  - Department of Chemistry and Biochemistry, University of California, Los Angeles, Los Angeles, California 90095, United States
- Timothy R Wilson
  - Chemistry Department, Colorado School of Mines, 1500 Illinois Street, Golden, Colorado 80401, United States
|
12
Aranganathan A, Gu X, Wang D, Vani BP, Tiwary P. Modeling Boltzmann-weighted structural ensembles of proteins using artificial intelligence-based methods. Curr Opin Struct Biol 2025; 91:103000. [PMID: 39923288 PMCID: PMC12011212 DOI: 10.1016/j.sbi.2025.103000] [Received: 09/23/2024] [Revised: 01/09/2025] [Accepted: 01/20/2025] [Indexed: 02/11/2025]
Abstract
This review highlights recent advances in AI-driven methods for generating Boltzmann-weighted structural ensembles, which are crucial for understanding biomolecular dynamics and drug discovery. With the rise of deep learning models such as AlphaFold2, there has been a shift toward more accurate and efficient sampling of structural ensembles. The review discusses the integration of AI with traditional molecular dynamics techniques as well as experiments, the challenges of conformational sampling, and future directions for AI-driven research in structural biology, particularly in drug discovery and protein dynamics.
Affiliation(s)
- Akashnathan Aranganathan
- Biophysics Program, University of Maryland, College Park, 20742, MD, USA; Institute of Physical Science and Technology, University of Maryland, College Park, 20742, MD, USA
| | - Xinyu Gu
- Institute of Physical Science and Technology, University of Maryland, College Park, 20742, MD, USA; University of Maryland Institute for Health Computing, Bethesda, 20852, MD, USA.
| | - Dedi Wang
- Genentech, 1 DNA Way, South San Francisco, 94080, CA, USA
| | - Bodhi P Vani
- Genentech, 1 DNA Way, South San Francisco, 94080, CA, USA
| | - Pratyush Tiwary
- Institute of Physical Science and Technology, University of Maryland, College Park, 20742, MD, USA; University of Maryland Institute for Health Computing, Bethesda, 20852, MD, USA; Department of Chemistry and Biochemistry, University of Maryland, College Park, 20742, MD, USA.
|
13
|
Kuryla D, Csányi G, van Duin ACT, Michaelides A. Efficient exploration of reaction pathways using reaction databases and active learning. J Chem Phys 2025; 162:114122. [PMID: 40116310 DOI: 10.1063/5.0235715] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/29/2024] [Accepted: 02/24/2025] [Indexed: 03/23/2025] Open
Abstract
The fast and accurate simulation of chemical reactions is a major goal of computational chemistry. Recently, the pursuit of this goal has been aided by machine learning interatomic potentials (MLIPs), which provide energies and forces at quantum mechanical accuracy but at a fraction of the cost of the reference quantum mechanical calculations. Assembling the training set of relevant configurations is key to building the MLIP. Here, we demonstrate two approaches to training reactive MLIPs based on reaction pathway information. One approach exploits reaction datasets containing reactant, product, and transition state structures. Using an SN2 reaction dataset, we accurately locate reaction pathways and transition state geometries of up to 170 unseen reactions. In another approach, which does not depend on data availability, we present an efficient active learning procedure that yields an accurate MLIP and converged minimum energy path given only the reaction end point structures, avoiding quantum mechanics driven reaction pathway search at any stage of training set construction. We demonstrate this procedure on an SN2 reaction in the gas phase and with a small number of solvating water molecules, predicting reaction barriers within 20 meV of the reference quantum chemistry method. We then apply the active learning procedure on a more complex reaction involving a nucleophilic aromatic substitution and proton transfer, comparing the results against the reactive ReaxFF force field. Our active learning procedure, in addition to rapidly finding reaction paths for individual reactions, provides an approach to building large reaction path databases for training transferable reactive machine learning potentials.
Affiliation(s)
- Domantas Kuryla
- Yusuf Hamied Department of Chemistry, University of Cambridge, Lensfield Road, Cambridge, United Kingdom
| | - Gábor Csányi
- Engineering Laboratory, University of Cambridge, Trumpington St and JJ Thomson Ave, Cambridge, United Kingdom
| | - Adri C T van Duin
- Department of Mechanical Engineering, The Pennsylvania State University, University Park, Pennsylvania 16802, USA
| | - Angelos Michaelides
- Yusuf Hamied Department of Chemistry, University of Cambridge, Lensfield Road, Cambridge, United Kingdom
|
14
|
Cui Q. Machine learning in molecular biophysics: Protein allostery, multi-level free energy simulations, and lipid phase transitions. Biophysics Reviews 2025; 6:011305. [PMID: 39957913 PMCID: PMC11825181 DOI: 10.1063/5.0248589] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/12/2024] [Accepted: 01/14/2025] [Indexed: 02/18/2025]
Abstract
Machine learning (ML) techniques have been making major impacts on all areas of science and engineering, including biophysics. In this review, we discuss several applications of ML to biophysical problems based on our recent research. The topics include the use of ML techniques to identify hotspot residues in allosteric proteins using deep mutational scanning data and to analyze how mutations of these hotspots perturb co-operativity in the framework of a statistical thermodynamic model, to improve the accuracy of free energy simulations by integrating data from different levels of potential energy functions, and to determine the phase transition temperature of lipid membranes. Through these examples, we illustrate the unique value of ML in extracting patterns or parameters from complex data sets, as well as the remaining limitations. By implementing the ML approaches in the context of physically motivated models or computational frameworks, we are able to gain a deeper mechanistic understanding or better convergence in numerical simulations. We conclude by briefly discussing how the introduced models can be further expanded to tackle more complex problems.
Affiliation(s)
- Qiang Cui
- Author to whom correspondence should be addressed:
|
15
|
Poltavsky I, Puleva M, Charkin-Gorbulin A, Fonseca G, Batatia I, Browning NJ, Chmiela S, Cui M, Frank JT, Heinen S, Huang B, Käser S, Kabylda A, Khan D, Müller C, Price AJA, Riedmiller K, Töpfer K, Ko TW, Meuwly M, Rupp M, Csányi G, Anatole von Lilienfeld O, Margraf JT, Müller KR, Tkatchenko A. Crash testing machine learning force fields for molecules, materials, and interfaces: molecular dynamics in the TEA challenge 2023. Chem Sci 2025; 16:3738-3754. [PMID: 39911337 PMCID: PMC11791520 DOI: 10.1039/d4sc06530a] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/26/2024] [Accepted: 12/25/2024] [Indexed: 02/07/2025] Open
Abstract
We present the second part of the rigorous evaluation of modern machine learning force fields (MLFFs) within the TEA Challenge 2023. This study provides an in-depth analysis of the performance of MACE, SO3krates, sGDML, SOAP/GAP, and FCHL19* in modeling molecules, molecule-surface interfaces, and periodic materials. We compare observables obtained from molecular dynamics (MD) simulations using different MLFFs under identical conditions. Where applicable, density-functional theory (DFT) or experiment serves as a reference to reliably assess the performance of the ML models. In the absence of DFT benchmarks, we conduct a comparative analysis based on results from various MLFF architectures. Our findings indicate that, at the current stage of MLFF development, the choice of ML model is in the hands of the practitioner. When a problem falls within the scope of a given MLFF architecture, the resulting simulations exhibit weak dependency on the specific architecture used. Instead, emphasis should be placed on developing complete, reliable, and representative training datasets. Nonetheless, long-range noncovalent interactions remain challenging for all MLFF models, necessitating special caution in simulations of physical systems where such interactions are prominent, such as molecule-surface interfaces. The findings presented here reflect the state of MLFF models as of October 2023.
Affiliation(s)
- Igor Poltavsky
- Department of Physics and Materials Science, University of Luxembourg L-1511 Luxembourg Luxembourg
| | - Mirela Puleva
- Department of Physics and Materials Science, University of Luxembourg L-1511 Luxembourg Luxembourg
- Institute for Advanced Studies, University of Luxembourg Campus Belval L-4365 Esch-sur-Alzette Luxembourg
| | - Anton Charkin-Gorbulin
- Department of Physics and Materials Science, University of Luxembourg L-1511 Luxembourg Luxembourg
- Laboratory for Chemistry of Novel Materials, University of Mons B-7000 Mons Belgium
| | - Grégory Fonseca
- Department of Physics and Materials Science, University of Luxembourg L-1511 Luxembourg Luxembourg
| | - Ilyes Batatia
- Department of Engineering, University of Cambridge Trumpington Street Cambridge CB2 1PZ UK
| | - Stefan Chmiela
- Machine Learning Group, Technical University Berlin Berlin Germany
- BIFOLD, Berlin Institute for the Foundations of Learning and Data Berlin Germany
| | - Mengnan Cui
- Fritz-Haber-Institut der Max-Planck-Gesellschaft Berlin Germany
| | - J Thorben Frank
- Machine Learning Group, Technical University Berlin Berlin Germany
- BIFOLD, Berlin Institute for the Foundations of Learning and Data Berlin Germany
| | - Stefan Heinen
- Vector Institute for Artificial Intelligence Toronto ON M5S 1M1 Canada
| | - Bing Huang
- Wuhan University, Department of Chemistry and Molecular Sciences 430072 Wuhan China
| | - Silvan Käser
- Department of Chemistry, University of Basel Klingelbergstrasse 80 CH-4056 Basel Switzerland
| | - Adil Kabylda
- Department of Physics and Materials Science, University of Luxembourg L-1511 Luxembourg Luxembourg
| | - Danish Khan
- Vector Institute for Artificial Intelligence Toronto ON M5S 1M1 Canada
- Chemical Physics Theory Group, Department of Chemistry, University of Toronto St. George Campus Toronto ON Canada
| | - Carolin Müller
- Friedrich-Alexander-Universität Erlangen-Nürnberg, Computer-Chemistry-Center Nägelsbachstraße 25 91052 Erlangen Germany
| | - Alastair J A Price
- Department of Chemistry, University of Toronto St. George campus Toronto ON Canada
- Acceleration Consortium, University of Toronto 80 St George St Toronto ON M5S 3H6 Canada
| | - Kai Riedmiller
- Heidelberg Institute for Theoretical Studies Heidelberg Germany
| | - Kai Töpfer
- Department of Chemistry, University of Basel Klingelbergstrasse 80 CH-4056 Basel Switzerland
| | - Tsz Wai Ko
- Department of NanoEngineering, University of California San Diego 9500 Gilman Dr, Mail Code 0448 La Jolla CA 92093-0448 USA
| | - Markus Meuwly
- Department of Chemistry, University of Basel Klingelbergstrasse 80 CH-4056 Basel Switzerland
| | - Matthias Rupp
- Luxembourg Institute of Science and Technology (LIST) L-4362 Esch-sur-Alzette Luxembourg
| | - Gábor Csányi
- Department of Engineering, University of Cambridge Trumpington Street Cambridge CB2 1PZ UK
| | - O Anatole von Lilienfeld
- Machine Learning Group, Technical University Berlin Berlin Germany
- BIFOLD, Berlin Institute for the Foundations of Learning and Data Berlin Germany
- Vector Institute for Artificial Intelligence Toronto ON M5S 1M1 Canada
- Department of Chemistry, University of Toronto St. George campus Toronto ON Canada
- Acceleration Consortium, University of Toronto 80 St George St Toronto ON M5S 3H6 Canada
- Department of Materials Science and Engineering, University of Toronto St. George campus Toronto ON Canada
- Department of Physics, University of Toronto, St. George campus Toronto ON Canada
| | - Johannes T Margraf
- University of Bayreuth, Bavarian Center for Battery Technology (BayBatt) Bayreuth Germany
| | - Klaus-Robert Müller
- Machine Learning Group, Technical University Berlin Berlin Germany
- BIFOLD, Berlin Institute for the Foundations of Learning and Data Berlin Germany
- University of Bayreuth, Bavarian Center for Battery Technology (BayBatt) Bayreuth Germany
- Department of Artificial Intelligence, Korea University Seoul South Korea
- Max Planck Institut für Informatik Saarbrücken Germany
- Google DeepMind Berlin Germany
| | - Alexandre Tkatchenko
- Department of Physics and Materials Science, University of Luxembourg L-1511 Luxembourg Luxembourg
- Institute for Advanced Studies, University of Luxembourg Campus Belval L-4365 Esch-sur-Alzette Luxembourg
|
16
|
Seute L, Hartmann E, Stühmer J, Gräter F. Grappa - a machine learned molecular mechanics force field. Chem Sci 2025; 16:2907-2930. [PMID: 39822899 PMCID: PMC11734696 DOI: 10.1039/d4sc05465b] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/14/2024] [Accepted: 12/13/2024] [Indexed: 01/19/2025] Open
Abstract
Simulating large molecular systems over long timescales requires force fields that are both accurate and efficient. In recent years, E(3) equivariant neural networks have lifted the tension between computational efficiency and accuracy of force fields, but they are still several orders of magnitude more expensive than established molecular mechanics (MM) force fields. Here, we propose Grappa, a machine learning framework to predict MM parameters from the molecular graph, employing a graph attentional neural network and a transformer with symmetry-preserving positional encoding. The resulting Grappa force field outperforms tabulated and machine-learned MM force fields in terms of accuracy at the same computational efficiency and can be used in existing Molecular Dynamics (MD) engines like GROMACS and OpenMM. It predicts energies and forces of small molecules, peptides, and RNA at state-of-the-art MM accuracy, while also reproducing experimentally measured values for J-couplings. With its simple input features and high data-efficiency, Grappa is well suited for extensions to uncharted regions of chemical space, which we show on the example of peptide radicals. We demonstrate Grappa's transferability to macromolecules in MD simulations from a small fast-folding protein up to a whole virus particle. Our force field sets the stage for biomolecular simulations closer to chemical accuracy, but with the same computational cost as established protein force fields.
Affiliation(s)
- Leif Seute
- Heidelberg Institute for Theoretical Studies Schloss-Wolfsbrunnenweg 35 69118 Heidelberg Germany
- Interdisciplinary Center for Scientific Computing, Heidelberg University INF 205 69120 Heidelberg Germany
| | - Eric Hartmann
- Heidelberg Institute for Theoretical Studies Schloss-Wolfsbrunnenweg 35 69118 Heidelberg Germany
- Interdisciplinary Center for Scientific Computing, Heidelberg University INF 205 69120 Heidelberg Germany
| | - Jan Stühmer
- Heidelberg Institute for Theoretical Studies Schloss-Wolfsbrunnenweg 35 69118 Heidelberg Germany
- Institute for Anthropomatics and Robotics, Karlsruhe Institute of Technology Kaiserstr. 12 76131 Karlsruhe Germany
| | - Frauke Gräter
- Heidelberg Institute for Theoretical Studies Schloss-Wolfsbrunnenweg 35 69118 Heidelberg Germany
- Interdisciplinary Center for Scientific Computing, Heidelberg University INF 205 69120 Heidelberg Germany
- Max Planck Institute for Polymer Research Ackermannweg 10 55128 Mainz Germany
|
17
|
David R, de la Puente M, Gomez A, Anton O, Stirnemann G, Laage D. ArcaNN: automated enhanced sampling generation of training sets for chemically reactive machine learning interatomic potentials. Digital Discovery 2025; 4:54-72. [PMID: 39553851 PMCID: PMC11563209 DOI: 10.1039/d4dd00209a] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/30/2024] [Accepted: 10/21/2024] [Indexed: 11/19/2024]
Abstract
The emergence of artificial intelligence is profoundly impacting computational chemistry, particularly through machine-learning interatomic potentials (MLIPs). Unlike traditional potential energy surface representations, MLIPs overcome the conventional computational scaling limitations by offering an effective combination of accuracy and efficiency for calculating atomic energies and forces to be used in molecular simulations. These MLIPs have significantly enhanced molecular simulations across various applications, including large-scale simulations of materials, interfaces, chemical reactions, and beyond. Despite these advances, the construction of training datasets-a critical component for the accuracy of MLIPs-has not received proportional attention, especially in the context of chemical reactivity, which depends on rare barrier-crossing events that are not easily included in the datasets. Here we address this gap by introducing ArcaNN, a comprehensive framework designed for generating training datasets for reactive MLIPs. ArcaNN employs a concurrent learning approach combined with advanced sampling techniques to ensure an accurate representation of high-energy geometries. The framework integrates automated processes for iterative training, exploration, new configuration selection, and energy and force labeling, all while ensuring reproducibility and documentation. We demonstrate ArcaNN's capabilities through two paradigm reactions: a nucleophilic substitution and a Diels-Alder reaction. These examples showcase its effectiveness, the uniformly low error of the resulting MLIP everywhere along the chemical reaction coordinate, and its potential for broad applications in reactive molecular dynamics. Finally, we provide guidelines for assessing the quality of MLIPs in reactive systems.
Affiliation(s)
- Rolf David
- PASTEUR, Département de Chimie, École Normale Supérieure, PSL University, Sorbonne Université, CNRS 75005 Paris France
| | - Miguel de la Puente
- PASTEUR, Département de Chimie, École Normale Supérieure, PSL University, Sorbonne Université, CNRS 75005 Paris France
| | - Axel Gomez
- PASTEUR, Département de Chimie, École Normale Supérieure, PSL University, Sorbonne Université, CNRS 75005 Paris France
| | - Olaia Anton
- PASTEUR, Département de Chimie, École Normale Supérieure, PSL University, Sorbonne Université, CNRS 75005 Paris France
| | - Guillaume Stirnemann
- PASTEUR, Département de Chimie, École Normale Supérieure, PSL University, Sorbonne Université, CNRS 75005 Paris France
| | - Damien Laage
- PASTEUR, Département de Chimie, École Normale Supérieure, PSL University, Sorbonne Université, CNRS 75005 Paris France
|
18
|
Rufa D, Fass J, Chodera JD. Fine-tuning molecular mechanics force fields to experimental free energy measurements. bioRxiv 2025:2025.01.06.631610. [PMID: 39829785 PMCID: PMC11741335 DOI: 10.1101/2025.01.06.631610] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/22/2025]
Abstract
Alchemical free energy methods using molecular mechanics (MM) force fields are essential tools for predicting thermodynamic properties of small molecules, especially via free energy calculations that can estimate quantities relevant for drug discovery such as affinities, selectivities, the impact of target mutations, and ADMET properties. While traditional MM force fields rely on hand-crafted, discrete atom types and parameters, modern approaches based on graph neural networks (GNNs) learn continuous embedding vectors that represent chemical environments from which MM parameters can be generated. Excitingly, GNN parameterization approaches provide a fully end-to-end differentiable model that offers the possibility of systematically improving these models using experimental data. In this study, we treat a pretrained GNN force field-here, espaloma-0.3.2-as a foundation simulation model and fine-tune its charge model using limited quantities of experimental hydration free energy data, with the goal of assessing the degree to which this can systematically improve the prediction of other related free energies. We demonstrate that a highly efficient "one-shot fine-tuning" method using an exponential (Zwanzig) reweighting free energy estimator can improve prediction accuracy without the need to resimulate molecular configurations. To achieve this "one-shot" improvement, we demonstrate the importance of using effective sample size (ESS) regularization strategies to retain good overlap between initial and fine-tuned force fields. Moreover, we show that leveraging low-rank projections of embedding vectors can achieve accuracy improvements comparable to those of higher-dimensional approaches in a variety of data-size regimes. Our results demonstrate that linearly-perturbative fine-tuning of foundation model electrostatic parameters to limited experimental data offers a cost-effective strategy that achieves state-of-the-art performance in predicting hydration free energies on the FreeSolv dataset.
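Editorial illustration (not taken from the paper): the abstract above names two quantities, the exponential (Zwanzig) reweighting estimator and the effective sample size (ESS). A minimal sketch of both follows, assuming reduced (unitless, in kT) energy differences between the fine-tuned and initial force fields evaluated on configurations sampled with the initial one. The function names are hypothetical, and Kish's formula is assumed for the ESS; the paper may define or regularize it differently.

```python
import numpy as np

def zwanzig_delta_f(delta_u):
    """Zwanzig estimate of the free energy change, in kT.

    delta_u: reduced energy differences u_new(x) - u_old(x) for samples x
    drawn from the old ensemble. Returns -ln <exp(-delta_u)>_old.
    """
    du = np.asarray(delta_u, dtype=float)
    shift = du.min()  # subtract the minimum so the exponentials cannot overflow
    return shift - np.log(np.mean(np.exp(-(du - shift))))

def kish_ess(delta_u):
    """Kish effective sample size of the reweighting weights exp(-delta_u).

    Equals the number of samples when all weights are equal and approaches 1
    when a single sample dominates (poor overlap between the two ensembles).
    """
    du = np.asarray(delta_u, dtype=float)
    w = np.exp(-(du - du.min()))  # weights up to a constant factor
    return w.sum() ** 2 / np.sum(w ** 2)
```

A low ESS relative to the number of samples signals that the perturbed force field has drifted too far from the sampled ensemble for the one-shot estimate to be trusted, which is the motivation the abstract gives for regularizing on it.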
Affiliation(s)
- Dominic Rufa
- Computational and Systems Biology Program, Sloan Kettering Institute, Memorial Sloan Kettering Cancer Center, New York, NY 10065
- Tri-Institutional PhD Program in Chemical Biology, Weill Cornell Graduate School of Medical Sciences, New York, NY 10065, USA
| | - Joshua Fass
- Computation, Relay Therapeutics, Cambridge, Massachusetts 02139, United States
| | - John D. Chodera
- Computational and Systems Biology Program, Sloan Kettering Institute, Memorial Sloan Kettering Cancer Center, New York, NY 10065
|
19
|
Babaei M, Sadeghi A. On machine learnability of local contributions to interatomic potentials from density functional theory calculations. Sci Rep 2024; 14:31395. [PMID: 39733082 DOI: 10.1038/s41598-024-82990-8] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/26/2024] [Accepted: 12/10/2024] [Indexed: 12/30/2024] Open
Abstract
Machine learning interatomic potentials, as a modern generation of classical force fields, take atomic environments as input and predict the corresponding atomic energies and forces. We challenge the commonly accepted assumption that the contribution of an atom can be learned from the short-range local environment of that atom. We employ density functional theory calculations to quantify the decay of the induced electron density and electrostatic potential in response to local perturbations throughout insulating, semiconducting and metallic samples of different dimensionalities. Molecules and thin layers are shown to fail to keep such disturbances localized. Therefore, the learnability of local atomic contributions, which guarantees scalability and transferability of a machine learning interatomic potential, is questionable in the case of molecules and low-dimensional samples. Similarly, the induced electrostatic effects due to substituted impurities or vacancy sites in a crystalline bulk are weakly damped and remain significant beyond several interatomic distances. However, geometric deformations in bulks are practically local within the first neighbors and induce a Yukawa-type electrostatic potential that exponentially vanishes. The practical importance of this finding is that it limits the application of the machine learning interatomic potentials to conformational search or thermal properties of bulk materials and so on, where only purely geometrical deformations are involved. Once chemically impactful defects like aliovalent impurities or vacancies are present, the interatomic potentials trained on local environments need to be corrected for long-range effects.
Affiliation(s)
- Mahboobeh Babaei
- Department of Physics, Shahid Beheshti University, Tehran, 1983969411, Iran
| | - Ali Sadeghi
- Department of Physics, Shahid Beheshti University, Tehran, 1983969411, Iran.
- School of Nano Science, Institute for Research in Fundamental Sciences (IPM), P.O. Box 19395-5531, Tehran, Iran.
|
20
|
Kulichenko M, Nebgen B, Lubbers N, Smith JS, Barros K, Allen AEA, Habib A, Shinkle E, Fedik N, Li YW, Messerly RA, Tretiak S. Data Generation for Machine Learning Interatomic Potentials and Beyond. Chem Rev 2024; 124:13681-13714. [PMID: 39572011 DOI: 10.1021/acs.chemrev.4c00572] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 12/26/2024]
Abstract
The field of data-driven chemistry is undergoing an evolution, driven by innovations in machine learning models for predicting molecular properties and behavior. Recent strides in ML-based interatomic potentials have paved the way for accurate modeling of diverse chemical and structural properties at the atomic level. The key determinant defining MLIP reliability remains the quality of the training data. A paramount challenge lies in constructing training sets that capture specific domains in the vast chemical and structural space. This Review navigates the intricate landscape of essential components and integrity of training data that ensure the extensibility and transferability of the resulting models. We delve into the details of active learning, discussing its various facets and implementations. We outline different types of uncertainty quantification applied to atomistic data acquisition and the correlations between estimated uncertainty and true error. The role of atomistic data samplers in generating diverse and informative structures is highlighted. Furthermore, we discuss data acquisition via modified and surrogate potential energy surfaces as an innovative approach to diversify training data. The Review also provides a list of publicly available data sets that cover essential domains of chemical space.
Affiliation(s)
- Maksim Kulichenko
- Theoretical Division, Los Alamos National Laboratory, Los Alamos, New Mexico 87545, United States
| | - Benjamin Nebgen
- Theoretical Division, Los Alamos National Laboratory, Los Alamos, New Mexico 87545, United States
| | - Nicholas Lubbers
- Computer, Computational, and Statistical Sciences Division, Los Alamos National Laboratory, Los Alamos, New Mexico 87545, United States
| | - Justin S Smith
- NVIDIA Corporation, Santa Clara, California 95051, United States
| | - Kipton Barros
- Theoretical Division, Los Alamos National Laboratory, Los Alamos, New Mexico 87545, United States
- Center for Nonlinear Studies, Los Alamos National Laboratory, Los Alamos, New Mexico 87545, United States
| | - Alice E A Allen
- Theoretical Division, Los Alamos National Laboratory, Los Alamos, New Mexico 87545, United States
- Center for Nonlinear Studies, Los Alamos National Laboratory, Los Alamos, New Mexico 87545, United States
| | - Adela Habib
- Theoretical Division, Los Alamos National Laboratory, Los Alamos, New Mexico 87545, United States
| | - Emily Shinkle
- Computer, Computational, and Statistical Sciences Division, Los Alamos National Laboratory, Los Alamos, New Mexico 87545, United States
| | - Nikita Fedik
- Theoretical Division, Los Alamos National Laboratory, Los Alamos, New Mexico 87545, United States
| | - Ying Wai Li
- Computer, Computational, and Statistical Sciences Division, Los Alamos National Laboratory, Los Alamos, New Mexico 87545, United States
| | - Richard A Messerly
- Theoretical Division, Los Alamos National Laboratory, Los Alamos, New Mexico 87545, United States
| | - Sergei Tretiak
- Theoretical Division, Los Alamos National Laboratory, Los Alamos, New Mexico 87545, United States
- Center for Nonlinear Studies, Los Alamos National Laboratory, Los Alamos, New Mexico 87545, United States
- Center for Integrated Nanotechnologies, Los Alamos National Laboratory, Los Alamos, New Mexico 87545, United States
|
21
|
Wang T, He X, Li M, Li Y, Bi R, Wang Y, Cheng C, Shen X, Meng J, Zhang H, Liu H, Wang Z, Li S, Shao B, Liu TY. Ab initio characterization of protein molecular dynamics with AI2BMD. Nature 2024; 635:1019-1027. [PMID: 39506110 PMCID: PMC11602711 DOI: 10.1038/s41586-024-08127-z] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/31/2023] [Accepted: 09/26/2024] [Indexed: 11/08/2024]
Abstract
Biomolecular dynamics simulation is a fundamental technology for life sciences research, and its usefulness depends on its accuracy and efficiency [1-3]. Classical molecular dynamics simulation is fast but lacks chemical accuracy [4,5]. Quantum chemistry methods such as density functional theory can reach chemical accuracy but cannot scale to support large biomolecules [6]. Here we introduce an artificial intelligence-based ab initio biomolecular dynamics system (AI2BMD) that can efficiently simulate full-atom large biomolecules with ab initio accuracy. AI2BMD uses a protein fragmentation scheme and a machine learning force field [7] to achieve generalizable ab initio accuracy for energy and force calculations for various proteins comprising more than 10,000 atoms. Compared to density functional theory, it reduces the computational time by several orders of magnitude. With several hundred nanoseconds of dynamics simulations, AI2BMD demonstrated its ability to efficiently explore the conformational space of peptides and proteins, deriving accurate ³J couplings that match nuclear magnetic resonance experiments, and showing protein folding and unfolding processes. Furthermore, AI2BMD enables precise free-energy calculations for protein folding, and the estimated thermodynamic properties are well aligned with experiments. AI2BMD could potentially complement wet-lab experiments, detect the dynamic processes of bioactivities and enable biomedical research that is impossible to conduct at present.
Affiliation(s)
- Yatao Li
- Microsoft Research, Beijing, China
| | - Ran Bi
- Microsoft Research, Beijing, China
| | - He Zhang
- Microsoft Research, Beijing, China
| | - Zun Wang
- Microsoft Research, Beijing, China
| | - Bin Shao
- Microsoft Research, Beijing, China.
|
22
|
Nandi A, Pandey P, Houston PL, Qu C, Yu Q, Conte R, Tkatchenko A, Bowman JM. Δ-Machine Learning to Elevate DFT-Based Potentials and a Force Field to the CCSD(T) Level Illustrated for Ethanol. J Chem Theory Comput 2024; 20:8807-8819. [PMID: 39361051 PMCID: PMC11500277 DOI: 10.1021/acs.jctc.4c00977] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/27/2024] [Revised: 09/17/2024] [Accepted: 09/18/2024] [Indexed: 10/23/2024]
Abstract
Progress in machine learning has facilitated the development of potentials that offer both the accuracy of first-principles techniques and vast increases in the speed of evaluation. Recently, Δ-machine learning has been used to elevate the quality of a potential energy surface (PES) based on low-level, e.g., density functional theory (DFT) energies and gradients to close to the gold-standard coupled cluster level of accuracy. We have demonstrated the success of this approach for molecules, ranging in size from H3O+ to 15-atom acetyl-acetone and tropolone. These were all done using the B3LYP functional. Here, we investigate the generality of this approach for the PBE, M06, M06-2X, and PBE0 + MBD functionals, using ethanol as the example molecule. Linear regression with permutationally invariant polynomials is used to fit both low-level and correction PESs. These PESs are employed for standard RMSE analysis for training and test data sets, and then general fidelity tests such as energetics of stationary points, normal-mode frequencies, and torsional potentials are examined. We achieve similar improvements in all cases. Interestingly, we obtained significant improvement over DFT gradients where coupled cluster gradients were not used to correct the low-level PES. Finally, we present some results for correcting a recent molecular mechanics force field for ethanol and comment on the possible generality of this approach.
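Editorial illustration (not taken from the paper): the Δ-machine learning idea in the abstract above is to write the high-level surface as the low-level surface plus a fitted correction, V_high ≈ V_low + ΔV, with ΔV obtained by linear regression. The sketch below shows this on a one-dimensional toy problem. The two model functions are invented for illustration, and ordinary monomials stand in for the permutationally invariant polynomials used in the paper.

```python
import numpy as np

def v_low(x):
    """Toy stand-in for a low-level (e.g. DFT) potential."""
    return x ** 2

def v_high(x):
    """Toy stand-in for a high-level (e.g. coupled cluster) potential."""
    return x ** 2 + 0.1 * x ** 3 + 0.05

def fit_delta(x_train, degree=3):
    """Least-squares fit of the correction v_high - v_low in a monomial basis."""
    basis = np.vander(x_train, degree + 1)  # columns: x^degree, ..., x, 1
    target = v_high(x_train) - v_low(x_train)
    coeffs, *_ = np.linalg.lstsq(basis, target, rcond=None)
    return coeffs

def v_corrected(x, coeffs):
    """Low-level potential plus the fitted correction."""
    return v_low(x) + np.polyval(coeffs, x)

# Only a handful of high-level points are needed to fit the smooth correction.
x_train = np.linspace(-1.0, 1.0, 8)
coeffs = fit_delta(x_train)
x_test = np.linspace(-1.0, 1.0, 50)
max_error = np.max(np.abs(v_corrected(x_test, coeffs) - v_high(x_test)))
```

The design point is that the correction is usually smoother and smaller than the potential itself, so it can be fitted from far fewer expensive high-level calculations than a direct fit would need.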
Affiliation(s)
- Apurba Nandi
- Department of Physics and Materials Science, University of Luxembourg, L-1511 Luxembourg City, Luxembourg
| | - Priyanka Pandey
- Department of Chemistry and Cherry L. Emerson Center for Scientific Computation, Emory University, Atlanta, Georgia 30322, United States
| | - Paul L. Houston
- Department of Chemistry and Chemical Biology, Cornell University, Ithaca, New York 14853, United States
- Department of Chemistry and Biochemistry, Georgia Institute of Technology, Atlanta, Georgia 30332, United States
| | - Chen Qu
- Independent Researcher, Toronto, Ontario M9B0E3, Canada
| | - Qi Yu
- Department of Chemistry, Fudan University, Shanghai 200438, P. R. China
| | - Riccardo Conte
- Dipartimento di Chimica, Università degli Studi di Milano, via Golgi 19, 20133 Milano, Italy
| | - Alexandre Tkatchenko
- Department of Physics and Materials Science, University of Luxembourg, L-1511 Luxembourg City, Luxembourg
| | - Joel M. Bowman
- Department of Chemistry and Cherry L. Emerson Center for Scientific Computation, Emory University, Atlanta, Georgia 30322, United States
| |
|
23
|
Antolović I, Vrabec J, Klajmon M. COSMOPharm: Drug-Polymer Compatibility of Pharmaceutical Amorphous Solid Dispersions from COSMO-SAC. Mol Pharm 2024; 21:4395-4415. [PMID: 39078049 PMCID: PMC11372840 DOI: 10.1021/acs.molpharmaceut.4c00342] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 07/31/2024]
Abstract
The quantum mechanics-aided COSMO-SAC activity coefficient model is applied and systematically examined for predicting the thermodynamic compatibility of drugs and polymers. The drug-polymer compatibility is a key aspect in the rational selection of optimal polymeric carriers for pharmaceutical amorphous solid dispersions (ASD) that enhance drug bioavailability. The drug-polymer compatibility is evaluated in terms of both solubility and miscibility, calculated using standard thermodynamic equilibrium relations based on the activity coefficients predicted by COSMO-SAC. As inherent to COSMO-SAC, our approach relies only on quantum-mechanically derived σ-profiles of the considered molecular species and involves no parameter fitting to experimental data. All σ-profiles used were determined in this work, with those of the polymers being derived from their shorter oligomers by replicating the properties of their central monomer unit(s). Quantitatively, COSMO-SAC achieved an overall average absolute deviation of 13% in weight fraction drug solubility predictions compared to experimental data. Qualitatively, COSMO-SAC correctly categorized different polymer types in terms of their compatibility with drugs and provided meaningful estimations of the amorphous-amorphous phase separation. Furthermore, we analyzed the sensitivity of the COSMO-SAC results for ASD to different model configurations and σ-profiles of polymers. In general, while the free volume and dispersion terms exerted a limited effect on predictions, the structures of oligomers used to produce σ-profiles of polymers appeared to be more important, especially in the case of strongly interacting polymers. Explanations for these observations are provided. COSMO-SAC proved to be an efficient method for compatibility prediction and polymer screening in ASD, particularly in terms of its performance-cost ratio, as it relies only on first-principles calculations for the considered molecular species. 
The open-source nature of both COSMO-SAC and the Python-based tool COSMOPharm, developed in this work for predicting the API-polymer thermodynamic compatibility, invites interested readers to explore and utilize this method for further research or assistance in the design of pharmaceutical formulations.
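The abstract mentions solubility obtained from standard equilibrium relations and predicted activity coefficients. One common simplified form of the solid-liquid equilibrium relation, neglecting heat-capacity terms, is ln(x·γ) = (ΔH_fus/R)(1/T_m − 1/T). The sketch below illustrates only that relation; it is not the COSMOPharm API. The function names, the placeholder `gamma` callable, and the numerical inputs are all hypothetical, and the result is a mole fraction rather than the weight fraction reported in the paper.

```python
import math

R = 8.314462618  # gas constant, J/(mol K)

def sle_rhs(t, t_melt, dh_fus):
    """ln(x * gamma) from the simplified SLE relation (no heat-capacity terms)."""
    return (dh_fus / R) * (1.0 / t_melt - 1.0 / t)

def solubility(t, t_melt, dh_fus, gamma=lambda x: 1.0, n_iter=200):
    """Mole-fraction solubility by fixed-point iteration on x = target / gamma(x).

    `gamma` is a hypothetical placeholder for an activity coefficient model;
    the default gives the ideal solubility. Convergence of this simple
    iteration is not guaranteed for strongly composition-dependent gamma.
    """
    target = math.exp(sle_rhs(t, t_melt, dh_fus))
    x = min(target, 1.0)
    for _ in range(n_iter):
        x = min(target / gamma(x), 1.0)
    return x

# Illustrative inputs only (not data for any real drug)
x_ideal = solubility(300.0, 400.0, 25000.0)
x_nonideal = solubility(300.0, 400.0, 25000.0, gamma=lambda x: 2.0)
```

With a constant activity coefficient of 2, the computed solubility is half the ideal value, which shows how γ > 1 (unfavorable drug-polymer interactions) lowers solubility in this relation.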
Affiliation(s)
- Ivan Antolović
- Thermodynamics, Technische Universität Berlin, Ernst-Reuter-Platz 1, 10587 Berlin, Germany
| | - Jadran Vrabec
- Thermodynamics, Technische Universität Berlin, Ernst-Reuter-Platz 1, 10587 Berlin, Germany
| | - Martin Klajmon
- Department of Physical Chemistry, University of Chemistry and Technology, Prague, Technická 5, 166 28 Prague 6, Czechia
| |
|
24
|
Frank JT, Unke OT, Müller KR, Chmiela S. A Euclidean transformer for fast and stable machine learned force fields. Nat Commun 2024; 15:6539. [PMID: 39107296 PMCID: PMC11303804 DOI: 10.1038/s41467-024-50620-6] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 08/25/2023] [Accepted: 07/10/2024] [Indexed: 08/10/2024] Open
Abstract
Recent years have seen vast progress in the development of machine learned force fields (MLFFs) based on ab-initio reference calculations. Despite achieving low test errors, the reliability of MLFFs in molecular dynamics (MD) simulations is facing growing scrutiny due to concerns about instability over extended simulation timescales. Our findings suggest a potential connection between robustness to cumulative inaccuracies and the use of equivariant representations in MLFFs, but the computational cost associated with these representations can limit this advantage in practice. To address this, we propose a transformer architecture called SO3KRATES that combines sparse equivariant representations (Euclidean variables) with a self-attention mechanism that separates invariant and equivariant information, eliminating the need for expensive tensor products. SO3KRATES achieves a unique combination of accuracy, stability, and speed that enables insightful analysis of quantum properties of matter on extended time and system size scales. To showcase this capability, we generate stable MD trajectories for flexible peptides and supra-molecular structures with hundreds of atoms. Furthermore, we investigate the PES topology for medium-sized chainlike molecules (e.g., small peptides) by exploring thousands of minima. Remarkably, SO3KRATES demonstrates the ability to strike a balance between the conflicting demands of stability and the emergence of new minimum-energy conformations beyond the training data, which is crucial for realistic exploration tasks in the field of biochemistry.
Affiliation(s)
- J Thorben Frank
- Machine Learning Group, TU Berlin, Berlin, Germany
- BIFOLD, Berlin Institute for the Foundations of Learning and Data, Berlin, Germany
| | | | - Klaus-Robert Müller
- Machine Learning Group, TU Berlin, Berlin, Germany
- BIFOLD, Berlin Institute for the Foundations of Learning and Data, Berlin, Germany
- Google DeepMind, Berlin, Germany
- Department of Artificial Intelligence, Korea University, Seoul, Korea
- Max Planck Institut für Informatik, Saarbrücken, Germany
| | - Stefan Chmiela
- Machine Learning Group, TU Berlin, Berlin, Germany
- BIFOLD, Berlin Institute for the Foundations of Learning and Data, Berlin, Germany
| |
|
25
|
Atz K, Cotos L, Isert C, Håkansson M, Focht D, Hilleke M, Nippa DF, Iff M, Ledergerber J, Schiebroek CCG, Romeo V, Hiss JA, Merk D, Schneider P, Kuhn B, Grether U, Schneider G. Prospective de novo drug design with deep interactome learning. Nat Commun 2024; 15:3408. [PMID: 38649351 PMCID: PMC11035696 DOI: 10.1038/s41467-024-47613-w] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/13/2023] [Accepted: 04/02/2024] [Indexed: 04/25/2024] Open
Abstract
De novo drug design aims to generate molecules from scratch that possess specific chemical and pharmacological properties. We present a computational approach utilizing interactome-based deep learning for ligand- and structure-based generation of drug-like molecules. This method capitalizes on the unique strengths of both graph neural networks and chemical language models, offering an alternative to the need for application-specific reinforcement, transfer, or few-shot learning. It enables the "zero-shot" construction of compound libraries tailored to possess specific bioactivity, synthesizability, and structural novelty. In order to proactively evaluate the deep interactome learning framework for protein structure-based drug design, potential new ligands targeting the binding site of the human peroxisome proliferator-activated receptor (PPAR) subtype gamma are generated. The top-ranking designs are chemically synthesized and computationally, biophysically, and biochemically characterized. Potent PPAR partial agonists are identified, demonstrating favorable activity and the desired selectivity profiles for both nuclear receptors and off-target interactions. Crystal structure determination of the ligand-receptor complex confirms the anticipated binding mode. This successful outcome positively advocates interactome-based de novo design for application in bioorganic and medicinal chemistry, enabling the creation of innovative bioactive molecules.
Affiliation(s)
- Kenneth Atz
- ETH Zurich, Department of Chemistry and Applied Biosciences, Vladimir-Prelog-Weg 4, 8093, Zurich, Switzerland
| | - Leandro Cotos
- ETH Zurich, Department of Chemistry and Applied Biosciences, Vladimir-Prelog-Weg 4, 8093, Zurich, Switzerland
| | - Clemens Isert
- ETH Zurich, Department of Chemistry and Applied Biosciences, Vladimir-Prelog-Weg 4, 8093, Zurich, Switzerland
| | - Maria Håkansson
- SARomics Biostructures AB, Medicon Village, SE-223 81, Lund, Sweden
| | - Dorota Focht
- SARomics Biostructures AB, Medicon Village, SE-223 81, Lund, Sweden
| | - Mattis Hilleke
- ETH Zurich, Department of Chemistry and Applied Biosciences, Vladimir-Prelog-Weg 4, 8093, Zurich, Switzerland
| | - David F Nippa
- Roche Pharma Research and Early Development (pRED), Roche Innovation Center Basel, F. Hoffmann-La Roche Ltd., Grenzacherstrasse 124, CH-4070, Basel, Switzerland
- Department of Pharmacy, Ludwig-Maximilians-Universität München, Butenandtstrasse 5, 81377, Munich, Germany
| | - Michael Iff
- ETH Zurich, Department of Chemistry and Applied Biosciences, Vladimir-Prelog-Weg 4, 8093, Zurich, Switzerland
| | - Jann Ledergerber
- ETH Zurich, Department of Chemistry and Applied Biosciences, Vladimir-Prelog-Weg 4, 8093, Zurich, Switzerland
| | - Carl C G Schiebroek
- ETH Zurich, Department of Chemistry and Applied Biosciences, Vladimir-Prelog-Weg 4, 8093, Zurich, Switzerland
| | - Valentina Romeo
- Roche Pharma Research and Early Development (pRED), Roche Innovation Center Basel, F. Hoffmann-La Roche Ltd., Grenzacherstrasse 124, CH-4070, Basel, Switzerland
| | - Jan A Hiss
- ETH Zurich, Department of Chemistry and Applied Biosciences, Vladimir-Prelog-Weg 4, 8093, Zurich, Switzerland
| | - Daniel Merk
- Department of Pharmacy, Ludwig-Maximilians-Universität München, Butenandtstrasse 5, 81377, Munich, Germany
| | - Petra Schneider
- ETH Zurich, Department of Chemistry and Applied Biosciences, Vladimir-Prelog-Weg 4, 8093, Zurich, Switzerland
| | - Bernd Kuhn
- Roche Pharma Research and Early Development (pRED), Roche Innovation Center Basel, F. Hoffmann-La Roche Ltd., Grenzacherstrasse 124, CH-4070, Basel, Switzerland
| | - Uwe Grether
- Roche Pharma Research and Early Development (pRED), Roche Innovation Center Basel, F. Hoffmann-La Roche Ltd., Grenzacherstrasse 124, CH-4070, Basel, Switzerland
| | - Gisbert Schneider
- ETH Zurich, Department of Chemistry and Applied Biosciences, Vladimir-Prelog-Weg 4, 8093, Zurich, Switzerland
| |
|