1
Katzberger P, Hauswirth LM, Kuhn AS, Landrum GA, Riniker S. Rapid Access to Small Molecule Conformational Ensembles in Organic Solvents Enabled by Graph Neural Network-Based Implicit Solvent Model. J Am Chem Soc 2025; 147:13264-13275. [PMID: 40207982 PMCID: PMC12022995 DOI: 10.1021/jacs.4c17622] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/16/2024] [Revised: 03/27/2025] [Accepted: 03/28/2025] [Indexed: 04/11/2025]
Abstract
Understanding and manipulating the conformational behavior of a molecule in different solvent environments is of great interest in the fields of drug discovery and organic synthesis. Molecular dynamics (MD) simulations with solvent molecules explicitly present are the gold standard to compute such conformational ensembles (within the accuracy of the underlying force field), complementing experimental findings and supporting their interpretation. However, conventional methods often face challenges related to computational cost (explicit solvent) or accuracy (implicit solvent). Here, we showcase how our graph neural network (GNN)-based implicit solvent (GNNIS) approach can be used to rapidly compute small molecule conformational ensembles in 39 common organic solvents reproducing explicit-solvent simulations with high accuracy. We validate this approach using nuclear magnetic resonance (NMR) measurements, thus identifying the conformers contributing most to the experimental observable. The method allows the time required to accurately predict conformational ensembles to be reduced from days to minutes while achieving results within one kBT of the experimental values.
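To give a sense of scale for the "one kBT" threshold quoted in the abstract above, the following minimal sketch converts kBT to molar energy units. The temperature of 298.15 K is an assumption made for illustration; the abstract does not state the temperature used.

```python
# Molar gas constant R = k_B * N_A in kJ mol^-1 K^-1 (SI value, rounded).
R_KJ_PER_MOL_K = 8.314462618e-3
KJ_PER_KCAL = 4.184  # thermochemical calorie


def kbt_kj_per_mol(temperature_k: float) -> float:
    """Return k_B*T expressed per mole, in kJ/mol."""
    return R_KJ_PER_MOL_K * temperature_k


# Assumed temperature (not given in the abstract): 298.15 K.
kbt = kbt_kj_per_mol(298.15)
print(f"kBT = {kbt:.3f} kJ/mol = {kbt / KJ_PER_KCAL:.3f} kcal/mol")
```

At this assumed temperature, one kBT is somewhat smaller than the 4.184 kJ/mol (1 kcal/mol) usually called chemical accuracy.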
Affiliation(s)
- Paul Katzberger
- Department of Chemistry and Applied Biosciences, ETH Zürich, Vladimir-Prelog-Weg 2, Zürich 8093, Switzerland
- Lea M. Hauswirth
- Department of Chemistry and Applied Biosciences, ETH Zürich, Vladimir-Prelog-Weg 2, Zürich 8093, Switzerland
- Antonia S. Kuhn
- Department of Chemistry and Applied Biosciences, ETH Zürich, Vladimir-Prelog-Weg 2, Zürich 8093, Switzerland
- Gregory A. Landrum
- Department of Chemistry and Applied Biosciences, ETH Zürich, Vladimir-Prelog-Weg 2, Zürich 8093, Switzerland
- Sereina Riniker
- Department of Chemistry and Applied Biosciences, ETH Zürich, Vladimir-Prelog-Weg 2, Zürich 8093, Switzerland
2
Pultar F, Thürlemann M, Gordiy I, Doloszeski E, Riniker S. Neural Network Potential with Multiresolution Approach Enables Accurate Prediction of Reaction Free Energies in Solution. J Am Chem Soc 2025; 147:6835-6856. [PMID: 39961342 PMCID: PMC11869291 DOI: 10.1021/jacs.4c17015] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/29/2024] [Revised: 01/27/2025] [Accepted: 01/28/2025] [Indexed: 02/27/2025]
Abstract
We present the design and implementation of a novel neural network potential (NNP) and its combination with an electrostatic embedding scheme, commonly used within the context of hybrid quantum-mechanical/molecular-mechanical (QM/MM) simulations. Substitution of a computationally expensive QM Hamiltonian by an NNP with the same accuracy largely reduces the computational cost and enables efficient sampling in prospective MD simulations, the main limitation faced by traditional QM/MM setups. The model relies on the recently introduced anisotropic message passing (AMP) formalism to compute atomic interactions and encode symmetries found in QM systems. AMP is shown to be highly efficient in terms of both data and computational costs and can be readily scaled to sample systems involving more than 350 solute and 40,000 solvent atoms for hundreds of nanoseconds using umbrella sampling. Most deviations of AMP predictions from the underlying DFT ground truth lie within chemical accuracy (4.184 kJ mol⁻¹). The performance and broad applicability of our approach are showcased by calculating the free-energy surface of alanine dipeptide, the preferred ligation states of nickel phosphine complexes, and dissociation free energies of charged pyridine and quinoline dimers. Results with this ML/MM approach show excellent agreement with experimental data and reach chemical accuracy in most cases. In contrast, free energies calculated with static DFT calculations paired with implicit solvent models or QM/MM MD simulations using cheaper semiempirical methods show up to ten times higher deviation from the experimental ground truth and sometimes even fail to reproduce qualitative trends.
Affiliation(s)
- Igor Gordiy
- Department of Chemistry and Applied Biosciences, ETH Zürich, Vladimir-Prelog-Weg 2, Zürich 8093, Switzerland
- Eva Doloszeski
- Department of Chemistry and Applied Biosciences, ETH Zürich, Vladimir-Prelog-Weg 2, Zürich 8093, Switzerland
- Sereina Riniker
- Department of Chemistry and Applied Biosciences, ETH Zürich, Vladimir-Prelog-Weg 2, Zürich 8093, Switzerland
3
Geng X, Gu J, Qin G, Wang LW, Meng X. ABFML: A problem-oriented package for rapidly creating, screening, and optimizing new machine learning force fields. J Chem Phys 2025; 162:052502. [PMID: 39902684 DOI: 10.1063/5.0247559] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/07/2024] [Accepted: 01/12/2025] [Indexed: 02/06/2025]
Abstract
Machine Learning Force Fields (MLFFs) require ongoing improvement and innovation to effectively address challenges across various domains. Developing MLFF models typically involves extensive screening, tuning, and iterative testing. However, existing packages based on a single mature descriptor or model are unsuitable for this process. Therefore, we developed a package named ABFML, based on PyTorch, which aims to promote MLFF innovation by providing developers with a rapid, efficient, and user-friendly tool for constructing, screening, and validating new force field models. Moreover, by leveraging standardized module operations and cutting-edge machine learning frameworks, developers can swiftly establish models. In addition, the platform can seamlessly transition to graphics processing unit (GPU) environments, enabling accelerated calculations and large-scale parallel simulations of molecular dynamics. In contrast to traditional from-scratch approaches for MLFF development, ABFML significantly lowers the barriers to developing force field models, thereby expediting innovation and application within the MLFF development domain.
Affiliation(s)
- Xingze Geng
- College of Sciences, Northeastern University, Shenyang 110819, China
- Key Laboratory of Optoelectronic Materials and Devices, Institute of Semiconductors, Chinese Academy of Sciences, Beijing 100083, China
- Jianing Gu
- Institute of Materials Intelligent Technology, Liaoning Academy of Materials, Shenyang 110004, China
- Gaowu Qin
- Institute of Materials Intelligent Technology, Liaoning Academy of Materials, Shenyang 110004, China
- Key Laboratory for Anisotropy and Texture of Materials (MoE), School of Materials Science and Engineering, Northeastern University, Shenyang 110819, China
- Lin-Wang Wang
- Key Laboratory of Optoelectronic Materials and Devices, Institute of Semiconductors, Chinese Academy of Sciences, Beijing 100083, China
- Xiangying Meng
- College of Sciences, Northeastern University, Shenyang 110819, China
- Institute of Materials Intelligent Technology, Liaoning Academy of Materials, Shenyang 110004, China
4
Chen J, Gao Q, Huang M, Yu K. Application of modern artificial intelligence techniques in the development of organic molecular force fields. Phys Chem Chem Phys 2025; 27:2294-2319. [PMID: 39820957 DOI: 10.1039/d4cp02989e] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/19/2025]
Abstract
The molecular force field (FF) determines the accuracy of molecular dynamics (MD) and is one of the major bottlenecks that limits the application of MD in molecular design. Recently, artificial intelligence (AI) techniques, such as machine-learning potentials (MLPs), have been rapidly reshaping the landscape of MD. Meanwhile, organic molecular systems feature unique characteristics and require more careful treatment in model construction, optimization, and validation. While an accurate and generic organic molecular force field is still missing, significant progress has been made with the facilitation of AI, warranting a promising future. In this review, we provide an overview of the various types of AI techniques used in molecular FF development and discuss both the advantages and weaknesses of these methodologies. We show how AI methods provide unprecedented capabilities in many tasks such as potential fitting, atom typification, and automatic optimization. Meanwhile, it is also worth noting that more efforts are needed to improve the transferability of the model, develop a more comprehensive database, and establish more standardized validation procedures. With these discussions, we hope to inspire more efforts to solve the existing problems, eventually leading to the birth of next-generation generic organic FFs.
Affiliation(s)
- Junmin Chen
- Institute of Materials Research (IMR), Tsinghua Shenzhen International Graduate School, Tsinghua University, Shenzhen, 518055, China
- Tsinghua-Berkeley Shenzhen Institute (TBSI), Tsinghua Shenzhen International Graduate School, Tsinghua University, Shenzhen, 518055, China
- Qian Gao
- Institute of Materials Research (IMR), Tsinghua Shenzhen International Graduate School, Tsinghua University, Shenzhen, 518055, China
- Miaofei Huang
- Institute of Materials Research (IMR), Tsinghua Shenzhen International Graduate School, Tsinghua University, Shenzhen, 518055, China
- Kuang Yu
- Institute of Materials Research (IMR), Tsinghua Shenzhen International Graduate School, Tsinghua University, Shenzhen, 518055, China
- Tsinghua-Berkeley Shenzhen Institute (TBSI), Tsinghua Shenzhen International Graduate School, Tsinghua University, Shenzhen, 518055, China
5
See TJ, Zhang D, Boley M, Chalmers DK. Graph Neural Network-Based Molecular Property Prediction with Patch Aggregation. J Chem Theory Comput 2024; 20:8886-8896. [PMID: 39356714 DOI: 10.1021/acs.jctc.4c00798] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/04/2024]
Abstract
Graph neural networks (GNNs) have emerged as powerful tools for quantum chemical property prediction, leveraging the inherent graph structure of molecular systems. GNNs depend on an edge-to-node aggregation mechanism for combining edge representations into node representations. Unfortunately, existing learnable edge-to-node aggregation methods substantially increase the number of parameters and, thus, the computational cost relative to simple sum aggregation. Worse, as we report here, they often fail to improve predictive accuracy. We therefore propose a novel learnable edge-to-node aggregation mechanism that aims to improve the accuracy and parameter efficiency of GNNs in predicting molecular properties. The new mechanism, called "patch aggregation", is inspired by the Multi-Head Attention and Mixture of Experts machine learning techniques. We have incorporated the patch aggregation method into the specialized, state-of-the-art GNN models SchNet, DimeNet++, SphereNet, TensorNet, and VisNet and show that patch aggregation consistently outperforms existing learnable and nonlearnable aggregation techniques (sum, multilayer perceptron, softmax, and set transformer aggregation) in the prediction of molecular properties such as QM9 thermodynamic properties and MD17 molecular dynamics trajectory energies and forces. We also find that patch aggregation not only improves prediction accuracy but also is parameter-efficient, making it an attractive option for practical applications for which computational resources are limited. Further, we show that patch aggregation can be applied across different GNN models. Overall, patch aggregation is a powerful edge-to-node aggregation mechanism that improves the accuracy of molecular property predictions by GNNs.
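As a point of reference for the baseline the abstract compares against, the sketch below shows plain sum edge-to-node aggregation on a toy graph. It is not the patch aggregation method itself, whose details the abstract does not give; the function name, the toy graph, and the feature values are all hypothetical.

```python
def sum_edge_to_node(edge_features, receivers, num_nodes):
    """Sum the feature vectors of all edges that point at each node.

    edge_features: list of equal-length feature vectors, one per edge.
    receivers: for each edge, the index of the node it points to.
    """
    dim = len(edge_features[0])
    node_features = [[0.0] * dim for _ in range(num_nodes)]
    for features, node in zip(edge_features, receivers):
        for k, value in enumerate(features):
            node_features[node][k] += value
    return node_features


# Hypothetical toy graph: 3 nodes, 4 directed edges, 2-dimensional edge features.
edge_features = [[1.0, 0.0], [0.5, 0.5], [0.0, 2.0], [1.0, 1.0]]
receivers = [0, 0, 2, 1]
node_features = sum_edge_to_node(edge_features, receivers, num_nodes=3)
print(node_features)
```

Sum aggregation has no trainable parameters, which is why the abstract treats it as the cost baseline for learnable alternatives.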
Affiliation(s)
- Teng Jiek See
- Medicinal Chemistry, Monash Institute of Pharmaceutical Sciences, Monash University, 381 Royal Parade, Parkville, VIC 3068, Australia
- Daokun Zhang
- School of Computer Science, University of Nottingham Ningbo China, 199 Taikang East Road, Ningbo 315100, China
- Mario Boley
- Department of Data Science and AI, Faculty of Information Technology, Monash University, Clayton Campus, Building 63, 25 Exhibition Walk, VIC 3800, Australia
- David K Chalmers
- Medicinal Chemistry, Monash Institute of Pharmaceutical Sciences, Monash University, 381 Royal Parade, Parkville, VIC 3068, Australia
6
Takaba K, Friedman AJ, Cavender CE, Behara PK, Pulido I, Henry MM, MacDermott-Opeskin H, Iacovella CR, Nagle AM, Payne AM, Shirts MR, Mobley DL, Chodera JD, Wang Y. Machine-learned molecular mechanics force fields from large-scale quantum chemical data. Chem Sci 2024; 15:12861-12878. [PMID: 39148808 PMCID: PMC11322960 DOI: 10.1039/d4sc00690a] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/29/2024] [Accepted: 06/17/2024] [Indexed: 08/17/2024]
Abstract
The development of reliable and extensible molecular mechanics (MM) force fields (fast, empirical models characterizing the potential energy surface of molecular systems) is indispensable for biomolecular simulation and computer-aided drug design. Here, we introduce a generalized and extensible machine-learned MM force field, espaloma-0.3, and an end-to-end differentiable framework using graph neural networks to overcome the limitations of traditional rule-based methods. Trained in a single GPU-day to fit a large and diverse quantum chemical dataset of over 1.1 M energy and force calculations, espaloma-0.3 reproduces quantum chemical energetic properties of chemical domains highly relevant to drug discovery, including small molecules, peptides, and nucleic acids. Moreover, this force field maintains the quantum chemical energy-minimized geometries of small molecules and preserves the condensed phase properties of peptides and folded proteins, self-consistently parametrizing proteins and ligands to produce stable simulations leading to highly accurate predictions of binding free energies. This methodology demonstrates significant promise as a path forward for systematically building more accurate force fields that are easily extensible to new chemical domains of interest.
Affiliation(s)
- Kenichiro Takaba
- Computational and Systems Biology Program, Sloan Kettering Institute, Memorial Sloan Kettering Cancer Center New York NY 10065 USA
- Pharmaceuticals Research Center, Advanced Drug Discovery, Asahi Kasei Pharma Corporation Shizuoka 410-2321 Japan
- Anika J Friedman
- Department of Chemical and Biological Engineering, University of Colorado Boulder Boulder CO 80309 USA
- Chapin E Cavender
- Skaggs School of Pharmacy and Pharmaceutical Sciences, University of California San Diego 9500 Gilman Drive La Jolla CA 92093 USA
- Pavan Kumar Behara
- Center for Neurotherapeutics, Department of Pathology and Laboratory Medicine, University of California Irvine CA 92697 USA
- Iván Pulido
- Computational and Systems Biology Program, Sloan Kettering Institute, Memorial Sloan Kettering Cancer Center New York NY 10065 USA
- Michael M Henry
- Computational and Systems Biology Program, Sloan Kettering Institute, Memorial Sloan Kettering Cancer Center New York NY 10065 USA
- Christopher R Iacovella
- Computational and Systems Biology Program, Sloan Kettering Institute, Memorial Sloan Kettering Cancer Center New York NY 10065 USA
- Arnav M Nagle
- Computational and Systems Biology Program, Sloan Kettering Institute, Memorial Sloan Kettering Cancer Center New York NY 10065 USA
- Department of Bioengineering, University of California, Berkeley Berkeley CA 94720 USA
- Alexander Matthew Payne
- Computational and Systems Biology Program, Sloan Kettering Institute, Memorial Sloan Kettering Cancer Center New York NY 10065 USA
- Tri-Institutional PhD Program in Chemical Biology, Memorial Sloan Kettering Cancer Center New York 10065 USA
- Michael R Shirts
- Department of Chemical and Biological Engineering, University of Colorado Boulder Boulder CO 80309 USA
- David L Mobley
- Department of Pharmaceutical Sciences, University of California Irvine California 92697 USA
- John D Chodera
- Computational and Systems Biology Program, Sloan Kettering Institute, Memorial Sloan Kettering Cancer Center New York NY 10065 USA
- Yuanqing Wang
- Simons Center for Computational Physical Chemistry and Center for Data Science, New York University New York NY 10004 USA
- Computational and Systems Biology Program, Sloan Kettering Institute, Memorial Sloan Kettering Cancer Center New York NY 10065 USA
7
Chen G, Jaffrelot Inizan T, Plé T, Lagardère L, Piquemal JP, Maday Y. Advancing Force Fields Parameterization: A Directed Graph Attention Networks Approach. J Chem Theory Comput 2024; 20:5558-5569. [PMID: 38875012 DOI: 10.1021/acs.jctc.3c01421] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/16/2024]
Abstract
Force fields (FFs) are an established tool for simulating large and complex molecular systems. However, parametrizing FFs is a challenging and time-consuming task that relies on empirical heuristics, experimental data, and computational data. Recent efforts aim to automate the assignment of FF parameters using pre-existing databases and on-the-fly ab initio data. In this study, we propose a graph-based force field (GB-FFs) model to directly derive parameters for the Generalized Amber Force Field (GAFF) from chemical environments, and we investigate the influence of functional forms. Our end-to-end parametrization approach predicts parameters by aggregating the basic information in directed molecular graphs, eliminating the need for expert-defined procedures and enhancing the accuracy and transferability of GAFF across a broader range of molecular complexes. Simulation results are compared to the original GAFF parametrization. In practice, our results demonstrate an improved transferability of the model, showcasing its improved accuracy in modeling intermolecular and torsional interactions, as well as improved solvation free energies. The optimization approach developed in this work is fully applicable to other nonpolarizable FFs as well as to polarizable ones.
Affiliation(s)
- Gong Chen
- Sorbonne Université, CNRS, Université Paris Cité, Laboratoire Jacques-Louis Lions (LJLL), UMR 7598 CNRS, 75005 Paris, France
- Théo Jaffrelot Inizan
- Sorbonne Université, Laboratoire de Chimie Théorique (LCT), UMR 7616 CNRS, 75005 Paris, France
- Thomas Plé
- Sorbonne Université, Laboratoire de Chimie Théorique (LCT), UMR 7616 CNRS, 75005 Paris, France
- Louis Lagardère
- Sorbonne Université, Laboratoire de Chimie Théorique (LCT), UMR 7616 CNRS, 75005 Paris, France
- Jean-Philip Piquemal
- Sorbonne Université, Laboratoire de Chimie Théorique (LCT), UMR 7616 CNRS, 75005 Paris, France
- Yvon Maday
- Sorbonne Université, CNRS, Université Paris Cité, Laboratoire Jacques-Louis Lions (LJLL), UMR 7598 CNRS, 75005 Paris, France
8
Lei YK, Yagi K, Sugita Y. Learning QM/MM potential using equivariant multiscale model. J Chem Phys 2024; 160:214109. [PMID: 38828815 DOI: 10.1063/5.0205123] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/24/2024] [Accepted: 05/09/2024] [Indexed: 06/05/2024]
Abstract
The machine learning (ML) method emerges as an efficient and precise surrogate model for high-level electronic structure theory. Its application has been limited to closed chemical systems without considering external potentials from the surrounding environment. To address this limitation and incorporate the influence of external potentials, polarization effects, and long-range interactions between a chemical system and its environment, the first two terms of the Taylor expansion of an electrostatic operator have been used as extra input to the existing ML model to represent the electrostatic environments. However, high-order electrostatic interaction is often essential to account for external potentials from the environment. The existing models based only on invariant features cannot capture significant distribution patterns of the external potentials. Here, we propose a novel ML model that includes high-order terms of the Taylor expansion of an electrostatic operator and uses an equivariant model, which can generate a high-order tensor covariant with rotations as a base model. Therefore, we can use the multipole-expansion equation to derive a useful representation by accounting for polarization and intermolecular interaction. Moreover, to deal with long-range interactions, we follow the same strategy adopted to derive long-range interactions between a target system and its environment media. Our model achieves higher prediction accuracy and transferability among various environment media with these modifications.
Affiliation(s)
- Yao-Kun Lei
- Theoretical Molecular Science Laboratory, RIKEN Cluster for Pioneering Research, Wako, Saitama 351-0198, Japan
- Computational Biophysics Research Team, RIKEN Center for Computational Science, Kobe, Hyogo 650-0047, Japan
- RIKEN Interdisciplinary Theoretical and Mathematical Sciences Program (iTHEMS), Wako, Saitama 351-0198, Japan
- Kiyoshi Yagi
- Theoretical Molecular Science Laboratory, RIKEN Cluster for Pioneering Research, Wako, Saitama 351-0198, Japan
- Computational Biophysics Research Team, RIKEN Center for Computational Science, Kobe, Hyogo 650-0047, Japan
- Yuji Sugita
- Theoretical Molecular Science Laboratory, RIKEN Cluster for Pioneering Research, Wako, Saitama 351-0198, Japan
- Computational Biophysics Research Team, RIKEN Center for Computational Science, Kobe, Hyogo 650-0047, Japan
- RIKEN Interdisciplinary Theoretical and Mathematical Sciences Program (iTHEMS), Wako, Saitama 351-0198, Japan
- Laboratory for Biomolecular Function Simulation, RIKEN Center for Biosystems Dynamics Research, Kobe, Hyogo 650-0047, Japan
9
Kalayan J, Ramzan I, Williams CD, Bryce RA, Burton NA. A neural network potential based on pairwise resolved atomic forces and energies. J Comput Chem 2024; 45:1143-1151. [PMID: 38284556 DOI: 10.1002/jcc.27313] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/31/2023] [Revised: 12/23/2023] [Accepted: 01/05/2024] [Indexed: 01/30/2024]
Abstract
Molecular simulations have become a key tool in molecular and materials design. Machine learning (ML)-based potential energy functions offer the prospect of simulating complex molecular systems efficiently at quantum chemical accuracy. In previous work, we introduced the ML-based PairF-Net approach to neural network potentials, which adopts a pairwise interatomic scheme for predicting forces within a molecular system. Here, we further develop the PairF-Net model to intrinsically incorporate energy conservation and couple the model to a molecular mechanical (MM) environment within the OpenMM package. The updated PairF-Net model yields energy and force predictions and dynamical distributions in good agreement with the rMD17 dataset of ten small organic molecules in the gas phase. We further show that these in vacuo ML models of small molecules can be applied to force predictions in aqueous solution via hybrid ML/MM simulations. We present a new benchmark dataset for these ten molecules in solution, obtained from QM/MM simulations, which we denote as rMD17-aq (https://zenodo.org/records/10048644), and assess the ability of PairF-Net to reproduce the molecular energy, atomic forces, and dynamical distributions of these solution conformations via ML/MM simulations.
Affiliation(s)
- Jas Kalayan
- Division of Pharmacy and Optometry, School of Health Sciences, University of Manchester, Manchester, UK
- Ismaeel Ramzan
- Division of Pharmacy and Optometry, School of Health Sciences, University of Manchester, Manchester, UK
- Neural Circuits and Computations Unit, RIKEN Center for Brain Science, Wako, Japan
- Christopher D Williams
- Division of Pharmacy and Optometry, School of Health Sciences, University of Manchester, Manchester, UK
- Richard A Bryce
- Division of Pharmacy and Optometry, School of Health Sciences, University of Manchester, Manchester, UK
- Neil A Burton
- Department of Chemistry, University of Manchester, Manchester, UK
10
Demir Gİ, Tekin A. NICE-FF: A non-empirical, intermolecular, consistent, and extensible force field for nucleic acids and beyond. J Chem Phys 2023; 159:244117. [PMID: 38153156 DOI: 10.1063/5.0176641] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/15/2023] [Accepted: 12/04/2023] [Indexed: 12/29/2023]
Abstract
A new non-empirical ab initio intermolecular force field (NICE-FF in buffered 14-7 potential form) has been developed for nucleic acids and beyond based on the dimer interaction energies (IEs) calculated at the spin component scaled-MI-second order Møller-Plesset perturbation theory. A fully automatic framework has been implemented for this purpose, capable of generating well-polished computational grids, performing the necessary ab initio calculations, conducting machine learning (ML) assisted force field (FF) parametrization, and extending existing FF parameters by incorporating new atom types. For the ML-assisted parametrization of NICE-FF, interaction energies of ∼18 000 dimer geometries (with IE < 0) were used, and the best fit gave a mean square deviation of about 0.46 kcal/mol. During this parametrization, atom types apparent in four deoxyribonucleic acid (DNA) bases have been first trained using the generated DNA base datasets. Both uracil and hypoxanthine, which contain the same atom types found in DNA bases, have been considered as test molecules. Three new atom types have been added to the DNA atom types by using IE datasets of both pyrazinamide and 9-methylhypoxanthine. Finally, the last test molecule, theophylline, has been selected, which contains already-fitted atom-type parameters. The performance of NICE-FF has been investigated on the S22 dataset, and it has been found that NICE-FF outperforms the well-known FFs by generating the most consistent IEs with the high-level ab initio ones. Moreover, NICE-FF has been integrated into our in-house developed crystal structure prediction (CSP) tool [called FFCASP (Fast and Flexible CrystAl Structure Predictor)], aiming to find the experimental crystal structures of all considered molecules. CSPs, which were performed up to 4 formula units (Z), resulted in NICE-FF being able to locate almost all the known experimental crystal structures with sufficiently low RMSD20 values to provide good starting points for density functional theory optimizations.
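For readers unfamiliar with the buffered 14-7 potential form named in the abstract above, it is commonly written in Halgren's form as shown below. This is the general textbook form and not taken from the NICE-FF paper; the buffering constants usually quoted with it (δ = 0.07, γ = 0.12) are an assumption here, since the abstract does not state which values or combining rules NICE-FF uses.

```latex
E_{ij}(R_{ij}) = \varepsilon_{ij}
  \left( \frac{1+\delta}{\rho_{ij}+\delta} \right)^{7}
  \left( \frac{1+\gamma}{\rho_{ij}^{7}+\gamma} - 2 \right),
\qquad
\rho_{ij} = \frac{R_{ij}}{R^{0}_{ij}}
```

Here R_ij is the distance between atoms i and j, R^0_ij is the minimum-energy distance, and ε_ij is the well depth; at ρ_ij = 1 the expression reduces to E_ij = -ε_ij.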
Affiliation(s)
- Gözde İniş Demir
- Informatics Institute, Istanbul Technical University, 34469 Maslak, Istanbul, Türkiye
- Adem Tekin
- Informatics Institute, Istanbul Technical University, 34469 Maslak, Istanbul, Türkiye
- Research Institute for Fundamental Sciences (TÜBİTAK-TBAE), Kocaeli, Türkiye
11
Thürlemann M, Riniker S. Hybrid classical/machine-learning force fields for the accurate description of molecular condensed-phase systems. Chem Sci 2023; 14:12661-12675. [PMID: 38020395 PMCID: PMC10646964 DOI: 10.1039/d3sc04317g] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 08/17/2023] [Accepted: 10/24/2023] [Indexed: 12/01/2023]
Abstract
Electronic structure methods offer, in principle, accurate predictions of molecular properties; however, their applicability is limited by computational costs. Empirical methods are cheaper, but come with inherent approximations and are dependent on the quality and quantity of training data. The rise of machine learning (ML) force fields (FFs) exacerbates limitations related to training data even further, especially for condensed-phase systems for which the generation of large and high-quality training datasets is difficult. Here, we propose a hybrid ML/classical FF model that is parametrized exclusively on high-quality ab initio data of dimers and monomers in vacuum but is transferable to condensed-phase systems. The proposed hybrid model combines our previous ML-parametrized classical model with ML corrections for situations where classical approximations break down, thus combining the robustness and efficiency of classical FFs with the flexibility of ML. Extensive validation on benchmarking datasets and experimental condensed-phase data, including organic liquids and small-molecule crystal structures, showcases how the proposed approach may promote FF development and unlock the full potential of classical FFs.
Affiliation(s)
- Moritz Thürlemann
- Department of Chemistry and Applied Biosciences, ETH Zürich Vladimir-Prelog-Weg 2 Zürich 8093 Switzerland
- Sereina Riniker
- Department of Chemistry and Applied Biosciences, ETH Zürich Vladimir-Prelog-Weg 2 Zürich 8093 Switzerland
12
Hagg A, Kirschner KN. Open-Source Machine Learning in Computational Chemistry. J Chem Inf Model 2023; 63:4505-4532. [PMID: 37466636 PMCID: PMC10430767 DOI: 10.1021/acs.jcim.3c00643] [Citation(s) in RCA: 10] [Impact Index Per Article: 5.0] [Received: 05/04/2023] [Indexed: 07/20/2023]
Abstract
The field of computational chemistry has seen a significant increase in the integration of machine learning concepts and algorithms. In this Perspective, we surveyed 179 open-source software projects, with corresponding peer-reviewed papers published within the last 5 years, to better understand the topics within the field being investigated by machine learning approaches. For each project, we provide a short description, the link to the code, the accompanying license type, and whether the training data and resulting models are made publicly available. Based on those deposited in GitHub repositories, the most popular employed Python libraries are identified. We hope that this survey will serve as a resource to learn about machine learning or specific architectures thereof by identifying accessible codes with accompanying papers on a topic basis. To this end, we also include computational chemistry open-source software for generating training data and fundamental Python libraries for machine learning. Based on our observations and considering the three pillars of collaborative machine learning work, open data, open source (code), and open models, we provide some suggestions to the community.
Affiliation(s)
- Alexander Hagg
- Institute of Technology, Resource and Energy-Efficient Engineering (TREE), University of Applied Sciences Bonn-Rhein-Sieg, 53757 Sankt Augustin, Germany
- Department of Electrical Engineering, Mechanical Engineering and Technical Journalism, University of Applied Sciences Bonn-Rhein-Sieg, 53757 Sankt Augustin, Germany
- Karl N. Kirschner
- Institute of Technology, Resource and Energy-Efficient Engineering (TREE), University of Applied Sciences Bonn-Rhein-Sieg, 53757 Sankt Augustin, Germany
- Department of Computer Science, University of Applied Sciences Bonn-Rhein-Sieg, 53757 Sankt Augustin, Germany