1
Lao KU. Canonical coupled cluster binding benchmark for nanoscale noncovalent complexes at the hundred-atom scale. J Chem Phys 2024; 161:234103. [PMID: 39679503 DOI: 10.1063/5.0242359] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/03/2024] [Accepted: 11/27/2024] [Indexed: 12/17/2024]
Abstract
In this study, we introduce two datasets for nanoscale noncovalent binding, featuring complexes at the hundred-atom scale, benchmarked using coupled cluster with single, double, and perturbative triple [CCSD(T)] excitations extrapolated to the complete basis set (CBS) limit. The first dataset, L14, comprises 14 complexes with canonical CCSD(T)/CBS benchmarks, extending the applicability of CCSD(T)/CBS binding benchmarks to systems as large as 113 atoms. The second dataset, vL11, consists of 11 even larger complexes, evaluated using the local CCSD(T)/CBS method with stringent thresholds, covering systems up to 174 atoms. We compare binding energies obtained from local CCSD(T) and fixed-node diffusion Monte Carlo (FN-DMC), which have previously shown discrepancies exceeding the chemical accuracy threshold of 1 kcal/mol in large complexes, with the new canonical CCSD(T)/CBS results. While local CCSD(T)/CBS agrees with canonical CCSD(T)/CBS within binding uncertainties, FN-DMC consistently underestimates binding energies in π-π complexes by over 1 kcal/mol. Potential sources of error in canonical CCSD(T)/CBS are discussed, and we argue that the observed discrepancies are unlikely to originate from CCSD(T) itself. Instead, the fixed-node approximation in FN-DMC warrants further investigation to elucidate these binding discrepancies. Using these datasets as reference, we evaluate the performance of various electronic structure methods, semi-empirical approaches, and machine learning potentials for nanoscale complexes. Based on computational accuracy and stability across system sizes, we recommend MP2+aiD(CCD), PBE0+D4, and ωB97X-3c as reliable methods for investigating noncovalent interactions in nanoscale complexes, maintaining their promising performance observed in smaller systems.
Affiliation(s)
- Ka Un Lao
- Department of Chemistry, Virginia Commonwealth University, Richmond, Virginia 23284, USA
2
Li H, Briccolani-Bandini L, Tirri B, Cardini G, Brémond E, Sancho-García JC, Adamo C. Evaluating Noncovalent Interactions in Halogenated Molecules with Double-Hybrid Functionals and a Dedicated Small Basis Set. J Phys Chem A 2024. [PMID: 39067011 DOI: 10.1021/acs.jpca.4c03007] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 07/30/2024]
Abstract
We present here an extension of our recently developed PBE-QIDH/DH-SVPD basis set to halogen atoms, with the aim of obtaining, for weakly interacting halogenated molecules, interaction energies close to those provided by a large basis set (def2-TZVPP) coupled to an empirical dispersion potential. The core of our approach is the split-valence basis set, DH-SVPD, which has been developed for F, Cl, Br, and I atoms using a self-consistent formula containing only energy terms computed for dimers and the corresponding monomers at the same level of theory. The basis set, developed by considering four systems, one for each halogen atom, was then tested on the X40 and X4 × 10 benchmarks as well as on two other, less standard data sets. Finally, a large system (380 atoms) was also considered as a "crash" test. Our results show that the simple and nonempirical PBE-QIDH/DH-SVPD approach provides accurate interaction energies for all the considered systems and can thus be considered a cheaper alternative to DH functionals paired with empirical dispersion corrections and a large basis set of triple-ζ quality.
Affiliation(s)
- Hanwei Li
- Chimie ParisTech, PSL Research University, CNRS, Institute of Chemistry for Health and Life Sciences, F-75005 Paris, France
- Lorenzo Briccolani-Bandini
- Dipartimento di Chimica "Ugo Schiff", Università degli Studi di Firenze, Via della Lastruccia 3, Sesto Fiorentino 50019, Italy
- Bernardino Tirri
- Chimie ParisTech, PSL Research University, CNRS, Institute of Chemistry for Health and Life Sciences, F-75005 Paris, France
- Gianni Cardini
- Dipartimento di Chimica "Ugo Schiff", Università degli Studi di Firenze, Via della Lastruccia 3, Sesto Fiorentino 50019, Italy
- Eric Brémond
- ITODYS, CNRS, Université de Paris, Paris F-75006, France
- Carlo Adamo
- Chimie ParisTech, PSL Research University, CNRS, Institute of Chemistry for Health and Life Sciences, F-75005 Paris, France
3
Tarek Ibrahim M, Wait E, Ren P. Quantum Mechanics Characterization of Non-Covalent Interaction in Nucleotide Fragments. Molecules 2024; 29:3258. [PMID: 39064837 PMCID: PMC11279843 DOI: 10.3390/molecules29143258] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/31/2024] [Revised: 07/03/2024] [Accepted: 07/06/2024] [Indexed: 07/28/2024]
Abstract
Accurate calculation of non-covalent interaction energies in nucleotides is crucial for understanding the driving forces governing nucleic acid structure and function, as well as developing advanced molecular mechanics forcefields or machine learning potentials tailored to nucleic acids. Here, we dissect the nucleotides' structure into three main constituents: nucleobases (A, G, C, T, and U), sugar moieties (ribose and deoxyribose), and phosphate group. The interactions among these fragments and between fragments and water were analyzed. Different quantum mechanical methods were compared for their accuracy in capturing the interaction energy. The non-covalent interaction energy was decomposed into electrostatics, exchange-repulsion, dispersion, and induction using two ab initio methods: Symmetry-Adapted Perturbation Theory (SAPT) and Absolutely Localized Molecular Orbitals (ALMO). These calculations provide a benchmark for different QM methods, in addition to providing a valuable understanding of the roles of various intermolecular forces in hydrogen bonding and aromatic stacking. With SAPT, a higher theory level and/or larger basis set did not necessarily give more accuracy. It is hard to know which combination would be best for a given system. In contrast, ALMO EDA2 did not show dependence on theory level or basis set; additionally, it is faster.
Affiliation(s)
- Mayar Tarek Ibrahim
- Department of Biomedical Engineering, The University of Texas at Austin, Austin, TX 78712, USA
- Elizabeth Wait
- Interdisciplinary Life Sciences Graduate Program, The University of Texas at Austin, Austin, TX 78712, USA
- Pengyu Ren
- Department of Biomedical Engineering, The University of Texas at Austin, Austin, TX 78712, USA
- Interdisciplinary Life Sciences Graduate Program, The University of Texas at Austin, Austin, TX 78712, USA
4
Aldossary A, Campos-Gonzalez-Angulo JA, Pablo-García S, Leong SX, Rajaonson EM, Thiede L, Tom G, Wang A, Avagliano D, Aspuru-Guzik A. In Silico Chemical Experiments in the Age of AI: From Quantum Chemistry to Machine Learning and Back. Adv Mater 2024; 36:e2402369. [PMID: 38794859 DOI: 10.1002/adma.202402369] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/15/2024] [Revised: 04/28/2024] [Indexed: 05/26/2024]
Abstract
Computational chemistry is an indispensable tool for understanding molecules and predicting chemical properties. However, traditional computational methods face significant challenges due to the difficulty of solving the Schrödinger equation and the computational cost that grows with the size of the molecular system. In response, there has been a surge of interest in applying artificial intelligence (AI) and machine learning (ML) techniques to in silico experiments. Integrating AI and ML into computational chemistry increases the scalability and speed of the exploration of chemical space. However, challenges remain, particularly regarding the reproducibility and transferability of ML models. This review highlights the evolution of ML in learning from, complementing, or replacing traditional computational chemistry for energy and property predictions. Starting from models trained entirely on numerical data, it traces the journey toward the ideal model that incorporates or learns the physical laws of quantum mechanics. This paper also reviews existing computational methods and ML models and their intertwining, outlines a roadmap for future research, and identifies areas for improvement and innovation. Ultimately, the goal is to develop AI architectures capable of predicting accurate and transferable solutions to the Schrödinger equation, thereby revolutionizing in silico experiments within chemistry and materials science.
Affiliation(s)
- Abdulrahman Aldossary
- Department of Chemistry, University of Toronto, 80 St. George Street, Toronto, ON, M5S 3H6, Canada
- Sergio Pablo-García
- Department of Chemistry, University of Toronto, 80 St. George Street, Toronto, ON, M5S 3H6, Canada
- Department of Computer Science, University of Toronto, 40 St. George Street, Toronto, ON, M5S 2E4, Canada
- Shi Xuan Leong
- Department of Chemistry, University of Toronto, 80 St. George Street, Toronto, ON, M5S 3H6, Canada
- Ella Miray Rajaonson
- Department of Chemistry, University of Toronto, 80 St. George Street, Toronto, ON, M5S 3H6, Canada
- Vector Institute for Artificial Intelligence, 661 University Ave. Suite 710, Toronto, ON, M5G 1M1, Canada
- Luca Thiede
- Department of Computer Science, University of Toronto, 40 St. George Street, Toronto, ON, M5S 2E4, Canada
- Vector Institute for Artificial Intelligence, 661 University Ave. Suite 710, Toronto, ON, M5G 1M1, Canada
- Gary Tom
- Department of Chemistry, University of Toronto, 80 St. George Street, Toronto, ON, M5S 3H6, Canada
- Vector Institute for Artificial Intelligence, 661 University Ave. Suite 710, Toronto, ON, M5G 1M1, Canada
- Andrew Wang
- Department of Chemistry, University of Toronto, 80 St. George Street, Toronto, ON, M5S 3H6, Canada
- Davide Avagliano
- Chimie ParisTech, PSL University, CNRS, Institute of Chemistry for Life and Health Sciences (iCLeHS UMR 8060), Paris, F-75005, France
- Alán Aspuru-Guzik
- Department of Chemistry, University of Toronto, 80 St. George Street, Toronto, ON, M5S 3H6, Canada
- Department of Computer Science, University of Toronto, 40 St. George Street, Toronto, ON, M5S 2E4, Canada
- Vector Institute for Artificial Intelligence, 661 University Ave. Suite 710, Toronto, ON, M5G 1M1, Canada
- Department of Materials Science & Engineering, University of Toronto, 184 College St., Toronto, ON, M5S 3E4, Canada
- Department of Chemical Engineering & Applied Chemistry, University of Toronto, 200 College St., Toronto, ON, M5S 3E5, Canada
- Lebovic Fellow, Canadian Institute for Advanced Research (CIFAR), 66118 University Ave., Toronto, M5G 1M1, Canada
- Acceleration Consortium, 80 St George St, Toronto, M5S 3H6, Canada
5
Low K, Coote ML, Izgorodina EI. Accurate Prediction of Three-Body Intermolecular Interactions via Electron Deformation Density-Based Machine Learning. J Chem Theory Comput 2023; 19:1466-1475. [PMID: 36787280 DOI: 10.1021/acs.jctc.2c00984] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 02/15/2023]
Abstract
This work extends the electron deformation density-based descriptor, originally developed in the electron deformation density-based interaction energy machine learning (EDDIE-ML) algorithm to predict dimer interaction energies, to the prediction of three-body interactions in trimers. Using a sequential learning process to select the training data, the resulting Gaussian process regression (GPR) model predicts the three-body interaction energy within 0.2 kcal mol⁻¹ of the SRS-MP2/cc-pVTZ reference values for the 3B69 and S22-3 trimer data sets. A hybrid kernel function is introduced, which combines contributions from the average and individual atomic environments, allowing the total trimer interaction energy to be predicted in addition to the three-body contribution using the same descriptor. To extend the range and diversity of trimer interaction energies available in the literature, a new data set based on a protein-ligand crystal structure is introduced, consisting of 509 structures of a central ligand with two protein fragments. Benchmark calculations are provided for the new data set, which contains significantly larger molecular interactions than current databases in the literature in addition to charged fragments. Compared to density functional theory (DFT)- and wavefunction-based methods for calculating the three-body interaction energy, our model makes predictions in a significantly shorter time frame by reducing the number of required SCF calculations from 7 to 4 performed at the PBE0 level of theory, showcasing the utility and efficiency of our Δ-ML method particularly when applied to larger systems.
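For orientation, the count of 7 calculations mentioned in this abstract matches the standard many-body expansion of a trimer, which needs one trimer, three dimer, and three monomer energies. The sketch below only illustrates that textbook bookkeeping; it is not the EDDIE-ML model, and the function name `three_body_energy`, the fragment labels, and the numerical energies are hypothetical placeholders.

```python
from itertools import combinations

FRAGMENTS = ("A", "B", "C")  # hypothetical fragment labels


def three_body_energy(energies):
    """Three-body term of the many-body expansion for a trimer.

    `energies` maps a frozenset of fragment labels to the total energy of
    that subsystem (any consistent unit). Seven entries are required:
    one trimer, three dimers, three monomers.
    """
    e_trimer = energies[frozenset(FRAGMENTS)]
    e_dimers = sum(energies[frozenset(pair)] for pair in combinations(FRAGMENTS, 2))
    e_monomers = sum(energies[frozenset((f,))] for f in FRAGMENTS)
    # dE3 = E(ABC) - [E(AB) + E(AC) + E(BC)] + [E(A) + E(B) + E(C)]
    return e_trimer - e_dimers + e_monomers


# Made-up energies purely to exercise the formula.
energies = {
    frozenset("A"): -1.0,
    frozenset("B"): -2.0,
    frozenset("C"): -3.0,
    frozenset("AB"): -3.1,
    frozenset("AC"): -4.2,
    frozenset("BC"): -5.3,
    frozenset("ABC"): -6.65,
}
print(len(energies), "subsystem energies ->", three_body_energy(energies))
```

Keying the dictionary by `frozenset` keeps the lookup independent of fragment order, so `AB` and `BA` refer to the same dimer.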
Affiliation(s)
- Kaycee Low
- Monash Computational Chemistry Group, School of Chemistry, Monash University, Clayton, Victoria 3800, Australia
- Michelle L Coote
- Institute for Nanoscale Science and Technology, College of Science and Engineering, Flinders University, Bedford Park, South Australia 5042, Australia
- Ekaterina I Izgorodina
- Monash Computational Chemistry Group, School of Chemistry, Monash University, Clayton, Victoria 3800, Australia
6
Wang D, Li W, Dong X, Li H, Hu L. TFRegNCI: Interpretable Noncovalent Interaction Correction Multimodal Based on Transformer Encoder Fusion. J Chem Inf Model 2023; 63:782-793. [PMID: 36652718 DOI: 10.1021/acs.jcim.2c01283] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/20/2023]
Abstract
Interpretability is an important issue for end-to-end learning models. Motivated by computer vision algorithms, an interpretable noncovalent interaction (NCI) correction multimodal (TFRegNCI) is proposed for NCI prediction. TFRegNCI is based on RegNet feature extraction and a transformer encoder fusion strategy. RegNet is a network design paradigm that mainly focuses on local features. Meanwhile, the Vision Transformer is also leveraged for feature extraction, because it can capture global features better than RegNet while lowering the computational cost. Using a transformer encoder as the fusion strategy rather than a multilayer perceptron can enhance model performance, due to its emphasis on important features with fewer parameters. Therefore, the proposed TFRegNCI achieves highly accurate predictions (mean absolute error of ∼0.1 kcal/mol) compared with the coupled cluster single double (triple) (CCSD(T)) benchmark. To further improve the model efficiency, TFRegNCI applies two-dimensional (2D) inputs transformed from three-dimensional (3D) electron density cubes, which saves time (30%) while retaining model accuracy. To improve model interpretability, a visualization module, Gradient-weighted Regression Activation Mapping (Grad-RAM), has been embedded. Grad-RAM is adapted from the classification algorithm, Gradient-weighted Class Activation Mapping, to perform feature visualization for the regression task. With Grad-RAM, the visual location map for features in deep learning models can be displayed. The feature map visualizations suggest that the 2D model performs similarly to the 3D model, because of equally effective feature extraction from electron density. Moreover, the valid feature region on the location map by the 3D model is consistent with the NCIPLOT NCI isosurface. It is confirmed that the model does extract significant features related to the NCI interaction. The interpretable analyses are carried out through molecular orbital contribution on effective features. Thereby, the proposed model is likely to be a promising tool to reveal some essential information on NCIs, with regard to the level of electronic theory.
Affiliation(s)
- Donghan Wang
- School of Information Science and Technology, Northeast Normal University, Changchun 130117, China
- Wenze Li
- College of Computer and Information Engineering, Henan Normal University, Henan, Xinxiang 453007, China
- Xu Dong
- School of Information Science and Technology, Northeast Normal University, Changchun 130117, China
- Hongzhi Li
- School of Information Science and Technology, Northeast Normal University, Changchun 130117, China
- LiHong Hu
- School of Information Science and Technology, Northeast Normal University, Changchun 130117, China
7
Nagy PR, Gyevi-Nagy L, Lőrincz BD, Kállay M. Pursuing the basis set limit of CCSD(T) non-covalent interaction energies for medium-sized complexes: case study on the S66 compilation. Mol Phys 2022. [DOI: 10.1080/00268976.2022.2109526] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/15/2022]
Affiliation(s)
- Péter R. Nagy
- Faculty of Chemical Technology and Biotechnology, Department of Physical Chemistry and Materials Science, Budapest University of Technology and Economics, Budapest, Hungary
- ELKH-BME Quantum Chemistry Research Group, Budapest, Hungary
- László Gyevi-Nagy
- Faculty of Chemical Technology and Biotechnology, Department of Physical Chemistry and Materials Science, Budapest University of Technology and Economics, Budapest, Hungary
- ELKH-BME Quantum Chemistry Research Group, Budapest, Hungary
- Balázs D. Lőrincz
- Faculty of Chemical Technology and Biotechnology, Department of Physical Chemistry and Materials Science, Budapest University of Technology and Economics, Budapest, Hungary
- ELKH-BME Quantum Chemistry Research Group, Budapest, Hungary
- Mihály Kállay
- Faculty of Chemical Technology and Biotechnology, Department of Physical Chemistry and Materials Science, Budapest University of Technology and Economics, Budapest, Hungary
- ELKH-BME Quantum Chemistry Research Group, Budapest, Hungary
8
Cheng L, Sun J, Miller TF. Accurate Molecular-Orbital-Based Machine Learning Energies via Unsupervised Clustering of Chemical Space. J Chem Theory Comput 2022; 18:4826-4835. [PMID: 35858242 DOI: 10.1021/acs.jctc.2c00396] [Citation(s) in RCA: 6] [Impact Index Per Article: 2.0] [Indexed: 11/29/2022]
Abstract
We introduce an unsupervised clustering algorithm to improve training efficiency and accuracy in predicting energies using molecular-orbital-based machine learning (MOB-ML). This work determines clusters via the Gaussian mixture model (GMM) in an entirely automatic manner and simplifies an earlier supervised clustering approach [ J. Chem. Theory Comput. 2019, 15, 6668] by eliminating both the necessity for user-specified parameters and the training of an additional classifier. Unsupervised clustering results from GMM have the advantages of accurately reproducing chemically intuitive groupings of frontier molecular orbitals and exhibiting improved performance with an increasing number of training examples. The resulting clusters from supervised or unsupervised clustering are further combined with scalable Gaussian process regression (GPR) or linear regression (LR) to learn molecular energies accurately by generating a local regression model in each cluster. Among all four combinations of regressors and clustering methods, GMM combined with scalable exact GPR (GMM/GPR) is the most efficient training protocol for MOB-ML. The numerical tests of molecular energy learning on thermalized data sets of drug-like molecules demonstrate the improved accuracy, transferability, and learning efficiency of GMM/GPR over other training protocols for MOB-ML, i.e., supervised regression clustering combined with GPR (RC/GPR) and GPR without clustering. GMM/GPR also provides the best molecular energy predictions compared with ones from the literature on the same benchmark data sets. With a lower scaling, GMM/GPR has a 10.4-fold speedup in wall-clock training time compared with scalable exact GPR with a training size of 6500 QM7b-T molecules.
Affiliation(s)
- Lixue Cheng
- Division of Chemistry and Chemical Engineering, California Institute of Technology, Pasadena, California 91125, United States
- Jiace Sun
- Division of Chemistry and Chemical Engineering, California Institute of Technology, Pasadena, California 91125, United States
- Thomas F Miller
- Division of Chemistry and Chemical Engineering, California Institute of Technology, Pasadena, California 91125, United States
9
Karandashev K, von Lilienfeld OA. An orbital-based representation for accurate quantum machine learning. J Chem Phys 2022; 156:114101. [DOI: 10.1063/5.0083301] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Indexed: 11/14/2022]
Abstract
We introduce an electronic structure based representation for quantum machine learning (QML) of electronic properties throughout chemical compound space. The representation is constructed using computationally inexpensive ab initio calculations and explicitly accounts for changes in the electronic structure. We demonstrate the accuracy and flexibility of resulting QML models when applied to property labels, such as total potential energy, HOMO and LUMO energies, ionization potential, and electron affinity, using as datasets for training and testing entries from the QM7b, QM7b-T, QM9, and LIBE libraries. For the latter, we also demonstrate the ability of this approach to account for molecular species of different charge and spin multiplicity, resulting in QML models that infer total potential energies based on geometry, charge, and spin as input.
Affiliation(s)
- O. Anatole von Lilienfeld
- Faculty of Physics, University of Vienna, Kolingasse 14-16, AT-1090 Wien, Austria
- Department of Chemistry, Institute of Physical Chemistry and National Center for Computational Design and Discovery of Novel Materials (MARVEL), University of Basel, Klingelbergstrasse 80, CH-4056 Basel, Switzerland
10
Li W, Wang D, Yang Z, Zhang H, Hu L, Chen G. DeepNCI: DFT Noncovalent Interaction Correction with Transferable Multimodal Three-Dimensional Convolutional Neural Networks. J Chem Inf Model 2021; 62:5090-5099. [PMID: 34958566 DOI: 10.1021/acs.jcim.1c01305] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.5] [Indexed: 12/24/2022]
Abstract
A multimodal deep learning model, DeepNCI, is proposed for improving noncovalent interactions (NCIs) calculated via density functional theory (DFT). DeepNCI is composed of a three-dimensional convolutional neural network (3D CNN) for abstracting critical and comprehensive features from 3D electron density, and a neural network for modeling one-dimensional quantum chemical properties. By merging features from the two networks, DeepNCI is able to reduce the root-mean-square error of DFT-calculated NCI from 1.19 kcal/mol to ∼0.2 kcal/mol for an NCI molecular database (>1000 molecules). The representativeness of the joint features can be visualized by t-distributed stochastic neighbor embedding (t-SNE), where they can distinguish categorized NCI systems quite well. Therefore, the fused model performs better than its component networks. In addition, the 3D CNN takes electron density as inputs that are in the same range, regardless of the size of the molecular systems, so it can promote model applicability and transferability. To clarify the applicability of DeepNCI, an application domain (AD) has been defined with merged features using the K-nearest-neighbor method. Calculations on external test sets show that the AD can properly monitor the reliability of a prediction. The model transferability is tested with a small database of homolysis bond dissociation energies including only dozens of samples. With NCI database pretrained parameters, the same or better performance than the reported results is achieved by transfer learning. This suggests that the DeepNCI model is transferable and may transfer to other related tasks, which could possibly resolve some small sampling problems. The source code of DeepNCI can be freely accessed at https://github.com/wenzelee/DeepNCI.
Affiliation(s)
- Wenze Li
- School of Information Science and Technology, Northeast Normal University, Changchun, 130117, China
- Donghan Wang
- School of Information Science and Technology, Northeast Normal University, Changchun, 130117, China
- Zirui Yang
- School of Information Science and Technology, Northeast Normal University, Changchun, 130117, China
- Huijie Zhang
- School of Information Science and Technology, Northeast Normal University, Changchun, 130117, China
- LiHong Hu
- School of Information Science and Technology, Northeast Normal University, Changchun, 130117, China
- GuanHua Chen
- Department of Chemistry, The University of Hong Kong, Hong Kong S.A.R., China
11
Abstract
Chemical compound space (CCS), the set of all theoretically conceivable combinations of chemical elements and (meta-)stable geometries that make up matter, is colossal. The first-principles based virtual sampling of this space, for example, in search of novel molecules or materials which exhibit desirable properties, is therefore prohibitive for all but the smallest subsets and simplest properties. We review studies aimed at tackling this challenge using modern machine learning techniques based on (i) synthetic data, typically generated using quantum mechanics based methods, and (ii) model architectures inspired by quantum mechanics. Such Quantum mechanics based Machine Learning (QML) approaches combine the numerical efficiency of statistical surrogate models with an ab initio view on matter. They rigorously reflect the underlying physics in order to reach universality and transferability across CCS. While state-of-the-art approximations to quantum problems impose severe computational bottlenecks, recent QML based developments indicate the possibility of substantial acceleration without sacrificing the predictive power of quantum mechanics.
Affiliation(s)
- Bing Huang
- Faculty of Physics, University of Vienna, 1090 Vienna, Austria
- O. Anatole von Lilienfeld
- Faculty of Physics, University of Vienna, 1090 Vienna, Austria
- Institute of Physical Chemistry and National Center for Computational Design and Discovery of Novel Materials (MARVEL), Department of Chemistry, University of Basel, 4056 Basel, Switzerland
12
Ballesteros F, Dunivan S, Lao KU. Coupled cluster benchmarks of large noncovalent complexes: The L7 dataset as well as DNA-ellipticine and buckycatcher-fullerene. J Chem Phys 2021; 154:154104. [PMID: 33887937 DOI: 10.1063/5.0042906] [Citation(s) in RCA: 23] [Impact Index Per Article: 5.8] [Indexed: 12/23/2022]
Abstract
In this work, benchmark binding energies for dispersion-bound complexes in the L7 dataset, the DNA-ellipticine intercalation complex, and the buckycatcher-C60 complex with 120 heavy atoms using a focal-point method based on the canonical form of second-order Møller-Plesset theory (MP2) and the domain based local pair natural orbital scheme for the coupled cluster with single, double, and perturbative triple excitations [CCSD(T)] extrapolated to the complete basis set (CBS) limit are reported. This work allows for increased confidence given the agreement with respect to values recently obtained using the local natural orbital CCSD(T) for L7 and the canonical CCSD(T)/CBS result for the coronene dimer (C2C2PD). Therefore, these results can be considered pushing the CCSD(T)/CBS binding benchmark to the hundred-atom scale. The disagreements between the two state-of-the-art methods, CCSD(T) and fixed-node diffusion Monte Carlo, are substantial with at least 2.0 (∼10%), 1.9 (∼5%), and 10.3 kcal/mol (∼25%) differences for C2C2PD in L7, DNA-ellipticine, and buckycatcher-C60, respectively. Such sizable discrepancy above "chemical accuracy" for large noncovalent complexes indicates how challenging it is to obtain benchmark binding interactions for systems beyond small molecules, although the three up-to-date density functionals, PBE0+D4, ωB97M-V, and B97M-V, agree better with CCSD(T) for these large systems. In addition to reporting these values, different basis sets and various CBS extrapolation parameters for Hartree-Fock and MP2 correlation energies were tested for the first time in large noncovalent complexes with the goal of providing some indications toward optimal cost effective routes to approach the CBS limit without substantial loss in quality.
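As a rough illustration of the focal-point idea named in this abstract, the sketch below combines a two-point inverse-power extrapolation of the correlation energy with a higher-order correction evaluated in a smaller basis. The X⁻³ form is the widely used Helgaker-style choice and is an assumption here: the abstract states that several extrapolation parameters were tested and does not say which ones. The function names `cbs_two_point` and `focal_point` and all numbers are hypothetical.

```python
def cbs_two_point(e_small, e_large, x_small, x_large, power=3.0):
    """Two-point extrapolation assuming E(X) = E_CBS + A * X**(-power).

    x_small and x_large are basis-set cardinal numbers (e.g. 3 for
    triple-zeta, 4 for quadruple-zeta). power=3 is an assumed default,
    not a value taken from the paper.
    """
    w_small = x_small ** power
    w_large = x_large ** power
    return (w_large * e_large - w_small * e_small) / (w_large - w_small)


def focal_point(e_mp2_cbs, e_ccsdt_small, e_mp2_small):
    """Focal-point estimate: MP2 at the CBS limit plus the CCSD(T) - MP2
    difference evaluated in the same, smaller basis set."""
    return e_mp2_cbs + (e_ccsdt_small - e_mp2_small)


# Synthetic correlation energies that follow the model exactly
# (E_CBS = -1.0, A = 0.27), so the extrapolation should recover -1.0.
e_tz = -1.0 + 0.27 / 3 ** 3
e_qz = -1.0 + 0.27 / 4 ** 3
e_mp2_cbs = cbs_two_point(e_tz, e_qz, 3, 4)
print(e_mp2_cbs, focal_point(e_mp2_cbs, -0.95, -0.90))
```

In practice the Hartree-Fock and correlation components are usually extrapolated separately, and binding energies are formed from the complex and monomer totals afterwards; the sketch leaves both steps out.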
Affiliation(s)
- Francisco Ballesteros
- Department of Chemistry, Virginia Commonwealth University, Richmond, Virginia 23284, USA
- Shelbie Dunivan
- Department of Chemistry, Virginia Commonwealth University, Richmond, Virginia 23284, USA
- Ka Un Lao
- Department of Chemistry, Virginia Commonwealth University, Richmond, Virginia 23284, USA
13
Käser S, Boittier ED, Upadhyay M, Meuwly M. Transfer Learning to CCSD(T): Accurate Anharmonic Frequencies from Machine Learning Models. J Chem Theory Comput 2021; 17:3687-3699. [PMID: 33960787 DOI: 10.1021/acs.jctc.1c00249] [Citation(s) in RCA: 23] [Impact Index Per Article: 5.8] [Indexed: 12/21/2022]
Abstract
The calculation of the anharmonic modes of small- to medium-sized molecules for assigning experimentally measured frequencies to the corresponding type of molecular motions is computationally challenging at sufficiently high levels of quantum chemical theory. Here, a practical and affordable way to calculate coupled-cluster quality anharmonic frequencies using second-order vibrational perturbation theory (VPT2) from machine-learned models is presented. The approach, referred to as "NN + VPT2", uses a high-dimensional neural network (PhysNet) to learn potential energy surfaces (PESs) at different levels of theory from which harmonic and VPT2 frequencies can be efficiently determined. The NN + VPT2 approach is applied to eight small- to medium-sized molecules (H2CO, trans-HONO, HCOOH, CH3OH, CH3CHO, CH3NO2, CH3COOH, and CH3CONH2) and frequencies are reported from NN-learned models at the MP2/aug-cc-pVTZ, CCSD(T)/aug-cc-pVTZ, and CCSD(T)-F12/aug-cc-pVTZ-F12 levels of theory. For the largest molecules and at the highest levels of theory, transfer learning (TL) is used to determine the necessary full-dimensional, near-equilibrium PESs. Overall, NN + VPT2 yields anharmonic frequencies to within 20 cm⁻¹ of experimentally determined frequencies for close to 90% of the modes for the highest quality PES available and to within 10 cm⁻¹ for more than 60% of the modes. For the MP2 PESs, only ∼60% of the NN + VPT2 frequencies were within 20 cm⁻¹ of the experiment, with outliers of up to ∼150 cm⁻¹. It is also demonstrated that the approach provides correct assignments for strongly interacting modes such as the OH bending and the OH torsional modes in formic acid monomer and the CO-stretch and OH-bend modes in acetic acid.
Affiliation(s)
- Silvan Käser
- Department of Chemistry, University of Basel, Klingelbergstrasse 80, CH-4056 Basel, Switzerland
| | - Eric D Boittier
- Department of Chemistry, University of Basel, Klingelbergstrasse 80, CH-4056 Basel, Switzerland
| | - Meenu Upadhyay
- Department of Chemistry, University of Basel, Klingelbergstrasse 80, CH-4056 Basel, Switzerland
| | - Markus Meuwly
- Department of Chemistry, University of Basel, Klingelbergstrasse 80, CH-4056 Basel, Switzerland
| |
|
14
|
Gyevi-Nagy L, Kállay M, Nagy PR. Accurate Reduced-Cost CCSD(T) Energies: Parallel Implementation, Benchmarks, and Large-Scale Applications. J Chem Theory Comput 2021; 17:860-878. [PMID: 33400527 PMCID: PMC7884001 DOI: 10.1021/acs.jctc.0c01077] [Citation(s) in RCA: 43] [Impact Index Per Article: 10.8] [Received: 10/13/2020] [Indexed: 11/28/2022]
Abstract
The accurate and systematically improvable frozen natural orbital (FNO) and natural auxiliary function (NAF) cost-reducing approaches are combined with our recent coupled-cluster singles, doubles, and perturbative triples [CCSD(T)] implementations. Both of the closed- and open-shell FNO-CCSD(T) codes benefit from OpenMP parallelism, completely or partially integral-direct density-fitting algorithms, checkpointing, and hand-optimized, memory- and operation count effective implementations exploiting all permutational symmetries. The closed-shell CCSD(T) code requires negligible disk I/O and network bandwidth, is MPI/OpenMP parallel, and exhibits outstanding peak performance utilization of 50-70% up to hundreds of cores. Conservative FNO and NAF truncation thresholds benchmarked for challenging reaction, atomization, and ionization energies of both closed- and open-shell species are shown to maintain 1 kJ/mol accuracy against canonical CCSD(T) for systems of 31-43 atoms even with large basis sets. The cost reduction of up to an order of magnitude achieved extends the reach of FNO-CCSD(T) to systems of 50-75 atoms (up to 2124 atomic orbitals) with triple- and quadruple-ζ basis sets, which is unprecedented without local approximations. Consequently, a considerably larger portion of the chemical compound space can now be covered by the practically "gold standard" quality FNO-CCSD(T) method using affordable resources and about a week of wall time. Large-scale applications are presented for organocatalytic and transition-metal reactions as well as noncovalent interactions. Possible applications for benchmarking local CCSD(T) methods, as well as for the accuracy assessment or parametrization of less complete models, for example, density functional approximations or machine learning potentials, are also outlined.
Affiliation(s)
- László Gyevi-Nagy
- Department of Physical Chemistry and Materials Science, Budapest University of Technology and Economics, P.O. Box 91, H-1521 Budapest, Hungary
| | - Mihály Kállay
- Department of Physical Chemistry and Materials Science, Budapest University of Technology and Economics, P.O. Box 91, H-1521 Budapest, Hungary
| | - Péter R. Nagy
- Department of Physical Chemistry and Materials Science, Budapest University of Technology and Economics, P.O. Box 91, H-1521 Budapest, Hungary
| |
|
15
|
Jesus WS, Prudente FV, Marques JMC, Pereira FB. Modeling microsolvation clusters with electronic-structure calculations guided by analytical potentials and predictive machine learning techniques. Phys Chem Chem Phys 2021; 23:1738-1749. [PMID: 33427847 DOI: 10.1039/d0cp05200k] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.0] [Indexed: 01/16/2023]
Abstract
We propose a new methodology to study, at the density functional theory (DFT) level, the clusters resulting from the microsolvation of alkali-metal ions with rare-gas atoms. The workflow begins with a global optimization search to generate a pool of low-energy minimum structures for different cluster sizes. This is achieved by employing an analytical potential energy surface (PES) and an evolutionary algorithm (EA). The next main stage of the methodology is devoted to establishing an adequate DFT approach to treat the microsolvation system, through a systematic benchmark study involving several combinations of functionals and basis sets, in order to characterize the global minimum structures of the smaller clusters. In the next stage, we apply machine learning (ML) classification algorithms to predict how the low-energy minima of the analytical PES map to the DFT ones. An early and accurate detection of likely DFT local minima is extremely important to guide the choice of the most promising low-energy minima of large clusters to be re-optimized at the DFT level of theory. In this work, the methodology was applied to the Li⁺Krₙ (n = 2-14 and 16) microsolvation clusters, for which the most competitive DFT approach was found to be B3LYP-D3/aug-pcseg-1. Additionally, the ML classifier was able to accurately predict most of the solutions to be re-optimized at the DFT level of theory, thereby greatly enhancing the efficiency of the process and allowing its applicability to larger clusters.
Affiliation(s)
- W S Jesus
- Instituto de Física, Universidade Federal da Bahia, 40170-115 Salvador, BA, Brazil.
| | - F V Prudente
- Instituto de Física, Universidade Federal da Bahia, 40170-115 Salvador, BA, Brazil.
| | - J M C Marques
- CQC, Department of Chemistry, University of Coimbra, 3004-535 Coimbra, Portugal.
| | - F B Pereira
- Coimbra Polytechnic - ISEC, Coimbra, Portugal and Centro de Informática e Sistemas da Universidade de Coimbra (CISUC), Coimbra, Portugal.
| |
|
16
|
Abstract
We introduce new and robust decompositions of mean-field Hartree-Fock and Kohn-Sham density functional theory relying on the use of localized molecular orbitals and physically sound charge population protocols. The new lossless property decompositions, which allow for partitioning one-electron reduced density matrices into either bond-wise or atomic contributions, are compared to alternatives from the literature with regard to both molecular energies and dipole moments. Besides commenting on possible applications as an interpretative tool in the rationalization of certain electronic phenomena, we demonstrate how decomposed mean-field theory makes it possible to expose and amplify compositional features in the context of machine-learned quantum chemistry. This is made possible by improving upon the granularity of the underlying data. On the basis of our preliminary proof-of-concept results, we conjecture that many of the structure-property inferences in existence today may be further refined by efficiently leveraging an increase in dataset complexity and richness.
Affiliation(s)
- Janus J Eriksen
- School of Chemistry, University of Bristol, Cantock's Close, Bristol BS8 1TS, United Kingdom
| |
|
17
|
Abstract
A broad range of approaches to many-body dispersion are discussed, including empirical approaches with multiple fitted parameters, augmented density functional-based approaches, symmetry-adapted perturbation theory, and a supermolecule approach based on coupled cluster theory. Differing definitions of "body" are considered, specifically atom-based vs. molecule-based approaches.
Affiliation(s)
- Peng Xu
- Department of Chemistry, Iowa State University, Ames, Iowa 50014, United States
| | - Melisa Alkan
- Department of Chemistry, Iowa State University, Ames, Iowa 50014, United States
| | - Mark S Gordon
- Department of Chemistry, Iowa State University, Ames, Iowa 50014, United States
| |
|
18
|
Stocker S, Csányi G, Reuter K, Margraf JT. Machine learning in chemical reaction space. Nat Commun 2020; 11:5505. [PMID: 33127879 PMCID: PMC7603480 DOI: 10.1038/s41467-020-19267-x] [Citation(s) in RCA: 65] [Impact Index Per Article: 13.0] [Received: 05/13/2020] [Accepted: 10/01/2020] [Indexed: 12/29/2022] Open
Abstract
Chemical compound space refers to the vast set of all possible chemical compounds, estimated to contain 10⁶⁰ molecules. While intractable as a whole, modern machine learning (ML) is increasingly capable of accurately predicting molecular properties in important subsets. Here, we therefore engage in the ML-driven study of even larger reaction space. Central to chemistry as a science of transformations, this space contains all possible chemical reactions. As an important basis for 'reactive' ML, we establish a first-principles database (Rad-6) containing closed and open-shell organic molecules, along with an associated database of chemical reaction energies (Rad-6-RE). We show that the special topology of reaction spaces, with central hub molecules involved in multiple reactions, requires a modification of existing compound space ML-concepts. Showcased by the application to methane combustion, we demonstrate that the learned reaction energies offer a non-empirical route to rationally extract reduced reaction networks for detailed microkinetic analyses.
Affiliation(s)
- Sina Stocker
- Chair of Theoretical Chemistry and Catalysis Research Center, Technische Universität München, Garching, Germany
| | - Gábor Csányi
- Engineering Laboratory, University of Cambridge, Cambridge, CB2 1PZ, UK
| | - Karsten Reuter
- Chair of Theoretical Chemistry and Catalysis Research Center, Technische Universität München, Garching, Germany
- Fritz-Haber-Institut der Max-Planck-Gesellschaft, Berlin, Germany
| | - Johannes T Margraf
- Chair of Theoretical Chemistry and Catalysis Research Center, Technische Universität München, Garching, Germany.
| |
|
19
|
von Rudorff GF, Heinen SN, Bragato M, von Lilienfeld OA. Thousands of reactants and transition states for competing E2 and S$_\mathrm{N}$2 reactions. Machine Learning: Science and Technology 2020. [DOI: 10.1088/2632-2153/aba822] [Citation(s) in RCA: 19] [Impact Index Per Article: 3.8] [Indexed: 12/17/2022]
|