1
|
Grigoryan IV, Antiufrieva LA, Grigoryan AP, Pigareva VA, Generalov EA, Khomutov GB, Sybachin AV. IPECnet: ML model for predicting the area of water solubility of interpolyelectrolyte complexes. Phys Chem Chem Phys 2025; 27:8136-8147. [PMID: 40172530 DOI: 10.1039/d4cp04775c] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 04/04/2025]
Abstract
Interpolyelectrolyte complexes (IPECs) have been known for years as classic representatives of smart polymers. The solubility of IPECs in water-salt media is driven by numerous factors connected with polymer component parameters and media composition. This work is devoted to the development of the world's first machine learning-based model for predicting the area of existence of water-soluble IPECs for solving biomedical problems. A new approach is proposed that takes into account both the physico-chemical properties of polyelectrolytes and the chemical structures of their monomeric units. The developed approach is universal and can be used to predict the properties of multicomponent systems of a different chemical nature. The results of the work were applied to select the composition of water-soluble IPECs for treatment of surfaces in order to create bactericidal coatings. The dataset and model structures are publicly available on GitHub.
Affiliation(s)
- Ilya V Grigoryan
- Physics Department of Lomonosov Moscow State University, Leninskie Gory, 1-2, Moscow, 199991, Russia.
- Kotelnikov Institute of Radioengineering and Electronics, Russian Academy of Sciences, Moscow, 125009, Russia
| | - Liubov A Antiufrieva
- Skolkovo Institute of Science and Technology, the territory of the Skolkovo Innovation Center, Bolshoy Boulevard, 30, bld. 1, Moscow, 121205, Russia
| | - Anna P Grigoryan
- Physics Department of Lomonosov Moscow State University, Leninskie Gory, 1-2, Moscow, 199991, Russia.
- Faculty of Space Research of Lomonosov Moscow State University, Leninskiye Gory, 1-52, Moscow, 119991, Russia
| | - Vladislava A Pigareva
- A. N. Nesmeyanov Institute of Organoelement compounds Russian Academy of Sciences, Vavilova St., 28, bld. 1, Moscow, 119334, Russia
| | - Evgenii A Generalov
- Physics Department of Lomonosov Moscow State University, Leninskie Gory, 1-2, Moscow, 199991, Russia.
| | - Gennady B Khomutov
- Physics Department of Lomonosov Moscow State University, Leninskie Gory, 1-2, Moscow, 199991, Russia.
- Kotelnikov Institute of Radioengineering and Electronics, Russian Academy of Sciences, Moscow, 125009, Russia
| | - Andrey V Sybachin
- Chemistry Department of Lomonosov Moscow State University, Leninskie Gory, 1-3, Moscow, 199991, Russia.
| |
|
2
|
Pathirage PDVS, Quebedeaux B, Akram S, Vogiatzis KD. Transferability Across Different Molecular Systems and Levels of Theory with the Data-Driven Coupled-Cluster Scheme. J Phys Chem A 2025; 129:2988-2997. [PMID: 40132101 DOI: 10.1021/acs.jpca.4c05718] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 03/27/2025]
Abstract
Machine learning has recently been introduced into the arsenal of tools that are available to computational chemists. In the past few years, we have seen an increase in the applicability of these tools on a plethora of applications, including the automated exploration of a large fraction of the chemical space, the reduction of repetitive computational tasks, the detection of outliers on large databases, and the acceleration of molecular simulations. An attractive application of machine learning in molecular electronic structure theory is the "recycling" of molecular wave functions for faster and more accurate completion of complex quantum chemical calculations. Along these lines, we have developed hybrid quantum chemical/machine learning workflows that utilize information from low-level wave functions for the accurate prediction of higher-level wave functions. The data-driven coupled-cluster (DDCC) family of methods is discussed in this article together with the importance of the inclusion of physical properties in such hybrid workflows. After a short introduction to the philosophy and the capabilities of DDCC, we present our recent progress in extending its applicability to larger and more complex molecular structures and data sets. A significant advantage offered by DDCC is its transferability, with respect to different molecular systems and different excitation levels. As we show here, predicted wave functions at the coupled-cluster singles and doubles level of theory can be used for the accurate prediction of the perturbative triples of the CCSD(T) scheme. We conclude with some personal considerations with respect to future directions related to the development of the next generation of such hybrid quantum chemical/machine learning models.
Affiliation(s)
- P D Varuna S Pathirage
- Department of Chemistry, University of Tennessee, Knoxville, Tennessee 37996-1600, United States
| | - Brody Quebedeaux
- Department of Chemistry, University of Tennessee, Knoxville, Tennessee 37996-1600, United States
| | - Shahzad Akram
- Department of Chemistry, University of Tennessee, Knoxville, Tennessee 37996-1600, United States
| | - Konstantinos D Vogiatzis
- Department of Chemistry, University of Tennessee, Knoxville, Tennessee 37996-1600, United States
| |
|
3
|
Golub P, Yang C, Vlček V, Veis L. Quantum Chemical Density Matrix Renormalization Group Method Boosted by Machine Learning. J Phys Chem Lett 2025; 16:3295-3301. [PMID: 40126916 PMCID: PMC11973911 DOI: 10.1021/acs.jpclett.5c00207] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/21/2025] [Revised: 03/07/2025] [Accepted: 03/19/2025] [Indexed: 03/26/2025]
Abstract
The use of machine learning (ML) to refine low-level theoretical calculations to achieve higher accuracy is a promising and actively evolving approach known as Δ-ML. The density matrix renormalization group (DMRG) is a powerful variational approach widely used for studying strongly correlated quantum systems. High computational efficiency can be achieved without compromising accuracy. Here, we demonstrate the potential of a simple ML model to significantly enhance the performance of the quantum chemical DMRG method.
Affiliation(s)
- Pavlo Golub
- J. Heyrovský Institute of Physical Chemistry, v.v.i., Czech Academy of Sciences, Prague, 18223, Czech Republic
| | - Chao Yang
- Applied Mathematics and Computational Research Division, Lawrence Berkeley National Laboratory, Berkeley, 94720, United States
| | - Vojtěch Vlček
- Department of Chemistry and Biochemistry, University of California, Santa Barbara, Santa Barbara, 93117, United States
- Department of Materials, University of California, Santa Barbara, Santa Barbara, 93117, United States
| | - Libor Veis
- J. Heyrovský Institute of Physical Chemistry, v.v.i., Czech Academy of Sciences, Prague, 18223, Czech Republic
| |
|
4
|
Ng WP, Zhang Z, Yang J. Accurate Neural Network Fine-Tuning Approach for Transferable Ab Initio Energy Prediction across Varying Molecular and Crystalline Scales. J Chem Theory Comput 2025; 21:1602-1614. [PMID: 39902570 DOI: 10.1021/acs.jctc.4c01261] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 02/05/2025]
Abstract
Existing machine learning models attempt to predict the energies of large molecules by training on small molecules, but eventually fail to retain high accuracy as the errors increase with system size. Through an orbital pairwise decomposition of the correlation energy, a neural network model pretrained on hundred-scale data containing small molecules is demonstrated to be sufficiently transferable for accurately predicting large systems, including molecules and crystals. Our model introduces a residual connection to explicitly learn the pairwise energy corrections, and employs various low-rank retraining techniques to modestly adjust the learned network parameters. We demonstrate that by retraining the base model, originally trained only on small (H2O)6 molecules, with as few as one larger molecule, the MP2 correlation energy of large liquid water (H2O)64 in a periodic supercell can be predicted at chemical accuracy. Similar performance is observed for large protonated clusters and periodic polyglycine chains. A demonstrative application is presented to predict the energy ordering of symmetrically inequivalent sublattices for distinct hydrogen orientations in the ice XV phase. Our work represents an important step forward in the quest for cost-effective, highly accurate and transferable neural network models in quantum chemistry, bridging the electronic structure patterns between small and large systems.
Affiliation(s)
- Wai-Pan Ng
- Department of Chemistry, The University of Hong Kong, Hong Kong 999077, P. R. China
| | - Zili Zhang
- Department of Chemistry, The University of Hong Kong, Hong Kong 999077, P. R. China
| | - Jun Yang
- Department of Chemistry, The University of Hong Kong, Hong Kong 999077, P. R. China
- Hong Kong Quantum AI Lab Limited, Hong Kong 999077, P. R. China
| |
|
5
|
Chen J, Gao Q, Huang M, Yu K. Application of modern artificial intelligence techniques in the development of organic molecular force fields. Phys Chem Chem Phys 2025; 27:2294-2319. [PMID: 39820957 DOI: 10.1039/d4cp02989e] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/19/2025]
Abstract
The molecular force field (FF) determines the accuracy of molecular dynamics (MD) and is one of the major bottlenecks that limits the application of MD in molecular design. Recently, artificial intelligence (AI) techniques, such as machine-learning potentials (MLPs), have been rapidly reshaping the landscape of MD. Meanwhile, organic molecular systems feature unique characteristics, and require more careful treatment in model construction, optimization, and validation. While an accurate and generic organic molecular force field is still missing, significant progress has been made with the facilitation of AI, warranting a promising future. In this review, we provide an overview of the various types of AI techniques used in molecular FF development and discuss both the advantages and weaknesses of these methodologies. We show how AI methods provide unprecedented capabilities in many tasks such as potential fitting, atom typification, and automatic optimization. Meanwhile, it is also worth noting that more efforts are needed to improve the transferability of the model, develop a more comprehensive database, and establish more standardized validation procedures. With these discussions, we hope to inspire more efforts to solve the existing problems, eventually leading to the birth of next-generation generic organic FFs.
Affiliation(s)
- Junmin Chen
- Institute of Materials Research (IMR), Tsinghua Shenzhen International Graduate School, Tsinghua University, Shenzhen, 518055, China.
- Tsinghua-Berkeley Shenzhen Institute (TBSI), Tsinghua Shenzhen International Graduate School, Tsinghua University, Shenzhen, 518055, China
| | - Qian Gao
- Institute of Materials Research (IMR), Tsinghua Shenzhen International Graduate School, Tsinghua University, Shenzhen, 518055, China.
| | - Miaofei Huang
- Institute of Materials Research (IMR), Tsinghua Shenzhen International Graduate School, Tsinghua University, Shenzhen, 518055, China.
| | - Kuang Yu
- Institute of Materials Research (IMR), Tsinghua Shenzhen International Graduate School, Tsinghua University, Shenzhen, 518055, China.
- Tsinghua-Berkeley Shenzhen Institute (TBSI), Tsinghua Shenzhen International Graduate School, Tsinghua University, Shenzhen, 518055, China
| |
|
6
|
Aldossary A, Campos-Gonzalez-Angulo JA, Pablo-García S, Leong SX, Rajaonson EM, Thiede L, Tom G, Wang A, Avagliano D, Aspuru-Guzik A. In Silico Chemical Experiments in the Age of AI: From Quantum Chemistry to Machine Learning and Back. Advanced Materials (Deerfield Beach, Fla.) 2024; 36:e2402369. [PMID: 38794859 DOI: 10.1002/adma.202402369] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/15/2024] [Revised: 04/28/2024] [Indexed: 05/26/2024]
Abstract
Computational chemistry is an indispensable tool for understanding molecules and predicting chemical properties. However, traditional computational methods face significant challenges due to the difficulty of solving the Schrödinger equations and the increasing computational cost with the size of the molecular system. In response, there has been a surge of interest in leveraging artificial intelligence (AI) and machine learning (ML) techniques for in silico experiments. Integrating AI and ML into computational chemistry increases the scalability and speed of the exploration of chemical space. However, challenges remain, particularly regarding the reproducibility and transferability of ML models. This review highlights the evolution of ML in learning from, complementing, or replacing traditional computational chemistry for energy and property predictions. Starting from models trained entirely on numerical data, a journey is set forth toward the ideal model incorporating or learning the physical laws of quantum mechanics. This paper also reviews existing computational methods and ML models and their intertwining, outlines a roadmap for future research, and identifies areas for improvement and innovation. Ultimately, the goal is to develop AI architectures capable of predicting accurate and transferable solutions to the Schrödinger equation, thereby revolutionizing in silico experiments within chemistry and materials science.
Affiliation(s)
- Abdulrahman Aldossary
- Department of Chemistry, University of Toronto, 80 St. George Street, Toronto, ON, M5S 3H6, Canada
| | | | - Sergio Pablo-García
- Department of Chemistry, University of Toronto, 80 St. George Street, Toronto, ON, M5S 3H6, Canada
- Department of Computer Science, University of Toronto, 40 St. George Street, Toronto, ON, M5S 2E4, Canada
| | - Shi Xuan Leong
- Department of Chemistry, University of Toronto, 80 St. George Street, Toronto, ON, M5S 3H6, Canada
| | - Ella Miray Rajaonson
- Department of Chemistry, University of Toronto, 80 St. George Street, Toronto, ON, M5S 3H6, Canada
- Vector Institute for Artificial Intelligence, 661 University Ave. Suite 710, Toronto, ON, M5G 1M1, Canada
| | - Luca Thiede
- Department of Computer Science, University of Toronto, 40 St. George Street, Toronto, ON, M5S 2E4, Canada
- Vector Institute for Artificial Intelligence, 661 University Ave. Suite 710, Toronto, ON, M5G 1M1, Canada
| | - Gary Tom
- Department of Chemistry, University of Toronto, 80 St. George Street, Toronto, ON, M5S 3H6, Canada
- Vector Institute for Artificial Intelligence, 661 University Ave. Suite 710, Toronto, ON, M5G 1M1, Canada
| | - Andrew Wang
- Department of Chemistry, University of Toronto, 80 St. George Street, Toronto, ON, M5S 3H6, Canada
| | - Davide Avagliano
- Chimie ParisTech, PSL University, CNRS, Institute of Chemistry for Life and Health Sciences (iCLeHS UMR 8060), Paris, F-75005, France
| | - Alán Aspuru-Guzik
- Department of Chemistry, University of Toronto, 80 St. George Street, Toronto, ON, M5S 3H6, Canada
- Department of Computer Science, University of Toronto, 40 St. George Street, Toronto, ON, M5S 2E4, Canada
- Vector Institute for Artificial Intelligence, 661 University Ave. Suite 710, Toronto, ON, M5G 1M1, Canada
- Department of Materials Science & Engineering, University of Toronto, 184 College St., Toronto, ON, M5S 3E4, Canada
- Department of Chemical Engineering & Applied Chemistry, University of Toronto, 200 College St., Toronto, ON, M5S 3E5, Canada
- Lebovic Fellow, Canadian Institute for Advanced Research (CIFAR), 66118 University Ave., Toronto, M5G 1M1, Canada
- Acceleration Consortium, 80 St George St, Toronto, M5S 3H6, Canada
| |
|
7
|
Pathirage PDVS, Phillips JT, Vogiatzis KD. Exploration of the Two-Electron Excitation Space with Data-Driven Coupled Cluster. J Phys Chem A 2024. [PMID: 38422511 DOI: 10.1021/acs.jpca.3c06600] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 03/02/2024]
Abstract
Computational cost limits the applicability of post-Hartree-Fock methods such as coupled-cluster on larger molecular systems. The data-driven coupled-cluster (DDCC) method applies machine learning to predict the coupled-cluster two-electron amplitudes (t2) using data from second-order perturbation theory (MP2). One major limitation of the DDCC models is the size of training sets that increases exponentially with the system size. Effective sampling of the amplitude space can resolve this issue. Five different amplitude selection techniques that reduce the amount of data used for training were evaluated, an approach that also prevents model overfitting and increases the portability of data-driven coupled-cluster singles and doubles to more complex molecules or larger basis sets. In combination with a localized orbital formalism to predict the CCSD t2 amplitudes, we have achieved a 10-fold error reduction for energy calculations.
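As a rough illustration of what reducing the training data by amplitude selection can look like, the sketch below keeps only the largest-magnitude MP2 amplitudes as training samples. The selection rule, the function name `select_by_magnitude`, and the toy amplitude values are assumptions made for illustration; the abstract does not say which five techniques were evaluated, and this rule is not claimed to be one of them.

```python
def select_by_magnitude(mp2_amplitudes, keep_fraction):
    """Return the indices of the largest-|t2| amplitudes (hypothetical rule).

    mp2_amplitudes: flat list of MP2 two-electron amplitudes.
    keep_fraction:  fraction of amplitudes to keep, in (0, 1].
    """
    if not 0 < keep_fraction <= 1:
        raise ValueError("keep_fraction must be in (0, 1]")
    n_keep = max(1, int(len(mp2_amplitudes) * keep_fraction))
    # Rank indices by absolute amplitude, largest first.
    ranked = sorted(
        range(len(mp2_amplitudes)),
        key=lambda i: abs(mp2_amplitudes[i]),
        reverse=True,
    )
    # Return the kept indices in their original order.
    return sorted(ranked[:n_keep])


# Toy amplitudes (made-up numbers, not taken from any calculation).
mp2_t2 = [0.12, -0.0003, 0.045, -0.08, 0.0001, 0.002, -0.031, 0.0009]
selected = select_by_magnitude(mp2_t2, keep_fraction=0.5)
training_amplitudes = [mp2_t2[i] for i in selected]
```

Only the selected amplitudes would then be passed to the regression model, which shrinks the training set by the chosen fraction.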
Affiliation(s)
- P D Varuna S Pathirage
- Department of Chemistry, University of Tennessee, Knoxville, Tennessee 37996-1600, United States
| | - Justin T Phillips
- Department of Chemistry, University of Tennessee, Knoxville, Tennessee 37996-1600, United States
| | - Konstantinos D Vogiatzis
- Department of Chemistry, University of Tennessee, Knoxville, Tennessee 37996-1600, United States
| |
|
8
|
Ng WP, Liang Q, Yang J. Low-Data Deep Quantum Chemical Learning for Accurate MP2 and Coupled-Cluster Correlations. J Chem Theory Comput 2023; 19:5439-5449. [PMID: 37506400 DOI: 10.1021/acs.jctc.3c00518] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 07/30/2023]
Abstract
Accurate ab initio prediction of electronic energies is very expensive for macromolecules by explicitly solving post-Hartree-Fock equations. We here exploit the physically justified local correlation feature in a compact basis of small molecules and construct an expressive low-data deep neural network (dNN) model to obtain machine-learned electron correlation energies on par with MP2 and CCSD levels of theory for more complex molecules and different datasets that are not represented in the training set. We show that our dNN-powered model is data efficient and makes highly transferable predictions across alkanes of various lengths, organic molecules with non-covalent and biomolecular interactions, as well as water clusters of different sizes and morphologies. In particular, by training on 800 (H2O)8 clusters with the local correlation descriptors, accurate MP2/cc-pVTZ correlation energies up to (H2O)128 can be predicted with a small random error within chemical accuracy from exact values, while a majority of prediction deviations are attributed to an intrinsically systematic error. Our results reveal that an extremely compact local correlation feature set, which is poor for any direct post-Hartree-Fock calculations, nevertheless has a prominent advantage in preserving important electron correlation patterns for making accurate transferable predictions across distinct molecular compositions, bond types, and geometries.
Affiliation(s)
- Wai-Pan Ng
- Department of Chemistry, The University of Hong Kong, Hong Kong 999077, P. R. China
- Hong Kong Quantum AI Lab Limited, Hong Kong 999077, P. R. China
| | - Qiujiang Liang
- Department of Chemistry, The University of Hong Kong, Hong Kong 999077, P. R. China
| | - Jun Yang
- Department of Chemistry, The University of Hong Kong, Hong Kong 999077, P. R. China
- Hong Kong Quantum AI Lab Limited, Hong Kong 999077, P. R. China
| |
|
9
|
Hagg A, Kirschner KN. Open-Source Machine Learning in Computational Chemistry. J Chem Inf Model 2023; 63:4505-4532. [PMID: 37466636 PMCID: PMC10430767 DOI: 10.1021/acs.jcim.3c00643] [Citation(s) in RCA: 10] [Impact Index Per Article: 5.0] [Received: 05/04/2023] [Indexed: 07/20/2023]
Abstract
The field of computational chemistry has seen a significant increase in the integration of machine learning concepts and algorithms. In this Perspective, we surveyed 179 open-source software projects, with corresponding peer-reviewed papers published within the last 5 years, to better understand the topics within the field being investigated by machine learning approaches. For each project, we provide a short description, the link to the code, the accompanying license type, and whether the training data and resulting models are made publicly available. Based on those deposited in GitHub repositories, the most popular employed Python libraries are identified. We hope that this survey will serve as a resource to learn about machine learning or specific architectures thereof by identifying accessible codes with accompanying papers on a topic basis. To this end, we also include computational chemistry open-source software for generating training data and fundamental Python libraries for machine learning. Based on our observations and considering the three pillars of collaborative machine learning work, open data, open source (code), and open models, we provide some suggestions to the community.
Affiliation(s)
- Alexander Hagg
- Institute of Technology, Resource and Energy-Efficient Engineering (TREE), University of Applied Sciences Bonn-Rhein-Sieg, 53757 Sankt Augustin, Germany
- Department of Electrical Engineering, Mechanical Engineering and Technical Journalism, University of Applied Sciences Bonn-Rhein-Sieg, 53757 Sankt Augustin, Germany
| | - Karl N. Kirschner
- Institute of Technology, Resource and Energy-Efficient Engineering (TREE), University of Applied Sciences Bonn-Rhein-Sieg, 53757 Sankt Augustin, Germany
- Department of Computer Science, University of Applied Sciences Bonn-Rhein-Sieg, 53757 Sankt Augustin, Germany
| |
|
10
|
Jones GM, Li RR, DePrince AE, Vogiatzis KD. Data-Driven Refinement of Electronic Energies from Two-Electron Reduced-Density-Matrix Theory. J Phys Chem Lett 2023:6377-6385. [PMID: 37418691 DOI: 10.1021/acs.jpclett.3c01382] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 07/09/2023]
Abstract
The exponential computational cost of describing strongly correlated electrons can be mitigated by adopting a reduced-density matrix (RDM)-based description of the electronic structure. While variational two-electron RDM (v2RDM) methods can enable large-scale calculations on such systems, the quality of the solution is limited by the fact that only a subset of known necessary N-representability constraints can be applied to the 2RDM in practical calculations. Here, we demonstrate that violations of partial three-particle (T1 and T2) N-representability conditions, which can be evaluated with knowledge of only the 2RDM, can serve as physics-based features in a machine-learning (ML) protocol for improving energies from v2RDM calculations that consider only two-particle (PQG) conditions. Proof-of-principle calculations demonstrate that the model yields substantially improved energies relative to reference values from configuration-interaction-based calculations.
Affiliation(s)
- Grier M Jones
- Department of Chemistry, University of Tennessee, Knoxville, Tennessee 37996, United States
| | - Run R Li
- Department of Chemistry and Biochemistry, Florida State University, Tallahassee, Florida 32306-4390, United States
| | - A Eugene DePrince
- Department of Chemistry and Biochemistry, Florida State University, Tallahassee, Florida 32306-4390, United States
| | | |
|
11
|
Dandu NK, Ward L, Assary RS, Redfern PC, Curtiss LA. Accurate Prediction of Adiabatic Ionization Potentials of Organic Molecules using Quantum Chemistry Assisted Machine Learning. J Phys Chem A 2023. [PMID: 37406209 DOI: 10.1021/acs.jpca.3c00823] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 07/07/2023]
Abstract
In previous work (Dandu et al., J. Phys. Chem. A, 2022, 126, 4528-4536), we were successful in predicting accurate atomization energies of organic molecules using machine learning (ML) models, obtaining an accuracy as low as 0.1 kcal/mol compared to the G4MP2 method. In this work, we extend the use of these ML models to adiabatic ionization potentials on data sets of energies generated using quantum chemical calculations. Atomic specific corrections that were found to improve atomization energies from quantum chemical calculations have also been used in this study to improve ionization potentials. The quantum chemical calculations were performed on 3405 molecules containing eight or fewer non-hydrogen atoms derived from the QM9 data set, using the B3LYP functional with the 6-31G(2df,p) basis set for optimization. Low-fidelity IPs for these structures were obtained using two density functional methods: B3LYP/6-31+G(2df,p) and ωB97XD/6-311+G(3df,2p). Highly accurate G4MP2 calculations were performed on these optimized structures to obtain high-fidelity IPs to use in ML models based on the low-fidelity IPs. Our best performing ML methods gave IPs of organic molecules within a mean absolute deviation of 0.035 eV from the G4MP2 IPs for the whole data set. This work demonstrates that ML predictions assisted by quantum chemical calculations can be used to successfully predict IPs of organic molecules for use in high throughput screening.
Affiliation(s)
- Naveen K Dandu
- Materials Science Division, Argonne National Laboratory, Lemont, Illinois 60439, United States
- Joint Center for Energy Storage Research (JCESR), Argonne National Laboratory, Lemont, Illinois 60439, United States
- Chemical Engineering Department, University of Illinois-Chicago, Chicago, Illinois 60608, United States
| | - Logan Ward
- Joint Center for Energy Storage Research (JCESR), Argonne National Laboratory, Lemont, Illinois 60439, United States
- Data Science and Learning Division, Argonne National Laboratory, Lemont, Illinois 60439, United States
| | - Rajeev S Assary
- Materials Science Division, Argonne National Laboratory, Lemont, Illinois 60439, United States
- Joint Center for Energy Storage Research (JCESR), Argonne National Laboratory, Lemont, Illinois 60439, United States
| | - Paul C Redfern
- Materials Science Division, Argonne National Laboratory, Lemont, Illinois 60439, United States
| | - Larry A Curtiss
- Materials Science Division, Argonne National Laboratory, Lemont, Illinois 60439, United States
- Joint Center for Energy Storage Research (JCESR), Argonne National Laboratory, Lemont, Illinois 60439, United States
| |
|
12
|
Ruth M, Gerbig D, Schreiner PR. Machine Learning of Coupled Cluster (T)-Energy Corrections via Delta (Δ)-Learning. J Chem Theory Comput 2022; 18:4846-4855. [PMID: 35816588 DOI: 10.1021/acs.jctc.2c00501] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Indexed: 12/12/2022]
Abstract
Accurate thermochemistry is essential in many chemical disciplines, such as astro-, atmospheric, or combustion chemistry. These areas often involve fleetingly existent intermediates whose thermochemistry is difficult to assess. Whenever direct calorimetric experiments are infeasible, accurate computational estimates of relative molecular energies are required. However, high-level computations, often using coupled cluster theory, are generally resource-intensive. To expedite the process using machine learning techniques, we generated a database of energies for small organic molecules at the CCSD(T)/cc-pVDZ, CCSD(T)/aug-cc-pVDZ, and CCSD(T)/cc-pVTZ levels of theory. Leveraging the power of deep learning by employing graph neural networks, we are able to predict the effect of perturbatively included triples (T), that is, the difference between CCSD and CCSD(T) energies, with a mean absolute error of 0.25, 0.25, and 0.28 kcal mol⁻¹ (R² of 0.998, 0.997, and 0.998) with the cc-pVDZ, aug-cc-pVDZ, and cc-pVTZ basis sets, respectively. Our models were further validated by application to three validation sets taken from the S22 Database as well as to a selection of known theoretically challenging cases.
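The Δ-learning idea in this abstract, learning the difference between CCSD and CCSD(T) energies rather than the total energy, can be sketched in a few lines. The example below is a minimal illustration on synthetic numbers: the single scalar `descriptor`, the one-variable linear fit standing in for the paper's graph neural network, and all energy values are assumptions, not data or methods from the cited work.

```python
import random


def fit_line(x, y):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def mean_abs_error(reference, predicted):
    return sum(abs(r - p) for r, p in zip(reference, predicted)) / len(reference)


# Synthetic data (assumption): one descriptor per molecule, a low-level
# energy, and a (T)-like correction that depends linearly on the descriptor.
random.seed(0)
n_molecules = 200
descriptor = [random.uniform(0.0, 1.0) for _ in range(n_molecules)]
e_ccsd = [random.uniform(-1.0, -0.5) for _ in range(n_molecules)]
correction = [-0.02 * d - 0.005 + random.gauss(0.0, 1e-4) for d in descriptor]
e_ccsd_t = [low + c for low, c in zip(e_ccsd, correction)]

# Delta-learning: the regression target is the difference E[CCSD(T)] - E[CCSD].
n_train = 150
delta_train = [h - l for h, l in zip(e_ccsd_t[:n_train], e_ccsd[:n_train])]
slope, intercept = fit_line(descriptor[:n_train], delta_train)

# Prediction: low-level energy plus the learned correction.
predicted = [
    low + slope * d + intercept
    for low, d in zip(e_ccsd[n_train:], descriptor[n_train:])
]

mae_baseline = mean_abs_error(e_ccsd_t[n_train:], e_ccsd[n_train:])
mae_delta = mean_abs_error(e_ccsd_t[n_train:], predicted)
```

Because the correction is much smaller and smoother than the total energy, even a simple model fitted to the difference recovers most of the gap between the two levels of theory on this toy data.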
Affiliation(s)
- Marcel Ruth
- Institute of Organic Chemistry, Justus Liebig University, Heinrich-Buff-Ring 17, 35392 Giessen, Germany
| | - Dennis Gerbig
- Institute of Organic Chemistry, Justus Liebig University, Heinrich-Buff-Ring 17, 35392 Giessen, Germany
| | - Peter R Schreiner
- Institute of Organic Chemistry, Justus Liebig University, Heinrich-Buff-Ring 17, 35392 Giessen, Germany
| |
|
13
|
Jeong W, Gaggioli CA, Gagliardi L. Active Learning Configuration Interaction for Excited-State Calculations of Polycyclic Aromatic Hydrocarbons. J Chem Theory Comput 2021; 17:7518-7530. [PMID: 34787422 PMCID: PMC8675132 DOI: 10.1021/acs.jctc.1c00769] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.5] [Received: 07/31/2021] [Indexed: 11/30/2022]
Abstract
We present the active learning configuration interaction (ALCI) method for multiconfigurational calculations based on large active spaces. ALCI leverages the use of an active learning procedure to find important electronic configurations among the full configurational space generated within an active space. We tested it for the calculation of singlet-singlet excited states of acenes and pyrene using different machine learning algorithms. The ALCI method yields excitation energies within 0.2-0.3 eV from those obtained by traditional complete active-space configuration interaction (CASCI) calculations (affordable for active spaces up to 16 electrons in 16 orbitals) by including only a small fraction of the CASCI configuration space in the calculations. For larger active spaces (we tested up to 26 electrons in 26 orbitals), not affordable with traditional CI methods, ALCI captures the trends of experimental excitation energies. Overall, ALCI provides satisfactory approximations to large active-space wave functions with up to 10 orders of magnitude fewer determinants for the systems presented here. These ALCI wave functions are promising and affordable starting points for the subsequent second-order perturbation theory or pair-density functional theory calculations.
Affiliation(s)
- WooSeok Jeong
- Department of Chemistry, Nanoporous Materials Genome Center, Chemical Theory Center, and Minnesota Supercomputing Institute, University of Minnesota, Minneapolis, Minnesota 55455, United States
| | - Carlo Alberto Gaggioli
- Department of Chemistry, Pritzker School of Molecular Engineering, James Franck Institute, Chicago Center for Theoretical Chemistry, University of Chicago, Chicago, Illinois 60637, United States
| | - Laura Gagliardi
- Department
of Chemistry, Pritzker School of Molecular Engineering, James Franck
Institute, Chicago Center for Theoretical Chemistry, University of Chicago, Chicago, Illinois 60637, United States
- Argonne
National Laboratory, Lemont, Illinois 60439, United States
| |
14
Keith JA, Vassilev-Galindo V, Cheng B, Chmiela S, Gastegger M, Müller KR, Tkatchenko A. Combining Machine Learning and Computational Chemistry for Predictive Insights Into Chemical Systems. Chem Rev 2021; 121:9816-9872. [PMID: 34232033 PMCID: PMC8391798 DOI: 10.1021/acs.chemrev.1c00107] [Citation(s) in RCA: 271] [Impact Index Per Article: 67.8] [Received: 02/04/2021] [Indexed: 12/23/2022]
Abstract
Machine learning models are poised to make a transformative impact on chemical sciences by dramatically accelerating computational algorithms and amplifying insights available from computational chemistry methods. However, achieving this requires a confluence and coaction of expertise in computer science and physical sciences. This Review is written for new and experienced researchers working at the intersection of both fields. We first provide concise tutorials of computational chemistry and machine learning methods, showing how insights involving both can be achieved. We follow with a critical review of noteworthy applications that demonstrate how computational chemistry and machine learning can be used together to provide insightful (and useful) predictions in molecular and materials modeling, retrosyntheses, catalysis, and drug design.
Affiliation(s)
- John A. Keith
  - Department of Chemical and Petroleum Engineering Swanson School of Engineering, University of Pittsburgh, Pittsburgh, Pennsylvania 15261, United States
- Valentin Vassilev-Galindo
  - Department of Physics and Materials Science, University of Luxembourg, L-1511 Luxembourg City, Luxembourg
- Bingqing Cheng
  - Accelerate Programme for Scientific Discovery, Department of Computer Science and Technology, 15 J. J. Thomson Avenue, Cambridge CB3 0FD, United Kingdom
- Stefan Chmiela
  - Department of Software Engineering and Theoretical Computer Science, Technische Universität Berlin, 10587, Berlin, Germany
- Michael Gastegger
  - Department of Software Engineering and Theoretical Computer Science, Technische Universität Berlin, 10587, Berlin, Germany
- Klaus-Robert Müller
  - Machine Learning Group, Technische Universität Berlin, 10587, Berlin, Germany
  - Department of Artificial Intelligence, Korea University, Anam-dong, Seongbuk-gu, Seoul, 02841, Korea
  - Max-Planck-Institut für Informatik, 66123 Saarbrücken, Germany
  - Google Research, Brain Team, 10117 Berlin, Germany
- Alexandre Tkatchenko
  - Department of Physics and Materials Science, University of Luxembourg, L-1511 Luxembourg City, Luxembourg
15
Abstract
Chemical compound space (CCS), the set of all theoretically conceivable combinations of chemical elements and (meta-)stable geometries that make up matter, is colossal. The first-principles based virtual sampling of this space, for example, in search of novel molecules or materials which exhibit desirable properties, is therefore prohibitive for all but the smallest subsets and simplest properties. We review studies aimed at tackling this challenge using modern machine learning techniques based on (i) synthetic data, typically generated using quantum mechanics based methods, and (ii) model architectures inspired by quantum mechanics. Such Quantum mechanics based Machine Learning (QML) approaches combine the numerical efficiency of statistical surrogate models with an ab initio view on matter. They rigorously reflect the underlying physics in order to reach universality and transferability across CCS. While state-of-the-art approximations to quantum problems impose severe computational bottlenecks, recent QML based developments indicate the possibility of substantial acceleration without sacrificing the predictive power of quantum mechanics.
Affiliation(s)
- Bing Huang
  - Faculty of Physics, University of Vienna, 1090 Vienna, Austria
- O. Anatole von Lilienfeld
  - Faculty of Physics, University of Vienna, 1090 Vienna, Austria
  - Institute of Physical Chemistry and National Center for Computational Design and Discovery of Novel Materials (MARVEL), Department of Chemistry, University of Basel, 4056 Basel, Switzerland
16
Han R, Luber S. Fast Estimation of Møller-Plesset Correlation Energies Based on Atomic Contributions. J Phys Chem Lett 2021; 12:5324-5331. [PMID: 34061529 DOI: 10.1021/acs.jpclett.1c00900] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Indexed: 06/12/2023]
Abstract
Dynamic correlation plays an important role in accurate calculations on chemical compounds, for example in the description of equilibrium structures of chemical systems. A model for the fast estimation of dynamic correlation energy is introduced in this work. This model is based on decomposing the dynamic correlation energy calculated by nth-order Møller-Plesset perturbation (MPn) theory into contributions from atomic regions. Multiple levels of theory, including MP2, MP2.5, and MP4, are used as the reference, and the corresponding correlation energy densities are calculated. The proposed model is concise, fast, and promising for practical use, such as the prediction of reaction energies. It can also serve as a baseline or pretrained model for follow-up machine learning studies.
Affiliation(s)
- R Han
  - Department of Chemistry A, University of Zurich, Winterthurerstrasse 190, 8057 Zurich, Switzerland
- S Luber
  - Department of Chemistry A, University of Zurich, Winterthurerstrasse 190, 8057 Zurich, Switzerland