1
Mukeba CT, Bilonda MK, Katshiatshia HM, Muya JT. Pyrolysis of bioethanol and biobutanol: A thermodynamic and kinetic study. J Mol Model 2025; 31:146. [PMID: 40266340 DOI: 10.1007/s00894-025-06357-0] [Received: 11/30/2024] [Accepted: 03/24/2025] [Indexed: 04/24/2025]
Abstract
CONTEXT Bioethanol and biobutanol are renewable oxygenated fuels derived from biomass, commonly blended with gasoline for use in gasoline engines. These alcohol-based fuels have high oxygen content, promoting more complete combustion and reducing carbon dioxide emissions compared to petroleum fuels. However, during combustion, oxygenated radicals can interact and lead to the formation of formaldehyde, a highly toxic compound. This work presents a thermodynamic and kinetic study of biofuel pyrolysis using quantum chemical methods. Our results identify the C-C bond as the weakest in the initiation step, with a bond dissociation enthalpy of around 86 kcal/mol. Notably, ethanol exhibits higher bond dissociation energies than butanol. While the initiation step predominantly involves C-C bond breaking, the propagation step reveals a competition between H abstraction and C-C bond cleavage. Analysis of the computed rate constants and Gibbs free energies for radical reactions in the propagation steps indicates the likely formation of acetaldehyde, formaldehyde, methane, and ethylene. These products present significant risks to both human health and the environment. This emphasizes the importance of carefully controlling macroscopic thermodynamic variables, such as temperature and pressure, during the pyrolysis of alcohols. Proper regulation of these factors is crucial in minimizing the formation of harmful aldehydes and ensuring a safer and more sustainable process. METHODS The reaction mechanisms of thermal decomposition are analyzed using the UωB97XD/6-311+G(3df,2p), G4MP2, and G4 computational methods. The latter offers highly accurate enthalpies of formation, with a deviation from experimental values of approximately 1 kcal/mol, though it is computationally expensive compared to DFT. To evaluate the diradical character of certain open-shell intermediate species, CASSCF and MP2-CASSCF methods, which effectively account for static correlation effects, are employed with the cc-pVDZ basis set. Thermodynamic and kinetic analyses are carried out using both ab initio and semi-empirical approaches through the Gaussian 09 and OpenSMOKE++ 0.21.0 programs.
Affiliation(s)
- Christian Tshikala Mukeba
  - Department of Chemistry, Faculty of Sciences, University of Kinshasa, Kinshasa, Democratic Republic of the Congo
  - Research Center for Theoretical Chemistry and Physics in Central Africa, Faculty of Science, University of Kinshasa, Kinshasa, Democratic Republic of the Congo
  - Institute of Mechanics, Materials and Civil Engineering, Ecole Polytechnique de Louvain, Université catholique de Louvain, Louvain, Belgium
- Mireille Kabuyi Bilonda
  - Department of Chemistry, Faculty of Sciences, University of Kinshasa, Kinshasa, Democratic Republic of the Congo
  - Research Center for Theoretical Chemistry and Physics in Central Africa, Faculty of Science, University of Kinshasa, Kinshasa, Democratic Republic of the Congo
  - Department of Chemistry, Faculty of Sciences, University of Rhodes, Makhanda, Republic of South Africa
- Haddy Mbuyi Katshiatshia
  - Research Center for Renewable Energy, Polytechnics Faculty, University of Kinshasa, Kinshasa, Democratic Republic of the Congo
- Jules Tshishimbi Muya
  - Department of Chemistry, Faculty of Sciences, University of Kinshasa, Kinshasa, Democratic Republic of the Congo
  - Research Center for Theoretical Chemistry and Physics in Central Africa, Faculty of Science, University of Kinshasa, Kinshasa, Democratic Republic of the Congo
  - Department of Chemistry, Faculty of Sciences, University of Richmond, Richmond, VA, USA
2
Jiang M, Wang Z, Chen Y, Zhang W, Zhu Z, Yan W, Wu J, Xu X. X2-PEC: A Neural Network Model Based on Atomic Pair Energy Corrections. J Comput Chem 2025; 46:e70081. [PMID: 40099806 DOI: 10.1002/jcc.70081] [Received: 02/10/2025] [Revised: 02/27/2025] [Accepted: 02/28/2025] [Indexed: 03/20/2025]
Abstract
With the development of artificial neural networks (ANNs), their applications in chemistry have become increasingly widespread, especially in the prediction of various molecular properties. This work introduces the X2-PEC method, that is, the second generalization of the X1 series of ANN methods developed in our group, utilizing pair energy correction (PEC). The essence of the X2 model lies in its feature vector construction, using overlap integrals and core Hamiltonian integrals to incorporate physical and chemical information into the feature vectors to describe atomic interactions. It aims to enhance the accuracy of low-rung density functional theory (DFT) calculations, such as those from the widely used BLYP/6-31G(d) or B3LYP/6-31G(2df,p) methods, to the level of top-rung DFT calculations, such as those from the highly accurate doubly hybrid XYGJ-OS/GTLarge method. Trained on the QM9 dataset, X2-PEC excels in predicting the atomization energies of isomers such as C6H8 and C4H4N2O with varying bonding structures. The performance of the X2-PEC model on standard enthalpies of formation for datasets such as G2-HCNOF, PSH36, ALKANE28, BIGMOL20, and HEDM45, as well as a HCNOF subset of BH9 for reaction barriers, is equally commendable, demonstrating its good generalization ability and predictive accuracy, as well as its potential for further development to achieve greater accuracy. These outcomes highlight the practical significance of the X2-PEC model in elevating the results from lower-rung DFT calculations to the level of higher-rung DFT calculations through deep learning.
Affiliation(s)
- Minghong Jiang
  - Collaborative Innovation Center of Chemistry for Energy Materials, Shanghai Key Laboratory of Molecular Catalysis and Innovative Materials, MOE Key Laboratory of Computational Physical Sciences, Department of Chemistry, Fudan University, Shanghai, China
- Zhanfeng Wang
  - Collaborative Innovation Center of Chemistry for Energy Materials, Shanghai Key Laboratory of Molecular Catalysis and Innovative Materials, MOE Key Laboratory of Computational Physical Sciences, Department of Chemistry, Fudan University, Shanghai, China
- Yicheng Chen
  - Collaborative Innovation Center of Chemistry for Energy Materials, Shanghai Key Laboratory of Molecular Catalysis and Innovative Materials, MOE Key Laboratory of Computational Physical Sciences, Department of Chemistry, Fudan University, Shanghai, China
- Wenhao Zhang
  - Collaborative Innovation Center of Chemistry for Energy Materials, Shanghai Key Laboratory of Molecular Catalysis and Innovative Materials, MOE Key Laboratory of Computational Physical Sciences, Department of Chemistry, Fudan University, Shanghai, China
- Zhenyu Zhu
  - Collaborative Innovation Center of Chemistry for Energy Materials, Shanghai Key Laboratory of Molecular Catalysis and Innovative Materials, MOE Key Laboratory of Computational Physical Sciences, Department of Chemistry, Fudan University, Shanghai, China
- Wenjie Yan
  - Collaborative Innovation Center of Chemistry for Energy Materials, Shanghai Key Laboratory of Molecular Catalysis and Innovative Materials, MOE Key Laboratory of Computational Physical Sciences, Department of Chemistry, Fudan University, Shanghai, China
- Jianming Wu
  - Collaborative Innovation Center of Chemistry for Energy Materials, Shanghai Key Laboratory of Molecular Catalysis and Innovative Materials, MOE Key Laboratory of Computational Physical Sciences, Department of Chemistry, Fudan University, Shanghai, China
- Xin Xu
  - Collaborative Innovation Center of Chemistry for Energy Materials, Shanghai Key Laboratory of Molecular Catalysis and Innovative Materials, MOE Key Laboratory of Computational Physical Sciences, Department of Chemistry, Fudan University, Shanghai, China
  - Hefei National Laboratory, Hefei, China
3
Khan D, Price AJA, Huang B, Ach ML, von Lilienfeld OA. Adapting hybrid density functionals with machine learning. Science Advances 2025; 11:eadt7769. [PMID: 39888985 PMCID: PMC11784814 DOI: 10.1126/sciadv.adt7769] [Received: 10/28/2024] [Accepted: 01/03/2025] [Indexed: 02/02/2025]
Abstract
Exact exchange contributions significantly affect electronic states, influencing covalent bond formation and breaking. Hybrid density functional approximations, which average exact exchange admixtures empirically, have achieved success but fall short of high-level quantum chemistry accuracy due to delocalization errors. We propose adaptive hybrid functionals, generating optimal exact exchange admixture ratios on the fly using data-efficient quantum machine learning models with negligible overhead. The adaptive Perdew-Burke-Ernzerhof hybrid density functional (aPBE0) improves energetics, electron densities, and HOMO-LUMO gaps in QM9, QM7b, and GMTKN55 benchmark datasets. A model uncertainty-based constraint reduces the method smoothly to PBE0 in extrapolative regimes, ensuring general applicability with limited training. By tuning exact exchange fractions for different spin states, aPBE0 effectively addresses the spin gap problem in open-shell systems such as carbenes. We also present a revised QM9 (revQM9) dataset with more accurate quantum properties, including stronger covalent binding, larger bandgaps, more localized electron densities, and larger dipole moments.
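For orientation, the fixed-admixture form that an adaptive scheme generalizes can be written as follows. This is the standard PBE0 expression; the symbol $a$ for the exact-exchange fraction and the notation for the energy components are notational assumptions, not taken from the abstract.

```latex
E_{xc}^{\mathrm{PBE0}} = a\,E_{x}^{\mathrm{HF}} + (1 - a)\,E_{x}^{\mathrm{PBE}} + E_{c}^{\mathrm{PBE}},
\qquad a = \tfrac{1}{4}
```

In the adaptive variant described above, $a$ would no longer be a global constant but a system-dependent value predicted by the machine learning model, with the uncertainty-based constraint returning it to $a = 1/4$ when the model extrapolates.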
Affiliation(s)
- Danish Khan
  - Chemical Physics Theory Group, Department of Chemistry, University of Toronto, St. George Campus, Toronto, ON, Canada
  - Vector Institute for Artificial Intelligence, Toronto, ON, Canada
- Alastair J. A. Price
  - Chemical Physics Theory Group, Department of Chemistry, University of Toronto, St. George Campus, Toronto, ON, Canada
  - Acceleration Consortium, University of Toronto, Toronto, ON, Canada
- Bing Huang
  - Wuhan University, Department of Chemistry and Molecular Sciences, Wuhan 430072, China
- Maximilian L. Ach
  - Department of Physics, University of Toronto, St. George Campus, Toronto, ON, Canada
  - Department of Physics, Ludwig-Maximilians-Universität München (LMU), Munich, Germany
- O. Anatole von Lilienfeld
  - Chemical Physics Theory Group, Department of Chemistry, University of Toronto, St. George Campus, Toronto, ON, Canada
  - Vector Institute for Artificial Intelligence, Toronto, ON, Canada
  - Acceleration Consortium, University of Toronto, Toronto, ON, Canada
  - Department of Physics, University of Toronto, St. George Campus, Toronto, ON, Canada
  - Department of Materials Science and Engineering, University of Toronto, St. George Campus, Toronto, ON, Canada
  - Machine Learning Group, Technische Universität Berlin, 10587 Berlin, Germany
  - Berlin Institute for the Foundations of Learning and Data, 10587 Berlin, Germany
4
Harb H, Elliott SN, Ward L, Foster IT, Klippenstein SJ, Curtiss LA, Assary RS. Accurate Dehydrogenation Enthalpies Dataset for Liquid Organic Hydrogen Carriers. Sci Data 2025; 12:171. [PMID: 39881140 PMCID: PMC11779890 DOI: 10.1038/s41597-025-04468-0] [Received: 04/19/2024] [Accepted: 01/14/2025] [Indexed: 01/31/2025]
Abstract
This contribution presents a comprehensive extension of the QM9 dataset (originally 133k molecules) with the calculation of G4MP2 enthalpies for 9,841 molecules, featuring up to nine heavy atoms. We present QM9-LOHC, a (de)hydrogenation dataset of 10,373 reactions with a minimum hydrogen storage capacity of 5.5 wt%, in line with the Department of Energy standards for Liquid Organic Hydrogen Carriers (LOHC). By utilizing the accurate quantum chemical method G4MP2, we expand the QM9 database and open new avenues for exploring hydrogen storage technologies (electrochemical LOHCs, alkali metal-LOHCs, and mixtures of LOHCs). The QM9-LOHC dataset, with its focus on reactions that vary only by hydrogen saturation levels, provides a needed data resource for advancing the design and optimization of both conventional and innovative LOHC systems, and high-fidelity data for molecular discovery.
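To make the 5.5% storage threshold concrete, the sketch below computes a gravimetric hydrogen capacity as the mass of released H2 divided by the mass of the hydrogen-rich carrier. This definition, the helper names, and the methylcyclohexane/toluene example are illustrative assumptions; the abstract does not state how the dataset computes capacity.

```python
# Rounded standard atomic masses in g/mol.
ATOMIC_MASS = {"H": 1.008, "C": 12.011, "N": 14.007, "O": 15.999}


def molar_mass(formula: dict) -> float:
    """Molar mass (g/mol) of a molecule given as {element: count}."""
    return sum(ATOMIC_MASS[element] * count for element, count in formula.items())


def h2_weight_percent(hydrogenated: dict, n_h2: int) -> float:
    """Mass of released H2 as a percentage of the hydrogen-rich carrier mass.

    Assumed definition for illustration; not taken from the dataset paper.
    """
    h2_mass = n_h2 * 2 * ATOMIC_MASS["H"]
    return 100.0 * h2_mass / molar_mass(hydrogenated)


# Illustrative pair (not from the abstract):
# methylcyclohexane (C7H14) -> toluene (C7H8) + 3 H2
methylcyclohexane = {"C": 7, "H": 14}
capacity = h2_weight_percent(methylcyclohexane, n_h2=3)
meets_threshold = capacity >= 5.5

print(f"{capacity:.2f} wt% H2, meets 5.5 wt% threshold: {meets_threshold}")
```

A filter of this kind could be applied to each candidate reaction before any enthalpy is computed, since it needs only the molecular formula.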
Affiliation(s)
- Hassan Harb
  - Materials Science Division, Argonne National Laboratory, Lemont, IL, 60439, USA
- Sarah N Elliott
  - Chemical Sciences and Engineering Division, Argonne National Laboratory, Lemont, IL, 60439, USA
- Logan Ward
  - Data Sciences and Learning Division, Argonne National Laboratory, Lemont, IL, 60439, USA
- Ian T Foster
  - Data Sciences and Learning Division, Argonne National Laboratory, Lemont, IL, 60439, USA
- Stephen J Klippenstein
  - Chemical Sciences and Engineering Division, Argonne National Laboratory, Lemont, IL, 60439, USA
- Larry A Curtiss
  - Materials Science Division, Argonne National Laboratory, Lemont, IL, 60439, USA
5
Sommer T, Clarke C, García-Melchor M. Beyond chemical structures: lessons and guiding principles for the next generation of molecular databases. Chem Sci 2025; 16:1002-1016. [PMID: 39660292 PMCID: PMC11626465 DOI: 10.1039/d4sc04064c] [Received: 06/20/2024] [Accepted: 11/28/2024] [Indexed: 12/12/2024]
Abstract
Databases of molecules and materials are indispensable for advancing chemical research, especially when enriched with electronic structure information from quantum chemistry methods like density functional theory. In this perspective, we review and analyze the current landscape of materials and molecular databases containing quantum chemical data. Our analysis reveals that the materials community has significantly benefited from data platforms such as the Materials Project, which seamlessly integrate chemical structures, electronic structure data, and open-source software. Conversely, quantum chemical data for molecular systems remains largely fragmented across individual datasets, lacking the comprehensive framework of a unified database. We distilled insights from these existing data resources into seven guiding principles termed QUANTUM, which build upon the foundational FAIR principles of data sharing (Findable, Accessible, Interoperable, and Reusable). These principles are aimed at advancing the development of molecular databases into robust, integrated data platforms. We conclude with an outlook on both short- and long-term objectives, guided by these QUANTUM principles, to foster future advancements in molecular quantum databases and enhance their utility for the research community.
Affiliation(s)
- Timo Sommer
  - School of Chemistry, CRANN and AMBER Research Centres, Trinity College Dublin, College Green Dublin 2 Ireland
- Cian Clarke
  - School of Chemistry, CRANN and AMBER Research Centres, Trinity College Dublin, College Green Dublin 2 Ireland
- Max García-Melchor
  - School of Chemistry, CRANN and AMBER Research Centres, Trinity College Dublin, College Green Dublin 2 Ireland
  - Center for Cooperative Research on Alternative Energy (CIC EnergiGUNE), Basque Research and Technology Alliance (BRTA), Alava Technology Park Albert Einstein 48 01510 Vitoria-Gasteiz Spain
  - IKERBASQUE, Basque Foundation for Science Plaza de Euskadi 5 48009 Bilbao Spain
6
Chen J, Pelc A, Ameixa J, Kossoski F, Denifl S. Low-Energy Electron Interactions with Methyl-p-benzoquinone: A Study of Negative Ion Formation. ACS Omega 2024; 9:38032-38043. [PMID: 39281892 PMCID: PMC11391464 DOI: 10.1021/acsomega.4c04899] [Received: 05/24/2024] [Revised: 08/09/2024] [Accepted: 08/13/2024] [Indexed: 09/18/2024]
Abstract
Methyl-p-benzoquinone (MpBQ, CH3C6H3(=O)2) is a prototypical molecule in the study of quinones, which are compounds of relevance in biology and several redox reactions. Understanding the electron attachment properties of MpBQ and its ability to form anions is crucial in elucidating its role in these reactions. In this study, we investigate electron attachment to MpBQ employing a crossed electron-molecular beam experiment in the electron energy range of approximately 0 to 12 eV, as well as theoretical approaches using quantum chemical and electron scattering calculations. Six anionic species were identified: C7H6O2⁻, C7H5O2⁻, C6H5O⁻, C4HO⁻, C2H2⁻, and O⁻. The parent anion is formed most efficiently, with large cross sections, through two resonances at electron energies between 1 and 2 eV. Potential reaction pathways for all negative ions observed are explored, and the experimental appearance energies are compared with calculated thermochemical thresholds. Although exhibiting similar electron attachment properties to pBQ, MpBQ's additional methyl group introduces entirely new dissociative reactions, while quenching others, underscoring its distinctive chemical behavior.
Affiliation(s)
- Jiakuan Chen
  - Institut für Ionenphysik und Angewandte Physik, Universität Innsbruck, Technikerstraße 25, A-6020 Innsbruck, Austria
- Andrzej Pelc
  - Department of Biophysics, Mass Spectrometry Laboratory, Maria Curie-Skłodowska University, Pl. M. C.-Skłodowskiej 1, 20-031 Lublin, Poland
- João Ameixa
  - Institute of Chemistry, Hybrid Nanostructures, University of Potsdam, Karl-Liebknecht-Str. 24-25, 14476 Potsdam, Germany
- Fábris Kossoski
  - Laboratoire de Chimie et Physique Quantiques (UMR 5626), Université de Toulouse, CNRS, UPS, F-31062 Toulouse, France
- Stephan Denifl
  - Institut für Ionenphysik und Angewandte Physik, Universität Innsbruck, Technikerstraße 25, A-6020 Innsbruck, Austria
7
Rodrigues R, Bou Debes D, Mendes M, Guerra P, Mestre G, Eden S, Cornetta LM, Ingólfsson O, da Silva FF. Experimental and Theoretical Study on Electron Ionization and Fragmentation of Propylene Oxide─the First Chiral Molecule Detected in the Interstellar Medium. J Phys Chem A 2024; 128:4795-4805. [PMID: 38860325 DOI: 10.1021/acs.jpca.4c02116] [Indexed: 06/12/2024]
Abstract
Propylene oxide, CH3CHOCH2, is the first chiral molecule detected in space and the third C3 oxide detected toward the Sagittarius B2 (Sgr B2 (N)) molecular cloud, the others being propanal, CH3CH2CHO, and acetone, (CH3)2CO. With homochirality being ubiquitous in the building blocks of living matter, the formation and decay paths of propylene oxide in space are of specific interest. Motivated by the significant role of photo- and secondary electrons in astrochemistry, we have studied electron ionization and fragmentation of propylene oxide. Ion appearance energies are determined and compared to threshold values for the respective processes calculated at the G4MP2 level of theory, and potential reaction pathways are computed at the DFT level of theory. Electron ionization is found to destabilize propylene oxide, leading to barrierless opening of the C1-C2 bond of the epoxy ring, hydrogen transfer, and fragmentation over the methyl vinyl ether or rupture of the C2-O bond of the epoxy ring and fragmentation of the allyl alcohol cation as an intermediate, rather than direct bond ruptures.
Affiliation(s)
- Rodrigo Rodrigues
  - CEFITEC, Departamento de Física, NOVA School of Science and Technology, Universidade NOVA de Lisboa, Caparica 2829-516, Portugal
- Daniel Bou Debes
  - School of Physical Sciences, The Open University, Walton Hall, Milton Keynes MK7 6AA, U.K.
- Mónica Mendes
  - CEFITEC, Departamento de Física, NOVA School of Science and Technology, Universidade NOVA de Lisboa, Caparica 2829-516, Portugal
- Pedro Guerra
  - CEFITEC, Departamento de Física, NOVA School of Science and Technology, Universidade NOVA de Lisboa, Caparica 2829-516, Portugal
- Gonçalo Mestre
  - CEFITEC, Departamento de Física, NOVA School of Science and Technology, Universidade NOVA de Lisboa, Caparica 2829-516, Portugal
- Samuel Eden
  - School of Physical Sciences, The Open University, Walton Hall, Milton Keynes MK7 6AA, U.K.
- Lucas M Cornetta
  - Instituto de Física da Universidade de São Paulo, Universidade de São Paulo, São Paulo 05508-900, Brazil
- Oddur Ingólfsson
  - Department of Chemistry and Science Institute, University of Iceland, Dunhagi 3, Reykjavik IS-107, Iceland
- F Ferreira da Silva
  - CEFITEC, Departamento de Física, NOVA School of Science and Technology, Universidade NOVA de Lisboa, Caparica 2829-516, Portugal
8
Karton A. Big data benchmarking: how do DFT methods across the rungs of Jacob's ladder perform for a dataset of 122k CCSD(T) total atomization energies? Phys Chem Chem Phys 2024; 26:14594-14606. [PMID: 38738470 DOI: 10.1039/d4cp00387j] [Indexed: 05/14/2024]
Abstract
Total atomization energies (TAEs) are a central quantity in density functional theory (DFT) benchmark studies. However, so far TAE databases obtained from experiment or high-level ab initio wavefunction theory included up to hundreds of TAEs. Here, we use the GDB-9 database of 133k CCSD(T) TAEs generated by Curtiss and co-workers [B. Narayanan, P. C. Redfern, R. S. Assary and L. A. Curtiss, Chem. Sci., 2019, 10, 7449] to evaluate the performance of 14 representative DFT methods across the rungs of Jacob's ladder (namely, PBE, BLYP, B97-D, M06-L, τ-HCTH, PBE0, B3LYP, B3PW91, ωB97X-D, τ-HCTHh, PW6B95, M06, M06-2X, and MN15). We first use the A25[PBE] diagnostic for nondynamical correlation to eliminate systems that potentially include significant multireference effects, for which the CCSD(T) TAEs might not be sufficiently reliable. The resulting database (denoted by GDB9-nonMR) includes 122k species. Of the considered functionals, B3LYP attains the best performance relative to the G4(MP2) reference TAEs, with a mean absolute deviation (MAD) of 4.09 kcal mol⁻¹. This first-generation hybrid functional, in which the three mixing coefficients were fitted against a small set of TAEs, is one of the few functionals that are not systematically biased towards overestimating the G4(MP2) TAEs, as demonstrated by a mean-signed deviation (MSD) of 0.45 kcal mol⁻¹. The relatively good performance of B3LYP is followed by the heavily parameterized M06-L meta-GGA functional, which attains a MAD of 6.24 kcal mol⁻¹. The PW6B95, M06, M06-2X, and MN15 functionals tend to systematically overestimate the G4(MP2) TAEs and attain MADs ranging between 18.69 (M06) and 28.54 (MN15) kcal mol⁻¹. However, PW6B95 and M06-2X exhibit particularly narrow error distributions. Thus, scaling their TAEs by an empirical scaling factor reduces their MADs to merely 3.38 (PW6B95) and 2.85 (M06-2X) kcal mol⁻¹. Empirical dispersion corrections (e.g., D3 and D4) are attractive, and therefore, their inclusion worsens the performance of methods that systematically overestimate the TAEs.
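The error statistics named above, and the effect of a single scaling factor on a systematically overestimating method, can be sketched as follows. The numbers are synthetic and purely illustrative, and the least-squares choice of scaling factor is an assumption; the abstract does not say how the empirical factor was obtained.

```python
import numpy as np


def mad(pred, ref):
    """Mean absolute deviation of predictions from reference values."""
    return float(np.mean(np.abs(np.asarray(pred) - np.asarray(ref))))


def msd(pred, ref):
    """Mean signed deviation; positive means systematic overestimation."""
    return float(np.mean(np.asarray(pred) - np.asarray(ref)))


def least_squares_scale(pred, ref):
    """Scalar s minimizing sum((s * pred - ref)**2). Assumed fitting choice."""
    pred = np.asarray(pred, dtype=float)
    ref = np.asarray(ref, dtype=float)
    return float(np.dot(pred, ref) / np.dot(pred, pred))


# Synthetic TAEs in kcal/mol (hypothetical, not from the paper): a method
# that overestimates by about 2% with a small scatter on top.
ref = np.array([400.0, 800.0, 1200.0, 1600.0])
pred = ref * 1.02 + np.array([0.5, -0.5, 0.5, -0.5])

mad_raw = mad(pred, ref)
msd_raw = msd(pred, ref)
scale = least_squares_scale(pred, ref)
mad_scaled = mad(scale * pred, ref)

print(f"MAD raw = {mad_raw:.2f}, MSD raw = {msd_raw:.2f}")
print(f"scale = {scale:.4f}, MAD scaled = {mad_scaled:.2f}")
```

When the raw error is dominated by a proportional bias, as in this toy data, one multiplicative factor removes most of it and what remains reflects the width of the error distribution.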
Affiliation(s)
- Amir Karton
  - School of Science and Technology, University of New England, Armidale, NSW 2351, Australia
9
Yang Z, Huang T, Pan L, Wang J, Wang L, Ding J, Xiao J. QuanDB: a quantum chemical property database towards enhancing 3D molecular representation learning. J Cheminform 2024; 16:48. [PMID: 38685101 PMCID: PMC11059686 DOI: 10.1186/s13321-024-00843-y] [Received: 02/06/2024] [Accepted: 04/24/2024] [Indexed: 05/02/2024]
Abstract
Previous studies have shown that the three-dimensional (3D) geometric and electronic structures of molecules play a crucial role in determining their key properties and intermolecular interactions. Therefore, it is necessary to establish a quantum chemical (QC) property database containing the most stable 3D geometric conformations and electronic structures of molecules. In this study, a high-quality QC property database, called QuanDB, was developed, which included structurally diverse molecular entities and featured a user-friendly interface. Currently, QuanDB contains 154,610 compounds sourced from public databases and scientific literature, with 10,125 scaffolds. The elemental composition comprises nine elements: H, C, O, N, P, S, F, Cl, and Br. For each molecule, QuanDB provides 53 global and 5 local QC properties and the most stable 3D conformation. These properties are divided into three categories: geometric structure, electronic structure, and thermodynamics. Geometric structure optimization and single point energy calculation at the theoretical level of B3LYP-D3(BJ)/6-311G(d)/SMD/water and B3LYP-D3(BJ)/def2-TZVP/SMD/water, respectively, were applied to ensure highly accurate calculations of QC properties, with the computational cost exceeding 10⁷ core-hours. QuanDB provides high-value geometric and electronic structure information for use in molecular representation models, which are critical for machine-learning-based molecular design, thereby contributing to a comprehensive description of the chemical compound space. As a new high-quality dataset for QC properties, QuanDB is expected to become a benchmark tool for the training and optimization of machine learning models, thus further advancing the development of novel drugs and materials. QuanDB is freely available, without registration, at https://quandb.cmdrg.com/.
Affiliation(s)
- Zhijiang Yang
  - State Key Laboratory of NBC Protection for Civilian, Beijing, People's Republic of China
- Tengxin Huang
  - State Key Laboratory of NBC Protection for Civilian, Beijing, People's Republic of China
- Li Pan
  - State Key Laboratory of NBC Protection for Civilian, Beijing, People's Republic of China
- Jingjing Wang
  - State Key Laboratory of NBC Protection for Civilian, Beijing, People's Republic of China
- Liangliang Wang
  - State Key Laboratory of NBC Protection for Civilian, Beijing, People's Republic of China
- Junjie Ding
  - State Key Laboratory of NBC Protection for Civilian, Beijing, People's Republic of China
- Junhua Xiao
  - State Key Laboratory of NBC Protection for Civilian, Beijing, People's Republic of China
10
Lee AS, Elliott S, Harb H, Ward L, Foster I, Curtiss L, Assary RS. Emin: A First-Principles Thermochemical Descriptor for Predicting Molecular Synthesizability. J Chem Inf Model 2024; 64:1277-1289. [PMID: 38359461 DOI: 10.1021/acs.jcim.3c01583] [Indexed: 02/17/2024]
Abstract
Predicting the synthesizability of a new molecule remains an unsolved challenge that chemists have long tackled with heuristic approaches. Here, we report a new method for predicting synthesizability using a simple yet accurate thermochemical descriptor. We introduce Emin, the energy difference between a molecule and its lowest energy constitutional isomer, as a synthesizability predictor that is accurate, physically meaningful, and first-principles based. We apply Emin to 134,000 molecules in the QM9 data set and find that Emin is accurate when used alone and reduces incorrect predictions of "synthesizable" by up to 52% when used to augment commonly used prediction methods. Our work illustrates how first-principles thermochemistry and heuristic approximations for molecular stability are complementary, opening a new direction for synthesizability prediction methods.
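The descriptor defined above lends itself to a short sketch: group molecules by molecular formula, find the lowest energy in each group, and subtract. The function name, the record layout, and the energies are hypothetical, and the sketch only compares against isomers present in the input list, which may be narrower than the isomer set the authors use.

```python
def e_min(molecules):
    """Energy of each molecule relative to its lowest-energy listed isomer.

    `molecules` is a list of (name, formula, energy) tuples. Constitutional
    isomers are identified here by identical molecular formula strings.
    """
    lowest = {}
    for _, formula, energy in molecules:
        lowest[formula] = min(energy, lowest.get(formula, energy))
    return {name: energy - lowest[formula] for name, formula, energy in molecules}


# Hypothetical energies in kcal/mol on an arbitrary common scale.
molecules = [
    ("isomer_a", "C2H6O", -10.0),
    ("isomer_b", "C2H6O", 2.0),
    ("only_one", "CH4", -5.0),
]
result = e_min(molecules)

for name, value in result.items():
    print(f"{name}: Emin = {value:.1f} kcal/mol")
```

A value of zero marks the most stable isomer in its group, and larger values mark molecules that sit higher above it.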
Affiliation(s)
- Andrew S Lee
  - Department of Materials Science and Engineering, Northwestern University, Evanston, Illinois 60208, United States
- Sarah Elliott
  - Chemical Sciences and Engineering Division, Argonne National Laboratory, Lemont, Illinois 60439, United States
- Hassan Harb
  - Materials Science Division, Argonne National Laboratory, Lemont, Illinois 60439, United States
- Logan Ward
  - Data Science and Learning Division, Argonne National Laboratory, Lemont, Illinois 60439, United States
- Ian Foster
  - Data Science and Learning Division, Argonne National Laboratory, Lemont, Illinois 60439, United States
- Larry Curtiss
  - Materials Science Division, Argonne National Laboratory, Lemont, Illinois 60439, United States
- Rajeev S Assary
  - Materials Science Division, Argonne National Laboratory, Lemont, Illinois 60439, United States
11
Stylianakis I, Zervos N, Lii JH, Pantazis DA, Kolocouris A. Conformational energies of reference organic molecules: benchmarking of common efficient computational methods against coupled cluster theory. J Comput Aided Mol Des 2023; 37:607-656. [PMID: 37597063 PMCID: PMC10618395 DOI: 10.1007/s10822-023-00513-5] [Received: 05/09/2023] [Accepted: 06/03/2023] [Indexed: 08/21/2023]
Abstract
We selected 145 reference organic molecules that include model fragments used in computer-aided drug design. We calculated 158 conformational energies and barriers using force fields that are widely available in commercial and free software and extensively applied to the calculation of conformational energies of organic molecules, e.g. the UFF and DREIDING force fields, Allinger's force fields MM3-96, MM3-00, MM4-8, the MM2-91 clones MMX and MM+, the MMFF94 force field, MM4, ab initio Hartree-Fock (HF) theory with different basis sets, the standard density functional theory B3LYP, the second-order post-HF MP2 theory and the Domain-based Local Pair Natural Orbital Coupled Cluster DLPNO-CCSD(T) theory, with the latter used for accurate reference values. The data set of the organic molecules includes hydrocarbons, haloalkanes, conjugated compounds, and oxygen-, nitrogen-, phosphorus- and sulphur-containing compounds. We reviewed in detail the conformational aspects of these model organic molecules, providing the current understanding of the steric and electronic factors that determine the stability of low energy conformers and the literature including previous experimental observations and calculated findings. While progress in computer hardware allows the calculation of thousands of conformations for later use in drug design projects, this study is an update of previous classical studies that used, as reference values, experimental ones obtained with a variety of methods and in different environments. The lowest mean error against the DLPNO-CCSD(T) reference was calculated for MP2 (0.35 kcal mol⁻¹), followed by B3LYP (0.69 kcal mol⁻¹) and the HF theories (0.81-1.0 kcal mol⁻¹). As regards the force fields, the lowest errors were observed for Allinger's force fields MM3-00 (1.28 kcal mol⁻¹) and MM3-96 (1.40 kcal mol⁻¹) and Halgren's MMFF94 force field (1.30 kcal mol⁻¹), and then for the MM2-91 clones MMX (1.77 kcal mol⁻¹) and MM+ (2.01 kcal mol⁻¹) and MM4 (2.05 kcal mol⁻¹). The DREIDING (3.63 kcal mol⁻¹) and UFF (3.77 kcal mol⁻¹) force fields have the lowest performance. The model organic molecules we used are often present as fragments in drug-like molecules. The values calculated using DLPNO-CCSD(T) make up a valuable data set for further comparisons and for improved force field parameterization.
Affiliation(s)
- Ioannis Stylianakis
- Department of Medicinal Chemistry, Faculty of Pharmacy, National and Kapodistrian University of Athens, Panepistimioupolis Zografou, 15771, Athens, Greece
| | - Nikolaos Zervos
- Department of Medicinal Chemistry, Faculty of Pharmacy, National and Kapodistrian University of Athens, Panepistimioupolis Zografou, 15771, Athens, Greece
| | - Jenn-Huei Lii
- Department of Chemistry, National Changhua University of Education, Changhua City, Taiwan
| | - Dimitrios A Pantazis
- Max-Planck-Institut für Kohlenforschung, Kaiser-Wilhelm-Platz 1, 45470, Mülheim an der Ruhr, Germany
| | - Antonios Kolocouris
- Department of Medicinal Chemistry, Faculty of Pharmacy, National and Kapodistrian University of Athens, Panepistimioupolis Zografou, 15771, Athens, Greece.
- Laboratory of Medicinal Chemistry, Section of Pharmaceutical Chemistry, Department of Pharmacy, National and Kapodistrian University of Athens, Panepistimiopolis-Zografou, 15771, Athens, Greece.
| |
|
12
|
Yadav S, Misra N, Mansi, Khanna P, Jain M, Khanna L. A DFT study on substituents, solvent, and temperature effect and mechanism of Diels-Alder reaction of hexafluoro-2-butyne with furan. J Mol Model 2023; 29:387. [PMID: 38008793 DOI: 10.1007/s00894-023-05754-7] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/28/2023] [Accepted: 10/12/2023] [Indexed: 11/28/2023]
Abstract
CONTEXT Furan and its derivatives constitute a vital class of heterocyclic compounds used widely in organic synthesis via Diels-Alder reactions. Because fluorine incorporation is of great interest and the available pathways are limited, the present study examines the [4 + 2] Diels-Alder cycloaddition between hexafluoro-2-butyne and 2-substituted (NH2, OCH3, OTMS, NHBoc) furans as a likely route. The computational study revealed that the reaction is feasible under all conditions and is most favorable for the NH2 substituent on furan. The study of the effect of temperature showed that low temperature favors the formation of adducts, while a rise in temperature favors ring opening to form 4-substituted-2,3-di(trifluoromethyl)phenol derivatives. The feasibility of a reaction was determined from the Gibbs energy change. The transition state study was performed to find the activation energy, C-C single bond formation and global electron density transfer (GEDT) involved in the adduct formation. MEP plots were used to identify the regions of electrophilic and nucleophilic character. Furthermore, the mechanism for the formation of phenol products is discussed. The decomposition of the NHBoc group at higher temperatures is rationalized via a proposed mechanism and compared with experimental results. METHODS The reaction was theoretically investigated using the B3LYP hybrid functional with the 6-311+G(d,p) basis set, in the gas phase and in different solvents (water, acetonitrile, and THF). The transition state structures of the adduct were optimized with the smaller basis set, B3LYP/6-31+G(d,p), as well as with the larger basis set, B3LYP/6-311+G(d,p). The changes in Gibbs energy (∆G) for the formation of products at different temperatures and in various solvents were calculated at the B3LYP/6-311+G(d,p) level.
Affiliation(s)
- Shilpa Yadav
- University School of Basic and Applied Sciences, Guru Gobind Singh Indraprastha University, Sector 16-C, Dwarka, New Delhi, 110078, India
| | - Neeti Misra
- Department of Chemistry, Acharya Narendra Dev College, University of Delhi, Kalkaji, New Delhi, 110019, India
| | - Mansi
- University School of Basic and Applied Sciences, Guru Gobind Singh Indraprastha University, Sector 16-C, Dwarka, New Delhi, 110078, India
| | - Pankaj Khanna
- Department of Chemistry, Acharya Narendra Dev College, University of Delhi, Kalkaji, New Delhi, 110019, India
| | - Manisha Jain
- Department of Chemistry, Acharya Narendra Dev College, University of Delhi, Kalkaji, New Delhi, 110019, India
| | - Leena Khanna
- University School of Basic and Applied Sciences, Guru Gobind Singh Indraprastha University, Sector 16-C, Dwarka, New Delhi, 110078, India.
| |
|
13
|
Jablonka KM, Ai Q, Al-Feghali A, Badhwar S, Bocarsly JD, Bran AM, Bringuier S, Brinson LC, Choudhary K, Circi D, Cox S, de Jong WA, Evans ML, Gastellu N, Genzling J, Gil MV, Gupta AK, Hong Z, Imran A, Kruschwitz S, Labarre A, Lála J, Liu T, Ma S, Majumdar S, Merz GW, Moitessier N, Moubarak E, Mouriño B, Pelkie B, Pieler M, Ramos MC, Ranković B, Rodriques SG, Sanders JN, Schwaller P, Schwarting M, Shi J, Smit B, Smith BE, Van Herck J, Völker C, Ward L, Warren S, Weiser B, Zhang S, Zhang X, Zia GA, Scourtas A, Schmidt KJ, Foster I, White AD, Blaiszik B. 14 examples of how LLMs can transform materials science and chemistry: a reflection on a large language model hackathon. Digital Discovery 2023; 2:1233-1250. [PMID: 38013906 PMCID: PMC10561547 DOI: 10.1039/d3dd00113j] [Citation(s) in RCA: 15] [Impact Index Per Article: 7.5] [Received: 06/12/2023] [Accepted: 08/08/2023] [Indexed: 11/04/2023]
Abstract
Large-language models (LLMs) such as GPT-4 caught the interest of many scientists. Recent studies suggested that these models could be useful in chemistry and materials science. To explore these possibilities, we organized a hackathon. This article chronicles the projects built as part of this hackathon. Participants employed LLMs for various applications, including predicting properties of molecules and materials, designing novel interfaces for tools, extracting knowledge from unstructured data, and developing new educational applications. The diverse topics and the fact that working prototypes could be generated in less than two days highlight that LLMs will profoundly impact the future of our fields. The rich collection of ideas and projects also indicates that the applications of LLMs are not limited to materials science and chemistry but offer potential benefits to a wide range of scientific disciplines.
Affiliation(s)
- Kevin Maik Jablonka
- Laboratory of Molecular Simulation (LSMO), Institut des Sciences et Ingénierie Chimiques, Ecole Polytechnique Fédérale de Lausanne (EPFL) Sion Valais Switzerland
| | - Qianxiang Ai
- Department of Chemical Engineering, Massachusetts Institute of Technology Cambridge Massachusetts 02139 USA
| | | | | | - Joshua D Bocarsly
- Yusuf Hamied Department of Chemistry, University of Cambridge Lensfield Road Cambridge CB2 1EW UK
| | - Andres M Bran
- Laboratory of Artificial Chemical Intelligence (LIAC), Institut des Sciences et Ingénierie Chimiques, Ecole Polytechnique Fédérale de Lausanne (EPFL) Lausanne Switzerland
- National Centre of Competence in Research (NCCR) Catalysis, Ecole Polytechnique Fédérale de Lausanne (EPFL) Lausanne Switzerland
| | | | | | - Kamal Choudhary
- Material Measurement Laboratory, National Institute of Standards and Technology Maryland 20899 USA
| | - Defne Circi
- Mechanical Engineering and Materials Science, Duke University USA
| | - Sam Cox
- Department of Chemical Engineering, University of Rochester USA
| | - Wibe A de Jong
- Applied Mathematics and Computational Research Division, Lawrence Berkeley National Laboratory Berkeley CA 94720 USA
| | - Matthew L Evans
- Institut de la Matière Condensée et des Nanosciences (IMCN), UCLouvain Chemin des Étoiles 8 Louvain-la-Neuve 1348 Belgium
- Matgenix SRL 185 Rue Armand Bury 6534 Gozée Belgium
| | - Nicolas Gastellu
- Department of Chemistry, McGill University Montreal Quebec Canada
| | - Jerome Genzling
- Department of Chemistry, McGill University Montreal Quebec Canada
| | - María Victoria Gil
- Instituto de Ciencia y Tecnología del Carbono (INCAR), CSIC Francisco Pintado Fe 26 33011 Oviedo Spain
| | - Ankur K Gupta
- Applied Mathematics and Computational Research Division, Lawrence Berkeley National Laboratory Berkeley CA 94720 USA
| | - Zhi Hong
- Department of Computer Science, University of Chicago Chicago Illinois 60637 USA
| | - Alishba Imran
- Computer Science, University of California Berkeley CA 94704 USA
| | - Sabine Kruschwitz
- Bundesanstalt für Materialforschung und -prüfung Unter den Eichen 87 12205 Berlin Germany
| | - Anne Labarre
- Department of Chemistry, McGill University Montreal Quebec Canada
| | - Jakub Lála
- Francis Crick Institute 1 Midland Rd London NW1 1AT UK
| | - Tao Liu
- Department of Chemistry, McGill University Montreal Quebec Canada
| | - Steven Ma
- Department of Chemistry, McGill University Montreal Quebec Canada
| | - Sauradeep Majumdar
- Laboratory of Molecular Simulation (LSMO), Institut des Sciences et Ingénierie Chimiques, Ecole Polytechnique Fédérale de Lausanne (EPFL) Sion Valais Switzerland
| | - Garrett W Merz
- American Family Insurance Data Science Institute, University of Wisconsin-Madison Madison WI 53706 USA
| | | | - Elias Moubarak
- Laboratory of Molecular Simulation (LSMO), Institut des Sciences et Ingénierie Chimiques, Ecole Polytechnique Fédérale de Lausanne (EPFL) Sion Valais Switzerland
| | - Beatriz Mouriño
- Laboratory of Molecular Simulation (LSMO), Institut des Sciences et Ingénierie Chimiques, Ecole Polytechnique Fédérale de Lausanne (EPFL) Sion Valais Switzerland
| | - Brenden Pelkie
- Department of Chemical Engineering, University of Washington Seattle WA 98105 USA
| | | | | | - Bojana Ranković
- Laboratory of Artificial Chemical Intelligence (LIAC), Institut des Sciences et Ingénierie Chimiques, Ecole Polytechnique Fédérale de Lausanne (EPFL) Lausanne Switzerland
- National Centre of Competence in Research (NCCR) Catalysis, Ecole Polytechnique Fédérale de Lausanne (EPFL) Lausanne Switzerland
| | | | - Jacob N Sanders
- Department of Chemistry and Biochemistry, University of California Los Angeles CA 90095 USA
| | - Philippe Schwaller
- Laboratory of Artificial Chemical Intelligence (LIAC), Institut des Sciences et Ingénierie Chimiques, Ecole Polytechnique Fédérale de Lausanne (EPFL) Lausanne Switzerland
- National Centre of Competence in Research (NCCR) Catalysis, Ecole Polytechnique Fédérale de Lausanne (EPFL) Lausanne Switzerland
| | - Marcus Schwarting
- Department of Computer Science, University of Chicago Chicago IL 60490 USA
| | - Jiale Shi
- Department of Chemical Engineering, Massachusetts Institute of Technology Cambridge Massachusetts 02139 USA
| | - Berend Smit
- Laboratory of Molecular Simulation (LSMO), Institut des Sciences et Ingénierie Chimiques, Ecole Polytechnique Fédérale de Lausanne (EPFL) Sion Valais Switzerland
| | - Ben E Smith
- Yusuf Hamied Department of Chemistry, University of Cambridge Lensfield Road Cambridge CB2 1EW UK
| | - Joren Van Herck
- Laboratory of Molecular Simulation (LSMO), Institut des Sciences et Ingénierie Chimiques, Ecole Polytechnique Fédérale de Lausanne (EPFL) Sion Valais Switzerland
| | - Christoph Völker
- Bundesanstalt für Materialforschung und -prüfung Unter den Eichen 87 12205 Berlin Germany
| | - Logan Ward
- Data Science and Learning Division, Argonne National Lab USA
| | - Sean Warren
- Department of Chemistry, McGill University Montreal Quebec Canada
| | - Benjamin Weiser
- Department of Chemistry, McGill University Montreal Quebec Canada
| | - Sylvester Zhang
- Department of Chemistry, McGill University Montreal Quebec Canada
| | - Xiaoqi Zhang
- Laboratory of Molecular Simulation (LSMO), Institut des Sciences et Ingénierie Chimiques, Ecole Polytechnique Fédérale de Lausanne (EPFL) Sion Valais Switzerland
| | - Ghezal Ahmad Zia
- Bundesanstalt für Materialforschung und -prüfung Unter den Eichen 87 12205 Berlin Germany
| | - Aristana Scourtas
- Globus, University of Chicago, Data Science and Learning Division, Argonne National Lab USA
| | - K J Schmidt
- Globus, University of Chicago, Data Science and Learning Division, Argonne National Lab USA
| | - Ian Foster
- Department of Computer Science, University of Chicago, Data Science and Learning Division, Argonne National Lab USA
| | - Andrew D White
- Department of Chemical Engineering, University of Rochester USA
| | - Ben Blaiszik
- Globus, University of Chicago, Data Science and Learning Division, Argonne National Lab USA
| |
|
14
|
Dandu NK, Ward L, Assary RS, Redfern PC, Curtiss LA. Accurate Prediction of Adiabatic Ionization Potentials of Organic Molecules using Quantum Chemistry Assisted Machine Learning. J Phys Chem A 2023. [PMID: 37406209 DOI: 10.1021/acs.jpca.3c00823] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 07/07/2023]
Abstract
In previous work (Dandu et al., J. Phys. Chem. A, 2022, 126, 4528-4536), we were successful in predicting accurate atomization energies of organic molecules using machine learning (ML) models, obtaining an accuracy as low as 0.1 kcal/mol compared to the G4MP2 method. In this work, we extend the use of these ML models to adiabatic ionization potentials on data sets of energies generated using quantum chemical calculations. Atomic specific corrections that were found to improve atomization energies from quantum chemical calculations have also been used in this study to improve ionization potentials. The quantum chemical calculations were performed on 3405 molecules containing eight or fewer non-hydrogen atoms derived from the QM9 data set, using the B3LYP functional with the 6-31G(2df,p) basis set for optimization. Low-fidelity IPs for these structures were obtained using two density functional methods: B3LYP/6-31+G(2df,p) and ωB97XD/6-311+G(3df,2p). Highly accurate G4MP2 calculations were performed on these optimized structures to obtain high-fidelity IPs to use in ML models based on the low-fidelity IPs. Our best performing ML methods gave IPs of organic molecules within a mean absolute deviation of 0.035 eV from the G4MP2 IPs for the whole data set. This work demonstrates that ML predictions assisted by quantum chemical calculations can be used to successfully predict IPs of organic molecules for use in high throughput screening.
Affiliation(s)
- Naveen K Dandu
- Materials Science Division, Argonne National Laboratory, Lemont, Illinois 60439, United States
- Joint Center for Energy Storage Research (JCESR), Argonne National Laboratory, Lemont, Illinois 60439, United States
- Chemical Engineering Department, University of Illinois-Chicago, Chicago, Illinois 60608, United States
| | - Logan Ward
- Joint Center for Energy Storage Research (JCESR), Argonne National Laboratory, Lemont, Illinois 60439, United States
- Data Science and Learning Division, Argonne National Laboratory, Lemont, Illinois 60439, United States
| | - Rajeev S Assary
- Materials Science Division, Argonne National Laboratory, Lemont, Illinois 60439, United States
- Joint Center for Energy Storage Research (JCESR), Argonne National Laboratory, Lemont, Illinois 60439, United States
| | - Paul C Redfern
- Materials Science Division, Argonne National Laboratory, Lemont, Illinois 60439, United States
| | - Larry A Curtiss
- Materials Science Division, Argonne National Laboratory, Lemont, Illinois 60439, United States
- Joint Center for Energy Storage Research (JCESR), Argonne National Laboratory, Lemont, Illinois 60439, United States
| |
|
15
|
Collins EM, Raghavachari K. Interpretable Graph-Network-Based Machine Learning Models via Molecular Fragmentation. J Chem Theory Comput 2023; 19:2804-2810. [PMID: 37134275 DOI: 10.1021/acs.jctc.2c01308] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Indexed: 05/05/2023]
Abstract
Chemists have long benefitted from the ability to understand and interpret the predictions of computational models. With the current shift to more complex deep learning models, in many situations that utility is lost. In this work, we expand on our previous work on computational thermochemistry and propose an interpretable graph network, FragGraph(nodes), that decomposes predictions into fragment-wise contributions. We demonstrate the usefulness of our model in predicting a correction to density functional theory (DFT)-calculated atomization energies using Δ-learning. Our model predicts G4(MP2)-quality thermochemistry with an accuracy of <1 kJ mol⁻¹ for the GDB9 dataset. Besides the high accuracy of our predictions, we observe trends in the fragment corrections which quantitatively describe the deficiencies of B3LYP. Node-wise predictions significantly outperform our previous model predictions from a global state vector. This effect is most pronounced when we explore generality by predicting on more diverse test sets, indicating that node-wise predictions are less sensitive to the extension of machine learning models to larger molecules.
Affiliation(s)
- Eric M Collins
- Department of Chemistry, Indiana University, Bloomington, Indiana 47405, United States
| | - Krishnan Raghavachari
- Department of Chemistry, Indiana University, Bloomington, Indiana 47405, United States
| |
|
16
|
Liang J, Wang Z, Li J, Wong J, Liu X, Ganoe B, Head-Gordon T, Head-Gordon M. Efficient Calculation of NMR Shielding Constants Using Composite Method Approximations and Locally Dense Basis Sets. J Chem Theory Comput 2023. [PMID: 36594660 DOI: 10.1021/acs.jctc.2c00933] [Citation(s) in RCA: 6] [Impact Index Per Article: 3.0] [Indexed: 01/04/2023]
Abstract
This paper presents a systematic study of applying composite method approximations with locally dense basis sets (LDBS) to efficiently calculate NMR shielding constants in small and medium-sized molecules. The pcSseg-n series of basis sets is shown to have similar accuracy to the pcS-n series when n ≥ 1 and can slightly reduce computational costs. We identify two different LDBS partition schemes that perform very effectively for density functional calculations. We select a large subset of the recent NS372 database containing 290 H, C, N, and O shielding values evaluated by reference methods on 106 molecules to carefully assess methods of high, medium, and low computational cost and to make practical recommendations. Our assessment covers conventional electronic structure methods (density functional theory and wave function) with global basis calculations, as well as their use in one of the satisfactory LDBS approaches, and a range of composite approaches, also with and without LDBS. Altogether 99 methods are evaluated. On this basis, we recommend different methods to reach three different levels of accuracy and time requirements across the four nuclei considered.
Affiliation(s)
- Jiashu Liang
- Kenneth S. Pitzer Center for Theoretical Chemistry, Department of Chemistry, University of California at Berkeley, Berkeley, California 94720, United States
| | - Zhe Wang
- Kenneth S. Pitzer Center for Theoretical Chemistry, Department of Chemistry, University of California at Berkeley, Berkeley, California 94720, United States
| | - Jie Li
- Kenneth S. Pitzer Center for Theoretical Chemistry, Department of Chemistry, University of California at Berkeley, Berkeley, California 94720, United States
| | - Jonathan Wong
- Kenneth S. Pitzer Center for Theoretical Chemistry, Department of Chemistry, University of California at Berkeley, Berkeley, California 94720, United States
| | - Xiao Liu
- Kenneth S. Pitzer Center for Theoretical Chemistry, Department of Chemistry, University of California at Berkeley, Berkeley, California 94720, United States
| | - Brad Ganoe
- Kenneth S. Pitzer Center for Theoretical Chemistry, Department of Chemistry, University of California at Berkeley, Berkeley, California 94720, United States
| | - Teresa Head-Gordon
- Kenneth S. Pitzer Center for Theoretical Chemistry, Department of Chemistry, University of California at Berkeley, Berkeley, California 94720, United States
| | - Martin Head-Gordon
- Kenneth S. Pitzer Center for Theoretical Chemistry, Department of Chemistry, University of California at Berkeley, Berkeley, California 94720, United States
- Chemical Sciences Division, Lawrence Berkeley National Laboratory, Berkeley, California 94720, United States
| |
|
17
|
Andreeva IV, Turovtsev VV, Qian S, Bara JE, Verevkin SP. Biofuel Additives: Thermodynamic Studies of Glycerol Ethers. Ind Eng Chem Res 2022. [DOI: 10.1021/acs.iecr.2c02351] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/06/2022]
Affiliation(s)
- Irina V. Andreeva
- Department of Physical Chemistry, University of Rostock, and Competence Centre CALOR of Faculty of Interdisciplinary Research at University of Rostock, 18059 Rostock, Germany
| | - Vladimir V. Turovtsev
- Department of Physics, Mathematics and Medical Informatics, Tver State Medical University, 170100 Tver, Russia
| | - Shuai Qian
- Department of Chemical & Biological Engineering, University of Alabama, Tuscaloosa, Alabama 35487-0203, United States
| | - Jason E. Bara
- Department of Chemical & Biological Engineering, University of Alabama, Tuscaloosa, Alabama 35487-0203, United States
| | - Sergey P. Verevkin
- Department of Physical Chemistry, University of Rostock, and Competence Centre CALOR of Faculty of Interdisciplinary Research at University of Rostock, 18059 Rostock, Germany
- Department of Physical Chemistry, Kazan Federal University, 420008 Kazan, Russia
| |
|
18
|
Spiekermann K, Pattanaik L, Green WH. High accuracy barrier heights, enthalpies, and rate coefficients for chemical reactions. Sci Data 2022; 9:417. [PMID: 35851390 PMCID: PMC9293986 DOI: 10.1038/s41597-022-01529-6] [Citation(s) in RCA: 14] [Impact Index Per Article: 4.7] [Received: 04/18/2022] [Accepted: 06/30/2022] [Indexed: 12/13/2022] Open
Abstract
Quantitative chemical reaction data, including activation energies and reaction rates, are crucial for developing detailed kinetic mechanisms and accurately predicting reaction outcomes. However, such data are often difficult to find, and high-quality datasets are especially rare. Here, we use CCSD(T)-F12a/cc-pVDZ-F12//ωB97X-D3/def2-TZVP to obtain high-quality single point calculations for nearly 22,000 unique stable species and transition states. We report the results from these quantum chemistry calculations and extract the barrier heights and reaction enthalpies to create a kinetics dataset of nearly 12,000 gas-phase reactions. These reactions involve H, C, N, and O, contain up to seven heavy atoms, and have cleaned atom-mapped SMILES. Our higher-accuracy coupled-cluster barrier heights differ significantly (RMSE of ∼5 kcal mol⁻¹) from those calculated at ωB97X-D3/def2-TZVP. We also report accurate transition state theory rate coefficients [Formula: see text] between 300 K and 2000 K and the corresponding Arrhenius parameters for a subset of rigid reactions. We believe these data will accelerate development of automated and reliable methods for quantitative reaction prediction.
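As a rough illustration of how rate coefficients over 300 K to 2000 K relate to Arrhenius parameters, the sketch below evaluates a simple Eyring expression and fits the two-parameter form ln k = ln A - Ea/(RT) by least squares. It is a minimal sketch under stated assumptions, not the procedure used in the dataset: the function names `eyring_rate` and `fit_arrhenius` are hypothetical, the Eyring form takes a free energy of activation as input, and the two-parameter Arrhenius form is assumed here for simplicity.

```python
import math

KB = 1.380649e-23       # Boltzmann constant, J/K
H = 6.62607015e-34      # Planck constant, J*s
R = 8.31446261815324    # gas constant, J/(mol*K)
KCAL_TO_J = 4184.0      # J per kcal


def eyring_rate(delta_g_act_kcal, temperature):
    """Unimolecular Eyring rate coefficient (s^-1) for a free energy of
    activation given in kcal/mol (hypothetical helper, no tunneling)."""
    delta_g = delta_g_act_kcal * KCAL_TO_J
    return (KB * temperature / H) * math.exp(-delta_g / (R * temperature))


def fit_arrhenius(temperatures, rates):
    """Least-squares fit of ln k = ln A - Ea/(R*T).

    Returns (A, Ea) with Ea in kcal/mol. Assumes the simple
    two-parameter Arrhenius form.
    """
    xs = [1.0 / t for t in temperatures]
    ys = [math.log(k) for k in rates]
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    sxx = sum((x - x_mean) ** 2 for x in xs)
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    slope = sxy / sxx                   # equals -Ea/R
    intercept = y_mean - slope * x_mean  # equals ln A
    return math.exp(intercept), -slope * R / KCAL_TO_J


# Illustrative use with a made-up activation free energy of 30 kcal/mol.
temps = list(range(300, 2001, 100))
ks = [eyring_rate(30.0, t) for t in temps]
a_fit, ea_fit = fit_arrhenius(temps, ks)
```

Because the Eyring prefactor grows linearly with T, the fitted Ea comes out somewhat above the input free energy of activation; a modified Arrhenius form with a temperature exponent would capture that curvature more faithfully.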
Affiliation(s)
- Kevin Spiekermann
- Department of Chemical Engineering, Massachusetts Institute of Technology, 77 Massachusetts Ave, Cambridge, MA, 02139, USA
| | - Lagnajit Pattanaik
- Department of Chemical Engineering, Massachusetts Institute of Technology, 77 Massachusetts Ave, Cambridge, MA, 02139, USA
| | - William H Green
- Department of Chemical Engineering, Massachusetts Institute of Technology, 77 Massachusetts Ave, Cambridge, MA, 02139, USA.
| |
|
19
|
Park S, Han H, Kim H, Choi S. Machine Learning Applications for Chemical Reactions. Chem Asian J 2022; 17:e202200203. [PMID: 35471772 PMCID: PMC9401034 DOI: 10.1002/asia.202200203] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Received: 02/28/2022] [Revised: 04/26/2022] [Indexed: 11/30/2022]
Abstract
Machine learning (ML) approaches have enabled rapid and efficient molecular property predictions as well as the design of novel materials. In addition to their great success on molecular problems, ML techniques are applied to various chemical reaction problems that are very costly to solve with existing experimental and simulation methods. In this review, starting with basic representations of chemical reactions, we summarize recent achievements of ML studies on two different problems: predicting reaction properties and predicting synthetic routes. Various ML models are used to predict physical properties related to chemical reactions (e.g., thermodynamic changes, activation barriers, and reaction rates). Furthermore, the prediction of reactivity, the self-optimization of reactions, and the design of retrosynthetic reaction paths are also tackled by ML approaches. Herein we illustrate the ML strategies utilized in various contexts of chemical reaction studies.
Affiliation(s)
- Sanggil Park
- Department of Chemistry, Incheon National University and Research Institute of Basic Sciences, Incheon 22012, Republic of Korea
| | - Herim Han
- Digital Bio R&D Center, Mediazen, Seoul 07789, Republic of Korea
- Department of Polymer Science and Engineering, Dankook University, Yongin, Gyeonggi 16890, Republic of Korea
| | - Hyungjun Kim
- Department of Chemistry, Incheon National University and Research Institute of Basic Sciences, Incheon 22012, Republic of Korea
| | - Sunghwan Choi
- Division of National Supercomputing, Korea Institute of Science and Technology Information, Daejeon 34141, Republic of Korea
| |
|
20
|
Ruth M, Gerbig D, Schreiner PR. Machine Learning of Coupled Cluster (T)-Energy Corrections via Delta (Δ)-Learning. J Chem Theory Comput 2022; 18:4846-4855. [PMID: 35816588 DOI: 10.1021/acs.jctc.2c00501] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Indexed: 12/12/2022]
Abstract
Accurate thermochemistry is essential in many chemical disciplines, such as astro-, atmospheric, or combustion chemistry. These areas often involve fleetingly existent intermediates whose thermochemistry is difficult to assess. Whenever direct calorimetric experiments are infeasible, accurate computational estimates of relative molecular energies are required. However, high-level computations, often using coupled cluster theory, are generally resource-intensive. To expedite the process using machine learning techniques, we generated a database of energies for small organic molecules at the CCSD(T)/cc-pVDZ, CCSD(T)/aug-cc-pVDZ, and CCSD(T)/cc-pVTZ levels of theory. Leveraging the power of deep learning by employing graph neural networks, we are able to predict the effect of perturbatively included triples (T), that is, the difference between CCSD and CCSD(T) energies, with a mean absolute error of 0.25, 0.25, and 0.28 kcal mol⁻¹ (R² of 0.998, 0.997, and 0.998) with the cc-pVDZ, aug-cc-pVDZ, and cc-pVTZ basis sets, respectively. Our models were further validated by application to three validation sets taken from the S22 Database as well as to a selection of known theoretically challenging cases.
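The Δ-learning idea in this abstract, learning only the difference between a cheap and an expensive energy and adding it back to the cheap result, can be sketched in a few lines. This is a toy illustration under loud assumptions: a one-feature linear regression stands in for the graph neural networks of the paper, the descriptor (a heavy-atom count) is hypothetical, the names `fit_line` and `train_delta_model` are invented for this sketch, and all numbers are synthetic placeholders rather than computed energies.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit y = slope * x + intercept."""
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    sxx = sum((x - x_mean) ** 2 for x in xs)
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, y_mean - slope * x_mean


def train_delta_model(features, e_low, e_high):
    """Learn the correction (e_high - e_low) as a function of one feature.

    Returns a predictor that adds the learned correction to a new
    low-level energy. A linear model is an assumption of this sketch.
    """
    deltas = [hi - lo for hi, lo in zip(e_high, e_low)]
    slope, intercept = fit_line(features, deltas)

    def predict(feature, e_low_new):
        # High-level estimate = low-level energy + predicted correction.
        return e_low_new + slope * feature + intercept

    return predict


# Synthetic placeholder data: feature = heavy-atom count (hypothetical).
heavy_atoms = [1, 2, 3, 4]
low_level = [-10.0, -20.0, -30.0, -40.0]
high_level = [lo - 0.5 * n - 0.1 for n, lo in zip(heavy_atoms, low_level)]

predict_high = train_delta_model(heavy_atoms, low_level, high_level)
estimate = predict_high(5, -50.0)
```

The design point is that the correction is usually smoother and smaller in magnitude than the total energy, so a model trained on it needs less data than one trained on the high-level energy directly.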
Affiliation(s)
- Marcel Ruth
- Institute of Organic Chemistry, Justus Liebig University, Heinrich-Buff-Ring 17, 35392 Giessen, Germany
| | - Dennis Gerbig
- Institute of Organic Chemistry, Justus Liebig University, Heinrich-Buff-Ring 17, 35392 Giessen, Germany
| | - Peter R Schreiner
- Institute of Organic Chemistry, Justus Liebig University, Heinrich-Buff-Ring 17, 35392 Giessen, Germany
| |
|
21
|
Dandu NK, Assary RS, Redfern PC, Ward L, Foster I, Curtiss LA. Improving the Accuracy of Composite Methods: A G4MP2 Method with G4-like Accuracy and Implications for Machine Learning. J Phys Chem A 2022; 126:4528-4536. [PMID: 35786965 DOI: 10.1021/acs.jpca.2c01327] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Indexed: 11/30/2022]
Abstract
G4MP2 theory has proven to be a reliable and accurate quantum chemical composite method for the calculation of molecular energies using an approximation based on second-order perturbation theory to lower computational costs compared to G4 theory. However, it has been found to have significantly increased errors when applied to larger organic molecules with 10 or more nonhydrogen atoms. We report here on an investigation of the cause of the failure of G4MP2 theory for such larger molecules. One source of error is found to be the "higher-level correction (HLC)", which is meant to correct for deficiencies in correlation contributions to the calculated energies. This is because the HLC assumes that the contribution is independent of the element and the type of bonding involved, both of which become more important with larger molecules. We address this problem by adding an atom-specific correction, dependent on atom type but not bond type, to the higher-level correction. We find that a G4MP2 method that incorporates this modification of the higher-level correction, referred to as G4MP2A, becomes as accurate as G4 theory (for computing enthalpies of formation) for a test set of molecules with less than 10 nonhydrogen atoms as well as a set with 10-14 such atoms, the set of molecules considered here, with a much lower computational cost. The G4MP2A method is also found to significantly improve ionization potentials and electron affinities. Finally, we implemented the G4MP2A energies in a machine learning method to predict molecular energies.
Affiliation(s)
- Naveen K Dandu
- Materials Science Division, Argonne National Laboratory, Lemont, Illinois 60439, United States
- Joint Center for Energy Storage Research, Argonne National Laboratory, Lemont, Illinois 60439, United States
- Chemical Engineering Department, University of Illinois-Chicago, Chicago, Illinois 60607, United States
| | - Rajeev S Assary
- Materials Science Division, Argonne National Laboratory, Lemont, Illinois 60439, United States
- Joint Center for Energy Storage Research, Argonne National Laboratory, Lemont, Illinois 60439, United States
| | - Paul C Redfern
- Materials Science Division, Argonne National Laboratory, Lemont, Illinois 60439, United States
| | - Logan Ward
- Joint Center for Energy Storage Research, Argonne National Laboratory, Lemont, Illinois 60439, United States
- Data Science and Learning Division, Argonne National Laboratory, Lemont, Illinois 60439, United States
| | - Ian Foster
- Data Science and Learning Division, Argonne National Laboratory, Lemont, Illinois 60439, United States
- Department of Computer Science, University of Chicago, Chicago, Illinois 60637, United States
| | - Larry A Curtiss
- Materials Science Division, Argonne National Laboratory, Lemont, Illinois 60439, United States
- Joint Center for Energy Storage Research, Argonne National Laboratory, Lemont, Illinois 60439, United States
| |
|
22
|
Si Y, Liu Y, Lai W, Ma Y, Shi J, Wang B, Liu M, Yu T. A New Enthalpy of Formation Test Set Designed for Organic Fluorine Containing Compounds. Advanced Theory and Simulations 2022. [DOI: 10.1002/adts.202200093] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/05/2022]
Affiliation(s)
- Yitao Si
- State Key Laboratory of Fluorine and Nitrogen Chemicals Xi'an Modern Chemistry Research Institute Xi'an 710065 P. R. China
- International Research Center for Renewable Energy State Key Laboratory of Multiphase Flow Xi'an Jiaotong University Xi'an 710049 P. R. China
| | - Yingzhe Liu
- State Key Laboratory of Fluorine and Nitrogen Chemicals Xi'an Modern Chemistry Research Institute Xi'an 710065 P. R. China
| | - Weipeng Lai
- State Key Laboratory of Fluorine and Nitrogen Chemicals Xi'an Modern Chemistry Research Institute Xi'an 710065 P. R. China
| | - Yiding Ma
- State Key Laboratory of Fluorine and Nitrogen Chemicals Xi'an Modern Chemistry Research Institute Xi'an 710065 P. R. China
| | - Jinwen Shi
- International Research Center for Renewable Energy State Key Laboratory of Multiphase Flow Xi'an Jiaotong University Xi'an 710049 P. R. China
| | - Bozhou Wang
- State Key Laboratory of Fluorine and Nitrogen Chemicals Xi'an Modern Chemistry Research Institute Xi'an 710065 P. R. China
| | - Maochang Liu
- International Research Center for Renewable Energy State Key Laboratory of Multiphase Flow Xi'an Jiaotong University Xi'an 710049 P. R. China
| | - Tao Yu
- State Key Laboratory of Fluorine and Nitrogen Chemicals Xi'an Modern Chemistry Research Institute Xi'an 710065 P. R. China
- School of Chemistry and Chemical Engineering Southeast University Nanjing 211189 P. R. China
| |
|
23
|
Wiik K, Høyvik IM, Unneberg E, Jensen TL, Swang O. Unimolecular Decomposition Reactions of Picric Acid and Its Methylated Derivatives─A DFT Study. J Phys Chem A 2022; 126:2645-2657. [PMID: 35472276 PMCID: PMC9082609 DOI: 10.1021/acs.jpca.1c10770] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Indexed: 11/28/2022]
Abstract
To handle energetic materials safely, it is important to have knowledge about their sensitivity. Density functional theory (DFT) has proven a valuable tool in the study of energetic materials, and in the current work, DFT is employed to study the thermal unimolecular decomposition of 2,4,6-trinitrophenol (picric acid, PA), 3-methyl-2,4,6-trinitrophenol (methyl picric acid, mPA), and 3,5-dimethyl-2,4,6-trinitrophenol (dimethyl picric acid, dmPA). These compounds have similar molecular structures, but according to the literature, mPA is far less sensitive to impact than the other two compounds. Three pathways believed important for the initiation reactions are investigated at 0 and 298.15 K. We compare the computed energetics of the reaction pathways with the objective of rationalizing the unexpected sensitivity behavior. Our results reveal few, if any, significant differences in the energetics of the three molecules, and thus do not reflect the sensitivity deviations observed in experiments. These findings point toward the potential importance of crystal structure, crystal morphology, bimolecular reactions, or combinations thereof on the impact sensitivity of nitroaromatics.
Affiliation(s)
- Kristine Wiik
- Chemistry Department, The Norwegian University of Science and Technology (NTNU), Høgskoleringen 5, 7491 Trondheim, Norway; Department of Process Technology, SINTEF Industry, P.O. Box 124 Blindern, 0314 Oslo, Norway
| | - Ida-Marie Høyvik
- Chemistry Department, The Norwegian University of Science and Technology (NTNU), Høgskoleringen 5, 7491 Trondheim, Norway
| | - Erik Unneberg
- Norwegian Defence Research Establishment (FFI), P.O. Box 25, 2027 Kjeller, Norway
| | - Tomas Lunde Jensen
- Norwegian Defence Research Establishment (FFI), P.O. Box 25, 2027 Kjeller, Norway
| | - Ole Swang
- Department of Process Technology, SINTEF Industry, P.O. Box 124 Blindern, 0314 Oslo, Norway
| |
|
24
|
Nunes CM, Pereira NAM, Fausto R. Photochromism of a Spiropyran in Low-Temperature Matrices: Unprecedented Bidirectional Switching between a Merocyanine and an Allene Intermediate. J Phys Chem A 2022; 126:2222-2233. [PMID: 35362982 DOI: 10.1021/acs.jpca.2c01105] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Indexed: 11/29/2022]
Abstract
Photochromism of spiropyrans has attracted much attention due to its potential in many light-controlled system applications. However, several fundamental aspects regarding the structure, energetics, and mechanistic details of the transformations of spiropyrans are still not well understood. Here, we report the study of the photochromism of a 6-hydroxy-spiropyran (HBPS) under conditions of matrix isolation, where monomers of the compound are frozen in a solidified noble gas (krypton, at 15 K). The structure of the matrix-isolated HBPS was first elucidated by infrared (IR) spectroscopy supported by density functional theory computations. Then, the photochromism of HBPS, from the colorless spiropyran to the colored merocyanine, was induced by ultraviolet (UV) irradiation at 310 nm. The analysis of the IR spectrum of the photoproduced species revealed the exclusive formation of the most stable merocyanine MC-TTC stereoisomer. Subsequent visible-light (550 nm) irradiation of MC-TTC generated a new colorless allenic isomeric species ALN, where the UV irradiation (310 nm) of ALN was found to convert this species back to MC-TTC. This constitutes an unprecedented bidirectional transformation between a colored merocyanine and a colorless allene species. The newly observed photoswitching reaction (or photochromism) occurs along an intramolecular hydrogen bond existing in both merocyanine and allenic species, thus suggesting that it might be generally feasible in the chemistry of spiropyrans. On the other hand, the usual assumption that, as a general rule, merocyanines photochemically revert to spiropyrans is not supported in this work.
Affiliation(s)
- Cláudio M Nunes
- CQC-IMS, Department of Chemistry, University of Coimbra, Coimbra 3004-535, Portugal
| | - Nelson A M Pereira
- CQC-IMS, Department of Chemistry, University of Coimbra, Coimbra 3004-535, Portugal
| | - Rui Fausto
- CQC-IMS, Department of Chemistry, University of Coimbra, Coimbra 3004-535, Portugal
| |
|
25
|
Habershon S. Program Synthesis of Sparse Algorithms for Wave Function and Energy Prediction in Grid-Based Quantum Simulations. J Chem Theory Comput 2022; 18:2462-2478. [PMID: 35293216 PMCID: PMC9009083 DOI: 10.1021/acs.jctc.2c00035] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/29/2022]
Abstract
We have recently shown how program synthesis (PS), or the concept of "self-writing code", can generate novel algorithms that solve the vibrational Schrödinger equation, providing approximations to the allowed wave functions for bound, one-dimensional (1-D) potential energy surfaces (PESs). The resulting algorithms use a grid-based representation of the underlying wave function ψ(x) and PES V(x), providing codes which represent approximations to standard discrete variable representation (DVR) methods. In this Article, we show how this inductive PS strategy can be improved and modified to enable prediction of both vibrational wave functions and energy eigenvalues of representative model PESs (both 1-D and multidimensional). We show that PS can generate algorithms that offer some improvements in energy eigenvalue accuracy over standard DVR schemes; however, we also demonstrate that PS can identify accurate numerical methods that exhibit desirable computational features, such as employing very sparse (tridiagonal) matrices. The resulting PS-generated algorithms are initially developed and tested for 1-D vibrational eigenproblems, before solution of multidimensional problems is demonstrated; we find that our new PS-generated algorithms can reduce calculation times for grid-based eigenvector computation by an order of magnitude or more. More generally, with further development and optimization, we anticipate that PS-generated algorithms based on effective Hamiltonian approximations, such as those proposed here, could be useful in direct simulations of quantum dynamics via wave function propagation and evaluation of molecular electronic structure.
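For orientation, the kind of sparse grid-based eigenproblem mentioned in this abstract can be sketched with a textbook second-order finite-difference Hamiltonian, which is tridiagonal. This is a conventional scheme, not one of the PS-generated algorithms from the cited work. The function names are hypothetical, and units with ħ = m = 1 are an assumption. Eigenvalues are located by bisection on a Sturm-sequence count, so only the diagonal and one off-diagonal value are stored.

```python
def finite_difference_hamiltonian(potential, x_min, x_max, n_points, mass=1.0, hbar=1.0):
    """Return (diagonal, off_diagonal) of the tridiagonal H = -hbar^2/(2m) d2/dx2 + V(x)."""
    dx = (x_max - x_min) / (n_points + 1)
    grid = [x_min + (i + 1) * dx for i in range(n_points)]  # interior points only
    t = hbar ** 2 / (2.0 * mass * dx ** 2)
    diagonal = [2.0 * t + potential(x) for x in grid]
    return diagonal, -t  # every off-diagonal element equals -t


def eigenvalues_below(diagonal, off_diagonal, energy):
    """Sturm count: number of eigenvalues of the tridiagonal matrix below `energy`."""
    count = 0
    pivot = 1.0
    for i, a in enumerate(diagonal):
        pivot = a - energy - (off_diagonal ** 2 / pivot if i else 0.0)
        if pivot == 0.0:
            pivot = -1e-30  # nudge off an exact zero to keep the recurrence defined
        if pivot < 0.0:
            count += 1
    return count


def kth_eigenvalue(diagonal, off_diagonal, k=0, tol=1e-10):
    """Bisect between Gershgorin bounds for the k-th eigenvalue (k = 0 is the ground state)."""
    lo = min(diagonal) - 2.0 * abs(off_diagonal)
    hi = max(diagonal) + 2.0 * abs(off_diagonal)
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if eigenvalues_below(diagonal, off_diagonal, mid) > k:
            hi = mid
        else:
            lo = mid
    return 0.5 * (lo + hi)


# Harmonic oscillator V(x) = x^2 / 2; the exact levels are n + 1/2 in these units.
diag, off = finite_difference_hamiltonian(lambda x: 0.5 * x * x, -10.0, 10.0, 400)
ground_state = kth_eigenvalue(diag, off, k=0)
first_excited = kth_eigenvalue(diag, off, k=1)
```

Each Sturm count costs one pass over the diagonal, which is the practical appeal of a tridiagonal representation compared with a dense DVR matrix.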
Affiliation(s)
- Scott Habershon
- Department of Chemistry, University of Warwick, Coventry, CV4 7AL, United Kingdom
| |
|
26
|
Gupta AK, Raghavachari K. Three-Dimensional Convolutional Neural Networks Utilizing Molecular Topological Features for Accurate Atomization Energy Predictions. J Chem Theory Comput 2022; 18:2132-2143. [PMID: 35226496 DOI: 10.1021/acs.jctc.1c00504] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 11/30/2022]
Abstract
Deep learning methods provide a novel way to establish a correlation between two quantities. In this context, computer vision techniques such as three-dimensional (3D)-convolutional neural networks become a natural choice to associate a molecular property with its structure due to the inherent 3D nature of a molecule. However, traditional 3D input data structures are intrinsically sparse, which tends to induce instabilities during the learning process and may in turn lead to underfitted results. To address this deficiency, in this project, we propose to use quantum-chemically derived molecular topological features, namely, localized orbital locator and electron localization function, as molecular descriptors, which provide a relatively denser input representation in a 3D space. Such topological features provide a detailed picture of the atomic and electronic configuration and interatomic interactions in the molecule and hence are ideal for predicting properties that are highly dependent on the physical or electronic structure of the molecule. Herein, we demonstrate the efficacy of our proposed model by applying it to the task of predicting atomization energies for the QM9-G4MP2 data set, which contains ∼134k molecules. Furthermore, we incorporated the Δ-machine learning approach into our model, which enabled us to reach beyond benchmark accuracy levels (∼1.0 kJ mol⁻¹). As a result, we consistently obtain impressive mean absolute errors of the order 0.1 kcal mol⁻¹ (∼0.42 kJ mol⁻¹) versus the G4(MP2) theory using relatively modest models, which could potentially be improved further in a systematic manner using additional compute resources.
Affiliation(s)
- Ankur Kumar Gupta
- Department of Chemistry, Indiana University, Bloomington, Indiana 47405, United States
| | - Krishnan Raghavachari
- Department of Chemistry, Indiana University, Bloomington, Indiana 47405, United States
| |
|
27
|
Andreeva IV, Zaitsau DH, Qian S, Turovtzev VV, Pimerzin AA, Bara JE, Verevkin SP. Glycerol valorisation towards biofuel additivities: Thermodynamic studies of glycerol ethers. Chem Eng Sci 2022. [DOI: 10.1016/j.ces.2021.117032] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Indexed: 11/03/2022]
|
28
|
Li B, Rangarajan S. A conceptual study of transfer learning with linear models for data-driven property prediction. Comput Chem Eng 2022. [DOI: 10.1016/j.compchemeng.2021.107599] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Indexed: 12/17/2022]
|
29
|
Andreeva IV, Pimerzin AA, Turovtsev VV, Qian S, Bara JE, Verevkin SP. Commodity Chemicals and Fuels from Biomass: Thermodynamic Properties of Levoglucosan Derivatives. Ind Eng Chem Res 2021. [DOI: 10.1021/acs.iecr.1c02230] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 11/29/2022]
Affiliation(s)
- Irina V. Andreeva
- Department of Physical Chemistry and Faculty of Interdisciplinary Research, Competence Centre CALOR, University of Rostock, Rostock 18059, Germany
| | - Aleksey A. Pimerzin
- Chemical Department, Samara State Technical University, Samara 443100, Russia
| | - Vladimir V. Turovtsev
- Department of Physics, Mathematics and Medical Informatics, Tver State Medical University, Tver 170100, Russia
| | - Shuai Qian
- Department of Chemical & Biological Engineering, University of Alabama, Tuscaloosa, Alabama 35487-0203, United States
| | - Jason E. Bara
- Department of Chemical & Biological Engineering, University of Alabama, Tuscaloosa, Alabama 35487-0203, United States
| | - Sergey P. Verevkin
- Department of Physical Chemistry and Faculty of Interdisciplinary Research, Competence Centre CALOR, University of Rostock, Rostock 18059, Germany
- Chemical Department, Samara State Technical University, Samara 443100, Russia
| |
|
30
|
Yalcin-Ozkat G. Molecular Modeling Strategies of Cancer Multidrug Resistance. Drug Resist Updat 2021; 59:100789. [PMID: 34973929 DOI: 10.1016/j.drup.2021.100789] [Citation(s) in RCA: 28] [Impact Index Per Article: 7.0] [Received: 11/02/2021] [Revised: 11/20/2021] [Accepted: 11/22/2021] [Indexed: 02/07/2023]
Abstract
Cancer remains a leading cause of morbidity and mortality worldwide. Hence, the increase in cancer cases observed in the elderly population, as well as in children and adolescents, makes human malignancies a prime target for anticancer drug development. Although highly effective chemotherapeutic agents are continuously developed and approved for clinical treatment, the major impediment towards curative cancer therapy remains multidrug resistance (MDR). In recent years, intensive studies have been carried out on the identification of new therapeutic molecules to reverse MDR efflux transporters of the ATP-binding cassette (ABC) superfamily. Although a great deal of progress has been made in the development of specific inhibitors for certain MDR efflux pumps in experimental studies, advanced computational studies can accelerate this drug development process. In the literature, there are many experimental studies on the impact of natural products and synthetic small molecules on the reversal of cancer MDR. Molecular modeling methods provide an opportunity to explain the activity of these molecules on the ABC-transporter family through non-covalent interactions, and they also make it possible to carry out studies for the discovery of new anticancer drugs specific to MDR. The coordinate file of the 3-dimensional (3D) structure of the target protein is indispensable for molecular modeling studies. In some cases where a 3D structure cannot be obtained by experimental methods, the homology modeling method can be applied to obtain the file containing the target protein's information including atomic coordinates, secondary structure assignments, and atomic connectivity. Homology modeling studies are of great importance for efflux transporter proteins that still lack 3D structures due to crystallization problems with multiple hydrophobic transmembrane domains.
Quantum mechanics, molecular docking and molecular dynamics simulation applications are the most frequently used molecular modeling methods in the literature to investigate non-covalent interactions between drugs and the ABC transporter superfamily. The quantitative structure-activity relationship (QSAR) model provides a relationship between the chemical properties of a compound and its biological activity. Determining the pharmacophore region for a new drug molecule by superpositioning a series of molecules according to their physicochemical properties using QSAR models is another method in which molecular modeling is used in computational drug development studies with ABC transporter proteins. There are also in silico absorption, distribution, metabolism, excretion, and toxicity (ADME/Tox) studies conducted to predict the pharmacokinetic properties and drug-likeness of new molecules. Drug repurposing studies, which have become a trending topic in recent years, involve identifying possible new targets for an already approved drug molecule. There are few studies in the literature in which drug repurposing performed by molecular modeling methods has been applied on ABC transporter proteins. The aim of the current paper is to create a complete review of drug development studies including the aforementioned molecular modeling methods carried out between the years 2019-2021. Furthermore, an intensive investigation is also conducted on licensed applications and free web servers used in in silico studies. The current review is an up-to-date guide for researchers who plan to conduct computational studies with MDR transporter proteins.
Affiliation(s)
- Gozde Yalcin-Ozkat
- Recep Tayyip Erdogan University, Faculty of Engineering and Architecture, Bioengineering Department, 53100, Rize, Turkey; Max Planck Institute for Dynamics of Complex Technical Systems, Molecular Simulations and Design Group, Sandtorstrasse 1, 39106, Magdeburg, Germany.
| |
|
31
|
Abstract
We demonstrate that a program synthesis approach based on a linear code representation can be used to generate algorithms that approximate the ground-state solutions of one-dimensional time-independent Schrödinger equations constructed with bound polynomial potential energy surfaces (PESs). Here, an algorithm is constructed as a linear series of instructions operating on a set of input vectors, matrices, and constants that define the problem characteristics, such as the PES. Discrete optimization is performed using simulated annealing in order to identify sequences of code-lines, operating on the program inputs that can reproduce the expected ground-state wavefunctions ψ(x) for a set of target PESs. The outcome of this optimization is not simply a mathematical function approximating ψ(x) but is, instead, a complete algorithm that converts the input vectors describing the system into a ground-state solution of the Schrödinger equation. These initial results point the way toward an alternative route for developing novel algorithms for quantum chemistry applications.
Affiliation(s)
- Scott Habershon
- Department of Chemistry, University of Warwick, Coventry CV4 7AL, United Kingdom
| |
|
32
|
Abstract
Chemical compound space (CCS), the set of all theoretically conceivable combinations of chemical elements and (meta-)stable geometries that make up matter, is colossal. The first-principles based virtual sampling of this space, for example, in search of novel molecules or materials which exhibit desirable properties, is therefore prohibitive for all but the smallest subsets and simplest properties. We review studies aimed at tackling this challenge using modern machine learning techniques based on (i) synthetic data, typically generated using quantum mechanics based methods, and (ii) model architectures inspired by quantum mechanics. Such Quantum mechanics based Machine Learning (QML) approaches combine the numerical efficiency of statistical surrogate models with an ab initio view on matter. They rigorously reflect the underlying physics in order to reach universality and transferability across CCS. While state-of-the-art approximations to quantum problems impose severe computational bottlenecks, recent QML based developments indicate the possibility of substantial acceleration without sacrificing the predictive power of quantum mechanics.
Affiliation(s)
- Bing Huang
- Faculty of Physics, University of Vienna, 1090 Vienna, Austria
| | - O. Anatole von Lilienfeld
- Faculty of Physics, University of Vienna, 1090 Vienna, Austria
- Institute of Physical Chemistry and National Center for Computational Design and Discovery of Novel Materials (MARVEL), Department of Chemistry, University of Basel, 4056 Basel, Switzerland
| |
|
33
|
Intermolecular insights into allosteric inhibition of histone lysine-specific demethylase 1. Biochim Biophys Acta Gen Subj 2021; 1865:129990. [PMID: 34390793 DOI: 10.1016/j.bbagen.2021.129990] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.0] [Received: 03/16/2021] [Revised: 08/04/2021] [Accepted: 08/06/2021] [Indexed: 02/07/2023]
Abstract
BACKGROUND Histone lysine-specific demethylase 1 (LSD1) has become a potential anticancer target for novel drug discovery. Recent reports have shown that SP2509 and its derivatives strongly inhibit LSD1 as allosteric inhibitors. However, the binding mechanism of these allosteric inhibitors in the allosteric site of LSD1 is not yet known. METHODS The stability and binding mechanism of allosteric inhibitors in the binding site of LSD1 were evaluated by molecular docking, ligand-based pharmacophore, molecular dynamics (MD) simulations, molecular mechanics generalized Born surface area (MM/GBSA) analysis, quantum mechanics/molecular mechanics (QM/MM) calculation and Hirshfeld surface analysis. RESULTS The conformational geometry and the intermolecular interactions of allosteric inhibitors showed high binding affinity towards the allosteric site of LSD1 with the neighboring amino acids (Gly358, Cys360, Leu362, Asp375 and Glu379). Meanwhile, MD simulations and MM/GBSA analysis were performed on selected allosteric inhibitors in complex with LSD1 protein, which confirmed the high stability and binding affinity of these inhibitors in the allosteric site of LSD1. CONCLUSION The simulation results revealed the crucial factors accounting for allosteric inhibitors of LSD1, including different protein-ligand interactions, the positions and conformations of key residues, and the ligands' flexibilities. Meanwhile, a halogen bond interaction between the chlorine atom of the ligand and key residues Trp531 and His532 was recurrent in our analysis, confirming its importance. GENERAL SIGNIFICANCE Overall, our research analyzed in depth the binding modes of allosteric inhibitors with LSD1 and could provide useful information for the design of novel allosteric inhibitors.
|
34
|
Collins EM, Raghavachari K. A Fragmentation-Based Graph Embedding Framework for QM/ML. J Phys Chem A 2021; 125:6872-6880. [PMID: 34342449 DOI: 10.1021/acs.jpca.1c06152] [Citation(s) in RCA: 11] [Impact Index Per Article: 2.8] [Indexed: 01/20/2023]
Abstract
We introduce a new fragmentation-based molecular representation framework "FragGraph" for QM/ML methods involving embedding fragment-wise fingerprints onto molecular graphs. Our model is specifically designed for delta machine learning (Δ-ML) with the central goal of correcting the deficiencies of approximate methods such as DFT to achieve high accuracy. Our framework is based on a judicious combination of ideas from fragmentation, error cancellation, and a state-of-the-art deep learning architecture. Broadly, we develop a general graph-network framework for molecular machine learning by incorporating the inherent advantages prebuilt into error cancellation methods such as the generalized Connectivity-Based Hierarchy. More specifically, we develop a QM/ML representation through a fragmentation-based attributed graph representation encoded with fragment-wise molecular fingerprints. The utility of our representation is demonstrated through a graph network fingerprint encoder in which a global fingerprint is generated through message passing of local neighborhoods of fragment-wise fingerprints, effectively augmenting standard fingerprints to also include the inbuilt molecular graph structure. On the 130k-GDB9 dataset, our method predicts an out-of-sample mean absolute error significantly lower than 1 kJ/mol compared to target G4(MP2) calculated energies, rivaling current deep learning methods with reduced computational scaling.
Affiliation(s)
- Eric M Collins
- Department of Chemistry, Indiana University, Bloomington, Indiana 47405, United States
| | - Krishnan Raghavachari
- Department of Chemistry, Indiana University, Bloomington, Indiana 47405, United States
| |
|
35
|
Ward L, Dandu N, Blaiszik B, Narayanan B, Assary RS, Redfern PC, Foster I, Curtiss LA. Graph-Based Approaches for Predicting Solvation Energy in Multiple Solvents: Open Datasets and Machine Learning Models. J Phys Chem A 2021; 125:5990-5998. [PMID: 34191512 DOI: 10.1021/acs.jpca.1c01960] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Indexed: 11/28/2022]
Abstract
The solvation properties of molecules, often estimated using quantum chemical simulations, are important in the synthesis of energy storage materials, drugs, and industrial chemicals. Here, we develop machine learning models of solvation energies to replace expensive quantum chemistry calculations with inexpensive-to-compute message-passing neural network models that require only the molecular graph as inputs. Our models are trained on a new database of solvation energies for 130,258 molecules taken from the QM9 dataset computed in five solvents (acetone, ethanol, acetonitrile, dimethyl sulfoxide, and water) via an implicit solvent model. Our best model achieves a mean absolute error of 0.5 kcal/mol for molecules with nine or fewer non-hydrogen atoms and 1 kcal/mol for molecules with between 10 and 14 non-hydrogen atoms. We make the entire dataset of 651,290 computed entries openly available and provide simple web and programmatic interfaces to enable others to run our solvation energy model on new molecules. This model calculates the solvation energies for molecules using only the SMILES string and also provides an estimate of whether each molecule is within the domain of applicability of our model. We envision that the dataset and models will provide the functionality needed for the rapid screening of large chemical spaces to discover improved molecules for many applications.
Affiliation(s)
- Logan Ward
- Data Science and Learning Division, Argonne National Laboratory, Lemont, Illinois 60439, United States
| | - Naveen Dandu
- Materials Science Division, Argonne National Laboratory, Lemont, Illinois 60439, United States
| | - Ben Blaiszik
- Data Science and Learning Division, Argonne National Laboratory, Lemont, Illinois 60439, United States; Globus, University of Chicago, Chicago, Illinois 60637, United States
| | - Badri Narayanan
- Materials Science Division, Argonne National Laboratory, Lemont, Illinois 60439, United States; Department of Mechanical Engineering, University of Louisville, Louisville, Kentucky 40292, United States
| | - Rajeev S Assary
- Materials Science Division, Argonne National Laboratory, Lemont, Illinois 60439, United States
| | - Paul C Redfern
- Materials Science Division, Argonne National Laboratory, Lemont, Illinois 60439, United States
| | - Ian Foster
- Data Science and Learning Division, Argonne National Laboratory, Lemont, Illinois 60439, United States; Globus, University of Chicago, Chicago, Illinois 60637, United States; Department of Computer Science, University of Chicago, Chicago, Illinois 60637, United States
| | - Larry A Curtiss
- Materials Science Division, Argonne National Laboratory, Lemont, Illinois 60439, United States
| |
|
36
|
Dobbelaere MR, Plehiers PP, Van de Vijver R, Stevens CV, Van Geem KM. Learning Molecular Representations for Thermochemistry Prediction of Cyclic Hydrocarbons and Oxygenates. J Phys Chem A 2021; 125:5166-5179. [PMID: 34081474 DOI: 10.1021/acs.jpca.1c01956] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.0] [Indexed: 11/28/2022]
Abstract
Accurate thermochemistry estimation of polycyclic molecules is crucial for kinetic modeling of chemical processes that use renewable and alternative feedstocks. In kinetic model generators, molecular properties are estimated rapidly with group additivity, but this method is known to have limitations for polycyclic structures. This issue has been resolved in our work by combining a geometry-based molecular representation with a deep neural network trained on ab initio data. Each molecule is transformed into a probabilistic vector from its interatomic distances, bond angles, and dihedral angles. The model is tested on a small experimental dataset (200 molecules) from the literature, a new medium-sized set (4000 molecules) with both open-shell and closed-shell species, calculated at the CBS-QB3 level with empirical corrections, and a large G4MP2-level QM9-based dataset (40 000 molecules). Heat capacities between 298.15 and 2500 K are calculated in the medium set with an average deviation of about 1.5 J mol⁻¹ K⁻¹ and the standard entropy at 298.15 K is predicted with an average error below 4 J mol⁻¹ K⁻¹. The standard enthalpy of formation at 298.15 K has an average out-of-sample error below 4 kJ mol⁻¹ on a QM9 training set size of around 15 000 molecules. By fitting NASA polynomials, the enthalpy of formation at higher temperatures can be calculated with the same accuracy as the standard enthalpy of formation. Uncertainty quantification by means of the ensemble standard deviation is included to indicate when molecules that are on the edge or outside of the application range of the model are evaluated.
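As a reminder of how NASA polynomials turn fitted coefficients into temperature-dependent properties, the sketch below evaluates the standard seven-coefficient form. The abstract does not state which variant the authors fitted, so the seven-coefficient choice is an assumption. The function names are hypothetical, and the example coefficients are an idealized placeholder (constant Cp = 5R/2) rather than data from the cited work.

```python
import math

R = 8.314462618  # molar gas constant, J mol^-1 K^-1


def heat_capacity(coeffs, temperature):
    """Cp(T) = R (a1 + a2 T + a3 T^2 + a4 T^3 + a5 T^4), in J mol^-1 K^-1."""
    a1, a2, a3, a4, a5, _a6, _a7 = coeffs
    t = temperature
    return R * (a1 + a2 * t + a3 * t ** 2 + a4 * t ** 3 + a5 * t ** 4)


def enthalpy(coeffs, temperature):
    """H(T) = R T (a1 + a2 T/2 + a3 T^2/3 + a4 T^3/4 + a5 T^4/5 + a6/T), in J mol^-1."""
    a1, a2, a3, a4, a5, a6, _a7 = coeffs
    t = temperature
    return R * t * (a1 + a2 * t / 2 + a3 * t ** 2 / 3
                    + a4 * t ** 3 / 4 + a5 * t ** 4 / 5 + a6 / t)


def entropy(coeffs, temperature):
    """S(T) = R (a1 ln T + a2 T + a3 T^2/2 + a4 T^3/3 + a5 T^4/4 + a7), in J mol^-1 K^-1."""
    a1, a2, a3, a4, a5, _a6, a7 = coeffs
    t = temperature
    return R * (a1 * math.log(t) + a2 * t + a3 * t ** 2 / 2
                + a4 * t ** 3 / 3 + a5 * t ** 4 / 4 + a7)


# Idealized placeholder: constant Cp = 2.5 R, with a6 chosen so that H(298.15 K) = 0.
placeholder = (2.5, 0.0, 0.0, 0.0, 0.0, -2.5 * 298.15, 4.0)
enthalpy_increment = enthalpy(placeholder, 1000.0) - enthalpy(placeholder, 298.15)
```

In practice, each species carries separate coefficient sets for a low and a high temperature range, and the caller selects the set that brackets the requested temperature.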
Affiliation(s)
- Maarten R Dobbelaere
- Laboratory for Chemical Technology, Department of Materials, Textiles and Chemical Engineering, Ghent University, Technologiepark 125, 9052 Gent, Belgium
| | - Pieter P Plehiers
- Laboratory for Chemical Technology, Department of Materials, Textiles and Chemical Engineering, Ghent University, Technologiepark 125, 9052 Gent, Belgium
| | - Ruben Van de Vijver
- Laboratory for Chemical Technology, Department of Materials, Textiles and Chemical Engineering, Ghent University, Technologiepark 125, 9052 Gent, Belgium
| | - Christian V Stevens
- SynBioC Research Group, Department of Green Chemistry and Technology, Faculty of Bioscience Engineering, Ghent University, Coupure Links 653, 9000 Gent, Belgium
| | - Kevin M Van Geem
- Laboratory for Chemical Technology, Department of Materials, Textiles and Chemical Engineering, Ghent University, Technologiepark 125, 9052 Gent, Belgium
| |
|
37
|
Group Contribution Revisited: The Enthalpy of Formation of Organic Compounds with “Chemical Accuracy”. ChemEngineering 2021. [DOI: 10.3390/chemengineering5020024] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.8] [Indexed: 11/17/2022]
Abstract
Group contribution (GC) methods to predict thermochemical properties are of eminent importance to process design. Compared to previous works, we present an improved group contribution parametrization for the heat of formation of organic molecules exhibiting chemical accuracy, i.e., a maximum 1 kcal/mol (4.2 kJ/mol) difference between the experiment and model, while, at the same time, minimizing the number of parameters. The latter is extremely important as too many parameters lead to overfitting and, therewith, to more or less seriously incorrect predictions for molecules that were not within the data set used for parametrization. Moreover, it was found to be important to explicitly account for common chemical knowledge, e.g., geminal effects or ring strain. The group-related parameters were determined step-wise: first, alkanes only, and then only one additional group in the next class of molecules. This ensures unique and optimal parameter values for each chemical group. All data will be made available, enabling other researchers to extend the set to other classes of molecules.
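The arithmetic of a group contribution estimate, and the 1 kcal/mol chemical accuracy criterion quoted above, can be illustrated with a minimal sketch. It assumes a purely additive model without the geminal or ring-strain terms the abstract mentions. The group labels, the function names, and every numerical group value are hypothetical placeholders, not the parameters fitted in the cited work.

```python
KCAL_TO_KJ = 4.184            # thermochemical calorie, kJ per kcal
CHEMICAL_ACCURACY_KCAL = 1.0  # threshold quoted in the abstract

# Placeholder group values in kcal/mol, for illustration only.
GROUP_VALUES_KCAL = {"CH3": -10.0, "CH2": -5.0, "OH": -40.0}


def predict_enthalpy_of_formation(group_counts, group_values=GROUP_VALUES_KCAL):
    """Additive estimate: sum of (number of groups) x (group value), in kcal/mol."""
    return sum(group_values[group] * n for group, n in group_counts.items())


def within_chemical_accuracy(predicted, reference, tolerance=CHEMICAL_ACCURACY_KCAL):
    """True when |predicted - reference| does not exceed the tolerance (kcal/mol)."""
    return abs(predicted - reference) <= tolerance


# An n-butane-like decomposition: two CH3 and two CH2 groups.
n_butane_groups = {"CH3": 2, "CH2": 2}
estimate_kcal = predict_enthalpy_of_formation(n_butane_groups)
estimate_kj = estimate_kcal * KCAL_TO_KJ
```

Fitting alkanes first and then adding one group per molecule class, as the abstract describes, corresponds to freezing the existing entries of the parameter table before each new entry is fitted.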
|
38
|
Han H, Choi S. Transfer Learning from Simulation to Experimental Data: NMR Chemical Shift Predictions. J Phys Chem Lett 2021; 12:3662-3668. [PMID: 33826849 DOI: 10.1021/acs.jpclett.1c00578] [Citation(s) in RCA: 15] [Impact Index Per Article: 3.8] [Indexed: 06/12/2023]
Abstract
Accurate prediction of chemical shifts (δ) to elucidate molecular structures has been a challenging problem. Recently, novel machine learning architectures have achieved accurate prediction performance, but the difficulty of building a huge chemical database limits the applicability of machine learning approaches. In this work, we demonstrate that the prior knowledge gained from a simulation database is successfully transferred to the problem of predicting experimentally measured δ. Although the simulation and experimental databases differ vastly from a chemical perspective, reliable accuracy for δ is achieved by additional training with a small number of randomly sampled experimental data. Furthermore, the prior knowledge allows us to successfully train the model on the more focused chemical space that the experimental database covers only sparsely. The proposed approach, knowledge transfer from the simulation database, can be used to enhance the usability of a local experimental database.
Affiliation(s)
- Herim Han
- Division of National Supercomputing, Korea Institute of Science and Technology Information, 245 Daehak-Ro, Yuseong-Gu, Daejeon 34141, Republic of Korea
- Department of Polymer Science and Engineering, Dankook University, 152 Jukjeon-Ro, Suji-Gu, Yongin, Gyeonggi 16890, Republic of Korea
- Sunghwan Choi
- Division of National Supercomputing, Korea Institute of Science and Technology Information, 245 Daehak-Ro, Yuseong-Gu, Daejeon 34141, Republic of Korea
|
39
|
Senthil S, Chakraborty S, Ramakrishnan R. Troubleshooting unstable molecules in chemical space. Chem Sci 2021; 12:5566-5573. [PMID: 34163773 PMCID: PMC8179589 DOI: 10.1039/d0sc05591c] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Received: 10/09/2020] [Accepted: 02/27/2021] [Indexed: 01/11/2023] Open
Abstract
A key challenge in automated chemical compound space explorations is ensuring veracity in minimum energy geometries, so as to preserve the intended bonding connectivities. We discuss an iterative high-throughput workflow for connectivity-preserving geometry optimizations that exploits the nearness between quantum mechanical models. The methodology is benchmarked on the QM9 dataset comprising DFT-level properties of 133 885 small molecules, of which 3054 have questionable geometric stability. Of these, we successfully troubleshoot 2988 molecules while maintaining a bijective mapping with the Lewis formulae. Our workflow, based on DFT and post-DFT methods, identifies 66 molecules as unstable; 52 contain -NNO-, and the rest are strained due to pyramidal sp² C. In the curated dataset, we inspect molecules with long C-C bonds and identify ultralong candidates (r > 1.70 Å) supported by topological analysis of electron density. The proposed strategy can aid in minimizing unintended structural rearrangements during quantum chemistry big data generation.
Affiliation(s)
- Salini Senthil
- Tata Institute of Fundamental Research, Centre for Interdisciplinary Sciences, Hyderabad 500107, India, +91 40 2020 3052
- Sabyasachi Chakraborty
- Tata Institute of Fundamental Research, Centre for Interdisciplinary Sciences, Hyderabad 500107, India, +91 40 2020 3052
- Raghunathan Ramakrishnan
- Tata Institute of Fundamental Research, Centre for Interdisciplinary Sciences, Hyderabad 500107, India, +91 40 2020 3052
|
40
|
Using the Gini coefficient to characterize the shape of computational chemistry error distributions. Theor Chem Acc 2021. [DOI: 10.1007/s00214-021-02725-0] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Indexed: 11/27/2022]
|
41
|
Das SK, Chakraborty S, Ramakrishnan R. Critical benchmarking of popular composite thermochemistry models and density functional approximations on a probabilistically pruned benchmark dataset of formation enthalpies. J Chem Phys 2021; 154:044113. [PMID: 33514111 DOI: 10.1063/5.0032713] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.8] [Indexed: 02/06/2023] Open
Abstract
First-principles calculation of the standard formation enthalpy, ΔHf° (298 K), on the large scale required by chemical space explorations is feasible only with density functional approximations (DFAs) and certain composite wave function theories (cWFTs). Unfortunately, the accuracies of popular range-separated hybrid, "rung-4" DFAs, and cWFTs that offer the best accuracy-vs-cost trade-off have until now been established only for datasets predominantly comprising small molecules; their transferability to larger systems remains unclear. In this study, we present an extended benchmark dataset of ΔHf° for structurally and electronically diverse molecules. We apply quartile-ranking based on boundary-corrected kernel density estimation to filter outliers and arrive at probabilistically pruned enthalpies of 1694 compounds (PPE1694). For this dataset, we rank the prediction accuracies of G4, G4(MP2), ccCA, CBS-QB3, and 23 popular DFAs using conventional and probabilistic error metrics. We discuss systematic prediction errors and highlight the role an empirical higher-level correction plays in the G4(MP2) model. Furthermore, we comment on uncertainties associated with the reference empirical data for atoms and the resulting systematic errors, which grow with molecular size. We believe that these findings will aid in identifying meaningful application domains for quantum thermochemical methods.
Affiliation(s)
- Sambit Kumar Das
- Tata Institute of Fundamental Research, Centre for Interdisciplinary Sciences, Hyderabad 500107, India
- Sabyasachi Chakraborty
- Tata Institute of Fundamental Research, Centre for Interdisciplinary Sciences, Hyderabad 500107, India
- Raghunathan Ramakrishnan
- Tata Institute of Fundamental Research, Centre for Interdisciplinary Sciences, Hyderabad 500107, India
|
42
|
Hemati N, Shiri F, Hadidi S, Mohammadi E, Parvizi R, Hosein Farzaei M. A theoretical investigation on decarboxylation mechanism of antibiotic para-aminosalicylic acid to highly toxic form meta-aminophenol. Struct Chem 2020. [DOI: 10.1007/s11224-020-01676-9] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.4] [Indexed: 11/29/2022]
|
43
|
Simmie JM, Somers KP. Snakes on the Rungs of Jacob's Ladder: Anomalous Vibrational Spectra from Double-Hybrid DFT Methods. J Phys Chem A 2020; 124:6899-6902. [PMID: 32787002 DOI: 10.1021/acs.jpca.0c05120] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.6] [Indexed: 11/28/2022]
Abstract
The computation by some double-hybrid density functionals of the vibrational modes of a number of CHNO species, including the radicals of carbonic and carbamic acids and of dimethyl carbonate, gives rise to unphysical and anomalous IR spectra with errors well in excess of 1000 cm⁻¹. The effect is not immediately obvious since calculated entropies are largely unaffected, but by contrast, the zero point energies are significantly increased; this has not previously been documented in the literature.
Affiliation(s)
- J M Simmie
- School of Chemistry, National University of Ireland, Galway H91 TK33, Ireland
- K P Somers
- School of Chemistry, National University of Ireland, Galway H91 TK33, Ireland
|
44
|
Collins EM, Raghavachari K. Effective Molecular Descriptors for Chemical Accuracy at DFT Cost: Fragmentation, Error-Cancellation, and Machine Learning. J Chem Theory Comput 2020; 16:4938-4950. [PMID: 32678593 DOI: 10.1021/acs.jctc.0c00236] [Citation(s) in RCA: 14] [Impact Index Per Article: 2.8] [Indexed: 02/08/2023]
Abstract
Recent advances in theoretical thermochemistry have allowed the study of small organic and bio-organic molecules with high accuracy. However, applications to larger molecules are still impeded by the steep scaling problem of highly accurate quantum mechanical (QM) methods, forcing the use of approximate, more cost-effective methods at a greatly reduced accuracy. One of the most successful strategies to mitigate this error is the use of systematic error-cancellation schemes, in which highly accurate QM calculations can be performed on small portions of the molecule to construct corrections to an approximate method. Herein, we build on ideas from fragmentation and error-cancellation to introduce a new family of molecular descriptors for machine learning modeled after the Connectivity-Based Hierarchy (CBH) of generalized isodesmic reaction schemes. The best performing descriptor ML(CBH-2) is constructed from fragments preserving only the immediate connectivity of all heavy (non-H) atoms of a molecule along with overlapping regions of fragments in accordance with the inclusion-exclusion principle. Our proposed approach offers a simple, chemically intuitive grouping of atoms, tuned with an optimal amount of error-cancellation, and outperforms previous structure-based descriptors using a much smaller input vector length. For a wide variety of density functionals, DFT+ΔML(CBH-2) models, trained on a set of small- to medium-sized organic HCNOSCl-containing molecules, achieved an out-of-sample MAE within 0.5 kcal/mol and 2σ (95%) confidence interval of <1.5 kcal/mol compared to accurate G4 reference values at DFT cost.
Affiliation(s)
- Eric M Collins
- Department of Chemistry, Indiana University, Bloomington, Indiana 47405, United States
- Krishnan Raghavachari
- Department of Chemistry, Indiana University, Bloomington, Indiana 47405, United States
|
45
|
Dandu N, Ward L, Assary RS, Redfern PC, Narayanan B, Foster IT, Curtiss LA. Quantum-Chemically Informed Machine Learning: Prediction of Energies of Organic Molecules with 10 to 14 Non-hydrogen Atoms. J Phys Chem A 2020; 124:5804-5811. [PMID: 32539388 DOI: 10.1021/acs.jpca.0c01777] [Citation(s) in RCA: 17] [Impact Index Per Article: 3.4] [Indexed: 01/02/2023]
Abstract
High-fidelity quantum-chemical calculations can provide accurate predictions of molecular energies, but their high computational costs limit their utility, especially for larger molecules. We have shown in previous work that machine learning models trained on high-level quantum-chemical calculations (G4MP2) for organic molecules with one to nine non-hydrogen atoms can provide accurate predictions for other molecules of comparable size at much lower costs. Here we demonstrate that such models can also be used to effectively predict energies of molecules larger than those in the training set. To implement this strategy, we first established a set of 191 molecules with 10-14 non-hydrogen atoms having reliable experimental enthalpies of formation. We then assessed the accuracy of computed G4MP2 enthalpies of formation for these 191 molecules. The error in the G4MP2 results was somewhat larger than that for smaller molecules, and the reason for this increase is discussed. Two density functional methods, B3LYP and ωB97X-D, were also used on this set of molecules, with ωB97X-D found to perform better than B3LYP at predicting energies. The G4MP2 energies for the 191 molecules were then predicted using these two functionals with two machine learning methods, the FCHL-Δ and SchNet-Δ models, with the learning done on calculated energies of the one to nine non-hydrogen atom molecules. The better-performing model, FCHL-Δ, gave atomization energies of the 191 organic molecules with 10-14 non-hydrogen atoms within 0.4 kcal/mol of their G4MP2 energies. Thus, this work demonstrates that quantum-chemically informed machine learning can be used to successfully predict the energies of large organic molecules whose size is beyond that in the training set.
Affiliation(s)
- Naveen Dandu
- Materials Science Division, Argonne National Laboratory, Lemont, Illinois 60439, United States; Joint Center for Energy Storage Research (JCESR), Argonne National Laboratory, Lemont, Illinois 60439, United States
- Logan Ward
- Joint Center for Energy Storage Research (JCESR), Argonne National Laboratory, Lemont, Illinois 60439, United States; Data Science and Learning Division, Argonne National Laboratory, Lemont, Illinois 60439, United States
- Rajeev S Assary
- Materials Science Division, Argonne National Laboratory, Lemont, Illinois 60439, United States; Joint Center for Energy Storage Research (JCESR), Argonne National Laboratory, Lemont, Illinois 60439, United States
- Paul C Redfern
- Materials Science Division, Argonne National Laboratory, Lemont, Illinois 60439, United States
- Badri Narayanan
- Department of Mechanical Engineering, University of Louisville, Louisville, Kentucky 40292, United States
- Ian T Foster
- Joint Center for Energy Storage Research (JCESR), Argonne National Laboratory, Lemont, Illinois 60439, United States; Data Science and Learning Division, Argonne National Laboratory, Lemont, Illinois 60439, United States; Department of Computer Science, University of Chicago, Chicago, Illinois 60637, United States
- Larry A Curtiss
- Materials Science Division, Argonne National Laboratory, Lemont, Illinois 60439, United States; Joint Center for Energy Storage Research (JCESR), Argonne National Laboratory, Lemont, Illinois 60439, United States
|
46
|
Pernot P, Savin A. Probabilistic performance estimators for computational chemistry methods: Systematic improvement probability and ranking probability matrix. II. Applications. J Chem Phys 2020; 152:164109. [DOI: 10.1063/5.0006204] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.0] [Indexed: 02/02/2023] Open
Affiliation(s)
- Pascal Pernot
- Institut de Chimie Physique, UMR8000, CNRS, Université Paris-Saclay, 91405 Orsay, France
- Andreas Savin
- Laboratoire de Chimie Théorique, CNRS and UPMC Université Paris 06, Sorbonne Universités, 75252 Paris, France
|
47
|
Abstract
As the quantum chemistry (QC) community embraces machine learning (ML), the number of new methods and applications based on the combination of QC and ML is surging. In this Perspective, a view of the current state of affairs in this new and exciting research field is offered, challenges of using machine learning in quantum chemistry applications are described, and potential future developments are outlined. Specifically, examples of how machine learning is used to improve the accuracy and accelerate quantum chemical research are shown. Generalization and classification of existing techniques are provided to ease the navigation in the sea of literature and to guide researchers entering the field. The emphasis of this Perspective is on supervised machine learning.
Affiliation(s)
- Pavlo O Dral
- State Key Laboratory of Physical Chemistry of Solid Surfaces, Fujian Provincial Key Laboratory of Theoretical and Computational Chemistry, Department of Chemistry, and College of Chemistry and Chemical Engineering, Xiamen University, Xiamen 361005, China
|