1
Yang B, Schaefer AJ, Small BL, Leseberg JA, Bischof SM, Webster-Gardiner MS, Ess DH. Experimentally-based Fe-catalyzed ethene oligomerization machine learning model provides highly accurate prediction of propagation/termination selectivity. Chem Sci 2024:d4sc03433c. [PMID: 39449687 PMCID: PMC11495513 DOI: 10.1039/d4sc03433c] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/25/2024] [Accepted: 10/09/2024] [Indexed: 10/26/2024] Open
Abstract
Linear α-olefins (1-alkenes) are critical comonomers for ethene copolymerization. A major impediment in the development of new homogeneous Fe catalysts for ethene oligomerization to produce comonomers and other important commercial products is the prediction of propagation versus termination rates that control the α-olefin distribution (e.g., 1-butene through 1-decene), which is often referred to as a K-value. Because the transition states for propagation versus termination are generally separated by less than 1 kcal mol⁻¹ in energy, this selectivity cannot be accurately predicted by either DFT or wavefunction methods (even DLPNO-CCSD(T)). Therefore, we developed a machine learning model with sub-kcal mol⁻¹ accuracy, based on several hundred experimental selectivity values and straightforward 2D chemical and physical features, that enables the prediction of α-olefin distribution K-values. As part of our model, we developed a new ad hoc feature that boosted the model performance. This machine learning model captures the effects of a broad range of ligand architectures and chemically nonintuitive trends in oligomerization selectivity. Our machine learning model was experimentally validated by prediction of a K-value for a new Fe phosphaneyl-pyridinyl-quinoline catalyst followed by experimental measurement that showed precise agreement. In addition to quantitative predictions, we demonstrate how this machine learning model can provide qualitative catalyst design using proximity of pairs type analysis.
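To make the sensitivity argument in this abstract concrete, the sketch below relates a free-energy gap between the termination and propagation transition states to a Schulz-Flory K-value. It is an illustration only, not the authors' model: it assumes K = k_prop / (k_prop + k_term), a simple Eyring-type rate ratio with any concentration dependence folded into the free-energy difference, and the function names and the 298.15 K default are hypothetical choices.

```python
import math

R_KCAL = 1.987204e-3  # gas constant in kcal mol^-1 K^-1


def k_value(ddg_kcal: float, temp_k: float = 298.15) -> float:
    """Schulz-Flory K = k_prop / (k_prop + k_term).

    ddg_kcal is dG(termination TS) - dG(propagation TS) in kcal/mol;
    positive values favour propagation (assumed Eyring-type ratio).
    """
    ratio = math.exp(ddg_kcal / (R_KCAL * temp_k))  # k_prop / k_term
    return ratio / (1.0 + ratio)


def olefin_mole_fractions(k: float, n_products: int = 5) -> list[float]:
    """Mole fractions of successive oligomers, which fall off geometrically in K."""
    return [(1.0 - k) * k ** i for i in range(n_products)]


if __name__ == "__main__":
    for ddg in (-1.0, 0.0, 1.0):
        print(f"ddG = {ddg:+.1f} kcal/mol -> K = {k_value(ddg):.2f}")
```

Under these assumptions, shifting the gap from 0 to 1 kcal mol⁻¹ moves K from 0.5 to roughly 0.84, which is why an error of that size in a computed barrier difference makes the predicted product distribution unreliable.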
Affiliation(s)
- Bo Yang
  - Department of Chemistry and Biochemistry, Brigham Young University, Provo, Utah 84602, USA
- Anthony J Schaefer
  - Department of Chemistry and Biochemistry, Brigham Young University, Provo, Utah 84602, USA
- Brooke L Small
  - Research & Technology, Chevron Phillips Chemical, 1862 Kingwood Drive, Kingwood, Texas 77339, USA
- Julie A Leseberg
  - Research & Technology, Chevron Phillips Chemical, 1862 Kingwood Drive, Kingwood, Texas 77339, USA
- Steven M Bischof
  - Research & Technology, Chevron Phillips Chemical, 1862 Kingwood Drive, Kingwood, Texas 77339, USA
- Daniel H Ess
  - Department of Chemistry and Biochemistry, Brigham Young University, Provo, Utah 84602, USA
2
Cho Y, Laplaza R, Vela S, Corminboeuf C. Automated prediction of ground state spin for transition metal complexes. Digital Discovery 2024; 3:1638-1647. [PMID: 39118977 PMCID: PMC11305380 DOI: 10.1039/d4dd00093e] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/08/2024] [Accepted: 07/10/2024] [Indexed: 08/10/2024]
Abstract
Exploiting crystallographic data repositories for large-scale quantum chemical computations requires the rapid and accurate extraction of the molecular structure, charge and spin from the crystallographic information file. Here, we develop a general approach to assign the ground state spin of transition metal complexes, in complement to our previous efforts on determining metal oxidation states and bond order within the cell2mol software. Starting from a database of 31k transition metal complexes extracted from the Cambridge Structural Database with cell2mol, we construct the TM-GSspin dataset, which contains 2063 mononuclear first row transition metal complexes and their computed ground state spins. TM-GSspin is highly diverse in terms of metals, metal oxidation states, coordination geometries, and coordination sphere compositions. Based on TM-GSspin, we identify correlations between structural and electronic features of the complexes and their ground state spins to develop a rule-based spin state assignment model. Leveraging this knowledge, we construct interpretable descriptors and build a statistical model achieving 98% cross-validated accuracy in predicting the ground state spin across the board. Our approach provides a practical way to determine the ground state spin of transition metal complexes directly from crystal structures without additional computations, thus enabling the automated use of crystallographic data for large-scale computations involving transition metal complexes.
Affiliation(s)
- Yuri Cho
  - Laboratory for Computational Molecular Design, Institute of Chemical Sciences and Engineering, École Polytechnique Fédérale de Lausanne, Lausanne, Switzerland
  - National Centre for Computational Design and Discovery of Novel Materials (MARVEL), École Polytechnique Fédérale de Lausanne, Lausanne, Switzerland
- Ruben Laplaza
  - Laboratory for Computational Molecular Design, Institute of Chemical Sciences and Engineering, École Polytechnique Fédérale de Lausanne, Lausanne, Switzerland
  - National Centre for Competence in Research-Catalysis (NCCR-Catalysis), École Polytechnique Fédérale de Lausanne, Lausanne, Switzerland
- Sergi Vela
  - Departament de Ciència de Materials i Química Física and IQTCUB, Universitat de Barcelona, Barcelona, Spain
  - Institut de Química Avançada de Catalunya (IQAC-CSIC), Barcelona, Spain
- Clémence Corminboeuf
  - Laboratory for Computational Molecular Design, Institute of Chemical Sciences and Engineering, École Polytechnique Fédérale de Lausanne, Lausanne, Switzerland
  - National Centre for Computational Design and Discovery of Novel Materials (MARVEL), École Polytechnique Fédérale de Lausanne, Lausanne, Switzerland
  - National Centre for Competence in Research-Catalysis (NCCR-Catalysis), École Polytechnique Fédérale de Lausanne, Lausanne, Switzerland
3
Agea MI, Čmelo I, Dehaen W, Chen Y, Kirchmair J, Sedlák D, Bartůněk P, Šícho M, Svozil D. Chemical space exploration with Molpher: Generating and assessing a glucocorticoid receptor ligand library. Mol Inform 2024; 43:e202300316. [PMID: 38979783 DOI: 10.1002/minf.202300316] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/10/2023] [Revised: 04/23/2024] [Accepted: 04/24/2024] [Indexed: 07/10/2024]
Abstract
Computational exploration of chemical space is crucial in modern cheminformatics research for accelerating the discovery of new biologically active compounds. In this study, we present a detailed analysis of the chemical library of potential glucocorticoid receptor (GR) ligands generated by the molecular generator, Molpher. To generate the targeted GR library and construct the classification models, structures from the ChEMBL database as well as from the internal IMG library, which was experimentally screened for biological activity in the primary luciferase reporter cell assay, were utilized. The composition of the targeted GR ligand library was compared with a reference library that randomly samples chemical space. A random forest model was used to determine the biological activity of ligands, incorporating its applicability domain using conformal prediction. It was demonstrated that the GR library is significantly enriched with GR ligands compared to the random library. Furthermore, a prospective analysis demonstrated that Molpher successfully designed compounds, which were subsequently experimentally confirmed to be active on the GR. A collection of 34 potential new GR ligands was also identified. Moreover, an important contribution of this study is the establishment of a comprehensive workflow for evaluating computationally generated ligands, particularly those with potential activity against targets that are challenging to dock.
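The abstract above mentions conformal prediction as the way the random forest's applicability domain is handled. As a rough illustration of the general idea only (the study's actual nonconformity measure, calibration split, and significance level are not stated here, so all names and numbers below are assumed), an inductive conformal classifier can be sketched in Python:

```python
def conformal_p_value(calibration_scores: list[float], test_score: float) -> float:
    """p-value of a candidate label: share of calibration nonconformity
    scores that are at least as large as the test compound's score."""
    n_at_least = sum(1 for s in calibration_scores if s >= test_score)
    return (n_at_least + 1) / (len(calibration_scores) + 1)


def prediction_set(p_values: dict[str, float], significance: float = 0.2) -> set[str]:
    """Keep every label whose p-value exceeds the chosen significance level."""
    return {label for label, p in p_values.items() if p > significance}


if __name__ == "__main__":
    # Hypothetical nonconformity scores, e.g. 1 - (forest probability of the true class).
    calibration = {"active": [0.10, 0.20, 0.30, 0.40], "inactive": [0.05, 0.10, 0.15, 0.20]}
    # Hypothetical scores of one new compound under each candidate label.
    test_scores = {"active": 0.25, "inactive": 0.75}
    p_vals = {lab: conformal_p_value(calibration[lab], s) for lab, s in test_scores.items()}
    print(p_vals, prediction_set(p_vals))
```

A prediction set with exactly one label is a confident call; an empty set or a set containing both labels flags a compound the model cannot classify reliably, which is one common way to read "outside the applicability domain".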
Affiliation(s)
- M Isabel Agea
  - Department of Informatics and Chemistry & CZ-OPENSCREEN: National Infrastructure for Chemical Biology, Faculty of Chemical Technology, University of Chemistry and Technology, Prague, 16628, Czech Republic
- Ivan Čmelo
  - Department of Informatics and Chemistry & CZ-OPENSCREEN: National Infrastructure for Chemical Biology, Faculty of Chemical Technology, University of Chemistry and Technology, Prague, 16628, Czech Republic
- Wim Dehaen
  - Department of Informatics and Chemistry & CZ-OPENSCREEN: National Infrastructure for Chemical Biology, Faculty of Chemical Technology, University of Chemistry and Technology, Prague, 16628, Czech Republic
  - Department of Organic Chemistry, Faculty of Chemical Technology, University of Chemistry and Technology, Prague, 16628, Czech Republic
- Ya Chen
  - Center for Bioinformatics (ZBH), Department of Informatics, Faculty of Mathematics, Informatics and Natural Sciences, Universität Hamburg, 20146, Hamburg, Germany
  - Division of Pharmaceutical Chemistry, Department of Pharmaceutical Sciences, Faculty of Life Sciences, University of Vienna, 1090, Vienna, Austria
- Johannes Kirchmair
  - Center for Bioinformatics (ZBH), Department of Informatics, Faculty of Mathematics, Informatics and Natural Sciences, Universität Hamburg, 20146, Hamburg, Germany
  - Division of Pharmaceutical Chemistry, Department of Pharmaceutical Sciences, Faculty of Life Sciences, University of Vienna, 1090, Vienna, Austria
- David Sedlák
  - CZ-OPENSCREEN: National Infrastructure for Chemical Biology, Institute of Molecular Genetics of the Czech Academy of Sciences, Prague, 14220, Czech Republic
- Petr Bartůněk
  - CZ-OPENSCREEN: National Infrastructure for Chemical Biology, Institute of Molecular Genetics of the Czech Academy of Sciences, Prague, 14220, Czech Republic
- Martin Šícho
  - Department of Informatics and Chemistry & CZ-OPENSCREEN: National Infrastructure for Chemical Biology, Faculty of Chemical Technology, University of Chemistry and Technology, Prague, 16628, Czech Republic
- Daniel Svozil
  - Department of Informatics and Chemistry & CZ-OPENSCREEN: National Infrastructure for Chemical Biology, Faculty of Chemical Technology, University of Chemistry and Technology, Prague, 16628, Czech Republic
  - CZ-OPENSCREEN: National Infrastructure for Chemical Biology, Institute of Molecular Genetics of the Czech Academy of Sciences, Prague, 14220, Czech Republic
4
Orsi M, Shing Loh B, Weng C, Ang WH, Frei A. Using Machine Learning to Predict the Antibacterial Activity of Ruthenium Complexes. Angew Chem Int Ed Engl 2024; 63:e202317901. [PMID: 38088924 DOI: 10.1002/anie.202317901] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/23/2023] [Indexed: 01/26/2024]
Abstract
Rising antimicrobial resistance (AMR) and lack of innovation in the antibiotic pipeline necessitate novel approaches to discovering new drugs. Metal complexes have proven to be promising antimicrobial compounds, but the number of studied compounds is still low compared to the millions of organic molecules investigated so far. Lately, machine learning (ML) has emerged as a valuable tool for guiding the design of small organic molecules, potentially even in low-data scenarios. For the first time, we extend the application of ML to the discovery of metal-based medicines. Utilising 288 modularly synthesized ruthenium arene Schiff-base complexes and their antibacterial properties, a series of ML models were trained. The models perform well and are used to predict the activity of 54 new compounds. These displayed a 5.7x higher hit-rate (53.7 %) against methicillin-resistant Staphylococcus aureus (MRSA) compared to the original library (9.4 %), demonstrating that ML can be applied to improve the success-rates in the search of new metalloantibiotics. This work paves the way for more ambitious applications of ML in the field of metal-based drug discovery.
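The quoted 5.7-fold improvement follows directly from the two hit rates given in the abstract. A minimal check in Python (the function name is a hypothetical helper; only the two percentages come from the text):

```python
def fold_enrichment(hit_rate_new: float, hit_rate_reference: float) -> float:
    """Ratio of two hit rates expressed in the same units (here: percent)."""
    return hit_rate_new / hit_rate_reference


if __name__ == "__main__":
    # 53.7 % for the ML-selected compounds vs 9.4 % for the original library.
    print(f"{fold_enrichment(53.7, 9.4):.1f}x")
```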
Affiliation(s)
- Markus Orsi
  - Department of Chemistry, Biochemistry & Pharmaceutical Sciences, University of Bern, Freiestrasse 3, 3012, Bern, Switzerland
- Boon Shing Loh
  - Department of Chemistry, National University of Singapore, 3 Science Drive 3, Singapore, 117543, Singapore
- Cheng Weng
  - Department of Chemistry, National University of Singapore, 3 Science Drive 3, Singapore, 117543, Singapore
- Wee Han Ang
  - Department of Chemistry, National University of Singapore, 3 Science Drive 3, Singapore, 117543, Singapore
  - NUS Graduate School - Integrated Science and Engineering Programme (ISEP), National University of Singapore, 21 Lower Kent Ridge Rd, Singapore, 119077, Singapore
- Angelo Frei
  - Department of Chemistry, Biochemistry & Pharmaceutical Sciences, University of Bern, Freiestrasse 3, 3012, Bern, Switzerland
5
Ju CW, Shen Y, French EJ, Yi J, Bi H, Tian A, Lin Z. Accurate Electronic and Optical Properties of Organic Doublet Radicals Using Machine Learned Range-Separated Functionals. J Phys Chem A 2024. [PMID: 38382058 DOI: 10.1021/acs.jpca.3c07437] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 02/23/2024]
Abstract
Luminescent organic semiconducting doublet-spin radicals are unique and emergent optical materials because their fluorescent quantum yields (Φfl) are not compromised by the spin-flipping intersystem crossing (ISC) into a dark high-spin state. The multiconfigurational nature of these radicals challenges their electronic structure calculations in the framework of single-reference density functional theory (DFT) and introduces room for method improvement. In the present study, we extended our earlier development of ML-ωPBE [J. Phys. Chem. Lett., 2021, 12, 9516-9524], a range-separated hybrid (RSH) exchange-correlation (XC) functional constructed using the stacked ensemble machine learning (SEML) algorithm, from closed-shell organic semiconducting molecules to doublet-spin organic semiconducting radicals. We assessed its performance for a new test set of 64 doublet-spin radicals from five categories while placing all previously compiled 3926 closed-shell molecules in the new training set. Interestingly, ML-ωPBE agrees with the nonempirical OT-ωPBE functional regarding the prediction of the molecule-dependent range-separation parameter (ω), with a small mean absolute error (MAE) of 0.0197 a₀⁻¹, but reduces the computational cost by 2.46 orders of magnitude. This result demonstrates an outstanding domain adaptation capacity of ML-ωPBE for diverse organic semiconducting species. To further assess the predictive power of ML-ωPBE in experimental observables, we also applied it to evaluate absorption and fluorescence energies (Eabs and Efl) using linear-response time-dependent DFT (TDDFT), and we compared its behavior with nine popular XC functionals. For most radicals, ML-ωPBE reproduces experimental measurements of Eabs and Efl with small MAEs of 0.299 and 0.254 eV, only marginally different from those of OT-ωPBE. Our work illustrates a successful extension of the SEML framework from closed-shell molecules to doublet-spin radicals and will open an avenue for calculating optical properties for organic semiconductors using single-reference TDDFT.
Affiliation(s)
- Cheng-Wei Ju
  - Department of Chemistry, University of Massachusetts, Amherst, Massachusetts 01003, United States
  - Pritzker School of Molecular Engineering, The University of Chicago, Chicago, Illinois 60637, United States
- Yili Shen
  - Manning College of Information and Computer Sciences, University of Massachusetts, Amherst, Massachusetts 01003, United States
  - Department of Computer Science and Engineering, University of Notre Dame, Notre Dame, Indiana 46556, United States
- Ethan J French
  - Department of Chemistry, University of Massachusetts, Amherst, Massachusetts 01003, United States
  - Department of Mathematics and Statistics, University of Massachusetts, Amherst, Massachusetts 01003, United States
  - Athinoula A. Martinos Center for Biomedical Imaging, Department of Radiology, Massachusetts General Hospital and Harvard Medical School, Charlestown, Massachusetts 02129, United States
- Jun Yi
  - Department of Chemistry, University of Massachusetts, Amherst, Massachusetts 01003, United States
  - Department of Chemistry, Wake Forest University, Winston-Salem, North Carolina 27109, United States
- Hongshan Bi
  - Department of Chemistry, University of Massachusetts, Amherst, Massachusetts 01003, United States
- Aaron Tian
  - Manning College of Information and Computer Sciences, University of Massachusetts, Amherst, Massachusetts 01003, United States
  - Department of Mathematics and Statistics, University of Massachusetts, Amherst, Massachusetts 01003, United States
- Zhou Lin
  - Department of Chemistry, University of Massachusetts, Amherst, Massachusetts 01003, United States
6
Chen SS, Meyer Z, Jensen B, Kraus A, Lambert A, Ess DH. ReaLigands: A Ligand Library Cultivated from Experiment and Intended for Molecular Computational Catalyst Design. J Chem Inf Model 2023; 63:7412-7422. [PMID: 37987743 DOI: 10.1021/acs.jcim.3c01310] [Citation(s) in RCA: 5] [Impact Index Per Article: 2.5] [Indexed: 11/22/2023]
Abstract
Computational catalyst design requires identification of a metal and ligand that together result in the desired reaction reactivity and/or selectivity. A major impediment to translating computational designs to experiments is evaluating ligands that are likely to be synthesized. Here, we provide a solution to this impediment with our ReaLigands library that contains >30,000 monodentate, bidentate (didentate), tridentate, and larger ligands cultivated by dismantling experimentally reported crystal structures. Individual ligands from mononuclear crystal structures were identified using a modified depth-first search algorithm and charge was assigned using a machine learning model based on quantum-chemical calculated features. In the library, ligands are sorted based on direct ligand-to-metal atomic connections and on denticity. Representative principal component analysis (PCA) and uniform manifold approximation and projection (UMAP) analyses were used to analyze several tridentate ligand categories, which revealed both the diversity of ligands and connections between ligand categories. We also demonstrated the utility of this library by implementing it with our building and optimization tools, which resulted in the very rapid generation of barriers for 750 bidentate ligands for Rh-hydride ethylene migratory insertion.
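The ligand-extraction step described above can be pictured with a plain depth-first search over the bond graph of a mononuclear complex: delete the metal, then collect the connected fragments that remain. The sketch below is a generic illustration under that assumption, not the authors' modified algorithm; the atom labels, the `split_ligands` name, and the toy complex are all hypothetical.

```python
from collections import defaultdict


def split_ligands(bonds, metal):
    """Return a list of (ligand_atoms, donor_atoms) found after deleting the metal.

    bonds is an iterable of (atom_a, atom_b) label pairs; metal is the label
    of the single metal centre. Denticity is the number of donor atoms.
    """
    graph = defaultdict(set)
    for a, b in bonds:
        graph[a].add(b)
        graph[b].add(a)

    donors = graph.pop(metal, set())      # atoms directly bonded to the metal
    for neighbours in graph.values():
        neighbours.discard(metal)         # cut every metal-ligand bond

    seen, ligands = set(), []
    for start in sorted(graph):
        if start in seen:
            continue
        stack, fragment = [start], set()  # iterative depth-first search
        while stack:
            atom = stack.pop()
            if atom in fragment:
                continue
            fragment.add(atom)
            stack.extend(graph[atom] - fragment)
        seen |= fragment
        ligands.append((fragment, fragment & donors))
    return ligands


if __name__ == "__main__":
    # Toy complex: one chelating N-C-C-N ligand and one chloride on Fe.
    toy_bonds = [("Fe", "N1"), ("Fe", "N2"), ("N1", "C1"),
                 ("C1", "C2"), ("C2", "N2"), ("Fe", "Cl1")]
    for atoms, donor_atoms in split_ligands(toy_bonds, "Fe"):
        print(sorted(atoms), "denticity:", len(donor_atoms))
```

Sorting the resulting fragments by donor-atom identity and by the number of donor atoms mirrors the way the abstract says the library is organized (by ligand-to-metal connections and by denticity).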
Affiliation(s)
- Shu-Sen Chen
  - Department of Chemistry and Biochemistry, Brigham Young University, Provo, Utah 84604, United States
- Zack Meyer
  - Department of Chemistry and Biochemistry, Brigham Young University, Provo, Utah 84604, United States
- Brendan Jensen
  - Department of Chemistry and Biochemistry, Brigham Young University, Provo, Utah 84604, United States
- Alex Kraus
  - Department of Chemistry and Biochemistry, Brigham Young University, Provo, Utah 84604, United States
- Allison Lambert
  - Department of Chemistry and Biochemistry, Brigham Young University, Provo, Utah 84604, United States
- Daniel H Ess
  - Department of Chemistry and Biochemistry, Brigham Young University, Provo, Utah 84604, United States
7
Taylor MG, Burrill DJ, Janssen J, Batista ER, Perez D, Yang P. Architector for high-throughput cross-periodic table 3D complex building. Nat Commun 2023; 14:2786. [PMID: 37188661 PMCID: PMC10185541 DOI: 10.1038/s41467-023-38169-2] [Citation(s) in RCA: 10] [Impact Index Per Article: 5.0] [Received: 07/07/2022] [Accepted: 04/18/2023] [Indexed: 05/17/2023] Open
Abstract
Rare-earth and actinide complexes are critical for a wealth of clean-energy applications. Three-dimensional (3D) structural generation and prediction for these organometallic systems remains a challenge, limiting opportunities for computational chemical discovery. Here, we introduce Architector, a high-throughput in-silico synthesis code for s-, p-, d-, and f-block mononuclear organometallic complexes capable of capturing nearly the full diversity of the known experimental chemical space. Beyond known chemical space, Architector performs in-silico design of new complexes including any chemically accessible metal-ligand combinations. Architector leverages metal-center symmetry, interatomic force fields, and tight binding methods to build many possible 3D conformers from minimal 2D inputs including metal oxidation and spin state. Over a set of more than 6,000 x-ray diffraction (XRD)-determined complexes spanning the periodic table, we demonstrate quantitative agreement between Architector-predicted and experimentally observed structures. Further, we demonstrate out-of-the box conformer generation and energetic rankings of non-minimum energy conformers produced from Architector, which are critical for exploring potential energy surfaces and training force fields. Overall, Architector represents a transformative step towards cross-periodic table computational design of metal complex chemistry.
Affiliation(s)
- Michael G Taylor
  - Theoretical Division, Los Alamos National Laboratory, Los Alamos, NM, 87545, USA
- Daniel J Burrill
  - Theoretical Division, Los Alamos National Laboratory, Los Alamos, NM, 87545, USA
- Jan Janssen
  - Theoretical Division, Los Alamos National Laboratory, Los Alamos, NM, 87545, USA
- Enrique R Batista
  - Theoretical Division, Los Alamos National Laboratory, Los Alamos, NM, 87545, USA
- Danny Perez
  - Theoretical Division, Los Alamos National Laboratory, Los Alamos, NM, 87545, USA
- Ping Yang
  - Theoretical Division, Los Alamos National Laboratory, Los Alamos, NM, 87545, USA
8
Cytter Y, Nandy A, Duan C, Kulik HJ. Insights into the deviation from piecewise linearity in transition metal complexes from supervised machine learning models. Phys Chem Chem Phys 2023; 25:8103-8116. [PMID: 36876903 DOI: 10.1039/d3cp00258f] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Indexed: 02/25/2023]
Abstract
Virtual high-throughput screening (VHTS) and machine learning (ML) with density functional theory (DFT) suffer from inaccuracies from the underlying density functional approximation (DFA). Many of these inaccuracies can be traced to the lack of derivative discontinuity that leads to a curvature in the energy with electron addition or removal. Over a dataset of nearly one thousand transition metal complexes typical of VHTS applications, we computed and analyzed the average curvature (i.e., deviation from piecewise linearity) for 23 density functional approximations spanning multiple rungs of "Jacob's ladder". While we observe the expected dependence of the curvatures on Hartree-Fock exchange, we note limited correlation of curvature values between different rungs of "Jacob's ladder". We train ML models (i.e., artificial neural networks or ANNs) to predict the curvature and the associated frontier orbital energies for each of these 23 functionals and then interpret differences in curvature among the different DFAs through analysis of the ML models. Notably, we observe spin to play a much more important role in determining the curvature of range-separated and double hybrids in comparison to semi-local functionals, explaining why curvature values are weakly correlated between these and other families of functionals. Over a space of 187.2k hypothetical compounds, we use our ANNs to pinpoint DFAs for which representative transition metal complexes have near-zero curvature with low uncertainty, demonstrating an approach to accelerate screening of complexes with targeted optical gaps.
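For readers unfamiliar with the term, the "average curvature" can be written out as follows. This is a standard textbook-style derivation based on Janak's theorem, offered as a sketch; the symbols q (fractional electron number added to the N-electron system) and ε (frontier orbital energies) are notation chosen here and do not appear in the abstract.

```latex
\begin{align}
E_{\mathrm{PL}}(N+q) &= (1-q)\,E(N) + q\,E(N+1), \qquad 0 \le q \le 1, \\
\left\langle \frac{\partial^{2} E}{\partial q^{2}} \right\rangle
  &= \int_{0}^{1} \frac{\partial^{2} E}{\partial q^{2}}\,\mathrm{d}q
   = \left.\frac{\partial E}{\partial q}\right|_{q \to 1}
   - \left.\frac{\partial E}{\partial q}\right|_{q \to 0}
   = \varepsilon_{\mathrm{HOMO}}(N+1) - \varepsilon_{\mathrm{LUMO}}(N).
\end{align}
```

The first line is the exact, piecewise-linear behaviour; the second shows that the curvature averaged over one added electron reduces to a difference of frontier orbital energies, which is zero when the energy is piecewise linear. This is consistent with the abstract's pairing of curvature with "the associated frontier orbital energies" as prediction targets.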
Affiliation(s)
- Yael Cytter
  - Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, MA 02139, USA
- Aditya Nandy
  - Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, MA 02139, USA
  - Department of Chemistry, Massachusetts Institute of Technology, Cambridge, MA 02139, USA
- Chenru Duan
  - Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, MA 02139, USA
  - Department of Chemistry, Massachusetts Institute of Technology, Cambridge, MA 02139, USA
- Heather J Kulik
  - Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, MA 02139, USA
  - Department of Chemistry, Massachusetts Institute of Technology, Cambridge, MA 02139, USA
9
Duan C, Nandy A, Terrones GG, Kastner DW, Kulik HJ. Active Learning Exploration of Transition-Metal Complexes to Discover Method-Insensitive and Synthetically Accessible Chromophores. JACS Au 2023; 3:391-401. [PMID: 36873700 PMCID: PMC9976347 DOI: 10.1021/jacsau.2c00547] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/03/2022] [Revised: 11/15/2022] [Accepted: 11/16/2022] [Indexed: 06/18/2023]
Abstract
Transition-metal chromophores with earth-abundant transition metals are an important design target for their applications in lighting and nontoxic bioimaging, but their design is challenged by the scarcity of complexes that simultaneously have well-defined ground states and optimal target absorption energies in the visible region. Machine learning (ML) accelerated discovery could overcome such challenges by enabling the screening of a larger space but is limited by the fidelity of the data used in ML model training, which is typically from a single approximate density functional. To address this limitation, we search for consensus in predictions among 23 density functional approximations across multiple rungs of "Jacob's ladder". To accelerate the discovery of complexes with absorption energies in the visible region while minimizing the effect of low-lying excited states, we use two-dimensional (2D) efficient global optimization to sample candidate low-spin chromophores from multimillion complex spaces. Despite the scarcity (i.e., ∼0.01%) of potential chromophores in this large chemical space, we identify candidates with high likelihood (i.e., >10%) of computational validation as the ML models improve during active learning, representing a 1000-fold acceleration in discovery. Absorption spectra of promising chromophores from time-dependent density functional theory verify that 2/3 of candidates have the desired excited-state properties. The observation that constituent ligands from our leads have demonstrated interesting optical properties in the literature exemplifies the effectiveness of our construction of a realistic design space and active learning approach.
Affiliation(s)
- Chenru Duan
  - Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, United States
  - Department of Chemistry, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, United States
- Aditya Nandy
  - Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, United States
  - Department of Chemistry, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, United States
- Gianmarco G. Terrones
  - Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, United States
- David W. Kastner
  - Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, United States
  - Department of Biological Engineering, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, United States
- Heather J. Kulik
  - Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, United States
  - Department of Chemistry, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, United States
10
Zhang S, Xu L, Li S, Oliveira JCA, Li X, Ackermann L, Hong X. Bridging Chemical Knowledge and Machine Learning for Performance Prediction of Organic Synthesis. Chemistry 2023; 29:e202202834. [PMID: 36206170 PMCID: PMC10099903 DOI: 10.1002/chem.202202834] [Citation(s) in RCA: 7] [Impact Index Per Article: 3.5] [Received: 09/12/2022] [Indexed: 11/29/2022]
Abstract
Recent years have witnessed a boom of machine learning (ML) applications in chemistry, which reveals the potential of data-driven prediction of synthesis performance. Digitalization and ML modelling are the key strategies to fully exploit the unique potential within the synergistic interplay between experimental data and the robust prediction of performance and selectivity. A series of exciting studies have demonstrated the importance of chemical knowledge implementation in ML, which improves the model's capability for making predictions that are challenging and often go beyond the abilities of human beings. This Minireview summarizes the cutting-edge embedding techniques and model designs in synthetic performance prediction, elaborating how chemical knowledge can be incorporated into machine learning until June 2022. By merging organic synthesis tactics and chemical informatics, we hope this Review can provide a guide map and intrigue chemists to revisit the digitalization and computerization of organic chemistry principles.
Affiliation(s)
- Shuo-Qing Zhang
  - Center of Chemistry for Frontier Technologies, Department of Chemistry, State Key Laboratory of Clean Energy Utilization, Zhejiang University, 38 Zheda Road, Hangzhou 310027, P. R. China
- Li-Cheng Xu
  - Center of Chemistry for Frontier Technologies, Department of Chemistry, State Key Laboratory of Clean Energy Utilization, Zhejiang University, 38 Zheda Road, Hangzhou 310027, P. R. China
- Shu-Wen Li
  - Center of Chemistry for Frontier Technologies, Department of Chemistry, State Key Laboratory of Clean Energy Utilization, Zhejiang University, 38 Zheda Road, Hangzhou 310027, P. R. China
- João C. A. Oliveira
  - Institut für Organische und Biomolekulare Chemie, Wöhler Research Institute for Sustainable Chemistry (WISCh), Georg-August-Universität, Tammannstraße 2, 37077 Göttingen, Germany
- Xin Li
  - Center of Chemistry for Frontier Technologies, Department of Chemistry, State Key Laboratory of Clean Energy Utilization, Zhejiang University, 38 Zheda Road, Hangzhou 310027, P. R. China
- Lutz Ackermann
  - Institut für Organische und Biomolekulare Chemie, Wöhler Research Institute for Sustainable Chemistry (WISCh), Georg-August-Universität, Tammannstraße 2, 37077 Göttingen, Germany
- Xin Hong
  - Center of Chemistry for Frontier Technologies, Department of Chemistry, State Key Laboratory of Clean Energy Utilization, Zhejiang University, 38 Zheda Road, Hangzhou 310027, P. R. China
  - Beijing National Laboratory for Molecular Sciences, Zhongguancun North First Street No. 2, Beijing 100190, P. R. China
  - Key Laboratory of Precise Synthesis of Functional Molecules of Zhejiang Province, School of Science, Westlake University, 18 Shilongshan Road, Hangzhou 310024, Zhejiang Province, P. R. China
| |
Collapse
|
11
|
Gallarati S, van Gerwen P, Laplaza R, Vela S, Fabrizio A, Corminboeuf C. OSCAR: an extensive repository of chemically and functionally diverse organocatalysts. Chem Sci 2022; 13:13782-13794. [PMID: 36544722 PMCID: PMC9710326 DOI: 10.1039/d2sc04251g] [Citation(s) in RCA: 10] [Impact Index Per Article: 3.3] [Received: 07/29/2022] [Accepted: 10/24/2022] [Indexed: 12/24/2022] Open
Abstract
The automated construction of datasets has become increasingly relevant in computational chemistry. While transition-metal catalysis has greatly benefitted from bottom-up or top-down strategies for the curation of organometallic complexes libraries, the field of organocatalysis is mostly dominated by case-by-case studies, with a lack of transferable data-driven tools that facilitate both the exploration of a wider range of catalyst space and the optimization of reaction properties. For these reasons, we introduce OSCAR, a repository of 4000 experimentally derived organocatalysts along with their corresponding building blocks and combinatorially enriched structures. We outline the fragment-based approach used for database generation and showcase the chemical diversity, in terms of functions and molecular properties, covered in OSCAR. The structures and corresponding stereoelectronic properties are publicly available (https://archive.materialscloud.org/record/2022.106) and constitute the starting point to build generative and predictive models for organocatalyst performance.
Affiliation(s)
- Simone Gallarati
- Laboratory for Computational Molecular Design, Institute of Chemical Sciences and Engineering, Ecole Polytechnique Fédérale de Lausanne (EPFL) 1015 Lausanne Switzerland
- Puck van Gerwen
- Laboratory for Computational Molecular Design, Institute of Chemical Sciences and Engineering, Ecole Polytechnique Fédérale de Lausanne (EPFL) 1015 Lausanne Switzerland
- National Center for Competence in Research - Catalysis (NCCR-Catalysis), Ecole Polytechnique Fédérale de Lausanne (EPFL) 1015 Lausanne Switzerland
- Ruben Laplaza
- Laboratory for Computational Molecular Design, Institute of Chemical Sciences and Engineering, Ecole Polytechnique Fédérale de Lausanne (EPFL) 1015 Lausanne Switzerland
- National Center for Competence in Research - Catalysis (NCCR-Catalysis), Ecole Polytechnique Fédérale de Lausanne (EPFL) 1015 Lausanne Switzerland
- Sergi Vela
- Laboratory for Computational Molecular Design, Institute of Chemical Sciences and Engineering, Ecole Polytechnique Fédérale de Lausanne (EPFL) 1015 Lausanne Switzerland
- Alberto Fabrizio
- Laboratory for Computational Molecular Design, Institute of Chemical Sciences and Engineering, Ecole Polytechnique Fédérale de Lausanne (EPFL) 1015 Lausanne Switzerland
- National Center for Computational Design and Discovery of Novel Materials (MARVEL), Ecole Polytechnique Fédérale de Lausanne (EPFL) 1015 Lausanne Switzerland
- Clemence Corminboeuf
- Laboratory for Computational Molecular Design, Institute of Chemical Sciences and Engineering, Ecole Polytechnique Fédérale de Lausanne (EPFL) 1015 Lausanne Switzerland
- National Center for Competence in Research - Catalysis (NCCR-Catalysis), Ecole Polytechnique Fédérale de Lausanne (EPFL) 1015 Lausanne Switzerland
- National Center for Computational Design and Discovery of Novel Materials (MARVEL), Ecole Polytechnique Fédérale de Lausanne (EPFL) 1015 Lausanne Switzerland
12
Krenn M, Ai Q, Barthel S, Carson N, Frei A, Frey NC, Friederich P, Gaudin T, Gayle AA, Jablonka KM, Lameiro RF, Lemm D, Lo A, Moosavi SM, Nápoles-Duarte JM, Nigam A, Pollice R, Rajan K, Schatzschneider U, Schwaller P, Skreta M, Smit B, Strieth-Kalthoff F, Sun C, Tom G, Falk von Rudorff G, Wang A, White AD, Young A, Yu R, Aspuru-Guzik A. SELFIES and the future of molecular string representations. Patterns (New York, N.Y.) 2022; 3:100588. [PMID: 36277819 PMCID: PMC9583042 DOI: 10.1016/j.patter.2022.100588] [Citation(s) in RCA: 59] [Impact Index Per Article: 19.7] [Indexed: 11/06/2022]
Abstract
Artificial intelligence (AI) and machine learning (ML) are expanding in popularity for broad applications to challenging tasks in chemistry and materials science. Examples include the prediction of properties, the discovery of new reaction pathways, or the design of new molecules. The machine needs to read and write fluently in a chemical language for each of these tasks. Strings are a common tool to represent molecular graphs, and the most popular molecular string representation, SMILES, has powered cheminformatics since the late 1980s. However, in the context of AI and ML in chemistry, SMILES has several shortcomings; most pertinently, most combinations of symbols lead to invalid results with no valid chemical interpretation. To overcome this issue, a new language for molecules was introduced in 2020 that guarantees 100% robustness: SELF-referencing embedded string (SELFIES). SELFIES has since simplified and enabled numerous new applications in chemistry. In this perspective, we look to the future and discuss molecular string representations, along with their respective opportunities and challenges. We propose 16 concrete future projects for robust molecular representations. These involve the extension toward new chemical domains, exciting questions at the interface of AI and robust languages, and interpretability for both humans and machines. We hope that these proposals will inspire several follow-up works exploiting the full potential of molecular string representations for the future of AI in chemistry and materials science.
Affiliation(s)
- Mario Krenn
- Max Planck Institute for the Science of Light (MPL), Erlangen, Germany
- Qianxiang Ai
- Department of Chemistry, Fordham University, The Bronx, NY, USA
- Senja Barthel
- Department of Mathematics, Vrije Universiteit Amsterdam, Amsterdam, the Netherlands
- Nessa Carson
- Syngenta Jealott’s Hill International Research Centre, Bracknell, Berkshire, UK
- Angelo Frei
- Department of Chemistry, Imperial College London, Molecular Sciences Research Hub, White City Campus, Wood Lane, London, UK
- Nathan C. Frey
- Massachusetts Institute of Technology, Cambridge, MA, USA
- Pascal Friederich
- Institute of Theoretical Informatics, Karlsruhe Institute of Technology, Karlsruhe, Germany
- Institute of Nanotechnology, Karlsruhe Institute of Technology, Eggenstein-Leopoldshafen, Germany
- Théophile Gaudin
- Department of Computer Science, University of Toronto, Toronto, ON, Canada
- IBM Research Europe, Zürich, Switzerland
- Kevin Maik Jablonka
- Laboratory of Molecular Simulation (LSMO), Institut des Sciences et Ingénierie Chimiques, Ecole Polytechnique Fédérale de Lausanne (EPFL), Sion, Valais, Switzerland
- Rafael F. Lameiro
- Medicinal and Biological Chemistry Group, São Carlos Institute of Chemistry, University of São Paulo, São Paulo, Brazil
- Dominik Lemm
- Faculty of Physics, University of Vienna, Vienna, Austria
- Alston Lo
- Department of Computer Science, University of Toronto, Toronto, ON, Canada
- Seyed Mohamad Moosavi
- Department of Mathematics and Computer Science, Freie Universität Berlin, Berlin, Germany
- AkshatKumar Nigam
- Department of Computer Science, Stanford University, Stanford, CA, USA
- Robert Pollice
- Department of Computer Science, University of Toronto, Toronto, ON, Canada
- Chemical Physics Theory Group, Department of Chemistry, University of Toronto, Toronto, ON, Canada
- Kohulan Rajan
- Institute for Inorganic and Analytical Chemistry, Friedrich-Schiller Universität Jena, Jena, Germany
- Ulrich Schatzschneider
- Institut für Anorganische Chemie, Julius-Maximilians-Universität Würzburg, Würzburg, Germany
- Philippe Schwaller
- IBM Research Europe, Zürich, Switzerland
- Laboratory of Artificial Chemical Intelligence (LIAC), Institut des Sciences et Ingénierie Chimiques, Ecole Polytechnique Fédérale de Lausanne (EPFL), Lausanne, Switzerland
- National Centre of Competence in Research (NCCR) Catalysis, Ecole Polytechnique Fédérale de Lausanne (EPFL), Lausanne, Switzerland
- Marta Skreta
- Department of Computer Science, University of Toronto, Toronto, ON, Canada
- Vector Institute for Artificial Intelligence, Toronto, ON, Canada
- Berend Smit
- Laboratory of Molecular Simulation (LSMO), Institut des Sciences et Ingénierie Chimiques, Ecole Polytechnique Fédérale de Lausanne (EPFL), Sion, Valais, Switzerland
- Felix Strieth-Kalthoff
- Chemical Physics Theory Group, Department of Chemistry, University of Toronto, Toronto, ON, Canada
- Chong Sun
- Department of Computer Science, University of Toronto, Toronto, ON, Canada
- Gary Tom
- Department of Computer Science, University of Toronto, Toronto, ON, Canada
- Chemical Physics Theory Group, Department of Chemistry, University of Toronto, Toronto, ON, Canada
- Andrew Wang
- Chemical Physics Theory Group, Department of Chemistry, University of Toronto, Toronto, ON, Canada
- Solar Fuels Group, Department of Chemistry, University of Toronto, Toronto, ON, Canada
- Andrew D. White
- Department of Chemical Engineering, University of Rochester, Rochester, NY, USA
- Adamo Young
- Department of Computer Science, University of Toronto, Toronto, ON, Canada
- Vector Institute for Artificial Intelligence, Toronto, ON, Canada
- Rose Yu
- Department of Computer Science and Engineering, University of California, San Diego, La Jolla, CA, USA
- Alán Aspuru-Guzik
- Department of Computer Science, University of Toronto, Toronto, ON, Canada
- Chemical Physics Theory Group, Department of Chemistry, University of Toronto, Toronto, ON, Canada
- Vector Institute for Artificial Intelligence, Toronto, ON, Canada
- Department of Chemical Engineering and Applied Chemistry, University of Toronto, Toronto, ON, Canada
- Department of Materials Science, University of Toronto, Toronto, ON, Canada
- Canadian Institute for Advanced Research (CIFAR) Lebovic Fellow, Toronto, ON, Canada
13
Miller E, Mai BK, Read JA, Bell WC, Derrick JS, Liu P, Toste FD. A Combined DFT, Energy Decomposition, and Data Analysis Approach to Investigate the Relationship Between Noncovalent Interactions and Selectivity in a Flexible DABCOnium/Chiral Anion Catalyst System. ACS Catal 2022; 12:12369-12385. [PMID: 37215160 PMCID: PMC10195112 DOI: 10.1021/acscatal.2c03077] [Citation(s) in RCA: 9] [Impact Index Per Article: 3.0] [Indexed: 11/30/2022]
Abstract
Developing strategies to study reactivity and selectivity in flexible catalyst systems has become an important topic of research. Herein, we report a combined experimental and computational study aimed at understanding the mechanistic role of an achiral DABCOnium cofactor in a regio- and enantiodivergent bromocyclization reaction. It was found that electron-deficient aryl substituents enable rigidified transition states via an anion-π interaction with the catalyst, which drives the selectivity of the reaction. In contrast, electron-rich aryl groups on the DABCOnium result in significantly more flexible transition states, where interactions between the catalyst and substrate are more important. An analysis of not only the lowest-energy transition state structures but also an ensemble of low-energy transition state conformers via energy decomposition analysis and machine learning was crucial to revealing the dominant noncovalent interactions responsible for observed changes in selectivity in this flexible system.
Affiliation(s)
- Edward Miller
- Department of Chemistry, University of California, Berkeley, California 94720, United States
- Binh Khanh Mai
- Department of Chemistry, University of Pittsburgh, Pittsburgh, Pennsylvania 15260, United States
- Jacquelyne A Read
- Department of Chemistry, University of Utah, Salt Lake City, Utah 84112, United States
- William C Bell
- Department of Chemistry, University of Pittsburgh, Pittsburgh, Pennsylvania 15260, United States
- Jeffrey S Derrick
- Department of Chemistry, University of California, Berkeley, California 94720, United States
- Peng Liu
- Department of Chemistry, University of Pittsburgh, Pittsburgh, Pennsylvania 15260, United States
- F Dean Toste
- Department of Chemistry, University of California, Berkeley, California 94720, United States
14
Duan C, Ladera AJ, Liu JCL, Taylor MG, Ariyarathna IR, Kulik HJ. Exploiting Ligand Additivity for Transferable Machine Learning of Multireference Character across Known Transition Metal Complex Ligands. J Chem Theory Comput 2022; 18:4836-4845. [PMID: 35834742 DOI: 10.1021/acs.jctc.2c00468] [Citation(s) in RCA: 6] [Impact Index Per Article: 2.0] [Indexed: 11/28/2022]
Abstract
Accurate virtual high-throughput screening (VHTS) of transition metal complexes (TMCs) remains challenging due to the possibility of high multireference (MR) character that complicates property evaluation. We compute MR diagnostics for over 5,000 ligands present in previously synthesized octahedral mononuclear transition metal complexes in the Cambridge Structural Database (CSD). To accomplish this task, we introduce an iterative approach for consistent ligand charge assignment for ligands in the CSD. Across this set, we observe that the MR character correlates linearly with the inverse value of the averaged bond order over all bonds in the molecule. We then demonstrate that ligand additivity of the MR character holds in TMCs, which suggests that the TMC MR character can be inferred from the sum of the MR character of the ligands. Encouraged by this observation, we leverage ligand additivity and develop a ligand-derived machine learning representation to train neural networks to predict the MR character of TMCs from properties of the constituent ligands. This approach yields models with excellent performance and superior transferability to unseen ligand chemistry and compositions.
Affiliation(s)
- Chenru Duan
- Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, United States
- Department of Chemistry, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, United States
- Adriana J Ladera
- Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, United States
- Julian C-L Liu
- Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, United States
- Michael G Taylor
- Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, United States
- Isuru R Ariyarathna
- Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, United States
- Heather J Kulik
- Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, United States
15
Duan C, Nandy A, Adamji H, Roman-Leshkov Y, Kulik HJ. Machine Learning Models Predict Calculation Outcomes with the Transferability Necessary for Computational Catalysis. J Chem Theory Comput 2022; 18:4282-4292. [PMID: 35737587 DOI: 10.1021/acs.jctc.2c00331] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/29/2022]
Abstract
Virtual high-throughput screening (VHTS) and machine learning (ML) have greatly accelerated the design of single-site transition-metal catalysts. VHTS of catalysts, however, is often accompanied with a high calculation failure rate and wasted computational resources due to the difficulty of simultaneously converging all mechanistically relevant reactive intermediates to expected geometries and electronic states. We demonstrate a dynamic classifier approach, i.e., a convolutional neural network that monitors geometry optimizations on the fly, and exploit its good performance and transferability in identifying geometry optimization failures for catalyst design. We show that the dynamic classifier performs well on all reactive intermediates in the representative catalytic cycle of the radical rebound mechanism for the conversion of methane to methanol despite being trained on only one reactive intermediate. The dynamic classifier also generalizes to chemically distinct intermediates and metal centers absent from the training data without loss of accuracy or model confidence. We rationalize this superior model transferability as arising from the use of electronic structure and geometric information generated on-the-fly from density functional theory calculations and the convolutional layer in the dynamic classifier. When used in combination with uncertainty quantification, the dynamic classifier saves more than half of the computational resources that would have been wasted on unsuccessful calculations for all reactive intermediates being considered.
Affiliation(s)
- Chenru Duan
- Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, United States
- Department of Chemistry, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, United States
- Aditya Nandy
- Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, United States
- Department of Chemistry, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, United States
- Husain Adamji
- Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, United States
- Yuriy Roman-Leshkov
- Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, United States
- Heather J Kulik
- Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, United States
16
Lustosa DM, Milo A. Mechanistic Inference from Statistical Models at Different Data-Size Regimes. ACS Catal 2022. [DOI: 10.1021/acscatal.2c01741] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/29/2022]
Affiliation(s)
- Danilo M. Lustosa
- Department of Chemistry, Ben-Gurion University of the Negev, Beer Sheva 84105, Israel
- Anat Milo
- Department of Chemistry, Ben-Gurion University of the Negev, Beer Sheva 84105, Israel
17
Duan C, Chu DBK, Nandy A, Kulik HJ. Detection of multi-reference character imbalances enables a transfer learning approach for virtual high throughput screening with coupled cluster accuracy at DFT cost. Chem Sci 2022; 13:4962-4971. [PMID: 35655882 PMCID: PMC9067623 DOI: 10.1039/d2sc00393g] [Citation(s) in RCA: 10] [Impact Index Per Article: 3.3] [Received: 01/21/2022] [Accepted: 04/04/2022] [Indexed: 01/08/2023] Open
Abstract
Appropriately identifying and treating molecules and materials with significant multi-reference (MR) character is crucial for achieving high data fidelity in virtual high-throughput screening (VHTS). Despite development of numerous MR diagnostics, the extent to which a single value of such a diagnostic indicates the MR effect on a chemical property prediction is not well established. We evaluate MR diagnostics for over 10 000 transition-metal complexes (TMCs) and compare to those for organic molecules. We observe that only some MR diagnostics are transferable from one chemical space to another. By studying the influence of MR character on chemical properties (i.e., MR effect) that involve multiple potential energy surfaces (i.e., adiabatic spin splitting, ΔE_H-L, and ionization potential, IP), we show that differences in MR character are more important than the cumulative degree of MR character in predicting the magnitude of an MR effect. Motivated by this observation, we build transfer learning models to predict CCSD(T)-level adiabatic ΔE_H-L and IP from lower levels of theory. By combining these models with uncertainty quantification and multi-level modeling, we introduce a multi-pronged strategy that accelerates data acquisition by at least a factor of three while achieving coupled cluster accuracy (i.e., to within 1 kcal mol⁻¹ MAE) for robust VHTS.
Affiliation(s)
- Chenru Duan
- Department of Chemical Engineering, Massachusetts Institute of Technology Cambridge MA 02139 USA
- Department of Chemistry, Massachusetts Institute of Technology Cambridge MA 02139 USA
- Daniel B K Chu
- Department of Chemical Engineering, Massachusetts Institute of Technology Cambridge MA 02139 USA
- Aditya Nandy
- Department of Chemical Engineering, Massachusetts Institute of Technology Cambridge MA 02139 USA
- Department of Chemistry, Massachusetts Institute of Technology Cambridge MA 02139 USA
- Heather J Kulik
- Department of Chemical Engineering, Massachusetts Institute of Technology Cambridge MA 02139 USA
18
Harper DR, Nandy A, Arunachalam N, Duan C, Janet JP, Kulik HJ. Representations and strategies for transferable machine learning improve model performance in chemical discovery. J Chem Phys 2022; 156:074101. [DOI: 10.1063/5.0082964] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Indexed: 11/14/2022] Open
Affiliation(s)
- Daniel R Harper
- Massachusetts Institute of Technology, United States of America
- Aditya Nandy
- Massachusetts Institute of Technology, United States of America
- Chenru Duan
- Massachusetts Institute of Technology, United States of America
- Heather J. Kulik
- Dept of Chemical Engineering, Massachusetts Institute of Technology, United States of America
19
Cammarota RC, Liu W, Bacsa J, Davies HML, Sigman MS. Mechanistically Guided Workflow for Relating Complex Reactive Site Topologies to Catalyst Performance in C–H Functionalization Reactions. J Am Chem Soc 2022; 144:1881-1898. [DOI: 10.1021/jacs.1c12198] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Indexed: 12/15/2022]
Affiliation(s)
- Ryan C. Cammarota
- Department of Chemistry, University of Utah, 315 South 1400 East, Salt Lake City, Utah 84112, United States
- Wenbin Liu
- Department of Chemistry, Emory University, 1515 Dickey Drive, Atlanta, Georgia 30322, United States
- John Bacsa
- Department of Chemistry, Emory University, 1515 Dickey Drive, Atlanta, Georgia 30322, United States
- Huw M. L. Davies
- Department of Chemistry, Emory University, 1515 Dickey Drive, Atlanta, Georgia 30322, United States
- Matthew S. Sigman
- Department of Chemistry, University of Utah, 315 South 1400 East, Salt Lake City, Utah 84112, United States
20
Harper DR, Kulik HJ. Computational Scaling Relationships Predict Experimental Activity and Rate-Limiting Behavior in Homogeneous Water Oxidation. Inorg Chem 2022; 61:2186-2197. [PMID: 35037756 DOI: 10.1021/acs.inorgchem.1c03376] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 11/28/2022]
Abstract
While computational screening with first-principles density functional theory (DFT) is essential for evaluating candidate catalysts, limitations in accuracy typically prevent the prediction of experimentally relevant activities. Exemplary of these challenges are homogeneous water oxidation catalysts (WOCs) where differences in experimental conditions or small changes in ligand structure can alter rate constants by over an order of magnitude. Here, we compute mechanistically relevant electronic and energetic properties for 19 mononuclear Ru transition-metal complexes (TMCs) from three experimental water oxidation catalysis studies. We discover that 15 of these TMCs have experimental activities that correlate with a single property, the ionization potential of the Ru(II)-O2 catalytic intermediate. This scaling parameter allows the quantitative understanding of activity trends and provides insight into the rate-limiting behavior. We use this approach to rationalize differences in activity with different experimental conditions, and we qualitatively analyze the source of distinct behavior for different electronic states in the other four catalysts. Comparison to closely related single-atom catalysts and modified WOCs enables rationalization of the source of rate enhancement in these WOCs.
Affiliation(s)
- Daniel R Harper
- Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, United States
- Department of Chemistry, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, United States
- Heather J Kulik
- Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, United States
21
Khrabrov K, Shenbin I, Ryabov A, Tsypin A, Telepov A, Alekseev A, Grishin A, Strashnov P, Zhilyaev P, Nikolenko S, Kadurin A. nablaDFT: Large-Scale Conformational Energy and Hamiltonian Prediction benchmark and dataset. Phys Chem Chem Phys 2022; 24:25853-25863. [DOI: 10.1039/d2cp03966d] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/06/2022]
Abstract
In this work we present nablaDFT, the new dataset and benchmark for the Density Functional Theory Hamiltonian and energy prediction. We provide data for over 1 million different molecules and over 5 million conformations and baseline models for both tasks.
Affiliation(s)
- Kuzma Khrabrov
- AIRI, Kutuzovskiy prospect house 32 building K.1, Moscow, 121170, Russia
- Ilya Shenbin
- St. Petersburg Department of Steklov Mathematical Institute of Russian Academy of Sciences, nab. r. Fontanki 27, St. Petersburg 191011, Russia
- Alexander Ryabov
- Center for Materials Technologies, Skolkovo Institute of Science and Technology, Bolshoy Boulevard 30, bld. 1, Moscow, 121205, Russia
- Moscow Institute of Physics and Technology (National Research University), Institutsky lane, 9, Dolgoprudny, Moscow Region 141700, Russia
- Artem Tsypin
- AIRI, Kutuzovskiy prospect house 32 building K.1, Moscow, 121170, Russia
- Alexander Telepov
- AIRI, Kutuzovskiy prospect house 32 building K.1, Moscow, 121170, Russia
- Anton Alekseev
- St. Petersburg Department of Steklov Mathematical Institute of Russian Academy of Sciences, nab. r. Fontanki 27, St. Petersburg 191011, Russia
- St. Petersburg University, 7-9 Universitetskaya Embankment, St Petersburg, 199034, Russia
- Alexander Grishin
- AIRI, Kutuzovskiy prospect house 32 building K.1, Moscow, 121170, Russia
- Pavel Strashnov
- AIRI, Kutuzovskiy prospect house 32 building K.1, Moscow, 121170, Russia
- Petr Zhilyaev
- Center for Materials Technologies, Skolkovo Institute of Science and Technology, Bolshoy Boulevard 30, bld. 1, Moscow, 121205, Russia
- Sergey Nikolenko
- St. Petersburg Department of Steklov Mathematical Institute of Russian Academy of Sciences, nab. r. Fontanki 27, St. Petersburg 191011, Russia
- ISP RAS Research Center for Trusted Artificial Intelligence, Alexander Solzhenitsyn st. 25, Moscow, 109004, Russia
- Artur Kadurin
- AIRI, Kutuzovskiy prospect house 32 building K.1, Moscow, 121170, Russia
- Kuban State University, Stavropolskaya Street, 149, Krasnodar 350040, Russia
22
Lu H, Kang X, Luo Y. Structure-Based Relative Energy Prediction Model: A Case Study of Pd(II)-Catalyzed Ethylene Polymerization and the Electronic Effect of Ancillary Ligands. J Phys Chem B 2021; 125:12047-12053. [PMID: 34694809 DOI: 10.1021/acs.jpcb.1c05143] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Indexed: 11/30/2022]
Abstract
Rapidly mapping a reaction energy profile to understand the reaction mechanism is of great importance and highly desired for the discovery of new chemical reactions. Herein, a combination of density functional theory (DFT) calculations and regression analysis has been applied to construct quantitative structures-based energy prediction models, considering Pd(II)-catalyzed ethylene polymerization as an example, for rapid construction of the reaction energy profile. It is inspiring that only geometrical parameters of the reaction center of one species are capable of predicting the whole energy profile with high accuracy. The reaction energies of ethylene insertion and β-H elimination, which directly correlate with polymerization activity and the possibility of branch formation, were studied to elucidate the electronic effects of ancillary ligands. Further analyses of these models from the statistical and chemical points of view afforded useful information on the design of the catalyst ligand. The current work is expected to methodologically shed new light on rapidly mapping the energy profile of chemical reactions and further provide useful information for the development of the reactions.
Affiliation(s)
- Han Lu
- State Key Laboratory of Fine Chemicals, School of Chemical Engineering, Dalian University of Technology, Dalian 116024, China
- Xiaohui Kang
- College of Pharmacy, Dalian Medical University, Dalian 116044, China
- Yi Luo
- State Key Laboratory of Fine Chemicals, School of Chemical Engineering, Dalian University of Technology, Dalian 116024, China
- PetroChina Petrochemical Research Institute, Beijing 102206, China
23
Taylor MG, Nandy A, Lu CC, Kulik HJ. Deciphering Cryptic Behavior in Bimetallic Transition-Metal Complexes with Machine Learning. J Phys Chem Lett 2021; 12:9812-9820. [PMID: 34597514 DOI: 10.1021/acs.jpclett.1c02852] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 06/13/2023]
Abstract
We demonstrate an alternative, data-driven approach to uncovering structure-property relationships for the rational design of heterobimetallic transition-metal complexes that exhibit metal-metal bonding. We tailor graph-based representations of the metal-local environment for these complexes for use in multiple linear regression and kernel ridge regression (KRR) models. We curate a set of 28 experimentally characterized complexes to develop a multiple linear regression model for oxidation potentials. We achieve good accuracy (mean absolute error of 0.25 V) and preserve transferability to unseen experimental data with a new ligand structure. We also train a KRR model on a subset of 330 structurally characterized heterobimetallics to predict the degree of metal-metal bonding. This KRR model predicts relative metal-metal bond lengths in the test set to within 5%, and analysis of key features reveals the fundamental atomic contributions (e.g., the valence electron configuration) that most strongly influence the behavior of these complexes. Our work provides guidance for rational bimetallic design, suggesting that properties, including the formal shortness ratio, should be transferable from one period to another.
Affiliation(s)
- Michael G Taylor
- Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, United States
- Aditya Nandy
- Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, United States
- Department of Chemistry, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, United States
- Connie C Lu
- Department of Chemistry, University of Minnesota, Minneapolis, Minnesota 55455, United States
- Heather J Kulik
- Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, United States
|
24
|
Duan C, Chen S, Taylor MG, Liu F, Kulik HJ. Machine learning to tame divergent density functional approximations: a new path to consensus materials design principles. Chem Sci 2021; 12:13021-13036. [PMID: 34745533 PMCID: PMC8513898 DOI: 10.1039/d1sc03701c] [Citation(s) in RCA: 17] [Impact Index Per Article: 4.3] [Received: 07/07/2021] [Accepted: 09/01/2021] [Indexed: 01/17/2023] Open
Abstract
Virtual high-throughput screening (VHTS) with density functional theory (DFT) and machine-learning (ML)-acceleration is essential in rapid materials discovery. By necessity, efficient DFT-based workflows are carried out with a single density functional approximation (DFA). Nevertheless, properties evaluated with different DFAs can be expected to disagree for cases with challenging electronic structure (e.g., open-shell transition-metal complexes, TMCs) for which rapid screening is most needed and accurate benchmarks are often unavailable. To quantify the effect of DFA bias, we introduce an approach to rapidly obtain property predictions from 23 representative DFAs spanning multiple families, “rungs” (e.g., semi-local to double hybrid) and basis sets on over 2000 TMCs. Although computed property values (e.g., spin state splitting and frontier orbital gap) differ by DFA, high linear correlations persist across all DFAs. We train independent ML models for each DFA and observe convergent trends in feature importance, providing DFA-invariant, universal design rules. We devise a strategy to train artificial neural network (ANN) models informed by all 23 DFAs and use them to predict properties (e.g., spin-splitting energy) of over 187k TMCs. By requiring consensus of the ANN-predicted DFA properties, we improve correspondence of computational lead compounds with literature-mined, experimental compounds over the typically employed single-DFA approach. Machine learning (ML)-based feature analysis reveals universal design rules regardless of density functional choices. Using the consensus among multiple functionals, we identify robust lead complexes in ML-accelerated chemical discovery.
Affiliation(s)
- Chenru Duan
- Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, MA 02139, USA
- Department of Chemistry, Massachusetts Institute of Technology, Cambridge, MA 02139, USA
- Shuxin Chen
- Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, MA 02139, USA
- Department of Chemistry, Massachusetts Institute of Technology, Cambridge, MA 02139, USA
- Michael G Taylor
- Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, MA 02139, USA
- Fang Liu
- Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, MA 02139, USA
- Heather J Kulik
- Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, MA 02139, USA
|
25
|
Tantillo DJ, Laconsay CJ. Melding of Experiment and Theory Illuminates Mechanisms of Metal-Catalyzed Rearrangements: Computational Approaches and Caveats. SYNTHESIS-STUTTGART 2021. [DOI: 10.1055/s-0040-1720451] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/20/2022]
Abstract
This review summarizes approaches and caveats in computational modeling of transition-metal-catalyzed sigmatropic rearrangements involving carbene transfer. We highlight contemporary examples of combined synthetic and theoretical investigations that showcase the synergy achievable by integrating experiment and theory.
Contents: 1 Introduction; 2 Mechanistic Models; 3 Theoretical Approaches and Caveats; 3.1 Recommended Computational Tools; 3.2 Choice of Functional and Basis Set; 3.3 Conformations and Ligand-Binding Modes; 3.4 Solvation; 4 Synergy of Experiment and Theory – Case Studies; 4.1 Metal-Bound or Free Ylides?; 4.2 Conformations and Ligand-Binding Modes of Paddlewheel Complexes; 4.3 No Metal, Just Light; 4.4 How To ‘Cope’ with Nonstatistical Dynamic Effects; 5 Outlook
|
26
|
Lan Z, Mallikarjun Sharada S. A framework for constructing linear free energy relationships to design molecular transition metal catalysts. Phys Chem Chem Phys 2021; 23:15543-15556. [PMID: 34254089 DOI: 10.1039/d1cp02278d] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Indexed: 12/21/2022]
Abstract
A computational framework for ligand-driven design of transition metal complexes is presented in this work. We propose a general procedure for the construction of active site-specific linear free energy relationships (LFERs), which are inspired from Hammett and Taft correlations in organic chemistry and grounded in the activation strain model (ASM). Ligand effects are isolated and quantified in terms of their contribution to interaction and strain energy components of ASM. Scalar descriptors that are easily obtainable are then employed to construct the complete LFER. We successfully demonstrate proof-of-concept by constructing and applying an LFER to CH activation with enzyme-inspired [Cu₂O₂]²⁺ complexes. The key benefit of using ASM is a built-in compensation or error cancellation between LFER prediction of interaction and strain terms, resulting in accurate barrier predictions for 37 of the 47 catalysts examined in this study. The LFER is also transferable with respect to level of theory and flexible towards the choice of reference system. The absence of interaction-strain compensation or poor model performance for the remaining systems is a consequence of the approximate nature of the chosen interaction energy descriptor and LFER construction of the strain term, which focuses largely on trends in substrate and not catalyst strain.
Affiliation(s)
- Zhenzhuo Lan
- Mork Family Department of Chemical Engineering and Materials Science, University of Southern California, Los Angeles, CA, USA
- Shaama Mallikarjun Sharada
- Mork Family Department of Chemical Engineering and Materials Science, University of Southern California, Los Angeles, CA, USA
- Department of Chemistry, University of Southern California, Los Angeles, CA, USA
|
27
|
Turcani L, Tarzia A, Szczypiński FT, Jelfs KE. stk: An extendable Python framework for automated molecular and supramolecular structure assembly and discovery. J Chem Phys 2021; 154:214102. [PMID: 34240979 DOI: 10.1063/5.0049708] [Citation(s) in RCA: 25] [Impact Index Per Article: 6.3] [Indexed: 12/17/2022] Open
Abstract
Computational software workflows are emerging as all-in-one solutions to speed up the discovery of new materials. Many computational approaches require the generation of realistic structural models for property prediction and candidate screening. However, molecular and supramolecular materials represent classes of materials with many potential applications for which there is no go-to database of existing structures or general protocol for generating structures. Here, we report a new version of the supramolecular toolkit, stk, an open-source, extendable, and modular Python framework for general structure generation of (supra)molecular structures. Our construction approach works on arbitrary building blocks and topologies and minimizes the input required from the user, making stk user-friendly and applicable to many material classes. This version of stk includes metal-containing structures and rotaxanes as well as general implementation and interface improvements. Additionally, this version includes built-in tools for exploring chemical space with an evolutionary algorithm and tools for database generation and visualization. The latest version of stk is freely available at github.com/lukasturcani/stk.
Affiliation(s)
- Lukas Turcani
- Department of Chemistry, Molecular Sciences Research Hub, Imperial College London, White City Campus, Wood Lane, London W12 0BZ, United Kingdom
- Andrew Tarzia
- Department of Chemistry, Molecular Sciences Research Hub, Imperial College London, White City Campus, Wood Lane, London W12 0BZ, United Kingdom
- Filip T Szczypiński
- Department of Chemistry, Molecular Sciences Research Hub, Imperial College London, White City Campus, Wood Lane, London W12 0BZ, United Kingdom
- Kim E Jelfs
- Department of Chemistry, Molecular Sciences Research Hub, Imperial College London, White City Campus, Wood Lane, London W12 0BZ, United Kingdom
|
28
|
Duan C, Liu F, Nandy A, Kulik HJ. Putting Density Functional Theory to the Test in Machine-Learning-Accelerated Materials Discovery. J Phys Chem Lett 2021; 12:4628-4637. [PMID: 33973793 DOI: 10.1021/acs.jpclett.1c00631] [Citation(s) in RCA: 25] [Impact Index Per Article: 6.3] [Indexed: 05/20/2023]
Abstract
Accelerated discovery with machine learning (ML) has begun to provide the advances in efficiency needed to overcome the combinatorial challenge of computational materials design. Nevertheless, ML-accelerated discovery both inherits the biases of training data derived from density functional theory (DFT) and leads to many attempted calculations that are doomed to fail. Many compelling functional materials and catalytic processes involve strained chemical bonds, open-shell radicals and diradicals, or metal-organic bonds to open-shell transition-metal centers. Although promising targets, these materials present unique challenges for electronic structure methods and combinatorial challenges for their discovery. In this Perspective, we describe the advances needed in accuracy, efficiency, and approach beyond what is typical in conventional DFT-based ML workflows. These challenges have begun to be addressed through ML models trained to predict the results of multiple methods or the differences between them, enabling quantitative sensitivity analysis. For DFT to be trusted for a given data point in a high-throughput screen, it must pass a series of tests. ML models that predict the likelihood of calculation success and detect the presence of strong correlation will enable rapid diagnoses and adaptation strategies. These "decision engines" represent the first steps toward autonomous workflows that avoid the need for expert determination of the robustness of DFT-based materials discoveries.
Affiliation(s)
- Chenru Duan
- Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, United States
- Department of Chemistry, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, United States
- Fang Liu
- Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, United States
- Aditya Nandy
- Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, United States
- Department of Chemistry, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, United States
- Heather J Kulik
- Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, United States
|