1
|
Singh S, Hernández-Lobato JM. Deep Kernel learning for reaction outcome prediction and optimization. Commun Chem 2024; 7:136. [PMID: 38877182 DOI: 10.1038/s42004-024-01219-x] [Received: 02/10/2024] [Accepted: 06/05/2024] [Indexed: 06/16/2024]
Abstract
Recent years have seen a rapid growth in the application of various machine learning methods for reaction outcome prediction. Deep learning models have gained popularity due to their ability to learn representations directly from the molecular structure. Gaussian processes (GPs), on the other hand, provide reliable uncertainty estimates but are unable to learn representations from the data. We combine the feature learning ability of neural networks (NNs) with uncertainty quantification of GPs in a deep kernel learning (DKL) framework to predict the reaction outcome. The DKL model is observed to obtain very good predictive performance across different input representations. It significantly outperforms standard GPs and provides comparable performance to graph neural networks, but with uncertainty estimation. Additionally, the uncertainty estimates on predictions provided by the DKL model facilitated its incorporation as a surrogate model for Bayesian optimization (BO). The proposed method, therefore, has a great potential towards accelerating reaction discovery by integrating accurate predictive models that provide reliable uncertainty estimates with BO.
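As a rough illustration of the idea in this abstract, the sketch below composes a neural-network feature map with a GP kernel, which is the basic construction behind deep kernel learning. It is not the authors' implementation: the network weights here are random and untrained (a real DKL model learns them jointly with the kernel hyperparameters), the data are synthetic, and all names (`feature_map`, `rbf_kernel`, `gp_posterior`) are hypothetical.

```python
import numpy as np


def feature_map(X, W1, b1, W2, b2):
    """Tiny MLP standing in for the neural-network feature extractor."""
    hidden = np.tanh(X @ W1 + b1)
    return hidden @ W2 + b2


def rbf_kernel(A, B, lengthscale=1.0, variance=1.0):
    """Squared-exponential kernel evaluated on learned features."""
    sq_dist = (
        np.sum(A**2, axis=1)[:, None]
        + np.sum(B**2, axis=1)[None, :]
        - 2.0 * A @ B.T
    )
    return variance * np.exp(-0.5 * np.maximum(sq_dist, 0.0) / lengthscale**2)


def gp_posterior(Z_train, y_train, Z_test, noise=1e-2):
    """Exact GP regression: posterior mean and variance at Z_test."""
    K = rbf_kernel(Z_train, Z_train) + noise * np.eye(len(Z_train))
    K_s = rbf_kernel(Z_test, Z_train)
    K_ss = rbf_kernel(Z_test, Z_test)
    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y_train))
    mean = K_s @ alpha
    V = np.linalg.solve(L, K_s.T)
    var = np.diag(K_ss) - np.sum(V**2, axis=0)
    return mean, np.maximum(var, 0.0)


# Synthetic stand-in data (assumption: 5 input descriptors per reaction).
rng = np.random.default_rng(0)
X_train = rng.normal(size=(20, 5))
y_train = np.sin(X_train[:, 0])
X_test = rng.normal(size=(4, 5))

# Random, untrained weights; DKL would optimize these with the GP objective.
W1, b1 = rng.normal(size=(5, 8)), np.zeros(8)
W2, b2 = rng.normal(size=(8, 2)), np.zeros(2)

Z_train = feature_map(X_train, W1, b1, W2, b2)
Z_test = feature_map(X_test, W1, b1, W2, b2)
mean, var = gp_posterior(Z_train, y_train, Z_test)
```

The per-point variance `var` is the kind of uncertainty estimate a Bayesian optimization loop would feed into its acquisition function.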
Affiliation(s)
- Sukriti Singh
- Department of Engineering, University of Cambridge, Cambridge, UK.
| | | |
|
2
|
Bassani CL, van Anders G, Banin U, Baranov D, Chen Q, Dijkstra M, Dimitriyev MS, Efrati E, Faraudo J, Gang O, Gaston N, Golestanian R, Guerrero-Garcia GI, Gruenwald M, Haji-Akbari A, Ibáñez M, Karg M, Kraus T, Lee B, Van Lehn RC, Macfarlane RJ, Mognetti BM, Nikoubashman A, Osat S, Prezhdo OV, Rotskoff GM, Saiz L, Shi AC, Skrabalak S, Smalyukh II, Tagliazucchi M, Talapin DV, Tkachenko AV, Tretiak S, Vaknin D, Widmer-Cooper A, Wong GCL, Ye X, Zhou S, Rabani E, Engel M, Travesset A. Nanocrystal Assemblies: Current Advances and Open Problems. ACS NANO 2024; 18:14791-14840. [PMID: 38814908 DOI: 10.1021/acsnano.3c10201] [Indexed: 06/01/2024]
Abstract
We explore the potential of nanocrystals (a term used equivalently to nanoparticles) as building blocks for nanomaterials, and the current advances and open challenges for fundamental science developments and applications. Nanocrystal assemblies are inherently multiscale, and the generation of revolutionary material properties requires a precise understanding of the relationship between structure and function, the former being determined by classical effects and the latter often by quantum effects. With an emphasis on theory and computation, we discuss challenges that hamper current assembly strategies and to what extent nanocrystal assemblies represent thermodynamic equilibrium or kinetically trapped metastable states. We also examine dynamic effects and optimization of assembly protocols. Finally, we discuss promising material functions and examples of their realization with nanocrystal assemblies.
Affiliation(s)
- Carlos L Bassani
- Institute for Multiscale Simulation, Friedrich-Alexander-Universität Erlangen-Nürnberg, 91058 Erlangen, Germany
| | - Greg van Anders
- Department of Physics, Engineering Physics, and Astronomy, Queen's University, Kingston, Ontario K7L 3N6, Canada
| | - Uri Banin
- Institute of Chemistry and the Center for Nanoscience and Nanotechnology, The Hebrew University of Jerusalem, Jerusalem 91904, Israel
| | - Dmitry Baranov
- Division of Chemical Physics, Department of Chemistry, Lund University, SE-221 00 Lund, Sweden
| | - Qian Chen
- University of Illinois, Urbana, Illinois 61801, USA
| | - Marjolein Dijkstra
- Soft Condensed Matter & Biophysics, Debye Institute for Nanomaterials Science, Utrecht University, 3584 CC Utrecht, The Netherlands
| | - Michael S Dimitriyev
- Department of Polymer Science and Engineering, University of Massachusetts, Amherst, Massachusetts 01003, USA
- Department of Materials Science and Engineering, Texas A&M University, College Station, Texas 77843, USA
| | - Efi Efrati
- Department of Physics of Complex Systems, Weizmann Institute of Science, Rehovot 76100, Israel
- James Franck Institute, The University of Chicago, Chicago, Illinois 60637, USA
| | - Jordi Faraudo
- Institut de Ciencia de Materials de Barcelona (ICMAB-CSIC), Campus de la UAB, E-08193 Bellaterra, Barcelona, Spain
| | - Oleg Gang
- Department of Chemical Engineering, Columbia University, New York, New York 10027, USA
- Department of Applied Physics and Applied Mathematics, Columbia University, New York, New York 10027, USA
- Center for Functional Nanomaterials, Brookhaven National Laboratory, Upton, New York 11973, USA
| | - Nicola Gaston
- The MacDiarmid Institute for Advanced Materials and Nanotechnology, Department of Physics, The University of Auckland, Auckland 1142, New Zealand
| | - Ramin Golestanian
- Max Planck Institute for Dynamics and Self-Organization (MPI-DS), 37077 Göttingen, Germany
- Rudolf Peierls Centre for Theoretical Physics, University of Oxford, Oxford OX1 3PU, UK
| | - G Ivan Guerrero-Garcia
- Facultad de Ciencias de la Universidad Autónoma de San Luis Potosí, 78295 San Luis Potosí, México
| | - Michael Gruenwald
- Department of Chemistry, University of Utah, Salt Lake City, Utah 84112, USA
| | - Amir Haji-Akbari
- Department of Chemical and Environmental Engineering, Yale University, New Haven, Connecticut 06511, USA
| | - Maria Ibáñez
- Institute of Science and Technology Austria (ISTA), 3400 Klosterneuburg, Austria
| | - Matthias Karg
- Heinrich-Heine-Universität Düsseldorf, 40225 Düsseldorf, Germany
| | - Tobias Kraus
- INM - Leibniz-Institute for New Materials, 66123 Saarbrücken, Germany
- Saarland University, Colloid and Interface Chemistry, 66123 Saarbrücken, Germany
| | - Byeongdu Lee
- X-ray Science Division, Argonne National Laboratory, Lemont, Illinois 60439, USA
| | - Reid C Van Lehn
- Department of Chemical and Biological Engineering, University of Wisconsin-Madison, Madison, Wisconsin 53717, USA
| | - Robert J Macfarlane
- Department of Materials Science and Engineering, Massachusetts Institute of Technology, Cambridge, Massachusetts 02142, USA
| | - Bortolo M Mognetti
- Center for Nonlinear Phenomena and Complex Systems, Université Libre de Bruxelles, 1050 Brussels, Belgium
| | - Arash Nikoubashman
- Leibniz-Institut für Polymerforschung Dresden e.V., 01069 Dresden, Germany
- Institut für Theoretische Physik, Technische Universität Dresden, 01069 Dresden, Germany
| | - Saeed Osat
- Max Planck Institute for Dynamics and Self-Organization (MPI-DS), 37077 Göttingen, Germany
| | - Oleg V Prezhdo
- Department of Chemistry, University of Southern California, Los Angeles, CA 90089, USA
- Department of Physics and Astronomy, University of Southern California, Los Angeles, California 90089, USA
| | - Grant M Rotskoff
- Department of Chemistry, Stanford University, Stanford, California 94305, USA
| | - Leonor Saiz
- Department of Biomedical Engineering, University of California, Davis, California 95616, USA
| | - An-Chang Shi
- Department of Physics & Astronomy, McMaster University, Hamilton, Ontario L8S 4M1, Canada
| | - Sara Skrabalak
- Department of Chemistry, Indiana University, Bloomington, Indiana 47405, USA
| | - Ivan I Smalyukh
- Department of Physics and Chemical Physics Program, University of Colorado, Boulder, Colorado 80309, USA
- International Institute for Sustainability with Knotted Chiral Meta Matter, Hiroshima University, Higashi-Hiroshima City 739-0046, Japan
| | - Mario Tagliazucchi
- Universidad de Buenos Aires, Ciudad Universitaria, C1428EHA Ciudad Autónoma de Buenos Aires, Buenos Aires 1428 Argentina
| | - Dmitri V Talapin
- Department of Chemistry, James Franck Institute and Pritzker School of Molecular Engineering, The University of Chicago, Chicago, Illinois 60637, USA
- Center for Nanoscale Materials, Argonne National Laboratory, Argonne, IL 60439, USA
| | - Alexei V Tkachenko
- Center for Functional Nanomaterials, Brookhaven National Laboratory, Upton, New York 11973, USA
| | - Sergei Tretiak
- Theoretical Division and Center for Integrated Nanotechnologies, Los Alamos National Laboratory, Los Alamos, New Mexico 87545, USA
| | - David Vaknin
- Iowa State University and Ames Lab, Ames, Iowa 50011, USA
| | - Asaph Widmer-Cooper
- ARC Centre of Excellence in Exciton Science, School of Chemistry, University of Sydney, Sydney, New South Wales 2006, Australia
- The University of Sydney Nano Institute, University of Sydney, Sydney, New South Wales 2006, Australia
| | - Gerard C L Wong
- Department of Bioengineering, University of California, Los Angeles, California 90095, USA
- Department of Chemistry and Biochemistry, University of California, Los Angeles, California 90095, USA
- Department of Microbiology, Immunology & Molecular Genetics, University of California, Los Angeles, CA 90095, USA
- California NanoSystems Institute, University of California, Los Angeles, CA 90095, USA
| | - Xingchen Ye
- Department of Chemistry, Indiana University, Bloomington, Indiana 47405, USA
| | - Shan Zhou
- Department of Nanoscience and Biomedical Engineering, South Dakota School of Mines and Technology, Rapid City, South Dakota 57701, USA
| | - Eran Rabani
- Department of Chemistry, University of California and Materials Sciences Division, Lawrence Berkeley National Laboratory, Berkeley, California 94720, USA
- The Raymond and Beverly Sackler Center of Computational Molecular and Materials Science, Tel Aviv University, Tel Aviv 69978, Israel
| | - Michael Engel
- Institute for Multiscale Simulation, Friedrich-Alexander-Universität Erlangen-Nürnberg, 91058 Erlangen, Germany
| | - Alex Travesset
- Iowa State University and Ames Lab, Ames, Iowa 50011, USA
| |
|
3
|
Luchini G, Paton RS. Bottom-Up Atomistic Descriptions of Top-Down Macroscopic Measurements: Computational Benchmarks for Hammett Electronic Parameters. ACS PHYSICAL CHEMISTRY AU 2024; 4:259-267. [PMID: 38800724 PMCID: PMC11117679 DOI: 10.1021/acsphyschemau.3c00045] [Received: 08/19/2023] [Revised: 01/14/2024] [Accepted: 01/16/2024] [Indexed: 05/29/2024]
Abstract
The ability to relate substituent electronic effects to chemical reactivity is a cornerstone of physical organic chemistry and Linear Free Energy Relationships. The computation of electronic parameters is increasingly attractive since they can be obtained rapidly for structures and substituents without available experimental data and can be applied beyond aromatic substituents, for example, in studies of transition metal complexes and aliphatic and radical systems. Nevertheless, the description of "top-down" macroscopic observables, such as Hammett parameters using a "bottom-up" computational approach, poses several challenges for the practitioner. We have examined and benchmarked the performance of various computational charge schemes encompassing quantum mechanical methods that partition charge density, methods that fit charge to physical observables, and methods enhanced by semiempirical adjustments alongside NMR values. We study the locations of the atoms used to obtain these descriptors and their correlation with empirical Hammett parameters and rate differences resulting from electronic effects. These seemingly small choices have a much more significant impact than previously imagined, which outweighs the level of theory or basis set used. We observe a wide range of performance across the different computational protocols and observe stark and surprising differences in the ability of computational parameters to capture para- vs meta-electronic effects. In general, σm predictions fare much worse than σp. As a result, the choice of where to compute these descriptors (for the ring carbons or the attached H or other substituent atoms) affects their ability to capture experimental electronic differences. Density-based schemes, such as Hirshfeld charges, are more stable toward unphysical charge perturbations that result from nearby functional groups and outperform all other computational descriptors, including several commonly used basis set based schemes such as Natural Population Analysis. Using attached atoms also improves the statistical correlations. We obtained general linear relationships for the global prediction of experimental Hammett parameters from computed descriptors for use in statistical modeling studies.
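For orientation, the two linear relationships this abstract refers to can be written out. The first is the standard Hammett equation, where $k_X$ and $k_H$ are the rate (or equilibrium) constants for the substituted and unsubstituted compounds and $\rho$ is the reaction constant. The second is only an assumed generic form for mapping a computed descriptor $q_X$ (for example, a Hirshfeld charge) onto a Hammett parameter; the coefficients $a$ and $b$ are hypothetical fit parameters, not values reported in the paper.

```latex
\log_{10}\!\left(\frac{k_X}{k_H}\right) = \rho\,\sigma_X,
\qquad
\sigma_X \approx a\,q_X + b
```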
Affiliation(s)
- Guilian Luchini
- Department of Chemistry, Colorado State University, 1301 Center Ave., Ft. Collins, Colorado 80523-1872, United States
| | - Robert S. Paton
- Department of Chemistry, Colorado State University, 1301 Center Ave., Ft. Collins, Colorado 80523-1872, United States
| |
|
4
|
van Gerwen P, Briling KR, Calvino Alonso Y, Franke M, Corminboeuf C. Benchmarking machine-readable vectors of chemical reactions on computed activation barriers. DIGITAL DISCOVERY 2024; 3:932-943. [PMID: 38756222 PMCID: PMC11094696 DOI: 10.1039/d3dd00175j] [Received: 09/06/2023] [Accepted: 02/28/2024] [Indexed: 05/18/2024]
Abstract
In recent years, there has been a surge of interest in predicting computed activation barriers, to enable the acceleration of the automated exploration of reaction networks. Consequently, various predictive approaches have emerged, ranging from graph-based models to methods based on the three-dimensional structure of reactants and products. In tandem, many representations have been developed to predict experimental targets, which may hold promise for barrier prediction as well. Here, we bring together all of these efforts and benchmark various methods (Morgan fingerprints, the DRFP, the CGR representation-based Chemprop, SLATMd, B2Rl2, EquiReact and language model BERT + RXNFP) for the prediction of computed activation barriers on three diverse datasets.
Affiliation(s)
- Puck van Gerwen
- Laboratory for Computational Molecular Design, Institute of Chemical Sciences and Engineering, École Polytechnique Fédérale de Lausanne 1015 Lausanne Switzerland
- National Center for Competence in Research-Catalysis (NCCR-Catalysis), École Polytechnique Fédérale de Lausanne 1015 Lausanne Switzerland
| | - Ksenia R Briling
- Laboratory for Computational Molecular Design, Institute of Chemical Sciences and Engineering, École Polytechnique Fédérale de Lausanne 1015 Lausanne Switzerland
| | - Yannick Calvino Alonso
- Laboratory for Computational Molecular Design, Institute of Chemical Sciences and Engineering, École Polytechnique Fédérale de Lausanne 1015 Lausanne Switzerland
| | - Malte Franke
- Laboratory for Computational Molecular Design, Institute of Chemical Sciences and Engineering, École Polytechnique Fédérale de Lausanne 1015 Lausanne Switzerland
| | - Clemence Corminboeuf
- Laboratory for Computational Molecular Design, Institute of Chemical Sciences and Engineering, École Polytechnique Fédérale de Lausanne 1015 Lausanne Switzerland
- National Center for Competence in Research-Catalysis (NCCR-Catalysis), École Polytechnique Fédérale de Lausanne 1015 Lausanne Switzerland
| |
|
5
|
Sigmund LM, S SS, Albers A, Erdmann P, Paton RS, Greb L. Predicting Lewis Acidity: Machine Learning the Fluoride Ion Affinity of p-Block-Atom-Based Molecules. Angew Chem Int Ed Engl 2024; 63:e202401084. [PMID: 38452299 DOI: 10.1002/anie.202401084] [Received: 01/19/2024] [Revised: 03/01/2024] [Accepted: 03/04/2024] [Indexed: 03/09/2024]
Abstract
"How strong is this Lewis acid?" is a question researchers often approach by calculating its fluoride ion affinity (FIA) with quantum chemistry. Here, we present FIA49k, an extensive FIA dataset with 48,986 data points calculated at the RI-DSD-BLYP-D3(BJ)/def2-QZVPP//PBEh-3c level of theory, including 13 different p-block atoms as the fluoride accepting site. The FIA49k dataset was used to train FIA-GNN, two message-passing graph neural networks, which predict gas and solution phase FIA values of molecules excluded from training with a mean absolute error of 14 kJ mol⁻¹ (r² = 0.93) from the SMILES string of the Lewis acid as the only input. The level of accuracy is notable, given the wide energetic range of 750 kJ mol⁻¹ spanned by FIA49k. The model's value was demonstrated with four case studies, including predictions for molecules extracted from the Cambridge Structural Database and by reproducing results from catalysis research available in the literature. Weaknesses of the model are evaluated and interpreted chemically. FIA-GNN and the FIA49k dataset can be reached via a free web app (www.grebgroup.de/fia-gnn).
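The two accuracy figures quoted in this abstract (mean absolute error and r²) can be computed as in the minimal sketch below. It assumes r² means the coefficient of determination; the abstract does not say whether that or a squared Pearson correlation was used. The function names are hypothetical and nothing here comes from the FIA-GNN code.

```python
import numpy as np


def mean_absolute_error(y_true, y_pred):
    """Average absolute deviation between reference and predicted values."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.mean(np.abs(y_true - y_pred)))


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - np.mean(y_true)) ** 2)
    return float(1.0 - ss_res / ss_tot)
```

Both functions take the reference values first and the predictions second, in the same units (kJ mol⁻¹ for FIA values).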
Affiliation(s)
- Lukas M Sigmund
- Anorganisch-Chemisches Institut, Ruprecht-Karls-Universität Heidelberg, Im Neuenheimer Feld 270, 69120, Heidelberg, Germany
- Department of Chemistry, Colorado State University, 1301 Center Avenue, Fort Collins, CO, 80523, USA
| | - Shree Sowndarya S
- Department of Chemistry, Colorado State University, 1301 Center Avenue, Fort Collins, CO, 80523, USA
| | - Andreas Albers
- Anorganisch-Chemisches Institut, Ruprecht-Karls-Universität Heidelberg, Im Neuenheimer Feld 270, 69120, Heidelberg, Germany
| | - Philipp Erdmann
- Anorganisch-Chemisches Institut, Ruprecht-Karls-Universität Heidelberg, Im Neuenheimer Feld 270, 69120, Heidelberg, Germany
| | - Robert S Paton
- Department of Chemistry, Colorado State University, 1301 Center Avenue, Fort Collins, CO, 80523, USA
| | - Lutz Greb
- Anorganisch-Chemisches Institut, Ruprecht-Karls-Universität Heidelberg, Im Neuenheimer Feld 270, 69120, Heidelberg, Germany
| |
|
6
|
Gallarati S, van Gerwen P, Laplaza R, Brey L, Makaveev A, Corminboeuf C. A genetic optimization strategy with generality in asymmetric organocatalysis as a primary target. Chem Sci 2024; 15:3640-3660. [PMID: 38455002 PMCID: PMC10915838 DOI: 10.1039/d3sc06208b] [Received: 11/20/2023] [Accepted: 01/30/2024] [Indexed: 03/09/2024]
Abstract
A catalyst possessing a broad substrate scope, in terms of both turnover and enantioselectivity, is sometimes called "general". Despite their great utility in asymmetric synthesis, truly general catalysts are difficult or expensive to discover via traditional high-throughput screening and are, therefore, rare. Existing computational tools accelerate the evaluation of reaction conditions from a pre-defined set of experiments to identify the most general ones, but cannot generate entirely new catalysts with enhanced substrate breadth. For these reasons, we report an inverse design strategy based on the open-source genetic algorithm NaviCatGA and on the OSCAR database of organocatalysts to simultaneously probe the catalyst and substrate scope and optimize generality as a primary target. We apply this strategy to the Pictet-Spengler condensation, for which we curate a database of 820 reactions, used to train statistical models of selectivity and activity. Starting from OSCAR, we define a combinatorial space of millions of catalyst possibilities, and perform evolutionary experiments on a diverse substrate scope that is representative of the whole chemical space of tetrahydro-β-carboline products. While privileged catalysts emerge, we show how genetic optimization can address the broader question of generality in asymmetric synthesis, extracting structure-performance relationships from the challenging areas of chemical space.
Affiliation(s)
- Simone Gallarati
- Laboratory for Computational Molecular Design, Institute of Chemical Sciences and Engineering, Ecole Polytechnique Fédérale de Lausanne (EPFL) 1015 Lausanne Switzerland
| | - Puck van Gerwen
- Laboratory for Computational Molecular Design, Institute of Chemical Sciences and Engineering, Ecole Polytechnique Fédérale de Lausanne (EPFL) 1015 Lausanne Switzerland
- National Center for Competence in Research - Catalysis (NCCR-Catalysis), Ecole Polytechnique Fédérale de Lausanne (EPFL) 1015 Lausanne Switzerland
| | - Ruben Laplaza
- Laboratory for Computational Molecular Design, Institute of Chemical Sciences and Engineering, Ecole Polytechnique Fédérale de Lausanne (EPFL) 1015 Lausanne Switzerland
- National Center for Competence in Research - Catalysis (NCCR-Catalysis), Ecole Polytechnique Fédérale de Lausanne (EPFL) 1015 Lausanne Switzerland
| | - Lucien Brey
- Laboratory for Computational Molecular Design, Institute of Chemical Sciences and Engineering, Ecole Polytechnique Fédérale de Lausanne (EPFL) 1015 Lausanne Switzerland
| | - Alexander Makaveev
- Laboratory for Computational Molecular Design, Institute of Chemical Sciences and Engineering, Ecole Polytechnique Fédérale de Lausanne (EPFL) 1015 Lausanne Switzerland
| | - Clemence Corminboeuf
- Laboratory for Computational Molecular Design, Institute of Chemical Sciences and Engineering, Ecole Polytechnique Fédérale de Lausanne (EPFL) 1015 Lausanne Switzerland
- National Center for Competence in Research - Catalysis (NCCR-Catalysis), Ecole Polytechnique Fédérale de Lausanne (EPFL) 1015 Lausanne Switzerland
- National Center for Computational Design and Discovery of Novel Materials (MARVEL), Ecole Polytechnique Fédérale de Lausanne (EPFL) 1015 Lausanne Switzerland
| |
|
7
|
Kalikadien AV, Mirza A, Hossaini AN, Sreenithya A, Pidko EA. Paving the road towards automated homogeneous catalyst design. Chempluschem 2024:e202300702. [PMID: 38279609 DOI: 10.1002/cplu.202300702] [Received: 11/29/2023] [Revised: 12/20/2023] [Indexed: 01/28/2024]
Abstract
In the past decade, computational tools have become integral to catalyst design. They continue to offer significant support to experimental organic synthesis and catalysis researchers aiming for optimal reaction outcomes. More recently, data-driven approaches utilizing machine learning have garnered considerable attention for their expansive capabilities. This Perspective provides an overview of diverse initiatives in the realm of computational catalyst design and introduces our automated tools tailored for high-throughput in silico exploration of the chemical space. While valuable insights are gained through methods for high-throughput in silico exploration and analysis of chemical space, their degree of automation and modularity are key. We argue that the integration of data-driven, automated and modular workflows is key to enhancing homogeneous catalyst design on an unprecedented scale, contributing to the advancement of catalysis research.
Affiliation(s)
- Adarsh V Kalikadien
- Inorganic Systems Engineering, Department of Chemical Engineering, Faculty of Applied Sciences, Delft University of Technology, Van der Maasweg 9, 2629 HZ, Delft, The Netherlands
| | - Adrian Mirza
- Inorganic Systems Engineering, Department of Chemical Engineering, Faculty of Applied Sciences, Delft University of Technology, Van der Maasweg 9, 2629 HZ, Delft, The Netherlands
| | - Aydin Najl Hossaini
- Inorganic Systems Engineering, Department of Chemical Engineering, Faculty of Applied Sciences, Delft University of Technology, Van der Maasweg 9, 2629 HZ, Delft, The Netherlands
| | - Avadakkam Sreenithya
- Inorganic Systems Engineering, Department of Chemical Engineering, Faculty of Applied Sciences, Delft University of Technology, Van der Maasweg 9, 2629 HZ, Delft, The Netherlands
| | - Evgeny A Pidko
- Inorganic Systems Engineering, Department of Chemical Engineering, Faculty of Applied Sciences, Delft University of Technology, Van der Maasweg 9, 2629 HZ, Delft, The Netherlands
| |
|
8
|
Raghavan P, Haas BC, Ruos ME, Schleinitz J, Doyle AG, Reisman SE, Sigman MS, Coley CW. Dataset Design for Building Models of Chemical Reactivity. ACS CENTRAL SCIENCE 2023; 9:2196-2204. [PMID: 38161380 PMCID: PMC10755851 DOI: 10.1021/acscentsci.3c01163] [Received: 09/20/2023] [Revised: 11/06/2023] [Accepted: 11/15/2023] [Indexed: 01/03/2024]
Abstract
Models can codify our understanding of chemical reactivity and serve a useful purpose in the development of new synthetic processes via, for example, evaluating hypothetical reaction conditions or in silico substrate tolerance. Perhaps the most determining factor is the composition of the training data and whether it is sufficient to train a model that can make accurate predictions over the full domain of interest. Here, we discuss the design of reaction datasets in ways that are conducive to data-driven modeling, emphasizing the idea that training set diversity and model generalizability rely on the choice of molecular or reaction representation. We additionally discuss the experimental constraints associated with generating common types of chemistry datasets and how these considerations should influence dataset design and model building.
Affiliation(s)
- Priyanka Raghavan
- Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, United States
| | - Brittany C. Haas
- Department of Chemistry, University of Utah, Salt Lake City, Utah 84112, United States
| | - Madeline E. Ruos
- Department of Chemistry & Biochemistry, University of California, Los Angeles, Los Angeles, California 90095, United States
| | - Jules Schleinitz
- Division of Chemistry and Chemical Engineering, California Institute of Technology, Pasadena, California 91125, United States
| | - Abigail G. Doyle
- Department of Chemistry & Biochemistry, University of California, Los Angeles, Los Angeles, California 90095, United States
| | - Sarah E. Reisman
- Division of Chemistry and Chemical Engineering, California Institute of Technology, Pasadena, California 91125, United States
| | - Matthew S. Sigman
- Department of Chemistry, University of Utah, Salt Lake City, Utah 84112, United States
| | - Connor W. Coley
- Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, United States
- Department of Electrical Engineering and Computer Science, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, United States
| |
|
9
|
Lu H, Kang X, Yu H, Zhang W, Luo Y. Using a single complex to predict the reaction energy profile: a case study of Pd/Ni-catalyzed ethylene polymerization. Dalton Trans 2023; 52:14790-14796. [PMID: 37807861 DOI: 10.1039/d3dt02745g] [Indexed: 10/10/2023]
Abstract
Mechanism-driven catalyst screening could be greatly accelerated by quantitative prediction models of the reaction energy profile. Here, we propose a novel method for molecular representation, taking palladium- and nickel-catalyzed ethylene polymerization as model reactions. The geometric parameters (GPfra) and electron occupancies (EOfra) from the non-ligand fragment of the η³-complex were extracted as the molecular descriptors, followed by constructing the reaction energy profile prediction models on the basis of various regression algorithms. The models showed great accuracy with respect to both theoretical and experimental data. More importantly, the models are convenient for training and utilization. On one hand, all the features were easily captured from the single η³-complex. On the other hand, further investigation also demonstrated that the models could be constructed with a small training sample size. We believe that our featurization method could possibly be generalized to more organometallic reactions and paves the way to efficient catalyst design.
Affiliation(s)
- Han Lu
- State Key Laboratory of Fine Chemicals, School of Chemical Engineering, Dalian University of Technology, Dalian 116024, China.
| | - Xiaohui Kang
- College of Pharmacy, Dalian Medical University, Dalian 116044, China
| | - Hang Yu
- Liaoning Key Laboratory of Clean Energy, Shenyang Aerospace University, Shenyang 110136, China
| | - Wenzhen Zhang
- State Key Laboratory of Fine Chemicals, School of Chemical Engineering, Dalian University of Technology, Dalian 116024, China.
| | - Yi Luo
- State Key Laboratory of Fine Chemicals, School of Chemical Engineering, Dalian University of Technology, Dalian 116024, China.
- PetroChina Petrochemical Research Institute, Beijing 102206, China
| |
|
10
|
Li CH, Tabor DP. Generative organic electronic molecular design informed by quantum chemistry. Chem Sci 2023; 14:11045-11055. [PMID: 37860647 PMCID: PMC10583709 DOI: 10.1039/d3sc03781a] [Received: 07/22/2023] [Accepted: 09/11/2023] [Indexed: 10/21/2023]
Abstract
Generative molecular design strategies have emerged as promising alternatives to trial-and-error approaches for exploring and optimizing within large chemical spaces. To date, generative models with reinforcement learning approaches have frequently used low-cost methods to evaluate the quality of the generated molecules, enabling many loops through the generative model. However, for functional molecular materials tasks, such low-cost methods are either not available or would require the generation of large amounts of training data to train surrogate machine learning models. In this work, we develop a framework that connects the REINVENT reinforcement learning framework with excited state quantum chemistry calculations to discover molecules with specified molecular excited state energy levels, specifically molecules with excited state landscapes that would serve as promising singlet fission or triplet-triplet annihilation materials. We employ a two-step curriculum strategy to first find a set of diverse promising molecules, then demonstrate the framework's ability to exploit a more focused chemical space with anthracene derivatives. Under this protocol, we show that the framework can find desired molecules and improve Pareto fronts for targeted properties versus synthesizability. Moreover, we are able to find several different design principles used by chemists for the design of singlet fission and triplet-triplet annihilation molecules.
Affiliation(s)
- Cheng-Han Li
- Department of Chemistry, Texas A&M University College Station TX 77842 USA
| | - Daniel P Tabor
- Department of Chemistry, Texas A&M University College Station TX 77842 USA
| |
|
11
|
Mahjour B, Zhang R, Shen Y, McGrath A, Zhao R, Mohamed OG, Lin Y, Zhang Z, Douthwaite JL, Tripathi A, Cernak T. Rapid planning and analysis of high-throughput experiment arrays for reaction discovery. Nat Commun 2023; 14:3924. [PMID: 37400469 DOI: 10.1038/s41467-023-39531-0] [Received: 02/22/2023] [Accepted: 06/13/2023] [Indexed: 07/05/2023]
Abstract
High-throughput experimentation (HTE) is an increasingly important tool in reaction discovery. While the hardware for running HTE in the chemical laboratory has evolved significantly in recent years, there remains a need for software solutions to navigate data-rich experiments. Here we have developed phactor™, a software that facilitates the performance and analysis of HTE in a chemical laboratory. phactor™ allows experimentalists to rapidly design arrays of chemical reactions or direct-to-biology experiments in 24, 96, 384, or 1,536 wellplates. Users can access online reagent data, such as a chemical inventory, to virtually populate wells with experiments and produce instructions to perform the reaction array manually, or with the assistance of a liquid handling robot. After completion of the reaction array, analytical results can be uploaded for facile evaluation, and to guide the next series of experiments. All chemical data, metadata, and results are stored in machine-readable formats that are readily translatable to various software. We also demonstrate the use of phactor™ in the discovery of several chemistries, including the identification of a low micromolar inhibitor of the SARS-CoV-2 main protease. Furthermore, phactor™ has been made available for free academic use in 24- and 96-well formats via an online interface.
Affiliation(s)
- Babak Mahjour
- Department of Medicinal Chemistry, University of Michigan, Ann Arbor, MI, USA
| | - Rui Zhang
- Department of Chemistry, University of Michigan, Ann Arbor, MI, USA
| | - Yuning Shen
- Department of Medicinal Chemistry, University of Michigan, Ann Arbor, MI, USA
| | - Andrew McGrath
- Department of Medicinal Chemistry, University of Michigan, Ann Arbor, MI, USA
| | - Ruheng Zhao
- Department of Medicinal Chemistry, University of Michigan, Ann Arbor, MI, USA
| | - Osama G Mohamed
- Natural Products Discovery Core, Life Sciences Institute, University of Michigan, Ann Arbor, MI, USA
| | - Yingfu Lin
- Department of Medicinal Chemistry, University of Michigan, Ann Arbor, MI, USA
| | - Zirong Zhang
- Department of Medicinal Chemistry, University of Michigan, Ann Arbor, MI, USA
| | - James L Douthwaite
- Department of Medicinal Chemistry, University of Michigan, Ann Arbor, MI, USA
| | - Ashootosh Tripathi
- Department of Medicinal Chemistry, University of Michigan, Ann Arbor, MI, USA
- Natural Products Discovery Core, Life Sciences Institute, University of Michigan, Ann Arbor, MI, USA
| | - Tim Cernak
- Department of Medicinal Chemistry, University of Michigan, Ann Arbor, MI, USA.
- Department of Chemistry, University of Michigan, Ann Arbor, MI, USA.
| |
|
12
|
Li SW, Xu LC, Zhang C, Zhang SQ, Hong X. Reaction performance prediction with an extrapolative and interpretable graph model based on chemical knowledge. Nat Commun 2023; 14:3569. [PMID: 37322041 DOI: 10.1038/s41467-023-39283-x] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 09/14/2022] [Accepted: 05/31/2023] [Indexed: 06/17/2023] Open
Abstract
Accurate prediction of reactivity and selectivity provides the desired guideline for synthetic development. Due to the high-dimensional relationship between molecular structure and synthetic function, it is challenging to achieve the predictive modelling of synthetic transformation with the required extrapolative ability and chemical interpretability. To meet the gap between the rich domain knowledge of chemistry and the advanced molecular graph model, herein we report a knowledge-based graph model that embeds the digitalized steric and electronic information. In addition, a molecular interaction module is developed to enable the learning of the synergistic influence of reaction components. In this study, we demonstrate that this knowledge-based graph model achieves excellent predictions of reaction yield and stereoselectivity, whose extrapolative ability is corroborated by additional scaffold-based data splittings and experimental verifications with new catalysts. Because of the embedding of local environment, the model allows the atomic level of interpretation of the steric and electronic influence on the overall synthetic performance, which serves as a useful guide for the molecular engineering towards the target synthetic function. This model offers an extrapolative and interpretable approach for reaction performance prediction, pointing out the importance of chemical knowledge-constrained reaction modelling for synthetic purpose.
Affiliation(s)
- Shu-Wen Li
- Center of Chemistry for Frontier Technologies, Department of Chemistry, State Key Laboratory of Clean Energy Utilization, Zhejiang University, Hangzhou, 310027, China
| | - Li-Cheng Xu
- Center of Chemistry for Frontier Technologies, Department of Chemistry, State Key Laboratory of Clean Energy Utilization, Zhejiang University, Hangzhou, 310027, China
| | - Cheng Zhang
- Department of Chemistry, University of Science and Technology of China, Hefei, China
| | - Shuo-Qing Zhang
- Center of Chemistry for Frontier Technologies, Department of Chemistry, State Key Laboratory of Clean Energy Utilization, Zhejiang University, Hangzhou, 310027, China.
| | - Xin Hong
- Center of Chemistry for Frontier Technologies, Department of Chemistry, State Key Laboratory of Clean Energy Utilization, Zhejiang University, Hangzhou, 310027, China.
- Beijing National Laboratory for Molecular Sciences, Zhongguancun North First Street No. 2, Beijing, 100190, PR China.
- Key Laboratory of Precise Synthesis of Functional Molecules of Zhejiang Province, School of Science, Westlake University, 18 Shilongshan Road, Hangzhou, 310024, Zhejiang Province, China.
| |
|
13
|
Zhang ZJ, Li SW, Oliveira JCA, Li Y, Chen X, Zhang SQ, Xu LC, Rogge T, Hong X, Ackermann L. Data-driven design of new chiral carboxylic acid for construction of indoles with C-central and C-N axial chirality via cobalt catalysis. Nat Commun 2023; 14:3149. [PMID: 37258542 DOI: 10.1038/s41467-023-38872-0] [Citation(s) in RCA: 4] [Impact Index Per Article: 4.0] [Received: 12/09/2022] [Accepted: 05/16/2023] [Indexed: 06/02/2023] Open
Abstract
Challenging enantio- and diastereoselective cobalt-catalyzed C-H alkylation has been realized by an innovative data-driven knowledge transfer strategy. Harnessing the statistics of a related transformation as the knowledge source, the designed machine learning (ML) model took advantage of delta learning and enabled accurate and extrapolative enantioselectivity predictions. Powered by the knowledge transfer model, the virtual screening of a broad scope of 360 chiral carboxylic acids led to the discovery of a new catalyst featuring an intriguing furyl moiety. Further experiments verified that the predicted chiral carboxylic acid can achieve excellent stereochemical control for the target C-H alkylation, which supported the expedient synthesis for a large library of substituted indoles with C-central and C-N axial chirality. The reported machine learning approach provides a powerful data engine to accelerate the discovery of molecular catalysis by harnessing the hidden value of the available structure-performance statistics.
Affiliation(s)
- Zi-Jing Zhang
- Institut für Organische und Biomolekulare Chemie, Georg-August-Universität Göttingen, Tammannstraße 2, 37077, Göttingen, Germany
| | - Shu-Wen Li
- Center of Chemistry for Frontier Technologies, Department of Chemistry, State Key Laboratory of Clean Energy Utilization, Zhejiang University, Hangzhou, 310027, PR China
| | - João C A Oliveira
- Institut für Organische und Biomolekulare Chemie, Georg-August-Universität Göttingen, Tammannstraße 2, 37077, Göttingen, Germany
| | - Yanjun Li
- Institut für Organische und Biomolekulare Chemie, Georg-August-Universität Göttingen, Tammannstraße 2, 37077, Göttingen, Germany
| | - Xinran Chen
- Institut für Organische und Biomolekulare Chemie, Georg-August-Universität Göttingen, Tammannstraße 2, 37077, Göttingen, Germany
- Center of Chemistry for Frontier Technologies, Department of Chemistry, State Key Laboratory of Clean Energy Utilization, Zhejiang University, Hangzhou, 310027, PR China
| | - Shuo-Qing Zhang
- Center of Chemistry for Frontier Technologies, Department of Chemistry, State Key Laboratory of Clean Energy Utilization, Zhejiang University, Hangzhou, 310027, PR China
| | - Li-Cheng Xu
- Center of Chemistry for Frontier Technologies, Department of Chemistry, State Key Laboratory of Clean Energy Utilization, Zhejiang University, Hangzhou, 310027, PR China
| | - Torben Rogge
- Institut für Organische und Biomolekulare Chemie, Georg-August-Universität Göttingen, Tammannstraße 2, 37077, Göttingen, Germany
| | - Xin Hong
- Center of Chemistry for Frontier Technologies, Department of Chemistry, State Key Laboratory of Clean Energy Utilization, Zhejiang University, Hangzhou, 310027, PR China.
- Beijing National Laboratory for Molecular Sciences, Zhongguancun North First Street No. 2, Beijing, 100190, PR China.
- Key Laboratory of Precise Synthesis of Functional Molecules of Zhejiang Province, School of Science, Westlake University, 18 Shilongshan Road, Hangzhou, 310024, Zhejiang Province, PR China.
| | - Lutz Ackermann
- Institut für Organische und Biomolekulare Chemie, Georg-August-Universität Göttingen, Tammannstraße 2, 37077, Göttingen, Germany.
- Wöhler Research Institute for Sustainable Chemistry (WISCh), Georg-August-Universität Göttingen, Tammannstraße 2, 37077, Göttingen, Germany.
| |
|
14
|
Zhou Z, Eden M, Shen W. Treat Molecular Linear Notations as Sentences: Accurate Quantitative Structure–Property Relationship Modeling via a Natural Language Processing Approach. Ind Eng Chem Res 2023. [DOI: 10.1021/acs.iecr.2c04070] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 03/31/2023]
|
15
|
Alegre‐Requena JV, Sowndarya S. V. S, Pérez‐Soto R, Alturaifi TM, Paton RS. AQME: Automated quantum mechanical environments for researchers and educators. WIRES COMPUTATIONAL MOLECULAR SCIENCE 2023. [DOI: 10.1002/wcms.1663] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 03/01/2023]
Affiliation(s)
- Juan V. Alegre‐Requena
- Dpto. de Química Inorgánica Instituto de Síntesis Química y Catálisis Homogénea (ISQCH) CSIC‐Universidad de Zaragoza Zaragoza Spain
| | | | - Raúl Pérez‐Soto
- Department of Chemistry Colorado State University Fort Collins Colorado USA
| | - Turki M. Alturaifi
- Department of Chemistry Colorado State University Fort Collins Colorado USA
| | - Robert S. Paton
- Department of Chemistry Colorado State University Fort Collins Colorado USA
| |
|
16
|
Kuang Y, Lai J, Reid JP. Transferrable selectivity profiles enable prediction in synergistic catalyst space. Chem Sci 2023; 14:1885-1895. [PMID: 36819850 PMCID: PMC9931051 DOI: 10.1039/d2sc05974f] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/28/2022] [Accepted: 01/10/2023] [Indexed: 01/19/2023] Open
Abstract
Organometallic intermediates participate in many multi-catalytic enantioselective transformations directed by a chiral catalyst, but the requirement of optimizing two catalyst components is a significant barrier to widely adopting this approach for chiral molecule synthesis. Algorithms can potentially accelerate the screening process by developing quantitative structure-function relationships from large experimental datasets. However, the chemical data available in this catalyst space is limited. Herein, we report a data-driven strategy that effectively translates selectivity relationships trained on enantioselectivity outcomes derived from one catalyst reaction systems where an abundance of data exists, to synergistic catalyst space. We describe three case studies involving different modes of catalysis (Brønsted acid, chiral anion, and secondary amine) that substantiate the prospect of this approach to predict and elucidate selectivity in reactions where more than one catalyst is involved. Ultimately, the success in applying our approach to diverse areas of asymmetric catalysis implies that this general workflow should find broad use in the study and development of new enantioselective, multi-catalytic processes.
Affiliation(s)
- Yutao Kuang
- Department of Chemistry, University of British Columbia 2036 Main Mall, Vancouver British Columbia V6T 1Z1 Canada
| | - Junshan Lai
- Department of Chemistry, University of British Columbia 2036 Main Mall, Vancouver British Columbia V6T 1Z1 Canada
| | - Jolene P. Reid
- Department of Chemistry, University of British Columbia 2036 Main Mall, Vancouver British Columbia V6T 1Z1 Canada
| |
|
17
|
Singh S, Sunoj RB. Molecular Machine Learning for Chemical Catalysis: Prospects and Challenges. Acc Chem Res 2023; 56:402-412. [PMID: 36715248 DOI: 10.1021/acs.accounts.2c00801] [Citation(s) in RCA: 7] [Impact Index Per Article: 7.0] [Indexed: 01/31/2023]
Abstract
Conspectus. In the domain of reaction development, one aims to obtain higher efficacies as measured in terms of yield and/or selectivities. During the empirical cycles, an admixture of outcomes from low to high yields/selectivities is expected. While it is not easy to identify all of the factors that might impact the reaction efficiency, complex and nonlinear dependence on the nature of reactants, catalysts, solvents, etc. is quite likely. Developmental stages of newer reactions would typically offer a few hundreds of samples with variations in participating molecules and/or reaction conditions. These "observations" and their "output" can be harnessed as valuable labeled data for developing molecular machine learning (ML) models. Once a robust ML model is built for a specific reaction under development, it can predict the reaction outcome for any new choice of substrates/catalyst in a few seconds/minutes and thus can expedite the identification of promising candidates for experimental validation. Recent years have witnessed impressive applications of ML in the molecular world, most of them aimed at predicting important chemical or biological properties. We believe that an integration of effective ML workflows can be made richly beneficial to reaction discovery.
As with any new technology, direct adaptation of ML as used in well-developed domains, such as natural language processing (NLP) and image recognition, is unlikely to succeed in reaction discovery. Some of the challenges stem from ineffective featurization of the molecular space, unavailability of quality data and its distribution, in making the right choice of ML model and its technically robust deployment. It shall be noted that there is no universal ML model suitable for an inherently high-dimensional problem such as chemical reactions. Given these backgrounds, rendering ML tools conducive for reactions is an exciting as well as challenging endeavor at the same time. With the increased availability of efficient ML algorithms, we focused on tapping their potential for small-data reaction discovery (a few hundreds to thousands of samples).
In this Account, we describe both feature engineering and feature learning approaches for molecular ML as applied to diverse reactions of high contemporary interest. Among these, catalytic asymmetric hydrogenation of imines/alkenes, β-C(sp3)-H bond functionalization, and relay Heck reaction employed a feature engineering approach using the quantum-chemically derived physical organic descriptors as the molecular features, all designed to predict the enantioselectivity. The selection of molecular features to customize it for a reaction of interest is described, along with emphasizing the chemical insights that could be gathered through the use of such features. Feature learning methods for predicting the yield of Buchwald-Hartwig cross-coupling, deoxyfluorination of alcohols, and enantioselectivity of N,S-acetal formation are found to offer excellent predictions. We propose a transfer learning protocol, wherein an ML model such as a language model is trained on a large number of molecules (10^5-10^6) and fine-tuned on a focused library of target task reactions, as an effective alternative for small-data reaction discovery (10^2-10^3 reactions). The exploitation of deep neural network latent space as a method for generative tasks to identify useful substrates for a reaction is demonstrated as a promising strategy.
Affiliation(s)
- Sukriti Singh
- Department of Chemistry, Indian Institute of Technology Bombay, Mumbai 400076, India
| | - Raghavan B Sunoj
- Department of Chemistry, Indian Institute of Technology Bombay, Mumbai 400076, India
- Centre for Machine Intelligence and Data Science, Indian Institute of Technology Bombay, Mumbai 400076, India
| |
|
18
|
Zhang SQ, Xu LC, Li SW, Oliveira JCA, Li X, Ackermann L, Hong X. Bridging Chemical Knowledge and Machine Learning for Performance Prediction of Organic Synthesis. Chemistry 2023; 29:e202202834. [PMID: 36206170 PMCID: PMC10099903 DOI: 10.1002/chem.202202834] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/12/2022] [Indexed: 11/29/2022]
Abstract
Recent years have witnessed a boom of machine learning (ML) applications in chemistry, which reveals the potential of data-driven prediction of synthesis performance. Digitalization and ML modelling are the key strategies to fully exploit the unique potential within the synergistic interplay between experimental data and the robust prediction of performance and selectivity. A series of exciting studies have demonstrated the importance of chemical knowledge implementation in ML, which improves the model's capability for making predictions that are challenging and often go beyond the abilities of human beings. This Minireview summarizes the cutting-edge embedding techniques and model designs in synthetic performance prediction, elaborating how chemical knowledge can be incorporated into machine learning until June 2022. By merging organic synthesis tactics and chemical informatics, we hope this Review can provide a guide map and intrigue chemists to revisit the digitalization and computerization of organic chemistry principles.
Affiliation(s)
- Shuo-Qing Zhang
- Center of Chemistry for Frontier Technologies, Department of Chemistry, State Key Laboratory of Clean Energy Utilization, Zhejiang University, 38 Zheda Road, Hangzhou, 310027, P. R. China
| | - Li-Cheng Xu
- Center of Chemistry for Frontier Technologies, Department of Chemistry, State Key Laboratory of Clean Energy Utilization, Zhejiang University, 38 Zheda Road, Hangzhou, 310027, P. R. China
| | - Shu-Wen Li
- Center of Chemistry for Frontier Technologies, Department of Chemistry, State Key Laboratory of Clean Energy Utilization, Zhejiang University, 38 Zheda Road, Hangzhou, 310027, P. R. China
| | - João C A Oliveira
- Institut für Organische und Biomolekulare Chemie, Wöhler Research Institute for Sustainable Chemistry (WISCh), Georg-August-Universität, Tammannstraße 2, 37077, Göttingen, Germany
| | - Xin Li
- Center of Chemistry for Frontier Technologies, Department of Chemistry, State Key Laboratory of Clean Energy Utilization, Zhejiang University, 38 Zheda Road, Hangzhou, 310027, P. R. China
| | - Lutz Ackermann
- Institut für Organische und Biomolekulare Chemie, Wöhler Research Institute for Sustainable Chemistry (WISCh), Georg-August-Universität, Tammannstraße 2, 37077, Göttingen, Germany
| | - Xin Hong
- Center of Chemistry for Frontier Technologies, Department of Chemistry, State Key Laboratory of Clean Energy Utilization, Zhejiang University, 38 Zheda Road, Hangzhou, 310027, P. R. China
- Beijing National Laboratory for Molecular Sciences, Zhongguancun North First Street No. 2, Beijing, 100190, P. R. China
- Key Laboratory of Precise Synthesis of Functional Molecules of Zhejiang Province, School of Science, Westlake University, 18 Shilongshan Road, Hangzhou, 310024, Zhejiang Province, P. R. China
| |
|
19
|
Dotson JJ, van Dijk L, Timmerman JC, Grosslight S, Walroth RC, Gosselin F, Püntener K, Mack KA, Sigman MS. Data-Driven Multi-Objective Optimization Tactics for Catalytic Asymmetric Reactions Using Bisphosphine Ligands. J Am Chem Soc 2023; 145:110-121. [PMID: 36574729 DOI: 10.1021/jacs.2c08513] [Citation(s) in RCA: 10] [Impact Index Per Article: 10.0] [Indexed: 12/28/2022]
Abstract
Optimization of the catalyst structure to simultaneously improve multiple reaction objectives (e.g., yield, enantioselectivity, and regioselectivity) remains a formidable challenge. Herein, we describe a machine learning workflow for the multi-objective optimization of catalytic reactions that employ chiral bisphosphine ligands. This was demonstrated through the optimization of two sequential reactions required in the asymmetric synthesis of an active pharmaceutical ingredient. To accomplish this, a density functional theory-derived database of >550 bisphosphine ligands was constructed, and a designer chemical space mapping technique was established. The protocol used classification methods to identify active catalysts, followed by linear regression to model reaction selectivity. This led to the prediction and validation of significantly improved ligands for all reaction outputs, suggesting a general strategy that can be readily implemented for reaction optimizations where performance is controlled by bisphosphine ligands.
Affiliation(s)
- Jordan J Dotson
- Department of Chemistry, University of Utah, Salt Lake City, Utah 84112, United States
| | - Lucy van Dijk
- Department of Chemistry, University of Utah, Salt Lake City, Utah 84112, United States
| | - Jacob C Timmerman
- Department of Small Molecule Process Chemistry, Genentech, Inc., South San Francisco, California 94080, United States
| | - Samantha Grosslight
- Department of Chemistry, University of Utah, Salt Lake City, Utah 84112, United States
| | - Richard C Walroth
- Department of Small Molecule Process Chemistry, Genentech, Inc., South San Francisco, California 94080, United States
| | - Francis Gosselin
- Department of Small Molecule Process Chemistry, Genentech, Inc., South San Francisco, California 94080, United States
| | - Kurt Püntener
- Synthetic Molecules Technical Development, Process Chemistry & Catalysis, F. Hoffmann-La Roche Limited, CH-4070 Basel, Switzerland
| | - Kyle A Mack
- Department of Small Molecule Process Chemistry, Genentech, Inc., South San Francisco, California 94080, United States
| | - Matthew S Sigman
- Department of Chemistry, University of Utah, Salt Lake City, Utah 84112, United States
| |
|
20
|
Tu Z, Stuyver T, Coley CW. Predictive chemistry: machine learning for reaction deployment, reaction development, and reaction discovery. Chem Sci 2023; 14:226-244. [PMID: 36743887 PMCID: PMC9811563 DOI: 10.1039/d2sc05089g] [Citation(s) in RCA: 16] [Impact Index Per Article: 16.0] [Received: 09/12/2022] [Accepted: 11/25/2022] [Indexed: 11/29/2022] Open
Abstract
The field of predictive chemistry relates to the development of models able to describe how molecules interact and react. It encompasses the long-standing task of computer-aided retrosynthesis, but is far more reaching and ambitious in its goals. In this review, we summarize several areas where predictive chemistry models hold the potential to accelerate the deployment, development, and discovery of organic reactions and advance synthetic chemistry.
Affiliation(s)
- Zhengkai Tu
- Department of Electrical Engineering and Computer Science, Massachusetts Institute of Technology 77 Massachusetts Avenue Cambridge MA 02139 USA
| | - Thijs Stuyver
- Department of Chemical Engineering, Massachusetts Institute of Technology 77 Massachusetts Avenue Cambridge MA 02139 USA
| | - Connor W Coley
- Department of Electrical Engineering and Computer Science, Massachusetts Institute of Technology 77 Massachusetts Avenue Cambridge MA 02139 USA
- Department of Chemical Engineering, Massachusetts Institute of Technology 77 Massachusetts Avenue Cambridge MA 02139 USA
| |
|
21
|
Lu J, Paci I, Leitch DC. A broadly applicable quantitative relative reactivity model for nucleophilic aromatic substitution (SNAr) using simple descriptors. Chem Sci 2022; 13:12681-12695. [PMID: 36519044 PMCID: PMC9645419 DOI: 10.1039/d2sc04041g] [Citation(s) in RCA: 5] [Impact Index Per Article: 2.5] [Received: 07/19/2022] [Accepted: 10/17/2022] [Indexed: 07/22/2023] Open
Abstract
We report a multivariate linear regression model able to make accurate predictions for the relative rate and regioselectivity of nucleophilic aromatic substitution (SNAr) reactions based on the electrophile structure. This model uses a diverse training/test set from experimentally-determined relative SNAr rates between benzyl alcohol and 74 unique electrophiles, including heterocycles with multiple substitution patterns. There is a robust linear relationship between the experimental SNAr free energies of activation and three molecular descriptors that can be obtained computationally: the electron affinity (EA) of the electrophile; the average molecular electrostatic potential (ESP) at the carbon undergoing substitution; and the sum of average ESP values for the ortho and para atoms relative to the reactive center. Despite using only simple descriptors calculated from ground state wavefunctions, this model demonstrates excellent correlation with previously measured SNAr reaction rates, and is able to accurately predict site selectivity for multihalogenated substrates: 91% prediction accuracy across 82 individual examples. The excellent agreement between predicted and experimental outcomes makes this easy-to-implement reactivity model a potentially powerful tool for synthetic planning.
Affiliation(s)
- Jingru Lu
- Department of Chemistry, University of Victoria 3800 Finnerty Rd. Victoria BC CANADA V8P 5C2
| | - Irina Paci
- Department of Chemistry, University of Victoria 3800 Finnerty Rd. Victoria BC CANADA V8P 5C2
| | - David C Leitch
- Department of Chemistry, University of Victoria 3800 Finnerty Rd. Victoria BC CANADA V8P 5C2
| |
|
22
|
Krenn M, Ai Q, Barthel S, Carson N, Frei A, Frey NC, Friederich P, Gaudin T, Gayle AA, Jablonka KM, Lameiro RF, Lemm D, Lo A, Moosavi SM, Nápoles-Duarte JM, Nigam A, Pollice R, Rajan K, Schatzschneider U, Schwaller P, Skreta M, Smit B, Strieth-Kalthoff F, Sun C, Tom G, Falk von Rudorff G, Wang A, White AD, Young A, Yu R, Aspuru-Guzik A. SELFIES and the future of molecular string representations. PATTERNS (NEW YORK, N.Y.) 2022; 3:100588. [PMID: 36277819 PMCID: PMC9583042 DOI: 10.1016/j.patter.2022.100588] [Citation(s) in RCA: 32] [Impact Index Per Article: 16.0] [Indexed: 11/06/2022]
Abstract
Artificial intelligence (AI) and machine learning (ML) are expanding in popularity for broad applications to challenging tasks in chemistry and materials science. Examples include the prediction of properties, the discovery of new reaction pathways, or the design of new molecules. The machine needs to read and write fluently in a chemical language for each of these tasks. Strings are a common tool to represent molecular graphs, and the most popular molecular string representation, SMILES, has powered cheminformatics since the late 1980s. However, in the context of AI and ML in chemistry, SMILES has several shortcomings; most pertinently, most combinations of symbols lead to invalid results with no valid chemical interpretation. To overcome this issue, a new language for molecules was introduced in 2020 that guarantees 100% robustness: SELF-referencing embedded strings (SELFIES). SELFIES has since simplified and enabled numerous new applications in chemistry. In this perspective, we look to the future and discuss molecular string representations, along with their respective opportunities and challenges. We propose 16 concrete future projects for robust molecular representations. These involve the extension toward new chemical domains, exciting questions at the interface of AI and robust languages, and interpretability for both humans and machines. We hope that these proposals will inspire several follow-up works exploiting the full potential of molecular string representations for the future of AI in chemistry and materials science.
Affiliation(s)
- Mario Krenn
- Max Planck Institute for the Science of Light (MPL), Erlangen, Germany. Corresponding author
| | - Qianxiang Ai
- Department of Chemistry, Fordham University, The Bronx, NY, USA
| | - Senja Barthel
- Department of Mathematics, Vrije Universiteit Amsterdam, Amsterdam, the Netherlands
| | - Nessa Carson
- Syngenta Jealott’s Hill International Research Centre, Bracknell, Berkshire, UK
| | - Angelo Frei
- Department of Chemistry, Imperial College London, Molecular Sciences Research Hub, White City Campus, Wood Lane, London, UK
| | - Nathan C. Frey
- Massachusetts Institute of Technology, Cambridge, MA, USA
| | - Pascal Friederich
- Institute of Theoretical Informatics, Karlsruhe Institute of Technology, Karlsruhe, Germany
- Institute of Nanotechnology, Karlsruhe Institute of Technology, Eggenstein-Leopoldshafen, Germany
| | - Théophile Gaudin
- Department of Computer Science, University of Toronto, Toronto, ON, Canada
- IBM Research Europe, Zürich, Switzerland
| | | | - Kevin Maik Jablonka
- Laboratory of Molecular Simulation (LSMO), Institut des Sciences et Ingénierie Chimiques, Ecole Polytechnique Fédérale de Lausanne (EPFL), Sion, Valais, Switzerland
| | - Rafael F. Lameiro
- Medicinal and Biological Chemistry Group, São Carlos Institute of Chemistry, University of São Paulo, São Paulo, Brazil
| | - Dominik Lemm
- Faculty of Physics, University of Vienna, Vienna, Austria
| | - Alston Lo
- Department of Computer Science, University of Toronto, Toronto, ON, Canada
| | - Seyed Mohamad Moosavi
- Department of Mathematics and Computer Science, Freie Universität Berlin, Berlin, Germany
| | | | - AkshatKumar Nigam
- Department of Computer Science, Stanford University, Stanford, CA, USA
| | - Robert Pollice
- Department of Computer Science, University of Toronto, Toronto, ON, Canada
- Chemical Physics Theory Group, Department of Chemistry, University of Toronto, Toronto, ON, Canada
| | - Kohulan Rajan
- Institute for Inorganic and Analytical Chemistry, Friedrich-Schiller Universität Jena, Jena, Germany
| | - Ulrich Schatzschneider
- Institut für Anorganische Chemie, Julius-Maximilians-Universität Würzburg, Würzburg, Germany
| | - Philippe Schwaller
- IBM Research Europe, Zürich, Switzerland
- Laboratory of Artificial Chemical Intelligence (LIAC), Institut des Sciences et Ingénierie Chimiques, Ecole Polytechnique Fédérale de Lausanne (EPFL), Lausanne, Switzerland
- National Centre of Competence in Research (NCCR) Catalysis, Ecole Polytechnique Fédérale de Lausanne (EPFL), Lausanne, Switzerland
| | - Marta Skreta
- Department of Computer Science, University of Toronto, Toronto, ON, Canada; Vector Institute for Artificial Intelligence, Toronto, ON, Canada
| | - Berend Smit
- Laboratory of Molecular Simulation (LSMO), Institut des Sciences et Ingénierie Chimiques, Ecole Polytechnique Fédérale de Lausanne (EPFL), Sion, Valais, Switzerland
| | - Felix Strieth-Kalthoff
- Chemical Physics Theory Group, Department of Chemistry, University of Toronto, Toronto, ON, Canada
| | - Chong Sun
- Department of Computer Science, University of Toronto, Toronto, ON, Canada
| | - Gary Tom
- Department of Computer Science, University of Toronto, Toronto, ON, Canada; Chemical Physics Theory Group, Department of Chemistry, University of Toronto, Toronto, ON, Canada
| | | | - Andrew Wang
- Chemical Physics Theory Group, Department of Chemistry, University of Toronto, Toronto, ON, Canada; Solar Fuels Group, Department of Chemistry, University of Toronto, Toronto, ON, Canada
| | - Andrew D. White
- Department of Chemical Engineering, University of Rochester, Rochester, NY, USA
| | - Adamo Young
- Department of Computer Science, University of Toronto, Toronto, ON, Canada; Vector Institute for Artificial Intelligence, Toronto, ON, Canada
| | - Rose Yu
- Department of Computer Science and Engineering, University of California, San Diego, La Jolla, CA, USA
| | - Alán Aspuru-Guzik
- Department of Computer Science, University of Toronto, Toronto, ON, Canada; Chemical Physics Theory Group, Department of Chemistry, University of Toronto, Toronto, ON, Canada; Vector Institute for Artificial Intelligence, Toronto, ON, Canada; Department of Chemical Engineering and Applied Chemistry, University of Toronto, Toronto, ON, Canada; Department of Materials Science, University of Toronto, Toronto, ON, Canada; Canadian Institute for Advanced Research (CIFAR) Lebovic Fellow, Toronto, ON, Canada; Corresponding author
| |
|
23
|
Fedik N, Zubatyuk R, Kulichenko M, Lubbers N, Smith JS, Nebgen B, Messerly R, Li YW, Boldyrev AI, Barros K, Isayev O, Tretiak S. Extending machine learning beyond interatomic potentials for predicting molecular properties. Nat Rev Chem 2022; 6:653-672. [PMID: 37117713 DOI: 10.1038/s41570-022-00416-3] [Citation(s) in RCA: 25] [Impact Index Per Article: 12.5] [Accepted: 07/15/2022] [Indexed: 11/09/2022]
Abstract
Machine learning (ML) is becoming a method of choice for modelling complex chemical processes and materials. ML provides a surrogate model trained on a reference dataset that can be used to establish a relationship between a molecular structure and its chemical properties. This Review highlights developments in the use of ML to evaluate chemical properties such as partial atomic charges, dipole moments, spin and electron densities, and chemical bonding, as well as to obtain a reduced quantum-mechanical description. We overview several modern neural network architectures, their predictive capabilities, generality and transferability, and illustrate their applicability to various chemical properties. We emphasize that learned molecular representations resemble quantum-mechanical analogues, demonstrating the ability of the models to capture the underlying physics. We also discuss how ML models can describe non-local quantum effects. Finally, we conclude by compiling a list of available ML toolboxes, summarizing the unresolved challenges and presenting an outlook for future development. The observed trends demonstrate that this field is evolving towards physics-based models augmented by ML, which is accompanied by the development of new methods and the rapid growth of user-friendly ML frameworks for chemistry.
|
24
|
Boni YT, Cammarota RC, Liao K, Sigman MS, Davies HML. Leveraging Regio- and Stereoselective C(sp3)-H Functionalization of Silyl Ethers to Train a Logistic Regression Classification Model for Predicting Site-Selectivity Bias. J Am Chem Soc 2022; 144:15549-15561. [PMID: 35977100 DOI: 10.1021/jacs.2c04383] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/29/2022]
Abstract
The C-H functionalization of silyl ethers via carbene-induced C-H insertion represents an efficient synthetic disconnection strategy. In this work, site- and stereoselective C(sp3)-H functionalization at α, γ, δ, and even more distal positions to the siloxy group has been achieved using donor/acceptor carbene intermediates. By exploiting the predilections of Rh2(R-TCPTAD)4 and Rh2(S-2-Cl-5-BrTPCP)4 catalysts to target either more electronically activated or more spatially accessible C-H sites, respectively, divergent desired products can be formed with good diastereocontrol and enantiocontrol. Notably, the reaction can also be extended to enable desymmetrization of meso silyl ethers. Leveraging the broad substrate scope examined in this study, we have trained a machine learning classification model using logistic regression to predict the major C-H functionalization site based on intrinsic substrate reactivity and catalyst propensity for overriding it. This model enables prediction of the major product when applying these C-H functionalization methods to a new substrate of interest. Applying this model broadly, we have demonstrated its utility for guiding late-stage functionalization in complex settings and developed an intuitive visualization tool to assist synthetic chemists in such endeavors.
Affiliation(s)
- Yannick T Boni
- Department of Chemistry, Emory University, 1515 Dickey Drive, Atlanta, Georgia 30322, United States
| | - Ryan C Cammarota
- Department of Chemistry, University of Utah, 315 South 1400 East, Salt Lake City, Utah 84112, United States
| | - Kuangbiao Liao
- Department of Chemistry, Emory University, 1515 Dickey Drive, Atlanta, Georgia 30322, United States
| | - Matthew S Sigman
- Department of Chemistry, University of Utah, 315 South 1400 East, Salt Lake City, Utah 84112, United States
| | - Huw M L Davies
- Department of Chemistry, Emory University, 1515 Dickey Drive, Atlanta, Georgia 30322, United States
| |
|
25
|
S. V. SS, Law JN, Tripp CE, Duplyakin D, Skordilis E, Biagioni D, Paton RS, St. John PC. Multi-objective goal-directed optimization of de novo stable organic radicals for aqueous redox flow batteries. NAT MACH INTELL 2022. [DOI: 10.1038/s42256-022-00506-3] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Indexed: 01/10/2023]
Abstract
Advances in the field of goal-directed molecular optimization offer the promise of finding feasible candidates for even the most challenging molecular design applications. One example of a fundamental design challenge is the search for novel stable radical scaffolds for an aqueous redox flow battery that simultaneously satisfy redox requirements at the anode and cathode, as relatively few stable organic radicals are known to exist. To meet this challenge, we develop a new open-source molecular optimization framework based on AlphaZero coupled with a fast, machine-learning-derived surrogate objective trained with nearly 100,000 quantum chemistry simulations. The objective function comprises two graph neural networks: one that predicts adiabatic oxidation and reduction potentials and a second that predicts electron density and local three-dimensional environment, previously shown to be correlated with radical persistence and stability. With no hard-coded knowledge of organic chemistry, the reinforcement learning agent finds molecule candidates that satisfy a precise combination of redox, stability and synthesizability requirements defined at the quantum chemistry level, many of which have reasonable predicted retrosynthetic pathways. The optimized molecules show that alternative stable radical scaffolds may offer a unique profile of stability and redox potentials to enable low-cost symmetric aqueous redox flow batteries.
|
26
|
When machine learning meets molecular synthesis. TRENDS IN CHEMISTRY 2022. [DOI: 10.1016/j.trechm.2022.07.005] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/19/2022]
|
27
|
Lustosa DM, Milo A. Mechanistic Inference from Statistical Models at Different Data-Size Regimes. ACS Catal 2022. [DOI: 10.1021/acscatal.2c01741] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/29/2022]
Affiliation(s)
- Danilo M. Lustosa
- Department of Chemistry, Ben-Gurion University of the Negev, Beer Sheva 84105, Israel
| | - Anat Milo
- Department of Chemistry, Ben-Gurion University of the Negev, Beer Sheva 84105, Israel
| |
|
28
|
Lan J, Li X, Yang Y, Zhang X, Chung LW. New Insights and Predictions into Complex Homogeneous Reactions Enabled by Computational Chemistry in Synergy with Experiments: Isotopes and Mechanisms. Acc Chem Res 2022; 55:1109-1123. [PMID: 35385649 DOI: 10.1021/acs.accounts.1c00774] [Citation(s) in RCA: 5] [Impact Index Per Article: 2.5] [Indexed: 12/20/2022]
Abstract
Homogeneous catalysis and biocatalysis have been widely applied in synthetic, medicinal, and energy chemistry as well as synthetic biology. Driven by developments of new computational chemistry methods and better computer hardware, computational chemistry has become an essentially indispensable mechanistic "instrument" to help understand structures and decipher reaction mechanisms in catalysis. In addition, synergy between computational and experimental chemistry deepens our mechanistic understanding, which further promotes the rational design of new catalysts. In this Account, we summarize new or deeper mechanistic insights (including isotope, dispersion, and dynamical effects) into several complex homogeneous reactions from our systematic computational studies along with subsequent experimental studies by different groups. Apart from uncovering new mechanisms in some reactions, a few computational predictions (such as excited-state heavy-atom tunneling, steric-controlled enantioswitching, and a new geminal addition mechanism) based on our mechanistic insights were further verified by ensuing experiments. The Zimmerman group developed a photoinduced triplet di-π-methane rearrangement to form cyclopropane derivatives. Recently, our computational study predicted the first excited-state heavy-atom (carbon) quantum tunneling in one triplet di-π-methane rearrangement, in which the reaction rates and 12C/13C kinetic isotope effects (KIEs) can be enhanced by quantum tunneling at low temperatures. This unprecedented excited-state heavy-atom tunneling in a photoinduced reaction has recently been verified by an experimental 12C/13C KIE study by the Singleton group. Such combined computational and experimental studies should open up opportunities to discover more rare excited-state heavy-atom tunneling in other photoinduced reactions.
In addition, we found unexpectedly large secondary KIE values in the five-coordinate Fe(III)-catalyzed hetero-Diels-Alder pathway, even with substantial C-C bond formation, due to the non-negligible equilibrium isotope effect (EIE) derived from altered metal coordination. Therefore, these KIE values cannot reliably reflect transition-state structures for the five-coordinate metal pathway. Furthermore, our density functional theory (DFT) quasi-classical molecular dynamics (MD) simulations demonstrated that the coordination mode and/or spin state of the iron metal as well as an electric field can affect the dynamics of this reaction (e.g., the dynamically stepwise process, the entrance/exit reaction channels). Moreover, we unveiled a new reaction mechanism to account for the uncommon Ru(II)-catalyzed geminal-addition semihydrogenation and hydroboration of silyl alkynes. Our proposed key gem-Ru(II)-carbene intermediates derived from double migrations on the same alkyne carbon were verified by crossover experiments. Additionally, our DFT MD simulations suggested that the first hydrogen migration transition-state structures may directly and quickly form the key gem-Ru-carbene structures, thereby "bypassing" the second migration step. Furthermore, our extensive study revealed the origin of the enantioselectivity of the Cu(I)-catalyzed 1,3-dipolar cycloaddition of azomethine ylides with β-substituted alkenyl bicyclic heteroarenes enabled by dual coordination of both substrates. Such mechanistic insights promoted our computational predictions of the enantioselectivity reversal for the corresponding monocyclic heteroarene substrates and the regiospecific addition to the less reactive internal C═C bond of one diene substrate. These predictions were proven by our experimental collaborators. Finally, our mechanistic insights into a few other reactions are also presented.
Overall, we hope that these interactive computational and experimental studies enrich our mechanistic understanding and aid in reaction development.
Affiliation(s)
- Jialing Lan
- School of Chemistry and Chemical Engineering, Harbin Institute of Technology, Harbin 150001, China
- Shenzhen Grubbs Institute, Department of Chemistry, and Guangdong Provincial Key Laboratory of Catalysis, Southern University of Science and Technology (SUSTech), Shenzhen 518055, China
| | - Xin Li
- Shenzhen Grubbs Institute, Department of Chemistry, and Guangdong Provincial Key Laboratory of Catalysis, Southern University of Science and Technology (SUSTech), Shenzhen 518055, China
| | - Yuhong Yang
- Shenzhen Grubbs Institute, Department of Chemistry, and Guangdong Provincial Key Laboratory of Catalysis, Southern University of Science and Technology (SUSTech), Shenzhen 518055, China
| | - Xiaoyong Zhang
- Shenzhen Grubbs Institute, Department of Chemistry, and Guangdong Provincial Key Laboratory of Catalysis, Southern University of Science and Technology (SUSTech), Shenzhen 518055, China
| | - Lung Wa Chung
- Shenzhen Grubbs Institute, Department of Chemistry, and Guangdong Provincial Key Laboratory of Catalysis, Southern University of Science and Technology (SUSTech), Shenzhen 518055, China
| |
|
29
|
Stuyver T, Coley CW. Quantum chemistry-augmented neural networks for reactivity prediction: Performance, generalizability, and explainability. J Chem Phys 2022; 156:084104. [DOI: 10.1063/5.0079574] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Indexed: 01/07/2023] Open
Abstract
There is a perceived dichotomy between structure-based and descriptor-based molecular representations used for predictive chemistry tasks. Here, we study the performance, generalizability, and explainability of the quantum mechanics-augmented graph neural network (ml-QM-GNN) architecture as applied to the prediction of regioselectivity (classification) and of activation energies (regression). In our hybrid QM-augmented model architecture, structure-based representations are first used to predict a set of atom- and bond-level reactivity descriptors derived from density functional theory calculations. These estimated reactivity descriptors are combined with the original structure-based representation to make the final reactivity prediction. We demonstrate that our model architecture leads to significant improvements over structure-based GNNs in not only overall accuracy but also in generalization to unseen compounds. Even when provided training sets of only a couple hundred labeled data points, the ml-QM-GNN outperforms other state-of-the-art structure-based architectures that have been applied to these tasks as well as descriptor-based (linear) regressions. As a primary contribution of this work, we demonstrate a bridge between data-driven predictions and conceptual frameworks commonly used to gain qualitative insights into reactivity phenomena, taking advantage of the fact that our models are grounded in (but not restricted to) QM descriptors. This effort results in a productive synergy between theory and data science, wherein QM-augmented models provide a data-driven confirmation of previous qualitative analyses, and these analyses in turn facilitate insights into the decision-making process occurring within ml-QM-GNNs.
Affiliation(s)
- Thijs Stuyver
- Department of Chemical Engineering, Massachusetts Institute of Technology, 77 Massachusetts Avenue, Cambridge, Massachusetts 02139, USA
| | - Connor W. Coley
- Department of Chemical Engineering, Massachusetts Institute of Technology, 77 Massachusetts Avenue, Cambridge, Massachusetts 02139, USA
- Department of Electrical Engineering and Computer Science, Massachusetts Institute of Technology, 77 Massachusetts Avenue, Cambridge, Massachusetts 02139, USA
| |
|
30
|
Wigh DS, Goodman JM, Lapkin AA. A review of molecular representation in the age of machine learning. WIRES COMPUTATIONAL MOLECULAR SCIENCE 2022. [DOI: 10.1002/wcms.1603] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Indexed: 01/08/2023]
Affiliation(s)
- Daniel S. Wigh
- Department of Chemical Engineering and Biotechnology, University of Cambridge, Cambridge, UK
| | | | - Alexei A. Lapkin
- Department of Chemical Engineering and Biotechnology, University of Cambridge, Cambridge, UK
| |
|
31
|
Li Y, Zhang J, Zhao X, Wang Y. Exploring the chemistry of E/Z configuration in gold-catalyzed domino cyclization: Insights on the stereoselectivity. MOLECULAR CATALYSIS 2022. [DOI: 10.1016/j.mcat.2022.112154] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Indexed: 12/18/2022]
|
32
|
Lai J, Reid JP. Interrogating the Thionium Hydrogen Bond as a Noncovalent Stereocontrolling Interaction in Chiral Phosphate Catalysis. Chem Sci 2022; 13:11065-11073. [PMID: 36320465 PMCID: PMC9516887 DOI: 10.1039/d2sc02171d] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/18/2022] [Accepted: 08/15/2022] [Indexed: 12/04/2022] Open
Abstract
CH⋯O bonds are a privileged noncovalent interaction determining the energies and geometries of a large number of structures. In catalytic settings, these are invoked as a decisive feature controlling many asymmetric transformations involving aldehydes. However, little is known about their stereochemical role when the interaction involves other substrate types. We report the results of computations that show for the first time thionium hydrogen bonds to be an important noncovalent interaction in asymmetric catalysis. As a validating case study, we explored an asymmetric Pummerer rearrangement involving thionium intermediates to yield enantioenriched N,S-acetals under BINOL-derived chiral phosphate catalysis. DFT and QM/MM hybrid calculations showed that the lowest energy pathway corresponded to a transition state involving two hydrogen bonding interactions from the thionium intermediate to the catalyst. However, the enantiomer resulting from this process differed from the originally published absolute configuration. Experimental determination of the absolute configuration resolved this conflict in favor of our calculations. The reaction features required for enantioselectivity were further interrogated by statistical modeling analysis that utilized bespoke featurization techniques to enable the translation of enantioselectivity trends from intermolecular reactions to those proceeding intramolecularly. Through this suite of computational modeling techniques, a new model is revealed that provides a different explanation for the product outcome and enabled reassignment of the absolute product configuration. Transferable selectivity profiles allow data from intermolecular reactions using iminium substrates to be applied to predict intramolecular reactions involving thioniums.
Affiliation(s)
- Junshan Lai
- Department of Chemistry, University of British Columbia, 2036 Main Mall, Vancouver, British Columbia V6T 1Z1, Canada
| | - Jolene P Reid
- Department of Chemistry, University of British Columbia, 2036 Main Mall, Vancouver, British Columbia V6T 1Z1, Canada
| |
|
33
|
Lu J, Donnecke S, Paci I, Leitch DC. A reactivity model for oxidative addition to palladium enables quantitative predictions for catalytic cross-coupling reactions. Chem Sci 2022; 13:3477-3488. [PMID: 35432873 PMCID: PMC8943861 DOI: 10.1039/d2sc00174h] [Citation(s) in RCA: 5] [Impact Index Per Article: 2.5] [Received: 01/11/2022] [Accepted: 02/28/2022] [Indexed: 11/21/2022] Open
Abstract
Making accurate, quantitative predictions of chemical reactivity based on molecular structure is an unsolved problem in chemical synthesis, particularly for complex molecules. We report an approach to reactivity prediction for...
Affiliation(s)
- Jingru Lu
- Department of Chemistry, University of Victoria, 3800 Finnerty Rd, Victoria, BC V8P 5C2, Canada
| | - Sofia Donnecke
- Department of Chemistry, University of Victoria, 3800 Finnerty Rd, Victoria, BC V8P 5C2, Canada
| | - Irina Paci
- Department of Chemistry, University of Victoria, 3800 Finnerty Rd, Victoria, BC V8P 5C2, Canada
| | - David C Leitch
- Department of Chemistry, University of Victoria, 3800 Finnerty Rd, Victoria, BC V8P 5C2, Canada
| |
|
34
|
Cencer MM, Moore JS, Assary RS. Machine learning for polymeric materials: an introduction. POLYM INT 2021. [DOI: 10.1002/pi.6345] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Indexed: 01/11/2023]
Affiliation(s)
- Morgan M Cencer
- Department of Chemistry, University of Illinois at Urbana‐Champaign, Urbana, IL, USA
- Materials Science Division, Argonne National Laboratory, Lemont, IL, USA
- Beckman Institute for Advanced Science and Technology, University of Illinois at Urbana‐Champaign, Urbana, IL, USA
| | - Jeffrey S Moore
- Department of Chemistry, University of Illinois at Urbana‐Champaign, Urbana, IL, USA
- Beckman Institute for Advanced Science and Technology, University of Illinois at Urbana‐Champaign, Urbana, IL, USA
| | - Rajeev S Assary
- Materials Science Division, Argonne National Laboratory, Lemont, IL, USA
| |
|
35
|
Lu H, Kang X, Luo Y. Structure-Based Relative Energy Prediction Model: A Case Study of Pd(II)-Catalyzed Ethylene Polymerization and the Electronic Effect of Ancillary Ligands. J Phys Chem B 2021; 125:12047-12053. [PMID: 34694809 DOI: 10.1021/acs.jpcb.1c05143] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Indexed: 11/30/2022]
Abstract
Rapidly mapping a reaction energy profile to understand the reaction mechanism is of great importance and highly desired for the discovery of new chemical reactions. Herein, a combination of density functional theory (DFT) calculations and regression analysis has been applied to construct quantitative structures-based energy prediction models, considering Pd(II)-catalyzed ethylene polymerization as an example, for rapid construction of the reaction energy profile. It is inspiring that only geometrical parameters of the reaction center of one species are capable of predicting the whole energy profile with high accuracy. The reaction energies of ethylene insertion and β-H elimination, which directly correlate with polymerization activity and the possibility of branch formation, were studied to elucidate the electronic effects of ancillary ligands. Further analyses of these models from the statistical and chemical points of view afforded useful information on the design of the catalyst ligand. The current work is expected to methodologically shed new light on rapidly mapping the energy profile of chemical reactions and further provide useful information for the development of the reactions.
Affiliation(s)
- Han Lu
- State Key Laboratory of Fine Chemicals, School of Chemical Engineering, Dalian University of Technology, Dalian 116024, China
| | - Xiaohui Kang
- College of Pharmacy, Dalian Medical University, Dalian 116044, China
| | - Yi Luo
- State Key Laboratory of Fine Chemicals, School of Chemical Engineering, Dalian University of Technology, Dalian 116024, China; PetroChina Petrochemical Research Institute, Beijing 102206, China
| |
|
36
|
Sowndarya S V S, St John PC, Paton RS. A quantitative metric for organic radical stability and persistence using thermodynamic and kinetic features. Chem Sci 2021; 12:13158-13166. [PMID: 34745547 PMCID: PMC8514092 DOI: 10.1039/d1sc02770k] [Citation(s) in RCA: 22] [Impact Index Per Article: 7.3] [Received: 05/20/2021] [Accepted: 09/03/2021] [Indexed: 01/04/2023] Open
Abstract
Long-lived organic radicals are promising candidates for the development of high-performance energy solutions such as organic redox batteries, transistors, and light-emitting diodes. However, “stable” organic radicals that remain unreactive for an extended time and that can be stored and handled under ambient conditions are rare. A necessary but not sufficient condition for organic radical stability is the presence of thermodynamic stabilization, such as conjugation with an adjacent π-bond or lone-pair, or hyperconjugation with a σ-bond. However, thermodynamic factors alone do not result in radicals with extended lifetimes: many resonance-stabilized radicals are transient species that exist for less than a millisecond. Kinetic stabilization is also necessary for persistence, such as steric effects that inhibit radical dimerization or reaction with solvent molecules. We describe a quantitative approach to map organic radical stability, using molecular descriptors intended to capture thermodynamic and kinetic considerations. The comparison of an extensive dataset of quantum chemical calculations of organic radicals with experimentally-known stable radical species reveals a region of this feature space where long-lived radicals are located. These descriptors, based upon maximum spin density and buried volume, are combined into a single metric, the radical stability score, that outperforms thermodynamic scales based on bond dissociation enthalpies in identifying remarkably long-lived radicals. This provides an objective and accessible metric for use in future molecular design and optimization campaigns. We demonstrate this approach in identifying Pareto-optimal candidates for stable organic radicals. Molecular descriptors encoding kinetic and thermodynamic stabilization capture the difference between transient and persistent organic radicals.
Affiliation(s)
- Shree Sowndarya S V
- Department of Chemistry, Colorado State University, Fort Collins, CO 80523, USA
| | - Peter C St John
- Biosciences Center, National Renewable Energy Laboratory, Golden, CO 80401, USA
| | - Robert S Paton
- Department of Chemistry, Colorado State University, Fort Collins, CO 80523, USA
| |
|
37
|
Tantillo DJ, Laconsay CJ. Melding of Experiment and Theory Illuminates Mechanisms of Metal-Catalyzed Rearrangements: Computational Approaches and Caveats. SYNTHESIS-STUTTGART 2021. [DOI: 10.1055/s-0040-1720451] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/20/2022]
Abstract
This review summarizes approaches and caveats in computational modeling of transition-metal-catalyzed sigmatropic rearrangements involving carbene transfer. We highlight contemporary examples of combined synthetic and theoretical investigations that showcase the synergy achievable by integrating experiment and theory.
1 Introduction
2 Mechanistic Models
3 Theoretical Approaches and Caveats
3.1 Recommended Computational Tools
3.2 Choice of Functional and Basis Set
3.3 Conformations and Ligand-Binding Modes
3.4 Solvation
4 Synergy of Experiment and Theory – Case Studies
4.1 Metal-Bound or Free Ylides?
4.2 Conformations and Ligand-Binding Modes of Paddlewheel Complexes
4.3 No Metal, Just Light
4.4 How To ‘Cope’ with Nonstatistical Dynamic Effects
5 Outlook
|
38
|
Guan Y, Shree Sowndarya SV, Gallegos LC, St John PC, Paton RS. Real-time prediction of 1H and 13C chemical shifts with DFT accuracy using a 3D graph neural network. Chem Sci 2021; 12:12012-12026. [PMID: 34667567 PMCID: PMC8457395 DOI: 10.1039/d1sc03343c] [Citation(s) in RCA: 46] [Impact Index Per Article: 15.3] [Received: 06/20/2021] [Accepted: 07/19/2021] [Indexed: 11/23/2022] Open
Abstract
Nuclear magnetic resonance (NMR) is one of the primary techniques used to elucidate the chemical structure, bonding, stereochemistry, and conformation of organic compounds. The distinct chemical shifts in an NMR spectrum depend upon each atom's local chemical environment and are influenced by both through-bond and through-space interactions with other atoms and functional groups. The in silico prediction of NMR chemical shifts using quantum mechanical (QM) calculations is now commonplace in aiding organic structural assignment since spectra can be computed for several candidate structures and then compared with experimental values to find the best possible match. However, the computational demands of calculating multiple structural- and stereo-isomers, each of which may typically exist as an ensemble of rapidly-interconverting conformations, are expensive. Additionally, the QM predictions themselves may lack sufficient accuracy to identify a correct structure. In this work, we address both of these shortcomings by developing a rapid machine learning (ML) protocol to predict 1H and 13C chemical shifts through an efficient graph neural network (GNN) using 3D structures as input. Transfer learning with experimental data is used to improve the final prediction accuracy of a model trained using QM calculations. When tested on the CHESHIRE dataset, the proposed model predicts observed 13C chemical shifts with comparable accuracy to the best-performing DFT functionals (1.5 ppm) in around 1/6000 of the CPU time. An automated prediction webserver and graphical interface are accessible online at http://nova.chem.colostate.edu/cascade/. 
We further demonstrate the model in three applications: first, we use the model to decide the correct organic structure from candidates through experimental spectra, including complex stereoisomers; second, we automatically detect and revise incorrect chemical shift assignments in a popular NMR database, the NMRShiftDB; and third, we use NMR chemical shifts as descriptors for determination of the sites of electrophilic aromatic substitution. From quantum chemical and experimental NMR data, a 3D graph neural network, CASCADE, has been developed to predict carbon and proton chemical shifts. Stereoisomers and conformers of organic molecules can be correctly distinguished.
Affiliation(s)
- Yanfei Guan
- Department of Chemistry, Colorado State University, Fort Collins, CO 80523, USA
| | - S V Shree Sowndarya
- Department of Chemistry, Colorado State University, Fort Collins, CO 80523, USA
| | - Liliana C Gallegos
- Department of Chemistry, Colorado State University, Fort Collins, CO 80523, USA
| | - Peter C St John
- Biosciences Center, National Renewable Energy Laboratory, Golden, CO 80401, USA
| | - Robert S Paton
- Department of Chemistry, Colorado State University, Fort Collins, CO 80523, USA
| |
|
39
|
Shoja A, Zhai J, Reid JP. Comprehensive Stereochemical Models for Selectivity Prediction in Diverse Chiral Phosphate-Catalyzed Reaction Space. ACS Catal 2021. [DOI: 10.1021/acscatal.1c03520] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Indexed: 12/25/2022]
Affiliation(s)
- Ali Shoja
- Department of Chemistry, University of British Columbia, 2036 Main Mall, Vancouver, British Columbia V6T 1Z1, Canada
| | - Jianyu Zhai
- Department of Chemistry, University of British Columbia, 2036 Main Mall, Vancouver, British Columbia V6T 1Z1, Canada
| | - Jolene P. Reid
- Department of Chemistry, University of British Columbia, 2036 Main Mall, Vancouver, British Columbia V6T 1Z1, Canada
| |
|
40
|
Affiliation(s)
- Agustí Lledós
- Departament de Química, Universitat Autònoma de Barcelona, Campus UAB, 08193 Cerdanyola del Vallès, Catalonia, Spain
| |
|