1. Tavakoli M, Chiu YTT, Carlton AM, Van Vranken D, Baldi P. Chemically Informed Deep Learning for Interpretable Radical Reaction Prediction. J Chem Inf Model 2025; 65:1228-1242. [PMID: 39871741 PMCID: PMC11815866 DOI: 10.1021/acs.jcim.4c01901] [Received: 10/16/2024] [Revised: 01/14/2025] [Accepted: 01/15/2025] [Indexed: 01/29/2025]
Abstract
Organic radical reactions are crucial in many areas of chemistry, including synthetic, biological, and atmospheric chemistry. We develop a predictive framework based on the interaction of molecular orbitals that operates on mechanistic-level radical reactions. Given our chemistry-aware model, all predictions are provided with different levels of interpretability. Our models are trained and evaluated using the RMechDB database of radical reaction steps. Our model predicts the correct orbital interaction and products for 96% of the test reactions in RMechDB. By chaining these predictions, we perform a pathway search capable of identifying all intermediates and byproducts of a radical reaction. We test the pathway search on two classes of problems in atmospheric and polymerization chemistry. RMechRP is publicly available online at https://deeprxn.ics.uci.edu/rmechrp/.
Affiliation(s)
- Mohammadamin Tavakoli
  - Department of Computer Science, University of California, Irvine, Irvine, California 92697, United States
- Yin Ting T. Chiu
  - Department of Chemistry, University of California, Irvine, Irvine, California 92697, United States
- Ann Marie Carlton
  - Department of Chemistry, University of California, Irvine, Irvine, California 92697, United States
- David Van Vranken
  - Department of Chemistry, University of California, Irvine, Irvine, California 92697, United States
- Pierre Baldi
  - Department of Computer Science, University of California, Irvine, Irvine, California 92697, United States
2. Roszak R, Gadina L, Wołos A, Makkawi A, Mikulak-Klucznik B, Bilgi Y, Molga K, Gołębiowska P, Popik O, Klucznik T, Szymkuć S, Moskal M, Baś S, Frydrych R, Mlynarski J, Vakuliuk O, Gryko DT, Grzybowski BA. Systematic, computational discovery of multicomponent and one-pot reactions. Nat Commun 2024; 15:10285. [PMID: 39604395 PMCID: PMC11603032 DOI: 10.1038/s41467-024-54611-5] [Received: 11/15/2024] [Accepted: 11/18/2024] [Indexed: 11/29/2024]
Abstract
Discovery of new types of reactions is essential to organic chemistry because it expands the scope of accessible molecular scaffolds and can enable more economical syntheses of existing structures. In this context, the so-called multicomponent reactions (MCRs) are of particular interest because they can build complex scaffolds from multiple starting materials in just one step, without purification of intermediates. However, for over a century of active research, MCRs have been discovered rather than designed, and their number remains limited to only several hundred. This work demonstrates that computers taught the essential knowledge of reaction mechanisms and rules of physical-organic chemistry can design, completely autonomously and in large numbers, mechanistically distinct MCRs. Moreover, when supplemented by models to approximate kinetic rates, the algorithm can predict reaction yields and identify reactions that have potential for organocatalysis. These predictions are validated by experiments spanning different modes of reactivity and diverse product scaffolds.
Affiliation(s)
- Louis Gadina
  - Institute of Organic Chemistry, Polish Academy of Sciences, Warsaw, Poland
  - Center for Algorithmic and Robotized Synthesis (CARS), Institute for Basic Science (IBS), Ulsan, 44919, Republic of Korea
- Ahmad Makkawi
  - Institute of Organic Chemistry, Polish Academy of Sciences, Warsaw, Poland
- Yasemin Bilgi
  - Institute of Organic Chemistry, Polish Academy of Sciences, Warsaw, Poland
  - Center for Algorithmic and Robotized Synthesis (CARS), Institute for Basic Science (IBS), Ulsan, 44919, Republic of Korea
- Karol Molga
  - Allchemy Inc., Highland, IN, USA
  - Institute of Organic Chemistry, Polish Academy of Sciences, Warsaw, Poland
- Oskar Popik
  - Institute of Organic Chemistry, Polish Academy of Sciences, Warsaw, Poland
- Sebastian Baś
  - Institute of Organic Chemistry, Polish Academy of Sciences, Warsaw, Poland
  - Jagiellonian University, Krakow, Poland
- Rafał Frydrych
  - Institute of Organic Chemistry, Polish Academy of Sciences, Warsaw, Poland
  - Center for Algorithmic and Robotized Synthesis (CARS), Institute for Basic Science (IBS), Ulsan, 44919, Republic of Korea
- Jacek Mlynarski
  - Institute of Organic Chemistry, Polish Academy of Sciences, Warsaw, Poland
- Olena Vakuliuk
  - Institute of Organic Chemistry, Polish Academy of Sciences, Warsaw, Poland
- Daniel T Gryko
  - Institute of Organic Chemistry, Polish Academy of Sciences, Warsaw, Poland
- Bartosz A Grzybowski
  - Institute of Organic Chemistry, Polish Academy of Sciences, Warsaw, Poland
  - Center for Algorithmic and Robotized Synthesis (CARS), Institute for Basic Science (IBS), Ulsan, 44919, Republic of Korea
  - Department of Chemistry, Ulsan Institute of Science and Technology, UNIST, Ulsan, 44919, Republic of Korea
3. Szymkuć S, Wołos A, Roszak R, Grzybowski BA. Estimation of multicomponent reactions' yields from networks of mechanistic steps. Nat Commun 2024; 15:10286. [PMID: 39604372 PMCID: PMC11603315 DOI: 10.1038/s41467-024-54550-1] [Received: 04/12/2024] [Accepted: 11/14/2024] [Indexed: 11/29/2024]
Abstract
This work describes the estimation of yields of complex, multicomponent reactions (MCRs) based on modeled networks of mechanistic steps spanning both the main reaction pathway and immediate and downstream side reactions. Because experimental values of the kinetic rate constants for individual mechanistic transforms are extremely sparse, these constants are approximated here using Mayr's nucleophilicity and electrophilicity parameters fine-tuned by correction terms grounded in linear free-energy relationships. With this formalism, the model trained on the mechanistic networks of only 20 (but mechanistically and yield-diverse) MCRs transfers well to newly discovered MCRs that are based on markedly different mechanisms and types of individual mechanistic transforms. These results suggest that a mechanistic-level approach to yield estimation may be a useful alternative to models that are derived from full-reaction data and lack information about yield-lowering side reactions.
Affiliation(s)
- Agnieszka Wołos
  - Allchemy, Inc., Highland, IN, USA
  - Institute of Organic Chemistry, Polish Academy of Sciences, Warsaw, Poland
- Rafał Roszak
  - Allchemy, Inc., Highland, IN, USA
  - Institute of Organic Chemistry, Polish Academy of Sciences, Warsaw, Poland
- Bartosz A Grzybowski
  - Institute of Organic Chemistry, Polish Academy of Sciences, Warsaw, Poland
  - Center for Algorithmic and Robotized Synthesis (CARS), Institute for Basic Science (IBS), Ulsan, Republic of Korea
  - Department of Chemistry, Ulsan Institute of Science and Technology, UNIST, Ulsan, Republic of Korea
4. Liu Y, Mo Y, Cheng Y. Uncertainty Qualification for Deep Learning-Based Elementary Reaction Property Prediction. J Chem Inf Model 2024; 64:8131-8141. [PMID: 39441973 DOI: 10.1021/acs.jcim.4c01358] [Indexed: 10/25/2024]
Abstract
The prediction of the thermodynamic and kinetic properties of elementary reactions has shown rapid improvement due to the implementation of deep learning (DL) methods. While various studies have reported success in predicting reaction properties, the quantification of prediction uncertainty has seldom been investigated, thus compromising the confidence in using these predicted properties in practical applications. Here, we integrated graph convolutional neural networks (GCNN) with three uncertainty prediction techniques, including deep ensemble, Monte Carlo (MC)-dropout, and evidential learning, to provide insights into the uncertainty quantification and utility. The deep ensemble model outperforms the others in accuracy and shows the highest reliability in estimating prediction uncertainty across all elementary reaction property data sets. We also verified that the deep ensemble model showed a satisfactory capability in recognizing epistemic and aleatoric uncertainties. Additionally, we adopted a Monte Carlo Tree Search method for extracting the explainable reaction substructures, providing a chemical explanation for DL-predicted properties and corresponding uncertainties. Finally, to demonstrate the utility of uncertainty qualification in practical applications, we performed an uncertainty-guided calibration of the DL-constructed kinetic model, which achieved a 25% higher hit ratio in identifying dominant reaction pathways compared to that of the calibration without uncertainty guidance.
Affiliation(s)
- Yan Liu
  - College of Chemical and Biological Engineering, Zhejiang University, Hangzhou 310027, China
  - Key Laboratory of Biomass Chemical Engineering of Ministry of Education, College of Chemical and Biological Engineering, Zhejiang University, Hangzhou 310027, China
- Yiming Mo
  - College of Chemical and Biological Engineering, Zhejiang University, Hangzhou 310027, China
  - ZJU-Hangzhou Global Scientific and Technological Innovation Center, Zhejiang University, Hangzhou 311215, China
- Youwei Cheng
  - College of Chemical and Biological Engineering, Zhejiang University, Hangzhou 310027, China
  - Key Laboratory of Biomass Chemical Engineering of Ministry of Education, College of Chemical and Biological Engineering, Zhejiang University, Hangzhou 310027, China
  - Zhejiang Hengyi Petrochemical Research Institute Co., Ltd., Hangzhou 311215, China
5. Renningholtz T, Lim ERX, James MJ, Trujillo C. Computational methods for investigating organic radical species. Org Biomol Chem 2024; 22:6166-6173. [PMID: 39012651 DOI: 10.1039/d4ob00532e] [Indexed: 07/17/2024]
Abstract
Computational analysis of organic radical species presents significant challenges. This study compares the efficacy of various DFT and wavefunction methods in predicting radical stabilisation energies, bond dissociation energies, and redox potentials for organic radicals. The hybrid meta-GGA M062X-D3(0), and the range-separated hybrids ωB97M-V and ωB97M-D3(BJ) emerged as the most reliable functionals, consistently providing accurate predictions across different basis sets including 6-311G**, cc-pVTZ, and def2-TZVP.
Affiliation(s)
- Tim Renningholtz
  - The University of Manchester, Oxford Road, Manchester, M13 9PL, UK
- Ethan R X Lim
  - The University of Manchester, Oxford Road, Manchester, M13 9PL, UK
- Michael J James
  - The University of Manchester, Oxford Road, Manchester, M13 9PL, UK
- Cristina Trujillo
  - The University of Manchester, Oxford Road, Manchester, M13 9PL, UK
  - TBSI - School of Chemistry, The University of Dublin, Trinity College, D02 R590 Dublin 2, Ireland
6. Tavakoli M, Miller RJ, Angel MC, Pfeiffer MA, Gutman ES, Mood AD, Van Vranken D, Baldi P. PMechDB: A Public Database of Elementary Polar Reaction Steps. J Chem Inf Model 2024; 64:1975-1983. [PMID: 38483315 PMCID: PMC10966657 DOI: 10.1021/acs.jcim.3c01810] [Received: 11/08/2023] [Revised: 02/15/2024] [Accepted: 02/16/2024] [Indexed: 03/26/2024]
Abstract
Most online chemical reaction databases are not publicly accessible or fully downloadable. These databases tend to contain reactions in noncanonicalized formats and often lack comprehensive information regarding reaction pathways, intermediates, and byproducts. Within the few publicly available databases, reactions are typically stored in the form of unbalanced, overall transformations with minimal interpretability of the underlying chemistry. These limitations present significant obstacles to data-driven applications, including the development of machine learning models. As an effort to overcome these challenges, we introduce PMechDB, a publicly accessible platform designed to curate, aggregate, and share polar chemical reaction data in the form of elementary reaction steps. Our initial version of PMechDB consists of over 100,000 such steps. In PMechDB, all reactions are stored as canonicalized and balanced elementary steps, featuring accurate atom mapping and arrow-pushing mechanisms. As an online interactive database, PMechDB provides multiple interfaces that enable users to search, download, and upload chemical reactions. We anticipate that the public availability of PMechDB and its standardized data representation will prove beneficial for chemoinformatics research and education and for the development of data-driven, interpretable models for predicting reactions and pathways. The PMechDB platform is accessible online at https://deeprxn.ics.uci.edu/pmechdb.
Affiliation(s)
- Mohammadamin Tavakoli
  - Department of Computer Science, University of California, Irvine, Irvine, California 92697, United States
- Ryan J. Miller
  - Department of Computer Science, University of California, Irvine, Irvine, California 92697, United States
- Mirana Claire Angel
  - Department of Computer Science, University of California, Irvine, Irvine, California 92697, United States
- Michael A. Pfeiffer
  - Department of Chemistry, University of California, Irvine, Irvine, California 92697, United States
- Eugene S. Gutman
  - Department of Chemistry, University of California, Irvine, Irvine, California 92697, United States
- Aaron D. Mood
  - Department of Chemistry, University of California, Irvine, Irvine, California 92697, United States
- David Van Vranken
  - Department of Chemistry, University of California, Irvine, Irvine, California 92697, United States
- Pierre Baldi
  - Department of Computer Science, University of California, Irvine, Irvine, California 92697, United States