1. Souza LW, Ricke ND, Chaffin BC, Fortunato ME, Jiang S, Soylu C, Caya TC, Lau SH, Wieser KA, Doyle AG, Tan KL. Applying Active Learning toward Building a Generalizable Model for Ni-Photoredox Cross-Electrophile Coupling of Aryl and Alkyl Bromides. J Am Chem Soc 2025. [PMID: 40401689 DOI: 10.1021/jacs.5c02218] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/23/2025]
Abstract
When developing machine learning models for yield prediction, the two main challenges are effectively exploring condition space and substrate space. In this article, we disclose an approach for mapping the substrate space for Ni/photoredox-catalyzed cross-electrophile coupling of alkyl bromides and aryl bromides in a high-throughput experimentation (HTE) context. This model employs active learning (in particular, uncertainty querying) as a strategy to rapidly construct a yield model. Given the vastness of substrate space, we focused on an approach that builds an initial model and then uses a minimal data set to expand into new chemical spaces. In particular, we built a model for a virtual space of 22,240 compounds using fewer than 400 data points. We demonstrated that the model can be expanded to 33,312 compounds by adding information around 24 building blocks (<100 additional reactions). Comparing the active learning-based model to one constructed on randomly selected data showed that the active learning model was significantly better at predicting which reactions will be successful. A combination of density functional theory (DFT) and difference Morgan fingerprints was employed to construct the random forest model. Feature importance analysis indicates that key DFT features that are related to the reaction mechanism (e.g., alkyl radical LUMO energy) were crucial for model performance and predictions on aryl bromides outside the training set. We anticipate that combining DFT featurization and uncertainty-based querying will help the synthetic organic community build predictive models in a data-efficient manner for other chemical reactions that feature large and diverse scopes.
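As an illustration only, the uncertainty-querying idea named in this abstract can be sketched in a few lines. The sketch assumes that disagreement among ensemble members (for a random forest, the individual trees) is used as the uncertainty score; the function name `select_most_uncertain` and the yield values are hypothetical and are not taken from the paper, whose actual querying procedure may differ.

```python
from statistics import pvariance


def select_most_uncertain(per_model_predictions, batch_size):
    """Return indices of the candidates whose ensemble predictions disagree most.

    per_model_predictions[m][i] is model m's predicted yield for candidate i
    (for a random forest, one row per tree). This layout is an assumption.
    """
    n_candidates = len(per_model_predictions[0])
    # Population variance across ensemble members, computed per candidate.
    variances = [
        pvariance([model[i] for model in per_model_predictions])
        for i in range(n_candidates)
    ]
    # Rank candidates from highest to lowest variance and keep the top batch.
    ranked = sorted(range(n_candidates), key=lambda i: variances[i], reverse=True)
    return ranked[:batch_size]


# Hypothetical predicted yields (%) from three trees for four untested reactions.
tree_predictions = [
    [10.0, 50.0, 80.0, 30.0],
    [12.0, 20.0, 82.0, 35.0],
    [11.0, 90.0, 79.0, 25.0],
]
next_batch = select_most_uncertain(tree_predictions, batch_size=2)
print(next_batch)  # candidates 1 and 3 show the largest disagreement
```

In an active-learning loop, the selected reactions would be run experimentally, added to the training set, and the model refit before the next query.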
Affiliation(s)
- Lucas W Souza
  - Global Discovery Chemistry, Novartis, Cambridge, Massachusetts 02139, United States
- Nathan D Ricke
  - Global Discovery Chemistry, Novartis, Cambridge, Massachusetts 02139, United States
- Braden C Chaffin
  - Department of Chemistry & Biochemistry, University of California, Los Angeles, California 90095, United States
- Mike E Fortunato
  - Global Discovery Chemistry, Novartis, Cambridge, Massachusetts 02139, United States
- Shutian Jiang
  - Department of Chemistry & Biochemistry, University of California, Los Angeles, California 90095, United States
- Cihan Soylu
  - Global Discovery Chemistry, Novartis, Cambridge, Massachusetts 02139, United States
- Thomas C Caya
  - Global Discovery Chemistry, Novartis, Cambridge, Massachusetts 02139, United States
- Sii Hong Lau
  - Global Discovery Chemistry, Novartis, Cambridge, Massachusetts 02139, United States
- Katherine A Wieser
  - Global Discovery Chemistry, Novartis, Cambridge, Massachusetts 02139, United States
- Abigail G Doyle
  - Department of Chemistry & Biochemistry, University of California, Los Angeles, California 90095, United States
- Kian L Tan
  - Global Discovery Chemistry, Novartis, Cambridge, Massachusetts 02139, United States
2. Song W, Sun H. Local reaction condition optimization via machine learning. J Mol Model 2025; 31:143. [PMID: 40266356 DOI: 10.1007/s00894-025-06365-0] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/30/2025] [Accepted: 03/31/2025] [Indexed: 04/24/2025]
Abstract
CONTEXT: Reaction condition optimization addresses shared requirements across academia and industry, particularly in chemistry, pharmaceutical development, and fine chemical engineering. This review examines recent progress and persistent challenges in machine learning-guided optimization of localized reaction conditions, with an emphasis on three core aspects: dataset, condition representation, and optimization methods, as well as the main issues in each related stage. The review explores challenges such as dataset scarcity, data quality, and the "completeness trap" in the dataset preparation stage, summarizes the limitations of current molecular representation techniques in the condition representation stage, and discusses the search efficiency challenges of optimization methods in the optimization stage. METHODS: The review analyzes the molecular representation techniques and identifies them as the primary bottleneck in advancing localized reaction condition optimization. It further examines existing optimization methodologies. Among them, Bayesian optimization and active learning emerge as the most commonly applied approaches in this field, utilizing incremental learning mechanisms and human-in-the-loop strategies to minimize experimental data requirements while mitigating molecular representation limitations. The review concludes that advancements in molecular representation techniques are essential for developing more efficient optimization methods in the future.
Affiliation(s)
- Wenhuan Song
  - School of Mechanical, Electrical & Information Engineering, Shandong University, Weihai, 264209, China
- Honggang Sun
  - School of Mechanical, Electrical & Information Engineering, Shandong University, Weihai, 264209, China
3. Noto N, Kunisada R, Rohlfs T, Hayashi M, Kojima R, García Mancheño O, Yanai T, Saito S. Transfer learning across different photocatalytic organic reactions. Nat Commun 2025; 16:3388. [PMID: 40204731 PMCID: PMC11982376 DOI: 10.1038/s41467-025-58687-5] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/17/2024] [Accepted: 03/31/2025] [Indexed: 04/11/2025] [Open access]
Abstract
While seasoned organic chemists can often predict suitable catalysts for new reactions based on their past experiences in other catalytic reactions, developing this ability is costly, laborious and time-consuming. Therefore, replicating this remarkable expertise of human researchers through machine learning (ML) is compelling, although it remains highly challenging. Herein, we apply a domain-adaptation-based transfer-learning (TL) approach to photocatalysis. Despite being different reaction types, the knowledge of the catalytic behavior of organic photosensitizers (OPSs) from photocatalytic cross-coupling reactions is successfully transferred to ML for a [2+2] cycloaddition reaction, improving the prediction of the photocatalytic activity compared with conventional ML approaches. Furthermore, a satisfactory predictive performance is achieved by using only ten training data points. This experimentally readily accessible small dataset can also be used to identify effective OPSs for alkene photoisomerization, thereby showcasing the potential benefits of TL in catalyst exploration.
Affiliation(s)
- Naoki Noto
  - Integrated Research Consortium on Chemical Sciences (IRCCS), Nagoya University, Nagoya, Japan
- Ryuga Kunisada
  - Graduate School of Science, Nagoya University, Nagoya, Japan
- Tabea Rohlfs
  - Organic Chemistry Institute, University of Münster, Münster, Germany
- Manami Hayashi
  - Graduate School of Science, Nagoya University, Nagoya, Japan
- Ryosuke Kojima
  - Department of Biomedical Data Intelligence, Graduate School of Medicine, Kyoto University, Kyoto, Japan
- Takeshi Yanai
  - Graduate School of Science, Nagoya University, Nagoya, Japan
  - Institute of Transformative Bio-Molecules (WPI-ITbM), Nagoya University, Nagoya, Japan
- Susumu Saito
  - Integrated Research Consortium on Chemical Sciences (IRCCS), Nagoya University, Nagoya, Japan
  - Graduate School of Science, Nagoya University, Nagoya, Japan
4. Schleinitz J, Carretero-Cerdán A, Gurajapu A, Harnik Y, Lee G, Pandey A, Milo A, Reisman SE. Designing Target-specific Data Sets for Regioselectivity Predictions on Complex Substrates. J Am Chem Soc 2025; 147:7476-7484. [PMID: 39982221 PMCID: PMC11887056 DOI: 10.1021/jacs.4c15902] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/10/2024] [Revised: 02/05/2025] [Accepted: 02/06/2025] [Indexed: 02/22/2025]
Abstract
The development of machine learning models to predict the regioselectivity of C(sp3)-H functionalization reactions is reported. A data set for dioxirane oxidations was curated from the literature and used to generate a model to predict the regioselectivity of C-H oxidation. To assess whether smaller, intentionally designed data sets could provide accuracy on complex targets, a series of acquisition functions were developed to select the most informative molecules for the specific target. Active learning-based acquisition functions that leverage predicted reactivity and model uncertainty were found to outperform those based on molecular and site similarity alone. The use of acquisition functions for data set elaboration significantly reduced the number of data points needed to perform accurate prediction, and it was found that smaller, machine-designed data sets can give accurate predictions when larger, randomly selected data sets fail. Finally, the workflow was experimentally validated on five complex substrates and shown to be applicable to predicting the regioselectivity of arene C-H radical borylation. These studies provide a quantitative alternative to the intuitive extrapolation from "model substrates" that is frequently used to estimate reactivity on complex molecules.
Affiliation(s)
- Jules Schleinitz
  - The Warren and Katharine Schlinger Laboratory for Chemistry and Chemical Engineering, Division of Chemistry and Chemical Engineering, California Institute of Technology, Pasadena, California 91125, United States
- Alba Carretero-Cerdán
  - The Warren and Katharine Schlinger Laboratory for Chemistry and Chemical Engineering, Division of Chemistry and Chemical Engineering, California Institute of Technology, Pasadena, California 91125, United States
  - Division of Theoretical Chemistry & Biology, CBH School, KTH Royal Institute of Technology, Teknikringen 30, S-10044 Stockholm, Sweden
- Anjali Gurajapu
  - The Warren and Katharine Schlinger Laboratory for Chemistry and Chemical Engineering, Division of Chemistry and Chemical Engineering, California Institute of Technology, Pasadena, California 91125, United States
- Yonatan Harnik
  - Department of Chemistry, Ben-Gurion University of the Negev, Beer-Sheva 841051, Israel
- Gina Lee
  - The Warren and Katharine Schlinger Laboratory for Chemistry and Chemical Engineering, Division of Chemistry and Chemical Engineering, California Institute of Technology, Pasadena, California 91125, United States
- Amitesh Pandey
  - The Warren and Katharine Schlinger Laboratory for Chemistry and Chemical Engineering, Division of Chemistry and Chemical Engineering, California Institute of Technology, Pasadena, California 91125, United States
- Anat Milo
  - Department of Chemistry, Ben-Gurion University of the Negev, Beer-Sheva 841051, Israel
- Sarah E. Reisman
  - The Warren and Katharine Schlinger Laboratory for Chemistry and Chemical Engineering, Division of Chemistry and Chemical Engineering, California Institute of Technology, Pasadena, California 91125, United States
5. Shim E, Tewari A, Cernak T, Zimmerman PM. Recommending reaction conditions with label ranking. Chem Sci 2025; 16:4109-4118. [PMID: 39906388 PMCID: PMC11788591 DOI: 10.1039/d4sc06728b] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/04/2024] [Accepted: 01/24/2025] [Indexed: 02/06/2025] [Open access]
Abstract
Pinpointing effective reaction conditions can be challenging, even for reactions with significant precedent. Herein, models that rank reaction conditions are introduced as a conceptually new means for prioritizing experiments, distinct from the mainstream approach of yield regression. Specifically, label ranking, which operates using input features only from substrates, will be shown to better generalize to new substrates than prior models. Evaluation on practical reaction condition selection scenarios - choosing from either 4 or 18 conditions and datasets with or without missing reactions - demonstrates label ranking's utility. Ranking aggregation through Borda's method and relative simplicity are key features of label ranking to achieve consistent high performance.
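The Borda aggregation mentioned in this abstract can be illustrated with a minimal sketch. It assumes the standard Borda count, in which each ranking awards a condition one point for every condition ranked below it; the function name `borda_aggregate` and the condition labels are hypothetical, and the paper's own implementation may differ in details such as tie handling.

```python
from collections import defaultdict


def borda_aggregate(rankings):
    """Combine several best-to-worst rankings into one using Borda counts."""
    scores = defaultdict(int)
    for ranking in rankings:
        n = len(ranking)
        for position, condition in enumerate(ranking):
            # The top-ranked item earns n - 1 points, the last earns 0.
            scores[condition] += n - 1 - position
    # Sort by descending score; break ties alphabetically for determinism.
    return sorted(scores, key=lambda c: (-scores[c], c))


# Hypothetical rankings of four reaction conditions from three sources.
rankings = [
    ["A", "B", "C", "D"],
    ["B", "A", "D", "C"],
    ["B", "C", "A", "D"],
]
consensus = borda_aggregate(rankings)
print(consensus)  # ['B', 'A', 'C', 'D']
```

Here condition B collects 8 points, A 6, C 3, and D 1, so B is placed first in the consensus ranking.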
Affiliation(s)
- Eunjae Shim
  - Department of Chemistry, University of Michigan, Ann Arbor, MI, USA
- Ambuj Tewari
  - Department of Statistics, University of Michigan, Ann Arbor, MI, USA
  - Department of Electrical Engineering and Computer Science, University of Michigan, Ann Arbor, MI, USA
- Tim Cernak
  - Department of Chemistry, University of Michigan, Ann Arbor, MI, USA
  - Department of Medicinal Chemistry, University of Michigan, Ann Arbor, MI, USA
- Paul M Zimmerman
  - Department of Chemistry, University of Michigan, Ann Arbor, MI, USA
6. Hua PX, Huang Z, Xu ZY, Zhao Q, Ye CY, Wang YF, Xu YH, Fu Y, Ding H. An active representation learning method for reaction yield prediction with small-scale data. Commun Chem 2025; 8:42. [PMID: 39929993 PMCID: PMC11811124 DOI: 10.1038/s42004-025-01434-0] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/16/2024] [Accepted: 01/27/2025] [Indexed: 02/13/2025] [Open access]
Abstract
Reaction optimization plays an essential role in chemical research and industrial production. To explore a large reaction system, a practical issue is how to reduce the heavy experimental load for finding the high-yield conditions. In this paper, we present an efficient machine learning tool called "RS-Coreset", where the key idea is to take advantage of deep representation learning techniques to guide an interactive procedure for representing the full reaction space. Our proposed tool only uses small-scale data, say 2.5% to 5% of the instances, to predict the yields of the reaction space. We validate the performance on three public datasets and achieve state-of-the-art results. Moreover, we apply this tool to assist the realistic exploration of the Lewis base-boryl radicals enabled dechlorinative coupling reactions in our lab. The tool can help us to effectively predict the yields and even discover several feasible reaction combinations that were overlooked in previous articles.
Affiliation(s)
- Peng-Xiang Hua
  - School of Computer Science and Technology, University of Science and Technology of China, Hefei, Anhui, 230026, China
- Zhen Huang
  - School of Computer Science and Technology, University of Science and Technology of China, Hefei, Anhui, 230026, China
- Zhe-Yuan Xu
  - Key Laboratory of Precision and Intelligent Chemistry, CAS Key Laboratory of Urban Pollutant Conversion, Anhui Province Key Laboratory of Biomass Clean Energy, Department of Chemistry, University of Science and Technology of China, Hefei, Anhui, 230026, China
- Qiang Zhao
  - Key Laboratory of Precision and Intelligent Chemistry, CAS Key Laboratory of Urban Pollutant Conversion, Anhui Province Key Laboratory of Biomass Clean Energy, Department of Chemistry, University of Science and Technology of China, Hefei, Anhui, 230026, China
- Chen-Yang Ye
  - Key Laboratory of Precision and Intelligent Chemistry, CAS Key Laboratory of Urban Pollutant Conversion, Anhui Province Key Laboratory of Biomass Clean Energy, Department of Chemistry, University of Science and Technology of China, Hefei, Anhui, 230026, China
- Yi-Feng Wang
  - Key Laboratory of Precision and Intelligent Chemistry, CAS Key Laboratory of Urban Pollutant Conversion, Anhui Province Key Laboratory of Biomass Clean Energy, Department of Chemistry, University of Science and Technology of China, Hefei, Anhui, 230026, China
- Yun-He Xu
  - Key Laboratory of Precision and Intelligent Chemistry, CAS Key Laboratory of Urban Pollutant Conversion, Anhui Province Key Laboratory of Biomass Clean Energy, Department of Chemistry, University of Science and Technology of China, Hefei, Anhui, 230026, China
- Yao Fu
  - Key Laboratory of Precision and Intelligent Chemistry, CAS Key Laboratory of Urban Pollutant Conversion, Anhui Province Key Laboratory of Biomass Clean Energy, Department of Chemistry, University of Science and Technology of China, Hefei, Anhui, 230026, China
- Hu Ding
  - School of Computer Science and Technology, University of Science and Technology of China, Hefei, Anhui, 230026, China
7. Nakamura S, Yasuo N, Sekijima M. Molecular optimization using a conditional transformer for reaction-aware compound exploration with reinforcement learning. Commun Chem 2025; 8:40. [PMID: 39922979 PMCID: PMC11807120 DOI: 10.1038/s42004-025-01437-x] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/26/2024] [Accepted: 01/28/2025] [Indexed: 02/10/2025] [Open access]
Abstract
Designing molecules with desirable properties is a critical endeavor in drug discovery. Because of recent advances in deep learning, molecular generative models have been developed. However, the existing compound exploration models often disregard the important issue of ensuring the feasibility of organic synthesis. To address this issue, we propose TRACER, which is a framework that integrates molecular property optimization with synthetic pathway generation. The model can predict the product derived from a given reactant via a conditional transformer under the constraints of a reaction type. The molecular optimization results of an activity prediction model targeting DRD2, AKT1, and CXCR4 revealed that TRACER effectively generated compounds with high scores. The transformer model, which recognizes the entire structures, captures the complexity of the organic synthesis and enables its navigation in a vast chemical space while considering real-world reactivity constraints.
Affiliation(s)
- Shogo Nakamura
  - Department of Life Science and Technology, Institute of Science Tokyo, 4259-J3-23, Nagatsuta-cho, Midori-ku, Yokohama, 226-8501, Kanagawa, Japan
- Nobuaki Yasuo
  - Academy for Convergence of Materials and Informatics (TAC-MI), Institute of Science Tokyo, S6-23, Ookayama, Meguro-ku, 152-8550, Tokyo, Japan
- Masakazu Sekijima
  - Department of Computer Science, Institute of Science Tokyo, 4259-J3-23, Nagatsuta-cho, Midori-ku, Yokohama, 226-8501, Kanagawa, Japan
8. Yu M, Jia Q, Wang Q, Luo ZH, Yan F, Zhou YN. Data science-centric design, discovery, and evaluation of novel synthetically accessible polyimides with desired dielectric constants. Chem Sci 2024:d4sc05000b. [PMID: 39416299 PMCID: PMC11474456 DOI: 10.1039/d4sc05000b] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/26/2024] [Accepted: 10/01/2024] [Indexed: 10/19/2024] [Open access]
Abstract
Rapidly advancing computer technology has demonstrated great potential in recent years to assist in the generation and discovery of promising molecular structures. Herein, we present a data science-centric "Design-Discovery-Evaluation" scheme for exploring novel polyimides (PIs) with desired dielectric constants (ε). A virtual library of over 100 000 synthetically accessible PIs is created by extending existing PIs. Within the framework of quantitative structure-property relationship (QSPR), a model sufficient to predict ε at multiple frequencies is developed with an R² of 0.9768, allowing further high-throughput screening of the prior structures with desired ε. Furthermore, the structural feature representation method of atomic adjacent group (AAG) is introduced, using which the reliability of high-throughput screening results is evaluated. This workflow identifies 9 novel PIs (ε > 5 at 10³ Hz and glass transition temperatures between 250 °C and 350 °C) with potential applications in high-temperature capacitive energy storage, and confirms these promising findings by high-fidelity molecular dynamics (MD) simulations.
Affiliation(s)
- Mengxian Yu
  - School of Chemical Engineering and Material Science, Tianjin University of Science and Technology, Tianjin 300457, P. R. China
- Qingzhu Jia
  - School of Chemical Engineering and Material Science, Tianjin University of Science and Technology, Tianjin 300457, P. R. China
- Qiang Wang
  - School of Chemical Engineering and Material Science, Tianjin University of Science and Technology, Tianjin 300457, P. R. China
- Zheng-Hong Luo
  - Department of Chemical Engineering, School of Chemistry and Chemical Engineering, State Key Laboratory of Metal Matrix Composites, Shanghai Jiao Tong University, Shanghai 200240, P. R. China
- Fangyou Yan
  - School of Chemical Engineering and Material Science, Tianjin University of Science and Technology, Tianjin 300457, P. R. China
- Yin-Ning Zhou
  - Department of Chemical Engineering, School of Chemistry and Chemical Engineering, State Key Laboratory of Metal Matrix Composites, Shanghai Jiao Tong University, Shanghai 200240, P. R. China
9. Han Y, Deng M, Liu K, Chen J, Wang Y, Xu YN, Dian L. Computer-Aided Synthesis Planning (CASP) and Machine Learning: Optimizing Chemical Reaction Conditions. Chemistry 2024; 30:e202401626. [PMID: 39083362 DOI: 10.1002/chem.202401626] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/26/2024] [Revised: 07/27/2024] [Accepted: 07/28/2024] [Indexed: 08/02/2024]
Abstract
Computer-aided synthesis planning (CASP) has garnered increasing attention in light of recent advancements in machine learning models. While the focus is on reverse synthesis or forward outcome prediction, optimizing reaction conditions remains a significant challenge. For datasets with multiple variables, the choice of descriptors and models is pivotal. This selection dictates the effective extraction of conditional features and the achievement of higher prediction accuracy. This review delineates the origins of data in conditional optimization, the criteria for descriptor selection, the response models, and the metrics for outcome evaluation, aiming to acquaint readers with the latest research trends and facilitate more informed research in this domain.
Affiliation(s)
- Yu Han
  - State Key Laboratory of Microbial Technology, Institute of Microbial Technology, Shandong University, No. 72 Binhai Avenue, Qingdao, 266237, P. R. China
- Mingjing Deng
  - State Key Laboratory of Microbial Technology, Institute of Microbial Technology, Shandong University, No. 72 Binhai Avenue, Qingdao, 266237, P. R. China
- Ke Liu
  - State Key Laboratory of Microbial Technology, Institute of Microbial Technology, Shandong University, No. 72 Binhai Avenue, Qingdao, 266237, P. R. China
- Jia Chen
  - State Key Laboratory of Microbial Technology, Institute of Microbial Technology, Shandong University, No. 72 Binhai Avenue, Qingdao, 266237, P. R. China
- Yuting Wang
  - State Key Laboratory of Microbial Technology, Institute of Microbial Technology, Shandong University, No. 72 Binhai Avenue, Qingdao, 266237, P. R. China
- Yu-Ning Xu
  - State Key Laboratory of Microbial Technology, Institute of Microbial Technology, Shandong University, No. 72 Binhai Avenue, Qingdao, 266237, P. R. China
- Longyang Dian
  - State Key Laboratory of Microbial Technology, Institute of Microbial Technology, Shandong University, No. 72 Binhai Avenue, Qingdao, 266237, P. R. China
  - Suzhou Institute of Shandong University, No. 388 Ruoshui Road, Suzhou Industrial Park, Suzhou, 215123, P. R. China
10. Xu J, Ye X, Lv Z, Chen YH, Wang XS. The Role of Base in Reaction Performance of Photochemical Synthesis of Thiazoles: An Integrated Theoretical and Experimental Study. Chemistry 2024; 30:e202304279. [PMID: 38409580 DOI: 10.1002/chem.202304279] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/21/2023] [Revised: 02/25/2024] [Accepted: 02/26/2024] [Indexed: 02/28/2024]
Abstract
Artificial intelligence (AI)/machine learning (ML) is emerging as pivotal in synthetic chemistry, offering revolutionary potential in retrosynthetic analysis, reaction conditions and reaction prediction. We have combined chemical descriptors, primarily based on Density Functional Theory (DFT) calculations, with various AI/ML tools such as Multi-Layer Perceptron (MLP) and Random Forest (RF), to predict the synthesis of 2-arylbenzothiazole in photoredox reactions. Significantly, our models underscore the critical role of the molecular structure and physicochemical characteristics of the base, especially the total atomic polarizabilities, in the rate-determining steps involving cyclohexyl and phenethyl moieties of the substrate. Moreover, we validated our findings in articles through experimental studies. This work showcases the power of AI/ML and quantum chemistry in shaping the future of organic chemistry.
Affiliation(s)
- Jiaxin Xu
  - The Institute for Advanced Studies (IAS), Wuhan University, Wuhan, 430072, China
- Xiaoyu Ye
  - The Institute for Advanced Studies (IAS), Wuhan University, Wuhan, 430072, China
- Zongchao Lv
  - The Institute for Advanced Studies (IAS), Wuhan University, Wuhan, 430072, China
  - CMC Pharmaceutical Research Center, Wuhan RS Pharmaceutical Co., Ltd., Wuhan, 430073, China
- Yi-Hung Chen
  - The Institute for Advanced Studies (IAS), Wuhan University, Wuhan, 430072, China
- Xiang Simon Wang
  - Howard University College of Pharmacy, 2300 Fourth Street NW, Washington, DC 20059, United States
11. Wang JY, Stevens JM, Kariofillis SK, Tom MJ, Golden DL, Li J, Tabora JE, Parasram M, Shields BJ, Primer DN, Hao B, Del Valle D, DiSomma S, Furman A, Zipp GG, Melnikov S, Paulson J, Doyle AG. Identifying general reaction conditions by bandit optimization. Nature 2024; 626:1025-1033. [PMID: 38418912 DOI: 10.1038/s41586-024-07021-y] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/31/2023] [Accepted: 01/03/2024] [Indexed: 03/02/2024]
Abstract
Reaction conditions that are generally applicable to a wide variety of substrates are highly desired, especially in the pharmaceutical and chemical industries [1-6]. Although many approaches are available to evaluate the general applicability of developed conditions, a universal approach to efficiently discover these conditions during optimizations is rare. Here we report the design, implementation and application of reinforcement learning bandit optimization models [7-10] to identify generally applicable conditions by efficient condition sampling and evaluation of experimental feedback. Performance benchmarking on existing datasets statistically showed high accuracies for identifying general conditions, with up to 31% improvement over baselines that mimic state-of-the-art optimization approaches. A palladium-catalysed imidazole C-H arylation reaction, an aniline amide coupling reaction and a phenol alkylation reaction were investigated experimentally to evaluate use cases and functionalities of the bandit optimization model in practice. In all three cases, the reaction conditions that were most generally applicable yet not well studied for the respective reaction were identified after surveying less than 15% of the expert-designed reaction space.
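For readers unfamiliar with bandit optimization, the following is a minimal sketch of one classic bandit algorithm, UCB1, applied to a toy version of this setting. It is not the authors' model: the mapping of arms to reaction conditions, the binary success reward, the success probabilities, and the function name `ucb1` are all assumptions made for illustration.

```python
import math
import random


def ucb1(success_probs, n_rounds, seed=0):
    """Run UCB1 on simulated arms and return how often each arm was pulled.

    Each arm stands for a hypothetical reaction condition; a pull simulates
    one experiment on a random substrate that succeeds with that arm's
    (normally unknown) probability.
    """
    rng = random.Random(seed)
    n_arms = len(success_probs)
    counts = [0] * n_arms    # number of experiments run per condition
    totals = [0.0] * n_arms  # summed rewards per condition
    for t in range(1, n_rounds + 1):
        if t <= n_arms:
            arm = t - 1  # initialisation: try every condition once
        else:
            # Pick the arm with the highest mean reward plus exploration bonus.
            arm = max(
                range(n_arms),
                key=lambda a: totals[a] / counts[a]
                + math.sqrt(2 * math.log(t) / counts[a]),
            )
        reward = 1.0 if rng.random() < success_probs[arm] else 0.0
        counts[arm] += 1
        totals[arm] += reward
    return counts


# Three hypothetical conditions with different general success rates.
pull_counts = ucb1([0.2, 0.5, 0.9], n_rounds=300)
print(pull_counts)
```

The exploration bonus shrinks as a condition is sampled more often, so the experimental budget tends to concentrate on conditions that succeed across many substrates while the others are still revisited occasionally.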
Affiliation(s)
- Jason Y Wang
  - Department of Chemistry, Princeton University, Princeton, NJ, USA
  - Department of Chemistry and Biochemistry, University of California, Los Angeles, CA, USA
- Jason M Stevens
  - Chemical Process Development, Bristol Myers Squibb, Summit, NJ, USA
- Stavros K Kariofillis
  - Department of Chemistry, Princeton University, Princeton, NJ, USA
  - Department of Chemistry and Biochemistry, University of California, Los Angeles, CA, USA
  - Department of Chemistry, Columbia University, New York, NY, USA
- Mai-Jan Tom
  - Department of Chemistry and Biochemistry, University of California, Los Angeles, CA, USA
- Dung L Golden
  - Chemical Process Development, Bristol Myers Squibb, Summit, NJ, USA
- Jun Li
  - Chemical Process Development, Bristol Myers Squibb, New Brunswick, NJ, USA
- Jose E Tabora
  - Chemical Process Development, Bristol Myers Squibb, New Brunswick, NJ, USA
- Marvin Parasram
  - Department of Chemistry, Princeton University, Princeton, NJ, USA
  - Department of Chemistry, New York University, New York, NY, USA
- Benjamin J Shields
  - Department of Chemistry, Princeton University, Princeton, NJ, USA
  - Molecular Structure and Design, Bristol Myers Squibb, Cambridge, MA, USA
- David N Primer
  - Chemical Process Development, Bristol Myers Squibb, Summit, NJ, USA
  - Loxo Oncology at Lilly, Louisville, CO, USA
- Bo Hao
  - Janssen Research and Development, Spring House, PA, USA
- David Del Valle
  - Chemical Process Development, Bristol Myers Squibb, New Brunswick, NJ, USA
- Stacey DiSomma
  - Chemical Process Development, Bristol Myers Squibb, New Brunswick, NJ, USA
- Ariel Furman
  - Chemical Process Development, Bristol Myers Squibb, New Brunswick, NJ, USA
- G Greg Zipp
  - Discovery Synthesis, Bristol Myers Squibb, Princeton, NJ, USA
- James Paulson
  - Chemical Process Development, Bristol Myers Squibb, New Brunswick, NJ, USA
- Abigail G Doyle
  - Department of Chemistry, Princeton University, Princeton, NJ, USA
  - Department of Chemistry and Biochemistry, University of California, Los Angeles, CA, USA
12. Makarov DM, Lukanov MM, Rusanov AI, Mamardashvili NZ, Ksenofontov AA. Machine learning approach for predicting the yield of pyrroles and dipyrromethanes condensation reactions with aldehydes. Journal of Computational Science 2023; 74:102173. [DOI: 10.1016/j.jocs.2023.102173] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Indexed: 12/16/2024]
13. Wang X, Hsieh CY, Yin X, Wang J, Li Y, Deng Y, Jiang D, Wu Z, Du H, Chen H, Li Y, Liu H, Wang Y, Luo P, Hou T, Yao X. Generic Interpretable Reaction Condition Predictions with Open Reaction Condition Datasets and Unsupervised Learning of Reaction Center. Research (Washington, D.C.) 2023; 6:0231. [PMID: 37849643 PMCID: PMC10578430 DOI: 10.34133/research.0231] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 06/01/2023] [Accepted: 08/29/2023] [Indexed: 10/19/2023]
Abstract
Effective synthesis planning powered by deep learning (DL) can significantly accelerate the discovery of new drugs and materials. However, most DL-assisted synthesis planning methods offer either no or very limited capability to recommend suitable reaction conditions (RCs) for their reaction predictions. Currently, the prediction of RCs with a DL framework is hindered by several factors, including: (a) lack of a standardized dataset for benchmarking, (b) lack of a general prediction model with powerful representation, and (c) lack of interpretability. To address these issues, we first created 2 standardized RC datasets covering a broad range of reaction classes and then proposed a powerful and interpretable Transformer-based RC predictor named Parrot. Through careful design of the model architecture, pretraining method, and training strategy, Parrot improved the overall top-3 prediction accuracy on catalysis, solvents, and other reagents by as much as 13.44%, compared to the best previous model on a newly curated dataset. Additionally, the mean absolute error of the predicted temperatures was reduced by about 4 °C. Furthermore, Parrot manifests strong generalization capacity with superior cross-chemical-space prediction accuracy. Attention analysis indicates that Parrot effectively captures crucial chemical information and exhibits a high level of interpretability in the prediction of RCs. The proposed model Parrot exemplifies how a modern neural network architecture, when appropriately pretrained, can be versatile in making reliable, generalizable, and interpretable recommendations for RCs even when the underlying training dataset may still be limited in diversity.
Affiliation(s)
- Xiaorui Wang
  - Dr. Neher’s Biophysics Laboratory for Innovative Drug Discovery, State Key Laboratory of Quality Research in Chinese Medicine, Macau Institute for Applied Research in Medicine and Health, Macau University of Science and Technology, Macao, 999078, China
  - CarbonSilicon AI Technology Co., Ltd, Hangzhou, Zhejiang 310018, China
- Chang-Yu Hsieh
  - Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou, 310058, China
- Xiaodan Yin
  - Dr. Neher’s Biophysics Laboratory for Innovative Drug Discovery, State Key Laboratory of Quality Research in Chinese Medicine, Macau Institute for Applied Research in Medicine and Health, Macau University of Science and Technology, Macao, 999078, China
  - CarbonSilicon AI Technology Co., Ltd, Hangzhou, Zhejiang 310018, China
- Jike Wang
  - Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou, 310058, China
  - CarbonSilicon AI Technology Co., Ltd, Hangzhou, Zhejiang 310018, China
- Yuquan Li
  - College of Chemistry and Chemical Engineering, Lanzhou University, Lanzhou, 730000, China
- Yafeng Deng
  - CarbonSilicon AI Technology Co., Ltd, Hangzhou, Zhejiang 310018, China
- Dejun Jiang
  - Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou, 310058, China
  - CarbonSilicon AI Technology Co., Ltd, Hangzhou, Zhejiang 310018, China
- Zhenxing Wu
  - Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou, 310058, China
  - CarbonSilicon AI Technology Co., Ltd, Hangzhou, Zhejiang 310018, China
- Hongyan Du
  - Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou, 310058, China
- Hongming Chen
  - Center of Chemistry and Chemical Biology, Guangzhou Regenerative Medicine and Health Guangdong Laboratory, Guangzhou 510530, China
- Yun Li
  - College of Chemistry and Chemical Engineering, Lanzhou University, Lanzhou, 730000, China
- Huanxiang Liu
  - Faculty of Applied Sciences, Macao Polytechnic University, Macao, 999078, China
- Yuwei Wang
  - College of Pharmacy, Shaanxi University of Chinese Medicine, Xianyang, Shaanxi, 712044, China
- Pei Luo
  - Dr. Neher’s Biophysics Laboratory for Innovative Drug Discovery, State Key Laboratory of Quality Research in Chinese Medicine, Macau Institute for Applied Research in Medicine and Health, Macau University of Science and Technology, Macao, 999078, China
- Tingjun Hou
  - Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou, 310058, China
- Xiaojun Yao
  - Faculty of Applied Sciences, Macao Polytechnic University, Macao, 999078, China
|
14
|
Rinehart NI, Saunthwal RK, Wellauer J, Zahrt AF, Schlemper L, Shved AS, Bigler R, Fantasia S, Denmark SE. A machine-learning tool to predict substrate-adaptive conditions for Pd-catalyzed C-N couplings. Science 2023; 381:965-972. [PMID: 37651532 DOI: 10.1126/science.adg2114] [Citation(s) in RCA: 21] [Impact Index Per Article: 10.5] [Received: 12/08/2022] [Accepted: 08/01/2023] [Indexed: 09/02/2023]
Abstract
Machine-learning methods have great potential to accelerate the identification of reaction conditions for chemical transformations. A tool that gives substrate-adaptive conditions for palladium (Pd)-catalyzed carbon-nitrogen (C-N) couplings is presented. The design and construction of this tool required the generation of an experimental dataset that explores a diverse network of reactant pairings across a set of reaction conditions. A large scope of C-N couplings was actively learned by neural network models by using a systematic process to design experiments. The models showed good performance in experimental validation: Ten products were isolated in more than 85% yield from a range of couplings with out-of-sample reactants designed to challenge the models. Importantly, the developed workflow continually improves the prediction capability of the tool as the corpus of data grows.
Affiliation(s)
- N Ian Rinehart
  - Roger Adams Laboratory, Department of Chemistry, University of Illinois at Urbana-Champaign, Urbana, IL 61801, USA
- Rakesh K Saunthwal
  - Roger Adams Laboratory, Department of Chemistry, University of Illinois at Urbana-Champaign, Urbana, IL 61801, USA
- Joël Wellauer
  - Pharmaceutical Division, Synthetic Molecules Technical Development, Process Chemistry and Catalysis, F. Hoffmann-La Roche, Ltd., Basel, Switzerland
- Andrew F Zahrt
  - Roger Adams Laboratory, Department of Chemistry, University of Illinois at Urbana-Champaign, Urbana, IL 61801, USA
- Lukas Schlemper
  - Pharmaceutical Division, Synthetic Molecules Technical Development, Process Chemistry and Catalysis, F. Hoffmann-La Roche, Ltd., Basel, Switzerland
- Alexander S Shved
  - Roger Adams Laboratory, Department of Chemistry, University of Illinois at Urbana-Champaign, Urbana, IL 61801, USA
- Raphael Bigler
  - Pharmaceutical Division, Synthetic Molecules Technical Development, Process Chemistry and Catalysis, F. Hoffmann-La Roche, Ltd., Basel, Switzerland
- Serena Fantasia
  - Pharmaceutical Division, Synthetic Molecules Technical Development, Process Chemistry and Catalysis, F. Hoffmann-La Roche, Ltd., Basel, Switzerland
- Scott E Denmark
  - Roger Adams Laboratory, Department of Chemistry, University of Illinois at Urbana-Champaign, Urbana, IL 61801, USA
|
15
|
Shim E, Tewari A, Cernak T, Zimmerman PM. Machine Learning Strategies for Reaction Development: Toward the Low-Data Limit. J Chem Inf Model 2023; 63:3659-3668. [PMID: 37312524 PMCID: PMC11163943 DOI: 10.1021/acs.jcim.3c00577] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Indexed: 06/15/2023]
Abstract
Machine learning models are increasingly being utilized to predict outcomes of organic chemical reactions. A large amount of reaction data is used to train these models, which is in stark contrast to how expert chemists discover and develop new reactions by leveraging information from a small number of relevant transformations. Transfer learning and active learning are two strategies that can operate in low-data situations, which may help fill this gap and promote the use of machine learning for tackling real-world challenges in organic synthesis. This Perspective introduces active and transfer learning and connects these to potential opportunities and directions for further research, especially in the area of prospective development of chemical transformations.
Affiliation(s)
- Eunjae Shim
  - Department of Chemistry, University of Michigan, Ann Arbor, Michigan 48109, United States
- Ambuj Tewari
  - Department of Statistics, University of Michigan, Ann Arbor, Michigan 48109, United States
  - Department of Electrical Engineering and Computer Science, University of Michigan, Ann Arbor, Michigan 48109, United States
- Tim Cernak
  - Department of Chemistry, University of Michigan, Ann Arbor, Michigan 48109, United States
  - Department of Medicinal Chemistry, University of Michigan, Ann Arbor, Michigan 48109, United States
- Paul M Zimmerman
  - Department of Chemistry, University of Michigan, Ann Arbor, Michigan 48109, United States
|
16
|
Faurschou NV, Taaning RH, Pedersen CM. Substrate specific closed-loop optimization of carbohydrate protective group chemistry using Bayesian optimization and transfer learning. Chem Sci 2023; 14:6319-6329. [PMID: 37325141 PMCID: PMC10266441 DOI: 10.1039/d3sc01261a] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 03/08/2023] [Accepted: 05/12/2023] [Indexed: 06/17/2023] Open
Abstract
A new way of performing reaction optimization within carbohydrate chemistry is presented. This is done by performing closed-loop optimization of regioselective benzoylation of unprotected glycosides using Bayesian optimization. Both 6-O-monobenzoylations and 3,6-O-dibenzoylations of three different monosaccharides are optimized. A novel transfer learning approach, where data from previous optimizations of different substrates is used to speed up the optimizations, has also been developed. The optimal conditions found by the Bayesian optimization algorithm provide new insight into substrate specificity, as the conditions found are significantly different. In most cases, the optimal conditions include Et3N and benzoic anhydride, a new reagent combination for these reactions, discovered by the algorithm, demonstrating the power of this concept to widen the chemical space. Further, the developed procedures include ambient conditions and short reaction times.
|
17
|
Capaldo L, Wen Z, Noël T. A field guide to flow chemistry for synthetic organic chemists. Chem Sci 2023; 14:4230-4247. [PMID: 37123197 PMCID: PMC10132167 DOI: 10.1039/d3sc00992k] [Citation(s) in RCA: 77] [Impact Index Per Article: 38.5] [Received: 02/22/2023] [Accepted: 03/15/2023] [Indexed: 03/17/2023] Open
Abstract
Flow chemistry has unlocked a world of possibilities for the synthetic community, but the idea that it is a mysterious "black box" needs to go. In this review, we show that several of the benefits of microreactor technology can be exploited to push the boundaries in organic synthesis and to unleash unique reactivity and selectivity. By "lifting the veil" on some of the governing principles behind the observed trends, we hope that this review will serve as a useful field guide for those interested in diving into flow chemistry.
Affiliation(s)
- Luca Capaldo
  - Flow Chemistry Group, Van 't Hoff Institute for Molecular Sciences (HIMS), University of Amsterdam, 1098 XH Amsterdam, The Netherlands
- Zhenghui Wen
  - Flow Chemistry Group, Van 't Hoff Institute for Molecular Sciences (HIMS), University of Amsterdam, 1098 XH Amsterdam, The Netherlands
- Timothy Noël
  - Flow Chemistry Group, Van 't Hoff Institute for Molecular Sciences (HIMS), University of Amsterdam, 1098 XH Amsterdam, The Netherlands
|
18
|
Chen Y, Ou Y, Zheng P, Huang Y, Ge F, Dral PO. Benchmark of general-purpose machine learning-based quantum mechanical method AIQM1 on reaction barrier heights. J Chem Phys 2023; 158:074103. [PMID: 36813722 DOI: 10.1063/5.0137101] [Citation(s) in RCA: 5] [Impact Index Per Article: 2.5] [Indexed: 02/01/2023] Open
Abstract
Artificial intelligence-enhanced quantum mechanical method 1 (AIQM1) is a general-purpose method that was shown to achieve high accuracy for many applications with a speed close to its baseline semiempirical quantum mechanical (SQM) method ODM2*. Here, we evaluate the hitherto unknown performance of out-of-the-box AIQM1 without any refitting for reaction barrier heights on eight datasets, including a total of ∼24 thousand reactions. This evaluation shows that AIQM1's accuracy strongly depends on the type of transition state and ranges from excellent for rotation barriers to poor for, e.g., pericyclic reactions. AIQM1 clearly outperforms its baseline ODM2* method and, even more so, a popular universal potential, ANI-1ccx. Overall, however, AIQM1 accuracy largely remains similar to SQM methods (and B3LYP/6-31G* for most reaction types) suggesting that it is desirable to focus on improving AIQM1 performance for barrier heights in the future. We also show that the built-in uncertainty quantification helps in identifying confident predictions. The accuracy of confident AIQM1 predictions is approaching the level of popular density functional theory methods for most reaction types. Encouragingly, AIQM1 is rather robust for transition state optimizations, even for the type of reactions it struggles with the most. Single-point calculations with high-level methods on AIQM1-optimized geometries can be used to significantly improve barrier heights, which cannot be said for its baseline ODM2* method.
Affiliation(s)
- Yuxinxin Chen
  - State Key Laboratory of Physical Chemistry of Solid Surfaces, Fujian Provincial Key Laboratory of Theoretical and Computational Chemistry, Department of Chemistry, and College of Chemistry and Chemical Engineering, Xiamen University, Xiamen 361005, China
- Yanchi Ou
  - State Key Laboratory of Physical Chemistry of Solid Surfaces, Fujian Provincial Key Laboratory of Theoretical and Computational Chemistry, Department of Chemistry, and College of Chemistry and Chemical Engineering, Xiamen University, Xiamen 361005, China
- Peikun Zheng
  - State Key Laboratory of Physical Chemistry of Solid Surfaces, Fujian Provincial Key Laboratory of Theoretical and Computational Chemistry, Department of Chemistry, and College of Chemistry and Chemical Engineering, Xiamen University, Xiamen 361005, China
- Yaohuang Huang
  - State Key Laboratory of Physical Chemistry of Solid Surfaces, Fujian Provincial Key Laboratory of Theoretical and Computational Chemistry, Department of Chemistry, and College of Chemistry and Chemical Engineering, Xiamen University, Xiamen 361005, China
- Fuchun Ge
  - State Key Laboratory of Physical Chemistry of Solid Surfaces, Fujian Provincial Key Laboratory of Theoretical and Computational Chemistry, Department of Chemistry, and College of Chemistry and Chemical Engineering, Xiamen University, Xiamen 361005, China
- Pavlo O Dral
  - State Key Laboratory of Physical Chemistry of Solid Surfaces, Fujian Provincial Key Laboratory of Theoretical and Computational Chemistry, Department of Chemistry, and College of Chemistry and Chemical Engineering, Xiamen University, Xiamen 361005, China
|
19
|
Singh S, Sunoj RB. Molecular Machine Learning for Chemical Catalysis: Prospects and Challenges. Acc Chem Res 2023; 56:402-412. [PMID: 36715248 DOI: 10.1021/acs.accounts.2c00801] [Citation(s) in RCA: 20] [Impact Index Per Article: 10.0] [Indexed: 01/31/2023]
Abstract
Conspectus
In the domain of reaction development, one aims to obtain higher efficacies as measured in terms of yield and/or selectivities. During the empirical cycles, a mixture of outcomes from low to high yields/selectivities is expected. While it is not easy to identify all of the factors that might impact the reaction efficiency, a complex and nonlinear dependence on the nature of reactants, catalysts, solvents, etc. is quite likely. Developmental stages of newer reactions would typically offer a few hundred samples with variations in participating molecules and/or reaction conditions. These "observations" and their "output" can be harnessed as valuable labeled data for developing molecular machine learning (ML) models. Once a robust ML model is built for a specific reaction under development, it can predict the reaction outcome for any new choice of substrates/catalyst in a few seconds/minutes and thus can expedite the identification of promising candidates for experimental validation. Recent years have witnessed impressive applications of ML in the molecular world, most of them aimed at predicting important chemical or biological properties. We believe that an integration of effective ML workflows can be made richly beneficial to reaction discovery.
As with any new technology, direct adaptation of ML as used in well-developed domains, such as natural language processing (NLP) and image recognition, is unlikely to succeed in reaction discovery. Some of the challenges stem from ineffective featurization of the molecular space, the unavailability of quality data and its distribution, and the difficulty of choosing the right ML model and deploying it robustly. It should be noted that there is no universal ML model suitable for an inherently high-dimensional problem such as chemical reactions. Given this background, rendering ML tools conducive for reactions is both an exciting and a challenging endeavor. With the increased availability of efficient ML algorithms, we focused on tapping their potential for small-data reaction discovery (a few hundred to thousands of samples).
In this Account, we describe both feature engineering and feature learning approaches for molecular ML as applied to diverse reactions of high contemporary interest. Among these, catalytic asymmetric hydrogenation of imines/alkenes, β-C(sp³)-H bond functionalization, and relay Heck reaction employed a feature engineering approach using quantum-chemically derived physical organic descriptors as the molecular features, all designed to predict the enantioselectivity. The selection of molecular features to customize the model for a reaction of interest is described, with emphasis on the chemical insights that could be gathered through the use of such features. Feature learning methods for predicting the yield of Buchwald-Hartwig cross-coupling, deoxyfluorination of alcohols, and enantioselectivity of N,S-acetal formation are found to offer excellent predictions. We propose a transfer learning protocol, wherein an ML model such as a language model is trained on a large number of molecules (10^5 to 10^6) and fine-tuned on a focused library of target task reactions, as an effective alternative for small-data reaction discovery (10^2 to 10^3 reactions). The exploitation of deep neural network latent space as a method for generative tasks to identify useful substrates for a reaction is demonstrated as a promising strategy.
Affiliation(s)
- Sukriti Singh
  - Department of Chemistry, Indian Institute of Technology Bombay, Mumbai 400076, India
- Raghavan B Sunoj
  - Department of Chemistry, Indian Institute of Technology Bombay, Mumbai 400076, India
  - Centre for Machine Intelligence and Data Science, Indian Institute of Technology Bombay, Mumbai 400076, India
|
20
|
Seifrid M, Pollice R, Aguilar-Granda A, Morgan Chan Z, Hotta K, Ser CT, Vestfrid J, Wu TC, Aspuru-Guzik A. Autonomous Chemical Experiments: Challenges and Perspectives on Establishing a Self-Driving Lab. Acc Chem Res 2022; 55:2454-2466. [PMID: 35948428 PMCID: PMC9454899 DOI: 10.1021/acs.accounts.2c00220] [Citation(s) in RCA: 49] [Impact Index Per Article: 16.3] [Received: 04/11/2022] [Indexed: 01/19/2023]
Abstract
We must accelerate the pace at which we make technological advancements to address climate change and disease risks worldwide. This swifter pace of discovery requires faster research and development cycles enabled by better integration between hypothesis generation, design, experimentation, and data analysis. Typical research cycles take months to years. However, data-driven automated laboratories, or self-driving laboratories, can significantly accelerate molecular and materials discovery. Recently, substantial advancements have been made in the areas of machine learning and optimization algorithms that have allowed researchers to extract valuable knowledge from multidimensional data sets. Machine learning models can be trained on large data sets from the literature or databases, but their performance can often be hampered by a lack of negative results or metadata. In contrast, data generated by self-driving laboratories can be information-rich, containing precise details of the experimental conditions and metadata. Consequently, much larger amounts of high-quality data are gathered in self-driving laboratories. When placed in open repositories, this data can be used by the research community to reproduce experiments, for more in-depth analysis, or as the basis for further investigation. Accordingly, high-quality open data sets will increase the accessibility and reproducibility of science, which is sorely needed.
In this Account, we describe our efforts to build a self-driving lab for the development of a new class of materials: organic semiconductor lasers (OSLs). Since they have only recently been demonstrated, little is known about the molecular and material design rules for thin-film, electrically-pumped OSL devices as compared to other technologies such as organic light-emitting diodes or organic photovoltaics. To realize high-performing OSL materials, we are developing a flexible system for automated synthesis via iterative Suzuki-Miyaura cross-coupling reactions. This automated synthesis platform is directly coupled to the analysis and purification capabilities. Subsequently, the molecules of interest can be transferred to an optical characterization setup. We are currently limited to optical measurements of the OSL molecules in solution. However, material properties are ultimately most important in the solid state (e.g., as a thin-film device). To that end and for a different scientific goal, we are developing a self-driving lab for inorganic thin-film materials focused on the oxygen evolution reaction.
While the future of self-driving laboratories is very promising, numerous challenges still need to be overcome. These challenges can be split into cognition and motor function. Generally, the cognitive challenges are related to optimization with constraints or unexpected outcomes for which general algorithmic solutions have yet to be developed. A more practical challenge that could be resolved in the near future is that of software control and integration because few instrument manufacturers design their products with self-driving laboratories in mind. Challenges in motor function are largely related to handling heterogeneous systems, such as dispensing solids or performing extractions. As a result, it is critical to understand that adapting experimental procedures that were designed for human experimenters is not as simple as transferring those same actions to an automated system, and there may be more efficient ways to achieve the same goal in an automated fashion. Accordingly, for self-driving laboratories, we need to carefully rethink the translation of manual experimental protocols.
Affiliation(s)
- Martin Seifrid
  - Department of Chemistry, University of Toronto, Toronto, Ontario M5S 3H6, Canada
- Robert Pollice
  - Department of Chemistry, University of Toronto, Toronto, Ontario M5S 3H6, Canada
- Zamyla Morgan Chan
  - Department of Chemistry, University of Toronto, Toronto, Ontario M5S 3H6, Canada
  - Acceleration Consortium, University of Toronto, Toronto, Ontario M5S 3H6, Canada
- Kazuhiro Hotta
  - Department of Chemistry, University of Toronto, Toronto, Ontario M5S 3H6, Canada
  - Science & Innovation Center, Mitsubishi Chemical Corporation, 1000 Kamoshidacho, Aoba, Yokohama, Kanagawa 227-8502, Japan
- Cher Tian Ser
  - Department of Chemistry, University of Toronto, Toronto, Ontario M5S 3H6, Canada
- Jenya Vestfrid
  - Department of Chemistry, University of Toronto, Toronto, Ontario M5S 3H6, Canada
- Tony C. Wu
  - Department of Chemistry, University of Toronto, Toronto, Ontario M5S 3H6, Canada
- Alán Aspuru-Guzik
  - Department of Chemistry, University of Toronto, Toronto, Ontario M5S 3H6, Canada
  - Department of Computer Science, University of Toronto, Toronto, Ontario M5S 3H6, Canada
  - Department of Chemical Engineering & Applied Chemistry, University of Toronto, Toronto, Ontario M5S 3E5, Canada
  - Department of Materials Science, University of Toronto, Toronto, Ontario M5S 3E4, Canada
  - Vector Institute for Artificial Intelligence, Toronto, Ontario M5S 1M1, Canada
  - Lebovic Fellow, Canadian Institute for Advanced Research, Toronto, Ontario M5S 1M1, Canada
|