1
Sigmund LM, Assante M, Johansson MJ, Norrby PO, Jorner K, Kabeshov M. Computational tools for the prediction of site- and regioselectivity of organic reactions. Chem Sci 2025; 16:5383-5412. [PMID: 40070469 PMCID: PMC11891785 DOI: 10.1039/d5sc00541h] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/21/2025] [Accepted: 03/03/2025] [Indexed: 03/14/2025]
Abstract
The regio- and site-selectivity of organic reactions is one of the most important aspects when it comes to synthesis planning. Due to that, massive research efforts were invested into computational models for regio- and site-selectivity prediction, and the introduction of machine learning to the chemical sciences within the past decade has added a whole new dimension to these endeavors. This review article walks through the currently available predictive tools for regio- and site-selectivity with a particular focus on machine learning models while being organized along the individual reaction classes of organic chemistry. Respective featurization techniques and model architectures are described and compared to each other; applications of the tools to critical real-world examples are highlighted. This paper aims to serve as an overview of the field's status quo for both the intended users of the tools, that is synthetic chemists, as well as for developers to find potential new research avenues.
Affiliation(s)
- Lukas M Sigmund
  - Molecular AI, Discovery Sciences, R&D, AstraZeneca Gothenburg Pepparedsleden 1 43183 Mölndal Sweden
- Michele Assante
  - Innovation Centre in Digital Molecular Technologies, Department of Chemistry, University of Cambridge Lensfield Rd Cambridge CB2 1EW UK
  - Compound Synthesis & Management, The Discovery Centre, AstraZeneca Cambridge Cambridge Biomedical Campus, 1 Francis Crick Avenue CB2 0AA Cambridge UK
- Magnus J Johansson
  - Medicinal Chemistry, Research and Early Development, Cardiovascular, Renal and Metabolism (CVRM), BioPharmaceuticals, R&D, AstraZeneca Gothenburg Pepparedsleden 1 43183 Mölndal Sweden
- Per-Ola Norrby
  - Data Science & Modelling, Pharmaceutical Sciences, R&D, AstraZeneca Gothenburg Pepparedsleden 1 43183 Mölndal Sweden
- Kjell Jorner
  - ETH Zürich, Institute of Chemical and Bioengineering, Department of Chemistry and Applied Biosciences Vladimir-Prelog-Weg 1 CH-8093 Zürich Switzerland
  - National Centre of Competence in Research (NCCR) Catalysis, ETH Zurich Zurich Switzerland
- Mikhail Kabeshov
  - Molecular AI, Discovery Sciences, R&D, AstraZeneca Gothenburg Pepparedsleden 1 43183 Mölndal Sweden

2
Hann MM, Keserű GM. The continuing importance of chemical intuition for the medicinal chemist in the era of Artificial Intelligence. Expert Opin Drug Discov 2025; 20:137-140. [PMID: 39810383 DOI: 10.1080/17460441.2025.2450785] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/30/2024] [Revised: 12/15/2024] [Accepted: 01/05/2025] [Indexed: 01/16/2025]
Affiliation(s)
- György M Keserű
  - Drug Innovation Centre, HUN-REN Research Centre for Natural Sciences, Budapest, Hungary

3
Haas BC, Kalyani D, Sigman MS. Applying statistical modeling strategies to sparse datasets in synthetic chemistry. Sci Adv 2025; 11:eadt3013. [PMID: 39742471 DOI: 10.1126/sciadv.adt3013] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/20/2024] [Accepted: 11/20/2024] [Indexed: 01/03/2025]
Abstract
The application of statistical modeling in organic chemistry is emerging as a standard practice for probing structure-activity relationships and as a predictive tool for many optimization objectives. This review is aimed as a tutorial for those entering the area of statistical modeling in chemistry. We provide case studies to highlight the considerations and approaches that can be used to successfully analyze datasets in low data regimes, a common situation encountered given the experimental demands of organic chemistry. Statistical modeling hinges on the data (what is being modeled), descriptors (how data are represented), and algorithms (how data are modeled). Herein, we focus on how various reaction outputs (e.g., yield, rate, selectivity, solubility, stability, and turnover number) and data structures (e.g., binned, heavily skewed, and distributed) influence the choice of algorithm used for constructing predictive and chemically insightful statistical models.
Affiliation(s)
- Brittany C Haas
  - Department of Chemistry, University of Utah, Salt Lake City, UT 84112, USA
- Matthew S Sigman
  - Department of Chemistry, University of Utah, Salt Lake City, UT 84112, USA

4
Pleiss J. Modeling Enzyme Kinetics: Current Challenges and Future Perspectives for Biocatalysis. Biochemistry 2024; 63:2533-2541. [PMID: 39325558 DOI: 10.1021/acs.biochem.4c00501] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 09/28/2024]
Abstract
Biocatalysis is becoming a data science. High-throughput experimentation generates a rapidly increasing stream of biocatalytic data, which is the raw material for mechanistic and novel data-driven modeling approaches for the predictive design of improved biocatalysts and novel bioprocesses. The holistic and molecular understanding of enzymatic reaction systems will enable us to identify and overcome kinetic bottlenecks and shift the thermodynamics of a reaction. The full characterization and modeling of reaction systems is a community effort; therefore, published methods and results should be findable, accessible, interoperable, and reusable (FAIR), which is achieved by developing standardized data exchange formats, by a complete and reproducible documentation of experimentation, by collaborative platforms for developing sustainable software and for analyzing data, and by repositories for publishing results together with raw data. The FAIRification of biocatalysis is a prerequisite to developing highly automated laboratory infrastructures that improve the reproducibility of scientific results and reduce the time and costs required to develop novel synthesis routes.
Affiliation(s)
- Jürgen Pleiss
  - Institute of Biochemistry and Technical Biochemistry, University of Stuttgart, Allmandring 31, 70569 Stuttgart, Germany

5
Slattery A, Wen Z, Tenblad P, Sanjosé-Orduna J, Pintossi D, den Hartog T, Noël T. Automated self-optimization, intensification, and scale-up of photocatalysis in flow. Science 2024; 383:eadj1817. [PMID: 38271529 DOI: 10.1126/science.adj1817] [Citation(s) in RCA: 31] [Impact Index Per Article: 31.0] [Received: 06/11/2023] [Accepted: 12/13/2023] [Indexed: 01/27/2024]
Abstract
The optimization, intensification, and scale-up of photochemical processes constitute a particular challenge in a manufacturing environment geared primarily toward thermal chemistry. In this work, we present a versatile flow-based robotic platform to address these challenges through the integration of readily available hardware and custom software. Our open-source platform combines a liquid handler, syringe pumps, a tunable continuous-flow photoreactor, inexpensive Internet of Things devices, and an in-line benchtop nuclear magnetic resonance spectrometer to enable automated, data-rich optimization with a closed-loop Bayesian optimization strategy. A user-friendly graphical interface allows chemists without programming or machine learning expertise to easily monitor, analyze, and improve photocatalytic reactions with respect to both continuous and discrete variables. The system's effectiveness was demonstrated by increasing overall reaction yields and improving space-time yields compared with those of previously reported processes.
Affiliation(s)
- Aidan Slattery
  - Flow Chemistry Group, van 't Hoff Institute for Molecular Sciences (HIMS), University of Amsterdam, Science Park 904, 1098 XH Amsterdam, Netherlands
- Zhenghui Wen
  - Flow Chemistry Group, van 't Hoff Institute for Molecular Sciences (HIMS), University of Amsterdam, Science Park 904, 1098 XH Amsterdam, Netherlands
- Pauline Tenblad
  - Flow Chemistry Group, van 't Hoff Institute for Molecular Sciences (HIMS), University of Amsterdam, Science Park 904, 1098 XH Amsterdam, Netherlands
- Jesús Sanjosé-Orduna
  - Flow Chemistry Group, van 't Hoff Institute for Molecular Sciences (HIMS), University of Amsterdam, Science Park 904, 1098 XH Amsterdam, Netherlands
- Diego Pintossi
  - Flow Chemistry Group, van 't Hoff Institute for Molecular Sciences (HIMS), University of Amsterdam, Science Park 904, 1098 XH Amsterdam, Netherlands
- Tim den Hartog
  - Flow Chemistry Group, van 't Hoff Institute for Molecular Sciences (HIMS), University of Amsterdam, Science Park 904, 1098 XH Amsterdam, Netherlands
  - Zuyd University of Applied Sciences, Nieuw Eyckholt 300, 6419 DJ Heerlen, Netherlands
  - Netherlands Organisation for Applied Scientific Research (TNO), High Tech Campus 25, 5656 AE Eindhoven, Netherlands
- Timothy Noël
  - Flow Chemistry Group, van 't Hoff Institute for Molecular Sciences (HIMS), University of Amsterdam, Science Park 904, 1098 XH Amsterdam, Netherlands

6
Selingo JD, Greenwood JW, Andrews MK, Patel C, Neel AJ, Pio B, Shevlin M, Phillips EM, Maddess ML, McNally A. A General Strategy for N-(Hetero)arylpiperidine Synthesis Using Zincke Imine Intermediates. J Am Chem Soc 2024; 146:936-945. [PMID: 38153812 DOI: 10.1021/jacs.3c11504] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Indexed: 12/30/2023]
Abstract
Methods to synthesize diverse collections of substituted piperidines are valuable due to the prevalence of this heterocycle in pharmaceutical compounds. Here, we present a general strategy to access N-(hetero)arylpiperidines using a pyridine ring-opening and ring-closing approach via Zincke imine intermediates. This process generates pyridinium salts from a wide variety of substituted pyridines and (heteroaryl)anilines; hydrogenation reactions and nucleophilic additions then access the N-(hetero)arylpiperidine derivatives. We successfully applied high-throughput experimentation (HTE) using pharmaceutically relevant pyridines and (heteroaryl)anilines as inputs and developed a one-pot process using anilines as nucleophiles in the pyridinium salt-forming processes. This strategy is viable for generating piperidine libraries and applications such as the convergent coupling of complex fragments.
Affiliation(s)
- Jake D Selingo
  - Department of Chemistry, Colorado State University, Fort Collins, Colorado 80523, United States
- Jacob W Greenwood
  - Department of Chemistry, Colorado State University, Fort Collins, Colorado 80523, United States
- Mary Katherine Andrews
  - Department of Chemistry, Colorado State University, Fort Collins, Colorado 80523, United States
- Chirag Patel
  - Department of Chemistry, Colorado State University, Fort Collins, Colorado 80523, United States
- Andrew J Neel
  - Department of Process Research and Development, Merck & Company, Incorporated, Boston, Massachusetts 02115, United States
- Barbara Pio
  - Department of Discovery Chemistry, Merck & Co., Inc., Rahway, New Jersey 07065, United States
- Michael Shevlin
  - Department of Process Research and Development, Merck & Co., Inc., Rahway, New Jersey 07065, United States
- Eric M Phillips
  - Department of Process Research and Development, Merck & Co., Inc., Rahway, New Jersey 07065, United States
- Matthew L Maddess
  - Department of Process Research and Development, Merck & Co., Inc., Boston, Massachusetts 02115, United States
- Andrew McNally
  - Department of Chemistry, Colorado State University, Fort Collins, Colorado 80523, United States

7
Raghavan P, Haas BC, Ruos ME, Schleinitz J, Doyle AG, Reisman SE, Sigman MS, Coley CW. Dataset Design for Building Models of Chemical Reactivity. ACS Cent Sci 2023; 9:2196-2204. [PMID: 38161380 PMCID: PMC10755851 DOI: 10.1021/acscentsci.3c01163] [Citation(s) in RCA: 18] [Impact Index Per Article: 9.0] [Received: 09/20/2023] [Revised: 11/06/2023] [Accepted: 11/15/2023] [Indexed: 01/03/2024]
Abstract
Models can codify our understanding of chemical reactivity and serve a useful purpose in the development of new synthetic processes via, for example, evaluating hypothetical reaction conditions or in silico substrate tolerance. Perhaps the most determining factor is the composition of the training data and whether it is sufficient to train a model that can make accurate predictions over the full domain of interest. Here, we discuss the design of reaction datasets in ways that are conducive to data-driven modeling, emphasizing the idea that training set diversity and model generalizability rely on the choice of molecular or reaction representation. We additionally discuss the experimental constraints associated with generating common types of chemistry datasets and how these considerations should influence dataset design and model building.
Affiliation(s)
- Priyanka Raghavan
  - Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, United States
- Brittany C. Haas
  - Department of Chemistry, University of Utah, Salt Lake City, Utah 84112, United States
- Madeline E. Ruos
  - Department of Chemistry & Biochemistry, University of California, Los Angeles, Los Angeles, California 90095, United States
- Jules Schleinitz
  - Division of Chemistry and Chemical Engineering, California Institute of Technology, Pasadena, California 91125, United States
- Abigail G. Doyle
  - Department of Chemistry & Biochemistry, University of California, Los Angeles, Los Angeles, California 90095, United States
- Sarah E. Reisman
  - Division of Chemistry and Chemical Engineering, California Institute of Technology, Pasadena, California 91125, United States
- Matthew S. Sigman
  - Department of Chemistry, University of Utah, Salt Lake City, Utah 84112, United States
- Connor W. Coley
  - Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, United States
  - Department of Electrical Engineering and Computer Science, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, United States

8
Bianchi P, Monbaliu JCM. Revisiting the Paradigm of Reaction Optimization in Flow with a Priori Computational Reaction Intelligence. Angew Chem Int Ed Engl 2023:e202311526. [PMID: 37875458 DOI: 10.1002/anie.202311526] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 08/08/2023] [Revised: 10/21/2023] [Accepted: 10/24/2023] [Indexed: 10/26/2023]
Abstract
The use of micro/meso-fluidic reactors has resulted in both new scenarios for chemistry and new requirements for chemists. Through flow chemistry, large-scale reactions can be performed in drastically reduced reactor sizes and reaction times. This obvious advantage comes with the concomitant challenge of re-designing long-established batch processes to fit these new conditions. The reliance on experimental trial-and-error to perform this translation frequently makes flow chemistry unaffordable, thwarting initial aspirations to revolutionize chemistry. By combining computational chemistry and machine learning, we have developed a model that provides predictive power tailored specifically to flow reactions. We show its applications to translate batch to flow, to provide mechanistic insight, to contribute reagent descriptors, and to synthesize a library of novel compounds in excellent yields after executing a single set of conditions.
Affiliation(s)
- Pauline Bianchi
  - Center for Integrated Technology and Organic Synthesis (CiTOS), MolSys Research Unit, University of Liège, B6a, Room 3/19, Allée du Six Août 13, 4000, Liège (SartTilman), Belgium
- Jean-Christophe M Monbaliu
  - Center for Integrated Technology and Organic Synthesis (CiTOS), MolSys Research Unit, University of Liège, B6a, Room 3/19, Allée du Six Août 13, 4000, Liège (SartTilman), Belgium
  - WEL Research Institute, Avenue Pasteur 6, 1300, Wavre, Belgium

9
Reid JP, Betinol IO, Kuang Y. Mechanism to model: a physical organic chemistry approach to reaction prediction. Chem Commun (Camb) 2023; 59:10711-10721. [PMID: 37552047 DOI: 10.1039/d3cc03229a] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Indexed: 08/09/2023]
Abstract
The application of mechanistic generalizations is at the core of chemical reaction development and application. These strategies are rooted in physical organic chemistry where mechanistic understandings can be derived from one reaction and applied to explain another. Over time these techniques have evolved from rationalizing observed outcomes to leading experimental design through reaction prediction. In parallel, significant progression in asymmetric organocatalysis has expanded the reach of chiral transfer to new reactions with increased efficiency. However, the complex and diverse catalyst structures applied in this arena have rendered the generalization of asymmetric catalytic processes to be exceptionally challenging. Recognizing this, a portion of our research has been focused on understanding the transferability of chemical observations between similar reactions and exploiting this phenomenon as a platform for prediction. Through these experiences, we have relied on a working knowledge of reaction mechanism to guide the development and application of our models which have been advanced from simple qualitative rules to large statistical models for quantitative predictions. In this feature article, we describe the models acquired to generalize organocatalytic reaction mechanisms and demonstrate their use as a powerful approach for accelerating enantioselective synthesis.
Affiliation(s)
- Jolene P Reid
  - Department of Chemistry, University of British Columbia, 2036 Main Mall, Vancouver, British Columbia, V6T 1Z1, Canada.
- Isaiah O Betinol
  - Department of Chemistry, University of British Columbia, 2036 Main Mall, Vancouver, British Columbia, V6T 1Z1, Canada.
- Yutao Kuang
  - Department of Chemistry, University of British Columbia, 2036 Main Mall, Vancouver, British Columbia, V6T 1Z1, Canada.

10
Williams JD, Kappe CO. Self-Optimizing Flow Reactors Get a Boost by Multitasking. ACS Cent Sci 2023; 9:864-866. [PMID: 37252365 PMCID: PMC10214518 DOI: 10.1021/acscentsci.3c00548] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/31/2023]