1. Felten S, He CQ, Emmert MH. C-H Aminoalkylation of 5-Membered Heterocycles: Influence of Descriptors, Data Set Size, and Data Quality on the Predictiveness of Machine Learning Models and Expansion of the Substrate Space Beyond 1,3-Azoles. J Org Chem 2025; 90:2613-2625. [PMID: 39933045 DOI: 10.1021/acs.joc.4c02574]
Abstract
We report a general C-H aminoalkylation of 5-membered heterocycles through a combined machine learning/experimental workflow. Our work describes previously unknown C-H functionalization reactivity and creates a predictive machine learning (ML) model through iterative refinement over 6 rounds of active learning. The initial model established with 1,3-azoles predicts the reactivities of N-aryl indazoles, 1,2,4-triazolopyrazines, 1,2,3-thiadiazoles, and 1,3,4-oxadiazoles, while other substrate classes (e.g., pyrazoles and 1,2,4-triazoles) are not predicted well. The final model includes the reactivities of additional heterocyclic scaffolds in the training data, which results in high predictive accuracy across all of the tested cores. The high prediction performance is shown both within the training set via cross-validation (CV R2 = 0.81) and when predicting unseen substrates of diverse molecular weight and structure (Test R2 = 0.95). The concept of feature engineering is discussed, and we benchmark mechanistically related DFT-based features that are more time-intensive and laborious in comparison with molecular descriptors and fingerprints. Importantly, this work establishes novel reactivity for heterocycles for which C-H functionalization methods are underdeveloped. Since such heterocycles are key motifs in drug discovery and development, we expect this work to be of significant use to the synthetic and synthesis-oriented ML communities.
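The abstract contrasts a cross-validated R2 (computed within the training data) with a test R2 on unseen substrates. As a generic illustration only, the sketch below computes a k-fold CV R2 by pooling out-of-fold predictions from a one-descriptor least-squares model. The single descriptor, the fold count, the data, and the function names (`fit_line`, `r2`, `cv_r2`) are assumptions for illustration and do not reproduce the authors' models or features.

```python
import random


def fit_line(xs, ys):
    """Ordinary least squares for y = a + b*x with a single descriptor x."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    return my - b * mx, b


def r2(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def cv_r2(xs, ys, k=5, seed=0):
    """k-fold CV: fit on k-1 folds, predict the held-out fold, score all predictions."""
    idx = list(range(len(xs)))
    random.Random(seed).shuffle(idx)
    preds = [0.0] * len(xs)
    for fold in range(k):
        held_out = set(idx[fold::k])
        train = [i for i in idx if i not in held_out]
        a, b = fit_line([xs[i] for i in train], [ys[i] for i in train])
        for i in held_out:
            preds[i] = a + b * xs[i]
    return r2(ys, preds)


# Hypothetical data: a yield-like response vs. one descriptor, with small alternating noise.
xs = [float(i) for i in range(20)]
ys = [2.0 * x + 1.0 + (0.5 if i % 2 else -0.5) for i, x in enumerate(xs)]
print(f"CV R2 = {cv_r2(xs, ys):.3f}")
```

Pooling out-of-fold predictions before scoring is one common convention; averaging per-fold R2 values is another, and the two can differ on small data sets.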
Affiliation(s)
- Stephanie Felten
  - Process Research and Development, MRL, Merck & Co., Inc., 126 E Lincoln Ave, Rahway, New Jersey 07065, United States
- Cyndi Qixin He
  - Computational and Structural Chemistry, MRL, Merck & Co., Inc., 126 E Lincoln Ave, Rahway, New Jersey 07065, United States
- Marion H Emmert
  - Process Research and Development, MRL, Merck & Co., Inc., 126 E Lincoln Ave, Rahway, New Jersey 07065, United States
2. Schoepfer A, Laplaza R, Wodrich MD, Waser J, Corminboeuf C. Reaction-Agnostic Featurization of Bidentate Ligands for Bayesian Ridge Regression of Enantioselectivity. ACS Catal 2024; 14:9302-9312. [PMID: 38933467 PMCID: PMC11197013 DOI: 10.1021/acscatal.4c02452] [Received: 04/25/2024] [Revised: 05/22/2024] [Accepted: 05/22/2024]
Abstract
Chiral ligands are important components in asymmetric homogeneous catalysis, but their synthesis and screening can be both time-consuming and resource-intensive. Data-driven approaches, in contrast to screening procedures based on intuition, have the potential to reduce the time and resources needed for reaction optimization by more rapidly identifying an ideal catalyst. These approaches, however, are often nontransferable and cannot be applied across different reactions. To overcome this drawback, we introduce a general featurization strategy for bidentate ligands that is coupled with an automated feature selection pipeline and Bayesian ridge regression to perform multivariate linear regression modeling. This approach, which is applicable to any reaction, incorporates electronic, steric, and topological features (rigidity/flexibility, branching, geometry, and constitution) and is well-suited for early stage ligand optimization. Using only small data sets, our workflow capably predicts the enantioselectivity of four metal-catalyzed asymmetric reactions. Uncertainty estimates provided by Bayesian ridge regression permit the use of Bayesian optimization to efficiently explore pools of prospective ligands. Finally, we constructed the BDL-Cu-2023 data set, composed of 312 bidentate ligands extracted from the Cambridge Structural Database, and screened it with this procedure to identify ligand candidates for a challenging asymmetric oxy-alkynylation reaction.
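The link between Bayesian ridge regression and Bayesian optimization is that the model returns a predictive standard deviation as well as a mean. A minimal sketch, assuming a single descriptor, fixed prior and noise precisions (`alpha`, `beta`), and an upper-confidence-bound acquisition rule, is shown below. Unlike scikit-learn's `BayesianRidge`, which estimates its precisions from the data, this toy keeps them fixed; the data and all names are hypothetical and not taken from the authors' pipeline.

```python
import math


def posterior(xs, ys, alpha=1e-2, beta=25.0):
    """Gaussian posterior over weights (w0, w1) for y = w0 + w1*x.

    alpha: prior precision on the weights (assumed, fixed).
    beta:  noise precision, i.e. 1 / noise variance (assumed, fixed).
    """
    # Posterior precision A = alpha*I + beta * Phi^T Phi, where Phi has rows [1, x].
    a00 = alpha + beta * len(xs)
    a01 = beta * sum(xs)
    a11 = alpha + beta * sum(x * x for x in xs)
    det = a00 * a11 - a01 * a01
    s00, s01, s11 = a11 / det, -a01 / det, a00 / det  # S = A^-1 (2x2 inverse)
    # Posterior mean m = beta * S * Phi^T y.
    t0 = beta * sum(ys)
    t1 = beta * sum(x * y for x, y in zip(xs, ys))
    mean = (s00 * t0 + s01 * t1, s01 * t0 + s11 * t1)
    return mean, (s00, s01, s11)


def predict(x, mean, cov, beta=25.0):
    """Predictive mean and standard deviation at descriptor value x."""
    s00, s01, s11 = cov
    mu = mean[0] + mean[1] * x
    var = 1.0 / beta + s00 + 2.0 * s01 * x + s11 * x * x  # noise + phi^T S phi
    return mu, math.sqrt(var)


def pick_next(pool, xs, ys, kappa=2.0, alpha=1e-2, beta=25.0):
    """Upper-confidence-bound acquisition: choose the candidate maximizing mu + kappa*sigma."""
    mean, cov = posterior(xs, ys, alpha, beta)

    def ucb(x):
        mu, sd = predict(x, mean, cov, beta)
        return mu + kappa * sd

    return max(pool, key=ucb)


# Hypothetical screened ligands: one descriptor value each and a measured selectivity score.
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0, 3.0, 5.0, 7.0, 9.0]
pool = [2.5, 6.0, 10.0]  # unscreened candidates (descriptor values only)
print(pick_next(pool, xs, ys))
```

Here `pool` stands in for descriptor values of unscreened ligands; raising `kappa` favors ligands whose predictions are most uncertain, while lowering it favors the highest predicted selectivity.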
Affiliation(s)
- Alexandre A. Schoepfer
  - Laboratory for Computational Molecular Design, Institute of Chemical Sciences and Engineering, École Polytechnique Fédérale de Lausanne (EPFL), 1015 Lausanne, Switzerland
  - Laboratory of Catalysis and Organic Synthesis, Institute of Chemical Sciences and Engineering, École Polytechnique Fédérale de Lausanne (EPFL), 1015 Lausanne, Switzerland
  - National Center for Competence in Research-Catalysis (NCCR-Catalysis), École Polytechnique Fédérale de Lausanne, 1015 Lausanne, Switzerland
- Ruben Laplaza
  - Laboratory for Computational Molecular Design, Institute of Chemical Sciences and Engineering, École Polytechnique Fédérale de Lausanne (EPFL), 1015 Lausanne, Switzerland
  - National Center for Competence in Research-Catalysis (NCCR-Catalysis), École Polytechnique Fédérale de Lausanne, 1015 Lausanne, Switzerland
- Matthew D. Wodrich
  - Laboratory for Computational Molecular Design, Institute of Chemical Sciences and Engineering, École Polytechnique Fédérale de Lausanne (EPFL), 1015 Lausanne, Switzerland
  - National Center for Competence in Research-Catalysis (NCCR-Catalysis), École Polytechnique Fédérale de Lausanne, 1015 Lausanne, Switzerland
- Jerome Waser
  - Laboratory of Catalysis and Organic Synthesis, Institute of Chemical Sciences and Engineering, École Polytechnique Fédérale de Lausanne (EPFL), 1015 Lausanne, Switzerland
  - National Center for Competence in Research-Catalysis (NCCR-Catalysis), École Polytechnique Fédérale de Lausanne, 1015 Lausanne, Switzerland
- Clemence Corminboeuf
  - Laboratory for Computational Molecular Design, Institute of Chemical Sciences and Engineering, École Polytechnique Fédérale de Lausanne (EPFL), 1015 Lausanne, Switzerland
  - National Center for Competence in Research-Catalysis (NCCR-Catalysis), École Polytechnique Fédérale de Lausanne, 1015 Lausanne, Switzerland
3. Schnitzer T, Schnurr M, Zahrt AF, Sakhaee N, Denmark SE, Wennemers H. Machine Learning to Develop Peptide Catalysts-Successes, Limitations, and Opportunities. ACS Cent Sci 2024; 10:367-373. [PMID: 38435528 PMCID: PMC10906243 DOI: 10.1021/acscentsci.3c01284] [Received: 10/17/2023] [Revised: 01/02/2024] [Accepted: 01/03/2024]
Abstract
Peptides have been established as modular catalysts for various transformations. Still, the vast number of potential amino acid building blocks renders the identification of peptides with desired catalytic activity challenging. Here, we develop a machine-learning workflow for the optimization of peptide catalysts. First, in a hypothetical competition, we challenged our workflow to identify peptide catalysts for the conjugate addition reaction of aldehydes to nitroolefins and compared the performance of the predicted structures with those optimized in our laboratory. On the basis of the positive results, we established a universal training set (UTS) containing 161 catalysts to sample an in silico library of ∼30,000 tripeptide members. Finally, we challenged our machine learning strategy to identify a member of the library as a stereoselective catalyst for an annulation reaction that has not been catalyzed by a peptide thus far. We conclude with a comparison of data-driven versus expert-knowledge-guided peptide catalyst optimization.
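The size of such an in silico library follows from simple combinatorics: a tripeptide library is the Cartesian product of the building blocks allowed at each position. The sketch below enumerates a small library from placeholder residue sets (hypothetical, not the paper's). For scale, 31 options per position would give 31 × 31 × 31 = 29,791 members, the same order as the ∼30,000-member library described; the actual building-block counts are not stated here.

```python
import math
from itertools import product

# Hypothetical building blocks allowed at each position (placeholders, not the paper's sets).
POSITION_SETS = [
    ["Pro", "D-Pro"],
    ["Pro", "Gly", "Ala"],
    ["Glu", "Asp", "Gln"],
]


def enumerate_tripeptides(position_sets):
    """Every sequence formed by picking one building block per position."""
    return ["-".join(combo) for combo in product(*position_sets)]


def library_size(counts_per_position):
    """Library size is the product of the number of choices at each position."""
    return math.prod(counts_per_position)


library = enumerate_tripeptides(POSITION_SETS)
print(len(library), library[:3])
print(library_size([31, 31, 31]))  # 31**3
```

Because the size grows multiplicatively, exhaustive synthesis and testing quickly becomes impractical, which is why a model trained on a much smaller subset is used to rank the full library.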
Affiliation(s)
- Tobias Schnitzer
  - Laboratory of Organic Chemistry, ETH Zurich, D-CHAB, Vladimir-Prelog-Weg 3, 8093 Zurich, Switzerland
- Martin Schnurr
  - Laboratory of Organic Chemistry, ETH Zurich, D-CHAB, Vladimir-Prelog-Weg 3, 8093 Zurich, Switzerland
- Andrew F. Zahrt
  - Roger Adams Laboratory, Department of Chemistry, University of Illinois, Urbana, Illinois 61801, United States
- Nader Sakhaee
  - Roger Adams Laboratory, Department of Chemistry, University of Illinois, Urbana, Illinois 61801, United States
- Scott E. Denmark
  - Roger Adams Laboratory, Department of Chemistry, University of Illinois, Urbana, Illinois 61801, United States
- Helma Wennemers
  - Laboratory of Organic Chemistry, ETH Zurich, D-CHAB, Vladimir-Prelog-Weg 3, 8093 Zurich, Switzerland
4. van Dijk L, Haas BC, Lim NK, Clagg K, Dotson JJ, Treacy SM, Piechowicz KA, Roytman VA, Zhang H, Toste FD, Miller SJ, Gosselin F, Sigman MS. Data Science-Enabled Palladium-Catalyzed Enantioselective Aryl-Carbonylation of Sulfonimidamides. J Am Chem Soc 2023; 145:20959-20967. [PMID: 37656964 DOI: 10.1021/jacs.3c06674]
Abstract
New methods for the general asymmetric synthesis of sulfonimidamides are of great interest due to their applications in medicinal chemistry, agrochemical discovery, and academic research. We report a palladium-catalyzed cross-coupling method for the enantioselective aryl-carbonylation of sulfonimidamides. Using data science techniques, a virtual library of calculated bisphosphine ligand descriptors was used to guide reaction optimization by effectively sampling the catalyst chemical space. The optimized conditions identified using this approach provided the desired product in excellent yield and enantioselectivity. As the next step, a data science-driven strategy was also used to explore a diverse set of aryl and heteroaryl iodides, providing key information about the scope and limitations of the method. Furthermore, we tested a range of racemic sulfonimidamides to assess the compatibility of this coupling partner. The developed method offers a general and efficient strategy for accessing enantioenriched sulfonimidamides, which should facilitate their application in industrial and academic settings.
Affiliation(s)
- Lucy van Dijk
  - Department of Chemistry, University of Utah, Salt Lake City, Utah 84112, United States
- Brittany C Haas
  - Department of Chemistry, University of Utah, Salt Lake City, Utah 84112, United States
- Ngiap-Kie Lim
  - Department of Small Molecule Process Chemistry, Genentech, Inc., South San Francisco, California 94080, United States
- Kyle Clagg
  - Department of Small Molecule Process Chemistry, Genentech, Inc., South San Francisco, California 94080, United States
- Jordan J Dotson
  - Department of Small Molecule Process Chemistry, Genentech, Inc., South San Francisco, California 94080, United States
- Sean M Treacy
  - Department of Chemistry, University of California, Berkeley, California 94720, United States
- Katarzyna A Piechowicz
  - Department of Small Molecule Process Chemistry, Genentech, Inc., South San Francisco, California 94080, United States
- Vladislav A Roytman
  - Department of Chemistry, University of California, Berkeley, California 94720, United States
- Haiming Zhang
  - Department of Small Molecule Process Chemistry, Genentech, Inc., South San Francisco, California 94080, United States
- F Dean Toste
  - Department of Chemistry, University of California, Berkeley, California 94720, United States
- Scott J Miller
  - Department of Chemistry, Yale University, New Haven, Connecticut 06511, United States
- Francis Gosselin
  - Department of Small Molecule Process Chemistry, Genentech, Inc., South San Francisco, California 94080, United States
- Matthew S Sigman
  - Department of Chemistry, University of Utah, Salt Lake City, Utah 84112, United States
5. Li B, Su S, Zhu C, Lin J, Hu X, Su L, Yu Z, Liao K, Chen H. A deep learning framework for accurate reaction prediction and its application on high-throughput experimentation data. J Cheminform 2023; 15:72. [PMID: 37568183 PMCID: PMC10422736 DOI: 10.1186/s13321-023-00732-w] [Received: 07/20/2022] [Accepted: 06/30/2023]
Abstract
In recent years, artificial intelligence (AI) has begun to bring revolutionary changes to chemical synthesis. However, the lack of suitable ways of representing chemical reactions and the scarcity of reaction data have limited the wider application of AI to reaction prediction. Here, we introduce a novel reaction representation, GraphRXN, for reaction prediction. It utilizes a universal graph-based neural network framework to encode chemical reactions by directly taking two-dimensional reaction structures as inputs. The GraphRXN model was evaluated on three publicly available chemical reaction datasets and gave on-par or superior results compared with other baseline models. To further evaluate the effectiveness of GraphRXN, wet-lab experiments were carried out to generate reaction data. A GraphRXN model was then built on the high-throughput experimentation data and achieved decent accuracy (R2 of 0.712) on our in-house data. This highlights that the GraphRXN model can be deployed in an integrated workflow that combines robotics and AI technologies for forward reaction prediction.
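To make "encoding reactions from 2D structures with a graph neural network" concrete, here is a toy, untrained sketch: each molecule is a list of atom feature vectors plus a bond list, atoms repeatedly add a weighted sum of their neighbors' features, a sum over atoms gives a molecule vector, and a reaction vector is taken as products minus reactants. The fixed weights, the one-hot atom features, the toy molecules, and the difference readout are assumptions for illustration; GraphRXN's actual architecture and learned parameters are not reproduced.

```python
def message_pass(feats, bonds, rounds=2, w_self=1.0, w_nbr=0.5):
    """Toy neighbor aggregation with fixed (untrained) weights.

    Each round updates every atom as h_v <- w_self*h_v + w_nbr*sum(h_u for neighbors u).
    """
    n = len(feats)
    nbrs = [[] for _ in range(n)]
    for u, v in bonds:
        nbrs[u].append(v)
        nbrs[v].append(u)
    h = [list(f) for f in feats]
    for _ in range(rounds):
        # The right-hand side reads the previous round's h before it is replaced.
        h = [[w_self * h[v][k] + w_nbr * sum(h[u][k] for u in nbrs[v])
              for k in range(len(h[v]))]
             for v in range(n)]
    return h


def embed(mol):
    """Molecule vector = sum of atom vectors after message passing (atom-order invariant)."""
    feats, bonds = mol
    return [sum(col) for col in zip(*message_pass(feats, bonds))]


def reaction_vector(reactants, products):
    """Reaction vector = summed product embeddings minus summed reactant embeddings."""
    r = [sum(col) for col in zip(*(embed(m) for m in reactants))]
    p = [sum(col) for col in zip(*(embed(m) for m in products))]
    return [pi - ri for pi, ri in zip(p, r)]


# One-hot atom features [C, O]; hypothetical heavy-atom skeletons given as (features, bonds).
C, O = [1.0, 0.0], [0.0, 1.0]
mol_co = ([C, O], [(0, 1)])               # C-O
mol_cco = ([C, C, O], [(0, 1), (1, 2)])   # C-C-O
print(embed(mol_co))
print(reaction_vector([mol_co], [mol_cco]))
```

Because the readout sums over atoms, relabeling the atoms of a molecule leaves its vector unchanged, a property graph models rely on when taking 2D structures directly as input.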
Affiliation(s)
- Baiqing Li
  - Guangzhou Laboratory, Guangzhou, 510005, Guangdong, China
- Shimin Su
  - Guangzhou Laboratory, Guangzhou, 510005, Guangdong, China
- Chan Zhu
  - Guangzhou Laboratory, Guangzhou, 510005, Guangdong, China
- Jie Lin
  - Guangzhou Laboratory, Guangzhou, 510005, Guangdong, China
- Xinyue Hu
  - Guangzhou Laboratory, Guangzhou, 510005, Guangdong, China
- Lebin Su
  - Guangzhou Laboratory, Guangzhou, 510005, Guangdong, China
- Zhunzhun Yu
  - Guangzhou Laboratory, Guangzhou, 510005, Guangdong, China
- Kuangbiao Liao
  - Guangzhou Laboratory, Guangzhou, 510005, Guangdong, China
- Hongming Chen
  - Guangzhou Laboratory, Guangzhou, 510005, Guangdong, China
6. Rose BT, Timmerman JC, Bawel SA, Chin S, Zhang H, Denmark SE. High-Level Data Fusion Enables the Chemoinformatically Guided Discovery of Chiral Disulfonimide Catalysts for Atropselective Iodination of 2-Amino-6-arylpyridines. J Am Chem Soc 2022; 144:22950-22964. [DOI: 10.1021/jacs.2c08820]
Affiliation(s)
- Brennan T. Rose
  - Department of Chemistry, University of Illinois at Urbana-Champaign, 600 South Mathews Avenue, Urbana, Illinois 61801, United States
- Jacob C. Timmerman
  - Department of Small Molecule Process Chemistry, Genentech, Inc., 1 DNA Way, South San Francisco, California 94080, United States
- Seth A. Bawel
  - Department of Chemistry, University of Illinois at Urbana-Champaign, 600 South Mathews Avenue, Urbana, Illinois 61801, United States
- Steven Chin
  - Department of Small Molecule Process Chemistry, Genentech, Inc., 1 DNA Way, South San Francisco, California 94080, United States
- Haiming Zhang
  - Department of Small Molecule Process Chemistry, Genentech, Inc., 1 DNA Way, South San Francisco, California 94080, United States
- Scott E. Denmark
  - Department of Chemistry, University of Illinois at Urbana-Champaign, 600 South Mathews Avenue, Urbana, Illinois 61801, United States
7. Gensch T, Smith SR, Colacot TJ, Timsina YN, Xu G, Glasspoole BW, Sigman MS. Design and Application of a Screening Set for Monophosphine Ligands in Cross-Coupling. ACS Catal 2022. [DOI: 10.1021/acscatal.2c01970]
Affiliation(s)
- Tobias Gensch
  - Department of Chemistry, TU Berlin, Straße des 17. Juni 135, Sekr. C2, 10623 Berlin, Germany
- Sleight R. Smith
  - Department of Chemistry, University of Utah, 315 South 1400 East, Salt Lake City, Utah 84112, United States
- Thomas J. Colacot
  - MilliporeSigma, 6000 N. Teutonia Ave, Milwaukee, Wisconsin 53209, United States
- Yam N. Timsina
  - MilliporeSigma, 6000 N. Teutonia Ave, Milwaukee, Wisconsin 53209, United States
- Guolin Xu
  - MilliporeSigma, 6000 N. Teutonia Ave, Milwaukee, Wisconsin 53209, United States
- Ben W. Glasspoole
  - MilliporeSigma, 6000 N. Teutonia Ave, Milwaukee, Wisconsin 53209, United States
- Matthew S. Sigman
  - Department of Chemistry, University of Utah, 315 South 1400 East, Salt Lake City, Utah 84112, United States
8. Haas BC, Goetz AE, Bahamonde A, McWilliams JC, Sigman MS. Predicting relative efficiency of amide bond formation using multivariate linear regression. Proc Natl Acad Sci U S A 2022; 119:e2118451119. [PMID: 35412905 PMCID: PMC9169781 DOI: 10.1073/pnas.2118451119] [Received: 10/07/2021] [Accepted: 02/09/2022]
Abstract
Amides are ubiquitous in biologically active natural products and commercial drugs. The most common strategy for introducing this functional group is the coupling of a carboxylic acid with an amine, which requires the use of a coupling reagent to facilitate elimination of water. However, the optimal reaction conditions often appear rather arbitrary and specific to each reaction. Herein, we report the development of statistical models correlating measured rates to physical organic descriptors to enable the prediction of reaction rates for untested carboxylic acid/amine pairs. The key to the success of this endeavor was the development of an end-to-end data science–based workflow to select a set of coupling partners that are appropriately distributed in chemical space to facilitate statistical model development. By using a parameterization, dimensionality reduction, and clustering protocol, a training set was identified. Reaction rates for a range of carboxylic acid and primary alkyl amine couplings utilizing carbonyldiimidazole (CDI) as the coupling reagent were measured. The collected rates span five orders of magnitude, confirming that the designed training set encompasses a wide range of chemical space necessary for effective model development. Regressing these rates with high-level density functional theory (DFT) descriptors allowed for identification of a statistical model wherein the molecular features of the carboxylic acid are primarily responsible for the observed rates. Finally, out-of-sample amide couplings are used to determine the limitations and effectiveness of the model.
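The last step of a parameterization, dimensionality reduction, and clustering protocol can be sketched as: cluster candidate coupling partners in descriptor space and keep the real candidate nearest each cluster center, so the training set spans the space. The sketch below is a plain k-means with a naive first-k initialization; the descriptor coordinates, `k`, and function names are hypothetical, dimensionality reduction (e.g., PCA) is omitted, and it is not the authors' exact protocol.

```python
import math


def kmeans(points, k, iters=100):
    """Plain Lloyd's k-means; naive deterministic init from the first k points (assumption)."""
    centroids = [list(p) for p in points[:k]]
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda j: math.dist(p, centroids[j]))
            clusters[nearest].append(p)
        updated = [
            [sum(col) / len(cl) for col in zip(*cl)] if cl else centroids[j]
            for j, cl in enumerate(clusters)
        ]
        if updated == centroids:  # centroids stopped moving
            break
        centroids = updated
    return centroids


def select_training_set(points, k):
    """Return the real candidate closest to each k-means centroid."""
    return [min(points, key=lambda p: math.dist(p, c)) for c in kmeans(points, k)]


# Hypothetical 2D descriptor coordinates (e.g. after dimensionality reduction) for six acids.
acids = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (10.0, 10.0), (10.1, 10.0), (10.0, 10.1)]
print(select_training_set(acids, 2))
```

In practice, k-means++ seeding or several random restarts would be more robust than the naive start used here.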
Affiliation(s)
- Brittany C. Haas
  - Department of Chemistry, University of Utah, Salt Lake City, UT 84112
- Adam E. Goetz
  - Chemical Research and Development, Groton Laboratories, Pfizer Worldwide Research and Development, Groton, CT 06340
- Ana Bahamonde
  - Department of Chemistry, University of Utah, Salt Lake City, UT 84112
- J. Christopher McWilliams
  - Chemical Research and Development, Groton Laboratories, Pfizer Worldwide Research and Development, Groton, CT 06340
- Matthew S. Sigman
  - Department of Chemistry, University of Utah, Salt Lake City, UT 84112
9. Betinol IO, Reid JP. A predictive and mechanistic statistical modelling workflow for improving decision making in organic synthesis and catalysis. Org Biomol Chem 2022; 20:6012-6018. [PMID: 35389396 DOI: 10.1039/d2ob00272h]
Abstract
Multivariate linear regression models have been widely used as a strategy to streamline the reaction optimization process. While these tools likely provide relatively safe predictions, embedding a method for forecasting the probability of achieving the desired reaction outcome would be valuable for identifying promising structures with the best chance of success. Herein, we present a workflow that predicts the probability that a reaction will be successful and is easy and quick to apply. We show that this probabilistic framework can effectively differentiate between predictions often indistinguishable by multivariate linear regression analysis. Moreover, these techniques can enhance the development of mechanistically informative correlations by producing more direct pathways for molecular development and design. Overall, we anticipate this protocol will be generally applicable and useful for accelerating successful chemical discovery.
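One simple way to turn a regression prediction into a probability of success is to treat the true outcome as normally distributed around the prediction with a standard error σ and compute the probability that it clears a target threshold. This is an assumed formulation for illustration, not necessarily the workflow reported in the paper; the candidates, threshold, and σ values below are hypothetical.

```python
import math


def prob_success(y_pred, sigma, threshold):
    """P(outcome >= threshold), assuming outcome ~ Normal(y_pred, sigma**2)."""
    z = (threshold - y_pred) / sigma
    # 1 - Phi(z), with the standard normal CDF Phi written via math.erf.
    return 0.5 * (1.0 - math.erf(z / math.sqrt(2.0)))


# Hypothetical candidates: (predicted outcome, standard error of the prediction).
candidates = {"A": (1.9, 0.2), "B": (2.0, 0.8)}
target = 1.8  # hypothetical success threshold, same units as the prediction
for name, (pred, se) in candidates.items():
    print(name, round(prob_success(pred, se, target), 3))
```

With these illustrative numbers, candidate A is more likely to clear the threshold even though B has the higher point prediction, the kind of distinction that point predictions alone do not reveal.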
Affiliation(s)
- Isaiah O Betinol
  - Department of Chemistry, University of British Columbia, Vancouver, British Columbia V6T 1Z1, Canada
- Jolene P Reid
  - Department of Chemistry, University of British Columbia, Vancouver, British Columbia V6T 1Z1, Canada
10. Nandy A, Duan C, Taylor MG, Liu F, Steeves AH, Kulik HJ. Computational Discovery of Transition-metal Complexes: From High-throughput Screening to Machine Learning. Chem Rev 2021; 121:9927-10000. [PMID: 34260198 DOI: 10.1021/acs.chemrev.1c00347]
Abstract
Transition-metal complexes are attractive targets for the design of catalysts and functional materials. The behavior of the metal-organic bond, while very tunable for achieving target properties, is challenging to predict and necessitates searching a wide and complex space to identify needles in haystacks for target applications. This review will focus on the techniques that make high-throughput search of transition-metal chemical space feasible for the discovery of complexes with desirable properties. The review will cover the development, promise, and limitations of "traditional" computational chemistry (i.e., force field, semiempirical, and density functional theory methods) as it pertains to data generation for inorganic molecular discovery. The review will also discuss the opportunities and limitations in leveraging experimental data sources. We will focus on how advances in statistical modeling, artificial intelligence, multiobjective optimization, and automation accelerate discovery of lead compounds and design rules. The overall objective of this review is to showcase how bringing together advances from diverse areas of computational chemistry and computer science has enabled the rapid uncovering of structure-property relationships in transition-metal chemistry. We aim to highlight how unique considerations in motifs of metal-organic bonding (e.g., variable spin and oxidation state, and bonding strength/nature) set them and their discovery apart from more commonly considered organic molecules. We will also highlight how uncertainty and relative data scarcity in transition-metal chemistry motivate specific developments in machine learning representations, model training, and in computational chemistry. Finally, we will conclude with an outlook of areas of opportunity for the accelerated discovery of transition-metal complexes.
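Multiobjective optimization in a high-throughput screen typically ends in a Pareto front: the candidates that no other candidate beats on every objective at once. A minimal sketch, assuming all objectives are to be maximized (negate any that should be minimized) and using hypothetical complex labels and scores:

```python
def dominates(a, b):
    """True if a is at least as good as b on every objective and strictly better on one (maximizing)."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))


def pareto_front(candidates):
    """Names of candidates not dominated by any other candidate."""
    return [
        name
        for name, objs in candidates.items()
        if not any(dominates(other, objs) for o, other in candidates.items() if o != name)
    ]


# Hypothetical complexes scored on two objectives to maximize (e.g. a target property and stability).
scores = {"cplx1": (1.0, 5.0), "cplx2": (3.0, 3.0), "cplx3": (2.0, 2.0), "cplx4": (5.0, 1.0)}
print(pareto_front(scores))
```

This pairwise check is quadratic in the number of candidates, which is fine for illustration; large virtual screens would typically use a sort-based non-dominated filter instead.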
Affiliation(s)
- Aditya Nandy
  - Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, United States
  - Department of Chemistry, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, United States
- Chenru Duan
  - Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, United States
  - Department of Chemistry, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, United States
- Michael G Taylor
  - Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, United States
- Fang Liu
  - Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, United States
- Adam H Steeves
  - Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, United States
- Heather J Kulik
  - Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, United States
11. Rinehart NI, Zahrt AF, Henle JJ, Denmark SE. Dreams, False Starts, Dead Ends, and Redemption: A Chronicle of the Evolution of a Chemoinformatic Workflow for the Optimization of Enantioselective Catalysts. Acc Chem Res 2021; 54:2041-2054. [PMID: 33856771 DOI: 10.1021/acs.accounts.0c00826]
Abstract
Catalyst design in enantioselective catalysis has historically been driven by empiricism. In this endeavor, experimentalists attempt to qualitatively identify trends in structure that lead to a desired catalyst function. In this body of work, we lay the groundwork for an improved, alternative workflow that uses quantitative methods to inform decision making at every step of the process. At the outset, we define a library of synthetically accessible permutations of a catalyst scaffold with the philosophy that the library contains every potential catalyst we are willing to make. To represent these chiral molecules, we have developed general 3D representations, which can be calculated for tens of thousands of structures. This defines the total chemical space of a given catalyst scaffold; it is constructed on the basis of catalyst structure only without regard to a specific reaction or mechanism. As such, any algorithmic subset selection method, which is unsupervised (i.e., only considers catalyst structure), should provide an ideal initial screening set for any new reaction that can be catalyzed by that scaffold. Notably, because of this design strategy, the same set of catalysts can be used for any reaction that can be catalyzed with that parent catalyst scaffold. These are tested experimentally, and statistical learning tools can be used to create a model relating catalyst structure to catalyst function. Further, this model can be used to predict the performance of each catalyst candidate in the greater database of virtual catalyst candidates. In this way, it is possible to estimate the performance of tens of thousands of catalysts by experimentally testing a smaller subset. Using error assessment metrics, it is possible to understand the confidence in new predictions. An experimentalist using this tool can balance the predicted results (reward) with the prediction confidence (risk) when deciding which catalysts to synthesize next in an optimization campaign.
These catalysts are synthesized and tested experimentally. At this stage, either the optimization is a success or the predicted values were incorrect and further optimization is required. In the case of the latter, the information can be fed back into the statistical learning model to refine the model, and this iterative process can be used to determine the optimal catalyst. In this body of work, we not only establish this workflow but quantitatively establish how best to execute each step. Herein, we evaluate several 3D molecular representations to determine how best to represent molecules. Several selection protocols are examined to best decide which set of molecules can be used to represent the library of interest. In addition, the number of reactions needed to make accurate, statistical learning models is evaluated. Taken together, these components establish a tool ready to progress from the development stage to the utility stage. As such, current research endeavors focus on applying these tools to optimize new reactions.
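The unsupervised subset selection step can be illustrated with a Kennard-Stone-style maximin rule: start from the two most distant candidates, then repeatedly add the candidate farthest from everything already chosen. The descriptor vectors, the Euclidean metric, and the choice of this particular algorithm are assumptions for illustration; the Account compares several selection protocols, and this sketch is one representative, not necessarily the one the authors favor.

```python
import math
from itertools import combinations


def kennard_stone(points, n):
    """Maximin subset selection in the style of Kennard-Stone; returns indices into points."""
    if n >= len(points):
        return list(range(len(points)))
    # Seed with the two most distant candidates.
    i, j = max(combinations(range(len(points)), 2),
               key=lambda pair: math.dist(points[pair[0]], points[pair[1]]))
    chosen = [i, j]
    remaining = [k for k in range(len(points)) if k not in chosen]
    while len(chosen) < n:
        # Add the candidate whose nearest already-chosen neighbor is farthest away.
        best = max(remaining,
                   key=lambda k: min(math.dist(points[k], points[c]) for c in chosen))
        chosen.append(best)
        remaining.remove(best)
    return chosen[:n]


# Hypothetical descriptor vectors for a small virtual catalyst library.
library = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (3.0, 0.0), (10.0, 0.0)]
print(kennard_stone(library, 3))
```

Because the rule uses only descriptors, the same subset can be reused as an initial screening set for any reaction of the scaffold, which is the property the abstract emphasizes.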
Affiliation(s)
- N. Ian Rinehart
  - Roger Adams Laboratory, Department of Chemistry, University of Illinois, Urbana, Illinois 61801, United States
- Andrew F. Zahrt
  - Roger Adams Laboratory, Department of Chemistry, University of Illinois, Urbana, Illinois 61801, United States
- Jeremy J. Henle
  - Roger Adams Laboratory, Department of Chemistry, University of Illinois, Urbana, Illinois 61801, United States
- Scott E. Denmark
  - Roger Adams Laboratory, Department of Chemistry, University of Illinois, Urbana, Illinois 61801, United States