1
Pasquini M, Stenta M. LinChemIn: SynGraph-a data model and a toolkit to analyze and compare synthetic routes. J Cheminform 2023; 15:41. [PMID: 37005691; PMCID: PMC10067316; DOI: 10.1186/s13321-023-00714-y] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/12/2022] [Accepted: 03/20/2023] [Indexed: 04/04/2023] [Open Access]
Abstract
BACKGROUND The increasing amount of chemical reaction data makes traditional ways to navigate its corpus less effective, while the demand for novel approaches and instruments is rising. Recent data science and machine learning techniques support the development of new ways to extract value from the available reaction data. On the one hand, Computer-Aided Synthesis Planning tools can predict synthetic routes in a model-driven approach; on the other hand, experimental routes can be extracted from the Network of Organic Chemistry, in which reaction data are linked in a network. In this context, the need to combine, compare and analyze synthetic routes generated by different sources arises naturally. RESULTS Here we present LinChemIn, a Python toolkit that allows chemoinformatics operations on synthetic routes and reaction networks. Wrapping some third-party packages for handling graph arithmetic and chemoinformatics and implementing new data models and functionalities, LinChemIn allows the interconversion between data formats and data models and enables route-level analysis and operations, including route comparison and descriptor calculation. Object-Oriented Design principles inspire the software architecture, and the modules are structured to maximize code reusability and support code testing and refactoring. The code structure should facilitate external contributions, thus encouraging open and collaborative software development. CONCLUSIONS The current version of LinChemIn allows users to combine synthetic routes generated from various tools and analyze them, and constitutes an open and extensible framework capable of incorporating contributions from the community and fostering scientific discussion. Our roadmap envisages the development of sophisticated metrics for route evaluation, a multi-parameter scoring system, and the implementation of an entire "ecosystem" of functionalities operating on synthetic routes.
LinChemIn is freely available at https://github.com/syngenta/linchemin.
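To make "route comparison" concrete, here is a minimal sketch in plain Python. It does not use the LinChemIn API: the names `make_step`, `route_similarity` and `merge_routes`, the toy molecule labels, and the choice of Jaccard similarity over reaction steps are all illustrative assumptions.

```python
from typing import FrozenSet, Iterable, Set, Tuple

# One reaction step: (set of reactant labels, product label).
# Hypothetical representation, not the SynGraph data model itself.
Step = Tuple[FrozenSet[str], str]
Route = Set[Step]


def make_step(reactants: Iterable[str], product: str) -> Step:
    """Build a hashable reaction step so routes can be handled as sets."""
    return (frozenset(reactants), product)


def route_similarity(a: Route, b: Route) -> float:
    """Jaccard similarity over reaction steps: shared steps / all steps."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def merge_routes(*routes: Route) -> Route:
    """Union of several routes, i.e. a small reaction network."""
    merged: Route = set()
    for route in routes:
        merged |= route
    return merged


# Two toy routes to the same target "T" that share their first step.
route_a = {make_step(["A", "B"], "C"), make_step(["C", "D"], "T")}
route_b = {make_step(["A", "B"], "C"), make_step(["C", "E"], "T")}

print(route_similarity(route_a, route_b))   # one shared step out of three
print(len(merge_routes(route_a, route_b)))  # number of distinct steps
```

Treating a route as a set of hashable steps keeps comparison and merging to ordinary set operations; a real toolkit would likely canonicalize molecules first so that equivalent structures compare equal.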
Affiliation(s)
- Marta Pasquini
- Syngenta Crop Protection AG, Schaffhauserstrasse, 4332, Stein, AG, Switzerland
- Marco Stenta
- Syngenta Crop Protection AG, Schaffhauserstrasse, 4332, Stein, AG, Switzerland

2
Venkatasubramanian V, Mann V. Artificial intelligence in reaction prediction and chemical synthesis. Curr Opin Chem Eng 2022. [DOI: 10.1016/j.coche.2021.100749] [Citation(s) in RCA: 7] [Impact Index Per Article: 3.5] [Indexed: 12/20/2022]

3
Zahoránszky-Kőhalmi G, Lysov N, Vorontcov I, Wang J, Soundararajan J, Metaxotos D, Mathew B, Sarosh R, Michael SG, Godfrey AG. Algorithm for the Pruning of Synthesis Graphs. J Chem Inf Model 2022; 62:2226-2238. [PMID: 35438992; PMCID: PMC9093600; DOI: 10.1021/acs.jcim.1c01202] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/28/2022]
Abstract
Synthesis route planning is at the core of the chemical intelligence that will power autonomous chemistry platforms. In this task, we rely on algorithms to generate possible synthesis routes with the help of retro- and forward-synthetic approaches. Generated synthesis routes can be merged into a synthesis graph which represents theoretical pathways to the target molecule. However, it is often required to modify a synthesis graph due to typical constraints. These constraints might include "undesirable substances", e.g., an intermediate that the chemist does not favor or substances that might be toxic. Consequently, we need to prune the synthesis graph by the elimination of such undesirable substances. Synthesis graphs can be represented as directed (not necessarily acyclic) bipartite graphs, and the pruning of such graphs in the light of a set of undesirable substances has been an open question. In this study, we present the Synthesis Graph Pruning (SGP) algorithm that addresses this question. The input to the SGP algorithm is a synthesis graph and a set of undesirable substances. Furthermore, information for substances is provided as metadata regarding their availability from the inventory. The SGP algorithm operates with a simple local rule set, in order to determine which nodes and edges need to be eliminated from the synthesis graph. In this study, we present the SGP algorithm in detail and provide several case studies that demonstrate the operation of the SGP algorithm. We believe that the SGP algorithm will be an essential component of computer-aided synthesis planning.
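The pruning problem described above can be illustrated with a small sketch. This is not the published SGP rule set, which the abstract does not spell out; it is a simplified stand-in that assumes two rules: a reaction is dropped if it touches an undesirable substance, and a reaction is kept only if all of its reactants are either in the inventory or made by another kept reaction. The function name `prune_synthesis_graph` and the toy graph are hypothetical.

```python
from typing import Dict, Iterable, Set, Tuple

# Bipartite synthesis graph as: reaction name -> (reactants, products).
Reactions = Dict[str, Tuple[Set[str], Set[str]]]


def prune_synthesis_graph(
    reactions: Reactions,
    inventory: Iterable[str],
    undesirable: Iterable[str],
) -> Tuple[Reactions, Set[str]]:
    """Return the reactions that survive pruning and the obtainable substances.

    Builds the result up from the inventory (a least fixed point), so a
    cycle of reactions cannot keep itself alive without real inputs.
    """
    banned = set(undesirable)
    obtainable = set(inventory) - banned
    kept: Reactions = {}
    changed = True
    while changed:
        changed = False
        for name, (reactants, products) in reactions.items():
            if name in kept:
                continue
            if (reactants | products) & banned:
                continue  # reaction touches an undesirable substance
            if reactants <= obtainable:
                kept[name] = (reactants, products)
                obtainable |= products
                changed = True
    return kept, obtainable


# Toy graph: r2 needs the undesirable "X"; r4 needs "E", which is unavailable.
toy = {
    "r1": ({"A", "B"}, {"C"}),
    "r2": ({"C", "X"}, {"T"}),
    "r3": ({"C", "D"}, {"T"}),
    "r4": ({"E"}, {"D"}),
}
kept, obtainable = prune_synthesis_graph(toy, inventory={"A", "B", "D", "X"}, undesirable={"X"})
print(sorted(kept))
print("T" in obtainable)
```

Growing the kept set forward from the inventory, rather than deleting nodes from the full graph, is a design choice made here so that directed cycles are handled without special cases.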
Affiliation(s)
- Nikita Lysov
- National Center for Advancing Translational Sciences, Rockville, Maryland 20850, United States
- Ilia Vorontcov
- National Center for Advancing Translational Sciences, Rockville, Maryland 20850, United States
- Jeffrey Wang
- National Center for Advancing Translational Sciences, Rockville, Maryland 20850, United States
- Jeyaraman Soundararajan
- National Center for Advancing Translational Sciences, Rockville, Maryland 20850, United States
- Dimitrios Metaxotos
- National Center for Advancing Translational Sciences, Rockville, Maryland 20850, United States
- Biju Mathew
- National Center for Advancing Translational Sciences, Rockville, Maryland 20850, United States
- Rafat Sarosh
- National Center for Advancing Translational Sciences, Rockville, Maryland 20850, United States
- Samuel G Michael
- National Center for Advancing Translational Sciences, Rockville, Maryland 20850, United States
- Alexander G Godfrey
- National Center for Advancing Translational Sciences, Rockville, Maryland 20850, United States

4
Warr WA, Nicklaus MC, Nicolaou CA, Rarey M. Exploration of Ultralarge Compound Collections for Drug Discovery. J Chem Inf Model 2022; 62:2021-2034. [PMID: 35421301; DOI: 10.1021/acs.jcim.2c00224] [Citation(s) in RCA: 35] [Impact Index Per Article: 17.5] [Indexed: 02/07/2023]
Abstract
Designing new medicines more cheaply and quickly is tightly linked to the quest of exploring chemical space more widely and efficiently. Chemical space is monumentally large, but recent advances in computer software and hardware have enabled researchers to navigate virtual chemical spaces containing billions of chemical structures. This review specifically concerns collections of many millions or even billions of enumerated chemical structures as well as even larger chemical spaces that are not fully enumerated. We present examples of chemical libraries and spaces and the means used to construct them, and we discuss new technologies for searching huge libraries and for searching combinatorially in chemical space. We also cover space navigation techniques and consider new approaches to de novo drug design and the impact of the "autonomous laboratory" on synthesis of designed compounds. Finally, we summarize some other challenges and opportunities for the future.
Affiliation(s)
- Wendy A Warr
- Wendy Warr & Associates, 6 Berwick Court, Holmes Chapel, Crewe, Cheshire CW4 7HZ, United Kingdom
- Marc C Nicklaus
- NCI, NIH, CADD Group, NCI-Frederick, Frederick, Maryland 21702, United States
- Christos A Nicolaou
- Discovery Chemistry, Lilly Research Laboratories, Eli Lilly and Company, Indianapolis, Indiana 46285, United States
- Matthias Rarey
- Universität Hamburg, ZBH Center for Bioinformatics, 20146 Hamburg, Germany

5
Hardy MA, Nan B, Wiest O, Sarpong R. Strategic elements in computer-assisted retrosynthesis: A case study of the pupukeanane natural products. Tetrahedron 2022; 104:132584. [PMID: 36743342; PMCID: PMC9893929; DOI: 10.1016/j.tet.2021.132584] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Indexed: 02/07/2023]
Abstract
Computer-assisted synthesis planning represents a growing area of research, especially for complex molecule synthesis. Here, we present a case study involving the pupukeanane natural products, which are complex, marine-derived, natural products with unique tricyclic scaffolds. Proposed routes to members of each skeletal class informed by pathways generated using the program Synthia™ are compared to previous syntheses of these molecules. In addition, novel synthesis routes are proposed to pupukeanane congeners that have not been prepared previously.
Affiliation(s)
- Melissa A. Hardy
- Department of Chemistry, University of California, Berkeley, CA, 94720, United States
- Bozhao Nan
- Department of Chemistry & Biochemistry, University of Notre Dame, Notre Dame, IN, 46556, United States
- Olaf Wiest
- Department of Chemistry & Biochemistry, University of Notre Dame, Notre Dame, IN, 46556, United States
- Corresponding author. (O. Wiest)
- Richmond Sarpong
- Department of Chemistry, University of California, Berkeley, CA, 94720, United States
- Corresponding author. (R. Sarpong)

6
Zabolotna Y, Volochnyuk DM, Ryabukhin SV, Horvath D, Gavrilenko KS, Marcou G, Moroz YS, Oksiuta O, Varnek A. A Close-up Look at the Chemical Space of Commercially Available Building Blocks for Medicinal Chemistry. J Chem Inf Model 2021; 62:2171-2185. [PMID: 34928600; DOI: 10.1021/acs.jcim.1c00811] [Citation(s) in RCA: 24] [Impact Index Per Article: 8.0] [Indexed: 12/14/2022]
Abstract
The ability to efficiently synthesize desired compounds can be a limiting factor for chemical space exploration in drug discovery. This ability is conditioned not only by the existence of well-studied synthetic protocols but also by the availability of corresponding reagents, so-called building blocks (BBs). In this work, we present a detailed analysis of the chemical space of 400 000 purchasable BBs. The chemical space was defined by corresponding synthons, the fragments contributed to the final molecules upon reaction. They allow an analysis of BB physicochemical properties and diversity, unbiased by the leaving and protective groups in actual reagents. The main classes of BBs were analyzed in terms of their availability, rule-of-two-defined quality, and diversity. Available BBs were eventually compared to a reference set of biologically relevant synthons derived from ChEMBL fragmentation, in order to illustrate how well they cover the actual medicinal chemistry needs. This was performed on a newly constructed universal generative topographic map of synthon chemical space that enables visualization of both libraries and analysis of their overlapped and library-specific regions.
Affiliation(s)
- Yuliana Zabolotna
- University of Strasbourg, Laboratoire de Chemoinformatique, 4, rue B. Pascal, Strasbourg 67081, France
- Dmitriy M Volochnyuk
- Institute of Organic Chemistry, National Academy of Sciences of Ukraine, Murmanska Street 5, Kyiv 02660, Ukraine; Enamine Ltd., 78 Chervonotkatska str., 02660 Kiev, Ukraine
- Sergey V Ryabukhin
- The Institute of High Technologies, Kyiv National Taras Shevchenko University, 64 Volodymyrska Street, Kyiv 01601, Ukraine; Enamine Ltd., 78 Chervonotkatska str., 02660 Kiev, Ukraine
- Dragos Horvath
- University of Strasbourg, Laboratoire de Chemoinformatique, 4, rue B. Pascal, Strasbourg 67081, France
- Konstantin S Gavrilenko
- Research-And-Education ChemBioCenter, National Taras Shevchenko University of Kyiv, Chervonotkatska str., 61, 03022 Kiev, Ukraine; Enamine Ltd., 78 Chervonotkatska str., 02660 Kiev, Ukraine
- Gilles Marcou
- University of Strasbourg, Laboratoire de Chemoinformatique, 4, rue B. Pascal, Strasbourg 67081, France
- Yurii S Moroz
- Research-And-Education ChemBioCenter, National Taras Shevchenko University of Kyiv, Chervonotkatska str., 61, 03022 Kiev, Ukraine; Chemspace, Chervonotkatska Street 78, 02094 Kyiv, Ukraine
- Oleksandr Oksiuta
- Institute of Organic Chemistry, National Academy of Sciences of Ukraine, Murmanska Street 5, Kyiv 02660, Ukraine; Chemspace, Chervonotkatska Street 78, 02094 Kyiv, Ukraine
- Alexandre Varnek
- University of Strasbourg, Laboratoire de Chemoinformatique, 4, rue B. Pascal, Strasbourg 67081, France; Institute for Chemical Reaction Design and Discovery (WPI-ICReDD), Hokkaido University, Kita 21 Nishi 10, Kita-ku, 001-0021 Sapporo, Japan

7
Szymkuć S, Badowski T, Grzybowski BA. Is Organic Chemistry Really Growing Exponentially? Angew Chem Int Ed Engl 2021. [DOI: 10.1002/ange.202111540] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Indexed: 11/06/2022]
Affiliation(s)
- Sara Szymkuć
- Institute of Organic Chemistry, Polish Academy of Sciences, Ul. Kasprzaka 44/52, 01-224, Warsaw, Poland
- Allchemy, Inc., Highland, IN, USA
- Tomasz Badowski
- Institute of Organic Chemistry, Polish Academy of Sciences, Ul. Kasprzaka 44/52, 01-224, Warsaw, Poland
- Allchemy, Inc., Highland, IN, USA
- Bartosz A. Grzybowski
- Institute of Organic Chemistry, Polish Academy of Sciences, Ul. Kasprzaka 44/52, 01-224, Warsaw, Poland
- Allchemy, Inc., Highland, IN, USA
- IBS Center for Soft and Living Matter and Department of Chemistry, UNIST, 50, UNIST-gil, Eonyang-eup, Ulju-gun, Ulsan, South Korea

8
Szymkuć S, Badowski T, Grzybowski BA. Is Organic Chemistry Really Growing Exponentially? Angew Chem Int Ed Engl 2021; 60:26226-26232. [PMID: 34558168; DOI: 10.1002/anie.202111540] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 08/25/2021] [Indexed: 11/05/2022]
Abstract
In terms of molecules and specific reaction examples, organic chemistry features an impressive, exponential growth. However, new reaction classes/types that fuel this growth are being discovered at a much slower and only linear (or even sublinear) rate. The proportion of newly discovered reaction types to all reactions being performed keeps decreasing, suggesting that synthetic chemistry becomes more reliant on reusing the well-known methods. The newly discovered chemistries are more complex than decades ago and allow for the rapid construction of complex scaffolds in fewer numbers of steps. We study these and other trends in the function of time, reaction-type popularity and complexity based on the algorithm that extracts generalized reaction class templates. These analyses are useful in the context of computer-assisted synthesis, machine learning (to estimate the numbers of models with sufficient reaction statistics), and identifying erroneous entries in reaction databases.
Affiliation(s)
- Sara Szymkuć
- Institute of Organic Chemistry, Polish Academy of Sciences, Ul. Kasprzaka 44/52, 01-224, Warsaw, Poland; Allchemy, Inc., Highland, IN, USA
- Tomasz Badowski
- Institute of Organic Chemistry, Polish Academy of Sciences, Ul. Kasprzaka 44/52, 01-224, Warsaw, Poland; Allchemy, Inc., Highland, IN, USA
- Bartosz A Grzybowski
- Institute of Organic Chemistry, Polish Academy of Sciences, Ul. Kasprzaka 44/52, 01-224, Warsaw, Poland; Allchemy, Inc., Highland, IN, USA; IBS Center for Soft and Living Matter and Department of Chemistry, UNIST, 50, UNIST-gil, Eonyang-eup, Ulju-gun, Ulsan, South Korea

9
Tse EG, Aithani L, Anderson M, Cardoso-Silva J, Cincilla G, Conduit GJ, Galushka M, Guan D, Hallyburton I, Irwin BWJ, Kirk K, Lehane AM, Lindblom JCR, Lui R, Matthews S, McCulloch J, Motion A, Ng HL, Öeren M, Robertson MN, Spadavecchio V, Tatsis VA, van Hoorn WP, Wade AD, Whitehead TM, Willis P, Todd MH. An Open Drug Discovery Competition: Experimental Validation of Predictive Models in a Series of Novel Antimalarials. J Med Chem 2021; 64:16450-16463. [PMID: 34748707; DOI: 10.1021/acs.jmedchem.1c00313] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Indexed: 01/20/2023]
Abstract
The Open Source Malaria (OSM) consortium is developing compounds that kill the human malaria parasite, Plasmodium falciparum, by targeting PfATP4, an essential ion pump on the parasite surface. The structure of PfATP4 has not been determined. Here, we describe a public competition created to develop a predictive model for the identification of PfATP4 inhibitors, thereby reducing project costs associated with the synthesis of inactive compounds. Competition participants could see all entries as they were submitted. In the final round, featuring private sector entrants specializing in machine learning methods, the best-performing models were used to predict novel inhibitors, of which several were synthesized and evaluated against the parasite. Half possessed biological activity, with one featuring a motif that the human chemists familiar with this series would have dismissed as "ill-advised". Since all data and participant interactions remain in the public domain, this research project "lives" and may be improved by others.
Affiliation(s)
- Edwin G Tse
- School of Pharmacy, University College London, London WC1N 1AX, U.K.
- Laksh Aithani
- Exscientia Ltd., The Schrödinger Building, Oxford Science Park, Oxford OX4 4GE, U.K.
- Mark Anderson
- Drug Discovery Unit, Division of Biological Chemistry and Drug Discovery, School of Life Sciences, University of Dundee, Dundee DD1 5EH, U.K.
- Jonathan Cardoso-Silva
- Department of Informatics, Faculty of Natural and Mathematical Sciences, King's College London, London WC2B 4BG, U.K.
- Gareth J Conduit
- Intellegens Ltd., Eagle Labs, Chesterton Road, Cambridge CB4 3AZ, U.K.; Theory of Condensed Matter Group, Cavendish Laboratories, University of Cambridge, Cambridge CB3 0HE, U.K.
- Davy Guan
- School of Medical Sciences, The University of Sydney, Sydney, NSW 2006, Australia
- Irene Hallyburton
- Drug Discovery Unit, Division of Biological Chemistry and Drug Discovery, School of Life Sciences, University of Dundee, Dundee DD1 5EH, U.K.
- Benedict W J Irwin
- Theory of Condensed Matter Group, Cavendish Laboratories, University of Cambridge, Cambridge CB3 0HE, U.K.; Optibrium Ltd., Blenheim House, Denny End Road, Cambridge CB25 9QE, U.K.
- Kiaran Kirk
- Research School of Biology, Australian National University, Canberra, ACT 2601, Australia
- Adele M Lehane
- Research School of Biology, Australian National University, Canberra, ACT 2601, Australia
- Julia C R Lindblom
- Research School of Biology, Australian National University, Canberra, ACT 2601, Australia
- Raymond Lui
- School of Medical Sciences, The University of Sydney, Sydney, NSW 2006, Australia
- Slade Matthews
- School of Medical Sciences, The University of Sydney, Sydney, NSW 2006, Australia
- James McCulloch
- Kellerberrin, 6 Wharf Rd, Balmain, Sydney, NSW 2041, Australia
- Alice Motion
- School of Chemistry, The University of Sydney, Sydney, NSW 2006, Australia
- Ho Leung Ng
- Department of Biochemistry and Molecular Biophysics, Kansas State University, Manhattan, Kansas 66506, United States
- Mario Öeren
- Optibrium Ltd., Blenheim House, Denny End Road, Cambridge CB25 9QE, U.K.
- Murray N Robertson
- Strathclyde Institute of Pharmacy and Biomedical Sciences, University of Strathclyde, Glasgow G4 0RE, U.K.
- Vasileios A Tatsis
- Exscientia Ltd., The Schrödinger Building, Oxford Science Park, Oxford OX4 4GE, U.K.
- Willem P van Hoorn
- Exscientia Ltd., The Schrödinger Building, Oxford Science Park, Oxford OX4 4GE, U.K.
- Alexander D Wade
- Theory of Condensed Matter Group, Cavendish Laboratories, University of Cambridge, Cambridge CB3 0HE, U.K.
- Paul Willis
- Medicines for Malaria Venture, PO Box 1826, 20 rte de Pre-Bois, 1215 Geneva 15, Switzerland
- Matthew H Todd
- School of Pharmacy, University College London, London WC1N 1AX, U.K.

10
Vaucher AC, Schwaller P, Geluykens J, Nair VH, Iuliano A, Laino T. Inferring experimental procedures from text-based representations of chemical reactions. Nat Commun 2021; 12:2573. [PMID: 33958589; PMCID: PMC8102565; DOI: 10.1038/s41467-021-22951-1] [Citation(s) in RCA: 13] [Impact Index Per Article: 4.3] [Received: 11/15/2020] [Accepted: 04/07/2021] [Indexed: 11/19/2022] [Open Access]
Abstract
The experimental execution of chemical reactions is a context-dependent and time-consuming process, often solved using the experience collected over multiple decades of laboratory work or searching similar, already executed, experimental protocols. Although data-driven schemes, such as retrosynthetic models, are becoming established technologies in synthetic organic chemistry, the conversion of proposed synthetic routes to experimental procedures remains a burden on the shoulder of domain experts. In this work, we present data-driven models for predicting the entire sequence of synthesis steps starting from a textual representation of a chemical equation, for application in batch organic chemistry. We generated a data set of 693,517 chemical equations and associated action sequences by extracting and processing experimental procedure text from patents, using state-of-the-art natural language models. We used the attained data set to train three different models: a nearest-neighbor model based on recently-introduced reaction fingerprints, and two deep-learning sequence-to-sequence models based on the Transformer and BART architectures. An analysis by a trained chemist revealed that the predicted action sequences are adequate for execution without human intervention in more than 50% of the cases.
Affiliation(s)
- Anna Iuliano
- Dipartimento di Chimica e Chimica Industriale, Università di Pisa, Pisa, Italy

11
12
Daley SK, Cordell GA. Natural Products, the Fourth Industrial Revolution, and the Quintuple Helix. Nat Prod Commun 2021. [DOI: 10.1177/1934578x211003029] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 01/17/2023] [Open Access]
Abstract
The profound interconnectedness of the sciences and technologies embodied in the Fourth Industrial Revolution is discussed in terms of the global role of natural products, and how that interplays with the development of sustainable and climate-conscious practices of cyberecoethnopharmacolomics within the Quintuple Helix for the promotion of a healthier planet and society.
Affiliation(s)
- Geoffrey A. Cordell
- Natural Products Inc., Evanston, IL, USA
- Department of Pharmaceutics, College of Pharmacy, University of Florida, Gainesville, FL, USA

13
Maser MR, Cui AY, Ryou S, DeLano TJ, Yue Y, Reisman SE. Multilabel Classification Models for the Prediction of Cross-Coupling Reaction Conditions. J Chem Inf Model 2021; 61:156-166. [PMID: 33417449; DOI: 10.1021/acs.jcim.0c01234] [Citation(s) in RCA: 22] [Impact Index Per Article: 7.3] [Indexed: 12/14/2022]
Abstract
Machine-learned ranking models have been developed for the prediction of substrate-specific cross-coupling reaction conditions. Data sets of published reactions were curated for Suzuki, Negishi, and C-N couplings, as well as Pauson-Khand reactions. String, descriptor, and graph encodings were tested as input representations, and models were trained to predict the set of conditions used in a reaction as a binary vector. Unique reagent dictionaries categorized by expert-crafted reaction roles were constructed for each data set, leading to context-aware predictions. We find that relational graph convolutional networks and gradient-boosting machines are very effective for this learning task, and we disclose a novel reaction-level graph attention operation in the top-performing model.
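The phrase "predict the set of conditions used in a reaction as a binary vector" can be made concrete with a short sketch. The reagent dictionary below is a toy assumption (three roles, seven entries) and not the dictionaries built in the paper; `encode` and `decode` are hypothetical helper names.

```python
from typing import Dict, List, Sequence, Tuple

# Toy role-categorized reagent dictionary (an assumption for illustration).
REAGENT_DICTIONARY: Dict[str, List[str]] = {
    "metal": ["Pd(PPh3)4", "Pd(OAc)2"],
    "base": ["K2CO3", "Cs2CO3"],
    "solvent": ["dioxane", "toluene", "water"],
}

# Flatten to a fixed ordering: one vector position per (role, reagent) pair.
VOCAB: List[Tuple[str, str]] = [
    (role, reagent)
    for role, reagents in REAGENT_DICTIONARY.items()
    for reagent in reagents
]
INDEX = {entry: i for i, entry in enumerate(VOCAB)}


def encode(conditions: Dict[str, List[str]]) -> List[int]:
    """Multi-hot encode a reaction's conditions as a binary label vector."""
    vector = [0] * len(VOCAB)
    for role, reagents in conditions.items():
        for reagent in reagents:
            vector[INDEX[(role, reagent)]] = 1
    return vector


def decode(scores: Sequence[float], threshold: float = 0.5) -> Dict[str, List[str]]:
    """Turn per-label scores (e.g. model outputs) back into conditions by role."""
    conditions: Dict[str, List[str]] = {}
    for (role, reagent), score in zip(VOCAB, scores):
        if score >= threshold:
            conditions.setdefault(role, []).append(reagent)
    return conditions


suzuki_example = {"metal": ["Pd(PPh3)4"], "base": ["K2CO3"], "solvent": ["dioxane", "water"]}
labels = encode(suzuki_example)
print(labels)
print(decode(labels))
```

Keying each position by role as well as reagent name is one way to keep predictions role-aware; a multilabel classifier would then output one score per position, which `decode` thresholds.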
Affiliation(s)
- Michael R Maser
- Division of Chemistry and Chemical Engineering, California Institute of Technology, Pasadena, California 91125, United States
- Alexander Y Cui
- Department of Computing and Mathematical Sciences, California Institute of Technology, Pasadena, California 91125, United States
- Serim Ryou
- Computational Vision Lab, California Institute of Technology, Pasadena, California 91125, United States
- Travis J DeLano
- Division of Chemistry and Chemical Engineering, California Institute of Technology, Pasadena, California 91125, United States
- Yisong Yue
- Department of Computing and Mathematical Sciences, California Institute of Technology, Pasadena, California 91125, United States
- Sarah E Reisman
- Division of Chemistry and Chemical Engineering, California Institute of Technology, Pasadena, California 91125, United States

14
Abstract
The identification of synthetic routes that end with the desired product is considered an inherently time-consuming process that is largely dependent on expert knowledge regarding a limited proportion of the entire reaction space. At present, emerging machine learning technologies are reformulating the process of retrosynthetic planning. This study aimed to discover synthetic routes backwardly from a given desired molecule to commercially available compounds. The problem is reduced to a combinatorial optimization task with the solution space subject to the combinatorial complexity of all possible pairs of purchasable reactants. We address this issue within the framework of Bayesian inference and computation. The workflow consists of the training of a deep neural network, which is used to forwardly predict a product of the given reactants with a high level of accuracy, followed by inversion of the forward model into the backward one via Bayes' law of conditional probability. Using the backward model, a diverse set of highly probable reaction sequences ending with a given synthetic target is exhaustively explored using a Monte Carlo search algorithm. With a forward model prediction accuracy of approximately 87%, the Bayesian retrosynthesis algorithm successfully rediscovered 81.8 and 33.3% of known synthetic routes of one-step and two-step reactions, respectively, with top-10 accuracy. Remarkably, the Monte Carlo algorithm, which was specifically designed for the presence of multiple diverse routes, often revealed a ranked list of hundreds of reaction routes to the same synthetic target. We also investigated the potential applicability of such diverse candidates based on expert knowledge of synthetic organic chemistry.
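The inversion of a forward model "via Bayes' law" amounts to p(reactants | product) being proportional to p(product | reactants) times p(reactants). The sketch below shows only that normalization step over a small, explicit candidate list. The probabilities are made-up numbers, `backward_posterior` is a hypothetical name, and the neural forward model and Monte Carlo search of the paper are not reproduced.

```python
from typing import Dict, Tuple

Reactants = Tuple[str, ...]


def backward_posterior(
    forward_likelihood: Dict[Reactants, float],
    prior: Dict[Reactants, float],
) -> Dict[Reactants, float]:
    """Posterior over candidate reactant sets for one fixed target product.

    forward_likelihood[r] plays the role of p(product | r) from a forward
    model; prior[r] plays the role of p(r) over purchasable reactant sets.
    """
    unnormalized = {r: forward_likelihood[r] * prior[r] for r in forward_likelihood}
    evidence = sum(unnormalized.values())
    if evidence == 0:
        raise ValueError("no candidate explains the product")
    return {r: weight / evidence for r, weight in unnormalized.items()}


# Made-up numbers: the forward model favors (A, B), the prior favors (C, D).
likelihood = {("A", "B"): 0.9, ("C", "D"): 0.3}
prior = {("A", "B"): 0.25, ("C", "D"): 0.75}
posterior = backward_posterior(likelihood, prior)
for reactants, probability in posterior.items():
    print(reactants, round(probability, 3))
```

In this toy case the two effects cancel and both candidates end up equally probable, which shows why the prior over reactants matters as much as forward-model accuracy.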
Affiliation(s)
- Zhongliang Guo
- The Institute of Statistical Mathematics, Research Organization of Information and Systems, Tachikawa, Tokyo 190-8562, Japan
- Stephen Wu
- The Institute of Statistical Mathematics, Research Organization of Information and Systems, Tachikawa, Tokyo 190-8562, Japan; The Graduate University for Advanced Studies, SOKENDAI, Tachikawa, Tokyo 190-8562, Japan
- Mitsuru Ohno
- Daicel Corporation, Kita-ku, Osaka 530-0011, Japan
- Ryo Yoshida
- The Institute of Statistical Mathematics, Research Organization of Information and Systems, Tachikawa, Tokyo 190-8562, Japan; The Graduate University for Advanced Studies, SOKENDAI, Tachikawa, Tokyo 190-8562, Japan; National Institute for Materials Science, Tsukuba, Ibaraki 305-0047, Japan

15
Wang X, Qian Y, Gao H, Coley CW, Mo Y, Barzilay R, Jensen KF. Towards efficient discovery of green synthetic pathways with Monte Carlo tree search and reinforcement learning. Chem Sci 2020; 11:10959-10972. [PMID: 34094345; PMCID: PMC8162445; DOI: 10.1039/d0sc04184j] [Citation(s) in RCA: 18] [Impact Index Per Article: 4.5] [Received: 07/30/2020] [Accepted: 09/11/2020] [Indexed: 12/25/2022] [Open Access]
Abstract
Computer aided synthesis planning of synthetic pathways with green process conditions has become of increasing importance in organic chemistry, but the large search space inherent in synthesis planning and the difficulty in predicting reaction conditions make it a significant challenge. We introduce a new Monte Carlo Tree Search (MCTS) variant that promotes balance between exploration and exploitation across the synthesis space. Together with a value network trained from reinforcement learning and a solvent-prediction neural network, our algorithm is comparable to the best MCTS variant (PUCT, similar to Google's Alpha Go) in finding valid synthesis pathways within a fixed searching time, and superior in identifying shorter routes with greener solvents under the same search conditions. In addition, with the same root compound visit count, our algorithm outperforms the PUCT MCTS by 16% in terms of determining successful routes. Overall the success rate is improved by 19.7% compared to the upper confidence bound applied to trees (UCT) MCTS method. Moreover, we improve 71.4% of the routes proposed by the PUCT MCTS variant in pathway length and choices of green solvents. The approach generally enables including Green Chemistry considerations in computer aided synthesis planning with potential applications in process development for fine chemicals or pharmaceuticals.
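For readers unfamiliar with the two baselines named above, the sketch below gives the commonly used textbook forms of the UCT and PUCT selection scores. It is not the new MCTS variant introduced in the paper; the exploration constants, the dictionary-based child records and the function names are illustrative assumptions.

```python
import math
from typing import Dict, List


def uct_score(q: float, n_child: int, n_parent: int, c: float = 1.4) -> float:
    """UCT: mean value plus an exploration bonus that shrinks with visits."""
    if n_child == 0:
        return math.inf  # always try unvisited children first
    return q + c * math.sqrt(math.log(n_parent) / n_child)


def puct_score(q: float, prior: float, n_child: int, n_parent: int, c: float = 1.0) -> float:
    """PUCT: exploration bonus weighted by a policy prior for the child."""
    return q + c * prior * math.sqrt(n_parent) / (1 + n_child)


def select(children: List[Dict], n_parent: int, use_prior: bool) -> str:
    """Pick the child (e.g. a candidate disconnection) with the best score."""
    def score(child: Dict) -> float:
        if use_prior:
            return puct_score(child["q"], child["p"], child["n"], n_parent)
        return uct_score(child["q"], child["n"], n_parent)
    return max(children, key=score)["name"]


# Toy node: "a" is well explored, "b" is barely explored but has a high prior.
children = [
    {"name": "a", "q": 0.6, "n": 10, "p": 0.2},
    {"name": "b", "q": 0.4, "n": 1, "p": 0.8},
]
print(select(children, n_parent=11, use_prior=False))
print(select(children, n_parent=11, use_prior=True))
```

Both rules trade off the observed value `q` (exploitation) against a bonus for rarely visited children (exploration); PUCT additionally lets a learned prior steer which unexplored branches are tried first.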
Affiliation(s)
- Xiaoxue Wang
- Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, USA
- Department of Chemical and Biomolecular Engineering, The Ohio State University, Columbus, Ohio 43210, USA
- Yujie Qian
- Department of Electrical Engineering and Computer Science, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, USA
- Hanyu Gao
- Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, USA
- Connor W Coley
- Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, USA
- Yiming Mo
- Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, USA
- Regina Barzilay
- Department of Electrical Engineering and Computer Science, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, USA
- Klavs F Jensen
- Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, USA

16
Struble TJ, Alvarez JC, Brown SP, Chytil M, Cisar J, DesJarlais RL, Engkvist O, Frank SA, Greve DR, Griffin DJ, Hou X, Johannes JW, Kreatsoulas C, Lahue B, Mathea M, Mogk G, Nicolaou CA, Palmer AD, Price DJ, Robinson RI, Salentin S, Xing L, Jaakkola T, Green WH, Barzilay R, Coley CW, Jensen KF. Current and Future Roles of Artificial Intelligence in Medicinal Chemistry Synthesis. J Med Chem 2020; 63:8667-8682. [PMID: 32243158; PMCID: PMC7457232; DOI: 10.1021/acs.jmedchem.9b02120] [Citation(s) in RCA: 79] [Impact Index Per Article: 19.8] [Indexed: 12/20/2022]
Abstract
Artificial intelligence and machine learning have demonstrated their potential role in predictive chemistry and synthetic planning of small molecules; there are at least a few reports of companies employing in silico synthetic planning into their overall approach to accessing target molecules. A data-driven synthesis planning program is one component being developed and evaluated by the Machine Learning for Pharmaceutical Discovery and Synthesis (MLPDS) consortium, comprising MIT and 13 chemical and pharmaceutical company members. Together, we wrote this perspective to share how we think predictive models can be integrated into medicinal chemistry synthesis workflows, how they are currently used within MLPDS member companies, and the outlook for this field.
Affiliation(s)
- Thomas J Struble
- Department of Chemical Engineering, MIT, Cambridge, Massachusetts 02139, United States
- Juan C Alvarez
- Computational and Structural Chemistry, Merck & Co. Inc., Kenilworth, New Jersey 07033, United States
- Scott P Brown
- Sunovion Pharmaceuticals Inc., Marlborough, Massachusetts 01752, United States
- Milan Chytil
- Sunovion Pharmaceuticals Inc., Marlborough, Massachusetts 01752, United States
- Justin Cisar
- Janssen Research & Development LLC, Spring House, Pennsylvania 19477, United States
- Renee L DesJarlais
- Janssen Research & Development LLC, Spring House, Pennsylvania 19477, United States
- Ola Engkvist
- Hit Discovery, Discovery Sciences, R&D, AstraZeneca, 431 83 Mölndal, Sweden
- Scott A Frank
- Eli Lilly and Company, Indianapolis, Indiana 46285, United States
- Daniel R Greve
- LEO Pharma A/S, Industriparken 55, DK-2750 Ballerup, Denmark
- Xinjun Hou
- Pfizer Inc., Cambridge, Massachusetts 02139, United States
- Jeffrey W Johannes
- Medicinal Chemistry, Early Oncology, Oncology R&D, AstraZeneca, Boston, Massachusetts 02451, United States
- Brian Lahue
- Computational and Structural Chemistry, Merck & Co. Inc., Kenilworth, New Jersey 07033, United States
- Miriam Mathea
- BASF SE, Carl-Bosch-Strasse 38, 67056 Ludwigshafen am Rhein, Germany
- Andrew D Palmer
- BASF SE, Carl-Bosch-Strasse 38, 67056 Ludwigshafen am Rhein, Germany
- Daniel J Price
- GlaxoSmithKline, Collegeville, Pennsylvania 19426, United States
- Richard I Robinson
- Novartis Institutes for BioMedical Research, Cambridge, Massachusetts 02139, United States
- Li Xing
- WuXi AppTec, Cambridge, Massachusetts 02142, United States
- Tommi Jaakkola
- Computer Science and Artificial Intelligence Laboratory, MIT, Cambridge, Massachusetts 02139, United States
- William H Green
- Department of Chemical Engineering, MIT, Cambridge, Massachusetts 02139, United States
- Regina Barzilay
- Computer Science and Artificial Intelligence Laboratory, MIT, Cambridge, Massachusetts 02139, United States
- Connor W Coley
- Department of Chemical Engineering, MIT, Cambridge, Massachusetts 02139, United States
- Klavs F Jensen
- Department of Chemical Engineering, MIT, Cambridge, Massachusetts 02139, United States