1. Pham TT, Guo Z, Li B, Lapkin AA, Yan N. Synthesis of Pyrrole-2-Carboxylic Acid from Cellulose- and Chitin-Based Feedstocks Discovered by the Automated Route Search. ChemSusChem 2024; 17:e202300538. [PMID: 37792551 DOI: 10.1002/cssc.202300538] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/10/2023] [Revised: 10/02/2023] [Accepted: 10/04/2023] [Indexed: 10/06/2023]
Abstract
The shift towards sustainable feedstocks for platform chemicals requires new routes to access functional molecules that contain heteroatoms, but there are limited bio-derived feedstocks that lead to heteroatoms in platform chemicals. Combining renewable molecules of different origins could be a solution to optimize the use of atoms from renewable sources. However, the lack of retrosynthetic tools makes it challenging to examine the extensive reaction networks of various platform molecules focusing on multiple bio-based feedstocks. In this study, a protocol was developed to identify potential transformation pathways that allow for the use of feedstocks from different origins. By analyzing existing knowledge on chemical reactions in large databases, several promising synthetic routes were shortlisted, with the reaction of D-glucosamine and pyruvic acid being the most interesting to make pyrrole-2-carboxylic acid (PCA). The optimized synthetic conditions resulted in a 50% yield of PCA, with insights gained from variable-temperature NMR studies. The use of substrates obtained from two different bio-feedstock bases, namely cellulose and chitin, allowed for the establishment of a PCA-based chemical space.
Affiliation(s)
- Thuy Trang Pham
  - Department of Chemical and Biomolecular Engineering, National University of Singapore, 4 Engineering Drive 4, 117585, Singapore City, Singapore
- Zhen Guo
  - Cambridge Centre for Advanced Research and Education in Singapore (CARES Ltd), 1 CREATE Way, #05-05 Create Tower, 138602, Singapore City, Singapore
  - Chemical Data Intelligence (CDI) Pte Ltd, Robinson Road #02-00, 068898, Singapore City, Singapore
- Bing Li
  - Department of Chemical and Biomolecular Engineering, National University of Singapore, 4 Engineering Drive 4, 117585, Singapore City, Singapore
- Alexei A Lapkin
  - Cambridge Centre for Advanced Research and Education in Singapore (CARES Ltd), 1 CREATE Way, #05-05 Create Tower, 138602, Singapore City, Singapore
  - Department of Chemical Engineering and Biotechnology, University of Cambridge, Cambridge, CB3 0AS, UK
- Ning Yan
  - Department of Chemical and Biomolecular Engineering, National University of Singapore, 4 Engineering Drive 4, 117585, Singapore City, Singapore
2. Zhang B, Lin J, Du L, Zhang L. Harnessing Data Augmentation and Normalization Preprocessing to Improve the Performance of Chemical Reaction Predictions of Data-Driven Model. Polymers (Basel) 2023; 15:polym15092224. [PMID: 37177370 PMCID: PMC10180765 DOI: 10.3390/polym15092224] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Received: 03/31/2023] [Revised: 05/03/2023] [Accepted: 05/03/2023] [Indexed: 05/15/2023] [Open Access]
Abstract
As a template-free, data-driven methodology, the molecular transformer model provides an alternative way to predict the outcome of chemical reactions and to plan retrosynthetic routes in the fields of organic synthesis and polymer chemistry. However, because datasets of chemical reactions are small, the data-driven model suffers from low accuracy in reaction prediction tasks. In this contribution, we integrate the molecular transformer model with the strategies of data augmentation and normalization preprocessing to accomplish three tasks: forward prediction of chemical reactions, and single-step retrosynthetic prediction with and without the reaction classes. It is clearly demonstrated that the prediction accuracy of the molecular transformer model can be significantly raised by the proposed strategies for all three tasks. Notably, after the introduction of the 40-level data augmentation and normalization preprocessing, the top-1 accuracy of the forward prediction increases markedly from 71.6% to 84.2% and the top-1 accuracy of the single-step retrosynthetic prediction with additional reaction class increases from 53.2% to 63.4%. Furthermore, it is found that the superior performance of the data-driven model originates from the correction of the grammatical errors of the SMILES strings, especially for reaction classes with small datasets.
Affiliation(s)
- Boyu Zhang
  - Shanghai Key Laboratory of Advanced Polymeric Materials, School of Materials Science and Engineering, East China University of Science and Technology, Shanghai 200237, China
- Jiaping Lin
  - Shanghai Key Laboratory of Advanced Polymeric Materials, School of Materials Science and Engineering, East China University of Science and Technology, Shanghai 200237, China
- Lei Du
  - Shanghai Key Laboratory of Advanced Polymeric Materials, School of Materials Science and Engineering, East China University of Science and Technology, Shanghai 200237, China
- Liangshun Zhang
  - Shanghai Key Laboratory of Advanced Polymeric Materials, School of Materials Science and Engineering, East China University of Science and Technology, Shanghai 200237, China
3. Türtscher PL, Reiher M. Pathfinder: Navigating and Analyzing Chemical Reaction Networks with an Efficient Graph-Based Approach. J Chem Inf Model 2023; 63:147-160. [PMID: 36515968 PMCID: PMC9832502 DOI: 10.1021/acs.jcim.2c01136] [Citation(s) in RCA: 3] [Impact Index Per Article: 3.0] [Indexed: 12/15/2022]
Abstract
While the field of first-principles explorations into chemical reaction space has been continuously growing, the development of strategies for analyzing resulting chemical reaction networks (CRNs) is lagging behind. A CRN consists of compounds linked by reactions. Analyzing how these compounds are transformed into one another based on kinetic modeling is a nontrivial task. Here, we present the graph-optimization-driven algorithm and program Pathfinder to allow for such an analysis of a CRN. The CRN for this work has been obtained with our open-source Chemoton reaction network exploration software. Chemoton probes reactive combinations of compounds for elementary steps and sorts them into reactions. Encoding these reactions of the CRN as a graph consisting of compound and reaction vertices, and adding information about activation barriers as well as required reagents to the edges of the graph, yields a complete graph-theoretical representation of the CRN. Since the probabilities of the formation of compounds depend on the starting conditions, the consumption of any compound during a reaction must be accounted for to reflect the availability of reagents. To account for this, we introduce compound costs to reflect compound availability. Simultaneously, the determined compound costs rank the compounds in the CRN in terms of their probability to be formed. This ranking then allows us to probe easily accessible compounds in the CRN first for further explorations into yet unexplored terrain. We first illustrate the working principle on an abstract small CRN. Afterward, Pathfinder is demonstrated in the example of the disproportionation of iodine with water and the comproportionation of iodic acid and hydrogen iodide. Both processes are analyzed within the same CRN, which we construct with our autonomous first-principles CRN exploration software Chemoton [Unsleber, J. P.; J. Chem. Theory Comput. 2022, 18, 5393-5409] guided by Pathfinder.
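The idea of ranking compounds by a cost that reflects their availability can be illustrated with a minimal Python sketch. This is not the Pathfinder implementation: the function name `compound_costs`, the toy reactions, and all numeric weights are hypothetical. The sketch assumes that each reaction carries a single positive weight (standing in for barrier-derived edge information) and that the cost of a product is the reaction weight plus the summed costs of all reactants it consumes.

```python
import math


def compound_costs(reactions, start_costs, max_rounds=100):
    """Relax compound costs until they stop changing.

    reactions:   list of (reactants, products, weight) tuples (hypothetical format)
    start_costs: dict mapping each starting compound to its initial cost
    """
    costs = dict(start_costs)
    for _ in range(max_rounds):
        changed = False
        for reactants, products, weight in reactions:
            # A reaction is usable only once every reactant has a known cost.
            if any(r not in costs for r in reactants):
                continue
            # Consuming reagents is charged by adding their costs to the weight.
            total = weight + sum(costs[r] for r in reactants)
            for p in products:
                if total < costs.get(p, math.inf):
                    costs[p] = total
                    changed = True
        if not changed:
            break
    return costs


# Hypothetical toy network: A + B -> C, C -> D, and a costly direct A -> D.
reactions = [
    (("A", "B"), ("C",), 1.0),
    (("C",), ("D",), 0.5),
    (("A",), ("D",), 5.0),
]
costs = compound_costs(reactions, {"A": 1.0, "B": 1.0})
ranking = sorted(costs, key=costs.get)  # cheapest (most accessible) first
```

In this toy setup the low ranks go to compounds that are cheap to reach from the starting materials, which mirrors the abstract's point that a cost ranking can prioritize easily accessible compounds for further exploration.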
4. Wen M, Spotte-Smith EWC, Blau SM, McDermott MJ, Krishnapriyan AS, Persson KA. Chemical reaction networks and opportunities for machine learning. Nature Computational Science 2023; 3:12-24. [PMID: 38177958 DOI: 10.1038/s43588-022-00369-z] [Citation(s) in RCA: 5] [Impact Index Per Article: 5.0] [Received: 06/24/2022] [Accepted: 11/08/2022] [Indexed: 01/06/2024]
Abstract
Chemical reaction networks (CRNs), defined by sets of species and possible reactions between them, are widely used to interrogate chemical systems. To capture increasingly complex phenomena, CRNs can be leveraged alongside data-driven methods and machine learning (ML). In this Perspective, we assess the diverse strategies available for CRN construction and analysis in pursuit of a wide range of scientific goals, discuss ML techniques currently being applied to CRNs and outline future CRN-ML approaches, presenting scientific and technical challenges to overcome.
Affiliation(s)
- Mingjian Wen
  - Chemical and Biomolecular Engineering, University of Houston, Houston, TX, USA
  - Energy Technologies Area, Lawrence Berkeley National Laboratory, Berkeley, CA, USA
- Evan Walter Clark Spotte-Smith
  - Materials Science Division, Lawrence Berkeley National Laboratory, Berkeley, CA, USA
  - Materials Science and Engineering, University of California, Berkeley, Berkeley, CA, USA
- Samuel M Blau
  - Energy Technologies Area, Lawrence Berkeley National Laboratory, Berkeley, CA, USA
- Matthew J McDermott
  - Materials Science Division, Lawrence Berkeley National Laboratory, Berkeley, CA, USA
  - Materials Science and Engineering, University of California, Berkeley, Berkeley, CA, USA
- Aditi S Krishnapriyan
  - Computational Research Division, Lawrence Berkeley National Laboratory, Berkeley, CA, USA
  - Chemical and Biomolecular Engineering, University of California, Berkeley, Berkeley, CA, USA
  - Electrical Engineering and Computer Science, University of California, Berkeley, Berkeley, CA, USA
- Kristin A Persson
  - Materials Science and Engineering, University of California, Berkeley, Berkeley, CA, USA
  - Molecular Foundry, Lawrence Berkeley National Laboratory, Berkeley, CA, USA
5. Sun J, Wen M, Wang H, Ruan Y, Yang Q, Kang X, Zhang H, Zhang Z, Lu H. Prediction of drug-likeness using graph convolutional attention network. Bioinformatics 2022; 38:5262-5269. [PMID: 36222555 DOI: 10.1093/bioinformatics/btac676] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 05/27/2022] [Revised: 09/22/2022] [Accepted: 10/08/2022] [Indexed: 12/24/2022] [Open Access]
Abstract
MOTIVATION: Drug-likeness has been widely used as a criterion to distinguish drug-like molecules from non-drugs. Developing reliable computational methods to predict the drug-likeness of compounds is crucial to triage unpromising molecules and accelerate the drug discovery process. RESULTS: In this study, a deep learning method was developed to predict drug-likeness based on the graph convolutional attention network (D-GCAN) directly from molecular structures. Results showed that the D-GCAN model outperformed other state-of-the-art models for drug-likeness prediction. The combination of graph convolution and attention mechanism made an important contribution to the performance of the model. Specifically, the application of the attention mechanism improved accuracy by 4.0%. The utilization of graph convolution improved the accuracy by 6.1%. Results on the dataset beyond Lipinski's rule of five space and the non-US dataset showed that the model had good versatility. Then, the billion-scale GDB-13 database was used as a case study to screen SARS-CoV-2 3C-like protease inhibitors. Sixty-five drug candidates were screened out, most substructures of which are similar to those of existing oral drugs. Candidates screened from S-GDB13 have higher similarity to existing drugs and better molecular docking performance than those from the rest of GDB-13. The screening speed on S-GDB13 is significantly faster than screening directly on GDB-13. In general, D-GCAN is a promising tool to predict drug-likeness for selecting potential candidates and accelerating drug discovery by excluding unpromising candidates and avoiding unnecessary biological and clinical testing. AVAILABILITY AND IMPLEMENTATION: The source code, model and tutorials are available at https://github.com/JinYSun/D-GCAN. The S-GDB13 database is available at https://doi.org/10.5281/zenodo.7054367. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Jinyu Sun
  - College of Chemistry and Chemical Engineering, Central South University, Changsha 410083, China
- Ming Wen
  - College of Chemistry and Chemical Engineering, Central South University, Changsha 410083, China
- Huabei Wang
  - College of Chemistry and Chemical Engineering, Central South University, Changsha 410083, China
- Yuezhe Ruan
  - College of Chemistry and Chemical Engineering, Central South University, Changsha 410083, China
- Qiong Yang
  - College of Chemistry and Chemical Engineering, Central South University, Changsha 410083, China
- Xiao Kang
  - College of Chemistry and Chemical Engineering, Central South University, Changsha 410083, China
- Hailiang Zhang
  - College of Chemistry and Chemical Engineering, Central South University, Changsha 410083, China
- Zhimin Zhang
  - College of Chemistry and Chemical Engineering, Central South University, Changsha 410083, China
- Hongmei Lu
  - College of Chemistry and Chemical Engineering, Central South University, Changsha 410083, China
6. Grzybowski BA, Badowski T, Molga K, Szymkuć S. Network search algorithms and scoring functions for advanced-level computerized synthesis planning. WIREs Computational Molecular Science 2022. [DOI: 10.1002/wcms.1630] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Indexed: 12/11/2022]
Affiliation(s)
- Bartosz A. Grzybowski
  - Institute of Organic Chemistry, Polish Academy of Sciences, Warsaw, Poland
  - Center for Soft and Living Matter, Institute for Basic Science (IBS), Ulsan, Republic of Korea
  - Department of Chemistry, Ulsan National Institute of Science and Technology (UNIST), Ulsan, Republic of Korea
- Tomasz Badowski
  - Institute of Organic Chemistry, Polish Academy of Sciences, Warsaw, Poland
- Karol Molga
  - Institute of Organic Chemistry, Polish Academy of Sciences, Warsaw, Poland
- Sara Szymkuć
  - Institute of Organic Chemistry, Polish Academy of Sciences, Warsaw, Poland
7. Robinson WE, Daines E, van Duppen P, de Jong T, Huck WTS. Environmental conditions drive self-organization of reaction pathways in a prebiotic reaction network. Nat Chem 2022; 14:623-631. [PMID: 35668214 DOI: 10.1038/s41557-022-00956-7] [Citation(s) in RCA: 13] [Impact Index Per Article: 6.5] [Received: 08/02/2021] [Accepted: 04/26/2022] [Indexed: 11/09/2022]
Abstract
The evolution of life from the prebiotic environment required a gradual process of chemical evolution towards greater molecular complexity. Elaborate prebiotically relevant synthetic routes to the building blocks of life have been established. However, it is still unclear how functional chemical systems evolved with direction using only the interaction between inherent molecular chemical reactivity and the abiotic environment. Here we demonstrate how complex systems of chemical reactions exhibit well-defined self-organization in response to varying environmental conditions. This self-organization allows the compositional complexity of the reaction products to be controlled as a function of factors such as feedstock and catalyst availability. We observe how Breslow's cycle contributes to the reaction composition by feeding C2 building blocks into the network, alongside reaction pathways dominated by formaldehyde-driven chain growth. The emergence of organized systems of chemical reactions in response to changes in the environment offers a potential mechanism for a chemical evolution process that bridges the gap between prebiotic chemical building blocks and the origin of life.
Affiliation(s)
- William E Robinson
  - Institute for Molecules and Materials, Radboud University Nijmegen, Nijmegen, Netherlands
- Elena Daines
  - Institute for Molecules and Materials, Radboud University Nijmegen, Nijmegen, Netherlands
- Peer van Duppen
  - Institute for Molecules and Materials, Radboud University Nijmegen, Nijmegen, Netherlands
- Thijs de Jong
  - Institute for Molecules and Materials, Radboud University Nijmegen, Nijmegen, Netherlands
- Wilhelm T S Huck
  - Institute for Molecules and Materials, Radboud University Nijmegen, Nijmegen, Netherlands
8. Venkatasubramanian V, Mann V. Artificial intelligence in reaction prediction and chemical synthesis. Curr Opin Chem Eng 2022. [DOI: 10.1016/j.coche.2021.100749] [Citation(s) in RCA: 7] [Impact Index Per Article: 3.5] [Indexed: 12/20/2022]
9. Ishida S, Terayama K, Kojima R, Takasu K, Okuno Y. AI-Driven Synthetic Route Design Incorporated with Retrosynthesis Knowledge. J Chem Inf Model 2022; 62:1357-1367. [PMID: 35258953 PMCID: PMC8965881 DOI: 10.1021/acs.jcim.1c01074] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/30/2022]
Abstract
Computer-aided synthesis planning (CASP) aims to assist chemists in performing retrosynthetic analysis for which they utilize their experiments, intuition, and knowledge. Recent breakthroughs in machine learning (ML) techniques, including deep neural networks, have significantly improved data-driven synthetic route designs without human intervention. However, learning chemical knowledge by ML for practical synthesis planning has not yet been adequately achieved and remains a challenging problem. In this study, we developed a data-driven CASP application integrated with various portions of retrosynthesis knowledge called "ReTReK" that introduces the knowledge as adjustable parameters into the evaluation of promising search directions. The experimental results showed that ReTReK successfully searched synthetic routes based on the specified retrosynthesis knowledge, indicating that the synthetic routes searched with the knowledge were preferred to those without the knowledge. The concept of integrating retrosynthesis knowledge as adjustable parameters into a data-driven CASP application is expected to enhance the performance of both existing data-driven CASP applications and those under development.
Affiliation(s)
- Shoichi Ishida
  - Graduate School of Pharmaceutical Sciences, Kyoto University, 46-29 Yoshidashimo-Adachicho, Sakyo-ku 606-8501, Kyoto, Japan
- Kei Terayama
  - Graduate School of Medical Life Science, Yokohama City University, 1-7-29, Suehiro-cho, Tsurumi-ku, Yokohama 230-0045, Kanagawa, Japan
  - Graduate School of Medicine, Kyoto University, 53 Shogoin-Kawaharacho, Sakyo-ku 606-8507, Kyoto, Japan
- Ryosuke Kojima
  - Graduate School of Medicine, Kyoto University, 53 Shogoin-Kawaharacho, Sakyo-ku 606-8507, Kyoto, Japan
- Kiyosei Takasu
  - Graduate School of Pharmaceutical Sciences, Kyoto University, 46-29 Yoshidashimo-Adachicho, Sakyo-ku 606-8501, Kyoto, Japan
- Yasushi Okuno
  - Graduate School of Medicine, Kyoto University, 53 Shogoin-Kawaharacho, Sakyo-ku 606-8507, Kyoto, Japan
  - HPC- and AI-driven Drug Development Platform Division, RIKEN Center for Computational Science, 7-1-26, Minatojima-minami-machi, Chuo-ku, Kobe 650-0047, Hyogo, Japan
10. Dzobo K. The Role of Natural Products as Sources of Therapeutic Agents for Innovative Drug Discovery. Comprehensive Pharmacology 2022. [PMCID: PMC8016209 DOI: 10.1016/b978-0-12-820472-6.00041-4] [Citation(s) in RCA: 24] [Impact Index Per Article: 12.0] [Indexed: 11/04/2022]
Abstract
Emerging threats to human health require a concerted effort in search of both preventive and treatment strategies, placing natural products at the center of efforts to obtain new therapies and reduce disease spread and associated mortality. The therapeutic value of compounds found in plants has been known for ages, resulting in their utilization in homes and in clinics for the treatment of many ailments ranging from common headaches to serious conditions such as wounds. Despite the advancement observed in the world, plant-based medicines are still being used to treat many pathological conditions or are used as alternatives to modern medicines. In most cases, these natural products or plant-based medicines are used in an unpurified state as extracts. A lot of research is underway to identify and purify the active compounds responsible for the healing process. Some of the current drugs used in clinics have their origins as natural products or came from plant extracts. In addition, several synthetic analogues are natural product-based or plant-based. With the emergence of novel infectious agents such as SARS-CoV-2, in addition to already burdensome diseases such as diabetes, cancer, tuberculosis and HIV/AIDS, there is a need to come up with new drugs that can cure these conditions. Natural products offer an opportunity to discover new compounds that can be converted into drugs given their chemical structure diversity. Advances in analytical processes make drug discovery a multi-dimensional process involving computational designing and testing and eventual laboratory screening of potential drug candidates. Lead compounds will then be evaluated for safety, pharmacokinetics and efficacy. New technologies including Artificial Intelligence and better organ and tissue models such as organoids allow virtual screening, automation and high-throughput screening to be part of drug discovery. The use of bioinformatics and computation means that drug discovery can be a fast and efficient process and enables the use of natural product structures to obtain novel drugs. The removal of potential bottlenecks resulting in minimal false positive leads in drug development has enabled an efficient system of drug discovery. This review describes the biosynthesis and screening of natural products during drug discovery as well as methods used in studying natural products.
11. Szymkuć S, Badowski T, Grzybowski BA. Is Organic Chemistry Really Growing Exponentially? Angew Chem Int Ed Engl 2021. [DOI: 10.1002/ange.202111540] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Indexed: 11/06/2022]
Affiliation(s)
- Sara Szymkuć
  - Institute of Organic Chemistry, Polish Academy of Sciences, Ul. Kasprzaka 44/52, 01-224, Warsaw, Poland
  - Allchemy, Inc., Highland, IN, USA
- Tomasz Badowski
  - Institute of Organic Chemistry, Polish Academy of Sciences, Ul. Kasprzaka 44/52, 01-224, Warsaw, Poland
  - Allchemy, Inc., Highland, IN, USA
- Bartosz A. Grzybowski
  - Institute of Organic Chemistry, Polish Academy of Sciences, Ul. Kasprzaka 44/52, 01-224, Warsaw, Poland
  - Allchemy, Inc., Highland, IN, USA
  - IBS Center for Soft and Living Matter and Department of Chemistry, UNIST, 50, UNIST-gil, Eonyang-eup, Ulju-gun, Ulsan, South Korea
12. Szymkuć S, Badowski T, Grzybowski BA. Is Organic Chemistry Really Growing Exponentially? Angew Chem Int Ed Engl 2021; 60:26226-26232. [PMID: 34558168 DOI: 10.1002/anie.202111540] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 08/25/2021] [Indexed: 11/05/2022]
Abstract
In terms of molecules and specific reaction examples, organic chemistry features an impressive, exponential growth. However, new reaction classes/types that fuel this growth are being discovered at a much slower and only linear (or even sublinear) rate. The proportion of newly discovered reaction types to all reactions being performed keeps decreasing, suggesting that synthetic chemistry is becoming more reliant on reusing well-known methods. The newly discovered chemistries are more complex than decades ago and allow for the rapid construction of complex scaffolds in fewer steps. We study these and other trends as functions of time, reaction-type popularity and complexity, based on an algorithm that extracts generalized reaction class templates. These analyses are useful in the context of computer-assisted synthesis, machine learning (to estimate the numbers of models with sufficient reaction statistics), and identifying erroneous entries in reaction databases.
Affiliation(s)
- Sara Szymkuć
  - Institute of Organic Chemistry, Polish Academy of Sciences, Ul. Kasprzaka 44/52, 01-224, Warsaw, Poland
  - Allchemy, Inc., Highland, IN, USA
- Tomasz Badowski
  - Institute of Organic Chemistry, Polish Academy of Sciences, Ul. Kasprzaka 44/52, 01-224, Warsaw, Poland
  - Allchemy, Inc., Highland, IN, USA
- Bartosz A Grzybowski
  - Institute of Organic Chemistry, Polish Academy of Sciences, Ul. Kasprzaka 44/52, 01-224, Warsaw, Poland
  - Allchemy, Inc., Highland, IN, USA
  - IBS Center for Soft and Living Matter and Department of Chemistry, UNIST, 50, UNIST-gil, Eonyang-eup, Ulju-gun, Ulsan, South Korea
13. Thomas M, Boardman A, Garcia-Ortegon M, Yang H, de Graaf C, Bender A. Applications of Artificial Intelligence in Drug Design: Opportunities and Challenges. Methods in Molecular Biology (Clifton, N.J.) 2021; 2390:1-59. [PMID: 34731463 DOI: 10.1007/978-1-0716-1787-8_1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/19/2022]
Abstract
Artificial intelligence (AI) has undergone rapid development in recent years and has been successfully applied to real-world problems such as drug design. In this chapter, we review recent applications of AI to problems in drug design including virtual screening, computer-aided synthesis planning, and de novo molecule generation, with a focus on the limitations of the application of AI therein and opportunities for improvement. Furthermore, we discuss the broader challenges imposed by AI in translating theoretical practice to real-world drug design, including quantifying prediction uncertainty and explaining model behavior.
Affiliation(s)
- Morgan Thomas
  - Centre for Molecular Informatics, Department of Chemistry, University of Cambridge, Cambridge, UK
- Andrew Boardman
  - Centre for Molecular Informatics, Department of Chemistry, University of Cambridge, Cambridge, UK
- Miguel Garcia-Ortegon
  - Centre for Molecular Informatics, Department of Chemistry, University of Cambridge, Cambridge, UK
  - Department of Pure Mathematics and Mathematical Statistics, University of Cambridge, Cambridge, UK
- Hongbin Yang
  - Centre for Molecular Informatics, Department of Chemistry, University of Cambridge, Cambridge, UK
- Andreas Bender
  - Centre for Molecular Informatics, Department of Chemistry, University of Cambridge, Cambridge, UK
14. Jia P, Pei J, Wang G, Pan X, Zhu Y, Wu Y, Ouyang L. The roles of computer-aided drug synthesis in drug development. Green Synthesis and Catalysis 2021. [DOI: 10.1016/j.gresc.2021.11.007] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Indexed: 02/08/2023] [Open Access]
15. Dong J, Zhao M, Liu Y, Su Y, Zeng X. Deep learning in retrosynthesis planning: datasets, models and tools. Brief Bioinform 2021; 23:6375056. [PMID: 34571535 DOI: 10.1093/bib/bbab391] [Citation(s) in RCA: 32] [Impact Index Per Article: 10.7] [Received: 07/06/2021] [Revised: 08/16/2021] [Accepted: 08/30/2021] [Indexed: 12/29/2022] [Open Access]
Abstract
In recent years, synthesizing drugs powered by artificial intelligence has brought great convenience to society. Since retrosynthetic analysis occupies an essential position in synthetic chemistry, it has received broad attention from researchers. In this review, we comprehensively summarize the development process of retrosynthesis in the context of deep learning. This review covers all aspects of retrosynthesis, including datasets, models and tools. Specifically, we report representative models from academia, in addition to a detailed description of the available and stable platforms in the industry. We also discuss the disadvantages of the existing models and provide potential future trends, so that more abecedarians will quickly understand and participate in the family of retrosynthesis planning.
Affiliation(s)
- Jingxin Dong
  - College of Information Science and Engineering, Hunan University, 2 Lushan S Rd, Yuelu District, 410086, Hunan, China
- Mingyi Zhao
  - Department of Pediatrics, Third Xiangya Hospital, Central South University, 400013, Hunan, China
- Yuansheng Liu
  - College of Information Science and Engineering, Hunan University, 2 Lushan S Rd, Yuelu District, 410086, Hunan, China
- Yansen Su
  - Key Laboratory of Intelligent Computing and Signal Processing of Ministry of Education, School of Computer Science and Technology, Anhui University, 230601, Hefei, China
- Xiangxiang Zeng
  - College of Information Science and Engineering, Hunan University, 2 Lushan S Rd, Yuelu District, 410086, Hunan, China
16. Applications of artificial intelligence to drug design and discovery in the big data era: a comprehensive review. Mol Divers 2021; 25:1643-1664. [PMID: 34110579 DOI: 10.1007/s11030-021-10237-z] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 03/18/2021] [Accepted: 05/26/2021] [Indexed: 10/21/2022]
Abstract
Artificial intelligence (AI) renders cutting-edge applications in diverse sectors of society. Due to substantial progress in high-performance computing, the development of superior algorithms, and the accumulation of huge biological and chemical data, computer-assisted drug design technology is playing a key role in drug discovery with its advantages of high efficiency, fast speed, and low cost. Over recent years, due to continuous progress in machine learning (ML) algorithms, AI has been extensively employed in various drug discovery stages. Very recently, drug design and discovery have entered the big data era. ML algorithms have progressively developed into a deep learning technique with potent generalization capability and more effectual big data handling, which further promotes the integration of AI technology and computer-assisted drug discovery technology, hence accelerating the design and discovery of the newest drugs. This review mainly summarizes the application progression of AI technology in the drug discovery process, and explores and compares its advantages over conventional methods. The challenges and limitations of AI in drug design and discovery have also been discussed.
|
17
|
Kuznetsov A, Sahinidis NV. ExtractionScore: A Quantitative Framework for Evaluating Synthetic Routes on Predicted Liquid-Liquid Extraction Performance. J Chem Inf Model 2021; 61:2274-2282. [PMID: 33881866 DOI: 10.1021/acs.jcim.0c01426] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Indexed: 11/30/2022]
Abstract
A multitude of metrics exist to assign scores to synthetic routes within computer-aided synthesis planning (CASP) tools. A quantitative scoring method is necessary to identify the most promising synthetic approaches to a molecule. However, current CASP tools are limited in their capacity to evaluate reaction selectivity and are unable to fully account for the effect of side products on the purification sequences associated with chemical syntheses. We develop a novel quantitative metric called ExtractionScore for evaluating synthetic routes based on the predicted identities of side products as well as the separability of major and side products by liquid-liquid extraction based on chemical property prediction. By comparing industrially practiced routes to a collection of 200 pharmaceutically relevant compounds with routes suggested by state-of-the-art CASP software, we show that ExtractionScore may improve retrosynthetic recommendations by incorporating information about the formation of side products.
Affiliation(s)
- Anatoliy Kuznetsov
- School of Chemical & Biomolecular Engineering, Georgia Institute of Technology, Atlanta 30332, Georgia, United States
| | - Nikolaos V Sahinidis
- H. Milton School of Industrial & Systems Engineering, and School of Chemical & Biomolecular Engineering, Georgia Institute of Technology, Atlanta 30332, Georgia, United States
| |
|
18
|
Molga K, Szymkuć S, Grzybowski BA. Chemist Ex Machina: Advanced Synthesis Planning by Computers. Acc Chem Res 2021; 54:1094-1106. [PMID: 33423460 DOI: 10.1021/acs.accounts.0c00714] [Citation(s) in RCA: 18] [Impact Index Per Article: 6.0] [Indexed: 01/05/2023]
Abstract
Teaching computers to plan multistep syntheses of arbitrary target molecules, including natural products, has been one of the oldest challenges in chemistry, dating back to the 1960s. This Account recapitulates two decades of our group's work on the software platform called Chematica, which very recently achieved this long-sought objective and has been shown capable of planning synthetic routes to complex natural products, several of which were validated in the laboratory. For the machine to plan syntheses at an expert level, it must know the rules describing chemical reactions and use these rules to expand and search the networks of synthetic options. The rules must be of high quality: They must delineate accurately the scope of admissible substituents, capture all relevant stereochemical information, detect potential reactivity conflicts, and protection requirements. They should yield only those synthons that are chemically stable and energetically allowed (e.g., not too strained) and should be able to extrapolate beyond examples already published in the literature. In parallel, the network-search algorithms must be able to assign meaningful scores to the sets of synthons they encounter, make judicious choices which of the network's branches to expand, and when to withdraw from unpromising ones. They must be able to strategize over multiple steps to resolve intermittent reactivity conflicts, exchange functional groups, or overcome local maxima of molecular complexity. Meeting all these requirements makes the problem of computer-driven retrosynthesis very multifaceted, combining expert and AI approaches further supplemented by quantum-mechanical and molecular-mechanics calculations. Development of Chematica has been a very long and gradual process because all these components are needed. Any shortcuts (for example, reliance on only expert or only data-based approaches) yield chemically naïve and often erroneous syntheses, especially for complex targets.
On the bright side, once all the requisite algorithms are implemented (as they now are), they not only streamline conventional synthetic planning but also enable completely new modalities that would challenge any human chemist, for example, synthesis with multiple constraints imposed simultaneously or library-wide syntheses in which the machine constructs "global plans" leading to multiple targets and benefiting from the use of common intermediates. These types of analyses will have profound impact on the practice of chemical industry, designing more economical, more green, and less hazardous pathways.
Affiliation(s)
- Karol Molga
- Institute of Organic Chemistry, Polish Academy of Sciences, ul. Kasprzaka 44/52, 01-224, Warsaw, Poland
| | - Sara Szymkuć
- Institute of Organic Chemistry, Polish Academy of Sciences, ul. Kasprzaka 44/52, 01-224, Warsaw, Poland
| | - Bartosz A. Grzybowski
- Institute of Organic Chemistry, Polish Academy of Sciences, ul. Kasprzaka 44/52, 01-224, Warsaw, Poland
- Center for Soft and Living Matter, Institute for Basic Science (IBS), Ulsan 44919, Republic of Korea
- Department of Chemistry, Ulsan National Institute of Science and Technology (UNIST), 50 UNIST-gil, Ulsan 44919, Republic of Korea
| |
|
19
|
Blau SM, Patel HD, Spotte-Smith EWC, Xie X, Dwaraknath S, Persson KA. A chemically consistent graph architecture for massive reaction networks applied to solid-electrolyte interphase formation. Chem Sci 2021; 12:4931-4939. [PMID: 34163740 PMCID: PMC8179555 DOI: 10.1039/d0sc05647b] [Citation(s) in RCA: 19] [Impact Index Per Article: 6.3] [Received: 10/13/2020] [Accepted: 02/23/2021] [Indexed: 01/09/2023] Open
Abstract
Modeling reactivity with chemical reaction networks could yield fundamental mechanistic understanding that would expedite the development of processes and technologies for energy storage, medicine, catalysis, and more. Thus far, reaction networks have been limited in size by chemically inconsistent graph representations of multi-reactant reactions (e.g. A + B → C) that cannot enforce stoichiometric constraints, precluding the use of optimized shortest-path algorithms. Here, we report a chemically consistent graph architecture that overcomes these limitations using a novel multi-reactant representation and iterative cost-solving procedure. Our approach enables the identification of all low-cost pathways to desired products in massive reaction networks containing reactions of any stoichiometry, allowing for the investigation of vastly more complex systems than previously possible. Leveraging our architecture, we construct the first ever electrochemical reaction network from first-principles thermodynamic calculations to describe the formation of the Li-ion solid electrolyte interphase (SEI), which is critical for passivation of the negative electrode. Using this network comprised of nearly 6000 species and 4.5 million reactions, we interrogate the formation of a key SEI component, lithium ethylene dicarbonate. We automatically identify previously proposed mechanisms as well as multiple novel pathways containing counter-intuitive reactions that have not, to our knowledge, been reported in the literature. We envision that our framework and data-driven methodology will facilitate efforts to engineer the composition-related properties of the SEI - or of any complex chemical process - through selective control of reactivity.
Affiliation(s)
- Samuel M Blau
- Energy Technologies Area, Lawrence Berkeley National Laboratory Berkeley CA 94720 USA
| | - Hetal D Patel
- Department of Materials Science and Engineering, University of California Berkeley CA 94720 USA
- Materials Science Division, Lawrence Berkeley National Laboratory Berkeley CA 94720 USA
| | - Evan Walter Clark Spotte-Smith
- Department of Materials Science and Engineering, University of California Berkeley CA 94720 USA
- Materials Science Division, Lawrence Berkeley National Laboratory Berkeley CA 94720 USA
| | - Xiaowei Xie
- Materials Science Division, Lawrence Berkeley National Laboratory Berkeley CA 94720 USA
- College of Chemistry, University of California Berkeley CA 94720 USA
| | - Shyam Dwaraknath
- Materials Science Division, Lawrence Berkeley National Laboratory Berkeley CA 94720 USA
| | - Kristin A Persson
- Department of Materials Science and Engineering, University of California Berkeley CA 94720 USA
- Molecular Foundry, Lawrence Berkeley National Laboratory Berkeley CA 94720 USA
| |
|
20
|
Kim E, Lee D, Kwon Y, Park MS, Choi YS. Valid, Plausible, and Diverse Retrosynthesis Using Tied Two-Way Transformers with Latent Variables. J Chem Inf Model 2021; 61:123-133. [PMID: 33410697 DOI: 10.1021/acs.jcim.0c01074] [Citation(s) in RCA: 12] [Impact Index Per Article: 4.0] [Indexed: 11/28/2022]
Abstract
Retrosynthesis is an essential task in organic chemistry for identifying the synthesis pathways of newly discovered materials, and with the recent advances in deep learning, there have been growing attempts to solve the retrosynthesis problem through transformer models, which are the state-of-the-art in neural machine translation, by converting the problem into a machine translation problem. However, the pure transformer provides unsatisfactory results that lack grammatical validity, chemical plausibility, and diversity in reactant candidates. In this study, we develop tied two-way transformers with latent modeling to solve those problems using cycle consistency checks, parameter sharing, and multinomial latent variables. Experimental results obtained using public and in-house datasets demonstrate that the proposed model improves the retrosynthesis accuracy, grammatical error, and diversity, and qualitative evaluation results verify its ability to suggest valid and plausible results.
Affiliation(s)
- Eunji Kim
- Samsung Advanced Institute of Technology, Samsung Electronics Co., Ltd., 130 Samsung-ro, Yeongtong-gu, Suwon 16678, Republic of Korea
| | - Dongseon Lee
- Samsung Advanced Institute of Technology, Samsung Electronics Co., Ltd., 130 Samsung-ro, Yeongtong-gu, Suwon 16678, Republic of Korea
| | - Youngchun Kwon
- Samsung Advanced Institute of Technology, Samsung Electronics Co., Ltd., 130 Samsung-ro, Yeongtong-gu, Suwon 16678, Republic of Korea
| | - Min Sik Park
- Samsung Advanced Institute of Technology, Samsung Electronics Co., Ltd., 130 Samsung-ro, Yeongtong-gu, Suwon 16678, Republic of Korea
| | - Youn-Suk Choi
- Samsung Advanced Institute of Technology, Samsung Electronics Co., Ltd., 130 Samsung-ro, Yeongtong-gu, Suwon 16678, Republic of Korea
| |
|
21
|
Varnek A, Baskin II. Modern Trends in Chemical Reactions Modeling. SYSTEMS MEDICINE 2021. [DOI: 10.1016/b978-0-12-801238-3.11543-0] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/25/2022] Open
|
22
|
Stocker S, Csányi G, Reuter K, Margraf JT. Machine learning in chemical reaction space. Nat Commun 2020; 11:5505. [PMID: 33127879 PMCID: PMC7603480 DOI: 10.1038/s41467-020-19267-x] [Citation(s) in RCA: 47] [Impact Index Per Article: 11.8] [Received: 05/13/2020] [Accepted: 10/01/2020] [Indexed: 12/29/2022] Open
Abstract
Chemical compound space refers to the vast set of all possible chemical compounds, estimated to contain 10^60 molecules. While intractable as a whole, modern machine learning (ML) is increasingly capable of accurately predicting molecular properties in important subsets. Here, we therefore engage in the ML-driven study of even larger reaction space. Central to chemistry as a science of transformations, this space contains all possible chemical reactions. As an important basis for 'reactive' ML, we establish a first-principles database (Rad-6) containing closed and open-shell organic molecules, along with an associated database of chemical reaction energies (Rad-6-RE). We show that the special topology of reaction spaces, with central hub molecules involved in multiple reactions, requires a modification of existing compound space ML-concepts. Showcased by the application to methane combustion, we demonstrate that the learned reaction energies offer a non-empirical route to rationally extract reduced reaction networks for detailed microkinetic analyses.
Affiliation(s)
- Sina Stocker
- Chair of Theoretical Chemistry and Catalysis Research Center, Technische Universität München, Garching, Germany
| | - Gábor Csányi
- Engineering Laboratory, University of Cambridge, Cambridge, CB2 1PZ, UK
| | - Karsten Reuter
- Chair of Theoretical Chemistry and Catalysis Research Center, Technische Universität München, Garching, Germany
- Fritz-Haber-Institut der Max-Planck-Gesellschaft, Berlin, Germany
| | - Johannes T Margraf
- Chair of Theoretical Chemistry and Catalysis Research Center, Technische Universität München, Garching, Germany.
| |
|
23
|
Computational planning of the synthesis of complex natural products. Nature 2020; 588:83-88. [PMID: 33049755 DOI: 10.1038/s41586-020-2855-y] [Citation(s) in RCA: 88] [Impact Index Per Article: 22.0] [Received: 07/25/2020] [Accepted: 10/06/2020] [Indexed: 12/27/2022]
Abstract
Training algorithms to computationally plan multistep organic syntheses has been a challenge for more than 50 years [1-7]. However, the field has progressed greatly since the development of early programs such as LHASA [1,7], for which reaction choices at each step were made by human operators. Multiple software platforms [6,8-14] are now capable of completely autonomous planning. But these programs 'think' only one step at a time and have so far been limited to relatively simple targets, the syntheses of which could arguably be designed by human chemists within minutes, without the help of a computer. Furthermore, no algorithm has yet been able to design plausible routes to complex natural products, for which much more far-sighted, multistep planning is necessary [15,16] and closely related literature precedents cannot be relied on. Here we demonstrate that such computational synthesis planning is possible, provided that the program's knowledge of organic chemistry and data-based artificial intelligence routines are augmented with causal relationships [17,18], allowing it to 'strategize' over multiple synthetic steps. Using a Turing-like test administered to synthesis experts, we show that the routes designed by such a program are largely indistinguishable from those designed by humans. We also successfully validated three computer-designed syntheses of natural products in the laboratory. Taken together, these results indicate that expert-level automated synthetic planning is feasible, pending continued improvements to the reaction knowledge base and further code optimization.
|
24
|
Shibukawa R, Ishida S, Yoshizoe K, Wasa K, Takasu K, Okuno Y, Terayama K, Tsuda K. CompRet: a comprehensive recommendation framework for chemical synthesis planning with algorithmic enumeration. J Cheminform 2020; 12:52. [PMID: 33431005 PMCID: PMC7465358 DOI: 10.1186/s13321-020-00452-5] [Citation(s) in RCA: 11] [Impact Index Per Article: 2.8] [Received: 05/24/2020] [Accepted: 08/08/2020] [Indexed: 01/21/2023] Open
Abstract
In computer-assisted synthesis planning (CASP) programs, providing as many chemical synthetic routes as possible is essential for considering optimal and alternative routes in a chemical reaction network. As the majority of CASP programs have been designed to provide one or a few optimal routes, it is likely that the desired one will not be included. To avoid this, an exact algorithm that lists possible synthetic routes within the chemical reaction network is required, alongside a recommendation of synthetic routes that meet specified criteria based on the chemist’s objectives. Herein, we propose a chemical-reaction-network-based synthetic route recommendation framework called “CompRet” with a mathematically guaranteed enumeration algorithm. In a preliminary experiment, CompRet was shown to successfully provide alternative routes for a known antihistaminic drug, cetirizine. CompRet is expected to promote desirable enumeration-based chemical synthesis searches and aid the development of an interactive CASP framework for chemists.
Affiliation(s)
- Ryosuke Shibukawa
- Graduate School of Frontier Sciences, The University of Tokyo, Kashiwa, Chiba, Japan
| | - Shoichi Ishida
- Graduate School of Pharmaceutical Sciences, Kyoto University, Sakyo-ku, 606-8501, Kyoto, Japan
| | - Kazuki Yoshizoe
- RIKEN Center for Advanced Intelligence Project, Tokyo, Japan
| | | | - Kiyosei Takasu
- Graduate School of Pharmaceutical Sciences, Kyoto University, Sakyo-ku, 606-8501, Kyoto, Japan
| | - Yasushi Okuno
- Graduate School of Medicine, Kyoto University, Kyoto, Japan; Medical Sciences Innovation Hub Program, RIKEN, Kanagawa, Japan
| | - Kei Terayama
- RIKEN Center for Advanced Intelligence Project, Tokyo, Japan; Graduate School of Medicine, Kyoto University, Kyoto, Japan; Medical Sciences Innovation Hub Program, RIKEN, Kanagawa, Japan; Graduate School of Medical Life Science, Yokohama City University, Kanagawa, Japan
| | - Koji Tsuda
- Graduate School of Frontier Sciences, The University of Tokyo, Kashiwa, Chiba, Japan; RIKEN Center for Advanced Intelligence Project, Tokyo, Japan; Research and Services Division of Materials Data and Integrated System, National Institute for Materials Science, Kyoto, Japan
| |
|
25
|
Plehiers PP, Coley CW, Gao H, Vermeire FH, Dobbelaere MR, Stevens CV, Van Geem KM, Green WH. Artificial Intelligence for Computer-Aided Synthesis In Flow: Analysis and Selection of Reaction Components. FRONTIERS IN CHEMICAL ENGINEERING 2020. [DOI: 10.3389/fceng.2020.00005] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.8] [Indexed: 12/20/2022] Open
|
26
|
Muraoka K, Chaikittisilp W, Okubo T. Multi-objective de novo molecular design of organic structure-directing agents for zeolites using nature-inspired ant colony optimization. Chem Sci 2020; 11:8214-8223. [PMID: 34094176 PMCID: PMC8163217 DOI: 10.1039/d0sc03075a] [Citation(s) in RCA: 15] [Impact Index Per Article: 3.8] [Indexed: 12/25/2022] Open
Abstract
Organic structure-directing agents (OSDAs) are often employed for synthesis of zeolites with desired frameworks. A priori prediction of such OSDAs has mainly relied on the interaction energies between OSDAs and zeolite frameworks, without cost considerations. For practical purposes, the cost of OSDAs becomes a critical issue. Therefore, the development of a computational de novo prediction methodology that can speed up the trial-and-error cycle in the search for less expensive OSDAs is desired. This study utilized a nature-inspired ant colony optimization method to predict physicochemically and/or economically preferable OSDAs, while also taking molecular similarity and heuristics of zeolite synthesis into consideration. The prediction results included experimentally known OSDAs, candidates having structures closely related to known OSDAs, and novel ones, suggesting the applicability of this approach. Inspired by the exploratory methods of ant colonies, adaptive optimization was employed to explore the chemical space for organic molecules that guide zeolite crystallization, giving both physicochemically and economically promising molecules.
Affiliation(s)
- Koki Muraoka
- Department of Chemical System Engineering, The University of Tokyo 7-3-1 Hongo, Bunkyo-ku Tokyo 113-8656 Japan
| | - Watcharop Chaikittisilp
- Department of Chemical System Engineering, The University of Tokyo 7-3-1 Hongo, Bunkyo-ku Tokyo 113-8656 Japan
| | - Tatsuya Okubo
- Department of Chemical System Engineering, The University of Tokyo 7-3-1 Hongo, Bunkyo-ku Tokyo 113-8656 Japan
| |
|
27
|
Szymkuć S, Gajewska EP, Molga K, Wołos A, Roszak R, Beker W, Moskal M, Dittwald P, Grzybowski BA. Computer-generated "synthetic contingency" plans at times of logistics and supply problems: scenarios for hydroxychloroquine and remdesivir. Chem Sci 2020; 11:6736-6744. [PMID: 33033595 PMCID: PMC7500088 DOI: 10.1039/d0sc01799j] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.8] [Received: 03/28/2020] [Accepted: 06/02/2020] [Indexed: 01/21/2023] Open
Abstract
A computer program for retrosynthetic planning helps develop multiple "synthetic contingency" plans for hydroxychloroquine and also routes leading to remdesivir, both promising but yet unproven medications against COVID-19. These plans are designed to navigate, as much as possible, around known and patented routes and to commence from inexpensive and diverse starting materials, so as to ensure supply in case of anticipated market shortages of commonly used substrates. Looking beyond the current COVID-19 pandemic, development of similar contingency syntheses is advocated for other already-approved medications, in case such medications become urgently needed in mass quantities to face other public-health emergencies.
Affiliation(s)
- Sara Szymkuć
- Institute of Organic Chemistry, Polish Academy of Sciences, ul. Kasprzaka 44/52, Warsaw 02-224, Poland
| | - Ewa P Gajewska
- Institute of Organic Chemistry, Polish Academy of Sciences, ul. Kasprzaka 44/52, Warsaw 02-224, Poland
| | - Karol Molga
- Institute of Organic Chemistry, Polish Academy of Sciences, ul. Kasprzaka 44/52, Warsaw 02-224, Poland
| | - Agnieszka Wołos
- Institute of Organic Chemistry, Polish Academy of Sciences, ul. Kasprzaka 44/52, Warsaw 02-224, Poland
| | - Rafał Roszak
- Institute of Organic Chemistry, Polish Academy of Sciences, ul. Kasprzaka 44/52, Warsaw 02-224, Poland
| | - Wiktor Beker
- Institute of Organic Chemistry, Polish Academy of Sciences, ul. Kasprzaka 44/52, Warsaw 02-224, Poland
| | - Martyna Moskal
- Institute of Organic Chemistry, Polish Academy of Sciences, ul. Kasprzaka 44/52, Warsaw 02-224, Poland
| | - Piotr Dittwald
- Institute of Organic Chemistry, Polish Academy of Sciences, ul. Kasprzaka 44/52, Warsaw 02-224, Poland
| | - Bartosz A Grzybowski
- Institute of Organic Chemistry, Polish Academy of Sciences, ul. Kasprzaka 44/52, Warsaw 02-224, Poland
- IBS Center for Soft and Living Matter, 50, UNIST-gil, Eonyang-eup, Ulju-gun, Ulsan, 689-798, South Korea
- Department of Chemistry, UNIST, 50, UNIST-gil, Eonyang-eup, Ulju-gun, Ulsan, 689-798, South Korea
| |
|
28
|
Johansson S, Thakkar A, Kogej T, Bjerrum E, Genheden S, Bastys T, Kannas C, Schliep A, Chen H, Engkvist O. AI-assisted synthesis prediction. DRUG DISCOVERY TODAY. TECHNOLOGIES 2020; 32-33:65-72. [PMID: 33386096 DOI: 10.1016/j.ddtec.2020.06.002] [Citation(s) in RCA: 17] [Impact Index Per Article: 4.3] [Received: 04/16/2020] [Revised: 06/01/2020] [Accepted: 06/10/2020] [Indexed: 11/25/2022]
Abstract
Application of AI technologies in synthesis prediction has developed very rapidly in recent years. We attempt here to give a comprehensive summary on the latest advancement on retro-synthesis planning, forward synthesis prediction as well as quantum chemistry-based reaction prediction models. Besides an introduction on the AI/ML models for addressing various synthesis related problems, the sources of the reaction datasets used in model building is also covered. In addition to the predictive models, the robotics based high throughput experimentation technology will be another crucial factor for conducting synthesis in an automated fashion. Some state-of-the-art of high throughput experimentation practices carried out in the pharmaceutical industry are highlighted in this chapter to give the reader a sense of how future chemistry will be conducted to make compounds faster and cheaper.
Affiliation(s)
- Simon Johansson
- Hit Discovery, Discovery Sciences, R&D, AstraZeneca Gothenburg, Sweden; Department of Computer Science and Engineering, Chalmers University of Technology, Gothenburg, Sweden.
| | - Amol Thakkar
- Hit Discovery, Discovery Sciences, R&D, AstraZeneca Gothenburg, Sweden; Department of Chemistry and Biochemistry, University of Bern, Freiestrasse 3, 3012 Bern, Switzerland
| | - Thierry Kogej
- Hit Discovery, Discovery Sciences, R&D, AstraZeneca Gothenburg, Sweden
| | - Esben Bjerrum
- Hit Discovery, Discovery Sciences, R&D, AstraZeneca Gothenburg, Sweden
| | - Samuel Genheden
- Hit Discovery, Discovery Sciences, R&D, AstraZeneca Gothenburg, Sweden
| | - Tomas Bastys
- Hit Discovery, Discovery Sciences, R&D, AstraZeneca Gothenburg, Sweden
| | - Christos Kannas
- Hit Discovery, Discovery Sciences, R&D, AstraZeneca Gothenburg, Sweden
| | - Alexander Schliep
- Department of Computer Science and Engineering, University of Gothenburg, Gothenburg, Sweden
| | - Hongming Chen
- Centre of Chemistry and Chemical Biology, Guangzhou Regenerative Medicine and Health - Guangdong Laboratory, Guangzhou 510530, China
| | - Ola Engkvist
- Hit Discovery, Discovery Sciences, R&D, AstraZeneca Gothenburg, Sweden
| |
|
29
|
Muratov EN, Bajorath J, Sheridan RP, Tetko IV, Filimonov D, Poroikov V, Oprea TI, Baskin II, Varnek A, Roitberg A, Isayev O, Curtarolo S, Fourches D, Cohen Y, Aspuru-Guzik A, Winkler DA, Agrafiotis D, Cherkasov A, Tropsha A. QSAR without borders. Chem Soc Rev 2020; 49:3525-3564. [PMID: 32356548 PMCID: PMC8008490 DOI: 10.1039/d0cs00098a] [Citation(s) in RCA: 305] [Impact Index Per Article: 76.3] [Indexed: 12/15/2022]
Abstract
Prediction of chemical bioactivity and physical properties has been one of the most important applications of statistical and more recently, machine learning and artificial intelligence methods in chemical sciences. This field of research, broadly known as quantitative structure-activity relationships (QSAR) modeling, has developed many important algorithms and has found a broad range of applications in physical organic and medicinal chemistry in the past 55+ years. This Perspective summarizes recent technological advances in QSAR modeling but it also highlights the applicability of algorithms, modeling methods, and validation practices developed in QSAR to a wide range of research areas outside of traditional QSAR boundaries including synthesis planning, nanotechnology, materials science, biomaterials, and clinical informatics. As modern research methods generate rapidly increasing amounts of data, the knowledge of robust data-driven modelling methods professed within the QSAR field can become essential for scientists working both within and outside of chemical research. We hope that this contribution highlighting the generalizable components of QSAR modeling will serve to address this challenge.
Affiliation(s)
- Eugene N Muratov
- UNC Eshelman School of Pharmacy, University of North Carolina, Chapel Hill, NC, USA.
| |
|
30
|
Abstract
Modern computational chemistry has reached a stage at which massive exploration into chemical reaction space with unprecedented resolution with respect to the number of potentially relevant molecular structures has become possible. Various algorithmic advances have shown that such structural screenings must and can be automated and routinely carried out. This will replace the standard approach of manually studying a selected and restricted number of molecular structures for a chemical mechanism. The complexity of the task has led to many different approaches. However, all of them address the same general target, namely to produce a complete atomistic picture of the kinetics of a chemical process. It is the purpose of this overview to categorize the problems that should be targeted and to identify the principal components and challenges of automated exploration machines so that the various existing approaches and future developments can be compared based on well-defined conceptual principles.
Affiliation(s)
- Jan P. Unsleber
- Laboratory for Physical Chemistry, ETH Zurich, 8093 Zurich, Switzerland
| | - Markus Reiher
- Laboratory for Physical Chemistry, ETH Zurich, 8093 Zurich, Switzerland
| |
|
31
|
Lee K, Woo Kim J, Youn Kim W. Efficient Construction of a Chemical Reaction Network Guided By a Monte Carlo Tree Search. CHEMSYSTEMSCHEM 2020. [DOI: 10.1002/syst.201900057] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Indexed: 11/05/2022]
Affiliation(s)
- Kyunghoon Lee
- Department of Chemistry, Korea Advanced Institute of Science and Technology (KAIST), 291 Daehak-ro, Yuseong-gu, Daejeon 305-701, Korea
| | - Jin Woo Kim
- Department of Chemistry, Korea Advanced Institute of Science and Technology (KAIST), 291 Daehak-ro, Yuseong-gu, Daejeon 305-701, Korea
| | - Woo Youn Kim
- Department of Chemistry, Korea Advanced Institute of Science and Technology (KAIST), 291 Daehak-ro, Yuseong-gu, Daejeon 305-701, Korea
- KI for Artificial Intelligence, Korea Advanced Institute of Science and Technology (KAIST), 291 Daehak-ro, Yuseong-gu, Daejeon 305-701, Korea
| |
|
32
|
Lin K, Xu Y, Pei J, Lai L. Automatic retrosynthetic route planning using template-free models. Chem Sci 2020; 11:3355-3364. [PMID: 34122843 PMCID: PMC8152431 DOI: 10.1039/c9sc03666k] [Citation(s) in RCA: 71] [Impact Index Per Article: 17.8] [Received: 07/24/2019] [Accepted: 03/02/2020] [Indexed: 01/05/2023] Open
Abstract
Retrosynthetic route planning can be considered a rule-based reasoning procedure. The possibilities for each transformation are generated based on collected reaction rules, and then potential reaction routes are recommended by various optimization algorithms. Although there has been much progress in computer-assisted retrosynthetic route planning and reaction prediction, fully data-driven automatic retrosynthetic route planning remains challenging. Here we present a template-free approach that is independent of reaction templates, rules, or atom mapping, to implement automatic retrosynthetic route planning. We treated each reaction prediction task as a data-driven sequence-to-sequence problem using the multi-head attention-based Transformer architecture, which has demonstrated power in machine translation tasks. Using reactions from the United States patent literature, our end-to-end models naturally incorporate the global chemical environments of molecules and achieve remarkable performance in top-1 predictive accuracy (63.0%, with the reaction class provided) and top-1 molecular validity (99.6%) in one-step retrosynthetic tasks. Inspired by the success rate of the one-step reaction prediction, we further carried out iterative, multi-step retrosynthetic route planning for four case products, which was successful. We then constructed an automatic data-driven end-to-end retrosynthetic route planning system (AutoSynRoute) using Monte Carlo tree search with a heuristic scoring function. AutoSynRoute successfully reproduced published synthesis routes for the four case products. The end-to-end model for reaction task prediction can be easily extended to larger or customer-requested reaction databases. Our study presents an important step in realizing automatic retrosynthetic route planning.
Affiliation(s)
- Kangjie Lin
- BNLMS, Peking-Tsinghua Center for Life Sciences at the College of Chemistry and Molecular Engineering, Peking University, Beijing 100871, PR China
| | - Youjun Xu
- BNLMS, Peking-Tsinghua Center for Life Sciences at the College of Chemistry and Molecular Engineering, Peking University, Beijing 100871, PR China
| | - Jianfeng Pei
- Center for Quantitative Biology, Academy for Advanced Interdisciplinary Studies, Peking University, Beijing 100871, PR China
| | - Luhua Lai
- BNLMS, Peking-Tsinghua Center for Life Sciences at the College of Chemistry and Molecular Engineering, Peking University, Beijing 100871, PR China
- Center for Quantitative Biology, Academy for Advanced Interdisciplinary Studies, Peking University, Beijing 100871, PR China
| |
|
33
|
|
34
|
Zheng S, Rao J, Zhang Z, Xu J, Yang Y. Predicting Retrosynthetic Reactions Using Self-Corrected Transformer Neural Networks. J Chem Inf Model 2019; 60:47-55. [PMID: 31825611 DOI: 10.1021/acs.jcim.9b00949] [Citation(s) in RCA: 75] [Impact Index Per Article: 15.0] [Indexed: 11/28/2022]
Abstract
Synthesis planning is the process of recursively decomposing target molecules into available precursors. Computer-aided retrosynthesis can potentially assist chemists in designing synthetic routes; however, at present, it is cumbersome and cannot provide satisfactory results. In this study, we have developed a template-free self-corrected retrosynthesis predictor (SCROP) to predict retrosynthesis using transformer neural networks. In the method, the retrosynthesis planning was converted to a machine translation problem from the products to molecular linear notations of the reactants. By coupling with a neural network-based syntax corrector, our method achieved an accuracy of 59.0% on a standard benchmark data set, which outperformed other deep learning methods by >21% and template-based methods by >6%. More importantly, our method was 1.7 times more accurate than other state-of-the-art methods for compounds not appearing in the training set.
Affiliation(s)
- Shuangjia Zheng
- Research Center for Drug Discovery, School of Pharmaceutical Sciences, Sun Yat-sen University, 132 East Circle at University City, Guangzhou 510006, China
- School of Data and Computer Science, Sun Yat-sen University, Guangzhou 510006, China
| | - Jiahua Rao
- School of Data and Computer Science, Sun Yat-sen University, Guangzhou 510006, China
| | - Zhongyue Zhang
- School of Data and Computer Science, Sun Yat-sen University, Guangzhou 510006, China
| | - Jun Xu
- Research Center for Drug Discovery, School of Pharmaceutical Sciences, Sun Yat-sen University, 132 East Circle at University City, Guangzhou 510006, China
- School of Computer Science & Technology, Wuyi University, 99 Yingbin Road, Jiangmen 529020, China
| | - Yuedong Yang
- School of Data and Computer Science, Sun Yat-sen University, Guangzhou 510006, China
- Key Laboratory of Machine Intelligence and Advanced Computing, Sun Yat-sen University, Ministry of Education, Guangzhou 510000, China
| |
|
35
|
Cova TFGG, Pais AACC. Deep Learning for Deep Chemistry: Optimizing the Prediction of Chemical Patterns. Front Chem 2019; 7:809. [PMID: 32039134 PMCID: PMC6988795 DOI: 10.3389/fchem.2019.00809] [Citation(s) in RCA: 60] [Impact Index Per Article: 12.0] [Received: 07/16/2019] [Accepted: 11/11/2019] [Indexed: 12/14/2022] Open
Abstract
Computational Chemistry is currently a synergistic assembly between ab initio calculations, simulation, machine learning (ML) and optimization strategies for describing, solving and predicting chemical data and related phenomena. These include accelerated literature searches, analysis and prediction of physical and quantum chemical properties, transition states, chemical structures, chemical reactions, and also new catalysts and drug candidates. The generalization of scalability to larger chemical problems, rather than specialization, is now the main principle for transforming chemical tasks in multiple fronts, for which systematic and cost-effective solutions have benefited from ML approaches, including those based on deep learning (e.g. quantum chemistry, molecular screening, synthetic route design, catalysis, drug discovery). The latter class of ML algorithms is capable of combining raw input into layers of intermediate features, enabling bench-to-bytes designs with the potential to transform several chemical domains. In this review, the most exciting developments concerning the use of ML in a range of different chemical scenarios are described. A range of different chemical problems and respective rationalization, that have hitherto been inaccessible due to the lack of suitable analysis tools, is thus detailed, evidencing the breadth of potential applications of these emerging multidimensional approaches. Focus is given to the models, algorithms and methods proposed to facilitate research on compound design and synthesis, materials design, prediction of binding, molecular activity, and soft matter behavior. 
The information produced by pairing Chemistry and ML, through data-driven analyses, neural network predictions and monitoring of chemical systems, allows (i) prompting the ability to understand the complexity of chemical data, (ii) streamlining and designing experiments, (iii) discovering new molecular targets and materials, and also (iv) planning or rethinking forthcoming chemical challenges. In fact, optimization engulfs all these tasks directly.
Affiliation(s)
- Tânia F. G. G. Cova
- Coimbra Chemistry Centre, CQC, Department of Chemistry, Faculty of Sciences and Technology, University of Coimbra, Coimbra, Portugal
| | - Alberto A. C. C. Pais
- Coimbra Chemistry Centre, CQC, Department of Chemistry, Faculty of Sciences and Technology, University of Coimbra, Coimbra, Portugal
| |
|
36
|
Molga K, Dittwald P, Grzybowski BA. Computational design of syntheses leading to compound libraries or isotopically labelled targets. Chem Sci 2019; 10:9219-9232. [PMID: 32055308 PMCID: PMC6979321 DOI: 10.1039/c9sc02678a] [Citation(s) in RCA: 15] [Impact Index Per Article: 3.0] [Received: 06/02/2019] [Accepted: 08/09/2019] [Indexed: 01/08/2023] Open
Abstract
Although computer programs for retrosynthetic planning have shown improved and in some cases quite satisfactory performance in designing routes leading to specific, individual targets, no algorithms capable of planning syntheses of entire target libraries - important in modern drug discovery - have yet been reported. This study describes how network-search routines underlying existing retrosynthetic programs can be adapted and extended to multi-target design operating on one common search graph, benefitting from the use of common intermediates and reducing the overall synthetic cost. Implementation in the Chematica platform illustrates the usefulness of such algorithms in the syntheses of either (i) all members of a user-defined library, or (ii) the most synthetically accessible members of this library. In the latter case, algorithms are also readily adapted to the identification of the most facile syntheses of isotopically labelled targets. These examples are industrially relevant in the context of hit-to-lead optimization and syntheses of isotopomers of various bioactive molecules.
Affiliation(s)
- Karol Molga
- Institute of Organic Chemistry, Polish Academy of Sciences, ul. Kasprzaka 44/52, Warsaw 01-224, Poland
| | - Piotr Dittwald
- Institute of Organic Chemistry, Polish Academy of Sciences, ul. Kasprzaka 44/52, Warsaw 01-224, Poland
| | - Bartosz A Grzybowski
- Institute of Organic Chemistry, Polish Academy of Sciences, ul. Kasprzaka 44/52, Warsaw 01-224, Poland
- IBS Center for Soft and Living Matter and Department of Chemistry, UNIST, 50, UNIST-gil, Eonyang-eup, Ulju-gun, Ulsan, 689-798, South Korea
| |
|
37
|
de Almeida AF, Moreira R, Rodrigues T. Synthetic organic chemistry driven by artificial intelligence. Nat Rev Chem 2019. [DOI: 10.1038/s41570-019-0124-0] [Citation(s) in RCA: 111] [Impact Index Per Article: 22.2] [Indexed: 12/14/2022]
|
38
|
Walker E, Kammeraad J, Goetz J, Robo MT, Tewari A, Zimmerman PM. Learning To Predict Reaction Conditions: Relationships between Solvent, Molecular Structure, and Catalyst. J Chem Inf Model 2019; 59:3645-3654. [PMID: 31381340 DOI: 10.1021/acs.jcim.9b00313] [Citation(s) in RCA: 27] [Impact Index Per Article: 5.4] [Indexed: 11/29/2022]
Abstract
Reaction databases provide a great deal of useful information to assist planning of experiments but do not provide any interpretation or chemical concepts to accompany this information. In this work, reactions are labeled with experimental conditions, and network analysis shows that consistencies within clusters of data points can be leveraged to organize this information. In particular, this analysis shows how particular experimental conditions (specifically solvent) are effective in enabling specific organic reactions (Friedel-Crafts, Aldol addition, Claisen condensation, Diels-Alder, and Wittig), including variations within each reaction class. Network analysis shows data points for reactions tend to break into clusters that depend on the catalyst and chemical structure. This type of clustering, which mimics how a chemist reasons, is derived directly from the network. Therefore, the findings of this work could augment synthesis planning by providing predictions in a fashion that mimics human chemists. To numerically evaluate solvent prediction ability, three methods are compared: network analysis (through the k-nearest neighbor algorithm), a support vector machine, and a deep neural network. The most accurate method in 4 of the 5 test cases is the network analysis, with deep neural networks also showing good prediction scores. The network analysis tool was evaluated by an expert panel of chemists, who generally agreed that the algorithm produced accurate solvent choices while simultaneously being transparent in the underlying reasons for its predictions.
Affiliation(s)
- Eric Walker
- Department of Chemistry, University of Michigan, 930 North University Avenue, Ann Arbor, Michigan 48109, United States
| | - Joshua Kammeraad
- Department of Chemistry, University of Michigan, 930 North University Avenue, Ann Arbor, Michigan 48109, United States
| | - Jonathan Goetz
- Department of Statistics, University of Michigan, 1085 South University Avenue, Ann Arbor, Michigan 48109, United States
| | - Michael T Robo
- Department of Chemistry, University of Michigan, 930 North University Avenue, Ann Arbor, Michigan 48109, United States
| | - Ambuj Tewari
- Department of Statistics, University of Michigan, 1085 South University Avenue, Ann Arbor, Michigan 48109, United States
| | - Paul M Zimmerman
- Department of Chemistry, University of Michigan, 930 North University Avenue, Ann Arbor, Michigan 48109, United States
| |
|
39
|
Yang X, Wang Y, Byrne R, Schneider G, Yang S. Concepts of Artificial Intelligence for Computer-Assisted Drug Discovery. Chem Rev 2019; 119:10520-10594. [PMID: 31294972 DOI: 10.1021/acs.chemrev.8b00728] [Citation(s) in RCA: 329] [Impact Index Per Article: 65.8] [Indexed: 02/05/2023]
Abstract
Artificial intelligence (AI), and, in particular, deep learning as a subcategory of AI, provides opportunities for the discovery and development of innovative drugs. Various machine learning approaches have recently (re)emerged, some of which may be considered instances of domain-specific AI which have been successfully employed for drug discovery and design. This review provides a comprehensive portrayal of these machine learning techniques and of their applications in medicinal chemistry. After introducing the basic principles, alongside some application notes, of the various machine learning algorithms, the current state-of-the art of AI-assisted pharmaceutical discovery is discussed, including applications in structure- and ligand-based virtual screening, de novo drug design, physicochemical and pharmacokinetic property prediction, drug repurposing, and related aspects. Finally, several challenges and limitations of the current methods are summarized, with a view to potential future directions for AI-assisted drug discovery and design.
Affiliation(s)
- Xin Yang
- State Key Laboratory of Biotherapy and Cancer Center, West China Hospital, Sichuan University, Chengdu, Sichuan 610041, China
| | - Yifei Wang
- State Key Laboratory of Biotherapy and Cancer Center, West China Hospital, Sichuan University, Chengdu, Sichuan 610041, China
| | - Ryan Byrne
- ETH Zurich, Department of Chemistry and Applied Biosciences, Vladimir-Prelog-Weg 4, CH-8093 Zurich, Switzerland
| | - Gisbert Schneider
- ETH Zurich, Department of Chemistry and Applied Biosciences, Vladimir-Prelog-Weg 4, CH-8093 Zurich, Switzerland
| | - Shengyong Yang
- State Key Laboratory of Biotherapy and Cancer Center, West China Hospital, Sichuan University, Chengdu, Sichuan 610041, China
| |
|
40
|
Schreck JS, Coley CW, Bishop KJM. Learning Retrosynthetic Planning through Simulated Experience. ACS Cent Sci 2019; 5:970-981. [PMID: 31263756 PMCID: PMC6598174 DOI: 10.1021/acscentsci.9b00055] [Citation(s) in RCA: 68] [Impact Index Per Article: 13.6] [Received: 01/18/2019] [Indexed: 05/11/2023]
Abstract
The problem of retrosynthetic planning can be framed as a one-player game, in which the chemist (or a computer program) works backward from a molecular target to simpler starting materials through a series of choices regarding which reactions to perform. This game is challenging as the combinatorial space of possible choices is astronomical, and the value of each choice remains uncertain until the synthesis plan is completed and its cost evaluated. Here, we address this search problem using deep reinforcement learning to identify policies that make (near) optimal reaction choices during each step of retrosynthetic planning according to a user-defined cost metric. Using a simulated experience, we train a neural network to estimate the expected synthesis cost or value of any given molecule based on a representation of its molecular structure. We show that learned policies based on this value network can outperform a heuristic approach that favors symmetric disconnections when synthesizing unfamiliar molecules from available starting materials using the fewest number of reactions. We discuss how the learned policies described here can be incorporated into existing synthesis planning tools and how they can be adapted to changes in the synthesis cost objective or material availability.
Affiliation(s)
- John S. Schreck
- Department of Chemical Engineering, Columbia University, New York, New York 10027, United States
| | - Connor W. Coley
- Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, United States
| | - Kyle J. M. Bishop
- Department of Chemical Engineering, Columbia University, New York, New York 10027, United States
| |
|
41
|
Rappoport D, Aspuru-Guzik A. Predicting Feasible Organic Reaction Pathways Using Heuristically Aided Quantum Chemistry. J Chem Theory Comput 2019; 15:4099-4112. [DOI: 10.1021/acs.jctc.9b00126] [Citation(s) in RCA: 20] [Impact Index Per Article: 4.0] [Indexed: 12/26/2022]
Affiliation(s)
- Dmitrij Rappoport
- Department of Chemistry and Chemical Biology, Harvard University, 12 Oxford Street, Cambridge, Massachusetts 02138, United States
| | - Alán Aspuru-Guzik
- Department of Chemistry and Chemical Biology, Harvard University, 12 Oxford Street, Cambridge, Massachusetts 02138, United States
| |
|
42
|
Badowski T, Molga K, Grzybowski BA. Selection of cost-effective yet chemically diverse pathways from the networks of computer-generated retrosynthetic plans. Chem Sci 2019; 10:4640-4651. [PMID: 31123574 PMCID: PMC6495691 DOI: 10.1039/c8sc05611k] [Citation(s) in RCA: 30] [Impact Index Per Article: 6.0] [Received: 12/16/2018] [Accepted: 02/24/2019] [Indexed: 01/01/2023] Open
Abstract
As the programs for computer-aided retrosynthetic design come of age, they are no longer identifying just one or few synthetic routes but a multitude of chemically plausible syntheses, together forming large, directed graphs of solutions. An important problem then emerges: how to select from these graphs and present to the user manageable numbers of top-scoring pathways that are cost-effective, promote convergent vs. linear solutions, and are chemically diverse so that they do not repeat only minor variations in the same chemical theme. This paper describes a family of reaction network algorithms that address this problem by (i) using recursive formulae to assign realistic prices to individual pathways and (ii) applying penalties to chemically similar strategies so that they are not dominating the top-scoring routes. Synthetic examples are provided to illustrate how these algorithms can be implemented - on the timescales of ∼1 s even for large graphs - to rapidly query the space of synthetic solutions under the scenarios of different reaction yields and/or costs associated with performing reaction operations on different scales.
Affiliation(s)
- Tomasz Badowski
- Institute of Organic Chemistry, Polish Academy of Sciences, ul. Kasprzaka 44/52, Warsaw 01-224, Poland
| | - Karol Molga
- Institute of Organic Chemistry, Polish Academy of Sciences, ul. Kasprzaka 44/52, Warsaw 01-224, Poland
| | - Bartosz A Grzybowski
- Institute of Organic Chemistry, Polish Academy of Sciences, ul. Kasprzaka 44/52, Warsaw 01-224, Poland
- IBS Center for Soft and Living Matter, Department of Chemistry, UNIST, 50, UNIST-gil, Eonyang-eup, Ulju-gun, Ulsan, 689-798, South Korea
| |
|
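The recursive pricing idea mentioned in the abstract above can be illustrated with a toy sketch. This is not the authors' algorithm: the cost rule used here (sum of precursor costs plus a fixed per-reaction cost, divided by the fractional yield), the `Node` class, and all numbers are assumptions chosen purely for illustration.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    """A molecule in a synthesis tree (hypothetical structure for illustration)."""
    name: str
    price: float = 0.0            # purchase price if this is a starting material
    reaction_cost: float = 0.0    # assumed fixed cost of the reaction that makes it
    reaction_yield: float = 1.0   # fractional yield of that reaction, in (0, 1]
    precursors: List["Node"] = field(default_factory=list)


def pathway_cost(node: Node) -> float:
    """Cost of one unit of `node`: bought directly, or made from its precursors."""
    if not node.precursors:
        return node.price
    inputs = sum(pathway_cost(p) for p in node.precursors)
    # Lower yield means more input material per unit of product.
    return (inputs + node.reaction_cost) / node.reaction_yield


# Toy convergent route: (A + B -> I), then (I + C -> T). All values are made up.
a = Node("A", price=2.0)
b = Node("B", price=3.0)
c = Node("C", price=1.0)
intermediate = Node("I", reaction_cost=1.0, reaction_yield=0.5, precursors=[a, b])
target = Node("T", reaction_cost=1.0, reaction_yield=0.8, precursors=[intermediate, c])

cost = pathway_cost(target)
print(f"Estimated cost of T: {cost:.2f}")
```

Under this assumed rule, a low-yield step early in a route inflates the price of everything downstream, which is one reason a ranking based on recursive prices can differ from a ranking based on step count alone.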
43
|
Lin GM, Warden-Rothman R, Voigt CA. Retrosynthetic design of metabolic pathways to chemicals not found in nature. Curr Opin Syst Biol 2019. [DOI: 10.1016/j.coisb.2019.04.004] [Citation(s) in RCA: 57] [Impact Index Per Article: 11.4] [Indexed: 12/23/2022]
|
44
|
Roszak R, Bajczyk MD, Gajewska EP, Hołyst R, Grzybowski BA. Propagation of Oscillating Chemical Signals through Reaction Networks. Angew Chem Int Ed Engl 2019; 58:4520-4525. [PMID: 30397988 DOI: 10.1002/anie.201808821] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.0] [Received: 09/18/2018] [Indexed: 12/20/2022]
Abstract
Akin to electronic systems that can tune to and process signals of select frequencies, systems/networks of chemical reactions also "propagate" time-varying concentration inputs in a frequency-dependent manner. Whereas signals of low frequencies are transmitted, higher frequency inputs are dampened and converted into steady-concentration outputs. Such behavior is observed in both idealized reaction chains as well as realistic signaling cascades, in the latter case explaining the experimentally observed responses of such cascades to input calcium oscillations. These and other results are supported by numerical simulations within the freely available Kinetix web application we developed to study chemical systems of arbitrary architectures, reaction kinetics, and boundary conditions.
Affiliation(s)
- Rafał Roszak
- Institute of Organic Chemistry, Polish Academy of Sciences, Ul. Kasprzaka 44/52, Warsaw, 02-224, Poland
| | - Michał D Bajczyk
- Institute of Organic Chemistry, Polish Academy of Sciences, Ul. Kasprzaka 44/52, Warsaw, 02-224, Poland
| | - Ewa P Gajewska
- Institute of Organic Chemistry, Polish Academy of Sciences, Ul. Kasprzaka 44/52, Warsaw, 02-224, Poland
| | - Robert Hołyst
- Institute of Physical Chemistry, Polish Academy of Sciences, Ul. Kasprzaka 44/52, Warsaw, 02-224, Poland
| | - Bartosz A Grzybowski
- Institute of Organic Chemistry, Polish Academy of Sciences, Ul. Kasprzaka 44/52, Warsaw, 02-224, Poland
- IBS Center for Soft and Living Matter and Department of Chemistry, UNIST, 50, UNIST-gil, Eonyang-eup, Ulju-gun, Ulsan, South Korea
| |
|
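The frequency-dependent transmission described in the abstract above resembles a low-pass filter. As a minimal sketch under strong assumptions (a linear chain of identical first-order steps, each species formed from the previous one and consumed with the same rate constant `k`), the steady-state amplitude gain of one step at angular frequency ω is k/√(k² + ω²), and the gains multiply along the chain. This is not the Kinetix application or the authors' model; the function name and parameter values are hypothetical.

```python
import math


def chain_gain(omega: float, k: float, n_steps: int) -> float:
    """Amplitude gain of a sinusoidal input after n identical first-order steps.

    Each step is assumed to obey dx_i/dt = k * x_(i-1) - k * x_i, whose
    transfer function k / (k + i*omega) has magnitude k / sqrt(k^2 + omega^2).
    """
    single_step = k / math.sqrt(k * k + omega * omega)
    return single_step ** n_steps


# Compare a slow and a fast oscillation passing through a three-step chain.
slow = chain_gain(omega=0.1, k=1.0, n_steps=3)
fast = chain_gain(omega=10.0, k=1.0, n_steps=3)
print(f"gain at omega=0.1: {slow:.4f}")
print(f"gain at omega=10:  {fast:.6f}")
```

In this idealized picture the slow input passes almost unchanged while the fast input is attenuated by roughly three orders of magnitude, leaving an output close to the steady mean concentration.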
45
|
Rappoport D. Reaction Networks and the Metric Structure of Chemical Space(s). J Phys Chem A 2019; 123:2610-2620. [DOI: 10.1021/acs.jpca.9b00519] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.0] [Indexed: 01/10/2023]
Affiliation(s)
- Dmitrij Rappoport
- Department of Chemistry, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina 27599, United States
| |
|
46
|
Affiliation(s)
- Jian Deng
- The State Key Lab of Chemical Engineering, Department of Chemical Engineering; Tsinghua University; Beijing 100084 China
| | - Jisong Zhang
- The State Key Lab of Chemical Engineering, Department of Chemical Engineering; Tsinghua University; Beijing 100084 China
| | - Kai Wang
- The State Key Lab of Chemical Engineering, Department of Chemical Engineering; Tsinghua University; Beijing 100084 China
| | - Guangsheng Luo
- The State Key Lab of Chemical Engineering, Department of Chemical Engineering; Tsinghua University; Beijing 100084 China
| |
|
47
|
|
48
|
Molga K, Gajewska EP, Szymkuć S, Grzybowski BA. The logic of translating chemical knowledge into machine-processable forms: a modern playground for physical-organic chemistry. React Chem Eng 2019. [DOI: 10.1039/c9re00076c] [Citation(s) in RCA: 23] [Impact Index Per Article: 4.6] [Indexed: 12/17/2022]
Abstract
With renewed interest and significant progress in computer-assisted synthetic planning, it is essential to codify the logic that should be followed when translating organic synthetic knowledge into reaction rules understandable to the machine.
Affiliation(s)
- Karol Molga
- Institute of Organic Chemistry, Polish Academy of Sciences, Warsaw 01-224, Poland
| | - Ewa P. Gajewska
- Institute of Organic Chemistry, Polish Academy of Sciences, Warsaw 01-224, Poland
| | - Sara Szymkuć
- Institute of Organic Chemistry, Polish Academy of Sciences, Warsaw 01-224, Poland
| | - Bartosz A. Grzybowski
- Institute of Organic Chemistry, Polish Academy of Sciences, Warsaw 01-224, Poland
- IBS Center for Soft and Living Matter and Department of Chemistry
| |
|
49
|
Roszak R, Bajczyk MD, Gajewska EP, Hołyst R, Grzybowski BA. Propagation of Oscillating Chemical Signals through Reaction Networks. Angew Chem Int Ed Engl 2018. [DOI: 10.1002/ange.201808821] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Indexed: 12/21/2022]
Affiliation(s)
- Rafał Roszak
- Institute of Organic Chemistry, Polish Academy of Sciences, Ul. Kasprzaka 44/52, Warsaw 02-224, Poland
| | - Michał D. Bajczyk
- Institute of Organic Chemistry, Polish Academy of Sciences, Ul. Kasprzaka 44/52, Warsaw 02-224, Poland
| | - Ewa P. Gajewska
- Institute of Organic Chemistry, Polish Academy of Sciences, Ul. Kasprzaka 44/52, Warsaw 02-224, Poland
| | - Robert Hołyst
- Institute of Physical Chemistry, Polish Academy of Sciences, Ul. Kasprzaka 44/52, Warsaw 02-224, Poland
| | - Bartosz A. Grzybowski
- Institute of Organic Chemistry, Polish Academy of Sciences, Ul. Kasprzaka 44/52, Warsaw 02-224, Poland
- IBS Center for Soft and Living Matter and Department of Chemistry, UNIST, 50, UNIST-gil, Eonyang-eup, Ulju-gun, Ulsan, South Korea
| |
|
50
|
Thomford NE, Senthebane DA, Rowe A, Munro D, Seele P, Maroyi A, Dzobo K. Natural Products for Drug Discovery in the 21st Century: Innovations for Novel Drug Discovery. Int J Mol Sci 2018; 19:E1578. [PMID: 29799486 PMCID: PMC6032166 DOI: 10.3390/ijms19061578] [Citation(s) in RCA: 522] [Impact Index Per Article: 87.0] [Received: 04/27/2018] [Revised: 05/16/2018] [Accepted: 05/18/2018] [Indexed: 12/12/2022] Open
Abstract
The therapeutic properties of plants have been recognised since time immemorial. Many pathological conditions have been treated using plant-derived medicines. These medicines are used as concoctions or concentrated plant extracts without isolation of active compounds. Modern medicine however, requires the isolation and purification of one or two active compounds. There are however a lot of global health challenges with diseases such as cancer, degenerative diseases, HIV/AIDS and diabetes, of which modern medicine is struggling to provide cures. Many times the isolation of "active compound" has made the compound ineffective. Drug discovery is a multidimensional problem requiring several parameters of both natural and synthetic compounds such as safety, pharmacokinetics and efficacy to be evaluated during drug candidate selection. The advent of latest technologies that enhance drug design hypotheses such as Artificial Intelligence, the use of 'organ-on chip' and microfluidics technologies, means that automation has become part of drug discovery. This has resulted in increased speed in drug discovery and evaluation of the safety, pharmacokinetics and efficacy of candidate compounds whilst allowing novel ways of drug design and synthesis based on natural compounds. Recent advances in analytical and computational techniques have opened new avenues to process complex natural products and to use their structures to derive new and innovative drugs. Indeed, we are in the era of computational molecular design, as applied to natural products. Predictive computational softwares have contributed to the discovery of molecular targets of natural products and their derivatives. In future the use of quantum computing, computational softwares and databases in modelling molecular interactions and predicting features and parameters needed for drug development, such as pharmacokinetic and pharmacodynamics, will result in few false positive leads in drug development. 
This review discusses plant-based natural product drug discovery and how innovative technologies play a role in next-generation drug discovery.
Affiliation(s)
- Nicholas Ekow Thomford
- Pharmacogenomics and Drug Metabolism Group, Division of Human Genetics, Department of Pathology and Institute of Infectious Disease and Molecular Medicine, Faculty of Health Sciences, University of Cape Town, Anzio Road, Observatory, Cape Town 7925, South Africa.
- School of Medical Sciences, University of Cape Coast, PMB, Cape Coast, Ghana.
| | - Dimakatso Alice Senthebane
- International Centre for Genetic Engineering and Biotechnology (ICGEB), Cape Town Component, Wernher and Beit Building (South), University of Cape Town Medical Campus, Anzio Road, Observatory, Cape Town 7925, South Africa.
- Division of Medical Biochemistry and Institute of Infectious Disease and Molecular Medicine, Faculty of Health Sciences, University of Cape Town, Anzio Road, Observatory, Cape Town 7925, South Africa.
| | - Arielle Rowe
- International Centre for Genetic Engineering and Biotechnology (ICGEB), Cape Town Component, Wernher and Beit Building (South), University of Cape Town Medical Campus, Anzio Road, Observatory, Cape Town 7925, South Africa.
| | - Daniella Munro
- Pharmacogenomics and Drug Metabolism Group, Division of Human Genetics, Department of Pathology and Institute of Infectious Disease and Molecular Medicine, Faculty of Health Sciences, University of Cape Town, Anzio Road, Observatory, Cape Town 7925, South Africa.
| | - Palesa Seele
- Division of Chemical and Systems Biology, Department of Integrative Biomedical Sciences, Faculty of Health Sciences, University of Cape Town, Anzio Road, Observatory, Cape Town 7925, South Africa.
| | - Alfred Maroyi
- Department of Botany, University of Fort Hare, Private Bag, Alice X1314, South Africa.
| | - Kevin Dzobo
- International Centre for Genetic Engineering and Biotechnology (ICGEB), Cape Town Component, Wernher and Beit Building (South), University of Cape Town Medical Campus, Anzio Road, Observatory, Cape Town 7925, South Africa.
- Division of Medical Biochemistry and Institute of Infectious Disease and Molecular Medicine, Faculty of Health Sciences, University of Cape Town, Anzio Road, Observatory, Cape Town 7925, South Africa.
| |
|