1
Bradshaw J, Zhang A, Mahjour B, Graff DE, Segler MHS, Coley CW. Challenging Reaction Prediction Models to Generalize to Novel Chemistry. ACS Central Science 2025; 11:539-549. [PMID: 40290152 PMCID: PMC12022916 DOI: 10.1021/acscentsci.5c00055] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/09/2025] [Revised: 02/21/2025] [Accepted: 03/03/2025] [Indexed: 04/30/2025]
Abstract
Deep learning models for anticipating the products of organic reactions have found many use cases, including validating retrosynthetic pathways and constraining synthesis-based molecular design tools. Despite compelling performance on popular benchmark tasks, strange and erroneous predictions sometimes ensue when using these models in practice. The core issue is that common benchmarks test models in an in-distribution setting, whereas many real-world uses for these models are in out-of-distribution settings and require a greater degree of extrapolation. To better understand how current reaction predictors work in out-of-distribution domains, we report a series of more challenging evaluations of a prototypical SMILES-based deep learning model. First, we illustrate how performance on randomly sampled data sets is overly optimistic compared to performance when generalizing to new patents or new authors. Second, we conduct time splits that evaluate how models perform when tested on reactions published years after those in their training set, mimicking real-world deployment. Finally, we consider extrapolation across reaction classes to reflect what would be required for the discovery of novel reaction types. This panel of tasks can reveal the capabilities and limitations of today's reaction predictors, acting as a crucial first step in the development of tomorrow's next-generation models capable of reaction discovery.
Affiliation(s)
- John Bradshaw
  - Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, United States
- Anji Zhang
  - Department of Chemistry, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, United States
- Babak Mahjour
  - Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, United States
- David E. Graff
  - Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, United States
  - Department of Chemistry and Chemical Biology, Harvard University, Cambridge, Massachusetts 02138, United States
- Connor W. Coley
  - Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, United States
  - Department of Electrical Engineering and Computer Science, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, United States
2
Long L, Li R, Zhang J. Artificial Intelligence in Retrosynthesis Prediction and its Applications in Medicinal Chemistry. J Med Chem 2025; 68:2333-2355. [PMID: 39883477 DOI: 10.1021/acs.jmedchem.4c02749] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/31/2025]
Abstract
Retrosynthesis is a strategy to analyze the synthetic routes for target molecules in medicinal chemistry. However, traditional retrosynthesis predictions performed by chemists and rule-based expert systems struggle to adapt to the vast chemical space of real-world scenarios. Artificial intelligence (AI) has revolutionized retrosynthesis prediction in recent decades, significantly increasing the accuracy and diversity of predictions for target compounds. Single-step AI-driven retrosynthesis models can be generalized into three types based on their dependence on predefined reaction templates (template-based, semitemplate-based methods, template-free models), with respective advantages and limitations, and common challenges that limit their medicinal chemistry applications. Moreover, there are relatively inadequate multi-step retrosynthesis methods, which lack strong links with single-step methods. Herein, we review the recent advancements in AI applications for retrosynthesis prediction by summarizing related techniques and the landscape of current representative retrosynthesis models and propose feasible solutions to tackle existing problems and outline future directions in this field.
Affiliation(s)
- Lanxin Long
  - Medicinal Chemistry and Bioinformatics Center, Shanghai Jiao Tong University School of Medicine, Shanghai 200025, China
- Rui Li
  - Medicinal Chemistry and Bioinformatics Center, Shanghai Jiao Tong University School of Medicine, Shanghai 200025, China
  - State Key Laboratory of Medical Genomics, National Research Center for Translational Medicine at Shanghai, Ruijin Hospital, Shanghai Jiao Tong University School of Medicine, Shanghai 200025, China
- Jian Zhang
  - Medicinal Chemistry and Bioinformatics Center, Shanghai Jiao Tong University School of Medicine, Shanghai 200025, China
  - State Key Laboratory of Medical Genomics, National Research Center for Translational Medicine at Shanghai, Ruijin Hospital, Shanghai Jiao Tong University School of Medicine, Shanghai 200025, China
  - Key Laboratory of Protection, Development, and Utilization of Medicinal Resources in Liupanshan Area, Ministry of Education, Peptides & Protein Drug Research Center, School of Pharmacy, Ningxia Medical University, Yinchuan 750004, China
3
Zhao PC, Wei XX, Wang Q, Wang QH, Li JN, Shang J, Lu C, Shi JY. Single-step retrosynthesis prediction via multitask graph representation learning. Nat Commun 2025; 16:814. [PMID: 39827189 PMCID: PMC11742932 DOI: 10.1038/s41467-025-56062-y] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/10/2023] [Accepted: 01/08/2025] [Indexed: 01/22/2025] Open
Abstract
Inferring appropriate synthesis reaction (i.e., retrosynthesis) routes for newly designed molecules is vital. Recently, computational methods have produced promising single-step retrosynthesis predictions. However, template-based methods are limited by the known synthesis templates; template-free methods are weakly interpretable; and semi template-based methods are deficient with regard to utilizing the associations between chemical entities. To address these issues, this paper leverages the intra-associations between synthons, the inter-associations between synthons and leaving groups (LGs), and the intra-associations between LGs. It develops a multitask graph representation learning model for single-step retrosynthesis prediction (Retro-MTGR) to solve reaction centre deduction and LG identification simultaneously. A comparison with 16 state-of-the-art methods first demonstrates the superiority of Retro-MTGR. Then, its robustness and scalability and the contributions of its crucial components are validated. More importantly, it can determine whether a bond can be a reaction centre and what LGs are appropriate for a given synthon, respectively. The answers reflect underlying chemical synthesis rules, especially opposite electrical properties between chemical entities (e.g., reaction sites, synthons, and LGs). Finally, case studies demonstrate that the retrosynthesis routes inferred by Retro-MTGR are promising for single-step synthesis reactions. The code and data of this study are freely available at https://doi.org/10.5281/zenodo.14346324 .
Affiliation(s)
- Peng-Cheng Zhao
  - School of Life Sciences, Northwestern Polytechnical University, Xi'an, China
- Xue-Xin Wei
  - School of Life Sciences, Northwestern Polytechnical University, Xi'an, China
- Qiong Wang
  - School of Life Sciences, Northwestern Polytechnical University, Xi'an, China
- Qi-Hao Wang
  - School of Chemistry and Chemical Engineering, Northwestern Polytechnical University, Xi'an, China
- Jia-Ning Li
  - School of Life Sciences, Northwestern Polytechnical University, Xi'an, China
- Jie Shang
  - School of Life Sciences, Northwestern Polytechnical University, Xi'an, China
- Cheng Lu
  - Institute of Basic Research in Clinical Medicine, China Academy of Chinese Medical Sciences, Beijing, China
- Jian-Yu Shi
  - School of Life Sciences, Northwestern Polytechnical University, Xi'an, China
4
Han Y, Deng M, Liu K, Chen J, Wang Y, Xu YN, Dian L. Computer-Aided Synthesis Planning (CASP) and Machine Learning: Optimizing Chemical Reaction Conditions. Chemistry 2024; 30:e202401626. [PMID: 39083362 DOI: 10.1002/chem.202401626] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/26/2024] [Revised: 07/27/2024] [Accepted: 07/28/2024] [Indexed: 08/02/2024]
Abstract
Computer-aided synthesis planning (CASP) has garnered increasing attention in light of recent advancements in machine learning models. While the focus is on reverse synthesis or forward outcome prediction, optimizing reaction conditions remains a significant challenge. For datasets with multiple variables, the choice of descriptors and models is pivotal. This selection dictates the effective extraction of conditional features and the achievement of higher prediction accuracy. This review delineates the origins of data in conditional optimization, the criteria for descriptor selection, the response models, and the metrics for outcome evaluation, aiming to acquaint readers with the latest research trends and facilitate more informed research in this domain.
Affiliation(s)
- Yu Han
  - State Key Laboratory of Microbial Technology, Institute of Microbial Technology, Shandong University, No. 72 Binhai Avenue, Qingdao, 266237, P. R. China
- Mingjing Deng
  - State Key Laboratory of Microbial Technology, Institute of Microbial Technology, Shandong University, No. 72 Binhai Avenue, Qingdao, 266237, P. R. China
- Ke Liu
  - State Key Laboratory of Microbial Technology, Institute of Microbial Technology, Shandong University, No. 72 Binhai Avenue, Qingdao, 266237, P. R. China
- Jia Chen
  - State Key Laboratory of Microbial Technology, Institute of Microbial Technology, Shandong University, No. 72 Binhai Avenue, Qingdao, 266237, P. R. China
- Yuting Wang
  - State Key Laboratory of Microbial Technology, Institute of Microbial Technology, Shandong University, No. 72 Binhai Avenue, Qingdao, 266237, P. R. China
- Yu-Ning Xu
  - State Key Laboratory of Microbial Technology, Institute of Microbial Technology, Shandong University, No. 72 Binhai Avenue, Qingdao, 266237, P. R. China
- Longyang Dian
  - State Key Laboratory of Microbial Technology, Institute of Microbial Technology, Shandong University, No. 72 Binhai Avenue, Qingdao, 266237, P. R. China
  - Suzhou Institute of Shandong University, No. 388 Ruoshui Road, Suzhou Industrial Park, Suzhou, 215123, P. R. China
5
Westerlund AM, Manohar Koki S, Kancharla S, Tibo A, Saigiridharan L, Kabeshov M, Mercado R, Genheden S. Do Chemformers Dream of Organic Matter? Evaluating a Transformer Model for Multistep Retrosynthesis. J Chem Inf Model 2024; 64:3021-3033. [PMID: 38602390 DOI: 10.1021/acs.jcim.3c01685] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 04/12/2024]
Abstract
Synthesis planning of new pharmaceutical compounds is a well-known bottleneck in modern drug design. Template-free methods, such as transformers, have recently been proposed as an alternative to template-based methods for single-step retrosynthetic predictions. Here, we trained and evaluated a transformer model, called the Chemformer, for retrosynthesis predictions within drug discovery. The proprietary data set used for training comprised ∼18 M reactions from literature, patents, and electronic lab notebooks. Chemformer was evaluated for the purpose of both single-step and multistep retrosynthesis. We found that the single-step performance of Chemformer was especially good on reaction classes common in drug discovery, with most reaction classes showing a top-10 round-trip accuracy above 0.97. Moreover, Chemformer reached a higher round-trip accuracy compared to that of a template-based model. By analyzing multistep retrosynthesis experiments, we observed that Chemformer found synthetic routes, leading to commercial starting materials for 95% of the target compounds, an increase of more than 20% compared to the template-based model on a proprietary compound data set. In addition to this, we discovered that Chemformer suggested novel disconnections corresponding to reaction templates, which are not included in the template-based model. These findings were further supported by a publicly available ChEMBL compound data set. The conclusions drawn from this work allow for the design of a synthesis planning tool where template-based and template-free models work in harmony to optimize retrosynthetic recommendations.
Affiliation(s)
- Annie M Westerlund
  - Department of Molecular AI, Discovery Sciences, R&D, AstraZeneca, 43183 Mölndal, Sweden
- Siva Manohar Koki
  - Department of Molecular AI, Discovery Sciences, R&D, AstraZeneca, 43183 Mölndal, Sweden
  - Department of Computer Science and Engineering, Chalmers University of Technology, 412 96 Göteborg, Sweden
- Supriya Kancharla
  - Department of Molecular AI, Discovery Sciences, R&D, AstraZeneca, 43183 Mölndal, Sweden
  - Department of Computer Science and Engineering, Chalmers University of Technology, 412 96 Göteborg, Sweden
- Alessandro Tibo
  - Department of Molecular AI, Discovery Sciences, R&D, AstraZeneca, 43183 Mölndal, Sweden
- Mikhail Kabeshov
  - Department of Molecular AI, Discovery Sciences, R&D, AstraZeneca, 43183 Mölndal, Sweden
- Rocío Mercado
  - Department of Computer Science and Engineering, Chalmers University of Technology, 412 96 Göteborg, Sweden
- Samuel Genheden
  - Department of Molecular AI, Discovery Sciences, R&D, AstraZeneca, 43183 Mölndal, Sweden
6
Yu L, He X, Fang X, Liu L, Liu J. Deep Learning with Geometry-Enhanced Molecular Representation for Augmentation of Large-Scale Docking-Based Virtual Screening. J Chem Inf Model 2023; 63:6501-6514. [PMID: 37882338 DOI: 10.1021/acs.jcim.3c01371] [Citation(s) in RCA: 6] [Impact Index Per Article: 3.0] [Indexed: 10/27/2023]
Abstract
Structure-based virtual screening has been a crucial tool in drug discovery for decades. However, as the chemical space expands, the existing structure-based virtual screening techniques based on molecular docking and scoring struggle to handle billion-entry ultralarge libraries due to the high computational cost. To address this challenge, people have resorted to machine learning techniques to enhance structure-based virtual screening for efficiently exploring the vast chemical space. In those cases, compounds are usually treated as sequential strings or two-dimensional topology graphs, limiting their ability to incorporate three-dimensional structural information for downstream tasks. We herein propose a novel deep learning protocol, GEM-Screen, which utilizes the geometry-enhanced molecular representation of the compounds docking to a specific target and is trained on docking scores of a small fraction of a library through an active learning strategy to approximate the docking outcome for yet nontraining entries. This protocol is applied to virtual screening campaigns against the AmpC and D4 targets, demonstrating that GEM-Screen enriches more than 90% of the hit scaffolds for AmpC in the top 4% of model predictions and more than 80% of the hit scaffolds for D4 in the same top-ranking size of library. GEM-Screen can be used in conjunction with traditional docking programs for docking of only the top-ranked compounds to avoid the exhaustive docking of the whole library, thus allowing for discovering top-scoring compounds from billion-entry libraries in a rapid yet accurate fashion.
Affiliation(s)
- Lan Yu
  - School of Science, China Pharmaceutical University, Nanjing 210009, China
- Xiao He
  - Shanghai Engineering Research Center of Molecular Therapeutics and New Drug Development, Shanghai Frontiers Science Center of Molecule Intelligent Syntheses, School of Chemistry and Molecular Engineering, East China Normal University, Shanghai 200062, China
  - New York University-East China Normal University Center for Computational Chemistry, New York University Shanghai, Shanghai 200062, China
- Xiaomin Fang
  - Baidu International Technology (Shenzhen) Co., Ltd., Shenzhen 518063, China
- Lihang Liu
  - Baidu International Technology (Shenzhen) Co., Ltd., Shenzhen 518063, China
- Jinfeng Liu
  - School of Science, China Pharmaceutical University, Nanjing 210009, China
  - School of Basic Medicine and Clinical Pharmacy, China Pharmaceutical University, Nanjing 210009, China
7
Shim E, Tewari A, Cernak T, Zimmerman PM. Machine Learning Strategies for Reaction Development: Toward the Low-Data Limit. J Chem Inf Model 2023; 63:3659-3668. [PMID: 37312524 PMCID: PMC11163943 DOI: 10.1021/acs.jcim.3c00577] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Indexed: 06/15/2023]
Abstract
Machine learning models are increasingly being utilized to predict outcomes of organic chemical reactions. A large amount of reaction data is used to train these models, which is in stark contrast to how expert chemists discover and develop new reactions by leveraging information from a small number of relevant transformations. Transfer learning and active learning are two strategies that can operate in low-data situations, which may help fill this gap and promote the use of machine learning for tackling real-world challenges in organic synthesis. This Perspective introduces active and transfer learning and connects these to potential opportunities and directions for further research, especially in the area of prospective development of chemical transformations.
Affiliation(s)
- Eunjae Shim
  - Department of Chemistry, University of Michigan, Ann Arbor, Michigan 48109, United States
- Ambuj Tewari
  - Department of Statistics, University of Michigan, Ann Arbor, Michigan 48109, United States
  - Department of Electrical Engineering and Computer Science, University of Michigan, Ann Arbor, Michigan 48109, United States
- Tim Cernak
  - Department of Chemistry, University of Michigan, Ann Arbor, Michigan 48109, United States
  - Department of Medicinal Chemistry, University of Michigan, Ann Arbor, Michigan 48109, United States
- Paul M Zimmerman
  - Department of Chemistry, University of Michigan, Ann Arbor, Michigan 48109, United States
8
A Review on Artificial Intelligence Enabled Design, Synthesis, and Process Optimization of Chemical Products for Industry 4.0. Processes (Basel) 2023. [DOI: 10.3390/pr11020330] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/20/2023] Open
Abstract
With the development of Industry 4.0, artificial intelligence (AI) is gaining increasing attention for its performance in solving particularly complex problems in industrial chemistry and chemical engineering. Therefore, this review provides an overview of the application of AI techniques, in particular machine learning, in chemical design, synthesis, and process optimization over the past years. In this review, the focus is on the application of AI for structure-function relationship analysis, synthetic route planning, and automated synthesis. Finally, we discuss the challenges and future of AI in making chemical products.
9
Tu Z, Stuyver T, Coley CW. Predictive chemistry: machine learning for reaction deployment, reaction development, and reaction discovery. Chem Sci 2023; 14:226-244. [PMID: 36743887 PMCID: PMC9811563 DOI: 10.1039/d2sc05089g] [Citation(s) in RCA: 37] [Impact Index Per Article: 18.5] [Received: 09/12/2022] [Accepted: 11/25/2022] [Indexed: 11/29/2022] Open
Abstract
The field of predictive chemistry relates to the development of models able to describe how molecules interact and react. It encompasses the long-standing task of computer-aided retrosynthesis, but is far more reaching and ambitious in its goals. In this review, we summarize several areas where predictive chemistry models hold the potential to accelerate the deployment, development, and discovery of organic reactions and advance synthetic chemistry.
Affiliation(s)
- Zhengkai Tu
  - Department of Electrical Engineering and Computer Science, Massachusetts Institute of Technology, 77 Massachusetts Avenue, Cambridge, MA 02139, USA
- Thijs Stuyver
  - Department of Chemical Engineering, Massachusetts Institute of Technology, 77 Massachusetts Avenue, Cambridge, MA 02139, USA
- Connor W Coley
  - Department of Electrical Engineering and Computer Science, Massachusetts Institute of Technology, 77 Massachusetts Avenue, Cambridge, MA 02139, USA
  - Department of Chemical Engineering, Massachusetts Institute of Technology, 77 Massachusetts Avenue, Cambridge, MA 02139, USA
10
Wu Z, Cai X, Zhang C, Qiao H, Wu Y, Zhang Y, Wang X, Xie H, Luo F, Duan H. Self-Supervised Molecular Pretraining Strategy for Low-Resource Reaction Prediction Scenarios. J Chem Inf Model 2022; 62:4579-4590. [PMID: 36129104 DOI: 10.1021/acs.jcim.2c00588] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Indexed: 11/30/2022]
Abstract
In the face of low-resource reaction training samples, we construct a chemical platform for addressing small-scale reaction prediction problems. Using a self-supervised pretraining strategy called MAsked Sequence to Sequence (MASS), the Transformer model can absorb the chemical information of about 1 billion molecules and then fine-tune on a small-scale reaction prediction. To further strengthen the predictive performance of our model, we combine MASS with the reaction transfer learning strategy. Here, we show that the average improved accuracies of the Transformer model can reach 14.07, 24.26, 40.31, and 57.69% in predicting the Baeyer-Villiger, Heck, C-C bond formation, and functional group interconversion reaction data sets, respectively, marking an important step to low-resource reaction prediction.
Affiliation(s)
- Zhipeng Wu
  - Artificial Intelligence Aided Drug Discovery Institute, College of Pharmaceutical Sciences, Zhejiang University of Technology, Hangzhou 310014, P. R. China
- Xiang Cai
  - PyWise Biotech, Suzhou 215000, P. R. China
- Chengyun Zhang
  - Artificial Intelligence Aided Drug Discovery Institute, College of Pharmaceutical Sciences, Zhejiang University of Technology, Hangzhou 310014, P. R. China
- Haoran Qiao
  - College of Mathematics and Physics, Shanghai University of Electric Power, Shanghai 201203, P. R. China
- Yejian Wu
  - Artificial Intelligence Aided Drug Discovery Institute, College of Pharmaceutical Sciences, Zhejiang University of Technology, Hangzhou 310014, P. R. China
- Yun Zhang
  - Artificial Intelligence Aided Drug Discovery Institute, College of Pharmaceutical Sciences, Zhejiang University of Technology, Hangzhou 310014, P. R. China
- Xinqiao Wang
  - Artificial Intelligence Aided Drug Discovery Institute, College of Pharmaceutical Sciences, Zhejiang University of Technology, Hangzhou 310014, P. R. China
- Haiying Xie
  - PUROTON Gene Medical Institute Co., Ltd., Chongqing 400700, P. R. China
- Feng Luo
  - PUROTON Gene Medical Institute Co., Ltd., Chongqing 400700, P. R. China
- Hongliang Duan
  - Artificial Intelligence Aided Drug Discovery Institute, College of Pharmaceutical Sciences, Zhejiang University of Technology, Hangzhou 310014, P. R. China
11
RetroComposer: Composing Templates for Template-Based Retrosynthesis Prediction. Biomolecules 2022; 12:biom12091325. [PMID: 36139164 PMCID: PMC9496376 DOI: 10.3390/biom12091325] [Citation(s) in RCA: 7] [Impact Index Per Article: 2.3] [Received: 06/24/2022] [Revised: 09/14/2022] [Accepted: 09/15/2022] [Indexed: 11/22/2022] Open
Abstract
The main target of retrosynthesis is to recursively decompose desired molecules into available building blocks. Existing template-based retrosynthesis methods follow a template selection stereotype and suffer from limited training templates, which prevents them from discovering novel reactions. To overcome this limitation, we propose an innovative retrosynthesis prediction framework that can compose novel templates beyond training templates. As far as we know, this is the first method that uses machine learning to compose reaction templates for retrosynthesis prediction. Besides, we propose an effective reactant candidate scoring model that can capture atom-level transformations, which helps our method outperform previous methods on the USPTO-50K dataset. Experimental results show that our method can produce novel templates for 15 USPTO-50K test reactions that are not covered by training templates. We have released our source implementation.
12
Bruce‐Chwatt T, Naidoo KJ. Molecular mechanisms from reaction coordinate graph enabled multidimensional free energies illustrated on water dimer hydrogen bonding. J Comput Chem 2022; 43:1802-1813. [PMID: 36054751 PMCID: PMC9543413 DOI: 10.1002/jcc.26982] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/17/2022] [Revised: 06/10/2022] [Accepted: 07/22/2022] [Indexed: 11/11/2022]
Abstract
Computing the free energies of molecular mechanisms in multidimensional space relies on combinations of geometrically complex reaction coordinates. We show how a graph theory implementation reduces complexity, and illustrate this on the arrangements of hydrogen bonding of a water dimer. The reaction coordinates and forces are computed using graphs that define the dependencies on the atoms in the Free Energy from Adaptive Reaction Coordinate Forces (FEARCF) library. The library can be interfaced with classical molecular dynamics as well as quantum molecular dynamics packages. Multidimensional interdependent reaction coordinates are constructed to produce complex free energy hypersurfaces. The reaction coordinates are graphed from atomic and molecular components to define points, distances, vectors, angles, planes and combinations thereof. The resultant free energy surfaces that are a function of the distance, angles, planes, and so on, can represent molecular mechanisms in reduced dimensions from the component atomic Cartesian coordinate degrees of freedom. The FEARCF library can be interfaced with any molecular package. Here, we demonstrate the link to NWChem to compute a hyperdimensional DFT (aug‐cc‐pVDZ basis set and X3LYP exchange correlation functionals) free energy space of a water dimer. Analysis of the water dimer free energy hypervolume reveals that while the chain and cyclic hydrogen bonding configurations are located in stable minimum energy wells, the bifurcated hydrogen bond configuration is a gateway to instability and dimer dissociation.
Affiliation(s)
- Tomás Bruce‐Chwatt
  - Scientific Computing Research Unit, Department of Chemistry, University of Cape Town, Cape Town, South Africa
- Kevin J. Naidoo
  - Scientific Computing Research Unit, Department of Chemistry, University of Cape Town, Cape Town, South Africa
13
Kondinski A, Menon A, Nurkowski D, Farazi F, Mosbach S, Akroyd J, Kraft M. Automated Rational Design of Metal-Organic Polyhedra. J Am Chem Soc 2022; 144:11713-11728. [PMID: 35731954 PMCID: PMC9264355 DOI: 10.1021/jacs.2c03402] [Citation(s) in RCA: 13] [Impact Index Per Article: 4.3] [Indexed: 12/17/2022]
Abstract
Metal-organic polyhedra (MOPs) are hybrid organic-inorganic nanomolecules, whose rational design depends on harmonious consideration of chemical complementarity and spatial compatibility between two or more types of chemical building units (CBUs). In this work, we apply knowledge engineering technology to automate the derivation of MOP formulations based on existing knowledge. For this purpose we have (i) curated relevant MOP and CBU data; (ii) developed an assembly model concept that embeds rules in the MOP construction; (iii) developed an OntoMOPs ontology that defines MOPs and their key properties; (iv) input agents that populate The World Avatar (TWA) knowledge graph; and (v) input agents that, using information from TWA, derive a list of new constructible MOPs. Our result provides rapid and automated instantiation of MOPs in TWA and unveils the immediate chemical space of known MOPs, thus shedding light on new MOP targets for future investigations.
Affiliation(s)
- Aleksandar Kondinski
  - Department of Chemical Engineering and Biotechnology, University of Cambridge, Philippa Fawcett Drive, Cambridge CB3 0AS, U.K.
- Angiras Menon
  - Department of Chemical Engineering and Biotechnology, University of Cambridge, Philippa Fawcett Drive, Cambridge CB3 0AS, U.K.
- Daniel Nurkowski
  - CMCL Innovations, Sheraton House, Castle Park, Cambridge CB3 0AX, U.K.
- Feroz Farazi
  - Department of Chemical Engineering and Biotechnology, University of Cambridge, Philippa Fawcett Drive, Cambridge CB3 0AS, U.K.
- Sebastian Mosbach
  - Department of Chemical Engineering and Biotechnology, University of Cambridge, Philippa Fawcett Drive, Cambridge CB3 0AS, U.K.
- Jethro Akroyd
  - Department of Chemical Engineering and Biotechnology, University of Cambridge, Philippa Fawcett Drive, Cambridge CB3 0AS, U.K.
- Markus Kraft
  - Department of Chemical Engineering and Biotechnology, University of Cambridge, Philippa Fawcett Drive, Cambridge CB3 0AS, U.K.
  - CMCL Innovations, Sheraton House, Castle Park, Cambridge CB3 0AX, U.K.
  - CARES, Cambridge Centre for Advanced Research and Education in Singapore, 1 Create Way, CREATE Tower, #05-05, Singapore 138602
  - School of Chemical and Biomedical Engineering, Nanyang Technological University, 62 Nanyang Drive, Singapore 637459
  - The Alan Turing Institute, 2QR, John Dodson House, 96 Euston Road, London NW1 2DB, U.K.
14
Lustosa DM, Milo A. Mechanistic Inference from Statistical Models at Different Data-Size Regimes. ACS Catal 2022. [DOI: 10.1021/acscatal.2c01741] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/29/2022]
Affiliation(s)
- Danilo M. Lustosa
- Department of Chemistry, Ben-Gurion University of the Negev, Beer Sheva 84105, Israel
| | - Anat Milo
- Department of Chemistry, Ben-Gurion University of the Negev, Beer Sheva 84105, Israel
| |
|
15
|
Zheng S, Zeng T, Li C, Chen B, Coley CW, Yang Y, Wu R. Deep learning driven biosynthetic pathways navigation for natural products with BioNavi-NP. Nat Commun 2022; 13:3342. [PMID: 35688826 PMCID: PMC9187661 DOI: 10.1038/s41467-022-30970-9] [Citation(s) in RCA: 59] [Impact Index Per Article: 19.7] [Received: 10/07/2021] [Accepted: 05/27/2022] [Indexed: 12/30/2022] [Open Access]
Abstract
The complete biosynthetic pathways are unknown for most natural products (NPs); it is thus valuable to make computer-aided bio-retrosynthesis predictions. Here, a navigable and user-friendly toolkit, BioNavi-NP, is developed to predict the biosynthetic pathways for both NPs and NP-like compounds. First, a single-step bio-retrosynthesis prediction model is trained using both general organic and biosynthetic reactions through end-to-end transformer neural networks. Based on this model, plausible biosynthetic pathways can be efficiently sampled through an AND-OR tree-based planning algorithm from iterative multi-step bio-retrosynthetic routes. Extensive evaluations reveal that BioNavi-NP can identify biosynthetic pathways for 90.2% of 368 test compounds and recover the reported building blocks in the test set for 72.8%, which is 1.7 times more accurate than existing conventional rule-based approaches. The model is further shown to identify biologically plausible pathways for complex NPs collected from the recent literature. The toolkit as well as the curated datasets and learned models are freely available to facilitate the elucidation and reconstruction of the biosynthetic pathways for NPs. The complete biosynthetic pathways of most natural products (NPs) are unknown. Here, the authors report BioNavi-NP, a computational toolkit for bio-retrosynthetic pathway elucidation or reconstruction for both NPs and NP-like compounds.
Affiliation(s)
- Shuangjia Zheng
- School of Pharmaceutical Sciences, Sun Yat-sen University, Guangzhou, 510006, China
- School of Computer Science and Engineering, Sun Yat-sen University, Guangzhou, 510006, China
- Galixir, Beijing, China
| | - Tao Zeng
- School of Pharmaceutical Sciences, Sun Yat-sen University, Guangzhou, 510006, China
| | | | - Binghong Chen
- College of Computing, Georgia Institute of Technology, Atlanta, GA, USA
| | - Connor W Coley
- Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, MA, USA
| | - Yuedong Yang
- School of Computer Science and Engineering, Sun Yat-sen University, Guangzhou, 510006, China.
| | - Ruibo Wu
- School of Pharmaceutical Sciences, Sun Yat-sen University, Guangzhou, 510006, China.
| |
|
16
|
Abstract
The study aims to analyze the degree of similarity of some molecules belonging to two subgroups of Aminoalkylindoles. After the molecules’ characteristics were extracted using cheminformatics methods and the Tanimoto coefficients were computed, dendrograms and heatmaps were built to reveal the degree of similarity of the analyzed drugs. Some atom-pair similarities between the molecules in the same group were detected. The clusters determined by the k-means method divided the Benzoylindoles into two subgroups but kept all the Phenylacetylindoles together in the same set. The activity spectrum of the elements in each group was also analyzed, and similarities have been emphasized. The clustering has been validated using the Kruskal–Wallis test on the series of computed probabilities of the main effects.
|
17
|
Abstract
There is significant interest in developing robust machine learning models to assist organic chemistry synthesis. Typically, task-specific machine learning models for distinct reaction prediction tasks have been developed. In this work, we develop a unified deep learning model, T5Chem, for a variety of chemical reaction prediction tasks by adapting the "Text-to-Text Transfer Transformer" (T5) framework in natural language processing (NLP). On the basis of self-supervised pretraining with PubChem molecules, the T5Chem model can achieve state-of-the-art performances for four distinct types of task-specific reaction prediction tasks using four different open-source data sets, including reaction type classification on USPTO_TPL, forward reaction prediction on USPTO_MIT, single-step retrosynthesis on USPTO_50k, and reaction yield prediction on high-throughput C-N coupling reactions. Meanwhile, we introduced a new unified multitask reaction prediction data set USPTO_500_MT, which can be used to train and test five different types of reaction tasks, including the above four as well as a new reagent suggestion task. Our results showed that models trained with multiple tasks are more robust and can benefit from mutual learning on related tasks. Furthermore, we demonstrated the use of SHAP (SHapley Additive exPlanations) to explain T5Chem predictions at the functional group level, which provides a way to demystify sequence-based deep learning models in chemistry. T5Chem is accessible through https://yzhang.hpc.nyu.edu/T5Chem.
Affiliation(s)
- Jieyu Lu
- Department of Chemistry, New York University, New York, New York 10003, United States
| | - Yingkai Zhang
- Department of Chemistry, New York University, New York, New York 10003, United States
- NYU-ECNU Center for Computational Chemistry at NYU Shanghai, Shanghai 200062, China
| |
|
18
|
Lin MH, Tu Z, Coley CW. Improving the performance of models for one-step retrosynthesis through re-ranking. J Cheminform 2022; 14:15. [PMID: 35292121 PMCID: PMC8922884 DOI: 10.1186/s13321-022-00594-8] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 01/12/2022] [Accepted: 02/26/2022] [Indexed: 12/03/2022] [Open Access]
Abstract
Retrosynthesis is at the core of organic chemistry. Recently, the rapid growth of artificial intelligence (AI) has spurred a variety of novel machine learning approaches for data-driven synthesis planning. These methods learn complex patterns from reaction databases in order to predict, for a given product, sets of reactants that can be used to synthesise that product. However, their performance as measured by the top-N accuracy in matching published reaction precedents still leaves room for improvement. This work aims to enhance these models by learning to re-rank their reactant predictions. Specifically, we design and train an energy-based model to re-rank, for each product, the published reaction as the top suggestion and the remaining reactant predictions as lower-ranked. We show that re-ranking can improve one-step models significantly using the standard USPTO-50k benchmark dataset, such as RetroSim, a similarity-based method, from 35.7 to 51.8% top-1 accuracy and NeuralSym, a deep learning method, from 45.7 to 51.3%, and also that re-ranking the union of two models’ suggestions can lead to better performance than either alone. However, the state-of-the-art top-1 accuracy is not improved by this method.
Supplementary Information: The online version contains supplementary material available at 10.1186/s13321-022-00594-8.
Affiliation(s)
- Min Htoo Lin
- Division of Chemistry and Biological Chemistry, School of Physical and Mathematical Sciences, Nanyang Technological University, Singapore, 637371, Singapore
| | - Zhengkai Tu
- Computational Science and Engineering, Massachusetts Institute of Technology, 77 Massachusetts Avenue, Cambridge, MA, 02139, USA
| | - Connor W Coley
- Department of Chemical Engineering, Massachusetts Institute of Technology, 77 Massachusetts Avenue, Cambridge, MA, 02139, USA.
| |
|
19
|
Ucak UV, Ashyrmamatov I, Ko J, Lee J. Retrosynthetic reaction pathway prediction through neural machine translation of atomic environments. Nat Commun 2022; 13:1186. [PMID: 35246540 PMCID: PMC8897428 DOI: 10.1038/s41467-022-28857-w] [Citation(s) in RCA: 30] [Impact Index Per Article: 10.0] [Received: 08/26/2021] [Accepted: 02/10/2022] [Indexed: 11/09/2022] [Open Access]
Abstract
Designing efficient synthetic routes for a target molecule remains a major challenge in organic synthesis. Atom environments are ideal, stand-alone, chemically meaningful building blocks providing a high-resolution molecular representation. Our approach mimics chemical reasoning, and predicts reactant candidates by learning the changes of atom environments associated with the chemical reaction. Through careful inspection of reactant candidates, we demonstrate atom environments as promising descriptors for studying reaction route prediction and discovery. Here, we present a new single-step retrosynthesis prediction method, RetroTRAE, which is free from all SMILES-based translation issues and yields a top-1 accuracy of 58.3% on the USPTO test dataset. Its top-1 accuracy reaches 61.6% with the inclusion of highly similar analogs, outperforming other state-of-the-art neural machine translation-based methods. Our methodology introduces a novel scheme for fragmental and topological descriptors to be used as natural inputs for retrosynthetic prediction tasks.
Affiliation(s)
- Umit V Ucak
- Department of Chemistry, Division of Chemistry and Biochemistry, Kangwon National University, Chuncheon, 24341, Republic of Korea
| | - Islambek Ashyrmamatov
- Department of Chemistry, Division of Chemistry and Biochemistry, Kangwon National University, Chuncheon, 24341, Republic of Korea
| | - Junsu Ko
- Arontier co., Seoul, Republic of Korea
| | - Juyong Lee
- Department of Chemistry, Division of Chemistry and Biochemistry, Kangwon National University, Chuncheon, 24341, Republic of Korea.
- Arontier co., Seoul, Republic of Korea.
| |
|
20
|
Bai J, Cao L, Mosbach S, Akroyd J, Lapkin AA, Kraft M. From Platform to Knowledge Graph: Evolution of Laboratory Automation. JACS AU 2022; 2:292-309. [PMID: 35252980 PMCID: PMC8889618 DOI: 10.1021/jacsau.1c00438] [Citation(s) in RCA: 21] [Impact Index Per Article: 7.0] [Received: 10/05/2021] [Indexed: 05/19/2023]
Abstract
High-fidelity computer-aided experimentation is becoming more accessible with the development of computing power and artificial intelligence tools. The advancement of experimental hardware also empowers researchers to reach a level of accuracy that was not possible in the past. Marching toward the next generation of self-driving laboratories, the orchestration of both resources lies at the focal point of autonomous discovery in chemical science. To achieve such a goal, algorithmically accessible data representations and standardized communication protocols are indispensable. In this perspective, we recategorize the recently introduced approach based on Materials Acceleration Platforms into five functional components and discuss recent case studies that focus on the data representation and exchange scheme between different components. Emerging technologies for interoperable data representation and multi-agent systems are also discussed with their recent applications in chemical automation. We hypothesize that knowledge graph technology, orchestrating semantic web technologies and multi-agent systems, will be the driving force to bring data to knowledge, evolving our way of automating the laboratory.
Affiliation(s)
- Jiaru Bai
- Department of Chemical Engineering and Biotechnology, University of Cambridge, Philippa Fawcett Drive, Cambridge CB3 0AS, United Kingdom
| | - Liwei Cao
- Department of Chemical Engineering and Biotechnology, University of Cambridge, Philippa Fawcett Drive, Cambridge CB3 0AS, United Kingdom
| | - Sebastian Mosbach
- Department of Chemical Engineering and Biotechnology, University of Cambridge, Philippa Fawcett Drive, Cambridge CB3 0AS, United Kingdom
- Cambridge Centre for Advanced Research and Education in Singapore (CARES), CREATE Tower #05-05, 1 Create Way, 138602 Singapore
| | - Jethro Akroyd
- Department of Chemical Engineering and Biotechnology, University of Cambridge, Philippa Fawcett Drive, Cambridge CB3 0AS, United Kingdom
- Cambridge Centre for Advanced Research and Education in Singapore (CARES), CREATE Tower #05-05, 1 Create Way, 138602 Singapore
| | - Alexei A. Lapkin
- Department of Chemical Engineering and Biotechnology, University of Cambridge, Philippa Fawcett Drive, Cambridge CB3 0AS, United Kingdom
- Cambridge Centre for Advanced Research and Education in Singapore (CARES), CREATE Tower #05-05, 1 Create Way, 138602 Singapore
| | - Markus Kraft
- Department of Chemical Engineering and Biotechnology, University of Cambridge, Philippa Fawcett Drive, Cambridge CB3 0AS, United Kingdom
- Cambridge Centre for Advanced Research and Education in Singapore (CARES), CREATE Tower #05-05, 1 Create Way, 138602 Singapore
- School of Chemical and Biomedical Engineering, Nanyang Technological University, 62 Nanyang Drive, 637459 Singapore
- The Alan Turing Institute, London NW1 2DB, United Kingdom
| |
|
21
|
Batchu SP, Hernandez Blazquez B, Malhotra A, Fang H, Ierapetritou M, Vlachos D. Accelerating Manufacturing for Biomass Conversion via Integrated Process and Bench Digitalization: A Perspective. REACT CHEM ENG 2022. [DOI: 10.1039/d1re00560j] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Indexed: 11/21/2022]
Abstract
We present a perspective for accelerating biomass manufacturing via digitalization. We summarize the challenges for manufacturing and identify areas where digitalization can help. A profound potential in using lignocellulosic biomass...
|
22
|
Afonina VA, Mazitov DA, Nurmukhametova A, Shevelev MD, Khasanova DA, Nugmanov RI, Burilov VA, Madzhidov TI, Varnek A. Prediction of Optimal Conditions of Hydrogenation Reaction Using the Likelihood Ranking Approach. Int J Mol Sci 2021; 23:ijms23010248. [PMID: 35008674 PMCID: PMC8745269 DOI: 10.3390/ijms23010248] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.3] [Received: 11/23/2021] [Revised: 12/18/2021] [Accepted: 12/23/2021] [Indexed: 11/20/2022] [Open Access]
Abstract
The selection of experimental conditions leading to a reasonable yield is an essential element of the automated development of a synthesis plan and the subsequent synthesis of the target compound. The classical QSPR approach, requiring one-to-one correspondence between chemical structure and a target property, can be used for optimal reaction conditions prediction only on a limited scale when only one condition component (e.g., catalyst or solvent) is considered. However, a particular reaction can proceed under several different conditions. In this paper, we describe the Likelihood Ranking Model, an artificial neural network that outputs a list of different conditions ranked according to their suitability to a given chemical transformation. Benchmarking calculations demonstrated that our model outperformed some popular approaches to the theoretical assessment of reaction conditions, such as k Nearest Neighbors, and a recurrent artificial neural network performance prediction of condition components (reagents, solvents, catalysts, and temperature). The ability of the Likelihood Ranking Model, trained on a hydrogenation reactions dataset (~42,000 reactions) from the Reaxys® database, to propose conditions that led to the desired product was validated experimentally on a set of three reactions with rich selectivity issues.
Affiliation(s)
- Valentina A. Afonina
- Chemoinformatics and Molecular Modelling Lab, A.M. Butlerov Institute of Chemistry, Kazan Federal University, Kremlyovskaya Str. 18, 420008 Kazan, Russia
| | - Daniyar A. Mazitov
- Chemoinformatics and Molecular Modelling Lab, A.M. Butlerov Institute of Chemistry, Kazan Federal University, Kremlyovskaya Str. 18, 420008 Kazan, Russia
| | - Albina Nurmukhametova
- Chemoinformatics and Molecular Modelling Lab, A.M. Butlerov Institute of Chemistry, Kazan Federal University, Kremlyovskaya Str. 18, 420008 Kazan, Russia
| | - Maxim D. Shevelev
- Chemoinformatics and Molecular Modelling Lab, A.M. Butlerov Institute of Chemistry, Kazan Federal University, Kremlyovskaya Str. 18, 420008 Kazan, Russia
- Laboratory of Chemoinformatics (UMR 7140 CNRS/UniStra), Université de Strasbourg, 4, Rue Blaise Pascal, 67000 Strasbourg, France
| | - Dina A. Khasanova
- Chemoinformatics and Molecular Modelling Lab, A.M. Butlerov Institute of Chemistry, Kazan Federal University, Kremlyovskaya Str. 18, 420008 Kazan, Russia
| | - Ramil I. Nugmanov
- Chemoinformatics and Molecular Modelling Lab, A.M. Butlerov Institute of Chemistry, Kazan Federal University, Kremlyovskaya Str. 18, 420008 Kazan, Russia
| | - Vladimir A. Burilov
- Chemoinformatics and Molecular Modelling Lab, A.M. Butlerov Institute of Chemistry, Kazan Federal University, Kremlyovskaya Str. 18, 420008 Kazan, Russia
| | - Timur I. Madzhidov
- Chemoinformatics and Molecular Modelling Lab, A.M. Butlerov Institute of Chemistry, Kazan Federal University, Kremlyovskaya Str. 18, 420008 Kazan, Russia
- Correspondence: (T.I.M.); (A.V.)
| | - Alexandre Varnek
- Laboratory of Chemoinformatics (UMR 7140 CNRS/UniStra), Université de Strasbourg, 4, Rue Blaise Pascal, 67000 Strasbourg, France
- Institute for Chemical Reaction Design and Discovery (WPI-ICReDD), Hokkaido University, Kita 21 Nishi 10, Kita-ku, Sapporo 001-0021, Japan
- Correspondence: (T.I.M.); (A.V.)
| |
|
23
|
Mann V, Venkatasubramanian V. Retrosynthesis prediction using grammar-based neural machine translation: An information-theoretic approach. Comput Chem Eng 2021. [DOI: 10.1016/j.compchemeng.2021.107533] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Indexed: 11/24/2022]
|
24
|
Muller C, Rabal O, Diaz Gonzalez C. Artificial Intelligence, Machine Learning, and Deep Learning in Real-Life Drug Design Cases. METHODS IN MOLECULAR BIOLOGY (CLIFTON, N.J.) 2021; 2390:383-407. [PMID: 34731478 DOI: 10.1007/978-1-0716-1787-8_16] [Citation(s) in RCA: 11] [Impact Index Per Article: 2.8] [Indexed: 02/06/2023]
Abstract
The discovery and development of drugs is a long and expensive process with a high attrition rate. Computational drug discovery contributes to ligand discovery and optimization, by using models that describe the properties of ligands and their interactions with biological targets. In recent years, artificial intelligence (AI) has made remarkable modeling progress, driven by new algorithms and by the increase in computing power and storage capacities, which allow the processing of large amounts of data in a short time. This review provides the current state of the art of AI methods applied to drug discovery, with a focus on structure- and ligand-based virtual screening, library design and high-throughput analysis, drug repurposing and drug sensitivity, de novo design, chemical reactions and synthetic accessibility, ADMET, and quantum mechanics.
Affiliation(s)
- Christophe Muller
- Evotec (France) SAS, Computational Drug Discovery, Integrated Drug Discovery, Toulouse, France
| | - Obdulia Rabal
- Evotec (France) SAS, Computational Drug Discovery, Integrated Drug Discovery, Toulouse, France
| | | |
|
25
|
Wang Z, Zhang W, Liu B. Computational Analysis of Synthetic Planning: Past and Future. CHINESE J CHEM 2021. [DOI: 10.1002/cjoc.202100273] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Indexed: 11/09/2022]
Affiliation(s)
- Zhuang Wang
- Key Laboratory of Green Chemistry & Technology of Ministry of Education, College of Chemistry, Sichuan University, 29 Wangjiang Rd., Chengdu, Sichuan 610064, China
- Center for Molecular Discovery, Department of Chemistry, Boston University, 590 Commonwealth Ave., Boston, Massachusetts 02215, United States
- Current address: One Amgen Center Dr., Amgen Inc., Thousand Oaks, California 91320, United States
| | - Wenhan Zhang
- Key Laboratory of Green Chemistry & Technology of Ministry of Education, College of Chemistry, Sichuan University, 29 Wangjiang Rd., Chengdu, Sichuan 610064, China
- Center for Molecular Discovery, Department of Chemistry, Boston University, 590 Commonwealth Ave., Boston, Massachusetts 02215, United States
- Current address: One Amgen Center Dr., Amgen Inc., Thousand Oaks, California 91320, United States
| | - Bo Liu
- Key Laboratory of Green Chemistry & Technology of Ministry of Education, College of Chemistry, Sichuan University, 29 Wangjiang Rd., Chengdu, Sichuan 610064, China
- Center for Molecular Discovery, Department of Chemistry, Boston University, 590 Commonwealth Ave., Boston, Massachusetts 02215, United States
- Current address: One Amgen Center Dr., Amgen Inc., Thousand Oaks, California 91320, United States
| |
|
26
|
Weber JM, Guo Z, Zhang C, Schweidtmann AM, Lapkin AA. Chemical data intelligence for sustainable chemistry. Chem Soc Rev 2021; 50:12013-12036. [PMID: 34520507 DOI: 10.1039/d1cs00477h] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.0] [Indexed: 02/05/2023]
Abstract
This study highlights new opportunities for optimal reaction route selection from large chemical databases brought about by the rapid digitalisation of chemical data. The chemical industry requires a transformation towards more sustainable practices, eliminating its dependencies on fossil fuels and limiting its impact on the environment. However, identifying more sustainable process alternatives is, at present, a cumbersome, manual, iterative process, based on chemical intuition and modelling. We give a perspective on methods for automated discovery and assessment of competitive sustainable reaction routes based on renewable or waste feedstocks. Three key areas of transition are outlined and reviewed based on their state-of-the-art as well as bottlenecks: (i) data, (ii) evaluation metrics, and (iii) decision-making. We elucidate their synergies and interfaces since only together these areas can bring about the most benefit. The field of chemical data intelligence offers the opportunity to identify the inherently more sustainable reaction pathways and to identify opportunities for a circular chemical economy. Our review shows that at present the field of data brings about most bottlenecks, such as data completion and data linkage, but also offers the principal opportunity for advancement.
Affiliation(s)
- Jana M Weber
- Department of Chemical Engineering and Biotechnology, University of Cambridge, West Cambridge Site, Philippa Fawcett Drive, Cambridge CB3 0AS, UK
- Chemical Data Intelligence (CDI) Pte Ltd, Robinson Road, #02-00, 068898, Singapore
| | - Zhen Guo
- Chemical Data Intelligence (CDI) Pte Ltd, Robinson Road, #02-00, 068898, Singapore
- Cambridge Centre for Advanced Research and Education in Singapore, CARES Ltd. 1 CREATE Way, CREATE Tower #05-05, 138602, Singapore
| | - Chonghuan Zhang
- Department of Chemical Engineering and Biotechnology, University of Cambridge, West Cambridge Site, Philippa Fawcett Drive, Cambridge CB3 0AS, UK
| | - Artur M Schweidtmann
- Department of Chemical Engineering, Delft University of Technology, Van der Maasweg 9, Delft 2629 HZ, The Netherlands
| | - Alexei A Lapkin
- Department of Chemical Engineering and Biotechnology, University of Cambridge, West Cambridge Site, Philippa Fawcett Drive, Cambridge CB3 0AS, UK
- Chemical Data Intelligence (CDI) Pte Ltd, Robinson Road, #02-00, 068898, Singapore
- Cambridge Centre for Advanced Research and Education in Singapore, CARES Ltd. 1 CREATE Way, CREATE Tower #05-05, 138602, Singapore
| |
|
27
|
Jia P, Pei J, Wang G, Pan X, Zhu Y, Wu Y, Ouyang L. The roles of computer-aided drug synthesis in drug development. GREEN SYNTHESIS AND CATALYSIS 2021. [DOI: 10.1016/j.gresc.2021.11.007] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Indexed: 02/08/2023] [Open Access]
|
28
|
Hammer AS, Leonov AI, Bell NL, Cronin L. Chemputation and the Standardization of Chemical Informatics. JACS AU 2021; 1:1572-1587. [PMID: 34723260 PMCID: PMC8549037 DOI: 10.1021/jacsau.1c00303] [Citation(s) in RCA: 22] [Impact Index Per Article: 5.5] [Received: 07/06/2021] [Indexed: 05/11/2023]
Abstract
The explosion in the use of machine learning for automated chemical reaction optimization is gathering pace. However, the lack of a standard architecture that connects the concept of chemical transformations universally to software and hardware provides a barrier to using the results of these optimizations and could cause the loss of relevant data and prevent reactions from being reproducible or unexpected findings verifiable or explainable. In this Perspective, we describe how the development of the field of digital chemistry or chemputation, that is, the universal code-enabled control of chemical reactions using a standard language and ontology, will remove these barriers, allowing users to focus on the chemistry and plug in algorithms according to the problem space to be explored or unit function to be optimized. We describe a standard hardware (the chemical processing programming architecture, or ChemPU) to encompass all chemical synthesis, an approach which unifies all chemistry automation strategies, from solid-phase peptide synthesis to HTE flow chemistry platforms, while at the same time establishing a publication standard so that researchers can exchange chemical code (χDL) to ensure reproducibility and interoperability. Not only can a vast range of different chemistries be plugged into the hardware, but the ever-expanding developments in software and algorithms can also be accommodated. These technologies, when combined, will allow chemistry, or chemputation, to follow computation, that is, the running of code across many different types of capable hardware to get the same result every time with a low error rate.
|
29
|
Dong J, Zhao M, Liu Y, Su Y, Zeng X. Deep learning in retrosynthesis planning: datasets, models and tools. Brief Bioinform 2021; 23:6375056. [PMID: 34571535 DOI: 10.1093/bib/bbab391] [Citation(s) in RCA: 41] [Impact Index Per Article: 10.3] [Received: 07/06/2021] [Revised: 08/16/2021] [Accepted: 08/30/2021] [Indexed: 12/29/2022] [Open Access]
Abstract
In recent years, synthesizing drugs powered by artificial intelligence has brought great convenience to society. Since retrosynthetic analysis occupies an essential position in synthetic chemistry, it has received broad attention from researchers. In this review, we comprehensively summarize the development process of retrosynthesis in the context of deep learning. This review covers all aspects of retrosynthesis, including datasets, models and tools. Specifically, we report representative models from academia, in addition to a detailed description of the available and stable platforms in the industry. We also discuss the disadvantages of the existing models and provide potential future trends, so that more abecedarians will quickly understand and participate in the family of retrosynthesis planning.
Affiliation(s)
- Jingxin Dong
- College of Information Science and Engineering, Hunan University, 2 Lushan S Rd, Yuelu District, 410086, Hunan, China
| | - Mingyi Zhao
- Department of Pediatrics, Third Xiangya Hospital, Central South University, 400013, Hunan, China
| | - Yuansheng Liu
- College of Information Science and Engineering, Hunan University, 2 Lushan S Rd, Yuelu District, 410086, Hunan, China
| | - Yansen Su
- Key Laboratory of Intelligent Computing and Signal Processing of Ministry of Education, School of Computer Science and Technology, Anhui University, 230601, Hefei, China
| | - Xiangxiang Zeng
- College of Information Science and Engineering, Hunan University, 2 Lushan S Rd, Yuelu District, 410086, Hunan, China
| |
|
30
|
Machine Learning in Chemical Product Engineering: The State of the Art and a Guide for Newcomers. Processes (Basel) 2021. [DOI: 10.3390/pr9081456] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.5] [Indexed: 02/06/2023] [Open Access]
Abstract
Chemical Product Engineering (CPE) is marked by numerous challenges, such as the complexity of the properties–structure–ingredients–process relationship of the different products and the necessity to discover and develop constantly and quickly new molecules and materials with tailor-made properties. In recent years, artificial intelligence (AI) and machine learning (ML) methods have gained increasing attention due to their performance in tackling particularly complex problems in various areas, such as computer vision and natural language processing. As such, they present a specific interest in addressing the complex challenges of CPE. This article provides an updated review of the state of the art regarding the implementation of ML techniques in different types of CPE problems with a particular focus on four specific domains, namely the design and discovery of new molecules and materials, the modeling of processes, the prediction of chemical reactions/retrosynthesis and the support for sensorial analysis. This review is further completed by general guidelines for the selection of an appropriate ML technique given the characteristics of each problem and by a critical discussion of several key issues associated with the development of ML modeling approaches. Accordingly, this paper may serve both the experienced researcher in the field as well as the newcomer.
|
31
|
Kim J, Kim Y, Lee EK, Chae CH, Lee K, Kim WJ, Choi IS. Rotational Variance-Based Data Augmentation in 3D Graph Convolutional Network. Chem Asian J 2021; 16:2610-2613. [PMID: 34369653 DOI: 10.1002/asia.202100789] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.3] [Received: 07/12/2021] [Revised: 07/30/2021] [Indexed: 01/17/2023]
Abstract
This work proposes data augmentation by molecular rotation, on the premise that protein-ligand binding events are rotation-variant. As a proof of concept, known active (i.e., 1-labeled) ligands to human β-secretase 1 (BACE-1) are rotated to generate 0-labeled data, and the rotation-dependent prediction accuracy of a 3D graph convolutional network (3DGCN) is investigated after data augmentation. The data augmentation significantly improves the orientation-recognizing ability of 3DGCN in the classification task for BACE-1/ligand binding. Furthermore, through its improved orientation recognition, the data-augmented 3DGCN is capable of predicting active ligands from a candidate dataset, which could be applied to virtual drug screening and discovery.
Affiliation(s)
- Jihoo Kim
  - Department of Chemistry, KAIST, Daejeon, 34141, Korea
- Yeji Kim
  - Department of Chemistry, KAIST, Daejeon, 34141, Korea
- Eok Kyun Lee
  - Department of Chemistry, KAIST, Daejeon, 34141, Korea
- Chong Hak Chae
  - Data Convergence Drug Research Center, Korea Research Institute of Chemical Technology, Daejeon, 34114, Korea
- Kwangho Lee
  - Data Convergence Drug Research Center, Korea Research Institute of Chemical Technology, Daejeon, 34114, Korea
- Won June Kim
  - Department of Biology and Chemistry, Changwon National University, Changwon, 51140, Korea
- Insung S Choi
  - Department of Chemistry, KAIST, Daejeon, 34141, Korea
32
Zhumagambetov R, Molnár F, Peshkov VA, Fazli S. Transmol: repurposing a language model for molecular generation. RSC Adv 2021; 11:25921-25932. [PMID: 35479483 PMCID: PMC9037129 DOI: 10.1039/d1ra03086h] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.5] [Received: 04/20/2021] [Accepted: 07/22/2021] [Indexed: 12/29/2022] Open
Abstract
Recent advances in convolutional neural networks have inspired the application of deep learning to other disciplines. Even though image processing and natural language processing have turned out to be the most successful, there are many other domains that have also benefited; among them, life sciences in general and chemistry and drug design in particular. In concordance with this observation, from 2018 the scientific community has seen a surge of methodologies related to the generation of diverse molecular libraries using machine learning. However to date, attention mechanisms have not been employed for the problem of de novo molecular generation. Here we employ a variant of transformers, an architecture recently developed for natural language processing, for this purpose. Our results indicate that the adapted Transmol model is indeed applicable for the task of generating molecular libraries and leads to statistically significant increases in some of the core metrics of the MOSES benchmark. The presented model can be tuned to either input-guided or diversity-driven generation modes by applying a standard one-seed and a novel two-seed approach, respectively. Accordingly, the one-seed approach is best suited for the targeted generation of focused libraries composed of close analogues of the seed structure, while the two-seeds approach allows us to dive deeper into under-explored regions of the chemical space by attempting to generate the molecules that resemble both seeds. To gain more insights about the scope of the one-seed approach, we devised a new validation workflow that involves the recreation of known ligands for an important biological target vitamin D receptor. To further benefit the chemical community, the Transmol algorithm has been incorporated into our cheML.io web database of ML-generated molecules as a second generation on-demand methodology.
Affiliation(s)
- Rustam Zhumagambetov
  - Department of Computer Science, School of Engineering and Digital Sciences, Nazarbayev University, Nur-Sultan, Kazakhstan
- Ferdinand Molnár
  - Department of Biology, School of Sciences and Humanities, Nazarbayev University, Nur-Sultan, Kazakhstan
- Vsevolod A Peshkov
  - Department of Chemistry, School of Sciences and Humanities, Nazarbayev University, Nur-Sultan, Kazakhstan
- Siamac Fazli
  - Department of Computer Science, School of Engineering and Digital Sciences, Nazarbayev University, Nur-Sultan, Kazakhstan
33
Sacha M, Błaż M, Byrski P, Dąbrowski-Tumański P, Chromiński M, Loska R, Włodarczyk-Pruszyński P, Jastrzębski S. Molecule Edit Graph Attention Network: Modeling Chemical Reactions as Sequences of Graph Edits. J Chem Inf Model 2021; 61:3273-3284. [PMID: 34251814 DOI: 10.1021/acs.jcim.1c00537] [Citation(s) in RCA: 47] [Impact Index Per Article: 11.8] [Indexed: 12/13/2022]
Abstract
The central challenge in automated synthesis planning is to be able to generate and predict outcomes of a diverse set of chemical reactions. In particular, in many cases, the most likely synthesis pathway cannot be applied due to additional constraints, which requires proposing alternative chemical reactions. With this in mind, we present Molecule Edit Graph Attention Network (MEGAN), an end-to-end encoder-decoder neural model. MEGAN is inspired by models that express a chemical reaction as a sequence of graph edits, akin to the arrow pushing formalism. We extend this model to retrosynthesis prediction (predicting substrates given the product of a chemical reaction) and scale it up to large data sets. We argue that representing the reaction as a sequence of edits enables MEGAN to efficiently explore the space of plausible chemical reactions, maintaining the flexibility of modeling the reaction in an end-to-end fashion and achieving state-of-the-art accuracy in standard benchmarks. Code and trained models are made available online at https://github.com/molecule-one/megan.
Affiliation(s)
- Paweł Dąbrowski-Tumański
  - Molecule One, Warsaw 00-815, Poland; Faculty of Mathematics and Natural Sciences, School of Exact Sciences, Cardinal Stefan Wyszynski University, Warsaw 01-815, Poland
- Rafał Loska
  - Institute of Organic Chemistry, Polish Academy of Sciences, Warsaw 01-224, Poland
34
Andersen JL, Fagerberg R, Flamm C, Fontana W, Kolčák J, Laurent CVFP, Merkle D, Nøjgaard N. Graph transformation for enzymatic mechanisms. Bioinformatics 2021; 37:i392-i400. [PMID: 34252947 PMCID: PMC8686676 DOI: 10.1093/bioinformatics/btab296] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Accepted: 04/26/2021] [Indexed: 11/15/2022] Open
Abstract
MOTIVATION: The design of enzymes is as challenging as it is consequential for making chemical synthesis in medical and industrial applications more efficient, cost-effective and environmentally friendly. While several aspects of this complex problem are computationally assisted, the drafting of catalytic mechanisms, i.e. the specification of the chemical steps (and hence intermediate states) that the enzyme is meant to implement, is largely left to human expertise. The ability to capture specific chemistries of multistep catalysis in a fashion that enables its computational construction and design is therefore highly desirable and would equally impact the elucidation of existing enzymatic reactions whose mechanisms are unknown. RESULTS: We use the mathematical framework of graph transformation to express the distinction between rules and reactions in chemistry. We derive about 1000 rules for amino acid side chain chemistry from the M-CSA database, a curated repository of enzymatic mechanisms. Using graph transformation, we are able to propose hundreds of hypothetical catalytic mechanisms for a large number of unrelated reactions in the Rhea database. We analyze these mechanisms to find that they combine in chemically sound fashion individual steps from a variety of known multistep mechanisms, showing that plausible novel mechanisms for catalysis can be constructed computationally. AVAILABILITY AND IMPLEMENTATION: The source code of the initial prototype of our approach is available at https://github.com/Nojgaard/mechsearch. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Jakob L Andersen
  - Department of Mathematics and Computer Science, University of Southern Denmark, Odense, Denmark
- Rolf Fagerberg
  - Department of Mathematics and Computer Science, University of Southern Denmark, Odense, Denmark
- Christoph Flamm
  - Department of Theoretical Chemistry, University of Vienna, Vienna, Austria
- Walter Fontana
  - Department of Systems Biology, Harvard Medical School, Boston, MA, USA
- Juri Kolčák
  - Department of Mathematics and Computer Science, University of Southern Denmark, Odense, Denmark
- Daniel Merkle
  - Department of Mathematics and Computer Science, University of Southern Denmark, Odense, Denmark
- Nikolai Nøjgaard
  - Department of Mathematics and Computer Science, University of Southern Denmark, Odense, Denmark
35
Applications of artificial intelligence to drug design and discovery in the big data era: a comprehensive review. Mol Divers 2021; 25:1643-1664. [PMID: 34110579 DOI: 10.1007/s11030-021-10237-z] [Citation(s) in RCA: 15] [Impact Index Per Article: 3.8] [Received: 03/18/2021] [Accepted: 05/26/2021] [Indexed: 10/21/2022]
Abstract
Artificial intelligence (AI) renders cutting-edge applications in diverse sectors of society. Due to substantial progress in high-performance computing, the development of superior algorithms, and the accumulation of huge biological and chemical data, computer-assisted drug design technology is playing a key role in drug discovery with its advantages of high efficiency, fast speed, and low cost. Over recent years, due to continuous progress in machine learning (ML) algorithms, AI has been extensively employed in various drug discovery stages. Very recently, drug design and discovery have entered the big data era. ML algorithms have progressively developed into a deep learning technique with potent generalization capability and more effectual big data handling, which further promotes the integration of AI technology and computer-assisted drug discovery technology, hence accelerating the design and discovery of the newest drugs. This review mainly summarizes the application progression of AI technology in the drug discovery process, and explores and compares its advantages over conventional methods. The challenges and limitations of AI in drug design and discovery have also been discussed.
36
Jiang J, Liu LP, Hassoun S. Learning graph representations of biochemical networks and its application to enzymatic link prediction. Bioinformatics 2021; 37:793-799. [PMID: 33051674 PMCID: PMC8097755 DOI: 10.1093/bioinformatics/btaa881] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Received: 02/25/2020] [Revised: 08/01/2020] [Accepted: 09/29/2020] [Indexed: 11/20/2022] Open
Abstract
Motivation: The complete characterization of enzymatic activities between molecules remains incomplete, hindering biological engineering and limiting biological discovery. We develop in this work a technique, enzymatic link prediction (ELP), for predicting the likelihood of an enzymatic transformation between two molecules. ELP models enzymatic reactions cataloged in the KEGG database as a graph. ELP is innovative over prior works in using graph embedding to learn molecular representations that capture not only molecular and enzymatic attributes but also graph connectivity. Results: We explore transductive (test nodes included in the training graph) and inductive (test nodes not part of the training graph) learning models. We show that ELP achieves high AUC when learning node embeddings using both graph connectivity and node attributes. Further, we show that graph embedding improves link prediction by 30% in area under curve over fingerprint-based similarity approaches and by 8% over support vector machines. We compare ELP against rule-based methods. We also evaluate ELP for predicting links in pathway maps and for reconstruction of edges in reaction networks of four common gut microbiota phyla: actinobacteria, bacteroidetes, firmicutes and proteobacteria. To emphasize the importance of graph embedding in the context of biochemical networks, we illustrate how graph embedding can guide visualization. Availability and implementation: The code and datasets are available through https://github.com/HassounLab/ELP.
Affiliation(s)
- Julie Jiang
  - Department of Computer Science, Tufts University, Medford 02155, USA
- Li-Ping Liu
  - Department of Computer Science, Tufts University, Medford 02155, USA
- Soha Hassoun
  - Department of Computer Science, Tufts University, Medford 02155, USA; Department of Chemical and Biological Engineering, Tufts University, Medford 02155, USA
37
Li P, Wang J, Qiao Y, Chen H, Yu Y, Yao X, Gao P, Xie G, Song S. An effective self-supervised framework for learning expressive molecular global representations to drug discovery. Brief Bioinform 2021; 22:6262238. [PMID: 33940598 DOI: 10.1093/bib/bbab109] [Citation(s) in RCA: 52] [Impact Index Per Article: 13.0] [Received: 02/10/2021] [Revised: 03/06/2021] [Accepted: 03/12/2021] [Indexed: 11/13/2022] Open
Abstract
How to produce expressive molecular representations is a fundamental challenge in artificial intelligence-driven drug discovery. Graph neural network (GNN) has emerged as a powerful technique for modeling molecular data. However, previous supervised approaches usually suffer from the scarcity of labeled data and poor generalization capability. Here, we propose a novel molecular pre-training graph-based deep learning framework, named MPG, that learns molecular representations from large-scale unlabeled molecules. In MPG, we proposed a powerful GNN for modelling molecular graph named MolGNet, and designed an effective self-supervised strategy for pre-training the model at both the node and graph-level. After pre-training on 11 million unlabeled molecules, we revealed that MolGNet can capture valuable chemical insights to produce interpretable representation. The pre-trained MolGNet can be fine-tuned with just one additional output layer to create state-of-the-art models for a wide range of drug discovery tasks, including molecular properties prediction, drug-drug interaction and drug-target interaction, on 14 benchmark datasets. The pre-trained MolGNet in MPG has the potential to become an advanced molecular encoder in the drug discovery pipeline.
Affiliation(s)
- Pengyong Li
  - Department of Biomedical Engineering at Tsinghua University, China
- Jun Wang
  - Ping An Healthcare Technology, Chaoyang, 100027 Beijing, China
- Yixuan Qiao
  - Operations Research and Cybernetics at Beijing University of Technology, China
- Hao Chen
  - Cybernetics at Beijing University of Technology, China
- Yihuan Yu
  - Beijing University of Biomedical Engineering, China
- Xiaojun Yao
  - Analytical Chemistry and Chemoinformatics at Lanzhou University, China
- Peng Gao
  - Ping An Healthcare Technology, Chaoyang, 100027 Beijing, China
- Guotong Xie
  - Ping An Healthcare Technology, Chaoyang, 100027 Beijing, China
- Sen Song
  - Tsinghua Laboratory of Brain and Intelligence and Department of Biomedical Engineering, Tsinghua University, Haidian, 100084 Beijing, China
38
39
Abstract
As more data are introduced in the building of models of chemical reactivity, the mechanistic component can be reduced until 'big data' applications are reached. These methods no longer depend on underlying mechanistic hypotheses, potentially learning them implicitly through extensive data training. Reactivity models often focus on reaction barriers, but can also be trained to directly predict lab-relevant properties, such as yields or conditions. Calculations with a quantum-mechanical component are still preferred for quantitative predictions of reactivity. Although big data applications tend to be more qualitative, they have the advantage to be broadly applied to different kinds of reactions. There is a continuum of methods in between these extremes, such as methods that use quantum-derived data or descriptors in machine learning models. Here, we present an overview of the recent machine learning applications in the field of chemical reactivity from a mechanistic perspective. Starting with a summary of how reactivity questions are addressed by quantum-mechanical methods, we discuss methods that augment or replace quantum-based modelling with faster alternatives relying on machine learning.
40
Ford J, Seritan S, Zhu X, Sakano MN, Islam MM, Strachan A, Martínez TJ. Nitromethane Decomposition via Automated Reaction Discovery and an Ab Initio Corrected Kinetic Model. J Phys Chem A 2021; 125:1447-1460. [PMID: 33569957 DOI: 10.1021/acs.jpca.0c09168] [Citation(s) in RCA: 16] [Impact Index Per Article: 4.0] [Indexed: 02/03/2023]
Abstract
We explore the systematic construction of kinetic models from in silico reaction data for the decomposition of nitromethane. Our models are constructed in a computationally affordable manner by using reactions discovered through accelerated molecular dynamics simulations using the ReaxFF reactive force field. The reaction paths are then optimized to determine reaction rate parameters. We introduce a reaction barrier correction scheme that combines accurate thermochemical data from density functional theory with ReaxFF minimal energy paths. We validate our models across different thermodynamic regimes, showing predictions of gas phase CO and NO concentrations and high-pressure induction times that are similar to experimental data. The kinetic models are analyzed to find fundamental decomposition reactions in different thermodynamic regimes.
Affiliation(s)
- Jason Ford
  - Department of Chemistry and The PULSE Institute, Stanford University, Stanford, California 94305, United States; SLAC National Accelerator Laboratory, 2575 Sand Hill Road, Menlo Park, California 94025, United States
- Stefan Seritan
  - Department of Chemistry and The PULSE Institute, Stanford University, Stanford, California 94305, United States; SLAC National Accelerator Laboratory, 2575 Sand Hill Road, Menlo Park, California 94025, United States
- Xiaolei Zhu
  - Department of Chemistry and The PULSE Institute, Stanford University, Stanford, California 94305, United States; SLAC National Accelerator Laboratory, 2575 Sand Hill Road, Menlo Park, California 94025, United States
- Michael N Sakano
  - School of Materials Engineering and Birck Nanotechnology Center, Purdue University, West Lafayette, Indiana 47907, United States
- Md Mahbub Islam
  - School of Materials Engineering and Birck Nanotechnology Center, Purdue University, West Lafayette, Indiana 47907, United States; Department of Mechanical Engineering, Wayne State University, Detroit, Michigan 48202, United States
- Alejandro Strachan
  - School of Materials Engineering and Birck Nanotechnology Center, Purdue University, West Lafayette, Indiana 47907, United States
- Todd J Martínez
  - Department of Chemistry and The PULSE Institute, Stanford University, Stanford, California 94305, United States; SLAC National Accelerator Laboratory, 2575 Sand Hill Road, Menlo Park, California 94025, United States
41
Bort W, Baskin II, Gimadiev T, Mukanov A, Nugmanov R, Sidorov P, Marcou G, Horvath D, Klimchuk O, Madzhidov T, Varnek A. Discovery of novel chemical reactions by deep generative recurrent neural network. Sci Rep 2021; 11:3178. [PMID: 33542271 PMCID: PMC7862614 DOI: 10.1038/s41598-021-81889-y] [Citation(s) in RCA: 30] [Impact Index Per Article: 7.5] [Received: 08/08/2020] [Accepted: 01/06/2021] [Indexed: 12/18/2022] Open
Abstract
The "creativity" of Artificial Intelligence (AI) in terms of generating de novo molecular structures opened a novel paradigm in compound design, weaknesses (stability & feasibility issues of such structures) notwithstanding. Here we show that "creative" AI may be as successfully taught to enumerate novel chemical reactions that are stoichiometrically coherent. Furthermore, when coupled to reaction space cartography, de novo reaction design may be focused on the desired reaction class. A sequence-to-sequence autoencoder with bidirectional Long Short-Term Memory layers was trained on on-purpose developed "SMILES/CGR" strings, encoding reactions of the USPTO database. The autoencoder latent space was visualized on a generative topographic map. Novel latent space points were sampled around a map area populated by Suzuki reactions and decoded to corresponding reactions. These can be critically analyzed by the expert, cleaned of irrelevant functional groups and eventually experimentally attempted, herewith enlarging the synthetic purpose of popular synthetic pathways.
Affiliation(s)
- William Bort
  - Laboratory of Chemoinformatics, UMR 7140 CNRS, University of Strasbourg, 1, rue Blaise Pascal, 67000, Strasbourg, France
- Igor I Baskin
  - Laboratory of Chemoinformatics, UMR 7140 CNRS, University of Strasbourg, 1, rue Blaise Pascal, 67000, Strasbourg, France
  - Laboratory of Chemoinformatics and Molecular Modeling, Butlerov Institute of Chemistry, Kazan Federal University, Kremlyovskaya str. 18, 420008, Kazan, Russia
  - Department of Materials Science and Engineering, Technion - Israel Institute of Technology, 3200003, Haifa, Israel
- Timur Gimadiev
  - Institute for Chemical Reaction Design and Discovery (WPI-ICReDD), Hokkaido University, Kita 21 Nishi 10, Kita-ku, Sapporo, 001-0021, Japan
- Artem Mukanov
  - Laboratory of Chemoinformatics and Molecular Modeling, Butlerov Institute of Chemistry, Kazan Federal University, Kremlyovskaya str. 18, 420008, Kazan, Russia
- Ramil Nugmanov
  - Laboratory of Chemoinformatics and Molecular Modeling, Butlerov Institute of Chemistry, Kazan Federal University, Kremlyovskaya str. 18, 420008, Kazan, Russia
- Pavel Sidorov
  - Institute for Chemical Reaction Design and Discovery (WPI-ICReDD), Hokkaido University, Kita 21 Nishi 10, Kita-ku, Sapporo, 001-0021, Japan
- Gilles Marcou
  - Laboratory of Chemoinformatics, UMR 7140 CNRS, University of Strasbourg, 1, rue Blaise Pascal, 67000, Strasbourg, France
- Dragos Horvath
  - Laboratory of Chemoinformatics, UMR 7140 CNRS, University of Strasbourg, 1, rue Blaise Pascal, 67000, Strasbourg, France
- Olga Klimchuk
  - Laboratory of Chemoinformatics, UMR 7140 CNRS, University of Strasbourg, 1, rue Blaise Pascal, 67000, Strasbourg, France
- Timur Madzhidov
  - Laboratory of Chemoinformatics and Molecular Modeling, Butlerov Institute of Chemistry, Kazan Federal University, Kremlyovskaya str. 18, 420008, Kazan, Russia
- Alexandre Varnek
  - Laboratory of Chemoinformatics, UMR 7140 CNRS, University of Strasbourg, 1, rue Blaise Pascal, 67000, Strasbourg, France
  - Institute for Chemical Reaction Design and Discovery (WPI-ICReDD), Hokkaido University, Kita 21 Nishi 10, Kita-ku, Sapporo, 001-0021, Japan
42
Sun P, Gu L. Fuzzy knowledge graph system for artificial intelligence-based smart education. JOURNAL OF INTELLIGENT & FUZZY SYSTEMS 2021. [DOI: 10.3233/jifs-189332] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Indexed: 11/15/2022]
Abstract
Fuzzy knowledge graph system is a semantic network that reveals the relationships between entities, and a tool or methodology that can formally describe things in the real world and their relationships. Smart education is an educational concept or model that uses advanced information technology to build a smart environment, integrates theory and practice to build an educational framework for information age, and provides paths to practice it. Artificial intelligence (AI) is a comprehensive discipline developed by the interpenetration of computer science, cybernetics, information theory, linguistics, neurophysiology and other disciplines, which is a direction for the development of information technology in the future. On the basis of summarizing and analyzing of previous research works, this paper expounded the research status and significance of AI technology, elaborated the development background, current status and future challenges of the construction and application of fuzzy knowledge graph system for smart education, introduced the methods and principles of data acquisition methods and digitalized apprenticeship, realized the process design, information extraction, entity recognition and relationship mining of smart education, constructed a systematic framework for fuzzy knowledge graph, and analyzed the high-quality resources sharing and personalized service of AI-assisted smart education, discussed automatic knowledge acquisition and fusion of fuzzy knowledge graph, performed co-occurrence relationship analysis, and finally conducted application case analysis. 
The results show that the smart education knowledge graph for AI-assisted smart education can integrate teaching experience and domain knowledge of discipline experts, enhance explainable and robust machine intelligence for AI-assisted smart education, and provide data-driven and knowledge-driven information processing methods; it can also discover the analysis hotspots and main content of research objects through clustering of high-frequency topic words, reveal the corresponding research structure in depth, and then systematically explore its research dimensions, subject background and theoretical basis.
Affiliation(s)
- Pingping Sun
  - Business & Tourism Institute, Hangzhou Vocational & Technical College, Hangzhou, Zhejiang, China
- Lingang Gu
  - Special Equipment Institute, Hangzhou Vocational & Technical College, Hangzhou, Zhejiang, China
43
Mann V, Venkatasubramanian V. Predicting chemical reaction outcomes: A grammar ontology‐based transformer framework. AIChE J 2021. [DOI: 10.1002/aic.17190] [Citation(s) in RCA: 9] [Impact Index Per Article: 2.3] [Indexed: 11/10/2022]
Affiliation(s)
- Vipul Mann
  - Department of Chemical Engineering, Columbia University, New York, New York 10027, USA
44
Gimadiev T, Nugmanov R, Batyrshin D, Madzhidov T, Maeda S, Sidorov P, Varnek A. Combined Graph/Relational Database Management System for Calculated Chemical Reaction Pathway Data. J Chem Inf Model 2021; 61:554-559. [PMID: 33502186 DOI: 10.1021/acs.jcim.0c01280] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.5] [Indexed: 02/05/2023]
Abstract
Presently, quantum chemical calculations are widely used to generate extensive data sets for machine learning applications; however, generally, these sets only include information on equilibrium structures and some close conformers. Exploration of potential energy surfaces provides important information on ground and transition states, but analysis of such data is complicated due to the number of possible reaction pathways. Here, we present RePathDB, a database system for managing 3D structural data for both ground and transition states resulting from quantum chemical calculations. Our tool allows one to store, assemble, and analyze reaction pathway data. It combines relational database CGR DB for handling compounds and reactions as molecular graphs with a graph database architecture for pathway analysis by graph algorithms. Original condensed graph of reaction technology is used to store any chemical reaction as a single graph.
Affiliation(s)
- Timur Gimadiev
  - Institute for Chemical Reaction Design and Discovery (WPI-ICReDD), Hokkaido University, Kita 21 Nishi 10, Kita-ku, 001-0021 Sapporo, Japan
- Ramil Nugmanov
  - Laboratory of Chemoinformatics and Molecular Modeling, Butlerov Institute of Chemistry, Kazan Federal University, 18, Kremlyovskaya str., 420008 Kazan, Russia
- Dinar Batyrshin
  - Laboratory of Chemoinformatics and Molecular Modeling, Butlerov Institute of Chemistry, Kazan Federal University, 18, Kremlyovskaya str., 420008 Kazan, Russia
- Timur Madzhidov
  - Laboratory of Chemoinformatics and Molecular Modeling, Butlerov Institute of Chemistry, Kazan Federal University, 18, Kremlyovskaya str., 420008 Kazan, Russia
- Satoshi Maeda
  - Institute for Chemical Reaction Design and Discovery (WPI-ICReDD), Hokkaido University, Kita 21 Nishi 10, Kita-ku, 001-0021 Sapporo, Japan
- Pavel Sidorov
  - Institute for Chemical Reaction Design and Discovery (WPI-ICReDD), Hokkaido University, Kita 21 Nishi 10, Kita-ku, 001-0021 Sapporo, Japan
- Alexandre Varnek
  - Institute for Chemical Reaction Design and Discovery (WPI-ICReDD), Hokkaido University, Kita 21 Nishi 10, Kita-ku, 001-0021 Sapporo, Japan; Laboratory of Chemoinformatics, UMR 7140 CNRS, University of Strasbourg, 4, Blaise Pascal str., 67081 Strasbourg, France
45
Ucak UV, Kang T, Ko J, Lee J. Substructure-based neural machine translation for retrosynthetic prediction. J Cheminform 2021; 13:4. [PMID: 33431017 PMCID: PMC7802345 DOI: 10.1186/s13321-020-00482-z] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Received: 10/19/2020] [Accepted: 12/22/2020] [Indexed: 11/10/2022] Open
Abstract
With the rapid improvement of machine translation approaches, neural machine translation has started to play an important role in retrosynthesis planning, which finds reasonable synthetic pathways for a target molecule. Previous studies showed that utilizing the sequence-to-sequence frameworks of neural machine translation is a promising approach to tackle the retrosynthetic planning problem. In this work, we recast the retrosynthetic planning problem as a language translation problem using a template-free sequence-to-sequence model. The model is trained in an end-to-end and a fully data-driven fashion. Unlike previous models translating the SMILES strings of reactants and products, we introduced a new way of representing a chemical reaction based on molecular fragments. It is demonstrated that the new approach yields better prediction results than current state-of-the-art computational methods. The new approach resolves the major drawbacks of existing retrosynthetic methods such as generating invalid SMILES strings. Specifically, our approach predicts highly similar reactant molecules with an accuracy of 57.7%. In addition, our method yields more robust predictions than existing methods.
Affiliation(s)
- Umit V Ucak
  - Division of Chemistry and Biochemistry, Department of Chemistry, Kangwon National University, Chuncheon, South Korea
- Taek Kang
  - Center for Neuro-Medicine, Brain Science Institute, Korea Institute of Science and Technology, Seoul, South Korea
- Junsu Ko
  - Arontier co., Seoul, South Korea
- Juyong Lee
  - Division of Chemistry and Biochemistry, Department of Chemistry, Kangwon National University, Chuncheon, South Korea
46
Rodrigues JF, Florea L, de Oliveira MCF, Diamond D, Oliveira ON. Big data and machine learning for materials science. DISCOVER MATERIALS 2021; 1:12. [PMID: 33899049 PMCID: PMC8054236 DOI: 10.1007/s43939-021-00012-0] [Citation(s) in RCA: 22] [Impact Index Per Article: 5.5] [Received: 02/09/2021] [Accepted: 04/01/2021] [Indexed: 05/11/2023]
Abstract
Herein, we review aspects of leading-edge research and innovation in materials science that exploit big data and machine learning (ML), two computer science concepts that combine to yield computational intelligence. ML can accelerate the solution of intricate chemical problems and even solve problems that otherwise would not be tractable. However, the potential benefits of ML come at the cost of big data production; that is, the algorithms demand large volumes of data of various natures and from different sources, from material properties to sensor data. In the survey, we propose a roadmap for future developments with emphasis on computer-aided discovery of new materials and analysis of chemical sensing compounds, both prominent research fields for ML in the context of materials science. In addition to providing an overview of recent advances, we elaborate upon the conceptual and practical limitations of big data and ML applied to materials science, outlining processes, discussing pitfalls, and reviewing cases of success and failure.
Affiliation(s)
- Jose F. Rodrigues: Institute of Mathematical Sciences and Computing, University of São Paulo (USP), São Carlos, SP Brazil
- Larisa Florea: SFI Research Centre for Advanced Materials and BioEngineering Research Trinity College Dublin, The University of Dublin, Dublin, Ireland
- Maria C. F. de Oliveira: Institute of Mathematical Sciences and Computing, University of São Paulo (USP), São Carlos, SP Brazil
- Dermot Diamond: Insight Centre for Data Analytics, National Centre for Sensor Research, Dublin City University, Dublin 9, Dublin, Ireland
- Osvaldo N. Oliveira: São Carlos Institute of Physics, University of São Paulo (USP), São Carlos, SP Brazil
|
47
|
Zhang Y, Wang L, Wang X, Zhang C, Ge J, Tang J, Su A, Duan H. Data augmentation and transfer learning strategies for reaction prediction in low chemical data regimes. Org Chem Front 2021. [DOI: 10.1039/d0qo01636e] [Citation(s) in RCA: 14] [Impact Index Per Article: 3.5] [Indexed: 01/04/2023]
Abstract
An effective and rapid deep learning method to predict chemical reactions contributes to the research and development of organic chemistry and drug discovery.
Affiliation(s)
- Yun Zhang: Artificial Intelligence Aided Drug Discovery Institute, College of Pharmaceutical Sciences, Zhejiang University of Technology, Hangzhou 310014, China
- Ling Wang: Artificial Intelligence Aided Drug Discovery Institute, College of Pharmaceutical Sciences, Zhejiang University of Technology, Hangzhou 310014, China
- Xinqiao Wang: Artificial Intelligence Aided Drug Discovery Institute, College of Pharmaceutical Sciences, Zhejiang University of Technology, Hangzhou 310014, China
- Chengyun Zhang: Artificial Intelligence Aided Drug Discovery Institute, College of Pharmaceutical Sciences, Zhejiang University of Technology, Hangzhou 310014, China
- Jiamin Ge: Artificial Intelligence Aided Drug Discovery Institute, College of Pharmaceutical Sciences, Zhejiang University of Technology, Hangzhou 310014, China
- Jing Tang: Artificial Intelligence Aided Drug Discovery Institute, College of Pharmaceutical Sciences, Zhejiang University of Technology, Hangzhou 310014, China
- An Su: College of Chemical Engineering, Zhejiang University of Technology, Hangzhou 310014, China
- Hongliang Duan: Artificial Intelligence Aided Drug Discovery Institute, College of Pharmaceutical Sciences, Zhejiang University of Technology, Hangzhou 310014, China
|
48
|
Thakkar A, Johansson S, Jorner K, Buttar D, Reymond JL, Engkvist O. Artificial intelligence and automation in computer aided synthesis planning. REACT CHEM ENG 2021. [DOI: 10.1039/d0re00340a] [Citation(s) in RCA: 17] [Impact Index Per Article: 4.3] [Indexed: 12/13/2022]
Abstract
In this perspective we deal with questions pertaining to the development of synthesis planning technologies over the course of recent years.
Affiliation(s)
- Amol Thakkar: Hit Discovery, Discovery Sciences, R&D, AstraZeneca, Gothenburg
- Kjell Jorner: Early Chemical Development, Pharmaceutical Sciences, R&D, AstraZeneca, Macclesfield
- David Buttar: Early Chemical Development, Pharmaceutical Sciences, R&D, AstraZeneca, Macclesfield
- Jean-Louis Reymond: Department of Chemistry and Biochemistry, University of Bern, 3012 Bern, Switzerland
- Ola Engkvist: Hit Discovery, Discovery Sciences, R&D, AstraZeneca, Gothenburg
|
49
|
Agyemang B, Wu WP, Addo D, Kpiebaareh MY, Nanor E, Roland Haruna C. Deep inverse reinforcement learning for structural evolution of small molecules. Brief Bioinform 2020; 22:6043289. [PMID: 33348357 DOI: 10.1093/bib/bbaa364] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.8] [Received: 07/31/2020] [Revised: 09/25/2020] [Accepted: 11/10/2020] [Indexed: 11/14/2022]
Abstract
The size and quality of chemical libraries to the drug discovery pipeline are crucial for developing new drugs or repurposing existing drugs. Existing techniques such as combinatorial organic synthesis and high-throughput screening usually make the process extraordinarily tough and complicated since the search space of synthetically feasible drugs is exorbitantly huge. While reinforcement learning has been mostly exploited in the literature for generating novel compounds, the requirement of designing a reward function that succinctly represents the learning objective could prove daunting in certain complex domains. Generative adversarial network-based methods also mostly discard the discriminator after training and could be hard to train. In this study, we propose a framework for training a compound generator and learn a transferable reward function based on the entropy maximization inverse reinforcement learning (IRL) paradigm. We show from our experiments that the IRL route offers a rational alternative for generating chemical compounds in domains where reward function engineering may be less appealing or impossible while data exhibiting the desired objective is readily available.
|
50
|
Struble T, Alvarez JC, Brown SP, Chytil M, Cisar J, DesJarlais RL, Engkvist O, Frank SA, Greve DR, Griffin DJ, Hou X, Johannes JW, Kreatsoulas C, Lahue B, Mathea M, Mogk G, Nicolaou CA, Palmer AD, Price DJ, Robinson RI, Salentin S, Xing L, Jaakkola T, Green WH, Barzilay R, Coley CW, Jensen KF. Current and Future Roles of Artificial Intelligence in Medicinal Chemistry Synthesis. J Med Chem 2020; 63:8667-8682. [PMID: 32243158 PMCID: PMC7457232 DOI: 10.1021/acs.jmedchem.9b02120] [Citation(s) in RCA: 100] [Impact Index Per Article: 20.0] [Received: 12/19/2019] [Indexed: 12/20/2022]
Abstract
Artificial intelligence and machine learning have demonstrated their potential role in predictive chemistry and synthetic planning of small molecules; there are at least a few reports of companies employing in silico synthetic planning into their overall approach to accessing target molecules. A data-driven synthesis planning program is one component being developed and evaluated by the Machine Learning for Pharmaceutical Discovery and Synthesis (MLPDS) consortium, comprising MIT and 13 chemical and pharmaceutical company members. Together, we wrote this perspective to share how we think predictive models can be integrated into medicinal chemistry synthesis workflows, how they are currently used within MLPDS member companies, and the outlook for this field.
Affiliation(s)
- Thomas J. Struble: Department of Chemical Engineering, MIT, Cambridge, Massachusetts 02139, United States
- Juan C. Alvarez: Computational and Structural Chemistry, Merck & Co. Inc., Kenilworth, New Jersey 07033, United States
- Scott P. Brown: Sunovion Pharmaceuticals Inc., Marlborough, Massachusetts 01752, United States
- Milan Chytil: Sunovion Pharmaceuticals Inc., Marlborough, Massachusetts 01752, United States
- Justin Cisar: Janssen Research & Development LLC, Spring House, Pennsylvania 19477, United States
- Renee L. DesJarlais: Janssen Research & Development LLC, Spring House, Pennsylvania 19477, United States
- Ola Engkvist: Hit Discovery, Discovery Sciences, R&D, AstraZeneca, 431 83 Mölndal, Sweden
- Scott A. Frank: Eli Lilly and Company, Indianapolis, Indiana 46285, United States
- Daniel R. Greve: LEO Pharma A/S, Industriparken 55, DK-2750 Ballerup, Denmark
- Xinjun Hou: Pfizer Inc., Cambridge, Massachusetts 02139, United States
- Jeffrey W. Johannes: Medicinal Chemistry, Early Oncology, Oncology R&D, AstraZeneca, Boston, Massachusetts 02451, United States
- Brian Lahue: Computational and Structural Chemistry, Merck & Co. Inc., Kenilworth, New Jersey 07033, United States
- Miriam Mathea: BASF SE, Carl-Bosch-Strasse 38, 67056 Ludwigshafen am Rhein, Germany
- Andrew D. Palmer: BASF SE, Carl-Bosch-Strasse 38, 67056 Ludwigshafen am Rhein, Germany
- Daniel J. Price: GlaxoSmithKline, Collegeville, Pennsylvania 19426, United States
- Richard I. Robinson: Novartis Institutes for BioMedical Research, Cambridge, Massachusetts 02139, United States
- Li Xing: WuXi AppTec, Cambridge, Massachusetts 02142, United States
- Tommi Jaakkola: Computer Science and Artificial Intelligence Laboratory, MIT, Cambridge, Massachusetts 02139, United States
- William H. Green: Department of Chemical Engineering, MIT, Cambridge, Massachusetts 02139, United States
- Regina Barzilay: Computer Science and Artificial Intelligence Laboratory, MIT, Cambridge, Massachusetts 02139, United States
- Connor W. Coley: Department of Chemical Engineering, MIT, Cambridge, Massachusetts 02139, United States
- Klavs F. Jensen: Department of Chemical Engineering, MIT, Cambridge, Massachusetts 02139, United States
|