1
Mroz AM, Basford AR, Hastedt F, Jayasekera IS, Mosquera-Lois I, Sedgwick R, Ballester PJ, Bocarsly JD, Antonio Del Río Chanona E, Evans ML, Frost JM, Ganose AM, Greenaway RL, Kuok Mimi Hii K, Li Y, Misener R, Walsh A, Zhang D, Jelfs KE. Cross-disciplinary perspectives on the potential for artificial intelligence across chemistry. Chem Soc Rev 2025. [PMID: 40278836 PMCID: PMC12024683 DOI: 10.1039/d5cs00146c] [Received: 02/07/2025] [Indexed: 04/26/2025]
Abstract
From accelerating simulations and exploring chemical space, to experimental planning and integrating automation within experimental labs, artificial intelligence (AI) is changing the landscape of chemistry. We are seeing a significant increase in the number of publications leveraging these powerful data-driven insights and models to accelerate all aspects of chemical research. Examples include how we represent molecules and materials to computer algorithms for predictive and generative models, as well as the physical mechanisms by which we perform experiments in the lab for automation. Here, we present ten diverse perspectives on the impact of AI coming from those with a range of backgrounds from experimental chemistry, computational chemistry, computer science, engineering and across different areas of chemistry, including drug discovery, catalysis, chemical automation, chemical physics, and materials chemistry. The ten perspectives presented here cover a range of themes, including AI for computation, facilitating discovery, supporting experiments, and enabling technologies for transformation. We highlight and discuss imminent challenges and ways in which we are redefining problems to accelerate the impact of chemical research via AI.
Affiliation(s)
- Austin M Mroz
  - Department of Chemistry, Imperial College London, London W12 0BZ, UK.
  - I-X Centre for AI in Science, Imperial College London, London W12 0BZ, UK
- Annabel R Basford
  - Department of Chemistry, Imperial College London, London W12 0BZ, UK.
- Friedrich Hastedt
  - Department of Chemical Engineering, Imperial College London, London SW7 2AZ, UK
- Ruby Sedgwick
  - Department of Computing, Imperial College London, London SW7 2AZ, UK
- Pedro J Ballester
  - Department of Bioengineering, Imperial College London, London SW7 2AZ, UK
- Joshua D Bocarsly
  - Department of Chemistry and Texas Center for Superconductivity, University of Houston, Houston, USA
- Matthew L Evans
  - UCLouvain, Institute of Condensed Matter and Nanosciences (IMCN), Chemin des Étoiles 8, Louvain-la-Neuve 1348, Belgium
  - Matgenix SRL, A6K Advanced Engineering Center, Charleroi, Belgium
  - Datalab Industries Ltd, King's Lynn, Norfolk, UK
- Jarvist M Frost
  - Department of Chemistry, Imperial College London, London W12 0BZ, UK.
- Alex M Ganose
  - Department of Chemistry, Imperial College London, London W12 0BZ, UK.
- Yingzhen Li
  - Department of Computing, Imperial College London, London SW7 2AZ, UK
- Ruth Misener
  - Department of Computing, Imperial College London, London SW7 2AZ, UK
- Aron Walsh
  - Department of Materials, Imperial College London, London SW7 2AZ, UK
- Dandan Zhang
  - I-X Centre for AI in Science, Imperial College London, London W12 0BZ, UK
  - Department of Bioengineering, Imperial College London, London SW7 2AZ, UK
- Kim E Jelfs
  - Department of Chemistry, Imperial College London, London W12 0BZ, UK.
2
Hu X, Chen Z, Peng B, Adu-Ampratwum D, Ning X. log-RRIM: Yield Prediction via Local-to-global Reaction Representation Learning and Interaction Modeling. ARXIV 2025:arXiv:2411.03320v4. [PMID: 39606718 PMCID: PMC11601803] [Indexed: 11/29/2024]
Abstract
Accurate prediction of chemical reaction yields is crucial for optimizing organic synthesis, potentially reducing time and resources spent on experimentation. With the rise of artificial intelligence (AI), there is growing interest in leveraging AI-based methods to accelerate yield predictions without conducting in vitro experiments. We present log-RRIM, an innovative graph transformer-based framework designed for predicting chemical reaction yields. A key feature of log-RRIM is its integration of a cross-attention mechanism that focuses on the interplay between reagents and reaction centers. This design reflects a fundamental principle in chemical reactions: the crucial role of reagents in influencing bond-breaking and formation processes, which ultimately affect reaction yields. log-RRIM also implements a local-to-global reaction representation learning strategy. This approach initially captures detailed molecule-level information and then models and aggregates intermolecular interactions. Through this hierarchical process, log-RRIM effectively captures how different molecular fragments contribute to and influence the overall reaction yield, regardless of their size variations. log-RRIM shows superior performance in our experiments, especially for medium to high-yielding reactions, proving its reliability as a predictor. The framework's sophisticated modeling of reactant-reagent interactions and precise capture of molecular fragment contributions make it a valuable tool for reaction planning and optimization in chemical synthesis. The data and codes of log-RRIM are accessible through https://github.com/ninglab/Yield_log_RRIM.
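The cross-attention idea mentioned in this abstract (reaction-centre atoms attending to reagent atoms) can be illustrated with a minimal sketch. This is not the log-RRIM implementation: it assumes plain single-head scaled dot-product attention with no learned projections, and the names `cross_attention`, `center_atoms` and `reagent_atoms`, as well as the toy two-dimensional vectors, are hypothetical.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention(queries, keys, values):
    """Single-head scaled dot-product cross-attention.

    Each query (here: a reaction-centre atom vector) is compared with every
    key (here: a reagent atom vector); the resulting weights mix the values.
    """
    dim = len(keys[0])
    outputs, weights = [], []
    for q in queries:
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(dim) for k in keys]
        w = softmax(scores)
        weights.append(w)
        outputs.append(
            [sum(wj * v[c] for wj, v in zip(w, values)) for c in range(len(values[0]))]
        )
    return outputs, weights


# Hypothetical toy embeddings: 2 reaction-centre atoms, 3 reagent atoms.
center_atoms = [[1.0, 0.0], [0.0, 1.0]]
reagent_atoms = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]

attended, attn = cross_attention(center_atoms, reagent_atoms, reagent_atoms)
for row in attn:
    print([round(w, 3) for w in row])
```

Each row of `attn` sums to 1 and shows how strongly one reaction-centre atom weights each reagent atom; a trained model would learn the embeddings and projections rather than use fixed vectors.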
3
Zhao PC, Wei XX, Wang Q, Wang QH, Li JN, Shang J, Lu C, Shi JY. Single-step retrosynthesis prediction via multitask graph representation learning. Nat Commun 2025; 16:814. [PMID: 39827189 PMCID: PMC11742932 DOI: 10.1038/s41467-025-56062-y] [Received: 08/10/2023] [Accepted: 01/08/2025] [Indexed: 01/22/2025] Open
Abstract
Inferring appropriate synthesis reaction (i.e., retrosynthesis) routes for newly designed molecules is vital. Recently, computational methods have produced promising single-step retrosynthesis predictions. However, template-based methods are limited by the known synthesis templates; template-free methods are weakly interpretable; and semi template-based methods are deficient with regard to utilizing the associations between chemical entities. To address these issues, this paper leverages the intra-associations between synthons, the inter-associations between synthons and leaving groups (LGs), and the intra-associations between LGs. It develops a multitask graph representation learning model for single-step retrosynthesis prediction (Retro-MTGR) to solve reaction centre deduction and LG identification simultaneously. A comparison with 16 state-of-the-art methods first demonstrates the superiority of Retro-MTGR. Then, its robustness and scalability and the contributions of its crucial components are validated. More importantly, it can determine whether a bond can be a reaction centre and what LGs are appropriate for a given synthon, respectively. The answers reflect underlying chemical synthesis rules, especially opposite electrical properties between chemical entities (e.g., reaction sites, synthons, and LGs). Finally, case studies demonstrate that the retrosynthesis routes inferred by Retro-MTGR are promising for single-step synthesis reactions. The code and data of this study are freely available at https://doi.org/10.5281/zenodo.14346324 .
Affiliation(s)
- Peng-Cheng Zhao
  - School of Life Sciences, Northwestern Polytechnical University, Xi'an, China
- Xue-Xin Wei
  - School of Life Sciences, Northwestern Polytechnical University, Xi'an, China
- Qiong Wang
  - School of Life Sciences, Northwestern Polytechnical University, Xi'an, China
- Qi-Hao Wang
  - School of Chemistry and Chemical Engineering, Northwestern Polytechnical University, Xi'an, China
- Jia-Ning Li
  - School of Life Sciences, Northwestern Polytechnical University, Xi'an, China
- Jie Shang
  - School of Life Sciences, Northwestern Polytechnical University, Xi'an, China.
- Cheng Lu
  - Institute of Basic Research in Clinical Medicine, China Academy of Chinese Medical Sciences, Beijing, China.
- Jian-Yu Shi
  - School of Life Sciences, Northwestern Polytechnical University, Xi'an, China.
4
Baker F, Chen Z, Adu-Ampratwum D, Ning X. RLSynC: Offline-Online Reinforcement Learning for Synthon Completion. J Chem Inf Model 2024; 64:6723-6735. [PMID: 39154287 PMCID: PMC11388466 DOI: 10.1021/acs.jcim.4c00554] [Received: 03/29/2024] [Revised: 07/05/2024] [Accepted: 07/15/2024] [Indexed: 08/19/2024]
Abstract
Retrosynthesis is the process of determining the set of reactant molecules that can react to form a desired product. Semitemplate-based retrosynthesis methods, which imitate the reverse logic of synthesis reactions, first predict the reaction centers in the products and then complete the resulting synthons back into reactants. We develop a new offline-online reinforcement learning method RLSynC for synthon completion in semitemplate-based methods. RLSynC assigns one agent to each synthon, all of which complete the synthons by conducting actions step by step in a synchronized fashion. RLSynC learns the policy from both offline training episodes and online interactions, which allows RLSynC to explore new reaction spaces. RLSynC uses a standalone forward synthesis model to evaluate the likelihood of the predicted reactants in synthesizing a product and thus guides the action search. Our results demonstrate that RLSynC can outperform state-of-the-art synthon completion methods with improvements as high as 14.9%, highlighting its potential in synthesis planning.
Affiliation(s)
- Frazier N. Baker
  - Department of Computer Science and Engineering, College of Engineering, The Ohio State University, Columbus, Ohio 43210, United States
- Ziqi Chen
  - Department of Computer Science and Engineering, College of Engineering, The Ohio State University, Columbus, Ohio 43210, United States
- Daniel Adu-Ampratwum
  - Division of Medicinal Chemistry and Pharmacognosy, College of Pharmacy, The Ohio State University, Columbus, Ohio 43210, United States
- Xia Ning
  - Department of Computer Science and Engineering, College of Engineering, The Ohio State University, Columbus, Ohio 43210, United States
  - Division of Medicinal Chemistry and Pharmacognosy, College of Pharmacy, The Ohio State University, Columbus, Ohio 43210, United States
  - Translational Data Analytics Institute, The Ohio State University, Columbus, Ohio 43210, United States
  - Department of Biomedical Informatics, College of Medicine, The Ohio State University, Columbus, Ohio 43210, United States
5
Liu X, Ai C, Yang H, Dong R, Tang J, Zheng S, Guo F. RetroCaptioner: beyond attention in end-to-end retrosynthesis transformer via contrastively captioned learnable graph representation. Bioinformatics 2024; 40:btae561. [PMID: 39342389 PMCID: PMC11520410 DOI: 10.1093/bioinformatics/btae561] [Received: 06/03/2024] [Revised: 08/28/2024] [Accepted: 09/12/2024] [Indexed: 10/01/2024] Open
Abstract
MOTIVATION: Retrosynthesis identifies available precursor molecules for various and novel compounds. With the advancements and practicality of language models, Transformer-based models have increasingly been used to automate this process. However, many existing methods struggle to efficiently capture reaction transformation information, limiting the accuracy and applicability of their predictions.
RESULTS: We introduce RetroCaptioner, an advanced end-to-end, Transformer-based framework featuring a Contrastive Reaction Center Captioner. This captioner guides the training of dual-view attention models using a contrastive learning approach. It leverages learned molecular graph representations to capture chemically plausible constraints within a single-step learning process. We integrate the single-encoder, dual-encoder, and encoder-decoder paradigms to effectively fuse information from the sequence and graph representations of molecules. This involves modifying the Transformer encoder into a uni-view sequence encoder and a dual-view module. Furthermore, we enhance the captioning of atomic correspondence between SMILES and graphs. Our proposed method, RetroCaptioner, achieved outstanding performance with 67.2% in top-1 and 93.4% in top-10 exact matched accuracy on the USPTO-50k dataset, alongside an exceptional SMILES validity score of 99.4%. In addition, RetroCaptioner has demonstrated its reliability in generating synthetic routes for the drug protokylol.
AVAILABILITY AND IMPLEMENTATION: The code and data are available at https://github.com/guofei-tju/RetroCaptioner.
Affiliation(s)
- Xiaoyi Liu
  - School of Chinese Materia Medica, Beijing University of Chinese Medicine, Beijing, 102488, China
  - Ministry of Education, Engineering Research Center for Pharmaceutics of Chinese Materia Medica and New Drug Development, Beijing, 100102, China
- Chengwei Ai
  - Computer Science and Engineering, Central South University, Changsha, 410083, China
- Hongpeng Yang
  - Computer Science and Engineering, University of South Carolina, Columbia, South Carolina, 29208, United States
- Ruihan Dong
  - Academy for Advanced Interdisciplinary Studies, Peking University, Beijing, 100871, China
- Jijun Tang
  - Faculty of Computer Science and Control Engineering, Shenzhen University of Advanced Technology, Shenzhen, 518055, China
  - Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences, Nanshan, 518055, China
- Shuangjia Zheng
  - Global Institute of Future Technology, Shanghai Jiao Tong University, Shanghai, 200240, China
- Fei Guo
  - Computer Science and Engineering, Central South University, Changsha, 410083, China
6
Gricourt G, Meyer P, Duigou T, Faulon JL. Artificial Intelligence Methods and Models for Retro-Biosynthesis: A Scoping Review. ACS Synth Biol 2024; 13:2276-2294. [PMID: 39047143 PMCID: PMC11334239 DOI: 10.1021/acssynbio.4c00091] [Received: 02/09/2024] [Revised: 06/14/2024] [Accepted: 06/14/2024] [Indexed: 07/27/2024]
Abstract
Retrosynthesis aims to efficiently plan the synthesis of desirable chemicals by strategically breaking down molecules into readily available building block compounds. Having a long history in chemistry, retro-biosynthesis has also been used in the fields of biocatalysis and synthetic biology. Artificial intelligence (AI) is driving us toward new frontiers in synthesis planning and the exploration of chemical spaces, arriving at an opportune moment for promoting bioproduction that would better align with green chemistry, enhancing environmental practices. In this review, we summarize the recent advancements in the application of AI methods and models for retrosynthetic and retro-biosynthetic pathway design. These techniques can be based either on reaction templates or generative models and require scoring functions and planning strategies to navigate through the retrosynthetic graph of possibilities. We finally discuss limitations and promising research directions in this field.
Affiliation(s)
- Guillaume Gricourt
  - Université Paris-Saclay, INRAE, AgroParisTech, Micalis Institute, 78350 Jouy-en-Josas, France
- Philippe Meyer
  - Université Paris-Saclay, INRAE, AgroParisTech, Micalis Institute, 78350 Jouy-en-Josas, France
- Thomas Duigou
  - Université Paris-Saclay, INRAE, AgroParisTech, Micalis Institute, 78350 Jouy-en-Josas, France
- Jean-Loup Faulon
  - Université Paris-Saclay, INRAE, AgroParisTech, Micalis Institute, 78350 Jouy-en-Josas, France
  - The University of Manchester, Manchester Institute of Biotechnology, Manchester M1 7DN, U.K.
7
Gholap AD, Uddin MJ, Faiyazuddin M, Omri A, Gowri S, Khalid M. Advances in artificial intelligence for drug delivery and development: A comprehensive review. Comput Biol Med 2024; 178:108702. [PMID: 38878397 DOI: 10.1016/j.compbiomed.2024.108702] [Received: 01/03/2024] [Revised: 05/12/2024] [Accepted: 06/01/2024] [Indexed: 07/24/2024]
Abstract
Artificial intelligence (AI) has emerged as a powerful tool to revolutionize the healthcare sector, including drug delivery and development. This review explores the current and future applications of AI in the pharmaceutical industry, focusing on drug delivery and development. It covers various aspects such as smart drug delivery networks, sensors, drug repurposing, statistical modeling, and simulation of biotechnological and biological systems. The integration of AI with nanotechnologies and nanomedicines is also examined. AI offers significant advancements in drug discovery by efficiently identifying compounds, validating drug targets, streamlining drug structures, and prioritizing response templates. Techniques like data mining, multitask learning, and high-throughput screening contribute to better drug discovery and development innovations. The review discusses AI applications in drug formulation and delivery, clinical trials, drug safety, and pharmacovigilance. It addresses regulatory considerations and challenges associated with AI in pharmaceuticals, including privacy, data security, and interpretability of AI models. The review concludes with future perspectives, highlighting emerging trends, addressing limitations and biases in AI models, and emphasizing the importance of collaboration and knowledge sharing. It provides a comprehensive overview of AI's potential to transform the pharmaceutical industry and improve patient care while identifying further research and development areas.
Affiliation(s)
- Amol D Gholap
  - Department of Pharmaceutics, St. John Institute of Pharmacy and Research, Palghar, Maharashtra, 401404, India.
- Md Jasim Uddin
  - Department of Pharmaceutical Technology, Faculty of Pharmacy, Universiti Malaya, 50603, Kuala Lumpur, Malaysia.
- Md Faiyazuddin
  - School of Pharmacy, Al-Karim University, Katihar, Bihar, 854106, India; Centre for Global Health Research, Saveetha Institute of Medical and Technical Sciences, Tamil Nadu, India.
- Abdelwahab Omri
  - Department of Chemistry and Biochemistry, The Novel Drug and Vaccine Delivery Systems Facility, Laurentian University, Sudbury, ON, P3E 2C6, Canada.
- S Gowri
  - PG & Research, Department of Physics, Cauvery College for Women, Tiruchirapalli, Tamil Nadu, 620018, India
- Mohammad Khalid
  - James Watt School of Engineering, University of Glasgow, Glasgow G12 8QQ, UK; Sunway Centre for Electrochemical Energy and Sustainable Technology (SCEEST), School of Engineering and Technology, Sunway University, No. 5, Jalan Universiti, Bandar Sunway, 47500 Selangor Darul Ehsan, Malaysia; University Centre for Research and Development, Chandigarh University, Mohali, Punjab, 140413, India.
8
Han Y, Xu X, Hsieh CY, Ding K, Xu H, Xu R, Hou T, Zhang Q, Chen H. Retrosynthesis prediction with an iterative string editing model. Nat Commun 2024; 15:6404. [PMID: 39080274 PMCID: PMC11289138 DOI: 10.1038/s41467-024-50617-1] [Received: 08/16/2023] [Accepted: 07/09/2024] [Indexed: 08/02/2024] Open
Abstract
Retrosynthesis is a crucial task in drug discovery and organic synthesis, where artificial intelligence (AI) is increasingly employed to expedite the process. However, existing approaches employ token-by-token decoding methods to translate target molecule strings into corresponding precursors, exhibiting unsatisfactory performance and limited diversity. As chemical reactions typically induce local molecular changes, reactants and products often overlap significantly. Inspired by this fact, we propose reframing single-step retrosynthesis prediction as a molecular string editing task, iteratively refining target molecule strings to generate precursor compounds. Our proposed approach involves a fragment-based generative editing model that uses explicit sequence editing operations. Additionally, we design an inference module with reposition sampling and sequence augmentation to enhance both prediction accuracy and diversity. Extensive experiments demonstrate that our model generates high-quality and diverse results, achieving superior performance with a promising top-1 accuracy of 60.8% on the standard benchmark dataset USPTO-50K.
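The observation that products and reactants overlap heavily as strings can be illustrated with a small sketch. It is only a character-level diff built on Python's standard `difflib`, not the fragment-based editing model the abstract describes; the helper names `edit_script` and `apply_edits` are hypothetical, and the example reaction (ethyl acetate from acetic acid and ethanol) was chosen here for illustration.

```python
from difflib import SequenceMatcher


def edit_script(source, target):
    """Return (tag, source_piece, target_piece) operations turning source into target."""
    matcher = SequenceMatcher(None, source, target, autojunk=False)
    return [
        (tag, source[i1:i2], target[j1:j2])
        for tag, i1, i2, j1, j2 in matcher.get_opcodes()
    ]


def apply_edits(ops):
    """Rebuild the target by concatenating the target-side piece of every operation."""
    return "".join(target_piece for _, _, target_piece in ops)


product = "CC(=O)OCC"      # ethyl acetate (SMILES)
reactants = "CC(=O)O.OCC"  # acetic acid + ethanol (SMILES)

ops = edit_script(product, reactants)
kept = sum(len(src) for tag, src, _ in ops if tag == "equal")
rebuilt = apply_edits(ops)

for tag, src, dst in ops:
    print(f"{tag:8s} {src!r} -> {dst!r}")
print(f"{kept} of {len(product)} product characters are reused unchanged")
```

Most of the product string is carried over as `equal` spans, which is the property that motivates treating retrosynthesis as editing rather than generating the reactant string from scratch.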
Affiliation(s)
- Yuqiang Han
  - College of Computer Science and Technology, Zhejiang University, Hangzhou, 310027, China
  - ZJU-Hangzhou Global Scientific and Technological Innovation Center, Hangzhou, 311200, China
- Xiaoyang Xu
  - Polytechnic Institute, Zhejiang University, Hangzhou, 310015, China
- Chang-Yu Hsieh
  - Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou, 310018, China
- Keyan Ding
  - College of Computer Science and Technology, Zhejiang University, Hangzhou, 310027, China
  - ZJU-Hangzhou Global Scientific and Technological Innovation Center, Hangzhou, 311200, China
- Hongxia Xu
  - Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou, 310018, China
  - Liangzhu Laboratory, Zhejiang University School of Medicine, Hangzhou, 311121, China
- Renjun Xu
  - College of Computer Science and Technology, Zhejiang University, Hangzhou, 310027, China
  - ZJU-Hangzhou Global Scientific and Technological Innovation Center, Hangzhou, 311200, China
- Tingjun Hou
  - Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou, 310018, China.
- Qiang Zhang
  - College of Computer Science and Technology, Zhejiang University, Hangzhou, 310027, China.
  - ZJU-Hangzhou Global Scientific and Technological Innovation Center, Hangzhou, 311200, China.
- Huajun Chen
  - College of Computer Science and Technology, Zhejiang University, Hangzhou, 310027, China.
  - ZJU-Hangzhou Global Scientific and Technological Innovation Center, Hangzhou, 311200, China.
  - Zhejiang-University-Ant-Group Joint Center for Knowledge Graphs, Hangzhou, 310000, China.
  - Hangzhou Institute of Medicine Chinese Academy of Science, Hangzhou, 310023, China.
9
Zeng K, Yang B, Zhao X, Zhang Y, Nie F, Yang X, Jin Y, Xu Y. Ualign: pushing the limit of template-free retrosynthesis prediction with unsupervised SMILES alignment. J Cheminform 2024; 16:80. [PMID: 39010144 PMCID: PMC11247856 DOI: 10.1186/s13321-024-00877-2] [Received: 04/19/2024] [Accepted: 06/30/2024] [Indexed: 07/17/2024] Open
Abstract
MOTIVATION: Retrosynthesis planning poses a formidable challenge in the organic chemical industry, particularly in pharmaceuticals. Single-step retrosynthesis prediction, a crucial step in the planning process, has witnessed a surge in interest in recent years due to advancements in AI for science. Various deep learning-based methods have been proposed for this task in recent years, incorporating diverse levels of additional chemical knowledge dependency.
RESULTS: This paper introduces UAlign, a template-free graph-to-sequence pipeline for retrosynthesis prediction. By combining graph neural networks and Transformers, our method can more effectively leverage the inherent graph structure of molecules. Based on the fact that the majority of molecule structures remain unchanged during a chemical reaction, we propose a simple yet effective SMILES alignment technique to facilitate the reuse of unchanged structures for reactant generation. Extensive experiments show that our method substantially outperforms state-of-the-art template-free and semi-template-based approaches. Importantly, our template-free method achieves effectiveness comparable to, or even surpasses, established powerful template-based methods.
SCIENTIFIC CONTRIBUTION: We present a novel graph-to-sequence template-free retrosynthesis prediction pipeline that overcomes the limitations of Transformer-based methods in molecular representation learning and insufficient utilization of chemical information. We propose an unsupervised learning mechanism for establishing product-atom correspondence with reactant SMILES tokens, achieving even better results than supervised SMILES alignment methods. Extensive experiments demonstrate that UAlign significantly outperforms state-of-the-art template-free methods and rivals or surpasses template-based approaches, with up to 5% (top-5) and 5.4% (top-10) increased accuracy over the strongest baseline.
Affiliation(s)
- Kaipeng Zeng
  - MoE Key Laboratory of Artificial Intelligence, AI Institute, Shanghai Jiao Tong University, Shanghai, 200240, Shanghai, China
- Bo Yang
  - Frontiers Science Center for Transformative Molecules (FSCTM), Zhangjiang Institute for Advanced Study, Shanghai Jiao Tong University, Shanghai, 200240, Shanghai, China
- Xin Zhao
  - MoE Key Laboratory of Artificial Intelligence, AI Institute, Shanghai Jiao Tong University, Shanghai, 200240, Shanghai, China
- Yu Zhang
  - MoE Key Laboratory of Artificial Intelligence, AI Institute, Shanghai Jiao Tong University, Shanghai, 200240, Shanghai, China
- Fan Nie
  - Department of Computer Science and Engineering, Shanghai Jiao Tong University, Shanghai, 200240, Shanghai, China
- Xiaokang Yang
  - MoE Key Laboratory of Artificial Intelligence, AI Institute, Shanghai Jiao Tong University, Shanghai, 200240, Shanghai, China
- Yaohui Jin
  - MoE Key Laboratory of Artificial Intelligence, AI Institute, Shanghai Jiao Tong University, Shanghai, 200240, Shanghai, China.
- Yanyan Xu
  - MoE Key Laboratory of Artificial Intelligence, AI Institute, Shanghai Jiao Tong University, Shanghai, 200240, Shanghai, China.
10
Yao L, Guo W, Wang Z, Xiang S, Liu W, Ke G. Node-Aligned Graph-to-Graph: Elevating Template-free Deep Learning Approaches in Single-Step Retrosynthesis. JACS AU 2024; 4:992-1003. [PMID: 38559728 PMCID: PMC10976575 DOI: 10.1021/jacsau.3c00737] [Received: 11/21/2023] [Revised: 01/19/2024] [Accepted: 01/29/2024] [Indexed: 04/04/2024]
Abstract
Single-step retrosynthesis in organic chemistry increasingly benefits from deep learning (DL) techniques in computer-aided synthesis design. While template-free DL models are flexible and promising for retrosynthesis prediction, they often ignore vital 2D molecular information and struggle with atom alignment for node generation, resulting in lower performance compared to the template-based and semi-template-based methods. To address these issues, we introduce node-aligned graph-to-graph (NAG2G), a transformer-based template-free DL model. NAG2G combines 2D molecular graphs and 3D conformations to retain comprehensive molecular details and incorporates product-reactant atom mapping through node alignment, which determines the order of the node-by-node graph outputs process in an autoregressive manner. Through rigorous benchmarking and detailed case studies, we have demonstrated that NAG2G stands out with its remarkable predictive accuracy on the expansive data sets of USPTO-50k and USPTO-FULL. Moreover, the model's practical utility is underscored by its successful prediction of synthesis pathways for multiple drug candidate molecules. This proves not only NAG2G's robustness but also its potential to revolutionize the prediction of complex chemical synthesis processes for future synthetic route design tasks.
Affiliation(s)
- Lin Yao
  - DP Technology, Beijing 100080, China
- Wentao Guo
  - DP Technology, Beijing 100080, China
  - Department of Chemistry, University of California, Davis, California 95616, United States
  - Department of Statistics, University of California, Davis, California 95616, United States
- Zhen Wang
  - DP Technology, Beijing 100080, China
- Guolin Ke
  - DP Technology, Beijing 100080, China
11
Yan Y, Zhao Y, Yao H, Feng J, Liang L, Han W, Xu X, Pu C, Zang C, Chen L, Li Y, Liu H, Lu T, Chen Y, Zhang Y. RPBP: Deep Retrosynthesis Reaction Prediction Based on Byproducts. J Chem Inf Model 2023; 63:5956-5970. [PMID: 37724339 DOI: 10.1021/acs.jcim.3c00274] [Indexed: 09/20/2023]
Abstract
Retrosynthesis prediction is crucial in organic synthesis and drug discovery, aiding chemists in designing efficient synthetic routes for target molecules. Data-driven deep retrosynthesis prediction has gained importance due to new algorithms and enhanced computing power. Although existing models show certain predictive power on the USPTO-50K benchmark data set, no one considers the effects of byproducts during the prediction process, which may be due to the lack of byproduct information in the benchmark data set. Here, we propose a novel two-stage retrosynthesis reaction prediction framework based on byproducts called RPBP. First, RPBP predicts the byproduct involved in the reaction based on the product molecule. Then, it handles an end-to-end prediction problem based on the prediction of reactants by product and byproduct. Unlike other methods that first identify the potential reaction center and then predict reactant molecules, RPBP considers additional information from byproducts, such as reaction reagents, conditions, and sites. Interestingly, adding byproducts reduces model learning complexity in natural language processing (NLP). Our RPBP model achieves 54.7% and 66.6% top-1 retrosynthesis prediction accuracy when the reaction class is unknown and known, respectively. It outperforms existing methods for known-class reactions, thanks to the rich chemical information in byproducts. The prediction of four kinase drugs from the literature demonstrates the model's practicality and potential to accelerate drug discovery.
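The top-k accuracy figures quoted in abstracts such as this one are typically exact-match rates over ranked predictions. A minimal sketch of that metric follows; it assumes predictions and ground truth are already canonicalised SMILES strings so that plain string equality is meaningful, and the function name `top_k_accuracy` and the three toy examples are hypothetical rather than taken from any of the listed papers.

```python
def top_k_accuracy(ranked_predictions, ground_truth, k):
    """Fraction of targets whose true reactant string appears among the first k predictions."""
    if len(ranked_predictions) != len(ground_truth):
        raise ValueError("need one ranked prediction list per ground-truth entry")
    hits = sum(
        1
        for preds, truth in zip(ranked_predictions, ground_truth)
        if truth in preds[:k]
    )
    return hits / len(ground_truth)


# Hypothetical ranked reactant predictions for three products (best guess first).
ranked = [
    ["CC(=O)O.OCC", "CC(=O)Cl.OCC"],   # correct at rank 1
    ["CC(=O)Cl.NC", "CC(=O)O.NC"],     # correct at rank 2
    ["Brc1ccccc1", "Ic1ccccc1"],       # correct answer not predicted
]
truth = ["CC(=O)O.OCC", "CC(=O)O.NC", "Clc1ccccc1"]

top1 = top_k_accuracy(ranked, truth, k=1)
top2 = top_k_accuracy(ranked, truth, k=2)
print(f"top-1: {top1:.3f}, top-2: {top2:.3f}")
```

In practice the comparison is only fair if both sides use the same canonicalisation, since one molecule can be written as many different SMILES strings.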
Affiliation(s)
- Yingchao Yan
  - Laboratory of Molecular Design and Drug Discovery, School of Science, China Pharmaceutical University, 639 Longmian Avenue, Nanjing 211198, China
- Yang Zhao
  - Laboratory of Molecular Design and Drug Discovery, School of Science, China Pharmaceutical University, 639 Longmian Avenue, Nanjing 211198, China
- Huifeng Yao
  - Laboratory of Molecular Design and Drug Discovery, School of Science, China Pharmaceutical University, 639 Longmian Avenue, Nanjing 211198, China
- Jie Feng
  - State Key Laboratory of Natural Medicines, China Pharmaceutical University, 24 Tongjiaxiang, Nanjing 210009, China
- Li Liang
  - Laboratory of Molecular Design and Drug Discovery, School of Science, China Pharmaceutical University, 639 Longmian Avenue, Nanjing 211198, China
- Weijie Han
  - Laboratory of Molecular Design and Drug Discovery, School of Science, China Pharmaceutical University, 639 Longmian Avenue, Nanjing 211198, China
- Xiaohe Xu
  - Laboratory of Molecular Design and Drug Discovery, School of Science, China Pharmaceutical University, 639 Longmian Avenue, Nanjing 211198, China
- Chengtao Pu
  - Laboratory of Molecular Design and Drug Discovery, School of Science, China Pharmaceutical University, 639 Longmian Avenue, Nanjing 211198, China
- Chengdong Zang
  - Laboratory of Molecular Design and Drug Discovery, School of Science, China Pharmaceutical University, 639 Longmian Avenue, Nanjing 211198, China
- Lingfeng Chen
  - Laboratory of Molecular Design and Drug Discovery, School of Science, China Pharmaceutical University, 639 Longmian Avenue, Nanjing 211198, China
- Yuanyuan Li
  - Laboratory of Molecular Design and Drug Discovery, School of Science, China Pharmaceutical University, 639 Longmian Avenue, Nanjing 211198, China
- Haichun Liu
  - Laboratory of Molecular Design and Drug Discovery, School of Science, China Pharmaceutical University, 639 Longmian Avenue, Nanjing 211198, China
- Tao Lu
  - Laboratory of Molecular Design and Drug Discovery, School of Science, China Pharmaceutical University, 639 Longmian Avenue, Nanjing 211198, China
  - State Key Laboratory of Natural Medicines, China Pharmaceutical University, 24 Tongjiaxiang, Nanjing 210009, China
- Yadong Chen
  - Laboratory of Molecular Design and Drug Discovery, School of Science, China Pharmaceutical University, 639 Longmian Avenue, Nanjing 211198, China
- Yanmin Zhang
  - Laboratory of Molecular Design and Drug Discovery, School of Science, China Pharmaceutical University, 639 Longmian Avenue, Nanjing 211198, China