1. Zhao PC, Wei XX, Wang Q, Wang QH, Li JN, Shang J, Lu C, Shi JY. Single-step retrosynthesis prediction via multitask graph representation learning. Nat Commun 2025; 16:814. [PMID: 39827189 PMCID: PMC11742932 DOI: 10.1038/s41467-025-56062-y] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/10/2023] [Accepted: 01/08/2025] [Indexed: 01/22/2025] Open
Abstract
Inferring appropriate synthesis reaction (i.e., retrosynthesis) routes for newly designed molecules is vital. Recently, computational methods have produced promising single-step retrosynthesis predictions. However, template-based methods are limited by the known synthesis templates; template-free methods are weakly interpretable; and semi template-based methods are deficient with regard to utilizing the associations between chemical entities. To address these issues, this paper leverages the intra-associations between synthons, the inter-associations between synthons and leaving groups (LGs), and the intra-associations between LGs. It develops a multitask graph representation learning model for single-step retrosynthesis prediction (Retro-MTGR) to solve reaction centre deduction and LG identification simultaneously. A comparison with 16 state-of-the-art methods first demonstrates the superiority of Retro-MTGR. Then, its robustness and scalability and the contributions of its crucial components are validated. More importantly, it can determine whether a bond can be a reaction centre and what LGs are appropriate for a given synthon, respectively. The answers reflect underlying chemical synthesis rules, especially opposite electrical properties between chemical entities (e.g., reaction sites, synthons, and LGs). Finally, case studies demonstrate that the retrosynthesis routes inferred by Retro-MTGR are promising for single-step synthesis reactions. The code and data of this study are freely available at https://doi.org/10.5281/zenodo.14346324.
Affiliation(s)
- Peng-Cheng Zhao
  - School of Life Sciences, Northwestern Polytechnical University, Xi'an, China
- Xue-Xin Wei
  - School of Life Sciences, Northwestern Polytechnical University, Xi'an, China
- Qiong Wang
  - School of Life Sciences, Northwestern Polytechnical University, Xi'an, China
- Qi-Hao Wang
  - School of Chemistry and Chemical Engineering, Northwestern Polytechnical University, Xi'an, China
- Jia-Ning Li
  - School of Life Sciences, Northwestern Polytechnical University, Xi'an, China
- Jie Shang
  - School of Life Sciences, Northwestern Polytechnical University, Xi'an, China
- Cheng Lu
  - Institute of Basic Research in Clinical Medicine, China Academy of Chinese Medical Sciences, Beijing, China
- Jian-Yu Shi
  - School of Life Sciences, Northwestern Polytechnical University, Xi'an, China
2. Tang S, Zhang Y, Chatterjee P. PepTune: De Novo Generation of Therapeutic Peptides with Multi-Objective-Guided Discrete Diffusion. arXiv 2025; arXiv:2412.17780v3. [PMID: 39764410 PMCID: PMC11703324] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/30/2025]
Abstract
Peptide therapeutics, a major class of medicines, have achieved remarkable success across diseases such as diabetes and cancer, with landmark examples such as GLP-1 receptor agonists revolutionizing the treatment of type-2 diabetes and obesity. Despite their success, designing peptides that satisfy multiple conflicting objectives, such as target binding affinity, solubility, and membrane permeability, remains a major challenge. Classical drug development and target structure-based design are ineffective for such tasks, as they fail to optimize global functional properties critical for therapeutic efficacy. Existing generative frameworks are largely limited to continuous spaces, unconditioned outputs, or single-objective guidance, making them unsuitable for discrete sequence optimization across multiple properties. To address this, we present PepTune, a multi-objective discrete diffusion model for the simultaneous generation and optimization of therapeutic peptide SMILES. Built on the Masked Discrete Language Model (MDLM) framework, PepTune ensures valid peptide structures with bond-dependent masking schedules and penalty-based objectives. To guide the diffusion process, we propose a Monte Carlo Tree Search (MCTS)-based strategy that balances exploration and exploitation to iteratively refine Pareto-optimal sequences. MCTS integrates classifier-based rewards with search-tree expansion, overcoming gradient estimation challenges and data sparsity. Using PepTune, we generate diverse, chemically modified peptides optimized for multiple therapeutic properties, including target binding affinity, membrane permeability, solubility, hemolysis, and non-fouling for various disease-relevant targets. In total, our results demonstrate that MCTS-guided masked discrete diffusion is a powerful and modular approach for multi-objective sequence design in discrete state spaces.
Affiliation(s)
- Sophia Tang
  - Department of Biomedical Engineering, Duke University
  - Management and Technology Program, University of Pennsylvania
- Yinuo Zhang
  - Department of Biomedical Engineering, Duke University
  - Center of Computational Biology, Duke-NUS Medical School
- Pranam Chatterjee
  - Department of Biomedical Engineering, Duke University
  - Department of Computer Science, Duke University
  - Department of Biostatistics and Bioinformatics, Duke University
3. Kim H, Ryu S, Jung N, Yang J, Seok C. CSearch: chemical space search via virtual synthesis and global optimization. J Cheminform 2024; 16:137. [PMID: 39639340 PMCID: PMC11622599 DOI: 10.1186/s13321-024-00936-8] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/09/2024] [Accepted: 11/22/2024] [Indexed: 12/07/2024] Open
Abstract
The two key components of computational molecular design are virtually generating molecules and predicting the properties of these generated molecules. This study focuses on an effective method for molecular generation through virtual synthesis and global optimization of a given objective function. Using a pre-trained graph neural network (GNN) objective function to approximate the docking energies of compounds for four target receptors, we generated highly optimized compounds with 300-400 times less computational effort compared to virtual compound library screening. These optimized compounds exhibit similar synthesizability and diversity to known binders with high potency and are notably novel compared to library chemicals or known ligands. This method, called CSearch, can be effectively utilized to generate chemicals optimized for a given objective function. With the GNN function approximating docking energies, CSearch generated molecules with predicted binding poses to the target receptors similar to known inhibitors, demonstrating its effectiveness in producing drug-like binders.
Scientific Contribution: We have developed a method for effectively exploring the chemical space of drug-like molecules using a global optimization algorithm with fragment-based virtual synthesis. The compounds generated using this method optimize the given objective function efficiently and are synthesizable like commercial library compounds. Furthermore, they are diverse, novel drug-like molecules with properties similar to known inhibitors for target receptors.
Affiliation(s)
- Hakjean Kim
  - Department of Chemistry, Seoul National University, Seoul, 08826, Republic of Korea
- Nuri Jung
  - Department of Chemistry, Seoul National University, Seoul, 08826, Republic of Korea
- Chaok Seok
  - Department of Chemistry, Seoul National University, Seoul, 08826, Republic of Korea
  - Galux Inc, Seoul, 08738, Republic of Korea
4. Yang Y, Chen G, Li J, Li J, Zhang O, Zhang X, Li L, Hao J, Wang E, Heng PA. Enabling target-aware molecule generation to follow multi objectives with Pareto MCTS. Commun Biol 2024; 7:1074. [PMID: 39223327 PMCID: PMC11368924 DOI: 10.1038/s42003-024-06746-w] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/03/2024] [Accepted: 08/16/2024] [Indexed: 09/04/2024] Open
Abstract
Target-aware drug discovery has greatly accelerated the drug discovery process to design small-molecule ligands with high binding affinity to disease-related protein targets. Conditioned on targeted proteins, previous works utilize various kinds of deep generative models and have shown great potential in generating molecules with strong protein-ligand binding interactions. However, beyond binding affinity, effective drug molecules must manifest other essential properties such as high drug-likeness, which are not explicitly addressed by current target-aware generative methods. In this article, aiming to bridge the gap of multi-objective target-aware molecule generation in the field of deep learning-based drug discovery, we propose ParetoDrug, a Pareto Monte Carlo Tree Search (MCTS) generation algorithm. ParetoDrug searches molecules on the Pareto Front in chemical space using MCTS to enable synchronous optimization of multiple properties. Specifically, ParetoDrug utilizes pretrained atom-by-atom autoregressive generative models for the exploration guidance to desired molecules during MCTS searching. Besides, when selecting the next atom symbol, a scheme named ParetoPUCT is proposed to balance exploration and exploitation. Benchmark experiments and case studies demonstrate that ParetoDrug is highly effective in traversing the large and complex chemical space to discover novel compounds with satisfactory binding affinities and drug-like properties for various multi-objective target-aware drug discovery tasks.
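
To make the notion of a Pareto front in the abstract above concrete, here is a minimal sketch of non-dominated filtering over candidate molecules scored on two objectives. The function names (`dominates`, `pareto_front`), the molecule labels, and the score values are hypothetical and chosen only for illustration. This is not ParetoDrug's implementation, which combines Pareto selection with MCTS and a pretrained generative model; the sketch only shows what "being on the Pareto front" means when higher scores are better.

```python
from typing import Dict, List, Tuple

Scores = Tuple[float, ...]


def dominates(a: Scores, b: Scores) -> bool:
    """Return True if `a` is at least as good as `b` on every objective
    and strictly better on at least one (higher is better)."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))


def pareto_front(candidates: Dict[str, Scores]) -> List[str]:
    """Return the names of candidates that no other candidate dominates."""
    front = []
    for name, scores in candidates.items():
        dominated = any(
            dominates(other_scores, scores)
            for other_name, other_scores in candidates.items()
            if other_name != name
        )
        if not dominated:
            front.append(name)
    return front


# Hypothetical scores: (binding score, drug-likeness), both "higher is better".
candidates = {
    "mol_a": (0.9, 0.4),
    "mol_b": (0.7, 0.8),
    "mol_c": (0.6, 0.3),  # worse than mol_a and mol_b on both objectives
}

front = pareto_front(candidates)
print(front)
```

Neither `mol_a` nor `mol_b` is better than the other on both objectives, so both stay on the front, while `mol_c` is dropped.
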
Affiliation(s)
- Yaodong Yang
  - Department of Computer Science and Engineering, The Chinese University of Hong Kong, Hong Kong, China
- Jinpeng Li
  - Department of Computer Science and Engineering, The Chinese University of Hong Kong, Hong Kong, China
- Jianye Hao
  - Noah's Ark Lab, Huawei, Shenzhen, China
- Pheng-Ann Heng
  - Department of Computer Science and Engineering, The Chinese University of Hong Kong, Hong Kong, China
5. Lavecchia A. Advancing drug discovery with deep attention neural networks. Drug Discov Today 2024; 29:104067. [PMID: 38925473 DOI: 10.1016/j.drudis.2024.104067] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/17/2024] [Revised: 06/10/2024] [Accepted: 06/19/2024] [Indexed: 06/28/2024]
Abstract
In the dynamic field of drug discovery, deep attention neural networks are revolutionizing our approach to complex data. This review explores the attention mechanism and its extended architectures, including graph attention networks (GATs), transformers, bidirectional encoder representations from transformers (BERT), generative pre-trained transformers (GPTs) and bidirectional and auto-regressive transformers (BART). Delving into their core principles and multifaceted applications, we uncover their pivotal roles in catalyzing de novo drug design, predicting intricate molecular properties and deciphering elusive drug-target interactions. Despite challenges, these attention-based architectures hold unparalleled promise to drive transformative breakthroughs and accelerate progress in pharmaceutical research.
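
For readers unfamiliar with the attention mechanism that the review above surveys, the following is a minimal pure-Python sketch of scaled dot-product attention, the basic operation underlying transformers. The function names and the toy inputs are assumptions made for illustration; the sketch is not taken from the review and omits learned projections, multiple heads, and masking.

```python
import math
from typing import List

Matrix = List[List[float]]


def softmax(xs: List[float]) -> List[float]:
    """Numerically stable softmax over a list of scores."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def scaled_dot_product_attention(queries: Matrix, keys: Matrix, values: Matrix) -> Matrix:
    """For each query, weight the value vectors by softmax(q . k / sqrt(d_k))."""
    d_k = len(keys[0])
    outputs = []
    for q in queries:
        # Similarity of this query to every key, scaled by sqrt(d_k).
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d_k) for k in keys]
        weights = softmax(scores)
        # Weighted average of the value vectors.
        out = [
            sum(w * v[j] for w, v in zip(weights, values))
            for j in range(len(values[0]))
        ]
        outputs.append(out)
    return outputs


# Toy input: two identical keys, so the single query attends to both equally
# and the output is the plain average of the two value vectors.
queries = [[1.0, 1.0]]
keys = [[1.0, 0.0], [1.0, 0.0]]
values = [[0.0, 2.0], [4.0, 6.0]]

result = scaled_dot_product_attention(queries, keys, values)
print(result)
```
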
Affiliation(s)
- Antonio Lavecchia
  - Drug Discovery Laboratory, Department of Pharmacy, University of Napoli Federico II, I-80131 Naples, Italy
6. Nowak D, Huczyński A, Bachorz RA, Hoffmann M. Machine Learning Application for Medicinal Chemistry: Colchicine Case, New Structures, and Anticancer Activity Prediction. Pharmaceuticals (Basel) 2024; 17:173. [PMID: 38399388 PMCID: PMC10892630 DOI: 10.3390/ph17020173] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/20/2023] [Revised: 01/02/2024] [Accepted: 01/12/2024] [Indexed: 02/25/2024] Open
Abstract
In the contemporary era, the exploration of machine learning (ML) has gained widespread attention and is being leveraged to augment traditional methodologies in quantitative structure-activity relationship (QSAR) investigations. The principal objective of this research was to assess the anticancer potential of colchicine-based compounds across five distinct cell lines. This research endeavor ultimately sought to construct ML models proficient in forecasting anticancer activity as quantified by the IC50 value, while concurrently generating innovative colchicine-derived compounds. The resistance index (RI) is computed to evaluate the drug resistance exhibited by LoVo/DX cells relative to LoVo cancer cell lines. Meanwhile, the selectivity index (SI) is computed to determine the potential of a compound to demonstrate superior efficacy against tumor cells compared to its toxicity against normal cells, such as BALB/3T3. We introduce a novel ML system adept at recommending novel chemical structures predicated on known anticancer activity. Our investigation entailed the assessment of inhibitory capabilities across five cell lines, employing predictive models utilizing various algorithms, including random forest, decision tree, support vector machines, k-nearest neighbors, and multiple linear regression. The most proficient model, as determined by quality metrics, was employed to predict the anticancer activity of novel colchicine-based compounds. This methodological approach yielded the establishment of a library encompassing new colchicine-based compounds, each assigned an IC50 value. Additionally, this study resulted in the development of a validated predictive model, capable of reasonably estimating IC50 values based on molecular structure input.
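
The abstract above mentions a resistance index (RI) and a selectivity index (SI) without giving formulas. The sketch below assumes the commonly used ratio definitions: RI as the IC50 against the resistant line divided by the IC50 against the parental line, and SI as the IC50 against normal cells divided by the IC50 against tumor cells. These definitions, the function names, and the IC50 values are assumptions for illustration and should be checked against the paper itself.

```python
def resistance_index(ic50_resistant: float, ic50_parental: float) -> float:
    """Assumed definition: RI = IC50(resistant line) / IC50(parental line).
    Larger values suggest the resistant line (e.g. LoVo/DX) tolerates the compound better."""
    if ic50_resistant <= 0 or ic50_parental <= 0:
        raise ValueError("IC50 values must be positive")
    return ic50_resistant / ic50_parental


def selectivity_index(ic50_normal: float, ic50_tumor: float) -> float:
    """Assumed definition: SI = IC50(normal cells) / IC50(tumor cells).
    Values above 1 suggest the compound is more potent against tumor cells."""
    if ic50_normal <= 0 or ic50_tumor <= 0:
        raise ValueError("IC50 values must be positive")
    return ic50_normal / ic50_tumor


# Hypothetical IC50 values in the same concentration unit (not from the paper).
ic50_lovo = 2.0      # parental tumor line (LoVo)
ic50_lovo_dx = 8.0   # doxorubicin-resistant line (LoVo/DX)
ic50_balb = 10.0     # normal cells (BALB/3T3)

ri = resistance_index(ic50_lovo_dx, ic50_lovo)
si = selectivity_index(ic50_balb, ic50_lovo)
print(f"RI = {ri}, SI = {si}")
```

Both indices are unitless ratios, so they are only meaningful when the two IC50 values are expressed in the same concentration unit.
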
Affiliation(s)
- Damian Nowak
  - Department of Quantum Chemistry, Faculty of Chemistry, Adam Mickiewicz University in Poznan, Uniwersytetu Poznanskiego 8, 61-614 Poznan, Poland
- Adam Huczyński
  - Department of Medical Chemistry, Faculty of Chemistry, Adam Mickiewicz University in Poznan, Uniwersytetu Poznanskiego 8, 61-614 Poznan, Poland
- Rafał Adam Bachorz
  - Institute of Medical Biology of Polish Academy of Sciences, Lodowa 106, 93-232 Lodz, Poland
  - Institute of Computing Science, Faculty of Computing, Poznań University of Technology, Piotrowo 2, 60-965 Poznań, Poland
- Marcin Hoffmann
  - Department of Quantum Chemistry, Faculty of Chemistry, Adam Mickiewicz University in Poznan, Uniwersytetu Poznanskiego 8, 61-614 Poznan, Poland
7. Zhang Y, Liu C, Liu M, Liu T, Lin H, Huang CB, Ning L. Attention is all you need: utilizing attention in AI-enabled drug discovery. Brief Bioinform 2023; 25:bbad467. [PMID: 38189543 PMCID: PMC10772984 DOI: 10.1093/bib/bbad467] [Citation(s) in RCA: 10] [Impact Index Per Article: 5.0] [Received: 09/30/2023] [Revised: 11/03/2023] [Accepted: 11/25/2023] [Indexed: 01/09/2024] Open
Abstract
Recently, attention mechanism and derived models have gained significant traction in drug development due to their outstanding performance and interpretability in handling complex data structures. This review offers an in-depth exploration of the principles underlying attention-based models and their advantages in drug discovery. We further elaborate on their applications in various aspects of drug development, from molecular screening and target binding to property prediction and molecule generation. Finally, we discuss the current challenges faced in the application of attention mechanisms and Artificial Intelligence technologies, including data quality, model interpretability and computational resource constraints, along with future directions for research. Given the accelerating pace of technological advancement, we believe that attention-based models will have an increasingly prominent role in future drug discovery. We anticipate that these models will usher in revolutionary breakthroughs in the pharmaceutical domain, significantly accelerating the pace of drug development.
Affiliation(s)
- Yang Zhang
  - Innovative Institute of Chinese Medicine and Pharmacy, Academy for Interdiscipline, Chengdu University of Traditional Chinese Medicine, Chengdu, China
- Caiqi Liu
  - Department of Gastrointestinal Medical Oncology, Harbin Medical University Cancer Hospital, No.150 Haping Road, Nangang District, Harbin, Heilongjiang 150081, China
  - Key Laboratory of Molecular Oncology of Heilongjiang Province, No.150 Haping Road, Nangang District, Harbin, Heilongjiang 150081, China
- Mujiexin Liu
  - Chongqing Key Laboratory of Sichuan-Chongqing Co-construction for Diagnosis and Treatment of Infectious Diseases Integrated Traditional Chinese and Western Medicine, College of Medical Technology, Chengdu University of Traditional Chinese Medicine, Chengdu, China
- Tianyuan Liu
  - Graduate School of Science and Technology, University of Tsukuba, Tsukuba, Japan
- Hao Lin
  - School of Life Science and Technology, University of Electronic Science and Technology of China, Chengdu 610054, China
- Cheng-Bing Huang
  - School of Computer Science and Technology, Aba Teachers University, Aba, China
- Lin Ning
  - Yangtze Delta Region Institute (Quzhou), University of Electronic Science and Technology of China, Quzhou, Zhejiang, China
  - School of Healthcare Technology, Chengdu Neusoft University, Chengdu 611844, China