1
Ziv Y, Imrie F, Marsden B, Deane CM. MolSnapper: Conditioning Diffusion for Structure-Based Drug Design. J Chem Inf Model 2025; 65:4263-4273. [PMID: 40248896 PMCID: PMC12076506 DOI: 10.1021/acs.jcim.4c02008] [Received: 11/09/2024] [Revised: 04/02/2025] [Accepted: 04/04/2025] [Indexed: 04/19/2025]
Abstract
Generative models have emerged as potentially powerful methods for molecular design, yet challenges persist in generating molecules that effectively bind to the intended target. The ability to control the design process and incorporate prior knowledge would be highly beneficial for better tailoring molecules to fit specific binding sites. In this paper, we introduce MolSnapper, a novel tool that is able to condition diffusion models for structure-based drug design by seamlessly integrating expert knowledge in the form of 3D pharmacophores. We demonstrate through comprehensive testing on both the CrossDocked and Binding MOAD data sets that our method generates molecules better tailored to fit a given binding site, achieving high structural and chemical similarity to the original molecules. Additionally, MolSnapper yields approximately twice as many valid molecules as alternative methods.
Affiliation(s)
- Yael Ziv
  - Department of Statistics, University of Oxford, St Giles, Oxford OX1 3LB, U.K.
- Fergus Imrie
  - Department of Statistics, University of Oxford, St Giles, Oxford OX1 3LB, U.K.
- Brian Marsden
  - Nuffield Department of Medicine, University of Oxford, Old Road, Oxford OX3 7BN, U.K.
- Charlotte M. Deane
  - Department of Statistics, University of Oxford, St Giles, Oxford OX1 3LB, U.K.
2
Lv W, Jia X, Tang B, Ma C, Fang X, Jin X, Niu Z, Han X. In silico modeling of targeted protein degradation. Eur J Med Chem 2025; 289:117432. [PMID: 40015161 DOI: 10.1016/j.ejmech.2025.117432] [Received: 12/12/2024] [Revised: 02/18/2025] [Accepted: 02/19/2025] [Indexed: 03/01/2025]
Abstract
Targeted protein degradation (TPD) techniques, particularly proteolysis-targeting chimeras (PROTAC) and molecular glue degraders (MGD), have offered novel strategies in drug discovery. With rapid advancement of computer-aided drug design (CADD) and artificial intelligence-driven drug discovery (AIDD) in the biomedical field, a major focus has become how to effectively integrate these technologies into the TPD drug discovery pipeline to accelerate development, shorten timelines, and reduce costs. Currently, the main research directions for applying CADD and AIDD in TPD include: 1) ternary complex modeling; 2) linker generation; 3) strategies to predict degrader targets, activities and ADME/T properties; 4) In silico degrader design and discovery. Models developed in these areas play a crucial role in target identification, drug design, and optimization at various stages of the discovery process. However, the limited size and quality of datasets related to TPD present challenges, leaving room for further improvement in these models. TPD involves the complex ubiquitin-proteasome system, with numerous factors influencing outcomes. Most current models adopt a static perspective to interpret and predict relevant tasks. In the future, it may be necessary to shift toward dynamic approaches that better capture the intricate relationships among these components. Furthermore, incorporating new and diverse chemical spaces will enhance the precision design and application of TPD agents.
Affiliation(s)
- Wenxing Lv
  - Cancer Institute (Key Laboratory of Cancer Prevention and Intervention, China National Ministry of Education) of the Second Affiliated Hospital and Institute of Translational Medicine, Zhejiang University School of Medicine, Hangzhou, 310029, China; Hangzhou Institute of Advanced Technology, Hangzhou, 310000, China.
- Xiaojuan Jia
  - Cancer Institute (Key Laboratory of Cancer Prevention and Intervention, China National Ministry of Education) of the Second Affiliated Hospital and Institute of Translational Medicine, Zhejiang University School of Medicine, Hangzhou, 310029, China.
- Bowen Tang
  - College of Life Sciences, Zhejiang University, Hangzhou, 310058, China; Guangzhou New Block Technology Co., Ltd., Guangzhou, 510000, China.
- Chao Ma
  - Guangzhou New Block Technology Co., Ltd., Guangzhou, 510000, China.
- Xiaopeng Fang
  - Hangzhou Institute of Advanced Technology, Hangzhou, 310000, China.
- Xurui Jin
  - MindRank AI, Hangzhou, 310000, China.
- Zhangming Niu
  - MindRank AI, Hangzhou, 310000, China; National Heart and Lung Institute, Imperial College London, London, SW7 2AZ, UK.
- Xin Han
  - Cancer Institute (Key Laboratory of Cancer Prevention and Intervention, China National Ministry of Education) of the Second Affiliated Hospital and Institute of Translational Medicine, Zhejiang University School of Medicine, Hangzhou, 310029, China; State Key Laboratory for Chemistry and Molecular Engineering of Medicinal Resources (Guangxi Normal University), Guilin, 541004, China.
3
Bassani D, Pavan M, Moro S. Evaluating AutoGrow4 - an open-source toolkit for semi-automated computer-aided drug discovery. Expert Opin Drug Discov 2025:1-10. [PMID: 40299468 DOI: 10.1080/17460441.2025.2499122] [Received: 11/15/2024] [Revised: 04/18/2025] [Accepted: 04/24/2025] [Indexed: 04/30/2025]
Abstract
INTRODUCTION: Drug discovery is a long and expensive process characterized by a high failure rate. To make this process more rational and efficient, scientists always look for new and better ways to design novel ligands for a target of interest. Among different approaches, de novo ones gained popularity in the last decade, thanks to their ability to efficiently explore the chemical space and their increasing reliability in generating high-quality compounds. AutoGrow4 is open-source software for de novo drug design that generates ligands for a given target by exploiting a combination of a genetic algorithm and molecular docking calculations. AREAS COVERED: In the present paper, the authors dissect this program's usefulness and limitations in generating new compounds from a pharmacodynamic and pharmacokinetic perspective. Specifically, this article examines all reported applications of the AutoGrow code in the literature (as retrieved from the Scopus database) from the release of its first version in 2009 to the present. EXPERT OPINION: In the hands of an expert molecular modeler, AutoGrow4 is a useful tool for de novo ligand design. Its modular and open-source codebase offers many protocol customization features. The main downsides are limited control over the pharmacokinetic features of generated ligands and the bias toward high molecular weight compounds.
Affiliation(s)
- Stefano Moro
  - Molecular Modeling Section (MMS), Department of Pharmaceutical and Pharmacological Sciences, University of Padova, Padova, Italy
4
Qin R, Zhang H, Huang W, Shao Z, Lei J. Deep learning-based design and screening of benzimidazole-pyrazine derivatives as adenosine A2B receptor antagonists. J Biomol Struct Dyn 2025; 43:3225-3241. [PMID: 38133953 DOI: 10.1080/07391102.2023.2295974] [Received: 09/16/2023] [Accepted: 12/11/2023] [Indexed: 12/24/2023]
Abstract
The adenosine A2B receptor (A2BAR) is considered a novel potential target for the immunotherapy of cancer, and A2BAR antagonists have an inhibitory effect on tumor growth, proliferation, and metastasis. In our previous studies, we identified a class of benzimidazole-pyrazine scaffolds whose derivatives exhibited the antagonistic effect but lacked subtype selectivity towards A2BAR. In this work, we developed a scaffold-based protocol that incorporates a deep generative model and multilayer virtual screening to design benzimidazole-pyrazine derivatives as potential selective A2BAR antagonists. By utilizing a generative model with reported A2BAR antagonists as the training set, we built up a scaffold-focused library of benzimidazole-pyrazine derivatives and processed a virtual screening protocol to discover potential A2BAR antagonists. Finally, five molecules with different Bemis-Murcko scaffolds were identified and exhibited higher binding free energies than the reference molecule 12o. Further computational analysis revealed that the 3-benzyl derivative ABA-1266 presented high selectivity toward A2BAR and showed preferred druggability, providing a basis for the future development of potent, selective A2BAR antagonists.
Affiliation(s)
- Rui Qin
  - School of Pharmaceutical Sciences, Sun Yat-sen University, Guangzhou, China
  - College of Pharmaceutical Sciences, Zhejiang University, Hangzhou, China
- Hao Zhang
  - School of Pharmaceutical Sciences, Sun Yat-sen University, Guangzhou, China
- Weifeng Huang
  - School of Pharmaceutical Sciences, Sun Yat-sen University, Guangzhou, China
- Zhenglin Shao
  - School of Pharmaceutical Sciences, Sun Yat-sen University, Guangzhou, China
- Jinping Lei
  - School of Pharmaceutical Sciences, Sun Yat-sen University, Guangzhou, China
5
Xie W, Zhang J, Xie Q, Gong C, Ren Y, Xie J, Sun Q, Xu Y, Lai L, Pei J. Accelerating discovery of bioactive ligands with pharmacophore-informed generative models. Nat Commun 2025; 16:2391. [PMID: 40064886 PMCID: PMC11894060 DOI: 10.1038/s41467-025-56349-0] [Received: 01/25/2024] [Accepted: 01/13/2025] [Indexed: 03/14/2025] Open
Abstract
Deep generative models have advanced drug discovery but often generate compounds with limited structural novelty, providing constrained inspiration for medicinal chemists. To address this, we develop TransPharmer, a generative model that integrates ligand-based interpretable pharmacophore fingerprints with a generative pre-training transformer (GPT)-based framework for de novo molecule generation. TransPharmer excels in unconditioned distribution learning, de novo generation, and scaffold elaboration under pharmacophoric constraints. Its unique exploration mode could enhance scaffold hopping, producing structurally distinct but pharmaceutically related compounds. Its efficacy is validated through two case studies involving the dopamine receptor D2 (DRD2) and polo-like kinase 1 (PLK1). Notably, three out of four synthesized PLK1-targeting compounds show submicromolar activities, with the most potent, IIP0943, exhibiting a potency of 5.1 nM. Featuring a new 4-(benzo[b]thiophen-7-yloxy)pyrimidine scaffold, IIP0943 also has high PLK1 selectivity and submicromolar inhibitory activity in HCT116 cell proliferation. TransPharmer offers a promising tool for discovering structurally novel and bioactive ligands.
Affiliation(s)
- Weixin Xie
  - Center for Quantitative Biology, Academy for Advanced Interdisciplinary Studies, Peking University, Beijing, China
- Qin Xie
  - Infinite Intelligence Pharma, Beijing, China
- Yuhao Ren
  - BNLMS, Peking-Tsinghua Center for Life Sciences at the College of Chemistry and Molecular Engineering, Peking University, Beijing, China
- Jin Xie
  - Center for Quantitative Biology, Academy for Advanced Interdisciplinary Studies, Peking University, Beijing, China
- Qi Sun
  - BNLMS, Peking-Tsinghua Center for Life Sciences at the College of Chemistry and Molecular Engineering, Peking University, Beijing, China
  - Peking University Chengdu Academy for Advanced Interdisciplinary Biotechnologies, Chengdu, China
  - Research Unit of Drug Design Method, Chinese Academy of Medical Sciences, Beijing, China
- Youjun Xu
  - Infinite Intelligence Pharma, Beijing, China.
- Luhua Lai
  - Center for Quantitative Biology, Academy for Advanced Interdisciplinary Studies, Peking University, Beijing, China.
  - BNLMS, Peking-Tsinghua Center for Life Sciences at the College of Chemistry and Molecular Engineering, Peking University, Beijing, China.
  - Peking University Chengdu Academy for Advanced Interdisciplinary Biotechnologies, Chengdu, China.
  - Research Unit of Drug Design Method, Chinese Academy of Medical Sciences, Beijing, China.
- Jianfeng Pei
  - Center for Quantitative Biology, Academy for Advanced Interdisciplinary Studies, Peking University, Beijing, China.
  - Research Unit of Drug Design Method, Chinese Academy of Medical Sciences, Beijing, China.
6
Oestreich M, Merdivan E, Lee M, Schultze JL, Piraud M, Becker M. DrugDiff: small molecule diffusion model with flexible guidance towards molecular properties. J Cheminform 2025; 17:23. [PMID: 40001177 PMCID: PMC11854002 DOI: 10.1186/s13321-025-00965-x] [Received: 10/24/2024] [Accepted: 01/27/2025] [Indexed: 02/27/2025] Open
Abstract
With the cost/yield ratio of drug development becoming increasingly unfavourable, recent work has explored machine learning to accelerate early stages of the development process. Given the current success of deep generative models across domains, we here investigated their application to the property-based proposal of new small molecules for drug development. Specifically, we trained a latent diffusion model, DrugDiff, paired with predictor guidance to generate novel compounds with a variety of desired molecular properties. The architecture was designed to be highly flexible and easily adaptable to future scenarios. Our experiments showed successful generation of unique, diverse and novel small molecules with targeted properties. The code is available at https://github.com/MarieOestreich/DrugDiff. SCIENTIFIC CONTRIBUTION: This work expands the use of generative modelling in the field of drug development from previously introduced models for proteins and RNA to the here presented application to small molecules. With small molecules making up the majority of drugs, but simultaneously being difficult to model due to their elaborate chemical rules, this work tackles a new level of difficulty in comparison to sequence-based molecule generation as is the case for proteins and RNA. Additionally, the demonstrated framework is highly flexible, allowing easy addition or removal of considered molecular properties without the need to retrain the model, which makes it highly adaptable to diverse research settings, and it shows compelling performance for a wide variety of targeted molecular properties.
Affiliation(s)
- Marie Oestreich
  - Modular High-Performance Computing and Artificial Intelligence, German Center for Neurodegenerative Diseases (DZNE), Bonn, Germany.
  - Systems Medicine, German Center for Neurodegenerative Diseases (DZNE), Bonn, Germany.
- Erinc Merdivan
  - Helmholtz AI, Helmholtz Munich, Ingolstädter Landstraße 1, 85764, Neuherberg, Germany
- Michael Lee
  - Modular High-Performance Computing and Artificial Intelligence, German Center for Neurodegenerative Diseases (DZNE), Bonn, Germany
  - Systems Medicine, German Center for Neurodegenerative Diseases (DZNE), Bonn, Germany
- Joachim L Schultze
  - Systems Medicine, German Center for Neurodegenerative Diseases (DZNE), Bonn, Germany
  - Genomics & Immunoregulation, Life and Medical Sciences (LIMES) Institute, University of Bonn, Bonn, Germany
  - PRECISE Platform for Single Cell Genomics and Epigenomics, DZNE and University of Bonn, Bonn, Germany
- Marie Piraud
  - Helmholtz AI, Helmholtz Munich, Ingolstädter Landstraße 1, 85764, Neuherberg, Germany
- Matthias Becker
  - Modular High-Performance Computing and Artificial Intelligence, German Center for Neurodegenerative Diseases (DZNE), Bonn, Germany.
  - Systems Medicine, German Center for Neurodegenerative Diseases (DZNE), Bonn, Germany.
7
Cree B, Bieniek MK, Amin S, Kawamura A, Cole DJ. Active learning driven prioritisation of compounds from on-demand libraries targeting the SARS-CoV-2 main protease. Digital Discovery 2025; 4:438-450. [PMID: 39816163 PMCID: PMC11726688 DOI: 10.1039/d4dd00343h] [Received: 10/25/2024] [Accepted: 01/08/2025] [Indexed: 01/18/2025]
Abstract
FEgrow is an open-source software package for building congeneric series of compounds in protein binding pockets. For a given ligand core and receptor structure, it employs hybrid machine learning/molecular mechanics potential energy functions to optimise the bioactive conformers of supplied linkers and functional groups. Here, we introduce significant new functionality to automate, parallelise and accelerate the building and scoring of compound suggestions, such that it can be used for automated de novo design. We interface the workflow with active learning to improve the efficiency of searching the combinatorial space of possible linkers and functional groups, make use of interactions formed by crystallographic fragments in scoring compound designs, and introduce the option to seed the chemical space with molecules available from on-demand chemical libraries. As a test case, we target the main protease (Mpro) of SARS-CoV-2, identifying several small molecules with high similarity to molecules discovered by the COVID moonshot effort, using only structural information from a fragment screen in a fully automated fashion. Finally, we order and test 19 compound designs, of which three show weak activity in a fluorescence-based Mpro assay, but work is needed to further optimise the prioritisation of compounds for purchase. The FEgrow package and full tutorials demonstrating the active learning workflow are available at https://github.com/cole-group/FEgrow.
Affiliation(s)
- Ben Cree
  - School of Natural and Environmental Sciences, Newcastle University, Newcastle upon Tyne NE1 7RU, UK
- Mateusz K Bieniek
  - School of Natural and Environmental Sciences, Newcastle University, Newcastle upon Tyne NE1 7RU, UK
- Siddique Amin
  - School of Natural and Environmental Sciences, Newcastle University, Newcastle upon Tyne NE1 7RU, UK
- Akane Kawamura
  - School of Natural and Environmental Sciences, Newcastle University, Newcastle upon Tyne NE1 7RU, UK
- Daniel J Cole
  - School of Natural and Environmental Sciences, Newcastle University, Newcastle upon Tyne NE1 7RU, UK
8
An Q, Huang L, Wang C, Wang D, Tu Y. New strategies to enhance the efficiency and precision of drug discovery. Front Pharmacol 2025; 16:1550158. [PMID: 40008135 PMCID: PMC11850385 DOI: 10.3389/fphar.2025.1550158] [Received: 12/23/2024] [Accepted: 01/22/2025] [Indexed: 02/27/2025] Open
Abstract
Drug discovery plays a crucial role in medicinal chemistry, serving as the cornerstone for developing new treatments to address a wide range of diseases. This review emphasizes the significance of advanced strategies, such as Click Chemistry, Targeted Protein Degradation (TPD), DNA-Encoded Libraries (DELs), and Computer-Aided Drug Design (CADD), in boosting the drug discovery process. Click Chemistry streamlines the synthesis of diverse compound libraries, facilitating efficient hit discovery and lead optimization. TPD harnesses natural degradation pathways to target previously undruggable proteins, while DELs enable high-throughput screening of millions of compounds. CADD employs computational methods to refine candidate selection and reduce resource expenditure. To demonstrate the utility of these methodologies, we highlight exemplary small molecules discovered in the past decade, along with a summary of marketed drugs and investigational new drugs that exemplify their clinical impact. These examples illustrate how these techniques directly contribute to advancing medicinal chemistry from the bench to bedside. Looking ahead, Artificial Intelligence (AI) technologies and interdisciplinary collaboration are poised to address the growing complexity of drug discovery. By fostering a deeper understanding of these transformative strategies, this review aims to inspire innovative research directions and further advance the field of medicinal chemistry.
Affiliation(s)
- Dongmei Wang
  - Scientific Research and Teaching Department, Public Health Clinical Center of Chengdu, Chengdu, Sichuan, China
- Yalan Tu
  - Scientific Research and Teaching Department, Public Health Clinical Center of Chengdu, Chengdu, Sichuan, China
9
Dai J, Zhou Z, Zhao Y, Kong F, Zhai Z, Zhu Z, Cai J, Huang S, Xu Y, Sun T. Combined usage of ligand- and structure-based virtual screening in the artificial intelligence era. Eur J Med Chem 2025; 283:117162. [PMID: 39673863 DOI: 10.1016/j.ejmech.2024.117162] [Received: 09/18/2024] [Revised: 11/27/2024] [Accepted: 12/09/2024] [Indexed: 12/16/2024]
Abstract
Drug design has always been pursuing techniques with time- and cost-benefits. Virtual screening, generally classified as ligand-based (LBVS) and structure-based (SBVS) approaches, could identify active compounds in the large chemical library to reduce time and cost. Owing to the intrinsic flaws and complementary nature of both approaches, continued efforts have been made to combine them to mitigate limitations. Meanwhile, the emergence of machine learning (ML) endows them with opportunities to leverage vast amounts of data to improve their defects. However, few discussions on how to merge ML-improved LBVS and SBVS have been conducted. Therefore, this review provides insights into combined usage of ML-improved LBVS and SBVS to enlighten medicinal chemists to utilize these joint strategies to lift the screening efficiency as well as AI professionals to design novel techniques.
Affiliation(s)
- Jingyi Dai
  - School of Intelligent Medicine, Chengdu University of Traditional Chinese Medicine, Chengdu, 611137, Sichuan, China.
- Ziyi Zhou
  - School of Intelligent Medicine, Chengdu University of Traditional Chinese Medicine, Chengdu, 611137, Sichuan, China.
- Yanru Zhao
  - School of Intelligent Medicine, Chengdu University of Traditional Chinese Medicine, Chengdu, 611137, Sichuan, China.
- Fanjing Kong
  - School of Intelligent Medicine, Chengdu University of Traditional Chinese Medicine, Chengdu, 611137, Sichuan, China.
- Zhenwei Zhai
  - School of Intelligent Medicine, Chengdu University of Traditional Chinese Medicine, Chengdu, 611137, Sichuan, China.
- Zhishan Zhu
  - School of Intelligent Medicine, Chengdu University of Traditional Chinese Medicine, Chengdu, 611137, Sichuan, China.
- Jie Cai
  - School of Intelligent Medicine, Chengdu University of Traditional Chinese Medicine, Chengdu, 611137, Sichuan, China.
- Sha Huang
  - School of Intelligent Medicine, Chengdu University of Traditional Chinese Medicine, Chengdu, 611137, Sichuan, China.
- Ying Xu
  - Hospital of Chengdu University of Traditional Chinese Medicine, Chengdu, 610072, Sichuan, China.
- Tao Sun
  - School of Intelligent Medicine, Chengdu University of Traditional Chinese Medicine, Chengdu, 611137, Sichuan, China; State Key Laboratory of Southwestern Chinese Medicine Resources, School of Pharmacy, Chengdu University of Traditional Chinese Medicine, Chengdu, 611137, Sichuan, China.
10
Ferla MP, Sánchez-García R, Skyner RE, Gahbauer S, Taylor JC, von Delft F, Marsden BD, Deane CM. Fragmenstein: predicting protein-ligand structures of compounds derived from known crystallographic fragment hits using a strict conserved-binding-based methodology. J Cheminform 2025; 17:4. [PMID: 39806443 PMCID: PMC11731148 DOI: 10.1186/s13321-025-00946-0] [Received: 08/06/2024] [Accepted: 01/01/2025] [Indexed: 01/16/2025] Open
Abstract
Current strategies centred on either merging or linking initial hits from fragment-based drug design (FBDD) crystallographic screens generally do not fully leverage 3D structural information. We show that an algorithmic approach (Fragmenstein) that 'stitches' the ligand atoms from this structural information together can provide more accurate and reliable predictions for protein-ligand complex conformation than general methods such as pharmacophore-constrained docking. This approach works under the assumption of conserved binding: when a larger molecule is designed containing the initial fragment hit, the common substructure between the two will adopt the same binding mode. Fragmenstein either takes the atomic coordinates of ligands from an experimental fragment screen and combines the atoms together to produce a novel merged virtual compound, or uses them to predict the bound complex for a provided molecule. The molecule is then energy minimised under strong constraints to obtain a structurally plausible conformer. The code is available at https://github.com/oxpig/Fragmenstein. Scientific contribution: This work shows the importance of using the coordinates of known binders when predicting the conformation of derivative molecules through a retrospective analysis of the COVID Moonshot data. This method has had a prior real-world application in hit-to-lead screening, yielding a sub-micromolar merger from parent hits in a single round. It is therefore likely to further benefit future drug design campaigns and be integrated in future pipelines.
Affiliation(s)
- Matteo P Ferla
  - Oxford Protein Informatics Group, Department of Statistics, University of Oxford, Oxford, UK.
  - Centre for Medicine Discoveries, Nuffield Department of Medicine, University of Oxford, Oxford, UK.
  - Wellcome Centre for Human Genetics, NIHR Oxford BRC Genomic Medicine, University of Oxford, Oxford, UK.
- Rubén Sánchez-García
  - Oxford Protein Informatics Group, Department of Statistics, University of Oxford, Oxford, UK
- Rachael E Skyner
  - Diamond Light Source, Science and Technology Facilities Council, Oxford, UK
  - OMass Therapeutics, ARC Oxford, Oxford, UK
- Stefan Gahbauer
  - Department of Pharmaceutical Chemistry, University of California San Francisco, San Francisco, USA
- Jenny C Taylor
  - Wellcome Centre for Human Genetics, NIHR Oxford BRC Genomic Medicine, University of Oxford, Oxford, UK
- Frank von Delft
  - Centre for Medicine Discoveries, Nuffield Department of Medicine, University of Oxford, Oxford, UK
  - Diamond Light Source, Science and Technology Facilities Council, Oxford, UK
  - Department of Biochemistry, University of Johannesburg, Johannesburg, South Africa
- Brian D Marsden
  - Centre for Medicine Discoveries, Nuffield Department of Medicine, University of Oxford, Oxford, UK
  - Diamond Light Source, Science and Technology Facilities Council, Oxford, UK
- Charlotte M Deane
  - Oxford Protein Informatics Group, Department of Statistics, University of Oxford, Oxford, UK
11
Aggarwal R, Koes DR. PharmRL: pharmacophore elucidation with deep geometric reinforcement learning. BMC Biol 2024; 22:301. [PMID: 39736736 DOI: 10.1186/s12915-024-02096-5] [Received: 09/04/2024] [Accepted: 12/16/2024] [Indexed: 01/01/2025] Open
Abstract
BACKGROUND: Molecular interactions between proteins and their ligands are important for drug design. A pharmacophore consists of favorable molecular interactions in a protein binding site and can be utilized for virtual screening. Pharmacophores are easiest to identify from co-crystal structures of a bound protein-ligand complex. However, designing a pharmacophore in the absence of a ligand is a much harder task. RESULTS: In this work, we develop a deep learning method that can identify pharmacophores in the absence of a ligand. Specifically, we train a CNN model to identify potential favorable interactions in the binding site, and develop a deep geometric Q-learning algorithm that attempts to select an optimal subset of these interaction points to form a pharmacophore. With this algorithm, we show better prospective virtual screening performance, in terms of F1 scores, on the DUD-E dataset than random selection of ligand-identified features from co-crystal structures. We also conduct experiments on the LIT-PCBA dataset and show that it provides efficient solutions for identifying active molecules. Finally, we test our method by screening the COVID moonshot dataset and show that it would be effective in identifying prospective lead molecules even in the absence of fragment screening experiments. CONCLUSIONS: PharmRL addresses the need for automated methods in pharmacophore design, particularly in cases where a cognate ligand is unavailable. Experimental results demonstrate that PharmRL generates functional pharmacophores. Additionally, we provide a Google Colab notebook to facilitate the use of this method.
Affiliation(s)
- Rishal Aggarwal
  - Joint PhD Program in Computational Biology, Carnegie Mellon University-University of Pittsburgh, Pittsburgh, PA, USA
  - Computational & Systems Biology, University of Pittsburgh, Pittsburgh, PA, USA
- David R Koes
  - Computational & Systems Biology, University of Pittsburgh, Pittsburgh, PA, USA.
12
Seo S, Kim WY. PharmacoNet: deep learning-guided pharmacophore modeling for ultra-large-scale virtual screening. Chem Sci 2024; 15:19473-19487. [PMID: 39568882 PMCID: PMC11575537 DOI: 10.1039/d4sc04854g] [Received: 07/22/2024] [Accepted: 11/03/2024] [Indexed: 11/22/2024] Open
Abstract
As ultra-large-scale virtual screening becomes critical for early-stage drug discovery, highly efficient screening methods are gaining prominence. Deep-learning-based approaches which directly estimate binding affinities without binding conformation have attracted great attention as an alternative solution to molecular docking, but the generalization capability of existing methods in vast chemical space remains uncertain due to restricted training data. Here, we introduce PharmacoNet, the first deep-learning framework for pharmacophore modeling toward ultra-fast virtual screening. PharmacoNet offers fully automated protein-based pharmacophore modeling and evaluates the potency of ligands with a parameterized analytical scoring function, ensuring high generalization ability across unseen targets and ligands. Our benchmark study shows that PharmacoNet is extremely fast yet reasonably accurate compared to traditional docking methods and existing deep learning-based scoring models. We successfully identified selective inhibitors from 187 million compounds against cannabinoid receptors within 21 hours on a single CPU. This study uncovers the hitherto untapped potential of deep learning in pharmacophore modeling.
Affiliation(s)
- Seonghwan Seo
- Department of Chemistry, KAIST 291 Daehak-ro, Yuseong-gu Daejeon 34141 Republic of Korea
| | - Woo Youn Kim
- Department of Chemistry, KAIST 291 Daehak-ro, Yuseong-gu Daejeon 34141 Republic of Korea
- Graduate School of Data Science, KAIST 291 Daehak-ro, Yuseong-gu Daejeon 34141 Republic of Korea
- HITS Inc. 28 Teheran-ro 4-gil, Gangnam-gu Seoul 06234 Republic of Korea
| |
|
13
|
Zhang O, Lin H, Zhang H, Zhao H, Huang Y, Hsieh CY, Pan P, Hou T. Deep Lead Optimization: Leveraging Generative AI for Structural Modification. J Am Chem Soc 2024; 146:31357-31370. [PMID: 39499822 DOI: 10.1021/jacs.4c11686] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/07/2024]
Abstract
The integration of deep learning-based molecular generation models into drug discovery has garnered significant attention for its potential to expedite the development process. Central to this is lead optimization, a critical phase where existing molecules are refined into viable drug candidates. As various methods for deep lead optimization continue to emerge, it is essential to classify these approaches more clearly. We categorize lead optimization methods into two main types: goal-directed and structure-directed. Our focus is on structure-directed optimization, which, while highly relevant to practical applications, is less explored compared to goal-directed methods. Through a systematic review of conventional computational approaches, we identify four tasks specific to structure-directed optimization: fragment replacement, linker design, scaffold hopping, and side-chain decoration. We discuss the motivations, training data construction, and current developments for each of these tasks. Additionally, we use classical optimization taxonomy to classify both goal-directed and structure-directed methods, highlighting their challenges and future development prospects. Finally, we propose a reference protocol for experimental chemists to effectively utilize Generative AI (GenAI)-based tools in structural modification tasks, bridging the gap between methodological advancements and practical applications.
Affiliation(s)
- Odin Zhang
- College of Pharmaceutical Sciences, Zhejiang University, Hangzhou 310058, Zhejiang, China
| | - Haitao Lin
- AI Lab, Research Center for Industries of the Future, Westlake University, Hangzhou 310024, Zhejiang, China
| | - Hui Zhang
- College of Pharmaceutical Sciences, Zhejiang University, Hangzhou 310058, Zhejiang, China
| | - Huifeng Zhao
- College of Pharmaceutical Sciences, Zhejiang University, Hangzhou 310058, Zhejiang, China
| | - Yufei Huang
- AI Lab, Research Center for Industries of the Future, Westlake University, Hangzhou 310024, Zhejiang, China
| | - Chang-Yu Hsieh
- College of Pharmaceutical Sciences, Zhejiang University, Hangzhou 310058, Zhejiang, China
| | - Peichen Pan
- College of Pharmaceutical Sciences, Zhejiang University, Hangzhou 310058, Zhejiang, China
| | - Tingjun Hou
- College of Pharmaceutical Sciences, Zhejiang University, Hangzhou 310058, Zhejiang, China
| |
|
14
|
Abbas A, Ye F. Computational methods and key considerations for in silico design of proteolysis targeting chimera (PROTACs). Int J Biol Macromol 2024; 277:134293. [PMID: 39084437 DOI: 10.1016/j.ijbiomac.2024.134293] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/29/2024] [Revised: 07/19/2024] [Accepted: 07/28/2024] [Indexed: 08/02/2024]
Abstract
Proteolysis-targeting chimeras (PROTACs), as heterobifunctional molecules, have garnered significant attention for their ability to target previously undruggable proteins. Due to the challenges in obtaining crystal structures of PROTAC molecules in the ternary complex, a plethora of computational tools have been developed to aid in PROTAC design. These computational tools can be broadly classified into artificial intelligence (AI)-based or non-AI-based methods. This review aims to provide a comprehensive overview of the latest computational methods for the PROTAC design process, covering both AI and non-AI approaches, from protein selection to ternary complex modeling and prediction. Key considerations for in silico PROTAC design are discussed, along with additional considerations for deploying AI-based models. These considerations are intended to guide subsequent model development in the PROTAC design process. Finally, future directions and recommendations are provided.
Affiliation(s)
- Amr Abbas
- College of Life Sciences and Medicine, Zhejiang Sci-Tech University, Hangzhou 310018, China; Pharmaceutical Chemistry Department, Faculty of Pharmacy, Cairo University, Cairo 11562, Egypt
| | - Fei Ye
- College of Life Sciences and Medicine, Zhejiang Sci-Tech University, Hangzhou 310018, China.
| |
|
15
|
Shi J, Gao S, Zhang PX, Zhang FH, Zhao LX, Ye F, Fu Y. Identification of novel dual-target 4-hydroxyphenylpyruvate dioxygenase & phytoene dehydrogenase inhibitors via multiple virtual screening. Int J Biol Macromol 2024; 276:133892. [PMID: 39019355 DOI: 10.1016/j.ijbiomac.2024.133892] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/27/2024] [Revised: 07/12/2024] [Accepted: 07/13/2024] [Indexed: 07/19/2024]
Abstract
Two important plant enzymes are 4-hydroxyphenylpyruvate dioxygenase (HPPD; EC 1.13.11.27), which is necessary for biosynthesis of plastoquinone and tocopherols, and phytoene dehydrogenase (PDS; EC 1.3.99.26), which plays an important role in colour rendering. Dual-target proteins that inhibit pigment synthesis will prevent resistant weeds and improve the spectral characteristics of herbicides. This study introduces virtual screening of pharmacophores based on the complex structure of the two targets. A three-dimensional database was established by screening 1,492,858 compounds based on the Lipinski principle. HPPD&PDS dual-target receptor-ligand pharmacophore models were then constructed, and nine potential dual-target inhibitors were obtained through pharmacophore modeling, molecular docking, and molecular dynamics simulations. Ultimately, ADMET prediction software yielded three compounds with high potential as dual-target herbicides. The obtained nine inhibitors were stable when combined with both HPPD and PDS proteins. This study offers guidance for the development of HPPD&PDS dual-target inhibitors with novel skeletons.
Affiliation(s)
- Juan Shi
- Department of Chemistry, College of Arts and Sciences, Northeast Agricultural University, Harbin 150030, PR China
| | - Shuang Gao
- Department of Chemistry, College of Arts and Sciences, Northeast Agricultural University, Harbin 150030, PR China; Key Laboratory of Agricultural Functional Molecule Design and Utilization of Heilongjiang Province, PR China
| | - Pan-Xiu Zhang
- Department of Chemistry, College of Arts and Sciences, Northeast Agricultural University, Harbin 150030, PR China
| | - Fang-Hao Zhang
- Department of Chemistry, College of Arts and Sciences, Northeast Agricultural University, Harbin 150030, PR China
| | - Li-Xia Zhao
- Department of Chemistry, College of Arts and Sciences, Northeast Agricultural University, Harbin 150030, PR China; Key Laboratory of Agricultural Functional Molecule Design and Utilization of Heilongjiang Province, PR China
| | - Fei Ye
- Department of Chemistry, College of Arts and Sciences, Northeast Agricultural University, Harbin 150030, PR China; Key Laboratory of Agricultural Functional Molecule Design and Utilization of Heilongjiang Province, PR China.
| | - Ying Fu
- Department of Chemistry, College of Arts and Sciences, Northeast Agricultural University, Harbin 150030, PR China; Key Laboratory of Agricultural Functional Molecule Design and Utilization of Heilongjiang Province, PR China.
| |
|
16
|
Li Y, Avelar PHDC, Chen X, Zhang L, Wu M, Tsoka S. CLigOpt: controllable ligand design through target-specific optimization. Bioinformatics 2024; 40:ii62-ii69. [PMID: 39230708 PMCID: PMC11373314 DOI: 10.1093/bioinformatics/btae396] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 09/05/2024] Open
Abstract
MOTIVATION: A key challenge in deep generative models for molecular design is to navigate random sampling of the vast molecular space, and produce promising molecules that strike a balance across multiple chemical criteria. Fragment-based drug design (FBDD), using fragments as starting points, is an effective way to constrain chemical space and improve generation of biologically active molecules. Furthermore, optimization approaches are often implemented with generative models to search through chemical space, and identify promising samples which satisfy specific properties. Controllable FBDD has promising potential in efficient target-specific ligand design. RESULTS: We propose a controllable FBDD model, CLigOpt, which can generate molecules with desired properties from a given fragment pair. CLigOpt is a variational autoencoder-based model which utilizes co-embeddings of node and edge features to fully mine information from molecular graphs, as well as a multi-objective Controllable Generation Module to generate molecules under property controls. CLigOpt achieves consistently strong performance in generating structurally and chemically valid molecules, as evaluated across six metrics. Applicability is illustrated through ligand candidates for hDHFR and it is shown that the proportion of feasible active molecules from the generated set is increased by 10%. Molecular docking and synthesizability prediction tasks are conducted to prioritize generated molecules to derive potential lead compounds. AVAILABILITY AND IMPLEMENTATION: The source code is available via https://github.com/yutongLi1997/CLigOpt-Controllable-Ligand-Design-through-Target-Specific-Optimisation.
Affiliation(s)
- Yutong Li
- Department of Informatics, King's College London, London WC2B 4BG, United Kingdom
| | - Pedro Henrique da Costa Avelar
- Department of Informatics, King's College London, London WC2B 4BG, United Kingdom
- Institute for Infocomm Research, Agency for Science, Technology and Research (A*STAR), Singapore 138632, Singapore
| | - Xinyue Chen
- Department of Informatics, King's College London, London WC2B 4BG, United Kingdom
| | - Li Zhang
- Institute for Infocomm Research, Agency for Science, Technology and Research (A*STAR), Singapore 138632, Singapore
| | - Min Wu
- Institute for Infocomm Research, Agency for Science, Technology and Research (A*STAR), Singapore 138632, Singapore
| | - Sophia Tsoka
- Department of Informatics, King's College London, London WC2B 4BG, United Kingdom
| |
|
17
|
Yue J, Peng B, Chen Y, Jin J, Zhao X, Shen C, Ji X, Hsieh CY, Song J, Hou T, Deng Y, Wang J. Unlocking comprehensive molecular design across all scenarios with large language model and unordered chemical language. Chem Sci 2024; 15:13727-13740. [PMID: 39211505 PMCID: PMC11352393 DOI: 10.1039/d4sc03744h] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/07/2024] [Accepted: 07/28/2024] [Indexed: 09/04/2024] Open
Abstract
Molecular generation stands at the forefront of AI-driven technologies, playing a crucial role in accelerating the development of small molecule drugs. The intricate nature of practical drug discovery necessitates the development of a versatile molecular generation framework that can tackle diverse drug design challenges. However, existing methodologies often struggle to encompass all aspects of small molecule drug design, particularly those rooted in language models, especially in tasks like linker design, due to the autoregressive nature of large language model-based approaches. To empower a language model for a wider range of molecular design tasks, we introduce an unordered simplified molecular-input line-entry system based on fragments (FU-SMILES). Building upon this foundation, we propose FragGPT, a universal fragment-based molecular generation model. Initially pretrained on extensive molecular datasets, FragGPT utilizes FU-SMILES to facilitate efficient generation across various practical applications, such as de novo molecule design, linker design, R-group exploration, scaffold hopping, and side chain optimization. Furthermore, we integrate conditional generation and reinforcement learning (RL) methodologies to ensure that the generated molecules possess multiple desired biological and physicochemical properties. Experimental results across diverse scenarios validate FragGPT's superiority in generating molecules with enhanced properties and novel structures, outperforming existing state-of-the-art models. Moreover, its robust drug design capability is further corroborated through real-world drug design cases.
Affiliation(s)
- Jie Yue
- College of Information Engineering, Hebei University of Architecture Zhangjiakou 075132 Hebei China
| | - Bingxin Peng
- College of Information Engineering, Hebei University of Architecture Zhangjiakou 075132 Hebei China
- CarbonSilicon AI Technology Co., Ltd Hangzhou 310018 Zhejiang China
| | - Yu Chen
- CarbonSilicon AI Technology Co., Ltd Hangzhou 310018 Zhejiang China
| | - Jieyu Jin
- College of Pharmaceutical Sciences, Zhejiang University Hangzhou 310058 Zhejiang China
| | - Xinda Zhao
- CarbonSilicon AI Technology Co., Ltd Hangzhou 310018 Zhejiang China
| | - Chao Shen
- College of Pharmaceutical Sciences, Zhejiang University Hangzhou 310058 Zhejiang China
- CarbonSilicon AI Technology Co., Ltd Hangzhou 310018 Zhejiang China
| | - Xiangyang Ji
- Department of Automation, Tsinghua University Beijing 100084 China
| | - Chang-Yu Hsieh
- College of Pharmaceutical Sciences, Zhejiang University Hangzhou 310058 Zhejiang China
- CarbonSilicon AI Technology Co., Ltd Hangzhou 310018 Zhejiang China
| | - Jianfei Song
- CarbonSilicon AI Technology Co., Ltd Hangzhou 310018 Zhejiang China
| | - Tingjun Hou
- College of Pharmaceutical Sciences, Zhejiang University Hangzhou 310058 Zhejiang China
- CarbonSilicon AI Technology Co., Ltd Hangzhou 310018 Zhejiang China
| | - Yafeng Deng
- CarbonSilicon AI Technology Co., Ltd Hangzhou 310018 Zhejiang China
- Department of Automation, Tsinghua University Beijing 100084 China
| | - Jike Wang
- College of Pharmaceutical Sciences, Zhejiang University Hangzhou 310058 Zhejiang China
- CarbonSilicon AI Technology Co., Ltd Hangzhou 310018 Zhejiang China
| |
|
18
|
van Tilborg D, Brinkmann H, Criscuolo E, Rossen L, Özçelik R, Grisoni F. Deep learning for low-data drug discovery: Hurdles and opportunities. Curr Opin Struct Biol 2024; 86:102818. [PMID: 38669740 DOI: 10.1016/j.sbi.2024.102818] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/29/2024] [Revised: 03/27/2024] [Accepted: 03/29/2024] [Indexed: 04/28/2024]
Abstract
Deep learning is becoming increasingly relevant in drug discovery, from de novo design to protein structure prediction and synthesis planning. However, it is often challenged by the small data regimes typical of certain drug discovery tasks. In such scenarios, deep learning approaches, which are notoriously 'data-hungry', might fail to live up to their promise. Developing novel approaches to leverage the power of deep learning in low-data scenarios is sparking great attention, and future developments are expected to propel the field further. This mini-review provides an overview of recent low-data-learning approaches in drug discovery, analyzing their hurdles and advantages. Finally, we venture to provide a forecast of future research directions in low-data learning for drug discovery.
Affiliation(s)
- Derek van Tilborg
- Institute for Complex Molecular Systems (ICMS), Department of Biomedical Engineering, Eindhoven University of Technology, PO Box 513, 5600 MB Eindhoven, the Netherlands; Centre for Living Technologies, Alliance TU/e, WUR, UU, UMC Utrecht, Princetonlaan 6, 3584 CB, Utrecht, the Netherlands. https://twitter.com/DerekvTilborg
| | - Helena Brinkmann
- Institute for Complex Molecular Systems (ICMS), Department of Biomedical Engineering, Eindhoven University of Technology, PO Box 513, 5600 MB Eindhoven, the Netherlands. https://twitter.com/hlnbrkmnn
| | - Emanuele Criscuolo
- Institute for Complex Molecular Systems (ICMS), Department of Biomedical Engineering, Eindhoven University of Technology, PO Box 513, 5600 MB Eindhoven, the Netherlands. https://twitter.com/emanuelecriscu9
| | - Luke Rossen
- Institute for Complex Molecular Systems (ICMS), Department of Biomedical Engineering, Eindhoven University of Technology, PO Box 513, 5600 MB Eindhoven, the Netherlands. https://twitter.com/molecular_ml
| | - Rıza Özçelik
- Institute for Complex Molecular Systems (ICMS), Department of Biomedical Engineering, Eindhoven University of Technology, PO Box 513, 5600 MB Eindhoven, the Netherlands; Centre for Living Technologies, Alliance TU/e, WUR, UU, UMC Utrecht, Princetonlaan 6, 3584 CB, Utrecht, the Netherlands. https://twitter.com/Rza_ozcelik
| | - Francesca Grisoni
- Institute for Complex Molecular Systems (ICMS), Department of Biomedical Engineering, Eindhoven University of Technology, PO Box 513, 5600 MB Eindhoven, the Netherlands; Centre for Living Technologies, Alliance TU/e, WUR, UU, UMC Utrecht, Princetonlaan 6, 3584 CB, Utrecht, the Netherlands.
| |
|
19
|
Kim H, Lee K, Kim C, Lim J, Kim WY. DFRscore: Deep Learning-Based Scoring of Synthetic Complexity with Drug-Focused Retrosynthetic Analysis for High-Throughput Virtual Screening. J Chem Inf Model 2024; 64:2432-2444. [PMID: 37651152 DOI: 10.1021/acs.jcim.3c01134] [Citation(s) in RCA: 5] [Impact Index Per Article: 5.0] [Indexed: 09/01/2023]
Abstract
Recently emerging generative AI models enable us to produce a vast number of compounds for potential applications. While they can provide novel molecular structures, the synthetic feasibility of the generated molecules is often questioned. To address this issue, a few recent studies have attempted to use deep learning models to estimate the synthetic accessibility of many molecules rapidly. However, retrosynthetic analysis tools used to train the models rely on reaction templates automatically extracted from a large reaction database that are not domain-specific and may exhibit low chemical correctness. To overcome this limitation, we introduce DFRscore (Drug-Focused Retrosynthetic score), a deep learning-based approach for a more practical assessment of synthetic accessibility in drug discovery. The DFRscore model is trained exclusively on drug-focused reactions, providing a predicted number of minimally required synthetic steps for each compound. This approach enables practitioners to filter out compounds that do not meet their desired level of synthetic accessibility at an early stage of high-throughput virtual screening for accelerated drug discovery. The proposed strategy can be easily adapted to other domains by adjusting the synthesis planning setup of the reaction templates and starting materials.
Affiliation(s)
- Hyeongwoo Kim
- Department of Chemistry, KAIST, 291 Daehak-ro, Yuseong-gu, Daejeon 34141, Republic of Korea
| | - Kyunghoon Lee
- Department of Chemistry, KAIST, 291 Daehak-ro, Yuseong-gu, Daejeon 34141, Republic of Korea
| | - Chansu Kim
- Department of Chemistry, KAIST, 291 Daehak-ro, Yuseong-gu, Daejeon 34141, Republic of Korea
| | - Jaechang Lim
- HITS Incorporation, 124 Teheran-ro, Gangnam-gu, Seoul 06234, Republic of Korea
| | - Woo Youn Kim
- Department of Chemistry, KAIST, 291 Daehak-ro, Yuseong-gu, Daejeon 34141, Republic of Korea
- HITS Incorporation, 124 Teheran-ro, Gangnam-gu, Seoul 06234, Republic of Korea
- AI Institute, KAIST, 291 Daehak-ro, Yuseong-gu, Daejeon 34141, Republic of Korea
| |
|
20
|
Tang Y, Moretti R, Meiler J. Recent Advances in Automated Structure-Based De Novo Drug Design. J Chem Inf Model 2024; 64:1794-1805. [PMID: 38485516 PMCID: PMC10966644 DOI: 10.1021/acs.jcim.4c00247] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/11/2024] [Revised: 02/26/2024] [Accepted: 02/29/2024] [Indexed: 03/26/2024]
Abstract
As the number of determined and predicted protein structures and the size of druglike 'make-on-demand' libraries soar, the time-consuming nature of structure-based computer-aided drug design calls for innovative computational algorithms. De novo drug design introduces in silico heuristics to accelerate searching in the vast chemical space. This review focuses on recent advances in structure-based de novo drug design, ranging from conventional fragment-based methods, evolutionary algorithms, and Metropolis Monte Carlo methods to deep generative models. Due to the historical limitation of de novo drug design generating readily available drug-like molecules, we highlight the synthetic accessibility efforts in each category and the benchmarking strategies taken to validate the proposed framework.
Affiliation(s)
- Yidan Tang
- Department of Chemistry, Vanderbilt University, Nashville, Tennessee 37235, United States
| | - Rocco Moretti
- Department of Chemistry, Vanderbilt University, Nashville, Tennessee 37235, United States
- Center for Structural Biology, Vanderbilt University, Nashville, Tennessee 37240, United States
| | - Jens Meiler
- Department of Chemistry, Vanderbilt University, Nashville, Tennessee 37235, United States
- Center for Structural Biology, Vanderbilt University, Nashville, Tennessee 37240, United States
- Institute of Drug Discovery, Faculty of Medicine, University of Leipzig, 04103 Leipzig, Germany
| |
|
21
|
Wang M, Wu Z, Wang J, Weng G, Kang Y, Pan P, Li D, Deng Y, Yao X, Bing Z, Hsieh CY, Hou T. Genetic Algorithm-Based Receptor Ligand: A Genetic Algorithm-Guided Generative Model to Boost the Novelty and Drug-Likeness of Molecules in a Sampling Chemical Space. J Chem Inf Model 2024; 64:1213-1228. [PMID: 38302422 DOI: 10.1021/acs.jcim.3c01964] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 02/03/2024]
Abstract
Deep learning-based de novo molecular design has recently gained significant attention. While numerous DL-based generative models have been successfully developed for designing novel compounds, the majority of the generated molecules lack sufficiently novel scaffolds or high drug-like profiles. The aforementioned issues may not be fully captured by commonly used metrics for the assessment of molecular generative models, such as novelty, diversity, and quantitative estimation of the drug-likeness score. To address these limitations, we proposed a genetic algorithm-guided generative model called GARel (genetic algorithm-based receptor-ligand interaction generator), a novel framework for training a DL-based generative model to produce drug-like molecules with novel scaffolds. To efficiently train the GARel model, we utilized dense net to update the parameters based on molecules with novel scaffolds and drug-like features. To demonstrate the capability of the GARel model, we used it to design inhibitors for three targets: AA2AR, EGFR, and SARS-Cov2. The results indicate that GARel-generated molecules feature more diverse and novel scaffolds and possess more desirable physicochemical properties and favorable docking scores. Compared with other generative models, GARel makes significant progress in balancing novelty and drug-likeness, providing a promising direction for the further development of DL-based de novo design methodology with potential impacts on drug discovery.
Affiliation(s)
- Mingyang Wang
- College of Pharmaceutical Sciences and Cancer Center, Zhejiang University, Hangzhou 310058, Zhejiang, China
- CarbonSilicon AI Technology Co., Ltd., Hangzhou 310018, Zhejiang, China
| | - Zhengjian Wu
- College of Pharmaceutical Sciences and Cancer Center, Zhejiang University, Hangzhou 310058, Zhejiang, China
- School of Computer Science, Wuhan University, Wuhan 430072, Hubei, China
| | - Jike Wang
- College of Pharmaceutical Sciences and Cancer Center, Zhejiang University, Hangzhou 310058, Zhejiang, China
- CarbonSilicon AI Technology Co., Ltd., Hangzhou 310018, Zhejiang, China
| | - Gaoqi Weng
- College of Pharmaceutical Sciences and Cancer Center, Zhejiang University, Hangzhou 310058, Zhejiang, China
| | - Yu Kang
- College of Pharmaceutical Sciences and Cancer Center, Zhejiang University, Hangzhou 310058, Zhejiang, China
| | - Peichen Pan
- College of Pharmaceutical Sciences and Cancer Center, Zhejiang University, Hangzhou 310058, Zhejiang, China
| | - Dan Li
- College of Pharmaceutical Sciences and Cancer Center, Zhejiang University, Hangzhou 310058, Zhejiang, China
| | - Yafeng Deng
- CarbonSilicon AI Technology Co., Ltd., Hangzhou 310018, Zhejiang, China
| | - Xiaojun Yao
- Dr. Neher's Biophysics Laboratory for Innovative Drug Discovery, Macau Institute for Applied Research in Medicine and Health, State Key Laboratory of Quality Research in Chinese Medicine, Macau University of Science and Technology, Taipa, Macau 999078, China
| | - Zhitong Bing
- Institute of Modern Physics, Chinese Academy of Sciences, Lanzhou, Gansu 730000, China
| | - Chang-Yu Hsieh
- College of Pharmaceutical Sciences and Cancer Center, Zhejiang University, Hangzhou 310058, Zhejiang, China
| | - Tingjun Hou
- College of Pharmaceutical Sciences and Cancer Center, Zhejiang University, Hangzhou 310058, Zhejiang, China
| |
|
22
|
Zhang H, Huang J, Xie J, Huang W, Yang Y, Xu M, Lei J, Chen H. GRELinker: A Graph-Based Generative Model for Molecular Linker Design with Reinforcement and Curriculum Learning. J Chem Inf Model 2024; 64:666-676. [PMID: 38241022 DOI: 10.1021/acs.jcim.3c01700] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 02/13/2024]
Abstract
Fragment-based drug discovery (FBDD) is widely used in drug design. One useful strategy in FBDD is designing linkers for linking fragments to optimize their molecular properties. In the current study, we present a novel generative fragment linking model, GRELinker, which utilizes a gated-graph neural network combined with reinforcement and curriculum learning to generate molecules with desirable attributes. The model has been shown to be efficient in multiple tasks, including controlling log P, optimizing synthesizability or predicted bioactivity of compounds, and generating molecules with high 3D similarity but low 2D similarity to the lead compound. Specifically, our model outperforms the previously reported reinforcement learning (RL) built-in method DRlinker on these benchmark tasks. Moreover, GRELinker has been successfully used in an actual FBDD case to generate optimized molecules with enhanced affinities by employing the docking score as the scoring function in RL. Besides, the implementation of curriculum learning in our framework enables the generation of structurally complex linkers more efficiently. These results demonstrate the benefits and feasibility of GRELinker in linker design for molecular optimization and drug discovery.
Affiliation(s)
- Hao Zhang
- School of Pharmaceutical Science, Sun Yat-sen University, Guangzhou 510006, China
| | - Jinchao Huang
- School of Pharmaceutical Science, Sun Yat-sen University, Guangzhou 510006, China
| | - Junjie Xie
- School of Computer Science and Engineering, Sun Yat-sen University, Guangzhou 510006, China
| | - Weifeng Huang
- School of Pharmaceutical Science, Sun Yat-sen University, Guangzhou 510006, China
| | - Yuedong Yang
- School of Computer Science and Engineering, Sun Yat-sen University, Guangzhou 510006, China
| | - Mingyuan Xu
- Guangzhou National Laboratory, Guangzhou International Bio Island, No. 9 Xin Dao Huan Bei Road, Guangzhou 510005, China
| | - Jinping Lei
- School of Pharmaceutical Science, Sun Yat-sen University, Guangzhou 510006, China
| | - Hongming Chen
- Guangzhou National Laboratory, Guangzhou International Bio Island, No. 9 Xin Dao Huan Bei Road, Guangzhou 510005, China
| |
|
23
|
Lv Q, Zhou F, Liu X, Zhi L. Artificial intelligence in small molecule drug discovery from 2018 to 2023: Does it really work? Bioorg Chem 2023; 141:106894. [PMID: 37776682 DOI: 10.1016/j.bioorg.2023.106894] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 08/15/2023] [Revised: 09/24/2023] [Accepted: 09/25/2023] [Indexed: 10/02/2023]
Abstract
Utilizing artificial intelligence (AI) in drug design represents an advanced approach for identifying targets and developing new drugs. Integrating AI techniques significantly reduces the workload involved in drug development and enhances the efficiency of early-stage drug discovery. This review aims to present a comprehensive overview of the utilization of AI methods in the field of small drug design, with a specific focus on four key areas: protein structure prediction, molecular virtual screening, molecular design, and absorption, distribution, metabolism, excretion, and toxicity (ADMET) prediction. Additionally, the role and limitations of AI in drug development are explored, and the impact of AI on decision-making processes is studied. It is important to note that while AI can bring numerous benefits to the early stage of drug development, the direction and quality of decision-making should still be emphasized, as AI should be considered as a tool rather than a decisive factor.
Affiliation(s)
- Qi Lv
- School of Pharmacy, Inflammation and Immune Mediated Diseases Laboratory of Anhui Province, Hefei 230032, PR China
| | - Feilong Zhou
- School of Pharmacy, Inflammation and Immune Mediated Diseases Laboratory of Anhui Province, Hefei 230032, PR China
| | - Xinhua Liu
- School of Pharmacy, Inflammation and Immune Mediated Diseases Laboratory of Anhui Province, Hefei 230032, PR China.
| | - Liping Zhi
- School of Health Management, Anhui Medical University, Hefei 230032, PR China.
| |
|
24
|
Du H, Jiang D, Zhang O, Wu Z, Gao J, Zhang X, Wang X, Deng Y, Kang Y, Li D, Pan P, Hsieh CY, Hou T. A flexible data-free framework for structure-based de novo drug design with reinforcement learning. Chem Sci 2023; 14:12166-12181. [PMID: 37969589 PMCID: PMC10631243 DOI: 10.1039/d3sc04091g] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/06/2023] [Accepted: 10/11/2023] [Indexed: 11/17/2023] Open
Abstract
Contemporary structure-based molecular generative methods have demonstrated their potential to model the geometric and energetic complementarity between ligands and receptors, thereby facilitating the design of molecules with favorable binding affinity and target specificity. Despite the introduction of deep generative models for molecular generation, the atom-wise generation paradigm that partially contradicts chemical intuition limits the validity and synthetic accessibility of the generated molecules. Additionally, the dependence of deep learning models on large-scale structural data has hindered their adaptability across different targets. To overcome these challenges, we present a novel search-based framework, 3D-MCTS, for structure-based de novo drug design. Distinct from prevailing atom-centric methods, 3D-MCTS employs a fragment-based molecular editing strategy. The fragments decomposed from small-molecule drugs are recombined under predefined retrosynthetic rules, offering improved drug-likeness and synthesizability, overcoming the inherent limitations of atom-based approaches. Leveraging multi-threaded parallel simulations combined with a real-time energy constraint-based pruning strategy, 3D-MCTS achieves remarkable efficiency. At a fixed computational cost, it outperforms other state-of-the-art (SOTA) methods by producing molecules with enhanced binding affinity. Furthermore, its fragment-based approach ensures the generation of more dependable binding conformations, exhibiting a success rate 43.6% higher than that of other SOTAs. This advantage becomes even more pronounced when handling targets that significantly deviate from the training dataset. 3D-MCTS is capable of achieving thirty times more hits with high binding affinity than traditional virtual screening methods, which demonstrates the superior ability of 3D-MCTS to explore chemical space. 
Moreover, the flexibility of our framework makes it easy to incorporate domain knowledge during the process, thereby enabling the generation of molecules with desirable pharmacophores and enhanced binding affinity. The adaptability of 3D-MCTS is further showcased in metalloprotein applications, highlighting its potential across various drug design scenarios.
Affiliation(s)
- Hongyan Du
- College of Pharmaceutical Sciences, Zhejiang University Hangzhou 310058 Zhejiang China
| | - Dejun Jiang
- College of Pharmaceutical Sciences, Zhejiang University Hangzhou 310058 Zhejiang China
| | - Odin Zhang
- College of Pharmaceutical Sciences, Zhejiang University Hangzhou 310058 Zhejiang China
| | - Zhenxing Wu
- College of Pharmaceutical Sciences, Zhejiang University Hangzhou 310058 Zhejiang China
| | - Junbo Gao
- College of Pharmaceutical Sciences, Zhejiang University Hangzhou 310058 Zhejiang China
| | - Xujun Zhang
- College of Pharmaceutical Sciences, Zhejiang University Hangzhou 310058 Zhejiang China
| | - Xiaorui Wang
- Hangzhou Carbonsilicon AI Technology Co., Ltd Hangzhou 310018 Zhejiang China
- Dr. Neher's Biophysics Laboratory for Innovative Drug Discovery, State Key Laboratory of Quality Research in Chinese Medicine, Macau Institute for Applied Research in Medicine and Health, Macau University of Science and Technology Macao 999078 China
| | - Yafeng Deng
- Hangzhou Carbonsilicon AI Technology Co., Ltd Hangzhou 310018 Zhejiang China
| | - Yu Kang
- College of Pharmaceutical Sciences, Zhejiang University Hangzhou 310058 Zhejiang China
| | - Dan Li
- College of Pharmaceutical Sciences, Zhejiang University Hangzhou 310058 Zhejiang China
| | - Peichen Pan
- College of Pharmaceutical Sciences, Zhejiang University Hangzhou 310058 Zhejiang China
| | - Chang-Yu Hsieh
- College of Pharmaceutical Sciences, Zhejiang University Hangzhou 310058 Zhejiang China
| | - Tingjun Hou
- College of Pharmaceutical Sciences, Zhejiang University Hangzhou 310058 Zhejiang China
| |
|
25
|
Zhu H, Zhou R, Cao D, Tang J, Li M. A pharmacophore-guided deep learning approach for bioactive molecular generation. Nat Commun 2023; 14:6234. [PMID: 37803000 PMCID: PMC10558534 DOI: 10.1038/s41467-023-41454-9] [Citation(s) in RCA: 19] [Impact Index Per Article: 9.5] [Received: 06/12/2022] [Accepted: 08/30/2023] [Indexed: 10/08/2023] Open
Abstract
The rational design of novel molecules with the desired bioactivity is a critical but challenging task in drug discovery, especially when treating a novel target family or understudied targets. We propose a Pharmacophore-Guided deep learning approach for bioactive Molecule Generation (PGMG). Through the guidance of pharmacophore, PGMG provides a flexible strategy for generating bioactive molecules. PGMG uses a graph neural network to encode spatially distributed chemical features and a transformer decoder to generate molecules. A latent variable is introduced to solve the many-to-many mapping between pharmacophores and molecules to improve the diversity of the generated molecules. Compared to existing methods, PGMG generates molecules with strong docking affinities and high scores of validity, uniqueness, and novelty. In the case studies, we use PGMG in a ligand-based and structure-based drug de novo design. Overall, the flexibility and effectiveness make PGMG a useful tool to accelerate the drug discovery process.
Affiliation(s)
- Huimin Zhu
- School of Computer Science and Engineering, Central South University, Changsha, 410083, China
| | - Renyi Zhou
- School of Computer Science and Engineering, Central South University, Changsha, 410083, China
| | - Dongsheng Cao
- Xiangya School of Pharmaceutical Sciences, Central South University, Changsha, 410008, China
| | - Jing Tang
- Research Program in Systems Oncology, Faculty of Medicine, University of Helsinki, Helsinki, 00290, Finland
- Department of Biochemistry and Developmental Biology, Faculty of Medicine, University of Helsinki, Helsinki, 00290, Finland
| | - Min Li
- School of Computer Science and Engineering, Central South University, Changsha, 410083, China.
| |
|
26
|
Hagg A, Kirschner KN. Open-Source Machine Learning in Computational Chemistry. J Chem Inf Model 2023; 63:4505-4532. [PMID: 37466636 PMCID: PMC10430767 DOI: 10.1021/acs.jcim.3c00643] [Citation(s) in RCA: 10] [Impact Index Per Article: 5.0] [Received: 05/04/2023] [Indexed: 07/20/2023]
Abstract
The field of computational chemistry has seen a significant increase in the integration of machine learning concepts and algorithms. In this Perspective, we surveyed 179 open-source software projects, with corresponding peer-reviewed papers published within the last 5 years, to better understand the topics within the field being investigated by machine learning approaches. For each project, we provide a short description, the link to the code, the accompanying license type, and whether the training data and resulting models are made publicly available. Based on those deposited in GitHub repositories, the most popular employed Python libraries are identified. We hope that this survey will serve as a resource to learn about machine learning or specific architectures thereof by identifying accessible codes with accompanying papers on a topic basis. To this end, we also include computational chemistry open-source software for generating training data and fundamental Python libraries for machine learning. Based on our observations and considering the three pillars of collaborative machine learning work, open data, open source (code), and open models, we provide some suggestions to the community.
Affiliation(s)
- Alexander Hagg
- Institute of Technology, Resource and Energy-Efficient Engineering (TREE), University of Applied Sciences Bonn-Rhein-Sieg, 53757 Sankt Augustin, Germany
- Department of Electrical Engineering, Mechanical Engineering and Technical Journalism, University of Applied Sciences Bonn-Rhein-Sieg, 53757 Sankt Augustin, Germany
| | - Karl N. Kirschner
- Institute of Technology, Resource and Energy-Efficient Engineering (TREE), University of Applied Sciences Bonn-Rhein-Sieg, 53757 Sankt Augustin, Germany
- Department of Computer Science, University of Applied Sciences Bonn-Rhein-Sieg, 53757 Sankt Augustin, Germany
| |
|
27
|
Jin J, Wang D, Shi G, Bao J, Wang J, Zhang H, Pan P, Li D, Yao X, Liu H, Hou T, Kang Y. FFLOM: A Flow-Based Autoregressive Model for Fragment-to-Lead Optimization. J Med Chem 2023; 66:10808-10823. [PMID: 37471134 DOI: 10.1021/acs.jmedchem.3c01009] [Citation(s) in RCA: 11] [Impact Index Per Article: 5.5] [Indexed: 07/21/2023]
Abstract
Recently, deep generative models have been regarded as promising tools in fragment-based drug design (FBDD). Despite the growing interest in these models, they still face challenges in generating molecules with desired properties in low data regimes. In this study, we propose a novel flow-based autoregressive model named FFLOM for linker and R-group design. In a large-scale benchmark evaluation on ZINC, CASF, and PDBbind test sets, FFLOM achieves state-of-the-art performance in terms of validity, uniqueness, novelty, and recovery of the generated molecules and can recover over 92% of the original molecules in the PDBbind test set (with at least five atoms). FFLOM also exhibits excellent potential applicability in several practical scenarios encompassing fragment linking, PROTAC design, R-group growing, and R-group optimization. In all four cases, FFLOM can perfectly reconstruct the ground-truth compounds and generate over 74% of molecules with novel fragments, some of which have higher binding affinity than the ground truth.
Affiliation(s)
- Jieyu Jin
- Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou 310058, Zhejiang, China
| | - Dong Wang
- Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou 310058, Zhejiang, China
| | - Guqin Shi
- Shanghai Qilu Pharmaceutical R&D Center, 576 Libing Road, Pudong New Area District, Shanghai 310115, China
| | - Jingxiao Bao
- Shanghai Qilu Pharmaceutical R&D Center, 576 Libing Road, Pudong New Area District, Shanghai 310115, China
| | - Jike Wang
- Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou 310058, Zhejiang, China
| | - Haotian Zhang
- Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou 310058, Zhejiang, China
| | - Peichen Pan
- Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou 310058, Zhejiang, China
| | - Dan Li
- Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou 310058, Zhejiang, China
| | - Xiaojun Yao
- State Key Laboratory of Quality Research in Chinese Medicines, Macau University of Science and Technology, Macau 999078, China
| | - Huanxiang Liu
- Faculty of Applied Science, Macao Polytechnic University, Macau 999078, China
| | - Tingjun Hou
- Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou 310058, Zhejiang, China
| | - Yu Kang
- Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou 310058, Zhejiang, China
| |
|
28
|
Baillif B, Cole J, McCabe P, Bender A. Deep generative models for 3D molecular structure. Curr Opin Struct Biol 2023; 80:102566. [DOI: 10.1016/j.sbi.2023.102566] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/23/2022] [Revised: 02/05/2023] [Accepted: 02/15/2023] [Indexed: 03/30/2023]
|
29
|
Wills S, Sanchez-Garcia R, Dudgeon T, Roughley SD, Merritt A, Hubbard RE, Davidson J, von Delft F, Deane CM. Fragment Merging Using a Graph Database Samples Different Catalogue Space than Similarity Search. J Chem Inf Model 2023. [PMID: 37229647 DOI: 10.1021/acs.jcim.3c00276] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Indexed: 05/27/2023]
Abstract
Fragment merging is a promising approach to progressing fragments directly to on-scale potency: each designed compound incorporates the structural motifs of overlapping fragments in a way that ensures compounds recapitulate multiple high-quality interactions. Searching commercial catalogues provides one useful way to quickly and cheaply identify such merges and circumvents the challenge of synthetic accessibility, provided they can be readily identified. Here, we demonstrate that the Fragment Network, a graph database that provides a novel way to explore the chemical space surrounding fragment hits, is well-suited to this challenge. We use an iteration of the database containing >120 million catalogue compounds to find fragment merges for four crystallographic screening campaigns and contrast the results with a traditional fingerprint-based similarity search. The two approaches identify complementary sets of merges that recapitulate the observed fragment-protein interactions but lie in different regions of chemical space. We further show our methodology is an effective route to achieving on-scale potency by retrospective analyses for two different targets; in analyses of public COVID Moonshot and Mycobacterium tuberculosis EthR inhibitors, potential inhibitors with micromolar IC50 values were identified. This work demonstrates the use of the Fragment Network to increase the yield of fragment merges beyond that of a classical catalogue search.
Affiliation(s)
- Stephanie Wills
- Department of Statistics, University of Oxford, Oxford OX1 3LB, United Kingdom
- Centre for Medicines Discovery, University of Oxford, Oxford OX3 7DQ, United Kingdom
| | - Ruben Sanchez-Garcia
- Department of Statistics, University of Oxford, Oxford OX1 3LB, United Kingdom
- Centre for Medicines Discovery, University of Oxford, Oxford OX3 7DQ, United Kingdom
| | - Tim Dudgeon
- Informatics Matters, Ltd., Perch Coworking, Franklins House, Bicester OX26 6JU, United Kingdom
| | - Stephen D Roughley
- Vernalis (R&D) Limited, Granta Park, Great Abington, Cambridge CB21 6GB, United Kingdom
| | - Andy Merritt
- LifeArc, Lynton House, 7-12 Tavistock Square, London WC1H 9LT, United Kingdom
| | - Roderick E Hubbard
- Vernalis (R&D) Limited, Granta Park, Great Abington, Cambridge CB21 6GB, United Kingdom
| | - James Davidson
- Vernalis (R&D) Limited, Granta Park, Great Abington, Cambridge CB21 6GB, United Kingdom
| | - Frank von Delft
- Centre for Medicines Discovery, University of Oxford, Oxford OX3 7DQ, United Kingdom
- Diamond Light Source, Didcot OX11 0DE, United Kingdom
- Research Complex at Harwell, Harwell Science and Innovation Campus, Didcot OX11 0FA, United Kingdom
- Department of Biochemistry, University of Johannesburg, Auckland Park, Johannesburg 2006, South Africa
| | - Charlotte M Deane
- Department of Statistics, University of Oxford, Oxford OX1 3LB, United Kingdom
| |
|
30
|
Yang Y, Hsieh CY, Kang Y, Hou T, Liu H, Yao X. Deep Generation Model Guided by the Docking Score for Active Molecular Design. J Chem Inf Model 2023; 63:2983-2991. [PMID: 37163364 DOI: 10.1021/acs.jcim.3c00572] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Indexed: 05/12/2023]
Abstract
A deep generation model, as a novel drug design and discovery tool, shows obvious advantages in generating compounds with novel backbones and has been applied successfully in the field of drug discovery. However, it is still a challenge to generate molecules with expected properties, especially high activity. Here, to obtain compounds both with novelty and high activity to a target, we proposed a conditional molecular generation model COMG by considering the docking score and 3D pharmacophore matching during molecular generation. The proposed model was based on the conditional variational autoencoder architecture constrained by the pharmacophore matching score. During Bayesian optimization, the docking score was applied to enhance the target relevance of generated compounds. Furthermore, to overcome the problem of high structural similarity caused by Bayesian optimization, the idea of the scaffold memory unit was also introduced. The evaluation results of COMG show that our model not only can improve the structural diversity of generated molecules but also can effectively improve the proportion of target-related drug-active molecules. The obtained results indicate that our proposed model COMG is a useful drug design tool.
Affiliation(s)
- Yuwei Yang
- Faculty of Applied Sciences, Macao Polytechnic University, Macao (SAR) 999078, P. R. China
- School of Pharmacy, Lanzhou University, Lanzhou 730000, Gansu, P. R. China
| | - Chang-Yu Hsieh
- College of Pharmaceutical Sciences, Zhejiang University, Hangzhou 310058, P. R. China
| | - Yu Kang
- College of Pharmaceutical Sciences, Zhejiang University, Hangzhou 310058, P. R. China
| | - Tingjun Hou
- College of Pharmaceutical Sciences, Zhejiang University, Hangzhou 310058, P. R. China
| | - Huanxiang Liu
- Faculty of Applied Sciences, Macao Polytechnic University, Macao (SAR) 999078, P. R. China
| | - Xiaojun Yao
- State Key Laboratory of Quality Research in Chinese Medicine, Macau University of Science and Technology, Taipa, 999078 Macau (SAR), P. R. China
| |
|
31
|
Tysinger EP, Rai BK, Sinitskiy AV. Can We Quickly Learn to "Translate" Bioactive Molecules with Transformer Models? J Chem Inf Model 2023; 63:1734-1744. [PMID: 36914216 DOI: 10.1021/acs.jcim.2c01618] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Indexed: 03/16/2023]
Abstract
Meaningful exploration of the chemical space of druglike molecules in drug design is a highly challenging task due to a combinatorial explosion of possible modifications of molecules. In this work, we address this problem with transformer models, a type of machine learning (ML) model originally developed for machine translation. By training transformer models on pairs of similar bioactive molecules from the public ChEMBL data set, we enable them to learn medicinal-chemistry-meaningful, context-dependent transformations of molecules, including those absent from the training set. By retrospective analysis on the performance of transformer models on ChEMBL subsets of ligands binding to COX2, DRD2, or HERG protein targets, we demonstrate that the models can generate structures identical or highly similar to most active ligands, despite the models having not seen any ligands active against the corresponding protein target during training. Our work demonstrates that human experts working on hit expansion in drug design can easily and quickly employ transformer models, originally developed to translate texts from one natural language to another, to "translate" from known molecules active against a given protein target to novel molecules active against the same target.
Affiliation(s)
- Emma P Tysinger
- Machine Learning and Computational Sciences, Pfizer Worldwide Research, Development, and Medical, 610 Main Street, Cambridge, Massachusetts 02139, United States
| | - Brajesh K Rai
- Machine Learning and Computational Sciences, Pfizer Worldwide Research, Development, and Medical, 610 Main Street, Cambridge, Massachusetts 02139, United States
| | - Anton V Sinitskiy
- Machine Learning and Computational Sciences, Pfizer Worldwide Research, Development, and Medical, 610 Main Street, Cambridge, Massachusetts 02139, United States
| |
|
32
|
Yu Y, Huang J, He H, Han J, Ye G, Xu T, Sun X, Chen X, Ren X, Li C, Li H, Huang W, Liu Y, Wang X, Gao Y, Cheng N, Guo N, Chen X, Feng J, Hua Y, Liu C, Zhu G, Xie Z, Yao L, Zhong W, Chen X, Liu W, Li H. Accelerated Discovery of Macrocyclic CDK2 Inhibitor QR-6401 by Generative Models and Structure-Based Drug Design. ACS Med Chem Lett 2023; 14:297-304. [PMID: 36923916 PMCID: PMC10009793 DOI: 10.1021/acsmedchemlett.2c00515] [Citation(s) in RCA: 16] [Impact Index Per Article: 8.0] [Received: 12/08/2022] [Accepted: 01/19/2023] [Indexed: 02/11/2023] Open
Abstract
Selective CDK2 inhibitors have the potential to provide effective therapeutics for CDK2-dependent cancers and for combating drug resistance due to high cyclin E1 (CCNE1) expression intrinsically or CCNE1 amplification induced by treatment of CDK4/6 inhibitors. Generative models that take advantage of deep learning are being increasingly integrated into early drug discovery for hit identification and lead optimization. Here we report the discovery of a highly potent and selective macrocyclic CDK2 inhibitor QR-6401 (23) accelerated by the application of generative models and structure-based drug design (SBDD). QR-6401 (23) demonstrated robust antitumor efficacy in an OVCAR3 ovarian cancer xenograft model via oral administration.
Affiliation(s)
- Yang Yu
- Tencent AI Lab, Tencent, Shenzhen 518057, China
| | | | - Hu He
- Regor Therapeutics Group, Shanghai, 201210, China
| | - Jing Han
- Regor Therapeutics Group, Shanghai, 201210, China
| | - Geyan Ye
- Tencent AI Lab, Tencent, Shenzhen 518057, China
| | - Tingyang Xu
- Tencent AI Lab, Tencent, Shenzhen 518057, China
| | | | - Xiumei Chen
- Regor Therapeutics Group, Shanghai, 201210, China
| | - Xiaoming Ren
- Regor Therapeutics Group, Shanghai, 201210, China
| | - Chunlai Li
- Regor Therapeutics Group, Shanghai, 201210, China
| | - Huijuan Li
- Regor Therapeutics Group, Shanghai, 201210, China
| | - Wei Huang
- Regor Therapeutics Group, Shanghai, 201210, China
| | - Yangyang Liu
- Regor Therapeutics Group, Shanghai, 201210, China
| | - Xinjuan Wang
- Regor Therapeutics Group, Shanghai, 201210, China
| | - Yongzhi Gao
- Regor Therapeutics Group, Shanghai, 201210, China
| | - Nianhe Cheng
- Regor Therapeutics Group, Shanghai, 201210, China
| | - Na Guo
- BioDuro-Sundia, Shanghai, 200131, China
| | - Xibo Chen
- BioDuro-Sundia, Shanghai, 200131, China
| | | | - Yuxia Hua
- BioDuro-Sundia, Beijing, 102200, China
| | - Chong Liu
- BioDuro-Sundia, Beijing, 102200, China
| | - Guoyun Zhu
- Regor Therapeutics Group, Shanghai, 201210, China
| | - Zhi Xie
- Regor Therapeutics Group, Shanghai, 201210, China
| | - Lili Yao
- Regor Therapeutics Group, Shanghai, 201210, China
| | - Wenge Zhong
- Regor Therapeutics Group, Shanghai, 201210, China
| | - Xinde Chen
- Tencent AI Lab, Tencent, Shenzhen 518057, China
| | - Wei Liu
- Tencent AI Lab, Tencent, Shenzhen 518057, China
| | - Hailong Li
- Regor Therapeutics Group, Shanghai, 201210, China
| |
|
33
|
Danel T, Łęski J, Podlewska S, Podolak IT. Docking-based generative approaches in the search for new drug candidates. Drug Discov Today 2023; 28:103439. [PMID: 36372330 DOI: 10.1016/j.drudis.2022.103439] [Citation(s) in RCA: 7] [Impact Index Per Article: 3.5] [Received: 06/30/2022] [Revised: 10/08/2022] [Accepted: 11/08/2022] [Indexed: 11/13/2022]
Abstract
Despite the popularity of virtual screening (VS) of existing compound libraries, the search for new potential drug candidates also takes advantage of generative protocols, where new compound suggestions are enumerated using various algorithms. To increase the activity potency of generative approaches, they have recently been coupled with molecular docking, a leading methodology of structure-based drug design (SBDD). In this review, we summarize progress since docking-based generative models emerged. We propose a new taxonomy for these methods and discuss their importance for the field of computer-aided drug design (CADD). In addition, we discuss the most promising directions for the further development of generative protocols coupled with docking.
Affiliation(s)
- Tomasz Danel
- Faculty of Mathematics and Computer Science, Jagiellonian University, 6 Łojasiewicza Street, 30-348 Kraków, Poland.
| | - Jan Łęski
- Faculty of Mathematics and Computer Science, Jagiellonian University, 6 Łojasiewicza Street, 30-348 Kraków, Poland
| | - Sabina Podlewska
- Maj Institute of Pharmacology, Polish Academy of Sciences, Department of Medicinal Chemistry, 31-343 Kraków, Smętna Street 12, Poland
| | - Igor T Podolak
- Faculty of Mathematics and Computer Science, Jagiellonian University, 6 Łojasiewicza Street, 30-348 Kraków, Poland
| |
|
34
|
Zeng X, Wang F, Luo Y, Kang SG, Tang J, Lightstone FC, Fang EF, Cornell W, Nussinov R, Cheng F. Deep generative molecular design reshapes drug discovery. Cell Rep Med 2022; 3:100794. [PMID: 36306797 PMCID: PMC9797947 DOI: 10.1016/j.xcrm.2022.100794] [Citation(s) in RCA: 70] [Impact Index Per Article: 23.3] [Received: 05/05/2022] [Revised: 08/05/2022] [Accepted: 09/30/2022] [Indexed: 11/05/2022]
Abstract
Recent advances and accomplishments of artificial intelligence (AI) and deep generative models have established their usefulness in medicinal applications, especially in drug discovery and development. To correctly apply AI, the developer and user face questions such as which protocols to consider, which factors to scrutinize, and how the deep generative models can integrate the relevant disciplines. This review summarizes classical and newly developed AI approaches, providing an updated and accessible guide to the broad computational drug discovery and development community. We introduce deep generative models from different standpoints and describe the theoretical frameworks for representing chemical and biological structures and their applications. We discuss the data and technical challenges and highlight future directions of multimodal deep generative models for accelerating drug discovery.
Affiliation(s)
- Xiangxiang Zeng
- College of Computer Science and Electronic Engineering, Hunan University, Changsha, Hunan 410082, P.R. China
| | - Fei Wang
- Department of Population Health Sciences, Weill Cornell Medical College, Cornell University, New York, NY 10065, USA
| | - Yuan Luo
- Division of Health and Biomedical Informatics, Department of Preventive Medicine, Feinberg School of Medicine, Northwestern University, Chicago, IL 60611, USA
| | - Seung-Gu Kang
- Healthcare & Life Sciences Research, IBM TJ Watson Research Center, 1101 Kitchawan Road, Yorktown Heights, NY 10598, USA
| | - Jian Tang
- Mila-Quebec Institute for Learning Algorithms and CIFAR AI Research Chair, HEC Montreal, Montréal, QC H3T 2A7, Canada
| | - Felice C Lightstone
- Biosciences and Biotechnology Division, Physical and Life Sciences Directorate, Lawrence Livermore National Lab, Livermore, CA 94550, USA
| | - Evandro F Fang
- Department of Clinical Molecular Biology, University of Oslo and Akershus University Hospital, 1478 Lørenskog, Oslo, Norway; The Norwegian Centre on Healthy Ageing (NO-Age), Oslo, Norway
| | - Wendy Cornell
- Healthcare & Life Sciences Research, IBM TJ Watson Research Center, 1101 Kitchawan Road, Yorktown Heights, NY 10598, USA
| | - Ruth Nussinov
- Computational Structural Biology Section, Frederick National Laboratory for Cancer Research in the Laboratory of Cancer Immunometabolism, National Cancer Institute, Frederick, MD 21702, USA; Department of Human Molecular Genetics and Biochemistry, Sackler School of Medicine, Tel Aviv University, Tel Aviv 69978, Israel
| | - Feixiong Cheng
- Genomic Medicine Institute, Lerner Research Institute, Cleveland Clinic, Cleveland, OH 44195, USA; Department of Molecular Medicine, Cleveland Clinic Lerner College of Medicine, Case Western Reserve University, Cleveland, OH 44195, USA; Case Comprehensive Cancer Center, Case Western Reserve University School of Medicine, Cleveland, OH 44106, USA.
| |
|
35
|
Wang M, Wang J, Weng G, Kang Y, Pan P, Li D, Deng Y, Li H, Hsieh CY, Hou T. ReMODE: a deep learning-based web server for target-specific drug design. J Cheminform 2022; 14:84. [PMID: 36510307 PMCID: PMC9743675 DOI: 10.1186/s13321-022-00665-w] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/14/2022] [Accepted: 12/01/2022] [Indexed: 12/14/2022] Open
Abstract
Deep learning (DL) and machine learning have contributed significantly to basic biology research and drug discovery in the past few decades. Recent advances in DL-based generative models have led to superior developments in de novo drug design. However, data availability, deep data processing, and the lack of user-friendly DL tools and interfaces make it difficult to apply these DL techniques to drug design. We hereby present ReMODE (Receptor-based MOlecular DEsign), a new web server based on a DL algorithm for target-specific ligand design, which integrates different functional modules to enable users to develop customizable drug design tasks. As designed, the ReMODE server can construct the target-specific tasks toward the protein targets selected by users. Meanwhile, the server also provides some extensions: users can optimize the drug-likeness or synthetic accessibility of the generated molecules, and control other physicochemical properties; users can also choose a sub-structure/scaffold as a starting point for fragment-based drug design. The ReMODE server also enables users to optimize the pharmacophore matching and docking conformations of the generated molecules. We believe that the ReMODE server will benefit researchers for drug discovery. ReMODE is publicly available at http://cadd.zju.edu.cn/relation/remode/.
Affiliation(s)
- Mingyang Wang
- Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, College of Pharmaceutical Sciences and Cancer Center, Zhejiang University, Hangzhou, 310058 Zhejiang People’s Republic of China
- CarbonSilicon AI Technology Co., Ltd, Hangzhou, 310018 Zhejiang People’s Republic of China
| | - Jike Wang
- Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, College of Pharmaceutical Sciences and Cancer Center, Zhejiang University, Hangzhou, 310058 Zhejiang People’s Republic of China
- CarbonSilicon AI Technology Co., Ltd, Hangzhou, 310018 Zhejiang People’s Republic of China
| | - Gaoqi Weng
- Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, College of Pharmaceutical Sciences and Cancer Center, Zhejiang University, Hangzhou, 310058 Zhejiang People’s Republic of China
- CarbonSilicon AI Technology Co., Ltd, Hangzhou, 310018 Zhejiang People’s Republic of China
| | - Yu Kang
- Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, College of Pharmaceutical Sciences and Cancer Center, Zhejiang University, Hangzhou, 310058 Zhejiang People’s Republic of China
| | - Peichen Pan
- Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, College of Pharmaceutical Sciences and Cancer Center, Zhejiang University, Hangzhou, 310058 Zhejiang People’s Republic of China
| | - Dan Li
- Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, College of Pharmaceutical Sciences and Cancer Center, Zhejiang University, Hangzhou, 310058 Zhejiang People’s Republic of China
| | - Yafeng Deng
- CarbonSilicon AI Technology Co., Ltd, Hangzhou, 310018 Zhejiang People’s Republic of China
| | - Honglin Li
- Shanghai Key Laboratory of New Drug Design, School of Pharmacy, East China University of Science & Technology, Shanghai, 200237 People’s Republic of China
| | - Chang-Yu Hsieh
- Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, College of Pharmaceutical Sciences and Cancer Center, Zhejiang University, Hangzhou, 310058 Zhejiang People’s Republic of China
| | - Tingjun Hou
- Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, College of Pharmaceutical Sciences and Cancer Center, Zhejiang University, Hangzhou, 310058 Zhejiang People’s Republic of China
| |
|
36
|
Chan L, Kumar R, Verdonk M, Poelking C. A multilevel generative framework with hierarchical self-contrasting for bias control and transparency in structure-based ligand design. Nat Mach Intell 2022. [DOI: 10.1038/s42256-022-00564-7] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 12/12/2022]
|
37
|
Bieniek MK, Cree B, Pirie R, Horton JT, Tatum NJ, Cole DJ. An open-source molecular builder and free energy preparation workflow. Commun Chem 2022; 5:136. [PMID: 36320862 PMCID: PMC9607723 DOI: 10.1038/s42004-022-00754-9] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/20/2022] [Accepted: 10/11/2022] [Indexed: 01/27/2023] Open
Abstract
Automated free energy calculations for the prediction of binding free energies of congeneric series of ligands to a protein target are growing in popularity, but building reliable initial binding poses for the ligands is challenging. Here, we introduce the open-source FEgrow workflow for building user-defined congeneric series of ligands in protein binding pockets for input to free energy calculations. For a given ligand core and receptor structure, FEgrow enumerates and optimises the bioactive conformations of the grown functional group(s), making use of hybrid machine learning/molecular mechanics potential energy functions where possible. Low energy structures are optionally scored using the gnina convolutional neural network scoring function, and output for more rigorous protein-ligand binding free energy predictions. We illustrate use of the workflow by building and scoring binding poses for ten congeneric series of ligands bound to targets from a standard, high quality dataset of protein-ligand complexes. Furthermore, we build a set of 13 inhibitors of the SARS-CoV-2 main protease from the literature, and use free energy calculations to retrospectively compute their relative binding free energies. FEgrow is freely available at https://github.com/cole-group/FEgrow, along with a tutorial.
Affiliation(s)
- Mateusz K. Bieniek
- School of Natural and Environmental Sciences, Newcastle University, Newcastle upon Tyne, NE1 7RU UK
| | - Ben Cree
- School of Natural and Environmental Sciences, Newcastle University, Newcastle upon Tyne, NE1 7RU UK
| | - Rachael Pirie
- School of Natural and Environmental Sciences, Newcastle University, Newcastle upon Tyne, NE1 7RU UK
| | - Joshua T. Horton
- School of Natural and Environmental Sciences, Newcastle University, Newcastle upon Tyne, NE1 7RU UK
| | - Natalie J. Tatum
- Newcastle University Centre for Cancer, Translational and Clinical Research Institute, Newcastle University, Newcastle upon Tyne, NE2 4HH UK
| | - Daniel J. Cole
- School of Natural and Environmental Sciences, Newcastle University, Newcastle upon Tyne, NE1 7RU UK
| |
|
38
|
Eguida M, Schmitt-Valencia C, Hibert M, Villa P, Rognan D. Target-Focused Library Design by Pocket-Applied Computer Vision and Fragment Deep Generative Linking. J Med Chem 2022; 65:13771-13783. [PMID: 36256484 DOI: 10.1021/acs.jmedchem.2c00931] [Citation(s) in RCA: 7] [Impact Index Per Article: 2.3] [Indexed: 11/28/2022]
Abstract
We here describe a computational approach (POEM: Pocket Oriented Elaboration of Molecules) to drive the generation of target-focused libraries while taking advantage of all publicly available structural information on protein-ligand complexes. A collection of 31 384 PDB-derived images with key shapes and pharmacophoric properties, describing fragment-bound microenvironments, is first aligned to the query target cavity by a computer vision method. The fragments of the most similar PDB subpockets are then directly positioned in the query cavity using the corresponding image transformation matrices. Lastly, suitable connectable atoms of oriented fragment pairs are linked by a deep generative model to yield fully connected molecules. POEM was applied to generate a library of 1.5 million potential cyclin-dependent kinase 8 inhibitors. By synthesizing and testing as few as 43 compounds, a few nanomolar inhibitors were quickly obtained with limited resources in just two iterative cycles.
Affiliation(s)
- Merveille Eguida
- Laboratoire d'Innovation Thérapeutique, UMR7200 CNRS-Université de Strasbourg, F-67400 Illkirch, France
| | - Christel Schmitt-Valencia
- Plateforme de Chimie Biologique Intégrative de Strasbourg, UAR3286 CNRS-Université de Strasbourg, Institut du Médicament de Strasbourg, F-67400 Illkirch, France
| | - Marcel Hibert
- Laboratoire d'Innovation Thérapeutique, UMR7200 CNRS-Université de Strasbourg, F-67400 Illkirch, France
| | - Pascal Villa
- Plateforme de Chimie Biologique Intégrative de Strasbourg, UAR3286 CNRS-Université de Strasbourg, Institut du Médicament de Strasbourg, F-67400 Illkirch, France
| | - Didier Rognan
- Laboratoire d'Innovation Thérapeutique, UMR7200 CNRS-Université de Strasbourg, F-67400 Illkirch, France
| |
|
39
|
Wang A, Durrant JD. Open-Source Browser-Based Tools for Structure-Based Computer-Aided Drug Discovery. Molecules 2022; 27:4623. [PMID: 35889494 PMCID: PMC9319651 DOI: 10.3390/molecules27144623] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 07/05/2022] [Revised: 07/17/2022] [Accepted: 07/18/2022] [Indexed: 01/27/2023] Open
Abstract
We here outline the importance of open-source, accessible tools for computer-aided drug discovery (CADD). We begin with a discussion of drug discovery in general to provide context for a subsequent discussion of structure-based CADD applied to small-molecule ligand discovery. Next, we identify usability challenges common to many open-source CADD tools. To address these challenges, we propose a browser-based approach to CADD tool deployment in which CADD calculations run in modern web browsers on users' local computers. The browser app approach eliminates the need for user-initiated download and installation, ensures broad operating system compatibility, enables easy updates, and provides a user-friendly graphical user interface. Unlike server apps, which run calculations "in the cloud" rather than on users' local computers, browser apps do not require users to upload proprietary information to a third-party (remote) server. They also eliminate the need for the difficult-to-maintain computer infrastructure required to run user-initiated calculations remotely. We conclude by describing some CADD browser apps developed in our lab, which illustrate the utility of this approach. Aside from introducing readers to these specific tools, we are hopeful that this review highlights the need for additional browser-compatible, user-friendly CADD software.
Affiliation(s)
| | - Jacob D. Durrant
- Department of Biological Sciences, University of Pittsburgh, Pittsburgh, PA 15260, USA
| |
|
40
|
Wang M, Hsieh CY, Wang J, Wang D, Weng G, Shen C, Yao X, Bing Z, Li H, Cao D, Hou T. RELATION: A Deep Generative Model for Structure-Based De Novo Drug Design. J Med Chem 2022; 65:9478-9492. [PMID: 35713420 DOI: 10.1021/acs.jmedchem.2c00732] [Citation(s) in RCA: 31] [Impact Index Per Article: 10.3] [Indexed: 01/30/2023]
Abstract
Deep learning (DL)-based de novo molecular design has recently gained considerable traction. Many DL-based generative models have been successfully developed to design novel molecules, but most of them are ligand-centric and the role of the 3D geometries of target binding pockets in molecular generation has not been well-exploited. Here, we proposed a new 3D-based generative model called RELATION. In the RELATION model, the BiTL algorithm was specifically designed to extract and transfer the desired geometric features of the protein-ligand complexes to a latent space for generation. The pharmacophore conditioning and docking-based Bayesian sampling were applied to efficiently navigate the vast chemical space for the design of molecules with desired geometric properties and pharmacophore features. As a proof of concept, the RELATION model was used to design inhibitors for two targets, AKT1 and CDK2. The calculation results demonstrated that the RELATION model could efficiently generate novel molecules with favorable binding affinity and pharmacophore features.
Affiliation(s)
- Mingyang Wang
- Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, College of Pharmaceutical Sciences and Cancer Center, Zhejiang University, Hangzhou 310058, Zhejiang, P. R. China
| | - Chang-Yu Hsieh
- Tencent, Tencent Quantum Lab, Shenzhen 518057, Guangdong, P. R. China
| | - Jike Wang
- Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, College of Pharmaceutical Sciences and Cancer Center, Zhejiang University, Hangzhou 310058, Zhejiang, P. R. China
| | - Dong Wang
- Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, College of Pharmaceutical Sciences and Cancer Center, Zhejiang University, Hangzhou 310058, Zhejiang, P. R. China
| | - Gaoqi Weng
- Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, College of Pharmaceutical Sciences and Cancer Center, Zhejiang University, Hangzhou 310058, Zhejiang, P. R. China
| | - Chao Shen
- Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, College of Pharmaceutical Sciences and Cancer Center, Zhejiang University, Hangzhou 310058, Zhejiang, P. R. China
| | - Xiaojun Yao
- Dr. Neher's Biophysics Laboratory for Innovative Drug Discovery, Macau Institute for Applied Research in Medicine and Health, State Key Laboratory of Quality Research in Chinese Medicine, Macau University of Science and Technology, Taipa 999078, Macau, P. R. China
| | - Zhitong Bing
- Institute of Modern Physics, Chinese Academy of Sciences, Lanzhou 730000, P. R. China
| | - Honglin Li
- Shanghai Key Laboratory of New Drug Design, School of Pharmacy, East China University of Science & Technology, Shanghai 200237, P. R. China
| | - Dongsheng Cao
- Xiangya School of Pharmaceutical Sciences, Central South University, Changsha 410013, Hunan, P. R. China
| | - Tingjun Hou
- Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, College of Pharmaceutical Sciences and Cancer Center, Zhejiang University, Hangzhou 310058, Zhejiang, P. R. China
| |
|
41
|
Xie W, Wang F, Li Y, Lai L, Pei J. Advances and Challenges in De Novo Drug Design Using Three-Dimensional Deep Generative Models. J Chem Inf Model 2022; 62:2269-2279. [PMID: 35544331 DOI: 10.1021/acs.jcim.2c00042] [Citation(s) in RCA: 23] [Impact Index Per Article: 7.7] [Indexed: 11/30/2022]
Abstract
A persistent goal for de novo drug design is to generate novel chemical compounds with desirable properties in a labor-, time-, and cost-efficient manner. Deep generative models provide alternative routes to this goal. Numerous model architectures and optimization strategies have been explored in recent years, most of which have been developed to generate two-dimensional molecular structures. Some generative models aiming at three-dimensional (3D) molecule generation have also been proposed, gaining attention for their unique advantages and potential to directly design drug-like molecules in a target-conditioning manner. This review highlights current developments in 3D molecular generative models combined with deep learning and discusses future directions for de novo drug design.
Affiliation(s)
- Weixin Xie
- Center for Quantitative Biology, Academy for Advanced Interdisciplinary Studies, Peking University, Beijing 100871, China
| | - Fanhao Wang
- Center for Quantitative Biology, Academy for Advanced Interdisciplinary Studies, Peking University, Beijing 100871, China
| | - Yibo Li
- Center for Life Sciences, Academy for Advanced Interdisciplinary Studies, Peking University, Beijing 100871, China
| | - Luhua Lai
- Center for Quantitative Biology, Academy for Advanced Interdisciplinary Studies, Peking University, Beijing 100871, China; Peking-Tsinghua Center for Life Science at BNLMS, College of Chemistry and Molecular Engineering, Peking University, Beijing 100871, China
| | - Jianfeng Pei
- Center for Quantitative Biology, Academy for Advanced Interdisciplinary Studies, Peking University, Beijing 100871, China
| |
|
42
|
Hadfield TE, Imrie F, Merritt A, Birchall K, Deane CM. Incorporating Target-Specific Pharmacophoric Information into Deep Generative Models for Fragment Elaboration. J Chem Inf Model 2022; 62:2280-2292. [PMID: 35499971 PMCID: PMC9131447 DOI: 10.1021/acs.jcim.1c01311] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/28/2022]
Abstract
Despite recent interest in deep generative models for scaffold elaboration, their applicability to fragment-to-lead campaigns has so far been limited. This is primarily due to their inability to account for local protein structure or a user's design hypothesis. We propose a novel method for fragment elaboration, STRIFE, that overcomes these issues. STRIFE takes as input fragment hotspot maps (FHMs) extracted from a protein target and processes them to provide meaningful and interpretable structural information to its generative model, which in turn is able to rapidly generate elaborations with complementary pharmacophores to the protein. In a large-scale evaluation, STRIFE outperforms existing, structure-unaware, fragment elaboration methods in proposing highly ligand-efficient elaborations. In addition to automatically extracting pharmacophoric information from a protein target's FHM, STRIFE optionally allows the user to specify their own design hypotheses.
Affiliation(s)
- Thomas E Hadfield
- Oxford Protein Informatics Group, Department of Statistics, University of Oxford, Oxford OX1 3LB, United Kingdom
| | - Fergus Imrie
- Oxford Protein Informatics Group, Department of Statistics, University of Oxford, Oxford OX1 3LB, United Kingdom
| | - Andy Merritt
- LifeArc, SBC Open Innovation Campus, Stevenage SG1 2FX, United Kingdom
| | - Kristian Birchall
- LifeArc, SBC Open Innovation Campus, Stevenage SG1 2FX, United Kingdom
| | - Charlotte M Deane
- Oxford Protein Informatics Group, Department of Statistics, University of Oxford, Oxford OX1 3LB, United Kingdom
| |
|
43
|
Halder AK, Moura AS, Cordeiro MNDS. Moving Average-Based Multitasking In Silico Classification Modeling: Where Do We Stand and What Is Next? Int J Mol Sci 2022; 23:4937. [PMID: 35563327 PMCID: PMC9099502 DOI: 10.3390/ijms23094937] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 03/28/2022] [Revised: 04/24/2022] [Accepted: 04/28/2022] [Indexed: 01/27/2023] Open
Abstract
Conventional in silico modeling is often viewed as 'one-target' or 'single-task' computer-aided modeling since it mainly relies on forecasting an endpoint of interest from similar input data. Multitasking or multitarget in silico modeling, in contrast, embraces a set of computational techniques that efficiently integrate multiple types of input data for setting up unique in silico models able to predict the outcome(s) relating to various experimental and/or theoretical conditions. The latter, specifically, based upon the Box-Jenkins moving average approach, has been applied in the last decade to several research fields including drug and materials design, environmental sciences, and nanotechnology. The present review discusses the current status of multitasking computer-aided modeling efforts, meanwhile describing both the existing challenges and future opportunities of its underlying techniques. Some important applications are also discussed to exemplify the ability of multitasking modeling in deriving holistic and reliable in silico classification-based models as well as in designing new chemical entities, either through fragment-based design or virtual screening. Focus will also be given to some software recently developed to automate and accelerate such types of modeling. Overall, this review may serve as a guideline for researchers to grasp the scope of multitasking computer-aided modeling as a promising in silico tool.
Affiliation(s)
- Amit Kumar Halder
- LAQV@REQUIMTE, Faculty of Sciences, University of Porto, 4169-007 Porto, Portugal
- Dr. B. C. Roy College of Pharmacy and Allied Health Sciences, Dr. Meghnad Saha Sarani, Bidhannagar, Durgapur 713212, West Bengal, India
| | - Ana S. Moura
- LAQV@REQUIMTE, Faculty of Sciences, University of Porto, 4169-007 Porto, Portugal
| | - Maria Natália D. S. Cordeiro
- LAQV@REQUIMTE, Faculty of Sciences, University of Porto, 4169-007 Porto, Portugal
- Correspondence: ; Tel.: +35-12-2040-2502
| |
|
44
|
Ragoza M, Masuda T, Koes DR. Generating 3D molecules conditional on receptor binding sites with deep generative models. Chem Sci 2022; 13:2701-2713. [PMID: 35356675 PMCID: PMC8890264 DOI: 10.1039/d1sc05976a] [Citation(s) in RCA: 57] [Impact Index Per Article: 19.0] [Received: 10/28/2021] [Accepted: 02/06/2022] [Indexed: 11/22/2022] Open
Abstract
The goal of structure-based drug discovery is to find small molecules that bind to a given target protein. Deep learning has been used to generate drug-like molecules with certain cheminformatic properties, but has not yet been applied to generating 3D molecules predicted to bind to proteins by sampling the conditional distribution of protein-ligand binding interactions. In this work, we describe for the first time a deep learning system for generating 3D molecular structures conditioned on a receptor binding site. We approach the problem using a conditional variational autoencoder trained on an atomic density grid representation of cross-docked protein-ligand structures. We apply atom fitting and bond inference procedures to construct valid molecular conformations from generated atomic densities. We evaluate the properties of the generated molecules and demonstrate that they change significantly when conditioned on mutated receptors. We also explore the latent space learned by our generative model using sampling and interpolation techniques. This work opens the door for end-to-end prediction of stable bioactive molecules from protein structures with deep learning.
Affiliation(s)
- Matthew Ragoza
- Intelligent Systems Program, University of Pittsburgh, Pittsburgh, PA 15213, USA
| | - Tomohide Masuda
- Department of Computational and Systems Biology, University of Pittsburgh, Pittsburgh, PA 15213, USA
| | - David Ryan Koes
- Department of Computational and Systems Biology, University of Pittsburgh, Pittsburgh, PA 15213, USA
| |
|
45
|
Hadfield TE, Deane CM. AI in 3D compound design. Curr Opin Struct Biol 2022; 73:102326. [PMID: 35101671 DOI: 10.1016/j.sbi.2021.102326] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 06/14/2021] [Revised: 11/22/2021] [Accepted: 12/13/2021] [Indexed: 11/18/2022]
Abstract
The success of Artificial Intelligence (AI) across a wide range of domains has fuelled significant interest in its application to designing novel compounds and screening compounds against a specific target. However, many existing AI methods either do not account for the 3D structure of the target at all or struggle to capture meaningful spatial information from the target. In this Opinion, we highlight a range of recent structure-aware approaches which utilise deep learning for compound design and virtual screening. We discuss how such methods can be better integrated into existing drug discovery pipelines by facilitating the design of compounds which conform to a specified design hypothesis and by uncovering key protein-ligand interactions which can be used to aid molecule design.
Affiliation(s)
- Thomas E Hadfield
- Oxford Protein Informatics Group, Department of Statistics, University of Oxford, Oxford OX1 3LB, UK
| | - Charlotte M Deane
- Oxford Protein Informatics Group, Department of Statistics, University of Oxford, Oxford OX1 3LB, UK
| |
|