1. Wang J, Zhu Y, Liu Y, Yu B. DTF-diffusion: A 3D equivariant diffusion generation model based on ligand-target information fusion. Comput Biol Chem 2025; 117:108392. [PMID: 40020563 DOI: 10.1016/j.compbiolchem.2025.108392] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/03/2025] [Revised: 02/14/2025] [Accepted: 02/15/2025] [Indexed: 03/03/2025]
Abstract
The goal of deep learning-based drug discovery is to generate drug molecules that bind to a given target protein. Recently, models that use three-dimensional molecular structures have outperformed those based on two-dimensional structures. However, most current deep generative models are ligand-based, and during molecular generation they learn only the independent information of ligands or targets, without considering the complex interactions between them. In addition, chemical knowledge is not considered during molecule generation, which leads to chemically unreasonable molecular structures. To address these problems, this paper proposes DTF-diffusion, a 3D equivariant diffusion generation model based on ligand-target information fusion. First, building on the diffusion model, DTF-diffusion uses the proposed multimodal feature fusion module to fuse the three-dimensional positional features of ligand molecules and targets, and to extract high-level hidden features from ligand atom information and target sequence information. Second, this paper designs a chemical rule discrimination module, in which a discriminator learns the features of real and generated ligand molecular structures and thereby captures the chemical rules of drug molecular structures, which effectively improves the plausibility of the ligand structures generated by the model. The generation performance of DTF-diffusion and baseline methods is evaluated from multiple perspectives on the CrossDocked2020 dataset. On the quantitative estimate of drug-likeness index, DTF-diffusion is 3.85% higher than the best existing model, and the drug validity index increases by 4.34%. Further generation experiments show that DTF-diffusion performs well, indicating good application prospects in the field of drug molecule generation.
Affiliation(s)
- Jianxin Wang
  - School of Data Science, Qingdao University of Science and Technology, Qingdao 266061, China
- Yongxin Zhu
  - School of Data Science, Qingdao University of Science and Technology, Qingdao 266061, China
- Yushuang Liu
  - College of Mathematics and Physics, Qingdao University of Science and Technology, Qingdao 266061, China
- Bin Yu
  - School of Data Science, Qingdao University of Science and Technology, Qingdao 266061, China
2. Yang Z, Wang K, Zhang G, Jiang Y, Zeng R, Qiao J, Li Y, Deng X, Xia Z, Yao R, Zeng X, Zhang L, Zhao Y, Lei J, Chen R. A deep learning model for structure-based bioactivity optimization and its application in the bioactivity optimization of a SARS-CoV-2 main protease inhibitor. Eur J Med Chem 2025; 291:117602. [PMID: 40239482 DOI: 10.1016/j.ejmech.2025.117602] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/19/2024] [Revised: 04/02/2025] [Accepted: 04/03/2025] [Indexed: 04/18/2025]
Abstract
Bioactivity optimization is a crucial and technical task in the early stages of drug discovery, traditionally carried out through iterative substituent optimization, a process that is often both time-consuming and expensive. To address this challenge, we present Pocket-StrMod, a deep-learning model tailored for structure-based bioactivity optimization. Pocket-StrMod employs an autoregressive flow-based architecture, optimizing molecules within a specific protein binding pocket while explicitly incorporating chemical expertise. It synchronously optimizes all substituents by generating atoms and covalent bonds at designated sites within a molecular scaffold nestled inside a protein pocket. We applied this model to optimize the bioactivity of Hit1, an inhibitor of the SARS-CoV-2 main protease (Mpro) with initially poor bioactivity (IC50: 34.56 μM). Following two rounds of optimization, six compounds were selected for synthesis and bioactivity testing. This led to the discovery of C5, a potent compound with an IC50 value of 33.6 nM, marking a remarkable 1028-fold improvement over Hit1. Furthermore, C5 demonstrated promising in vitro antiviral activity against SARS-CoV-2. Collectively, these findings underscore the great potential of deep learning in facilitating rapid and cost-effective bioactivity optimization in the early phases of drug development.
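The fold improvement quoted in the abstract follows from a unit conversion: the Hit1 value is given in micromolar and the C5 value in nanomolar. A minimal sketch of that arithmetic is below; the function name `fold_improvement` is an illustrative assumption and is not taken from the paper.

```python
def fold_improvement(ic50_before_um: float, ic50_after_nm: float) -> float:
    """Ratio of two IC50 values after converting the first from uM to nM."""
    ic50_before_nm = ic50_before_um * 1000.0  # 1 uM = 1000 nM
    return ic50_before_nm / ic50_after_nm


# Values quoted in the abstract: Hit1 at 34.56 uM, C5 at 33.6 nM.
ratio = fold_improvement(34.56, 33.6)
print(f"{ratio:.1f}-fold")  # prints "1028.6-fold"
```

Truncating the ratio to a whole number gives the 1028-fold figure reported above.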
Affiliation(s)
- Zhenyu Yang
  - West China Biomedical Big Data Center, West China Hospital, Sichuan University, Chengdu, Sichuan, 610041, China
- Kai Wang
  - Department of Biotherapy, Cancer Center and State Key Laboratory of Biotherapy, West China Hospital, Sichuan University, Chengdu, Sichuan, 610041, China
- Guo Zhang
  - Department of Biotherapy, Cancer Center and State Key Laboratory of Biotherapy, West China Hospital, Sichuan University, Chengdu, Sichuan, 610041, China
- Yuanyuan Jiang
  - Department of Biotherapy, Cancer Center and State Key Laboratory of Biotherapy, West China Hospital, Sichuan University, Chengdu, Sichuan, 610041, China
- Rui Zeng
  - Department of Biotherapy, Cancer Center and State Key Laboratory of Biotherapy, West China Hospital, Sichuan University, Chengdu, Sichuan, 610041, China
- Jingxin Qiao
  - Department of Biotherapy, Cancer Center and State Key Laboratory of Biotherapy, West China Hospital, Sichuan University, Chengdu, Sichuan, 610041, China
- Yueyue Li
  - Department of Biotherapy, Cancer Center and State Key Laboratory of Biotherapy, West China Hospital, Sichuan University, Chengdu, Sichuan, 610041, China
- Xinyue Deng
  - Department of Biotherapy, Cancer Center and State Key Laboratory of Biotherapy, West China Hospital, Sichuan University, Chengdu, Sichuan, 610041, China
- Ziyi Xia
  - Department of Biotherapy, Cancer Center and State Key Laboratory of Biotherapy, West China Hospital, Sichuan University, Chengdu, Sichuan, 610041, China
- Rui Yao
  - Department of Biotherapy, Cancer Center and State Key Laboratory of Biotherapy, West China Hospital, Sichuan University, Chengdu, Sichuan, 610041, China
- Xiaoxi Zeng
  - West China Biomedical Big Data Center, West China Hospital, Sichuan University, Chengdu, Sichuan, 610041, China
- Liyun Zhang
  - Lead Generation Unit, HitGen Inc., Tianfu International Bio-Town, Shuangliu District, Chengdu, Sichuan, 610200, China
- Yi Zhao
  - Key Laboratory of Intelligent Information Processing, Advanced Computer Research Center, Institute of Computing Technology, Chinese Academy of Sciences, Beijing, 100190, China
- Jian Lei
  - Department of Biotherapy, Cancer Center and State Key Laboratory of Biotherapy, West China Hospital, Sichuan University, Chengdu, Sichuan, 610041, China
  - National Clinical Research Center for Geriatrics, West China Hospital, Sichuan University, Chengdu, Sichuan, 610041, China
- Runsheng Chen
  - West China Biomedical Big Data Center, West China Hospital, Sichuan University, Chengdu, Sichuan, 610041, China
  - Key Laboratory of RNA Biology, Center for Big Data Research in Health, Institute of Biophysics, Chinese Academy of Sciences, Beijing, 100101, China
3. Kyro GW, Martin MT, Watt ED, Batista VS. CardioGenAI: a machine learning-based framework for re-engineering drugs for reduced hERG liability. J Cheminform 2025; 17:30. [PMID: 40045386 PMCID: PMC11881490 DOI: 10.1186/s13321-025-00976-8] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/11/2024] [Accepted: 02/21/2025] [Indexed: 03/09/2025] Open
Abstract
The link between in vitro hERG ion channel inhibition and subsequent in vivo QT interval prolongation, a critical risk factor for the development of arrhythmias such as Torsades de Pointes, is so well established that in vitro hERG activity alone is often sufficient to end the development of an otherwise promising drug candidate. It is therefore of tremendous interest to develop advanced methods for identifying hERG-active compounds in the early stages of drug development, as well as for proposing redesigned compounds with reduced hERG liability and preserved primary pharmacology. In this work, we present CardioGenAI, a machine learning-based framework for re-engineering both developmental and commercially available drugs for reduced hERG activity while preserving their pharmacological activity. The framework incorporates novel state-of-the-art discriminative models for predicting hERG channel activity, as well as activity against the voltage-gated NaV1.5 and CaV1.2 channels due to their potential implications in modulating the arrhythmogenic potential induced by hERG channel blockade. We applied the complete framework to pimozide, an FDA-approved antipsychotic agent that demonstrates high affinity to the hERG channel, and generated 100 refined candidates. Remarkably, among the candidates is fluspirilene, a compound which is of the same class of drugs as pimozide (diphenylmethanes) and therefore has similar pharmacological activity, yet exhibits over 700-fold weaker binding to hERG. Furthermore, we demonstrated the framework's ability to optimize hERG, NaV1.5 and CaV1.2 profiles of multiple FDA-approved compounds while maintaining the physicochemical nature of the original drugs. We envision that this method can effectively be applied to developmental compounds exhibiting hERG liabilities to provide a means of rescuing drug development programs that have stalled due to hERG-related safety concerns. Additionally, the discriminative models can also serve independently as effective components of virtual screening pipelines. We have made all of our software open-source at https://github.com/gregory-kyro/CardioGenAI to facilitate integration of the CardioGenAI framework for molecular hypothesis generation into drug discovery workflows.
Scientific contribution
This work introduces CardioGenAI, an open-source machine learning-based framework designed to re-engineer drugs for reduced hERG liability while preserving their pharmacological activity. The complete CardioGenAI framework can be applied to developmental compounds exhibiting hERG liabilities to provide a means of rescuing drug discovery programs facing hERG-related challenges. In addition, the framework incorporates novel state-of-the-art discriminative models for predicting hERG, NaV1.5 and CaV1.2 channel activity, which can function independently as effective components of virtual screening pipelines.
Affiliation(s)
- Gregory W Kyro
  - Department of Chemistry, Yale University, New Haven, CT, 06511, USA
  - Drug Safety Research & Development, Pfizer Research & Development, Groton, CT, 06340, USA
- Matthew T Martin
  - Drug Safety Research & Development, Pfizer Research & Development, Groton, CT, 06340, USA
- Eric D Watt
  - Drug Safety Research & Development, Pfizer Research & Development, Groton, CT, 06340, USA
- Victor S Batista
  - Department of Chemistry, Yale University, New Haven, CT, 06511, USA
4. Li J, Zhang O, Sun K, Wang Y, Guan X, Bagni D, Haghighatlari M, Kearns FL, Parks C, Amaro RE, Head-Gordon T. Mining for Potent Inhibitors through Artificial Intelligence and Physics: A Unified Methodology for Ligand Based and Structure Based Drug Design. J Chem Inf Model 2024; 64:9082-9097. [PMID: 38843070 PMCID: PMC11683870 DOI: 10.1021/acs.jcim.4c00634] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 04/13/2024] [Revised: 05/19/2024] [Accepted: 05/21/2024] [Indexed: 12/11/2024]
Abstract
Determining the viability of a new drug molecule is a time- and resource-intensive task that makes computer-aided assessments a vital approach to rapid drug discovery. Here we develop a machine learning algorithm, iMiner, that generates novel inhibitor molecules for target proteins by combining deep reinforcement learning with real-time 3D molecular docking using AutoDock Vina, thereby simultaneously creating chemical novelty while constraining molecules for shape and molecular compatibility with target active sites. Moreover, through the use of various types of reward functions, we have introduced novelty in generative tasks for new molecules such as chemical similarity to a target ligand, molecules grown from known protein bound fragments, and creation of molecules that enforce interactions with target residues in the protein active site. The iMiner algorithm is embedded in a composite workflow that filters out Pan-assay interference compounds, Lipinski rule violations, uncommon structures in medicinal chemistry, and poor synthetic accessibility with options for cross-validation against other docking scoring functions and automation of a molecular dynamics simulation to measure pose stability. We also allow users to define a set of rules for the structures they would like to exclude during the training process and postfiltering steps. Because our approach relies only on the structure of the target protein, iMiner can be easily adapted for the future development of other inhibitors or small molecule therapeutics of any target protein.
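One of the post-generation filters named in the abstract is a check for Lipinski rule violations. The sketch below illustrates that kind of filter using the commonly cited rule-of-five thresholds; it is not iMiner's implementation. The `Descriptors` class, the function names, and the two example molecules are hypothetical, and the descriptor values are supplied by hand rather than computed from structures.

```python
from dataclasses import dataclass


@dataclass
class Descriptors:
    """Precomputed molecular descriptors (hypothetical container)."""
    mol_weight: float   # molecular weight in Da
    logp: float         # octanol-water partition coefficient
    h_donors: int       # hydrogen-bond donors
    h_acceptors: int    # hydrogen-bond acceptors


def lipinski_violations(d: Descriptors) -> int:
    """Count how many of the four rule-of-five thresholds are exceeded."""
    checks = [
        d.mol_weight > 500,
        d.logp > 5,
        d.h_donors > 5,
        d.h_acceptors > 10,
    ]
    return sum(checks)


def passes_filter(d: Descriptors, max_violations: int = 0) -> bool:
    """Keep a molecule only if it has at most max_violations violations."""
    return lipinski_violations(d) <= max_violations


# Two made-up molecules, for illustration only.
small = Descriptors(mol_weight=350.4, logp=2.1, h_donors=2, h_acceptors=5)
bulky = Descriptors(mol_weight=720.9, logp=6.3, h_donors=3, h_acceptors=12)

kept = [d for d in (small, bulky) if passes_filter(d)]
print(len(kept))  # prints 1
```

The `max_violations` parameter is an assumption added for flexibility: some workflows tolerate a single violation, while the default here rejects any.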
Affiliation(s)
- Jie Li
  - Pitzer Center for Theoretical Chemistry, Department of Chemistry, University of California, Berkeley, California 94720, United States
- Oufan Zhang
  - Pitzer Center for Theoretical Chemistry, Department of Chemistry, University of California, Berkeley, California 94720, United States
- Kunyang Sun
  - Pitzer Center for Theoretical Chemistry, Department of Chemistry, University of California, Berkeley, California 94720, United States
- Yingze Wang
  - Pitzer Center for Theoretical Chemistry, Department of Chemistry, University of California, Berkeley, California 94720, United States
- Xingyi Guan
  - Pitzer Center for Theoretical Chemistry, Department of Chemistry, University of California, Berkeley, California 94720, United States
- Dorian Bagni
  - Pitzer Center for Theoretical Chemistry, Department of Chemistry, University of California, Berkeley, California 94720, United States
- Mojtaba Haghighatlari
  - Pitzer Center for Theoretical Chemistry, Department of Chemistry, University of California, Berkeley, California 94720, United States
- Fiona L. Kearns
  - Department of Chemistry and Biochemistry, University of California, San Diego, La Jolla, California 92093, United States
- Conor Parks
  - Department of Chemistry and Biochemistry, University of California, San Diego, La Jolla, California 92093, United States
- Rommie E. Amaro
  - Department of Chemistry and Biochemistry, University of California, San Diego, La Jolla, California 92093, United States
- Teresa Head-Gordon
  - Pitzer Center for Theoretical Chemistry, Department of Chemistry, University of California, Berkeley, California 94720, United States
  - Departments of Bioengineering and Chemical and Biomolecular Engineering, University of California, Berkeley, California 94720, United States
5. Ozawa M, Nakamura S, Yasuo N, Sekijima M. IEV2Mol: Molecular Generative Model Considering Protein-Ligand Interaction Energy Vectors. J Chem Inf Model 2024; 64:6969-6978. [PMID: 39254942 PMCID: PMC11423338 DOI: 10.1021/acs.jcim.4c00842] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 09/11/2024]
Abstract
Generating drug candidates with desired protein-ligand interactions is a significant challenge in structure-based drug design. In this study, a new generative model, IEV2Mol, is proposed that incorporates interaction energy vectors (IEVs) between proteins and ligands obtained from docking simulations, which quantitatively capture the strength of each interaction type, such as hydrogen bonds, electrostatic interactions, and van der Waals forces. By integrating this IEV into an end-to-end variational autoencoder (VAE) framework that learns the chemical space from SMILES and minimizes the reconstruction error of the SMILES, the model can more accurately generate compounds with the desired interactions. To evaluate the effectiveness of IEV2Mol, we performed benchmark comparisons with randomly selected compounds, unconstrained VAE models (JT-VAE), and compounds generated by RNN models based on interaction fingerprints (IFP-RNN). The results show that the compounds generated by IEV2Mol retain a significantly greater percentage of the binding mode of the query structure than those of the other methods. Furthermore, IEV2Mol was able to generate compounds with interactions similar to those of the input compounds, regardless of structural similarity. The source code and trained models for IEV2Mol, JT-VAE, and IFP-RNN designed for generating compounds active against the DRD2, AA2AR, and AKT1, as well as the data sets (DM-QP-1M, active compounds to each protein, and ChEMBL33) utilized in this study, are released under the MIT License and available at https://github.com/sekijima-lab/IEV2Mol.
Affiliation(s)
- Mami Ozawa
  - Department of Computer Science, Tokyo Institute of Technology, Yokohama, Kanagawa 226-8501, Japan
- Shogo Nakamura
  - Department of Life Science and Technology, Tokyo Institute of Technology, Yokohama, Kanagawa 226-8501, Japan
- Nobuaki Yasuo
  - Academy for Convergence of Materials and Informatics (TAC-MI), Tokyo Institute of Technology, Tokyo 152-8550, Japan
- Masakazu Sekijima
  - Department of Computer Science, Tokyo Institute of Technology, Yokohama, Kanagawa 226-8501, Japan
6. Yang Y, Chen G, Li J, Li J, Zhang O, Zhang X, Li L, Hao J, Wang E, Heng PA. Enabling target-aware molecule generation to follow multi objectives with Pareto MCTS. Commun Biol 2024; 7:1074. [PMID: 39223327 PMCID: PMC11368924 DOI: 10.1038/s42003-024-06746-w] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/03/2024] [Accepted: 08/16/2024] [Indexed: 09/04/2024] Open
Abstract
Target-aware drug discovery has greatly accelerated the drug discovery process to design small-molecule ligands with high binding affinity to disease-related protein targets. Conditioned on targeted proteins, previous works utilize various kinds of deep generative models and have shown great potential in generating molecules with strong protein-ligand binding interactions. However, beyond binding affinity, effective drug molecules must manifest other essential properties such as high drug-likeness, which are not explicitly addressed by current target-aware generative methods. In this article, aiming to bridge the gap of multi-objective target-aware molecule generation in the field of deep learning-based drug discovery, we propose ParetoDrug, a Pareto Monte Carlo Tree Search (MCTS) generation algorithm. ParetoDrug searches molecules on the Pareto Front in chemical space using MCTS to enable synchronous optimization of multiple properties. Specifically, ParetoDrug utilizes pretrained atom-by-atom autoregressive generative models for the exploration guidance to desired molecules during MCTS searching. Besides, when selecting the next atom symbol, a scheme named ParetoPUCT is proposed to balance exploration and exploitation. Benchmark experiments and case studies demonstrate that ParetoDrug is highly effective in traversing the large and complex chemical space to discover novel compounds with satisfactory binding affinities and drug-like properties for various multi-objective target-aware drug discovery tasks.
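The abstract relies on the notion of a Pareto front: the set of candidates that no other candidate beats on every objective at once. The sketch below illustrates only that concept, not ParetoDrug's MCTS search or its ParetoPUCT scheme. The scores are made up, and both objectives are assumed to be "higher is better".

```python
from typing import List, Sequence, Tuple

Scores = Tuple[float, ...]


def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """True if a is at least as good as b everywhere and strictly better somewhere."""
    at_least_as_good = all(x >= y for x, y in zip(a, b))
    strictly_better = any(x > y for x, y in zip(a, b))
    return at_least_as_good and strictly_better


def pareto_front(points: List[Scores]) -> List[Scores]:
    """Return the points that are not dominated by any other point."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


# Hypothetical (binding score, drug-likeness) pairs for five candidates.
candidates = [(7.5, 0.6), (8.0, 0.4), (6.0, 0.9), (7.0, 0.5), (8.0, 0.3)]
front = pareto_front(candidates)
print(front)  # prints [(7.5, 0.6), (8.0, 0.4), (6.0, 0.9)]
```

Here `(7.0, 0.5)` is dominated by `(7.5, 0.6)` and `(8.0, 0.3)` by `(8.0, 0.4)`, so neither appears on the front. The pairwise comparison is quadratic in the number of candidates, which is fine for a small illustration.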
Affiliation(s)
- Yaodong Yang
  - Department of Computer Science and Engineering, The Chinese University of Hong Kong, Hong Kong, China
- Jinpeng Li
  - Department of Computer Science and Engineering, The Chinese University of Hong Kong, Hong Kong, China
- Jianye Hao
  - Noah's Ark Lab, Huawei, Shenzhen, China
- Pheng-Ann Heng
  - Department of Computer Science and Engineering, The Chinese University of Hong Kong, Hong Kong, China
7. Choudhary R, Mahadevan R. FOCUS on NOD2: Advancing IBD Drug Discovery with a User-Informed Machine Learning Framework. ACS Med Chem Lett 2024; 15:1057-1070. [PMID: 39015268 PMCID: PMC11247655 DOI: 10.1021/acsmedchemlett.4c00148] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/02/2024] [Revised: 05/17/2024] [Accepted: 06/03/2024] [Indexed: 07/18/2024] Open
Abstract
In this study, we introduce the Framework for Optimized Customizable User-Informed Synthesis (FOCUS), a generative machine learning model tailored for drug discovery. FOCUS integrates domain expertise and uses Proximal Policy Optimization (PPO) to guide Monte Carlo Tree Search (MCTS) to efficiently explore chemical space. It generates SMILES representations of potential drug candidates, optimizing for druggability and binding efficacy to NOD2, PEP, and MCT1 receptors. The model is highly interpretable, allowing for user feedback and expert-driven adjustments based on detailed cycle reports. Employing tools like SHAP and LIME, FOCUS provides a transparent analysis of decision-making processes, emphasizing features such as docking scores and interaction fingerprints. Comparative studies with Muramyl Dipeptide (MDP) demonstrate improved interaction profiles. FOCUS merges advanced machine learning with expert insight, accelerating the drug discovery pipeline.
Affiliation(s)
- Ruhi Choudhary
  - Department of Chemical Engineering and Applied Chemistry, University of Toronto, Toronto, Ontario M5S 3E5, Canada
- Radhakrishnan Mahadevan
  - Department of Chemical Engineering and Applied Chemistry, University of Toronto, Toronto, Ontario M5S 3E5, Canada
8. Zhang X, Sheng Y, Liu X, Yang J, Goddard WA III, Ye C, Zhang W. Polymer-Unit Graph: Advancing Interpretability in Graph Neural Network Machine Learning for Organic Polymer Semiconductor Materials. J Chem Theory Comput 2024; 20:2908-2920. [PMID: 38551455 DOI: 10.1021/acs.jctc.3c01385] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 04/10/2024]
Abstract
The graph representation of complex materials plays a crucial role in the field of inorganic and organic materials investigations for developing data-centric materials science, such as those using graph neural networks (GNNs). However, the currently prevalent GNN models are primarily employed for investigating periodic crystals and organic small molecule data, yet they still encounter challenges in terms of interpretability and computational efficiency when applied to polymer monomers and organic macromolecules data. There is still a lack of graph representation of organic polymers and macromolecules specifically tailored for GNN models to explore the structural characteristics. The Polymer-unit Graph, a novel coarse-grained graph representation method introduced in this study, is dedicated to expressing and analyzing polymers and macromolecules. By incorporating the Polymer-unit Graph into the GNN models and analyzing the organic semiconductor (OSC) materials database, it becomes possible to uncover intricate structure-property relationships involving branched-chain engineering, fluoridation substitution, and donor-acceptor combination effects on the elementary structure of OSC polymers. Furthermore, the Polymer-unit Graph enables visualizing the relationship between target properties and polymer units while reducing training time by an impressive 98% and minimizing molecular graph representation models. In conclusion, the Polymer-unit Graph successfully integrates the concept of Polymer-unit into the field of GNNs, enabling more accurate analysis and understanding of organic polymers and macromolecules.
Affiliation(s)
- Xinyue Zhang
  - Department of Materials Science and Engineering & Guangdong Provincial Key Laboratory of Computational Science and Material Design, Southern University of Science and Technology, Shenzhen 518055, PR China
- Ye Sheng
  - Department of Materials Science and Engineering & Guangdong Provincial Key Laboratory of Computational Science and Material Design, Southern University of Science and Technology, Shenzhen 518055, PR China
- Xiumin Liu
  - Department of Materials Science and Engineering & Guangdong Provincial Key Laboratory of Computational Science and Material Design, Southern University of Science and Technology, Shenzhen 518055, PR China
  - Key Laboratory of Soft Chemistry and Functional Materials of MOE, School of Chemistry and Chemical Engineering, Nanjing University of Science and Technology, Nanjing 210094, PR China
- Jiong Yang
  - Materials Genome Institute, Shanghai University, Shanghai 200444, PR China
- William A Goddard III
  - Materials and Process Simulation Center (MSC), California Institute of Technology, Pasadena, California 91125, United States
- Caichao Ye
  - Department of Materials Science and Engineering & Guangdong Provincial Key Laboratory of Computational Science and Material Design, Southern University of Science and Technology, Shenzhen 518055, PR China
  - Academy for Advanced Interdisciplinary Studies, Southern University of Science and Technology, Shenzhen 518055, PR China
- Wenqing Zhang
  - Department of Materials Science and Engineering & Guangdong Provincial Key Laboratory of Computational Science and Material Design, Southern University of Science and Technology, Shenzhen 518055, PR China
9. Pang C, Qiao J, Zeng X, Zou Q, Wei L. Deep Generative Models in De Novo Drug Molecule Generation. J Chem Inf Model 2024; 64:2174-2194. [PMID: 37934070 DOI: 10.1021/acs.jcim.3c01496] [Citation(s) in RCA: 20] [Impact Index Per Article: 20.0] [Indexed: 11/08/2023]
Abstract
The discovery of new drugs has important implications for human health. Traditional methods for drug discovery rely on experiments to optimize the structure of lead molecules, which is time-consuming and costly. Recently, artificial intelligence has exhibited promising and efficient performance for drug-like molecule generation. In particular, deep generative models achieve great success in de novo generation of drug-like molecules with desired properties, showing massive potential for novel drug discovery. In this study, we review the recent progress of molecule generation using deep generative models, mainly focusing on molecule representations, public databases, data processing tools, and advanced artificial intelligence based molecule generation frameworks. In particular, we present a comprehensive comparison of state-of-the-art deep generative models for molecule generation and a summary of commonly used molecular design strategies. We identify research gaps and challenges of molecule generation such as the need for better databases, missing 3D information in molecular representation, and the lack of high-precision evaluation metrics. We suggest future directions for molecular generation and drug discovery.
Affiliation(s)
- Chao Pang
  - School of Software, Shandong University, Jinan 250100, China
  - Joint SDU-NTU Centre for Artificial Intelligence Research (C-FAIR), Shandong University, Jinan 250100, China
- Jianbo Qiao
  - School of Software, Shandong University, Jinan 250100, China
  - Joint SDU-NTU Centre for Artificial Intelligence Research (C-FAIR), Shandong University, Jinan 250100, China
- Xiangxiang Zeng
  - College of Information Science and Engineering, Hunan University, Changsha 410082, China
- Quan Zou
  - Institute of Fundamental and Frontier Sciences, University of Electronic Science and Technology of China, Chengdu 610054, China
- Leyi Wei
  - School of Software, Shandong University, Jinan 250100, China
  - Joint SDU-NTU Centre for Artificial Intelligence Research (C-FAIR), Shandong University, Jinan 250100, China
10. Ghiandoni GM, Evertsson E, Riley DJ, Tyrchan C, Rathi PC. Augmenting DMTA using predictive AI modelling at AstraZeneca. Drug Discov Today 2024; 29:103945. [PMID: 38460568 DOI: 10.1016/j.drudis.2024.103945] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/22/2023] [Revised: 02/27/2024] [Accepted: 03/05/2024] [Indexed: 03/11/2024]
Abstract
Design-Make-Test-Analyse (DMTA) is the discovery cycle through which molecules are designed, synthesised, and assayed to produce data that in turn are analysed to inform the next iteration. The process is repeated until viable drug candidates are identified, often requiring many cycles before reaching a sweet spot. The advent of artificial intelligence (AI) and cloud computing presents an opportunity to innovate drug discovery to reduce the number of cycles needed to yield a candidate. Here, we present the Predictive Insight Platform (PIP), a cloud-native modelling platform developed at AstraZeneca. The impact of PIP in each step of DMTA, as well as its architecture, integration, and usage, are discussed and used to provide insights into the future of drug discovery.
Affiliation(s)
- Gian Marco Ghiandoni
  - Augmented DMTA Platform, R&D IT, AstraZeneca, The Discovery Centre (DISC), Francis Crick Avenue, Cambridge CB2 0AA, UK
- Emma Evertsson
  - Research and Early Development, Respiratory and Immunology (R&I), Biopharmaceuticals R&D, AstraZeneca, Pepparedsleden, Mölndal, SE 43183, Sweden
- David J Riley
  - Augmented DMTA Platform, R&D IT, AstraZeneca, The Discovery Centre (DISC), Francis Crick Avenue, Cambridge CB2 0AA, UK
- Christian Tyrchan
  - Research and Early Development, Respiratory and Immunology (R&I), Biopharmaceuticals R&D, AstraZeneca, Pepparedsleden, Mölndal, SE 43183, Sweden
- Prakash Chandra Rathi
  - Augmented DMTA Platform, R&D IT, AstraZeneca, The Discovery Centre (DISC), Francis Crick Avenue, Cambridge CB2 0AA, UK
11. Wang M, Wu Z, Wang J, Weng G, Kang Y, Pan P, Li D, Deng Y, Yao X, Bing Z, Hsieh CY, Hou T. Genetic Algorithm-Based Receptor Ligand: A Genetic Algorithm-Guided Generative Model to Boost the Novelty and Drug-Likeness of Molecules in a Sampling Chemical Space. J Chem Inf Model 2024; 64:1213-1228. [PMID: 38302422 DOI: 10.1021/acs.jcim.3c01964] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 02/03/2024]
Abstract
Deep learning-based de novo molecular design has recently gained significant attention. While numerous DL-based generative models have been successfully developed for designing novel compounds, the majority of the generated molecules lack sufficiently novel scaffolds or high drug-like profiles. The aforementioned issues may not be fully captured by commonly used metrics for the assessment of molecular generative models, such as novelty, diversity, and quantitative estimation of the drug-likeness score. To address these limitations, we proposed a genetic algorithm-guided generative model called GARel (genetic algorithm-based receptor-ligand interaction generator), a novel framework for training a DL-based generative model to produce drug-like molecules with novel scaffolds. To efficiently train the GARel model, we utilized dense net to update the parameters based on molecules with novel scaffolds and drug-like features. To demonstrate the capability of the GARel model, we used it to design inhibitors for three targets: AA2AR, EGFR, and SARS-CoV-2. The results indicate that GARel-generated molecules feature more diverse and novel scaffolds and possess more desirable physicochemical properties and favorable docking scores. Compared with other generative models, GARel makes significant progress in balancing novelty and drug-likeness, providing a promising direction for the further development of DL-based de novo design methodology with potential impacts on drug discovery.
Affiliation(s)
- Mingyang Wang
  - College of Pharmaceutical Sciences and Cancer Center, Zhejiang University, Hangzhou 310058, Zhejiang, China
  - CarbonSilicon AI Technology Co., Ltd., Hangzhou 310018, Zhejiang, China
- Zhengjian Wu
  - College of Pharmaceutical Sciences and Cancer Center, Zhejiang University, Hangzhou 310058, Zhejiang, China
  - School of Computer Science, Wuhan University, Wuhan 430072, Hubei, China
- Jike Wang
  - College of Pharmaceutical Sciences and Cancer Center, Zhejiang University, Hangzhou 310058, Zhejiang, China
  - CarbonSilicon AI Technology Co., Ltd., Hangzhou 310018, Zhejiang, China
- Gaoqi Weng
  - College of Pharmaceutical Sciences and Cancer Center, Zhejiang University, Hangzhou 310058, Zhejiang, China
- Yu Kang
  - College of Pharmaceutical Sciences and Cancer Center, Zhejiang University, Hangzhou 310058, Zhejiang, China
- Peichen Pan
  - College of Pharmaceutical Sciences and Cancer Center, Zhejiang University, Hangzhou 310058, Zhejiang, China
- Dan Li
  - College of Pharmaceutical Sciences and Cancer Center, Zhejiang University, Hangzhou 310058, Zhejiang, China
- Yafeng Deng
  - CarbonSilicon AI Technology Co., Ltd., Hangzhou 310018, Zhejiang, China
- Xiaojun Yao
  - Dr. Neher's Biophysics Laboratory for Innovative Drug Discovery, Macau Institute for Applied Research in Medicine and Health, State Key Laboratory of Quality Research in Chinese Medicine, Macau University of Science and Technology, Taipa, Macau 999078, China
- Zhitong Bing
  - Institute of Modern Physics, Chinese Academy of Sciences, Lanzhou, Gansu 730000, China
- Chang-Yu Hsieh
  - College of Pharmaceutical Sciences and Cancer Center, Zhejiang University, Hangzhou 310058, Zhejiang, China
- Tingjun Hou
  - College of Pharmaceutical Sciences and Cancer Center, Zhejiang University, Hangzhou 310058, Zhejiang, China
|
12
|
Kyro GW, Morgunov A, Brent RI, Batista VS. ChemSpaceAL: An Efficient Active Learning Methodology Applied to Protein-Specific Molecular Generation. J Chem Inf Model 2024; 64:653-665. [PMID: 38287889 DOI: 10.1021/acs.jcim.3c01456] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/31/2024]
Abstract
The incredible capabilities of generative artificial intelligence models have inevitably led to their application in the domain of drug discovery. Within this domain, the vastness of chemical space motivates the development of more efficient methods for identifying regions with molecules that exhibit desired characteristics. In this work, we present a computationally efficient active learning methodology and demonstrate its applicability to targeted molecular generation. When applied to c-Abl kinase, a protein with FDA-approved small-molecule inhibitors, the model learns to generate molecules similar to the inhibitors without prior knowledge of their existence and even reproduces two of them exactly. We also show that the methodology is effective for a protein without any commercially available small-molecule inhibitors, the HNH domain of the CRISPR-associated protein 9 (Cas9) enzyme. To facilitate implementation and reproducibility, we made all of our software available through the open-source ChemSpaceAL Python package.
Affiliation(s)
- Gregory W Kyro
  - Department of Chemistry, Yale University, New Haven, Connecticut 06511-8499, United States
- Anton Morgunov
  - Department of Chemistry, Yale University, New Haven, Connecticut 06511-8499, United States
- Rafael I Brent
  - Department of Chemistry, Yale University, New Haven, Connecticut 06511-8499, United States
- Victor S Batista
  - Department of Chemistry, Yale University, New Haven, Connecticut 06511-8499, United States
|
13
|
Barghout RA, Xu Z, Betala S, Mahadevan R. Advances in generative modeling methods and datasets to design novel enzymes for renewable chemicals and fuels. Curr Opin Biotechnol 2023; 84:103007. [PMID: 37931573 DOI: 10.1016/j.copbio.2023.103007] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/19/2023] [Revised: 09/12/2023] [Accepted: 09/13/2023] [Indexed: 11/08/2023]
Abstract
Biotechnology has revolutionized the development of sustainable energy sources by harnessing biomass as a feedstock for energy production. However, challenges such as recalcitrant feedstocks and inefficient metabolic pathways hinder the large-scale integration of renewable energy systems. Enzyme engineering has emerged as a powerful tool to address these challenges by enhancing enzyme activity, specificity, and stability. Generative machine learning (ML) models have shown great promise in accelerating protein design, allowing for the generation of novel protein sequences with desired properties by navigating vast spaces. This review paper aims to summarize the state of the art in generative models for protein design and how they can be applied to bioenergy applications, including the underlying architectures and training strategies. Additionally, it highlights the importance of high-quality datasets for training and evaluating generative models, organizes available datasets for generative protein design, and discusses the potential of applying generative models to strain design for bioenergy production.
Affiliation(s)
- Rana A Barghout
  - Department of Chemical Engineering & Applied Chemistry, University of Toronto, 200 College St, Toronto, ON, Canada
- Zhiqing Xu
  - Department of Chemical Engineering & Applied Chemistry, University of Toronto, 200 College St, Toronto, ON, Canada
- Siddharth Betala
  - Department of Biotechnology, Bhupat and Jyoti Mehta School of Biosciences, Indian Institute of Technology Madras, Chennai, India
- Radhakrishnan Mahadevan
  - Department of Chemical Engineering & Applied Chemistry, University of Toronto, 200 College St, Toronto, ON, Canada
|
14
|
Baillif B, Cole J, McCabe P, Bender A. Deep generative models for 3D molecular structure. Curr Opin Struct Biol 2023; 80:102566. [DOI: 10.1016/j.sbi.2023.102566] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/23/2022] [Revised: 02/05/2023] [Accepted: 02/15/2023] [Indexed: 03/30/2023]
|
15
|
Pliushcheuskaya P, Künze G. Recent Advances in Computer-Aided Structure-Based Drug Design on Ion Channels. Int J Mol Sci 2023; 24:ijms24119226. [PMID: 37298178 DOI: 10.3390/ijms24119226] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 04/19/2023] [Revised: 05/16/2023] [Accepted: 05/22/2023] [Indexed: 06/12/2023] Open
Abstract
Ion channels play important roles in fundamental biological processes, such as electric signaling in cells, muscle contraction, hormone secretion, and regulation of the immune response. Targeting ion channels with drugs represents a treatment option for neurological and cardiovascular diseases, muscular degradation disorders, and pathologies related to disturbed pain sensation. While there are more than 300 different ion channels in the human organism, drugs have been developed only for some of them and currently available drugs lack selectivity. Computational approaches are an indispensable tool for drug discovery and can speed up, especially, the early development stages of lead identification and optimization. The number of molecular structures of ion channels has considerably increased over the last ten years, providing new opportunities for structure-based drug development. This review summarizes important knowledge about ion channel classification, structure, mechanisms, and pathology with the main focus on recent developments in the field of computer-aided, structure-based drug design on ion channels. We highlight studies that link structural data with modeling and chemoinformatic approaches for the identification and characterization of new molecules targeting ion channels. These approaches hold great potential to advance research on ion channel drugs in the future.
Affiliation(s)
- Palina Pliushcheuskaya
  - Institute for Drug Discovery, Medical Faculty, University of Leipzig, Brüderstr. 34, D-04103 Leipzig, Germany
- Georg Künze
  - Institute for Drug Discovery, Medical Faculty, University of Leipzig, Brüderstr. 34, D-04103 Leipzig, Germany
  - Interdisciplinary Center for Bioinformatics, University of Leipzig, Härtelstr. 16-18, D-04107 Leipzig, Germany
|
16
|
Danel T, Łęski J, Podlewska S, Podolak IT. Docking-based generative approaches in the search for new drug candidates. Drug Discov Today 2023; 28:103439. [PMID: 36372330 DOI: 10.1016/j.drudis.2022.103439] [Citation(s) in RCA: 7] [Impact Index Per Article: 3.5] [Received: 06/30/2022] [Revised: 10/08/2022] [Accepted: 11/08/2022] [Indexed: 11/13/2022]
Abstract
Despite the popularity of virtual screening (VS) of existing compound libraries, the search for new potential drug candidates also takes advantage of generative protocols, where new compound suggestions are enumerated using various algorithms. To increase the activity potency of generative approaches, they have recently been coupled with molecular docking, a leading methodology of structure-based drug design (SBDD). In this review, we summarize progress since docking-based generative models emerged. We propose a new taxonomy for these methods and discuss their importance for the field of computer-aided drug design (CADD). In addition, we discuss the most promising directions for the further development of generative protocols coupled with docking.
Affiliation(s)
- Tomasz Danel
  - Faculty of Mathematics and Computer Science, Jagiellonian University, 6 Łojasiewicza Street, 30-348 Kraków, Poland
- Jan Łęski
  - Faculty of Mathematics and Computer Science, Jagiellonian University, 6 Łojasiewicza Street, 30-348 Kraków, Poland
- Sabina Podlewska
  - Maj Institute of Pharmacology, Polish Academy of Sciences, Department of Medicinal Chemistry, 31-343 Kraków, Smętna Street 12, Poland
- Igor T Podolak
  - Faculty of Mathematics and Computer Science, Jagiellonian University, 6 Łojasiewicza Street, 30-348 Kraków, Poland
|
17
|
Chan L, Kumar R, Verdonk M, Poelking C. A multilevel generative framework with hierarchical self-contrasting for bias control and transparency in structure-based ligand design. Nat Mach Intell 2022. [DOI: 10.1038/s42256-022-00564-7] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 12/12/2022]
|
18
|
Interpretable Machine Learning Models for Molecular Design of Tyrosine Kinase Inhibitors Using Variational Autoencoders and Perturbation-Based Approach of Chemical Space Exploration. Int J Mol Sci 2022; 23:ijms231911262. [PMID: 36232566 PMCID: PMC9569663 DOI: 10.3390/ijms231911262] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Received: 07/29/2022] [Revised: 09/21/2022] [Accepted: 09/21/2022] [Indexed: 11/17/2022] Open
Abstract
In the current study, we introduce an integrative machine learning strategy for the autonomous molecular design of protein kinase inhibitors using variational autoencoders and a novel cluster-based perturbation approach for exploration of the chemical latent space. The proposed strategy combines autoencoder-based embedding of small molecules with a cluster-based perturbation approach for efficient navigation of the latent space and a feature-based kinase inhibition likelihood classifier that guides optimization of the molecular properties and targeted molecular design. In the proposed generative approach, molecules sharing similar structures tend to cluster in the latent space, and interpolating between two molecules in the latent space enables smooth changes in the molecular structures and properties. The results demonstrated that the proposed strategy can efficiently explore the latent space of small molecules and kinase inhibitors along interpretable directions to guide the generation of novel family-specific kinase molecules that display a significant scaffold diversity and optimal biochemical properties. Through assessment of the latent-based and chemical feature-based binary and multiclass classifiers, we developed a robust probabilistic evaluator of kinase inhibition likelihood that is specifically tailored to guide the molecular design of novel SRC kinase molecules. The generated molecules originating from LCK and ABL1 kinase inhibitors yielded ~40% of novel and valid SRC kinase compounds with high kinase inhibition likelihood probability values (p > 0.75) and high similarity (Tanimoto coefficient > 0.6) to the known SRC inhibitors. By combining the molecular perturbation design with the kinase inhibition likelihood analysis and similarity assessments, we showed that the proposed molecular design strategy can produce novel valid molecules and transform known inhibitors of different kinase families into potential chemical probes of the SRC kinase with excellent physicochemical profiles and high similarity to the known SRC kinase drugs. The results of our study suggest that task-specific manipulation of a biased latent space may be an important direction for more effective task-oriented and target-specific autonomous chemical design models.
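The two thresholds quoted in this abstract (p > 0.75 and Tanimoto coefficient > 0.6) can be illustrated with a minimal sketch. It is not the authors' pipeline: fingerprints are assumed to be plain sets of on-bit indices, the inhibition probabilities are assumed to come from some external classifier, and the names `tanimoto` and `keep_candidates` are hypothetical.

```python
from typing import Iterable, List, Set, Tuple


def tanimoto(a: Set[int], b: Set[int]) -> float:
    """Tanimoto coefficient of two fingerprints given as sets of on-bit indices."""
    if not a and not b:
        return 0.0  # convention chosen here for two empty fingerprints
    shared = len(a & b)
    return shared / (len(a) + len(b) - shared)


def keep_candidates(
    candidates: Iterable[Tuple[str, Set[int], float]],
    known_inhibitors: List[Set[int]],
    min_prob: float = 0.75,  # threshold quoted in the abstract
    min_sim: float = 0.6,    # threshold quoted in the abstract
) -> List[str]:
    """Return names of candidates that pass both thresholds.

    Each candidate is (name, fingerprint, predicted inhibition probability).
    Similarity is the best Tanimoto score against any known inhibitor.
    """
    kept = []
    for name, fingerprint, prob in candidates:
        best = max(
            (tanimoto(fingerprint, ref) for ref in known_inhibitors),
            default=0.0,
        )
        if prob > min_prob and best > min_sim:
            kept.append(name)
    return kept


if __name__ == "__main__":
    # Toy data, invented purely for illustration.
    known = [{1, 2, 3, 4, 5}]
    generated = [
        ("mol_a", {1, 2, 3, 4}, 0.90),  # similar and likely
        ("mol_b", {1, 2, 3, 4}, 0.50),  # similar but unlikely
        ("mol_c", {7, 8}, 0.99),        # likely but dissimilar
    ]
    print(keep_candidates(generated, known))
```

In practice the fingerprints would come from a cheminformatics toolkit and the probabilities from a trained classifier; the sketch only shows how the two cut-offs combine into a single filter.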
|