1
Shahin R, Jaafreh S, Azzam Y. Tracking protein kinase targeting advances: integrating QSAR into machine learning for kinase-targeted drug discovery. Future Sci OA 2025; 11:2483631. [PMID: 40181786 PMCID: PMC11980485 DOI: 10.1080/20565623.2025.2483631] [Received: 03/06/2024] [Accepted: 03/06/2025] [Indexed: 04/05/2025]
Abstract
Protein kinases are vital drug targets, yet designing selective inhibitors is challenging, a difficulty compounded by drug resistance and the complexity of the kinome. This review explores Quantitative Structure-Activity Relationship (QSAR) modeling for kinase drug discovery, focusing on integrating traditional QSAR with machine learning (ML), including CNNs and RNNs, and with structural data. Methods include structural databases, docking, and deep learning QSAR. Key findings show that ML-integrated QSAR significantly improves selective inhibitor design for CDKs, JAKs, and PIM kinases. The IDG-DREAM challenge exemplifies ML's potential for accurate kinase-inhibitor interaction prediction, outperforming traditional methods and enabling inhibitors with enhanced selectivity, efficacy, and resistance mitigation. QSAR combined with advanced computation and experimental data accelerates kinase drug discovery, offering transformative potential for precision medicine. This review highlights the novelty of deep learning-enhanced QSAR in automating feature extraction and capturing complex relationships, surpassing traditional QSAR, while emphasizing interpretability and experimental validation for clinical translation.
Affiliation(s)
- Rand Shahin
- Drug Design Unit, Department of Pharmaceutical Chemistry, Hashemite University, Zarqa, Jordan
- Sawsan Jaafreh
- Department of Chemistry, The Hashemite University, Zarqa, Jordan
- Yusra Azzam
- Jacobs School of Medicine and Biomedical Sciences, University at Buffalo, Buffalo, NY, USA
2
Jarallah SJ, Almughem FA, Alhumaid NK, Fayez NA, Alradwan I, Alsulami KA, Tawfik EA, Alshehri AA. Artificial intelligence revolution in drug discovery: A paradigm shift in pharmaceutical innovation. Int J Pharm 2025; 680:125789. [PMID: 40451590 DOI: 10.1016/j.ijpharm.2025.125789] [Received: 02/13/2025] [Revised: 05/06/2025] [Accepted: 05/27/2025] [Indexed: 06/16/2025]
Abstract
Integrating artificial intelligence (AI) into drug discovery has revolutionized pharmaceutical innovation, addressing the challenges of traditional methods that are costly, time-consuming, and suffer from high failure rates. By utilizing machine learning (ML), deep learning (DL), and natural language processing (NLP), AI enhances various stages of drug development, including target identification, lead optimization, de novo drug design, and drug repurposing. AI tools, such as AlphaFold for protein structure prediction and AtomNet for structure-based drug design, have significantly accelerated the discovery process, improved efficiency and reduced costs. Success stories like Insilico Medicine's AI-designed molecule for idiopathic pulmonary fibrosis and BenevolentAI's identification of baricitinib for COVID-19 highlight AI's transformative potential. Additionally, AI enables the exploration of vast chemical spaces, optimization of clinical trials, and the identification of novel therapeutic targets, paving the way for precision medicine. However, challenges such as limited data accessibility, integration of diverse datasets, interpretability of AI models, and ethical concerns remain critical hurdles. Overcoming these limitations through enhanced algorithms, standardized databases, and interdisciplinary collaboration is essential. Overall, AI continues to reshape drug discovery, reducing timelines, increasing success rates, and driving the development of innovative and accessible therapies for unmet medical needs.
Affiliation(s)
- Somayah J Jarallah
- Advanced Diagnostics and Therapeutics Institute, Health Sector, King Abdulaziz City for Science and Technology (KACST), Riyadh 11442, Saudi Arabia
- Fahad A Almughem
- Advanced Diagnostics and Therapeutics Institute, Health Sector, King Abdulaziz City for Science and Technology (KACST), Riyadh 11442, Saudi Arabia
- Nada K Alhumaid
- Advanced Diagnostics and Therapeutics Institute, Health Sector, King Abdulaziz City for Science and Technology (KACST), Riyadh 11442, Saudi Arabia
- Nojoud Al Fayez
- Advanced Diagnostics and Therapeutics Institute, Health Sector, King Abdulaziz City for Science and Technology (KACST), Riyadh 11442, Saudi Arabia
- Ibrahim Alradwan
- Advanced Diagnostics and Therapeutics Institute, Health Sector, King Abdulaziz City for Science and Technology (KACST), Riyadh 11442, Saudi Arabia
- Khulud A Alsulami
- Advanced Diagnostics and Therapeutics Institute, Health Sector, King Abdulaziz City for Science and Technology (KACST), Riyadh 11442, Saudi Arabia
- Essam A Tawfik
- Advanced Diagnostics and Therapeutics Institute, Health Sector, King Abdulaziz City for Science and Technology (KACST), Riyadh 11442, Saudi Arabia.
- Abdullah A Alshehri
- Advanced Diagnostics and Therapeutics Institute, Health Sector, King Abdulaziz City for Science and Technology (KACST), Riyadh 11442, Saudi Arabia.
3
Yuan S, Zhao C, Liu L, Zhou G. MGDM: Molecular generation using a multinomial diffusion model. Methods 2025; 239:1-9. [PMID: 40049434 DOI: 10.1016/j.ymeth.2025.03.001] [Received: 01/08/2025] [Revised: 02/12/2025] [Accepted: 03/03/2025] [Indexed: 03/20/2025]
Abstract
Accurate analysis of molecular structures and the rapid generation of valid molecules remain significant challenges in De Novo drug design. In this study, we propose the Multinomial Generated Diffusion Model (MGDM) for molecular generation. This model leverages a multinomial diffusion framework to process discrete data, with a focus on learning the multinomial distribution inherent in the dataset. During the generation process, the model progressively denoises molecules, transitioning from a uniform noise distribution to ultimately produce valid molecular structures. Initially, we generate molecules unconditionally to expand the compound library. In the next phase, we focus on generating molecules with specific properties to assess the model's capacity for conditional generation. For this, we implement a classifier-free guidance strategy, which directs the diffusion model's task without the need for training separate classifier models. To validate the effectiveness of our framework, we conducted experiments using the Molecular Sets (MOSES) dataset. The results demonstrate that, compared to several state-of-the-art methods, MGDM generates valid molecules while achieving superior or comparable performance in terms of novelty and diversity.
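A minimal sketch of the forward (noising) process behind multinomial diffusion, assuming the generic formulation in which each discrete token is kept with cumulative probability ᾱ_t = Π(1 − β_s) and is otherwise resampled uniformly over K classes. The abstract does not give MGDM's noise schedule, tokenization, or denoising network, so `alpha_bar`, `q_xt_given_x0`, and `sample_xt` are hypothetical helper names rather than MGDM's code.

```python
# Sketch of generic multinomial-diffusion noising (assumption; not MGDM's released code).
import random
from typing import List

def alpha_bar(betas: List[float], t: int) -> float:
    """Cumulative keep-probability alpha_bar_t = prod_{s<=t} (1 - beta_s); 1.0 at t = 0."""
    prod = 1.0
    for beta in betas[:t]:
        prod *= 1.0 - beta
    return prod

def q_xt_given_x0(x0_index: int, num_classes: int, betas: List[float], t: int) -> List[float]:
    """Closed form q(x_t | x_0) = Cat(alpha_bar_t * onehot(x_0) + (1 - alpha_bar_t) / K)."""
    a = alpha_bar(betas, t)
    probs = [(1.0 - a) / num_classes] * num_classes
    probs[x0_index] += a
    return probs

def sample_xt(x0_tokens: List[int], num_classes: int, betas: List[float], t: int,
              rng: random.Random) -> List[int]:
    """Noise every token of a sequence independently to step t."""
    return [rng.choices(range(num_classes), weights=q_xt_given_x0(tok, num_classes, betas, t))[0]
            for tok in x0_tokens]
```

As t grows the distribution approaches the uniform noise distribution from which, per the abstract, generation starts; the learned reverse (denoising) step and the classifier-free guidance are not shown.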
Affiliation(s)
- Sisi Yuan
- Department of Bioinformatics and Genomics, the University of North Carolina at Charlotte, Charlotte, NC, USA.
- Chen Zhao
- College of Information Science and Engineering, Hunan University, Changsha, Hunan 410086, PR China.
- Lin Liu
- School of Information Science and Technology, Yunnan Normal University, Kunming 650500, PR China.
- Guifei Zhou
- School of Information Science and Technology, Yunnan Normal University, Kunming 650500, PR China.
4
Yewale A, Yang Y, Nazemifard N, Papageorgiou CD, Rielly CD, Benyahia B. Deep Reinforcement Learning-Based Self-Optimization of Flow Chemistry. ACS Eng Au 2025; 5:247-266. [PMID: 40556644 PMCID: PMC12183679 DOI: 10.1021/acsengineeringau.5c00004] [Received: 01/09/2025] [Revised: 04/26/2025] [Accepted: 04/29/2025] [Indexed: 06/28/2025]
Abstract
The development of effective synthetic pathways is critical in many industrial sectors. The growing adoption of flow chemistry has opened new opportunities for more cost-effective and environmentally friendly manufacturing technologies. However, the development of effective flow chemistry processes is still hampered by labor- and experiment-intensive methodologies and poor or suboptimal performance. In this context, integrating advanced machine learning strategies into chemical process optimization can significantly reduce experimental burdens and enhance overall efficiency. This paper demonstrates the capabilities of deep reinforcement learning (DRL) as an effective self-optimization strategy for imine synthesis in flow, a key building block in many compounds such as pharmaceuticals and heterocyclic products. A deep deterministic policy gradient (DDPG) agent was designed to iteratively interact with the environment, the flow reactor, and learn how to deliver optimal operating conditions. A mathematical model of the reactor was developed based on new experimental data to train the agent and evaluate alternative self-optimization strategies. To optimize the DDPG agent's training performance, different hyperparameter tuning methods were investigated and compared, including trial-and-error and Bayesian optimization. Most importantly, a novel adaptive dynamic hyperparameter tuning was implemented to further enhance the training performance and optimization outcome of the agent. The performance of the proposed DRL strategy was compared against state-of-the-art gradient-free methods, namely SnobFit and Nelder-Mead. Finally, the outcomes of the different self-optimization strategies were tested experimentally. It was shown that the proposed DDPG agent has superior performance compared to its self-optimization counterparts. 
It offered better tracking of the global solution and reduced the number of required experiments by approximately 50 and 75% compared to Nelder-Mead and SnobFit, respectively. These findings hold significant promise for the chemical engineering community, offering a robust, efficient, and sustainable approach to optimizing flow chemistry processes and paving the way for broader integration of data-driven methods in process design and operation.
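The two update rules at the core of a DDPG agent can be sketched as follows, assuming the standard formulation: the critic is regressed onto a Bellman target computed with slowly updated target networks, and those target networks track the online ones by Polyak averaging. The abstract does not specify the reactor model, reward, or network architectures, and the actor's policy-gradient step is not shown; the lambdas in the usage section are hypothetical stand-ins for trained networks.

```python
# Sketch of two standard DDPG update rules (assumption; not the authors' agent).
from typing import Callable, List

State = List[float]
Action = List[float]

def critic_target(reward: float, next_state: State, done: bool, gamma: float,
                  target_actor: Callable[[State], Action],
                  target_critic: Callable[[State, Action], float]) -> float:
    """Bellman target y = r + gamma * Q'(s', mu'(s')) used to regress the DDPG critic."""
    if done:
        return reward  # no bootstrapping past a terminal step
    return reward + gamma * target_critic(next_state, target_actor(next_state))

def soft_update(target_params: List[float], online_params: List[float], tau: float) -> List[float]:
    """Polyak averaging of target-network weights: theta' <- tau * theta + (1 - tau) * theta'."""
    return [tau * w + (1.0 - tau) * w_t for w, w_t in zip(online_params, target_params)]

if __name__ == "__main__":
    # Hypothetical stand-ins for trained networks; a state could be e.g. [temperature, residence time].
    actor = lambda s: [0.5 * s[0]]
    critic = lambda s, a: s[0] + a[0]
    print(critic_target(1.0, [2.0, 10.0], False, 0.9, actor, critic))
    print(soft_update([0.0, 1.0], [1.0, 1.0], tau=0.1))
```

A small tau keeps the bootstrapped targets slowly varying, which is the usual motivation for target networks in DDPG.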
Affiliation(s)
- Ashish Yewale
- Department of Chemical Engineering, Loughborough University, Loughborough, Leicestershire LE11 3TU, U.K.
- Yihui Yang
- Synthetic Molecule Process Development, Process Engineering and Technology, Takeda Pharmaceuticals International Company, 40 Landsdowne Street, Cambridge, Massachusetts 02139, United States
- Neda Nazemifard
- Synthetic Molecule Process Development, Process Engineering and Technology, Takeda Pharmaceuticals International Company, 40 Landsdowne Street, Cambridge, Massachusetts 02139, United States
- Charles D. Papageorgiou
- Synthetic Molecule Process Development, Process Engineering and Technology, Takeda Pharmaceuticals International Company, 40 Landsdowne Street, Cambridge, Massachusetts 02139, United States
- Chris D. Rielly
- Department of Chemical Engineering, Loughborough University, Loughborough, Leicestershire LE11 3TU, U.K.
- Brahim Benyahia
- Department of Chemical Engineering, Loughborough University, Loughborough, Leicestershire LE11 3TU, U.K.
5
Nguyen TM, Tawfik SA, Tran T, Gupta S, Rana S, Venkatesh S. The search for superionic solid-state electrolytes using a physics-informed generative model. Mater Horiz 2025. [PMID: 40525853 DOI: 10.1039/d5mh00767d] [Indexed: 06/19/2025]
Abstract
The discovery of superionic solid-state electrolytes for cation batteries is currently limited by the range of materials available in online materials databases. Generative artificial intelligence approaches have recently been applied to overcome this limitation and explore unknown stoichiometries and structures, but efficiently generating candidates that satisfy strict stability criteria remains challenging. Here we introduce a physics-informed hierarchical generative framework that leverages symmetry-aware crystallographic principles to systematically explore molecular configurations, lattice parameters, and bonding environments. Our approach integrates empirical physical constraints and reinforcement learning utilizing a hierarchical state representation to generate chemically valid and structurally stable candidates. We propose a symmetry-aware hierarchical architecture for flow-based traversal with density (SHAFT-density) that ensures efficient exploration of the material search space, prioritizing low formation energy, molecular packing optimized for stability and conductivity, and enhanced electrochemical properties. We discovered new binary and ternary metastable phases, among which we find highly conductive LiBr, LiCl, Li2IBr, and Li3CBr2. These materials can function either as solid-state electrolytes or as components of solid-state electrolyte mixtures. Our results demonstrate the model's capability to identify stable, diverse, and potentially superionic compounds, offering promising candidates for developing next-generation solid-state electrolytes with improved characteristics.
Affiliation(s)
- Tri Minh Nguyen
- Applied Artificial Intelligence Institute, Deakin University, Geelong, Victoria 3216, Australia.
- Sherif Abdulkader Tawfik
- Applied Artificial Intelligence Institute, Deakin University, Geelong, Victoria 3216, Australia.
- Truyen Tran
- Applied Artificial Intelligence Institute, Deakin University, Geelong, Victoria 3216, Australia.
- Sunil Gupta
- Applied Artificial Intelligence Institute, Deakin University, Geelong, Victoria 3216, Australia.
- Santu Rana
- Applied Artificial Intelligence Institute, Deakin University, Geelong, Victoria 3216, Australia.
- Svetha Venkatesh
- Applied Artificial Intelligence Institute, Deakin University, Geelong, Victoria 3216, Australia.
6
da Cunha NB, Fernandes FC, Gil-Ley A, Franco OL, Timakondu N, Costa FF. Bridging BioSciences and technology: The impact of AI & GenAI in life sciences and agribusiness. Gene 2025:149623. [PMID: 40516836 DOI: 10.1016/j.gene.2025.149623] [Received: 01/20/2025] [Revised: 05/26/2025] [Accepted: 06/08/2025] [Indexed: 06/16/2025]
Abstract
The intersection of biosciences and technology has yielded transformative advancements, and Generative Artificial Intelligence (GenAI) has begun to stand at the forefront of this synergy. In the field of life sciences, GenAI is emerging as a catalyst, accelerating drug discovery by swiftly generating and predicting novel molecules. This expedites the identification of potential drug candidates, significantly reducing time and costs compared to traditional methods. Beyond drug discovery, GenAI contributes to protein folding predictions, genomics research, disease diagnosis and biomarker identification, enhancing our understanding of diseases and health conditions and fostering the development of personalized medicine. In agribusiness, GenAI proves instrumental in optimizing crop breeding and improving agricultural productivity. It can generate new crop varieties with desired traits by analyzing vast datasets comprising genomic and ecological information, addressing challenges such as disease resistance, improved yield, and enhanced nutritional content. Moreover, GenAI transcends traditional applications and extends its influence to synthetic biology, contributing to the design of novel enzymes and pathways. This opens avenues for bio-based manufacturing, renewable energy production, and environmental remediation. By harnessing the power of GenAI, the synergies between biosciences and technology accelerate innovation, improve efficiency, decrease costs, and address critical challenges. However, the ethical considerations surrounding GenAI, especially Large Language Model (LLM) utilization in life sciences and agribusiness, such as data privacy, algorithmic bias, and the equitable distribution of benefits, must be addressed to ensure responsible and fair implementation, with particular attention to environmental sustainability when using this technology. This review article discusses the multifaceted impact of GenAI in a new era of advancements in life sciences and agribusiness.
Affiliation(s)
- Nicolau Brito da Cunha
- Genomic Sciences and Biotechnology Program, Catholic University of Brasilia, SGAN 916 Modulo B, Bloco C, 70790-160 Brasília, DF, Brazil; Faculty of Agronomy and Veterinary Medicine (FAV), Campus Darcy Ribeiro, University of Brasilia (UnB), 70910-900 Brasília, DF, Brazil.
- Fabiano Cavalcanti Fernandes
- Genomic Sciences and Biotechnology Program, Catholic University of Brasilia, SGAN 916 Modulo B, Bloco C, 70790-160 Brasília, DF, Brazil; Computer Science Department, Instituto Federal de Brasília (IFB), Brasília, DF, Brazil
- Abel Gil-Ley
- Genomic Sciences and Biotechnology Program, Catholic University of Brasilia, SGAN 916 Modulo B, Bloco C, 70790-160 Brasília, DF, Brazil; S-Inova Biotech, Graduate Program in Biotechnology, Catholic University of Dom Bosco, Campo Grande, MT, Brazil
- Octavio L Franco
- Genomic Sciences and Biotechnology Program, Catholic University of Brasilia, SGAN 916 Modulo B, Bloco C, 70790-160 Brasília, DF, Brazil; S-Inova Biotech, Graduate Program in Biotechnology, Catholic University of Dom Bosco, Campo Grande, MT, Brazil
- Naagma Timakondu
- Cancer Biology and Epigenomics Program, Northwestern University's Feinberg School of Medicine, Chicago, IL 60611, USA; AIx4ALL, San Francisco Bay Area, CA 94066, USA
- Fabricio F Costa
- Genomic Sciences and Biotechnology Program, Catholic University of Brasilia, SGAN 916 Modulo B, Bloco C, 70790-160 Brasília, DF, Brazil; Cancer Biology and Epigenomics Program, Northwestern University's Feinberg School of Medicine, Chicago, IL 60611, USA; AIx4ALL, San Francisco Bay Area, CA 94066, USA.
7
Aw DZH, Zhang DX, Vignuzzi M. Strategies and efforts in circumventing the emergence of antiviral resistance against conventional antivirals. NPJ Antimicrob Resist 2025; 3:54. [PMID: 40490516 DOI: 10.1038/s44259-025-00125-z] [Received: 11/26/2024] [Accepted: 05/21/2025] [Indexed: 06/11/2025]
Abstract
Antiviral resistance stemming from rapid viral evolution and adaptation is a major challenge in treating viral infections. Here, we describe the mechanisms and factors underlying antiviral resistance and their implications for future drug development. Current improvements to conventional methods provide viable options to overcome antiviral resistance. Ongoing efforts in developing new antiviral strategies are also discussed. Examples from across virology are used to illustrate how virus evolution and antiviral therapy influence each other.
Affiliation(s)
- Daryl Zheng Hao Aw
- A*STAR Infectious Diseases Labs (A*STAR ID Labs), Agency for Science, Technology and Research (A*STAR), 8A Biomedical Grove, Immunos #05-13, Singapore, 138648, Singapore
- Denzel Xugeng Zhang
- A*STAR Infectious Diseases Labs (A*STAR ID Labs), Agency for Science, Technology and Research (A*STAR), 8A Biomedical Grove, Immunos #05-13, Singapore, 138648, Singapore
- Marco Vignuzzi
- A*STAR Infectious Diseases Labs (A*STAR ID Labs), Agency for Science, Technology and Research (A*STAR), 8A Biomedical Grove, Immunos #05-13, Singapore, 138648, Singapore.
- Infectious Diseases Translational Research Programme, Department of Microbiology and Immunology, Yong Loo Lin School of Medicine, National University of Singapore, Singapore, Singapore.
- Lee Kong Chian School of Medicine, Nanyang Technological University, Singapore, Singapore.
8
Bassani D, Pavan M, Moro S. Evaluating AutoGrow4 - an open-source toolkit for semi-automated computer-aided drug discovery. Expert Opin Drug Discov 2025; 20:711-720. [PMID: 40299468 DOI: 10.1080/17460441.2025.2499122] [Received: 11/15/2024] [Revised: 04/18/2025] [Accepted: 04/24/2025] [Indexed: 04/30/2025]
Abstract
INTRODUCTION: Drug discovery is a long and expensive process characterized by a high failure rate. To make this process more rational and efficient, scientists continually look for new and better ways to design novel ligands for a target of interest. Among the different approaches, de novo methods have gained popularity over the last decade, thanks to their ability to explore chemical space efficiently and their increasing reliability in generating high-quality compounds. AutoGrow4 is open-source software for de novo drug design that generates ligands for a given target by combining a genetic algorithm with molecular docking calculations.
AREAS COVERED: In the present paper, the authors dissect the program's usefulness and limitations in generating new compounds from a pharmacodynamic and pharmacokinetic perspective. Specifically, this article examines all reported applications of the AutoGrow code in the literature (as retrieved from the Scopus database) from the release of its first version in 2009 to the present.
EXPERT OPINION: In the hands of an expert molecular modeler, AutoGrow4 is a useful tool for de novo ligand design. Its modular, open-source codebase offers many protocol customization features. The main downsides are limited control over the pharmacokinetic features of generated ligands and a bias toward high-molecular-weight compounds.
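A generic generational genetic algorithm of the kind AutoGrow4 couples with docking can be sketched as follows. This is an illustration under stated assumptions, not AutoGrow4's code: candidates are toy strings rather than molecules, `toy_score` is a hypothetical stand-in for a docking score (lower is better), and `toy_mutate` and `toy_crossover` are simplified placeholders for AutoGrow4's chemistry-aware mutation and crossover operators.

```python
# Generic GA with elitism (assumption; not AutoGrow4's implementation).
import random
from typing import Callable, List

def evolve(population: List[str],
           score: Callable[[str], float],
           mutate: Callable[[str, random.Random], str],
           crossover: Callable[[str, str, random.Random], str],
           generations: int,
           n_elite: int,
           rng: random.Random) -> List[str]:
    """Generational GA; lower score is better, as with docking scores in kcal/mol."""
    size = len(population)
    for _ in range(generations):
        ranked = sorted(population, key=score)
        elites = ranked[:n_elite]                 # best candidates survive unchanged
        parents = ranked[:max(2, size // 2)]      # truncation selection of the top half
        children: List[str] = []
        while len(elites) + len(children) < size:
            if rng.random() < 0.5:
                a, b = rng.sample(parents, 2)
                children.append(crossover(a, b, rng))
            else:
                children.append(mutate(rng.choice(parents), rng))
        population = elites + children
    return sorted(population, key=score)

# Hypothetical toy operators: "molecules" are strings over {A, B}, and the
# placeholder score counts 'A' characters in place of a docking calculation.
def toy_score(s: str) -> float:
    return float(s.count("A"))

def toy_mutate(s: str, rng: random.Random) -> str:
    i = rng.randrange(len(s))
    return s[:i] + ("B" if s[i] == "A" else "A") + s[i + 1:]

def toy_crossover(a: str, b: str, rng: random.Random) -> str:
    cut = rng.randrange(1, len(a))  # single-point crossover
    return a[:cut] + b[cut:]

if __name__ == "__main__":
    best = evolve(["AAAAAAAA"] * 6, toy_score, toy_mutate, toy_crossover, 30, 2, random.Random(0))[0]
    print(best, toy_score(best))
```

Elitism guarantees the best score never gets worse between generations; in a real run each `score` call would be a docking calculation, which is why the number of evaluations dominates cost.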
Affiliation(s)
- Stefano Moro
- Molecular Modeling Section (MMS), Department of Pharmaceutical and Pharmacological Sciences, University of Padova, Padova, Italy
9
Yang Y, Gu S, Liu B, Gong X, Lu R, Qiu J, Yao X, Liu H. DiffMC-Gen: A Dual Denoising Diffusion Model for Multi-Conditional Molecular Generation. Adv Sci (Weinh) 2025; 12:e2417726. [PMID: 40170290 PMCID: PMC12165109 DOI: 10.1002/advs.202417726] [Received: 12/29/2024] [Revised: 02/28/2025] [Indexed: 04/03/2025]
Abstract
The precise and efficient design of potential drug molecules with diverse physicochemical properties has long been a critical challenge. In recent years, various deep learning-based de novo molecular generation algorithms have offered new directions for addressing this issue, among which denoising diffusion models have demonstrated significant potential. However, previous methods often fail to optimize multiple properties of candidate compounds simultaneously, which may stem from their direct use of nongeometric graph neural networks (GNNs) that cannot accurately capture molecular topological and geometric information. In this study, a dual denoising diffusion model is developed for multi-conditional molecular generation (DiffMC-Gen), which integrates both discrete and continuous features to enhance its ability to perceive 3D molecular structures. Additionally, it involves a multi-objective optimization strategy to simultaneously optimize multiple properties of the target molecule, including binding affinity, drug-likeness, synthesizability, and toxicity. In both 2D and 3D molecular generation, the molecules generated by DiffMC-Gen achieve state-of-the-art (SOTA) novelty and uniqueness, while achieving results comparable to previous methods in drug-likeness and synthesizability. Furthermore, the generated molecules have well-predicted biological activity and drug-like properties for three target proteins (LRRK2, HPK1, and the GLP-1 receptor), while maintaining high validity, uniqueness, and novelty. These results underscore its potential for practical applications in drug design.
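For the continuous half of a model that diffuses both discrete and continuous features, a common choice is Gaussian noising of 3D atomic coordinates. The sketch below assumes the standard closed form x_t = sqrt(ᾱ_t) x_0 + sqrt(1 − ᾱ_t) ε and, as many E(3)-equivariant diffusion models do, works in the zero-centroid subspace; whether DiffMC-Gen uses exactly this parameterization is an assumption, and `center` and `noise_coordinates` are hypothetical names.

```python
# Gaussian forward noising of coordinates (assumption; not DiffMC-Gen's code).
import math
import random
from typing import List, Tuple

Coord = Tuple[float, float, float]

def center(coords: List[Coord]) -> List[Coord]:
    """Translate a point set so its (unweighted) centroid sits at the origin."""
    n = len(coords)
    cx, cy, cz = (sum(c[i] for c in coords) / n for i in range(3))
    return [(x - cx, y - cy, z - cz) for x, y, z in coords]

def noise_coordinates(x0: List[Coord], alpha_bar_t: float, rng: random.Random) -> List[Coord]:
    """Sample x_t = sqrt(alpha_bar_t) * x0 + sqrt(1 - alpha_bar_t) * eps with zero-centroid eps."""
    x0 = center(x0)
    # Project the noise onto the zero-centroid subspace so x_t stays translation-free.
    eps = center([(rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)) for _ in x0])
    a, s = math.sqrt(alpha_bar_t), math.sqrt(1.0 - alpha_bar_t)
    return [tuple(a * p + s * e for p, e in zip(px, ex)) for px, ex in zip(x0, eps)]
```

With ᾱ_t = 1 the function simply returns the centred input, and as ᾱ_t approaches 0 the coordinates become pure (centred) Gaussian noise; the discrete atom-type channel would be noised separately.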
Affiliation(s)
- Yuwei Yang
- Faculty of Applied Sciences, Macao Polytechnic University, Macao 999078, China
- Shukai Gu
- Faculty of Applied Sciences, Macao Polytechnic University, Macao 999078, China
- Bo Liu
- Faculty of Applied Sciences, Macao Polytechnic University, Macao 999078, China
- Xiaoqing Gong
- Faculty of Applied Sciences, Macao Polytechnic University, Macao 999078, China
- Ruiqiang Lu
- Faculty of Applied Sciences, Macao Polytechnic University, Macao 999078, China
- Jiayue Qiu
- Faculty of Applied Sciences, Macao Polytechnic University, Macao 999078, China
- Xiaojun Yao
- Faculty of Applied Sciences, Macao Polytechnic University, Macao 999078, China
- Huanxiang Liu
- Faculty of Applied Sciences, Macao Polytechnic University, Macao 999078, China
10
Martínez León A, Ries B, Hub JS, Magarkar A. Moldrug algorithm for an automated ligand binding site exploration by 3D aware molecular enumerations. J Cheminform 2025; 17:85. [PMID: 40420238 DOI: 10.1186/s13321-025-01022-3] [Received: 01/22/2025] [Accepted: 04/22/2025] [Indexed: 05/28/2025]
Abstract
We present Moldrug, a computational tool for accelerating the hit-to-lead phase in structure-based drug design. Moldrug explores the chemical space using structural modifications suggested by the CReM library and by optimizing an adaptable fitness function with a genetic algorithm. Moldrug is complemented by Moldrug-Dashboard, a cross-platform and user-friendly graphical interface tailored for the analysis of Moldrug simulations. To illustrate Moldrug, we designed new potential inhibitors targeting the main protease (MPro) of SARS-CoV-2 by optimizing a consensus fitness function that balances binding affinity, drug-likeness, and synthetic accessibility. The designed molecules exhibited high chemical diversity. A subset of the designed molecules were ranked using MM/GBSA and alchemical binding free energy calculations, revealing predicted affinities as low as −10 kcal mol⁻¹. Moldrug is distributed as a Python package under the Apache 2.0 license. It offers pre-configured multi-parameter fitness functions for molecular design, while being highly adaptable for integrating functionalities from external software. Documentation and tutorials are available at https://moldrug.rtfd.io .
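One common way to build a consensus fitness that balances binding affinity, drug-likeness, and synthetic accessibility is to map each property onto a 0 to 1 desirability and combine them with a weighted geometric mean. The sketch below follows that desirability-function approach as an assumption; Moldrug's own fitness functions, parameter names, and thresholds may differ (see its documentation at https://moldrug.rtfd.io), and every threshold and helper name here is hypothetical.

```python
# Desirability-based consensus cost (assumption; not Moldrug's API or defaults).
import math
from typing import Dict

def smaller_is_better(value: float, best: float, worst: float) -> float:
    """Desirability 1 at or below `best`, 0 at or above `worst`, linear in between."""
    if value <= best:
        return 1.0
    if value >= worst:
        return 0.0
    return (worst - value) / (worst - best)

def larger_is_better(value: float, worst: float, best: float) -> float:
    """Desirability 0 at or below `worst`, 1 at or above `best`, linear in between."""
    if value >= best:
        return 1.0
    if value <= worst:
        return 0.0
    return (value - worst) / (best - worst)

def consensus_cost(props: Dict[str, float], weights: Dict[str, float]) -> float:
    """1 - weighted geometric mean of desirabilities, so 0 is ideal and 1 is worst.

    Hypothetical thresholds: docking score in kcal/mol (lower is better),
    QED in [0, 1] (higher is better), SA score on a 1-10 scale (lower is better).
    """
    d = {
        "docking": smaller_is_better(props["docking"], best=-12.0, worst=-6.0),
        "qed": larger_is_better(props["qed"], worst=0.3, best=0.9),
        "sa": smaller_is_better(props["sa"], best=2.0, worst=6.0),
    }
    if min(d.values()) == 0.0:
        return 1.0  # any fully undesirable property vetoes the molecule
    total = sum(weights[k] for k in d)
    log_mean = sum(weights[k] * math.log(d[k]) for k in d) / total
    return 1.0 - math.exp(log_mean)
```

The geometric mean is a deliberate design choice: unlike a weighted sum, it cannot let an excellent docking score compensate for an unsynthesizable or non-drug-like structure.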
Affiliation(s)
- Alejandro Martínez León
- Theoretical Physics and Center for Biophysics, Universität des Saarlandes, PharmaScienceHub (PSH), 66123, Saarbrücken, Germany
- Benjamin Ries
- Medicinal Chemistry, Boehringer Ingelheim Pharma GmbH & Co KG, Birkendorfer Str. 65, 88397, Biberach an der Riss, Germany
- Open Molecular Software Foundation, Open Free Energy, Davis, CA, 95616, USA
- Jochen S Hub
- Theoretical Physics and Center for Biophysics, Universität des Saarlandes, PharmaScienceHub (PSH), 66123, Saarbrücken, Germany
- Aniket Magarkar
- Medicinal Chemistry, Boehringer Ingelheim Pharma GmbH & Co KG, Birkendorfer Str. 65, 88397, Biberach an der Riss, Germany.
11
Strandgaard M, Linjordet T, Kneiding H, Burnage AL, Nova A, Jensen JH, Balcells D. A Deep Generative Model for the Inverse Design of Transition Metal Ligands and Complexes. JACS Au 2025; 5:2294-2308. [PMID: 40443902 PMCID: PMC12117439 DOI: 10.1021/jacsau.5c00242] [Received: 03/04/2025] [Revised: 04/15/2025] [Accepted: 04/15/2025] [Indexed: 06/02/2025]
Abstract
Deep generative models yielding transition metal complexes (TMCs) remain scarce despite the key role of these compounds in industrial catalytic processes, anticancer therapies, and the energy transition. Compared to drug discovery within the chemical space of organic molecules, TMCs pose further challenges, including the encoding of chemical bonds of higher complexity and the need to optimize multiple properties. In this work, we developed a generative model for the inverse design of transition metal ligands and complexes, based on the junction tree variational autoencoder (JT-VAE). After implementing a SMILES-based encoding of the metal-ligand bonds, the model was trained with the tmQMg-L ligand library, allowing for the generation of thousands of novel, highly diverse monodentate (κ1) and bidentate (κ2) ligands, including imines, phosphines, and carbenes. Further, the generated ligands were labeled with two target properties reflecting the stability and electron density of the associated homoleptic iridium TMCs: the HOMO-LUMO gap (ϵ) and the charge of the metal center (q_Ir). This data was used to implement a conditional model that generated ligands from a prompt, with the single- or dual-objective of optimizing either or both the ϵ and q_Ir properties and allowing for chemical interpretation based on the optimization trajectories. The optimizations also had an impact on other chemical properties, including ligand dissociation energies and oxidative addition barriers. A similar model was implemented to condition ligand generation by solubility and steric bulk.
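Like any variational autoencoder, a JT-VAE samples its latent code with the reparameterization trick and regularizes the encoder with a KL term toward a standard normal prior. A minimal sketch of those two pieces follows, assuming a diagonal Gaussian posterior; the junction-tree encoder and decoder, the metal-ligand SMILES encoding, and the property-conditioned model from the abstract are not shown, and the function names are hypothetical.

```python
# Generic VAE latent-space pieces (assumption; not the authors' JT-VAE code).
import math
import random
from typing import List

def reparameterize(mu: List[float], logvar: List[float], rng: random.Random) -> List[float]:
    """z = mu + sigma * eps with eps ~ N(0, 1); in an autodiff framework this form
    lets gradients flow through mu and logvar."""
    return [m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0) for m, lv in zip(mu, logvar)]

def kl_to_standard_normal(mu: List[float], logvar: List[float]) -> float:
    """KL( N(mu, diag(exp(logvar))) || N(0, I) ), the regularizer in the VAE loss."""
    return -0.5 * sum(1.0 + lv - m * m - math.exp(lv) for m, lv in zip(mu, logvar))
```

The KL is zero only when the posterior equals the prior; balancing it against reconstruction keeps the latent space smooth enough to decode new ligands from sampled or optimized points.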
Affiliation(s)
- Magnus Strandgaard
- Hylleraas Centre for Quantum Molecular Sciences, Department of Chemistry, University of Oslo, P.O. Box 1033, Blindern, Oslo 0315, Norway
- Department of Chemistry, University of Copenhagen, Copenhagen 2100, Denmark
- Trond Linjordet
- Hylleraas Centre for Quantum Molecular Sciences, Department of Chemistry, University of Oslo, P.O. Box 1033, Blindern, Oslo 0315, Norway
- Hannes Kneiding
- Hylleraas Centre for Quantum Molecular Sciences, Department of Chemistry, University of Oslo, P.O. Box 1033, Blindern, Oslo 0315, Norway
- Arron L. Burnage
- Hylleraas Centre for Quantum Molecular Sciences, Department of Chemistry, University of Oslo, P.O. Box 1033, Blindern, Oslo 0315, Norway
- Ainara Nova
- Hylleraas Centre for Quantum Molecular Sciences, Department of Chemistry, University of Oslo, P.O. Box 1033, Blindern, Oslo 0315, Norway
- Centre for Materials Science and Nanotechnology, Department of Chemistry, University of Oslo, Oslo N-0315, Norway
- Jan Halborg Jensen
- Department of Chemistry, University of Copenhagen, Copenhagen 2100, Denmark
- David Balcells
- Hylleraas Centre for Quantum Molecular Sciences, Department of Chemistry, University of Oslo, P.O. Box 1033, Blindern, Oslo 0315, Norway
12
Cleeton C, Sarkisov L. Inverse design of metal-organic frameworks using deep dreaming approaches. Nat Commun 2025; 16:4806. [PMID: 40410161 PMCID: PMC12102185 DOI: 10.1038/s41467-025-59952-3] [Received: 06/21/2024] [Accepted: 05/05/2025] [Indexed: 05/25/2025]
Abstract
Exploring the expansive and largely untapped chemical space of metal-organic frameworks (MOFs) holds promise for revolutionising the field of materials science. MOFs, hailed for their modular architecture, offer unmatched flexibility in customising functionalities to meet specific application needs. However, navigating this chemical space to identify optimal MOF structures poses a significant challenge. Traditional high-throughput computational screening (HTCS), while useful, is often limited by a distribution bias towards materials not aligned with the desired functionalities. To overcome these limitations, this study adopts a "deep dreaming" methodology to optimise MOFs in silico, aiming to generate structures with systematically shifted properties that are closer to target functionalities from the outset. Our approach integrates property prediction and structure optimisation within a single interpretable framework, leveraging a specialised chemical language model augmented with attention mechanisms. Focusing on a curated set of MOF properties critical to applications like carbon capture and energy storage, we demonstrate how deep dreaming can be utilised as a tool for targeted material design.
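The core idea of deep dreaming is to freeze a trained property predictor and run gradient steps on its *input* rather than its weights, shifting the prediction toward a target value. The sketch below assumes a hypothetical linear predictor over a continuous input vector so the gradient can be written by hand; the paper's attention-augmented chemical language model, and the step of mapping an optimized input back to a valid MOF representation, are not reproduced here.

```python
# Deep-dreaming style input optimization (assumption; toy linear predictor, not the paper's model).
from typing import List

def predict(x: List[float], w: List[float], b: float) -> float:
    """Hypothetical frozen property predictor: a linear stand-in for a trained network."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b

def dream(x: List[float], w: List[float], b: float, target: float,
          lr: float = 0.1, steps: int = 200) -> List[float]:
    """Update the input (weights stay fixed) to minimize (predict(x) - target) ** 2."""
    x = list(x)
    for _ in range(steps):
        err = predict(x, w, b) - target
        # Gradient of (pred - target)^2 w.r.t. x is 2 * err * w for a linear predictor;
        # lr must be small relative to 1 / ||w||^2 for this toy to converge.
        x = [xi - lr * 2.0 * err * wi for xi, wi in zip(x, w)]
    return x
```

With a real network the hand-written gradient is replaced by backpropagation to the input embedding, which is what makes the approach interpretable: the trajectory shows which parts of the input move the property.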
Affiliation(s)
- Conor Cleeton
- Department of Chemical Engineering, University of Manchester, Manchester, UK.
- Lev Sarkisov
- Department of Chemical Engineering, University of Manchester, Manchester, UK.

13
Wang J, Qin R, Wang M, Fang M, Zhang Y, Zhu Y, Su Q, Gou Q, Shen C, Zhang O, Wu Z, Jiang D, Zhang X, Zhao H, Ge J, Wu Z, Kang Y, Hsieh CY, Hou T. Token-Mol 1.0: tokenized drug design with large language models. Nat Commun 2025; 16:4416. [PMID: 40360500 PMCID: PMC12075800 DOI: 10.1038/s41467-025-59628-y] [Received: 08/22/2024] [Accepted: 04/25/2025] [Indexed: 05/15/2025]
Abstract
The integration of large language models (LLMs) into drug design is gaining momentum; however, existing approaches often struggle to effectively incorporate three-dimensional molecular structures. Here, we present Token-Mol, a token-only 3D drug design model that encodes both 2D and 3D structural information, along with molecular properties, into discrete tokens. Built on a transformer decoder and trained with causal masking, Token-Mol introduces a Gaussian cross-entropy loss function tailored for regression tasks, enabling superior performance across multiple downstream applications. The model surpasses existing methods, improving molecular conformation generation by over 10% and 20% across two datasets, while outperforming token-only models by 30% in property prediction. In pocket-based molecular generation, it enhances drug-likeness and synthetic accessibility by approximately 11% and 14%, respectively. Notably, Token-Mol operates 35 times faster than expert diffusion models. In real-world validation, it improves success rates and, when combined with reinforcement learning, further optimizes affinity and drug-likeness, advancing AI-driven drug discovery.
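The abstract names a Gaussian cross-entropy loss for regression but does not define it. One plausible reading, offered here only as an assumption, discretises a numeric target into value tokens (bins) and replaces the usual one-hot label with a Gaussian-smoothed distribution over neighbouring bins. Under that reading, predicting a nearby value costs less than predicting a distant one. The bin centres, the width `sigma`, and the function names below are hypothetical.

```python
import math
from typing import List, Sequence


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax over the value-token logits."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def gaussian_targets(bin_centers: Sequence[float], y: float, sigma: float) -> List[float]:
    """Soft label: Gaussian weights around the true value y, normalised to sum to 1."""
    weights = [math.exp(-((c - y) ** 2) / (2.0 * sigma ** 2)) for c in bin_centers]
    total = sum(weights)
    return [wt / total for wt in weights]


def gaussian_cross_entropy(
    logits: Sequence[float], bin_centers: Sequence[float], y: float, sigma: float = 0.5
) -> float:
    """Cross-entropy between the Gaussian soft label and the predicted distribution."""
    p = softmax(logits)
    q = gaussian_targets(bin_centers, y, sigma)
    return -sum(qi * math.log(pi) for qi, pi in zip(q, p))


if __name__ == "__main__":
    bins = [0.0, 1.0, 2.0, 3.0, 4.0]  # hypothetical value tokens
    near = gaussian_cross_entropy([0.0, 0.0, 0.0, 5.0, 0.0], bins, y=2.0)  # peak one bin off
    far = gaussian_cross_entropy([5.0, 0.0, 0.0, 0.0, 0.0], bins, y=2.0)   # peak two bins off
    print(f"near={near:.3f} far={far:.3f}")
```

With a one-hot label, both mistakes in the usage example would cost the same. The Gaussian label makes the near miss cheaper, and that ordering is what makes a loss of this kind usable for regression over discrete tokens.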
Affiliation(s)
- Jike Wang
- College of Pharmaceutical Sciences, Zhejiang University, Hangzhou, 310058, Zhejiang, China
- Rui Qin
- College of Pharmaceutical Sciences, Zhejiang University, Hangzhou, 310058, Zhejiang, China
- Mingyang Wang
- College of Pharmaceutical Sciences, Zhejiang University, Hangzhou, 310058, Zhejiang, China
- Meijing Fang
- College of Pharmaceutical Sciences, Zhejiang University, Hangzhou, 310058, Zhejiang, China
- Yangyang Zhang
- College of Pharmaceutical Sciences, Zhejiang University, Hangzhou, 310058, Zhejiang, China
- Yuchen Zhu
- College of Pharmaceutical Sciences, Zhejiang University, Hangzhou, 310058, Zhejiang, China
- Qun Su
- College of Pharmaceutical Sciences, Zhejiang University, Hangzhou, 310058, Zhejiang, China
- Qiaolin Gou
- College of Pharmaceutical Sciences, Zhejiang University, Hangzhou, 310058, Zhejiang, China
- Chao Shen
- College of Pharmaceutical Sciences, Zhejiang University, Hangzhou, 310058, Zhejiang, China
- Odin Zhang
- Paul G. Allen School of Computer Science & Engineering, University of Washington, Seattle, WA, USA
- Zhenxing Wu
- College of Pharmaceutical Sciences, Zhejiang University, Hangzhou, 310058, Zhejiang, China
- Dejun Jiang
- College of Pharmaceutical Sciences, Zhejiang University, Hangzhou, 310058, Zhejiang, China
- Xujun Zhang
- College of Pharmaceutical Sciences, Zhejiang University, Hangzhou, 310058, Zhejiang, China
- Huifeng Zhao
- College of Pharmaceutical Sciences, Zhejiang University, Hangzhou, 310058, Zhejiang, China
- Jingxuan Ge
- College of Pharmaceutical Sciences, Zhejiang University, Hangzhou, 310058, Zhejiang, China
- Zhourui Wu
- Key Laboratory of Spine and Spinal cord Injury Repair and Regeneration, Ministry of Education, Tongji University, Shanghai, China
- Yu Kang
- College of Pharmaceutical Sciences, Zhejiang University, Hangzhou, 310058, Zhejiang, China.
- Chang-Yu Hsieh
- College of Pharmaceutical Sciences, Zhejiang University, Hangzhou, 310058, Zhejiang, China.
- Tingjun Hou
- College of Pharmaceutical Sciences, Zhejiang University, Hangzhou, 310058, Zhejiang, China.

14
Zhang K, Lin Y, Wu G, Ren Y, Zhang X, Wang B, Zhang XY, Du W. Sculpting molecules in text-3D space: a flexible substructure aware framework for text-oriented molecular optimization. BMC Bioinformatics 2025; 26:123. [PMID: 40335938 PMCID: PMC12060419 DOI: 10.1186/s12859-025-06072-w] [Received: 11/22/2024] [Accepted: 01/30/2025] [Indexed: 05/09/2025]
Abstract
The integration of deep learning, particularly AI-Generated Content, with high-quality data derived from ab initio calculations has emerged as a promising avenue for transforming the landscape of scientific research. However, designing molecular drugs or materials that incorporate multi-modality prior knowledge remains a critical and complex undertaking. Specifically, a practical molecular design must not only meet diversity requirements but also satisfy the structural and textual constraints, with their various symmetries, set out by domain experts. In this article, we present an innovative approach to this inverse design problem by formulating it as a multi-modality guidance optimization task. Our proposed solution is a textual-structure alignment symmetric diffusion framework for molecular optimization tasks, named 3DToMolo. 3DToMolo aims to harmonize diverse modalities, including textual description features and graph structural features, aligning them seamlessly to produce molecular structures that adhere to the symmetric structural and textual constraints specified by experts in the field. Experimental trials across three guidance optimization settings have shown superior hit optimization performance compared with state-of-the-art methodologies. Moreover, 3DToMolo can discover potentially novel molecules incorporating specified target substructures, without the need for prior knowledge. This work not only holds general significance for the advancement of deep learning methodologies but also paves the way for a transformative shift in molecular design strategies. 3DToMolo creates opportunities for a more nuanced and effective exploration of the vast chemical space, opening new frontiers in the development of molecular entities with tailored properties and functionalities.
Affiliation(s)
- Kaiwei Zhang
- Institute of Information Engineering, Chinese Academy of Sciences, Beijing, 100085, China
- Yange Lin
- Huawei Technologies, Shenzhen, China
- Guangcheng Wu
- Department of Chemistry, The University of Hong Kong, Hong Kong SAR, 999077, China
- Bo Wang
- Huawei Technologies, Shenzhen, China
- School of Chemistry and Chemical Engineering, Harbin Institute of Technology, Harbin, 150001, China
- Xiao-Yu Zhang
- Institute of Information Engineering, Chinese Academy of Sciences, Beijing, 100085, China
- Weitao Du
- Huawei Technologies, Shenzhen, China.
- Academy of Mathematics and Systems Science, Chinese Academy of Sciences, Beijing, China.

15
Ryzhkov FV, Ryzhkova YE, Elinson MN. Machine learning: Python tools for studying biomolecules and drug design. Mol Divers 2025:10.1007/s11030-025-11199-2. [PMID: 40301135 DOI: 10.1007/s11030-025-11199-2] [Received: 03/06/2025] [Accepted: 04/13/2025] [Indexed: 05/01/2025]
Abstract
The increasing adoption of computational methods and artificial intelligence in scientific research has led to a growing interest in versatile tools like Python. In the fields of medicinal chemistry, biochemistry, and bioinformatics, Python has emerged as a key language for tackling complex challenges. It is used to solve various tasks, such as drug discovery, high-throughput and virtual screening, protein and genome analysis, and predicting drug efficacy. This review presents a list of tools for these tasks, including scripts, libraries, and ready-made programs, and serves as a starting point for scientists wishing to apply automation or optimization to routine tasks in medicinal chemistry and bioinformatics.
Affiliation(s)
- Fedor V Ryzhkov
- N. D. Zelinsky Institute of Organic Chemistry, Russian Academy of Sciences, 47 Leninsky Prospekt, 119991, Moscow, Russia.
- Yuliya E Ryzhkova
- N. D. Zelinsky Institute of Organic Chemistry, Russian Academy of Sciences, 47 Leninsky Prospekt, 119991, Moscow, Russia
- Michail N Elinson
- N. D. Zelinsky Institute of Organic Chemistry, Russian Academy of Sciences, 47 Leninsky Prospekt, 119991, Moscow, Russia

16
Sun S, Huggins DJ. Comparing Molecules Generated by MMPDB and REINVENT4 with Ideas from Drug Discovery Design Teams. J Chem Inf Model 2025; 65:4219-4231. [PMID: 40207451 DOI: 10.1021/acs.jcim.5c00250] [Indexed: 04/11/2025]
Abstract
This study compares molecules designed by drug discovery project teams from the Sanders Tri-Institutional Therapeutics Discovery Institute with molecules generated by two computational tools: MMPDB and REINVENT4. Seven different test cases with diverse chemotypes are studied in order to explore the potential of these computational tools in complementing human expertise in the early stages of drug discovery. By comparing the molecular structures and properties generated by MMPDB and REINVENT4 to those designed by project design teams, we aim to assess the value of such tools. The results indicate that MMPDB and REINVENT4 cover regions of chemical space larger than those covered by ideas from the drug discovery project teams. However, the chemical spaces covered by the two methods are quite different, and neither method completely covers the chemical space identified by the drug discovery project teams. Thus, the computational methods are complementary to one another and to drug discovery project team ideation. Effective application of generative molecule design tools has the potential to accelerate the identification of novel therapeutic candidates by expanding the chemical space explored during drug discovery and enabling optimal exploration.
Affiliation(s)
- Shan Sun
- Sanders Tri-Institutional Therapeutics Discovery Institute, New York, New York 10021, United States
- David J Huggins
- Sanders Tri-Institutional Therapeutics Discovery Institute, New York, New York 10021, United States
- Department of Physiology and Biophysics, Weill Cornell Medical College of Cornell University, New York, New York 10065, United States

17
Wang L, Liu Y, Fu X, Ye X, Shi J, Yen GG, Zou Q, Zeng X, Cao D. HMAMP: Designing Highly Potent Antimicrobial Peptides Using a Hypervolume-Driven Multiobjective Deep Generative Model. J Med Chem 2025; 68:8346-8360. [PMID: 40232176 DOI: 10.1021/acs.jmedchem.4c03073] [Indexed: 04/16/2025]
Abstract
Antimicrobial peptides (AMPs) have exhibited unprecedented potential as biomaterials in combating multidrug-resistant bacteria, prompting the proposal of many excellent generative models. However, the multiobjective nature of AMP discovery is often overlooked, contributing to the high attrition rate of drug candidates. Here, we propose a novel approach termed hypervolume-driven multiobjective AMP design (HMAMP), which prioritizes the simultaneous optimization of multiple AMP attributes. By synergizing reinforcement learning and a gradient descent algorithm rooted in the hypervolume maximization concept, HMAMP effectively biases generative processes and mitigates the pattern collapse issue. Comparative experiments show that HMAMP significantly outperforms state-of-the-art methods in effectiveness and diversity. A knee-based decision strategy is then employed to rapidly screen candidates with favorable physicochemical properties, consistent with enhanced antimicrobial activity and reduced side effects. Molecular visualization further elucidates the structural and functional properties of the AMPs. Overall, HMAMP is an effective approach for traversing large and complex exploration spaces in search of AMPs that strike an idealism-realism trade-off.
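Hypervolume, the quantity HMAMP maximizes, is the region of objective space dominated by a set of candidates and bounded by a reference point. The sketch below computes it for two objectives and also gives the gain from adding one candidate, which is the kind of signal a generator can be rewarded with. Both objectives are assumed to be maximized (for example, a hypothetical predicted-activity score and a predicted-safety score). The paper's actual attributes and its coupling of reinforcement learning with gradient descent are not reproduced.

```python
from typing import Iterable, List, Tuple

Point = Tuple[float, float]


def hypervolume_2d(points: Iterable[Point], ref: Point) -> float:
    """Area dominated by `points` and bounded below by `ref` (both objectives maximized)."""
    r1, r2 = ref
    # Only points strictly better than the reference in both objectives contribute.
    pts = [(f1, f2) for f1, f2 in points if f1 > r1 and f2 > r2]
    # Sweep from the largest first objective downwards; each point adds a new
    # horizontal slab only if it raises the best second objective seen so far.
    pts.sort(key=lambda p: (-p[0], -p[1]))
    area, best_f2 = 0.0, r2
    for f1, f2 in pts:
        if f2 > best_f2:
            area += (f1 - r1) * (f2 - best_f2)
            best_f2 = f2
    return area


def hypervolume_gain(front: List[Point], candidate: Point, ref: Point) -> float:
    """Increase in hypervolume from adding one candidate to the current set."""
    return hypervolume_2d(front + [candidate], ref) - hypervolume_2d(front, ref)


if __name__ == "__main__":
    front = [(3.0, 1.0), (2.0, 2.0), (1.0, 3.0)]
    print(hypervolume_2d(front, (0.0, 0.0)))
    print(hypervolume_gain(front, (2.5, 2.5), (0.0, 0.0)))
```

A dominated candidate yields zero gain, while a candidate that pushes the front outward is rewarded in proportion to the new area it claims. This is why hypervolume favors solutions that improve the trade-off over solutions that only improve a single objective.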
Affiliation(s)
- Li Wang
- College of Computer Science and Electronic Engineering, Hunan University, Changsha 410082, China
- Yiping Liu
- College of Computer Science and Electronic Engineering, Hunan University, Changsha 410082, China
- Xiangzheng Fu
- School of Chinese Medicine, Hong Kong Baptist University, Hong Kong 999077, China
- Xiucai Ye
- System Information and Engineering, University of Tsukuba, Tsukuba 305-8571, Japan
- Junfeng Shi
- Interdisciplinary Life Sciences, Hunan University, Changsha 410082, China
- Gary G Yen
- Electrical and Computer Engineering, Oklahoma State University, Stillwater, Oklahoma 74078, United States
- Quan Zou
- Basic and Frontier Research Institute, University of Electronic Science and Technology of China, Chengdu 611731, China
- Xiangxiang Zeng
- College of Computer Science and Electronic Engineering, Hunan University, Changsha 410082, China
- Dongsheng Cao
- Xiangya School of Pharmacy, Central South University, Changsha 410083, China

18
Fan X, Fang S, Li Z, Ji H, Yue M, Li J, Ren X. ICVAE: Interpretable Conditional Variational Autoencoder for De Novo Molecular Design. Int J Mol Sci 2025; 26:3980. [PMID: 40362221 PMCID: PMC12071458 DOI: 10.3390/ijms26093980] [Received: 03/18/2025] [Revised: 04/15/2025] [Accepted: 04/21/2025] [Indexed: 05/15/2025]
Abstract
Recent studies have demonstrated that machine learning-based generative models can create novel molecules with desirable properties. Among them, the Conditional Variational Autoencoder (CVAE) is a powerful approach for generating molecules with desired physicochemical and pharmacological properties. However, the CVAE's latent space is still a black box, making it difficult to understand the relationship between the latent space and molecular properties. To address this issue, we propose the Interpretable Conditional Variational Autoencoder (ICVAE), which introduces a modified loss function that correlates the latent value with molecular properties. ICVAE establishes a linear mapping between latent variables and molecular properties. This linearity is not only crucial for improving interpretability, by assigning clear semantic meaning to latent dimensions, but also provides a practical advantage: it enables direct manipulation of molecular attributes through simple coordinate shifts in latent space, rather than relying on opaque, black-box optimization algorithms. Our experimental results show that the ICVAE can linearly relate one or multiple molecular properties to the latent value and generate molecules with precise properties by controlling the latent values. The ICVAE's interpretability allows us to gain insight into the molecular generation process, making it a promising approach in drug discovery and material design.
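The central property of ICVAE is a linear relation between a latent coordinate and a molecular property. It can be sketched as an extra loss term plus a direct edit of that coordinate at generation time. The abstract does not give the exact form of the modified loss, so the squared-error penalty, the slope `a`, the offset `c`, and the function names below are assumptions for illustration only. During training, a term like this would be added, with some weight, to the usual reconstruction and KL terms.

```python
from typing import List, Sequence


def linear_latent_penalty(
    z_k: Sequence[float], props: Sequence[float], a: float = 1.0, c: float = 0.0
) -> float:
    """Mean squared deviation of latent coordinate z_k from the line a * property + c.

    Hypothetical regularizer: it is zero when the latent coordinate is an exact
    linear function of the property across the batch.
    """
    n = len(z_k)
    return sum((z - (a * p + c)) ** 2 for z, p in zip(z_k, props)) / n


def latent_for_target(
    z: Sequence[float], k: int, target: float, a: float = 1.0, c: float = 0.0
) -> List[float]:
    """Copy of latent vector z with coordinate k moved to encode `target`."""
    z_new = list(z)
    z_new[k] = a * target + c
    return z_new


def property_from_latent(z: Sequence[float], k: int, a: float = 1.0, c: float = 0.0) -> float:
    """Read the property back from coordinate k by inverting the linear map (requires a != 0)."""
    return (z[k] - c) / a
```

Once the mapping holds, targeting a property value amounts to a single coordinate assignment followed by decoding, which is not shown here. No search over the latent space is needed.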
Affiliation(s)
- Xiaqiong Fan
- School of Artificial Intelligence and Big Data, Henan University of Technology, Zhengzhou 450001, China
- Senlin Fang
- Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, Shenzhen 518120, China
- Zhengyan Li
- Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, Shenzhen 518120, China
- State Key Laboratory of Crop Stress Adaptation and Improvement, School of Life Sciences, Henan University, Kaifeng 475001, China
- Hongchao Ji
- Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, Shenzhen 518120, China
- Minghan Yue
- School of Artificial Intelligence and Big Data, Henan University of Technology, Zhengzhou 450001, China
- Jiamin Li
- School of Artificial Intelligence and Big Data, Henan University of Technology, Zhengzhou 450001, China
- Xiaozhen Ren
- School of Artificial Intelligence and Big Data, Henan University of Technology, Zhengzhou 450001, China

19
Yates J, Van Allen EM. New horizons at the interface of artificial intelligence and translational cancer research. Cancer Cell 2025; 43:708-727. [PMID: 40233719 PMCID: PMC12007700 DOI: 10.1016/j.ccell.2025.03.018] [Received: 01/24/2025] [Revised: 03/04/2025] [Accepted: 03/12/2025] [Indexed: 04/17/2025]
Abstract
Artificial intelligence (AI) is increasingly being utilized in cancer research as a computational strategy for analyzing multiomics datasets. Advances in single-cell and spatial profiling technologies have contributed significantly to our understanding of tumor biology, and AI methodologies are now being applied to accelerate translational efforts, including target discovery, biomarker identification, patient stratification, and therapeutic response prediction. Despite these advancements, the integration of AI into clinical workflows remains limited, presenting both challenges and opportunities. This review discusses AI applications in multiomics analysis and translational oncology, emphasizing their role in advancing biological discoveries and informing clinical decision-making. Key areas of focus include cellular heterogeneity, tumor microenvironment interactions, and AI-aided diagnostics. Challenges such as reproducibility, interpretability of AI models, and clinical integration are explored, with attention to strategies for addressing these hurdles. Together, these developments underscore the potential of AI and multiomics to enhance precision oncology and contribute to advancements in cancer care.
Affiliation(s)
- Josephine Yates
- Department of Medical Oncology, Dana-Farber Cancer Institute, Boston, MA, USA; Institute for Machine Learning, Department of Computer Science, ETH Zürich, Zurich, Switzerland; ETH AI Center, ETH Zurich, Zurich, Switzerland; Swiss Institute for Bioinformatics (SIB), Lausanne, Switzerland
- Eliezer M Van Allen
- Department of Medical Oncology, Dana-Farber Cancer Institute, Boston, MA, USA; Cancer Program, Broad Institute of MIT and Harvard, Cambridge, MA, USA; Division of Medical Sciences, Harvard University, Boston, MA, USA; Parker Institute for Cancer Immunotherapy, Dana-Farber Cancer Institute, Boston, MA, USA.

20
Basnet BB, Zhou ZY, Wei B, Wang H. Advances in AI-based strategies and tools to facilitate natural product and drug development. Crit Rev Biotechnol 2025:1-32. [PMID: 40159111 DOI: 10.1080/07388551.2025.2478094] [Received: 10/20/2024] [Revised: 02/11/2025] [Accepted: 02/16/2025] [Indexed: 04/02/2025]
Abstract
Natural products and their derivatives have been important for treating diseases in humans, animals, and plants. However, discovering new structures from natural sources remains challenging. In recent years, artificial intelligence (AI) has greatly aided the discovery and development of natural products and drugs. AI helps to connect genetic data to chemical structures and vice versa, repurpose known natural products, predict metabolic pathways, and design and optimize metabolite biosynthesis. More recently, the emergence and improvement of neural networks such as deep learning, together with ensemble and automated web-based bioinformatics platforms, have sped up the discovery process. Meanwhile, AI also improves the identification and structure elucidation of unknown compounds from raw data such as mass spectrometry and nuclear magnetic resonance spectra. This article reviews these AI-driven methods and tools, highlighting their practical applications and offering guidance for efficient natural product discovery and drug development.
Affiliation(s)
- Buddha Bahadur Basnet
- College of Pharmaceutical Sciences, Zhejiang University of Technology, Hangzhou, China
- Central Department of Biotechnology, Tribhuvan University, Kathmandu, Nepal
- Zhen-Yi Zhou
- College of Pharmaceutical Sciences, Zhejiang University of Technology, Hangzhou, China
- Bin Wei
- College of Pharmaceutical Sciences, Zhejiang University of Technology, Hangzhou, China
- Hong Wang
- College of Pharmaceutical Sciences, Zhejiang University of Technology, Hangzhou, China
- Key Laboratory of Marine Fishery Resources Exploitment, Utilization of Zhejiang Province, Zhejiang University of Technology, Hangzhou, China

21
Kattuparambil AA, Chaurasia DK, Shekhar S, Srinivasan A, Mondal S, Aduri R, Jayaram B. Exploring chemical space for "druglike" small molecules in the age of AI. Front Mol Biosci 2025; 12:1553667. [PMID: 40166082 PMCID: PMC11955463 DOI: 10.3389/fmolb.2025.1553667] [Received: 12/31/2024] [Accepted: 02/27/2025] [Indexed: 04/02/2025]
Abstract
The award of the 2024 Nobel Prize in Chemistry for AlphaFold has reiterated the role of AI in biology, particularly in the domain of "drug discovery". Until a few years ago, structure-based drug design (SBDD) was the preferred approach in many academic and pharmaceutical R&D divisions for developing novel therapeutics. However, with the advent of AI, the drug design field in particular has seen a paradigm shift in its R&D across platforms. If "drug design" is a game, there are two main players, the small molecule drug and its target biomolecule, and the rules governing the game are mainly based on the interactions between these two players. In this brief review, we discuss our efforts to improve the state-of-the-art technology with respect to small molecules as well as to understand the rules of the game. The review is broadly divided into five sections, the first of which introduces the field, the challenges faced, and the role of AI in this domain. In the second section, we describe some of the existing small molecule libraries developed in our labs, and in the third we present a more recent knowledge-based resource available for public use. In section four, we describe some of the screening tools developed in our laboratories that are available for public use. Section five delves into how domain knowledge is improving the utilization of AI in drug design, with three case studies from our work as illustrations. Finally, we conclude with our thoughts on the future scope of AI in drug design.
Affiliation(s)
- Dheeraj Kumar Chaurasia
- School of Interdisciplinary Research, Indian Institute of Technology Delhi, New Delhi, India
- Supercomputing Facility for Bioinformatics and Computational Biology, Indian Institute of Technology Delhi, New Delhi, India
- Shashank Shekhar
- Supercomputing Facility for Bioinformatics and Computational Biology, Indian Institute of Technology Delhi, New Delhi, India
- Ashwin Srinivasan
- Department of Computer Science & Information Systems, BITS Pilani K K Birla Goa Campus, Zuarinagar, Goa, India
- Sukanta Mondal
- Department of Biological Sciences, BITS Pilani K K Birla Goa Campus, Zuarinagar, Goa, India
- Raviprasad Aduri
- Department of Biological Sciences, BITS Pilani K K Birla Goa Campus, Zuarinagar, Goa, India
- B. Jayaram
- Supercomputing Facility for Bioinformatics and Computational Biology, Indian Institute of Technology Delhi, New Delhi, India
- Department of Chemistry, Indian Institute of Technology Delhi, New Delhi, India

22
Ocana A, Pandiella A, Privat C, Bravo I, Luengo-Oroz M, Amir E, Gyorffy B. Integrating artificial intelligence in drug discovery and early drug development: a transformative approach. Biomark Res 2025; 13:45. [PMID: 40087789 PMCID: PMC11909971 DOI: 10.1186/s40364-025-00758-2] [Received: 12/03/2024] [Accepted: 03/05/2025] [Indexed: 03/17/2025]
Abstract
Artificial intelligence (AI) can transform drug discovery and early drug development by addressing inefficiencies in traditional methods, which often face high costs, long timelines, and low success rates. In this review we provide an overview of how to integrate AI into the current drug discovery and development process, where it can enhance activities such as target identification, drug discovery, and early clinical development. Through multiomics data analysis and network-based approaches, AI can help to identify novel oncogenic vulnerabilities and key therapeutic targets. AI models such as AlphaFold predict protein structures with high accuracy, aiding druggability assessments and structure-based drug design. AI also facilitates virtual screening and de novo drug design, creating optimized molecular structures for specific biological properties. In early clinical development, AI supports patient recruitment by analyzing electronic health records and improves trial design through predictive modeling, protocol optimization, and adaptive strategies. Innovations like synthetic control arms and digital twins can reduce logistical and ethical challenges by simulating outcomes using real-world or virtual patient data. Despite these advancements, limitations remain. AI models may be biased if trained on unrepresentative datasets, and reliance on historical or synthetic data can lead to overfitting or poor generalizability. Ethical and regulatory issues, such as data privacy, also challenge the implementation of AI. In conclusion, this review provides a comprehensive overview of how to integrate AI into current processes. Although these efforts will demand collaboration between professionals and robust data quality, they have transformative potential to accelerate drug development.
Affiliation(s)
- Alberto Ocana
- Experimental Therapeutics in Cancer Unit, Medical Oncology Department, Instituto de Investigación Sanitaria San Carlos (IdISSC), Hospital Clínico San Carlos and CIBERONC, Madrid, Spain.
- INTHEOS-CEU-START Catedra, Facultad de Medicina, Universidad CEU San Pablo, 28668 Boadilla del Monte, Madrid, Spain.
- Atanasio Pandiella
- Instituto de Biología Molecular y Celular del Cáncer, CSIC, IBSAL and CIBERONC, Salamanca, 37007, Spain
- Cristian Privat
- CancerAppy, Av Ribera de Axpe, 28, Erandio, 48950, Vizcaya, Spain
- Iván Bravo
- Facultad de Farmacia, Universidad de Castilla La Mancha, Albacete, Spain
- Eitan Amir
- Princess Margaret Cancer Center, Toronto, Canada
- Balazs Gyorffy
- Department of Bioinformatics, Semmelweis University, Tűzoltó U. 7-9, Budapest, 1094, Hungary
- Research Centre for Natural Sciences, Hungarian Research Network, Magyar Tudosok Korutja 2, Budapest, 1117, Hungary
- Department of Biophysics, Medical School, University of Pecs, Pecs, 7624, Hungary

23
Park J, Ahn J, Choi J, Kim J. Mol-AIR: Molecular Reinforcement Learning with Adaptive Intrinsic Rewards for Goal-Directed Molecular Generation. J Chem Inf Model 2025; 65:2283-2296. [PMID: 39988822 PMCID: PMC11898073 DOI: 10.1021/acs.jcim.4c01669] [Received: 09/12/2024] [Revised: 02/11/2025] [Accepted: 02/12/2025] [Indexed: 02/25/2025]
Abstract
Optimizing techniques for discovering molecular structures with desired properties is crucial in artificial intelligence (AI)-based drug discovery. Combining deep generative models with reinforcement learning has emerged as an effective strategy for generating molecules with specific properties. Despite its potential, this approach can be ineffective at exploring the vast chemical space and optimizing particular chemical properties. To overcome these limitations, we present Mol-AIR, a reinforcement learning-based framework using adaptive intrinsic rewards for effective goal-directed molecular generation. Mol-AIR leverages the strengths of both history-based and learning-based intrinsic rewards by exploiting random network distillation and counting-based strategies. In benchmark tests, Mol-AIR demonstrates improved performance over existing approaches in generating molecules with the desired properties, including penalized LogP, QED, and celecoxib similarity, without any prior knowledge. We believe that Mol-AIR represents a significant advancement in drug discovery, offering a more efficient path to discovering novel therapeutics.
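Of the two intrinsic-reward families named in the abstract, the count-based (history-based) one is the simplest to illustrate. The minimal sketch below assumes that states are canonical SMILES strings and uses the common 1/√N(s) exploration bonus. The class name, `beta`, and the fixed mixing weight are hypothetical. Mol-AIR also combines this bonus with a random network distillation (RND) term and adapts the balance between the two, which is not reproduced here.

```python
import math
from collections import Counter


class CountBasedBonus:
    """Exploration bonus beta / sqrt(N(s)) that decays as a state is revisited."""

    def __init__(self, beta: float = 1.0) -> None:
        self.beta = beta
        self.counts: Counter = Counter()

    def __call__(self, state: str) -> float:
        # Record the visit first, so the first visit yields the full bonus beta.
        self.counts[state] += 1
        return self.beta / math.sqrt(self.counts[state])


def shaped_reward(extrinsic: float, intrinsic: float, weight: float = 0.1) -> float:
    """Property-based (extrinsic) reward plus a weighted novelty bonus."""
    return extrinsic + weight * intrinsic


if __name__ == "__main__":
    bonus = CountBasedBonus(beta=1.0)
    for smiles in ["CCO", "CCO", "CCO", "c1ccccc1"]:
        print(smiles, round(bonus(smiles), 3))
```

Repeatedly generating the same molecule earns a shrinking bonus, which nudges the policy toward unvisited regions of chemical space. The extrinsic term still carries the property objective, such as QED or penalized LogP.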
Affiliation(s)
- Jinyeong Park
- Department of Computer Science and Engineering, Incheon National University, Incheon 22012, Republic of Korea
- Jaegyoon Ahn
- Department of Computer Science and Engineering, Incheon National University, Incheon 22012, Republic of Korea
- Jonghwan Choi
- Division of Software, Hallym University, Chuncheon-si, Kangwon-do 24252, Republic of Korea
- Jibum Kim
- Department of Computer Science and Engineering, Incheon National University, Incheon 22012, Republic of Korea
- Center for Brain-Machine Interface, Incheon National University, Incheon 22012, Republic of Korea

24
Kyro GW, Martin MT, Watt ED, Batista VS. CardioGenAI: a machine learning-based framework for re-engineering drugs for reduced hERG liability. J Cheminform 2025; 17:30. [PMID: 40045386 PMCID: PMC11881490 DOI: 10.1186/s13321-025-00976-8] [Received: 08/11/2024] [Accepted: 02/21/2025] [Indexed: 03/09/2025]
Abstract
The link between in vitro hERG ion channel inhibition and subsequent in vivo QT interval prolongation, a critical risk factor for the development of arrhythmias such as Torsade de Pointes, is so well established that in vitro hERG activity alone is often sufficient to end the development of an otherwise promising drug candidate. It is therefore of tremendous interest to develop advanced methods for identifying hERG-active compounds in the early stages of drug development, as well as for proposing redesigned compounds with reduced hERG liability and preserved primary pharmacology. In this work, we present CardioGenAI, a machine learning-based framework for re-engineering both developmental and commercially available drugs for reduced hERG activity while preserving their pharmacological activity. The framework incorporates novel state-of-the-art discriminative models for predicting hERG channel activity, as well as activity against the voltage-gated NaV1.5 and CaV1.2 channels, because of their potential role in modulating the arrhythmogenic potential induced by hERG channel blockade. We applied the complete framework to pimozide, an FDA-approved antipsychotic agent that demonstrates high affinity for the hERG channel, and generated 100 refined candidates. Remarkably, among the candidates is fluspirilene, a compound which is of the same class of drugs as pimozide (diphenylmethanes) and therefore has similar pharmacological activity, yet exhibits over 700-fold weaker binding to hERG. Furthermore, we demonstrated the framework's ability to optimize hERG, NaV1.5 and CaV1.2 profiles of multiple FDA-approved compounds while maintaining the physicochemical nature of the original drugs. We envision that this method can effectively be applied to developmental compounds exhibiting hERG liabilities to provide a means of rescuing drug development programs that have stalled due to hERG-related safety concerns.
The discriminative models can also serve independently as effective components of virtual screening pipelines. We have made all of our software open-source at https://github.com/gregory-kyro/CardioGenAI to facilitate integration of the CardioGenAI framework for molecular hypothesis generation into drug discovery workflows.
Scientific contribution
This work introduces CardioGenAI, an open-source machine learning-based framework designed to re-engineer drugs for reduced hERG liability while preserving their pharmacological activity. The complete CardioGenAI framework can be applied to developmental compounds exhibiting hERG liabilities to provide a means of rescuing drug discovery programs facing hERG-related challenges. In addition, the framework incorporates novel state-of-the-art discriminative models for predicting hERG, NaV1.5 and CaV1.2 channel activity, which can function independently as effective components of virtual screening pipelines.
Affiliation(s)
- Gregory W Kyro
- Department of Chemistry, Yale University, New Haven, CT, 06511, USA
- Drug Safety Research & Development, Pfizer Research & Development, Groton, CT, 06340, USA
- Matthew T Martin
- Drug Safety Research & Development, Pfizer Research & Development, Groton, CT, 06340, USA
- Eric D Watt
- Drug Safety Research & Development, Pfizer Research & Development, Groton, CT, 06340, USA
- Victor S Batista
- Department of Chemistry, Yale University, New Haven, CT, 06511, USA
25
Zhang Y, Huang J, Li X, Sun W, Zhang N, Zhang J, Chen T, Wang L. Self-awareness of retrosynthesis via chemically inspired contrastive learning for reinforced molecule generation. Brief Bioinform 2025; 26:bbaf185. [PMID: 40254835 PMCID: PMC12009711 DOI: 10.1093/bib/bbaf185] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/29/2024] [Revised: 03/19/2025] [Accepted: 03/30/2025] [Indexed: 04/22/2025] [Open access]
Abstract
The recent progress of deep generative models in modeling complex real-world data distributions has enabled the generation of novel compounds with potential therapeutic applications for various diseases. However, most studies fail to optimize the properties of generated molecules from the perspective of the intrinsic nature of chemical reactions. In this work, we propose a novel molecule generation model that overcomes this limitation through deep reinforcement learning, in which an agent, initialized with a chemically inspired contrastive pretrained model, learns to optimize molecular properties. Finally, we assess the model by evaluating its ability to generate inhibitors against two prominent therapeutic targets in cancer treatment. Experimental results show that our model generates 100% valid and novel structures and outperforms several baselines in generating molecules with fewer structural alerts. More importantly, the molecules generated by our proposed model show potent biological activities against ataxia telangiectasia and Rad3-related (ATR) and cyclin-dependent kinase 9 (CDK9) targets in wet-lab experiments.
Affiliation(s)
- Yi Zhang
- Guangdong Provincial Key Laboratory of Fermentation and Enzyme Engineering, Joint International Research Laboratory of Synthetic Biology and Medicine, Guangdong Provincial Engineering and Technology Research Center of Biopharmaceuticals, School of Biology and Biological Engineering, South China University of Technology, No. 382 Waihuan East Road, Higher Education Mega Center, Guangzhou 510006, China
- Jindi Huang
- Guangdong Provincial Key Laboratory of Fermentation and Enzyme Engineering, Joint International Research Laboratory of Synthetic Biology and Medicine, Guangdong Provincial Engineering and Technology Research Center of Biopharmaceuticals, School of Biology and Biological Engineering, South China University of Technology, No. 382 Waihuan East Road, Higher Education Mega Center, Guangzhou 510006, China
- Xinze Li
- Guangdong Provincial Key Laboratory of Fermentation and Enzyme Engineering, Joint International Research Laboratory of Synthetic Biology and Medicine, Guangdong Provincial Engineering and Technology Research Center of Biopharmaceuticals, School of Biology and Biological Engineering, South China University of Technology, No. 382 Waihuan East Road, Higher Education Mega Center, Guangzhou 510006, China
- Wenqi Sun
- Guizhou Provincial Engineering Technology Research Center for Chemical Drug R&D, College of Pharmacy, Guizhou Medical University, No. 6 Ankang Avenue, Guian New District, Guiyang 561113, China
- Nana Zhang
- Guizhou Provincial Engineering Technology Research Center for Chemical Drug R&D, College of Pharmacy, Guizhou Medical University, No. 6 Ankang Avenue, Guian New District, Guiyang 561113, China
- Jiquan Zhang
- Guizhou Provincial Engineering Technology Research Center for Chemical Drug R&D, College of Pharmacy, Guizhou Medical University, No. 6 Ankang Avenue, Guian New District, Guiyang 561113, China
- Tiegen Chen
- Zhongshan Institute for Drug Discovery, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, Zhongshan Life Science Park, No. 10 Heqing Road, Tsui Hang New District, Zhongshan 528400, China
- Ling Wang
- Guangdong Provincial Key Laboratory of Fermentation and Enzyme Engineering, Joint International Research Laboratory of Synthetic Biology and Medicine, Guangdong Provincial Engineering and Technology Research Center of Biopharmaceuticals, School of Biology and Biological Engineering, South China University of Technology, No. 382 Waihuan East Road, Higher Education Mega Center, Guangzhou 510006, China
26
Zhang PZ, Ballard J, Esquivel Fagiani F, Smith D, Gibson C, Yu X. Large-Scale Compartmental Model-Based Study of Preclinical Pharmacokinetic Data and Its Impact on Compound Triaging in Drug Discovery. Mol Pharm 2025; 22:1230-1240. [PMID: 39960135 DOI: 10.1021/acs.molpharmaceut.4c00813] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 03/04/2025]
Abstract
Reliable and robust human dose prediction plays a pivotal role in drug discovery. The prediction of human dose requires proper modeling of preclinical intravenous (IV) pharmacokinetic (PK) data, which is usually achieved either through noncompartmental analysis (NCA) or compartmental analysis. While NCA is straightforward, it loses valuable information about the shape of the PK curves. In contrast, compartmental analysis offers a more comprehensive interpretation but poses challenges in scaling up for high-throughput applications in discovery. To address this challenge, we developed computational frameworks, termed compartmental PK (CPK) and automated dose prediction (ADP), to enable automated compartmental model-based IV PK data modeling, translation, and simulation for human dose prediction in compound triaging and optimization. With CPK and ADP, we analyzed compounds with data collected at the MRL between 2013 and 2023 to quantitatively characterize the impact of different PK modeling and simulation methods on human dose prediction. Our study revealed that despite minimal impact on estimating animal PK parameters, different methods significantly impacted predicted human dose, exposure, and Cmax, driven more by different simulation assumptions than by the PK modeling itself. CPK-ADP therefore enables us to efficiently perform complex human dose predictions on a large scale while integrating the latest and best information available on absorption, distribution, and clearance to support decision-making in discovery.
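A minimal sketch of the NCA-versus-compartmental contrast described above, assuming a two-compartment IV bolus model, C(t) = A·e^(−αt) + B·e^(−βt), with hypothetical parameter values and sampling times. It is not the CPK/ADP code: the model form keeps the distribution and elimination phases as separate terms, while the NCA route reduces the same sampled curve to a trapezoidal AUC plus a log-linear terminal extrapolation.

```python
import math

# Sketch only: function names, parameters and sampling times are illustrative,
# not taken from CPK/ADP.

def biexp_conc(t, A, alpha, B, beta):
    """Plasma concentration for a two-compartment IV bolus model."""
    return A * math.exp(-alpha * t) + B * math.exp(-beta * t)

def auc_inf_model(A, alpha, B, beta):
    """AUC from 0 to infinity, integrated analytically from the biexponential."""
    return A / alpha + B / beta

def auc_trapezoid(times, concs):
    """NCA-style AUC over the sampled window using the linear trapezoidal rule."""
    return sum((t1 - t0) * (c0 + c1) / 2
               for t0, t1, c0, c1 in zip(times, times[1:], concs, concs[1:]))

def nca_auc_inf(times, concs, n_terminal=3):
    """Trapezoidal AUC plus log-linear extrapolation of the terminal phase."""
    auc_last = auc_trapezoid(times, concs)
    # Terminal rate constant (lambda_z) from a least-squares fit of ln(C) vs t.
    ts = times[-n_terminal:]
    ys = [math.log(c) for c in concs[-n_terminal:]]
    t_mean = sum(ts) / n_terminal
    y_mean = sum(ys) / n_terminal
    slope = (sum((t - t_mean) * (y - y_mean) for t, y in zip(ts, ys))
             / sum((t - t_mean) ** 2 for t in ts))
    lambda_z = -slope
    return auc_last + concs[-1] / lambda_z

# Hypothetical parameters (arbitrary concentration units, time in hours).
A, alpha, B, beta = 8.0, 1.5, 2.0, 0.15
times = [0.083, 0.25, 0.5, 1, 2, 4, 8, 24]
concs = [biexp_conc(t, A, alpha, B, beta) for t in times]

print(f"model AUC_inf: {auc_inf_model(A, alpha, B, beta):.2f}")
print(f"NCA   AUC_inf: {nca_auc_inf(times, concs):.2f}")
```

With sparse late sampling, the linear trapezoid overestimates the area of the convex elimination phase, which is one way summary-based and model-based estimates can drift apart.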
Affiliation(s)
- Peter Zhiping Zhang
- Pharmacokinetics, Dynamics, Metabolism, and Bioanalytics (PDMB), MRL, Merck & Co., Inc., West Point, Pennsylvania 19486, United States
- Jeanine Ballard
- Pharmacokinetics, Dynamics, Metabolism, and Bioanalytics (PDMB), MRL, Merck & Co., Inc., West Point, Pennsylvania 19486, United States
- Facundo Esquivel Fagiani
- Pharmacokinetics, Dynamics, Metabolism, and Bioanalytics (PDMB), MRL, Merck & Co., Inc., West Point, Pennsylvania 19486, United States
- Dustin Smith
- Pharmacokinetics, Dynamics, Metabolism, and Bioanalytics (PDMB), MRL, Merck & Co., Inc., West Point, Pennsylvania 19486, United States
- Christopher Gibson
- Pharmacokinetics, Dynamics, Metabolism, and Bioanalytics (PDMB), MRL, Merck & Co., Inc., West Point, Pennsylvania 19486, United States
- Xiang Yu
- Pharmacokinetics, Dynamics, Metabolism, and Bioanalytics (PDMB), MRL, Merck & Co., Inc., West Point, Pennsylvania 19486, United States
27
Yang R, Li B, Dong J, Cai Z, Lin H, Wang F, Yang G. Reinforcement learning-based generative artificial intelligence for novel pesticide design. J Adv Res 2025:S2090-1232(25)00128-6. [PMID: 40032026 DOI: 10.1016/j.jare.2025.02.030] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/29/2024] [Revised: 02/04/2025] [Accepted: 02/23/2025] [Indexed: 03/05/2025] [Open access]
Abstract
INTRODUCTION: Pesticides play a pivotal role in ensuring food security, and the development of green pesticides is an inevitable trend in global agricultural progress. Although deep learning-based generative models have revolutionized de novo drug design in pharmaceutical research, their application in pesticide research and development remains unexplored.
OBJECTIVES: This study aims to pioneer the application of generative artificial intelligence to pesticide design by proposing a reinforcement learning-based framework for obtaining pesticide-like molecules with high binding affinity.
METHODS: This framework comprises two key components: PestiGen-G, which systematically explores the pesticide-like chemical space using a character-based generative model coupled with the REINFORCE algorithm; and PestiGen-S, which combines a fragment-based generative model with the Monte Carlo Tree Search algorithm to generate molecules that stably bind to the specific target protein.
RESULTS: Experimental results show that the molecules generated by PestiGen have superior pesticide-likeness and binding affinity compared to those generated by existing methods. In addition, we employ an active learning strategy to reduce the false-positive rate of the generated molecules. Finally, through collaboration with domain experts, we successfully designed a novel 4-hydroxyphenylpyruvate dioxygenase inhibitor (YH23768) with favorable enzyme inhibition and herbicidal potency.
CONCLUSION: This proof-of-concept study highlights the utility of PestiGen as a valuable tool for pesticide design. The web server based on the model is freely available at https://dpai.ccnu.edu.cn/PestiGen/.
Affiliation(s)
- Ruoqi Yang
- State Key Laboratory of Green Pesticide, International Joint Research Center for Intelligent Biosensor Technology and Health, Central China Normal University, Wuhan 430079, PR China
- Biao Li
- State Key Laboratory of Green Pesticide, International Joint Research Center for Intelligent Biosensor Technology and Health, Central China Normal University, Wuhan 430079, PR China
- Jin Dong
- State Key Laboratory of Green Pesticide, International Joint Research Center for Intelligent Biosensor Technology and Health, Central China Normal University, Wuhan 430079, PR China
- Zhuomei Cai
- State Key Laboratory of Green Pesticide, International Joint Research Center for Intelligent Biosensor Technology and Health, Central China Normal University, Wuhan 430079, PR China
- Hongyan Lin
- State Key Laboratory of Green Pesticide, International Joint Research Center for Intelligent Biosensor Technology and Health, Central China Normal University, Wuhan 430079, PR China
- Fan Wang
- State Key Laboratory of Green Pesticide, International Joint Research Center for Intelligent Biosensor Technology and Health, Central China Normal University, Wuhan 430079, PR China
- Guangfu Yang
- State Key Laboratory of Green Pesticide, International Joint Research Center for Intelligent Biosensor Technology and Health, Central China Normal University, Wuhan 430079, PR China
28
Lv Q, Chen G, Yang Z, Zhong W, Chen CYC. Meta-MolNet: A Cross-Domain Benchmark for Few Examples Drug Discovery. IEEE Trans Neural Netw Learn Syst 2025; 36:4849-4863. [PMID: 40038923 DOI: 10.1109/tnnls.2024.3359657] [Citation(s) in RCA: 8] [Impact Index Per Article: 8.0] [Indexed: 03/06/2025]
Abstract
Predicting the pharmacological activity, toxicity, and pharmacokinetic properties of molecules is a central task in drug discovery. Existing machine learning methods transfer knowledge from a resource-rich molecular property to a data-scarce property within the same scaffold dataset. However, existing models may produce fragile and highly uncertain predictions for molecules with new scaffolds. Moreover, these models were tested on different benchmarks, which seriously affects the quality of their evaluation results. In this article, we introduce Meta-MolNet, a collection of data benchmarks and algorithms that serves as a standard benchmark platform for measuring model generalization and uncertainty quantification capabilities. Meta-MolNet manages a wide range of molecular datasets with a high ratio of molecules to scaffolds, which often leads to more difficult data shift and generalization problems. Furthermore, we propose a graph attention network based on cross-domain meta-learning, Meta-GAT, which uses bilevel optimization to learn meta-knowledge from the scaffold family molecular dataset in the source domain. Meta-GAT benefits from meta-knowledge that reduces the required sample complexity, enabling reliable predictions for new scaffold molecules in the target domain through inner-loop iterations on a few examples. We evaluate existing methods as baselines for the community, and the Meta-MolNet benchmark demonstrates its effectiveness in measuring the domain generalization and uncertainty quantification of the proposed algorithm. Extensive experiments demonstrate that the Meta-GAT model has state-of-the-art domain generalization performance and robustly estimates uncertainty under few-example constraints. By publishing AI-ready data, evaluation frameworks, and baseline results, we hope to see the Meta-MolNet suite become a comprehensive resource for the AI-assisted drug discovery community. Meta-MolNet is freely accessible at https://github.com/lol88/Meta-MolNet.
29
Ambreen S, Umar M, Noor A, Jain H, Ali R. Advanced AI and ML frameworks for transforming drug discovery and optimization: With innovative insights in polypharmacology, drug repurposing, combination therapy and nanomedicine. Eur J Med Chem 2025; 284:117164. [PMID: 39721292 DOI: 10.1016/j.ejmech.2024.117164] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/27/2024] [Revised: 11/24/2024] [Accepted: 11/27/2024] [Indexed: 12/28/2024]
Abstract
Artificial Intelligence (AI) and Machine Learning (ML) are transforming drug discovery by overcoming traditional challenges such as high costs, long timelines, and frequent failures. AI-driven approaches streamline key phases, including target identification, lead optimization, de novo drug design, and drug repurposing. Frameworks such as deep neural networks (DNNs), convolutional neural networks (CNNs), and deep reinforcement learning (DRL) models have shown promise in identifying drug targets, optimizing delivery systems, and accelerating drug repurposing. Generative adversarial networks (GANs) and variational autoencoders (VAEs) aid de novo drug design by creating novel drug-like compounds with desired properties. Case studies, such as DDR1 kinase inhibitors designed using generative models and CDK20 inhibitors developed via structure-based methods, highlight AI's ability to produce highly specific therapeutics. Models such as SNF-CVAE and DeepDR further advance drug repurposing by uncovering new therapeutic applications for existing drugs. Advanced ML algorithms enhance precision in predicting drug efficacy, toxicity, and ADME-Tox properties, reducing development costs and improving drug-target interactions. AI also supports polypharmacology by optimizing multi-target drug interactions and enhances combination therapy through predictions of drug synergies and antagonisms. In nanomedicine, AI models such as CURATE.AI and the Hartung algorithm optimize personalized treatments by predicting toxicological risks and real-time dosing adjustments with high accuracy. Despite its potential, challenges such as data quality, model interpretability, and ethical concerns must be addressed. High-quality datasets, transparent models, and unbiased algorithms are essential for reliable AI applications. As AI continues to evolve, it is poised to revolutionize drug discovery and personalized medicine, advancing therapeutic development and patient care.
Affiliation(s)
- Subiya Ambreen
- Department of Pharmaceutical Chemistry, Delhi Institute of Pharmaceutical Sciences and Research (DIPSAR), DPSRU, Pushp Vihar, New Delhi, 110017, India
- Mohammad Umar
- Department of Pharmaceutical Chemistry, Delhi Institute of Pharmaceutical Sciences and Research (DIPSAR), DPSRU, Pushp Vihar, New Delhi, 110017, India
- Aaisha Noor
- Department of Pharmaceutical Chemistry, Delhi Institute of Pharmaceutical Sciences and Research (DIPSAR), DPSRU, Pushp Vihar, New Delhi, 110017, India
- Himangini Jain
- Department of Pharmaceutical Chemistry, Delhi Institute of Pharmaceutical Sciences and Research (DIPSAR), DPSRU, Pushp Vihar, New Delhi, 110017, India
- Ruhi Ali
- Department of Pharmaceutical Chemistry, Delhi Institute of Pharmaceutical Sciences and Research (DIPSAR), DPSRU, Pushp Vihar, New Delhi, 110017, India
30
Abstract
In this work, we introduce ChemBFN, a language model that handles chemistry tasks based on Bayesian flow networks working with discrete data. A new accuracy schedule is proposed to improve sampling quality by significantly reducing reconstruction loss. We show evidence that our method is appropriate for generating molecules with satisfactory diversity, even when a smaller number of sampling steps is used. A classifier-free guidance method is adapted for conditional generation. Notably, after generative training, our model can be fine-tuned on regression and classification tasks with state-of-the-art performance, which opens the door to building all-in-one models in a single-module style. Our model has been open sourced at https://github.com/Augus1999/bayesian-flow-network-for-chemistry.
Affiliation(s)
- Nianze Tao
- Department of Chemistry, Graduate School of Advanced Science and Engineering, Hiroshima University, 1-3-1 Kagamiyama, Higashi-Hiroshima 739-8524, Japan
- Minori Abe
- Department of Chemistry, Graduate School of Advanced Science and Engineering, Hiroshima University, 1-3-1 Kagamiyama, Higashi-Hiroshima 739-8524, Japan
31
Li X, Walsh R, Abbas W, Pascual-Diaz S, Hand C, Garland R, Khan FM, Mohan Das N, Desai V, AbouZleikha M, Clark MA. Multiclass Synthetic Accessibility Prediction. J Chem Inf Model 2025; 65:1155-1165. [PMID: 39818777 DOI: 10.1021/acs.jcim.4c01663] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/19/2025]
Abstract
Evaluating synthetic accessibility of in silico molecules is an integral component of the drug discovery process. While the application of machine learning models to predict whether small molecules are easy or hard to synthesize has gained attention recently, predetermined thresholds and data set imbalances present challenges for these binary classification approaches. In this study, we introduce a novel multiclass fold-ensembled classification approach to predict the minimum number of steps needed to synthesize a small molecule. By ensembling the base models trained on multiple stratified subsampled folds, this approach effectively mitigates the impact of class imbalance through probability aggregation or voting aggregation strategies. Additionally, we propose fuzzy evaluation metrics that account for practical tolerances in predictions, providing a more flexible and realistic assessment of model performance. Through experimentation on two reaction benchmark data sets, we demonstrate the effectiveness of our model in a multiclass synthetic accessibility prediction task and the superiority of our proposed method over six existing models in binary synthetic accessibility prediction tasks.
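The probability and voting aggregation strategies and the tolerance-based ("fuzzy") scoring mentioned above can be sketched as follows. This assumes each fold model outputs a probability vector over classes indexed by the number of synthesis steps, and that a fuzzy hit means a prediction within a fixed number of steps of the true value. The function names, toy probabilities and ±1-step tolerance are illustrative assumptions rather than the paper's definitions, and training of the fold models is omitted.

```python
from collections import Counter

# Sketch only: hypothetical helpers, not the authors' implementation.

def aggregate_probabilities(fold_probs):
    """Average the class-probability vectors of all fold models; return the argmax class."""
    n_models = len(fold_probs)
    n_classes = len(fold_probs[0])
    mean = [sum(p[k] for p in fold_probs) / n_models for k in range(n_classes)]
    return max(range(n_classes), key=lambda k: mean[k])

def aggregate_votes(fold_probs):
    """Each fold model votes for its own argmax class; ties go to the lowest class index."""
    votes = Counter(max(range(len(p)), key=lambda k: p[k]) for p in fold_probs)
    best = max(votes.values())
    return min(k for k, v in votes.items() if v == best)

def fuzzy_accuracy(y_true, y_pred, tolerance=1):
    """Fraction of predictions within `tolerance` steps of the true step count."""
    hits = sum(abs(t - p) <= tolerance for t, p in zip(y_true, y_pred))
    return hits / len(y_true)

# Hypothetical outputs of three fold models for one molecule (class k = k steps).
fold_probs = [
    [0.10, 0.50, 0.30, 0.10, 0.00],
    [0.10, 0.20, 0.60, 0.10, 0.00],
    [0.00, 0.45, 0.40, 0.15, 0.00],
]
print("probability aggregation:", aggregate_probabilities(fold_probs))
print("voting aggregation:", aggregate_votes(fold_probs))

# Hypothetical true vs predicted step counts for four molecules.
y_true, y_pred = [2, 3, 5, 1], [2, 4, 3, 1]
print("exact accuracy:", fuzzy_accuracy(y_true, y_pred, tolerance=0))
print("fuzzy accuracy (+/-1 step):", fuzzy_accuracy(y_true, y_pred, tolerance=1))
```

With these toy inputs the two rules disagree: averaging probabilities favours class 2, while majority voting picks class 1, so the choice of aggregation strategy can change the predicted step count.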
Affiliation(s)
- Xinqi Li
- X-Chem U.K., 1 Ashley Road, Altrincham, Cheshire WA14 2DT, U.K.
- Ryan Walsh
- X-Chem Canada, 4800 Rue Levy, Montreal QC H4R 2P1, Canada
- X-Chem Global HQ, 100 Beaver Street, Waltham, Massachusetts 02453, United States
- Waseem Abbas
- X-Chem U.K., 1 Ashley Road, Altrincham, Cheshire WA14 2DT, U.K.
- Calum Hand
- X-Chem U.K., 1 Ashley Road, Altrincham, Cheshire WA14 2DT, U.K.
- Rory Garland
- X-Chem U.K., 1 Ashley Road, Altrincham, Cheshire WA14 2DT, U.K.
- Vedant Desai
- X-Chem U.K., 1 Ashley Road, Altrincham, Cheshire WA14 2DT, U.K.
- Matthew A Clark
- X-Chem Global HQ, 100 Beaver Street, Waltham, Massachusetts 02453, United States
32
Beck AG, Fine J, Lam YH, Sherer EC, Regalado EL, Aggarwal P. Dedenser: A Python Package for Clustering and Downsampling Chemical Libraries. J Chem Inf Model 2025; 65:1053-1060. [PMID: 39883037 DOI: 10.1021/acs.jcim.4c01980] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/31/2025]
Abstract
The screening of chemical libraries is an essential starting point in the drug discovery process. While some researchers desire a more thorough screening of drug targets against a narrower scope of molecules, it is not uncommon for diverse screening sets to be favored during the early stages of drug discovery. However, a cost burden is associated with the screening of molecules, with potential drawbacks if particular areas of chemical space are needlessly overrepresented. To facilitate triaged sampling of chemical libraries and other collections of molecules, we have developed Dedenser, a tool for the downsampling of chemical clusters. Dedenser functions by reducing the membership of clusters within chemical point clouds while maintaining the initial topology or distribution in chemical space. Dedenser is a Python package that utilizes Hierarchical Density-Based Spatial Clustering of Applications with Noise to first identify clusters present in 3D chemical point clouds and then downsamples by applying Poisson disk sampling to clusters based on either their volume or density in chemical space. A command line interface tool and graphic user interface are available with Dedenser, which allow for the generation of chemical point clouds, using Mordred for QSAR descriptor calculations and uniform manifold approximation and projection for 3D embedding, as well as visualization. We hope that Dedenser will serve the community by enabling quick access to reduced collections of molecules that are representative of larger sets and selecting even distributions of molecules within clusters rather than single representative molecules from clusters. All code for Dedenser is open source and available at https://github.com/MSDLLCpapers/dedenser.
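Downsampling a cluster while preserving its spread can be illustrated with greedy Poisson-disk-style thinning: visit the points in random order and keep one only if no already-kept point lies within a radius r. This is a simplified sketch, assuming cluster labels are already available (label −1 marks noise, following the HDBSCAN convention) and using one fixed radius for every cluster. Dedenser instead scales its sampling by cluster volume or density, and none of the names below are its API.

```python
import math
import random

# Sketch only: hypothetical helpers, not Dedenser's implementation.

def poisson_disk_thin(points, radius, seed=0):
    """Keep a point only if it lies at least `radius` from every point kept so far."""
    order = list(range(len(points)))
    random.Random(seed).shuffle(order)  # random visiting order, reproducible via seed
    kept = []
    for i in order:
        if all(math.dist(points[i], points[j]) >= radius for j in kept):
            kept.append(i)
    return sorted(kept)

def downsample_by_cluster(points, labels, radius, keep_noise=True):
    """Thin each cluster independently; points labelled -1 are treated as noise."""
    clusters = {}
    for i, lab in enumerate(labels):
        clusters.setdefault(lab, []).append(i)
    selected = []
    for lab, members in clusters.items():
        if lab == -1:
            if keep_noise:
                selected.extend(members)
            continue
        sub = [points[i] for i in members]
        # Map indices within the cluster back to indices in the full point list.
        selected.extend(members[k] for k in poisson_disk_thin(sub, radius, seed=lab))
    return sorted(selected)

# Hypothetical 3D embedding: one dense cluster, one diffuse cluster, one noise point.
rng = random.Random(42)
dense = [(rng.gauss(0, 0.2), rng.gauss(0, 0.2), rng.gauss(0, 0.2)) for _ in range(200)]
sparse = [(rng.gauss(5, 1.0), rng.gauss(5, 1.0), rng.gauss(5, 1.0)) for _ in range(20)]
points = dense + sparse + [(10.0, -10.0, 0.0)]
labels = [0] * 200 + [1] * 20 + [-1]

keep = downsample_by_cluster(points, labels, radius=0.3)
print(len(keep), "of", len(points), "points kept")
```

Because the radius is fixed here, the tightly packed cluster is thinned heavily while the diffuse cluster keeps most of its members; scaling r per cluster would be the natural refinement.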
Affiliation(s)
- Armen G Beck
- Analytical Research & Development, MRL, Merck & Co., Inc., Rahway, New Jersey 07065, United States
- Jonathan Fine
- Analytical Research & Development, MRL, Merck & Co., Inc., Rahway, New Jersey 07065, United States
- Yu-Hong Lam
- Modeling and Informatics, MRL, Merck & Co., Inc., Rahway, New Jersey 07065, United States
- Edward C Sherer
- Analytical Research & Development, MRL, Merck & Co., Inc., Rahway, New Jersey 07065, United States
- Erik L Regalado
- Analytical Research & Development, MRL, Merck & Co., Inc., Rahway, New Jersey 07065, United States
- Pankaj Aggarwal
- Analytical Research & Development, MRL, Merck & Co., Inc., Rahway, New Jersey 07065, United States
33
Pržulj N, Malod-Dognin N. Simplicity within biological complexity. Bioinform Adv 2025; 5:vbae164. [PMID: 39927291 PMCID: PMC11805345 DOI: 10.1093/bioadv/vbae164] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/15/2024] [Revised: 10/01/2024] [Accepted: 10/23/2024] [Indexed: 02/11/2025]
Abstract
Motivation: Heterogeneous, interconnected, systems-level, molecular (multi-omic) data have become increasingly available and key in precision medicine. We need to utilize them to better stratify patients into risk groups, discover new biomarkers and targets, and repurpose known drugs and discover new ones to personalize medical treatment. Existing methodologies are limited, and a paradigm shift is needed to achieve quantitative and qualitative breakthroughs.
Results: In this perspective paper, we survey the literature and argue for the development of a comprehensive, general framework for embedding multi-scale molecular network data that would enable their explainable exploitation in precision medicine in linear time. Network embedding methods (also called graph representation learning) map nodes to points in a low-dimensional space, so that proximity in the learned space reflects the network's topology-function relationships. They have recently achieved unprecedented performance on hard problems of utilizing few omic data in various biomedical applications. However, research thus far has been limited to special variants of the problems and data, with the performance depending on the underlying topology-function network biology hypotheses, the biomedical applications, and evaluation metrics. The availability of multi-omic data, modern graph embedding paradigms and compute power calls for the creation and training of efficient, explainable and controllable models, free of potentially dangerous, unexpected behaviour, that make a qualitative breakthrough. We propose to develop a general, comprehensive embedding framework for multi-omic network data, from models to efficient and scalable software implementation, and to apply it to biomedical informatics, focusing on precision medicine and personalized drug discovery. It will lead to a paradigm shift in the computational and biomedical understanding of data and diseases that will open up ways to solve some of the major bottlenecks in precision medicine and other domains.
Affiliation(s)
- Nataša Pržulj
- Computational Biology Department, Mohamed bin Zayed University of Artificial Intelligence, Abu Dhabi, 00000, United Arab Emirates
- Barcelona Supercomputing Center, Barcelona 08034, Spain
- Department of Computer Science, University College London, London WC1E 6BT, United Kingdom
- ICREA, Pg. Lluís Companys 23, Barcelona 08010, Spain
34
Choudhury A, Ghosh D. Hybrid Unsupervised/Supervised Machine Learning for Identifying Molecular Structural Fingerprints From Ensemble Property. J Comput Chem 2025; 46:e70038. [PMID: 39868791 DOI: 10.1002/jcc.70038] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/30/2024] [Revised: 12/22/2024] [Accepted: 12/26/2024] [Indexed: 01/28/2025]
Abstract
The ensemble properties of a system are obtained by averaging over the properties calculated for the various configurations it can have at a finite temperature and thus cannot be captured by a single molecular structure. Such ensemble properties are often important in material discovery. In designing new materials, the goal is to predict those ensemble structures that display a tailored property. However, mapping this average property to multiple structures introduces ambiguities and unreliable convergence in supervised machine learning. This presents a major obstacle in designing new materials. Here, we introduce a hybrid unsupervised/supervised learning method and demonstrate how to predict the structural parameters defining the conformers of a heterogeneous system, melanin, from its ensemble-averaged spectra. This also shows a new way to identify different structural fingerprints responsible for an ensemble-averaged superposition spectrum.
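The ambiguity described above can be made concrete with a Boltzmann-weighted average, assuming conformer populations follow a Boltzmann distribution at temperature T: ⟨S⟩ = Σ_i p_i S_i with p_i = exp(−E_i/k_BT) / Σ_j exp(−E_j/k_BT). In the toy example below (hypothetical energies and three-point spectra, not melanin data, and not the authors' workflow), two different structural ensembles yield the same averaged spectrum, so the mapping from averaged spectrum back to structures is not unique.

```python
import math

# Sketch only: toy data and helper names are hypothetical.
K_B_KCAL = 0.0019872041  # Boltzmann constant in kcal/(mol*K)

def boltzmann_weights(energies, temperature=300.0):
    """Normalized Boltzmann populations for conformer energies given in kcal/mol."""
    beta = 1.0 / (K_B_KCAL * temperature)
    e_min = min(energies)  # shift by the minimum energy for numerical stability
    w = [math.exp(-beta * (e - e_min)) for e in energies]
    z = sum(w)
    return [x / z for x in w]

def ensemble_spectrum(spectra, energies, temperature=300.0):
    """Population-weighted average of per-conformer spectra sampled on a shared grid."""
    weights = boltzmann_weights(energies, temperature)
    n_points = len(spectra[0])
    return [sum(p * s[k] for p, s in zip(weights, spectra)) for k in range(n_points)]

# Ensemble 1: two degenerate conformers with distinct spectra.
avg_1 = ensemble_spectrum([[1.0, 0.0, 0.0], [0.0, 0.0, 1.0]], energies=[0.0, 0.0])
# Ensemble 2: a single conformer whose spectrum equals that average.
avg_2 = ensemble_spectrum([[0.5, 0.0, 0.5]], energies=[0.0])

print(avg_1, avg_2)  # identical averages from different structural ensembles
```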
Affiliation(s)
- Arpan Choudhury
- School of Chemical Sciences, Indian Association for the Cultivation of Science, Kolkata, India
- Debashree Ghosh
- School of Chemical Sciences, Indian Association for the Cultivation of Science, Kolkata, India
35
Wang J, Luo H, Qin R, Wang M, Wan X, Fang M, Zhang O, Gou Q, Su Q, Shen C, You Z, Liu L, Hsieh CY, Hou T, Kang Y. 3DSMILES-GPT: 3D molecular pocket-based generation with token-only large language model. Chem Sci 2025; 16:637-648. [PMID: 39664804 PMCID: PMC11629531 DOI: 10.1039/d4sc06864e] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/09/2024] [Accepted: 12/03/2024] [Indexed: 12/13/2024] [Open access]
Abstract
The generation of three-dimensional (3D) molecules based on target structures represents a cutting-edge challenge in drug discovery. Many existing approaches produce molecules with invalid configurations, unphysical conformations, suboptimal drug-like qualities, and limited synthesizability, and they require extensive generation times. To address these challenges, we present 3DSMILES-GPT, a fully language-model-driven framework for 3D molecular generation that utilizes tokens exclusively. We treat both two-dimensional (2D) and 3D molecular representations as linguistic expressions, combining them through full-dimensional representations and pre-training the model on a vast dataset encompassing tens of millions of drug-like molecules. This token-only approach enables the model to comprehensively understand the 2D and 3D characteristics of large-scale molecules. Subsequently, we fine-tune the model using pair-wise structural data of protein pockets and molecules, followed by reinforcement learning to further optimize the biophysical and chemical properties of the generated molecules. Experimental results demonstrate that 3DSMILES-GPT generates molecules that comprehensively outperform existing methods in terms of binding affinity, drug-likeness (QED), and synthetic accessibility score (SAS). Notably, it achieves a 33% enhancement in QED while maintaining state-of-the-art binding affinity as estimated by Vina docking. Generation is remarkably fast, averaging approximately 0.45 seconds per generation, about three times faster than the fastest existing methods. This innovative 3DSMILES-GPT approach has the potential to positively impact the generation of 3D molecules in drug discovery.
Affiliation(s)
- Jike Wang
- College of Pharmaceutical Sciences, Zhejiang University Hangzhou 310058 Zhejiang China
- Hao Luo
- College of Pharmaceutical Sciences, Zhejiang University Hangzhou 310058 Zhejiang China
- Rui Qin
- College of Pharmaceutical Sciences, Zhejiang University Hangzhou 310058 Zhejiang China
- Mingyang Wang
- College of Pharmaceutical Sciences, Zhejiang University Hangzhou 310058 Zhejiang China
- Xiaozhe Wan
- Advanced Computing and Storage Laboratory, Central Research Institute, 2012 Laboratories, Huawei Technologies Co., Ltd Nanjing 210000 Jiangsu China
- Meijing Fang
- College of Pharmaceutical Sciences, Zhejiang University Hangzhou 310058 Zhejiang China
- Odin Zhang
- College of Pharmaceutical Sciences, Zhejiang University Hangzhou 310058 Zhejiang China
- Qiaolin Gou
- College of Pharmaceutical Sciences, Zhejiang University Hangzhou 310058 Zhejiang China
- Qun Su
- College of Pharmaceutical Sciences, Zhejiang University Hangzhou 310058 Zhejiang China
- Chao Shen
- College of Pharmaceutical Sciences, Zhejiang University Hangzhou 310058 Zhejiang China
- Ziyi You
- College of Pharmaceutical Sciences, Zhejiang University Hangzhou 310058 Zhejiang China
- Liwei Liu
- Advanced Computing and Storage Laboratory, Central Research Institute, 2012 Laboratories, Huawei Technologies Co., Ltd Nanjing 210000 Jiangsu China
- Chang-Yu Hsieh
- College of Pharmaceutical Sciences, Zhejiang University Hangzhou 310058 Zhejiang China
- Tingjun Hou
- College of Pharmaceutical Sciences, Zhejiang University Hangzhou 310058 Zhejiang China
- Yu Kang
- College of Pharmaceutical Sciences, Zhejiang University Hangzhou 310058 Zhejiang China
36
Bai J, Ni Y, Zhang Y, Wan J, Liang L, Qiao H, Zhu Y, Zhao Q, Li H. AI-based Virtual Screening of Traditional Chinese Medicine and the Discovery of Novel Inhibitors of TCTP. Curr Comput Aided Drug Des 2025; 21:362-374. [PMID: 38310576 DOI: 10.2174/0115734099277605231218071503] [Received: 09/11/2023] [Revised: 10/16/2023] [Accepted: 10/23/2023] [Indexed: 02/06/2024]
Abstract
BACKGROUND Translationally controlled tumour protein (TCTP) is associated with tumor diseases, such as breast cancer, and its inhibitor can reduce the growth of tumor cells. Unfortunately, there is currently no effective medication available for treating TCTP-related breast cancer. OBJECTIVES The objective of this study was to explore the inhibitor candidates among natural compounds for the treatment of breast cancer related to TCTP protein. METHODS To explore the potential inhibitors of TCTP, we first screened out four potential inhibitors in the Traditional Chinese Medicine (TCM) for cancer based on AI virtual screening using the docking method, and then revealed the interaction mechanism of TCTP and four candidate inhibitors from TCM with molecular docking and molecular dynamics (MD) methods. RESULTS Based on the conformational characteristics and the MD properties of the four leading compounds, we designed the new skeleton molecules with the AI method using MolAICal software. Our MD simulations have revealed that different small molecules bind to different sites of TCTP, but the flexible regions and the signaling pathways are almost the same, and the VDW and hydrophobic interactions are crucial in the interactions between TCTP and ligands. CONCLUSION We have proposed the candidate inhibitor of TCTP. Our study has provided a potential new method for exploring inhibitors from Traditional Chinese Medicine (TCM).
Affiliation(s)
- Juxia Bai
- College of Mathematics and Physics, Shanghai University of Electric Power, Shanghai, 201306, China
- Yangyang Ni
- College of Mathematics and Physics, Shanghai University of Electric Power, Shanghai, 201306, China
- Yuqi Zhang
- College of Mathematics and Physics, Shanghai University of Electric Power, Shanghai, 201306, China
- Junfeng Wan
- College of Mathematics and Physics, Shanghai University of Electric Power, Shanghai, 201306, China
- Liqun Liang
- College of Mathematics and Physics, Shanghai University of Electric Power, Shanghai, 201306, China
- Haoran Qiao
- College of Mathematics and Physics, Shanghai University of Electric Power, Shanghai, 201306, China
- Yanyan Zhu
- College of Mathematics and Physics, Shanghai University of Electric Power, Shanghai, 201306, China
- Qingjie Zhao
- Shanghai Frontiers Science Center for TCM Chemical Biology, Innovation Research Institute of Traditional Chinese Medicine, Shanghai University of Traditional Chinese Medicine, 1200 Cailun Road, Shanghai, 201203, China
- Huiyu Li
- College of Mathematics and Physics, Shanghai University of Electric Power, Shanghai, 201306, China

37
Obeidat R, Alsmadi I, Baker QB, Al-Njadat A, Srinivasan S. Researching public health datasets in the era of deep learning: a systematic literature review. Health Informatics J 2025; 31:14604582241307839. [PMID: 39794941 DOI: 10.1177/14604582241307839] [Indexed: 01/13/2025]
Abstract
Objective: Explore deep learning applications in predictive analytics for public health data, identify challenges and trends, and then understand the current landscape. Materials and Methods: A systematic literature review was conducted in June 2023 to search articles on public health data in the context of deep learning, published from the inception of medical and computer science databases through June 2023. The review focused on diverse datasets, abstracting applications, challenges, and advancements in deep learning. Results: 2004 articles were reviewed, identifying 14 disease categories. Observed trends include explainable-AI, patient embedding learning, and integrating different data sources and employing deep learning models in health informatics. Noted challenges were technical reproducibility and handling sensitive data. Discussion: There has been a notable surge in deep learning applications on public health data publications since 2015. Consistent deep learning applications and models continue to be applied across public health data. Despite the wide applications, a standard approach still does not exist for addressing the outstanding challenges and issues in this field. Conclusion: Guidelines are needed for applying deep learning and models in public health data to improve FAIRness, efficiency, transparency, comparability, and interoperability of research. Interdisciplinary collaboration among data scientists, public health experts, and policymakers is needed to harness the full potential of deep learning.
Affiliation(s)
- Rand Obeidat
- Department of Management Information Systems, Bowie State University, Bowie, USA
- Izzat Alsmadi
- Department of Computational, Engineering and Mathematical Sciences, Texas A & M San Antonio, San Antonio, USA
- Qanita Bani Baker
- Department of Computer Science, Jordan University of Science and Technology, Irbid, Jordan
- Sriram Srinivasan
- Department of Management Information Systems, Bowie State University, Bowie, USA

38
Singh PK, Sachan K, Khandelwal V, Singh S, Singh S. Role of Artificial Intelligence in Drug Discovery to Revolutionize the Pharmaceutical Industry: Resources, Methods and Applications. Recent Pat Biotechnol 2025; 19:35-52. [PMID: 39840410 DOI: 10.2174/0118722083297406240313090140] [Received: 12/07/2023] [Revised: 02/22/2024] [Accepted: 02/28/2024] [Indexed: 01/23/2025]
Abstract
Traditional drug discovery methods such as wet-lab testing, validations, and synthetic techniques are time-consuming and expensive. Artificial Intelligence (AI) approaches have progressed to the point where they can have a significant impact on the drug discovery process. Using massive volumes of open data, artificial intelligence methods are revolutionizing the pharmaceutical industry. In the last few decades, many AI-based models have been developed and implemented in many areas of the drug development process. These models have been used as a supplement to conventional research to uncover superior pharmaceuticals expeditiously. AI's involvement in the pharmaceutical industry was used mostly for reverse engineering of existing patents and the invention of new synthesis pathways. Drug research and development to repurposing and productivity benefits in the pharmaceutical business through clinical trials. AI is studied in this article for its numerous potential uses. We have discussed how AI can be put to use in the pharmaceutical sector, specifically for predicting a drug's toxicity, bioactivity, and physicochemical characteristics, among other things. In this review article, we have discussed its application to a variety of problems, including de novo drug discovery, target structure prediction, interaction prediction, and binding affinity prediction. AI for predicting drug interactions and nanomedicines were also considered.
Affiliation(s)
- Pranjal Kumar Singh
- Department of Pharmacy, Kalka Institute for Research and Advanced Studies, Meerut, Uttar Pradesh, India
- Kapil Sachan
- KIET School of Pharmacy, KIET Group of Institutions, Ghaziabad, Uttar Pradesh, India
- Vishal Khandelwal
- Department of Biotechnology, GLA University, Mathura, Uttar Pradesh, India
- Sumita Singh
- Faculty of Pharmacy, Swami Vivekanand Subharti University, Meerut, Uttar Pradesh, India
- Smita Singh
- SRM Modinagar College of Pharmacy, SRM Institute of Science and Technology, Delhi NCR Campus, Modinagar, Ghaziabad, Uttar Pradesh, India

39
Attri M, Raghav A, Sinha J. Revolutionising Neurological Therapeutics: Investigating Drug Repurposing Strategies. CNS Neurol Disord Drug Targets 2025; 24:115-131. [PMID: 39323347 DOI: 10.2174/0118715273329531240911075309] [Received: 05/09/2024] [Revised: 07/08/2024] [Accepted: 07/15/2024] [Indexed: 09/27/2024]
Abstract
Repurposing drugs (DR) has become a viable approach to hasten the search for cures for neurodegenerative diseases (NDs). This review examines different off-target and on-target drug discovery techniques and how they might be used to find possible treatments for non-diagnostic depressions. Off-target strategies look at the known or unknown side effects of currently approved drugs for repositioning, whereas on-target strategies connect disease pathways to targets that can be treated with drugs. The review highlights the potential of experimental and computational methodologies, such as machine learning, proteomic techniques, network and genomics-based approaches, and in silico screening, in uncovering new drug-disease correlations. It also looks at difficulties and failed attempts at drug repurposing for NDs, highlighting the necessity of exact and standardised procedures to increase success rates. This review's objectives are to address the purpose of drug repurposing in human disorders, particularly neurological diseases, and to provide an overview of repurposing candidates that are presently undergoing clinical trials for neurological conditions, along with any possible causes and early findings. We then include a list of drug repurposing strategies, restrictions, and difficulties for upcoming research.
Affiliation(s)
- Meenakshi Attri
- School of Medical & Allied Sciences, K.R. Mangalam University, Gurugram, Haryana 122103, India
- Asha Raghav
- Department of Pharmaceutics, School of Health Sciences, Sushant University, Gurugram, Haryana 122003, India
- Jyoti Sinha
- Department of Pharmaceutics, School of Health Sciences, Sushant University, Gurugram, Haryana 122003, India

40
Shi H, Wang Z, Zhou L, Xu Z, Xie L, Kong R, Chang S. Status and Prospects of Research on Deep Learning-based De Novo Generation of Drug Molecules. Curr Comput Aided Drug Des 2025; 21:257-269. [PMID: 38321907 DOI: 10.2174/0115734099287389240126072433] [Received: 10/18/2023] [Revised: 01/10/2024] [Accepted: 01/18/2024] [Indexed: 02/08/2024]
Abstract
Traditional molecular de novo generation methods, such as evolutionary algorithms, generate new molecules mainly by linking existing atomic building blocks. The challenging issues in these methods include difficulty in synthesis, failure to achieve desired properties, and structural optimization requirements. Advances in deep learning offer new ideas for rational and robust de novo drug design. Deep learning, a branch of machine learning, is more efficient than traditional methods for processing problems, such as speech, image, and translation. This study provides a comprehensive overview of the current state of research in de novo drug design based on deep learning and identifies key areas for further development. Deep learning-based de novo drug design is pivotal in four key dimensions. Molecular databases form the basis for model training, while effective molecular representations impact model performance. Common DL models (GANs, RNNs, VAEs, CNNs, DMs) generate drug molecules with desired properties. The evaluation metrics guide research directions by determining the quality and applicability of generated molecules. This abstract highlights the foundational aspects of DL-based de novo drug design, offering a concise overview of its multifaceted contributions. Consequently, deep learning in de novo molecule generation has attracted more attention from academics and industry. As a result, many deep learning-based de novo molecule generation types have been actively proposed.
Affiliation(s)
- Huanghao Shi
- Institute of Bioinformatics and Medical Engineering, School of Electrical and Information Engineering, Jiangsu University of Technology, Changzhou, 213001, China
- Zhichao Wang
- Institute of Bioinformatics and Medical Engineering, School of Electrical and Information Engineering, Jiangsu University of Technology, Changzhou, 213001, China
- Litao Zhou
- Institute of Bioinformatics and Medical Engineering, School of Electrical and Information Engineering, Jiangsu University of Technology, Changzhou, 213001, China
- Zhiwang Xu
- Institute of Bioinformatics and Medical Engineering, School of Electrical and Information Engineering, Jiangsu University of Technology, Changzhou, 213001, China
- Liangxu Xie
- Institute of Bioinformatics and Medical Engineering, School of Electrical and Information Engineering, Jiangsu University of Technology, Changzhou, 213001, China
- Ren Kong
- Institute of Bioinformatics and Medical Engineering, School of Electrical and Information Engineering, Jiangsu University of Technology, Changzhou, 213001, China
- Shan Chang
- Institute of Bioinformatics and Medical Engineering, School of Electrical and Information Engineering, Jiangsu University of Technology, Changzhou, 213001, China

41
Li J, Zhang O, Sun K, Wang Y, Guan X, Bagni D, Haghighatlari M, Kearns FL, Parks C, Amaro RE, Head-Gordon T. Mining for Potent Inhibitors through Artificial Intelligence and Physics: A Unified Methodology for Ligand Based and Structure Based Drug Design. J Chem Inf Model 2024; 64:9082-9097. [PMID: 38843070 PMCID: PMC11683870 DOI: 10.1021/acs.jcim.4c00634] [Received: 04/13/2024] [Revised: 05/19/2024] [Accepted: 05/21/2024] [Indexed: 12/11/2024]
Abstract
Determining the viability of a new drug molecule is a time- and resource-intensive task that makes computer-aided assessments a vital approach to rapid drug discovery. Here we develop a machine learning algorithm, iMiner, that generates novel inhibitor molecules for target proteins by combining deep reinforcement learning with real-time 3D molecular docking using AutoDock Vina, thereby simultaneously creating chemical novelty while constraining molecules for shape and molecular compatibility with target active sites. Moreover, through the use of various types of reward functions, we have introduced novelty in generative tasks for new molecules such as chemical similarity to a target ligand, molecules grown from known protein bound fragments, and creation of molecules that enforce interactions with target residues in the protein active site. The iMiner algorithm is embedded in a composite workflow that filters out Pan-assay interference compounds, Lipinski rule violations, uncommon structures in medicinal chemistry, and poor synthetic accessibility with options for cross-validation against other docking scoring functions and automation of a molecular dynamics simulation to measure pose stability. We also allow users to define a set of rules for the structures they would like to exclude during the training process and postfiltering steps. Because our approach relies only on the structure of the target protein, iMiner can be easily adapted for the future development of other inhibitors or small molecule therapeutics of any target protein.
Affiliation(s)
- Jie Li
- Pitzer Center for Theoretical Chemistry, Department of Chemistry, University of California, Berkeley, California 94720, United States
- Oufan Zhang
- Pitzer Center for Theoretical Chemistry, Department of Chemistry, University of California, Berkeley, California 94720, United States
- Kunyang Sun
- Pitzer Center for Theoretical Chemistry, Department of Chemistry, University of California, Berkeley, California 94720, United States
- Yingze Wang
- Pitzer Center for Theoretical Chemistry, Department of Chemistry, University of California, Berkeley, California 94720, United States
- Xingyi Guan
- Pitzer Center for Theoretical Chemistry, Department of Chemistry, University of California, Berkeley, California 94720, United States
- Dorian Bagni
- Pitzer Center for Theoretical Chemistry, Department of Chemistry, University of California, Berkeley, California 94720, United States
- Mojtaba Haghighatlari
- Pitzer Center for Theoretical Chemistry, Department of Chemistry, University of California, Berkeley, California 94720, United States
- Fiona L. Kearns
- Department of Chemistry and Biochemistry, University of California, San Diego, La Jolla, California 92093, United States
- Conor Parks
- Department of Chemistry and Biochemistry, University of California, San Diego, La Jolla, California 92093, United States
- Rommie E. Amaro
- Department of Chemistry and Biochemistry, University of California, San Diego, La Jolla, California 92093, United States
- Teresa Head-Gordon
- Pitzer Center for Theoretical Chemistry, Department of Chemistry, University of California, Berkeley, California 94720, United States
- Departments of Bioengineering and Chemical and Biomolecular Engineering, University of California, Berkeley, California 94720, United States

42
Li X, Chen H, Yan J, Liu G, Li C, Zhou X, Wang Y, Wu Y, Yan B, Yan X. Balancing the Functionality and Biocompatibility of Materials with a Deep-Learning-Based Inverse Design Framework. Environment & Health (Washington, D.C.) 2024; 2:875-885. [PMID: 39722843 PMCID: PMC11667291 DOI: 10.1021/envhealth.4c00088] [Received: 05/05/2024] [Revised: 07/08/2024] [Accepted: 07/09/2024] [Indexed: 12/28/2024]
Abstract
The rational design of molecules with the desired functionality presents a significant challenge in chemistry. Moreover, it is worth noting that making chemicals safe and sustainable is crucial to bringing them to the market. To address this, we propose a novel deep learning framework developed explicitly for inverse design of molecules with both functionality and biocompatibility. This innovative approach comprises two predictive models and one generative model, facilitating the targeted screening of novel molecules from created virtual chemical space. Our method's versatility is highlighted in the inverse design process, where it successfully generates molecules with specified motifs or composition, discovers synthetically accessible molecules, and jointly targets functional and safe properties beyond the training regime. The utility of this method is demonstrated in its ability to design ionic liquids (ILs) with enhanced antibacterial properties and reduced cytotoxicity, addressing the issue of balancing functionality and biocompatibility in molecular design.
Affiliation(s)
- Xiaofang Li
- Institute of Environmental Research at Greater Bay Area, Key Laboratory for Water Quality and Conservation of the Pearl River Delta, Ministry of Education, Guangzhou University, Guangzhou 510006, China
- Hanle Chen
- Institute of Environmental Research at Greater Bay Area, Key Laboratory for Water Quality and Conservation of the Pearl River Delta, Ministry of Education, Guangzhou University, Guangzhou 510006, China
- Jiachen Yan
- Institute of Environmental Research at Greater Bay Area, Key Laboratory for Water Quality and Conservation of the Pearl River Delta, Ministry of Education, Guangzhou University, Guangzhou 510006, China
- Guohong Liu
- School of Health, Guangzhou Vocational University of Science and Technology, Guangzhou 510555, China
- Chengjun Li
- Institute of Environmental Research at Greater Bay Area, Key Laboratory for Water Quality and Conservation of the Pearl River Delta, Ministry of Education, Guangzhou University, Guangzhou 510006, China
- Xiaoxia Zhou
- Institute of Environmental Research at Greater Bay Area, Key Laboratory for Water Quality and Conservation of the Pearl River Delta, Ministry of Education, Guangzhou University, Guangzhou 510006, China
- Yan Wang
- College of Animal Science, South China Agricultural University, Guangzhou 510642, China
- Yinbao Wu
- College of Animal Science, South China Agricultural University, Guangzhou 510642, China
- Bing Yan
- Institute of Environmental Research at Greater Bay Area, Key Laboratory for Water Quality and Conservation of the Pearl River Delta, Ministry of Education, Guangzhou University, Guangzhou 510006, China
- Xiliang Yan
- Institute of Environmental Research at Greater Bay Area, Key Laboratory for Water Quality and Conservation of the Pearl River Delta, Ministry of Education, Guangzhou University, Guangzhou 510006, China
- College of Animal Science, South China Agricultural University, Guangzhou 510642, China

43
Morán-González L, Betten JE, Kneiding H, Balcells D. AABBA Graph Kernel: Atom-Atom, Bond-Bond, and Bond-Atom Autocorrelations for Machine Learning. J Chem Inf Model 2024; 64:8756-8769. [PMID: 39580812 PMCID: PMC11632777 DOI: 10.1021/acs.jcim.4c01583] [Received: 09/02/2024] [Revised: 11/03/2024] [Accepted: 11/15/2024] [Indexed: 11/26/2024]
Abstract
Graphs are one of the most natural and powerful representations available for molecules; natural because they have an intuitive correspondence to skeletal formulas, the language used by chemists worldwide, and powerful, because they are highly expressive both globally (molecular topology) and locally (atom and bond properties). Graph kernels are used to transform molecular graphs into fixed-length vectors, which, based on their capacity of measuring similarity, can be used as fingerprints for machine learning (ML). To date, graph kernels have mostly focused on the atomic nodes of the graph. In this work, we developed a graph kernel based on atom-atom, bond-bond, and bond-atom (AABBA) autocorrelations. The resulting vector representations were tested on regression ML tasks on a data set of transition metal complexes; a benchmark motivated by the higher complexity of these compounds relative to organic molecules. In particular, we tested different flavors of the AABBA kernel in the prediction of the energy barriers and bond distances of the Vaska's complex data set (Friederich et al., Chem. Sci., 2020, 11, 4584). For a variety of ML models, including neural networks, gradient boosting machines, and Gaussian processes, we showed that AABBA outperforms the baseline including only atom-atom autocorrelations. Dimensionality reduction studies also showed that the bond-bond and bond-atom autocorrelations yield many of the most relevant features. We believe that the AABBA graph kernel can accelerate the exploration of large chemical spaces and inspire novel molecular representations in which both atomic and bond properties play an important role.
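The autocorrelation idea summarized above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the published AABBA code: it uses the common product form in which the depth-d atom-atom term sums P_i * P_j over ordered atom pairs at topological distance d, computes the bond-bond term the same way on the line graph (two bonds are adjacent when they share an atom), and assumes that the distance from a bond to an atom is the smaller of the distances from the bond's two atoms. The function names and the example properties (nuclear charge per atom, bond order per bond) are hypothetical, and the paper's exact definitions, property sets, and normalization may differ.

```python
from collections import deque

# Sketch only: the bond-atom distance convention below is an assumption,
# not taken from the AABBA paper.


def topological_distances(adjacency):
    """All-pairs shortest-path lengths (in edges) via BFS; None marks unreachable pairs."""
    n = len(adjacency)
    dist = [[None] * n for _ in range(n)]
    for source in range(n):
        dist[source][source] = 0
        queue = deque([source])
        while queue:
            u = queue.popleft()
            for v in adjacency[u]:
                if dist[source][v] is None:
                    dist[source][v] = dist[source][u] + 1
                    queue.append(v)
    return dist


def atom_adjacency(n_atoms, bonds):
    """Neighbour lists of the molecular graph (atoms are nodes, bonds are edges)."""
    adjacency = [[] for _ in range(n_atoms)]
    for a, b in bonds:
        adjacency[a].append(b)
        adjacency[b].append(a)
    return adjacency


def bond_adjacency(bonds):
    """Neighbour lists of the line graph: two bonds are adjacent if they share an atom."""
    return [[k for k, other in enumerate(bonds) if k != idx and set(bond) & set(other)]
            for idx, bond in enumerate(bonds)]


def atom_atom_ac(atom_props, bonds, depth):
    """Sum of P_i * P_j over ordered atom pairs (i, j) at topological distance `depth`."""
    n = len(atom_props)
    dist = topological_distances(atom_adjacency(n, bonds))
    return sum(atom_props[i] * atom_props[j]
               for i in range(n) for j in range(n) if dist[i][j] == depth)


def bond_bond_ac(bond_props, bonds, depth):
    """The same product-sum, computed on the line graph with bond properties Q_b."""
    m = len(bonds)
    dist = topological_distances(bond_adjacency(bonds))
    return sum(bond_props[k] * bond_props[j]
               for k in range(m) for j in range(m) if dist[k][j] == depth)


def bond_atom_ac(atom_props, bond_props, bonds, depth):
    """Sum of Q_b * P_i over (bond, atom) pairs.

    ASSUMPTION: distance(bond (a1, a2), atom i) = min(d(a1, i), d(a2, i)).
    """
    dist = topological_distances(atom_adjacency(len(atom_props), bonds))
    total = 0
    for b, (a1, a2) in enumerate(bonds):
        for i in range(len(atom_props)):
            reachable = [d for d in (dist[a1][i], dist[a2][i]) if d is not None]
            if reachable and min(reachable) == depth:
                total += bond_props[b] * atom_props[i]
    return total


def aabba_vector(atom_props, bond_props, bonds, max_depth):
    """Fixed-length fingerprint: AA, BB and BA terms for depths 0..max_depth, concatenated."""
    depths = range(max_depth + 1)
    return ([atom_atom_ac(atom_props, bonds, d) for d in depths]
            + [bond_bond_ac(bond_props, bonds, d) for d in depths]
            + [bond_atom_ac(atom_props, bond_props, bonds, d) for d in depths])


# Hypothetical example: heavy-atom graph of ethanol (C-C-O), nuclear charge as the
# atom property and bond order as the bond property.
print(aabba_vector([6, 6, 8], [1, 1], [(0, 1), (1, 2)], max_depth=2))
# → [136, 168, 96, 2, 2, 0, 26, 14, 0]
```

Because every molecule yields the same number of terms for a fixed `max_depth`, the output can be fed directly to the regression models mentioned above (neural networks, gradient boosting, Gaussian processes) as a fixed-length feature vector.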
Affiliation(s)
- Lucía Morán-González
- Hylleraas Centre for Quantum Molecular Sciences, Department of Chemistry, University of Oslo, P.O. Box 1033, 0315 Oslo, Norway
- Centre for Materials Science and Nanotechnology, Department of Chemistry, University of Oslo, P.O. Box 1033, 0315 Oslo, Norway
- Jørn Eirik Betten
- Simula Research Laboratory, Kristian Augusts Gate 23, 0164 Oslo, Norway
- Hannes Kneiding
- Hylleraas Centre for Quantum Molecular Sciences, Department of Chemistry, University of Oslo, P.O. Box 1033, 0315 Oslo, Norway
- David Balcells
- Hylleraas Centre for Quantum Molecular Sciences, Department of Chemistry, University of Oslo, P.O. Box 1033, 0315 Oslo, Norway

44
Nahal Y, Menke J, Martinelli J, Heinonen M, Kabeshov M, Janet JP, Nittinger E, Engkvist O, Kaski S. Human-in-the-loop active learning for goal-oriented molecule generation. J Cheminform 2024; 16:138. [PMID: 39654043 PMCID: PMC11629536 DOI: 10.1186/s13321-024-00924-y] [Received: 08/07/2024] [Accepted: 11/02/2024] [Indexed: 12/12/2024]
Abstract
Machine learning (ML) systems have enabled the modelling of quantitative structure-property relationships (QSPR) and structure-activity relationships (QSAR) using existing experimental data to predict target properties for new molecules. These property predictors hold significant potential in accelerating drug discovery by guiding generative artificial intelligence (AI) agents to explore desired chemical spaces. However, they often struggle to generalize due to the limited scope of the training data. When optimized by generative agents, this limitation can result in the generation of molecules with artificially high predicted probabilities of satisfying target properties, which subsequently fail experimental validation. To address this challenge, we propose an adaptive approach that integrates active learning (AL) and iterative feedback to refine property predictors, thereby improving the outcomes of their optimization by generative AI agents. Our method leverages the Expected Predictive Information Gain (EPIG) criterion to select additional molecules for evaluation by an oracle. This process aims to provide the greatest reduction in predictive uncertainty, enabling more accurate model evaluations of subsequently generated molecules. Recognizing the impracticality of immediate wet-lab or physics-based experiments due to time and logistical constraints, we propose leveraging human experts for their cost-effectiveness and domain knowledge to effectively augment property predictors, bridging gaps in the limited training data. Empirical evaluations through both simulated and real human-in-the-loop experiments demonstrate that our approach refines property predictors to better align with oracle assessments. Additionally, we observe improved accuracy of predicted properties as well as improved drug-likeness among the top-ranking generated molecules. 
SCIENTIFIC CONTRIBUTION: We present an adaptable framework that integrates AL and human expertise to refine property predictors for goal-oriented molecule generation. This approach is robust to noise in human feedback and ensures that navigating chemical space with human-refined predictors leverages human insights to identify molecules that not only satisfy predicted property profiles but also score highly on oracle models. Additionally, it prioritizes practical characteristics such as drug-likeness, synthetic accessibility, and a favorable balance between exploring diverse chemical space and exploiting similarity to existing training data.
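The EPIG-based selection step described in this abstract can be illustrated with a minimal NumPy sketch. It assumes the property predictor is a binary classifier represented by K posterior samples or ensemble members, and that target inputs x* are drawn from the region of chemical space of interest (for example, molecules proposed by the generative agent). It follows the general definition EPIG(x) = E_{x*}[I(y; y* | x, x*)] and is not the authors' implementation; `epig_scores`, `select_for_oracle`, and the toy probabilities are hypothetical.

```python
import numpy as np


def epig_scores(probs_pool, probs_target, eps=1e-12):
    """Monte Carlo estimate of EPIG for every pool candidate, in nats.

    probs_pool:   (K, N_pool, C) class probabilities p_k(y | x) from K posterior
                  samples or ensemble members, one row per candidate molecule x.
    probs_target: (K, N_target, C) probabilities p_k(y* | x*) for sampled target inputs x*.
    """
    n_samples = probs_pool.shape[0]
    # Joint predictive p(y, y* | x, x*) = mean_k p_k(y | x) * p_k(y* | x*).
    joint = np.einsum("knc,kmd->nmcd", probs_pool, probs_target) / n_samples
    # Product of marginal predictives p(y | x) * p(y* | x*).
    marg_pool = probs_pool.mean(axis=0)
    marg_target = probs_target.mean(axis=0)
    independent = marg_pool[:, None, :, None] * marg_target[None, :, None, :]
    # Mutual information I(y; y* | x, x*) for every (candidate, target) pair;
    # eps only guards log(0) and contributes nothing where joint == 0.
    mutual_info = np.sum(
        joint * (np.log(joint + eps) - np.log(independent + eps)), axis=(2, 3)
    )
    # Average over target inputs: the expectation over x* in the EPIG definition.
    return mutual_info.mean(axis=1)


def select_for_oracle(probs_pool, probs_target, batch_size=1):
    """Indices of the candidates with the highest EPIG, to be labelled by the oracle."""
    scores = epig_scores(probs_pool, probs_target)
    return np.argsort(-scores)[:batch_size]


# Toy example: two ensemble members, binary "satisfies property / does not" label.
# Candidate 0: the members disagree in the same way as on the target input.
# Candidate 1: the members agree (both predict 0.5), so its label says nothing about y*.
probs_pool = np.array([
    [[1.0, 0.0], [0.5, 0.5]],  # member 1
    [[0.0, 1.0], [0.5, 0.5]],  # member 2
])
probs_target = np.array([
    [[1.0, 0.0]],  # member 1
    [[0.0, 1.0]],  # member 2
])
print(epig_scores(probs_pool, probs_target))        # approximately [ln 2, 0]
print(select_for_oracle(probs_pool, probs_target))  # [0]
```

The score is high only when learning a candidate's label would reduce uncertainty about predictions on the target inputs, which matches the abstract's stated aim of reducing predictive uncertainty for subsequently generated molecules. In a human-in-the-loop setting, the selected indices would be the molecules shown to the expert.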
Affiliation(s)
- Yasmine Nahal
- Department of Computer Science, Aalto University, 02150, Espoo, Finland
- Molecular AI, Discovery Sciences, R&D, AstraZeneca, 431 83, Mölndal, Sweden
- Janosch Menke
- Department of Computer Science and Engineering, Chalmers University of Technology, 412 96, Gothenburg, Sweden
- Julien Martinelli
- Inserm Bordeaux Population Health, Vaccine Research Institute, Université de Bordeaux, Inria Bordeaux Sud-ouest, 33405, Talence, France
- Markus Heinonen
- Department of Computer Science, Aalto University, 02150, Espoo, Finland
- Mikhail Kabeshov
- Molecular AI, Discovery Sciences, R&D, AstraZeneca, 431 83, Mölndal, Sweden
- Jon Paul Janet
- Molecular AI, Discovery Sciences, R&D, AstraZeneca, 431 83, Mölndal, Sweden
- Eva Nittinger
- Medicinal Chemistry, Research and Early Development, Respiratory and Immunology (R&I), R&D, AstraZeneca, 412 96, Gothenburg, Sweden
- Ola Engkvist
- Molecular AI, Discovery Sciences, R&D, AstraZeneca, 431 83, Mölndal, Sweden
- Department of Computer Science and Engineering, Chalmers University of Technology, 412 96, Gothenburg, Sweden
- Samuel Kaski
- Department of Computer Science, Aalto University, 02150, Espoo, Finland
- Department of Computer Science, University of Manchester, Manchester, M13 9PL, United Kingdom

45
Arisa OT, Beatson EL, Reno A, Chau CH, Aurigemma R, Steeg PS, Figg WD. Navigating the oncology drug discovery and development process with programmes supported by the National Institutes of Health. Lancet Oncol 2024; 25:e685-e693. [PMID: 39637905 DOI: 10.1016/s1470-2045(24)00348-6] [Received: 05/06/2024] [Revised: 06/13/2024] [Accepted: 06/14/2024] [Indexed: 12/07/2024]
Abstract
The translation of basic drug discoveries from laboratories to clinical use presents substantial challenges. Factors such as insufficient funding, misdirected project focus, and inability to understand a drug's limitations or strengths contribute to the difficulty of this process. To address these issues, the National Institutes of Health (NIH) has established various resources dedicated to streamlining drug development. The NIH offers access to regularly curated databases encompassing categories like drug discovery, target discovery, genomics, proteomics, and clinical datasets. The NIH also provides access to key resources through various programmes, such as the Developmental Therapeutics Program, focusing on preclinical drug discovery and the Cancer Therapy Evaluation Program, which oversees clinical trial efforts for investigational agents. These resources might include funding opportunities, access to a network of scientific experts, and services to address gaps in scientific work. This Review explores the diverse platforms and resources available at the NIH and outlines how researchers can leverage them to expedite the drug development process.
Affiliation(s)
- Oluwatobi T Arisa
- Clinical Pharmacology Program, Center for Cancer Research, National Cancer Institute, National Institutes of Health, Bethesda, MD, USA
- Erica L Beatson
- Molecular Pharmacology Section, Genitourinary Malignancies Branch, Center for Cancer Research, National Cancer Institute, National Institutes of Health, Bethesda, MD, USA
- Annieka Reno
- Clinical Pharmacology Program, Center for Cancer Research, National Cancer Institute, National Institutes of Health, Bethesda, MD, USA
- Cindy H Chau
- Molecular Pharmacology Section, Genitourinary Malignancies Branch, Center for Cancer Research, National Cancer Institute, National Institutes of Health, Bethesda, MD, USA
- Rosemarie Aurigemma
- Developmental Therapeutics Program, Division of Cancer Treatment and Diagnosis, National Cancer Institute, National Institutes of Health, Bethesda, MD, USA
- Patricia S Steeg
- Women Malignancies Branch, Center for Cancer Research, National Cancer Institute, National Institutes of Health, Bethesda, MD, USA
- Office of Translational Resources, Center for Cancer Research, National Cancer Institute, National Institutes of Health, Bethesda, MD, USA
- William D Figg
- Clinical Pharmacology Program, Center for Cancer Research, National Cancer Institute, National Institutes of Health, Bethesda, MD, USA
- Office of Translational Resources, Center for Cancer Research, National Cancer Institute, National Institutes of Health, Bethesda, MD, USA
- Molecular Pharmacology Section, Genitourinary Malignancies Branch, Center for Cancer Research, National Cancer Institute, National Institutes of Health, Bethesda, MD, USA
46
Gao J, Wang D. Quantifying the use and potential benefits of artificial intelligence in scientific research. Nat Hum Behav 2024; 8:2281-2292. [PMID: 39394445 DOI: 10.1038/s41562-024-02020-5] [Received: 04/26/2023] [Accepted: 09/12/2024] [Indexed: 10/13/2024]
Abstract
The rapid advancement of artificial intelligence (AI) is poised to reshape almost every line of work. Despite enormous efforts devoted to understanding AI's economic impacts, we lack a systematic understanding of the benefits to scientific research associated with the use of AI. Here we develop a measurement framework to estimate the direct use of AI and associated benefits in science. We find that the use and benefits of AI appear widespread throughout the sciences, growing especially rapidly since 2015. However, there is a substantial gap between AI education and its application in research, highlighting a misalignment between AI expertise supply and demand. Our analysis also reveals demographic disparities, with disciplines with higher proportions of women or Black scientists reaping fewer benefits from AI, potentially exacerbating existing inequalities in science. These findings have implications for the equity and sustainability of the research enterprise, especially as the integration of AI with science continues to deepen.
Affiliation(s)
- Jian Gao
- Center for Science of Science and Innovation, Northwestern University, Evanston, IL, USA
- Kellogg School of Management, Northwestern University, Evanston, IL, USA
- Ryan Institute on Complexity, Northwestern University, Evanston, IL, USA
- Faculty of Social Sciences, The University of Hong Kong, Hong Kong SAR, China
- Dashun Wang
- Center for Science of Science and Innovation, Northwestern University, Evanston, IL, USA
- Kellogg School of Management, Northwestern University, Evanston, IL, USA
- Ryan Institute on Complexity, Northwestern University, Evanston, IL, USA
- McCormick School of Engineering, Northwestern University, Evanston, IL, USA
47
Srivastava N, Verma S, Singh A, Shukla P, Singh Y, Oza AD, Kaur T, Chowdhury S, Kapoor M, Yadav AN. Advances in artificial intelligence-based technologies for increasing the quality of medical products. Daru 2024; 33:1. [PMID: 39613923 PMCID: PMC11607247 DOI: 10.1007/s40199-024-00548-5] [Received: 03/31/2024] [Accepted: 10/09/2024] [Indexed: 12/01/2024]
Abstract
Artificial intelligence (AI) is a technology that combines machine learning (ML) and deep learning. It has numerous uses in medicine and other sciences. AI can forecast the behavior of a drug's target protein and predict its desired physicochemical properties. AI's potential to enhance healthcare services offers previously unheard-of opportunities for cost savings and improved clinical and patient outcomes. Recent progress in biomedical research, encompassing fields such as genomics, computational medicine, AI, and learning algorithms, has created demand for novel technology, a new workforce, and new standards of practice, setting the stage for a revolution in healthcare. By connecting these health statistics with cutting-edge AI technologies, precise insights into patient treatment can be obtained. Moreover, AI can aid in the search for new drugs by predicting the target protein's two-dimensional structure. This review gives an overview of the latest AI-based technologies and how they may be employed to shorten product development time to market and to improve product quality, cost-effectiveness, and security throughout the manufacturing process.
Affiliation(s)
- Nidhi Srivastava
- Maharishi School of Pharmaceutical Sciences, Maharishi University of Information and Technology, Lucknow, Uttar Pradesh, India
- Sneha Verma
- Maharishi School of Science and Humanities, Maharishi University of Information and Technology, Lucknow, Uttar Pradesh, India
- Anupama Singh
- Maharishi School of Pharmaceutical Sciences, Maharishi University of Information and Technology, Lucknow, Uttar Pradesh, India
- Pranki Shukla
- Maharishi School of Pharmaceutical Sciences, Maharishi University of Information and Technology, Lucknow, Uttar Pradesh, India
- Yashvardhan Singh
- Maharishi School of Pharmaceutical Sciences, Maharishi University of Information and Technology, Lucknow, Uttar Pradesh, India
- Ankit D Oza
- Department of Mechanical Engineering, Parul Institute of Technology, Parul University, Vadodara, Gujarat, India
- Tanvir Kaur
- Department of Biotechnology, Graphic Era Deemed to be University, Dehradun, Uttarakhand, India
- Sohini Chowdhury
- Chitkara Centre for Research and Development, Chitkara University, Baddi, Himachal Pradesh, India
- Monit Kapoor
- Centre of Research Impact and Outcome, Chitkara University Institute of Engineering and Technology, Chitkara University, Rajpura, Punjab, India
- Ajar Nath Yadav
- Department of Genetics, Plant Breeding and Biotechnology, Dr. Khem Singh Gill Akal College of Agriculture, Eternal University, Baru Sahib, Sirmour, Himachal Pradesh, India
- University Centre for Research and Development, Chandigarh University, Mohali, Punjab, India
48
Flores-Hernandez H, Martinez-Ledesma E. A systematic review of deep learning chemical language models in recent era. J Cheminform 2024; 16:129. [PMID: 39558376 PMCID: PMC11571686 DOI: 10.1186/s13321-024-00916-y] [Received: 07/26/2024] [Accepted: 10/17/2024] [Indexed: 11/20/2024]
Abstract
Discovering new chemical compounds with specific properties can benefit fields that rely on materials for their development, although this task is costly in complexity and resources. Since the beginning of the data age, deep learning techniques have revolutionized molecular design by learning from representations of molecular data, greatly reducing the resources and time involved. Various deep learning approaches, built on a variety of architectures and strategies, have been developed to explore the extensive and discontinuous chemical space, helping to generate compounds with specific properties. In this study, we present a systematic review that offers a statistical description and comparison of the strategies used to generate molecules through deep learning, based on the metrics proposed in Molecular Sets (MOSES) or GuacaMol. The study included 48 articles retrieved from a query-based search of Scopus and Web of Science and 25 articles retrieved from citation search, yielding a total of 72 retrieved articles, of which 62 correspond to chemical language model approaches to molecule generation and the other 10 to molecular graph representations. Transformers, recurrent neural networks (RNNs), generative adversarial networks (GANs), Structured State Space Sequence (S4) models, and variational autoencoders (VAEs) are the main deep learning architectures used for molecule generation in the retrieved articles. In addition, transfer learning, reinforcement learning, and conditional learning are the most frequently employed techniques for biasing generation and exploring specific regions of chemical space. Finally, this analysis focuses on the central themes of molecular representation, databases, training dataset size, the validity-novelty trade-off, and the performance of unbiased and biased chemical language models. These themes were selected for a statistical analysis using graphical representations and statistical tests. The resulting analysis reveals the main challenges, advantages, and opportunities in the field of chemical language models over the past four years.
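The validity and novelty metrics behind the validity-novelty trade-off can be illustrated with a minimal sketch in the spirit of the MOSES definitions (validity, uniqueness, novelty). It assumes the SMILES strings are already canonicalised, and it uses a hypothetical `toy_is_valid` placeholder instead of a real chemistry parser (in practice a cheminformatics toolkit such as RDKit would do the parsing and canonicalisation). Uniqueness here is computed over all valid molecules rather than MOSES's unique@k.

```python
from typing import Callable, Dict, Iterable, Sequence


def generation_metrics(
    generated: Sequence[str],
    training: Iterable[str],
    is_valid: Callable[[str], bool],
) -> Dict[str, float]:
    """Validity, uniqueness and novelty in the spirit of MOSES.

    Assumes SMILES are already canonicalised, so identical molecules
    are identical strings.
    """
    if not generated:
        return {"validity": 0.0, "uniqueness": 0.0, "novelty": 0.0}
    valid = [s for s in generated if is_valid(s)]
    unique = set(valid)
    train = set(training)
    # Fraction of generated strings that parse as molecules.
    validity = len(valid) / len(generated)
    # Fraction of valid molecules that are distinct.
    uniqueness = len(unique) / len(valid) if valid else 0.0
    # Fraction of distinct valid molecules absent from the training set.
    novelty = len(unique - train) / len(unique) if unique else 0.0
    return {"validity": validity, "uniqueness": uniqueness, "novelty": novelty}


def toy_is_valid(smiles: str) -> bool:
    """Hypothetical stand-in check: non-empty with balanced parentheses.
    This is NOT a chemical validity test."""
    depth = 0
    for ch in smiles:
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
            if depth < 0:
                return False
    return bool(smiles) and depth == 0


metrics = generation_metrics(
    ["CCO", "CCO", "c1ccccc1", "CC(C", "CC(=O)O"],  # toy generated set
    ["CCO", "CCN"],                                  # toy training set
    toy_is_valid,
)
```

In this toy run, four of five strings pass the placeholder check, three of those four are distinct, and two of the three distinct molecules are absent from the training set. A generator tuned only for validity tends to reproduce training molecules, which is the tension the review's validity-novelty analysis examines.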
Affiliation(s)
- Hector Flores-Hernandez
- Tecnológico de Monterrey, School of Engineering and Sciences, Monterrey, 64710, Nuevo León, México
- Emmanuel Martinez-Ledesma
- Tecnológico de Monterrey, School of Medicine and Health Sciences, Monterrey, 64710, Nuevo León, México
- Institute for Obesity Research, Tecnológico de Monterrey, Monterrey, 64710, Nuevo León, México
49
Li Y, Ma F, Wang Z, Chen X. Transferable and Interpretable Prediction of Site-Specific Dehydrogenation Reaction Rate Constants with NMR Spectra. J Phys Chem Lett 2024; 15:11282-11290. [PMID: 39495481 DOI: 10.1021/acs.jpclett.4c02647] [Indexed: 11/05/2024]
Abstract
Accurate and efficient determination of site-specific reaction rate constants over a wide temperature range remains challenging, both experimentally and theoretically. Taking the dehydrogenation reaction as an example, our study addresses this issue by an innovative combination of machine learning techniques and cost-effective NMR spectra. Through descriptor screening, we identified a minimal set of NMR chemical shifts that can effectively determine reaction rate constants. The constructed model performs exceptionally well on theoretical data sets and demonstrates impressive generalization capabilities, extending from small molecules to larger ones. Furthermore, this model shows outstanding performance when applied to limited experimental data sets, highlighting its robust applicability and transferability. Utilizing the Sure Independence Screening and Sparsifying Operator (SISSO) algorithm, we also present an interpretable rate constant-temperature-NMR (k-T-NMR) relationship with a mathematical formula. This study reveals the great potential of combining machine learning with easily accessible spectroscopic descriptors in the study of reaction kinetics, enabling the rapid determination of reaction rate constants and promoting our understanding of reactivity.
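The descriptor-screening idea behind SISSO can be sketched in two steps: Sure Independence Screening (SIS) ranks descriptors by their absolute correlation with the target, and a sparsifying step searches small subsets of the survivors for the best linear fit. This is a simplified illustration under stated assumptions, not the authors' model. The real SISSO builds a very large space of candidate features by combining primary descriptors with mathematical operators and repeats screening on residuals, which this omits. The descriptor matrix and target below are hypothetical synthetic stand-ins for NMR chemical shifts and rate constants, and `sis_screen`/`sparse_fit` are illustrative names.

```python
import itertools

import numpy as np


def sis_screen(X: np.ndarray, y: np.ndarray, k: int) -> list:
    """Sure Independence Screening: keep the k descriptors whose absolute
    Pearson correlation with the target is largest."""
    Xc = X - X.mean(axis=0)
    yc = y - y.mean()
    denom = np.linalg.norm(Xc, axis=0) * np.linalg.norm(yc)
    # Constant columns have a zero numerator too, so guard the denominator.
    scores = np.abs(Xc.T @ yc) / np.where(denom == 0, 1.0, denom)
    order = np.argsort(scores)[::-1]  # descending correlation
    return [int(i) for i in order[:k]]


def sparse_fit(X: np.ndarray, y: np.ndarray, candidates: list, dim: int):
    """Simplified sparsifying step: try every `dim`-descriptor subset of the
    screened candidates and keep the least-squares fit with the lowest RMSE."""
    best = None
    for subset in itertools.combinations(candidates, dim):
        # Design matrix: selected descriptors plus an intercept column.
        A = np.column_stack([X[:, list(subset)], np.ones(len(y))])
        coef, *_ = np.linalg.lstsq(A, y, rcond=None)
        rmse = float(np.sqrt(np.mean((A @ coef - y) ** 2)))
        if best is None or rmse < best[0]:
            best = (rmse, subset, coef)
    return best


# Hypothetical synthetic data: 200 samples, 6 descriptors, and a target
# that depends only on descriptors 0 and 3.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 6))
y = 2.0 * X[:, 0] - 3.0 * X[:, 3] + 1.0

candidates = sis_screen(X, y, k=4)
rmse, subset, coef = sparse_fit(X, y, candidates, dim=2)
```

Screening first keeps the exhaustive subset search affordable: it runs over `k` candidates rather than the full descriptor set, which is what makes a compact, interpretable formula recoverable.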
Affiliation(s)
- Yanbo Li
- School of Chemistry and Materials Science, University of Science and Technology of China, Hefei 230026, China
- GuSu Laboratory of Materials, Suzhou 215123, China
- Fenfen Ma
- GuSu Laboratory of Materials, Suzhou 215123, China
- Zhandong Wang
- National Synchrotron Radiation Laboratory, University of Science and Technology of China, Hefei, Anhui 230029, China
- Xin Chen
- Suzhou Laboratory, Suzhou 215123, China
50
Zhao D, Zhou J, Tu S, Xu L. De Novo Drug Design by Multi-Objective Path Consistency Learning With Beam A* Search. IEEE/ACM Trans Comput Biol Bioinform 2024; 21:2459-2470. [PMID: 39383073 DOI: 10.1109/tcbb.2024.3477592] [Indexed: 10/11/2024]
Abstract
Generating high-quality, drug-like molecules from scratch within the expansive chemical space is a significant challenge in drug discovery. In prior research, value-based reinforcement learning algorithms have been employed to iteratively generate molecules with multiple desired properties. The immediate reward was defined as the evaluation of the intermediate-state molecule at each step, and the learning objective was to maximize the expected cumulative evaluation score of all molecules along the generative path. However, this reward definition was misleading, because the true optimization target is the evaluation score of the final generated molecule only. Furthermore, previous works introduced randomness into the decision-making process, which enabled the generation of diverse molecules but no longer pursued the maximum future reward. In this paper, the immediate reward is defined as the improvement achieved by each modification of the molecule, so that only the evaluation score of the final generated molecule is maximized. Borrowing from A* search, path consistency (PC), i.e., the requirement that values along one optimal path be identical, is used as the objective function when updating the value estimator to train a multi-objective de novo drug designer. By incorporating the value into the decision-making process of beam search, the DrugBA algorithm is proposed to enable large-scale generation of molecules that exhibit both high quality and diversity. Experimental results demonstrate a substantial improvement over the state-of-the-art algorithm QADD in multiple properties of the generated molecules.
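Three ideas in this abstract can be illustrated on a toy problem: improvement-based rewards telescope, so maximizing their sum is the same as maximizing the final molecule's score; path consistency asks that value estimates along one optimal path be identical; and beam search keeps several high-value candidates per step instead of injecting randomness. The sketch below is a hypothetical illustration, not DrugBA. "Molecules" are integer token tuples, `toy_score` is an invented property, and the value function passed to the beam search is simply that score, standing in for the learned value estimator that the paper trains with the PC objective.

```python
from typing import Callable, List, Tuple

State = Tuple[int, ...]  # hypothetical toy "molecule": a sequence of fragment tokens


def improvement_rewards(path: List[State], score: Callable[[State], float]) -> List[float]:
    """Immediate reward = score improvement produced by each modification.
    The rewards telescope, so their sum is score(final) - score(start)."""
    return [score(b) - score(a) for a, b in zip(path, path[1:])]


def path_consistency_loss(values: List[float]) -> float:
    """Mean squared gap between consecutive value estimates along one path;
    it is zero exactly when all values on the path are identical."""
    if len(values) < 2:
        return 0.0
    gaps = [(a - b) ** 2 for a, b in zip(values, values[1:])]
    return sum(gaps) / len(gaps)


def beam_search(start: State, expand: Callable[[State], List[State]],
                value: Callable[[State], float], beam_width: int,
                steps: int) -> List[State]:
    """Keep the `beam_width` highest-valued states at every step."""
    beam = [start]
    for _ in range(steps):
        children = [child for state in beam for child in expand(state)]
        if not children:
            break
        children.sort(key=value, reverse=True)  # best first; stable on ties
        beam = children[:beam_width]
    return beam


def toy_score(s: State) -> float:
    """Hypothetical property: reward large tokens, penalise adjacent repeats."""
    repeats = sum(1 for a, b in zip(s, s[1:]) if a == b)
    return float(sum(s) - 3 * repeats)


def toy_expand(s: State) -> List[State]:
    """Hypothetical modification step: append one of three tokens."""
    return [s + (t,) for t in (0, 1, 2)]


best = beam_search((), toy_expand, toy_score, beam_width=2, steps=3)[0]
```

Here the search returns `(2, 1, 2)`, and the improvement rewards along the path `(), (2,), (2, 1), (2, 1, 2)` sum to that molecule's final score. Because intermediate molecules earn nothing beyond their contribution to the end result, this is the property the abstract's reward redefinition relies on.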