1
Thaingtamtanha T, Ravichandran R, Gentile F. On the application of artificial intelligence in virtual screening. Expert Opin Drug Discov 2025:1-13. [PMID: 40388244 DOI: 10.1080/17460441.2025.2508866] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/17/2025] [Revised: 04/22/2025] [Accepted: 05/16/2025] [Indexed: 05/21/2025]
Abstract
INTRODUCTION Artificial intelligence (AI) has emerged as a transformative tool in drug discovery, particularly in virtual screening (VS), a crucial initial step in identifying potential drug candidates. This article highlights the significance of AI in revolutionizing both ligand-based virtual screening (LBVS) and structure-based virtual screening (SBVS) approaches, streamlining and enhancing the drug discovery process.
AREAS COVERED The authors provide an overview of AI applications in drug discovery, with a focus on LBVS and SBVS approaches utilized in prospective cases where new bioactive molecules were identified and experimentally validated. Discussion includes the use of AI in quantitative structure-activity relationship (QSAR) modeling for LBVS, as well as its role in enhancing SBVS techniques such as molecular docking and molecular dynamics simulations. The article is based on literature searches on studies published up to March 2025.
EXPERT OPINION AI is rapidly transforming VS in drug discovery, by leveraging increasing amounts of experimental data and expanding its scalability. These innovations promise to enhance efficiency and precision across both LBVS and SBVS approaches, yet challenges such as data curation, rigorous and prospective validation of new models, and efficient integration with experimental methods remain critical for realizing AI's full potential in drug discovery.
Affiliation(s)
- Thanawat Thaingtamtanha
  - Department of Chemistry and Biomolecular Sciences, University of Ottawa, Ottawa, Ontario, Canada
- Rahul Ravichandran
  - Department of Chemistry and Biomolecular Sciences, University of Ottawa, Ottawa, Ontario, Canada
- Francesco Gentile
  - Department of Chemistry and Biomolecular Sciences, University of Ottawa, Ottawa, Ontario, Canada
  - Ottawa Institute of Systems Biology, Ottawa, Ontario, Canada
2
Li J, Gong X. Harnessing pre-trained models for accurate prediction of protein-ligand binding affinity. BMC Bioinformatics 2025; 26:55. [PMID: 39962390 PMCID: PMC11834573 DOI: 10.1186/s12859-025-06064-w] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/03/2024] [Accepted: 01/22/2025] [Indexed: 02/20/2025]
Abstract
BACKGROUND The binding between proteins and ligands plays a crucial role in the field of drug discovery. However, this area currently faces numerous challenges. On one hand, existing methods are constrained by the limited availability of labeled data, often performing inadequately when addressing complex protein-ligand interactions. On the other hand, many models struggle to effectively capture the flexible variations and relative spatial relationships between proteins and ligands. These issues not only significantly hinder the advancement of protein-ligand binding research but also adversely affect the accuracy and efficiency of drug discovery. Therefore, in response to these challenges, our study aims to enhance predictive capabilities through innovative approaches, providing more reliable support for drug discovery efforts.
METHODS This study leverages a pre-trained model with spatial awareness to enhance the prediction of protein-ligand binding affinity. By perturbing the structures of small molecules in a manner consistent with physical constraints and employing self-supervised tasks, we improve the representation of small molecule structures, allowing for better adaptation to affinity predictions. Meanwhile, our approach enables the identification of potential binding sites on proteins.
RESULTS Our model demonstrates a significantly higher correlation coefficient in binding affinity predictions. Extensive evaluation on the PDBBind v2019 refined set, CASF, and Merck FEP benchmarks confirms the model's robustness and strong generalization across diverse datasets. Additionally, the model achieves over 95% in classification ROC for binding site identification, underscoring its high accuracy in pinpointing protein-ligand interaction regions.
CONCLUSION This research presents a novel approach that not only enhances the accuracy of binding affinity predictions but also facilitates the identification of binding sites, showcasing the potential of pre-trained models in computational drug design. Data and code are available at https://github.com/MIALAB-RUC/SableBind.
Affiliation(s)
- Jiashan Li
  - Institute for Mathematical Sciences, School of Mathematics, Renmin University of China, 59 Zhongguancun Street, Beijing, 100872, China
- Xinqi Gong
  - Institute for Mathematical Sciences, School of Mathematics, Renmin University of China, 59 Zhongguancun Street, Beijing, 100872, China
3
Hu Y, Yang H, Li M, Zhong Z, Zhou Y, Bai F, Wang Q. Exploring Protein Conformational Changes Using a Large-Scale Biophysical Sampling Augmented Deep Learning Strategy. Advanced Science (Weinheim, Baden-Wurttemberg, Germany) 2024; 11:e2400884. [PMID: 39387316 PMCID: PMC11600214 DOI: 10.1002/advs.202400884] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 01/24/2024] [Revised: 07/22/2024] [Indexed: 10/15/2024]
Abstract
Inspired by the success of deep learning in predicting static protein structures, researchers are now actively exploring other deep learning algorithms aimed at predicting the conformational changes of proteins. Currently, a major challenge in the development of such models lies in the limited training data characterizing different conformational transitions. To address this issue, molecular dynamics simulations are combined with enhanced sampling methods to create a large-scale database. To this end, the study simulates the conformational changes of 2635 proteins featuring two known stable states, and collects the structural information along each transition pathway. Utilizing this database, a general deep learning model capable of predicting the transition pathway for a given protein is developed. The model exhibits general robustness across proteins with varying sequence lengths (ranging from 44 to 704 amino acids) and accommodates different types of conformational changes. Strong agreement is shown between predictions and experimental data in several systems, and the model is successfully applied to identify a novel allosteric regulation in an important biological system, the human β-cardiac myosin. These results demonstrate the effectiveness of the model in revealing the nature of protein conformational changes.
Affiliation(s)
- Yao Hu
  - Department of Physics, University of Science and Technology of China, Hefei, Anhui 230026, China
- Hao Yang
  - Shanghai Institute for Advanced Immunochemical Studies and School of Life Science and Technology, ShanghaiTech University, 393 Middle Huaxia Road, Shanghai 201210, China
- Mingwei Li
  - Department of Physics, University of Science and Technology of China, Hefei, Anhui 230026, China
- Zhicheng Zhong
  - Department of Physics, University of Science and Technology of China, Hefei, Anhui 230026, China
- Yongqi Zhou
  - Shanghai Institute for Advanced Immunochemical Studies and School of Life Science and Technology, ShanghaiTech University, 393 Middle Huaxia Road, Shanghai 201210, China
- Fang Bai
  - Shanghai Institute for Advanced Immunochemical Studies and School of Life Science and Technology, ShanghaiTech University, 393 Middle Huaxia Road, Shanghai 201210, China
  - School of Information Science and Technology, ShanghaiTech University, 393 Middle Huaxia Road, Shanghai 201210, China
  - Shanghai Clinical Research and Trial Center, Shanghai 201210, China
- Qian Wang
  - Department of Physics, University of Science and Technology of China, Hefei, Anhui 230026, China
4
Min Y, Wei Y, Wang P, Wang X, Li H, Wu N, Bauer S, Zheng S, Shi Y, Wang Y, Wu J, Zhao D, Zeng J. From Static to Dynamic Structures: Improving Binding Affinity Prediction with Graph-Based Deep Learning. Advanced Science (Weinheim, Baden-Wurttemberg, Germany) 2024; 11:e2405404. [PMID: 39206846 PMCID: PMC11516055 DOI: 10.1002/advs.202405404] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/17/2024] [Revised: 07/29/2024] [Indexed: 09/04/2024]
Abstract
Accurate prediction of protein-ligand binding affinities is an essential challenge in structure-based drug design. Despite recent advances in data-driven methods for affinity prediction, their accuracy is still limited, partially because they only take advantage of static crystal structures while the actual binding affinities are generally determined by the thermodynamic ensembles between proteins and ligands. One effective way to approximate such a thermodynamic ensemble is to use molecular dynamics (MD) simulation. Here, an MD dataset containing 3,218 different protein-ligand complexes is curated, and Dynaformer, a graph-based deep learning model, is further developed to predict the binding affinities by learning the geometric characteristics of the protein-ligand interactions from the MD trajectories. In silico experiments demonstrated that the model exhibits state-of-the-art scoring and ranking power on the CASF-2016 benchmark dataset, outperforming the methods hitherto reported. Moreover, in a virtual screening on heat shock protein 90 (HSP90) using Dynaformer, 20 candidates are identified and their binding affinities are further experimentally validated. Dynaformer displayed promising results in virtual drug screening, revealing 12 hit compounds (two are in the submicromolar range), including several novel scaffolds. Overall, these results demonstrated that the approach offers a promising avenue for accelerating the early drug discovery process.
Affiliation(s)
- Yaosen Min
  - Institute for Interdisciplinary Information Sciences, Tsinghua University, Beijing 100084, China
- Ye Wei
  - Institute for Interdisciplinary Information Sciences, Tsinghua University, Beijing 100084, China
- Peizhuo Wang
  - Institute for Interdisciplinary Information Sciences, Tsinghua University, Beijing 100084, China
  - School of Life Science and Technology, Xidian University, Xi'an 710071, Shaanxi, China
- Xiaoting Wang
  - School of Medicine, Tsinghua University, Beijing 100084, China
- Han Li
  - Institute for Interdisciplinary Information Sciences, Tsinghua University, Beijing 100084, China
- Nian Wu
  - Institute for Interdisciplinary Information Sciences, Tsinghua University, Beijing 100084, China
- Stefan Bauer
  - Department of Intelligent Systems, KTH, Stockholm 10044, Sweden
- Yu Shi
  - Microsoft Research Asia, Beijing 100080, China
- Yingheng Wang
  - Department of Electrical Engineering, Tsinghua University, Beijing 100084, China
- Ji Wu
  - Department of Electrical Engineering, Tsinghua University, Beijing 100084, China
- Dan Zhao
  - Institute for Interdisciplinary Information Sciences, Tsinghua University, Beijing 100084, China
- Jianyang Zeng
  - School of Engineering, Westlake University, Hangzhou 310030, China
  - Research Center for Industries of the Future, Westlake University, Hangzhou 310030, China
  - Present address: Westlake Laboratory of Life Sciences and Biomedicine, Westlake University, Hangzhou 310024, China
5
Wang H, Chen B, Sun H, Zhang Y. Carbon-based molecular properties efficiently predicted by deep learning-based quantum chemical simulation with large language models. Comput Biol Med 2024; 176:108531. [PMID: 38728991 DOI: 10.1016/j.compbiomed.2024.108531] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/11/2024] [Revised: 04/21/2024] [Accepted: 04/28/2024] [Indexed: 05/12/2024]
Abstract
The prediction of thermodynamic properties of carbon-based molecules based on their geometrical conformation using fluctuation and density functional theories has achieved great success in the field of energy chemistry, while the excessive computational cost provides both opportunities and challenges for the integration of machine learning. In this work, a deep learning-based quantum chemical prediction model was constructed for efficient prediction of thermodynamic properties of carbon-based molecules. We constructed a novel framework - encoding the 3D information into a large language model (LLM), which in turn generates a 2D SMILES string, while embedding a learnable encoding designed to preserve the integrity of the original 3D information, providing better structural information for the model. Additionally, we have designed an equivariant learning module to encompass representations of conformations and feature learning for conformational sampling. This framework aims to predict thermodynamic properties more accurately than learning from 2D topology alone, while providing faster computational speeds than conventional simulations. By combining machine learning and quantum chemistry, we pioneer efficient practical applications in the field of energy chemistry. Our model advances the integration of data-driven and physics-based modeling to unlock novel insights into carbon-based molecules.
Affiliation(s)
- Haoyu Wang
  - University of Shanghai for Science and Technology, Shanghai, China; School of Electronic and Electrical Engineering, Shanghai University of Engineering Science, Shanghai, China
- Bin Chen
  - University of Shanghai for Science and Technology, Shanghai, China; School of Mechanical Engineering, Shanghai Jiao Tong University, Shanghai, China
- Hangling Sun
  - Hengtu Imalligent Technology (Shanghai) Co., Ltd., Shanghai, China
- Yuxuan Zhang
  - University of Shanghai for Science and Technology, Shanghai, China
6
Visani GM, Pun MN, Angaji A, Nourmohammad A. Holographic-(V)AE: An end-to-end SO(3)-equivariant (variational) autoencoder in Fourier space. Physical Review Research 2024; 6:023006. [PMID: 39711614 PMCID: PMC11661850 DOI: 10.1103/physrevresearch.6.023006] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 12/24/2024]
Abstract
Group-equivariant neural networks have emerged as an efficient approach to model complex data, using generalized convolutions that respect the relevant symmetries of a system. These techniques have made advances in both the supervised learning tasks for classification and regression, and the unsupervised tasks to generate new data. However, little work has been done in leveraging the symmetry-aware expressive representations that could be extracted from these approaches. Here, we present holographic-(variational) autoencoder [H-(V)AE], a fully end-to-end SO(3)-equivariant (variational) autoencoder in Fourier space, suitable for unsupervised learning and generation of data distributed around a specified origin in 3D. H-(V)AE is trained to reconstruct the spherical Fourier encoding of data, learning in the process a low-dimensional representation of the data (i.e., a latent space) with a maximally informative rotationally invariant embedding alongside an equivariant frame describing the orientation of the data. We extensively test the performance of H-(V)AE on diverse datasets. We show that the learned latent space efficiently encodes the categorical features of spherical images. Moreover, the low-dimensional representations learned by H-VAE can be used for downstream data-scarce tasks. Specifically, we show that H-(V)AE's latent space can be used to extract compact embeddings for protein structure microenvironments, and when paired with a random forest regressor, it enables state-of-the-art predictions of protein-ligand binding affinity.
Affiliation(s)
- Gian Marco Visani
  - Paul G. Allen School of Computer Science and Engineering, University of Washington, 85 E Stevens Way NE, Seattle, Washington 98195, USA
- Michael N. Pun
  - Department of Physics, University of Washington, 3910 15th Avenue Northeast, Seattle, Washington 98195, USA
- Arman Angaji
  - Institute for Biological Physics, University of Cologne, Zülpicher Str. 77, 50937 Cologne, Germany
- Armita Nourmohammad
  - Department of Physics, University of Washington, 3910 15th Avenue Northeast, Seattle, Washington 98195, USA
  - Paul G. Allen School of Computer Science and Engineering, University of Washington, 85 E Stevens Way NE, Seattle, Washington 98195, USA
  - Department of Applied Mathematics, University of Washington, 4182 W Stevens Way NE, Seattle, Washington 98105, USA; and Fred Hutchinson Cancer Center, 1241 Eastlake Ave E, Seattle, Washington 98102, USA
7
Wang H. Prediction of protein-ligand binding affinity via deep learning models. Brief Bioinform 2024; 25:bbae081. [PMID: 38446737 PMCID: PMC10939342 DOI: 10.1093/bib/bbae081] [Citation(s) in RCA: 13] [Impact Index Per Article: 13.0] [Received: 11/27/2023] [Revised: 01/31/2024] [Indexed: 03/08/2024]
Abstract
Accurately predicting the binding affinity between proteins and ligands is crucial in drug screening and optimization, but it is still a challenge in computer-aided drug design. The recent success of AlphaFold2 in predicting protein structures has brought new hope for deep learning (DL) models to accurately predict protein-ligand binding affinity. However, current DL models still face limitations due to low-quality databases, inaccurate input representations and inappropriate model architectures. In this work, we review the computational methods, specifically DL-based models, used to predict protein-ligand binding affinity. We start with a brief introduction to protein-ligand binding affinity and the traditional computational methods used to calculate it. We then introduce the basic principles of DL models for predicting protein-ligand binding affinity. Next, we review the commonly used databases, input representations and DL models in this field. Finally, we discuss the potential challenges and future work in accurately predicting protein-ligand binding affinity via DL models.
Affiliation(s)
- Huiwen Wang
  - School of Physics and Engineering, Henan University of Science and Technology, Luoyang 471023, China
8
Gu S, Liu H, Liu L, Hou T, Kang Y. Artificial intelligence methods in kinase target profiling: Advances and challenges. Drug Discov Today 2023; 28:103796. [PMID: 37805065 DOI: 10.1016/j.drudis.2023.103796] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 07/12/2023] [Revised: 09/29/2023] [Accepted: 10/03/2023] [Indexed: 10/09/2023]
Abstract
Kinases have a crucial role in regulating almost the full range of cellular processes, making them essential targets for therapeutic interventions against various diseases. Accurate kinase-profiling prediction is vital for addressing the selectivity/specificity challenges in kinase drug discovery, which is closely related to lead optimization, drug repurposing, and the understanding of potential drug side effects. In this review, we provide an overview of the latest advancements in machine learning (ML)-based and deep learning (DL)-based quantitative structure-activity relationship (QSAR) models for kinase profiling. We highlight current trends in this rapidly evolving field and discuss the existing challenges and future directions regarding experimental data set construction and model architecture design. Our aim is to offer practical insights and guidance for the development and utilization of these approaches.
Affiliation(s)
- Shukai Gu
  - College of Pharmaceutical Sciences, Zhejiang University, Hangzhou 310058, Zhejiang, China
- Huanxiang Liu
  - Faculty of Applied Science, Macao Polytechnic University, Macao 999078
- Liwei Liu
  - Advanced Computing and Storage Laboratory, Central Research Institute, 2012 Laboratories, Huawei Technologies Co. Ltd, Nanjing 210000, Jiangsu, China
- Tingjun Hou
  - College of Pharmaceutical Sciences, Zhejiang University, Hangzhou 310058, Zhejiang, China
- Yu Kang
  - College of Pharmaceutical Sciences, Zhejiang University, Hangzhou 310058, Zhejiang, China
9
Yue Y, Li S, Wang L, Liu H, Tong HHY, He S. MpbPPI: a multi-task pre-training-based equivariant approach for the prediction of the effect of amino acid mutations on protein-protein interactions. Brief Bioinform 2023; 24:bbad310. [PMID: 37651610 PMCID: PMC10516393 DOI: 10.1093/bib/bbad310] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Received: 05/27/2023] [Revised: 07/12/2023] [Accepted: 08/04/2023] [Indexed: 09/02/2023]
Abstract
The accurate prediction of the effect of amino acid mutations on protein-protein interactions (PPI $\Delta \Delta G$) is a crucial task in protein engineering, as it provides insight into the relevant biological processes underpinning protein binding and provides a basis for further drug discovery. In this study, we propose MpbPPI, a novel multi-task pre-training-based geometric equivariance-preserving framework to predict PPI $\Delta \Delta G$. Pre-training on a strictly screened pre-training dataset is employed to address the scarcity of protein-protein complex structures annotated with PPI $\Delta \Delta G$ values. MpbPPI employs a multi-task pre-training technique, forcing the framework to learn comprehensive backbone and side chain geometric regulations of protein-protein complexes at different scales. After pre-training, MpbPPI can generate high-quality representations capturing the effective geometric characteristics of labeled protein-protein complexes for downstream $\Delta \Delta G$ predictions. MpbPPI serves as a scalable framework supporting different sources of mutant-type (MT) protein-protein complexes for flexible application. Experimental results on four benchmark datasets demonstrate that MpbPPI is a state-of-the-art framework for PPI $\Delta \Delta G$ predictions. The data and source code are available at https://github.com/arantir123/MpbPPI.
Affiliation(s)
- Yang Yue
  - School of Computer Science from the University of Birmingham, UK
- Shu Li
  - Centre for Artificial Intelligence Driven Drug Discovery at Macao Polytechnic University
- Lingling Wang
  - Centre for Artificial Intelligence Driven Drug Discovery at Macao Polytechnic University
- Huanxiang Liu
  - Centre for Artificial Intelligence Driven Drug Discovery at Macao Polytechnic University
- Henry H Y Tong
  - Centre for Artificial Intelligence Driven Drug Discovery at Macao Polytechnic University
- Shan He
  - School of Computer Science, the University of Birmingham, Edgbaston, Birmingham, B15 2TT, UK
10
Wu F, Courty N, Jin S, Li SZ. Improving molecular representation learning with metric learning-enhanced optimal transport. Patterns (New York, N.Y.) 2023; 4:100714. [PMID: 37123438 PMCID: PMC10140620 DOI: 10.1016/j.patter.2023.100714] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/30/2022] [Revised: 12/29/2022] [Accepted: 03/01/2023] [Indexed: 05/02/2023]
Abstract
Training data are usually limited or heterogeneous in many chemical and biological applications. Existing machine learning models for chemistry and materials science fail to consider generalizing beyond training domains. In this article, we develop a novel optimal transport-based algorithm termed MROT to enhance their generalization capability for molecular regression problems. MROT learns a continuous label of the data by measuring a new metric of domain distances and a posterior variance regularization over the transport plan to bridge the chemical domain gap. Among downstream tasks, we consider basic chemical regression tasks in unsupervised and semi-supervised settings, including chemical property prediction and materials adsorption selection. Extensive experiments show that MROT significantly outperforms state-of-the-art models, showing promising potential in accelerating the discovery of new substances with desired properties.
Affiliation(s)
- Fang Wu
  - School of Engineering, Westlake University, Hangzhou 310024, China
  - Institute of AI Industry Research, Tsinghua University, Beijing 100084, China
- Nicolas Courty
  - French National Centre for Scientific Research, Southern Brittany University, Lorient, France
- Shuting Jin
  - School of Informatics, Xiamen University, Xiamen 361005, China
  - National Institute for Data Science in Health and Medicine, Xiamen University, Xiamen 361005, China
- Stan Z. Li
  - School of Engineering, Westlake University, Hangzhou 310024, China