1
Jusoh AS, Remli MA, Mohamad MS, Cazenave T, Fong CS. How generative Artificial Intelligence can transform drug discovery? Eur J Med Chem 2025; 295:117825. [PMID: 40456205 DOI: 10.1016/j.ejmech.2025.117825] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/24/2025] [Revised: 05/06/2025] [Accepted: 05/26/2025] [Indexed: 06/11/2025]
Abstract
Generative Artificial Intelligence (Generative AI) is transforming drug discovery by enabling advanced analysis of complex biological and chemical data. This review explores key Generative AI models, including Generative Adversarial Networks (GANs), Variational Autoencoders (VAEs), flow-based models and Transformer-based models, with Transformers gaining prominence due to the abundance of text-based biological data and the success of language models like ChatGPT. The paper discusses molecular representations, performance evaluation metrics, and current trends in Generative AI-driven drug discovery, such as protein-protein interactions (PPIs), drug-target interactions (DTIs) and de-novo drug design. However, these approaches face significant challenges, including applicability domain issues, lack of interpretability, data scarcity, novelty, scalability, computational resource limitations, and the absence of standardized evaluation metrics. These challenges hinder model performance, complicate decision-making, and limit the generation of novel and viable drug candidates. To address these issues, strategies such as hybrid models, integration of multiomics datasets, explainable AI (XAI) techniques, data augmentation, transfer learning, and cloud-based solutions are proposed. Additionally, a curated list of databases supporting drug discovery research is provided. The review concludes by emphasizing the need for optimized AI models, robust validation methods, interdisciplinary collaboration, and future academic efforts to fully realize the potential of Generative AI in advancing drug discovery.
Affiliation(s)
- Ainin Sofia Jusoh
  - Institute for Artificial Intelligence and Big Data, Universiti Malaysia Kelantan, Kota Bharu, 16100, Kelantan, Malaysia; Faculty of Data Science and Computing, Universiti Malaysia Kelantan, Kota Bharu, 16100, Kelantan, Malaysia.
- Muhammad Akmal Remli
  - Institute for Artificial Intelligence and Big Data, Universiti Malaysia Kelantan, Kota Bharu, 16100, Kelantan, Malaysia; Faculty of Data Science and Computing, Universiti Malaysia Kelantan, Kota Bharu, 16100, Kelantan, Malaysia.
- Mohd Saberi Mohamad
  - Health Data Science Lab, Department of Genetics and Genomics, College of Medical and Health Sciences, United Arab Emirates University, Al Ain, 15551, United Arab Emirates; Faculty of Engineering and Technology, Multimedia University, 75450, Melaka, Malaysia; Department of Biosystems Engineering, Faculty of Agricultural Technology, Universitas Brawijaya, 65145, Malang, East Java, Indonesia.
- Chin Siok Fong
  - UKM Medical Molecular Biology Institute (UMBI), 56000, Kuala Lumpur, Malaysia.
2
Li G, Yuan Y, Zhang R. Predicting Protein-Ligand Binding Affinity Using Fusion Model of Spatial-Temporal Graph Neural Network and 3D Structure-Based Complex Graph. Interdiscip Sci 2025; 17:257-276. [PMID: 39541085 DOI: 10.1007/s12539-024-00644-9] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/17/2024] [Revised: 07/09/2024] [Accepted: 07/16/2024] [Indexed: 11/16/2024]
Abstract
The investigation of molecular interactions between ligands and their target molecules is becoming more significant as protein structure data continues to develop. In this study, we introduce PLA-STGCNnet, a deep fusion spatial-temporal graph neural network designed to study protein-ligand interactions based on the 3D structural data of protein-ligand complexes. Unlike 1D protein sequences or 2D ligand graphs, the 3D graph representation offers a more precise portrayal of the complex interactions between proteins and ligands. Research studies have shown that our fusion model, PLA-STGCNnet, outperforms individual algorithms in accurately predicting binding affinity. The advantage of a fusion model is the ability to fully combine the advantages of multiple different models and improve overall performance by combining their features and outputs. Our fusion model shows satisfactory performance on different data sets, which proves its generalization ability and stability. The fusion-based model showed good performance in protein-ligand affinity prediction, and we successfully applied the model to drug screening. Our research underscores the promise of fusion spatial-temporal graph neural networks in addressing complex challenges in protein-ligand affinity prediction. The Python scripts for implementing various model components are accessible at https://github.com/ligaili01/PLA-STGCN.
Affiliation(s)
- Gaili Li
  - School of Information Science and Engineering, Lanzhou University, Lanzhou, 730000, China
- Yongna Yuan
  - School of Information Science and Engineering, Lanzhou University, Lanzhou, 730000, China.
- Ruisheng Zhang
  - School of Information Science and Engineering, Lanzhou University, Lanzhou, 730000, China.
3
Asim MN, Asif T, Hassan F, Dengel A. Protein Sequence Analysis landscape: A Systematic Review of Task Types, Databases, Datasets, Word Embeddings Methods, and Language Models. Database (Oxford) 2025; 2025:baaf027. [PMID: 40448683 DOI: 10.1093/database/baaf027] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/28/2024] [Revised: 02/06/2025] [Accepted: 03/26/2025] [Indexed: 06/02/2025]
Abstract
Protein sequence analysis examines the order of amino acids within protein sequences to unlock a wealth of knowledge about biological processes and genetic disorders. It helps in forecasting disease susceptibility by finding unique protein signatures, or biomarkers, that are linked to particular disease states. Protein sequence analysis through wet-lab experiments is expensive, time-consuming, and error-prone. To facilitate large-scale proteomics sequence analysis, the biological community is striving to use AI to transition from wet-lab to computer-aided applications. However, proteomics and AI are two distinct fields, and developing AI-driven protein sequence analysis applications requires knowledge of both domains. To bridge the gap between the two fields, various review articles have been written. However, these articles focus on a few individual tasks or specific applications rather than providing a comprehensive overview of the wide range of tasks and applications. To meet the need for a comprehensive review that presents a holistic view of this wide array of tasks and applications, this manuscript makes several contributions: It bridges the gap between the proteomics and AI fields by presenting a comprehensive array of AI-driven applications for 63 distinct protein sequence analysis tasks. It equips AI researchers with the biological foundations of 63 protein sequence analysis tasks. It enhances development of AI-driven protein sequence analysis applications by providing comprehensive details of 68 protein databases. It presents a rich data landscape, encompassing 627 benchmark datasets for 63 diverse protein sequence analysis tasks. It highlights the use of 25 unique word embedding methods and 13 language models in AI-driven protein sequence analysis applications. It accelerates the development of AI-driven applications by presenting current state-of-the-art performance across 63 protein sequence analysis tasks.
Affiliation(s)
- Muhammad Nabeel Asim
  - German Research Center for Artificial Intelligence, Kaiserslautern 67663, Germany
  - Intelligentx GmbH (intelligentx.com), Kaiserslautern, Germany
- Tayyaba Asif
  - Department of Computer Science, Rheinland Pfälzische Technische Universität, Kaiserslautern 67663, Germany
- Faiza Hassan
  - Department of Computer Science, Rheinland Pfälzische Technische Universität, Kaiserslautern 67663, Germany
- Andreas Dengel
  - German Research Center for Artificial Intelligence, Kaiserslautern 67663, Germany
  - Department of Computer Science, Rheinland Pfälzische Technische Universität, Kaiserslautern 67663, Germany
  - Intelligentx GmbH (intelligentx.com), Kaiserslautern, Germany
4
Shi H, Hu J, Zhang X, Jin S, Xu X. Prediction of drug-target interactions based on substructure subsequences and cross-public attention mechanism. PLoS One 2025; 20:e0324146. [PMID: 40445972 PMCID: PMC12124583 DOI: 10.1371/journal.pone.0324146] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/13/2024] [Accepted: 04/22/2025] [Indexed: 06/02/2025] Open
Abstract
Drug-target interactions (DTIs) play a critical role in drug discovery and repurposing. Deep learning-based methods for predicting drug-target interactions are more efficient than wet-lab experiments. The extraction of original and substructural features from drugs and proteins plays a key role in enhancing the accuracy of DTI predictions, while the integration of multi-feature information and effective representation of interaction data also impact the precision of DTI forecasts. Consequently, we propose a drug-target interaction prediction model, SSCPA-DTI, based on substructural subsequences and a cross co-attention mechanism. We use drug SMILES sequences and protein sequences as inputs for the model, employing a Multi-feature information mining module (MIMM) to extract original and substructural features of DTIs. Substructural information provides detailed insights into molecular local structures, while original features enhance the model's understanding of the overall molecular architecture. Subsequently, a Cross-public attention module (CPA) is utilized to first integrate the extracted original and substructural features, then to extract interaction information between the protein and drug, addressing issues such as insufficient accuracy and weak interpretability arising from mere concatenation without interactive integration of feature information. We conducted experiments on three public datasets and demonstrated superior performance compared to baseline models.
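The cross-attention idea mentioned in the abstract, in which features of one molecule attend over features of the other, can be illustrated with a minimal Python sketch. This is not the SSCPA-DTI implementation: the function names, the toy two-dimensional token vectors, and the choice of plain scaled dot-product attention are all assumptions made for illustration.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention(queries, keys, values):
    """Scaled dot-product attention across two inputs.

    Queries come from one molecule (e.g. drug tokens); keys and values
    come from the other (e.g. protein tokens). Hypothetical helper.
    """
    d = len(keys[0])
    out = []
    for q in queries:
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in keys]
        weights = softmax(scores)
        # Each output row is a weighted average of the value vectors.
        out.append([sum(w * v[j] for w, v in zip(weights, values))
                    for j in range(len(values[0]))])
    return out


# Toy features (assumed values, not real embeddings).
drug_tokens = [[1.0, 0.0], [0.0, 1.0]]
protein_tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]

# Attend in both directions, as a co-attention style module might.
drug_ctx = cross_attention(drug_tokens, protein_tokens, protein_tokens)
protein_ctx = cross_attention(protein_tokens, drug_tokens, drug_tokens)
```

Attending in both directions gives each drug token a protein-aware summary and each protein token a drug-aware summary, which is the kind of interactive integration the abstract contrasts with simple concatenation.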
Affiliation(s)
- Haikuo Shi
  - School of Computer Science and Technology, Wuhan University of Science and Technology, Wuhan, Hubei, China
- Jing Hu
  - School of Computer Science and Technology, Wuhan University of Science and Technology, Wuhan, Hubei, China
- Xiaolong Zhang
  - School of Computer Science and Technology, Wuhan University of Science and Technology, Wuhan, Hubei, China
- Shuting Jin
  - School of Computer Science and Technology, Wuhan University of Science and Technology, Wuhan, Hubei, China
- Xin Xu
  - School of Computer Science and Technology, Wuhan University of Science and Technology, Wuhan, Hubei, China
5
Sun Q, Wang H, Xie J, Wang L, Mu J, Li J, Ren Y, Lai L. Computer-Aided Drug Discovery for Undruggable Targets. Chem Rev 2025. [PMID: 40423592 DOI: 10.1021/acs.chemrev.4c00969] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/28/2025]
Abstract
Undruggable targets are those of therapeutical significance but challenging for conventional drug design approaches. Such targets often exhibit unique features, including highly dynamic structures, a lack of well-defined ligand-binding pockets, the presence of highly conserved active sites, and functional modulation by protein-protein interactions. Recent advances in computational simulations and artificial intelligence have revolutionized the drug design landscape, giving rise to innovative strategies for overcoming these obstacles. In this review, we highlight the latest progress in computational approaches for drug design against undruggable targets, present several successful case studies, and discuss remaining challenges and future directions. Special emphasis is placed on four primary target categories: intrinsically disordered proteins, protein allosteric regulation, protein-protein interactions, and protein degradation, along with discussion of emerging target types. We also examine how AI-driven methodologies have transformed the field, from applications in protein-ligand complex structure prediction and virtual screening to de novo ligand generation for undruggable targets. Integration of computational methods with experimental techniques is expected to bring further breakthroughs to overcome the hurdles of undruggable targets. As the field continues to evolve, these advancements hold great promise to expand the druggable space, offering new therapeutic opportunities for previously untreatable diseases.
Affiliation(s)
- Qi Sun
  - BNLMS, College of Chemistry and Molecular Engineering, Peking University, Beijing 100871, China
  - Peking University Chengdu Academy for Advanced Interdisciplinary Biotechnologies, Chengdu, Sichuan 610213, China
- Hanping Wang
  - BNLMS, College of Chemistry and Molecular Engineering, Peking University, Beijing 100871, China
- Juan Xie
  - Center for Quantitative Biology, Academy for Advanced Interdisciplinary Studies, Peking University, Beijing 100871, China
- Liying Wang
  - BNLMS, College of Chemistry and Molecular Engineering, Peking University, Beijing 100871, China
- Junxi Mu
  - Peking-Tsinghua Center for Life Science, Academy for Advanced Interdisciplinary Studies, Peking University, Beijing 100871, China
- Junren Li
  - BNLMS, College of Chemistry and Molecular Engineering, Peking University, Beijing 100871, China
- Yuhao Ren
  - BNLMS, College of Chemistry and Molecular Engineering, Peking University, Beijing 100871, China
- Luhua Lai
  - BNLMS, College of Chemistry and Molecular Engineering, Peking University, Beijing 100871, China
  - Center for Quantitative Biology, Academy for Advanced Interdisciplinary Studies, Peking University, Beijing 100871, China
  - Peking-Tsinghua Center for Life Science, Academy for Advanced Interdisciplinary Studies, Peking University, Beijing 100871, China
  - Peking University Chengdu Academy for Advanced Interdisciplinary Biotechnologies, Chengdu, Sichuan 610213, China
  - Research Unit of Drug Design Method, Chinese Academy of Medical Sciences, Peking University, Beijing 100871, China
6
Wang N, Zhao S, Li Z, Sun J, Yi M. WDGBANDTI: A Deep Graph Convolutional Network-Based Bilinear Attention Network for Drug-Target Interaction Prediction with Domain Adaptation. Interdiscip Sci 2025:10.1007/s12539-025-00714-6. [PMID: 40410523 DOI: 10.1007/s12539-025-00714-6] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/28/2024] [Revised: 04/09/2025] [Accepted: 04/10/2025] [Indexed: 05/25/2025]
Abstract
BACKGROUND During the development of new drugs, it is essential to assess their effectiveness and examine the potential mechanisms behind side effects. This process typically involves analyzing drugs under development together with relevant existing drugs to evaluate the effects of drugs and targets more precisely. The use of deep learning methods to analyze this problem is currently a research hotspot, but several limitations remain: (i) how to deepen the analysis from the molecular level to the atomic level and analyze the key substructures that affect interactions on the basis of pharmaceutical mechanisms; (ii) how to integrate biomedical analysis with deep learning methods to make it medically sound and enhance interpretability. METHODS To address the limitations of existing research, we built on the Deep Graph Convolutional Network (Deep-GCN) and the Bilinear Attention Network (BAN) to construct an interpretable deep learning framework, WDGBANDTI, which analyzes and predicts drug‒target interactions at the substructure level and enhances the prediction capability of the model with respect to unidentified target pairings by adding modules. RESULTS For different application scenarios, we validated the model on several commonly used, high-coverage datasets. We also selected several state-of-the-art computational methods for comparison, and our model demonstrates advantages in accuracy, sensitivity, specificity, and other deep learning features. More importantly, the model can identify the substructures that play a role in drug‒target interactions through BAN, highlighting its excellent interpretability. CONCLUSION We believe that our work will contribute to advancements in drug development and side effect experiments and provide meaningful guidance for drug design.
Affiliation(s)
- Nianrui Wang
  - School of Mathematics and Physics, China University of Geosciences, Wuhan, 430074, China
- Shumin Zhao
  - School of Mathematics and Physics, China University of Geosciences, Wuhan, 430074, China
- Ziwei Li
  - School of Mathematics and Physics, China University of Geosciences, Wuhan, 430074, China
- Jianqiang Sun
  - School of Information Science and Engineering, Linyi University, Linyi, 276000, China.
- Ming Yi
  - School of Mathematics and Physics, China University of Geosciences, Wuhan, 430074, China.
7
Liu W, Li X, Hang B, Wang P. EnGCI: enhancing GPCR-compound interaction prediction via large molecular models and KAN network. BMC Biol 2025; 23:136. [PMID: 40375308 DOI: 10.1186/s12915-025-02238-3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/04/2024] [Accepted: 05/06/2025] [Indexed: 05/18/2025] Open
Abstract
BACKGROUND Identifying GPCR-compound interactions (GCI) plays a significant role in drug discovery and chemogenomics. Machine learning, particularly deep learning, has become increasingly influential in this domain. Large molecular models, due to their ability to capture detailed structural and functional information, have shown promise in enhancing the predictive accuracy of downstream tasks. Consequently, exploring the performance of these models in GCI prediction, as well as evaluating their effectiveness when integrated with other deep learning models, has emerged as a compelling research area. This paper aims to investigate these challenges. RESULTS This study introduces EnGCI, a novel model comprising two distinct modules. The MSBM integrates a graph isomorphism network (GIN) and a convolutional neural network (CNN) to extract features from GPCRs and compounds, respectively. These features are then processed by a Kolmogorov-Arnold network (KAN) for decision-making. The LMMBM utilizes two large-scale pre-trained models to extract features from compounds and GPCRs, and subsequently, KAN is again employed for decision-making. Each module leverages different sources of multimodal information, and their fusion enhances the overall accuracy of GPCR-compound interaction (GCI) prediction. Evaluating the EnGCI model on a rigorously curated GCI dataset, we achieved an AUC of approximately 0.89, significantly outperforming current state-of-the-art benchmark models. CONCLUSIONS The EnGCI model integrates two complementary modules: one that learns molecular features from scratch for the GPCR-compound interaction (GCI) prediction task, and another that extracts molecular features using pre-trained large molecular models. After further processing and integration, these multimodal information sources enable a more profound exploration and understanding of the complex interaction relationships between GPCRs and compounds. The EnGCI model offers a robust and efficient framework that enhances GCI predictive capabilities and has the potential to significantly contribute to GPCR drug discovery.
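For readers unfamiliar with the metric, the AUC reported above can be read as the probability that a randomly chosen interacting pair receives a higher score than a randomly chosen non-interacting pair. The Python sketch below computes it in exactly that pairwise way. The function name `roc_auc` and the labels and scores are made-up toy values for illustration, not EnGCI outputs.

```python
def roc_auc(labels, scores):
    """Pairwise (Mann-Whitney) estimate of the area under the ROC curve.

    labels: 1 for interacting pairs, 0 for non-interacting pairs.
    scores: model scores, higher meaning more likely to interact.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0      # positive ranked above negative
            elif p == n:
                wins += 0.5      # ties count as half
    return wins / (len(pos) * len(neg))


# Toy data (assumed): three interacting and three non-interacting pairs.
labels = [1, 1, 0, 0, 1, 0]
scores = [0.9, 0.8, 0.7, 0.3, 0.6, 0.2]
auc = roc_auc(labels, scores)
print(round(auc, 3))  # 0.889, i.e. 8 of 9 positive/negative pairs are ordered correctly
```

The quadratic loop is kept for readability; on real datasets a sort-based computation or a library routine would be the usual choice.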
Affiliation(s)
- Weihao Liu
  - Computer School, Hubei University of Arts and Science, Longzhong Road, Xiangyang, 441053, Hubei, China
- Xiaoli Li
  - Computer School, Hubei University of Arts and Science, Longzhong Road, Xiangyang, 441053, Hubei, China
- Bo Hang
  - Computer School, Hubei University of Arts and Science, Longzhong Road, Xiangyang, 441053, Hubei, China
- Pu Wang
  - Computer School, Hubei University of Arts and Science, Longzhong Road, Xiangyang, 441053, Hubei, China.
8
Du BX, Yu H, Zhu B, Long Y, Wu M, Shi JY. A novel interpretability framework for enzyme turnover number prediction boosted by pre-trained enzyme embeddings and adaptive gate network. Methods 2025; 237:45-52. [PMID: 40021034 DOI: 10.1016/j.ymeth.2025.02.010] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/05/2024] [Revised: 01/05/2025] [Accepted: 02/25/2025] [Indexed: 03/03/2025] Open
Abstract
Identifying the enzyme turnover number (kcat) is a vital step in synthetic biology and early-stage drug discovery. Recently, deep learning methods have made encouraging progress in predicting kcat, aided by the growth of multi-species enzyme-substrate turnover number data. However, the performance of existing approaches still depends heavily on the effectiveness of feature extraction for enzymes and substrates, as well as on the optimal fusion of these two types of features. Furthermore, it is essential to identify the key molecular substructures that significantly impact kcat prediction. To address these issues, we develop a novel end-to-end dual-representation interpretability framework, GELKcat, by harnessing graph transformers for substrate molecular encoding and CNNs for enzyme word2vec embeddings. We further integrate substrate and enzyme features using the adaptive gate network, which assigns optimal weights to capture the most suitable feature combinations. The comparison with several state-of-the-art methods demonstrates the superiority of GELKcat, and the ablation studies further illuminate the roles of its three main components. Furthermore, case studies illustrate the interpretability of GELKcat by identifying the key functional groups in a substrate that are significantly associated with turnover number. It is anticipated that this work can bridge current gaps in enzyme-substrate representation and give some guidance for drug discovery and synthetic biology.
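The abstract does not spell out how the adaptive gate network weights the two feature types. One common form of gated fusion, shown here purely as an assumed illustration in Python, computes a per-dimension gate from the concatenated features and uses it to mix the substrate and enzyme vectors. The function name `gated_fusion`, the weights, and the toy vectors are hypothetical and are not taken from GELKcat.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def gated_fusion(substrate, enzyme, weights, bias):
    """Assumed gate form: g = sigmoid(W [s; e] + b), fused = g * s + (1 - g) * e.

    substrate, enzyme: feature vectors of equal length d.
    weights: d rows, each of length 2 * d (one row per output dimension).
    bias: vector of length d.
    """
    concat = substrate + enzyme  # list concatenation, i.e. [s; e]
    fused = []
    for i, (s, e) in enumerate(zip(substrate, enzyme)):
        g = sigmoid(sum(w * x for w, x in zip(weights[i], concat)) + bias[i])
        fused.append(g * s + (1.0 - g) * e)
    return fused


# Toy inputs (assumed values). With zero weights and bias the gate is 0.5,
# so the fusion reduces to a plain average of the two vectors.
substrate_vec = [0.2, 0.8]
enzyme_vec = [0.6, 0.4]
zero_w = [[0.0] * 4 for _ in range(2)]
fused = gated_fusion(substrate_vec, enzyme_vec, zero_w, [0.0, 0.0])
```

In a trained model the weights would be learned, so the gate can lean toward the substrate representation for some dimensions and toward the enzyme representation for others.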
Affiliation(s)
- Bing-Xue Du
  - School of Life Sciences, Northwestern Polytechnical University, Xi'an 710072, China; Institute for Infocomm Research (I(2)R), Agency for Science, Technology and Research (A*STAR), Singapore 138632, Singapore.
- Haoyang Yu
  - School of Life Sciences, Northwestern Polytechnical University, Xi'an 710072, China.
- Bei Zhu
  - School of Life Sciences, Northwestern Polytechnical University, Xi'an 710072, China
- Yahui Long
  - Bioinformatics Institute (BII), Agency for Science, Technology and Research (A*STAR), Singapore 138671, Singapore.
- Min Wu
  - Institute for Infocomm Research (I(2)R), Agency for Science, Technology and Research (A*STAR), Singapore 138632, Singapore.
- Jian-Yu Shi
  - School of Life Sciences, Northwestern Polytechnical University, Xi'an 710072, China.
9
Li Z, Han K, Wang Z, Lei L, Wang Z, Dai R, Wang M, Zhang Z, Guo Q. Enhanced inhibitor-kinase affinity prediction via integrated multimodal analysis of drug molecule and protein sequence features. Int J Biol Macromol 2025; 309:142871. [PMID: 40194581 DOI: 10.1016/j.ijbiomac.2025.142871] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/23/2025] [Revised: 03/26/2025] [Accepted: 04/04/2025] [Indexed: 04/09/2025]
Abstract
The accurate prediction of inhibitor-kinase binding affinity is pivotal for advancing drug development and precision medicine. In this study, we developed predictive models for human kinases, including cyclin-dependent kinases (CDKs), mitogen-activated protein kinases (MAP kinases), glycogen synthase kinases (GSKs), CDK-like kinases (CMGC kinase group) and receptor tyrosine kinases (RTKs), which are key regulators of cellular signaling and disease progression. These kinases serve as primary drug targets in cancer and other critical diseases. To enhance affinity prediction precision, we introduce an innovative multimodal fusion model, KinNet. The model integrates the GraphKAN network, which effectively captures both local and global structural features of drug molecules. Furthermore, it leverages kernel functions and learnable activation functions to dynamically optimize node and edge feature representations. Additionally, the model incorporates the Conv-Enhanced Mamba module, combining Conv1D's ability to capture local features with Mamba's strength in processing long sequences, facilitating comprehensive feature extraction from protein sequences and molecular fingerprints. Experimental results confirm that the KinNet model achieves superior prediction accuracy compared to existing approaches, underscoring its potential to elucidate inhibitor-kinase binding mechanisms. This model serves as a robust computational framework to support drug discovery and the development of kinase-targeted therapies.
Affiliation(s)
- Zhenxing Li
  - Academy of Artificial Intelligence, Beijing Institute of Petrochemical Technology, Beijing 102617, China
- Kaitai Han
  - Academy of Artificial Intelligence, Beijing Institute of Petrochemical Technology, Beijing 102617, China
- Zijun Wang
  - Academy of Artificial Intelligence, Beijing Institute of Petrochemical Technology, Beijing 102617, China
- Lixin Lei
  - Academy of Artificial Intelligence, Beijing Institute of Petrochemical Technology, Beijing 102617, China
- Zhenghui Wang
  - Academy of Artificial Intelligence, Beijing Institute of Petrochemical Technology, Beijing 102617, China
- Ruoyan Dai
  - Academy of Artificial Intelligence, Beijing Institute of Petrochemical Technology, Beijing 102617, China
- Mengqiu Wang
  - Academy of Artificial Intelligence, Beijing Institute of Petrochemical Technology, Beijing 102617, China
- Zhiwei Zhang
  - Academy of Artificial Intelligence, Beijing Institute of Petrochemical Technology, Beijing 102617, China
- Qianjin Guo
  - Academy of Artificial Intelligence, Beijing Institute of Petrochemical Technology, Beijing 102617, China.
10
Heo R, Lee D, Kim BJ, Seo S, Park S, Park C. KNU-DTI: KNowledge United Drug-Target Interaction prediction. Comput Biol Med 2025; 189:109927. [PMID: 40024184 DOI: 10.1016/j.compbiomed.2025.109927] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/01/2024] [Revised: 01/17/2025] [Accepted: 02/24/2025] [Indexed: 03/04/2025]
Abstract
MOTIVATION Accurately predicting drug-target protein interactions (DTI) is a cornerstone of drug discovery, enabling the identification of potential therapeutic compounds. Sequence-based prediction models, despite their simplicity, hold great promise in extracting essential information directly from raw sequences. However, the focus in recent DTI studies has increasingly shifted toward enhancing algorithmic complexity, often at the expense of fully leveraging robust sequence representation learning methods. This shift has led to the underestimation and gradual neglect of methodologies aimed at effectively capturing discriminative features from sequences. Our work seeks to address this oversight by emphasizing the value of well-constructed sequence representation algorithms, demonstrating that accurate DTI models can be achieved even with simple interaction mapping techniques. By prioritizing meaningful information extraction over excessive model complexity, we aim to advance the development of practical and generalizable DTI prediction frameworks. RESULTS We developed the KNowledge Uniting DTI model (KNU-DTI), which retrieves structural information and unites it. Protein structural properties were obtained using structural property sequence (SPS). Extended-connectivity fingerprint (ECFP) was used to estimate the structure-activity relationship in molecules. Including these two features, a total of five latent vectors were derived from protein and molecule via various neural networks and integrated by element-wise addition to predict binding interactions or affinity. Using four test concepts to evaluate the model, we show that the model outperforms recently published competitors. Finally, a case study indicated that our model has a competitive edge over existing docking simulations in some cases.
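The fusion step described above, five latent vectors combined by element-wise addition, is simple enough to show directly. The Python sketch below is an illustration only: the function name `fuse_by_addition`, the vector names, the dimension of three, and the values are assumptions, and the neural networks that would produce the vectors in KNU-DTI are not modeled.

```python
def fuse_by_addition(latents):
    """Combine equally sized latent vectors by element-wise addition."""
    dim = len(latents[0])
    if any(len(v) != dim for v in latents):
        raise ValueError("all latent vectors must share the same dimension")
    return [sum(vals) for vals in zip(*latents)]


# Five toy latent vectors (assumed values), standing in for the outputs of
# separate protein and molecule encoders.
protein_seq = [0.5, 0.0, 1.0]
protein_sps = [0.25, 0.5, 0.0]
mol_smiles = [0.0, 1.0, 0.5]
mol_ecfp = [0.25, 0.25, 0.25]
mol_graph = [1.0, 0.0, 0.25]

fused = fuse_by_addition([protein_seq, protein_sps, mol_smiles, mol_ecfp, mol_graph])
print(fused)  # [2.0, 1.75, 2.0]
```

Addition requires every encoder to project into the same dimension, and unlike concatenation it keeps the fused vector the same size no matter how many sources are combined.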
Affiliation(s)
- Ryong Heo
  - Interdisciplinary Graduate Program in Medical Bigdata Convergence, Kangwon National University, Chuncheon-si, 24341, Gangwon-do, Republic of Korea; UBLBio Corporation, Yeongtong-ro 237, Suwon, 16679, Gyeonggi-do, Republic of Korea
- Dahyeon Lee
  - Department of Data Science, Kangwon National University, Republic of Korea
- Byung Ju Kim
  - UBLBio Corporation, Yeongtong-ro 237, Suwon, 16679, Gyeonggi-do, Republic of Korea
- Sangmin Seo
  - Department of Computer Science, Yonsei University, Yonsei-ro 50, Seodaemun-gu, 03722, Seoul, Republic of Korea
- Sanghyun Park
  - Department of Computer Science, Yonsei University, Yonsei-ro 50, Seodaemun-gu, 03722, Seoul, Republic of Korea
- Chihyun Park
  - Interdisciplinary Graduate Program in Medical Bigdata Convergence, Kangwon National University, Chuncheon-si, 24341, Gangwon-do, Republic of Korea; Department of Data Science, Kangwon National University, Republic of Korea; UBLBio Corporation, Yeongtong-ro 237, Suwon, 16679, Gyeonggi-do, Republic of Korea; Department of Computer Science and Engineering, Kangwon National University, Republic of Korea.
11
Wei Z, Wang Z, Tang C. Dynamic Prediction of Drug-Target Interactions via Cross-Modal Feature Mapping with Learnable Association Information. J Chem Inf Model 2025; 65:3915-3927. [PMID: 40227648 DOI: 10.1021/acs.jcim.4c02348] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 04/15/2025]
Abstract
Predicting drug-target interactions (DTIs) is essential for advancing drug discovery and personalized medicine. However, accurately capturing the intricate binding relationships between drugs and targets remains a significant challenge, particularly when attempting to fully leverage the vast correlation information inherent in molecular data. This complexity is further exacerbated by the structural differences and sequence length disparities between drug molecules and protein targets, which can hinder effective feature alignment and interaction modeling. To address these challenges, we propose a model named LAM-DTI. First, drug and target features are extracted from the original molecular sequence data using a multilayer convolutional neural network. To address the sequence length discrepancy between drug and target features, we apply a connectionist temporal classification module to generate normalized feature sequences. Building on this, we introduce a learnable association information matrix as a flexible intermediary, which dynamically adjusts to capture accurate DTI association information, thereby enhancing cross-modal mapping within a unified latent space. This progressive mapping strategy enables the model to form an interaction projection between drugs and targets, effectively identifying critical interaction regions and guiding the capture of complex interaction-related features. Extensive experiments on three well-known benchmark data sets demonstrate that LAM-DTI significantly outperforms previous models.
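One way to picture a learnable association matrix acting as an intermediary between drug and target features is a bilinear map, S = D A Tᵀ, where D and T hold the drug and target feature sequences and A is the learned matrix. The Python sketch below illustrates that reading only. The bilinear form, the function names, and all values are assumptions rather than the LAM-DTI implementation, and the connectionist temporal classification step mentioned in the abstract is not modeled.

```python
def matmul(a, b):
    """Plain-Python matrix product of a (n x k) and b (k x m)."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))]
            for i in range(len(a))]


def transpose(m):
    return [list(row) for row in zip(*m)]


def interaction_map(drug_feats, assoc, target_feats):
    """Assumed bilinear form S = D A T^T.

    S[i][j] scores how strongly drug position i associates with
    target position j under the association matrix A.
    """
    return matmul(matmul(drug_feats, assoc), transpose(target_feats))


# Toy features (assumed): 2 drug positions, 3 target positions, 2 channels.
D = [[1.0, 0.0], [0.0, 2.0]]
T = [[1.0, 1.0], [0.0, 1.0], [2.0, 0.0]]
A = [[1.0, 0.0], [0.0, 1.0]]  # identity here; it would be learned in training

S = interaction_map(D, A, T)
print(S)  # [[1.0, 0.0, 2.0], [2.0, 2.0, 0.0]]
```

Because A sits between the two feature spaces, the drug and target sequences can have different lengths (2 and 3 here) while still producing a full position-by-position interaction map.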
Affiliation(s)
- Ziyu Wei
- School of Computer Science, China University of Geosciences, Wuhan 430074, China
| | - Zhengyu Wang
- Office of the Drug Clinical Trials Agency, The Affiliated Huai'an Hospital of Xuzhou Medical University and The Second People's Hospital of Huai'an, Huai'an 223002, China
| | - Chang Tang
- School of Computer Science, China University of Geosciences, Wuhan 430074, China
| |
|
12
|
E U, T M. DICCA-DTA: Diffusion and Contextualized Capsule Attention guided Factorized Cross-Pooling for Drug-Target Affinity prediction. Comput Biol Chem 2025; 118:108472. [PMID: 40288256 DOI: 10.1016/j.compbiolchem.2025.108472] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/22/2025] [Revised: 03/27/2025] [Accepted: 04/08/2025] [Indexed: 04/29/2025]
Abstract
Drug-Target Affinity (DTA) prediction plays a crucial role in the drug discovery process by evaluating the strength of the interaction between a drug and its biological target, which is often a protein. Despite advancements in DTA prediction through deep learning, several fundamental challenges persist: (i) suboptimal information propagation in molecular graphs, limiting the effective representation of complex drug structures, (ii) accurately modeling the complex interactions between drug-binding sites and protein substructures, and (iii) prioritizing critical substructure interactions to enhance both accuracy and interpretability. To address these challenges, the DICCA-DTA framework is introduced, aiming to improve the contextual integration of molecular information and facilitate a more comprehensive representation of drug-target interactions in allopathic research. It employs a Diffused Isomorphic Network (DIN) to extract comprehensive drug features from molecular graphs, capturing both local substructures and global information. Furthermore, a Contextualized Capsule Attention Network (CCAN) module incorporates multi-head attention with capsule networks to capture both local and global protein sequence characteristics. The attention-guided Factorized Cross-Pooling (FCP) mechanism dynamically refines drug-protein interaction modeling by selectively emphasizing critical binding site interactions, thereby enhancing predictive accuracy. Explainable attention maps further reveal the most crucial drug-protein binding site interactions, providing transparent insights into the model's decision-making process. Comprehensive evaluations across the Davis, KIBA, Metz and BindingDB datasets demonstrate the superior performance of the DICCA-DTA framework over existing state-of-the-art models. A case study on cancer-related protein interactions from the DrugBank database further demonstrates the framework's precision in identifying key drug-protein affinities, reinforcing its potential to accelerate drug discovery and repurposing.
Affiliation(s)
- Uma E
- Department of Information Science and Technology, College of Engineering Guindy, Chennai, India.
| | - Mala T
- Department of Information Science and Technology, College of Engineering Guindy, Chennai, India
| |
|
13
|
Zhou P, Wang J, Li C, Wang Z, Liu Y, Sun S, Lin J, Wei L, Cai X, Lai H, Liu W, Wang L, Liu Y, Zeng X. Instruction multi-constraint molecular generation using a teacher-student large language model. BMC Biol 2025; 23:105. [PMID: 40269927 PMCID: PMC12020078 DOI: 10.1186/s12915-025-02200-3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/10/2024] [Accepted: 03/27/2025] [Indexed: 04/25/2025] Open
Abstract
BACKGROUND: While various models and computational tools have been proposed for structure and property analysis of molecules, generating molecules that conform to all desired structures and properties remains a challenge. RESULTS: We introduce a multi-constraint molecular generation large language model, TSMMG, which, akin to a student, incorporates knowledge from various small models and tools, namely, the "teachers." To train TSMMG, we construct a large set of text-molecule pairs by extracting molecular knowledge from these "teachers," enabling it to generate novel molecules that conform to the descriptions through various text prompts. We experimentally show that TSMMG performs remarkably well in generating molecules that meet complex property requirements described in natural language across two-, three-, and four-constraint tasks, with an average molecular validity of over 99% and success ratios of 82.58%, 68.03%, and 67.48%, respectively. The model also exhibits adaptability through zero-shot testing, creating molecules that satisfy combinations of properties that have not been encountered. It can comprehend text inputs with various language styles, extending beyond the confines of outlined prompts. CONCLUSIONS: TSMMG presents an effective model for multi-constraint molecular generation using natural language. This framework is not only applicable to drug discovery but also serves as a reference for other related fields.
Affiliation(s)
- Peng Zhou
- College of Information Science and Engineering, Hunan University, Changsha, 410082, Hunan, China
- AI for Life Sciences Lab, Tencent, Shenzhen, China
| | - Jianmin Wang
- The Interdisciplinary Graduate Program in Integrative Biotechnology, Yonsei University, Incheon, 21983, Seoul, Korea
| | - Chunyan Li
- School of Informatics, Yunnan Normal University, Kunming, 650500, Yunnan, China
| | - Zixu Wang
- Department of Computer Science, University of Tsukuba, Tsukuba, 3058577, Japan
| | - Yiping Liu
- College of Information Science and Engineering, Hunan University, Changsha, 410082, Hunan, China
| | - Siqi Sun
- Research Institute of Intelligent Complex Systems, Fudan University, Shanghai, 200433, China
- Shanghai AI Laboratory, Shanghai, 200232, China
| | - Jianxin Lin
- College of Information Science and Engineering, Hunan University, Changsha, 410082, Hunan, China
| | - Leyi Wei
- Centre for Artificial Intelligence Driven Drug Discovery, Faculty of Applied Science, Macao Polytechnic University, Macao SAR, China
- School of Informatics, Xiamen University, Xiamen, China
| | - Xibao Cai
- College of Information Science and Engineering, Hunan University, Changsha, 410082, Hunan, China
| | - Houtim Lai
- AI for Life Sciences Lab, Tencent, Shenzhen, China
| | - Wei Liu
- AI for Life Sciences Lab, Tencent, Shenzhen, China
| | - Longyue Wang
- Alibaba International Digital Commerce, Hangzhou, China.
| | - Yuansheng Liu
- College of Information Science and Engineering, Hunan University, Changsha, 410082, Hunan, China.
| | - Xiangxiang Zeng
- College of Information Science and Engineering, Hunan University, Changsha, 410082, Hunan, China.
| |
|
14
|
Li C, Mi J, Wang H, Liu Z, Gao J, Wan J. MGMA-DTI: Drug target interaction prediction using multi-order gated convolution and multi-attention fusion. Comput Biol Chem 2025; 118:108449. [PMID: 40239449 DOI: 10.1016/j.compbiolchem.2025.108449] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/14/2025] [Revised: 03/11/2025] [Accepted: 03/28/2025] [Indexed: 04/18/2025]
Abstract
Accurately predicting drug-target interactions (DTI) is crucial for drug discovery and can reduce drug development costs. Recent deep learning-based DTI predictions have demonstrated promising performance, but they still face two challenges: (i) over-reliance on the extraction of local features and insufficient learning of global features limit the model's performance; (ii) the lack of effective fusion of drug-target interaction features reduces the interpretability of the model. To address these challenges, we propose a new model for predicting drug-target interactions based on multi-order gated convolution and multi-attention fusion, MGMA-DTI. The drug feature encoder obtains a two-dimensional molecular graph based on the drug's SMILES string and uses a graph convolutional neural network to encode the drug features. The protein encoder is based on a multi-order gated convolution, which enhances the model's ability to capture global features between amino acid sequences. To better achieve interactive learning between drugs and proteins, we designed a multi-attention fusion module that effectively captures the drug-target interaction features. Experimental results show that MGMA-DTI outperforms other baseline models on three benchmark datasets: BindingDB, BioSNAP, and Human. Case studies further demonstrate that the model provides valuable insights for drug discovery. In addition, our model provides molecular-level interpretability, which can provide more scientifically meaningful guidance.
Affiliation(s)
- Chang Li
- The College of Information Science and Technology, Beijing University of Chemical Technology, North Third Ring Road 15, Beijing, 100029, China
| | - Jia Mi
- The College of Information Science and Technology, Beijing University of Chemical Technology, North Third Ring Road 15, Beijing, 100029, China
| | - Han Wang
- The College of Information Science and Technology, Beijing University of Chemical Technology, North Third Ring Road 15, Beijing, 100029, China
| | - Zhikang Liu
- The College of Information Science and Technology, Beijing University of Chemical Technology, North Third Ring Road 15, Beijing, 100029, China
| | - Jingyang Gao
- The College of Information Science and Technology, Beijing University of Chemical Technology, North Third Ring Road 15, Beijing, 100029, China
| | - Jing Wan
- The College of Information Science and Technology, Beijing University of Chemical Technology, North Third Ring Road 15, Beijing, 100029, China.
| |
|
15
|
Huang W, Tian X, Su Y, Zhang S, Chen C, Chen C. Sensing Compound Substructures Combined with Molecular Fingerprinting to Predict Drug-Target Interactions. Interdiscip Sci 2025:10.1007/s12539-025-00698-3. [PMID: 40178777 DOI: 10.1007/s12539-025-00698-3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/07/2024] [Revised: 02/19/2025] [Accepted: 02/22/2025] [Indexed: 04/05/2025]
Abstract
Identification of drug-target interactions (DTIs) is critical for drug discovery and drug repositioning. However, most DTI methods that extract features from drug molecules and protein entities neglect specific substructure information of pharmacological responses, which leads to poor predictive performance. Moreover, most existing methods are based on molecular graphs or molecular descriptors to obtain abstract representations of molecules, but combining the two feature learning methods for DTI prediction remains unexplored. Therefore, a new ASCS-DTI framework for DTI prediction is proposed, which utilizes a substructure attention mechanism to flexibly capture substructures of compounds at different granularities, allowing the important substructure information of each molecule to be learned. Additionally, the framework combines three different types of molecular fingerprint information to comprehensively characterize molecular representations. A stacked convolutional coding module processes the sequence information of target proteins in a multi-scale and multi-level view. Finally, multi-modal fusion of molecular graph features and molecular fingerprint features, along with multi-modal information encoding of DTIs, is performed by the feature fusion module. The method outperforms six advanced baseline models on three benchmark datasets (BioSNAP, BindingDB, and Human), with a significant improvement in performance, particularly in maintaining strong results across different experimental settings.
Affiliation(s)
- Wanhua Huang
- School of Computer Science and Technology, Xinjiang University, Urumqi, 830046, China
| | - Xuecong Tian
- School of Computer Science and Technology, Xinjiang University, Urumqi, 830046, China
| | - Ying Su
- School of Computer Science and Technology, Xinjiang University, Urumqi, 830046, China
| | - Sizhe Zhang
- School of Software, Xinjiang University, Urumqi, 830046, Xinjiang, China
| | - Chen Chen
- School of Software, Xinjiang University, Urumqi, 830046, Xinjiang, China
| | - Cheng Chen
- School of Software, Xinjiang University, Urumqi, 830046, Xinjiang, China.
| |
|
16
|
Wang H, Zhang S, Pan Q, Guo J, Li N, Chen L, Xu J, Zhou J, Gu Y, Wang X, Zhang G, Lian Y, Zhang W, Lin N, Jin Z, Zang Y, Lan W, Cheng X, Tan M, Chen FX, Jiang J, Liu Q, Zheng M, Qin J. Targeting the histone reader ZMYND8 inhibits antiandrogen-induced neuroendocrine tumor transdifferentiation of prostate cancer. Nat Cancer 2025; 6:629-646. [PMID: 40102673 DOI: 10.1038/s43018-025-00928-z] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 02/01/2024] [Accepted: 02/10/2025] [Indexed: 03/20/2025]
Abstract
The transdifferentiation from adenocarcinoma to neuroendocrine prostate cancer (NEPC) in men confers antiandrogen therapy resistance. Here our analysis combining CRISPR‒Cas9 screening with single-cell RNA sequencing tracking of tumor transition demonstrated that antiandrogen-induced zinc finger MYND-type containing 8 (ZMYND8)-dependent epigenetic programming orchestrates NEPC transdifferentiation. Ablation of Zmynd8 prevents NEPC development, while ZMYND8 upregulation mediated by achaete-scute homolog 1 promotes NEPC differentiation. We show that forkhead box protein M1 (FOXM1) stabilizes ZMYND8 binding to chromatin regions characterized by H3K4me1-H3K14ac modification and FOXM1 targeting. Antiandrogen therapy releases the SWI/SNF chromatin remodeling complex from the androgen receptor, facilitating its interaction with ZMYND8-FOXM1 to upregulate critical neuroendocrine lineage regulators. We develop iZMYND8-34, a small molecule designed to inhibit ZMYND8's histone recognition, which effectively blocks NEPC development. These findings reveal the critical role of ZMYND8-dependent epigenetic programming induced by androgen deprivation therapy in orchestrating lineage fate. Targeting ZMYND8 emerges as a promising strategy for impeding NEPC development.
Affiliation(s)
- Hanling Wang
- CAS Key Laboratory of Tissue Microenvironment and Tumor, Shanghai Institute of Nutrition and Health, Chinese Academy of Sciences, Shanghai, China
| | - Sulin Zhang
- Drug Discovery and Design Center, State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, University of Chinese Academy of Sciences, Shanghai, China
| | - Qiang Pan
- CAS Key Laboratory of Tissue Microenvironment and Tumor, Shanghai Institute of Nutrition and Health, Chinese Academy of Sciences, Shanghai, China
- Jinfeng Laboratory, Chongqing, China
| | - Jiacheng Guo
- CAS Key Laboratory of Tissue Microenvironment and Tumor, Shanghai Institute of Nutrition and Health, Chinese Academy of Sciences, Shanghai, China
| | - Ni Li
- CAS Key Laboratory of Tissue Microenvironment and Tumor, Shanghai Institute of Nutrition and Health, Chinese Academy of Sciences, Shanghai, China
- Jinfeng Laboratory, Chongqing, China
| | - Lifan Chen
- Drug Discovery and Design Center, State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, University of Chinese Academy of Sciences, Shanghai, China
| | - Junyu Xu
- Drug Discovery and Design Center, State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, University of Chinese Academy of Sciences, Shanghai, China
| | - Jingyi Zhou
- Drug Discovery and Design Center, State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, University of Chinese Academy of Sciences, Shanghai, China
| | - Yongqiang Gu
- CAS Key Laboratory of Tissue Microenvironment and Tumor, Shanghai Institute of Nutrition and Health, Chinese Academy of Sciences, Shanghai, China
| | - Xuege Wang
- CAS Key Laboratory of Tissue Microenvironment and Tumor, Shanghai Institute of Nutrition and Health, Chinese Academy of Sciences, Shanghai, China
| | - Guoying Zhang
- CAS Key Laboratory of Tissue Microenvironment and Tumor, Shanghai Institute of Nutrition and Health, Chinese Academy of Sciences, Shanghai, China
| | - Yannan Lian
- CAS Key Laboratory of Tissue Microenvironment and Tumor, Shanghai Institute of Nutrition and Health, Chinese Academy of Sciences, Shanghai, China
| | - Wei Zhang
- CAS Key Laboratory of Tissue Microenvironment and Tumor, Shanghai Institute of Nutrition and Health, Chinese Academy of Sciences, Shanghai, China
| | - Naiheng Lin
- CAS Key Laboratory of Tissue Microenvironment and Tumor, Shanghai Institute of Nutrition and Health, Chinese Academy of Sciences, Shanghai, China
| | - Zige Jin
- CAS Key Laboratory of Tissue Microenvironment and Tumor, Shanghai Institute of Nutrition and Health, Chinese Academy of Sciences, Shanghai, China
| | - Yi Zang
- CAS Key Laboratory of Tissue Microenvironment and Tumor, Shanghai Institute of Nutrition and Health, Chinese Academy of Sciences, Shanghai, China
| | - Weihua Lan
- Department of Urology, Daping Hospital, Army Medical University, Chongqing, China
| | | | - Minjia Tan
- Drug Discovery and Design Center, State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, University of Chinese Academy of Sciences, Shanghai, China
| | - Fei Xavier Chen
- Fudan University Shanghai Cancer Center, Shanghai Key Laboratory of Medical Epigenetics, International Co-laboratory of Medical Epigenetics and Metabolism (Ministry of Science and Technology), Institutes of Biomedical Sciences, Fudan University, Shanghai, China
| | - Jun Jiang
- Department of Urology, Daping Hospital, Army Medical University, Chongqing, China
| | - Qiuli Liu
- Department of Urology, Daping Hospital, Army Medical University, Chongqing, China.
| | - Mingyue Zheng
- Drug Discovery and Design Center, State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, University of Chinese Academy of Sciences, Shanghai, China.
| | - Jun Qin
- CAS Key Laboratory of Tissue Microenvironment and Tumor, Shanghai Institute of Nutrition and Health, Chinese Academy of Sciences, Shanghai, China.
- Jinfeng Laboratory, Chongqing, China.
| |
|
17
|
Ünlü A, Ulusoy E, Yiğit MG, Darcan M, Doğan T. Protein language models for predicting drug-target interactions: Novel approaches, emerging methods, and future directions. Curr Opin Struct Biol 2025; 91:103017. [PMID: 39985946 DOI: 10.1016/j.sbi.2025.103017] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/23/2024] [Revised: 01/28/2025] [Accepted: 01/29/2025] [Indexed: 02/24/2025]
Abstract
Identifying new drug candidates remains a critical and complex challenge in drug development. Recent advances in deep learning have demonstrated significant potential to accelerate this process, particularly through the use of protein language models (pLMs). These models aim to effectively capture the structural and functional properties of proteins by embedding them in high-dimensional spaces, thereby providing powerful tools for predictive tasks. This review examines the application of pLMs in drug-target interaction (DTI) prediction, addressing both small-molecule and protein-based therapeutics. We explore diverse methodologies, including end-to-end learning models and those that leverage pre-trained foundational pLMs. Furthermore, we highlight the role of heterogeneous data integration, ranging from protein structures to knowledge graphs, to improve the accuracy of DTI predictions. Despite notable progress, challenges persist in accurately identifying DTIs, mainly due to data-related limitations and algorithmic constraints. Future research directions include utilising multimodal learning approaches, incorporating temporal/dynamic interaction data into training, and employing novel deep learning architectures to refine protein representations, gain a deeper understanding of biological context regarding molecular interactions, and, thus, advance the DTI prediction field.
Affiliation(s)
- Atabey Ünlü
- Biological Data Science Lab, Dept. of Computer Engineering, Hacettepe University, 06800, Ankara, Türkiye; Dept. of Bioinformatics, Graduate School of Health Sciences, Hacettepe University, 06800, Ankara, Türkiye
| | - Erva Ulusoy
- Biological Data Science Lab, Dept. of Computer Engineering, Hacettepe University, 06800, Ankara, Türkiye; Dept. of Bioinformatics, Graduate School of Health Sciences, Hacettepe University, 06800, Ankara, Türkiye
| | - Melih Gökay Yiğit
- Biological Data Science Lab, Dept. of Computer Engineering, Hacettepe University, 06800, Ankara, Türkiye; Dept. of Computer Engineering, Middle East Technical University, 06800, Ankara, Türkiye
| | - Melih Darcan
- Biological Data Science Lab, Dept. of Computer Engineering, Hacettepe University, 06800, Ankara, Türkiye
| | - Tunca Doğan
- Biological Data Science Lab, Dept. of Computer Engineering, Hacettepe University, 06800, Ankara, Türkiye; Dept. of Bioinformatics, Graduate School of Health Sciences, Hacettepe University, 06800, Ankara, Türkiye; Dept. of Health Informatics, Institute of Informatics, Hacettepe University, 06800, Ankara, Türkiye.
| |
|
18
|
Ye Q, Zeng Y, Jiang L, Kang Y, Pan P, Chen J, Deng Y, Zhao H, He S, Hou T, Hsieh C. A Knowledge-Guided Graph Learning Approach Bridging Phenotype- and Target-Based Drug Discovery. Adv Sci (Weinh) 2025; 12:e2412402. [PMID: 40047372 PMCID: PMC12021103 DOI: 10.1002/advs.202412402] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/05/2024] [Revised: 01/24/2025] [Indexed: 04/26/2025]
Abstract
Discovering therapeutic molecules requires the integration of both phenotype-based drug discovery (PDD) and target-based drug discovery (TDD). However, this integration remains challenging due to the inherent heterogeneity, noise, and bias present in biomedical data. In this study, Knowledge-Guided Drug Relational Predictor (KGDRP), a graph representation learning approach, is developed that effectively integrates multimodal biomedical data, including network data containing biological system information, gene expression data, and sequence data that incorporates chemical molecular structures, all within a heterogeneous graph (HG) structure. By incorporating biomedical HG (BioHG) into a heterogeneous graph neural network (HGNN)-based architecture, KGDRP exhibits a remarkable 12% improvement compared to previous methods in real-world screening scenarios. Notably, the biology-informed representation derived from KGDRP significantly enhances target prioritization by 26% in drug target discovery. Furthermore, zero-shot evaluation on COVID-19 exhibited a notably higher success rate in identifying diverse potential drugs. The utilization of BioHG facilitates a unique KGDRP-based analysis of cell-target-drug interactions, thereby enabling the elucidation of drug mechanisms. Overall, KGDRP provides a robust infrastructure for the seamless integration of multimodal data and biomedical networks, effectively accelerating PDD, guiding therapeutic target discovery, and ultimately expediting therapeutic molecule discovery.
Affiliation(s)
- Qing Ye
- College of Control Science and Engineering, Zhejiang University, Hangzhou, Zhejiang 310027, China
- College of Pharmaceutical Sciences, Zhejiang University, Hangzhou, Zhejiang 310058, China
| | - Yundian Zeng
- College of Control Science and Engineering, Zhejiang University, Hangzhou, Zhejiang 310027, China
- College of Pharmaceutical Sciences, Zhejiang University, Hangzhou, Zhejiang 310058, China
| | - Linlong Jiang
- College of Pharmaceutical Sciences, Zhejiang University, Hangzhou, Zhejiang 310058, China
| | - Yu Kang
- College of Pharmaceutical Sciences, Zhejiang University, Hangzhou, Zhejiang 310058, China
| | - Peichen Pan
- College of Pharmaceutical Sciences, Zhejiang University, Hangzhou, Zhejiang 310058, China
| | - Jiming Chen
- College of Control Science and Engineering, Zhejiang University, Hangzhou, Zhejiang 310027, China
| | - Yafeng Deng
- CarbonSilicon AI Technology Co., Ltd, Hangzhou, Zhejiang 310018, China
| | - Haitao Zhao
- Center for Intelligent and Biomimetic Systems, Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences, Shenzhen, Guangdong 440305, China
| | - Shibo He
- College of Control Science and Engineering, Zhejiang University, Hangzhou, Zhejiang 310027, China
| | - Tingjun Hou
- College of Pharmaceutical Sciences, Zhejiang University, Hangzhou, Zhejiang 310058, China
| | - Chang-Yu Hsieh
- College of Pharmaceutical Sciences, Zhejiang University, Hangzhou, Zhejiang 310058, China
| |
|
19
|
Chen M, Gong X, Pan S, Wu J, Lin F, Du B, Hu W. Unified Knowledge-Guided Molecular Graph Encoder with multimodal fusion and multi-task learning. Neural Netw 2025; 184:107068. [PMID: 39732065 DOI: 10.1016/j.neunet.2024.107068] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/09/2024] [Revised: 12/02/2024] [Accepted: 12/17/2024] [Indexed: 12/30/2024]
Abstract
The remarkable success of Graph Neural Networks underscores their formidable capacity to assimilate multimodal inputs, markedly enhancing performance across a broad spectrum of domains. In the context of molecular modeling, considerable efforts have been made to enrich molecular representations by integrating data from diverse aspects. Nevertheless, current methodologies frequently compartmentalize geometric and semantic components, resulting in a fragmented approach that impairs the holistic integration of molecular attributes. This constrained scope limits the generalizability and efficacy of such models in downstream applications. A pivotal challenge lies in harmonizing heterogeneous data sources, particularly in addressing the inherent inconsistencies and sparsity within multimodal molecular datasets. To overcome these limitations, we present the Unified Knowledge-Guided Molecular Graph Encoder (UKGE), a groundbreaking framework that leverages heterogeneous graphs to unify the representation of diverse molecular modalities. Unlike prior methods, UKGE reconciles geometric and semantic features through the use of elemental knowledge graphs (KGs) and meta-path definitions by constructing Unified Molecular Graphs, enabling comprehensive and unified molecular representations. It employs an innovative Meta-Path Aware Message Passing mechanism within its molecular encoder, enhancing the integration of multimodal data. Additionally, a multi-task learning strategy balances data from different modalities, further enriching UKGE's capability to embed complex biological insights. Empirical evaluations highlight UKGE's excellence across tasks: DDI prediction achieves 96.91% ACC and 99.14% AUC in warm-start settings, with 83.15% ACC in cold-start scenarios. For CPI prediction, it reaches 0.644 CI on Davis and 0.659 on KIBA. In LBDD, it achieves 99.3% validity, 98.4% uniqueness, and 98.9% novelty, establishing UKGE as a state-of-the-art molecular modeling framework.
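The abstract above quotes validity, uniqueness, and novelty without defining them. The following is a minimal sketch of one common convention, not the UKGE authors' evaluation code. It assumes molecules are compared as canonical strings and that validity is decided by a caller-supplied check; the function name `generation_metrics` and the toy validity test are hypothetical.

```python
from typing import Callable, Iterable


def generation_metrics(
    generated: Iterable[str],
    training_set: Iterable[str],
    is_valid: Callable[[str], bool],
) -> dict[str, float]:
    """Compute validity, uniqueness and novelty under one common convention.

    validity   = valid molecules / all generated molecules
    uniqueness = distinct valid molecules / valid molecules
    novelty    = distinct valid molecules absent from training / distinct valid
    """
    generated = list(generated)
    valid = [m for m in generated if is_valid(m)]
    unique = set(valid)
    novel = unique - set(training_set)
    return {
        "validity": len(valid) / len(generated) if generated else 0.0,
        "uniqueness": len(unique) / len(valid) if valid else 0.0,
        "novelty": len(novel) / len(unique) if unique else 0.0,
    }


# Toy usage: strings stand in for canonical SMILES. The validity check is a
# placeholder assumption (balanced parentheses only); a real pipeline would
# call a cheminformatics parser instead.
metrics = generation_metrics(
    generated=["CCO", "CCO", "CCN", "C(("],
    training_set=["CCO"],
    is_valid=lambda s: s.count("(") == s.count(")"),
)
print(metrics)
```

Papers differ on the denominators (for example, uniqueness over all generated molecules rather than over valid ones), so reported percentages are only comparable when the convention is stated.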
Affiliation(s)
- Mukun Chen
- School of Computer Science, Wuhan University, Luojiashan Road, Wuchang District., Wuhan, 430072, Hubei Province, China.
| | - Xiuwen Gong
- University of Technology Sydney, 15 Broadway Ultimo, NSW 2007, Sydney, 2007, Australia.
| | - Shirui Pan
- School of Information and Communication Technology, Griffith University, 170 Kessels Road, Nathan Qld 4111, Queensland, 4111, Australia.
| | - Jia Wu
- School of Computing, Macquarie University, Balaclava Rd, Macquarie Park NSW 2109, Sydney, 2109, Australia.
| | - Fu Lin
- School of Computer Science, Wuhan University, Luojiashan Road, Wuchang District., Wuhan, 430072, Hubei Province, China.
| | - Bo Du
- School of Computer Science, Wuhan University, Luojiashan Road, Wuchang District., Wuhan, 430072, Hubei Province, China.
| | - Wenbin Hu
- School of Computer Science, Wuhan University, Luojiashan Road, Wuchang District., Wuhan, 430072, Hubei Province, China; Hubei Key Laboratory of Digital Finance Innovation, Hubei University of Economics, No. 8, Yangqiaohu Avenue, Zanglong Island Development Zone, Jiangxia District, Wuhan, 2007, Hubei Province, China.
| |
|
20
|
Hou Z, Xu Z, Yan C, Luo H, Luo J. CPI-GGS: A deep learning model for predicting compound-protein interaction based on graphs and sequences. Comput Biol Chem 2025; 115:108326. [PMID: 39752853 DOI: 10.1016/j.compbiolchem.2024.108326] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/05/2024] [Revised: 12/17/2024] [Accepted: 12/24/2024] [Indexed: 02/26/2025]
Abstract
BACKGROUND: Compound-protein interaction (CPI) is essential to drug discovery and design, where traditional methods are often costly and have low success rates. Recently, the integration of machine learning and deep learning in CPI research has shown potential to reduce costs and enhance discovery efficiency by improving protein target identification accuracy. Additionally, with an urgent need for novel therapies against complex diseases, CPI investigation could lead to the identification of effective new drugs. Since drug-target interactions involve complex biological processes, refined models are necessary for precise feature extraction and analysis. Nevertheless, current CPI prediction methods still face significant limitations: predictions lack sufficient accuracy, models require improved generalization ability, and further validation across diverse datasets remains essential. RESULTS: To address some of these issues, this paper proposes a combined deep learning method, CPI-GGS, for predicting and analyzing compound-protein interactions. The source code is available on GitHub at https://github.com/xingjie321/CPI-GGS. CONCLUSIONS: The experimental results demonstrate improved accuracy in predicting compound-protein interactions and enhance the understanding of how compounds and proteins interact, providing a valuable new tool for drug discovery and development.
Affiliation(s)
- Zhanwei Hou
- School of Software, Henan Polytechnic University, Jiaozuo 454003, China
| | - Zhenhan Xu
- School of Software, Henan Polytechnic University, Jiaozuo 454003, China
| | - Chaokun Yan
- School of Computer and Information Engineering, Henan University, Kaifeng 475001, China
| | - Huimin Luo
- School of Computer and Information Engineering, Henan University, Kaifeng 475001, China
| | - Junwei Luo
- School of Software, Henan Polytechnic University, Jiaozuo 454003, China.
| |
|
21
|
Jing Y, Zhang D, Li L. H2GnnDTI: hierarchical heterogeneous graph neural networks for drug-target interaction prediction. Bioinformatics 2025; 41:btaf117. [PMID: 40097269 PMCID: PMC11954568 DOI: 10.1093/bioinformatics/btaf117] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/24/2024] [Revised: 02/10/2025] [Accepted: 03/13/2025] [Indexed: 03/19/2025] Open
Abstract
MOTIVATION Identifying drug-target interactions (DTIs) is a crucial step in drug repurposing and drug discovery. Growing demand and the high cost of experimentally identifying DTIs necessitate computational tools for automated prediction and comprehension of DTIs. Despite recent advancements, current methods fail to fully leverage the hierarchical information in DTIs. RESULTS Here, we introduce H2GnnDTI, a novel two-level hierarchical heterogeneous graph learning model to predict DTIs, by integrating the structures of drugs and proteins via a low-level view GNN and a high-level view GNN. The hierarchical graph consists of high-level heterogeneous nodes representing drugs and proteins, connected by edges representing known DTIs. Each drug or protein node is further detailed in a low-level graph, where nodes represent molecules within each drug or amino acids within each protein, accompanied by their respective chemical descriptors. Two distinct low-level graph neural networks are first deployed to capture structural and chemical features specific to drugs and proteins from these low-level graphs. Subsequently, a high-level graph encoder (GE) is used to comprehensively capture and merge interactive features pertaining to drugs and proteins from the high-level graph. The high-level encoder incorporates a structure and attribute information fusion module designed to explicitly integrate representations acquired from both a feature encoder and a GE, facilitating consensus representation learning. Extensive experiments conducted on three benchmark datasets have shown that our proposed H2GnnDTI model consistently outperforms state-of-the-art deep learning methods. AVAILABILITY AND IMPLEMENTATION The codes are freely available at https://github.com/LiminLi-xjtu/H2GnnDTI.
Affiliation(s)
- Yueying Jing
  - School of Mathematics and Statistics, Xi'an Jiaotong University, Xi'an, Shaanxi 710049, China
- Dongxue Zhang
  - School of Mathematics and Statistics, Xi'an Jiaotong University, Xi'an, Shaanxi 710049, China
- Limin Li
  - School of Mathematics and Statistics, Xi'an Jiaotong University, Xi'an, Shaanxi 710049, China
|
22
|
Zhong F, Yue R, Chen J, Wang D, Ma S, Chen S. Folding-Based End-To-End Chemical Drug Design with Uncertainty Estimation: Tackling Hallucination in the Post-GPT Era. J Med Chem 2025; 68:6804-6814. [PMID: 40056132 DOI: 10.1021/acs.jmedchem.5c00271] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 03/10/2025]
Abstract
In the post-GPT era, Llama-Gram represents a promising advancement in AI-driven chemical drug discovery, grounded in the chemical principle that molecular structure determines properties. This folding-based end-to-end framework seeks to address the hallucination issues of traditional large language models by integrating protein folding embeddings, graph-based molecular representations, and uncertainty estimation to better capture the structural complexities of protein-ligand interactions. By leveraging the frozen-gradient ESMFold model and a Graph Transformer variant, Llama-Gram aims to enhance predictive accuracy and reliability through grouped-query attention and a Gram layer inspired by support points theory. By incorporating protein folding information, the model demonstrates competitive performance against state-of-the-art approaches such as Transformer CPI 2.0 and Graph-DTA, offering improvements in compound-target interaction. Llama-Gram provides a scalable and innovative chemical theory that could contribute to accelerating the chemical drug discovery process.
Affiliation(s)
- Feisheng Zhong
  - Fujian Key Laboratory of Drug Target Discovery and Structural and Functional Research, School of Pharmacy, Fujian Medical University, Fuzhou 350122, China
- Rongcai Yue
  - Fujian Key Laboratory of Drug Target Discovery and Structural and Functional Research, School of Pharmacy, Fujian Medical University, Fuzhou 350122, China
- Jinxing Chen
  - Fujian Key Laboratory of Drug Target Discovery and Structural and Functional Research, School of Pharmacy, Fujian Medical University, Fuzhou 350122, China
  - The Graduate School of Fujian Medical University, Fujian Medical University, Fuzhou 350122, China
- Dingyan Wang
  - Lingang Laboratory, Shanghai 200031, China
  - Shanghai Center for Innovative Drug Discovery and Development, Shanghai 201306, China
- Shaojie Ma
  - Fujian Key Laboratory of Drug Target Discovery and Structural and Functional Research, School of Pharmacy, Fujian Medical University, Fuzhou 350122, China
  - Jiangsu Key Laboratory of Marine Pharmaceutical Compound Screening, College of Pharmacy, Jiangsu Ocean University, Lianyungang 222005, China
- Shiming Chen
  - Fujian Key Laboratory of Drug Target Discovery and Structural and Functional Research, School of Pharmacy, Fujian Medical University, Fuzhou 350122, China
  - The Graduate School of Fujian Medical University, Fujian Medical University, Fuzhou 350122, China
|
23
|
Gao X, Yan M, Zhang C, Wu G, Shang J, Zhang C, Yang K. MDNN-DTA: a multimodal deep neural network for drug-target affinity prediction. Front Genet 2025; 16:1527300. [PMID: 40182923 PMCID: PMC11965683 DOI: 10.3389/fgene.2025.1527300] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/13/2024] [Accepted: 02/24/2025] [Indexed: 04/05/2025] Open
Abstract
Determining drug-target affinity (DTA) is a pivotal step in drug discovery, where in silico methods can significantly improve efficiency and reduce costs. Artificial intelligence (AI), especially deep learning models, can automatically extract high-dimensional features from the biological sequences of drug molecules and target proteins. This technology demonstrates lower complexity in DTA prediction compared to traditional experimental methods, particularly when handling large-scale data. In this study, we introduce a multimodal deep neural network model for DTA prediction, referred to as MDNN-DTA. This model employs Graph Convolutional Networks (GCN) and Convolutional Neural Networks (CNN) to extract features from the drug and protein sequences, respectively. One notable strength of our method is its ability to accurately predict DTA directly from the sequences of the target proteins, obviating the need for protein 3D structures, which are frequently unavailable in drug discovery. To comprehensively extract features from the protein sequence, we leverage an ESM pre-trained model for extracting biochemical features and design a specific Protein Feature Extraction (PFE) block for capturing both global and local features of the protein sequence. Furthermore, a Protein Feature Fusion (PFF) Block is engineered to augment the integration of multi-scale protein features derived from the abovementioned techniques. We then compare MDNN-DTA with other models on the same dataset, conducting a series of ablation experiments to assess the performance and efficacy of each component. The results highlight the advantages and effectiveness of the MDNN-DTA method.
Affiliation(s)
- Xu Gao
  - School of Computer and Artificial Intelligence, Zhengzhou University, Zhengzhou, China
  - National Supercomputing Center in Zhengzhou, Zhengzhou, China
- Mengfan Yan
  - School of Computer and Artificial Intelligence, Zhengzhou University, Zhengzhou, China
  - National Supercomputing Center in Zhengzhou, Zhengzhou, China
- Chengwei Zhang
  - School of Computer and Artificial Intelligence, Zhengzhou University, Zhengzhou, China
  - National Supercomputing Center in Zhengzhou, Zhengzhou, China
- Gang Wu
  - School of Computer and Artificial Intelligence, Zhengzhou University, Zhengzhou, China
  - National Supercomputing Center in Zhengzhou, Zhengzhou, China
- Jiandong Shang
  - School of Computer and Artificial Intelligence, Zhengzhou University, Zhengzhou, China
  - National Supercomputing Center in Zhengzhou, Zhengzhou, China
- Congxiang Zhang
  - School of Computer and Artificial Intelligence, Zhengzhou University, Zhengzhou, China
  - National Supercomputing Center in Zhengzhou, Zhengzhou, China
- Kecheng Yang
  - School of Computer and Artificial Intelligence, Zhengzhou University, Zhengzhou, China
  - National Supercomputing Center in Zhengzhou, Zhengzhou, China
|
24
|
Lu Z, Song G, Zhu H, Lei C, Sun X, Wang K, Qin L, Chen Y, Tang J, Li M. DTIAM: a unified framework for predicting drug-target interactions, binding affinities and drug mechanisms. Nat Commun 2025; 16:2548. [PMID: 40089473 PMCID: PMC11910601 DOI: 10.1038/s41467-025-57828-0] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/18/2023] [Accepted: 02/26/2025] [Indexed: 03/17/2025] Open
Abstract
Accurate and robust prediction of drug-target interactions (DTIs) plays a vital role in drug discovery but remains challenging due to limited labeled data, cold start problems, and insufficient understanding of mechanisms of action (MoA). Distinguishing activation and inhibition mechanisms is particularly critical in clinical applications. Here, we propose DTIAM, a unified framework for predicting interactions, binding affinities, and activation/inhibition mechanisms between drugs and targets. DTIAM learns drug and target representations from large amounts of label-free data through self-supervised pre-training, which accurately extracts their substructure and contextual information, and thus benefits the downstream prediction based on these representations. DTIAM achieves substantial performance improvement over other state-of-the-art methods in all tasks, particularly in the cold start scenario. Moreover, independent validation demonstrates the strong generalization ability of DTIAM. All these results suggest that DTIAM can provide a practically useful tool for predicting novel DTIs and further distinguishing the MoA of candidate drugs.
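The cold start scenario mentioned in this abstract is usually evaluated by holding out whole drugs (or targets) so that test entities never appear during training. The following is a minimal sketch of a drug-level cold-start split under that assumption; the function name `cold_drug_split`, the toy pairs, and the 20% test fraction are hypothetical illustrations and are not taken from the DTIAM paper.

```python
import random


def cold_drug_split(pairs, test_fraction=0.2, seed=0):
    """Split (drug, target, label) triples so test drugs never occur in training."""
    # Collect the distinct drugs and shuffle them reproducibly.
    drugs = sorted({drug for drug, _, _ in pairs})
    rng = random.Random(seed)
    rng.shuffle(drugs)

    # Hold out a fraction of the drugs (at least one) as "cold" test drugs.
    n_test = max(1, int(len(drugs) * test_fraction))
    test_drugs = set(drugs[:n_test])

    train = [p for p in pairs if p[0] not in test_drugs]
    test = [p for p in pairs if p[0] in test_drugs]
    return train, test


# Hypothetical toy data: (drug id, target id, interaction label).
pairs = [
    ("D1", "T1", 1), ("D1", "T2", 0),
    ("D2", "T1", 0), ("D2", "T3", 1),
    ("D3", "T2", 1), ("D4", "T3", 0),
    ("D5", "T1", 1),
]
train, test = cold_drug_split(pairs)
```

Splitting by entity rather than by pair is what makes the evaluation "cold": a random pair-level split would let the same drug appear on both sides and could overstate generalization.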
Affiliation(s)
- Zhangli Lu
  - School of Computer Science and Engineering, Central South University, Changsha, 410083, China
- Guoqiang Song
  - School of Health Sciences and Biomedical Engineering, Hebei University of Technology, Tianjin, 300401, China
- Huimin Zhu
  - School of Computer Science and Engineering, Central South University, Changsha, 410083, China
- Chuqi Lei
  - School of Computer Science and Engineering, Central South University, Changsha, 410083, China
- Xinliang Sun
  - School of Computer Science and Engineering, Central South University, Changsha, 410083, China
- Kaili Wang
  - School of Computer Science and Engineering, Central South University, Changsha, 410083, China
- Libo Qin
  - School of Computer Science and Engineering, Central South University, Changsha, 410083, China
- Yafei Chen
  - School of Health Sciences and Biomedical Engineering, Hebei University of Technology, Tianjin, 300401, China
- Jing Tang
  - Research Program in Systems Oncology, Faculty of Medicine, University of Helsinki, Helsinki, 00290, Finland
- Min Li
  - School of Computer Science and Engineering, Central South University, Changsha, 410083, China
  - Xiangjiang Laboratory, Changsha, 410205, China
  - Furong Laboratory, Central South University, Changsha, 410013, China
|
25
|
Quan L, Wu J, Jiang Y, Pan D, Qiang L. DTA-GTOmega: Enhancing Drug-Target Binding Affinity Prediction with Graph Transformers Using OmegaFold Protein Structures. J Mol Biol 2025; 437:168843. [PMID: 39481634 DOI: 10.1016/j.jmb.2024.168843] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/14/2024] [Revised: 10/05/2024] [Accepted: 10/24/2024] [Indexed: 11/02/2024]
Abstract
Understanding drug-protein interactions is crucial for elucidating drug mechanisms and optimizing drug development. However, existing methods have limitations in representing the three-dimensional structure of targets and capturing the complex relationships between drugs and targets. This study proposes a new method, DTA-GTOmega, for predicting drug-target binding affinity. DTA-GTOmega utilizes OmegaFold to predict protein three-dimensional structures and construct target graphs, while processing drug SMILES sequences with RDKit to generate drug graphs. By employing multi-layer graph transformer modules and co-attention modules, this method effectively integrates atomic-level features of drugs and residue-level features of targets, accurately modeling the complex interactions between drugs and targets, thereby significantly improving the accuracy of binding affinity predictions. Our method outperforms existing techniques on benchmark datasets such as KIBA, Davis, and BindingDB_Kd under the cold-start setting. Moreover, DTA-GTOmega demonstrates competitive performance in real-world DTI scenarios involving DrugBank data and drug-target interactions related to cardiovascular and nervous system diseases, highlighting its robust generalization capabilities. Additionally, the introduced DTI evaluation metrics further validate DTA-GTOmega's potential in handling imbalanced data.
Affiliation(s)
- Lijun Quan
  - School of Computer Science and Technology, Soochow University, Jiangsu 215006, China
  - Collaborative Innovation Center of Novel Software Technology and Industrialization, Jiangsu 210000, China
- Jian Wu
  - China Mobile (Suzhou) Software Technology Co., Ltd., Suzhou 215000, China
- Yelu Jiang
  - School of Computer Science and Technology, Soochow University, Jiangsu 215006, China
- Deng Pan
  - School of Computer Science and Technology, Soochow University, Jiangsu 215006, China
- Lyu Qiang
  - School of Computer Science and Technology, Soochow University, Jiangsu 215006, China
  - Collaborative Innovation Center of Novel Software Technology and Industrialization, Jiangsu 210000, China
|
26
|
Li W, Li X, Wang M, Liu F, Luo Y, Guo R, Pan Q. Protein-ligand interaction prediction based on heterogeneity maps and data enhancement. J Biomol Struct Dyn 2025:1-13. [PMID: 40072484 DOI: 10.1080/07391102.2025.2475229] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/04/2024] [Accepted: 02/20/2025] [Indexed: 03/14/2025]
Abstract
Prediction of protein-ligand interactions is critical for drug discovery and repositioning. Traditional prediction methods are computationally intensive and limited in modeling structural changes. In contrast, data-driven deep learning methods significantly reduce computational costs and offer a more efficient approach for drug discovery. However, existing models often fail to fully exploit metadata and low-frequency features, leading to suboptimal performance on sparse, imbalanced datasets. To address these challenges, this paper proposes a novel interaction prediction model based on heterogeneous graphs and data enhancement, named Heterogeneous Graph Enhanced Fusion Network (HGEF-Net). The model utilizes a heterogeneous information learning module, which deeply analyzes molecular subgraphs and substructures, fully leveraging metadata features to better capture the biological interactions between ligands and proteins. Additionally, to address the issue of low-frequency category features, a data enhancement strategy based on multi-level contrastive learning is proposed. Furthermore, a heterogeneous attention integration framework is presented, which uses multi-level attention to assign different weights to various features. This approach efficiently fuses both intramolecular and intermolecular features, enhancing the model's ability to capture key information and improving its performance on sparse, imbalanced datasets. Experimental results show that HGEF-Net outperforms other state-of-the-art models. On the BindingDB dataset (1:100 positive-to-negative ratio), HGEF-Net achieves an AUC of 0.826, AUPRC of 0.811, Precision of 0.715, and Recall of 0.709. On the Davis dataset (1:10 ratio), the data enhancement module improves AUC, AUPRC, Precision, and Recall by 11.7%, 9.7%, 10.5%, and 16.3%, respectively, validating the model's effectiveness.
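Precision and recall, two of the metrics reported in this abstract, matter on imbalanced data because plain accuracy can look high even when most positives are missed. The sketch below computes both from binary labels; the toy labels, the 1:100 class ratio, and the hypothetical classifier output are illustrative assumptions and do not reproduce any HGEF-Net result.

```python
def precision_recall(y_true, y_pred):
    """Return (precision, recall) for binary labels where 1 is the positive class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


# Toy set with a 1:100 positive-to-negative ratio (10 positives, 1000 negatives).
y_true = [1] * 10 + [0] * 1000
# Hypothetical classifier: recovers 7 of 10 positives and raises 3 false alarms.
y_pred = [1] * 7 + [0] * 3 + [1] * 3 + [0] * 997

precision, recall = precision_recall(y_true, y_pred)
accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
```

In this toy case accuracy exceeds 0.99 while precision and recall are both 0.7, which is why imbalance-aware metrics are reported alongside AUC.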
Affiliation(s)
- Weimin Li
  - School of Computer Engineering and Science, Shanghai University, Shanghai, China
- Xiaoyang Li
  - School of Computer Engineering and Science, Shanghai University, Shanghai, China
- Mengying Wang
  - School of Computer Engineering and Science, Shanghai University, Shanghai, China
- Fangfang Liu
  - School of Computer Engineering and Science, Shanghai University, Shanghai, China
- Yin Luo
  - School of Life Sciences, East China Normal University, Shanghai, China
- Ruiqiang Guo
  - College of Computer and Cyber Security, Hebei Normal University, Shijiazhuang, Hebei, China
- Quanke Pan
  - School of Mechatronic Engineering and Automation, Shanghai University, Shanghai, China
|
27
|
Michels J, Bandarupalli R, Ahangar Akbari A, Le T, Xiao H, Li J, Hom EFY. Natural Language Processing Methods for the Study of Protein-Ligand Interactions. J Chem Inf Model 2025; 65:2191-2213. [PMID: 39993834 PMCID: PMC11898065 DOI: 10.1021/acs.jcim.4c01907] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/22/2024] [Revised: 02/05/2025] [Accepted: 02/06/2025] [Indexed: 02/26/2025]
Abstract
Natural Language Processing (NLP) has revolutionized the way computers are used to study and interact with human languages and is increasingly influential in the study of protein and ligand binding, which is critical for drug discovery and development. This review examines how NLP techniques have been adapted to decode the "language" of proteins and small molecule ligands to predict protein-ligand interactions (PLIs). We discuss how methods such as long short-term memory (LSTM) networks, transformers, and attention mechanisms can leverage different protein and ligand data types to identify potential interaction patterns. Significant challenges are highlighted including the scarcity of high-quality negative data, difficulties in interpreting model decisions, and sampling biases in existing data sets. We argue that focusing on improving data quality, enhancing model robustness, and fostering both collaboration and competition could catalyze future advances in machine-learning-based predictions of PLIs.
Affiliation(s)
- James Michels
  - Department of Computer and Information Science, University of Mississippi, University, Mississippi 38677, United States
- Ramya Bandarupalli
  - Department of BioMolecular Sciences, School of Pharmacy, University of Mississippi, University, Mississippi 38677, United States
- Amin Ahangar Akbari
  - Department of BioMolecular Sciences, School of Pharmacy, University of Mississippi, University, Mississippi 38677, United States
- Thai Le
  - Department of Computer Science, Indiana University, Bloomington, Indiana 47408, United States
- Hong Xiao
  - Department of Computer and Information Science and Institute for Data Science, University of Mississippi, University, Mississippi 38677, United States
- Jing Li
  - Department of BioMolecular Sciences, School of Pharmacy, University of Mississippi, University, Mississippi 38677, United States
- Erik F. Y. Hom
  - Department of Biology and Center for Biodiversity and Conservation Research, University of Mississippi, University, Mississippi 38677, United States
|
28
|
Peng L, Liu X, Yang L, Liu L, Bai Z, Chen M, Lu X, Nie L. BINDTI: A Bi-Directional Intention Network for Drug-Target Interaction Identification Based on Attention Mechanisms. IEEE J Biomed Health Inform 2025; 29:1602-1612. [PMID: 38457318 DOI: 10.1109/jbhi.2024.3375025] [Citation(s) in RCA: 16] [Impact Index Per Article: 16.0] [Indexed: 03/10/2024]
Abstract
The identification of drug-target interactions (DTIs) is an essential step in drug discovery. In vitro experimental methods are expensive, laborious, and time-consuming. Deep learning has shown promising progress in DTI prediction. However, precisely representing drug and protein features remains a major challenge for DTI prediction. Here, we developed an end-to-end DTI identification framework called BINDTI based on a bi-directional Intention network. First, drug features are encoded with graph convolutional networks from the 2D molecular graph derived from each drug's SMILES string. Next, protein features are encoded from the amino acid sequence through a mixed model called ACmix, which integrates a self-attention mechanism and convolution. Third, drug and target features are fused through the bi-directional Intention network, which combines Intention and multi-head attention. Finally, unknown drug-target (DT) pairs are classified by a multilayer perceptron based on the fused DT features. The results demonstrate that BINDTI greatly outperformed four baseline methods (i.e., CPI-GNN, TransformerCPI, MolTrans, and IIFDTI) on the BindingDB, BioSNAP, DrugBank, and Human datasets. More importantly, it was better suited than the four baseline methods to predicting new DTIs on imbalanced datasets. Ablation results showed that both the bi-directional Intention network and ACmix could greatly advance DTI prediction. The fused feature visualization and case studies showed that the results predicted by BINDTI were largely consistent with the true ones. We anticipate that the proposed BINDTI framework can find new low-cost drug candidates, improve virtual drug screening, and further facilitate drug repositioning as well as drug discovery.
|
29
|
Yin J, Zhang H, Sun X, You N, Mou M, Lu M, Pan Z, Li F, Li H, Zeng S, Zhu F. Decoding Drug Response With Structurized Gridding Map-Based Cell Representation. IEEE J Biomed Health Inform 2025; 29:1702-1713. [PMID: 38090819 DOI: 10.1109/jbhi.2023.3342280] [Citation(s) in RCA: 8] [Impact Index Per Article: 8.0] [Indexed: 03/08/2025]
Abstract
A thorough understanding of cell-line drug response mechanisms is crucial for drug development, repurposing, and resistance reversal. While targeted anticancer therapies have shown promise, not all cancers have well-established biomarkers to stratify drug response. Single-gene associations explain only a small fraction of the observed drug sensitivity, so a more comprehensive method is needed. Although deep learning models have shown promise in predicting drug response in cell lines, they still face significant challenges in clinical application. Therefore, this study proposed a new strategy called DD-Response for cell-line drug response prediction. First, the limitation of narrow modeling horizons was overcome by integrating multiple datasets through source-specific label binarization, which expanded the model training domain. Second, a modified representation based on a two-dimensional structurized gridding map (SGM) was developed for cell lines and drugs, avoiding feature correlation neglect and potential information loss. Third, a dual-branch, multi-channel convolutional neural network-based model for pairwise response prediction was constructed, enabling accurate outcomes and improved exploration of underlying mechanisms. As a result, DD-Response demonstrated superior performance, captured cell-line characteristic variations, and provided insights into key factors impacting cell-line drug response. In addition, DD-Response exhibited scalability in predicting clinical patient responses to drug therapy. Overall, because of its excellent ability to predict drug response and capture the key molecules behind it, DD-Response is expected to greatly facilitate drug discovery, repurposing, resistance reversal, and therapeutic optimization.
|
30
|
Cheng Z, Xu D, Ding D, Ding Y. Prediction of Drug-Target Interactions With High-Quality Negative Samples and a Network-Based Deep Learning Framework. IEEE J Biomed Health Inform 2025; 29:1567-1578. [PMID: 38227407 DOI: 10.1109/jbhi.2024.3354953] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/17/2024]
Abstract
Identification of drug-target interactions (DTIs) plays a crucial role in drug discovery. Compared to traditional experimental methods, computer-based methods for predicting DTIs can significantly reduce the time and financial burdens of drug development. In recent years, numerous machine learning-based methods have been proposed for predicting potential DTIs. However, a common limitation among these methods is the absence of high-quality negative samples. Moreover, the effective extraction of multisource information of drugs and proteins for DTI prediction remains a significant challenge. In this paper, we investigated two aspects: the selection of high-quality negative samples and the construction of a high-performance DTI prediction framework. Specifically, we found two types of hidden biases when randomly selecting negative samples from unlabeled drug-protein pairs and proposed a negative sample selection approach based on complex network theory. Furthermore, we proposed a novel DTI prediction method named HNetPa-DTI, which integrates topological information from the drug-protein-disease heterogeneous network and gene ontology (GO) and pathway annotation information of proteins. Specifically, we extracted topological information of the drug-protein-disease heterogeneous network using heterogeneous graph neural networks, and obtained GO and pathway annotation information of proteins from the GO term semantic similarity networks, GO term-protein bipartite networks, and pathway-protein bipartite network using graph neural networks. Experimental results show that HNetPa-DTI outperforms the baseline methods on four types of prediction tasks, demonstrating the superiority of our method.
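For context, the random negative selection that this abstract identifies as biased can be sketched as follows. This is only an illustration of the common baseline of drawing negatives uniformly from unlabeled drug-protein pairs; it is not the network-based selection approach proposed by the authors, and the function name and toy identifiers are hypothetical.

```python
import random


def sample_random_negatives(drugs, proteins, positives, n_negatives, seed=0):
    """Draw negatives uniformly from drug-protein pairs with no known interaction."""
    positive_set = set(positives)
    # Every pair that is not a known positive is treated as unlabeled.
    unlabeled = [(d, p) for d in drugs for p in proteins
                 if (d, p) not in positive_set]
    if n_negatives > len(unlabeled):
        raise ValueError("not enough unlabeled pairs to sample from")
    # Sampling without replacement keeps the negatives distinct.
    return random.Random(seed).sample(unlabeled, n_negatives)


# Hypothetical toy identifiers.
drugs = ["D1", "D2", "D3"]
proteins = ["P1", "P2", "P3"]
positives = [("D1", "P1"), ("D2", "P2"), ("D3", "P3")]
negatives = sample_random_negatives(drugs, proteins, positives, n_negatives=3)
```

Because unlabeled pairs may include undiscovered true interactions, negatives drawn this way are noisy, which is one reason careful negative selection is studied.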
|
31
|
Zhang W, Hu F, Yin P, Cai Y. A transferability-guided protein-ligand interaction prediction method. Methods 2025; 235:64-70. [PMID: 39920915 DOI: 10.1016/j.ymeth.2025.01.019] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/29/2024] [Revised: 01/19/2025] [Accepted: 01/21/2025] [Indexed: 02/10/2025] Open
Abstract
Accurate prediction of protein-ligand interaction (PLI) is crucial for drug discovery and development. However, existing methods often struggle with effectively integrating heterogeneous protein and ligand data modalities and optimizing knowledge transfer from pretraining to the target task. This paper proposes a novel transferability-guided PLI prediction method that maximizes knowledge transfer by deeply integrating protein and ligand representations through a cross-attention mechanism and incorporating transferability metrics to guide fine-tuning. The cross-attention mechanism facilitates interactive information exchange between modalities, enabling the model to capture intricate interdependencies. Meanwhile, the transferability-guided strategy quantifies transferability from pretraining tasks and incorporates it into the training objective, ensuring the effective utilization of beneficial knowledge while mitigating negative transfer. Extensive experiments demonstrate significant and consistent improvements over traditional fine-tuning, validated by statistical tests. Ablation studies highlight the pivotal role of cross-attention, and quantitative analysis reveals the method's ability to reduce harmful transfer. Our guided strategy provides a paradigm for more comprehensive utilization of pretraining knowledge, offering prospects for enhancing other PLI prediction approaches. This method advances PLI prediction via innovative modality fusion and guided knowledge transfer, paving the way for accelerated drug discovery pipelines. Code and data are freely available at https://github.com/brian-zZZ/Guided-PLI.
Affiliation(s)
- Weihong Zhang
  - Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences, Shenzhen 518055, China
  - University of Chinese Academy of Sciences, Beijing 100049, China
- Fan Hu
  - Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences, Shenzhen 518055, China
- Peng Yin
  - Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences, Shenzhen 518055, China
- Yunpeng Cai
  - Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences, Shenzhen 518055, China
|
32
|
Lu Y, Zhang R, Jiang T, Fu Q, Cui Z, Wu H. TrGPCR: GPCR-Ligand Binding Affinity Prediction Based on Dynamic Deep Transfer Learning. IEEE J Biomed Health Inform 2025; 29:1613-1624. [PMID: 37610904 DOI: 10.1109/jbhi.2023.3307928] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 08/25/2023]
Abstract
Predicting G protein-coupled receptor (GPCR)-ligand binding affinity plays a crucial role in drug development. However, determining GPCR-ligand binding affinities is time-consuming and resource-intensive. Although many studies used data-driven methods to predict binding affinity, most of these methods required the protein 3D structure, which was often unknown. Moreover, some of these studies considered only the sequence characteristics of the protein, ignoring its secondary structure. The number of known GPCRs for affinity prediction is only a few thousand, which is insufficient for deep learning training. Therefore, this study aimed to propose a deep transfer learning method called TrGPCR, which used dynamic transfer learning to solve the problem of insufficient GPCR data. We used the Binding Database (BindingDB) as the source domain and the GLASS (GPCR-Ligand Association) database as the target domain. We also introduced protein secondary structures, called pockets, as features to predict binding affinities. Compared with DeepDTA, our model improved by 5.2% on RMSE (root mean square error) and 4.5% on MAE (mean absolute error).
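The two regression metrics named in this abstract, RMSE and MAE (mean absolute error), can be computed as in the sketch below. The abstract does not state how the percentage improvements were calculated; the `relative_improvement` helper assumes the common convention of relative error reduction against the baseline, and all affinity values are made-up toy numbers.

```python
import math


def rmse(y_true, y_pred):
    """Root mean square error between measured and predicted affinities."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def mae(y_true, y_pred):
    """Mean absolute error between measured and predicted affinities."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def relative_improvement(baseline_error, model_error):
    """Fractional error reduction versus a baseline (an assumed convention)."""
    return (baseline_error - model_error) / baseline_error


# Made-up affinities and predictions, for illustration only.
y_true = [5.0, 6.0, 7.0, 8.0]
baseline_pred = [5.5, 6.5, 6.0, 8.0]
model_pred = [5.0, 6.5, 6.5, 8.0]

mae_gain = relative_improvement(mae(y_true, baseline_pred), mae(y_true, model_pred))
rmse_gain = relative_improvement(rmse(y_true, baseline_pred), rmse(y_true, model_pred))
```

RMSE penalizes large errors more heavily than MAE, so the two gains generally differ even on the same predictions.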
|
33
|
Bi X, Zhang S, Ma W, Jiang H, Wei Z. HiSIF-DTA: A Hierarchical Semantic Information Fusion Framework for Drug-Target Affinity Prediction. IEEE J Biomed Health Inform 2025; 29:1579-1590. [PMID: 37983161 DOI: 10.1109/jbhi.2023.3334239] [Citation(s) in RCA: 4] [Impact Index Per Article: 4.0] [Indexed: 11/22/2023]
Abstract
Accurately identifying drug-target affinity (DTA) plays a significant role in promoting drug discovery and has attracted increasing attention in recent years. Exploring appropriate protein representation methods and increasing the abundance of protein information is critical in enhancing the accuracy of DTA prediction. Recently, numerous deep learning-based models have been proposed to utilize the sequential or structural features of target proteins. However, these models capture only the low-order semantics that exist in a single protein, while the high-order semantics abundant in biological networks are largely ignored. In this article, we propose HiSIF-DTA, a hierarchical semantic information fusion framework for DTA prediction. In this framework, a hierarchical protein graph is constructed that includes not only contact maps as low-order structural semantics but also protein-protein interaction (PPI) networks as high-order functional semantics. In particular, two distinct hierarchical fusion strategies (i.e., Top-Down and Bottom-Up) are designed to integrate the different protein semantics, thereby contributing to a richer protein representation. Comprehensive experimental results demonstrate that HiSIF-DTA outperforms current state-of-the-art methods for prediction on the benchmark datasets of the DTA task. Further validation on binary tasks and visualization analysis demonstrates the generalization and interpretation abilities of the proposed method.
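A contact map, the low-order structural input mentioned in this abstract, is commonly built by thresholding pairwise residue distances. The sketch below is a generic illustration under stated assumptions: the 8 Å cutoff, the use of one coordinate per residue, and the toy coordinates are all assumptions, and the code is not taken from HiSIF-DTA.

```python
import math


def contact_map(coords, cutoff=8.0):
    """Binary residue contact map: 1 where two residues lie within `cutoff` angstroms."""
    n = len(coords)
    return [
        [1 if math.dist(coords[i], coords[j]) <= cutoff else 0 for j in range(n)]
        for i in range(n)
    ]


# Toy coordinates: four residues spaced 3.8 angstroms apart along a straight line.
coords = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0), (11.4, 0.0, 0.0)]
cmap = contact_map(coords)
```

The resulting matrix can serve as the adjacency matrix of a residue-level protein graph, with residues as nodes and contacts as edges.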
|
34
|
Yuan Y, Chen S, Hu R, Wang X. MutualDTA: An Interpretable Drug-Target Affinity Prediction Model Leveraging Pretrained Models and Mutual Attention. J Chem Inf Model 2025; 65:1211-1227. [PMID: 39878060 DOI: 10.1021/acs.jcim.4c01893] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/31/2025]
Abstract
Efficient and accurate drug-target affinity (DTA) prediction can significantly accelerate the drug development process. Recently, deep learning models have been widely applied to DTA prediction and have achieved notable success. However, existing methods often encounter several common issues: first, the data representations lack sufficient information; second, the extracted features are not comprehensive; and third, most methods lack interpretability when modeling drug-target binding. To overcome these problems, we propose an interpretable deep learning model called MutualDTA for predicting DTA. MutualDTA leverages the power of pretrained models to obtain accurate representations of drugs and targets. It also employs well-designed modules to extract hidden features from these representations. Furthermore, the interpretability of MutualDTA is realized by the Mutual-Attention module, which (i) establishes relationships between drugs and proteins from the perspective of intermolecular interactions between drug atoms and protein amino acid residues and (ii) allows MutualDTA to capture the binding sites based on attention scores. The test results on two benchmark data sets show that MutualDTA achieves the best performance compared with 12 state-of-the-art models. Attention visualization experiments show that MutualDTA can capture partial interaction sites, which not only helps drug developers reduce the search space for binding sites, but also demonstrates the interpretability of MutualDTA. Finally, the trained MutualDTA is applied to screen for high-affinity drugs targeting Alzheimer's disease (AD)-related proteins, and the screened drugs are partially present in the anti-AD drug library. These results demonstrate the reliability of MutualDTA in drug development.
Affiliation(s)
- Yongna Yuan
  - School of Information Science & Engineering, Lanzhou University, Lanzhou 730000, China
- Siming Chen
  - School of Information Science & Engineering, Lanzhou University, Lanzhou 730000, China
- Rizhen Hu
  - School of Information Science & Engineering, Lanzhou University, Lanzhou 730000, China
- Xin Wang
  - School of Information Science & Engineering, Lanzhou University, Lanzhou 730000, China
|
35
|
Talo M, Bozdag S. Top-DTI: Integrating Topological Deep Learning and Large Language Models for Drug Target Interaction Prediction. bioRxiv 2025:2025.02.07.637146. [PMID: 39975019 PMCID: PMC11839103 DOI: 10.1101/2025.02.07.637146] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 02/21/2025]
Abstract
Motivation: The accurate prediction of drug-target interactions (DTI) is a crucial step in drug discovery, providing a foundation for identifying novel therapeutics. Traditional drug development is both costly and time-consuming, often spanning over a decade. Computational approaches help narrow the pool of compound candidates, offering significant starting points for experimental validation. In this study, we propose the Top-DTI framework for predicting DTI by integrating topological data analysis (TDA) with large language models (LLMs). Top-DTI leverages persistent homology to extract topological features from protein contact maps and drug molecular images. Simultaneously, protein and drug LLMs generate semantically rich embeddings that capture sequential and contextual information from protein sequences and drug SMILES strings. By combining these complementary features, Top-DTI enhances predictive performance and robustness.
Results: Experimental results on the public BioSNAP and Human DTI benchmark datasets demonstrate that the proposed Top-DTI model outperforms state-of-the-art approaches across multiple evaluation metrics, including AUROC, AUPRC, sensitivity, and specificity. Furthermore, the Top-DTI model achieves superior performance in the challenging cold-split scenario, where the test and validation sets contain drugs or targets absent from the training set. This setting simulates real-world scenarios and highlights the robustness of the model. Notably, incorporating topological features alongside LLM embeddings significantly improves predictive performance, underscoring the value of integrating structural and sequence-based representations.
Availability: The data and source code of Top-DTI are available at https://github.com/bozdaglab/Top_DTI under the Creative Commons Attribution Non Commercial 4.0 International Public License.
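The cold-split setting described above can be sketched as an entity-level split, in which every interaction involving a held-out drug (or target) goes to the test set. This is a minimal sketch, not the Top-DTI splitting code: the function name `cold_split`, the `(drug, target, label)` tuple layout, and the default fraction are hypothetical.

```python
import random


def cold_split(pairs, entity_index=0, test_fraction=0.2, seed=0):
    """Split interaction records so held-out entities never appear in training.

    pairs: list of (drug, target, label) tuples (assumed layout).
    entity_index: 0 holds out drugs, 1 holds out targets.
    """
    entities = sorted({p[entity_index] for p in pairs})
    rng = random.Random(seed)
    rng.shuffle(entities)
    n_test = max(1, int(len(entities) * test_fraction))
    held_out = set(entities[:n_test])
    train = [p for p in pairs if p[entity_index] not in held_out]
    test = [p for p in pairs if p[entity_index] in held_out]
    return train, test
```

Splitting by entity rather than by record is what distinguishes a cold split from a random split, where the same drug can appear on both sides.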
Affiliation(s)
- Muhammed Talo
  - Department of Computer Science and Engineering, University of North Texas, Denton, TX 76207, USA
  - BioDiscovery Institute, University of North Texas, Denton, TX 76207, USA
  - Center for Computational Life Sciences, University of North Texas, Denton, TX 76207, USA
- Serdar Bozdag
  - Department of Computer Science and Engineering, University of North Texas, Denton, TX 76207, USA
  - BioDiscovery Institute, University of North Texas, Denton, TX 76207, USA
  - Center for Computational Life Sciences, University of North Texas, Denton, TX 76207, USA
  - Department of Mathematics, University of North Texas, Denton, TX 76207, USA
|
36
|
Schuh MG, Boldini D, Bohne AI, Sieber SA. Barlow Twins deep neural network for advanced 1D drug-target interaction prediction. J Cheminform 2025; 17:18. [PMID: 39910404 PMCID: PMC11800607 DOI: 10.1186/s13321-025-00952-2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/21/2024] [Accepted: 01/08/2025] [Indexed: 02/07/2025] Open
Abstract
Accurate prediction of drug-target interactions is critical for advancing drug discovery. By reducing time and cost, machine learning and deep learning can accelerate this laborious discovery process. In a novel approach, BarlowDTI, we utilise the powerful Barlow Twins architecture for feature extraction while considering the structure of the target protein. Our method achieves state-of-the-art predictive performance against multiple established benchmarks using only one-dimensional input. The use of our hybrid approach of deep learning and gradient boosting machine as the underlying predictor ensures fast and efficient predictions without the need for substantial computational resources. We also propose the use of an influence method to investigate how the model reaches its decision based on individual training samples. By comparing co-crystal structures, we find that BarlowDTI effectively exploits catalytically active and stabilising residues, highlighting the model's ability to generalise from one-dimensional input data. In addition, we further benchmark new baselines against existing methods. Together, these innovations improve the efficiency and effectiveness of drug-target interaction predictions, providing robust tools for accelerating drug development and deepening the understanding of molecular interactions. Therefore, we provide an easy-to-use web interface that can be freely accessed at https://www.bio.nat.tum.de/oc2/barlowdti. Scientific contribution: Our computationally efficient and effective hybrid approach, combining the deep learning model Barlow Twins and gradient boosting machines, outperforms state-of-the-art methods across multiple splits and benchmarks using only one-dimensional input. Furthermore, we advance the field by proposing an influence method that elucidates model decision-making, thereby providing deeper insights into molecular interactions and improving the interpretability of drug-target interaction predictions.
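For readers unfamiliar with the Barlow Twins objective named above, the sketch below computes the standard redundancy-reduction loss from two batches of embeddings. It illustrates the general objective only and is not BarlowDTI's implementation: the abstract does not state how the two views are formed, and the function names and the weight `lam` are assumptions.

```python
import math


def _standardize(batch):
    """Normalize each feature to zero mean and unit variance across the batch."""
    n, d = len(batch), len(batch[0])
    out = [[0.0] * d for _ in range(n)]
    for j in range(d):
        col = [row[j] for row in batch]
        mean = sum(col) / n
        std = math.sqrt(sum((x - mean) ** 2 for x in col) / n) or 1.0  # guard constant features
        for i in range(n):
            out[i][j] = (batch[i][j] - mean) / std
    return out


def barlow_twins_loss(z_a, z_b, lam=5e-3):
    """Sum of (1 - C_ii)^2 plus lam * C_ij^2 (i != j), with C the cross-correlation matrix."""
    n, d = len(z_a), len(z_a[0])
    a, b = _standardize(z_a), _standardize(z_b)
    loss = 0.0
    for i in range(d):
        for j in range(d):
            c_ij = sum(a[k][i] * b[k][j] for k in range(n)) / n
            loss += (1.0 - c_ij) ** 2 if i == j else lam * c_ij ** 2
    return loss
```

The diagonal term pulls the two views of the same sample together, while the off-diagonal term discourages redundant embedding dimensions.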
Affiliation(s)
- Maximilian G Schuh
  - Chair of Organic Chemistry II, Department of Bioscience, TUM School of Natural Sciences, Center for Functional Protein Assemblies (CPA), Technical University of Munich, Ernst-Otto-Fischer Str. 8, 85748, Garching bei München, Bavaria, Germany
- Davide Boldini
  - Chair of Organic Chemistry II, Department of Bioscience, TUM School of Natural Sciences, Center for Functional Protein Assemblies (CPA), Technical University of Munich, Ernst-Otto-Fischer Str. 8, 85748, Garching bei München, Bavaria, Germany
- Annkathrin I Bohne
  - Chair of Biochemistry, Department of Bioscience, TUM School of Natural Sciences, Center for Functional Protein Assemblies (CPA), Technical University of Munich, Ernst-Otto-Fischer Str. 8, 85748, Garching bei München, Bavaria, Germany
- Stephan A Sieber
  - Chair of Organic Chemistry II, Department of Bioscience, TUM School of Natural Sciences, Center for Functional Protein Assemblies (CPA), Technical University of Munich, Ernst-Otto-Fischer Str. 8, 85748, Garching bei München, Bavaria, Germany
|
37
|
Fu H, Ding Z, Wang W. Trans-m5C: A transformer-based model for predicting 5-methylcytosine (m5C) sites. Methods 2025; 234:178-186. [PMID: 39742984 DOI: 10.1016/j.ymeth.2024.12.010] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/24/2023] [Revised: 10/31/2024] [Accepted: 12/11/2024] [Indexed: 01/04/2025] Open
Abstract
5-Methylcytosine (m5C) plays a pivotal role in various RNA metabolic processes, including RNA localization, stability, and translation. Current high-throughput sequencing technologies for m5C site identification are resource-intensive in terms of cost, labor, and time. As such, there is a pressing need for efficient computational approaches. Many existing computational methods rely on intricate hand-crafted features, some of which are unavailable in practice, and this often leads to suboptimal prediction accuracy. Addressing these challenges, we introduce a novel deep-learning method, Trans-m5C. We first categorize m5C sites into NSUN2-dependent and NSUN6-dependent types for independent feature extraction. Subsequently, meticulously crafted transformer neural networks are employed to distill global features. The prediction of m5C sites is then accomplished using a discriminator built from a multi-layer perceptron. A rigorous evaluation of the performance of Trans-m5C on experimentally validated m5C data from human and mouse species reveals that our method offers a competitive edge over both baseline and existing methodologies.
Affiliation(s)
- Haitao Fu
  - School of Artificial Intelligence, Hubei University, Wuhan, 430062, China
- Zewen Ding
  - University of Edinburgh, Centre for Discovery Brain Sciences, Edinburgh, EH89XD, United Kingdom
- Wen Wang
  - University of Edinburgh, Queen's Medical Research Institute, Edinburgh, EH164TJ, United Kingdom
|
38
|
Tian Z, Zhang Z, Zhou W, Teng Z, Song W, Zou Q. DSANIB: Drug-Target Interaction Predictions With Dual-View Synergistic Attention Network and Information Bottleneck Strategy. IEEE J Biomed Health Inform 2025; 29:1484-1493. [PMID: 40030194 DOI: 10.1109/jbhi.2024.3497591] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Indexed: 04/06/2025]
Abstract
Prediction of drug-target interactions (DTIs) is one of the crucial steps for drug repositioning. Identifying DTIs through wet-lab experiments is expensive and time-consuming. Recently, deep learning-based approaches have shown promising advancements in DTI prediction, but they face two notable challenges: (i) how to explicitly capture local interactions between drug-target pairs and learn their higher-order substructure embeddings; (ii) how to filter out redundant information to obtain effective embeddings for drugs and targets. Results: In this study, we propose a novel approach, termed DSANIB, to infer potential interactions between drugs and targets. DSANIB comprises two primary components: (1) DSAN component: The Inter-view Attention Network Module explicitly learns the local interactions between drugs and targets, while the Intra-view Attention Network Module aggregates information from local interaction features to obtain their higher-order substructure embeddings. (2) Information Bottleneck (IB) component: DSANIB adopts the IB strategy, which retains relevant information while minimizing redundant features to obtain discriminative representations. Extensive experimental results demonstrate that DSANIB outperforms other SOTA prediction models. In addition, visualization of drug and target embeddings learned through DSANIB could provide interpretable insights for the prediction results.
|
39
|
Cao D, Chen M, Zhang R, Wang Z, Huang M, Yu J, Jiang X, Fan Z, Zhang W, Zhou H, Li X, Fu Z, Zhang S, Zheng M. SurfDock is a surface-informed diffusion generative model for reliable and accurate protein-ligand complex prediction. Nat Methods 2025; 22:310-322. [PMID: 39604569 DOI: 10.1038/s41592-024-02516-y] [Citation(s) in RCA: 3] [Impact Index Per Article: 3.0] [Received: 01/19/2024] [Accepted: 10/16/2024] [Indexed: 11/29/2024]
Abstract
Accurately predicting protein-ligand interactions is crucial for understanding cellular processes. We introduce SurfDock, a deep-learning method that addresses this challenge by integrating protein sequence, three-dimensional structural graphs and surface-level features into an equivariant architecture. SurfDock employs a generative diffusion model on a non-Euclidean manifold, optimizing molecular translations, rotations and torsions to generate reliable binding poses. Our extensive evaluations across various benchmarks demonstrate SurfDock's superiority over existing methods in docking success rates and adherence to physical constraints. It also exhibits remarkable generalizability to unseen proteins and predicted apo structures, while achieving state-of-the-art performance in virtual screening tasks. In a real-world application, SurfDock identified seven novel hit molecules in a virtual screening project targeting aldehyde dehydrogenase 1B1, a key enzyme in cellular metabolism. This showcases SurfDock's ability to elucidate molecular mechanisms underlying cellular processes. These results highlight SurfDock's potential as a transformative tool in structural biology, offering enhanced accuracy, physical plausibility and practical applicability in understanding protein-ligand interactions.
Affiliation(s)
- Duanhua Cao
  - Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou, China
  - Drug Discovery and Design Center, State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, Shanghai, China
- Mingan Chen
  - Drug Discovery and Design Center, State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, Shanghai, China
  - School of Physical Science and Technology, ShanghaiTech University, Shanghai, China
  - Lingang Laboratory, Shanghai, China
- Runze Zhang
  - Drug Discovery and Design Center, State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, Shanghai, China
  - University of Chinese Academy of Sciences, Beijing, China
- Zhaokun Wang
  - Drug Discovery and Design Center, State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, Shanghai, China
  - University of Chinese Academy of Sciences, Beijing, China
- Manlin Huang
  - Drug Discovery and Design Center, State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, Shanghai, China
  - Nanchang University, Nanchang, China
- Jie Yu
  - Drug Discovery and Design Center, State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, Shanghai, China
  - Lingang Laboratory, Shanghai, China
  - School of Information Science and Technology, ShanghaiTech University, Shanghai, China
- Xinyu Jiang
  - Drug Discovery and Design Center, State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, Shanghai, China
  - University of Chinese Academy of Sciences, Beijing, China
- Zhehuan Fan
  - Drug Discovery and Design Center, State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, Shanghai, China
  - University of Chinese Academy of Sciences, Beijing, China
- Wei Zhang
  - Drug Discovery and Design Center, State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, Shanghai, China
  - University of Chinese Academy of Sciences, Beijing, China
- Hao Zhou
  - Institute for AI Industry Research (AIR), Tsinghua University, Beijing, China
- Xutong Li
  - Drug Discovery and Design Center, State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, Shanghai, China
- Zunyun Fu
  - Drug Discovery and Design Center, State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, Shanghai, China
- Sulin Zhang
  - Drug Discovery and Design Center, State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, Shanghai, China
  - University of Chinese Academy of Sciences, Beijing, China
- Mingyue Zheng
  - Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou, China
  - Drug Discovery and Design Center, State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, Shanghai, China
  - University of Chinese Academy of Sciences, Beijing, China
|
40
|
Chun Y, Li H, Wang S. SS-DTI: A deep learning method integrating semantic and structural information for drug-target interaction prediction. J Bioinform Comput Biol 2025; 23:2550002. [PMID: 40134345 DOI: 10.1142/s0219720025500027] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 03/27/2025]
Abstract
Drug-target interaction (DTI) prediction is pivotal in drug discovery and repurposing, providing a more efficient alternative to traditional wet-lab experiments by saving time and resources and expediting the identification of potential targets. Current DTI methods predominantly focus on extracting semantic features from drug and protein sequences or utilizing structural information, often neglecting the integration of both. This gap hinders the achievement of a comprehensive representation of drug and protein molecules. To address this, we propose SS-DTI, a novel end-to-end deep learning approach that integrates both semantic and structural information. Our method features a multi-scale semantic feature extraction block to capture local and global information from sequences and employs Graph Convolutional Networks (GCNs) to learn structural features. Evaluations on four benchmark datasets demonstrate that SS-DTI outperforms state-of-the-art methods, showcasing its superior predictive performance. Our code is available at https://github.com/RobinChun/SS-DTI.
Affiliation(s)
- Yujie Chun
  - Department of Computer Science and Engineering, School of Information Science and Engineering, Yunnan University, Kunming 650504, Yunnan, P. R. China
- Huaihu Li
  - Department of Computer Science and Engineering, School of Information Science and Engineering, Yunnan University, Kunming 650504, Yunnan, P. R. China
- Shunfang Wang
  - Department of Computer Science and Engineering, School of Information Science and Engineering, Yunnan University, Kunming 650504, Yunnan, P. R. China
  - Yunnan Key Laboratory of Intelligent Systems and Computing, Yunnan University, Kunming 650504, Yunnan, P. R. China
|
41
|
Li C, Li G. DynHeter-DTA: Dynamic Heterogeneous Graph Representation for Drug-Target Binding Affinity Prediction. Int J Mol Sci 2025; 26:1223. [PMID: 39940990 PMCID: PMC11818550 DOI: 10.3390/ijms26031223] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/13/2024] [Revised: 01/27/2025] [Accepted: 01/28/2025] [Indexed: 02/16/2025] Open
Abstract
In drug development, drug-target affinity (DTA) prediction is a key indicator for assessing the drug's efficacy and safety. Despite significant progress in deep learning-based affinity prediction approaches in recent years, there are still limitations in capturing the complex interactions between drugs and target receptors. To address this issue, a dynamic heterogeneous graph prediction model, DynHeter-DTA, is proposed in this paper, which fully leverages the complex relationships between drug-drug, protein-protein, and drug-protein interactions, allowing the model to adaptively learn the optimal graph structures. Specifically, (1) in the data processing layer, to better utilize the similarities and interactions between drugs and proteins, the model dynamically adjusts the connection strengths between drug-drug, protein-protein, and drug-protein pairs, constructing a variable heterogeneous graph structure, which significantly improves the model's expressive power and generalization performance; (2) in the model design layer, considering that the quantity of protein nodes significantly exceeds that of drug nodes, an approach leveraging Graph Isomorphism Networks (GIN) and Self-Attention Graph Pooling (SAGPooling) is proposed to enhance prediction efficiency and accuracy. Comprehensive experiments on the Davis, KIBA, and Human public datasets demonstrate that DynHeter-DTA exceeds the performance of previous models in drug-target interaction forecasting, providing an innovative solution for drug-target affinity prediction.
Affiliation(s)
- Changli Li
  - School of Artificial Intelligence, Nanjing University of Information Science & Technology, Nanjing 210044, China
|
42
|
Shen X, Yan S, Zeng T, Xia F, Jiang D, Wan G, Cao D, Wu R. TarIKGC: A Target Identification Tool Using Semantics-Enhanced Knowledge Graph Completion with Application to CDK2 Inhibitor Discovery. J Med Chem 2025; 68:1793-1809. [PMID: 39745279 DOI: 10.1021/acs.jmedchem.4c02543] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/24/2025]
Abstract
Target identification is a critical stage in the drug discovery pipeline. Various computational methodologies have been dedicated to enhancing the classification performance of compound-target interactions, yet significant room remains for improving the recommendation performance. To address this challenge, we developed TarIKGC, a tool for target prioritization that leverages semantics-enhanced knowledge graph (KG) completion. This method harnesses knowledge representation learning within a heterogeneous compound-target-disease network. Specifically, TarIKGC combines an attention-based aggregation graph neural network with a multimodal feature extractor network to simultaneously learn internal semantic features from biomedical entities and topological features from the KG. Furthermore, a KG embedding model is employed to identify missing relationships among compounds and targets. In silico evaluations highlighted the superior performance of TarIKGC in drug repositioning tasks. In addition, TarIKGC successfully identified two potential cyclin-dependent kinase 2 (CDK2) inhibitors with novel scaffolds through reverse target fishing. Both compounds exhibited antiproliferative activities across multiple therapeutic indications targeting CDK2.
Affiliation(s)
- Xiaojuan Shen
  - State Key Laboratory of Anti-Infective Drug Discovery and Development, School of Pharmaceutical Sciences, Sun Yat-sen University, Guangzhou 510006, China
- Shijia Yan
  - State Key Laboratory of Anti-Infective Drug Discovery and Development, School of Pharmaceutical Sciences, Sun Yat-sen University, Guangzhou 510006, China
- Tao Zeng
  - State Key Laboratory of Anti-Infective Drug Discovery and Development, School of Pharmaceutical Sciences, Sun Yat-sen University, Guangzhou 510006, China
- Fei Xia
  - School of Chemistry and Molecular Engineering, East China Normal University, Shanghai 200062, China
- Dejun Jiang
  - Xiangya School of Pharmaceutical Sciences, Central South University, Changsha, Hunan 410013, China
- Guohui Wan
  - State Key Laboratory of Anti-Infective Drug Discovery and Development, School of Pharmaceutical Sciences, Sun Yat-sen University, Guangzhou 510006, China
- Dongsheng Cao
  - Xiangya School of Pharmaceutical Sciences, Central South University, Changsha, Hunan 410013, China
- Ruibo Wu
  - State Key Laboratory of Anti-Infective Drug Discovery and Development, School of Pharmaceutical Sciences, Sun Yat-sen University, Guangzhou 510006, China
|
43
|
Wang X, Zhou J, Mueller J, Quinn D, Carvalho A, Moody TS, Huang M. BioStructNet: Structure-Based Network with Transfer Learning for Predicting Biocatalyst Functions. J Chem Theory Comput 2025; 21:474-490. [PMID: 39705058 PMCID: PMC11736791 DOI: 10.1021/acs.jctc.4c01391] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/16/2024] [Revised: 12/03/2024] [Accepted: 12/12/2024] [Indexed: 12/21/2024]
Abstract
Enzyme-substrate interactions are essential to both biological processes and industrial applications. Advanced machine learning techniques have significantly accelerated biocatalysis research, revolutionizing the prediction of biocatalytic activities and facilitating the discovery of novel biocatalysts. However, the limited availability of data for specific enzyme functions, such as conversion efficiency and stereoselectivity, presents challenges for prediction accuracy. In this study, we developed BioStructNet, a structure-based deep learning network that integrates both protein and ligand structural data to capture the complexity of enzyme-substrate interactions. Benchmarking studies with different algorithms showed the enhanced predictive accuracy of BioStructNet. To further optimize the prediction accuracy for the small data set, we implemented transfer learning in the framework, training a source model on a large data set and fine-tuning it on a small, function-specific data set, using the CalB data set as a case study. The model performance was validated by comparing the attention heat maps generated by the BioStructNet interaction module with the enzyme-substrate interactions revealed from molecular dynamics simulations of enzyme-substrate complexes. BioStructNet would accelerate the discovery of functional enzymes for industrial use, particularly in cases where the training data sets for machine learning are small.
Affiliation(s)
- Xiangwen Wang
  - School of Chemistry and Chemical Engineering, Queen’s University Belfast, BT9 5AG Belfast, Northern Ireland, U.K.
  - Department of Biocatalysis and Isotope Chemistry, Almac Sciences, BT63 5QD Craigavon, Northern Ireland, U.K.
- Jiahui Zhou
  - School of Chemistry and Chemical Engineering, Queen’s University Belfast, BT9 5AG Belfast, Northern Ireland, U.K.
- Jane Mueller
  - Department of Biocatalysis and Isotope Chemistry, Almac Sciences, BT63 5QD Craigavon, Northern Ireland, U.K.
- Derek Quinn
  - Department of Biocatalysis and Isotope Chemistry, Almac Sciences, BT63 5QD Craigavon, Northern Ireland, U.K.
- Alexandra Carvalho
  - Department of Biocatalysis and Isotope Chemistry, Almac Sciences, BT63 5QD Craigavon, Northern Ireland, U.K.
- Thomas S. Moody
  - Department of Biocatalysis and Isotope Chemistry, Almac Sciences, BT63 5QD Craigavon, Northern Ireland, U.K.
  - Arran Chemical Company Limited, Unit 1 Monksland Industrial Estate, Athlone, Co. Roscommon N37 DN24, Ireland
- Meilan Huang
  - School of Chemistry and Chemical Engineering, Queen’s University Belfast, BT9 5AG Belfast, Northern Ireland, U.K.
|
44
|
Huang H, Shi X, Lei H, Hu F, Cai Y. ProtChat: An AI Multi-Agent for Automated Protein Analysis Leveraging GPT-4 and Protein Language Model. J Chem Inf Model 2025; 65:62-70. [PMID: 39690112 DOI: 10.1021/acs.jcim.4c01345] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 12/19/2024]
Abstract
Large language models (LLMs) have transformed natural language processing, enabling advanced human-machine communication. Similarly, in computational biology, protein sequences are interpreted as natural language, facilitating the creation of protein large language models (PLLMs). However, applying PLLMs requires specialized preprocessing and script development, increasing the complexity of their use. Researchers have integrated LLMs with PLLMs to develop automated protein analysis tools to address these challenges, simplifying analytical workflows. Existing technologies often require substantial human intervention for specific protein-related tasks, maintaining high barriers to implementing automated protein analysis systems. Here, we propose ProtChat, an AI multiagent system for protein analysis that integrates the inference capabilities of PLLMs with the task-planning abilities of LLMs. ProtChat integrates GPT-4 with multiple PLLMs, like ESM and MASSA, to automate tasks such as protein property prediction and protein-drug interactions without human intervention. This AI agent enables users to input instructions directly, significantly improving efficiency and usability, making it suitable for researchers without a computational background. Experiments demonstrate that ProtChat can automate complex protein tasks accurately, avoiding manual intervention and delivering results rapidly. This advancement opens new research avenues in computational biology and drug discovery. Future applications may extend ProtChat's capabilities to broader biological data analysis. Our code and data are publicly available at github.com/SIAT-code/ProtChat.
Affiliation(s)
- Huazhen Huang
  - Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences, Shenzhen 518055, China
- Xianguo Shi
  - Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences, Shenzhen 518055, China
- Hongyang Lei
  - Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences, Shenzhen 518055, China
- Fan Hu
  - Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences, Shenzhen 518055, China
- Yunpeng Cai
  - Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences, Shenzhen 518055, China
|
45
|
Wang J, He R, Wang X, Li H, Lu Y. MCF-DTI: Multi-Scale Convolutional Local-Global Feature Fusion for Drug-Target Interaction Prediction. Molecules 2025; 30:274. [PMID: 39860144 PMCID: PMC11767603 DOI: 10.3390/molecules30020274] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/26/2024] [Revised: 12/21/2024] [Accepted: 01/10/2025] [Indexed: 01/27/2025] Open
Abstract
Predicting drug-target interactions (DTIs) is a crucial step in the development of new drugs and drug repurposing. In this paper, we propose a novel drug-target prediction model called MCF-DTI. The model utilizes the SMILES representation of drugs and the sequence features of targets, employing a multi-scale convolutional neural network (MSCNN) with parallel shared-weight modules to extract features from the drug side. For the target side, it combines MSCNN with Transformer modules to capture both local and global features effectively. The extracted features are then weighted and fused, enabling comprehensive feature representation to enhance the predictive power of the model. Experimental results on the Davis dataset demonstrate that MCF-DTI achieves an AUC of 0.9746 and an AUPR of 0.9542, outperforming other state-of-the-art models. Our case study demonstrates that our model effectively validated several known drug-target relationships in lung cancer and predicted the therapeutic potential of certain preclinical compounds in treating lung cancer. These findings contribute valuable insights for subsequent drug repurposing efforts and novel drug development.
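As a rough illustration of the multi-scale idea described in this abstract, the sketch below runs parallel 1D convolutions with different kernel widths over an embedded sequence, pools each branch, and fuses the branches with softmax weights. It is a minimal NumPy sketch under stated assumptions, not the MCF-DTI architecture: the function names, kernel widths (3, 5, 7), channel sizes, and random inputs are all hypothetical, and the Transformer branch used on the target side is omitted.

```python
import numpy as np

def conv1d_valid(x, kernel):
    """Valid 1D cross-correlation: x is (L, C), kernel is (k, C, F) -> (L-k+1, F)."""
    k = kernel.shape[0]
    length = x.shape[0] - k + 1
    out = np.empty((length, kernel.shape[2]))
    for i in range(length):
        # Contract the (k, C) window against the (k, C, F) kernel.
        out[i] = np.tensordot(x[i:i + k], kernel, axes=([0, 1], [0, 1]))
    return out

def multi_scale_features(x, kernels):
    """ReLU convolution at each kernel width, then global max pooling per branch."""
    return [np.maximum(conv1d_valid(x, k), 0.0).max(axis=0) for k in kernels]

def weighted_fusion(branches, logits):
    """Fuse branch vectors with softmax weights (learned in practice, fixed here)."""
    w = np.exp(logits - logits.max())
    w /= w.sum()
    return sum(wi * b for wi, b in zip(w, branches))

rng = np.random.default_rng(0)
seq = rng.normal(size=(50, 8))  # hypothetical embedded SMILES: 50 tokens, 8 channels
kernels = [rng.normal(size=(k, 8, 16)) * 0.1 for k in (3, 5, 7)]  # assumed widths
branches = multi_scale_features(seq, kernels)
fused = weighted_fusion(branches, np.zeros(3))  # equal weights for illustration
print(fused.shape)
```

Small kernels respond to short local motifs and wider kernels to longer ones; the fusion weights decide how much each scale contributes to the final representation.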
Affiliation(s)
- Jihong Wang
  - School of Computer, Guangdong University of Education, Guangzhou 510310, China
- Ruijia He
  - School of Computer, Guangdong University of Education, Guangzhou 510310, China
- Xiaodan Wang
  - School of Chemistry and Chemical Engineering, Guangdong Pharmaceutical University, Zhongshan 528458, China
- Hongjian Li
  - School of Chemistry and Chemical Engineering, Guangdong Pharmaceutical University, Zhongshan 528458, China
- Yulei Lu
  - School of Computer, Guangdong University of Education, Guangzhou 510310, China

46
Wang R, Ji Y, Li Y, Lee ST. Applications of Transformers in Computational Chemistry: Recent Progress and Prospects. J Phys Chem Lett 2025; 16:421-434. [PMID: 39737793 DOI: 10.1021/acs.jpclett.4c03128] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/01/2025]
Abstract
The powerful data processing and pattern recognition capabilities of machine learning (ML) technology have provided technical support for the innovation in computational chemistry. Compared with traditional ML and deep learning (DL) techniques, transformers possess fine-grained feature-capturing abilities, which are able to efficiently and accurately model the dependencies of long-sequence data, simulate complex and diverse chemical spaces, and explore the computational logic behind the data. In this Perspective, we provide an overview of the application of transformer models in computational chemistry. We first introduce the working principle of transformer models and analyze the transformer-based architectures in computational chemistry. Next, we explore the practical applications of the model in a number of specific scenarios such as property prediction and chemical structure generation. Finally, based on these applications and research results, we provide an outlook for the research of this field in the future.
Affiliation(s)
- Rui Wang
  - Macao Institute of Materials Science and Engineering, Faculty of Innovation Engineering, Macau University of Science and Technology, Taipa, Macau SAR 999078, China
- Yujin Ji
  - Institute of Functional Nano & Soft Materials (FUNSOM), Jiangsu Key Laboratory for Carbon-Based Functional Materials & Devices, Soochow University, Suzhou, Jiangsu 215123, China
- Youyong Li
  - Macao Institute of Materials Science and Engineering, Faculty of Innovation Engineering, Macau University of Science and Technology, Taipa, Macau SAR 999078, China
  - Institute of Functional Nano & Soft Materials (FUNSOM), Jiangsu Key Laboratory for Carbon-Based Functional Materials & Devices, Soochow University, Suzhou, Jiangsu 215123, China
- Shuit-Tong Lee
  - Macao Institute of Materials Science and Engineering, Faculty of Innovation Engineering, Macau University of Science and Technology, Taipa, Macau SAR 999078, China
  - Institute of Functional Nano & Soft Materials (FUNSOM), Jiangsu Key Laboratory for Carbon-Based Functional Materials & Devices, Soochow University, Suzhou, Jiangsu 215123, China

47
Zhang Q, Yin W, Chen X, Zhou A, Zhang G, Zhao Z, Li Z, Zhang Y, Bunu SJ, Shen J, Zhu W, Jiang X, Xu Z. F-CPI: A Multimodal Deep Learning Approach for Predicting Compound Bioactivity Changes Induced by Fluorine Substitution. J Med Chem 2025; 68:706-718. [PMID: 39707149 DOI: 10.1021/acs.jmedchem.4c02668] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 12/23/2024]
Abstract
Fluorine (F) substitution is a common method of drug discovery and development. However, there are no accurate approaches available for predicting the bioactivity changes after F-substitution, as the effect of substitution on the interactions between compounds and proteins (CPI) remains a mystery. In this study, we constructed a data set with 111,168 pairs of fluorine-substituted and nonfluorine-substituted compounds. We developed a multimodal deep learning model (F-CPI). In comparison with traditional machine learning and popular CPI task models, the accuracy, precision, and recall of F-CPI (∼90, ∼79, and ∼45%) were higher than those of GraphDTA (∼86, ∼58, and ∼40%). The application of the F-CPI for the structural optimization of hit compounds against SARS-CoV-2 3CLpro by F-substitution achieved a more than 100-fold increase in bioactivity (IC50: 0.23 μM vs 28.19 μM). Therefore, the multimodal deep learning model F-CPI would be a veritable and effective tool in the context of drug discovery and design.
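The figures quoted in this abstract can be checked with a few lines of arithmetic. The sketch below computes the fold change in potency from the two reported IC50 values (a lower IC50 means higher potency, so the 28.19 μM value is taken to be the compound before optimization) and, as a derived illustration only, the F1 score implied by the approximate precision and recall values. The F1 scores are not reported in the abstract; they are computed here from rounded inputs, and the function names are hypothetical.

```python
def fold_change(ic50_before_um, ic50_after_um):
    """Potency gain as the ratio of IC50 values (both in micromolar)."""
    return ic50_before_um / ic50_after_um

def f1_score(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)

# IC50 values from the abstract: 28.19 uM (assumed pre-optimization) vs 0.23 uM.
fold = fold_change(28.19, 0.23)

# Approximate precision/recall from the abstract; F1 is derived, not reported.
f1_fcpi = f1_score(0.79, 0.45)
f1_graphdta = f1_score(0.58, 0.40)

print(f"fold improvement: {fold:.1f}")
print(f"derived F1, F-CPI: {f1_fcpi:.3f}; GraphDTA: {f1_graphdta:.3f}")
```

The ratio comes out above 100, consistent with the abstract's claim. The derived F1 values also show that recall, at roughly 45%, is the weakest of the three reported metrics despite the high accuracy.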
Affiliation(s)
- Qian Zhang
  - School of Computer Science and Technology, Shanghai Frontiers Science Center of Molecule Intelligent Syntheses, East China Normal University, Shanghai 200241, China
- Wenhai Yin
  - School of Computer Science and Technology, Shanghai Frontiers Science Center of Molecule Intelligent Syntheses, East China Normal University, Shanghai 200241, China
- Xinyao Chen
  - Wuya College of Innovation, Shenyang Pharmaceutical University, Shenyang 110016, China
  - Yangtze Delta Drug Advanced Research Institute and Yangtze Delta Pharmaceutical College, Nantong 226133, China
- Aimin Zhou
  - School of Computer Science and Technology, Shanghai Frontiers Science Center of Molecule Intelligent Syntheses, East China Normal University, Shanghai 200241, China
- Guixu Zhang
  - School of Computer Science and Technology, Shanghai Frontiers Science Center of Molecule Intelligent Syntheses, East China Normal University, Shanghai 200241, China
- Zhi Zhao
  - Wuya College of Innovation, Shenyang Pharmaceutical University, Shenyang 110016, China
  - Yangtze Delta Drug Advanced Research Institute and Yangtze Delta Pharmaceutical College, Nantong 226133, China
- Zhiqiang Li
  - Vigonvita Life Sciences Co., Ltd., Suzhou 215021, China
- Yan Zhang
  - State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, Shanghai 201203, China
  - School of Pharmacy, University of Chinese Academy of Sciences, No. 19A Yuquan Road, Beijing 100049, China
  - Shandong Laboratory of Yantai Drug Discovery, Bohai Rim Advanced Research Institute for Drug Discovery, Yantai 264117, China
- Samuel Jacob Bunu
  - State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, Shanghai 201203, China
  - School of Pharmacy, University of Chinese Academy of Sciences, No. 19A Yuquan Road, Beijing 100049, China
  - Drug Discovery and Design Center, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, Shanghai 201203, China
- Jingshan Shen
  - State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, Shanghai 201203, China
  - School of Pharmacy, University of Chinese Academy of Sciences, No. 19A Yuquan Road, Beijing 100049, China
- Weiliang Zhu
  - State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, Shanghai 201203, China
  - School of Pharmacy, University of Chinese Academy of Sciences, No. 19A Yuquan Road, Beijing 100049, China
  - Drug Discovery and Design Center, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, Shanghai 201203, China
- Xiangrui Jiang
  - State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, Shanghai 201203, China
  - School of Pharmacy, University of Chinese Academy of Sciences, No. 19A Yuquan Road, Beijing 100049, China
  - Shandong Laboratory of Yantai Drug Discovery, Bohai Rim Advanced Research Institute for Drug Discovery, Yantai 264117, China
- Zhijian Xu
  - State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, Shanghai 201203, China
  - School of Pharmacy, University of Chinese Academy of Sciences, No. 19A Yuquan Road, Beijing 100049, China
  - Drug Discovery and Design Center, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, Shanghai 201203, China

48
Ye Q, Sun Y. Improving drug-target affinity prediction by adaptive self-supervised learning. PeerJ Comput Sci 2025; 11:e2622. [PMID: 39896027 PMCID: PMC11784864 DOI: 10.7717/peerj-cs.2622] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/12/2024] [Accepted: 12/02/2024] [Indexed: 02/04/2025]
Abstract
Computational drug-target affinity prediction is important for drug screening and discovery. Currently, self-supervised learning methods face two major challenges in drug-target affinity prediction. The first difficulty lies in the phenomenon of sample mismatch: self-supervised learning processes drug and target samples independently, while actual prediction requires the integration of drug-target pairs. Another challenge is the mismatch between the broadness of self-supervised learning objectives and the precision of biological mechanisms of drug-target affinity (i.e., the induced-fit principle). The former focuses on global feature extraction, while the latter emphasizes the importance of local precise matching. To address these issues, an adaptive self-supervised learning-based drug-target affinity prediction (ASSLDTA) was designed. ASSLDTA integrates a novel adaptive self-supervised learning (ASSL) module with a high-level feature learning network to extract the feature. The ASSL leverages a large amount of unlabeled training data to effectively capture low-level features of drugs and targets. Its goal is to maximize the retention of original feature information, thereby bridging the objective gap between self-supervised learning and drug-target affinity prediction and alleviating the sample mismatch problem. The high-level feature learning network, on the other hand, focuses on extracting effective high-level features for affinity prediction through a small amount of labeled data. Through this two-stage feature extraction design, each stage undertakes specific tasks, fully leveraging the advantages of each model while efficiently integrating information from different data sources, providing a more accurate and comprehensive solution for drug-target affinity prediction. 
In our experiments, ASSLDTA substantially outperforms other deep learning methods, and its results improve significantly when features are learned through adaptive self-supervised learning, which validates the effectiveness of ASSLDTA.
Affiliation(s)
- Qing Ye
  - School of Data Science and Artificial Intelligence, Wenzhou University of Technology, Wenzhou, China
- Yaxin Sun
  - School of Computer Science and Technology (School of Artificial Intelligence), Zhejiang Normal University, Jinhua, China
  - Department of Algorithm, Zhejiang Aerospace Hengjia Data Technology Co. Ltd., Jiaxing, China

49
Wu Y, Xie L. AI-driven multi-omics integration for multi-scale predictive modeling of genotype-environment-phenotype relationships. Comput Struct Biotechnol J 2025; 27:265-277. [PMID: 39886532 PMCID: PMC11779603 DOI: 10.1016/j.csbj.2024.12.030] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 10/31/2024] [Revised: 12/22/2024] [Accepted: 12/26/2024] [Indexed: 02/01/2025] Open
Abstract
Despite the wealth of single-cell multi-omics data, it remains challenging to predict the consequences of novel genetic and chemical perturbations in the human body. It requires knowledge of molecular interactions at all biological levels, encompassing disease models and humans. Current machine learning methods primarily establish statistical correlations between genotypes and phenotypes but struggle to identify physiologically significant causal factors, limiting their predictive power. Key challenges in predictive modeling include scarcity of labeled data, generalization across different domains, and disentangling causation from correlation. In light of recent advances in multi-omics data integration, we propose a new artificial intelligence (AI)-powered biology-inspired multi-scale modeling framework to tackle these issues. This framework will integrate multi-omics data across biological levels, organism hierarchies, and species to predict genotype-environment-phenotype relationships under various conditions. AI models inspired by biology may identify novel molecular targets, biomarkers, pharmaceutical agents, and personalized medicines for presently unmet medical needs.
Affiliation(s)
- You Wu
  - Ph.D. Program in Computer Science, The Graduate Center, The City University of New York, New York, NY, USA
- Lei Xie
  - Ph.D. Program in Computer Science, The Graduate Center, The City University of New York, New York, NY, USA
  - Ph.D. Program in Biology and Biochemistry, The Graduate Center, The City University of New York, New York, NY, USA
  - Department of Computer Science, Hunter College, The City University of New York, New York, NY, USA
  - Helen & Robert Appel Alzheimer's Disease Research Institute, Feil Family Brain & Mind Research Institute, Weill Cornell Medicine, Cornell University, New York, NY, USA

50
Bochtler M. How the technologies behind self-driving cars, social networks, ChatGPT, and DALL-E2 are changing structural biology. Bioessays 2025; 47:e2400155. [PMID: 39404756 DOI: 10.1002/bies.202400155] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/30/2024] [Revised: 09/08/2024] [Accepted: 09/26/2024] [Indexed: 12/22/2024]
Abstract
The performance of deep Neural Networks (NNs) in the text (ChatGPT) and image (DALL-E2) domains has attracted worldwide attention. Convolutional NNs (CNNs), Large Language Models (LLMs), Denoising Diffusion Probabilistic Models (DDPMs)/Noise Conditional Score Networks (NCSNs), and Graph NNs (GNNs) have impacted computer vision, language editing and translation, automated conversation, image generation, and social network management. Proteins can be viewed as texts written with the alphabet of amino acids, as images, or as graphs of interacting residues. Each of these perspectives suggests the use of tools from a different area of deep learning for protein structural biology. Here, I review how CNNs, LLMs, DDPMs/NCSNs, and GNNs have led to major advances in protein structure prediction, inverse folding, protein design, and small molecule design. This review is primarily intended as a deep learning primer for practicing experimental structural biologists. However, extensive references to the deep learning literature should also make it relevant to readers who have a background in machine learning, physics or statistics, and an interest in protein structural biology.
Affiliation(s)
- Matthias Bochtler
  - International Institute of Molecular and Cell Biology in Warsaw, Warsaw, Poland
  - Institute of Biochemistry and Biophysics, Warsaw, Poland