1
Li Z, Chen G, Tan G, Chen CYC. CoupleMDA: Metapath-Induced Structural-Semantic Coupling Network for miRNA-Disease Association Prediction. Int J Mol Sci 2025; 26:4948. [PMID: 40430088 PMCID: PMC12112494 DOI: 10.3390/ijms26104948] [Received: 03/22/2025] [Revised: 05/18/2025] [Accepted: 05/19/2025] [Indexed: 05/29/2025]
Abstract
The prediction of microRNA-disease associations (MDAs) is crucial for understanding disease mechanisms and for biomarker discovery. While graph neural networks (GNNs) have emerged as promising tools for MDA prediction, existing methods face critical limitations: (1) data leakage caused by improper use of Gaussian interaction profile (GIP) kernel similarity during feature construction, (2) self-validation loops in calculating miRNA functional similarity using known MDA data, and (3) information bottlenecks in conventional GNN architectures that flatten heterogeneous relationships and employ over-simplified decoders. To address these challenges, we propose CoupleMDA, a metapath-guided heterogeneous graph learning framework coupling structural and semantic features. The model constructs a biological heterogeneous network using independent data sources to eliminate feature-target space coupling. Our framework implements a two-stage encoding strategy: (1) relational graph convolutional networks (RGCN) for pre-encoding and (2) metapath-guided semantic aggregation for secondary encoding. During decoding, common metapaths between node pairs structurally guide feature pooling, mitigating information bottlenecks. Comprehensive evaluation shows that CoupleMDA achieves a 2-5% performance improvement over current state-of-the-art baseline methods in the heterogeneous graph link prediction task. Ablation studies confirm the necessity of each proposed component, while case analyses reveal the framework's capability to recover cancer-related miRNA-disease associations through biologically interpretable metapaths.
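As an illustrative aside on the first limitation named in the abstract, the sketch below computes GIP kernel similarity on a toy association matrix and shows how a held-out association changes the similarity when it is not masked first. The toy matrix, the function name `gip_kernel`, and the bandwidth normalization (gamma' divided by the mean squared profile norm, a common convention) are assumptions for illustration and are not taken from the CoupleMDA paper.

```python
import numpy as np


def gip_kernel(assoc: np.ndarray, gamma_prime: float = 1.0) -> np.ndarray:
    """Gaussian interaction profile similarity between the rows of `assoc`.

    Each row is the binary interaction profile of one entity (e.g. a miRNA).
    The bandwidth is gamma_prime divided by the mean squared profile norm.
    """
    sq_norms = (assoc ** 2).sum(axis=1)
    mean_sq_norm = sq_norms.mean()
    if mean_sq_norm == 0:
        # All profiles are empty, so every pair is identical.
        return np.ones((assoc.shape[0], assoc.shape[0]))
    gamma = gamma_prime / mean_sq_norm
    # Squared Euclidean distances between all pairs of profiles.
    sq_dists = sq_norms[:, None] + sq_norms[None, :] - 2.0 * assoc @ assoc.T
    sq_dists = np.maximum(sq_dists, 0.0)  # guard against tiny negatives
    return np.exp(-gamma * sq_dists)


# Hypothetical toy data: 3 miRNAs x 4 diseases.
assoc_full = np.array(
    [[1, 0, 1, 0],
     [1, 0, 1, 0],
     [0, 1, 0, 1]],
    dtype=float,
)

# Suppose the association (miRNA 1, disease 2) is held out for testing.
held_out = (1, 2)
assoc_train = assoc_full.copy()
assoc_train[held_out] = 0.0

leaky = gip_kernel(assoc_full)   # similarity computed with the test label visible
clean = gip_kernel(assoc_train)  # similarity computed from training labels only

print("similarity(miRNA 0, miRNA 1) with leakage:", leaky[0, 1])
print("similarity(miRNA 0, miRNA 1) without leakage:", clean[0, 1])
```

In this toy case the unmasked similarity between miRNA 0 and miRNA 1 is higher than the masked one, which is the kind of feature-target coupling the abstract describes as data leakage.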
Affiliation(s)
- Zhuojian Li
  - School of Intelligent Systems Engineering, Shenzhen Campus of Sun Yat-sen University, Shenzhen 518107, China
- Guanxing Chen
  - School of Intelligent Systems Engineering, Shenzhen Campus of Sun Yat-sen University, Shenzhen 518107, China
- Guang Tan
  - School of Intelligent Systems Engineering, Shenzhen Campus of Sun Yat-sen University, Shenzhen 518107, China
- Calvin Yu-Chian Chen
  - School of AI for Science, Peking University, Beijing 100871, China
  - State Key Laboratory of Chemical Oncogenomics, Key Laboratory of Chemical Genomics, Peking University Shenzhen Graduate School, Shenzhen 518055, China
  - Department of Medical Research, China Medical University Hospital, Taichung 40447, Taiwan
2
Jiang X, Wen L, Li W, Que D, Ming L. DTGHAT: multi-molecule heterogeneous graph transformer based on multi-molecule graph for drug-target identification. Front Pharmacol 2025; 16:1596216. [PMID: 40356956 PMCID: PMC12066497 DOI: 10.3389/fphar.2025.1596216] [Received: 03/19/2025] [Accepted: 04/14/2025] [Indexed: 05/15/2025]
Abstract
Introduction: Drug target identification is a fundamental step in drug discovery and plays a pivotal role in the development of new therapies. Existing computational methods focus on the direct interactions between drugs and targets, often ignoring the complex interrelationships among drugs, targets, and various biomolecules in the human system. Method: To address this limitation, we propose a novel prediction model named DTGHAT (Drug and Target Association Prediction using Heterogeneous Graph Attention Transformer based on Molecular Heterogeneous). DTGHAT utilizes a graph attention transformer to identify novel targets from 15 heterogeneous drug-gene-disease networks characterized by chemical, genomic, phenotypic, and cellular networks. Result: In a 5-fold cross-validation study, DTGHAT achieved an area under the receiver operating characteristic curve (AUC) of 0.9634, which is at least 4% higher than current state-of-the-art methods. Characterization ablation experiments highlight the importance of integrating biomolecular data from multiple sources in revealing drug-target interactions. In addition, a case study on cancer drugs further validates DTGHAT's effectiveness in identifying novel drug targets. DTGHAT is freely available at: https://github.com/stella-007/DTGHAT.git.
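For readers unfamiliar with the evaluation protocol mentioned in the abstract, the following is a minimal sketch of how an AUC can be computed and how samples can be split into five folds. It is not the DTGHAT evaluation code; the function names `roc_auc` and `k_fold_indices`, the seed, and the toy labels and scores are all assumptions for illustration.

```python
import numpy as np


def roc_auc(labels, scores) -> float:
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a correct pair.
    """
    labels = np.asarray(labels, dtype=bool)
    scores = np.asarray(scores, dtype=float)
    pos, neg = scores[labels], scores[~labels]
    if pos.size == 0 or neg.size == 0:
        raise ValueError("AUC needs at least one positive and one negative sample")
    greater = (pos[:, None] > neg[None, :]).sum()
    ties = (pos[:, None] == neg[None, :]).sum()
    return float((greater + 0.5 * ties) / (pos.size * neg.size))


def k_fold_indices(n_samples: int, k: int = 5, seed: int = 0):
    """Shuffle sample indices and split them into k nearly equal folds."""
    rng = np.random.default_rng(seed)
    return np.array_split(rng.permutation(n_samples), k)


# Toy example: four drug-target pairs with known labels and predicted scores.
toy_auc = roc_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])
print("toy AUC:", toy_auc)

# Each fold serves once as the test set; the remaining folds form the training set.
folds = k_fold_indices(n_samples=20, k=5)
for i, test_idx in enumerate(folds):
    train_idx = np.concatenate([f for j, f in enumerate(folds) if j != i])
    print(f"fold {i}: {train_idx.size} train / {test_idx.size} test")
```

The reported cross-validation AUC would then typically be the mean of the per-fold AUC values, although the abstract does not state how the folds were aggregated.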
Affiliation(s)
- Xinchen Jiang
  - The National Local Joint Engineering Laboratory of Animal Peptide Drug Development, College of Life Sciences, Hunan Normal University, Changsha, China
  - Hunan Provincial Key Laboratory of Neurorestoratology, The Second Affiliated Hospital of Hunan Normal University, Changsha, China
- Lu Wen
  - Hunan Provincial Key Laboratory of Neurorestoratology, The Second Affiliated Hospital of Hunan Normal University, Changsha, China
  - Department of Ophthalmology, 921 Hospital of Joint Logistics Support Force People’s Liberation Army of China (The Second Affiliated Hospital of Hunan Normal University), Changsha, China
- Wenshui Li
  - The National Local Joint Engineering Laboratory of Animal Peptide Drug Development, College of Life Sciences, Hunan Normal University, Changsha, China
  - Hunan Provincial Key Laboratory of Neurorestoratology, The Second Affiliated Hospital of Hunan Normal University, Changsha, China
- Deng Que
  - The National Local Joint Engineering Laboratory of Animal Peptide Drug Development, College of Life Sciences, Hunan Normal University, Changsha, China
  - Hunan Provincial Key Laboratory of Neurorestoratology, The Second Affiliated Hospital of Hunan Normal University, Changsha, China
  - Department of Neurology, 921 Hospital of Joint Logistics Support Force People’s Liberation Army of China (The Second Affiliated Hospital of Hunan Normal University), Changsha, China
- Lu Ming
  - The National Local Joint Engineering Laboratory of Animal Peptide Drug Development, College of Life Sciences, Hunan Normal University, Changsha, China
  - Hunan Provincial Key Laboratory of Neurorestoratology, The Second Affiliated Hospital of Hunan Normal University, Changsha, China
3
Xie J, Li W, You H, Zhang D. GraphTransNet: predicting epilepsy-related genes using a graph-augmented protein language model. Front Pharmacol 2025; 16:1584625. [PMID: 40235533 PMCID: PMC11996831 DOI: 10.3389/fphar.2025.1584625] [Received: 02/27/2025] [Accepted: 03/19/2025] [Indexed: 04/17/2025]
Abstract
Introduction: Epilepsy, a complex neurological disorder characterised by recurrent seizures and significant genetic heterogeneity, presents considerable challenges for accurate diagnosis and drug target identification. While traditional genome-wide association studies (GWAS) and sequencing technologies have advanced our understanding of epilepsy-related gene targets, they often struggle to identify novel and rare variants crucial for precise diagnosis and targeted drug development. The increasing availability of large-scale genomic data, coupled with the power of deep learning, offers a promising avenue for progress. Method: In this work, we introduce GraphTransNet, a novel hybrid neural network model designed for predicting epilepsy-associated gene targets, with direct implications for improved disease diagnosis and therapeutic target identification. GraphTransNet leverages protein language models (specifically ESM) to generate numerical embeddings from gene sequences. These embeddings are then processed by a novel architecture integrating transformer and convolutional neural network (CNN) components to predict epilepsy-related gene targets. Results: Our results demonstrate that GraphTransNet achieves high accuracy in identifying epilepsy targets, outperforming existing predictive tools in terms of both recall and precision for reliable disease diagnosis and effective drug target identification. Rigorous comparisons with established machine learning methods and other deep learning architectures further underscore the efficacy of GraphTransNet. Discussion: This approach represents a valuable computational tool for advancing epilepsy genetics research, with the potential to contribute to more accurate diagnostic strategies and the discovery of novel drug targets for improved treatment outcomes.
Affiliation(s)
- Junfeng Xie
  - College of Computer Science and Electronic Engineering, Hunan University, Changsha, China
- Wei Li
  - College of Computer Science and Electronic Engineering, Hunan University, Changsha, China
- Hairu You
  - School of Engineering, The University of Sydney, Sydney, NSW, Australia
- Dafang Zhang
  - College of Computer Science and Electronic Engineering, Hunan University, Changsha, China
4
Asim MN, Ibrahim MA, Asif T, Dengel A. RNA sequence analysis landscape: A comprehensive review of task types, databases, datasets, word embedding methods, and language models. Heliyon 2025; 11:e41488. [PMID: 39897847 PMCID: PMC11783440 DOI: 10.1016/j.heliyon.2024.e41488] [Received: 09/28/2024] [Revised: 12/23/2024] [Accepted: 12/24/2024] [Indexed: 02/04/2025]
Abstract
Deciphering the information in RNA sequences reveals their diverse roles in living organisms, including gene regulation and protein synthesis. Aberrations in RNA sequences, such as dysregulation and mutations, can drive a diverse spectrum of diseases including cancers, genetic disorders, and neurodegenerative conditions. Furthermore, researchers are harnessing RNA's therapeutic potential to transform traditional treatment paradigms into personalized therapies through the development of RNA-based drugs and gene therapies. To gain insights into biological functions, detect diseases at early stages, and develop potent therapeutics, researchers perform diverse types of RNA sequence analysis tasks. RNA sequence analysis through conventional wet-lab methods is expensive, time-consuming, and error-prone. Augmenting wet-lab experimental methods with Artificial Intelligence (AI) applications to enable large-scale RNA sequence analysis requires scientists to have comprehensive knowledge of both the RNA and AI fields. While molecular biologists encounter challenges in understanding AI methods, computer scientists often lack the basic foundations of RNA sequence analysis tasks. Considering the absence of comprehensive literature that bridges this research gap and promotes the development of AI-driven RNA sequence analysis applications, the contributions of this manuscript are manifold: (1) it equips AI researchers with the biological foundations of 47 distinct RNA sequence analysis tasks; (2) it sets the stage for developing benchmark datasets for these 47 tasks by summarizing the essentials of 64 different biological databases; (3) it presents applications of word embeddings and language models across the 47 tasks; and (4) it streamlines the development of new predictors by surveying the performance of 58 word-embedding-based and 70 language-model-based predictive pipelines, as well as top-performing traditional sequence-encoding-based predictors, across the 47 tasks.
Affiliation(s)
- Muhammad Nabeel Asim
  - German Research Center for Artificial Intelligence GmbH, Kaiserslautern, 67663, Germany
- Muhammad Ali Ibrahim
  - German Research Center for Artificial Intelligence GmbH, Kaiserslautern, 67663, Germany
  - Department of Computer Science, Rhineland-Palatinate Technical University of Kaiserslautern-Landau, Kaiserslautern, 67663, Germany
- Tayyaba Asif
  - Department of Computer Science, Rhineland-Palatinate Technical University of Kaiserslautern-Landau, Kaiserslautern, 67663, Germany
- Andreas Dengel
  - German Research Center for Artificial Intelligence GmbH, Kaiserslautern, 67663, Germany
  - Department of Computer Science, Rhineland-Palatinate Technical University of Kaiserslautern-Landau, Kaiserslautern, 67663, Germany
5
Wang S, Liu JX, Li F, Wang J, Gao YL. M3HOGAT: A Multi-View Multi-Modal Multi-Scale High-Order Graph Attention Network for Microbe-Disease Association Prediction. IEEE J Biomed Health Inform 2024; 28:6259-6267. [PMID: 39012741 DOI: 10.1109/jbhi.2024.3429128] [Indexed: 07/18/2024]
Abstract
Numerous scientific studies have found a link between diverse microorganisms in the human body and complex human diseases. Because traditional experimental approaches are time-consuming and expensive, using computational methods to identify microbes correlated with diseases is critical. In this paper, a new microbe-disease association prediction model called M3HOGAT is proposed, which combines a multi-view multi-modal network with a multi-scale feature fusion mechanism. First, a microbe-disease association network and multiple similarity views are constructed based on multi-source information. Then, considering that neighbor information from different orders may help in learning node representations, a higher-order graph attention network (HOGAT) is devised to aggregate neighbor information from different orders and extract microbe and disease features from the different networks and views. Given that the embedding features of microbes and diseases from different views vary in importance, a multi-scale feature fusion mechanism is employed to learn their interaction information, thereby generating the final features of microbes and diseases. Finally, an inner product decoder is used to reconstruct the microbe-disease association matrix. Compared with five state-of-the-art methods on the HMDAD and Disbiome datasets, 5-fold cross-validation results show that M3HOGAT achieves the best performance. Furthermore, case studies on asthma and obesity confirm the effectiveness of M3HOGAT in identifying potential disease-related microbes.
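The final decoding step described in the abstract can be sketched as follows. This is a minimal illustration rather than the M3HOGAT implementation: the embeddings are small hand-written placeholders, the sigmoid applied to the inner products is a common choice that the abstract does not specify, and the names `inner_product_decoder`, `microbe_emb`, and `disease_emb` are hypothetical.

```python
import numpy as np


def inner_product_decoder(microbe_emb: np.ndarray, disease_emb: np.ndarray) -> np.ndarray:
    """Reconstruct an association score matrix from node embeddings.

    Entry (i, j) is the sigmoid of the inner product between the embedding
    of microbe i and the embedding of disease j.
    """
    logits = microbe_emb @ disease_emb.T
    return 1.0 / (1.0 + np.exp(-logits))


# Placeholder embeddings: 3 microbes and 2 diseases in a 2-dimensional space.
microbe_emb = np.array([[1.0, 0.0],
                        [0.0, 1.0],
                        [0.0, 0.0]])
disease_emb = np.array([[1.0, 0.0],
                        [0.0, -1.0]])

scores = inner_product_decoder(microbe_emb, disease_emb)
print(scores.shape)  # one score per (microbe, disease) pair
print(scores)
```

Pairs whose embeddings point in similar directions receive scores above 0.5, and a zero embedding yields exactly 0.5 for every disease, which is why the quality of the learned embeddings determines the quality of the reconstructed matrix.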