1
Wang G, Chen H, Wang H, Fu Y, Shi C, Cao C, Hu X. Heterogeneous Graph Contrastive Learning with Graph Diffusion for Drug Repositioning. J Chem Inf Model 2025. [PMID: 40377926 DOI: 10.1021/acs.jcim.5c00435] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/18/2025]
Abstract
Drug repositioning, which identifies novel therapeutic applications for existing drugs, offers a cost-effective alternative to traditional drug development. However, effectively capturing the complex relationships between drugs and diseases remains challenging. We present HGCL-DR, a novel heterogeneous graph contrastive learning framework for drug repositioning that effectively integrates global and local feature representations through three key components. First, we introduce an improved heterogeneous graph contrastive learning approach to model drug-disease relationships. Second, for local feature extraction, we employ a bidirectional graph convolutional network with a subgraph generation strategy in the bipartite drug-disease association graph, while utilizing a graph diffusion process to capture long-range dependencies in drug-drug and disease-disease relation graphs. Third, for global feature extraction, we leverage contrastive learning in the heterogeneous graph to enhance embedding consistency across different feature spaces. Extensive experiments on four benchmark data sets using 10-fold cross-validation demonstrate that HGCL-DR consistently outperforms state-of-the-art baselines in AUPR, AUROC, and F1-score. Ablation studies confirm the significance of each proposed component, while case studies on Alzheimer's disease and breast neoplasms validate HGCL-DR's practical utility in identifying novel drug candidates. These results establish HGCL-DR as an effective approach for computational drug repositioning.
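To illustrate how a graph diffusion process can expose long-range dependencies in a relation graph, here is a minimal sketch. The abstract does not say which diffusion kernel HGCL-DR uses, so the truncated personalized-PageRank kernel, the function names, and the parameters `alpha` and `steps` are all assumptions made for illustration.

```python
def matmul(a, b):
    """Multiply two square matrices given as lists of rows."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def ppr_diffusion(adj, alpha=0.15, steps=10):
    """Truncated PPR diffusion: S = sum_{k=0..steps} alpha * (1 - alpha)^k * T^k."""
    n = len(adj)
    col_sums = [sum(adj[i][j] for i in range(n)) for j in range(n)]
    # Column-normalised transition matrix T (each non-empty column sums to 1).
    T = [[adj[i][j] / col_sums[j] if col_sums[j] else 0.0 for j in range(n)]
         for i in range(n)]
    power = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]  # T^0
    S = [[0.0] * n for _ in range(n)]
    weight = alpha
    for _ in range(steps + 1):
        for i in range(n):
            for j in range(n):
                S[i][j] += weight * power[i][j]
        power = matmul(T, power)   # advance to the next power of T
        weight *= 1.0 - alpha      # geometric decay of longer walks
    return S


# Hypothetical toy drug-drug graph: a path 0 - 1 - 2 with no direct 0-2 edge.
adj = [[0, 1, 0],
       [1, 0, 1],
       [0, 1, 0]]
S = ppr_diffusion(adj)
```

In this toy graph, nodes 0 and 2 share no edge, yet the diffused matrix `S` assigns them a positive weight through the two-hop path, which is the kind of long-range signal a diffusion step is meant to provide.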
Affiliation(s)
- Guishen Wang
  - School of Computer Science and Engineering, Changchun University of Technology, North Yuanda Street No. 3000, Changchun 130012, Jilin, China
- Honghan Chen
  - School of Computer Science and Engineering, Changchun University of Technology, North Yuanda Street No. 3000, Changchun 130012, Jilin, China
- Handan Wang
  - School of Computer Science and Engineering, Changchun University of Technology, North Yuanda Street No. 3000, Changchun 130012, Jilin, China
- Yuyouqiang Fu
  - School of Computer Science and Engineering, Changchun University of Technology, North Yuanda Street No. 3000, Changchun 130012, Jilin, China
- Caiye Shi
  - School of Computer Science and Engineering, Changchun University of Technology, North Yuanda Street No. 3000, Changchun 130012, Jilin, China
- Chen Cao
  - Key Laboratory for Bio-Electromagnetic Environment and Advanced Medical Theranostics, School of Biomedical Engineering and Informatics, Nanjing Medical University, Longmian Avenue No. 101, Nanjing 211166, Jiangsu, China
- Xiaowen Hu
  - School of Biomedical Engineering and Informatics, Nanjing Medical University, Longmian Avenue No. 101, Nanjing 211166, Jiangsu, China
2
He F, Duan L, Xing G, Chang X, Zhou H, Yu M. AMFGNN: an adaptive multi-view fusion graph neural network model for drug prediction. Front Pharmacol 2025; 16:1543966. [PMID: 40356971 PMCID: PMC12066569 DOI: 10.3389/fphar.2025.1543966] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/12/2024] [Accepted: 04/15/2025] [Indexed: 05/15/2025] Open
Abstract
Introduction: Drug development is a complex and lengthy process, and drug-disease association prediction aims to significantly improve research efficiency and success rates by precisely identifying potential associations. However, existing methods for drug-disease association prediction still face limitations in feature representation, feature integration, and generalization capabilities. Methods: To address these challenges, we propose a novel model named AMFGNN (Adaptive Multi-View Fusion Graph Neural Network). This model leverages an adaptive graph neural network and a graph attention network to extract drug features and disease features, respectively. These features are then used as the initial representations of nodes in the drug-disease association network to enable efficient information fusion. Additionally, the model incorporates a contrastive learning mechanism, which enhances the similarity and differentiation between drugs and diseases through cross-view contrastive learning, thereby improving the accuracy of association prediction. Furthermore, a Kolmogorov-Arnold network is employed to perform weighted fusion of various final features, optimizing prediction performance. Results: AMFGNN demonstrates a significant advantage in predictive performance, achieving an average AUC value of 0.9453, which reflects the model's high accuracy in prediction. Discussion: Cross-validation results across multiple datasets indicate that AMFGNN outperforms seven advanced drug-disease association prediction methods. Additionally, case studies on hepatoblastoma, asthma and Alzheimer's disease further confirm the model's effectiveness and potential value in real-world applications.
Affiliation(s)
- Fang He
  - Faculty of Pediatrics, The Chinese PLA General Hospital, Beijing, China
  - Department of Child Growth and Development Clinic, The Seventh Medical Center of PLA General Hospital, Beijing, China
  - National Engineering Laboratory for Birth Defects Prevention and Control of Key Technology, Beijing, China
  - Beijing Key Laboratory of Pediatric Organ Failure, Beijing, China
- Lian Duan
  - Faculty of Pediatrics, The Chinese PLA General Hospital, Beijing, China
  - National Engineering Laboratory for Birth Defects Prevention and Control of Key Technology, Beijing, China
  - Beijing Key Laboratory of Pediatric Organ Failure, Beijing, China
  - Department of Pediatric Surgery, The Seventh Medical Center of PLA General Hospital, Beijing, China
- Guodong Xing
  - Faculty of Pediatrics, The Chinese PLA General Hospital, Beijing, China
  - National Engineering Laboratory for Birth Defects Prevention and Control of Key Technology, Beijing, China
  - Beijing Key Laboratory of Pediatric Organ Failure, Beijing, China
  - Department of Pediatric Surgery, The Seventh Medical Center of PLA General Hospital, Beijing, China
- Xiaojing Chang
  - Faculty of Pediatrics, The Chinese PLA General Hospital, Beijing, China
  - National Engineering Laboratory for Birth Defects Prevention and Control of Key Technology, Beijing, China
  - Beijing Key Laboratory of Pediatric Organ Failure, Beijing, China
  - Department of Pediatric Surgery, The Seventh Medical Center of PLA General Hospital, Beijing, China
- Huixia Zhou
  - Faculty of Pediatrics, The Chinese PLA General Hospital, Beijing, China
  - National Engineering Laboratory for Birth Defects Prevention and Control of Key Technology, Beijing, China
  - Beijing Key Laboratory of Pediatric Organ Failure, Beijing, China
  - Department of Pediatric Surgery, The Seventh Medical Center of PLA General Hospital, Beijing, China
- Mengnan Yu
  - Faculty of Pediatrics, The Chinese PLA General Hospital, Beijing, China
  - National Engineering Laboratory for Birth Defects Prevention and Control of Key Technology, Beijing, China
  - Beijing Key Laboratory of Pediatric Organ Failure, Beijing, China
  - Department of Pediatric Surgery, The Seventh Medical Center of PLA General Hospital, Beijing, China
3
Shang Y, Wang Z, Chen Y, Yang X, Ren Z, Zeng X, Xu L. HNF-DDA: subgraph contrastive-driven transformer-style heterogeneous network embedding for drug-disease association prediction. BMC Biol 2025; 23:101. [PMID: 40241152 PMCID: PMC12004644 DOI: 10.1186/s12915-025-02206-x] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/15/2024] [Accepted: 04/03/2025] [Indexed: 04/18/2025] Open
Abstract
BACKGROUND: Drug-disease association (DDA) prediction aims to identify potential links between drugs and diseases, facilitating the discovery of new therapeutic potentials and reducing the cost and time associated with traditional drug development. However, existing DDA prediction methods often overlook the global relational information provided by other biological entities and the complex association structure between drugs and diseases, limiting the potential correlations of drug and disease embeddings. RESULTS: In this study, we propose HNF-DDA, a subgraph contrastive-driven transformer-style heterogeneous network embedding model for DDA prediction. Specifically, HNF-DDA adopts an all-pairs message passing strategy to capture the global structure of the network, fully integrating multi-omics information. HNF-DDA also proposes the concept of subgraph contrastive learning to capture the local structure of drug-disease subgraphs, learning the high-order semantic information of nodes. Experimental results on two benchmark datasets demonstrate that HNF-DDA outperforms several state-of-the-art methods. Additionally, it shows superior performance across different dataset splitting schemes, indicating HNF-DDA's capability to generalize to novel drug and disease categories. Case studies for breast cancer and prostate cancer reveal that 9 out of the top 10 predicted candidate drugs for breast cancer and 8 out of the top 10 for prostate cancer have documented therapeutic effects. CONCLUSIONS: HNF-DDA incorporates all-pairs message passing and subgraph capture strategies into heterogeneous network embedding, enabling effective learning of drug and disease representations enriched with heterogeneous information, while also demonstrating significant potential for applications in drug repositioning.
Affiliation(s)
- Yifan Shang
  - College of Computer Science and Electronic Engineering, Hunan University, Changsha, 410082, China
- Zixu Wang
  - Department of Computer Science, University of Tsukuba, Tsukuba, 305-8577, Japan
- Yangyang Chen
  - College of Computer Science and Electronic Engineering, Hunan University, Changsha, 410082, China
- Xinyu Yang
  - College of Computer Science and Electronic Engineering, Hunan University, Changsha, 410082, China
- Zhonghao Ren
  - College of Computer Science and Electronic Engineering, Hunan University, Changsha, 410082, China
- Xiangxiang Zeng
  - College of Computer Science and Electronic Engineering, Hunan University, Changsha, 410082, China
- Lei Xu
  - School of Electronic and Communication Engineering, Shenzhen Polytechnic University, Shenzhen, 518055, China
4
Le MH, Dao NA, Dang XT. Bayesian Inference for Drug Discovery by High Negative Samples and Oversampling. Bioinform Biol Insights 2025; 19:11779322251328269. [PMID: 40290635 PMCID: PMC12033409 DOI: 10.1177/11779322251328269] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/02/2024] [Accepted: 03/03/2025] [Indexed: 04/30/2025] Open
Abstract
Drug repositioning holds great promise for reducing the time and cost associated with traditional drug discovery, but it faces significant challenges related to data imbalance and noise in negative samples. In this article, we introduce a novel method leveraging high negative oversampling (HNO) to address these challenges. Our approach integrates HNO with advanced techniques such as network-based graph mining, matrix factorization, and Bayesian inference, specifically designed for imbalanced data scenarios. Constructing high-quality negative samples is crucial to mitigate the detrimental effects of noisy negative data and enhance model performance. Experimental results demonstrate the efficacy of our approach in enhancing the performance of drug discovery models by effectively managing data imbalance and refining the selection of negative samples. This methodology provides a robust framework for improving drug repositioning, with potential applications in broader biomedical domains.
5
Suay-García B, Climent J, Pérez-Gracia MT, Falcó A. A comprehensive update on the use of molecular topology applications for anti-infective drug discovery. Expert Opin Drug Discov 2025; 20:465-474. [PMID: 40056200 DOI: 10.1080/17460441.2025.2477625] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/11/2024] [Revised: 02/17/2025] [Accepted: 03/06/2025] [Indexed: 03/10/2025]
Abstract
INTRODUCTION: The rapid emergence of infectious diseases poses a significant threat to global economies and public health. To combat this, it is crucial to develop effective treatments. One essential tool in drug design is molecular topology, which uses topological indices to build QSAR models. This mathematical framework describes chemical compound structures, facilitating easy characterization. AREAS COVERED: Classical ligand-based molecular topology has a series of limitations that can be overcome by shifting focus to structure-based approaches. Recent developments have emerged, focusing on target protein topology rather than drug molecules. Techniques like TDA, ESPH, LWPH, and molecular GDL are among the new methods being explored. This review is based on literature searches utilizing PubMed, Web of Science, and Google Scholar to identify articles published between 2000 and 2024. EXPERT OPINION: The authors believe that it is time to move away from traditional molecular topology and toward innovative approaches and technologies. Shifting focus from ligand-based to structure-based molecular topology, combined with new databases and algorithms, can aid in fighting drug-resistant microorganisms. This shift opens a broader chemical space for developing new anti-infective drugs, ultimately improving public health outcomes.
Affiliation(s)
- Beatriz Suay-García
  - Departamento de Matemáticas, Física y Ciencias Tecnológicas, Universidad Cardenal Herrera-CEU, CEU Universities, Valencia, Spain
- Joan Climent
  - Departamento de Producción y Sanidad Animal, Salud Pública Veterinaria y Ciencia y Tecnología de los Alimentos, Facultad de Veterinaria, Universidad CEU Cardenal Herrera, CEU Universities, Valencia, Spain
- María Teresa Pérez-Gracia
  - Área de Microbiología, Departamento de Farmacia, Instituto de Ciencias Biomédicas, Facultad de Ciencias de la Salud Universidad Cardenal Herrera-CEU, CEU Universities, Alfara del Patriarca, Valencia, Spain
- Antonio Falcó
  - Departamento de Matemáticas, Física y Ciencias Tecnológicas, Universidad Cardenal Herrera-CEU, CEU Universities, Valencia, Spain
6
Li J, Chen J, Huang J, Lei X. Hyperbolic multivariate feature learning in higher-order heterogeneous networks for drug-disease prediction. Artif Intell Med 2025; 162:103090. [PMID: 39985835 DOI: 10.1016/j.artmed.2025.103090] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/22/2023] [Revised: 10/02/2024] [Accepted: 02/14/2025] [Indexed: 02/24/2025]
Abstract
New drug discovery has always been a costly, time-consuming process with a high failure rate. Repurposing existing drugs offers a valuable alternative and reduces the risks associated with developing new drugs. Various experimental methods have been employed to facilitate drug repositioning; however, predicting associations between drugs and diseases through biological experiments is both expensive and time-consuming. Consequently, it is imperative to develop efficient and highly precise computational methods for predicting these associations. Based on this, we propose a drug-disease association prediction method based on Hyperbolic Multivariate feature Learning in High-order Heterogeneous Networks for Drug-Disease Prediction, called H3ML. Our approach begins by mining high-order information from protein-disease and drug-protein networks to construct high-order heterogeneous networks. Subsequently, we employ multivariate feature learning to create hyperbolic representations, and then enhance the features of the heterogeneous network. Finally, we utilize a hyperbolic graph attention network in the hyperbolic space to aggregate neighbor information and perform the final prediction task. In addition, we evaluate the performance of H3ML by comparing it with some state-of-the-art methods across different datasets. The case study further validates the effectiveness of H3ML. Our implementation will be publicly available at: https://github.com/jianruichen/H-3ML.
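As a small illustration of what a hyperbolic representation changes, the sketch below computes distances in the Poincaré ball, one common model of hyperbolic space. The abstract does not state which hyperbolic model H3ML uses, so the choice of the Poincaré ball and the function name `poincare_distance` are assumptions.

```python
import math


def poincare_distance(u, v):
    """Distance between two points strictly inside the unit Poincare ball.

    d(u, v) = arcosh(1 + 2 * |u - v|^2 / ((1 - |u|^2) * (1 - |v|^2)))
    """
    sq_u = sum(a * a for a in u)
    sq_v = sum(b * b for b in v)
    if sq_u >= 1.0 or sq_v >= 1.0:
        raise ValueError("points must lie strictly inside the unit ball")
    sq_diff = sum((a - b) ** 2 for a, b in zip(u, v))
    return math.acosh(1.0 + 2.0 * sq_diff / ((1.0 - sq_u) * (1.0 - sq_v)))


origin = [0.0, 0.0]
near_centre = [0.5, 0.0]
near_boundary = [0.9, 0.0]

d_centre = poincare_distance(origin, near_centre)      # equals ln(3)
d_boundary = poincare_distance(origin, near_boundary)  # equals ln(19)
```

Distances grow rapidly toward the boundary of the ball, which is why hyperbolic embeddings are often used for hierarchical or tree-like networks.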
Affiliation(s)
- Jiamin Li
  - School of Computer Science, Shaanxi Normal University, Xi'an, China
- Jianrui Chen
  - School of Computer Science, Shaanxi Normal University, Xi'an, China; Key Laboratory of Modern Teaching Technology, Ministry of Education, Xi'an, China; Engineering Laboratory of Teaching Information Technology of Shaanxi Province, Xi'an, China
- Junjie Huang
  - School of Mathematical Sciences, Inner Mongolia University, Hohhot, China
- Xiujuan Lei
  - School of Computer Science, Shaanxi Normal University, Xi'an, China
7
Zhao X, Wang Q, Zhang Y, He C, Yin M, Zhao X. CBKG-DTI: Multi-Level Knowledge Distillation and Biomedical Knowledge Graph for Drug-Target Interaction Prediction. IEEE J Biomed Health Inform 2025; 29:2284-2296. [PMID: 40030432 DOI: 10.1109/jbhi.2024.3500027] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 03/05/2025]
Abstract
The prediction of drug-target interactions (DTIs) has emerged as a vital step in drug discovery. Recently, biomedical knowledge graphs have enabled the use of multi-omics resources for modelling complex biological systems and have further improved overall performance on specific predictive tasks. However, due to the scale and generality of biomedical knowledge graphs, it is necessary to capture task-specific knowledge from them for DTI prediction. Moreover, although a biomedical knowledge graph contains rich interactions between biological entities, the structural information of drugs and targets cannot be ignored and still needs to be incorporated through multi-modal fusion. To this end, we develop a novel DTI identification framework, CBKG-DTI, which aims to distill task-specific knowledge from the complex knowledge graph to the lightweight DTI prediction model. Specifically, CBKG-DTI first introduces a hierarchy-aware knowledge graph embedding as the teacher model to capture the semantic hierarchy information of the biomedical knowledge graph. Then, to further improve model performance, CBKG-DTI integrates information from multiple aspects, such as relational information and structural information, by constructing a heterogeneous network and then employs a heterogeneous graph attention network framework as the lightweight student model. Moreover, we design a multi-level distillation mechanism to improve the representation and prediction ability of the lightweight student model by capturing the representation and logit distribution of the teacher model. Finally, we conduct extensive comparison experiments and reach an AUC of 0.9751 and an AUPR of 0.6310 under 5-fold cross-validation. This not only demonstrates the superiority of CBKG-DTI in DTI prediction but also, more importantly, validates the effectiveness of the framework in capturing task-specific knowledge from the biomedical knowledge graph.
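The idea of distilling both the representation and the logit distribution of a teacher into a student can be sketched as a two-term loss. This is only an illustrative sketch: the temperature-softened KL term and the mean-squared-error term are standard distillation components, but whether CBKG-DTI uses exactly these terms is not stated in the abstract, and the names `distillation_loss`, `temperature`, and `beta` are hypothetical.

```python
import math


def softmax(logits, temperature=1.0):
    """Temperature-softened softmax, shifted by the max for numerical stability."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(teacher_logits, student_logits,
                      teacher_repr, student_repr,
                      temperature=2.0, beta=0.5):
    """Weighted sum of a logit-level KL term and a representation-level MSE term."""
    p_teacher = softmax(teacher_logits, temperature)
    p_student = softmax(student_logits, temperature)
    # KL(teacher || student), scaled by T^2 as is conventional in distillation.
    kl = sum(pt * math.log(pt / ps)
             for pt, ps in zip(p_teacher, p_student) if pt > 0.0)
    logit_term = temperature ** 2 * kl
    # Mean squared error between teacher and student embeddings.
    repr_term = sum((a - b) ** 2
                    for a, b in zip(teacher_repr, student_repr)) / len(teacher_repr)
    return beta * logit_term + (1.0 - beta) * repr_term


# Hypothetical teacher/student outputs for one drug-target pair.
matched = distillation_loss([2.0, -1.0], [2.0, -1.0], [0.3, 0.7], [0.3, 0.7])
mismatched = distillation_loss([2.0, -1.0], [-1.0, 2.0], [0.3, 0.7], [0.9, 0.1])
```

The loss vanishes when the student reproduces the teacher exactly and grows as either the predicted distribution or the embedding drifts away.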
8
Yang G, Liu Y, Wen S, Chen W, Zhu X, Wang Y. DTI-MHAPR: optimized drug-target interaction prediction via PCA-enhanced features and heterogeneous graph attention networks. BMC Bioinformatics 2025; 26:11. [PMID: 39800678 PMCID: PMC11726937 DOI: 10.1186/s12859-024-06021-z] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/01/2024] [Accepted: 12/20/2024] [Indexed: 01/16/2025] Open
Abstract
Drug-target interactions (DTIs) are pivotal in drug discovery and development, and their accurate identification can significantly expedite the process. Numerous DTI prediction methods have emerged, yet many fail to fully harness the feature information of drugs and targets or address the issue of feature redundancy. We aim to refine DTI prediction accuracy by eliminating redundant features and capitalizing on the node topological structure to enhance feature extraction. To achieve this, we introduce a PCA-augmented multi-layer heterogeneous graph-based network that concentrates on key features throughout the encoding-decoding phase. Our approach initiates with the construction of a heterogeneous graph from various similarity metrics, which is then encoded via a graph neural network. We concatenate and integrate the resultant representation vectors to merge multi-level information. Subsequently, principal component analysis is applied to distill the most informative features, with the random forest algorithm employed for the final decoding of the integrated data. Our method outperforms six baseline models in terms of accuracy, as demonstrated by extensive experimentation. Comprehensive ablation studies, visualization of results, and in-depth case analyses further validate our framework's efficacy and interpretability, providing a novel tool for drug discovery that integrates multimodal features.
Affiliation(s)
- Guang Yang
  - School of Information and Artificial Intelligence, Anhui Agricultural University, Changjiang West Road, Hefei, 230036, Anhui, China
- Yinbo Liu
  - School of Information and Artificial Intelligence, Anhui Agricultural University, Changjiang West Road, Hefei, 230036, Anhui, China
- Sijian Wen
  - School of Information and Artificial Intelligence, Anhui Agricultural University, Changjiang West Road, Hefei, 230036, Anhui, China
- Wenxi Chen
  - School of Information and Artificial Intelligence, Anhui Agricultural University, Changjiang West Road, Hefei, 230036, Anhui, China
- Xiaolei Zhu
  - School of Information and Artificial Intelligence, Anhui Agricultural University, Changjiang West Road, Hefei, 230036, Anhui, China
- Yongmei Wang
  - School of Information and Artificial Intelligence, Anhui Agricultural University, Changjiang West Road, Hefei, 230036, Anhui, China
9
Tang X, Hou Y, Meng Y, Wang Z, Lu C, Lv J, Hu X, Xu J, Yang J. CDPMF-DDA: contrastive deep probabilistic matrix factorization for drug-disease association prediction. BMC Bioinformatics 2025; 26:5. [PMID: 39773275 PMCID: PMC11708303 DOI: 10.1186/s12859-024-06032-w] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/30/2024] [Accepted: 12/27/2024] [Indexed: 01/11/2025] Open
Abstract
The process of new drug development is complex, whereas drug-disease association (DDA) prediction aims to identify new therapeutic uses for existing medications. However, existing graph contrastive learning approaches typically rely on single-view contrastive learning, which struggles to fully capture drug-disease relationships. To address this, we introduce a novel multi-view contrastive learning framework, named CDPMF-DDA, which enhances the model's ability to capture drug-disease associations by incorporating diverse information representations from different views. First, we decompose the original drug-disease association matrix into drug and disease feature matrices, which are then used to reconstruct the drug-disease association network, as well as the drug-drug and disease-disease similarity networks. This process effectively reduces noise in the data, establishing a reliable foundation for the networks produced. Next, we generate multiple contrastive views from both the original and generated networks. These views effectively capture hidden feature associations, significantly enhancing the model's ability to represent complex relationships. Extensive cross-validation experiments on three standard datasets show that CDPMF-DDA achieves an average AUC of 0.9475 and an AUPR of 0.5009, outperforming existing models. Additionally, case studies on Alzheimer's disease and epilepsy further validate the model's effectiveness, demonstrating its high accuracy and robustness in drug-disease association prediction. Based on a multi-view contrastive learning framework, CDPMF-DDA is capable of integrating multi-source information and effectively capturing complex drug-disease associations, making it a powerful tool for drug repositioning and the discovery of new therapeutic strategies.
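Contrasting two views of the same nodes can be sketched with an InfoNCE-style objective, in which each node's embedding in one view should be closer to its own embedding in the other view than to any other node's. The abstract does not specify the loss CDPMF-DDA optimises, so the InfoNCE form, the cosine similarity, and the temperature `tau` below are assumptions for illustration.

```python
import math


def cosine(u, v):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm


def info_nce(view_a, view_b, tau=0.5):
    """Mean InfoNCE loss; row i of view_a and row i of view_b form the positive pair."""
    losses = []
    for i, anchor in enumerate(view_a):
        scores = [math.exp(cosine(anchor, other) / tau) for other in view_b]
        losses.append(-math.log(scores[i] / sum(scores)))
    return sum(losses) / len(losses)


# Hypothetical two-node embeddings from an "original" and a "generated" view.
original = [[1.0, 0.0], [0.0, 1.0]]
aligned = info_nce(original, [[1.0, 0.0], [0.0, 1.0]])   # views agree
shuffled = info_nce(original, [[0.0, 1.0], [1.0, 0.0]])  # views disagree
```

The loss is lower when the two views agree on each node, so minimising it pushes the view-specific embeddings toward consistency.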
Affiliation(s)
- Xianfang Tang
  - School of Computer Science and Artificial Intelligence, Wuhan Textile University, Wuhan, 430200, China
- Yawen Hou
  - School of Computer Science and Artificial Intelligence, Wuhan Textile University, Wuhan, 430200, China
- Yajie Meng
  - School of Computer Science and Artificial Intelligence, Wuhan Textile University, Wuhan, 430200, China
- Zhaojing Wang
  - School of Computer Science and Artificial Intelligence, Wuhan Textile University, Wuhan, 430200, China
- Changcheng Lu
  - College of Computer Science and Electronic Engineering, Hunan University, Changsha, 410082, China
- Juan Lv
  - College of Traditional Chinese Medicine, Changsha Medical University, Changsha, 410000, China
- Xinrong Hu
  - School of Computer Science and Artificial Intelligence, Wuhan Textile University, Wuhan, 430200, China
- Junlin Xu
  - School of Computer Science and Technology, Wuhan University of Science and Technology, Wuhan, 430065, Hubei, China
10
Wang C, Kumar GA, Rajapakse JC. Drug discovery and mechanism prediction with explainable graph neural networks. Sci Rep 2025; 15:179. [PMID: 39747341 PMCID: PMC11696803 DOI: 10.1038/s41598-024-83090-3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/16/2024] [Accepted: 12/11/2024] [Indexed: 01/04/2025] Open
Abstract
Understanding drug action mechanisms is paramount for drug response prediction and precision medicine. The unprecedented development of machine learning and deep learning algorithms has expedited drug response prediction research. However, existing methods mainly focus on forward encoding of drugs to obtain an accurate prediction of response levels, but neglect to decipher the reaction mechanism between drug molecules and genes. We propose the eXplainable Graph-based Drug response Prediction (XGDP) approach, which achieves precise drug response prediction and reveals the comprehensive mechanism of action between drugs and their targets. XGDP represents drugs with molecular graphs, which naturally preserve the structural information of molecules, and a Graph Neural Network module is applied to learn the latent features of molecules. Gene expression data from cancer cell lines are incorporated and processed by a Convolutional Neural Network module. A couple of deep learning attribution algorithms are leveraged to interpret interactions between drug molecular features and genes. We demonstrate that XGDP not only enhances the prediction accuracy compared to pioneering works but is also capable of capturing the salient functional groups of drugs and interactions with significant genes of cancer cells.
Affiliation(s)
- Conghao Wang
  - College of Computing and Data Science, Nanyang Technological University, Singapore, 639798, Singapore
- Gaurav Asok Kumar
  - College of Computing and Data Science, Nanyang Technological University, Singapore, 639798, Singapore
- Jagath C Rajapakse
  - College of Computing and Data Science, Nanyang Technological University, Singapore, 639798, Singapore
11
Van Norden M, Mangione W, Falls Z, Samudrala R. Strategies for robust, accurate, and generalizable benchmarking of drug discovery platforms. bioRxiv 2024:2024.12.10.627863. [PMID: 39764006 PMCID: PMC11702551 DOI: 10.1101/2024.12.10.627863] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/15/2025]
Abstract
Benchmarking is an important step in the improvement, assessment, and comparison of the performance of drug discovery platforms and technologies. We revised the existing benchmarking protocols in our Computational Analysis of Novel Drug Opportunities (CANDO) multiscale therapeutic discovery platform to improve utility and performance. We optimized multiple parameters used in drug candidate prediction and assessment with these updated benchmarking protocols. CANDO ranked 7.4% of known drugs in the top 10 compounds for their respective diseases/indications based on drug-indication associations/mappings obtained from the Comparative Toxicogenomics Database (CTD) using these optimized parameters. This increased to 12.1% when drug-indication mappings were obtained from the Therapeutic Targets Database. Performance on an indication was weakly correlated (Spearman correlation coefficient >0.3) with indication size (number of drugs associated with an indication) and moderately correlated (correlation coefficient >0.5) with compound chemical similarity. There was also moderate correlation between our new and original benchmarking protocols when assessing performance per indication using each protocol. Benchmarking results were also dependent on the source of the drug-indication mapping used: a higher proportion of indication-associated drugs were recalled in the top 100 compounds when using the Therapeutic Targets Database (TTD), which only includes FDA-approved drug-indication associations (in contrast to the CTD, which includes associations drawn from the literature). We also created compbench, a publicly available head-to-head benchmarking protocol that allows consistent assessment and comparison of different drug discovery platforms. Using this protocol, we compared two pipelines for drug repurposing within CANDO; our primary pipeline outperformed another similarity-based pipeline still in development that clusters signatures based on their associated Gene Ontology terms. Our study sets a precedent for the complete, comprehensive, and comparable benchmarking of drug discovery platforms, resulting in more accurate drug candidate predictions.
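The headline metric above, the share of known drugs ranked within the top k compounds for their indication, can be sketched as follows. This is a simplified illustration and not the CANDO or compbench protocol itself; the function name `top_k_recall`, the data layout, and the toy identifiers are all hypothetical.

```python
def top_k_recall(rankings, known, k=10):
    """Fraction of known drug-indication pairs whose drug appears in the top k.

    rankings: indication -> list of compound IDs, best candidate first.
    known:    indication -> set of compound IDs with a known association.
    """
    hits = 0
    total = 0
    for indication, drugs in known.items():
        top = set(rankings.get(indication, [])[:k])
        hits += sum(1 for drug in drugs if drug in top)
        total += len(drugs)
    return hits / total if total else 0.0


# Hypothetical toy data: two indications, three known associations.
rankings = {
    "indication_1": ["a", "b", "c"],
    "indication_2": ["x", "y", "z"],
}
known = {
    "indication_1": {"a", "c"},
    "indication_2": {"z"},
}
recall_at_2 = top_k_recall(rankings, known, k=2)  # only "a" is in a top 2
recall_at_3 = top_k_recall(rankings, known, k=3)  # every known drug is recovered
```

Swapping the `known` mapping for one drawn from a different source changes the result even when `rankings` is fixed, which mirrors the abstract's observation that benchmark outcomes depend on the drug-indication mapping used.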
Affiliation(s)
- Melissa Van Norden
  - Department of Biomedical Informatics, Jacobs School of Medicine and Biomedical Sciences, University at Buffalo, State University of New York, Buffalo, NY, USA
- William Mangione
  - Department of Biomedical Informatics, Jacobs School of Medicine and Biomedical Sciences, University at Buffalo, State University of New York, Buffalo, NY, USA
- Zackary Falls
  - Department of Biomedical Informatics, Jacobs School of Medicine and Biomedical Sciences, University at Buffalo, State University of New York, Buffalo, NY, USA
- Ram Samudrala
  - Department of Biomedical Informatics, Jacobs School of Medicine and Biomedical Sciences, University at Buffalo, State University of New York, Buffalo, NY, USA
12
Ceskoutsé RFT, Bomgni AB, Gnimpieba Zanfack DR, Agany DDM, Thomas BB, Zohim EG. HeteroKGRep: Heterogeneous Knowledge Graph based Drug Repositioning. Knowl Based Syst 2024; 305:112638. [PMID: 39610660 PMCID: PMC11600970 DOI: 10.1016/j.knosys.2024.112638] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/30/2024]
Abstract
The process of developing new drugs is both time-consuming and costly, often taking over a decade and billions of dollars to obtain regulatory approval. Additionally, the complexity of patent protection for novel compounds presents challenges for pharmaceutical innovation. Drug repositioning offers an alternative strategy to uncover new therapeutic uses for existing medicines. Previous repositioning models have been limited by their reliance on homogeneous data sources, failing to leverage the rich information available in heterogeneous biomedical knowledge graphs. We propose HeteroKGRep, a novel drug repositioning model that utilizes heterogeneous graphs to address these limitations. HeteroKGRep is a multi-step framework that first generates a similarity graph from hierarchical concept relations. It then applies SMOTE over-sampling to address class imbalance before generating node sequences using a heterogeneous graph neural network. Drug and disease embeddings are extracted from the network and used for prediction. We evaluated HeteroKGRep on a graph containing biomedical concepts and relations from ontologies, pathways and literature. It achieved state-of-the-art performance with 99% accuracy, 95% AUC-ROC and 94% average precision on predicting repurposing opportunities. Compared to existing homogeneous approaches, HeteroKGRep leverages diverse knowledge sources to enrich representation learning. Based on heterogeneous graphs, HeteroKGRep can discover new drug-disease associations, leveraging de novo drug development. This work establishes a promising new paradigm for knowledge-guided drug repositioning using multimodal biomedical data.
Affiliation(s)
- Ribot Fleury T Ceskoutsé
- Ecole Nationale Supérieure Polytechnique, University of Yaounde I, P.O. Box. 8390, Yaoundé, Cameroon
- Alain Bertrand Bomgni
- University of South Dakota, 4800 N Career Avenue, 57107, SD, USA
- Department of Mathematics and Computer Science, University of Dschang, P.O. Box. 67, Dschang, Cameroon
- David R Gnimpieba Zanfack
- Laboratory of Innovative Technologies (LTI), University of Picardie Jules Verne (UPJV), 48 Rue Raspail, 02100 Saint Quentin, France
- Diing D M Agany
- University of South Dakota, 4800 N Career Avenue, 57107, SD, USA
- Bouetou Bouetou Thomas
- Ecole Nationale Supérieure Polytechnique, University of Yaounde I, P.O. Box. 8390, Yaoundé, Cameroon

13
Zhang D, Yu N, Sun X, Li H, Zhang W, Qiao X, Zhang W, Gao R. Deciphering spatial domains from spatially resolved transcriptomics through spatially regularized deep graph networks. BMC Genomics 2024; 25:1160. [PMID: 39614161 DOI: 10.1186/s12864-024-11072-w] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/17/2024] [Accepted: 11/21/2024] [Indexed: 12/01/2024] Open
Abstract
BACKGROUND Recent advancements in spatially resolved transcriptomics (SRT) have opened up unprecedented opportunities to explore gene expression patterns within spatial contexts. Deciphering spatial domains is a critical task in spatial transcriptomic data analysis, aiding in the elucidation of tissue structural heterogeneity and biological functions. However, existing spatial domain detection methods ignore the consistency of expression patterns and spatial arrangements between spots, as well as the severe gene dropout phenomenon present in SRT data, resulting in suboptimal performance in identifying tissue spatial heterogeneity. RESULTS In this paper, we introduce a novel framework, spatially regularized deep graph networks (SR-DGN), which integrates gene expression profiles with spatial information to learn spatially-consistent and informative spot representations. Specifically, SR-DGN employs graph attention networks (GAT) to adaptively aggregate gene expression information from neighboring spots, considering local expression patterns between spots. In addition, the spatial regularization constraint ensures the consistency of neighborhood relationships between physical and embedded spaces in an end-to-end manner. SR-DGN also employs cross-entropy (CE) loss to model gene expression states, effectively mitigating the impact of noisy gene dropouts. CONCLUSIONS Experimental results demonstrate that SR-DGN outperforms state-of-the-art methods in spatial domain identification across SRT data from different sequencing platforms. Moreover, SR-DGN is capable of recovering known microanatomical structures, yielding clearer low-dimensional visualizations and more accurate spatial trajectory inferences.
Affiliation(s)
- Daoliang Zhang
- Center of Intelligent Medicine, School of Control Science and Engineering, Shandong University, Jinan, Shandong, 250061, China
- Na Yu
- Center of Intelligent Medicine, School of Control Science and Engineering, Shandong University, Jinan, Shandong, 250061, China
- Xue Sun
- Center of Intelligent Medicine, School of Control Science and Engineering, Shandong University, Jinan, Shandong, 250061, China
- Haoyang Li
- Center of Intelligent Medicine, School of Control Science and Engineering, Shandong University, Jinan, Shandong, 250061, China
- Wenjing Zhang
- Center of Intelligent Medicine, School of Control Science and Engineering, Shandong University, Jinan, Shandong, 250061, China
- Xu Qiao
- Center of Intelligent Medicine, School of Control Science and Engineering, Shandong University, Jinan, Shandong, 250061, China.
- Wei Zhang
- Center of Intelligent Medicine, School of Control Science and Engineering, Shandong University, Jinan, Shandong, 250061, China.
- Rui Gao
- Center of Intelligent Medicine, School of Control Science and Engineering, Shandong University, Jinan, Shandong, 250061, China.

14
Liu T, Wang S, Zhang Y, Li Y, Liu Y, Huang S. TIWMFLP: Two-Tier Interactive Weighted Matrix Factorization and Label Propagation Based on Similarity Matrix Fusion for Drug-Disease Association Prediction. J Chem Inf Model 2024; 64:8641-8654. [PMID: 39486090 DOI: 10.1021/acs.jcim.4c01589] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/04/2024]
Abstract
Accurately identifying new therapeutic uses for drugs is crucial for advancing pharmaceutical research and development. Matrix factorization is often used in association prediction due to its simplicity and high interpretability. However, existing matrix factorization models do not enable real-time interaction between molecular feature matrices and similarity matrices, nor do they consider the geometric structure of the matrices. Additionally, efficiently integrating multisource data remains a significant challenge. To address these issues, we propose a two-tier interactive weighted matrix factorization and label propagation model based on similarity matrix fusion (TIWMFLP) to assist in personalized treatment. First, we calculate the Gaussian and Laplace kernel similarities for drugs and diseases using known drug-disease associations. We then introduce a new multisource similarity fusion method, called similarity matrix fusion (SMF), to integrate these drug/disease similarities. SMF not only considers the different contributions represented by each neighbor but also incorporates drug-disease association information to enhance the contextual topological relationships and potential features of each drug/disease node in the network. Second, we innovatively developed a two-tier interactive weighted matrix factorization (TIWMF) method to process three biological networks. This method realizes for the first time the real-time interaction between the drug/disease feature matrix and its similarity matrix, allowing for a better capture of the complex relationships between drugs and diseases. Additionally, the weighted matrix of the drug/disease similarity matrix is introduced to preserve the underlying structure of the similarity matrix. Finally, the label propagation algorithm makes predictions based on the three updated biological networks. 
Experimental outcomes reveal that TIWMFLP consistently surpasses state-of-the-art models on four drug-disease data sets, two small molecule-miRNA data sets, and one miRNA-disease data set.
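The first step described above, computing a Gaussian kernel similarity from known associations, can be sketched as follows. This is a minimal illustration assuming the commonly used Gaussian interaction profile form, where each drug is represented by its row of the binary association matrix and the bandwidth is normalized by the mean squared row norm. The function name, the `gamma_prime` parameter, and the toy matrix are assumptions; the abstract does not give TIWMFLP's exact kernel settings, and the Laplace kernel and SMF fusion are not shown.

```python
import math
from typing import List


def gaussian_profile_similarity(
    assoc: List[List[int]], gamma_prime: float = 1.0
) -> List[List[float]]:
    """Gaussian kernel between rows (interaction profiles) of a binary association matrix."""
    n = len(assoc)
    # Normalize the bandwidth by the average squared norm of the profiles.
    mean_sq_norm = sum(sum(v * v for v in row) for row in assoc) / n
    gamma = gamma_prime / mean_sq_norm if mean_sq_norm > 0 else 0.0
    sim = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            sq_dist = sum((a - b) ** 2 for a, b in zip(assoc[i], assoc[j]))
            sim[i][j] = math.exp(-gamma * sq_dist)
    return sim


# Toy drug-by-disease association matrix (rows: drugs), invented for illustration.
assoc = [
    [1, 0, 1],
    [1, 0, 1],
    [0, 1, 0],
]
sim = gaussian_profile_similarity(assoc)
```

Passing the transposed matrix gives the corresponding disease-by-disease similarity.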
Affiliation(s)
- Tiyao Liu
- College of Computer Science and Technology, Qingdao Institute of Software, China University of Petroleum, Qingdao 266580, China
- Shudong Wang
- College of Computer Science and Technology, Qingdao Institute of Software, China University of Petroleum, Qingdao 266580, China
- Yuanyuan Zhang
- School of Information and Control Engineering, Qingdao University of Technology, Qingdao 266525, China
- Yunyin Li
- College of Computer Science and Technology, Qingdao Institute of Software, China University of Petroleum, Qingdao 266580, China
- Yingye Liu
- College of Computer Science and Technology, Qingdao Institute of Software, China University of Petroleum, Qingdao 266580, China
- Shiyuan Huang
- College of Computer Science and Technology, Qingdao Institute of Software, China University of Petroleum, Qingdao 266580, China

15
Cui H, Duan M, Bi H, Li X, Hou X, Zhang Y. Heterogeneous graph contrastive learning with gradient balance for drug repositioning. Brief Bioinform 2024; 26:bbae650. [PMID: 39692448 DOI: 10.1093/bib/bbae650] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/18/2024] [Revised: 11/02/2024] [Accepted: 11/29/2024] [Indexed: 12/19/2024] Open
Abstract
Drug repositioning, which involves identifying new therapeutic indications for approved drugs, is pivotal in accelerating drug discovery. Recently, to mitigate the effect of label sparsity on inferring potential drug-disease associations (DDAs), graph contrastive learning (GCL) has emerged as a promising paradigm to supplement high-quality self-supervised signals through designing auxiliary tasks, then transfer shareable knowledge to main task, i.e. DDA prediction. However, existing approaches still encounter two limitations. The first is how to generate augmented views for fully capturing higher-order interaction semantics. The second is the optimization imbalance issue between auxiliary and main tasks. In this paper, we propose a novel heterogeneous Graph Contrastive learning method with Gradient Balance for DDA prediction, namely GCGB. To handle the first challenge, a fusion view is introduced to integrate both semantic views (drug and disease similarity networks) and interaction view (heterogeneous biomedical network). Next, inter-view contrastive learning auxiliary tasks are designed to contrast the fusion view with semantic and interaction views, respectively. For the second challenge, we adaptively adjust the gradient of GCL auxiliary tasks from the perspective of gradient direction and magnitude for better guiding parameter update toward main task. Extensive experiments conducted on three benchmarks under 10-fold cross-validation demonstrate the model effectiveness.
Affiliation(s)
- Hai Cui
- Information Science and Technology College, Dalian Maritime University, No.1 Linghai Road, Dalian 116026, Liaoning, China
- Meiyu Duan
- Information Science and Technology College, Dalian Maritime University, No.1 Linghai Road, Dalian 116026, Liaoning, China
- Haijia Bi
- College of Computer Science and Technology, Jilin University, No.2699 Qianjin Street, Changchun 130012, Jilin, China
- Xiaobo Li
- Information Science and Technology College, Dalian Maritime University, No.1 Linghai Road, Dalian 116026, Liaoning, China
- Xiaodi Hou
- Information Science and Technology College, Dalian Maritime University, No.1 Linghai Road, Dalian 116026, Liaoning, China
- Yijia Zhang
- Information Science and Technology College, Dalian Maritime University, No.1 Linghai Road, Dalian 116026, Liaoning, China

16
Muniyappan S, Rayan AXA, Varrieth GT. DRADTiP: Drug repurposing for aging disease through drug-target interaction prediction. Comput Biol Med 2024; 182:109145. [PMID: 39305733 DOI: 10.1016/j.compbiomed.2024.109145] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/03/2024] [Revised: 09/08/2024] [Accepted: 09/08/2024] [Indexed: 11/14/2024]
Abstract
MOTIVATION Aging is the greatest risk factor for many non-communicable diseases. Studies on model organisms have demonstrated that genetic and chemical perturbations can extend lifespan and improve overall health. However, finding longevity-enhancing medications and their related targets is difficult. METHOD In this work, we designed a novel drug repurposing model by identifying the interaction between aging-related genes or targets and drugs similar to aging disease. Each disease is associated with specific genetic factors underlying its occurrence. These factors include gene expression, pathway, miRNA, and the degree of genes in the protein-protein interaction network. In this paper, we aim to find drugs that prolong the human life span, together with their aging-related targets, using the above-mentioned factors. In addition, the contribution or importance of each factor may vary among drugs and targets. Therefore, we designed a novel multi-layer random walk-based network representation learning model, including node and edge weights, to learn the features of drugs and targets respectively. RESULT The performance of the proposed model is demonstrated using k-fold cross-validation (k = 5). The model achieved scores of 0.93 and 0.91 for precision and recall, respectively. The drugs identified by the system are evaluated to be potential candidates for aging because the degree of interaction between the potential drugs and their gene sets is high. In addition, the genes that interact with the drugs produce the same biological functions. Hence, the human life span will be prolonged.
Affiliation(s)
- Saranya Muniyappan
- Computer Science and Engineering, CEG Campus, Anna University, Chennai, Tamil Nadu, India.

17
Baek B, Lee H. Crossfeat: a transformer-based cross-feature learning model for predicting drug side effect frequency. BMC Bioinformatics 2024; 25:324. [PMID: 39379821 PMCID: PMC11459996 DOI: 10.1186/s12859-024-05915-2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/08/2024] [Accepted: 08/23/2024] [Indexed: 10/10/2024] Open
Abstract
BACKGROUND Safe drug treatment requires an understanding of the potential side effects. Identifying the frequency of drug side effects can reduce the risks associated with drug use. However, existing computational methods for predicting drug side effect frequencies heavily depend on known drug side effect frequency information. Consequently, these methods face challenges when predicting the side effect frequencies of new drugs. Although a few methods can predict the side effect frequencies of new drugs, they exhibit unreliable performance owing to the exclusion of drug-side effect relationships. RESULTS This study proposed CrossFeat, a model based on convolutional neural network-transformer architecture with cross-feature learning that can predict the occurrence and frequency of drug side effects for new drugs, even in the absence of information regarding drug-side effect relationships. CrossFeat facilitates the concurrent learning of drugs and side effect information within its transformer architecture. This simultaneous exchange of information enables drugs to learn about their associated side effects, while side effects concurrently acquire information about the respective drugs. Such bidirectional learning allows for the comprehensive integration of drug and side effect knowledge. Our five-fold cross-validation experiments demonstrated that CrossFeat outperforms existing studies in predicting side effect frequencies for new drugs without prior knowledge. CONCLUSIONS Our model offers a promising approach for predicting the drug side effect frequencies, particularly for new drugs where prior information is limited. CrossFeat's superior performance in cross-validation experiments, along with evidence from case studies and ablation experiments, highlights its effectiveness.
Affiliation(s)
- Bin Baek
- School of Electrical Engineering and Computer Science, Gwangju Institute of Science and Technology, Gwangju, 61005, Korea
- Hyunju Lee
- School of Electrical Engineering and Computer Science, Gwangju Institute of Science and Technology, Gwangju, 61005, Korea.
- AI Graduate School, Gwangju Institute of Science and Technology, Gwangju, 61005, Korea.

18
Wei J, Zhuo L, Fu X, Zeng X, Wang L, Zou Q, Cao D. DrugReAlign: a multisource prompt framework for drug repurposing based on large language models. BMC Biol 2024; 22:226. [PMID: 39379930 PMCID: PMC11463036 DOI: 10.1186/s12915-024-02028-3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/17/2024] [Accepted: 10/01/2024] [Indexed: 10/10/2024] Open
Abstract
Drug repurposing is a promising approach in the field of drug discovery owing to its efficiency and cost-effectiveness. Most current drug repurposing models rely on specific datasets for training, which limits their predictive accuracy and scope. The number of both market-approved and experimental drugs is vast, forming an extensive molecular space. Due to limitations in parameter size and data volume, traditional drug-target interaction (DTI) prediction models struggle to generalize well within such a broad space. In contrast, large language models (LLMs), with their vast parameter sizes and extensive training data, demonstrate certain advantages in drug repurposing tasks. In our research, we introduce a novel drug repurposing framework, DrugReAlign, based on LLMs and multi-source prompt techniques, designed to fully exploit the potential of existing drugs efficiently. Leveraging LLMs, the DrugReAlign framework acquires general knowledge about targets and drugs from extensive human knowledge bases, overcoming the data availability limitations of traditional approaches. Furthermore, we collected target summaries and target-drug space interaction data from databases as multi-source prompts, substantially improving LLM performance in drug repurposing. We validated the efficiency and reliability of the proposed framework through molecular docking and DTI datasets. Significantly, our findings suggest a direct correlation between the accuracy of LLMs' target analysis and the quality of prediction outcomes. These findings signify that the proposed framework holds the promise of inaugurating a new paradigm in drug repurposing.
Affiliation(s)
- Jinhang Wei
- School of Data Science and Artificial Intelligence, Wenzhou University of Technology, Wenzhou, 325027, China
- Linlin Zhuo
- School of Data Science and Artificial Intelligence, Wenzhou University of Technology, Wenzhou, 325027, China.
- Xiangzheng Fu
- School of Chinese Medicine, Hong Kong Baptist University, Hong Kong, 519087, China.
- XiangXiang Zeng
- College of Computer Science and Electronic Engineering, Hunan University, Changsha, 410012, China
- Li Wang
- Department of Computer Science, University of Tsukuba, Tsukuba, 3058577, Japan
- Quan Zou
- Institute of Fundamental and Frontier Sciences, University of Electronic Science and Technology of China, Chengdu, 611730, China
- Dongsheng Cao
- Central South University, Hunan University, Changsha, 410083, China.

19
Hausleitner C, Mueller H, Holzinger A, Pfeifer B. Collaborative weighting in federated graph neural networks for disease classification with the human-in-the-loop. Sci Rep 2024; 14:21839. [PMID: 39294334 PMCID: PMC11410954 DOI: 10.1038/s41598-024-72748-7] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/17/2024] [Accepted: 09/10/2024] [Indexed: 09/20/2024] Open
Abstract
The authors introduce a novel framework that integrates federated learning with Graph Neural Networks (GNNs) to classify diseases, incorporating Human-in-the-Loop methodologies. This advanced framework innovatively employs collaborative voting mechanisms on subgraphs within a Protein-Protein Interaction (PPI) network, situated in a federated ensemble-based deep learning context. This methodological approach marks a significant stride in the development of explainable and privacy-aware Artificial Intelligence, significantly contributing to the progression of personalized digital medicine in a responsible and transparent manner.
Affiliation(s)
- Christian Hausleitner
- Institute for Medical Informatics, Statistics and Documentation, Medical University Graz, 8036, Graz, Austria
- Heimo Mueller
- Institute for Medical Informatics, Statistics and Documentation, Medical University Graz, 8036, Graz, Austria
- Andreas Holzinger
- Institute for Medical Informatics, Statistics and Documentation, Medical University Graz, 8036, Graz, Austria.
- Human-Centered AI Lab, Institute of Forest Engineering, Department of Forest and Soil Sciences, University of Natural Resources and Life Sciences Vienna, 1190, Vienna, Austria.
- Alberta Machine Intelligence Institute, Edmonton, T6G 2R3, Canada.
- Bastian Pfeifer
- Institute for Medical Informatics, Statistics and Documentation, Medical University Graz, 8036, Graz, Austria

20
Dao NA, Le MH, Dang XT. Label Transfer for Drug Disease Association in Three Meta-Paths. Evol Bioinform Online 2024; 20:11769343241272414. [PMID: 39279816 PMCID: PMC11401013 DOI: 10.1177/11769343241272414] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/03/2024] [Accepted: 07/15/2024] [Indexed: 09/18/2024] Open
Abstract
Identifying potential interactions and relationships between diseases and drugs is significant in public health care and drug discovery. Determining drug-disease interactions experimentally is very expensive in both time and money, and many potential drug-disease associations remain undiscovered. Therefore, developing computational methods to explore the relationship between drugs and diseases is essential. Many computational methods for predicting drug-disease associations have been developed that learn from known interactions to infer potential interactions of unknown drug-disease pairs. In this paper, we propose 3 new main groups of meta-paths based on the heterogeneous biological network of drug-protein-disease objects. For each meta-path, we design a machine learning model, and these models are then combined into an integrated learning method. We evaluated our approach on 3 standard datasets: DrugBank, OMIM, and Gottlieb's dataset. Experimental results demonstrate that the proposed method outperforms some recent methods, such as EMP-SVD, LRSSL, MBiRW, MPG-DDA, and SCMFDD, on some measures such as AUC, AUPR, and F1-score.

21
Ahmed W, Zaman S, Asif E, Ali K, Mahmoud EE, Asheboss MA. Exploring the role of topological descriptors to predict physicochemical properties of anti-HIV drugs by using supervised machine learning algorithms. BMC Chem 2024; 18:167. [PMID: 39267184 PMCID: PMC11395299 DOI: 10.1186/s13065-024-01266-4] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/31/2024] [Accepted: 08/12/2024] [Indexed: 09/14/2024] Open
Abstract
To explore the role of topological indices in predicting physicochemical properties of anti-HIV drugs, this research uses Python-based algorithms to compute topological indices, together with machine learning algorithms. Degree-based topological indices are calculated using a Python algorithm, providing important information about the structural behavior of drugs that is essential to their anti-HIV effectiveness. Furthermore, machine learning algorithms analyze the physicochemical properties that correspond to anti-HIV activities, making use of their ability to identify complex trends in large, convoluted datasets. In addition to improving our comprehension of the links between molecular structure and effectiveness, the collaboration between machine learning and QSPR research further highlights the potential of computational approaches in drug discovery. This work reveals the mechanisms underlying anti-HIV effectiveness and demonstrates the advantages of machine learning in assessing drug properties, which paves the way for the design and development of more potent anti-HIV drugs.
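As an illustration of what computing degree-based topological indices in Python can look like, the sketch below evaluates three standard indices (first Zagreb, second Zagreb, and Randić) on a molecular graph given as an edge list. The abstract does not say which indices the authors used, so the choice of indices, the function names, and the toy graph (a four-vertex path, the hydrogen-suppressed skeleton of n-butane) are assumptions.

```python
import math
from collections import Counter
from typing import Dict, List, Tuple

Edge = Tuple[int, int]


def vertex_degrees(edges: List[Edge]) -> Dict[int, int]:
    """Count how many edges touch each vertex."""
    deg: Counter = Counter()
    for u, v in edges:
        deg[u] += 1
        deg[v] += 1
    return dict(deg)


def first_zagreb(edges: List[Edge]) -> int:
    """M1: sum over edges of deg(u) + deg(v), equal to the sum of squared degrees."""
    deg = vertex_degrees(edges)
    return sum(deg[u] + deg[v] for u, v in edges)


def second_zagreb(edges: List[Edge]) -> int:
    """M2: sum over edges of deg(u) * deg(v)."""
    deg = vertex_degrees(edges)
    return sum(deg[u] * deg[v] for u, v in edges)


def randic(edges: List[Edge]) -> float:
    """Randic connectivity index: sum over edges of 1 / sqrt(deg(u) * deg(v))."""
    deg = vertex_degrees(edges)
    return sum(1.0 / math.sqrt(deg[u] * deg[v]) for u, v in edges)


# Path on four vertices: 0 - 1 - 2 - 3 (degrees 1, 2, 2, 1).
butane = [(0, 1), (1, 2), (2, 3)]
m1 = first_zagreb(butane)
m2 = second_zagreb(butane)
r = randic(butane)
```

In a QSPR workflow, values like these would typically serve as input features for a regression model that predicts a physicochemical property.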
Affiliation(s)
- Wakeel Ahmed
- Department of Mathematics, University of Sialkot, Sialkot, 51310, Pakistan.
- Department of Mathematics, COMSATS University, Islamabad Lahore Campus, Lahore, 51000, Pakistan.
- Shahid Zaman
- Department of Mathematics, University of Sialkot, Sialkot, 51310, Pakistan
- Department of Mathematical and Physical Sciences, University of Nizwa, Nizwa, Oman
- Eizzah Asif
- Department of Mathematics, University of Sialkot, Sialkot, 51310, Pakistan
- Kashif Ali
- Department of Mathematics, COMSATS University, Islamabad Lahore Campus, Lahore, 51000, Pakistan
- Emad E Mahmoud
- Department of Mathematics and Statistics, College of Science, Taif University, P.O. Box 11099, 21944, Taif, Saudi Arabia

22
Chen H, Dan L, Lu Y, Chen M, Zhang J. An improved data augmentation approach and its application in medical named entity recognition. BMC Med Inform Decis Mak 2024; 24:221. [PMID: 39103849 DOI: 10.1186/s12911-024-02624-x] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/26/2023] [Accepted: 07/29/2024] [Indexed: 08/07/2024] Open
Abstract
Performing data augmentation in medical named entity recognition (NER) is crucial due to the unique challenges posed by this field. Medical data is characterized by high acquisition costs, specialized terminology, imbalanced distributions, and limited training resources. These factors make achieving high performance in medical NER particularly difficult. Data augmentation methods help to mitigate these issues by generating additional training samples, thus balancing data distribution, enriching the training dataset, and improving model generalization. This paper proposes two data augmentation methods, Contextual Random Replacement based on Word2Vec Augmentation (CRR) and Targeted Entity Random Replacement Augmentation (TER), aimed at addressing the scarcity and imbalance of data in the medical domain. When combined with a deep learning-based Chinese NER model, these methods can significantly enhance performance and recognition accuracy under limited resources. Experimental results demonstrate that both augmentation methods effectively improve the recognition capability of medical named entities. Specifically, the BERT-BiLSTM-CRF model achieved the highest F1 score of 83.587%, representing a 1.49% increase over the baseline model. This validates the importance and effectiveness of data augmentation in medical NER.
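The idea behind entity-level replacement augmentation can be sketched as below: each labeled entity span in a BIO-tagged sentence is swapped for another entity of the same type, and the labels are regenerated to match. This is a generic illustration, not the paper's TER procedure, whose details the abstract does not give. The function name `replace_entities`, the lexicon format, and the English toy sentence are assumptions (the paper works with Chinese text).

```python
import random
from typing import Dict, List, Tuple


def replace_entities(
    tokens: List[str],
    labels: List[str],
    lexicon: Dict[str, List[List[str]]],
    rng: random.Random,
) -> Tuple[List[str], List[str]]:
    """Swap each BIO-labeled entity for a random same-type entity from the lexicon."""
    new_tokens: List[str] = []
    new_labels: List[str] = []
    i = 0
    while i < len(tokens):
        label = labels[i]
        if label.startswith("B-"):
            etype = label[2:]
            # Find the end of the current entity span.
            j = i + 1
            while j < len(tokens) and labels[j] == f"I-{etype}":
                j += 1
            candidates = lexicon.get(etype)
            # Keep the original span when no replacement is available.
            replacement = rng.choice(candidates) if candidates else tokens[i:j]
            new_tokens.extend(replacement)
            new_labels.extend([f"B-{etype}"] + [f"I-{etype}"] * (len(replacement) - 1))
            i = j
        else:
            new_tokens.append(tokens[i])
            new_labels.append(label)
            i += 1
    return new_tokens, new_labels


# Toy sentence and lexicon, invented for illustration only.
tokens = ["patient", "has", "type", "2", "diabetes", "and", "takes", "aspirin"]
labels = ["O", "O", "B-DISEASE", "I-DISEASE", "I-DISEASE", "O", "O", "B-DRUG"]
lexicon = {"DISEASE": [["hypertension"]], "DRUG": [["metformin"]]}

aug_tokens, aug_labels = replace_entities(tokens, labels, lexicon, random.Random(0))
```

Because labels are rebuilt from the replacement's length, the augmented sentence stays aligned token by token even when the new entity is shorter or longer than the original.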
Affiliation(s)
- Hongyu Chen
- School of Information Management, Sun Yat-Sen University, Guangzhou, 510006, China
- Li Dan
- School of Information Management, Sun Yat-Sen University, Guangzhou, 510006, China
- Yonghe Lu
- School of Artificial Intelligence, Sun Yat-Sen University, Zhuhai, 519082, China.
- Minghong Chen
- School of Information Management, Sun Yat-Sen University, Guangzhou, 510006, China
- Jinxia Zhang
- Department of Cardiology, General Hospital of Southern Theatre Command of PLA, Guangzhou, 510010, China.

23
Wang Y, Yin Z. Drug-target interaction prediction through fine-grained selection and bidirectional random walk methodology. Sci Rep 2024; 14:18104. [PMID: 39103483 PMCID: PMC11300600 DOI: 10.1038/s41598-024-69186-w] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/17/2024] [Accepted: 08/01/2024] [Indexed: 08/07/2024] Open
Abstract
The study of drug-target interaction (DTI) plays an important role in drug development. DTI prediction has advanced significantly in the last several years, yielding numerous research findings and methodologies. Heterogeneous data sources provide richer information and more comprehensive perspectives for DTI prediction, so many existing methods rely on heterogeneous networks, and graph embedding has become an important technique for extracting information from them. These approaches, however, focus more on the extent of information extraction in heterogeneous networks and less on the potentially noisy information they contain. To address this, this paper proposes a DTI prediction network model called FBRWPC. It first uses a fine-grained similarity selection procedure to integrate similarities across similarity networks, and then applies a bidirectional random walk with restart graph embedding method to obtain an updated drug-target interaction matrix. Through fine-grained similarity selection and integration, the framework can effectively filter out the noise present in heterogeneous networks and enhance the model's prediction performance. The experimental findings demonstrate that FBRWPC retains strong prediction performance even when the data are split into four distinct types of data sets, a sign of the model's robustness and good generalization.
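A bidirectional random walk with restart over an interaction matrix can be sketched roughly as follows. This is a generic formulation, not FBRWPC's exact update rule, which the abstract does not specify. Here one walk propagates scores through a row-normalized drug similarity matrix, the other through a row-normalized target similarity matrix, each restarts from the known interactions with probability `1 - alpha`, and the two results are averaged at every step. All names, the value of `alpha`, the step count, and the toy matrices are assumptions.

```python
from typing import List

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain matrix product for small dense matrices."""
    return [
        [sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]


def transpose(m: Matrix) -> Matrix:
    return [list(col) for col in zip(*m)]


def row_normalize(m: Matrix) -> Matrix:
    """Scale each row to sum to 1 (all-zero rows are left as zeros)."""
    out = []
    for row in m:
        s = sum(row)
        out.append([v / s if s else 0.0 for v in row])
    return out


def bidirectional_rwr(
    drug_sim: Matrix, target_sim: Matrix, assoc: Matrix,
    alpha: float = 0.5, steps: int = 2,
) -> Matrix:
    """Average a drug-side and a target-side random walk with restart."""
    sd = row_normalize(drug_sim)
    st_t = transpose(row_normalize(target_sim))
    scores = [row[:] for row in assoc]
    for _ in range(steps):
        left = matmul(sd, scores)     # borrow scores from similar drugs
        right = matmul(scores, st_t)  # borrow scores from similar targets
        scores = [
            [
                0.5 * (alpha * left[i][j] + (1 - alpha) * assoc[i][j])
                + 0.5 * (alpha * right[i][j] + (1 - alpha) * assoc[i][j])
                for j in range(len(assoc[0]))
            ]
            for i in range(len(assoc))
        ]
    return scores


# Toy inputs, invented for illustration: 2 drugs, 2 targets, one known interaction.
drug_sim = [[1.0, 0.8], [0.8, 1.0]]
target_sim = [[1.0, 0.5], [0.5, 1.0]]
assoc = [[1.0, 0.0], [0.0, 0.0]]
scores = bidirectional_rwr(drug_sim, target_sim, assoc)
```

In this toy case the pair (drug 1, target 0) has no known interaction but receives a positive score because drug 1 is similar to drug 0, which is the behavior such methods rely on to rank unobserved pairs.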
Collapse
Affiliation(s)
- YaPing Wang
- School of Mathematics, Physics and Statistics, Institute for Frontier Medical Technology, Center of Intelligent Computing and Applied Statistics, Shanghai University of Engineering Science, Shanghai, 201620, China
| | - ZhiXiang Yin
- School of Mathematics, Physics and Statistics, Institute for Frontier Medical Technology, Center of Intelligent Computing and Applied Statistics, Shanghai University of Engineering Science, Shanghai, 201620, China.
|
24
|
Wang Y, Su Y, Zhao K, Huo D, Du Z, Wang Z, Xie H, Liu L, Jin Q, Ren X, Chen X, Zhang D. A deep learning drug screening framework for integrating local-global characteristics: A novel attempt for limited data. Heliyon 2024; 10:e34244. [PMID: 39130417 PMCID: PMC11315141 DOI: 10.1016/j.heliyon.2024.e34244] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/20/2024] [Revised: 05/31/2024] [Accepted: 07/05/2024] [Indexed: 08/13/2024] Open
Abstract
At the beginning of the "Disease X" outbreak, drug discovery and development are often challenged by insufficient and unbalanced data. To address this problem and maximize the information value of limited data, we propose a drug screening model, LGCNN, based on convolutional neural network (CNN), which enables rapid drug screening by integrating features of drug molecular structures and drug-target interactions at both local and global (LG) levels. Experimental results show that LGCNN exhibits better performance compared to other state-of-the-art classification methods under limited data. In addition, LGCNN was applied to anti-SARS-CoV-2 drug screening to realize therapeutic drug mining against COVID-19. LGCNN transcends the limitations of traditional models for predicting interactions between single drug targets and shows new advantages in predicting multi-target drug-target interactions. Notably, the cross-coronavirus generalizability of the model is also implied by the analysis of targets, drugs, and mechanisms in the prediction results. In conclusion, LGCNN provides new ideas and methods for rapid drug screening in emergency situations where data are scarce.
Affiliation(s)
- Ying Wang
- Department of Pharmacogenomics, College of Bioinformatics Science and Technology, Harbin Medical University, Harbin, Heilongjiang, 150081, China
| | - Yangguang Su
- Department of Pharmacogenomics, College of Bioinformatics Science and Technology, Harbin Medical University, Harbin, Heilongjiang, 150081, China
| | - Kairui Zhao
- Department of Pharmacogenomics, College of Bioinformatics Science and Technology, Harbin Medical University, Harbin, Heilongjiang, 150081, China
| | - Diwei Huo
- The Fourth Hospital of Harbin Medical University, No.37 Yiyuan Street, Harbin, Heilongjiang, 150001, China
| | - Zhenshun Du
- Department of Pharmacogenomics, College of Bioinformatics Science and Technology, Harbin Medical University, Harbin, Heilongjiang, 150081, China
| | - Zhiju Wang
- Department of Pharmacogenomics, College of Bioinformatics Science and Technology, Harbin Medical University, Harbin, Heilongjiang, 150081, China
| | - Hongbo Xie
- Department of Pharmacogenomics, College of Bioinformatics Science and Technology, Harbin Medical University, Harbin, Heilongjiang, 150081, China
| | - Lei Liu
- Department of Pharmacogenomics, College of Bioinformatics Science and Technology, Harbin Medical University, Harbin, Heilongjiang, 150081, China
| | - Qing Jin
- Department of Pharmacogenomics, College of Bioinformatics Science and Technology, Harbin Medical University, Harbin, Heilongjiang, 150081, China
| | - Xuekun Ren
- College of Mathematics of Harbin Institute of Technology, No.92 Xidazhi Street, Harbin, Heilongjiang, 150001, China
| | - Xiujie Chen
- Department of Pharmacogenomics, College of Bioinformatics Science and Technology, Harbin Medical University, Harbin, Heilongjiang, 150081, China
| | - Denan Zhang
- Department of Pharmacogenomics, College of Bioinformatics Science and Technology, Harbin Medical University, Harbin, Heilongjiang, 150081, China
|
25
|
Jiménez A, Merino MJ, Parras J, Zazo S. Explainable drug repurposing via path based knowledge graph completion. Sci Rep 2024; 14:16587. [PMID: 39025897 PMCID: PMC11258358 DOI: 10.1038/s41598-024-67163-x] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/15/2024] [Accepted: 07/09/2024] [Indexed: 07/20/2024] Open
Abstract
Drug repurposing aims to find new therapeutic applications for existing drugs in the pharmaceutical market, leading to significant savings in time and cost. The use of artificial intelligence and knowledge graphs to propose repurposing candidates facilitates the process, as large amounts of data can be processed. However, it is important to pay attention to the explainability needed to validate the predictions. We propose a general architecture to understand several explainable methods for graph completion based on knowledge graphs and design our own architecture for drug repurposing. We present XG4Repo (eXplainable Graphs for Repurposing), a framework that takes advantage of the connectivity of any biomedical knowledge graph to link compounds to the diseases they can treat. Our method allows metapaths of different types and lengths, which are automatically generated and optimised based on data. XG4Repo focuses on providing meaningful explanations for the predictions, which are based on paths from compounds to diseases. These paths include nodes such as genes, pathways, side effects, or anatomies, so they provide information about the targets and other characteristics of the biomedical mechanisms that link compounds and diseases. Paths make predictions interpretable for experts, who can validate them and use them in further research on drug repurposing. We also describe three use cases where we analyse new uses for Epirubicin, Paclitaxel, and Prednisone and present the paths that support the predictions.
Affiliation(s)
- Ana Jiménez
- Information Processing and Telecommunications Center, Universidad Politécnica de Madrid, ETSI Telecomunicación, Avda. Complutense, 30, 28040, Madrid, Spain
| | - María José Merino
- Information Processing and Telecommunications Center, Universidad Politécnica de Madrid, ETSI Telecomunicación, Avda. Complutense, 30, 28040, Madrid, Spain
| | - Juan Parras
- Information Processing and Telecommunications Center, Universidad Politécnica de Madrid, ETSI Telecomunicación, Avda. Complutense, 30, 28040, Madrid, Spain.
| | - Santiago Zazo
- Information Processing and Telecommunications Center, Universidad Politécnica de Madrid, ETSI Telecomunicación, Avda. Complutense, 30, 28040, Madrid, Spain
|
26
|
Zhang H, Liu Y, Liu X, Wang C, Guo M. Equivariant score-based generative diffusion framework for 3D molecules. BMC Bioinformatics 2024; 25:203. [PMID: 38816718 PMCID: PMC11556161 DOI: 10.1186/s12859-024-05810-w] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/30/2023] [Accepted: 05/13/2024] [Indexed: 06/01/2024] Open
Abstract
BACKGROUND Molecular biology is crucial for drug discovery, protein design, and human health. Due to the vastness of the drug-like chemical space, depending on biomedical experts to manually design molecules is exceedingly expensive. Utilizing generative methods with deep learning technology offers an effective approach to streamline the search space for molecular design and save costs. This paper introduces a novel E(3)-equivariant score-based diffusion framework for 3D molecular generation via SDEs, aiming to address the constraints of unified Gaussian diffusion methods. Within the proposed framework EMDS, the complete diffusion is decomposed into separate diffusion processes for distinct components of the molecular feature space, while the modeling processes also capture the complex dependency among these components. Moreover, angle and torsion angle information is integrated into the networks to enhance the modeling of atom coordinates and utilize spatial information more effectively. RESULTS Experiments on the widely utilized QM9 dataset demonstrate that our proposed framework significantly outperforms the state-of-the-art methods in all evaluation metrics for 3D molecular generation. Additionally, ablation experiments are conducted to highlight the contribution of key components in our framework, demonstrating the effectiveness of the proposed framework and the performance improvements of incorporating angle and torsion angle information for molecular generation. Finally, the comparative results of distribution show that our method is highly effective in generating molecules that closely resemble the actual scenario. CONCLUSION Through the experiments and comparative results, our framework clearly outperforms previous 3D molecular generation methods, exhibiting significantly better capacity for modeling chemically realistic molecules. 
The excellent performance of EMDS in 3D molecular generation brings novel and encouraging opportunities for tackling challenging biomedical molecule and protein scenarios.
Affiliation(s)
- Hao Zhang
- School of Computer Science and Technology, Harbin Institute of Technology, Harbin, 150001, China
| | - Yang Liu
- School of Computer Science and Technology, Harbin Institute of Technology, Harbin, 150001, China.
| | - Xiaoyan Liu
- School of Computer Science and Technology, Harbin Institute of Technology, Harbin, 150001, China
| | - Cheng Wang
- School of Computer Science and Technology, Harbin Institute of Technology, Harbin, 150001, China
| | - Maozu Guo
- School of Electrical and Information Engineering, Beijing University of Civil Engineering and Architecture, Beijing, 100044, China
|
27
|
Zhu C, Zhang C, Shang T, Zhang C, Zhai S, Cao L, Xu Z, Su Z, Song Y, Su A, Li C, Duan H. GAPS: a geometric attention-based network for peptide binding site identification by the transfer learning approach. Brief Bioinform 2024; 25:bbae297. [PMID: 38990514 PMCID: PMC11238429 DOI: 10.1093/bib/bbae297] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/11/2024] [Revised: 04/28/2024] [Accepted: 06/07/2024] [Indexed: 07/12/2024] Open
Abstract
Protein-peptide interactions (PPepIs) are vital to understanding cellular functions, which can facilitate the design of novel drugs. As an essential component in forming a PPepI, protein-peptide binding sites are the basis for understanding the mechanisms involved in PPepIs. Therefore, accurately identifying protein-peptide binding sites is a critical task. The traditional experimental methods for studying these binding sites are labor-intensive and time-consuming, and some computational tools have been developed to supplement them. However, these computational tools have limitations in generality or accuracy due to the need for ligand information, complex feature construction, or their reliance on modeling based on amino acid residues. To address these drawbacks, we describe a geometric attention-based network for peptide binding site identification (GAPS) in this work. The proposed model utilizes geometric feature engineering to construct atom representations and incorporates multiple attention mechanisms to update relevant biological features. In addition, a transfer learning strategy leverages protein-protein binding site information to enhance protein-peptide binding site recognition, taking into account the common structure and biological bias between proteins and peptides. Consequently, GAPS demonstrates state-of-the-art performance and excellent robustness in this task. Moreover, our model exhibits exceptional performance across several extended experiments, including predicting apo protein-peptide, protein-cyclic peptide, and AlphaFold-predicted protein-peptide binding sites. These results confirm that the GAPS model is a powerful, versatile, stable method suitable for diverse binding site predictions.
Affiliation(s)
- Cheng Zhu
- College of Pharmaceutical Sciences, Zhejiang University of Technology, Chaowang Road, Gongshu District, Hangzhou 310014, China
| | - Chengyun Zhang
- AI Department, Shanghai Highslab Therapeutics. Inc, Zhangheng Road, Pudong New Area, Shanghai 201203, China
| | - Tianfeng Shang
- AI Department, Shanghai Highslab Therapeutics. Inc, Zhangheng Road, Pudong New Area, Shanghai 201203, China
| | - Chenhao Zhang
- College of Pharmaceutical Sciences, Zhejiang University of Technology, Chaowang Road, Gongshu District, Hangzhou 310014, China
| | - Silong Zhai
- College of Pharmaceutical Sciences, Zhejiang University of Technology, Chaowang Road, Gongshu District, Hangzhou 310014, China
| | - Lujing Cao
- College of Pharmaceutical Sciences, Zhejiang University of Technology, Chaowang Road, Gongshu District, Hangzhou 310014, China
| | - Zhenyu Xu
- AI Department, Shanghai Highslab Therapeutics. Inc, Zhangheng Road, Pudong New Area, Shanghai 201203, China
| | - Zhihao Su
- College of Pharmaceutical Sciences, Zhejiang University of Technology, Chaowang Road, Gongshu District, Hangzhou 310014, China
| | - Ying Song
- College of Pharmaceutical Sciences, Zhejiang University of Technology, Chaowang Road, Gongshu District, Hangzhou 310014, China
| | - An Su
- College of Chemical Engineering, Zhejiang University of Technology, Chaowang Road, Gongshu District, Hangzhou 310014, China
| | - Chengxi Li
- College of Chemical and Biological Engineering, Zhejiang University, Yuhangtang Road, Xihu District, Hangzhou 310027, China
| | - Hongliang Duan
- Faculty of Applied Sciences, Macao Polytechnic University, R. de Luís Gonzaga Gomes, Macao 999078, China
|
28
|
Hassanali Aragh A, Givehchian P, Moslemi Amirani R, Masumshah R, Eslahchi C. MiRAGE: mining relationships for advanced generative evaluation in drug repositioning. Brief Bioinform 2024; 25:bbae337. [PMID: 39038932 PMCID: PMC11262809 DOI: 10.1093/bib/bbae337] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/29/2024] [Revised: 06/09/2024] [Accepted: 07/16/2024] [Indexed: 07/24/2024] Open
Abstract
MOTIVATION Drug repositioning, the identification of new therapeutic uses for existing drugs, is crucial for accelerating drug discovery and reducing development costs. Some methods rely on heterogeneous networks, which may not fully capture the complex relationships between drugs and diseases. However, integrating diverse biological data sources offers promise for discovering new drug-disease associations (DDAs). Previous evidence indicates that the combination of information would be conducive to the discovery of new DDAs. However, the challenge lies in effectively integrating different biological data sources to identify the most effective drugs for a certain disease based on drug-disease coupled mechanisms. RESULTS In response to this challenge, we present MiRAGE, a novel computational method for drug repositioning. MiRAGE leverages a three-step framework, comprising negative sampling using hard negative mining, classification employing random forest models, and feature selection based on feature importance. We evaluate MiRAGE on multiple benchmark datasets, demonstrating its superiority over state-of-the-art algorithms across various metrics. Notably, MiRAGE consistently outperforms other methods in uncovering novel DDAs. Case studies focusing on Parkinson's disease and schizophrenia showcase MiRAGE's ability to identify top candidate drugs supported by previous studies. Overall, our study underscores MiRAGE's efficacy and versatility as a computational tool for drug repositioning, offering valuable insights for therapeutic discoveries and addressing unmet medical needs.
Affiliation(s)
- Aria Hassanali Aragh
- Department of Computer and Data Sciences, Faculty of Mathematical Sciences, Shahid Beheshti University, Daneshjou Blvd, District 1, Tehran 1983969411, Iran
| | - Pegah Givehchian
- Department of Computer and Data Sciences, Faculty of Mathematical Sciences, Shahid Beheshti University, Daneshjou Blvd, District 1, Tehran 1983969411, Iran
| | - Razieh Moslemi Amirani
- Department of Computer and Data Sciences, Faculty of Mathematical Sciences, Shahid Beheshti University, Daneshjou Blvd, District 1, Tehran 1983969411, Iran
| | - Raziyeh Masumshah
- Department of Computer and Data Sciences, Faculty of Mathematical Sciences, Shahid Beheshti University, Daneshjou Blvd, District 1, Tehran 1983969411, Iran
| | - Changiz Eslahchi
- Department of Computer and Data Sciences, Faculty of Mathematical Sciences, Shahid Beheshti University, Daneshjou Blvd, District 1, Tehran 1983969411, Iran
- School of Biological Sciences, Institute for Research in Fundamental Sciences (IPM), Farmanieh Ave, Tajrish, District 1, Tehran 193955746, Iran
|
29
|
Li Y, Yang Y, Tong Z, Wang Y, Mi Q, Bai M, Liang G, Li B, Shu K. A comparative benchmarking and evaluation framework for heterogeneous network-based drug repositioning methods. Brief Bioinform 2024; 25:bbae172. [PMID: 38647153 PMCID: PMC11033846 DOI: 10.1093/bib/bbae172] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/10/2023] [Revised: 02/25/2024] [Accepted: 04/02/2024] [Indexed: 04/25/2024] Open
Abstract
Computational drug repositioning, which involves identifying new indications for existing drugs, is an increasingly attractive research area due to its advantages in reducing both overall cost and development time. As a result, a growing number of computational drug repositioning methods have emerged. Heterogeneous network-based drug repositioning methods have been shown to outperform other approaches. However, there is a dearth of systematic evaluation studies of these methods, encompassing performance, scalability and usability, as well as a standardized process for evaluating new methods. Additionally, previous studies have only compared several methods, with conflicting results. In this context, we conducted a systematic benchmarking study of 28 heterogeneous network-based drug repositioning methods on 11 existing datasets. We developed a comprehensive framework to evaluate their performance, scalability and usability. Our study revealed that methods such as HGIMC, ITRPCA and BNNR exhibit the best overall performance, as they rely on matrix completion or factorization. HINGRL, MLMC, ITRPCA and HGIMC demonstrate the best performance, while NMFDR, GROBMC and SCPMF display superior scalability. For usability, HGIMC, DRHGCN and BNNR are the top performers. Building on these findings, we developed an online tool called HN-DREP (http://hn-drep.lyhbio.com/) to facilitate researchers in viewing all the detailed evaluation results and selecting the appropriate method. HN-DREP also provides an external drug repositioning prediction service for a specific disease or drug by integrating predictions from all methods. Furthermore, we have released a Snakemake workflow named HN-DRES (https://github.com/lyhbio/HN-DRES) to facilitate benchmarking and support the extension of new methods into the field.
Affiliation(s)
- Yinghong Li
- Chongqing Key Laboratory of Big Data for Bio Intelligence, Chongqing University of Posts and Telecommunications, Chongqing 400065, P. R. China
| | - Yinqi Yang
- Chongqing Key Laboratory of Big Data for Bio Intelligence, Chongqing University of Posts and Telecommunications, Chongqing 400065, P. R. China
| | - Zhuohao Tong
- Chongqing Key Laboratory of Big Data for Bio Intelligence, Chongqing University of Posts and Telecommunications, Chongqing 400065, P. R. China
| | - Yu Wang
- Chongqing Key Laboratory of Big Data for Bio Intelligence, Chongqing University of Posts and Telecommunications, Chongqing 400065, P. R. China
| | - Qin Mi
- Chongqing Key Laboratory of Big Data for Bio Intelligence, Chongqing University of Posts and Telecommunications, Chongqing 400065, P. R. China
| | - Mingze Bai
- Chongqing Key Laboratory of Big Data for Bio Intelligence, Chongqing University of Posts and Telecommunications, Chongqing 400065, P. R. China
| | - Guizhao Liang
- Key Laboratory of Biorheological Science and Technology, Ministry of Education, Bioengineering College, Chongqing University, Chongqing, 400044, P. R. China
| | - Bo Li
- College of Life Sciences, Chongqing Normal University, Chongqing 401331, P. R. China
| | - Kunxian Shu
- Chongqing Key Laboratory of Big Data for Bio Intelligence, Chongqing University of Posts and Telecommunications, Chongqing 400065, P. R. China
|
30
|
Lin KZ, Qiu Y, Roeder K. eSVD-DE: cohort-wide differential expression in single-cell RNA-seq data using exponential-family embeddings. BMC Bioinformatics 2024; 25:113. [PMID: 38486150 PMCID: PMC10941434 DOI: 10.1186/s12859-024-05724-7] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/06/2023] [Accepted: 02/28/2024] [Indexed: 03/17/2024] Open
Abstract
BACKGROUND Single-cell RNA-sequencing (scRNA) datasets are becoming increasingly popular in clinical and cohort studies, but there is a lack of methods to investigate differentially expressed (DE) genes among such datasets with numerous individuals. While numerous methods exist to find DE genes for scRNA data from limited individuals, differential-expression testing for large cohorts of case and control individuals using scRNA data poses unique challenges due to substantial effects of human variation, i.e., individual-level confounding covariates that are difficult to account for in the presence of sparsely-observed genes. RESULTS We develop the eSVD-DE, a matrix factorization that pools information across genes and removes confounding covariate effects, followed by a novel two-sample test in mean expression between case and control individuals. In general, differential testing after dimension reduction yields an inflation of Type-1 errors. However, we overcome this by testing for differences between the case and control individuals' posterior mean distributions via a hierarchical model. In previously published datasets of various biological systems, eSVD-DE has more accuracy and power compared to other DE methods typically repurposed for analyzing cohort-wide differential expression. CONCLUSIONS eSVD-DE proposes a novel and powerful way to test for DE genes among cohorts after performing a dimension reduction. Accurate identification of differential expression on the individual level, instead of the cell level, is important for linking scRNA-seq studies to our understanding of the human population.
Affiliation(s)
- Kevin Z Lin
- Department of Biostatistics, University of Washington, Seattle, WA, USA.
| | - Yixuan Qiu
- School of Statistics and Management, Shanghai University of Finance and Economics, Shanghai, People's Republic of China
| | - Kathryn Roeder
- Department of Statistics and Data Science, Carnegie Mellon University, Pittsburgh, PA, USA
|
31
|
Zhang X, Li PH, Wang D, Li H, Kong X, Zhang G, Zhao Y, Liu J, Wu W, Zhang Y, Li ZH, Luo H. Causal effect of gut microbiota of Defluviitaleaceae on the clinical pathway of "Influenza-Subacute Thyroiditis-Hypothyroidism". Front Microbiol 2024; 15:1354989. [PMID: 38476943 PMCID: PMC10929266 DOI: 10.3389/fmicb.2024.1354989] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/13/2023] [Accepted: 01/29/2024] [Indexed: 03/14/2024] Open
Abstract
Introduction Hypothyroidism has been found to be influenced by gut microbiota. However, it remains unclear which taxon of gut microbiota plays a key role in this function. Identifying which key bacteria affect hypothyroidism, and through what mechanism, will be helpful for the prevention of hypothyroidism through specific clinical pathways. Materials and methods In Study A, 35 families and 130 genera of gut microbiota are used as exposures, with hypothyroidism as the outcome. The causal effect of the gut microbiota on hypothyroidism is estimated through two-sample Mendelian randomization. Combining the results of the two taxonomical levels, key taxa are selected, which in Study B are investigated for their causal association with multiple generally accepted causes of hypothyroidism and their more upstream factors. To validate and reveal the potential mechanism, enrichment analyses of the related genes and interacting transcription factors were performed. Results In Study A, Defluviitaleaceae (OR: 0.043, 95% CI: 0.005-0.363, P = 0.018)/Defluviitaleaceae_UCG_011 (OR: 0.385, 95% CI: 0.172-0.865, P = 0.021) are significantly causally associated with hypothyroidism at both taxonomical levels. In Study B, the Defluviitaleaceae family and the Defluviitaleaceae_UCG_011 genus show a causal association with decreased thyroiditis (Family: OR: 0.174, 95% CI: 0.046-0.653, P = 0.029; Genus: OR: 0.139, 95% CI: 0.029-0.664, P = 0.043), decreased subacute thyroiditis (Family: OR: 0.028, 95% CI: 0.004-0.213, P = 0.007; Genus: OR: 0.018, 95% CI: 0.002-0.194, P = 0.013), decreased influenza (Family: OR: 0.818, 95% CI: 0.676-0.989, P = 0.038; Genus: OR: 0.792, 95% CI: 0.644-0.974, P = 0.027), and increased anti-influenza H3N2 IgG levels (Family: OR: 1.934, 95% CI: 1.123-3.332, P = 0.017; Genus: OR: 1.675, 95% CI: 0.953-2.943, P = 0.073). The results of the enrichment analysis are consistent with the findings and the suggested possible mechanisms.
Conclusion Defluviitaleaceae of the gut microbiota shows a probable causal inhibitory effect on the clinical pathway of "Influenza-Subacute Thyroiditis-Hypothyroidism" and may act as a potential probiotic to prevent influenza, subacute thyroiditis, and hypothyroidism.
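As a reading aid for the odds ratios quoted in this abstract, the sketch below shows how an odds ratio and its 95% confidence interval relate to a two-sided p-value. It assumes the interval is a symmetric Wald interval on the log-odds scale, which the abstract does not state; the function name is hypothetical, and the published analysis may have used a different estimator or adjustment.

```python
import math


def wald_p_from_or(odds_ratio, ci_low, ci_high, z_crit=1.959964):
    """Approximate two-sided p-value implied by an odds ratio and its 95% CI.

    Assumes a symmetric Wald interval on the log-odds scale. This is an
    assumption of the sketch, not something stated in the abstract.
    """
    log_or = math.log(odds_ratio)
    # Half the CI width on the log scale equals z_crit standard errors.
    se = (math.log(ci_high) - math.log(ci_low)) / (2 * z_crit)
    z = log_or / se
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal tail probability
    return log_or, se, z, p


# Genus-level estimate for hypothyroidism quoted in the abstract (Study A).
log_or, se, z, p = wald_p_from_or(0.385, 0.172, 0.865)
print(f"log(OR)={log_or:.3f}  SE={se:.3f}  z={z:.2f}  p={p:.3f}")
```

For this estimate the implied p-value is close to the reported P = 0.021. The same back-calculation need not reproduce every reported value, since heavily rounded interval bounds (for example a lower bound of 0.005) and method-specific corrections both affect the result.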
Affiliation(s)
- Xin Zhang
- Department of Radiation Oncology, Cancer Center, West China Hospital, Sichuan University, Chengdu, China
- Department of Biotherapy, Cancer Center, West China Hospital, Sichuan University, Chengdu, China
| | - Pei-Heng Li
- Division of Thyroid Surgery, Department of General Surgery, West China Hospital, Sichuan University, Chengdu, Sichuan, China
| | - Dongyue Wang
- Department of Ophthalmology, West China Hospital, Sichuan University, Chengdu, China
| | - Hancong Li
- Division of Thyroid Surgery, Department of General Surgery, West China Hospital, Sichuan University, Chengdu, Sichuan, China
| | - Xiangyu Kong
- Division of Thyroid Surgery, Department of General Surgery, West China Hospital, Sichuan University, Chengdu, Sichuan, China
| | - Gongshuang Zhang
- Division of Thyroid Surgery, Department of General Surgery, West China Hospital, Sichuan University, Chengdu, Sichuan, China
| | - Yue Zhao
- Division of Thyroid Surgery, Department of General Surgery, West China Hospital, Sichuan University, Chengdu, Sichuan, China
| | - Jiaye Liu
- Division of Thyroid Surgery, Department of General Surgery, West China Hospital, Sichuan University, Chengdu, Sichuan, China
| | - Wenshuang Wu
- Division of Thyroid Surgery, Department of General Surgery, West China Hospital, Sichuan University, Chengdu, Sichuan, China
| | - Yuwei Zhang
- Department of Endocrinology and Metabolism, West China Hospital of Sichuan University, Chengdu, China
- Center for Diabetes and Metabolism Research, West China Hospital of Sichuan University, Chengdu, China
| | - Zhi-Hui Li
- Division of Thyroid Surgery, Department of General Surgery, West China Hospital, Sichuan University, Chengdu, Sichuan, China
| | - Han Luo
- Division of Thyroid Surgery, Department of General Surgery, West China Hospital, Sichuan University, Chengdu, Sichuan, China
- Department of Laboratory Medicine/Research Centre of Clinical Laboratory Medicine, West China Hospital, Sichuan University, Chengdu, Sichuan, China
|
32
|
Wang H, Lu H, Sun J, Safo SE. Interpretable deep learning methods for multiview learning. BMC Bioinformatics 2024; 25:69. [PMID: 38350879 PMCID: PMC11265116 DOI: 10.1186/s12859-024-05679-9] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/22/2023] [Accepted: 01/29/2024] [Indexed: 02/15/2024] Open
Abstract
BACKGROUND Technological advances have enabled the generation of unique and complementary types of data or views (e.g. genomics, proteomics, metabolomics) and opened up a new era in multiview learning research with the potential to lead to new biomedical discoveries. RESULTS We propose iDeepViewLearn (Interpretable Deep Learning Method for Multiview Learning) to learn nonlinear relationships in data from multiple views while achieving feature selection. iDeepViewLearn combines deep learning flexibility with the statistical benefits of data and knowledge-driven feature selection, giving interpretable results. Deep neural networks are used to learn view-independent low-dimensional embedding through an optimization problem that minimizes the difference between observed and reconstructed data, while imposing a regularization penalty on the reconstructed data. The normalized Laplacian of a graph is used to model bilateral relationships between variables in each view, thereby encouraging selection of related variables. iDeepViewLearn is tested on simulated data and three real-world data sets for classification, clustering, and reconstruction tasks. For the classification tasks, iDeepViewLearn had competitive classification results with state-of-the-art methods in various settings. For the clustering task, we detected molecular clusters that differed in their 10-year survival rates for breast cancer. For the reconstruction task, we were able to reconstruct handwritten images using a few pixels while achieving competitive classification accuracy. The results of our real data application and simulations with small to moderate sample sizes suggest that iDeepViewLearn may be a useful method for small-sample-size problems compared to other deep learning methods for multiview learning. CONCLUSION iDeepViewLearn is an innovative deep learning model capable of capturing nonlinear relationships between data from multiple views while achieving feature selection.
It is fully open source and is freely available at https://github.com/lasandrall/iDeepViewLearn.
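The normalized graph Laplacian mentioned in this abstract can be made concrete with a small sketch. It shows only the standard definition L = I - D^{-1/2} A D^{-1/2} and the quadratic form w^T L w that is commonly used as a smoothness penalty; how iDeepViewLearn incorporates the Laplacian into its loss is not described in the abstract, so the function names and the toy graph here are assumptions.

```python
import math


def normalized_laplacian(adj):
    """Return L = I - D^{-1/2} A D^{-1/2} for a symmetric adjacency matrix."""
    n = len(adj)
    deg = [sum(row) for row in adj]
    inv_sqrt = [1 / math.sqrt(d) if d > 0 else 0.0 for d in deg]
    return [[(1.0 if i == j and deg[i] > 0 else 0.0)
             - inv_sqrt[i] * adj[i][j] * inv_sqrt[j]
             for j in range(n)] for i in range(n)]


def smoothness_penalty(weights, lap):
    """Quadratic form w^T L w.

    It equals the sum over edges of (w_i/sqrt(d_i) - w_j/sqrt(d_j))^2,
    so it is small when connected variables carry similar scaled weights.
    """
    n = len(weights)
    return sum(weights[i] * lap[i][j] * weights[j]
               for i in range(n) for j in range(n))


# Hypothetical graph: four variables connected as a path 0-1-2-3.
adj = [[0, 1, 0, 0],
       [1, 0, 1, 0],
       [0, 1, 0, 1],
       [0, 0, 1, 0]]
lap = normalized_laplacian(adj)
r2 = math.sqrt(2)
smooth = smoothness_penalty([1.0, r2, r2, 1.0], lap)    # agrees along edges
rough = smoothness_penalty([1.0, -r2, r2, -1.0], lap)   # flips sign on every edge
```

Weights that vary smoothly over the graph give a penalty near zero, while weights that change sign across every edge are penalized heavily, which is why such a term tends to favour selecting connected variables together.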
Affiliation(s)
- Hengkang Wang
- Department of Computer Science and Engineering, University of Minnesota, Minneapolis, 55455, USA
| | - Han Lu
- Division of Biostatistics and Health Data Science, University of Minnesota, Minneapolis, 55414, USA
| | - Ju Sun
- Department of Computer Science and Engineering, University of Minnesota, Minneapolis, 55455, USA
| | - Sandra E Safo
- Division of Biostatistics and Health Data Science, University of Minnesota, Minneapolis, 55414, USA.
|
33
|
Ren ZH, Yu CQ, Li LP, You ZH, Li ZW, Zhang SW, Zeng X, Shang YF. SiSGC: A Drug Repositioning Prediction Model Based on Heterogeneous Simplifying Graph Convolution. J Chem Inf Model 2024; 64:238-249. [PMID: 38103039 DOI: 10.1021/acs.jcim.3c01665] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 12/17/2023]
Abstract
Drug repositioning plays a key role in disease treatment. With the growth of large-scale chemical data, many computational methods have been used for drug-disease association prediction. However, most existing models neglect the positive influence of non-Euclidean data and multisource information, and how to set the feature diffusion distance remains a critical issue for graph neural networks. To solve these problems, we proposed SiSGC, which makes full use of biological knowledge as initial features and learns structural information from the constructed heterogeneous graph with adaptive selection of the information diffusion distance. Then, the structural features are fused with the denoised similarity information and fed to a CatBoost classifier to make predictions. Three different data sets are used to confirm the robustness and generalization of SiSGC under two splitting strategies. Experimental results demonstrate that the proposed model achieves superior performance compared with six leading methods and four variants. Our case study on breast neoplasms further indicates that SiSGC is trustworthy and robust yet simple. We also present four high-confidence candidate drugs for breast cancer treatment and explain the rationale for these predictions. SiSGC can thus serve as a beneficial supplement for drug repositioning.
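The role of the feature diffusion (propagation) distance can be seen in a minimal sketch of generic simplified graph convolution, where node features are multiplied K times by the normalized adjacency matrix S = D^{-1/2}(A + I)D^{-1/2}. This is the textbook propagation step only: SiSGC's heterogeneous graph construction, its adaptive choice of distance, and the CatBoost classifier are not reproduced, and the function name and toy graph are assumptions.

```python
import math


def sgc_features(adj, features, k):
    """Propagate node features k hops with S = D^{-1/2} (A + I) D^{-1/2}."""
    n = len(adj)
    # Add self-loops so each node keeps part of its own signal.
    a_hat = [[adj[i][j] + (1.0 if i == j else 0.0) for j in range(n)]
             for i in range(n)]
    deg = [sum(row) for row in a_hat]
    s = [[a_hat[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)]
         for i in range(n)]
    out = [list(row) for row in features]
    for _ in range(k):
        out = [[sum(s[i][m] * out[m][c] for m in range(n))
                for c in range(len(out[0]))] for i in range(n)]
    return out


# Hypothetical path graph 0-1-2 with a single feature that starts on node 0.
adj = [[0, 1, 0],
       [1, 0, 1],
       [0, 1, 0]]
x = [[1.0], [0.0], [0.0]]
h1 = sgc_features(adj, x, 1)  # one hop: node 2 has not been reached
h2 = sgc_features(adj, x, 2)  # two hops: node 2 now receives signal
```

With K = 1 the signal from node 0 has not reached node 2, and with K = 2 it has. A distance that is too small misses distant neighbours, while a large one blurs all nodes toward the same value, which is the trade-off behind choosing K carefully.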
Affiliation(s)
- Zhong-Hao Ren
- College of Computer Science and Electronic Engineering, Hunan University, Changsha 410082, China
- Chang-Qing Yu
- School of Information Engineering, Xijing University, Xi'an 710123, China
- Li-Ping Li
- College of Agriculture and Forestry, Longdong University, Qingyang 745000, China
- Zhu-Hong You
- School of Computer Science, Northwestern Polytechnical University, Xi'an 710129, China
- Zheng-Wei Li
- School of Computer Science and Technology, China University of Mining and Technology, Xuzhou 221116, China
- Shan-Wen Zhang
- School of Information Engineering, Xijing University, Xi'an 710123, China
- Xiangxiang Zeng
- College of Computer Science and Electronic Engineering, Hunan University, Changsha 410082, China
- Yi-Fan Shang
- College of Computer Science and Electronic Engineering, Hunan University, Changsha 410082, China
|
34
|
Amiri R, Razmara J, Parvizpour S, Izadkhah H. A novel efficient drug repurposing framework through drug-disease association data integration using convolutional neural networks. BMC Bioinformatics 2023; 24:442. [PMID: 37993777 PMCID: PMC10664633 DOI: 10.1186/s12859-023-05572-x] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/25/2023] [Accepted: 11/17/2023] [Indexed: 11/24/2023] Open
Abstract
Drug repurposing is an exciting field of research toward recognizing a new FDA-approved drug target for the treatment of a specific disease. It has received extensive attention because new drug discovery is tedious, time-consuming, and highly expensive, with a high risk of failure. Data-driven approaches are an important class of methods that have been introduced for identifying a candidate drug against a target disease. In the present study, a model is proposed that integrates drug-disease association data for drug repurposing using a deep neural network. The model, called IDDI-DNN, first constructs similarity matrices for drug-related properties (three matrices), disease-related properties (two matrices), and drug-disease associations (one matrix). These matrices are then integrated into a single matrix through a two-step procedure based on the similarity network fusion method. The model uses the constructed matrix to predict novel and unknown drug-disease associations through a convolutional neural network. The proposed model was evaluated comparatively on two different datasets, the gold standard dataset and DNdataset. The evaluation results indicate that IDDI-DNN outperforms other state-of-the-art methods in prediction accuracy.
Affiliation(s)
- Ramin Amiri
- Department of Computer Science, Faculty of Mathematics, Statistics and Computer Science, University of Tabriz, Tabriz, Iran
- Jafar Razmara
- Department of Computer Science, Faculty of Mathematics, Statistics and Computer Science, University of Tabriz, Tabriz, Iran.
- Sepideh Parvizpour
- Research Center for Pharmaceutical Nanotechnology, Biomedicine Institute, Tabriz University of Medical Sciences, Tabriz, Iran
- Department of Medical Biotechnology, Faculty of Advanced Medical Sciences, Tabriz University of Medical Sciences, Tabriz, Iran
- Habib Izadkhah
- Department of Computer Science, Faculty of Mathematics, Statistics and Computer Science, University of Tabriz, Tabriz, Iran
|
35
|
Lecca P, Lecca M. Graph embedding and geometric deep learning relevance to network biology and structural chemistry. Front Artif Intell 2023; 6:1256352. [PMID: 38035201 PMCID: PMC10687447 DOI: 10.3389/frai.2023.1256352] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/10/2023] [Accepted: 10/16/2023] [Indexed: 12/02/2023] Open
Abstract
Graphs have been used as a model of complex relationships among data in biological science since the advent of systems biology in the early 2000s. In particular, graph data analysis and graph data mining play an important role in biological interaction networks, where recent techniques of artificial intelligence, usually employed in other types of networks (e.g., social, citation, and trademark networks), aim to implement various data mining tasks including classification, clustering, recommendation, anomaly detection, and link prediction. The commitment and efforts of artificial intelligence research in network biology are motivated by the fact that machine learning techniques are often prohibitively computationally demanding, poorly parallelizable, and ultimately inapplicable, since a biological network of realistic size is a large system characterised by a high density of interactions and often by non-linear dynamics and a non-Euclidean latent geometry. Currently, graph embedding is emerging as the new learning paradigm that shifts the task of building complex models for classification, clustering, and link prediction to learning an informative representation of the graph data in a vector space, so that many graph mining and learning tasks can be more easily performed by employing efficient non-iterative traditional models (e.g., a linear support vector machine for the classification task). The great potential of graph embedding is the main reason for the flourishing of studies in this area and, in particular, of artificial intelligence learning techniques. In this mini review, we give a comprehensive summary of the main graph embedding algorithms in light of the recent burgeoning interest in geometric deep learning.
Affiliation(s)
- Paola Lecca
- Faculty of Engineering, Free University of Bozen-Bolzano, Bolzano, Italy
- Michela Lecca
- Fondazione Bruno Kessler, Digital Industry Center, Technologies of Vision, Trento, Italy
|
36
|
Yan P, Ma H, Tian W, Liu J, Yan X, Ma L, Wei S, Zhu J, Zhu Y, Lai J. Methadone maintenance treatment is more effective than compulsory detoxification in addressing gut microbiota dysbiosis caused by heroin abuse. Front Microbiol 2023; 14:1283276. [PMID: 37954240 PMCID: PMC10635210 DOI: 10.3389/fmicb.2023.1283276] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/25/2023] [Accepted: 10/09/2023] [Indexed: 11/14/2023] Open
Abstract
Introduction Heroin use disorder (HUD) is commonly accompanied by gut dysbiosis, but the roles of gut microbiota in HUD treatment, such as compulsory detoxification and methadone maintenance treatment (MMT), remain poorly understood. Methods In this study, we performed 16S rDNA and whole metagenome sequencing to analyze the gut microbial profiles of HUD patients undergoing heroin addiction, heroin withdrawal (compulsory detoxification), and MMT. Results Our findings revealed that, compared to healthy controls, microbial diversity was significantly decreased in HUD patients who were in a state of heroin addiction and withdrawal, but not in those receiving MMT. We observed significant alterations in 10 bacterial phyla and 20 bacterial families in HUD patients, while MMT partially restored these changes. Whole metagenome sequencing indicated that gut microbiota functions were significantly disrupted in HUD patients experiencing heroin addiction and withdrawal, but MMT was found to almost reverse these dysfunctions. In addition, we identified 24 featured bacteria at the genus level that could be used to effectively distinguish between healthy individuals and those with heroin addiction, heroin withdrawal, or receiving MMT. Furthermore, we found that the relative abundances of Actinomyces, Turicibacter and Weissella were positively associated with the Hamilton Depression Scale score in different states of HUD patients. Discussion This study provides evidence from the gut microbiota perspective that MMT is a more effective approach than compulsory detoxification for HUD treatment.
Affiliation(s)
- Peng Yan
- NHC Key Laboratory of Forensic Science, College of Forensic Science, Xi’an Jiaotong University, Xi’an, China
- National Biosafety Evidence Foundation, Bio-evidence Sciences Academy, Western China Science and Technology Innovation Harbor, Xi'an Jiaotong University, Xi'an, China
- Haotian Ma
- NHC Key Laboratory of Forensic Science, College of Forensic Science, Xi'an Jiaotong University, Xi'an, China
- National Biosafety Evidence Foundation, Bio-evidence Sciences Academy, Western China Science and Technology Innovation Harbor, Xi'an Jiaotong University, Xi'an, China
- Wenrong Tian
- NHC Key Laboratory of Forensic Science, College of Forensic Science, Xi'an Jiaotong University, Xi'an, China
- National Biosafety Evidence Foundation, Bio-evidence Sciences Academy, Western China Science and Technology Innovation Harbor, Xi'an Jiaotong University, Xi'an, China
- Jincen Liu
- NHC Key Laboratory of Forensic Science, College of Forensic Science, Xi'an Jiaotong University, Xi'an, China
- National Biosafety Evidence Foundation, Bio-evidence Sciences Academy, Western China Science and Technology Innovation Harbor, Xi'an Jiaotong University, Xi'an, China
- Xinyue Yan
- NHC Key Laboratory of Forensic Science, College of Forensic Science, Xi'an Jiaotong University, Xi'an, China
- National Biosafety Evidence Foundation, Bio-evidence Sciences Academy, Western China Science and Technology Innovation Harbor, Xi'an Jiaotong University, Xi'an, China
- Lei Ma
- NHC Key Laboratory of Forensic Science, College of Forensic Science, Xi'an Jiaotong University, Xi'an, China
- National Biosafety Evidence Foundation, Bio-evidence Sciences Academy, Western China Science and Technology Innovation Harbor, Xi'an Jiaotong University, Xi'an, China
- Shuguang Wei
- NHC Key Laboratory of Forensic Science, College of Forensic Science, Xi'an Jiaotong University, Xi'an, China
- National Biosafety Evidence Foundation, Bio-evidence Sciences Academy, Western China Science and Technology Innovation Harbor, Xi'an Jiaotong University, Xi'an, China
- Jie Zhu
- NHC Key Laboratory of Forensic Science, College of Forensic Science, Xi'an Jiaotong University, Xi'an, China
- National Biosafety Evidence Foundation, Bio-evidence Sciences Academy, Western China Science and Technology Innovation Harbor, Xi'an Jiaotong University, Xi'an, China
- Yongsheng Zhu
- NHC Key Laboratory of Forensic Science, College of Forensic Science, Xi'an Jiaotong University, Xi'an, China
- National Biosafety Evidence Foundation, Bio-evidence Sciences Academy, Western China Science and Technology Innovation Harbor, Xi'an Jiaotong University, Xi'an, China
- Jianghua Lai
- NHC Key Laboratory of Forensic Science, College of Forensic Science, Xi'an Jiaotong University, Xi'an, China
- National Biosafety Evidence Foundation, Bio-evidence Sciences Academy, Western China Science and Technology Innovation Harbor, Xi'an Jiaotong University, Xi'an, China
|
37
|
Ghorbanali Z, Zare-Mirakabad F, Salehi N, Akbari M, Masoudi-Nejad A. DrugRep-HeSiaGraph: when heterogenous siamese neural network meets knowledge graphs for drug repurposing. BMC Bioinformatics 2023; 24:374. [PMID: 37789314 PMCID: PMC10548718 DOI: 10.1186/s12859-023-05479-7] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Received: 07/04/2023] [Accepted: 09/12/2023] [Indexed: 10/05/2023] Open
Abstract
BACKGROUND Drug repurposing is an approach that holds promise for identifying new therapeutic uses for existing drugs. Recently, knowledge graphs have emerged as significant tools for addressing the challenges of drug repurposing. However, there are still major issues with constructing and embedding knowledge graphs. RESULTS This study proposes a two-step method called DrugRep-HeSiaGraph to address these challenges. The method integrates the drug-disease knowledge graph with the application of a heterogeneous siamese neural network. In the first step, a drug-disease knowledge graph named DDKG-V1 is constructed by defining new relationship types, and then numerical vector representations for the nodes are created using the distributional learning method. In the second step, a heterogeneous siamese neural network called HeSiaNet is applied to enrich the embedding of drugs and diseases by bringing them closer in a new unified latent space. Then, it predicts potential drug candidates for diseases. DrugRep-HeSiaGraph achieves impressive performance metrics, including an AUC-ROC of 91.16%, an AUC-PR of 90.32%, an accuracy of 84.63%, a BS of 0.119, and an MCC of 69.31%. CONCLUSION We demonstrate the effectiveness of the proposed method in identifying potential drugs for COVID-19 as a case study. In addition, this study shows the role of dipeptidyl peptidase 4 (DPP-4) as a potential receptor for SARS-CoV-2 and the effectiveness of DPP-4 inhibitors in facing COVID-19. This highlights the practical application of the model in addressing real-world challenges in the field of drug repurposing. The code and data for DrugRep-HeSiaGraph are publicly available at https://github.com/CBRC-lab/DrugRep-HeSiaGraph.
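Two of the metrics quoted in this abstract are less familiar than AUC or accuracy. Assuming BS stands for the Brier score, it is the mean squared difference between predicted probabilities and binary labels (lower is better), and MCC is the Matthews correlation coefficient computed from the confusion matrix (1 is perfect, 0 is chance level). The sketch below shows the standard definitions under the assumption of binary labels and hard predictions for MCC; the function names and toy data are illustrative and are not taken from the DrugRep-HeSiaGraph repository.

```python
import math


def brier_score(y_true, y_prob):
    """Mean squared error between predicted probabilities and 0/1 labels."""
    return sum((p - y) ** 2 for y, p in zip(y_true, y_prob)) / len(y_true)


def matthews_corrcoef(y_true, y_pred):
    """Matthews correlation coefficient for binary 0/1 labels and predictions."""
    tp = sum(1 for y, p in zip(y_true, y_pred) if y == 1 and p == 1)
    tn = sum(1 for y, p in zip(y_true, y_pred) if y == 0 and p == 0)
    fp = sum(1 for y, p in zip(y_true, y_pred) if y == 0 and p == 1)
    fn = sum(1 for y, p in zip(y_true, y_pred) if y == 1 and p == 0)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # By convention the coefficient is 0 when any marginal count is zero.
    return (tp * tn - fp * fn) / denom if denom else 0.0


# Toy example: four drug-disease pairs with predicted association probabilities.
labels = [1, 1, 0, 0]
probs = [0.9, 0.4, 0.2, 0.6]
hard = [1 if p >= 0.5 else 0 for p in probs]  # 0.5 threshold is an assumption

bs = brier_score(labels, probs)
mcc = matthews_corrcoef(labels, hard)
```

Because the Brier score works on probabilities while MCC works on thresholded predictions, the two can disagree about which of two models is better.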
Affiliation(s)
- Zahra Ghorbanali
- Computational Biology Research Center (CBRC), Department of Mathematics and Computer Science, Amirkabir University of Technology, Tehran, Iran
- Fatemeh Zare-Mirakabad
- Computational Biology Research Center (CBRC), Department of Mathematics and Computer Science, Amirkabir University of Technology, Tehran, Iran.
- Najmeh Salehi
- School of Biological Science, Institute for Research in Fundamental Sciences (IPM), Tehran, Iran
- Mohammad Akbari
- Computational Biology Research Center (CBRC), Department of Mathematics and Computer Science, Amirkabir University of Technology, Tehran, Iran
- Ali Masoudi-Nejad
- Laboratory of Systems Biology and Bioinformatics (LBB), Institute of Biochemistry and Biophysics, University of Tehran, Tehran, Iran
|
38
|
Li X, Shang C, Xu C, Wang Y, Xu J, Zhou Q. Development and comparison of machine learning-based models for predicting heart failure after acute myocardial infarction. BMC Med Inform Decis Mak 2023; 23:165. [PMID: 37620904 PMCID: PMC10463624 DOI: 10.1186/s12911-023-02240-1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/29/2022] [Accepted: 07/13/2023] [Indexed: 08/26/2023] Open
Abstract
AIMS Heart failure (HF) is one of the common adverse cardiovascular events after acute myocardial infarction (AMI), but the predictive efficacy of the numerous models built with machine learning (ML) is unclear. This study aimed to build an optimal model to predict the occurrence of HF in AMI patients by comparing seven ML algorithms. METHODS Cohort 1 included AMI patients from 2018 to 2019, divided into HF and control groups. The first routine test data of all study subjects were collected as candidate features for the model, and seven ML algorithms with screenable features were evaluated. Cohort 2 included AMI patients from 2020 to 2021 and was used to establish an early warning model with external validation. ROC and DCA curves were used to analyze the diagnostic efficacy and clinical benefit of the model, respectively. RESULTS The best performer among the seven ML algorithms was XgBoost, for which troponin I, triglycerides, urine red blood cell count, γ-glutamyl transpeptidase, glucose, urine specific gravity, prothrombin time, prealbumin, and urea ranked high in feature importance. The AUC of the HF-Lab9 prediction model built by the XgBoost algorithm was 0.966, and the model had good clinical benefits. CONCLUSIONS This study identified XgBoost as the optimal ML algorithm and developed the HF-Lab9 model, which will improve the accuracy of clinicians in assessing the occurrence of HF after AMI and provide a reference for the selection of subsequent model-building algorithms.
Affiliation(s)
- Xuewen Li
- Department of Laboratory Medicine, First Hospital of Jilin University, Changchun, China
- Chengming Shang
- Information center, First Hospital of Jilin University, Changchun, China
- Changyan Xu
- Medical Department, First Hospital of Jilin University, Changchun, China
- Yiting Wang
- Department of Laboratory Medicine, First Hospital of Jilin University, Changchun, China
- Jiancheng Xu
- Department of Laboratory Medicine, First Hospital of Jilin University, Changchun, China
- Qi Zhou
- Department of Pediatrics, First Hospital of Jilin University, 1 Xinmin Street, Changchun, 130021, Jilin, China.
|
39
|
Syama K, Jothi JAA, Khanna N. Automatic disease prediction from human gut metagenomic data using boosting GraphSAGE. BMC Bioinformatics 2023; 24:126. [PMID: 37003965 PMCID: PMC10067187 DOI: 10.1186/s12859-023-05251-x] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/27/2023] [Accepted: 03/23/2023] [Indexed: 04/03/2023] Open
Abstract
BACKGROUND The human microbiome plays a critical role in maintaining human health. Due to recent advances in high-throughput sequencing technologies, the microbiome profiles present in the human body have become publicly available. Hence, many studies have analyzed human microbiome profiles. These studies have found that, for different diseases, healthy and sick individuals carry different microbiome profiles. Recently, several computational methods have utilized microbiome profiles to automatically diagnose and classify the host phenotype. RESULTS In this work, a novel deep learning framework based on boosting GraphSAGE is proposed for automatic prediction of diseases from metagenomic data. The proposed framework has two main components: (a) a Metagenomic Disease graph (MD-graph) construction module and (b) a Disease prediction Network (DP-Net) module. The graph construction module constructs a graph by considering each metagenomic sample as a node in the graph. The graph captures the relationship between the samples using a proximity measure. The DP-Net consists of a boosting GraphSAGE model which predicts the status of a sample as sick or healthy. The effectiveness of the proposed method is verified using real and synthetic datasets corresponding to diseases like inflammatory bowel disease and colorectal cancer. The proposed model achieved a highest AUC of 93%, accuracy of 95%, F1-score of 95%, and AUPRC of 95% for the real inflammatory bowel disease dataset, and a best AUC of 90%, accuracy of 91%, F1-score of 87%, and AUPRC of 93% for the real colorectal cancer dataset. CONCLUSION The proposed framework outperforms other machine learning and deep learning models in terms of classification accuracy, AUC, F1-score and AUPRC for both synthetic and real metagenomic data.
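The graph construction step described in this abstract (one node per metagenomic sample, edges from a proximity measure) can be sketched as a k-nearest-neighbour graph, followed by the mean-of-neighbours aggregation that GraphSAGE-style models commonly use. The abstract does not say which proximity measure or neighbourhood size the authors chose, so Euclidean distance, `k`, and all function names below are assumptions made purely for illustration; the boosting component is not shown.

```python
import math


def knn_graph(samples, k):
    """Connect each sample to its k nearest samples (undirected, deduplicated)."""
    edges = set()
    for i, row in enumerate(samples):
        # Euclidean distance is an assumed stand-in for the proximity measure.
        dists = sorted(
            (math.dist(row, other), j) for j, other in enumerate(samples) if j != i
        )
        for _, j in dists[:k]:
            edges.add((min(i, j), max(i, j)))
    return sorted(edges)


def mean_neighbor_features(samples, edges):
    """Average the feature vectors of each node's neighbours (mean aggregator)."""
    neighbors = {i: [] for i in range(len(samples))}
    for i, j in edges:
        neighbors[i].append(j)
        neighbors[j].append(i)
    dim = len(samples[0])
    out = []
    for i in range(len(samples)):
        nbrs = neighbors[i]
        if not nbrs:
            out.append([0.0] * dim)  # isolated node: nothing to aggregate
        else:
            out.append([sum(samples[j][d] for j in nbrs) / len(nbrs) for d in range(dim)])
    return out


# Toy abundance profiles: two tight pairs of samples far apart from each other.
profiles = [[0.0, 0.0], [0.1, 0.0], [5.0, 5.0], [5.1, 5.0]]
md_edges = knn_graph(profiles, k=1)
aggregated = mean_neighbor_features(profiles, md_edges)
```

In a full model, each node's own features would be combined with the aggregated neighbour features and passed through learned weights; that part is omitted here.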
Affiliation(s)
- K Syama
- Department of Computer Science, Birla Institute of Technology and Science Pilani Dubai Campus, Dubai International Academic City, Dubai, UAE
- J Angel Arul Jothi
- Department of Computer Science, Birla Institute of Technology and Science Pilani Dubai Campus, Dubai International Academic City, Dubai, UAE.
|