1. Wang Z, Meng J, Li H, Dai Q, Lin X, Luan Y. Attention-augmented multi-domain cooperative graph representation learning for molecular interaction prediction. Neural Netw 2025; 186:107265. [PMID: 39987715 DOI: 10.1016/j.neunet.2025.107265] [Received: 08/23/2024] [Revised: 01/23/2025] [Accepted: 02/07/2025] [Indexed: 02/25/2025]
Abstract
Accurate identification of molecular interactions is crucial for biological network analysis and can provide valuable insights into fundamental regulatory mechanisms. Despite considerable progress driven by computational advances, existing methods often rely on task-specific prior knowledge or inherent structural properties of molecules, which limits their generalizability and applicability. Recently, graph-based methods have emerged as a promising approach for predicting links in molecular networks. However, most of these methods focus primarily on aggregating topological information within individual domains, leading to an inadequate characterization of molecular interactions. To mitigate these challenges, we propose AMCGRL, a generalized multi-domain cooperative graph representation learning framework for diverse molecular interaction prediction tasks. Concretely, AMCGRL incorporates multiple graph encoders to simultaneously learn molecular representations from both intra-domain and inter-domain graphs. A cross-domain decoder then bridges these graph encoders to facilitate the extraction of task-relevant information across different domains. Furthermore, a hierarchical mutual attention mechanism is developed to capture complex pairwise interaction patterns between distinct types of molecules through inter-molecule communicative learning. Extensive experiments on various datasets demonstrate the superior representation learning capability of AMCGRL compared with state-of-the-art methods, confirming its effectiveness for molecular interaction prediction.
Affiliation(s)
- Zhaowei Wang
- School of Computer Science and Technology, Dalian University of Technology, Dalian 116024, China.
- Jun Meng
- School of Computer Science and Technology, Dalian University of Technology, Dalian 116024, China.
- Haibin Li
- School of Computer Science and Technology, Dalian University of Technology, Dalian 116024, China.
- Qiguo Dai
- School of Computer Science and Engineering, Dalian Minzu University, Dalian 116600, China.
- Xiaohui Lin
- School of Computer Science and Technology, Dalian University of Technology, Dalian 116024, China.
- Yushi Luan
- School of Bioengineering, Dalian University of Technology, Dalian 116024, China.
2. Chen X, Cai R, Huang Z, Li Z, Zheng J, Wu M. Interpretable high-order knowledge graph neural network for predicting synthetic lethality in human cancers. Brief Bioinform 2025; 26:bbaf142. [PMID: 40194555 PMCID: PMC11975366 DOI: 10.1093/bib/bbaf142] [Received: 11/28/2024] [Revised: 02/21/2025] [Accepted: 03/07/2025] [Indexed: 04/09/2025]
Abstract
Synthetic lethality (SL) is a promising gene interaction for cancer therapy. Recent SL prediction methods integrate knowledge graphs (KGs) into graph neural networks (GNNs) and employ attention mechanisms to extract local subgraphs as explanations for target gene pairs. However, attention mechanisms often lack fidelity, typically generate a single explanation per gene pair, and fail to ensure trustworthy high-order structures in their explanations. To overcome these limitations, we propose Diverse Graph Information Bottleneck for Synthetic Lethality (DGIB4SL), a KG-based GNN that generates multiple faithful explanations for the same gene pair and effectively encodes high-order structures. Specifically, we introduce a novel DGIB objective that integrates a determinantal point process constraint into the standard information bottleneck objective, and we employ 13 motif-based adjacency matrices to capture high-order structures in gene representations. Experimental results show that DGIB4SL outperforms state-of-the-art baselines and provides multiple explanations for SL prediction, revealing diverse biological mechanisms underlying SL inference.
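A motif-based adjacency matrix reweights each edge by how many instances of a small subgraph pattern it participates in. The minimal sketch below covers only the simplest case, the undirected triangle motif, and is an illustrative assumption rather than DGIB4SL's construction (which uses 13 motifs). It computes W = (A·A) ⊙ A, so W[i][j] counts the triangles that contain edge (i, j); the toy graph is hypothetical.

```python
def triangle_motif_adjacency(adj):
    """Return W with W[i][j] = number of triangles containing edge (i, j).

    `adj` is a symmetric 0/1 adjacency matrix (list of lists) with no
    self-loops. This is (A @ A) * A element-wise, i.e. the triangle motif
    adjacency; directed motifs would each need their own formula.
    """
    n = len(adj)
    w = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if adj[i][j]:
                # Each common neighbor k closes a triangle i-j-k.
                w[i][j] = sum(adj[i][k] * adj[k][j] for k in range(n))
    return w


# Hypothetical gene graph: triangle 0-1-2 plus a pendant edge 2-3.
adj = [
    [0, 1, 1, 0],
    [1, 0, 1, 0],
    [1, 1, 0, 1],
    [0, 0, 1, 0],
]
w = triangle_motif_adjacency(adj)
# Edges inside the triangle get weight 1; the pendant edge 2-3 gets 0.
```

A GNN layer that aggregates over W instead of A would effectively ignore the pendant edge here, which is one way higher-order structure changes which neighbors a node draws information from.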
Affiliation(s)
- Xuexin Chen
- School of Computer Science, Guangdong University of Technology, No. 100 Waihuan Xi Road, Panyu, Guangdong, Guangzhou, 510006, China
- Ruichu Cai
- School of Computer Science, Guangdong University of Technology, No. 100 Waihuan Xi Road, Panyu, Guangdong, Guangzhou, 510006, China
- Pazhou Laboratory (Huangpu), No. 248 Pazhou Qiaotou Street, Haizhu, Guangdong Province, Guangzhou, 510335, China
- Zhengting Huang
- School of Computer Science, Guangdong University of Technology, No. 100 Waihuan Xi Road, Panyu, Guangdong, Guangzhou, 510006, China
- Zijian Li
- Machine Learning Department, Mohamed bin Zayed University of Artificial Intelligence, Masdar, Abu Dhabi, United Arab Emirates
- Jie Zheng
- School of Information Science and Technology, ShanghaiTech University, No. 393 Huaxia Middle Road, Pudong, Shanghai, 201210, China
- School of Information Science and Technology, Shanghai Engineering Research Center of Intelligent Vision and Imaging, ShanghaiTech University, No. 393 Huaxia Middle Road, Pudong, Shanghai, 201210, China
- Min Wu
- Institute for Infocomm Research (I²R), A*STAR, No. 2 Fusionopolis Way, Queenstown Planning, Singapore 138632, Singapore
3. Zhang X, Liu Q. A graph neural network approach for hierarchical mapping of breast cancer protein communities. BMC Bioinformatics 2025; 26:23. [PMID: 39838298 PMCID: PMC11749236 DOI: 10.1186/s12859-024-06015-x] [Received: 05/26/2024] [Accepted: 12/16/2024] [Indexed: 01/23/2025]
Abstract
BACKGROUND Comprehensively mapping the hierarchical structure of breast cancer protein communities and identifying potential biomarkers from them is a promising approach in breast cancer research. Existing approaches are subjective and do not take information from protein sequences into consideration. Deep learning can automatically learn features from protein sequences and protein-protein interactions for hierarchical clustering. RESULTS Using a large amount of publicly available proteomics data, we created a hierarchical tree of breast cancer protein communities using a novel hierarchical graph neural network, supervised by Gene Ontology terms and assisted by a pre-trained deep contextual language model. Then, a group-lasso algorithm was applied to identify protein communities that are under both mutation burden and survival burden, undergo significant alterations when targeted by specific drug molecules, and show cancer-dependent perturbations. The resulting hierarchical map of protein communities shows how gene-level mutations and survival information converge on protein communities at different scales. Internal validity of the model was established through convergence on BRCA2 as a breast cancer hotspot. Further overlaps with breast cancer cell dependencies revealed SUPT6H and RAD21, along with their respective protein systems, HOST:37 and HOST:861, as potential biomarkers. Using gene-level perturbation data of the HOST:37 and HOST:861 gene sets, three FDA-approved drugs with high therapeutic value were selected as potential treatments for further evaluation: mercaptopurine, pioglitazone, and colchicine. CONCLUSION The proposed graph neural network approach to analyzing breast cancer protein communities in a hierarchical structure provides a novel perspective on breast cancer prognosis and treatment.
By targeting entire gene sets, we were able to evaluate the prognostic and therapeutic value of genes (or gene sets) at different levels, from gene-level to system-level biology. Cancer-specific gene dependencies provide additional context for pinpointing cancer-related systems and drug-induced alterations can highlight potential therapeutic targets. These identified protein communities, in conjunction with other protein communities under strong mutation and survival burdens, can potentially be used as clinical biomarkers for breast cancer.
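The group-lasso step selects whole protein communities rather than individual genes. One common way to fit such a penalty, assumed here for illustration, is proximal gradient descent, whose key step is block soft-thresholding: a community's coefficients are shrunk toward zero together, and the whole community is dropped when its norm falls below the penalty λ. The helper names, community memberships, and coefficient values below are hypothetical.

```python
import math


def group_soft_threshold(beta, groups, lam):
    """Proximal operator of lam * sum_g ||beta_g||_2 (block soft-thresholding).

    beta:   list of coefficients (e.g. one per gene)
    groups: list of index lists, one per protein community (non-overlapping)
    lam:    penalty strength (times the step size, inside a proximal-gradient loop)
    """
    out = list(beta)
    for idx in groups:
        norm = math.sqrt(sum(beta[i] ** 2 for i in idx))
        # Shrink the whole group by one factor; zero it out if norm <= lam.
        scale = max(0.0, 1.0 - lam / norm) if norm > 0 else 0.0
        for i in idx:
            out[i] = scale * beta[i]
    return out


def selected_groups(beta, groups):
    """Indices of groups that keep at least one non-zero coefficient."""
    return [g for g, idx in enumerate(groups) if any(beta[i] != 0 for i in idx)]


# Two hypothetical communities: genes 0-1 (strong signal) and genes 2-3 (weak).
beta = [3.0, 4.0, 0.3, 0.4]
groups = [[0, 1], [2, 3]]
shrunk = group_soft_threshold(beta, groups, lam=1.0)
# Community 0 has norm 5 and is scaled by 0.8; community 1 has norm 0.5 and is removed.
```

Because the whole block is zeroed at once, the surviving groups directly give the list of selected communities, which is why a group penalty suits community-level biomarker selection.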
Affiliation(s)
- Xiao Zhang
- Department of Applied Computer Science, University of Winnipeg, Winnipeg, MB, R3B 2E9, Canada
- Department of Biochemistry and Medical Genetics, University of Manitoba, Winnipeg, MB, R3E 0W2, Canada
- Qian Liu
- Department of Applied Computer Science, University of Winnipeg, Winnipeg, MB, R3B 2E9, Canada.
- Department of Biochemistry and Medical Genetics, University of Manitoba, Winnipeg, MB, R3E 0W2, Canada.
4. Sisakht M, Shahrestanaki MK, Fallahi J, Razban V. PyComp: A Versatile Tool for Efficient Data Extraction, Conversion, and Management in High-throughput Virtual Drug Screening. Curr Comput Aided Drug Des 2025; 21:479-486. [PMID: 38192133 DOI: 10.2174/0115734099274495231218150611] [Received: 09/14/2023] [Revised: 11/29/2023] [Accepted: 12/02/2023] [Indexed: 01/10/2024]
Abstract
BACKGROUND Virtual screening (VS) is essential for analyzing potential drug candidates in drug discovery. It often involves converting large volumes of compound data into specific formats suitable for computational analysis. Managing and processing this information, especially for vast numbers of compounds given in various forms such as names, identifiers, or SMILES strings, can present significant logistical and technical challenges. METHODS To streamline this process, we developed PyComp, a software tool built with Python's PyQt5 library and compiled into an executable with PyInstaller. PyComp provides a systematic way for users to retrieve a list of compound names, IDs (including ID ranges), or SMILES strings and convert them into the desired 3D format. RESULTS PyComp greatly enhances the efficiency of the data extraction, conversion, and storage steps involved in VS. Its ability to search for similar compounds and to handle misidentified compounds makes it an easy-to-use, customizable tool for managing large-scale compound data. By streamlining these operations, PyComp allows researchers to save significant time and effort, thus accelerating the pace of drug discovery research. CONCLUSION PyComp effectively addresses some of the most pressing challenges in high-throughput VS: the efficient management and conversion of large volumes of compound data. As a user-friendly, customizable software tool, PyComp is pivotal in improving the efficiency and success of large-scale drug screening efforts, paving the way for faster discovery of potential therapeutic compounds.
Affiliation(s)
- Mohsen Sisakht
- Department of Molecular Medicine, School of Advanced Medical Sciences and Technologies, Shiraz University of Medical Sciences, Shiraz, Iran
- Jafar Fallahi
- Department of Molecular Medicine, School of Advanced Medical Sciences and Technologies, Shiraz University of Medical Sciences, Shiraz, Iran
- Vahid Razban
- Department of Molecular Medicine, School of Advanced Medical Sciences and Technologies, Shiraz University of Medical Sciences, Shiraz, Iran
5. Lin CX, Li HD, Wang J. LIMO-GCN: a linear model-integrated graph convolutional network for predicting Alzheimer disease genes. Brief Bioinform 2024; 26:bbae611. [PMID: 39592152 PMCID: PMC11596108 DOI: 10.1093/bib/bbae611] [Received: 08/21/2024] [Revised: 10/02/2024] [Accepted: 11/11/2024] [Indexed: 11/28/2024]
Abstract
Alzheimer's disease (AD) is a complex disease whose genetic etiology is not fully understood. Gene network-based methods have proven promising for predicting AD genes. However, existing approaches are limited in their ability to model the nonlinear relationship between networks and disease genes, because (i) any data can theoretically be decomposed into the sum of a linear part and a nonlinear part, (ii) the linear part is best modeled by a linear model, since a nonlinear model is biased and can easily overfit, and (iii) existing methods do not separate the linear part from the nonlinear part when building the disease gene prediction model. To address this limitation, we propose the linear model-integrated graph convolutional network (LIMO-GCN), a generic disease gene prediction method that models data linearity and nonlinearity by integrating a linear model with a GCN. A GCN is used because it is by design naturally suited to network data, and a linear model is integrated because the linearity in the data is best modeled by a linear model. The weighted sum of the predictions of the two components is used as the final prediction of LIMO-GCN. We then apply LIMO-GCN to the prediction of AD genes. LIMO-GCN outperforms state-of-the-art approaches including GCN, network-wide association studies, and random walk. Furthermore, we show that the top-ranked genes are significantly associated with AD based on molecular evidence from heterogeneous genomic data. Our results indicate that LIMO-GCN provides a novel method for prioritizing AD genes.
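The final LIMO-GCN score is a weighted sum of a linear model and a GCN. The sketch below illustrates only that combination step, under stated assumptions: the GCN is a standard two-layer model with symmetric normalization D^-1/2 (A + I) D^-1/2, the linear component is a logistic model on the same node features, and all weights, including the mixing weight `alpha`, are hypothetical fixed values rather than trained parameters from the paper.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def matmul(a, b):
    """Plain list-of-lists matrix product."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def normalized_adjacency(adj):
    """Symmetric GCN normalization D^-1/2 (A + I) D^-1/2."""
    n = len(adj)
    a_hat = [[adj[i][j] + (1.0 if i == j else 0.0) for j in range(n)] for i in range(n)]
    deg = [sum(row) for row in a_hat]
    return [[a_hat[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)] for i in range(n)]


def gcn_scores(adj, x, w1, w2):
    """Two-layer GCN: sigmoid(A_n . relu(A_n X W1) . W2), one score per gene."""
    a_n = normalized_adjacency(adj)
    hidden = [[max(0.0, v) for v in row] for row in matmul(matmul(a_n, x), w1)]
    out = matmul(matmul(a_n, hidden), w2)
    return [sigmoid(row[0]) for row in out]


def linear_scores(x, beta):
    """Assumed linear component: logistic regression on the node features."""
    return [sigmoid(sum(xi * bi for xi, bi in zip(row, beta))) for row in x]


def limo_style_prediction(adj, x, w1, w2, beta, alpha):
    """Per-gene weighted sum: alpha * linear + (1 - alpha) * GCN."""
    lin = linear_scores(x, beta)
    gcn = gcn_scores(adj, x, w1, w2)
    return [alpha * lin_p + (1 - alpha) * gcn_p for lin_p, gcn_p in zip(lin, gcn)]


# Hypothetical 3-gene path network with 2 features per gene.
adj = [[0, 1, 0], [1, 0, 1], [0, 1, 0]]
x = [[1.0, 0.0], [0.5, 0.5], [0.0, 1.0]]
w1 = [[1.0, -1.0], [0.5, 1.0]]
w2 = [[1.0], [0.5]]
beta = [0.8, -0.3]
scores = limo_style_prediction(adj, x, w1, w2, beta, alpha=0.5)
```

Setting `alpha` to 1 or 0 recovers the pure linear or pure GCN prediction, so the mixing weight directly controls how much of the score is attributed to the linear part of the data.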
Affiliation(s)
- Cui-Xiang Lin
- School of Computer Science and Engineering, Hunan Provincial Key Lab on Bioinformatics, Central South University, Changsha, Hunan 410083, P.R. China
- School of Mathematics and Computational Science, National Center for Applied Mathematics in Hunan, Xiangtan University, Xiangtan, Hunan 411105, P.R. China
- Hong-Dong Li
- School of Computer Science and Engineering, Hunan Provincial Key Lab on Bioinformatics, Central South University, Changsha, Hunan 410083, P.R. China
- Jianxin Wang
- School of Computer Science and Engineering, Hunan Provincial Key Lab on Bioinformatics, Central South University, Changsha, Hunan 410083, P.R. China
6. Marandon A, Rebafka T, Sokolovska N, Soula H. Conformal novelty detection for multiple metabolic networks. BMC Bioinformatics 2024; 25:358. [PMID: 39550534 PMCID: PMC11569617 DOI: 10.1186/s12859-024-05971-8] [Received: 06/25/2024] [Accepted: 10/25/2024] [Indexed: 11/18/2024]
Abstract
BACKGROUND Graphical representations are useful for modeling complex data in general and biological interactions in particular. Our main motivation is the comparison of metabolic networks in the wider context of developing noninvasive, accurate diagnostic tools. However, comparison and classification of graphs remain extremely challenging, although a number of highly efficient methods such as graph neural networks have been developed over the past decade. Important aspects are still lacking in graph classification: interpretability and guarantees on classification quality, i.e., control of the risk level or of the false discovery rate. RESULTS In our contribution, we introduce a statistically sound approach to control the false discovery rate in a classification task for graphs in a semi-supervised setting. Our procedure identifies novelties in a dataset, where a graph is considered a novelty when its topology is significantly different from those in the reference class. Notably, the procedure is a conformal prediction approach, which makes no distributional assumptions on the data and can be seen as a wrapper around traditional machine learning models, so it takes full advantage of existing methods. The performance of the proposed method is assessed on several standard benchmarks. It is also adapted and applied to the difficult task of classifying metabolic networks, where each graph represents all metabolic reactions of a bacterium, and to a real task from a cancer data repository. CONCLUSIONS Our approach efficiently controls the false discovery rate in highly complex data while maximizing the true discovery rate to obtain the most reasonable predictive performance. This contribution focuses on confident classification of complex data, which can further be used to explore complex human pathologies and their mechanisms.
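The procedure described above can be sketched with the standard recipe for conformal novelty detection: compute a conformal p-value for each test graph from a held-out calibration set of reference graphs, then apply the Benjamini-Hochberg step-up rule to control the FDR. The novelty scores (higher means more atypical) are assumed to come from any model fitted on separate reference graphs, the numbers are made up, and the authors' procedure may differ in its details.

```python
def conformal_pvalues(cal_scores, test_scores):
    """Conformal p-value for each test graph.

    Scores are novelty scores (higher = more atypical). cal_scores come from
    held-out reference-class graphs not used to fit the scoring model.
    p_j = (1 + #{i : cal_i >= test_j}) / (n_cal + 1)
    """
    n = len(cal_scores)
    return [(1 + sum(c >= t for c in cal_scores)) / (n + 1) for t in test_scores]


def benjamini_hochberg(pvalues, alpha):
    """Indices declared novelties by the BH step-up procedure at level alpha."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    k_max = 0
    for rank, i in enumerate(order, start=1):
        if pvalues[i] <= rank * alpha / m:
            k_max = rank  # step-up: keep the largest rank that passes
    return sorted(order[:k_max])


cal_scores = [i / 10 for i in range(19)]  # 19 hypothetical calibration scores
test_scores = [5.0, 4.0, 0.5]             # two clear outliers, one typical graph
p = conformal_pvalues(cal_scores, test_scores)  # [0.05, 0.05, 0.75]
novelties = benjamini_hochberg(p, alpha=0.1)    # [0, 1]
```

The p-values depend only on the ranks of the scores, so their validity rests on exchangeability between calibration graphs and reference-class test graphs rather than on any distributional model; this is what allows the scorer to be an arbitrary machine learning model.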
Affiliation(s)
- Ariane Marandon
- LPSM, Sorbonne University, 4 place Jussieu, 75005, Paris, France
- Tabea Rebafka
- LPSM, Sorbonne University, 4 place Jussieu, 75005, Paris, France
- MaIAGE, INRAE, Domaine de Vilvert, 78350, Jouy-en-Josas, France
- Hédi Soula
- NutriOmics, Sorbonne University, 91 boulevard de l'Hôpital, 75013, Paris, France
7. Feng Y, Long Y, Wang H, Ouyang Y, Li Q, Wu M, Zheng J. Benchmarking machine learning methods for synthetic lethality prediction in cancer. Nat Commun 2024; 15:9058. [PMID: 39428397 PMCID: PMC11491473 DOI: 10.1038/s41467-024-52900-7] [Received: 11/27/2023] [Accepted: 09/23/2024] [Indexed: 10/22/2024]
Abstract
Synthetic lethality (SL) is a gold mine of anticancer drug targets, exposing cancer-specific dependencies of cellular survival. To complement resource-intensive experimental screening, many machine learning methods for SL prediction have emerged recently. However, a comprehensive benchmark has been lacking. This study systematically benchmarks 12 recent machine learning methods for SL prediction, assessing their performance across diverse data splitting scenarios, negative sample ratios, and negative sampling techniques, on both classification and ranking tasks. We observe that all the methods can perform significantly better with improved data quality, e.g., by excluding computationally derived SLs from training and sampling negative labels based on gene expression. Among the methods, SLMGAE performs the best. Furthermore, the methods have limitations in realistic scenarios such as cold-start independent tests and context-specific SLs. These results, together with the freely available source code and datasets, provide guidance for selecting suitable methods and developing more powerful techniques for SL virtual screening.
Affiliation(s)
- Yimiao Feng
- School of Information Science and Technology, ShanghaiTech University, Shanghai, China
- Lingang Laboratory, Shanghai, China
- Yahui Long
- Bioinformatics Institute (BII), Agency for Science, Technology and Research (A*STAR), Singapore, Singapore
- He Wang
- School of Information Science and Technology, ShanghaiTech University, Shanghai, China
- Yang Ouyang
- School of Information Science and Technology, ShanghaiTech University, Shanghai, China
- Quan Li
- School of Information Science and Technology, ShanghaiTech University, Shanghai, China
- Min Wu
- Institute for Infocomm Research, Agency for Science, Technology and Research (A*STAR), Singapore, Singapore.
- Jie Zheng
- School of Information Science and Technology, ShanghaiTech University, Shanghai, China.
- Shanghai Engineering Research Center of Intelligent Vision and Imaging, Shanghai, China.
8. Tian Z, Yu Y, Ni F, Zou Q. Drug-target interaction prediction with collaborative contrastive learning and adaptive self-paced sampling strategy. BMC Biol 2024; 22:216. [PMID: 39334132 PMCID: PMC11437672 DOI: 10.1186/s12915-024-02012-x] [Received: 06/18/2024] [Accepted: 09/06/2024] [Indexed: 09/30/2024]
Abstract
BACKGROUND Drug-target interaction (DTI) prediction plays a pivotal role in drug discovery and drug repositioning, enabling the identification of potential drug candidates. However, most previous approaches do not fully utilize the complementary relationships among multiple biological networks, which limits their ability to learn consistent representations. Additionally, the strategy used to select negative samples significantly affects the performance of contrastive learning methods. RESULTS In this study, we propose CCL-ASPS, a novel deep learning model that incorporates Collaborative Contrastive Learning (CCL) and an Adaptive Self-Paced Sampling strategy (ASPS) for drug-target interaction prediction. CCL-ASPS leverages multiple networks to learn fused embeddings of drugs and targets, ensuring that they are consistent with the representations learned from individual networks. Furthermore, ASPS dynamically selects more informative negative sample pairs for contrastive learning. Experimental results on an established dataset demonstrate that CCL-ASPS achieves significant improvements compared to current state-of-the-art methods. Moreover, ablation experiments confirm the contributions of the proposed CCL and ASPS strategies. CONCLUSIONS By integrating Collaborative Contrastive Learning and Adaptive Self-Paced Sampling, the proposed CCL-ASPS effectively addresses the limitations of previous methods. This study demonstrates that CCL-ASPS achieves notable improvements in DTI predictive performance compared to current state-of-the-art approaches. The case study and cold-start experiments further illustrate the capability of CCL-ASPS to predict previously unknown DTIs, potentially facilitating the identification of new drug-target interactions.
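Cross-network contrastive learning of this kind is often implemented with an InfoNCE loss: the embedding of a drug (or target) from one network is pulled toward its embedding from another network and pushed away from negatives. The sketch below assumes InfoNCE with cosine similarity and temperature `tau`, and stands in for ASPS with a generic self-paced rule, which is an assumption and not the paper's sampling strategy: only negatives whose similarity to the anchor is at most a `pace` threshold are admitted, and raising `pace` during training lets harder negatives in later. All embeddings are hypothetical.

```python
import math


def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def self_paced_negatives(anchor, candidates, pace):
    """Keep candidates whose similarity to the anchor is <= pace (easy first).
    Raising `pace` over training admits harder (more similar) negatives."""
    return [c for c in candidates if cosine(anchor, c) <= pace]


def info_nce(anchor, positive, negatives, tau=0.5):
    """InfoNCE: -log( exp(s+/tau) / (exp(s+/tau) + sum_n exp(s_n/tau)) )."""
    pos = math.exp(cosine(anchor, positive) / tau)
    neg = sum(math.exp(cosine(anchor, n) / tau) for n in negatives)
    return -math.log(pos / (pos + neg))


anchor = [1.0, 0.0]    # drug embedding from network view 1 (hypothetical)
positive = [1.0, 0.1]  # same drug from network view 2
candidates = [[0.0, 1.0], [1.0, 0.2], [-1.0, 0.0]]
early = self_paced_negatives(anchor, candidates, pace=0.5)  # easy negatives only
late = self_paced_negatives(anchor, candidates, pace=1.0)   # hard negative admitted
loss_early = info_nce(anchor, positive, early)
loss_late = info_nce(anchor, positive, late)
```

A negative that is very similar to the anchor contributes a large term to the denominator and therefore a larger loss, which is why the order in which such hard negatives are introduced matters.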
Affiliation(s)
- Zhen Tian
- School of Computer and Artificial Intelligence, Zhengzhou University, Zhengzhou, 450001, Henan, China
- Yangtze Delta Region Institute (Quzhou), University of Electronic Science and Technology of China, Quzhou, China
- Yue Yu
- School of Computer and Artificial Intelligence, Zhengzhou University, Zhengzhou, 450001, Henan, China
- Yangtze Delta Region Institute (Quzhou), University of Electronic Science and Technology of China, Quzhou, China
- Fengming Ni
- Department of Gastroenterology, The First Hospital of Jilin University, Changchun, 130021, China.
- Quan Zou
- Yangtze Delta Region Institute (Quzhou), University of Electronic Science and Technology of China, Quzhou, China
9. Cingiz MÖ. Ensemble decision of local similarity indices on the biological network for disease related gene prediction. PeerJ 2024; 12:e17975. [PMID: 39247551 PMCID: PMC11380840 DOI: 10.7717/peerj.17975] [Received: 02/01/2024] [Accepted: 08/05/2024] [Indexed: 09/10/2024]
Abstract
Link prediction (LP) is the task of identifying potential, missing, and spurious links in complex networks. Protein-protein interaction (PPI) networks are important for understanding the biological mechanisms underlying diseases. Many complex networks have been constructed using LP methods; however, few studies focus on disease-related gene prediction and evaluate the predicted genes using various evaluation criteria. The main objective of this study is to investigate the effect of a simple ensemble method on disease-related gene prediction. Disease-related gene predictions based on local similarity indices (LSIs) were integrated on the PPI network by a simple ensemble decision method, simple majority voting (SMV), to detect accurate disease-related genes. The human PPI network was used to discover potential disease-related genes with four LSIs. The LSIs discovered potential links between disease-related genes, which were obtained from the OMIM database for gastric, colorectal, breast, prostate, and lung cancers. LSI-based disease-related genes were ranked by their LSI scores in descending order to retrieve the top 10, 50, and 100 disease-related genes. SMV integrated the four LSI-based predictions to obtain the SMV-based top 10, 50, and 100 disease-related genes. The performance of the LSI-based and SMV-based gene lists was evaluated separately by overlap analyses with the GeneCards disease-gene relation dataset and with Gene Ontology (GO) terms. The GO terms were used for the biological assessment of the gene lists inferred by the LSIs and SMV for all cancer types. The Adamic-Adar (AA), Resource Allocation Index (RAI), and SMV-based gene lists generally achieved good performance on all cancers in both overlap analyses. SMV also outperformed the individual LSIs on breast cancer data. Increasing the number of top-ranked disease-related genes selected also improved the performance of SMV.
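The two best-performing indices and the voting step can be sketched as follows. Adamic-Adar scores a pair (x, y) as the sum over common neighbors z of 1/log k_z, and Resource Allocation as the sum of 1/k_z, where k_z is the degree of z. The abstract names only AA and RAI among the four LSIs, so common neighbors and Jaccard are used here as assumed stand-ins for the other two; scoring a candidate gene by summing its index against all seed (OMIM) genes, the majority rule "more than half of the lists", and the toy network are likewise illustrative assumptions.

```python
import math
from collections import Counter


def build_neighbors(edges):
    """Adjacency sets for an undirected PPI edge list."""
    nb = {}
    for u, v in edges:
        nb.setdefault(u, set()).add(v)
        nb.setdefault(v, set()).add(u)
    return nb


def common_neighbors(nb, x, y):
    return len(nb[x] & nb[y])


def jaccard(nb, x, y):
    union = nb[x] | nb[y]
    return len(nb[x] & nb[y]) / len(union) if union else 0.0


def adamic_adar(nb, x, y):
    # A common neighbor of two distinct nodes has degree >= 2, so log(degree) > 0.
    return sum(1.0 / math.log(len(nb[z])) for z in nb[x] & nb[y])


def resource_allocation(nb, x, y):
    return sum(1.0 / len(nb[z]) for z in nb[x] & nb[y])


def top_k(nb, seeds, index, k):
    """Rank non-seed genes by their summed index against the seed genes."""
    seeds = [s for s in seeds if s in nb]
    candidates = [g for g in nb if g not in seeds]
    score = {g: sum(index(nb, g, s) for s in seeds) for g in candidates}
    return sorted(candidates, key=lambda g: (-score[g], g))[:k]


def simple_majority_vote(ranked_lists):
    """Keep genes that appear in more than half of the top-k lists."""
    votes = Counter(g for lst in ranked_lists for g in lst)
    needed = len(ranked_lists) // 2 + 1
    return sorted((g for g, c in votes.items() if c >= needed),
                  key=lambda g: (-votes[g], g))


edges = [("S1", "X"), ("S1", "Y"), ("S2", "X"), ("S2", "Y"),
         ("X", "G1"), ("Y", "G1"), ("S1", "Z"), ("Z", "G2")]
nb = build_neighbors(edges)
seeds = {"S1", "S2"}  # hypothetical known disease genes
indices = [common_neighbors, jaccard, adamic_adar, resource_allocation]
lists = [top_k(nb, seeds, idx, k=2) for idx in indices]
consensus = simple_majority_vote(lists)  # ["G1", "G2"]
```

AA and RAI both down-weight common neighbors that are hubs, and RAI penalizes them more strongly (1/k rather than 1/log k); the voting step then keeps only genes on which most indices agree.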
Affiliation(s)
- Mustafa Özgür Cingiz
- Department of Computer Engineering, Faculty of Engineering and Natural Sciences, Bursa Technical University, Bursa, Turkey
10. Ohnuki Y, Akiyama M, Sakakibara Y. Deep learning of multimodal networks with topological regularization for drug repositioning. J Cheminform 2024; 16:103. [PMID: 39180095 PMCID: PMC11342530 DOI: 10.1186/s13321-024-00897-y] [Received: 01/14/2024] [Accepted: 08/12/2024] [Indexed: 08/26/2024]
Abstract
MOTIVATION Computational techniques for drug-disease prediction are essential for enhancing drug discovery and repositioning. While many methods utilize multimodal networks from various biological databases, few integrate comprehensive multi-omics data, including transcriptomes, proteomes, and metabolomes. We introduce STRGNN, a novel graph deep learning approach that predicts drug-disease relationships using extensive multimodal networks comprising proteins, RNAs, metabolites, and compounds. We have constructed a detailed dataset incorporating multi-omics data and developed a learning algorithm with topological regularization. This algorithm selectively leverages informative modalities while filtering out redundant ones. RESULTS STRGNN demonstrates superior accuracy compared to existing methods and has identified several novel drug effects that are corroborated by the existing literature. STRGNN emerges as a powerful tool for drug prediction and discovery. The source code for STRGNN, along with the dataset for performance evaluation, is available at https://github.com/yuto-ohnuki/STRGNN.git .
Affiliation(s)
- Yuto Ohnuki
- Department of Biosciences and Informatics, Keio University, 3-14-1 Hiyoshi, Kohoku-ku, Yokohama, 223-8522, Japan
- Manato Akiyama
- Department of Biosciences and Informatics, Keio University, 3-14-1 Hiyoshi, Kohoku-ku, Yokohama, 223-8522, Japan
- Yasubumi Sakakibara
- Department of Biosciences and Informatics, Keio University, 3-14-1 Hiyoshi, Kohoku-ku, Yokohama, 223-8522, Japan.
11. Trevena W, Zhong X, Lal A, Rovati L, Cubro E, Dong Y, Schulte P, Gajic O. Model-driven engineering for digital twins: a graph model-based patient simulation application. Front Physiol 2024; 15:1424931. [PMID: 39189027 PMCID: PMC11345177 DOI: 10.3389/fphys.2024.1424931] [Received: 04/28/2024] [Accepted: 07/19/2024] [Indexed: 08/28/2024]
Abstract
INTRODUCTION Digital twins of patients are virtual models that create a digital replica of a patient so that clinical interventions can be tested in silico without exposing real patients to risk. With the increasing availability of electronic health records and sensor-derived patient data, digital twins offer significant potential for applications in the healthcare sector. METHODS This article presents a scalable full-stack architecture for a patient simulation application driven by graph-based models. This patient simulation application enables medical practitioners and trainees to simulate the trajectory of critically ill patients with sepsis. Directed acyclic graphs are used to model the complex underlying causal pathways, focusing on the physiological interactions and medication effects relevant to the first 6 h of critical illness. To realize the sepsis patient simulation at scale, we propose an application architecture with three core components: a cross-platform frontend application that clinicians and trainees use to run the simulation; a simulation engine, hosted in the cloud as a serverless function, that performs all of the computations; and a graph database that hosts the graph model used by the simulation engine to determine the progression of each simulation. RESULTS A short case study is presented to demonstrate the viability of the proposed simulation architecture. DISCUSSION The proposed patient simulation application could help train future generations of healthcare professionals and could be used to facilitate clinicians' bedside decision-making.
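Propagating a simulation through a directed acyclic graph amounts to visiting nodes in topological order, so each variable is recomputed only after all of its causes. The sketch below uses Kahn's algorithm for the ordering; the node names and update functions are hypothetical placeholders, not the clinical model described in the article, and the real engine reads its graph from a graph database rather than from a Python dictionary.

```python
from collections import deque


def topological_order(parents):
    """Kahn's algorithm. `parents` maps every node (roots included) to its parent list."""
    children = {n: [] for n in parents}
    indegree = {n: len(ps) for n, ps in parents.items()}
    for n, ps in parents.items():
        for p in ps:
            children[p].append(n)
    queue = deque(n for n, d in indegree.items() if d == 0)
    order = []
    while queue:
        n = queue.popleft()
        order.append(n)
        for c in children[n]:
            indegree[c] -= 1
            if indegree[c] == 0:
                queue.append(c)
    if len(order) != len(parents):
        raise ValueError("graph contains a cycle")
    return order


def simulate_step(parents, update, state, interventions):
    """One time step: apply interventions, then recompute each derived node
    from its parents in topological order. Root nodes keep their value
    unless intervened on."""
    new_state = dict(state)
    new_state.update(interventions)
    for node in topological_order(parents):
        if node in interventions or not parents[node]:
            continue
        new_state[node] = update[node](*(new_state[p] for p in parents[node]))
    return new_state


# Hypothetical three-node DAG: dose -> a -> b, plus dose -> b.
parents = {"dose": [], "a": ["dose"], "b": ["dose", "a"]}
update = {
    "a": lambda dose: 2 * dose,     # placeholder effect, not a clinical model
    "b": lambda dose, a: dose + a,  # placeholder effect
}
state = {"dose": 0, "a": 0, "b": 0}
next_state = simulate_step(parents, update, state, {"dose": 1})
```

Because the graph is acyclic, a single pass in topological order is enough to make every node consistent with its parents; a cycle would make that ordering impossible, which is why the sketch rejects it.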
Affiliation(s)
- William Trevena
- Department of Industrial and Systems Engineering, University of Florida, Gainesville, FL, United States
- Xiang Zhong
- Department of Industrial and Systems Engineering, University of Florida, Gainesville, FL, United States
- Amos Lal
- Mayo Clinic, Rochester, MN, United States
- Edin Cubro
- Mayo Clinic, Rochester, MN, United States
- Yue Dong
- Mayo Clinic, Rochester, MN, United States
12. Li M, Wang Z, Liu L, Liu X, Zhang W. Subgraph-Aware Graph Kernel Neural Network for Link Prediction in Biological Networks. IEEE J Biomed Health Inform 2024; 28:4373-4381. [PMID: 38630566 DOI: 10.1109/jbhi.2024.3390092] [Indexed: 04/19/2024]
Abstract
Identifying links within biological networks is important in various biomedical applications. Recent studies have revealed that each node in a network may play a unique role in different links, but most link prediction methods overlook these distinctive node roles, hindering the acquisition of effective link representations. Subgraph-based methods have been introduced as solutions but often ignore information shared among subgraphs. To address these limitations, we propose a Subgraph-aware Graph Kernel Neural Network (SubKNet) for link prediction in biological networks. Specifically, SubKNet extracts a subgraph for each node pair and feeds it into a graph kernel neural network, which decomposes each subgraph into a combination of trainable graph filters with diversity regularization for subgraph-aware representation learning. Additionally, node embeddings of the network are extracted as auxiliary information, helping to distinguish node pairs that share the same subgraph. Extensive experiments on five biological networks demonstrate that SubKNet outperforms baselines, including methods specifically designed for biological networks and methods adapted to various networks. Further investigations confirm that applying graph filters to subgraphs helps to distinguish node roles in different subgraphs, and that diversity regularization further enhances this capacity from diverse perspectives, generating effective link representations that contribute to more accurate link prediction.
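Extracting a subgraph for each node pair is commonly done by taking the union of the k-hop neighborhoods of the two endpoints and removing the target edge, so the model cannot read the answer off its input. The sketch below assumes this SEAL-style enclosing subgraph; SubKNet's exact extraction rule and hop count are not given in the abstract, and the toy network is hypothetical.

```python
from collections import deque


def k_hop(adj, source, k):
    """Nodes within k hops of `source`, found by breadth-first search."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        if dist[u] == k:
            continue  # do not expand beyond k hops
        for v in adj.get(u, ()):
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return set(dist)


def enclosing_subgraph(adj, u, v, k=1):
    """Induced subgraph on the union of the k-hop neighborhoods of u and v,
    with the target edge (u, v) removed so it cannot leak the label."""
    nodes = k_hop(adj, u, k) | k_hop(adj, v, k)
    edges = set()
    for a in nodes:
        for b in adj.get(a, ()):
            # keep each undirected edge once (a < b) and skip the target link
            if b in nodes and a < b and {a, b} != {u, v}:
                edges.add((a, b))
    return nodes, edges


adj = {1: {2, 3}, 2: {1, 3}, 3: {1, 2, 4}, 4: {3, 5}, 5: {4}}
nodes_1, edges_1 = enclosing_subgraph(adj, 1, 2, k=1)
nodes_2, edges_2 = enclosing_subgraph(adj, 1, 2, k=2)
```

Since every pair gets its own subgraph, the same node can appear in different structural contexts depending on the link being scored, which is the node-role issue the abstract describes.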
13
Hu Y, Liao T, Chen J, Bian J, Zheng Z, Chen C. Migrate demographic group for fair Graph Neural Networks. Neural Netw 2024; 175:106264. [PMID: 38581810 DOI: 10.1016/j.neunet.2024.106264] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/16/2023] [Revised: 03/11/2024] [Accepted: 03/20/2024] [Indexed: 04/08/2024]
Abstract
Graph neural networks (GNNs) have been applied in many scenarios because of their strong performance in graph learning. However, fairness is often ignored when designing GNNs. As a consequence, biased information in training data can easily affect vanilla GNNs, producing results biased toward particular demographic groups (divided by sensitive attributes such as race and age). There have been efforts to address the fairness issue. However, existing fairness techniques generally divide the demographic groups by raw sensitive attributes and assume that these groups are fixed. The biased information correlated with raw sensitive attributes therefore persists throughout training regardless of the fairness technique applied. Resolving this problem is pressing for training fair GNNs. To tackle it, we propose a new framework, FairMigration, which migrates the demographic groups dynamically instead of keeping them fixed to the raw sensitive attributes. FairMigration is composed of two training stages. In the first stage, the GNNs are initially optimized by personalized self-supervised learning, and the demographic groups are adjusted dynamically. In the second stage, the new demographic groups are frozen, and supervised learning is carried out under the constraints of the new demographic groups and adversarial training. Extensive experiments reveal that FairMigration achieves a favorable trade-off between model performance and fairness.
Affiliation(s)
- YanMing Hu
- School of Computer Science and Engineering, Sun Yat-sen University, Guangzhou, China.
- TianChi Liao
- School of Software Engineering, Sun Yat-sen University, Zhuhai, China.
- JiaLong Chen
- School of Computer Science and Engineering, Sun Yat-sen University, Guangzhou, China.
- Jing Bian
- School of Computer Science and Engineering, Sun Yat-sen University, Guangzhou, China.
- ZiBin Zheng
- School of Software Engineering, Sun Yat-sen University, Zhuhai, China.
- Chuan Chen
- School of Computer Science and Engineering, Sun Yat-sen University, Guangzhou, China.
14
Yang Z, Wang L, Zhang X, Zeng B, Zhang Z, Liu X. LCASPMDA: a computational model for predicting potential microbe-drug associations based on learnable graph convolutional attention networks and self-paced iterative sampling ensemble. Front Microbiol 2024; 15:1366272. [PMID: 38846568 PMCID: PMC11153849 DOI: 10.3389/fmicb.2024.1366272] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/06/2024] [Accepted: 05/06/2024] [Indexed: 06/09/2024] [Open access]
Abstract
Introduction Numerous studies show that microbes in the human body are closely linked to the human host and can affect it by modulating the efficacy and toxicity of drugs. However, discovering potential microbe-drug associations through traditional wet-lab experiments is expensive and time-consuming; hence, effective computational models are needed to detect possible microbe-drug associations. Methods In this manuscript, we proposed a new prediction model named LCASPMDA, which combines a learnable graph convolutional attention network with a self-paced iterative sampling ensemble strategy to infer latent microbe-drug associations. In LCASPMDA, we first constructed a heterogeneous network based on newly downloaded known microbe-drug associations. Then, we adopted the learnable graph convolutional attention network to learn the hidden features of nodes in the heterogeneous network. After that, we used the self-paced iterative sampling ensemble strategy to select the most informative negative samples for training a Multi-Layer Perceptron (MLP) classifier, and fed the newly extracted hidden features into the trained MLP classifier to infer possible microbe-drug associations. Results and discussion Extensive experiments on two public databases, MDAD and aBiofilm, showed that LCASPMDA achieved better performance than state-of-the-art baseline methods in microbe-drug association prediction.
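The abstract does not give the sampling rule itself. As a hedged illustration of the general self-paced idea only (not the LCASPMDA algorithm, whose ensemble and MLP details are omitted), the sketch below assumes each unlabeled microbe-drug pair already has a hardness value, taken here to be the current classifier's predicted association score, and admits pairs from easy to hard as a pace threshold grows. The function name, the linear pace schedule, and the default thresholds are all assumptions.

```python
def self_paced_negatives(hardness, n_select, iteration, n_iterations,
                         lam_start=0.3, lam_end=1.0):
    """Choose negative-sample indices for one training round (hypothetical helper).

    hardness[i] is the current model's predicted association score for the
    unlabeled pair i: near 0 means clearly negative (easy), near 1 means the
    model confuses it with a positive (hard). Only pairs with hardness <= lam
    are eligible, and lam rises linearly from lam_start to lam_end, so training
    moves from easy to hard negatives. Among the eligible pairs, the hardest
    n_select are returned as the most informative ones.
    """
    if n_iterations <= 1:
        lam = lam_end
    else:
        lam = lam_start + (lam_end - lam_start) * iteration / (n_iterations - 1)
    eligible = [i for i, h in enumerate(hardness) if h <= lam]
    eligible.sort(key=lambda i: hardness[i], reverse=True)
    return eligible[:n_select]


scores = [0.05, 0.2, 0.5, 0.9, 0.1]  # hypothetical model scores for 5 unlabeled pairs
for it in range(3):
    print(it, self_paced_negatives(scores, n_select=2, iteration=it, n_iterations=3))
```

Early rounds see only confidently negative pairs; later rounds admit the pairs the model currently mistakes for positives, which are the most informative negatives once the classifier is reasonably trained.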
Affiliation(s)
- Lei Wang
- Big Data Innovation and Entrepreneurship Education Center of Hunan Province, Changsha University, Changsha, China
- Zhen Zhang
- Big Data Innovation and Entrepreneurship Education Center of Hunan Province, Changsha University, Changsha, China
- Xin Liu
- Big Data Innovation and Entrepreneurship Education Center of Hunan Province, Changsha University, Changsha, China
15
Thakur GK, Thakur A, Kulkarni S, Khan N, Khan S. Deep Learning Approaches for Medical Image Analysis and Diagnosis. Cureus 2024; 16:e59507. [PMID: 38826977 PMCID: PMC11144045 DOI: 10.7759/cureus.59507] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/29/2024] [Accepted: 05/01/2024] [Indexed: 06/04/2024] [Open access]
Abstract
In addition to enhancing diagnostic accuracy, deep learning techniques offer the potential to streamline workflows, reduce interpretation time, and ultimately improve patient outcomes. The scalability and adaptability of deep learning algorithms enable their deployment across diverse clinical settings, ranging from radiology departments to point-of-care facilities. Furthermore, ongoing research efforts focus on addressing the challenges of data heterogeneity, model interpretability, and regulatory compliance, paving the way for seamless integration of deep learning solutions into routine clinical practice. As the field continues to evolve, collaborations between clinicians, data scientists, and industry stakeholders will be paramount in harnessing the full potential of deep learning for advancing medical image analysis and diagnosis. Integrating deep learning algorithms with other technologies, such as natural language processing and computer vision, may also foster multimodal medical data analysis and clinical decision support systems that improve patient care. The future of deep learning in medical image analysis and diagnosis is promising: with each advance, the technology moves closer to routine medical use. Beyond medical image analysis, patient care pathways such as multimodal imaging, imaging genomics, and intelligent operating rooms or intensive care units can benefit from deep learning models.
Affiliation(s)
- Gopal Kumar Thakur
- Department of Data Sciences, Harrisburg University of Science and Technology, Harrisburg, USA
- Abhishek Thakur
- Department of Data Sciences, Harrisburg University of Science and Technology, Harrisburg, USA
- Shridhar Kulkarni
- Department of Data Sciences, Harrisburg University of Science and Technology, Harrisburg, USA
- Naseebia Khan
- Department of Data Sciences, Harrisburg University of Science and Technology, Harrisburg, USA
- Shahnawaz Khan
- Department of Computer Application, Bundelkhand University, Jhansi, India
16
Liu X, Hu J, Zheng J. SL-Miner: a web server for mining evidence and prioritization of cancer-specific synthetic lethality. Bioinformatics 2024; 40:btae016. [PMID: 38244572 PMCID: PMC10868331 DOI: 10.1093/bioinformatics/btae016] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/18/2023] [Revised: 12/10/2023] [Accepted: 01/16/2024] [Indexed: 01/22/2024] [Open access]
Abstract
SUMMARY Synthetic lethality (SL) refers to a type of genetic interaction in which the simultaneous inactivation of two genes leads to cell death, while the inactivation of either gene alone does not affect cell viability. It significantly expands the range of potential therapeutic targets for anti-cancer treatments. SL interactions are primarily identified through experimental screening and computational prediction. Although various computational methods have been proposed, they rarely provide evidence to support their SL predictions. Moreover, they are seldom user-friendly for biologists, who may have limited programming skills. In addition, the genetic context specificity of SL interactions is often not taken into consideration. Here, we introduce a web server called SL-Miner, which is designed to mine the evidence of SL relationships between a primary gene and a few candidate SL partner genes in a specific type of cancer, and to prioritize these candidate genes by integrating various types of evidence. For intuitive data visualization, SL-Miner provides a range of charts (e.g. volcano plots and box plots) to help users gain insights from the data. AVAILABILITY AND IMPLEMENTATION SL-Miner is available at https://slminer.sist.shanghaitech.edu.cn.
Affiliation(s)
- Xin Liu
- School of Information Science and Technology, ShanghaiTech University, Shanghai 201210, China
- Jieni Hu
- School of Life Science and Technology, ShanghaiTech University, Shanghai 201210, China
- Jie Zheng
- School of Information Science and Technology, ShanghaiTech University, Shanghai 201210, China
- Shanghai Engineering Research Center of Intelligent Vision and Imaging, Shanghai 201210, China
17
Son J, Kim D. Applying network link prediction in drug discovery: an overview of the literature. Expert Opin Drug Discov 2024; 19:43-56. [PMID: 37794688 DOI: 10.1080/17460441.2023.2267020] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/08/2023] [Accepted: 10/02/2023] [Indexed: 10/06/2023]
Abstract
INTRODUCTION Network representation can give a holistic view of relationships among biomedical entities through network topology. Link prediction estimates the probability that a link will form between a pair of unconnected nodes. In the drug discovery process, link prediction not only enables the detection of connectivity patterns but also predicts the effects of one biomedical entity on multiple entities simultaneously and vice versa, which is useful for many applications. AREAS COVERED The authors provide a comprehensive overview of network link prediction in drug discovery. Link prediction methodologies such as similarity-based approaches, embedding-based approaches, probabilistic model-based approaches, and preprocessing methods are summarized with examples. In addition to describing their properties and limitations, the authors discuss the applications of link prediction in drug discovery based on the relationships between biomedical concepts. EXPERT OPINION Link prediction is a powerful method for inferring novel relationships in drug discovery. However, it has been hampered by data sparsity and the lack of negative links in biomedical networks. With preprocessing to balance positive and negative samples and the collection of more data, the authors believe it is possible to develop more reliable link prediction methods that can become invaluable tools for successful drug discovery.
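As a concrete example of the similarity-based family mentioned above, three classic neighborhood scores (common neighbors, the Jaccard coefficient, and Adamic-Adar) can be computed directly from an adjacency structure. The sketch assumes an undirected graph without self-loops, stored as a dict of neighbor sets; the toy graph is hypothetical and not drawn from the review.

```python
import math


def common_neighbors(adj, u, v):
    """Number of nodes linked to both u and v."""
    return len(adj[u] & adj[v])


def jaccard(adj, u, v):
    """Shared neighbors as a fraction of all neighbors of u or v."""
    union = adj[u] | adj[v]
    return len(adj[u] & adj[v]) / len(union) if union else 0.0


def adamic_adar(adj, u, v):
    """Sum of 1 / log(degree) over shared neighbors; low-degree ones count more than hubs."""
    # For distinct u and v, a shared neighbor has degree >= 2, so log(degree) > 0.
    return sum(1.0 / math.log(len(adj[w])) for w in adj[u] & adj[v])


# Hypothetical undirected toy network; A and B are not yet linked.
adj = {
    "A": {"C", "D"},
    "B": {"C", "D", "E"},
    "C": {"A", "B"},
    "D": {"A", "B", "E"},
    "E": {"B", "D"},
}
print(common_neighbors(adj, "A", "B"), jaccard(adj, "A", "B"), adamic_adar(adj, "A", "B"))
```

Higher scores suggest a more likely missing link. Note that in a strictly bipartite network (for example drugs and targets only), a drug and a target never share a neighbor, so these particular scores are zero for every drug-target pair.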
Affiliation(s)
- Jeongtae Son
- Department of Bio and Brain Engineering, Korea Advanced Institute of Science and Technology, Daejeon, Republic of Korea
- Dongsup Kim
- Department of Bio and Brain Engineering, Korea Advanced Institute of Science and Technology, Daejeon, Republic of Korea
18
Wang Y, Li Z, Rao J, Yang Y, Dai Z. Gene based message passing for drug repurposing. iScience 2023; 26:107663. [PMID: 37670781 PMCID: PMC10475505 DOI: 10.1016/j.isci.2023.107663] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Received: 07/24/2023] [Revised: 08/06/2023] [Accepted: 08/14/2023] [Indexed: 09/07/2023] [Open access]
Abstract
The medicinal effect of a drug acts through a series of genes, and the pathological mechanism of a disease is also related to genes with certain biological functions. However, traditional message passing methods neglect the complex information linking a drug or disease to a series of genes. In this study, we proposed a new framework that uses two different strategies for the gene-drug/disease and drug-disease networks, respectively. We employ a long short-term memory (LSTM) network to extract the flow of messages from a series of genes (a gene path) to a drug or disease. Incorporating the resulting gene-path information into the drug-disease network, we use a graph convolutional network (GCN) to predict drug-disease associations. Experimental results showed that our method, GeneDR (gene-based drug repurposing), makes better use of the information in gene paths and performs better in predicting drug-disease associations.
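The GCN step can be illustrated with the standard graph convolution propagation rule, H' = ReLU(D^-1/2 (A + I) D^-1/2 H W), where A is the adjacency matrix, D the degree matrix of A + I, H the node features, and W a weight matrix. The sketch below is a single-layer, plain-Python illustration with hypothetical inputs; it does not reproduce GeneDR's LSTM gene-path features or its specific architecture.

```python
import math


def gcn_layer(adj, features, weight):
    """One graph convolution: ReLU(D^-1/2 (A + I) D^-1/2 H W) on dense lists.

    adj: n x n symmetric 0/1 matrix, features (H): n x f, weight (W): f x c.
    """
    n = len(adj)
    # Add self-loops so each node keeps its own features.
    a_hat = [[adj[i][j] + (1.0 if i == j else 0.0) for j in range(n)] for i in range(n)]
    deg = [sum(row) for row in a_hat]
    # Symmetric degree normalization.
    norm = [[a_hat[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)] for i in range(n)]
    # Linear transform H W.
    hw = [[sum(features[i][k] * weight[k][c] for k in range(len(weight)))
           for c in range(len(weight[0]))] for i in range(n)]
    # Aggregate normalized neighbor messages, then apply ReLU.
    return [[max(0.0, sum(norm[i][j] * hw[j][c] for j in range(n)))
             for c in range(len(hw[0]))] for i in range(n)]


# Hypothetical 3-node path graph (e.g. drug - disease - drug)
adj = [[0, 1, 0], [1, 0, 1], [0, 1, 0]]
features = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
weight = [[1.0], [1.0]]
print(gcn_layer(adj, features, weight))
```

Stacking such layers lets information travel one hop further per layer; in a framework like the one described, the node features H would carry the gene-path information.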
Affiliation(s)
- Yuxing Wang
- School of Computer Science and Engineering, Sun Yat-sen University, Guangzhou 510000, China
- Zhiyang Li
- School of Computer Science and Engineering, Sun Yat-sen University, Guangzhou 510000, China
- Jiahua Rao
- School of Computer Science and Engineering, Sun Yat-sen University, Guangzhou 510000, China
- Yuedong Yang
- School of Computer Science and Engineering, Sun Yat-sen University, Guangzhou 510000, China
- Zhiming Dai
- School of Computer Science and Engineering, Sun Yat-sen University, Guangzhou 510000, China
19
Che L, Jin Y, Shi Y, Yu X, Sun H, Liu H, Li X. A drug molecular classification model based on graph structure generation. J Biomed Inform 2023; 145:104447. [PMID: 37481052 DOI: 10.1016/j.jbi.2023.104447] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/20/2023] [Revised: 07/14/2023] [Accepted: 07/16/2023] [Indexed: 07/24/2023]
Abstract
Molecular property prediction based on artificial intelligence has significant prospects for speeding up drug discovery and reducing its costs. Among such approaches, molecular property prediction based on graph neural networks (GNNs) has received extensive attention in recent years. However, existing graph neural networks still face the following challenges in node representation learning. First, the number of nodes increases exponentially as the receptive field expands, which limits the model's ability to explore in the depth direction. Second, the large number of nodes in the receptive field introduces noise, which hinders the model's learning of representations for key structures. Therefore, this paper proposes a graph neural network model based on structure generation. The model adopts a depth-first strategy to generate the key structures of the graph, addressing the insufficient exploration ability of graph neural networks in the depth direction. A tendentious node selection method is designed to gradually select nodes and edges that form the key structures of the graph, addressing the noise caused by an excessive number of nodes. In addition, the model realizes forward propagation and iterative optimization of structure generation by using an attention mechanism and random bias. Experimental results on public datasets show that the proposed model achieves better classification results than the existing best models.
Affiliation(s)
- Lixuan Che
- College of Culture and Creativity, Weifang Vocational College, Weifang, China.
- Yide Jin
- Department of Statistics, University of Minnesota, Minneapolis, MN, USA.
- Yuliang Shi
- School of Software, Shandong University, Jinan, China; Dareway Software Co., Ltd, Jinan, China.
- Xiaojing Yu
- Department of Dermatology, Qilu Hospital, Shandong University, Jinan, China.
- Hongfeng Sun
- School of Data and Computer Science, Shandong Women's University, Jinan, China.
- Hui Liu
- School of Data and Computer Science, Shandong Women's University, Jinan, China.
- Xinyu Li
- Department of Dermatology, Qilu Hospital, Shandong University, Jinan, China.
20
Du BX, Long Y, Li X, Wu M, Shi JY. CMMS-GCL: cross-modality metabolic stability prediction with graph contrastive learning. Bioinformatics 2023; 39:btad503. [PMID: 37572298 PMCID: PMC10457661 DOI: 10.1093/bioinformatics/btad503] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Received: 04/10/2023] [Revised: 07/26/2023] [Accepted: 08/11/2023] [Indexed: 08/14/2023] [Open access]
Abstract
MOTIVATION Metabolic stability plays a crucial role in the early stages of drug discovery and development. Accurately modeling and predicting molecular metabolic stability has great potential for the efficient screening of drug candidates as well as the optimization of lead compounds. Because wet-lab experiments are time-consuming, laborious, and expensive, in silico prediction of metabolic stability is an attractive alternative. However, few computational methods have been developed for this task. In addition, explaining the key functional groups that determine metabolic stability remains a significant challenge. RESULTS To address these issues, we develop a novel cross-modality graph contrastive learning model named CMMS-GCL for predicting the metabolic stability of drug candidates. In our framework, we design deep learning methods to extract molecular features from two data modalities, i.e. SMILES sequences and molecular graphs. In particular, for the sequence data, we design a multihead attention BiGRU-based encoder that preserves the context of symbols to learn sequence representations of molecules. For the graph data, we propose a graph contrastive learning-based encoder that learns structure representations by effectively capturing the consistencies between local and global structures. We further use fully connected neural networks to combine the sequence and structure representations for model training. Extensive experimental results on two datasets demonstrate that CMMS-GCL consistently outperforms seven state-of-the-art methods. Furthermore, a collection of case studies on sequence data and statistical analyses of the graph structure module strengthen the evidence for the interpretability of the crucial functional groups recognized by CMMS-GCL.
Overall, CMMS-GCL can serve as an effective and interpretable tool for predicting metabolic stability, identifying critical functional groups, and thus facilitating the drug discovery process and lead compound optimization. AVAILABILITY AND IMPLEMENTATION The code and data underlying this article are freely available at https://github.com/dubingxue/CMMS-GCL.
Affiliation(s)
- Bing-Xue Du
- School of Life Sciences, Northwestern Polytechnical University, Xi’an 710072, China
- Institute for Infocomm Research (I²R), Agency for Science, Technology and Research (A*STAR), Singapore 138632, Singapore
- Yahui Long
- Singapore Immunology Network (SIgN), Agency for Science, Technology and Research (A*STAR), Singapore 138648, Singapore
- Xiaoli Li
- Institute for Infocomm Research (I²R), Agency for Science, Technology and Research (A*STAR), Singapore 138632, Singapore
- Min Wu
- Institute for Infocomm Research (I²R), Agency for Science, Technology and Research (A*STAR), Singapore 138632, Singapore
- Jian-Yu Shi
- School of Life Sciences, Northwestern Polytechnical University, Xi’an 710072, China
21
Zhang P, Wang Z, Sun W, Xu J, Zhang W, Wu K, Wong L, Li L. RDRGSE: A Framework for Noncoding RNA-Drug Resistance Discovery by Incorporating Graph Skeleton Extraction and Attentional Feature Fusion. ACS Omega 2023; 8:27386-27397. [PMID: 37546619 PMCID: PMC10398708 DOI: 10.1021/acsomega.3c02763] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/22/2023] [Accepted: 07/06/2023] [Indexed: 08/08/2023]
Abstract
Computationally identifying noncoding RNA (ncRNA)-drug resistance associations would have a marked effect on understanding ncRNA molecular function and drug target mechanisms, and on reducing the cost of the corresponding wet-lab screening experiments. Although graph neural network-based methods have been developed and have facilitated the detection of ncRNAs related to drug resistance, building a highly trustworthy ncRNA-drug resistance association prediction framework remains a challenge because of inevitable noisy edges originating from batch effects and experimental errors. Herein, we proposed a framework, referred to as RDRGSE (RDR association prediction by using graph skeleton extraction and attentional feature fusion), for detecting ncRNA-drug resistance associations. Specifically, starting from the original ncRNA-drug resistance associations constructed as a bipartite graph, RDRGSE took advantage of a bi-view skeleton extraction strategy to obtain two types of skeleton views, followed by a graph neural network-based estimator that iteratively optimizes the skeleton views, aiming to jointly learn high-quality ncRNA-drug resistance edge embeddings and the optimal graph skeleton structure. Then, RDRGSE adopted adaptive attentional feature fusion to obtain the final edge embeddings and identified potential RDR associations (RDRAs) in an end-to-end manner. Comprehensive experiments were conducted, and the results indicated the significant advantage of a skeleton structure for ncRNA-drug resistance association discovery. Compared with state-of-the-art approaches, RDRGSE improved prediction performance by 6.7% in terms of AUC and 6.1% in terms of AUPR. Ablation-like analysis and independent case studies also corroborated the generalization ability and robustness of RDRGSE. Overall, RDRGSE provides a powerful computational method for ncRNA-drug resistance association prediction and can also serve as a screening tool for drug resistance biomarkers.
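The abstract does not specify how the adaptive attentional fusion is computed. One generic form, sketched below as an assumption rather than RDRGSE's actual module, scores each view's edge embedding against a query vector (learned in practice, fixed by hand here), normalizes the scores with a softmax, and returns the weighted sum of the views. The function name and inputs are hypothetical.

```python
import math


def attention_fuse(views, query):
    """Softmax-weighted sum of several embeddings of the same edge.

    views: list of equal-length vectors (one per view); query: scoring vector.
    Returns (fused_vector, attention_weights).
    """
    scores = [sum(q * x for q, x in zip(query, z)) for z in views]
    top = max(scores)  # subtract the max for a numerically stable softmax
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    fused = [sum(w * z[d] for w, z in zip(weights, views)) for d in range(len(views[0]))]
    return fused, weights


# Two hypothetical skeleton-view embeddings of one ncRNA-drug resistance edge
view_1, view_2 = [1.0, 0.0], [0.0, 1.0]
print(attention_fuse([view_1, view_2], query=[2.0, 0.0]))
```

Because the weights are computed per edge, an edge whose embedding is more reliable in one skeleton view can lean on that view, which is the usual motivation for adaptive rather than fixed-weight fusion.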
Affiliation(s)
- Ping Zhang
- Hubei Key Laboratory of Agricultural Bioinformatics, College of Informatics, Huazhong Agricultural University, Wuhan 430070, China
- Zilin Wang
- Hubei Key Laboratory of Agricultural Bioinformatics, College of Informatics, Huazhong Agricultural University, Wuhan 430070, China
- Weicheng Sun
- Hubei Key Laboratory of Agricultural Bioinformatics, College of Informatics, Huazhong Agricultural University, Wuhan 430070, China
- Jinsheng Xu
- Hubei Key Laboratory of Agricultural Bioinformatics, College of Informatics, Huazhong Agricultural University, Wuhan 430070, China
- Weihan Zhang
- Hubei Key Laboratory of Agricultural Bioinformatics, College of Informatics, Huazhong Agricultural University, Wuhan 430070, China
- Kun Wu
- Department of Biochemistry, University of California Riverside, Riverside, California 92521, United States
- Leon Wong
- Guangxi Key Lab of Human-Machine Interaction and Intelligent Decision, Guangxi Academy of Sciences, Nanning 530007, China
- Institute of Machine Learning and Systems Biology, School of Electronics and Information Engineering, Tongji University, Shanghai 200092, China
- Li Li
- Hubei Key Laboratory of Agricultural Bioinformatics, College of Informatics, Huazhong Agricultural University, Wuhan 430070, China
- Hubei Hongshan Laboratory, Huazhong Agricultural University, Wuhan 430070, China
22
Wang C, Yuan C, Wang Y, Chen R, Shi Y, Zhang T, Xue F, Patti GJ, Wei L, Hou Q. MPI-VGAE: protein-metabolite enzymatic reaction link learning by variational graph autoencoders. Brief Bioinform 2023; 24:bbad189. [PMID: 37225420 PMCID: PMC10359079 DOI: 10.1093/bib/bbad189] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/20/2023] [Revised: 04/10/2023] [Accepted: 04/27/2023] [Indexed: 05/26/2023] [Open access]
Abstract
Enzymatic reactions are crucial for exploring the mechanistic function of metabolites and proteins in cellular processes and for understanding the etiology of diseases. The growing number of interconnected metabolic reactions enables the development of in silico deep learning-based methods to discover new enzymatic reaction links between metabolites and proteins, further expanding the landscape of the existing metabolite-protein interactome. Computational approaches that predict enzymatic reaction links through metabolite-protein interaction (MPI) prediction are still very limited. In this study, we developed a Variational Graph Autoencoder (VGAE)-based framework to predict MPIs in genome-scale heterogeneous enzymatic reaction networks across ten organisms. By incorporating molecular features of metabolites and proteins as well as neighboring information in the MPI networks, our MPI-VGAE predictor achieved the best predictive performance compared with other machine learning methods. Moreover, when the MPI-VGAE framework was applied to reconstruct hundreds of metabolic pathways, functional enzymatic reaction networks, and a metabolite-metabolite interaction network, our method showed the most robust performance among all scenarios. To the best of our knowledge, this is the first VGAE-based MPI predictor for enzymatic reaction link prediction. Furthermore, we applied the MPI-VGAE framework to reconstruct disease-specific MPI networks based on the disrupted metabolites and proteins in Alzheimer's disease and colorectal cancer, respectively. A substantial number of novel enzymatic reaction links were identified. We further validated and explored the interactions of these enzymatic reactions using molecular docking. These results highlight the potential of the MPI-VGAE framework for discovering novel disease-related enzymatic reactions and for facilitating the study of disrupted metabolism in diseases.
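Two standard ingredients of a variational graph autoencoder are the reparameterized sample of each node's latent vector and the inner-product decoder that turns a pair of latent vectors into a link probability. The sketch below shows only these two steps in plain Python, assuming the encoder (typically a GCN producing a mean and a log standard deviation per node) has already run; the input values are hypothetical, and this is not the MPI-VGAE implementation.

```python
import math
import random


def reparameterize(mu, log_sigma, rng):
    """Draw z = mu + sigma * eps with eps ~ N(0, 1) (the reparameterization trick)."""
    return [m + math.exp(ls) * rng.gauss(0.0, 1.0) for m, ls in zip(mu, log_sigma)]


def link_probability(z_i, z_j):
    """Inner-product decoder: p(edge i-j) = sigmoid(z_i . z_j), computed stably."""
    dot = sum(a * b for a, b in zip(z_i, z_j))
    if dot >= 0:
        return 1.0 / (1.0 + math.exp(-dot))
    e = math.exp(dot)
    return e / (1.0 + e)


rng = random.Random(0)
# Hypothetical encoder outputs for one metabolite node and one protein node
z_metabolite = reparameterize([0.8, -0.2], [-2.0, -2.0], rng)
z_protein = reparameterize([0.9, -0.1], [-2.0, -2.0], rng)
print(link_probability(z_metabolite, z_protein))
```

Writing the sample as mu plus scaled noise keeps it differentiable with respect to mu and sigma, which is what allows the encoder to be trained by gradient descent.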
Affiliation(s)
- Cheng Wang
- Department of Biostatistics, School of Public Health, Cheeloo College of Medicine, Shandong University, Jinan, 250012, China
- National Institute of Health Data Science of China, Shandong University, Jinan, 250000, China
- Chuang Yuan
- Department of Biostatistics, School of Public Health, Cheeloo College of Medicine, Shandong University, Jinan, 250012, China
- National Institute of Health Data Science of China, Shandong University, Jinan, 250000, China
- Yahui Wang
- Department of Chemistry, Washington University in St. Louis, St. Louis, MO, 63130, USA
- Center for Metabolomics and Isotope Tracing, Washington University in St. Louis, St. Louis, MO, 63130, USA
- Ranran Chen
- Department of Biostatistics, School of Public Health, Cheeloo College of Medicine, Shandong University, Jinan, 250012, China
- National Institute of Health Data Science of China, Shandong University, Jinan, 250000, China
- Yuying Shi
- Department of Biostatistics, School of Public Health, Cheeloo College of Medicine, Shandong University, Jinan, 250012, China
- National Institute of Health Data Science of China, Shandong University, Jinan, 250000, China
- Tao Zhang
- Department of Biostatistics, School of Public Health, Cheeloo College of Medicine, Shandong University, Jinan, 250012, China
- National Institute of Health Data Science of China, Shandong University, Jinan, 250000, China
- Fuzhong Xue
- Department of Biostatistics, School of Public Health, Cheeloo College of Medicine, Shandong University, Jinan, 250012, China
- National Institute of Health Data Science of China, Shandong University, Jinan, 250000, China
- Gary J Patti
- Department of Chemistry, Washington University in St. Louis, St. Louis, MO, 63130, USA
- Department of Medicine, Washington University in St. Louis, St. Louis, MO, 63130, USA
- Siteman Cancer Center, Washington University in St. Louis, St. Louis, MO, 63130, USA
- Center for Metabolomics and Isotope Tracing, Washington University in St. Louis, St. Louis, MO, 63130, USA
- Leyi Wei
- School of Software, Shandong University, Jinan, 250100, China
- Qingzhen Hou
- Department of Biostatistics, School of Public Health, Cheeloo College of Medicine, Shandong University, Jinan, 250012, China
- National Institute of Health Data Science of China, Shandong University, Jinan, 250000, China
23
Mangione W, Falls Z, Samudrala R. Effective holistic characterization of small molecule effects using heterogeneous biological networks. Front Pharmacol 2023; 14:1113007. [PMID: 37180722 PMCID: PMC10169664 DOI: 10.3389/fphar.2023.1113007] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/30/2022] [Accepted: 04/11/2023] [Indexed: 05/16/2023] [Open access]
Abstract
The two most common reasons for attrition in therapeutic clinical trials are lack of efficacy and safety concerns. We integrated heterogeneous data to create a human interactome network that comprehensively describes drug behavior in biological systems, with the goal of accurate therapeutic candidate generation. The Computational Analysis of Novel Drug Opportunities (CANDO) platform for shotgun multiscale therapeutic discovery, repurposing, and design was enhanced by integrating drug side effects, protein pathways, protein-protein interactions, protein-disease associations, and the Gene Ontology, complementing its existing drug/compound, protein, and indication libraries. These integrated networks were reduced to a "multiscale interactomic signature" for each compound that describes its functional behavior as a vector of real values. These signatures are then used to relate compounds to each other, under the hypothesis that similar signatures yield similar behavior. Our results indicated that significant biological information is captured within our networks (particularly via side effects), which enhances the performance of our platform, as evaluated by all-against-all leave-one-out drug-indication association benchmarking and by generating novel drug candidates for colon cancer and migraine disorders that were corroborated via literature search. Further, drug impacts on pathways, derived from computed compound-protein interaction scores, served as the features for a random forest machine learning model trained to predict drug-indication associations, with applications to mental disorders and cancer metastasis highlighted. This interactomic pipeline highlights the ability of CANDO to accurately relate drugs in a multitarget and multiscale context, particularly for generating putative drug candidates using information gleaned from indirect data such as side effect profiles and protein pathway information.
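The signature-comparison idea (similar signatures imply similar behavior) amounts to ranking compounds by the similarity of their real-valued vectors. The sketch below assumes cosine similarity, which the abstract does not state is CANDO's metric, and uses made-up compound names and signature values; the helper names are hypothetical.

```python
import math


def cosine(a, b):
    """Cosine similarity between two real-valued signature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def most_similar(query, signatures, top_k=3):
    """Rank the other compounds by signature similarity to the query compound."""
    ranked = sorted(
        ((name, cosine(signatures[query], sig))
         for name, sig in signatures.items() if name != query),
        key=lambda pair: pair[1],
        reverse=True,
    )
    return ranked[:top_k]


# Hypothetical signatures (e.g. interaction scores against three proteins)
signatures = {
    "cmpd_a": [0.9, 0.1, 0.8],
    "cmpd_b": [0.8, 0.2, 0.9],
    "cmpd_c": [0.0, 0.9, 0.1],
}
print(most_similar("cmpd_a", signatures, top_k=2))
```

In a leave-one-out benchmark of this kind, a compound's known indication is hidden and counted as recovered if one of its top-ranked neighbors shares that indication.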
Affiliation(s)
- Ram Samudrala
- Jacobs School of Medicine and Biomedical Sciences, Department of Biomedical Informatics, University at Buffalo, Buffalo, NY, United States
24
Tian Z, Yu Y, Fang H, Xie W, Guo M. Predicting microbe-drug associations with structure-enhanced contrastive learning and self-paced negative sampling strategy. Brief Bioinform 2023; 24:7009077. [PMID: 36715986 DOI: 10.1093/bib/bbac634] [Citation(s) in RCA: 18] [Impact Index Per Article: 9.0] [Received: 11/08/2022] [Revised: 12/19/2022] [Accepted: 12/29/2022] [Indexed: 01/31/2023] [Open access]
Abstract
MOTIVATION Predicting associations between human microbes and drugs (MDAs) is a critical step in drug development and precision medicine. Because discovering these associations through wet-lab experiments is time-consuming and labor-intensive, computational methods have become an effective way to tackle this problem. Recently, graph contrastive learning (GCL) approaches have shown great advantages in learning node embeddings from heterogeneous biological graphs (HBGs). However, most GCL-based approaches do not fully capture the rich structural information in HBGs. Moreover, few MDA prediction methods can screen out the most informative negative samples for effectively training the classifier. The accuracy of MDA prediction therefore still needs to be improved. RESULTS In this study, we propose a novel approach that employs Structure-enhanced Contrastive learning and a Self-paced negative sampling strategy for Microbe-Drug Association prediction (SCSMDA). First, SCSMDA constructs the similarity networks of microbes and drugs, as well as their different meta-path-induced networks. SCSMDA then uses the representations of microbes and drugs learned from the meta-path-induced networks to enhance their embeddings learned from the similarity networks via a contrastive learning strategy, yielding discriminative representations. Next, we adopt the self-paced negative sampling strategy to select the most informative negative samples for training an MLP classifier. Finally, SCSMDA predicts potential microbe-drug associations with the trained MLP classifier. Extensive results on three public datasets indicate that SCSMDA significantly outperforms other baseline methods on the MDA prediction task. Case studies of two common drugs further demonstrate the effectiveness of SCSMDA in finding novel MDAs. AVAILABILITY The source code is publicly available on GitHub https://github.com/Yue-Yuu/SCSMDA-master.
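The abstract does not spell out how the self-paced negative sampling works. As a hedged illustration of the general idea only, not SCSMDA's actual procedure, the sketch below widens an easy-to-hard admission threshold over training epochs and, among the currently admissible unlabeled pairs, keeps the highest-scoring (hardest) ones as negatives. The linear pace schedule, the `start`/`end` thresholds, and the function name `self_paced_negatives` are all assumptions.

```python
def self_paced_negatives(candidates, scores, n, epoch, total_epochs,
                         start=0.2, end=0.9):
    """Select up to n negative pairs for the current epoch (hypothetical scheme).

    candidates: unlabeled (microbe, drug) pairs
    scores: the classifier's current predicted association probability per pair
    A pair is admissible if its score is at most a threshold that grows
    linearly from `start` to `end` (an assumed pace schedule), so clearly
    negative (easy) pairs are used early and harder ones are admitted later.
    """
    if len(candidates) != len(scores):
        raise ValueError("candidates and scores must have the same length")
    progress = epoch / max(total_epochs - 1, 1)
    threshold = start + (end - start) * progress
    admissible = [(s, c) for c, s in zip(candidates, scores) if s <= threshold]
    # Within the admissible set, prefer the hardest pairs (highest score),
    # treating them as the most informative negatives.
    admissible.sort(key=lambda pair: pair[0], reverse=True)
    return [c for _, c in admissible[:n]]

# Hypothetical unlabeled pairs and current model scores.
pairs = [("m1", "d1"), ("m1", "d2"), ("m2", "d1"), ("m2", "d2")]
scores = [0.05, 0.50, 0.30, 0.95]
print(self_paced_negatives(pairs, scores, n=2, epoch=0, total_epochs=10))
# [('m1', 'd1')]
print(self_paced_negatives(pairs, scores, n=2, epoch=9, total_epochs=10))
# [('m1', 'd2'), ('m2', 'd1')]
```

The pair scored 0.95 is never admitted, which reflects the design intent of keeping likely-positive (possibly unlabeled true) associations out of the negative set.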
Affiliation(s)
- Zhen Tian
- School of Computer and Artificial Intelligence, Zhengzhou University, Zhengzhou 450000, China
- Yue Yu
- School of Computer and Artificial Intelligence, Zhengzhou University, Zhengzhou 450000, China
- Haichuan Fang
- School of Computer and Artificial Intelligence, Zhengzhou University, Zhengzhou 450000, China
- Weixin Xie
- Institute of Intelligent System and Bioinformatics, College of Intelligent Systems Science and Engineering, Harbin Engineering University, Harbin, 150000, China
- Maozu Guo
- School of Electrical and Information Engineering, Beijing University of Civil Engineering and Architecture, 100044, Beijing, China
25
Wang C, Yuan C, Wang Y, Chen R, Shi Y, Patti GJ, Hou Q. Genome-scale enzymatic reaction prediction by variational graph autoencoders. bioRxiv 2023:2023.03.08.531729. [PMID: 36945484 PMCID: PMC10028866 DOI: 10.1101/2023.03.08.531729] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 03/14/2023]
Abstract
Background Enzymatic reaction networks are crucial for exploring the mechanistic function of metabolites and proteins in biological systems, for understanding the etiology of diseases, and for identifying potential targets for drug discovery. The increasing number of known metabolic reactions enables the development of deep learning-based methods to discover new enzymatic reactions, which will expand the landscape of existing enzymatic reaction networks for investigating disrupted metabolism in diseases. Results In this study, we propose the MPI-VGAE framework to predict metabolite-protein interactions (MPI) in a genome-scale heterogeneous enzymatic reaction network across ten organisms with thousands of enzymatic reactions. We improved the Variational Graph Autoencoders (VGAE) model to incorporate both the molecular features of metabolites and proteins and their neighboring features, achieving the best predictive performance for MPI. The MPI-VGAE framework showed robust performance in reconstructing hundreds of metabolic pathways and five functional enzymatic reaction networks. The MPI-VGAE framework was also applied to a homogeneous metabolic reaction network and achieved performance as high as other state-of-the-art methods. Furthermore, the MPI-VGAE framework can be applied to reconstruct disease-specific MPI networks based on hundreds of disrupted metabolites and proteins in Alzheimer's disease and colorectal cancer, respectively. A substantial number of new potential enzymatic reactions were predicted and validated by molecular docking. These results highlight the potential of the MPI-VGAE framework for discovering novel disease-related enzymatic reactions and drug targets in real-world applications. Data availability and implementation The MPI-VGAE framework and datasets are publicly accessible on GitHub https://github.com/mmetalab/mpi-vgae.
Author Biographies Cheng Wang received his Ph.D. in Chemistry from The Ohio State University, USA. He is currently an Assistant Professor in the School of Public Health at Shandong University, China. His research interests include bioinformatics and machine learning-based approaches with applications to biomedical networks. Chuang Yuan is a research assistant at Shandong University. He obtained an MS degree in Biology at the University of Science and Technology of China. His research interests include biochemistry and molecular biology, cell biology, biomedicine, bioinformatics, and computational biology. Yahui Wang is a PhD student in the Department of Chemistry at Washington University in St. Louis. Her research interests include biochemistry, mass spectrometry-based metabolomics, and cancer metabolism. Ranran Chen is a master's student in the School of Public Health at Shandong University, China. Yuying Shi is a master's student in the School of Public Health at Shandong University, China. Gary J. Patti is the Michael and Tana Powell Professor at Washington University in St. Louis, where he holds appointments in the Department of Chemistry and the Department of Medicine. He is also the Senior Director of the Center for Metabolomics and Isotope Tracing at Washington University. His research interests include metabolomics, bioinformatics, high-throughput mass spectrometry, environmental health, cancer, and aging. Leyi Wei received his Ph.D. in Computer Science from Xiamen University, China. He is currently a Professor in the School of Software at Shandong University, China. His research interests include machine learning and its applications to bioinformatics. Qingzhen Hou received his Ph.D. from the Centre for Integrative Bioinformatics VU (IBIVU) at Vrije Universiteit Amsterdam, the Netherlands. Since 2020, he has served as the head of the Bioinformatics Center in the National Institute of Health Data Science of China and as an Assistant Professor in the School of Public Health, Shandong University, China. His areas of research are bioinformatics and computational biophysics.
Key points
- Genome-scale heterogeneous networks of metabolite-protein interactions (MPI), based on thousands of enzymatic reactions across ten organisms, were constructed semi-automatically.
- An enzymatic reaction prediction method called Metabolite-Protein Interaction Variational Graph Autoencoders (MPI-VGAE) was developed and optimized; by using the molecular features of both metabolites and proteins, it achieves higher performance than existing machine learning methods.
- MPI-VGAE is broadly useful for applications involving the reconstruction of metabolic pathways, functional enzymatic reaction networks, and homogeneous networks (e.g., metabolic reaction networks).
- By applying MPI-VGAE to Alzheimer's disease and colorectal cancer, we obtained several novel, biologically meaningful disease-related protein-metabolite reactions. We further investigated plausible binding details of these protein-metabolite interactions using molecular docking, which provided useful information for disease mechanisms and drug design.
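For readers unfamiliar with VGAE, the sketch below shows the core pieces of the standard variational graph autoencoder (Kipf and Welling) that MPI-VGAE builds on: sampling each node's latent vector with the reparameterization trick, scoring a candidate metabolite-protein edge with an inner-product decoder, and the KL regularizer toward a standard normal prior. It assumes the encoder has already produced per-node means and log standard deviations (the GCN encoder and the authors' feature extensions are omitted), and all function names and numbers are illustrative, not taken from the mpi-vgae repository.

```python
import math
import random

def sigmoid(x):
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)

def reparameterize(mu, log_sigma, rng):
    """Sample z = mu + sigma * eps with eps ~ N(0, 1), element-wise."""
    return [m + math.exp(ls) * rng.gauss(0.0, 1.0) for m, ls in zip(mu, log_sigma)]

def edge_probability(z_i, z_j):
    """Inner-product decoder: p(edge i-j) = sigmoid(z_i . z_j)."""
    return sigmoid(sum(a * b for a, b in zip(z_i, z_j)))

def kl_to_standard_normal(mu, log_sigma):
    """KL(N(mu, sigma^2) || N(0, I)) for a diagonal Gaussian, summed over dims."""
    return sum(0.5 * (math.exp(2 * ls) + m * m - 1.0 - 2 * ls)
               for m, ls in zip(mu, log_sigma))

rng = random.Random(0)
# Hypothetical encoder outputs for one metabolite node and one protein node.
mu_met, ls_met = [0.5, -0.2], [-1.0, -1.0]
mu_pro, ls_pro = [0.4, 0.1], [-1.0, -1.0]
z_met = reparameterize(mu_met, ls_met, rng)
z_pro = reparameterize(mu_pro, ls_pro, rng)
print(edge_probability(z_met, z_pro))       # a probability in (0, 1)
print(kl_to_standard_normal(mu_met, ls_met))
```

During training, the reconstruction loss on observed and sampled non-edges is combined with this KL term; the reparameterization keeps the sampling step differentiable with respect to the encoder outputs.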
26
Chandak P, Huang K, Zitnik M. Building a knowledge graph to enable precision medicine. Sci Data 2023; 10:67. [PMID: 36732524 PMCID: PMC9893183 DOI: 10.1038/s41597-023-01960-3] [Citation(s) in RCA: 100] [Impact Index Per Article: 50.0] [Received: 05/13/2022] [Accepted: 01/11/2023] [Indexed: 02/04/2023]
Abstract
Developing personalized diagnostic strategies and targeted treatments requires a deep understanding of disease biology and the ability to dissect the relationship between molecular and genetic factors and their phenotypic consequences. However, such knowledge is fragmented across publications, non-standardized repositories, and evolving ontologies describing various scales of biological organization between genotypes and clinical phenotypes. Here, we present PrimeKG, a multimodal knowledge graph for precision medicine analyses. PrimeKG integrates 20 high-quality resources to describe 17,080 diseases with 4,050,249 relationships representing ten major biological scales, including disease-associated protein perturbations, biological processes and pathways, anatomical and phenotypic scales, and the entire range of approved drugs with their therapeutic action, considerably expanding previous efforts in disease-rooted knowledge graphs. PrimeKG contains an abundance of 'indications', 'contraindications', and 'off-label use' drug-disease edges that are lacking in other knowledge graphs and can support AI analyses of how drugs affect disease-associated networks. We supplement PrimeKG's graph structure with language descriptions of clinical guidelines to enable multimodal analyses and provide instructions for continual updates of PrimeKG as new data become available.
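A knowledge graph such as PrimeKG is naturally stored as typed (head, relation, tail) triples, which makes it straightforward to separate indication, contraindication, and off-label drug-disease edges. The sketch below is purely illustrative: the node identifiers, the `drug:` prefix convention, and the relation labels are hypothetical placeholders, not PrimeKG's actual schema or file format.

```python
from collections import defaultdict

# Hypothetical triples: (head node, relation, tail node).
triples = [
    ("drug:A", "indication", "disease:1"),
    ("drug:B", "indication", "disease:1"),
    ("drug:B", "contraindication", "disease:2"),
    ("drug:C", "off-label use", "disease:1"),
    ("protein:P1", "associated_with", "disease:1"),
]

def index_by_relation(triples):
    """Group edges as relation -> tail node -> set of head nodes."""
    index = defaultdict(lambda: defaultdict(set))
    for head, relation, tail in triples:
        index[relation][tail].add(head)
    return index

def drugs_for(index, disease, relation):
    """Drugs connected to `disease` by the given drug-disease relation type."""
    return sorted(h for h in index[relation][disease] if h.startswith("drug:"))

idx = index_by_relation(triples)
print(drugs_for(idx, "disease:1", "indication"))        # ['drug:A', 'drug:B']
print(drugs_for(idx, "disease:1", "off-label use"))     # ['drug:C']
print(drugs_for(idx, "disease:2", "contraindication"))  # ['drug:B']
```

Keeping the relation type on every edge is what lets the same drug appear as indicated for one disease and contraindicated for another without the two facts being conflated.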
Affiliation(s)
- Payal Chandak
- Harvard-MIT Program in Health Sciences and Technology, Cambridge, MA, 02139, USA
- Kexin Huang
- Department of Computer Science, Stanford University, Stanford, CA, 94305, USA
- Marinka Zitnik
- Department of Biomedical Informatics, Harvard Medical School, Harvard University, Boston, MA, 02115, USA.
- Broad Institute of MIT and Harvard, Cambridge, MA, 02142, USA.
- Harvard Data Science Initiative, Cambridge, MA, 02138, USA.
27
Temiz M, Bakir-Gungor B, Güner Şahan P, Coskun M. Topological feature generation for link prediction in biological networks. PeerJ 2023; 11:e15313. [PMID: 37187525 PMCID: PMC10178302 DOI: 10.7717/peerj.15313] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/14/2022] [Accepted: 04/06/2023] [Indexed: 05/17/2023]
Abstract
Graph or network embedding is a powerful method for extracting missing or potential information from interactions between nodes in biological networks. Graph embedding methods represent the nodes and interactions of a graph as low-dimensional vectors, which facilitates the prediction of potential interactions in networks. However, most graph embedding methods incur high computational costs, owing to the computational complexity of the embedding algorithms, the training time of the classifier, and the high dimensionality of complex biological networks. To address these challenges, in this study we use the Chopper algorithm as an alternative approach to graph embedding; it accelerates the iterative processes and thus reduces the running time of the iterative algorithms on three different undirected protein-protein interaction (PPI) networks (nervous system, blood, and heart). Because the matrix obtained after the embedding process is high-dimensional, the data are transformed into a smaller representation by applying feature regularization techniques. We evaluated the performance of the proposed method by comparing it with state-of-the-art methods. Extensive experiments demonstrate that the proposed approach reduces the training time of the classifier and performs better in link prediction. We also show that the proposed embedding method is faster than state-of-the-art methods on three different PPI datasets.
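The abstract does not describe the Chopper algorithm itself, so the sketch below does not implement it. Instead, as a hedged illustration of what topological features for link prediction can look like, it computes three classic neighbourhood-based scores (common neighbours, Jaccard coefficient, Adamic-Adar index) for a candidate protein pair; such feature vectors could then be passed to any classifier. The toy PPI edges are hypothetical.

```python
import math

def build_adjacency(edges):
    """Undirected adjacency sets from a list of (u, v) edges."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj

def pair_features(adj, u, v):
    """[common neighbours, Jaccard coefficient, Adamic-Adar index] for pair (u, v)."""
    nu, nv = adj.get(u, set()), adj.get(v, set())
    common = nu & nv
    union = nu | nv
    jaccard = len(common) / len(union) if union else 0.0
    # Adamic-Adar down-weights shared neighbours with high degree;
    # the degree > 1 guard avoids dividing by log(1) = 0.
    adamic_adar = sum(1.0 / math.log(len(adj[w])) for w in common if len(adj[w]) > 1)
    return [len(common), jaccard, adamic_adar]

# Hypothetical toy PPI network; (P1, P4) is a candidate missing link.
edges = [("P1", "P2"), ("P1", "P3"), ("P2", "P3"), ("P3", "P4"), ("P2", "P4")]
adj = build_adjacency(edges)
print(pair_features(adj, "P1", "P4"))  # [2, 1.0, 1.82...]
```

These scores are cheap to compute from local neighbourhoods, which is the usual motivation for topological features when full embeddings are too costly.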
Affiliation(s)
- Mustafa Temiz
- Department of Computer Engineering, Abdullah Gul University, Kayseri, Turkey
- Burcu Bakir-Gungor
- Department of Computer Engineering, Abdullah Gul University, Kayseri, Turkey
- Pınar Güner Şahan
- Department of Computer Engineering, Abdullah Gul University, Kayseri, Turkey
- Mustafa Coskun
- Department of Artificial Intelligence and Big Data Engineering, Ankara University, Ankara, Turkey
28
Jiang C, Ngo V, Chapman R, Yu Y, Liu H, Jiang G, Zong N. Deep Denoising of Raw Biomedical Knowledge Graph from COVID-19 Literature, LitCovid and Pubtator. J Med Internet Res 2022; 24:e38584. [PMID: 35658098 PMCID: PMC9301549 DOI: 10.2196/38584] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/07/2022] [Revised: 05/20/2022] [Accepted: 05/30/2022] [Indexed: 12/05/2022]
Abstract
Background Biomedical knowledge graphs covering multiple association types, including COVID-19–related ones, are constructed from co-occurring biomedical entities retrieved from recent literature. However, applications derived from these raw graphs (eg, association predictions among genes, drugs, and diseases) are prone to false-positive predictions, because co-occurrence in the literature does not always indicate a true biomedical association between two entities. Objective Data quality plays an important role in training deep neural network models; however, most current work in this area has focused on improving a model’s performance under the assumption that the preprocessed data are clean. Here, we studied how to remove noise from raw knowledge graphs with limited labeled information. Methods The proposed framework used generative deep neural networks to generate a graph that can distinguish the unknown associations in the raw training graph. Two generative adversarial network models, NetGAN and Cross-Entropy Low-rank Logits (CELL), were adopted for edge classification (ie, link prediction), leveraging unlabeled link information from a real knowledge graph built from LitCovid and Pubtator. Results Link prediction performance, even in the extreme case of a 1:9 ratio of training to test data, demonstrated that the proposed method still achieved favorable results (area under the receiver operating characteristic curve >0.8 for the synthetic data set and 0.7 for the real data set), despite the limited amount of testing data available. Conclusions Our preliminary findings show that the proposed framework achieved promising results for removing noise during data preprocessing of the biomedical knowledge graph, potentially improving the performance of downstream applications by providing cleaner data.
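The results above are reported as the area under the receiver operating characteristic curve (AUC). As a reminder of what that number means for link prediction, the AUC equals the probability that a randomly chosen true edge receives a higher score than a randomly chosen non-edge, with ties counted as one half (the Mann-Whitney formulation). The quadratic-time sketch below computes it directly from two hypothetical lists of edge scores; it is an illustration, not the evaluation code used in the study.

```python
def roc_auc(pos_scores, neg_scores):
    """AUC as P(score of random true edge > score of random non-edge), ties = 1/2."""
    if not pos_scores or not neg_scores:
        raise ValueError("need at least one positive and one negative score")
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))

# Hypothetical link-prediction scores for held-out true edges and non-edges.
print(roc_auc([0.9, 0.8], [0.1, 0.2]))  # 1.0  (perfect ranking)
print(roc_auc([0.9, 0.3], [0.5, 0.1]))  # 0.75 (3 of 4 pairs ranked correctly)
print(roc_auc([0.5], [0.5]))            # 0.5  (a tie counts as half)
```

Under this reading, an AUC of 0.7 on the real data set means a true edge outranks a non-edge about 70% of the time, while 0.5 corresponds to random guessing.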
Affiliation(s)
- Victoria Ngo
- University of California Davis Health, Sacramento, US
- Yue Yu
- Mayo Clinic, Rochester, US
- Nansu Zong
- Mayo Clinic, 205 3rd Ave SW, Rochester, US