1
Wang Z, Meng J, Li H, Dai Q, Lin X, Luan Y. Attention-augmented multi-domain cooperative graph representation learning for molecular interaction prediction. Neural Netw 2025; 186:107265. [PMID: 39987715 DOI: 10.1016/j.neunet.2025.107265] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/23/2024] [Revised: 01/23/2025] [Accepted: 02/07/2025] [Indexed: 02/25/2025]
Abstract
Accurate identification of molecular interactions is crucial for biological network analysis, which can provide valuable insights into fundamental regulatory mechanisms. Despite considerable progress driven by computational advancements, existing methods often rely on task-specific prior knowledge or inherent structural properties of molecules, which limits their generalizability and applicability. Recently, graph-based methods have emerged as a promising approach for predicting links in molecular networks. However, most of these methods focus primarily on aggregating topological information within individual domains, leading to an inadequate characterization of molecular interactions. To mitigate these challenges, we propose AMCGRL, a generalized multi-domain cooperative graph representation learning framework for multifarious molecular interaction prediction tasks. Concretely, AMCGRL incorporates multiple graph encoders to simultaneously learn molecular representations from both intra-domain and inter-domain graphs in a comprehensive manner. Then, the cross-domain decoder is employed to bridge these graph encoders to facilitate the extraction of task-relevant information across different domains. Furthermore, a hierarchical mutual attention mechanism is developed to capture complex pairwise interaction patterns between distinct types of molecules through inter-molecule communicative learning. Extensive experiments conducted on the various datasets demonstrate the superior representation learning capability of AMCGRL compared to the state-of-the-art methods, proving its effectiveness in advancing the prediction of molecular interactions.
Affiliation(s)
- Zhaowei Wang
- School of Computer Science and Technology, Dalian University of Technology, Dalian 116024, China.
- Jun Meng
- School of Computer Science and Technology, Dalian University of Technology, Dalian 116024, China.
- Haibin Li
- School of Computer Science and Technology, Dalian University of Technology, Dalian 116024, China.
- Qiguo Dai
- School of Computer Science and Engineering, Dalian Minzu University, Dalian 116600, China.
- Xiaohui Lin
- School of Computer Science and Technology, Dalian University of Technology, Dalian 116024, China.
- Yushi Luan
- School of Bioengineering, Dalian University of Technology, Dalian 116024, China.
2
Hu B, Su Y, Tian X, Chen C, Chen C, Lv X. GMAMDA: Predicting Metabolite-Disease Associations Based on Adaptive Hardness Negative Sampling and Adaptive Graph Multiple Convolution. J Chem Inf Model 2025; 65:5242-5254. [PMID: 40372801 DOI: 10.1021/acs.jcim.5c00694] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/17/2025]
Abstract
Metabolites are small molecules produced during organism metabolism, with their abnormal concentrations closely linked to the onset and progression of various diseases. Accurate prediction of metabolite-disease associations is crucial for early diagnosis, mechanistic exploration, and treatment optimization. However, existing algorithms often overlook the integration of node features and neglect the impact of different hop domains on nodes in the processing of heterogeneous graphs. Furthermore, current methods solely rely on random sampling for selecting negative samples without considering their reliability, thereby compromising model stability. A novel metabolite-disease association prediction model, GMAMDA, is proposed to address these challenges. GMAMDA integrates adaptive hardness negative sampling, adaptive graph multiple convolution techniques, and a multiheterogeneous graph fusion strategy to forecast potential metabolite-disease associations. Initially, by computing multisource similarity information for metabolites and diseases, multiple heterogeneous graph networks are established for metabolite-disease association networks. Subsequently, the adaptive graph's multiconvolution mechanism is employed to generate feature-rich node representations across various heterogeneous graphs by dynamically leveraging information from different hop neighborhoods. The model then utilizes an adaptive hardness negative sampling approach based on principal component analysis to select negative samples with the highest information content for training, enabling the prediction of potential associations between new metabolites and diseases. Experimental findings demonstrate that GMAMDA outperforms state-of-the-art methods across various evaluation metrics, including AUC (0.9962 ± 0.0014), AUPR (0.9967 ± 0.0009), and accuracy (0.9733 ± 0.0042). 
Case studies focusing on Alzheimer's disease and kidney disease further validate GMAMDA's clinical potential in predicting metabolite markers.
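The AUC figure quoted in this abstract is a ranking metric. As a minimal sketch (not the authors' evaluation code; the function name, labels, and scores below are illustrative assumptions), AUC can be computed as the fraction of positive/negative pairs in which the positive association receives the higher score:

```python
def roc_auc(labels, scores):
    """AUC as the probability that a random positive outranks a random negative.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative sample")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: 1 = known association, 0 = sampled negative.
labels = [1, 1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.4, 0.3, 0.1]
auc = roc_auc(labels, scores)
```

Because the value depends on which negatives are sampled, AUC figures are best compared between methods evaluated with the same negative sampling scheme.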
Affiliation(s)
- Binglu Hu
- College of Software, Xinjiang University, Urumqi 830046, Xinjiang, China
- Ying Su
- College of Information Science and Engineering, Xinjiang University, Urumqi 830046, China
- Xinjiang Aiqiside Detection Technology Co, Ltd, Urumqi 830063, China
- Xuecong Tian
- College of Information Science and Engineering, Xinjiang University, Urumqi 830046, China
- Chen Chen
- College of Software, Xinjiang University, Urumqi 830046, Xinjiang, China
- Cheng Chen
- College of Software, Xinjiang University, Urumqi 830046, Xinjiang, China
- Xiaoyi Lv
- College of Software, Xinjiang University, Urumqi 830046, Xinjiang, China
3
Yang Y, Sun Y, Li F, Guan B, Liu JX, Shang J. MGCNRF: Prediction of Disease-Related miRNAs Based on Multiple Graph Convolutional Networks and Random Forest. IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS 2024; 35:15701-15709. [PMID: 37459265 DOI: 10.1109/tnnls.2023.3289182] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Indexed: 11/01/2024]
Abstract
An increasing number of microRNAs (miRNAs) have been confirmed to be inextricably linked to various diseases, and the discovery of their associations has become a routine way of treating diseases. To overcome the time-consuming and laborious nature of traditional experiments for verifying miRNA-disease associations (MDAs), a variety of computational methods have emerged. However, these methods still have many shortcomings in terms of predictive performance and accuracy. In this study, a model based on multiple graph convolutional networks and random forest (MGCNRF) was proposed for the prediction of MDAs. Specifically, MGCNRF first mapped miRNA functional similarity and sequence similarity, disease semantic similarity and target similarity, and the known MDAs into four different two-layer heterogeneous networks. Second, MGCNRF fed the four heterogeneous networks into four different layered attention graph convolutional networks (GCNs), respectively, to extract MDA embeddings. Finally, MGCNRF integrated the embeddings of every MDA into the features of the miRNA-disease pair and predicted potential MDAs through the random forest (RF). Fivefold cross-validation was applied to verify the prediction performance of MGCNRF, which outperforms seven other state-of-the-art methods in terms of area under the curve. Furthermore, the accuracy and the case studies of different diseases further demonstrate the scientific rationale of MGCNRF. In conclusion, MGCNRF can serve as a scientific tool for predicting potential MDAs.
4
Lu Q, Zhou Z, Wang Q. Multi-layer graph attention neural networks for accurate drug-target interaction mapping. Sci Rep 2024; 14:26119. [PMID: 39478027 PMCID: PMC11525987 DOI: 10.1038/s41598-024-75742-1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/02/2024] [Accepted: 10/08/2024] [Indexed: 11/02/2024] Open
Abstract
In the crucial process of drug discovery and repurposing, precise prediction of drug-target interactions (DTIs) is paramount. This study introduces a novel DTI prediction approach, the Multi-Layer Graph Attention Neural Network (MLGANN), a computational framework that harnesses multi-source information to enhance prediction accuracy. MLGANN constructs a multi-layer DTI network that captures both direct interactions between drugs and targets and their multi-level information, and it combines Graph Convolutional Networks (GCN) with a self-attention mechanism to comprehensively integrate diverse data sources. This method significantly outperformed existing approaches in comparative experiments, underscoring its potential to improve the efficiency and accuracy of DTI predictions. More importantly, this study highlights the significance of considering multi-source data information and network heterogeneity in the drug discovery process, offering new perspectives and tools for future pharmaceutical research.
Affiliation(s)
- Qianwen Lu
- SDU-ANU Joint Science College, Shandong University, Weihai, 264209, Shandong, China
- Zhiheng Zhou
- Academy of Mathematics and Systems Science, Chinese Academy of Sciences, Beijing, 100190, China
- School of Mathematical Sciences, University of Chinese Academy of Sciences, Beijing, 100190, China
- Qi Wang
- College of Science, China Agricultural University, Beijing, 100083, China.
5
Pang H, Wei S, Du Z, Zhao Y, Cai S, Zhao Y. Graph Representation Learning Based on Specific Subgraphs for Biomedical Interaction Prediction. IEEE/ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS 2024; 21:1552-1564. [PMID: 38767994 DOI: 10.1109/tcbb.2024.3402741] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/22/2024]
Abstract
Discovering the novel associations of biomedical entities is of great significance and can facilitate not only the identification of network biomarkers of disease but also the search for putative drug targets. Graph representation learning (GRL) has incredible potential to efficiently predict the interactions from biomedical networks by modeling the robust representation for each node. However, the current GRL-based methods learn the representation of nodes by aggregating the features of their neighbors with equal weights. Furthermore, they also fail to identify which features of higher-order neighbors are integrated into the representation of the central node. In this work, we propose a novel graph representation learning framework: a multi-order graph neural network based on reconstructed specific subgraphs (MGRS) for biomedical interaction prediction. In the MGRS, we apply the multi-order graph aggregation module (MOGA) to learn the wide-view representation by integrating the multi-hop neighbor features. Besides, we propose a subgraph selection module (SGSM) to reconstruct the specific subgraph with adaptive edge weights for each node. SGSM can clearly explore the dependency of the node representation on the neighbor features and learn the subgraph-based representation based on the reconstructed weighted subgraphs. Extensive experimental results on four public biomedical networks demonstrate that the MGRS performs better and is more robust than the latest baselines.
6
Li M, Wang Z, Liu L, Liu X, Zhang W. Subgraph-Aware Graph Kernel Neural Network for Link Prediction in Biological Networks. IEEE J Biomed Health Inform 2024; 28:4373-4381. [PMID: 38630566 DOI: 10.1109/jbhi.2024.3390092] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 04/19/2024]
Abstract
Identifying links within biological networks is important in various biomedical applications. Recent studies have revealed that each node in a network may play a unique role in different links, but most link prediction methods overlook distinctive node roles, hindering the acquisition of effective link representations. Subgraph-based methods have been introduced as solutions but often ignore shared information among subgraphs. To address these limitations, we propose a Subgraph-aware Graph Kernel Neural Network (SubKNet) for link prediction in biological networks. Specifically, SubKNet extracts a subgraph for each node pair and feeds it into a graph kernel neural network, which decomposes each subgraph into a combination of trainable graph filters with diversity regularization for subgraph-aware representation learning. Additionally, node embeddings of the network are extracted as auxiliary information, aiding in distinguishing node pairs that share the same subgraph. Extensive experiments on five biological networks demonstrate that SubKNet outperforms baselines, including methods especially designed for biological networks and methods adapted to various networks. Further investigations confirm that employing graph filters to subgraphs helps to distinguish node roles in different subgraphs, and the inclusion of diversity regularization further enhances its capacity from diverse perspectives, generating effective link representations that contribute to more accurate link prediction.
7
Ouyang D, Liang Y, Wang J, Li L, Ai N, Feng J, Lu S, Liao S, Liu X, Xie S. HGCLAMIR: Hypergraph contrastive learning with attention mechanism and integrated multi-view representation for predicting miRNA-disease associations. PLoS Comput Biol 2024; 20:e1011927. [PMID: 38652712 PMCID: PMC11037542 DOI: 10.1371/journal.pcbi.1011927] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/20/2023] [Accepted: 02/19/2024] [Indexed: 04/25/2024] Open
Abstract
Existing studies have shown that the abnormal expression of microRNAs (miRNAs) usually leads to the occurrence and development of human diseases. Identifying disease-related miRNAs contributes to studying the pathogenesis of diseases at the molecular level. As traditional biological experiments are time-consuming and expensive, computational methods have been used as an effective complement to infer the potential associations between miRNAs and diseases. However, most of the existing computational methods still face three main challenges: (i) learning of high-order relations; (ii) insufficient representation learning ability; (iii) importance learning and integration of multi-view embedding representation. To this end, we developed a HyperGraph Contrastive Learning with view-aware Attention Mechanism and Integrated multi-view Representation (HGCLAMIR) model to discover potential miRNA-disease associations. First, hypergraph convolutional network (HGCN) was utilized to capture high-order complex relations from hypergraphs related to miRNAs and diseases. Then, we combined HGCN with contrastive learning to improve and enhance the embedded representation learning ability of HGCN. Moreover, we introduced view-aware attention mechanism to adaptively weight the embedded representations of different views, thereby obtaining the importance of multi-view latent representations. Next, we innovatively proposed integrated representation learning to integrate the embedded representation information of multiple views for obtaining more reasonable embedding information. Finally, the integrated representation information was fed into a neural network-based matrix completion method to perform miRNA-disease association prediction. Experimental results on the cross-validation set and independent test set indicated that HGCLAMIR can achieve better prediction performance than other baseline models. 
Furthermore, the results of case studies and enrichment analysis further demonstrated the accuracy of HGCLAMIR and showed that the unconfirmed potential associations had biological significance.
Affiliation(s)
- Dong Ouyang
- Peng Cheng Laboratory, Shenzhen, China
- School of Biomedical Engineering, Guangdong Medical University, Dongguan, China
- Yong Liang
- Peng Cheng Laboratory, Shenzhen, China
- Pazhou Laboratory (Huangpu), Guangzhou, China
- Jinfeng Wang
- College of Mathematics and Informatics, South China Agricultural University, Guangzhou, China
- Le Li
- School of Computer Science and Engineering, Faculty of Innovation Engineering, Macau University of Science and Technology, Macau, China
- Ning Ai
- School of Computer Science and Engineering, Faculty of Innovation Engineering, Macau University of Science and Technology, Macau, China
- Junning Feng
- School of Computer Science and Engineering, Faculty of Innovation Engineering, Macau University of Science and Technology, Macau, China
- Shanghui Lu
- School of Computer Science and Engineering, Faculty of Innovation Engineering, Macau University of Science and Technology, Macau, China
- Shuilin Liao
- School of Computer Science and Engineering, Faculty of Innovation Engineering, Macau University of Science and Technology, Macau, China
- Xiaoying Liu
- Computer Engineering Technical College, Guangdong Polytechnic of Science and Technology, Zhuhai, China
- Shengli Xie
- Guangdong-HongKong-Macao Joint Laboratory for Smart Discrete Manufacturing, Guangzhou, China
8
Ma Y, Zhao Y, Ma Y. Kernel Bayesian nonlinear matrix factorization based on variational inference for human-virus protein-protein interaction prediction. Sci Rep 2024; 14:5693. [PMID: 38454139 PMCID: PMC10920681 DOI: 10.1038/s41598-024-56208-w] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/16/2023] [Accepted: 03/04/2024] [Indexed: 03/09/2024] Open
Abstract
Identification of potential human-virus protein-protein interactions (PPIs) contributes to the understanding of the mechanisms of viral infection and to the development of antiviral drugs. Existing computational models often have many hyperparameters that need to be adjusted manually, which limits their computational efficiency and generalization ability. To address this, this study proposes a kernel Bayesian logistic matrix decomposition model with automatic rank determination, VKBNMF, for the prediction of human-virus PPIs. VKBNMF introduces auxiliary information into the logistic matrix decomposition and sets the prior probabilities of the latent variables to build a Bayesian framework for automatic parameter search. In addition, we construct the variational inference framework of VKBNMF to ensure the solution efficiency. The experimental results show that for the scenarios of paired PPIs, VKBNMF achieves an average AUPR of 0.9101, 0.9316, 0.8727, and 0.9517 on the four benchmark datasets, respectively, and for the scenarios of new human (viral) proteins, VKBNMF still achieves a higher hit rate. The case study also further demonstrated that VKBNMF can be used as an effective tool for the prediction of human-virus PPIs.
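For readers unfamiliar with logistic matrix decomposition, the core scoring step can be sketched as follows. This is only an illustration of the general technique, not VKBNMF itself: the kernel terms, Bayesian priors, automatic rank determination, and variational inference described in the abstract are omitted, and all names and latent vectors are hypothetical.

```python
import math


def sigmoid(x):
    """Logistic function mapping a real-valued score to (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def interaction_probability(u, v):
    """Probability of an interaction given two latent vectors of equal length."""
    return sigmoid(sum(a * b for a, b in zip(u, v)))


# Hypothetical rank-2 latent factors for two human and two viral proteins.
human = [[0.5, -1.0], [0.0, 0.0]]
virus = [[2.0, 1.0], [1.0, 3.0]]

# probs[i][j] is the predicted probability that human protein i
# interacts with viral protein j.
probs = [[interaction_probability(h, v) for v in virus] for h in human]
```

A latent vector of all zeros yields a probability of 0.5 for every partner, which is one reason cold-start proteins usually need auxiliary (side) information such as similarity kernels.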
Affiliation(s)
- Yingjun Ma
- School of Mathematics and Statistics, Xiamen University of Technology, Xiamen, China
- Yongbiao Zhao
- School of Computer, Central China Normal University, Wuhan, China
- Yuanyuan Ma
- School of Computer Engineering, Hubei University of Arts and Science, Xiangyang, China.
- Hubei Key Laboratory of Power System Design and Test for Electrical Vehicle, Hubei University of Arts and Science, Xiangyang, China.
9
Lei S, Lei X, Chen M, Pan Y. Drug Repositioning Based on Deep Sparse Autoencoder and Drug-Disease Similarity. Interdiscip Sci 2024; 16:160-175. [PMID: 38103130 DOI: 10.1007/s12539-023-00593-9] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/19/2023] [Revised: 11/03/2023] [Accepted: 11/06/2023] [Indexed: 12/17/2023]
Abstract
Drug repositioning is critical to drug development. Previous drug repositioning methods mainly constructed drug-disease heterogeneous networks to extract drug-disease features. However, these methods struggle when structurally simple models are used to handle complex heterogeneous networks. Therefore, in this study, we introduce a drug repositioning method named DRDSA. The method utilizes a deep sparse autoencoder and integrates drug-disease similarities. First, we constructed a drug-disease feature network by incorporating information from drug chemical structure, disease semantic data, and existing known drug-disease associations. Then, we learned the low-dimensional representation of the feature network using a deep sparse autoencoder. Finally, we utilized a deep neural network to make predictions on new drug-disease associations based on the feature representation. The experimental results show that our proposed method achieved the best results on all four benchmark datasets, especially on the CTD dataset, where AUC and AUPR reached 0.9619 and 0.9676, respectively, outperforming other baseline methods. In the case study, we predicted the top ten antiviral drugs for COVID-19. Remarkably, six of these predictions were subsequently validated by other literature sources.
Affiliation(s)
- Song Lei
- School of Computer Science, Shaanxi Normal University, Xi'an, 710119, China
- Xiujuan Lei
- School of Computer Science, Shaanxi Normal University, Xi'an, 710119, China.
- Ming Chen
- College of Information Science and Engineering, Hunan Normal University, Changsha, 410081, China
- Yi Pan
- Faculty of Computer Science and Control Engineering, Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences, Shenzhen, 518055, China.
- Shenzhen Key Laboratory of Intelligent Bioinformatics, Shenzhen Institute of Advanced Technology, Shenzhen, 518055, China.
10
Xuan P, Xiu J, Cui H, Zhang X, Nakaguchi T, Zhang T. Complementary feature learning across multiple heterogeneous networks and multimodal attribute learning for predicting disease-related miRNAs. iScience 2024; 27:108639. [PMID: 38303724 PMCID: PMC10831890 DOI: 10.1016/j.isci.2023.108639] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/18/2023] [Revised: 11/02/2023] [Accepted: 12/01/2023] [Indexed: 02/03/2024] Open
Abstract
Inferring the latent disease-related miRNAs helps provide deep insight into disease pathogenesis. We propose a method, CMMDA, to encode and integrate the context relationship among multiple heterogeneous networks, the complementary information across these networks, and the pairwise multimodal attributes. We first established multiple heterogeneous networks according to the diverse disease similarities. The feature representation embedding the context relationship is formulated for each miRNA (disease) node based on transformer. We designed a co-attention fusion mechanism to encode the complementary information among multiple networks. In terms of a pair of miRNA and disease nodes, the pairwise attributes from multiple networks form a multimodal attribute embedding. A module based on depthwise separable convolution is constructed to enhance the encoding of the specific features from each modality. The experimental results and the ablation studies demonstrate CMMDA's superior performance and the effectiveness of its major innovations.
Affiliation(s)
- Ping Xuan
- School of Computer Science and Technology, Heilongjiang University, Harbin 150080, China
- Department of Computer Science, Shantou University, Shantou 515063, China
- Jinshan Xiu
- School of Computer Science and Technology, Heilongjiang University, Harbin 150080, China
- Hui Cui
- Department of Computer Science and Information Technology, La Trobe University, Melbourne, VIC 3083, Australia
- Xiaowen Zhang
- School of Computer Science and Technology, Heilongjiang University, Harbin 150080, China
- Toshiya Nakaguchi
- Center for Frontier Medical Engineering, Chiba University, Chiba 2638522, Japan
- Tiangang Zhang
- School of Computer Science and Technology, Heilongjiang University, Harbin 150080, China
- School of Mathematical Science, Heilongjiang University, Harbin 150080, China
11
Luo H, Zhu C, Wang J, Zhang G, Luo J, Yan C. Prediction of drug-disease associations based on reinforcement symmetric metric learning and graph convolution network. Front Pharmacol 2024; 15:1337764. [PMID: 38384286 PMCID: PMC10879308 DOI: 10.3389/fphar.2024.1337764] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/13/2023] [Accepted: 01/18/2024] [Indexed: 02/23/2024] Open
Abstract
Accurately identifying novel indications for drugs is crucial in drug research and discovery. Traditional drug discovery is costly and time-consuming. Computational drug repositioning can provide an effective strategy for discovering potential drug-disease associations. However, the known experimentally verified drug-disease associations are relatively sparse, which may affect the prediction performance of computational drug repositioning methods. Moreover, while the existing drug-disease prediction method based on a metric learning algorithm has achieved better performance, it learns features of drugs and diseases only from the drug-centered perspective and cannot comprehensively model the latent features of drugs and diseases. In this study, we propose a novel drug repositioning method named RSML-GCN, which applies a graph convolutional network and reinforcement symmetric metric learning to predict potential drug-disease associations. RSML-GCN first constructs a drug-disease heterogeneous network by integrating the association and feature information of drugs and diseases. Then, the graph convolutional network (GCN) is applied to complement the drug-disease association information. Finally, reinforcement symmetric metric learning with adaptive margin is designed to learn the latent vector representation of drugs and diseases. Based on the learned latent vector representation, the novel drug-disease associations can be identified by the metric function. Comprehensive experiments on benchmark datasets demonstrated the superior prediction performance of RSML-GCN for drug repositioning.
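The abstract does not state which GCN propagation rule is used. As a minimal sketch under that caveat, the widely used symmetric-normalized layer, H' = ReLU(D^(-1/2) (A + I) D^(-1/2) H W), can be written in plain Python. The two-node graph, features, and weights below are illustrative assumptions and are not taken from RSML-GCN.

```python
import math


def matmul(a, b):
    """Multiply two dense matrices given as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def gcn_layer(adj, feats, weight):
    """One graph convolution: normalize (A + I), aggregate, transform, ReLU."""
    n = len(adj)
    # Add self-loops so each node keeps its own features.
    a_hat = [[adj[i][j] + (1.0 if i == j else 0.0) for j in range(n)]
             for i in range(n)]
    deg = [sum(row) for row in a_hat]
    # Symmetric normalization: entry (i, j) is divided by sqrt(d_i * d_j).
    norm = [[a_hat[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)]
            for i in range(n)]
    out = matmul(matmul(norm, feats), weight)
    return [[max(0.0, v) for v in row] for row in out]


# Hypothetical toy graph: node 0 is a drug, node 1 is a disease, one known link.
adj = [[0.0, 1.0], [1.0, 0.0]]
feats = [[1.0, 0.0], [0.0, 1.0]]   # one-hot input features
weight = [[1.0, 0.0], [0.0, 1.0]]  # identity weights, for readability only
hidden = gcn_layer(adj, feats, weight)
```

With identity weights, each node's output is the normalized average of its own and its neighbor's features, so the drug and the disease end up with identical embeddings in this toy case.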
Affiliation(s)
- Huimin Luo
- School of Computer and Information Engineering, Henan University, Kaifeng, China
- Henan Key Laboratory of Big Data Analysis and Processing, Henan University, Kaifeng, China
- Chunli Zhu
- School of Computer and Information Engineering, Henan University, Kaifeng, China
- Henan Key Laboratory of Big Data Analysis and Processing, Henan University, Kaifeng, China
- Jianlin Wang
- School of Computer and Information Engineering, Henan University, Kaifeng, China
- Henan Key Laboratory of Big Data Analysis and Processing, Henan University, Kaifeng, China
- Ge Zhang
- School of Computer and Information Engineering, Henan University, Kaifeng, China
- Henan Key Laboratory of Big Data Analysis and Processing, Henan University, Kaifeng, China
- Junwei Luo
- School of Computer and Information Engineering, Henan University, Kaifeng, China
- Chaokun Yan
- School of Computer and Information Engineering, Henan University, Kaifeng, China
- Henan Key Laboratory of Big Data Analysis and Processing, Henan University, Kaifeng, China
12
Ma Y, Zhong J, Zhu N. Weighted hypergraph learning and adaptive inductive matrix completion for SARS-CoV-2 drug repositioning. Methods 2023; 219:102-110. [PMID: 37804962 DOI: 10.1016/j.ymeth.2023.10.002] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/03/2022] [Revised: 09/14/2023] [Accepted: 10/03/2023] [Indexed: 10/09/2023] Open
Abstract
MOTIVATION The outbreak of the human coronavirus (SARS-CoV-2) has placed a huge burden on public health and the world economy. Compared with de novo drug discovery, drug repurposing is a promising therapeutic strategy that facilitates rapid clinical treatment decisions, shortens the development process, and reduces costs. RESULTS In this study, we propose a weighted hypergraph learning and adaptive inductive matrix completion method, WHAIMC, for predicting potential virus-drug associations. Firstly, we integrate multi-source data to describe viruses and drugs from multiple perspectives, including drug chemical structures, drug targets, virus complete genome sequences, and virus-drug associations. Then, WHAIMC establishes an adaptive inductive matrix completion model to improve performance through adaptive learning of similarity relations. Finally, WHAIMC introduces weighted hypergraph learning into adaptive inductive matrix completion to capture higher-order relationships of viruses (or drugs). The results showed that WHAIMC had a strong predictive performance for new virus-drug associations, new viruses, and new drugs. The case study further demonstrates that WHAIMC is highly effective for repositioning antiviral drugs against SARS-CoV-2 and provides a new perspective for virus-drug association prediction. The code and data in this study are freely available at https://github.com/Mayingjun20179/WHAIMC.
Affiliation(s)
- Yingjun Ma
- School of Mathematics and Statistics, Xiamen University of Technology, Xiamen 361024, China.
- Junjiang Zhong
- School of Mathematics and Statistics, Xiamen University of Technology, Xiamen 361024, China
- Nenghui Zhu
- School of Mathematics and Statistics, Xiamen University of Technology, Xiamen 361024, China
13
Binatlı OC, Gönen M. MOKPE: drug-target interaction prediction via manifold optimization based kernel preserving embedding. BMC Bioinformatics 2023; 24:276. [PMID: 37407927 DOI: 10.1186/s12859-023-05401-1] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 03/22/2023] [Accepted: 06/25/2023] [Indexed: 07/07/2023] Open
Abstract
BACKGROUND In many applications of bioinformatics, data stem from distinct heterogeneous sources. One of the well-known examples is the identification of drug-target interactions (DTIs), which is of significant importance in drug discovery. In this paper, we propose a novel framework, manifold optimization based kernel preserving embedding (MOKPE), to efficiently solve the problem of modeling heterogeneous data. Our model projects heterogeneous drug and target data into a unified embedding space by preserving drug-target interactions and drug-drug, target-target similarities simultaneously. RESULTS We performed ten replications of ten-fold cross validation on four different drug-target interaction network data sets for predicting DTIs for previously unseen drugs. The classification evaluation metrics showed better or comparable performance compared to previous similarity-based state-of-the-art methods. We also evaluated MOKPE on predicting unknown DTIs of a given network. Our implementation of the proposed algorithm in R together with the scripts that replicate the reported experiments is publicly available at https://github.com/ocbinatli/mokpe.
Affiliation(s)
- Oğuz C Binatlı
  - Graduate School of Sciences and Engineering, Koç University, 34450, Istanbul, Turkey
- Mehmet Gönen
  - Department of Industrial Engineering, College of Engineering, Koç University, 34450, Istanbul, Turkey
  - School of Medicine, Koç University, 34450, Istanbul, Turkey
|
14
|
Gao Z, Ma H, Zhang X, Wang Y, Wu Z. Similarity measures-based graph co-contrastive learning for drug-disease association prediction. Bioinformatics 2023; 39:btad357. [PMID: 37261859 PMCID: PMC10275904 DOI: 10.1093/bioinformatics/btad357] [Citation(s) in RCA: 14] [Impact Index Per Article: 7.0] [Received: 09/13/2022] [Revised: 03/14/2023] [Accepted: 05/31/2023] [Indexed: 06/02/2023]
Abstract
MOTIVATION An imperative step in drug discovery is the prediction of drug-disease associations (DDAs), which tries to uncover potential therapeutic possibilities for already validated drugs. It is costly and time-consuming to predict DDAs using wet experiments. Graph Neural Networks as an emerging technique have shown superior capacity of dealing with DDA prediction. However, existing Graph Neural Networks-based DDA prediction methods suffer from sparse supervised signals. As graph contrastive learning has shined in mitigating sparse supervised signals, we seek to leverage graph contrastive learning to enhance the prediction of DDAs. Unfortunately, most conventional graph contrastive learning-based models corrupt the raw data graph to augment data, which are unsuitable for DDA prediction. Meanwhile, these methods could not model the interactions between nodes effectively, thereby reducing the accuracy of association predictions. RESULTS A model is proposed to tap potential drug candidates for diseases, which is called Similarity Measures-based Graph Co-contrastive Learning (SMGCL). For learning embeddings from complicated network topologies, SMGCL includes three essential processes: (i) constructs three views based on similarities between drugs and diseases and DDA information; (ii) two graph encoders are performed over the three views, so as to model both local and global topologies simultaneously; and (iii) a graph co-contrastive learning method is introduced, which co-trains the representations of nodes to maximize the agreement between them, thus generating high-quality prediction results. Contrastive learning serves as an auxiliary task for improving DDA predictions. Evaluated by cross-validations, SMGCL achieves pleasing comprehensive performances. Further proof of the SMGCL's practicality is provided by case study of Alzheimer's disease. AVAILABILITY AND IMPLEMENTATION https://github.com/Jcmorz/SMGCL.
Affiliation(s)
- Zihao Gao
  - College of Computer Science and Engineering, Northwest Normal University, No.967 Anning East Road, Lanzhou, 730070, China
- Huifang Ma
  - College of Computer Science and Engineering, Northwest Normal University, No.967 Anning East Road, Lanzhou, 730070, China
  - Guangxi Key Laboratory of Trusted Software, Guilin University of Electronic Technology, No.1 Jinji Road, Guilin, 541004, China
- Xiaohui Zhang
  - College of Computer Science and Engineering, Northwest Normal University, No.967 Anning East Road, Lanzhou, 730070, China
- Yike Wang
  - College of Computer Science and Engineering, Northwest Normal University, No.967 Anning East Road, Lanzhou, 730070, China
- Zheyu Wu
  - College of Computer Science and Engineering, Northwest Normal University, No.967 Anning East Road, Lanzhou, 730070, China
|
15
|
Gu C, Li X. Prediction of disease-related miRNAs by voting with multiple classifiers. BMC Bioinformatics 2023; 24:177. [PMID: 37122001 PMCID: PMC10150488 DOI: 10.1186/s12859-023-05308-x] [Citation(s) in RCA: 5] [Impact Index Per Article: 2.5] [Received: 12/31/2022] [Accepted: 04/26/2023] [Indexed: 05/02/2023]
Abstract
There is strong evidence to support that mutations and dysregulation of miRNAs are associated with a variety of diseases, including cancer. However, the experimental methods used to identify disease-related miRNAs are expensive and time-consuming. Effective computational approaches to identify disease-related miRNAs are in high demand and would aid in the detection of lncRNA biomarkers for disease diagnosis, treatment, and prevention. In this study, we develop an ensemble learning framework to reveal the potential associations between miRNAs and diseases (ELMDA). The ELMDA framework does not rely on the known associations when calculating miRNA and disease similarities and uses multi-classifiers voting to predict disease-related miRNAs. As a result, the average AUC of the ELMDA framework was 0.9229 for the HMDD v2.0 database in a fivefold cross-validation. All potential associations in the HMDD V2.0 database were predicted, and 90% of the top 50 results were verified with the updated HMDD V3.2 database. The ELMDA framework was implemented to investigate gastric neoplasms, prostate neoplasms and colon neoplasms, and 100%, 94%, and 90%, respectively, of the top 50 potential miRNAs were validated by the HMDD V3.2 database. Moreover, the ELMDA framework can predict isolated disease-related miRNAs. In conclusion, ELMDA appears to be a reliable method to uncover disease-associated miRNAs.
Affiliation(s)
- Changlong Gu
  - College of Information Science and Engineering, Hunan University, Changsha, 410082, Hunan, China
- Xiaoying Li
  - College of Information Science and Engineering, Hunan University, Changsha, 410082, Hunan, China
|
16
|
Xie X, Wang Y, He K, Sheng N. Predicting miRNA-disease associations based on PPMI and attention network. BMC Bioinformatics 2023; 24:113. [PMID: 36959547 PMCID: PMC10037801 DOI: 10.1186/s12859-023-05152-z] [Citation(s) in RCA: 6] [Impact Index Per Article: 3.0] [Received: 10/31/2022] [Accepted: 01/17/2023] [Indexed: 03/25/2023]
Abstract
BACKGROUND With the development of biotechnology and the accumulation of theories, many studies have found that microRNAs (miRNAs) play an important role in various diseases. Uncovering the potential associations between miRNAs and diseases is helpful to better understand the pathogenesis of complex diseases. However, traditional biological experiments are expensive and time-consuming. Therefore, it is necessary to develop more efficient computational methods for exploring underlying disease-related miRNAs. RESULTS In this paper, we present a new computational method based on positive point-wise mutual information (PPMI) and attention network to predict miRNA-disease associations (MDAs), called PATMDA. Firstly, we construct the heterogeneous MDA network and multiple similarity networks of miRNAs and diseases. Secondly, we respectively perform random walk with restart and PPMI on different similarity network views to get multi-order proximity features and then obtain high-order proximity representations of miRNAs and diseases by applying the convolutional neural network to fuse the learned proximity features. Then, we design an attention network with neural aggregation to integrate the representations of a node and its heterogeneous neighbor nodes according to the MDA network. Finally, an inner product decoder is adopted to calculate the relationship scores between miRNAs and diseases. CONCLUSIONS PATMDA achieves superior performance over the six state-of-the-art methods with the area under the receiver operating characteristic curve of 0.933 and 0.946 on the HMDD v2.0 and HMDD v3.2 datasets, respectively. The case studies further demonstrate the validity of PATMDA for discovering novel disease-associated miRNAs.
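Illustrative note (not taken from the paper): the PPMI step named in this abstract can be sketched on a toy co-occurrence matrix. The function name `ppmi` and the toy counts are assumptions made for illustration; in PATMDA the input would come from random-walk statistics on the similarity networks, which the abstract does not specify in detail.

```python
import math


def ppmi(counts):
    """Positive point-wise mutual information of a non-negative co-occurrence matrix.

    counts is a list of rows; entry (i, j) says how often item i co-occurs with item j.
    """
    total = sum(sum(row) for row in counts)
    row_sums = [sum(row) for row in counts]
    col_sums = [sum(col) for col in zip(*counts)]
    result = []
    for i, row in enumerate(counts):
        out_row = []
        for j, c in enumerate(row):
            if c > 0:
                # PMI = log( p(i, j) / (p(i) * p(j)) ), written with raw counts.
                pmi = math.log(c * total / (row_sums[i] * col_sums[j]))
                out_row.append(max(pmi, 0.0))  # keep only positive associations
            else:
                out_row.append(0.0)  # unseen pairs get zero rather than -infinity
        result.append(out_row)
    return result


# Hypothetical toy counts: two nodes that only ever co-occur with themselves.
toy_counts = [[4, 0], [0, 4]]
toy_ppmi = ppmi(toy_counts)
```

Clipping at zero is what distinguishes PPMI from plain PMI: pairs that co-occur less often than chance contribute nothing instead of a large negative value.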
Affiliation(s)
- Xuping Xie
  - Key Laboratory of Symbol Computation and Knowledge Engineering of Ministry of Education, College of Computer Science and Technology, Jilin University, Changchun, China
- Yan Wang
  - Key Laboratory of Symbol Computation and Knowledge Engineering of Ministry of Education, College of Computer Science and Technology, Jilin University, Changchun, China
  - School of Artificial Intelligence, Jilin University, Changchun, China
- Kai He
  - Key Laboratory of Symbol Computation and Knowledge Engineering of Ministry of Education, College of Computer Science and Technology, Jilin University, Changchun, China
- Nan Sheng
  - Key Laboratory of Symbol Computation and Knowledge Engineering of Ministry of Education, College of Computer Science and Technology, Jilin University, Changchun, China
|
17
|
Hyperbolic matrix factorization improves prediction of drug-target associations. Sci Rep 2023; 13:959. [PMID: 36653463 PMCID: PMC9849222 DOI: 10.1038/s41598-023-27995-5] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Received: 07/24/2022] [Accepted: 01/11/2023] [Indexed: 01/19/2023]
Abstract
Past research in computational systems biology has focused more on the development and applications of advanced statistical and numerical optimization techniques and much less on understanding the geometry of the biological space. By representing biological entities as points in a low dimensional Euclidean space, state-of-the-art methods for drug-target interaction (DTI) prediction implicitly assume the flat geometry of the biological space. In contrast, recent theoretical studies suggest that biological systems exhibit tree-like topology with a high degree of clustering. As a consequence, embedding a biological system in a flat space leads to distortion of distances between biological objects. Here, we present a novel matrix factorization methodology for drug-target interaction prediction that uses hyperbolic space as the latent biological space. When benchmarked against classical, Euclidean methods, hyperbolic matrix factorization exhibits superior accuracy while lowering embedding dimension by an order of magnitude. We see this as additional evidence that the hyperbolic geometry underpins large biological networks.
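Illustrative note (not taken from the paper): the distortion argument in this abstract rests on how distances behave in hyperbolic space. The sketch below uses the Poincaré ball model, chosen here purely for illustration; the abstract does not say which model of hyperbolic space or which loss the authors use, and the function name `poincare_distance` is hypothetical.

```python
import math


def poincare_distance(u, v):
    """Geodesic distance between two points strictly inside the unit Poincaré ball."""
    sq_diff = sum((a - b) ** 2 for a, b in zip(u, v))
    sq_u = sum(a * a for a in u)
    sq_v = sum(b * b for b in v)
    # d(u, v) = arcosh(1 + 2 * |u - v|^2 / ((1 - |u|^2) * (1 - |v|^2)))
    return math.acosh(1.0 + 2.0 * sq_diff / ((1.0 - sq_u) * (1.0 - sq_v)))


# Two points near the boundary are much farther apart than their Euclidean gap of 1.8.
far_apart = poincare_distance([0.9, 0.0], [-0.9, 0.0])
```

Distances grow rapidly toward the boundary of the ball, which is why tree-like structures can be embedded with low distortion in few dimensions.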
|
18
|
Zhang J, Xie M. Graph regularized non-negative matrix factorization with prior knowledge consistency constraint for drug-target interactions prediction. BMC Bioinformatics 2022; 23:564. [PMID: 36581822 PMCID: PMC9798666 DOI: 10.1186/s12859-022-05119-6] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 09/14/2022] [Accepted: 12/20/2022] [Indexed: 12/30/2022]
Abstract
BACKGROUND Identifying drug-target interactions (DTIs) plays a key role in drug development. Traditional wet experiments to identify DTIs are expensive and time consuming. Effective computational methods to predict DTIs are useful to narrow the searching scope of potential drugs and speed up the process of drug discovery. There are a variety of non-negative matrix factorization based methods to predict DTIs, but the convergence of the algorithms used in the matrix factorization is often overlooked and the results can be further improved. RESULTS In order to predict DTIs more accurately and quickly, we propose an alternating direction algorithm to solve graph regularized non-negative matrix factorization with prior knowledge consistency constraint (ADA-GRMFC). Based on known DTIs, drug chemical structures and target sequences, ADA-GRMFC at first constructs a DTI matrix, a drug similarity matrix and a target similarity matrix. Then DTI prediction is modeled as the non-negative factorization of the DTI matrix with graph dual regularization terms and a prior knowledge consistency constraint. The graph dual regularization terms are used to integrate the information from the drug similarity matrix and the target similarity matrix, and the prior knowledge consistency constraint is used to ensure that the matrix decomposition result is consistent with the prior knowledge of known DTIs. Finally, an alternating direction algorithm is used to solve the matrix factorization. Furthermore, we prove that the algorithm can converge to a stationary point. Extensive experimental results of 10-fold cross-validation show that ADA-GRMFC has better performance than other state-of-the-art methods. In the case study, ADA-GRMFC is also used to predict the targets interacting with the drug olanzapine, and all of the 10 highest-scoring targets have been accurately predicted. In predicting drug interactions with the target estrogen receptor alpha, 17 of the 20 highest-scoring drugs have been validated.
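Illustrative note (not taken from the paper): a generic graph dual-regularized non-negative factorization of a DTI matrix is commonly written as below. All symbols are assumptions introduced here: $Y$ is the drug-by-target interaction matrix, $A$ and $B$ are the non-negative drug and target factors, $L_d$ and $L_t$ are graph Laplacians built from the drug and target similarity matrices $S^d$ and $S^t$, and $\lambda_d$, $\lambda_t$ are weights. The prior knowledge consistency constraint is omitted because the abstract does not state its form.

```latex
\min_{A \ge 0,\; B \ge 0}\;
  \lVert Y - A B^{\top} \rVert_F^2
  + \lambda_d \operatorname{tr}\!\left(A^{\top} L_d A\right)
  + \lambda_t \operatorname{tr}\!\left(B^{\top} L_t B\right),
\qquad
\operatorname{tr}\!\left(A^{\top} L_d A\right)
  = \tfrac{1}{2} \sum_{i,j} S^{d}_{ij}\, \lVert a_i - a_j \rVert_2^2 .
```

The identity on the right holds for a symmetric similarity matrix with $L_d = D_d - S^d$, and shows why the trace terms pull the latent vectors $a_i$, $a_j$ of similar drugs together; the target term works the same way.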
Affiliation(s)
- Junjun Zhang
  - Key Laboratory of Computing and Stochastic Mathematics (LCSM) (Ministry of Education), School of Mathematics and Statistics, Hunan Normal University, Changsha, 410081, China
- Minzhu Xie
  - Key Laboratory of Computing and Stochastic Mathematics (LCSM) (Ministry of Education), School of Mathematics and Statistics, Hunan Normal University, Changsha, 410081, China
  - College of Information Science and Engineering, Hunan Normal University, Changsha, 410081, China
|
19
|
Lei S, Lei X, Liu L. Drug repositioning based on heterogeneous networks and variational graph autoencoders. Front Pharmacol 2022; 13:1056605. [PMID: 36618933 PMCID: PMC9812491 DOI: 10.3389/fphar.2022.1056605] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/29/2022] [Accepted: 12/13/2022] [Indexed: 12/24/2022]
Abstract
Predicting new therapeutic effects (drug repositioning) of existing drugs plays an important role in drug development. However, traditional wet-lab experimental methods are usually time-consuming and costly. The emergence of more and more artificial intelligence-based drug repositioning methods in the past 2 years has facilitated drug development. In this study we propose a drug repositioning method, VGAEDR, based on a heterogeneous network of multiple drug attributes and a variational graph autoencoder. First, a drug-disease heterogeneous network is established based on three drug attributes, disease semantic information, and known drug-disease associations. Second, low-dimensional feature representations for heterogeneous networks are learned through a variational graph autoencoder module and a multi-layer convolutional module. Finally, the feature representation is fed to a fully connected layer and a Softmax layer to predict new drug-disease associations. Comparative experiments with other baseline methods on three datasets demonstrate the excellent performance of VGAEDR. In the case study, we predicted the top 10 possible anti-COVID-19 drugs from the existing drug and disease data, and six of them were verified in the literature.
|
20
|
Li P, Tiwari P, Xu J, Qian Y, Ai C, Ding Y, Guo F. Sparse regularized joint projection model for identifying associations of non-coding RNAs and human diseases. Knowl Based Syst 2022. [DOI: 10.1016/j.knosys.2022.110044] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/06/2022]
|
21
|
Liu BM, Gao YL, Zhang DJ, Zhou F, Wang J, Zheng CH, Liu JX. A new framework for drug-disease association prediction combing light-gated message passing neural network and gated fusion mechanism. Brief Bioinform 2022; 23:6775584. [PMID: 36305457 DOI: 10.1093/bib/bbac457] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 06/26/2022] [Revised: 09/07/2022] [Accepted: 09/23/2022] [Indexed: 12/14/2022]
Abstract
With the development of research on the complex aetiology of many diseases, computational drug repositioning methodology has proven to be a shortcut to costly and inefficient traditional methods. Therefore, developing more promising computational methods is indispensable for finding new candidate diseases to treat with existing drugs. In this paper, a model called GLGMPNN, which integrates a new variant of message passing neural network and a novel gated fusion mechanism, is proposed for drug-disease association prediction. First, a light-gated message passing neural network (LGMPNN), including message passing, aggregation and updating, is proposed to separately extract multiple pieces of information from the similarity networks and the association network. Then, a gated fusion mechanism consisting of a forget gate and an output gate is applied to integrate the multiple pieces of information to a certain extent. The forget gate calculated by the multiple embeddings is built to integrate the association information into the similarity information. Furthermore, the final node representations are controlled by the output gate, which fuses the topology information of the networks and the initial similarity information. Finally, a bilinear decoder is adopted to reconstruct an adjacency matrix for drug-disease associations. Evaluated by 10-fold cross-validations, GLGMPNN achieves excellent performance compared with the current models. The following studies show that our model can effectively discover novel drug-disease associations.
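Illustrative note (not taken from the paper): a bilinear decoder of the kind mentioned in this abstract typically scores every drug-disease pair as a sigmoid of `drug_embedding * W * disease_embedding`. The sketch below assumes that standard form; the function name `bilinear_scores`, the nested-list representation, and the example values are hypothetical, and the paper's actual parameterization is not given in the abstract.

```python
import math


def bilinear_scores(drug_emb, disease_emb, weight):
    """Return sigmoid(D @ W @ S^T) for row-wise embeddings given as nested lists.

    drug_emb: n_drugs x k, disease_emb: n_diseases x k, weight: k x k.
    """
    def matmul(a, b):
        # Plain-Python matrix product of a (p x q) and b (q x r).
        return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

    projected = matmul(drug_emb, weight)                   # n_drugs x k
    disease_t = [list(col) for col in zip(*disease_emb)]   # k x n_diseases
    logits = matmul(projected, disease_t)                  # n_drugs x n_diseases
    return [[1.0 / (1.0 + math.exp(-z)) for z in row] for row in logits]


# Hypothetical two-dimensional embeddings with an identity weight matrix.
identity = [[1.0, 0.0], [0.0, 1.0]]
scores = bilinear_scores([[1.0, 0.0], [0.0, 1.0]], [[1.0, 0.0], [0.0, 1.0]], identity)
```

With an identity weight matrix the decoder reduces to an inner-product decoder; the learned matrix `W` is what lets drugs and diseases interact across different embedding dimensions.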
Affiliation(s)
- Bao-Min Liu
  - School of Computer Science, Qufu Normal University, Rizhao, 276826, Shandong, China
- Ying-Lian Gao
  - Qufu Normal University Library, Qufu Normal University, Rizhao, 276826, Shandong, China
- Dai-Jun Zhang
  - School of Computer Science, Qufu Normal University, Rizhao, 276826, Shandong, China
- Feng Zhou
  - School of Computer Science, Qufu Normal University, Rizhao, 276826, Shandong, China
- Juan Wang
  - School of Computer Science, Qufu Normal University, Rizhao, 276826, Shandong, China
- Chun-Hou Zheng
  - School of Computer Science, Qufu Normal University, Rizhao, 276826, Shandong, China
- Jin-Xing Liu
  - School of Computer Science, Qufu Normal University, Rizhao, 276826, Shandong, China
|
22
|
Gao M, Liu S, Qi Y, Guo X, Shang X. GAE-LGA: integration of multi-omics data with graph autoencoders to identify lncRNA-PCG associations. Brief Bioinform 2022; 23:6775590. [PMID: 36305456 DOI: 10.1093/bib/bbac452] [Citation(s) in RCA: 10] [Impact Index Per Article: 3.3] [Received: 07/14/2022] [Revised: 09/20/2022] [Accepted: 09/22/2022] [Indexed: 12/14/2022]
Abstract
Long non-coding RNAs (lncRNAs) can disrupt the biological functions of protein-coding genes (PCGs) to cause cancer. However, the relationship between lncRNAs and PCGs remains unclear and difficult to predict. Machine learning has achieved a satisfactory performance in association prediction, but to our knowledge, it is currently less used in lncRNA-PCG association prediction. Therefore, we introduce GAE-LGA, a powerful deep learning model with graph autoencoders as components, to recognize potential lncRNA-PCG associations. GAE-LGA jointly explored lncRNA-PCG learning and cross-omics correlation learning for effective lncRNA-PCG association identification. The functional similarity and multi-omics similarity of lncRNAs and PCGs were accumulated and encoded by graph autoencoders to extract feature representations of lncRNAs and PCGs, which were subsequently used for decoding to obtain candidate lncRNA-PCG pairs. Comprehensive evaluation demonstrated that GAE-LGA can successfully capture lncRNA-PCG associations with strong robustness and outperformed other machine learning-based identification methods. Furthermore, multi-omics features were shown to improve the performance of lncRNA-PCG association identification. In conclusion, GAE-LGA can act as an efficient application for lncRNA-PCG association prediction with the following advantages: It fuses multi-omics information into the similarity network, making the feature representation more accurate; it can predict lncRNA-PCG associations for new lncRNAs and identify potential lncRNA-PCG associations with high accuracy.
Affiliation(s)
- Meihong Gao
  - School of Computer Science and Engineering, Northwestern Polytechnical University, Xi'an 710072, China
- Shuhui Liu
  - School of Computer Science and Engineering, Northwestern Polytechnical University, Xi'an 710072, China
- Yang Qi
  - School of Computer Science and Engineering, Northwestern Polytechnical University, Xi'an 710072, China
- Xinpeng Guo
  - School of Computer Science and Engineering, Northwestern Polytechnical University, Xi'an 710072, China
- Xuequn Shang
  - School of Computer Science and Engineering, Northwestern Polytechnical University, Xi'an 710072, China
|
23
|
Mariappan R, Jayagopal A, Sien HZ, Rajan V. Neural Collective Matrix Factorization for integrated analysis of heterogeneous biomedical data. Bioinformatics 2022; 38:4554-4561. [PMID: 35929808 DOI: 10.1093/bioinformatics/btac543] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/22/2022] [Revised: 06/30/2022] [Accepted: 08/03/2022] [Indexed: 12/24/2022]
Abstract
MOTIVATION In many biomedical studies, there arises the need to integrate data from multiple directly or indirectly related sources. Collective matrix factorization (CMF) and its variants are models designed to collectively learn from arbitrary collections of matrices. The latent factors learnt are rich integrative representations that can be used in downstream tasks, such as clustering or relation prediction with standard machine-learning models. Previous CMF-based methods have numerous modeling limitations. They do not adequately capture complex non-linear interactions and do not explicitly model varying sparsity and noise levels in the inputs, and some cannot model inputs with multiple datatypes. These inadequacies limit their use on many biomedical datasets. RESULTS To address these limitations, we develop Neural Collective Matrix Factorization (NCMF), the first fully neural approach to CMF. We evaluate NCMF on relation prediction tasks of gene-disease association prediction and adverse drug event prediction, using multiple datasets. In each case, data are obtained from heterogeneous publicly available databases and used to learn representations to build predictive models. NCMF is found to outperform previous CMF-based methods and several state-of-the-art graph embedding methods for representation learning in our experiments. Our experiments illustrate the versatility and efficacy of NCMF in representation learning for seamless integration of heterogeneous data. AVAILABILITY AND IMPLEMENTATION https://github.com/ajayago/NCMF_bioinformatics. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Ragunathan Mariappan
  - Department of Information Systems and Analytics, School of Computing, National University of Singapore, Singapore 117417, Singapore
- Aishwarya Jayagopal
  - Department of Information Systems and Analytics, School of Computing, National University of Singapore, Singapore 117417, Singapore
- Ho Zong Sien
  - Department of Information Systems and Analytics, School of Computing, National University of Singapore, Singapore 117417, Singapore
- Vaibhav Rajan
  - Department of Information Systems and Analytics, School of Computing, National University of Singapore, Singapore 117417, Singapore
|
24
|
Xie X, Wang Y, Sheng N, Zhang S, Cao Y, Fu Y. Predicting miRNA-disease associations based on multi-view information fusion. Front Genet 2022; 13:979815. [PMID: 36238163 PMCID: PMC9552014 DOI: 10.3389/fgene.2022.979815] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Received: 06/28/2022] [Accepted: 08/16/2022] [Indexed: 11/13/2022]
Abstract
MicroRNAs (miRNAs) play an important role in various biological processes and their abnormal expression could lead to the occurrence of diseases. Exploring the potential relationships between miRNAs and diseases can contribute to the diagnosis and treatment of complex diseases. The growing number of databases storing miRNA and disease information provides opportunities to develop computational methods for discovering unobserved disease-related miRNAs, but there are still some challenges in how to effectively learn and fuse information from multi-source data. In this study, we propose a multi-view information fusion based method for miRNA-disease association (MDA) prediction, named MVIFMDA. Firstly, multiple heterogeneous networks are constructed by combining the known MDAs and different similarities of miRNAs and diseases based on multi-source information. Secondly, the topology features of miRNAs and diseases are obtained by applying a graph convolutional network to each heterogeneous network view. Moreover, we design an attention strategy at the topology representation level to adaptively fuse representations that carry different structural information. Meanwhile, we learn the attribute representations of miRNAs and diseases from their similarity attribute views with convolutional neural networks. Finally, the complicated associations between miRNAs and diseases are reconstructed by applying a bilinear decoder to the combined features, which combine topology and attribute representations. Experimental results on the public dataset demonstrate that our proposed model consistently outperforms baseline methods. The case studies further show the ability of the MVIFMDA model to infer underlying associations between miRNAs and diseases.
Affiliation(s)
- Xuping Xie
  - Key Laboratory of Symbol Computation and Knowledge Engineering of Ministry of Education, College of Computer Science and Technology, Jilin University, Changchun, China
- Yan Wang
  - Key Laboratory of Symbol Computation and Knowledge Engineering of Ministry of Education, College of Computer Science and Technology, Jilin University, Changchun, China
  - School of Artificial Intelligence, Jilin University, Changchun, China
  - *Correspondence: Yan Wang
- Nan Sheng
  - Key Laboratory of Symbol Computation and Knowledge Engineering of Ministry of Education, College of Computer Science and Technology, Jilin University, Changchun, China
- Shuangquan Zhang
  - Key Laboratory of Symbol Computation and Knowledge Engineering of Ministry of Education, College of Computer Science and Technology, Jilin University, Changchun, China
- Yangkun Cao
  - School of Artificial Intelligence, Jilin University, Changchun, China
- Yuan Fu
  - Institute of Biological, Environmental and Rural Sciences, Aberystwyth University, Aberystwyth, United Kingdom
|
25
|
Liu B, Papadopoulos D, Malliaros FD, Tsoumakas G, Papadopoulos AN. Multiple similarity drug-target interaction prediction with random walks and matrix factorization. Brief Bioinform 2022; 23:6692553. [PMID: 36070659 DOI: 10.1093/bib/bbac353] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Received: 05/12/2022] [Revised: 07/11/2022] [Accepted: 07/27/2022] [Indexed: 11/14/2022]
Abstract
The discovery of drug-target interactions (DTIs) is a very promising area of research with great potential. The accurate identification of reliable interactions among drugs and proteins via computational methods, which typically leverage heterogeneous information retrieved from diverse data sources, can boost the development of effective pharmaceuticals. Although random walk and matrix factorization techniques are widely used in DTI prediction, they have several limitations. Random walk-based embedding generation is usually conducted in an unsupervised manner, while the linear similarity combination in matrix factorization distorts individual insights offered by different views. To tackle these issues, we take a multi-layered network approach to handle diverse drug and target similarities, and propose a novel optimization framework, called Multiple similarity DeepWalk-based Matrix Factorization (MDMF), for DTI prediction. The framework unifies embedding generation and interaction prediction, learning vector representations of drugs and targets that not only retain higher order proximity across all hyper-layers and layer-specific local invariance, but also approximate the interactions with their inner product. Furthermore, we develop an ensemble method (MDMF2A) that integrates two instantiations of the MDMF model, optimizing the area under the precision-recall curve (AUPR) and the area under the receiver operating characteristic curve (AUC), respectively. The empirical study on real-world DTI datasets shows that our method achieves statistically significant improvement over current state-of-the-art approaches in four different settings. Moreover, the validation of highly ranked non-interacting pairs also demonstrates the potential of MDMF2A to discover novel DTIs.
Affiliation(s)
- Bin Liu
  - Key Laboratory of Data Engineering and Visual Computing, Chongqing University of Posts and Telecommunications, Chongqing 400065, China
  - School of Informatics, Aristotle University of Thessaloniki, 54124 Thessaloniki, Greece
- Fragkiskos D Malliaros
  - Paris-Saclay University, CentraleSupélec, Inria, Centre for Visual Computing (CVN), 91190 Gif-Sur-Yvette, France
- Grigorios Tsoumakas
  - School of Informatics, Aristotle University of Thessaloniki, 54124 Thessaloniki, Greece
|
26
|
Sharma VS, Fossati A, Ciuffa R, Buljan M, Williams EG, Chen Z, Shao W, Pedrioli PGA, Purcell AW, Martínez MR, Song J, Manica M, Aebersold R, Li C. PCfun: a hybrid computational framework for systematic characterization of protein complex function. Brief Bioinform 2022; 23:6611913. [PMID: 35724564 PMCID: PMC9310514 DOI: 10.1093/bib/bbac239] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/24/2022] [Revised: 05/05/2022] [Accepted: 05/21/2022] [Indexed: 11/14/2022]
Abstract
In molecular biology, it is a general assumption that the ensemble of expressed molecules, their activities and interactions determine biological function, cellular states and phenotypes. Stable protein complexes—or macromolecular machines—are, in turn, the key functional entities mediating and modulating most biological processes. Although identifying protein complexes and their subunit composition can now be done inexpensively and at scale, determining their function remains challenging and labor intensive. This study describes Protein Complex Function predictor (PCfun), the first computational framework for the systematic annotation of protein complex functions using Gene Ontology (GO) terms. PCfun is built upon a word embedding using natural language processing techniques based on 1 million open access PubMed Central articles. Specifically, PCfun leverages two approaches for accurately identifying protein complex function, including: (i) an unsupervised approach that obtains the nearest neighbor (NN) GO term word vectors for a protein complex query vector and (ii) a supervised approach using Random Forest (RF) models trained specifically for recovering the GO terms of protein complex queries described in the CORUM protein complex database. PCfun consolidates both approaches by performing a hypergeometric statistical test to enrich the top NN GO terms within the child terms of the GO terms predicted by the RF models. The documentation and implementation of the PCfun package are available at https://github.com/sharmavaruns/PCfun. We anticipate that PCfun will serve as a useful tool and novel paradigm for the large-scale characterization of protein complex function.
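Illustrative note (not taken from the paper): the hypergeometric enrichment test mentioned in this abstract can be sketched as a one-sided tail probability. The mapping assumed here is hypothetical: `population` is the number of GO terms considered, `successes` the child terms of an RF-predicted term, `draws` the number of top nearest-neighbor terms, and `observed` the size of their overlap. The abstract does not state how PCfun sets these quantities.

```python
from math import comb


def hypergeom_pvalue(population, successes, draws, observed):
    """P(X >= observed) when drawing `draws` items without replacement.

    X counts how many drawn items belong to the `successes` marked items
    in a population of size `population`.
    """
    upper = min(successes, draws)
    total = comb(population, draws)
    tail = sum(
        comb(successes, k) * comb(population - successes, draws - k)
        for k in range(observed, upper + 1)
    )
    return tail / total


# Hypothetical numbers: 10 terms overall, 5 are child terms, 5 are drawn, all 5 overlap.
p_all_overlap = hypergeom_pvalue(10, 5, 5, 5)
```

A small p-value means the overlap between the nearest-neighbor terms and the child terms is unlikely under random selection, which is the sense in which the terms are "enriched".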
Affiliation(s)
- Varun S Sharma
- Department of Biology, Institute of Molecular Systems Biology, ETH Zurich, Switzerland.,CeMM Research Center for Molecular Medicine of the Austrian Academy of Sciences, Vienna, Austria
| | - Andrea Fossati
- Quantitative Biosciences Institute (QBI) and Department of Cellular and Molecular Pharmacology, University of California, San Francisco, CA 94158, USA.,J. David Gladstone Institutes, San Francisco, CA 94158, USA
| | - Rodolfo Ciuffa
- Department of Biology, Institute of Molecular Systems Biology, ETH Zurich, Switzerland
| | - Marija Buljan
- Empa - Swiss Federal Laboratories for Materials Science and Technology, St. Gallen, Switzerland.,Swiss Institute of Bioinformatics (SIB), Lausanne, Switzerland
| | - Evan G Williams
- Luxembourg Centre for Systems Biomedicine, University of Luxembourg, Esch-sur-Alzette Luxembourg
| | - Zhen Chen
- Collaborative Innovation Center of Henan Grain Crops, Henan Agricultural University, Zhengzhou 450046, China
| | - Wenguang Shao
- Department of Biology, Institute of Molecular Systems Biology, ETH Zurich, Switzerland
| | - Patrick G A Pedrioli
- Department of Biology, Institute of Molecular Systems Biology, ETH Zurich, Switzerland
| | - Anthony W Purcell
- Monash Biomedicine Discovery Institute and Department of Biochemistry and Molecular Biology, Monash University, Melbourne, VIC 3800, Australia
| | | | - Jiangning Song
- Monash Biomedicine Discovery Institute and Department of Biochemistry and Molecular Biology, Monash University, Melbourne, VIC 3800, Australia.,Monash Data Futures Institute, Monash University, Melbourne, VIC 3800, Australia
| | | | - Ruedi Aebersold
- Department of Biology, Institute of Molecular Systems Biology, ETH Zurich, Switzerland.,Faculty of Science, University of Zurich, Switzerland
| | - Chen Li
- Department of Biology, Institute of Molecular Systems Biology, ETH Zurich, Switzerland.,Monash Biomedicine Discovery Institute and Department of Biochemistry and Molecular Biology, Monash University, Melbourne, VIC 3800, Australia
|
27
|
Ma Y, Liu Q. Generalized matrix factorization based on weighted hypergraph learning for microbe-drug association prediction. Comput Biol Med 2022; 145:105503. [DOI: 10.1016/j.compbiomed.2022.105503] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/30/2022] [Revised: 03/28/2022] [Accepted: 04/04/2022] [Indexed: 11/03/2022]
|
28
|
Li X, Liu LP, Hassoun S. Boost-RS: boosted embeddings for recommender systems and its application to enzyme-substrate interaction prediction. Bioinformatics 2022; 38:2832-2838. [PMID: 35561204 PMCID: PMC9113267 DOI: 10.1093/bioinformatics/btac201] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/21/2021] [Revised: 02/06/2022] [Accepted: 04/07/2022] [Indexed: 11/17/2022]
Abstract
MOTIVATION Despite experimental and curation efforts, the extent of enzyme promiscuity on substrates continues to be largely unexplored and underdocumented. Providing computational tools for the exploration of the enzyme-substrate interaction space can expedite experimentation and benefit applications such as constructing synthesis pathways for novel biomolecules, identifying products of metabolism on ingested compounds, and elucidating xenobiotic metabolism. Recommender systems (RS), which are currently unexplored for the enzyme-substrate interaction prediction problem, can be utilized to provide enzyme recommendations for substrates, and vice versa. The performance of collaborative-filtering (CF) RSs, however, hinges on the quality of embedding vectors of users and items (enzymes and substrates in our case). Importantly, enhancing CF embeddings with heterogeneous auxiliary data, especially relational data (e.g. hierarchical, pairwise or groupings), remains a challenge. RESULTS We propose an innovative general RS framework, termed Boost-RS, that enhances RS performance by 'boosting' embedding vectors through auxiliary data. Specifically, Boost-RS is trained and dynamically tuned on multiple relevant auxiliary learning tasks. Boost-RS utilizes contrastive learning tasks to exploit relational data. To show the efficacy of Boost-RS for the enzyme-substrate interaction prediction problem, we apply the Boost-RS framework to several baseline CF models. We show that each of our auxiliary tasks boosts learning of the embedding vectors, and that contrastive learning using Boost-RS outperforms attribute concatenation and multi-label learning. We also show that Boost-RS outperforms similarity-based models. Ablation studies and visualization of learned representations highlight the importance of using contrastive learning on some of the auxiliary data in boosting the embedding vectors.
AVAILABILITY AND IMPLEMENTATION A Python implementation for Boost-RS is provided at https://github.com/HassounLab/Boost-RS. The enzyme-substrate interaction data is available from the KEGG database (https://www.genome.jp/kegg/).
Affiliation(s)
- Xinmeng Li
- Department of Computer Science, Tufts University, Medford, MA 02155, USA
| | - Li-Ping Liu
- To whom correspondence should be addressed. and
|
29
|
Long Y, Wu M, Liu Y, Fang Y, Kwoh CK, Chen J, Luo J, Li X. Pre-training graph neural networks for link prediction in biomedical networks. Bioinformatics 2022; 38:2254-2262. [PMID: 35171981 DOI: 10.1093/bioinformatics/btac100] [Citation(s) in RCA: 26] [Impact Index Per Article: 8.7] [Received: 10/20/2021] [Revised: 01/15/2022] [Accepted: 02/14/2022] [Indexed: 02/03/2023]
Abstract
MOTIVATION Graphs or networks are widely utilized to model the interactions between different entities (e.g. proteins, drugs, etc.) for biomedical applications. Predicting potential interactions/links in biomedical networks is important for understanding the pathological mechanisms of various complex human diseases, as well as screening compound targets for drug discovery. Graph neural networks (GNNs) have been utilized for link prediction in various biomedical networks, which rely on the node features extracted from different data sources, e.g. sequence, structure and network data. However, it is challenging to effectively integrate these data sources and automatically extract features for different link prediction tasks. RESULTS In this article, we propose a novel Pre-Training Graph Neural Networks-based framework named PT-GNN to integrate different data sources for link prediction in biomedical networks. First, we design expressive deep learning methods [e.g. convolutional neural network and graph convolutional network (GCN)] to learn features for individual nodes from sequence and structure data. Second, we further propose a GCN-based encoder to effectively refine the node features by modelling the dependencies among nodes in the network. Third, the node features are pre-trained based on graph reconstruction tasks. The pre-trained features can be used for model initialization in downstream tasks. Extensive experiments have been conducted on two critical link prediction tasks, i.e. synthetic lethality (SL) prediction and drug-target interaction (DTI) prediction. Experimental results demonstrate that PT-GNN outperforms the state-of-the-art methods for SL prediction and DTI prediction. In addition, the pre-trained features help improve the performance and reduce the training time of existing models. AVAILABILITY AND IMPLEMENTATION Python codes and dataset are available at: https://github.com/longyahui/PT-GNN.
SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Yahui Long
- Singapore Immunology Network (SIgN), Agency for Science, Technology and Research, Singapore, Singapore
| | - Min Wu
- Institute for Infocomm Research, Agency for Science, Technology and Research, Singapore, Singapore
| | - Yong Liu
- Joint NTU-UBC Research Centre of Excellence in Active Living for the Elderly, Singapore, Singapore
| | - Yuan Fang
- School of Information Systems, Singapore Management University, 178902 Singapore, Singapore
| | - Chee Keong Kwoh
- School of Computer Science and Engineering, Nanyang Technological University, Singapore, Singapore
| | - Jinmiao Chen
- Singapore Immunology Network (SIgN), Agency for Science, Technology and Research, Singapore, Singapore
| | - Jiawei Luo
- College of Computer Science and Electronic Engineering, Hunan University, Changsha, China
| | - Xiaoli Li
- Institute for Infocomm Research, Agency for Science, Technology and Research, Singapore, Singapore
|
30
|
Ma Y. DeepMNE: Deep Multi-network Embedding for lncRNA-Disease Association prediction. IEEE J Biomed Health Inform 2022; 26:3539-3549. [PMID: 35180094 DOI: 10.1109/jbhi.2022.3152619] [Citation(s) in RCA: 18] [Impact Index Per Article: 6.0] [Indexed: 11/06/2022]
Abstract
Long non-coding RNA (lncRNA) participates in various biological processes, hence its mutations and disorders play an important role in the pathogenesis of multiple human diseases. Identifying disease-related lncRNAs is crucial for the diagnosis, prevention, and treatment of diseases. Although a large number of computational approaches have been developed, effectively integrating multi-omics data and accurately predicting potential lncRNA-disease associations remains a challenge, especially regarding new lncRNAs and new diseases. In this work, we propose a new method with deep multi-network embedding, called DeepMNE, to discover potential lncRNA-disease associations, especially for novel diseases and lncRNAs. DeepMNE extracts multi-omics data to describe diseases and lncRNAs, and proposes a network fusion method based on deep learning to integrate multi-source information. Moreover, DeepMNE complements the sparse association network and uses kernel neighborhood similarity to construct disease similarity and lncRNA similarity networks. Furthermore, a graph embedding method is adopted to predict potential associations. Experimental results demonstrate that, compared to other state-of-the-art methods, DeepMNE has a higher predictive performance on new associations, new lncRNAs and new diseases. Besides, DeepMNE also achieves a considerable predictive performance on perturbed datasets. Additionally, the results of two different types of case studies indicate that DeepMNE can be used as an effective tool for disease-related lncRNA prediction. The code of DeepMNE is freely available at https://github.com/Mayingjun20179/DeepMNE.
|
31
|
Targets preliminary screening for the fresh natural drug molecule based on Cosine-correlation and similarity-comparison of local network. J Transl Med 2022; 20:67. [PMID: 35115019 PMCID: PMC8812203 DOI: 10.1186/s12967-022-03279-w] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 11/05/2021] [Accepted: 01/24/2022] [Indexed: 11/30/2022]
Abstract
Background Chinese herbal medicine is made up of hundreds of natural drug molecules and has played a major role in traditional Chinese medicine (TCM) for several thousand years. Therefore, it is of great significance to study the targets of natural drug molecules for exploring the mechanism of treating diseases with TCM. However, it is very difficult to determine the targets of a fresh natural drug molecule due to the complexity of the interaction between drug molecules and targets. Compared with traditional biological experiments, the computational method has the advantages of less time and lower cost for target screening, but many great challenges remain, especially for molecules without social ties. Methods This study proposed a novel method based on the Cosine-correlation and Similarity-comparison of Local Network (CSLN) to perform the preliminary screening of targets for fresh natural drug molecules and assign weights to them through a trained parameter. Results The performance of CSLN is superior to the popular drug-target-interaction (DTI) prediction model GRGMF on the gold standard data under the condition that drug molecules are the objects of training and testing. Moreover, CSLN showed excellent target-screening performance for a fresh natural drug molecule (scenario simulation) on the TCMSP (13 positive samples in the top 20), and Western blot further verified the accuracy of CSLN. Conclusions In summary, the results suggest that CSLN can be used as an alternative strategy for screening targets of fresh natural drug molecules.
|
32
|
Ding Y, Tang J, Guo F, Zou Q. Identification of drug-target interactions via multiple kernel-based triple collaborative matrix factorization. Brief Bioinform 2022; 23:6520305. [PMID: 35134117 DOI: 10.1093/bib/bbab582] [Citation(s) in RCA: 48] [Impact Index Per Article: 16.0] [Received: 10/21/2021] [Revised: 12/02/2021] [Accepted: 12/19/2021] [Indexed: 12/15/2022]
Abstract
Targeted drugs have been applied to the treatment of cancer on a large scale and have shown therapeutic effects in some patients. Detecting drug-target interactions (DTIs) through biochemical experiments is time-consuming. At present, machine learning (ML) has been widely applied in large-scale drug screening. However, there are few methods for multiple information fusion. We propose a multiple kernel-based triple collaborative matrix factorization (MK-TCMF) method to predict DTIs. The multiple kernel matrices (containing chemical, biological and clinical information) are integrated via a multi-kernel learning (MKL) algorithm. The original adjacency matrix of DTIs is decomposed into three matrices: the latent feature matrix of the drug space, the latent feature matrix of the target space and the bi-projection matrix (used to join the two feature spaces). To obtain better prediction performance, the MKL algorithm regulates the weight of each kernel matrix according to the prediction error. The weights of drug side-effects and target sequence are the highest. Compared with other computational methods, our model has better performance on four test data sets.
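The two building blocks named in this abstract, a weighted combination of kernel matrices and a three-factor reconstruction of the interaction matrix, can be sketched as follows. This is an illustrative sketch only, not the MK-TCMF algorithm: the function names, the fixed kernel weights, and the toy factor matrices are hypothetical, and the weight-update rule driven by prediction error is not shown.

```python
def combine_kernels(kernels, weights):
    """Weighted sum of equally sized square kernel matrices (lists of lists)."""
    n = len(kernels[0])
    return [
        [sum(w * k[i][j] for k, w in zip(kernels, weights)) for j in range(n)]
        for i in range(n)
    ]


def matmul(x, y):
    """Plain matrix product of two lists of lists."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*y)] for row in x]


def transpose(x):
    return [list(col) for col in zip(*x)]


def predict_scores(drug_factors, bi_projection, target_factors):
    """Reconstruct interaction scores as A @ Theta @ B^T.

    A (drugs x r1) and B (targets x r2) are latent feature matrices;
    Theta (r1 x r2) is the bi-projection matrix joining the two spaces.
    """
    return matmul(matmul(drug_factors, bi_projection), transpose(target_factors))


# Hypothetical toy values: 2 drugs, 3 targets, rank 2 on both sides.
A = [[1.0, 0.0], [0.0, 1.0]]
Theta = [[0.9, 0.1], [0.2, 0.8]]
B = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
scores = predict_scores(A, Theta, B)

# Two toy 2x2 drug kernels combined with assumed weights 0.7 and 0.3.
K = combine_kernels([[[1.0, 0.5], [0.5, 1.0]], [[1.0, 0.0], [0.0, 1.0]]], [0.7, 0.3])
```

In this reading, a higher entry in `scores` would mark a more plausible drug-target pair; how the factors are learned is outside the scope of the sketch.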
Affiliation(s)
- Yijie Ding
- Yangtze Delta Region Institute (Quzhou), University of Electronic Science and Technology of China, Quzhou, P.R.China
| | - Jijun Tang
- Department of Computational Science and Engineering, University of South Carolina, Columbia, U.S
| | - Fei Guo
- School of Computer Science and Engineering, Central South University, Changsha, P.R.China
| | - Quan Zou
- Institute of Fundamental and Frontier Sciences, University of Electronic Science and Technology of China, Chengdu, P.R.China
|
33
|
Yang H, Ding Y, Tang J, Guo F. Inferring human microbe–drug associations via multiple kernel fusion on graph neural network. Knowl Based Syst 2022. [DOI: 10.1016/j.knosys.2021.107888] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 12/12/2022]
|
34
|
Fu H, Huang F, Liu X, Qiu Y, Zhang W. MVGCN: data integration through multi-view graph convolutional network for predicting links in biomedical bipartite networks. Bioinformatics 2022; 38:426-434. [PMID: 34499148 DOI: 10.1093/bioinformatics/btab651] [Citation(s) in RCA: 52] [Impact Index Per Article: 17.3] [Received: 04/24/2021] [Revised: 08/07/2021] [Accepted: 09/06/2021] [Indexed: 02/05/2023]
Abstract
MOTIVATION There are various interaction/association bipartite networks in biomolecular systems. Identifying unobserved links in biomedical bipartite networks helps to understand the underlying molecular mechanisms of human complex diseases and thus benefits the diagnosis and treatment of diseases. Although a great number of computational methods have been proposed to predict links in biomedical bipartite networks, most of them heavily depend on features and structures involving the bioentities in one specific bipartite network, which limits the generalization capacity of applying the models to other bipartite networks. Meanwhile, bioentities usually have multiple features, and how to leverage them has also been challenging. RESULTS In this study, we propose a novel multi-view graph convolution network (MVGCN) framework for link prediction in biomedical bipartite networks. We first construct a multi-view heterogeneous network (MVHN) by combining the similarity networks with the biomedical bipartite network, and then perform a self-supervised learning strategy on the bipartite network to obtain node attributes as initial embeddings. Further, a neighborhood information aggregation (NIA) layer is designed for iteratively updating the embeddings of nodes by aggregating information from inter- and intra-domain neighbors in every view of the MVHN. Next, we combine embeddings of multiple NIA layers in each view, and integrate multiple views to obtain the final node embeddings, which are then fed into a discriminator to predict the existence of links. Extensive experiments show MVGCN performs better than or on par with baseline methods and has the generalization capacity on six benchmark datasets involving three typical tasks. AVAILABILITY AND IMPLEMENTATION Source code and data can be downloaded from https://github.com/fuhaitao95/MVGCN. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Haitao Fu
- College of Informatics, Huazhong Agricultural University, Wuhan 430070, China
| | - Feng Huang
- College of Informatics, Huazhong Agricultural University, Wuhan 430070, China
| | - Xuan Liu
- College of Informatics, Huazhong Agricultural University, Wuhan 430070, China
| | - Yang Qiu
- College of Informatics, Huazhong Agricultural University, Wuhan 430070, China
| | - Wen Zhang
- College of Informatics, Huazhong Agricultural University, Wuhan 430070, China
|
35
|
Ma Y, Ma Y. Hypergraph-based logistic matrix factorization for metabolite-disease interaction prediction. Bioinformatics 2022; 38:435-443. [PMID: 34499104 DOI: 10.1093/bioinformatics/btab652] [Citation(s) in RCA: 13] [Impact Index Per Article: 4.3] [Received: 06/08/2021] [Revised: 08/08/2021] [Accepted: 09/06/2021] [Indexed: 02/03/2023]
Abstract
MOTIVATION Function-related metabolites, the terminal products of cell regulation, show a close association with complex diseases. The identification of disease-related metabolites is critical to the diagnosis, prevention and treatment of diseases. However, most existing computational approaches build networks by calculating pairwise relationships, which is inappropriate for mining higher-order relationships. RESULTS In this study, we presented a novel approach with hypergraph-based logistic matrix factorization, HGLMF, to predict the potential interactions between metabolites and diseases. First, the molecular structures and gene associations of metabolites and the hierarchical structures and GO functional annotations of diseases were extracted to build various similarity measures of metabolites and diseases. Next, the kernel neighborhood similarity of metabolites (or diseases) was calculated according to the completed interactive network. Second, multiple networks of metabolites and diseases were fused, respectively, and the hypergraph structures of metabolites and diseases were built. Finally, a logistic matrix factorization based on hypergraph was proposed to predict potential metabolite-disease interactions. In computational experiments, HGLMF accurately predicted metabolite-disease interactions and performed better than other state-of-the-art methods. Moreover, HGLMF could be used to predict new metabolites (or diseases). As suggested by the case studies, the proposed method could discover novel disease-related metabolites, which have been confirmed in existing studies. AVAILABILITY AND IMPLEMENTATION The codes and dataset are available at: https://github.com/Mayingjun20179/HGLMF. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Yingjun Ma
- School of Applied Mathematics, Xiamen University of Technology, Xiamen 361024, China
| | - Yuanyuan Ma
- School of Computer & Information Engineering, Anyang Normal University, Anyang 455000, China
|
36
|
Zhou S, Sun W, Zhang P, Li L. Predicting Pseudogene-miRNA Associations Based on Feature Fusion and Graph Auto-Encoder. Front Genet 2021; 12:781277. [PMID: 34966413 PMCID: PMC8710693 DOI: 10.3389/fgene.2021.781277] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/22/2021] [Accepted: 11/16/2021] [Indexed: 11/13/2022]
Abstract
Pseudogenes were originally regarded as non-functional components scattered in the genome during evolution. Recent studies have shown that pseudogenes can be transcribed into long non-coding RNA and play a key role at multiple functional levels in different physiological and pathological processes. MicroRNAs (miRNAs) are a type of non-coding RNA that plays important regulatory roles in cells. Numerous studies have shown that pseudogenes and miRNAs interact and form a ceRNA network with mRNA to regulate biological processes and are involved in diseases. Exploring the associations of pseudogenes and miRNAs will facilitate the clinical diagnosis of some diseases. Here, we propose a prediction model PMGAE (Pseudogene–MiRNA association prediction based on the Graph Auto-Encoder), which incorporates feature fusion, graph auto-encoder (GAE), and eXtreme Gradient Boosting (XGBoost). First, we calculated three types of similarities including Jaccard similarity, cosine similarity, and Pearson similarity between nodes based on the biological characteristics of pseudogenes and miRNAs. Subsequently, we fused the above similarities to construct a similarity profile as the initial representation features for nodes. Then, we aggregated the similarity profiles and associations of nodes to obtain the low-dimensional representation vector of nodes through a GAE. In the last step, we fed these representation vectors into an XGBoost classifier to predict new pseudogene–miRNA associations (PMAs). The results of five-fold cross validation show that PMGAE achieves a mean AUC of 0.8634 and mean AUPR of 0.8966. Case studies further substantiated the reliability of PMGAE for mining PMAs and the study of endogenous RNA networks in relation to diseases.
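The first two steps of this pipeline, computing Jaccard, cosine and Pearson similarities and fusing them into a similarity profile, can be sketched in plain Python. This is a hedged illustration rather than the PMGAE code: the input is assumed to be binary association profiles, the function names are hypothetical, and fusion by simple concatenation is an assumption because the abstract does not state how the similarities are fused.

```python
from math import sqrt


def jaccard(a, b):
    """Jaccard similarity of two binary vectors."""
    both = sum(1 for x, y in zip(a, b) if x and y)
    either = sum(1 for x, y in zip(a, b) if x or y)
    return both / either if either else 0.0


def cosine(a, b):
    """Cosine similarity; returns 0.0 when either vector has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = sqrt(sum(x * x for x in a)) * sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def pearson(a, b):
    """Pearson correlation, computed as the cosine of the mean-centered vectors."""
    mean_a, mean_b = sum(a) / len(a), sum(b) / len(b)
    return cosine([x - mean_a for x in a], [y - mean_b for y in b])


def similarity_profile(profiles, i):
    """Concatenate the three similarities of node i to every node (assumed fusion)."""
    row = []
    for measure in (jaccard, cosine, pearson):
        row.extend(measure(profiles[i], other) for other in profiles)
    return row


# Hypothetical binary association profiles for three nodes.
profiles = [[1, 1, 0, 0], [1, 0, 1, 0], [0, 0, 1, 1]]
features = similarity_profile(profiles, 0)
```

Each resulting row could then serve as the initial node feature vector that a graph auto-encoder refines.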
Affiliation(s)
- Shijia Zhou
- Hubei Key Laboratory of Agricultural Bioinformatics, College of Informatics, Huazhong Agricultural University, Wuhan, China
| | - Weicheng Sun
- Hubei Key Laboratory of Agricultural Bioinformatics, College of Informatics, Huazhong Agricultural University, Wuhan, China
| | - Ping Zhang
- Hubei Key Laboratory of Agricultural Bioinformatics, College of Informatics, Huazhong Agricultural University, Wuhan, China
| | - Li Li
- Hubei Key Laboratory of Agricultural Bioinformatics, College of Informatics, Huazhong Agricultural University, Wuhan, China.,Hubei Hongshan Laboratory, Huazhong Agricultural University, Wuhan, China
|
37
|
Zhang G, Li M, Deng H, Xu X, Liu X, Zhang W. SGNNMD: signed graph neural network for predicting deregulation types of miRNA-disease associations. Brief Bioinform 2021; 23:6455665. [PMID: 34875683 DOI: 10.1093/bib/bbab464] [Citation(s) in RCA: 11] [Impact Index Per Article: 2.8] [Received: 06/05/2021] [Revised: 10/08/2021] [Accepted: 10/11/2021] [Indexed: 11/13/2022]
Abstract
MiRNAs are a class of small non-coding RNA molecules that play an important role in many biological processes, and determining miRNA-disease associations can benefit drug development and clinical diagnosis. Although great efforts have been made to develop miRNA-disease association prediction methods, little attention has been paid to in-depth classification of miRNA-disease associations, e.g. up/down-regulation of miRNAs in diseases. In this paper, we regard known miRNA-disease associations as a signed bipartite network, which has miRNA nodes, disease nodes and two types of edges representing up/down-regulation of miRNAs in diseases, and propose a signed graph neural network method (SGNNMD) for predicting deregulation types of miRNA-disease associations. SGNNMD extracts subgraphs around miRNA-disease pairs from the signed bipartite network and learns structural features of subgraphs via a labeling algorithm and a neural network, and then combines them with biological features (i.e. miRNA-miRNA functional similarity and disease-disease semantic similarity) to build the prediction model. In the computational experiments, SGNNMD achieves highly competitive performance when compared with several baselines, including the signed graph link prediction methods, multi-relation prediction methods and one existing deregulation type prediction method. Moreover, SGNNMD has good inductive capability and can generalize to miRNAs/diseases unseen during the training.
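The subgraph-extraction idea can be illustrated with a small sketch of a signed bipartite network. The details below are assumptions, not SGNNMD's implementation: edges are stored in a dictionary, the sign convention (+1 for up-regulation, -1 for down-regulation) is chosen for illustration, the neighborhood is limited to one hop, and the node identifiers are hypothetical. The labeling algorithm and the neural network are not shown.

```python
def enclosing_subgraph(signed_edges, mirna, disease):
    """One-hop enclosing subgraph around a (mirna, disease) pair.

    signed_edges maps (mirna, disease) -> +1 or -1. The target link itself
    is left out so that its sign cannot leak into the structural features.
    """
    # Diseases linked to the target miRNA, plus the target disease.
    diseases = {d for (m, d) in signed_edges if m == mirna} | {disease}
    # miRNAs linked to the target disease, plus the target miRNA.
    mirnas = {m for (m, d) in signed_edges if d == disease} | {mirna}
    return {
        (m, d): sign
        for (m, d), sign in signed_edges.items()
        if m in mirnas and d in diseases and (m, d) != (mirna, disease)
    }


# Hypothetical toy network.
edges = {
    ("m1", "d1"): +1,
    ("m1", "d2"): -1,
    ("m2", "d1"): -1,
    ("m2", "d2"): +1,
    ("m3", "d3"): +1,
}
subgraph = enclosing_subgraph(edges, "m1", "d1")
```

Because the subgraph is built only from the neighbors of the queried pair, the same routine applies to a pair whose link is unknown, which is one way to read the inductive capability mentioned in the abstract.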
Affiliation(s)
- Guangzhan Zhang
- College of Informatics, Huazhong Agricultural University, Wuhan 430070, China
| | - Menglu Li
- College of Informatics, Huazhong Agricultural University, Wuhan 430070, China
| | - Huan Deng
- College of Informatics, Huazhong Agricultural University, Wuhan 430070, China
| | - Xinran Xu
- College of Informatics, Huazhong Agricultural University, Wuhan 430070, China
| | - Xuan Liu
- College of Informatics, Huazhong Agricultural University, Wuhan 430070, China
| | - Wen Zhang
- College of Informatics, Huazhong Agricultural University, Wuhan 430070, China
|
38
|
Ou-Yang L, Lu F, Zhang ZC, Wu M. Matrix factorization for biomedical link prediction and scRNA-seq data imputation: an empirical survey. Brief Bioinform 2021; 23:6447434. [PMID: 34864871 DOI: 10.1093/bib/bbab479] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.0] [Received: 08/09/2021] [Revised: 09/25/2021] [Accepted: 10/18/2021] [Indexed: 02/02/2023]
Abstract
Advances in high-throughput experimental technologies promote the accumulation of vast amounts of biomedical data. Biomedical link prediction and single-cell RNA-sequencing (scRNA-seq) data imputation are two essential tasks in biomedical data analyses, which can facilitate various downstream studies and provide insights into the mechanisms of complex diseases. Both tasks can be transformed into matrix completion problems. For a variety of matrix completion tasks, matrix factorization has shown promising performance. However, the sparseness and high dimensionality of biomedical networks and scRNA-seq data have raised new challenges. To resolve these issues, various matrix factorization methods have emerged recently. In this paper, we present a comprehensive review of such matrix factorization methods and their usage in biomedical link prediction and scRNA-seq data imputation. Moreover, we select representative matrix factorization methods and conduct a systematic empirical comparison on 15 real data sets to evaluate their performance under different scenarios. By summarizing the experimental results, we provide general guidelines for selecting matrix factorization methods for different biomedical matrix completion tasks and point out some future directions to further improve the performance for biomedical link prediction and scRNA-seq data imputation.
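To make the matrix-completion framing concrete, the sketch below fits a plain low-rank factorization to the observed entries of a small matrix with stochastic gradient descent and then scores a held-out entry. It is a generic textbook-style baseline under assumed hyperparameters, not any specific method from the survey; the function names, the toy matrix, and the learning rate, regularization and epoch count are all hypothetical choices.

```python
import random


def factorize(observed, n_rows, n_cols, rank=2, lr=0.05, reg=0.01, epochs=500, seed=0):
    """Fit U (n_rows x rank) and V (n_cols x rank) to the observed entries by SGD."""
    rng = random.Random(seed)
    U = [[rng.uniform(-0.1, 0.1) for _ in range(rank)] for _ in range(n_rows)]
    V = [[rng.uniform(-0.1, 0.1) for _ in range(rank)] for _ in range(n_cols)]
    for _ in range(epochs):
        for (i, j), value in observed.items():
            err = value - score(U, V, i, j)
            for k in range(rank):
                u, v = U[i][k], V[j][k]
                # Gradient step on squared error with L2 regularization.
                U[i][k] += lr * (err * v - reg * u)
                V[j][k] += lr * (err * u - reg * v)
    return U, V


def score(U, V, i, j):
    """Predicted value for entry (i, j): inner product of the latent vectors."""
    return sum(u * v for u, v in zip(U[i], V[j]))


def mse(U, V, observed):
    """Mean squared error over the observed entries."""
    return sum((value - score(U, V, i, j)) ** 2 for (i, j), value in observed.items()) / len(observed)


# Toy 3x3 link matrix with entry (0, 1) held out as unknown.
full = [[1, 1, 0], [1, 1, 0], [0, 0, 1]]
observed = {(i, j): full[i][j] for i in range(3) for j in range(3) if (i, j) != (0, 1)}
U, V = factorize(observed, n_rows=3, n_cols=3)
held_out_score = score(U, V, 0, 1)
```

The held-out score plays the role of a link prediction (or an imputed expression value); the sparsity and dimensionality issues raised in the abstract are exactly what such a bare baseline does not address.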
Affiliation(s)
- Le Ou-Yang
- Guangdong Key Laboratory of Intelligent Information Processing, Shenzhen Key Laboratory of Media Security, and Guangdong Laboratory of Artificial Intelligence and Digital Economy(SZ), College of Electronics and Information Engineering, Shenzhen University, Shenzhen, 518060, China.,Shenzhen Institute of Artificial Intelligence and Robotics for Society, Shenzhen,518172, China
| | - Fan Lu
- Guangdong Key Laboratory of Intelligent Information Processing, Shenzhen Key Laboratory of Media Security, and Guangdong Laboratory of Artificial Intelligence and Digital Economy(SZ), College of Electronics and Information Engineering, Shenzhen University, Shenzhen, 518060, China
| | - Zi-Chao Zhang
- Institute of Science and Technology for Brain-Inspired Intelligence, Fudan University, Shanghai, 200433, China
| | - Min Wu
- Institute for Infocomm Research (I2R), A*STAR, 138632, Singapore
|
39
|
Sorkhi AG, Abbasi Z, Mobarakeh MI, Pirgazi J. Drug-target interaction prediction using unifying of graph regularized nuclear norm with bilinear factorization. BMC Bioinformatics 2021; 22:555. [PMID: 34789169 PMCID: PMC8597250 DOI: 10.1186/s12859-021-04464-2] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 08/26/2020] [Accepted: 10/29/2021] [Indexed: 12/27/2022]
Abstract
BACKGROUND Wet-lab experiments for identification of interactions between drugs and target proteins are time-consuming, costly and labor-intensive. Computational prediction of drug-target interactions (DTIs), a significant step in drug discovery, has been considered by many researchers in recent years. It also reduces the search space of interactions by proposing potential interaction candidates. RESULTS In this paper, a new approach based on unifying matrix factorization and nuclear norm minimization is proposed to find a low-rank interaction. In this combined method, to solve the low-rank matrix approximation, the terms in the DTI problem are used in such a way that the nuclear norm regularized problem is optimized by a bilinear factorization based on Rank-Restricted Soft Singular Value Decomposition (RRSSVD). In the proposed method, adjacencies between drugs and targets are encoded by graphs. Drug-target interaction, drug-drug similarity, target-target similarity, and a combination of similarities have also been used as input. CONCLUSIONS The proposed method is evaluated on four benchmark datasets known as Enzymes (E), Ion channels (ICs), G protein-coupled receptors (GPCRs) and nuclear receptors (NRs) based on AUC, AUPR, and running time. The results show an improvement in the performance of the proposed method compared to the state-of-the-art techniques.
Affiliation(s)
- Ali Ghanbari Sorkhi
- Faculty of Electrical and Computer Engineering, University of Science and Technology of Mazandaran, P.O. Box, 48518-78195 Behshahr, Iran
- Zahra Abbasi
- School of Medicine, Faculty of Medical Biotechnology, Shahroud University of Medical Sciences, Shahroud, Iran
- Jamshid Pirgazi
- Faculty of Electrical and Computer Engineering, University of Science and Technology of Mazandaran, P.O. Box, 48518-78195 Behshahr, Iran
|
40
|
Identification of drug-target interactions via multi-view graph regularized link propagation model. Neurocomputing 2021. [DOI: 10.1016/j.neucom.2021.05.100] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.0] [Indexed: 02/06/2023]
|
41
|
Yi HC, You ZH, Huang DS, Kwoh CK. Graph representation learning in bioinformatics: trends, methods and applications. Brief Bioinform 2021; 23:6361044. [PMID: 34471921 DOI: 10.1093/bib/bbab340] [Citation(s) in RCA: 58] [Impact Index Per Article: 14.5] [Received: 06/24/2021] [Revised: 07/18/2021] [Accepted: 08/02/2021] [Indexed: 12/12/2022] Open
Abstract
Graph is a natural data structure for describing complex systems, which contains a set of objects and relationships. Ubiquitous real-life biomedical problems can be modeled as graph analytics tasks. Machine learning, especially deep learning, succeeds in vast bioinformatics scenarios with data represented in Euclidean domain. However, rich relational information between biological elements is retained in the non-Euclidean biomedical graphs, which is not learning friendly to classic machine learning methods. Graph representation learning aims to embed graph into a low-dimensional space while preserving graph topology and node properties. It bridges biomedical graphs and modern machine learning methods and has recently raised widespread interest in both machine learning and bioinformatics communities. In this work, we summarize the advances of graph representation learning and its representative applications in bioinformatics. To provide a comprehensive and structured analysis and perspective, we first categorize and analyze both graph embedding methods (homogeneous graph embedding, heterogeneous graph embedding, attribute graph embedding) and graph neural networks. Furthermore, we summarize their representative applications from molecular level to genomics, pharmaceutical and healthcare systems level. Moreover, we provide open resource platforms and libraries for implementing these graph representation learning methods and discuss the challenges and opportunities of graph representation learning in bioinformatics. This work provides a comprehensive survey of emerging graph representation learning algorithms and their applications in bioinformatics. It is anticipated that it could bring valuable insights for researchers to contribute their knowledge to graph representation learning and future-oriented bioinformatics studies.
Affiliation(s)
- Hai-Cheng Yi
- Chinese Academy of Sciences, Xinjiang Technical Institute of Physics and Chemistry, Urumqi 830011, China; University of Chinese Academy of Sciences, Beijing 100049, China
- Zhu-Hong You
- School of Computer Science, Northwestern Polytechnical University, Xi'an 710129, China
- De-Shuang Huang
- Institute of Machine Learning and Systems Biology, School of Electronics and Information Engineering, Tongji University, Shanghai 201804, China
- Chee Keong Kwoh
- School of Computer Science and Engineering, Nanyang Technological University, 50 Nanyang Avenue, Singapore
|
42
|
Tang X, Luo J, Shen C, Lai Z. Multi-view Multichannel Attention Graph Convolutional Network for miRNA-disease association prediction. Brief Bioinform 2021; 22:6271996. [PMID: 33963829 DOI: 10.1093/bib/bbab174] [Citation(s) in RCA: 93] [Impact Index Per Article: 23.3] [Received: 03/01/2021] [Revised: 04/08/2021] [Accepted: 04/09/2021] [Indexed: 12/11/2022] Open
Abstract
MOTIVATION In recent years, a growing number of studies have proved that microRNAs (miRNAs) play significant roles in the development of human complex diseases. Discovering the associations between miRNAs and diseases has become an important part of the discovery and treatment of disease. Since uncovering associations via traditional experimental methods is complicated and time-consuming, many computational methods have been proposed to identify the potential associations. However, there are still challenges in accurately determining potential associations between miRNA and disease by using multisource data. RESULTS In this study, we develop a Multi-view Multichannel Attention Graph Convolutional Network (MMGCN) to predict potential miRNA-disease associations. Different from simple multisource information integration, MMGCN employs GCN encoder to obtain the features of miRNA and disease in different similarity views, respectively. Moreover, our MMGCN can enhance the learned latent representations for association prediction by utilizing multichannel attention, which adaptively learns the importance of different features. Empirical results on two datasets demonstrate that MMGCN model can achieve superior performance compared with nine state-of-the-art methods on most of the metrics. Furthermore, we prove the effectiveness of multichannel attention mechanism and the validity of multisource data in miRNA and disease association prediction. Case studies also indicate the ability of the method for discovering new associations.
Affiliation(s)
- Xinru Tang
- College of Computer Science and Electronic Engineering, Hunan University, Changsha 410083, China
- Jiawei Luo
- College of Computer Science and Electronic Engineering, Hunan University, Changsha 410083, China
- Cong Shen
- College of Computer Science and Electronic Engineering, Hunan University, Changsha 410083, China
- Zihan Lai
- College of Computer Science and Electronic Engineering, Hunan University, Changsha 410083, China
|
43
|
Ma Y, Liu L, Chen Q, Ma Y. An Inductive Logistic Matrix Factorization Model for Predicting Drug-Metabolite Association With Vicus Regularization. Front Microbiol 2021; 12:650366. [PMID: 33868209 PMCID: PMC8047063 DOI: 10.3389/fmicb.2021.650366] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/07/2021] [Accepted: 03/08/2021] [Indexed: 11/28/2022] Open
Abstract
Metabolites are closely related to human disease. The interaction between metabolites and drugs has drawn increasing attention in the field of pharmacomicrobiomics. However, only a small portion of the drug-metabolite interactions were experimentally observed due to the fact that experimental validation is labor-intensive, costly, and time-consuming. Although a few computational approaches have been proposed to predict latent associations for various bipartite networks, such as miRNA-disease, drug-target interaction networks, and so on, to our best knowledge the associations between drugs and metabolites have not been reported on a large scale. In this study, we propose a novel algorithm, namely inductive logistic matrix factorization (ILMF) to predict the latent associations between drugs and metabolites. Specifically, the proposed ILMF integrates drug-drug interaction, metabolite-metabolite interaction, and drug-metabolite interaction into this framework, to model the probability that a drug would interact with a metabolite. Moreover, we exploit inductive matrix completion to guide the learning of projection matrices U and V that depend on the low-dimensional feature representation matrices of drugs and metabolites: Fm and Fd . These two matrices can be obtained by fusing multiple data sources. Thus, Fd U and Fm V can be viewed as drug-specific and metabolite-specific latent representations, different from classical LMF. Furthermore, we utilize the Vicus spectral matrix that reveals the refined local geometrical structure inherent in the original data to encode the relationships between drugs and metabolites. Extensive experiments are conducted on a manually curated "DrugMetaboliteAtlas" dataset. The experimental results show that ILMF can achieve competitive performance compared with other state-of-the-art approaches, which demonstrates its effectiveness in predicting potential drug-metabolite associations.
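The scoring idea in this abstract can be illustrated with a minimal sketch in plain Python. It only shows how projected representations Fd U and Fm V could be turned into an interaction probability through a logistic function, which is how logistic matrix factorization models are usually scored. The Vicus regularization and the training procedure are omitted, and the helper names (`matmul`, `interaction_probabilities`) and all toy values are assumptions for illustration, not the authors' implementation.

```python
import math


def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    cols = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in a]


def sigmoid(z):
    """Logistic function mapping a real score to a probability."""
    return 1.0 / (1.0 + math.exp(-z))


def interaction_probabilities(Fd, U, Fm, V):
    """Return P[i][j] = sigmoid(<(Fd U)_i, (Fm V)_j>) for drug i and metabolite j."""
    drug_latent = matmul(Fd, U)        # one latent row per drug
    metabolite_latent = matmul(Fm, V)  # one latent row per metabolite
    return [
        [sigmoid(sum(a * b for a, b in zip(d, m))) for m in metabolite_latent]
        for d in drug_latent
    ]


# Toy inputs (hypothetical): 2 drugs and 3 metabolites with 2 features each,
# projected to a 1-dimensional latent space.
Fd = [[1.0, 0.0], [0.0, 1.0]]
Fm = [[1.0, 1.0], [0.0, 2.0], [1.0, 0.0]]
U = [[0.5], [-0.5]]
V = [[1.0], [0.0]]

P = interaction_probabilities(Fd, U, Fm, V)
for row in P:
    print([round(p, 3) for p in row])
```

Because the projections U and V act on feature matrices rather than on per-entity free parameters, a new drug with known features can be scored without retraining, which is the usual motivation for the inductive formulation.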
Affiliation(s)
- Yuanyuan Ma
- School of Computer and Information Engineering, Anyang Normal University, Anyang, China
- Lifang Liu
- School of Education, Anyang Normal University, Anyang, China
- Qianjun Chen
- School of Computer, Central China Normal University, Wuhan, China
- Yingjun Ma
- School of Applied Mathematics, Xiamen University of Technology, Xiamen, China
|
44
|
Long Y, Wu M, Liu Y, Zheng J, Kwoh CK, Luo J, Li X. Graph contextualized attention network for predicting synthetic lethality in human cancers. Bioinformatics 2021; 37:2432-2440. [PMID: 33609108 DOI: 10.1093/bioinformatics/btab110] [Citation(s) in RCA: 26] [Impact Index Per Article: 6.5] [Received: 09/09/2020] [Revised: 02/09/2021] [Accepted: 02/16/2021] [Indexed: 11/13/2022] Open
Abstract
MOTIVATION Synthetic Lethality (SL) plays an increasingly critical role in the targeted anticancer therapeutics. In addition, identifying SL interactions can create opportunities to selectively kill cancer cells without harming normal cells. Given the high cost of wet-lab experiments, in silico prediction of SL interactions as an alternative can be a rapid and cost-effective way to guide the experimental screening of candidate SL pairs. Several matrix factorization-based methods have recently been proposed for human SL prediction. However, they are limited in capturing the dependencies of neighbors. In addition, it is also highly challenging to make accurate predictions for new genes without any known SL partners. RESULTS In this work, we propose a novel graph contextualized attention network named GCATSL to learn gene representations for SL prediction. First, we leverage different data sources to construct multiple feature graphs for genes, which serve as the feature inputs for our GCATSL method. Second, for each feature graph, we design node-level attention mechanism to effectively capture the importance of local and global neighbors and learn local and global representations for the nodes, respectively. We further exploit multi-layer perceptron (MLP) to aggregate the original features with the local and global representations and then derive the feature-specific representations. Third, to derive the final representations, we design feature-level attention to integrate feature-specific representations by taking the importance of different feature graphs into account. Extensive experimental results on three datasets under different settings demonstrated that our GCATSL model outperforms 14 state-of-the-art methods consistently. In addition, case studies further validated the effectiveness of our proposed model in identifying novel SL pairs. 
AVAILABILITY Python codes and dataset are freely available on GitHub (https://github.com/longyahui/GCATSL) and Zenodo (https://zenodo.org/record/4522679) under the MIT license.
Affiliation(s)
- Yahui Long
- College of Computer Science and Electronic Engineering, Hunan University, Changsha, 410000, China; School of Computer Science and Engineering, Nanyang Technological University, Singapore, 639798, Singapore
- Min Wu
- Institute for Infocomm Research, Agency for Science, Technology and Research (A*STAR), 138632, Singapore
- Yong Liu
- Joint NTU-UBC Research Centre of Excellence in Active Living for the Elderly, Nanyang Technological University, 639798, Singapore
- Jie Zheng
- School of Information Science and Technology, ShanghaiTech University, Shanghai, 201210, China
- Chee Keong Kwoh
- School of Computer Science and Engineering, Nanyang Technological University, Singapore, 639798, Singapore
- Jiawei Luo
- College of Computer Science and Electronic Engineering, Hunan University, Changsha, 410000, China
- Xiaoli Li
- Institute for Infocomm Research, Agency for Science, Technology and Research (A*STAR), 138632, Singapore
|
45
|
Long Y, Luo J. Association Mining to Identify Microbe Drug Interactions Based on Heterogeneous Network Embedding Representation. IEEE J Biomed Health Inform 2021; 25:266-275. [PMID: 32750918 DOI: 10.1109/jbhi.2020.2998906] [Citation(s) in RCA: 20] [Impact Index Per Article: 5.0] [Indexed: 11/07/2022]
Abstract
Accurately identifying microbe-drug associations plays a critical role in drug development and precision medicine. Considering that the conventional wet-lab method is time-consuming, labor-intensive and expensive, computational approach is an alternative choice. The increasing availability of numerous biological data provides a great opportunity to systematically understand complex interaction mechanisms between microbes and drugs. However, few computational methods have been developed for microbe drug prediction. In this work, we leverage multiple sources of biomedical data to construct a heterogeneous network for microbes and drugs, including drug-drug interactions, microbe-microbe interactions and microbe-drug associations. And then we propose a novel Heterogeneous Network Embedding Representation framework for Microbe-Drug Association prediction, named (HNERMDA), by combining metapath2vec with bipartite network recommendation. In this framework, we introduce metapath2vec, a heterogeneous network representation learning method, to learn low-dimensional embedding representations for microbes and drugs. Following that, we further design a bias bipartite network projection recommendation algorithm to improve prediction accuracy. Comprehensive experiments on two datasets, named MDAD and aBiofilm, demonstrated that our model consistently outperformed five baseline methods in three types of cross-validations. Case study on two popular drugs (i.e., Ciprofloxacin and Pefloxacin) further validated the effectiveness of our HNERMDA model in inferring potential target microbes for drugs.
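The metapath2vec step mentioned in this abstract relies on random walks whose node types follow a fixed pattern. The sketch below, in plain Python, shows only that walk-generation idea on a hypothetical four-node microbe-drug network. The graph, the node names, and the function `metapath_walk` are assumptions for illustration; the skip-gram training that metapath2vec applies to the walks, and the paper's bipartite projection recommendation step, are not shown.

```python
import random


def metapath_walk(neighbors, node_type, start, metapath, length, rng):
    """Random walk whose node types follow `metapath` cyclically.

    neighbors: dict mapping a node to the list of its adjacent nodes
    node_type: dict mapping a node to its type label
    metapath:  one period of the type pattern, e.g. ["drug", "microbe"]
               yields drug-microbe-drug-microbe-...
    """
    if node_type[start] != metapath[0]:
        raise ValueError("start node must match the first metapath type")
    walk = [start]
    while len(walk) < length:
        wanted = metapath[len(walk) % len(metapath)]
        candidates = [n for n in neighbors[walk[-1]] if node_type[n] == wanted]
        if not candidates:
            break  # dead end: no neighbor of the required type
        walk.append(rng.choice(candidates))
    return walk


# Hypothetical toy network with drug-drug, microbe-microbe and
# microbe-drug edges.
neighbors = {
    "d1": ["m1", "d2"],
    "d2": ["m1", "m2", "d1"],
    "m1": ["d1", "d2", "m2"],
    "m2": ["d2", "m1"],
}
node_type = {"d1": "drug", "d2": "drug", "m1": "microbe", "m2": "microbe"}

walk = metapath_walk(neighbors, node_type, "d1", ["drug", "microbe"], 6, random.Random(0))
print(walk)
```

Restricting each step to neighbors of the required type is what distinguishes a metapath-guided walk from an ordinary random walk: same-type edges such as d1-d2 are skipped under this pattern, so the resulting sequences carry cross-type context.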
|
46
|
Shen C, Luo J, Ouyang W, Ding P, Chen X. IDDkin: Network-based influence deep diffusion model for enhancing prediction of kinase inhibitors. Bioinformatics 2020; 36:5481-5491. [PMID: 33367525 DOI: 10.1093/bioinformatics/btaa1058] [Citation(s) in RCA: 12] [Impact Index Per Article: 2.4] [Received: 07/09/2020] [Revised: 11/09/2020] [Accepted: 12/10/2020] [Indexed: 01/01/2023] Open
Abstract
MOTIVATION Protein kinases have been the focus of drug discovery research for many years because they play a causal role in many human diseases. Understanding the binding profile of kinase inhibitors is a prerequisite for drug discovery, and traditional methods of predicting kinase inhibitors are time-consuming and inefficient. Calculation-based predictive methods provide a relatively low-cost and high-efficiency approach to the rapid development and effective understanding of the binding profile of kinase inhibitors. In particular, the continuous improvement of network pharmacology methods provides unprecedented opportunities for drug discovery. Network-based computational methods can be employed to aggregate the effective information from heterogeneous sources and have become a new way of predicting the binding profile of kinase inhibitors. RESULTS In this study, we proposed a network-based influence deep diffusion model, named IDDkin, for enhancing the prediction of kinase inhibitors. IDDkin uses deep graph convolutional networks, graph attention networks and adaptive weighting methods to diffuse the effective information of heterogeneous networks. The updated kinase and compound representations are used to predict potential compound-kinase pairs. The experimental results show that the performance of IDDkin is superior to the comparison methods, including the state-of-the-art kinase inhibitor prediction method and the classic model widely used in relationship prediction. In experiments conducted to verify its generalizability and in case studies, the IDDkin model also shows excellent performance. All of these results demonstrate the powerful predictive ability of the IDDkin model in the field of kinase inhibitors. AVAILABILITY AND IMPLEMENTATION Source code and data can be downloaded from https://github.com/CS-BIO/IDDkin. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Cong Shen
- College of Computer Science and Electronic Engineering, Hunan University, Changsha, 410083, China
- Jiawei Luo
- College of Computer Science and Electronic Engineering, Hunan University, Changsha, 410083, China
- Wenjue Ouyang
- College of Computer Science and Electronic Engineering, Hunan University, Changsha, 410083, China
- Pingjian Ding
- School of Computer Science, University of South China, Hengyang, 421001, China
- Xiangtao Chen
- College of Computer Science and Electronic Engineering, Hunan University, Changsha, 410083, China
|
47
|
Lei X, Mudiyanselage TB, Zhang Y, Bian C, Lan W, Yu N, Pan Y. A comprehensive survey on computational methods of non-coding RNA and disease association prediction. Brief Bioinform 2020; 22:6042241. [PMID: 33341893 DOI: 10.1093/bib/bbaa350] [Citation(s) in RCA: 35] [Impact Index Per Article: 7.0] [Received: 08/28/2020] [Revised: 10/20/2020] [Accepted: 11/01/2020] [Indexed: 02/06/2023] Open
Abstract
Studies on the relationships between non-coding RNAs and diseases have been widely carried out in recent years. A large number of experimental methods and technologies for producing biological data have also been developed. However, because of their high labor cost and long production time, calculation-based methods, especially machine learning and deep learning methods, have received a lot of attention and are now commonly used to solve these problems. From a computational point of view, this survey mainly introduces three common non-coding RNAs, i.e. miRNAs, lncRNAs and circRNAs, and the related computational methods for predicting their association with diseases. First, the mainstream databases of the above three non-coding RNAs are introduced in detail. Then, we present several methods for RNA similarity and disease similarity calculations. Later, we investigate ncRNA-disease prediction methods in detail and classify these methods into five types: network propagation, recommender systems, matrix completion, machine learning and deep learning. Furthermore, we provide a summary of the applications of these five types of computational methods in predicting the associations between diseases and miRNAs, lncRNAs and circRNAs, respectively. Finally, the advantages and limitations of various methods are identified, and future research directions and challenges are also discussed.
Affiliation(s)
- Xiujuan Lei
- School of Computer Science, Shaanxi Normal University, Xi'an, China
- Yuchen Zhang
- School of Computer Science, Shaanxi Normal University, Xi'an, China
- Chen Bian
- School of Computer Science, Shaanxi Normal University, Xi'an, China
- Wei Lan
- School of Computer, Electronics and Information at Guangxi University, Nanning, China
- Ning Yu
- Department of Computing Sciences at the College at Brockport, State University of New York, Rochester, NY, USA
- Yi Pan
- Computer Science Department at Georgia State University, Atlanta, GA, USA
|
48
|
Ata SK, Wu M, Fang Y, Ou-Yang L, Kwoh CK, Li XL. Recent advances in network-based methods for disease gene prediction. Brief Bioinform 2020; 22:6023077. [PMID: 33276376 DOI: 10.1093/bib/bbaa303] [Citation(s) in RCA: 41] [Impact Index Per Article: 8.2] [Received: 07/20/2020] [Revised: 09/29/2020] [Accepted: 10/10/2020] [Indexed: 01/28/2023] Open
Abstract
Disease-gene association through genome-wide association study (GWAS) is an arduous task for researchers. Investigating single nucleotide polymorphisms that correlate with specific diseases needs statistical analysis of associations. Considering the huge number of possible mutations, in addition to its high cost, another important drawback of GWAS analysis is the large number of false positives. Thus, researchers search for more evidence to cross-check their results through different sources. To provide the researchers with alternative and complementary low-cost disease-gene association evidence, computational approaches come into play. Since molecular networks are able to capture complex interplay among molecules in diseases, they become one of the most extensively used data for disease-gene association prediction. In this survey, we aim to provide a comprehensive and up-to-date review of network-based methods for disease gene prediction. We also conduct an empirical analysis on 14 state-of-the-art methods. To summarize, we first elucidate the task definition for disease gene prediction. Secondly, we categorize existing network-based efforts into network diffusion methods, traditional machine learning methods with handcrafted graph features and graph representation learning methods. Thirdly, an empirical analysis is conducted to evaluate the performance of the selected methods across seven diseases. We also provide distinguishing findings about the discussed methods based on our empirical analysis. Finally, we highlight potential research directions for future studies on disease gene prediction.
Affiliation(s)
- Sezin Kircali Ata
- School of Computer Science and Engineering Nanyang Technological University (NTU)
- Min Wu
- Institute for Infocomm Research (I2R), A*STAR, Singapore
- Yuan Fang
- School of Information Systems, Singapore Management University, Singapore
- Le Ou-Yang
- College of Electronics and Information Engineering, Shenzhen University, Shenzhen, China
- Xiao-Li Li
- Department head and principal scientist at I2R, A*STAR, Singapore
|
49
|
Wang L, Chen Y, Zhang N, Chen W, Zhang Y, Gao R. QIMCMDA: MiRNA-Disease Association Prediction by q-Kernel Information and Matrix Completion. Front Genet 2020; 11:594796. [PMID: 33193744 PMCID: PMC7643770 DOI: 10.3389/fgene.2020.594796] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Received: 08/14/2020] [Accepted: 09/21/2020] [Indexed: 12/27/2022] Open
Abstract
Studies have shown that microRNAs (miRNAs) are closely associated with many human diseases, but the role and potential molecular mechanisms of miRNAs in the process of disease development are not yet fully understood. Because ordinary biological experiments are often costly, computational methods can be used to predict potential miRNA-disease associations quickly and effectively at a lower cost, and can serve as a useful reference for experimental methods. For miRNA-disease association prediction, we have proposed a new method called Matrix completion algorithm based on q-kernel information (QIMCMDA). We use fivefold cross-validation and leave-one-out cross-validation (LOOCV) to demonstrate the effectiveness of QIMCMDA. LOOCV shows that the AUC can reach 0.9235, and its performance is significantly better than that of other commonly used methods. In addition, we applied QIMCMDA to case studies of three human diseases, and the results show that our method performs well in inferring potential interactions between miRNAs and diseases. It is expected that QIMCMDA will become an excellent supplement in the field of biomedical research in the future.
Affiliation(s)
- Lin Wang
- School of Mathematics and Statistics, Shandong University, Jinan, China
- Yaguang Chen
- School of Mathematics and Statistics, Shandong University, Jinan, China
- Naiqian Zhang
- School of Mathematics and Statistics, Shandong University, Jinan, China
- Wei Chen
- School of Mathematics and Statistics, Shandong University, Jinan, China
- Yusen Zhang
- School of Mathematics and Statistics, Shandong University, Jinan, China
- Rui Gao
- School of Control Science and Engineering, Shandong University, Jinan, China
|
50
|
Yu Z, Huang F, Zhao X, Xiao W, Zhang W. Predicting drug-disease associations through layer attention graph convolutional network. Brief Bioinform 2020; 22:5918381. [PMID: 33078832 DOI: 10.1093/bib/bbaa243] [Citation(s) in RCA: 183] [Impact Index Per Article: 36.6] [Received: 06/29/2020] [Revised: 08/16/2020] [Accepted: 08/31/2020] [Indexed: 12/23/2022] Open
Abstract
BACKGROUND Determining drug-disease associations is an integral part in the process of drug development. However, the identification of drug-disease associations through wet experiments is costly and inefficient. Hence, the development of efficient and high-accuracy computational methods for predicting drug-disease associations is of great significance. RESULTS In this paper, we propose a novel computational method named as layer attention graph convolutional network (LAGCN) for the drug-disease association prediction. Specifically, LAGCN first integrates the known drug-disease associations, drug-drug similarities and disease-disease similarities into a heterogeneous network, and applies the graph convolution operation to the network to learn the embeddings of drugs and diseases. Second, LAGCN combines the embeddings from multiple graph convolution layers using an attention mechanism. Third, the unobserved drug-disease associations are scored based on the integrated embeddings. Evaluated by 5-fold cross-validations, LAGCN achieves an area under the precision-recall curve of 0.3168 and an area under the receiver-operating characteristic curve of 0.8750, which are better than the results of existing state-of-the-art prediction methods and baseline methods. The case study shows that LAGCN can discover novel associations that are not curated in our dataset. CONCLUSION LAGCN is a useful tool for predicting drug-disease associations. This study reveals that embeddings from different convolution layers can reflect the proximities of different orders, and combining the embeddings by the attention mechanism can improve the prediction performances.
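The three steps in this abstract (propagate over a heterogeneous network, combine layer embeddings with attention, score unobserved pairs) can be sketched in plain Python as follows. This is a simplified illustration under stated assumptions, not the LAGCN implementation: propagation uses a row-normalized adjacency with no trainable weights or nonlinearity, the attention logits are fixed constants rather than learned parameters, and the toy network, features and function names are hypothetical.

```python
import math


def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    cols = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in a]


def row_normalize(adj):
    """Divide each row by its sum (assumes every row sum is nonzero)."""
    return [[v / sum(row) for v in row] for row in adj]


def softmax(xs):
    """Numerically stable softmax over a list of numbers."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def propagate(adj_norm, h0, num_layers):
    """Return [H1, ..., HL] where each layer is adj_norm times the previous one."""
    layers, h = [], h0
    for _ in range(num_layers):
        h = matmul(adj_norm, h)
        layers.append(h)
    return layers


def combine_layers(layers, attention_logits):
    """Attention-weighted sum of layer embeddings; returns (combined, weights)."""
    weights = softmax(attention_logits)
    rows, cols = len(layers[0]), len(layers[0][0])
    combined = [
        [sum(w * layer[i][j] for w, layer in zip(weights, layers)) for j in range(cols)]
        for i in range(rows)
    ]
    return combined, weights


# Toy heterogeneous network (hypothetical): nodes 0-1 are drugs, nodes 2-3
# are diseases; the diagonal entries are self-loops.
adjacency = [
    [1, 0, 1, 0],
    [0, 1, 1, 1],
    [1, 1, 1, 0],
    [0, 1, 0, 1],
]
features = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.0, 0.0]]

layers = propagate(row_normalize(adjacency), features, num_layers=2)
combined, weights = combine_layers(layers, attention_logits=[0.0, 0.0])

# Score every drug-disease pair by the inner product of the combined embeddings.
scores = [
    [sum(a * b for a, b in zip(combined[d], combined[s])) for s in (2, 3)]
    for d in (0, 1)
]
print(weights)
print(scores)
```

With equal logits the attention reduces to a plain average of the layers; in a trained model the logits would differ, letting the scorer lean on lower-order or higher-order proximity as the data requires.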
Affiliation(s)
- Zhouxin Yu
- College of Informatics, Huazhong Agricultural University
- Feng Huang
- College of Informatics, Huazhong Agricultural University
- Xiaohan Zhao
- College of Informatics, Huazhong Agricultural University
- Wen Zhang
- College of Informatics, Huazhong Agricultural University
|