1. Bi XA, Chen K, Jiang S, Luo S, Zhou W, Xing Z, Xu L, Liu Z, Liu T. Community Graph Convolution Neural Network for Alzheimer's Disease Classification and Pathogenetic Factors Identification. IEEE Transactions on Neural Networks and Learning Systems 2025; 36:1959-1973. [PMID: 37204952 DOI: 10.1109/tnnls.2023.3269446] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/21/2023]
Abstract
As a complex neural network system, the brain relies on the collaboration of brain regions and genes to store and transmit information effectively. We abstract these collaboration correlations as the brain region gene community network (BG-CN) and present a new deep learning approach, the community graph convolutional neural network (Com-GCN), for investigating the transmission of information within and between communities. The results can be used for diagnosing Alzheimer's disease (AD) and extracting its causal factors. First, an affinity aggregation model for BG-CN is developed to describe intercommunity and intracommunity information transmission. Second, we design the Com-GCN architecture with intercommunity convolution and intracommunity convolution operations based on the affinity aggregation model. Extensive experimental validation on the Alzheimer's Disease Neuroimaging Initiative (ADNI) dataset shows that the design of Com-GCN better matches the physiological mechanism and improves interpretability and classification performance. Furthermore, Com-GCN can identify lesioned brain regions and disease-causing genes, which may assist precision medicine and drug design in AD and serve as a valuable reference for other neurological disorders.
2. Lu P, Wang Y. RDGAN: Prediction of circRNA-Disease Associations via Resistance Distance and Graph Attention Network. IEEE/ACM Transactions on Computational Biology and Bioinformatics 2024; 21:1445-1457. [PMID: 38787672 DOI: 10.1109/tcbb.2024.3402248] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/26/2024]
Abstract
As a series of single-stranded RNAs, circRNAs have been implicated in numerous diseases and can serve as valuable biomarkers for disease therapy and prevention. However, traditional biological experiments demand significant time and effort. Various computational methods have therefore been proposed to address this limitation, but extracting features more comprehensively remains a challenge that needs further attention. In this study, we propose a unique approach to predict circRNA-disease associations based on resistance distance and graph attention network (RDGAN). First, the associations of circRNAs and diseases are obtained by fusing multiple databases, and resistance distance is used as a similarity matrix to deal with the sparsity of the similarity matrices. Then the circRNA-disease heterogeneous network is constructed from the circRNA-circRNA similarity, the disease-disease similarity, and the known circRNA-disease adjacency matrix. Second, leveraging three neural network modules (ResGatedGraphConv, GAT, and MFConv), we gather node feature embeddings from the heterogeneous network. Subsequently, all the features are supplied to the self-attention mechanism to predict new potential connections. Finally, our model obtains an AUC value of 0.9630 under five-fold cross-validation, surpassing the predictive performance of eight other state-of-the-art models.
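For readers unfamiliar with resistance distance, the following is a minimal sketch of the standard graph-theoretic definition, computed from the Moore-Penrose pseudoinverse of the graph Laplacian. It is not the RDGAN implementation: the function name `resistance_distance` and the toy path graph are assumptions made for illustration, and the abstract does not say how RDGAN turns these distances into similarity scores.

```python
import numpy as np


def resistance_distance(adjacency):
    """Pairwise resistance distances for a connected, undirected, weighted graph.

    Hypothetical helper for illustration; not taken from the RDGAN code.
    """
    a = np.asarray(adjacency, dtype=float)
    # Graph Laplacian: degree matrix minus adjacency matrix.
    laplacian = np.diag(a.sum(axis=1)) - a
    # Moore-Penrose pseudoinverse of the Laplacian.
    l_plus = np.linalg.pinv(laplacian)
    diag = np.diag(l_plus)
    # R[i, j] = L+[i, i] + L+[j, j] - 2 * L+[i, j]
    return diag[:, None] + diag[None, :] - 2.0 * l_plus


# Toy example (assumed data): a path graph 0 - 1 - 2 with unit edge weights.
path = [[0, 1, 0],
        [1, 0, 1],
        [0, 1, 0]]
R = resistance_distance(path)
```

On a tree such as this path graph, the resistance distance equals the ordinary path length. When several parallel routes connect two nodes, it becomes smaller than the shortest-path length, which is one reason it can remain informative on sparse networks.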
3. Hu X, Jiang Y, Deng L. Exploring ncRNA-Drug Sensitivity Associations via Graph Contrastive Learning. IEEE/ACM Transactions on Computational Biology and Bioinformatics 2024; 21:1380-1389. [PMID: 38578855 DOI: 10.1109/tcbb.2024.3385423] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 04/07/2024]
Abstract
Increasing evidence has shown that noncoding RNAs (ncRNAs) can affect drug efficiency by modulating drug sensitivity genes. Exploring the association between ncRNAs and drug sensitivity is essential for drug discovery and disease prevention. However, traditional biological experiments for identifying ncRNA-drug sensitivity associations are time-consuming and laborious. In this study, we develop a novel graph contrastive learning approach named NDSGCL to predict ncRNA-drug sensitivity. NDSGCL uses graph convolutional networks to learn feature representations of ncRNAs and drugs in ncRNA-drug bipartite graphs. It integrates local structural neighbours and global semantic neighbours to learn a more comprehensive representation by contrastive learning. Specifically, the local structural neighbours aim to capture the higher-order relationship in the ncRNA-drug graph, while the global semantic neighbours are defined based on semantic clusters of the graph that can alleviate the impact of data sparsity. The experimental results show that NDSGCL outperforms basic graph convolutional network methods, existing contrastive learning methods, and state-of-the-art prediction methods. Visualization experiments show that the contrastive objectives of local structural neighbours and global semantic neighbours play a significant role in contrastive learning. Case studies on two drugs show that NDSGCL is an effective tool for predicting ncRNA-drug sensitivity associations.
4. Wang XF, Yu CQ, You ZH, Wang Y, Huang L, Qiao Y, Wang L, Li ZW. BEROLECMI: a novel prediction method to infer circRNA-miRNA interaction from the role definition of molecular attributes and biological networks. BMC Bioinformatics 2024; 25:264. [PMID: 39127625 DOI: 10.1186/s12859-024-05891-7] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/10/2023] [Accepted: 08/01/2024] [Indexed: 08/12/2024]
Abstract
Circular RNA (circRNA)-microRNA (miRNA) interaction (CMI) is an important model for the regulation of biological processes by non-coding RNA (ncRNA) and provides a new perspective for the study of complex human diseases. However, existing CMI prediction models mainly rely on the nearest-neighbor structure in the biological network and ignore the molecular network topology, which makes it difficult to improve prediction performance. In this paper, we propose a new CMI prediction method, BEROLECMI, which uses molecular sequence attributes, molecular self-similarity, and biological network topology to define a role-specific feature representation for molecules and infer new CMIs. BEROLECMI effectively makes up for the lack of network topology in CMI prediction models and achieves the highest prediction performance on three commonly used data sets. In the case study, 14 of the 15 pairs of unknown CMIs were correctly predicted.
Affiliation(s)
- Xin-Fei Wang
  - School of Information Engineering, Xijing University, Xi'an, China
- Chang-Qing Yu
  - School of Information Engineering, Xijing University, Xi'an, China
- Zhu-Hong You
  - School of Computer Science, Northwestern Polytechnical University, Xi'an, China
- Yan Wang
  - Key Laboratory of Symbol Computation and Knowledge Engineering of Ministry of Education, College of Computer Science and Technology, Jilin University, Changchun, China
  - School of Artificial Intelligence, Jilin University, Changchun, China
- Lan Huang
  - Key Laboratory of Symbol Computation and Knowledge Engineering of Ministry of Education, College of Computer Science and Technology, Jilin University, Changchun, China
- Yan Qiao
  - College of Agriculture and Forestry, Longdong University, Qingyang, China
- Lei Wang
  - School of Computer Science and Technology, China University of Mining and Technology, Xuzhou, China
  - Guangxi Academy of Sciences, Nanning, China
- Zheng-Wei Li
  - School of Computer Science and Technology, China University of Mining and Technology, Xuzhou, China
5. Zhang Y, Wang Z, Wei H, Chen M. Exploring potential circRNA biomarkers for cancers based on double-line heterogeneous graph representation learning. BMC Med Inform Decis Mak 2024; 24:159. [PMID: 38844961 PMCID: PMC11157868 DOI: 10.1186/s12911-024-02564-6] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/10/2024] [Accepted: 06/04/2024] [Indexed: 06/09/2024]
Abstract
BACKGROUND Compared with time-consuming and labor-intensive biological validation in vitro or in vivo, computational models can provide high-quality, targeted candidates quickly. Existing computational models face limitations in effectively utilizing sparse local structural information for accurate prediction of circRNA-disease associations. This study addresses this challenge with a proposed method, CDA-DGRL (Prediction of CircRNA-Disease Association based on Double-line Graph Representation Learning), which employs a deep learning framework leveraging graph networks and a dual-line representation model integrating graph node features.
METHOD CDA-DGRL comprises several key steps. First, diverse biological information is integrated to compute integrated similarities among circRNAs and diseases, leading to the construction of a heterogeneous network specific to circRNA-disease associations. Second, circRNA and disease node features are derived using sparse autoencoders. Third, a graph convolutional neural network captures the local graph network structure, taking the circRNA-disease heterogeneous network and the node features as input. Fourth, node2vec performs depth-first sampling of the circRNA-disease heterogeneous network to capture the global graph network structure, addressing issues associated with sparse raw data. Finally, the fused local and global graph network structures are fed into an extra trees classifier to identify potential circRNA-disease associations.
RESULTS Five-fold cross-validation on the circR2Disease dataset yields an AUC value of 0.9866 and an AUPR value of 0.9897 for CDA-DGRL, higher than those of existing state-of-the-art models. Notably, the extra trees classifier employed in this model outperforms other machine learning classifiers.
CONCLUSION Thus, CDA-DGRL is a promising methodology for reliably identifying circRNA-disease associations, offering a way to reduce the need for extensive traditional biological experiments. The source code and data for this study are available at https://github.com/zywait/CDA-DGRL.
Affiliation(s)
- Yi Zhang
  - School of Computer Science and Engineering, Guilin University of Technology, Guilin, 541004, China
  - Guangxi Key Laboratory of Embedded Technology and Intelligent System, Guilin University of Technology, Guilin, 541004, China
- ZhenMei Wang
  - School of Big Data, Guangxi Vocational and Technical College, Nanning, 530003, China
- Hanyan Wei
  - Pharmacy School, Guilin Medical University, Guilin, 541004, China
- Min Chen
  - School of Computer Science and Technology, Hunan Institute of Technology, Hengyang, 421010, China
6. Wang Y, Lu P. GEHGAN: CircRNA-disease association prediction via graph embedding and heterogeneous graph attention network. Comput Biol Chem 2024; 110:108079. [PMID: 38704917 DOI: 10.1016/j.compbiolchem.2024.108079] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/23/2024] [Revised: 04/03/2024] [Accepted: 04/16/2024] [Indexed: 05/07/2024]
Abstract
Growing evidence suggests that circRNAs play a crucial role in diverse biological processes related to human diseases. In recent years, a large number of wet-lab experiments have been carried out in biochemistry to uncover circRNA-disease associations. Because wet-lab experiments are expensive and laborious, computational solutions have increasingly attracted the attention of researchers. However, the performance of these methods is restricted by their inability to balance the distribution among various types of nodes. To remedy this problem, we present a novel computational method called GEHGAN, which leverages graph embedding and heterogeneous graph attention networks to predict new associations. First, we calculate circRNA sequence similarity, circRNA RBP similarity, disease semantic similarity, and the corresponding GIP kernel similarity to construct a heterogeneous graph. Second, a graph embedding method using random walks with jump and stay strategies is applied to obtain the preliminary embeddings of circRNAs and diseases, greatly improving the performance of the model. Third, a multi-head graph attention network is employed to further update the embeddings, followed by an MLP as the predictor. Five-fold cross-validation indicates that GEHGAN achieves an AUC score of 0.9829 and an AUPR value of 0.9815 on the CircR2Diseasev2.0 database, and case studies on osteosarcoma, gastric, and colorectal neoplasms further confirm the model's efficacy at identifying circRNA-disease associations.
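The GIP kernel similarity mentioned above is commonly defined as a Gaussian kernel over the rows (interaction profiles) of a known association matrix, with the bandwidth normalized by the mean squared profile norm. The sketch below illustrates that general definition only; the function name `gip_kernel`, the default `gamma_prime=1.0`, and the toy matrix are assumptions, not details taken from GEHGAN.

```python
import math


def gip_kernel(assoc, gamma_prime=1.0):
    """Gaussian interaction profile (GIP) kernel similarity between rows.

    `assoc` is a binary association matrix given as a list of rows; each row
    is the interaction profile of one entity. Hypothetical helper for
    illustration only.
    """
    n = len(assoc)
    # Bandwidth: gamma' divided by the mean squared norm of the profiles.
    mean_sq_norm = sum(sum(x * x for x in row) for row in assoc) / n
    if mean_sq_norm == 0:
        raise ValueError("association matrix has no known associations")
    gamma = gamma_prime / mean_sq_norm
    kernel = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            sq_dist = sum((a - b) ** 2 for a, b in zip(assoc[i], assoc[j]))
            kernel[i][j] = math.exp(-gamma * sq_dist)
    return kernel


# Toy circRNA-by-disease matrix (assumed data): rows are circRNAs.
A = [[1, 0, 1],
     [1, 0, 1],
     [0, 1, 0]]
K = gip_kernel(A)
```

Applying the same function to the transposed matrix gives the disease-side kernel. Entities with identical profiles receive similarity 1, and the similarity decays as the profiles diverge.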
Affiliation(s)
- Yuehao Wang
  - School of Computer and Communication, Lanzhou University of Technology, Lanzhou, 730050, Gansu, PR China
- Pengli Lu
  - School of Computer and Communication, Lanzhou University of Technology, Lanzhou, 730050, Gansu, PR China
7. Li YC, You ZH, Yu CQ, Wang L, Hu L, Hu PW, Qiao Y, Wang XF, Huang YA. DeepCMI: a graph-based model for accurate prediction of circRNA-miRNA interactions with multiple information. Brief Funct Genomics 2024; 23:276-285. [PMID: 37539561 DOI: 10.1093/bfgp/elad030] [Citation(s) in RCA: 4] [Impact Index Per Article: 4.0] [Received: 03/18/2023] [Revised: 05/25/2023] [Accepted: 07/13/2023] [Indexed: 08/05/2023]
Abstract
Recently, the role of competing endogenous RNAs in regulating gene expression through the interaction of microRNAs has been closely associated with the expression of circular RNAs (circRNAs) in various biological processes such as reproduction and apoptosis. While the number of confirmed circRNA-miRNA interactions (CMIs) continues to increase, the conventional in vitro approaches for discovery are expensive, labor intensive, and time consuming. Therefore, there is an urgent need for effective prediction of potential CMIs through appropriate data modeling and prediction based on known information. In this study, we proposed a novel model, called DeepCMI, that utilizes multi-source information on circRNA/miRNA to predict potential CMIs. Comprehensive evaluations on the CMI-9905 and CMI-9589 datasets demonstrated that DeepCMI successfully infers potential CMIs. Specifically, DeepCMI achieved AUC values of 90.54% and 94.8% on the CMI-9905 and CMI-9589 datasets, respectively. These results suggest that DeepCMI is an effective model for predicting potential CMIs and has the potential to significantly reduce the need for downstream in vitro studies. To facilitate the use of our trained model and data, we have constructed a computational platform, which is available at http://120.77.11.78/DeepCMI/. The source code and datasets used in this work are available at https://github.com/LiYuechao1998/DeepCMI.
Affiliation(s)
- Yue-Chao Li
  - School of Information Engineering, Xijing University, Xi'an, China
- Zhu-Hong You
  - School of Computer Science, Northwestern Polytechnical University, Xi'an, China
- Chang-Qing Yu
  - School of Information Engineering, Xijing University, Xi'an, China
- Lei Wang
  - Guangxi Academy of Sciences, Nanning, China
- Lun Hu
  - Xinjiang Technical Institute of Physics and Chemistry, Chinese Academy of Science, Urumqi, China
- Peng-Wei Hu
  - Xinjiang Technical Institute of Physics and Chemistry, Chinese Academy of Science, Urumqi, China
- Yan Qiao
  - College of Agriculture and Forestry, Longdong University, Qingyang 745000, China
- Xin-Fei Wang
  - School of Information Engineering, Xijing University, Xi'an, China
- Yu-An Huang
  - School of Computer Science, Northwestern Polytechnical University, Xi'an, China
8. Wang L, Li ZW, You ZH, Huang DS, Wong L. GSLCDA: An Unsupervised Deep Graph Structure Learning Method for Predicting CircRNA-Disease Association. IEEE J Biomed Health Inform 2024; 28:1742-1751. [PMID: 38127594 DOI: 10.1109/jbhi.2023.3344714] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 12/23/2023]
Abstract
A growing number of studies reveal that circular RNAs (circRNAs) are broadly engaged in the physiological processes of cell proliferation, differentiation, aging, and apoptosis, and are closely associated with the pathogenesis of numerous diseases. Clarifying the associations between diseases and circRNAs is of great clinical importance for providing new therapeutic strategies for complex diseases. However, previous circRNA-disease association (CDA) prediction methods rely excessively on the graph network, and model performance drops dramatically when noisy connections occur in the graph structure. To address this problem, this paper proposes GSLCDA, an unsupervised deep graph structure learning method, to predict potential CDAs. Concretely, we first integrate multi-source circRNA and disease data to constitute the CDA heterogeneous network. Then the network topology is learned using the graph structure, and the original graph is enhanced in an unsupervised manner by maximizing the inter information of the learned and original graphs to uncover their essential features. Finally, a graph space sensitive k-nearest neighbor (KNN) algorithm is employed to search for latent CDAs. On the benchmark dataset, GSLCDA obtained 92.67% accuracy with 0.9279 AUC. GSLCDA also performs well on independent datasets. Furthermore, 14, 12, and 14 of the top 16 circRNAs with the highest GSLCDA prediction scores were confirmed in the relevant literature in the breast cancer, colorectal cancer, and lung cancer case studies, respectively. These results demonstrate that GSLCDA can reveal underlying CDAs and offer new perspectives for the diagnosis and therapy of complex human diseases.
9. Zhang P, Zhang W, Sun W, Xu J, Hu H, Wang L, Wong L. Identification of gene biomarkers for brain diseases via multi-network topological semantics extraction and graph convolutional network. BMC Genomics 2024; 25:175. [PMID: 38350848 PMCID: PMC10865627 DOI: 10.1186/s12864-024-09967-9] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/06/2023] [Accepted: 01/03/2024] [Indexed: 02/15/2024]
Abstract
BACKGROUND Brain diseases pose a significant threat to human health, and various network-based methods have been proposed for identifying gene biomarkers associated with these diseases. However, the brain is a complex system, and extracting topological semantics from different brain networks is necessary yet challenging for identifying pathogenic genes for brain diseases. RESULTS In this study, we present a multi-network representation learning framework called M-GBBD for the identification of gene biomarkers in brain diseases. Specifically, we collected multi-omics data to construct eleven networks from different perspectives. M-GBBD extracts the spatial distributions of features from these networks and iteratively optimizes them using Kullback-Leibler divergence to fuse the networks into a common semantic space that represents the gene network for the brain. Subsequently, a graph consisting of both gene and large-scale disease proximity networks learns representations through graph convolution techniques and predicts whether a gene is associated with brain diseases while providing association scores. Experimental results demonstrate that M-GBBD outperforms several baseline methods. Furthermore, our analysis, supported by bioinformatics, revealed CAMP as a gene significantly associated with Alzheimer's disease identified by M-GBBD. CONCLUSION Collectively, M-GBBD provides valuable insights into identifying gene biomarkers for brain diseases and serves as a promising framework for brain network representation learning.
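As background for the Kullback-Leibler divergence used as the optimization criterion above, the following minimal sketch computes the divergence between two discrete distributions. It shows the general definition only and does not reproduce M-GBBD's fusion procedure; the function name `kl_divergence` and the example distributions are assumptions.

```python
import math


def kl_divergence(p, q):
    """D_KL(P || Q) in nats for two discrete distributions on the same support.

    Hypothetical helper for illustration only.
    """
    if len(p) != len(q):
        raise ValueError("distributions must have the same length")
    total = 0.0
    for pi, qi in zip(p, q):
        if pi == 0:
            continue  # terms with P(x) = 0 contribute nothing
        if qi == 0:
            return math.inf  # P puts mass where Q has none
        total += pi * math.log(pi / qi)
    return total


# Assumed example distributions.
p = [0.5, 0.5]
q = [0.9, 0.1]
d_pq = kl_divergence(p, q)
d_qp = kl_divergence(q, p)
```

The divergence is zero only when the two distributions coincide, and it is not symmetric, so `d_pq` and `d_qp` differ here.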
Affiliation(s)
- Ping Zhang
  - College of Information Science and Engineering, Zaozhuang University, Zaozhuang, 277100, Shandong, China
  - College of Informatics, Huazhong Agricultural University, Wuhan, 430070, China
- Weihan Zhang
  - CAS Key Laboratory of Plant Germplasm Enhancement and Specialty Agriculture, Wuhan Botanical Garden, The Innovative Academy of Seed Design, Chinese Academy of Sciences, Hubei Hongshan Laboratory, Wuhan, 430074, China
- Weicheng Sun
  - College of Informatics, Huazhong Agricultural University, Wuhan, 430070, China
- Jinsheng Xu
  - College of Informatics, Huazhong Agricultural University, Wuhan, 430070, China
- Hua Hu
  - College of Information Science and Engineering, Zaozhuang University, Zaozhuang, 277100, Shandong, China
- Lei Wang
  - College of Information Science and Engineering, Zaozhuang University, Zaozhuang, 277100, Shandong, China
  - Guangxi Key Lab of Human-Machine Interaction and Intelligent Decision, Guangxi Academy of Sciences, Nanning, 530007, China
- Leon Wong
  - College of Big Data and Internet, Shenzhen Technology University, Shenzhen, 518118, China
10. Lu P, Zhang W, Wu J. AMPCDA: Prediction of circRNA-disease associations by utilizing attention mechanisms on metapaths. Comput Biol Chem 2024; 108:107989. [PMID: 38016366 DOI: 10.1016/j.compbiolchem.2023.107989] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/16/2023] [Revised: 10/24/2023] [Accepted: 11/15/2023] [Indexed: 11/30/2023]
Abstract
An expanding body of experimental evidence in the biomedical field has revealed prevalent associations between circRNAs and human diseases. These associations afford a new perspective for elucidating etiology and devising innovative therapeutic strategies. In recent years, many computational methods have been introduced to overcome the inefficiency and high cost of conventional laboratory approaches to enumerating possible circRNA-disease associations, but most existing methods still struggle to integrate node embeddings effectively with higher-order neighborhood representations, which may keep the final predictive accuracy below its potential. To overcome these constraints, we propose AMPCDA, a computational technique that uses predefined metapaths to predict circRNA-disease associations. Specifically, an association graph is first built from three source databases and two similarity derivation procedures, and DeepWalk is then applied to the graph to obtain initial feature representations. Vector embeddings of metapath instances, concatenated with the initial node features, are then fed through a customized encoder. A self-attention section accumulates the metapath-specific contributions to each node, which are combined with the node's intrinsic features and passed to a graph attention module; its output serves as the input representation for the multilayer perceptron that predicts the final association probability scores. By integrating graph topology features with the node embeddings themselves, AMPCDA effectively leverages information carried by multiple nodes along paths and achieves AUC values of 0.9623, 0.9675, and 0.9711 under 5-fold cross validation, 10-fold cross validation, and leave-one-out cross validation, respectively. These results signify substantial accuracy improvements compared to other prediction models. Case study assessments confirm the high predictive accuracy of our proposed technique in identifying circRNA-disease connections, highlighting its value in guiding future biological research to reveal new disease mechanisms.
Affiliation(s)
- Pengli Lu
  - School of Computer and Communication, Lanzhou University of Technology, Lanzhou, 730050, Gansu, PR China
- Wenqi Zhang
  - School of Computer and Communication, Lanzhou University of Technology, Lanzhou, 730050, Gansu, PR China
- Jinkai Wu
  - School of Computer and Communication, Lanzhou University of Technology, Lanzhou, 730050, Gansu, PR China
11. Wei MM, Yu CQ, Li LP, You ZH, Wang L. BCMCMI: A Fusion Model for Predicting circRNA-miRNA Interactions Combining Semantic and Meta-path. J Chem Inf Model 2023; 63:5384-5394. [PMID: 37535872 DOI: 10.1021/acs.jcim.3c00852] [Citation(s) in RCA: 5] [Impact Index Per Article: 2.5] [Indexed: 08/05/2023]
Abstract
A growing body of evidence suggests that circRNAs play a vital role in the development and treatment of diseases by interacting with miRNAs. Accurate prediction of potential circRNA-miRNA interactions (CMIs) has therefore become urgent. However, traditional wet experiments are time-consuming and costly, and their results are affected by objective factors. In this paper, we propose a computational model, BCMCMI, which combines three features to predict CMIs. Specifically, BCMCMI utilizes the bidirectional encoding capability of the BERT algorithm to extract sequence features from the semantic information of circRNAs and miRNAs. Then, a heterogeneous network is constructed based on cosine similarity and known CMI information. Metapath2vec is employed to conduct random walks following meta-paths in the network to capture topological features, including similarity features. Finally, potential CMIs are predicted using the XGBoost classifier. BCMCMI achieves superior results compared to other state-of-the-art models on two benchmark datasets for CMI prediction. We also utilize t-SNE to visualize the distribution of the extracted features on a randomly selected dataset. The prediction results show that BCMCMI can serve as a valuable complement to the wet experiment process.
Affiliation(s)
- Meng-Meng Wei
  - School of Information Engineering, Xijing University, Xi'an, Shaanxi 710123, China
- Chang-Qing Yu
  - School of Information Engineering, Xijing University, Xi'an, Shaanxi 710123, China
- Li-Ping Li
  - College of Agriculture and Forestry, Longdong University, Qingyang, Gansu 745000, China
- Zhu-Hong You
  - School of Computer Science, Northwestern Polytechnical University, Xi'an, Shaanxi 710072, China
- Lei Wang
  - Guangxi Key Lab of Human-Machine Interaction and Intelligent Decision, Guangxi Academy of Sciences, Nanning, Guangxi 530007, China
12. Yuan L, Zhao J, Shen Z, Zhang Q, Geng Y, Zheng CH, Huang DS. iCircDA-NEAE: Accelerated attribute network embedding and dynamic convolutional autoencoder for circRNA-disease associations prediction. PLoS Comput Biol 2023; 19:e1011344. [PMID: 37651321 PMCID: PMC10470932 DOI: 10.1371/journal.pcbi.1011344] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/11/2023] [Accepted: 07/10/2023] [Indexed: 09/02/2023]
Abstract
Accumulating evidence suggests that circRNAs play crucial roles in human diseases. CircRNA-disease association prediction is extremely helpful in understanding pathogenesis, diagnosis, and prevention, as well as in identifying relevant biomarkers. During the past few years, a large number of deep learning (DL) based methods have been proposed for predicting circRNA-disease associations and have achieved impressive prediction performance. However, these methods have two main drawbacks. First, they underutilize biometric information in the data. Second, the features they extract do not adequately represent the association characteristics between circRNAs and diseases. In this study, we developed a novel deep learning model, named iCircDA-NEAE, to predict circRNA-disease associations. In particular, we use disease semantic similarity, Gaussian interaction profile kernel, circRNA expression profile similarity, and Jaccard similarity simultaneously for the first time, and extract hidden features based on accelerated attribute network embedding (AANE) and dynamic convolutional autoencoder (DCAE). Experimental results on the circR2Disease dataset show that iCircDA-NEAE significantly outperforms other competing methods. In addition, 16 of the top 20 circRNA-disease pairs with the highest prediction scores were validated by relevant literature. Furthermore, we observe that iCircDA-NEAE can effectively predict new potential circRNA-disease associations.
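Of the similarity measures listed above, Jaccard similarity is the simplest to state: the size of the intersection of two sets divided by the size of their union. The sketch below applies it to binary association profiles. The abstract does not say which profiles iCircDA-NEAE compares, so the function name `jaccard_similarity`, the toy vectors, and the convention of returning 0.0 for two empty profiles are all assumptions.

```python
def jaccard_similarity(profile_a, profile_b):
    """Jaccard similarity between two binary association profiles.

    Each profile is a sequence of 0/1 flags over the same set of partners
    (for example, diseases). Hypothetical helper for illustration only.
    """
    set_a = {i for i, flag in enumerate(profile_a) if flag}
    set_b = {i for i, flag in enumerate(profile_b) if flag}
    union = set_a | set_b
    if not union:
        return 0.0  # assumed convention when neither profile has associations
    return len(set_a & set_b) / len(union)


# Toy profiles (assumed data): two circRNAs over four diseases.
circ_1 = [1, 1, 0, 1]
circ_2 = [1, 0, 0, 1]
score = jaccard_similarity(circ_1, circ_2)
```

Here the two circRNAs share two of the three diseases that either is associated with, so the score is 2/3.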
Affiliation(s)
- Lin Yuan
  - Key Laboratory of Computing Power Network and Information Security, Ministry of Education, Shandong Computer Science Center, Qilu University of Technology (Shandong Academy of Sciences), Jinan, China
  - Shandong Engineering Research Center of Big Data Applied Technology, Faculty of Computer Science and Technology, Qilu University of Technology (Shandong Academy of Sciences), Jinan, China
  - Shandong Provincial Key Laboratory of Computer Networks, Shandong Fundamental Research Center for Computer Science, Jinan, China
- Jiawang Zhao
  - Key Laboratory of Computing Power Network and Information Security, Ministry of Education, Shandong Computer Science Center, Qilu University of Technology (Shandong Academy of Sciences), Jinan, China
  - Shandong Engineering Research Center of Big Data Applied Technology, Faculty of Computer Science and Technology, Qilu University of Technology (Shandong Academy of Sciences), Jinan, China
  - Shandong Provincial Key Laboratory of Computer Networks, Shandong Fundamental Research Center for Computer Science, Jinan, China
- Zhen Shen
  - School of Computer and Software, Nanyang Institute of Technology, Nanyang, China
- Qinhu Zhang
  - Eastern Institute for Advanced Study, Eastern Institute of Technology, Ningbo, China
- Yushui Geng
  - Key Laboratory of Computing Power Network and Information Security, Ministry of Education, Shandong Computer Science Center, Qilu University of Technology (Shandong Academy of Sciences), Jinan, China
  - Shandong Engineering Research Center of Big Data Applied Technology, Faculty of Computer Science and Technology, Qilu University of Technology (Shandong Academy of Sciences), Jinan, China
  - Shandong Provincial Key Laboratory of Computer Networks, Shandong Fundamental Research Center for Computer Science, Jinan, China
- Chun-Hou Zheng
  - Key Lab of Intelligent Computing and Signal Processing of Ministry of Education, School of Artificial Intelligence, Anhui University, Hefei, China
- De-Shuang Huang
  - Eastern Institute for Advanced Study, Eastern Institute of Technology, Ningbo, China
13
Watanabe N, Kuriya Y, Murata M, Yamamoto M, Shimizu M, Araki M. Different Recognition of Protein Features Depending on Deep Learning Models: A Case Study of Aromatic Decarboxylase UbiD. BIOLOGY 2023; 12:795. [PMID: 37372080 DOI: 10.3390/biology12060795] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/21/2023] [Revised: 05/17/2023] [Accepted: 05/29/2023] [Indexed: 06/29/2023]
Abstract
The number of unannotated protein sequences is increasing explosively due to genome sequencing technology. A more comprehensive understanding of protein functions for protein annotation requires the discovery of new features that cannot be captured by conventional methods. Deep learning can extract important features from input data and predict protein functions based on these features. Here, protein feature vectors generated by three deep learning models are analyzed using Integrated Gradients to explore important features of amino acid sites. As a case study, prediction and feature extraction models for UbiD enzymes were built using these models. The important amino acid residues extracted from the models differed from the secondary structures, conserved regions and active sites known for UbiD. Interestingly, different amino acid residues within UbiD sequences were regarded as important depending on the type of model and sequence. The Transformer models focused on more specific regions than the other models. These results suggest that each deep learning model understands protein features from aspects that differ from existing knowledge and has the potential to discover new laws of protein function. This study will help to extract new protein features for other protein annotations.
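For readers unfamiliar with Integrated Gradients, the attribution idea mentioned in the abstract can be sketched as follows. This is a minimal illustration, not the authors' pipeline: the function name `integrated_gradients`, the toy model `f`, the all-zero baseline, and the finite-difference gradient are all assumptions made here for the sake of a self-contained example (real deep learning models would use automatic differentiation).

```python
def integrated_gradients(f, x, baseline, steps=50, eps=1e-5):
    """Approximate Integrated Gradients of scalar function f at input x.

    Gradients are averaged along the straight path from baseline to x
    (midpoint Riemann sum) and scaled by (x_i - baseline_i).
    Gradients are estimated by central finite differences (an assumption
    made to keep this sketch dependency-free).
    """
    n = len(x)
    attributions = [0.0] * n
    for s in range(steps):
        alpha = (s + 0.5) / steps  # midpoint of the s-th sub-interval
        point = [b + alpha * (xi - b) for xi, b in zip(x, baseline)]
        for i in range(n):
            up = point[:]
            up[i] += eps
            down = point[:]
            down[i] -= eps
            grad_i = (f(up) - f(down)) / (2 * eps)
            attributions[i] += grad_i * (x[i] - baseline[i]) / steps
    return attributions


# Hypothetical toy model: f(x) = x0^2 + 3 * x1
def f(v):
    return v[0] ** 2 + 3 * v[1]


attr = integrated_gradients(f, [2.0, 1.0], [0.0, 0.0])
```

A useful sanity check is the completeness property: the attributions should sum (approximately) to `f(x) - f(baseline)`.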
Affiliation(s)
- Naoki Watanabe
  - Artificial Intelligence Center for Health and Biomedical Research, National Institutes of Biomedical Innovation, Health and Nutrition, 3-17 Senrioka-shinmachi, Settsu 566-0002, Japan
- Yuki Kuriya
  - Artificial Intelligence Center for Health and Biomedical Research, National Institutes of Biomedical Innovation, Health and Nutrition, 3-17 Senrioka-shinmachi, Settsu 566-0002, Japan
- Masahiro Murata
  - Graduate School of Science, Technology and Innovation, Kobe University, 1-1 Rokkodai, Nada-Ku, Kobe 657-8501, Japan
- Masaki Yamamoto
  - Artificial Intelligence Center for Health and Biomedical Research, National Institutes of Biomedical Innovation, Health and Nutrition, 3-17 Senrioka-shinmachi, Settsu 566-0002, Japan
- Masayuki Shimizu
  - Bacchus Bio Innovation Co., Ltd., 6-3-7 Minatojima minami-machi, Kobe 650-0047, Japan
- Michihiro Araki
  - Artificial Intelligence Center for Health and Biomedical Research, National Institutes of Biomedical Innovation, Health and Nutrition, 3-17 Senrioka-shinmachi, Settsu 566-0002, Japan
  - Graduate School of Science, Technology and Innovation, Kobe University, 1-1 Rokkodai, Nada-Ku, Kobe 657-8501, Japan
  - Graduate School of Medicine, Kyoto University, 54 Shogoin-Kawahara-cho, Sakyo-ku, Kyoto 606-8507, Japan
  - National Cerebral and Cardiovascular Center, 6-1 Kishibe-Shinmachi, Suita 564-8565, Japan
14
Lu C, Zhang L, Zeng M, Lan W, Duan G, Wang J. Inferring disease-associated circRNAs by multi-source aggregation based on heterogeneous graph neural network. Brief Bioinform 2023; 24:6960978. [PMID: 36572658 DOI: 10.1093/bib/bbac549] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/19/2022] [Revised: 11/03/2022] [Accepted: 11/11/2022] [Indexed: 12/28/2022] Open
Abstract
Emerging evidence has shown that circular RNAs (circRNAs) are implicated in pathogenic processes. They are regarded as promising biomarkers for diagnosis due to their covalently closed loop structures. As opposed to traditional experiments, computational approaches can identify circRNA-disease associations at a lower cost. Aggregating multi-source pathogenesis data helps to alleviate data sparsity and infer potential associations at the system level. The majority of computational approaches construct a homologous network using multi-source data, but they lose the heterogeneity of the data. Effective methods that use the features of multi-source data are urgently needed. In this paper, we propose a model (CDHGNN) based on edge-weighted graph attention and heterogeneous graph neural networks for potential circRNA-disease association prediction. The circRNA network, micro RNA network, disease network and heterogeneous network are constructed based on multi-source data. To reflect association probabilities between nodes, an edge-weighted graph attention network model is designed for node features. To assign attention weights to different types of edges and learn contextual meta-paths, CDHGNN infers potential circRNA-disease associations based on heterogeneous neural networks. CDHGNN outperforms state-of-the-art algorithms in terms of accuracy. Edge-weighted graph attention networks and heterogeneous graph networks both improve performance significantly. Furthermore, case studies suggest that CDHGNN is capable of identifying specific molecular associations and investigating biomolecular regulatory relationships in pathogenesis. The code of CDHGNN is freely available at https://github.com/BioinformaticsCSU/CDHGNN.
Affiliation(s)
- Chengqian Lu
  - School of Computer Science and Engineering, Central South University, Changsha, 410083, Hunan, China
  - Hunan Provincial Key Lab on Bioinformatics, Central South University, Changsha, 410083, Hunan, China
  - School of Computer Science, Xiangtan University, Xiangtan, 411105, Hunan, China
- Lishen Zhang
  - School of Computer Science and Engineering, Central South University, Changsha, 410083, Hunan, China
  - Hunan Provincial Key Lab on Bioinformatics, Central South University, Changsha, 410083, Hunan, China
- Min Zeng
  - School of Computer Science and Engineering, Central South University, Changsha, 410083, Hunan, China
  - Hunan Provincial Key Lab on Bioinformatics, Central South University, Changsha, 410083, Hunan, China
- Wei Lan
  - School of Computer, Electronic and Information, Guangxi University, Nanning, 530004, Guangxi, China
- Guihua Duan
  - School of Computer Science and Engineering, Central South University, Changsha, 410083, Hunan, China
  - Hunan Provincial Key Lab on Bioinformatics, Central South University, Changsha, 410083, Hunan, China
- Jianxin Wang
  - School of Computer Science and Engineering, Central South University, Changsha, 410083, Hunan, China
  - Hunan Provincial Key Lab on Bioinformatics, Central South University, Changsha, 410083, Hunan, China
15
Lan W, Dong Y, Zhang H, Li C, Chen Q, Liu J, Wang J, Chen YPP. Benchmarking of computational methods for predicting circRNA-disease associations. Brief Bioinform 2023; 24:6972300. [PMID: 36611256 DOI: 10.1093/bib/bbac613] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 06/21/2022] [Revised: 10/29/2022] [Accepted: 12/11/2022] [Indexed: 01/09/2023] Open
Abstract
Accumulating evidence demonstrates that circular RNA (circRNA) plays an important role in human diseases. Identifying circRNA-disease associations can aid the diagnosis of human diseases, but the traditional approach based on biological experiments is time-consuming. To address this limitation, a series of computational methods have been proposed in recent years. However, few works have summarized these methods or compared their performance. In this paper, we divide the existing methods into three categories: information propagation, traditional machine learning and deep learning. Then, the baseline methods in each category are introduced in detail. Further, 5 different datasets are collected, and 14 representative methods of each category are selected and compared in 5-fold cross-validation, 10-fold cross-validation and the de novo experiment. To further evaluate the effectiveness of these methods, six common cancers are selected to compare the number of correctly identified circRNA-disease associations in the top-10, top-20, top-50, top-100 and top-200 predictions. In addition, observations about the robustness and characteristics of these methods are drawn from the results. Finally, future directions and challenges are discussed.
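The top-k comparison described in the abstract can be illustrated with a small sketch. The function name `top_k_hits`, the candidate identifiers, and the scores below are hypothetical placeholders, not data or code from the benchmark; the sketch only shows how one might count confirmed associations among the k highest-scoring predictions.

```python
def top_k_hits(scores, known, k):
    """Count how many of the k highest-scoring candidates are known associations.

    scores: dict mapping candidate id -> predicted association score
    known:  set of candidate ids with a confirmed association
    """
    ranked = sorted(scores, key=scores.get, reverse=True)
    return sum(1 for candidate in ranked[:k] if candidate in known)


# Hypothetical predictions for one disease (illustrative values only)
scores = {"c1": 0.9, "c2": 0.8, "c3": 0.1, "c4": 0.5}
known = {"c1", "c3"}

hits_at_2 = top_k_hits(scores, known, 2)
hits_at_4 = top_k_hits(scores, known, 4)
```

Repeating this count for k = 10, 20, 50, 100 and 200 gives the kind of per-disease comparison the abstract mentions.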
Affiliation(s)
- Wei Lan
  - School of Computer, Electronic and Information and Guangxi Key Laboratory of Multimedia Communications and Network Technology, Guangxi University, Nanning, Guangxi 530004, China
- Yi Dong
  - School of Computer, Electronic and Information and Guangxi Key Laboratory of Multimedia Communications and Network Technology, Guangxi University, Nanning, Guangxi 530004, China
- Hongyu Zhang
  - School of Computer, Electronic and Information and Guangxi Key Laboratory of Multimedia Communications and Network Technology, Guangxi University, Nanning, Guangxi 530004, China
- Chunling Li
  - School of Computer, Electronic and Information and Guangxi Key Laboratory of Multimedia Communications and Network Technology, Guangxi University, Nanning, Guangxi 530004, China
- Qingfeng Chen
  - School of Computer, Electronic and Information and State Key Laboratory for Conservation and Utilization of Subtropical Agro-bioresources, Guangxi University, Nanning, Guangxi 530004, China
- Jin Liu
  - Hunan Provincial Key Lab on Bioinformatics, School of Computer Science and Engineering, Central South University, Changsha, Hunan 410083, China
- Jianxin Wang
  - Hunan Provincial Key Lab on Bioinformatics, School of Computer Science and Engineering, Central South University, Changsha, Hunan 410083, China
- Yi-Ping Phoebe Chen
  - Department of Computer Science and Information Technology, La Trobe University, Melbourne, Victoria 3086, Australia
16
Zhang ML, Zhao BW, Su XR, He YZ, Yang Y, Hu L. RLFDDA: a meta-path based graph representation learning model for drug-disease association prediction. BMC Bioinformatics 2022; 23:516. [PMID: 36456957 PMCID: PMC9713188 DOI: 10.1186/s12859-022-05069-z] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 09/27/2022] [Accepted: 11/21/2022] [Indexed: 12/03/2022] Open
Abstract
BACKGROUND Drug repositioning is a very important task that provides critical information for exploring the potential efficacy of drugs. Yet developing computational models that can effectively predict drug-disease associations (DDAs) is still a challenging task. Previous studies suggest that the accuracy of DDA prediction can be improved by integrating different types of biological features. But how to conduct an effective integration remains a challenging problem for accurately discovering new indications for approved drugs. METHODS In this paper, we propose a novel meta-path based graph representation learning model, namely RLFDDA, to predict potential DDAs on heterogeneous biological networks. RLFDDA first calculates drug-drug similarities and disease-disease similarities as the intrinsic biological features of drugs and diseases. A heterogeneous network is then constructed by integrating DDAs, disease-protein associations and drug-protein associations. With such a network, RLFDDA adopts a meta-path random walk model to learn the latent representations of drugs and diseases, which are concatenated to construct joint representations of drug-disease associations. As the last step, we employ the random forest classifier to predict potential DDAs with their joint representations. RESULTS To demonstrate the effectiveness of RLFDDA, we have conducted a series of experiments on two benchmark datasets by following a ten-fold cross-validation scheme. The results show that RLFDDA yields the best performance in terms of AUC and F1-score when compared with several state-of-the-art DDA prediction models. We have also conducted a case study on a common drug and a common disease, i.e., paclitaxel and lung tumors, and found that 7 of the top-10 diseases and 8 of the top-10 drugs have already been validated for paclitaxel and lung tumors, respectively, with literature evidence. Hence, the promising performance of RLFDDA may provide a new perspective for novel DDA discovery over heterogeneous networks.
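The meta-path random walk mentioned in the METHODS part can be sketched on a toy heterogeneous network. This is only an illustration of the general technique, not the RLFDDA implementation: the function `metapath_walk`, the node identifiers (`d1`, `p1`, `s1`), the node types, and the drug-protein-disease-protein pattern are all assumptions chosen for this example.

```python
import random


def metapath_walk(neighbors, node_type, start, metapath, length, rng):
    """Random walk constrained by a cyclic meta-path of node types.

    The i-th node of the walk must have type metapath[i % len(metapath)];
    the walk stops early if no neighbor of the required type exists.
    """
    walk = [start]
    while len(walk) < length:
        wanted = metapath[len(walk) % len(metapath)]
        candidates = [n for n in neighbors[walk[-1]] if node_type[n] == wanted]
        if not candidates:
            break
        walk.append(rng.choice(candidates))
    return walk


# Hypothetical toy network: one drug, one protein, one disease
neighbors = {"d1": ["p1"], "p1": ["d1", "s1"], "s1": ["p1"]}
node_type = {"d1": "drug", "p1": "protein", "s1": "disease"}
metapath = ("drug", "protein", "disease", "protein")

walk = metapath_walk(neighbors, node_type, "d1", metapath, 5, random.Random(0))
```

In a representation learning setting, many such walks would typically be fed to a skip-gram style model to obtain node embeddings; that step is omitted here.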
Affiliation(s)
- Meng-Long Zhang
  - The Xinjiang Technical Institute of Physics and Chemistry, Chinese Academy of Sciences, Urumqi, China
  - University of Chinese Academy of Sciences, Beijing, China
  - Xinjiang Laboratory of Minority Speech and Language Information Processing, Urumqi, China
- Bo-Wei Zhao
  - The Xinjiang Technical Institute of Physics and Chemistry, Chinese Academy of Sciences, Urumqi, China
  - University of Chinese Academy of Sciences, Beijing, China
  - Xinjiang Laboratory of Minority Speech and Language Information Processing, Urumqi, China
- Xiao-Rui Su
  - The Xinjiang Technical Institute of Physics and Chemistry, Chinese Academy of Sciences, Urumqi, China
  - University of Chinese Academy of Sciences, Beijing, China
  - Xinjiang Laboratory of Minority Speech and Language Information Processing, Urumqi, China
- Yi-Zhou He
  - School of Computer Science and Technology, Wuhan University of Technology, Wuhan, China
- Yue Yang
  - School of Computer Science and Technology, Wuhan University of Technology, Wuhan, China
- Lun Hu
  - The Xinjiang Technical Institute of Physics and Chemistry, Chinese Academy of Sciences, Urumqi, China
  - University of Chinese Academy of Sciences, Beijing, China
  - Xinjiang Laboratory of Minority Speech and Language Information Processing, Urumqi, China
17
Ryšavý P, Kléma J, Merkerová MD. circGPA: circRNA functional annotation based on probability-generating functions. BMC Bioinformatics 2022; 23:392. [PMID: 36167495 PMCID: PMC9513885 DOI: 10.1186/s12859-022-04957-8] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 05/20/2022] [Accepted: 09/21/2022] [Indexed: 11/25/2022] Open
Abstract
Recent research has already shown that circular RNAs (circRNAs) are functional in gene expression regulation and potentially related to diseases. Due to their stability, circRNAs can also be used as biomarkers for diagnosis. However, the function of most circRNAs remains unknown, and it is expensive and time-consuming to discover it through biological experiments. In this paper, we predict circRNA annotations from the knowledge of their interaction with miRNAs and subsequent miRNA-mRNA interactions. First, we construct an interaction network for a target circRNA; second, we spread the information from the network nodes with known function to the root circRNA node. This idea itself is not new; our main contribution lies in proposing an efficient and exact deterministic procedure based on the principle of probability-generating functions to calculate the p-value of an association test between a circRNA and an annotation term. We show that our publicly available algorithm is both more effective and more efficient than the commonly used Monte-Carlo sampling approach, which may suffer from difficult quantification of sampling convergence and subsequent sampling inefficiency. We experimentally demonstrate that the new approach is two orders of magnitude faster than Monte-Carlo sampling, which makes summary annotation of large circRNA files feasible; this includes, for example, their reannotation after periodic interaction network updates. We provide a summary annotation of a current circRNA database as one of our outputs. The proposed algorithm could be generalized to other types of RNA in a straightforward way.
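To illustrate why probability-generating functions (PGFs) allow an exact p-value without sampling, consider a deliberately simplified setting that is not the circGPA statistic itself: a test statistic that is a sum of independent Bernoulli indicators. The PGF of the sum is the product of the factors (1 - p_i + p_i z), so its coefficients, obtained by polynomial multiplication, give the exact null distribution. The function names and probabilities below are hypothetical.

```python
def sum_distribution(probs):
    """Exact distribution of a sum of independent Bernoulli(p_i) variables.

    Multiplies the PGF factors (1 - p_i + p_i * z); the k-th coefficient
    of the resulting polynomial is P(sum == k).
    """
    pmf = [1.0]  # PGF of the empty sum: the constant polynomial 1
    for p in probs:
        nxt = [0.0] * (len(pmf) + 1)
        for k, mass in enumerate(pmf):
            nxt[k] += mass * (1.0 - p)  # indicator is 0
            nxt[k + 1] += mass * p      # indicator is 1
        pmf = nxt
    return pmf


def exact_p_value(probs, observed):
    """Upper-tail p-value P(sum >= observed), computed without sampling."""
    return sum(sum_distribution(probs)[observed:])


pmf = sum_distribution([0.5, 0.5])
p_val = exact_p_value([0.5, 0.5], 1)
```

A Monte-Carlo estimate of the same tail probability would need many random draws and would still carry sampling error, which is the contrast the abstract draws.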
Affiliation(s)
- Petr Ryšavý
  - Department of Computer Science, Faculty of Electrical Engineering, Czech Technical University in Prague, Prague, Czech Republic
- Jiří Kléma
  - Department of Computer Science, Faculty of Electrical Engineering, Czech Technical University in Prague, Prague, Czech Republic
18
Wu X, Zhou Y. GE-Impute: graph embedding-based imputation for single-cell RNA-seq data. Brief Bioinform 2022; 23:6651303. [PMID: 35901457 DOI: 10.1093/bib/bbac313] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Received: 04/13/2022] [Revised: 06/27/2022] [Accepted: 07/11/2022] [Indexed: 11/12/2022] Open
Abstract
Single-cell RNA-sequencing (scRNA-seq) has been widely used to depict gene expression profiles at single-cell resolution. However, its relatively high dropout rate often results in artificial zero expression of genes and therefore compromises the reliability of results. To overcome such unwanted sparsity of scRNA-seq data, several imputation algorithms have been developed to recover the single-cell expression profiles. Here, we propose a novel approach, GE-Impute, to impute the dropout zeros in scRNA-seq data with a graph embedding-based neural network model. GE-Impute learns the neural graph representation for each cell and reconstructs the cell-cell similarity network accordingly, which enables better imputation of dropout zeros based on the more accurately allocated neighbors in the similarity network. Gene expression correlation analysis between true expression data and simulated dropout data suggests significantly better performance of GE-Impute in recovering dropout zeros for both droplet- and plate-based scRNA-seq data. GE-Impute also outperforms other imputation methods in identifying differentially expressed genes and improving unsupervised clustering on datasets from various scRNA-seq techniques. Moreover, GE-Impute enhances the identification of marker genes, facilitating the cell type assignment of clusters. In trajectory analysis, GE-Impute improves time-course scRNA-seq data analysis and the reconstruction of differentiation trajectories. Together, these results demonstrate that GE-Impute could be a useful method to recover single-cell expression profiles, thus enabling better biological interpretation of scRNA-seq data. GE-Impute is implemented in Python and is freely available at https://github.com/wxbCaterpillar/GE-Impute.
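The final step described above, filling dropout zeros from neighbors in a cell-cell similarity network, can be sketched in a few lines. This is a simplified illustration and not the GE-Impute code: the neighbor graph is assumed to be given (GE-Impute derives it from learned graph embeddings), plain averaging is an assumption, and the function name `impute_zeros` and the toy matrix are hypothetical.

```python
def impute_zeros(expr, neighbors):
    """Replace zero entries of each cell with the mean over its neighbor cells.

    expr:      list of per-cell expression rows (cells x genes)
    neighbors: dict mapping cell index -> list of neighbor cell indices
    Non-zero entries and cells without neighbors are left unchanged.
    Neighbor values are always read from the original matrix.
    """
    imputed = [row[:] for row in expr]
    for cell, row in enumerate(expr):
        nbrs = neighbors.get(cell, [])
        if not nbrs:
            continue
        for gene, value in enumerate(row):
            if value == 0:
                imputed[cell][gene] = sum(expr[n][gene] for n in nbrs) / len(nbrs)
    return imputed


# Hypothetical toy matrix: 3 cells x 2 genes
expr = [[0.0, 2.0], [4.0, 0.0], [2.0, 6.0]]
neighbors = {0: [1, 2], 1: [2]}

imputed = impute_zeros(expr, neighbors)
```

The quality of such an imputation depends almost entirely on how well the neighbor graph reflects true cell similarity, which is the part the abstract attributes to the graph embedding.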
Affiliation(s)
- Xiaobin Wu
  - Department of Biomedical Informatics, Center for Noncoding RNA Medicine, School of Basic Medical Sciences, Peking University, Beijing, China
  - MOE Key Laboratory of Molecular Cardiovascular Sciences, Peking University, Beijing, China
- Yuan Zhou
  - Department of Biomedical Informatics, Center for Noncoding RNA Medicine, School of Basic Medical Sciences, Peking University, Beijing, China
  - MOE Key Laboratory of Molecular Cardiovascular Sciences, Peking University, Beijing, China