1
Wang S, Yu ZG, Han GS. MVSLLnc: LncRNA subcellular localization prediction based on multi-source features and two-stage voting strategy. Methods 2025; 234:324-332. [PMID: 39837434 DOI: 10.1016/j.ymeth.2025.01.013] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/15/2024] [Revised: 12/28/2024] [Accepted: 01/16/2025] [Indexed: 01/23/2025] Open
Abstract
The subcellular localization of long non-coding RNAs (lncRNAs) is crucial for understanding their function. Because traditional biological experiments are time-consuming and some existing computational methods require high computing power, we sought a simple, easy-to-implement method that predicts lncRNA subcellular localization more efficiently. In this work, we propose MVSLLnc, a model based on multi-source features and a two-stage voting strategy for predicting the subcellular localization of lncRNAs. The multi-source features include k-mer frequency, features based on the coordinate values of Chaos Game Representation (CGR), and features based on physicochemical properties (PhyChe). We feed the multi-source features into the traditional machine learning classifiers RF, SVM and XGBoost, respectively, and make the final prediction with the two-stage voting strategy. Experimental results on three benchmark datasets show that the accuracy reaches 0.829, 0.793 and 0.968, respectively. The accuracies on three independent test sets are 0.642, 0.737 and 0.518, respectively, which are competitive with existing methods. Our ablation analyses show that the two-stage voting strategy makes full use of the advantages of multi-source features and multiple classifiers and yields more robust results.
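As an illustration of two of the feature types named in this abstract, the following minimal sketch computes normalized k-mer frequencies and Chaos Game Representation coordinates for a nucleotide sequence. The function names, the default k = 3, the conversion of U to T, and the assignment of bases to the corners of the unit square are assumptions made for this example; the abstract does not specify them, and the paper's actual feature extraction may differ.

```python
from itertools import product


def kmer_frequencies(seq, k=3, alphabet="ACGT"):
    """Return normalized k-mer frequencies in lexicographic k-mer order."""
    seq = seq.upper().replace("U", "T")  # treat RNA and DNA letters alike
    kmers = ["".join(p) for p in product(alphabet, repeat=k)]
    counts = dict.fromkeys(kmers, 0)
    total = 0
    for i in range(len(seq) - k + 1):
        window = seq[i:i + k]
        if window in counts:  # skip windows with ambiguous bases such as N
            counts[window] += 1
            total += 1
    return [counts[m] / total if total else 0.0 for m in kmers]


# Conventional CGR corner assignment (an assumption; the paper may differ).
CORNERS = {"A": (0.0, 0.0), "C": (0.0, 1.0), "G": (1.0, 1.0), "T": (1.0, 0.0)}


def cgr_coordinates(seq):
    """Return the CGR point visited after each base, starting from (0.5, 0.5)."""
    seq = seq.upper().replace("U", "T")
    x, y = 0.5, 0.5
    points = []
    for base in seq:
        if base not in CORNERS:
            continue  # ignore ambiguous bases
        cx, cy = CORNERS[base]
        # Move halfway from the current point toward the corner of this base.
        x, y = (x + cx) / 2, (y + cy) / 2
        points.append((x, y))
    return points
```

For a sequence of length n, `kmer_frequencies` yields a fixed-length vector of 4^k values, whereas `cgr_coordinates` yields up to n points, so the coordinates would still need to be summarized (for example by averaging) before being passed to a classifier.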
Affiliation(s)
- Sheng Wang
- National Center for Applied Mathematics in Hunan, Xiangtan University, Hunan 411105, China; Key Laboratory of Intelligent Computing and Information Processing of Ministry of Education, Xiangtan University, Hunan 411105, China
- Zu-Guo Yu
- National Center for Applied Mathematics in Hunan, Xiangtan University, Hunan 411105, China; Key Laboratory of Intelligent Computing and Information Processing of Ministry of Education, Xiangtan University, Hunan 411105, China
- Guo-Sheng Han
- National Center for Applied Mathematics in Hunan, Xiangtan University, Hunan 411105, China; Key Laboratory of Intelligent Computing and Information Processing of Ministry of Education, Xiangtan University, Hunan 411105, China
2
Wu Y, Xie X, Zhu J, Guan L, Li M. Overview and Prospects of DNA Sequence Visualization. Int J Mol Sci 2025; 26:477. [PMID: 39859192 PMCID: PMC11764684 DOI: 10.3390/ijms26020477] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/19/2024] [Revised: 12/30/2024] [Accepted: 01/04/2025] [Indexed: 01/27/2025] Open
Abstract
Due to advances in big data technology, deep learning, and knowledge engineering, biological sequence visualization has been extensively explored. In the post-genome era, biological sequence visualization enables the visual representation of both structured and unstructured biological sequence data. However, a universal visualization method for all types of sequences has not been reported. Biological sequence data are expanding exponentially, and the acquisition, extraction, fusion, and inference of knowledge from biological sequences are critical supporting technologies for visualization research. These areas are important and require in-depth exploration. This paper presents a comprehensive overview of visualization methods for DNA sequences from four perspectives (two-dimensional, three-dimensional, four-dimensional, and dynamic visualization approaches) and discusses the strengths and limitations of each method in detail. Furthermore, this paper proposes two potential future research directions for biological sequence visualization in response to the challenges of inefficient graphical feature extraction and knowledge association network generation in existing methods. The first direction is the construction of knowledge graphs for biological sequence big data, and the second is the cross-modal visualization of biological sequences using machine learning methods. This review is anticipated to provide valuable insights for computational biology, bioinformatics, genomic computing, genetic breeding, evolutionary analysis, and other related disciplines in the fields of biology, medicine, chemistry, statistics, and computing. It also has reference value for biological sequence recommendation systems and knowledge question answering systems.
Affiliation(s)
- Mengshan Li
- School of Mathematics and Computer Science, Gannan Normal University, Ganzhou 341000, China
3
Shang J, Zhao L, He X, Meng X, Zhang L, Ge D, Li F, Liu JX. SGFCCDA: Scale Graph Convolutional Networks and Feature Convolution for circRNA-Disease Association Prediction. IEEE J Biomed Health Inform 2024; 28:7006-7014. [PMID: 39250355 DOI: 10.1109/jbhi.2024.3456478] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 09/11/2024]
Abstract
Circular RNAs (circRNAs) have emerged as a novel class of non-coding RNAs with regulatory roles in disease pathogenesis. Computational models aimed at predicting circRNA-disease associations offer valuable insights into disease mechanisms, thereby enabling the development of innovative diagnostic and therapeutic approaches while reducing the reliance on costly wet experiments. In this study, SGFCCDA is proposed for predicting potential circRNA-disease associations based on scale graph convolutional networks and feature convolution. Specifically, SGFCCDA integrates multiple measures of circRNA and disease similarity and combines known association information to construct a heterogeneous network. This network is then explored by scale graph convolutional networks to capture both topological and attribute information. Additionally, convolutional neural networks are employed to further learn the features and obtain higher-order feature representations containing richer information about nodes. The Hadamard product is utilized to effectively combine circRNA features with disease features, and a multilayer perceptron is applied to predict the association between each pair of circRNA and disease. Five-fold cross validation experiments conducted on the CircR2Disease dataset demonstrate the accurate prediction capabilities of SGFCCDA in identifying potential circRNA-disease associations. Furthermore, case studies provide further confirmation of SGFCCDA's ability to identify disease-associated circRNAs.
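The final scoring step described in this abstract (a Hadamard product of circRNA and disease features followed by a multilayer perceptron) can be sketched as follows. This is a minimal illustration in plain Python, not the SGFCCDA implementation: the single hidden layer, the ReLU and sigmoid activations, and all parameter names are assumptions, and in practice the weights would be learned rather than supplied by hand.

```python
import math


def hadamard(u, v):
    """Element-wise (Hadamard) product of two equal-length feature vectors."""
    if len(u) != len(v):
        raise ValueError("feature vectors must have the same length")
    return [a * b for a, b in zip(u, v)]


def mlp_score(x, w1, b1, w2, b2):
    """Score a pair with a one-hidden-layer MLP (ReLU hidden, sigmoid output).

    w1 is a list of hidden-unit weight rows, b1 the hidden biases,
    w2 the output weights and b2 the output bias (all hypothetical names).
    """
    hidden = [
        max(0.0, sum(w * xi for w, xi in zip(row, x)) + b)
        for row, b in zip(w1, b1)
    ]
    logit = sum(w * h for w, h in zip(w2, hidden)) + b2
    return 1.0 / (1.0 + math.exp(-logit))


def association_score(circ_features, disease_features, w1, b1, w2, b2):
    """Combine the two feature vectors and return a score in (0, 1)."""
    pair = hadamard(circ_features, disease_features)
    return mlp_score(pair, w1, b1, w2, b2)
```

The Hadamard product keeps the pair representation at the same dimension as each input vector and is symmetric in its arguments, which is one plausible reason to prefer it over concatenation here.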
4
Cen K, Xing Z, Wang X, Wang Y, Li J. circ2DGNN: circRNA-Disease Association Prediction via Transformer-Based Graph Neural Network. IEEE/ACM Trans Comput Biol Bioinform 2024; 21:2556-2567. [PMID: 39475749 DOI: 10.1109/tcbb.2024.3488281] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/03/2025]
Abstract
Investigating the associations between circRNAs and diseases is vital for comprehending the underlying mechanisms of diseases and formulating effective therapies. Computational prediction methods often rely solely on known circRNA-disease data, incorporating the effects of other biomolecules only indirectly by computing circRNA and disease similarities based on these molecules. However, this approach is limited, as other biomolecules also play significant roles in circRNA-disease interactions. To address this, we construct a comprehensive heterogeneous network incorporating data on human circRNAs, diseases, and other biomolecule interactions to develop a novel computational model, circ2DGNN, which is built upon a heterogeneous graph neural network. circ2DGNN takes heterogeneous networks directly as input and obtains an embedded representation of each node for downstream link prediction through graph representation learning. circ2DGNN employs a Transformer-like architecture, which computes a heterogeneous attention score for each edge and performs message propagation and aggregation, using a residual connection to enhance the representation vector. It applies the same parameter matrix only to identical meta-relationships, so that different relationship types have their own parameter spaces. After fine-tuning hyperparameters via five-fold cross-validation, evaluation on a test dataset shows that circ2DGNN outperforms existing state-of-the-art (SOTA) methods.
5
Wang XF, Huang L, Wang Y, Guan RC, You ZH, Sheng N, Xie XP, Yang QX. A multichannel graph neural network based on multisimilarity modality hypergraph contrastive learning for predicting unknown types of cancer biomarkers. Brief Bioinform 2024; 25:bbae575. [PMID: 39523624 PMCID: PMC11551052 DOI: 10.1093/bib/bbae575] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/05/2024] [Revised: 10/19/2024] [Accepted: 11/01/2024] [Indexed: 11/16/2024] Open
Abstract
Identifying potential cancer biomarkers is a key task in biomedical research, providing a promising avenue for the diagnosis and treatment of human tumors and cancers. In recent years, several machine learning-based RNA-disease association prediction techniques have emerged. However, they primarily focus on modeling relationships of a single type, overlooking the importance of gaining insights into molecular behaviors from a complete regulatory network perspective and discovering biomarkers of unknown types. Furthermore, effectively handling local and global topological structural information of nodes in biological molecular regulatory graphs remains a challenge to improving biomarker prediction performance. To address these limitations, we propose a multichannel graph neural network based on multisimilarity modality hypergraph contrastive learning (MML-MGNN) for predicting unknown types of cancer biomarkers. MML-MGNN leverages multisimilarity modality hypergraph contrastive learning to delve into local associations in the regulatory network, learning diverse insights into the topological structures of multiple types of similarities, and then globally modeling the multisimilarity modalities through a multichannel graph autoencoder. By combining representations obtained from local-level associations and global-level regulatory graphs, MML-MGNN can acquire molecular feature descriptors benefiting from multitype association properties and the complete regulatory network. Experimental results on predicting three different types of cancer biomarkers demonstrate the outstanding performance of MML-MGNN. Furthermore, a case study on gastric cancer underscores the outstanding ability of MML-MGNN to gain deeper insights into molecular mechanisms in regulatory networks and prominent potential in cancer biomarker prediction.
Affiliation(s)
- Xin-Fei Wang
- Key Laboratory of Symbol Computation and Knowledge Engineering of Ministry of Education, College of Computer Science and Technology, Jilin University, No. 2699, Qianjin Street, Changchun 130012, China
- Lan Huang
- Key Laboratory of Symbol Computation and Knowledge Engineering of Ministry of Education, College of Computer Science and Technology, Jilin University, No. 2699, Qianjin Street, Changchun 130012, China
- Yan Wang
- Key Laboratory of Symbol Computation and Knowledge Engineering of Ministry of Education, College of Computer Science and Technology, Jilin University, No. 2699, Qianjin Street, Changchun 130012, China
- Ren-Chu Guan
- Key Laboratory of Symbol Computation and Knowledge Engineering of Ministry of Education, College of Computer Science and Technology, Jilin University, No. 2699, Qianjin Street, Changchun 130012, China
- Zhu-Hong You
- School of Computer Science, Northwestern Polytechnical University, Youyi West Road, Xi'an, 710072, China
- Nan Sheng
- Key Laboratory of Symbol Computation and Knowledge Engineering of Ministry of Education, College of Computer Science and Technology, Jilin University, No. 2699, Qianjin Street, Changchun 130012, China
- Xu-Ping Xie
- Key Laboratory of Symbol Computation and Knowledge Engineering of Ministry of Education, College of Computer Science and Technology, Jilin University, No. 2699, Qianjin Street, Changchun 130012, China
- Qi-Xing Yang
- Key Laboratory of Symbol Computation and Knowledge Engineering of Ministry of Education, College of Computer Science and Technology, Jilin University, No. 2699, Qianjin Street, Changchun 130012, China
6
Zhou Z, Du Z, Jiang X, Zhuo L, Xu Y, Fu X, Liu M, Zou Q. GAM-MDR: probing miRNA-drug resistance using a graph autoencoder based on random path masking. Brief Funct Genomics 2024; 23:475-483. [PMID: 38391194 DOI: 10.1093/bfgp/elae005] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/07/2023] [Revised: 01/15/2024] [Accepted: 01/31/2024] [Indexed: 02/24/2024] Open
Abstract
MicroRNAs (miRNAs) are found ubiquitously in biological cells and play a pivotal role in regulating the expression of numerous target genes. Therapies centered around miRNAs are emerging as a promising strategy for disease treatment, aiming to intervene in disease progression by modulating abnormal miRNA expression. The accurate prediction of miRNA-drug resistance (MDR) is crucial for the success of miRNA therapies. Computational models based on deep learning have demonstrated exceptional performance in predicting potential MDRs. However, their effectiveness can be compromised by errors in the data acquisition process, leading to inaccurate node representations. To address this challenge, we introduce the GAM-MDR model, which combines the graph autoencoder (GAE) with random path masking techniques to precisely predict potential MDRs. The reliability and effectiveness of the GAM-MDR model are mainly reflected in two aspects. First, it efficiently extracts the representations of miRNA and drug nodes in the miRNA-drug network. Second, our random path masking strategy efficiently reconstructs critical paths in the network, thereby reducing the adverse impact of noisy data. To our knowledge, this is the first time that a random path masking strategy has been integrated into a GAE to infer MDRs. Our method was subjected to multiple validations on public datasets and yielded promising results. We are optimistic that our model could offer valuable insights for miRNA therapeutic strategies and deepen the understanding of the regulatory mechanisms of miRNAs. Our data and code are publicly available on GitHub: https://github.com/ZZCrazy00/GAM-MDR.
Affiliation(s)
- Zhecheng Zhou
- Wenzhou University of Technology, 325000, Wenzhou, China
- Zhenya Du
- Guangzhou Xinhua University, 510520, Guangzhou, China
- Xin Jiang
- Wenzhou University of Technology, 325000, Wenzhou, China
- Linlin Zhuo
- Wenzhou University of Technology, 325000, Wenzhou, China
- Yixin Xu
- West China School of Pharmacy Sichuan University, 610041, Chengdu, China
- Xiangzheng Fu
- College of Computer Science and Electronic Engineering, Hunan University, 410006, Changsha, China
- Mingzhe Liu
- Wenzhou University of Technology, 325000, Wenzhou, China
- Quan Zou
- Institute of Fundamental and Frontier Sciences, University of Electronic Science and Technology of China, 611730, Chengdu, China
7
Yuan C, Zhou F, Xu Z, Wu D, Hou P, Yang D, Pan L, Wang P. Functionalized DNA Origami-Enabled Detection of Biomarkers. Chembiochem 2024; 25:e202400227. [PMID: 38700476 DOI: 10.1002/cbic.202400227] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/12/2024] [Revised: 04/30/2024] [Accepted: 05/03/2024] [Indexed: 05/05/2024]
Abstract
Biomarkers are crucial physiological and pathological indicators in the host. Over the years, numerous detection methods have been developed for biomarkers, given their significant potential in various biological and biomedical applications. Among these, detection systems based on functionalized DNA origami have emerged as a promising approach because they offer precise control over sensing modules, enabling sensitive, specific, and programmable biomarker detection. We summarize the advancements in biomarker detection using functionalized DNA origami, focusing on strategies for DNA origami functionalization, mechanisms of biomarker recognition, and applications in disease diagnosis and monitoring. These applications are organized into sections based on the type of biomarker (nucleic acids, proteins, small molecules, and ions), and the review concludes with a discussion of the advantages and challenges associated with using functionalized DNA origami systems for biomarker detection.
Affiliation(s)
- Caiqing Yuan
- College of Chemistry and Materials Science, Shanghai Normal University, Shanghai, 200233, China
- Institute of Molecular Medicine, Department of Laboratory Medicine, Shanghai Key Laboratory for Nucleic Acid Chemistry and Nanomedicine, Renji Hospital, School of Medicine, Shanghai Jiao Tong University, Shanghai, 200127, China
- Fei Zhou
- Institute of Molecular Medicine, Department of Laboratory Medicine, Shanghai Key Laboratory for Nucleic Acid Chemistry and Nanomedicine, Renji Hospital, School of Medicine, Shanghai Jiao Tong University, Shanghai, 200127, China
- School of Life Sciences, Shanghai University, Shanghai, 200444, China
- Zhihao Xu
- Institute of Molecular Medicine, Department of Laboratory Medicine, Shanghai Key Laboratory for Nucleic Acid Chemistry and Nanomedicine, Renji Hospital, School of Medicine, Shanghai Jiao Tong University, Shanghai, 200127, China
- School of Life Sciences, Shanghai University, Shanghai, 200444, China
- Dunkai Wu
- College of Chemistry and Materials Science, Shanghai Normal University, Shanghai, 200233, China
- Institute of Molecular Medicine, Department of Laboratory Medicine, Shanghai Key Laboratory for Nucleic Acid Chemistry and Nanomedicine, Renji Hospital, School of Medicine, Shanghai Jiao Tong University, Shanghai, 200127, China
- Pengfei Hou
- College of Chemistry and Materials Science, Shanghai Normal University, Shanghai, 200233, China
- Institute of Molecular Medicine, Department of Laboratory Medicine, Shanghai Key Laboratory for Nucleic Acid Chemistry and Nanomedicine, Renji Hospital, School of Medicine, Shanghai Jiao Tong University, Shanghai, 200127, China
- Donglei Yang
- Institute of Molecular Medicine, Department of Laboratory Medicine, Shanghai Key Laboratory for Nucleic Acid Chemistry and Nanomedicine, Renji Hospital, School of Medicine, Shanghai Jiao Tong University, Shanghai, 200127, China
- Li Pan
- Institute of Molecular Medicine, Department of Laboratory Medicine, Shanghai Key Laboratory for Nucleic Acid Chemistry and Nanomedicine, Renji Hospital, School of Medicine, Shanghai Jiao Tong University, Shanghai, 200127, China
- Pengfei Wang
- Institute of Molecular Medicine, Department of Laboratory Medicine, Shanghai Key Laboratory for Nucleic Acid Chemistry and Nanomedicine, Renji Hospital, School of Medicine, Shanghai Jiao Tong University, Shanghai, 200127, China
8
Zhang Y, Wang Z, Wei H, Chen M. Exploring potential circRNA biomarkers for cancers based on double-line heterogeneous graph representation learning. BMC Med Inform Decis Mak 2024; 24:159. [PMID: 38844961 PMCID: PMC11157868 DOI: 10.1186/s12911-024-02564-6] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/10/2024] [Accepted: 06/04/2024] [Indexed: 06/09/2024] Open
Abstract
BACKGROUND Compared with time-consuming and labor-intensive biological validation in vitro or in vivo, computational models can provide high-quality, targeted candidates almost instantly. Existing computational models are limited in how effectively they use sparse local structural information to predict circRNA-disease associations accurately. This study addresses this challenge with a proposed method, CDA-DGRL (Prediction of CircRNA-Disease Association based on Double-line Graph Representation Learning), which employs a deep learning framework leveraging graph networks and a dual-line representation model integrating graph node features. METHOD CDA-DGRL comprises several key steps. First, diverse biological information is integrated to compute integrated similarities among circRNAs and diseases, leading to the construction of a heterogeneous network specific to circRNA-disease associations. Second, circRNA and disease node features are derived using sparse autoencoders. Third, a graph convolutional neural network takes the circRNA-disease heterogeneous network and the node features as input to capture the local graph network structure. Fourth, node2vec performs depth-first sampling of the circRNA-disease heterogeneous network to capture the global graph network structure, addressing issues associated with sparse raw data. Finally, the fused local and global graph network structures are fed into an extra trees classifier to identify potential circRNA-disease associations. RESULTS The results, obtained through five-fold cross-validation on the circR2Disease dataset, demonstrate the superiority of CDA-DGRL over existing state-of-the-art models, with an AUC value of 0.9866 and an AUPR value of 0.9897. Notably, the extra trees (extremely randomized trees) classifier employed in this model outperforms other machine learning classifiers.
CONCLUSION Thus, CDA-DGRL is a promising method for reliably identifying circRNA-disease associations, potentially reducing the need for extensive traditional biological experiments. The source code and data for this study are available at https://github.com/zywait/CDA-DGRL .
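For readers unfamiliar with the AUC metric reported in this abstract, the sketch below computes the area under the ROC curve directly from its rank interpretation: the probability that a randomly chosen positive pair is scored higher than a randomly chosen negative pair, with ties counted as one half. The function name and the quadratic pairwise loop are choices made for clarity in this example; they are not taken from the CDA-DGRL code, and a library routine would normally be used on real data.

```python
def roc_auc(labels, scores):
    """Area under the ROC curve via pairwise comparison of positives and negatives.

    labels: iterable of 0/1 ground-truth values.
    scores: iterable of predicted scores, higher meaning more likely positive.
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one positive and one negative")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0   # positive ranked above negative
            elif p == n:
                wins += 0.5   # ties count as half
    return wins / (len(positives) * len(negatives))
```

Under five-fold cross-validation, a value like this would typically be computed on each held-out fold and then averaged, although the abstract does not state exactly how the reported figure was aggregated.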
Affiliation(s)
- Yi Zhang
- School of Computer Science and Engineering, Guilin University of Technology, Guilin, 541004, China
- Guangxi Key Laboratory of Embedded Technology and Intelligent System, Guilin University of Technology, Guilin, 541004, China
- ZhenMei Wang
- School of Big Data, Guangxi Vocational and Technical College, Nanning, 530003, China
- Hanyan Wei
- Pharmacy School, Guilin Medical University, Guilin, 541004, China
- Min Chen
- School of Computer Science and Technology, Hunan Institute of Technology, Hengyang, 421010, China
9
Zhao YX, Yu CQ, Li LP, Wang DW, Song HF, Wei Y. BJLD-CMI: a predictive circRNA-miRNA interactions model combining multi-angle feature information. Front Genet 2024; 15:1399810. [PMID: 38798699 PMCID: PMC11116695 DOI: 10.3389/fgene.2024.1399810] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/13/2024] [Accepted: 04/03/2024] [Indexed: 05/29/2024] Open
Abstract
A growing body of research suggests that circular RNA (circRNA) plays a crucial role in the pathogenesis of complex human diseases by binding to miRNA. Identifying their potential interactions is of paramount importance for the diagnosis and treatment of diseases. However, traditional biological wet experiments are characterized by long cycles, small scales, and time-consuming processes. Consequently, the use of efficient computational models to predict the interactions between circRNAs and miRNAs is gradually becoming mainstream. In this study, we present a new prediction model named BJLD-CMI. The model extracts circRNA and miRNA sequence features using the Jaccard and BERT methods and integrates them to obtain circRNA-miRNA interaction (CMI) attribute features, and then uses the graph embedding method LINE to extract CMI behavioral features from the known circRNA-miRNA association graph. We then predict potential circRNA-miRNA interactions by fusing the multi-angle feature information, such as attribute and behavior features, through Autoencoder in Autoencoder Networks. BJLD-CMI attained areas under the ROC curve of 94.95% and 90.69% on the CMI-9589 and CMI-9905 datasets, respectively. Compared with existing models, BJLD-CMI exhibits the best overall performance. In the case study, a PubMed literature search confirmed seven of the top 10 predicted CMIs. These results suggest that BJLD-CMI is an effective method for predicting interactions between circRNAs and miRNAs. It provides valuable candidates for biological wet experiments and can reduce the burden on researchers.
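The abstract names Jaccard as one source of sequence features but does not say what sets it is computed over. One common reading, assumed here purely for illustration, is the Jaccard similarity between the sets of k-mers occurring in two sequences. The function names and the default k = 3 below are hypothetical and are not taken from BJLD-CMI.

```python
def kmer_set(seq, k=3):
    """Return the set of distinct k-mers occurring in a sequence."""
    seq = seq.upper()
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def jaccard(a, b):
    """Jaccard similarity |A & B| / |A | B| of two sets (0.0 if both are empty)."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def sequence_similarity(seq1, seq2, k=3):
    """Jaccard similarity between the k-mer sets of two sequences."""
    return jaccard(kmer_set(seq1, k), kmer_set(seq2, k))
```

Computing this similarity between one sequence and every other sequence of the same type gives a similarity profile that could serve as an attribute vector, which is one way such a measure might be turned into features.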
Affiliation(s)
- Yi-Xin Zhao
- School of Information Engineering, Xijing University, Xi’an, China
- Chang-Qing Yu
- School of Information Engineering, Xijing University, Xi’an, China
- Li-Ping Li
- School of Information Engineering, Xijing University, Xi’an, China
- College of Grassland and Environment Sciences, Xinjiang Agricultural University, Ürümqi, China
- Deng-Wu Wang
- School of Information Engineering, Xijing University, Xi’an, China
- Hui-Fan Song
- School of Information Engineering, Xijing University, Xi’an, China
- Yu Wei
- School of Information Engineering, Xijing University, Xi’an, China
10
Yang J, Lei X, Zhang F. Identification of circRNA-disease associations via multi-model fusion and ensemble learning. J Cell Mol Med 2024; 28:e18180. [PMID: 38506066 PMCID: PMC10951890 DOI: 10.1111/jcmm.18180] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/18/2023] [Revised: 01/21/2024] [Accepted: 02/05/2024] [Indexed: 03/21/2024] Open
Abstract
Circular RNA (circRNA) is a common non-coding RNA that plays an important role in the diagnosis and therapy of human diseases. Predicting circRNA-disease associations with computational methods can provide a new route to better clinical diagnosis. In this article, we propose a novel ensemble learning method for circRNA-disease association prediction, named ELCDA. First, a heterogeneous association network is constructed by collecting multiple sources of information on circRNAs and diseases, using multiple similarity measures. Then, metapath, matrix factorization and GraphSAGE-based models are used to extract node features from different views, and the final comprehensive features of circRNAs and diseases are obtained via ensemble learning. Finally, a soft voting ensemble strategy integrates the predictions of all classifiers. The performance of ELCDA is evaluated by fivefold cross-validation and compared with other state-of-the-art methods; the experimental results show that ELCDA outperforms the others. Furthermore, case studies on three common diseases also demonstrate that ELCDA is an effective method for predicting circRNA-disease associations.
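The soft voting step mentioned in this abstract can be illustrated with a short sketch: each classifier outputs a probability for every class, the probabilities are averaged (optionally with weights), and the class with the highest average wins. The function name, the optional weights, and the example numbers are assumptions for illustration; the abstract does not describe how ELCDA weights its classifiers.

```python
def soft_vote(probabilities, weights=None):
    """Combine per-classifier class probabilities by (weighted) averaging.

    probabilities: one list of class probabilities per classifier.
    weights: optional per-classifier weights; equal weights if omitted.
    Returns (predicted_class_index, averaged_probabilities).
    """
    if not probabilities:
        raise ValueError("at least one classifier output is required")
    n_models = len(probabilities)
    n_classes = len(probabilities[0])
    if weights is None:
        weights = [1.0] * n_models
    if len(weights) != n_models:
        raise ValueError("one weight per classifier is required")
    total = sum(weights)
    averaged = [
        sum(w * p[c] for w, p in zip(weights, probabilities)) / total
        for c in range(n_classes)
    ]
    label = max(range(n_classes), key=averaged.__getitem__)
    return label, averaged


# Three hypothetical classifiers scoring one circRNA-disease pair
# as [P(no association), P(association)].
label, averaged = soft_vote([[0.6, 0.4], [0.3, 0.7], [0.4, 0.6]])
```

Unlike hard (majority) voting, soft voting lets a confident classifier outweigh two marginal ones, which is the usual motivation for preferring it when the base classifiers produce calibrated probabilities.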
Affiliation(s)
- Jing Yang
- School of Computer Science, Shaanxi Normal University, Xi'an, Shaanxi, China
- Xiujuan Lei
- School of Computer Science, Shaanxi Normal University, Xi'an, Shaanxi, China
- Fa Zhang
- School of Medical Technology, Beijing Institute of Technology, Beijing, China
11
Turgut H, Turanli B, Boz B. DCDA: CircRNA-Disease Association Prediction with Feed-Forward Neural Network and Deep Autoencoder. Interdiscip Sci 2024; 16:91-103. [PMID: 37978116 DOI: 10.1007/s12539-023-00590-y] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/20/2023] [Revised: 10/13/2023] [Accepted: 10/15/2023] [Indexed: 11/19/2023]
Abstract
Circular RNA is a single-stranded RNA with a closed-loop structure. In recent years, research has revealed that circular RNAs play critical roles in biological processes and are related to human diseases. The discovery of potential circRNAs as disease biomarkers and drug targets is crucial, since it can help diagnose diseases at an early stage and support treatment. However, conventional experiments for detecting associations between circular RNAs and diseases are time-consuming and costly. To overcome this problem, various computational methods have been proposed to extract essential features for both circular RNAs and diseases and to predict their associations. Studies have shown that computational methods achieve good predictive performance and make it possible to detect circular RNAs that are likely to be highly related to diseases. This study proposes a deep learning-based circRNA-disease association predictor called DCDA, which uses multiple data sources to create circRNA and disease features, reveals hidden feature codings of a circular RNA-disease pair with a deep autoencoder, and then predicts the relation score of the pair with a deep neural network. Fivefold cross-validation results on the benchmark dataset showed that our model outperforms state-of-the-art prediction methods in the literature, with an AUC score of 0.9794.
Affiliation(s)
- Hacer Turgut
- Computer Engineering Department, Marmara University, 34854, Istanbul, Türkiye
- Beste Turanli
- Bioengineering Department, Marmara University, 34854, Istanbul, Türkiye
- Betül Boz
- Computer Engineering Department, Marmara University, 34854, Istanbul, Türkiye
12
Wang L, Li ZW, You ZH, Huang DS, Wong L. MAGCDA: A Multi-Hop Attention Graph Neural Networks Method for CircRNA-Disease Association Prediction. IEEE J Biomed Health Inform 2024; 28:1752-1761. [PMID: 38145538 DOI: 10.1109/jbhi.2023.3346821] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 12/27/2023]
Abstract
A growing body of evidence establishes that circular RNAs (circRNAs) are widely present in eukaryotic cells and contribute significantly to the occurrence and development of many complex human diseases. Disease-associated circRNAs can serve as clinical diagnostic biomarkers and therapeutic targets, providing novel ideas for biopharmaceutical research. However, available computational methods for predicting circRNA-disease associations (CDAs) do not sufficiently consider the contextual information of biological network nodes, which limits their performance. In this work, we propose MAGCDA, an approach based on a multi-hop attention graph neural network, to infer potential CDAs. Specifically, we first construct a multi-source attribute heterogeneous network of circRNAs and diseases, then use a multi-hop strategy over graph nodes to deeply aggregate node context information through attention diffusion, thereby enhancing topological structure information and mining hidden features of the data, and finally use random forest to infer potential CDAs. On four gold standard data sets, MAGCDA achieved prediction accuracies of 92.58%, 91.42%, 83.46% and 91.12%, respectively. MAGCDA also performed strongly in ablation experiments and in comparisons with other models. Additionally, 18 and 17 of the top 20 circRNAs ranked by MAGCDA prediction score were confirmed in case studies of the complex diseases breast cancer and Alzheimer's disease, respectively. These results suggest that MAGCDA can be a practical tool for exploring potential disease-associated circRNAs and can provide a theoretical basis for disease diagnosis and treatment.
|
13
|
Niu M, Wang C, Zhang Z, Zou Q. A computational model of circRNA-associated diseases based on a graph neural network: prediction and case studies for follow-up experimental validation. BMC Biol 2024; 22:24. [PMID: 38281919 PMCID: PMC10823650 DOI: 10.1186/s12915-024-01826-z] [Citation(s) in RCA: 8] [Impact Index Per Article: 8.0] [Received: 07/20/2023] [Accepted: 01/11/2024] [Indexed: 01/30/2024]
Abstract
BACKGROUND Circular RNAs (circRNAs) have been confirmed to play a vital role in the occurrence and development of diseases. Exploring the relationship between circRNAs and diseases is of far-reaching significance for studying etiopathogenesis and treating diseases. To this end, based on the graph Markov neural network algorithm (GMNN) constructed in our previous work GMNN2CD, we further considered the multisource biological data that affect the association between circRNA and disease, developed an updated web server, CircDA, and used human hepatocellular carcinoma (HCC) tissue data to verify the prediction results of CircDA. RESULTS CircDA is built on a Tumarkov-based deep learning framework. The algorithm regards biomolecules as nodes and the interactions between molecules as edges, reasonably abstracts multiomics data, and models them as a heterogeneous biomolecular association network, which can reflect the complex relationship between different biomolecules. Case studies using literature data from HCC, cervical, and gastric cancers demonstrate that the CircDA predictor can identify missing associations between known circRNAs and diseases. In a quantitative real-time PCR (RT-qPCR) experiment on human HCC tissue samples, five circRNAs were found to be significantly differentially expressed, which indicates that CircDA can predict diseases related to new circRNAs. CONCLUSIONS This efficient computational prediction and case analysis with sufficient feedback allows us to identify circRNA-associated diseases and disease-associated circRNAs. Our work provides a method to predict circRNA-associated diseases and can provide guidance for the association of diseases with certain circRNAs. For ease of use, an online prediction server ( http://server.malab.cn/CircDA ) is provided, and the code is open-sourced ( https://github.com/nmt315320/CircDA.git ) for the convenience of algorithm improvement.
Affiliation(s)
- Mengting Niu
- School of Electronic and Communication Engineering, Shenzhen Polytechnic University, Shenzhen, 518055, China
- School of Life Science and Technology, University of Electronic Science and Technology of China, Chengdu, China
| | - Chunyu Wang
- Faculty of Computing, Harbin Institute of Technology, Harbin, 150000, Heilongjiang, China
| | - Zhanguo Zhang
- Hepatic Surgery Center, Tongji Hospital, Tongji Medical College, Huazhong University of Science and Technology, 1095 Jiefang Avenue, Wuhan, 430030, China.
| | - Quan Zou
- Institute of Fundamental and Frontier Sciences, University of Electronic Science and Technology of China, No. 4 Block 2 North Jianshe Road, Chengdu, 610054, China.
- Yangtze Delta Region Institute (Quzhou), University of Electronic Science and Technology of China, Quzhou, China.
|
14
|
Chen L, Zhao X. PCDA-HNMP: Predicting circRNA-disease association using heterogeneous network and meta-path. MATHEMATICAL BIOSCIENCES AND ENGINEERING : MBE 2023; 20:20553-20575. [PMID: 38124565 DOI: 10.3934/mbe.2023909] [Citation(s) in RCA: 7] [Impact Index Per Article: 3.5] [Indexed: 12/23/2023]
Abstract
Increasing amounts of experimental studies have shown that circular RNAs (circRNAs) play important regulatory roles in human diseases through interactions with related microRNAs (miRNAs). CircRNAs have become new potential disease biomarkers and therapeutic targets. Predicting circRNA-disease association (CDA) is of great significance for exploring the pathogenesis of complex diseases, which can improve the diagnosis level of diseases and promote the targeted therapy of diseases. However, determination of CDAs through traditional clinical trials is usually time-consuming and expensive. Computational methods are now alternative ways to predict CDAs. In this study, a new computational method, named PCDA-HNMP, was designed. For obtaining informative features of circRNAs and diseases, a heterogeneous network was first constructed, which defined circRNAs, mRNAs, miRNAs and diseases as nodes and associations between them as edges. Then, a deep analysis was conducted on the heterogeneous network by extracting meta-paths connecting to circRNAs (diseases), thereby mining hidden associations between various circRNAs (diseases). These associations constituted the meta-path-induced networks for circRNAs and diseases. The features of circRNAs and diseases were derived from the aforementioned networks via mashup. On the other hand, miRNA-disease associations (mDAs) were employed to improve the model's performance. miRNA features were yielded from the meta-path-induced networks on miRNAs and circRNAs, which were constructed from the meta-paths connecting miRNAs and circRNAs in the heterogeneous network. A concatenation operation was adopted to build the features of CDAs and mDAs. Such representations of CDAs and mDAs were fed into XGBoost to set up the model. The five-fold cross-validation yielded an area under the curve (AUC) of 0.9846, which was better than those of some existing state-of-the-art methods. 
The employment of mDAs can indeed enhance the model's performance, and the importance analysis on meta-path-induced networks showed that networks produced by the meta-paths containing validated CDAs provided the most important contributions.
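The meta-path idea described in this abstract can be illustrated with a small sketch. It counts circRNA-miRNA-circRNA meta-path instances from a toy circRNA-miRNA adjacency matrix, so that two circRNAs sharing miRNAs get a nonzero link in the induced network. The function name `metapath_counts` and the toy matrix are hypothetical, and counting path instances is an assumed, commonly used way to build a meta-path-induced network; the abstract does not confirm that PCDA-HNMP weights its networks this way.

```python
def metapath_counts(adj_ab):
    """Count A-B-A meta-path instances between every pair of A nodes.

    adj_ab[i][k] is 1 when A node i is linked to B node k, else 0.
    The result equals the matrix product adj_ab * transpose(adj_ab).
    """
    n = len(adj_ab)
    return [
        [sum(x * y for x, y in zip(adj_ab[i], adj_ab[j])) for j in range(n)]
        for i in range(n)
    ]


# Hypothetical toy data: 3 circRNAs (rows) x 3 miRNAs (columns).
circ_mirna = [
    [1, 1, 0],
    [0, 1, 1],
    [0, 0, 1],
]

# counts[i][j] is the number of miRNAs shared by circRNA i and circRNA j.
counts = metapath_counts(circ_mirna)
```

Off-diagonal entries of `counts` act as edge weights of the induced circRNA network; the diagonal only records how many miRNAs each circRNA touches and would usually be ignored.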
Affiliation(s)
- Lei Chen
- College of Information Engineering, Shanghai Maritime University, Shanghai 201306, China
| | - Xiaoyu Zhao
- College of Information Engineering, Shanghai Maritime University, Shanghai 201306, China
|
15
|
Wang MN, Xie XJ, You ZH, Wong L, Li LP, Chen ZH. Combining K Nearest Neighbor With Nonnegative Matrix Factorization for Predicting CircRNA-Disease Associations. IEEE/ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS 2023; 20:2610-2618. [PMID: 35675235 DOI: 10.1109/tcbb.2022.3180903] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Indexed: 06/15/2023]
Abstract
Accumulating evidence shows that circular RNAs (circRNAs) play an important role in regulating gene expression and are involved in many complex human diseases. Identifying associations of circRNA with disease helps to understand the pathogenesis, treatment and diagnosis of complex diseases. Since inferring circRNA-disease associations by biological experiments is costly and time-consuming, there is an urgent need to develop a computational model to identify the association between them. In this paper, we proposed a novel method named KNN-NMF, which combines K nearest neighbors with nonnegative matrix factorization to infer associations between circRNA and disease. First, we compute the Gaussian Interaction Profile (GIP) kernel similarity of circRNA and disease and the semantic similarity of disease. Then, new circRNA-disease interaction profiles are established using weighted K nearest neighbors to reduce the impact of false negative associations on prediction performance. Finally, Nonnegative Matrix Factorization is implemented to predict associations of circRNA with disease. The experiment results indicate that KNN-NMF outperforms the competing methods under five-fold cross-validation. Moreover, case studies of two common diseases further show that KNN-NMF can identify potential circRNA-disease associations effectively.
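The first step of this abstract, the GIP kernel similarity, can be sketched as follows. The sketch assumes the commonly used definition, in which the similarity of two entities is exp(-gamma * ||p_i - p_j||^2) over their binary association profiles and gamma is normalised by the mean squared profile norm; the abstract itself does not state the bandwidth, so that choice is an assumption, as are the function name `gip_kernel` and the toy matrix.

```python
import math


def gip_kernel(profiles):
    """Gaussian interaction profile kernel between rows of an association matrix."""
    n = len(profiles)
    sq_norms = [sum(v * v for v in row) for row in profiles]
    mean_sq = sum(sq_norms) / n
    # Assumed convention: bandwidth is the inverse of the mean squared profile norm.
    gamma = 1.0 / mean_sq if mean_sq > 0 else 1.0
    kernel = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            dist_sq = sum((a - b) ** 2 for a, b in zip(profiles[i], profiles[j]))
            kernel[i][j] = math.exp(-gamma * dist_sq)
    return kernel


# Hypothetical toy association matrix: 3 circRNAs (rows) x 4 diseases (columns).
associations = [
    [1, 0, 1, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 0],
]

# circRNA similarity uses the rows; disease similarity uses the columns.
circ_sim = gip_kernel(associations)
disease_sim = gip_kernel([list(col) for col in zip(*associations)])
```

Identical profiles get similarity 1, and the value decays as profiles diverge. The weighted K nearest neighbor and NMF steps would then operate on these similarity matrices.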
|
16
|
Ma Z, Kuang Z, Deng L. NGCICM: A Novel Deep Learning-Based Method for Predicting circRNA-miRNA Interactions. IEEE/ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS 2023; 20:3080-3092. [PMID: 37027645 DOI: 10.1109/tcbb.2023.3248787] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/19/2023]
Abstract
The circRNAs and miRNAs play an important role in the development of human diseases, and they can be widely used as biomarkers of diseases for disease diagnosis. In particular, circRNAs can act as sponge adsorbers for miRNAs and act together in certain diseases. However, the associations between the vast majority of circRNAs and diseases and between miRNAs and diseases remain unclear. Computational-based approaches are urgently needed to discover the unknown interactions between circRNAs and miRNAs. In this paper, we propose a novel deep learning algorithm based on Node2vec and Graph ATtention network (GAT), Conditional Random Field (CRF) layer and Inductive Matrix Completion (IMC) to predict circRNAs and miRNAs interactions (NGCICM). We construct a GAT-based encoder for deep feature learning by fusing the talking-heads attention mechanism and the CRF layer. The IMC-based decoder is also constructed to obtain interaction scores. The Area Under the receiver operating characteristic Curve (AUC) of the NGCICM method is 0.9697, 0.9932 and 0.9980, and the Area Under the Precision-Recall curve (AUPR) is 0.9671, 0.9935 and 0.9981, respectively, using 2-fold, 5-fold and 10-fold Cross-Validation (CV) as the benchmark. The experimental results confirm the effectiveness of the NGCICM algorithm in predicting the interactions between circRNAs and miRNAs.
|
17
|
Wu Q, Deng Z, Zhang W, Pan X, Choi KS, Zuo Y, Shen HB, Yu DJ. MLNGCF: circRNA-disease associations prediction with multilayer attention neural graph-based collaborative filtering. Bioinformatics 2023; 39:btad499. [PMID: 37561093 PMCID: PMC10457666 DOI: 10.1093/bioinformatics/btad499] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/17/2023] [Revised: 06/17/2023] [Accepted: 08/09/2023] [Indexed: 08/11/2023]
Abstract
MOTIVATION CircRNAs play a critical regulatory role in physiological processes, and the abnormal expression of circRNAs can mediate the processes of diseases. Therefore, exploring circRNA-disease associations is gradually becoming an important area of research. Due to the high cost of validating circRNA-disease associations using traditional wet-lab experiments, novel computational methods based on machine learning are gaining more and more attention in this field. However, current computational methods suffer from insufficient consideration of latent features in circRNA-disease interactions. RESULTS In this study, a multilayer attention neural graph-based collaborative filtering (MLNGCF) is proposed. MLNGCF first enhances multiple biological information with an autoencoder as the initial features of circRNAs and diseases. Then, by constructing a central network of different diseases and circRNAs, a multilayer cooperative attention-based message propagation is performed on the central network to obtain the high-order features of circRNAs and diseases. A neural network-based collaborative filtering is constructed to predict the unknown circRNA-disease associations and update the model parameters. Experiments on the benchmark datasets demonstrate that MLNGCF outperforms state-of-the-art methods, and the prediction results are supported by the literature in the case studies. AVAILABILITY AND IMPLEMENTATION The source codes and benchmark datasets of MLNGCF are available at https://github.com/ABard0/MLNGCF.
Affiliation(s)
- Qunzhuo Wu
- School of Artificial Intelligence and Computer Science, Jiangnan University, Wuxi, China
| | - Zhaohong Deng
- School of Artificial Intelligence and Computer Science, Jiangnan University, Wuxi, China
| | - Wei Zhang
- School of Artificial Intelligence and Computer Science, Jiangnan University, Wuxi, China
| | - Xiaoyong Pan
- Institute of Image Processing and Pattern Recognition, Shanghai Jiaotong University, Shanghai, China
| | - Kup-Sze Choi
- The Centre for Smart Health, The Hong Kong Polytechnic University, Hong Kong
| | - Yun Zuo
- School of Artificial Intelligence and Computer Science, Jiangnan University, Wuxi, China
| | - Hong-Bin Shen
- Institute of Image Processing and Pattern Recognition, Shanghai Jiaotong University, Shanghai, China
| | - Dong-Jun Yu
- School of Computer Science and Engineering, Nanjing University of Science and Technology, Nanjing, China
|
18
|
Yuan L, Zhao J, Shen Z, Zhang Q, Geng Y, Zheng CH, Huang DS. iCircDA-NEAE: Accelerated attribute network embedding and dynamic convolutional autoencoder for circRNA-disease associations prediction. PLoS Comput Biol 2023; 19:e1011344. [PMID: 37651321 PMCID: PMC10470932 DOI: 10.1371/journal.pcbi.1011344] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/11/2023] [Accepted: 07/10/2023] [Indexed: 09/02/2023]
Abstract
Accumulating evidence suggests that circRNAs play crucial roles in human diseases. CircRNA-disease association prediction is extremely helpful in understanding pathogenesis, diagnosis, and prevention, as well as identifying relevant biomarkers. During the past few years, a large number of deep learning (DL) based methods have been proposed for predicting circRNA-disease association and have achieved impressive prediction performance. However, there are two main drawbacks to these methods. The first is that these methods underutilize biometric information in the data. The second is that the features extracted by these methods do not adequately represent the association characteristics between circRNAs and diseases. In this study, we developed a novel deep learning model, named iCircDA-NEAE, to predict circRNA-disease associations. In particular, we use disease semantic similarity, Gaussian interaction profile kernel, circRNA expression profile similarity, and Jaccard similarity simultaneously for the first time, and extract hidden features based on accelerated attribute network embedding (AANE) and dynamic convolutional autoencoder (DCAE). Experimental results on the circR2Disease dataset show that iCircDA-NEAE outperforms other competing methods significantly. Besides, 16 of the top 20 circRNA-disease pairs with the highest prediction scores were validated by relevant literature. Furthermore, we observe that iCircDA-NEAE can effectively predict new potential circRNA-disease associations.
Affiliation(s)
- Lin Yuan
- Key Laboratory of Computing Power Network and Information Security, Ministry of Education, Shandong Computer Science Center, Qilu University of Technology (Shandong Academy of Sciences), Jinan, China
- Shandong Engineering Research Center of Big Data Applied Technology, Faculty of Computer Science and Technology, Qilu University of Technology (Shandong Academy of Sciences), Jinan, China
- Shandong Provincial Key Laboratory of Computer Networks, Shandong Fundamental Research Center for Computer Science, Jinan, China
| | - Jiawang Zhao
- Key Laboratory of Computing Power Network and Information Security, Ministry of Education, Shandong Computer Science Center, Qilu University of Technology (Shandong Academy of Sciences), Jinan, China
- Shandong Engineering Research Center of Big Data Applied Technology, Faculty of Computer Science and Technology, Qilu University of Technology (Shandong Academy of Sciences), Jinan, China
- Shandong Provincial Key Laboratory of Computer Networks, Shandong Fundamental Research Center for Computer Science, Jinan, China
| | - Zhen Shen
- School of Computer and Software, Nanyang Institute of Technology, Nanyang, China
| | - Qinhu Zhang
- Eastern Institute for Advanced Study, Eastern Institute of Technology, Ningbo, China
| | - Yushui Geng
- Key Laboratory of Computing Power Network and Information Security, Ministry of Education, Shandong Computer Science Center, Qilu University of Technology (Shandong Academy of Sciences), Jinan, China
- Shandong Engineering Research Center of Big Data Applied Technology, Faculty of Computer Science and Technology, Qilu University of Technology (Shandong Academy of Sciences), Jinan, China
- Shandong Provincial Key Laboratory of Computer Networks, Shandong Fundamental Research Center for Computer Science, Jinan, China
| | - Chun-Hou Zheng
- Key Lab of Intelligent Computing and Signal Processing of Ministry of Education, School of Artificial Intelligence, Anhui University, Hefei, China
| | - De-Shuang Huang
- Eastern Institute for Advanced Study, Eastern Institute of Technology, Ningbo, China
|
19
|
Wang H, Han J, Li H, Duan L, Liu Z, Cheng H. CDA-SKAG: Predicting circRNA-disease associations using similarity kernel fusion and an attention-enhancing graph autoencoder. MATHEMATICAL BIOSCIENCES AND ENGINEERING : MBE 2023; 20:7957-7980. [PMID: 37161181 DOI: 10.3934/mbe.2023345] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/11/2023]
Abstract
Circular RNAs (circRNAs) constitute a category of circular non-coding RNA molecules whose abnormal expression is closely associated with the development of diseases. As biological data become abundant, a lot of computational prediction models have been used for circRNA-disease association prediction. However, existing prediction models ignore the non-linear information of circRNAs and diseases when fusing multi-source similarities. In addition, these models fail to take full advantage of the vital feature information of high-similarity neighbor nodes when extracting features of circRNAs or diseases. In this paper, we propose a deep learning model, CDA-SKAG, which introduces a similarity kernel fusion algorithm to integrate multi-source similarity matrices to capture the non-linear information of circRNAs or diseases, and construct a circRNA information space and a disease information space. The model embeds an attention-enhancing layer in the graph autoencoder to enhance the associations between nodes with higher similarity. A cost-sensitive neural network is introduced to address the problem of positive and negative sample imbalance, consequently improving our model's generalization capability. The experimental results show that the prediction performance of our model CDA-SKAG outperformed existing circRNA-disease association prediction models. The results of the case studies on lung and cervical cancer suggest that CDA-SKAG can be utilized as an effective tool to assist in predicting circRNA-disease associations.
Affiliation(s)
- Huiqing Wang
- College of Information and Computer, Taiyuan University of Technology, Taiyuan 030024, China
| | - Jiale Han
- College of Information and Computer, Taiyuan University of Technology, Taiyuan 030024, China
| | - Haolin Li
- College of Information and Computer, Taiyuan University of Technology, Taiyuan 030024, China
| | - Liguo Duan
- College of Information and Computer, Taiyuan University of Technology, Taiyuan 030024, China
| | - Zhihao Liu
- College of Information and Computer, Taiyuan University of Technology, Taiyuan 030024, China
| | - Hao Cheng
- College of Information and Computer, Taiyuan University of Technology, Taiyuan 030024, China
|
20
|
Lan W, Dong Y, Zhang H, Li C, Chen Q, Liu J, Wang J, Chen YPP. Benchmarking of computational methods for predicting circRNA-disease associations. Brief Bioinform 2023; 24:6972300. [PMID: 36611256 DOI: 10.1093/bib/bbac613] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 06/21/2022] [Revised: 10/29/2022] [Accepted: 12/11/2022] [Indexed: 01/09/2023]
Abstract
Accumulating evidence demonstrates that circular RNA (circRNA) plays an important role in human diseases. Identification of circRNA-disease associations can help in the diagnosis of human diseases, while the traditional method based on biological experiments is time-consuming. To address this limitation, a series of computational methods have been proposed in recent years. However, few works have summarized these methods or compared their performance. In this paper, we divided the existing methods into three categories: information propagation, traditional machine learning and deep learning. Then, the baseline methods in each category are introduced in detail. Further, 5 different datasets are collected, and 14 representative methods of each category are selected and compared in the 5-fold, 10-fold cross-validation and the de novo experiment. In order to further evaluate the effectiveness of these methods, six common cancers are selected to compare the number of correctly identified circRNA-disease associations in the top-10, top-20, top-50, top-100 and top-200. In addition, based on the results, observations about the robustness and characteristics of these methods are summarized. Finally, the future directions and challenges are discussed.
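The top-k comparison mentioned in this abstract amounts to ranking candidate circRNAs for a disease by predicted score and counting how many known associations fall inside each cutoff. A minimal sketch follows; the function name `hits_at_k`, the circRNA identifiers, and the scores are all hypothetical, and the benchmark's exact protocol (for example, tie handling) is not given in the abstract.

```python
def hits_at_k(scores, known_positives, ks=(10, 20, 50, 100, 200)):
    """Count known associations among the k highest-scoring candidates.

    scores maps a candidate circRNA id to its predicted score for one disease.
    known_positives is the set of circRNA ids with a verified association.
    """
    ranked = sorted(scores, key=scores.get, reverse=True)
    return {k: sum(1 for c in ranked[:k] if c in known_positives) for k in ks}


# Hypothetical predictions for a single disease.
scores = {"circA": 0.9, "circB": 0.8, "circC": 0.4, "circD": 0.3, "circE": 0.1}
known = {"circA", "circC", "circE"}

# Small cutoffs are used here only because the toy example has five candidates.
hits = hits_at_k(scores, known, ks=(1, 3, 5))
```

With the default cutoffs the same call yields the top-10 through top-200 counts named in the abstract.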
Affiliation(s)
- Wei Lan
- School of Computer, Electronic and Information and Guangxi Key Laboratory of Multimedia Communications and Network Technology, Guangxi University, Nanning, Guangxi 530004, China
| | - Yi Dong
- School of Computer, Electronic and Information and Guangxi Key Laboratory of Multimedia Communications and Network Technology, Guangxi University, Nanning, Guangxi 530004, China
| | - Hongyu Zhang
- School of Computer, Electronic and Information and Guangxi Key Laboratory of Multimedia Communications and Network Technology, Guangxi University, Nanning, Guangxi 530004, China
| | - Chunling Li
- School of Computer, Electronic and Information and Guangxi Key Laboratory of Multimedia Communications and Network Technology, Guangxi University, Nanning, Guangxi 530004, China
| | - Qingfeng Chen
- School of Computer, Electronic and Information and State Key Laboratory for Conservation and Utilization of Subtropical Agro-bioresources, Guangxi University, Nanning, Guangxi 530004, China
| | - Jin Liu
- Hunan Provincial Key Lab on Bioinformatics, School of Computer Science and Engineering, Central South University, Changsha, Hunan 410083, China
| | - Jianxin Wang
- Hunan Provincial Key Lab on Bioinformatics, School of Computer Science and Engineering, Central South University, Changsha, Hunan 410083, China
| | - Yi-Ping Phoebe Chen
- Department of Computer Science and Information Technology, La Trobe University, Melbourne, Victoria 3086, Australia
|
21
|
Wang L, You ZH, Huang DS, Li JQ. MGRCDA: Metagraph Recommendation Method for Predicting CircRNA-Disease Association. IEEE TRANSACTIONS ON CYBERNETICS 2023; 53:67-75. [PMID: 34236991 DOI: 10.1109/tcyb.2021.3090756] [Citation(s) in RCA: 17] [Impact Index Per Article: 8.5] [Indexed: 06/13/2023]
Abstract
Accumulating clinical evidence suggests that circRNAs can be novel therapeutic targets for various diseases and play a critical role in human health. However, limited by the complex mechanism of circRNA, it is difficult to explore the relationship between disease and circRNA quickly and on a large scale in wet-lab experiments. In this work, we design a new computational model, MGRCDA, based on metagraph recommendation theory to predict potential circRNA-disease associations. Specifically, we first regard the circRNA-disease association prediction problem as a recommendation problem, and design a series of metagraphs according to the heterogeneous biological networks; then extract the semantic information of the disease and the Gaussian interaction profile kernel (GIPK) similarity of circRNA and disease as network attributes; finally, the iterative search of the metagraph recommendation algorithm is used to calculate the scores of the circRNA-disease pairs. On the gold standard dataset circR2Disease, MGRCDA achieved a prediction accuracy of 92.49% with an area under the ROC curve of 0.9298, which is significantly higher than other state-of-the-art models. Furthermore, among the top 30 disease-related circRNAs recommended by the model, 25 have been verified by the latest published literature. The experimental results indicate that MGRCDA is feasible and efficient, and that it can recommend reliable candidates for further wet-lab experiments and reduce the scope of the experiments.
|
22
|
Liu ZH, Ji CM, Ni JC, Wang YT, Qiao LJ, Zheng CH. Convolution Neural Networks Using Deep Matrix Factorization for Predicting CircRNA-Disease Association. IEEE/ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS 2023; 20:277-284. [PMID: 34951853 DOI: 10.1109/tcbb.2021.3138339] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Indexed: 05/04/2023]
Abstract
CircRNAs have a stable structure, which gives them a higher tolerance to nucleases. Therefore, the properties of circular RNAs are beneficial in disease diagnosis. However, there are few known associations between circRNAs and disease. Identifying new associations through biological experiments is time-consuming and costly. As a result, there is a need to build efficient and feasible computational models to predict potential circRNA-disease associations. In this paper, we design a novel convolution neural networks framework (DMFCNNCD) to learn features from deep matrix factorization to predict circRNA-disease associations. Firstly, we decompose the circRNA-disease association matrix to obtain the original features of the disease and circRNA, and use the mapping module to extract potential nonlinear features. Then, we integrate them with the similarity information to form a training set. Finally, we apply convolution neural networks to predict the unknown associations between circRNAs and diseases. The five-fold cross-validation on various experiments shows that our method can predict circRNA-disease associations and outperforms state-of-the-art methods.
|
23
|
Zheng K, Zhang XL, Wang L, You ZH, Zhan ZH, Li HY. Line graph attention networks for predicting disease-associated Piwi-interacting RNAs. Brief Bioinform 2022; 23:6748487. [PMID: 36198846 DOI: 10.1093/bib/bbac393] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.7] [Received: 05/06/2022] [Revised: 08/08/2022] [Accepted: 08/12/2022] [Indexed: 12/14/2022]
Abstract
PIWI proteins and Piwi-Interacting RNAs (piRNAs) are commonly detected in human cancers, especially in germline and somatic tissues, and correlate with poorer clinical outcomes, suggesting that they play a functional role in cancer. As the problem of combinatorial explosion between ncRNAs and diseases gradually emerges, new bioinformatics methods for large-scale identification and prioritization of potential associations are of interest. However, in the real world, the network of interactions between molecules is enormously intricate and noisy, which poses a problem for efficient graph mining. Line graphs can extend many heterogeneous networks to replace dichotomous networks. In this study, we present a new graph neural network framework, line graph attention networks (LGAT), and apply it to predict piRNA-disease associations (GAPDA). In the experiment, GAPDA performs excellently in 5-fold cross-validation with an AUC of 0.9038. Moreover, it shows superior performance compared with methods based on collaborative filtering and attribute features. The experimental results show that GAPDA demonstrates the promise of graph neural networks for such problems and can be an excellent supplement for future biomedical research.
Affiliation(s)
- Kai Zheng
- College of Information Science and Engineering, Zaozhuang University, Shandong 277100, China; Hunan Provincial Key Lab on Bioinformatics, School of Computer Science and Engineering, Central South University, Changsha, 410083, China
| | | | - Lei Wang
- College of Information Science and Engineering, Zaozhuang University, Shandong 277100, China; Big Data and Intelligent Computing Research Center, Guangxi Academy of Sciences, Nanning 530007, China
| | - Zhu-Hong You
- Big Data and Intelligent Computing Research Center, Guangxi Academy of Sciences, Nanning 530007, China
| | - Zhao-Hui Zhan
- Department of Computer Science, City University of Hong Kong, Kowloon, Hong Kong
| | - Hao-Yuan Li
- Hunan Provincial Key Lab on Bioinformatics, School of Computer Science and Engineering, Central South University, Changsha, 410083, China
|
24
|
Li Y, Hu XG, Wang L, Li PP, You ZH. MNMDCDA: prediction of circRNA-disease associations by learning mixed neighborhood information from multiple distances. Brief Bioinform 2022; 23:6831006. [PMID: 36384071 DOI: 10.1093/bib/bbac479] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/11/2022] [Revised: 09/25/2022] [Accepted: 10/10/2022] [Indexed: 11/18/2022]
Abstract
Emerging evidence suggests that circular RNA (circRNA) is an important regulator of a variety of pathological processes and serves as a promising biomarker for many complex human diseases. Nevertheless, there are relatively few known circRNA-disease associations, and uncovering new circRNA-disease associations by wet-lab methods is time consuming and costly. Considering the limitations of existing computational methods, we propose a novel approach named MNMDCDA, which combines high-order graph convolutional networks (high-order GCNs) and deep neural networks to infer associations between circRNAs and diseases. Firstly, we computed different biological attribute information of circRNA and disease separately and used them to construct multiple multi-source similarity networks. Then, we used the high-order GCN algorithm to learn feature embedding representations with high-order mixed neighborhood information of circRNA and disease from the constructed multi-source similarity networks, respectively. Finally, the deep neural network classifier was implemented to predict associations of circRNAs with diseases. The MNMDCDA model obtained AUC scores of 95.16%, 94.53%, 89.80% and 91.83% on four benchmark datasets, i.e., CircR2Disease, CircAtlas v2.0, Circ2Disease and CircRNADisease, respectively, using the 5-fold cross-validation approach. Furthermore, 25 of the top 30 circRNA-disease pairs with the best scores of MNMDCDA in the case study were validated by recent literature. Numerous experimental results indicate that MNMDCDA can be used as an effective computational tool to predict circRNA-disease associations and can provide the most promising candidates for biological experiments.
Affiliation(s)
- Yang Li
- School of Computer Science and Information Engineering, Hefei University of Technology, Hefei 230601, China
| | - Xue-Gang Hu
- School of Computer Science and Information Engineering, Hefei University of Technology, Hefei 230601, China
| | - Lei Wang
- Big Data and Intelligent Computing Research Center, Guangxi Academy of Sciences, Nanning 530007, China; College of Information Science and Engineering, Zaozhuang University, Shandong 277100, China
| | - Pei-Pei Li
- School of Computer Science and Information Engineering, Hefei University of Technology, Hefei 230601, China
| | - Zhu-Hong You
- Big Data and Intelligent Computing Research Center, Guangxi Academy of Sciences, Nanning 530007, China.,School of Computer Science, Northwestern Polytechnical University, Xi'an Shaanxi 710129, China
|
25
|
Shen S, Liu J, Zhou C, Qian Y, Deng L. XGBCDA: a multiple heterogeneous networks-based method for predicting circRNA-disease associations. BMC Med Genomics 2022; 13:196. [PMID: 36329528 PMCID: PMC9632006 DOI: 10.1186/s12920-021-01054-2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/05/2021] [Accepted: 07/29/2021] [Indexed: 11/06/2022]
Abstract
BACKGROUND Biological experiments have demonstrated that circRNA plays an essential role in various biological processes and human diseases. However, detecting associations between circRNAs and diseases through biological experiments alone is time-consuming and costly. Accordingly, an efficient computational model for predicting circRNA-disease associations is urgently needed. METHODS In this research, we propose a multiple heterogeneous networks-based method, named XGBCDA, to predict circRNA-disease associations. The method first extracts original features, namely statistical features and graph theory features, from an integrated circRNA similarity network, a disease similarity network and a circRNA-disease association network, and then feeds these original features to an XGBoost classifier to learn latent features. The latent features are represented by a 1-of-K encoding of the index of the leaf that each instance falls into in every tree learned by the XGBoost model. Finally, the method combines the latent features from XGBoost with the original features to train the final model for predicting circRNA-disease associations. RESULTS Tenfold cross-validation of XGBCDA yields an area under the ROC curve of 0.9860. In addition, the method performs strongly in case studies of colorectal cancer, gastric cancer and cervical cancer. CONCLUSION Given its strong performance in predicting potential circRNA-disease associations, XGBCDA can assist biomedical researchers in circRNA-disease association prediction.
Affiliation(s)
- Siyuan Shen
- School of Software, Xinjiang University, Wulumuqi, 830091, China
| | - Junyi Liu
- School of Computer Science and Engineering, Central South University, Changsha, 410075, China
| | - Cheng Zhou
- School of Computer Science and Engineering, Central South University, Changsha, 410075, China
| | - Yurong Qian
- School of Software, Xinjiang University, Wulumuqi, 830091, China
| | - Lei Deng
- School of Software, Xinjiang University, Wulumuqi, 830091, China; School of Computer Science and Engineering, Central South University, Changsha, 410075, China
|
26
|
Ryšavý P, Kléma J, Merkerová MD. circGPA: circRNA functional annotation based on probability-generating functions. BMC Bioinformatics 2022; 23:392. [PMID: 36167495 PMCID: PMC9513885 DOI: 10.1186/s12859-022-04957-8] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 05/20/2022] [Accepted: 09/21/2022] [Indexed: 11/25/2022]
Abstract
Recent research has already shown that circular RNAs (circRNAs) are functional in gene expression regulation and potentially related to diseases. Due to their stability, circRNAs can also be used as biomarkers for diagnosis. However, the function of most circRNAs remains unknown, and it is expensive and time-consuming to discover it through biological experiments. In this paper, we predict circRNA annotations from the knowledge of their interaction with miRNAs and subsequent miRNA-mRNA interactions. First, we construct an interaction network for a target circRNA; second, we spread the information from the network nodes with known function to the root circRNA node. This idea itself is not new; our main contribution lies in proposing an efficient and exact deterministic procedure based on the principle of probability-generating functions to calculate the p-value of the association test between a circRNA and an annotation term. We show that our publicly available algorithm is both more effective and more efficient than the commonly used Monte-Carlo sampling approach, which may suffer from difficult quantification of sampling convergence and subsequent sampling inefficiency. We experimentally demonstrate that the new approach is two orders of magnitude faster than Monte-Carlo sampling, which makes summary annotation of large circRNA files feasible; this includes, for example, their reannotation after periodic interaction network updates. We provide a summary annotation of a current circRNA database as one of our outputs. The proposed algorithm could be generalized to other types of RNA in a straightforward way.
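To make the general idea of an exact, deterministic p-value from probability-generating functions more concrete, here is a toy Python sketch. It is not the circGPA procedure: it assumes a hypothetical test statistic that is simply a sum of independent Bernoulli indicators, and the function names `multiply_pgfs` and `exact_upper_tail` are invented for illustration. The point is only that multiplying the generating polynomials gives the full distribution of the sum, so the tail probability can be read off without sampling.

```python
def multiply_pgfs(f, g):
    """Multiply two PGFs given as coefficient lists (index = value of the variable)."""
    out = [0.0] * (len(f) + len(g) - 1)
    for i, a in enumerate(f):
        for j, b in enumerate(g):
            out[i + j] += a * b
    return out


def exact_upper_tail(probs, observed):
    """Exact P(S >= observed) for S = sum of independent Bernoulli(p) variables.

    Assumption for illustration only: the statistic is a plain sum of
    independent indicators, which is much simpler than a real annotation test.
    """
    pgf = [1.0]  # PGF of the constant 0
    for p in probs:
        pgf = multiply_pgfs(pgf, [1.0 - p, p])  # PGF of one Bernoulli(p)
    return sum(pgf[observed:])


if __name__ == "__main__":
    print(exact_upper_tail([0.5, 0.5, 0.5], 2))
```

A Monte-Carlo estimate of the same tail would need many random draws and a convergence check, whereas the polynomial product above is exact up to floating-point rounding.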
Affiliation(s)
- Petr Ryšavý
- Department of Computer Science, Faculty of Electrical Engineering, Czech Technical University in Prague, Prague, Czech Republic
| | - Jiří Kléma
- Department of Computer Science, Faculty of Electrical Engineering, Czech Technical University in Prague, Prague, Czech Republic
|
27
|
Wang L, Wong L, Li Z, Huang Y, Su X, Zhao B, You Z. A machine learning framework based on multi-source feature fusion for circRNA-disease association prediction. Brief Bioinform 2022; 23:6693603. [PMID: 36070867 DOI: 10.1093/bib/bbac388] [Citation(s) in RCA: 18] [Impact Index Per Article: 6.0] [Received: 05/02/2022] [Revised: 07/26/2022] [Accepted: 08/11/2022] [Indexed: 11/14/2022]
Abstract
Circular RNAs (circRNAs) are involved in the regulatory mechanisms of multiple complex diseases, and the identification of their associations is critical to the diagnosis and treatment of diseases. In recent years, many computational methods have been designed to predict circRNA-disease associations. However, most of the existing methods rely on single correlation data. Here, we propose a machine learning framework for circRNA-disease association prediction, called MLCDA, which effectively fuses multiple sources of heterogeneous information including circRNA sequences and disease ontology. Comprehensive evaluation in the gold standard dataset showed that MLCDA can successfully capture the complex relationships between circRNAs and diseases and accurately predict their potential associations. In addition, the results of case studies on real data show that MLCDA significantly outperforms other existing methods. MLCDA can serve as a useful tool for circRNA-disease association prediction, providing mechanistic insights for disease research and thus facilitating the progress of disease treatment.
Affiliation(s)
- Lei Wang
- Big Data and Intelligent Computing Research Center, Guangxi Academy of Sciences, Nanning, 530007, China
| | - Leon Wong
- Big Data and Intelligent Computing Research Center, Guangxi Academy of Sciences, Nanning, 530007, China
| | - Zhengwei Li
- Big Data and Intelligent Computing Research Center, Guangxi Academy of Sciences, Nanning, 530007, China
| | - Yuan Huang
- Department of Computing, Hong Kong Polytechnic University, Hong Kong 999077, China
| | - Xiaorui Su
- Xinjiang Technical Institutes of Physics and Chemistry, Chinese Academy of Sciences, Urumqi 830011, China
| | - Bowei Zhao
- Xinjiang Technical Institutes of Physics and Chemistry, Chinese Academy of Sciences, Urumqi 830011, China
| | - Zhuhong You
- School of Computer Science, Northwestern Polytechnical University, Xi'an 710129, China
|
28
|
Huang Y, Li Y, Lin W, Fan S, Chen H, Xia J, Pi J, Xu JF. Promising Roles of Circular RNAs as Biomarkers and Targets for Potential Diagnosis and Therapy of Tuberculosis. Biomolecules 2022; 12:biom12091235. [PMID: 36139074 PMCID: PMC9496049 DOI: 10.3390/biom12091235] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 08/07/2022] [Revised: 09/01/2022] [Accepted: 09/02/2022] [Indexed: 12/02/2022]
Abstract
Tuberculosis (TB), caused by Mycobacterium tuberculosis (Mtb) infection, remains one of the most threatening infectious diseases worldwide. A series of challenges still exist for TB prevention, diagnosis and treatment, which therefore require more attempts to clarify the pathological and immunological mechanisms in the development and progression of TB. Circular RNAs (circRNAs) are a large class of non-coding RNA, mostly expressed in eukaryotic cells, which are generated by the spliceosome through the back-splicing of linear RNAs. Accumulating studies have identified that circRNAs are widely involved in a variety of physiological and pathological processes, acting as sponges or decoys for microRNAs and proteins, scaffold platforms for proteins, modulators of transcription and special templates for translation. Because circRNAs are stable and widely distributed, they are expected to serve as promising prognostic/diagnostic biomarkers and therapeutic targets for diseases. In this review, we briefly describe the biogenesis, classification, detection technology and functions of circRNAs and, in particular, outline the dynamic, and sometimes aberrant, changes of circRNAs in TB. Moreover, we summarize recent progress in research linking circRNAs to TB-related pathogenetic processes, as well as the potential roles of circRNAs as diagnostic biomarkers and miRNA sponges in Mtb infection, which is expected to enhance our understanding of TB and provide some novel ideas about how to overcome the challenges associated with TB in the future.
Affiliation(s)
- Yifan Huang
- Guangdong Provincial Key Laboratory of Medical Molecular Diagnostics, The First Dongguan Affiliated Hospital, Guangdong Medical University, Dongguan 523808, China
- Institute of Laboratory Medicine, School of Medical Technology, Guangdong Medical University, Dongguan 523808, China
| | - Ying Li
- Guangdong Provincial Key Laboratory of Medical Molecular Diagnostics, The First Dongguan Affiliated Hospital, Guangdong Medical University, Dongguan 523808, China
- Institute of Laboratory Medicine, School of Medical Technology, Guangdong Medical University, Dongguan 523808, China
| | - Wensen Lin
- Guangdong Provincial Key Laboratory of Medical Molecular Diagnostics, The First Dongguan Affiliated Hospital, Guangdong Medical University, Dongguan 523808, China
- Institute of Laboratory Medicine, School of Medical Technology, Guangdong Medical University, Dongguan 523808, China
| | - Shuhao Fan
- Guangdong Provincial Key Laboratory of Medical Molecular Diagnostics, The First Dongguan Affiliated Hospital, Guangdong Medical University, Dongguan 523808, China
- Institute of Laboratory Medicine, School of Medical Technology, Guangdong Medical University, Dongguan 523808, China
| | - Haorong Chen
- Guangdong Provincial Key Laboratory of Medical Molecular Diagnostics, The First Dongguan Affiliated Hospital, Guangdong Medical University, Dongguan 523808, China
- Institute of Laboratory Medicine, School of Medical Technology, Guangdong Medical University, Dongguan 523808, China
| | - Jiaojiao Xia
- Guangdong Provincial Key Laboratory of Medical Molecular Diagnostics, The First Dongguan Affiliated Hospital, Guangdong Medical University, Dongguan 523808, China
- Institute of Laboratory Medicine, School of Medical Technology, Guangdong Medical University, Dongguan 523808, China
| | - Jiang Pi
- Guangdong Provincial Key Laboratory of Medical Molecular Diagnostics, The First Dongguan Affiliated Hospital, Guangdong Medical University, Dongguan 523808, China
- Institute of Laboratory Medicine, School of Medical Technology, Guangdong Medical University, Dongguan 523808, China
- Correspondence: (J.P.); (J.-F.X.)
| | - Jun-Fa Xu
- Guangdong Provincial Key Laboratory of Medical Molecular Diagnostics, The First Dongguan Affiliated Hospital, Guangdong Medical University, Dongguan 523808, China
- Institute of Laboratory Medicine, School of Medical Technology, Guangdong Medical University, Dongguan 523808, China
- Correspondence: (J.P.); (J.-F.X.)
|
29
|
Uddin M, Islam MK, Hassan MR, Jahan F, Baek JH. A fast and efficient algorithm for DNA sequence similarity identification. COMPLEX INTELL SYST 2022; 9:1265-1280. [PMID: 36035628 PMCID: PMC9395857 DOI: 10.1007/s40747-022-00846-y] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 09/02/2021] [Accepted: 08/05/2022] [Indexed: 11/22/2022]
Abstract
DNA sequence similarity analysis is necessary for enormous purposes including genome analysis, extracting biological information, finding the evolutionary relationship of species. There are two types of sequence analysis which are alignment-based (AB) and alignment-free (AF). AB is effective for small homologous sequences but becomes NP-hard problem for long sequences. However, AF algorithms can solve the major limitations of AB. But most of the existing AF methods show high time complexity and memory consumption, less precision, and less performance on benchmark datasets. To minimize these limitations, we develop an AF algorithm using a 2D k-mer count matrix inspired by the CGR approach. Then we shrink the matrix by analyzing the neighbors and then measure similarities using the best combinations of pairwise distance (PD) and phylogenetic tree methods. We also dynamically choose the value of k for k-mer. We develop an efficient system for finding the positions of k-mer in the count matrix. We apply our system in six different datasets. We achieve the top rank for two benchmark datasets from AFproject, 100% accuracy for two datasets (16 S Ribosomal, 18 Eutherian), and achieve a milestone for time complexity and memory consumption in comparison to the existing study datasets (HEV, HIV-1). Therefore, the comparative results of the benchmark datasets and existing studies demonstrate that our method is highly effective, efficient, and accurate. Thus, our method can be used with the top level of authenticity for DNA sequence similarity measurement.
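The 2D k-mer count matrix described in this abstract can be pictured with a short Python sketch. This is only an illustration under stated assumptions, not the authors' algorithm: the corner assignment in `CORNER_BITS`, the bit ordering (first base as most significant bit) and the function names are all hypothetical choices, and the matrix shrinking and dynamic choice of k are not shown.

```python
# Assumed (x_bit, y_bit) per nucleotide; papers differ on this convention.
CORNER_BITS = {"A": (0, 0), "C": (0, 1), "G": (1, 1), "T": (1, 0)}


def kmer_cell(kmer):
    """Return the (row, col) cell of a k-mer in a 2^k x 2^k grid."""
    row = col = 0
    for base in kmer:
        x_bit, y_bit = CORNER_BITS[base]
        col = (col << 1) | x_bit  # each base halves the column range
        row = (row << 1) | y_bit  # each base halves the row range
    return row, col


def kmer_count_matrix(seq, k):
    """Count every overlapping k-mer of seq into its grid cell."""
    size = 1 << k
    matrix = [[0] * size for _ in range(size)]
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        if all(base in CORNER_BITS for base in kmer):  # skip k-mers with N etc.
            row, col = kmer_cell(kmer)
            matrix[row][col] += 1
    return matrix


if __name__ == "__main__":
    for line in kmer_count_matrix("ACGTAC", 2):
        print(line)
```

Because the cell of a k-mer is computed directly from its bases, locating a k-mer in the matrix takes O(k) bit operations and needs no lookup table.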
|
30
|
Zheng K, Zhao H, Zhao Q, Wang B, Gao X, Wang J. NASMDR: a framework for miRNA-drug resistance prediction using efficient neural architecture search and graph isomorphism networks. Brief Bioinform 2022; 23:6674165. [PMID: 35998922 DOI: 10.1093/bib/bbac338] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Received: 04/22/2022] [Revised: 06/15/2022] [Accepted: 07/23/2022] [Indexed: 11/13/2022]
Abstract
As a frontier field of individualized therapy, microRNA (miRNA) pharmacogenomics facilitates the understanding of different individual responses to certain drugs and provides a reasonable reference for clinical treatment. However, the known drug resistance-associated miRNAs are not yet sufficient to support precision medicine. Although existing methods are effective, they all focus on modelling miRNA-drug resistance interaction graphs, making their performance bounded by the interaction density. In this study, we propose a framework for miRNA-drug resistance prediction through efficient neural architecture search and graph isomorphism networks (NASMDR). NASMDR uses attribute information instead of the commonly used interactive graph information. In the cross-validation experiment, the proposed framework can achieve an AUC of 0.9468 on the ncDR dataset, which is 2.29% higher than the state-of-the-art method. In addition, we propose a novel sequence characterization approach, k-mer Sparse Nonnegative Matrix Factorization (KSNMF). The results show that NASMDR provides novel insights for integrating efficient neural architecture search and graph isomorphic networks into a unified framework to predict drug resistance-related miRNAs. The codes for NASMDR are available at https://github.com/kaizheng-academic/NASMDR.
Affiliation(s)
- Kai Zheng
- Hunan Provincial Key Lab on Bioinformatics, School of Computer Science and Engineering, Central South University, Changsha, 410083, China
| | - Haochen Zhao
- Hunan Provincial Key Lab on Bioinformatics, School of Computer Science and Engineering, Central South University, Changsha, 410083, China
| | - Qichang Zhao
- Hunan Provincial Key Lab on Bioinformatics, School of Computer Science and Engineering, Central South University, Changsha, 410083, China
| | - Bin Wang
- Hunan Provincial Key Lab on Bioinformatics, School of Computer Science and Engineering, Central South University, Changsha, 410083, China
| | - Xin Gao
- Computer, Electrical and Mathematical Sciences and Engineering Division (CEMSE), Computational Bioscience Research Center (CBRC), King Abdullah University of Science and Technology (KAUST), Thuwal, Saudi Arabia
| | - Jianxin Wang
- Hunan Provincial Key Lab on Bioinformatics, School of Computer Science and Engineering, Central South University, Changsha, 410083, China
|
31
|
Zheng K, Liang Y, Liu YY, Yasir M, Wang P. A decision support system based on multi-sources information to predict piRNA–disease associations using stacked autoencoder. Soft comput 2022. [DOI: 10.1007/s00500-022-07396-y] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/30/2022]
|
32
|
Wu Q, Deng Z, Pan X, Shen HB, Choi KS, Wang S, Wu J, Yu DJ. MDGF-MCEC: a multi-view dual attention embedding model with cooperative ensemble learning for CircRNA-disease association prediction. Brief Bioinform 2022; 23:6652197. [PMID: 35907779 DOI: 10.1093/bib/bbac289] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.7] [Received: 05/03/2022] [Revised: 06/19/2022] [Accepted: 06/26/2022] [Indexed: 11/12/2022]
Abstract
Circular RNA (circRNA) is closely involved in physiological and pathological processes of many diseases. Discovering the associations between circRNAs and diseases is of great significance. Because verifying circRNA-disease associations by wet-lab experiments is costly, computational approaches for predicting these associations have become a promising research direction. In this paper, we propose a method, MDGF-MCEC, based on a multi-view dual attention graph convolution network (GCN) with cooperative ensemble learning to predict circRNA-disease associations. First, MDGF-MCEC constructs two disease relation graphs and two circRNA relation graphs based on different similarities. Then, the relation graphs are fed into a multi-view GCN for representation learning. To learn highly discriminative features, a dual-attention mechanism is introduced to adjust the contribution weights of different features at both the channel level and the spatial level. Based on the learned embedding features of diseases and circRNAs, nine different feature combinations between diseases and circRNAs are treated as new multi-view data. Finally, we construct a multi-view cooperative ensemble classifier to predict the associations between circRNAs and diseases. Experiments conducted on the CircR2Disease database demonstrate that the proposed MDGF-MCEC model achieves a high area under the curve of 0.9744 and outperforms the state-of-the-art methods. Promising results are also obtained from experiments on the circ2Disease and circRNADisease databases. Furthermore, the predicted associated circRNAs for hepatocellular carcinoma and gastric cancer are supported by the literature. The code and dataset of this study are available at https://github.com/ABard0/MDGF-MCEC.
Affiliation(s)
| | - Zhaohong Deng
- Jiangnan University, School of Artificial Intelligence and Computer Science, China
| | - Xiaoyong Pan
- Shanghai Jiao Tong University, Department of Automation, China
| | - Hong-Bin Shen
- Shanghai Jiao Tong University, Shanghai, China, Department of Automation, China
| | - Kup-Sze Choi
- Hong Kong Polytechnic University, School of Nursing, China
| | - Shitong Wang
- Jiangnan University, School of Artificial Intelligence and Computer Science, China
| | - Jing Wu
- Jiangnan University, State Key Laboratory of Food Science and Technology, China
| | - Dong-Jun Yu
- Nanjing University of Science and Technology, School of Computer Science and Engineering, China
|
33
|
Wang Y, Wang LL, Wong L, Li Y, Wang L, You ZH. SIPGCN: A Novel Deep Learning Model for Predicting Self-Interacting Proteins from Sequence Information Using Graph Convolutional Networks. Biomedicines 2022; 10:biomedicines10071543. [PMID: 35884848 PMCID: PMC9313220 DOI: 10.3390/biomedicines10071543] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/26/2022] [Revised: 06/24/2022] [Accepted: 06/24/2022] [Indexed: 11/16/2022]
Abstract
Protein is the basic organic substance that constitutes the cell and is the material condition for the life activity and the guarantee of the biological function activity. Elucidating the interactions and functions of proteins is a central task in exploring the mysteries of life. As an important protein interaction, self-interacting protein (SIP) has a critical role. The fast growth of high-throughput experimental techniques among biomolecules has led to a massive influx of available SIP data. How to conduct scientific research using the massive amount of SIP data has become a new challenge that is being faced in related research fields such as biology and medicine. In this work, we design an SIP prediction method SIPGCN using a deep learning graph convolutional network (GCN) based on protein sequences. First, protein sequences are characterized using a position-specific scoring matrix, which is able to describe the biological evolutionary message, then their hidden features are extracted by the deep learning method GCN, and, finally, the random forest is utilized to predict whether there are interrelationships between proteins. In the cross-validation experiment, SIPGCN achieved 93.65% accuracy and 99.64% specificity in the human data set. SIPGCN achieved 90.69% and 99.08% of these two indicators in the yeast data set, respectively. Compared with other feature models and previous methods, SIPGCN showed excellent results. These outcomes suggest that SIPGCN may be a suitable instrument for predicting SIP and may be a reliable candidate for future wet experiments.
Affiliation(s)
- Ying Wang
- College of Information Science and Engineering, Zaozhuang University, Zaozhuang 277160, China
| | - Lin-Lin Wang
- College of Information Science and Engineering, Zaozhuang University, Zaozhuang 277160, China
- Correspondence: (L.-L.W.); (L.W.)
| | - Leon Wong
- Big Data and Intelligent Computing Research Center, Guangxi Academy of Sciences, Nanning 530007, China
| | - Yang Li
- School of Computer Science and Information Engineering, Hefei University of Technology, Hefei 230601, China
| | - Lei Wang
- College of Information Science and Engineering, Zaozhuang University, Zaozhuang 277160, China
- Big Data and Intelligent Computing Research Center, Guangxi Academy of Sciences, Nanning 530007, China
- Correspondence: (L.-L.W.); (L.W.)
| | - Zhu-Hong You
- Big Data and Intelligent Computing Research Center, Guangxi Academy of Sciences, Nanning 530007, China
- School of Computer Science, Northwestern Polytechnical University, Xi’an 710129, China
|
34
|
Niu M, Zou Q, Wang C. GMNN2CD: identification of circRNA-disease associations based on variational inference and graph Markov neural networks. Bioinformatics 2022; 38:2246-2253. [PMID: 35157027 DOI: 10.1093/bioinformatics/btac079] [Citation(s) in RCA: 50] [Impact Index Per Article: 16.7] [Received: 09/23/2021] [Revised: 12/05/2021] [Accepted: 02/09/2022] [Indexed: 02/03/2023]
Abstract
MOTIVATION With the analysis of the characteristics and functions of circular RNAs (circRNAs), it has become clear that they play a critical role in diseases. Exploring the relationship between circRNAs and diseases is of far-reaching significance for investigating the etiopathogenesis and treatment of diseases. Nevertheless, it is inefficient to learn new associations only through biotechnology. RESULTS Consequently, we present a computational method, GMNN2CD, which employs a graph Markov neural network (GMNN) algorithm to predict unknown circRNA-disease associations. First, using verified associations, we calculate the semantic similarity and Gaussian interaction profile kernel similarity (GIPs) of diseases and the GIPs of circRNAs, and then merge them to form a unified descriptor. After that, GMNN2CD uses a fusion feature variational map autoencoder to learn deep features and uses a label propagation map autoencoder to propagate tags based on known associations. Based on variational inference, GMNN alternate training enhances the ability of GMNN2CD to obtain high-efficiency high-dimensional features from low-dimensional representations. Finally, 5-fold cross-validation on five benchmark datasets shows that GMNN2CD is superior to the state-of-the-art methods. Furthermore, case studies have shown that GMNN2CD can detect potential associations. AVAILABILITY AND IMPLEMENTATION The source code and data are available at https://github.com/nmt315320/GMNN2CD.git.
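The Gaussian interaction profile kernel similarity named in this abstract is commonly computed from the rows (or columns) of the known association matrix. The Python sketch below follows the widely used formulation K(i, j) = exp(-γ‖IP(i) − IP(j)‖²), with γ set to a bandwidth parameter divided by the mean squared norm of the profiles. It is an illustration only and may differ from what GMNN2CD does internally; the function name and the default `gamma_prime=1.0` are assumptions.

```python
import math


def gip_similarity(profiles, gamma_prime=1.0):
    """Gaussian interaction profile kernel between binary association profiles.

    profiles: list of equal-length 0/1 lists, one per entity (e.g. one row per
    circRNA, one column per disease). gamma_prime is an assumed bandwidth.
    """
    n = len(profiles)
    mean_sq_norm = sum(sum(v * v for v in p) for p in profiles) / n
    if mean_sq_norm == 0:
        raise ValueError("all interaction profiles are empty")
    gamma = gamma_prime / mean_sq_norm  # normalise bandwidth by profile density
    sim = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            sq_dist = sum((a - b) ** 2 for a, b in zip(profiles[i], profiles[j]))
            sim[i][j] = math.exp(-gamma * sq_dist)
    return sim


if __name__ == "__main__":
    toy = [[1, 0, 1], [1, 0, 1], [0, 1, 0]]  # made-up association rows
    for row in gip_similarity(toy):
        print([round(v, 4) for v in row])
```

Entities with identical association profiles get similarity 1, and the similarity decays as the profiles share fewer known partners.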
Affiliation(s)
- Mengting Niu
- Institute of Fundamental and Frontier Sciences, University of Electronic Science and Technology of China, Chengdu, Sichuan 610000, China; Yangtze Delta Region Institute (Quzhou), University of Electronic Science and Technology of China, Quzhou, Zhejiang 324000, China
| | - Quan Zou
- Institute of Fundamental and Frontier Sciences, University of Electronic Science and Technology of China, Chengdu, Sichuan 610000, China; Yangtze Delta Region Institute (Quzhou), University of Electronic Science and Technology of China, Quzhou, Zhejiang 324000, China
| | - Chunyu Wang
- Faculty of Computing, Harbin Institute of Technology, Harbin, Heilongjiang 150000, China
|
35
|
Ma Z, Kuang Z, Deng L. CRPGCN: predicting circRNA-disease associations using graph convolutional network based on heterogeneous network. BMC Bioinformatics 2021; 22:551. [PMID: 34772332 PMCID: PMC8588735 DOI: 10.1186/s12859-021-04467-z] [Citation(s) in RCA: 9] [Impact Index Per Article: 2.3] [Received: 09/02/2021] [Accepted: 11/01/2021] [Indexed: 12/16/2022]
Abstract
BACKGROUND Existing studies show that circRNAs can be used as biomarkers of diseases and play a prominent role in the treatment and diagnosis of diseases. However, the relationships between the vast majority of circRNAs and diseases are still unclear, and more experiments are needed to study the mechanisms of circRNAs. Some scholars use the attributes of circRNAs and diseases to study and predict their associations. Nonetheless, most existing methods of this kind use little attribute information about circRNAs, which limits the accuracy of the final prediction results. Other scholars apply experimental methods to identify the associations between circRNAs and diseases, but such methods are usually expensive and time-consuming. Given these shortcomings, a more efficient computational method is needed to predict the associations between circRNAs and diseases. RESULTS In this study, a novel algorithm is proposed to predict the associations between circRNAs and diseases (CRPGCN), based on a Graph Convolutional Network (GCN) constructed with Random Walk with Restart (RWR) and Principal Component Analysis (PCA). In the construction of CRPGCN, the RWR algorithm is used to improve the similarity associations of the computed nodes with their neighbours. After that, PCA is used for dimensionality reduction and feature extraction, which brings circRNAs with higher similarity closer to the corresponding diseases. Finally, the GCN algorithm is used to learn the features between circRNAs and diseases and calculate the final similarity scores; the learning data are constructed from the adjacency matrix, similarity matrix and feature matrix as a heterogeneous adjacency matrix and a heterogeneous feature matrix.
CONCLUSIONS After 2-fold, 5-fold and 10-fold cross-validation, the area under the ROC curve of CRPGCN is 0.9490, 0.9720 and 0.9722, respectively. The CRPGCN method is effective in predicting the associations between circRNAs and diseases.
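For readers unfamiliar with Random Walk with Restart, the step mentioned in the RESULTS section, the following Python sketch shows the standard iteration p ← (1 − r)·W·p + r·p₀ on a small graph, where W is the column-normalised adjacency matrix. It is a generic textbook form under assumed settings (restart probability 0.3, a single seed node, a made-up three-node graph), not the configuration used in CRPGCN.

```python
def random_walk_with_restart(adj, seed, restart=0.3, tol=1e-10, max_iter=1000):
    """Steady-state visiting probabilities of a walker that restarts at `seed`.

    adj: square list-of-lists adjacency (or similarity) matrix.
    restart, tol and max_iter are illustrative defaults, not values from the paper.
    """
    n = len(adj)
    col_sums = [sum(adj[i][j] for i in range(n)) for j in range(n)]
    # Column-normalise so each column is a transition distribution.
    w = [[adj[i][j] / col_sums[j] if col_sums[j] else 0.0 for j in range(n)]
         for i in range(n)]
    p0 = [0.0] * n
    p0[seed] = 1.0
    p = p0[:]
    for _ in range(max_iter):
        nxt = [(1 - restart) * sum(w[i][j] * p[j] for j in range(n))
               + restart * p0[i] for i in range(n)]
        if sum(abs(a - b) for a, b in zip(nxt, p)) < tol:
            return nxt
        p = nxt
    return p


if __name__ == "__main__":
    path_graph = [[0, 1, 0], [1, 0, 1], [0, 1, 0]]  # toy graph: 0 - 1 - 2
    print(random_walk_with_restart(path_graph, seed=0))
```

The resulting vector ranks nodes by proximity to the seed, which is why RWR scores are often used to smooth or enrich a similarity network before further learning.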
Affiliation(s)
- Zhihao Ma
- School of Computer and Information Engineering, Central South University of Forestry and Technology, Changsha, China
| | - Zhufang Kuang
- School of Computer and Information Engineering, Central South University of Forestry and Technology, Changsha, China
| | - Lei Deng
- School of Computer Science and Engineering, Central South University, Changsha, China
|
36
|
Löchel HF, Heider D. Chaos game representation and its applications in bioinformatics. Comput Struct Biotechnol J 2021; 19:6263-6271. [PMID: 34900136 PMCID: PMC8636998 DOI: 10.1016/j.csbj.2021.11.008] [Citation(s) in RCA: 28] [Impact Index Per Article: 7.0] [Received: 08/26/2021] [Revised: 11/04/2021] [Accepted: 11/05/2021] [Indexed: 11/18/2022]
Abstract
Chaos game representation (CGR), a milestone in graphical bioinformatics, has become a powerful tool regarding alignment-free sequence comparison and feature encoding for machine learning. The algorithm maps a sequence to 2-dimensional space, while an extension of the CGR, the so-called frequency matrix representation (FCGR), transforms sequences of different lengths into equal-sized images or matrices. The CGR is a generalized Markov chain and includes various properties, which allow a unique representation of a sequence. Therefore, it has a broad spectrum of applications in bioinformatics, such as sequence comparison and phylogenetic analysis and as an encoding of sequences for machine learning. This review introduces the construction of CGRs and FCGRs, their applications on DNA and proteins, and gives an overview of recent applications and progress in bioinformatics.
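The mapping described in this abstract can be sketched in a few lines of Python. In the usual construction, each nucleotide is assigned a corner of the unit square and every new point lies halfway between the previous point and the corner of the current base; an FCGR then counts how many points fall into each cell of a 2^k x 2^k grid. The corner assignment and the function names below are assumptions for illustration (conventions vary), and the input is assumed to contain only A, C, G and T.

```python
# Assumed corner assignment; other orderings appear in the literature.
DEFAULT_CORNERS = {"A": (0.0, 0.0), "C": (0.0, 1.0), "G": (1.0, 1.0), "T": (1.0, 0.0)}


def cgr_points(seq, corners=DEFAULT_CORNERS):
    """Return the CGR point reached after each base, starting from the centre."""
    x, y = 0.5, 0.5
    points = []
    for base in seq:
        cx, cy = corners[base]
        x, y = (x + cx) / 2, (y + cy) / 2  # move halfway towards the corner
        points.append((x, y))
    return points


def fcgr(seq, k):
    """Frequency matrix: bin CGR points into a 2^k x 2^k grid.

    Points are counted from the k-th base onwards, so each count corresponds
    to one overlapping k-mer and the grid size is independent of len(seq).
    """
    size = 1 << k
    grid = [[0] * size for _ in range(size)]
    for x, y in cgr_points(seq)[k - 1:]:
        col = min(int(x * size), size - 1)  # clamp guards against rounding to 1.0
        row = min(int(y * size), size - 1)
        grid[row][col] += 1
    return grid


if __name__ == "__main__":
    print(cgr_points("AC"))
    for line in fcgr("ACGTAC", 2):
        print(line)
```

Sequences of any length produce a grid of the same shape, which is what makes the FCGR convenient as a fixed-size input for machine learning models.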
Affiliation(s)
- Hannah Franziska Löchel
- Department of Mathematics and Computer Science, University of Marburg, Hans-Meerwein-Str. 6, D-35032 Marburg, Germany
- Dominik Heider
- Department of Mathematics and Computer Science, University of Marburg, Hans-Meerwein-Str. 6, D-35032 Marburg, Germany
|
37
|
Graph convolutional network approach to discovering disease-related circRNA-miRNA-mRNA axes. Methods 2021; 198:45-55. [PMID: 34758394 DOI: 10.1016/j.ymeth.2021.10.006] [Citation(s) in RCA: 9] [Impact Index Per Article: 2.3] [Received: 05/30/2021] [Revised: 10/07/2021] [Accepted: 10/19/2021] [Indexed: 02/05/2023] Open
Abstract
Non-coding RNAs are gaining prominence in biology and medicine because they play major roles in cellular homeostasis; among them, the circRNA-miRNA-mRNA axes are involved in a series of disease-related pathways, such as apoptosis, cell invasion and metastasis. Recently, many computational methods have been developed for predicting the relationships between ncRNAs and diseases, which can alleviate the time-consuming and labor-intensive exploration involved in biological experiments. However, these methods handle ncRNAs separately, ignoring the impact of the interactions among ncRNAs on diseases. In this paper, we present a novel approach to discovering disease-related circRNA-miRNA-mRNA axes from the disease-RNA information network. Our method, using a graph convolutional network, learns the characteristic representation of each biological entity by propagating and aggregating local neighbor information based on the global structure of the network. The approach is evaluated on real-world datasets, and the results show that it outperforms other state-of-the-art baselines on most of the metrics.
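The idea of propagating and aggregating neighbor information can be sketched with a single graph convolution layer. The abstract does not specify the architecture, so the code below assumes the widely used symmetric-normalization rule (self-loops added, then ReLU) rather than the authors' exact model; `matmul` and `gcn_layer` are hypothetical helper names, and plain Python lists stand in for tensors.

```python
import math


def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(a[i][t] * b[t][j] for t in range(len(b)))
             for j in range(len(b[0]))]
            for i in range(len(a))]


def gcn_layer(adj, features, weights):
    """One propagation step: ReLU(D^-1/2 (A + I) D^-1/2 X W)."""
    n = len(adj)
    # Add self-loops so each node keeps part of its own representation.
    a_hat = [[adj[i][j] + (1.0 if i == j else 0.0) for j in range(n)]
             for i in range(n)]
    deg = [sum(row) for row in a_hat]
    # Symmetric normalization by node degree.
    norm = [[a_hat[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)]
            for i in range(n)]
    aggregated = matmul(norm, features)   # aggregate neighbor features
    projected = matmul(aggregated, weights)  # learned linear map
    return [[max(0.0, v) for v in row] for row in projected]


if __name__ == "__main__":
    adj = [[0, 1], [1, 0]]          # two connected nodes
    features = [[1.0], [3.0]]       # one feature per node
    print(gcn_layer(adj, features, [[1.0]]))
```

Stacking several such layers lets information travel along multi-hop paths, which is how a node's representation comes to reflect the wider network structure.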
|
38
|
Wang CC, Han CD, Zhao Q, Chen X. Circular RNAs and complex diseases: from experimental results to computational models. Brief Bioinform 2021; 22:bbab286. [PMID: 34329377 PMCID: PMC8575014 DOI: 10.1093/bib/bbab286] [Citation(s) in RCA: 117] [Impact Index Per Article: 29.3] [Received: 06/01/2021] [Revised: 06/23/2021] [Accepted: 07/03/2021] [Indexed: 12/13/2022] Open
Abstract
Circular RNAs (circRNAs) are a class of single-stranded, covalently closed RNA molecules with a variety of biological functions. Studies have shown that circRNAs are involved in a variety of biological processes and play an important role in the development of various complex diseases, so the identification of circRNA-disease associations would contribute to the diagnosis and treatment of diseases. In this review, we summarize the discovery, classifications and functions of circRNAs and introduce four important diseases associated with circRNAs. Then, we list some significant and publicly accessible databases containing comprehensive annotation resources of circRNAs and experimentally validated circRNA-disease associations. Next, we introduce some state-of-the-art computational models for predicting novel circRNA-disease associations and divide them into two categories, namely network algorithm-based and machine learning-based models. Subsequently, several evaluation methods of prediction performance of these computational models are summarized. Finally, we analyze the advantages and disadvantages of different types of computational models and provide some suggestions to promote the development of circRNA-disease association identification from the perspective of the construction of new computational models and the accumulation of circRNA-related data.
Affiliation(s)
- Chun-Chun Wang
- School of Information and Control Engineering, China University of Mining and Technology
- Chen-Di Han
- School of Information and Control Engineering, China University of Mining and Technology
- Qi Zhao
- School of Computer Science and Software Engineering, University of Science and Technology Liaoning
- Xing Chen
- China University of Mining and Technology
|
39
|
Xiao Q, Dai J, Luo J. A survey of circular RNAs in complex diseases: databases, tools and computational methods. Brief Bioinform 2021; 23:6407737. [PMID: 34676391 DOI: 10.1093/bib/bbab444] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.8] [Received: 07/13/2021] [Revised: 09/21/2021] [Accepted: 09/28/2021] [Indexed: 01/22/2023] Open
Abstract
Circular RNAs (circRNAs) are a category of newly discovered competing endogenous non-coding RNAs that have been shown to be implicated in many complex human diseases. A large number of circRNAs have been confirmed to be involved in cancer progression and are expected to become promising biomarkers for tumor diagnosis and targeted therapy. Deciphering the underlying relationships between circRNAs and diseases may provide new insights into the pathogenesis of complex diseases and further characterize the biological functions of circRNAs. As traditional experimental methods are usually time-consuming and laborious, computational models have made significant progress in systematically exploring potential circRNA-disease associations, which not only creates new opportunities for investigating pathogenic mechanisms at the level of circRNAs, but also helps to significantly improve the efficiency of clinical trials. In this review, we first summarize the functions and characteristics of circRNAs and introduce some representative circRNAs related to tumorigenesis. Then, we investigate the available databases and tools dedicated to circRNA and disease studies. Next, we present a comprehensive review of computational methods for predicting circRNA-disease associations and classify them into five categories: network propagation-based, path-based, matrix factorization-based, deep learning-based and other machine learning methods. Finally, we discuss the challenges and future research directions in this field.
Affiliation(s)
- Qiu Xiao
- Hunan Normal University and Hunan Xiangjiang Artificial Intelligence Academy, Changsha, China
- Jianhua Dai
- Hunan Normal University and Hunan Xiangjiang Artificial Intelligence Academy, Changsha, China
- Jiawei Luo
- College of Computer Science and Electronic Engineering, Hunan University, Changsha, China
|
40
|
Wang L, You ZH, Zhou X, Yan X, Li HY, Huang YA. NMFCDA: Combining randomization-based neural network with non-negative matrix factorization for predicting CircRNA-disease association. Appl Soft Comput 2021. [DOI: 10.1016/j.asoc.2021.107629] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 10/21/2022]
|
41
|
Wang L, Yan X, You ZH, Zhou X, Li HY, Huang YA. SGANRDA: semi-supervised generative adversarial networks for predicting circRNA-disease associations. Brief Bioinform 2021; 22:6175330. [PMID: 33734296 DOI: 10.1093/bib/bbab028] [Citation(s) in RCA: 26] [Impact Index Per Article: 6.5] [Received: 11/09/2020] [Revised: 01/18/2021] [Accepted: 01/19/2021] [Indexed: 12/31/2022] Open
Abstract
Emerging research shows that circular RNA (circRNA) plays a crucial role in the diagnosis, occurrence and prognosis of complex human diseases. Compared with traditional biological experiments, the computational method of fusing multi-source biological data to identify the association between circRNA and disease can effectively reduce cost and save time. Considering the limitations of existing computational models, we propose a semi-supervised generative adversarial network (GAN) model SGANRDA for predicting circRNA-disease association. This model first fused the natural language features of the circRNA sequence and the features of disease semantics, circRNA and disease Gaussian interaction profile kernel, and then used all circRNA-disease pairs to pre-train the GAN network, and fine-tune the network parameters through labeled samples. Finally, the extreme learning machine classifier is employed to obtain the prediction result. Compared with the previous supervision model, SGANRDA innovatively introduced circRNA sequences and utilized all the information of circRNA-disease pairs during the pre-training process. This step can increase the information content of the feature to some extent and reduce the impact of too few known associations on the model performance. SGANRDA obtained AUC scores of 0.9411 and 0.9223 in leave-one-out cross-validation and 5-fold cross-validation, respectively. Prediction results on the benchmark dataset show that SGANRDA outperforms other existing models. In addition, 25 of the top 30 circRNA-disease pairs with the highest scores of SGANRDA in case studies were verified by recent literature. These experimental results demonstrate that SGANRDA is a useful model to predict the circRNA-disease association and can provide reliable candidates for biological experiments.
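The Gaussian interaction profile (GIP) kernel mentioned in this abstract can be sketched as follows. The abstract gives no formula, so this assumes the commonly used definition, in which the bandwidth is normalized by the mean squared norm of the interaction profiles; `gip_kernel` and `gamma_prime` are hypothetical names, and the toy association matrix is made up for illustration.

```python
import math


def gip_kernel(assoc, gamma_prime=1.0):
    """Gaussian interaction profile kernel between the rows of a binary
    association matrix (for example, one row per circRNA, one column per
    disease): K[i][j] = exp(-gamma * ||row_i - row_j||^2)."""
    n = len(assoc)
    mean_sq_norm = sum(sum(v * v for v in row) for row in assoc) / n
    if mean_sq_norm == 0:
        raise ValueError("association matrix has no known associations")
    gamma = gamma_prime / mean_sq_norm  # bandwidth scaled to the data
    kernel = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            dist_sq = sum((a - b) ** 2 for a, b in zip(assoc[i], assoc[j]))
            kernel[i][j] = math.exp(-gamma * dist_sq)
    return kernel


if __name__ == "__main__":
    # Rows 0 and 2 share the same profile, so their similarity is 1.
    toy = [[1, 0], [0, 1], [1, 0]]
    for row in gip_kernel(toy):
        print(row)
```

Applying the same function to the transposed matrix would give the corresponding kernel for the column entities.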
Affiliation(s)
- Lei Wang
- Xinjiang Technical Institute of Physics and Chemistry, Chinese Academy of Sciences, Urumqi, China
- Xin Yan
- School of Computer Science and Technology, China University of Mining and Technology, Xuzhou, 221116, China
- Zhu-Hong You
- Xinjiang Technical Institute of Physics and Chemistry, Chinese Academy of Sciences, Urumqi, China
- Xi Zhou
- Xinjiang Technical Institute of Physics and Chemistry, Chinese Academy of Sciences, Urumqi, China
- Hao-Yuan Li
- School of Computer Science and Technology, China University of Mining and Technology, Xuzhou, 221116, China
- Yu-An Huang
- Department of Computing, Hong Kong Polytechnic University, Hong Kong, China
|
42
|
Zhang Q, Liu P, Wang X, Zhang Y, Han Y, Yu B. StackPDB: Predicting DNA-binding proteins based on XGB-RFE feature optimization and stacked ensemble classifier. Appl Soft Comput 2021. [DOI: 10.1016/j.asoc.2020.106921] [Citation(s) in RCA: 12] [Impact Index Per Article: 3.0] [Indexed: 02/08/2023]
|
43
|
Jia LN, Yan X, You ZH, Zhou X, Li LP, Wang L, Song KJ. NLPEI: A Novel Self-Interacting Protein Prediction Model Based on Natural Language Processing and Evolutionary Information. Evol Bioinform Online 2020; 16:1176934320984171. [PMID: 33488064 PMCID: PMC7768313 DOI: 10.1177/1176934320984171] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.6] [Received: 07/09/2020] [Accepted: 12/01/2020] [Indexed: 12/13/2022] Open
Abstract
The study of protein self-interactions (SIPs) can not only reveal the function of proteins at the molecular level, but is also crucial to understand activities such as growth, development, differentiation, and apoptosis, providing an important theoretical basis for exploring the mechanism of major diseases. With the rapid advances in biotechnology, a large number of SIPs have been discovered. However, due to the long period and high cost inherent to biological experiments, the gap between the identification of SIPs and the accumulation of data is growing. Therefore, fast and accurate computational methods are needed to effectively predict SIPs. In this study, we designed a new method, NLPEI, for predicting SIPs based on natural language understanding theory and evolutionary information. Specifically, we first understand the protein sequence as natural language and use natural language processing algorithms to extract its features. Then, we use the Position-Specific Scoring Matrix (PSSM) to represent the evolutionary information of the protein and extract its features through the Stacked Auto-Encoder (SAE) algorithm of deep learning. Finally, we fuse the natural language features of proteins with evolutionary features and make accurate predictions by Extreme Learning Machine (ELM) classifier. In the SIPs gold standard data sets of human and yeast, NLPEI achieved 94.19% and 91.29% prediction accuracy. Compared with different classifier models, different feature models, and other existing methods, NLPEI obtained the best results. These experimental results indicated that NLPEI is an effective tool for predicting SIPs and can provide reliable candidates for biological experiments.
Affiliation(s)
- Li-Na Jia
- College of Information Science and Engineering, Zaozhuang University, Zaozhuang, China
- Xin Yan
- School of Computer Science and Technology, China University of Mining and Technology, Xuzhou, China
- School of Foreign Languages, Zaozhuang University, Zaozhuang, China
- Zhu-Hong You
- Xinjiang Technical Institute of Physics and Chemistry, Chinese Academy of Sciences, Ürümqi, China
- Xi Zhou
- Xinjiang Technical Institute of Physics and Chemistry, Chinese Academy of Sciences, Ürümqi, China
- Li-Ping Li
- Xinjiang Technical Institute of Physics and Chemistry, Chinese Academy of Sciences, Ürümqi, China
- Lei Wang
- College of Information Science and Engineering, Zaozhuang University, Zaozhuang, China
- Xinjiang Technical Institute of Physics and Chemistry, Chinese Academy of Sciences, Ürümqi, China
- Lei Wang, Xinjiang Technical Institute of Physics and Chemistry, Chinese Academy of Sciences, Ürümqi, China
- Ke-Jian Song
- School of Information Engineering, Jiangxi University of Science and Technology, Ganzhou, China
|
44
|
Lu C, Zeng M, Wu FX, Li M, Wang J. Improving circRNA-disease association prediction by sequence and ontology representations with convolutional and recurrent neural networks. Bioinformatics 2020; 36:5656-5664. [PMID: 33367690 DOI: 10.1093/bioinformatics/btaa1077] [Citation(s) in RCA: 19] [Impact Index Per Article: 3.8] [Received: 09/07/2020] [Revised: 11/22/2020] [Accepted: 12/15/2020] [Indexed: 01/26/2023] Open
Abstract
MOTIVATION Emerging studies indicate that circular RNAs (circRNAs) are widely involved in the progression of human diseases. Owing to their stable structure, circRNAs are promising diagnostic and prognostic biomarkers for diseases. However, the experimental verification of circRNA-disease associations is expensive and limited to small scale. Effective computational methods for predicting potential circRNA-disease associations are therefore urgently needed. Although several models have been proposed, over-reliance on known associations and the absence of characteristics of biological functions make precise prediction still challenging. RESULTS In this study, we propose a method for predicting CircRNA-Disease Associations based on Sequence and Ontology Representations, named CDASOR, with convolutional and recurrent neural networks. For sequences of circRNAs, we encode them with continuous k-mers, get low-dimensional vectors of k-mers, extract their local feature vectors with a 1D CNN and learn their long-term dependencies with bi-directional long short-term memory. For diseases, we serialize the disease ontology into sentences containing the hierarchy of the ontology, obtain low-dimensional vectors for disease ontology terms and get the terms' dependencies. Furthermore, we get association patterns of circRNAs and diseases from known circRNA-disease associations with neural networks. After the above steps, we get high-level representations of circRNAs and diseases, which are informative for improving the prediction. The experimental results show that CDASOR provides accurate predictions. By importing the characteristics of biological functions, CDASOR achieves impressive predictions in the de novo test. In addition, 6 of the top-10 predicted results are verified by the published literature in the case studies. AVAILABILITY The code of CDASOR is freely available at https://github.com/BioinformaticsCSU/CDASOR.
SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
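The first step described in this abstract, encoding a circRNA sequence with continuous (overlapping) k-mers before learning low-dimensional vectors, can be sketched as below. This is an illustration under stated assumptions, not the CDASOR code: the function names are hypothetical, and the choice to reserve id 0 for both padding and unseen k-mers is a simplification made here.

```python
def kmers(seq, k):
    """Split a sequence into overlapping k-mers with a stride of one."""
    seq = seq.upper()
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def build_vocab(sequences, k):
    """Assign an integer id to every k-mer seen in the training sequences.
    Id 0 is reserved (here used for both padding and unknown k-mers)."""
    vocab = {}
    for seq in sequences:
        for token in kmers(seq, k):
            if token not in vocab:
                vocab[token] = len(vocab) + 1
    return vocab


def encode(seq, k, vocab, max_len):
    """Turn a sequence into a fixed-length list of k-mer ids, truncating or
    zero-padding to max_len so that batches have a uniform shape."""
    ids = [vocab.get(token, 0) for token in kmers(seq, k)][:max_len]
    return ids + [0] * (max_len - len(ids))


if __name__ == "__main__":
    vocab = build_vocab(["ACGUA"], 3)
    print(kmers("ACGUA", 3))
    print(encode("ACGU", 3, vocab, 4))
```

The resulting id lists are the kind of input an embedding layer expects; the embedding, 1D CNN and bi-directional LSTM stages named in the abstract would follow in a deep learning framework.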
Affiliation(s)
- Chengqian Lu
- School of Computer Science and Engineering, Central South University, Changsha, 410083, P.R. China; Hunan Provincial Key Lab on Bioinformatics, Central South University, Changsha, 410083, China
- Min Zeng
- School of Computer Science and Engineering, Central South University, Changsha, 410083, P.R. China; Hunan Provincial Key Lab on Bioinformatics, Central South University, Changsha, 410083, China
- Fang-Xiang Wu
- Hunan Provincial Key Lab on Bioinformatics, Central South University, Changsha, 410083, China
- Min Li
- School of Computer Science and Engineering, Central South University, Changsha, 410083, P.R. China; Hunan Provincial Key Lab on Bioinformatics, Central South University, Changsha, 410083, China
- Jianxin Wang
- School of Computer Science and Engineering, Central South University, Changsha, 410083, P.R. China; Hunan Provincial Key Lab on Bioinformatics, Central South University, Changsha, 410083, China
|
45
|
Lei X, Mudiyanselage TB, Zhang Y, Bian C, Lan W, Yu N, Pan Y. A comprehensive survey on computational methods of non-coding RNA and disease association prediction. Brief Bioinform 2020; 22:6042241. [PMID: 33341893 DOI: 10.1093/bib/bbaa350] [Citation(s) in RCA: 35] [Impact Index Per Article: 7.0] [Received: 08/28/2020] [Revised: 10/20/2020] [Accepted: 11/01/2020] [Indexed: 02/06/2023] Open
Abstract
Studies on the relationships between non-coding RNAs and diseases have been widely carried out in recent years. A large number of experimental methods and technologies for producing biological data have also been developed. However, because of their high labor cost and long production time, computational methods, especially machine learning and deep learning methods, have received a lot of attention and are now commonly used to solve these problems. From a computational point of view, this survey mainly introduces three common non-coding RNAs, i.e. miRNAs, lncRNAs and circRNAs, and the related computational methods for predicting their association with diseases. First, the mainstream databases of the above three non-coding RNAs are introduced in detail. Then, we present several methods for RNA similarity and disease similarity calculations. Later, we investigate ncRNA-disease prediction methods in detail and classify these methods into five types: network propagation, recommender systems, matrix completion, machine learning and deep learning. Furthermore, we provide a summary of the applications of these five types of computational methods in predicting the associations between diseases and miRNAs, lncRNAs and circRNAs, respectively. Finally, the advantages and limitations of various methods are identified, and future research directions and challenges are also discussed.
Affiliation(s)
- Xiujuan Lei
- School of Computer Science, Shaanxi Normal University, Xi'an, China
- Yuchen Zhang
- School of Computer Science, Shaanxi Normal University, Xi'an, China
- Chen Bian
- School of Computer Science, Shaanxi Normal University, Xi'an, China
- Wei Lan
- School of Computer, Electronics and Information at Guangxi University, Nanning, China
- Ning Yu
- Department of Computing Sciences at the College at Brockport, State University of New York, Rochester, NY, USA
- Yi Pan
- Computer Science Department at Georgia State University, Atlanta, GA, USA
|
46
|
Zhao Y, Zheng K, Guan B, Guo M, Song L, Gao J, Qu H, Wang Y, Shi D, Zhang Y. DLDTI: a learning-based framework for drug-target interaction identification using neural networks and network representation. J Transl Med 2020; 18:434. [PMID: 33187537 PMCID: PMC7666529 DOI: 10.1186/s12967-020-02602-7] [Citation(s) in RCA: 18] [Impact Index Per Article: 3.6] [Received: 08/10/2020] [Accepted: 11/01/2020] [Indexed: 12/18/2022] Open
Abstract
BACKGROUND Drug repositioning, the strategy of unveiling novel targets of existing drugs, could reduce costs and accelerate the pace of drug development. Given the long time and high cost of experimental determination, efficient and feasible computational methods for predicting the potential associations between drugs and targets are of great aid in elucidating novel molecular mechanisms of known drugs. METHODS A novel computational model for drug-target interaction (DTI) prediction based on network representation learning and convolutional neural networks, called DLDTI, was developed. The proposed approach simultaneously fused the topology of complex networks and diverse information from heterogeneous data sources, and coped with the noisy, incomplete, and high-dimensional nature of large-scale biological data by learning low-dimensional and rich deep features of drugs and proteins. The low-dimensional feature vectors were used to train DLDTI to obtain the optimal mapping space and to infer new DTIs by ranking candidates according to their proximity to the optimal mapping space. More specifically, based on the results from DLDTI, we experimentally validated the predicted targets of tetramethylpyrazine (TMPZ) on atherosclerosis progression in vivo. RESULTS The experimental results showed that the DLDTI model achieved promising performance under fivefold cross-validation with an AUC value of 0.9172, which was higher than that of the methods using different classifiers or different feature combination methods mentioned in this paper. For the validation study of TMPZ on atherosclerosis, a total of 288 targets were identified and 190 of them were involved in platelet activation. The pathway analysis indicated that signaling pathways, namely the PI3K/Akt, cAMP and calcium pathways, might be the potential targets. The effects and molecular mechanism of TMPZ on atherosclerosis were experimentally confirmed in animal models.
CONCLUSIONS The DLDTI model can serve as a useful tool to provide promising DTI candidates for experimental validation. Based on the predicted results of the DLDTI model, we found that TMPZ could attenuate atherosclerosis by inhibiting signal transduction in platelets. The source code and datasets explored in this work are available at https://github.com/CUMTzackGit/DLDTI.
Affiliation(s)
- Yihan Zhao
- Department of Graduate School, Beijing University of Chinese Medicine, Beijing, China
- Kai Zheng
- School of Computer Science and Engineering, Central South University, Changsha, China
- Baoyi Guan
- National Clinical Research Center for Chinese Medicine Cardiology, Xiyuan Hospital, Cardiovascular Diseases Center, China Academy of Chinese Medical Sciences, Beijing, China
- Mengmeng Guo
- Institute of Cardiovascular Sciences, Health Science Center, Peking University, Key Laboratory of Molecular Cardiovascular Sciences, Ministry of Education, Beijing, China
- Lei Song
- Department of Graduate School, Beijing University of Chinese Medicine, Beijing, China
- Jie Gao
- National Clinical Research Center for Chinese Medicine Cardiology, Xiyuan Hospital, Cardiovascular Diseases Center, China Academy of Chinese Medical Sciences, Beijing, China
- Hua Qu
- National Clinical Research Center for Chinese Medicine Cardiology, Xiyuan Hospital, Cardiovascular Diseases Center, China Academy of Chinese Medical Sciences, Beijing, China
- Yuhui Wang
- Institute of Cardiovascular Sciences, Health Science Center, Peking University, Key Laboratory of Molecular Cardiovascular Sciences, Ministry of Education, Beijing, China
- Dazhuo Shi
- National Clinical Research Center for Chinese Medicine Cardiology, Xiyuan Hospital, Cardiovascular Diseases Center, China Academy of Chinese Medical Sciences, Beijing, China
- Ying Zhang
- National Clinical Research Center for Chinese Medicine Cardiology, Xiyuan Hospital, Cardiovascular Diseases Center, China Academy of Chinese Medical Sciences, Beijing, China
|
47
|
Xiao Q, Zhong J, Tang X, Luo J. iCDA-CMG: identifying circRNA-disease associations by federating multi-similarity fusion and collective matrix completion. Mol Genet Genomics 2020; 296:223-233. [PMID: 33159254 DOI: 10.1007/s00438-020-01741-2] [Citation(s) in RCA: 17] [Impact Index Per Article: 3.4] [Received: 06/11/2020] [Accepted: 10/23/2020] [Indexed: 01/22/2023]
Abstract
Circular RNAs (circRNAs) are a special class of non-coding RNAs with covalently closed-loop structures. Studies prove that circRNAs perform critical roles in various biological processes, and the aberrant expression of circRNAs is closely related to tumorigenesis. Therefore, identifying potential circRNA-disease associations is beneficial to understand the pathogenesis of complex diseases at the circRNA level and helps biomedical researchers and practitioners to discover diagnostic biomarkers accurately. However, it is tremendously laborious and time-consuming to discover disease-related circRNAs with conventional biological experiments. In this study, we develop an integrative framework, called iCDA-CMG, to predict potential associations between circRNAs and diseases. By incorporating multi-source prior knowledge, including known circRNA-disease associations, disease similarities and circRNA similarities, we adopt a collective matrix completion-based graph learning model to prioritize the most promising disease-related circRNAs for guiding laborious clinical trials. The results show that iCDA-CMG outperforms other state-of-the-art models in terms of cross-validation and independent prediction. Moreover, the case studies for several representative cancers suggest the effectiveness of iCDA-CMG in screening circRNA candidates for human diseases, which will contribute to elucidating the pathogenesis mechanisms and unveiling new opportunities for disease diagnosis and targeted therapy.
Affiliation(s)
- Qiu Xiao
- Hunan Provincial Key Laboratory of Intelligent Computing and Language Information Processing, College of Information Science and Engineering, Hunan Normal University, Changsha, 410081, China; Hunan Xiangjiang Artificial Intelligence Academy, Changsha, 410000, China
- Jiancheng Zhong
- Hunan Provincial Key Laboratory of Intelligent Computing and Language Information Processing, College of Information Science and Engineering, Hunan Normal University, Changsha, 410081, China
- Xiwei Tang
- School of Information Science and Engineering, Hunan First Normal University, Changsha, 410205, China
- Jiawei Luo
- College of Computer Science and Electronic Engineering, Hunan University, Changsha, 410082, China
|
48
|
Fan C, Lei X, Pan Y. Prioritizing CircRNA-Disease Associations With Convolutional Neural Network Based on Multiple Similarity Feature Fusion. Front Genet 2020; 11:540751. [PMID: 33193615 PMCID: PMC7525185 DOI: 10.3389/fgene.2020.540751] [Citation(s) in RCA: 12] [Impact Index Per Article: 2.4] [Received: 03/25/2020] [Accepted: 08/12/2020] [Indexed: 12/15/2022] Open
Abstract
Accumulating evidence shows that circular RNAs (circRNAs) have significant roles in human health and in the occurrence and development of diseases. Biological researchers have identified disease-related circRNAs that could be considered as potential biomarkers for clinical diagnosis, prognosis, and treatment. However, identification of circRNA–disease associations using traditional biological experiments is still expensive and time-consuming. In this study, we propose a novel method named MSFCNN for the task of circRNA–disease association prediction, involving two-layer convolutional neural networks on a feature matrix that fuses multiple similarity kernels and interaction features among circRNAs, miRNAs, and diseases. First, four circRNA similarity kernels and seven disease similarity kernels are constructed based on the biological or topological properties of circRNAs and diseases. Subsequently, the similarity kernel fusion method is used to integrate the similarity kernels into one circRNA similarity kernel and one disease similarity kernel, respectively. Then, a feature matrix for each circRNA–disease pair is constructed by integrating the fused circRNA similarity kernel and fused disease similarity kernel with interactions and features among circRNAs, miRNAs, and diseases. The features of circRNA–miRNA and disease–miRNA interactions are selected using principal component analysis. Finally, taking the constructed feature matrix as an input, we used two-layer convolutional neural networks to predict circRNA–disease association labels and mine potential novel associations. Five-fold cross-validation shows that our proposed model outperforms conventional machine learning methods, including support vector machine, random forest, and multilayer perceptron approaches. Furthermore, case studies of predicted circRNAs for specific diseases and the top predicted circRNA–disease associations are analyzed. The results show that the MSFCNN model could be an effective tool for mining potential circRNA–disease associations.
Affiliation(s)
- Chunyan Fan
- School of Computer Science, Shaanxi Normal University, Xi'an, China
- Xiujuan Lei
- School of Computer Science, Shaanxi Normal University, Xi'an, China
- Yi Pan
- Department of Computer Science, Georgia State University, Atlanta, GA, United States
|
49
|
Wei H, Ding Y, Liu B. iPiDA-sHN: Identification of Piwi-interacting RNA-disease associations by selecting high quality negative samples. Comput Biol Chem 2020; 88:107361. [PMID: 32916452 DOI: 10.1016/j.compbiolchem.2020.107361] [Citation(s) in RCA: 16] [Impact Index Per Article: 3.2] [Received: 07/05/2020] [Revised: 07/31/2020] [Accepted: 08/15/2020] [Indexed: 12/31/2022]
Abstract
As a large group of small non-coding RNAs (ncRNAs), Piwi-interacting RNAs (piRNAs) have been found to be associated with various diseases. Identifying disease-associated piRNAs can provide promising candidate molecular targets for drug design. Although a few computational ensemble methods have been developed for identifying piRNA-disease associations, the low-quality negative associations used during training, some of which may even be positive associations, prevent improvement of the predictive performance. In this study, we propose a new computational predictor named iPiDA-sHN to predict potential piRNA-disease associations. iPiDA-sHN represents the piRNA-disease pairs by incorporating piRNA sequence information, the known piRNA-disease association network, and the disease semantic graph. High-level features of piRNA-disease associations are extracted by a Convolutional Neural Network (CNN). A two-step positive-unlabeled learning strategy based on a Support Vector Machine (SVM) is employed to select high-quality negative samples from the unknown piRNA-disease pairs. Finally, the SVM predictor trained with the known piRNA-disease associations and the high-quality negative associations is used to predict new piRNA-disease associations. The experimental results show that iPiDA-sHN achieves superior predictive ability compared with other state-of-the-art predictors.
Affiliation(s)
- Hang Wei
- School of Computer Science and Technology, Harbin Institute of Technology, Shenzhen, Guangdong 518055, China
- Yuxin Ding
- School of Computer Science and Technology, Harbin Institute of Technology, Shenzhen, Guangdong 518055, China
- Bin Liu
- School of Computer Science and Technology, Harbin Institute of Technology, Shenzhen, Guangdong 518055, China; School of Computer Science and Technology, Beijing Institute of Technology, Beijing 100081, China; Advanced Research Institute of Multidisciplinary Science, Beijing Institute of Technology, Beijing 100081, China
|