1. Zhang W, Zeng Y, Xiang X, Zhao B, Hu S, Li L, Zhu X, Wang L. Association prediction of lncRNAs and diseases using multiview graph convolution neural network. Front Genet 2025; 16:1568270. [PMID: 40303981 PMCID: PMC12037633 DOI: 10.3389/fgene.2025.1568270] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/29/2025] [Accepted: 04/04/2025] [Indexed: 05/02/2025] [Open access]
Abstract
Long noncoding RNAs (lncRNAs) regulate physiological processes via interactions with macromolecules such as miRNAs, proteins, and genes, forming disease-associated regulatory networks. However, predicting lncRNA-disease associations remains challenging due to network complexity and isolated entities. Here, we propose MVIGCN, a graph convolutional network (GCN)-based method integrating multimodal data to predict these associations. Our framework constructs a heterogeneous network combining disease semantics, lncRNA similarity, and miRNA-lncRNA-disease interactions to address isolation issues. By modeling topological features and multiscale relationships through deep learning with attention mechanisms, MVIGCN prioritizes critical nodes and edges, enhancing prediction accuracy. Cross-validation demonstrated improved reliability over single-view methods, highlighting its potential to identify disease-related lncRNA biomarkers. This work advances network-based computational strategies for decoding lncRNA functions in disease biology and provides a scalable tool for prioritizing therapeutic targets.
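The heterogeneous network and graph convolution described in this abstract can be illustrated with a minimal sketch. It assumes the common block layout in which lncRNA similarity, disease similarity, and the known association matrix are stacked into one adjacency matrix, followed by a single normalized propagation step. The function names, toy matrices, and weights are hypothetical and are not taken from MVIGCN.

```python
# Minimal sketch (assumed layout, not the MVIGCN implementation):
# build a lncRNA-disease block adjacency and apply one GCN-style layer.

def block_adjacency(lnc_sim, dis_sim, assoc):
    """Assemble [[S_lnc, A], [A^T, S_dis]] as a list of lists."""
    n_l, n_d = len(lnc_sim), len(dis_sim)
    top = [list(lnc_sim[i]) + list(assoc[i]) for i in range(n_l)]
    bottom = [[assoc[i][j] for i in range(n_l)] + list(dis_sim[j])
              for j in range(n_d)]
    return top + bottom

def normalize(adj):
    """Symmetric normalization D^{-1/2} A D^{-1/2}."""
    deg = [sum(row) for row in adj]
    inv_sqrt = [d ** -0.5 if d > 0 else 0.0 for d in deg]
    return [[inv_sqrt[i] * a * inv_sqrt[j] for j, a in enumerate(row)]
            for i, row in enumerate(adj)]

def matmul(a, b):
    """Plain matrix product for lists of lists."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)]
            for row in a]

def gcn_layer(adj_norm, features, weights):
    """One propagation step: ReLU(A_norm @ X @ W)."""
    h = matmul(matmul(adj_norm, features), weights)
    return [[max(0.0, v) for v in row] for row in h]

# Toy data (hypothetical): 2 lncRNAs, 2 diseases.
lnc_sim = [[1.0, 0.5], [0.5, 1.0]]
dis_sim = [[1.0, 0.2], [0.2, 1.0]]
assoc = [[1, 0], [0, 1]]

adj = block_adjacency(lnc_sim, dis_sim, assoc)
features = [[1.0 if i == j else 0.0 for j in range(4)] for i in range(4)]
weights = [[0.1, -0.2], [0.3, 0.1], [-0.1, 0.2], [0.2, 0.0]]
hidden = gcn_layer(normalize(adj), features, weights)
```

The unit diagonals of the similarity blocks play the role of self-loops here; a real model would learn `weights` and typically stack several such layers per view.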
Affiliation(s)
- Wei Zhang
  - College of Computer Science and Engineering, Changsha University, Changsha, Hunan, China
- Yifu Zeng
  - College of Computer Science and Engineering, Changsha University, Changsha, Hunan, China
- Xiaowen Xiang
  - College of Computer Science and Engineering, Changsha University, Changsha, Hunan, China
- Bihai Zhao
  - College of Computer Science and Engineering, Changsha University, Changsha, Hunan, China
- Sai Hu
  - Department of Information and Computing Science, College of Mathematics, Changsha University, Changsha, China
- Limiao Li
  - College of Computer Science and Engineering, Changsha University, Changsha, Hunan, China
- Xiaoyu Zhu
  - College of Computer Science and Engineering, Changsha University, Changsha, Hunan, China
- Lei Wang
  - College of Computer Science and Engineering, Changsha University, Changsha, Hunan, China
2. Yang G, Liu Y, Wen S, Chen W, Zhu X, Wang Y. DTI-MHAPR: optimized drug-target interaction prediction via PCA-enhanced features and heterogeneous graph attention networks. BMC Bioinformatics 2025; 26:11. [PMID: 39800678 PMCID: PMC11726937 DOI: 10.1186/s12859-024-06021-z] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/01/2024] [Accepted: 12/20/2024] [Indexed: 01/16/2025] [Open access]
Abstract
Drug-target interactions (DTIs) are pivotal in drug discovery and development, and their accurate identification can significantly expedite the process. Numerous DTI prediction methods have emerged, yet many fail to fully harness the feature information of drugs and targets or address the issue of feature redundancy. We aim to refine DTI prediction accuracy by eliminating redundant features and capitalizing on the node topological structure to enhance feature extraction. To achieve this, we introduce a PCA-augmented multi-layer heterogeneous graph-based network that concentrates on key features throughout the encoding-decoding phase. Our approach initiates with the construction of a heterogeneous graph from various similarity metrics, which is then encoded via a graph neural network. We concatenate and integrate the resultant representation vectors to merge multi-level information. Subsequently, principal component analysis is applied to distill the most informative features, with the random forest algorithm employed for the final decoding of the integrated data. Our method outperforms six baseline models in terms of accuracy, as demonstrated by extensive experimentation. Comprehensive ablation studies, visualization of results, and in-depth case analyses further validate our framework's efficacy and interpretability, providing a novel tool for drug discovery that integrates multimodal features.
Affiliation(s)
- Guang Yang
  - School of Information and Artificial Intelligence, Anhui Agricultural University, Changjiang West Road, Hefei, 230036, Anhui, China
- Yinbo Liu
  - School of Information and Artificial Intelligence, Anhui Agricultural University, Changjiang West Road, Hefei, 230036, Anhui, China
- Sijian Wen
  - School of Information and Artificial Intelligence, Anhui Agricultural University, Changjiang West Road, Hefei, 230036, Anhui, China
- Wenxi Chen
  - School of Information and Artificial Intelligence, Anhui Agricultural University, Changjiang West Road, Hefei, 230036, Anhui, China
- Xiaolei Zhu
  - School of Information and Artificial Intelligence, Anhui Agricultural University, Changjiang West Road, Hefei, 230036, Anhui, China
- Yongmei Wang
  - School of Information and Artificial Intelligence, Anhui Agricultural University, Changjiang West Road, Hefei, 230036, Anhui, China
3. Huang L, Sheng N, Gao L, Wang L, Hou W, Hong J, Wang Y. Self-Supervised Contrastive Learning on Attribute and Topology Graphs for Predicting Relationships Among lncRNAs, miRNAs and Diseases. IEEE J Biomed Health Inform 2025; 29:657-668. [PMID: 39316476 DOI: 10.1109/jbhi.2024.3467101] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 09/26/2024]
Abstract
Exploring associations between long non-coding RNAs (lncRNAs), microRNAs (miRNAs) and diseases is crucial for disease prevention, diagnosis and treatment. Because determining these relationships experimentally is resource-intensive and time-consuming, computational methods have emerged as an attractive alternative. However, existing computational methods tend to focus on single tasks, neglecting the benefits of leveraging multiple biomolecular interactions and domain-specific knowledge for multi-task prediction. Furthermore, the scarcity of labeled data for lncRNA-disease associations (LDAs), miRNA-disease associations (MDAs) and lncRNA-miRNA interactions (LMIs) poses challenges for comprehensive node embedding learning. This paper proposes a multi-task prediction model (called SSCLMD) that employs self-supervised contrastive learning on attribute and topology graphs to identify potential LDAs, MDAs and LMIs. First, domain knowledge of lncRNAs, miRNAs and diseases and their interactions are used to construct an attribute graph and a topology graph, respectively. Then, the nodes are encoded in the attribute and topology spaces to extract view-specific and common features. Meanwhile, an attention mechanism adaptively fuses the embeddings from the different views. SSCLMD incorporates contrastive self-supervised learning as a regularizer to guide node embedding learning in both the attribute and topology spaces without relying on labels. Serving as a regularizer in the multi-task learning paradigm, it improves the model's generalization capabilities. Extensive experiments on two manually curated datasets demonstrate that SSCLMD significantly outperforms baseline methods in LDA, MDA and LMI prediction tasks. Case studies on both old and new datasets further support SSCLMD's ability to uncover novel disease-related lncRNAs and miRNAs.
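The adaptive fusion of embeddings from the attribute and topology views can be sketched as a softmax-weighted sum. This is a generic attention fusion under stated assumptions: the scoring function (a dot product with a query vector), the `query` values, and the toy embeddings are all hypothetical, and the abstract does not specify how SSCLMD scores the views.

```python
import math

# Minimal sketch of attention-based view fusion (assumed scoring function,
# not the SSCLMD implementation).

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def fuse_views(view_embeddings, query):
    """Weight each view by softmax(dot(embedding, query)) and sum the views."""
    scores = [sum(e * q for e, q in zip(emb, query)) for emb in view_embeddings]
    weights = softmax(scores)
    dim = len(query)
    fused = [sum(w * emb[k] for w, emb in zip(weights, view_embeddings))
             for k in range(dim)]
    return fused, weights

# Toy embeddings of one node in two views (hypothetical values).
attribute_emb = [1.0, 0.0]
topology_emb = [0.0, 1.0]
query = [2.0, 0.0]  # in practice a learned parameter

fused, weights = fuse_views([attribute_emb, topology_emb], query)
```

With this query the attribute view receives the larger weight, so the fused vector leans toward the attribute embedding.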
4. Zhang B, Wang H, Ma C, Huang H, Fang Z, Qu J. LDAGM: prediction lncRNA-disease asociations by graph convolutional auto-encoder and multilayer perceptron based on multi-view heterogeneous networks. BMC Bioinformatics 2024; 25:332. [PMID: 39407120 PMCID: PMC11481433 DOI: 10.1186/s12859-024-05950-z] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/01/2024] [Accepted: 10/01/2024] [Indexed: 10/19/2024] [Open access]
Abstract
BACKGROUND: Long non-coding RNAs (lncRNAs) can aid the prevention, diagnosis, and treatment of a variety of complex human diseases, so it is crucial to establish a method that efficiently predicts lncRNA-disease associations.
RESULTS: In this paper, we propose a prediction method for lncRNA-disease associations, named LDAGM, which is based on a Graph Convolutional Autoencoder and a Multilayer Perceptron. The method first extracts the functional similarity and Gaussian interaction profile kernel similarity of lncRNAs and miRNAs, as well as the semantic similarity and Gaussian interaction profile kernel similarity of diseases. It then constructs six homogeneous networks and deeply fuses them using a deep topology feature extraction method. The fused networks facilitate feature complementation and deep mining of the original association relationships, capturing the deep connections between nodes. Next, by combining the obtained deep topological features with the similarity network of lncRNA, disease, and miRNA interactions, we construct a multi-view heterogeneous network model. The Graph Convolutional Autoencoder is employed for nonlinear feature extraction. Finally, the extracted nonlinear features are combined with the deep topological features of the multi-view heterogeneous network to obtain the final feature representation of each lncRNA-disease pair. Prediction of lncRNA-disease associations is performed using the Multilayer Perceptron. To enhance the performance and stability of the Multilayer Perceptron, we introduce a hidden layer called the aggregation layer. Through a gate mechanism, it controls the flow of information between the hidden layers of the Multilayer Perceptron, aiming to achieve optimal feature extraction from each hidden layer.
CONCLUSIONS: Parameter analysis, ablation studies, and comparison experiments verified the effectiveness of this method, and case studies verified its accuracy in predicting lncRNA-disease associations.
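The Gaussian interaction profile (GIP) kernel similarity named in this abstract has a standard form: for interaction profiles IP(i) and IP(j), the rows of a binary association matrix, K(i, j) = exp(-γ ‖IP(i) - IP(j)‖²), with γ set to a bandwidth γ' divided by the mean squared profile norm. The sketch below implements that standard definition with γ' = 1, which is an assumption because the abstract does not give the bandwidth. The toy matrix is hypothetical.

```python
import math

# Minimal sketch of the standard GIP kernel (bandwidth gamma' = 1 assumed;
# not taken from the LDAGM code).

def gip_kernel(assoc, gamma_prime=1.0):
    """Return the GIP kernel matrix for the row entities of `assoc`."""
    norms = [sum(v * v for v in row) for row in assoc]
    mean_norm = sum(norms) / len(norms)
    # Normalize the bandwidth by the mean squared norm of the profiles.
    gamma = gamma_prime / mean_norm if mean_norm > 0 else gamma_prime
    n = len(assoc)
    return [[math.exp(-gamma * sum((a - b) ** 2
                                   for a, b in zip(assoc[i], assoc[j])))
             for j in range(n)]
            for i in range(n)]

# Toy lncRNA-by-disease association matrix (hypothetical).
assoc = [
    [1, 0, 1],
    [1, 0, 0],
    [0, 1, 0],
]
lnc_gip = gip_kernel(assoc)
# Transposing the matrix yields the disease-side kernel.
dis_gip = gip_kernel([list(col) for col in zip(*assoc)])
```

Profiles that share more known associations end up closer to 1, and every entity has similarity 1 with itself.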
Grants
- No. 62172123 National Natural Science Foundation, China
- Grant No. 2022ZX01A36 the Key Research and Development Program of Heilongjiang
- No. ZY20B11 the Special projects for the central government to guide the development of local science and technology, China
- No. CXRC20221104236 the Harbin Manufacturing Technology Innovation Talent Project
Affiliation(s)
- Bing Zhang
  - Harbin University of Science and Technology, Harbin, 150006, Heilongjiang province, China
- Haoyu Wang
  - Harbin University of Science and Technology, Harbin, 150006, Heilongjiang province, China
- Chao Ma
  - Harbin University of Science and Technology, Harbin, 150006, Heilongjiang province, China
- Hai Huang
  - Harbin University of Science and Technology, Harbin, 150006, Heilongjiang province, China
- Zhou Fang
  - Cyberspace Research Center, Harbin, 150001, Heilongjiang province, China
- Jiaxing Qu
  - Cyberspace Research Center, Harbin, 150001, Heilongjiang province, China
5. Wang XF, Huang L, Wang Y, Guan RC, You ZH, Sheng N, Xie XP, Hou WJ. Multi-view learning framework for predicting unknown types of cancer markers via directed graph neural networks fitting regulatory networks. Brief Bioinform 2024; 25:bbae546. [PMID: 39470307 PMCID: PMC11514060 DOI: 10.1093/bib/bbae546] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/03/2024] [Revised: 09/02/2024] [Accepted: 10/11/2024] [Indexed: 10/30/2024] [Open access]
Abstract
The discovery of diagnostic and therapeutic biomarkers for complex diseases, especially cancer, has always been a central and long-term challenge in molecular association prediction research, offering promising avenues for advancing the understanding of complex diseases. To this end, researchers have developed various network-based prediction techniques targeting specific molecular associations. However, limitations imposed by reductionism and network representation learning have led existing studies to focus narrowly on high prediction efficiency within a single association type, thereby glossing over the discovery of unknown types of associations. Additionally, effectively utilizing network structure to fit the interaction properties of regulatory networks and combining specific case biomarker validations remains an unresolved issue in cancer biomarker prediction methods. To overcome these limitations, we propose a multi-view learning framework, CeRVE, based on directed graph neural networks (DGNN) for predicting cancer biomarkers of unknown types. CeRVE effectively extracts and integrates subgraph information through multi-view feature learning. Subsequently, CeRVE utilizes DGNN to simulate the entire regulatory network, propagating node attribute features and extracting various interaction relationships between molecules. Furthermore, CeRVE constructed a comparative analysis matrix of three cancers and adjacent normal tissues through The Cancer Genome Atlas and identified multiple types of potential cancer biomarkers through differential expression analysis of mRNA, microRNA, and long noncoding RNA. Computational testing of multiple types of biomarkers for 72 cancers demonstrates that CeRVE exhibits superior performance in cancer biomarker prediction, providing a powerful tool and insightful approach for AI-assisted disease biomarker discovery.
Affiliation(s)
- Xin-Fei Wang
  - Key Laboratory of Symbol Computation and Knowledge Engineering of Ministry of Education, College of Computer Science and Technology, Jilin University, No. 2699, Qianjin Street, Changchun, 130012, China
- Lan Huang
  - Key Laboratory of Symbol Computation and Knowledge Engineering of Ministry of Education, College of Computer Science and Technology, Jilin University, No. 2699, Qianjin Street, Changchun, 130012, China
- Yan Wang
  - Key Laboratory of Symbol Computation and Knowledge Engineering of Ministry of Education, College of Computer Science and Technology, Jilin University, No. 2699, Qianjin Street, Changchun, 130012, China
- Ren-Chu Guan
  - Key Laboratory of Symbol Computation and Knowledge Engineering of Ministry of Education, College of Computer Science and Technology, Jilin University, No. 2699, Qianjin Street, Changchun, 130012, China
- Zhu-Hong You
  - School of Computer Science, Northwestern Polytechnical University, Youyi West Road, Xi’an, 710072, China
- Nan Sheng
  - Key Laboratory of Symbol Computation and Knowledge Engineering of Ministry of Education, College of Computer Science and Technology, Jilin University, No. 2699, Qianjin Street, Changchun, 130012, China
- Xu-Ping Xie
  - Key Laboratory of Symbol Computation and Knowledge Engineering of Ministry of Education, College of Computer Science and Technology, Jilin University, No. 2699, Qianjin Street, Changchun, 130012, China
- Wen-Ju Hou
  - Key Laboratory of Symbol Computation and Knowledge Engineering of Ministry of Education, College of Computer Science and Technology, Jilin University, No. 2699, Qianjin Street, Changchun, 130012, China
6. Yu CQ, Wang XF, Li LP, You ZH, Ren ZH, Chu P, Guo F, Wang ZY. RBNE-CMI: An Efficient Method for Predicting circRNA-miRNA Interactions via Multiattribute Incomplete Heterogeneous Network Embedding. J Chem Inf Model 2024. [PMID: 39231016 DOI: 10.1021/acs.jcim.4c01118] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 09/06/2024]
Abstract
Circular RNA (circRNA)-microRNA (miRNA) interaction (CMI) plays crucial roles in cellular regulation, offering promising perspectives for disease diagnosis and therapy. Therefore, it is necessary to employ computational methods for the rapid and cost-effective prediction of potential circRNA-miRNA interactions. However, the existing methods are limited by incomplete data; therefore, it is difficult to model molecules with different attributes on a large scale, which greatly hinders the efficiency and performance of prediction. In this study, we propose an effective method for predicting circRNA-miRNA interactions, called RBNE-CMI, and introduce a framework that can embed incomplete multiattribute CMI heterogeneous networks. By combining the proposed method, we integrate different data sets in the CMI prediction field into one incomplete network for modeling, achieving superior performance in 5-fold cross-validation. Moreover, in the prediction task based on complete data, the proposed method still achieves better performance than the known model. In addition, in the case study, we successfully predicted 18 of the 20 potential cancer biomarkers. The data and source code can be found at https://github.com/1axin/RBNE-CMI.
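The 5-fold cross-validation mentioned in this abstract follows a standard protocol: shuffle the samples once, partition them into five disjoint folds, and hold each fold out in turn. The sketch below shows only that splitting step. The function name, the seed, and the sample count are hypothetical, and the paper's own handling of positive and negative pairs is not described in the abstract.

```python
import random

# Minimal sketch of k-fold index splitting (generic protocol, not the
# RBNE-CMI evaluation code).

def k_fold_indices(n_samples, k=5, seed=0):
    """Yield (train, test) index lists for each of the k folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)  # seeded for reproducibility
    folds = [indices[i::k] for i in range(k)]  # k disjoint, near-equal folds
    for i in range(k):
        test = folds[i]
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test

# Toy usage: 20 candidate circRNA-miRNA pairs (hypothetical count).
splits = list(k_fold_indices(20, k=5))
```

Each sample appears in exactly one test fold, so averaging a metric over the five folds uses every labeled pair for evaluation once.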
Affiliation(s)
- Chang-Qing Yu
  - School of Information Engineering, Xijing University, Xi'an 710123, China
- Xin-Fei Wang
  - College of Computer Science and Technology, Jilin University, Changchun 130012, China
- Li-Ping Li
  - Yizhi School of Agriculture and Forestry, Xiangyang Polytechnic Institute, Xianyang 712000, China
- Zhu-Hong You
  - School of Computer Science, Northwestern Polytechnical University, Xi'an 710072, China
- Zhong-Hao Ren
  - College of Computer Science and Electronic Engineering, Hunan University, Changsha 410082, China
- Peng Chu
  - School of Information Engineering, Xijing University, Xi'an 710123, China
- Feng Guo
  - School of Information Engineering, Xijing University, Xi'an 710123, China
- Zhen-Yu Wang
  - School of Telecommunications, Lanzhou University of Technology, Lanzhou 730000, China
7. Xuan P, Wang W, Cui H, Wang S, Nakaguchi T, Zhang T. Mask-Guided Target Node Feature Learning and Dynamic Detailed Feature Enhancement for lncRNA-Disease Association Prediction. J Chem Inf Model 2024; 64:6662-6675. [PMID: 39112431 DOI: 10.1021/acs.jcim.4c00652] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 08/27/2024]
Abstract
Identifying new relevant long noncoding RNAs (lncRNAs) for various human diseases can facilitate the exploration of the causes and progression of these diseases. Recently, several graph inference methods have been proposed to predict disease-related lncRNAs by exploiting the topological structure and node attributes within graphs. However, these methods did not prioritize the target lncRNA and disease nodes over auxiliary nodes like miRNA nodes, potentially limiting their ability to fully utilize the features of the target nodes. We propose a new method, mask-guided target node feature learning and dynamic detailed feature enhancement for lncRNA-disease association prediction (MDLD), to enhance node feature learning for improved lncRNA-disease association prediction. First, we designed a heterogeneous graph masked transformer autoencoder to guide feature learning, focusing more on the features of target lncRNA (disease) nodes. The target nodes were increasingly masked as training progressed, which helps develop a more robust prediction model. Second, we developed a graph convolutional network with dynamic residuals (GCNDR) to learn and integrate the heterogeneous topology and features of all lncRNA, disease, and miRNA nodes. GCNDR employs an interlayer residual strategy and a residual evolution strategy to mitigate oversmoothing caused by multilayer graph convolution. The interlayer residual strategy estimates the importance of node features learned in the previous GCN encoding layer for nodes in the current encoding layer. Additionally, since there are dependencies in the importance of features of individual lncRNA (disease, miRNA) nodes across multiple encoding layers, a gated recurrent unit-based strategy is proposed to encode these dependencies. Finally, we designed a perspective-level attention mechanism to obtain more informative features of lncRNA and disease node pairs from the perspectives of mask-enhanced and dynamic-enhanced node features. 
Cross-validation experimental results demonstrated that MDLD outperformed 10 other state-of-the-art prediction methods. Ablation experiments and case studies on candidate lncRNAs for three diseases further proved the technical contributions of MDLD and its capability to discover disease-related lncRNAs.
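The abstract states that target nodes were increasingly masked as training progressed. One simple way to realize such a curriculum is a mask ratio that grows linearly with the epoch. The linear shape, the start and end ratios, and the function names below are assumptions made for illustration, since the abstract does not give MDLD's actual schedule.

```python
import random

# Minimal sketch of a progressive masking schedule (linear growth assumed;
# not the MDLD implementation).

def mask_ratio(epoch, total_epochs, start=0.1, end=0.5):
    """Fraction of target nodes to mask at a given epoch (0-indexed)."""
    if total_epochs <= 1:
        return end
    progress = epoch / (total_epochs - 1)
    return start + (end - start) * progress

def sample_mask(target_nodes, ratio, rng):
    """Pick the subset of target nodes whose features are hidden this epoch."""
    count = round(ratio * len(target_nodes))
    return set(rng.sample(target_nodes, count))

# Toy usage: 20 target (lncRNA or disease) node ids, 10 epochs.
rng = random.Random(0)
target_nodes = list(range(20))
ratios = [mask_ratio(e, 10) for e in range(10)]
first_mask = sample_mask(target_nodes, ratios[0], rng)
last_mask = sample_mask(target_nodes, ratios[-1], rng)
```

Early epochs hide few target nodes so the encoder sees mostly intact inputs, and later epochs hide more, which makes the reconstruction task harder.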
Affiliation(s)
- Ping Xuan
  - Department of Computer Science and Technology, Shantou University, Shantou 515063, China
  - School of Mathematical Science, Heilongjiang University, Harbin 150080, China
- Wei Wang
  - Department of Computer Science and Technology, Shantou University, Shantou 515063, China
- Hui Cui
  - Department of Computer Science and Information Technology, La Trobe University, Melbourne 3083, Australia
- Shuai Wang
  - School of Information Science and Engineering, Yanshan University, Qinhuangdao 066004, China
- Toshiya Nakaguchi
  - Center for Frontier Medical Engineering, Chiba University, Chiba 2638522, Japan
- Tiangang Zhang
  - School of Mathematical Science, Heilongjiang University, Harbin 150080, China
8. Wang XF, Yu CQ, You ZH, Wang Y, Huang L, Qiao Y, Wang L, Li ZW. BEROLECMI: a novel prediction method to infer circRNA-miRNA interaction from the role definition of molecular attributes and biological networks. BMC Bioinformatics 2024; 25:264. [PMID: 39127625 DOI: 10.1186/s12859-024-05891-7] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/10/2023] [Accepted: 08/01/2024] [Indexed: 08/12/2024] [Open access]
Abstract
Circular RNA (CircRNA)-microRNA (miRNA) interaction (CMI) is an important model for the regulation of biological processes by non-coding RNA (ncRNA), which provides a new perspective for the study of human complex diseases. However, the existing CMI prediction models mainly rely on the nearest neighbor structure in the biological network, ignoring the molecular network topology, so it is difficult to improve the prediction performance. In this paper, we proposed a new CMI prediction method, BEROLECMI, which uses molecular sequence attributes, molecular self-similarity, and biological network topology to define the specific role feature representation for molecules to infer the new CMI. BEROLECMI effectively makes up for the lack of network topology in the CMI prediction model and achieves the highest prediction performance in three commonly used data sets. In the case study, 14 of the 15 pairs of unknown CMIs were correctly predicted.
Affiliation(s)
- Xin-Fei Wang
  - School of Information Engineering, Xijing University, Xi'an, China
- Chang-Qing Yu
  - School of Information Engineering, Xijing University, Xi'an, China
- Zhu-Hong You
  - School of Computer Science, Northwestern Polytechnical University, Xi'an, China
- Yan Wang
  - Key Laboratory of Symbol Computation and Knowledge Engineering of Ministry of Education, College of Computer Science and Technology, Jilin University, Changchun, China
  - School of Artificial Intelligence, Jilin University, Changchun, China
- Lan Huang
  - Key Laboratory of Symbol Computation and Knowledge Engineering of Ministry of Education, College of Computer Science and Technology, Jilin University, Changchun, China
- Yan Qiao
  - College of Agriculture and Forestry, Longdong University, Qingyang, China
- Lei Wang
  - School of Computer Science and Technology, China University of Mining and Technology, Xuzhou, China
  - Guangxi Academy of Sciences, Nanning, China
- Zheng-Wei Li
  - School of Computer Science and Technology, China University of Mining and Technology, Xuzhou, China
9. Peng L, Ren M, Huang L, Chen M. GEnDDn: An lncRNA-Disease Association Identification Framework Based on Dual-Net Neural Architecture and Deep Neural Network. Interdiscip Sci 2024; 16:418-438. [PMID: 38733474 DOI: 10.1007/s12539-024-00619-w] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/18/2023] [Revised: 02/02/2024] [Accepted: 02/03/2024] [Indexed: 05/13/2024]
Abstract
Accumulating studies have demonstrated close relationships between long non-coding RNAs (lncRNAs) and diseases. Identification of new lncRNA-disease associations (LDAs) enables us to better understand disease mechanisms and further provides promising insights into cancer targeted therapy and anti-cancer drug design. Here, we present an LDA prediction framework called GEnDDn based on deep learning. GEnDDn mainly comprises two steps: First, features of both lncRNAs and diseases are extracted by combining similarity computation, non-negative matrix factorization, and graph attention auto-encoder, respectively. And each lncRNA-disease pair (LDP) is depicted as a vector based on concatenation operation on the extracted features. Subsequently, unknown LDPs are classified by aggregating dual-net neural architecture and deep neural network. Using six different evaluation metrics, we found that GEnDDn surpassed four competing LDA identification methods (SDLDA, LDNFSGB, IPCARF, LDASR) on the lncRNADisease and MNDR databases under fivefold cross-validation experiments on lncRNAs, diseases, LDPs, and independent lncRNAs and independent diseases, respectively. Ablation experiments further validated the powerful LDA prediction performance of GEnDDn. Furthermore, we utilized GEnDDn to find underlying lncRNAs for lung cancer and breast cancer. The results elucidated that there may be dense linkages between IFNG-AS1 and lung cancer as well as between HIF1A-AS1 and breast cancer. The results require further biomedical experimental verification. GEnDDn is publicly available at https://github.com/plhhnu/GEnDDn.
Affiliation(s)
- Lihong Peng
  - College of Life Science and Chemistry, Hunan University of Technology, Zhuzhou, 412007, China
- Mengnan Ren
  - College of Life Science and Chemistry, Hunan University of Technology, Zhuzhou, 412007, China
- Liangliang Huang
  - College of Life Science and Chemistry, Hunan University of Technology, Zhuzhou, 412007, China
- Min Chen
  - School of Computer Science, Hunan Institute of Technology, Hengyang, 421002, China
10. Ma Y, Shi Y, Chen X, Zhang B, Wu H, Gao J. NFMCLDA: Predicting miRNA-based lncRNA-disease associations by network fusion and matrix completion. Comput Biol Med 2024; 174:108403. [PMID: 38582002 DOI: 10.1016/j.compbiomed.2024.108403] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/27/2023] [Revised: 03/28/2024] [Accepted: 04/01/2024] [Indexed: 04/08/2024]
Abstract
In recent years, emerging evidence has revealed a strong association between dysregulation of long non-coding RNAs (lncRNAs) and complex human diseases. Biological experiments can identify such associations, but they are costly and time-consuming. Therefore, developing high-quality computational methods is a challenging and urgent task in the field of bioinformatics. This paper proposes a new lncRNA-disease association inference approach, NFMCLDA (Network Fusion and Matrix Completion lncRNA-Disease Association), which can effectively integrate multi-source association data. In this approach, miRNA information is used as the transition path, and an unbalanced random walk method on a three-layer heterogeneous network is adopted in the preprocessing. As a result, more effective information between networks can be mined and the sparsity problem of the association matrix can be addressed. Finally, the matrix completion method predicts the associations. The results show that NFMCLDA can provide more accurate lncRNA-disease associations than state-of-the-art methods. Under 5-fold and 10-fold cross-validation, the areas under the receiver operating characteristic curves are 0.9648 and 0.9713, respectively. Data from published case studies on four diseases (lung cancer, osteosarcoma, cervical cancer, and colon cancer) have confirmed the reliable predictive potential of the NFMCLDA model.
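The area under the receiver operating characteristic curve reported in this abstract equals the probability that a randomly chosen positive pair is scored above a randomly chosen negative pair, with ties counted as one half. The sketch below computes it directly from that pairwise definition. The labels and scores are toy values chosen for illustration and are unrelated to the paper's results.

```python
# Minimal sketch of ROC AUC from its pairwise-ranking definition
# (toy data; not the NFMCLDA evaluation code).

def roc_auc(labels, scores):
    """Fraction of (positive, negative) pairs ranked correctly; ties count 0.5."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Toy predictions for four candidate lncRNA-disease pairs (hypothetical).
labels = [1, 1, 0, 0]
scores = [0.9, 0.4, 0.5, 0.1]
auc = roc_auc(labels, scores)  # 3 of 4 positive-negative pairs ordered correctly
```

The pairwise form is quadratic in the number of samples, so sorting-based implementations are preferable for large candidate sets.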
Affiliation(s)
- Yibing Ma
  - School of Science, Jiangnan University, Wuxi, Jiangsu, 214122, China
- Yongle Shi
  - School of Science, Jiangnan University, Wuxi, Jiangsu, 214122, China
- Xiang Chen
  - School of Science, Jiangnan University, Wuxi, Jiangsu, 214122, China
- Bai Zhang
  - School of Science, Jiangnan University, Wuxi, Jiangsu, 214122, China
- Hanwen Wu
  - School of Science, Jiangnan University, Wuxi, Jiangsu, 214122, China
- Jie Gao
  - School of Science, Jiangnan University, Wuxi, Jiangsu, 214122, China
11. Sheng N, Xie X, Wang Y, Huang L, Zhang S, Gao L, Wang H. A Survey of Deep Learning for Detecting miRNA-Disease Associations: Databases, Computational Methods, Challenges, and Future Directions. IEEE/ACM Trans Comput Biol Bioinform 2024; 21:328-347. [PMID: 38194377 DOI: 10.1109/tcbb.2024.3351752] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/11/2024]
Abstract
MicroRNAs (miRNAs) are an important class of non-coding RNAs that play an essential role in the occurrence and development of various diseases. Identifying the potential miRNA-disease associations (MDAs) can be beneficial in understanding disease pathogenesis. Traditional laboratory experiments are expensive and time-consuming. Computational models have enabled systematic large-scale prediction of potential MDAs, greatly improving the research efficiency. With recent advances in deep learning, it has become an attractive and powerful technique for uncovering novel MDAs. Consequently, numerous MDA prediction methods based on deep learning have emerged. In this review, we first summarize publicly available databases related to miRNAs and diseases for MDA prediction. Next, we outline commonly used miRNA and disease similarity calculation and integration methods. Then, we comprehensively review the 48 existing deep learning-based MDA computation methods, categorizing them into classical deep learning and graph neural network-based techniques. Subsequently, we investigate the evaluation methods and metrics that are frequently used to assess MDA prediction performance. Finally, we discuss the performance trends of different computational methods, point out some problems in current research, and propose 9 potential future research directions. Data resources and recent advances in MDA prediction methods are summarized in the GitHub repository https://github.com/sheng-n/DL-miRNA-disease-association-methods.
|
12
|
Xuan P, Lu S, Cui H, Wang S, Nakaguchi T, Zhang T. Learning Association Characteristics by Dynamic Hypergraph and Gated Convolution Enhanced Pairwise Attributes for Prediction of Disease-Related lncRNAs. J Chem Inf Model 2024; 64:3569-3578. [PMID: 38523267 DOI: 10.1021/acs.jcim.4c00245] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 03/26/2024]
Abstract
As long non-coding RNAs (lncRNAs) play important roles in the occurrence and development of various human diseases, identifying disease-related lncRNAs can contribute to clarifying the pathogenesis of diseases. Most of the recent lncRNA-disease association prediction methods utilized multi-source data about lncRNAs and diseases. A single lncRNA may participate in multiple disease processes, and multiple lncRNAs are usually involved in the same disease process synergistically. However, previous methods did not fully exploit these biological characteristics to construct informative prediction models. We construct a prediction model based on adaptive hypergraph and gated convolution for lncRNA-disease association prediction (AGLDA), to embed and encode the biological characteristics of lncRNA-disease associations, the topological features from the entire heterogeneous graph perspective, and the gated enhanced pairwise features. First, the strategy for constructing hyperedges is designed to reflect the biological characteristic that multiple lncRNAs are involved in multiple disease processes. Furthermore, each hyperedge has its own biological perspective, and multiple hyperedges are beneficial for revealing the diverse relationships among multiple lncRNAs and diseases. Second, we encode the biological features of each lncRNA (disease) node using a strategy based on dynamic hypergraph convolutional networks. The strategy may adaptively learn the features of the hyperedges and formulate the dynamically evolved hypergraph topological structure. Third, a group convolutional network is established to integrate the entire heterogeneous topological structure and multiple types of node attributes within an lncRNA-disease-miRNA graph. Finally, a gated convolutional strategy is proposed to enhance the informative features of the lncRNA-disease node pairs. The comparison experiments indicate that AGLDA outperforms seven advanced prediction methods. The ablation studies confirm the effectiveness of the major innovations, and the case studies validate AGLDA's ability to discover potential disease-related lncRNA candidates.
Affiliation(s)
- Ping Xuan
- School of Computer Science and Technology, Heilongjiang University, Harbin 150080, China
- Department of Computer Science, Shantou University, Shantou 515063, China
| | - Siyuan Lu
- School of Computer Science and Technology, Heilongjiang University, Harbin 150080, China
| | - Hui Cui
- Department of Computer Science and Information Technology, La Trobe University, Melbourne 3083, Australia
| | - Shuai Wang
- School of Information Science and Engineering, Yanshan University, Qinhuangdao 066004, China
| | - Toshiya Nakaguchi
- Center for Frontier Medical Engineering, Chiba University, Chiba 2638522, Japan
| | - Tiangang Zhang
- School of Computer Science and Technology, Heilongjiang University, Harbin 150080, China
- School of Mathematical Science, Heilongjiang University, Harbin 150080, China
|
13
|
Li G, Bai P, Liang C, Luo J. Node-adaptive graph Transformer with structural encoding for accurate and robust lncRNA-disease association prediction. BMC Genomics 2024; 25:73. [PMID: 38233788 PMCID: PMC10795365 DOI: 10.1186/s12864-024-09998-2] [Citation(s) in RCA: 12] [Impact Index Per Article: 12.0] [Received: 07/31/2023] [Accepted: 01/09/2024] [Indexed: 01/19/2024] Open
Abstract
BACKGROUND: Long noncoding RNAs (lncRNAs) are integral to a plethora of critical cellular biological processes, including the regulation of gene expression, cell differentiation, and the development of tumors and cancers. Predicting the relationships between lncRNAs and diseases can contribute to a better understanding of the pathogenic mechanisms of disease and provide strong support for the development of advanced treatment methods. RESULTS: We present an innovative Node-Adaptive Graph Transformer model for predicting unknown LncRNA-Disease Associations, named NAGTLDA. First, we utilize the node-adaptive feature smoothing (NAFS) method to learn the local feature information of nodes and encode the structural information of the fusion similarity network of diseases and lncRNAs using Structural Deep Network Embedding (SDNE). Next, the Transformer module is used to capture potential association information between the network nodes. Finally, we employ a Transformer module with two multi-headed attention layers for learning global-level embedding fusion. Network structure coding is added as the structural inductive bias of the network to compensate for the missing message-passing mechanism in the Transformer. In 5-fold cross-validation, NAGTLDA achieved an average AUC of 0.9531 and an average AUPR of 0.9537, significantly higher than state-of-the-art methods. We perform case studies on 4 diseases; 55 out of 60 associations between lncRNAs and diseases have been validated in the literature. The results demonstrate the enormous potential of the graph Transformer structure to incorporate graph structural information for uncovering unknown lncRNA-disease associations. CONCLUSIONS: Our proposed NAGTLDA model can serve as a highly efficient computational method for predicting biological information associations.
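The AUC and AUPR figures quoted in this abstract are averages over cross-validation folds. As a rough illustration of how such numbers are typically computed, the sketch below derives AUC from pairwise positive-versus-negative comparisons and approximates AUPR by average precision. It is not NAGTLDA's evaluation code; the function names and the toy scores are assumptions made up for this example.

```python
from statistics import mean


def roc_auc(labels, scores):
    """Area under the ROC curve as the fraction of (positive, negative)
    pairs ranked correctly; tied pairs count as half."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative sample")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def average_precision(labels, scores):
    """Average precision, a common step-wise estimate of the area under
    the precision-recall curve. Ties keep their input order."""
    ranked = sorted(zip(scores, labels), key=lambda pair: -pair[0])
    hits, total = 0, 0.0
    for rank, (_, y) in enumerate(ranked, start=1):
        if y == 1:
            hits += 1
            total += hits / rank  # precision at this recall step
    if hits == 0:
        raise ValueError("need at least one positive sample")
    return total / hits


def cross_validated_metrics(folds):
    """Average AUC and AUPR over folds; each fold is (labels, scores)."""
    aucs = [roc_auc(y, s) for y, s in folds]
    auprs = [average_precision(y, s) for y, s in folds]
    return mean(aucs), mean(auprs)


if __name__ == "__main__":
    # Hypothetical held-out predictions for two folds (1 = known association).
    toy_folds = [
        ([1, 0, 1, 0], [0.9, 0.8, 0.7, 0.1]),
        ([1, 1, 0, 0], [0.6, 0.4, 0.5, 0.2]),
    ]
    auc, aupr = cross_validated_metrics(toy_folds)
    print(f"mean AUC={auc:.4f}, mean AUPR={aupr:.4f}")
```

Because known associations are far rarer than unlabeled pairs in this kind of dataset, AUPR is usually the more demanding of the two metrics, which is why abstracts in this list tend to report both.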
Affiliation(s)
- Guanghui Li
- School of Information Engineering, East China Jiaotong University, Nanchang, China.
| | - Peihao Bai
- School of Information Engineering, East China Jiaotong University, Nanchang, China
| | - Cheng Liang
- School of Information Science and Engineering, Shandong Normal University, Jinan, China
| | - Jiawei Luo
- College of Computer Science and Electronic Engineering, Hunan University, Changsha, China.
|
14
|
Wang S, Hui C, Zhang T, Wu P, Nakaguchi T, Xuan P. Graph Reasoning Method Based on Affinity Identification and Representation Decoupling for Predicting lncRNA-Disease Associations. J Chem Inf Model 2023; 63:6947-6958. [PMID: 37906529 DOI: 10.1021/acs.jcim.3c01214] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Indexed: 11/02/2023]
Abstract
An increasing number of studies have shown that dysregulation of lncRNAs is related to the occurrence of various diseases. Most previous methods are designed on the homogeneity assumption that the representation of a target lncRNA (or disease) node should be updated by aggregating the attributes of its neighbor nodes. However, this assumption ignores the affinity nodes that are far from the target node. We present a novel prediction method, GAIRD, to fully leverage the heterogeneous information in the network and the decoupled node features. The first major innovation is a random walk strategy based on breadth-first search and depth-first search. Unlike previous methods that only focus on homogeneous information, our new strategy learns both the homogeneous information within local neighborhoods and the heterogeneous information within higher-order neighborhoods. The second innovation is a representation decoupling module to extract the purer attributes and the purer topologies. Third, a module based on group convolution and deep separable convolution is developed to promote the pairwise intrachannel and interchannel feature learning. The experimental results show that GAIRD outperforms the compared state-of-the-art methods, and the ablation studies confirm the contributions of the major innovations. We also performed case studies on 3 diseases to further demonstrate the effectiveness of the GAIRD model in applications.
Affiliation(s)
- Shuai Wang
- School of Information Science and Engineering, Yanshan University, Qinhuangdao 066004, China
| | - Cui Hui
- Department of Computer Science and Information Technology, La Trobe University, Melbourne 3083, Australia
| | - Tiangang Zhang
- School of Mathematical Science, Heilongjiang University, Harbin 150080, China
| | - Peiliang Wu
- School of Information Science and Engineering, Yanshan University, Qinhuangdao 066004, China
- Key Laboratory for Computer Virtual Technology and System Integration of Hebei Province, Qinhuangdao 066004, China
| | - Toshiya Nakaguchi
- Center for Frontier Medical Engineering, Chiba University, Chiba 2638522, Japan
| | - Ping Xuan
- Department of Computer Science, School of Engineering, Shantou University, Shantou 515063, China
|
15
|
Cao Y, Xiao J, Sheng N, Qu Y, Wang Z, Sun C, Mu X, Huang Z, Li X. X-LDA: An interpretable and knowledge-informed heterogeneous graph learning framework for LncRNA-disease association prediction. Comput Biol Med 2023; 167:107634. [PMID: 39491920 DOI: 10.1016/j.compbiomed.2023.107634] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 08/29/2023] [Revised: 10/06/2023] [Accepted: 10/23/2023] [Indexed: 11/05/2024]
Abstract
The identification of disease-related long noncoding RNAs (lncRNAs) helps unravel the intricacies of gene expression regulation and epigenetic signatures. Computational methods provide a cost-effective means to explore lncRNA-disease associations (LDAs). However, these methods often lack interpretability, leaving their predictions less convincing to biological and medical researchers. We propose X-LDA, an interpretable and knowledge-informed heterogeneous graph learning framework based on graph patch convolution and integrated gradients that predicts LDAs and provides intuitive explanations for its predictions. The heterogeneous graph is the foundation of LDA prediction, so we construct a knowledge-informed heterogeneous graph that includes LDAs drawn from biological experiments, lncRNA similarities rooted in gene sequences, and disease similarities constructed from disease categorizations. To integrate diverse biological premises and facilitate interpretability, we define nine distinct graph patch types, which encapsulate essential topological relationships within lncRNA-disease node pairs. X-LDA is designed to employ parameter sharing and multi-convolution kernels to grasp common and multiple perspectives of the graph patches, respectively. This approach culminates in the fusion of various semantic information into context embeddings. The post-hoc explanations hinge on graph patch features and integrated gradients, shedding light on the underlying factors driving predictions. Cross-validation experiments on a dataset curated from databases and the literature demonstrate the superior performance of X-LDA in comparison to nine state-of-the-art methods from three categories. X-LDA achieves a larger average area under the receiver operating characteristic curve, 0.9891 (by at least 6.68%), and a larger average area under the precision-recall curve, 0.7907 (by at least 23.2%), than competing methods. The results of our well-designed ablation and interpretability experiments and Kyoto Encyclopedia of Genes and Genomes (KEGG) enrichment analysis demonstrate X-LDA's robustness, learnability, predictability, and interpretability. The applicability of X-LDA is also demonstrated through a case study involving the investigation of associated lncRNAs in prostate cancer, colorectal cancer, and breast cancer.
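Integrated gradients, which this abstract names as the basis of its explanations, attributes a model's output to each input feature by averaging gradients along a straight path from a baseline input to the actual input. The sketch below is a generic numerical version, assuming a plain Python scoring function, a zero baseline, and finite-difference gradients; `toy_score` is a hypothetical stand-in and has nothing to do with X-LDA's network.

```python
def numerical_gradient(f, x, eps=1e-5):
    """Central-difference estimate of the gradient of f at point x."""
    grad = []
    for i in range(len(x)):
        up = list(x)
        down = list(x)
        up[i] += eps
        down[i] -= eps
        grad.append((f(up) - f(down)) / (2 * eps))
    return grad


def integrated_gradients(f, x, baseline, steps=50):
    """Midpoint Riemann-sum approximation of integrated gradients along
    the straight line from `baseline` to `x`."""
    n = len(x)
    avg_grad = [0.0] * n
    for k in range(steps):
        alpha = (k + 0.5) / steps
        point = [b + alpha * (xi - b) for xi, b in zip(x, baseline)]
        g = numerical_gradient(f, point)
        for i in range(n):
            avg_grad[i] += g[i] / steps
    # Scale the path-averaged gradient by the input-minus-baseline difference.
    return [(xi - b) * g for xi, b, g in zip(x, baseline, avg_grad)]


def toy_score(v):
    """Hypothetical association score with one interaction term."""
    return v[0] * v[1] + v[0]


if __name__ == "__main__":
    x, baseline = [1.0, 2.0], [0.0, 0.0]
    attributions = integrated_gradients(toy_score, x, baseline)
    # Completeness: attributions should sum to f(x) - f(baseline).
    print(attributions, sum(attributions), toy_score(x) - toy_score(baseline))
```

The completeness check in the last line is the property that makes the attributions interpretable as a decomposition of the prediction: whatever the score gained over the baseline is split among the input features.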
Affiliation(s)
- Yangkun Cao
- School of Artificial Intelligence, Jilin University, Changchun, 130012, China
| | - Jun Xiao
- College of Computer Science and Technology, Jilin University, Changchun, 130012, China
| | - Nan Sheng
- College of Computer Science and Technology, Jilin University, Changchun, 130012, China
| | - Yinwei Qu
- College of Computer Science and Technology, Jilin University, Changchun, 130012, China
| | - Zhihang Wang
- College of Computer Science and Technology, Jilin University, Changchun, 130012, China
| | - Chang Sun
- College of Computer Science, Nankai University, Tianjin, 300071, China
| | - Xuechen Mu
- School of Mathematics, Jilin University, Changchun, 130012, China
| | - Zhenyu Huang
- College of Computer Science and Technology, Jilin University, Changchun, 130012, China.
| | - Xuan Li
- College of Computer Science and Technology, Jilin University, Changchun, 130012, China.
|
16
|
Sheng N, Wang Y, Huang L, Gao L, Cao Y, Xie X, Fu Y. Multi-task prediction-based graph contrastive learning for inferring the relationship among lncRNAs, miRNAs and diseases. Brief Bioinform 2023; 24:bbad276. [PMID: 37529914 DOI: 10.1093/bib/bbad276] [Citation(s) in RCA: 20] [Impact Index Per Article: 10.0] [Received: 04/04/2023] [Revised: 07/09/2023] [Accepted: 07/11/2023] [Indexed: 08/03/2023] Open
Abstract
MOTIVATION: Identifying the relationships among long non-coding RNAs (lncRNAs), microRNAs (miRNAs) and diseases is highly valuable for diagnosing, preventing, treating and prognosing diseases. The development of effective computational prediction methods can reduce experimental costs. While numerous methods have been proposed, they often treat the prediction of lncRNA-disease associations (LDAs), miRNA-disease associations (MDAs) and lncRNA-miRNA interactions (LMIs) as separate tasks. Models capable of predicting all three relationships simultaneously remain relatively scarce. Our aim is to perform multi-task predictions, which not only construct a unified framework, but also facilitate mutual complementarity of information among lncRNAs, miRNAs and diseases. RESULTS: In this work, we propose a novel unsupervised embedding method called graph contrastive learning for multi-task prediction (GCLMTP). Our approach aims to predict LDAs, MDAs and LMIs by simultaneously extracting embedding representations of lncRNAs, miRNAs and diseases. To achieve this, we first construct a triple-layer lncRNA-miRNA-disease heterogeneous graph (LMDHG) that integrates the complex relationships between these entities based on their similarities and correlations. Next, we employ an unsupervised embedding model based on graph contrastive learning to extract potential topological features of lncRNAs, miRNAs and diseases from the LMDHG. The graph contrastive learning leverages graph convolutional network architectures to maximize the mutual information between patch representations and corresponding high-level summaries of the LMDHG. Subsequently, for the three prediction tasks, multiple classifiers are explored to predict LDA, MDA and LMI scores. Comprehensive experiments are conducted on two datasets (from older and newer versions of the database, respectively). The results show that GCLMTP outperforms other state-of-the-art methods for the disease-related lncRNA and miRNA prediction tasks. Additionally, case studies on two datasets further demonstrate the ability of GCLMTP to accurately discover new associations. To ensure reproducibility of this work, we have made the datasets and source code publicly available at https://github.com/sheng-n/GCLMTP.
Affiliation(s)
- Nan Sheng
- Key laboratory of Symbol Computation and Knowledge Engineering of Ministry of Education, College of Computer Science and Technology, Jilin University, 130012 Changchun, China
| | - Yan Wang
- Key laboratory of Symbol Computation and Knowledge Engineering of Ministry of Education, College of Computer Science and Technology, Jilin University, 130012 Changchun, China
- School of Artificial Intelligence, Jilin University, 130012 Changchun, China
| | - Lan Huang
- Key laboratory of Symbol Computation and Knowledge Engineering of Ministry of Education, College of Computer Science and Technology, Jilin University, 130012 Changchun, China
| | - Ling Gao
- Key laboratory of Symbol Computation and Knowledge Engineering of Ministry of Education, College of Computer Science and Technology, Jilin University, 130012 Changchun, China
| | - Yangkun Cao
- School of Artificial Intelligence, Jilin University, 130012 Changchun, China
| | - Xuping Xie
- Key laboratory of Symbol Computation and Knowledge Engineering of Ministry of Education, College of Computer Science and Technology, Jilin University, 130012 Changchun, China
| | - Yuan Fu
- Institute of Biological, Environmental and Rural Sciences, Aberystwyth University, Aberystwyth, Ceredigion, UK
|
17
|
Sheng N, Huang L, Gao L, Cao Y, Xie X, Wang Y. A Survey of Computational Methods and Databases for lncRNA-MiRNA Interaction Prediction. IEEE/ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS 2023; 20:2810-2826. [PMID: 37030713 DOI: 10.1109/tcbb.2023.3264254] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Indexed: 06/19/2023]
Abstract
Long non-coding RNAs (lncRNAs) and microRNAs (miRNAs) are two prevalent non-coding RNAs in current research. They play critical regulatory roles in the life processes of animals and plants. Studies have shown that lncRNAs can interact with miRNAs to participate in post-transcriptional regulatory processes, mainly involved in regulating cancer development, metastatic progression, and drug resistance. Additionally, these interactions have significant effects on plant growth, development, and responses to biotic and abiotic stresses. Deciphering the potential relationships between lncRNAs and miRNAs may provide new insights into the biological functions of lncRNAs and miRNAs and the pathogenesis of complex diseases. However, gathering information on lncRNA-miRNA interactions (LMIs) through biological experiments is expensive and time-consuming. With the accumulation of multi-omics data, computational models are extremely attractive for systematically exploring potential LMIs. To the best of our knowledge, this is the first comprehensive review of computational methods for identifying LMIs. Specifically, we first summarized the available public databases for predicting animal and plant LMIs. Second, we comprehensively reviewed the computational methods for predicting LMIs and classified them into two categories: network-based methods and sequence-based methods. Third, we analyzed the standard evaluation methods and metrics used in LMI prediction. Finally, we pointed out some problems in current research and discussed future research directions. Relevant databases and the latest advances in LMI prediction are summarized in a GitHub repository, https://github.com/sheng-n/lncRNA-miRNA-interaction-methods, which we will keep updated.
|
18
|
Xuan P, Bai H, Cui H, Zhang X, Nakaguchi T, Zhang T. Specific topology and topological connection sensitivity enhanced graph learning for lncRNA-disease association prediction. Comput Biol Med 2023; 164:107265. [PMID: 37531860 DOI: 10.1016/j.compbiomed.2023.107265] [Citation(s) in RCA: 5] [Impact Index Per Article: 2.5] [Received: 04/12/2023] [Revised: 06/26/2023] [Accepted: 07/16/2023] [Indexed: 08/04/2023]
Abstract
Predicting disease-related candidate long noncoding RNAs (lncRNAs) is beneficial for exploring disease pathogenesis due to the close relations between lncRNAs and the occurrence and development of human diseases. It is a long-term and challenging task to adequately extract specific and local topologies in the individual lncRNA network and the individual disease network, and to integrate the information of the connection relationships. We propose a new graph learning-based prediction method, NCPred, to encode specific and local topologies from each individual network, neighbor topologies with different connection relationships, and pairwise attributes. First, we construct a lncRNA network composed of all the lncRNA nodes and their similarities, and a single disease network that contains all the disease nodes and disease similarities. Then, a network-aware graph convolutional autoencoder is constructed to encode the specific and local topologies of each network. Second, a heterogeneous network is established to embed all lncRNA, disease, and miRNA nodes and their various connections. Afterwards, a connection-sensitive graph neural network is designed to deeply integrate the neighbor node attributes and connection characteristics in the heterogeneous network and learn neighbor topological representations. We also construct both connection-level and topology representation-level attention mechanisms to extract informative connections and topological representations. Finally, we build a multi-layer convolutional neural network with weighted residuals to adaptively complement the detailed features to pairwise attribute encoding. Comprehensive experiments and comparison results demonstrated that NCPred outperforms seven state-of-the-art prediction methods. The ablation studies demonstrated the importance of local topology learning, neighbor topology learning, and pairwise attribute encoding. Case studies on prostate, lung, and breast cancers further revealed NCPred's capacity to screen potential candidate disease-related lncRNAs.
Affiliation(s)
- Ping Xuan
- Department of Computer Science, School of Engineering, Shantou University, Shantou, China
| | - Honglei Bai
- School of Computer Science and Technology, Heilongjiang University, Harbin, China
| | - Hui Cui
- Department of Computer Science and Information Technology, La Trobe University, Melbourne, Australia
| | - Xiaowen Zhang
- School of Computer Science and Technology, Heilongjiang University, Harbin, China
| | - Toshiya Nakaguchi
- Center for Frontier Medical Engineering, Chiba University, Chiba, Japan
| | - Tiangang Zhang
- School of Computer Science and Technology, Heilongjiang University, Harbin, China; School of Mathematical Science, Heilongjiang University, Harbin, China.
|
19
|
Choi SR, Lee M. Transformer Architecture and Attention Mechanisms in Genome Data Analysis: A Comprehensive Review. BIOLOGY 2023; 12:1033. [PMID: 37508462 PMCID: PMC10376273 DOI: 10.3390/biology12071033] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Received: 06/20/2023] [Revised: 07/18/2023] [Accepted: 07/21/2023] [Indexed: 07/30/2023]
Abstract
The emergence and rapid development of deep learning, specifically transformer-based architectures and attention mechanisms, have had transformative implications across several domains, including bioinformatics and genome data analysis. The analogous nature of genome sequences to language texts has enabled the application of techniques that have exhibited success in fields ranging from natural language processing to genomic data. This review provides a comprehensive analysis of the most recent advancements in the application of transformer architectures and attention mechanisms to genome and transcriptome data. The focus of this review is on the critical evaluation of these techniques, discussing their advantages and limitations in the context of genome data analysis. With the swift pace of development in deep learning methodologies, it becomes vital to continually assess and reflect on the current standing and future direction of the research. Therefore, this review aims to serve as a timely resource for both seasoned researchers and newcomers, offering a panoramic view of the recent advancements and elucidating the state-of-the-art applications in the field. Furthermore, this review paper serves to highlight potential areas of future investigation by critically evaluating studies from 2019 to 2023, thereby acting as a stepping-stone for further research endeavors.
Affiliation(s)
| | - Minhyeok Lee
- School of Electrical and Electronics Engineering, Chung-Ang University, Seoul 06974, Republic of Korea;
|
20
|
Kim Y, Lee M. Deep Learning Approaches for lncRNA-Mediated Mechanisms: A Comprehensive Review of Recent Developments. Int J Mol Sci 2023; 24:10299. [PMID: 37373445 DOI: 10.3390/ijms241210299] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/24/2023] [Revised: 06/16/2023] [Accepted: 06/17/2023] [Indexed: 06/29/2023] Open
Abstract
This review paper provides an extensive analysis of the rapidly evolving convergence of deep learning and long non-coding RNAs (lncRNAs). Considering the recent advancements in deep learning and the increasing recognition of lncRNAs as crucial components in various biological processes, this review aims to offer a comprehensive examination of these intertwined research areas. The remarkable progress in deep learning necessitates thoroughly exploring its latest applications in the study of lncRNAs. Therefore, this review provides insights into the growing significance of incorporating deep learning methodologies to unravel the intricate roles of lncRNAs. By scrutinizing the most recent research spanning from 2021 to 2023, this paper provides a comprehensive understanding of how deep learning techniques are employed in investigating lncRNAs, thereby contributing valuable insights to this rapidly evolving field. The review is aimed at researchers and practitioners looking to integrate deep learning advancements into their lncRNA studies.
Affiliation(s)
- Yoojoong Kim
- School of Computer Science and Information Engineering, The Catholic University of Korea, Bucheon 14662, Republic of Korea
| | - Minhyeok Lee
- School of Electrical and Electronics Engineering, Chung-Ang University, Seoul 06974, Republic of Korea
|
21
|
Xuan P, Zhao Y, Cui H, Zhan L, Jin Q, Zhang T, Nakaguchi T. Semantic Meta-Path Enhanced Global and Local Topology Learning for lncRNA-Disease Association Prediction. IEEE/ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS 2023; 20:1480-1491. [PMID: 36173783 DOI: 10.1109/tcbb.2022.3209571] [Citation(s) in RCA: 8] [Impact Index Per Article: 4.0] [Indexed: 05/04/2023]
Abstract
Since abnormal expression of long non-coding RNAs (lncRNAs) is associated with various human diseases, identifying disease-related lncRNAs helps reveal the pathogenesis of diseases. Existing methods for lncRNA-disease association prediction mainly focus on multi-sourced data related to lncRNAs and diseases. The rich semantic information of meta-paths, composed of multiple kinds of connections between lncRNA and disease nodes, is neglected. We propose a new prediction method, MGLDA, to encode and integrate the semantics of multiple meta-paths, the global topology of heterogeneous graph, and pairwise attributes of lncRNA and disease nodes. First, a tri-layer heterogeneous graph is constructed to associate multi-sourced data across the lncRNA, disease, and miRNA nodes. Afterwards, we establish multiple meta-paths connecting the lncRNA and disease nodes to derive and denote various semantics. Each meta-path contains its specific semantics formulated by an embedding strategy, and each embedding covers local topology formed by the diverse semantic connections among the lncRNA, disease, and miRNA nodes. We construct multiple graph convolutional autoencoders (GCA) with topology-level attention to learn global and multiple local topologies from the tri-layer graph and each meta-path, respectively. The topology-level attention mechanism can learn the importance of various global and local topologies for adaptive pairwise topology fusion. Finally, a convolutional autoencoder learns the attribute representations of lncRNA-disease pairs, which integrates the learnt detailed and representative pairwise features. Experimental results show that MGLDA outperforms other state-of-the-art prediction methods in comparison and retrieves more real lncRNA-disease associations in the top-ranked candidates. The ablation study also demonstrates the important contributions of the local and global topology learning, and pairwise attribute learning. Case studies on three diseases further demonstrate MGLDA's ability to identify potential disease-related lncRNAs.
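To make the notion of a meta-path concrete: in a heterogeneous graph, the number of lncRNA-miRNA-disease path instances between a given lncRNA and disease is the corresponding entry of the product of the two adjacency matrices. The sketch below only illustrates that general idea; the abstract does not specify MGLDA's embedding strategy, and the matrices and the `metapath_counts` helper are hypothetical.

```python
def matmul(a, b):
    """Multiply two dense matrices given as lists of rows."""
    if len(a[0]) != len(b):
        raise ValueError("inner dimensions do not match")
    return [
        [sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]


def metapath_counts(*adjacencies):
    """Count path instances along a meta-path by chaining the adjacency
    matrices of its consecutive node types."""
    result = adjacencies[0]
    for adj in adjacencies[1:]:
        result = matmul(result, adj)
    return result


# Toy data: 2 lncRNAs, 3 miRNAs, 2 diseases (1 = known link).
lnc_mir = [
    [1, 1, 0],
    [0, 1, 0],
]
mir_dis = [
    [1, 0],
    [1, 1],
    [0, 1],
]

if __name__ == "__main__":
    # Entry [i][j] is the number of miRNAs linking lncRNA i to disease j.
    print(metapath_counts(lnc_mir, mir_dis))
```

Longer meta-paths, for example lncRNA-lncRNA-miRNA-disease, follow the same pattern by passing one more adjacency matrix to the helper.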
|
22
|
Recent advances in predicting lncRNA-disease associations based on computational methods. Drug Discov Today 2023; 28:103432. [PMID: 36370992 DOI: 10.1016/j.drudis.2022.103432] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Received: 08/30/2022] [Revised: 10/19/2022] [Accepted: 11/03/2022] [Indexed: 11/11/2022]
Abstract
Mutations in and dysregulation of long non-coding RNAs (lncRNAs) are closely associated with the development of various human complex diseases, but only a few lncRNAs have been experimentally confirmed to be associated with human diseases. Predicting new potential lncRNA-disease associations (LDAs) will help us to understand the pathogenesis of human diseases and to detect disease markers, as well as in disease diagnosis, prevention and treatment. Computational methods can effectively narrow down the screening scope of biological experiments, thereby reducing the duration and cost of such experiments. In this review, we outline recent advances in computational methods for predicting LDAs, focusing on LDA databases, lncRNA/disease similarity calculations, and advanced computational models. In addition, we analyze the limitations of various computational models and discuss future challenges and directions for development.
|
23
|
Sheng N, Huang L, Lu Y, Wang H, Yang L, Gao L, Xie X, Fu Y, Wang Y. Data resources and computational methods for lncRNA-disease association prediction. Comput Biol Med 2023; 153:106527. [PMID: 36610216 DOI: 10.1016/j.compbiomed.2022.106527] [Citation(s) in RCA: 14] [Impact Index Per Article: 7.0] [Received: 09/20/2022] [Revised: 12/08/2022] [Accepted: 12/31/2022] [Indexed: 01/03/2023]
Abstract
Interest in deciphering potential disease pathogenesis through lncRNA-disease association (LDA) prediction is growing, given the diverse functional roles of lncRNAs in genome regulation. Meanwhile, computational models and algorithms benefit systematic biology research and can even facilitate classical biological experimental procedures. In this review, we introduce representative diseases associated with lncRNAs, such as cancers, cardiovascular diseases, and neurological diseases. Current publicly available resources related to lncRNAs and diseases are also included. Furthermore, 64 computational methods for LDA prediction are divided into 5 groups: machine learning-based methods, network propagation-based methods, matrix factorization- and completion-based methods, deep learning-based methods, and graph neural network-based methods. The common evaluation methods and metrics in LDA prediction are also discussed, followed by the challenges and future trends in LDA prediction. Recent advances in LDA prediction approaches are summarized in the GitHub repository at https://github.com/sheng-n/lncRNA-disease-methods.
Affiliation(s)
- Nan Sheng
- Key Laboratory of Symbol Computation and Knowledge Engineering of Ministry of Education, College of Computer Science and Technology, Jilin University, Changchun, China
| | - Lan Huang
- Key Laboratory of Symbol Computation and Knowledge Engineering of Ministry of Education, College of Computer Science and Technology, Jilin University, Changchun, China.
| | - Yuting Lu
- School of Artificial Intelligence, Jilin University, Changchun, China
| | - Hao Wang
- Department of Hepatopancreatobiliary Surgery, Second Affiliated Hospital of Harbin Medical University, Harbin, China
| | - Lili Yang
- Key Laboratory of Symbol Computation and Knowledge Engineering of Ministry of Education, College of Computer Science and Technology, Jilin University, Changchun, China; Department of Obstetrics, The First Hospital of Jilin University, Changchun, China
| | - Ling Gao
- Key Laboratory of Symbol Computation and Knowledge Engineering of Ministry of Education, College of Computer Science and Technology, Jilin University, Changchun, China
| | - Xuping Xie
- Key Laboratory of Symbol Computation and Knowledge Engineering of Ministry of Education, College of Computer Science and Technology, Jilin University, Changchun, China
| | - Yuan Fu
- Institute of Biological, Environmental and Rural Sciences, Aberystwyth University, Aberystwyth, Ceredigion, United Kingdom
| | - Yan Wang
- Key Laboratory of Symbol Computation and Knowledge Engineering of Ministry of Education, College of Computer Science and Technology, Jilin University, Changchun, China; School of Artificial Intelligence, Jilin University, Changchun, China.
|
24
|
Yu X, Zhou S, Zou H, Wang Q, Liu C, Zang M, Liu T. Survey of deep learning techniques for disease prediction based on omics data. Human Gene 2023; 35:201140. [DOI: 10.1016/j.humgen.2022.201140] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Indexed: 01/05/2025]
|
25
|
Li J, Wang D, Yang Z, Liu M. HEGANLDA: A Computational Model for Predicting Potential lncRNA-Disease Associations Based on Multiple Heterogeneous Networks. IEEE/ACM Transactions on Computational Biology and Bioinformatics 2023; 20:388-398. [PMID: 34932483 DOI: 10.1109/tcbb.2021.3136886] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Indexed: 05/04/2023]
Abstract
Long non-coding RNAs (lncRNAs) play vital regulatory roles in many complex human diseases; however, the number of validated lncRNA-disease associations remains notably small. Predicting potential lncRNA-disease associations precisely through computational methods remains challenging. In this study, we proposed a novel method, LDVCHN (LncRNA-Disease Vector Calculation Heterogeneous Networks), and developed the corresponding model, HEGANLDA (Heterogeneous Embedding Generative Adversarial Networks LncRNA-Disease Association), for predicting potential lncRNA-disease associations. In HEGANLDA, the graph embedding algorithm HeGAN was introduced to map all nodes in the lncRNA-miRNA-disease heterogeneous network into low-dimensional vectors, which served as the inputs of LDVCHN. HEGANLDA adopted the XGBoost (eXtreme Gradient Boosting) classifier, trained on the low-dimensional vectors, to predict potential lncRNA-disease associations. Under 10-fold cross-validation, the model achieved an area under the ROC curve of 0.983. According to the experimental results, HEGANLDA outperformed each of five current state-of-the-art methods. To further evaluate HEGANLDA, both case studies and robustness tests were performed, and the results confirmed its effectiveness and robustness. The source code and data of HEGANLDA are available at https://github.com/HEGANLDA/HEGANLDA.
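The area under the ROC curve reported above can be computed directly from labels and prediction scores. The following is a minimal sketch of that metric only, not the HEGANLDA evaluation code; the function name `roc_auc` and the toy labels and scores are assumptions made for illustration. It uses the rank interpretation of AUC: the probability that a randomly chosen positive pair is scored above a randomly chosen negative pair, with ties counted as one half.

```python
def roc_auc(labels, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative sample")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical predictions for six lncRNA-disease pairs (1 = known association).
labels = [1, 1, 0, 0, 1, 0]
scores = [0.9, 0.8, 0.7, 0.3, 0.6, 0.2]
auc = roc_auc(labels, scores)
print(round(auc, 3))
```

In a k-fold setting, the same function would be applied to the held-out pairs of each fold and the fold values averaged.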
|
26
|
Tan J, Li X, Zhang L, Du Z. Recent advances in machine learning methods for predicting LncRNA and disease associations. Front Cell Infect Microbiol 2022; 12:1071972. [PMID: 36530425 PMCID: PMC9748103 DOI: 10.3389/fcimb.2022.1071972] [Citation(s) in RCA: 7] [Impact Index Per Article: 2.3] [Received: 10/17/2022] [Accepted: 11/11/2022] [Indexed: 12/03/2022] Open
Abstract
Long non-coding RNAs (lncRNAs) are involved in almost the entire cell life cycle through different mechanisms and play an important role in many key biological processes. Mutations and dysregulation of lncRNAs have been implicated in many complex human diseases. Therefore, identifying the relationship between lncRNAs and diseases not only contributes to biologists' understanding of disease mechanisms, but also provides new ideas and solutions for disease diagnosis, treatment, prognosis and prevention. Since the existing experimental methods for predicting lncRNA-disease associations (LDAs) are expensive and time consuming, machine learning methods for predicting lncRNA-disease associations have become increasingly popular among researchers. In this review, we summarize some of the human diseases studied by LDAs prediction models, association and similarity features of LDAs prediction, performance evaluation methods of models and some advanced machine learning prediction models of LDAs. Finally, we discuss the potential limitations of machine learning-based methods for LDAs prediction and provide some ideas for designing new prediction models.
|
27
|
Zhou Y, Wang X, Yao L, Zhu M. LDAformer: predicting lncRNA-disease associations based on topological feature extraction and Transformer encoder. Brief Bioinform 2022; 23:6696138. [PMID: 36094081 DOI: 10.1093/bib/bbac370] [Citation(s) in RCA: 23] [Impact Index Per Article: 7.7] [Received: 05/25/2022] [Revised: 07/27/2022] [Accepted: 08/06/2022] [Indexed: 12/14/2022] Open
Abstract
The identification of long noncoding RNA (lncRNA)-disease associations is of great value for disease diagnosis and treatment, and computational methods are now commonly used to predict potential lncRNA-disease associations. However, existing methods do not sufficiently extract key features during data processing, and their learning models are either not powerful enough or overly complex. Therefore, there is still potential to achieve better predictive performance by improving these two aspects. In this work, we propose a novel lncRNA-disease association prediction method, LDAformer, based on topological feature extraction and a Transformer encoder. We construct the heterogeneous network by integrating the associations between lncRNAs, diseases and micro RNAs (miRNAs). Intra-class similarities and inter-class associations are presented as the lncRNA-disease-miRNA weighted adjacency matrix to unify semantics. Next, we design a topological feature extraction process to further obtain multi-hop topological pathway features latent in the adjacency matrix. Finally, to capture the interdependencies between heterogeneous pathways, a Transformer encoder based on the global self-attention mechanism is employed to predict lncRNA-disease associations. The efficient feature extraction and the intuitive and powerful learning model lead to ideal performance. The results of computational experiments on two datasets show that our method outperforms the state-of-the-art baseline methods. Additionally, case studies further indicate its capability to discover new associations accurately.
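The idea of multi-hop topological pathway features can be illustrated with powers of a weighted adjacency matrix. This is a rough sketch and not the LDAformer implementation; the helper names `matmul` and `multi_hop_features`, the three-node toy network, and the choice of three hops are assumptions. For a node pair (i, j), the k-th feature is entry (i, j) of the k-th matrix power, which sums the weights of all k-hop paths between the two nodes.

```python
def matmul(a, b):
    """Multiply two square matrices given as lists of lists."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def multi_hop_features(adj, i, j, max_hops=3):
    """Return [A[i][j], (A^2)[i][j], ..., (A^max_hops)[i][j]]."""
    features = []
    power = adj
    for _ in range(max_hops):
        features.append(power[i][j])
        power = matmul(power, adj)
    return features


# Toy heterogeneous network: node 0 = lncRNA, node 1 = miRNA, node 2 = disease.
# The lncRNA and the disease are connected only through the miRNA.
adj = [
    [0.0, 1.0, 0.0],
    [1.0, 0.0, 1.0],
    [0.0, 1.0, 0.0],
]
features = multi_hop_features(adj, 0, 2)
print(features)
```

Here the lncRNA-disease pair has no direct (1-hop) link but a single 2-hop path through the miRNA, so only the second feature is non-zero. A sequence model could then consume such per-hop features for each candidate pair.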
Affiliation(s)
- Yi Zhou
- College of Computer Science, Sichuan University, 1st Ring Road South 1 Section, 610065, Chengdu, China
| | - Xinyi Wang
- College of Computer Science, Sichuan University, 1st Ring Road South 1 Section, 610065, Chengdu, China
| | - Lin Yao
- College of Computer Science, Sichuan University, 1st Ring Road South 1 Section, 610065, Chengdu, China
| | - Min Zhu
- College of Computer Science, Sichuan University, 1st Ring Road South 1 Section, 610065, Chengdu, China
|
28
|
Xie F, Yang Z, Song J, Dai Q, Duan X. DHNLDA: A Novel Deep Hierarchical Network Based Method for Predicting lncRNA-Disease Associations. IEEE/ACM Transactions on Computational Biology and Bioinformatics 2022; 19:3395-3403. [PMID: 34543201 DOI: 10.1109/tcbb.2021.3113326] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/13/2023]
Abstract
Recent studies have found that long non-coding RNAs (lncRNAs), a class of non-coding RNAs (ncRNAs), are not only involved in many biological processes but also abnormally expressed in many complex diseases. Accurate identification of lncRNA-disease associations is of great significance for understanding the function of lncRNAs and disease mechanisms. In this paper, a deep learning framework named DHNLDA, consisting of a stacked autoencoder (SAE), a multi-scale ResNet and a stacked ensemble module, was constructed to predict lncRNA-disease associations; it integrates multiple biological data sources and constructs feature matrices. The integrated biological data include the similarities and interactions of lncRNAs, diseases and miRNAs. The feature matrices are obtained by node2vec embedding and feature extraction, respectively. Then, the SAE and the multi-scale ResNet are used to learn the complementary information between nodes and to obtain high-level features of node attributes. Finally, the fused high-level features are input into the stacked ensemble module to obtain the predicted lncRNA-disease associations. Five-fold cross-validation experiments show that DHNLDA reaches an AUC of 0.975, better than existing methods. Case studies of stomach cancer, breast cancer and lung cancer show the ability of DHNLDA to discover potential lncRNA-disease associations.
|
29
|
Shi H, Zhang X, Tang L, Liu L. Heterogeneous graph neural network for lncRNA-disease association prediction. Sci Rep 2022; 12:17519. [PMID: 36266433 PMCID: PMC9585029 DOI: 10.1038/s41598-022-22447-y] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.7] [Received: 05/20/2022] [Accepted: 10/14/2022] [Indexed: 01/12/2023] Open
Abstract
Identifying lncRNA-disease associations is conducive to the diagnosis, treatment and prevention of diseases. Because verification by biological experiments is expensive and time-consuming, prediction methods based on computational models have gradually become an important means of discovering lncRNA-disease associations. However, existing methods still struggle to make full use of network topology information to identify potential associations between lncRNAs and diseases in multi-source data. In this study, we propose a novel method called HGNNLDA for lncRNA-disease association prediction. First, HGNNLDA constructs a heterogeneous network composed of a lncRNA similarity network, a lncRNA-disease association network and a lncRNA-miRNA association network. Then, on this heterogeneous network, a fixed number of strongly correlated neighbors of each type are sampled for each node by random walk with restart. Next, the embeddings of the lncRNA and the disease in each lncRNA-disease association pair are obtained through a heterogeneous graph neural network by type-based neighbor aggregation followed by combination across all types; an attention mechanism is introduced because different types of neighbors contribute differently to the prediction of lncRNA-disease associations. The area under the receiver operating characteristic curve (AUC) and the area under the precision-recall curve (AUPR) under fivefold cross-validation (5FCV) are 0.9786 and 0.8891, respectively. Compared with five state-of-the-art prediction models, HGNNLDA has better prediction performance. In addition, two types of case studies further verify that our method can effectively predict potential lncRNA-disease associations and can make predictions for new diseases without any known associated lncRNAs.
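The neighbor sampling step can be sketched as follows. This is an illustrative approximation under stated assumptions, not the HGNNLDA code: the function name `sample_neighbors`, the restart probability, the walk length, the fixed seed, and the four-node toy graph are all hypothetical. The walk repeatedly either jumps back to the start node or moves to a random neighbor, and the most frequently visited nodes of each type are kept as that node's typed neighbors.

```python
import random
from collections import Counter


def sample_neighbors(graph, node_type, start, walk_length=200,
                     restart_prob=0.5, top_k=2, seed=0):
    """Random walk with restart from `start`.

    Returns a dict mapping each node type to at most `top_k` of the most
    frequently visited nodes of that type (the start node is excluded).
    """
    rng = random.Random(seed)
    visits = Counter()
    current = start
    for _ in range(walk_length):
        if rng.random() < restart_prob or not graph[current]:
            current = start                      # restart at the source node
        else:
            current = rng.choice(graph[current])  # move to a random neighbor
            if current != start:
                visits[current] += 1
    by_type = {}
    for node, _count in visits.most_common():     # highest visit counts first
        bucket = by_type.setdefault(node_type[node], [])
        if len(bucket) < top_k:
            bucket.append(node)
    return by_type


# Toy heterogeneous network given as adjacency lists (hypothetical node names).
graph = {
    "l1": ["l2", "d1", "m1"],
    "l2": ["l1", "m1"],
    "d1": ["l1"],
    "m1": ["l1", "l2"],
}
node_type = {"l1": "lncRNA", "l2": "lncRNA", "d1": "disease", "m1": "miRNA"}
neighbors = sample_neighbors(graph, node_type, "l1")
print(neighbors)
```

Grouping the sampled neighbors by type is what allows a later aggregation step to treat lncRNA, disease and miRNA neighbors separately before combining them.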
Affiliation(s)
- Hong Shi
- School of Information, Yunnan Normal University, Kunming, 650092 China
| | - Xiaomeng Zhang
- School of Information, Yunnan Normal University, Kunming, 650092 China
| | - Lin Tang
- Key Laboratory of Educational Informatization for Nationalities Ministry of Education, Yunnan Normal University, Kunming, 650092 China
| | - Lin Liu
- School of Information, Yunnan Normal University, Kunming, 650092 China
|
30
|
Xuan P, Wang S, Cui H, Zhao Y, Zhang T, Wu P. Learning global dependencies and multi-semantics within heterogeneous graph for predicting disease-related lncRNAs. Brief Bioinform 2022; 23:6695267. [DOI: 10.1093/bib/bbac361] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/08/2022] [Revised: 07/18/2022] [Accepted: 08/05/2022] [Indexed: 11/14/2022] Open
Abstract
Motivation
Long noncoding RNAs (lncRNAs) play an important role in the occurrence and development of diseases. Predicting disease-related lncRNAs can help to understand the pathogenesis of diseases deeply. The existing methods mainly rely on multi-source data related to lncRNAs and diseases when predicting the associations between lncRNAs and diseases. There are interdependencies among node attributes in a heterogeneous graph composed of all lncRNAs, diseases and micro RNAs. The meta-paths composed of various connections between them also contain rich semantic information. However, the existing methods neglect to integrate attribute information of intermediate nodes in meta-paths.
Results
We propose a novel association prediction model, GSMV, to learn and deeply integrate the global dependencies, semantic information of meta-paths and node-pair multi-view features related to lncRNAs and diseases. We first formulate the global representations of the lncRNA and disease nodes by establishing a self-attention mechanism to capture and learn the global dependencies among node attributes. Second, starting from the lncRNA and disease nodes, respectively, multiple meta-paths are established to reveal different semantic information. Considering that each meta-path contains specific semantics and has multiple meta-path instances which have different contributions to revealing meta-path semantics, we design a graph neural network-based module which consists of a meta-path instance encoding strategy and two novel attention mechanisms. The proposed meta-path instance encoding strategy is used to learn the contextual connections between nodes within a meta-path instance. One of the two new attention mechanisms is at the meta-path instance level, which learns rich and informative meta-path instances. The other attention mechanism integrates various semantic information from multiple meta-paths to learn the semantic representation of lncRNA and disease nodes. Finally, a dilated convolution-based learning module with adjustable receptive fields is proposed to learn multi-view features of lncRNA-disease node pairs. The experimental results show that our method outperforms seven state-of-the-art methods used for comparison in lncRNA-disease association prediction. Ablation experiments demonstrate the contributions of the proposed global representation learning, semantic information learning, pairwise multi-view feature learning and the meta-path instance encoding strategy. Case studies on three cancers further demonstrate our method’s ability to discover potential disease-related lncRNA candidates.
Contact
zhang@hlju.edu.cn or peiliangwu@ysu.edu.cn
Supplementary information
Supplementary data are available at Briefings in Bioinformatics online.
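The notion of meta-path instances used in the abstract above can be made concrete with a short enumeration sketch. It is not part of GSMV; the function name `metapath_instances`, the node names and the toy edges are assumptions for illustration. A meta-path is a sequence of node types (here lncRNA, miRNA, disease), and each instance is a concrete path in the heterogeneous graph whose nodes follow that type sequence.

```python
def metapath_instances(start, metapath, edges, node_type):
    """Enumerate all simple paths from `start` whose node types follow `metapath`."""
    if node_type[start] != metapath[0]:
        return []
    paths = [[start]]
    for wanted in metapath[1:]:
        # Extend every partial path by each neighbor of the required type.
        paths = [p + [nxt]
                 for p in paths
                 for nxt in edges.get(p[-1], [])
                 if node_type[nxt] == wanted and nxt not in p]
    return paths


# Toy heterogeneous graph (hypothetical names), stored as adjacency lists.
edges = {
    "lnc1": ["mir1", "mir2"],
    "mir1": ["lnc1", "dis1"],
    "mir2": ["lnc1", "dis1", "dis2"],
    "dis1": ["mir1", "mir2"],
    "dis2": ["mir2"],
}
node_type = {"lnc1": "lncRNA", "mir1": "miRNA", "mir2": "miRNA",
             "dis1": "disease", "dis2": "disease"}

instances = metapath_instances("lnc1", ("lncRNA", "miRNA", "disease"),
                               edges, node_type)
for path in instances:
    print(" -> ".join(path))
```

Each enumerated instance could then be encoded and weighted individually, which is where instance-level attention of the kind described in the abstract would apply.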
Affiliation(s)
- Ping Xuan
- School of Information Science and Engineering (School of Software), Yanshan University, Qinhuangdao 066004, China
- School of Computer Science and Technology, Heilongjiang University, Harbin 150080, China
| | - Shuai Wang
- School of Information Science and Engineering (School of Software), Yanshan University, Qinhuangdao 066004, China
| | - Hui Cui
- Department of Computer Science and Information Technology, La Trobe University, Melbourne 3083, Australia
| | - Yue Zhao
- School of Computer Science and Technology, Heilongjiang University, Harbin 150080, China
| | - Tiangang Zhang
- School of Mathematical Science, Heilongjiang University, Harbin 150080, China
| | - Peiliang Wu
- School of Information Science and Engineering (School of Software), Yanshan University, Qinhuangdao 066004, China
|
31
|
Wang Y, Shao Y, Zhang H, Wang J, Zhang P, Zhang W, Chen H. Comprehensive analysis of key genes and pathways for biological and clinical implications in thyroid-associated ophthalmopathy. BMC Genomics 2022; 23:630. [PMID: 36056316 PMCID: PMC9440526 DOI: 10.1186/s12864-022-08854-5] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Received: 06/06/2022] [Accepted: 08/24/2022] [Indexed: 12/02/2022] Open
Abstract
Background
Thyroid-associated ophthalmopathy (TAO) is a common and organ-specific autoimmune disease. Early diagnosis and novel treatments are essential to improve the prognosis of TAO patients. Therefore, the current work was performed to identify the key genes and pathways for the biological and clinical implications of TAO through comprehensive bioinformatics analysis and a series of clinical validations.
Methods
GSE105149 and GSE185952 were obtained from the Gene Expression Omnibus (GEO) database for analysis. The data were normalized to identify the common differentially expressed genes (DEGs) between the two datasets, and the Gene Ontology (GO) and Kyoto Encyclopedia of Genes and Genomes (KEGG) enrichment analyses were conducted to assess key pathways in TAO. Protein–protein interaction (PPI) networks and hub genes among the common DEGs were identified. Furthermore, we collected the general information and blood samples from 50 TAO patients and 20 healthy controls (HCs), and the expression levels of the proteins encoded by hub genes in serum were detected by enzyme-linked immunosorbent assay (ELISA). Then we further assessed the relationship between the ELISA data and the TAO development.
Results
Several common pathways, including neuroactive ligand-receptor interaction, the IL-17 signaling pathway, and the TNF signaling pathway, were identified in both datasets. In parallel, 52 common DEGs were identified. The KEGG analysis showed that these common DEGs are mainly enriched in long-term depression, the VEGF signaling pathway, the IL-17 signaling pathway, the TNF signaling pathway, and cytokine-cytokine receptor interactions. The key hub genes PRKCG, OSM, DPP4, LRRTM1, CXCL6, and CSF3R were screened out through the PPI network. As confirmation, the ELISA results indicated that protein expression levels of PRKCG, OSM, CSF3R, and DPP4 were significantly upregulated in TAO patients compared with HCs. In addition, PRKCG and DPP4 were verified to show value in diagnosing TAO, and CSF3R was found to be a valuable diagnostic marker in distinguishing active TAO from inactive TAO.
Conclusions
Inflammation- and neuromodulation-related pathways might be closely associated with TAO. Based on the clinical verification, OSM, CSF3R, CXCL6, DPP4, and PRKCG may serve as inflammation- or neuromodulation-related biomarkers for TAO, providing novel insights for the diagnosis and treatment of TAO.
Supplementary Information
The online version contains supplementary material available at 10.1186/s12864-022-08854-5.
Affiliation(s)
- Yueyue Wang
- Department of Endocrinology, The First Affiliated Hospital of Nanjing Medical University, Nanjing, China
| | - Yanfei Shao
- Department of General Surgery, Ruijin Hospital, Shanghai Jiao Tong University School of Medicine, Shanghai, China; Shanghai Minimally Invasive Surgery Center, Ruijin Hospital, Shanghai Jiao Tong University School of Medicine, Shanghai, China
| | - Haitao Zhang
- Department of Endocrinology, The First Affiliated Hospital of Nanjing Medical University, Nanjing, China
| | - Jun Wang
- Department of Toxicology, The Key Laboratory of Modern Toxicology of Ministry of Education, School of Public Health, Nanjing Medical University, Nanjing, China
| | - Peng Zhang
- Department of Ophthalmology, The Friendship Hospital of Ili Kazakh Autonomous Prefecture Ili & Jiangsu Joint Institute of Health, Ili, China
| | - Weizhong Zhang
- Department of Ophthalmology, The Friendship Hospital of Ili Kazakh Autonomous Prefecture Ili & Jiangsu Joint Institute of Health, Ili, China; Department of Ophthalmology, The First Affiliated Hospital of Nanjing Medical University, Nanjing, China
| | - Huanhuan Chen
- Department of Endocrinology, The First Affiliated Hospital of Nanjing Medical University, Nanjing, China.
|
32
|
Zhang Y, Ye F, Gao X. MCA-Net: Multi-Feature Coding and Attention Convolutional Neural Network for Predicting lncRNA-Disease Association. IEEE/ACM Transactions on Computational Biology and Bioinformatics 2022; 19:2907-2919. [PMID: 34283719 DOI: 10.1109/tcbb.2021.3098126] [Citation(s) in RCA: 11] [Impact Index Per Article: 3.7] [Indexed: 06/13/2023]
Abstract
In the era of big data, it is difficult to accurately predict the associations between lncRNAs and diseases with traditional biological experiments, which are time-consuming and subjective. In this paper, we propose a novel deep learning method for predicting lncRNA-disease associations using a multi-feature coding and attention convolutional neural network (MCA-Net). We first calculate six similarity features to extract different types of lncRNA and disease feature information. Second, a multi-feature coding method is proposed to construct the feature vectors of lncRNA-disease association samples by integrating the six similarity features. Furthermore, an attention convolutional neural network is developed to identify lncRNA-disease associations under 10-fold cross-validation. Finally, we evaluate the performance of MCA-Net from different perspectives, including the effects of the model parameters, distinct deep learning models, and the necessity of the attention mechanism. We also compare MCA-Net with several state-of-the-art methods on three publicly available datasets, i.e., LncRNADisease, Lnc2Cancer, and LncRNADisease2.0. The results show that MCA-Net outperforms the state-of-the-art methods on all three datasets. In addition, case studies on breast cancer and lung cancer further verify that MCA-Net is effective and accurate for lncRNA-disease association prediction.
|
33
|
Wang B, Liu R, Zheng X, Du X, Wang Z. lncRNA-disease association prediction based on matrix decomposition of elastic network and collaborative filtering. Sci Rep 2022; 12:12700. [PMID: 35882886 PMCID: PMC9325687 DOI: 10.1038/s41598-022-16594-5] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/16/2022] [Accepted: 07/12/2022] [Indexed: 11/30/2022] Open
Abstract
In recent years, with the continuous development of high-throughput biotechnology, more and more evidence shows that lncRNAs play an essential role in biological processes and are related to the occurrence of various diseases. However, because traditional biological experiments are costly and time-consuming, the number of experimentally verified lncRNA-disease associations is very small. Computer-aided study is therefore an important way to investigate lncRNA-disease associations. Using existing data to build a prediction model and predict unknown lncRNA-disease associations can make biological experiments more targeted and improve their accuracy. We therefore need an accurate and efficient method to predict the relationships between lncRNAs and diseases and to help biologists with the diagnosis and treatment of diseases. Most current lncRNA-disease association prediction methods do not consider the model instability caused by real data, nor the possibility that the predictive model overfits the data. This paper proposes a lncRNA-disease association prediction model (ENCFLDA) that combines an elastic network with matrix decomposition and collaborative filtering. The method uses existing lncRNA-miRNA association data and miRNA-disease association data to predict associations between unknown lncRNAs and diseases. First, since the known lncRNA-disease association matrix is very sparse, cosine similarity and KNN are used to update the lncRNA-disease association matrix. The matrix is then updated by matrix decomposition combined with an elastic net algorithm, to increase the stability of the overall prediction model and eliminate overfitting. The final prediction matrix is then obtained through lncRNA-based collaborative filtering. Simulation experiments show that the AUC value of ENCFLDA reaches 0.9148 under the LOOCV framework, which is higher than that of the latest model.
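The first step, densifying a sparse association matrix with cosine similarity and KNN, can be sketched as below. The abstract does not give the exact update rule, so this is one plausible reading and not the ENCFLDA implementation; the function names `cosine` and `knn_fill`, the value of `k`, and the toy matrix are assumptions. Each zero entry of a lncRNA's row is replaced by a similarity-weighted average of the same entry in its k most similar rows, and known entries are left untouched.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def knn_fill(assoc, k=2):
    """Fill zero entries of each row from the row's k most similar rows."""
    filled = [row[:] for row in assoc]
    for i, row in enumerate(assoc):
        sims = sorted(((cosine(row, other), j)
                       for j, other in enumerate(assoc) if j != i),
                      reverse=True)[:k]
        total = sum(s for s, _ in sims)
        if total == 0:
            continue                      # no similar rows: leave row unchanged
        for col, value in enumerate(row):
            if value == 0:
                filled[i][col] = sum(s * assoc[j][col] for s, j in sims) / total
    return filled


# Rows are lncRNAs, columns are diseases (toy data; 1 = known association).
assoc = [
    [1, 1, 0],
    [1, 1, 1],
    [1, 0, 0],
    [0, 0, 1],
]
filled = knn_fill(assoc, k=2)
for row in filled:
    print([round(x, 3) for x in row])
```

The filled matrix contains fractional likelihoods where the original had zeros, which gives a later factorization step more signal to work with than the raw sparse matrix.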
Affiliation(s)
- Bo Wang
- College of Computer and Control, Qiqihar University, Qiqihar, 161006, China.
| | - RunJie Liu
- College of Computer and Control, Qiqihar University, Qiqihar, 161006, China
| | - XiaoDong Zheng
- College of Computer and Control, Qiqihar University, Qiqihar, 161006, China
| | - XiaoXin Du
- College of Computer and Control, Qiqihar University, Qiqihar, 161006, China
| | - ZhengFei Wang
- College of Computer and Control, Qiqihar University, Qiqihar, 161006, China
|
34
|
Guo Z, Hui Y, Kong F, Lin X. Finding Lung-Cancer-Related lncRNAs Based on Laplacian Regularized Least Squares With Unbalanced Bi-Random Walk. Front Genet 2022; 13:933009. [PMID: 35938010 PMCID: PMC9355720 DOI: 10.3389/fgene.2022.933009] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/30/2022] [Accepted: 06/03/2022] [Indexed: 11/13/2022] Open
Abstract
Lung cancer is one of the leading causes of cancer-related deaths. Thus, it is important to find its biomarkers. Furthermore, an increasing number of studies report that long noncoding RNAs (lncRNAs) are densely linked with multiple complex human diseases. Inferring new lncRNA-disease associations helps to identify potential biomarkers for lung cancer and to further understand its pathogenesis, design new drugs, and formulate individualized therapeutic options for lung cancer patients. This study developed a computational method (LDA-RLSURW) that integrates Laplacian regularized least squares and unbalanced bi-random walk to discover possible lncRNA biomarkers for lung cancer. First, the lncRNA and disease similarities were computed. Second, unbalanced bi-random walk was applied to the lncRNA and disease networks, respectively, to score associations between diseases and lncRNAs. Third, Laplacian regularized least squares were further used to compute the association probability of each lncRNA-disease pair based on the computed random walk scores. LDA-RLSURW was compared with 10 classical LDA prediction methods and obtained the best AUC value of 0.9027 on the lncRNADisease database. We found the top 30 lncRNAs associated with lung cancers and inferred that lncRNAs TUG1, PTENP1, and UCA1 may be biomarkers of lung neoplasms, non-small–cell lung cancer, and LUAD, respectively.
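The second step, an unbalanced bi-random walk, can be sketched in a generic form. This follows the commonly used bi-random walk formulation and is not the LDA-RLSURW code; the function names, the decay factor `alpha`, the step limits, and the toy matrices are assumptions. Scores are propagated over the lncRNA similarity network for up to `left_steps` iterations and over the disease similarity network for up to `right_steps` iterations (hence "unbalanced"), and each iteration mixes the propagated scores with the known associations.

```python
def matmul(a, b):
    """Multiply matrix a (n x m) by matrix b (m x p), both lists of lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def row_normalize(m):
    """Scale each row to sum to 1 (all-zero rows stay zero)."""
    out = []
    for row in m:
        s = sum(row)
        out.append([x / s if s else 0.0 for x in row])
    return out


def bi_random_walk(assoc, sim_l, sim_d, alpha=0.8, left_steps=3, right_steps=2):
    """Propagate association scores over both similarity networks."""
    wl, wd = row_normalize(sim_l), row_normalize(sim_d)
    scores = [row[:] for row in assoc]
    for t in range(1, max(left_steps, right_steps) + 1):
        parts = []
        if t <= left_steps:     # walk on the lncRNA similarity network
            parts.append(matmul(wl, scores))
        if t <= right_steps:    # walk on the disease similarity network
            parts.append(matmul(scores, wd))
        scores = [[alpha * sum(p[i][j] for p in parts) / len(parts)
                   + (1 - alpha) * assoc[i][j]
                   for j in range(len(assoc[0]))]
                  for i in range(len(assoc))]
    return scores


# Toy data: 2 lncRNAs x 2 diseases, one known association.
assoc = [[1.0, 0.0], [0.0, 0.0]]
sim_l = [[1.0, 0.5], [0.5, 1.0]]   # lncRNA-lncRNA similarity
sim_d = [[1.0, 0.2], [0.2, 1.0]]   # disease-disease similarity
scores = bi_random_walk(assoc, sim_l, sim_d)
for row in scores:
    print([round(x, 3) for x in row])
```

In this toy run the known pair keeps the highest score, and the second lncRNA inherits a non-zero score for the first disease through its similarity to the first lncRNA. In the method described above, such walk scores would then feed a Laplacian regularized least squares step, which is not shown here.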
|
35
|
Xie G, Zhu Y, Lin Z, Sun Y, Gu G, Li J, Wang W. HBRWRLDA: predicting potential lncRNA-disease associations based on hypergraph bi-random walk with restart. Mol Genet Genomics 2022; 297:1215-1228. [PMID: 35752742 DOI: 10.1007/s00438-022-01909-y] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 09/20/2021] [Accepted: 05/20/2022] [Indexed: 10/17/2022]
Abstract
Accumulating evidence indicates that the regulation of long non-coding RNAs (lncRNAs) is closely related to a variety of diseases. Identifying meaningful lncRNA-disease associations will contribute to the understanding of the molecular mechanisms underlying these diseases. However, only a limited number of associations between lncRNAs and diseases have been inferred from traditional biological experiments, which are costly and highly specialized. Therefore, computational methods are increasingly used to reduce the time spent on biological experiments and to complement biological research. In this paper, a computational method called HBRWRLDA is proposed to predict lncRNA-disease associations. First, HBRWRLDA models the relationships between multiple nodes using hypergraphs, which allows it to integrate the expression similarity of lncRNAs and the semantic similarity of diseases when constructing the hypergraphs. Then, a bi-random walk on hypergraphs is used to predict potential lncRNA-disease associations. HBRWRLDA achieves higher area under the curve values of 0.9551 and [Formula: see text], respectively, than five other advanced methods under leave-one-out cross-validation (LOOCV) and five-fold cross-validation (5-fold CV). In addition, the predictive performance of HBRWRLDA was confirmed by case studies of three diseases: renal cell carcinoma, gastric cancer, and hepatocellular carcinoma. The case studies demonstrate the capacity of HBRWRLDA to identify potentially disease-associated lncRNAs. Overall, HBRWRLDA performs well at predicting potential lncRNA-disease associations and could support further biological experiments by helping researchers identify candidate lncRNA-disease associations.
Affiliation(s)
- Guobo Xie
- School of Computing, Guangdong University of Technology, Guangzhou, 510000, China
| | - Yinting Zhu
- School of Computing, Guangdong University of Technology, Guangzhou, 510000, China
| | - Zhiyi Lin
- School of Computing, Guangdong University of Technology, Guangzhou, 510000, China.
| | - Yuping Sun
- School of Computing, Guangdong University of Technology, Guangzhou, 510000, China
| | - Guosheng Gu
- School of Computing, Guangdong University of Technology, Guangzhou, 510000, China
| | - Jianming Li
- School of Computing, Guangdong University of Technology, Guangzhou, 510000, China
| | - Weiming Wang
- School of Computing, Guangdong University of Technology, Guangzhou, 510000, China
|
36
|
Liu Y, Yu Y, Zhao S. Dual Attention Mechanisms and Feature Fusion Networks Based Method for Predicting LncRNA-Disease Associations. Interdiscip Sci 2022; 14:358-371. [PMID: 35067893 DOI: 10.1007/s12539-021-00492-x] [Citation(s) in RCA: 7] [Impact Index Per Article: 2.3] [Received: 01/19/2021] [Revised: 11/02/2021] [Accepted: 11/07/2021] [Indexed: 11/30/2022]
Abstract
LncRNAs play a part in numerous important biological processes and are relevant to disease diagnosis, prevention and treatment. Studying the associations between lncRNAs and diseases is a crucial way to learn the role and status of lncRNAs in human diseases. As research on lncRNAs and diseases has progressed, multiple neural-network-based methods have been employed to predict these associations. However, these methods failed to extract the deep and complicated feature representations of lncRNA-disease associations, and they ignored the discriminative contributions that the interactions, correlations, and similarities among miRNAs, diseases, and lncRNAs make to association prediction. In this paper, based on the multibiology premise of lncRNAs, miRNAs, and diseases, a dual attention network is proposed to predict lncRNA-disease associations from the miRNA, disease, and lncRNA characteristic matrices. Two attention modules enable the model to learn nonlinear, more complex and more useful features of the lncRNA, miRNA, and disease characteristic matrices. The connection between the lncRNA-disease feature embedding matrix and the lncRNA, miRNA, and disease characteristic matrices is enhanced through deconvolution and a feature fusion layer. Compared with several recent methods, the proposed method achieves better performance. Case studies of osteosarcoma, lung cancer, and gastric cancer confirmed that it effectively identifies potential lncRNA-disease associations.
Affiliation(s)
- Yu Liu
  - Dalian Key Lab of Digital Technology for National Culture, Dalian Minzu University, Dalian, 116600, China
  - Universiti Putra Malaysia, 43400 UPM Serdang, Selangor, Darul Ehsan, Malaysia
- Yingying Yu
  - Dalian Key Lab of Digital Technology for National Culture, Dalian Minzu University, Dalian, 116600, China
- Shimin Zhao
  - Guangxi Vocational and Technical College, Nanning, 530000, Guangxi, China
37. Xuan P, Gong Z, Cui H, Li B, Zhang T. Fully connected autoencoder and convolutional neural network with attention-based method for inferring disease-related lncRNAs. Brief Bioinform 2022; 23:6561435. [PMID: 35362511 DOI: 10.1093/bib/bbac089] [Citation(s) in RCA: 10] [Impact Index Per Article: 3.3] [Received: 12/06/2021] [Revised: 01/17/2022] [Accepted: 02/23/2022] [Indexed: 11/14/2022] Open
Abstract
Since abnormal expression of long noncoding RNAs (lncRNAs) is often closely related to various human diseases, identifying disease-associated lncRNAs helps in exploring complex pathogenesis. Most recent methods concentrate on exploiting multiple kinds of data related to lncRNAs and diseases to predict candidate disease-related lncRNAs. These methods, however, failed to deeply integrate the topology information from the meta-paths composed of lncRNA, disease and microRNA (miRNA) nodes. We proposed a new method based on fully connected autoencoders and convolutional neural networks, called ACLDA, for inferring potential disease-related lncRNA candidates. A heterogeneous graph consisting of lncRNA, disease and miRNA nodes was first constructed to integrate the similarities, associations and interactions among them. A fully connected autoencoder-based module was established to extract the low-dimensional features of the lncRNA, disease and miRNA nodes in the heterogeneous graph. We designed attention mechanisms at the node feature level and at the meta-path level to learn more informative features and meta-paths. A module based on convolutional neural networks was constructed to encode the local topologies of lncRNA and disease nodes from multiple meta-path perspectives. Comprehensive experimental results demonstrated that ACLDA outperforms several state-of-the-art prediction methods. Case studies on breast, lung and colon cancers demonstrated that ACLDA is able to discover potential disease-related lncRNAs.
Affiliation(s)
- Ping Xuan
  - School of Computer Science and Technology, Heilongjiang University, Harbin 150080, China
- Zhe Gong
  - School of Computer Science and Technology, Heilongjiang University, Harbin 150080, China
- Hui Cui
  - Department of Computer Science and Information Technology, La Trobe University, Melbourne 3083, Australia
- Bochong Li
  - Center for Frontier Medical Engineering, Chiba University, Chiba 2638522, Japan
- Tiangang Zhang
  - School of Mathematical Science, Heilongjiang University, Harbin 150080, China
38. Sheng N, Huang L, Wang Y, Zhao J, Xuan P, Gao L, Cao Y. Multi-channel graph attention autoencoders for disease-related lncRNAs prediction. Brief Bioinform 2022; 23:6519791. [PMID: 35108355 DOI: 10.1093/bib/bbab604] [Citation(s) in RCA: 22] [Impact Index Per Article: 7.3] [Received: 10/05/2021] [Revised: 12/08/2021] [Accepted: 12/27/2021] [Indexed: 12/31/2022] Open
Abstract
MOTIVATION: Disease-related long non-coding RNAs (lncRNAs) can serve as biomarkers for disease diagnosis and treatment. Developing effective computational approaches to predict lncRNA-disease associations (LDAs) can provide insights into the pathogenesis of complex human diseases and reduce experimental costs. However, few existing methods use microRNA (miRNA) information or consider the complex relationships between the inter-graph and the intra-graph within the complex-graph to assist prediction.
RESULTS: In this paper, the relationships between nodes of the same type and between nodes of different types in the complex-graph are introduced. We propose a multi-channel graph attention autoencoder model, called MGATE, to predict LDAs. First, an lncRNA-miRNA-disease complex-graph is established based on the similarities and correlations among lncRNAs, miRNAs and diseases to integrate the complex associations among them. Second, to fully extract the comprehensive information of the nodes, we use graph autoencoder networks to learn multiple representations from the complex-graph, the inter-graph and the intra-graph. Third, a graph-level attention integration module is adopted to adaptively merge the three representations, and a combined training strategy is used to optimize the whole model and ensure complementarity and consistency among the multi-graph embedding representations. Finally, multiple classifiers are explored, and Random Forest is used to predict the association score between an lncRNA and a disease. Experimental results on the public dataset show that the area under the receiver operating characteristic curve and the area under the precision-recall curve of MGATE are 0.964 and 0.413, respectively. MGATE significantly outperformed seven state-of-the-art methods. Furthermore, case studies of three cancers demonstrate the ability of MGATE to identify potential disease-correlated candidate lncRNAs.
The source code and supplementary data are available at https://github.com/sheng-n/MGATE.
CONTACT: huanglan@jlu.edu.cn, wy6868@jlu.edu.cn.
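The step of adaptively merging several graph representations with attention can be sketched as a softmax-weighted sum. This is only an illustration of the general technique, not MGATE's code: the scoring rule (a dot product with a `query` vector) and all names here are assumptions, and in a trained model the query would be learned rather than fixed.

```python
import math


def softmax(values):
    """Numerically stable softmax over a list of floats."""
    peak = max(values)
    exps = [math.exp(v - peak) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def attention_merge(representations, query):
    """Merge per-view node representations with attention weights.

    representations: one vector per graph view (all of equal length).
    query: hypothetical learned vector used to score each view.
    """
    scores = [sum(q * x for q, x in zip(query, rep))
              for rep in representations]
    weights = softmax(scores)
    dim = len(representations[0])
    merged = [sum(w * rep[d] for w, rep in zip(weights, representations))
              for d in range(dim)]
    return merged, weights


# Three views of the same node (e.g. complex-, inter- and intra-graph).
views = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
merged, weights = attention_merge(views, query=[0.0, 0.0])
```

With an all-zero query every view scores the same, so the merge reduces to a plain average. A non-zero query shifts weight toward the views that align with it.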
Affiliation(s)
- Nan Sheng
  - Key laboratory of Symbol Computation and Knowledge Engineering of Ministry of Education, College of Computer Science and Technology, Jilin University, Changchun 130012, China
- Lan Huang
  - Key laboratory of Symbol Computation and Knowledge Engineering of Ministry of Education, College of Computer Science and Technology, Jilin University, Changchun 130012, China
- Yan Wang
  - Key laboratory of Symbol Computation and Knowledge Engineering of Ministry of Education, College of Computer Science and Technology, Jilin University, Changchun 130012, China
  - School of Artificial Intelligence, Jilin University, Changchun 130012, China
- Jing Zhao
  - Department of Biomedical Informatics, College of Medicine, The Ohio State University, Columbus OH 43210, USA
- Ping Xuan
  - School of Computer Science and Technology, Heilongjiang University, Harbin 150080, China
- Ling Gao
  - School of Computer Science and Technology, Heilongjiang University, Harbin 150080, China
- Yangkun Cao
  - School of Artificial Intelligence, Jilin University, Changchun 130012, China
39. Li J, Kong M, Wang D, Yang Z, Hao X. Prediction of lncRNA-Disease Associations via Closest Node Weight Graphs of the Spatial Neighborhood Based on the Edge Attention Graph Convolutional Network. Front Genet 2022; 12:808962. [PMID: 35058974 PMCID: PMC8763691 DOI: 10.3389/fgene.2021.808962] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 11/04/2021] [Accepted: 11/29/2021] [Indexed: 11/24/2022] Open
Abstract
Accumulated evidence from biological and clinical studies has shown that long non-coding RNAs (lncRNAs) are closely related to the occurrence and development of various complex human diseases. Research on lncRNA–disease relations will help to further understand the pathogenesis of complex human diseases at the molecular level, but only a small proportion of lncRNA–disease associations has been confirmed. Considering the high cost of biological experiments, exploring potential lncRNA–disease associations with computational approaches has become urgent. In this study, a model based on the closest node weight graph of the spatial neighborhood (CNWGSN) and the edge attention graph convolutional network (EAGCN), named LDA-EAGCN, was developed to uncover potential lncRNA–disease associations by integrating disease semantic similarity, lncRNA functional similarity, and known lncRNA–disease associations. Inspired by the success of the EAGCN method on the chemical molecule property recognition problem, the prediction of lncRNA–disease associations was treated as a component recognition problem on lncRNA–disease characteristic graphs. The CNWGSN features of lncRNA–disease associations, combined with known lncRNA–disease associations, were used to train EAGCN, and EAGCN then predicted correlation scores for the input data to judge whether the input lncRNAs are associated with the input diseases. LDA-EAGCN achieved an AUC value of 0.9853 in ten-fold cross-validation experiments, which was the highest among five state-of-the-art models. Furthermore, case studies of renal cancer, laryngeal carcinoma, and liver cancer were conducted, and most of the top-ranking lncRNA–disease associations have been confirmed by recently published experimental literature. These results indicate that LDA-EAGCN is an effective model for predicting potential lncRNA–disease associations. Its source code and experimental data are available at https://github.com/HGDKMF/LDA-EAGCN.
Affiliation(s)
- Jianwei Li
  - Institute of Computational Medicine, School of Artificial Intelligence, Hebei University of Technology, Tianjin, China
  - Hebei Province Key Laboratory of Big Data Calculation, Hebei University of Technology, Tianjin, China
- Mengfan Kong
  - Institute of Computational Medicine, School of Artificial Intelligence, Hebei University of Technology, Tianjin, China
- Duanyang Wang
  - Institute of Computational Medicine, School of Artificial Intelligence, Hebei University of Technology, Tianjin, China
- Zhenwu Yang
  - Institute of Computational Medicine, School of Artificial Intelligence, Hebei University of Technology, Tianjin, China
- Xiaoke Hao
  - Institute of Computational Medicine, School of Artificial Intelligence, Hebei University of Technology, Tianjin, China
40. Gong Y, Zhu W, Sun M, Shi L. Bioinformatics Analysis of Long Non-coding RNA and Related Diseases: An Overview. Front Genet 2021; 12:813873. [PMID: 34956340 PMCID: PMC8692768 DOI: 10.3389/fgene.2021.813873] [Citation(s) in RCA: 15] [Impact Index Per Article: 3.8] [Received: 11/12/2021] [Accepted: 11/26/2021] [Indexed: 12/30/2022] Open
Abstract
Long non-coding RNAs (lncRNAs) are usually located in the nucleus and cytoplasm of cells. The transcripts of lncRNAs are >200 nucleotides in length and do not encode proteins. Compared with small RNAs, lncRNAs have longer sequences, more complex spatial structures, and more diverse and complex mechanisms involved in the regulation of gene expression. LncRNAs are widely involved in the biological processes of cells, and in the occurrence and development of many human diseases. Many studies have shown that lncRNAs can induce the occurrence of diseases, and some lncRNAs undergo specific changes in tumor cells. Research into the roles of lncRNAs has covered the diagnosis of, for example, cardiovascular, cerebrovascular, and central nervous system diseases. The bioinformatics of lncRNAs has gradually become a research hotspot and has led to the discovery of a large number of lncRNAs and associated biological functions, and lncRNA databases and recognition models have been developed. In this review, the research progress of lncRNAs is discussed, and lncRNA-related databases and the mechanisms and modes of action of lncRNAs are described. In addition, disease-related lncRNA methods and the relationships between lncRNAs and human lung adenocarcinoma, rectal cancer, colon cancer, heart disease, and diabetes are discussed. Finally, the significance and existing problems of lncRNA research are considered.
Affiliation(s)
- Yuxin Gong
  - School of Mathematics and Statistics, Hainan Normal University, Haikou, China
  - Yangtze Delta Region Institute (Quzhou), University of Electronic Science and Technology of China, Quzhou, China
  - Key Laboratory of Computational Science and Application of Hainan Province, Haikou, China
  - Key Laboratory of Data Science and Smart Education, Hainan Normal University, Ministry of Education, Haikou, China
- Wen Zhu
  - School of Mathematics and Statistics, Hainan Normal University, Haikou, China
- Meili Sun
  - Beidahuang Industry Group General Hospital, Harbin, China
- Lei Shi
  - Department of Spine Surgery, Changzheng Hospital, Naval Medical University, Shanghai, China
41. Gao L, Cui H, Zhang T, Sheng N, Xuan P. Prediction of drug-disease associations by integrating common topologies of heterogeneous networks and specific topologies of subnets. Brief Bioinform 2021; 23:6446271. [PMID: 34850815 DOI: 10.1093/bib/bbab467] [Citation(s) in RCA: 9] [Impact Index Per Article: 2.3] [Received: 08/16/2021] [Revised: 09/23/2021] [Accepted: 10/13/2021] [Indexed: 12/13/2022] Open
Abstract
MOTIVATION: The development of a new drug is time-consuming and costly. Thus, identifying new uses for approved drugs, known as drug repositioning, helps speed up drug development and reduce its cost. Existing drug-related disease prediction methods mainly focus on single or multiple drug-disease heterogeneous networks. However, the heterogeneous networks, the drug subnets and the disease subnet they contain cover, respectively, the common topology information between drug and disease nodes, the specific information between drug nodes, and the specific information between disease nodes.
RESULTS: We design a novel model, CTST, to extract and integrate common and specific topologies in multiple heterogeneous networks and subnets. Multiple heterogeneous networks composed of drug and disease nodes are established to integrate multiple kinds of similarities and associations among drug and disease nodes. These heterogeneous networks contain multiple drug subnets and a disease subnet. For the multiple heterogeneous networks and subnets, we then define the common and specific representations of drug and disease nodes. The common representations of drug and disease nodes are encoded by a graph convolutional autoencoder with shared parameters, and they integrate the topological relationships of all nodes in the heterogeneous networks. The specific representations of nodes are learned by separate graph convolutional autoencoders, one per subnet, and they fuse the topology and attributes of the nodes in each subnet. We then propose attention mechanisms at the common representation level and the specific representation level to learn more informative common and specific representations, respectively. Finally, an integration module with representation feature level attention is built to adaptively integrate these two representations for the final association prediction. Extensive experimental results confirm the effectiveness of CTST. Comparison with six recent methods and case studies on five drugs further verify that CTST can discover potential candidate diseases.
Affiliation(s)
- Ling Gao
  - School of Computer Science and Technology, Heilongjiang University, Harbin 150080, China
- Hui Cui
  - Department of Computer Science and Information Technology, La Trobe University, Melbourne 3083, Australia
- Tiangang Zhang
  - School of Mathematical Science, Heilongjiang University, Harbin 150080, China
- Nan Sheng
  - College of Computer Science and Technology, Jilin University, Changchun 130012, China
- Ping Xuan
  - School of Computer Science and Technology, Heilongjiang University, Harbin 150080, China
42. Zhang Y, Chen M, Huang L, Xie X, Li X, Jin H, Wang X, Wei H. Fusion of KATZ measure and space projection to fast probe potential lncRNA-disease associations in bipartite graphs. PLoS One 2021; 16:e0260329. [PMID: 34807960 PMCID: PMC8608294 DOI: 10.1371/journal.pone.0260329] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 09/12/2021] [Accepted: 11/06/2021] [Indexed: 11/19/2022] Open
Abstract
It is well known that numerous long noncoding RNAs (lncRNAs) are closely related to the physiological and pathological processes of human diseases and can serve as potential biomarkers. Therefore, lncRNA-disease associations identified by computational methods can serve as targeted candidates, reducing the cost of biological experiments devoted to further in-depth study. However, because of inherent problems such as the inaccurate construction of similarity networks and the inadequate number of known lncRNA-disease associations, many mature computational methods that have been developed over many years still have limitations. This motivated us to explore a new computational method, KATZSP, which fuses the KATZ measure with space projection to quickly probe potential lncRNA-disease associations. KATZSP comprises the following key steps: combining all the global information to convert the Boolean network of known lncRNA-disease associations into weighted networks; replacing the similarity calculation with a count of the walks that connect lncRNA nodes and disease nodes in bipartite graphs; and obtaining space projection scores to refine the primary prediction scores. The process of fusing the KATZ measure and space projection is simple and requires only one attenuation factor. Leave-one-out cross-validation (LOOCV) results showed that, compared with other state-of-the-art methods (NCPLDA, LDAI-ISPS and IIRWR), KATZSP achieved higher predictive accuracy, measured by the area under the curve (AUC), on the three datasets built, and that it worked well when inferring potential associations for new lncRNAs (or isolated diseases). Results from real case studies (pancreatic cancer, lung cancer and colorectal cancer) further confirmed that KATZSP has strong predictive ability and can serve as a guide for traditional biological experiments.
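The walk-counting step can be made concrete with a small sketch of a truncated Katz measure on a bipartite graph. It is a simplification under stated assumptions, not the KATZSP method: it works on the unweighted Boolean association matrix, omits the weighting and space projection steps, and the function names and the cut-off `max_len` are hypothetical. In a bipartite graph every walk from an lncRNA to a disease has odd length, so only lengths 1, 3, 5, ... contribute.

```python
def transpose(m):
    """Return the transpose of a matrix given as a list of rows."""
    return [list(col) for col in zip(*m)]


def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def katz_bipartite(assoc, beta=0.1, max_len=3):
    """Sum beta**k * (number of length-k lncRNA-to-disease walks).

    assoc: Boolean lncRNA x disease matrix (rows: lncRNAs).
    beta: the single attenuation factor; longer walks count less.
    max_len: illustrative truncation of the walk length.
    """
    assoc_t = transpose(assoc)
    walks = [row[:] for row in assoc]  # walks of length 1
    scores = [[0.0] * len(assoc[0]) for _ in assoc]
    length = 1
    while length <= max_len:
        weight = beta ** length
        for i, row in enumerate(walks):
            for j, count in enumerate(row):
                scores[i][j] += weight * count
        # Extend every walk by two edges: disease -> lncRNA -> disease.
        walks = matmul(matmul(walks, assoc_t), assoc)
        length += 2
    return scores


assoc = [[1, 1, 0],
         [0, 1, 0],
         [0, 0, 1]]
scores = katz_bipartite(assoc)
```

Here lncRNA 1 has no direct link to disease 0, but one walk of length 3 connects them (through disease 1 and lncRNA 0), so the pair scores `beta ** 3`. No walk reaches disease 2, so that score stays at zero.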
Affiliation(s)
- Yi Zhang
  - School of Information Science and Engineering, Guilin University of Technology, Guilin, China
  - Guangxi Key Laboratory of Embedded Technology and Intelligent System, Guilin University of Technology, Guilin, China
- Min Chen
  - School of Computer Science and Technology, Hunan Institute of Technology, Hengyang, China
- Li Huang
  - Academy of Arts and Design, Tsinghua University, Beijing, China
  - The Future Laboratory, Tsinghua University, Beijing, China
- Xiaolan Xie
  - School of Information Science and Engineering, Guilin University of Technology, Guilin, China
- Xin Li
  - School of Information Science and Engineering, Guilin University of Technology, Guilin, China
- Hong Jin
  - School of Information Science and Engineering, Guilin University of Technology, Guilin, China
- Xiaohua Wang
  - Pharmacy School, Guilin Medical University, Guilin, China
- Hanyan Wei
  - Pharmacy School, Guilin Medical University, Guilin, China
43. Xuan P, Zhan L, Cui H, Zhang T, Nakaguchi T, Zhang W. Graph Triple-Attention Network for Disease-related LncRNA Prediction. IEEE J Biomed Health Inform 2021; 26:2839-2849. [PMID: 34813484 DOI: 10.1109/jbhi.2021.3130110] [Citation(s) in RCA: 18] [Impact Index Per Article: 4.5] [Indexed: 11/06/2022]
Abstract
Abnormal expression of long non-coding RNAs (lncRNAs) is associated with various human diseases. Identifying disease-related lncRNAs can help clarify complex disease pathogeneses. The latest methods for lncRNA-disease association prediction rely on diverse data about lncRNAs and diseases. These methods, however, cannot adequately integrate the neighbour topological information of lncRNA and disease nodes. Moreover, more intrinsic features of lncRNA-disease node pairs can be explored to better predict the latent associations between lncRNAs and diseases. We developed a novel method, named GTAN, to predict the association propensities between lncRNAs and diseases. GTAN integrates various information about lncRNAs and diseases, including similarities, associations and interactions among lncRNAs, diseases and miRNAs, and exploits the neighbour topology and attribute representations of a pair of lncRNA-disease nodes. In GTAN we adopted a graph neural network architecture with three attention mechanisms and multi-layer convolutional neural networks. First, a neighbour-level self-attention mechanism is constructed to learn the importance of each neighbour for an lncRNA or disease node of interest. Second, topology-level attention is proposed to enhance contextual dependencies among multiple local topology representations of the lncRNA or disease node. An attention-enhanced graph neural network framework is then established to learn a topology representation of the top-ranked neighbours for a pair of lncRNA-disease nodes. GTAN also has attribute-level attention to distinguish the various contributions of the attributes of the lncRNA-disease pair. Finally, the attribute representation is learned by a multi-layer CNN to integrate detailed features and representative features of the pair. Extensive experimental results demonstrated that GTAN outperformed state-of-the-art methods. The improved recall rates also showed GTAN's capacity for retrieving more actual lncRNA-disease associations among the top-ranked candidates. The ablation studies confirmed the important contributions of the three attention mechanisms. Case studies on lung cancer, prostate cancer and colon cancer further showed GTAN's ability to discover potential lncRNA candidates related to diseases.
44. Liu Y, Han K, Zhu YH, Zhang Y, Shen LC, Song J, Yu DJ. Improving protein fold recognition using triplet network and ensemble deep learning. Brief Bioinform 2021; 22:bbab248. [PMID: 34226918 PMCID: PMC8768454 DOI: 10.1093/bib/bbab248] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.0] [Received: 05/10/2021] [Revised: 06/04/2021] [Indexed: 12/24/2022] Open
Abstract
Protein fold recognition is a critical step toward protein structure and function prediction, aiming to provide the most likely fold type of the query protein. In recent years, the development of deep learning (DL) techniques has led to massive advances in this important field, and accordingly, the sensitivity of protein fold recognition has been dramatically improved. Most DL-based methods take an intermediate bottleneck layer as the feature representation of proteins with new fold types. However, this strategy is indirect, inefficient and conditional on the hypothesis that the bottleneck layer's representation is a good representation of proteins with new fold types. To address this problem, we develop a new computational framework that combines a triplet network and ensemble DL. We first train a DL-based model, termed FoldNet, which employs triplet loss to train the deep convolutional network. FoldNet directly optimizes the protein fold embedding itself, bringing proteins with the same fold type closer to each other than those with different fold types in the new protein embedding space. Subsequently, using the trained FoldNet, we implement a new residue-residue contact-assisted predictor, termed FoldTR, which improves protein fold recognition. Furthermore, we propose a new ensemble DL method, termed FSD_XGBoost, which combines the protein fold embedding with two other discriminative fold-specific features extracted by two DL-based methods, SSAfold and DeepFR. The Top 1 sensitivity of FSD_XGBoost increases to 74.8% at the fold level, which is ~9% higher than that of the state-of-the-art method. Together, the results suggest that fold-specific features extracted by different DL methods complement each other, and that their combination can further improve fold recognition at the fold level. The implemented web server of FoldTR and benchmark datasets are publicly available at http://csbio.njust.edu.cn/bioinf/foldtr/.
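The triplet loss mentioned in this abstract can be written in a few lines. The sketch below shows the commonly used hinge form with squared Euclidean distance. Whether FoldNet uses this exact distance or margin is not stated in the abstract, so the `margin` value and the function names are assumptions for illustration.

```python
def squared_distance(u, v):
    """Squared Euclidean distance between two embedding vectors."""
    return sum((a - b) ** 2 for a, b in zip(u, v))


def triplet_loss(anchor, positive, negative, margin=1.0):
    """Hinge-style triplet loss on one (anchor, positive, negative) triple.

    The loss is zero once the positive (same fold) is closer to the
    anchor than the negative (different fold) by at least `margin`.
    """
    d_pos = squared_distance(anchor, positive)
    d_neg = squared_distance(anchor, negative)
    return max(0.0, d_pos - d_neg + margin)


# Well-separated triple: same-fold protein is near, other fold is far.
easy = triplet_loss([0.0, 0.0], [0.0, 1.0], [3.0, 4.0])
# Violating triple: the different-fold protein is closer than the same-fold one.
hard = triplet_loss([0.0, 0.0], [3.0, 4.0], [0.0, 1.0])
```

Minimizing this quantity over many triples pushes embeddings of the same fold together and embeddings of different folds apart, which is the behaviour the abstract describes.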
Affiliation(s)
- Jiangning Song (corresponding author)
  - Biomedicine Discovery Institute and Department of Biochemistry and Molecular Biology, Monash University, Melbourne, Victoria 3800, Australia
- Dong-Jun Yu (corresponding author)
  - School of Computer Science and Engineering, Nanjing University of Science and Technology, 200 Xiaolingwei, Nanjing 210094, China
45. Cheng Y, Gong Y, Liu Y, Song B, Zou Q. Molecular design in drug discovery: a comprehensive review of deep generative models. Brief Bioinform 2021; 22:6355420. [PMID: 34415297 DOI: 10.1093/bib/bbab344] [Citation(s) in RCA: 58] [Impact Index Per Article: 14.5] [Received: 07/01/2021] [Revised: 07/19/2021] [Accepted: 08/04/2021] [Indexed: 12/22/2022] Open
Abstract
Deep generative models have attracted surging interest in the deep learning community since they were proposed. These models are designed to generate new synthetic data, including images, videos and texts, by fitting approximate data distributions. In the last few years, deep generative models have shown superior performance in drug discovery, especially in de novo molecular design. In this study, deep generative models are reviewed to survey the recent advances in de novo molecular design for drug discovery. In addition, we divide these models into two categories based on their in silico molecular representations. These two classical types of models are then described in detail, and their pros and cons are discussed. We also indicate the current challenges for deep generative models in de novo molecular design. Automatic de novo molecular design is promising, but there is still a long road to explore.
Affiliation(s)
- Yu Cheng
  - College of Information Science and Engineering, Hunan University, 2 Lushan S Rd, Yuelu District, 410086, Changsha, China
- Yongshun Gong
  - School of Software, Shandong University, 250100, Jinan, China
- Yuansheng Liu
  - College of Information Science and Engineering, Hunan University, 2 Lushan S Rd, Yuelu District, 410086, Changsha, China
- Bosheng Song
  - College of Information Science and Engineering, Hunan University, 2 Lushan S Rd, Yuelu District, 410086, Changsha, China
- Quan Zou
  - Institute of Fundamental and Frontier Sciences, University of Electronic Science and Technology of China, 610054, Chengdu, China
46. Yi HC, You ZH, Huang DS, Kwoh CK. Graph representation learning in bioinformatics: trends, methods and applications. Brief Bioinform 2021; 23:6361044. [PMID: 34471921 DOI: 10.1093/bib/bbab340] [Citation(s) in RCA: 58] [Impact Index Per Article: 14.5] [Received: 06/24/2021] [Revised: 07/18/2021] [Accepted: 08/02/2021] [Indexed: 12/12/2022] Open
Abstract
A graph is a natural data structure for describing complex systems: it contains a set of objects and the relationships between them. Many real-life biomedical problems can be modeled as graph analytics tasks. Machine learning, especially deep learning, has succeeded in many bioinformatics scenarios where the data are represented in the Euclidean domain. However, rich relational information between biological elements is retained in non-Euclidean biomedical graphs, which classic machine learning methods cannot readily learn from. Graph representation learning aims to embed a graph into a low-dimensional space while preserving graph topology and node properties. It bridges biomedical graphs and modern machine learning methods and has recently attracted widespread interest in both the machine learning and bioinformatics communities. In this work, we summarize the advances of graph representation learning and its representative applications in bioinformatics. To provide a comprehensive and structured analysis and perspective, we first categorize and analyze both graph embedding methods (homogeneous graph embedding, heterogeneous graph embedding, attribute graph embedding) and graph neural networks. Furthermore, we summarize their representative applications, from the molecular level to the genomics, pharmaceutical and healthcare systems levels. Moreover, we list open resource platforms and libraries for implementing these graph representation learning methods and discuss the challenges and opportunities of graph representation learning in bioinformatics. This work provides a comprehensive survey of emerging graph representation learning algorithms and their applications in bioinformatics. We anticipate that it will offer valuable insights to researchers working on graph representation learning and future-oriented bioinformatics studies.
Affiliation(s)
- Hai-Cheng Yi
  - Chinese Academy of Sciences, Xinjiang Technical Institute of Physics and Chemistry, Urumqi 830011, China
  - University of Chinese Academy of Sciences, Beijing 100049, China
- Zhu-Hong You
  - School of Computer Science, Northwestern Polytechnical University, Xi'an 710129, China
- De-Shuang Huang
  - Institute of Machine Learning and Systems Biology, School of Electronics and Information Engineering, Tongji University, Shanghai 201804, China
- Chee Keong Kwoh
  - School of Computer Science and Engineering, Nanyang Technological University, 50 Nanyang Avenue, Singapore
|
47
|
A representation learning model based on variational inference and graph autoencoder for predicting lncRNA-disease associations. BMC Bioinformatics 2021; 22:136. [PMID: 33745450 PMCID: PMC7983260 DOI: 10.1186/s12859-021-04073-z] [Citation(s) in RCA: 41] [Impact Index Per Article: 10.3] [Received: 01/28/2021] [Accepted: 03/11/2021] [Indexed: 01/01/2023] Open
Abstract
Background: Numerous studies have demonstrated that long non-coding RNAs (lncRNAs) are related to many human diseases. Predicting potential lncRNA-disease associations is therefore crucial for disease prognosis, diagnosis and therapy. Dozens of machine learning and deep learning algorithms have been applied to this problem, yet it remains challenging to learn efficient low-dimensional representations from high-dimensional features of lncRNAs and diseases in order to predict unknown lncRNA-disease associations accurately.
Results: We propose an end-to-end model, VGAELDA, which integrates variational inference and graph autoencoders for lncRNA-disease association prediction. VGAELDA contains two kinds of graph autoencoders. Variational graph autoencoders (VGAE) infer representations from the features of lncRNAs and diseases respectively, while graph autoencoders propagate labels via known lncRNA-disease associations. These two kinds of autoencoders are trained alternately using a variational expectation maximization algorithm. Combining the VGAE for graph representation learning with alternate training via variational inference strengthens the capability of VGAELDA to capture efficient low-dimensional representations from high-dimensional features, and hence improves the robustness and precision of predicting unknown lncRNA-disease associations. Further analysis shows that the co-training framework of lncRNA and disease designed for VGAELDA solves a geometric matrix completion problem for capturing efficient low-dimensional representations via a deep learning approach.
Conclusion: Cross-validation and numerical experiments show that VGAELDA outperforms current state-of-the-art methods in lncRNA-disease association prediction. Case studies indicate that VGAELDA is capable of detecting potential lncRNA-disease associations. The source code and data are available at https://github.com/zhanglabNKU/VGAELDA.
Supplementary Information: The online version contains supplementary material available at 10.1186/s12859-021-04073-z.
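To make the variational graph autoencoder component mentioned in the abstract more concrete, here is a minimal NumPy sketch of a generic VGAE forward pass: a two-layer graph convolutional encoder, the reparameterization trick, and an inner-product decoder. It is not the VGAELDA implementation (which lives in the repository linked above) and it does not cover the alternate training with the label-propagating graph autoencoder. The function names, the toy graph, the random weights, and the use of a log-variance parameterization are all assumptions made for illustration.

```python
import numpy as np

def normalize_adj(adj):
    """Symmetric normalization with self-loops: D^{-1/2} (A + I) D^{-1/2}."""
    a_hat = adj + np.eye(adj.shape[0])
    d_inv_sqrt = 1.0 / np.sqrt(a_hat.sum(axis=1))
    return d_inv_sqrt[:, None] * a_hat * d_inv_sqrt[None, :]

def vgae_forward(adj, feats, params, rng):
    """One forward pass of a generic VGAE (hypothetical helper, untrained)."""
    a_norm = normalize_adj(adj)
    # Shared first graph convolution layer with ReLU activation.
    hidden = np.maximum(a_norm @ feats @ params["w0"], 0.0)
    # Second layer outputs the mean and log-variance of q(z | X, A).
    mu = a_norm @ hidden @ params["w_mu"]
    log_var = a_norm @ hidden @ params["w_logvar"]
    # Reparameterization trick: z = mu + sigma * eps, with eps ~ N(0, I).
    eps = rng.standard_normal(mu.shape)
    z = mu + np.exp(0.5 * log_var) * eps
    # Inner-product decoder: probability of an edge between each node pair.
    probs = 1.0 / (1.0 + np.exp(-(z @ z.T)))
    # KL divergence from the standard normal prior, averaged over nodes.
    kl = -0.5 * np.mean(np.sum(1.0 + log_var - mu**2 - np.exp(log_var), axis=1))
    return z, probs, kl

# Toy 4-node graph with one-hot node features and random weights.
rng = np.random.default_rng(0)
adj = np.array([[0, 1, 1, 0],
                [1, 0, 1, 0],
                [1, 1, 0, 1],
                [0, 0, 1, 0]], dtype=float)
feats = np.eye(4)
params = {
    "w0": 0.5 * rng.standard_normal((4, 8)),
    "w_mu": 0.5 * rng.standard_normal((8, 2)),
    "w_logvar": 0.5 * rng.standard_normal((8, 2)),
}
z, probs, kl = vgae_forward(adj, feats, params, rng)
```

In a trained model the weights would be fitted by minimizing a reconstruction loss on `probs` against the known adjacency plus the `kl` term; here the weights are random, so the outputs only demonstrate shapes and data flow.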
|