1
|
Xuan P, Wang W, Cui H, Wang S, Nakaguchi T, Zhang T. Mask-Guided Target Node Feature Learning and Dynamic Detailed Feature Enhancement for lncRNA-Disease Association Prediction. J Chem Inf Model 2024; 64:6662-6675. [PMID: 39112431 DOI: 10.1021/acs.jcim.4c00652] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 08/27/2024]
Abstract
Identifying new relevant long noncoding RNAs (lncRNAs) for various human diseases can facilitate the exploration of the causes and progression of these diseases. Recently, several graph inference methods have been proposed to predict disease-related lncRNAs by exploiting the topological structure and node attributes within graphs. However, these methods did not prioritize the target lncRNA and disease nodes over auxiliary nodes like miRNA nodes, potentially limiting their ability to fully utilize the features of the target nodes. We propose a new method, mask-guided target node feature learning and dynamic detailed feature enhancement for lncRNA-disease association prediction (MDLD), to enhance node feature learning for improved lncRNA-disease association prediction. First, we designed a heterogeneous graph masked transformer autoencoder to guide feature learning, focusing more on the features of target lncRNA (disease) nodes. The target nodes were increasingly masked as training progressed, which helps develop a more robust prediction model. Second, we developed a graph convolutional network with dynamic residuals (GCNDR) to learn and integrate the heterogeneous topology and features of all lncRNA, disease, and miRNA nodes. GCNDR employs an interlayer residual strategy and a residual evolution strategy to mitigate oversmoothing caused by multilayer graph convolution. The interlayer residual strategy estimates the importance of node features learned in the previous GCN encoding layer for nodes in the current encoding layer. Additionally, since there are dependencies in the importance of features of individual lncRNA (disease, miRNA) nodes across multiple encoding layers, a gated recurrent unit-based strategy is proposed to encode these dependencies. Finally, we designed a perspective-level attention mechanism to obtain more informative features of lncRNA and disease node pairs from the perspectives of mask-enhanced and dynamic-enhanced node features. Cross-validation experimental results demonstrated that MDLD outperformed 10 other state-of-the-art prediction methods. Ablation experiments and case studies on candidate lncRNAs for three diseases further proved the technical contributions of MDLD and its capability to discover disease-related lncRNAs.
Collapse
Affiliation(s)
- Ping Xuan
- Department of Computer Science and Technology, Shantou University, Shantou 515063, China
- School of Mathematical Science, Heilongjiang University, Harbin 150080, China
| | - Wei Wang
- Department of Computer Science and Technology, Shantou University, Shantou 515063, China
| | - Hui Cui
- Department of Computer Science and Information Technology, La Trobe University, Melbourne 3083, Australia
| | - Shuai Wang
- School of Information Science and Engineering, Yanshan University, Qinhuangdao 066004, China
| | - Toshiya Nakaguchi
- Center for Frontier Medical Engineering, Chiba University, Chiba 2638522, Japan
| | - Tiangang Zhang
- School of Mathematical Science, Heilongjiang University, Harbin 150080, China
| |
Collapse
|
2
|
Diao B, Luo J, Guo Y. A comprehensive survey on deep learning-based identification and predicting the interaction mechanism of long non-coding RNAs. Brief Funct Genomics 2024; 23:314-324. [PMID: 38576205 DOI: 10.1093/bfgp/elae010] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/06/2023] [Revised: 02/25/2024] [Accepted: 03/14/2024] [Indexed: 04/06/2024] Open
Abstract
Long noncoding RNAs (lncRNAs) have been discovered to be extensively involved in eukaryotic epigenetic, transcriptional, and post-transcriptional regulatory processes with the advancements in sequencing technology and genomics research. Therefore, they play crucial roles in the body's normal physiology and various disease outcomes. Presently, numerous unknown lncRNA sequencing data require exploration. Establishing deep learning-based prediction models for lncRNAs provides valuable insights for researchers, substantially reducing time and costs associated with trial and error and facilitating the disease-relevant lncRNA identification for prognosis analysis and targeted drug development as the era of artificial intelligence progresses. However, most lncRNA-related researchers lack awareness of the latest advancements in deep learning models and model selection and application in functional research on lncRNAs. Thus, we elucidate the concept of deep learning models, explore several prevalent deep learning algorithms and their data preferences, conduct a comprehensive review of recent literature studies with exemplary predictive performance over the past 5 years in conjunction with diverse prediction functions, critically analyze and discuss the merits and limitations of current deep learning models and solutions, while also proposing prospects based on cutting-edge advancements in lncRNA research.
Collapse
Affiliation(s)
- Biyu Diao
- Department of Breast Surgery, The First Affiliated Hospital of Ningbo University, No. 59, Liuting Street, Haishu District, Ningbo 315000, China
| | - Jin Luo
- Department of Breast Surgery, The First Affiliated Hospital of Ningbo University, No. 59, Liuting Street, Haishu District, Ningbo 315000, China
| | - Yu Guo
- Department of Breast Surgery, The First Affiliated Hospital of Ningbo University, No. 59, Liuting Street, Haishu District, Ningbo 315000, China
| |
Collapse
|
3
|
Peng L, Ren M, Huang L, Chen M. GEnDDn: An lncRNA-Disease Association Identification Framework Based on Dual-Net Neural Architecture and Deep Neural Network. Interdiscip Sci 2024; 16:418-438. [PMID: 38733474 DOI: 10.1007/s12539-024-00619-w] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/18/2023] [Revised: 02/02/2024] [Accepted: 02/03/2024] [Indexed: 05/13/2024]
Abstract
Accumulating studies have demonstrated close relationships between long non-coding RNAs (lncRNAs) and diseases. Identification of new lncRNA-disease associations (LDAs) enables us to better understand disease mechanisms and further provides promising insights into cancer targeted therapy and anti-cancer drug design. Here, we present an LDA prediction framework called GEnDDn based on deep learning. GEnDDn mainly comprises two steps: First, features of both lncRNAs and diseases are extracted by combining similarity computation, non-negative matrix factorization, and graph attention auto-encoder, respectively. And each lncRNA-disease pair (LDP) is depicted as a vector based on concatenation operation on the extracted features. Subsequently, unknown LDPs are classified by aggregating dual-net neural architecture and deep neural network. Using six different evaluation metrics, we found that GEnDDn surpassed four competing LDA identification methods (SDLDA, LDNFSGB, IPCARF, LDASR) on the lncRNADisease and MNDR databases under fivefold cross-validation experiments on lncRNAs, diseases, LDPs, and independent lncRNAs and independent diseases, respectively. Ablation experiments further validated the powerful LDA prediction performance of GEnDDn. Furthermore, we utilized GEnDDn to find underlying lncRNAs for lung cancer and breast cancer. The results elucidated that there may be dense linkages between IFNG-AS1 and lung cancer as well as between HIF1A-AS1 and breast cancer. The results require further biomedical experimental verification. GEnDDn is publicly available at https://github.com/plhhnu/GEnDDn.
Collapse
Affiliation(s)
- Lihong Peng
- College of Life Science and Chemistry, Hunan University of Technology, Zhuzhou, 412007, China
| | - Mengnan Ren
- College of Life Science and Chemistry, Hunan University of Technology, Zhuzhou, 412007, China
| | - Liangliang Huang
- College of Life Science and Chemistry, Hunan University of Technology, Zhuzhou, 412007, China
| | - Min Chen
- School of Computer Science, Hunan Institute of Technology, Hengyang, 421002, China.
| |
Collapse
|
4
|
Xu P, Li C, Yuan J, Bao Z, Liu W. Predict lncRNA-drug associations based on graph neural network. Front Genet 2024; 15:1388015. [PMID: 38737125 PMCID: PMC11082279 DOI: 10.3389/fgene.2024.1388015] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/19/2024] [Accepted: 04/05/2024] [Indexed: 05/14/2024] Open
Abstract
LncRNAs are an essential type of non-coding RNAs, which have been reported to be involved in various human pathological conditions. Increasing evidence suggests that drugs can regulate lncRNAs expression, which makes it possible to develop lncRNAs as therapeutic targets. Thus, developing in-silico methods to predict lncRNA-drug associations (LDAs) is a critical step for developing lncRNA-based therapies. In this study, we predict LDAs by using graph convolutional networks (GCN) and graph attention networks (GAT) based on lncRNA and drug similarity networks. Results show that our proposed method achieves good performance (average AUCs > 0.92) on five datasets. In addition, case studies and KEGG functional enrichment analysis further prove that the model can effectively identify novel LDAs. On the whole, this study provides a deep learning-based framework for predicting novel LDAs, which will accelerate the lncRNA-targeted drug development process.
Collapse
Affiliation(s)
- Peng Xu
- Institute of Computational Science and Technology, Guangzhou University, Guangzhou, China
- School of Computer Science of Information Technology, Qiannan Normal University for Nationalities, Duyun, China
| | - Chuchu Li
- Institute of Computational Science and Technology, Guangzhou University, Guangzhou, China
| | - Jiaqi Yuan
- Institute of Computational Science and Technology, Guangzhou University, Guangzhou, China
| | - Zhenshen Bao
- College of Information Engineering, Taizhou University, Taizhou, Jiangsu, China
| | - Wenbin Liu
- Institute of Computational Science and Technology, Guangzhou University, Guangzhou, China
- Guangdong Provincial Key Laboratory of Artificial Intelligence in Medical Image Analysis and Application, Guangzhou, Guangdong, China
| |
Collapse
|
5
|
Yao D, Deng Y, Zhan X, Zhan X. Predicting lncRNA-disease associations using multiple metapaths in hierarchical graph attention networks. BMC Bioinformatics 2024; 25:46. [PMID: 38287236 PMCID: PMC11271052 DOI: 10.1186/s12859-024-05672-2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/09/2023] [Accepted: 01/23/2024] [Indexed: 01/31/2024] Open
Abstract
BACKGROUND Many biological studies have shown that lncRNAs regulate the expression of epigenetically related genes. The study of lncRNAs has helped to deepen our understanding of the pathogenesis of complex diseases at the molecular level. Due to the large number of lncRNAs and the complex and time-consuming nature of biological experiments, applying computer techniques to predict potential lncRNA-disease associations is very effective. To explore information between complex network structures, existing methods rely mainly on lncRNA and disease information. Metapaths have been applied to network models as an effective method for exploring information in heterogeneous graphs. However, existing methods are dominated by lncRNAs or disease nodes and tend to ignore the paths provided by intermediate nodes. METHODS We propose a deep learning model based on hierarchical graphical attention networks to predict unknown lncRNA-disease associations using multiple types of metapaths to extract features. We have named this model the MMHGAN. First, the model constructs a lncRNA-disease-miRNA heterogeneous graph based on known associations and two homogeneous graphs of lncRNAs and diseases. Second, for homogeneous graphs, the features of neighboring nodes are aggregated using a multihead attention mechanism. Third, for the heterogeneous graph, metapaths of different intermediate nodes are selected to construct subgraphs, and the importance of different types of metapaths is calculated and aggregated to obtain the final embedded features. Finally, the features are reconstructed using a fully connected layer to obtain the prediction results. RESULTS We used a fivefold cross-validation method and obtained an average AUC value of 96.07% and an average AUPR value of 93.23%. Additionally, ablation experiments demonstrated the role of homogeneous graphs and different intermediate node path weights. In addition, we studied lung cancer, esophageal carcinoma, and breast cancer. Among the 15 lncRNAs associated with these diseases, 15, 12, and 14 lncRNAs were validated by the lncRNA Disease Database and the Lnc2Cancer Database, respectively. CONCLUSION We compared the MMHGAN model with six existing models with better performance, and the case study demonstrated that the model was effective in predicting the correlation between potential lncRNAs and diseases.
Collapse
Affiliation(s)
- Dengju Yao
- School of Computer Science and Technology, Harbin University of Science and Technology, Harbin, 150080, China.
| | - Yuexiao Deng
- School of Computer Science and Technology, Harbin University of Science and Technology, Harbin, 150080, China
| | - Xiaojuan Zhan
- School of Computer Science and Technology, Harbin University of Science and Technology, Harbin, 150080, China
- College of Computer Science and Technology, Heilongjiang Institute of Technology, Harbin, 150050, China
| | - Xiaorong Zhan
- Department of Endocrinology and Metabolism, Hospital of South, University of Science and Technology, Shenzhen, 518055, China
| |
Collapse
|
6
|
Li G, Bai P, Liang C, Luo J. Node-adaptive graph Transformer with structural encoding for accurate and robust lncRNA-disease association prediction. BMC Genomics 2024; 25:73. [PMID: 38233788 PMCID: PMC10795365 DOI: 10.1186/s12864-024-09998-2] [Citation(s) in RCA: 12] [Impact Index Per Article: 12.0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/31/2023] [Accepted: 01/09/2024] [Indexed: 01/19/2024] Open
Abstract
BACKGROUND Long noncoding RNAs (lncRNAs) are integral to a plethora of critical cellular biological processes, including the regulation of gene expression, cell differentiation, and the development of tumors and cancers. Predicting the relationships between lncRNAs and diseases can contribute to a better understanding of the pathogenic mechanisms of disease and provide strong support for the development of advanced treatment methods. RESULTS Therefore, we present an innovative Node-Adaptive Graph Transformer model for predicting unknown LncRNA-Disease Associations, named NAGTLDA. First, we utilize the node-adaptive feature smoothing (NAFS) method to learn the local feature information of nodes and encode the structural information of the fusion similarity network of diseases and lncRNAs using Structural Deep Network Embedding (SDNE). Next, the Transformer module is used to capture potential association information between the network nodes. Finally, we employ a Transformer module with two multi-headed attention layers for learning global-level embedding fusion. Network structure coding is added as the structural inductive bias of the network to compensate for the missing message-passing mechanism in Transformer. NAGTLDA achieved an average AUC of 0.9531 and AUPR of 0.9537 significantly higher than state-of-the-art methods in 5-fold cross validation. We perform case studies on 4 diseases; 55 out of 60 associations between lncRNAs and diseases have been validated in the literatures. The results demonstrate the enormous potential of the graph Transformer structure to incorporate graph structural information for uncovering lncRNA-disease unknown correlations. CONCLUSIONS Our proposed NAGTLDA model can serve as a highly efficient computational method for predicting biological information associations.
Collapse
Affiliation(s)
- Guanghui Li
- School of Information Engineering, East China Jiaotong University, Nanchang, China.
| | - Peihao Bai
- School of Information Engineering, East China Jiaotong University, Nanchang, China
| | - Cheng Liang
- School of Information Science and Engineering, Shandong Normal University, Jinan, China
| | - Jiawei Luo
- College of Computer Science and Electronic Engineering, Hunan University, Changsha, China.
| |
Collapse
|
7
|
Yao D, Li B, Zhan X, Zhan X, Yu L. GCNFORMER: graph convolutional network and transformer for predicting lncRNA-disease associations. BMC Bioinformatics 2024; 25:5. [PMID: 38166659 PMCID: PMC10763317 DOI: 10.1186/s12859-023-05625-1] [Citation(s) in RCA: 5] [Impact Index Per Article: 5.0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/22/2023] [Accepted: 12/18/2023] [Indexed: 01/05/2024] Open
Abstract
BACKGROUND A growing body of researches indicate that the disrupted expression of long non-coding RNA (lncRNA) is linked to a range of human disorders. Therefore, the effective prediction of lncRNA-disease association (LDA) can not only suggest solutions to diagnose a condition but also save significant time and labor costs. METHOD In this work, we proposed a novel LDA predicting algorithm based on graph convolutional network and transformer, named GCNFORMER. Firstly, we integrated the intraclass similarity and interclass connections between miRNAs, lncRNAs and diseases, and built a graph adjacency matrix. Secondly, to completely obtain the features between various nodes, we employed a graph convolutional network for feature extraction. Finally, to obtain the global dependencies between inputs and outputs, we used a transformer encoder with a multiheaded attention mechanism to forecast lncRNA-disease associations. RESULTS The results of fivefold cross-validation experiment on the public dataset revealed that the AUC and AUPR of GCNFORMER achieved 0.9739 and 0.9812, respectively. We compared GCNFORMER with six advanced LDA prediction models, and the results indicated its superiority over the other six models. Furthermore, GCNFORMER's effectiveness in predicting potential LDAs is underscored by case studies on breast cancer, colon cancer and lung cancer. CONCLUSIONS The combination of graph convolutional network and transformer can effectively improve the performance of LDA prediction model and promote the in-depth development of this research filed.
Collapse
Affiliation(s)
- Dengju Yao
- School of Computer Science and Technology, Harbin University of Science and Technology, Harbin, 150080, China.
| | - Bailin Li
- School of Computer Science and Technology, Harbin University of Science and Technology, Harbin, 150080, China
| | - Xiaojuan Zhan
- School of Computer Science and Technology, Harbin University of Science and Technology, Harbin, 150080, China
- College of Computer Science and Technology, Heilongjiang Institute of Technology, Harbin, 150050, China
| | - Xiaorong Zhan
- Department of Endocrinology and Metabolism, Hospital of South, University of Science and Technology, Shenzhen, 518055, China
| | - Liyang Yu
- School of Computer Science and Technology, Harbin University of Science and Technology, Harbin, 150080, China
| |
Collapse
|
8
|
Peng L, Huang L, Su Q, Tian G, Chen M, Han G. LDA-VGHB: identifying potential lncRNA-disease associations with singular value decomposition, variational graph auto-encoder and heterogeneous Newton boosting machine. Brief Bioinform 2023; 25:bbad466. [PMID: 38127089 PMCID: PMC10734633 DOI: 10.1093/bib/bbad466] [Citation(s) in RCA: 17] [Impact Index Per Article: 8.5] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/05/2023] [Revised: 10/05/2023] [Accepted: 11/25/2023] [Indexed: 12/23/2023] Open
Abstract
Long noncoding RNAs (lncRNAs) participate in various biological processes and have close linkages with diseases. In vivo and in vitro experiments have validated many associations between lncRNAs and diseases. However, biological experiments are time-consuming and expensive. Here, we introduce LDA-VGHB, an lncRNA-disease association (LDA) identification framework, by incorporating feature extraction based on singular value decomposition and variational graph autoencoder and LDA classification based on heterogeneous Newton boosting machine. LDA-VGHB was compared with four classical LDA prediction methods (i.e. SDLDA, LDNFSGB, IPCARF and LDASR) and four popular boosting models (XGBoost, AdaBoost, CatBoost and LightGBM) under 5-fold cross-validations on lncRNAs, diseases, lncRNA-disease pairs and independent lncRNAs and independent diseases, respectively. It greatly outperformed the other methods with its prominent performance under four different cross-validations on the lncRNADisease and MNDR databases. We further investigated potential lncRNAs for lung cancer, breast cancer, colorectal cancer and kidney neoplasms and inferred the top 20 lncRNAs associated with them among all their unobserved lncRNAs. The results showed that most of the predicted top 20 lncRNAs have been verified by biomedical experiments provided by the Lnc2Cancer 3.0, lncRNADisease v2.0 and RNADisease databases as well as publications. We found that HAR1A, KCNQ1DN, ZFAT-AS1 and HAR1B could associate with lung cancer, breast cancer, colorectal cancer and kidney neoplasms, respectively. The results need further biological experimental validation. We foresee that LDA-VGHB was capable of identifying possible lncRNAs for complex diseases. LDA-VGHB is publicly available at https://github.com/plhhnu/LDA-VGHB.
Collapse
Affiliation(s)
- Lihong Peng
- School of Computer Science, Hunan University of Technology, 412007, Hunan, China
- College of Life Sciences and Chemistry, Hunan University of Technology, 412007, Hunan, China
| | - Liangliang Huang
- School of Computer Science, Hunan University of Technology, 412007, Hunan, China
| | - Qiongli Su
- Department of Pharmacy, the Affiliated Zhuzhou Hospital Xiangya Medical College CSU, 412007, Hunan, China
| | - Geng Tian
- Geneis (Beijing) Co. Ltd, China, 100102, Beijing, China
| | - Min Chen
- School of Computer Science, Hunan Institute of Technology, 421002, No. 18 Henghua Road, Zhuhui District, Hengyang, Hunan, China
| | - Guosheng Han
- School of Mathematics and Computational Science, Xiangtan University, 411105, Yuhu District, Xiangtan, Hunan, China
- Hunan Key Laboratory for Computation and Simulation in Science and Engineering, Xiangtan University, 411105, Yuhu District, Xiangtan, Hunan, China
| |
Collapse
|
9
|
Wang S, Hui C, Zhang T, Wu P, Nakaguchi T, Xuan P. Graph Reasoning Method Based on Affinity Identification and Representation Decoupling for Predicting lncRNA-Disease Associations. J Chem Inf Model 2023; 63:6947-6958. [PMID: 37906529 DOI: 10.1021/acs.jcim.3c01214] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/02/2023]
Abstract
An increasing number of studies have shown that dysregulation of lncRNAs is related to the occurrence of various diseases. Most of the previous methods, however, are designed based on homogeneity assumption that the representation of a target lncRNA (or disease) node should be updated by aggregating the attributes of its neighbor nodes. However, the assumption ignores the affinity nodes that are far from the target node. We present a novel prediction method, GAIRD, to fully leverage the heterogeneous information in the network and the decoupled node features. The first major innovation is a random walk strategy based on width-first searching and depth-first searching. Different from previous methods that only focus on homogeneous information, our new strategy learns both the homogeneous information within local neighborhoods and the heterogeneous information within higher-order neighborhoods. The second innovation is a representation decoupling module to extract the purer attributes and the purer topologies. Third, a module based on group convolution and deep separable convolution is developed to promote the pairwise intrachannel and interchannel feature learning. The experimental results show that GAIRD outperforms comparing state-of-the-art methods, and the ablation studies prove the contributions of major innovations. We also performed case studies on 3 diseases to further demonstrate the effectiveness of the GAIRD model in applications.
Collapse
Affiliation(s)
- Shuai Wang
- School of Information Science and Engineering, Yanshan University, Qinhuangdao 066004, China
| | - Cui Hui
- Department of Computer Science and Information Technology, La Trobe University, Melbourne 3083, Australia
| | - Tiangang Zhang
- School of Mathematical Science, Heilongjiang University, Harbin 150080, China
| | - Peiliang Wu
- School of Information Science and Engineering, Yanshan University, Qinhuangdao 066004, China
- Key Laboratory for Computer Virtual Technology and System Integration of Hebei Province, Qinhuangdao 066004, China
| | - Toshiya Nakaguchi
- Center for Frontier Medical Engineering, Chiba University, Chiba 2638522, Japan
| | - Ping Xuan
- Department of Computer Science, School of Engineering, Shantou University, Shantou 515063, China
| |
Collapse
|
10
|
Lu Z, Zhong H, Tang L, Luo J, Zhou W, Liu L. Predicting lncRNA-disease associations based on heterogeneous graph convolutional generative adversarial network. PLoS Comput Biol 2023; 19:e1011634. [PMID: 38019786 PMCID: PMC10686445 DOI: 10.1371/journal.pcbi.1011634] [Citation(s) in RCA: 7] [Impact Index Per Article: 3.5] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/18/2023] [Accepted: 10/25/2023] [Indexed: 12/01/2023] Open
Abstract
There is a growing body of evidence indicating the crucial roles that long non-coding RNAs (lncRNAs) play in the development and progression of various diseases, including cancers, cardiovascular diseases, and neurological disorders. However, accurately predicting potential lncRNA-disease associations remains a challenge, as existing methods have limitations in extracting heterogeneous association information and handling sparse and unbalanced data. To address these issues, we propose a novel computational method, called HGC-GAN, which combines heterogeneous graph convolutional neural networks (GCN) and generative adversarial networks (GAN) to predict potential lncRNA-disease associations. Specifically, we construct a lncRNA-miRNA-disease heterogeneous network by integrating multiple association data and sequence information. The GCN-based generator is then employed to aggregate neighbor information of nodes and obtain node embeddings, which are used to predict lncRNA-disease associations. Meanwhile, the GAN-based discriminator is trained to distinguish between real and fake lncRNA-disease associations generated by the generator, enabling the generator to improve its ability to generate accurate lncRNA-disease associations gradually. Our experimental results demonstrate that HGC-GAN performs better in predicting potential lncRNA-disease associations, with AUC and AUPR values of 0.9591 and 0.9606, respectively, under 10-fold cross-validation. Moreover, our case study further confirms the effectiveness of HGC-GAN in predicting potential lncRNA-disease associations, even for novel lncRNAs without any known lncRNA-disease associations. Overall, our proposed method HGC-GAN provides a promising approach to predict potential lncRNA-disease associations and may have important implications for disease diagnosis, treatment, and drug development.
Collapse
Affiliation(s)
- Zhonghao Lu
- School of Information, Yunnan Normal University, Yunnan, People’s Republic of China
| | - Hua Zhong
- School of Information, Yunnan Normal University, Yunnan, People’s Republic of China
| | - Lin Tang
- Key Laboratory of Educational Information for Nationalities Ministry of Education, Yunnan Normal University, Yunnan, People’s Republic of China
| | - Jing Luo
- State Key Laboratory for Conservation and Utilization of Bio-resource in Yunnan, School of Life Sciences and School of Ecology and Environment, Yunnan University, Kunming, People’s Republic of China
| | - Wei Zhou
- School of Software, Yunnan University, Kunming, People’s Republic of China
| | - Lin Liu
- School of Information, Yunnan Normal University, Yunnan, People’s Republic of China
| |
Collapse
|
11
|
Cao Y, Xiao J, Sheng N, Qu Y, Wang Z, Sun C, Mu X, Huang Z, Li X. X-LDA: An interpretable and knowledge-informed heterogeneous graph learning framework for LncRNA-disease association prediction. Comput Biol Med 2023; 167:107634. [PMID: 39491920 DOI: 10.1016/j.compbiomed.2023.107634] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/29/2023] [Revised: 10/06/2023] [Accepted: 10/23/2023] [Indexed: 11/05/2024]
Abstract
The identification of disease-related long noncoding RNAs (lncRNAs) is beneficial to unravel the intricacies of gene expression regulation and epigenetic signatures. Computational methods provide a cost-effective means to explore lncRNA-disease associations (LDAs). However, these methods often lack interpretability, leaving their predictions less convincing to biological and medical researchers. We propose an interpretable and knowledge-informed heterogeneous graph learning framework based on graph patch convolution and integrated gradients to predict LDAs and provides intuitive explanations for its predictions, called X-LDA. The heterogeneous graph is the foundation of the predictions of LDAs, we construct the knowledge-informed heterogeneous graph including LDAs drawn from biological experiments, lncRNA similarities rooted in gene sequences, disease similarities constructed based on disease categorizations. To integrate diverse biological premises and facilitate interpretability, we define nine distinct graph patch types, which encapsulate essential topological relationships within lncRNA-disease node pairs. X-LDA is designed to employ parameter sharing and multi-convolution kernels to grasp common and multiple perspectives of the graph patches, respectively. This approach culminates in the fusion of various semantic information into context embeddings. These post-hoc explanations hinge on graph patch features and integrated gradients, shedding light on the underlying factors driving predictions. Cross validation experiment on the dataset curated from databases and literatures demonstrates that the superior performance of X-LDA in comparison to nine state-of-the-art methods of three categories. X-LDA achieves a larger average area under the receiver operating curve 0.9891 (by at least 6.68%), and a larger average area under the precision-recall curve 0.7907 (by at least 23.2%) than competitive methods. The results of our well-designed ablation and interpretability experiments and Kyoto Encyclopedia of Genes and Genomes (KEGG) enrichment analysis demonstrate X-LDA's robustness, learnability, predictability, and interpretability. The applicability of X-LDA is also demonstrated through a case study involving the investigation of associated lncRNAs in prostate cancer, colorectal cancer, and breast cancer.
Collapse
Affiliation(s)
- Yangkun Cao
- School of Artificial Intelligence, Jilin University, Changchun, 130012, China
| | - Jun Xiao
- College of Computer Science and Technology, Jilin University, Changchun, 130012, China
| | - Nan Sheng
- College of Computer Science and Technology, Jilin University, Changchun, 130012, China
| | - Yinwei Qu
- College of Computer Science and Technology, Jilin University, Changchun, 130012, China
| | - Zhihang Wang
- College of Computer Science and Technology, Jilin University, Changchun, 130012, China
| | - Chang Sun
- College of Computer Science, Nankai University, Tianjin, 300071, China
| | - Xuechen Mu
- School of Mathematics, Jilin University, Changchun, 130012, China
| | - Zhenyu Huang
- College of Computer Science and Technology, Jilin University, Changchun, 130012, China.
| | - Xuan Li
- College of Computer Science and Technology, Jilin University, Changchun, 130012, China.
| |
Collapse
|
12
|
Shi M, Li X, Li M, Si Y. Attention-based generative adversarial networks improve prognostic outcome prediction of cancer from multimodal data. Brief Bioinform 2023; 24:bbad329. [PMID: 37756592 DOI: 10.1093/bib/bbad329] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/20/2023] [Revised: 08/20/2023] [Accepted: 08/28/2023] [Indexed: 09/29/2023] Open
Abstract
The prediction of prognostic outcome is critical for the development of efficient cancer therapeutics and potential personalized medicine. However, due to the heterogeneity and diversity of multimodal data of cancer, data integration and feature selection remain a challenge for prognostic outcome prediction. We proposed a deep learning method with generative adversarial network based on sequential channel-spatial attention modules (CSAM-GAN), a multimodal data integration and feature selection approach, for accomplishing prognostic stratification tasks in cancer. Sequential channel-spatial attention modules equipped with an encoder-decoder are applied for the input features of multimodal data to accurately refine selected features. A discriminator network was proposed to make the generator and discriminator learning in an adversarial way to accurately describe the complex heterogeneous information of multiple modal data. We conducted extensive experiments with various feature selection and classification methods and confirmed that the CSAM-GAN via the multilayer deep neural network (DNN) classifier outperformed these baseline methods on two different multimodal data sets with miRNA expression, mRNA expression and histopathological image data: lower-grade glioma and kidney renal clear cell carcinoma. The CSAM-GAN via the multilayer DNN classifier bridges the gap between heterogenous multimodal data and prognostic outcome prediction.
Collapse
Affiliation(s)
- Mingguang Shi
- School of Electrical Engineering and Automation, Hefei University of Technology, Hefei, Anhui 230009, China
| | - Xuefeng Li
- School of Electrical Engineering and Automation, Hefei University of Technology, Hefei, Anhui 230009, China
| | - Mingna Li
- School of Electrical Engineering and Automation, Hefei University of Technology, Hefei, Anhui 230009, China
| | - Yichong Si
- School of Electrical Engineering and Automation, Hefei University of Technology, Hefei, Anhui 230009, China
| |
Collapse
|
13
|
Xuan P, Bai H, Cui H, Zhang X, Nakaguchi T, Zhang T. Specific topology and topological connection sensitivity enhanced graph learning for lncRNA-disease association prediction. Comput Biol Med 2023; 164:107265. [PMID: 37531860 DOI: 10.1016/j.compbiomed.2023.107265] [Citation(s) in RCA: 5] [Impact Index Per Article: 2.5] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/12/2023] [Revised: 06/26/2023] [Accepted: 07/16/2023] [Indexed: 08/04/2023]
Abstract
Predicting disease-related candidate long noncoding RNAs (lncRNAs) is beneficial for exploring disease pathogenesis due to the close relations between lncRNAs and the occurrence and development of human diseases. It is a long-term and challenging task to adequately extract specific and local topologies in individual lncRNA network and individual disease network, and integrate the information of the connection relationships. We propose a new graph learning-based prediction method to encode specific and local topologies from each individual network, neighbor topologies with different connection relationships, and pairwise attributes. We first construct a lncRNA network composed of all the lncRNA nodes and their similarities, and a single disease network that contains all the disease nodes and disease similarities. Then, a network-aware graph convolutional autoencoder is constructed to encode the specific and local topologies of each network. Secondly, a heterogeneous network is established to embed all lncRNA, disease, and miRNA nodes and their various connections. Afterwards, a connection-sensitive graph neural network is designed to deeply integrate the neighbor node attributes and connection characteristics in the heterogeneous network and learn neighbor topological representations. We also construct both connection-level and topology representation-level attention mechanisms to extract informative connections and topological representations. Finally, we build a multi-layer convolutional neural networks with weighted residuals to adaptively complement the detailed features to pairwise attribute encoding. Comprehensive experiments and comparison results demonstrated that NCPred outperforms seven state-of-the-art prediction methods. The ablation studies demonstrated the importance of local topology learning, neighbor topology learning, and pairwise attribute encoding. Case studies on prostate, lung, and breast cancers further revealed NCPred's capacity to screen potential candidate disease-related lncRNAs.
Collapse
Affiliation(s)
- Ping Xuan
- Department of Computer Science, School of Engineering, Shantou University, Shantou, China
| | - Honglei Bai
- School of Computer Science and Technology, Heilongjiang University, Harbin, China
| | - Hui Cui
- Department of Computer Science and Information Technology, La Trobe University, Melbourne, Australia
| | - Xiaowen Zhang
- School of Computer Science and Technology, Heilongjiang University, Harbin, China
| | - Toshiya Nakaguchi
- Center for Frontier Medical Engineering, Chiba University, Chiba, Japan
| | - Tiangang Zhang
- School of Computer Science and Technology, Heilongjiang University, Harbin, China; School of Mathematical Science, Heilongjiang University, Harbin, China.
| |
Collapse
|
14
|
Su Z, Lu H, Wu Y, Li Z, Duan L. Predicting potential lncRNA biomarkers for lung cancer and neuroblastoma based on an ensemble of a deep neural network and LightGBM. Front Genet 2023; 14:1238095. [PMID: 37655066 PMCID: PMC10466784 DOI: 10.3389/fgene.2023.1238095] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/10/2023] [Accepted: 07/19/2023] [Indexed: 09/02/2023] Open
Abstract
Introduction: Lung cancer is one of the most frequent neoplasms worldwide with approximately 2.2 million new cases and 1.8 million deaths each year. The expression levels of programmed death ligand-1 (PDL1) demonstrate a complex association with lung cancer. Neuroblastoma is a high-risk malignant tumor and is mainly involved in childhood patients. Identification of new biomarkers for these two diseases can significantly promote their diagnosis and therapy. However, in vivo experiments to discover potential biomarkers are costly and laborious. Consequently, artificial intelligence technologies, especially machine learning methods, provide a powerful avenue to find new biomarkers for various diseases. Methods: We developed a machine learning-based method named LDAenDL to detect potential long noncoding RNA (lncRNA) biomarkers for lung cancer and neuroblastoma using an ensemble of a deep neural network and LightGBM. LDAenDL first computes the Gaussian kernel similarity and functional similarity of lncRNAs and the Gaussian kernel similarity and semantic similarity of diseases to obtain their similar networks. Next, LDAenDL combines a graph convolutional network, graph attention network, and convolutional neural network to learn the biological features of the lncRNAs and diseases based on their similarity networks. Third, these features are concatenated and fed to an ensemble model composed of a deep neural network and LightGBM to find new lncRNA-disease associations (LDAs). Finally, the proposed LDAenDL method is applied to identify possible lncRNA biomarkers associated with lung cancer and neuroblastoma. Results: The experimental results show that LDAenDL computed the best AUCs of 0.8701, 107 0.8953, and 0.9110 under cross-validation on lncRNAs, diseases, and lncRNA-disease pairs on Dataset 1, respectively, and 0.9490, 0.9157, and 0.9708 on Dataset 2, respectively. Furthermore, AUPRs of 0.8903, 0.9061, and 0.9166 under three cross-validations were obtained on Dataset 1, and 0.9582, 0.9122, and 0.9743 on Dataset 2. The results demonstrate that LDAenDL significantly outperformed the other four classical LDA prediction methods (i.e., SDLDA, LDNFSGB, IPCAF, and LDASR). Case studies demonstrate that CCDC26 and IFNG-AS1 may be new biomarkers of lung cancer, SNHG3 may associate with PDL1 for lung cancer, and HOTAIR and BDNF-AS may be potential biomarkers of neuroblastoma. Conclusion: We hope that the proposed LDAenDL method can help the development of targeted therapies for these two diseases.
Collapse
Affiliation(s)
- Zhenguo Su
- Clinical Lab, Yantai Affiliated Hospital of Binzhou Medical University, Yantai, China
| | - Huihui Lu
- Department of Thoracic Cardiovascular Surgery, Hunan Province Directly Affiliated TCM Hospital, Zhuzhou, China
| | - Yan Wu
- Geneis (Beijing) Co., Ltd., Beijing, China
| | - Zejun Li
- School of Computer Science, Hunan Institute of Technology, Hengyang, China
| | - Lian Duan
- Faculty of Pediatrics, The Chinese PLA General Hospital, Beijing, China
- Department of Pediatric Surgery, The Seventh Medical Center of PLA General Hospital, Beijing, China
- National Engineering Laboratory for Birth Defects Prevention and Control of Key Technology, Beijing, China
- Beijing Key Laboratory of Pediatric Organ Failure, Beijing, China
| |
Collapse
|
15
|
Lu C, Xie M. LDAEXC: LncRNA-Disease Associations Prediction with Deep Autoencoder and XGBoost Classifier. Interdiscip Sci 2023:10.1007/s12539-023-00573-z. [PMID: 37308797 DOI: 10.1007/s12539-023-00573-z] [Citation(s) in RCA: 6] [Impact Index Per Article: 3.0] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/06/2022] [Revised: 05/14/2023] [Accepted: 05/15/2023] [Indexed: 06/14/2023]
Abstract
Numerous scientific evidences have revealed that long non-coding RNAs (lncRNAs) are involved in the progression of human complex diseases and biological life activities. Therefore, identifying novel and potential disease-related lncRNAs is helpful to diagnosis, prognosis and therapy of many human complex diseases. Since traditional laboratory experiments are cost and time-consuming, a great quantity of computer algorithms have been proposed for predicting the relationships between lncRNAs and diseases. However, there are still much room for the improvement. In this paper, we introduce an accurate framework named LDAEXC to infer LncRNA-Disease Associations with deep autoencoder and XGBoost Classifier. LDAEXC utilizes different similarity views of lncRNAs and human diseases to construct features for each data sources. Then, the reduced features are obtained by feeding the constructed feature vectors into a deep autoencoder, and at last an XGBoost classifier is leveraged to calculate the latent lncRNA-disease-associated scores using reduced features. The fivefold cross-validation experiments on four datasets showed that LDAEXC reached AUC scores of 0.9676 ± 0.0043, 0.9449 ± 0.022, 0.9375 ± 0.0331 and 0.9556 ± 0.0134, respectively, significantly higher than other advanced similar computer methods. Extensive experiment results and case studies of two complex diseases (colon and breast cancers) further indicated the practicability and excellent prediction performance of LDAEXC in inferring unknown lncRNA-disease associations. TLDAEXC utilizes disease semantic similarity, lncRNA expression similarity, and Gaussian interaction profile kernel similarity of lncRNAs and diseases for feature construction. The constructed features are fed to a deep autoencoder to extract reduced features, and an XGBoost classifier is used to predict the lncRNA-disease associations based on the reduced features. The fivefold and tenfold cross-validation experiments on a benchmark dataset showed that LDAEXC could achieve AUC scores of 0.9676 and 0.9682, respectively, significantly higher than other state-of-the-art similar methods.
Collapse
Affiliation(s)
- Cuihong Lu
- College of Information Science and Engineering, Hunan Normal University, Changsha, China
| | - Minzhu Xie
- College of Information Science and Engineering, Hunan Normal University, Changsha, China.
| |
Collapse
|
16
|
Zhong H, Luo J, Tang L, Liao S, Lu Z, Lin G, Murphy RW, Liu L. Association filtering and generative adversarial networks for predicting lncRNA-associated disease. BMC Bioinformatics 2023; 24:234. [PMID: 37277721 DOI: 10.1186/s12859-023-05368-z] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/06/2023] [Accepted: 05/29/2023] [Indexed: 06/07/2023] Open
Abstract
BACKGROUND Long non-coding RNA (lncRNA) closely associates with numerous biological processes, and with many diseases. Therefore, lncRNA-disease association prediction helps obtain relevant biological information and understand pathogenesis, and thus better diagnose preventable diseases. RESULTS Herein, we offer the LDAF_GAN method for predicting lncRNA-associated disease based on association filtering and generative adversarial networks. Experimentation used two types of data: lncRNA-disease associated data without lncRNA sequence features, and fused lncRNA sequence features. LDAF_GAN uses a generator and discriminator, and differs from the original GAN by the addition of a filtering operation and negative sampling. Filtering allows the generator output to filter out unassociated diseases before being fed into the discriminator. Thus, the results generated by the model focuses only on lncRNAs associated with disease. Negative sampling takes a portion of disease terms with 0 from the association matrix as negative samples, which are assumed to be unassociated with lncRNA. A regular term is added to the loss function to avoid producing a vector with all values of 1, which can fool the discriminator. Thus, the model requires that generated positive samples are close to 1, and negative samples are close to 0. The model achieved a superior fitting effect; LDAF_GAN had superior performance in predicting fivefold cross-validations on the two datasets with AUC values of 0.9265 and 0.9278, respectively. In the case study, LDAF_GAN predicted disease association for six lncRNAs-H19, MALAT1, XIST, ZFAS1, UCA1, and ZEB1-AS1-and with the top ten predictions of 100%, 80%, 90%, 90%, 100%, and 90%, respectively, which were reported by previous studies. CONCLUSION LDAF_GAN efficiently predicts the potential association of existing lncRNAs and the potential association of new lncRNAs with diseases. The results of fivefold cross-validation, tenfold cross-validation, and case studies suggest that the model has great predictive potential for lncRNA-disease association prediction.
Collapse
Affiliation(s)
- Hua Zhong
- School of Information Science, Yunnan Normal University, Kunming, China
| | - Jing Luo
- State Key Laboratory for Conservation and Utilization of Bio-resource, School of Ecology and Environment and School of Life Sciences, Yunnan University, Kunming, China
| | - Lin Tang
- Key Laboratory of Educational lnformation for Nationalities Ministry of Education, Yunnan University, Kunming, China
| | - Shicheng Liao
- School of Information Science, Yunnan Normal University, Kunming, China
| | - Zhonghao Lu
- School of Information Science, Yunnan Normal University, Kunming, China
| | - Guoliang Lin
- School of Medicine, Yunnan University, Kunming, China
| | - Robert W Murphy
- Reptilia Zoo and Education Centre, 2501 Rutherford Rd., Vaughan, ON, L4K 2N6, Canada
| | - Lin Liu
- School of Information Science, Yunnan Normal University, Kunming, China.
| |
Collapse
|
17
|
Xuan P, Zhao Y, Cui H, Zhan L, Jin Q, Zhang T, Nakaguchi T. Semantic Meta-Path Enhanced Global and Local Topology Learning for lncRNA-Disease Association Prediction. IEEE/ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS 2023; 20:1480-1491. [PMID: 36173783 DOI: 10.1109/tcbb.2022.3209571] [Citation(s) in RCA: 8] [Impact Index Per Article: 4.0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 05/04/2023]
Abstract
Since abnormal expression of long non-coding RNAs (lncRNAs) is associated with various human diseases, identifying disease-related lncRNAs helps reveal the pathogenesis of diseases. Existing methods for lncRNA-disease association prediction mainly focus on multi-sourced data related to lncRNAs and diseases. The rich semantic information of meta-paths, composed of multiple kinds of connections between lncRNA and disease nodes, is neglected. We propose a new prediction method, MGLDA, to encode and integrate the semantics of multiple meta-paths, the global topology of heterogeneous graph, and pairwise attributes of lncRNA and disease nodes. First, a tri-layer heterogeneous graph is constructed to associate multi-sourced data across the lncRNA, disease, and miRNA nodes. Afterwards, we establish multiple meta-paths connecting the lncRNA and disease nodes to derive and denote various semantics. Each meta-path contains its specific semantics formulated by an embedding strategy, and each embedding covers local topology formed by the diverse semantic connections among the lncRNA, disease, and miRNA nodes. We construct multiple graph convolutional autoencoders (GCA) with topology-level attention to learn global and multiple local topologies from the tri-layer graph and each meta-path, respectively. The topology-level attention mechanism can learn the importance of various global and local topologies for adaptive pairwise topology fusion. Finally, a convolutional autoencoder learns the attribute representations of lncRNA-disease pairs, which integrates the learnt detailed and representative pairwise features. Experimental results show that MGLDA outperforms other state-of-the-art prediction methods in comparison and retrieves more real lncRNA-disease associations in the top-ranked candidates. The ablation study also demonstrates the important contributions of the local and global topology learning, and pairwise attribute learning. Case studies on three diseases further demonstrate MGLDA's ability to identify potential disease-related lncRNAs.
Collapse
|
18
|
Recent advances in predicting lncRNA-disease associations based on computational methods. Drug Discov Today 2023; 28:103432. [PMID: 36370992 DOI: 10.1016/j.drudis.2022.103432] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/30/2022] [Revised: 10/19/2022] [Accepted: 11/03/2022] [Indexed: 11/11/2022]
Abstract
Mutations in and dysregulation of long non-coding RNAs (lncRNAs) are closely associated with the development of various human complex diseases, but only a few lncRNAs have been experimentally confirmed to be associated with human diseases. Predicting new potential lncRNA-disease associations (LDAs) will help us to understand the pathogenesis of human diseases and to detect disease markers, as well as in disease diagnosis, prevention and treatment. Computational methods can effectively narrow down the screening scope of biological experiments, thereby reducing the duration and cost of such experiments. In this review, we outline recent advances in computational methods for predicting LDAs, focusing on LDA databases, lncRNA/disease similarity calculations, and advanced computational models. In addition, we analyze the limitations of various computational models and discuss future challenges and directions for development.
Collapse
|
19
|
Sheng N, Huang L, Lu Y, Wang H, Yang L, Gao L, Xie X, Fu Y, Wang Y. Data resources and computational methods for lncRNA-disease association prediction. Comput Biol Med 2023; 153:106527. [PMID: 36610216 DOI: 10.1016/j.compbiomed.2022.106527] [Citation(s) in RCA: 14] [Impact Index Per Article: 7.0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/20/2022] [Revised: 12/08/2022] [Accepted: 12/31/2022] [Indexed: 01/03/2023]
Abstract
Increasing interest has been attracted in deciphering the potential disease pathogenesis through lncRNA-disease association (LDA) prediction, regarding to the diverse functional roles of lncRNAs in genome regulation. Whilst, computational models and algorithms benefit systematic biology research, even facilitate the classical biological experimental procedures. In this review, we introduce representative diseases associated with lncRNAs, such as cancers, cardiovascular diseases, and neurological diseases. Current publicly available resources related to lncRNAs and diseases have also been included. Furthermore, all of the 64 computational methods for LDA prediction have been divided into 5 groups, including machine learning-based methods, network propagation-based methods, matrix factorization- and completion-based methods, deep learning-based methods, and graph neural network-based methods. The common evaluation methods and metrics in LDA prediction have also been discussed. Finally, the challenges and future trends in LDA prediction have been discussed. Recent advances in LDA prediction approaches have been summarized in the GitHub repository at https://github.com/sheng-n/lncRNA-disease-methods.
Collapse
Affiliation(s)
- Nan Sheng
- Key Laboratory of Symbol Computation and Knowledge Engineering of Ministry of Education, College of Computer Science and Technology, Jilin University, Changchun, China
| | - Lan Huang
- Key Laboratory of Symbol Computation and Knowledge Engineering of Ministry of Education, College of Computer Science and Technology, Jilin University, Changchun, China.
| | - Yuting Lu
- School of Artificial Intelligence, Jilin University, Changchun, China
| | - Hao Wang
- Department of Hepatopancreatobiliary Surgery, Second Affiliated Hospital of Harbin Medical University, Harbin, China
| | - Lili Yang
- Key Laboratory of Symbol Computation and Knowledge Engineering of Ministry of Education, College of Computer Science and Technology, Jilin University, Changchun, China; Department of Obstetrics, The First Hospital of Jilin University, Changchun, China
| | - Ling Gao
- Key Laboratory of Symbol Computation and Knowledge Engineering of Ministry of Education, College of Computer Science and Technology, Jilin University, Changchun, China
| | - Xuping Xie
- Key Laboratory of Symbol Computation and Knowledge Engineering of Ministry of Education, College of Computer Science and Technology, Jilin University, Changchun, China
| | - Yuan Fu
- Institute of Biological, Environmental and Rural Sciences, Aberystwyth University, Aberystwyth, Ceredigion, United Kingdom
| | - Yan Wang
- Key Laboratory of Symbol Computation and Knowledge Engineering of Ministry of Education, College of Computer Science and Technology, Jilin University, Changchun, China; School of Artificial Intelligence, Jilin University, Changchun, China.
| |
Collapse
|
20
|
Zhang Z, Xu J, Wu Y, Liu N, Wang Y, Liang Y. CapsNet-LDA: predicting lncRNA-disease associations using attention mechanism and capsule network based on multi-view data. Brief Bioinform 2023; 24:6889447. [PMID: 36511221 DOI: 10.1093/bib/bbac531] [Citation(s) in RCA: 20] [Impact Index Per Article: 10.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/24/2022] [Revised: 10/25/2022] [Accepted: 11/07/2022] [Indexed: 12/15/2022] Open
Abstract
Cumulative studies have shown that many long non-coding RNAs (lncRNAs) are crucial in a number of diseases. Predicting potential lncRNA-disease associations (LDAs) can facilitate disease prevention, diagnosis and treatment. Therefore, it is vital to develop practical computational methods for LDA prediction. In this study, we propose a novel predictor named capsule network (CapsNet)-LDA for LDA prediction. CapsNet-LDA first uses a stacked autoencoder for acquiring the informative low-dimensional representations of the lncRNA-disease pairs under multiple views, then the attention mechanism is leveraged to implement an adaptive allocation of importance weights to them, and they are subsequently processed using a CapsNet-based architecture for predicting LDAs. Different from the conventional convolutional neural networks (CNNs) that have some restrictions with the usage of scalar neurons and pooling operations. the CapsNets use vector neurons instead of scalar neurons that have better robustness for the complex combination of features and they use dynamic routing processes for updating parameters. CapsNet-LDA is superior to other five state-of-the-art models on four benchmark datasets, four perturbed datasets and an independent test set in the comparison experiments, demonstrating that CapsNet-LDA has excellent performance and robustness against perturbation, as well as good generalization ability. The ablation studies verify the effectiveness of some modules of CapsNet-LDA. Moreover, the ability of multi-view data to improve performance is proven. Case studies further indicate that CapsNet-LDA can accurately predict novel LDAs for specific diseases.
Collapse
Affiliation(s)
- Zequn Zhang
- College of Computer and Information Engineering, Jiangxi Agricultural University, Nanchang, 310045 Jiangxi, China
| | - Junlin Xu
- College of Information Science and Engineering, Hunan University, Changsha 410082, Hunan, China
| | - Yanan Wu
- College of Computer and Information Engineering, Jiangxi Agricultural University, Nanchang, 310045 Jiangxi, China
| | - Niannian Liu
- College of Computer and Information Engineering, Jiangxi Agricultural University, Nanchang, 310045 Jiangxi, China
| | - Yinglong Wang
- College of Computer and Information Engineering, Jiangxi Agricultural University, Nanchang, 310045 Jiangxi, China
| | - Ying Liang
- College of Computer and Information Engineering, Jiangxi Agricultural University, Nanchang, 310045 Jiangxi, China
| |
Collapse
|
21
|
Tan J, Li X, Zhang L, Du Z. Recent advances in machine learning methods for predicting LncRNA and disease associations. Front Cell Infect Microbiol 2022; 12:1071972. [PMID: 36530425 PMCID: PMC9748103 DOI: 10.3389/fcimb.2022.1071972] [Citation(s) in RCA: 7] [Impact Index Per Article: 2.3] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/17/2022] [Accepted: 11/11/2022] [Indexed: 12/03/2022] Open
Abstract
Long non-coding RNAs (lncRNAs) are involved in almost the entire cell life cycle through different mechanisms and play an important role in many key biological processes. Mutations and dysregulation of lncRNAs have been implicated in many complex human diseases. Therefore, identifying the relationship between lncRNAs and diseases not only contributes to biologists' understanding of disease mechanisms, but also provides new ideas and solutions for disease diagnosis, treatment, prognosis and prevention. Since the existing experimental methods for predicting lncRNA-disease associations (LDAs) are expensive and time consuming, machine learning methods for predicting lncRNA-disease associations have become increasingly popular among researchers. In this review, we summarize some of the human diseases studied by LDAs prediction models, association and similarity features of LDAs prediction, performance evaluation methods of models and some advanced machine learning prediction models of LDAs. Finally, we discuss the potential limitations of machine learning-based methods for LDAs prediction and provide some ideas for designing new prediction models.
Collapse
|
22
|
Xuan P, Wang S, Cui H, Zhao Y, Zhang T, Wu P. Learning global dependencies and multi-semantics within heterogeneous graph for predicting disease-related lncRNAs. Brief Bioinform 2022; 23:6695267. [DOI: 10.1093/bib/bbac361] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/08/2022] [Revised: 07/18/2022] [Accepted: 08/05/2022] [Indexed: 11/14/2022] Open
Abstract
Abstract
Motivation
Long noncoding RNAs (lncRNAs) play an important role in the occurrence and development of diseases. Predicting disease-related lncRNAs can help to understand the pathogenesis of diseases deeply. The existing methods mainly rely on multi-source data related to lncRNAs and diseases when predicting the associations between lncRNAs and diseases. There are interdependencies among node attributes in a heterogeneous graph composed of all lncRNAs, diseases and micro RNAs. The meta-paths composed of various connections between them also contain rich semantic information. However, the existing methods neglect to integrate attribute information of intermediate nodes in meta-paths.
Results
We propose a novel association prediction model, GSMV, to learn and deeply integrate the global dependencies, semantic information of meta-paths and node-pair multi-view features related to lncRNAs and diseases. We firstly formulate the global representations of the lncRNA and disease nodes by establishing a self-attention mechanism to capture and learn the global dependencies among node attributes. Second, starting from the lncRNA and disease nodes, respectively, multiple meta-pathways are established to reveal different semantic information. Considering that each meta-path contains specific semantics and has multiple meta-path instances which have different contributions to revealing meta-path semantics, we design a graph neural network based module which consists of a meta-path instance encoding strategy and two novel attention mechanisms. The proposed meta-path instance encoding strategy is used to learn the contextual connections between nodes within a meta-path instance. One of the two new attention mechanisms is at the meta-path instance level, which learns rich and informative meta-path instances. The other attention mechanism integrates various semantic information from multiple meta-paths to learn the semantic representation of lncRNA and disease nodes. Finally, a dilated convolution-based learning module with adjustable receptive fields is proposed to learn multi-view features of lncRNA-disease node pairs. The experimental results prove that our method outperforms seven state-of-the-art comparing methods for lncRNA-disease association prediction. Ablation experiments demonstrate the contributions of the proposed global representation learning, semantic information learning, pairwise multi-view feature learning and the meta-path instance encoding strategy. Case studies on three cancers further demonstrate our method’s ability to discover potential disease-related lncRNA candidates.
Contact
zhang@hlju.edu.cn or peiliangwu@ysu.edu.cn
Supplementary information
Supplementary data are available at Briefings in Bioinformatics online.
Collapse
Affiliation(s)
- Ping Xuan
- School of Information Science and Engineering (School of Software), Yanshan University , Qinhuangdao 066004, China
- School of Computer Science and Technology, Heilongjiang University , Harbin 150080, China
| | - Shuai Wang
- School of Information Science and Engineering (School of Software), Yanshan University , Qinhuangdao 066004, China
| | - Hui Cui
- Department of Computer Science and Information Technology, La Trobe University , Melbourne 3083, Australia
| | - Yue Zhao
- School of Computer Science and Technology, Heilongjiang University , Harbin 150080, China
| | - Tiangang Zhang
- School of Mathematical Science, Heilongjiang University , Harbin 150080, China
| | - Peiliang Wu
- School of Information Science and Engineering (School of Software), Yanshan University , Qinhuangdao 066004, China
| |
Collapse
|
23
|
Eptaminitaki GC, Stellas D, Bonavida B, Baritaki S. Long Non-coding RNAs (lncRNAs) signaling in Cancer Chemoresistance: From Prediction to Druggability. Drug Resist Updat 2022; 65:100866. [DOI: 10.1016/j.drup.2022.100866] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/23/2022] [Revised: 09/15/2022] [Accepted: 09/19/2022] [Indexed: 11/03/2022]
|
24
|
Xu L, Li X, Yang Q, Tan L, Liu Q, Liu Y. Application of Bidirectional Generative Adversarial Networks to Predict Potential miRNAs Associated With Diseases. Front Genet 2022; 13:936823. [PMID: 35903359 PMCID: PMC9314862 DOI: 10.3389/fgene.2022.936823] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/05/2022] [Accepted: 06/08/2022] [Indexed: 11/18/2022] Open
Abstract
Substantial evidence has shown that microRNAs are crucial for biological processes within complex human diseases. Identifying the association of miRNA–disease pairs will contribute to accelerating the discovery of potential biomarkers and pathogenesis. Researchers began to focus on constructing computational models to facilitate the progress of disease pathology and clinical medicine by identifying the potential disease-related miRNAs. However, most existing computational methods are expensive, and their use is limited to unobserved relationships for unknown miRNAs (diseases) without association information. In this manuscript, we proposed a creatively semi-supervised model named bidirectional generative adversarial network for miRNA-disease association prediction (BGANMDA). First, we constructed a microRNA similarity network, a disease similarity network, and Gaussian interaction profile kernel similarity based on the known miRNA–disease association and comprehensive similarity of miRNAs (diseases). Next, an integrated similarity feature network with the full underlying relationships of miRNA–disease pairwise was obtained. Then, the similarity feature network was fed into the BGANMDA model to learn advanced traits in latent space. Finally, we ranked an association score list and predicted the associations between miRNA and disease. In our experiment, a five-fold cross validation was applied to estimate BGANMDA’s performance, and an area under the curve (AUC) of 0.9319 and a standard deviation of 0.00021 were obtained. At the same time, in the global and local leave-one-out cross validation (LOOCV), the AUC value and standard deviation of BGANMDA were 0.9116 ± 0.0025 and 0.8928 ± 0.0022, respectively. Furthermore, BGANMDA was employed in three different case studies to validate its prediction capability and accuracy. The experimental results of the case studies showed that 46, 46, and 48 of the top 50 prediction lists had been identified in previous studies.
Collapse
Affiliation(s)
- Long Xu
- School of Computer Science and Technology, Heilongjiang University, Harbin, China
| | - Xiaokun Li
- School of Computer Science and Technology, Heilongjiang University, Harbin, China
- Postdoctoral Program of Heilongjiang Hengxun Technology Co., Ltd., Heilongjiang University, Harbin, China
- *Correspondence: Xiaokun Li, ; Yong Liu,
| | - Qiang Yang
- School of Electronic Engineering, Heilongjiang University, Harbin, China
| | - Long Tan
- School of Computer Science and Technology, Heilongjiang University, Harbin, China
| | - Qingyuan Liu
- Postdoctoral Program of Heilongjiang Hengxun Technology Co., Ltd., Heilongjiang University, Harbin, China
| | - Yong Liu
- School of Computer Science and Technology, Heilongjiang University, Harbin, China
- *Correspondence: Xiaokun Li, ; Yong Liu,
| |
Collapse
|
25
|
Sheng N, Huang L, Wang Y, Zhao J, Xuan P, Gao L, Cao Y. Multi-channel graph attention autoencoders for disease-related lncRNAs prediction. Brief Bioinform 2022; 23:6519791. [PMID: 35108355 DOI: 10.1093/bib/bbab604] [Citation(s) in RCA: 22] [Impact Index Per Article: 7.3] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/05/2021] [Revised: 12/08/2021] [Accepted: 12/27/2021] [Indexed: 12/31/2022] Open
Abstract
MOTIVATION Predicting disease-related long non-coding RNAs (lncRNAs) can be used as the biomarkers for disease diagnosis and treatment. The development of effective computational prediction approaches to predict lncRNA-disease associations (LDAs) can provide insights into the pathogenesis of complex human diseases and reduce experimental costs. However, few of the existing methods use microRNA (miRNA) information and consider the complex relationship between inter-graph and intra-graph in complex-graph for assisting prediction. RESULTS In this paper, the relationships between the same types of nodes and different types of nodes in complex-graph are introduced. We propose a multi-channel graph attention autoencoder model to predict LDAs, called MGATE. First, an lncRNA-miRNA-disease complex-graph is established based on the similarity and correlation among lncRNA, miRNA and diseases to integrate the complex association among them. Secondly, in order to fully extract the comprehensive information of the nodes, we use graph autoencoder networks to learn multiple representations from complex-graph, inter-graph and intra-graph. Thirdly, a graph-level attention mechanism integration module is adopted to adaptively merge the three representations, and a combined training strategy is performed to optimize the whole model to ensure the complementary and consistency among the multi-graph embedding representations. Finally, multiple classifiers are explored, and Random Forest is used to predict the association score between lncRNA and disease. Experimental results on the public dataset show that the area under receiver operating characteristic curve and area under precision-recall curve of MGATE are 0.964 and 0.413, respectively. MGATE performance significantly outperformed seven state-of-the-art methods. Furthermore, the case studies of three cancers further demonstrate the ability of MGATE to identify potential disease-correlated candidate lncRNAs. The source code and supplementary data are available at https://github.com/sheng-n/MGATE. CONTACT huanglan@jlu.edu.cn, wy6868@jlu.edu.cn.
Collapse
Affiliation(s)
- Nan Sheng
- Key laboratory of Symbol Computation and Knowledge Engineering of Ministry of Education, College of Computer Science and Technology, Jilin University, Changchun 130012, China
| | - Lan Huang
- Key laboratory of Symbol Computation and Knowledge Engineering of Ministry of Education, College of Computer Science and Technology, Jilin University, Changchun 130012, China
| | - Yan Wang
- Key laboratory of Symbol Computation and Knowledge Engineering of Ministry of Education, College of Computer Science and Technology, Jilin University, Changchun 130012, China.,School of Artificial Intelligence, Jilin University, Changchun 130012, China
| | - Jing Zhao
- Department of Biomedical Informatics, College of Medicine, The Ohio State University, Columbus OH 43210, USA
| | - Ping Xuan
- School of Computer Science and Technology, Heilongjiang University, Harbin 150080, China
| | - Ling Gao
- School of Computer Science and Technology, Heilongjiang University, Harbin 150080, China
| | - Yangkun Cao
- School of Artificial Intelligence, Jilin University, Changchun 130012, China
| |
Collapse
|
26
|
Wang L, Zhong C. gGATLDA: lncRNA-disease association prediction based on graph-level graph attention network. BMC Bioinformatics 2022; 23:11. [PMID: 34983363 PMCID: PMC8729153 DOI: 10.1186/s12859-021-04548-z] [Citation(s) in RCA: 21] [Impact Index Per Article: 7.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/18/2021] [Accepted: 12/21/2021] [Indexed: 01/20/2023] Open
Abstract
Background Long non-coding RNAs (lncRNAs) are related to human diseases by regulating gene expression. Identifying lncRNA-disease associations (LDAs) will contribute to diagnose, treatment, and prognosis of diseases. However, the identification of LDAs by the biological experiments is time-consuming, costly and inefficient. Therefore, the development of efficient and high-accuracy computational methods for predicting LDAs is of great significance. Results In this paper, we propose a novel computational method (gGATLDA) to predict LDAs based on graph-level graph attention network. Firstly, we extract the enclosing subgraphs of each lncRNA-disease pair. Secondly, we construct the feature vectors by integrating lncRNA similarity and disease similarity as node attributes in subgraphs. Finally, we train a graph neural network (GNN) model by feeding the subgraphs and feature vectors to it, and use the trained GNN model to predict lncRNA-disease potential association scores. The experimental results show that our method can achieve higher area under the receiver operation characteristic curve (AUC), area under the precision recall curve (AUPR), accuracy and F1-Score than the state-of-the-art methods in five fold cross-validation. Case studies show that our method can effectively identify lncRNAs associated with breast cancer, gastric cancer, prostate cancer, and renal cancer. Conclusion The experimental results indicate that our method is a useful approach for predicting potential LDAs.
Collapse
Affiliation(s)
- Li Wang
- School of Computer Science and Engineering, South China University of Technology, Guangzhou, China.,School of Computer, Electronics and Information, Guangxi University, Nanning, China
| | - Cheng Zhong
- School of Computer, Electronics and Information, Guangxi University, Nanning, China. .,Key Laboratory of Parallel and Distributed Computing in Guangxi Colleges and Universities, Guangxi University, Nanning, China.
| |
Collapse
|