1
Rinaldi S, Moroni E, Rozza R, Magistrato A. Frontiers and Challenges of Computing ncRNAs Biogenesis, Function and Modulation. J Chem Theory Comput 2024; 20:993-1018. [PMID: 38287883 DOI: 10.1021/acs.jctc.3c01239] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/31/2024]
Abstract
Non-coding RNAs (ncRNAs) are transcribed from non-protein-coding DNA sequences, which constitute 98-99% of the human genome. ncRNAs encompass diverse functional classes, including microRNAs, small interfering RNAs, PIWI-interacting RNAs, small nuclear RNAs, small nucleolar RNAs, and long non-coding RNAs. Because they are critically involved in gene expression and regulation across various biological and physiopathological contexts, such as neuronal disorders, immune responses, cardiovascular diseases, and cancer, ncRNAs are emerging as disease biomarkers and therapeutic targets. In this review, after providing an overview of the role of ncRNAs in cell homeostasis, we illustrate the potential and the challenges of state-of-the-art computational methods used to study ncRNA biogenesis, function, and modulation. These RNAs can be modulated either by targeting them directly with small molecules or by altering their expression through targeting the cellular machinery responsible for their biosynthesis. Drawing on applications, some taken from our own work, we showcase the significance and role of computer simulations in uncovering fundamental facets of ncRNA mechanisms and modulation. This information may lay the groundwork for advancing gene modulation tools and therapeutic strategies that address unmet medical needs.
Affiliation(s)
- Silvia Rinaldi
- National Research Council of Italy (CNR) - Institute of Chemistry of OrganoMetallic Compounds (ICCOM), c/o Area di Ricerca CNR di Firenze Via Madonna del Piano 10, 50019 Sesto Fiorentino, Florence, Italy
- Elisabetta Moroni
- National Research Council of Italy (CNR) - Institute of Chemical Sciences and Technologies (SCITEC), via Mario Bianco 9, 20131 Milano, Italy
- Riccardo Rozza
- National Research Council of Italy (CNR) - Institute of Material Foundry (IOM) c/o International School for Advanced Studies (SISSA), Via Bonomea, 265, 34136 Trieste, Italy
- Alessandra Magistrato
- National Research Council of Italy (CNR) - Institute of Material Foundry (IOM) c/o International School for Advanced Studies (SISSA), Via Bonomea, 265, 34136 Trieste, Italy
2
Li G, Bai P, Liang C, Luo J. Node-adaptive graph Transformer with structural encoding for accurate and robust lncRNA-disease association prediction. BMC Genomics 2024; 25:73. [PMID: 38233788 PMCID: PMC10795365 DOI: 10.1186/s12864-024-09998-2] [Citation(s) in RCA: 12] [Impact Index Per Article: 12.0] [Received: 07/31/2023] [Accepted: 01/09/2024] [Indexed: 01/19/2024]
Abstract
BACKGROUND Long noncoding RNAs (lncRNAs) are integral to a plethora of critical cellular processes, including the regulation of gene expression, cell differentiation, and the development of tumors and cancers. Predicting the relationships between lncRNAs and diseases can contribute to a better understanding of disease pathogenesis and provide strong support for the development of advanced treatments. RESULTS We therefore present NAGTLDA, an innovative Node-Adaptive Graph Transformer model for predicting unknown lncRNA-disease associations. First, we use the node-adaptive feature smoothing (NAFS) method to learn the local feature information of nodes, and we encode the structural information of the fused lncRNA and disease similarity network using Structural Deep Network Embedding (SDNE). Next, a Transformer module is used to capture potential association information between network nodes. Finally, we employ a Transformer module with two multi-head attention layers to learn a global-level fused embedding. Network structure encoding is added as a structural inductive bias to compensate for the message-passing mechanism that the Transformer lacks. In 5-fold cross-validation, NAGTLDA achieved an average AUC of 0.9531 and an AUPR of 0.9537, significantly higher than state-of-the-art methods. In case studies on 4 diseases, 55 of the 60 predicted lncRNA-disease associations have been validated in the literature. The results demonstrate the considerable potential of the graph Transformer architecture to incorporate graph structural information for uncovering unknown lncRNA-disease correlations. CONCLUSIONS The proposed NAGTLDA model can serve as a highly efficient computational method for predicting biological associations.
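NAGTLDA is evaluated with AUC and AUPR, two threshold-free ranking metrics computed from the scores of held-out pairs. The following is a minimal Python sketch of how such metrics can be computed, assuming binary labels (1 = known association) and real-valued prediction scores. AUPR is estimated here as average precision, which is one common estimator; the abstract does not say which estimator the authors used. The function names and toy data are hypothetical.

```python
def roc_auc(labels, scores):
    """ROC AUC as the probability that a randomly chosen positive pair
    outscores a randomly chosen negative pair (ties count as half)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def average_precision(labels, scores):
    """AUPR estimated as average precision: the mean of the precision
    values observed at the rank of each positive pair."""
    n_pos = sum(1 for y in labels if y == 1)
    if n_pos == 0:
        raise ValueError("need at least one positive label")
    # Sort by descending score; tied scores keep their input order.
    ranked = sorted(zip(scores, labels), key=lambda t: -t[0])
    hits, total = 0, 0.0
    for rank, (_, y) in enumerate(ranked, start=1):
        if y == 1:
            hits += 1
            total += hits / rank
    return total / n_pos


# Hypothetical held-out lncRNA-disease pairs: 1 = known association.
labels = [1, 0, 1, 0]
scores = [0.9, 0.8, 0.7, 0.1]
print(roc_auc(labels, scores))  # 0.75
print(average_precision(labels, scores))
```

In a cross-validation setting these functions would be applied to the scores of each held-out fold and the results averaged, which is how a figure such as "average AUC" is typically obtained.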
Affiliation(s)
- Guanghui Li
- School of Information Engineering, East China Jiaotong University, Nanchang, China
- Peihao Bai
- School of Information Engineering, East China Jiaotong University, Nanchang, China
- Cheng Liang
- School of Information Science and Engineering, Shandong Normal University, Jinan, China
- Jiawei Luo
- College of Computer Science and Electronic Engineering, Hunan University, Changsha, China
3
Chen X, Lv Q, Liu Y. A Comprehensive Genome-Wide Analysis of lncRNA Expression Profile during Hepatic Carcinoma Cell Proliferation Promoted by Phospholipase Cγ2. Cytol Genet 2023. [DOI: 10.3103/s0095452723020032] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 04/08/2023]
4
Recent advances in predicting lncRNA-disease associations based on computational methods. Drug Discov Today 2023; 28:103432. [PMID: 36370992 DOI: 10.1016/j.drudis.2022.103432] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Received: 08/30/2022] [Revised: 10/19/2022] [Accepted: 11/03/2022] [Indexed: 11/11/2022]
Abstract
Mutations in and dysregulation of long non-coding RNAs (lncRNAs) are closely associated with the development of various complex human diseases, but only a few lncRNAs have been experimentally confirmed to be associated with human diseases. Predicting new potential lncRNA-disease associations (LDAs) will help us understand the pathogenesis of human diseases, detect disease markers, and support disease diagnosis, prevention and treatment. Computational methods can effectively narrow the screening scope of biological experiments, thereby reducing their duration and cost. In this review, we outline recent advances in computational methods for predicting LDAs, focusing on LDA databases, lncRNA/disease similarity calculations, and advanced computational models. In addition, we analyze the limitations of various computational models and discuss future challenges and directions for development.
5
Du XX, Liu Y, Wang B, Zhang JF. lncRNA-disease association prediction method based on the nearest neighbor matrix completion model. Sci Rep 2022; 12:21653. [PMID: 36522410 PMCID: PMC9755128 DOI: 10.1038/s41598-022-25730-0] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/20/2022] [Accepted: 12/05/2022] [Indexed: 12/23/2022]
Abstract
State-of-the-art medical studies have shown that long noncoding ribonucleic acids (lncRNAs) are closely related to various diseases. However, detecting these relationships at scale through biological experiments is difficult and expensive. To aid screening and improve the efficiency of biological experiments, this study introduced a prediction model for lncRNA-disease associations based on the nearest neighbor concept. The model uses a new similarity algorithm that fuses potential associations. Experimental validation showed that the proposed algorithm outperforms the existing Cosine, Pearson, and Jaccard similarity algorithms. Satisfactory results in a comparative leave-one-out cross-validation test (AUC = 0.96) confirmed its excellent predictive performance. Finally, the model's reliability was confirmed by performing predictions on a new dataset, which yielded AUC = 0.92.
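The abstract benchmarks the proposed similarity against Cosine, Pearson, and Jaccard similarity. The following is a minimal sketch of those three baselines computed on binary lncRNA-disease association profiles; it does not reproduce the paper's own fused similarity or its nearest-neighbor completion step, and the profiles are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity of two association profiles."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return 0.0 if nu == 0 or nv == 0 else dot / (nu * nv)


def pearson(u, v):
    """Pearson correlation: the cosine similarity of mean-centred profiles."""
    mu = sum(u) / len(u)
    mv = sum(v) / len(v)
    return cosine([a - mu for a in u], [b - mv for b in v])


def jaccard(u, v):
    """Jaccard index of two binary profiles: shared associations / all associations."""
    inter = sum(1 for a, b in zip(u, v) if a and b)
    union = sum(1 for a, b in zip(u, v) if a or b)
    return 0.0 if union == 0 else inter / union


# Hypothetical binary disease-association profiles of two lncRNAs
# (one column per disease, 1 = known association).
lnc_a = [1, 1, 0, 1, 0]
lnc_b = [1, 0, 0, 1, 1]
print(cosine(lnc_a, lnc_b), pearson(lnc_a, lnc_b), jaccard(lnc_a, lnc_b))
```

All three measures depend only on observed associations, which is one reason sparse association matrices motivate fused or completion-based similarities such as the one proposed in the paper.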
Affiliation(s)
- Xiao-xin Du
- College of Computer and Control, Qiqihar University, Qiqihar, 161006 China
- Yan Liu
- College of Computer and Control, Qiqihar University, Qiqihar, 161006 China
- Bo Wang
- College of Computer and Control, Qiqihar University, Qiqihar, 161006 China
- Jian-fei Zhang
- College of Computer and Control, Qiqihar University, Qiqihar, 161006 China
6
Zha W, Li S, Xu H, Chen J, Liu K, Li P, Liu K, Yang G, Chen Z, Shi S, Zhou L, You A. Genome-wide identification of long non-coding (lncRNA) in Nilaparvata lugens's adaptability to resistant rice. PeerJ 2022; 10:e13587. [PMID: 35910769 PMCID: PMC9332332 DOI: 10.7717/peerj.13587] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/11/2022] [Accepted: 05/24/2022] [Indexed: 01/22/2023]
Abstract
Background The brown planthopper (BPH), Nilaparvata lugens (Stål), is a highly destructive pest that poses a major threat to rice plants worldwide. Over their long co-evolution, BPH and rice have developed complex feeding and defense strategies. Methods To explore the molecular mechanism of BPH adaptation to resistant rice varieties, the lncRNA expression profiles of two virulent BPH populations were analyzed. RNA-seq was used to obtain lncRNA expression data in TN1 and YHY15. Results In total, 3,112 highly reliable lncRNAs were identified in TN1 and YHY15. Comparison of the expression profiles of TN1 and YHY15 identified 157 differentially expressed lncRNAs and 675 differentially expressed mRNAs. Further analysis of the possible regulatory relationships between the differentially expressed lncRNAs and mRNAs identified three antisense target pairs, nine cis-regulatory target pairs, and 3,972 co-expressed target pairs. Functional enrichment analysis indicated that the arginine and proline metabolism, glutathione metabolism, and carbon metabolism categories may significantly affect BPH adaptability when it is exposed to susceptible and resistant rice varieties. Altogether, these results provide scientific data for studying lncRNA regulation of brown planthopper adaptation to resistant rice, and they may help in developing new control strategies for host defense against BPH and in breeding high-yield rice.
Affiliation(s)
- Wenjun Zha
- Hubei Key Laboratory of Food Crop Germplasm and Genetic Improvement, Food Crops Institute, Hubei Academy of Agricultural Sciences, Wuhan, China
- Sanhe Li
- Hubei Key Laboratory of Food Crop Germplasm and Genetic Improvement, Food Crops Institute, Hubei Academy of Agricultural Sciences, Wuhan, China
- Huashan Xu
- Hubei Key Laboratory of Food Crop Germplasm and Genetic Improvement, Food Crops Institute, Hubei Academy of Agricultural Sciences, Wuhan, China
- Junxiao Chen
- Hubei Key Laboratory of Food Crop Germplasm and Genetic Improvement, Food Crops Institute, Hubei Academy of Agricultural Sciences, Wuhan, China
- Kai Liu
- Hubei Key Laboratory of Food Crop Germplasm and Genetic Improvement, Food Crops Institute, Hubei Academy of Agricultural Sciences, Wuhan, China
- Peide Li
- Hubei Key Laboratory of Food Crop Germplasm and Genetic Improvement, Food Crops Institute, Hubei Academy of Agricultural Sciences, Wuhan, China
- Kai Liu
- Hubei Key Laboratory of Food Crop Germplasm and Genetic Improvement, Food Crops Institute, Hubei Academy of Agricultural Sciences, Wuhan, China
- Guocai Yang
- Hubei Key Laboratory of Food Crop Germplasm and Genetic Improvement, Food Crops Institute, Hubei Academy of Agricultural Sciences, Wuhan, China
- Zhijun Chen
- Hubei Key Laboratory of Food Crop Germplasm and Genetic Improvement, Food Crops Institute, Hubei Academy of Agricultural Sciences, Wuhan, China
- Shaojie Shi
- Hubei Key Laboratory of Food Crop Germplasm and Genetic Improvement, Food Crops Institute, Hubei Academy of Agricultural Sciences, Wuhan, China
- Lei Zhou
- Hubei Key Laboratory of Food Crop Germplasm and Genetic Improvement, Food Crops Institute, Hubei Academy of Agricultural Sciences, Wuhan, China
- Aiqing You
- Hubei Key Laboratory of Food Crop Germplasm and Genetic Improvement, Food Crops Institute, Hubei Academy of Agricultural Sciences, Wuhan, China; Hubei Hongshan Laboratory, Wuhan, Hubei, China
7
Xuan P, Gong Z, Cui H, Li B, Zhang T. Fully connected autoencoder and convolutional neural network with attention-based method for inferring disease-related lncRNAs. Brief Bioinform 2022; 23:6561435. [PMID: 35362511 DOI: 10.1093/bib/bbac089] [Citation(s) in RCA: 10] [Impact Index Per Article: 3.3] [Received: 12/06/2021] [Revised: 01/17/2022] [Accepted: 02/23/2022] [Indexed: 11/14/2022]
Abstract
Because abnormal expression of long noncoding RNAs (lncRNAs) is often closely related to various human diseases, identifying disease-associated lncRNAs helps in exploring complex pathogenesis. Most recent methods concentrate on exploiting multiple kinds of data related to lncRNAs and diseases to predict candidate disease-related lncRNAs. These methods, however, fail to deeply integrate the topological information from meta-paths composed of lncRNA, disease and microRNA (miRNA) nodes. We proposed a new method based on fully connected autoencoders and convolutional neural networks, called ACLDA, for inferring potential disease-related lncRNA candidates. A heterogeneous graph consisting of lncRNA, disease and miRNA nodes was first constructed to integrate the similarities, associations and interactions among them. A fully connected autoencoder-based module was established to extract low-dimensional features of the lncRNA, disease and miRNA nodes in the heterogeneous graph. We designed attention mechanisms at the node-feature level and at the meta-path level to learn more informative features and meta-paths. A module based on convolutional neural networks was constructed to encode the local topologies of lncRNA and disease nodes from multiple meta-path perspectives. Comprehensive experimental results demonstrated that ACLDA outperforms several state-of-the-art prediction methods. Case studies on breast, lung and colon cancers demonstrated that ACLDA is able to discover potential disease-related lncRNAs.
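ACLDA encodes lncRNA and disease topologies from multiple meta-path perspectives on a heterogeneous lncRNA-miRNA-disease graph. To make the meta-path notion concrete, the sketch below counts lncRNA-miRNA-disease meta-path instances with a matrix product over hypothetical toy adjacency matrices. It only illustrates meta-path topology and does not implement ACLDA's autoencoder, attention, or convolutional modules.

```python
def matmul(a, b):
    """Plain matrix product of two list-of-lists matrices."""
    inner = len(b)
    return [[sum(a[i][k] * b[k][j] for k in range(inner)) for j in range(len(b[0]))]
            for i in range(len(a))]


# Hypothetical toy graph: 2 lncRNAs, 3 miRNAs, 2 diseases (1 = edge).
lnc_mirna = [[1, 0, 1],
             [0, 1, 0]]
mirna_disease = [[1, 0],
                 [1, 1],
                 [1, 0]]

# Entry (i, j) counts lncRNA i -> miRNA -> disease j meta-path instances.
paths = matmul(lnc_mirna, mirna_disease)
print(paths)  # [[2, 0], [1, 1]]
```

Longer meta-paths (for example lncRNA-lncRNA-disease through a similarity matrix) follow the same pattern of chained products, which is one way to see why a heterogeneous graph exposes more topology than any single association matrix.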
Affiliation(s)
- Ping Xuan
- School of Computer Science and Technology, Heilongjiang University, Harbin 150080, China
- Zhe Gong
- School of Computer Science and Technology, Heilongjiang University, Harbin 150080, China
- Hui Cui
- Department of Computer Science and Information Technology, La Trobe University, Melbourne 3083, Australia
- Bochong Li
- Center for Frontier Medical Engineering, Chiba University, Chiba 2638522, Japan
- Tiangang Zhang
- School of Mathematical Science, Heilongjiang University, Harbin 150080, China
8
Xie G, Jiang J, Sun Y. LDA-LNSUBRW: lncRNA-Disease Association Prediction Based on Linear Neighborhood Similarity and Unbalanced bi-Random Walk. IEEE/ACM Trans Comput Biol Bioinform 2022; 19:989-997. [PMID: 32870798 DOI: 10.1109/tcbb.2020.3020595] [Citation(s) in RCA: 18] [Impact Index Per Article: 6.0] [Indexed: 06/11/2023]
Abstract
A growing number of experiments show that lncRNAs are involved in many biological processes and that their mutations and dysregulation are associated with many diseases. However, experimentally verifying the relationships between lncRNAs and diseases is time-consuming and laborious. Effective computational methods will contribute to our understanding of the underlying mechanisms of disease and to identifying disease biomarkers. We therefore proposed a method called lncRNA-disease association prediction based on linear neighborhood similarity and unbalanced bi-random walk (LDA-LNSUBRW). Because known lncRNA-disease associations are sparse, a pretreatment step is performed to estimate the interaction likelihood of unknown pairs, which helps predict potential associations. Under leave-one-out cross-validation (LOOCV) and fivefold cross-validation (5-fold CV), LDA-LNSUBRW achieved AUCs of 0.8874 and 0.8632 ± 0.0051, respectively. The experimental results show that the proposed method is superior to five other state-of-the-art methods. In addition, case studies of three diseases (lung cancer, breast cancer, and osteosarcoma) were carried out to illustrate that LDA-LNSUBRW can predict relevant lncRNAs.
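The final step of LDA-LNSUBRW is an unbalanced bi-random walk, in which association scores spread over the lncRNA network and over the disease network for different numbers of steps. The sketch below shows a generic version of that idea under stated assumptions: the lncRNA similarity matrix is row-normalized for the left walk and the disease similarity matrix is column-normalized for the right walk, each step mixes the propagated scores with the known associations through a restart weight `alpha`, and the values of `alpha`, `left_steps` and `right_steps` are hypothetical. The linear neighborhood similarity and the pretreatment step described in the abstract are not shown.

```python
def matmul(a, b):
    """Plain matrix product of two list-of-lists matrices."""
    inner = len(b)
    return [[sum(a[i][k] * b[k][j] for k in range(inner)) for j in range(len(b[0]))]
            for i in range(len(a))]


def transpose(m):
    return [list(col) for col in zip(*m)]


def row_normalize(m):
    """Scale each row to sum to 1; all-zero rows are left unchanged."""
    out = []
    for row in m:
        s = sum(row)
        out.append([x / s for x in row] if s else list(row))
    return out


def unbalanced_bi_random_walk(s_lnc, s_dis, assoc, alpha=0.8, left_steps=2, right_steps=3):
    """Walk left_steps times over the lncRNA network and right_steps times over
    the disease network, averaging whichever walks are active at each step and
    restarting towards the known associations with weight (1 - alpha)."""
    w_left = row_normalize(s_lnc)                  # lncRNA x lncRNA, rows sum to 1
    w_right = transpose(row_normalize(s_dis))      # disease x disease, columns sum to 1
    rows, cols = len(assoc), len(assoc[0])
    scores = [list(row) for row in assoc]
    for t in range(1, max(left_steps, right_steps) + 1):
        walks = []
        if t <= left_steps:
            walks.append(matmul(w_left, scores))   # propagate along lncRNA similarity
        if t <= right_steps:
            walks.append(matmul(scores, w_right))  # propagate along disease similarity
        scores = [[alpha * sum(w[i][j] for w in walks) / len(walks)
                   + (1 - alpha) * assoc[i][j]
                   for j in range(cols)] for i in range(rows)]
    return scores


# Hypothetical toy data: 2 lncRNAs, 2 diseases, one known association.
s_lnc = [[1.0, 0.5], [0.5, 1.0]]
s_dis = [[1.0, 0.2], [0.2, 1.0]]
assoc = [[1, 0], [0, 0]]
print(unbalanced_bi_random_walk(s_lnc, s_dis, assoc))
```

Letting the two walks run for different numbers of steps is the "unbalanced" part: it allows the lncRNA and disease networks, which differ in size and density, to contribute propagation of different depth.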
9
Wang L, Shang M, Dai Q, He PA. Prediction of lncRNA-disease association based on a Laplace normalized random walk with restart algorithm on heterogeneous networks. BMC Bioinformatics 2022; 23:5. [PMID: 34983367 PMCID: PMC8729064 DOI: 10.1186/s12859-021-04538-1] [Citation(s) in RCA: 16] [Impact Index Per Article: 5.3] [Received: 10/08/2021] [Accepted: 12/15/2021] [Indexed: 12/23/2022]
Abstract
BACKGROUND A growing body of evidence shows that long non-coding RNAs (lncRNAs) play important roles in the development and progression of complex human diseases. Predicting human lncRNA-disease associations is therefore a challenging and urgent task in bioinformatics research on complex human diseases. RESULTS In this work, we proposed LRWRHLDA, a universal global network-based computational framework. First, four homogeneous networks were constructed: the lncRNA, disease, gene and miRNA similarity networks. Then, six heterogeneous networks of known lncRNA-disease, lncRNA-gene, lncRNA-miRNA, disease-gene, disease-miRNA, and gene-miRNA associations were used to build a multi-layer network. Finally, a Laplace normalized random walk with restart algorithm was applied to this global network to predict the relationships between lncRNAs and diseases. CONCLUSIONS Ten-fold cross-validation was used to evaluate the performance of LRWRHLDA, which achieved an AUC of 0.98402, higher than the other compared methods. Furthermore, LRWRHLDA can predict lncRNAs related to isolated diseases (and diseases related to isolated lncRNAs). The predictions for colorectal cancer, lung adenocarcinoma, stomach cancer and breast cancer have been verified by other studies. These case studies indicate that the method is effective.
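LRWRHLDA runs a random walk with restart on a Laplace-normalized global network. The following is a minimal sketch on a single toy graph, assuming the symmetric normalization D^-1/2 W D^-1/2 and a hypothetical restart probability of 0.7. In the method itself, W would be the multi-layer network assembled from the four similarity networks and six association networks described in the abstract; that assembly is not shown.

```python
import math


def laplace_normalize(w):
    """Symmetric normalization D^-1/2 W D^-1/2 (isolated nodes get zero rows)."""
    deg = [sum(row) for row in w]
    n = len(w)
    return [[w[i][j] / math.sqrt(deg[i] * deg[j]) if deg[i] > 0 and deg[j] > 0 else 0.0
             for j in range(n)] for i in range(n)]


def random_walk_with_restart(w, seeds, restart=0.7, tol=1e-10, max_iter=1000):
    """Iterate p <- (1 - restart) * M p + restart * p0 until the L1 change is below tol."""
    m = laplace_normalize(w)
    n = len(w)
    p0 = [0.0] * n
    for s in seeds:
        p0[s] = 1.0 / len(seeds)          # restart mass spread evenly over the seeds
    p = p0[:]
    for _ in range(max_iter):
        nxt = [(1 - restart) * sum(m[i][j] * p[j] for j in range(n)) + restart * p0[i]
               for i in range(n)]
        if sum(abs(a - b) for a, b in zip(nxt, p)) < tol:
            return nxt
        p = nxt
    return p


# Hypothetical 4-node path graph 0-1-2-3, seeded at node 0.
path = [[0, 1, 0, 0],
        [1, 0, 1, 0],
        [0, 1, 0, 1],
        [0, 0, 1, 0]]
print(random_walk_with_restart(path, seeds=[0]))
```

Because the eigenvalues of the symmetrically normalized matrix lie in [-1, 1] and the propagated part is damped by (1 - restart), the iteration converges; nodes closer to the seed end up with higher scores, which is how candidate lncRNAs are ranked for a seed disease.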
Affiliation(s)
- Liugen Wang
- School of Science, Zhejiang Sci-Tech University, Hangzhou, 310018, China
- Min Shang
- School of Science, Zhejiang Sci-Tech University, Hangzhou, 310018, China
- Qi Dai
- College of Life Science, Zhejiang Sci-Tech University, Hangzhou, 310018, China
- Ping-An He
- School of Science, Zhejiang Sci-Tech University, Hangzhou, 310018, China
10
Graph convolutional network approach to discovering disease-related circRNA-miRNA-mRNA axes. Methods 2021; 198:45-55. [PMID: 34758394 DOI: 10.1016/j.ymeth.2021.10.006] [Citation(s) in RCA: 9] [Impact Index Per Article: 2.3] [Received: 05/30/2021] [Revised: 10/07/2021] [Accepted: 10/19/2021] [Indexed: 02/05/2023]
Abstract
Non-coding RNAs (ncRNAs) are gaining prominence in biology and medicine because they play major roles in cellular homeostasis; in particular, circRNA-miRNA-mRNA axes are involved in a series of disease-related pathways, such as apoptosis, cell invasion and metastasis. Many computational methods have recently been developed to predict relationships between ncRNAs and diseases, which can alleviate the time-consuming and labor-intensive exploration required by biological experiments. However, these methods handle ncRNAs separately, ignoring the impact of interactions among ncRNAs on diseases. In this paper we present a novel approach to discovering disease-related circRNA-miRNA-mRNA axes from the disease-RNA information network. Using a graph convolutional network, our method learns a representation of each biological entity by propagating and aggregating local neighbor information based on the global structure of the network. The approach is evaluated on real-world datasets, and the results show that it outperforms other state-of-the-art baselines on most metrics.
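The method learns entity representations by propagating and aggregating neighbor information with a graph convolutional network. The following sketch shows one propagation step in the widely used GCN form ReLU(D^-1/2 (A + I) D^-1/2 H W); this formulation is assumed rather than taken from the paper, and the toy network, features and weights are hypothetical.

```python
import math


def matmul(a, b):
    """Plain matrix product of two list-of-lists matrices."""
    inner = len(b)
    return [[sum(a[i][k] * b[k][j] for k in range(inner)) for j in range(len(b[0]))]
            for i in range(len(a))]


def gcn_layer(adj, h, weight):
    """One graph-convolution layer: ReLU(D^-1/2 (A + I) D^-1/2 H W)."""
    n = len(adj)
    a_hat = [[adj[i][j] + (1.0 if i == j else 0.0) for j in range(n)] for i in range(n)]
    deg = [sum(row) for row in a_hat]          # >= 1 because of the self-loops
    norm = [[a_hat[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)] for i in range(n)]
    aggregated = matmul(norm, h)               # mix each node's features with its neighbours'
    transformed = matmul(aggregated, weight)   # learned linear projection
    return [[max(0.0, x) for x in row] for row in transformed]


# Hypothetical toy network: nodes 0 and 1 are linked, node 2 is isolated.
adj = [[0, 1, 0],
       [1, 0, 0],
       [0, 0, 0]]
features = [[1.0, 0.0], [0.0, 1.0], [2.0, 3.0]]
weight = [[1.0, -2.0], [0.0, 1.0]]             # hypothetical learned weights
print(gcn_layer(adj, features, weight))        # [[0.5, 0.0], [0.5, 0.0], [2.0, 0.0]]
```

Stacking such layers lets each circRNA, miRNA or mRNA node draw on progressively larger neighborhoods, which is what allows axis-level patterns to show up in the learned representations.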
11
Ding P, Ouyang W, Luo J, Kwoh CK. Heterogeneous information network and its application to human health and disease. Brief Bioinform 2021; 21:1327-1346. [PMID: 31566212 DOI: 10.1093/bib/bbz091] [Citation(s) in RCA: 15] [Impact Index Per Article: 3.8] [Received: 05/06/2019] [Revised: 06/29/2019] [Accepted: 06/30/2019] [Indexed: 12/11/2022]
Abstract
The molecular components of human cells, with their functional interdependencies, form complicated biological networks. Diseases are mostly caused by perturbations of multiple interacting biomolecules rather than by an abnormality of a single biomolecule. Furthermore, new biological functions and processes could be revealed by discovering novel relationships among biological entities. Hence, more and more biologists focus on studying complex biological systems instead of individual biological components. Heterogeneous information networks (HINs) offer a promising way to systematically explore the complicated, heterogeneous relationships between various molecules underlying apparently distinct phenotypes. In this review, we first present the basic definition of an HIN and describe how a biological system can be considered a complex HIN. Then, we discuss the topological properties of HINs and how they can be applied to detect network motifs and functional modules. Afterwards, methodologies for discovering relationships between diseases and biomolecules are presented. Useful insights into how HINs aid drug development and the exploration of the human interactome are provided. Finally, we analyze the challenges and opportunities for uncovering combinatorial patterns in pharmacogenomics and for cell-type detection based on single-cell genomic data.
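The review discusses using HIN topology to detect network motifs. As a minimal illustration, assuming the triangle as the motif of interest, the sketch below counts triangles in an undirected network and groups them by the node types of their three members. A full motif analysis would also compare these counts against randomized networks to assess significance, which is not shown; the toy network is hypothetical.

```python
from collections import Counter, defaultdict


def typed_triangles(edges, node_type):
    """Count triangles in an undirected network, grouped by the sorted
    node types of their three members."""
    nbrs = defaultdict(set)
    for u, v in edges:
        if u != v:                       # ignore self-loops
            nbrs[u].add(v)
            nbrs[v].add(u)
    counts = Counter()
    for u in nbrs:
        for v in nbrs[u]:
            if u < v:
                # Requiring u < v < w counts each triangle exactly once.
                for w in nbrs[u] & nbrs[v]:
                    if w > v:
                        counts[tuple(sorted(node_type[x] for x in (u, v, w)))] += 1
    return counts


# Hypothetical heterogeneous network with lncRNA, miRNA and disease nodes.
node_type = {"L1": "lncRNA", "L2": "lncRNA", "M1": "miRNA", "D1": "disease"}
edges = [("L1", "M1"), ("M1", "D1"), ("L1", "D1"),
         ("L2", "M1"), ("L2", "D1"), ("L1", "L2")]
print(typed_triangles(edges, node_type))
```

Grouping by node types is what distinguishes motif counting in an HIN from the homogeneous case: a lncRNA-miRNA-disease triangle and a lncRNA-lncRNA-miRNA triangle have the same shape but different biological meaning.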
Affiliation(s)
- Pingjian Ding
- School of Computer Science, University of South China, Hengyang, China
- Wenjue Ouyang
- College of Computer Science and Electronic Engineering, Hunan University, Changsha, China
- Jiawei Luo
- College of Computer Science and Electronic Engineering, Hunan University, Changsha, China
- Chee-Keong Kwoh
- School of Computer Science and Engineering, Nanyang Technological University, Singapore, Singapore
12
Yuan L, Zhao J, Sun T, Shen Z. A machine learning framework that integrates multi-omics data predicts cancer-related LncRNAs. BMC Bioinformatics 2021; 22:332. [PMID: 34134612 PMCID: PMC8210375 DOI: 10.1186/s12859-021-04256-8] [Citation(s) in RCA: 13] [Impact Index Per Article: 3.3] [Received: 01/31/2021] [Accepted: 06/07/2021] [Indexed: 12/28/2022]
Abstract
BACKGROUND LncRNAs (long non-coding RNAs) are non-coding RNA molecules with transcripts longer than 200 nucleotides. LncRNAs have emerged as novel candidate biomarkers for cancer diagnosis and prognosis. However, it is difficult to discover the true association mechanisms between lncRNAs and complex diseases. The unprecedented abundance of multi-omics data and the rapid development of machine learning technology provide an opportunity to design a machine learning framework to study the relationship between lncRNAs and complex diseases. RESULTS In this article, we proposed a new machine learning approach, LGDLDA (LncRNA-Gene-Disease association networks based LncRNA-Disease Association prediction), which predicts disease-related lncRNAs based on multi-omics data, machine learning methods and neural network neighborhood information aggregation. First, LGDLDA calculates similarity matrices for lncRNAs, genes and diseases. The similarity between lncRNAs is calculated from the lncRNA expression profile matrix, the lncRNA-miRNA interaction matrix and the lncRNA-protein interaction matrix. The gene similarity matrix is obtained from the lncRNA-gene association matrix and the gene-disease association matrix, and the disease similarity matrix is obtained from the disease ontology, the disease-miRNA association matrix, and Gaussian interaction profile kernel similarity. Second, LGDLDA integrates the neighborhood information in the similarity matrices using nonlinear feature learning with a neural network. Third, LGDLDA uses the embedded node representations to approximate the observed matrices. Finally, LGDLDA ranks candidate lncRNA-disease pairs and selects potential disease-related lncRNAs. CONCLUSIONS Compared with other lncRNA-disease prediction methods, our proposed method takes more critical information into account and improves performance on cancer-related lncRNA prediction. Experiments on randomly split data show that LGDLDA is more stable than IDHI-MIRW, NCPLDA, LncDisAP and NCPHLDA. Results on different simulated datasets show that LGDLDA can accurately and effectively predict disease-related lncRNAs. Furthermore, we applied the method to three real cancer datasets (gastric cancer, colorectal cancer and breast cancer) to predict potential cancer-related lncRNAs.
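LGDLDA uses Gaussian interaction profile (GIP) kernel similarity as one source of disease similarity. The following is a sketch of the commonly used GIP formulation, assuming binary interaction profiles and a bandwidth parameter `gamma_prime = 1`; using disease-miRNA profiles as the input and the toy values are hypothetical choices for illustration.

```python
import math


def gip_kernel(profiles, gamma_prime=1.0):
    """Gaussian interaction profile kernel:
    K[i][j] = exp(-gamma * ||IP_i - IP_j||^2), with bandwidth
    gamma = gamma_prime / mean(||IP_i||^2) over all profiles."""
    n = len(profiles)
    mean_sq_norm = sum(sum(x * x for x in p) for p in profiles) / n
    if mean_sq_norm == 0:
        raise ValueError("all interaction profiles are empty")
    gamma = gamma_prime / mean_sq_norm
    return [[math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(profiles[i], profiles[j])))
             for j in range(n)] for i in range(n)]


# Hypothetical disease-miRNA interaction profiles (rows = diseases).
profiles = [[1, 0, 1],
            [1, 0, 0],
            [0, 1, 0]]
k = gip_kernel(profiles)
print(k)
```

Normalizing the bandwidth by the average profile size keeps the kernel values comparable across datasets with different association densities, which is why GIP similarity is often used to complement semantic similarity computed from the disease ontology.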
Affiliation(s)
- Lin Yuan
- School of Computer Science and Technology, Qilu University of Technology (Shandong Academy of Sciences), Daxue Road 3501, Jinan, 250353, Shandong, China
- Jing Zhao
- School of Computer Science and Technology, Qilu University of Technology (Shandong Academy of Sciences), Daxue Road 3501, Jinan, 250353, Shandong, China
- Tao Sun
- School of Computer Science and Technology, Qilu University of Technology (Shandong Academy of Sciences), Daxue Road 3501, Jinan, 250353, Shandong, China
- Zhen Shen
- School of Computer and Software, Nanyang Institute of Technology, Changjiang Road 80, Nanyang, 473004, Henan, China
13
Chen Q, Lai D, Lan W, Wu X, Chen B, Liu J, Chen YPP, Wang J. ILDMSF: Inferring Associations Between Long Non-Coding RNA and Disease Based on Multi-Similarity Fusion. IEEE/ACM Trans Comput Biol Bioinform 2021; 18:1106-1112. [PMID: 31443046 DOI: 10.1109/tcbb.2019.2936476] [Citation(s) in RCA: 44] [Impact Index Per Article: 11.0] [Indexed: 06/10/2023]
Abstract
Dysregulation and mutation of long non-coding RNAs (lncRNAs) have been shown to result in a variety of human diseases. Identifying potential disease-related lncRNAs may benefit disease diagnosis, treatment and prognosis. A number of methods have been proposed to predict potential lncRNA-disease relationships. However, most of them may give incorrect results because they rely on a single similarity measure. This article proposes a novel framework (ILDMSF) that fuses multiple lncRNA similarities (measured from lncRNA-related genes and from known lncRNA-disease interactions) and multiple disease similarities (measured from disease semantics and from known lncRNA-disease interactions). A support vector machine is then employed to identify potential lncRNA-disease associations based on the integrated similarities. Leave-one-out cross-validation is performed to compare ILDMSF with other state-of-the-art methods. The experimental results demonstrate that our method is promising for exploring potential correlations between lncRNAs and diseases.
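ILDMSF fuses several lncRNA similarities and several disease similarities before classifying lncRNA-disease pairs with a support vector machine. The abstract does not give the fusion rule, so the sketch below assumes a simple element-wise mean and represents each pair by concatenating the fused similarity row of the lncRNA with that of the disease; both choices and all names are hypothetical. The SVM step itself is omitted; an off-the-shelf implementation such as scikit-learn's `SVC` could be trained on vectors like these, with known associations as positive examples.

```python
def fuse_similarities(*matrices):
    """Element-wise mean of several same-shaped similarity matrices."""
    rows, cols = len(matrices[0]), len(matrices[0][0])
    for m in matrices:
        if len(m) != rows or any(len(r) != cols for r in m):
            raise ValueError("similarity matrices must share one shape")
    return [[sum(m[i][j] for m in matrices) / len(matrices) for j in range(cols)]
            for i in range(rows)]


def pair_features(lnc_sim, dis_sim, i, j):
    """Feature vector for (lncRNA i, disease j): lncRNA i's fused similarity
    row concatenated with disease j's fused similarity row."""
    return list(lnc_sim[i]) + list(dis_sim[j])


# Hypothetical similarity matrices for 2 lncRNAs and 2 diseases.
lnc_by_gene = [[1.0, 0.4], [0.4, 1.0]]
lnc_by_assoc = [[1.0, 0.2], [0.2, 1.0]]
dis_by_semantics = [[1.0, 0.5], [0.5, 1.0]]
dis_by_assoc = [[1.0, 0.3], [0.3, 1.0]]

lnc_sim = fuse_similarities(lnc_by_gene, lnc_by_assoc)
dis_sim = fuse_similarities(dis_by_semantics, dis_by_assoc)
x = pair_features(lnc_sim, dis_sim, 0, 1)   # one input vector for the classifier
print(x)
```

Fusing before classification means that a pair missing evidence in one similarity source can still be supported by another, which is the weakness of single-measure methods that the abstract points out.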
14
Lei X, Mudiyanselage TB, Zhang Y, Bian C, Lan W, Yu N, Pan Y. A comprehensive survey on computational methods of non-coding RNA and disease association prediction. Brief Bioinform 2020; 22:6042241. [PMID: 33341893 DOI: 10.1093/bib/bbaa350] [Citation(s) in RCA: 35] [Impact Index Per Article: 7.0] [Received: 08/28/2020] [Revised: 10/20/2020] [Accepted: 11/01/2020] [Indexed: 02/06/2023]
Abstract
Relationships between non-coding RNAs and diseases have been widely studied in recent years, and a large number of experimental methods and technologies for producing biological data have been developed. However, because of the high labor cost and production time of experiments, computational methods, especially machine learning and deep learning methods, have received a lot of attention and are now commonly used to address these problems. From a computational point of view, this survey mainly introduces three common non-coding RNAs, i.e. miRNAs, lncRNAs and circRNAs, and the related computational methods for predicting their associations with diseases. First, the mainstream databases for these three non-coding RNAs are introduced in detail. Then, we present several methods for RNA similarity and disease similarity calculation. Next, we investigate ncRNA-disease prediction methods in detail and classify them into five types: network propagation, recommender systems, matrix completion, machine learning and deep learning. Furthermore, we summarize the applications of these five types of computational methods to predicting the associations between diseases and miRNAs, lncRNAs and circRNAs, respectively. Finally, the advantages and limitations of the various methods are identified, and future research directions and challenges are discussed.
Affiliation(s)
- Xiujuan Lei
- School of Computer Science, Shaanxi Normal University, Xi'an, China
- Yuchen Zhang
- School of Computer Science, Shaanxi Normal University, Xi'an, China
- Chen Bian
- School of Computer Science, Shaanxi Normal University, Xi'an, China
- Wei Lan
- School of Computer, Electronics and Information at Guangxi University, Nanning, China
- Ning Yu
- Department of Computing Sciences at the College at Brockport, State University of New York, Rochester, NY, USA
- Yi Pan
- Computer Science Department at Georgia State University, Atlanta, GA, USA
15
Zhang J, Jiang Z, Hu X, Song B. A novel graph attention adversarial network for predicting disease-related associations. Methods 2020; 179:81-88. [PMID: 32446956 DOI: 10.1016/j.ymeth.2020.05.010] [Citation(s) in RCA: 14] [Impact Index Per Article: 2.8] [Received: 03/15/2020] [Revised: 05/01/2020] [Accepted: 05/13/2020] [Indexed: 10/24/2022]
Abstract
Identifying complex human diseases at the molecular level is very helpful, especially for disease diagnosis, therapy, prognosis and monitoring. Accumulating evidence demonstrates that RNAs play important roles in identifying various complex human diseases. However, the number of verified disease-related RNAs is still small, and many of the required biological experiments are time-consuming and labor-intensive. Therefore, researchers have instead sought to develop effective computational algorithms to predict associations between diseases and RNAs. In this paper, we propose a novel model called Graph Attention Adversarial Network (GAAN) for potential disease-RNA association prediction. To the best of our knowledge, we are among the first to successfully integrate both state-of-the-art graph convolutional networks (GCNs) and the attention mechanism for the prediction of disease-RNA associations. Compared with other disease-RNA association prediction methods, GAAN is novel in performing its computations from the perspective of the global structure of the disease-RNA network through graph embedding, while integrating features of local neighborhoods through the attention mechanism. Moreover, GAAN uses adversarial regularization to further discover the feature representation distribution of the latent nodes in disease-RNA networks. GAAN also benefits from the efficiency of deep models for computation on large association networks. To evaluate the performance of GAAN, we conduct experiments on networks of diseases associated with two different types of RNA: microRNAs (miRNAs) and long non-coding RNAs (lncRNAs). Comparisons of GAAN with several popular baseline methods on disease-RNA networks show that our model outperforms the others by a wide margin in predicting potential disease-RNA associations.
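GAAN builds on graph convolutional networks (GCNs). The sketch below is not GAAN itself, because it omits the attention and adversarial-regularization components. It only illustrates, under stated assumptions, a single generic GCN propagation step, H' = ReLU(D^-1/2 (A + I) D^-1/2 H W), on a hypothetical four-node disease-RNA graph. The adjacency matrix, node features and weight matrix are made-up toy values, and the pure-Python matrix helpers stand in for a real tensor library.

```python
import math


def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def gcn_layer(adj, features, weights):
    """One generic GCN step: ReLU(D^-1/2 (A + I) D^-1/2 H W)."""
    n = len(adj)
    # Add self-loops so each node keeps part of its own representation.
    a_hat = [[adj[i][j] + (1.0 if i == j else 0.0) for j in range(n)] for i in range(n)]
    degree = [sum(row) for row in a_hat]
    # Symmetric degree normalization of the adjacency matrix.
    norm = [[a_hat[i][j] / math.sqrt(degree[i] * degree[j]) for j in range(n)]
            for i in range(n)]
    # Aggregate neighbor features, apply the linear transform, then ReLU.
    out = matmul(matmul(norm, features), weights)
    return [[max(0.0, v) for v in row] for row in out]


# Hypothetical toy graph: nodes 0-1 are diseases, nodes 2-3 are RNAs.
adj = [[0, 0, 1, 0],
       [0, 0, 1, 1],
       [1, 1, 0, 0],
       [0, 1, 0, 0]]
features = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.5, 0.5]]  # hypothetical
weights = [[1.0, -1.0], [0.5, 1.0]]  # hypothetical; would normally be learned
embeddings = gcn_layer(adj, features, weights)
```

In a trained model W would be learned, and several such layers would typically be stacked. The symmetric normalization keeps high-degree nodes from dominating their neighbors' representations.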
Affiliation(s)
- Jinli Zhang
- Faculty of Information Technology, Beijing University of Technology, Beijing 100124, China.
- Zongli Jiang
- Faculty of Information Technology, Beijing University of Technology, Beijing 100124, China.
- Xiaohua Hu
- College of Computing and Informatics, Drexel University, Philadelphia, PA 19104, USA.
- Bo Song
- College of Computing and Informatics, Drexel University, Philadelphia, PA 19104, USA.
16
Sheng N, Cui H, Zhang T, Xuan P. Attentional multi-level representation encoding based on convolutional and variance autoencoders for lncRNA-disease association prediction. Brief Bioinform 2020; 22:5841901. [PMID: 32444875 DOI: 10.1093/bib/bbaa067] [Citation(s) in RCA: 46] [Impact Index Per Article: 9.2] [Received: 01/21/2020] [Revised: 03/30/2020] [Accepted: 03/31/2020] [Indexed: 01/01/2023]
Abstract
As abnormalities of long non-coding RNAs (lncRNAs) are closely related to various human diseases, identifying disease-related lncRNAs is important for understanding the pathogenesis of complex diseases. Most current data-driven methods for predicting disease-related lncRNA candidates are based only on diseases and lncRNAs. These methods, however, fail to consider the deeply embedded node attributes of lncRNA-disease pairs, which contain multiple relations and representations across lncRNAs, diseases and miRNAs. Moreover, the low-dimensional feature distribution at the pairwise level has not been taken into account. We propose a prediction model, VADLP, to extract, encode and adaptively integrate multi-level representations. First, a triple-layer heterogeneous graph is constructed with weighted inter-layer and intra-layer edges to integrate the similarities and correlations among lncRNAs, diseases and miRNAs. We then define three representations: node attributes, pairwise topology and feature distribution. Node attributes are derived from the graph by an embedding strategy to represent lncRNA-disease associations, which are inferred via their common lncRNAs, diseases and miRNAs. Pairwise topology is formulated by a random walk algorithm and encoded by a convolutional autoencoder to represent the hidden topological structural relations between an lncRNA and a disease. The new feature distribution is modeled by a variance autoencoder to reveal the underlying lncRNA-disease relationship. Finally, an attentional representation-level integration module is constructed to adaptively fuse the three representations for lncRNA-disease association prediction. The proposed model is tested on a public dataset with a comprehensive set of evaluations. Our model outperforms six state-of-the-art lncRNA-disease prediction models with statistical significance. The ablation study showed that all three representations make important contributions.
In particular, the improved recall rates under different top-k values demonstrate that our model is powerful in discovering true disease-related lncRNAs among the top-ranked candidates. Case studies of three cancers further demonstrated the capacity of our model to discover potential disease-related lncRNAs.
17
Zhang Y, Chen M, Li A, Cheng X, Jin H, Liu Y. LDAI-ISPS: LncRNA-Disease Associations Inference Based on Integrated Space Projection Scores. Int J Mol Sci 2020; 21:E1508. [PMID: 32098405 PMCID: PMC7073162 DOI: 10.3390/ijms21041508] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.0] [Received: 12/31/2019] [Revised: 02/18/2020] [Accepted: 02/19/2020] [Indexed: 12/14/2022]
Abstract
Long non-coding RNAs (long ncRNAs, lncRNAs) of all kinds have been implicated in a range of cell developmental processes and diseases, although they are not translated into proteins. Inferring disease-associated lncRNAs by computational methods can help us understand the pathogenesis of diseases, but current computational methods have not yet achieved remarkable predictive performance, owing to problems such as the inaccurate construction of similarity networks and the inadequate number of known lncRNA-disease associations. In this research, we propose lncRNA-disease association inference based on integrated space projection scores (LDAI-ISPS), composed of the following key steps: converting the Boolean network of known lncRNA-disease associations into weighted networks by combining all the global information (e.g., disease semantic similarities, lncRNA functional similarities, and known lncRNA-disease associations); and obtaining space projection scores via vector projections of the weighted networks to form the final prediction scores without biases. Leave-one-out cross-validation (LOOCV) results showed that, compared with other methods, LDAI-ISPS had higher accuracy, with an area under the curve (AUC) of 0.9154 for inferring diseases, 0.8865 for inferring new lncRNAs (whose associations with diseases are unknown), and 0.7518 for inferring isolated diseases (whose associations with lncRNAs are unknown). A case study also confirmed the predictive performance of LDAI-ISPS as an aid to traditional biological experiments in inferring potential lncRNA-disease associations and isolated diseases.
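The abstract does not give the exact LDAI-ISPS formulas, so what follows is only a hypothetical illustration of the underlying operation: the scalar projection a·b/‖b‖ of one vector onto another. Here, as an assumption, each lncRNA's similarity row is projected onto each disease's column of the known-association matrix. Both vectors are indexed by lncRNAs, so an lncRNA with no known associations can still receive scores through its similar lncRNAs. The matrices and the function name `lncrna_space_scores` are illustrative and do not come from the paper.

```python
import math


def scalar_projection(a, b):
    """Signed length of the projection of vector a onto vector b: (a . b) / ||b||."""
    norm_b = math.sqrt(sum(v * v for v in b))
    if norm_b == 0:
        return 0.0  # a disease with no known associations gives no direction
    return sum(x * y for x, y in zip(a, b)) / norm_b


def lncrna_space_scores(lnc_sim, assoc):
    """Hypothetical scoring: project each lncRNA's similarity row onto each
    disease's association column (both vectors are indexed by lncRNAs)."""
    n_lnc, n_dis = len(assoc), len(assoc[0])
    columns = [[assoc[k][j] for k in range(n_lnc)] for j in range(n_dis)]
    return [[scalar_projection(lnc_sim[i], columns[j]) for j in range(n_dis)]
            for i in range(n_lnc)]


# Hypothetical inputs: 3 lncRNAs, 2 diseases.
lnc_sim = [[1.0, 0.6, 0.1],
           [0.6, 1.0, 0.2],
           [0.1, 0.2, 1.0]]
assoc = [[1, 0],   # lncRNA 0 is known to be associated with disease 0
         [0, 0],   # lncRNA 1 has no known associations
         [0, 1]]   # lncRNA 2 is known to be associated with disease 1
scores = lncrna_space_scores(lnc_sim, assoc)
```

Here lncRNA 1, which has no known associations, scores 0.6 for disease 0 and 0.2 for disease 1. This reflects its greater similarity to lncRNA 0.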
Affiliation(s)
- Yi Zhang
- School of Information Science and Engineering, Guilin University of Technology, Guilin 541004, China
- Min Chen
- Hunan Institute of Technology, School of Computer Science and Technology, Hengyang 421002, China
- Ang Li
- Hunan Institute of Technology, School of Computer Science and Technology, Hengyang 421002, China
- Xiaohui Cheng
- School of Information Science and Engineering, Guilin University of Technology, Guilin 541004, China
- Hong Jin
- School of Information Science and Engineering, Guilin University of Technology, Guilin 541004, China
- Yarong Liu
- School of Information Science and Engineering, Guilin University of Technology, Guilin 541004, China
18
Biswas AK, Kim DC, Kang M, Gao JX. Robust Inductive Matrix Completion Strategy to Explore Associations Between LincRNAs and Human Disease Phenotypes. IEEE/ACM Trans Comput Biol Bioinform 2019; 16:2066-2077. [PMID: 29994224 DOI: 10.1109/tcbb.2018.2844816] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.5] [Indexed: 06/08/2023]
Abstract
Over the past few years, it has been established that a number of long intergenic non-coding RNAs (lincRNAs) are linked to a wide variety of human diseases, while the disease relationships of many other lincRNAs remain a puzzle. Validating such links through biological experiments is expensive. However, large amounts of information about both entities are becoming available thanks to high-throughput sequencing (HTS) platforms, genome-wide association studies (GWAS), etc., opening opportunities for cutting-edge machine learning and data mining approaches. Yet only a few in silico lincRNA-disease association inference tools are available to date, and none of them utilizes side information about both entities. The recently developed Inductive Matrix Completion (IMC) technique provides a recommendation platform between two entities that takes their respective side information into account. However, the IMC formulation cannot handle the noise and outliers that may be present in the dataset, and data sparsity is another issue for the standard IMC method. Thus, a robust version of IMC that solves these two issues is needed. As a remedy, in this paper we propose Robust Inductive Matrix Completion (RIMC), which uses an l2,1-norm loss function as well as l2,1-norm-based regularization. We applied RIMC to the available association data between human lincRNAs and OMIM disease phenotypes, together with a diverse set of side information about the lincRNAs and the diseases. Our method performs better than state-of-the-art methods in terms of precision@k and recall@k for top-k disease prioritization for the subject lincRNAs. We also demonstrate that RIMC is equally effective for querying novel lincRNAs and for predicting the rank of a newly known disease for a set of well-characterized lincRNAs. Availability: All the supporting datasets are available at the publicly accessible URL located at http://biomecis.uta.edu/~ashis/res/RIMC/.
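As a hedged illustration of the two ingredients named in the RIMC abstract, the sketch below computes two quantities. The first is an IMC-style low-rank score matrix X W H^T Y^T built from side-information matrices X (lincRNA features) and Y (disease features). The second is the l2,1 norm, taken here as the sum of the Euclidean norms of a matrix's rows; conventions differ on rows versus columns. The sketch does not implement RIMC's optimization, and all matrices are hypothetical toy values.

```python
import math


def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def transpose(m):
    return [list(col) for col in zip(*m)]


def imc_scores(x, w, h, y):
    """IMC-style low-rank prediction X W H^T Y^T (rows: lincRNAs, columns: diseases)."""
    return matmul(matmul(matmul(x, w), transpose(h)), transpose(y))


def l21_norm(m):
    """l2,1 norm taken as the sum of the Euclidean norms of the rows of m."""
    return sum(math.sqrt(sum(v * v for v in row)) for row in m)


# Hypothetical side information: 2 lincRNAs x 2 features, 2 diseases x 2 features.
x = [[1.0, 0.0], [0.0, 1.0]]
y = [[1.0, 0.0], [0.0, 1.0]]
# Hypothetical rank-1 factors of the kind an IMC solver would learn.
w = [[1.0], [0.5]]
h = [[2.0], [0.0]]
scores = imc_scores(x, w, h, y)

observed = [[1.0, 0.0], [1.0, 0.0]]  # hypothetical known associations
residual = [[o - s for o, s in zip(orow, srow)] for orow, srow in zip(observed, scores)]
loss = l21_norm(residual)
```

Applied to a residual matrix (observed minus predicted associations), the l2,1 norm grows linearly with each row's error rather than quadratically, as the squared Frobenius norm does. This makes it less sensitive to outlier rows.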
19
Zhang H, Liang Y, Peng C, Han S, Du W, Li Y. Predicting lncRNA-disease associations using network topological similarity based on deep mining heterogeneous networks. Math Biosci 2019; 315:108229. [PMID: 31323239 DOI: 10.1016/j.mbs.2019.108229] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.3] [Received: 09/30/2018] [Revised: 05/12/2019] [Accepted: 07/16/2019] [Indexed: 12/17/2022]
Abstract
Long noncoding RNAs (lncRNAs), a class of noncoding RNAs longer than 200 nucleotides, have gained considerable attention in recent decades. Many studies have confirmed that the human genome contains many thousands of lncRNAs. LncRNAs play significant roles in many important biological processes and are relevant to complex disease diagnosis, prognosis, prevention and treatment. For some important diseases such as cancer, lncRNAs have become novel candidate biomarkers. However, research on the role of lncRNAs in human diseases is still in its infancy, and only a small fraction of lncRNA-disease associations have been experimentally verified. Predicting lncRNA-disease associations is an important way to understand the mechanisms and functions of lncRNAs involved in diseases and to enrich lncRNA annotations. It is therefore urgent to prioritize lncRNAs potentially associated with diseases. A biological system is a highly complex heterogeneous network involving different molecules, so network-based algorithms have been extensively applied, as they can provide a quantifiable characterization of the networks underlying diverse biological systems. However, the topology of heterogeneous networks, with its abundant interactions between biomedical entities, is rarely exploited by similarity-based methods that predict lncRNA-disease associations from the various features of lncRNAs and diseases. DeepWalk, which encodes the relations of nodes in a continuous vector space, extends language modeling and unsupervised learning from word sequences to networks. In this article, we present a novel lncRNA-disease association prediction method based on DeepWalk, which enhances existing association discovery methods through a topology-based similarity measure.
We integrate heterogeneous data to construct a Linked Tripartite Network, a heterogeneous network containing three types of nodes generated from linked bioinformatics datasets, and use DeepWalk to extract the topological structure features of the nodes in this network for calculating similarities. The method consists of the following steps. First, we integrate heterogeneous data to construct the Linked Tripartite Network, containing the known lncRNA-disease, lncRNA-microRNA and microRNA-disease interactions. Second, the topological structure features of the nodes are extracted with DeepWalk. Third, similarity scores for disease-disease pairs and lncRNA-lncRNA pairs are computed from the topology of this network. Finally, new lncRNA-disease associations are discovered by a rule-based inference method using lncRNA-lncRNA similarities. Measured by AUC, our method shows superior performance in predicting lncRNA-disease associations based on topological similarity in a heterogeneous network. Similarity measurement using network topology based on DeepWalk provides a novel perspective that differs from similarities derived from sequence or structure information. Availability: All the data and code are freely available at https://github.com/Pengeace/lncRNA-disease-link.
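DeepWalk first samples truncated random walks over the network. It then trains a skip-gram model (for example, gensim's `Word2Vec`) on those walks as if they were sentences, and node similarities are computed from the learned vectors. The sketch below covers only the walk-generation step and a cosine-similarity helper, skipping embedding training. The toy tripartite graph, the node names and the parameter values are hypothetical, and cosine similarity is an assumed choice of similarity measure.

```python
import math
import random


def random_walks(graph, walks_per_node, walk_length, seed=0):
    """Uniform truncated random walks: the 'sentences' DeepWalk feeds to skip-gram."""
    rng = random.Random(seed)
    walks = []
    for _ in range(walks_per_node):
        starts = list(graph)
        rng.shuffle(starts)  # visit start nodes in a random order on each pass
        for start in starts:
            walk = [start]
            while len(walk) < walk_length:
                neighbors = graph[walk[-1]]
                if not neighbors:  # dead end: stop this walk early
                    break
                walk.append(rng.choice(neighbors))
            walks.append(walk)
    return walks


def cosine_similarity(u, v):
    """Cosine of the angle between two embedding vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


# Hypothetical tripartite graph: lncRNAs (L*), miRNAs (M*), diseases (D*).
graph = {
    "L1": ["D1", "M1"],
    "L2": ["M1"],
    "M1": ["L1", "L2", "D2"],
    "D1": ["L1"],
    "D2": ["M1"],
}
walks = random_walks(graph, walks_per_node=2, walk_length=5)
```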
Affiliation(s)
- Hui Zhang
- College of Computer Science and Technology, Key Laboratory of Symbol Computation and Knowledge Engineering of Ministry of Education, Jilin University, Changchun 130012, China
- Yanchun Liang
- College of Computer Science and Technology, Key Laboratory of Symbol Computation and Knowledge Engineering of Ministry of Education, Jilin University, Changchun 130012, China; Zhuhai Laboratory of Key Laboratory of Symbol Computation and Knowledge Engineering of Ministry of Education, Zhuhai College of Jilin University, Zhuhai 519041, China
- Cheng Peng
- College of Computer Science and Technology, Key Laboratory of Symbol Computation and Knowledge Engineering of Ministry of Education, Jilin University, Changchun 130012, China
- Siyu Han
- College of Computer Science and Technology, Key Laboratory of Symbol Computation and Knowledge Engineering of Ministry of Education, Jilin University, Changchun 130012, China
- Wei Du
- College of Computer Science and Technology, Key Laboratory of Symbol Computation and Knowledge Engineering of Ministry of Education, Jilin University, Changchun 130012, China.
- Ying Li
- College of Computer Science and Technology, Key Laboratory of Symbol Computation and Knowledge Engineering of Ministry of Education, Jilin University, Changchun 130012, China.
20
LLCLPLDA: a novel model for predicting lncRNA-disease associations. Mol Genet Genomics 2019; 294:1477-1486. [PMID: 31250107 DOI: 10.1007/s00438-019-01590-8] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.3] [Received: 04/15/2019] [Accepted: 06/21/2019] [Indexed: 12/19/2022]
Abstract
Long noncoding RNAs play a significant role in the occurrence of diseases, so predicting the relationships between lncRNAs and diseases has become an increasingly popular topic. Researchers hope to identify effective treatments by revealing the occurrence and development of diseases at the molecular level. However, verifying lncRNA-disease associations with traditional biological experiments is very time-consuming and expensive. Therefore, we developed a method called LLCLPLDA to predict potential lncRNA-disease associations. First, locality-constrained linear coding (LLC) is used to project the features of lncRNAs and diseases onto locality-constrained features. Then, a label propagation (LP) strategy is used to combine the initial association matrix with the obtained features of lncRNAs and diseases. To demonstrate the performance of our method, we compared LLCLPLDA with five other methods under leave-one-out cross-validation and fivefold cross-validation, and the experimental results show that the proposed method outperforms all five. Additionally, we conducted case studies on three diseases: cervical cancer, glioma, and breast cancer. The top five predicted lncRNAs for cervical cancer and for glioma were verified, and four of the top five for breast cancer were also confirmed.
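Label propagation of the kind LLCLPLDA uses is commonly written as the iteration F(t+1) = αWF(t) + (1 - α)Y. In this iteration W is a row-normalized neighborhood weight matrix, Y is the initial lncRNA-disease association matrix, and α in (0, 1) balances propagated labels against initial labels. The sketch below implements that generic iteration. In LLCLPLDA, W would come from locality-constrained linear coding; here it is a hypothetical hand-written matrix, and α = 0.5 is an arbitrary choice.

```python
def label_propagation(weights, labels, alpha=0.5, max_iter=100, tol=1e-9):
    """Iterate F <- alpha * W F + (1 - alpha) * Y until F stops changing.

    weights: n x n row-normalized neighborhood weights among lncRNAs.
    labels:  n x m initial lncRNA-disease association matrix Y.
    """
    n, m = len(labels), len(labels[0])
    f = [[float(v) for v in row] for row in labels]
    for _ in range(max_iter):
        new_f = [[alpha * sum(weights[i][k] * f[k][j] for k in range(n))
                  + (1 - alpha) * labels[i][j]
                  for j in range(m)] for i in range(n)]
        delta = max(abs(new_f[i][j] - f[i][j]) for i in range(n) for j in range(m))
        f = new_f
        if delta < tol:
            break
    return f


# Hypothetical weights among 3 lncRNAs (each row sums to 1).
weights = [[0.0, 0.5, 0.5],
           [0.5, 0.0, 0.5],
           [0.5, 0.5, 0.0]]
# Hypothetical known associations: rows are lncRNAs, columns are diseases.
labels = [[1, 0],
          [0, 0],
          [0, 1]]
scores = label_propagation(weights, labels)
```

With α < 1 and W row-normalized, the iteration converges. For this toy input it settles at 0.6 for the two known associations and 0.2 for the unlabeled lncRNA's two candidate diseases.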
21
Piro RM, Marsico A. Network-Based Methods and Other Approaches for Predicting lncRNA Functions and Disease Associations. Methods Mol Biol 2019; 1912:301-321. [PMID: 30635899 DOI: 10.1007/978-1-4939-8982-9_12] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.8] [Indexed: 12/14/2022]
Abstract
The discovery that a considerable portion of eukaryotic genomes is transcribed and gives rise to long noncoding RNAs (lncRNAs) provides an important new perspective on the transcriptome and raises questions about the centrality of these lncRNAs in gene-regulatory processes and diseases. The rapidly increasing number of mechanistically investigated lncRNAs has provided evidence for distinct functional classes, such as enhancer-like lncRNAs, which modulate gene expression via chromatin looping, and noncoding competing endogenous RNAs (ceRNAs), which act as microRNA decoys. Despite great progress in recent years, most lncRNAs are functionally uncharacterized, and their involvement in disease development and progression is unknown. Here, we summarize recent developments in lncRNA function prediction in general and lncRNA-disease association prediction in particular, with emphasis on in silico methods based on network analysis and on ceRNA function prediction. We believe that such computational techniques provide a valuable aid for prioritizing functional or disease-relevant lncRNAs for targeted experimental follow-up studies.
Affiliation(s)
- Rosario Michael Piro
- Institut für Informatik, Freie Universität Berlin, Berlin, Germany; Institut für Medizinische Genetik und Humangenetik, Charité-Universitätsmedizin Berlin, Berlin, Germany.
- Annalisa Marsico
- Institut für Informatik, Freie Universität Berlin, Berlin, Germany; Max-Planck-Institut für molekulare Genetik, Berlin, Germany.
22
Zhang J, Zhang Z, Chen Z, Deng L. Integrating Multiple Heterogeneous Networks for Novel LncRNA-Disease Association Inference. IEEE/ACM Trans Comput Biol Bioinform 2019; 16:396-406. [PMID: 28489543 DOI: 10.1109/tcbb.2017.2701379] [Citation(s) in RCA: 86] [Impact Index Per Article: 14.3] [Indexed: 06/07/2023]
Abstract
Accumulating experimental evidence has indicated that long non-coding RNAs (lncRNAs) are critical for the regulation of cellular biological processes implicated in many human diseases. However, relatively few experimentally supported lncRNA-disease associations have been reported, so developing effective computational methods to infer lncRNA-disease associations is becoming increasingly important. Current network-based algorithms typically use a network representation to identify novel associations between lncRNAs and diseases. However, these methods concentrate on the specific entities of interest (lncRNAs and diseases) and cannot consider networks with more than two types of entities. To address these limitations, we develop a new global network-based framework, LncRDNetFlow, to prioritize disease-related lncRNAs. LncRDNetFlow uses a flow propagation algorithm to infer lncRNA-disease associations. The algorithm integrates multiple networks built from a variety of biological information, including lncRNA similarity, protein-protein interactions, disease similarity, and the associations between them. We show that LncRDNetFlow performs significantly better than existing state-of-the-art approaches in cross-validation. To further validate the reproducibility of its performance, we use the proposed method to identify lncRNAs related to ovarian cancer, glioma, and cervical cancer. The results are encouraging: many of the top-ranked predicted lncRNAs have been verified by biological studies.
23
Lan W, Huang L, Lai D, Chen Q. Identifying Interactions Between Long Noncoding RNAs and Diseases Based on Computational Methods. Methods Mol Biol 2019. [PMID: 29536445 DOI: 10.1007/978-1-4939-7717-8_12] [Citation(s) in RCA: 16] [Impact Index Per Article: 2.7] [Indexed: 01/09/2023]
Abstract
With the development and improvement of next-generation sequencing technology, a great number of noncoding RNAs have been discovered. Long noncoding RNAs (lncRNAs), which are longer than 200 nucleotides, are the largest class of noncoding RNAs. There is increasing evidence that lncRNAs play key roles in many biological processes; accordingly, the mutation and dysregulation of lncRNAs are closely associated with a number of complex human diseases. Identifying the most likely interactions between lncRNAs and diseases has become a fundamental challenge in human health. A common view is that lncRNAs with similar functions tend to be related to phenotypically similar diseases. In this chapter, we first introduce the concept of lncRNAs, their biological features, and the available data resources. We then explore recent computational approaches for identifying interactions between long noncoding RNAs and diseases, including their advantages and disadvantages. Key issues and potential future work in predicting interactions between long noncoding RNAs and diseases are also discussed.
Affiliation(s)
- Wei Lan
- School of Computer, Electronics and Information, Guangxi University, Nanning, China
- Liyu Huang
- Information and Network Center, Guangxi University, Nanning, China
- Dehuan Lai
- School of Computer, Electronics and Information, Guangxi University, Nanning, China
- Qingfeng Chen
- School of Computer, Electronics and Information, Guangxi University, Nanning, China; State Key Laboratory for Conservation and Utilization of Subtropical Agro-bioresources, Guangxi University, Nanning, China.
24
Xie G, Huang Z, Liu Z, Lin Z, Ma L. NCPHLDA: a novel method for human lncRNA–disease association prediction based on network consistency projection. Mol Omics 2019; 15:442-450. [DOI: 10.1039/c9mo00092e] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.5] [Indexed: 12/30/2022]
Abstract
In recent years, an increasing number of biological experiments and clinical reports have shown that lncRNA is closely related to the development of various complex human diseases.
Affiliation(s)
- Guobo Xie
- School of Computer Science, Guangdong University of Technology, Guangzhou, China
- Zecheng Huang
- School of Computer Science, Guangdong University of Technology, Guangzhou, China
- Zhenguo Liu
- Department of Thoracic Surgery, The First Affiliated Hospital of Sun Yat-sen University, Guangzhou, China
- Zhiyi Lin
- School of Computer Science, Guangdong University of Technology, Guangzhou, China
- Lei Ma
- Institute of Automation, Chinese Academy of Sciences, Beijing, China
25
Manzanarez-Ozuna E, Flores DL, Gutiérrez-López E, Cervantes D, Juárez P. Model based on GA and DNN for prediction of mRNA-Smad7 expression regulated by miRNAs in breast cancer. Theor Biol Med Model 2018; 15:24. [PMID: 30594253 PMCID: PMC6310970 DOI: 10.1186/s12976-018-0095-8] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.7] [Received: 05/31/2018] [Accepted: 11/30/2018] [Indexed: 01/06/2023]
Abstract
Background: The Smad7 protein is a negative regulator of the TGF-β signaling pathway, which is upregulated in patients with breast cancer. miRNAs regulate protein expression by arresting or degrading mRNAs. The purpose of this work is to identify a miRNA profile that regulates the expression of the mRNA coding for Smad7 in breast cancer, using data from patients with breast cancer obtained from The Cancer Genome Atlas (TCGA) project.
Methods: We developed an automatic search method based on genetic algorithms to find a predictive model based on deep neural networks (DNNs) that fits the biological data. We then applied the Olden algorithm to identify the relative importance of each miRNA.
Results: We present a non-linear regression model based on deep neural networks. It predicts the regulation of the mRNA coding for the Smad7 protein by its targeting miRNAs in patients with breast cancer, with an R2 of 0.99 and an MSE of 0.00001. In addition, the model is validated against the results of in vivo and in vitro experiments reported in the literature. The miRNAs hsa-mir-146a, hsa-mir-93, hsa-mir-375, hsa-mir-205, hsa-mir-15a, hsa-mir-21, hsa-mir-20a, hsa-mir-503, hsa-mir-29c, hsa-mir-497, hsa-mir-107, hsa-mir-125a, hsa-mir-200c, hsa-mir-212, hsa-mir-429, hsa-mir-34a, hsa-let-7c, hsa-mir-92b, hsa-mir-33a, hsa-mir-15b, hsa-mir-224, hsa-mir-185 and hsa-mir-10b form a profile that critically regulates the expression of the mRNA coding for Smad7 in breast cancer.
Conclusions: We developed a genetic algorithm to select the best features (miRNAs) as DNN inputs. The genetic algorithm also builds the best DNN architecture by optimizing its parameters. Although the results have not yet been confirmed by laboratory experiments, they suggest that this miRNA profile could be used as a biomarker or as a target in targeted therapies.
Electronic supplementary material The online version of this article (10.1186/s12976-018-0095-8) contains supplementary material, which is available to authorized users.
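Olden's connection-weight method ranks inputs by summing, over hidden units, the product of input-to-hidden and hidden-to-output weights. A minimal sketch for a single-output network follows. For deeper networks, it assumes the common generalization of multiplying the weight matrices through all layers. Like the original method, it ignores biases and activation functions. The weights and the three-miRNA input layer are hypothetical.

```python
def olden_importance(layers):
    """Olden connection-weight importance for a single-output network.

    layers[k][i][j] is the weight from unit i of layer k to unit j of layer k + 1.
    Multiplying the weight matrices through all layers (an assumed generalization
    for deep networks) gives one signed score per input; biases are ignored.
    """
    result = layers[0]
    for w in layers[1:]:
        result = [[sum(row[k] * w[k][j] for k in range(len(w))) for j in range(len(w[0]))]
                  for row in result]
    return [row[0] for row in result]


# Hypothetical weights: 3 miRNA inputs -> 2 hidden units -> 1 output (Smad7 mRNA).
input_hidden = [[0.5, -1.0],
                [2.0, 0.0],
                [-0.5, 0.5]]
hidden_output = [[1.0],
                 [2.0]]
importance = olden_importance([input_hidden, hidden_output])
# Rank inputs by the magnitude of their signed importance.
ranking = sorted(range(len(importance)), key=lambda i: abs(importance[i]), reverse=True)
```

The sign indicates whether an input pushes the predicted expression up or down, while the magnitude is used for ranking.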
Affiliation(s)
- Edgar Manzanarez-Ozuna
- Universidad Autónoma de Baja California, Carretera Transpeninsular Ensenada-Tijuana 3917 Colonia Playitas, C.P. 22860, Ensenada, B.C., Mexico
- Dora-Luz Flores
- Universidad Autónoma de Baja California, Carretera Transpeninsular Ensenada-Tijuana 3917 Colonia Playitas, C.P. 22860, Ensenada, B.C., Mexico.
- Everardo Gutiérrez-López
- Universidad Autónoma de Baja California, Carretera Transpeninsular Ensenada-Tijuana 3917 Colonia Playitas, C.P. 22860, Ensenada, B.C., Mexico
- David Cervantes
- Universidad Autónoma de Baja California, Carretera Transpeninsular Ensenada-Tijuana 3917 Colonia Playitas, C.P. 22860, Ensenada, B.C., Mexico
- Patricia Juárez
- Centro de Investigación Científica y de Educación Superior de Ensenada, Carretera Ensenada-Tijuana No. 3918, Zona Playitas, C.P. 22860, Ensenada, B.C., Mexico
26
Multiple Linear Regression Analysis of lncRNA-Disease Association Prediction Based on Clinical Prognosis Data. Biomed Res Int 2018; 2018:3823082. [PMID: 30643802 PMCID: PMC6311254 DOI: 10.1155/2018/3823082] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.9] [Received: 07/20/2018] [Revised: 10/23/2018] [Accepted: 11/05/2018] [Indexed: 01/06/2023]
Abstract
Long noncoding RNAs (lncRNAs) play important roles in various biological processes and diseases, especially cancer. However, disease prognosis has been ignored in current lncRNA-disease association prediction. In this study, we constructed a multiple linear regression model for lncRNA-disease association prediction based on clinical prognosis data (MlrLDAcp), which integrates clinical prognosis data for cancer with lncRNA transcript expression levels. MlrLDAcp can perform not only cancer survival prediction but also lncRNA-disease association prediction. Ultimately, the 60 lncRNAs most closely related to prostate cancer survival were selected from 481 candidate lncRNAs. The multiple linear regression relationship between the prognostic survival of 176 patients with prostate cancer and these 60 lncRNAs was then derived. Compared with previous studies, MlrLDAcp showed superior survival prediction ability and could effectively predict lncRNA-disease associations. Its area under the curve (AUC) was 0.875 for survival prediction and 0.872 for lncRNA-disease association prediction. It could be an effective method for biomedical research.
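A multiple linear regression of the kind MlrLDAcp builds can be fitted by ordinary least squares. The sketch below solves the normal equations (X^T X)b = X^T y with Gauss-Jordan elimination in plain Python. The data are synthetic, generated exactly from y = 1 + 2x1 - x2; they are not prostate cancer survival or lncRNA expression values. The sketch also assumes that X^T X is non-singular.

```python
def fit_linear_regression(x_rows, y):
    """Ordinary least squares via the normal equations (X^T X) b = X^T y.

    A column of ones is prepended for the intercept; returns [b0, b1, ..., bp].
    Assumes X^T X is non-singular.
    """
    x = [[1.0] + list(row) for row in x_rows]
    p = len(x[0])
    xtx = [[sum(r[i] * r[j] for r in x) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * t for r, t in zip(x, y)) for i in range(p)]
    # Gauss-Jordan elimination with partial pivoting on the augmented matrix.
    aug = [xtx[i] + [xty[i]] for i in range(p)]
    for col in range(p):
        pivot = max(range(col, p), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        pivot_value = aug[col][col]
        aug[col] = [v / pivot_value for v in aug[col]]
        for r in range(p):
            if r != col:
                factor = aug[r][col]
                aug[r] = [a - factor * b for a, b in zip(aug[r], aug[col])]
    return [aug[i][p] for i in range(p)]


# Synthetic predictors (two hypothetical expression values per patient) and outcomes.
x_rows = [[0, 0], [1, 0], [0, 1], [1, 1], [2, 1]]
y = [1, 3, 0, 2, 4]  # generated from y = 1 + 2*x1 - x2
coefficients = fit_linear_regression(x_rows, y)
```

With 60 predictors and 176 patients, as in the study, a numerical library's least-squares solver would be the more robust choice.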
27
Le DH, Dao LTM. Annotating Diseases Using Human Phenotype Ontology Improves Prediction of Disease-Associated Long Non-coding RNAs. J Mol Biol 2018; 430:2219-2230. [PMID: 29758261 DOI: 10.1016/j.jmb.2018.05.006] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.7] [Received: 11/02/2017] [Revised: 04/28/2018] [Accepted: 05/05/2018] [Indexed: 01/13/2023]
Abstract
Recently, many long non-coding RNAs (lncRNAs) have been identified and their biological functions characterized; however, our understanding of the molecular mechanisms linking them to disease is still limited. To overcome the limitations of experimentally identifying disease-lncRNA associations, computational methods have been proposed as a powerful tool for predicting such associations. These methods are usually based on similarities between diseases or between lncRNAs, since similar diseases have been reported to be associated with functionally similar lncRNAs. Therefore, prediction performance depends heavily on how well these similarities are captured. Previous studies have calculated the similarity between two diseases by mapping each disease to exactly one Disease Ontology (DO) term and then applying a semantic similarity measure. The problem with this approach is that a disease can be described by more than one DO term. To date, DO annotation databases exist only for genes, not for diseases. In contrast, the Human Phenotype Ontology (HPO) is designed to fully annotate human disease phenotypes. Therefore, in this study, we constructed disease similarity networks/matrices using HPO instead of DO. We then used these networks/matrices as inputs to two representative ranking algorithms, one machine learning-based and one network-based: regularized least squares and heterogeneous graph-based inference, respectively. The results showed that both algorithms perform better on HPO-based networks/matrices than on DO-based ones. In addition, our method predicted 11 novel cancer-associated lncRNAs that are supported by literature evidence.
Affiliation(s)
- Duc-Hau Le
- School of Computer Science and Engineering, Thuyloi University, 175 Tay Son, Dong Da, Hanoi, Vietnam; Vinmec Research Institute of Stem Cell and Gene Technology, 458 Minh Khai, Hai Ba Trung, Hanoi, Vietnam
- Lan T M Dao
- Vinmec Research Institute of Stem Cell and Gene Technology, 458 Minh Khai, Hai Ba Trung, Hanoi, Vietnam
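The network-based ranking step can be pictured with a simplified heterogeneous graph-based inference (HGBI) iteration. This is a sketch under stated assumptions, not the authors' implementation: the update rule R <- alpha * S_l R S_d + (1 - alpha) * A, the scaling of each similarity matrix by its largest row sum, alpha = 0.4, the helper names and the toy matrices are all our choices. In the paper's setting, S_d would be the HPO-based disease similarity matrix.

```python
def matmul(a, b):
    """Multiply two dense matrices stored as lists of lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def scale_by_max_row_sum(s):
    """Divide a non-negative similarity matrix by its largest row sum."""
    largest = max(sum(row) for row in s)
    return [[x / largest for x in row] for row in s]


def hgbi(s_lnc, s_dis, a, alpha=0.4, tol=1e-9, max_iter=1000):
    """Iterate R <- alpha * S_l R S_d + (1 - alpha) * A until the scores stop changing.

    s_lnc: lncRNA-lncRNA similarity matrix (n x n)
    s_dis: disease-disease similarity matrix (m x m), e.g. HPO-based (assumed input)
    a:     known lncRNA-disease association matrix (n x m, 0/1 entries)
    """
    s_l = scale_by_max_row_sum(s_lnc)
    s_d = scale_by_max_row_sum(s_dis)
    n, m = len(a), len(a[0])
    r = [[float(x) for x in row] for row in a]
    for _ in range(max_iter):
        spread = matmul(matmul(s_l, r), s_d)  # propagate scores through both networks
        new_r = [[alpha * spread[i][j] + (1 - alpha) * a[i][j] for j in range(m)]
                 for i in range(n)]
        change = max(abs(new_r[i][j] - r[i][j]) for i in range(n) for j in range(m))
        r = new_r
        if change < tol:
            break
    return r


# Toy example: lncRNA 1 resembles lncRNA 0, which is linked to disease 0.
s_lnc = [[1.0, 0.9, 0.0], [0.9, 1.0, 0.0], [0.0, 0.0, 1.0]]
s_dis = [[1.0, 0.1], [0.1, 1.0]]
a = [[1, 0], [0, 0], [0, 1]]
scores = hgbi(s_lnc, s_dis, a)
print(round(scores[1][0], 3), round(scores[2][0], 3))
```

Scaling both similarity matrices so that their largest row sum is 1 keeps the update a contraction for alpha < 1, so the loop converges; the known association matrix A acts as the restart term that anchors the scores.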
28
Zhou J, Shi YY. A Bipartite Network and Resource Transfer-Based Approach to Infer lncRNA-Environmental Factor Associations. IEEE/ACM Trans Comput Biol Bioinform 2018; 15:753-759. [PMID: 28436883 DOI: 10.1109/tcbb.2017.2695187] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.6] [Indexed: 06/07/2023]
Abstract
Phenotypes and diseases are often determined by complex interactions between genetic and environmental factors (EFs). However, compared with protein-coding genes and microRNAs, there is a paucity of computational methods for understanding the associations between long non-coding RNAs (lncRNAs) and EFs. In this study, we focused on lncRNA-EF associations. We propose an algorithm for predicting new lncRNA-EF associations that uses the miRNA partners shared by each lncRNA-EF pair, building on the competing endogenous RNA (ceRNA) hypothesis and on resource transfer within experimentally supported lncRNA-miRNA and miRNA-EF association bipartite networks. Compared with another recently proposed method, our approach predicts more credible lncRNA-EF associations. These results support the ability of our approach to predict biologically significant associations, which could lead to a better understanding of the underlying molecular processes.
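A minimal sketch of two-step resource transfer through shared miRNA partners, assuming the common "mass diffusion" rule in which each lncRNA spreads one unit of resource evenly over its miRNAs and each miRNA passes its share evenly to its EFs. The exact allocation rule used by Zhou and Shi may differ; the function name and dictionary inputs are hypothetical.

```python
from collections import defaultdict


def resource_transfer(lnc_mirna, mirna_ef):
    """Score lncRNA-EF pairs by spreading resource lncRNA -> miRNA -> EF.

    lnc_mirna: dict mapping each lncRNA to the set of miRNAs it interacts with
    mirna_ef:  dict mapping each miRNA to the set of EFs it is associated with
    Returns {(lncRNA, EF): score}; higher scores mean more shared miRNA support.
    """
    scores = defaultdict(float)
    for lnc, mirnas in lnc_mirna.items():
        if not mirnas:
            continue
        share = 1.0 / len(mirnas)  # one unit of resource split evenly over miRNAs
        for mirna in mirnas:
            efs = mirna_ef.get(mirna, set())
            # Each miRNA passes its share evenly to its EFs; a miRNA with no EF
            # links simply drops its share.
            for ef in efs:
                scores[(lnc, ef)] += share / len(efs)
    return dict(scores)


lnc_mirna = {"L1": {"m1", "m2"}, "L2": {"m2"}}
mirna_ef = {"m1": {"E1"}, "m2": {"E1", "E2"}}
print(resource_transfer(lnc_mirna, mirna_ef))
```

In a full pipeline these raw scores would typically be ranked per lncRNA (or per EF), with already known associations excluded, before new candidates are proposed.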
29
Ding L, Wang M, Sun D, Li A. TPGLDA: Novel prediction of associations between lncRNAs and diseases via lncRNA-disease-gene tripartite graph. Sci Rep 2018; 8:1065. [PMID: 29348552 PMCID: PMC5773503 DOI: 10.1038/s41598-018-19357-3] [Citation(s) in RCA: 63] [Impact Index Per Article: 9.0] [Received: 06/28/2017] [Accepted: 12/28/2017] [Indexed: 12/29/2022]
Abstract
Accumulating evidence indicates that lncRNAs play an important role in various complex human diseases. However, the number of known disease-related lncRNAs is still comparatively small, and experimental identification is time-consuming and labor-intensive. Developing useful computational methods for inferring potential associations between lncRNAs and diseases has therefore become an active research topic; such methods can help researchers explore complex human diseases at the molecular level and improve disease diagnosis, therapy, prognosis and prevention. In this paper, we propose TPGLDA, a novel method that predicts lncRNA-disease associations via an lncRNA-disease-gene tripartite graph, integrating gene-disease associations with lncRNA-disease associations. Compared with previous approaches, TPGLDA better delineates the heterogeneity of coding/non-coding gene-disease associations and can effectively identify potential lncRNA-disease associations. In leave-one-out cross-validation, TPGLDA achieves an AUC of 0.939, demonstrating good predictive performance. Moreover, the top 5 predictions for lung cancer, hepatocellular carcinoma and ovarian cancer were manually confirmed against relevant databases and literature, providing convincing evidence of the performance and potential value of TPGLDA for identifying lncRNA-disease associations. MATLAB and R code for TPGLDA is available at: https://github.com/USTC-HIlab/TPGLDA
Affiliation(s)
- Liang Ding
- School of Information Science and Technology, University of Science and Technology of China, Hefei, AH230027, China
- Minghui Wang
- School of Information Science and Technology, University of Science and Technology of China, Hefei, AH230027, China
- Centers for Biomedical Engineering, University of Science and Technology of China, Hefei, AH230027, China
- Dongdong Sun
- School of Information Science and Technology, University of Science and Technology of China, Hefei, AH230027, China
- Ao Li
- School of Information Science and Technology, University of Science and Technology of China, Hefei, AH230027, China
- Centers for Biomedical Engineering, University of Science and Technology of China, Hefei, AH230027, China
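Leave-one-out AUC values such as the one reported for TPGLDA summarize how often a held-out known association is scored above candidate pairs. As a hedged illustration (not the TPGLDA evaluation code), the AUC can be computed directly as the Mann-Whitney probability that a positive score exceeds a negative one, with ties counted as one half; the function name is hypothetical.

```python
def auc(pos_scores, neg_scores):
    """AUC as the probability that a positive outscores a negative (ties count 1/2)."""
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))


# Scores of held-out known associations vs. unlabeled candidate pairs.
print(auc([0.9, 0.8], [0.1, 0.85]))  # prints 0.75 (3 of 4 pairs ordered correctly)
```

The pairwise loop is quadratic in the number of scores; for large candidate sets, a rank-sum formula gives the same value more efficiently.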
30
Biswas AK, Kim D, Kang M, Ding C, Gao JX. Stable solution to ℓ2,1-based robust inductive matrix completion and its application in linking long noncoding RNAs to human diseases. BMC Med Genomics 2017; 10:77. [PMID: 29297358 PMCID: PMC5751820 DOI: 10.1186/s12920-017-0310-1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/24/2022]
Abstract
Background: A large number of long intergenic non-coding RNAs (lincRNAs) are linked to a broad spectrum of human diseases, but the disease associations of many other lincRNAs remain unknown. Validating such links through biological experiments is expensive. However, high-throughput sequencing (HTS) platforms, genome-wide association studies (GWAS) and similar efforts have now produced a wealth of lincRNA data, which opens the opportunity for cutting-edge machine learning and data mining approaches to extract meaningful relationships between lincRNAs and diseases. Yet only a few in silico lincRNA-disease association inference tools are available to date, and none of them uses side information about both entities simultaneously in a single framework. Methods: The recently developed Inductive Matrix Completion (IMC) technique provides a recommendation platform between two entities that takes side information about each into account. However, the IMC formulation cannot handle noise and outliers that may be present in the datasets, and data sparsity is a further issue for the standard IMC method. A robust version of IMC that addresses both issues is therefore needed. As a remedy, we propose Stable Robust Inductive Matrix Completion (SRIMC), which uses ℓ2,1 norm-based regularization to optimize the objective function with a unique two-step stable solution approach. Results: We applied SRIMC to the available association data between human lincRNAs and OMIM disease phenotypes, together with a diverse set of side information about the lincRNAs and the diseases. The method outperforms state-of-the-art methods in terms of precision@k and recall@k for top-k disease prioritization of the subject lincRNAs.
We also demonstrate that SRIMC is equally effective for querying novel lincRNAs and for predicting the rank of a newly known disease for a set of well-characterized lincRNAs. Conclusions: The experimental results and computational evaluation show that SRIMC is robust to noise and outliers in the datasets and can handle novel lincRNAs and disease phenotypes.
Affiliation(s)
- Ashis Kumer Biswas
- Department of Computer Science and Engineering, University of Colorado Denver, Denver, 80204, Colorado, USA
- Dongchul Kim
- Department of Computer Science, University of Rio Grande Valley, Edinburg, 78541, Texas, USA
- Mingon Kang
- Department of Computer Science, Kennesaw State University, Marietta, 30060, Georgia, USA
- Chris Ding
- Department of Computer Science and Engineering, University of Texas at Arlington, Arlington, 76019, Texas, USA
- Jean X Gao
- Department of Computer Science and Engineering, University of Texas at Arlington, Arlington, 76019, Texas, USA
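The ℓ2,1 norm behind SRIMC's regularization is the sum of the Euclidean norms of a matrix's rows. The sketch below contrasts it with the squared Frobenius norm on a residual matrix containing one outlier row; the helper names are hypothetical, and whether SRIMC applies the norm over rows or columns of its residual is an assumption not settled by the abstract.

```python
import math


def l21_norm(rows):
    """l2,1 norm: sum over rows of each row's Euclidean (l2) length."""
    return sum(math.sqrt(sum(x * x for x in row)) for row in rows)


def frobenius_sq(rows):
    """Squared Frobenius norm, the usual least-squares penalty."""
    return sum(x * x for row in rows for x in row)


# Residual matrices whose first row is an outlier, at two magnitudes.
small = [[3.0, 4.0], [0.0, 0.0]]
large = [[30.0, 40.0], [0.0, 0.0]]
print(l21_norm(small), l21_norm(large))          # 5.0 50.0
print(frobenius_sq(small), frobenius_sq(large))  # 25.0 2500.0
```

Scaling the outlier row by 10 multiplies its ℓ2,1 contribution by 10 but its squared Frobenius contribution by 100, which is why ℓ2,1-based objectives are less dominated by outliers.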
31
Chen X, Yan CC, Zhang X, You ZH. Long non-coding RNAs and complex diseases: from experimental results to computational models. Brief Bioinform 2017; 18:558-576. [PMID: 27345524 PMCID: PMC5862301 DOI: 10.1093/bib/bbw060] [Citation(s) in RCA: 312] [Impact Index Per Article: 39.0] [Received: 04/17/2016] [Indexed: 02/07/2023]
Abstract
LncRNAs have attracted wide attention from researchers worldwide in recent decades. With rapid advances in both experimental technologies and computational prediction algorithms, thousands of lncRNAs have been identified in eukaryotic organisms ranging from nematodes to humans in the past few years. Growing evidence indicates that lncRNAs are involved in almost the entire cell life cycle through different mechanisms and play important roles in many critical biological processes. It is therefore not surprising that mutations and dysregulation of lncRNAs would contribute to the development of various complex human diseases. In this review, we first briefly introduce the functions of lncRNAs, five important lncRNA-related diseases, five critical disease-related lncRNAs and some important publicly available lncRNA-related databases covering sequence, expression, function, etc. Currently, only a limited number of lncRNAs have been experimentally reported to be related to human diseases. Analyzing available lncRNA–disease associations and predicting potential human lncRNA–disease associations have therefore become important bioinformatics tasks, which would benefit the understanding of complex disease mechanisms at the lncRNA level, disease biomarker detection, and disease diagnosis, treatment, prognosis and prevention. Furthermore, we introduce some state-of-the-art computational models that could be used effectively to identify disease-related lncRNAs on a large scale and to select the most promising disease-related lncRNAs for experimental validation. We also analyze the limitations of these models and discuss future directions for developing computational models for lncRNA research.
Affiliation(s)
- Xing Chen
- School of Information and Electrical Engineering, China University of Mining and Technology, Xuzhou, China
- Corresponding authors: Xing Chen, School of Information and Electrical Engineering, China University of Mining and Technology, Xuzhou 221116, China; Zhu-Hong You, School of Computer Science and Technology, China University of Mining and Technology, Xuzhou 221116, China
- Xu Zhang
- School of Mechanical, Electrical & Information Engineering, Shandong University, Weihai, China
- Zhu-Hong You
- School of Computer Science and Technology, China University of Mining and Technology, Xuzhou, China
32
Huang YA, Chen X, You ZH, Huang DS, Chan KCC. ILNCSIM: improved lncRNA functional similarity calculation model. Oncotarget 2017; 7:25902-14. [PMID: 27028993 PMCID: PMC5041953 DOI: 10.18632/oncotarget.8296] [Citation(s) in RCA: 100] [Impact Index Per Article: 12.5] [Received: 01/04/2016] [Accepted: 03/04/2016] [Indexed: 12/15/2022]
Abstract
Increasing evidence indicates that lncRNAs play a significant role in various critical biological processes and in the development and progression of various human diseases. Constructing lncRNA functional similarity networks could benefit the development of computational models for inferring lncRNA functions and identifying lncRNA-disease associations. However, little effort has been devoted to quantifying lncRNA functional similarity. In this study, we developed an Improved LNCRNA functional SIMilarity calculation model (ILNCSIM) based on the assumption that lncRNAs with similar biological functions tend to be involved in similar diseases. The main improvement comes from combining the concept of information content with the hierarchical structure of disease directed acyclic graphs for disease similarity calculation. To further evaluate its performance, ILNCSIM was combined with the previously proposed Laplacian Regularized Least Squares for lncRNA-Disease Association model. The new model obtained reliable performance in leave-one-out cross-validation (AUCs of 0.9316 and 0.9074 on the MNDR and Lnc2cancer databases, respectively) and in 5-fold cross-validation (AUCs of 0.9221 and 0.9033 on MNDR and Lnc2cancer), significantly improving on the prediction performance of previous models. ILNCSIM is anticipated to serve as an effective lncRNA function prediction model for future biomedical research.
Affiliation(s)
- Yu-An Huang
- College of Computer Science and Software Engineering, Shenzhen University, Shenzhen, 518060, China
- Xing Chen
- Academy of Mathematics and Systems Science, Chinese Academy of Sciences, Beijing, 100190, China; National Center for Mathematics and Interdisciplinary Sciences, Chinese Academy of Sciences, Beijing, 100190, China
- Zhu-Hong You
- School of Computer Science and Technology, China University of Mining and Technology, Xuzhou, 221116, China
- De-Shuang Huang
- School of Electronics and Information Engineering, Tongji University, Shanghai, 201804, China
- Keith C C Chan
- Department of Computing, The Hong Kong Polytechnic University, Hong Kong, China
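ILNCSIM builds on disease directed acyclic graphs (DAGs). As background only, the sketch below implements a classic DAG-based semantic similarity in which each ancestor's contribution decays by a fixed factor per edge (0.5 assumed), keeping the largest contribution over all paths. ILNCSIM additionally incorporates information content, which this sketch does not model; the toy ontology and function names are hypothetical.

```python
def semantic_contributions(term, parents, decay=0.5):
    """Weight of every term in `term`'s ancestor DAG (the term itself counts 1).

    parents: dict mapping a term to the list of its direct parents.
    A parent receives decay * (child's weight); with several paths, the maximum is kept.
    """
    weights = {term: 1.0}
    stack = [term]
    while stack:
        child = stack.pop()
        for parent in parents.get(child, []):
            candidate = decay * weights[child]
            if candidate > weights.get(parent, 0.0):
                weights[parent] = candidate
                stack.append(parent)  # revisit so the improvement propagates upward
    return weights


def dag_similarity(t1, t2, parents, decay=0.5):
    """Shared ancestor weight of two terms, normalized by their total weight."""
    w1 = semantic_contributions(t1, parents, decay)
    w2 = semantic_contributions(t2, parents, decay)
    shared = set(w1) & set(w2)
    return sum(w1[t] + w2[t] for t in shared) / (sum(w1.values()) + sum(w2.values()))


# Toy disease ontology: A and B are siblings under C, which sits under root R.
toy_parents = {"A": ["C"], "B": ["C"], "C": ["R"]}
print(dag_similarity("A", "B", toy_parents))
```

An lncRNA-level functional similarity can then be built by comparing the sets of diseases associated with two lncRNAs using such term-level scores.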
33
Peng H, Lan C, Liu Y, Liu T, Blumenstein M, Li J. Chromosome preference of disease genes and vectorization for the prediction of non-coding disease genes. Oncotarget 2017; 8:78901-78916. [PMID: 29108274 PMCID: PMC5668007 DOI: 10.18632/oncotarget.20481] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Received: 04/23/2017] [Accepted: 07/19/2017] [Indexed: 12/15/2022]
Abstract
Disease-related protein-coding genes have been widely studied, but disease-related non-coding genes remain largely unknown. This work introduces a new vector representation of diseases and applies the vectorized data to a positive-unlabeled learning algorithm to predict and rank disease-related long non-coding RNA (lncRNA) genes. The disease vector consists of two sub-vectors. The first has 45 elements, characterizing the information entropies of the distribution of disease genes over 45 chromosome substructures. This idea is supported by our observation that some substructures (e.g., the chromosome 6 p-arm) are highly preferred by disease-related protein-coding genes, while others (e.g., the chromosome 21 p-arm) are not favored at all. The second sub-vector is 30-dimensional, characterizing the distribution of disease gene-enriched KEGG pathways relative to our manually created pathway groups. The second sub-vector complements the first in differentiating between diseases. Our prediction method outperforms state-of-the-art methods on benchmark datasets for prioritizing disease-related lncRNA genes. The method also works well when only the sequence information of an lncRNA gene is known, or even when a given disease has no currently recognized long non-coding genes.
Affiliation(s)
- Hui Peng
- Advanced Analytics Institute & Centre for Health Technologies, University of Technology Sydney, Broadway, NSW, Australia
- Chaowang Lan
- Advanced Analytics Institute & Centre for Health Technologies, University of Technology Sydney, Broadway, NSW, Australia
- Yuansheng Liu
- Advanced Analytics Institute & Centre for Health Technologies, University of Technology Sydney, Broadway, NSW, Australia
- Tao Liu
- Centre for Childhood Cancer Research, University of New South Wales, Sydney, Kensington, NSW, Australia
- Michael Blumenstein
- School of Software, University of Technology Sydney, Broadway, NSW, Australia
- Jinyan Li
- Advanced Analytics Institute & Centre for Health Technologies, University of Technology Sydney, Broadway, NSW, Australia
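The abstract does not spell out how each of the 45 entropy elements is computed. One plausible reading, stated here purely as an assumption, is that element i is the Shannon entropy term -p_i * log2(p_i), where p_i is the fraction of the disease's genes located in substructure i. The sketch uses three substructures instead of 45, and the function name is hypothetical.

```python
import math
from collections import Counter


def entropy_vector(gene_locations, substructures):
    """Per-substructure entropy terms -p*log2(p) for one disease (assumed definition).

    gene_locations: chromosome substructure of each disease gene, e.g. "6p"
    substructures:  fixed ordering of all substructures (45 in the paper)
    """
    counts = Counter(gene_locations)
    total = len(gene_locations)
    vector = []
    for s in substructures:
        p = counts[s] / total if total else 0.0
        vector.append(-p * math.log2(p) if p > 0 else 0.0)  # empty substructures give 0
    return vector


print(entropy_vector(["6p", "6p", "1q", "1q"], ["1q", "6p", "21p"]))  # [0.5, 0.5, 0.0]
```

Under this reading, summing the vector gives the Shannon entropy of the disease's whole chromosomal distribution: genes split evenly over two arms contribute 0.5 bit per arm and 1 bit in total.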
34
Gu C, Liao B, Li X, Cai L, Li Z, Li K, Yang J. Global network random walk for predicting potential human lncRNA-disease associations. Sci Rep 2017; 7:12442. [PMID: 28963512 PMCID: PMC5622075 DOI: 10.1038/s41598-017-12763-z] [Citation(s) in RCA: 49] [Impact Index Per Article: 6.1] [Received: 03/27/2017] [Accepted: 09/14/2017] [Indexed: 12/13/2022]
Abstract
There is growing evidence that mutation and dysregulation of long non-coding RNAs (lncRNAs) are associated with numerous diseases, including cancers. However, experimental methods for identifying associations between lncRNAs and diseases are expensive and time-consuming. Effective computational approaches to identify disease-related lncRNAs are in high demand and would benefit the detection of lncRNA biomarkers for disease diagnosis, treatment, and prevention. To address some limitations of existing computational methods, we developed a global network random walk model for predicting lncRNA-disease associations (GrwLDA). GrwLDA is a universal network-based method that does not require negative samples. It can be applied to a disease with no known associated lncRNA (isolated disease) and to an lncRNA with no known associated disease (novel lncRNA). Leave-one-out cross-validation (LOOCV) was used to evaluate the predictive performance of GrwLDA, which obtained reliable AUCs of 0.9449, 0.8562, and 0.8374 for overall, novel lncRNA and isolated disease prediction, respectively, significantly outperforming previous methods. Case studies of colon, gastric, and kidney cancers were also performed, and the top 5 disease-lncRNA associations were reported for each disease. Of these 15 associations, 13 were confirmed by literature mining.
Affiliation(s)
- Changlong Gu
- College of Information Science and Engineering, Hunan University, Changsha, Hunan, 410082, China
- Bo Liao
- College of Information Science and Engineering, Hunan University, Changsha, Hunan, 410082, China
- Xiaoying Li
- College of Information Science and Engineering, Hunan University, Changsha, Hunan, 410082, China
- Lijun Cai
- College of Information Science and Engineering, Hunan University, Changsha, Hunan, 410082, China
- Zejun Li
- College of Information Science and Engineering, Hunan University, Changsha, Hunan, 410082, China; School of Computer and Information Science, Hunan Institute of Technology, Hengyang, 412002, China
- Keqin Li
- Department of Computer Science, State University of New York, New Paltz, New York, 12561, USA
- Jialiang Yang
- Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, 10029, USA
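GrwLDA is described as a global network random walk. A generic random walk with restart (RWR) conveys the core idea: probability mass starts at seed nodes (for example, the known lncRNAs of a disease), diffuses along network edges, and returns to the seeds with a fixed restart probability. The restart value 0.3, the column normalization, the function names and the toy path graph are assumptions; GrwLDA's exact formulation on lncRNA and disease networks may differ.

```python
def column_normalize(adj):
    """Turn an adjacency matrix into a column-stochastic transition matrix."""
    n = len(adj)
    col_sums = [sum(adj[i][j] for i in range(n)) for j in range(n)]
    return [[adj[i][j] / col_sums[j] if col_sums[j] else 0.0 for j in range(n)]
            for i in range(n)]


def random_walk_with_restart(adj, seeds, restart=0.3, tol=1e-10, max_iter=10000):
    """Steady-state visiting probabilities of a walker that restarts at the seed nodes."""
    n = len(adj)
    w = column_normalize(adj)
    p0 = [1.0 / len(seeds) if i in seeds else 0.0 for i in range(n)]
    p = p0[:]
    for _ in range(max_iter):
        # Move one step along the edges, then mix in the restart distribution.
        new_p = [(1 - restart) * sum(w[i][j] * p[j] for j in range(n)) + restart * p0[i]
                 for i in range(n)]
        if sum(abs(x - y) for x, y in zip(new_p, p)) < tol:
            return new_p
        p = new_p
    return p


# Path graph 0-1-2-3 with node 0 as the only seed.
path = [[0, 1, 0, 0], [1, 0, 1, 0], [0, 1, 0, 1], [0, 0, 1, 0]]
print(random_walk_with_restart(path, {0}))
```

Nodes closer to the seed receive higher steady-state probability, which is the quantity used to rank candidate associations. Because no negative examples enter the computation, this style of method does not need negative samples.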
35
Wang P, Guo Q, Gao Y, Zhi H, Zhang Y, Liu Y, Zhang J, Yue M, Guo M, Ning S, Zhang G, Li X. Improved method for prioritization of disease associated lncRNAs based on ceRNA theory and functional genomics data. Oncotarget 2017; 8:4642-4655. [PMID: 27992375 PMCID: PMC5354861 DOI: 10.18632/oncotarget.13964] [Citation(s) in RCA: 18] [Impact Index Per Article: 2.3] [Received: 08/20/2016] [Accepted: 12/07/2016] [Indexed: 02/01/2023]
Abstract
Although several computational models that predict disease-associated lncRNAs (long non-coding RNAs) exist, only a limited number of disease-associated lncRNAs are known. In this study, we mapped lncRNAs to their functional genomics context using competing endogenous RNA (ceRNA) theory. Based on the criterion that similar lncRNAs are likely to be involved in similar diseases, we proposed a disease lncRNA prioritization method, DisLncPri, to identify novel disease-lncRNA associations. Using a leave-one-out cross-validation (LOOCV) strategy, DisLncPri achieved reliable area under the curve (AUC) values of 0.89 and 0.87 for the LncRNADisease and Lnc2Cancer datasets, which further improved to 0.90 and 0.89 when a multiple rank fusion strategy was integrated. In case studies of Alzheimer's disease, ovarian cancer, pancreatic cancer and gastric cancer, DisLncPri had the highest rank enrichment score and AUC value compared with several other methods. Several novel lncRNAs ranked highly for these diseases were found to be newly verified by relevant databases or reported in recent studies. Prioritization of lncRNAs from a microarray dataset (GSE53622) of oesophageal cancer patients highlighted ENSG00000226029 (ranked second), a previously unidentified lncRNA, as a potential prognostic biomarker. Our analysis thus indicates that DisLncPri is an excellent tool for identifying lncRNAs that could serve as novel biomarkers and therapeutic targets in a variety of human diseases.
Affiliation(s)
- Peng Wang
- College of Bioinformatics Science and Technology, Harbin Medical University, Harbin, China
- Qiuyan Guo
- The First Affiliated Hospital, Harbin Medical University, Harbin, China
- Yue Gao
- College of Bioinformatics Science and Technology, Harbin Medical University, Harbin, China
- Hui Zhi
- College of Bioinformatics Science and Technology, Harbin Medical University, Harbin, China
- Yan Zhang
- College of Bioinformatics Science and Technology, Harbin Medical University, Harbin, China
- Yue Liu
- College of Bioinformatics Science and Technology, Harbin Medical University, Harbin, China
- Jizhou Zhang
- College of Bioinformatics Science and Technology, Harbin Medical University, Harbin, China
- Ming Yue
- College of Bioinformatics Science and Technology, Harbin Medical University, Harbin, China
- Maoni Guo
- College of Bioinformatics Science and Technology, Harbin Medical University, Harbin, China
- Shangwei Ning
- College of Bioinformatics Science and Technology, Harbin Medical University, Harbin, China
- Key Laboratory of Cardiovascular Medicine Research, Harbin Medical University, Ministry of Education, China
- Guangmei Zhang
- The First Affiliated Hospital, Harbin Medical University, Harbin, China
- Xia Li
- College of Bioinformatics Science and Technology, Harbin Medical University, Harbin, China
- Key Laboratory of Cardiovascular Medicine Research, Harbin Medical University, Ministry of Education, China
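DisLncPri's AUCs improved when a multiple rank fusion strategy was added. The abstract does not specify the fusion rule; the sketch below assumes the simplest option, averaging each candidate's rank across the individual rankings, and uses a hypothetical function name and toy candidates.

```python
def fuse_rankings(rankings):
    """Fuse ranked lists by mean rank (1 = best); ties are broken by candidate name.

    rankings: lists that each order the same candidates from best to worst.
    """
    totals = {}
    for ranking in rankings:
        for position, candidate in enumerate(ranking, start=1):
            totals[candidate] = totals.get(candidate, 0) + position
    mean_rank = {c: t / len(rankings) for c, t in totals.items()}
    return sorted(mean_rank, key=lambda c: (mean_rank[c], c))


# Three hypothetical rankings of the same lncRNA candidates.
print(fuse_rankings([["a", "b", "c"], ["b", "a", "c"], ["a", "c", "b"]]))  # ['a', 'b', 'c']
```

Mean-rank fusion rewards candidates that are consistently near the top of every list; other rules (for example, rank products or weighted scores) would also fit the description and may behave differently.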
36
A Digital Communication Analysis of Gene Expression of Proteins in Biological Systems: A Layered Network Model View. Cognit Comput 2016. [DOI: 10.1007/s12559-016-9434-4] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.2] [Indexed: 10/20/2022]
37
Wang J, Ma R, Ma W, Chen J, Yang J, Xi Y, Cui Q. LncDisease: a sequence based bioinformatics tool for predicting lncRNA-disease associations. Nucleic Acids Res 2016; 44:e90. [PMID: 26887819 PMCID: PMC4872090 DOI: 10.1093/nar/gkw093] [Citation(s) in RCA: 57] [Impact Index Per Article: 6.3] [Received: 11/02/2015] [Revised: 02/04/2016] [Accepted: 02/06/2016] [Indexed: 02/07/2023]
Abstract
LncRNAs represent a large class of noncoding RNA molecules that have important functions and play key roles in a variety of human diseases. There is an urgent need to develop bioinformatics tools to gain insight into lncRNAs. This study developed a sequence-based bioinformatics method, LncDisease, to predict lncRNA-disease associations based on the crosstalk between lncRNAs and miRNAs. Using LncDisease, we predicted the lncRNAs associated with breast cancer and hypertension. The breast-cancer-associated lncRNAs were studied in two breast tumor cell lines, MCF-7 and MDA-MB-231. qRT-PCR showed that 11 (91.7%) of the 12 predicted lncRNAs could be validated in both breast cancer cell lines. The hypertension-associated lncRNAs were further evaluated in human vascular smooth muscle cells (VSMCs) stimulated with angiotensin II (Ang II). qRT-PCR showed that 3 (75.0%) of the 4 predicted lncRNAs could be validated in Ang II-treated human VSMCs. In addition, we predicted 6 diseases associated with the lncRNA GAS5 and validated 4 (66.7%) of them by literature mining. These results strongly support the specificity and efficacy of LncDisease in the study of lncRNAs in human diseases. The LncDisease software is freely available on the Software Page: http://www.cuilab.cn/.
Affiliation(s)
- Junyi Wang
- Department of Biomedical Informatics, School of Basic Medical Sciences, Peking University, 38 Xueyuan Rd, Beijing 100191, China; MOE Key Lab of Cardiovascular Sciences, Peking University, 38 Xueyuan Road, Beijing 100191, China
- Ruixia Ma
- Mitchell Cancer Institute, University of South Alabama, 1160 Springhill Ave, Mobile, AL 36604, USA
- Wei Ma
- Department of Biomedical Informatics, School of Basic Medical Sciences, Peking University, 38 Xueyuan Rd, Beijing 100191, China; MOE Key Lab of Cardiovascular Sciences, Peking University, 38 Xueyuan Road, Beijing 100191, China; Department of Physiology and Pathophysiology, School of Basic Medical Sciences, Peking University, 38 Xueyuan Rd, Beijing 100191, China
- Ji Chen
- MOE Key Lab of Cardiovascular Sciences, Peking University, 38 Xueyuan Road, Beijing 100191, China; Department of Physiology and Pathophysiology, School of Basic Medical Sciences, Peking University, 38 Xueyuan Rd, Beijing 100191, China
- Jichun Yang
- MOE Key Lab of Cardiovascular Sciences, Peking University, 38 Xueyuan Road, Beijing 100191, China; Department of Physiology and Pathophysiology, School of Basic Medical Sciences, Peking University, 38 Xueyuan Rd, Beijing 100191, China
- Yaguang Xi
- Mitchell Cancer Institute, University of South Alabama, 1160 Springhill Ave, Mobile, AL 36604, USA
- Qinghua Cui
- Department of Biomedical Informatics, School of Basic Medical Sciences, Peking University, 38 Xueyuan Rd, Beijing 100191, China; MOE Key Lab of Cardiovascular Sciences, Peking University, 38 Xueyuan Road, Beijing 100191, China; Department of Physiology and Pathophysiology, School of Basic Medical Sciences, Peking University, 38 Xueyuan Rd, Beijing 100191, China; Beijing Key Laboratory of Tumor Systems Biology, Peking University, 38 Xueyuan Road, Beijing 100191, China
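LncDisease scores lncRNA-disease associations from lncRNA-miRNA crosstalk. A common way to ask whether an lncRNA shares more miRNA partners with a disease's miRNAs than expected by chance is a hypergeometric test, sketched below with the standard library only. Whether LncDisease uses this exact statistic is an assumption, and the function and parameter names are hypothetical.

```python
from math import comb


def shared_mirna_pvalue(overlap, total, n_lnc, n_dis):
    """Hypergeometric P(X >= overlap) for miRNAs shared by an lncRNA and a disease.

    total:   number of miRNAs in the background set
    n_lnc:   miRNAs that interact with the lncRNA
    n_dis:   miRNAs associated with the disease
    overlap: miRNAs in both sets
    """
    upper = min(n_lnc, n_dis)
    # Count the ways to draw the disease set with at least `overlap` lncRNA partners.
    hits = sum(comb(n_lnc, k) * comb(total - n_lnc, n_dis - k)
               for k in range(overlap, upper + 1))
    return hits / comb(total, n_dis)


print(shared_mirna_pvalue(3, 10, 3, 3))  # all 3 disease miRNAs are lncRNA partners: 1/120
```

Small p-values flag lncRNA-disease pairs whose shared miRNA partners are unlikely to arise by chance; in practice such scores would usually be corrected for multiple testing before ranking.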
38
Xiao H, Yuan Z, Guo D, Hou B, Yin C, Zhang W, Li F. Genome-wide identification of long noncoding RNA genes and their potential association with fecundity and virulence in rice brown planthopper, Nilaparvata lugens. BMC Genomics 2015; 16:749. [PMID: 26437919 PMCID: PMC4594746 DOI: 10.1186/s12864-015-1953-y] [Citation(s) in RCA: 57] [Impact Index Per Article: 5.7] [Received: 06/14/2015] [Accepted: 09/23/2015] [Indexed: 12/26/2022]
Abstract
Background: The functional repertoire of long noncoding RNA (lncRNA) has been characterized in several model organisms, demonstrating that lncRNAs play important roles in fundamental biological processes. However, lncRNAs remain largely unidentified in most species. Understanding the characteristics and functions of lncRNAs in insects would be useful for insect resource utilization and sustainable pest control. Methods: A computational pipeline was developed to identify lncRNA genes in the rice brown planthopper, Nilaparvata lugens, a destructive rice pest that causes huge yield losses. Strand-specific RT-PCR was used to determine the transcription orientation of the lncRNAs. Results: In total, 2,439 lncRNA transcripts corresponding to 1,882 loci were detected from 12 whole-transcriptome (RNA-seq) datasets, including samples from the high-fecundity (HFP), low-fecundity (LFP), I87i and C89i populations, as well as the Mudgo and TN1 virulence strains. The identified N. lugens lncRNAs had low sequence similarity to other known lncRNAs; however, their structural features were similar to those of their mammalian counterparts. N. lugens lncRNAs had shorter transcripts than protein-coding genes because of their lower exon number, although their exons and introns were longer. Only 19.9% of N. lugens lncRNAs had multiple alternatively spliced isoforms. We observed biases in the genomic location of N. lugens lncRNAs: more than 30% of them overlapped with known protein-coding genes. These lncRNAs tended to be co-expressed with their neighboring genes (Pearson correlation, p < 0.01, t-test) and might interact with adjacent protein-coding genes. Depending on the sample, 19 to 148 lncRNAs were specifically expressed in the HFP, LFP, Mudgo, TN1, I87i and C89i populations. Three lncRNAs specifically expressed in the HFP and LFP populations overlapped with reproduction-associated genes. Discussion: The structural features of N. lugens lncRNAs are similar to those of their mammalian counterparts.
Co-expression and functional analyses suggest that N. lugens lncRNAs might have important functions in high fecundity and virulence adaptability. Conclusions: This study provides the first catalog of lncRNA genes in the rice brown planthopper. Gene expression and genomic location analyses indicate that lncRNAs might play important roles in high fecundity and virulence adaptation in N. lugens. Electronic supplementary material: The online version of this article (doi:10.1186/s12864-015-1953-y) contains supplementary material, which is available to authorized users.
Affiliation(s)
- Huamei Xiao
- Department of Entomology, College of Plant Protection, Nanjing Agricultural University, Nanjing, 210095, China; Department of City Construction, Shaoyang University, Shaoyang, 422000, China
- Zhuting Yuan
- Department of Entomology, College of Plant Protection, Nanjing Agricultural University, Nanjing, 210095, China
- Dianhao Guo
- Department of Entomology, College of Plant Protection, Nanjing Agricultural University, Nanjing, 210095, China
- Bofeng Hou
- Department of Entomology, College of Plant Protection, Nanjing Agricultural University, Nanjing, 210095, China
- Chuanlin Yin
- Department of Entomology, College of Plant Protection, Nanjing Agricultural University, Nanjing, 210095, China
- Wenqing Zhang
- State Key Laboratory for Biocontrol/Institute of Entomology, Sun Yat Sen University, Guangzhou, 510275, China
- Fei Li
- Department of Entomology, College of Plant Protection, Nanjing Agricultural University, Nanjing, 210095, China; Ministry of Agriculture Key Lab of Agricultural Entomology, Institute of Insect Sciences, Zhejiang University, 866 Yuhangtang Road, Hangzhou, 310058, China
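The abstract does not list the filtering criteria of the identification pipeline. Many lncRNA pipelines discard transcripts shorter than 200 nt and those containing a long open reading frame (ORF); the sketch below applies those two filters, with the 200-nt and 100-codon thresholds stated as assumptions rather than the values used for N. lugens. Only the forward strand is scanned, assuming the transcript orientation is known (for example, from strand-specific RT-PCR). The function names are hypothetical.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def longest_orf_codons(seq):
    """Longest ATG-to-stop ORF on the forward strand, in codons (stop excluded).

    ORFs that run off the end of the sequence without a stop codon are ignored.
    """
    seq = seq.upper()
    best = 0
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i
            elif start is not None and codon in STOP_CODONS:
                best = max(best, (i - start) // 3)
                start = None
    return best


def is_lncrna_candidate(seq, min_length=200, max_orf_codons=100):
    """Keep transcripts that are long enough and lack a long ORF (assumed thresholds)."""
    return len(seq) >= min_length and longest_orf_codons(seq) < max_orf_codons


print(longest_orf_codons("ATG" + "GCT" * 5 + "TAA"))  # 6
```

Real pipelines usually add further steps, such as coding-potential scoring and removal of transcripts overlapping known protein-coding exons on the same strand, which this sketch omits.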