51
Dai Z, Ren J, Tong X, Hu H, Lu K, Dai F, Han MJ. The Landscapes of Full-Length Transcripts and Splice Isoforms as Well as Transposons Exonization in the Lepidopteran Model System, Bombyx mori. Front Genet 2021; 12:704162. [PMID: 34594358 PMCID: PMC8476886 DOI: 10.3389/fgene.2021.704162] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Received: 05/01/2021] [Accepted: 09/01/2021] [Indexed: 11/13/2022]
Abstract
The domesticated silkworm, Bombyx mori, is an important model system for the order Lepidoptera. A chromosome-level genome of Bombyx mori based on third-generation sequencing has been released. However, its transcripts were mainly assembled from short second-generation sequencing reads and expressed sequence tags, which cannot capture the transcript profile accurately. Here, we used PacBio Iso-Seq technology to investigate the transcripts from 45 developmental stages of Bombyx mori. We obtained 25,970 non-redundant high-quality consensus isoforms capturing ∼60% of previously reported RNAs and 15,431 (∼47%) novel transcripts, and identified 7,253 long non-coding RNAs (lncRNAs), a large proportion of them novel (∼56%). In addition, we found that exonization of transposable elements (TEs) accounts for 11,671 (∼45%) transcripts, including 5,980 protein-coding transcripts (∼32%) and 5,691 lncRNAs (∼79%). Overall, our results expand the set of known silkworm transcripts and have general implications for understanding the interaction between TEs and their host genes. This transcript resource will promote functional studies of genes and lncRNAs as well as TEs in the silkworm.
Affiliation(s)
- Zongrui Dai
  - State Key Laboratory of Silkworm Genome Biology, Key Laboratory of Sericultural Biology and Genetic Breeding, Ministry of Agriculture and Rural Affairs, College of Sericulture, Textile and Biomass Science, Southwest University, Chongqing, China
  - WESTA College, Southwest University, Chongqing, China
- Jianyu Ren
  - State Key Laboratory of Silkworm Genome Biology, Key Laboratory of Sericultural Biology and Genetic Breeding, Ministry of Agriculture and Rural Affairs, College of Sericulture, Textile and Biomass Science, Southwest University, Chongqing, China
- Xiaoling Tong
  - State Key Laboratory of Silkworm Genome Biology, Key Laboratory of Sericultural Biology and Genetic Breeding, Ministry of Agriculture and Rural Affairs, College of Sericulture, Textile and Biomass Science, Southwest University, Chongqing, China
- Hai Hu
  - State Key Laboratory of Silkworm Genome Biology, Key Laboratory of Sericultural Biology and Genetic Breeding, Ministry of Agriculture and Rural Affairs, College of Sericulture, Textile and Biomass Science, Southwest University, Chongqing, China
- Kunpeng Lu
  - State Key Laboratory of Silkworm Genome Biology, Key Laboratory of Sericultural Biology and Genetic Breeding, Ministry of Agriculture and Rural Affairs, College of Sericulture, Textile and Biomass Science, Southwest University, Chongqing, China
- Fangyin Dai
  - State Key Laboratory of Silkworm Genome Biology, Key Laboratory of Sericultural Biology and Genetic Breeding, Ministry of Agriculture and Rural Affairs, College of Sericulture, Textile and Biomass Science, Southwest University, Chongqing, China
- Min-Jin Han
  - State Key Laboratory of Silkworm Genome Biology, Key Laboratory of Sericultural Biology and Genetic Breeding, Ministry of Agriculture and Rural Affairs, College of Sericulture, Textile and Biomass Science, Southwest University, Chongqing, China
52
Hollander M, Do T, Will T, Helms V. Detecting Rewiring Events in Protein-Protein Interaction Networks Based on Transcriptomic Data. Front Bioinform 2021; 1:724297. [PMID: 36303788 PMCID: PMC9581068 DOI: 10.3389/fbinf.2021.724297] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 06/12/2021] [Accepted: 08/23/2021] [Indexed: 12/25/2022]
Abstract
Proteins rarely carry out their cellular functions in isolation. Instead, eukaryotic proteins engage in about six interactions with other proteins on average. The aggregated protein interactome of an organism forms a “hairy ball”-type protein-protein interaction (PPI) network. Yet, in a typical human cell, only about half of all proteins are expressed at a particular time. Hence, it has become common practice to prune the full PPI network to the subset of expressed proteins. If RNAseq data is available, one can further resolve the specific protein isoforms present in a cell or tissue. Here, we review various approaches, software tools and webservices that enable users to construct context-specific or tissue-specific PPI networks and how these are rewired between two cellular conditions. We illustrate their different functionalities on the example of the interactions involving the human TNR6 protein. In an outlook, we describe how PPI networks may be integrated with epigenetic data or with data on the activity of splicing factors.
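The pruning step this abstract describes can be sketched as follows. This is a minimal illustration only, not code from any of the reviewed tools: the function names, the protein identifiers `P1` to `P4`, and the expression cutoff of 1.0 are all assumptions made for the example.

```python
def prune_ppi_network(interactions, expression, threshold=1.0):
    """Keep only interactions whose two partners are both expressed.

    `threshold` is an assumed expression cutoff (for example in TPM).
    """
    expressed = {p for p, level in expression.items() if level >= threshold}
    return [(a, b) for a, b in interactions if a in expressed and b in expressed]


def rewired_edges(network_a, network_b):
    """Compare two condition-specific networks as sets of undirected edges."""
    a = {frozenset(edge) for edge in network_a}
    b = {frozenset(edge) for edge in network_b}
    return {"lost": a - b, "gained": b - a}


# Hypothetical full interactome and expression levels in two conditions.
interactions = [("P1", "P2"), ("P2", "P3"), ("P3", "P4")]
condition_a = {"P1": 5.0, "P2": 3.0, "P3": 0.2, "P4": 8.0}
condition_b = {"P1": 0.0, "P2": 4.0, "P3": 6.0, "P4": 7.0}

net_a = prune_ppi_network(interactions, condition_a)
net_b = prune_ppi_network(interactions, condition_b)
changes = rewired_edges(net_a, net_b)
```

Here `changes["lost"]` holds the edges present only in the first condition and `changes["gained"]` those present only in the second. Isoform-level resolution, which the review also covers, is not modelled in this sketch.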
53
Asim MN, Ibrahim MA, Imran Malik M, Dengel A, Ahmed S. Advances in Computational Methodologies for Classification and Sub-Cellular Locality Prediction of Non-Coding RNAs. Int J Mol Sci 2021; 22:8719. [PMID: 34445436 PMCID: PMC8395733 DOI: 10.3390/ijms22168719] [Citation(s) in RCA: 14] [Impact Index Per Article: 3.5] [Received: 07/17/2021] [Revised: 08/02/2021] [Accepted: 08/03/2021] [Indexed: 02/06/2023]
Abstract
Apart from protein-coding ribonucleic acids (RNAs), there exists a variety of non-coding RNAs (ncRNAs) which regulate complex cellular and molecular processes. High-throughput sequencing technologies and bioinformatics approaches have greatly advanced the exploration of ncRNAs, revealing their crucial roles in gene regulation, miRNA binding, protein interactions, and splicing. Furthermore, ncRNAs are involved in the development of complicated diseases like cancer. Categorization of ncRNAs is essential to understand the mechanisms of diseases and to develop effective treatments. Sub-cellular localization information of ncRNAs demystifies diverse functionalities of ncRNAs. To date, several computational methodologies have been proposed to precisely identify the class as well as sub-cellular localization patterns of RNAs. This paper discusses different types of ncRNAs, reviews computational approaches proposed in the last 10 years to distinguish coding RNA from ncRNA, to identify sub-types of ncRNAs such as piwi-associated RNA, micro RNA, long ncRNA, and circular RNA, and to determine sub-cellular localization of distinct ncRNAs and RNAs. Furthermore, it summarizes diverse ncRNA classification and sub-cellular localization determination datasets along with benchmark performance to aid the development and evaluation of novel computational methodologies. It identifies research gaps, heterogeneity, and challenges in the development of computational approaches for RNA sequence analysis. We consider that our expert analysis will assist Artificial Intelligence researchers with knowing state-of-the-art performance, model selection for various tasks on one platform, dominantly used sequence descriptors, neural architectures, and interpreting inter-species and intra-species performance deviation.
Affiliation(s)
- Muhammad Nabeel Asim
  - German Research Center for Artificial Intelligence (DFKI), 67663 Kaiserslautern, Germany
  - Department of Computer Science, Technical University of Kaiserslautern, 67663 Kaiserslautern, Germany
- Muhammad Ali Ibrahim
  - German Research Center for Artificial Intelligence (DFKI), 67663 Kaiserslautern, Germany
  - Department of Computer Science, Technical University of Kaiserslautern, 67663 Kaiserslautern, Germany
- Muhammad Imran Malik
  - National Center for Artificial Intelligence (NCAI), National University of Sciences and Technology, Islamabad 44000, Pakistan
  - School of Electrical Engineering & Computer Science, National University of Sciences and Technology, Islamabad 44000, Pakistan
- Andreas Dengel
  - German Research Center for Artificial Intelligence (DFKI), 67663 Kaiserslautern, Germany
  - Department of Computer Science, Technical University of Kaiserslautern, 67663 Kaiserslautern, Germany
- Sheraz Ahmed
  - German Research Center for Artificial Intelligence (DFKI), 67663 Kaiserslautern, Germany
  - DeepReader GmbH, Trippstadter Str. 122, 67663 Kaiserslautern, Germany
54
Tan J, Fang Z, Wu S, Guo Q, Jiang X, Zhu H. HoPhage: an ab initio tool for identifying hosts of phage fragments from metaviromes. Bioinformatics 2021; 38:543-545. [PMID: 34383025 PMCID: PMC8723153 DOI: 10.1093/bioinformatics/btab585] [Citation(s) in RCA: 17] [Impact Index Per Article: 4.3] [Received: 02/10/2021] [Revised: 07/27/2021] [Accepted: 08/10/2021] [Indexed: 02/03/2023]
Abstract
SUMMARY: We present HoPhage (Host of Phage) to identify the host of a given phage fragment from metavirome data at the genus level. HoPhage integrates two modules using a deep learning algorithm and a Markov chain model, respectively. HoPhage achieves 47.90% and 82.47% mean accuracy at the genus and phylum levels for ∼1-kb long artificial phage fragments when predicting host among 50 genera, representing 7.54-20.22% and 13.55-24.31% improvement, respectively. By testing on three real virome samples, HoPhage yields 81.11% mean accuracy at the genus level within a much broader candidate host range. AVAILABILITY AND IMPLEMENTATION: HoPhage is available at http://cqb.pku.edu.cn/ZhuLab/HoPhage/data/. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Jie Tan
  - State Key Laboratory for Turbulence and Complex Systems, Department of Biomedical Engineering, College of Engineering and Center for Quantitative Biology, Peking University, Beijing 100871, China
- Zhencheng Fang
  - State Key Laboratory for Turbulence and Complex Systems, Department of Biomedical Engineering, College of Engineering and Center for Quantitative Biology, Peking University, Beijing 100871, China
- Shufang Wu
  - State Key Laboratory for Turbulence and Complex Systems, Department of Biomedical Engineering, College of Engineering and Center for Quantitative Biology, Peking University, Beijing 100871, China
- Qian Guo
  - State Key Laboratory for Turbulence and Complex Systems, Department of Biomedical Engineering, College of Engineering and Center for Quantitative Biology, Peking University, Beijing 100871, China
- Xiaoqing Jiang
  - State Key Laboratory for Turbulence and Complex Systems, Department of Biomedical Engineering, College of Engineering and Center for Quantitative Biology, Peking University, Beijing 100871, China
55
Zheng H, Talukder A, Li X, Hu H. A systematic evaluation of the computational tools for lncRNA identification. Brief Bioinform 2021; 22:6343529. [PMID: 34368833 DOI: 10.1093/bib/bbab285] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Received: 05/12/2021] [Revised: 06/21/2021] [Accepted: 07/03/2021] [Indexed: 12/28/2022]
Abstract
The computational identification of long non-coding RNAs (lncRNAs) is important for studying lncRNAs and their functions. Despite the existence of many computational tools for lncRNA identification, to our knowledge, there is no systematic evaluation of these tools on common datasets and no consensus regarding their performance and the importance of the features used. To fill this gap, in this study, we assessed the performance of 17 tools on several common datasets. We also investigated the importance of the features used by the tools. We found that the deep learning-based tools have the best performance in terms of identifying lncRNAs, and the peptide features do not contribute much to the tool accuracy. Moreover, when the transcripts in a cell type were considered, the performance of all tools dropped significantly, and the deep learning-based tools were no longer as good as other tools. Our study will serve as an excellent starting point for selecting tools and features for lncRNA identification.
Affiliation(s)
- Hansi Zheng
  - Department of Computer Science, University of Central Florida, Orlando, FL, USA
- Amlan Talukder
  - Department of Computer Science, University of Central Florida, Orlando, FL, USA
- Xiaoman Li
  - Burnett School of Biomedical Science, University of Central Florida, Orlando, FL, USA
- Haiyan Hu
  - Department of Computer Science, University of Central Florida, Orlando, FL, USA
56
Riquier S, Mathieu M, Bessiere C, Boureux A, Ruffle F, Lemaitre JM, Djouad F, Gilbert N, Commes T. Long non-coding RNA exploration for mesenchymal stem cell characterisation. BMC Genomics 2021; 22:412. [PMID: 34088266 PMCID: PMC8178833 DOI: 10.1186/s12864-020-07289-0] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Received: 06/12/2020] [Accepted: 11/28/2020] [Indexed: 12/12/2022]
Abstract
BACKGROUND: The development of RNA sequencing (RNAseq) and the corresponding emergence of public datasets have created new avenues for transcriptional marker discovery. Long non-coding RNAs (lncRNAs) constitute an emerging class of transcripts with a potential for high tissue specificity and function. Therefore, we tested the biomarker potential of lncRNAs on mesenchymal stem cells (MSCs), a complex type of adult multipotent stem cell of diverse tissue origins that is frequently used in clinics but lacks extensive characterization. RESULTS: We developed a dedicated bioinformatics pipeline for building a cell-specific catalogue of unannotated lncRNAs. The pipeline performs ab initio transcript identification and pseudoalignment, and uses new methodologies such as a specific k-mer approach for naive quantification of expression in numerous RNAseq datasets. We next applied it to MSCs, and our pipeline was able to highlight novel lncRNAs with high cell specificity. Furthermore, with original and efficient approaches for functional prediction, we demonstrated that each candidate represents one specific state of MSC biology. CONCLUSIONS: We showed that our approach can be employed to harness lncRNAs as cell markers. More specifically, our results suggest different candidates as potential actors in MSC biology and propose promising directions for future experimental investigations.
Affiliation(s)
- Sébastien Riquier
  - IRMB, University of Montpellier, INSERM, 80 rue Augustin Fliche, Montpellier, France
- Marc Mathieu
  - IRMB, University of Montpellier, INSERM, 80 rue Augustin Fliche, Montpellier, France
- Chloé Bessiere
  - IRMB, University of Montpellier, INSERM, 80 rue Augustin Fliche, Montpellier, France
- Anthony Boureux
  - IRMB, University of Montpellier, INSERM, 80 rue Augustin Fliche, Montpellier, France
- Florence Ruffle
  - IRMB, University of Montpellier, INSERM, 80 rue Augustin Fliche, Montpellier, France
- Jean-Marc Lemaitre
  - IRMB, University of Montpellier, INSERM, 80 rue Augustin Fliche, Montpellier, France
- Farida Djouad
  - IRMB, University of Montpellier, INSERM, 80 rue Augustin Fliche, Montpellier, France
- Nicolas Gilbert
  - IRMB, University of Montpellier, INSERM, 80 rue Augustin Fliche, Montpellier, France
- Thérèse Commes
  - IRMB, University of Montpellier, INSERM, 80 rue Augustin Fliche, Montpellier, France
57
Li Y, Sun H, Feng S, Zhang Q, Han S, Du W. Capsule-LPI: a LncRNA-protein interaction predicting tool based on a capsule network. BMC Bioinformatics 2021; 22:246. [PMID: 33985444 PMCID: PMC8120853 DOI: 10.1186/s12859-021-04171-y] [Citation(s) in RCA: 21] [Impact Index Per Article: 5.3] [Received: 09/22/2020] [Accepted: 05/05/2021] [Indexed: 11/10/2022]
Abstract
BACKGROUND: Long noncoding RNAs (lncRNAs) play important roles in multiple biological processes. Identifying lncRNA-protein interactions (LPIs) is key to understanding lncRNA functions. Although some computational methods for LPIs have been developed, LPI prediction remains challenging. How to integrate multimodal features from more perspectives and build deep learning architectures with better recognition performance has always been the focus of research on LPIs. RESULTS: We present Capsule-LPI, a novel multichannel capsule network framework that integrates multimodal features for LPI prediction. Capsule-LPI integrates four groups of multimodal features, including sequence features, motif information, physicochemical properties and secondary structure features. Capsule-LPI is composed of four feature-learning subnetworks and one capsule subnetwork. Through comprehensive experimental comparisons and evaluations, we demonstrate that both multimodal features and the architecture of the multichannel capsule network can significantly improve the performance of LPI prediction. The experimental results show that Capsule-LPI performs better than the existing state-of-the-art tools. The precision of Capsule-LPI is 87.3%, which represents a 1.7% improvement. The F-value of Capsule-LPI is 92.2%, which represents a 1.4% improvement. CONCLUSIONS: This study provides a novel and feasible LPI prediction tool based on the integration of multimodal features and a capsule network. A web server (http://csbg-jlu.site/lpc/predict) is provided for the convenience of users.
Affiliation(s)
- Ying Li
  - Key Laboratory of Symbolic Computation and Knowledge Engineering, Ministry of Education, College of Computer Science and Technology, Jilin University, Qianjin Street, 130012, Changchun, China
- Hang Sun
  - Key Laboratory of Symbolic Computation and Knowledge Engineering, Ministry of Education, College of Computer Science and Technology, Jilin University, Qianjin Street, 130012, Changchun, China
- Shiyao Feng
  - Key Laboratory of Symbolic Computation and Knowledge Engineering, Ministry of Education, College of Computer Science and Technology, Jilin University, Qianjin Street, 130012, Changchun, China
- Qi Zhang
  - Key Laboratory of Symbolic Computation and Knowledge Engineering, Ministry of Education, College of Computer Science and Technology, Jilin University, Qianjin Street, 130012, Changchun, China
- Siyu Han
  - Key Laboratory of Symbolic Computation and Knowledge Engineering, Ministry of Education, College of Computer Science and Technology, Jilin University, Qianjin Street, 130012, Changchun, China
  - Department of Computer Science, Faculty of Engineering, University of Bristol, Bristol, BS8 1UB, UK
- Wei Du
  - Key Laboratory of Symbolic Computation and Knowledge Engineering, Ministry of Education, College of Computer Science and Technology, Jilin University, Qianjin Street, 130012, Changchun, China
58
PlncRNA-HDeep: plant long noncoding RNA prediction using hybrid deep learning based on two encoding styles. BMC Bioinformatics 2021; 22:242. [PMID: 33980138 PMCID: PMC8114701 DOI: 10.1186/s12859-020-03870-2] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.8] [Received: 10/29/2020] [Accepted: 11/09/2020] [Indexed: 11/10/2022]
Abstract
Background: Long noncoding RNAs (lncRNAs) play an important role in regulating biological activities, and their prediction is significant for exploring biological processes. Long short-term memory (LSTM) and convolutional neural network (CNN) models can automatically extract and learn abstract information from encoded RNA sequences, avoiding complex feature engineering. An ensemble model learns information from multiple perspectives and shows better performance than a single model. It is feasible and interesting to treat the RNA sequence as a sentence and as an image to train the LSTM and the CNN respectively, and then to hybridize the trained models to predict lncRNAs. To date, there are various predictors for lncRNAs, but few of them have been proposed for plants. A reliable and powerful predictor for plant lncRNAs is necessary. Results: To boost the performance of predicting lncRNAs, this paper proposes a hybrid deep learning model based on two encoding styles (PlncRNA-HDeep), which does not require prior knowledge and only uses RNA sequences to train the models for predicting plant lncRNAs. It not only learns diversified information from RNA sequences encoded by p-nucleotide and one-hot encodings, but also takes advantage of lncRNA-LSTM, proposed in our previous study, and CNN. The parameters are adjusted and three hybrid strategies are tested to maximize its performance. Experimental results show that PlncRNA-HDeep is more effective than lncRNA-LSTM and CNN and obtains 97.9% sensitivity, 95.1% precision, 96.5% accuracy and 96.5% F1 score on the Zea mays dataset, which are better than those of several shallow machine learning methods (support vector machine, random forest, k-nearest neighbor, decision tree, naive Bayes and logistic regression) and some existing tools (CNCI, PLEK, CPC2, LncADeep and lncRNAnet). Conclusions: PlncRNA-HDeep is feasible and obtains credible prediction results. It may also provide valuable references for other related research.
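The two encoding styles named in this abstract can be illustrated with a short sketch. It is not the authors' implementation: the function names, the choice of p = 3, the non-overlapping windows, and the handling of unknown bases are all assumptions made for the example.

```python
from itertools import product

ALPHABET = "ACGU"


def normalise(seq):
    """Upper-case the sequence and read DNA-style T as U."""
    return seq.upper().replace("T", "U")


def one_hot(seq):
    """Image-like encoding: one row of four 0/1 values per base.

    Bases outside ACGU (for example N) become all-zero rows (an assumption).
    """
    return [[1 if base == letter else 0 for letter in ALPHABET]
            for base in normalise(seq)]


def p_nucleotide_tokens(seq, p=3):
    """Sentence-like encoding: split into p-mers and map each to an integer.

    Non-overlapping windows are assumed; index 0 is reserved for p-mers
    containing unknown bases, and a trailing partial window is dropped.
    """
    vocab = {"".join(kmer): i + 1
             for i, kmer in enumerate(product(ALPHABET, repeat=p))}
    seq = normalise(seq)
    return [vocab.get(seq[i:i + p], 0) for i in range(0, len(seq) - p + 1, p)]


matrix = one_hot("AUGN")
tokens = p_nucleotide_tokens("AUGGCCA", p=3)
```

In a setup like the one described, the integer tokens would feed an embedding layer in front of an LSTM and the one-hot matrix would feed a CNN; those model stages are outside the scope of this sketch.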
59
Singh D, Madhawan A, Roy J. Identification of multiple RNAs using feature fusion. Brief Bioinform 2021; 22:6272794. [PMID: 33971667 DOI: 10.1093/bib/bbab178] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/27/2021] [Revised: 04/08/2021] [Indexed: 11/13/2022]
Abstract
Detection of novel transcripts with deep sequencing has increased the demand for computational algorithms, as their identification and validation using in vivo techniques are time-consuming, costly and unreliable. Most of these discovered transcripts belong to non-coding RNAs, a large group known for its diverse functional roles but lacking a common taxonomy. Thus, once a transcript is found to lack coding potential, it is crucial to recognize its primary functional category. To address this heterogeneity issue, we divide the ncRNAs into three classes and present RNA classifier (RNAC), which categorizes RNAs into coding, housekeeping, small non-coding and long non-coding classes. RNAC uses alignment-based genomic descriptors to extract statistical, local binary pattern and histogram features and fuses them to construct classification models with extreme gradient boosting. The experiments are performed on four species, and the performance is assessed on multiclass and conventional binary classification (coding versus non-coding) problems. The proposed approach achieved >93% accuracy on both classification problems and also outperformed other well-known existing methods in coding potential prediction. This validates the usefulness of feature fusion for improved performance on both types of classification problems. Hence, RNAC is a valuable tool for the accurate identification of multiple RNAs.
Affiliation(s)
- Dalwinder Singh
  - National Agri-Food Biotechnology Institute, Sector 81, SAS Nagar, 140306, Punjab, India
- Akansha Madhawan
  - National Agri-Food Biotechnology Institute, Sector 81, SAS Nagar, 140306, Punjab, India
- Joy Roy
  - National Agri-Food Biotechnology Institute, Sector 81, SAS Nagar, 140306, Punjab, India
60
Shen ZA, Luo T, Zhou YK, Yu H, Du PF. NPI-GNN: Predicting ncRNA-protein interactions with deep graph neural networks. Brief Bioinform 2021; 22:6210071. [PMID: 33822882 DOI: 10.1093/bib/bbab051] [Citation(s) in RCA: 40] [Impact Index Per Article: 10.0] [Received: 11/17/2020] [Revised: 01/29/2021] [Accepted: 02/01/2021] [Indexed: 12/23/2022]
Abstract
Noncoding RNAs (ncRNAs) play crucial roles in many biological processes. Experimental methods for identifying ncRNA-protein interactions (NPIs) are always costly and time-consuming. Many computational approaches have been developed as alternatives. In this work, we collected five benchmarking datasets for predicting NPIs. Based on these datasets, we evaluated and compared the prediction performance of existing machine learning-based methods. The graph neural network (GNN) is a recently developed deep learning algorithm for link prediction on complex networks, which has never been applied to predicting NPIs. We constructed a GNN-based method, called Noncoding RNA-Protein Interaction prediction using Graph Neural Networks (NPI-GNN), to predict NPIs. The NPI-GNN method achieved performance comparable to state-of-the-art methods in a 5-fold cross-validation. In addition, it is capable of predicting novel interactions based on network information and sequence information. We also found that insufficient sequence information does not affect the NPI-GNN prediction performance much, which makes NPI-GNN more robust than other methods. As far as we can tell, NPI-GNN is the first end-to-end GNN predictor for predicting NPIs. All benchmarking datasets in this work and all source code of the NPI-GNN method have been deposited with documentation in a GitHub repository (https://github.com/AshuiRUA/NPI-GNN).
Affiliation(s)
- Zi-Ang Shen
  - College of Intelligence and Computing, Tianjin University, Tianjin 300350, China
- Tao Luo
  - College of Intelligence and Computing, Tianjin University, Tianjin 300350, China
- Yuan-Ke Zhou
  - College of Intelligence and Computing, Tianjin University, Tianjin 300350, China
- Han Yu
  - College of Intelligence and Computing, Tianjin University, Tianjin 300350, China
- Pu-Feng Du
  - College of Intelligence and Computing, Tianjin University, Tianjin 300350, China
61
Xu X, Liu S, Yang Z, Zhao X, Deng Y, Zhang G, Pang J, Zhao C, Zhang W. A systematic review of computational methods for predicting long noncoding RNAs. Brief Funct Genomics 2021; 20:162-173. [PMID: 33754153 DOI: 10.1093/bfgp/elab016] [Citation(s) in RCA: 11] [Impact Index Per Article: 2.8] [Received: 12/15/2020] [Revised: 02/20/2021] [Accepted: 02/22/2021] [Indexed: 12/20/2022]
Abstract
Accurately and rapidly distinguishing long noncoding RNAs (lncRNAs) from other transcripts is a prerequisite for exploring their biological functions. In recent years, many computational methods have been developed to predict lncRNAs from transcripts, but there is no systematic review of these computational methods. In this review, we introduce databases and features involved in the development of computational prediction models, and subsequently summarize existing state-of-the-art computational methods, including methods based on binary classifiers, deep learning and ensemble learning. However, a user-friendly way of employing existing state-of-the-art computational methods is in demand. Therefore, we develop a Python package, ezLncPred, which provides a pragmatic command line implementation of nine state-of-the-art lncRNA prediction methods. Finally, we discuss challenges of lncRNA prediction and future directions.
64
Shaw D, Chen H, Xie M, Jiang T. DeepLPI: a multimodal deep learning method for predicting the interactions between lncRNAs and protein isoforms. BMC Bioinformatics 2021; 22:24. [PMID: 33461501 PMCID: PMC7814738 DOI: 10.1186/s12859-020-03914-7] [Citation(s) in RCA: 13] [Impact Index Per Article: 3.3] [Received: 07/15/2020] [Accepted: 11/30/2020] [Indexed: 12/14/2022]
Abstract
BACKGROUND: Long non-coding RNAs (lncRNAs) regulate diverse biological processes via interactions with proteins. Since the experimental methods to identify these interactions are expensive and time-consuming, many computational methods have been proposed. Although these computational methods have achieved promising prediction performance, they neglect the fact that a gene may encode multiple protein isoforms and different isoforms of the same gene may interact differently with the same lncRNA. RESULTS: In this study, we propose a novel method, DeepLPI, for predicting the interactions between lncRNAs and protein isoforms. Our method uses sequence and structure data to extract intrinsic features and expression data to extract topological features. To combine these different data, we adopt a hybrid framework by integrating a multimodal deep learning neural network and a conditional random field. To overcome the lack of known interactions between lncRNAs and protein isoforms, we apply a multiple instance learning (MIL) approach. In our experiment concerning the human lncRNA-protein interactions in the NPInter v3.0 database, DeepLPI improved the prediction performance by 4.7% in terms of AUC and 5.9% in terms of AUPRC over the state-of-the-art methods. Our further correlation analyses between interacting lncRNAs and protein isoforms also illustrated that their co-expression information helped predict the interactions. Finally, we give some examples where DeepLPI was able to outperform the other methods in predicting mouse lncRNA-protein interactions and novel human lncRNA-protein interactions. CONCLUSION: Our results demonstrated that the use of isoforms and MIL contributed significantly to the improvement of performance in predicting lncRNA and protein interactions. We believe that such an approach would find more applications in predicting other functional roles of RNAs and proteins.
Affiliation(s)
- Dipan Shaw
  - Department of Computer Science and Engineering, University of California, Riverside, CA 92521 USA
- Hao Chen
  - Department of Computer Science and Engineering, University of California, Riverside, CA 92521 USA
- Minzhu Xie
  - College of Information Science and Engineering, Hunan Normal University, Changsha, China
- Tao Jiang
  - Department of Computer Science and Engineering, University of California, Riverside, CA 92521 USA
  - Bioinformatics Division, BNRIST/Department of Computer Science and Technology, Tsinghua University, Beijing, China
65
Abstract
Background: Many transcripts have been generated due to the development of sequencing technologies, and lncRNA is an important type of transcript. Predicting lncRNAs from transcripts is a challenging and important task. Traditional experimental lncRNA prediction methods are time-consuming and labor-intensive. Efficient computational methods for lncRNA prediction are in demand. Results: In this paper, we propose two lncRNA prediction methods based on feature ensemble learning strategies, named LncPred-IEL and LncPred-ANEL. Specifically, we encode sequences into six different types of features, including transcript-specified features and general sequence-derived features. Then we consider two feature ensemble strategies to utilize and integrate the information in different feature types: iterative ensemble learning (IEL) and attention network ensemble learning (ANEL). IEL ensembles base predictors built on the six different types of features in a supervised, iterative way. ANEL introduces an attention mechanism-based deep learning model to ensemble features by adaptively learning the weight of individual feature types. Experiments demonstrate that both LncPred-IEL and LncPred-ANEL can effectively separate lncRNAs and other transcripts in feature space. Moreover, comparison experiments demonstrate that LncPred-IEL and LncPred-ANEL outperform several state-of-the-art methods when evaluated by 5-fold cross-validation. Both methods perform well in cross-species lncRNA prediction. Conclusions: LncPred-IEL and LncPred-ANEL are promising lncRNA prediction tools that can effectively utilize and integrate the information in different types of features.
Affiliation(s)
- Yanzhen Xu
- College of Informatics, Huazhong Agricultural University, Wuhan, 430070, China
| | - Xiaohan Zhao
- College of Informatics, Huazhong Agricultural University, Wuhan, 430070, China
| | - Shuai Liu
- College of Informatics, Huazhong Agricultural University, Wuhan, 430070, China
| | - Wen Zhang
- College of Informatics, Huazhong Agricultural University, Wuhan, 430070, China.
| |
|
66
|
Alam T, Al-Absi HRH, Schmeier S. Deep Learning in LncRNAome: Contribution, Challenges, and Perspectives. Noncoding RNA 2020; 6:E47. [PMID: 33266128 PMCID: PMC7711891 DOI: 10.3390/ncrna6040047] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.2] [Received: 07/25/2020] [Revised: 10/27/2020] [Accepted: 11/06/2020] [Indexed: 12/11/2022]
Abstract
Long non-coding RNAs (lncRNA), the pervasively transcribed part of the mammalian genome, have played a significant role in changing our protein-centric view of genomes. The abundance of lncRNAs and their diverse roles across cell types have opened numerous avenues for the research community regarding lncRNAome. To discover and understand lncRNAome, many sophisticated computational techniques have been leveraged. Recently, deep learning (DL)-based modeling techniques have been successfully used in genomics due to their capacity to handle large amounts of data and produce relatively better results than traditional machine learning (ML) models. DL-based modeling techniques have now become a choice for many modeling tasks in the field of lncRNAome as well. In this review article, we summarized the contribution of DL-based methods in nine different lncRNAome research areas. We also outlined DL-based techniques leveraged in lncRNAome, highlighting the challenges computational scientists face while developing DL-based models for lncRNAome. To the best of our knowledge, this is the first review article that summarizes the role of DL-based techniques in multiple areas of lncRNAome.
Affiliation(s)
- Tanvir Alam
- College of Science and Engineering, Hamad Bin Khalifa University, Doha 34110, Qatar;
| | - Hamada R. H. Al-Absi
- College of Science and Engineering, Hamad Bin Khalifa University, Doha 34110, Qatar;
| | - Sebastian Schmeier
- School of Natural and Computational Sciences, Massey University, Auckland 0632, New Zealand;
| |
|
67
|
Li J, Zhang X, Liu C. The computational approaches of lncRNA identification based on coding potential: Status quo and challenges. Comput Struct Biotechnol J 2020; 18:3666-3677. [PMID: 33304463 PMCID: PMC7710504 DOI: 10.1016/j.csbj.2020.11.030] [Citation(s) in RCA: 24] [Impact Index Per Article: 4.8] [Received: 04/30/2020] [Revised: 11/15/2020] [Accepted: 11/16/2020] [Indexed: 12/13/2022]
Abstract
Long noncoding RNAs (lncRNAs) make up a large proportion of the transcriptome in eukaryotes and have been shown to perform many regulatory functions in various biological processes. When studying lncRNAs, the first step is to accurately and specifically distinguish them within colossal transcriptome data of complicated composition, which contain mRNAs, lncRNAs, small RNAs and their primary transcripts. In the face of such huge and progressively expanding transcriptome data, in-silico approaches provide a practicable scheme for effectively and rapidly filtering out lncRNA targets, using machine learning and probability statistics. In this review, we mainly discussed the characteristics of the algorithms and features of currently developed approaches. We also outlined the traits of some state-of-the-art tools for ease of operation. Finally, we pointed out the underlying challenges in lncRNA identification with the advent of new experimental data.
Affiliation(s)
- Jing Li
- CAS Key Laboratory of Tropical Plant Resources and Sustainable Use, Xishuangbanna Tropical Botanical Garden, Chinese Academy of Sciences, Menglun, Mengla, Yunnan 666303, China
- Center of Economic Botany, Core Botanical Gardens, Chinese Academy of Sciences, Menglun, Mengla, Yunnan 666303, China
| | - Xuan Zhang
- CAS Key Laboratory of Tropical Plant Resources and Sustainable Use, Xishuangbanna Tropical Botanical Garden, Chinese Academy of Sciences, Menglun, Mengla, Yunnan 666303, China
| | - Changning Liu
- CAS Key Laboratory of Tropical Plant Resources and Sustainable Use, Xishuangbanna Tropical Botanical Garden, Chinese Academy of Sciences, Menglun, Mengla, Yunnan 666303, China
- Center of Economic Botany, Core Botanical Gardens, Chinese Academy of Sciences, Menglun, Mengla, Yunnan 666303, China
- The Innovative Academy of Seed Design, Chinese Academy of Sciences, Menglun, Mengla, Yunnan 666303, China
| |
|
68
|
Identification of Key Genes Involved in Acute Myocardial Infarction by Comparative Transcriptome Analysis. BIOMED RESEARCH INTERNATIONAL 2020; 2020:1470867. [PMID: 33083450 PMCID: PMC7559508 DOI: 10.1155/2020/1470867] [Citation(s) in RCA: 16] [Impact Index Per Article: 3.2] [Received: 07/13/2020] [Revised: 08/26/2020] [Accepted: 09/11/2020] [Indexed: 11/26/2022]
Abstract
Background Acute myocardial infarction (AMI) is regarded as an urgent clinical entity, and identification of differentially expressed genes, lncRNAs, and altered pathways shall provide new insight into the molecular mechanisms behind AMI. Materials and Methods Microarray data was collected to identify key genes and lncRNAs involved in AMI pathogenesis. The differential expression analysis and gene set enrichment analysis (GSEA) were employed to identify the upregulated and downregulated genes and pathways in AMI. The protein-protein interaction network and protein-RNA interaction analysis were utilized to reveal key long noncoding RNAs. Results In the present study, we utilized gene expression profiles of circulating endothelial cells (CEC) from 49 patients of AMI and 50 controls and identified a total of 552 differentially expressed genes (DEGs). Based on these DEGs, we also observed that inflammatory response-related genes and pathways were highly upregulated in AMI. Mapping the DEGs to the protein-protein interaction (PPI) network and identifying the subnetworks, we found that OMD and WDFY3 were the hub nodes of two subnetworks with the highest connectivity, which were found to be involved in circadian rhythm and organ- or tissue-specific immune response. Furthermore, 23 lncRNAs were differentially expressed between AMI and control groups. Specifically, we identified some functional lncRNAs, including XIST and its antisense RNA, TSIX, and three lncRNAs (LINC00528, LINC00936, and LINC01001), which were predicted to be interacting with TLR2 and participate in Toll-like receptor signaling pathway. In addition, we also employed the MMPC algorithm to identify six gene signatures for AMI diagnosis. Particularly, the multivariable SVM model based on the six genes has achieved a satisfying performance (AUC = 0.97). 
Conclusion In conclusion, we have identified key regulatory lncRNAs implicated in AMI, which not only deepens our understanding of the lncRNA-related molecular mechanism of AMI but also provides computationally predicted regulatory lncRNAs for AMI researchers.
|
69
|
Towards a comprehensive pipeline to identify and functionally annotate long noncoding RNA (lncRNA). Comput Biol Med 2020; 127:104028. [PMID: 33126123 DOI: 10.1016/j.compbiomed.2020.104028] [Citation(s) in RCA: 11] [Impact Index Per Article: 2.2] [Received: 08/04/2020] [Revised: 09/28/2020] [Accepted: 09/29/2020] [Indexed: 12/20/2022]
Abstract
Long noncoding RNAs (lncRNAs) are implicated in various genetic diseases and cancer, attributed to their critical role in gene regulation. They are a divergent group of RNAs and are easily differentiated from other types with unique characteristics, functions, and mechanisms of action. In this review, we provide a list of some of the prominent data repositories containing lncRNAs, their interactome, and predicted and validated disease associations. Next, we discuss various wet-lab experiments formulated to obtain the data for these repositories. We also provide a critical review of in silico methods available for the identification purpose and suggest techniques to further improve their performance. The bulk of the methods currently focus on distinguishing lncRNA transcripts from the coding ones. Functional annotation of these transcripts still remains a grey area and more efforts are needed in that space. Finally, we provide details of current progress, discuss impediments, and illustrate a roadmap for developing a generalized computational pipeline for comprehensive annotation of lncRNAs, which is essential to accelerate research in this area.
|
70
|
LncLocation: Efficient Subcellular Location Prediction of Long Non-Coding RNA-Based Multi-Source Heterogeneous Feature Fusion. Int J Mol Sci 2020; 21:ijms21197271. [PMID: 33019721 PMCID: PMC7582431 DOI: 10.3390/ijms21197271] [Citation(s) in RCA: 15] [Impact Index Per Article: 3.0] [Received: 08/23/2020] [Revised: 09/27/2020] [Accepted: 09/28/2020] [Indexed: 12/13/2022]
Abstract
Recent studies show that the subcellular location of long non-coding RNAs (lncRNAs) can provide significant information on their function. Due to the lack of experimental data, the number of lncRNAs with experimentally verified subcellular localization is very limited, and the numbers of lncRNAs located in different organelles are wildly imbalanced. The prediction of the subcellular location of lncRNAs is therefore a multi-class, small-sample, imbalanced classification problem. The imbalance of the data results in poor recognition by machine learning models on small data subsets, which is a puzzling and challenging problem in existing research. In this study, we integrate multi-source features to construct a sequence-based computational tool, lncLocation, to predict the subcellular location of lncRNAs. An autoencoder is used to enhance part of the features, and a binomial distribution-based filtering method and recursive feature elimination (RFE) are used to filter some of the features. This improves the representation ability of the data and reduces the problem of unbalanced multi-classification data. Through comprehensive experiments on different feature combinations and machine learning models, we select the optimal features and classifier model scheme to construct a subcellular location prediction tool, lncLocation. LncLocation obtains an 87.78% accuracy using 5-fold cross-validation on the benchmark data, which is higher than that of the state-of-the-art tools, and the classification performance, especially for small class sets, is improved significantly.
|
71
|
Varela-Martínez E, Bilbao-Arribas M, Abendaño N, Asín J, Pérez M, de Andrés D, Luján L, Jugo BM. Whole transcriptome approach to evaluate the effect of aluminium hydroxide in ovine encephalon. Sci Rep 2020; 10:15240. [PMID: 32943671 PMCID: PMC7498608 DOI: 10.1038/s41598-020-71905-y] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.6] [Received: 11/12/2019] [Accepted: 08/10/2020] [Indexed: 12/18/2022]
Abstract
Aluminium hydroxide adjuvants are crucial for livestock and human vaccines. Few studies have analysed their effect on the central nervous system in vivo. In this work, lambs received three different treatments of parallel subcutaneous inoculations over 16 months: aluminium-containing commercial vaccines, an equivalent dose of aluminium hydroxide, or mock injections. Brain samples were sequenced by RNA-seq and miRNA-seq for the expression analysis of mRNAs, long non-coding RNAs and microRNAs, and three expression comparisons were made. Although few differentially expressed genes were identified, some genes dysregulated by aluminium hydroxide alone were linked to neurological functions, the lncRNA TUNA among them, or were enriched in functions related to mitochondrial energy metabolism. In the same way, miRNA expression was mainly disrupted by the adjuvant-alone treatment. Some differentially expressed miRNAs had been previously linked to neurological diseases, oxidative stress and apoptosis. In brief, in this study aluminium hydroxide alone altered the transcriptome of the encephalon to a higher degree than commercial vaccines, which had a milder effect. The expression changes in the animals inoculated with aluminium hydroxide suggest mitochondrial dysfunction. Further research is needed to elucidate to what extent these changes could have pathological consequences.
Affiliation(s)
- Endika Varela-Martínez
- Department of Genetics, Physical Anthropology and Animal Physiology, Faculty of Science and Technology, University of the Basque Country (UPV/EHU), Leioa, Spain
| | - Martin Bilbao-Arribas
- Department of Genetics, Physical Anthropology and Animal Physiology, Faculty of Science and Technology, University of the Basque Country (UPV/EHU), Leioa, Spain
| | - Naiara Abendaño
- Department of Genetics, Physical Anthropology and Animal Physiology, Faculty of Science and Technology, University of the Basque Country (UPV/EHU), Leioa, Spain
| | - Javier Asín
- Department of Animal Pathology, University of Zaragoza, Zaragoza, Spain
| | - Marta Pérez
- Department of Animal Pathology, University of Zaragoza, Zaragoza, Spain
| | - Damián de Andrés
- Institute of Agrobiotechnology (CSIC-UPNA-Gov. Navarra), Navarra, Spain
| | - Lluís Luján
- Department of Animal Pathology, University of Zaragoza, Zaragoza, Spain
| | - Begoña M Jugo
- Department of Genetics, Physical Anthropology and Animal Physiology, Faculty of Science and Technology, University of the Basque Country (UPV/EHU), Leioa, Spain.
| |
|
72
|
Ke ZP, Xu YJ, Wang ZS, Sun J. RNA sequencing profiling reveals key mRNAs and long noncoding RNAs in atrial fibrillation. J Cell Biochem 2020; 121:3752-3763. [PMID: 31680326 DOI: 10.1002/jcb.29504] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.0] [Received: 05/07/2019] [Accepted: 10/08/2019] [Indexed: 01/24/2023]
Abstract
Long noncoding RNAs (lncRNAs) are an emerging class of RNA species that could participate in some critical pathways and disease pathogenesis. However, the underlying molecular mechanism of lncRNAs in atrial fibrillation (AF) is still not fully understood. In the present study, we analyzed RNA-seq data of paired left and right atrial appendages from five patients with AF and five other patients without AF. Based on the gene expression profiles of 20 samples, we found that a majority of genes were aberrantly expressed in both left and right atrial appendages of patients with AF. Similarly, the dysregulated pathways in the left and right atrial appendages of patients with AF also bore a close resemblance. Moreover, we predicted regulatory lncRNAs that regulated the expression of adjacent protein-coding genes (PCGs) or interacted with proteins. We identified that NPPA and its antisense RNA NPPA-AS1 may participate in the pathogenesis of AF by regulating muscle contraction. We also identified that RP11-99E15.2 and RP3-523K23.2 could interact with proteins ITGB3 and HSF2, respectively. RP11-99E15.2 and RP3-523K23.2 may participate in the pathogenesis of AF via regulating extracellular matrix binding and the transcription of HSF2 target genes, respectively. The close association of the lncRNA-interacting proteins with AF further demonstrated that these two lncRNAs were also associated with AF. In conclusion, we have identified key regulatory lncRNAs implicated in AF, which not only improves our understanding of the lncRNA-related molecular mechanism underlying AF but also provides computationally predicted regulatory lncRNAs for AF researchers.
Affiliation(s)
- Zun-Ping Ke
- Department of Cardiology, The Fifth People's Hospital of Shanghai, Fudan University, Shanghai, China
| | - Ying-Jia Xu
- Department of Cardiology, The Fifth People's Hospital of Shanghai, Fudan University, Shanghai, China
| | - Zhang-Sheng Wang
- Department of Cardiology, The Fifth People's Hospital of Shanghai, Fudan University, Shanghai, China
| | - Jian Sun
- Department of Cardiology, Xinhua Hospital, School of Medicine, Shanghai Jiao Tong University, Shanghai, China.,Clinical Research Unit, Xinhua Hospital, Shanghai Jiao Tong University, School of Medicine, Shanghai, China
| |
|
73
|
lncRNA_Mdeep: An Alignment-Free Predictor for Distinguishing Long Non-Coding RNAs from Protein-Coding Transcripts by Multimodal Deep Learning. Int J Mol Sci 2020; 21:ijms21155222. [PMID: 32718000 PMCID: PMC7432689 DOI: 10.3390/ijms21155222] [Citation(s) in RCA: 13] [Impact Index Per Article: 2.6] [Received: 06/10/2020] [Revised: 07/14/2020] [Accepted: 07/16/2020] [Indexed: 01/04/2023]
Abstract
Long non-coding RNAs (lncRNAs) play crucial roles in diverse biological processes and complex human diseases. Distinguishing lncRNAs from protein-coding transcripts is a fundamental step in analyzing lncRNA functional mechanisms. However, the experimental identification of lncRNAs is expensive and time-consuming. In this study, we presented an alignment-free multimodal deep learning framework (namely lncRNA_Mdeep) to distinguish lncRNAs from protein-coding transcripts. LncRNA_Mdeep incorporated three different input modalities; a multimodal deep learning framework was then built to learn high-level abstract representations and to predict the probability that a transcript is a lncRNA. LncRNA_Mdeep achieved 98.73% prediction accuracy in a 10-fold cross-validation test on humans. In an independent test on humans, lncRNA_Mdeep showed 93.12% prediction accuracy, which was 0.94% to 15.41% higher than that of eight other state-of-the-art methods. In addition, the results on 11 cross-species datasets showed that lncRNA_Mdeep was a powerful predictor of lncRNAs.
|
74
|
Wang Y, Bhattacharya T, Jiang Y, Qin X, Wang Y, Liu Y, Saykin AJ, Chen L. A novel deep learning method for predictive modeling of microbiome data. Brief Bioinform 2020; 22:5835556. [PMID: 32406914 DOI: 10.1093/bib/bbaa073] [Citation(s) in RCA: 17] [Impact Index Per Article: 3.4] [Received: 11/17/2019] [Revised: 02/22/2020] [Accepted: 04/10/2020] [Indexed: 12/22/2022]
Abstract
With the development and decreasing cost of next-generation sequencing technologies, the study of the human microbiome has become a rapidly expanding research field, which provides an unprecedented opportunity in various clinical applications such as drug response prediction and disease diagnosis. It is thus essential and desirable to build a prediction model for clinical outcomes based on microbiome data, which usually consist of taxon abundance and a phylogenetic tree. Importantly, microbial species are not uniformly distributed in the phylogenetic tree but tend to be clustered at different phylogenetic depths. Therefore, the phylogenetic tree represents a unique correlation structure of the microbiome, which can be an important prior to improve prediction performance. However, prediction methods that consider the phylogenetic tree in an efficient and rigorous way are under-developed. Here, we develop a novel deep learning prediction method, MDeep (microbiome-based deep learning method), to predict both continuous and binary outcomes. Conceptually, MDeep designs convolutional layers to mimic taxonomic ranks, with multiple convolutional filters on each convolutional layer to capture the phylogenetic correlation among microbial species in a local receptive field and maintain the correlation structure across different convolutional layers via feature mapping. Taken together, the convolutional layers with their built-in convolutional filters capture microbial signals at different taxonomic levels while encouraging local smoothing and preserving local connectivity induced by the phylogenetic tree. We use both simulation studies and real data applications to demonstrate that MDeep outperforms competing methods in both regression and binary classification. Availability and Implementation: MDeep software is available at https://github.com/lichen-lab/MDeep Contact: chen61@iu.edu.
|
75
|
Zhang Y, Jia C, Fullwood MJ, Kwoh CK. DeepCPP: a deep neural network based on nucleotide bias information and minimum distribution similarity feature selection for RNA coding potential prediction. Brief Bioinform 2020; 22:2073-2084. [PMID: 32227075 DOI: 10.1093/bib/bbaa039] [Citation(s) in RCA: 39] [Impact Index Per Article: 7.8] [Received: 12/20/2019] [Revised: 02/24/2020] [Accepted: 02/25/2020] [Indexed: 12/22/2022]
Abstract
The development of deep sequencing technologies has led to the discovery of novel transcripts. Many in silico methods have been developed to assess the coding potential of these transcripts to further investigate their functions. Existing methods perform well at distinguishing the majority of long noncoding RNAs (lncRNAs) from coding RNAs (mRNAs) but poorly on RNAs with small open reading frames (sORFs). Here, we present DeepCPP (deep neural network for coding potential prediction), a deep learning method for RNA coding potential prediction. Extensive evaluations on four previous datasets and six new datasets constructed in different species show that DeepCPP outperforms other state-of-the-art methods, especially on sORF type data, which overcomes the bottleneck of sORF mRNA identification by improving accuracy by more than 4.31, 37.24 and 5.89% for newly discovered human, vertebrate and insect data, respectively. Additionally, we revealed that discontinuous k-mers, together with our newly proposed nucleotide bias and minimal distribution similarity feature selection method, play crucial roles in this classification problem. Taken together, DeepCPP is an effective method for RNA coding potential prediction.
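To make the k-mer features mentioned in this abstract concrete, the following is a minimal sketch of contiguous k-mer frequencies and of one simple reading of a discontinuous k-mer: a pair of bases separated by a fixed number of ignored positions. The abstract does not define the exact form DeepCPP uses, so the gapped-pair definition and both function names here are assumptions for illustration only.

```python
from collections import Counter
from itertools import product

ALPHABET = "ACGT"

def _normalize(seq):
    """Upper-case the sequence and map RNA 'U' to DNA 'T'."""
    return seq.upper().replace("U", "T")

def kmer_frequencies(seq, k):
    """Relative frequency of every contiguous k-mer over A, C, G, T."""
    seq = _normalize(seq)
    windows = (seq[i:i + k] for i in range(len(seq) - k + 1))
    # Skip windows containing ambiguous symbols such as N.
    counts = Counter(w for w in windows if set(w) <= set(ALPHABET))
    total = sum(counts.values())
    keys = ("".join(p) for p in product(ALPHABET, repeat=k))
    return {key: (counts[key] / total if total else 0.0) for key in keys}

def gapped_dimer_frequencies(seq, gap):
    """Relative frequency of base pairs separated by `gap` ignored positions.

    This is an assumed, simplified form of a discontinuous k-mer.
    """
    seq = _normalize(seq)
    span = gap + 2  # first base + gap + second base
    pairs = (seq[i] + seq[i + span - 1] for i in range(len(seq) - span + 1))
    counts = Counter(p for p in pairs if set(p) <= set(ALPHABET))
    total = sum(counts.values())
    keys = ("".join(p) for p in product(ALPHABET, repeat=2))
    return {key: (counts[key] / total if total else 0.0) for key in keys}

mono = kmer_frequencies("ACGU", 1)
gapped = gapped_dimer_frequencies("ACGTAC", 1)
```

Both functions return a fixed-length dictionary (4^k entries, or 16 for gapped pairs), so transcripts of different lengths map to feature vectors of the same size.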
Affiliation(s)
- Yu Zhang
- School of Computer Science and Engineering, Nanyang Technological University, 50 Nanyang Avenue, Singapore
| | - Cangzhi Jia
- School of Mathematical Sciences, Dalian University of Technology, No.2 Linggong Road, Dalian, China
| | - Melissa Jane Fullwood
- School of Computer Science and Engineering, Nanyang Technological University, 50 Nanyang Avenue, Singapore
| | - Chee Keong Kwoh
- School of Computer Science and Engineering, Nanyang Technological University, 50 Nanyang Avenue, Singapore
| |
|
76
|
Shi X, Shao X, Liu B, Lv M, Pandey P, Guo C, Zhang R, Zhang Y. Genome-wide screening of functional long noncoding RNAs in the epicardial adipose tissues of atrial fibrillation. Biochim Biophys Acta Mol Basis Dis 2020; 1866:165757. [PMID: 32147422 DOI: 10.1016/j.bbadis.2020.165757] [Citation(s) in RCA: 23] [Impact Index Per Article: 4.6] [Received: 01/08/2020] [Revised: 02/23/2020] [Accepted: 02/26/2020] [Indexed: 12/12/2022]
Abstract
Atrial fibrillation (AF) is the most common arrhythmia, and patients with AF face an increased risk of heart failure and ischemic stroke. However, AF pathogenesis, especially the long noncoding RNA (lncRNA)-related mechanism, is not fully understood. In this study, we collected RNA sequencing data of the epicardial adipose tissues (EAT) from 6 AF and 6 sinus rhythm (SR) to identify the differentially expressed protein-coding genes (PCGs) and lncRNAs. Functionally, the differentially expressed PCGs were significantly enriched in bone development disease, chronic kidney failure, and kidney disease. In particular, we found that homeobox (HOX) genes, especially the antisense RNAs HOTAIRM1, HOXA-AS2 and HOXB-AS2, were significantly downregulated in EAT of AF. The biological function predictions for the dysregulated lncRNAs revealed that the TNF signaling pathway was the pathway in which the lncRNAs most frequently might participate. In addition, SNHG16 and RP11-471B22.2 might participate in TGF-beta signaling and ECM-receptor interaction, respectively, by interacting with the proteins involved in those pathways. Collectively, we provided some potentially pathogenic lncRNAs in AF, which might be useful for researchers studying their functionality and developing new therapeutics.
Affiliation(s)
- Xin Shi
- Xinhua Hospital, School of Medicine, Shanghai Jiao Tong University, Shanghai 200092, China
| | - Xuelian Shao
- School of Life Sciences, Fudan University, Shanghai, China
| | - Ban Liu
- Department of Cardiology, Shanghai Tenth People's Hospital, Tongji University School of Medicine, Shanghai, China
| | - Mengwei Lv
- Shanghai East Hospital of Clinical Medical College, Nanjing Medical University, Shanghai, China; Department of Cardiovascular Surgery, Shanghai East Hospital, Tongji University School of Medicine, Shanghai, China
| | - Pratik Pandey
- Department of Cardiology, Shanghai Tenth People's Hospital, Tongji University School of Medicine, Shanghai, China
| | - Changfa Guo
- Department of Cardiovascular Surgery, Zhongshan Hospital, Fudan University, Shanghai, China.
| | - Ruilin Zhang
- School of Basic Medical Sciences, Wuhan University, Wuhan, China.
| | - Yangyang Zhang
- Department of Cardiovascular Surgery, Shanghai East Hospital, Tongji University School of Medicine, Shanghai, China.
| |
|
77
|
Camargo AP, Sourkov V, Pereira G, Carazzolle M. RNAsamba: neural network-based assessment of the protein-coding potential of RNA sequences. NAR Genom Bioinform 2020; 2:lqz024. [PMID: 33575571 PMCID: PMC7671399 DOI: 10.1093/nargab/lqz024] [Citation(s) in RCA: 65] [Impact Index Per Article: 13.0] [Received: 08/09/2019] [Revised: 11/15/2019] [Accepted: 12/17/2019] [Indexed: 02/06/2023]
Abstract
The advent of high-throughput sequencing technologies made it possible to obtain large volumes of genetic information quickly and inexpensively. Thus, many efforts are devoted to unveiling the biological roles of genomic elements, with the distinction between protein-coding and long non-coding RNAs being one of the most important tasks. We describe RNAsamba, a tool to predict the coding potential of RNA molecules from sequence information using a neural network-based model that processes both the whole sequence and the ORF to identify patterns that distinguish coding from non-coding transcripts. We evaluated RNAsamba's classification performance using transcripts coming from humans and several other model organisms and show that it recurrently outperforms other state-of-the-art methods. Our results also show that RNAsamba can identify coding signals in partial-length ORFs and UTR sequences, evidencing that its algorithm is not dependent on complete transcript sequences. Furthermore, RNAsamba can also predict small ORFs, traditionally identified with ribosome profiling experiments. We believe that RNAsamba will enable faster and more accurate biological findings from genomic data of species that are being sequenced for the first time. A user-friendly web interface, the documentation containing instructions for local installation and usage, and the source code of RNAsamba can be found at https://rnasamba.lge.ibi.unicamp.br/.
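Since this abstract describes a model that looks at both the whole transcript and its ORF, a small sketch of how a candidate ORF might be extracted from a transcript may help. It scans the three forward reading frames for the longest stretch that runs from an ATG start codon to an in-frame stop codon. This is a generic textbook procedure, not RNAsamba's own ORF extraction, and the function name `longest_orf` is hypothetical. Unlike the tool described above, this sketch only reports complete ORFs and ignores partial ones.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}

def longest_orf(seq):
    """Return the longest complete ORF (ATG ... stop) in the forward frames.

    Returns an empty string when no complete ORF is present.
    """
    seq = seq.upper().replace("U", "T")  # accept RNA or DNA letters
    best_start, best_end = 0, 0
    for frame in range(3):
        start = None
        # Walk the frame codon by codon.
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i  # first start codon opens the ORF
            elif start is not None and codon in STOP_CODONS:
                end = i + 3  # include the stop codon
                if end - start > best_end - best_start:
                    best_start, best_end = start, end
                start = None  # look for the next ORF in this frame
    return seq[best_start:best_end]

orf = longest_orf("CCAUGAAAUGACC")
no_orf = longest_orf("CCCC")
```

The returned ORF and the full transcript could then be encoded separately and fed to two branches of a model, which is the general arrangement the abstract describes.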
Affiliation(s)
- Antonio P Camargo
- Department of Genetics, Evolution, Microbiology and Immunology, Institute of Biology, University of Campinas, Campinas, SP, 13083-862, Brazil
| | - Vsevolod Sourkov
- Department of Computer Science, ReDNA Labs, Pattaya, Chonburi, 20150, Thailand
| | - Gonçalo A G Pereira
- Department of Genetics, Evolution, Microbiology and Immunology, Institute of Biology, University of Campinas, Campinas, SP, 13083-862, Brazil
| | - Marcelo F Carazzolle
- Department of Genetics, Evolution, Microbiology and Immunology, Institute of Biology, University of Campinas, Campinas, SP, 13083-862, Brazil
| |
|
78
|
Su X, Zhang J, Yang W, Liu Y, Liu Y, Shan Z, Wang W. Identification of the Prognosis-Related lncRNAs and Genes in Gastric Cancer. Front Genet 2020; 11:27. [PMID: 32117443 PMCID: PMC7027194 DOI: 10.3389/fgene.2020.00027] [Citation(s) in RCA: 11] [Impact Index Per Article: 2.2] [Received: 11/07/2019] [Accepted: 01/08/2020] [Indexed: 12/18/2022]
Abstract
Gastric cancer is a common malignant tumor with high occurrence and recurrence and is the leading cause of death worldwide. However, the prognostic value of protein-coding and non-coding RNAs in stage III gastric cancer has not been systematically analyzed. In this study, using TCGA data, we identified 585 long noncoding RNAs (lncRNAs) and 927 protein-coding genes (PCGs) correlated with the overall survival rate of gastric cancer. Functional enrichment analysis revealed that the prognostic genes positively correlated with death rates were enriched in pathways, including gap junction, focal adhesion, cell adhesion molecules (CAMs), and neuroactive ligand-receptor interaction, that are involved in the tumor microenvironment and cell-cell communications, suggesting that their dysregulation may promote the tumor progression. To evaluate the performance of the prognostic genes in risk prediction, we built three multivariable Cox models based on prognostic genes selected from the prognostic PCGs and lncRNAs. The performance of the three models based on features from only PCGs or lncRNAs or from all prognostic genes were systematically compared, which revealed that the features selected from all the prognostic genes showed higher performance than the features selected only from lncRNAs or PCGs. Furthermore, the multivariable Cox regression analysis revealed that the stratification with the highest performance was an independent prognostic factor in stage III gastric cancer. In addition, we explored the underlying mechanism of the prognostic lncRNAs in the Cox model by predicting the lncRNA and protein interaction. Specifically, CTD-2218G20.2 was predicted to interact with PSG4, PSG5, and PSG7, which could also interact with cancer-related proteins, including KISS1, TIMP2, MMP11, IGFBP1, EGFR, and CDKN1C, suggesting that CTD-2218G20.2 might participate in the cancer progression via these cancer-related proteins. 
In summary, the systematic analysis of the prognostic lncRNAs and PCGs was of great importance to the understanding of the progression of stage III gastric cancer.
Affiliation(s)
- Xiaohui Su
- Department of Gastric Surgery, Cancer Hospital of China Medical University, Liaoning, China
| | - Jianjun Zhang
- Department of Gastric Surgery, Cancer Hospital of China Medical University, Liaoning, China
| | - Wei Yang
- Department of Gastric Surgery, Cancer Hospital of China Medical University, Liaoning, China
| | - Yanqing Liu
- Department of Gastric Surgery, Cancer Hospital of China Medical University, Liaoning, China
| | - Yang Liu
- Department of Gastric Surgery, Cancer Hospital of China Medical University, Liaoning, China
| | - Zexing Shan
- Department of Gastric Surgery, Cancer Hospital of China Medical University, Liaoning, China
| | - Wentao Wang
- Department of Gastric Surgery, Cancer Hospital of China Medical University, Liaoning, China
| |
Collapse
|
79
|
LPI-BLS: Predicting lncRNA–protein interactions with a broad learning system-based stacked ensemble classifier. Neurocomputing 2019. [DOI: 10.1016/j.neucom.2019.08.084] [Citation(s) in RCA: 32] [Impact Index Per Article: 5.3] [Indexed: 01/20/2023]
|
80
|
Tong X, Liu S. CPPred: coding potential prediction based on the global description of RNA sequence. Nucleic Acids Res 2019; 47:e43. [PMID: 30753596 PMCID: PMC6486542 DOI: 10.1093/nar/gkz087] [Citation(s) in RCA: 75] [Impact Index Per Article: 12.5] [Received: 07/18/2018] [Revised: 01/26/2019] [Accepted: 02/01/2019] [Indexed: 11/12/2022] Open
Abstract
A rapid and accurate approach to distinguishing coding RNAs from ncRNAs plays a critical role in analyzing the thousands of novel transcripts generated in recent years by next-generation sequencing technology. The previously developed methods CPAT, CPC2 and PLEK distinguish coding RNAs from ncRNAs very well, but distinguish poorly between small coding RNAs and small ncRNAs. Herein, we report an approach, CPPred (coding potential prediction), which is based on an SVM classifier and multiple sequence features, including novel RNA features encoded by the global description. Owing to these novel RNA features, CPPred distinguishes better than state-of-the-art methods not only between coding RNAs and ncRNAs, but also between small coding RNAs and small ncRNAs. A recent study proposes 1335 novel human coding RNAs from a large number of RNA-seq datasets. However, only 119 transcripts are predicted as coding RNAs by CPPred. In fact, almost all proposed novel coding RNAs are ncRNAs (91.1%), which is consistent with previous reports. Remarkably, we also reveal that the global description of encoding features (T2, C0 and GC) plays an important role in the prediction of coding potential.
Affiliation(s)
- Xiaoxue Tong
  - School of Physics, Huazhong University of Science and Technology, Wuhan, Hubei 430074, China
- Shiyong Liu
  - School of Physics, Huazhong University of Science and Technology, Wuhan, Hubei 430074, China
|
81
|
Shi C, Chen J, Kang X, Zhao G, Lao X, Zheng H. Deep Learning in the Study of Protein-Related Interactions. Protein Pept Lett 2019; 27:359-369. [PMID: 31538879 DOI: 10.2174/0929866526666190723114142] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Received: 01/05/2019] [Revised: 03/13/2019] [Accepted: 04/05/2019] [Indexed: 11/22/2022]
Abstract
Protein-related interaction prediction is critical to understanding life processes, biological functions, and mechanisms of drug action. Experimental methods used to determine protein-related interactions have always been costly and inefficient. In recent years, advances in biological and medical technology have provided us with an explosion of biological and physiological data, and deep learning-based algorithms have shown great promise in extracting features and learning patterns from complex data. Deep learning has now emerged in protein research. In this review, we provide an introductory overview of deep neural network theory and its unique properties. We focus mainly on applications of this technology to protein-related interaction prediction over the past five years, including protein-protein, protein-RNA/DNA, and protein-drug interaction prediction, among others. Finally, we discuss some of the challenges that deep learning currently faces.
Affiliation(s)
- Cheng Shi
  - School of Life Science and Technology, China Pharmaceutical University, Nanjing 210009, China
- Jiaxing Chen
  - School of Life Science and Technology, China Pharmaceutical University, Nanjing 210009, China
- Xinyue Kang
  - School of Life Science and Technology, China Pharmaceutical University, Nanjing 210009, China
- Guiling Zhao
  - School of Life Science and Technology, China Pharmaceutical University, Nanjing 210009, China
- Xingzhen Lao
  - School of Life Science and Technology, China Pharmaceutical University, Nanjing 210009, China
- Heng Zheng
  - School of Life Science and Technology, China Pharmaceutical University, Nanjing 210009, China
|
82
|
PredLnc-GFStack: A Global Sequence Feature Based on a Stacked Ensemble Learning Method for Predicting lncRNAs from Transcripts. Genes (Basel) 2019; 10:genes10090672. [PMID: 31484412 PMCID: PMC6770532 DOI: 10.3390/genes10090672] [Citation(s) in RCA: 13] [Impact Index Per Article: 2.2] [Received: 07/14/2019] [Revised: 08/05/2019] [Accepted: 08/28/2019] [Indexed: 11/16/2022] Open
Abstract
Long non-coding RNAs (lncRNAs) are a class of RNAs longer than 200 base pairs (bps) that do not encode proteins; nevertheless, lncRNAs have many vital biological functions. A large number of novel transcripts have been discovered as a result of the development of high-throughput sequencing technology. Under these circumstances, computational methods for lncRNA prediction are in great demand. In this paper, we consider global sequence features and propose a stacked ensemble learning-based method to predict lncRNAs from transcripts, abbreviated as PredLnc-GFStack. We extract the critical features from the candidate feature list using the genetic algorithm (GA) and then employ the stacked ensemble learning method to construct the PredLnc-GFStack model. Computational experimental results show that PredLnc-GFStack outperforms several state-of-the-art methods for lncRNA prediction. Furthermore, PredLnc-GFStack demonstrates an outstanding ability for cross-species ncRNA prediction.
|
83
|
Zhang H, Liang Y, Peng C, Han S, Du W, Li Y. Predicting lncRNA-disease associations using network topological similarity based on deep mining heterogeneous networks. Math Biosci 2019; 315:108229. [PMID: 31323239 DOI: 10.1016/j.mbs.2019.108229] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.3] [Received: 09/30/2018] [Revised: 05/12/2019] [Accepted: 07/16/2019] [Indexed: 12/17/2022]
Abstract
Long noncoding RNAs (lncRNAs), noncoding RNAs longer than 200 nucleotides, have gained considerable attention in recent decades. Many studies have confirmed that the human genome contains many thousands of lncRNAs. LncRNAs play significant roles in many important biological processes and are relevant to the diagnosis, prognosis, prevention and treatment of complex diseases. For some important diseases such as cancer, lncRNAs have become novel candidate biomarkers. However, research on the role of lncRNAs in human diseases is still in its infancy, and only a small fraction of lncRNA-disease associations have been experimentally verified. Predicting lncRNA-disease associations is an important way to understand the mechanisms and functions of lncRNAs involved in diseases and to enrich lncRNA annotations. Therefore, it is urgent to prioritize lncRNAs potentially associated with diseases. A biological system is a highly complex heterogeneous network involving different molecules. Therefore, network-based algorithms, which can provide a quantifiable characterization of the networks describing diverse biological systems, have been extensively applied in information fields. A heterogeneous network topology possessing abundant interactions between biomedical entities is rarely utilized in similarity-based methods for predicting lncRNA-disease associations based on the array of varying features of lncRNAs and diseases. DeepWalk, which encodes the relations of nodes in a continuous vector space, extends language modeling and unsupervised learning from word sequences to networks. In this article, we present a novel lncRNA-disease association prediction method based on DeepWalk, which enhances the existing association discovery methods through a topology-based similarity measure.
We integrate heterogeneous data to construct a Linked Tripartite Network, a heterogeneous network containing three types of nodes generated from bioinformatics linked datasets, and use the DeepWalk method to extract topological structure features of the nodes in this network for calculating similarities. Our proposed method can be separated into the following steps. First, we integrate heterogeneous data to construct a Linked Tripartite Network containing the topological interactions of known lncRNA-disease, lncRNA-microRNA and microRNA-disease associations. Second, the topological structure features of the nodes are extracted based on DeepWalk. Third, similarity scores of disease-disease pairs and lncRNA-lncRNA pairs are computed based on the topology of this network. Finally, new lncRNA-disease associations are discovered by a rule-based inference method using lncRNA-lncRNA similarities. Our proposed method shows superior predictive performance for lncRNA-disease association prediction based on topological similarity from the heterogeneous network. The AUC value is used to show the performance of our method. The similarity measurement using network topology based on DeepWalk provides a novel perspective that differs from similarity derived from sequence or structure information. Availability: All the data and code are freely available at: https://github.com/Pengeace/lncRNA-disease-link.
Affiliation(s)
- Hui Zhang
  - College of Computer Science and Technology, Key Laboratory of Symbol Computation and Knowledge Engineering of Ministry of Education, Jilin University, Changchun 130012, China
- Yanchun Liang
  - College of Computer Science and Technology, Key Laboratory of Symbol Computation and Knowledge Engineering of Ministry of Education, Jilin University, Changchun 130012, China
  - Zhuhai Laboratory of Key Laboratory of Symbol Computation and Knowledge Engineering of Ministry of Education, Zhuhai College of Jilin University, Zhuhai 519041, China
- Cheng Peng
  - College of Computer Science and Technology, Key Laboratory of Symbol Computation and Knowledge Engineering of Ministry of Education, Jilin University, Changchun 130012, China
- Siyu Han
  - College of Computer Science and Technology, Key Laboratory of Symbol Computation and Knowledge Engineering of Ministry of Education, Jilin University, Changchun 130012, China
- Wei Du
  - College of Computer Science and Technology, Key Laboratory of Symbol Computation and Knowledge Engineering of Ministry of Education, Jilin University, Changchun 130012, China
- Ying Li
  - College of Computer Science and Technology, Key Laboratory of Symbol Computation and Knowledge Engineering of Ministry of Education, Jilin University, Changchun 130012, China
|
84
|
Kong Y, Lu Z, Liu P, Liu Y, Wang F, Liang EY, Hou FF, Liang M. Long Noncoding RNA: Genomics and Relevance to Physiology. Compr Physiol 2019; 9:933-946. [PMID: 31187897 DOI: 10.1002/cphy.c180032] [Citation(s) in RCA: 14] [Impact Index Per Article: 2.3] [Indexed: 12/16/2022]
Abstract
The mammalian cell expresses thousands of long noncoding RNAs (lncRNAs) that are longer than 200 nucleotides but do not encode any protein. lncRNAs can change the expression of protein-coding genes through both cis and trans mechanisms, including imprinting and other types of transcriptional regulation, and posttranscriptional regulation including serving as molecular sponges. Deep sequencing, coupled with analysis of sequence characteristics, is the primary method used to identify lncRNAs. Physiological roles of specific lncRNAs can be examined using genetic targeting or knockdown with modified oligonucleotides. Identification of nucleic acids or proteins with which an lncRNA interacts is essential for understanding the molecular mechanism underlying its physiological role. lncRNAs have been reported to contribute to the regulation of physiological functions and disease development in several organ systems, including the cardiovascular, renal, muscular, endocrine, digestive, nervous, respiratory, and reproductive systems. The physiological role of the majority of lncRNAs, many of which are species and tissue specific, remains to be determined. © 2019 American Physiological Society. Compr Physiol 9:933-946, 2019.
Affiliation(s)
- Yiwei Kong
  - Center of Systems Molecular Medicine, Department of Physiology, Medical College of Wisconsin, Milwaukee, Wisconsin, USA
  - Department of Nephrology, Shanghai Jiao Tong University Affiliated Sixth People's Hospital, Shanghai, China
- Zeyuan Lu
  - Center of Systems Molecular Medicine, Department of Physiology, Medical College of Wisconsin, Milwaukee, Wisconsin, USA
  - Department of Nephrology, Shanghai Jiao Tong University Affiliated Sixth People's Hospital, Shanghai, China
- Pengyuan Liu
  - Center of Systems Molecular Medicine, Department of Physiology, Medical College of Wisconsin, Milwaukee, Wisconsin, USA
  - Sir Run Run Shaw Hospital, Institute of Translational Medicine, Zhejiang University, Zhejiang, China
- Yong Liu
  - Center of Systems Molecular Medicine, Department of Physiology, Medical College of Wisconsin, Milwaukee, Wisconsin, USA
- Feng Wang
  - Center of Systems Molecular Medicine, Department of Physiology, Medical College of Wisconsin, Milwaukee, Wisconsin, USA
  - Department of Nephrology, Shanghai Jiao Tong University Affiliated Sixth People's Hospital, Shanghai, China
- Eugene Y Liang
  - Center for Advancing Population Science, Medical College of Wisconsin, Milwaukee, Wisconsin, USA
- Fan Fan Hou
  - National Clinical Research Center for Kidney Disease, State Key Laboratory of Organ Failure Research, Guangzhou Regenerative Medicine and Health - Guangdong Laboratory, Division of Nephrology, Nanfang Hospital, Southern Medical University, Guangzhou, China
- Mingyu Liang
  - Center of Systems Molecular Medicine, Department of Physiology, Medical College of Wisconsin, Milwaukee, Wisconsin, USA
|
85
|
Fang Z, Tan J, Wu S, Li M, Xu C, Xie Z, Zhu H. PPR-Meta: a tool for identifying phages and plasmids from metagenomic fragments using deep learning. Gigascience 2019; 8:giz066. [PMID: 31220250 PMCID: PMC6586199 DOI: 10.1093/gigascience/giz066] [Citation(s) in RCA: 101] [Impact Index Per Article: 16.8] [Received: 11/20/2018] [Revised: 03/26/2019] [Accepted: 05/14/2019] [Indexed: 12/18/2022] Open
Abstract
BACKGROUND Phages and plasmids are the major components of mobile genetic elements, and fragments from such elements generally co-exist with chromosome-derived fragments in sequenced metagenomic data. However, there is a lack of efficient methods that can simultaneously identify phages and plasmids in metagenomic data, and the existing tools identifying either phages or plasmids have not yet presented satisfactory performance. FINDINGS We present PPR-Meta, a 3-class classifier that allows simultaneous identification of both phage and plasmid fragments from metagenomic assemblies. PPR-Meta consists of several modules for predicting sequences of different lengths. Using deep learning, a novel network architecture, referred to as the Bi-path Convolutional Neural Network, is designed to improve the performance for short fragments. PPR-Meta demonstrates much better performance than currently available similar tools individually for phage or plasmid identification, while testing on both artificial contigs and real metagenomic data. PPR-Meta is freely available via http://cqb.pku.edu.cn/ZhuLab/PPR_Meta or https://github.com/zhenchengfang/PPR-Meta. CONCLUSIONS To the best of our knowledge, PPR-Meta is the first tool that can simultaneously identify phage and plasmid fragments efficiently and reliably. The software is optimized and can be easily run on a local PC by non-computer professionals. We developed PPR-Meta to promote the research on mobile genetic elements and horizontal gene transfer.
Affiliation(s)
- Zhencheng Fang
  - State Key Laboratory for Turbulence and Complex Systems and Department of Biomedical Engineering, College of Engineering, Peking University, No.5 Yiheyuan Road Haidian District, Beijing 100871, China
  - Center for Quantitative Biology, Peking University, No.5 Yiheyuan Road Haidian District, Beijing 100871, China
- Jie Tan
  - State Key Laboratory for Turbulence and Complex Systems and Department of Biomedical Engineering, College of Engineering, Peking University, No.5 Yiheyuan Road Haidian District, Beijing 100871, China
  - Center for Quantitative Biology, Peking University, No.5 Yiheyuan Road Haidian District, Beijing 100871, China
- Shufang Wu
  - State Key Laboratory for Turbulence and Complex Systems and Department of Biomedical Engineering, College of Engineering, Peking University, No.5 Yiheyuan Road Haidian District, Beijing 100871, China
  - Center for Quantitative Biology, Peking University, No.5 Yiheyuan Road Haidian District, Beijing 100871, China
- Mo Li
  - State Key Laboratory for Turbulence and Complex Systems and Department of Biomedical Engineering, College of Engineering, Peking University, No.5 Yiheyuan Road Haidian District, Beijing 100871, China
  - Center for Quantitative Biology, Peking University, No.5 Yiheyuan Road Haidian District, Beijing 100871, China
  - Peking University-Tsinghua University - National Institute of Biological Sciences (PTN) joint PhD program, School of Life Sciences, Peking University, No.5 Yiheyuan Road Haidian District, Beijing 100871, China
- Congmin Xu
  - State Key Laboratory for Turbulence and Complex Systems and Department of Biomedical Engineering, College of Engineering, Peking University, No.5 Yiheyuan Road Haidian District, Beijing 100871, China
  - Center for Quantitative Biology, Peking University, No.5 Yiheyuan Road Haidian District, Beijing 100871, China
  - Department of Biomedical Engineering, Georgia Institute of Technology and Emory University, 631 Cherry St, Atlanta, Georgia 30332, GA, USA
- Zhongjie Xie
  - State Key Laboratory for Turbulence and Complex Systems and Department of Biomedical Engineering, College of Engineering, Peking University, No.5 Yiheyuan Road Haidian District, Beijing 100871, China
  - Center for Quantitative Biology, Peking University, No.5 Yiheyuan Road Haidian District, Beijing 100871, China
- Huaiqiu Zhu
  - State Key Laboratory for Turbulence and Complex Systems and Department of Biomedical Engineering, College of Engineering, Peking University, No.5 Yiheyuan Road Haidian District, Beijing 100871, China
  - Center for Quantitative Biology, Peking University, No.5 Yiheyuan Road Haidian District, Beijing 100871, China
|
86
|
Wekesa JS, Luan Y, Chen M, Meng J. A Hybrid Prediction Method for Plant lncRNA-Protein Interaction. Cells 2019; 8:E521. [PMID: 31151273 PMCID: PMC6627874 DOI: 10.3390/cells8060521] [Citation(s) in RCA: 13] [Impact Index Per Article: 2.2] [Received: 04/26/2019] [Revised: 05/22/2019] [Accepted: 05/29/2019] [Indexed: 01/23/2023] Open
Abstract
Long non-protein-coding RNAs (lncRNAs) identification and analysis are pervasive in transcriptome studies due to their roles in biological processes. In particular, lncRNA-protein interaction has plausible relevance to gene expression regulation and in cellular processes such as pathogen resistance in plants. While lncRNA-protein interaction has been studied in animals, there has yet to be extensive research in plants. In this paper, we propose a novel plant lncRNA-protein interaction prediction method, namely PLRPIM, which combines deep learning and shallow machine learning methods. The selection of an optimal feature subset and subsequent efficient compression are significant challenges for deep learning models. The proposed method adopts k-mer and extracts high-level abstraction sequence-based features using stacked sparse autoencoder. Based on the extracted features, the fusion of random forest (RF) and light gradient boosting machine (LGBM) is used to build the prediction model. The performances are evaluated on Arabidopsis thaliana and Zea mays datasets. Results from experiments demonstrate PLRPIM's superiority compared with other prediction tools on the two datasets. Based on 5-fold cross-validation, we obtain 89.98% and 93.44% accuracy, 0.954 and 0.982 AUC for Arabidopsis thaliana and Zea mays, respectively. PLRPIM predicts potential lncRNA-protein interaction pairs effectively, which can facilitate lncRNA related research including function prediction.
Affiliation(s)
- Jael Sanyanda Wekesa
  - School of Computer Science and Technology, Dalian University of Technology, Dalian 116023, Liaoning, China
  - Department of Information Technology, Jomo Kenyatta University of Agriculture and Technology, Nairobi 62000-00200, Kenya
- Yushi Luan
  - School of Bioengineering, Dalian University of Technology, Dalian 116023, Liaoning, China
- Ming Chen
  - College of Life Sciences, Zhejiang University, Hangzhou 310058, Zhejiang, China
- Jun Meng
  - School of Computer Science and Technology, Dalian University of Technology, Dalian 116023, Liaoning, China
|
87
|
Amin N, McGrath A, Chen YPP. Evaluation of deep learning in non-coding RNA classification. Nat Mach Intell 2019. [DOI: 10.1038/s42256-019-0051-2] [Citation(s) in RCA: 90] [Impact Index Per Article: 15.0] [Indexed: 01/30/2023]
|
88
|
Deep learning in bioinformatics: Introduction, application, and perspective in the big data era. Methods 2019; 166:4-21. [PMID: 31022451 DOI: 10.1016/j.ymeth.2019.04.008] [Citation(s) in RCA: 152] [Impact Index Per Article: 25.3] [Received: 11/14/2018] [Revised: 03/23/2019] [Accepted: 04/15/2019] [Indexed: 12/13/2022] Open
Abstract
Deep learning, which is especially formidable in handling big data, has achieved great success in various fields, including bioinformatics. With the advances of the big data era in biology, it is foreseeable that deep learning will become increasingly important in the field and will be incorporated in vast majorities of analysis pipelines. In this review, we provide both the exoteric introduction of deep learning, and concrete examples and implementations of its representative applications in bioinformatics. We start from the recent achievements of deep learning in the bioinformatics field, pointing out the problems which are suitable to use deep learning. After that, we introduce deep learning in an easy-to-understand fashion, from shallow neural networks to legendary convolutional neural networks, legendary recurrent neural networks, graph neural networks, generative adversarial networks, variational autoencoder, and the most recent state-of-the-art architectures. After that, we provide eight examples, covering five bioinformatics research directions and all the four kinds of data type, with the implementation written in Tensorflow and Keras. Finally, we discuss the common issues, such as overfitting and interpretability, that users will encounter when adopting deep learning methods and provide corresponding suggestions. The implementations are freely available at https://github.com/lykaust15/Deep_learning_examples.
|
89
|
Long Noncoding RNA and Protein Interactions: From Experimental Results to Computational Models Based on Network Methods. Int J Mol Sci 2019; 20:ijms20061284. [PMID: 30875752 PMCID: PMC6471543 DOI: 10.3390/ijms20061284] [Citation(s) in RCA: 21] [Impact Index Per Article: 3.5] [Received: 01/27/2019] [Revised: 03/09/2019] [Accepted: 03/11/2019] [Indexed: 01/13/2023] Open
Abstract
Non-coding RNAs with a length of more than 200 nucleotides are long non-coding RNAs (lncRNAs), which have gained tremendous attention in recent decades. Many studies have confirmed that lncRNAs have an important influence on post-transcriptional gene regulation; for example, lncRNAs affect the stability and translation of splicing factor proteins. The mutations and malfunctions of lncRNAs are closely related to human disorders. As lncRNAs interact with a variety of proteins, predicting the interactions between lncRNAs and proteins is an important way to explore the functions of lncRNAs in depth and to enrich their annotations. Experimental approaches for identifying lncRNA–protein interactions are expensive and time-consuming. Computational approaches to predict lncRNA–protein interactions can be grouped into two broad categories. The first category is based on sequence, structural information and physicochemical properties. The second category is based on network methods that fuse heterogeneous data to construct lncRNA-related heterogeneous networks. The network-based methods can capture the implicit feature information in the topological structure of related biological heterogeneous networks containing lncRNAs, which is often ignored by sequence-based methods. In this paper, we summarize and discuss the materials, interaction score calculation algorithms, and advantages and disadvantages of state-of-the-art network-based algorithms for lncRNA–protein interaction prediction, to assist researchers in selecting a suitable method for acquiring more dependable results. All the related network data have also been collected and processed for the convenience of users, and are available at https://github.com/HAN-Siyu/APINet/.
|
90
|
Turner AW, Wong D, Khan MD, Dreisbach CN, Palmore M, Miller CL. Multi-Omics Approaches to Study Long Non-coding RNA Function in Atherosclerosis. Front Cardiovasc Med 2019; 6:9. [PMID: 30838214 PMCID: PMC6389617 DOI: 10.3389/fcvm.2019.00009] [Citation(s) in RCA: 27] [Impact Index Per Article: 4.5] [Received: 11/03/2018] [Accepted: 01/30/2019] [Indexed: 12/15/2022] Open
Abstract
Atherosclerosis is a complex inflammatory disease of the vessel wall involving the interplay of multiple cell types including vascular smooth muscle cells, endothelial cells, and macrophages. Large-scale genome-wide association studies (GWAS) and the advancement of next generation sequencing technologies have rapidly expanded the number of long non-coding RNA (lncRNA) transcripts predicted to play critical roles in the pathogenesis of the disease. In this review, we highlight several lncRNAs whose functional role in atherosclerosis is well-documented through traditional biochemical approaches as well as those identified through RNA-sequencing and other high-throughput assays. We describe novel genomics approaches to study both evolutionarily conserved and divergent lncRNA functions and interactions with DNA, RNA, and proteins. We also highlight assays to resolve the complex spatial and temporal regulation of lncRNAs. Finally, we summarize the latest suite of computational tools designed to improve genomic and functional annotation of these transcripts in the human genome. Deep characterization of lncRNAs is fundamental to unravel coronary atherosclerosis and other cardiovascular diseases, as these regulatory molecules represent a new class of potential therapeutic targets and/or diagnostic markers to mitigate both genetic and environmental risk factors.
Affiliation(s)
- Adam W. Turner
  - Center for Public Health Genomics, University of Virginia, Charlottesville, VA, United States
- Doris Wong
  - Center for Public Health Genomics, University of Virginia, Charlottesville, VA, United States
  - Department of Biochemistry and Molecular Genetics, University of Virginia, Charlottesville, VA, United States
- Mohammad Daud Khan
  - Center for Public Health Genomics, University of Virginia, Charlottesville, VA, United States
- Caitlin N. Dreisbach
  - Center for Public Health Genomics, University of Virginia, Charlottesville, VA, United States
  - School of Nursing, University of Virginia, Charlottesville, VA, United States
  - Data Science Institute, University of Virginia, Charlottesville, VA, United States
- Meredith Palmore
  - Center for Public Health Genomics, University of Virginia, Charlottesville, VA, United States
- Clint L. Miller
  - Center for Public Health Genomics, University of Virginia, Charlottesville, VA, United States
  - Department of Biochemistry and Molecular Genetics, University of Virginia, Charlottesville, VA, United States
  - Data Science Institute, University of Virginia, Charlottesville, VA, United States
  - Department of Biomedical Engineering, University of Virginia, Charlottesville, VA, United States
  - Department of Public Health Sciences, University of Virginia, Charlottesville, VA, United States
|
91
|
Pyfrom SC, Luo H, Payton JE. PLAIDOH: a novel method for functional prediction of long non-coding RNAs identifies cancer-specific LncRNA activities. BMC Genomics 2019; 20:137. [PMID: 30767760 PMCID: PMC6377765 DOI: 10.1186/s12864-019-5497-4] [Citation(s) in RCA: 22] [Impact Index Per Article: 3.7] [Received: 08/17/2018] [Accepted: 01/29/2019] [Indexed: 12/19/2022] Open
Abstract
BACKGROUND Long non-coding RNAs (lncRNAs) exhibit remarkable cell-type specificity and disease association. LncRNA's functional versatility includes epigenetic modification, nuclear domain organization, transcriptional control, regulation of RNA splicing and translation, and modulation of protein activity. However, most lncRNAs remain uncharacterized due to a shortage of predictive tools available to guide functional experiments. RESULTS To address this gap for lymphoma-associated lncRNAs identified in our studies, we developed a new computational method, Predicting LncRNA Activity through Integrative Data-driven 'Omics and Heuristics (PLAIDOH), which has several unique features not found in other methods. PLAIDOH integrates transcriptome, subcellular localization, enhancer landscape, genome architecture, chromatin interaction, and RNA-binding (eCLIP) data and generates statistically defined output scores. PLAIDOH's approach identifies and ranks functional connections between individual lncRNA, coding gene, and protein pairs using enhancer, transcript cis-regulatory, and RNA-binding protein interactome scores that predict the relative likelihood of these different lncRNA functions. When applied to 'omics datasets that we collected from lymphoma patients, or to publicly available cancer (TCGA) or ENCODE datasets, PLAIDOH identified and prioritized well-known lncRNA-target gene regulatory pairs (e.g., HOTAIR and HOX genes, PVT1 and MYC), validated hits in multiple lncRNA-targeted CRISPR screens, and lncRNA-protein binding partners (e.g., NEAT1 and NONO). Importantly, PLAIDOH also identified novel putative functional interactions, including one lymphoma-associated lncRNA based on analysis of data from our human lymphoma study. We validated PLAIDOH's predictions for this lncRNA using knock-down and knock-out experiments in lymphoma cell models. 
CONCLUSIONS Our study demonstrates that we have developed a new method for the prediction and ranking of functional connections between individual lncRNA, coding gene, and protein pairs, which were validated by genetic experiments and comparison to published CRISPR screens. PLAIDOH expedites validation and follow-on mechanistic studies of lncRNAs in any biological system. It is available at https://github.com/sarahpyfrom/PLAIDOH.
Affiliation(s)
- Sarah C. Pyfrom
  - Department of Pathology and Immunology, Washington University School of Medicine, St. Louis, MO 63110 USA
- Hong Luo
  - Department of Pathology and Immunology, Washington University School of Medicine, St. Louis, MO 63110 USA
- Jacqueline E. Payton
  - Department of Pathology and Immunology, Washington University School of Medicine, St. Louis, MO 63110 USA
|