1
Mohapatra S, Banerjee A, Rausseo P, Dragomir MP, Manyam GC, Broom BM, Calin GA. FuncPEP v2.0: An Updated Database of Functional Short Peptides Translated from Non-Coding RNAs. Noncoding RNA 2024; 10:20. [PMID: 38668378 PMCID: PMC11054400 DOI: 10.3390/ncrna10020020] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/05/2024] [Revised: 03/27/2024] [Accepted: 03/28/2024] [Indexed: 04/29/2024] Open
Abstract
Over the past decade, there have been reports of short novel functional peptides (less than 100 aa in length) translated from so-called non-coding RNAs (ncRNAs) that have been characterized using mass spectrometry (MS) and large-scale proteomics studies. Therefore, understanding the bivalent functions of some ncRNAs as transcripts that encode both functional RNAs and short peptides, which we named ncPEPs, will deepen our understanding of biology and disease. In 2020, we published the first database of functional peptides translated from non-coding RNAs, FuncPEP. Herein, we have performed an update including the newly published ncPEPs from the last 3 years along with the categorization of host ncRNAs. FuncPEP v2.0 contains 152 functional ncPEPs, out of which 40 are novel entries. A PubMed search from August 2020 to July 2023 incorporating specific keywords was performed and screened for publications reporting validated functional peptides derived from ncRNAs. We observed a steady, but not a significant, increase in newly discovered functional ncPEPs. The novel identified ncPEPs included in the database were characterized by a wide array of molecular and physiological parameters (i.e., types of host ncRNA, species distribution, chromosomal density, distribution of ncRNA length, identification methods, molecular weight, and functional distribution across humans and other species). We consider that, despite the fact that MS can now easily identify ncPEPs, there still are important limitations in proving their functionality.
Affiliation(s)
- Swati Mohapatra
  - Department of Translational Molecular Pathology, University of Texas MD Anderson Cancer Center, Houston, TX 77030, USA
  - The University of Texas MD Anderson Cancer Center UTHealth Houston Graduate School of Biomedical Sciences, Houston, TX 77030, USA
- Anik Banerjee
  - The University of Texas MD Anderson Cancer Center UTHealth Houston Graduate School of Biomedical Sciences, Houston, TX 77030, USA
  - Department of Neurology, University of Texas McGovern Medical School, Houston, TX 77030, USA
- Paola Rausseo
  - Department of Translational Molecular Pathology, University of Texas MD Anderson Cancer Center, Houston, TX 77030, USA
  - Scripps College, Claremont, CA 91711, USA
- Mihnea P. Dragomir
  - Institute of Pathology, Charité – Universitätsmedizin Berlin, Corporate Member of Freie Universität Berlin and Humboldt-Universität zu Berlin, 10117 Berlin, Germany
  - German Cancer Consortium (DKTK), German Cancer Research Center (DKFZ), 69120 Heidelberg, Germany
  - Berlin Institute of Health at Charité, 10117 Berlin, Germany
- Ganiraju C. Manyam
  - Department of Bioinformatics and Computational Biology, The University of Texas MD Anderson Cancer Center, Houston, TX 77030, USA
- Bradley M. Broom
  - Department of Bioinformatics and Computational Biology, The University of Texas MD Anderson Cancer Center, Houston, TX 77030, USA
- George A. Calin
  - Department of Translational Molecular Pathology, University of Texas MD Anderson Cancer Center, Houston, TX 77030, USA
  - Center for RNA Interference and Non-Coding RNAs, The University of Texas MD Anderson Cancer Center, Houston, TX 77030, USA
2
Danilevicz MF, Gill M, Fernandez CGT, Petereit J, Upadhyaya SR, Batley J, Bennamoun M, Edwards D, Bayer PE. DNABERT-based explainable lncRNA identification in plant genome assemblies. Comput Struct Biotechnol J 2023; 21:5676-5685. [PMID: 38058296 PMCID: PMC10696397 DOI: 10.1016/j.csbj.2023.11.025] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/23/2022] [Revised: 11/13/2023] [Accepted: 11/13/2023] [Indexed: 12/08/2023] Open
Abstract
Long non-coding ribonucleic acids (lncRNAs) have been shown to play an important role in plant gene regulation, involving both epigenetic and transcript regulation. LncRNAs are transcripts longer than 200 nucleotides that are not translated into functional proteins but can be translated into small peptides. Machine learning models have predominantly used transcriptome data with manually defined features to detect lncRNAs; however, they often underrepresent the abundance of lncRNAs and can be biased in their detection. Here we present a study using Natural Language Processing (NLP) models to identify plant lncRNAs from genomic sequences rather than transcriptomic data. The NLP models were trained to predict lncRNAs for seven model and crop species (Zea mays, Arabidopsis thaliana, Brassica napus, Brassica oleracea, Brassica rapa, Glycine max and Oryza sativa) using publicly available genomic references. We demonstrated that lncRNAs can be accurately predicted from genomic sequences with the highest accuracy of 83.4% for Z. mays and the lowest accuracy of 57.9% for B. rapa, revealing that genome assembly quality might affect the accuracy of lncRNA identification. Furthermore, we demonstrated the potential of using NLP models for cross-species prediction with an average of 63.1% accuracy using target species not previously seen by the model. As more species are incorporated into the training datasets, we expect the accuracy to increase, becoming a more reliable tool for uncovering novel lncRNAs. Finally, we show that the models can be interpreted using explainable artificial intelligence to identify motifs important to lncRNA prediction and that these motifs frequently flanked the lncRNA sequence.
Affiliation(s)
- Mitchell Gill
  - School of Biological Sciences, University of Western Australia, Australia
- Jakob Petereit
  - School of Biological Sciences, University of Western Australia, Australia
- Jacqueline Batley
  - School of Biological Sciences, University of Western Australia, Australia
- Mohammed Bennamoun
  - School of Physics, Mathematics and Computing, University of Western Australia, Australia
- David Edwards
  - School of Biological Sciences, University of Western Australia, Australia
- Philipp E. Bayer
  - School of Biological Sciences, University of Western Australia, Australia
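Illustrative note on entry 2: the abstract describes treating genomic sequence as text for an NLP model but does not spell out the tokenization. DNABERT-style models are generally fed overlapping k-mers, so the sketch below shows that step only. The function name `kmer_tokens` and the default `k=6` are assumptions made for illustration, not details taken from the paper.

```python
def kmer_tokens(sequence: str, k: int = 6) -> list[str]:
    """Split a DNA sequence into overlapping k-mers with stride 1.

    Hypothetical helper: k and the stride are illustrative assumptions.
    """
    if k <= 0:
        raise ValueError("k must be positive")
    seq = sequence.upper()
    # A sequence shorter than k yields no tokens.
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


if __name__ == "__main__":
    # Each token overlaps its neighbor by k - 1 bases.
    print(" ".join(kmer_tokens("ATGCGTAC", k=3)))
```

A sequence of length n produces n - k + 1 tokens, which is why k-mer "sentences" are almost as long as the underlying sequence.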
3
Dong X, Zhang K, Xun C, Chu T, Liang S, Zeng Y, Liu Z. Small Open Reading Frame-Encoded Micro-Peptides: An Emerging Protein World. Int J Mol Sci 2023; 24:10562. [PMID: 37445739 DOI: 10.3390/ijms241310562] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 05/13/2023] [Revised: 06/20/2023] [Accepted: 06/21/2023] [Indexed: 07/15/2023] Open
Abstract
Small open reading frames (sORFs) are often overlooked features in genomes. In the past, they were labeled as noncoding or "transcriptional noise". However, accumulating evidence from recent years suggests that sORFs may be transcribed and translated to produce sORF-encoded polypeptides (SEPs) with less than 100 amino acids. The vigorous development of computational algorithms, ribosome profiling, and peptidome has facilitated the prediction and identification of many new SEPs. These SEPs were revealed to be involved in a wide range of basic biological processes, such as gene expression regulation, embryonic development, cellular metabolism, inflammation, and even carcinogenesis. To effectively understand the potential biological functions of SEPs, we discuss the history and development of the newly emerging research on sORFs and SEPs. In particular, we review a range of recently discovered bioinformatics tools for identifying, predicting, and validating SEPs as well as a variety of biochemical experiments for characterizing SEP functions. Lastly, this review underlines the challenges and future directions in identifying and validating sORFs and their encoded micropeptides, providing a significant reference for upcoming research on sORF-encoded peptides.
Affiliation(s)
- Xiaoping Dong
  - National & Local Joint Engineering Laboratory of Animal Peptide Drug Development, College of Life Sciences, Hunan Normal University, Changsha 410081, China
  - Peptide and Small Molecule Drug R&D Platform, Furong Laboratory, Hunan Normal University, Changsha 410081, China
- Kun Zhang
  - The State Key Laboratory of Developmental Biology of Freshwater Fish, College of Life Science, Hunan Normal University, Changsha 410081, China
- Chengfeng Xun
  - National & Local Joint Engineering Laboratory of Animal Peptide Drug Development, College of Life Sciences, Hunan Normal University, Changsha 410081, China
  - Peptide and Small Molecule Drug R&D Platform, Furong Laboratory, Hunan Normal University, Changsha 410081, China
- Tianqi Chu
  - National & Local Joint Engineering Laboratory of Animal Peptide Drug Development, College of Life Sciences, Hunan Normal University, Changsha 410081, China
  - Peptide and Small Molecule Drug R&D Platform, Furong Laboratory, Hunan Normal University, Changsha 410081, China
- Songping Liang
  - National & Local Joint Engineering Laboratory of Animal Peptide Drug Development, College of Life Sciences, Hunan Normal University, Changsha 410081, China
  - Peptide and Small Molecule Drug R&D Platform, Furong Laboratory, Hunan Normal University, Changsha 410081, China
- Yong Zeng
  - National & Local Joint Engineering Laboratory of Animal Peptide Drug Development, College of Life Sciences, Hunan Normal University, Changsha 410081, China
  - Peptide and Small Molecule Drug R&D Platform, Furong Laboratory, Hunan Normal University, Changsha 410081, China
  - The State Key Laboratory of Developmental Biology of Freshwater Fish, College of Life Science, Hunan Normal University, Changsha 410081, China
- Zhonghua Liu
  - National & Local Joint Engineering Laboratory of Animal Peptide Drug Development, College of Life Sciences, Hunan Normal University, Changsha 410081, China
  - Peptide and Small Molecule Drug R&D Platform, Furong Laboratory, Hunan Normal University, Changsha 410081, China
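Illustrative note on entry 3: the abstract defines sORF-encoded polypeptides as having fewer than 100 amino acids. A minimal sketch of what a naive sORF scan could look like is given below. It is not one of the tools the review covers; it assumes the standard genetic code, ATG start codons only, and the forward strand only, and the names `find_sorfs`, `max_aa`, and `min_aa` (including the lower bound of 10 residues) are hypothetical choices for illustration.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}  # standard genetic code


def find_sorfs(seq: str, max_aa: int = 100, min_aa: int = 10):
    """Return (start, end, n_aa) for forward-strand ATG..stop ORFs.

    Only ORFs encoding at least min_aa and fewer than max_aa amino
    acids are kept. Coordinates are 0-based, end-exclusive, and the
    end includes the stop codon. min_aa is an illustrative assumption.
    """
    seq = seq.upper()
    hits = []
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i  # first in-frame start codon opens the ORF
            elif start is not None and codon in STOP_CODONS:
                n_aa = (i - start) // 3  # residues, stop codon excluded
                if min_aa <= n_aa < max_aa:
                    hits.append((start, i + 3, n_aa))
                start = None  # look for the next ORF in this frame
    return hits


if __name__ == "__main__":
    toy = "CC" + "ATG" + "GCT" * 11 + "TAA" + "G"
    print(find_sorfs(toy))
```

Real pipelines typically add the reverse strand, alternative start codons, and supporting evidence such as ribosome profiling, which is the kind of validation the review discusses.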
4
Kim Y, Lee M. Deep Learning Approaches for lncRNA-Mediated Mechanisms: A Comprehensive Review of Recent Developments. Int J Mol Sci 2023; 24:10299. [PMID: 37373445 DOI: 10.3390/ijms241210299] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/24/2023] [Revised: 06/16/2023] [Accepted: 06/17/2023] [Indexed: 06/29/2023] Open
Abstract
This review paper provides an extensive analysis of the rapidly evolving convergence of deep learning and long non-coding RNAs (lncRNAs). Considering the recent advancements in deep learning and the increasing recognition of lncRNAs as crucial components in various biological processes, this review aims to offer a comprehensive examination of these intertwined research areas. The remarkable progress in deep learning necessitates thoroughly exploring its latest applications in the study of lncRNAs. Therefore, this review provides insights into the growing significance of incorporating deep learning methodologies to unravel the intricate roles of lncRNAs. By scrutinizing the most recent research spanning from 2021 to 2023, this paper provides a comprehensive understanding of how deep learning techniques are employed in investigating lncRNAs, thereby contributing valuable insights to this rapidly evolving field. The review is aimed at researchers and practitioners looking to integrate deep learning advancements into their lncRNA studies.
Affiliation(s)
- Yoojoong Kim
  - School of Computer Science and Information Engineering, The Catholic University of Korea, Bucheon 14662, Republic of Korea
- Minhyeok Lee
  - School of Electrical and Electronics Engineering, Chung-Ang University, Seoul 06974, Republic of Korea
5
Palos K, Yu L, Railey CE, Nelson Dittrich AC, Nelson ADL. Linking discoveries, mechanisms, and technologies to develop a clearer perspective on plant long noncoding RNAs. The Plant Cell 2023; 35:1762-1786. [PMID: 36738093 PMCID: PMC10226578 DOI: 10.1093/plcell/koad027] [Citation(s) in RCA: 33] [Impact Index Per Article: 16.5] [Received: 09/02/2022] [Revised: 12/19/2022] [Accepted: 12/22/2022] [Indexed: 05/30/2023]
Abstract
Long noncoding RNAs (lncRNAs) are a large and diverse class of genes in eukaryotic genomes that contribute to a variety of regulatory processes. Functionally characterized lncRNAs play critical roles in plants, ranging from regulating flowering to controlling lateral root formation. However, findings from the past decade have revealed that thousands of lncRNAs are present in plant transcriptomes, and characterization has lagged far behind identification. In this setting, distinguishing function from noise is challenging. However, the plant community has been at the forefront of discovery in lncRNA biology, providing many functional and mechanistic insights that have increased our understanding of this gene class. In this review, we examine the key discoveries and insights made in plant lncRNA biology over the past two and a half decades. We describe how discoveries made in the pregenomics era have informed efforts to identify and functionally characterize lncRNAs in the subsequent decades. We provide an overview of the functional archetypes into which characterized plant lncRNAs fit and speculate on new avenues of research that may uncover yet more archetypes. Finally, this review discusses the challenges facing the field and some exciting new molecular and computational approaches that may help inform lncRNA comparative and functional analyses.
Affiliation(s)
- Kyle Palos
  - Boyce Thompson Institute, Cornell University, Ithaca, NY 14853, USA
- Li’ang Yu
  - Boyce Thompson Institute, Cornell University, Ithaca, NY 14853, USA
- Caylyn E Railey
  - Boyce Thompson Institute, Cornell University, Ithaca, NY 14853, USA
  - Plant Biology Graduate Field, Cornell University, Ithaca, NY 14853, USA
6
Wang Y, Zhao P, Du H, Cao Y, Peng Q, Fu L. LncDLSM: Identification of Long Non-Coding RNAs With Deep Learning-Based Sequence Model. IEEE J Biomed Health Inform 2023; 27:2117-2127. [PMID: 37027676 DOI: 10.1109/jbhi.2023.3247805] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Indexed: 02/24/2023]
Abstract
Long non-coding RNAs (lncRNAs) serve a vital role in regulating gene expression and other biological processes. Differentiation of lncRNAs from protein-coding transcripts helps researchers dig into the mechanism of lncRNA formation and its downstream regulations related to various diseases. Previous approaches to identifying lncRNAs include traditional bio-sequencing and machine learning. Considering the tedious work of biological characteristic-based feature extraction procedures and inevitable artifacts during bio-sequencing processes, those lncRNA detection methods are not always satisfactory. Hence, in this work, we presented lncDLSM, a deep learning-based framework differentiating lncRNAs from protein-coding transcripts without dependencies on prior biological knowledge. lncDLSM is a helpful tool for identifying lncRNAs compared with other biological feature-based machine learning methods and can be applied to other species by transfer learning, achieving satisfactory results. Further experiments showed that different species display distinct boundaries among distributions corresponding to the homology and the specificity among species, respectively.
7
Wang Z, Wang S, Fan X, Zhang K, Zhang J, Zhao H, Gao X, Zhang Y, Guo S, Zhou D, Li Q, Na Z, Chen D, Guo R. Systematic Characterization and Regulatory Role of lncRNAs in Asian Honey Bees Responding to Microsporidian Infestation. Int J Mol Sci 2023; 24:5886. [PMID: 36982959 PMCID: PMC10058195 DOI: 10.3390/ijms24065886] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/08/2023] [Revised: 03/09/2023] [Accepted: 03/17/2023] [Indexed: 03/30/2023] Open
Abstract
Long noncoding RNAs (lncRNAs) are pivotal regulators in gene expression and diverse biological processes, such as immune defense and host-pathogen interactions. However, little is known about the roles of lncRNAs in the response of the Asian honey bee (Apis cerana) to microsporidian infestation. Based on our previously obtained high-quality transcriptome datasets from the midgut tissues of Apis cerana cerana workers at 7 days post inoculation (dpi) and 10 dpi with Nosema ceranae (AcT7 and AcT10 groups) and the corresponding un-inoculated midgut tissues (AcCK7 and AcCK10 groups), the transcriptome-wide identification and structural characterization of lncRNAs were conducted, and the differential expression pattern of lncRNAs was then analyzed, followed by investigation of the regulatory roles of differentially expressed lncRNAs (DElncRNAs) in host response. Here, 2365, 2322, 2487, and 1986 lncRNAs were, respectively, identified in the AcCK7, AcT7, AcCK10, and AcT10 groups. After removing redundant ones, a total of 3496 A. c. cerana lncRNAs were identified, which shared similar structural characteristics with those discovered in other animals and plants, such as shorter exons and introns than mRNAs. Additionally, 79 and 73 DElncRNAs were screened from the workers' midguts at 7 dpi and 10 dpi, respectively, indicating the alteration of the overall expression pattern of lncRNAs in host midguts after N. ceranae infestation. These DElncRNAs could, respectively, regulate 87 and 73 upstream and downstream genes, involving a suite of functional terms and pathways, such as metabolic process and Hippo signaling pathway. Additionally, 235 and 209 genes co-expressed with DElncRNAs were found to enrich in 29 and 27 terms, as well as 112 and 123 pathways, such as ABC transporters and the cAMP signaling pathway. Further, it was detected that 79 (73) DElncRNAs in the host midguts at 7 (10) dpi could target 321 (313) DEmiRNAs and further target 3631 (3130) DEmRNAs. TCONS_00024312 and XR_001765805.1 were potential precursors for ame-miR-315 and ame-miR-927, while TCONS_00006120 was the putative precursor for both ame-miR-87-1 and ame-miR-87-2. These results together suggested that DElncRNAs are likely to play regulatory roles in the host response to N. ceranae infestation through the regulation of neighboring genes via a cis-acting effect, modulation of co-expressed mRNAs via trans-acting effect, and control of downstream target genes' expression via competing endogenous RNA networks. Our findings provide a basis for disclosing the mechanism underlying DElncRNA-mediated host N. ceranae response and a new perspective into the interaction between A. c. cerana and N. ceranae.
Affiliation(s)
- Zixin Wang
  - College of Animal Sciences (College of Bee Science), Fujian Agriculture and Forestry University, Fuzhou 350002, China
- Siyi Wang
  - College of Animal Sciences (College of Bee Science), Fujian Agriculture and Forestry University, Fuzhou 350002, China
- Xiaoxue Fan
  - College of Animal Sciences (College of Bee Science), Fujian Agriculture and Forestry University, Fuzhou 350002, China
- Kaiyao Zhang
  - College of Animal Sciences (College of Bee Science), Fujian Agriculture and Forestry University, Fuzhou 350002, China
- Jiaxin Zhang
  - College of Animal Sciences (College of Bee Science), Fujian Agriculture and Forestry University, Fuzhou 350002, China
- Haodong Zhao
  - College of Animal Sciences (College of Bee Science), Fujian Agriculture and Forestry University, Fuzhou 350002, China
- Xuze Gao
  - College of Animal Sciences (College of Bee Science), Fujian Agriculture and Forestry University, Fuzhou 350002, China
- Yiqiong Zhang
  - College of Animal Sciences (College of Bee Science), Fujian Agriculture and Forestry University, Fuzhou 350002, China
- Sijia Guo
  - College of Animal Sciences (College of Bee Science), Fujian Agriculture and Forestry University, Fuzhou 350002, China
- Dingding Zhou
  - College of Animal Sciences (College of Bee Science), Fujian Agriculture and Forestry University, Fuzhou 350002, China
- Qiming Li
  - College of Animal Sciences (College of Bee Science), Fujian Agriculture and Forestry University, Fuzhou 350002, China
- Zhihao Na
  - College of Animal Sciences (College of Bee Science), Fujian Agriculture and Forestry University, Fuzhou 350002, China
- Dafu Chen
  - College of Animal Sciences (College of Bee Science), Fujian Agriculture and Forestry University, Fuzhou 350002, China
  - Apitherapy Research Institute of Fujian Province, Fuzhou 350002, China
- Rui Guo
  - College of Animal Sciences (College of Bee Science), Fujian Agriculture and Forestry University, Fuzhou 350002, China
  - Apitherapy Research Institute of Fujian Province, Fuzhou 350002, China
8
Dindhoria K, Monga I, Thind AS. Computational approaches and challenges for identification and annotation of non-coding RNAs using RNA-Seq. Funct Integr Genomics 2022; 22:1105-1112. [DOI: 10.1007/s10142-022-00915-y] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/11/2022] [Revised: 11/04/2022] [Accepted: 11/04/2022] [Indexed: 11/22/2022]
9
Lu Y, Xu H, Jiang Y, Hu Z, Du R, Zhao X, Tian Y, Zhu Q, Zhang Y, Liu Y, Wang Y. Comprehensive analysis of differently expression mRNA and non-coding RNAs, and their regulatory mechanisms on relationship in thiram-induced tibial dyschondroplasia in chicken. Ecotoxicology and Environmental Safety 2022; 242:113924. [PMID: 35908532 DOI: 10.1016/j.ecoenv.2022.113924] [Citation(s) in RCA: 11] [Impact Index Per Article: 3.7] [Received: 05/06/2022] [Revised: 07/10/2022] [Accepted: 07/26/2022] [Indexed: 06/15/2023]
Abstract
Thiram pollution is one of the main causes of tibial dyschondroplasia (TD) induced by feed sources. Several studies have speculated that miRNA, circRNA and lncRNA may have a significant impact on the development of TD; however, the specific mRNAs and noncoding RNAs and their respective regulatory mechanisms and functions in the development of TD have not been explored. Therefore, in the present study, we screened the differentially expressed mRNA, miRNA, circRNA and lncRNA by whole-transcriptome sequencing (RNA-seq) and differentially expressed genes (DEGs) enrichment, as well as constructed the interaction network among the mRNA-miRNA, mRNA-lncRNA and mRNA-miRNA-circRNA. The sequencing results were verified by fluorescence real-time quantitative PCR (RT-qPCR). The results obtained in this study revealed that the cells were atrophied and disordered in the TD group, and the expression of BMP6, TGF-β and VEGF was significantly reduced. A total of 141 mRNAs, 10 miRNAs, 23 lncRNAs and 35 circRNAs of DEGs were obtained (p<0.05). These DEGs were enriched in the adherens junction and insulin signaling pathways. In addition, the mRNA-miRNA-circRNA network suggested that several pivotal ceRNAs showed a regulatory relationship between the transcripts with miRNA, circRNA or lncRNA. Taken together, the results in the present study represent an insight for further functional research on the ceRNA regulatory mechanism of TD in broilers.
Affiliation(s)
- Yuxiang Lu
  - Key Laboratory of Livestock and Poultry Multi-omics, Ministry of Agriculture and Rural Affairs, College of Animal and Technology (Institute of Animal Genetics and Breeding), Sichuan Agricultural University, Chengdu, Sichuan 611130, PR China
  - Farm Animal Genetic Resources Exploration and Innovation Key Laboratory of Sichuan Province, Sichuan Agricultural University, Chengdu, Sichuan 611130, PR China
- Hengyong Xu
  - Key Laboratory of Livestock and Poultry Multi-omics, Ministry of Agriculture and Rural Affairs, College of Animal and Technology (Institute of Animal Genetics and Breeding), Sichuan Agricultural University, Chengdu, Sichuan 611130, PR China
  - Farm Animal Genetic Resources Exploration and Innovation Key Laboratory of Sichuan Province, Sichuan Agricultural University, Chengdu, Sichuan 611130, PR China
- Yuru Jiang
  - Key Laboratory of Livestock and Poultry Multi-omics, Ministry of Agriculture and Rural Affairs, College of Animal and Technology (Institute of Animal Genetics and Breeding), Sichuan Agricultural University, Chengdu, Sichuan 611130, PR China
  - Farm Animal Genetic Resources Exploration and Innovation Key Laboratory of Sichuan Province, Sichuan Agricultural University, Chengdu, Sichuan 611130, PR China
- Zhi Hu
  - Key Laboratory of Livestock and Poultry Multi-omics, Ministry of Agriculture and Rural Affairs, College of Animal and Technology (Institute of Animal Genetics and Breeding), Sichuan Agricultural University, Chengdu, Sichuan 611130, PR China
  - Farm Animal Genetic Resources Exploration and Innovation Key Laboratory of Sichuan Province, Sichuan Agricultural University, Chengdu, Sichuan 611130, PR China
- Ranran Du
  - Key Laboratory of Livestock and Poultry Multi-omics, Ministry of Agriculture and Rural Affairs, College of Animal and Technology (Institute of Animal Genetics and Breeding), Sichuan Agricultural University, Chengdu, Sichuan 611130, PR China
  - Farm Animal Genetic Resources Exploration and Innovation Key Laboratory of Sichuan Province, Sichuan Agricultural University, Chengdu, Sichuan 611130, PR China
- Xiaoling Zhao
  - Key Laboratory of Livestock and Poultry Multi-omics, Ministry of Agriculture and Rural Affairs, College of Animal and Technology (Institute of Animal Genetics and Breeding), Sichuan Agricultural University, Chengdu, Sichuan 611130, PR China
  - Farm Animal Genetic Resources Exploration and Innovation Key Laboratory of Sichuan Province, Sichuan Agricultural University, Chengdu, Sichuan 611130, PR China
- Yaofu Tian
  - Farm Animal Genetic Resources Exploration and Innovation Key Laboratory of Sichuan Province, Sichuan Agricultural University, Chengdu, Sichuan 611130, PR China
- Qing Zhu
  - Key Laboratory of Livestock and Poultry Multi-omics, Ministry of Agriculture and Rural Affairs, College of Animal and Technology (Institute of Animal Genetics and Breeding), Sichuan Agricultural University, Chengdu, Sichuan 611130, PR China
  - Farm Animal Genetic Resources Exploration and Innovation Key Laboratory of Sichuan Province, Sichuan Agricultural University, Chengdu, Sichuan 611130, PR China
- Yao Zhang
  - Farm Animal Genetic Resources Exploration and Innovation Key Laboratory of Sichuan Province, Sichuan Agricultural University, Chengdu, Sichuan 611130, PR China
- Yiping Liu
  - Key Laboratory of Livestock and Poultry Multi-omics, Ministry of Agriculture and Rural Affairs, College of Animal and Technology (Institute of Animal Genetics and Breeding), Sichuan Agricultural University, Chengdu, Sichuan 611130, PR China
  - Farm Animal Genetic Resources Exploration and Innovation Key Laboratory of Sichuan Province, Sichuan Agricultural University, Chengdu, Sichuan 611130, PR China
- Yan Wang
  - Key Laboratory of Livestock and Poultry Multi-omics, Ministry of Agriculture and Rural Affairs, College of Animal and Technology (Institute of Animal Genetics and Breeding), Sichuan Agricultural University, Chengdu, Sichuan 611130, PR China
  - Farm Animal Genetic Resources Exploration and Innovation Key Laboratory of Sichuan Province, Sichuan Agricultural University, Chengdu, Sichuan 611130, PR China
10
Pan J, Wang R, Shang F, Ma R, Rong Y, Zhang Y. Functional Micropeptides Encoded by Long Non-Coding RNAs: A Comprehensive Review. Front Mol Biosci 2022; 9:817517. [PMID: 35769907 PMCID: PMC9234465 DOI: 10.3389/fmolb.2022.817517] [Citation(s) in RCA: 27] [Impact Index Per Article: 9.0] [Received: 11/18/2021] [Accepted: 05/24/2022] [Indexed: 12/03/2022] Open
Abstract
Long non-coding RNAs (lncRNAs) were originally defined as non-coding RNAs (ncRNAs) which lack protein-coding ability. However, with the emergence of technologies such as ribosome profiling sequencing and ribosome-nascent chain complex sequencing, it has been demonstrated that most lncRNAs have short open reading frames and hence the potential to encode functional micropeptides. Such micropeptides have been described to be widely involved in life-sustaining activities in several organisms, such as homeostasis regulation, disease and tumor occurrence and development, and morphological development of animals and plants. In this review, we focus on the latest developments in the field of lncRNA-encoded micropeptides and describe the relevant computational tools and techniques for micropeptide prediction and identification. This review aims to serve as a reference for future research studies on lncRNA-encoded micropeptides.
Affiliation(s)
- Jianfeng Pan
  - College of Animal Science, Inner Mongolia Agricultural University, Hohhot, China
- Ruijun Wang
  - College of Animal Science, Inner Mongolia Agricultural University, Hohhot, China
  - Key Laboratory of Mutton Sheep Genetics and Breeding, Ministry of Agriculture, Hohhot, China
  - Key Laboratory of Animal Genetics, Breeding and Reproduction, Hohhot, China
  - Engineering Research Center for Goat Genetics and Breeding, Hohhot, China
- Fangzheng Shang
  - College of Animal Science, Inner Mongolia Agricultural University, Hohhot, China
- Rong Ma
  - College of Animal Science, Inner Mongolia Agricultural University, Hohhot, China
- Youjun Rong
  - College of Animal Science, Inner Mongolia Agricultural University, Hohhot, China
- Yanjun Zhang
  - College of Animal Science, Inner Mongolia Agricultural University, Hohhot, China
  - Key Laboratory of Mutton Sheep Genetics and Breeding, Ministry of Agriculture, Hohhot, China
  - Key Laboratory of Animal Genetics, Breeding and Reproduction, Hohhot, China
  - Engineering Research Center for Goat Genetics and Breeding, Hohhot, China
  - *Correspondence: Yanjun Zhang,
11
Aryee DNT, Fock V, Kapoor U, Radic-Sarikas B, Kovar H. Zooming in on Long Non-Coding RNAs in Ewing Sarcoma Pathogenesis. Cells 2022; 11:1267. [PMID: 35455947 PMCID: PMC9032025 DOI: 10.3390/cells11081267] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/02/2022] [Revised: 03/29/2022] [Accepted: 04/06/2022] [Indexed: 11/16/2022] Open
Abstract
Ewing sarcoma (ES) is a rare aggressive cancer of bone and soft tissue that is mainly characterized by a reciprocal chromosomal translocation. As a result, about 90% of cases express the EWS-FLI1 fusion protein that has been shown to function as an aberrant transcription factor driving sarcomagenesis. ES is the second most common malignant bone tumor in children and young adults. Current treatment modalities include dose-intensified chemo- and radiotherapy, as well as surgery. Despite these strategies, patients who present with metastasis or relapse still have dismal prognosis, warranting a better understanding of treatment resistant-disease biology in order to generate better prognostic and therapeutic tools. Since the genomes of ES tumors are relatively quiet and stable, exploring the contributions of epigenetic mechanisms in the initiation and progression of the disease becomes inevitable. The search for novel biomarkers and potential therapeutic targets of cancer metastasis and chemotherapeutic drug resistance is increasingly focusing on long non-coding RNAs (lncRNAs). Recent advances in genome analysis by high throughput sequencing have immensely expanded and advanced our knowledge of lncRNAs. They are non-protein coding RNA species with multiple biological functions that have been shown to be dysregulated in many diseases and are emerging as crucial players in cancer development. Understanding the various roles of lncRNAs in tumorigenesis and metastasis would determine eclectic avenues to establish therapeutic and diagnostic targets. In ES, some lncRNAs have been implicated in cell proliferation, migration and invasion, features that make them suitable as relevant biomarkers and therapeutic targets. In this review, we comprehensively discuss known lncRNAs implicated in ES that could serve as potential biomarkers and therapeutic targets of the disease. 
Though some current reviews have discussed non-coding RNAs in ES, to our knowledge, this is the first review focusing exclusively on ES-associated lncRNAs.
Affiliation(s)
- Dave N T Aryee
  - St. Anna Children's Cancer Research Institute, 1090 Vienna, Austria
  - Department of Pediatrics, Medical University of Vienna, 1090 Vienna, Austria
- Valerie Fock
  - St. Anna Children's Cancer Research Institute, 1090 Vienna, Austria
- Utkarsh Kapoor
  - St. Anna Children's Cancer Research Institute, 1090 Vienna, Austria
- Branka Radic-Sarikas
  - St. Anna Children's Cancer Research Institute, 1090 Vienna, Austria
  - Department of Pediatric Surgery, Medical University of Vienna, 1090 Vienna, Austria
- Heinrich Kovar
  - St. Anna Children's Cancer Research Institute, 1090 Vienna, Austria
  - Department of Pediatrics, Medical University of Vienna, 1090 Vienna, Austria
12
Chen XG, Liu S, Zhang W. Predicting Coding Potential of RNA Sequences by Solving Local Data Imbalance. IEEE/ACM Transactions on Computational Biology and Bioinformatics 2022; 19:1075-1083. [PMID: 32886613 DOI: 10.1109/tcbb.2020.3021800] [Citation(s) in RCA: 11] [Impact Index Per Article: 3.7] [Indexed: 06/11/2023]
Abstract
Non-coding RNAs (ncRNAs) play an important role in various biological processes and are associated with diseases. Distinguishing between coding RNAs and ncRNAs, also known as predicting the coding potential of RNA sequences, is critical for downstream biological function analysis. Many machine learning-based methods have been proposed for predicting the coding potential of RNA sequences. Recent studies reveal that most existing methods perform poorly on RNA sequences with short open reading frames (sORF, ORF length < 303 nt). In this work, we analyze the distribution of ORF length of RNA sequences, and observe that the number of coding RNAs with sORF is inadequate and that coding RNAs with sORF are far fewer than ncRNAs with sORF. Thus, there exists a problem of local data imbalance in RNA sequences with sORF. We propose a coding potential prediction method, CPE-SLDI, which uses data oversampling techniques to augment samples for coding RNAs with sORF so as to alleviate local data imbalance. Compared with existing methods, CPE-SLDI performs better, and studies reveal that data augmentation by various data oversampling techniques can enhance the performance of coding potential prediction, especially for RNA sequences with sORF. The implementation of the proposed method is available at https://github.com/chenxgscuec/CPESLDI.
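The local imbalance described in this abstract can be illustrated with a minimal Python sketch. It is not the CPE-SLDI implementation: the abstract does not say which oversampling techniques are used, so plain random duplication stands in for them here. The function names and the forward-strand, ATG-to-stop ORF definition are assumptions made for illustration; only the 303 nt threshold comes from the abstract.

```python
import random

STOP_CODONS = {"TAA", "TAG", "TGA"}
SORF_THRESHOLD_NT = 303  # from the abstract: an sORF has ORF length < 303 nt


def longest_orf_length(seq):
    """Length in nt (ATG through stop codon) of the longest forward-strand ORF.

    Assumption: only the three forward frames are scanned and an ORF must
    end in a stop codon. Returns 0 when no complete ORF is found.
    """
    seq = seq.upper().replace("U", "T")
    best = 0
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i
            elif start is not None and codon in STOP_CODONS:
                best = max(best, i + 3 - start)
                start = None
    return best


def has_short_orf(seq):
    """True when the longest ORF falls below the sORF threshold."""
    return longest_orf_length(seq) < SORF_THRESHOLD_NT


def oversample_minority(samples, labels, seed=0):
    """Randomly duplicate samples until every class matches the largest one.

    Stand-in for the unspecified oversampling techniques in the abstract.
    """
    rng = random.Random(seed)
    by_label = {}
    for sample, label in zip(samples, labels):
        by_label.setdefault(label, []).append(sample)
    target = max(len(group) for group in by_label.values())
    out_samples, out_labels = [], []
    for label, group in by_label.items():
        extra = [rng.choice(group) for _ in range(target - len(group))]
        for sample in group + extra:
            out_samples.append(sample)
            out_labels.append(label)
    return out_samples, out_labels
```

In this reading, one would first keep only the transcripts for which `has_short_orf` is true and then call `oversample_minority` on that subset, so that the balancing is applied locally to the sORF region rather than to the whole training set.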
13
Micropeptides translated from putative long non-coding RNAs. Acta Biochim Biophys Sin (Shanghai) 2022; 54:292-300. [PMID: 35538037 PMCID: PMC9827906 DOI: 10.3724/abbs.2022010] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/25/2022] Open
Abstract
Long non-coding RNAs (lncRNAs) transcribed in mammals and other eukaryotes were thought to have no protein-coding capability. However, recent studies have suggested that many lncRNAs are mis-annotated and actually contain coding sequences that are translated into functional peptides by the ribosomal machinery; these functional peptides are called micropeptides or small peptides. Here we review the rapidly advancing field of micropeptides translated from putative lncRNAs, describe the strategies for their identification, and elucidate their critical roles in many fundamental biological processes. We also discuss the prospects of research in micropeptides and the potential applications of micropeptides.
14
Watson ER, Taherian Fard A, Mar JC. Computational Methods for Single-Cell Imaging and Omics Data Integration. Front Mol Biosci 2022; 8:768106. [PMID: 35111809 PMCID: PMC8801747 DOI: 10.3389/fmolb.2021.768106] [Citation(s) in RCA: 12] [Impact Index Per Article: 4.0] [Received: 09/14/2021] [Accepted: 11/29/2021] [Indexed: 12/12/2022] Open
Abstract
Integrating single cell omics and single cell imaging allows for a more effective characterisation of the underlying mechanisms that drive a phenotype at the tissue level, creating a comprehensive profile at the cellular level. Although the use of imaging data is well established in biomedical research, its primary application has been to observe phenotypes at the tissue or organ level, often using medical imaging techniques such as MRI, CT, and PET. These imaging technologies complement omics-based data in biomedical research because they are helpful for identifying associations between genotype and phenotype, along with functional changes occurring at the tissue level. Single cell imaging can act as an intermediary between these levels. Meanwhile new technologies continue to arrive that can be used to interrogate the genome of single cells and its related omics datasets. As these two areas, single cell imaging and single cell omics, each advance independently with the development of novel techniques, the opportunity to integrate these data types becomes more and more attractive. This review outlines some of the technologies and methods currently available for generating, processing, and analysing single-cell omics- and imaging data, and how they could be integrated to further our understanding of complex biological phenomena like ageing. We include an emphasis on machine learning algorithms because of their ability to identify complex patterns in large multidimensional data.
Affiliation(s)
- Atefeh Taherian Fard
  - Australian Institute for Bioengineering and Nanotechnology, The University of Queensland, Brisbane, QLD, Australia
- Jessica Cara Mar
  - Australian Institute for Bioengineering and Nanotechnology, The University of Queensland, Brisbane, QLD, Australia
15
Nazari E, Biviji R, Roshandel D, Pour R, Shahriari MH, Mehrabian A, Tabesh H. Decision fusion in healthcare and medicine: a narrative review. Mhealth 2022; 8:8. [PMID: 35178439 PMCID: PMC8800206 DOI: 10.21037/mhealth-21-15] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/01/2021] [Accepted: 08/02/2021] [Indexed: 11/06/2022] Open
Abstract
OBJECTIVE: To provide an overview of the decision fusion (DF) technique and describe the applications of the technique in healthcare and medicine at prevention, diagnosis, treatment and administrative levels. BACKGROUND: The rapid development of technology over the past 20 years has led to an explosion in data growth in various industries, such as healthcare. Big data analysis within healthcare systems is essential for arriving at a value-based decision over a period of time. Diversity and uncertainty in big data analytics have made it impossible to analyze data by using conventional data mining techniques, and thus alternative solutions are required. DF is a form of data fusion technique that could increase the accuracy of diagnosis and facilitate interpretation, summarization and sharing of information. METHODS: We conducted a review of articles published between January 1980 and December 2020 from various databases such as Google Scholar, IEEE, PubMed, Science Direct, Scopus and Web of Science using the keywords decision fusion (DF), information fusion, healthcare, medicine and big data. A total of 141 articles were included in this narrative review. CONCLUSIONS: Given the importance of big data analysis in reducing costs and improving the quality of healthcare, along with the potential role of DF in big data analysis, it is advisable to understand the full potential of this technique, including its advantages, challenges and applications, before its use. Future studies should focus on describing the methodology and types of data used for its applications within the healthcare sector.
Affiliation(s)
- Elham Nazari
  - Department of Medical Informatics, Mashhad University of Medical Sciences, Mashhad, Iran
- Rizwana Biviji
  - Science of Healthcare Delivery, College of Health Solutions, Arizona State University, Phoenix, AZ, USA
- Danial Roshandel
  - Centre for Ophthalmology and Visual Science (affiliated with the Lions Eye Institute), The University of Western Australia, Perth, Western Australia, Australia
- Reza Pour
  - Department of Computer Engineering, Azad University, Mashhad, Iran
- Mohammad Hasan Shahriari
  - Department of Health Information Technology and Management, School of Allied Medical Sciences, Shahid Beheshti University of Medical Sciences, Tehran, Iran
- Amin Mehrabian
  - Warwick Medical School, University of Warwick, Coventry, UK
- Hamed Tabesh
  - Department of Medical Informatics, Mashhad University of Medical Sciences, Mashhad, Iran
16
PLncWX: A Machine-Learning Algorithm for Plant lncRNA Identification Based on WOA-XGBoost. J CHEM-NY 2021. [DOI: 10.1155/2021/6256021] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/17/2022] Open
Abstract
Long noncoding RNAs (lncRNAs) are a class of RNAs that are longer than 200 nt and cannot encode proteins. Studies have shown that lncRNAs can regulate gene expression at the epigenetic, transcriptional, and posttranscriptional levels; they are not only closely related to the occurrence, development, and prevention of human diseases, but can also regulate plant flowering and participate in plant abiotic stress responses such as drought and salt. Therefore, accurately and efficiently identifying lncRNAs remains an essential task in this research area. A large number of identification tools based on machine-learning and deep learning algorithms exist, mostly using human and mouse gene sequences as training sets and seldom plants, and using only one feature selection method, or one class of methods, after feature extraction. We developed an identification model covering dicots, monocots, algae, mosses, and ferns. After comparing 20 feature selection methods (seven filter and thirteen wrapper methods), each combined with seven classifiers, while considering both the correlation between features and model redundancy, we found that the WOA-XGBoost-based model performed better, with an accuracy, AUC, and F1 score of 91.55%, 96.78%, and 91.68%, respectively. Meanwhile, the number of elements in the feature subset was reduced to 23, which effectively improved the prediction accuracy and modeling efficiency.
17
Asim MN, Ibrahim MA, Imran Malik M, Dengel A, Ahmed S. Advances in Computational Methodologies for Classification and Sub-Cellular Locality Prediction of Non-Coding RNAs. Int J Mol Sci 2021; 22:8719. [PMID: 34445436 PMCID: PMC8395733 DOI: 10.3390/ijms22168719] [Citation(s) in RCA: 14] [Impact Index Per Article: 3.5] [Received: 07/17/2021] [Revised: 08/02/2021] [Accepted: 08/03/2021] [Indexed: 02/06/2023] Open
Abstract
Apart from protein-coding ribonucleic acids (RNAs), there exists a variety of non-coding RNAs (ncRNAs) which regulate complex cellular and molecular processes. High-throughput sequencing technologies and bioinformatics approaches have largely promoted the exploration of ncRNAs, which has revealed their crucial roles in gene regulation, miRNA binding, protein interactions, and splicing. Furthermore, ncRNAs are involved in the development of complicated diseases like cancer. Categorization of ncRNAs is essential to understand the mechanisms of diseases and to develop effective treatments. Sub-cellular localization information of ncRNAs demystifies diverse functionalities of ncRNAs. To date, several computational methodologies have been proposed to precisely identify the class as well as sub-cellular localization patterns of RNAs. This paper discusses different types of ncRNAs, reviews computational approaches proposed in the last 10 years to distinguish coding RNA from ncRNA, to identify sub-types of ncRNAs such as piwi-associated RNA, micro RNA, long ncRNA, and circular RNA, and to determine sub-cellular localization of distinct ncRNAs and RNAs. Furthermore, it summarizes diverse ncRNA classification and sub-cellular localization determination datasets along with benchmark performance to aid the development and evaluation of novel computational methodologies. It identifies research gaps, heterogeneity, and challenges in the development of computational approaches for RNA sequence analysis. We consider that our expert analysis will assist Artificial Intelligence researchers with knowing state-of-the-art performance, model selection for various tasks on one platform, dominantly used sequence descriptors, neural architectures, and interpreting inter-species and intra-species performance deviation.
Affiliation(s)
- Muhammad Nabeel Asim
  - German Research Center for Artificial Intelligence (DFKI), 67663 Kaiserslautern, Germany
  - Department of Computer Science, Technical University of Kaiserslautern, 67663 Kaiserslautern, Germany
- Muhammad Ali Ibrahim
  - German Research Center for Artificial Intelligence (DFKI), 67663 Kaiserslautern, Germany
  - Department of Computer Science, Technical University of Kaiserslautern, 67663 Kaiserslautern, Germany
- Muhammad Imran Malik
  - National Center for Artificial Intelligence (NCAI), National University of Sciences and Technology, Islamabad 44000, Pakistan
  - School of Electrical Engineering & Computer Science, National University of Sciences and Technology, Islamabad 44000, Pakistan
- Andreas Dengel
  - German Research Center for Artificial Intelligence (DFKI), 67663 Kaiserslautern, Germany
  - Department of Computer Science, Technical University of Kaiserslautern, 67663 Kaiserslautern, Germany
- Sheraz Ahmed
  - German Research Center for Artificial Intelligence (DFKI), 67663 Kaiserslautern, Germany
  - DeepReader GmbH, Trippstadter Str. 122, 67663 Kaiserslautern, Germany
18
Hassanzadeh HR, Wang MD. An Integrated Deep Network for Cancer Survival Prediction Using Omics Data. Front Big Data 2021; 4:568352. [PMID: 34337396 PMCID: PMC8322661 DOI: 10.3389/fdata.2021.568352] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.0] [Received: 06/01/2020] [Accepted: 06/01/2021] [Indexed: 12/22/2022] Open
Abstract
As a highly complex disease that humanity faces, cancer is known to be associated with dysregulation of cellular mechanisms at different levels, which demands novel paradigms to capture informative features from different omics modalities in an integrated way. Successful stratification of patients with respect to their molecular profiles is a key step in precision medicine and in tailoring personalized treatment for critically ill patients. In this article, we use an integrated deep belief network to differentiate high-risk cancer patients from low-risk ones in terms of overall survival. Our study analyzes RNA, miRNA, and methylation molecular data modalities from both labeled and unlabeled samples to predict cancer survival and subsequently to provide risk stratification. To assess the robustness of our novel integrative analytics, we utilize datasets of three cancer types with 836 patients and show that our approach outperforms the most successful supervised and semi-supervised classification techniques applied to the same cancer prediction problems. In addition, despite the preconception that deep learning techniques require large datasets for proper training, we show that our model can achieve better results on moderately sized cancer datasets.
Affiliation(s)
- Hamid Reza Hassanzadeh
  - School of Interactive Computing, Georgia Institute of Technology, Atlanta, GA, United States
- May D. Wang
  - Department of Biomedical Engineering, Georgia Institute of Technology and Emory University, Atlanta, GA, United States
19
Singh D, Madhawan A, Roy J. Identification of multiple RNAs using feature fusion. Brief Bioinform 2021; 22:6272794. [PMID: 33971667 DOI: 10.1093/bib/bbab178] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/27/2021] [Revised: 04/08/2021] [Indexed: 11/13/2022] Open
Abstract
Detection of novel transcripts with deep sequencing has increased the demand for computational algorithms, as their identification and validation using in vivo techniques is time-consuming, costly and unreliable. Most of these discovered transcripts belong to non-coding RNAs, a large group known for its diverse functional roles but lacking a common taxonomy. Thus, once the absence of coding potential has been established, it is crucial to recognize their prime functional category. To address this heterogeneity issue, we divide the ncRNAs into three classes and present the RNA classifier (RNAC), which categorizes RNAs into coding, housekeeping, small non-coding and long non-coding classes. RNAC utilizes alignment-based genomic descriptors to extract statistical, local binary pattern and histogram features and fuses them to construct the classification models with extreme gradient boosting. The experiments are performed on four species, and the performance is assessed on multiclass and conventional binary classification (coding versus non-coding) problems. The proposed approach achieved >93% accuracy on both classification problems and also outperformed other well-known existing methods in coding potential prediction. This validates the usefulness of feature fusion for improved performance on both types of classification problems. Hence, RNAC is a valuable tool for the accurate identification of multiple RNAs.
Affiliation(s)
- Dalwinder Singh
  - National Agri-Food Biotechnology Institute, Sector 81, SAS Nagar, 140306, Punjab, India
- Akansha Madhawan
  - National Agri-Food Biotechnology Institute, Sector 81, SAS Nagar, 140306, Punjab, India
- Joy Roy
  - National Agri-Food Biotechnology Institute, Sector 81, SAS Nagar, 140306, Punjab, India
20
Xu X, Liu S, Yang Z, Zhao X, Deng Y, Zhang G, Pang J, Zhao C, Zhang W. A systematic review of computational methods for predicting long noncoding RNAs. Brief Funct Genomics 2021; 20:162-173. [PMID: 33754153 DOI: 10.1093/bfgp/elab016] [Citation(s) in RCA: 11] [Impact Index Per Article: 2.8] [Received: 12/15/2020] [Revised: 02/20/2021] [Accepted: 02/22/2021] [Indexed: 12/20/2022] Open
Abstract
Accurately and rapidly distinguishing long noncoding RNAs (lncRNAs) from other transcripts is a prerequisite for exploring their biological functions. In recent years, many computational methods have been developed to predict lncRNAs from transcripts, but there is no systematic review of these computational methods. In this review, we introduce databases and features involved in the development of computational prediction models, and subsequently summarize existing state-of-the-art computational methods, including methods based on binary classifiers, deep learning and ensemble learning. However, a user-friendly way of employing existing state-of-the-art computational methods is in demand. Therefore, we develop a Python package, ezLncPred, which provides a pragmatic command line implementation to utilize nine state-of-the-art lncRNA prediction methods. Finally, we discuss challenges of lncRNA prediction and future directions.
21
Bonidia RP, Sampaio LDH, Domingues DS, Paschoal AR, Lopes FM, de Carvalho ACPLF, Sanches DS. Feature extraction approaches for biological sequences: a comparative study of mathematical features. Brief Bioinform 2021; 22:6135010. [PMID: 33585910 DOI: 10.1093/bib/bbab011] [Citation(s) in RCA: 16] [Impact Index Per Article: 4.0] [Received: 10/20/2020] [Revised: 12/13/2020] [Accepted: 01/07/2021] [Indexed: 11/14/2022] Open
Abstract
As a consequence of the various genomic sequencing projects, an increasing volume of biological sequence data is being produced. Although machine learning algorithms have been successfully applied to a large number of genomic sequence-related problems, the results are largely affected by the type and number of features extracted. This effect has motivated new algorithms and pipeline proposals, mainly involving feature extraction problems, in which extracting significant discriminatory information from a biological set is challenging. Considering this, our work proposes a new study of feature extraction approaches based on mathematical features (numerical mapping with Fourier, entropy and complex networks). As a case study, we analyze long non-coding RNA sequences. Moreover, we separated this work into three studies. First, we assessed our proposal on the most addressed problem in our review, e.g. lncRNA versus mRNA; second, we validated the mathematical features on different classification problems, to predict the class of lncRNA, e.g. circular RNA sequences; third, we analyzed its robustness in scenarios with imbalanced data. The experimental results demonstrated three main contributions: first, an in-depth study of several mathematical features; second, a new feature extraction pipeline; and third, its high performance and robustness for distinct RNA sequence classification. Availability: https://github.com/Bonidia/FeatureExtraction_BiologicalSequences.
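Two of the feature families named in this abstract, entropy and Fourier analysis of a numerical mapping, can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the k-mer Shannon entropy, the binary indicator mapping, and the function names below are chosen for the example and are not taken from the authors' pipeline, whose exact mappings the abstract does not specify.

```python
import cmath
import math
from collections import Counter


def kmer_shannon_entropy(seq, k=2):
    """Shannon entropy (in bits) of the k-mer frequency distribution of seq."""
    kmers = [seq[i:i + k] for i in range(len(seq) - k + 1)]
    total = len(kmers)
    counts = Counter(kmers)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def binary_mapping(seq, base):
    """Indicator signal: 1.0 where seq has the given base, 0.0 elsewhere."""
    return [1.0 if ch == base else 0.0 for ch in seq.upper()]


def power_spectrum(signal):
    """Squared magnitude of the discrete Fourier transform (naive O(n^2) DFT)."""
    n = len(signal)
    spectrum = []
    for f in range(n):
        coeff = sum(
            x * cmath.exp(-2j * math.pi * f * t / n)
            for t, x in enumerate(signal)
        )
        spectrum.append(abs(coeff) ** 2)
    return spectrum
```

A sequence with a strict period of three, such as repeated codons, concentrates spectral power at frequency index n/3, which is one reason Fourier features are commonly used to separate coding from non-coding sequences. The naive DFT keeps the sketch dependency-free; a real pipeline would more likely use an FFT routine.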
Affiliation(s)
- Robson P Bonidia
  - Department of Computer Science, Bioinformatics Graduate Program (PPGBIOINFO), Federal University of Technology - Paraná, UTFPR, Campus Cornélio Procópio, 86300-000, Brazil
  - Institute of Mathematics and Computer Sciences, University of São Paulo - USP, São Carlos, 13566-590, Brazil
- Lucas D H Sampaio
  - Department of Computer Science, Bioinformatics Graduate Program (PPGBIOINFO), Federal University of Technology - Paraná, UTFPR, Campus Cornélio Procópio, 86300-000, Brazil
- Douglas S Domingues
  - Department of Computer Science, Bioinformatics Graduate Program (PPGBIOINFO), Federal University of Technology - Paraná, UTFPR, Campus Cornélio Procópio, 86300-000, Brazil
  - Department of Botany, Institute of Biosciences, São Paulo State University (UNESP), Rio Claro 13506-900, Brazil
- Alexandre R Paschoal
  - Department of Computer Science, Bioinformatics Graduate Program (PPGBIOINFO), Federal University of Technology - Paraná, UTFPR, Campus Cornélio Procópio, 86300-000, Brazil
- Fabrício M Lopes
  - Department of Computer Science, Bioinformatics Graduate Program (PPGBIOINFO), Federal University of Technology - Paraná, UTFPR, Campus Cornélio Procópio, 86300-000, Brazil
- André C P L F de Carvalho
  - Institute of Mathematics and Computer Sciences, University of São Paulo - USP, São Carlos, 13566-590, Brazil
- Danilo S Sanches
  - Department of Computer Science, Bioinformatics Graduate Program (PPGBIOINFO), Federal University of Technology - Paraná, UTFPR, Campus Cornélio Procópio, 86300-000, Brazil
22
Abstract
Background: Many transcripts have been generated due to the development of sequencing technologies, and lncRNA is an important type of transcript. Predicting lncRNAs from transcripts is a challenging and important task. Traditional experimental lncRNA prediction methods are time-consuming and labor-intensive. Efficient computational methods for lncRNA prediction are in demand. Results: In this paper, we propose two lncRNA prediction methods based on feature ensemble learning strategies named LncPred-IEL and LncPred-ANEL. Specifically, we encode sequences into six different types of features including transcript-specified features and general sequence-derived features. Then we consider two feature ensemble strategies to utilize and integrate the information in different feature types, the iterative ensemble learning (IEL) and the attention network ensemble learning (ANEL). IEL employs a supervised iterative way to ensemble base predictors built on six different types of features. ANEL introduces an attention mechanism-based deep learning model to ensemble features by adaptively learning the weight of individual feature types. Experiments demonstrate that both LncPred-IEL and LncPred-ANEL can effectively separate lncRNAs and other transcripts in feature space. Moreover, comparison experiments demonstrate that LncPred-IEL and LncPred-ANEL outperform several state-of-the-art methods when evaluated by 5-fold cross-validation. Both methods perform well in cross-species lncRNA prediction. Conclusions: LncPred-IEL and LncPred-ANEL are promising lncRNA prediction tools that can effectively utilize and integrate the information in different types of features.
Affiliation(s)
- Yanzhen Xu
  - College of Informatics, Huazhong Agricultural University, Wuhan, 430070, China
- Xiaohan Zhao
  - College of Informatics, Huazhong Agricultural University, Wuhan, 430070, China
- Shuai Liu
  - College of Informatics, Huazhong Agricultural University, Wuhan, 430070, China
- Wen Zhang
  - College of Informatics, Huazhong Agricultural University, Wuhan, 430070, China
23
Li J, Zhang X, Liu C. The computational approaches of lncRNA identification based on coding potential: Status quo and challenges. Comput Struct Biotechnol J 2020; 18:3666-3677. [PMID: 33304463 PMCID: PMC7710504 DOI: 10.1016/j.csbj.2020.11.030] [Citation(s) in RCA: 24] [Impact Index Per Article: 4.8] [Received: 04/30/2020] [Revised: 11/15/2020] [Accepted: 11/16/2020] [Indexed: 12/13/2022] Open
Abstract
Long noncoding RNAs (lncRNAs) make up a large proportion of the transcriptome in eukaryotes, and have been shown to have many regulatory functions in various biological processes. When studying lncRNAs, the first step is to accurately and specifically distinguish them within colossal transcriptome data of complicated composition, which contain mRNAs, lncRNAs, small RNAs and their primary transcripts. In the face of such huge and progressively expanding transcriptome data, in-silico approaches provide a practicable scheme for effectively and rapidly filtering out lncRNA targets, using machine learning and probability statistics. In this review, we mainly discuss the characteristics of the algorithms and features of currently developed approaches. We also outline the traits of some state-of-the-art tools for ease of operation. Finally, we point out the underlying challenges in lncRNA identification with the advent of new experimental data.
Affiliation(s)
- Jing Li
  - CAS Key Laboratory of Tropical Plant Resources and Sustainable Use, Xishuangbanna Tropical Botanical Garden, Chinese Academy of Sciences, Menglun, Mengla, Yunnan 666303, China
  - Center of Economic Botany, Core Botanical Gardens, Chinese Academy of Sciences, Menglun, Mengla, Yunnan 666303, China
- Xuan Zhang
  - CAS Key Laboratory of Tropical Plant Resources and Sustainable Use, Xishuangbanna Tropical Botanical Garden, Chinese Academy of Sciences, Menglun, Mengla, Yunnan 666303, China
- Changning Liu
  - CAS Key Laboratory of Tropical Plant Resources and Sustainable Use, Xishuangbanna Tropical Botanical Garden, Chinese Academy of Sciences, Menglun, Mengla, Yunnan 666303, China
  - Center of Economic Botany, Core Botanical Gardens, Chinese Academy of Sciences, Menglun, Mengla, Yunnan 666303, China
  - The Innovative Academy of Seed Design, Chinese Academy of Sciences, Menglun, Mengla, Yunnan 666303, China
24
Ahmed S, Hossain Z, Uddin M, Taherzadeh G, Sharma A, Shatabda S, Dehzangi A. Accurate prediction of RNA 5-hydroxymethylcytosine modification by utilizing novel position-specific gapped k-mer descriptors. Comput Struct Biotechnol J 2020; 18:3528-3538. [PMID: 33304452 PMCID: PMC7701324 DOI: 10.1016/j.csbj.2020.10.032] [Citation(s) in RCA: 13] [Impact Index Per Article: 2.6] [Received: 07/24/2020] [Revised: 10/30/2020] [Accepted: 10/30/2020] [Indexed: 12/13/2022] Open
Abstract
RNA modification is an essential step towards the generation of new RNA structures. Such modification can potentially alter RNA function or stability. Among different modifications, 5-hydroxymethylcytosine (5hmC) modification of RNA exhibits significant potential for a series of biological processes. Understanding the distribution of 5hmC in RNA is essential to determine its biological functionality. Although conventional sequencing techniques allow broad identification of 5hmC, they are both time-consuming and resource-intensive. In this study, we propose a new computational tool called iRNA5hmC-PS to tackle this problem. To build iRNA5hmC-PS, we extract a set of novel sequence-based features called Position-Specific Gapped k-mer (PSG k-mer) to obtain maximum sequential information. Our feature analysis shows that our proposed PSG k-mer features contain vital information for the identification of 5hmC sites. We also use a group-wise feature importance calculation strategy to select a small subset of features containing maximum discriminative information. Our experimental results demonstrate that iRNA5hmC-PS is able to dramatically enhance prediction performance. iRNA5hmC-PS achieves 78.3% prediction performance, which is 12.8% better than that reported in previous studies. iRNA5hmC-PS is publicly available as an online tool at http://103.109.52.8:81/iRNA5hmC-PS. Its benchmark dataset, source codes, and documentation are available at https://github.com/zahid6454/iRNA5hmC-PS.
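The abstract names position-specific gapped k-mer features without defining them. As a rough illustration only, the Python sketch below shows one plausible reading of the name: for every position and every gap size, record which pair of nucleotides occurs with that many positions skipped between them. The function name, the restriction to nucleotide pairs, and the `max_gap` parameter are assumptions made for this example and may differ from the descriptor actually used in iRNA5hmC-PS.

```python
from typing import Dict, Tuple


def psg_kmer_features(seq: str, max_gap: int = 2) -> Dict[Tuple[int, int, str], int]:
    """Binary features keyed by (position, gap, nucleotide pair).

    Hypothetical reading of "position-specific gapped k-mer": the pair is
    formed by seq[pos] and seq[pos + gap + 1], i.e. `gap` bases are skipped.
    A key is present with value 1 when that pair occurs at that position.
    """
    seq = seq.upper()
    features = {}
    for gap in range(max_gap + 1):
        for pos in range(len(seq) - gap - 1):
            pair = seq[pos] + seq[pos + gap + 1]
            features[(pos, gap, pair)] = 1
    return features
```

Because the position is part of each key, the same nucleotide pair at two different offsets yields two distinct features. That fits fixed-length windows centred on a candidate site, where position relative to the site carries information; for variable-length sequences the keys would not line up across samples.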
Collapse
Affiliation(s)
- Sajid Ahmed
- Department of Computer Science and Engineering, United International University, Dhaka, Bangladesh
| | - Zahid Hossain
- Department of Computer Science and Engineering, United International University, Dhaka, Bangladesh
| | - Mahtab Uddin
- Department of Natural Science, United International University, Dhaka, Bangladesh
| | - Ghazaleh Taherzadeh
- Institute for Bioscience and Biotechnology Research, University of Maryland, College Park, MD 20742, USA
| | - Alok Sharma
- Institute for Integrated and Intelligent Systems, Griffith University, Brisbane, QLD 4111, Australia.,Department of Medical Science Mathematics, Tokyo Medical and Dental University (TMDU), Tokyo, Japan.,Laboratory for Medical Science Mathematics, RIKEN Center for Integrative Medical Sciences, Yokohama, Japan.,School of Engineering and Physics, University of the South Pacific, Suva, Fiji
| | - Swakkhar Shatabda
- Department of Computer Science and Engineering, United International University, Dhaka, Bangladesh
| | - Abdollah Dehzangi
- Department of Computer Science, Rutgers University, Camden, NJ 08102, USA.,Center for Computational and Integrative Biology, Rutgers University, Camden, NJ 08102, USA
| |
|
25
|
Towards a comprehensive pipeline to identify and functionally annotate long noncoding RNA (lncRNA). Comput Biol Med 2020; 127:104028. [PMID: 33126123 DOI: 10.1016/j.compbiomed.2020.104028] [Citation(s) in RCA: 11] [Impact Index Per Article: 2.2] [Received: 08/04/2020] [Revised: 09/28/2020] [Accepted: 09/29/2020] [Indexed: 12/20/2022]
Abstract
Long noncoding RNAs (lncRNAs) are implicated in various genetic diseases and cancer, owing to their critical role in gene regulation. They are a diverse group of RNAs and are readily distinguished from other RNA types by their unique characteristics, functions, and mechanisms of action. In this review, we provide a list of some of the prominent data repositories containing lncRNAs, their interactome, and predicted and validated disease associations. Next, we discuss various wet-lab experiments designed to obtain the data for these repositories. We also provide a critical review of the in silico methods available for lncRNA identification and suggest techniques to further improve their performance. The bulk of the methods currently focus on distinguishing lncRNA transcripts from coding ones. Functional annotation of these transcripts still remains a grey area, and more effort is needed in that space. Finally, we provide details of current progress, discuss impediments, and illustrate a roadmap for developing a generalized computational pipeline for comprehensive annotation of lncRNAs, which is essential to accelerate research in this area.
|
26
|
lncRNA_Mdeep: An Alignment-Free Predictor for Distinguishing Long Non-Coding RNAs from Protein-Coding Transcripts by Multimodal Deep Learning. Int J Mol Sci 2020; 21:ijms21155222. [PMID: 32718000 PMCID: PMC7432689 DOI: 10.3390/ijms21155222] [Citation(s) in RCA: 13] [Impact Index Per Article: 2.6] [Received: 06/10/2020] [Revised: 07/14/2020] [Accepted: 07/16/2020] [Indexed: 01/04/2023] Open
Abstract
Long non-coding RNAs (lncRNAs) play crucial roles in diverse biological processes and complex human diseases. Distinguishing lncRNAs from protein-coding transcripts is a fundamental step in analyzing lncRNA functional mechanisms. However, the experimental identification of lncRNAs is expensive and time-consuming. In this study, we present an alignment-free multimodal deep learning framework (namely lncRNA_Mdeep) to distinguish lncRNAs from protein-coding transcripts. LncRNA_Mdeep incorporates three different input modalities; a multimodal deep learning framework is then built to learn high-level abstract representations and predict the probability that a transcript is an lncRNA. LncRNA_Mdeep achieved 98.73% prediction accuracy in a 10-fold cross-validation test on human data. Compared with eight other state-of-the-art methods, lncRNA_Mdeep showed 93.12% prediction accuracy in an independent test on human data, which was 0.94%~15.41% higher than that of the eight other methods. In addition, the results on 11 cross-species datasets showed that lncRNA_Mdeep is a powerful predictor of lncRNAs.
|
27
|
Choi SW, Kim HW, Nam JW. The small peptide world in long noncoding RNAs. Brief Bioinform 2020; 20:1853-1864. [PMID: 30010717 PMCID: PMC6917221 DOI: 10.1093/bib/bby055] [Citation(s) in RCA: 200] [Impact Index Per Article: 40.0] [Received: 04/09/2018] [Revised: 05/08/2018] [Indexed: 02/07/2023] Open
Abstract
Long noncoding RNAs (lncRNAs) are a group of transcripts that are longer than 200 nucleotides (nt) without coding potential. Over the past decade, tens of thousands of novel lncRNAs have been annotated in animal and plant genomes because of advanced high-throughput RNA sequencing technologies and with the aid of coding transcript classifiers. Further, a considerable number of reports have revealed the existence of stable, functional small peptides (also known as micropeptides), translated from lncRNAs. In this review, we discuss the methods of lncRNA classification, the investigations regarding their coding potential and the functional significance of the peptides they encode.
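One common first check of coding potential is to look for open reading frames (ORFs) in a transcript. The sketch below is an illustrative assumption rather than a method taken from the review: it scans only the forward strand for the longest ATG-to-stop ORF, and the helper names are hypothetical.

```python
import re

# ATG, then the fewest in-frame codons needed to reach the first stop codon.
# The lookahead lets overlapping start codons be considered as well.
ORF_PATTERN = re.compile(r"(?=(ATG(?:[ACGT]{3})*?(?:TAA|TAG|TGA)))")


def longest_orf(seq):
    """Return the longest forward-strand ORF (start to stop inclusive), or ''."""
    dna = seq.upper().replace("U", "T")
    best = ""
    for match in ORF_PATTERN.finditer(dna):
        orf = match.group(1)
        if len(orf) > len(best):
            best = orf
    return best


def peptide_length(orf):
    """Number of amino acids encoded by an ORF, excluding the stop codon."""
    return max(len(orf) // 3 - 1, 0)


if __name__ == "__main__":
    orf = longest_orf("CCAUGAAAUAGGG")
    print(orf, peptide_length(orf))
```

A real pipeline would also scan the reverse complement and consider non-AUG start codons, which this sketch deliberately leaves out.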
Affiliation(s)
- Seo-Won Choi
- Department of Life Science, College of Natural Sciences, Hanyang University, Seoul 04763, Republic of Korea
| | - Hyun-Woo Kim
- Department of Life Science, College of Natural Sciences, Hanyang University, Seoul 04763, Republic of Korea
| | - Jin-Wu Nam
- Department of Life Science, College of Natural Sciences, Hanyang University, Seoul 04763, Republic of Korea
| |
|
28
|
LPI-BLS: Predicting lncRNA–protein interactions with a broad learning system-based stacked ensemble classifier. Neurocomputing 2019. [DOI: 10.1016/j.neucom.2019.08.084] [Citation(s) in RCA: 32] [Impact Index Per Article: 5.3] [Indexed: 01/20/2023]
|
29
|
Li ZG, Xiang WC, Shui SF, Han XW, Guo D, Yan L. Long noncoding RNA UCA1 functions as miR-135a sponge to promote the epithelial to mesenchymal transition in glioma. J Cell Biochem 2019; 121:2447-2457. [PMID: 31680311 DOI: 10.1002/jcb.29467] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.2] [Received: 06/23/2019] [Accepted: 10/08/2019] [Indexed: 12/18/2022]
Abstract
The dysregulation of long noncoding RNA (lncRNA) UCA1 may play an important role in tumor progression. However, its function in gliomas is unclear. Therefore, this experiment was designed to explore the pathogenesis of glioma based on lncRNA UCA1. Real-time quantitative polymerase chain reaction (RT-qPCR) was used to detect the expression of lncRNA UCA1, miR-135a, and HOXD9 in glioma tissues. The effect of lncRNA UCA1 and miR-135a on tumor cell proliferation, migration, and invasiveness was examined by CCK-8 and transwell assays. Target gene prediction and screening and a luciferase reporter assay were used to verify downstream target genes of lncRNA UCA1. Expression of E-cadherin, N-cadherin, vimentin, and HOXD9 was detected by RT-qPCR and Western blotting. Tumor changes were assessed by in vivo experiments in nude mice. lncRNA UCA1 was highly expressed in glioma tissues and cell lines. lncRNA UCA1 expression was associated with significantly poorer overall survival in gliomas. Moreover, lncRNA UCA1 significantly enhanced cell proliferation and migration, and promoted the occurrence of EMT. In addition, lncRNA UCA1 promoted the development of EMT by positively regulating HOXD9 expression as a miR-135a sponge. In vivo experiments indicated that UCA1 exerted its biological functions by modulating miR-135a and HOXD9. In conclusion, lncRNA UCA1 can induce the activation of HOXD9 by inhibiting the expression of miR-135a and promote the occurrence of EMT in glioma.
Affiliation(s)
- Zhi-Guo Li
- Department of Interventional Radiology, The First Affiliated Hospital of Zhengzhou University, Zhengzhou, China
| | - Wei-Chu Xiang
- Department of Neurosurgery, The General Hospital of Central Theater Command, PLA, China
| | - Shao-Feng Shui
- Department of Interventional Radiology, The First Affiliated Hospital of Zhengzhou University, Zhengzhou, China
| | - Xin-Wei Han
- Department of Interventional Radiology, The First Affiliated Hospital of Zhengzhou University, Zhengzhou, China
| | - Dong Guo
- Department of Interventional Radiology, The First Affiliated Hospital of Zhengzhou University, Zhengzhou, China
| | - Lei Yan
- Department of Interventional Radiology, The First Affiliated Hospital of Zhengzhou University, Zhengzhou, China
| |
|
30
|
Platon L, Zehraoui F, Bendahmane A, Tahi F. IRSOM, a reliable identifier of ncRNAs based on supervised self-organizing maps with rejection. Bioinformatics 2019; 34:i620-i628. [PMID: 30423081 PMCID: PMC6129289 DOI: 10.1093/bioinformatics/bty572] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.8] [Indexed: 01/10/2023] Open
Abstract
Motivation: Non-coding RNAs (ncRNAs) play important roles in many biological processes and are involved in many diseases. Their identification is an important task, and many tools exist in the literature for this purpose. However, almost all of them focus on discriminating coding RNAs from ncRNAs without giving more biological insight. In this paper, we propose a new reliable method called IRSOM, based on a supervised Self-Organizing Map (SOM) with a rejection option, that overcomes these limitations. The rejection option in IRSOM improves the accuracy of the method and also allows identifying ambiguous transcripts. Furthermore, with the visualization of the SOM, we analyze the rejected predictions and highlight the ambiguity of the transcripts. Results: IRSOM was tested on datasets of several species from different kingdoms and showed better results than the state of the art. The accuracy of IRSOM is always greater than 0.95 for all the species, with an average specificity of 0.98 and an average sensitivity of 0.99. Besides, IRSOM is fast (it takes around 254 s to analyze a dataset of 147 000 transcripts) and is able to handle very large datasets. Availability and implementation: IRSOM is implemented in Python and C++. It is available on our software platform EvryRNA (http://EvryRNA.ibisc.univ-evry.fr).
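The idea of a rejection option can be illustrated independently of the SOM itself. The following is a generic sketch, not IRSOM's actual procedure: a classifier's predicted probability is turned into a label only when it is confident enough, and the transcript is otherwise flagged as ambiguous. The function name and the default threshold of 0.8 are assumptions chosen for illustration.

```python
def classify_with_rejection(prob_noncoding, threshold=0.8):
    """Label a transcript, or reject it when the prediction is not confident.

    prob_noncoding: predicted probability that the transcript is non-coding.
    threshold: minimum confidence required to emit a label (must exceed 0.5).
    """
    if not 0.0 <= prob_noncoding <= 1.0:
        raise ValueError("prob_noncoding must lie in [0, 1]")
    if not 0.5 < threshold <= 1.0:
        raise ValueError("threshold must lie in (0.5, 1]")
    if prob_noncoding >= threshold:
        return "noncoding"
    if prob_noncoding <= 1.0 - threshold:
        return "coding"
    return "rejected"  # ambiguous transcript, left for closer inspection


if __name__ == "__main__":
    for p in (0.95, 0.50, 0.05):
        print(p, classify_with_rejection(p))
```

Raising the threshold trades coverage for accuracy: fewer transcripts receive a label, but the labelled ones are more likely to be correct.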
Affiliation(s)
- Ludovic Platon
- IBISC, Université Evry, Université Paris-Saclay, Evry, France; Institute of Plant Sciences Paris-Saclay, INRA, CNRS, Université Paris-Sud, Université d'Evry, Université Paris-Diderot, Orsay, France
| | - Farida Zehraoui
- IBISC, Université Evry, Université Paris-Saclay, Evry, France
| | - Abdelhafid Bendahmane
- Institute of Plant Sciences Paris-Saclay, INRA, CNRS, Université Paris-Sud, Université d'Evry, Université Paris-Diderot, Orsay, France
| | - Fariza Tahi
- IBISC, Université Evry, Université Paris-Saclay, Evry, France
| |
|
31
|
PredLnc-GFStack: A Global Sequence Feature Based on a Stacked Ensemble Learning Method for Predicting lncRNAs from Transcripts. Genes (Basel) 2019; 10:genes10090672. [PMID: 31484412 PMCID: PMC6770532 DOI: 10.3390/genes10090672] [Citation(s) in RCA: 13] [Impact Index Per Article: 2.2] [Received: 07/14/2019] [Revised: 08/05/2019] [Accepted: 08/28/2019] [Indexed: 11/16/2022] Open
Abstract
Long non-coding RNAs (lncRNAs) are a class of RNAs longer than 200 base pairs (bp) that do not encode proteins; nevertheless, lncRNAs have many vital biological functions. A large number of novel transcripts have been discovered as a result of the development of high-throughput sequencing technology. Under these circumstances, computational methods for lncRNA prediction are in great demand. In this paper, we consider global sequence features and propose a stacked ensemble learning-based method to predict lncRNAs from transcripts, abbreviated as PredLnc-GFStack. We extract the critical features from the candidate feature list using a genetic algorithm (GA) and then employ the stacked ensemble learning method to construct the PredLnc-GFStack model. Computational experimental results show that PredLnc-GFStack outperforms several state-of-the-art methods for lncRNA prediction. Furthermore, PredLnc-GFStack demonstrates an outstanding ability for cross-species ncRNA prediction.
|
32
|
Cao Z, Pan X, Yang Y, Huang Y, Shen HB. The lncLocator: a subcellular localization predictor for long non-coding RNAs based on a stacked ensemble classifier. Bioinformatics 2019; 34:2185-2194. [PMID: 29462250 DOI: 10.1093/bioinformatics/bty085] [Citation(s) in RCA: 287] [Impact Index Per Article: 47.8] [Received: 09/27/2017] [Accepted: 02/14/2018] [Indexed: 01/01/2023] Open
Abstract
Motivation: Long non-coding RNA (lncRNA) studies have been a hot topic in the field of RNA biology. Recent studies have shown that the subcellular localizations of lncRNAs carry important information for understanding their complex biological functions. Because experiments for identifying the subcellular localization of lncRNAs are costly and time-consuming, computational methods are urgently needed. However, to the best of our knowledge, there are no computational tools for predicting lncRNA subcellular locations to date. Results: In this study, we report an ensemble classifier-based predictor, lncLocator, for predicting lncRNA subcellular localizations. To fully exploit lncRNA sequence information, we adopt both k-mer features and high-level abstraction features generated by unsupervised deep models, and construct four classifiers by feeding these two types of features to a support vector machine (SVM) and a random forest (RF), respectively. Then we use a stacked ensemble strategy to combine the four classifiers and obtain the final prediction results. The current lncLocator can predict five subcellular localizations of lncRNAs, including cytoplasm, nucleus, cytosol, ribosome and exosome, and yields an overall accuracy of 0.59 on the constructed benchmark dataset. Availability and implementation: The lncLocator is available at www.csbio.sjtu.edu.cn/bioinf/lncLocator. Supplementary information: Supplementary data are available at Bioinformatics online.
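The k-mer features mentioned in this abstract are usually computed as normalized counts of every length-k subsequence. A minimal sketch of that computation follows; the function name, the default `k=3`, and the DNA alphabet are assumptions for illustration, and lncLocator's exact preprocessing is not described here.

```python
from collections import Counter
from itertools import product


def kmer_frequencies(seq, k=3, alphabet="ACGT"):
    """Return the frequency of each possible k-mer, in lexicographic order."""
    seq = seq.upper()
    kmers = ["".join(p) for p in product(alphabet, repeat=k)]
    # Slide a window of width k over the sequence and count what it sees.
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    # Windows containing characters outside the alphabet (e.g. N) are ignored.
    total = sum(counts[kmer] for kmer in kmers)
    return [counts[kmer] / total if total else 0.0 for kmer in kmers]


if __name__ == "__main__":
    freqs = kmer_frequencies("ACGTAC", k=2)
    print(len(freqs), freqs[1])  # 16 dinucleotides; index 1 is "AC"
```

The output is a fixed-length vector of size 4^k regardless of transcript length, which is why k-mer frequencies are a convenient input for classifiers such as SVMs and random forests.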
Affiliation(s)
- Zhen Cao
- Institute of Image Processing and Pattern Recognition, Shanghai Jiao Tong University, and Key Laboratory of System Control and Information Processing, Ministry of Education of China, Shanghai, China
| | - Xiaoyong Pan
- Department of Medical Informatics, Erasmus MC, Rotterdam, The Netherlands
| | - Yang Yang
- Department of Computer Science, Shanghai Jiao Tong University, and Key Laboratory of Shanghai Education Commission for Intelligent Interaction and Cognitive Engineering, Shanghai, China
| | - Yan Huang
- State Key Laboratory of Infrared Physics, Shanghai Institute of Technical Physics, Chinese Academy of Sciences, Shanghai, China
| | - Hong-Bin Shen
- Institute of Image Processing and Pattern Recognition, Shanghai Jiao Tong University, and Key Laboratory of System Control and Information Processing, Ministry of Education of China, Shanghai, China
| |
|
33
|
Zhang SW, Wang Y, Zhang XX, Wang JQ. Prediction of the RBP binding sites on lncRNAs using the high-order nucleotide encoding convolutional neural network. Anal Biochem 2019; 583:113364. [PMID: 31323206 DOI: 10.1016/j.ab.2019.113364] [Citation(s) in RCA: 16] [Impact Index Per Article: 2.7] [Received: 04/13/2019] [Revised: 07/10/2019] [Accepted: 07/15/2019] [Indexed: 01/09/2023]
Abstract
Long non-coding RNA (lncRNA) plays an important role in cells through its interaction with RNA-binding proteins (RBPs). Finding the RBP binding sites on lncRNA chains can help to understand the post-transcriptional regulatory mechanism and to explore the pathogenesis of cancers and possible roles in other diseases. Although many genome-wide experimental techniques can identify RNA-protein interactions and detect the binding sites on RNA chains, they are still time-consuming, labor-intensive and costly. Thus, many computational methods have been developed to predict RBP binding sites by integrating RNA sequence, structure, and domain-specific features. However, current approaches that focus on predicting RBP binding sites on RNA chains do not consider the dependencies among nucleotides. In this work, we propose a higher-order nucleotide encoding convolutional neural network-based method (namely HOCNNLB) to predict RBP binding sites on lncRNA chains. HOCNNLB first employs a high-order one-hot encoding strategy to encode the lncRNA sequences by considering the dependence among nucleotides; the encoded lncRNA sequences are then fed into a convolutional neural network (CNN) to predict the RBP binding sites. We evaluate HOCNNLB on 31 experimental datasets of 12 lncRNA binding proteins. The average AUC of HOCNNLB is 0.953, which is 0.247 and 0.175 higher than that of iDeepS and DeepBind, respectively. The average accuracy is 90.2%, which is 26.8% and 19.5% higher than that of iDeepS and DeepBind, respectively. These results demonstrate that HOCNNLB can reliably predict RBP binding sites on lncRNA chains and outperforms the state-of-the-art methods. The source code of HOCNNLB and the datasets used in this work are available at https://github.com/NWPU-903PR/HOCNNLB for academic users.
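A high-order one-hot encoding can be understood as one-hot encoding overlapping k-mers instead of single nucleotides, so that each row captures the dependence between adjacent bases. The sketch below illustrates that general idea under stated assumptions: the function name, the `order` parameter, and the handling of unknown bases are hypothetical and may differ from the HOCNNLB source code.

```python
from itertools import product


def high_order_one_hot(seq, order=2, alphabet="ACGU"):
    """Encode each overlapping k-mer (k = order) as a 4**order one-hot row."""
    kmers = ["".join(p) for p in product(alphabet, repeat=order)]
    index = {kmer: j for j, kmer in enumerate(kmers)}
    rows = []
    for i in range(len(seq) - order + 1):
        row = [0] * len(kmers)
        j = index.get(seq[i:i + order])
        if j is not None:  # k-mers with unknown bases stay all-zero
            row[j] = 1
        rows.append(row)
    return rows


if __name__ == "__main__":
    matrix = high_order_one_hot("ACGU", order=2)
    print(len(matrix), len(matrix[0]))  # 3 rows of 16 columns
```

With `order=1` this reduces to the conventional one-hot encoding with four columns per nucleotide; the resulting matrix is the kind of input a 1D CNN would consume.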
Affiliation(s)
- Shao-Wu Zhang
- Key Laboratory of Information Fusion Technology of Ministry of Education, School of Automation, Northwestern Polytechnical University, Xi'an, China.
| | - Ya Wang
- Key Laboratory of Information Fusion Technology of Ministry of Education, School of Automation, Northwestern Polytechnical University, Xi'an, China
| | - Xi-Xi Zhang
- Key Laboratory of Information Fusion Technology of Ministry of Education, School of Automation, Northwestern Polytechnical University, Xi'an, China
| | - Jia-Qi Wang
- Key Laboratory of Information Fusion Technology of Ministry of Education, School of Automation, Northwestern Polytechnical University, Xi'an, China
| |
|
34
|
Abstract
Background: Revealing the subcellular location of a newly discovered protein can bring insight into its function and guide research at the cellular level. The experimental methods currently used to identify protein subcellular locations are both time-consuming and expensive. Thus, it is highly desirable to develop computational methods for efficiently and effectively identifying protein subcellular locations. In particular, the rapidly increasing number of protein sequences entering the genome databases has called for the development of automated analysis methods.
Methods: In this review, we describe the recent advances in predicting protein subcellular locations with machine learning from the following aspects: i) protein subcellular location benchmark dataset construction, ii) protein feature representation and feature descriptors, iii) common machine learning algorithms, iv) cross-validation test methods and assessment metrics, v) web servers.
Result & Conclusion: Given the large number of protein sequences generated by high-throughput technologies, four future directions for predicting protein subcellular locations with machine learning deserve attention. The first is the selection of novel and effective features (e.g., statistical, physicochemical, evolutionary) from the sequences and structures of proteins. The second is the feature fusion strategy. The third is the design of a powerful predictor, and the fourth is the prediction of multiple location sites for a protein.
Affiliation(s)
- Ting-He Zhang
- School of Automation, Northwestern Polytechnical University, Xi'an, 710072, China
| | - Shao-Wu Zhang
- School of Automation, Northwestern Polytechnical University, Xi'an, 710072, China
| |
|
35
|
Emamjomeh A, Zahiri J, Asadian M, Behmanesh M, Fakheri BA, Mahdevar G. Identification, Prediction and Data Analysis of Noncoding RNAs: A Review. Med Chem 2019; 15:216-230. [PMID: 30484409 DOI: 10.2174/1573406414666181015151610] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.7] [Received: 11/12/2017] [Revised: 06/03/2018] [Accepted: 09/30/2018] [Indexed: 12/13/2022]
Abstract
BACKGROUND Noncoding RNAs (ncRNAs), which play an important role in various cellular processes, are important in medicine as well as in drug design strategies. Different studies have shown that ncRNAs are dysregulated in cancer cells and play an important role in human tumorigenesis. Therefore, it is important to identify and predict such molecules by experimental and computational methods, respectively. However, to avoid expensive experimental methods, computational algorithms have been developed for accurate and fast prediction of ncRNAs. OBJECTIVE The aim of this review was to introduce the experimental and computational methods used to identify and predict ncRNA structure. We also briefly explain the roles of ncRNAs in cellular processes and drug design. METHOD In this survey, we introduce ncRNAs and their roles in biological and medicinal processes. Then, some important laboratory techniques for identifying ncRNAs are examined. Finally, the state-of-the-art models and algorithms are introduced along with important tools and databases. RESULTS The results showed that the integration of experimental and computational approaches improves the identification of ncRNAs. Moreover, highly accurate databases, algorithms and tools for predicting ncRNAs were compared. CONCLUSION ncRNA prediction is an exciting research field, but it faces various difficulties. It requires accurate and reliable algorithms and tools. It should also be mentioned that the computational costs of such algorithms, including running time and memory usage, are very important. Finally, some suggestions are presented to improve computational methods for ncRNA gene and structure prediction.
Affiliation(s)
- Abbasali Emamjomeh
- Laboratory of Computational Biotechnology and Bioinformatics (CBB), Department of Plant Breeding and Biotechnology (PBB), University of Zabol, Zabol, Iran
| | - Javad Zahiri
- Bioinformatics and Computational Omics Lab (BioCOOL), Department of Biophysics, Faculty of Biological Sciences, Tarbiat Modares University, Tehran, Iran
| | - Mehrdad Asadian
- Department of Plant Breeding and Biotechnology (PBB), Faculty of Agriculture, University of Zabol, Zabol, Iran
| | - Mehrdad Behmanesh
- Department of Genetics, Faculty of Biological Sciences, Tarbiat Modares University, Tehran, Iran
| | - Barat A Fakheri
- Department of Plant Breeding and Biotechnology (PBB), Faculty of Agriculture, University of Zabol, Zabol, Iran
| | - Ghasem Mahdevar
- Department of Mathematics, Faculty of Sciences, University of Isfahan, Isfahan, Iran
| |
|
36
|
Kong Y, Lu Z, Liu P, Liu Y, Wang F, Liang EY, Hou FF, Liang M. Long Noncoding RNA: Genomics and Relevance to Physiology. Compr Physiol 2019; 9:933-946. [PMID: 31187897 DOI: 10.1002/cphy.c180032] [Citation(s) in RCA: 14] [Impact Index Per Article: 2.3] [Indexed: 12/16/2022]
Abstract
The mammalian cell expresses thousands of long noncoding RNAs (lncRNAs) that are longer than 200 nucleotides but do not encode any protein. lncRNAs can change the expression of protein-coding genes through both cis and trans mechanisms, including imprinting and other types of transcriptional regulation, and posttranscriptional regulation including serving as molecular sponges. Deep sequencing, coupled with analysis of sequence characteristics, is the primary method used to identify lncRNAs. Physiological roles of specific lncRNAs can be examined using genetic targeting or knockdown with modified oligonucleotides. Identification of nucleic acids or proteins with which an lncRNA interacts is essential for understanding the molecular mechanism underlying its physiological role. lncRNAs have been reported to contribute to the regulation of physiological functions and disease development in several organ systems, including the cardiovascular, renal, muscular, endocrine, digestive, nervous, respiratory, and reproductive systems. The physiological role of the majority of lncRNAs, many of which are species and tissue specific, remains to be determined. © 2019 American Physiological Society. Compr Physiol 9:933-946, 2019.
Affiliation(s)
- Yiwei Kong
- Center of Systems Molecular Medicine, Department of Physiology, Medical College of Wisconsin, Milwaukee, Wisconsin, USA; Department of Nephrology, Shanghai Jiao Tong University Affiliated Sixth People's Hospital, Shanghai, China
| | - Zeyuan Lu
- Center of Systems Molecular Medicine, Department of Physiology, Medical College of Wisconsin, Milwaukee, Wisconsin, USA; Department of Nephrology, Shanghai Jiao Tong University Affiliated Sixth People's Hospital, Shanghai, China
| | - Pengyuan Liu
- Center of Systems Molecular Medicine, Department of Physiology, Medical College of Wisconsin, Milwaukee, Wisconsin, USA; Sir Run Run Shaw Hospital, Institute of Translational Medicine, Zhejiang University, Zhejiang, China
| | - Yong Liu
- Center of Systems Molecular Medicine, Department of Physiology, Medical College of Wisconsin, Milwaukee, Wisconsin, USA
| | - Feng Wang
- Center of Systems Molecular Medicine, Department of Physiology, Medical College of Wisconsin, Milwaukee, Wisconsin, USA; Department of Nephrology, Shanghai Jiao Tong University Affiliated Sixth People's Hospital, Shanghai, China
| | - Eugene Y Liang
- Center for Advancing Population Science, Medical College of Wisconsin, Milwaukee, Wisconsin, USA
| | - Fan Fan Hou
- National Clinical Research Center for Kidney Disease, State Key Laboratory of Organ Failure Research, Guangzhou Regenerative Medicine and Health - Guangdong Laboratory, Division of Nephrology, Nanfang Hospital, Southern Medical University, Guangzhou, China
| | - Mingyu Liang
- Center of Systems Molecular Medicine, Department of Physiology, Medical College of Wisconsin, Milwaukee, Wisconsin, USA
| |
|
37
|
Liu XQ, Li BX, Zeng GR, Liu QY, Ai DM. Prediction of Long Non-Coding RNAs Based on Deep Learning. Genes (Basel) 2019; 10:genes10040273. [PMID: 30987229 PMCID: PMC6523782 DOI: 10.3390/genes10040273] [Citation(s) in RCA: 24] [Impact Index Per Article: 4.0] [Received: 03/12/2019] [Revised: 03/29/2019] [Accepted: 03/29/2019] [Indexed: 01/09/2023] Open
Abstract
With the rapid development of high-throughput sequencing technology, a large number of transcript sequences have been discovered, and identifying long non-coding RNAs (lncRNAs) among these transcripts is a challenging task. The identification and cataloguing of lncRNAs not only helps us to understand life activities more clearly, but also helps us to further explore and study disease at the molecular level. At present, lncRNAs are detected mainly by two approaches: computation and experiment. Due to the limitations of sequencing technology and unavoidable errors in sequencing processes, the detection performance of these methods is not very satisfactory. In this paper, we constructed a deep-learning model to effectively distinguish lncRNAs from mRNAs. We used k-mer embedding vectors obtained by training the GloVe algorithm as input features and set up the deep learning framework to include a bidirectional long short-term memory (BLSTM) layer and a convolutional neural network (CNN) layer with three additional hidden layers. In testing, our model obtained best values of 97.9%, 96.4% and 99.0% for F1 score, accuracy and auROC, respectively, showing better classification performance than the traditional PLEK, CNCI and CPC methods for identifying lncRNAs. We hope that our model will provide effective help in distinguishing mature mRNAs from lncRNAs, and become a potential tool to help understand and detect the diseases associated with lncRNAs.
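For readers comparing the reported scores, the F1 score and accuracy can both be derived from a binary confusion matrix. The sketch below shows the standard definitions; the function name is hypothetical and the counts in the usage example are invented for illustration, not taken from the paper.

```python
def classification_metrics(tp, fp, tn, fn):
    """Compute precision, recall, F1 score and accuracy from confusion counts."""
    total = tp + fp + tn + fn
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    # F1 is the harmonic mean of precision and recall.
    if precision + recall:
        f1 = 2 * precision * recall / (precision + recall)
    else:
        f1 = 0.0
    accuracy = (tp + tn) / total if total else 0.0
    return {"precision": precision, "recall": recall, "f1": f1, "accuracy": accuracy}


if __name__ == "__main__":
    # Invented counts: lncRNA is treated as the positive class.
    print(classification_metrics(tp=90, fp=10, tn=80, fn=20))
```

Unlike these two metrics, auROC cannot be computed from a single confusion matrix, because it summarizes performance across all decision thresholds.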
Affiliation(s)
- Xiu-Qin Liu
- School of Mathematics and Physics, University of Science and Technology Beijing, Beijing 100083, China.
| | - Bing-Xiu Li
- School of Mathematics and Physics, University of Science and Technology Beijing, Beijing 100083, China.
| | - Guan-Rong Zeng
- School of Mathematics and Physics, University of Science and Technology Beijing, Beijing 100083, China.
| | - Qiao-Yue Liu
- School of Mathematics and Physics, University of Science and Technology Beijing, Beijing 100083, China.
| | - Dong-Mei Ai
- School of Mathematics and Physics, University of Science and Technology Beijing, Beijing 100083, China.
| |
|
38
|
Karlik E, Ari S, Gozukirmizi N. LncRNAs: genetic and epigenetic effects in plants. BIOTECHNOL BIOTEC EQ 2019. [DOI: 10.1080/13102818.2019.1581085] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.0] [Indexed: 12/23/2022] Open
Affiliation(s)
- Elif Karlik
- Department of Biotechnology Institute of Graduate Studies in Science and Engineering, Istanbul University, Istanbul, Turkey
- Department of Molecular Biology and Genetics Faculty of Science, Istinye University, Istanbul, Turkey
| | - Sule Ari
- Department of Molecular Biology and Genetics Faculty of Science, Istanbul University, Istanbul, Turkey
| | - Nermin Gozukirmizi
- Department of Molecular Biology and Genetics Faculty of Science, Istanbul University, Istanbul, Turkey
- Department of Molecular Biology and Genetics Faculty of Science, Istinye University, Istanbul, Turkey
| |
|
39
|
Turner AW, Wong D, Khan MD, Dreisbach CN, Palmore M, Miller CL. Multi-Omics Approaches to Study Long Non-coding RNA Function in Atherosclerosis. Front Cardiovasc Med 2019; 6:9. [PMID: 30838214 PMCID: PMC6389617 DOI: 10.3389/fcvm.2019.00009] [Citation(s) in RCA: 27] [Impact Index Per Article: 4.5] [Received: 11/03/2018] [Accepted: 01/30/2019] [Indexed: 12/15/2022] Open
Abstract
Atherosclerosis is a complex inflammatory disease of the vessel wall involving the interplay of multiple cell types including vascular smooth muscle cells, endothelial cells, and macrophages. Large-scale genome-wide association studies (GWAS) and the advancement of next generation sequencing technologies have rapidly expanded the number of long non-coding RNA (lncRNA) transcripts predicted to play critical roles in the pathogenesis of the disease. In this review, we highlight several lncRNAs whose functional role in atherosclerosis is well-documented through traditional biochemical approaches as well as those identified through RNA-sequencing and other high-throughput assays. We describe novel genomics approaches to study both evolutionarily conserved and divergent lncRNA functions and interactions with DNA, RNA, and proteins. We also highlight assays to resolve the complex spatial and temporal regulation of lncRNAs. Finally, we summarize the latest suite of computational tools designed to improve genomic and functional annotation of these transcripts in the human genome. Deep characterization of lncRNAs is fundamental to unravel coronary atherosclerosis and other cardiovascular diseases, as these regulatory molecules represent a new class of potential therapeutic targets and/or diagnostic markers to mitigate both genetic and environmental risk factors.
Affiliation(s)
- Adam W. Turner
- Center for Public Health Genomics, University of Virginia, Charlottesville, VA, United States
| | - Doris Wong
- Center for Public Health Genomics, University of Virginia, Charlottesville, VA, United States
- Department of Biochemistry and Molecular Genetics, University of Virginia, Charlottesville, VA, United States
| | - Mohammad Daud Khan
- Center for Public Health Genomics, University of Virginia, Charlottesville, VA, United States
| | - Caitlin N. Dreisbach
- Center for Public Health Genomics, University of Virginia, Charlottesville, VA, United States
- School of Nursing, University of Virginia, Charlottesville, VA, United States
- Data Science Institute, University of Virginia, Charlottesville, VA, United States
| | - Meredith Palmore
- Center for Public Health Genomics, University of Virginia, Charlottesville, VA, United States
| | - Clint L. Miller
- Center for Public Health Genomics, University of Virginia, Charlottesville, VA, United States
- Department of Biochemistry and Molecular Genetics, University of Virginia, Charlottesville, VA, United States
- Data Science Institute, University of Virginia, Charlottesville, VA, United States
- Department of Biomedical Engineering, University of Virginia, Charlottesville, VA, United States
- Department of Public Health Sciences, University of Virginia, Charlottesville, VA, United States
| |
|
40
|
Deshpande S, Shuttleworth J, Yang J, Taramonli S, England M. PLIT: An alignment-free computational tool for identification of long non-coding RNAs in plant transcriptomic datasets. Comput Biol Med 2019; 105:169-181. [DOI: 10.1016/j.compbiomed.2018.12.014] [Citation(s) in RCA: 21] [Impact Index Per Article: 3.5] [Received: 09/18/2018] [Revised: 12/27/2018] [Accepted: 12/29/2018] [Indexed: 02/05/2023]
|
41
|
Prediction of drug-target interaction by integrating diverse heterogeneous information source with multiple kernel learning and clustering methods. Comput Biol Chem 2019; 78:460-467. [DOI: 10.1016/j.compbiolchem.2018.11.028] [Citation(s) in RCA: 27] [Impact Index Per Article: 4.5] [Received: 10/23/2018] [Revised: 11/30/2018] [Accepted: 11/30/2018] [Indexed: 02/08/2023]
|
42
|
Zhavoronkov A, Mamoshina P, Vanhaelen Q, Scheibye-Knudsen M, Moskalev A, Aliper A. Artificial intelligence for aging and longevity research: Recent advances and perspectives. Ageing Res Rev 2019; 49:49-66. [PMID: 30472217 DOI: 10.1016/j.arr.2018.11.003] [Citation(s) in RCA: 102] [Impact Index Per Article: 17.0] [Received: 09/29/2018] [Revised: 11/07/2018] [Accepted: 11/21/2018] [Indexed: 12/14/2022]
Abstract
The applications of modern artificial intelligence (AI) algorithms within the field of aging research offer tremendous opportunities. Aging is an almost universal unifying feature possessed by all living organisms, tissues, and cells. Modern deep learning techniques used to develop age predictors offer new possibilities for formerly incompatible dynamic and static data types. AI biomarkers of aging enable a holistic view of biological processes and allow for novel methods for building causal models, extracting the most important features and identifying biological targets and mechanisms. Recent developments in generative adversarial networks (GANs) and reinforcement learning (RL) permit the generation of diverse synthetic molecular and patient data, identification of novel biological targets, and generation of novel molecular compounds with desired properties and geroprotectors. These novel techniques can be combined into a unified, seamless end-to-end biomarker development, target identification, drug discovery and real world evidence pipeline that may help accelerate and improve pharmaceutical research and development practices. Modern AI is therefore expected to contribute to the credibility and prominence of longevity biotechnology in the healthcare and pharmaceutical industry, and to the convergence of countless areas of research.
|
43
|
Yang C, Yang L, Zhou M, Xie H, Zhang C, Wang MD, Zhu H. LncADeep: an ab initio lncRNA identification and functional annotation tool based on deep learning. Bioinformatics 2018; 34:3825-3834. [DOI: 10.1093/bioinformatics/bty428] [Citation(s) in RCA: 78] [Impact Index Per Article: 11.1] [Received: 01/22/2018] [Accepted: 05/23/2018] [Indexed: 12/15/2022] Open
Affiliation(s)
- Cheng Yang
- Department of Biomedical Engineering, College of Engineering, and Centre for Quantitative Biology, Peking University, Beijing, China
- Department of Biomedical Engineering, Georgia Institute of Technology and Emory University, Atlanta, GA, USA
| | - Longshu Yang
- Department of Biomedical Engineering, College of Engineering, and Centre for Quantitative Biology, Peking University, Beijing, China
| | - Man Zhou
- Department of Biomedical Engineering, College of Engineering, and Centre for Quantitative Biology, Peking University, Beijing, China
| | - Haoling Xie
- Department of Biomedical Engineering, College of Engineering, and Centre for Quantitative Biology, Peking University, Beijing, China
- Peking University-Tsinghua University-National Institute of Biological Sciences (PTN) Joint PhD Program and College of Life Sciences, Peking University, Beijing, China
| | - Chengjiu Zhang
- Department of Biomedical Engineering, College of Engineering, and Centre for Quantitative Biology, Peking University, Beijing, China
| | - May D Wang
- Department of Biomedical Engineering, Georgia Institute of Technology and Emory University, Atlanta, GA, USA
| | - Huaiqiu Zhu
- Department of Biomedical Engineering, College of Engineering, and Centre for Quantitative Biology, Peking University, Beijing, China
- Peking University-Tsinghua University-National Institute of Biological Sciences (PTN) Joint PhD Program and College of Life Sciences, Peking University, Beijing, China
| |
|
44
|
Fraser K, Bruckner DM, Dordick JS. Advancing Predictive Hepatotoxicity at the Intersection of Experimental, in Silico, and Artificial Intelligence Technologies. Chem Res Toxicol 2018; 31:412-430. [PMID: 29722533 DOI: 10.1021/acs.chemrestox.8b00054] [Citation(s) in RCA: 26] [Impact Index Per Article: 3.7] [Indexed: 12/30/2022]
Abstract
Adverse drug reactions, particularly those that result in drug-induced liver injury (DILI), are a major cause of drug failure in clinical trials and drug withdrawals. Hepatotoxicity-mediated drug attrition occurs despite substantial investments of time and money in developing cellular assays, animal models, and computational models to predict its occurrence in humans. Underperformance in predicting hepatotoxicity associated with drugs and drug candidates has been attributed to existing gaps in our understanding of the mechanisms involved in driving hepatic injury after these compounds perfuse and are metabolized by the liver. Herein we assess in vitro, in vivo (animal), and in silico strategies used to develop predictive DILI models. We address the effectiveness of several two- and three-dimensional in vitro cellular methods that are frequently employed in hepatotoxicity screens and how they can be used to predict DILI in humans. We also explore how humanized animal models can recapitulate human drug metabolic profiles and associated liver injury. Finally, we highlight the maturation of computational methods for predicting hepatotoxicity, the untapped potential of artificial intelligence for improving in silico DILI screens, and how knowledge acquired from these predictions can shape the refinement of experimental methods.
Affiliation(s)
- Keith Fraser
- Department of Chemical and Biological Engineering and Department of Biological Sciences Center for Biotechnology and Interdisciplinary Studies, Rensselaer Polytechnic Institute, Troy, New York 12180, United States
| | - Dylan M Bruckner
- Department of Chemical and Biological Engineering and Department of Biological Sciences Center for Biotechnology and Interdisciplinary Studies, Rensselaer Polytechnic Institute, Troy, New York 12180, United States
| | - Jonathan S Dordick
- Department of Chemical and Biological Engineering and Department of Biological Sciences Center for Biotechnology and Interdisciplinary Studies, Rensselaer Polytechnic Institute, Troy, New York 12180, United States
| |
|
45
|
Lefever S, Anckaert J, Volders PJ, Luypaert M, Vandesompele J, Mestdagh P. decodeRNA: predicting non-coding RNA functions using guilt-by-association. DATABASE-THE JOURNAL OF BIOLOGICAL DATABASES AND CURATION 2018; 2017:3866795. [PMID: 29220434 PMCID: PMC5502368 DOI: 10.1093/database/bax042] [Citation(s) in RCA: 15] [Impact Index Per Article: 2.1] [Received: 10/10/2016] [Accepted: 05/01/2017] [Indexed: 12/23/2022]
Abstract
Although the long non-coding RNA (lncRNA) landscape is expanding rapidly, only a small number of lncRNAs have been functionally annotated. Here, we present decodeRNA (http://www.decoderna.org), a database providing functional contexts for both human lncRNAs and microRNAs in 29 cancer and 12 normal tissue types. With state-of-the-art data mining and visualization options, easy access to results and a straightforward user interface, decodeRNA aims to be a powerful tool for researchers in the ncRNA field. Database URL: http://www.decoderna.org
Affiliation(s)
- Steve Lefever
- Center for Medical Genetics Ghent (CMGG), Ghent University, Ghent, Belgium; Cancer Research Institute Ghent (CRIG), Ghent University, Ghent, Belgium
| | - Jasper Anckaert
- Center for Medical Genetics Ghent (CMGG), Ghent University, Ghent, Belgium; Cancer Research Institute Ghent (CRIG), Ghent University, Ghent, Belgium
| | - Pieter-Jan Volders
- Center for Medical Genetics Ghent (CMGG), Ghent University, Ghent, Belgium; Cancer Research Institute Ghent (CRIG), Ghent University, Ghent, Belgium
| | - Manuel Luypaert
- Center for Medical Genetics Ghent (CMGG), Ghent University, Ghent, Belgium
| | - Jo Vandesompele
- Center for Medical Genetics Ghent (CMGG), Ghent University, Ghent, Belgium; Cancer Research Institute Ghent (CRIG), Ghent University, Ghent, Belgium; Biogazelle, Zwijnaarde, Belgium
| | - Pieter Mestdagh
- Center for Medical Genetics Ghent (CMGG), Ghent University, Ghent, Belgium; Cancer Research Institute Ghent (CRIG), Ghent University, Ghent, Belgium; Biogazelle, Zwijnaarde, Belgium
| |
|
46
|
West MD, Labat I, Sternberg H, Larocca D, Nasonkin I, Chapman KB, Singh R, Makarev E, Aliper A, Kazennov A, Alekseenko A, Shuvalov N, Cheskidova E, Alekseev A, Artemov A, Putin E, Mamoshina P, Pryanichnikov N, Larocca J, Copeland K, Izumchenko E, Korzinkin M, Zhavoronkov A. Use of deep neural network ensembles to identify embryonic-fetal transition markers: repression of COX7A1 in embryonic and cancer cells. Oncotarget 2017; 9:7796-7811. [PMID: 29487692 PMCID: PMC5814259 DOI: 10.18632/oncotarget.23748] [Citation(s) in RCA: 30] [Impact Index Per Article: 3.8] [Received: 09/18/2017] [Accepted: 12/20/2017] [Indexed: 12/19/2022] Open
Abstract
Here we present the application of deep neural network (DNN) ensembles trained on transcriptomic data to identify novel markers associated with the mammalian embryonic-fetal transition (EFT). Molecular markers of this process could provide important insights into regulatory mechanisms of normal development, epimorphic tissue regeneration and cancer. Subsequent analysis of the most significant genes behind the DNN classifier on an independent dataset of adult-derived and human embryonic stem cell (hESC)-derived progenitor cell lines led to the identification of the COX7A1 gene as a potential EFT marker. COX7A1, encoding a cytochrome c oxidase subunit, was up-regulated in post-EFT murine and human cells including adult stem cells, but was not expressed in pre-EFT pluripotent embryonic stem cells or their in vitro-derived progeny. COX7A1 expression was undetectable or low in multiple sarcoma and carcinoma cell lines as compared to normal controls. The knockout of the gene in mice led to a marked glycolytic shift reminiscent of the Warburg effect that occurs in cancer cells. The DNN approach facilitated the elucidation of a potentially new biomarker of cancer and pre-EFT cells, the embryo-onco phenotype, which may potentially be used as a target for controlling the embryonic-fetal transition.
Affiliation(s)
| | - Ivan Labat
- AgeX Therapeutics, Inc., Alameda, CA, USA
| | - Eugene Makarev
- Pharmaceutical Artificial Intelligence Department, Insilico Medicine, Inc., Emerging Technology Centers, Johns Hopkins University at Eastern, Baltimore, MD, USA
| | - Alex Aliper
- Pharmaceutical Artificial Intelligence Department, Insilico Medicine, Inc., Emerging Technology Centers, Johns Hopkins University at Eastern, Baltimore, MD, USA
| | - Andrey Kazennov
- Pharmaceutical Artificial Intelligence Department, Insilico Medicine, Inc., Emerging Technology Centers, Johns Hopkins University at Eastern, Baltimore, MD, USA; Moscow Institute of Physics and Technology, Dolgoprudny, Russia
| | - Andrey Alekseenko
- Pharmaceutical Artificial Intelligence Department, Insilico Medicine, Inc., Emerging Technology Centers, Johns Hopkins University at Eastern, Baltimore, MD, USA; Innopolis University, Innopolis, Russia
| | - Nikolai Shuvalov
- Pharmaceutical Artificial Intelligence Department, Insilico Medicine, Inc., Emerging Technology Centers, Johns Hopkins University at Eastern, Baltimore, MD, USA; Moscow Institute of Physics and Technology, Dolgoprudny, Russia
| | - Evgenia Cheskidova
- Pharmaceutical Artificial Intelligence Department, Insilico Medicine, Inc., Emerging Technology Centers, Johns Hopkins University at Eastern, Baltimore, MD, USA; Moscow Institute of Physics and Technology, Dolgoprudny, Russia
| | - Aleksandr Alekseev
- Pharmaceutical Artificial Intelligence Department, Insilico Medicine, Inc., Emerging Technology Centers, Johns Hopkins University at Eastern, Baltimore, MD, USA; Moscow Institute of Physics and Technology, Dolgoprudny, Russia
| | - Artem Artemov
- Pharmaceutical Artificial Intelligence Department, Insilico Medicine, Inc., Emerging Technology Centers, Johns Hopkins University at Eastern, Baltimore, MD, USA
| | - Evgeny Putin
- Pharmaceutical Artificial Intelligence Department, Insilico Medicine, Inc., Emerging Technology Centers, Johns Hopkins University at Eastern, Baltimore, MD, USA; Computer Technologies Lab, ITMO University, St. Petersburg, Russia
| | - Polina Mamoshina
- Pharmaceutical Artificial Intelligence Department, Insilico Medicine, Inc., Emerging Technology Centers, Johns Hopkins University at Eastern, Baltimore, MD, USA
| | - Nikita Pryanichnikov
- Pharmaceutical Artificial Intelligence Department, Insilico Medicine, Inc., Emerging Technology Centers, Johns Hopkins University at Eastern, Baltimore, MD, USA
| | - Evgeny Izumchenko
- Johns Hopkins University, School of Medicine, Department of Otolaryngology-Head and Neck Cancer Research, Baltimore, MD, USA
| | - Mikhail Korzinkin
- Pharmaceutical Artificial Intelligence Department, Insilico Medicine, Inc., Emerging Technology Centers, Johns Hopkins University at Eastern, Baltimore, MD, USA
| | - Alex Zhavoronkov
- Pharmaceutical Artificial Intelligence Department, Insilico Medicine, Inc., Emerging Technology Centers, Johns Hopkins University at Eastern, Baltimore, MD, USA; The Biogerontology Research Foundation, Trevissome Park, Truro, UK
| |
|
47
|
Ye F. Particle swarm optimization-based automatic parameter selection for deep neural networks and its applications in large-scale and high-dimensional data. PLoS One 2017; 12:e0188746. [PMID: 29236718 PMCID: PMC5728507 DOI: 10.1371/journal.pone.0188746] [Citation(s) in RCA: 19] [Impact Index Per Article: 2.4] [Received: 04/02/2017] [Accepted: 10/02/2017] [Indexed: 01/02/2023] Open
Abstract
In this paper, we propose a new automatic hyperparameter selection approach for determining the optimal network configuration (network structure and hyperparameters) for deep neural networks using particle swarm optimization (PSO) in combination with a steepest gradient descent algorithm. In the proposed approach, network configurations were coded as a set of real-number m-dimensional vectors as the individuals of the PSO algorithm in the search procedure. During the search procedure, the PSO algorithm is employed to search for optimal network configurations via the particles moving in a finite search space, and the steepest gradient descent algorithm is used to train the DNN classifier with a few training epochs (to find a local optimal solution) during the population evaluation of PSO. After the optimization scheme, the steepest gradient descent algorithm is performed with more epochs and the final solutions (pbest and gbest) of the PSO algorithm to train a final ensemble model and individual DNN classifiers, respectively. The local search ability of the steepest gradient descent algorithm and the global search capabilities of the PSO algorithm are exploited to determine an optimal solution that is close to the global optimum. We conducted several experiments on hand-written characters and biological activity prediction datasets to show that the DNN classifiers trained by the network configurations expressed by the final solutions of the PSO algorithm, employed to construct an ensemble model and individual classifier, outperform the random approach in terms of the generalization performance. Therefore, the proposed approach can be regarded as an alternative tool for automatic network structure and parameter selection for deep neural networks.
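The search loop described in this abstract can be illustrated with a minimal sketch of a textbook particle swarm optimizer. This is not the paper's implementation: the inertia and acceleration coefficients, the bounds, and the function `toy_validation_error` (a hypothetical stand-in for "train a DNN for a few epochs and return its validation error") are all assumptions made for illustration.

```python
import random


def pso_minimize(objective, bounds, n_particles=20, n_iters=50,
                 inertia=0.7, c1=1.5, c2=1.5, seed=0):
    """Minimise `objective` over a box; returns (gbest_position, gbest_value)."""
    rng = random.Random(seed)
    dim = len(bounds)
    # Each particle is a real-valued vector, i.e. one candidate configuration.
    pos = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    pbest = [p[:] for p in pos]                  # best position seen by each particle
    pbest_val = [objective(p) for p in pos]
    g = min(range(n_particles), key=pbest_val.__getitem__)
    gbest, gbest_val = pbest[g][:], pbest_val[g]  # best position seen by the swarm

    for _ in range(n_iters):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                vel[i][d] = (inertia * vel[i][d]
                             + c1 * r1 * (pbest[i][d] - pos[i][d])
                             + c2 * r2 * (gbest[d] - pos[i][d]))
                lo, hi = bounds[d]
                # Clamp so the particle stays inside the finite search space.
                pos[i][d] = min(hi, max(lo, pos[i][d] + vel[i][d]))
            val = objective(pos[i])
            if val < pbest_val[i]:
                pbest[i], pbest_val[i] = pos[i][:], val
                if val < gbest_val:
                    gbest, gbest_val = pos[i][:], val
    return gbest, gbest_val


def toy_validation_error(config):
    # Hypothetical surrogate: in the setting of the abstract this would be a
    # short training run of a DNN built from `config`.
    log_lr, hidden_units = config
    return (log_lr + 3.0) ** 2 + ((hidden_units - 64.0) / 64.0) ** 2


search_bounds = [(-6.0, 0.0), (8.0, 256.0)]  # assumed ranges: log10(lr), hidden units
best, err = pso_minimize(toy_validation_error, search_bounds)
```

In the approach the abstract describes, the configurations held in `pbest` and `gbest` at the end of the search would then be retrained with more epochs; that step is omitted here.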
Affiliation(s)
- Fei Ye
- School of information science and technology, Southwest Jiaotong University, ChengDu, China
| |
|
48
|
Gene Prediction in Metagenomic Fragments with Deep Learning. BIOMED RESEARCH INTERNATIONAL 2017; 2017:4740354. [PMID: 29250541 PMCID: PMC5698827 DOI: 10.1155/2017/4740354] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.1] [Received: 06/30/2017] [Accepted: 10/08/2017] [Indexed: 01/14/2023]
Abstract
Next generation sequencing technologies used in metagenomics yield numerous sequencing fragments which come from thousands of different species. Accurately identifying genes from metagenomic fragments is one of the most fundamental issues in metagenomics. In this article, by fusing multiple features (i.e., monocodon usage, monoamino acid usage, ORF length coverage, and Z-curve features) and using a deep stacking network learning model, we present a novel method (called Meta-MFDL) to predict metagenomic genes. The results of 10-fold cross-validation and independent tests show that Meta-MFDL is a powerful tool for identifying genes from metagenomic fragments.
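As an illustration of one of the feature types named in this abstract, the sketch below computes a monocodon usage vector for a nucleotide fragment. It assumes the usual definition (relative frequency of each of the 64 codons in a fixed reading frame); the function name and the handling of ambiguous bases are illustrative choices, not taken from Meta-MFDL.

```python
from collections import Counter
from itertools import product

# All 64 codons in a fixed order, so every fragment maps to a vector of equal length.
CODONS = ["".join(c) for c in product("ACGT", repeat=3)]
CODON_SET = set(CODONS)


def monocodon_usage(seq):
    """Relative frequency of each codon, reading `seq` in frame 0.

    Codons containing characters other than A, C, G, T are skipped
    (an assumption made for this sketch).
    """
    seq = seq.upper()
    codons = [seq[i:i + 3] for i in range(0, len(seq) - 2, 3)]
    counts = Counter(c for c in codons if c in CODON_SET)
    total = sum(counts.values())
    return [counts[c] / total if total else 0.0 for c in CODONS]


usage = monocodon_usage("ATGAAATGA")  # codons: ATG, AAA, TGA
```

A vector like `usage` would be concatenated with the other feature groups before being passed to a classifier.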
|
49
|
Ventola GMM, Noviello TMR, D'Aniello S, Spagnuolo A, Ceccarelli M, Cerulo L. Identification of long non-coding transcripts with feature selection: a comparative study. BMC Bioinformatics 2017; 18:187. [PMID: 28335739 PMCID: PMC5364679 DOI: 10.1186/s12859-017-1594-z] [Citation(s) in RCA: 19] [Impact Index Per Article: 2.4] [Received: 11/10/2016] [Accepted: 03/10/2017] [Indexed: 01/15/2023] Open
Abstract
Background: The unveiling of long non-coding RNAs as important gene regulators in many biological contexts has increased the demand for efficient and robust computational methods to identify novel long non-coding RNAs from transcripts assembled with high throughput RNA-seq data. Several classes of sequence-based features have been proposed to distinguish between coding and non-coding transcripts. Among them, open reading frame, conservation scores, nucleotide arrangements, and RNA secondary structure have been used with success in the literature to recognize intergenic long non-coding RNAs, a particular subclass of non-coding RNAs. Results: In this paper we perform a systematic assessment of a wide collection of features extracted from sequence data. We use most of the features proposed in the literature, and we include, as a novel set of features, the occurrence of repeats contained in transposable elements. The aim is to detect signatures (groups of features) able to distinguish long non-coding transcripts from other classes, both protein-coding and non-coding. We evaluate different feature selection algorithms, test for signature stability, and evaluate the prediction ability of a signature with a machine learning algorithm. The study reveals different signatures in human, mouse, and zebrafish, highlighting that some features are shared among species, while others tend to be species-specific. Compared with coding potential tools and similar supervised approaches, including novel signatures such as those identified here in a machine learning algorithm improves prediction performance, measured as area under the precision-recall curve, by 1 to 24%, depending on the species and on the signature. Conclusions: Understanding which features are best suited for the prediction of long non-coding RNAs allows for the development of more effective automatic annotation pipelines, especially relevant for poorly annotated genomes such as zebrafish. We provide a web tool that recognizes novel long non-coding RNAs with the obtained signatures from fasta and gtf formats. The tool is available at the following URL: http://www.bioinformatics-sannio.org/software/.
Affiliation(s)
- Giovanna M M Ventola
- Department of Science and Technology, University of Sannio, via Port'Arsa, 11, Benevento, 82100, Italy; BioGeM, Institute of Genetic Research "Gaetano Salvatore", c.da Camporeale, Ariano Irpino (AV), 83031, Italy
| | - Teresa M R Noviello
- Department of Science and Technology, University of Sannio, via Port'Arsa, 11, Benevento, 82100, Italy; BioGeM, Institute of Genetic Research "Gaetano Salvatore", c.da Camporeale, Ariano Irpino (AV), 83031, Italy
| | - Salvatore D'Aniello
- Biology and Evolution of Marine Organisms, Stazione Zoologica Anton Dohrn, Villa Comunale, Napoli, 80121, Italy
| | - Antonietta Spagnuolo
- Biology and Evolution of Marine Organisms, Stazione Zoologica Anton Dohrn, Villa Comunale, Napoli, 80121, Italy
| | - Michele Ceccarelli
- Department of Science and Technology, University of Sannio, via Port'Arsa, 11, Benevento, 82100, Italy
| | - Luigi Cerulo
- Department of Science and Technology, University of Sannio, via Port'Arsa, 11, Benevento, 82100, Italy; BioGeM, Institute of Genetic Research "Gaetano Salvatore", c.da Camporeale, Ariano Irpino (AV), 83031, Italy
| |
|
50
|
Vieira LM, Grativol C, Thiebaut F, Carvalho TG, Hardoim PR, Hemerly A, Lifschitz S, Ferreira PCG, Walter MEMT. PlantRNA_Sniffer: A SVM-Based Workflow to Predict Long Intergenic Non-Coding RNAs in Plants. Noncoding RNA 2017; 3:ncrna3010011. [PMID: 29657283 PMCID: PMC5831995 DOI: 10.3390/ncrna3010011] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.5] [Received: 12/29/2016] [Revised: 02/19/2017] [Accepted: 02/24/2017] [Indexed: 12/17/2022] Open
Abstract
Non-coding RNAs (ncRNAs) constitute an important set of transcripts produced in the cells of organisms. Among them is a large class of long ncRNAs that are difficult to predict, the so-called long intergenic ncRNAs (lincRNAs), which might play essential roles in gene regulation and other cellular processes. Despite the importance of these lincRNAs, there is still a lack of biological knowledge and, currently, the few computational methods available are so specific that they cannot be successfully applied to species other than those for which they were originally designed. Prediction of lncRNAs has been performed with machine learning techniques. In particular, for lincRNA prediction, supervised learning methods have been explored in recent literature. As far as we know, there are no methods or workflows specially designed to predict lincRNAs in plants. In this context, this work proposes a workflow to predict lincRNAs in plants that combines known bioinformatics tools with machine learning techniques, here a support vector machine (SVM). We discuss two case studies that allowed us to identify novel lincRNAs in sugarcane (Saccharum spp.) and in maize (Zea mays). From the results, we could also identify differentially expressed lincRNAs in sugarcane and maize plants exposed to pathogenic and beneficial microorganisms.
Affiliation(s)
- Lucas Maciel Vieira
- Departamento de Ciência da Computação, Universidade de Brasília, Brasília-DF 70910-900, Brazil.
| | - Clicia Grativol
- Laboratório de Química e Função de Proteínas e Peptídeos, Universidade Estadual do Norte Fluminense, Campos dos Goytacazes-RJ 28013-602, Brazil.
| | - Flavia Thiebaut
- Instituto de Bioquímica Médica Leopoldo de Meis, Universidade Federal do Rio de Janeiro, Rio de Janeiro-RJ 21941-901, Brazil.
| | - Thais G Carvalho
- Instituto de Bioquímica Médica Leopoldo de Meis, Universidade Federal do Rio de Janeiro, Rio de Janeiro-RJ 21941-901, Brazil.
| | - Pablo R Hardoim
- Instituto de Bioquímica Médica Leopoldo de Meis, Universidade Federal do Rio de Janeiro, Rio de Janeiro-RJ 21941-901, Brazil.
| | - Adriana Hemerly
- Instituto de Bioquímica Médica Leopoldo de Meis, Universidade Federal do Rio de Janeiro, Rio de Janeiro-RJ 21941-901, Brazil.
| | - Sergio Lifschitz
- Departamento de Informática, Pontifícia Universidade Católica do Rio de Janeiro, Rio de Janeiro-RJ 22451-900, Brazil.
| | - Paulo Cavalcanti Gomes Ferreira
- Instituto de Bioquímica Médica Leopoldo de Meis, Universidade Federal do Rio de Janeiro, Rio de Janeiro-RJ 21941-901, Brazil.
| | - Maria Emilia M T Walter
- Departamento de Ciência da Computação, Universidade de Brasília, Brasília-DF 70910-900, Brazil.
| |
|