1
|
Tian XC, Nie S, Domingues D, Rossi Paschoal A, Jiang LB, Mao JF. PlantLncBoost: key features for plant lncRNA identification and significant improvement in accuracy and generalization. The New Phytologist 2025. [PMID: 40432231 DOI: 10.1111/nph.70211] [Received: 03/26/2025] [Accepted: 04/15/2025] [Indexed: 05/29/2025]
Abstract
Long noncoding RNAs (lncRNAs) are critical regulators of numerous biological processes in plants. Nevertheless, their identification is challenging due to the low sequence conservation across various species. Existing computational methods for lncRNA identification often face difficulties in generalizing across diverse plant species, highlighting the need for more robust and versatile identification models. Here, we present PlantLncBoost, a novel computational tool designed to improve the generalization in plant lncRNA identification. By integrating advanced gradient boosting algorithms with comprehensive feature selection, our approach achieves both high accuracy and generalizability. We conducted an extensive analysis of 1662 features and identified three key features - ORF coverage, complex Fourier average, and atomic Fourier amplitude - that effectively distinguish lncRNAs from mRNAs. We assessed the performance of PlantLncBoost using comprehensive datasets from 20 plant species. The model exhibited exceptional performance, with an accuracy of 96.63%, a sensitivity of 98.42%, and a specificity of 94.93%, significantly outperforming existing tools. Further analysis revealed that the features we selected effectively capture the differences between lncRNAs and mRNAs across a variety of plant species. PlantLncBoost represents a significant advancement in plant lncRNA identification. It is freely accessible on GitHub (https://github.com/xuechantian/PlantLncBoost) and has been integrated into a comprehensive analysis pipeline, Plant-LncRNA-pipeline v.2 (https://github.com/xuechantian/Plant-LncRNA-pipeline-v2).
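To make the first of the three features more concrete, the sketch below computes an ORF coverage value for a transcript. The abstract does not define the feature, so this assumes a common convention: the length of the longest forward-strand ORF (ATG through the first in-frame stop codon) divided by the transcript length. The function names are hypothetical and this is not the PlantLncBoost implementation.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def longest_orf_length(seq: str) -> int:
    """Length in nucleotides of the longest ATG...stop ORF in the three forward frames.

    Assumption: an ORF must end with an in-frame stop codon; open-ended ORFs are ignored.
    """
    seq = seq.upper().replace("U", "T")  # accept RNA or DNA alphabets
    best = 0
    for frame in range(3):
        start = None  # index of the first ATG of the ORF currently being read
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i
            elif start is not None and codon in STOP_CODONS:
                best = max(best, i + 3 - start)  # include the stop codon
                start = None
    return best


def orf_coverage(seq: str) -> float:
    """Fraction of the transcript covered by its longest ORF (0.0 for an empty sequence)."""
    return longest_orf_length(seq) / len(seq) if seq else 0.0


if __name__ == "__main__":
    transcript = "CCATGAAATAGCC"  # toy sequence, not real data
    print(longest_orf_length(transcript), round(orf_coverage(transcript), 3))
```

Under this convention, mRNAs would be expected to show high coverage and lncRNAs low coverage, which is the kind of separation the abstract describes.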
Affiliation(s)
- Xue-Chan Tian
- School of Life Sciences and Medicine, Shandong University of Technology, Zibo, Shandong, 255000, China
- State Key Laboratory of Tree Genetics and Breeding, National Engineering Research Center of Tree Breeding and Ecological Restoration, National Engineering Laboratory for Tree Breeding, Key Laboratory of Genetics and Breeding in Forest Trees and Ornamental Plants, Ministry of Education, College of Biological Sciences and Technology, Beijing Forestry University, Beijing, 100083, China
| | - Shuai Nie
- State Key Laboratory of Tree Genetics and Breeding, National Engineering Research Center of Tree Breeding and Ecological Restoration, National Engineering Laboratory for Tree Breeding, Key Laboratory of Genetics and Breeding in Forest Trees and Ornamental Plants, Ministry of Education, College of Biological Sciences and Technology, Beijing Forestry University, Beijing, 100083, China
- Rice Research Institute, Guangdong Academy of Agricultural Sciences, Guangdong Key Laboratory of Rice Science and Technology, Guangdong Rice Engineering Laboratory, Key Laboratory of Genetics and Breeding of High Quality Rice in Southern China (Co-construction by Ministry and Province), Ministry of Agriculture and Rural Affairs, Guangzhou, 510640, China
| | - Douglas Domingues
- Department of Genetics, "Luiz de Queiroz" College of Agriculture, University of São Paulo, 13418-900, Piracicaba, Sao Paulo, Brazil
| | - Alexandre Rossi Paschoal
- Bioinformatics and Pattern Recognition Group (BIOINFO-CP), Department of Computer Science, Federal University of Technology - Paraná, UTFPR, Campus Cornélio Procópio, Cornélio Procópio, 86300-000, Brazil
- The Rosalind Franklin Institute, OX110QX, Didcot, UK
| | - Li-Bo Jiang
- School of Life Sciences and Medicine, Shandong University of Technology, Zibo, Shandong, 255000, China
| | - Jian-Feng Mao
- State Key Laboratory of Tree Genetics and Breeding, National Engineering Research Center of Tree Breeding and Ecological Restoration, National Engineering Laboratory for Tree Breeding, Key Laboratory of Genetics and Breeding in Forest Trees and Ornamental Plants, Ministry of Education, College of Biological Sciences and Technology, Beijing Forestry University, Beijing, 100083, China
- Department of Plant Physiology, Umeå Plant Science Centre (UPSC), Umeå University, Umeå, 90187, Sweden
| |
|
2
|
Bai H, Wang J, Jiang X, Guo Z, Yang W, Yang Z, Li J, Liu C. TetraRNA, a tetra-class machine learning model for deciphering the coding potential derivation of RNA world. Comput Struct Biotechnol J 2025; 27:1305-1317. [PMID: 40230410 PMCID: PMC11994946 DOI: 10.1016/j.csbj.2025.03.039] [Received: 11/25/2024] [Revised: 03/20/2025] [Accepted: 03/24/2025] [Indexed: 04/16/2025] Open
Abstract
CncRNAs (coding and noncoding RNAs) are a class of bifunctional RNAs that have both coding and noncoding biological activity. An increasing number of cncRNAs are being identified, prompting reassessment of our knowledge of RNA. However, most existing RNA classification tools are based on binary classification models, which are not effective in distinguishing cncRNAs from mRNAs or long noncoding RNAs (lncRNAs). Our statistical analysis demonstrated that mRNA-derived cncRNAs (untranslated mRNAs, untr-mRNAs) and lncRNA-derived cncRNAs (translated ncRNAs, tr-ncRNAs) do not fall in the same cluster. Therefore, in this study, we devised a novel tetra-class RNA classification model that is systematically optimized for RNA feature extraction. According to our model, all human RNAs can be reclassified into one of four categories - mRNA, untr-mRNA, lncRNA, and tr-ncRNA - representing a novel RNA classification system and allowing the discovery of more potential cncRNAs. Further analysis revealed significant differences among the four types of RNAs in tissue-specific expression, functional annotation, sequence composition, and other factors, providing insights into their divergent evolutionary trajectories. Moreover, investigation of the small tr-ncRNA peptides demonstrated that their evolution is coordinated with that of the conserved functional small RNAs associated with them. All analysis results have been integrated into a database, TetraRNADB, accessible online (http://tetrarnadb.liu-lab.com/).
Affiliation(s)
- Hanrui Bai
- College of Life Sciences, Division of Life Sciences and Medicine, University of Science and Technology of China, Hefei 230026, China
- CAS Key Laboratory of Tropical Plant Resources and Sustainable Use, Yunnan Key Laboratory of Crop Wild Relatives Omics, Xishuangbanna Tropical Botanical Garden, Chinese Academy of Sciences, Kunming 650223, China
| | - Jie Wang
- Department of Chromosome Biology, Max Planck Institute for Plant Breeding Research, Carl-von-Linne-Weg 10, Cologne 50829, Germany
| | - Xiaoke Jiang
- CAS Key Laboratory of Tropical Plant Resources and Sustainable Use, Yunnan Key Laboratory of Crop Wild Relatives Omics, Xishuangbanna Tropical Botanical Garden, Chinese Academy of Sciences, Kunming 650223, China
| | - Zhen Guo
- College of Science and Engineering, Saint Louis University, St. Louis, MO 63103, USA
| | - Wenjing Yang
- CAS Key Laboratory of Tropical Plant Resources and Sustainable Use, Yunnan Key Laboratory of Crop Wild Relatives Omics, Xishuangbanna Tropical Botanical Garden, Chinese Academy of Sciences, Kunming 650223, China
| | - Zitian Yang
- CAS Key Laboratory of Tropical Plant Resources and Sustainable Use, Yunnan Key Laboratory of Crop Wild Relatives Omics, Xishuangbanna Tropical Botanical Garden, Chinese Academy of Sciences, Kunming 650223, China
| | - Jing Li
- CAS Key Laboratory of Tropical Plant Resources and Sustainable Use, Yunnan Key Laboratory of Crop Wild Relatives Omics, Xishuangbanna Tropical Botanical Garden, Chinese Academy of Sciences, Kunming 650223, China
| | - Changning Liu
- College of Life Sciences, Division of Life Sciences and Medicine, University of Science and Technology of China, Hefei 230026, China
- CAS Key Laboratory of Tropical Plant Resources and Sustainable Use, Yunnan Key Laboratory of Crop Wild Relatives Omics, Xishuangbanna Tropical Botanical Garden, Chinese Academy of Sciences, Kunming 650223, China
| |
|
3
|
Gao Y, Takenaka K, Xu SM, Cheng Y, Janitz M. Recent advances in investigation of circRNA/lncRNA-miRNA-mRNA networks through RNA sequencing data analysis. Brief Funct Genomics 2025; 24:elaf005. [PMID: 40251826 PMCID: PMC12008121 DOI: 10.1093/bfgp/elaf005] [Received: 12/07/2024] [Revised: 03/10/2025] [Accepted: 03/18/2025] [Indexed: 04/21/2025] Open
Abstract
Non-coding RNAs (ncRNAs) are RNA molecules that are transcribed from DNA but are not translated into proteins. Studies over the past decades have revealed that ncRNAs can be classified into small RNAs, long non-coding RNAs and circular RNAs by genomic size and structure. Accumulated evidence has elucidated the critical roles of these non-coding transcripts in regulating gene expression through transcription and translation, thereby shaping cellular function and disease pathogenesis. Notably, recent studies have investigated the function of ncRNAs as competitive endogenous RNAs (ceRNAs) that sequester miRNAs and modulate mRNA expression. The ceRNA network has emerged as a pivotal regulatory mechanism, with significant implications in various diseases such as cancer and neurodegenerative disease. Therefore, we highlight multiple bioinformatics tools and databases that aim to predict ceRNA interactions. Furthermore, we discuss the limitations of current technologies and potential improvements for ceRNA network detection. Understanding the dynamic interplay within ceRNA networks may advance biological comprehension and provide potential targets for therapeutic intervention.
Affiliation(s)
- Yulan Gao
- School of Biotechnology and Biomolecular Sciences, University of New South Wales, Gate 11 via Botany St, Sydney, NSW 2052, Australia
| | - Konii Takenaka
- School of Biotechnology and Biomolecular Sciences, University of New South Wales, Gate 11 via Botany St, Sydney, NSW 2052, Australia
| | - Si-Mei Xu
- School of Biotechnology and Biomolecular Sciences, University of New South Wales, Gate 11 via Botany St, Sydney, NSW 2052, Australia
| | - Yuning Cheng
- School of Biotechnology and Biomolecular Sciences, University of New South Wales, Gate 11 via Botany St, Sydney, NSW 2052, Australia
| | - Michael Janitz
- School of Biotechnology and Biomolecular Sciences, University of New South Wales, Gate 11 via Botany St, Sydney, NSW 2052, Australia
| |
|
4
|
Thakur A, Kumar M. Computational Resources for lncRNA Functions and Targetome. Methods Mol Biol 2025; 2883:299-323. [PMID: 39702714 DOI: 10.1007/978-1-0716-4290-0_13] [Indexed: 12/21/2024]
Abstract
Long non-coding RNAs (lncRNAs) are non-coding RNA molecules that exceed 200 nucleotides in length and do not encode proteins. Dysregulated expression of lncRNAs has been identified in various diseases and holds therapeutic significance. Over the past decade, numerous computational resources have been published in the field of lncRNA. In this chapter, we provide a comprehensive review of databases and predictive tools, that is, lncRNA databases, machine learning-based algorithms, and tools that predict lncRNAs using different techniques. The chapter focuses on the importance of lncRNA resources developed for different organisms, specifically humans, mice, plants, and other model organisms. We list important databases, primarily focusing on comprehensive information related to lncRNA registries, associations with diseases, differential expression, lncRNA transcriptome, target regulation, and all-in-one resources. We also include updated versions of lncRNA resources. Additionally, computational identification of lncRNAs using algorithms such as deep learning, Support Vector Machine (SVM), and Random Forest (RF) is discussed. In conclusion, this overview summarizes vital in silico resources, empowering biologists to choose the most suitable tools for their lncRNA research. This chapter serves as a guide, emphasizing the significance of computational approaches in understanding lncRNAs and their implications in various biological contexts.
Affiliation(s)
- Anamika Thakur
- Virology Unit and Bioinformatics Centre, Institute of Microbial Technology, Council of Scientific and Industrial Research (CSIR), Sector 39A, Chandigarh, India
- Academy of Scientific and Innovative Research (AcSIR), Ghaziabad, India
| | - Manoj Kumar
- Virology Unit and Bioinformatics Centre, Institute of Microbial Technology, Council of Scientific and Industrial Research (CSIR), Sector 39A, Chandigarh, India.
- Academy of Scientific and Innovative Research (AcSIR), Ghaziabad, India.
| |
|
5
|
Bereczki Z, Benczik B, Balogh OM, Marton S, Puhl E, Pétervári M, Váczy-Földi M, Papp ZT, Makkos A, Glass K, Locquet F, Euler G, Schulz R, Ferdinandy P, Ágg B. Mitigating off-target effects of small RNAs: conventional approaches, network theory and artificial intelligence. Br J Pharmacol 2025; 182:340-379. [PMID: 39293936 DOI: 10.1111/bph.17302] [Received: 11/30/2023] [Revised: 05/07/2024] [Accepted: 06/17/2024] [Indexed: 09/20/2024] Open
Abstract
Three types of highly promising small RNA therapeutics, namely, small interfering RNAs (siRNAs), microRNAs (miRNAs) and the RNA subtype of antisense oligonucleotides (ASOs), offer advantages over small-molecule drugs. These small RNAs can target any gene product, opening up new avenues of effective and safe therapeutic approaches for a wide range of diseases. In preclinical research, synthetic small RNAs play an essential role in the investigation of physiological and pathological pathways as silencers of specific genes, facilitating discovery and validation of drug targets in different conditions. Off-target effects of small RNAs, however, could make it difficult to interpret experimental results in the preclinical phase and may contribute to adverse events of small RNA therapeutics. Of the two major types of off-target effects, we focused on the hybridization-dependent type, especially miRNA-like off-target effects. Our main aim was to discuss several approaches, including sequence design, chemical modifications and target prediction, to reduce hybridization-dependent off-target effects that should be considered even at the early development phase of small RNA therapy. Because there is no standard way of predicting hybridization-dependent off-target effects, this review provides an overview of all major state-of-the-art computational methods and proposes new approaches, such as the possible inclusion of network theory and artificial intelligence (AI) in the prediction workflows. Case studies and a concise survey of experimental methods for validating in silico predictions are also presented. These methods could help to interpret experimental results, to minimize off-target effects and, hopefully, to avoid off-target-related adverse events of small RNA therapeutics. LINKED ARTICLES: This article is part of a themed issue Non-coding RNA Therapeutics. To view the other articles in this section visit http://onlinelibrary.wiley.com/doi/10.1111/bph.v182.2/issuetoc.
Affiliation(s)
- Zoltán Bereczki
- Department of Pharmacology and Pharmacotherapy, Semmelweis University, Budapest, Hungary
- Center for Pharmacology and Drug Research & Development, Semmelweis University, Budapest, Hungary
- HUN-REN-SU System Pharmacology Research Group, Department of Pharmacology and Pharmacotherapy, Semmelweis University, Budapest, Hungary
| | - Bettina Benczik
- Department of Pharmacology and Pharmacotherapy, Semmelweis University, Budapest, Hungary
- Center for Pharmacology and Drug Research & Development, Semmelweis University, Budapest, Hungary
- HUN-REN-SU System Pharmacology Research Group, Department of Pharmacology and Pharmacotherapy, Semmelweis University, Budapest, Hungary
- Pharmahungary Group, Szeged, Hungary
| | - Olivér M Balogh
- Department of Pharmacology and Pharmacotherapy, Semmelweis University, Budapest, Hungary
- Center for Pharmacology and Drug Research & Development, Semmelweis University, Budapest, Hungary
- HUN-REN-SU System Pharmacology Research Group, Department of Pharmacology and Pharmacotherapy, Semmelweis University, Budapest, Hungary
| | - Szandra Marton
- Department of Pharmacology and Pharmacotherapy, Semmelweis University, Budapest, Hungary
- Center for Pharmacology and Drug Research & Development, Semmelweis University, Budapest, Hungary
| | - Eszter Puhl
- Department of Pharmacology and Pharmacotherapy, Semmelweis University, Budapest, Hungary
- Center for Pharmacology and Drug Research & Development, Semmelweis University, Budapest, Hungary
| | - Mátyás Pétervári
- Department of Pharmacology and Pharmacotherapy, Semmelweis University, Budapest, Hungary
- Center for Pharmacology and Drug Research & Development, Semmelweis University, Budapest, Hungary
- HUN-REN-SU System Pharmacology Research Group, Department of Pharmacology and Pharmacotherapy, Semmelweis University, Budapest, Hungary
- Sanovigado Kft, Budapest, Hungary
| | - Máté Váczy-Földi
- Department of Pharmacology and Pharmacotherapy, Semmelweis University, Budapest, Hungary
- Center for Pharmacology and Drug Research & Development, Semmelweis University, Budapest, Hungary
- HUN-REN-SU System Pharmacology Research Group, Department of Pharmacology and Pharmacotherapy, Semmelweis University, Budapest, Hungary
| | - Zsolt Tamás Papp
- Department of Pharmacology and Pharmacotherapy, Semmelweis University, Budapest, Hungary
- Center for Pharmacology and Drug Research & Development, Semmelweis University, Budapest, Hungary
- HUN-REN-SU System Pharmacology Research Group, Department of Pharmacology and Pharmacotherapy, Semmelweis University, Budapest, Hungary
| | - András Makkos
- Department of Pharmacology and Pharmacotherapy, Semmelweis University, Budapest, Hungary
- Center for Pharmacology and Drug Research & Development, Semmelweis University, Budapest, Hungary
- HUN-REN-SU System Pharmacology Research Group, Department of Pharmacology and Pharmacotherapy, Semmelweis University, Budapest, Hungary
- Pharmahungary Group, Szeged, Hungary
| | - Kimberly Glass
- Department of Biostatistics, Harvard T.H. Chan School of Public Health, Boston, Massachusetts, USA
| | - Fabian Locquet
- Physiologisches Institut, Justus-Liebig-Universität Gießen, Giessen, Germany
| | - Gerhild Euler
- Physiologisches Institut, Justus-Liebig-Universität Gießen, Giessen, Germany
| | - Rainer Schulz
- Physiologisches Institut, Justus-Liebig-Universität Gießen, Giessen, Germany
| | - Péter Ferdinandy
- Department of Pharmacology and Pharmacotherapy, Semmelweis University, Budapest, Hungary
- Center for Pharmacology and Drug Research & Development, Semmelweis University, Budapest, Hungary
- HUN-REN-SU System Pharmacology Research Group, Department of Pharmacology and Pharmacotherapy, Semmelweis University, Budapest, Hungary
- Pharmahungary Group, Szeged, Hungary
| | - Bence Ágg
- Department of Pharmacology and Pharmacotherapy, Semmelweis University, Budapest, Hungary
- Center for Pharmacology and Drug Research & Development, Semmelweis University, Budapest, Hungary
- HUN-REN-SU System Pharmacology Research Group, Department of Pharmacology and Pharmacotherapy, Semmelweis University, Budapest, Hungary
- Pharmahungary Group, Szeged, Hungary
| |
|
6
|
Tan L, Mengshan L, Yu F, Yelin L, Jihong Z, Lixin G. Predicting lncRNA-protein interactions using a hybrid deep learning model with dinucleotide-codon fusion feature encoding. BMC Genomics 2024; 25:1253. [PMID: 39732642 DOI: 10.1186/s12864-024-11168-3] [Received: 09/07/2024] [Accepted: 12/18/2024] [Indexed: 12/30/2024] Open
Abstract
Long non-coding RNAs (lncRNAs) play crucial roles in numerous biological processes and are involved in complex human diseases through interactions with proteins. Accurate identification of lncRNA-protein interactions (LPI) can help elucidate the functional mechanisms of lncRNAs and provide scientific insights into the molecular mechanisms underlying related diseases. While many sequence-based methods have been developed to predict LPIs, efficiently extracting and effectively integrating potential feature information that reflects functional attributes from lncRNA and protein sequences remains a significant challenge. This paper proposes a Dinucleotide-Codon Fusion Feature encoding (DNCFF) and constructs an LPI prediction model based on deep learning, termed LPI-DNCFF. The Dual Nucleotide Visual Fusion Feature encoding (DNVFF) incorporates positional information of single nucleotides with subsequent nucleotide connections, while Codon Fusion Feature encoding (CFF) considers the specificity, molecular weight, and physicochemical properties of each amino acid. These encoding methods encapsulate rich and intuitive sequence information in limited encoding dimensions. The model comprehensively predicts LPIs by integrating global, local, and structural features, and inputs them into BiLSTM and attention layers to form a hybrid deep learning model. Experimental results demonstrate that LPI-DNCFF effectively predicts LPIs. The BiLSTM layer and attention mechanism can learn long-term dependencies and identify weighted key features, enhancing model performance. Compared to one-hot encoding, DNCFF more efficiently and thoroughly extracts potential sequence features. Compared to other existing methods, LPI-DNCFF achieved the best performance on the RPI1847 and ATH948 datasets, with MCC values of approximately 97.84% and 84.58%, respectively, outperforming the state-of-the-art method by about 1.44% and 3.48%.
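The results above are reported as MCC (Matthews correlation coefficient) values. As a reminder of how that metric is obtained from a binary confusion matrix, here is a minimal sketch. It uses the standard definition, which ranges from -1 to 1; the abstract quotes percentages, which presumably correspond to that value multiplied by 100 (an assumption, since the abstract does not say). The function name and the example counts are illustrative only, not data from the paper.

```python
import math


def matthews_corrcoef(tp: int, tn: int, fp: int, fn: int) -> float:
    """Matthews correlation coefficient from binary confusion-matrix counts.

    Returns 0.0 when any marginal total is zero, a common convention for the
    otherwise undefined 0/0 case.
    """
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        return 0.0
    return (tp * tn - fp * fn) / denom


if __name__ == "__main__":
    # Hypothetical counts for an interaction / non-interaction classifier.
    print(matthews_corrcoef(tp=90, tn=85, fp=15, fn=10))
```

Unlike accuracy, MCC uses all four cells of the confusion matrix, so it remains informative when interacting and non-interacting pairs are imbalanced.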
Affiliation(s)
- Li Tan
- College of Physics and Electronic Information, Gannan Normal University, Ganzhou, 341000, Jiangxi, China
| | - Li Mengshan
- College of Physics and Electronic Information, Gannan Normal University, Ganzhou, 341000, Jiangxi, China.
- Ganzhou Power Supply Branch of State Grid Jiangxi Electric Power Co., Ltd, Ganzhou, 341000, Jiangxi, China.
| | - Fu Yu
- College of Physics and Electronic Information, Gannan Normal University, Ganzhou, 341000, Jiangxi, China
- Ganzhou Power Supply Branch of State Grid Jiangxi Electric Power Co., Ltd, Ganzhou, 341000, Jiangxi, China
| | - Li Yelin
- College of Physics and Electronic Information, Gannan Normal University, Ganzhou, 341000, Jiangxi, China
| | - Zhu Jihong
- College of Physics and Electronic Information, Gannan Normal University, Ganzhou, 341000, Jiangxi, China
| | - Guan Lixin
- College of Physics and Electronic Information, Gannan Normal University, Ganzhou, 341000, Jiangxi, China
| |
|
7
|
Poloni JF, Oliveira FHS, Feltes BC. Localization is the key to action: regulatory peculiarities of lncRNAs. Front Genet 2024; 15:1478352. [PMID: 39737005 PMCID: PMC11683014 DOI: 10.3389/fgene.2024.1478352] [Received: 08/09/2024] [Accepted: 11/27/2024] [Indexed: 01/01/2025] Open
Abstract
To understand the transcriptomic profile of an individual cell in a multicellular organism, we must comprehend its surrounding environment and the cellular space where distinct molecular stimuli responses are located. Contradicting the initial perception that RNAs were nonfunctional and that only a few could act in chromatin remodeling, research over the last few decades has revealed that they are multifaceted, versatile regulators of most cellular processes. Among the various RNAs, long non-coding RNAs (lncRNAs) regulate multiple biological processes and can even impact cell fate. In this sense, the subcellular localization of lncRNAs is the primary determinant of their functions. It affects their behavior by limiting their potential molecular partners and the processes they can affect. The fine-tuned activity of lncRNAs is also tissue-specific and modulated by their cis and trans regulation. Hence, the spatial context of lncRNAs is crucial for understanding the regulatory networks that they influence and by which they are influenced. Therefore, predicting a lncRNA's correct location is not just a technical challenge but a critical step in understanding the biological meaning of its activity, and examining these peculiarities is crucial to researching and discussing lncRNAs. In this review, we discuss the spatial regulation of lncRNAs and their tissue-specific roles and regulatory mechanisms. We also briefly highlight how bioinformatic tools can aid research in the area.
Affiliation(s)
| | | | - Bruno César Feltes
- Department of Biophysics, Laboratory of DNA Repair and Aging, Institute of Biosciences, Federal University of Rio Grande do Sul, Porto Alegre, Rio Grande do Sul, Brazil
| |
|
8
|
Liu Y, Gao Y, Niu R, Zhang Z, Lu GW, Hu H, Liu T, Cheng Z. Rapid and accurate bacteria identification through deep-learning-based two-dimensional Raman spectroscopy. Anal Chim Acta 2024; 1332:343376. [PMID: 39580159 DOI: 10.1016/j.aca.2024.343376] [Received: 06/15/2024] [Revised: 10/22/2024] [Accepted: 10/27/2024] [Indexed: 11/25/2024]
Abstract
Surface-enhanced Raman spectroscopy (SERS) offers a distinctive vibrational fingerprint of molecules and has led to widespread applications in medical diagnosis, biochemistry, and virology. With the rapid development of artificial intelligence (AI) technology, AI-enabled Raman spectroscopic techniques, as a promising avenue for biosensing applications, have significantly boosted bacteria identification. By converting spectra into images, the dataset is enriched with more detailed information, allowing AI to identify bacterial isolates with enhanced precision. However, previous studies usually suffer from a trade-off between high-resolution spectrograms for high-accuracy identification and short training time for data processing. Here, we present an efficient bacteria identification strategy that combines deep learning models with a spectrogram encoding algorithm based on wavelet packet transform and Gramian angular field techniques. In contrast to the direct analysis of raw Raman spectra, our approach utilizes wavelet packet transform techniques to compress the spectra by a factor of 1/15, while concurrently maintaining state-of-the-art accuracy by amplifying the subtle differences via Gramian angular field techniques. The results demonstrate that our approach can achieve identification accuracies of 99.64% and 90.55% for two types and thirty types of bacterial isolates, respectively, with a 90% reduction in training time compared to conventional methods. To verify the model's stability, Gaussian noise was superimposed on the testing dataset, showing a degree of generalization ability and superior performance. This algorithm has the potential for integration into on-site testing protocols and is readily updatable with new bacterial isolates. This study provides profound insights and contributes to the current understanding of spectroscopy, paving the way for accurate and rapid bacteria identification in diverse applications of environmental monitoring, food safety, microbiology, and public health.
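The spectrum-to-image step can be illustrated with a Gramian angular field. The sketch below is a generic, dependency-free version of the summation variant (GASF): rescale the 1-D signal to [-1, 1], map each value to an angle with arccos, and fill a matrix with the cosine of pairwise angle sums. Whether the authors use the summation or the difference variant is not stated in the abstract, and the wavelet packet compression step is omitted, so this is an assumption-laden illustration rather than their pipeline.

```python
import math
from typing import List, Sequence


def gramian_angular_summation_field(signal: Sequence[float]) -> List[List[float]]:
    """Encode a non-empty 1-D signal as a GASF matrix with entries cos(phi_i + phi_j)."""
    lo, hi = min(signal), max(signal)
    span = hi - lo
    # Min-max rescale to [-1, 1]; a constant signal maps to all zeros.
    scaled = [0.0 if span == 0 else 2 * (v - lo) / span - 1 for v in signal]
    # Clamp before acos to guard against floating-point overshoot.
    phi = [math.acos(max(-1.0, min(1.0, s))) for s in scaled]
    return [[math.cos(a + b) for b in phi] for a in phi]


if __name__ == "__main__":
    toy_spectrum = [0.0, 1.0, 2.0]  # stand-in intensities, not real Raman data
    for row in gramian_angular_summation_field(toy_spectrum):
        print([round(v, 3) for v in row])
```

An n-point signal yields an n x n image, which is why compressing the spectrum first keeps the images, and hence the training time, manageable.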
Affiliation(s)
- Yichen Liu
- School of Precision Instrument and Opto-electronics Engineering, Tianjin University, Tianjin 300072, China; Key Laboratory of Opto-electronic Information Technology, Ministry of Education, Tianjin 300072, China
| | - Yisheng Gao
- School of Precision Instrument and Opto-electronics Engineering, Tianjin University, Tianjin 300072, China; Key Laboratory of Opto-electronic Information Technology, Ministry of Education, Tianjin 300072, China.
| | - Rui Niu
- School of Precision Instrument and Opto-electronics Engineering, Tianjin University, Tianjin 300072, China; Key Laboratory of Opto-electronic Information Technology, Ministry of Education, Tianjin 300072, China
| | - Zunyue Zhang
- School of Precision Instrument and Opto-electronics Engineering, Tianjin University, Tianjin 300072, China; Key Laboratory of Opto-electronic Information Technology, Ministry of Education, Tianjin 300072, China
| | - Guo-Wei Lu
- Institute of Material Chemistry and Engineering, Kyushu University, Fukuoka 816-8580, Japan
| | - Haofeng Hu
- School of Precision Instrument and Opto-electronics Engineering, Tianjin University, Tianjin 300072, China; Key Laboratory of Opto-electronic Information Technology, Ministry of Education, Tianjin 300072, China; School of Marine Science and Technology, Tianjin University, Tianjin 300072, China.
| | - Tiegen Liu
- School of Precision Instrument and Opto-electronics Engineering, Tianjin University, Tianjin 300072, China; Key Laboratory of Opto-electronic Information Technology, Ministry of Education, Tianjin 300072, China
| | - Zhenzhou Cheng
- School of Precision Instrument and Opto-electronics Engineering, Tianjin University, Tianjin 300072, China; Key Laboratory of Opto-electronic Information Technology, Ministry of Education, Tianjin 300072, China; Georgia Tech-Shenzhen Institute, Tianjin University, Shenzhen 518055, China; Department of Chemistry, The University of Tokyo, Tokyo 113-0033, Japan; School of Physics and Electronic Engineering, Xinjiang Normal University, Urumqi 830054, China.
| |
|
9
|
Wang S, Qi X, Liu D, Xie D, Jiang B, Wang J, Wang X, Wu G. The implications for urological malignancies of non-coding RNAs in the the tumor microenvironment. Comput Struct Biotechnol J 2024; 23:491-505. [PMID: 38249783 PMCID: PMC10796827 DOI: 10.1016/j.csbj.2023.12.016] [Received: 10/03/2023] [Revised: 12/08/2023] [Accepted: 12/16/2023] [Indexed: 01/23/2024] Open
Abstract
Urological malignancies are a major global health issue because of their complexity and the wide range of ways they affect patients. There's a growing need for in-depth research into these cancers, especially at the molecular level. Recent studies have highlighted the importance of non-coding RNAs (ncRNAs) – these don't code for proteins but are crucial in controlling genes – and the tumor microenvironment (TME), which is no longer seen as just a background factor but as an active player in cancer progression. Understanding how ncRNAs and the TME interact is key for finding new ways to diagnose and predict outcomes in urological cancers, and for developing new treatments. This article reviews the basic features of ncRNAs and goes into detail about their various roles in the TME, focusing specifically on how different ncRNAs function and act in urological malignancies.
Affiliation(s)
- Shijin Wang
- Department of Urology, The First Affiliated Hospital of Dalian Medical University, Dalian 116011, Liaoning, China
| | - Xiaochen Qi
- Department of Urology, The First Affiliated Hospital of Dalian Medical University, Dalian 116011, Liaoning, China
| | - Dequan Liu
- Department of Urology, The First Affiliated Hospital of Dalian Medical University, Dalian 116011, Liaoning, China
| | - Deqian Xie
- Department of Urology, The First Affiliated Hospital of Dalian Medical University, Dalian 116011, Liaoning, China
| | - Bowen Jiang
- Department of Urology, The First Affiliated Hospital of Dalian Medical University, Dalian 116011, Liaoning, China
| | - Jin Wang
- Department of Urology, The First Affiliated Hospital of Dalian Medical University, Dalian 116011, Liaoning, China
| | - Xiaoxi Wang
- Department of Clinical Laboratory Medicine, The First Affiliated Hospital of Dalian Medical University, Dalian 116011, Liaoning, China
| | - Guangzhen Wu
- Department of Urology, The First Affiliated Hospital of Dalian Medical University, Dalian 116011, Liaoning, China
|
10
|
Bravo S, Zarate P, Cari I, Clavijo L, Lopez I, Phillips NM, Vidal R. Comparative Tissue Identification and Characterization of Long Non-Coding RNAs in the Globally Distributed Blue Shark Prionace glauca. Life (Basel) 2024; 14:1144. [PMID: 39337927 PMCID: PMC11433378 DOI: 10.3390/life14091144] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/30/2024] [Revised: 08/24/2024] [Accepted: 08/27/2024] [Indexed: 09/30/2024] Open
Abstract
Long non-coding RNAs (lncRNAs) are involved in numerous biological processes and serve crucial regulatory functions in both animals and plants. Nevertheless, there is limited understanding of lncRNAs and their patterns of expression and roles in sharks. In the current study, we systematically identified and characterized lncRNAs in the blue shark (Prionace glauca) from four tissues (liver, spleen, muscle, and kidney) using high-throughput sequencing and bioinformatics tools. A total of 21,932 high-confidence lncRNAs were identified, with 8984 and 3067 stably and tissue-specific expressed lncRNAs, respectively. In addition, a total of 45,007 differentially expressed (DE) lncRNAs were obtained among tissues, with kidney versus muscle having the largest numbers across tissues. DE lncRNAs trans target protein-coding genes were predicted, and functional gene ontology enrichment of these genes showed GO terms such as muscle system processes, cellular/metabolic processes, and stress and immune responses, all of which correspond with the specific biological functions of each tissue analyzed. These results advance our knowledge of lncRNAs in sharks and present novel data on tissue-specific lncRNAs, providing key information to support future functional shark investigations.
Affiliation(s)
- Scarleth Bravo
- Laboratory of Genomics, Molecular Ecology and Evolutionary Studies, Department of Biology, Universidad de Santiago de Chile, Santiago 9160000, Chile
| | - Patricia Zarate
- Departamento de Oceanografía y Medio Ambiente, División de Investigación Pesquera, Instituto de Fomento Pesquero, Valparaíso 2361827, Chile
| | - Ilia Cari
- Departamento de Oceanografía y Medio Ambiente, División de Investigación Pesquera, Instituto de Fomento Pesquero, Valparaíso 2361827, Chile
| | - Ljubitza Clavijo
- Departamento de Oceanografía y Medio Ambiente, División de Investigación Pesquera, Instituto de Fomento Pesquero, Valparaíso 2361827, Chile
| | - Ignacio Lopez
- Laboratory of Genomics, Molecular Ecology and Evolutionary Studies, Department of Biology, Universidad de Santiago de Chile, Santiago 9160000, Chile
| | - Nicole M. Phillips
- School of Biological, Environmental, and Earth Sciences, University of Southern Mississippi, Hattiesburg, MS 39406, USA
| | - Rodrigo Vidal
- Laboratory of Genomics, Molecular Ecology and Evolutionary Studies, Department of Biology, Universidad de Santiago de Chile, Santiago 9160000, Chile
|
11
|
Li A, Zhou H, Xiong S, Li J, Mallik S, Fei R, Liu Y, Zhou H, Wang X, Hei X, Wang L. PLEKv2: predicting lncRNAs and mRNAs based on intrinsic sequence features and the coding-net model. BMC Genomics 2024; 25:756. [PMID: 39095710 PMCID: PMC11295476 DOI: 10.1186/s12864-024-10662-y] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/02/2024] [Accepted: 07/25/2024] [Indexed: 08/04/2024] Open
Abstract
BACKGROUND Long non-coding RNAs (lncRNAs) are RNA transcripts of more than 200 nucleotides that do not encode canonical proteins. Their biological structure is similar to messenger RNAs (mRNAs). To distinguish between lncRNA and mRNA transcripts quickly and accurately, we upgraded the PLEK alignment-free tool to its next version, PLEKv2, and constructed models tailored for both animals and plants. RESULTS PLEKv2 can achieve 98.7% prediction accuracy for human datasets. Compared with classical tools and deep learning-based models, this is 8.1%, 3.7%, 16.6%, 1.4%, 4.9%, and 48.9% higher than CPC2, CNCI, Wen et al.'s CNN, LncADeep, PLEK, and NcResNet, respectively. The accuracy of PLEKv2 was > 90% for cross-species prediction. PLEKv2 is more effective and robust than CPC2, CNCI, LncADeep, PLEK, and NcResNet for primate datasets (including chimpanzees, macaques, and gorillas). Moreover, PLEKv2 is not only suitable for non-human primates that are closely related to humans, but can also predict the coding ability of RNA sequences in plants such as Arabidopsis. CONCLUSIONS The experimental results illustrate that the model constructed by PLEKv2 can distinguish lncRNAs and mRNAs better than PLEK. The PLEKv2 software is freely available at https://sourceforge.net/projects/plek2/ .
Affiliation(s)
- Aimin Li
- Shaanxi Key Laboratory for Network Computing and Security Technology, School of Computer Science and Engineering, Xi'an University of Technology, Xi'an, Shaanxi, 710048, China.
| | - Haotian Zhou
- Shaanxi Key Laboratory for Network Computing and Security Technology, School of Computer Science and Engineering, Xi'an University of Technology, Xi'an, Shaanxi, 710048, China
| | - Siqi Xiong
- Department of Information Engineering, College of Technology, Hubei Engineering University, Xiaogan, Hubei, 432000, China.
| | - Junhuai Li
- Shaanxi Key Laboratory for Network Computing and Security Technology, School of Computer Science and Engineering, Xi'an University of Technology, Xi'an, Shaanxi, 710048, China
| | - Saurav Mallik
- Center for Precision Health, School of Biomedical Informatics, The University of Texas Health Science Center at Houston, Houston, TX, 77030, USA
| | - Rong Fei
- Shaanxi Key Laboratory for Network Computing and Security Technology, School of Computer Science and Engineering, Xi'an University of Technology, Xi'an, Shaanxi, 710048, China
| | - Yajun Liu
- Shaanxi Key Laboratory for Network Computing and Security Technology, School of Computer Science and Engineering, Xi'an University of Technology, Xi'an, Shaanxi, 710048, China
| | - Hongfang Zhou
- Shaanxi Key Laboratory for Network Computing and Security Technology, School of Computer Science and Engineering, Xi'an University of Technology, Xi'an, Shaanxi, 710048, China
| | - Xiaofan Wang
- Shaanxi Key Laboratory for Network Computing and Security Technology, School of Computer Science and Engineering, Xi'an University of Technology, Xi'an, Shaanxi, 710048, China
| | - Xinhong Hei
- Shaanxi Key Laboratory for Network Computing and Security Technology, School of Computer Science and Engineering, Xi'an University of Technology, Xi'an, Shaanxi, 710048, China
| | - Lei Wang
- Shaanxi Key Laboratory for Network Computing and Security Technology, School of Computer Science and Engineering, Xi'an University of Technology, Xi'an, Shaanxi, 710048, China
|
12
|
Rajabi D, Khanmohammadi S, Rezaei N. The role of long noncoding RNAs in amyotrophic lateral sclerosis. Rev Neurosci 2024; 35:533-547. [PMID: 38452377 DOI: 10.1515/revneuro-2023-0155] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/14/2023] [Accepted: 02/18/2024] [Indexed: 03/09/2024]
Abstract
Amyotrophic lateral sclerosis (ALS) is a progressive neurodegenerative disease with a poor prognosis leading to death. The diagnosis and treatment of ALS are inherently challenging due to its complex pathomechanism. Long noncoding RNAs (lncRNAs) are transcripts longer than 200 nucleotides involved in different cellular processes, notably gene expression. In recent years, more studies have been conducted on lncRNA classes and interference in different disease pathologies, showing their promising contribution to diagnosing and treating neurodegenerative diseases. In this review, we discuss the role of lncRNAs like NEAT1 and C9orf72-as in ALS pathogenesis mechanisms caused by mutations in different genes, including TAR DNA-binding protein-43 (TDP-43), fused in sarcoma (FUS), and superoxide dismutase type 1 (SOD1). NEAT1 is a well-established lncRNA in ALS pathogenesis; hence, we elaborate on its involvement in forming paraspeckles, stress response, inflammatory response, and apoptosis. Furthermore, antisense lncRNAs (as-lncRNAs), a key group of transcripts from the opposite strand of genes, including ZEB1-AS1 and ATXN2-AS, are discussed as newly identified components in the pathology of ALS. Ultimately, we review the current standing of using lncRNAs as biomarkers and therapeutic agents and the future vision of further studies on lncRNA applications.
Affiliation(s)
- Darya Rajabi
- School of Medicine, Tehran University of Medical Sciences, Felestin St., Keshavarz Blvd., Tehran, 1416634793, Iran
| | - Shaghayegh Khanmohammadi
- School of Medicine, Tehran University of Medical Sciences, Felestin St., Keshavarz Blvd., Tehran, 1416634793, Iran
- Research Center for Immunodeficiencies, Children's Medical Center, No 63, Gharib Ave, Keshavarz Blv, Tehran, 1419733151, Iran
- Network of Immunity in Infection, Malignancy and Autoimmunity (NIIMA), Universal Scientific Education and Research Network (USERN), Children's Medical Center, No 63, Gharib Ave, Keshavarz Blv, Tehran, 1419733151, Iran
| | - Nima Rezaei
- Research Center for Immunodeficiencies, Children's Medical Center, No 63, Gharib Ave, Keshavarz Blv, Tehran, 1419733151, Iran
- Network of Immunity in Infection, Malignancy and Autoimmunity (NIIMA), Universal Scientific Education and Research Network (USERN), Children's Medical Center, No 63, Gharib Ave, Keshavarz Blv, Tehran, 1419733151, Iran
- Department of Immunology, School of Medicine, Tehran University of Medical Sciences, Felestin St., Keshavarz Blvd., Tehran, 1416634793, Iran
|
13
|
Diao B, Luo J, Guo Y. A comprehensive survey on deep learning-based identification and predicting the interaction mechanism of long non-coding RNAs. Brief Funct Genomics 2024; 23:314-324. [PMID: 38576205 DOI: 10.1093/bfgp/elae010] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/06/2023] [Revised: 02/25/2024] [Accepted: 03/14/2024] [Indexed: 04/06/2024] Open
Abstract
Long noncoding RNAs (lncRNAs) have been discovered to be extensively involved in eukaryotic epigenetic, transcriptional, and post-transcriptional regulatory processes with the advancements in sequencing technology and genomics research. Therefore, they play crucial roles in the body's normal physiology and various disease outcomes. Presently, numerous unknown lncRNA sequencing data require exploration. Establishing deep learning-based prediction models for lncRNAs provides valuable insights for researchers, substantially reducing time and costs associated with trial and error and facilitating the disease-relevant lncRNA identification for prognosis analysis and targeted drug development as the era of artificial intelligence progresses. However, most lncRNA-related researchers lack awareness of the latest advancements in deep learning models and model selection and application in functional research on lncRNAs. Thus, we elucidate the concept of deep learning models, explore several prevalent deep learning algorithms and their data preferences, conduct a comprehensive review of recent literature studies with exemplary predictive performance over the past 5 years in conjunction with diverse prediction functions, critically analyze and discuss the merits and limitations of current deep learning models and solutions, while also proposing prospects based on cutting-edge advancements in lncRNA research.
Affiliation(s)
- Biyu Diao
- Department of Breast Surgery, The First Affiliated Hospital of Ningbo University, No. 59, Liuting Street, Haishu District, Ningbo 315000, China
| | - Jin Luo
- Department of Breast Surgery, The First Affiliated Hospital of Ningbo University, No. 59, Liuting Street, Haishu District, Ningbo 315000, China
| | - Yu Guo
- Department of Breast Surgery, The First Affiliated Hospital of Ningbo University, No. 59, Liuting Street, Haishu District, Ningbo 315000, China
|
14
|
Hwang H, Jeon H, Yeo N, Baek D. Big data and deep learning for RNA biology. Exp Mol Med 2024; 56:1293-1321. [PMID: 38871816 PMCID: PMC11263376 DOI: 10.1038/s12276-024-01243-w] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 01/31/2024] [Revised: 02/27/2024] [Accepted: 03/05/2024] [Indexed: 06/15/2024] Open
Abstract
The exponential growth of big data in RNA biology (RB) has led to the development of deep learning (DL) models that have driven crucial discoveries. As constantly evidenced by DL studies in other fields, the successful implementation of DL in RB depends heavily on the effective utilization of large-scale datasets from public databases. In achieving this goal, data encoding methods, learning algorithms, and techniques that align well with biological domain knowledge have played pivotal roles. In this review, we provide guiding principles for applying these DL concepts to various problems in RB by demonstrating successful examples and associated methodologies. We also discuss the remaining challenges in developing DL models for RB and suggest strategies to overcome these challenges. Overall, this review aims to illuminate the compelling potential of DL for RB and ways to apply this powerful technology to investigate the intriguing biology of RNA more effectively.
Affiliation(s)
- Hyeonseo Hwang
- School of Biological Sciences, Seoul National University, Seoul, Republic of Korea
| | - Hyeonseong Jeon
- Interdisciplinary Program in Bioinformatics, Seoul National University, Seoul, Republic of Korea
- Genome4me Inc., Seoul, Republic of Korea
| | - Nagyeong Yeo
- School of Biological Sciences, Seoul National University, Seoul, Republic of Korea
| | - Daehyun Baek
- School of Biological Sciences, Seoul National University, Seoul, Republic of Korea.
- Interdisciplinary Program in Bioinformatics, Seoul National University, Seoul, Republic of Korea.
- Genome4me Inc., Seoul, Republic of Korea.
|
15
|
Li X, Qu W, Yan J, Tan J. RPI-EDLCN: An Ensemble Deep Learning Framework Based on Capsule Network for ncRNA-Protein Interaction Prediction. J Chem Inf Model 2024; 64:2221-2235. [PMID: 37158609 DOI: 10.1021/acs.jcim.3c00377] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Indexed: 05/10/2023]
Abstract
Noncoding RNAs (ncRNAs) play crucial roles in many cellular life activities by interacting with proteins. Identification of ncRNA-protein interactions (ncRPIs) is key to understanding the function of ncRNAs. Although a number of computational methods for predicting ncRPIs have been developed, the problem of predicting ncRPIs remains challenging. It has always been the focus of ncRPIs research to select suitable feature extraction methods and develop a deep learning architecture with better recognition performance. In this work, we proposed an ensemble deep learning framework, RPI-EDLCN, based on a capsule network (CapsuleNet) to predict ncRPIs. In terms of feature input, we extracted the sequence features, secondary structure sequence features, motif information, and physicochemical properties of ncRNA/protein. The sequence and secondary structure sequence features of ncRNA/protein are encoded by the conjoint k-mer method and then input into an ensemble deep learning model based on CapsuleNet by combining the motif information and physicochemical properties. In this model, the encoding features are processed by convolution neural network (CNN), deep neural network (DNN), and stacked autoencoder (SAE). Then the advanced features obtained from the processing are input into the CapsuleNet for further feature learning. Compared with other state-of-the-art methods under 5-fold cross-validation, the performance of RPI-EDLCN is the best, and the accuracy of RPI-EDLCN on RPI1807, RPI2241, and NPInter v2.0 data sets was 93.8%, 88.2%, and 91.9%, respectively. The results of the independent test indicated that RPI-EDLCN can effectively predict potential ncRPIs in different organisms. In addition, RPI-EDLCN successfully predicted hub ncRNAs and proteins in Mus musculus ncRNA-protein networks. Overall, our model can be used as an effective tool to predict ncRPIs and provides some useful guidance for future biological studies.
Affiliation(s)
- Xiaoyi Li
- Department of Biomedical Engineering, Faculty of Environment and Life, Beijing University of Technology, Beijing International Science and Technology Cooperation Base for Intelligent Physiological Measurement and Clinical Transformation, Beijing 100124, China
| | - Wenyan Qu
- Department of Biomedical Engineering, Faculty of Environment and Life, Beijing University of Technology, Beijing International Science and Technology Cooperation Base for Intelligent Physiological Measurement and Clinical Transformation, Beijing 100124, China
| | - Jing Yan
- Department of Biomedical Engineering, Faculty of Environment and Life, Beijing University of Technology, Beijing International Science and Technology Cooperation Base for Intelligent Physiological Measurement and Clinical Transformation, Beijing 100124, China
| | - Jianjun Tan
- Department of Biomedical Engineering, Faculty of Environment and Life, Beijing University of Technology, Beijing International Science and Technology Cooperation Base for Intelligent Physiological Measurement and Clinical Transformation, Beijing 100124, China
|
16
|
Tian XC, Chen ZY, Nie S, Shi TL, Yan XM, Bao YT, Li ZC, Ma HY, Jia KH, Zhao W, Mao JF. Plant-LncPipe: a computational pipeline providing significant improvement in plant lncRNA identification. HORTICULTURE RESEARCH 2024; 11:uhae041. [PMID: 38638682 PMCID: PMC11024640 DOI: 10.1093/hr/uhae041] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/07/2023] [Accepted: 02/02/2024] [Indexed: 04/20/2024]
Abstract
Long non-coding RNAs (lncRNAs) play essential roles in various biological processes, such as chromatin remodeling, post-transcriptional regulation, and epigenetic modifications. Despite their critical functions in regulating plant growth, root development, and seed dormancy, the identification of plant lncRNAs remains a challenge due to the scarcity of specific and extensively tested identification methods. Most mainstream machine learning-based methods used for plant lncRNA identification were initially developed using human or other animal datasets, and their accuracy and effectiveness in predicting plant lncRNAs have not been fully evaluated or exploited. To overcome this limitation, we retrained several models, including CPAT, PLEK, and LncFinder, using plant datasets and compared their performance with mainstream lncRNA prediction tools such as CPC2, CNCI, RNAplonc, and LncADeep. Retraining these models significantly improved their performance, and two of the retrained models, LncFinder-plant and CPAT-plant, alongside their ensemble, emerged as the most suitable tools for plant lncRNA identification. This underscores the importance of model retraining in tackling the challenges associated with plant lncRNA identification. Finally, we developed a pipeline (Plant-LncPipe) that incorporates an ensemble of the two best-performing models and covers the entire data analysis process, including read mapping, transcript assembly, lncRNA identification, classification, and origin, for the efficient identification of lncRNAs in plants. The pipeline, Plant-LncPipe, is available at: https://github.com/xuechantian/Plant-LncRNA-pipline.
Affiliation(s)
- Xue-Chan Tian
- State Key Laboratory of Tree Genetics and Breeding, National Engineering Research Center of Tree Breeding and Ecological Restoration, Beijing Advanced Innovation Center for Tree Breeding by Molecular Design, National Engineering Laboratory for Tree Breeding, Key Laboratory of Genetics and Breeding in Forest Trees and Ornamental Plants, Ministry of Education, College of Biological Sciences and Technology, Beijing Forestry University, Beijing 100083, China
| | - Zhao-Yang Chen
- State Key Laboratory of Tree Genetics and Breeding, National Engineering Research Center of Tree Breeding and Ecological Restoration, Beijing Advanced Innovation Center for Tree Breeding by Molecular Design, National Engineering Laboratory for Tree Breeding, Key Laboratory of Genetics and Breeding in Forest Trees and Ornamental Plants, Ministry of Education, College of Biological Sciences and Technology, Beijing Forestry University, Beijing 100083, China
| | - Shuai Nie
- Rice Research Institute, Guangdong Academy of Agricultural Sciences & Key Laboratory of Genetics and Breeding of High Quality Rice in Southern China (Co-construction by Ministry and Province), Ministry of Agriculture and Rural Affairs & Guangdong Key Laboratory of New Technology in Rice Breeding, Guangzhou 510640, China
| | - Tian-Le Shi
- State Key Laboratory of Tree Genetics and Breeding, National Engineering Research Center of Tree Breeding and Ecological Restoration, Beijing Advanced Innovation Center for Tree Breeding by Molecular Design, National Engineering Laboratory for Tree Breeding, Key Laboratory of Genetics and Breeding in Forest Trees and Ornamental Plants, Ministry of Education, College of Biological Sciences and Technology, Beijing Forestry University, Beijing 100083, China
| | - Xue-Mei Yan
- State Key Laboratory of Tree Genetics and Breeding, National Engineering Research Center of Tree Breeding and Ecological Restoration, Beijing Advanced Innovation Center for Tree Breeding by Molecular Design, National Engineering Laboratory for Tree Breeding, Key Laboratory of Genetics and Breeding in Forest Trees and Ornamental Plants, Ministry of Education, College of Biological Sciences and Technology, Beijing Forestry University, Beijing 100083, China
| | - Yu-Tao Bao
- State Key Laboratory of Tree Genetics and Breeding, National Engineering Research Center of Tree Breeding and Ecological Restoration, Beijing Advanced Innovation Center for Tree Breeding by Molecular Design, National Engineering Laboratory for Tree Breeding, Key Laboratory of Genetics and Breeding in Forest Trees and Ornamental Plants, Ministry of Education, College of Biological Sciences and Technology, Beijing Forestry University, Beijing 100083, China
| | - Zhi-Chao Li
- State Key Laboratory of Tree Genetics and Breeding, National Engineering Research Center of Tree Breeding and Ecological Restoration, Beijing Advanced Innovation Center for Tree Breeding by Molecular Design, National Engineering Laboratory for Tree Breeding, Key Laboratory of Genetics and Breeding in Forest Trees and Ornamental Plants, Ministry of Education, College of Biological Sciences and Technology, Beijing Forestry University, Beijing 100083, China
| | - Hai-Yao Ma
- State Key Laboratory of Tree Genetics and Breeding, National Engineering Research Center of Tree Breeding and Ecological Restoration, Beijing Advanced Innovation Center for Tree Breeding by Molecular Design, National Engineering Laboratory for Tree Breeding, Key Laboratory of Genetics and Breeding in Forest Trees and Ornamental Plants, Ministry of Education, College of Biological Sciences and Technology, Beijing Forestry University, Beijing 100083, China
| | - Kai-Hua Jia
- Key Laboratory of Crop Genetic Improvement & Ecology and Physiology, Institute of Crop Germplasm Resources, Shandong Academy of Agricultural Sciences, Jinan 250100, China
| | - Wei Zhao
- Department of Plant Physiology, Umeå Plant Science Centre (UPSC), Umeå University, Umeå 90187, Sweden
| | - Jian-Feng Mao
- State Key Laboratory of Tree Genetics and Breeding, National Engineering Research Center of Tree Breeding and Ecological Restoration, Beijing Advanced Innovation Center for Tree Breeding by Molecular Design, National Engineering Laboratory for Tree Breeding, Key Laboratory of Genetics and Breeding in Forest Trees and Ornamental Plants, Ministry of Education, College of Biological Sciences and Technology, Beijing Forestry University, Beijing 100083, China
- Department of Plant Physiology, Umeå Plant Science Centre (UPSC), Umeå University, Umeå 90187, Sweden
|
17
|
Chen Z, Ain NU, Zhao Q, Zhang X. From tradition to innovation: conventional and deep learning frameworks in genome annotation. Brief Bioinform 2024; 25:bbae138. [PMID: 38581418 PMCID: PMC10998533 DOI: 10.1093/bib/bbae138] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/01/2023] [Revised: 03/08/2024] [Accepted: 03/10/2024] [Indexed: 04/08/2024] Open
Abstract
Following the milestone success of the Human Genome Project, the 'Encyclopedia of DNA Elements (ENCODE)' initiative was launched in 2003 to unearth information about the numerous functional elements within the genome. This endeavor coincided with the emergence of numerous novel technologies, accompanied by the provision of vast amounts of whole-genome sequences, high-throughput data such as ChIP-Seq and RNA-Seq. Extracting biologically meaningful information from this massive dataset has become a critical aspect of many recent studies, particularly in annotating and predicting the functions of unknown genes. The core idea behind genome annotation is to identify genes and various functional elements within the genome sequence and infer their biological functions. Traditional wet-lab experimental methods still rely on extensive efforts for functional verification. However, early bioinformatics algorithms and software primarily employed shallow learning techniques; thus, the ability to characterize data and features learning was limited. With the widespread adoption of RNA-Seq technology, scientists from the biological community began to harness the potential of machine learning and deep learning approaches for gene structure prediction and functional annotation. In this context, we reviewed both conventional methods and contemporary deep learning frameworks, and highlighted novel perspectives on the challenges arising during annotation underscoring the dynamic nature of this evolving scientific landscape.
Affiliation(s)
- Zhaojia Chen
- National Key Laboratory for Tropical Crop Breeding, Shenzhen Branch, Guangdong Laboratory for Lingnan Modern Agriculture, Genome Analysis Laboratory of the Ministry of Agriculture, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, Shenzhen, Guangzhou 518120, China
- College of Biomedical Engineering, Taiyuan University of Technology, Jinzhong 030600, China
| | - Noor ul Ain
- National Key Laboratory for Tropical Crop Breeding, Shenzhen Branch, Guangdong Laboratory for Lingnan Modern Agriculture, Genome Analysis Laboratory of the Ministry of Agriculture, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, Shenzhen, Guangzhou 518120, China
| | - Qian Zhao
- State Key Laboratory for Ecological Pest Control of Fujian/Taiwan Crops and College of Life Science, Fujian Agriculture and Forestry University, Fuzhou, 350002, China
| | - Xingtan Zhang
- National Key Laboratory for Tropical Crop Breeding, Shenzhen Branch, Guangdong Laboratory for Lingnan Modern Agriculture, Genome Analysis Laboratory of the Ministry of Agriculture, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, Shenzhen, Guangzhou 518120, China
|
18
|
Romeo-Cardeillac C, Trovero MF, Radío S, Smircich P, Rodríguez-Casuriaga R, Geisinger A, Sotelo-Silveira J. Uncovering a multitude of stage-specific splice variants and putative protein isoforms generated along mouse spermatogenesis. BMC Genomics 2024; 25:295. [PMID: 38509455 PMCID: PMC10953240 DOI: 10.1186/s12864-024-10170-z] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/02/2023] [Accepted: 02/28/2024] [Indexed: 03/22/2024] Open
Abstract
BACKGROUND Mammalian testis is a highly complex and heterogeneous tissue. This complexity, which mostly derives from spermatogenic cells, is reflected at the transcriptional level, with the largest number of tissue-specific genes and long noncoding RNAs (lncRNAs) compared to other tissues, and one of the highest rates of alternative splicing. Although it is known that adequate alternative-splicing patterns and stage-specific isoforms are critical for successful spermatogenesis, so far only a very limited number of reports have addressed a detailed study of alternative splicing and isoforms along the different spermatogenic stages. RESULTS In the present work, using highly purified stage-specific testicular cell populations, we detected 33,002 transcripts expressed throughout mouse spermatogenesis not annotated so far. These include both splice variants of already annotated genes, and of hitherto unannotated genes. Using conservative criteria, we uncovered 13,471 spermatogenic lncRNAs, which reflects the still incomplete annotation of lncRNAs. A distinctive feature of lncRNAs was their lower number of splice variants compared to protein-coding ones, adding to the conclusion that lncRNAs are, in general, less complex than mRNAs. Besides, we identified 2,794 unannotated transcripts with high coding potential (including some arising from yet unannotated genes), many of which encode unnoticed putative testis-specific proteins. Some of the most interesting coding splice variants were chosen, and validated through RT-PCR. Remarkably, the largest number of stage-specific unannotated transcripts are expressed during early meiotic prophase stages, whose study has been scarcely addressed in former transcriptomic analyses. CONCLUSIONS We detected a high number of yet unannotated genes and alternatively spliced transcripts along mouse spermatogenesis, hence showing that the transcriptomic diversity of the testis is considerably higher than previously reported. This is especially prominent for specific, underrepresented stages such as those of early meiotic prophase, and its unveiling may constitute a step towards the understanding of their key events.
Affiliation(s)
- Carlos Romeo-Cardeillac
- Laboratory of Molecular Biology of Reproduction, Department of Molecular Biology, Instituto de Investigaciones Biológicas Clemente Estable (IIBCE), 11,600, Montevideo, Uruguay
- Department of Genomics, IIBCE, 11,600, Montevideo, Uruguay
- María Fernanda Trovero
- Laboratory of Molecular Biology of Reproduction, Department of Molecular Biology, Instituto de Investigaciones Biológicas Clemente Estable (IIBCE), 11,600, Montevideo, Uruguay
- Boston Children's Hospital, Harvard Medical School, Boston, MA, USA
- Santiago Radío
- Department of Genomics, IIBCE, 11,600, Montevideo, Uruguay
- Pablo Smircich
- Department of Genomics, IIBCE, 11,600, Montevideo, Uruguay
- Rosana Rodríguez-Casuriaga
- Laboratory of Molecular Biology of Reproduction, Department of Molecular Biology, Instituto de Investigaciones Biológicas Clemente Estable (IIBCE), 11,600, Montevideo, Uruguay
- Adriana Geisinger
- Laboratory of Molecular Biology of Reproduction, Department of Molecular Biology, Instituto de Investigaciones Biológicas Clemente Estable (IIBCE), 11,600, Montevideo, Uruguay.
- Biochemistry-Molecular Biology, Facultad de Ciencias, Universidad de la República (UdelaR), 11,400, Montevideo, Uruguay.
- José Sotelo-Silveira
- Department of Genomics, IIBCE, 11,600, Montevideo, Uruguay.
- Department of Cell and Molecular Biology, Facultad de Ciencias, UdelaR, 11,400, Montevideo, Uruguay.
|
19
|
Yan J, Qu W, Li X, Wang R, Tan J. GATLGEMF: A graph attention model with line graph embedding multi-complex features for ncRNA-protein interactions prediction. Comput Biol Chem 2024; 108:108000. [PMID: 38070456 DOI: 10.1016/j.compbiolchem.2023.108000] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/16/2023] [Revised: 11/27/2023] [Accepted: 12/03/2023] [Indexed: 01/22/2024]
Abstract
Non-coding RNAs (ncRNAs) play an important role in many fundamental biological processes and may be closely associated with many complex human diseases. NcRNAs exert their functions by interacting with proteins. Therefore, identifying novel ncRNA-protein interactions (NPIs) is important for understanding how ncRNAs function. Computational approaches have the advantage of low cost and high efficiency. Machine learning and deep learning have achieved great success by making full use of sequence and structure information. The graph neural network (GNN) is a deep learning algorithm for link prediction in complex networks that can extract and discover features in graph topology data. In this study, we propose a new computational model called GATLGEMF. We used a line graph transformation strategy to obtain the most valuable feature information and input this feature information into the attention network to predict NPIs. The results on four benchmark datasets show that our method achieves superior performance. We further compare GATLGEMF with existing state-of-the-art methods to evaluate the model performance. GATLGEMF shows the best performance, with an area under the curve (AUC) of 92.41% and 98.93% on the RPI2241 and NPInter v2.0 datasets, respectively. In addition, a case study shows that GATLGEMF can predict new interactions based on known interactions. The source code is available at https://github.com/JianjunTan-Beijing/GATLGEMF.
Affiliation(s)
- Jing Yan
- Department of Biomedical Engineering, Faculty of Environment and Life, Beijing University of Technology, Beijing International Science and Technology Cooperation Base for Intelligent Physiological Measurement and Clinical Transformation, Beijing 100124, China
- Wenyan Qu
- Department of Biomedical Engineering, Faculty of Environment and Life, Beijing University of Technology, Beijing International Science and Technology Cooperation Base for Intelligent Physiological Measurement and Clinical Transformation, Beijing 100124, China
- Xiaoyi Li
- Department of Biomedical Engineering, Faculty of Environment and Life, Beijing University of Technology, Beijing International Science and Technology Cooperation Base for Intelligent Physiological Measurement and Clinical Transformation, Beijing 100124, China
- Ruobing Wang
- Department of Biomedical Engineering, Faculty of Environment and Life, Beijing University of Technology, Beijing International Science and Technology Cooperation Base for Intelligent Physiological Measurement and Clinical Transformation, Beijing 100124, China
- Jianjun Tan
- Department of Biomedical Engineering, Faculty of Environment and Life, Beijing University of Technology, Beijing International Science and Technology Cooperation Base for Intelligent Physiological Measurement and Clinical Transformation, Beijing 100124, China.
|
20
|
Huiwen J, Kai S. Prediction of LncRNA-protein Interactions Using Auto-Encoder, SE-ResNet Models and Transfer Learning. Microrna 2024; 13:155-165. [PMID: 38591194 DOI: 10.2174/0122115366288068240322064431] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/04/2023] [Revised: 02/26/2024] [Accepted: 03/09/2024] [Indexed: 04/10/2024]
Abstract
BACKGROUND Long non-coding RNA (lncRNA) plays a crucial role in various biological processes, and mutations or imbalances of lncRNAs can lead to several diseases, including cancer, Prader-Willi syndrome, autism, Alzheimer's disease, cartilage-hair hypoplasia, and hearing loss. Understanding lncRNA-protein interactions (LPIs) is vital for elucidating basic cellular processes, human diseases, viral replication, transcription, and plant pathogen resistance. Despite the development of several LPI calculation methods, predicting LPI remains challenging, with the selection of variables and deep learning structure being the focus of LPI research. METHODS We propose a deep learning framework called AR-LPI, which extracts sequence and secondary structure features of proteins and lncRNAs. The framework utilizes an auto-encoder for feature extraction and employs SE-ResNet for prediction. Additionally, we apply transfer learning to the deep neural network SE-ResNet for predicting small-sample datasets. RESULTS Through comprehensive experimental comparison, we demonstrate that the AR-LPI architecture performs better in LPI prediction. Specifically, the accuracy of AR-LPI increases by 2.86% to 94.52%, while the F-value of AR-LPI increases by 2.71% to 94.73%. CONCLUSION Our experimental results show that the overall performance of AR-LPI is better than that of other LPI prediction tools.
Affiliation(s)
- Jiang Huiwen
- School of Mathematics and Statistics, Qingdao University, Qingdao, Shandong, China
- Song Kai
- School of Mathematics and Statistics, Qingdao University, Qingdao, Shandong, China
|
21
|
Pudova E, Kobelyatskaya A, Emelyanova M, Snezhkina A, Fedorova M, Pavlov V, Guvatova Z, Dalina A, Kudryavtseva A. Non-Coding RNAs and the Development of Chemoresistance to Docetaxel in Prostate Cancer: Regulatory Interactions and Approaches Based on Machine Learning Methods. Life (Basel) 2023; 13:2304. [PMID: 38137905 PMCID: PMC10744715 DOI: 10.3390/life13122304] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/18/2023] [Revised: 11/30/2023] [Accepted: 12/05/2023] [Indexed: 12/24/2023]
Abstract
Chemotherapy based on taxane-class drugs is the gold standard for treating advanced stages of various oncological diseases. However, despite the favorable response trends, most patients eventually develop resistance to this therapy. Drug resistance is the result of a combination of different events in the tumor cells under the influence of the drug, a comprehensive understanding of which has yet to be determined. In this review, we examine the role of the major classes of non-coding RNAs in the development of chemoresistance in the case of prostate cancer, one of the most common and socially significant types of cancer in men worldwide. We will focus on recent findings from experimental studies regarding the prognostic potential of the identified non-coding RNAs. Additionally, we will explore novel approaches based on machine learning to study these regulatory molecules, including their role in the development of drug resistance.
Affiliation(s)
- Elena Pudova
- Engelhardt Institute of Molecular Biology, Russian Academy of Sciences, 119991 Moscow, Russia
- Marina Emelyanova
- Engelhardt Institute of Molecular Biology, Russian Academy of Sciences, 119991 Moscow, Russia
- Anastasiya Snezhkina
- Engelhardt Institute of Molecular Biology, Russian Academy of Sciences, 119991 Moscow, Russia
- Maria Fedorova
- Engelhardt Institute of Molecular Biology, Russian Academy of Sciences, 119991 Moscow, Russia
- Vladislav Pavlov
- Engelhardt Institute of Molecular Biology, Russian Academy of Sciences, 119991 Moscow, Russia
- Zulfiya Guvatova
- Engelhardt Institute of Molecular Biology, Russian Academy of Sciences, 119991 Moscow, Russia
- Russian Clinical Research Center for Gerontology, Pirogov Russian National Research Medical University, Ministry of Healthcare of the Russian Federation, 129226 Moscow, Russia
- Alexandra Dalina
- Engelhardt Institute of Molecular Biology, Russian Academy of Sciences, 119991 Moscow, Russia
- Anna Kudryavtseva
- Engelhardt Institute of Molecular Biology, Russian Academy of Sciences, 119991 Moscow, Russia
|
22
|
Chen XG, Yang X, Li C, Lin X, Zhang W. Non-coding RNA identification with pseudo RNA sequences and feature representation learning. Comput Biol Med 2023; 165:107355. [PMID: 37639767 DOI: 10.1016/j.compbiomed.2023.107355] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 05/29/2023] [Revised: 07/16/2023] [Accepted: 08/12/2023] [Indexed: 08/31/2023]
Abstract
Distinguishing non-coding RNAs (ncRNAs) from coding RNAs is very important in bioinformatics. Although many methods have been proposed for solving this task, it remains highly challenging to further improve the accuracy of ncRNA identification. In this paper, we propose a coding potential predictor using feature representation learning based on pseudo RNA sequences named CPPFLPS. In this method, we use the pseudo RNA sequences generated by simulating RNA sequence mutations as new samples for data augmentation, and six string operations simulating RNA sequence mutations are considered: base replacement, base insertion, base deletion, subsequence reversion, subsequence repetition and subsequence deletion. In the feature representation learning framework, different types of pseudo RNA sequences are added to the training set to form new training sets that can be used to train baseline classifiers, thus obtaining baseline models. The resulting labels of these baseline models are used as feature vectors to represent RNA sequences, and the resulting feature vectors acquired after feature selection are used to train a predictive model for distinguishing ncRNAs from coding RNAs. Our method achieves better performance compared with that of existing state-of-the-art methods. The implementation of the proposed method is available at https://github.com/chenxgscuec/CPPFLPS.
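As an illustration of the data augmentation idea described in this abstract, the six string operations can be sketched in a few lines of Python. This is a minimal sketch, not the CPPFLPS implementation: the function names, the `max_len` limit on subsequence length, the uniform random choice of positions, and the reading of "subsequence reversion" as reversing a subsequence in place are all assumptions made for illustration.

```python
import random

BASES = "ACGU"  # assumed RNA alphabet


def base_replacement(seq, rng):
    """Replace one randomly chosen base with a different base."""
    i = rng.randrange(len(seq))
    choices = [b for b in BASES if b != seq[i]]
    return seq[:i] + rng.choice(choices) + seq[i + 1:]


def base_insertion(seq, rng):
    """Insert one random base at a random position."""
    i = rng.randrange(len(seq) + 1)
    return seq[:i] + rng.choice(BASES) + seq[i:]


def base_deletion(seq, rng):
    """Delete one randomly chosen base."""
    i = rng.randrange(len(seq))
    return seq[:i] + seq[i + 1:]


def _span(seq, rng, max_len):
    """Pick a random subsequence [start, end) of length 2..max_len (needs len(seq) >= 3)."""
    length = rng.randint(2, min(max_len, len(seq) - 1))
    start = rng.randrange(len(seq) - length + 1)
    return start, start + length


def subsequence_reversion(seq, rng, max_len=10):
    """Reverse a random subsequence in place (hypothetical interpretation)."""
    s, e = _span(seq, rng, max_len)
    return seq[:s] + seq[s:e][::-1] + seq[e:]


def subsequence_repetition(seq, rng, max_len=10):
    """Duplicate a random subsequence immediately after itself."""
    s, e = _span(seq, rng, max_len)
    return seq[:e] + seq[s:e] + seq[e:]


def subsequence_deletion(seq, rng, max_len=10):
    """Remove a random subsequence."""
    s, e = _span(seq, rng, max_len)
    return seq[:s] + seq[e:]


OPERATIONS = [base_replacement, base_insertion, base_deletion,
              subsequence_reversion, subsequence_repetition, subsequence_deletion]


def pseudo_sequences(seq, n, seed=0):
    """Generate n pseudo sequences, each produced by one randomly chosen operation."""
    rng = random.Random(seed)
    return [rng.choice(OPERATIONS)(seq, rng) for _ in range(n)]
```

Calling `pseudo_sequences("AUGGCUACGUAGCUAGCUAA", 6)` returns six mutated variants that could be added to a training set with the label of the original sequence.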
Affiliation(s)
- Xian-Gan Chen
- School of Biomedical Engineering, South-Central Minzu University, Wuhan, 430074, China; Hubei Key Laboratory of Medical Information Analysis and Tumor Diagnosis & Treatment, South-Central Minzu University, Wuhan, 430074, China; Key Laboratory of Cognitive Science (South-Central Minzu University), State Ethnic Affairs Commission, Wuhan, 430074, China.
- Xiaofei Yang
- School of Biomedical Engineering, South-Central Minzu University, Wuhan, 430074, China; Hubei Key Laboratory of Medical Information Analysis and Tumor Diagnosis & Treatment, South-Central Minzu University, Wuhan, 430074, China; Key Laboratory of Cognitive Science (South-Central Minzu University), State Ethnic Affairs Commission, Wuhan, 430074, China.
- Chenhong Li
- School of Biomedical Engineering, South-Central Minzu University, Wuhan, 430074, China; Hubei Key Laboratory of Medical Information Analysis and Tumor Diagnosis & Treatment, South-Central Minzu University, Wuhan, 430074, China; Key Laboratory of Cognitive Science (South-Central Minzu University), State Ethnic Affairs Commission, Wuhan, 430074, China.
- Xianguang Lin
- School of Biomedical Engineering, South-Central Minzu University, Wuhan, 430074, China; Hubei Key Laboratory of Medical Information Analysis and Tumor Diagnosis & Treatment, South-Central Minzu University, Wuhan, 430074, China; Key Laboratory of Cognitive Science (South-Central Minzu University), State Ethnic Affairs Commission, Wuhan, 430074, China.
- Wen Zhang
- College of Informatics, Huazhong Agricultural University, Wuhan, 430070, China.
|
23
|
Ballarino M, Pepe G, Helmer-Citterich M, Palma A. Exploring the landscape of tools and resources for the analysis of long non-coding RNAs. Comput Struct Biotechnol J 2023; 21:4706-4716. [PMID: 37841333 PMCID: PMC10568309 DOI: 10.1016/j.csbj.2023.09.041] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/12/2023] [Revised: 09/28/2023] [Accepted: 09/28/2023] [Indexed: 10/17/2023]
Abstract
In recent years, research on long non-coding RNAs (lncRNAs) has gained considerable attention due to the increasing number of newly identified transcripts. Several characteristics make their functional evaluation challenging, which called for the urgent need to combine molecular biology with other disciplines, including bioinformatics. Indeed, the recent development of computational pipelines and resources has greatly facilitated both the discovery and the mechanisms of action of lncRNAs. In this review, we present a curated collection of the most recent computational resources, which have been categorized into distinct groups: databases and annotation, identification and classification, interaction prediction, and structure prediction. As the repertoire of lncRNAs and their analysis tools continues to expand over the years, standardizing the computational pipelines and improving the existing annotation of lncRNAs will be crucial to facilitate functional genomics studies.
Affiliation(s)
- Monica Ballarino
- Department of Biology and Biotechnologies “Charles Darwin”, Sapienza University of Rome, Piazzale Aldo Moro 5, 00161 Rome, Italy
- Gerardo Pepe
- Department of Biology, University of Rome Tor Vergata, Via della Ricerca Scientifica, 1, 00133 Rome, Italy
- Manuela Helmer-Citterich
- Department of Biology, University of Rome Tor Vergata, Via della Ricerca Scientifica, 1, 00133 Rome, Italy
- Alessandro Palma
- Department of Biology and Biotechnologies “Charles Darwin”, Sapienza University of Rome, Piazzale Aldo Moro 5, 00161 Rome, Italy
|
24
|
Pronozin AY, Afonnikov DA. ICAnnoLncRNA: A Snakemake Pipeline for a Long Non-Coding-RNA Search and Annotation in Transcriptomic Sequences. Genes (Basel) 2023; 14:1331. [PMID: 37510236 PMCID: PMC10379598 DOI: 10.3390/genes14071331] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/28/2023] [Revised: 06/09/2023] [Accepted: 06/21/2023] [Indexed: 07/30/2023]
Abstract
Long non-coding RNAs (lncRNAs) are RNA molecules longer than 200 nucleotides that do not encode proteins. Experimental studies have shown the diversity and importance of lncRNA functions in plants. To expand knowledge about lncRNAs in other species, computational pipelines that allow for standardised data-processing steps in a mode that does not require user control up until the final result were actively developed recently. These advancements enable wider functionality for lncRNA data identification and analysis. In the present work, we propose the ICAnnoLncRNA pipeline for the automatic identification, classification and annotation of plant lncRNAs in assembled transcriptomic sequences. It uses the LncFinder software for the identification of lncRNAs and allows the adjustment of recognition parameters using genomic data for which lncRNA annotation is available. The pipeline allows the prediction of lncRNA candidates, alignment of lncRNA sequences to the reference genome, filtering of erroneous/noise transcripts and probable transposable elements, lncRNA classification by genome location, comparison with sequences from external databases and analysis of lncRNA structural features and expression. We used transcriptomic sequences from 15 maize libraries assembled by Trinity and Hisat2/StringTie to demonstrate the application of the ICAnnoLncRNA pipeline.
Affiliation(s)
- Artem Yu Pronozin
- Institute of Cytology and Genetics, Siberian Branch of Russian Academy of Sciences, 630090 Novosibirsk, Russia
- Kurchatov Center for Genome Research, Institute of Cytology and Genetics, Siberian Branch of Russian Academy of Sciences, 630090 Novosibirsk, Russia
- Faculty of Natural Sciences, Novosibirsk State University, 630090 Novosibirsk, Russia
- Dmitry A Afonnikov
- Institute of Cytology and Genetics, Siberian Branch of Russian Academy of Sciences, 630090 Novosibirsk, Russia
- Kurchatov Center for Genome Research, Institute of Cytology and Genetics, Siberian Branch of Russian Academy of Sciences, 630090 Novosibirsk, Russia
- Faculty of Natural Sciences, Novosibirsk State University, 630090 Novosibirsk, Russia
|
25
|
Gao H, Gao P, Ye N. Prelnc2: A prediction tool for lncRNAs with enhanced multi-level features of RNAs. PLoS One 2023; 18:e0286377. [PMID: 37262050 DOI: 10.1371/journal.pone.0286377] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/13/2023] [Accepted: 05/15/2023] [Indexed: 06/03/2023]
Abstract
Long non-coding RNAs (lncRNAs) have been widely studied for their important biological significance. In general, we need to distinguish them from protein coding RNAs (pcRNAs) with similar functions. Based on various strategies, algorithms and tools have been designed and developed to train and validate such classification capabilities. However, many of them lack scalability and versatility and rely heavily on genome annotation. In this paper, we design a convenient and biologically meaningful classification tool, "PreLnc2", which uses multi-scale position and frequency information from the wavelet transform spectrum and generalizes the frequency statistics method. Finally, we used the extracted features and auxiliary features together to train the model and verified it with test data. PreLnc2 achieved 93.2% accuracy for animal and plant transcripts, a 2.1% improvement over PreLnc, and provides an effective alternative for the prediction of lncRNAs.
Affiliation(s)
- Hua Gao
- College of Forestry, Nanjing Forestry University, Nanjing, China
- College of Information Science and Technology, Nanjing Forestry University, Nanjing, China
- Peng Gao
- The First Affiliated Hospital of Xi'an Jiaotong University, Xi'an, China
- Ning Ye
- College of Forestry, Nanjing Forestry University, Nanjing, China
- College of Information Science and Technology, Nanjing Forestry University, Nanjing, China
|
26
|
Palos K, Yu L, Railey CE, Nelson Dittrich AC, Nelson ADL. Linking discoveries, mechanisms, and technologies to develop a clearer perspective on plant long noncoding RNAs. THE PLANT CELL 2023; 35:1762-1786. [PMID: 36738093 PMCID: PMC10226578 DOI: 10.1093/plcell/koad027] [Citation(s) in RCA: 33] [Impact Index Per Article: 16.5] [Received: 09/02/2022] [Revised: 12/19/2022] [Accepted: 12/22/2022] [Indexed: 05/30/2023]
Abstract
Long noncoding RNAs (lncRNAs) are a large and diverse class of genes in eukaryotic genomes that contribute to a variety of regulatory processes. Functionally characterized lncRNAs play critical roles in plants, ranging from regulating flowering to controlling lateral root formation. However, findings from the past decade have revealed that thousands of lncRNAs are present in plant transcriptomes, and characterization has lagged far behind identification. In this setting, distinguishing function from noise is challenging. However, the plant community has been at the forefront of discovery in lncRNA biology, providing many functional and mechanistic insights that have increased our understanding of this gene class. In this review, we examine the key discoveries and insights made in plant lncRNA biology over the past two and a half decades. We describe how discoveries made in the pregenomics era have informed efforts to identify and functionally characterize lncRNAs in the subsequent decades. We provide an overview of the functional archetypes into which characterized plant lncRNAs fit and speculate on new avenues of research that may uncover yet more archetypes. Finally, this review discusses the challenges facing the field and some exciting new molecular and computational approaches that may help inform lncRNA comparative and functional analyses.
Affiliation(s)
- Kyle Palos
- Boyce Thompson Institute, Cornell University, Ithaca, NY 14853, USA
- Li’ang Yu
- Boyce Thompson Institute, Cornell University, Ithaca, NY 14853, USA
- Caylyn E Railey
- Boyce Thompson Institute, Cornell University, Ithaca, NY 14853, USA
- Plant Biology Graduate Field, Cornell University, Ithaca, NY 14853, USA
|
27
|
Han Y, Zhang SW. ncRPI-LGAT: Prediction of ncRNA-Protein Interactions with Line Graph Attention Network Framework. Comput Struct Biotechnol J 2023; 21:2286-2295. [PMID: 37035546 PMCID: PMC10073990 DOI: 10.1016/j.csbj.2023.03.027] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 09/21/2022] [Revised: 03/11/2023] [Accepted: 03/16/2023] [Indexed: 03/19/2023]
Abstract
Identification of ncRNA-protein interactions (ncRPIs) through wet experiments is still time-consuming and costly. Although several computational approaches have been developed to predict ncRPIs using the structure and sequence information of ncRNAs and proteins, the prediction accuracy needs to be improved, and the results lack interpretability. In this work, we proposed a novel computational method (called ncRPI-LGAT) to predict ncRNA-protein interactions by transforming the link prediction (i.e., subgraph classification) task into a node classification task in the line network and introducing a Line Graph ATtention network framework. ncRPI-LGAT first extracts the ncRNA/protein attributes using node2vec, and then generates the local enclosing subgraph of a target ncRNA-protein pair with SEAL. Because using pooling operations in local enclosing subgraphs to learn a fixed-size feature vector for representing ncRNAs/proteins causes information loss, ncRPI-LGAT converts the local enclosing subgraphs into their corresponding line graphs, in which each node corresponds to an edge (i.e., an ncRNA-protein pair) of the local enclosing subgraph. Then, the attention mechanism-based graph neural network GATv2 is used on these line graphs to efficiently learn the embedding features of the target nodes (i.e., ncRNA-protein pairs) by focusing on learning the significance of one ncRNA-protein pair to another ncRNA-protein pair. The embedding features of one ncRNA-protein pair obtained from multi-head attention are concatenated and then fed into a fully connected network to predict ncRPIs. Compared with other state-of-the-art methods in the 5-fold cross-validation (5CV) test, ncRPI-LGAT shows superior performance on three benchmark datasets, demonstrating the effectiveness of the ncRPI-LGAT method in predicting ncRNA-protein interactions.
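The line graph conversion mentioned in this abstract can be illustrated with a small example. The sketch below is not taken from ncRPI-LGAT; the function name `line_graph` and the toy node labels are hypothetical. It only shows the standard construction: every edge of the original graph becomes a node, and two such nodes are connected when the original edges share an endpoint.

```python
from itertools import combinations


def line_graph(edges):
    """Build the line graph of an undirected graph given as a list of (u, v) edges.

    Returns a dict mapping each original edge to the set of edges it shares
    an endpoint with.
    """
    nodes = [tuple(e) for e in edges]
    adjacency = {n: set() for n in nodes}
    for a, b in combinations(nodes, 2):
        if set(a) & set(b):  # the two original edges share an ncRNA or a protein
            adjacency[a].add(b)
            adjacency[b].add(a)
    return adjacency


# Toy enclosing subgraph: ncRNAs r1, r2 and proteins p1, p2 (hypothetical labels).
subgraph = [("r1", "p1"), ("r1", "p2"), ("r2", "p2")]
lg = line_graph(subgraph)
```

In this toy case the pair `("r1", "p2")` becomes a line-graph node adjacent to both other pairs, so classifying that node is equivalent to predicting the corresponding link in the original graph.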
|
28
|
Feng H, Wang S, Wang Y, Ni X, Yang Z, Hu X, Yang S. LncCat: An ORF attention model to identify LncRNA based on ensemble learning strategy and fused sequence information. Comput Struct Biotechnol J 2023; 21:1433-1447. [PMID: 36824229 PMCID: PMC9941877 DOI: 10.1016/j.csbj.2023.02.012] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 08/11/2022] [Revised: 02/06/2023] [Accepted: 02/06/2023] [Indexed: 02/10/2023]
Abstract
Background Long non-coding RNA (lncRNA) is one of the most essential forms of transcripts, playing crucial regulatory roles in the development of cancers and diseases without protein-coding ability. It was long assumed that short ORFs (sORFs) in lncRNAs were unlikely to be translated into proteins. However, recent research has shown that sORFs can encode peptides, which makes lncRNAs more difficult to identify. Therefore, identifying lncRNAs with sORFs facilitates finding novel regulatory factors. Results In this paper, we propose LncCat for identifying lncRNA based on category boosting (CatBoost) and ORF-attention features. LncCat combines five types of features to encode transcript sequences and employs CatBoost to build a prediction model. In addition, the visualization comparison reveals that the ORF-attention features of lncRNAs and protein-coding transcripts are significantly distinct. The comparison results show that LncCat outperforms competing methods on several benchmark datasets. For the Matthews correlation coefficient (MCC), LncCat achieves 0.9503, 0.9219, 0.8591, 0.8672, and 0.9047 on the human, mouse, zebrafish, wheat, and chicken datasets, with improvements of 1.90-7.82%, 1.49-17.63%, 6.11-21.50%, 3.02-51.64% and 5.35-26.90%, respectively. Moreover, LncCat improves the MCC by at least 11.90%, 12.96% and 42.61% on sORF test datasets of human, mouse, and zebrafish, respectively. Conclusions Experiments indicate that LncCat performs better on both long ORF and sORF datasets, and ORF-attention features show positive effects on predicting lncRNA. In brief, LncCat is a reliable method for identifying lncRNA. Additionally, a user-friendly web server is available for academics at http://cczubio.top/lnccat.
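For readers unfamiliar with the metric reported in this abstract, the Matthews correlation coefficient for a binary classifier is computed from the four confusion-matrix counts as MCC = (TP·TN − FP·FN) / sqrt((TP+FP)(TP+FN)(TN+FP)(TN+FN)). The sketch below is a generic implementation of that formula, not code from LncCat; returning 0.0 when the denominator is zero is a common convention and is an assumption here.

```python
import math


def matthews_corrcoef(tp, tn, fp, fn):
    """Matthews correlation coefficient from binary confusion-matrix counts."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        return 0.0  # assumed convention when a row or column of the matrix is empty
    return (tp * tn - fp * fn) / denom
```

The value ranges from -1 (every prediction wrong) through 0 (no better than chance) to 1 (every prediction right), which makes it more informative than accuracy when lncRNA and mRNA classes are imbalanced.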
Affiliation(s)
- Hongqi Feng
- School of Computer Science and Artificial Intelligence Aliyun School of Big Data School of Software, Changzhou University, Changzhou 213164, China
- Shaocong Wang
- School of Computer Science and Artificial Intelligence Aliyun School of Big Data School of Software, Changzhou University, Changzhou 213164, China
- Yan Wang
- Key Laboratory of Symbol Computation and Knowledge Engineering of Ministry of Education, College of Computer Science and Technology, Jilin University, Changchun 130012, China
- School of Artificial Intelligence, Jilin University, Changchun 130012, China
- Xinye Ni
- The Affiliated Changzhou No.2 People’s Hospital of Nanjing Medical University, Changzhou 213164, China
- Zexi Yang
- School of Computer Science and Artificial Intelligence Aliyun School of Big Data School of Software, Changzhou University, Changzhou 213164, China
- Xuemei Hu
- Key Laboratory of Symbol Computation and Knowledge Engineering of Ministry of Education, College of Computer Science and Technology, Jilin University, Changchun 130012, China
- Sen Yang
- School of Computer Science and Artificial Intelligence Aliyun School of Big Data School of Software, Changzhou University, Changzhou 213164, China
- The Affiliated Changzhou No.2 People’s Hospital of Nanjing Medical University, Changzhou 213164, China
|
29
|
Li Y, Sun H, Fang W, Ma Q, Han S, Wang-Sattler R, Du W, Yu Q. SURE: Screening Unlabeled Samples for Reliable Negative Samples Based on Reinforcement Learning. Inf Sci (N Y) 2023. [DOI: 10.1016/j.ins.2023.01.112] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 02/05/2023]
|
30
|
Zhao J, Sun J, Shuai SC, Zhao Q, Shuai J. Predicting potential interactions between lncRNAs and proteins via combined graph auto-encoder methods. Brief Bioinform 2023; 24:6896030. [PMID: 36515153 DOI: 10.1093/bib/bbac527] [Citation(s) in RCA: 28] [Impact Index Per Article: 14.0] [Received: 09/02/2022] [Revised: 10/23/2022] [Accepted: 11/06/2022] [Indexed: 12/15/2022]
Abstract
Long noncoding RNA (lncRNA) is a kind of noncoding RNA with a length of more than 200 nucleotides. Numerous studies have shown that although lncRNAs cannot be directly translated into proteins, they still play an important role in human growth processes by interacting with proteins. Since traditional biological experiments often require a lot of time and material costs to explore potential lncRNA-protein interactions (LPIs), several computational models have been proposed for this task. In this study, we introduce a novel deep learning method known as combined graph auto-encoders (LPICGAE) to predict potential human LPIs. First, we apply a variational graph auto-encoder to learn low-dimensional representations from the high-dimensional features of lncRNAs and proteins. Then the graph auto-encoder is used to reconstruct the adjacency matrix for inferring potential interactions between lncRNAs and proteins. Finally, we minimize the loss of the two processes alternately to obtain the final predicted interaction matrix. The results of 5-fold cross-validation experiments show that our method achieves an average area under the receiver operating characteristic curve of 0.974 and an average accuracy of 0.985, which is better than those of six existing state-of-the-art computational methods. We believe that LPICGAE can help researchers to uncover more potential relationships between lncRNAs and proteins effectively.
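The adjacency reconstruction step can be pictured with a small decoder sketch. The abstract does not state which decoder LPICGAE uses; the example below assumes the inner-product decoder commonly used with graph auto-encoders, in which the predicted interaction score is the sigmoid of the dot product of the two learned embeddings. The function names and toy embeddings are hypothetical.

```python
import math


def sigmoid(x):
    """Logistic function mapping a real score to the interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def reconstruct_interactions(lnc_emb, prot_emb):
    """Score every lncRNA-protein pair as sigmoid(z_lncRNA . z_protein).

    lnc_emb and prot_emb are lists of equal-length embedding vectors
    (assumed to come from a trained encoder, which is not shown here).
    """
    return [
        [sigmoid(sum(a * b for a, b in zip(z_l, z_p))) for z_p in prot_emb]
        for z_l in lnc_emb
    ]


# Toy 2-dimensional embeddings for two lncRNAs and two proteins.
scores = reconstruct_interactions(
    [[1.0, 0.0], [0.0, -2.0]],
    [[2.0, 0.0], [0.0, 2.0]],
)
```

Entries of `scores` close to 1 would be read as likely interactions and entries close to 0 as unlikely ones; thresholding the matrix gives a predicted adjacency matrix.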
Affiliation(s)
- Jingxuan Zhao
- University of Science and Technology Liaoning, 66459, Anshan, China
- Stella C Shuai
- Northwestern University, 3270, Evanston, Illinois, United States
- Qi Zhao
- University of Science and Technology Liaoning, 66459, Anshan, China
- Jianwei Shuai
- Department of Physics, Xiamen University, Xiamen, China
|
31
|
Zhou B, Ding M, Feng J, Ji B, Huang P, Zhang J, Yu X, Cao Z, Yang Y, Zhou Y, Wang J. EVlncRNA-Dpred: improved prediction of experimentally validated lncRNAs by deep learning. Brief Bioinform 2022; 24:6961472. [PMID: 36573492 PMCID: PMC9851331 DOI: 10.1093/bib/bbac583] [Citation(s) in RCA: 7] [Impact Index Per Article: 2.3] [Received: 09/14/2022] [Revised: 11/02/2022] [Accepted: 11/29/2022] [Indexed: 12/28/2022]
Abstract
Long non-coding RNAs (lncRNAs) play essential roles in nearly every biological process and disease. Many algorithms have been developed to distinguish lncRNAs from mRNAs in transcriptomic data and have facilitated the discovery of more than 600 000 lncRNAs. However, only a tiny fraction (<1%) of lncRNA transcripts (~4000) have been further validated by low-throughput experiments (EVlncRNAs). Given the cost and labor-intensive nature of experimental validation, it is necessary to develop computational tools to prioritize potentially functional lncRNAs, because many lncRNAs from high-throughput sequencing (HTlncRNAs) could result from transcriptional noise. Here, we employed deep learning algorithms to separate EVlncRNAs from HTlncRNAs and mRNAs. To overcome the challenge of small datasets, we employed a three-layer deep-learning neural network (DNN) with a K-mer feature as the input and a small convolutional neural network (CNN) with one-hot encoding as the input. Three separate models were trained for human (h), mouse (m) and plant (p), respectively. The final concatenated models (EVlncRNA-Dpred (h), EVlncRNA-Dpred (m) and EVlncRNA-Dpred (p)) provided substantial improvement over a previous model based on support-vector machines (EVlncRNA-pred). For example, EVlncRNA-Dpred (h) achieved 0.896 for the area under the receiver-operating characteristic curve, compared with 0.582 given by the sequence-based EVlncRNA-pred model. The models developed here should be useful for screening lncRNA transcripts for experimental validation. EVlncRNA-Dpred is available as a web server at https://www.sdklab-biophysics-dzu.net/EVlncRNA-Dpred/index.html, and the data and source code are freely available along with the web server.
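The two input representations named in this abstract, a K-mer feature and one-hot encoding, can be sketched as follows. This is an illustrative sketch only: the value of k, the normalisation, the handling of ambiguous bases, and the function names `kmer_frequencies` and `one_hot` are assumptions, not details taken from EVlncRNA-Dpred.

```python
from itertools import product

ALPHABET = "ACGU"


def kmer_frequencies(seq, k=3):
    """Return normalised k-mer frequencies in a fixed lexicographic order.

    Assumptions: DNA-style 'T' is read as 'U', and windows containing
    characters outside ACGU (for example 'N') are skipped.
    """
    seq = seq.upper().replace("T", "U")
    kmers = ["".join(p) for p in product(ALPHABET, repeat=k)]
    counts = dict.fromkeys(kmers, 0)
    total = 0
    for i in range(len(seq) - k + 1):
        window = seq[i:i + k]
        if window in counts:
            counts[window] += 1
            total += 1
    return [counts[m] / total if total else 0.0 for m in kmers]


def one_hot(seq):
    """Encode each base as a 4-element indicator vector (A, C, G, U).

    Assumption: unknown bases are encoded as all zeros.
    """
    seq = seq.upper().replace("T", "U")
    return [[1 if base == a else 0 for a in ALPHABET] for base in seq]


features = kmer_frequencies("AUGGCUAUGA", k=3)  # fixed-length vector, 4**3 entries
matrix = one_hot("AUGGCUAUGA")                  # one row per nucleotide
```

The k-mer vector has a fixed length regardless of transcript length, which suits a fully connected network, while the one-hot matrix preserves base order, which is what a convolutional network operates on.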
Affiliation(s)
- Bailing Zhou
- Shandong Provincial Key Laboratory of Biophysics, Institute of Biophysics, Dezhou University, Dezhou 253023, China
| | - Maolin Ding
- School of Computer Science and Engineering, Sun Yat-sen University, Guangzhou 510000, China
| | - Jing Feng
- Shandong Provincial Key Laboratory of Biophysics, Institute of Biophysics, Dezhou University, Dezhou 253023, China
| | - Baohua Ji
- Shandong Provincial Key Laboratory of Biophysics, Institute of Biophysics, Dezhou University, Dezhou 253023, China
| | - Pingping Huang
- Shandong Provincial Key Laboratory of Biophysics, Institute of Biophysics, Dezhou University, Dezhou 253023, China
| | - Junye Zhang
- Shandong Provincial Key Laboratory of Biophysics, Institute of Biophysics, Dezhou University, Dezhou 253023, China
| | - Xue Yu
- Shandong Provincial Key Laboratory of Biophysics, Institute of Biophysics, Dezhou University, Dezhou 253023, China
| | - Zanxia Cao
- Shandong Provincial Key Laboratory of Biophysics, Institute of Biophysics, Dezhou University, Dezhou 253023, China
| | - Yuedong Yang
- School of Computer Science and Engineering, Sun Yat-sen University, Guangzhou 510000, China
| | - Yaoqi Zhou
- Institute for Systems and Physical Biology, Shenzhen Bay Laboratory, Shenzhen 518055, China. Corresponding author. Tel.: +86 (755) 6275 2684
| | - Jihua Wang
- Shandong Provincial Key Laboratory of Biophysics, Institute of Biophysics, Dezhou University, Dezhou 253023, China. Corresponding author. Tel.: +86 (534) 898 5933
| |
|
32
|
Lindemann A, Brandes F, Borrmann M, Meidert AS, Kirchner B, Steinlein OK, Schelling G, Pfaffl MW, Reithmair M. Anesthetic‑specific lncRNA and mRNA profile changes in blood during colorectal cancer resection: A prospective, matched‑case pilot study. Oncol Rep 2022; 49:28. [PMID: 36562401 PMCID: PMC9813548 DOI: 10.3892/or.2022.8465] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/22/2022] [Accepted: 11/03/2022] [Indexed: 12/23/2022] Open
Abstract
Prometastatic and antitumor effects of different anesthetics have been previously analyzed in several studies with conflicting results. Thus, the underlying perioperative molecular mechanisms mediated by anesthetics potentially affecting tumor phenotype and metastasis remain unclear. It was hypothesized that anesthetic‑specific long non‑coding RNA (lncRNA) expression changes are induced in the blood circulation and play a crucial role in tumor outcome. In the present study, high‑throughput sequencing and quantitative PCR were performed in order to identify lncRNA and mRNA expression changes affected by two therapeutic regimes, total intravenous anesthesia (TIVA) and volatile anesthetic gas (VAG), in patients undergoing colorectal cancer (CRC) resection. Total blood RNA was isolated prior to and following resection and characterized using RNA sequencing. mRNA‑lncRNA interactions and their roles in cancer‑related signaling of differentially expressed lncRNAs were identified using bioinformatics analyses. The comparison of these two time points revealed 35 differentially expressed lncRNAs in the TIVA‑group, and 25 in the VAG‑group, whereas eight were shared by both groups. Two lncRNAs in the TIVA‑group, and 23 in the VAG‑group of in silico identified target‑mRNAs were confirmed as differentially regulated in the NGS dataset of the present study. Pathway analysis was performed and cancer relevant canonical pathways for TIVA were identified. Target‑mRNA analysis of VAG revealed a markedly worsened immunological response against cancer. In this proof‑of‑concept study, anesthetic‑specific expression changes in lncRNA and mRNA profiles in blood were successfully identified. Moreover, the data of the present study provide the first evidence that anesthesia‑induced lncRNA pattern changes may contribute to the observed differences in CRC outcome following tumor resection.
Affiliation(s)
- Anja Lindemann
- Institute of Human Genetics, University Hospital, Ludwig-Maximilians-University Munich, 80336 Munich, Germany
| | - Florian Brandes
- Department of Anesthesiology, University Hospital, LMU Munich, 81377 Munich, Germany
| | - Melanie Borrmann
- Department of Anesthesiology, University Hospital, LMU Munich, 81377 Munich, Germany
| | - Agnes S. Meidert
- Department of Anesthesiology, University Hospital, LMU Munich, 81377 Munich, Germany
| | - Benedikt Kirchner
- Division of Animal Physiology and Immunology, School of Life Sciences Weihenstephan, Technical University of Munich, 85354 Freising, Germany
| | - Ortrud K. Steinlein
- Institute of Human Genetics, University Hospital, Ludwig-Maximilians-University Munich, 80336 Munich, Germany
| | - Gustav Schelling
- Department of Anesthesiology, University Hospital, LMU Munich, 81377 Munich, Germany
| | - Michael W. Pfaffl
- Division of Animal Physiology and Immunology, School of Life Sciences Weihenstephan, Technical University of Munich, 85354 Freising, Germany
| | - Marlene Reithmair
- Institute of Human Genetics, University Hospital, Ludwig-Maximilians-University Munich, 80336 Munich, Germany. Correspondence to: Dr Marlene Reithmair, Institute of Human Genetics, University Hospital, Ludwig-Maximilians-University Munich, Goethestraße 29, 80336 Munich, Germany
| |
|
33
|
Singh D, Roy J. A large-scale benchmark study of tools for the classification of protein-coding and non-coding RNAs. Nucleic Acids Res 2022; 50:12094-12111. [PMID: 36420898 PMCID: PMC9757047 DOI: 10.1093/nar/gkac1092] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/03/2022] [Revised: 10/22/2022] [Accepted: 10/28/2022] [Indexed: 11/27/2022] Open
Abstract
Identification of protein-coding and non-coding transcripts is paramount for understanding their biological roles. Computational approaches have been addressing this task for over a decade; however, generalized and high-performance models are still unreliable. This benchmark study assessed the performance of 24 tools producing >55 models on the datasets covering a wide range of species. We have collected 135 small and large transcriptomic datasets from existing studies for comparison and identified the potential bottlenecks hampering the performance of current tools. The key insights of this study include lack of standardized training sets, reliance on homogeneous training data, gradual changes in annotated data, lack of augmentation with homology searches, the presence of false positives and negatives in datasets and the lower performance of end-to-end deep learning models. We also derived a new dataset, RNAChallenge, from the benchmark considering hard instances that may include potential false alarms. The best and least well performing models under- and overfit the dataset, respectively, thereby serving a dual purpose. For computational approaches, it will be valuable to develop accurate and unbiased models. The identification of false alarms will be of interest for genome annotators, and experimental study of hard RNAs will help to untangle the complexity of the RNA world.
Affiliation(s)
- Dalwinder Singh
- To whom correspondence should be addressed. Tel: +91 172 5221206
| | - Joy Roy
- Correspondence may also be addressed to Joy Roy.
| |
|
34
|
Zhang H, Wang Y, Pan Z, Sun X, Mou M, Zhang B, Li Z, Li H, Zhu F. ncRNAInter: a novel strategy based on graph neural network to discover interactions between lncRNA and miRNA. Brief Bioinform 2022; 23:6747810. [PMID: 36198065 DOI: 10.1093/bib/bbac411] [Citation(s) in RCA: 23] [Impact Index Per Article: 7.7] [Received: 06/07/2022] [Revised: 08/04/2022] [Accepted: 08/23/2022] [Indexed: 12/14/2022] Open
Abstract
In recent years, many studies have illustrated the significant role that non-coding RNA (ncRNA) plays in biological activities, in which lncRNA, miRNA and especially their interactions have been shown to affect many biological processes. Some in silico methods have been proposed and applied to identify novel lncRNA-miRNA interactions (LMIs), but there are still imperfections in their RNA representation and information extraction approaches, which implies there is still room for improving their performance. Meanwhile, only a few of them are accessible at present, which limits their practical application. The construction of a new tool for LMI prediction is thus imperative for a better understanding of the relevant biological mechanisms. This study proposed a novel method, ncRNAInter, for LMI prediction. A comprehensive strategy for RNA representation and an optimized graph neural network deep learning algorithm were utilized in this study. ncRNAInter was robust and achieved a Matthews correlation coefficient 26.7% higher than existing reputable methods for human LMI prediction. In addition, ncRNAInter proved universally applicable to LMIs from various species and successfully identified novel LMIs associated with various diseases, which further verified its effectiveness and usability. All source code and datasets are freely available at https://github.com/idrblab/ncRNAInter.
Affiliation(s)
- Hanyu Zhang
- College of Pharmaceutical Sciences, Zhejiang University, Hangzhou 310058, China; Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, Alibaba-Zhejiang University Joint Research Center of Future Digital Healthcare, Hangzhou 330110, China
| | - Yunxia Wang
- College of Pharmaceutical Sciences, Zhejiang University, Hangzhou 310058, China
| | - Ziqi Pan
- College of Pharmaceutical Sciences, Zhejiang University, Hangzhou 310058, China
| | - Xiuna Sun
- College of Pharmaceutical Sciences, Zhejiang University, Hangzhou 310058, China
| | - Minjie Mou
- College of Pharmaceutical Sciences, Zhejiang University, Hangzhou 310058, China
| | - Bing Zhang
- Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, Alibaba-Zhejiang University Joint Research Center of Future Digital Healthcare, Hangzhou 330110, China
| | - Zhaorong Li
- Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, Alibaba-Zhejiang University Joint Research Center of Future Digital Healthcare, Hangzhou 330110, China
| | - Honglin Li
- School of Computer Science and Technology, East China Normal University, Shanghai 200062, China; Shanghai Key Laboratory of New Drug Design, East China University of Science and Technology, Shanghai 200237, China
| | - Feng Zhu
- College of Pharmaceutical Sciences, Zhejiang University, Hangzhou 310058, China; Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, Alibaba-Zhejiang University Joint Research Center of Future Digital Healthcare, Hangzhou 330110, China
| |
|
35
|
Han S, Yang X, Sun H, Yang H, Zhang Q, Peng C, Fang W, Li Y. LION: an integrated R package for effective prediction of ncRNA-protein interaction. Brief Bioinform 2022; 23:6713512. [PMID: 36155620 DOI: 10.1093/bib/bbac420] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 06/07/2022] [Revised: 08/03/2022] [Accepted: 08/30/2022] [Indexed: 12/14/2022] Open
Abstract
Understanding ncRNA-protein interaction is of critical importance to unveil ncRNAs' functions. Here, we propose an integrated package LION which comprises a new method for predicting ncRNA/lncRNA-protein interaction as well as a comprehensive strategy to meet the requirement of customisable prediction. Experimental results demonstrate that our method outperforms its competitors on multiple benchmark datasets. LION can also improve the performance of some widely used tools and build adaptable models for species- and tissue-specific prediction. We expect that LION will be a powerful and efficient tool for the prediction and analysis of ncRNA/lncRNA-protein interaction. The R Package LION is available on GitHub at https://github.com/HAN-Siyu/LION/.
Affiliation(s)
- Siyu Han
- College of Computer Science and Technology, Key Laboratory of Symbol Computation and Knowledge Engineering of Ministry of Education, in Jilin University, China
| | - Xiao Yang
- College of Computer Science and Technology, Jilin University, Changchun, China
| | - Hang Sun
- College of Computer Science and Technology, Jilin University, Changchun, China
| | - Hu Yang
- 964 Hospital of Joint Logistic Support Force of the Chinese People's Liberation Army
| | - Qi Zhang
- College of Computer Science and Technology, Jilin University, Changchun, China
| | - Cheng Peng
- School of Software, Tsinghua University, Beijing, China
| | - Wensi Fang
- College of Computer Science and Technology, Jilin University, Changchun, China
| | - Ying Li
- College of Computer Science and Technology, Key Laboratory of Symbol Computation and Knowledge Engineering of Ministry of Education, Jilin University, Changchun, China
| |
|
36
|
Shetty A, Venkatesh T, Kabbekodu SP, Tsutsumi R, Suresh PS. LncRNA-miRNA-mRNA regulatory axes in endometrial cancer: a comprehensive overview. Arch Gynecol Obstet 2022; 306:1431-1447. [PMID: 35182183 DOI: 10.1007/s00404-022-06423-5] [Citation(s) in RCA: 16] [Impact Index Per Article: 5.3] [Received: 10/04/2021] [Accepted: 01/24/2022] [Indexed: 12/12/2022]
Abstract
INTRODUCTION: Recent research on tumorigenesis and progression has opened up an array of novel molecular mechanisms in the form of interactions between cellular non-coding RNAs (long non-coding RNA [lncRNA]/microRNA [miRNA]) and coding transcripts that regulate health and disease. Endometrial cancer (EC) is a prominent gynecological malignancy with a high incidence rate and poorly known etiology and prognostic factors that hinder the success of disease management. The emerging role of lncRNA-miRNA-mRNA interactions and their dysregulation in the pathophysiology of EC has been elucidated in many recent studies. METHODS: A thorough literature review was conducted to explore information about lncRNA-miRNA-mRNA axes in EC. RESULTS: Several lncRNAs act as molecular sponges that sequester various tumor suppressor miRNAs to inhibit their function, leading to the dysregulation of their target mRNA transcripts that contribute to the EC regulation. CONCLUSIONS: This review summarizes these networks of molecular mechanisms and their contribution to different aspects of endometrial carcinogenesis, leading to a better conceptualization of the molecular pathways that underlie the disease and helping establish novel diagnostic biomarkers and therapeutic intervention points to aid the curative intent of EC.
Affiliation(s)
- Abhishek Shetty
- Department of Biosciences, Mangalore University, Mangalagangothri, Mangalore, 574 199, Karnataka, India
| | - Thejaswini Venkatesh
- Department of Biochemistry and Molecular Biology, Central University of Kerala, Kasargod, 671316, Kerala, India
| | - Shama Prasada Kabbekodu
- Department of Cell and Molecular Biology, School of Life Sciences, Manipal Academy of Higher Education, Manipal, Karnataka, India
| | - Rie Tsutsumi
- Department of Nutrition and Metabolism, Institute of Biomedical Sciences, Tokushima University Graduate School, 3-18-15, Kuramoto-cho, Tokushima City, 770-8503, Japan
| | - Padmanaban S Suresh
- School of Biotechnology, National Institute of Technology, Calicut, 673601, Kerala, India.
| |
|
37
|
Zhuo L, Chen Y, Song B, Liu Y, Su Y. A model for predicting ncRNA-protein interactions based on graph neural networks and community detection. Methods 2022; 207:74-80. [PMID: 36108992 DOI: 10.1016/j.ymeth.2022.09.001] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 06/24/2022] [Revised: 08/07/2022] [Accepted: 09/03/2022] [Indexed: 10/31/2022] Open
Abstract
Non-coding RNAs (ncRNAs) play a considerable role in current biological science, for example in gene transcription and gene expression. Exploring ncRNA-protein interactions (NPIs) is of great significance, but some experimental techniques are very expensive in terms of time and labor. This has prompted the development of computational algorithms based on traditional statistics and artificial intelligence. However, these algorithms usually require the sequence or structural feature vector of the molecule. Although graph neural networks (GNNs) have been widely used in recent academic and industrial research, their potential remains unexplored in the field of detecting NPIs. Hence, we present a novel GNN-based model to detect NPIs in this paper, where the NPI detection problem is transformed into a graph link prediction problem. Specifically, the proposed method utilizes two groups of labels to distinguish two different types of nodes, ncRNA and protein, which alleviates the problem of over-coupling in the graph network. Subsequently, ncRNA and protein embeddings are initially optimized based on the cluster ownership relationship of nodes in the graph. Moreover, the model applies a self-attention mechanism to preserve the graph topology and reduce information loss during pooling. The experimental results indicate that the proposed model has superior performance.
Affiliation(s)
- Linlin Zhuo
- School of Data Science and Artificial Intelligence, Wenzhou University of Technology, Wenzhou, Zhejiang 325035, China; College of Computer Science and Electronic Engineering, Hunan University, Changsha 410082, China
| | - Yifan Chen
- College of Computer Science and Electronic Engineering, Hunan University, Changsha 410082, China.
| | - Bosheng Song
- College of Computer Science and Electronic Engineering, Hunan University, Changsha 410082, China
| | - Yuansheng Liu
- College of Computer Science and Electronic Engineering, Hunan University, Changsha 410082, China
| | - Yansen Su
- Key Lab of Intelligent Computing and Signal Processing of Ministry of Education, School of Artificial Intelligence, Anhui University, Hefei 230601, China.
| |
|
38
|
Zhuo L, Song B, Liu Y, Li Z, Fu X. Predicting ncRNA-protein interactions based on dual graph convolutional network and pairwise learning. Brief Bioinform 2022; 23:6691912. [PMID: 36063562 DOI: 10.1093/bib/bbac339] [Citation(s) in RCA: 6] [Impact Index Per Article: 2.0] [Received: 05/08/2022] [Revised: 07/05/2022] [Accepted: 07/25/2022] [Indexed: 11/14/2022] Open
Abstract
Noncoding RNAs (ncRNAs) have recently attracted considerable attention due to their key roles in biology. The ncRNA-protein interaction (NPI) is often explored to reveal biological activities that ncRNAs may affect, such as biological traits and diseases. Traditional experimental methods can accomplish this work but are often labor-intensive and expensive. Machine learning and deep learning methods have achieved great success by exploiting sufficient sequence or structure information. Graph Neural Network (GNN)-based methods consider the topology in ncRNA-protein graphs and perform well on tasks like NPI prediction. Based on GNNs, some pairwise constraint methods have been developed for homogeneous networks, but they have not been used for NPI prediction on heterogeneous networks. In this paper, we construct a pairwise constrained NPI predictor based on a dual Graph Convolutional Network (GCN), called NPI-DGCN. To our knowledge, our method is the first to train a heterogeneous graph-based model using a pairwise learning strategy. Instead of binary classification, we use a rank layer to calculate the score of an ncRNA-protein pair. Moreover, our model is the first to predict NPIs on the ncRNA-protein bipartite graph rather than the homogeneous graph. We transform the original ncRNA-protein bipartite graph into two homogeneous graphs on which to explore second-order implicit relationships. At the same time, we model direct interactions between the two homogeneous graphs to explore explicit relationships. Experimental results on four standard datasets indicate that our method achieves performance competitive with other state-of-the-art methods. The model is available at https://github.com/zhuoninnin1992/NPIPredict.
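The transformation of a bipartite ncRNA-protein graph into two homogeneous graphs can be illustrated with a small sketch. The abstract does not state how the transformation is performed; the version below assumes a shared-neighbour projection, in which two ncRNAs are linked when they interact with the same protein (a second-order relationship) and likewise for proteins. The function name `project_bipartite` and the example identifiers are hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def project_bipartite(pairs):
    """Project (ncRNA, protein) interaction pairs onto two homogeneous graphs.

    Assumption: nodes of the same type are connected when they share at
    least one neighbour of the other type. Returns two sets of undirected
    edges, each edge stored as a sorted tuple.
    """
    rna_to_prot = defaultdict(set)
    prot_to_rna = defaultdict(set)
    for rna, prot in pairs:
        rna_to_prot[rna].add(prot)
        prot_to_rna[prot].add(rna)

    def shared_neighbour_edges(neighbourhoods):
        edges = set()
        for members in neighbourhoods.values():
            # Every pair of nodes attached to the same neighbour gets an edge.
            for a, b in combinations(sorted(members), 2):
                edges.add((a, b))
        return edges

    rna_graph = shared_neighbour_edges(prot_to_rna)
    protein_graph = shared_neighbour_edges(rna_to_prot)
    return rna_graph, protein_graph


# Hypothetical interactions: r1 and r2 both bind p1; r2 also binds p2.
pairs = [("r1", "p1"), ("r2", "p1"), ("r2", "p2")]
rna_graph, protein_graph = project_bipartite(pairs)
```

The original pairs remain available as the explicit (first-order) relationships, while the two projected edge sets carry the implicit (second-order) ones.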
Affiliation(s)
- Linlin Zhuo
- College of Data Science and Artificial Intelligence, Wenzhou University of Technology, 325027, Wenzhou, China
| | - Bosheng Song
- College of Computer Science and Electronic Engineering, Hunan University, 410082, Changsha, China
| | - Yuansheng Liu
- College of Computer Science and Electronic Engineering, Hunan University, 410082, Changsha, China
| | - Zejun Li
- School of Computer and Information Science, Hunan Institute of Technology, 421000, Hengyang, China
| | - Xiangzheng Fu
- College of Computer Science and Electronic Engineering, Hunan University, 410082, Changsha, China
| |
|
39
|
Lin R, Wichadakul D. Interpretable Deep Learning Model Reveals Subsequences of Various Functions for Long Non-Coding RNA Identification. Front Genet 2022; 13:876721. [PMID: 35685437 PMCID: PMC9173695 DOI: 10.3389/fgene.2022.876721] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/15/2022] [Accepted: 04/11/2022] [Indexed: 11/13/2022] Open
Abstract
Long non-coding RNAs (lncRNAs) play crucial roles in many biological processes and are implicated in several diseases. With next-generation sequencing technologies, a substantial number of unannotated transcripts have been discovered. Classifying unannotated transcripts using biological experiments is more time-consuming and expensive than using computational approaches. Several tools are available for identifying long non-coding RNAs. These tools, however, do not explain which features contributed to their prediction results. Here, we present Xlnc1DCNN, a tool for distinguishing lncRNAs from protein-coding transcripts (PCTs) using a one-dimensional convolutional neural network with prediction explanations. Evaluation on the human test set showed that Xlnc1DCNN outperformed other state-of-the-art tools in terms of accuracy and F1-score. The explanation results revealed that lncRNA transcripts were mainly identified as sequences with no conserved regions, short patterns with unknown functions, or only regions of transmembrane helices, while protein-coding transcripts were mostly classified by conserved protein domains or families. The explanation results also revealed likely inconsistent annotations among the public databases, such as lncRNA transcripts that contain protein domains, protein families, or intrinsically disordered regions (IDRs). Xlnc1DCNN is freely available at https://github.com/cucpbioinfo/Xlnc1DCNN.
Affiliation(s)
- Rattaphon Lin
- Department of Computer Engineering, Faculty of Engineering, Chulalongkorn University, Pathumwan, Thailand
| | - Duangdao Wichadakul
- Department of Computer Engineering, Faculty of Engineering, Chulalongkorn University, Pathumwan, Thailand
- Center of Excellence in Systems Biology, Faculty of Medicine, Chulalongkorn University, Pathumwan, Thailand
| |
|
40
|
Ammunét T, Wang N, Khan S, Elo LL. Deep learning tools are top performers in long non-coding RNA prediction. Brief Funct Genomics 2022; 21:230-241. [PMID: 35136929 PMCID: PMC9123429 DOI: 10.1093/bfgp/elab045] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Received: 08/18/2021] [Revised: 11/08/2021] [Accepted: 12/02/2021] [Indexed: 11/23/2022] Open
Abstract
The increasing amount of transcriptomic data has brought to light vast numbers of potential novel RNA transcripts. Accurately distinguishing novel long non-coding RNAs (lncRNAs) from protein-coding messenger RNAs (mRNAs) has challenged bioinformatic tool developers. Most recently, tools implementing deep learning architectures have been developed for this task, with the potential of discovering sequence features and their interactions that have not yet surfaced in current knowledge. We compared the performance of deep learning tools with other predictive tools that are currently used in lncRNA coding potential prediction. A total of 15 tools representing the variety of available methods were investigated. In addition to known annotated transcripts, we also evaluated the use of the tools in actual studies with real-life data. The robustness and scalability of the tools' performance were tested with test sets of varying size and test sets with different proportions of lncRNAs and mRNAs. In addition, the ease of use of each tested tool was scored. Deep learning tools were top performers in most metrics and labelled transcripts similarly to each other in the real-life dataset. However, the proportion of lncRNAs and mRNAs in the test sets affected the performance of all tools. Computational resources were utilized differently between the top-ranking tools, so the nature of the study may affect the decision of choosing one well-performing tool over another. Nonetheless, the results suggest favouring the novel deep learning tools over other tools currently in broad use.
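One arithmetic reason why the proportion of lncRNAs and mRNAs in a test set can change reported performance is sketched below. The sketch assumes a hypothetical classifier whose sensitivity and specificity stay fixed at 0.9 while the class mix changes; the benchmark itself does not state this, and the function names and numbers are illustrative only.

```python
import math


def confusion_from_rates(n_pos, n_neg, sensitivity, specificity):
    """Expected confusion-matrix counts for fixed per-class rates.

    Positives are lncRNAs and negatives are mRNAs (an arbitrary convention).
    """
    tp = sensitivity * n_pos
    fn = n_pos - tp
    tn = specificity * n_neg
    fp = n_neg - tn
    return tp, fp, tn, fn


def metrics(tp, fp, tn, fn):
    """Accuracy, precision and Matthews correlation coefficient (MCC)."""
    total = tp + fp + tn + fn
    accuracy = (tp + tn) / total
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {"accuracy": accuracy, "precision": precision, "mcc": mcc}


# Same hypothetical classifier, two test sets of 1000 transcripts each.
balanced = metrics(*confusion_from_rates(500, 500, 0.9, 0.9))
skewed = metrics(*confusion_from_rates(100, 900, 0.9, 0.9))
```

Under these assumptions accuracy is the same for both test sets, while precision and MCC drop on the skewed set, so comparisons between tools are only meaningful when the class proportions are reported alongside the metrics.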
Affiliation(s)
- Tea Ammunét
- Turku Bioscience Centre, University of Turku and Åbo Akademi University, Turku, Finland
| | - Ning Wang
- Turku Bioscience Centre, University of Turku and Åbo Akademi University, Turku, Finland
| | - Sofia Khan
- Turku Bioscience Centre, University of Turku and Åbo Akademi University, Turku, Finland
| | - Laura L Elo
- Turku Bioscience Centre, University of Turku and Åbo Akademi University, Turku, Finland
- Institute of Biomedicine, University of Turku, Turku, Finland
| |
|
41
|
Xu D, Yuan W, Fan C, Liu B, Lu MZ, Zhang J. Opportunities and Challenges of Predictive Approaches for the Non-coding RNA in Plants. FRONTIERS IN PLANT SCIENCE 2022; 13:890663. [PMID: 35498708 PMCID: PMC9048598 DOI: 10.3389/fpls.2022.890663] [Citation(s) in RCA: 6] [Impact Index Per Article: 2.0] [Received: 03/06/2022] [Accepted: 03/28/2022] [Indexed: 06/01/2023]
Affiliation(s)
- Dong Xu
- State Key Laboratory of Subtropical Silviculture, College of Forestry and Biotechnology, Zhejiang A&F University, Hangzhou, China
- Shenzhen Branch, Guangdong Laboratory of Lingnan Modern Agriculture, Genome Analysis Laboratory of the Ministry of Agriculture and Rural Affairs, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, Shenzhen, China
| | - Wenya Yuan
- State Key Laboratory of Subtropical Silviculture, College of Forestry and Biotechnology, Zhejiang A&F University, Hangzhou, China
| | - Chunjie Fan
- State Key Laboratory of Tree Genetics and Breeding, Research Institute of Tropical Forestry, Chinese Academy of Forestry, Guangzhou, China
| | - Bobin Liu
- Jiangsu Key Laboratory for Bioresources of Saline Soils, Jiangsu Synthetic Innovation Center for Coastal Bio-agriculture, School of Wetlands, Yancheng Teachers University, Yancheng, China
| | - Meng-Zhu Lu
- State Key Laboratory of Subtropical Silviculture, College of Forestry and Biotechnology, Zhejiang A&F University, Hangzhou, China
| | - Jin Zhang
- State Key Laboratory of Subtropical Silviculture, College of Forestry and Biotechnology, Zhejiang A&F University, Hangzhou, China
| |
|
42
|
Feng S, Li H, Qiao J. Hierarchical multi-label classification based on LSTM network and Bayesian decision theory for LncRNA function prediction. Sci Rep 2022; 12:5819. [PMID: 35388048 PMCID: PMC8986818 DOI: 10.1038/s41598-022-09672-1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/03/2021] [Accepted: 03/21/2022] [Indexed: 02/01/2023] Open
Abstract
Growing evidence shows that long noncoding RNAs (lncRNAs) play an important role in cellular biological processes at multiple levels, such as gene imprinting, immune response, and genetic regulation, and are closely related to diseases because of their complex and precise control. However, most functions of lncRNAs remain undiscovered. Current computational methods for exploring lncRNA functions can avoid high-throughput experiments, but they usually focus on the construction of similarity networks and ignore the directed acyclic graph (DAG) formed by gene ontology annotations. In this paper, we view the function annotation work as a hierarchical multilabel classification problem and design a method, HLSTMBD, for classification with DAG-structured labels. With the help of a mathematical model based on Bayesian decision theory, the HLSTMBD algorithm is implemented with a long short-term memory network and a hierarchical constraint method, DAGLabel. Compared with other state-of-the-art algorithms, the results on GOA-lncRNA datasets show that the proposed method can efficiently and accurately complete the label prediction work.
Affiliation(s)
- Shou Feng
- College of Information and Communication Engineering, Harbin Engineering University, Harbin, 150001, China.,Ministry of Industry and Information Technology, Key Laboratory of Advanced Marine Communication and Information Technology, Harbin, 150001, China
| | - Huiying Li
- Harbin Institute of Technology, School of Electronic and Information Engineering, Harbin, 150001, China
| | - Jiaqing Qiao
- Harbin Institute of Technology, School of Electronic and Information Engineering, Harbin, 150001, China.
| |
|
43
|
Guo B, Jiang T, Wu F, Ni H, Ye J, Wu X, Ni C, Jiang M, Ye L, Li Z, Zheng X, Li S, Yang Q, Wang Z, Huang X, Zhao C. LncRNA RP5-998N21.4 promotes immune defense through upregulation of IFIT2 and IFIT3 in schizophrenia. SCHIZOPHRENIA (HEIDELBERG, GERMANY) 2022; 8:11. [PMID: 35232977 PMCID: PMC8888552 DOI: 10.1038/s41537-021-00195-8] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 07/26/2021] [Accepted: 11/26/2021] [Indexed: 12/31/2022]
Abstract
Schizophrenia is a complex polygenic disease that is affected by genetic, developmental, and environmental factors. Accumulating evidence indicates that environmental factors such as maternal infection and excessive prenatal neuroinflammation may contribute to the onset of schizophrenia by affecting epigenetic modification. We recently identified a schizophrenia-associated upregulated long noncoding RNA (lncRNA) RP5-998N21.4 by transcriptomic analysis of monozygotic twins discordant for schizophrenia. Importantly, we found that genes coexpressed with RP5-998N21.4 were enriched in immune defense-related biological processes in twin subjects and in RP5-998N21.4-overexpressing (OE) SK-N-SH cell lines. We then identified two genes encoding interferon-induced proteins with tetratricopeptide repeats (IFIT) 2 and 3, which play an important role in immune defense, as potential targets of RP5-998N21.4 by integrative analysis of RP5-998N21.4OE-induced differentially expressed genes (DEGs) in SK-N-SH cells and RP5-998N21.4-coexpressed schizophrenia-associated DEGs from twin subjects. We further demonstrated that RP5-998N21.4 positively regulates the transcription of IFIT2 and IFIT3 by binding to their promoter regions and affecting their histone modifications. In addition, as a general nuclear coactivator, RBM14 (encoding RNA binding motif protein 14) was identified to facilitate the regulatory role of RP5-998N21.4 in IFIT2 and IFIT3 transcription. Finally, we observed that RP5-998N21.4OE can enhance IFIT2- and IFIT3-mediated immune defense responses through activation of the signal transducer and activator of transcription 1 (STAT1) signaling pathway in U251 astrocytoma cells under treatment with the viral mimetic polyinosinic:polycytidylic acid (poly I:C). Taken together, our findings suggest that lncRNA RP5-998N21.4 is a critical regulator of immune defense, providing etiological and therapeutic implications for schizophrenia.
Affiliation(s)
- Bo Guo
- Department of Medical Genetics, School of Basic Medical Sciences, and Guangdong Technology and Engineering Research Center for Molecular Diagnostics of Human Genetic Diseases, Southern Medical University, Guangzhou, Guangdong, China.,Key Laboratory of Mental Health of the Ministry of Education, Guangdong-Hong Kong-Macao Greater Bay Area Center for Brain Science and Brain-Inspired Intelligence, and Guangdong Province Key Laboratory of Psychiatric Disorders, Southern Medical University, Guangzhou, Guangdong, China
| | - Tingyun Jiang
- The Third People's Hospital of Zhongshan, Zhongshan, Guangdong, China
| | - Fengchun Wu
- Department of Psychiatry, the Affiliated Brain Hospital of Guangzhou Medical University (Guangzhou Huiai Hospital), Guangzhou, Guangdong, China
| | - Hongyu Ni
- Department of Medical Genetics, School of Basic Medical Sciences, and Guangdong Technology and Engineering Research Center for Molecular Diagnostics of Human Genetic Diseases, Southern Medical University, Guangzhou, Guangdong, China
| | - Junping Ye
- Department of Medical Genetics, School of Basic Medical Sciences, and Guangdong Technology and Engineering Research Center for Molecular Diagnostics of Human Genetic Diseases, Southern Medical University, Guangzhou, Guangdong, China
| | - Xiaohui Wu
- Department of Medical Genetics, School of Basic Medical Sciences, and Guangdong Technology and Engineering Research Center for Molecular Diagnostics of Human Genetic Diseases, Southern Medical University, Guangzhou, Guangdong, China
| | - Chaoying Ni
- Department of Medical Genetics, School of Basic Medical Sciences, and Guangdong Technology and Engineering Research Center for Molecular Diagnostics of Human Genetic Diseases, Southern Medical University, Guangzhou, Guangdong, China
| | - Meijun Jiang
- Guangdong Mental Health Center, Guangdong Provincial People's Hospital, Guangdong Academy of Medical Sciences, Guangzhou, China
| | - Linyan Ye
- Department of Medical Genetics, School of Basic Medical Sciences, and Guangdong Technology and Engineering Research Center for Molecular Diagnostics of Human Genetic Diseases, Southern Medical University, Guangzhou, Guangdong, China
| | - Zhongwei Li
- Department of Medical Genetics, School of Basic Medical Sciences, and Guangdong Technology and Engineering Research Center for Molecular Diagnostics of Human Genetic Diseases, Southern Medical University, Guangzhou, Guangdong, China
| | - Xianzhen Zheng
- Guangdong Mental Health Center, Guangdong Provincial People's Hospital, Guangdong Academy of Medical Sciences, Guangzhou, China
| | - Shufen Li
- Department of Medical Genetics, School of Basic Medical Sciences, and Guangdong Technology and Engineering Research Center for Molecular Diagnostics of Human Genetic Diseases, Southern Medical University, Guangzhou, Guangdong, China
| | - Qiong Yang
- Department of Psychiatry, the Affiliated Brain Hospital of Guangzhou Medical University (Guangzhou Huiai Hospital), Guangzhou, Guangdong, China
| | - Zhongju Wang
- Department of Medical Genetics, School of Basic Medical Sciences, and Guangdong Technology and Engineering Research Center for Molecular Diagnostics of Human Genetic Diseases, Southern Medical University, Guangzhou, Guangdong, China
| | - Xingbing Huang
- Department of Psychiatry, the Affiliated Brain Hospital of Guangzhou Medical University (Guangzhou Huiai Hospital), Guangzhou, Guangdong, China.
| | - Cunyou Zhao
- Department of Medical Genetics, School of Basic Medical Sciences, and Guangdong Technology and Engineering Research Center for Molecular Diagnostics of Human Genetic Diseases, Southern Medical University, Guangzhou, Guangdong, China.,Key Laboratory of Mental Health of the Ministry of Education, Guangdong-Hong Kong-Macao Greater Bay Area Center for Brain Science and Brain-Inspired Intelligence, and Guangdong Province Key Laboratory of Psychiatric Disorders, Southern Medical University, Guangzhou, Guangdong, China.,Experimental Education/Administration Center, School of Basic Medical Science, Southern Medical University, Guangzhou, China.,Department of Rehabilitation, Zhujiang Hospital of Southern Medical University, Guangzhou, China.
| |
|
44
|
Chen XG, Liu S, Zhang W. Predicting Coding Potential of RNA Sequences by Solving Local Data Imbalance. IEEE/ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS 2022; 19:1075-1083. [PMID: 32886613 DOI: 10.1109/tcbb.2020.3021800] [Citation(s) in RCA: 11] [Impact Index Per Article: 3.7] [Indexed: 06/11/2023]
Abstract
Non-coding RNAs (ncRNAs) play an important role in various biological processes and are associated with diseases. Distinguishing between coding RNAs and ncRNAs, also known as predicting the coding potential of RNA sequences, is critical for downstream biological function analysis. Many machine learning-based methods have been proposed for predicting the coding potential of RNA sequences. Recent studies reveal that most existing methods perform poorly on RNA sequences with short open reading frames (sORFs, ORF length < 303 nt). In this work, we analyze the distribution of ORF lengths of RNA sequences and observe that the number of coding RNAs with sORFs is inadequate and that coding RNAs with sORFs are far fewer than ncRNAs with sORFs. Thus, there is a problem of local data imbalance in RNA sequences with sORFs. We propose a coding potential prediction method, CPE-SLDI, which uses data oversampling techniques to augment samples of coding RNAs with sORFs so as to alleviate local data imbalance. Compared with existing methods, CPE-SLDI achieves better performance, and studies reveal that data augmentation by various data oversampling techniques can enhance the performance of coding potential prediction, especially for RNA sequences with sORFs. The implementation of the proposed method is available at https://github.com/chenxgscuec/CPESLDI.
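The local imbalance described in this abstract can be illustrated with a minimal sketch. It is not the CPE-SLDI implementation: the function names, the forward-strand ATG-to-stop ORF definition, and the use of plain random duplication (instead of the oversampling techniques the authors evaluate) are all assumptions made for illustration. Only the 303 nt sORF threshold comes from the abstract.

```python
import random

STOP_CODONS = {"TAA", "TAG", "TGA"}
SORF_MAX_NT = 303  # threshold quoted in the abstract: ORF length < 303 nt


def longest_orf_length(seq):
    """Length (nt) of the longest ATG...stop ORF in the three forward frames.

    Assumption: an ORF runs from the first ATG in a frame to the next
    in-frame stop codon, stop codon included.
    """
    seq = seq.upper().replace("U", "T")
    best = 0
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i
            elif start is not None and codon in STOP_CODONS:
                best = max(best, i + 3 - start)
                start = None
    return best


def oversample_sorf_coding(records, seed=0):
    """Balance coding vs. noncoding records within the sORF subset only.

    records: list of (sequence, label) with label 1 = coding, 0 = noncoding.
    Randomly duplicates coding sORF records until their count matches the
    noncoding sORF count; records outside the sORF subset are untouched.
    """
    rng = random.Random(seed)
    sorf = [r for r in records if longest_orf_length(r[0]) < SORF_MAX_NT]
    coding = [r for r in sorf if r[1] == 1]
    noncoding = [r for r in sorf if r[1] == 0]
    deficit = len(noncoding) - len(coding)
    if not coding or deficit <= 0:
        return list(records)
    return list(records) + [rng.choice(coding) for _ in range(deficit)]
```

Restricting the resampling to the sORF subset is the point of the sketch: the global class ratio may look acceptable while the short-ORF region of the data is still dominated by noncoding sequences.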
|
45
|
Abstract
Most of the transcribed human genome codes for noncoding RNAs (ncRNAs), and long noncoding RNAs (lncRNAs) make up the lion's share of the human ncRNA space. Despite growing interest in lncRNAs, because there are so many of them, and because of their tissue specialization and, often, lower abundance, their catalog remains incomplete and there are multiple ongoing efforts to improve it. Consequently, the number of human lncRNA genes may be lower than 10,000 or higher than 200,000. A key open challenge for lncRNA research, now that so many lncRNA species have been identified, is the characterization of lncRNA function and the interpretation of the roles of genetic and epigenetic alterations at their loci. After all, the most important human genes to catalog and study are those that contribute to important cellular functions: those that affect development or cell differentiation and whose dysregulation may play a role in the genesis and progression of human diseases. Multiple efforts have used screens based on RNA-mediated interference (RNAi), antisense oligonucleotides (ASOs), and CRISPR to identify the consequences of lncRNA dysregulation and predict lncRNA function in select contexts, but these approaches have unresolved scalability and accuracy challenges. Instead, as was the case for better-studied ncRNAs in the past, researchers often focus on characterizing lncRNA interactions and investigating their effects on genes and pathways with known functions. Here, we focus most of our review on computational methods to identify lncRNA interactions and to predict the effects of their alterations and dysregulation on human disease pathways.
|
46
|
PLncWX: A Machine-Learning Algorithm for Plant lncRNA Identification Based on WOA-XGBoost. J CHEM-NY 2021. [DOI: 10.1155/2021/6256021] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/17/2022] Open
Abstract
Long noncoding RNAs (lncRNAs) are a class of RNAs longer than 200 nt that cannot encode proteins. Studies have shown that lncRNAs can regulate gene expression at the epigenetic, transcriptional, and posttranscriptional levels; they are not only closely related to the occurrence, development, and prevention of human diseases, but can also regulate plant flowering and participate in plant abiotic stress responses such as drought and salt. Therefore, accurately and efficiently identifying lncRNAs remains an essential task for related research. Many identification tools based on machine-learning and deep learning algorithms exist, but most use human and mouse gene sequences as training sets, seldom plants, and apply only one feature selection method, or one class of methods, after feature extraction. We developed an identification model covering dicots, monocots, algae, mosses, and ferns. After comparing 20 feature selection methods (seven filter and thirteen wrapper methods), each combined with seven classifiers, while considering both the correlation between features and model redundancy, we found that the WOA-XGBoost-based model performed better, with accuracy, AUC, and F1 score of 91.55%, 96.78%, and 91.68%, respectively. Meanwhile, the number of elements in the feature subset was reduced to 23, which effectively improved the prediction accuracy and modeling efficiency.
|
47
|
Zhang Y, Long Y, Kwoh CK. Class similarity network for coding and long non-coding RNA classification. BMC Bioinformatics 2021; 22:609. [PMID: 34930120 PMCID: PMC8691036 DOI: 10.1186/s12859-021-04517-6] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 04/19/2021] [Accepted: 12/06/2021] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND Long non-coding RNAs (lncRNAs) play significant roles in a variety of physiological and pathological processes. The premise of lncRNA functional studies is that the lncRNAs are identified correctly. Recently, deep learning methods such as the convolutional neural network (CNN) have been successfully applied to identify lncRNAs. However, the traditional CNN considers few relationships among samples, and only in an indirect way. RESULTS Inspired by the Siamese Neural Network (SNN), here we propose a novel network named Class Similarity Network for coding RNA and lncRNA classification. Class Similarity Network considers more relationships among input samples in a direct way. It focuses on exploring the potential relationships between input samples and samples from both the same class and different classes. To achieve this, Class Similarity Network trains parameters specific to each class to obtain high-level features and represents the general similarity to each class in a node. The comparison results on the validation dataset under the same conditions illustrate the superiority of our Class Similarity Network over the baseline CNN. Moreover, our method performs effectively and achieves state-of-the-art performance on two test datasets. CONCLUSIONS We construct Class Similarity Network for coding RNA and lncRNA classification, which is shown to work effectively on two different datasets by achieving accuracy, precision, and F1-score of 98.43%, 0.9247, 0.9374, and 97.54%, 0.9990, 0.9860, respectively.
Affiliation(s)
- Yu Zhang
- School of Computer Science and Engineering, Nanyang Technological University, 50 Nanyang Avenue, Singapore, 639798, Singapore.,Wellcome Trust - Medical Research Council Cambridge Stem Cell Institute, Cambridge, CB2 0AW, UK
| | - Yahui Long
- College of Computer Science and Electronic Engineering, Hunan University, Changsha, 410000, China
| | - Chee Keong Kwoh
- School of Computer Science and Engineering, Nanyang Technological University, 50 Nanyang Avenue, Singapore, 639798, Singapore.
| |
|
48
|
Klapproth C, Sen R, Stadler PF, Findeiß S, Fallmann J. Common Features in lncRNA Annotation and Classification: A Survey. Noncoding RNA 2021; 7:77. [PMID: 34940758 PMCID: PMC8708962 DOI: 10.3390/ncrna7040077] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.5] [Received: 11/12/2021] [Revised: 12/03/2021] [Accepted: 12/06/2021] [Indexed: 12/29/2022] Open
Abstract
Long non-coding RNAs (lncRNAs) are widely recognized as important regulators of gene expression. Their molecular functions range from miRNA sponging to chromatin-associated mechanisms, leading to effects in disease progression and establishing them as diagnostic and therapeutic targets. Still, only a few representatives of this diverse class of RNAs are well studied, while the vast majority is poorly described beyond the existence of their transcripts. In this review we survey common in silico approaches for lncRNA annotation. We focus on the well-established sets of features used for classification and discuss their specific advantages and weaknesses. While the available tools perform very well for the task of distinguishing coding sequence from other RNAs, we find that current methods are not well suited to distinguish lncRNAs or parts thereof from other non-protein-coding input sequences. We conclude that the distinction of lncRNAs from intronic sequences and untranslated regions of coding mRNAs remains a pressing research gap.
Affiliation(s)
- Christopher Klapproth
- Bioinformatics Group, Department of Computer Science, and Interdisciplinary Center for Bioinformatics, University of Leipzig, Härtelstraße 16-18, D-04107 Leipzig, Germany
| | - Rituparno Sen
- Helmholtz Institute for RNA-Based Infection Research (HIRI), Helmholtz-Center for Infection Research (HZI), D-97080 Würzburg, Germany
| | - Peter F. Stadler
- Bioinformatics Group, Department of Computer Science, and Interdisciplinary Center for Bioinformatics, University of Leipzig, Härtelstraße 16-18, D-04107 Leipzig, Germany
- German Centre for Integrative Biodiversity Research (iDiv) Halle-Jena-Leipzig, Competence Center for Scalable Data Services and Solutions, and Leipzig Research Center for Civilization Diseases, University Leipzig, D-04103 Leipzig, Germany
- Max Planck Institute for Mathematics in the Sciences, Inselstraße 22, D-04103 Leipzig, Germany
- Institute for Theoretical Chemistry, University of Vienna, Währingerstraße 17, A-1090 Vienna, Austria
- Facultad de Ciencias, Universidad Nacional de Colombia, Bogotá CO-111321, Colombia
- Santa Fe Institute, 1399 Hyde Park Rd., Santa Fe, NM 87501, USA
| | - Sven Findeiß
- Bioinformatics Group, Department of Computer Science, and Interdisciplinary Center for Bioinformatics, University of Leipzig, Härtelstraße 16-18, D-04107 Leipzig, Germany
| | - Jörg Fallmann
- Bioinformatics Group, Department of Computer Science, and Interdisciplinary Center for Bioinformatics, University of Leipzig, Härtelstraße 16-18, D-04107 Leipzig, Germany
| |
|
49
|
LGFC-CNN: Prediction of lncRNA-Protein Interactions by Using Multiple Types of Features through Deep Learning. Genes (Basel) 2021; 12:genes12111689. [PMID: 34828296 PMCID: PMC8621699 DOI: 10.3390/genes12111689] [Citation(s) in RCA: 12] [Impact Index Per Article: 3.0] [Received: 07/25/2021] [Revised: 10/11/2021] [Accepted: 10/22/2021] [Indexed: 12/12/2022] Open
Abstract
Long noncoding RNA (lncRNA) plays a crucial role in many critical biological processes and participates in complex human diseases through interaction with proteins. Considering that identifying lncRNA–protein interactions through experimental methods is expensive and time-consuming, we propose a novel deep learning method, called LGFC-CNN, that combines raw sequence composition features, hand-designed features, and structure features to predict lncRNA–protein interactions. Two sequence preprocessing methods and two CNN modules (GloCNN and LocCNN) are used to extract global and local features from the raw sequences. Meanwhile, we select hand-designed features by comparing the predictive effect of different combinations of lncRNA and protein features. Furthermore, we obtain the structure features and unify their dimensions through the Fourier transform. In the end, the four types of features are integrated to comprehensively predict the lncRNA–protein interactions. Compared with other state-of-the-art methods on three lncRNA–protein interaction datasets, LGFC-CNN achieves the best performance, with an accuracy of 94.14% on RPI21850, 92.94% on RPI7317, and 98.19% on RPI1847. The results show that LGFC-CNN can effectively predict lncRNA–protein interactions by combining raw sequence composition features, hand-designed features, and structure features.
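The abstract's step of unifying feature dimensions through a Fourier transform can be pictured with a small sketch. This is an illustration of the general idea only, not LGFC-CNN's procedure: the dot-bracket input, the paired/unpaired numeric encoding, the function names, and the choice of keeping the first few low-frequency magnitudes are all assumptions.

```python
import cmath


def encode_structure(dot_bracket):
    """Hypothetical encoding: paired positions -> 1.0, unpaired -> 0.0."""
    return [1.0 if ch in "()" else 0.0 for ch in dot_bracket]


def fourier_features(signal, n_coeffs=8):
    """Map a variable-length numeric signal to a fixed-length vector.

    Computes the discrete Fourier transform directly and keeps the
    magnitudes of the first n_coeffs coefficients, normalised by the
    signal length. Signals shorter than n_coeffs are zero-padded in the
    feature vector, so every input yields exactly n_coeffs values.
    """
    n = len(signal)
    feats = []
    for k in range(n_coeffs):
        if k >= n:
            feats.append(0.0)
            continue
        coeff = sum(
            x * cmath.exp(-2j * cmath.pi * k * t / n)
            for t, x in enumerate(signal)
        )
        feats.append(abs(coeff) / n)
    return feats


# Structures of different lengths map to vectors of the same size.
short = fourier_features(encode_structure("((..))"))
long_ = fourier_features(encode_structure("(((....)))..((...))"))
```

Because the output length depends only on `n_coeffs`, sequences of any length can be stacked into one matrix for a downstream classifier; the direct O(n²) transform is used here only to keep the sketch free of external dependencies.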
|
50
|
Tan X, Li Q, Zhang Q, Fan G, Liu Z, Zhou K. Integrative Analysis Reveals Potentially Functional N6-Methylandenosine-Related Long Noncoding RNAs in Colon Adenocarcinoma. Front Genet 2021; 12:739344. [PMID: 34603397 PMCID: PMC8484874 DOI: 10.3389/fgene.2021.739344] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 07/10/2021] [Accepted: 09/07/2021] [Indexed: 01/23/2023] Open
Abstract
N6-methyladenosine (m6A) is one of the most prevalent RNA modifications in mRNA and non-coding RNA. In this study, we identified 10 m6A regulators upregulated at both the mRNA and protein levels, and 2,479 m6A-related lncRNAs. Moreover, the m6A-related long noncoding RNAs (lncRNAs) could clearly stratify the colon adenocarcinoma (COAD) samples into three subtypes. Subtype 2 had nearly 40% of samples with microsatellite instability (MSI), significantly higher than the other two subtypes. In accordance with this finding, the inflammatory response-related pathways were highly activated in this subtype. Subtype 3 had a shorter overall survival and a higher proportion of patients with advanced stage than subtypes 1 and 2 (p-value < 0.05). Pathway analysis suggested that the energy metabolism-related pathways might be aberrantly activated in subtype 3. In addition, we observed that most of the m6A readers and m6A-related lncRNAs were upregulated in subtype 3, suggesting that the m6A readers and the m6A-related lncRNAs might be associated with metabolic reprogramming and unfavorable outcome in COAD. Among the m6A-related lncRNAs in subtype 3, four were predicted to be prognostically relevant. Functional inference suggested that CTD-3184A7.4, RP11-458F8.4, and RP11-108L7.15 were positively correlated with the energy metabolism-related pathways, further suggesting that these lncRNAs might be involved in those pathways. In summary, we conducted a systematic data analysis to identify the key m6A regulators and m6A-related lncRNAs, and evaluated their clinical and functional importance in COAD, which may provide important evidence for further m6A-related research.
Affiliation(s)
- Xinjie Tan
- School of Medicine, Nankai University, Tianjin, China
| | - Qian Li
- Department of Pediatrics, The Second Affiliated Hospital of Zheng Zhou University, Zhengzhou, China
| | - Qinya Zhang
- Department of Anesthesiology, Heidelberg University Hospital, Heidelberg, Germany.,Department of Anesthesiology, Affiliated Hospital of Guizhou Medical University, Guiyang, China
| | - Gang Fan
- The Affiliated Cancer Hospital of Xiangya School of Medicine, Central South University, Hunan Cancer Hospital, Changsha, China.,Department of Urology, Huazhong University of Science and Technology Union Shenzhen Hospital, The 6th Affiliated Hospital of Shenzhen University Health Science Center, Shenzhen, China
| | - Zhuo Liu
- Third Department of General Surgery, The Central Hospital of Xiangtan, Xiangtan, China
| | - Kunyan Zhou
- The Affiliated Cancer Hospital of Xiangya School of Medicine, Central South University, Hunan Cancer Hospital, Changsha, China
| |
|