1
Sultan MF, Karim T, Hossain Shaon MS, Azim SM, Dehzangi I, Akter MS, Ibrahim SM, Ali MM, Ahmed K, Bui FM. DHUpredET: A comparative computational approach for identification of dihydrouridine modification sites in RNA sequence. Anal Biochem 2025; 702:115828. [PMID: 40057221 DOI: 10.1016/j.ab.2025.115828] [Received: 01/09/2025] [Revised: 02/23/2025] [Accepted: 03/04/2025] [Indexed: 03/17/2025]
Abstract
Laboratory-based detection of dihydrouridine (D) sites is laborious and expensive. In this study, we developed machine learning models that use efficient feature encoding methods to identify D sites. We first explored various state-of-the-art feature encoding approaches, applying 30 machine learning techniques to each, and selected the top eight models based on their independent testing and cross-validation outcomes. We then developed DHUpredET, which uses the extra tree classifier to predict D sites. The DHUpredET model demonstrated balanced performance across all evaluation criteria, outperforming state-of-the-art models by 8 % and 14 % in terms of accuracy and sensitivity, respectively, on an independent test set. Further analysis revealed that the model achieved higher accuracy with position-specific two nucleotide (PS2) features, leading us to conclude that PS2 features are the best suited for the DHUpredET model. Our proposed model therefore emerges as the preferred choice for predicting D sites. In addition, we conducted an in-depth analysis of local features and identified a particularly significant attribute, PS2_299, with a feature score of 0.035. This tool holds promise for accelerating the discovery of D modification sites, which may contribute to targeted therapeutics and to the understanding of RNA structure.
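The abstract does not spell out how PS2 features are computed. A minimal sketch follows, assuming the common definition in which each pair of adjacent nucleotides at each position is one-hot encoded over the 16 possible dinucleotides, giving 16 × (n − 1) features for a sequence of length n. The function name `ps2_encode` and the dinucleotide ordering are illustrative assumptions, not the authors' code.

```python
from itertools import product

# All 16 ordered dinucleotides over the RNA alphabet, in a fixed (assumed) order.
DINUCLEOTIDES = ["".join(p) for p in product("ACGU", repeat=2)]
DINUC_INDEX = {d: i for i, d in enumerate(DINUCLEOTIDES)}


def ps2_encode(seq):
    """Return a flat 0/1 list of length 16 * (len(seq) - 1).

    Hypothetical helper: one 16-dimensional one-hot block per adjacent
    nucleotide pair, concatenated in sequence order.
    """
    seq = seq.upper().replace("T", "U")
    features = []
    for i in range(len(seq) - 1):
        block = [0] * 16
        idx = DINUC_INDEX.get(seq[i:i + 2])  # None for ambiguous bases such as N
        if idx is not None:
            block[idx] = 1
        features.extend(block)
    return features


vec = ps2_encode("GAUCU")
print(len(vec), sum(vec))  # 4 adjacent pairs -> 64 features, 4 of them set
```

Vectors of this shape could then be passed to any tree ensemble; which library and hyperparameters the authors used is not stated in the abstract.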
Affiliation(s)
- Md Fahim Sultan
- Department of Computer Science and Engineering, Oakland University, Rochester, MI, 48309, USA.
- Tasmin Karim
- Department of Computer Science and Engineering, Oakland University, Rochester, MI, 48309, USA.
- Sayed Mehedi Azim
- Center for Computational and Integrative Biology, Rutgers University, Camden, NJ, 08102, USA.
- Iman Dehzangi
- Center for Computational and Integrative Biology, Rutgers University, Camden, NJ, 08102, USA; Department of Computer Science, Rutgers University, Camden, NJ, 08102, USA.
- Mst Shapna Akter
- Department of Computer Science and Engineering, Oakland University, Rochester, MI, 48309, USA.
- Sobhy M Ibrahim
- Department of Biochemistry, College of Science, King Saud University, P.O. Box: 2455, Riyadh, 11451, Saudi Arabia.
- Md Mamun Ali
- Division of Biomedical Engineering, University of Saskatchewan, 57 Campus Drive, Saskatoon, SK, S7N 5A9, Canada; Department of Software Engineering, Daffodil International University, Daffodil Smart City, Birulia, Dhaka, 1216, Bangladesh.
- Kawsar Ahmed
- Department of Electrical and Computer Engineering, University of Saskatchewan, 57 Campus Drive, Saskatoon, SK, S7N 5A9, Canada; Health Informatics Research Lab, Department of Computer Science and Engineering, Daffodil International University, Daffodil Smart City, Birulia, Dhaka, 1216, Bangladesh.
- Francis M Bui
- Department of Electrical and Computer Engineering, University of Saskatchewan, 57 Campus Drive, Saskatoon, SK, S7N 5A9, Canada.
2
Huang G, Lyu J, Dai Q, Chen W. EVlncRNA-net: A dual-channel deep learning approach for accurate prediction of experimentally validated lncRNAs. Int J Biol Macromol 2025; 306:141538. [PMID: 40043997 DOI: 10.1016/j.ijbiomac.2025.141538] [Received: 11/13/2024] [Revised: 02/11/2025] [Accepted: 02/25/2025] [Indexed: 03/14/2025]
Abstract
Long non-coding RNAs (lncRNAs) play key roles in numerous biological processes and are associated with various human diseases. High-throughput RNA sequencing has identified tens of thousands of lncRNAs (HTlncRNAs) across species, but only a small fraction have been functionally characterized. While the experimental validation of lncRNAs (EVlncRNAs) using low-throughput methods is increasing, its high cost limits validation to a small subset of HTlncRNAs. Therefore, developing predictive tools to prioritize potentially functional lncRNAs for low-throughput validation is crucial. To address this need, we proposed EVlncRNA-net, a novel deep learning framework based on sequence language processing. This framework incorporates two representation learning modules: EVlncRNA-net (GCN) and EVlncRNA-net (CNN). EVlncRNA-net (GCN) introduces a novel graph construction method and a specialized node encoding technique. This module transforms lncRNA sequences into graphical formats and processes them using graph convolution. EVlncRNA-net (CNN) extracts features from one-hot encoded sequences via convolutional neural networks. Both modules ensure robust feature representation of lncRNA sequences. Tailored for humans, mice, and plants, EVlncRNA-net achieves prediction accuracies of 85.8 %, 83.1 %, and 85.4 %, respectively, outperforming existing methods. The platform is available at https://github.com/rice1ee/EVlncRNA_net/tree/master, serving as a valuable tool for prioritizing lncRNAs for experimental validation.
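The one-hot encoding that feeds the CNN channel can be illustrated with a short sketch. This is not taken from the EVlncRNA-net repository: the function name `one_hot`, the fixed-length truncation and zero-padding, and the treatment of ambiguous bases as all-zero rows are assumptions made for illustration.

```python
ALPHABET = "ACGT"


def one_hot(seq, length):
    """Encode seq as a `length` x 4 matrix of 0/1 values.

    Hypothetical helper: sequences longer than `length` are truncated,
    shorter ones are zero-padded, and bases outside ACGT (for example N)
    become all-zero rows.
    """
    rows = []
    for base in seq.upper().replace("U", "T")[:length]:
        rows.append([1 if base == b else 0 for b in ALPHABET])
    # Build a fresh row for each padding position so rows are never shared.
    rows.extend([0] * 4 for _ in range(length - len(rows)))
    return rows


for row in one_hot("ACGU", 6):
    print(row)
```

A fixed output length is used here because convolutional layers are usually trained on equal-sized inputs; how the authors handle variable-length transcripts is not stated in the abstract.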
Affiliation(s)
- Guohua Huang
- Hunan Provincial Key Laboratory of Finance & Economics Big Data Science and Technology, Hunan University of Finance and Economics, Changsha 410205, China.
- Jianyi Lyu
- College of Computer and Control Engineering, Northeast Forestry University, Harbin 150040, China.
- Qi Dai
- College of Life Science and Medicine, Zhejiang Sci-Tech University, Hangzhou 310018, China.
- Weihong Chen
- Hunan Provincial Key Laboratory of Finance & Economics Big Data Science and Technology, Hunan University of Finance and Economics, Changsha 410205, China.
3
Pronozin AY, Karetnikov DI, Shmakov NA, Bocharnikova ME, Afonnikova SD, Afonnikov DA, Kolchanov NA. CropGene: a software package for the analysis of genomic and transcriptomic data of agricultural plants. Vavilovskii Zhurnal Genet Selektsii 2025; 29:320-329. [PMID: 40264806 PMCID: PMC12011622 DOI: 10.18699/vjgb-25-35] [Received: 11/27/2024] [Revised: 01/15/2025] [Accepted: 01/15/2025] [Indexed: 04/24/2025]
Abstract
Currently, the breeding of agricultural plants is increasingly based on molecular biological data on genetic sequences, which makes it possible to significantly accelerate the breeding process and to create new plant varieties through genomic editing. These data are large and varied, and analyzing them requires substantial labor and computing resources. Data of such volume and complexity can be analyzed effectively only with modern bioinformatics methods, which include algorithms for identifying genes, predicting their function, and evaluating the effect of mutations on plant phenotype. Such analysis has recently become impossible without integrated software systems that solve problems at different levels by executing computational pipelines. The paper describes the CropGene software package developed for the comprehensive analysis of genomic and transcriptomic data of agricultural plants. CropGene includes several blocks of bioinformatic analysis, such as analysis of gene variations, assembly of genomes and transcriptomes, and annotation of genes and proteins. CropGene implements new methods for analyzing long non-coding RNAs and protein domains, for searching and analyzing polymorphisms, and for genome-wide association studies. CropGene has a user-friendly interface and supports various types of data, which greatly simplifies its use for researchers who do not have deep knowledge of bioinformatics. The paper provides examples of the use of CropGene for the analysis of agricultural organisms such as Solanum tuberosum and Zea mays. With CropGene, the following have been identified: genetic markers that explain up to 50 % of the variability in seed color parameters; potential genes that may become promising material for producing potato varieties; and more than 100 thousand new long non-coding RNAs. Orthogroups were also found whose domain structure shows a marked similarity to the domain architecture of characteristic secreted A2 phospholipases. Thus, CropGene is an important tool for scientists and practitioners working in the field of agrobiotechnology and plant genetics.
Affiliation(s)
- A Yu Pronozin
- Institute of Cytology and Genetics of the Siberian Branch of the Russian Academy of Sciences, Novosibirsk, Russia; Kurchatov Genomic Center of ICG SB RAS, Novosibirsk, Russia
- D I Karetnikov
- Institute of Cytology and Genetics of the Siberian Branch of the Russian Academy of Sciences, Novosibirsk, Russia; Kurchatov Genomic Center of ICG SB RAS, Novosibirsk, Russia
- N A Shmakov
- Institute of Cytology and Genetics of the Siberian Branch of the Russian Academy of Sciences, Novosibirsk, Russia; Kurchatov Genomic Center of ICG SB RAS, Novosibirsk, Russia
- M E Bocharnikova
- Institute of Cytology and Genetics of the Siberian Branch of the Russian Academy of Sciences, Novosibirsk, Russia; Kurchatov Genomic Center of ICG SB RAS, Novosibirsk, Russia
- S D Afonnikova
- Institute of Cytology and Genetics of the Siberian Branch of the Russian Academy of Sciences, Novosibirsk, Russia; Kurchatov Genomic Center of ICG SB RAS, Novosibirsk, Russia
- D A Afonnikov
- Institute of Cytology and Genetics of the Siberian Branch of the Russian Academy of Sciences, Novosibirsk, Russia; Kurchatov Genomic Center of ICG SB RAS, Novosibirsk, Russia
- N A Kolchanov
- Institute of Cytology and Genetics of the Siberian Branch of the Russian Academy of Sciences, Novosibirsk, Russia; Kurchatov Genomic Center of ICG SB RAS, Novosibirsk, Russia
4
Bai H, Wang J, Jiang X, Guo Z, Yang W, Yang Z, Li J, Liu C. TetraRNA, a tetra-class machine learning model for deciphering the coding potential derivation of RNA world. Comput Struct Biotechnol J 2025; 27:1305-1317. [PMID: 40230410 PMCID: PMC11994946 DOI: 10.1016/j.csbj.2025.03.039] [Received: 11/25/2024] [Revised: 03/20/2025] [Accepted: 03/24/2025] [Indexed: 04/16/2025]
Abstract
CncRNAs (coding and noncoding RNAs) are a class of bifunctional RNAs that have both coding and noncoding biological activity. An increasing number of cncRNAs are being identified, prompting reassessment of our knowledge of RNA. However, most existing RNA classification tools are based on binary classification models, which are not effective in distinguishing cncRNAs from mRNAs or long noncoding RNAs (lncRNAs). Our statistical analysis demonstrated that mRNA-derived cncRNAs (untranslated mRNAs, untr-mRNAs) and lncRNA-derived cncRNAs (translated ncRNAs, tr-ncRNAs) do not fall in the same cluster. Therefore, in this study, we devised a novel tetra-class RNA classification model that is systematically optimized for RNA feature extraction. According to our model, all human RNAs can be reclassified into one of four categories (mRNA, untr-mRNA, lncRNA, and tr-ncRNA), representing a novel RNA classification system and allowing the discovery of more potential cncRNAs. Further analysis revealed significant differences among the four types of RNAs in tissue-specific expression, functional annotation, sequence composition, and other factors, providing insights into their divergent evolutionary trajectories. Moreover, investigation of the small tr-ncRNA peptides demonstrated that their evolution is coordinated with that of the conserved functional small RNAs associated with them. All analysis results have been integrated into a database, TetraRNADB, accessible online (http://tetrarnadb.liu-lab.com/).
Affiliation(s)
- Hanrui Bai
- College of Life Sciences, Division of Life Sciences and Medicine, University of Science and Technology of China, Hefei 230026, China
- CAS Key Laboratory of Tropical Plant Resources and Sustainable Use, Yunnan Key Laboratory of Crop Wild Relatives Omics, Xishuangbanna Tropical Botanical Garden, Chinese Academy of Sciences, Kunming 650223, China
- Jie Wang
- Department of Chromosome Biology, Max Planck Institute for Plant Breeding Research, Carl-von-Linne-Weg 10, Cologne 50829, Germany
- Xiaoke Jiang
- CAS Key Laboratory of Tropical Plant Resources and Sustainable Use, Yunnan Key Laboratory of Crop Wild Relatives Omics, Xishuangbanna Tropical Botanical Garden, Chinese Academy of Sciences, Kunming 650223, China
- Zhen Guo
- College of Science and Engineering, Saint Louis University, St. Louis, MO 63103, USA
- Wenjing Yang
- CAS Key Laboratory of Tropical Plant Resources and Sustainable Use, Yunnan Key Laboratory of Crop Wild Relatives Omics, Xishuangbanna Tropical Botanical Garden, Chinese Academy of Sciences, Kunming 650223, China
- Zitian Yang
- CAS Key Laboratory of Tropical Plant Resources and Sustainable Use, Yunnan Key Laboratory of Crop Wild Relatives Omics, Xishuangbanna Tropical Botanical Garden, Chinese Academy of Sciences, Kunming 650223, China
- Jing Li
- CAS Key Laboratory of Tropical Plant Resources and Sustainable Use, Yunnan Key Laboratory of Crop Wild Relatives Omics, Xishuangbanna Tropical Botanical Garden, Chinese Academy of Sciences, Kunming 650223, China
- Changning Liu
- College of Life Sciences, Division of Life Sciences and Medicine, University of Science and Technology of China, Hefei 230026, China
- CAS Key Laboratory of Tropical Plant Resources and Sustainable Use, Yunnan Key Laboratory of Crop Wild Relatives Omics, Xishuangbanna Tropical Botanical Garden, Chinese Academy of Sciences, Kunming 650223, China
5
Mei S, Huang J, Zhang Z, Lei H, Huang Q, Qu L, Zheng L. InfoScan: A New Transcript Identification Tool Based on scRNA-Seq and Its Application in Glioblastoma. Int J Mol Sci 2025; 26:2208. [PMID: 40076844 PMCID: PMC11900204 DOI: 10.3390/ijms26052208] [Received: 12/25/2024] [Revised: 02/05/2025] [Accepted: 02/26/2025] [Indexed: 03/14/2025]
Abstract
InfoScan is a novel bioinformatics tool designed for the comprehensive analysis of full-length single-cell RNA sequencing (scRNA-seq) data. It enables the identification of unannotated transcripts and rare cell populations, providing a powerful platform for transcriptome characterization. In this study, InfoScan was applied to glioblastoma multiforme (GBM), identifying a rare "neoplastic-stemness" subpopulation exhibiting cancer stem cell-like features. Functional analyses suggested that tumor-associated macrophages (TAMs) secrete SPP1, which binds to CD44 on neoplastic-stemness cells, activating the PI3K/AKT pathway and driving lncRNA transcription to promote metastasis. Integration of TCGA and CGGA datasets further supported these findings, highlighting key mutations associated with the neoplastic-stemness subpopulation. Drug sensitivity assays indicated that neoplastic-stemness cells might be sensitive to omipalisib, a PI3K inhibitor, pointing to a potential therapeutic target. InfoScan offers a robust framework for exploring complex transcriptomic landscapes and characterizing rare cell populations, providing valuable insights into GBM biology and advancing precision cancer therapy.
Affiliation(s)
- Lingling Zheng
- MOE Key Laboratory of Gene Function and Regulation, State Key Laboratory for Biocontrol, Innovation Center for Evolutionary Synthetic Biology, School of Agriculture and Biotechnology, School of Life Sciences, Sun Yat-sen University, Guangzhou 510275, China; (S.M.); (J.H.); (Z.Z.); (H.L.); (Q.H.); (L.Q.)
6
Wang S, Yu ZG, Han GS. MVSLLnc: LncRNA subcellular localization prediction based on multi-source features and two-stage voting strategy. Methods 2025; 234:324-332. [PMID: 39837434 DOI: 10.1016/j.ymeth.2025.01.013] [Received: 09/15/2024] [Revised: 12/28/2024] [Accepted: 01/16/2025] [Indexed: 01/23/2025]
Abstract
The subcellular localization of long non-coding RNAs (lncRNAs) is crucial for understanding their function. Since traditional biological experimental methods are time-consuming and some existing computational methods rely on high computing power, we set out to find a simple and easy-to-implement method for more efficient prediction of the subcellular localization of lncRNAs. In this work, we proposed a model based on multi-source features and a two-stage voting strategy for predicting the subcellular localization of lncRNAs (MVSLLnc). The multi-source features include k-mer frequency, features based on the coordinate values of Chaos Game Representation (CGR), and features based on physicochemical properties (PhyChe). We feed the multi-source features into the traditional machine learning classifiers RF, SVM and XGBoost, respectively, and perform the final prediction task with a two-stage voting strategy. Experimental results on three benchmark datasets show that the accuracy can reach 0.829, 0.793 and 0.968, respectively. The accuracy on three independent test sets is 0.642, 0.737 and 0.518, respectively, which is competitive with existing methods. Our ablation analyses show that the two-stage voting strategy can make full use of the advantages of multi-source features and multiple classifiers, and obtain more robust results.
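Two of the feature families named above, and one plausible reading of the voting scheme, can be sketched in plain Python. The abstract does not define the two stages, so the version below is an assumption: stage one takes a majority vote across classifiers within each feature source, and stage two takes a majority vote across sources. The CGR corner assignment, the function names, and the example labels are likewise illustrative only.

```python
from collections import Counter
from itertools import product


def kmer_frequencies(seq, k=3):
    """Relative frequency of every k-mer over ACGT, in lexicographic order."""
    seq = seq.upper().replace("U", "T")
    kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    total = sum(counts[m] for m in kmers)  # ignores k-mers with ambiguous bases
    return [counts[m] / total if total else 0.0 for m in kmers]


# Corner assignment is a convention; this one is assumed for illustration.
CGR_CORNERS = {"A": (0.0, 0.0), "C": (0.0, 1.0), "G": (1.0, 1.0), "T": (1.0, 0.0)}


def cgr_coordinates(seq):
    """Chaos Game Representation: move halfway toward each base's corner."""
    x, y = 0.5, 0.5
    points = []
    for base in seq.upper().replace("U", "T"):
        if base not in CGR_CORNERS:
            continue  # skip ambiguous bases
        cx, cy = CGR_CORNERS[base]
        x, y = (x + cx) / 2, (y + cy) / 2
        points.append((x, y))
    return points


def majority(labels):
    """Most frequent label in a non-empty list."""
    return Counter(labels).most_common(1)[0][0]


def two_stage_vote(predictions):
    """predictions[source][classifier] -> label (assumed two-stage scheme)."""
    stage_one = [majority(list(per_clf.values())) for per_clf in predictions.values()]
    return majority(stage_one)


preds = {
    "kmer": {"RF": "nucleus", "SVM": "nucleus", "XGBoost": "cytoplasm"},
    "cgr": {"RF": "cytoplasm", "SVM": "nucleus", "XGBoost": "nucleus"},
    "phyche": {"RF": "cytoplasm", "SVM": "cytoplasm", "XGBoost": "cytoplasm"},
}
print(two_stage_vote(preds))
```

With three sources and three classifiers, a simple vote count cannot tie in a two-class problem at either stage; with more classes a tie-breaking rule would be needed, which this sketch leaves to `Counter` ordering.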
Affiliation(s)
- Sheng Wang
- National Center for Applied Mathematics in Hunan, Xiangtan University, Hunan 411105, China; Key Laboratory of Intelligent Computing and Information Processing of Ministry of Education, Xiangtan University, Hunan 411105, China.
- Zu-Guo Yu
- National Center for Applied Mathematics in Hunan, Xiangtan University, Hunan 411105, China; Key Laboratory of Intelligent Computing and Information Processing of Ministry of Education, Xiangtan University, Hunan 411105, China.
- Guo-Sheng Han
- National Center for Applied Mathematics in Hunan, Xiangtan University, Hunan 411105, China; Key Laboratory of Intelligent Computing and Information Processing of Ministry of Education, Xiangtan University, Hunan 411105, China.
7
Gao Y, Takenaka K, Xu SM, Cheng Y, Janitz M. Recent advances in investigation of circRNA/lncRNA-miRNA-mRNA networks through RNA sequencing data analysis. Brief Funct Genomics 2025; 24:elaf005. [PMID: 40251826 PMCID: PMC12008121 DOI: 10.1093/bfgp/elaf005] [Received: 12/07/2024] [Revised: 03/10/2025] [Accepted: 03/18/2025] [Indexed: 04/21/2025]
Abstract
Non-coding RNAs (ncRNAs) are RNA molecules that are transcribed from DNA but are not translated into proteins. Studies over the past decades have revealed that ncRNAs can be classified into small RNAs, long non-coding RNAs and circular RNAs by genomic size and structure. Accumulating evidence has elucidated the critical roles of these non-coding transcripts in regulating gene expression through transcription and translation, thereby shaping cellular function and disease pathogenesis. Notably, recent studies have investigated the function of ncRNAs as competitive endogenous RNAs (ceRNAs) that sequester miRNAs and modulate mRNA expression. The ceRNA network emerges as a pivotal regulatory mechanism, with significant implications in various diseases such as cancer and neurodegenerative disease. We therefore highlight multiple bioinformatics tools and databases that aim to predict ceRNA interactions. Furthermore, we discuss the limitations of current technologies and potential improvements for ceRNA network detection. Understanding the dynamic interplay within ceRNA networks may advance biological comprehension and provide potential targets for therapeutic intervention.
Affiliation(s)
- Yulan Gao
- School of Biotechnology and Biomolecular Sciences, University of New South Wales, Gate 11 via Botany St, Sydney, NSW 2052, Australia
- Konii Takenaka
- School of Biotechnology and Biomolecular Sciences, University of New South Wales, Gate 11 via Botany St, Sydney, NSW 2052, Australia
- Si-Mei Xu
- School of Biotechnology and Biomolecular Sciences, University of New South Wales, Gate 11 via Botany St, Sydney, NSW 2052, Australia
- Yuning Cheng
- School of Biotechnology and Biomolecular Sciences, University of New South Wales, Gate 11 via Botany St, Sydney, NSW 2052, Australia
- Michael Janitz
- School of Biotechnology and Biomolecular Sciences, University of New South Wales, Gate 11 via Botany St, Sydney, NSW 2052, Australia
8
Xu J, Shen E, Guo F, Wang K, Hu Y, Shen L, Chen H, Li X, Zhu QH, Fan L, Chu Q. Identification of cell-type specificity, trans- and cis-acting functions of plant lincRNAs from single-cell transcriptomes. The New Phytologist 2025; 245:698-710. [PMID: 39550625 DOI: 10.1111/nph.20269] [Received: 09/21/2024] [Accepted: 10/21/2024] [Indexed: 11/18/2024]
Abstract
Long noncoding RNAs, including intergenic lncRNAs (lincRNAs), play a key role in various biological processes throughout the plant life cycle, and the advent of single-cell RNA sequencing (scRNA-seq) technology has opened up a valuable avenue for scrutinizing the intricate roles of lincRNAs in cellular processes. Here, we identified a new batch of lincRNAs using scRNA-seq data from diverse tissues of plants (rice, Arabidopsis, tomato, and maize). Based on well-annotated single-cell transcriptome atlases, plant lincRNAs were found to possess the same level of cell-type specificity as mRNAs and to be involved in the differentiation of certain cell types based on pseudo-time analysis. Many lincRNAs were predicted to play a hub role in the cell-type-specific co-expression networks of lincRNAs and mRNAs, suggesting their trans-acting abilities. Besides, plant lincRNAs were revealed to have potential cis-acting properties based on their genomic distances and expression correlations with the neighboring mRNAs. Furthermore, an online platform, PscLncRNA (http://ibi.zju.edu.cn/psclncrna/), was constructed for searching and visualizing all identified plant lincRNAs with annotated potential functions. Our work provides new insights into plant lincRNAs at single-cell resolution and an important resource for understanding and further investigation of plant lincRNAs.
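The cis-acting screen described above, which combines genomic distance with expression correlation against neighboring mRNAs, can be sketched as follows. The distance cutoff (100 kb), the correlation cutoff (|r| ≥ 0.8), the data layout, and the function names are hypothetical; the abstract does not give the thresholds or the correlation measure the authors used.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length numeric lists (0.0 if constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / sqrt(vx * vy)


def cis_candidates(lincrnas, mrnas, expression, max_distance=100_000, min_abs_r=0.8):
    """Pair each lincRNA with nearby, co-expressed mRNAs.

    lincrnas / mrnas: {name: (chrom, start, end)}
    expression: {name: [mean expression per cell type]}
    Both thresholds are illustrative assumptions.
    """
    pairs = []
    for l_name, (l_chr, l_start, l_end) in lincrnas.items():
        for m_name, (m_chr, m_start, m_end) in mrnas.items():
            if l_chr != m_chr:
                continue
            # Gap between the two intervals; 0 when they overlap.
            distance = max(0, max(l_start, m_start) - min(l_end, m_end))
            if distance > max_distance:
                continue
            r = pearson(expression[l_name], expression[m_name])
            if abs(r) >= min_abs_r:
                pairs.append((l_name, m_name, distance, r))
    return pairs


lincrnas = {"linc1": ("chr1", 1000, 2000)}
mrnas = {
    "geneA": ("chr1", 5000, 8000),
    "geneB": ("chr2", 1500, 2500),
    "geneC": ("chr1", 900000, 905000),
}
expression = {
    "linc1": [1, 2, 3, 4],
    "geneA": [2, 4, 6, 8],
    "geneB": [1, 2, 3, 4],
    "geneC": [1, 2, 3, 4],
}
print(cis_candidates(lincrnas, mrnas, expression))
```

In this toy input only `geneA` passes both filters: `geneB` is on another chromosome and `geneC` is too far away, even though both are perfectly correlated with `linc1`.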
Affiliation(s)
- Jiwei Xu
- Hainan Institute, Zhejiang University, Sanya, 572025, China
- Institute of Crop Science, Zhejiang University, Hangzhou, 310058, China
- Enhui Shen
- Institute of Crop Science, Zhejiang University, Hangzhou, 310058, China
- Fu Guo
- Hainan Institute, Zhejiang University, Sanya, 572025, China
- Kaiqiang Wang
- Hainan Institute, Zhejiang University, Sanya, 572025, China
- Yurong Hu
- Institute of Crop Science, Zhejiang University, Hangzhou, 310058, China
- Leti Shen
- Hainan Institute, Zhejiang University, Sanya, 572025, China
- Institute of Crop Science, Zhejiang University, Hangzhou, 310058, China
- Hongyu Chen
- Institute of Crop Science, Zhejiang University, Hangzhou, 310058, China
- Xiaohan Li
- Institute of Crop Science, Zhejiang University, Hangzhou, 310058, China
- Qian-Hao Zhu
- CSIRO Agriculture and Food, GPO Box 1700, Canberra, ACT, 2601, Australia
- Longjiang Fan
- Hainan Institute, Zhejiang University, Sanya, 572025, China
- Institute of Crop Science, Zhejiang University, Hangzhou, 310058, China
- Qinjie Chu
- Institute of Crop Science, Zhejiang University, Hangzhou, 310058, China
9
Thakur A, Kumar M. Computational Resources for lncRNA Functions and Targetome. Methods Mol Biol 2025; 2883:299-323. [PMID: 39702714 DOI: 10.1007/978-1-0716-4290-0_13] [Indexed: 12/21/2024]
Abstract
Long non-coding RNAs (lncRNAs) are non-coding RNA molecules that exceed 200 nucleotides in length and do not encode proteins. Dysregulated expression of lncRNAs has been identified in various diseases and holds therapeutic significance. Over the past decade, numerous computational resources have been published in the field of lncRNA. In this chapter, we provide a comprehensive review of databases and predictive tools, that is, lncRNA databases, machine learning based algorithms, and tools that predict lncRNAs using different techniques. The chapter focuses on the importance of lncRNA resources developed for different organisms, specifically humans, mice, plants, and other model organisms. We list important databases, focusing primarily on comprehensive information related to lncRNA registries, associations with diseases, differential expression, lncRNA transcriptome, target regulations, and all-in-one resources. We also include updated versions of lncRNA resources. Additionally, computational identification of lncRNAs using algorithms such as deep learning, Support Vector Machine (SVM), and Random Forest (RF) is discussed. The overview concludes by summarizing vital in silico resources, empowering biologists to choose the most suitable tools for their lncRNA research. This chapter serves as a guide that emphasizes the significance of computational approaches in understanding lncRNAs and their implications in various biological contexts.
Affiliation(s)
- Anamika Thakur
- Virology Unit and Bioinformatics Centre, Institute of Microbial Technology, Council of Scientific and Industrial Research (CSIR), Sector 39A, Chandigarh, India.
- Academy of Scientific and Innovative Research (AcSIR), Ghaziabad, India.
- Manoj Kumar
- Virology Unit and Bioinformatics Centre, Institute of Microbial Technology, Council of Scientific and Industrial Research (CSIR), Sector 39A, Chandigarh, India.
- Academy of Scientific and Innovative Research (AcSIR), Ghaziabad, India.
10
Swigonska S, Nynca A, Molcan T, Petroff BK, Ciereszko RE. The Role of lncRNAs in the Protective Action of Tamoxifen on the Ovaries of Tumor-Bearing Rats Receiving Cyclophosphamide. Int J Mol Sci 2024; 25:12538. [PMID: 39684249 DOI: 10.3390/ijms252312538] [Received: 10/29/2024] [Revised: 11/14/2024] [Accepted: 11/17/2024] [Indexed: 12/18/2024]
Abstract
Infertility due to ovarian toxicity is a common side effect of cancer treatment in premenopausal women. Tamoxifen (TAM) is a selective estrogen receptor modulator that prevented radiation- and chemotherapy-induced ovarian failure in preclinical studies. In the current study, we examined the potential regulatory role of long noncoding RNAs (lncRNAs) in the mechanism of action of TAM in the ovaries of tumor-bearing rats receiving cyclophosphamide (CPA) as cancer therapy. We identified 166 lncRNAs, among which 49 were demonstrated to be differentially expressed (DELs) in the ovaries of rats receiving TAM and CPA compared to those receiving only CPA. A total of 24 DELs were upregulated and 25 downregulated by tamoxifen. The identified DELs shared the characteristics of noncoding RNAs described in other reproductive tissues. Eleven of the identified DELs displayed divergent modes of action, regulating target transcripts via both cis- and trans-acting pathways. Functional enrichment analysis revealed that, among target genes ascribed to the identified DELs, the majority were involved in apoptosis, cell adhesion, immune response, and ovarian aging. The presented data suggest that the molecular mechanisms behind tamoxifen's protective effects in the ovaries may involve lncRNA-dependent regulation of critical signaling pathways related to inhibition of follicular transition and ovarian aging, along with the suppression of apoptosis and regulation of cell adhesion. Employing a tumor-bearing animal model undergoing chemotherapy, which accurately reflects the conditions of mammary cancer, reinforces the obtained results. Given that tamoxifen remains a key player in the management and prevention of breast cancer, understanding its ovarian-specific actions in cancer patients is crucial and requires detailed functional studies to clarify the underlying molecular mechanisms.
Affiliation(s)
- Sylwia Swigonska
- Department of Biochemistry, University of Warmia and Mazury in Olsztyn, Prawochenskiego 5, 10-720 Olsztyn, Poland
- Anna Nynca
- Department of Animal Anatomy and Physiology, University of Warmia and Mazury in Olsztyn, 10-719 Olsztyn, Poland
- Tomasz Molcan
- Molecular Biology Laboratory, Institute of Animal Reproduction and Food Research, Polish Academy of Sciences, Tuwima 10, 10-748 Olsztyn, Poland
- Brian K Petroff
- Department of Pathobiology and Diagnostic Investigation, Michigan State University, East Lansing, MI 48824-1314, USA
- Renata E Ciereszko
- Department of Animal Anatomy and Physiology, University of Warmia and Mazury in Olsztyn, 10-719 Olsztyn, Poland
11
Kumar H, Qin X, Bhushan B, Dutt T, Panigrahi M. DeepGenomeScan of 15 Worldwide Bovine Populations Detects Spatially Varying Positive Selection Signals. OMICS: A Journal of Integrative Biology 2024; 28:504-513. [PMID: 39315920 DOI: 10.1089/omi.2024.0154] [Indexed: 09/25/2024]
Abstract
Identifying genomic regions under selection is essential for understanding the genetic mechanisms driving species evolution and adaptation. Traditional methods often fall short in detecting complex, spatially varying selection signals. Recent advances in deep learning, however, present promising new approaches for uncovering subtle selection signals that traditional methods might miss. In this study, we utilized the deep learning framework DeepGenomeScan to detect spatially varying selection signatures across 15 bovine populations worldwide. Our analysis uncovered novel insights into selective sweep hotspots within the bovine genome, revealing key genes associated with physiological and adaptive traits that were previously undetected. We identified significant quantitative trait loci linked to milk protein and fat percentages. By comparing the selection signatures identified in this study with those reported in the Bovine Genome Variation Database, we discovered 38 novel genes under selection that were not identified through traditional methods. These genes are primarily associated with milk and meat yield and quality. Our findings enhance our understanding of spatially varying selection's impact on bovine genomic diversity, laying a foundation for future research in genetic improvement and conservation. This is the first deep learning-based study of selection signatures in cattle, offering new insights for evolutionary and livestock genomics research.
Affiliation(s)
- Harshit Kumar
- Division of Animal Genetics, Indian Veterinary Research Institute, Izatnagar, India
- ICAR-National Research Centre on Mithun, Medziphema, India
| | - Xinghu Qin
- School of Ecology and Nature Conservation, Beijing Forestry University, Beijing, China
| | - Bharat Bhushan
- Division of Animal Genetics, Indian Veterinary Research Institute, Izatnagar, India
| | - Triveni Dutt
- Indian Veterinary Research Institute, Izatnagar, India
| | - Manjit Panigrahi
- Division of Animal Genetics, Indian Veterinary Research Institute, Izatnagar, India
| |
|
12
|
Low ETL, Chan KL, Zaki NM, Taranenko E, Ordway JM, Wischmeyer C, Buntjer J, Halim MAA, Sanusi NSNM, Nagappan J, Rosli R, Bondar E, Amiruddin N, Sarpan N, Ting NC, Chan PL, Ong-Abdullah M, Marjuni M, Mustaffa S, Abdullah N, Azizi N, Bacher B, Lakey N, Tatarinova TV, Manaf MAA, Sambanthamurti R, Singh R. Chromosome-scale Elaeis guineensis and E. oleifera assemblies: comparative genomics of oil palm and other Arecaceae. G3 (Bethesda, Md.) 2024; 14:jkae135. [PMID: 38918881 PMCID: PMC11373658 DOI: 10.1093/g3journal/jkae135] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/18/2024] [Revised: 04/18/2023] [Accepted: 05/16/2024] [Indexed: 06/27/2024]
Abstract
Elaeis guineensis and E. oleifera are the two species of oil palm. E. guineensis is the most widely cultivated commercial species, and introgression of desirable traits from E. oleifera is ongoing. We report an improved E. guineensis genome assembly with substantially increased continuity and completeness, as well as the first chromosome-scale E. oleifera genome assembly. Each assembly was obtained by integration of long-read sequencing, proximity ligation sequencing, optical mapping, and genetic mapping. High interspecific genome conservation is observed between the two species. The study provides the most extensive gene annotation to date, including 46,697 E. guineensis and 38,658 E. oleifera gene predictions. Analyses of repetitive element families further resolve the DNA repeat architecture of both genomes. Comparative genomic analyses identified experimentally validated small structural variants between the oil palm species and resolved the mechanism of chromosomal fusions responsible for the evolutionary descending dysploidy from 18 to 16 chromosomes.
Affiliation(s)
- Eng-Ti Leslie Low
- Advanced Biotechnology and Breeding Centre, Malaysian Palm Oil Board, 6 Persiaran Institusi, Bandar Baru Bangi, 43000 Kajang, Selangor, Malaysia
| | - Kuang-Lim Chan
- Advanced Biotechnology and Breeding Centre, Malaysian Palm Oil Board, 6 Persiaran Institusi, Bandar Baru Bangi, 43000 Kajang, Selangor, Malaysia
| | - Noorhariza Mohd Zaki
- Advanced Biotechnology and Breeding Centre, Malaysian Palm Oil Board, 6 Persiaran Institusi, Bandar Baru Bangi, 43000 Kajang, Selangor, Malaysia
| | | | - Jared M Ordway
- Orion Genomics, 3730 Foundry Way, St. Louis, MO 63110, USA
| | | | - Jaap Buntjer
- Orion Genomics, 3730 Foundry Way, St. Louis, MO 63110, USA
| | - Mohd Amin Ab Halim
- Advanced Biotechnology and Breeding Centre, Malaysian Palm Oil Board, 6 Persiaran Institusi, Bandar Baru Bangi, 43000 Kajang, Selangor, Malaysia
| | - Nik Shazana Nik Mohd Sanusi
- Advanced Biotechnology and Breeding Centre, Malaysian Palm Oil Board, 6 Persiaran Institusi, Bandar Baru Bangi, 43000 Kajang, Selangor, Malaysia
| | - Jayanthi Nagappan
- Advanced Biotechnology and Breeding Centre, Malaysian Palm Oil Board, 6 Persiaran Institusi, Bandar Baru Bangi, 43000 Kajang, Selangor, Malaysia
| | - Rozana Rosli
- Advanced Biotechnology and Breeding Centre, Malaysian Palm Oil Board, 6 Persiaran Institusi, Bandar Baru Bangi, 43000 Kajang, Selangor, Malaysia
| | - Eugeniya Bondar
- Biology Department, University of La Verne, La Verne, CA 91750, USA
| | - Nadzirah Amiruddin
- Advanced Biotechnology and Breeding Centre, Malaysian Palm Oil Board, 6 Persiaran Institusi, Bandar Baru Bangi, 43000 Kajang, Selangor, Malaysia
| | - Norashikin Sarpan
- Advanced Biotechnology and Breeding Centre, Malaysian Palm Oil Board, 6 Persiaran Institusi, Bandar Baru Bangi, 43000 Kajang, Selangor, Malaysia
| | - Ngoot-Chin Ting
- Advanced Biotechnology and Breeding Centre, Malaysian Palm Oil Board, 6 Persiaran Institusi, Bandar Baru Bangi, 43000 Kajang, Selangor, Malaysia
| | - Pek-Lan Chan
- Advanced Biotechnology and Breeding Centre, Malaysian Palm Oil Board, 6 Persiaran Institusi, Bandar Baru Bangi, 43000 Kajang, Selangor, Malaysia
| | - Meilina Ong-Abdullah
- Advanced Biotechnology and Breeding Centre, Malaysian Palm Oil Board, 6 Persiaran Institusi, Bandar Baru Bangi, 43000 Kajang, Selangor, Malaysia
| | - Marhalil Marjuni
- Advanced Biotechnology and Breeding Centre, Malaysian Palm Oil Board, 6 Persiaran Institusi, Bandar Baru Bangi, 43000 Kajang, Selangor, Malaysia
| | - Suzana Mustaffa
- Advanced Biotechnology and Breeding Centre, Malaysian Palm Oil Board, 6 Persiaran Institusi, Bandar Baru Bangi, 43000 Kajang, Selangor, Malaysia
| | - Norziha Abdullah
- Advanced Biotechnology and Breeding Centre, Malaysian Palm Oil Board, 6 Persiaran Institusi, Bandar Baru Bangi, 43000 Kajang, Selangor, Malaysia
| | - Norazah Azizi
- Advanced Biotechnology and Breeding Centre, Malaysian Palm Oil Board, 6 Persiaran Institusi, Bandar Baru Bangi, 43000 Kajang, Selangor, Malaysia
| | - Blaire Bacher
- Orion Genomics, 3730 Foundry Way, St. Louis, MO 63110, USA
| | - Nathan Lakey
- Orion Genomics, 3730 Foundry Way, St. Louis, MO 63110, USA
| | | | - Mohamad Arif Abd Manaf
- Advanced Biotechnology and Breeding Centre, Malaysian Palm Oil Board, 6 Persiaran Institusi, Bandar Baru Bangi, 43000 Kajang, Selangor, Malaysia
| | - Ravigadevi Sambanthamurti
- Advanced Biotechnology and Breeding Centre, Malaysian Palm Oil Board, 6 Persiaran Institusi, Bandar Baru Bangi, 43000 Kajang, Selangor, Malaysia
| | - Rajinder Singh
- Advanced Biotechnology and Breeding Centre, Malaysian Palm Oil Board, 6 Persiaran Institusi, Bandar Baru Bangi, 43000 Kajang, Selangor, Malaysia
| |
|
13
|
Zhang J, Lu H, Jiang Y, Ma Y, Deng L. ncRNA Coding Potential Prediction Using BiLSTM and Transformer Encoder-Based Model. J Chem Inf Model 2024; 64:6712-6722. [PMID: 39120528 DOI: 10.1021/acs.jcim.4c01097] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 08/10/2024]
Abstract
Many noncoding RNAs (ncRNAs) have been identified, and many of them play vital roles in various biological processes, including gene expression regulation, epigenetic regulation, transcription, and control. Recently, a few observations revealed that ncRNAs are translated into functional peptides. Moreover, many computational methods have been developed to predict the coding potential of these transcripts, which contributes to a deeper investigation of their functions. However, most of these are used to distinguish ncRNAs from mRNAs. It is important to develop a highly accurate computational tool for identifying the coding potential of ncRNAs, thereby contributing to the discovery of novel peptides. In this Article, we propose a novel BiLSTM And Transformer encoder-based model (nBAT) with intrinsic features encoded for ncRNA coding potential prediction. In nBAT, we introduce a learnable position encoding mechanism to better obtain the embeddings of the ncRNA sequence. Moreover, we extract 43 intrinsic features from different perspectives and encode these features into the Transformer encoder by calculating their distances. Our performance comparisons show that nBAT outperforms state-of-the-art methods for coding potential prediction on different datasets. We also apply the method to new ncRNAs to identify their coding potential, and the results further indicate the competitive performance of nBAT. We expect that the method can be exploited as a useful tool for high-throughput coding potential prediction for ncRNAs.
Affiliation(s)
- Jingpu Zhang
- School of Computer and Data Science, Henan University of Urban Construction, Pingdingshan 467000, China
| | - Hao Lu
- School of Computer and Data Science, Henan University of Urban Construction, Pingdingshan 467000, China
| | - Ying Jiang
- School of Computer Science and Engineering, Central South University, Changsha 410018, China
| | - Yuanyuan Ma
- School of Computer Engineering, Hubei University of Arts and Science, Xiangyang 441053, China
| | - Lei Deng
- School of Computer Science and Engineering, Central South University, Changsha 410018, China
| |
|
14
|
Taylor AD, Hathaway QA, Kunovac A, Pinti MV, Newman MS, Cook CC, Cramer ER, Starcovic SA, Winters MT, Westemeier-Rice ES, Fink GK, Durr AJ, Rizwan S, Shepherd DL, Robart AR, Martinez I, Hollander JM. Mitochondrial sequencing identifies long noncoding RNA features that promote binding to PNPase. Am J Physiol Cell Physiol 2024; 327:C221-C236. [PMID: 38826135 PMCID: PMC11427107 DOI: 10.1152/ajpcell.00648.2023] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/27/2023] [Revised: 05/24/2024] [Accepted: 05/24/2024] [Indexed: 06/04/2024]
Abstract
Extranuclear localization of long noncoding RNAs (lncRNAs) is poorly understood. Based on machine learning evaluations, we propose a lncRNA-mitochondrial interaction pathway where polynucleotide phosphorylase (PNPase), through domains that provide specificity for primary sequence and secondary structure, binds nuclear-encoded lncRNAs to facilitate mitochondrial import. Using FVB/NJ mouse and human cardiac tissues, RNA from isolated subcellular compartments (cytoplasmic and mitochondrial) and cross-linked immunoprecipitate (CLIP) with PNPase within the mitochondrion were sequenced on the Illumina HiSeq and MiSeq, respectively. lncRNA sequence and structure were evaluated through supervised [classification and regression trees (CART) and support vector machines (SVM)] machine learning algorithms. In HL-1 cells, quantitative PCR of PNPase CLIP knockout mutants (KH and S1) was performed. In vitro fluorescence assays assessed PNPase RNA binding capacity, and the results were verified with PNPase CLIP. One hundred twelve (mouse) and 1,548 (human) lncRNAs were identified in the mitochondrion, with Malat1 being the most abundant. Most noncoding RNAs binding PNPase were lncRNAs, including Malat1. lncRNA fragments bound to PNPase, compared against randomly generated sequences of similar length, showed stratification with SVM and CART algorithms. The lncRNAs bound to PNPase were used to create a criterion for binding, with experimental validation revealing increased binding affinity of RNA designed to bind PNPase compared to control RNA. The binding of lncRNAs to PNPase was decreased through the knockout of RNA binding domains KH and S1.
In conclusion, sequence and secondary structural features identified by machine learning enhance the likelihood of nuclear-encoded lncRNAs binding to PNPase and undergoing import into the mitochondrion.
NEW & NOTEWORTHY Long noncoding RNAs (lncRNAs) are relatively novel RNAs with increasingly prominent roles in regulating genetic expression, mainly in the nucleus but more recently in regions such as the mitochondrion. This study explores how lncRNAs interact with polynucleotide phosphorylase (PNPase), a protein that regulates RNA import into the mitochondrion. Machine learning identified several RNA structural features that improved lncRNA binding to PNPase, which may be useful in targeting RNA therapeutics to the mitochondrion.
Affiliation(s)
- Andrew D Taylor
- Division of Exercise Physiology, West Virginia University School of Medicine, Morgantown, West Virginia, United States
- Mitochondria, Metabolism, and Bioenergetics Working Group, West Virginia University School of Medicine, Morgantown, West Virginia, United States
| | - Quincy A Hathaway
- Division of Exercise Physiology, West Virginia University School of Medicine, Morgantown, West Virginia, United States
- Heart and Vascular Institute, West Virginia University, Morgantown, West Virginia, United States
- Department of Medical Education, West Virginia University School of Medicine, Morgantown, West Virginia, United States
| | - Amina Kunovac
- Division of Exercise Physiology, West Virginia University School of Medicine, Morgantown, West Virginia, United States
- Mitochondria, Metabolism, and Bioenergetics Working Group, West Virginia University School of Medicine, Morgantown, West Virginia, United States
| | - Mark V Pinti
- Mitochondria, Metabolism, and Bioenergetics Working Group, West Virginia University School of Medicine, Morgantown, West Virginia, United States
- West Virginia University School of Pharmacy, Morgantown, West Virginia, United States
| | - Mackenzie S Newman
- Department of Physiology and Pharmacology, West Virginia University School of Medicine, Morgantown, West Virginia, United States
| | - Chris C Cook
- Cardiovascular and Thoracic Surgery, West Virginia University School of Medicine, Morgantown, West Virginia, United States
| | - Evan R Cramer
- Department of Biochemistry, West Virginia University School of Medicine, Morgantown, West Virginia, United States
| | - Sarah A Starcovic
- Department of Biochemistry, West Virginia University School of Medicine, Morgantown, West Virginia, United States
| | - Michael T Winters
- Department of Microbiology, Immunology, and Cell Biology, West Virginia University Cancer Institute, School of Medicine, Morgantown, West Virginia, United States
| | - Emily S Westemeier-Rice
- Department of Microbiology, Immunology, and Cell Biology, West Virginia University Cancer Institute, School of Medicine, Morgantown, West Virginia, United States
| | - Garrett K Fink
- Division of Exercise Physiology, West Virginia University School of Medicine, Morgantown, West Virginia, United States
| | - Andrya J Durr
- Division of Exercise Physiology, West Virginia University School of Medicine, Morgantown, West Virginia, United States
- Mitochondria, Metabolism, and Bioenergetics Working Group, West Virginia University School of Medicine, Morgantown, West Virginia, United States
| | - Saira Rizwan
- Division of Exercise Physiology, West Virginia University School of Medicine, Morgantown, West Virginia, United States
- Mitochondria, Metabolism, and Bioenergetics Working Group, West Virginia University School of Medicine, Morgantown, West Virginia, United States
| | - Danielle L Shepherd
- Division of Exercise Physiology, West Virginia University School of Medicine, Morgantown, West Virginia, United States
- Mitochondria, Metabolism, and Bioenergetics Working Group, West Virginia University School of Medicine, Morgantown, West Virginia, United States
| | - Aaron R Robart
- Department of Biochemistry, West Virginia University School of Medicine, Morgantown, West Virginia, United States
| | - Ivan Martinez
- Department of Microbiology, Immunology, and Cell Biology, West Virginia University Cancer Institute, School of Medicine, Morgantown, West Virginia, United States
| | - John M Hollander
- Division of Exercise Physiology, West Virginia University School of Medicine, Morgantown, West Virginia, United States
- Mitochondria, Metabolism, and Bioenergetics Working Group, West Virginia University School of Medicine, Morgantown, West Virginia, United States
| |
|
15
|
Chaudhary U, Banerjee S. Decoding the Non-coding: Tools and Databases Unveiling the Hidden World of "Junk" RNAs for Innovative Therapeutic Exploration. ACS Pharmacol Transl Sci 2024; 7:1901-1915. [PMID: 39022352 PMCID: PMC11249652 DOI: 10.1021/acsptsci.3c00388] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 12/25/2023] [Revised: 05/15/2024] [Accepted: 05/27/2024] [Indexed: 07/20/2024]
Abstract
Non-coding RNAs are pivotal regulators of gene and protein expression, exerting crucial influences on diverse biological processes. Their dysregulation is frequently implicated in the onset and progression of diseases, notably cancer. A profound comprehension of the intricate mechanisms governing ncRNAs is imperative for devising innovative therapeutic interventions against these debilitating conditions. Significantly, nearly 80% of our genome comprises ncRNAs, underscoring their centrality in cellular processes. The elucidation of ncRNA functions is pivotal for grasping the complexities of gene regulation and its implications for human health. Modern genome sequencing techniques yield vast datasets, stored in specialized databases. To harness this wealth of information and to understand the crosstalk of non-coding RNAs, knowledge of available databases is required, and many new sophisticated computational tools have emerged. These tools play a pivotal role in the identification, prediction, and annotation of ncRNAs, thereby facilitating their experimental validation. This Review succinctly outlines the current understanding of ncRNAs, emphasizing their involvement in disease development. It also highlights the databases and tools instrumental in classifying, annotating, and evaluating ncRNAs. By extracting meaningful biological insights from seemingly "junk" data, these tools empower scientists to unravel the intricate roles of ncRNAs in shaping human health.
Affiliation(s)
- Uma Chaudhary
- Department of Biotechnology, School of Biosciences and Technology, Vellore Institute of Technology (VIT), Vellore, Tamil Nadu 632014, India
| | - Satarupa Banerjee
- Department of Biotechnology, School of Biosciences and Technology, Vellore Institute of Technology (VIT), Vellore, Tamil Nadu 632014, India
| |
|
16
|
Chen L, Liu L, Su H, Xu Y. KbhbXG: A Machine learning architecture based on XGBoost for prediction of lysine β-Hydroxybutyrylation (Kbhb) modification sites. Methods 2024; 227:27-34. [PMID: 38679187 DOI: 10.1016/j.ymeth.2024.04.016] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/27/2024] [Revised: 04/16/2024] [Accepted: 04/20/2024] [Indexed: 05/01/2024] Open
Abstract
Lysine β-hydroxybutyrylation is an important post-translational modification (PTM) involved in various physiological and biological processes. In this research, we introduce a novel predictor, KbhbXG, which utilizes XGBoost to identify β-hydroxybutyrylation modification sites based on protein sequence information. The traditional experimental methods employed for the identification of β-hydroxybutyrylated sites using proteomic techniques are both costly and time-consuming. Thus, the development of computational methods and predictors can play a crucial role in facilitating the rapid identification of β-hydroxybutyrylation sites. Our proposed KbhbXG model first utilizes the machine learning algorithm XGBoost to predict β-hydroxybutyrylation modification sites. On the independent test set, KbhbXG achieves an accuracy of 0.7457, a specificity of 0.7771, and an area under the curve (AUC) score of 0.8172. The high AUC score achieved by our method demonstrates its potential for effectively identifying novel β-hydroxybutyrylation sites, thereby facilitating further research and exploration of the β-hydroxybutyrylation process. In addition, functional analyses have revealed that different organisms preferentially engage in distinct biological processes and pathways, which can provide valuable insights for understanding the mechanism of β-hydroxybutyrylation and guide experimental verification. To promote transparency and reproducibility, we have made both the code and dataset of KbhbXG publicly available. Researchers interested in utilizing our proposed model can access these resources at https://github.com/Lab-Xu/KbhbXG.
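For readers unfamiliar with the three metrics quoted in this abstract, the following is a minimal sketch of how accuracy, specificity, and AUC can be computed for a binary site predictor. The labels, scores, and the 0.5 decision threshold are made-up toy values, and the function names are illustrative; none of this is taken from the KbhbXG code base.

```python
from typing import Sequence, Tuple


def confusion_counts(y_true: Sequence[int], y_pred: Sequence[int]) -> Tuple[int, int, int, int]:
    """Return (tp, tn, fp, fn) for binary labels where 1 = modified site."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp, tn, fp, fn


def accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Fraction of all sites that are classified correctly."""
    tp, tn, fp, fn = confusion_counts(y_true, y_pred)
    return (tp + tn) / (tp + tn + fp + fn)


def specificity(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Fraction of true negatives among all negative sites."""
    _, tn, fp, _ = confusion_counts(y_true, y_pred)
    return tn / (tn + fp)


def roc_auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the probability that a random positive outscores a random
    negative; tied scores count as half a win."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy data (hypothetical): true labels and predicted probabilities.
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
scores = [0.9, 0.8, 0.4, 0.3, 0.6, 0.2, 0.1, 0.7]
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # assumed threshold of 0.5

print(accuracy(y_true, y_pred), specificity(y_true, y_pred), roc_auc(y_true, scores))
```

Accuracy and specificity depend on the chosen threshold, whereas AUC summarizes ranking quality across all thresholds, which is one reason abstracts often report them together.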
Affiliation(s)
- Leqi Chen
- Department of Statistics, University of Science and Technology Beijing, Beijing 100083, China
| | - Liwen Liu
- The Open University of China, Beijing 100039, China
| | - Haiyan Su
- School of Computing, Montclair State University, NJ 07043, USA
| | - Yan Xu
- Department of Statistics, University of Science and Technology Beijing, Beijing 100083, China
| |
|
17
|
Wang L, Yuan Z, Wang J, Guan Y. Genome-wide identification and functional profile analysis of long non-coding RNAs in Avicennia marina. The Plant Genome 2024; 17:e20450. [PMID: 38600855 DOI: 10.1002/tpg2.20450] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/28/2023] [Revised: 02/29/2024] [Accepted: 03/15/2024] [Indexed: 04/12/2024]
Abstract
Avicennia marina, known for its remarkable adaptability to the challenging coastal environment, including high salinity, tides, and anaerobic soils, plays a pivotal role in safeguarding the coastal ecosystem. Long non-coding RNAs (lncRNAs) have emerged as significant players in various natural processes of plants, such as development. However, lncRNAs in A. marina remain largely unknown and uncharacterized. Here, we employed transcriptome datasets from multiple tissues, such as root, leaf, and seed, to detect and characterize the lncRNAs of A. marina. Through integrated analysis, we identified 6333 lncRNAs in A. marina. These lncRNAs exhibited distinct features compared to messenger RNAs, including larger exons, lower guanine-cytosine contents, lower expression levels, and higher tissue specificities. Moreover, we identified thousands of tissue-specific lncRNAs across the examined tissues and further found that these tissue-specific lncRNAs were significantly enriched in biological processes related to the major functions of their corresponding tissues. For instance, leaf-specific lncRNAs showed prominent enrichment in photosynthesis, oxidation-reduction processes, and light harvesting. By providing a comprehensive dataset and functional annotations for A. marina lncRNAs, this study offers a valuable overview of lncRNAs in A. marina and lays a foundation for their further functional exploration.
Affiliation(s)
- Lingling Wang
- Ministry of Education Key Laboratory for Ecology of Tropical Islands, Key Laboratory of Tropical Animal and Plant Ecology of Hainan Province, College of Life Sciences, Hainan Normal University, Haikou, China
| | - Zixin Yuan
- Ministry of Education Key Laboratory for Ecology of Tropical Islands, Key Laboratory of Tropical Animal and Plant Ecology of Hainan Province, College of Life Sciences, Hainan Normal University, Haikou, China
| | - Jingyi Wang
- Ministry of Education Key Laboratory for Ecology of Tropical Islands, Key Laboratory of Tropical Animal and Plant Ecology of Hainan Province, College of Life Sciences, Hainan Normal University, Haikou, China
| | - Yali Guan
- Ministry of Education Key Laboratory for Ecology of Tropical Islands, Key Laboratory of Tropical Animal and Plant Ecology of Hainan Province, College of Life Sciences, Hainan Normal University, Haikou, China
- Hainan Observation and Research Station of Dongzhaigang Mangrove Wetland Ecosystem, Haikou, China
| |
|
18
|
Hwang H, Jeon H, Yeo N, Baek D. Big data and deep learning for RNA biology. Exp Mol Med 2024; 56:1293-1321. [PMID: 38871816 PMCID: PMC11263376 DOI: 10.1038/s12276-024-01243-w] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 01/31/2024] [Revised: 02/27/2024] [Accepted: 03/05/2024] [Indexed: 06/15/2024] Open
Abstract
The exponential growth of big data in RNA biology (RB) has led to the development of deep learning (DL) models that have driven crucial discoveries. As constantly evidenced by DL studies in other fields, the successful implementation of DL in RB depends heavily on the effective utilization of large-scale datasets from public databases. In achieving this goal, data encoding methods, learning algorithms, and techniques that align well with biological domain knowledge have played pivotal roles. In this review, we provide guiding principles for applying these DL concepts to various problems in RB by demonstrating successful examples and associated methodologies. We also discuss the remaining challenges in developing DL models for RB and suggest strategies to overcome these challenges. Overall, this review aims to illuminate the compelling potential of DL for RB and ways to apply this powerful technology to investigate the intriguing biology of RNA more effectively.
Affiliation(s)
- Hyeonseo Hwang
- School of Biological Sciences, Seoul National University, Seoul, Republic of Korea
| | - Hyeonseong Jeon
- Interdisciplinary Program in Bioinformatics, Seoul National University, Seoul, Republic of Korea
- Genome4me Inc., Seoul, Republic of Korea
| | - Nagyeong Yeo
- School of Biological Sciences, Seoul National University, Seoul, Republic of Korea
| | - Daehyun Baek
- School of Biological Sciences, Seoul National University, Seoul, Republic of Korea
- Interdisciplinary Program in Bioinformatics, Seoul National University, Seoul, Republic of Korea
- Genome4me Inc., Seoul, Republic of Korea
| |
|
19
|
Tian XC, Chen ZY, Nie S, Shi TL, Yan XM, Bao YT, Li ZC, Ma HY, Jia KH, Zhao W, Mao JF. Plant-LncPipe: a computational pipeline providing significant improvement in plant lncRNA identification. Horticulture Research 2024; 11:uhae041. [PMID: 38638682 PMCID: PMC11024640 DOI: 10.1093/hr/uhae041] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/07/2023] [Accepted: 02/02/2024] [Indexed: 04/20/2024]
Abstract
Long non-coding RNAs (lncRNAs) play essential roles in various biological processes, such as chromatin remodeling, post-transcriptional regulation, and epigenetic modifications. Despite their critical functions in regulating plant growth, root development, and seed dormancy, the identification of plant lncRNAs remains a challenge due to the scarcity of specific and extensively tested identification methods. Most mainstream machine learning-based methods used for plant lncRNA identification were initially developed using human or other animal datasets, and their accuracy and effectiveness in predicting plant lncRNAs have not been fully evaluated or exploited. To overcome this limitation, we retrained several models, including CPAT, PLEK, and LncFinder, using plant datasets and compared their performance with mainstream lncRNA prediction tools such as CPC2, CNCI, RNAplonc, and LncADeep. Retraining these models significantly improved their performance, and two of the retrained models, LncFinder-plant and CPAT-plant, alongside their ensemble, emerged as the most suitable tools for plant lncRNA identification. This underscores the importance of model retraining in tackling the challenges associated with plant lncRNA identification. Finally, we developed a pipeline (Plant-LncPipe) that incorporates an ensemble of the two best-performing models and covers the entire data analysis process, including read mapping, transcript assembly, lncRNA identification, classification, and origin, for the efficient identification of lncRNAs in plants. The pipeline, Plant-LncPipe, is available at: https://github.com/xuechantian/Plant-LncRNA-pipline.
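The abstract does not say how the two retrained models are combined, so the following is only a sketch of one common ensembling rule, averaging two per-transcript non-coding probabilities and thresholding the mean. The transcript IDs, scores, the 0.5 cutoff, and the function name `ensemble_label` are all hypothetical and are not taken from the Plant-LncPipe repository.

```python
from typing import Dict, Tuple


def ensemble_label(score_a: float, score_b: float, threshold: float = 0.5) -> str:
    """Average two non-coding probabilities and call the transcript a lncRNA
    if the mean reaches the threshold (an assumed soft-voting rule)."""
    mean_score = (score_a + score_b) / 2
    return "lncRNA" if mean_score >= threshold else "coding"


# Hypothetical per-transcript non-coding probabilities from two classifiers.
predictions: Dict[str, Tuple[float, float]] = {
    "tx1": (0.9, 0.4),  # models disagree; mean 0.65
    "tx2": (0.2, 0.6),  # models disagree; mean 0.40
    "tx3": (0.8, 0.9),  # models agree; mean 0.85
}

labels = {tx: ensemble_label(a, b) for tx, (a, b) in predictions.items()}
print(labels)
```

A stricter alternative would require both models to agree (an intersection rule), which trades sensitivity for precision; which rule the pipeline uses should be checked against its documentation.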
Affiliation(s)
- Xue-Chan Tian
- State Key Laboratory of Tree Genetics and Breeding, National Engineering Research Center of Tree Breeding and Ecological Restoration, Beijing Advanced Innovation Center for Tree Breeding by Molecular Design, National Engineering Laboratory for Tree Breeding, Key Laboratory of Genetics and Breeding in Forest Trees and Ornamental Plants, Ministry of Education, College of Biological Sciences and Technology, Beijing Forestry University, Beijing 100083, China
| | - Zhao-Yang Chen
- State Key Laboratory of Tree Genetics and Breeding, National Engineering Research Center of Tree Breeding and Ecological Restoration, Beijing Advanced Innovation Center for Tree Breeding by Molecular Design, National Engineering Laboratory for Tree Breeding, Key Laboratory of Genetics and Breeding in Forest Trees and Ornamental Plants, Ministry of Education, College of Biological Sciences and Technology, Beijing Forestry University, Beijing 100083, China
| | - Shuai Nie
- Rice Research Institute, Guangdong Academy of Agricultural Sciences & Key Laboratory of Genetics and Breeding of High Quality Rice in Southern China (Co-construction by Ministry and Province), Ministry of Agriculture and Rural Affairs & Guangdong Key Laboratory of New Technology in Rice Breeding, Guangzhou 510640, China
| | - Tian-Le Shi
- State Key Laboratory of Tree Genetics and Breeding, National Engineering Research Center of Tree Breeding and Ecological Restoration, Beijing Advanced Innovation Center for Tree Breeding by Molecular Design, National Engineering Laboratory for Tree Breeding, Key Laboratory of Genetics and Breeding in Forest Trees and Ornamental Plants, Ministry of Education, College of Biological Sciences and Technology, Beijing Forestry University, Beijing 100083, China
| | - Xue-Mei Yan
- State Key Laboratory of Tree Genetics and Breeding, National Engineering Research Center of Tree Breeding and Ecological Restoration, Beijing Advanced Innovation Center for Tree Breeding by Molecular Design, National Engineering Laboratory for Tree Breeding, Key Laboratory of Genetics and Breeding in Forest Trees and Ornamental Plants, Ministry of Education, College of Biological Sciences and Technology, Beijing Forestry University, Beijing 100083, China
| | - Yu-Tao Bao
- State Key Laboratory of Tree Genetics and Breeding, National Engineering Research Center of Tree Breeding and Ecological Restoration, Beijing Advanced Innovation Center for Tree Breeding by Molecular Design, National Engineering Laboratory for Tree Breeding, Key Laboratory of Genetics and Breeding in Forest Trees and Ornamental Plants, Ministry of Education, College of Biological Sciences and Technology, Beijing Forestry University, Beijing 100083, China
| | - Zhi-Chao Li
- State Key Laboratory of Tree Genetics and Breeding, National Engineering Research Center of Tree Breeding and Ecological Restoration, Beijing Advanced Innovation Center for Tree Breeding by Molecular Design, National Engineering Laboratory for Tree Breeding, Key Laboratory of Genetics and Breeding in Forest Trees and Ornamental Plants, Ministry of Education, College of Biological Sciences and Technology, Beijing Forestry University, Beijing 100083, China
| | - Hai-Yao Ma
- State Key Laboratory of Tree Genetics and Breeding, National Engineering Research Center of Tree Breeding and Ecological Restoration, Beijing Advanced Innovation Center for Tree Breeding by Molecular Design, National Engineering Laboratory for Tree Breeding, Key Laboratory of Genetics and Breeding in Forest Trees and Ornamental Plants, Ministry of Education, College of Biological Sciences and Technology, Beijing Forestry University, Beijing 100083, China
| | - Kai-Hua Jia
- Key Laboratory of Crop Genetic Improvement & Ecology and Physiology, Institute of Crop Germplasm Resources, Shandong Academy of Agricultural Sciences, Jinan 250100, China
| | - Wei Zhao
- Department of Plant Physiology, Umeå Plant Science Centre (UPSC), Umeå University, Umeå 90187, Sweden
- Jian-Feng Mao
- State Key Laboratory of Tree Genetics and Breeding, National Engineering Research Center of Tree Breeding and Ecological Restoration, Beijing Advanced Innovation Center for Tree Breeding by Molecular Design, National Engineering Laboratory for Tree Breeding, Key Laboratory of Genetics and Breeding in Forest Trees and Ornamental Plants, Ministry of Education, College of Biological Sciences and Technology, Beijing Forestry University, Beijing 100083, China
- Department of Plant Physiology, Umeå Plant Science Centre (UPSC), Umeå University, Umeå 90187, Sweden
20
|
Tossou P, Wognum C, Craig M, Mary H, Noutahi E. Real-World Molecular Out-Of-Distribution: Specification and Investigation. J Chem Inf Model 2024; 64:697-711. [PMID: 38300258 PMCID: PMC10865358 DOI: 10.1021/acs.jcim.3c01774] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/02/2023] [Revised: 01/09/2024] [Accepted: 01/10/2024] [Indexed: 02/02/2024]
Abstract
This study presents a rigorous framework for investigating molecular out-of-distribution (MOOD) generalization in drug discovery. The concept of MOOD is first clarified through a problem specification that demonstrates how the covariate shifts encountered during real-world deployment can be characterized by the distribution of sample distances to the training set. We find that these shifts can cause performance to drop by up to 60% and uncertainty calibration by up to 40%. This leads us to propose a splitting protocol that aims to close the gap between the deployment and testing. Then, using this protocol, a thorough investigation is conducted to assess the impact of model design, model selection, and data set characteristics on MOOD performance and uncertainty calibration. We find that appropriate representations and algorithms with built-in uncertainty estimation are crucial to improving performance and uncertainty calibration. This study sets itself apart by its exhaustiveness and opens an exciting avenue to benchmark meaningful algorithmic progress in molecular scoring.
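The idea of characterizing covariate shift by each sample's distance to the training set can be illustrated with a minimal sketch. The abstract does not state which featurization or distance metric the authors use, so the Jaccard (Tanimoto) distance on hypothetical bit-set fingerprints below is an assumption, as are all function names.

```python
from typing import FrozenSet, List, Sequence

# Hypothetical representation: the set of "on" bit indices of a molecular fingerprint.
Fingerprint = FrozenSet[int]


def jaccard_distance(a: Fingerprint, b: Fingerprint) -> float:
    """Jaccard (Tanimoto) distance between two bit sets; 0.0 means identical."""
    if not a and not b:
        return 0.0
    return 1.0 - len(a & b) / len(a | b)


def distance_to_training_set(sample: Fingerprint, training: Sequence[Fingerprint]) -> float:
    """Distance from one sample to its nearest neighbour in the training set."""
    return min(jaccard_distance(sample, t) for t in training)


def distance_profile(samples: Sequence[Fingerprint], training: Sequence[Fingerprint]) -> List[float]:
    """Sorted nearest-neighbour distances, a simple summary of how far a sample set lies from training data."""
    return sorted(distance_to_training_set(s, training) for s in samples)
```

Comparing the profile of a candidate test split with the profile of molecules seen at deployment gives a rough check of whether the split reflects the deployment shift.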
Affiliation(s)
- Prudencio Tossou
- Valence Labs, Montréal, Québec H2S3G9, Canada
- Department of Computer Science and Software Engineering, Université Laval, Montréal, Québec G1V 0A6, Canada
- Cas Wognum
- Valence Labs, Montréal, Québec H2S3G9, Canada
21
|
Gao H, Gao P, Ye N. A method for evaluating of RNA's coding potential using the interaction effects of open reading frames and high-energy scalograms. Comput Biol Med 2024; 168:107752. [PMID: 38007977 DOI: 10.1016/j.compbiomed.2023.107752] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/12/2023] [Revised: 10/19/2023] [Accepted: 11/20/2023] [Indexed: 11/28/2023]
Abstract
The identification and function determination of long non-coding RNAs (lncRNAs) can help to better understand the transcriptional regulation in both normal development and disease pathology, thereby demanding methods to distinguish them from protein-coding RNAs (pcRNAs) after obtaining sequencing data. Many algorithms based on the statistical, structural, physical, and chemical properties of the sequences have been developed for evaluating the coding potential of RNA to distinguish them. In order to design common features that do not rely on hyperparameter tuning and optimization and can be evaluated accurately, we designed a series of features from the effects of open reading frames (ORFs) on their mutual interactions and with the electrical intensity of sequence sites to further improve the screening accuracy. Finally, the single model constructed from our designed features meets the strong classifier criteria, with an accuracy between 82% and 89%, and the prediction accuracy of the model constructed after combining the auxiliary features equals or exceeds that of some of the best classification tools. Moreover, our method does not require special hyperparameter tuning operations and is species-insensitive compared to other methods, which means it can be easily applied to a wide range of species. We also find some correlations between the features, which provide a reference for follow-up studies.
Affiliation(s)
- Hua Gao
- College of Forestry, Nanjing Forestry University, Longpan, Nanjing, 210037, Jiangsu, China; College of Information Science and Technology, Nanjing Forestry University, Longpan, Nanjing, 210037, Jiangsu, China.
- Peng Gao
- The First Affiliated Hospital of Xi'an Jiaotong University, 277 West Yanta Road, Xi'an, 710061, Shaanxi, China.
- Ning Ye
- College of Forestry, Nanjing Forestry University, Longpan, Nanjing, 210037, Jiangsu, China; College of Information Science and Technology, Nanjing Forestry University, Longpan, Nanjing, 210037, Jiangsu, China.
22
|
Rajesh P, Krishnamachari A. Composition, physicochemical property and base periodicity for discriminating lncRNA and mRNA. Bioinformation 2023; 19:1145-1152. [PMID: 38250538 PMCID: PMC10794758 DOI: 10.6026/973206300191145] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/01/2023] [Revised: 12/31/2023] [Accepted: 12/31/2023] [Indexed: 01/23/2024] Open
Abstract
Annotation of genome data with biological features is a challenging problem. One such problem deals with distinguishing lncRNA from mRNA. In this study, three groups of classification features, namely base periodicity, physicochemical property and nucleotide composition, were considered. We propose a simple neural network model that obtains better results using a judicious combination of these sequence features. Our approach uses a balanced dataset, a simple prediction model and a limited set of features to distinguish lncRNA from mRNA. Accordingly, (a) two properties of base periodicity: the peak power spectrum of the signal and the signal-to-noise ratio (SNR) of this peak signal, (b) three physicochemical properties: solvation, stacking and hydrogen-bonding energy, and (c) all dinucleotide and trinucleotide compositions were used. Classification was performed by considering the features independently and then combining these properties for improvement. Classification metrics were used to compare the results for seven eukaryotic organisms for various combinations of features. Nucleotide compositions combined with the physicochemical property or base periodicity group of features become a strong classifier with more than 99% accuracy. Base periodicity analysis with SNR can be used as a discriminating feature of lncRNA from mRNA.
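Two of the feature groups named in this abstract, k-mer composition and base periodicity, can be sketched in a few lines. This is an illustrative reading only: the abstract does not give the exact spectrum or SNR definitions, so the indicator-sequence power spectrum and the "peak power divided by mean power" ratio below are assumptions, as are the function names.

```python
import cmath
from itertools import product


def kmer_composition(seq: str, k: int) -> dict:
    """Relative frequency of every k-mer over ACGT (U is treated as T)."""
    seq = seq.upper().replace("U", "T")
    counts = {"".join(p): 0 for p in product("ACGT", repeat=k)}
    total = 0
    for i in range(len(seq) - k + 1):
        word = seq[i:i + k]
        if word in counts:  # windows with ambiguous bases are skipped
            counts[word] += 1
            total += 1
    return {m: (c / total if total else 0.0) for m, c in counts.items()}


def power_spectrum(seq: str) -> list:
    """Summed DFT power of the four base-indicator sequences at frequencies 1..n//2."""
    seq = seq.upper().replace("U", "T")
    n = len(seq)
    spectrum = []
    for f in range(1, n // 2 + 1):
        power = 0.0
        for base in "ACGT":
            coeff = sum(cmath.exp(-2j * cmath.pi * f * i / n)
                        for i, b in enumerate(seq) if b == base)
            power += abs(coeff) ** 2
        spectrum.append(power)
    return spectrum


def peak_and_snr(seq: str) -> tuple:
    """Peak power and an assumed SNR, defined here as peak power over mean power."""
    spectrum = power_spectrum(seq)
    mean_power = sum(spectrum) / len(spectrum)
    peak = max(spectrum)
    return peak, (peak / mean_power if mean_power > 0 else 0.0)
```

For a sequence with a strict period of three, the peak falls at frequency n/3, which is the behaviour periodicity features of this kind rely on.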
Affiliation(s)
- Prasad Rajesh
- School of Computational and Integrative Sciences, Jawaharlal Nehru University, New Delhi, 110067, India
23
|
Pudova E, Kobelyatskaya A, Emelyanova M, Snezhkina A, Fedorova M, Pavlov V, Guvatova Z, Dalina A, Kudryavtseva A. Non-Coding RNAs and the Development of Chemoresistance to Docetaxel in Prostate Cancer: Regulatory Interactions and Approaches Based on Machine Learning Methods. Life (Basel) 2023; 13:2304. [PMID: 38137905 PMCID: PMC10744715 DOI: 10.3390/life13122304] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/18/2023] [Revised: 11/30/2023] [Accepted: 12/05/2023] [Indexed: 12/24/2023] Open
Abstract
Chemotherapy based on taxane-class drugs is the gold standard for treating advanced stages of various oncological diseases. However, despite the favorable response trends, most patients eventually develop resistance to this therapy. Drug resistance is the result of a combination of different events in the tumor cells under the influence of the drug, a comprehensive understanding of which has yet to be determined. In this review, we examine the role of the major classes of non-coding RNAs in the development of chemoresistance in the case of prostate cancer, one of the most common and socially significant types of cancer in men worldwide. We will focus on recent findings from experimental studies regarding the prognostic potential of the identified non-coding RNAs. Additionally, we will explore novel approaches based on machine learning to study these regulatory molecules, including their role in the development of drug resistance.
Affiliation(s)
- Elena Pudova
- Engelhardt Institute of Molecular Biology, Russian Academy of Sciences, 119991 Moscow, Russia
- Marina Emelyanova
- Engelhardt Institute of Molecular Biology, Russian Academy of Sciences, 119991 Moscow, Russia
- Anastasiya Snezhkina
- Engelhardt Institute of Molecular Biology, Russian Academy of Sciences, 119991 Moscow, Russia
- Maria Fedorova
- Engelhardt Institute of Molecular Biology, Russian Academy of Sciences, 119991 Moscow, Russia
- Vladislav Pavlov
- Engelhardt Institute of Molecular Biology, Russian Academy of Sciences, 119991 Moscow, Russia
- Zulfiya Guvatova
- Engelhardt Institute of Molecular Biology, Russian Academy of Sciences, 119991 Moscow, Russia
- Russian Clinical Research Center for Gerontology, Pirogov Russian National Research Medical University, Ministry of Healthcare of the Russian Federation, 129226 Moscow, Russia
- Alexandra Dalina
- Engelhardt Institute of Molecular Biology, Russian Academy of Sciences, 119991 Moscow, Russia
- Anna Kudryavtseva
- Engelhardt Institute of Molecular Biology, Russian Academy of Sciences, 119991 Moscow, Russia
24
|
Jastrzebski JP, Pascarella S, Lipka A, Dorocki S. lncRna: The R Package for Optimizing lncRNA Identification Processes. J Comput Biol 2023; 30:1322-1326. [PMID: 37878344 DOI: 10.1089/cmb.2023.0091] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/26/2023] Open
Abstract
In silico identification of long noncoding RNAs (lncRNAs) is a multistage process including filtering of transcripts according to their physical characteristics (e.g., length, exon-intron structure) and determination of the coding potential of the sequence. A common issue within this process is the choice of the most suitable method of coding potential analysis for the conducted research. Selection of tools on the sole basis of their single performance may not provide the most effective choice for a specific problem. To overcome these limitations, we developed the R library lncRna, which provides functions to easily carry out the entire lncRNA identification process. For example, the package prepares the data files for coding potential analysis to perform error analysis. Moreover, the package gives the opportunity to analyze the effectiveness of various combinations of the lncRNA prediction methods to select the optimal configuration of the entire process.
Affiliation(s)
- Jan Pawel Jastrzebski
- Faculty of Biology and Biotechnology, University of Warmia and Mazury in Olsztyn, Olsztyn, Poland
- Stefano Pascarella
- Department of Biochemical Sciences "A. Rossi Fanelli" Sapienza University of Rome, Rome, Italy
- Aleksandra Lipka
- Institute of Oral Biology, Faculty of Dentistry University of Oslo, Oslo, Norway
25
|
Wang Y, Pan Z, Mou M, Xia W, Zhang H, Zhang H, Liu J, Zheng L, Luo Y, Zheng H, Yu X, Lian X, Zeng Z, Li Z, Zhang B, Zheng M, Li H, Hou T, Zhu F. A task-specific encoding algorithm for RNAs and RNA-associated interactions based on convolutional autoencoder. Nucleic Acids Res 2023; 51:e110. [PMID: 37889083 PMCID: PMC10682500 DOI: 10.1093/nar/gkad929] [Citation(s) in RCA: 30] [Impact Index Per Article: 15.0] [Received: 05/10/2022] [Revised: 08/01/2023] [Accepted: 10/10/2023] [Indexed: 10/28/2023] Open
Abstract
RNAs play essential roles in diverse physiological and pathological processes by interacting with other molecules (RNA/protein/compound), and various computational methods are available for identifying these interactions. However, the encoding features provided by existing methods are limited, and the existing tools do not offer an effective way to integrate the interacting partners. In this study, a task-specific encoding algorithm for RNAs and RNA-associated interactions was therefore developed. This new algorithm was unique in (a) realizing comprehensive RNA feature encoding by introducing a great many novel features and (b) enabling task-specific integration of interacting partners using convolutional autoencoder-directed feature embedding. Compared with existing methods/tools, this novel algorithm demonstrated superior performance in diverse benchmark testing studies. This algorithm, together with its source code, can be readily accessed by all users at: https://idrblab.org/corain/ and https://github.com/idrblab/corain/.
Affiliation(s)
- Yunxia Wang
- College of Pharmaceutical Sciences, The Second Affiliated Hospital, Zhejiang University School of Medicine, Polytechnic Institute, Zhejiang University, Hangzhou 310058, China
- Ziqi Pan
- College of Pharmaceutical Sciences, The Second Affiliated Hospital, Zhejiang University School of Medicine, Polytechnic Institute, Zhejiang University, Hangzhou 310058, China
- Minjie Mou
- College of Pharmaceutical Sciences, The Second Affiliated Hospital, Zhejiang University School of Medicine, Polytechnic Institute, Zhejiang University, Hangzhou 310058, China
- Weiqi Xia
- College of Pharmaceutical Sciences, The Second Affiliated Hospital, Zhejiang University School of Medicine, Polytechnic Institute, Zhejiang University, Hangzhou 310058, China
- Hongning Zhang
- College of Pharmaceutical Sciences, The Second Affiliated Hospital, Zhejiang University School of Medicine, Polytechnic Institute, Zhejiang University, Hangzhou 310058, China
- Hanyu Zhang
- College of Pharmaceutical Sciences, The Second Affiliated Hospital, Zhejiang University School of Medicine, Polytechnic Institute, Zhejiang University, Hangzhou 310058, China
- Jin Liu
- College of Pharmaceutical Sciences, The Second Affiliated Hospital, Zhejiang University School of Medicine, Polytechnic Institute, Zhejiang University, Hangzhou 310058, China
- Lingyan Zheng
- College of Pharmaceutical Sciences, The Second Affiliated Hospital, Zhejiang University School of Medicine, Polytechnic Institute, Zhejiang University, Hangzhou 310058, China
- Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, Alibaba-ZJU Joint Research Center of Future Digital Healthcare, Hangzhou 330110, China
- Yongchao Luo
- College of Pharmaceutical Sciences, The Second Affiliated Hospital, Zhejiang University School of Medicine, Polytechnic Institute, Zhejiang University, Hangzhou 310058, China
- Hanqi Zheng
- College of Pharmaceutical Sciences, The Second Affiliated Hospital, Zhejiang University School of Medicine, Polytechnic Institute, Zhejiang University, Hangzhou 310058, China
- Xinyuan Yu
- College of Pharmaceutical Sciences, The Second Affiliated Hospital, Zhejiang University School of Medicine, Polytechnic Institute, Zhejiang University, Hangzhou 310058, China
- Xichen Lian
- College of Pharmaceutical Sciences, The Second Affiliated Hospital, Zhejiang University School of Medicine, Polytechnic Institute, Zhejiang University, Hangzhou 310058, China
- Zhenyu Zeng
- Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, Alibaba-ZJU Joint Research Center of Future Digital Healthcare, Hangzhou 330110, China
- Zhaorong Li
- Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, Alibaba-ZJU Joint Research Center of Future Digital Healthcare, Hangzhou 330110, China
- Bing Zhang
- Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, Alibaba-ZJU Joint Research Center of Future Digital Healthcare, Hangzhou 330110, China
- Mingyue Zheng
- College of Pharmaceutical Sciences, The Second Affiliated Hospital, Zhejiang University School of Medicine, Polytechnic Institute, Zhejiang University, Hangzhou 310058, China
- Drug Discovery and Design Center, State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, Shanghai 201203, China
- Honglin Li
- College of Pharmaceutical Sciences, The Second Affiliated Hospital, Zhejiang University School of Medicine, Polytechnic Institute, Zhejiang University, Hangzhou 310058, China
- School of Pharmacy, East China University of Science and Technology, Shanghai 200237, China
- Tingjun Hou
- College of Pharmaceutical Sciences, The Second Affiliated Hospital, Zhejiang University School of Medicine, Polytechnic Institute, Zhejiang University, Hangzhou 310058, China
- Feng Zhu
- College of Pharmaceutical Sciences, The Second Affiliated Hospital, Zhejiang University School of Medicine, Polytechnic Institute, Zhejiang University, Hangzhou 310058, China
- Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, Alibaba-ZJU Joint Research Center of Future Digital Healthcare, Hangzhou 330110, China
- Westlake Laboratory of Life Sciences and Biomedicine, Hangzhou, Zhejiang, China
26
|
Woo HM, Qian X, Tan L, Jha S, Alexander FJ, Dougherty ER, Yoon BJ. Optimal decision-making in high-throughput virtual screening pipelines. Patterns (New York, N.Y.) 2023; 4:100875. [PMID: 38035191 PMCID: PMC10682755 DOI: 10.1016/j.patter.2023.100875] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/28/2022] [Revised: 12/28/2022] [Accepted: 10/13/2023] [Indexed: 12/02/2023]
Abstract
The need for efficient computational screening of molecular candidates that possess desired properties frequently arises in various scientific and engineering problems, including drug discovery and materials design. However, the enormous search space containing the candidates and the substantial computational cost of high-fidelity property prediction models make screening practically challenging. In this work, we propose a general framework for constructing and optimizing a high-throughput virtual screening (HTVS) pipeline that consists of multi-fidelity models. The central idea is to optimally allocate the computational resources to models with varying costs and accuracy to optimize the return on computational investment. Based on both simulated and real-world data, we demonstrate that the proposed optimal HTVS framework can significantly accelerate virtual screening without any degradation in terms of accuracy. Furthermore, it enables an adaptive operational strategy for HTVS, where one can trade accuracy for efficiency.
Affiliation(s)
- Hyun-Myung Woo
- Department of Biomedical & Robotics Engineering, Incheon National University, Incheon 22012, Republic of Korea
- Xiaoning Qian
- Department of Electrical and Computer Engineering, Texas A&M University, College Station, TX 77843, USA
- Computational Science Initiative, Brookhaven National Laboratory, Upton, NY 11973, USA
- Li Tan
- Computational Science Initiative, Brookhaven National Laboratory, Upton, NY 11973, USA
- Shantenu Jha
- Computational Science Initiative, Brookhaven National Laboratory, Upton, NY 11973, USA
- Department of Electrical and Computer Engineering, Rutgers University, Piscataway, NJ 08854, USA
- Francis J. Alexander
- Computing, Environment and Life Sciences, Argonne National Laboratory, Lemont, IL 60439, USA
- Edward R. Dougherty
- Department of Electrical and Computer Engineering, Texas A&M University, College Station, TX 77843, USA
- Byung-Jun Yoon
- Department of Electrical and Computer Engineering, Texas A&M University, College Station, TX 77843, USA
- Computational Science Initiative, Brookhaven National Laboratory, Upton, NY 11973, USA
27
|
Tang X, Li Q, Feng X, Yang B, Zhong X, Zhou Y, Wang Q, Mao Y, Xie W, Liu T, Tang Q, Guo W, Wu F, Feng X, Wang Q, Lu Y, Xu J. Identification and Functional Analysis of Drought-Responsive Long Noncoding RNAs in Maize Roots. Int J Mol Sci 2023; 24:15039. [PMID: 37894720 PMCID: PMC10606207 DOI: 10.3390/ijms242015039] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/25/2023] [Revised: 09/28/2023] [Accepted: 09/28/2023] [Indexed: 10/29/2023] Open
Abstract
Long noncoding RNAs (lncRNAs) are transcripts with lengths of more than 200 nt and limited protein-coding potential. They have been found to play important roles in plant stress responses. In this study, the maize drought-tolerant inbred line AC7643 and drought-sensitive inbred line AC7729/TZSRW, as well as their recombinant inbred lines (RILs), were selected to identify drought-responsive lncRNAs in roots. Compared with non-responsive lncRNAs, drought-responsive lncRNAs had different sequence characteristics in gene length and number of exons. The ratio of lncRNAs down-regulated by drought was significantly higher than that of coding genes, and lncRNAs were more widely expressed in recombination sites in the RILs. Additionally, by integrating the modifications of DNA 5-methylcytidine (5mC), histones, and RNA N6-methyladenosine (m6A), it was found that the enrichment of histone modifications associated with transcriptional activation was lower in the genes generating lncRNAs than in coding genes. The lncRNA-mRNA co-expression network, containing 15,340 coding genes and 953 lncRNAs, was constructed to investigate the molecular functions of lncRNAs. Thirteen modules were found to be associated with survival rate under drought. We found nine SNPs located in lncRNAs among the modules associated with plant survival under drought. In conclusion, we revealed the characteristics of lncRNAs responding to drought in maize roots based on multiomics studies. These findings enrich our understanding of lncRNAs under drought and shed light on the complex regulatory networks that are orchestrated by the noncoding RNAs in response to drought stress.
Affiliation(s)
- Xin Tang
- Maize Research Institute, Sichuan Agricultural University, Chengdu 611130, China; (X.T.); (Q.L.); (X.F.); (B.Y.); (X.Z.); (Y.Z.); (Q.W.); (Y.M.); (W.X.); (T.L.); (Q.T.); (W.G.); (F.W.); (X.F.); (Q.W.)
- State Key Laboratory of Crop Gene Exploration and Utilization in Southwest China, Sichuan Agricultural University, Chengdu 611130, China
- Qimeng Li
- Maize Research Institute, Sichuan Agricultural University, Chengdu 611130, China; (X.T.); (Q.L.); (X.F.); (B.Y.); (X.Z.); (Y.Z.); (Q.W.); (Y.M.); (W.X.); (T.L.); (Q.T.); (W.G.); (F.W.); (X.F.); (Q.W.)
- State Key Laboratory of Crop Gene Exploration and Utilization in Southwest China, Sichuan Agricultural University, Chengdu 611130, China
- Xiaoju Feng
- Maize Research Institute, Sichuan Agricultural University, Chengdu 611130, China; (X.T.); (Q.L.); (X.F.); (B.Y.); (X.Z.); (Y.Z.); (Q.W.); (Y.M.); (W.X.); (T.L.); (Q.T.); (W.G.); (F.W.); (X.F.); (Q.W.)
- State Key Laboratory of Crop Gene Exploration and Utilization in Southwest China, Sichuan Agricultural University, Chengdu 611130, China
- Bo Yang
- Maize Research Institute, Sichuan Agricultural University, Chengdu 611130, China; (X.T.); (Q.L.); (X.F.); (B.Y.); (X.Z.); (Y.Z.); (Q.W.); (Y.M.); (W.X.); (T.L.); (Q.T.); (W.G.); (F.W.); (X.F.); (Q.W.)
- State Key Laboratory of Crop Gene Exploration and Utilization in Southwest China, Sichuan Agricultural University, Chengdu 611130, China
- Xiu Zhong
- Maize Research Institute, Sichuan Agricultural University, Chengdu 611130, China; (X.T.); (Q.L.); (X.F.); (B.Y.); (X.Z.); (Y.Z.); (Q.W.); (Y.M.); (W.X.); (T.L.); (Q.T.); (W.G.); (F.W.); (X.F.); (Q.W.)
- State Key Laboratory of Crop Gene Exploration and Utilization in Southwest China, Sichuan Agricultural University, Chengdu 611130, China
- Yang Zhou
- Maize Research Institute, Sichuan Agricultural University, Chengdu 611130, China; (X.T.); (Q.L.); (X.F.); (B.Y.); (X.Z.); (Y.Z.); (Q.W.); (Y.M.); (W.X.); (T.L.); (Q.T.); (W.G.); (F.W.); (X.F.); (Q.W.)
- State Key Laboratory of Crop Gene Exploration and Utilization in Southwest China, Sichuan Agricultural University, Chengdu 611130, China
- Qi Wang
- Maize Research Institute, Sichuan Agricultural University, Chengdu 611130, China; (X.T.); (Q.L.); (X.F.); (B.Y.); (X.Z.); (Y.Z.); (Q.W.); (Y.M.); (W.X.); (T.L.); (Q.T.); (W.G.); (F.W.); (X.F.); (Q.W.)
- State Key Laboratory of Crop Gene Exploration and Utilization in Southwest China, Sichuan Agricultural University, Chengdu 611130, China
- Yan Mao
- Maize Research Institute, Sichuan Agricultural University, Chengdu 611130, China; (X.T.); (Q.L.); (X.F.); (B.Y.); (X.Z.); (Y.Z.); (Q.W.); (Y.M.); (W.X.); (T.L.); (Q.T.); (W.G.); (F.W.); (X.F.); (Q.W.)
- State Key Laboratory of Crop Gene Exploration and Utilization in Southwest China, Sichuan Agricultural University, Chengdu 611130, China
- Wubin Xie
- Maize Research Institute, Sichuan Agricultural University, Chengdu 611130, China; (X.T.); (Q.L.); (X.F.); (B.Y.); (X.Z.); (Y.Z.); (Q.W.); (Y.M.); (W.X.); (T.L.); (Q.T.); (W.G.); (F.W.); (X.F.); (Q.W.)
- State Key Laboratory of Crop Gene Exploration and Utilization in Southwest China, Sichuan Agricultural University, Chengdu 611130, China
- Tianhong Liu
- Maize Research Institute, Sichuan Agricultural University, Chengdu 611130, China; (X.T.); (Q.L.); (X.F.); (B.Y.); (X.Z.); (Y.Z.); (Q.W.); (Y.M.); (W.X.); (T.L.); (Q.T.); (W.G.); (F.W.); (X.F.); (Q.W.)
- State Key Laboratory of Crop Gene Exploration and Utilization in Southwest China, Sichuan Agricultural University, Chengdu 611130, China
- Qi Tang
- Maize Research Institute, Sichuan Agricultural University, Chengdu 611130, China; (X.T.); (Q.L.); (X.F.); (B.Y.); (X.Z.); (Y.Z.); (Q.W.); (Y.M.); (W.X.); (T.L.); (Q.T.); (W.G.); (F.W.); (X.F.); (Q.W.)
- State Key Laboratory of Crop Gene Exploration and Utilization in Southwest China, Sichuan Agricultural University, Chengdu 611130, China
- Wei Guo
- Maize Research Institute, Sichuan Agricultural University, Chengdu 611130, China; (X.T.); (Q.L.); (X.F.); (B.Y.); (X.Z.); (Y.Z.); (Q.W.); (Y.M.); (W.X.); (T.L.); (Q.T.); (W.G.); (F.W.); (X.F.); (Q.W.)
- State Key Laboratory of Crop Gene Exploration and Utilization in Southwest China, Sichuan Agricultural University, Chengdu 611130, China
- Fengkai Wu
- Maize Research Institute, Sichuan Agricultural University, Chengdu 611130, China; (X.T.); (Q.L.); (X.F.); (B.Y.); (X.Z.); (Y.Z.); (Q.W.); (Y.M.); (W.X.); (T.L.); (Q.T.); (W.G.); (F.W.); (X.F.); (Q.W.)
- State Key Laboratory of Crop Gene Exploration and Utilization in Southwest China, Sichuan Agricultural University, Chengdu 611130, China
- Xuanjun Feng
- Maize Research Institute, Sichuan Agricultural University, Chengdu 611130, China; (X.T.); (Q.L.); (X.F.); (B.Y.); (X.Z.); (Y.Z.); (Q.W.); (Y.M.); (W.X.); (T.L.); (Q.T.); (W.G.); (F.W.); (X.F.); (Q.W.)
- State Key Laboratory of Crop Gene Exploration and Utilization in Southwest China, Sichuan Agricultural University, Chengdu 611130, China
- Qingjun Wang
- Maize Research Institute, Sichuan Agricultural University, Chengdu 611130, China; (X.T.); (Q.L.); (X.F.); (B.Y.); (X.Z.); (Y.Z.); (Q.W.); (Y.M.); (W.X.); (T.L.); (Q.T.); (W.G.); (F.W.); (X.F.); (Q.W.)
- State Key Laboratory of Crop Gene Exploration and Utilization in Southwest China, Sichuan Agricultural University, Chengdu 611130, China
- Yanli Lu
- Maize Research Institute, Sichuan Agricultural University, Chengdu 611130, China; (X.T.); (Q.L.); (X.F.); (B.Y.); (X.Z.); (Y.Z.); (Q.W.); (Y.M.); (W.X.); (T.L.); (Q.T.); (W.G.); (F.W.); (X.F.); (Q.W.)
- State Key Laboratory of Crop Gene Exploration and Utilization in Southwest China, Sichuan Agricultural University, Chengdu 611130, China
- Jie Xu
- Maize Research Institute, Sichuan Agricultural University, Chengdu 611130, China; (X.T.); (Q.L.); (X.F.); (B.Y.); (X.Z.); (Y.Z.); (Q.W.); (Y.M.); (W.X.); (T.L.); (Q.T.); (W.G.); (F.W.); (X.F.); (Q.W.)
- State Key Laboratory of Crop Gene Exploration and Utilization in Southwest China, Sichuan Agricultural University, Chengdu 611130, China
28
|
Abdellaoui N, Kim SY, Kim MS. Effect of TRAF6-knockout on gene expression and lncRNA expression in Epithelioma papulosum cyprini (EPC) cells. Anim Cells Syst (Seoul) 2023; 27:197-207. [PMID: 37808550 PMCID: PMC10552615 DOI: 10.1080/19768354.2023.2263070] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/25/2023] [Accepted: 09/20/2023] [Indexed: 10/10/2023] Open
Abstract
TRAF6 is a key immune gene that plays a significant role in toll-like receptor signal transduction and activates downstream immune genes involved in antiviral immunity in fish. To explore the role of TRAF6 in Epithelioma papulosum cyprini (EPC) cells, we knocked out the TRAF6 gene using the Clustered Regularly Interspaced Short Palindromic Repeats-Cas9 (CRISPR-Cas9) technique and then analyzed the transcriptomes of the knockout cells. In this study, we identified that 232 transcripts were differentially expressed in naive cells. Using the pipeline, we identified 381 novel lncRNAs in EPC cells, 23 of which were differentially expressed. Gene Ontology enrichment analysis demonstrated that differentially expressed genes (DEG) are implicated in various immune processes, such as neutrophil chemotaxis and mitogen-activated protein kinase binding. In addition, the KEGG pathway analysis revealed enrichment in immune-related pathways (Interleukin-17 signaling pathway, cytokine-cytokine receptor interaction, and TNF signaling pathway). Furthermore, the target genes of the differentially expressed lncRNAs were implicated in the negative regulation of interleukin-6 and tumor necrosis factor production. These results indicate that lncRNAs and protein-coding genes participate in the regulation of immune and metabolic processes in fish.
Affiliation(s)
- Najib Abdellaoui
- Department of Biological Sciences, Kongju National University, Gongju, South Korea
- Seon Young Kim
- Department of Biological Sciences, Kongju National University, Gongju, South Korea
- Min Sun Kim
- Department of Biological Sciences, Kongju National University, Gongju, South Korea
- BK21 Team for Field-oriented BioCore Human Resources Development, Kongju National University, Gongju, South Korea
29
|
Chen XG, Yang X, Li C, Lin X, Zhang W. Non-coding RNA identification with pseudo RNA sequences and feature representation learning. Comput Biol Med 2023; 165:107355. [PMID: 37639767 DOI: 10.1016/j.compbiomed.2023.107355] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 05/29/2023] [Revised: 07/16/2023] [Accepted: 08/12/2023] [Indexed: 08/31/2023]
Abstract
Distinguishing non-coding RNAs (ncRNAs) from coding RNAs is very important in bioinformatics. Although many methods have been proposed for solving this task, it remains highly challenging to further improve the accuracy of ncRNA identification. In this paper, we propose a coding potential predictor using feature representation learning based on pseudo RNA sequences named CPPFLPS. In this method, we use the pseudo RNA sequences generated by simulating RNA sequence mutations as new samples for data augmentation, and six string operations simulating RNA sequence mutations are considered: base replacement, base insertion, base deletion, subsequence reversion, subsequence repetition and subsequence deletion. In the feature representation learning framework, different types of pseudo RNA sequences are added to the training set to form new training sets that can be used to train baseline classifiers, thus obtaining baseline models. The resulting labels of these baseline models are used as feature vectors to represent RNA sequences, and the resulting feature vectors acquired after feature selection are used to train a predictive model for distinguishing ncRNAs from coding RNAs. Our method achieves better performance compared with that of existing state-of-the-art methods. The implementation of the proposed method is available at https://github.com/chenxgscuec/CPPFLPS.
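The six string operations listed in this abstract can be sketched as plain string functions. This is a minimal illustration, not the CPPFLPS implementation: the function names, the choice of positions, and the reading of "subsequence reversion" as reversing a slice in place are assumptions.

```python
import random

BASES = "ACGU"


def base_replacement(seq: str, pos: int, new_base: str) -> str:
    return seq[:pos] + new_base + seq[pos + 1:]


def base_insertion(seq: str, pos: int, new_base: str) -> str:
    return seq[:pos] + new_base + seq[pos:]


def base_deletion(seq: str, pos: int) -> str:
    return seq[:pos] + seq[pos + 1:]


def subsequence_reversion(seq: str, start: int, end: int) -> str:
    # Assumed meaning: reverse the slice seq[start:end] in place.
    return seq[:start] + seq[start:end][::-1] + seq[end:]


def subsequence_repetition(seq: str, start: int, end: int) -> str:
    # Duplicate seq[start:end] immediately after itself.
    return seq[:end] + seq[start:end] + seq[end:]


def subsequence_deletion(seq: str, start: int, end: int) -> str:
    return seq[:start] + seq[end:]


def make_pseudo_sequences(seq: str, rng: random.Random) -> dict:
    """One pseudo sequence per operation, with randomly chosen positions (needs len(seq) >= 2)."""
    pos = rng.randrange(len(seq))
    start = rng.randrange(len(seq) - 1)
    end = rng.randrange(start + 1, len(seq) + 1)
    other = rng.choice([b for b in BASES if b != seq[pos]])
    return {
        "replacement": base_replacement(seq, pos, other),
        "insertion": base_insertion(seq, pos, rng.choice(BASES)),
        "deletion": base_deletion(seq, pos),
        "reversion": subsequence_reversion(seq, start, end),
        "repetition": subsequence_repetition(seq, start, end),
        "sub_deletion": subsequence_deletion(seq, start, end),
    }
```

Passing a seeded `random.Random` keeps the augmentation reproducible, which matters when the pseudo sequences are added to training sets for separate baseline classifiers.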
Affiliation(s)
- Xian-Gan Chen
- School of Biomedical Engineering, South-Central Minzu University, Wuhan, 430074, China; Hubei Key Laboratory of Medical Information Analysis and Tumor Diagnosis & Treatment, South-Central Minzu University, Wuhan, 430074, China; Key Laboratory of Cognitive Science(South-Central Minzu University), State Ethnic Affairs Commission, Wuhan, 430074, China.
| | - Xiaofei Yang
- School of Biomedical Engineering, South-Central Minzu University, Wuhan, 430074, China; Hubei Key Laboratory of Medical Information Analysis and Tumor Diagnosis & Treatment, South-Central Minzu University, Wuhan, 430074, China; Key Laboratory of Cognitive Science(South-Central Minzu University), State Ethnic Affairs Commission, Wuhan, 430074, China.
| | - Chenhong Li
- School of Biomedical Engineering, South-Central Minzu University, Wuhan, 430074, China; Hubei Key Laboratory of Medical Information Analysis and Tumor Diagnosis & Treatment, South-Central Minzu University, Wuhan, 430074, China; Key Laboratory of Cognitive Science(South-Central Minzu University), State Ethnic Affairs Commission, Wuhan, 430074, China.
| | - Xianguang Lin
- School of Biomedical Engineering, South-Central Minzu University, Wuhan, 430074, China; Hubei Key Laboratory of Medical Information Analysis and Tumor Diagnosis & Treatment, South-Central Minzu University, Wuhan, 430074, China; Key Laboratory of Cognitive Science(South-Central Minzu University), State Ethnic Affairs Commission, Wuhan, 430074, China.
| | - Wen Zhang
- College of Informatics, Huazhong Agricultural University, Wuhan, 430070, China.
|
30
|
Gong L, Chen J, Cui X, Liu Y. RPIPCM: A deep network model for predicting lncRNA-protein interaction based on sequence feature encoding. Comput Biol Med 2023; 165:107366. [PMID: 37633089 DOI: 10.1016/j.compbiomed.2023.107366] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/27/2023] [Revised: 07/29/2023] [Accepted: 08/12/2023] [Indexed: 08/28/2023]
Abstract
LncRNA-protein interaction plays an important regulatory role in biological processes. In this paper, the proposed RPIPCM, based on a novel deep network model, uses the sequence feature encoding of both RNA and protein to predict lncRNA-protein interactions (LPIs). A sliding-window negative sampling method is proposed to address the imbalance between positive and negative samples. Comparative experiments show that the proposed negative sampling method helps solve the data imbalance problem in existing LPI research. Experimental results also show that the proposed sequence feature encoding method performs well in predicting LPIs for datasets of different sizes and types. On the animal-related RPI488 dataset, compared with the direct original sequence encoding model, the accuracy of the sequence feature encoding model increased by 1.02%, the recall increased by 4.08%, and the value of MCC increased by 1.67%. On the plant dataset ATH948, the sequence feature-based encoding demonstrated a 1.58% higher accuracy, a 1.53% higher recall, a 1.62% higher specificity, a 1.62% higher precision, and a 3.16% higher value of MCC compared to the direct original sequence-based encoding. Compared with the latest prediction work on the ZEA22133 dataset, RPIPCM is shown to be more effective, with the accuracy increased by 2.23%, the recall increased by 1.78%, the specificity increased by 2.67%, the precision increased by 2.52%, and the value of MCC increased by 4.43%, which also supports the effectiveness and robustness of RPIPCM. In conclusion, the RPIPCM deep network model based on sequence feature encoding can automatically mine the hidden feature information of the sequence in the lncRNA-protein interaction without relying on external features or prior biomedical knowledge, and its low cost and high efficiency can provide a reference for biomedical researchers.
Affiliation(s)
- Lejun Gong
- School of Computer Science, Nanjing University of Posts and Telecommunications, Nanjing, 210023, China.
| | - Jingmei Chen
- School of Computer Science, Nanjing University of Posts and Telecommunications, Nanjing, 210023, China
| | - Xiong Cui
- School of Computer Science, Nanjing University of Posts and Telecommunications, Nanjing, 210023, China
| | - Yang Liu
- School of Computer Science, Nanjing University of Posts and Telecommunications, Nanjing, 210023, China
|
31
|
Wang XF, Yu CQ, You ZH, Qiao Y, Li ZW, Huang WZ. An efficient circRNA-miRNA interaction prediction model by combining biological text mining and wavelet diffusion-based sparse network structure embedding. Comput Biol Med 2023; 165:107421. [PMID: 37672925 DOI: 10.1016/j.compbiomed.2023.107421] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/04/2023] [Revised: 07/10/2023] [Accepted: 08/28/2023] [Indexed: 09/08/2023]
Abstract
MOTIVATION: Accumulating clinical evidence shows that circular RNA (circRNA) plays an important regulatory role in the occurrence and development of human diseases, which is expected to provide a new perspective for the diagnosis and treatment of related diseases. Computational methods can provide high-probability preselection for wet experiments to save resources. However, because sparse biological networks lack neighborhood structure, models based on network embedding and graph embedding struggle to achieve ideal results. RESULTS: In this paper, we propose BioDGW-CMI, which combines biological text mining and wavelet diffusion-based sparse network structure embedding to predict circRNA-miRNA interaction (CMI). In detail, BioDGW-CMI first uses Bidirectional Encoder Representations from Transformers (BERT) for biological text mining to mine hidden features in RNA sequences, then constructs a CMI network and obtains the topological structure embedding of nodes in the network through heat wavelet diffusion patterns. Next, a denoising autoencoder combines the structural features and Gaussian kernel similarity. Finally, the features are sent to LightGBM for training and prediction. BioDGW-CMI achieves the highest prediction performance on all three datasets in the field of CMI prediction. In the case study, all 8 CMI pairs based on circ-ITCH were successfully predicted. AVAILABILITY: The data and source code can be found at https://github.com/1axin/BioDGW-CMI-model.
Affiliation(s)
- Xin-Fei Wang
- School of Information Engineering, Xijing University, Xi'an, China
| | - Chang-Qing Yu
- School of Information Engineering, Xijing University, Xi'an, China.
| | - Zhu-Hong You
- School of Computer Science, Northwestern Polytechnical University, Xi'an, China.
| | - Yan Qiao
- College of Agriculture and Forestry, Longdong University, Qingyang, China
| | - Zheng-Wei Li
- School of Computer Science and Technology, China University of Mining and Technology, Xuzhou, China
| | - Wen-Zhun Huang
- School of Information Engineering, Xijing University, Xi'an, China
|
32
|
Liufu Y, Xi F, Wu L, Zhang Z, Wang H, Wang H, Zhang J, Wang B, Kou W, Gao J, Zhao L, Zhang H, Gu L. Inhibition of DNA and RNA methylation disturbs root development of moso bamboo. TREE PHYSIOLOGY 2023; 43:1653-1674. [PMID: 37294626 DOI: 10.1093/treephys/tpad074] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Received: 02/04/2023] [Revised: 04/25/2023] [Accepted: 06/03/2023] [Indexed: 06/11/2023]
Abstract
DNA methylation (5mC) and N6-methyladenosine (m6A) are two important epigenetic regulators, which have a profound impact on plant growth and development. Phyllostachys edulis (P. edulis) is one of the fastest spreading plants due to its well-developed root system. However, the association between 5mC and m6A has seldom been reported in P. edulis. In particular, the connection between m6A and several post-transcriptional regulators remains uncharacterized in P. edulis. Here, our morphological and electron microscope observations showed a phenotype of increased lateral roots under RNA methylation inhibitor (DZnepA) and DNA methylation inhibitor (5-azaC) treatment. RNA epitranscriptome analysis based on Nanopore direct RNA sequencing revealed that DZnepA treatment significantly decreased the m6A level in the 3'-untranslated region (3'-UTR), which was accompanied by increased gene expression, full-length ratio, higher proximal poly(A) site usage and shorter poly(A) tail length. DNA methylation levels of CG and CHG were reduced in both coding sequences and transposable elements upon 5-azaC treatment. Cell wall synthesis was impaired under methylation inhibition. In particular, differentially expressed genes showed a high percentage of overlap between DZnepA and 5-azaC treatment, which suggested a potential correlation between the two methylations. This study provides preliminary information for a better understanding of the link between m6A and 5mC in root development of moso bamboo.
Affiliation(s)
- Yuxiang Liufu
- College of Forestry, Basic Forestry and Proteomics Research Center, Fujian Agriculture and Forestry University, No. 15 Shangxiadian Road, Cangshan District, Fuzhou City, Fujian Province 350002, China
| | - Feihu Xi
- College of Life Science, Fujian Agriculture and Forestry University, Fuzhou 350002, China
| | - Lin Wu
- College of Forestry, Basic Forestry and Proteomics Research Center, Fujian Agriculture and Forestry University, No. 15 Shangxiadian Road, Cangshan District, Fuzhou City, Fujian Province 350002, China
| | - Zeyu Zhang
- College of Forestry, Basic Forestry and Proteomics Research Center, Fujian Agriculture and Forestry University, No. 15 Shangxiadian Road, Cangshan District, Fuzhou City, Fujian Province 350002, China
| | - Huihui Wang
- College of Forestry, Basic Forestry and Proteomics Research Center, Fujian Agriculture and Forestry University, No. 15 Shangxiadian Road, Cangshan District, Fuzhou City, Fujian Province 350002, China
| | - Huiyuan Wang
- College of Forestry, Basic Forestry and Proteomics Research Center, Fujian Agriculture and Forestry University, No. 15 Shangxiadian Road, Cangshan District, Fuzhou City, Fujian Province 350002, China
| | - Jun Zhang
- College of Forestry, Basic Forestry and Proteomics Research Center, Fujian Agriculture and Forestry University, No. 15 Shangxiadian Road, Cangshan District, Fuzhou City, Fujian Province 350002, China
| | - Baijie Wang
- College of Forestry, Basic Forestry and Proteomics Research Center, Fujian Agriculture and Forestry University, No. 15 Shangxiadian Road, Cangshan District, Fuzhou City, Fujian Province 350002, China
| | - Wenjing Kou
- College of Forestry, Basic Forestry and Proteomics Research Center, Fujian Agriculture and Forestry University, No. 15 Shangxiadian Road, Cangshan District, Fuzhou City, Fujian Province 350002, China
| | - Jian Gao
- Key Laboratory of Bamboo and Rattan Science and Technology, State Forestry Administration, International Center for Bamboo and Rattan, Beijing 100102, China
| | - Liangzhen Zhao
- College of Life Science, Fujian Agriculture and Forestry University, Fuzhou 350002, China
| | - Hangxiao Zhang
- Key Laboratory of Bamboo and Rattan Science and Technology, State Forestry Administration, International Center for Bamboo and Rattan, Beijing 100102, China
| | - Lianfeng Gu
- Basic Forestry and Proteomics Research Center, College of Forestry, School of Future Technology, Fujian Agriculture and Forestry University, No. 15 Shangxiadian Road, Cangshan District, Fuzhou City, Fujian Province 350002, China
|
33
|
Zhang M, Zhao J, Wu J, Wang Y, Zhuang M, Zou L, Mao R, Jiang B, Liu J, Song X. In-depth characterization and identification of translatable lncRNAs. Comput Biol Med 2023; 164:107243. [PMID: 37453378 DOI: 10.1016/j.compbiomed.2023.107243] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/30/2023] [Revised: 06/16/2023] [Accepted: 07/07/2023] [Indexed: 07/18/2023]
Abstract
Long non-coding RNAs (lncRNAs) are non-protein coding transcripts more than 200 nucleotides in length. Deep sequencing technologies have unveiled that lncRNAs can harbor translatable short open reading frames (sORFs). Yet the regulatory mechanisms governing lncRNA translation events remain poorly understood. Here, we exhaustively detected the sequence, functional element, and structure features relevant to lncRNA translation in humans. Extensive identification and analysis reveal that translatable lncRNAs contain richer protein-coding related sequence features, cap-dependent and cap-independent translation initiation mechanisms, and more stable secondary structures, as compared to untranslatable lncRNAs. These findings strongly support the view that lncRNAs serve as a repository for the production of new small peptides. Based on the fusion of features affecting translation and the extreme gradient boosting (XGBoost) algorithm, we developed the first computational tool dedicated to predicting translatable lncRNAs, named TransLncPred. Benchmark experimental results show that our method outperforms several state-of-the-art RNA coding potential prediction tools on the same training and testing datasets. The 100-time 10-fold cross-validation tests also demonstrate that regulatory element-derived features, especially N7-methylguanosine (m7G) and internal ribosome entry site (IRES), contribute to the improvement in predictive performance.
Affiliation(s)
- Meng Zhang
- Department of Biomedical Engineering, Nanjing University of Aeronautics and Astronautics, Nanjing, 211106, China
| | - Jian Zhao
- Department of Biomedical Engineering, Nanjing University of Aeronautics and Astronautics, Nanjing, 211106, China.
| | - Jing Wu
- School of Biomedical Engineering and Informatics, Nanjing Medical University, Nanjing, 211166, China
| | - Yulan Wang
- Department of Biomedical Engineering, Nanjing University of Aeronautics and Astronautics, Nanjing, 211106, China
| | - Minhui Zhuang
- Department of Biomedical Engineering, Nanjing University of Aeronautics and Astronautics, Nanjing, 211106, China
| | - Lingxiao Zou
- Department of Biomedical Engineering, Nanjing University of Aeronautics and Astronautics, Nanjing, 211106, China
| | - Renlong Mao
- Department of Biomedical Engineering, Nanjing University of Aeronautics and Astronautics, Nanjing, 211106, China
| | - Bin Jiang
- College of Automation Engineering, Nanjing University of Aeronautics and Astronautics, Nanjing, 211106, China
| | - Jingjing Liu
- Department of Biomedical Engineering, Nanjing University of Aeronautics and Astronautics, Nanjing, 211106, China
| | - Xiaofeng Song
- Department of Biomedical Engineering, Nanjing University of Aeronautics and Astronautics, Nanjing, 211106, China.
|
34
|
Dhakal P, Tayara H, Chong KT. An ensemble of stacking classifiers for improved prediction of miRNA-mRNA interactions. Comput Biol Med 2023; 164:107242. [PMID: 37473564 DOI: 10.1016/j.compbiomed.2023.107242] [Citation(s) in RCA: 6] [Impact Index Per Article: 3.0] [Received: 05/01/2023] [Revised: 06/21/2023] [Accepted: 07/07/2023] [Indexed: 07/22/2023]
Abstract
MicroRNAs (miRNAs) are small non-coding RNA molecules that play a crucial role in regulating gene expression at the post-transcriptional level by binding to potential target sites of messenger RNAs (mRNAs), facilitated by the Argonaute family of proteins. Selecting the conservative candidate target sites (CTS) is a challenging step, considering that most existing computational algorithms focus primarily on canonical site types, which is time-consuming and makes inefficient use of miRNA target site interactions. We developed a stacking classifier algorithm that addresses the CTS selection criteria using feature-encoding techniques that generate feature vectors, including k-mer nucleotide composition, dinucleotide composition, pseudo-nucleotide composition, and sequence order coupling. This stacking classifier algorithm surpassed previous state-of-the-art algorithms in predicting functional miRNA targets. We evaluated the performance of the proposed model on 10 independent test datasets and obtained an average accuracy of 79.77%, an improvement of 7.26% over previous models. This improvement shows that the proposed method has great potential for distinguishing highly functional miRNA targets and can serve as a valuable tool in biomedical and drug development research.
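Of the encodings listed in this abstract, k-mer nucleotide composition is the simplest to illustrate. The sketch below is an assumption about one common formulation (normalized counts of overlapping k-mers over a fixed `ACGU` alphabet), not the authors' code; the function name and the handling of `T` and of ambiguous bases are hypothetical choices. With `k=2` it yields a dinucleotide composition.

```python
from itertools import product

def kmer_composition(seq, k=2, alphabet="ACGU"):
    """Return normalized overlapping k-mer frequencies in lexicographic order.

    Assumptions: DNA-style 'T' is mapped to 'U', and windows containing
    characters outside the alphabet (e.g. 'N') are skipped.
    """
    seq = seq.upper().replace("T", "U")
    kmers = ["".join(p) for p in product(alphabet, repeat=k)]
    counts = dict.fromkeys(kmers, 0)
    total = 0
    for i in range(len(seq) - k + 1):
        window = seq[i:i + k]
        if window in counts:
            counts[window] += 1
            total += 1
    # A fixed-length vector (4**k entries) regardless of sequence length.
    return [counts[m] / total if total else 0.0 for m in kmers]
```

The fixed output length is what makes such an encoding convenient as input to conventional classifiers, since sequences of different lengths map to vectors of the same size.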
Affiliation(s)
- Priyash Dhakal
- Department of Electronics and Information Engineering, Jeonbuk National University, Jeonju-si, 54896, Jeollabuk-do, Republic of Korea
| | - Hilal Tayara
- School of International Engineering and Science, Jeonbuk National University, Jeonju-si, 54896, Jeollabuk-do, Republic of Korea.
| | - Kil To Chong
- Department of Electronics and Information Engineering, Jeonbuk National University, Jeonju-si, 54896, Jeollabuk-do, Republic of Korea; Advanced Electronics and Information Research Center, Jeonbuk National University, Jeonju-si, 54896, Jeollabuk-do, Republic of Korea.
|
35
|
Hidalgo M, Ramos C, Zolla G. Analysis of lncRNAs in Lupinus mutabilis (Tarwi) and Their Potential Role in Drought Response. Noncoding RNA 2023; 9:48. [PMID: 37736894 PMCID: PMC10514842 DOI: 10.3390/ncrna9050048] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/11/2023] [Revised: 08/01/2023] [Accepted: 08/16/2023] [Indexed: 09/23/2023] Open
Abstract
Lupinus mutabilis is a legume with high agronomic potential and available transcriptomic data for which lncRNAs have not been studied. Therefore, our objective was to identify, characterize, and validate the drought-responsive lncRNAs in L. mutabilis. To achieve this, we used a multilevel approach based on lncRNA prediction, annotation, subcellular location, thermodynamic characterization, structural conservation, and validation. Thus, 590 lncRNAs were identified by at least two algorithms of lncRNA identification. Annotation with the PLncDB database showed 571 lncRNAs unique to tarwi and 19 lncRNAs with homology in 28 botanical families including Solanaceae (19), Fabaceae (17), Brassicaceae (17), Rutaceae (17), Rosaceae (16), and Malvaceae (16), among others. In total, 12 lncRNAs had homology in more than 40 species. A total of 67% of lncRNAs were located in the cytoplasm and 33% in exosomes. Thermodynamic characterization of S03 showed a stable secondary structure with -105.67 kcal/mol. This structure included three regions, with a multibranch loop containing a hairpin with a SECIS-like element. Evaluation of the structural conservation by CROSSalign revealed partial similarities between L. mutabilis (S03) and S. lycopersicum (Solyc04r022210.1). RT-PCR validation demonstrated that S03 was upregulated in a drought-tolerant accession of L. mutabilis. Finally, these results highlighted the importance of lncRNAs in tarwi improvement under drought conditions.
Affiliation(s)
- Manuel Hidalgo
- Programa de Estudio de Medicina Humana, Universidad Privada Antenor Orrego, Av. América Sur 3145, Trujillo 13008, Peru; (M.H.); (C.R.)
| | - Cynthia Ramos
- Programa de Estudio de Medicina Humana, Universidad Privada Antenor Orrego, Av. América Sur 3145, Trujillo 13008, Peru; (M.H.); (C.R.)
| | - Gaston Zolla
- Laboratorio de Fisiología Molecular de Plantas del Programa de Cereales y Granos Nativos, Facultad de Agronomía, Universidad Nacional Agraria La Molina, Lima 12, Peru
|
36
|
Chiu KP, Stuart L, Ooi HS, Yu J, Smith DG, Pei KJC. Genome sequencing and application of Taiwanese macaque Macaca cyclopis. Sci Rep 2023; 13:11545. [PMID: 37460589 DOI: 10.1038/s41598-023-38402-4] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/09/2022] [Accepted: 07/07/2023] [Indexed: 07/20/2023] Open
Abstract
Formosan macaque (Macaca cyclopis) is the only non-human primate on Taiwan Island. We performed de novo hybrid assembly for M. cyclopis using Illumina paired-end short reads, mate-pair reads and Nanopore long reads and obtained 5065 contigs with an N50 of 2.66 megabases. M. cyclopis contigs ≥ 10 kb were assigned to chromosomes using Indian rhesus macaque (Macaca mulatta mulatta) genome assembly Mmul_10 as reference, resulting in a draft M. cyclopis genome of 2,846,042,475 bases, distributed in 21 chromosomes. The draft genome contains 23,462 transcriptional origins (genes), capable of expressing 716,231 exons in 59,484 transcripts. A genome-based phylogenetic study using the assembled M. cyclopis genome together with genomes of four other macaque species, human, orangutan and chimpanzee showed results similar to those previously reported. However, M. cyclopis was found to have diverged from Chinese M. mulatta lasiota about 1.8 million years ago. Fossil gene analysis detected the presence of gap and pol endogenous viral elements of simian retrovirus in all macaques tested, including M. fascicularis, M. m. mulatta and M. cyclopis. However, M. cyclopis showed ~2 times fewer such elements, with more uniform chromosomal locations. The constraint on foreign genome disturbance, presumably due to geographical isolation, should simplify genomics-related investigations, making M. cyclopis an ideal primate species for medical research.
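For readers unfamiliar with the N50 statistic quoted in this abstract, the sketch below shows the standard definition: the length of the shortest contig in the smallest set of longest contigs that together cover at least half of the total assembly length. The function name and the toy contig lengths are illustrative and are not taken from the paper.

```python
def n50(contig_lengths):
    """Return the N50 of a collection of contig lengths.

    Contigs are visited from longest to shortest; N50 is the length of the
    contig at which the running total first reaches half of the assembly.
    """
    total = sum(contig_lengths)
    running = 0
    for length in sorted(contig_lengths, reverse=True):
        running += length
        if running * 2 >= total:
            return length
    raise ValueError("contig_lengths must not be empty")
```

For example, for toy contigs of lengths 2 through 10 (total 54), the three longest contigs (10, 9, 8) already cover 27 bases, so the N50 is 8.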
Affiliation(s)
- Kuo-Ping Chiu
- Genomics Research Center, Academia Sinica, Taipei, Taiwan.
- Top Science Biotechnologies, Inc., 4F, 50-2 Dingping Rd., Sec. 1, Shiding District, New Taipei City, 223002, Taiwan.
| | - Lutimba Stuart
- Top Science Biotechnologies, Inc., 4F, 50-2 Dingping Rd., Sec. 1, Shiding District, New Taipei City, 223002, Taiwan
| | - Hong Sain Ooi
- Top Science Biotechnologies, Inc., 4F, 50-2 Dingping Rd., Sec. 1, Shiding District, New Taipei City, 223002, Taiwan
| | - John Yu
- Institute of Stem Cell and Translational Cancer Research, Chang Gung Memorial Hospital at Linkou, No.5, Fu-Shin St., Kuei Shang, Taoyuan, 333, Taiwan
| | - David Glenn Smith
- Department of Anthropology, University of California Davis, Davis, CA, USA
| | - Kurtis Jai-Chyi Pei
- Institute of Wildlife Conservation, College of Veterinary Medicine, National Pingtung University of Science and Technology, Pingtung, Taiwan
|
37
|
Deng L, Jiang Y, Hu X, Zheng R, Huang Z, Zhang J. ABLNCPP: Attention Mechanism-Based Bidirectional Long Short-Term Memory for Noncoding RNA Coding Potential Prediction. J Chem Inf Model 2023; 63:3955-3966. [PMID: 37294848 DOI: 10.1021/acs.jcim.3c00366] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/11/2023]
Abstract
With the continuous development of ribosome profiling, sequencing technology, and proteomics, evidence is mounting that noncoding RNA (ncRNA) may be a novel source of peptides or proteins. These peptides and proteins play crucial roles in inhibiting tumor progression and interfering with cancer metabolism and other essential physiological processes. Therefore, identifying ncRNAs with coding potential is vital to ncRNA functional research. However, while existing studies perform well in classifying ncRNAs and mRNAs, no study has explicitly addressed whether ncRNA transcripts have coding potential. For this reason, we propose an attention mechanism-based bidirectional LSTM network called ABLNCPP to assess the coding possibility of ncRNA sequences. Considering the sequential information loss in previous methods, we introduce a novel nonoverlapping trinucleotide embedding (NOLTE) method for ncRNAs to obtain embeddings containing sequential features. Extensive evaluations show that ABLNCPP outperforms other state-of-the-art models. In general, ABLNCPP overcomes the bottleneck of ncRNA coding potential prediction and is expected to provide valuable contributions to cancer discovery and treatment in the future. The source code and data sets are freely available at https://github.com/YinggggJ/ABLNCPP.
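The abstract does not spell out how NOLTE is computed, so the following is only a sketch of the tokenization step that a nonoverlapping trinucleotide embedding would plausibly start from: splitting a sequence into consecutive, non-overlapping 3-mers and mapping each to an integer index that an embedding layer could consume. The vocabulary ordering, the reserved index 0, the fixed reading frame starting at position 0, and the dropping of a trailing partial trinucleotide are all assumptions.

```python
from itertools import product

# Hypothetical vocabulary: 64 trinucleotides indexed 1..64; 0 is reserved
# for padding and for tokens containing unknown characters.
VOCAB = {"".join(p): i + 1 for i, p in enumerate(product("ACGU", repeat=3))}

def nonoverlapping_trinucleotides(seq):
    """Split seq into consecutive 3-mers, dropping any trailing 1-2 bases."""
    seq = seq.upper().replace("T", "U")
    usable = len(seq) - len(seq) % 3
    return [seq[i:i + 3] for i in range(0, usable, 3)]

def encode(seq, max_tokens=None):
    """Map a sequence to token indices, optionally truncated/padded to max_tokens."""
    ids = [VOCAB.get(tok, 0) for tok in nonoverlapping_trinucleotides(seq)]
    if max_tokens is not None:
        ids = ids[:max_tokens] + [0] * (max_tokens - len(ids))
    return ids
```

Compared with overlapping k-mers, non-overlapping tokens shorten the input by a factor of three and keep each base in exactly one token, which is one way to preserve order information for a recurrent model.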
Affiliation(s)
- Lei Deng
- School of Computer Science and Engineering, Central South University, Changsha 410018, China
| | - Ying Jiang
- School of Computer Science and Engineering, Central South University, Changsha 410018, China
| | - Xiaowen Hu
- School of Computer Science and Engineering, Central South University, Changsha 410018, China
| | - Rongtao Zheng
- School of Computer Science and Engineering, Central South University, Changsha 410018, China
| | - Zhijian Huang
- School of Computer Science and Engineering, Central South University, Changsha 410018, China
| | - Jingpu Zhang
- School of Computer and Data Science, Henan University of Urban Construction, Pingdingshan 467000, China
|
38
|
Pronozin AY, Afonnikov DA. ICAnnoLncRNA: A Snakemake Pipeline for a Long Non-Coding-RNA Search and Annotation in Transcriptomic Sequences. Genes (Basel) 2023; 14:1331. [PMID: 37510236 PMCID: PMC10379598 DOI: 10.3390/genes14071331] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/28/2023] [Revised: 06/09/2023] [Accepted: 06/21/2023] [Indexed: 07/30/2023] Open
Abstract
Long non-coding RNAs (lncRNAs) are RNA molecules longer than 200 nucleotides that do not encode proteins. Experimental studies have shown the diversity and importance of lncRNA functions in plants. To expand knowledge about lncRNAs in other species, computational pipelines that allow for standardised data-processing steps in a mode that does not require user control up until the final result were actively developed recently. These advancements enable wider functionality for lncRNA data identification and analysis. In the present work, we propose the ICAnnoLncRNA pipeline for the automatic identification, classification and annotation of plant lncRNAs in assembled transcriptomic sequences. It uses the LncFinder software for the identification of lncRNAs and allows the adjustment of recognition parameters using genomic data for which lncRNA annotation is available. The pipeline allows the prediction of lncRNA candidates, alignment of lncRNA sequences to the reference genome, filtering of erroneous/noise transcripts and probable transposable elements, lncRNA classification by genome location, comparison with sequences from external databases and analysis of lncRNA structural features and expression. We used transcriptomic sequences from 15 maize libraries assembled by Trinity and Hisat2/StringTie to demonstrate the application of the ICAnnoLncRNA pipeline.
Affiliation(s)
- Artem Yu Pronozin
- Institute of Cytology and Genetics, Siberian Branch of Russian Academy of Sciences, 630090 Novosibirsk, Russia
- Kurchatov Center for Genome Research, Institute of Cytology and Genetics, Siberian Branch of Russian Academy of Sciences, 630090 Novosibirsk, Russia
- Faculty of Natural Sciences, Novosibirsk State University, 630090 Novosibirsk, Russia
| | - Dmitry A Afonnikov
- Institute of Cytology and Genetics, Siberian Branch of Russian Academy of Sciences, 630090 Novosibirsk, Russia
- Kurchatov Center for Genome Research, Institute of Cytology and Genetics, Siberian Branch of Russian Academy of Sciences, 630090 Novosibirsk, Russia
- Faculty of Natural Sciences, Novosibirsk State University, 630090 Novosibirsk, Russia
|
39
|
Gao H, Gao P, Ye N. Prelnc2: A prediction tool for lncRNAs with enhanced multi-level features of RNAs. PLoS One 2023; 18:e0286377. [PMID: 37262050 DOI: 10.1371/journal.pone.0286377] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/13/2023] [Accepted: 05/15/2023] [Indexed: 06/03/2023] Open
Abstract
Long non-coding RNAs (lncRNAs) have been widely studied for their important biological significance. In general, we need to distinguish them from protein coding RNAs (pcRNAs) with similar functions. Based on various strategies, algorithms and tools have been designed and developed to train and validate such classification capabilities. However, many of them lack scalability and versatility and rely heavily on genome annotation. In this paper, we design a convenient and biologically meaningful classification tool, PreLnc2, which uses multi-scale position and frequency information from the wavelet transform spectrum and generalizes the frequency statistics method. Finally, we used the extracted features and auxiliary features together to train the model and verified it with test data. PreLnc2 achieved 93.2% accuracy for animal and plant transcripts, outperforming PreLnc by 2.1%. Our method provides an effective alternative for the prediction of lncRNAs.
Affiliation(s)
- Hua Gao
- College of Forestry, Nanjing Forestry University, Nanjing, China
- College of Information Science and Technology, Nanjing Forestry University, Nanjing, China
| | - Peng Gao
- The First Affiliated Hospital of Xi'an Jiaotong University, Xi'an, China
| | - Ning Ye
- College of Forestry, Nanjing Forestry University, Nanjing, China
- College of Information Science and Technology, Nanjing Forestry University, Nanjing, China
|
40
|
Palos K, Yu L, Railey CE, Nelson Dittrich AC, Nelson ADL. Linking discoveries, mechanisms, and technologies to develop a clearer perspective on plant long noncoding RNAs. THE PLANT CELL 2023; 35:1762-1786. [PMID: 36738093 PMCID: PMC10226578 DOI: 10.1093/plcell/koad027] [Citation(s) in RCA: 33] [Impact Index Per Article: 16.5] [Received: 09/02/2022] [Revised: 12/19/2022] [Accepted: 12/22/2022] [Indexed: 05/30/2023]
Abstract
Long noncoding RNAs (lncRNAs) are a large and diverse class of genes in eukaryotic genomes that contribute to a variety of regulatory processes. Functionally characterized lncRNAs play critical roles in plants, ranging from regulating flowering to controlling lateral root formation. However, findings from the past decade have revealed that thousands of lncRNAs are present in plant transcriptomes, and characterization has lagged far behind identification. In this setting, distinguishing function from noise is challenging. However, the plant community has been at the forefront of discovery in lncRNA biology, providing many functional and mechanistic insights that have increased our understanding of this gene class. In this review, we examine the key discoveries and insights made in plant lncRNA biology over the past two and a half decades. We describe how discoveries made in the pregenomics era have informed efforts to identify and functionally characterize lncRNAs in the subsequent decades. We provide an overview of the functional archetypes into which characterized plant lncRNAs fit and speculate on new avenues of research that may uncover yet more archetypes. Finally, this review discusses the challenges facing the field and some exciting new molecular and computational approaches that may help inform lncRNA comparative and functional analyses.
Affiliation(s)
- Kyle Palos
  - Boyce Thompson Institute, Cornell University, Ithaca, NY 14853, USA
- Li’ang Yu
  - Boyce Thompson Institute, Cornell University, Ithaca, NY 14853, USA
- Caylyn E Railey
  - Boyce Thompson Institute, Cornell University, Ithaca, NY 14853, USA
  - Plant Biology Graduate Field, Cornell University, Ithaca, NY 14853, USA
|
41
|
Liu Z, Lan P, Liu T, Liu X, Liu T. m6Aminer: Predicting the m6Am Sites on mRNA by Fusing Multiple Sequence-Derived Features into a CatBoost-Based Classifier. Int J Mol Sci 2023; 24:ijms24097878. [PMID: 37175594 PMCID: PMC10177809 DOI: 10.3390/ijms24097878] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/04/2023] [Revised: 04/20/2023] [Accepted: 04/24/2023] [Indexed: 05/15/2023]
Abstract
As one of the most important post-transcriptional modifications, m6Am plays a fairly important role in conferring mRNA stability and in the progression of cancers. The accurate identification of the m6Am sites is critical for explaining its biological significance and developing its application in the medical field. However, conventional experimental approaches are time-consuming and expensive, making them unsuitable for the large-scale identification of the m6Am sites. To address this challenge, we exploit a CatBoost-based method, m6Aminer, to identify the m6Am sites on mRNA. For feature extraction, nine different feature-encoding schemes (pseudo electron-ion interaction potential, hash decimal conversion method, dinucleotide binary encoding, nucleotide chemical properties, pseudo k-tuple composition, dinucleotide numerical mapping, K monomeric units, series correlation pseudo trinucleotide composition, and K-spaced nucleotide pair frequency) were utilized to form the initial feature space. To obtain the optimized feature subset, the ExtraTreesClassifier algorithm was adopted to perform feature importance ranking, and the top 300 features were selected as the optimal feature subset. With different performance assessment methods, 10-fold cross-validation and independent test, m6Aminer achieved average AUC of 0.913 and 0.754, demonstrating a competitive performance with the state-of-the-art models m6AmPred (0.905 and 0.735) and DLm6Am (0.897 and 0.730). The prediction model developed in this study can be used to identify the m6Am sites in the whole transcriptome, laying a foundation for the functional research of m6Am.
Affiliation(s)
- Ze Liu
  - College of Water Resources and Architectural Engineering, Northwest A&F University, Xianyang 712100, China
- Pengfei Lan
  - College of Water Resources and Architectural Engineering, Northwest A&F University, Xianyang 712100, China
- Ting Liu
  - College of Water Resources and Architectural Engineering, Northwest A&F University, Xianyang 712100, China
  - Department of Mechanical Engineering, Faculty of Engineering, The University of Hong Kong, Hong Kong 999077, China
- Xudong Liu
  - College of Water Resources and Architectural Engineering, Northwest A&F University, Xianyang 712100, China
  - College of Control Science and Engineering, Zhejiang University, Hangzhou 310027, China
- Tao Liu
  - College of Water Resources and Architectural Engineering, Northwest A&F University, Xianyang 712100, China
  - Key Laboratory of Agricultural Soil and Water Engineering in Arid and Semiarid Areas, Ministry of Education, Northwest A&F University, Xianyang 712100, China
|
42
|
Kruse E, Göringer HU. Nanopore-Based Direct RNA Sequencing of the Trypanosoma brucei Transcriptome Identifies Novel lncRNAs. Genes (Basel) 2023; 14:genes14030610. [PMID: 36980882 PMCID: PMC10048164 DOI: 10.3390/genes14030610] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Received: 01/29/2023] [Revised: 02/23/2023] [Accepted: 02/26/2023] [Indexed: 03/04/2023]
Abstract
Trypanosomatids are single-cell eukaryotic parasites. Unlike higher eukaryotes, they control gene expression post-transcriptionally and not at the level of transcription initiation. This involves all known cellular RNA circuits, from mRNA processing to mRNA decay, to translation, in addition to a large panel of RNA-interacting proteins that modulate mRNA abundance. However, other forms of gene regulation, for example by lncRNAs, cannot be excluded. LncRNAs are poorly studied in trypanosomatids, with only a single lncRNA characterized to date. Furthermore, it is not clear whether the complete inventory of trypanosomatid lncRNAs is known, because of the inherent cDNA-recoding and DNA-amplification limitations of short-read RNA sequencing. Here, we overcome these limitations by using long-read direct RNA sequencing (DRS) on nanopore arrays. We analyze the native RNA pool of the two main lifecycle stages of the African trypanosome Trypanosoma brucei, with a special emphasis on the inventory of lncRNAs. We identify 207 previously unknown lncRNAs, 32 of which are stage-specifically expressed. We also present insights into the complexity of the T. brucei transcriptome, including alternative transcriptional start and stop sites and potential transcript isoforms, to provide a bias-free understanding of the intricate RNA landscape in T. brucei.
|
43
|
Long Non-Coding RNAs of Plants in Response to Abiotic Stresses and Their Regulating Roles in Promoting Environmental Adaption. Cells 2023; 12:cells12050729. [PMID: 36899864 PMCID: PMC10001313 DOI: 10.3390/cells12050729] [Citation(s) in RCA: 25] [Impact Index Per Article: 12.5] [Received: 01/09/2023] [Revised: 02/10/2023] [Accepted: 02/21/2023] [Indexed: 03/03/2023]
Abstract
Abiotic stresses triggered by climate change and human activity cause substantial agricultural and environmental problems which hamper plant growth. Plants have evolved sophisticated mechanisms in response to abiotic stresses, such as stress perception, epigenetic modification, and regulation of transcription and translation. Over the past decade, a large body of literature has revealed the various regulatory roles of long non-coding RNAs (lncRNAs) in the plant response to abiotic stresses and their irreplaceable functions in environmental adaptation. LncRNAs are recognized as a class of ncRNAs that are longer than 200 nucleotides, influencing a variety of biological processes. In this review, we mainly focused on the recent progress of plant lncRNAs, outlining their features, evolution, and functions of plant lncRNAs in response to drought, low or high temperature, salt, and heavy metal stress. The approaches to characterize the function of lncRNAs and the mechanisms of how they regulate plant responses to abiotic stresses were further reviewed. Moreover, we discuss the accumulating discoveries regarding the biological functions of lncRNAs on plant stress memory as well. The present review provides updated information and directions for us to characterize the potential functions of lncRNAs in abiotic stresses in the future.
|
44
|
Feng H, Wang S, Wang Y, Ni X, Yang Z, Hu X, Yang S. LncCat: An ORF attention model to identify LncRNA based on ensemble learning strategy and fused sequence information. Comput Struct Biotechnol J 2023; 21:1433-1447. [PMID: 36824229 PMCID: PMC9941877 DOI: 10.1016/j.csbj.2023.02.012] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 08/11/2022] [Revised: 02/06/2023] [Accepted: 02/06/2023] [Indexed: 02/10/2023]
Abstract
Background: Long non-coding RNA (lncRNA) is one of the most essential forms of transcripts, playing crucial regulatory roles in the development of cancers and diseases without protein-coding ability. It was assumed that short ORFs (sORFs) in lncRNA were weak to translate proteins. However, recent research has shown that sORFs can encode peptides, which increases the difficulty to identify lncRNA. Therefore, identifying lncRNAs with sORFs facilitates finding novel regulatory factors. Results: In this paper, we propose LncCat for identifying lncRNA based on category boosting (CatBoost) and ORF-attention features. LncCat combines five types of features to encode transcript sequences and employs CatBoost to build a prediction model. In addition, the visualization comparison reveals that the ORF-attention features between lncRNAs and protein-coding transcripts are significantly distinct. The comparison results show that LncCat outperforms competing methods on several benchmark datasets. For Matthews correlation coefficient (MCC), LncCat achieves 0.9503, 0.9219, 0.8591, 0.8672, and 0.9047 on the human, mouse, zebrafish, wheat, and chicken datasets, with improvements ranging from 1.90% to 7.82%, 1.49-17.63%, 6.11-21.50%, 3.02-51.64% and 5.35-26.90%, respectively. Moreover, LncCat dramatically improves the MCC by at least 11.90%, 12.96% and 42.61% on sORF test datasets of human, mouse, and zebrafish, respectively. Conclusions: Experiments indicate that LncCat performs better both on long ORF and sORF datasets, and ORF-attention features show positive effects on predicting lncRNA. In brief, LncCat is a reliable method for identifying lncRNA. Additionally, a user-friendly web server is developed for academics at http://cczubio.top/lnccat.
Affiliation(s)
- Hongqi Feng
  - School of Computer Science and Artificial Intelligence Aliyun School of Big Data School of Software, Changzhou University, Changzhou 213164, China
- Shaocong Wang
  - School of Computer Science and Artificial Intelligence Aliyun School of Big Data School of Software, Changzhou University, Changzhou 213164, China
- Yan Wang
  - Key Laboratory of Symbol Computation and Knowledge Engineering of Ministry of Education, College of Computer Science and Technology, Jilin University, Changchun 130012, China
  - School of Artificial Intelligence, Jilin University, Changchun 130012, China
- Xinye Ni
  - The Affiliated Changzhou No.2 People’s Hospital of Nanjing Medical University, Changzhou 213164, China
- Zexi Yang
  - School of Computer Science and Artificial Intelligence Aliyun School of Big Data School of Software, Changzhou University, Changzhou 213164, China
- Xuemei Hu
  - Key Laboratory of Symbol Computation and Knowledge Engineering of Ministry of Education, College of Computer Science and Technology, Jilin University, Changchun 130012, China
- Sen Yang
  - School of Computer Science and Artificial Intelligence Aliyun School of Big Data School of Software, Changzhou University, Changzhou 213164, China
  - The Affiliated Changzhou No.2 People’s Hospital of Nanjing Medical University, Changzhou 213164, China
|
45
|
Wang Y, Wang X, Cui X, Meng J, Rong R. Self-attention enabled deep learning of dihydrouridine (D) modification on mRNAs unveiled a distinct sequence signature from tRNAs. Molecular Therapy. Nucleic Acids 2023; 31:411-420. [PMID: 36845339 PMCID: PMC9945750 DOI: 10.1016/j.omtn.2023.01.014] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Received: 09/22/2022] [Accepted: 01/23/2023] [Indexed: 01/28/2023]
Abstract
Dihydrouridine (D) is a modified pyrimidine nucleotide universally found in viral, prokaryotic, and eukaryotic species. It serves as a metabolic modulator for various pathological conditions, and its elevated levels in tumors are associated with a series of cancers. Precise identification of D sites on RNA is vital for understanding its biological function. A number of computational approaches have been developed for predicting D sites on tRNAs; however, none have considered mRNAs. We present here DPred, the first computational tool for predicting D on mRNAs in yeast from the primary RNA sequences. Built on a local self-attention layer and a convolutional neural network (CNN) layer, the proposed deep learning model outperformed classic machine learning approaches (random forest, support vector machines, etc.) and achieved reasonable accuracy and reliability with areas under the curve of 0.9166 and 0.9027 in jackknife cross-validation and on an independent testing dataset, respectively. Importantly, we showed that distinct sequence signatures are associated with the D sites on mRNAs and tRNAs, implying potentially different formation mechanisms and putative divergent functionality of this modification on the two types of RNA. DPred is available as a user-friendly Web server.
Affiliation(s)
- Yue Wang
  - Department of Mathematical Sciences, Xi’an Jiaotong-Liverpool University, Suzhou, Jiangsu 215123, China
  - Department of Computer Science, University of Liverpool, L69 7ZB Liverpool, UK
- Xuan Wang
  - Department of Biological Sciences, Xi’an Jiaotong-Liverpool University, Suzhou, Jiangsu 215123, China
- Xiaodong Cui
  - School of Marine Science and Technology, Northwestern Polytechnical University, Xi’an, Shaanxi 710072, China
- Jia Meng
  - Department of Biological Sciences, Xi’an Jiaotong-Liverpool University, Suzhou, Jiangsu 215123, China
  - AI University Research Centre, Xi’an Jiaotong-Liverpool University, Suzhou, Jiangsu 215123, China
  - Institute of Systems, Molecular and Integrative Biology, University of Liverpool, L69 7ZB Liverpool, UK
- Rong Rong
  - Department of Biological Sciences, Xi’an Jiaotong-Liverpool University, Suzhou, Jiangsu 215123, China
  - Corresponding author: Rong Rong, Department of Biological Sciences, Xi’an Jiaotong-Liverpool University, Suzhou, Jiangsu 215123, China.
|
46
|
Liu T, Zou B, He M, Hu Y, Dou Y, Cui T, Tan P, Li S, Rao S, Huang Y, Liu S, Cai K, Wang D. LncReader: identification of dual functional long noncoding RNAs using a multi-head self-attention mechanism. Brief Bioinform 2023; 24:6961607. [PMID: 36575567 DOI: 10.1093/bib/bbac579] [Citation(s) in RCA: 6] [Impact Index Per Article: 3.0] [Received: 08/15/2022] [Revised: 11/11/2022] [Accepted: 11/28/2022] [Indexed: 12/29/2022]
Abstract
Long noncoding ribonucleic acids (RNAs; LncRNAs) endowed with both protein-coding and noncoding functions are referred to as 'dual functional lncRNAs'. Recently, dual functional lncRNAs have been intensively studied and identified as involved in various fundamental cellular processes. However, apart from time-consuming and cell-type-specific experiments, there is virtually no in silico method for predicting the identity of dual functional lncRNAs. Here, we developed a deep-learning model with a multi-head self-attention mechanism, LncReader, to identify dual functional lncRNAs. Our data demonstrated that LncReader showed multiple advantages compared to various classical machine learning methods using benchmark datasets from our previously reported cncRNAdb project. Moreover, to obtain independent in-house datasets for robust testing, mass spectrometry proteomics combined with RNA-seq and Ribo-seq were applied in four leukaemia cell lines, which further confirmed that LncReader achieved the best performance compared to other tools. Therefore, LncReader provides an accurate and practical tool that enables fast dual functional lncRNA identification.
Affiliation(s)
- Tianyuan Liu
  - Department of Hematology and Oncology, Shenzhen Children's Hospital, Shenzhen 518038, China
  - Department of Bioinformatics, School of Basic Medical Sciences, Southern Medical University, Guangzhou 510515, China
- Bohao Zou
  - Department of Bioinformatics, School of Basic Medical Sciences, Southern Medical University, Guangzhou 510515, China
  - Department of Statistics, University of California Davis, Davis, California, USA
- Manman He
  - State Key Laboratory of Medical Molecular Biology, Key Laboratory of RNA Regulation and Hematopoiesis, Department of Biochemistry and Molecular Biology, Institute of Basic Medical Sciences, School of Basic Medicine, CAMS and Peking Union Medical College, Beijing 100005, China
- Yongfei Hu
  - Department of Bioinformatics, School of Basic Medical Sciences, Southern Medical University, Guangzhou 510515, China
  - Dermatology Hospital, Southern Medical University, Guangzhou, 510091, China
- Yiying Dou
  - Department of Bioinformatics, School of Basic Medical Sciences, Southern Medical University, Guangzhou 510515, China
- Tianyu Cui
  - Department of Bioinformatics, School of Basic Medical Sciences, Southern Medical University, Guangzhou 510515, China
- Puwen Tan
  - Department of Bioinformatics, School of Basic Medical Sciences, Southern Medical University, Guangzhou 510515, China
- Shaobin Li
  - Department of Thoracic Surgery, Nanfang Hospital, Southern Medical University, Guangzhou 510515, China
- Shuan Rao
  - Department of Thoracic Surgery, Nanfang Hospital, Southern Medical University, Guangzhou 510515, China
- Yan Huang
  - Cancer Research Institute, School of Basic Medical Sciences, Southern Medical University, Guangzhou 510515, China
- Sixi Liu
  - Department of Hematology and Oncology, Shenzhen Children's Hospital, Shenzhen 518038, China
- Kaican Cai
  - Department of Thoracic Surgery, Nanfang Hospital, Southern Medical University, Guangzhou 510515, China
- Dong Wang
  - Department of Bioinformatics, School of Basic Medical Sciences, Southern Medical University, Guangzhou 510515, China
  - Dermatology Hospital, Southern Medical University, Guangzhou, 510091, China
  - Department of Bioinformatics, Fujian Key Laboratory of Medical Bioinformatics, School of Medical Technology and Engineering, Fujian Medical University, Fuzhou, 350122, China
|
47
|
Chen JW, Shrestha L, Green G, Leier A, Marquez-Lago TT. The hitchhikers' guide to RNA sequencing and functional analysis. Brief Bioinform 2023; 24:bbac529. [PMID: 36617463 PMCID: PMC9851315 DOI: 10.1093/bib/bbac529] [Citation(s) in RCA: 10] [Impact Index Per Article: 5.0] [Received: 07/11/2022] [Revised: 10/18/2022] [Accepted: 11/07/2022] [Indexed: 01/10/2023]
Abstract
DNA and RNA sequencing technologies have revolutionized biology and biomedical sciences, sequencing full genomes and transcriptomes at very high speeds and reasonably low costs. RNA sequencing (RNA-Seq) enables transcript identification and quantification, but once sequencing has concluded researchers can be easily overwhelmed with questions such as how to go from raw data to differential expression (DE), pathway analysis and interpretation. Several pipelines and procedures have been developed to this effect. Even though there is no unique way to perform RNA-Seq analysis, it usually follows these steps: 1) raw reads quality check, 2) alignment of reads to a reference genome, 3) aligned reads' summarization according to an annotation file, 4) DE analysis and 5) gene set analysis and/or functional enrichment analysis. Each step requires researchers to make decisions, and the wide variety of options and resulting large volumes of data often lead to interpretation challenges. There also seems to be insufficient guidance on how best to obtain relevant information and derive actionable knowledge from transcription experiments. In this paper, we explain RNA-Seq steps in detail and outline differences and similarities of different popular options, as well as advantages and disadvantages. We also discuss non-coding RNA analysis, multi-omics, meta-transcriptomics and the use of artificial intelligence methods complementing the arsenal of tools available to researchers. Lastly, we perform a complete analysis from raw reads to DE and functional enrichment analysis, visually illustrating how results are not absolute truths and how algorithmic decisions can greatly impact results and interpretation.
Affiliation(s)
- Jiung-Wen Chen
  - Department of Biology, University of Alabama at Birmingham, Birmingham, AL, USA
- Lisa Shrestha
  - Department of Genetics, University of Alabama at Birmingham, School of Medicine, Birmingham, AL, USA
- George Green
  - Department of Biology, University of Alabama at Birmingham, Birmingham, AL, USA
- André Leier
  - Department of Genetics, University of Alabama at Birmingham, School of Medicine, Birmingham, AL, USA
  - Department of Cell, Developmental and Integrative Biology, University of Alabama at Birmingham, School of Medicine, Birmingham, AL, USA
- Tatiana T Marquez-Lago
  - Department of Genetics, University of Alabama at Birmingham, School of Medicine, Birmingham, AL, USA
  - Department of Cell, Developmental and Integrative Biology, University of Alabama at Birmingham, School of Medicine, Birmingham, AL, USA
  - Department of Microbiology, University of Alabama at Birmingham, School of Medicine, Birmingham, AL, USA
|
48
|
Singh D, Roy J. A large-scale benchmark study of tools for the classification of protein-coding and non-coding RNAs. Nucleic Acids Res 2022; 50:12094-12111. [PMID: 36420898 PMCID: PMC9757047 DOI: 10.1093/nar/gkac1092] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/03/2022] [Revised: 10/22/2022] [Accepted: 10/28/2022] [Indexed: 11/27/2022]
Abstract
Identification of protein-coding and non-coding transcripts is paramount for understanding their biological roles. Computational approaches have been addressing this task for over a decade; however, generalized and high-performance models are still unreliable. This benchmark study assessed the performance of 24 tools producing >55 models on the datasets covering a wide range of species. We have collected 135 small and large transcriptomic datasets from existing studies for comparison and identified the potential bottlenecks hampering the performance of current tools. The key insights of this study include lack of standardized training sets, reliance on homogeneous training data, gradual changes in annotated data, lack of augmentation with homology searches, the presence of false positives and negatives in datasets and the lower performance of end-to-end deep learning models. We also derived a new dataset, RNAChallenge, from the benchmark considering hard instances that may include potential false alarms. The best and least well performing models under- and overfit the dataset, respectively, thereby serving a dual purpose. For computational approaches, it will be valuable to develop accurate and unbiased models. The identification of false alarms will be of interest for genome annotators, and experimental study of hard RNAs will help to untangle the complexity of the RNA world.
Affiliation(s)
- Dalwinder Singh
  - To whom correspondence should be addressed. Tel: +91 172 5221206;
- Joy Roy
  - Correspondence may also be addressed to Joy Roy.
|
49
|
Dindhoria K, Monga I, Thind AS. Computational approaches and challenges for identification and annotation of non-coding RNAs using RNA-Seq. Funct Integr Genomics 2022; 22:1105-1112. [DOI: 10.1007/s10142-022-00915-y] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/11/2022] [Revised: 11/04/2022] [Accepted: 11/04/2022] [Indexed: 11/22/2022]
|
50
|
Zhang H, Wang Y, Pan Z, Sun X, Mou M, Zhang B, Li Z, Li H, Zhu F. ncRNAInter: a novel strategy based on graph neural network to discover interactions between lncRNA and miRNA. Brief Bioinform 2022; 23:6747810. [PMID: 36198065 DOI: 10.1093/bib/bbac411] [Citation(s) in RCA: 23] [Impact Index Per Article: 7.7] [Received: 06/07/2022] [Revised: 08/04/2022] [Accepted: 08/23/2022] [Indexed: 12/14/2022]
Abstract
In recent years, many studies have illustrated the significant role that non-coding RNA (ncRNA) plays in biological activities, in which lncRNA, miRNA and especially their interactions have been proved to affect many biological processes. Some in silico methods have been proposed and applied to identify novel lncRNA-miRNA interactions (LMIs), but there are still imperfections in their RNA representation and information extraction approaches, which imply there is still room for further improving their performances. Meanwhile, only a few of them are accessible at present, which limits their practical applications. The construction of a new tool for LMI prediction is thus imperative for the better understanding of their relevant biological mechanisms. This study proposed a novel method, ncRNAInter, for LMI prediction. A comprehensive strategy for RNA representation and an optimized deep learning algorithm of graph neural network were utilized in this study. ncRNAInter was robust and showed better performance of 26.7% higher Matthews correlation coefficient than existing reputable methods for human LMI prediction. In addition, ncRNAInter proved its universal applicability in dealing with LMIs from various species and successfully identified novel LMIs associated with various diseases, which further verified its effectiveness and usability. All source code and datasets are freely available at https://github.com/idrblab/ncRNAInter.
Affiliation(s)
- Hanyu Zhang
  - College of Pharmaceutical Sciences, Zhejiang University, Hangzhou 310058, China
  - Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, Alibaba-Zhejiang University Joint Research Center of Future Digital Healthcare, Hangzhou 330110, China
- Yunxia Wang
  - College of Pharmaceutical Sciences, Zhejiang University, Hangzhou 310058, China
- Ziqi Pan
  - College of Pharmaceutical Sciences, Zhejiang University, Hangzhou 310058, China
- Xiuna Sun
  - College of Pharmaceutical Sciences, Zhejiang University, Hangzhou 310058, China
- Minjie Mou
  - College of Pharmaceutical Sciences, Zhejiang University, Hangzhou 310058, China
- Bing Zhang
  - Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, Alibaba-Zhejiang University Joint Research Center of Future Digital Healthcare, Hangzhou 330110, China
- Zhaorong Li
  - Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, Alibaba-Zhejiang University Joint Research Center of Future Digital Healthcare, Hangzhou 330110, China
- Honglin Li
  - School of Computer Science and Technology, East China Normal University, Shanghai 200062, China
  - Shanghai Key Laboratory of New Drug Design, East China University of Science and Technology, Shanghai 200237, China
- Feng Zhu
  - College of Pharmaceutical Sciences, Zhejiang University, Hangzhou 310058, China
  - Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, Alibaba-Zhejiang University Joint Research Center of Future Digital Healthcare, Hangzhou 330110, China
|