1
Magnus M, Gao W, Dutta N, Vicens Q, Rivas E. RNAhub - an automated pipeline to search and align RNA homologs with secondary structure assessment. Nucleic Acids Res 2025:gkaf342. [PMID: 40297999 DOI: 10.1093/nar/gkaf342] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/08/2025] [Revised: 04/10/2025] [Accepted: 04/16/2025] [Indexed: 04/30/2025] Open
Abstract
The complexity in the generation of RNA multiple sequence alignments and assessment of the accuracy of such alignments contributes to the challenges in the utilization of RNA MSAs in diverse integrative methods. RNAhub is a freely available user-friendly web server for a reliable generation of RNA multiple sequence alignments and the detection of the presence of structural RNA utilizing evolutionary information. This web-based tool, developed by the integration of existing computational approaches, takes an RNA sequence as input and automatically retrieves and aligns sequences homologous to the input (query) RNA sequence through an iterative and structure-agnostic approach. Based on the alignment, this tool statistically assesses whether the query RNA sequence has a conserved RNA structure using covariation analysis. The web server allows the user to efficiently search the sequence of interest against carefully curated, ready-to-use genomic databases to produce an MSA. Using this alignment, our tool either detects the presence of a conserved structural RNA, finds evidence against the presence of a conserved structure, or cannot make any assessment due to a lack of sequence diversity in the alignment. The web server is freely available at https://rnahub.org.
Affiliation(s)
- Marcin Magnus
  - Department of Molecular and Cellular Biology, Harvard University, Cambridge, MA 02138, United States
- William Gao
  - Department of Molecular and Cellular Biology, Harvard University, Cambridge, MA 02138, United States
  - Department of Genetics, University of Pennsylvania, Philadelphia, PA 19104, United States
- Nivedita Dutta
  - Department of Biology and Biochemistry, Center for Nuclear Receptors and Cell Signaling, University of Houston, Houston, TX 77204, United States
- Quentin Vicens
  - Department of Biology and Biochemistry, Center for Nuclear Receptors and Cell Signaling, University of Houston, Houston, TX 77204, United States
- Elena Rivas
  - Department of Molecular and Cellular Biology, Harvard University, Cambridge, MA 02138, United States
2
Magnus M, Gao W, Dutta N, Vicens Q, Rivas E. RNAhub - an automated pipeline to search and align RNA homologs with secondary structure assessment. bioRxiv 2025:2025.03.11.642701. [PMID: 40161733 PMCID: PMC11952402 DOI: 10.1101/2025.03.11.642701] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 04/02/2025]
Abstract
The complexity in the generation of RNA multiple sequence alignments and assessment of the accuracy of such alignments contributes to the challenges in the utilization of RNA alignments in diverse integrative methods. RNAhub is a freely available user-friendly web server for a reliable generation of RNA multiple sequence alignments and the detection of the presence of structural RNA utilizing evolutionary information. This web-based tool, developed by the integration of existing computational approaches, takes an RNA sequence as input and automatically retrieves and aligns sequences homologous to the input (query) RNA sequence through an iterative and structure-agnostic approach. Based on the alignment, this tool statistically assesses whether the query RNA sequence has a conserved RNA structure using covariation analysis. The web server allows the user to efficiently search the sequence of interest against carefully curated, ready-to-use genomic databases to produce a multiple sequence alignment. Using this alignment, our tool either detects the presence of a conserved structural RNA, finds evidence against the presence of a conserved structure, or cannot make any assessment due to a lack of sequence diversity in the alignment. The web server is freely available at https://rnahub.org.
Affiliation(s)
- Marcin Magnus
  - Department of Molecular and Cellular Biology, Harvard University, Cambridge, MA 02138, USA
- William Gao
  - Department of Molecular and Cellular Biology, Harvard University, Cambridge, MA 02138, USA
  - Department of Genetics, University of Pennsylvania, Philadelphia, 19104 Pennsylvania, USA
- Nivedita Dutta
  - Department of Biology and Biochemistry, Center for Nuclear Receptors and Cell Signaling, University of Houston, Houston, TX, 77204, USA
- Quentin Vicens
  - Department of Biology and Biochemistry, Center for Nuclear Receptors and Cell Signaling, University of Houston, Houston, TX, 77204, USA
- Elena Rivas
  - Department of Molecular and Cellular Biology, Harvard University, Cambridge, MA 02138, USA
3
Chaturvedi M, Rashid MA, Paliwal KK. RNA structure prediction using deep learning - A comprehensive review. Comput Biol Med 2025; 188:109845. [PMID: 39983363 DOI: 10.1016/j.compbiomed.2025.109845] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/25/2024] [Revised: 02/09/2025] [Accepted: 02/10/2025] [Indexed: 02/23/2025]
Abstract
In computational biology, accurate RNA structure prediction offers several benefits, including facilitating a better understanding of RNA functions and RNA-based drug design. Implementing deep learning techniques for RNA structure prediction has led to tremendous progress in this field, resulting in significant improvements in prediction accuracy. This comprehensive review aims to provide an overview of the diverse strategies employed in predicting RNA secondary structures, emphasizing deep learning methods. The article categorizes the discussion into three main dimensions: feature extraction methods, existing state-of-the-art learning model architectures, and prediction approaches. We present a comparative analysis of various techniques and models highlighting their strengths and weaknesses. Finally, we identify gaps in the literature, discuss current challenges, and suggest future approaches to enhance model performance and applicability in RNA structure prediction tasks. This review provides a deeper insight into the subject and paves the way for further progress in this dynamic intersection of life sciences and artificial intelligence.
Affiliation(s)
- Mayank Chaturvedi
  - Signal Processing Laboratory, School of Engineering and Built Environment, Griffith University, Brisbane, QLD, 4111, Australia
- Mahmood A Rashid
  - Signal Processing Laboratory, School of Engineering and Built Environment, Griffith University, Brisbane, QLD, 4111, Australia
- Kuldip K Paliwal
  - Signal Processing Laboratory, School of Engineering and Built Environment, Griffith University, Brisbane, QLD, 4111, Australia
4
Cao X, Zhang Y, Ding Y, Wan Y. Identification of RNA structures and their roles in RNA functions. Nat Rev Mol Cell Biol 2024; 25:784-801. [PMID: 38926530 DOI: 10.1038/s41580-024-00748-6] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Accepted: 05/28/2024] [Indexed: 06/28/2024]
Abstract
The development of high-throughput RNA structure profiling methods in the past decade has greatly facilitated our ability to map and characterize different aspects of RNA structures transcriptome-wide in cell populations, single cells and single molecules. The resulting high-resolution data have provided insights into the static and dynamic nature of RNA structures, revealing their complexity as they perform their respective functions in the cell. In this Review, we discuss recent technical advances in the determination of RNA structures, and the roles of RNA structures in RNA biogenesis and functions, including in transcription, processing, translation, degradation, localization and RNA structure-dependent condensates. We also discuss the current understanding of how RNA structures could guide drug design for treating genetic diseases and battling pathogenic viruses, and highlight existing challenges and future directions in RNA structure research.
Affiliation(s)
- Xinang Cao
  - Stem Cell and Regenerative Biology, Genome Institute of Singapore, Singapore, Singapore
- Yueying Zhang
  - Department of Cell and Developmental Biology, John Innes Centre, Norwich, UK
- Yiliang Ding
  - Department of Cell and Developmental Biology, John Innes Centre, Norwich, UK
- Yue Wan
  - Stem Cell and Regenerative Biology, Genome Institute of Singapore, Singapore, Singapore
  - Department of Biochemistry, Yong Loo Lin School of Medicine, National University of Singapore, Singapore, Singapore
5
Gadekar V, Munk AW, Miladi M, Junge A, Backofen R, Seemann S, Gorodkin J. Clusters of mammalian conserved RNA structures in UTRs associate with RBP binding sites. NAR Genom Bioinform 2024; 6:lqae089. [PMID: 39131818 PMCID: PMC11310781 DOI: 10.1093/nargab/lqae089] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/19/2023] [Revised: 06/26/2024] [Accepted: 07/16/2024] [Indexed: 08/13/2024] Open
Abstract
RNA secondary structures play essential roles in the formation of the tertiary structure and function of a transcript. Recent genome-wide studies highlight significant potential for RNA structures in the mammalian genome. However, a major challenge is assigning functional roles to these structured RNAs. In this study, we conduct a guilt-by-association analysis of clusters of computationally predicted conserved RNA structures (CRSs) in human untranslated regions (UTRs) to associate them with gene functions. We filtered a broad pool of ∼500 000 human CRSs for UTR overlap, resulting in 4734 and 24 754 CRSs from the 5' and 3' UTRs of protein-coding genes, respectively. We separately clustered these CRSs for both sets using RNAscClust, obtaining 793 and 2403 clusters, each containing an average of five CRSs per cluster. We identified overrepresented binding sites for 60 and 43 RNA-binding proteins co-localizing with the clustered CRSs. Furthermore, 104 and 441 clusters from the 5' and 3' UTRs, respectively, showed enrichment for various Gene Ontologies, including biological processes such as 'signal transduction' and 'nervous system development', molecular functions like 'transferase activity', and cellular components such as 'synapse', among others. Our study shows that significant functional insights can be gained by clustering RNA structures based on their structural characteristics.
Affiliation(s)
- Veerendra P Gadekar
  - Center for non-coding RNA in Technology and Health, University of Copenhagen, Ridebanevej 9, 1870 Frederiksberg, Denmark
  - Department of Veterinary and Animal Sciences, Faculty of Health and Medical Sciences, University of Copenhagen, Frederiksberg, 1870 Frederiksberg, Denmark
  - Centre for Integrative Biology and Systems Medicine (IBSE), IIT Madras, Chennai, India
  - Robert Bosch Centre for Data Science and Artificial Intelligence (RBCDSAI), IIT Madras, Chennai, India
- Alexander Welford Munk
  - Center for non-coding RNA in Technology and Health, University of Copenhagen, Ridebanevej 9, 1870 Frederiksberg, Denmark
  - Department of Veterinary and Animal Sciences, Faculty of Health and Medical Sciences, University of Copenhagen, Frederiksberg, 1870 Frederiksberg, Denmark
- Milad Miladi
  - Bioinformatics Group, Department of Computer Science, University of Freiburg, Freiburg im Breisgau, Germany
- Alexander Junge
  - Center for non-coding RNA in Technology and Health, University of Copenhagen, Ridebanevej 9, 1870 Frederiksberg, Denmark
  - Department of Veterinary and Animal Sciences, Faculty of Health and Medical Sciences, University of Copenhagen, Frederiksberg, 1870 Frederiksberg, Denmark
- Rolf Backofen
  - Bioinformatics Group, Department of Computer Science, University of Freiburg, Freiburg im Breisgau, Germany
- Stefan E Seemann
  - Center for non-coding RNA in Technology and Health, University of Copenhagen, Ridebanevej 9, 1870 Frederiksberg, Denmark
  - Department of Veterinary and Animal Sciences, Faculty of Health and Medical Sciences, University of Copenhagen, Frederiksberg, 1870 Frederiksberg, Denmark
- Jan Gorodkin
  - Center for non-coding RNA in Technology and Health, University of Copenhagen, Ridebanevej 9, 1870 Frederiksberg, Denmark
  - Department of Veterinary and Animal Sciences, Faculty of Health and Medical Sciences, University of Copenhagen, Frederiksberg, 1870 Frederiksberg, Denmark
6
Maity U, Aggarwal R, Balasubramanian R, Venkatraman DL, R Hegde S. Devising Isolation Forest-Based Method to Investigate the sRNAome of Mycobacterium tuberculosis Using sRNA-seq Data. Bioinform Biol Insights 2024; 18:11779322241263674. [PMID: 39091283 PMCID: PMC11292719 DOI: 10.1177/11779322241263674] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/26/2024] [Accepted: 06/04/2024] [Indexed: 08/04/2024] Open
Abstract
Small non-coding RNAs (sRNAs) regulate the synthesis of virulence factors and other pathogenic traits, which enables the bacteria to survive and proliferate after host infection. While high-throughput sequencing data have proved useful in identifying sRNAs from the intergenic regions (IGRs) of the genome, it remains a challenge to present a complete genome-wide map of the expression of the sRNAs. Moreover, existing methodologies necessitate multiple dependencies for executing their algorithm and also lack a targeted approach for the de novo sRNA identification. We developed an Isolation Forest algorithm-based method and the tool Prediction Of sRNAs using Isolation Forest for the de novo identification of sRNAs from available bacterial sRNA-seq data (http://posif.ibab.ac.in/). Using this framework, we predicted 1120 sRNAs and 46 small proteins in Mycobacterium tuberculosis. Besides, we highlight the context-dependent expression of novel sRNAs, their probable synthesis, and their potential relevance in stress response mechanisms manifested by M. tuberculosis.
Affiliation(s)
- Upasana Maity
  - Institute of Bioinformatics and Applied Biotechnology, Bengaluru, India
- Ritika Aggarwal
  - Institute of Bioinformatics and Applied Biotechnology, Bengaluru, India
  - Novartis Pharmaceuticals, Hyderabad, India
- Shubhada R Hegde
  - Institute of Bioinformatics and Applied Biotechnology, Bengaluru, India
7
Ziesel A, Jabbari H. Unveiling hidden structural patterns in the SARS-CoV-2 genome: Computational insights and comparative analysis. PLoS One 2024; 19:e0298164. [PMID: 38574063 PMCID: PMC10994416 DOI: 10.1371/journal.pone.0298164] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/06/2023] [Accepted: 01/19/2024] [Indexed: 04/06/2024] Open
Abstract
SARS-CoV-2, the causative agent of COVID-19, is known to exhibit secondary structures in its 5' and 3' untranslated regions, along with the frameshifting stimulatory element situated between ORF1a and 1b. To identify additional regions containing conserved structures, we utilized a multiple sequence alignment with related coronaviruses as a starting point. We applied a computational pipeline developed for identifying non-coding RNA elements. Our pipeline employed three different RNA structural prediction approaches. We identified forty genomic regions likely to harbor structures, with ten of them showing three-way consensus substructure predictions among our predictive utilities. We conducted intracomparisons of the predictive utilities within the pipeline and intercomparisons with four previously published SARS-CoV-2 structural datasets. While there was limited agreement on the precise structure, different approaches seemed to converge on regions likely to contain structures in the viral genome. By comparing and combining various computational approaches, we can predict regions most likely to form structures, as well as a probable structure or ensemble of structures. These predictions can be used to guide surveillance, prophylactic measures, or therapeutic efforts. Data and scripts employed in this study may be found at https://doi.org/10.5281/zenodo.8298680.
Affiliation(s)
- Alison Ziesel
  - Department of Biomedical Engineering, University of Alberta, Edmonton, Alberta, Canada
- Hosna Jabbari
  - Department of Biomedical Engineering, University of Alberta, Edmonton, Alberta, Canada
8
Backofen R, Gorodkin J, Hofacker IL, Stadler PF. Comparative RNA Genomics. Methods Mol Biol 2024; 2802:347-393. [PMID: 38819565 DOI: 10.1007/978-1-0716-3838-5_12] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/01/2024]
Abstract
Over the last quarter of a century it has become clear that RNA is much more than just a boring intermediate in protein expression. Ancient RNAs still appear in the core information metabolism and comprise a surprisingly large component in bacterial gene regulation. A common theme with these types of mostly small RNAs is their reliance on conserved secondary structures. Large-scale sequencing projects, on the other hand, have profoundly changed our understanding of eukaryotic genomes. Pervasively transcribed, they give rise to a plethora of large and evolutionarily extremely flexible non-coding RNAs that exert a vastly diverse array of molecular functions. In this chapter we provide a (necessarily incomplete) overview of the current state of comparative analysis of non-coding RNAs, emphasizing computational approaches as a means to gain a global picture of the modern RNA world.
Affiliation(s)
- Rolf Backofen
  - Bioinformatics Group, Department of Computer Science, University of Freiburg, Freiburg, Germany
  - Center for Non-coding RNA in Technology and Health, University of Copenhagen, Frederiksberg, Denmark
- Jan Gorodkin
  - Center for Non-coding RNA in Technology and Health, Department of Veterinary and Animal Sciences, University of Copenhagen, Frederiksberg, Denmark
- Ivo L Hofacker
  - Institute for Theoretical Chemistry, University of Vienna, Wien, Austria
  - Bioinformatics and Computational Biology research group, University of Vienna, Vienna, Austria
  - Center for Non-coding RNA in Technology and Health, University of Copenhagen, Frederiksberg, Denmark
- Peter F Stadler
  - Bioinformatics Group, Department of Computer Science, University of Leipzig, Leipzig, Germany
  - Interdisciplinary Center for Bioinformatics, University of Leipzig, Leipzig, Germany
  - Max Planck Institute for Mathematics in the Sciences, Leipzig, Germany
  - Universidad Nacional de Colombia, Bogotá, Colombia
  - Institute for Theoretical Chemistry, University of Vienna, Wien, Austria
  - Center for Non-coding RNA in Technology and Health, University of Copenhagen, Frederiksberg, Denmark
  - Santa Fe Institute, Santa Fe, NM, USA
9
Tang M, Hwang K, Kang SH. StemP: A Fast and Deterministic Stem-Graph Approach for RNA Secondary Structure Prediction. IEEE/ACM Trans Comput Biol Bioinform 2023; 20:3278-3291. [PMID: 37028040 DOI: 10.1109/tcbb.2023.3253049] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Indexed: 06/19/2023]
Abstract
We propose a new deterministic methodology to predict the secondary structure of RNA sequences. What information about stems is important for structure prediction, and is it enough? The proposed simple deterministic algorithm uses minimum stem length, Stem-Loop score, and co-existence of stems to give good structure predictions for short RNA and tRNA sequences. The main idea is to consider all possible stems with a certain stem-loop energy and strength to predict RNA secondary structure. We use graph notation, where stems are represented as vertices and co-existence between stems as edges. This full Stem-graph represents all possible folding structures, and we pick the sub-graph(s) that give the best matching energy for structure prediction. The Stem-Loop score adds structure information and speeds up the computation. The proposed method can predict secondary structure even with pseudoknots. One of the strengths of this approach is the simplicity and flexibility of the algorithm, and it gives a deterministic answer. Numerical experiments are done on various sequences from the Protein Data Bank and the Gutell Lab using a laptop, and results take only a few seconds.
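The stem-graph idea in this abstract can be illustrated with a toy sketch. It is not the StemP implementation: the function names, the Watson-Crick plus G-U pairing rule, the default minimum stem length and loop size, and the choice to score a stem set by its total number of base pairs (rather than the paper's energy and Stem-Loop score) are all assumptions made for illustration. Unlike the published method, this sketch also forbids crossing stems, so it cannot produce pseudoknots.

```python
# Toy stem-graph sketch (assumptions throughout; not the StemP algorithm).
PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}


def find_stems(seq, min_len=3, min_loop=3):
    """Enumerate maximal stems as (i, j, length): pairs (i+k, j-k) for k < length."""
    n = len(seq)
    stems = []
    for i in range(n):
        for j in range(i + min_loop + 1, n):
            if (seq[i], seq[j]) not in PAIRS:
                continue
            # Skip starts that could be extended outward; they belong to a longer stem.
            if i > 0 and j < n - 1 and (seq[i - 1], seq[j + 1]) in PAIRS:
                continue
            length = 0
            while (i + length < j - length - min_loop
                   and (seq[i + length], seq[j - length]) in PAIRS):
                length += 1
            if length >= min_len:
                stems.append((i, j, length))
    return stems


def compatible(a, b):
    """Two stems can co-exist (graph edge) if they are disjoint or properly nested."""
    (ai, aj, al), (bi, bj, bl) = a, b
    disjoint = aj < bi or bj < ai
    b_in_a = ai + al - 1 < bi and bj < aj - al + 1
    a_in_b = bi + bl - 1 < ai and aj < bj - bl + 1
    return disjoint or b_in_a or a_in_b


def best_stem_set(stems):
    """Exhaustive search for the pairwise-compatible stem set with the most base pairs."""
    best = []

    def extend(chosen, candidates):
        nonlocal best
        if sum(s[2] for s in chosen) > sum(s[2] for s in best):
            best = list(chosen)
        for idx, s in enumerate(candidates):
            rest = [t for t in candidates[idx + 1:] if compatible(s, t)]
            extend(chosen + [s], rest)

    extend([], stems)
    return best


def to_dot_bracket(n, stems):
    """Render a list of stems as a dot-bracket string of length n."""
    out = ["."] * n
    for i, j, length in stems:
        for k in range(length):
            out[i + k] = "("
            out[j - k] = ")"
    return "".join(out)


seq = "GGGAAAACCC"
print(to_dot_bracket(len(seq), best_stem_set(find_stems(seq))))  # (((....)))
```

The search is deterministic because stems are enumerated in a fixed order and ties keep the first set found. It is exponential in the number of stems, so it is only suitable for very short sequences.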
10
Sato K, Hamada M. Recent trends in RNA informatics: a review of machine learning and deep learning for RNA secondary structure prediction and RNA drug discovery. Brief Bioinform 2023; 24:bbad186. [PMID: 37232359 PMCID: PMC10359090 DOI: 10.1093/bib/bbad186] [Citation(s) in RCA: 28] [Impact Index Per Article: 14.0] [Received: 01/16/2023] [Revised: 04/24/2023] [Accepted: 04/25/2023] [Indexed: 05/27/2023] Open
Abstract
Computational analysis of RNA sequences constitutes a crucial step in the field of RNA biology. As in other domains of the life sciences, the incorporation of artificial intelligence and machine learning techniques into RNA sequence analysis has gained significant traction in recent years. Historically, thermodynamics-based methods were widely employed for the prediction of RNA secondary structures; however, machine learning-based approaches have demonstrated remarkable advancements in recent years, enabling more accurate predictions. Consequently, the precision of sequence analysis pertaining to RNA secondary structures, such as RNA-protein interactions, has also been enhanced, making a substantial contribution to the field of RNA biology. Additionally, artificial intelligence and machine learning are also introducing technical innovations in the analysis of RNA-small molecule interactions for RNA-targeted drug discovery and in the design of RNA aptamers, where RNA serves as its own ligand. This review will highlight recent trends in the prediction of RNA secondary structure, RNA aptamers and RNA drug discovery using machine learning, deep learning and related technologies, and will also discuss potential future avenues in the field of RNA informatics.
Affiliation(s)
- Kengo Sato
  - School of System Design and Technology, Tokyo Denki University, 5 Senju Asahi-cho, Adachi-ku, Tokyo 120-8551, Japan
- Michiaki Hamada
  - Department of Electrical Engineering and Bioscience, Faculty of Science and Engineering, Waseda University, 55N-06-10, 3-4-1, Okubo, Shinjuku-ku, Tokyo 169-8555, Japan
  - Computational Bio Big-Data Open Innovation Laboratory (CBBD-OIL), National Institute of Advanced Industrial Science and Technology (AIST), 3-4-1, Okubo, Shinjuku-ku, Tokyo 169-8555, Japan
  - Graduate School of Medicine, Nippon Medical School, 1-1-5, Sendagi, Bunkyo-ku, Tokyo 113-8602, Japan
11
Gao W, Yang A, Rivas E. Thirteen dubious ways to detect conserved structural RNAs. IUBMB Life 2023; 75:471-492. [PMID: 36495545 PMCID: PMC11234323 DOI: 10.1002/iub.2694] [Citation(s) in RCA: 6] [Impact Index Per Article: 3.0] [Received: 10/03/2022] [Accepted: 10/24/2022] [Indexed: 12/14/2022]
Abstract
Covariation induced by compensatory base substitutions in RNA alignments is a great way to deduce conserved RNA structure, in principle. In practice, success depends on many factors, importantly the quality and depth of the alignment and the choice of covariation statistic. Measuring covariation between pairs of aligned positions is easy. However, using covariation to infer evolutionarily conserved RNA structure is complicated by other extraneous sources of covariation such as that resulting from homologous sequences having evolved from a common ancestor. In order to provide evidence of evolutionarily conserved RNA structure, a method to distinguish covariation due to sources other than RNA structure is necessary. Moreover, there are several sorts of artifactually generated covariation signals that can further confound the analysis. Additionally, some covariation signal is difficult to detect due to incomplete comparative data. Here, we investigate and critically discuss the practice of inferring conserved RNA structure by comparative sequence analysis. We provide new methods on how to approach and decide which of the numerous long non-coding RNAs (lncRNAs) have biologically relevant structures.
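The step this abstract calls easy, measuring covariation between two aligned positions, can be sketched with mutual information. This is a generic textbook statistic, not necessarily the one evaluated in the paper. The function name and the toy alignment are hypothetical, gap characters are treated like any other symbol, and no correction for shared ancestry or significance testing is attempted, which is the difficult part the authors discuss.

```python
from collections import Counter
from math import log2


def column_mutual_information(alignment, i, j):
    """Mutual information (in bits) between columns i and j of an alignment.

    `alignment` is a list of equal-length strings, one per sequence.
    """
    n = len(alignment)
    freq_i = Counter(seq[i] for seq in alignment)
    freq_j = Counter(seq[j] for seq in alignment)
    freq_ij = Counter((seq[i], seq[j]) for seq in alignment)
    mi = 0.0
    for (a, b), count in freq_ij.items():
        p_ab = count / n
        mi += p_ab * log2(p_ab / ((freq_i[a] / n) * (freq_j[b] / n)))
    return mi


# Hypothetical toy alignment: columns 0 and 4 change together (G-C, C-G, A-U, U-A),
# while columns 1 to 3 are invariant.
toy = ["GAAAC", "CAAAG", "AAAAU", "UAAAA"]
print(column_mutual_information(toy, 0, 4))  # 2.0 bits: perfect covariation
print(column_mutual_information(toy, 1, 2))  # 0.0 bits: no variation, no signal
```

A high score here only says that two columns vary together. As the abstract stresses, sequences related by a common ancestor also produce such signal, so a raw score by itself is not evidence of conserved structure.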
Affiliation(s)
- William Gao
  - Department of Genetics, University of Pennsylvania, Philadelphia, Pennsylvania, USA
- Ann Yang
  - Department of Molecular and Cellular Biology, Harvard University, Cambridge, Massachusetts, USA
- Elena Rivas
  - Department of Molecular and Cellular Biology, Harvard University, Cambridge, Massachusetts, USA
12
Random and Natural Non-Coding RNA Have Similar Structural Motif Patterns but Differ in Bulge, Loop, and Bond Counts. Life (Basel) 2023; 13:life13030708. [PMID: 36983865 PMCID: PMC10054693 DOI: 10.3390/life13030708] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 12/16/2022] [Revised: 02/15/2023] [Accepted: 02/27/2023] [Indexed: 03/08/2023] Open
Abstract
An important question in evolutionary biology is whether (and in what ways) genotype–phenotype (GP) map biases can influence evolutionary trajectories. Untangling the relative roles of natural selection and biases (and other factors) in shaping phenotypes can be difficult. Because the RNA secondary structure (SS) can be analyzed in detail mathematically and computationally, is biologically relevant, and a wealth of bioinformatic data are available, it offers a good model system for studying the role of bias. For quite short RNA (length L≤126), it has recently been shown that natural and random RNA types are structurally very similar, suggesting that bias strongly constrains evolutionary dynamics. Here, we extend these results with emphasis on much larger RNA with lengths up to 3000 nucleotides. By examining both abstract shapes and structural motif frequencies (i.e., the number of helices, bonds, bulges, junctions, and loops), we find that large natural and random structures are also very similar, especially when contrasted to typical structures sampled from the spaces of all possible RNA structures. Our motif frequency study yields another result, where the frequencies of different motifs can be used in machine learning algorithms to classify random and natural RNA with high accuracy, especially for longer RNA (e.g., ROC AUC 0.86 for L = 1000). The most important motifs for classification are the number of bulges, loops, and bonds. This finding may be useful in using SS to detect candidates for functional RNA within ‘junk’ DNA regions.
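Motif frequencies of the kind mentioned in this abstract can be counted directly from a dot-bracket string. The sketch below is only an illustration: the abstract does not give the paper's motif definitions, so the definitions used here (a bond is one base pair, a helix is a maximal run of stacked pairs, a hairpin loop is a pair enclosing only unpaired positions) and the function names are assumptions, and bulges and junctions are not counted.

```python
def pair_table(structure):
    """Map each opening-bracket position to its closing partner in a dot-bracket string."""
    stack, pairs = [], {}
    for pos, ch in enumerate(structure):
        if ch == "(":
            stack.append(pos)
        elif ch == ")":
            if not stack:
                raise ValueError("unbalanced structure: unmatched ')'")
            pairs[stack.pop()] = pos
    if stack:
        raise ValueError("unbalanced structure: unmatched '('")
    return pairs


def motif_counts(structure):
    """Count bonds, helices, and hairpin loops under the assumed definitions above."""
    pairs = pair_table(structure)
    bonds = len(pairs)
    # A pair (i, j) starts a helix when (i-1, j+1) is not also a pair.
    helices = sum(1 for i, j in pairs.items() if pairs.get(i - 1) != j + 1)
    # A hairpin loop is closed by a pair whose interior contains only dots.
    hairpins = sum(
        1 for i, j in pairs.items() if all(c == "." for c in structure[i + 1:j])
    )
    return {"bonds": bonds, "helices": helices, "hairpin_loops": hairpins}


print(motif_counts("((..((...))..))"))
# {'bonds': 4, 'helices': 2, 'hairpin_loops': 1}
```

Counts like these could serve as input features for a classifier, which is the use the abstract describes, though the features and models used in the paper may differ.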
13
Tan XY, Citartan M, Chinni SV, Ahmed SA, Tang TH. Biocomputational Identification of sRNAs in Leptospira interrogans Serovar Lai. Indian J Microbiol 2023; 63:33-41. [PMID: 37188232 PMCID: PMC10172424 DOI: 10.1007/s12088-022-01050-9] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/25/2022] [Accepted: 12/02/2022] [Indexed: 12/24/2022] Open
Abstract
Regulatory small RNAs (sRNAs) are RNA transcripts that are not translated into proteins but act as functional RNAs. Pathogenic Leptospira cause an epidemic spirochaetal zoonosis, leptospirosis. It is speculated that leptospiral sRNAs are involved in orchestrating their pathogenicity. In this study, a biocomputational approach was adopted to identify leptospiral sRNAs. Two sRNA prediction programs, RNAz and nocoRNAc, were employed to screen the reference genome of Leptospira interrogans serovar Lai. Of the 126 predicted sRNAs, 96 are cis-antisense sRNAs, 28 are trans-encoded sRNAs and 2 partially overlap with protein-coding genes in a sense orientation. To determine whether these candidates are expressed in the pathogen, they were compared with the coverage files generated from our RNA-seq datasets. We found that 7 predicted sRNAs are expressed in mid-log phase, stationary phase, serum stress, temperature stress and iron stress, while 2 sRNAs are expressed in mid-log phase, stationary phase, serum stress and temperature stress. Their expression was also confirmed experimentally via RT-PCR. These experimentally validated candidates were also subjected to mRNA target prediction using TargetRNA2. Taken together, our study demonstrates that a biocomputational strategy can serve as an alternative or complementary strategy to laborious and expensive deep sequencing methods, not only to uncover putative sRNAs but also to predict their targets in bacteria. This is the first study that integrates computational approaches to predict putative sRNAs in L. interrogans serovar Lai. Supplementary Information: The online version contains supplementary material available at 10.1007/s12088-022-01050-9.
Affiliation(s)
- Xinq Yuan Tan
  - Advanced Medical and Dental Institute (AMDI), Universiti Sains Malaysia, Bertam, 13200 Kepala Batas, Penang Malaysia
- Marimuthu Citartan
  - Advanced Medical and Dental Institute (AMDI), Universiti Sains Malaysia, Bertam, 13200 Kepala Batas, Penang Malaysia
- Suresh Venkata Chinni
  - Department of Biotechnology, Faculty of Applied Sciences, AIMST University, 08100 Bedong, Kedah Malaysia
- Siti Aminah Ahmed
  - Advanced Medical and Dental Institute (AMDI), Universiti Sains Malaysia, Bertam, 13200 Kepala Batas, Penang Malaysia
- Thean-Hock Tang
  - Advanced Medical and Dental Institute (AMDI), Universiti Sains Malaysia, Bertam, 13200 Kepala Batas, Penang Malaysia
14
Kavita K, Breaker RR. Discovering riboswitches: the past and the future. Trends Biochem Sci 2023; 48:119-141. [PMID: 36150954 PMCID: PMC10043782 DOI: 10.1016/j.tibs.2022.08.009] [Citation(s) in RCA: 119] [Impact Index Per Article: 59.5] [Received: 06/18/2022] [Revised: 08/18/2022] [Accepted: 08/26/2022] [Indexed: 01/25/2023]
Abstract
Riboswitches are structured noncoding RNA domains used by many bacteria to monitor the concentrations of target ligands and regulate gene expression accordingly. In the past 20 years over 55 distinct classes of natural riboswitches have been discovered that selectively sense small molecules or elemental ions, and thousands more are predicted to exist. Evidence suggests that some riboswitches might be direct descendants of the RNA-based sensors and switches that were likely present in ancient organisms before the evolutionary emergence of proteins. We provide an overview of the current state of riboswitch research, focusing primarily on the discovery of riboswitches, and speculate on the major challenges facing researchers in the field.
Affiliation(s)
- Kumari Kavita
- Department of Molecular, Cellular and Developmental Biology, Yale University, New Haven, CT 06520-8103, USA
| | - Ronald R Breaker
- Department of Molecular, Cellular and Developmental Biology, Yale University, New Haven, CT 06520-8103, USA; Howard Hughes Medical Institute, Yale University, New Haven, CT 06520-8103, USA; Department of Molecular Biophysics and Biochemistry, Yale University, New Haven, CT 06520-8103, USA.
| |
|
15
|
Greco M, Morard R, Darling K, Kucera M. Macroevolutionary patterns in intragenomic rDNA variability among planktonic foraminifera. PeerJ 2023; 11:e15255. [PMID: 37123000 PMCID: PMC10143585 DOI: 10.7717/peerj.15255] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/24/2022] [Accepted: 03/28/2023] [Indexed: 05/02/2023] Open
Abstract
Ribosomal intragenomic variability in prokaryotes and eukaryotes is a genomic feature commonly studied for its inflationary impact on molecular diversity assessments. However, the evolutionary mechanisms and distribution of this phenomenon within a microbial group are rarely explored. Here, we investigate the intragenomic variability in 33 species of planktonic foraminifera, calcifying marine protists, by inspecting 2,403 partial SSU sequences obtained from single-cell clone libraries. Our analyses show that polymorphisms are common among planktonic foraminifera species, but the number of polymorphic sites significantly differs among clades. With our molecular simulations, we could assess that most of these mutations are located in paired regions that do not affect the secondary structure of the SSU fragment. Finally, by mapping the number of polymorphic sites on the phylogeny of the clades, we were able to discuss the evolution and potential sources of intragenomic variability in planktonic foraminifera, linking this trait to the distinctive nuclear and genomic dynamics of this microbial group.
Affiliation(s)
- Mattia Greco
- Institute of Oceanology, Polish Academy of Sciences, Sopot, Poland
- MARUM-Center for Marine Environmental Sciences, University of Bremen, Bremen, Germany
- Institut de Ciències del Mar (ICM), Consejo Superior de Investigaciones Científicas, Barcelona, Spain
| | - Raphaël Morard
- MARUM-Center for Marine Environmental Sciences, University of Bremen, Bremen, Germany
| | - Kate Darling
- School of Geosciences, University of Edinburgh, Edinburgh, United Kingdom
- Biological and Environmental Sciences, University of Stirling, Stirling, United Kingdom
| | - Michal Kucera
- MARUM-Center for Marine Environmental Sciences, University of Bremen, Bremen, Germany
| |
|
16
|
rMSA: a sequence search and alignment algorithm to improve RNA structure modeling. J Mol Biol 2022. [DOI: 10.1016/j.jmb.2022.167904] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 12/04/2022]
|
17
|
Binet T, Avalle B, Dávila Felipe M, Maffucci I. AptaMat: a matrix-based algorithm to compare single-stranded oligonucleotides secondary structures. Bioinformatics 2022; 39:6849515. [PMID: 36440922 PMCID: PMC9805580 DOI: 10.1093/bioinformatics/btac752] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/05/2022] [Revised: 11/14/2022] [Accepted: 11/24/2022] [Indexed: 11/30/2022] Open
Abstract
MOTIVATION Comparing the secondary structures of single-stranded nucleic acids (ssNAs) is fundamental when investigating their function and evolution and when predicting the effect of mutations on their structures. Many comparison metrics exist, but they are either too elaborate or not sensitive enough to distinguish closely related ssNA structures. RESULTS In this context, we developed AptaMat, a simple and sensitive algorithm for comparing ssNA secondary structures, based on matrices representing the secondary structures and a metric built upon the Manhattan distance in the plane. We applied AptaMat to several examples and compared the results to those obtained by the most frequently used metrics, namely the Hamming distance and the RNAdistance, and by a recently developed image-based approach. We showed that AptaMat is able to discriminate between similar sequences, outperforming all the other metrics considered here. In addition, we showed that AptaMat was able to correctly classify 14 RFAM families within a clustering procedure. AVAILABILITY AND IMPLEMENTATION The Python code for AptaMat is available at https://github.com/GEC-git/AptaMat.git. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
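As a rough illustration of the kind of comparison this abstract describes, the Python sketch below contrasts the Hamming distance on dot-bracket strings with a simple nearest-neighbour Manhattan distance between base-pair coordinates treated as points in the plane. It is not the authors' implementation (their code is at the repository named in the abstract); the function names `base_pairs`, `hamming` and `manhattan_pair_distance`, and the normalisation by the total number of base pairs, are assumptions made purely for illustration.

```python
from typing import List, Set, Tuple

Pair = Tuple[int, int]


def base_pairs(dot_bracket: str) -> Set[Pair]:
    """Return the set of (i, j) paired positions in a dot-bracket string."""
    stack: List[int] = []
    pairs: Set[Pair] = set()
    for j, ch in enumerate(dot_bracket):
        if ch == "(":
            stack.append(j)
        elif ch == ")":
            if not stack:
                raise ValueError("unbalanced structure")
            pairs.add((stack.pop(), j))
    if stack:
        raise ValueError("unbalanced structure")
    return pairs


def hamming(a: str, b: str) -> int:
    """Count positions at which two equal-length dot-bracket strings differ."""
    if len(a) != len(b):
        raise ValueError("structures must have the same length")
    return sum(x != y for x, y in zip(a, b))


def manhattan_pair_distance(a: str, b: str) -> float:
    """Hypothetical metric: each base pair (i, j) is a point in the plane.

    For every pair in one structure, take the Manhattan distance to the
    nearest pair in the other structure, sum in both directions, and
    divide by the total number of pairs (an assumed normalisation).
    """
    pa, pb = base_pairs(a), base_pairs(b)
    if not pa and not pb:
        return 0.0
    if not pa or not pb:
        return float("inf")

    def one_way(src: Set[Pair], dst: Set[Pair]) -> int:
        return sum(min(abs(i - k) + abs(j - l) for k, l in dst) for i, j in src)

    return (one_way(pa, pb) + one_way(pb, pa)) / (len(pa) + len(pb))


if __name__ == "__main__":
    s1, s2 = "((..))", "(....)"
    print(hamming(s1, s2))                  # 2
    print(manhattan_pair_distance(s1, s2))  # 2/3, about 0.667
```

The point-set view lets the distance grow with how far a base pair has shifted, whereas the Hamming distance only counts how many positions changed symbol.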
Affiliation(s)
- Thomas Binet
- Université de technologie de Compiègne, UPJV, CNRS, Enzyme and Cell Engineering, Centre de recherche Royallieu, CS 60 319 - 60 203, Compiègne Cedex, France
| | - Bérangère Avalle
- Université de technologie de Compiègne, UPJV, CNRS, Enzyme and Cell Engineering, Centre de recherche Royallieu, CS 60 319 - 60 203, Compiègne Cedex, France
| | | | | |
|
18
|
Ono Y, Katayama K, Onuma T, Kubo K, Tsuyuzaki H, Hamada M, Sato M. Structure-based screening for functional non-coding RNAs in fission yeast identifies a factor repressing untimely initiation of sexual differentiation. Nucleic Acids Res 2022; 50:11229-11242. [PMID: 36259651 PMCID: PMC9638895 DOI: 10.1093/nar/gkac825] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 10/20/2021] [Revised: 09/06/2022] [Accepted: 09/14/2022] [Indexed: 12/04/2022] Open
Abstract
Non-coding RNAs (ncRNAs) exist ubiquitously in normal and cancer cells. Despite their prevalence, the functions of most long ncRNAs remain uncharacterized. The fission yeast Schizosaccharomyces pombe expresses >1800 ncRNAs annotated to date, but most unconventional ncRNAs (excluding tRNA, rRNA, snRNA and snoRNA) remain uncharacterized. To discover functional ncRNAs, we performed a screen combining computational and biological tests. First, all S. pombe ncRNAs were screened in silico for those showing conservation in sequence as well as in secondary structure with ncRNAs in closely related species. Almost half of the 151 selected conserved ncRNA genes were uncharacterized. Twelve ncRNA genes that did not overlap with protein-coding sequences were next chosen for biological screening that examined defects in growth or sexual differentiation, as well as sensitivities to drugs and stresses. Finally, we highlighted an ncRNA transcribed from SPNCRNA.1669, which inhibited untimely initiation of sexual differentiation. A domain that was predicted computationally to form a conserved secondary structure was essential for the ncRNA to function. Thus, this study demonstrates that in silico selection focusing on conservation of secondary structure across species is a powerful method to pinpoint novel functional ncRNAs.
Affiliation(s)
- Yu Ono
- Laboratory of Cytoskeletal Logistics, Department of Life Science and Medical Bioscience, School of Advanced Science and Engineering, Waseda University, 2-2 Wakamatsucho, Shinjuku-ku, Tokyo 162-8480, Japan
| | - Kenta Katayama
- Laboratory of Cytoskeletal Logistics, Department of Life Science and Medical Bioscience, School of Advanced Science and Engineering, Waseda University, 2-2 Wakamatsucho, Shinjuku-ku, Tokyo 162-8480, Japan; Computational Bio Big-Data Open Innovation Laboratory (CBBD-OIL), National Institute of Advanced Industrial Science and Technology (AIST), 3-4-1 Okubo, Shinjuku-ku, Tokyo 169-8555, Japan
| | - Tomoki Onuma
- Laboratory of Cytoskeletal Logistics, Department of Life Science and Medical Bioscience, School of Advanced Science and Engineering, Waseda University, 2-2 Wakamatsucho, Shinjuku-ku, Tokyo 162-8480, Japan
| | - Kento Kubo
- Computational Bio Big-Data Open Innovation Laboratory (CBBD-OIL), National Institute of Advanced Industrial Science and Technology (AIST), 3-4-1 Okubo, Shinjuku-ku, Tokyo 169-8555, Japan; Bioinformatics Laboratory, Department of Electrical Engineering and Bioscience, School of Advanced Science and Engineering, Waseda University, 3-4-1 Okubo, Shinjuku-ku, Tokyo 169-8555, Japan
| | - Hayato Tsuyuzaki
- Laboratory of Cytoskeletal Logistics, Department of Life Science and Medical Bioscience, School of Advanced Science and Engineering, Waseda University, 2-2 Wakamatsucho, Shinjuku-ku, Tokyo 162-8480, Japan; Computational Bio Big-Data Open Innovation Laboratory (CBBD-OIL), National Institute of Advanced Industrial Science and Technology (AIST), 3-4-1 Okubo, Shinjuku-ku, Tokyo 169-8555, Japan
| | - Michiaki Hamada
- Computational Bio Big-Data Open Innovation Laboratory (CBBD-OIL), National Institute of Advanced Industrial Science and Technology (AIST), 3-4-1 Okubo, Shinjuku-ku, Tokyo 169-8555, Japan; Bioinformatics Laboratory, Department of Electrical Engineering and Bioscience, School of Advanced Science and Engineering, Waseda University, 3-4-1 Okubo, Shinjuku-ku, Tokyo 169-8555, Japan; Institute for Medical-oriented Structural Biology, Waseda University, 2-2 Wakamatsucho, Shinjuku-ku, Tokyo 162-8480, Japan
| | - Masamitsu Sato
- Laboratory of Cytoskeletal Logistics, Department of Life Science and Medical Bioscience, School of Advanced Science and Engineering, Waseda University, 2-2 Wakamatsucho, Shinjuku-ku, Tokyo 162-8480, Japan; Institute for Medical-oriented Structural Biology, Waseda University, 2-2 Wakamatsucho, Shinjuku-ku, Tokyo 162-8480, Japan; Institute for Advanced Research of Biosystem Dynamics, Waseda Research Institute for Science and Engineering, Graduate School of Advanced Science and Engineering, Waseda University, 3-4-1 Okubo, Shinjuku-ku, Tokyo 169-8555, Japan
| |
|
19
|
Chen JC, Chen JP, Shen MW, Wornow M, Bae M, Yeh WH, Hsu A, Liu DR. Generating experimentally unrelated target molecule-binding highly functionalized nucleic-acid polymers using machine learning. Nat Commun 2022; 13:4541. [PMID: 35927274 PMCID: PMC9352670 DOI: 10.1038/s41467-022-31955-4] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 07/08/2021] [Accepted: 07/11/2022] [Indexed: 11/09/2022] Open
Abstract
In vitro selection queries large combinatorial libraries for sequence-defined polymers with target binding and reaction catalysis activity. While the total sequence space of these libraries can extend beyond 10^22 sequences, practical considerations limit starting sequences to ≤~10^15 distinct molecules. Selection-induced sequence convergence and limited sequencing depth further constrain experimentally observable sequence space. To address these limitations, we integrate experimental and machine learning approaches to explore regions of sequence space unrelated to experimentally derived variants. We perform in vitro selections to discover highly side-chain-functionalized nucleic acid polymers (HFNAPs) with potent affinities for a target small molecule (daunomycin KD = 5-65 nM). We then use the selection data to train a conditional variational autoencoder (CVAE) machine learning model to generate diverse and unique HFNAP sequences with high daunomycin affinities (KD = 9-26 nM), even though they are unrelated in sequence to experimental polymers. Coupling in vitro selection with a machine learning model thus enables direct generation of active variants, demonstrating a new approach to the discovery of functional biopolymers.
Affiliation(s)
- Jonathan C. Chen
- Merkin Institute of Transformative Technologies in Healthcare, Broad Institute of Harvard and MIT, Cambridge, MA, USA; Department of Chemistry and Chemical Biology, Harvard University, Cambridge, MA, USA; Howard Hughes Medical Institute, Harvard University, Cambridge, MA, USA
| | - Jonathan P. Chen
- Work conducted at Uber AI Labs, Uber Technologies, Inc., San Francisco, CA, USA; Meta Platforms, Menlo Park, CA, USA
| | - Max W. Shen
- Merkin Institute of Transformative Technologies in Healthcare, Broad Institute of Harvard and MIT, Cambridge, MA, USA; Department of Chemistry and Chemical Biology, Harvard University, Cambridge, MA, USA; Howard Hughes Medical Institute, Harvard University, Cambridge, MA, USA; Computational and Systems Biology Program, Massachusetts Institute of Technology, Cambridge, MA, USA
| | - Michael Wornow
- Merkin Institute of Transformative Technologies in Healthcare, Broad Institute of Harvard and MIT, Cambridge, MA, USA; Department of Chemistry and Chemical Biology, Harvard University, Cambridge, MA, USA
| | - Minwoo Bae
- Merkin Institute of Transformative Technologies in Healthcare, Broad Institute of Harvard and MIT, Cambridge, MA, USA; Department of Chemistry and Chemical Biology, Harvard University, Cambridge, MA, USA
| | - Wei-Hsi Yeh
- Merkin Institute of Transformative Technologies in Healthcare, Broad Institute of Harvard and MIT, Cambridge, MA, USA; Department of Chemistry and Chemical Biology, Harvard University, Cambridge, MA, USA; Howard Hughes Medical Institute, Harvard University, Cambridge, MA, USA; Program in Speech and Hearing Bioscience and Technology, Harvard Medical School, Boston, MA, USA
| | - Alvin Hsu
- Merkin Institute of Transformative Technologies in Healthcare, Broad Institute of Harvard and MIT, Cambridge, MA, USA; Department of Chemistry and Chemical Biology, Harvard University, Cambridge, MA, USA; Howard Hughes Medical Institute, Harvard University, Cambridge, MA, USA
| | - David R. Liu
- Merkin Institute of Transformative Technologies in Healthcare, Broad Institute of Harvard and MIT, Cambridge, MA, USA; Department of Chemistry and Chemical Biology, Harvard University, Cambridge, MA, USA; Howard Hughes Medical Institute, Harvard University, Cambridge, MA, USA
| |
|
20
|
Meta-omics approaches reveal unique small RNAs exhibited by the uncultured microorganisms dwelling deep-sea hydrothermal sediment in Guaymas Basin. Arch Microbiol 2022; 204:461. [DOI: 10.1007/s00203-022-03085-4] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/19/2021] [Revised: 04/08/2022] [Accepted: 06/16/2022] [Indexed: 11/02/2022]
|
21
|
Li S, Lam J, Souliotis L, Alam MT, Constantinidou C. Posttranscriptional Regulation in Response to Different Environmental Stresses in Campylobacter jejuni. Microbiol Spectr 2022; 10:e0020322. [PMID: 35678555 PMCID: PMC9241687 DOI: 10.1128/spectrum.00203-22] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/25/2022] [Accepted: 05/10/2022] [Indexed: 11/20/2022] Open
Abstract
The survival strategies that Campylobacter jejuni (C. jejuni) employs throughout its transmission and infection life cycles remain largely elusive. Specifically, there is a lack of understanding about the posttranscriptional regulation of stress adaptations resulting from small noncoding RNAs (sRNAs). Published C. jejuni sRNAs have been discovered in specific conditions but with limited insights into their biological activities. Many more sRNAs are yet to be discovered as they may be condition-dependent. Here, we have generated transcriptomic data from 21 host- and transmission-relevant conditions. The data uncovered transcription start sites, expression patterns and posttranscriptional regulation during various stress conditions. This data set helped predict a list of putative sRNAs. We further explored the sRNAs' biological functions by integrating differential gene expression analysis, coexpression analysis, and genome-wide sRNA target prediction. The results showed that C. jejuni gene expression was influenced primarily by nutrient deprivation and food storage conditions. Further exploration revealed a putative sRNA (CjSA21) that targeted tlp1 to 4 under food processing conditions. tlp1 to 4 are transcripts that encode methyl-accepting chemotaxis proteins (MCPs), which are responsible for chemosensing. These results suggested that CjSA21 inhibits chemotaxis and promotes survival under food processing conditions. This study presents the broader research community with a comprehensive data set and highlights a novel sRNA as a potential chemotaxis inhibitor. IMPORTANCE The foodborne pathogen C. jejuni is a significant challenge for the global health care system. It is crucial to investigate C. jejuni posttranscriptional regulation by small RNAs (sRNAs) in order to understand how it adapts to different stress conditions. However, limited data are available for investigating sRNA activity under stress. In this study, we generate gene expression data of C. jejuni under 21 stress conditions. Our data analysis indicates that one of the novel sRNAs mediates the adaptation to food processing conditions. Results from our work shed light on the posttranscriptional regulation of C. jejuni and identify an sRNA associated with food safety.
Affiliation(s)
- Stephen Li
- Warwick Medical School, University of Warwick, Coventry, United Kingdom
| | - Jenna Lam
- Warwick Medical School, University of Warwick, Coventry, United Kingdom
| | | | - Mohammad Tauqeer Alam
- Department of Biology, College of Science, United Arab Emirates University, Al-Ain, United Arab Emirates
| | | |
|
22
|
Tagashira M, Asai K. ConsAlifold: considering RNA structural alignments improves prediction accuracy of RNA consensus secondary structures. Bioinformatics 2022; 38:710-719. [PMID: 34694364 DOI: 10.1093/bioinformatics/btab738] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 12/22/2020] [Revised: 08/24/2021] [Accepted: 10/20/2021] [Indexed: 02/03/2023] Open
Abstract
MOTIVATION By detecting homology among RNAs, the probabilistic consideration of RNA structural alignments has improved the prediction accuracy of significant RNA prediction problems. Predicting an RNA consensus secondary structure from an RNA sequence alignment is a fundamental research objective because in the detection of conserved base-pairings among RNA homologs, predicting an RNA consensus secondary structure is more convenient than predicting an RNA structural alignment. RESULTS We developed and implemented ConsAlifold, a dynamic programming-based method that predicts the consensus secondary structure of an RNA sequence alignment. ConsAlifold considers RNA structural alignments. ConsAlifold achieves moderate running time and the best prediction accuracy of RNA consensus secondary structures among available prediction methods. AVAILABILITY AND IMPLEMENTATION ConsAlifold, data and Python scripts for generating both figures and tables are freely available at https://github.com/heartsh/consalifold. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Masaki Tagashira
- Department of Computational Biology and Medical Sciences, University of Tokyo, Chiba 277-8561, Japan; Artificial Intelligence Research Center, AIST, Tokyo 135-0064, Japan
| | - Kiyoshi Asai
- Department of Computational Biology and Medical Sciences, University of Tokyo, Chiba 277-8561, Japan; Artificial Intelligence Research Center, AIST, Tokyo 135-0064, Japan
| |
|
23
|
Seemann SE, Mirza AH, Bang-Berthelsen CH, Garde C, Christensen-Dalsgaard M, Workman CT, Pociot F, Tommerup N, Gorodkin J, Ruzzo WL. OUP accepted manuscript. Nucleic Acids Res 2022; 50:2452-2463. [PMID: 35188540 PMCID: PMC8934657 DOI: 10.1093/nar/gkac067] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/21/2021] [Revised: 01/07/2022] [Accepted: 01/25/2022] [Indexed: 12/01/2022] Open
Abstract
Accelerated evolution of any portion of the genome is of significant interest, potentially signaling positive selection of phenotypic traits and adaptation. Accelerated evolution remains understudied for structured RNAs, despite the fact that an RNA’s structure is often key to its function. RNA structures are typically characterized by compensatory (structure-preserving) basepair changes that are unexpected given the underlying sequence variation, i.e., they have evolved through negative selection on structure. We address the question of how fast the primary sequence of an RNA can change through evolution while conserving its structure. Specifically, we consider predicted and known structures in vertebrate genomes. After careful control of false discovery rates, we obtain 13 de novo structures (and three known Rfam structures) that we predict to have rapidly evolving sequences—defined as structures where the primary sequences of human and mouse have diverged at least twice as fast (1.5 times for Rfam) as nearby neutrally evolving sequences. Two of the three known structures function in translation inhibition related to infection and immune response. We conclude that rapid sequence divergence does not preclude RNA structure conservation in vertebrates, although these events are relatively rare.
Affiliation(s)
| | - Aashiq H Mirza
- Center for non-coding RNA in Technology and Health (RTH), University of Copenhagen, Denmark
- Steno Diabetes Center Copenhagen, Gentofte, Denmark
| | - Claus H Bang-Berthelsen
- Center for non-coding RNA in Technology and Health (RTH), University of Copenhagen, Denmark
- National Food Institute, Technical University of Denmark, Kgs. Lyngby, Denmark
| | - Christian Garde
- Center for non-coding RNA in Technology and Health (RTH), University of Copenhagen, Denmark
| | | | - Christopher T Workman
- Center for non-coding RNA in Technology and Health (RTH), University of Copenhagen, Denmark
- Center for Biological Sequence Analysis, Technical University of Denmark, Denmark
| | - Flemming Pociot
- Center for non-coding RNA in Technology and Health (RTH), University of Copenhagen, Denmark
- Steno Diabetes Center Copenhagen, Gentofte, Denmark
| | - Niels Tommerup
- Center for non-coding RNA in Technology and Health (RTH), University of Copenhagen, Denmark
- Department of Cellular and Molecular Medicine (ICMM), University of Copenhagen, Denmark
| | - Jan Gorodkin
- Center for non-coding RNA in Technology and Health (RTH), University of Copenhagen, Denmark
- Department of Veterinary and Animal Sciences, University of Copenhagen, Denmark
| | - Walter L Ruzzo
- Center for non-coding RNA in Technology and Health (RTH), University of Copenhagen, Denmark
- Computer Science and Engineering and Genome Sciences, University of Washington, USA
- Fred Hutchinson Cancer Research Center, Seattle, USA
| |
|
24
|
Zeng C, Takeda A, Sekine K, Osato N, Fukunaga T, Hamada M. Bioinformatics Approaches for Determining the Functional Impact of Repetitive Elements on Non-coding RNAs. Methods Mol Biol 2022; 2509:315-340. [PMID: 35796972 DOI: 10.1007/978-1-0716-2380-0_19] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/15/2023]
Abstract
With a large number of annotated non-coding RNAs (ncRNAs), repetitive sequences are found to constitute functional components (termed as repetitive elements) in ncRNAs that perform specific biological functions. Bioinformatics analysis is a powerful tool for improving our understanding of the role of repetitive elements in ncRNAs. This chapter summarizes recent findings that reveal the role of repetitive elements in ncRNAs. Furthermore, relevant bioinformatics approaches are systematically reviewed, which promises to provide valuable resources for studying the functional impact of repetitive elements on ncRNAs.
Affiliation(s)
- Chao Zeng
- Faculty of Science and Engineering, Waseda University, Tokyo, Japan.
- AIST-Waseda University Computational Bio Big-Data Open Innovation Laboratory (CBBD-OIL), Tokyo, Japan.
| | - Atsushi Takeda
- Faculty of Science and Engineering, Waseda University, Tokyo, Japan
| | - Kotaro Sekine
- Faculty of Science and Engineering, Waseda University, Tokyo, Japan
| | - Naoki Osato
- Faculty of Science and Engineering, Waseda University, Tokyo, Japan
| | - Tsukasa Fukunaga
- Waseda Institute for Advanced Study, Waseda University, Tokyo, Japan
| | - Michiaki Hamada
- Faculty of Science and Engineering, Waseda University, Tokyo, Japan.
- AIST-Waseda University Computational Bio Big-Data Open Innovation Laboratory (CBBD-OIL), Tokyo, Japan.
| |
|
25
|
Steger G. Predicting the Structure of a Viroid: Structure, Structure Distribution, Consensus Structure, and Structure Drawing. Methods Mol Biol 2022; 2316:331-371. [PMID: 34845705 DOI: 10.1007/978-1-0716-1464-8_26] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/13/2023]
Abstract
Viroids are small non-coding RNAs that require a special sequence and structure to be replicated and transported by the host machinery. Many of these features can be predicted and later experimentally verified. Here, we will present workflows to predict viroid structures and draw the predicted structures in a pleasing and descriptive way using recently developed software.
Affiliation(s)
- Gerhard Steger
- Institut für Physikalische Biologie, Heinrich-Heine-Universität Düsseldorf, Düsseldorf, Germany.
| |
|
26
|
Zhang C, Forsdyke DR. Potential Achilles heels of SARS-CoV-2 are best displayed by the base order-dependent component of RNA folding energy. Comput Biol Chem 2021; 94:107570. [PMID: 34500325 PMCID: PMC8410225 DOI: 10.1016/j.compbiolchem.2021.107570] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 02/02/2021] [Revised: 08/29/2021] [Accepted: 08/30/2021] [Indexed: 11/29/2022]
Abstract
The base order-dependent component of folding energy has revealed a highly conserved region in HIV-1 genomes that associates with RNA structure. This corresponds to a packaging signal that is recognized by the nucleocapsid domain of the Gag polyprotein. Long viewed as a potential HIV-1 "Achilles heel," the signal can be targeted by a new antiviral compound. Although SARS-CoV-2 differs in many respects from HIV-1, the same technology displays regions with a high base order-dependent folding energy component, which are also highly conserved. This indicates structural invariance (SI) sustained by natural selection. While the regions are often also protein-encoding (e.g., NSP3, ORF3a), we suggest that their nucleic acid level functions can be considered potential "Achilles heels" for SARS-CoV-2, perhaps susceptible to therapies like those envisaged for AIDS. The ribosomal frameshifting element scored well, but higher SI scores were obtained in other regions, including those encoding NSP13 and the nucleocapsid (N) protein.
Collapse
Affiliation(s)
- Chiyu Zhang
- Shanghai Public Health Clinical Center, Fudan University, Shanghai, China
| | - Donald R Forsdyke
- Department of Biomedical and Molecular Sciences, Queen's University, Kingston, Ontario K7L3N6, Canada.
| |
|
27
|
Asim MN, Ibrahim MA, Imran Malik M, Dengel A, Ahmed S. Advances in Computational Methodologies for Classification and Sub-Cellular Locality Prediction of Non-Coding RNAs. Int J Mol Sci 2021; 22:8719. [PMID: 34445436 PMCID: PMC8395733 DOI: 10.3390/ijms22168719] [Citation(s) in RCA: 14] [Impact Index Per Article: 3.5] [Received: 07/17/2021] [Revised: 08/02/2021] [Accepted: 08/03/2021] [Indexed: 02/06/2023] Open
Abstract
Apart from protein-coding ribonucleic acids (RNAs), there exists a variety of non-coding RNAs (ncRNAs) which regulate complex cellular and molecular processes. High-throughput sequencing technologies and bioinformatics approaches have greatly advanced the exploration of ncRNAs, revealing their crucial roles in gene regulation, miRNA binding, protein interactions, and splicing. Furthermore, ncRNAs are involved in the development of complicated diseases like cancer. Categorization of ncRNAs is essential to understand the mechanisms of diseases and to develop effective treatments. Sub-cellular localization information of ncRNAs demystifies diverse functionalities of ncRNAs. To date, several computational methodologies have been proposed to precisely identify the class as well as sub-cellular localization patterns of RNAs. This paper discusses different types of ncRNAs, reviews computational approaches proposed in the last 10 years to distinguish coding-RNA from ncRNA, to identify sub-types of ncRNAs such as piwi-associated RNA, micro RNA, long ncRNA, and circular RNA, and to determine sub-cellular localization of distinct ncRNAs and RNAs. Furthermore, it summarizes diverse ncRNA classification and sub-cellular localization determination datasets along with benchmark performance to aid the development and evaluation of novel computational methodologies. It identifies research gaps, heterogeneity, and challenges in the development of computational approaches for RNA sequence analysis. We consider that our expert analysis will assist Artificial Intelligence researchers with knowing state-of-the-art performance, model selection for various tasks on one platform, dominantly used sequence descriptors, neural architectures, and interpreting inter-species and intra-species performance deviation.
Affiliation(s)
- Muhammad Nabeel Asim
- German Research Center for Artificial Intelligence (DFKI), 67663 Kaiserslautern, Germany
- Department of Computer Science, Technical University of Kaiserslautern, 67663 Kaiserslautern, Germany
| | - Muhammad Ali Ibrahim
- German Research Center for Artificial Intelligence (DFKI), 67663 Kaiserslautern, Germany
- Department of Computer Science, Technical University of Kaiserslautern, 67663 Kaiserslautern, Germany
| | - Muhammad Imran Malik
- National Center for Artificial Intelligence (NCAI), National University of Sciences and Technology, Islamabad 44000, Pakistan
- School of Electrical Engineering & Computer Science, National University of Sciences and Technology, Islamabad 44000, Pakistan
| | - Andreas Dengel
- German Research Center for Artificial Intelligence (DFKI), 67663 Kaiserslautern, Germany
- Department of Computer Science, Technical University of Kaiserslautern, 67663 Kaiserslautern, Germany
| | - Sheraz Ahmed
- German Research Center for Artificial Intelligence (DFKI), 67663 Kaiserslautern, Germany
- DeepReader GmbH, Trippstadter Str. 122, 67663 Kaiserslautern, Germany
|
28
|
Onoguchi M, Zeng C, Matsumaru A, Hamada M. Binding patterns of RNA-binding proteins to repeat-derived RNA sequences reveal putative functional RNA elements. NAR Genom Bioinform 2021; 3:lqab055. [PMID: 34235430 PMCID: PMC8253551 DOI: 10.1093/nargab/lqab055] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 10/16/2020] [Revised: 05/25/2021] [Accepted: 06/02/2021] [Indexed: 12/21/2022] Open
Abstract
Recent reports have revealed that repeat-derived sequences embedded in introns or long noncoding RNAs (lncRNAs) are targets of RNA-binding proteins (RBPs) and contribute to biological processes such as RNA splicing or transcriptional regulation. These findings suggest that repeat-derived RNAs are important as scaffolds of RBPs and functional elements. However, the overall functional sequences of the repeat-derived RNAs are not fully understood. Here, we show the putative functional repeat-derived RNAs by analyzing the binding patterns of RBPs based on ENCODE eCLIP data. We mapped all eCLIP reads to repeat sequences and observed that 10.75% and 7.04% of reads on average were enriched (at least 2-fold over control) in the repeats in K562 and HepG2 cells, respectively. Using these data, we predicted functional RNA elements on the sense and antisense strands of long interspersed element 1 (LINE1) sequences. Furthermore, we found several new sets of RBPs on fragments derived from other transposable element (TE) families. Some of these fragments show specific and stable secondary structures and are found to be inserted into the introns of genes or lncRNAs. These results suggest that the repeat-derived RNA sequences are strong candidates for the functional RNA elements of endogenous noncoding RNAs.
Affiliation(s)
- Masahiro Onoguchi
- Faculty of Science and Engineering, Waseda University, 55N-06-10, 3-4-1 Okubo Shinjuku-ku, Tokyo 169-8555, Japan
| | - Chao Zeng
- Faculty of Science and Engineering, Waseda University, 55N-06-10, 3-4-1 Okubo Shinjuku-ku, Tokyo 169-8555, Japan
| | - Ayako Matsumaru
- Faculty of Science and Engineering, Waseda University, 55N-06-10, 3-4-1 Okubo Shinjuku-ku, Tokyo 169-8555, Japan
| | - Michiaki Hamada
- Faculty of Science and Engineering, Waseda University, 55N-06-10, 3-4-1 Okubo Shinjuku-ku, Tokyo 169-8555, Japan
|
29
|
Mehta D, Ramesh A. Diversity and prevalence of ANTAR RNAs across actinobacteria. BMC Microbiol 2021; 21:159. [PMID: 34051745 PMCID: PMC8164766 DOI: 10.1186/s12866-021-02234-x] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/26/2020] [Accepted: 05/18/2021] [Indexed: 11/13/2022] Open
Abstract
Background: Computational approaches are often used to predict regulatory RNAs in bacteria, but their success is limited to RNAs that are highly conserved across phyla, in sequence and structure. The ANTAR regulatory system consists of a family of RNAs (the ANTAR-target RNAs) that selectively recruit ANTAR proteins. This protein-RNA complex together regulates genes at the level of translation or transcriptional elongation. Despite the widespread distribution of ANTAR proteins in bacteria, their target RNAs haven’t been identified in certain bacterial phyla such as actinobacteria. Results: Here, by using a computational search model that is tuned to actinobacterial genomes, we comprehensively identify ANTAR-target RNAs in actinobacteria. These RNA motifs lie in select transcripts, often overlapping with the ribosome binding site or start codon, to regulate translation. Transcripts harboring ANTAR-target RNAs majorly encode proteins involved in the transport and metabolism of cellular metabolites like sugars, amino acids and ions; or encode transcription factors that in turn regulate diverse genes. Conclusion: In this report, we substantially diversify and expand the family of ANTAR RNAs across bacteria. These findings now provide a starting point to investigate the actinobacterial processes that are regulated by ANTAR. Supplementary Information: The online version contains supplementary material available at 10.1186/s12866-021-02234-x.
Affiliation(s)
- Dolly Mehta
- National Centre for Biological Sciences, Tata Institute of Fundamental Research, GKVK Campus, Bellary Road, Bangalore, 560065, India.,SASTRA University, Tirumalaisamudram, Thanjavur, 613401, India
| | - Arati Ramesh
- National Centre for Biological Sciences, Tata Institute of Fundamental Research, GKVK Campus, Bellary Road, Bangalore, 560065, India.
|
30
|
Yang TH, Wang CY, Tsai HC, Liu CT. Human IRES Atlas: an integrative platform for studying IRES-driven translational regulation in humans. Database-The Journal of Biological Databases and Curation 2021; 2021:6263636. [PMID: 33942874 PMCID: PMC8094437 DOI: 10.1093/database/baab025] [Citation(s) in RCA: 27] [Impact Index Per Article: 6.8] [Received: 11/18/2020] [Revised: 04/16/2021] [Accepted: 04/23/2021] [Indexed: 11/13/2022]
Abstract
It is now known that cap-independent translation initiation facilitated by internal ribosome entry sites (IRESs) is vital in selective cellular protein synthesis under stress and different physiological conditions. However, three problems make it hard to understand transcriptome-wide cellular IRES-mediated translation initiation mechanisms: (i) complex interplay between IRESs and other translation initiation–related information, (ii) reliability issue of in silico cellular IRES investigation and (iii) labor-intensive in vivo IRES identification. In this research, we constructed the Human IRES Atlas database for a comprehensive understanding of cellular IRESs in humans. First, currently available and suitable IRES prediction tools (IRESfinder, PatSearch and IRESpy) were used to obtain transcriptome-wide human IRESs. Then, we collected eight genres of translation initiation–related features to help study the potential molecular mechanisms of each of the putative IRESs. Three functional tests (conservation, structural RNA–protein scores and conditional translation efficiency) were devised to evaluate the functionality of the identified putative IRESs. Moreover, an easy-to-use interface and an IRES–translation initiation interaction map for each gene transcript were implemented to help understand the interactions between IRESs and translation initiation–related features. Researchers can easily search/browse an IRES of interest using the web interface and deduce testable mechanism hypotheses of human IRES-driven translation initiation based on the integrated results. In summary, Human IRES Atlas integrates putative IRES elements and translation initiation–related experiments for better usage of these data and deduction of mechanism hypotheses. Database URL: http://cobishss0.im.nuk.edu.tw/Human_IRES_Atlas/
Affiliation(s)
- Tzu-Hsien Yang
- Department of Information Management, National University of Kaohsiung, 700, Kaohsiung University Rd., Nanzih District, Kaohsiung, Taiwan 811, Republic of China
| | - Chung-Yu Wang
- Department of Information Management, National University of Kaohsiung, 700, Kaohsiung University Rd., Nanzih District, Kaohsiung, Taiwan 811, Republic of China
| | - Hsiu-Chun Tsai
- Department of Information Management, National University of Kaohsiung, 700, Kaohsiung University Rd., Nanzih District, Kaohsiung, Taiwan 811, Republic of China
| | - Cheng-Tse Liu
- Department of Information Management, National University of Kaohsiung, 700, Kaohsiung University Rd., Nanzih District, Kaohsiung, Taiwan 811, Republic of China
|
31
|
Neutralism versus selectionism: Chargaff's second parity rule, revisited. Genetica 2021; 149:81-88. [PMID: 33880685 PMCID: PMC8057000 DOI: 10.1007/s10709-021-00119-5] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.8] [Received: 03/04/2021] [Accepted: 04/09/2021] [Indexed: 11/03/2022]
Abstract
Of Chargaff's four "rules" on DNA base frequencies, the functional interpretation of his second parity rule (PR2) is the most contentious. Thermophile base compositions (GC%) were taken by Galtier and Lobry (1997) as favoring Sueoka's neutral PR2 hypothesis over Forsdyke's selective PR2 hypothesis, namely that mutations improving local within-species recombination efficiency had generated a genome-wide potential for the strands of duplex DNA to separate and initiate recombination through the "kissing" of the tips of stem-loops. However, following Chargaff's GC rule, base composition mainly reflects a species-specific, genome-wide, evolutionary pressure. GC% could not have consistently followed the dictates of temperature, since it plays fundamental roles in both sustaining species integrity and, through primarily neutral genome-wide mutation, fostering speciation. Evidence for a local within-species recombination-initiating role of base order was obtained with a novel technology that masked the contribution of base composition to nucleic acid folding energy. Forsdyke's results were consistent with his PR2 hypothesis, appeared to resolve some root problems in biology and provided a theoretical underpinning for alignment-free taxonomic analyses using relative oligonucleotide frequencies (k-mer analysis). Moreover, consistent with Chargaff's cluster rule, discovery of the thermoadaptive role of the "purine-loading" of open reading frames made less tenable the Galtier-Lobry anti-selectionist arguments.
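Chargaff's second parity rule (PR2) says that, within a single DNA strand, A and T occur at roughly equal frequencies, as do G and C. The following is a minimal Python sketch of how one might check this on a sequence. The function name `pr2_skews` and the normalized skew definition are illustrative assumptions and are not taken from the paper.

```python
from collections import Counter


def pr2_skews(seq: str) -> dict:
    """Count bases on ONE strand and report normalized A/T and G/C skews.

    Hypothetical helper for illustration: a skew near 0 means the strand
    is close to PR2 parity for that base pair.
    """
    counts = Counter(seq.upper())
    a, t, g, c = (counts[base] for base in "ATGC")
    # Normalized differences; defined as 0.0 when a base pair is absent.
    at_skew = (a - t) / (a + t) if (a + t) else 0.0
    gc_skew = (g - c) / (g + c) if (g + c) else 0.0
    return {"A": a, "T": t, "G": g, "C": c,
            "at_skew": at_skew, "gc_skew": gc_skew}


if __name__ == "__main__":
    # Toy sequence, not real genomic data.
    print(pr2_skews("ATGCATGCAATT"))
```

On long genomic sequences both skews are usually close to zero, whereas short windows can deviate strongly, so any such check depends on the window size chosen.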
|
32
|
Wang X, Yang Y, Liu J, Wang G. The stacking strategy-based hybrid framework for identifying non-coding RNAs. Brief Bioinform 2021; 22:6165004. [PMID: 33693454 DOI: 10.1093/bib/bbab023] [Citation(s) in RCA: 26] [Impact Index Per Article: 6.5] [Received: 11/04/2020] [Revised: 01/16/2021] [Indexed: 12/12/2022] Open
Abstract
With the development of next-generation sequencing technology, a large number of transcripts need to be analyzed, and it has been a challenge to distinguish non-coding ribonucleic acids (ncRNAs) from coding RNAs. For non-model organisms, due to the lack of transcriptional data, many existing methods cannot identify them. Therefore, in addition to using deoxyribonucleic acid-based and RNA-based features, we also proposed a hybrid framework based on the stacking strategy to identify ncRNAs, and we innovatively added eight features based on predicted peptides. The proposed framework was based on a stacking two-layer classifier which combined random forest (RF), LightGBM, XGBoost and logistic regression (LR) models. We used this framework to build two types of models. For the cross-species ncRNA identification model, we tested it on six different species: human, mouse, zebrafish, fruit fly, worm and Arabidopsis. Compared with other tools, our model was the best in datasets of Arabidopsis, worm and zebrafish with the accuracy of 98.36%, 99.65% and 94.12%. For performance metrics analysis, the datasets of the six species were considered as a whole set, and the sensitivity, accuracy, precision and F1 values of our model were the best. For the plant-specific ncRNA identification model, the average values of the six metrics of the two experiments were all greater than 95%, which demonstrated it can be used to identify ncRNAs in plants. The above indicates that the hybrid framework we designed is universal between animals and plants and has significant advantages in the identification of cross-species ncRNAs.
Affiliation(s)
- Xin Wang
- School of Computer Science and Technology, Harbin Institute of Technology, Harbin, China
| | - Yang Yang
- School of Computer Science and Technology, Harbin Institute of Technology, Harbin, China
| | - Jian Liu
- School of Computer Science and Technology, Harbin Institute of Technology, Harbin, China
| | - Guohua Wang
- School of Computer Science and Technology, Harbin Institute of Technology, Harbin, China
|
33
|
Hou L, Xie J, Wu Y, Wang J, Duan A, Ao Y, Liu X, Yu X, Yan H, Perreault J, Li S. Identification of 11 candidate structured noncoding RNA motifs in humans by comparative genomics. BMC Genomics 2021; 22:164. [PMID: 33750298 PMCID: PMC7941889 DOI: 10.1186/s12864-021-07474-9] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Received: 10/09/2020] [Accepted: 02/24/2021] [Indexed: 11/12/2022] Open
Abstract
Background: Only 1.5% of the human genome encodes proteins, while a large part of the remainder encodes noncoding RNAs (ncRNAs). Many ncRNAs form structures and perform many important functions. Accurately identifying structured ncRNAs in the human genome and discovering their biological functions remain a major challenge. Results: Here, we have established a pipeline (CM-line) with the following features for analyzing the large genomes of humans and other animals. First, we selected species with larger genetic distances to facilitate the discovery of covariations and compatible mutations. Second, we used CMfinder, which can generate useful alignments even with low sequence conservation. Third, we removed repetitive sequences and known structured ncRNAs to reduce the workload of CMfinder. Fourth, we used Infernal to find more representatives and refine the structure. We reported 11 classes of structured ncRNA candidates with significant covariations in humans. Functional analysis showed that these ncRNAs may have variable functions. Some may regulate circadian clock genes through poly (A) signals (PAS); some may regulate the elongation factor (EEF1A) and the T-cell receptor signaling pathway by cooperating with RNA binding proteins. Conclusions: By searching for important features of RNA structure from large genomes, the CM-line has revealed the existence of a variety of novel structured ncRNAs. Functional analysis suggests that some newly discovered ncRNA motifs may have biological functions. The pipeline we have established for the discovery of structured ncRNAs and the identification of their functions can also be applied to analyze other large genomes. Supplementary Information: The online version contains supplementary material available at 10.1186/s12864-021-07474-9.
Affiliation(s)
- Lijuan Hou
- Medical School, Molecular Medicine Engineering and Research Center of Ministry of Education, Key Laboratory of Precision Medicine and Molecular Diagnosis of Fujian Universities, Institute of Genomics, School of Biomedical Sciences, Huaqiao University, Xiamen, 361021, P. R. China
| | - Jin Xie
- Medical School, Molecular Medicine Engineering and Research Center of Ministry of Education, Key Laboratory of Precision Medicine and Molecular Diagnosis of Fujian Universities, Institute of Genomics, School of Biomedical Sciences, Huaqiao University, Xiamen, 361021, P. R. China
| | - Yaoyao Wu
- Medical School, Molecular Medicine Engineering and Research Center of Ministry of Education, Key Laboratory of Precision Medicine and Molecular Diagnosis of Fujian Universities, Institute of Genomics, School of Biomedical Sciences, Huaqiao University, Xiamen, 361021, P. R. China
| | - Jiaojiao Wang
- Medical School, Molecular Medicine Engineering and Research Center of Ministry of Education, Key Laboratory of Precision Medicine and Molecular Diagnosis of Fujian Universities, Institute of Genomics, School of Biomedical Sciences, Huaqiao University, Xiamen, 361021, P. R. China
| | - Anqi Duan
- Medical School, Molecular Medicine Engineering and Research Center of Ministry of Education, Key Laboratory of Precision Medicine and Molecular Diagnosis of Fujian Universities, Institute of Genomics, School of Biomedical Sciences, Huaqiao University, Xiamen, 361021, P. R. China
| | - Yaqi Ao
- Medical School, Molecular Medicine Engineering and Research Center of Ministry of Education, Key Laboratory of Precision Medicine and Molecular Diagnosis of Fujian Universities, Institute of Genomics, School of Biomedical Sciences, Huaqiao University, Xiamen, 361021, P. R. China
| | - Xuejiao Liu
- Medical School, Molecular Medicine Engineering and Research Center of Ministry of Education, Key Laboratory of Precision Medicine and Molecular Diagnosis of Fujian Universities, Institute of Genomics, School of Biomedical Sciences, Huaqiao University, Xiamen, 361021, P. R. China
| | - Xinmei Yu
- Medical School, Molecular Medicine Engineering and Research Center of Ministry of Education, Key Laboratory of Precision Medicine and Molecular Diagnosis of Fujian Universities, Institute of Genomics, School of Biomedical Sciences, Huaqiao University, Xiamen, 361021, P. R. China
| | - Hui Yan
- Medical School, Molecular Medicine Engineering and Research Center of Ministry of Education, Key Laboratory of Precision Medicine and Molecular Diagnosis of Fujian Universities, Institute of Genomics, School of Biomedical Sciences, Huaqiao University, Xiamen, 361021, P. R. China
| | - Jonathan Perreault
- INRS - Institut Armand-Frappier, 531 boul des Prairies, Laval, Québec, H7V1B7, Canada
| | - Sanshu Li
- Medical School, Molecular Medicine Engineering and Research Center of Ministry of Education, Key Laboratory of Precision Medicine and Molecular Diagnosis of Fujian Universities, Institute of Genomics, School of Biomedical Sciences, Huaqiao University, Xiamen, 361021, P. R. China.
|
34
|
RNA Secondary Structures with Limited Base Pair Span: Exact Backtracking and an Application. Genes (Basel) 2020; 12:genes12010014. [PMID: 33374382 PMCID: PMC7823788 DOI: 10.3390/genes12010014] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.6] [Received: 11/24/2020] [Revised: 12/18/2020] [Accepted: 12/21/2020] [Indexed: 11/24/2022] Open
Abstract
The accuracy of RNA secondary structure prediction decreases with the span of a base pair, i.e., the number of nucleotides that it encloses. The dynamic programming algorithms for RNA folding can be easily specialized in order to consider only base pairs with a limited span L, reducing the memory requirements to O(nL), and further to O(n) by interleaving backtracking. However, the latter is an approximation that precludes the retrieval of the globally optimal structure. So far, the ViennaRNA package therefore does not provide a tool for computing optimal, span-restricted minimum energy structures. Here, we report on an efficient backtracking algorithm that reconstructs the globally optimal structure from the locally optimal fragments that are produced by the interleaved backtracking implemented in RNALfold. An implementation is integrated into the ViennaRNA package. The forward and the backtracking recursions of RNALfold are both easily constrained to structural components with sufficiently negative z-scores. This provides a convenient method to identify hyper-stable structural elements. A screen of the C. elegans genome shows that such features are more abundant in real genomic sequences when compared to a di-nucleotide shuffled background model.
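The idea of restricting a folding recursion to base pairs of limited span can be illustrated with a simplified sketch. The Python function below is not the RNALfold algorithm and uses no energy model: it maximizes the number of base pairs (Nussinov-style) while rejecting any pair (k, j) with j - k greater than `max_span`. The function name, the span convention, and the minimum loop size of 3 are assumptions made for illustration, and the sketch keeps a full O(n^2) table for clarity rather than the O(nL) memory layout mentioned in the abstract.

```python
# Canonical and wobble pairs allowed in this toy model.
PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}


def max_pairs_span_limited(seq: str, max_span: int, min_loop: int = 3) -> int:
    """Maximum number of nested base pairs with j - k <= max_span.

    best[i][j] holds the optimum for the subsequence seq[i..j] (inclusive).
    """
    n = len(seq)
    if n == 0:
        return 0
    best = [[0] * n for _ in range(n)]
    # Subsequences shorter than min_loop + 2 cannot contain a pair.
    for length in range(min_loop + 2, n + 1):
        for i in range(0, n - length + 1):
            j = i + length - 1
            score = best[i][j - 1]  # case 1: position j stays unpaired
            # case 2: j pairs with k, respecting loop size and span limit
            for k in range(max(i, j - max_span), j - min_loop):
                if (seq[k], seq[j]) in PAIRS:
                    left = best[i][k - 1] if k > i else 0
                    score = max(score, left + 1 + best[k + 1][j - 1])
            best[i][j] = score
    return best[0][n - 1]


if __name__ == "__main__":
    hairpin = "GGGAAACCC"  # toy hairpin, not a real RNA
    for span in (4, 6, 8):
        print(span, max_pairs_span_limited(hairpin, span))
```

Tightening `max_span` on the toy hairpin removes the outer pairs first, which mirrors why span limits favor local structural elements.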
|
35
|
Kumar K, Chakraborty A, Chakrabarti S. PresRAT: a server for identification of bacterial small-RNA sequences and their targets with probable binding region. RNA Biol 2020; 18:1152-1159. [PMID: 33103602 DOI: 10.1080/15476286.2020.1836455] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.6] [Indexed: 10/23/2022] Open
Abstract
Bacterial small-RNA (sRNA) sequences are functional RNAs, which play an important role in regulating the expression of a diverse class of genes. It is thus critical to identify such sRNA sequences and their probable mRNA targets. Here, we discuss new procedures to identify and characterize sRNAs and their targets via the introduction of an integrated online platform 'PresRAT'. PresRAT uses the primary and secondary structural attributes of sRNA sequences to predict sRNAs from a given sequence or bacterial genome. PresRAT also finds probable target mRNAs of sRNA sequences from a given bacterial chromosome and further concentrates on the identification of the probable sRNA-mRNA binding regions. Using PresRAT, we have identified a total of 66,209 potential sRNA sequences from 292 bacterial genomes and 2247 potential targets from 13 bacterial genomes. We have also implemented a protocol to build and refine 3D models of sRNA and sRNA-mRNA duplex regions and generated 3D models of 50 known sRNAs and 81 sRNA-mRNA duplexes using this platform. Along with the server part, PresRAT also contains a database section, which lists the predicted sRNA sequences, sRNA targets, and their corresponding 3D models with structural dynamics information.
Affiliation(s)
- Krishna Kumar
- Structural Biology and Bioinformatics Division, CSIR-Indian Institute of Chemical Biology, Kolkata, India
| | - Abhijit Chakraborty
- Division of Vaccine-Discovery, La Jolla Institute for Immunology, San Diego, California, USA
| | - Saikat Chakrabarti
- Structural Biology and Bioinformatics Division, CSIR-Indian Institute of Chemical Biology, Kolkata, India
|
36
|
Zhao C, Zhang D, Jiang Y, Chen SJ. Modeling Loop Composition and Ion Concentration Effects in RNA Hairpin Folding Stability. Biophys J 2020; 119:1439-1455. [PMID: 32949490 PMCID: PMC7568001 DOI: 10.1016/j.bpj.2020.07.042] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.0] [Received: 04/09/2020] [Revised: 06/12/2020] [Accepted: 07/08/2020] [Indexed: 12/21/2022] Open
Abstract
The ability to accurately predict RNA hairpin structure and stability for different loop sequences and salt conditions is important for understanding, modeling, and designing larger RNA folds. However, traditional RNA secondary structure models cannot treat loop-sequence and ionic effects on RNA hairpin folding. Here, we describe a general, three-dimensional (3D) conformation-based computational method for modeling salt concentration-dependent conformational distributions and the detailed 3D structures for a set of three RNA hairpins that contain a variable, 15-nucleotide loop sequence. For a given RNA sequence, the new, to our knowledge, method integrates a Vfold2D two-dimensional structure folding model with IsRNA coarse-grained molecular dynamics 3D folding simulations and Monte Carlo tightly bound ion estimations of ion-mediated electrostatic interactions. The model predicts free-energy landscapes for the different RNA hairpin-forming sequences with variable salt conditions. The theoretically predicted results agree with the experimental fluorescence measurements, validating the strategy. Furthermore, the theoretical model goes beyond the experimental results by enabling in-depth 3D structural analysis, revealing energetic mechanisms for the sequence- and salt-dependent folding stability. Although the computational framework presented here is developed for RNA hairpin systems, the general method may be applied to investigate other RNA systems, such as multiway junctions or pseudoknots in mixed metal ion solutions.
Affiliation(s)
- Chenhan Zhao
- Department of Physics, Department of Biochemistry, and Institute for Data Science and Informatics, University of Missouri, Columbia, Missouri
| | - Dong Zhang
- Department of Physics, Department of Biochemistry, and Institute for Data Science and Informatics, University of Missouri, Columbia, Missouri
| | - Yangwei Jiang
- Department of Physics, Department of Biochemistry, and Institute for Data Science and Informatics, University of Missouri, Columbia, Missouri
| | - Shi-Jie Chen
- Department of Physics, Department of Biochemistry, and Institute for Data Science and Informatics, University of Missouri, Columbia, Missouri.
|
37
|
Corona-Gomez JA, Garcia-Lopez IJ, Stadler PF, Fernandez-Valverde SL. Splicing conservation signals in plant long noncoding RNAs. RNA (New York, N.Y.) 2020; 26:784-793. [PMID: 32241834 PMCID: PMC7297117 DOI: 10.1261/rna.074393.119] [Citation(s) in RCA: 15] [Impact Index Per Article: 3.0] [Received: 12/17/2019] [Accepted: 03/28/2020] [Indexed: 05/12/2023]
Abstract
Long noncoding RNAs (lncRNAs) have recently emerged as prominent regulators of gene expression in eukaryotes. LncRNAs often drive the modification and maintenance of gene activation or gene silencing states via chromatin conformation rearrangements. In plants, lncRNAs have been shown to participate in gene regulation, and are essential to processes such as vernalization and photomorphogenesis. Despite their prominent functions, just over a dozen lncRNAs have been experimentally and functionally characterized. Similar to their animal counterparts, the rates of sequence divergence are much higher in plant lncRNAs than in protein coding mRNAs, making it difficult to identify lncRNA conservation using traditional sequence comparison methods. Beyond this, little is known about the evolutionary patterns of lncRNAs in plants. Here, we characterized the splicing conservation of lncRNAs in Brassicaceae. We generated a whole-genome alignment of 16 Brassica species and used it to identify syntenic lncRNA orthologs. Using a scoring system trained on transcriptomes from A. thaliana and B. oleracea, we identified splice sites across the whole alignment and measured their conservation. Our analysis revealed that 17.9% (112/627) of all intergenic lncRNAs display splicing conservation in at least one exon, an estimate that is substantially higher than previous estimates of lncRNA conservation in this group. Our findings agree with similar studies in vertebrates, demonstrating that splicing conservation can be evidence of stabilizing selection. We provide conclusive evidence for the existence of evolutionarily deeply conserved lncRNAs in plants and describe a generally applicable computational workflow to identify functional lncRNAs in plants.
Affiliation(s)
| | | | - Peter F Stadler
- Bioinformatics Group, Department of Computer Science, University Leipzig, D-04107 Leipzig, Germany
- Interdisciplinary Center for Bioinformatics, University Leipzig, D-04107 Leipzig, Germany
- Max Planck Institute for Mathematics in the Sciences, D-04103 Leipzig, Germany
- Department of Theoretical Chemistry, University of Vienna, A-1090 Wien, Austria
- Facultad de Ciencias, Universidad Nacional de Colombia, 11001 Sede Bogotá, Colombia
- Santa Fe Institute, Santa Fe, New Mexico 87501, USA
|
38
|
Cagliani R, Forni D, Clerici M, Sironi M. Coding potential and sequence conservation of SARS-CoV-2 and related animal viruses. Infection Genetics and Evolution 2020; 83:104353. [PMID: 32387562 PMCID: PMC7199688 DOI: 10.1016/j.meegid.2020.104353] [Citation(s) in RCA: 66] [Impact Index Per Article: 13.2] [Received: 03/02/2020] [Revised: 04/14/2020] [Accepted: 05/02/2020] [Indexed: 12/13/2022]
Abstract
In December 2019, a novel human-infecting coronavirus (SARS-CoV-2) was recognized in China. In a few months, SARS-CoV-2 has caused thousands of disease cases and deaths in several countries. Phylogenetic analyses indicated that SARS-CoV-2 clusters with SARS-CoV in the Sarbecovirus subgenus and viruses related to SARS-CoV-2 were identified from bats and pangolins. Coronaviruses have long and complex genomes with high plasticity in terms of gene content. To date, the coding potential of SARS-CoV-2 remains partially unknown. We thus used available sequences of bat and pangolin viruses to determine the selective events that shaped the genome structure of SARS-CoV-2 and to assess its coding potential. By searching for signals of significantly reduced variability at synonymous sites (dS), we identified six genomic regions, one of these corresponding to the programmed −1 ribosomal frameshift. The most prominent signal of dS reduction was observed within the E gene. A genome-wide analysis of conserved RNA structures indicated that this region harbors a putative functional RNA element that is shared with the SARS-CoV lineage. Additional signals of reduced dS indicated the presence of internal ORFs. Whereas the presence of ORF9a (internal to N) was previously proposed by homology with a well characterized protein of SARS-CoV, ORF3h (for hypothetical, within ORF3a) was not previously described. The predicted product of ORF3h has 90% identity with the corresponding predicted product of SARS-CoV and displays features suggestive of a viroporin. Finally, analysis of the putative ORF10 revealed high dN/dS (3.82) in SARS-CoV-2 and related coronaviruses. In the SARS-CoV lineage, the ORF is predicted to encode a truncated protein and is neutrally evolving. These data suggest that ORF10 encodes a functional protein in SARS-CoV-2 and that positive selection is driving its evolution. Experimental analyses will be necessary to validate and characterize the coding and non-coding functional elements we identified.
We analyzed the coding region of SARS-CoV-2 and related bat/pangolin viruses. We identified six regions of significantly low variability at synonymous sites. One of these corresponds to a conserved RNA structure shared with the SARS-CoV lineage. The dS reduction within ORF3a corresponds to a potential ORF encoding a viroporin. In SARS-CoV-2 and related viruses, the putative 3′ terminal ORF10 has high dN/dS.
Affiliation(s)
- Rachele Cagliani
- Scientific Institute IRCCS E. MEDEA, Bioinformatics, Bosisio Parini, Italy
| | - Diego Forni
- Scientific Institute IRCCS E. MEDEA, Bioinformatics, Bosisio Parini, Italy
| | - Mario Clerici
- Department of Physiopathology and Transplantation, University of Milan, Milan, Italy; Don C. Gnocchi Foundation ONLUS, IRCCS, Milan, Italy
| | - Manuela Sironi
- Scientific Institute IRCCS E. MEDEA, Bioinformatics, Bosisio Parini, Italy.
|
39
|
Suksamran R, Saithong T, Thammarongtham C, Kalapanulak S. Genomic and Transcriptomic Analysis Identified Novel Putative Cassava lncRNAs Involved in Cold and Drought Stress. Genes (Basel) 2020; 11:E366. [PMID: 32231066 PMCID: PMC7230406 DOI: 10.3390/genes11040366] [Citation(s) in RCA: 16] [Impact Index Per Article: 3.2] [Received: 02/14/2020] [Revised: 03/23/2020] [Accepted: 03/24/2020] [Indexed: 01/09/2023] Open
Abstract
Long non-coding RNAs (lncRNAs) play important roles in the regulation of complex cellular processes, including transcriptional and post-transcriptional regulation of gene expression relevant for development and stress response, among others. Compared to other important crops, there is limited knowledge of cassava lncRNAs and their roles in abiotic stress adaptation. In this study, we performed a genome-wide study of ncRNAs in cassava, integrating genomics- and transcriptomics-based approaches. In total, 56,840 putative ncRNAs were identified, and approximately half the number were verified using expression data or previously known ncRNAs. Among these were 2229 potential novel lncRNA transcripts with unmatched sequences, 250 of which were differentially expressed in cold or drought conditions, relative to controls. We showed that lncRNAs might be involved in post-transcriptional regulation of stress-induced transcription factors (TFs) such as zinc-finger, WRKY, and nuclear factor Y gene families. These findings deepened our knowledge of cassava lncRNAs and shed light on their stress-responsive roles.
Affiliation(s)
- Rungaroon Suksamran
- Biotechnology Program, School of Bioresources and Technology, King Mongkut's University of Technology Thonburi (Bang KhunThian), Bangkok 10150, Thailand
| | - Treenut Saithong
- Bioinformatics and Systems Biology Program, School of Bioresources and Technology, King Mongkut's University of Technology Thonburi (Bang KhunThian), Bangkok 10150, Thailand
- Center for Agricultural Systems Biology, Systems Biology and Bioinformatics Research Group, Pilot Plant Development and Training Institute, King Mongkut's University of Technology Thonburi (Bang KhunThian), Bangkok 10150, Thailand
| | - Chinae Thammarongtham
- Biochemical Engineering and Systems Biology Research Group, National Center for Genetic Engineering and Biotechnology at King Mongkut's University of Technology Thonburi (Bang KhunThian), Bangkok 10150, Thailand
| | - Saowalak Kalapanulak
- Bioinformatics and Systems Biology Program, School of Bioresources and Technology, King Mongkut's University of Technology Thonburi (Bang KhunThian), Bangkok 10150, Thailand
- Center for Agricultural Systems Biology, Systems Biology and Bioinformatics Research Group, Pilot Plant Development and Training Institute, King Mongkut's University of Technology Thonburi (Bang KhunThian), Bangkok 10150, Thailand
| |
|
40
|
Adams PP, Storz G. Prevalence of small base-pairing RNAs derived from diverse genomic loci. Biochim Biophys Acta Gene Regul Mech 2020; 1863:194524. [PMID: 32147527 DOI: 10.1016/j.bbagrm.2020.194524] [Citation(s) in RCA: 53] [Impact Index Per Article: 10.6] [Received: 11/27/2019] [Revised: 03/03/2020] [Accepted: 03/03/2020] [Indexed: 12/21/2022]
Abstract
Small RNAs (sRNAs) that act by base-pairing have been shown to play important roles in fine-tuning the levels and translation of their target transcripts across a variety of model and pathogenic organisms. Work from many different groups in a wide range of bacterial species has provided evidence for the importance and complexity of sRNA regulatory networks, which allow bacteria to quickly respond to changes in their environment. However, despite the expansive literature, much remains to be learned about all aspects of sRNA-mediated regulation, particularly in bacteria beyond the well-characterized Escherichia coli and Salmonella enterica species. Here we discuss what is known, and what remains to be learned, about the identification of regulatory base-pairing RNAs produced from diverse genomic loci including how their expression is regulated. This article is part of a Special Issue entitled: RNA and gene control in bacteria edited by Dr. M. Guillier and F. Repoila.
Affiliation(s)
- Philip P Adams
- Division of Molecular and Cellular Biology, Eunice Kennedy Shriver National Institute of Child Health and Human Development, Bethesda, MD 20892-5430, USA; Postdoctoral Research Associate Program, National Institute of General Medical Sciences, National Institutes of Health, Bethesda, MD 20892-6200, USA.
| | - Gisela Storz
- Division of Molecular and Cellular Biology, Eunice Kennedy Shriver National Institute of Child Health and Human Development, Bethesda, MD 20892-5430, USA
| |
|
41
|
Chen CC, Qian X, Yoon BJ. RNAdetect: efficient computational detection of novel non-coding RNAs. Bioinformatics 2020; 35:1133-1141. [PMID: 30169792 DOI: 10.1093/bioinformatics/bty765] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.2] [Received: 06/08/2017] [Revised: 07/30/2018] [Accepted: 08/30/2018] [Indexed: 11/12/2022] Open
Abstract
MOTIVATION Non-coding RNAs (ncRNAs) are known to play crucial roles in various biological processes, and there is a pressing need for accurate computational detection methods that could be used to efficiently scan genomes to detect novel ncRNAs. However, unlike coding genes, ncRNAs often lack distinctive sequence features that could be used for recognizing them. Although many ncRNAs are known to have a well-conserved secondary structure, which provides useful cues for computational prediction, it has also been shown that a structure-based approach alone may not be sufficient for detecting ncRNAs in a single sequence. Currently, the most effective ncRNA detection methods combine structure-based techniques with a comparative genome analysis approach to improve the prediction performance. RESULTS In this paper, we propose RNAdetect, a computational method incorporating novel features for accurate detection of ncRNAs in combination with comparative genome analysis. Given a sequence alignment, RNAdetect can accurately detect the presence of functional ncRNAs by incorporating novel predictive features based on the concept of generalized ensemble defect (GED), which assesses the degree of structure conservation across multiple related sequences and the conformation of the individual folding structures to a common consensus structure. Furthermore, n-gram models (NGMs) are used to extract features that can effectively capture sequence homology to known ncRNA families. Utilization of NGMs can enhance the detection of ncRNAs that have sparse folding structures with many unpaired bases. Extensive performance evaluation based on the Rfam database and bacterial genomes demonstrates that RNAdetect can accurately and reliably detect novel ncRNAs, outperforming the current state-of-the-art methods. AVAILABILITY AND IMPLEMENTATION The source code for RNAdetect and the benchmark data used in this paper can be downloaded at https://github.com/bjyoontamu/RNAdetect.
Affiliation(s)
- Chun-Chi Chen
- Department of Electrical and Computer Engineering, Texas A&M University, College Station, TX, USA; TEES-AgriLife Center for Bioinformatics and Genomic Systems Engineering, Texas A&M University, College Station, TX, USA
| | - Xiaoning Qian
- Department of Electrical and Computer Engineering, Texas A&M University, College Station, TX, USA; TEES-AgriLife Center for Bioinformatics and Genomic Systems Engineering, Texas A&M University, College Station, TX, USA
| | - Byung-Jun Yoon
- Department of Electrical and Computer Engineering, Texas A&M University, College Station, TX, USA; TEES-AgriLife Center for Bioinformatics and Genomic Systems Engineering, Texas A&M University, College Station, TX, USA
| |
|
42
|
Haning K, Engels SM, Williams P, Arnold M, Contreras LM. Applying a New REFINE Approach in Zymomonas mobilis Identifies Novel sRNAs That Confer Improved Stress Tolerance Phenotypes. Front Microbiol 2020; 10:2987. [PMID: 31998271 PMCID: PMC6970203 DOI: 10.3389/fmicb.2019.02987] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.8] [Received: 09/23/2019] [Accepted: 12/10/2019] [Indexed: 12/19/2022] Open
Abstract
As global controllers of gene expression, small RNAs represent powerful tools for engineering complex phenotypes. However, a general challenge prevents the more widespread use of sRNA engineering strategies: mechanistic analysis of these regulators in bacteria lags far behind their high-throughput search and discovery. This makes it difficult to understand how to efficiently identify useful sRNAs to engineer a phenotype of interest. To help address this, we developed a forward systems approach to identify naturally occurring sRNAs relevant to a desired phenotype: RNA-seq Examiner for Phenotype-Informed Network Engineering (REFINE). This pipeline uses existing RNA-seq datasets under different growth conditions. It filters the total transcriptome to locate and rank regulatory-RNA-containing regions that can influence a metabolic phenotype of interest, without the need for previous mechanistic characterization. Application of this approach led to the uncovering of six novel sRNAs related to ethanol tolerance in non-model ethanol-producing bacterium Zymomonas mobilis. Furthermore, upon overexpressing multiple sRNA candidates predicted by REFINE, we demonstrate improved ethanol tolerance reflected by up to an approximately twofold increase in relative growth rate compared to controls not expressing these sRNAs in 7% ethanol (v/v) RMG-supplemented media. In this way, the REFINE approach informs strain-engineering strategies that we expect are applicable for general strain engineering.
Affiliation(s)
- Katie Haning
- McKetta Department of Chemical Engineering, The University of Texas at Austin, Austin, TX, United States
| | - Sean M. Engels
- McKetta Department of Chemical Engineering, The University of Texas at Austin, Austin, TX, United States
| | - Paige Williams
- Department of Aerospace Engineering & Engineering Mechanics, The University of Texas at Austin, Austin, TX, United States
| | - Margaret Arnold
- Department of Computer Science and Engineering, School of Engineering and Applied Sciences, University at Buffalo, Buffalo, NY, United States
| | - Lydia M. Contreras
- McKetta Department of Chemical Engineering, The University of Texas at Austin, Austin, TX, United States
- Institute for Cellular and Molecular Biology, The University of Texas at Austin, Austin, TX, United States
| |
|
43
|
Hecker N, Hiller M. A genome alignment of 120 mammals highlights ultraconserved element variability and placenta-associated enhancers. Gigascience 2020; 9:giz159. [PMID: 31899510 PMCID: PMC6941714 DOI: 10.1093/gigascience/giz159] [Citation(s) in RCA: 21] [Impact Index Per Article: 4.2] [Received: 09/06/2019] [Revised: 11/29/2019] [Accepted: 12/13/2019] [Indexed: 01/02/2023] Open
Abstract
BACKGROUND Multiple alignments of mammalian genomes have been the basis of many comparative genomic studies aiming at annotating genes, detecting regions under evolutionary constraint, and studying genome evolution. A key factor that affects the power of comparative analyses is the number of species included in a genome alignment. RESULTS To utilize the increased number of sequenced genomes and to provide an accessible resource for genomic studies, we generated a mammalian genome alignment comprising 120 species. We used this alignment and the CESAR method to provide protein-coding gene annotations for 119 non-human mammals. Furthermore, we illustrate the utility of this alignment by 2 exemplary analyses. First, we quantified how variable ultraconserved elements (UCEs) are among placental mammals. Leveraging the high taxonomic coverage in our alignment, we estimate that UCEs contain on average 4.7%-15.6% variable alignment columns. Furthermore, we show that the center regions of UCEs are generally most constrained. Second, we identified enhancer sequences that are only conserved in placental mammals. We found that these enhancers are significantly associated with placenta-related genes, suggesting that some of these enhancers may be involved in the evolution of placental mammal-specific aspects of the placenta. CONCLUSION The 120-mammal alignment and all other data are available for analysis and visualization in a genome browser at https://genome-public.pks.mpg.de/ and for download at https://bds.mpi-cbg.de/hillerlab/120MammalAlignment/.
Affiliation(s)
- Nikolai Hecker
- Max Planck Institute of Molecular Cell Biology and Genetics, Pfotenhauerstr. 108, 01307 Dresden, Germany
- Max Planck Institute for the Physics of Complex Systems, Noethnitzer Str. 38, 01187 Dresden, Germany
- Center for Systems Biology Dresden, Pfotenhauerstr. 108, 01307 Dresden, Germany
| | - Michael Hiller
- Max Planck Institute of Molecular Cell Biology and Genetics, Pfotenhauerstr. 108, 01307 Dresden, Germany
- Max Planck Institute for the Physics of Complex Systems, Noethnitzer Str. 38, 01187 Dresden, Germany
- Center for Systems Biology Dresden, Pfotenhauerstr. 108, 01307 Dresden, Germany
| |
|
44
|
Miladi M, Sokhoyan E, Houwaart T, Heyne S, Costa F, Grüning B, Backofen R. GraphClust2: Annotation and discovery of structured RNAs with scalable and accessible integrative clustering. Gigascience 2019; 8:giz150. [PMID: 31808801 PMCID: PMC6897289 DOI: 10.1093/gigascience/giz150] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.7] [Received: 03/15/2019] [Revised: 08/23/2019] [Accepted: 11/20/2019] [Indexed: 11/18/2022] Open
Abstract
BACKGROUND RNA plays essential roles in all known forms of life. Clustering RNA sequences with common sequence and structure is an essential step towards studying RNA function. With the advent of high-throughput sequencing techniques, experimental and genomic data are expanding to complement the predictive methods. However, the existing methods do not effectively utilize and cope with the immense amount of data becoming available. RESULTS Hundreds of thousands of non-coding RNAs have been detected; however, their annotation is lagging behind. Here we present GraphClust2, a comprehensive approach for scalable clustering of RNAs based on sequence and structural similarities. GraphClust2 bridges the gap between high-throughput sequencing and structural RNA analysis and provides an integrative solution by incorporating diverse experimental and genomic data in an accessible manner via the Galaxy framework. GraphClust2 can efficiently cluster and annotate large datasets of RNAs and supports structure-probing data. We demonstrate that the annotation performance of clustering functional RNAs can be considerably improved. Furthermore, an off-the-shelf procedure is introduced for identifying locally conserved structure candidates in long RNAs. We suggest the presence and the sparseness of phylogenetically conserved local structures for a collection of long non-coding RNAs. CONCLUSIONS By clustering data from 2 cross-linking immunoprecipitation experiments, we demonstrate the benefits of GraphClust2 for motif discovery under the presence of biological and methodological biases. Finally, we uncover prominent targets of double-stranded RNA binding protein Roquin-1, such as BCOR's 3' untranslated region that contains multiple binding stem-loops that are evolutionarily conserved.
Affiliation(s)
- Milad Miladi
- Bioinformatics Group, Department of Computer Science, University of Freiburg, Georges-Koehler-Allee 106, 79110 Freiburg, Germany
| | - Eteri Sokhoyan
- Bioinformatics Group, Department of Computer Science, University of Freiburg, Georges-Koehler-Allee 106, 79110 Freiburg, Germany
| | - Torsten Houwaart
- Institute of Medical Microbiology and Hospital Hygiene, University of Dusseldorf, Universitaetsstr. 1, 40225 Dusseldorf, Germany
| | - Steffen Heyne
- Max Planck Institute of Immunobiology and Epigenetics, Freiburg, Stuebeweg 51, 79108 Freiburg, Germany
| | - Fabrizio Costa
- Department of Computer Science, University of Exeter, North Park Road, EX4 4QF Exeter, UK
| | - Björn Grüning
- Bioinformatics Group, Department of Computer Science, University of Freiburg, Georges-Koehler-Allee 106, 79110 Freiburg, Germany
- ZBSA Centre for Biological Systems Analysis, University of Freiburg, Hauptstr. 1, 79104 Freiburg, Germany
| | - Rolf Backofen
- Bioinformatics Group, Department of Computer Science, University of Freiburg, Georges-Koehler-Allee 106, 79110 Freiburg, Germany
- ZBSA Centre for Biological Systems Analysis, University of Freiburg, Hauptstr. 1, 79104 Freiburg, Germany
- Signalling Research Centres BIOSS and CIBSS, University of Freiburg, Schaenzlestr. 18, 79104 Freiburg, Germany
| |
|
45
|
Crum M, Ram-Mohan N, Meyer MM. Regulatory context drives conservation of glycine riboswitch aptamers. PLoS Comput Biol 2019; 15:e1007564. [PMID: 31860665 PMCID: PMC6944388 DOI: 10.1371/journal.pcbi.1007564] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.0] [Received: 09/04/2019] [Revised: 01/06/2020] [Accepted: 11/25/2019] [Indexed: 12/13/2022] Open
Abstract
In comparison to protein coding sequences, the impact of mutation and natural selection on the sequence and function of non-coding (ncRNA) genes is not well understood. Many ncRNA genes are narrowly distributed to only a few organisms, and appear to be rapidly evolving. Compared to protein coding sequences, there are many challenges associated with assessment of ncRNAs that are not well addressed by conventional phylogenetic approaches, including: short sequence length, lack of primary sequence conservation, and the importance of secondary structure for biological function. Riboswitches are structured ncRNAs that directly interact with small molecules to regulate gene expression in bacteria. They typically consist of a ligand-binding domain (aptamer) whose folding changes drive changes in gene expression. The glycine riboswitch is among the most well-studied due to the widespread occurrence of a tandem aptamer arrangement (tandem), wherein two homologous aptamers interact with glycine and each other to regulate gene expression. However, a significant proportion of glycine riboswitches are comprised of single aptamers (singleton). Here we use graph clustering to circumvent the limitations of traditional phylogenetic analysis when studying the relationship between the tandem and singleton glycine aptamers. Graph clustering enables a broader range of pairwise comparison measures to be used to assess aptamer similarity. Using this approach, we show that one aptamer of the tandem glycine riboswitch pair is typically much more highly conserved, and that which aptamer is conserved depends on the regulated gene. Furthermore, our analysis also reveals that singleton aptamers are more similar to either the first or second tandem aptamer, again based on the regulated gene. Taken together, our findings suggest that tandem glycine riboswitches degrade into functional singletons, with the regulated gene(s) dictating which glycine-binding aptamer is conserved.
Affiliation(s)
- Matt Crum
- Department of Biology, Boston College, Chestnut Hill, Massachusetts, United States of America
| | - Nikhil Ram-Mohan
- Department of Biology, Boston College, Chestnut Hill, Massachusetts, United States of America
| | - Michelle M. Meyer
- Department of Biology, Boston College, Chestnut Hill, Massachusetts, United States of America
| |
|
46
|
Nowick K, Walter Costa MB, Höner Zu Siederdissen C, Stadler PF. Selection Pressures on RNA Sequences and Structures. Evol Bioinform Online 2019; 15:1176934319871919. [PMID: 31496634 PMCID: PMC6716170 DOI: 10.1177/1176934319871919] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.7] [Received: 07/21/2019] [Accepted: 07/29/2019] [Indexed: 12/31/2022] Open
Abstract
With the discovery of increasingly more functional noncoding RNAs (ncRNAs), it becomes imperative to consider them more strongly as important players during species evolution. Although tests for negative selection of ncRNAs have existed since the beginning of this century, the SSS-test is the first one for also investigating positive selection. When analyzing selection in ncRNAs, it should be taken into account that selection pressures can independently act on sequence and structure. We applied the SSS-test to explore the evolution of ncRNAs in primates and identified more than 100 long noncoding RNAs (lncRNAs) that might evolve under positive selection in humans. With this test, it is now possible to more thoroughly include ncRNAs into evolutionary studies.
Affiliation(s)
- Katja Nowick
- Human Biology Group, Institute for Biology, Department of Biology, Chemistry, Pharmacy, Freie Universität Berlin, Berlin, Germany
| | | | - Christian Höner Zu Siederdissen
- Bioinformatics Group, Department of Computer Science, and Interdisciplinary Center for Bioinformatics, Universität Leipzig, Leipzig, Germany
| | - Peter F Stadler
- Bioinformatics Group, Department of Computer Science, and Interdisciplinary Center for Bioinformatics, Universität Leipzig, Leipzig, Germany; Max Planck Institute for Mathematics in the Sciences, Leipzig, Germany; Department of Theoretical Chemistry, Universität Wien, Wien, Austria; Facultad de Ciencias, Universidad Nacional de Colombia, Bogotá, Colombia; Santa Fe Institute, Santa Fe, NM, USA
| |
|
47
|
Braun J, Fischer S, Xu ZZ, Sun H, Ghoneim DH, Gimbel AT, Plessmann U, Urlaub H, Mathews DH, Weigand JE. Identification of new high affinity targets for Roquin based on structural conservation. Nucleic Acids Res 2019; 46:12109-12125. [PMID: 30295819 PMCID: PMC6294493 DOI: 10.1093/nar/gky908] [Citation(s) in RCA: 12] [Impact Index Per Article: 2.0] [Received: 06/08/2018] [Accepted: 10/05/2018] [Indexed: 12/13/2022] Open
Abstract
Post-transcriptional gene regulation controls the amount of protein produced from a specific mRNA by altering both its decay and translation rates. Such regulation is primarily achieved by the interaction of trans-acting factors with cis-regulatory elements in the untranslated regions (UTRs) of mRNAs. These interactions are guided either by sequence- or structure-based recognition. Similar to sequence conservation, the evolutionary conservation of a UTR’s structure thus reflects its functional importance. We used such structural conservation to identify previously unknown cis-regulatory elements. Using the RNA folding program Dynalign, we scanned all UTRs of humans and mice for conserved structures. Characterizing a subset of putative conserved structures revealed a binding site of the RNA-binding protein Roquin. Detailed functional characterization in vivo enabled us to redefine the binding preferences of Roquin and identify new target genes. Many of these new targets are unrelated to the established role of Roquin in inflammation and immune responses and thus highlight additional, unstudied cellular functions of this important repressor. Moreover, the expression of several Roquin targets is highly cell-type-specific. In consequence, these targets are difficult to detect using methods dependent on mRNA abundance, yet easily detectable with our unbiased strategy.
Affiliation(s)
- Johannes Braun
- Department of Biology, Technische Universität Darmstadt, Darmstadt 64287, Germany
| | - Sandra Fischer
- Department of Biology, Technische Universität Darmstadt, Darmstadt 64287, Germany
| | - Zhenjiang Z Xu
- Department of Biochemistry and Biophysics and Center for RNA Biology, University of Rochester Medical Center, Rochester, NY 14642, USA
| | - Hongying Sun
- Department of Biochemistry and Biophysics and Center for RNA Biology, University of Rochester Medical Center, Rochester, NY 14642, USA
| | - Dalia H Ghoneim
- Department of Biochemistry and Biophysics and Center for RNA Biology, University of Rochester Medical Center, Rochester, NY 14642, USA
| | - Anna T Gimbel
- Department of Biology, Technische Universität Darmstadt, Darmstadt 64287, Germany
| | - Uwe Plessmann
- Biophysical Mass Spectrometry Group, Max Planck Institute for Biophysical Chemistry, Göttingen 37077, Germany
| | - Henning Urlaub
- Biophysical Mass Spectrometry Group, Max Planck Institute for Biophysical Chemistry, Göttingen 37077, Germany; Bioanalytics, Institute for Clinical Chemistry, University Medical Center, 37073 Göttingen, Germany
| | - David H Mathews
- Department of Biochemistry and Biophysics and Center for RNA Biology, University of Rochester Medical Center, Rochester, NY 14642, USA
| | - Julia E Weigand
- Department of Biology, Technische Universität Darmstadt, Darmstadt 64287, Germany
| |
|
48
|
Patiño-Galindo JÁ, González-Candelas F, Pybus OG. The Effect of RNA Substitution Models on Viroid and RNA Virus Phylogenies. Genome Biol Evol 2019; 10:657-666. [PMID: 29325030 PMCID: PMC5814974 DOI: 10.1093/gbe/evx273] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.2] [Accepted: 01/08/2018] [Indexed: 12/16/2022] Open
Abstract
Many viroids and RNA viruses have genomes that exhibit secondary structure, with paired nucleotides forming stems and loops. Such structures violate a key assumption of most methods of phylogenetic reconstruction, that sequence change is independent among sites. However, phylogenetic analyses of these transmissible agents rarely use evolutionary models that account for RNA secondary structure. Here, we assess the effect of using RNA-specific nucleotide substitution models on the phylogenetic inference of viroids and RNA viruses. We obtained data sets comprising full-genome nucleotide sequences from six viroid and ten single-stranded RNA virus species. For each alignment, we inferred consensus RNA secondary structures, then evaluated different DNA and RNA substitution models. We used model selection to choose the best-fitting model and evaluate estimated Bayesian phylogenies. Further, for each data set we generated and compared Robinson–Foulds (RF) statistics in order to test whether the distributions of trees generated under alternative models are notably different to each other. In all alignments, the best-fitting model was one that considers RNA secondary structure: RNA models that allow a nonzero rate of double substitution (RNA16A and RNA16C) fitted best for both viral and viroid data sets. In 14 of 16 data sets, the use of an RNA-specific model led to significantly longer tree lengths, but only in three cases did it have a significant effect on RFs. In conclusion, using an RNA model when undertaking phylogenetic inference of viroids and RNA viruses can provide a better model fit than standard approaches, and model choice can significantly affect branch length estimates.
Affiliation(s)
- Juan Ángel Patiño-Galindo
- Unidad Mixta Infección y Salud Pública FISABIO-Salud Pública/Universitat de València-I2SysBio, València, Spain; CIBER Epidemiología y Salud Pública, València, Spain
| | - Fernando González-Candelas
- Unidad Mixta Infección y Salud Pública FISABIO-Salud Pública/Universitat de València-I2SysBio, València, Spain; CIBER Epidemiología y Salud Pública, València, Spain
| | - Oliver G Pybus
- Department of Zoology, University of Oxford, United Kingdom
| |
|
49
|
Faiza M, Tanveer K, Fatihi S, Wang Y, Raza K. Comprehensive Overview and Assessment of microRNA Target Prediction Tools in Homo sapiens and Drosophila melanogaster. Curr Bioinform 2019. [DOI: 10.2174/1574893614666190103101033] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.3] [Indexed: 12/12/2022]
Abstract
Background: MicroRNAs (miRNAs) are small non-coding RNAs that control gene expression at the post-transcriptional level through complementary base pairing with the target mRNA, leading to mRNA degradation and blocking of the translation process. Many dysfunctions of these small regulatory molecules have been linked to the development and progression of several diseases. Therefore, it is necessary to reliably predict potential miRNA targets.
Objective: A large number of computational prediction tools have been developed which provide a faster way to find putative miRNA targets, but at the same time, their results are often inconsistent. Hence, finding a reliable, functional miRNA target is still a challenging task. Also, each tool is equipped with different algorithms, and it is difficult for biologists to know which tool is the best choice for their study.
Methods: We analyzed eleven miRNA target predictors on Drosophila melanogaster and Homo sapiens by applying significant empirical methods to evaluate and assess their accuracy and performance using experimentally validated, high-confidence mature miRNAs and their targets. In addition, this paper also describes miRNA target prediction algorithms and discusses common features of frequently used target prediction tools.
Results: The results show that MicroT, microRNA and CoMir are the best performing tools on Drosophila melanogaster, while TargetScan and miRmap perform well for Homo sapiens. The predicted results of each tool were combined in order to improve the performance in both datasets, but no significant improvement was observed in terms of true positives.
Conclusion: The currently available miRNA target prediction tools greatly suffer from a large number of false positives. Therefore, computational prediction of significant targets with high statistical confidence is still an open challenge.
Affiliation(s)
- Muniba Faiza
- School of Food Science and Engineering, South China University of Technology, Guangzhou, 510640, China
| | - Khushnuma Tanveer
- Department of Computer Science, Jamia Millia Islamia, New Delhi-110025, India
| | - Saman Fatihi
- Department of Computer Science, Jamia Millia Islamia, New Delhi-110025, India
| | - Yonghua Wang
- School of Food Science and Engineering, South China University of Technology, Guangzhou, 510640, China
| | - Khalid Raza
- Department of Computer Science, Jamia Millia Islamia, New Delhi-110025, India
| |
|
50
|
Emamjomeh A, Zahiri J, Asadian M, Behmanesh M, Fakheri BA, Mahdevar G. Identification, Prediction and Data Analysis of Noncoding RNAs: A Review. Med Chem 2019; 15:216-230. [PMID: 30484409 DOI: 10.2174/1573406414666181015151610] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.7] [Received: 11/12/2017] [Revised: 06/03/2018] [Accepted: 09/30/2018] [Indexed: 12/13/2022]
Abstract
BACKGROUND Noncoding RNAs (ncRNAs), which play an important role in various cellular processes, are important in medicine as well as in drug design strategies. Different studies have shown that ncRNAs are dysregulated in cancer cells and play an important role in human tumorigenesis. Therefore, it is important to identify and predict such molecules by experimental and computational methods, respectively. However, to avoid expensive experimental methods, computational algorithms have been developed for accurate and fast prediction of ncRNAs. OBJECTIVE The aim of this review was to introduce the experimental and computational methods used to identify ncRNAs and predict their structure. We also briefly explain the roles of ncRNAs in cellular processes and drug design. METHOD In this survey, we introduce ncRNAs and their roles in biological and medicinal processes. Then, some important laboratory techniques for identifying ncRNAs are examined. Finally, the state-of-the-art models and algorithms are introduced along with important tools and databases. RESULTS The results showed that the integration of experimental and computational approaches improves the identification of ncRNAs. Moreover, highly accurate databases, algorithms and tools for predicting ncRNAs were compared. CONCLUSION ncRNA prediction is an exciting research field, but it faces several difficulties. It requires accurate and reliable algorithms and tools. Also, it should be mentioned that the computational costs of such algorithms, including running time and memory usage, are very important. Finally, some suggestions were presented to improve computational methods for ncRNA gene and structure prediction.
Affiliation(s)
- Abbasali Emamjomeh
- Laboratory of Computational Biotechnology and Bioinformatics (CBB), Department of Plant Breeding and Biotechnology (PBB), University of Zabol, Zabol, Iran
| | - Javad Zahiri
- Bioinformatics and Computational Omics Lab (BioCOOL), Department of Biophysics, Faculty of Biological Sciences, Tarbiat Modares University, Tehran, Iran
| | - Mehrdad Asadian
- Department of Plant Breeding and Biotechnology (PBB), Faculty of Agriculture, University of Zabol, Zabol, Iran
| | - Mehrdad Behmanesh
- Department of Genetics, Faculty of Biological Sciences, Tarbiat Modares University, Tehran, Iran
| | - Barat A Fakheri
- Department of Plant Breeding and Biotechnology (PBB), Faculty of Agriculture, University of Zabol, Zabol, Iran
| | - Ghasem Mahdevar
- Department of Mathematics, Faculty of Sciences, University of Isfahan, Isfahan, Iran
| |
|