1
Voß B. Classified Dynamic Programming in RNA Structure Analysis. Methods Mol Biol 2024; 2726:125-141. [PMID: 38780730 DOI: 10.1007/978-1-0716-3519-3_6] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/25/2024]
Abstract
Analysis of the folding space of RNA generally suffers from its exponential size. With classified Dynamic Programming algorithms, it is possible to alleviate this burden and to analyse the folding space of RNA in great depth. Key to classified DP is that the search space is partitioned into classes based on an on-the-fly computed feature. A class-wise evaluation is then used to compute class-wide properties, such as the lowest free energy structure for each class, or aggregate properties, such as the class' probability. In this paper we describe the well-known shape and hishape abstraction of RNA structures, their power to help better understand RNA function and related methods that are based on these abstractions.
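The classified DP idea summarized in this abstract can be illustrated with a toy sketch. It is not the author's implementation and does not use the shape or hishape abstraction: as a simplifying assumption, the class feature here is just the number of base pairs, computed on the fly, and the class-wise evaluation counts the structures in each class. The names `classified_counts`, `PAIRS` and `MIN_LOOP` are hypothetical.

```python
from collections import Counter
from functools import lru_cache

# Assumed pairing rules (Watson-Crick plus G-U wobble) and minimum hairpin loop size.
PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}
MIN_LOOP = 3


def classified_counts(seq):
    """Count nested secondary structures of seq, partitioned by class.

    The class of a structure is its number of base pairs (a stand-in
    for a real abstraction such as shapes).
    """

    @lru_cache(maxsize=None)
    def fold(i, j):
        # Returns a Counter: class -> number of structures on seq[i:j].
        if j - i <= 0:
            return Counter({0: 1})  # empty interval: one structure, zero pairs
        result = Counter()
        # Case 1: position i is unpaired; the class is unchanged.
        for cls, n in fold(i + 1, j).items():
            result[cls] += n
        # Case 2: position i pairs with k; classes of both parts combine.
        for k in range(i + MIN_LOOP + 1, j):
            if (seq[i], seq[k]) in PAIRS:
                inner = fold(i + 1, k)
                outer = fold(k + 1, j)
                for ci, ni in inner.items():
                    for co, no in outer.items():
                        result[ci + co + 1] += ni * no
        return result

    return dict(fold(0, len(seq)))


if __name__ == "__main__":
    print(classified_counts("GGAAACC"))
```

Replacing the per-class count with a per-class minimum energy or a Boltzmann-weighted sum would give the class-wide and aggregate properties mentioned above; an energy model is deliberately left out of this sketch.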
Affiliation(s)
- Björn Voß
  - RNA Biology and Bioinformatics, Institute of Biomedical Genetics, University of Stuttgart, Stuttgart, Germany
2
Tieng FYF, Abdullah-Zawawi MR, Md Shahri NAA, Mohamed-Hussein ZA, Lee LH, Mutalib NSA. A Hitchhiker's guide to RNA-RNA structure and interaction prediction tools. Brief Bioinform 2023; 25:bbad421. [PMID: 38040490 PMCID: PMC10753535 DOI: 10.1093/bib/bbad421] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 07/31/2023] [Revised: 10/16/2023] [Accepted: 10/26/2023] [Indexed: 12/03/2023] Open
Abstract
RNA biology has risen to prominence following the discovery of the diverse functions of noncoding RNA (ncRNA). Most untranslated transcripts exert their regulatory functions by forming RNA-RNA complexes via base pairing with complementary sequences in other RNAs. This interplay between RNAs is essential, as it plays various functional roles in human cells, including genetic translation, RNA splicing, editing, ribosomal RNA maturation, RNA degradation and the regulation of metabolic pathways/riboswitches. Moreover, the pervasive transcription of the human genome allows for the discovery of novel genomic functions via RNA interactome investigation. The advancement of experimental procedures has resulted in an explosion of documented data, necessitating the development of efficient and precise computational tools and algorithms. This review provides an extensive update on RNA-RNA interaction (RRI) analysis via thermodynamic- and comparative-based RNA secondary structure prediction (RSP) and RNA-RNA interaction prediction (RIP) tools and their general functions. We also highlight the current knowledge of RRIs and the limitations of RNA interactome mapping via experimental data. Then, the gap between RSP and RIP, the importance of RNA homologues, the relationship between pseudoknots, and RNA folding thermodynamics are discussed. It is hoped that these emerging prediction tools will deepen the understanding of RNA-associated interactions in human diseases and hasten treatment processes.
Affiliation(s)
- Francis Yew Fu Tieng
  - UKM Medical Molecular Biology Institute (UMBI), Universiti Kebangsaan Malaysia (UKM), Kuala Lumpur 56000, Malaysia
- Nur Alyaa Afifah Md Shahri
  - UKM Medical Molecular Biology Institute (UMBI), Universiti Kebangsaan Malaysia (UKM), Kuala Lumpur 56000, Malaysia
- Zeti-Azura Mohamed-Hussein
  - Institute of Systems Biology (INBIOSIS), UKM, Selangor 43600, Malaysia
  - Department of Applied Physics, Faculty of Science and Technology, UKM, Selangor 43600, Malaysia
- Learn-Han Lee
  - Sunway Microbiomics Centre, School of Medical and Life Sciences, Sunway University, Sunway City 47500, Malaysia
  - Novel Bacteria and Drug Discovery Research Group, Microbiome and Bioresource Research Strength, Jeffrey Cheah School of Medicine and Health Sciences, Monash University of Malaysia, Selangor 47500, Malaysia
- Nurul-Syakima Ab Mutalib
  - UKM Medical Molecular Biology Institute (UMBI), Universiti Kebangsaan Malaysia (UKM), Kuala Lumpur 56000, Malaysia
  - Novel Bacteria and Drug Discovery Research Group, Microbiome and Bioresource Research Strength, Jeffrey Cheah School of Medicine and Health Sciences, Monash University of Malaysia, Selangor 47500, Malaysia
  - Faculty of Health Sciences, UKM, Kuala Lumpur 50300, Malaysia
3
Dupont MJ, Major F. D-ORB: A Web Server to Extract Structural Features of Related But Unaligned RNA Sequences. J Mol Biol 2023; 435:168181. [PMID: 37468182 DOI: 10.1016/j.jmb.2023.168181] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/04/2022] [Revised: 06/02/2023] [Accepted: 06/06/2023] [Indexed: 07/21/2023]
Abstract
Identifying the common structural elements of functionally related RNA sequences (family) is usually based on an alignment of the sequences, which is often subject to human bias and may not be accurate. The resulting covariance model (CM) provides probabilities for each base to covary with another, which lends evolutionary support to the formation of double-helical regions and possibly pseudoknots. The coexistence of alternative folds in RNA, resulting from its dynamic nature, may lead to the potential omission of motifs by the CM. To overcome this limitation, we present D-ORB, a system of algorithms that identifies overrepresented motifs in the secondary conformational landscapes of a family when compared to those of unrelated sequences. The algorithms are bundled into an easy-to-use website allowing users to submit a family, and optionally provide unrelated sequences. D-ORB produces a non-pseudoknotted secondary structure based on the overrepresented motifs, a deep neural network classifier and two decision trees. When used to model an Rfam family, D-ORB fits overrepresented motifs in the corresponding Rfam structure; more than a hundred Rfam families have been modeled. The statistical approach behind D-ORB derives the structural composition of an RNA family, making it a valuable tool for analyzing and modeling it. Its easy-to-use interface and advanced algorithms make it an essential resource for researchers studying RNA structure. D-ORB is available at https://d-orb.major.iric.ca/.
Affiliation(s)
- Mathieu J Dupont
  - Department of Computer Science and Operations Research, and the Institute for Research in Immunology and Cancer, Université de Montréal, Montreal, Quebec H3C 3J7, Canada
- François Major
  - Department of Computer Science and Operations Research, and the Institute for Research in Immunology and Cancer, Université de Montréal, Montreal, Quebec H3C 3J7, Canada. https://twitter.com/francois_major
4
Hollar A, Bursey H, Jabbari H. Pseudoknots in RNA Structure Prediction. Curr Protoc 2023; 3:e661. [PMID: 36779804 DOI: 10.1002/cpz1.661] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 02/14/2023]
Abstract
RNA molecules play active roles in the cell and are important for numerous applications in biotechnology and medicine. The function of an RNA molecule stems from its structure. RNA structure determination is time consuming, challenging, and expensive using experimental methods. Thus, much research has been directed at RNA structure prediction through computational means. Many of these methods focus primarily on the secondary structure of the molecule, ignoring the possibility of pseudoknotted structures. However, pseudoknots are known to play functional roles in many RNA molecules or in their method of interaction with other molecules. Improving the accuracy and efficiency of computational methods that predict pseudoknots is an ongoing challenge for single RNA molecules, RNA-RNA interactions, and RNA-protein interactions. To improve the accuracy of prediction, many methods focus on specific applications while restricting the length and the class of the pseudoknotted structures they can identify. In recent years, computational methods for structure prediction have begun to catch up with the impressive developments seen in biotechnology. Here, we provide a non-comprehensive overview of available pseudoknot prediction methods and their best-use cases. © 2023 Wiley Periodicals LLC.
Affiliation(s)
- Andrew Hollar
  - Department of Computer Science, University of Victoria, Victoria, Canada
- Hunter Bursey
  - Department of Computer Science, University of Victoria, Victoria, Canada
- Hosna Jabbari
  - Department of Computer Science, University of Victoria, Victoria, Canada
5
Yuan S, Gong Y, Wang G, Zhang B, Liu Y, Zhang H. MSFF-CDCGAN: A novel method to predict RNA secondary structure based on Generative Adversarial Network. Methods 2022; 204:368-375. [DOI: 10.1016/j.ymeth.2022.04.004] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/15/2022] [Revised: 04/07/2022] [Accepted: 04/11/2022] [Indexed: 11/25/2022] Open
6
Steger G. Predicting the Structure of a Viroid: Structure, Structure Distribution, Consensus Structure, and Structure Drawing. Methods Mol Biol 2022; 2316:331-371. [PMID: 34845705 DOI: 10.1007/978-1-0716-1464-8_26] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/13/2023]
Abstract
Viroids are small non-coding RNAs that require a special sequence and structure to be replicated and transported by the host machinery. Many of these features can be predicted and later experimentally verified. Here, we will present workflows to predict viroid structures and draw the predicted structures in a pleasing and descriptive way using recently developed software.
Affiliation(s)
- Gerhard Steger
  - Institut für Physikalische Biologie, Heinrich-Heine-Universität Düsseldorf, Düsseldorf, Germany.
7
Bossanyi MA, Carpentier V, Glouzon JPS, Ouangraoua A, Anselmetti Y. aliFreeFoldMulti: alignment-free method to predict secondary structures of multiple RNA homologs. NAR Genom Bioinform 2020; 2:lqaa086. [PMID: 33575631 PMCID: PMC7671329 DOI: 10.1093/nargab/lqaa086] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/01/2020] [Accepted: 10/19/2020] [Indexed: 11/18/2022] Open
Abstract
Predicting RNA structure is crucial for understanding RNA’s mechanism of action. Comparative approaches for the prediction of RNA structures can be classified into four main strategies. The first three (align-and-fold, align-then-fold and fold-then-align) exploit multiple sequence alignments to improve the accuracy of conserved RNA-structure prediction. Align-and-fold methods generally perform better, but are also typically slower than the other alignment-based methods. The fourth strategy, alignment-free, consists in predicting the conserved RNA structure without relying on sequence alignment. This strategy has the advantage of being the fastest, while predicting accurate structures through the use of latent representations of the candidate structures for each sequence. This paper presents aliFreeFoldMulti, an extension of the aliFreeFold algorithm. This algorithm predicts a representative secondary structure of multiple RNA homologs by using a vector representation of their suboptimal structures. aliFreeFoldMulti improves on aliFreeFold by additionally computing the conserved structure for each sequence. aliFreeFoldMulti is assessed by comparing its prediction performance and time efficiency with a set of leading RNA-structure prediction methods. aliFreeFoldMulti has the lowest computing times and the highest maximum accuracy scores. It achieves average structure prediction accuracy comparable to that of other methods, except TurboFoldII, which is the best in terms of average accuracy but has the highest computing times. We present aliFreeFoldMulti as an illustration of the potential of alignment-free approaches to provide fast and accurate RNA-structure prediction methods.
Affiliation(s)
- Marc-André Bossanyi
  - CoBIUS lab, Department of Computer Science, University of Sherbrooke, 2500 Boulevard de l’Université, Sherbrooke, QC J1K 2R1, Canada
- Valentin Carpentier
  - CoBIUS lab, Department of Computer Science, University of Sherbrooke, 2500 Boulevard de l’Université, Sherbrooke, QC J1K 2R1, Canada
- Jean-Pierre S Glouzon
  - CoBIUS lab, Department of Computer Science, University of Sherbrooke, 2500 Boulevard de l’Université, Sherbrooke, QC J1K 2R1, Canada
- Aïda Ouangraoua
  - CoBIUS lab, Department of Computer Science, University of Sherbrooke, 2500 Boulevard de l’Université, Sherbrooke, QC J1K 2R1, Canada
- Yoann Anselmetti
  - CoBIUS lab, Department of Computer Science, University of Sherbrooke, 2500 Boulevard de l’Université, Sherbrooke, QC J1K 2R1, Canada
8
RNAdemocracy: an ensemble method for RNA secondary structure prediction using consensus scoring. Comput Biol Chem 2019; 83:107151. [DOI: 10.1016/j.compbiolchem.2019.107151] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/30/2018] [Revised: 06/05/2019] [Accepted: 10/15/2019] [Indexed: 11/18/2022]
9
Glouzon JPS, Ouangraoua A. aliFreeFold: an alignment-free approach to predict secondary structure from homologous RNA sequences. Bioinformatics 2019; 34:i70-i78. [PMID: 29949960 PMCID: PMC6022685 DOI: 10.1093/bioinformatics/bty234] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.7] [Indexed: 12/25/2022] Open
Abstract
Motivation: Predicting the conserved secondary structure of homologous ribonucleic acid (RNA) sequences is crucial for understanding RNA functions. However, fast and accurate RNA structure prediction is challenging, especially when the number and the divergence of homologous RNA increase. To address this challenge, we propose aliFreeFold, based on a novel alignment-free approach which computes a representative structure from a set of homologous RNA sequences using sub-optimal secondary structures generated for each sequence. It is based on a vector representation of sub-optimal structures capturing structure conservation signals by weighting structural motifs according to their conservation across the sub-optimal structures. Results: We demonstrate that aliFreeFold provides a good balance between speed and accuracy regarding predictions of representative structures for sets of homologous RNA compared to traditional methods based on sequence and structure alignment. We show that aliFreeFold is capable of uncovering conserved structural features quickly and effectively thanks to its weighting scheme that gives more (resp. less) importance to common (resp. uncommon) structural motifs. The weighting scheme is also shown to be capable of capturing conservation signal as the number of homologous RNA increases. These results demonstrate the ability of aliFreeFold to efficiently and accurately provide interesting structural representatives of RNA families. Availability and implementation: aliFreeFold was implemented in C++. Source code and Linux binary are freely available at https://github.com/UdeS-CoBIUS/aliFreeFold. Supplementary information: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Aïda Ouangraoua
  - Department of Computer Science, University of Sherbrooke, Sherbrooke, QC, Canada
10
Löwes B, Chauve C, Ponty Y, Giegerich R. The BRaliBase dent - a tale of benchmark design and interpretation. Brief Bioinform 2017; 18:306-311. [PMID: 26984616 PMCID: PMC5444242 DOI: 10.1093/bib/bbw022] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.9] [Received: 12/31/2015] [Indexed: 11/25/2022] Open
Abstract
BRaliBase is a widely used benchmark for assessing the accuracy of RNA secondary structure alignment methods. In most case studies based on the BRaliBase benchmark, one can observe a puzzling drop in accuracy in the 40–60% sequence identity range, the so-called ‘BRaliBase Dent’. In this article, we show this dent is owing to a bias in the composition of the BRaliBase benchmark, namely the inclusion of a disproportionate number of transfer RNAs, which exhibit a conserved secondary structure. Our analysis, aside from its interest regarding the specific case of the BRaliBase benchmark, also raises important questions regarding the design and use of benchmarks in computational biology.
Affiliation(s)
- Benedikt Löwes
  - Division of Cardiology, University of Nebraska Medical Center, USA
- Cedric Chauve
  - Department of Mathematics, Simon Fraser University, Burnaby, BC, Canada
- Yann Ponty
  - LIX, CNRS/Inria AMIB, Ecole Polytechnique, Palaiseau, France
- Robert Giegerich
  - Institute for Bioinformatics, Bielefeld University, Bielefeld, Germany
11
Kunz M, Wolf B, Schulze H, Atlan D, Walles T, Walles H, Dandekar T. Non-Coding RNAs in Lung Cancer: Contribution of Bioinformatics Analysis to the Development of Non-Invasive Diagnostic Tools. Genes (Basel) 2016; 8:E8. [PMID: 28035947 PMCID: PMC5295003 DOI: 10.3390/genes8010008] [Citation(s) in RCA: 19] [Impact Index Per Article: 2.1] [Received: 10/16/2016] [Revised: 12/05/2016] [Accepted: 12/15/2016] [Indexed: 01/11/2023] Open
Abstract
Lung cancer is currently the leading cause of cancer-related mortality due to late diagnosis and limited treatment intervention. Non-coding RNAs are not translated into proteins and have emerged as fundamental regulators of gene expression. Recent studies reported that microRNAs and long non-coding RNAs are involved in lung cancer development and progression. Moreover, they appear to be promising new non-invasive biomarkers for early lung cancer diagnosis. Here, we highlight their potential as biomarkers in lung cancer and present how bioinformatics can contribute to the development of non-invasive diagnostic tools. For this, we discuss several bioinformatics algorithms and software tools for a comprehensive understanding and functional characterization of microRNAs and long non-coding RNAs.
Affiliation(s)
- Meik Kunz
  - Functional Genomics and Systems Biology Group, Department of Bioinformatics, Biocenter, University of Wuerzburg, 97074 Wuerzburg, Germany.
- Beat Wolf
  - Functional Genomics and Systems Biology Group, Department of Bioinformatics, Biocenter, University of Wuerzburg, 97074 Wuerzburg, Germany.
  - University of Applied Sciences and Arts of Western Switzerland, Perolles 80, 1700 Fribourg, Switzerland.
- Harald Schulze
  - Institute of Experimental Biomedicine, University Hospital Wuerzburg, 97080 Wuerzburg, Germany.
- David Atlan
  - Phenosystems SA, 137 Rue de Tubize, 1440 Braine le Château, Belgium.
- Thorsten Walles
  - Department of Cardiothoracic Surgery, University Hospital of Wuerzburg, 97080 Wuerzburg, Germany.
- Heike Walles
  - Department of Tissue Engineering and Regenerative Medicine, University Hospital Wuerzburg, Roentgenring 11, 97070 Wuerzburg, Germany.
  - Translational Center Wuerzburg "Regenerative therapies in oncology and musculoskeletal disease" Wuerzburg branch of the Fraunhofer Institute Interfacial Engineering and Biotechnology (IGB), Roentgenring 11, 97070 Wuerzburg, Germany.
- Thomas Dandekar
  - Functional Genomics and Systems Biology Group, Department of Bioinformatics, Biocenter, University of Wuerzburg, 97074 Wuerzburg, Germany.
  - BioComputing Unit, European Molecular Biology Laboratory (EMBL) Heidelberg, Meyerhofstraße 1, 69117 Heidelberg, Germany.
12
Abstract
Deciphering the folding pathways and predicting the structures of complex three-dimensional biomolecules is central to elucidating biological function. RNA is single-stranded, which gives it the freedom to fold into complex secondary and tertiary structures. These structures endow RNA with the ability to perform complex chemistries and functions ranging from enzymatic activity to gene regulation. Given that RNA is involved in many essential cellular processes, it is critical to understand how it folds and functions in vivo. Within the last few years, methods have been developed to probe RNA structures in vivo and genome-wide. These studies reveal that RNA often adopts very different structures in vivo and in vitro, and provide profound insights into RNA biology. Nonetheless, both in vitro and in vivo approaches have limitations: studies in the complex and uncontrolled cellular environment make it difficult to obtain insight into RNA folding pathways and thermodynamics, and studies in vitro often lack direct cellular relevance, leaving a gap in our knowledge of RNA folding in vivo. This gap is being bridged by biophysical and mechanistic studies of RNA structure and function under conditions that mimic the cellular environment. To date, most artificial cytoplasms have used various polymers as molecular crowding agents and a series of small molecules as cosolutes. Studies under such in vivo-like conditions are yielding fresh insights, such as cooperative folding of functional RNAs and increased activity of ribozymes. These observations are accounted for in part by molecular crowding effects and interactions with other molecules. In this review, we report milestones in RNA folding in vitro and in vivo and discuss ongoing experimental and computational efforts to bridge the gap between these two conditions in order to understand how RNA folds in the cell.
13
Barquist L, Burge SW, Gardner PP. Studying RNA Homology and Conservation with Infernal: From Single Sequences to RNA Families. Curr Protoc Bioinformatics 2016; 54:12.13.1-12.13.25. [PMID: 27322404 PMCID: PMC5010141 DOI: 10.1002/cpbi.4] [Citation(s) in RCA: 17] [Impact Index Per Article: 1.9] [Indexed: 11/09/2022]
Abstract
Emerging high-throughput technologies have led to a deluge of putative non-coding RNA (ncRNA) sequences identified in a wide variety of organisms. Systematic characterization of these transcripts will be a tremendous challenge. Homology detection is critical to making maximal use of functional information gathered about ncRNAs: identifying homologous sequence allows us to transfer information gathered in one organism to another quickly and with a high degree of confidence. ncRNA presents a challenge for homology detection, as the primary sequence is often poorly conserved and de novo secondary structure prediction and search remain difficult. This unit introduces methods developed by the Rfam database for identifying "families" of homologous ncRNAs starting from single "seed" sequences, using manually curated sequence alignments to build powerful statistical models of sequence and structure conservation known as covariance models (CMs), implemented in the Infernal software package. We provide a step-by-step iterative protocol for identifying ncRNA homologs and then constructing an alignment and corresponding CM. We also work through an example for the bacterial small RNA MicA, discovering a previously unreported family of divergent MicA homologs in genus Xenorhabdus in the process. © 2016 by John Wiley & Sons, Inc.
Affiliation(s)
- Lars Barquist
  - Institute for Molecular Infection Biology, University of Würzburg, Würzburg, D-97080 Germany
  - Wellcome Trust Sanger Institute, Hinxton, Cambridge, CB10 1SA United Kingdom; Fax: +44 (0)1223 494919
- Sarah W. Burge
  - Wellcome Trust Sanger Institute, Hinxton, Cambridge, CB10 1SA United Kingdom; Fax: +44 (0)1223 494919
- Paul P. Gardner
  - School of Biological Sciences, University of Canterbury, Private Bag 4800, Christchurch, New Zealand
  - Biomolecular Interaction Centre, University of Canterbury, Private Bag 4800, Christchurch, New Zealand
14
Lorenz R, Wolfinger MT, Tanzer A, Hofacker IL. Predicting RNA secondary structures from sequence and probing data. Methods 2016; 103:86-98. [PMID: 27064083 DOI: 10.1016/j.ymeth.2016.04.004] [Citation(s) in RCA: 69] [Impact Index Per Article: 7.7] [Received: 12/07/2015] [Revised: 03/29/2016] [Accepted: 04/04/2016] [Indexed: 01/08/2023] Open
Abstract
RNA secondary structures have proven essential for understanding the regulatory functions performed by RNA such as microRNAs, bacterial small RNAs, or riboswitches. This success is in part due to the availability of efficient computational methods for predicting RNA secondary structures. Recent advances focus on dealing with the inherent uncertainty of prediction by considering the ensemble of possible structures rather than the single most stable one. Moreover, the advent of high-throughput structural probing has spurred the development of computational methods that incorporate such experimental data as auxiliary information.
Affiliation(s)
- Ronny Lorenz
  - University of Vienna, Faculty of Chemistry, Department of Theoretical Chemistry, Währingerstrasse 17, 1090 Vienna, Austria.
- Michael T Wolfinger
  - University of Vienna, Faculty of Chemistry, Department of Theoretical Chemistry, Währingerstrasse 17, 1090 Vienna, Austria; Medical University of Vienna, Center for Anatomy and Cell Biology, Währingerstraße 13, 1090 Vienna, Austria.
- Andrea Tanzer
  - University of Vienna, Faculty of Chemistry, Department of Theoretical Chemistry, Währingerstrasse 17, 1090 Vienna, Austria.
- Ivo L Hofacker
  - University of Vienna, Faculty of Chemistry, Department of Theoretical Chemistry, Währingerstrasse 17, 1090 Vienna, Austria; University of Vienna, Faculty of Computer Science, Research Group Bioinformatics and Computational Biology, Währingerstr. 29, 1090 Vienna, Austria.
15
Chatzou M, Magis C, Chang JM, Kemena C, Bussotti G, Erb I, Notredame C. Multiple sequence alignment modeling: methods and applications. Brief Bioinform 2015; 17:1009-1023. [PMID: 26615024 DOI: 10.1093/bib/bbv099] [Citation(s) in RCA: 98] [Impact Index Per Article: 9.8] [Received: 09/10/2015] [Revised: 10/16/2015] [Indexed: 12/20/2022] Open
Abstract
This review provides an overview of the development of multiple sequence alignment (MSA) methods and their main applications. It is focused on progress made over the past decade. The first three sections review recent algorithmic developments for protein, RNA/DNA and genomic alignments. The fourth section deals with benchmarks and explores the relationship between empirical and simulated data, along with the impact on method developments. The last part of the review gives an overview of available MSA local reliability estimators and their dependence on various algorithmic properties of available methods.
16
Li Y, Zhong C, Zhang S. Finding consensus stable local optimal structures for aligned RNA sequences and its application to discovering riboswitch elements. Int J Bioinform Res Appl 2015; 10:498-518. [PMID: 24989865 DOI: 10.1504/ijbra.2014.062997] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Indexed: 11/21/2022]
Abstract
Many non-coding RNAs (ncRNAs) can fold into alternate native structures and perform different biological functions. The computational prediction of an ncRNA's alternate native structures can be conducted by analysing the ncRNA's energy landscape. Previously, we developed a computational approach, RNASLOpt, to predict alternate native structures for a single RNA. In this paper, in order to improve the accuracy of the prediction, we incorporate structural conservation information among a family of related ncRNA sequences into the prediction. We propose a comparative approach, RNAConSLOpt, to produce all possible consensus SLOpt stack configurations that are conserved on the consensus energy landscape of a family of related ncRNAs. Benchmarking tests show that RNAConSLOpt can reduce the number of candidate structures compared with RNASLOpt, and can predict ncRNAs' alternate native structures accurately. Moreover, an application of the proposed pipeline to bacteria of the genus Bacillus has discovered several novel riboswitch candidates.
Affiliation(s)
- Yuan Li
  - Department of Electrical Engineering and Computer Science, University of Central Florida, Orlando, Florida 32816, USA
- Cuncong Zhong
  - Department of Electrical Engineering and Computer Science, University of Central Florida, Orlando, Florida 32816, USA
- Shaojie Zhang
  - Department of Electrical Engineering and Computer Science, University of Central Florida, Orlando, Florida 32816, USA
17
Kunz M, Xiao K, Liang C, Viereck J, Pachel C, Frantz S, Thum T, Dandekar T. Bioinformatics of cardiovascular miRNA biology. J Mol Cell Cardiol 2014; 89:3-10. [PMID: 25486579 DOI: 10.1016/j.yjmcc.2014.11.027] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.4] [Received: 09/03/2014] [Revised: 11/05/2014] [Accepted: 11/29/2014] [Indexed: 12/16/2022]
Abstract
MicroRNAs (miRNAs) are small ~22 nucleotide non-coding RNAs and are highly conserved among species. Moreover, miRNAs regulate gene expression of a large number of genes associated with important biological functions and signaling pathways. Recently, several miRNAs have been found to be associated with cardiovascular diseases. Thus, investigating the complex regulatory effect of miRNAs may lead to a better understanding of their functional role in the heart. To achieve this, bioinformatics approaches have to be coupled with validation and screening experiments to understand the complex interactions of miRNAs with the genome. This will boost the subsequent development of diagnostic markers and our understanding of the physiological and therapeutic role of miRNAs in cardiac remodeling. In this review, we focus on and explain different bioinformatics strategies and algorithms for the identification and analysis of miRNAs and their regulatory elements to better understand cardiac miRNA biology. Starting with the biogenesis of miRNAs, we present approaches such as LocARNA and miRBase for combining sequence and structure analysis including phylogenetic comparisons as well as detailed analysis of RNA folding patterns, functional target prediction, signaling pathway as well as functional analysis. We also show how far bioinformatics helps to tackle the unprecedented level of complexity and systemic effects by miRNA, underlining the strong therapeutic potential of miRNA and miRNA target structures in cardiovascular disease. In addition, we discuss drawbacks and limitations of bioinformatics algorithms and the necessity of experimental approaches for miRNA target identification. This article is part of a Special Issue entitled 'Non-coding RNAs'.
Affiliation(s)
- Meik Kunz
- Functional Genomics and Systems Biology Group, Department of Bioinformatics, Biocenter, Würzburg, Germany; Institute for Molecular and Translational Therapeutic Strategies (IMTTS), Hannover Medical School, Hannover, Germany
- Ke Xiao
- Institute for Molecular and Translational Therapeutic Strategies (IMTTS), Hannover Medical School, Hannover, Germany; Plant Breeding Institute, Christian-Albrechts-University of Kiel, Olshausenstr. 40, 24098 Kiel, Germany
- Chunguang Liang
- Functional Genomics and Systems Biology Group, Department of Bioinformatics, Biocenter, Würzburg, Germany
- Janika Viereck
- Institute for Molecular and Translational Therapeutic Strategies (IMTTS), Hannover Medical School, Hannover, Germany
- Christina Pachel
- Department of Internal Medicine I, University Hospital Würzburg, Germany and Comprehensive Heart Failure Center, University of Würzburg, Germany
- Stefan Frantz
- Department of Internal Medicine I, University Hospital Würzburg, Germany and Comprehensive Heart Failure Center, University of Würzburg, Germany
- Thomas Thum
- Institute for Molecular and Translational Therapeutic Strategies (IMTTS), Hannover Medical School, Hannover, Germany; Excellence Cluster REBIRTH, Hannover Medical School, Hannover, Germany; National Heart and Lung Institute, Imperial College London, London, UK
- Thomas Dandekar
- Functional Genomics and Systems Biology Group, Department of Bioinformatics, Biocenter, Würzburg, Germany; EMBL Heidelberg, BioComputing Unit, Meyerhofstraße 1, 69117 Heidelberg, Germany.
|
18
|
Abstract
MOTIVATION Abstract shape analysis, first proposed in 2004, allows one to extract several relevant structures from the folding space of an RNA sequence, which is preferable to focusing on a single structure of minimal free energy. We report recent extensions to this approach. RESULTS We have rebuilt the original RNAshapes as a repository of components that allows us to integrate several established tools for RNA structure analysis: RNAshapes, RNAalishapes and pknotsRG, including its recent extension pKiss. As a spin-off, we obtain heretofore unavailable functionality: e.g. with pKiss, we can now perform abstract shape analysis for structures holding pseudoknots up to the complexity of kissing hairpin motifs. The new tool pAliKiss can predict kissing hairpin motifs from aligned sequences. Along with the integration, the functionality of the tools was also extended in manifold ways. AVAILABILITY AND IMPLEMENTATION As before, the tool is available on the Bielefeld Bioinformatics server at http://bibiserv.cebitec.uni-bielefeld.de/rnashapesstudio. CONTACT bibi-help@cebitec.uni-bielefeld.de.
Affiliation(s)
- Stefan Janssen
- Practical Computer Science, Faculty of Technology, Bielefeld University, D-33615 Bielefeld, Germany
- Robert Giegerich
- Practical Computer Science, Faculty of Technology, Bielefeld University, D-33615 Bielefeld, Germany
|
19
|
Abstract
Abstract shape analysis is a method to learn more about the complete Boltzmann ensemble of the secondary structures of a single RNA molecule. Abstract shapes classify competing secondary structures into classes that are defined by their arrangement of helices. It allows us to compute, in addition to the structure of minimal free energy, a set of structures that represents relevant and interesting structural alternatives. Furthermore, it allows us to compute probabilities of all structures within a shape class. This lets us ensure that our representative subset covers the complete Boltzmann ensemble, except for a portion of negligible probability. This chapter explains the main functions of abstract shape analysis, as implemented in the tool RNAshapes. It reports on some other types of analysis that are based on the abstract shapes idea and shows how you can solve novel problems by creating your own shape abstractions.
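As an illustration of the idea in this abstract, the following minimal sketch maps dot-bracket structures to a helix-nesting shape string and sums Boltzmann weights per shape class. It is not the RNAshapes implementation: the function names `shape_of` and `shape_probabilities` are hypothetical, the abstraction only approximates the most abstract shape level (unpaired bases ignored, helices interrupted by bulges or internal loops merged), and the probabilities are normalized over the structures supplied rather than over the full folding space.

```python
import math
from collections import defaultdict

# Gas constant in kcal/(mol*K) times 37 degrees Celsius in kelvin.
RT = 0.0019872 * 310.15


def shape_of(dot_bracket: str) -> str:
    """Map a dot-bracket string to a helix-nesting shape such as '[[][]]'."""
    stack = [[]]  # one list of child shapes per currently open base pair
    for symbol in dot_bracket:
        if symbol == "(":
            stack.append([])
        elif symbol == ")":
            if len(stack) == 1:
                raise ValueError("unbalanced ')'")
            children = stack.pop()
            if len(children) == 1:
                # Stacked pair, bulge or internal loop: merge with inner helix.
                node = children[0]
            else:
                # Hairpin (no children) or multiloop (several children).
                node = "[" + "".join(children) + "]"
            stack[-1].append(node)
        elif symbol != ".":
            raise ValueError(f"unexpected symbol {symbol!r}")
    if len(stack) != 1:
        raise ValueError("unbalanced '('")
    return "".join(stack[0]) or "_"  # '_' denotes the unpaired structure


def shape_probabilities(structures):
    """Sum Boltzmann weights per shape for (dot_bracket, energy) pairs.

    Energies are assumed to be in kcal/mol; normalization uses only the
    structures given, which is an assumption of this sketch.
    """
    weights = defaultdict(float)
    for dot_bracket, energy in structures:
        weights[shape_of(dot_bracket)] += math.exp(-energy / RT)
    partition = sum(weights.values())
    return {shape: weight / partition for shape, weight in weights.items()}
```

For example, `shape_of("((.((...))..))")` and `shape_of("((..))")` fall into the same class, so their Boltzmann weights would be added together when computing the class probability.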
|
20
|
Andersen ES. The art of editing RNA structural alignments. Methods Mol Biol 2014; 1097:379-394. [PMID: 24639168 DOI: 10.1007/978-1-62703-709-9_17] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/03/2023]
Abstract
Manual editing of RNA structural alignments may be considered more art than science, since it still requires an expert biologist to take multiple levels of information into account and be slightly creative when constructing high-quality alignments. Even though the task is rather tedious, it is rewarded by great insight into the evolution of structure and function of your favorite RNA molecule. In this chapter I will review the methods and considerations that go into constructing RNA structural alignments at the secondary and tertiary structure level; introduce software, databases, and algorithms that have proven useful in semiautomating the work process; and suggest future directions towards full automatization.
|
21
|
Bussotti G, Notredame C, Enright AJ. Detecting and comparing non-coding RNAs in the high-throughput era. Int J Mol Sci 2013; 14:15423-58. [PMID: 23887659 PMCID: PMC3759867 DOI: 10.3390/ijms140815423] [Citation(s) in RCA: 20] [Impact Index Per Article: 1.7] [Received: 05/31/2013] [Revised: 07/16/2013] [Accepted: 07/17/2013] [Indexed: 02/07/2023] Open
Abstract
In recent years there has been a growing interest in the field of non-coding RNA. This surge is a direct consequence of the discovery of a huge number of new non-coding genes and of the finding that many of these transcripts are involved in key cellular functions. In this context, accurately detecting and comparing RNA sequences has become important. Aligning nucleotide sequences is a key requisite when searching for homologous genes. Accurate alignments reveal evolutionary relationships, conserved regions and more generally any biologically relevant pattern. Comparing RNA molecules is, however, a challenging task. The nucleotide alphabet is simpler and therefore less informative than that of amino-acids. Moreover for many non-coding RNAs, evolution is likely to be mostly constrained at the structural level and not at the sequence level. This results in very poor sequence conservation impeding comparison of these molecules. These difficulties define a context where new methods are urgently needed in order to exploit experimental results to their full potential. This review focuses on the comparative genomics of non-coding RNAs in the context of new sequencing technologies and especially dealing with two extremely important and timely research aspects: the development of new methods to align RNAs and the analysis of high-throughput data.
Affiliation(s)
- Giovanni Bussotti
- European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- Cedric Notredame
- Bioinformatics and Genomics Program, Centre for Genomic Regulation (CRG), Aiguader, 88, 08003 Barcelona, Spain
- Anton J. Enright
- European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, UK
|
22
|
Theil Have C, Zambach S, Christiansen H. Effects of using coding potential, sequence conservation and mRNA structure conservation for predicting pyrrolysine containing genes. BMC Bioinformatics 2013; 14:118. [PMID: 23557142 PMCID: PMC3639795 DOI: 10.1186/1471-2105-14-118] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.3] [Received: 07/11/2012] [Accepted: 03/19/2013] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND Pyrrolysine (the 22nd amino acid) is, in certain organisms and under certain circumstances, encoded by the amber stop codon, UAG. The circumstances driving pyrrolysine translation are not well understood. The involvement of a predicted mRNA structure in the region downstream of UAG has been suggested, but the structure does not seem to be present in all pyrrolysine-incorporating genes. RESULTS We propose a strategy to predict pyrrolysine-encoding genes in genomes of archaea and bacteria. We cluster open reading frames interrupted by the amber codon based on sequence similarity. We rank these clusters according to several features that may influence pyrrolysine translation. The ranking effects of different features are assessed and we propose a weighted combination of these features which best explains the currently known pyrrolysine-incorporating genes. We devote special attention to the effect of structural conservation and provide further evidence that structural conservation may be influential but is not a necessary factor. Finally, from the weighted ranking, we identify a number of potentially pyrrolysine-incorporating genes. CONCLUSIONS We propose a method for prediction of pyrrolysine-incorporating genes in genomes of bacteria and archaea, leading to insights about the factors driving pyrrolysine translation and identification of new gene candidates. The method predicts known conserved genes with high recall and predicts several other promising candidates for experimental verification. The method is implemented as a computational pipeline which is available on request.
Affiliation(s)
- Christian Theil Have
- Research group PLIS: Programming, Logic and Intelligent Systems, Department of Communication, Business and Information Technologies, Roskilde University, P.O. Box 260, Roskilde, DK-4000, Denmark.
|
23
|
Achawanantakun R, Sun Y. Shape and secondary structure prediction for ncRNAs including pseudoknots based on linear SVM. BMC Bioinformatics 2013; 14 Suppl 2:S1. [PMID: 23369147 PMCID: PMC3549817 DOI: 10.1186/1471-2105-14-s2-s1] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.4] [Indexed: 11/10/2022] Open
Abstract
Background Accurate secondary structure prediction provides important information for understanding the tertiary structures and thus the functions of ncRNAs. However, the accuracy of the native structure derivation of ncRNAs is still not satisfactory, especially on sequences containing pseudoknots. It has recently been shown that using abstract shapes, which retain adjacency and nesting of structural features but disregard the length details of helix and loop regions, can improve the performance of structure prediction. In this work, we use SVM-based feature selection to derive the consensus abstract shape of homologous ncRNAs and apply the predicted shape to structure prediction including pseudoknots. Results Our approach was applied to predict shapes and secondary structures on hundreds of ncRNA data sets with and without pseudoknots. The experimental results show that we can achieve 18% higher accuracy in shape prediction than the state-of-the-art consensus shape prediction tools. Using predicted shapes in structure prediction allows us to achieve approximately 29% higher sensitivity and 10% higher positive predictive value than other pseudoknot prediction tools. Conclusions Extensive analysis of RNA properties based on SVM allows us to identify important properties of sequences and structures related to their shapes. The combination of mass data analysis and SVM-based feature selection makes our approach a promising method for shape and structure prediction. The implemented tools, KnotShape and KnotStructure, are open source software and can be downloaded at: http://www.cse.msu.edu/~achawana/KnotShape.
Affiliation(s)
- Rujira Achawanantakun
- Department of Computer Science and Engineering, Michigan State University, Michigan, USA
|
24
|
Abstract
The purpose of this section is to detail methods for the computational prediction of RNA secondary structure. This protocol is intended to provide an easy entry into the field of RNA structure prediction for those wishing to utilize it in their research and to suggest 'best practices' for going from sequence to secondary structure depending on the available data.
Affiliation(s)
- Walter N Moss
- Department of Chemistry, University of Rochester, Rochester, NY, USA.
|
25
|
Huang J, Backofen R, Voß B. Abstract folding space analysis based on helices. RNA (NEW YORK, N.Y.) 2012; 18:2135-2147. [PMID: 23104999 PMCID: PMC3504666 DOI: 10.1261/rna.033548.112] [Citation(s) in RCA: 12] [Impact Index Per Article: 0.9] [Received: 03/30/2012] [Accepted: 09/24/2012] [Indexed: 06/01/2023]
Abstract
RNA has many pivotal functions especially in the regulation of gene expression by ncRNAs. Identification of their structure is an important requirement for understanding their function. Structure prediction alone is often insufficient for this task, due to algorithmic problems, parameter inaccuracies, and biological peculiarities. Among the latter, there are base modifications, cotranscriptional folding leading to folding traps, and conformational switching as in the case of riboswitches. All these require more in-depth analysis of the folding space. The major drawback, which all methods have to cope with, is the exponential growth of the folding space. Therefore, methods are often limited in the sequence length they can analyze, or they make use of heuristics, sampling, or abstraction. Our approach adopts the abstraction strategy and remedies some problems of existing methods. We introduce a position-specific abstraction based on helices that we term helix index shapes, or hishapes for short. Utilizing a dynamic programming framework, we have implemented this abstraction in the program RNAHeliCes. Furthermore, we developed two hishape-based methods, one for energy barrier estimation, called HiPath, and one for abstract structure comparison, termed HiTed. We demonstrate the superior performance of HiPath compared to other existing methods and the competitive accuracy of HiTed. RNAHeliCes, together with HiPath and HiTed, are available for download at http://www.cyanolab.de/software/RNAHeliCes.htm.
Affiliation(s)
- Jiabin Huang
- Genetics & Experimental Bioinformatics, Faculty of Biology, University of Freiburg, Freiburg 79104, Germany
- Rolf Backofen
- Chair for Bioinformatics, Faculty of Technology, University of Freiburg, Freiburg 79110, Germany
- Björn Voß
- Genetics & Experimental Bioinformatics, Faculty of Biology, University of Freiburg, Freiburg 79104, Germany
|
26
|
Computational prediction and experimental verification of miRNAs in Panicum miliaceum L. SCIENCE CHINA-LIFE SCIENCES 2012; 55:807-17. [DOI: 10.1007/s11427-012-4367-y] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.5] [Received: 01/02/2012] [Accepted: 07/12/2012] [Indexed: 10/27/2022]
|
27
|
Conservation and Occurrence of Trans-Encoded sRNAs in the Rhizobiales. Genes (Basel) 2011; 2:925-56. [PMID: 24710299 PMCID: PMC3927594 DOI: 10.3390/genes2040925] [Citation(s) in RCA: 18] [Impact Index Per Article: 1.3] [Received: 08/31/2011] [Revised: 10/24/2011] [Accepted: 10/26/2011] [Indexed: 12/13/2022] Open
Abstract
Post-transcriptional regulation by trans-encoded sRNAs, for example via base-pairing with target mRNAs, is a common feature in bacteria and influences various cell processes, e.g., response to stress factors. Several studies based on computational and RNA-seq approaches identified approximately 180 trans-encoded sRNAs in Sinorhizobium meliloti. The initial point of this report is a set of 52 trans-encoded sRNAs derived from the former studies. Sequence homology combined with structural conservation analyses were applied to elucidate the occurrence and distribution of conserved trans-encoded sRNAs in the order of Rhizobiales. This approach resulted in 39 RNA family models (RFMs) which showed various taxonomic distribution patterns. Whereas the majority of RFMs was restricted to Sinorhizobium species or the Rhizobiaceae, members of a few RFMs were more widely distributed in the Rhizobiales. Access to this data is provided via the RhizoGATE portal [1,2].
|
28
|
Lerman YV, Kennedy SD, Shankar N, Parisien M, Major F, Turner DH. NMR structure of a 4 x 4 nucleotide RNA internal loop from an R2 retrotransposon: identification of a three purine-purine sheared pair motif and comparison to MC-SYM predictions. RNA (NEW YORK, N.Y.) 2011; 17:1664-77. [PMID: 21778280 PMCID: PMC3162332 DOI: 10.1261/rna.2641911] [Citation(s) in RCA: 18] [Impact Index Per Article: 1.3] [Received: 01/24/2011] [Accepted: 05/08/2011] [Indexed: 05/31/2023]
Abstract
The NMR solution structure is reported of a duplex, 5'GUGAAGCCCGU/3'UCACAGGAGGC, containing a 4 × 4 nucleotide internal loop from an R2 retrotransposon RNA. The loop contains three sheared purine-purine pairs and reveals a structural element found in other RNAs, which we refer to as the 3RRs motif. Optical melting measurements of the thermodynamics of the duplex indicate that the internal loop is 1.6 kcal/mol more stable at 37°C than predicted. The results identify the 3RRs motif as a common structural element that can facilitate prediction of 3D structure. Known examples include internal loops having the pairings: 5'GAA/3'AGG, 5'GAG/3'AGG, 5'GAA/3'AAG, and 5'AAG/3'AGG. The structural information is compared with predictions made with the MC-Sym program.
Affiliation(s)
- Yelena V. Lerman
- Department of Chemistry, University of Rochester, Rochester, New York 14627, USA
- Scott D. Kennedy
- Department of Biochemistry and Biophysics, University of Rochester School of Medicine and Dentistry, Rochester, New York 14642, USA
- Neelaabh Shankar
- Department of Biochemistry and Biophysics, University of Rochester School of Medicine and Dentistry, Rochester, New York 14642, USA
- Marc Parisien
- Department of Computer Science and Operations Research, University of Montreal, Montreal, Quebec H3C CJ7, Canada
- Francois Major
- Department of Computer Science and Operations Research, University of Montreal, Montreal, Quebec H3C CJ7, Canada
- Douglas H. Turner
- Department of Chemistry, University of Rochester, Rochester, New York 14627, USA
|
29
|
Scharff LB, Childs L, Walther D, Bock R. Local absence of secondary structure permits translation of mRNAs that lack ribosome-binding sites. PLoS Genet 2011; 7:e1002155. [PMID: 21731509 PMCID: PMC3121790 DOI: 10.1371/journal.pgen.1002155] [Citation(s) in RCA: 84] [Impact Index Per Article: 6.0] [Received: 03/03/2011] [Accepted: 05/05/2011] [Indexed: 02/05/2023] Open
Abstract
The initiation of translation is a fundamental and highly regulated process in gene expression. Translation initiation in prokaryotic systems usually requires interaction between the ribosome and an mRNA sequence upstream of the initiation codon, the so-called ribosome-binding site (Shine-Dalgarno sequence). However, a large number of genes do not possess Shine-Dalgarno sequences, and it is unknown how start codon recognition occurs in these mRNAs. We have performed genome-wide searches in various groups of prokaryotes in order to identify sequence elements and/or RNA secondary structural motifs that could mediate translation initiation in mRNAs lacking Shine-Dalgarno sequences. We find that mRNAs without a Shine-Dalgarno sequence are generally less structured in their translation initiation region and show a minimum of mRNA folding at the start codon. Using reporter gene constructs in bacteria, we also provide experimental support for local RNA unfoldedness determining start codon recognition in Shine-Dalgarno–independent translation. Consistent with this, we show that AUG start codons reside in single-stranded regions, whereas internal AUG codons are usually in structured regions of the mRNA. Taken together, our bioinformatics analyses and experimental data suggest that local absence of RNA secondary structure is necessary and sufficient to initiate Shine-Dalgarno–independent translation. Thus, our results provide a plausible mechanism for how the correct translation initiation site is recognized in the absence of a ribosome-binding site.
Protein biosynthesis (translation) is a highly regulated process in gene expression. In all organisms, initiation of translation depends on molecular recognition of the messenger RNA by ribosomes. In prokaryotes (bacteria, mitochondria, and chloroplasts), this recognition is mediated by a specific sequence motif in the 5′ untranslated region of the mRNA, called “ribosome-binding site” or “Shine-Dalgarno sequence.” However, many messenger RNAs lack Shine-Dalgarno sequences, and it is currently unknown how the correct translation initiation site is recognized in these mRNAs. Here, we provide insights into the mechanism of translation initiation in the absence of a ribosome-binding site. We have performed genome-wide searches for Shine-Dalgarno–independent translation in bacterial and organellar genomes and report that a large fraction of transcripts is translated in a Shine-Dalgarno–independent manner in all prokaryotic systems. We find that Shine-Dalgarno–independent translation initiation is strongly correlated with the presence of a local minimum in RNA secondary structure around the translational start codon. The significance of RNA unfoldedness as the key determinant of start codon recognition in Shine-Dalgarno–independent translation initiation was confirmed experimentally by employing reporter gene fusions in the bacterium Escherichia coli. In conclusion, our work suggests an intriguing mechanism for translation initiation on mRNAs that lack a ribosome-binding site.
Affiliation(s)
- Lars B Scharff
- Max-Planck-Institut für Molekulare Pflanzenphysiologie, Potsdam-Golm, Germany
|
30
|
Meyer F, Kurtz S, Backofen R, Will S, Beckstette M. Structator: fast index-based search for RNA sequence-structure patterns. BMC Bioinformatics 2011; 12:214. [PMID: 21619640 PMCID: PMC3154205 DOI: 10.1186/1471-2105-12-214] [Citation(s) in RCA: 23] [Impact Index Per Article: 1.6] [Received: 12/10/2010] [Accepted: 05/27/2011] [Indexed: 12/26/2022] Open
Abstract
BACKGROUND The secondary structure of RNA molecules is intimately related to their function and often more conserved than the sequence. Hence, the important task of searching databases for RNAs requires matching sequence-structure patterns. Unfortunately, current tools for this task have, in the best case, a running time that is only linear in the size of sequence databases. Furthermore, established index data structures for fast sequence matching, like suffix trees or arrays, cannot benefit from the complementarity constraints introduced by the secondary structure of RNAs. RESULTS We present a novel method and readily applicable software for time-efficient matching of RNA sequence-structure patterns in sequence databases. Our approach is based on affix arrays, a recently introduced index data structure, preprocessed from the target database. Affix arrays support bidirectional pattern search, which is required for efficiently handling the structural constraints of the pattern. Structural patterns like stem-loops can be matched inside out, such that the loop region is matched first and then the pairing bases on the boundaries are matched consecutively. This allows base pairing information to be exploited for search space reduction and leads to an expected running time that is sublinear in the size of the sequence database. The incorporation of a new chaining approach in the search of RNA sequence-structure patterns enables the description of molecules folding into complex secondary structures with multiple ordered patterns. The chaining approach removes spurious matches from the set of intermediate results, in particular of patterns with little specificity. In benchmark experiments on the Rfam database, our method runs up to two orders of magnitude faster than previous methods. CONCLUSIONS The presented method's sublinear expected running time makes it well suited for RNA sequence-structure pattern matching in large sequence databases. RNA molecules containing several stem-loop substructures can be described by multiple sequence-structure patterns and their matches are efficiently handled by a novel chaining method. Beyond our algorithmic contributions, we provide with Structator a complete and robust open-source software solution for index-based search of RNA sequence-structure patterns. The Structator software is available at http://www.zbh.uni-hamburg.de/Structator.
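The inside-out matching order described in this abstract can be illustrated with a small sketch. Note the substitution: instead of an affix array index it uses a plain linear scan with `str.find`, so it shows only the matching order (loop first, then pairs outward), not the sublinear running time. The function name `match_stem_loop`, its parameters, and the set of allowed pairs (Watson-Crick plus G-U wobble) are assumptions made for illustration and are not taken from Structator.

```python
# Allowed base pairs: Watson-Crick plus G-U wobble (an assumption of this sketch).
PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}


def match_stem_loop(seq: str, loop: str, stem_len: int):
    """Find stem-loops with a fixed loop sequence and `stem_len` closing pairs.

    Returns half-open (start, end) spans into `seq`. The loop is located
    first; the stem is then verified pair by pair, moving outward.
    """
    hits = []
    pos = seq.find(loop)
    while pos != -1:
        left, right = pos - 1, pos + len(loop)
        paired = 0
        # Extend in both directions while the flanking bases can pair.
        while (
            paired < stem_len
            and left >= 0
            and right < len(seq)
            and (seq[left], seq[right]) in PAIRS
        ):
            left -= 1
            right += 1
            paired += 1
        if paired == stem_len:
            hits.append((left + 1, right))
        pos = seq.find(loop, pos + 1)
    return hits
```

A call such as `match_stem_loop("AAGGGCGAAAGCCCUU", "GAAA", 3)` locates the loop at position 6 and then checks the three flanking pairs, rejecting a candidate as soon as one pair fails. Early rejection of this kind is what lets an index-based search prune its search space.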
Affiliation(s)
- Fernando Meyer
- Center for Bioinformatics, University of Hamburg, Bundesstrasse 43, 20146 Hamburg, Germany
|
31
|
Harmanci AO, Sharma G, Mathews DH. TurboFold: iterative probabilistic estimation of secondary structures for multiple RNA sequences. BMC Bioinformatics 2011; 12:108. [PMID: 21507242 PMCID: PMC3120699 DOI: 10.1186/1471-2105-12-108] [Citation(s) in RCA: 69] [Impact Index Per Article: 4.9] [Received: 09/03/2010] [Accepted: 04/20/2011] [Indexed: 01/07/2023] Open
Abstract
Background The prediction of secondary structure, i.e. the set of canonical base pairs between nucleotides, is a first step in developing an understanding of the function of an RNA sequence. The most accurate computational methods predict conserved structures for a set of homologous RNA sequences. These methods usually suffer from high computational complexity. In this paper, TurboFold, a novel and efficient method for secondary structure prediction for multiple RNA sequences, is presented. Results TurboFold takes, as input, a set of homologous RNA sequences and outputs estimates of the base pairing probabilities for each sequence. The base pairing probabilities for a sequence are estimated by combining intrinsic information, derived from the sequence itself via the nearest neighbor thermodynamic model, with extrinsic information, derived from the other sequences in the input set. For a given sequence, the extrinsic information is computed by using pairwise-sequence-alignment-based probabilities for co-incidence with each of the other sequences, along with estimated base pairing probabilities, from the previous iteration, for the other sequences. The extrinsic information is introduced as free energy modifications for base pairing in a partition function computation based on the nearest neighbor thermodynamic model. This process yields updated estimates of base pairing probability. The updated base pairing probabilities in turn are used to recompute extrinsic information, resulting in the overall iterative estimation procedure that defines TurboFold. TurboFold is benchmarked on a number of ncRNA datasets and compared against alternative secondary structure prediction methods. The iterative procedure in TurboFold is shown to improve estimates of base pairing probability with each iteration, though only small gains are obtained beyond three iterations. Secondary structures composed of base pairs with estimated probabilities higher than a significance threshold are shown to be more accurate for TurboFold than for alternative methods that estimate base pairing probabilities. TurboFold-MEA, which uses base pairing probabilities from TurboFold in a maximum expected accuracy algorithm for secondary structure prediction, has accuracy comparable to the best performing secondary structure prediction methods. The computational and memory requirements for TurboFold are modest and, in terms of sequence length and number of sequences, scale much more favorably than joint alignment and folding algorithms. Conclusions TurboFold is an iterative probabilistic method for predicting secondary structures for multiple RNA sequences that efficiently and accurately combines the information from the comparative analysis between sequences with the thermodynamic folding model. Unlike most other multi-sequence structure prediction methods, TurboFold does not enforce strict commonality of structures and is therefore useful for predicting structures for homologous sequences that have diverged significantly. TurboFold can be downloaded as part of the RNAstructure package at http://rna.urmc.rochester.edu.
Affiliation(s)
- Arif O Harmanci
- Department of Electrical and Computer Engineering, University of Rochester, Rochester, NY, USA
|
32
|
Abstract
MOTIVATION RNA family models group nucleotide sequences that share a common biological function. These models can be used to find new sequences belonging to the same family. To succeed in this task, a model needs to exhibit high sensitivity as well as high specificity. As model construction is guided by a manual process, a number of problems can occur, such as the introduction of more than one model for the same family or poorly constructed models. We explore the Rfam database to discover such problems. RESULTS Our main contribution is in the definition of the discriminatory power of RNA family models, together with a first algorithm for its computation. In addition, we present calculations across the whole Rfam database that show several families lacking high specificity when compared to other families. We give a list of these clusters of families and provide a tentative explanation. Our program can be used to make sure that new models (i) are not equivalent to any model already present in the database and (ii) are not simply submodels of existing families. AVAILABILITY www.tbi.univie.ac.at/software/cmcompare/. The code is licensed under the GPLv3. Results for the whole Rfam database and supporting scripts are available together with the software.
|
33
|
Mathews DH, Moss WN, Turner DH. Folding and finding RNA secondary structure. Cold Spring Harb Perspect Biol 2010; 2:a003665. [PMID: 20685845 DOI: 10.1101/cshperspect.a003665] [Citation(s) in RCA: 106] [Impact Index Per Article: 7.1] [Indexed: 11/24/2022]
Abstract
Optimal exploitation of the expanding database of sequences requires rapid finding and folding of RNAs. Methods are reviewed that automate folding and discovery of RNAs with algorithms that couple thermodynamics with chemical mapping, NMR, and/or sequence comparison. New functional noncoding RNAs in genome sequences can be found by combining sequence comparison with the assumption that functional noncoding RNAs will have more favorable folding free energies than other RNAs. When a new RNA is discovered, experiments and sequence comparison can restrict folding space so that secondary structure can be rapidly determined with the help of predicted free energies. In turn, secondary structure restricts folding in three dimensions, which allows modeling of three-dimensional structure. An example from a domain of a retrotransposon is described. Discovery of new RNAs and their structures will provide insights into evolution, biology, and design of therapeutics. Applications to studies of evolution are also reviewed.
Affiliation(s)
- David H Mathews
- Department of Biochemistry and Biophysics and Center for RNA Biology, University of Rochester School of Medicine and Dentistry, Rochester, New York 14642, USA
|
34
|
|
35
|
Du J, Wu Y, Fang X, Cao J, Zhao L, Tao S. Prediction of sorghum miRNAs and their targets with computational methods. ACTA ACUST UNITED AC 2010. [DOI: 10.1007/s11434-010-0035-4] [Citation(s) in RCA: 16] [Impact Index Per Article: 1.1] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/30/2022]
|
36
|
Bremges A, Schirmer S, Giegerich R. Fine-tuning structural RNA alignments in the twilight zone. BMC Bioinformatics 2010; 11:222. [PMID: 20433706 PMCID: PMC2876130 DOI: 10.1186/1471-2105-11-222] [Citation(s) in RCA: 10] [Impact Index Per Article: 0.7] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/04/2009] [Accepted: 04/30/2010] [Indexed: 11/25/2022] Open
Abstract
Background A widely used method to find conserved secondary structure in RNA is to first construct a multiple sequence alignment, and then fold the alignment, optimizing a score based on thermodynamics and covariance. This method works best around 75% sequence similarity. However, in a "twilight zone" below 55% similarity, the sequence alignment tends to obscure the covariance signal used in the second phase. Therefore, while the overall shape of the consensus structure may still be found, the degree of conservation cannot be estimated reliably. Results Based on a combination of available methods, we present a method named planACstar for improving structure conservation in structural alignments in the twilight zone. After constructing a consensus structure by alignment folding, planACstar abandons the original sequence alignment, refolds the sequences individually, but consistent with the consensus, aligns the structures, irrespective of sequence, by a pure structure alignment method, and derives an improved sequence alignment from the alignment of structures, to be re-submitted to alignment folding, etc. This circle may be iterated as long as structural conservation improves, but normally, one step suffices. Conclusions Employing the tools ClustalW, RNAalifold, and RNAforester, we find that for sequences with 30-55% sequence identity, structural conservation can be improved by 10% on average, with a large variation, measured in terms of RNAalifold's own criterion, the structure conservation index.
Affiliation(s)
- Andreas Bremges
- Faculty of Technology, Bielefeld University, 33615 Bielefeld, Germany
|
37
|
Abstract
Motivation: Abstract shape analysis allows efficient computation of a representative sample of low-energy foldings of an RNA molecule. More comprehensive information is obtained by computing shape probabilities, accumulating the Boltzmann probabilities of all structures within each abstract shape. Such information is superior to free energies because it is independent of sequence length and base composition. However, up to this point, computation of shape probabilities evaluates all shapes simultaneously and comes with a computation cost which is exponential in the length of the sequence. Results: We devise an approach called RapidShapes that computes the shapes above a specified probability threshold T by generating a list of promising shapes and constructing specialized folding programs for each shape to compute its share of Boltzmann probability. This aims at a heuristic improvement of runtime, while still computing exact probability values. Conclusion: Evaluating this approach and several substrategies, we find that only a small proportion of shapes have to be actually computed. For an RNA sequence of length 400, this leads, depending on the threshold, to a 10–138 fold speed-up compared with the previous complete method. Thus, probabilistic shape analysis has become feasible in medium-scale applications, such as the screening of RNA transcripts in a bacterial genome. Availability: RapidShapes is available via http://bibiserv.cebitec.uni-bielefeld.de/rnashapes Contact: robert@techfak.uni-bielefeld.de Supplementary information: Supplementary data are available at Bioinformatics online.
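The idea of a shape probability described in this abstract, summing the Boltzmann weights of all structures that share an abstract shape, can be illustrated with a minimal sketch. It assumes the structures have already been enumerated and labelled with a shape string and a free energy in kcal/mol, which is a brute-force setting and not the RapidShapes method itself. The function names, the toy data, and the value of RT (about 0.616 kcal/mol at 37 °C) are assumptions made for illustration.

```python
import math
from collections import defaultdict

# Assumed thermal energy RT in kcal/mol at 37 degrees C (310.15 K).
RT = 0.0019872 * 310.15


def shape_probabilities(structures, rt=RT):
    """Accumulate Boltzmann weights per shape and normalise them.

    `structures` is an iterable of (shape, free_energy) pairs, where
    free_energy is in kcal/mol. Returns a dict shape -> probability.
    """
    weights = defaultdict(float)
    for shape, energy in structures:
        # Boltzmann weight of a single structure.
        weights[shape] += math.exp(-energy / rt)
    partition = sum(weights.values())
    return {shape: w / partition for shape, w in weights.items()}


def shapes_above_threshold(probabilities, threshold):
    """Keep only the shapes whose probability reaches the threshold T."""
    return {s: p for s, p in probabilities.items() if p >= threshold}


# Hypothetical toy ensemble: (shape string, free energy in kcal/mol).
toy_ensemble = [("[]", -3.0), ("[]", -2.5), ("[][]", -1.0), ("_", 0.0)]
toy_probs = shape_probabilities(toy_ensemble)
```

Because every structure contributes to exactly one shape, the resulting probabilities sum to one; filtering with `shapes_above_threshold` mirrors the threshold T mentioned in the abstract, although here it is applied only after all shapes have been evaluated.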
Affiliation(s)
- Stefan Janssen
- Practical Computer Science, Faculty of Technology, Bielefeld University, D-33615 Bielefeld, Germany
|
38
|
Simultaneous alignment and folding of 28S rRNA sequences uncovers phylogenetic signal in structure variation. Mol Phylogenet Evol 2009; 53:758-71. [DOI: 10.1016/j.ympev.2009.07.033] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.6] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/22/2009] [Revised: 07/22/2009] [Accepted: 07/28/2009] [Indexed: 11/21/2022]
|
39
|
Bernhart SH, Hofacker IL. From consensus structure prediction to RNA gene finding. BRIEFINGS IN FUNCTIONAL GENOMICS AND PROTEOMICS 2009; 8:461-71. [PMID: 19833701 DOI: 10.1093/bfgp/elp043] [Citation(s) in RCA: 25] [Impact Index Per Article: 1.6] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 11/14/2022]
Abstract
Reliable structure prediction is a prerequisite for most types of bioinformatical analysis of RNA. Since the accuracy of structure prediction from single sequences is limited, one often resorts to computing the consensus structure for a set of related RNA sequences. Since functionally important RNA structures are expected to evolve much more slowly than the underlying sequences, the pattern of sequence (co-)variation can be exploited to dramatically improve structure prediction. Since a conserved common structure is only expected when the RNA structure is under selective pressure, consensus structure prediction also provides an ideal starting point for the de novo detection of structured non-coding RNAs. Here, we review different strategies for the prediction of consensus secondary structures, and show how these approaches can be used to predict non-coding RNA genes.
Affiliation(s)
- Stephan H Bernhart
- Department of Theoretical Chemistry, University of Vienna, Währingerstrasse 17, A-1090 Wien, Austria.
|
40
|
Reeder J, Giegerich R. RNA secondary structure analysis using the RNAshapes package. ACTA ACUST UNITED AC 2009; Chapter 12:12.8.1-12.8.17. [PMID: 19496058 DOI: 10.1002/0471250953.bi1208s26] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 01/29/2023]
Abstract
This unit shows how to use the RNAshapes package for the prediction of the secondary structure of a single RNA sequence using either minimum free energy methods or weighted ensemble information. It also includes a protocol for the consensus prediction of a set of related sequences.
|
41
|
Nebel ME, Scheid A. On quantitative effects of RNA shape abstraction. Theory Biosci 2009; 128:211-25. [PMID: 19756808 DOI: 10.1007/s12064-009-0074-z] [Citation(s) in RCA: 13] [Impact Index Per Article: 0.8] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/13/2009] [Accepted: 08/07/2009] [Indexed: 11/26/2022]
Abstract
Over the last few decades, much effort has been taken to develop approaches for identifying good predictions of RNA secondary structure. This is due to the fact that most computational prediction methods based on free energy minimization compute a number of suboptimal foldings and we have to identify the native folding among all these possible secondary structures. Using the abstract shapes approach as introduced by Giegerich et al. (Nucleic Acids Res 32(16):4843-4851, 2004), each class of similar secondary structures is represented by one shape and the native structures can be found among the top shape representatives. In this article, we derive some interesting results answering enumeration problems for abstract shapes and secondary structures of RNA. We compute precise asymptotics for the number of different shape representations of size n and for the number of different shapes showing up when abstracting from secondary structures of size n under a combinatorial point of view. A more realistic model taking primary structures into account remains an open challenge. We give some arguments why the present techniques cannot be applied in this case.
Affiliation(s)
- Markus E Nebel
- Fachbereich Informatik, Technische Universität Kaiserslautern, Gottlieb-Daimler-Strasse 48, 67663 Kaiserslautern, Germany.
|
42
|
Stocsits RR, Letsch H, Hertel J, Misof B, Stadler PF. Accurate and efficient reconstruction of deep phylogenies from structured RNAs. Nucleic Acids Res 2009; 37:6184-93. [PMID: 19723687 PMCID: PMC2764418 DOI: 10.1093/nar/gkp600] [Citation(s) in RCA: 84] [Impact Index Per Article: 5.3] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/25/2022] Open
Abstract
Ribosomal RNA (rRNA) genes are probably the most frequently used data source in phylogenetic reconstruction. Individual columns of rRNA alignments are not independent as a consequence of their highly conserved secondary structures. Unless explicitly taken into account, these correlations can distort the phylogenetic signal and/or lead to gross overestimates of tree stability. Maximum likelihood and Bayesian approaches are of course amenable to using RNA-specific substitution models that treat conserved base pairs appropriately, but require accurate secondary structure models as input. So far, however, no accurate and easy-to-use tool has been available for computing structure-aware alignments and consensus structures that can deal with the large rRNAs. The RNAsalsa approach is designed to fill this gap. Capitalizing on the improved accuracy of pairwise consensus structures and informed by a priori knowledge of group-specific structural constraints, the tool provides both alignments and consensus structures that are of sufficient accuracy for routine phylogenetic analysis based on RNA-specific substitution models. The power of the approach is demonstrated using two rRNA data sets: a mitochondrial rRNA set of 26 Mammalia, and a collection of 28S nuclear rRNAs representative of the five major echinoderm groups.
|
43
|
Harmanci AO, Sharma G, Mathews DH. Stochastic sampling of the RNA structural alignment space. Nucleic Acids Res 2009; 37:4063-75. [PMID: 19429694 PMCID: PMC2709569 DOI: 10.1093/nar/gkp276] [Citation(s) in RCA: 14] [Impact Index Per Article: 0.9] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/13/2022] Open
Abstract
A novel method is presented for predicting the common secondary structures and alignment of two homologous RNA sequences by sampling the 'structural alignment' space, i.e. the joint space of their alignments and common secondary structures. The structural alignment space is sampled according to a pseudo-Boltzmann distribution based on a pseudo-free energy change that combines base pairing probabilities from a thermodynamic model and alignment probabilities from a hidden Markov model. By virtue of the implicit comparative analysis between the two sequences, the method offers an improvement over single sequence sampling of the Boltzmann ensemble. A cluster analysis shows that the samples obtained from joint sampling of the structural alignment space cluster more closely than samples generated by the single sequence method. On average, the representative (centroid) structure and alignment of the most populated cluster in the sample of structures and alignments generated by joint sampling are more accurate than single sequence sampling and alignment based on sequence alone, respectively. The 'best' centroid structure that is closest to the known structure among all the centroids is, on average, more accurate than structure predictions of other methods. Additionally, cluster analysis identifies, on average, a few clusters, whose centroids can be presented as alternative candidates. The source code for the proposed method can be downloaded at http://rna.urmc.rochester.edu.
Affiliation(s)
- Arif Ozgun Harmanci
- Department of Electrical and Computer Engineering, University of Rochester, Hopeman 204, RC Box 270126, Rochester, NY 14627, USA
|
44
|
Smit S, Knight R, Heringa J. RNA structure prediction from evolutionary patterns of nucleotide composition. Nucleic Acids Res 2009; 37:1378-86. [PMID: 19129237 PMCID: PMC2655677 DOI: 10.1093/nar/gkn987] [Citation(s) in RCA: 13] [Impact Index Per Article: 0.8] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/23/2022] Open
Abstract
Structural elements in RNA molecules have a distinct nucleotide composition, which changes gradually over evolutionary time. We discovered certain features of these compositional patterns that are shared between all RNA families. Based on this information, we developed a structure prediction method that evaluates candidate structures for a set of homologous RNAs on their ability to reproduce the patterns exhibited by biological structures. The method is named SPuNC for ‘Structure Prediction using Nucleotide Composition’. In a performance test on a diverse set of RNA families we demonstrate that the SPuNC algorithm succeeds in selecting the most realistic structures in an ensemble. The average accuracy of top-scoring structures is significantly higher than the average accuracy of all ensemble members (improvements of more than 20% observed). In addition, a consensus structure that includes the most reliable base pairs gleaned from a set of top-scoring structures is generally more accurate than a consensus derived from the full structural ensemble. Our method achieves better accuracy than existing methods on several RNA families, including novel riboswitches and ribozymes. The results clearly show that nucleotide composition can be used to reveal the quality of RNA structures and thus the presented technique should be added to the set of prediction tools.
Affiliation(s)
- S Smit
- Centre for Integrative Bioinformatics VU (IBIVU), Vrije Universiteit, 1081 HV Amsterdam, The Netherlands.
|
45
|
Mimouni NK, Lyngsø RB, Griffiths-Jones S, Hein J. An analysis of structural influences on selection in RNA genes. Mol Biol Evol 2009; 26:209-16. [PMID: 18948299 DOI: 10.1093/molbev/msn240] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.4] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 05/31/2025] Open
Abstract
Noncoding RNAs (ncRNAs) are transcripts that do not code for protein but rather function as RNA in catalytic, regulatory, or structural roles in the cell. ncRNAs are involved in universally conserved biological processes, including protein synthesis and gene regulation, and have more specific roles, such as in X-chromosome inactivation in eutherian mammals. In this paper, we propose and investigate a hypothesis for patterns of sequence selection in structurally conserved ncRNAs. Previous attempts at defining RNA selection compared rates of evolution between paired and unpaired bases with largely inconclusive results. Our approach focuses only on paired bases in ncRNAs with conserved structure. By analogy to the different properties of codon positions based on the genetic code, we use a well-developed energy model for RNA structure to classify stem positions into structural classes and argue that they are under different selective constraints. We validate the hypothesis on several RNA families and use simulated data to verify the evolutionary origin of signals. Our class labeling is shown to be a better model of ncRNA evolution than the tradition of treating stem positions equally. As well as providing a better understanding of RNA evolution, the evolutionary footprint we identify can easily be incorporated into gene finders to improve their specificity.
Affiliation(s)
- Naila K Mimouni
- Department of Statistics, University of Oxford, Oxford, United Kingdom
|
46
|
|
47
|
Bernhart SH, Hofacker IL, Will S, Gruber AR, Stadler PF. RNAalifold: improved consensus structure prediction for RNA alignments. BMC Bioinformatics 2008; 9:474. [PMID: 19014431 PMCID: PMC2621365 DOI: 10.1186/1471-2105-9-474] [Citation(s) in RCA: 424] [Impact Index Per Article: 24.9] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/05/2008] [Accepted: 11/11/2008] [Indexed: 11/17/2022] Open
Abstract
Background The prediction of a consensus structure for a set of related RNAs is an important first step for subsequent analyses. RNAalifold, which computes the minimum energy structure that is simultaneously formed by a set of aligned sequences, is one of the oldest and most widely used tools for this task. In recent years, several alternative approaches have been advocated, pointing to several shortcomings of the original RNAalifold approach. Results We show that the accuracy of RNAalifold predictions can be improved substantially by introducing a different, more rational handling of alignment gaps, and by replacing the rather simplistic model of covariance scoring with more sophisticated RIBOSUM-like scoring matrices. These improvements are achieved without compromising the computational efficiency of the algorithm. We show here that the new version of RNAalifold not only outperforms the old one, but also several other tools recently developed, on different datasets. Conclusion The new version of RNAalifold not only can replace the old one for almost any application but it is also competitive with other approaches including those based on SCFGs, maximum expected accuracy, or hierarchical nearest neighbor classifiers.
Affiliation(s)
- Stephan H Bernhart
- Department of Computer Science, University of Leipzig, Leipzig, Germany.
|
48
|
Abraham M, Dror O, Nussinov R, Wolfson HJ. Analysis and classification of RNA tertiary structures. RNA (NEW YORK, N.Y.) 2008; 14:2274-89. [PMID: 18824509 PMCID: PMC2578864 DOI: 10.1261/rna.853208] [Citation(s) in RCA: 37] [Impact Index Per Article: 2.2] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Subscribe] [Scholar Register] [Received: 10/10/2007] [Accepted: 07/05/2008] [Indexed: 05/19/2023]
Abstract
There is a fast growing interest in noncoding RNA transcripts. These transcripts are not translated into proteins, but play essential roles in many cellular and pathological processes. Recent efforts toward comprehension of their function have led to a substantial increase in both the number and the size of solved RNA structures. With the aim of addressing questions relating to RNA structural diversity, we examined RNA conservation at three structural levels: primary, secondary, and tertiary structure. Additionally, we developed an automated method for classifying RNA structures based on spatial (three-dimensional [3D]) similarity. Applying the method to all solved RNA structures resulted in a classified database of RNA tertiary structures (DARTS). DARTS embodies 1333 solved RNA structures classified into 94 clusters. The classification is hierarchical, reflecting the structural relationship between and within clusters. We also developed an application for searching DARTS with a new structure. The search is fast and its performance was successfully tested on all solved RNA structures since the creation of DARTS. A user-friendly interface for both the database and the search application is available online. We show intracluster and intercluster similarities in DARTS and demonstrate the usefulness of the search application. The analysis reveals the current structural repertoire of RNA and exposes common global folds and local tertiary motifs. Further study of these conserved substructures may suggest possible RNA domains and building blocks. This should be beneficial for structure prediction and for gaining insights into structure-function relationships.
Affiliation(s)
- Mira Abraham
- School of Computer Science, Raymond and Beverly Sackler Faculty of Exact Sciences, Tel Aviv University, Tel Aviv 69978, Israel
|
49
|
Brenneis M, Hering O, Lange C, Soppa J. Experimental characterization of Cis-acting elements important for translation and transcription in halophilic archaea. PLoS Genet 2008; 3:e229. [PMID: 18159946 PMCID: PMC2151090 DOI: 10.1371/journal.pgen.0030229] [Citation(s) in RCA: 116] [Impact Index Per Article: 6.8] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/08/2007] [Accepted: 11/08/2007] [Indexed: 02/01/2023] Open
Abstract
The basal transcription apparatus of archaea is well characterized. However, much less is known about the mechanisms of transcription termination and translation initiation. Recently, experimental determination of the 5′-ends of ten transcripts from Pyrobaculum aerophilum revealed that these are devoid of a 5′-UTR. Bioinformatic analysis indicated that many transcripts of other archaeal species might also be leaderless. The 5′-ends and 3′-ends of 40 transcripts of two haloarchaeal species, Halobacterium salinarum and Haloferax volcanii, have been determined. They were used to characterize the lengths of 5′-UTRs and 3′-UTRs and to deduce consensus sequence-elements for transcription and translation. The experimental approach was complemented with a bioinformatics analysis of the H. salinarum genome sequence. Furthermore, the influence of selected 5′-UTRs and 3′-UTRs on transcript stability and translational efficiency in vivo was characterized using a newly established reporter gene system, gene fusions, and real-time PCR. Consensus sequences for basal promoter elements could be refined and a novel element was discovered. A consensus motif probably important for transcriptional termination was established. All 40 haloarchaeal transcripts analyzed had a 3′-UTR (average size 57 nt), and their 3′-ends were not posttranscriptionally modified. Experimental data and genome analyses revealed that the majority of haloarchaeal transcripts are leaderless, indicating that this is the predominant mode for translation initiation in haloarchaea. Surprisingly, the 5′-UTRs of most leadered transcripts did not contain a Shine-Dalgarno (SD) sequence. A genome analysis indicated that less than 10% of all genes are preceded by a SD sequence and even most proximal genes in operons lack a SD sequence. Seven different leadered transcripts devoid of a SD sequence were efficiently translated in vivo, including artificial 5′-UTRs of random sequences.
Thus, an interaction of the 5′-UTRs of these leadered transcripts with the 16S rRNA could be excluded. Taken together, either a scanning mechanism similar to the mechanism of translation initiation operating in eukaryotes or a novel mechanism must operate on most leadered haloarchaeal transcripts. Expression of the information encoded in the genome of an organism into its phenotype involves transcription of the DNA into messenger RNAs and translation of mRNAs into proteins. The textbook view is that an mRNA consists of an untranslated region (5′-UTR), an open reading frame encoding the protein, and another untranslated region (3′-UTR). We have determined the 5′-ends and the 3′-ends of 40 mRNAs of two haloarchaeal species and used this dataset to gain information about nucleotide elements important for transcription and translation. Two thirds of the mRNAs were devoid of a 5′-UTR, and therefore the major pathway for translation initiation in haloarchaea involves so-called leaderless transcripts. Very unexpectedly, most leadered mRNAs were found to be devoid of a sequence motif believed to be essential for translation initiation in bacteria and archaea (Shine-Dalgarno sequence). A bioinformatic genome analysis revealed that less than 10% of the genes contain a Shine-Dalgarno sequence. mRNAs lacking this motif were efficiently translated in vivo, including mRNAs with artificial 5′-UTRs of total random sequence. Thus, translation initiation on these mRNAs either involves a scanning mechanism similar to the mechanism operating in eukaryotes or a totally novel mechanism operating at least in haloarchaea.
Affiliation(s)
- Mariam Brenneis
- Institute for Molecular Biosciences, Goethe-University, Frankfurt, Germany
|
50
|
Abstract
We present an easy-to-use webserver that makes it possible to simultaneously use a number of state of the art methods for performing multiple alignment and secondary structure prediction for noncoding RNA sequences. This makes it possible to use the programs without having to download the code and get the programs to run. The results of all the programs are presented on a webpage and can easily be downloaded for further analysis. Additional measures are calculated for each program to make it easier to judge the individual predictions, and a consensus prediction taking all the programs into account is also calculated. This website is free and open to all users and there is no login requirement. The webserver can be found at: http://genome.ku.dk/resources/war.
Affiliation(s)
- Elfar Torarinsson
- Division of Genetics and Bioinformatics, IBHV, Faculty of Life Sciences, University of Copenhagen, Groennegaardsvej 3, DK-1870 Frederiksberg C, Denmark
|