1
|
Rinaldi S, Moroni E, Rozza R, Magistrato A. Frontiers and Challenges of Computing ncRNAs Biogenesis, Function and Modulation. J Chem Theory Comput 2024; 20:993-1018. [PMID: 38287883 DOI: 10.1021/acs.jctc.3c01239] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 01/31/2024]
Abstract
Non-coding RNAs (ncRNAs), generated from nonprotein coding DNA sequences, constitute 98-99% of the human genome. Non-coding RNAs encompass diverse functional classes, including microRNAs, small interfering RNAs, PIWI-interacting RNAs, small nuclear RNAs, small nucleolar RNAs, and long non-coding RNAs. With critical involvement in gene expression and regulation across various biological and physiopathological contexts, such as neuronal disorders, immune responses, cardiovascular diseases, and cancer, non-coding RNAs are emerging as disease biomarkers and therapeutic targets. In this review, after providing an overview of non-coding RNAs' role in cell homeostasis, we illustrate the potential and the challenges of state-of-the-art computational methods exploited to study non-coding RNAs biogenesis, function, and modulation. This can be done by directly targeting them with small molecules or by altering their expression by targeting the cellular engines underlying their biosynthesis. Drawing from applications, also taken from our work, we showcase the significance and role of computer simulations in uncovering fundamental facets of ncRNA mechanisms and modulation. This information may set the basis to advance gene modulation tools and therapeutic strategies to address unmet medical needs.
Collapse
Affiliation(s)
- Silvia Rinaldi
- National Research Council of Italy (CNR) - Institute of Chemistry of OrganoMetallic Compounds (ICCOM), c/o Area di Ricerca CNR di Firenze Via Madonna del Piano 10, 50019 Sesto Fiorentino, Florence, Italy
| | - Elisabetta Moroni
- National Research Council of Italy (CNR) - Institute of Chemical Sciences and Technologies (SCITEC), via Mario Bianco 9, 20131 Milano, Italy
| | - Riccardo Rozza
- National Research Council of Italy (CNR) - Institute of Material Foundry (IOM) c/o International School for Advanced Studies (SISSA), Via Bonomea, 265, 34136 Trieste, Italy
| | - Alessandra Magistrato
- National Research Council of Italy (CNR) - Institute of Material Foundry (IOM) c/o International School for Advanced Studies (SISSA), Via Bonomea, 265, 34136 Trieste, Italy
| |
Collapse
|
2
|
Kaloudas D, Pavlova N, Penchovsky R. GHOST-NOT and GHOST-YES: Two programs for generating high-speed biosensors with randomized oligonucleotide binding sites with NOT or YES Boolean logic functions based on experimentally validated algorithms. J Biotechnol 2023; 373:82-89. [PMID: 37499876 DOI: 10.1016/j.jbiotec.2023.07.005] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/26/2023] [Accepted: 07/19/2023] [Indexed: 07/29/2023]
Abstract
High-speed allosteric hammerhead ribozymes can be engineered to distinguish well between a perfectly matching effector and the nucleic acid sequences with a few mismatches under physiologically relevant conditions. Such ribozymes can be designed to control the expression of exogenous mRNAs and can be used to develop new gene therapies, including anticancer treatments. The in vivo selection of such ribozymes is a complicated and lengthy procedure with no guarantee of success. Thus, in silico selection of high-speed ribozymes can be employed using secondary RNA structure computation based on the partition function of the RNA folding in combination with random search algorithms. This approach has already been proven very accurate in designing allosteric hammerhead ribozymes. Herein, we present two programs for the computational design of allosteric ribozymes sensing randomized oligonucleotides based on the extended version of the hammerhead ribozyme. A Generator for High-speed Oligonucleotide Sensing allosteric ribozymes with NOT logic function (GHOST-NOT) and a Generator for High-speed Oligonucleotide Sensing allosteric ribozymes with YES logic function (GHOST-YES) for computational design of high-speed allosteric ribozymes are described. The allosteric hammerhead ribozymes had a high self-cleavage rate of about 1.8 per minute and were very selective in sensing an effector sequence.
Collapse
Affiliation(s)
- Dimitrios Kaloudas
- Laboratory of Synthetic Biology and Bioinformatics, Faculty of Biology, Sofia University, "St. Kliment Ohridski", 1164 Sofia, 8 Dragan Tsankov Blvd., Sofia, Bulgaria
| | - Nikolet Pavlova
- Laboratory of Synthetic Biology and Bioinformatics, Faculty of Biology, Sofia University, "St. Kliment Ohridski", 1164 Sofia, 8 Dragan Tsankov Blvd., Sofia, Bulgaria
| | - Robert Penchovsky
- Laboratory of Synthetic Biology and Bioinformatics, Faculty of Biology, Sofia University, "St. Kliment Ohridski", 1164 Sofia, 8 Dragan Tsankov Blvd., Sofia, Bulgaria.
| |
Collapse
|
3
|
Lin BC, Katneni U, Jankowska KI, Meyer D, Kimchi-Sarfaty C. In silico methods for predicting functional synonymous variants. Genome Biol 2023; 24:126. [PMID: 37217943 PMCID: PMC10204308 DOI: 10.1186/s13059-023-02966-1] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/03/2022] [Accepted: 05/10/2023] [Indexed: 05/24/2023] Open
Abstract
Single nucleotide variants (SNVs) contribute to human genomic diversity. Synonymous SNVs are previously considered to be "silent," but mounting evidence has revealed that these variants can cause RNA and protein changes and are implicated in over 85 human diseases and cancers. Recent improvements in computational platforms have led to the development of numerous machine-learning tools, which can be used to advance synonymous SNV research. In this review, we discuss tools that should be used to investigate synonymous variants. We provide supportive examples from seminal studies that demonstrate how these tools have driven new discoveries of functional synonymous SNVs.
Collapse
Affiliation(s)
- Brian C Lin
- Hemostasis Branch 1, Division of Hemostasis, Office of Plasma Protein Therapeutics CMC, Office of Therapeutic Products, Center for Biologics Evaluation and Research, US FDA, Silver Spring, MD, USA
| | - Upendra Katneni
- Hemostasis Branch 1, Division of Hemostasis, Office of Plasma Protein Therapeutics CMC, Office of Therapeutic Products, Center for Biologics Evaluation and Research, US FDA, Silver Spring, MD, USA
| | - Katarzyna I Jankowska
- Hemostasis Branch 1, Division of Hemostasis, Office of Plasma Protein Therapeutics CMC, Office of Therapeutic Products, Center for Biologics Evaluation and Research, US FDA, Silver Spring, MD, USA
| | - Douglas Meyer
- Hemostasis Branch 1, Division of Hemostasis, Office of Plasma Protein Therapeutics CMC, Office of Therapeutic Products, Center for Biologics Evaluation and Research, US FDA, Silver Spring, MD, USA
| | - Chava Kimchi-Sarfaty
- Hemostasis Branch 1, Division of Hemostasis, Office of Plasma Protein Therapeutics CMC, Office of Therapeutic Products, Center for Biologics Evaluation and Research, US FDA, Silver Spring, MD, USA.
| |
Collapse
|
4
|
An allosteric ribozyme generator and an inverse folding ribozyme generator: Two computer programs for automated computational design of oligonucleotide-sensing allosteric hammerhead ribozymes with YES Boolean logic function based on experimentally validated algorithms. Comput Biol Med 2022; 145:105469. [DOI: 10.1016/j.compbiomed.2022.105469] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/16/2021] [Revised: 03/26/2022] [Accepted: 03/27/2022] [Indexed: 11/18/2022]
|
5
|
Flamm C, Hellmuth M, Merkle D, Nojgaard N, Stadler PF. Generic Context-Aware Group Contributions. IEEE/ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS 2022; 19:429-442. [PMID: 32750852 DOI: 10.1109/tcbb.2020.2998948] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/11/2023]
Abstract
Many properties of molecules vary systematically with changes in the structural formula and can thus be estimated from regression models defined on small structural building blocks, usually functional groups. Typically, such approaches are limited to a particular class of compounds and requires hand-curated lists of chemically plausible groups. This limits their use in particular in the context of generative approaches to explore large chemical spaces. Here we overcome this limitation by proposing a generic group contribution method that iteratively identifies significant regressors of increasing size. To this end, LASSO regression is used and the context-dependent contributions are "anchored" around a reference edge to reduce ambiguities and prevent overcounting due to multiple embeddings. We benchmark our approach, which is available as "Context AwaRe Group cOntribution" ( CARGO), on artificial data, typical applications from chemical thermodynamics. As we shall see, this method yields stable results with accuracies comparable to other regression techniques. As a by-product, we obtain interpretable additive contributions for individual chemical bonds and correction terms depending on local contexts.
Collapse
|
6
|
De Bisschop G, Allouche D, Frezza E, Masquida B, Ponty Y, Will S, Sargueil B. Progress toward SHAPE Constrained Computational Prediction of Tertiary Interactions in RNA Structure. Noncoding RNA 2021; 7:71. [PMID: 34842779 PMCID: PMC8628965 DOI: 10.3390/ncrna7040071] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/29/2021] [Revised: 10/29/2021] [Accepted: 11/02/2021] [Indexed: 01/04/2023] Open
Abstract
As more sequencing data accumulate and novel puzzling genetic regulations are discovered, the need for accurate automated modeling of RNA structure increases. RNA structure modeling from chemical probing experiments has made tremendous progress, however accurately predicting large RNA structures is still challenging for several reasons: RNA are inherently flexible and often adopt many energetically similar structures, which are not reliably distinguished by the available, incomplete thermodynamic model. Moreover, computationally, the problem is aggravated by the relevance of pseudoknots and non-canonical base pairs, which are hardly predicted efficiently. To identify nucleotides involved in pseudoknots and non-canonical interactions, we scrutinized the SHAPE reactivity of each nucleotide of the 188 nt long lariat-capping ribozyme under multiple conditions. Reactivities analyzed in the light of the X-ray structure were shown to report accurately the nucleotide status. Those that seemed paradoxical were rationalized by the nucleotide behavior along molecular dynamic simulations. We show that valuable information on intricate interactions can be deduced from probing with different reagents, and in the presence or absence of Mg2+. Furthermore, probing at increasing temperature was remarkably efficient at pointing to non-canonical interactions and pseudoknot pairings. The possibilities of following such strategies to inform structure modeling software are discussed.
Collapse
Affiliation(s)
- Grégoire De Bisschop
- Université de Paris, CNRS, UMR 8038/CiTCoM, F-75006 Paris, France; (G.D.B.); (D.A.); (E.F.)
- Institut de Recherches Cliniques de Montréal (IRCM), Montréal, QC H2W 1R7, Canada
| | - Delphine Allouche
- Université de Paris, CNRS, UMR 8038/CiTCoM, F-75006 Paris, France; (G.D.B.); (D.A.); (E.F.)
- Institut Necker-Enfants Malades (INEM), Inserm U1151, 156 rue de Vaugirard, CEDEX 15, 75015 Paris, France
| | - Elisa Frezza
- Université de Paris, CNRS, UMR 8038/CiTCoM, F-75006 Paris, France; (G.D.B.); (D.A.); (E.F.)
| | - Benoît Masquida
- Université de Strasbourg, CNRS UMR7156 GMGM, 67084 Strasbourg, France;
| | - Yann Ponty
- Ecole Polytechnique, CNRS UMR 7161, LIX, 91120 Palaiseau, France; (Y.P.); (S.W.)
| | - Sebastian Will
- Ecole Polytechnique, CNRS UMR 7161, LIX, 91120 Palaiseau, France; (Y.P.); (S.W.)
| | - Bruno Sargueil
- Université de Paris, CNRS, UMR 8038/CiTCoM, F-75006 Paris, France; (G.D.B.); (D.A.); (E.F.)
| |
Collapse
|
7
|
Zhao Q, Zhao Z, Fan X, Yuan Z, Mao Q, Yao Y. Review of machine learning methods for RNA secondary structure prediction. PLoS Comput Biol 2021; 17:e1009291. [PMID: 34437528 PMCID: PMC8389396 DOI: 10.1371/journal.pcbi.1009291] [Citation(s) in RCA: 43] [Impact Index Per Article: 10.8] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/05/2022] Open
Abstract
Secondary structure plays an important role in determining the function of noncoding RNAs. Hence, identifying RNA secondary structures is of great value to research. Computational prediction is a mainstream approach for predicting RNA secondary structure. Unfortunately, even though new methods have been proposed over the past 40 years, the performance of computational prediction methods has stagnated in the last decade. Recently, with the increasing availability of RNA structure data, new methods based on machine learning (ML) technologies, especially deep learning, have alleviated the issue. In this review, we provide a comprehensive overview of RNA secondary structure prediction methods based on ML technologies and a tabularized summary of the most important methods in this field. The current pending challenges in the field of RNA secondary structure prediction and future trends are also discussed.
Collapse
Affiliation(s)
- Qi Zhao
- College of Medicine and Biological Information Engineering, Northeastern University, Shenyang, Liaoning, China
| | - Zheng Zhao
- School of Information Science and Technology, Dalian Maritime University, Dalian, Liaoning, China
| | - Xiaoya Fan
- School of Software, Key Laboratory for Ubiquitous Network and Service Software of Liaoning Province, Dalian University of Technology, Dalian, Liaoning, China
| | - Zhengwei Yuan
- Key Laboratory of Health Ministry for Congenital Malformation, Shengjing Hospital of China Medical University, Shenyang, Liaoning, China
| | - Qian Mao
- College of Light Industry, Liaoning University, Shenyang, Liaoning, China
- Key Laboratory of Agroproducts Processing Technology, Changchun University, Changchun, Jilin, China
| | - Yudong Yao
- Department of Electrical and Computer Engineering, Stevens Institute of Technology, Hoboken, New Jersey, United States of America
| |
Collapse
|
8
|
Abstract
Alignments of discrete objects can be constructed in a very general setting as super-objects from which the constituent objects are recovered by means of projections. Here, we focus on contact maps, i.e. undirected graphs with an ordered set of vertices. These serve as natural discretizations of RNA and protein structures. In the general case, the alignment problem for vertex-ordered graphs is NP-complete. In the special case of RNA secondary structures, i.e. crossing-free matchings, however, the alignments have a recursive structure. The alignment problem then can be solved by a variant of the Sankoff algorithm in polynomial time. Moreover, the tree or forest alignments of RNA secondary structure can be understood as the alignments of ordered edge sets.
Collapse
Affiliation(s)
- Peter F Stadler
- Bioinformatics Group, Department of Computer Science and Interdisciplinary Centre for Bioinformatics, Universität Leipzig, Härtelstraße 16-18, 04107 Leipzig, Germany.,German Centre for Integrative Biodiversity Research (iDiv) Halle-Jena-Leipzig, Competence Centre for Scalable Data Services and Solutions Dresden-Leipzig, Leipzig Research Centre for Civilization Diseases, and Centre for Biotechnology and Biomedicine at Leipzig University, Universität Leipzig, Leipzig, Germany.,Max Planck Institute for Mathematics in the Sciences, Inselstraße 22, 04103 Leipzig, Germany.,Institute for Theoretical Chemistry, University of Vienna, Währingerstrasse 17, 1090 Wien, Austria.,Facultad de Ciencias, Universidad National de Colombia, Bogotá, Colombia.,Santa Fe Institute, 1399 Hyde Park Road, Santa Fe, NM 87501, USA
| |
Collapse
|
9
|
Reinharz V, Sarrazin-Gendron R, Waldispühl J. Modeling and Predicting RNA Three-Dimensional Structures. Methods Mol Biol 2021; 2284:17-42. [PMID: 33835435 DOI: 10.1007/978-1-0716-1307-8_2] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 06/12/2023]
Abstract
Modeling the three-dimensional structure of RNAs is a milestone toward better understanding and prediction of nucleic acids molecular functions. Physics-based approaches and molecular dynamics simulations are not tractable on large molecules with all-atom models. To address this issue, coarse-grained models of RNA three-dimensional structures have been developed. In this chapter, we describe a graphical modeling based on the Leontis-Westhof extended base pair classification. This representation of RNA structures enables us to identify highly conserved structural motifs with complex nucleotide interactions in structure databases. We show how to take advantage of this knowledge to quickly predict three-dimensional structures of large RNA molecules and present the RNA-MoIP web server (http://rnamoip.cs.mcgill.ca) that streamlines the computational and visualization processes. Finally, we show recent advances in the prediction of local 3D motifs from sequence data with the BayesPairing software and discuss its impact toward complete 3D structure prediction.
Collapse
Affiliation(s)
- Vladimir Reinharz
- Department of Computer Science, Université du Québec à Montréal, Montréal, QC, Canada
| | | | - Jérôme Waldispühl
- School of Computer Science, McGill University, Montréal, QC, Canada.
| |
Collapse
|
10
|
Chillón I, Marcia M. The molecular structure of long non-coding RNAs: emerging patterns and functional implications. Crit Rev Biochem Mol Biol 2020; 55:662-690. [PMID: 33043695 DOI: 10.1080/10409238.2020.1828259] [Citation(s) in RCA: 55] [Impact Index Per Article: 11.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 01/04/2023]
Abstract
Long non-coding RNAs (lncRNAs) are recently-discovered transcripts that regulate vital cellular processes and are crucially connected to diseases. Despite their unprecedented molecular complexity, it is emerging that lncRNAs possess distinct structural motifs. Remarkably, the 3D shape and topology of full-length, native lncRNAs have been visualized for the first time in the last year. These studies reveal that lncRNA structures dictate lncRNA functions. Here, we review experimentally determined lncRNA structures and emphasize that lncRNA structural characterization requires synergistic integration of computational, biochemical and biophysical approaches. Based on these emerging paradigms, we discuss how to overcome the challenges posed by the complex molecular architecture of lncRNAs, with the goal of obtaining a detailed understanding of lncRNA functions and molecular mechanisms in the future.
Collapse
Affiliation(s)
- Isabel Chillón
- European Molecular Biology Laboratory (EMBL) Grenoble, Grenoble, France
| | - Marco Marcia
- European Molecular Biology Laboratory (EMBL) Grenoble, Grenoble, France
| |
Collapse
|
11
|
Singh J, Hanson J, Paliwal K, Zhou Y. RNA secondary structure prediction using an ensemble of two-dimensional deep neural networks and transfer learning. Nat Commun 2019; 10:5407. [PMID: 31776342 PMCID: PMC6881452 DOI: 10.1038/s41467-019-13395-9] [Citation(s) in RCA: 181] [Impact Index Per Article: 30.2] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/12/2019] [Accepted: 11/01/2019] [Indexed: 01/03/2023] Open
Abstract
The majority of our human genome transcribes into noncoding RNAs with unknown structures and functions. Obtaining functional clues for noncoding RNAs requires accurate base-pairing or secondary-structure prediction. However, the performance of such predictions by current folding-based algorithms has been stagnated for more than a decade. Here, we propose the use of deep contextual learning for base-pair prediction including those noncanonical and non-nested (pseudoknot) base pairs stabilized by tertiary interactions. Since only [Formula: see text]250 nonredundant, high-resolution RNA structures are available for model training, we utilize transfer learning from a model initially trained with a recent high-quality bpRNA dataset of [Formula: see text]10,000 nonredundant RNAs made available through comparative analysis. The resulting method achieves large, statistically significant improvement in predicting all base pairs, noncanonical and non-nested base pairs in particular. The proposed method (SPOT-RNA), with a freely available server and standalone software, should be useful for improving RNA structure modeling, sequence alignment, and functional annotations.
Collapse
Affiliation(s)
- Jaswinder Singh
- Signal Processing Laboratory, School of Engineering and Built Environment, Griffith University, Brisbane, QLD, 4111, Australia
| | - Jack Hanson
- Signal Processing Laboratory, School of Engineering and Built Environment, Griffith University, Brisbane, QLD, 4111, Australia
| | - Kuldip Paliwal
- Signal Processing Laboratory, School of Engineering and Built Environment, Griffith University, Brisbane, QLD, 4111, Australia.
| | - Yaoqi Zhou
- Institute for Glycomics and School of Information and Communication Technology, Griffith University, Parklands Dr., Southport, QLD, 4222, Australia.
| |
Collapse
|
12
|
Antczak M, Zablocki M, Zok T, Rybarczyk A, Blazewicz J, Szachniuk M. RNAvista: a webserver to assess RNA secondary structures with non-canonical base pairs. Bioinformatics 2019; 35:152-155. [PMID: 29985979 PMCID: PMC6298044 DOI: 10.1093/bioinformatics/bty609] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.2] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/19/2018] [Accepted: 07/06/2018] [Indexed: 01/23/2023] Open
Abstract
Motivation In the study of 3D RNA structure, information about non-canonical interactions between nucleobases is increasingly important. Specialized databases support investigation of this issue based on experimental data, and several programs can annotate non-canonical base pairs in the RNA 3D structure. However, predicting the extended RNA secondary structure which describes both canonical and non-canonical interactions remains difficult. Results Here, we present RNAvista that allows predicting an extended RNA secondary structure from sequence or from the list enumerating canonical base pairs only. RNAvista is implemented as a publicly available webserver with user-friendly interface. It runs on all major web browsers. Availability and implementation http://rnavista.cs.put.poznan.pl
Collapse
Affiliation(s)
- Maciej Antczak
- Institute of Computing Science, Poznan University of Technology, Poznan, Poland.,Institute of Bioorganic Chemistry, Polish Academy of Sciences, Poznan, Poland
| | - Marcin Zablocki
- Institute of Computing Science, Poznan University of Technology, Poznan, Poland
| | - Tomasz Zok
- Institute of Computing Science, Poznan University of Technology, Poznan, Poland
| | - Agnieszka Rybarczyk
- Institute of Computing Science, Poznan University of Technology, Poznan, Poland.,Institute of Bioorganic Chemistry, Polish Academy of Sciences, Poznan, Poland
| | - Jacek Blazewicz
- Institute of Computing Science, Poznan University of Technology, Poznan, Poland.,Institute of Bioorganic Chemistry, Polish Academy of Sciences, Poznan, Poland
| | - Marta Szachniuk
- Institute of Computing Science, Poznan University of Technology, Poznan, Poland.,Institute of Bioorganic Chemistry, Polish Academy of Sciences, Poznan, Poland
| |
Collapse
|
13
|
Zhang QL, Wang H, Zhu QH, Wang XX, Li YM, Chen JY, Morikawa H, Yang LF, Wang YJ. Genome-Wide Identification and Transcriptomic Analysis of MicroRNAs Across Various Amphioxus Organs Using Deep Sequencing. Front Genet 2019; 10:877. [PMID: 31616471 PMCID: PMC6775235 DOI: 10.3389/fgene.2019.00877] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/02/2019] [Accepted: 08/21/2019] [Indexed: 01/28/2023] Open
Abstract
Amphioxus is the closest living invertebrate proxy of the vertebrate ancestor. Systematic gene identification and expression profile analysis of amphioxus organs are thus important for clarifying the molecular mechanisms of organ function formation and further understanding the evolutionary origin of organs and genes in vertebrates. The precise regulation of microRNAs (miRNAs) is crucial for the functional specification and differentiation of organs. In particular, those miRNAs that are expressed specifically in organs (OSMs) play key roles in organ identity, differentiation, and function. In this study, the genome-wide miRNA transcriptome was analyzed in eight organs of adult amphioxus Branchiostoma belcheri using deep sequencing. A total of 167 known miRNAs and 23 novel miRNAs (named novel_mir), including 139 conserved miRNAs, were discovered, and 79 of these were identified as OSMs. Additionally, analyses of the expression patterns of eight randomly selected known miRNAs demonstrated the accuracy of the miRNA deep sequencing that was used in this study. Furthermore, potentially OSM-regulated genes were predicted for each organ type. Functional enrichment of these predicted targets, as well as further functional analyses of known OSMs, was conducted. We found that the OSMs were potentially to be involved in organ-specific functions, such as epidermis development, gonad development, muscle cell development, proteolysis, lipid metabolism, and generation of neurons. Moreover, OSMs with non-organ-specific functions were detected and primarily include those related to innate immunity and response to stimuli. These findings provide insights into the regulatory roles of OSMs in various amphioxus organs.
Collapse
Affiliation(s)
- Qi-Lin Zhang
- Guangxi Key Laboratory of Beibu Gulf Marine Biodiversity Conservation, Ocean College, Beibu Gulf University, Qinzhou, China.,Faculty of Life Science and Technology, Kunming University of Science and Technology, Kunming, China
| | - Hong Wang
- Guangxi Key Laboratory of Beibu Gulf Marine Biodiversity Conservation, Ocean College, Beibu Gulf University, Qinzhou, China
| | | | - Xiao-Xue Wang
- Guangxi Key Laboratory of Beibu Gulf Marine Biodiversity Conservation, Ocean College, Beibu Gulf University, Qinzhou, China
| | - Yi-Min Li
- Guangxi Key Laboratory of Beibu Gulf Marine Biodiversity Conservation, Ocean College, Beibu Gulf University, Qinzhou, China
| | - Jun-Yuan Chen
- Evo-devo Institute, School of Life Sciences, Nanjing University, Nanjing, China
| | - Hideaki Morikawa
- Faculty of Textile Science and Technology, Shinshu University, Ueda, Nagano, Japan
| | | | - Yu-Jun Wang
- Guangxi Key Laboratory of Beibu Gulf Marine Biodiversity Conservation, Ocean College, Beibu Gulf University, Qinzhou, China
| |
Collapse
|
14
|
RNApolis: Computational Platform for RNA Structure Analysis. FOUNDATIONS OF COMPUTING AND DECISION SCIENCES 2019. [DOI: 10.2478/fcds-2019-0012] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.8] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/20/2022] Open
Abstract
Abstract
In the 1970s, computer scientists began to engage in research in the field of structural biology. The first structural databases, as well as models and methods supporting the analysis of biomolecule structures, started to be created. RNA was put at the centre of scientific interest quite late. However, more and more methods dedicated to this molecule are currently being developed. This paper presents RNApolis - a new computing platform, which offers access to seven bioinformatic tools developed to support the RNA structure study. The set of tools include a structural database and systems for predicting, modelling, annotating and evaluating the RNA structure. RNApolis supports research at different structural levels and allows the discovery, establishment, and validation of relationships between the primary, secondary and tertiary structure of RNAs. The platform is freely available at http://rnapolis.pl
Collapse
|
15
|
Mathews DH. How to benchmark RNA secondary structure prediction accuracy. Methods 2019; 162-163:60-67. [PMID: 30951834 DOI: 10.1016/j.ymeth.2019.04.003] [Citation(s) in RCA: 27] [Impact Index Per Article: 4.5] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/31/2018] [Revised: 03/24/2019] [Accepted: 04/01/2019] [Indexed: 11/18/2022] Open
Abstract
RNA secondary structure prediction is widely used. As new methods are developed, these are often benchmarked for accuracy against existing methods. This review discusses good practices for performing these benchmarks, including the choice of benchmarking structures, metrics to quantify accuracy, the importance of allowing flexibility for pairs in the accepted structure, and the importance of statistical testing for significance.
Collapse
Affiliation(s)
- David H Mathews
- Center for RNA Biology, Department of Biochemistry & Biophysics, and Department of Biostatistics & Computational Biology, University of Rochester Medical Center, 601 Elmwood Avenue, Box 712, Rochester, NY 14642, United States.
| |
Collapse
|
16
|
Bellaousov S, Kayedkhordeh M, Peterson RJ, Mathews DH. Accelerated RNA secondary structure design using preselected sequences for helices and loops. RNA (NEW YORK, N.Y.) 2018; 24:1555-1567. [PMID: 30097542 PMCID: PMC6191713 DOI: 10.1261/rna.066324.118] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.1] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Subscribe] [Scholar Register] [Received: 03/07/2018] [Accepted: 08/06/2018] [Indexed: 06/08/2023]
Abstract
Nucleic acids can be designed to be nano-machines, pharmaceuticals, or probes. RNA secondary structures can form the basis of self-assembling nanostructures. There are only four natural RNA bases, therefore it can be difficult to design sequences that fold to a single, specified structure because many other structures are often possible for a given sequence. One approach taken by state-of-the-art sequence design methods is to select sequences that fold to the specified structure using stochastic, iterative refinement. The goal of this work is to accelerate design. Many existing iterative methods select and refine sequences one base pair and one unpaired nucleotide at a time. Here, the hypothesis that sequences can be preselected in order to accelerate design was tested. To this aim, a database was built of helix sequences that demonstrate thermodynamic features found in natural sequences and that also have little tendency to cross-hybridize. Additionally, a database was assembled of RNA loop sequences with low helix-formation propensity and little tendency to cross-hybridize with either the helices or other loops. These databases of preselected sequences accelerate the selection of sequences that fold with minimal ensemble defect by replacing some of the trial and error of current refinement approaches. When using the database of preselected sequences as compared to randomly chosen sequences, sequences for natural structures are designed 36 times faster, and random structures are designed six times faster. The sequences selected with the aid of the database have similar ensemble defect as those sequences selected at random. The sequence database is part of RNAstructure package at http://rna.urmc.rochester.edu/RNAstructure.html.
Collapse
Affiliation(s)
- Stanislav Bellaousov
- Department of Biochemistry and Biophysics and Center for RNA Biology, University of Rochester Medical Center, Rochester, New York 14642, USA
| | - Mohammad Kayedkhordeh
- Department of Biochemistry and Biophysics and Center for RNA Biology, University of Rochester Medical Center, Rochester, New York 14642, USA
| | | | - David H Mathews
- Department of Biochemistry and Biophysics and Center for RNA Biology, University of Rochester Medical Center, Rochester, New York 14642, USA
- Department of Biostatistics and Computational Biology, University of Rochester Medical Center, Rochester, New York 14642, USA
| |
Collapse
|
17
|
Tanzer A, Hofacker IL, Lorenz R. RNA modifications in structure prediction - Status quo and future challenges. Methods 2018; 156:32-39. [PMID: 30385321 DOI: 10.1016/j.ymeth.2018.10.019] [Citation(s) in RCA: 28] [Impact Index Per Article: 4.0] [Reference Citation Analysis] [Abstract] [Key Words] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/12/2018] [Revised: 10/12/2018] [Accepted: 10/26/2018] [Indexed: 01/01/2023] Open
Abstract
Chemical modifications of RNA nucleotides change their identity and characteristics and thus alter genetic and structural information encoded in the genomic DNA. tRNA and rRNA are probably the most heavily modified genes, and often depend on derivatization or isomerization of their nucleobases in order to correctly fold into their functional structures. Recent RNomics studies, however, report transcriptome wide RNA modification and suggest a more general regulation of structuredness of RNAs by this so called epitranscriptome. Modification seems to require specific substrate structures, which in turn are stabilized or destabilized and thus promote or inhibit refolding events of regulatory RNA structures. In this review, we revisit RNA modifications and the related structures from a computational point of view. We discuss known substrate structures, their properties such as sub-motifs as well as consequences of modifications on base pairing patterns and possible refolding events. Given that efficient RNA structure prediction methods for canonical base pairs have been established several decades ago, we review to what extend these methods allow the inclusion of modified nucleotides to model and study epitranscriptomic effects on RNA structures.
Collapse
Affiliation(s)
- Andrea Tanzer
- Department of Theoretical Chemistry, Faculty of Chemistry, University of Vienna, Waehringerstrasse 17, 1090 Vienna, Austria
| | - Ivo L Hofacker
- Department of Theoretical Chemistry, Faculty of Chemistry, University of Vienna, Waehringerstrasse 17, 1090 Vienna, Austria; Research Group Bioinformatics and Computational Biology, Faculty of Computer Science, University of Vienna, Waehringerstrasse 29, 1090 Vienna, Austria
| | - Ronny Lorenz
- Department of Theoretical Chemistry, Faculty of Chemistry, University of Vienna, Waehringerstrasse 17, 1090 Vienna, Austria
| |
Collapse
|
18
|
Guo QH, Sun LH. Combinatorics of Contacts in Protein Contact Maps. Bull Math Biol 2017; 80:385-403. [PMID: 29230703 DOI: 10.1007/s11538-017-0380-4] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/04/2016] [Accepted: 12/05/2017] [Indexed: 10/18/2022]
Abstract
Contacts play a fundamental role in the study of protein structure and folding problems. The contact map of a protein can be represented by arranging its amino acids on a horizontal line and drawing an arc between two residues if they form a contact. In this paper, we are mainly concerned with the combinatorial enumeration of the arcs in m-regular linear stack, an elementary structure of the protein contact map, which was introduced by Chen et al. (J Comput Biol 21(12):915-935, 2014). We modify the generating function for m-regular linear stacks by introducing a new variable y regarding to the number of arcs and obtain an equation satisfied by the generating function for m-regular linear stacks with n vertices and k arcs. Consequently, we also derive an equation satisfied by the generating function of the overall number of arcs in m-regular linear stacks with n vertices.
Collapse
Affiliation(s)
- Qiang-Hui Guo
- Center for Combinatorics, LPMC, Nankai University, Tianjin, 300071, People's Republic of China
| | - Lisa H Sun
- Center for Combinatorics, LPMC, Nankai University, Tianjin, 300071, People's Republic of China.
| |
Collapse
|
19
|
Sloma MF, Mathews DH. Base pair probability estimates improve the prediction accuracy of RNA non-canonical base pairs. PLoS Comput Biol 2017; 13:e1005827. [PMID: 29107980 PMCID: PMC5690697 DOI: 10.1371/journal.pcbi.1005827] [Citation(s) in RCA: 19] [Impact Index Per Article: 2.4] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/20/2017] [Revised: 11/16/2017] [Accepted: 10/17/2017] [Indexed: 12/21/2022] Open
Abstract
Prediction of RNA tertiary structure from sequence is an important problem, but generating accurate structure models for even short sequences remains difficult. Predictions of RNA tertiary structure tend to be least accurate in loop regions, where non-canonical pairs are important for determining the details of structure. Non-canonical pairs can be predicted using a knowledge-based model of structure that scores nucleotide cyclic motifs, or NCMs. In this work, a partition function algorithm is introduced that allows the estimation of base pairing probabilities for both canonical and non-canonical interactions. Pairs that are predicted to be probable are more likely to be found in the true structure than pairs of lower probability. Pair probability estimates can be further improved by predicting the structure conserved across multiple homologous sequences using the TurboFold algorithm. These pairing probabilities, used in concert with prior knowledge of the canonical secondary structure, allow accurate inference of non-canonical pairs, an important step towards accurate prediction of the full tertiary structure. Software to predict non-canonical base pairs and pairing probabilities is now provided as part of the RNAstructure software package.
Collapse
Affiliation(s)
- Michael F. Sloma
- Department of Biochemistry & Biophysics and Center for RNA Biology, University of Rochester Medical Center, Rochester, NY, United States of America
| | - David H. Mathews
- Department of Biochemistry & Biophysics and Center for RNA Biology, University of Rochester Medical Center, Rochester, NY, United States of America
- Department of Biostatistics & Computational Biology, University of Rochester Medical Center, Rochester, NY, United States of America
- * E-mail:
| |
Collapse
|
20
|
Fallmann J, Will S, Engelhardt J, Grüning B, Backofen R, Stadler PF. Recent advances in RNA folding. J Biotechnol 2017; 261:97-104. [DOI: 10.1016/j.jbiotec.2017.07.007] [Citation(s) in RCA: 32] [Impact Index Per Article: 4.0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/16/2017] [Revised: 07/02/2017] [Accepted: 07/04/2017] [Indexed: 12/23/2022]
|
21
|
RNA structure prediction: from 2D to 3D. Emerg Top Life Sci 2017; 1:275-285. [DOI: 10.1042/etls20160027] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.4] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/16/2017] [Revised: 07/27/2017] [Accepted: 08/10/2017] [Indexed: 11/17/2022]
Abstract
We summarize different levels of RNA structure prediction, from classical 2D structure to extended secondary structure and motif-based research toward 3D structure prediction of RNA. We outline the importance of classical secondary structure during all those levels of structure prediction.
Collapse
|
22
|
Biocomputational identification and validation of novel microRNAs predicted from bubaline whole genome shotgun sequences. Comput Biol Chem 2017; 70:96-106. [PMID: 28844020 DOI: 10.1016/j.compbiolchem.2017.08.005] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.4] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/15/2016] [Revised: 07/01/2017] [Accepted: 08/07/2017] [Indexed: 02/07/2023]
Abstract
MicroRNAs (miRNAs) are small (19-25 base long), non-coding RNAs that regulate post-transcriptional gene expression by cleaving targeted mRNAs in several eukaryotes. The miRNAs play vital roles in multiple biological and metabolic processes, including developmental timing, signal transduction, cell maintenance and differentiation, diseases and cancers. Experimental identification of microRNAs is expensive and lab-intensive. Alternatively, computational approaches for predicting putative miRNAs from genomic or exomic sequences rely on features of miRNAs viz. secondary structures, sequence conservation, minimum free energy index (MFEI) etc. To date, not a single miRNA has been identified in bubaline (Bubalus bubalis), which is an economically important livestock. The present study aims at predicting the putative miRNAs of buffalo using comparative computational approach from buffalo whole genome shotgun sequencing data (INSDC: AWWX00000000.1). The sequences were blasted against the known mammalian miRNA. The obtained miRNAs were then passed through a series of filtration criteria to obtain the set of predicted (putative and novel) bubaline miRNA. Eight miRNAs were selected based on lowest E-value and validated by real time PCR (SYBR green chemistry) using RNU6 as endogenous control. The results from different trails of real time PCR shows that out of selected 8 miRNAs, only 2 (hsa-miR-1277-5p; bta-miR-2285b) are not expressed in bubaline PBMCs. The potential target genes based on their sequence complementarities were then predicted using miRanda. This work is the first report on prediction of bubaline miRNA from whole genome sequencing data followed by experimental validation. The finding could pave the way to future studies in economically important traits in buffalo.
Collapse
|
23
|
Regular Simple Queues of Protein Contact Maps. Bull Math Biol 2016; 79:21-35. [PMID: 27844300 DOI: 10.1007/s11538-016-0212-y] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/24/2016] [Accepted: 09/21/2016] [Indexed: 10/20/2022]
Abstract
A protein fold can be viewed as a self-avoiding walk in certain lattice model, and its contact map is a graph that represents the patterns of contacts in the fold. Goldman, Istrail, and Papadimitriou showed that a contact map in the 2D square lattice can be decomposed into at most two stacks and one queue. In the terminology of combinatorics, stacks and queues are noncrossing and nonnesting partitions, respectively. In this paper, we are concerned with 2-regular and 3-regular simple queues, for which the degree of each vertex is at most one and the arc lengths are at least 2 and 3, respectively. We show that 2-regular simple queues are in one-to-one correspondence with hill-free Motzkin paths, which have been enumerated by Barcucci, Pergola, Pinzani, and Rinaldi by using the Enumerating Combinatorial Objects method. We derive a recurrence relation for the generating function of Motzkin paths with [Formula: see text] peaks at level i, which reduces to the generating function for hill-free Motzkin paths. Moreover, we show that 3-regular simple queues are in one-to-one correspondence with Motzkin paths avoiding certain patterns. Then we obtain a formula for the generating function of 3-regular simple queues. Asymptotic formulas for 2-regular and 3-regular simple queues are derived based on the generating functions.
Collapse
|
24
|
Roll J, Zirbel CL, Sweeney B, Petrov AI, Leontis N. JAR3D Webserver: Scoring and aligning RNA loop sequences to known 3D motifs. Nucleic Acids Res 2016; 44:W320-7. [PMID: 27235417 PMCID: PMC4987954 DOI: 10.1093/nar/gkw453] [Citation(s) in RCA: 18] [Impact Index Per Article: 2.0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/29/2016] [Accepted: 05/11/2016] [Indexed: 12/20/2022] Open
Abstract
Many non-coding RNAs have been identified and may function by forming 2D and 3D structures. RNA hairpin and internal loops are often represented as unstructured on secondary structure diagrams, but RNA 3D structures show that most such loops are structured by non-Watson-Crick basepairs and base stacking. Moreover, different RNA sequences can form the same RNA 3D motif. JAR3D finds possible 3D geometries for hairpin and internal loops by matching loop sequences to motif groups from the RNA 3D Motif Atlas, by exact sequence match when possible, and by probabilistic scoring and edit distance for novel sequences. The scoring gauges the ability of the sequences to form the same pattern of interactions observed in 3D structures of the motif. The JAR3D webserver at http://rna.bgsu.edu/jar3d/ takes one or many sequences of a single loop as input, or else one or many sequences of longer RNAs with multiple loops. Each sequence is scored against all current motif groups. The output shows the ten best-matching motif groups. Users can align input sequences to each of the motif groups found by JAR3D. JAR3D will be updated with every release of the RNA 3D Motif Atlas, and so its performance is expected to improve over time.
Collapse
Affiliation(s)
- James Roll
- Department of Mathematics and Statistics, Bowling Green State University, Bowling Green, OH 43403, USA
| | - Craig L Zirbel
- Department of Mathematics and Statistics, Bowling Green State University, Bowling Green, OH 43403, USA
| | - Blake Sweeney
- Department of Biology, Bowling Green State University, Bowling Green, OH 43403, USA
| | - Anton I Petrov
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge, CB10 1SD, UK
| | - Neocles Leontis
- Department of Chemistry, Bowling Green State University, Bowling Green, OH 43403, USA
| |
Collapse
|
25
|
Lorenz R, Wolfinger MT, Tanzer A, Hofacker IL. Predicting RNA secondary structures from sequence and probing data. Methods 2016; 103:86-98. [PMID: 27064083 DOI: 10.1016/j.ymeth.2016.04.004] [Citation(s) in RCA: 69] [Impact Index Per Article: 7.7] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/07/2015] [Revised: 03/29/2016] [Accepted: 04/04/2016] [Indexed: 01/08/2023] Open
Abstract
RNA secondary structures have proven essential for understanding the regulatory functions performed by RNA such as microRNAs, bacterial small RNAs, or riboswitches. This success is in part due to the availability of efficient computational methods for predicting RNA secondary structures. Recent advances focus on dealing with the inherent uncertainty of prediction by considering the ensemble of possible structures rather than the single most stable one. Moreover, the advent of high-throughput structural probing has spurred the development of computational methods that incorporate such experimental data as auxiliary information.
Collapse
Affiliation(s)
- Ronny Lorenz
- University of Vienna, Faculty of Chemistry, Department of Theoretical Chemistry, Währingerstrasse 17, 1090 Vienna, Austria.
| | - Michael T Wolfinger
- University of Vienna, Faculty of Chemistry, Department of Theoretical Chemistry, Währingerstrasse 17, 1090 Vienna, Austria; Medical University of Vienna, Center for Anatomy and Cell Biology, Währingerstraße 13, 1090 Vienna, Austria.
| | - Andrea Tanzer
- University of Vienna, Faculty of Chemistry, Department of Theoretical Chemistry, Währingerstrasse 17, 1090 Vienna, Austria.
| | - Ivo L Hofacker
- University of Vienna, Faculty of Chemistry, Department of Theoretical Chemistry, Währingerstrasse 17, 1090 Vienna, Austria; University of Vienna, Faculty of Computer Science, Research Group Bioinformatics and Computational Biology, Währingerstr. 29, 1090 Vienna, Austria.
| |
Collapse
|
26
|
zu Siederdissen CH, Prohaska SJ, Stadler PF. Algebraic Dynamic Programming over general data structures. BMC Bioinformatics 2015; 16 Suppl 19:S2. [PMID: 26695390 PMCID: PMC4686793 DOI: 10.1186/1471-2105-16-s19-s2] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.8] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 01/22/2023] Open
Abstract
Background Dynamic programming algorithms provide exact solutions to many problems in computational biology, such as sequence alignment, RNA folding, hidden Markov models (HMMs), and scoring of phylogenetic trees. Structurally analogous algorithms compute optimal solutions, evaluate score distributions, and perform stochastic sampling. This is explained in the theory of Algebraic Dynamic Programming (ADP) by a strict separation of state space traversal (usually represented by a context free grammar), scoring (encoded as an algebra), and choice rule. A key ingredient in this theory is the use of yield parsers that operate on the ordered input data structure, usually strings or ordered trees. The computation of ensemble properties, such as a posteriori probabilities of HMMs or partition functions in RNA folding, requires the combination of two distinct, but intimately related algorithms, known as the inside and the outside recursion. Only the inside recursions are covered by the classical ADP theory. Results The ideas of ADP are generalized to a much wider scope of data structures by relaxing the concept of parsing. This allows us to formalize the conceptual complementarity of inside and outside variables in a natural way. We demonstrate that outside recursions are generically derivable from inside decomposition schemes. In addition to rephrasing the well-known algorithms for HMMs, pairwise sequence alignment, and RNA folding we show how the TSP and the shortest Hamiltonian path problem can be implemented efficiently in the extended ADP framework. As a showcase application we investigate the ancient evolution of HOX gene clusters in terms of shortest Hamiltonian paths. Conclusions The generalized ADP framework presented here greatly facilitates the development and implementation of dynamic programming algorithms for a wide spectrum of applications.
Collapse
|
27
|
Waldispühl J, Reinharz V. Modeling and predicting RNA three-dimensional structures. Methods Mol Biol 2015; 1269:101-21. [PMID: 25577374 DOI: 10.1007/978-1-4939-2291-8_6] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 02/16/2023]
Abstract
Modeling the three-dimensional structure of RNAs is a milestone toward better understanding and prediction of nucleic acids molecular functions. Physics-based approaches and molecular dynamics simulations are not tractable on large molecules with all-atom models. To address this issue, coarse-grained models of RNA three-dimensional structures have been developed. In this chapter, we describe a graphical modeling based on the Leontis-Westhof extended base-pair classification. This representation of RNA structures enables us to identify highly conserved structural motifs with complex nucleotide interactions in structure databases. Further, we show how to take advantage of this knowledge to quickly and simply predict three-dimensional structures of large RNA molecules.
Collapse
Affiliation(s)
- Jérôme Waldispühl
- School of Computer Science, McGill University, 3480 University Street, Room 318, Montreal, QC, Canada, H3A 0E9,
| | | |
Collapse
|
28
|
Rybarczyk A, Szostak N, Antczak M, Zok T, Popenda M, Adamiak R, Blazewicz J, Szachniuk M. New in silico approach to assessing RNA secondary structures with non-canonical base pairs. BMC Bioinformatics 2015; 16:276. [PMID: 26329823 PMCID: PMC4557229 DOI: 10.1186/s12859-015-0718-6] [Citation(s) in RCA: 20] [Impact Index Per Article: 2.0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/18/2015] [Accepted: 08/24/2015] [Indexed: 11/19/2022] Open
Abstract
Background The function of RNA is strongly dependent on its structure, so an appropriate recognition of this structure, on every level of organization, is of great importance. One particular concern is the assessment of base-base interactions, described as the secondary structure, the knowledge of which greatly facilitates an interpretation of RNA function and allows for structure analysis on the tertiary level. The RNA secondary structure can be predicted from a sequence using in silico methods often adjusted with experimental data, or assessed from 3D structure atom coordinates. Computational approaches typically consider only canonical, Watson-Crick and wobble base pairs. Handling of non-canonical interactions, important for a full description of RNA structure, is still very difficult. Results We introduce our novel approach to assessing an extended RNA secondary structure, which characterizes both canonical and non-canonical base pairs, along with their type classification. It is based on predicting the RNA 3D structure from a user-provided sequence or a secondary structure that only describes canonical base pairs, and then deriving the extended secondary structure from atom coordinates. In our example implementation, this was achieved by integrating the functionality of two fully automated, high fidelity methods in a computational pipeline: RNAComposer for the 3D RNA structure prediction and RNApdbee for base-pair annotation. Conclusions The presented methodology ties together existing applications for RNA 3D structure prediction and base-pair annotation. The example performance, applying RNAComposer and RNApdbee, reveals better accuracy in non-canonical base pair assessment than the compared methods that directly predict RNA secondary structure. Electronic supplementary material The online version of this article (doi:10.1186/s12859-015-0718-6) contains supplementary material, which is available to authorized users.
Collapse
Affiliation(s)
- Agnieszka Rybarczyk
- Institute of Computing Science, Poznan University of Technology, Piotrowo 2, 60-965, Poznan, Poland. .,Institute of Bioorganic Chemistry, Polish Academy of Sciences, Noskowskiego 12/14, 61-704, Poznan, Poland.
| | - Natalia Szostak
- Institute of Computing Science, Poznan University of Technology, Piotrowo 2, 60-965, Poznan, Poland.
| | - Maciej Antczak
- Institute of Computing Science, Poznan University of Technology, Piotrowo 2, 60-965, Poznan, Poland.
| | - Tomasz Zok
- Institute of Computing Science, Poznan University of Technology, Piotrowo 2, 60-965, Poznan, Poland.
| | - Mariusz Popenda
- Institute of Bioorganic Chemistry, Polish Academy of Sciences, Noskowskiego 12/14, 61-704, Poznan, Poland. .,European Center for Bioinformatics and Genomics, Poznan University of Technology, Piotrowo 2, 60-965, Poznan, Poland.
| | - Ryszard Adamiak
- Institute of Bioorganic Chemistry, Polish Academy of Sciences, Noskowskiego 12/14, 61-704, Poznan, Poland. .,European Center for Bioinformatics and Genomics, Poznan University of Technology, Piotrowo 2, 60-965, Poznan, Poland.
| | - Jacek Blazewicz
- Institute of Computing Science, Poznan University of Technology, Piotrowo 2, 60-965, Poznan, Poland. .,Institute of Bioorganic Chemistry, Polish Academy of Sciences, Noskowskiego 12/14, 61-704, Poznan, Poland.
| | - Marta Szachniuk
- Institute of Computing Science, Poznan University of Technology, Piotrowo 2, 60-965, Poznan, Poland. .,Institute of Bioorganic Chemistry, Polish Academy of Sciences, Noskowskiego 12/14, 61-704, Poznan, Poland.
| |
Collapse
|
29
|
Pietal MJ, Bujnicki JM, Kozlowski LP. GDFuzz3D: a method for protein 3D structure reconstruction from contact maps, based on a non-Euclidean distance function. Bioinformatics 2015; 31:3499-505. [PMID: 26130575 DOI: 10.1093/bioinformatics/btv390] [Citation(s) in RCA: 21] [Impact Index Per Article: 2.1] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/12/2014] [Accepted: 06/23/2015] [Indexed: 11/13/2022] Open
Abstract
MOTIVATION To date, only a few distinct successful approaches have been introduced to reconstruct a protein 3D structure from a map of contacts between its amino acid residues (a 2D contact map). Current algorithms can infer structures from information-rich contact maps that contain a limited fraction of erroneous predictions. However, it is difficult to reconstruct 3D structures from predicted contact maps that usually contain a high fraction of false contacts. RESULTS We describe a new, multi-step protocol that predicts protein 3D structures from the predicted contact maps. The method is based on a novel distance function acting on a fuzzy residue proximity graph, which predicts a 2D distance map from a 2D predicted contact map. The application of a Multi-Dimensional Scaling algorithm transforms that predicted 2D distance map into a coarse 3D model, which is further refined by typical modeling programs into an all-atom representation. We tested our approach on contact maps predicted de novo by MULTICOM, the top contact map predictor according to CASP10. We show that our method outperforms FT-COMAR, the state-of-the-art method for 3D structure reconstruction from 2D maps. For all predicted 2D contact maps of relatively low sensitivity (60-84%), GDFuzz3D generates more accurate 3D models, with the average improvement of 4.87 Å in terms of RMSD. AVAILABILITY AND IMPLEMENTATION GDFuzz3D server and standalone version are freely available at http://iimcb.genesilico.pl/gdserver/GDFuzz3D/. CONTACT iamb@genesilico.pl SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
Collapse
Affiliation(s)
- Michal J Pietal
- Laboratory of Bioinformatics and Protein Engineering, International Institute of Molecular and Cell Biology in Warsaw, Warsaw, Poland, Laboratory of Functional and Structural Genomics, Centre of New Technologies, University of Warsaw, Warsaw, Poland and
| | - Janusz M Bujnicki
- Laboratory of Bioinformatics and Protein Engineering, International Institute of Molecular and Cell Biology in Warsaw, Warsaw, Poland, Bioinformatics Laboratory, Institute of Molecular Biology and Biotechnology, Adam Mickiewicz University, Poznan, Poland
| | - Lukasz P Kozlowski
- Laboratory of Bioinformatics and Protein Engineering, International Institute of Molecular and Cell Biology in Warsaw, Warsaw, Poland
| |
Collapse
|
30
|
Müller R, Nebel ME. Combinatorics of RNA Secondary Structures with Base Triples. J Comput Biol 2015; 22:619-48. [PMID: 26098199 DOI: 10.1089/cmb.2013.0022] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/12/2022] Open
Abstract
The structure of RNA has been the subject of intense research over the last decades due to its importance for the correct functioning of RNA molecules in biological processes. Hence, a large number of models for RNA folding and corresponding algorithms for structure prediction have been developed. However, previous models often only consider base pairs, although every base is capable of up to three edge-to-edge interactions with other bases. Recently, Höner zu Siederdissen et al. presented an extended model of RNA secondary structure, including base triples together with a folding algorithm-the first thermodynamics-based algorithm that allows the prediction of secondary structures with base triples. In this article, we investigate the search space processed by this new algorithm, that is, the combinatorics of extended RNA secondary structures with base triples. We present generalized definitions for structural motifs like hairpins, stems, bulges, or interior loops occurring in structures with base triples. Furthermore, we prove precise asymptotic results for the number of different structures (size of search space) and expectations for various parameters associated with structural motifs (typical shape of folding). Our analysis shows that the asymptotic number of secondary structures of size n increases exponentially to [Formula: see text] compared to the classic model by Stein and Waterman for which [Formula: see text] structures exist. A comparison with the classic model reveals large deviations in the expected structural appearance, too. The inclusion of base triples constitutes a significant refinement of the combinatorial model of RNA secondary structure, which, by our findings, is quantitatively characterized. Our results are of special theoretical interest, because a closer look at the numbers involved suggests that extended RNA secondary structures constitute a new combinatorial class not bijective with any other combinatorial objects studied so far.
Collapse
Affiliation(s)
- Robert Müller
- 1 Department of Computer Science, Kaiserslautern University , Kaiserslautern, Germany
| | - Markus E Nebel
- 1 Department of Computer Science, Kaiserslautern University , Kaiserslautern, Germany .,2 IMADA, Southern Denmark University , Odense, Denmark
| |
Collapse
|
31
|
Kerpedjiev P, Höner Zu Siederdissen C, Hofacker IL. Predicting RNA 3D structure using a coarse-grain helix-centered model. RNA (NEW YORK, N.Y.) 2015; 21:1110-1121. [PMID: 25904133 PMCID: PMC4436664 DOI: 10.1261/rna.047522.114] [Citation(s) in RCA: 54] [Impact Index Per Article: 5.4] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Figures] [Subscribe] [Scholar Register] [Received: 08/20/2014] [Accepted: 02/13/2015] [Indexed: 06/04/2023]
Abstract
A 3D model of RNA structure can provide information about its function and regulation that is not possible with just the sequence or secondary structure. Current models suffer from low accuracy and long running times and either neglect or presume knowledge of the long-range interactions which stabilize the tertiary structure. Our coarse-grained, helix-based, tertiary structure model operates with only a few degrees of freedom compared with all-atom models while preserving the ability to sample tertiary structures given a secondary structure. It strikes a balance between the precision of an all-atom tertiary structure model and the simplicity and effectiveness of a secondary structure representation. It provides a simplified tool for exploring global arrangements of helices and loops within RNA structures. We provide an example of a novel energy function relying only on the positions of stems and loops. We show that coupling our model to this energy function produces predictions as good as or better than the current state of the art tools. We propose that given the wide range of conformational space that needs to be explored, a coarse-grain approach can explore more conformations in less iterations than an all-atom model coupled to a fine-grain energy function. Finally, we emphasize the overarching theme of providing an ensemble of predicted structures, something which our tool excels at, rather than providing a handful of the lowest energy structures.
Collapse
Affiliation(s)
| | - Christian Höner Zu Siederdissen
- Institute for Theoretical Chemistry, A-1090 Vienna, Austria Bioinformatics Group, Department of Computer Science, Universität Leipzig, D-04107 Leipzig, Germany Interdisciplinary Center for Bioinformatics, Universität Leipzig, D-04107 Leipzig, Germany
| | - Ivo L Hofacker
- Institute for Theoretical Chemistry, A-1090 Vienna, Austria Research Group Bioinformatics and Computational Biology, University of Vienna, A-1090 Vienna, Austria Center for non-coding RNA in Technology and Health, Department of Veterinary Clinical and Animal Science, University of Copenhagen, DK-1870 Frederiksberg, Denmark
| |
Collapse
|
32
|
Chitsaz H, Aminisharifabad M. Exact Learning of RNA Energy Parameters From Structure. J Comput Biol 2015; 22:463-73. [DOI: 10.1089/cmb.2014.0164] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.8] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/13/2022] Open
Affiliation(s)
- Hamidreza Chitsaz
- Department of Computer Science, Colorado State University, Fort Collins, Colorado
| | | |
Collapse
|
33
|
Abstract
In this chapter, we review both computational and experimental aspects of de novo RNA sequence design. We give an overview of currently available design software and their limitations, and discuss the necessary setup to experimentally validate proper function in vitro and in vivo. We focus on transcription-regulating riboswitches, a task that has just recently lead to first successful designs of such RNA elements.
Collapse
Affiliation(s)
- Sven Findeiß
- Research Group Bioinformatics and Computational Biology, Faculty of Computer Science, University of Vienna, Vienna, Austria; Institute for Theoretical Chemistry, University of Vienna, Vienna, Austria
| | - Manja Wachsmuth
- Institute for Biochemistry, University of Leipzig, Leipzig, Germany
| | - Mario Mörl
- Institute for Biochemistry, University of Leipzig, Leipzig, Germany.
| | - Peter F Stadler
- Institute for Theoretical Chemistry, University of Vienna, Vienna, Austria; Bioinformatics Group, Department of Computer Science and the Interdisciplinary Center for Bioinformatic, University of Leipzig, Leipzig, Germany; Center for RNA in Technology and Health, University of Copenhagen, Frederiksberg, Denmark; Max Planck Institute for Mathematics in the Sciences, Leipzig, Germany; Fraunhofer Institute for Cell Therapy and Immunology, Leipzig, Germany; Santa Fe Institute, Santa Fe, New Mexico, USA
| |
Collapse
|
34
|
Abstract
The contact map of a protein fold is a graph that represents the patterns of contacts in the fold. It is known that the contact map can be decomposed into stacks and queues. RNA secondary structures are special stacks in which the degree of each vertex is at most one and each arc has length of at least two. Waterman and Smith derived a formula for the number of RNA secondary structures of length n with exactly k arcs. Höner zu Siederdissen et al. developed a folding algorithm for extended RNA secondary structures in which each vertex has maximum degree two. An equation for the generating function of extended RNA secondary structures was obtained by Müller and Nebel by using a context-free grammar approach, which leads to an asymptotic formula. In this article, we consider m-regular linear stacks, where each arc has length at least m and the degree of each vertex is bounded by two. Extended RNA secondary structures are exactly 2-regular linear stacks. For any m ≥ 2, we obtain an equation for the generating function of the m-regular linear stacks. For given m, we deduce a recurrence relation and an asymptotic formula for the number of m-regular linear stacks on n vertices. To establish the equation, we use the reduction operation of Chen, Deng, and Du to transform an m-regular linear stack to an m-reduced zigzag (or alternating) stack. Then we find an equation for m-reduced zigzag stacks leading to an equation for m-regular linear stacks.
Collapse
Affiliation(s)
- William Y C Chen
- 1 Center for Combinatorics, The Key Laboratory of Pure Mathematics and Combinatorics of Ministry of Education of China and Tianjin Key Laboratory of Combinatorics (LPMC-TJKLC), Nankai University , Tianjin, P.R. China
| | | | | | | |
Collapse
|
35
|
Theis C, Höner Zu Siederdissen C, Hofacker IL, Gorodkin J. Automated identification of RNA 3D modules with discriminative power in RNA structural alignments. Nucleic Acids Res 2013; 41:9999-10009. [PMID: 24005040 PMCID: PMC3905863 DOI: 10.1093/nar/gkt795] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/03/2022] Open
Abstract
Recent progress in predicting RNA structure is moving towards filling the ‘gap’ in 2D RNA structure prediction where, for example, predicted internal loops often form non-canonical base pairs. This is increasingly recognized with the steady increase of known RNA 3D modules. There is a general interest in matching structural modules known from one molecule to other molecules for which the 3D structure is not known yet. We have created a pipeline, metaRNAmodules, which completely automates extracting putative modules from the FR3D database and mapping of such modules to Rfam alignments to obtain comparative evidence. Subsequently, the modules, initially represented by a graph, are turned into models for the RMDetect program, which allows to test their discriminative power using real and randomized Rfam alignments. An initial extraction of 22 495 3D modules in all PDB files results in 977 internal loop and 17 hairpin modules with clear discriminatory power. Many of these modules describe only minor variants of each other. Indeed, mapping of the modules onto Rfam families results in 35 unique locations in 11 different families. The metaRNAmodules pipeline source for the internal loop modules is available at http://rth.dk/resources/mrm.
Collapse
Affiliation(s)
- Corinna Theis
- Center for non-coding RNA in Technology and Health, Department of Veterinary Clinical and Animal Science, University of Copenhagen, Grønnegårdsvej 3, DK-1870 Frederiksberg, Denmark, Institute for Theoretical Chemistry, University of Vienna, Währingerstraße 17, A-1090 Vienna, Austria and Research Group Bioinformatics and Computational Biology, University of Vienna, Währingerstraße 17, A-1090 Vienna, Austria
| | | | | | | |
Collapse
|
36
|
Rivas E. The four ingredients of single-sequence RNA secondary structure prediction. A unifying perspective. RNA Biol 2013; 10:1185-96. [PMID: 23695796 PMCID: PMC3849167 DOI: 10.4161/rna.24971] [Citation(s) in RCA: 29] [Impact Index Per Article: 2.4] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/15/2013] [Revised: 05/06/2013] [Accepted: 05/08/2013] [Indexed: 12/31/2022] Open
Abstract
Any method for RNA secondary structure prediction is determined by four ingredients. The architecture is the choice of features implemented by the model (such as stacked basepairs, loop length distributions, etc.). The architecture determines the number of parameters in the model. The scoring scheme is the nature of those parameters (whether thermodynamic, probabilistic, or weights). The parameterization stands for the specific values assigned to the parameters. These three ingredients are referred to as "the model." The fourth ingredient is the folding algorithms used to predict plausible secondary structures given the model and the sequence of a structural RNA. Here, I make several unifying observations drawn from looking at more than 40 years of methods for RNA secondary structure prediction in the light of this classification. As a final observation, there seems to be a performance ceiling that affects all methods with complex architectures, a ceiling that impacts all scoring schemes with remarkable similarity. This suggests that modeling RNA secondary structure by using intrinsic sequence-based plausible "foldability" will require the incorporation of other forms of information in order to constrain the folding space and to improve prediction accuracy. This could give an advantage to probabilistic scoring systems since a probabilistic framework is a natural platform to incorporate different sources of information into one single inference problem.
Collapse
Affiliation(s)
- Elena Rivas
- Janelia Farm Research Campus; Howard Hughes Medical Institute; Ashburn, VA USA
| |
Collapse
|
37
|
Chitsaz H, Forouzmand E, Haffari G. An efficient algorithm for upper bound on the partition function of nucleic acids. J Comput Biol 2013; 20:486-94. [PMID: 23829650 DOI: 10.1089/cmb.2013.0003] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 01/03/2023] Open
Abstract
It has been shown that minimum free-energy structure for RNAs and RNA-RNA interaction is often incorrect due to inaccuracies in the energy parameters and inherent limitations of the energy model. In contrast, ensemble-based quantities such as melting temperature and equilibrium concentrations can be more reliably predicted. Even structure prediction by sampling from the ensemble and clustering those structures by Sfold has proven to be more reliable than minimum free energy structure prediction. The main obstacle for ensemble-based approaches is the computational complexity of the partition function and base-pairing probabilities. For instance, the space complexity of the partition function for RNA-RNA interaction is O(n4) and the time complexity is O(n6), which is prohibitively large. Our goal in this article is to present a fast algorithm, based on sparse folding, to calculate an upper bound on the partition function. Our work is based on the recent algorithm of Hazan and Jaakkola (2012). The space complexity of our algorithm is the same as that of sparse folding algorithms, and the time complexity of our algorithm is O(MFE(n)ℓ) for single RNA and O(MFE(m, n)ℓ) for RNA-RNA interaction in practice, in which MFE is the running time of sparse folding and ℓ≤n (ℓ≤n+m) is a sequence-dependent parameter.
Collapse
Affiliation(s)
- Hamidreza Chitsaz
- Department of Computer Science, Wayne State University, Detroit, Michigan 48202, USA.
| | | | | |
Collapse
|
38
|
Puton T, Kozlowski LP, Rother KM, Bujnicki JM. CompaRNA: a server for continuous benchmarking of automated methods for RNA secondary structure prediction. Nucleic Acids Res 2013; 41:4307-23. [PMID: 23435231 PMCID: PMC3627593 DOI: 10.1093/nar/gkt101] [Citation(s) in RCA: 81] [Impact Index Per Article: 6.8] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/26/2022] Open
Abstract
We present a continuous benchmarking approach for the assessment of RNA secondary structure prediction methods implemented in the CompaRNA web server. As of 3 October 2012, the performance of 28 single-sequence and 13 comparative methods has been evaluated on RNA sequences/structures released weekly by the Protein Data Bank. We also provide a static benchmark generated on RNA 2D structures derived from the RNAstrand database. Benchmarks on both data sets offer insight into the relative performance of RNA secondary structure prediction methods on RNAs of different size and with respect to different types of structure. According to our tests, on the average, the most accurate predictions obtained by a comparative approach are generated by CentroidAlifold, MXScarna, RNAalifold and TurboFold. On the average, the most accurate predictions obtained by single-sequence analyses are generated by CentroidFold, ContextFold and IPknot. The best comparative methods typically outperform the best single-sequence methods if an alignment of homologous RNA sequences is available. This article presents the results of our benchmarks as of 3 October 2012, whereas the rankings presented online are continuously updated. We will gladly include new prediction methods and new measures of accuracy in the new editions of CompaRNA benchmarks.
Collapse
Affiliation(s)
- Tomasz Puton
- Bioinformatics Laboratory, Institute for Molecular Biology and Biotechnology, Faculty of Biology, Adam Mickiewicz University, ul. Umultowska 89, 61-614 Poznan, Poland
| | | | | | | |
Collapse
|
39
|
Reinharz V, Major F, Waldispühl J. Towards 3D structure prediction of large RNA molecules: an integer programming framework to insert local 3D motifs in RNA secondary structure. Bioinformatics 2013; 28:i207-14. [PMID: 22689763 PMCID: PMC3371858 DOI: 10.1093/bioinformatics/bts226] [Citation(s) in RCA: 35] [Impact Index Per Article: 2.9] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 01/03/2023] Open
Abstract
Motivation: The prediction of RNA 3D structures from its sequence only is a milestone to RNA function analysis and prediction. In recent years, many methods addressed this challenge, ranging from cycle decomposition and fragment assembly to molecular dynamics simulations. However, their predictions remain fragile and limited to small RNAs. To expand the range and accuracy of these techniques, we need to develop algorithms that will enable to use all the structural information available. In particular, the energetic contribution of secondary structure interactions is now well documented, but the quantification of non-canonical interactions—those shaping the tertiary structure—is poorly understood. Nonetheless, even if a complete RNA tertiary structure energy model is currently unavailable, we now have catalogues of local 3D structural motifs including non-canonical base pairings. A practical objective is thus to develop techniques enabling us to use this knowledge for robust RNA tertiary structure predictors. Results: In this work, we introduce RNA-MoIP, a program that benefits from the progresses made over the last 30 years in the field of RNA secondary structure prediction and expands these methods to incorporate the novel local motif information available in databases. Using an integer programming framework, our method refines predicted secondary structures (i.e. removes incorrect canonical base pairs) to accommodate the insertion of RNA 3D motifs (i.e. hairpins, internal loops and k-way junctions). Then, we use predictions as templates to generate complete 3D structures with the MC-Sym program. We benchmarked RNA-MoIP on a set of 9 RNAs with sizes varying from 53 to 128 nucleotides. We show that our approach (i) improves the accuracy of canonical base pair predictions; (ii) identifies the best secondary structures in a pool of suboptimal structures; and (iii) predicts accurate 3D structures of large RNA molecules. Availability:RNA-MoIP is publicly available at: http://csb.cs.mcgill.ca/RNAMoIP. Contact:jeromew@cs.mcgill.ca
Collapse
Affiliation(s)
- Vladimir Reinharz
- School of Computer Science & McGill Centre for Bioinformatics, McGill University, Montréal, Canada
| | | | | |
Collapse
|
40
|
Sato K, Kato Y, Akutsu T, Asai K, Sakakibara Y. DAFS: simultaneous aligning and folding of RNA sequences via dual decomposition. ACTA ACUST UNITED AC 2012; 28:3218-24. [PMID: 23060618 DOI: 10.1093/bioinformatics/bts612] [Citation(s) in RCA: 34] [Impact Index Per Article: 2.6] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/23/2022]
Abstract
MOTIVATION It is well known that the accuracy of RNA secondary structure prediction from a single sequence is limited, and thus a comparative approach that predicts a common secondary structure from aligned sequences is a better choice if homologous sequences with reliable alignments are available. However, correct secondary structure information is needed to produce reliable alignments of RNA sequences. To tackle this dilemma, we require a fast and accurate aligner that takes structural information into consideration to yield reliable structural alignments, which are suitable for common secondary structure prediction. RESULTS We develop DAFS, a novel algorithm that simultaneously aligns and folds RNA sequences based on maximizing expected accuracy of a predicted common secondary structure and its alignment. DAFS decomposes the pairwise structural alignment problem into two independent secondary structure prediction problems and one pairwise (non-structural) alignment problem by the dual decomposition technique, and maintains the consistency of a pairwise structural alignment by imposing penalties on inconsistent base pairs and alignment columns that are iteratively updated. Furthermore, we extend DAFS to consider pseudoknots in RNA structural alignments by integrating IPknot for predicting a pseudoknotted structure. The experiments on publicly available datasets showed that DAFS can produce reliable structural alignments from unaligned sequences in terms of accuracy of common secondary structure prediction.
Collapse
Affiliation(s)
- Kengo Sato
- Department of Biosciences and Informatics, Keio University, 3-14-1 Hiyoshi, Kohoku-ku, Yokohama 223-8522, Japan.
| | | | | | | | | |
Collapse
|
41
|
Washietl S, Will S, Hendrix DA, Goff LA, Rinn JL, Berger B, Kellis M. Computational analysis of noncoding RNAs. WILEY INTERDISCIPLINARY REVIEWS-RNA 2012; 3:759-78. [PMID: 22991327 DOI: 10.1002/wrna.1134] [Citation(s) in RCA: 43] [Impact Index Per Article: 3.3] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 12/13/2022]
Abstract
Noncoding RNAs have emerged as important key players in the cell. Understanding their surprisingly diverse range of functions is challenging for experimental and computational biology. Here, we review computational methods to analyze noncoding RNAs. The topics covered include basic and advanced techniques to predict RNA structures, annotation of noncoding RNAs in genomic data, mining RNA-seq data for novel transcripts and prediction of transcript structures, computational aspects of microRNAs, and database resources.
Collapse
Affiliation(s)
- Stefan Washietl
- Computer Science and Artificial Intelligence Laboratory, Massachusetts Institute of Technology, Cambridge, MA, USA.
| | | | | | | | | | | | | |
Collapse
|
42
|
Lorenz R, Bernhart SH, Höner zu Siederdissen C, Tafer H, Flamm C, Stadler PF, Hofacker IL. ViennaRNA Package 2.0. Algorithms Mol Biol 2011; 6:26. [PMID: 22115189 PMCID: PMC3319429 DOI: 10.1186/1748-7188-6-26] [Citation(s) in RCA: 3230] [Impact Index Per Article: 230.7] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/22/2011] [Accepted: 11/24/2011] [Indexed: 12/31/2022] Open
Abstract
Background Secondary structure forms an important intermediate level of description of nucleic acids that encapsulates the dominating part of the folding energy, is often well conserved in evolution, and is routinely used as a basis to explain experimental findings. Based on carefully measured thermodynamic parameters, exact dynamic programming algorithms can be used to compute ground states, base pairing probabilities, as well as thermodynamic properties. Results The ViennaRNA Package has been a widely used compilation of RNA secondary structure related computer programs for nearly two decades. Major changes in the structure of the standard energy model, the Turner 2004 parameters, the pervasive use of multi-core CPUs, and an increasing number of algorithmic variants prompted a major technical overhaul of both the underlying RNAlib and the interactive user programs. New features include an expanded repertoire of tools to assess RNA-RNA interactions and restricted ensembles of structures, additional output information such as centroid structures and maximum expected accuracy structures derived from base pairing probabilities, or z-scores for locally stable secondary structures, and support for input in fasta format. Updates were implemented without compromising the computational efficiency of the core algorithms and ensuring compatibility with earlier versions. Conclusions The ViennaRNA Package 2.0, supporting concurrent computations via OpenMP, can be downloaded from http://www.tbi.univie.ac.at/RNA.
Collapse
|