1
Omnes L, Angel E, Bartet P, Radvanyi F, Tahi F. A divide-and-conquer approach based on deep learning for long RNA secondary structure prediction: Focus on pseudoknots identification. PLoS One 2025; 20:e0314837. [PMID: 40279361 PMCID: PMC12026937 DOI: 10.1371/journal.pone.0314837] [Received: 11/18/2024] [Accepted: 03/04/2025] [Indexed: 04/27/2025]
Abstract
The accurate prediction of RNA secondary structure, and pseudoknots in particular, is of great importance in understanding the functions of RNAs since they give insights into their folding in three-dimensional space. However, existing approaches often face computational challenges or lack precision when dealing with long RNA sequences and/or pseudoknots. To address this, we propose a divide-and-conquer method based on deep learning, called DivideFold, for predicting the secondary structures including pseudoknots of long RNAs. Our approach is able to scale to long RNAs by recursively partitioning sequences into smaller fragments until they can be managed by an existing model able to predict RNA secondary structure including pseudoknots. We show that our approach exhibits superior performance compared to state-of-the-art methods for pseudoknot prediction and secondary structure prediction including pseudoknots for long RNAs. The source code of DivideFold, along with all the datasets used in this study, is accessible at https://evryrna.ibisc.univ-evry.fr/evryrna/dividefold/home.
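The recursive partitioning idea can be illustrated with a minimal sketch. It is not DivideFold itself: here the cut point is simply the fragment midpoint (DivideFold uses a deep learning model to decide where to cut), `max_len` is a hypothetical length threshold, and `fold_fragment` is a hypothetical stand-in for the existing pseudoknot-capable predictor applied to each small fragment. What the sketch does show is how local base-pair indices are shifted back into the coordinates of the full sequence.

```python
from typing import Callable, List, Tuple

Pair = Tuple[int, int]  # 0-based (i, j) with i < j


def divide_and_fold(
    seq: str,
    fold_fragment: Callable[[str], List[Pair]],  # hypothetical base predictor
    max_len: int = 200,  # hypothetical threshold the base predictor can handle
    offset: int = 0,
) -> List[Pair]:
    """Recursively split `seq` until each fragment fits `max_len`,
    fold each fragment, and map its pairs back to global indices."""
    if len(seq) <= max_len:
        # Base case: fold the fragment, then shift local indices
        # by the fragment's start position in the full sequence.
        return [(i + offset, j + offset) for i, j in fold_fragment(seq)]
    # Placeholder cut rule (assumption): split at the midpoint.
    # A learned model would instead pick cuts that avoid breaking helices.
    mid = len(seq) // 2
    left = divide_and_fold(seq[:mid], fold_fragment, max_len, offset)
    right = divide_and_fold(seq[mid:], fold_fragment, max_len, offset + mid)
    return left + right
```

A design consequence worth noting: with this naive rule, any pair whose two bases end up in different fragments can never be predicted, which is exactly why the choice of cut points matters.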
Affiliation(s)
- Loïc Omnes
- Université Paris-Saclay, Univ Evry, IBISC, 91020 Evry-Courcouronnes, France
- ADLIN Science, 91037 Evry-Courcouronnes, France
- Eric Angel
- Université Paris-Saclay, Univ Evry, IBISC, 91020 Evry-Courcouronnes, France
- François Radvanyi
- Molecular Oncology UMR144, CNRS - Institut Curie, 75005 Paris, France
- Fariza Tahi
- Université Paris-Saclay, Univ Evry, IBISC, 91020 Evry-Courcouronnes, France
2
Biane C, Hampikian G, Kirgizov S, Nurligareev K. Endhered Patterns in Matchings and RNA. J Comput Biol 2025; 32:28-46. [PMID: 39714916 DOI: 10.1089/cmb.2024.0658] [Indexed: 12/24/2024]
Abstract
An endhered (end-adhered) pattern is a subset of arcs in matchings, such that the corresponding starting points are consecutive, and the same holds for the ending points. Such patterns are in one-to-one correspondence with the permutations. We focus on the occurrence frequency of such patterns in matchings and native (real-world) RNA structures with pseudoknots. We present combinatorial results related to the distribution and asymptotic behavior of the pattern 21, which corresponds to two consecutive base pairs frequently encountered in RNA, and the pattern 12, representing the archetypal minimal pseudoknot. We show that in matchings these two patterns are equidistributed, which is quite different from what we can find in native RNAs. We also examine the distribution of endhered patterns of size 3, showing how the patterns change under the transformation called endhered twist. Finally, we compute the distributions of endhered patterns of size 2 and 3 in native RNA secondary structures with pseudoknots and discuss possible outcomes of our study.
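For size 2, both patterns can be counted directly on a matching given as a list of arcs. The sketch below assumes that "consecutive" means adjacent positions on the backbone (starting points i and i+1, ending points j-1/j or j/j+1). Under that reading, pattern 21 is a pair of stacked, nested arcs and pattern 12 is a pair of crossing arcs, the minimal pseudoknot. The function name and the (start, end) input convention are illustrative choices, not taken from the paper.

```python
from typing import Dict, List, Tuple


def count_size2_endhered(arcs: List[Tuple[int, int]]) -> Dict[str, int]:
    """Count endhered patterns 12 and 21 in a matching of (start, end) arcs."""
    # Map each starting point to its ending point (arcs may be given unordered).
    end_of = {min(a, b): max(a, b) for a, b in arcs}
    counts = {"12": 0, "21": 0}
    for i, j in end_of.items():
        k = end_of.get(i + 1)  # end of the arc starting right after i, if any
        if k is None:
            continue
        if k == j + 1:
            counts["12"] += 1  # (i, j), (i+1, j+1): crossing arcs
        elif k == j - 1:
            counts["21"] += 1  # (i, j), (i+1, j-1): stacked, nested arcs
    return counts
```

For example, a helix of three stacked pairs `[(1, 6), (2, 5), (3, 4)]` contains two occurrences of 21 and none of 12.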
Affiliation(s)
- Célia Biane
- Laboratoire d'Informatique de Bourgogne, Université de Bourgogne, Dijon Cedex, France
- Sergey Kirgizov
- Laboratoire d'Informatique de Bourgogne, Université de Bourgogne, Dijon Cedex, France
- Khaydar Nurligareev
- Laboratoire d'Informatique de Bourgogne, Université de Bourgogne, Dijon Cedex, France
3
Zhang C, Freddolino L. FURNA: A database for functional annotations of RNA structures. PLoS Biol 2024; 22:e3002476. [PMID: 39074139 PMCID: PMC11309384 DOI: 10.1371/journal.pbio.3002476] [Received: 11/28/2023] [Revised: 08/08/2024] [Accepted: 06/24/2024] [Indexed: 07/31/2024]
Abstract
Despite the increasing number of 3D RNA structures in the Protein Data Bank, the majority of experimental RNA structures lack thorough functional annotations. As the significance of the functional roles played by noncoding RNAs becomes increasingly apparent, comprehensive annotation of RNA function is becoming a pressing concern. In response to this need, we have developed FURNA (Functions of RNAs), the first database for experimental RNA structures that aims to provide a comprehensive repository of high-quality functional annotations. These include Gene Ontology terms, Enzyme Commission numbers, ligand-binding sites, RNA families, protein-binding motifs, and cross-references to related databases. FURNA is available at https://seq2fun.dcmb.med.umich.edu/furna/ to enable quick discovery of RNA functions from their structures and sequences.
Affiliation(s)
- Chengxin Zhang
- Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, Michigan, United States of America
- Department of Biological Chemistry, University of Michigan, Ann Arbor, Michigan, United States of America
- Lydia Freddolino
- Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, Michigan, United States of America
- Department of Biological Chemistry, University of Michigan, Ann Arbor, Michigan, United States of America
4
Kolaitis A, Makris E, Karagiannis AA, Tsanakas P, Pavlatos C. Knotify_V2.0: Deciphering RNA Secondary Structures with H-Type Pseudoknots and Hairpin Loops. Genes (Basel) 2024; 15:670. [PMID: 38927606 PMCID: PMC11203014 DOI: 10.3390/genes15060670] [Received: 03/26/2024] [Revised: 05/19/2024] [Accepted: 05/22/2024] [Indexed: 06/28/2024]
Abstract
Accurately predicting the pairing order of bases in RNA molecules is essential for anticipating RNA secondary structures. Consequently, this task holds significant importance in unveiling previously unknown biological processes. The urgent need to comprehend RNA structures has been accentuated by the unprecedented impact of the widespread COVID-19 pandemic. This paper presents a framework, Knotify_V2.0, which makes use of syntactic pattern recognition techniques in order to predict RNA structures, with a specific emphasis on tackling the demanding task of predicting H-type pseudoknots that encompass bulges and hairpins. By leveraging the expressive capabilities of a Context-Free Grammar (CFG), the suggested framework integrates the inherent benefits of CFG and makes use of minimum free energy and maximum base pairing criteria. This integration enables the effective management of this inherently ambiguous task. The main contribution of Knotify_V2.0 compared to earlier versions lies in its capacity to identify additional motifs like bulges and hairpins within the internal loops of the pseudoknot. Notably, the proposed methodology, Knotify_V2.0, demonstrates superior accuracy in predicting core stems compared to state-of-the-art frameworks. Knotify_V2.0 exhibited exceptional performance by accurately identifying both core base pairs that form the ground truth pseudoknot in 70% of the examined sequences. Furthermore, Knotify_V2.0 narrowed the performance gap with Knotty, which had demonstrated better performance than Knotify, and even surpassed Knotty in Recall and F1-score metrics. Knotify_V2.0 achieved a higher count of true positives (tp) and a significantly lower count of false negatives (fn) compared to Knotify, highlighting improvements in Precision and Recall metrics, respectively. Consequently, Knotify_V2.0 achieved a higher F1-score than any other platform. The source code and comprehensive implementation details of Knotify_V2.0 are publicly available on GitHub.
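The tp/fn-based metrics cited here are usually computed over sets of base pairs. A minimal sketch, assuming the reference and predicted structures are given as collections of (i, j) pairs and that only exact pair matches count as true positives (some benchmarks also tolerate pairs shifted by one position). This is an illustration of the standard metrics, not Knotify_V2.0's own evaluation code.

```python
from typing import Iterable, Tuple


def pair_metrics(reference: Iterable[Tuple[int, int]],
                 predicted: Iterable[Tuple[int, int]]) -> Tuple[float, float, float]:
    """Return (precision, recall, F1) over exact base-pair matches."""
    # Normalize pairs so (j, i) and (i, j) are treated as the same pair.
    ref = {tuple(sorted(p)) for p in reference}
    pred = {tuple(sorted(p)) for p in predicted}
    tp = len(ref & pred)  # correctly predicted pairs
    fp = len(pred - ref)  # predicted pairs absent from the reference
    fn = len(ref - pred)  # reference pairs that were missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1
```

With this definition, raising tp while lowering fn improves recall directly, and precision improves as long as fp does not grow faster than tp.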
Affiliation(s)
- Angelos Kolaitis
- School of Electrical and Computer Engineering, National Technical University of Athens, 9 Iroon Polytechniou St., 15780 Athens, Greece; (A.K.); (E.M.); (A.A.K.); (P.T.)
- Evangelos Makris
- School of Electrical and Computer Engineering, National Technical University of Athens, 9 Iroon Polytechniou St., 15780 Athens, Greece; (A.K.); (E.M.); (A.A.K.); (P.T.)
- Alexandros Anastasios Karagiannis
- School of Electrical and Computer Engineering, National Technical University of Athens, 9 Iroon Polytechniou St., 15780 Athens, Greece; (A.K.); (E.M.); (A.A.K.); (P.T.)
- Panayiotis Tsanakas
- School of Electrical and Computer Engineering, National Technical University of Athens, 9 Iroon Polytechniou St., 15780 Athens, Greece; (A.K.); (E.M.); (A.A.K.); (P.T.)
- Christos Pavlatos
- Hellenic Air Force Academy, Dekelia Air Base, Acharnes, 13671 Athens, Greece
5
Loyer G, Reinharz V. Concurrent prediction of RNA secondary structures with pseudoknots and local 3D motifs in an integer programming framework. Bioinformatics 2024; 40:btae022. [PMID: 38230755 PMCID: PMC10868335 DOI: 10.1093/bioinformatics/btae022] [Received: 07/18/2023] [Revised: 11/30/2023] [Accepted: 01/12/2024] [Indexed: 01/18/2024]
Abstract
MOTIVATION The prediction of RNA structure canonical base pairs from a single sequence, especially pseudoknotted ones, remains challenging for thermodynamic models that approximate the energy of the local 3D motifs joining canonical stems. It has become more and more apparent in recent years that the structural motifs in the loops, composed of noncanonical interactions, are essential for the final shape of the molecule enabling its multiple functions. Our capacity to predict accurate 3D structures is also limited when it comes to the organization of the large intricate network of interactions that form inside those loops. RESULTS We previously developed the integer programming framework RNA Motifs over Integer Programming (RNAMoIP) to reconcile RNA secondary structure and local 3D motif information available in databases. We further develop our model to now simultaneously predict the canonical base pairs (with pseudoknots) from base pair probability matrices with or without alignment. We benchmarked our new method over all nonredundant RNAs below 150 nucleotides. We show that the joint prediction of canonical base pairs structure and local conserved motifs (i) improves the ratio of well-predicted interactions in the secondary structure, (ii) accurately predicts canonical and Wobble pairs at the locations where motifs are inserted, (iii) is greatly improved with evolutionary information, and (iv) noncanonical motifs at kink-turn locations. AVAILABILITY AND IMPLEMENTATION The source code of the framework is available at https://gitlab.info.uqam.ca/cbe/RNAMoIP and an interactive web server at https://rnamoip.cbe.uqam.ca/.
Affiliation(s)
- Gabriel Loyer
- Department of Computer Science, Université du Québec à Montréal, Montréal, QC H2X 3Y7, Canada
- Vladimir Reinharz
- Department of Computer Science, Université du Québec à Montréal, Montréal, QC H2X 3Y7, Canada
6
Bohdan DR, Voronina VV, Bujnicki JM, Baulin EF. A comprehensive survey of long-range tertiary interactions and motifs in non-coding RNA structures. Nucleic Acids Res 2023; 51:8367-8382. [PMID: 37471030 PMCID: PMC10484739 DOI: 10.1093/nar/gkad605] [Received: 12/14/2022] [Accepted: 07/07/2023] [Indexed: 07/21/2023]
Abstract
Understanding the 3D structure of RNA is key to understanding RNA function. RNA 3D structure is modular and can be seen as a composition of building blocks of various sizes called tertiary motifs. Currently, long-range motifs formed between distant loops and helical regions have been studied much less than the local motifs determined by the RNA secondary structure. We surveyed long-range tertiary interactions and motifs in a non-redundant set of non-coding RNA 3D structures. A new dataset of annotated LOng-RAnge RNA 3D modules (LORA) was built using an approach that does not rely on the automatic annotations of non-canonical interactions. An original algorithm, ARTEM, was developed for annotation-, sequence- and topology-independent superposition of two arbitrary RNA 3D modules. The proposed methods allowed us to identify and describe the most common long-range RNA tertiary motifs. Along with the prevalent canonical A-minor interactions, a large number of previously undescribed staple interactions were observed. The most frequent long-range motifs were found to belong to three main motif families: planar staples, tilted staples, and helical packing motifs.
Affiliation(s)
- Davyd R Bohdan
- Department of Innovation and High Technology, Moscow Institute of Physics and Technology, Dolgoprudny 141701, Russia
- Valeria V Voronina
- Department of Information Systems, Ulyanovsk State Technical University, Ulyanovsk 432027, Russia
- Janusz M Bujnicki
- Laboratory of Bioinformatics and Protein Engineering, International Institute of Molecular and Cell Biology in Warsaw, Warsaw 02-109, Poland
- Eugene F Baulin
- Laboratory of Bioinformatics and Protein Engineering, International Institute of Molecular and Cell Biology in Warsaw, Warsaw 02-109, Poland
7
Chojnowski G, Zaborowski R, Magnus M, Mukherjee S, Bujnicki JM. RNA 3D structure modeling by fragment assembly with small-angle X-ray scattering restraints. Bioinformatics 2023; 39:btad527. [PMID: 37647627 PMCID: PMC10474949 DOI: 10.1093/bioinformatics/btad527] [Received: 04/21/2023] [Revised: 07/14/2023] [Accepted: 08/28/2023] [Indexed: 09/01/2023]
Abstract
SUMMARY Structure determination is a key step in the functional characterization of many non-coding RNA molecules. High-resolution RNA 3D structure determination efforts, however, are not keeping up with the pace of discovery of new non-coding RNA sequences. This increases the importance of computational approaches and low-resolution experimental data, such as from small-angle X-ray scattering experiments. We present RNA Masonry, a computer program and a web service for fully automated modeling of RNA 3D structures. It assembles RNA fragments into geometrically plausible models that meet user-provided secondary structure constraints, restraints on tertiary contacts, and small-angle X-ray scattering data. We illustrate the method with detailed benchmarks and its application to structural studies of viral RNAs with SAXS restraints. AVAILABILITY AND IMPLEMENTATION The web server is available at http://iimcb.genesilico.pl/rnamasonry. The source code is available at https://gitlab.com/gchojnowski/rnamasonry.
Affiliation(s)
- Grzegorz Chojnowski
- International Institute of Molecular and Cell Biology, Warsaw 02-109, Poland
- European Molecular Biology Laboratory, Hamburg Unit, Hamburg 22607, Germany
- Rafał Zaborowski
- International Institute of Molecular and Cell Biology, Warsaw 02-109, Poland
- Marcin Magnus
- ReMedy International Research Agenda Unit, IMol Polish Academy of Sciences, Warsaw, Poland
- Sunandan Mukherjee
- International Institute of Molecular and Cell Biology, Warsaw 02-109, Poland
- Janusz M Bujnicki
- International Institute of Molecular and Cell Biology, Warsaw 02-109, Poland
8
Kunzmann P, Müller TD, Greil M, Krumbach JH, Anter JM, Bauer D, Islam F, Hamacher K. Biotite: new tools for a versatile Python bioinformatics library. BMC Bioinformatics 2023; 24:236. [PMID: 37277726 PMCID: PMC10243083 DOI: 10.1186/s12859-023-05345-6] [Received: 11/24/2022] [Accepted: 05/18/2023] [Indexed: 06/07/2023]
Abstract
BACKGROUND Biotite is a program library for sequence and structural bioinformatics written for the Python programming language. It implements widely used computational methods in a consistent and accessible package. This allows for easy combination of various data analysis, modeling and simulation methods. RESULTS This article presents major functionalities introduced into Biotite since its original publication. The fields of application are shown using concrete examples. We show that the computational performance of Biotite for bioinformatics tasks is comparable to individual, special-purpose software systems developed specifically for the respective single task. CONCLUSIONS The results show that Biotite can be used as a program library both to answer specific bioinformatics questions and to write entire, self-contained software applications with sufficient performance for general application.
Affiliation(s)
- Patrick Kunzmann
- Computational Biology and Simulation, Technical University of Darmstadt, Schnittspahnstraße 2, 64287, Darmstadt, Germany.
- Tom David Müller
- Department of Computer Science, Eberhard Karls University of Tübingen, Sand 14, 72076, Tübingen, Germany
- Jan Hendrik Krumbach
- Computational Biology and Simulation, Technical University of Darmstadt, Schnittspahnstraße 2, 64287, Darmstadt, Germany
- Jacob Marcel Anter
- Computational Biology and Simulation, Technical University of Darmstadt, Schnittspahnstraße 2, 64287, Darmstadt, Germany
- Daniel Bauer
- Computational Biology and Simulation, Technical University of Darmstadt, Schnittspahnstraße 2, 64287, Darmstadt, Germany
- Faisal Islam
- Computational Biology and Simulation, Technical University of Darmstadt, Schnittspahnstraße 2, 64287, Darmstadt, Germany
- Kay Hamacher
- Computational Biology and Simulation, Technical University of Darmstadt, Schnittspahnstraße 2, 64287, Darmstadt, Germany
9
Metrics for RNA Secondary Structure Comparison. Methods Mol Biol 2023; 2586:79-88. [PMID: 36705899 DOI: 10.1007/978-1-0716-2768-6_5] [Indexed: 01/28/2023]
Abstract
RNA secondary structure comparison is one of the important analyses for elucidating the individual functions of RNAs, since it is widely accepted that their functions and structures are strongly correlated. However, although RNA secondary structures with pseudoknots play important roles in vivo, such structures are difficult to handle in silico due to their structural complexity, which is a major obstacle to the analysis of RNA functions. Here, we introduce an algorithm and a metric for comparing pseudoknotted RNA secondary structures based on topological centroid identification and tree edit distance, and describe the usage protocol of software that runs the comparison. This software is publicly available and works on both Microsoft Windows and Apple macOS.
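What makes a structure pseudoknotted can be stated compactly: two base pairs (i, j) and (k, l) cross when i < k < j < l. The sketch below parses an extended dot-bracket string, assuming the common convention that pseudoknotted pairs use extra bracket types (`[]`, `{}`, `<>`), and lists all crossing pairs. It only detects pseudoknots; it does not implement the centroid and tree-edit-distance metric described in this chapter, and the function names are illustrative.

```python
from typing import List, Tuple

Pair = Tuple[int, int]
OPENERS = {"(": ")", "[": "]", "{": "}", "<": ">"}
CLOSERS = {close: op for op, close in OPENERS.items()}


def parse_dot_bracket(db: str) -> List[Pair]:
    """Convert extended dot-bracket notation to a sorted list of 0-based pairs."""
    stacks = {op: [] for op in OPENERS}  # one stack per bracket type
    pairs = []
    for pos, ch in enumerate(db):
        if ch in OPENERS:
            stacks[ch].append(pos)
        elif ch in CLOSERS:
            opener = CLOSERS[ch]
            if not stacks[opener]:
                raise ValueError(f"unmatched {ch!r} at position {pos}")
            pairs.append((stacks[opener].pop(), pos))
    if any(stacks.values()):
        raise ValueError("unmatched opening bracket")
    return sorted(pairs)


def crossing_pairs(pairs: List[Pair]) -> List[Tuple[Pair, Pair]]:
    """Return every combination (i, j), (k, l) with i < k < j < l."""
    ps = sorted(pairs)
    return [(a, b) for n, a in enumerate(ps) for b in ps[n + 1:]
            if a[0] < b[0] < a[1] < b[1]]
```

For instance, the H-type pattern `((..[[..))..]]` yields four crossing combinations, while a purely nested structure such as `((..))` yields none.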
10
Genomic Analysis of Non-B Nucleic Acids Structures in SARS-CoV-2: Potential Key Roles for These Structures in Mutability, Translation, and Replication? Genes (Basel) 2023; 14:genes14010157. [PMID: 36672896 PMCID: PMC9859294 DOI: 10.3390/genes14010157] [Received: 12/09/2022] [Revised: 01/01/2023] [Accepted: 01/04/2023] [Indexed: 01/09/2023]
Abstract
Non-B nucleic acid structures have arisen as key contributors to genetic variation in SARS-CoV-2. Herein, we investigated the presence of defining spike protein mutations falling within inverted repeats (IRs) for 18 SARS-CoV-2 variants, discussed the potential roles of G-quadruplexes (G4s) in SARS-CoV-2 biology, and identified potential pseudoknots within the SARS-CoV-2 genome. Surprisingly, there was a large variation in the number of defining spike protein mutations arising within IRs between variants and these were more likely to occur in the stem region of the predicted hairpin stem-loop secondary structure. Notably, mutations implicated in ACE2 binding and propagation (e.g., ΔH69/V70, N501Y, and D614G) were likely to occur within IRs, whilst mutations involved in antibody neutralization and reduced vaccine efficacy (e.g., T19R, ΔE156, ΔF157, R158G, and G446S) were rarely found within IRs. We also predicted that RNA pseudoknots could predominantly be found within, or next to, 29 mutations found in the SARS-CoV-2 spike protein. Finally, the Omicron variants BA.2, BA.4, BA.5, BA.2.12.1, and BA.2.75 appear to have lost two of the predicted G4-forming sequences found in other variants. These were found in nsp2 and the sequence complementary to the conserved stem-loop II-like motif (S2M) in the 3' untranslated region (UTR). Taken together, non-B nucleic acid structures likely play an integral role in SARS-CoV-2 evolution and genetic diversity.
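An inverted repeat of the kind discussed here can fold back into a hairpin: a stem segment followed, after a short loop, by its reverse complement. A naive scan is sketched below. The stem length, the loop bounds and the requirement of exact, mismatch-free Watson-Crick complementarity are illustrative assumptions, not the parameters or the tool used in the study.

```python
from typing import List, Tuple

_COMP = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Reverse complement in the DNA alphabet (RNA input: U is read as T)."""
    return seq.upper().replace("U", "T").translate(_COMP)[::-1]


def find_inverted_repeats(seq: str, stem: int = 6, min_loop: int = 3,
                          max_loop: int = 10) -> List[Tuple[int, int, int]]:
    """Return (stem_start, loop_length, partner_start) for exact hairpin stems.

    A hit means seq[p:p+stem] is the reverse complement of seq[s:s+stem],
    with p = s + stem + loop_length (0-based positions).
    """
    s_up = seq.upper().replace("U", "T")
    hits = []
    for s in range(len(s_up) - 2 * stem - min_loop + 1):
        target = reverse_complement(s_up[s:s + stem])
        for loop in range(min_loop, max_loop + 1):
            p = s + stem + loop
            if p + stem > len(s_up):
                break  # longer loops cannot fit either
            if s_up[p:p + stem] == target:
                hits.append((s, loop, p))
    return hits
```

For example, `GGGAAC` + `TTTT` + `GTTCCC` is reported as a single hairpin with a 6-nt stem starting at position 0 and a 4-nt loop. A genome-scale search would normally allow mismatches and G-U pairs, which this quadratic scan deliberately omits.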
11
Zhou L, Wang X, Yu S, Tan YL, Tan ZJ. FebRNA: An automated fragment-ensemble-based model for building RNA 3D structures. Biophys J 2022; 121:3381-3392. [PMID: 35978551 PMCID: PMC9515226 DOI: 10.1016/j.bpj.2022.08.017] [Received: 04/30/2022] [Revised: 07/19/2022] [Accepted: 08/15/2022] [Indexed: 11/23/2022]
Abstract
Knowledge of RNA three-dimensional (3D) structures is critical to understanding the important biological functions of RNAs. Although various structure prediction models have been developed, high-accuracy predictions of RNA 3D structures are still limited to RNAs with short lengths or simple topologies. In this work, we propose a new model, FebRNA, for building RNA 3D structures through fragment assembly based on coarse-grained (CG) fragment ensembles. Specifically, FebRNA is composed of four processes: establishing a library of different types of non-redundant CG fragment ensembles regardless of sequence, building a CG 3D structure ensemble through fragment assembly, identifying top-scored CG structures through a specific CG scoring function, and rebuilding all-atom structures from the top-scored CG ones. Extensive examination against different types of RNA structures indicates that FebRNA consistently gives reliable predictions of RNA 3D structures, including pseudoknots, three-way junctions, four-way and five-way junctions, and RNAs in the RNA-Puzzles. FebRNA is available at https://github.com/Tan-group/FebRNA.
Affiliation(s)
- Li Zhou
- Department of Physics and Key Laboratory of Artificial Micro & Nano-structures of Education, School of Physics and Technology, Wuhan University, Wuhan 430072, China
- Xunxun Wang
- Department of Physics and Key Laboratory of Artificial Micro & Nano-structures of Education, School of Physics and Technology, Wuhan University, Wuhan 430072, China
- Shixiong Yu
- Department of Physics and Key Laboratory of Artificial Micro & Nano-structures of Education, School of Physics and Technology, Wuhan University, Wuhan 430072, China
- Ya-Lan Tan
- Research Center of Nonlinear Science, School of Mathematical and Physical Sciences, Wuhan Textile University, Wuhan 430073, China.
- Zhi-Jie Tan
- Department of Physics and Key Laboratory of Artificial Micro & Nano-structures of Education, School of Physics and Technology, Wuhan University, Wuhan 430072, China.
12
Huang FW, Barrett CL, Reidys CM. The energy-spectrum of bicompatible sequences. Algorithms Mol Biol 2021; 16:7. [PMID: 34074304 PMCID: PMC8167974 DOI: 10.1186/s13015-021-00187-4] [Received: 10/28/2020] [Accepted: 05/24/2021] [Indexed: 12/04/2022]
Abstract
Background Genotype-phenotype maps provide a meaningful filtration of sequence space and RNA secondary structures are particular such phenotypes. Compatible sequences, which satisfy the base-pairing constraints of a given RNA structure, play an important role in the context of neutral evolution. Sequences that are simultaneously compatible with two given structures (bicompatible sequences), are beacons in phenotypic transitions, induced by erroneously replicating populations of RNA sequences. RNA riboswitches, which are capable of expressing two distinct secondary structures without changing the underlying sequence, are one example of bicompatible sequences in living organisms. Results We present a full loop energy model Boltzmann sampler of bicompatible sequences for pairs of structures. The sequence sampler employs a dynamic programming routine whose time complexity is polynomial when assuming the maximum number of exposed vertices, $\kappa$, is a constant. The parameter $\kappa$ depends on the two structures and can be very large. We introduce a novel topological framework encapsulating the relations between loops that sheds light on the understanding of $\kappa$. Based on this framework, we give an algorithm to sample sequences with minimum $\kappa$ on a particular topologically classified case as well as giving hints to the solution in the other cases. As a result, we utilize our sequence sampler to study some established riboswitches. Conclusion Our analysis of riboswitch sequences shows that a pair of structures needs to satisfy key properties in order to facilitate phenotypic transitions and that pairs of random structures are unlikely to do so. Our analysis observes a distinct signature of riboswitch sequences, suggesting a new criterion for identifying native sequences and sequences subjected to evolutionary pressure. Our free software is available at: https://github.com/FenixHuang667/Bifold.
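The compatibility condition itself is simple to check: a sequence is compatible with a structure when every base pair of the structure joins two bases that can pair (assumed here to be the six canonical and wobble pairs AU, UA, GC, CG, GU, UG), and bicompatible when this holds for both structures. The sketch below performs only this check on pseudoknot-free dot-bracket strings; it does not implement the Boltzmann sampler, the loop energy model or the $\kappa$-based dynamic programming described in the abstract, and the function names are illustrative.

```python
from typing import List, Tuple

ALLOWED = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}


def pairs_from_dot_bracket(db: str) -> List[Tuple[int, int]]:
    """Base pairs of a pseudoknot-free dot-bracket string (0-based)."""
    stack, pairs = [], []
    for pos, ch in enumerate(db):
        if ch == "(":
            stack.append(pos)
        elif ch == ")":
            if not stack:
                raise ValueError(f"unmatched ')' at position {pos}")
            pairs.append((stack.pop(), pos))
    if stack:
        raise ValueError("unmatched '('")
    return pairs


def is_compatible(seq: str, db: str) -> bool:
    """True if every pair in `db` joins two bases allowed to pair."""
    if len(seq) != len(db):
        raise ValueError("sequence and structure lengths differ")
    return all((seq[i], seq[j]) in ALLOWED for i, j in pairs_from_dot_bracket(db))


def is_bicompatible(seq: str, db1: str, db2: str) -> bool:
    """True if `seq` is compatible with both structures at once."""
    return is_compatible(seq, db1) and is_compatible(seq, db2)
```

For example, `GGGAAACCC` can fold into both `((....)).` and `.((....))`, whereas changing its last base to A keeps it compatible with the first structure only.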
13
Soulé A, Reinharz V, Sarrazin-Gendron R, Denise A, Waldispühl J. Finding recurrent RNA structural networks with fast maximal common subgraphs of edge-colored graphs. PLoS Comput Biol 2021; 17:e1008990. [PMID: 34048427 PMCID: PMC8191989 DOI: 10.1371/journal.pcbi.1008990] [Received: 04/30/2020] [Revised: 06/10/2021] [Accepted: 04/22/2021] [Indexed: 11/25/2022]
Abstract
RNA tertiary structure is crucial to its many non-coding molecular functions. RNA architecture is shaped by its secondary structure, composed of stems (stacked canonical base pairs) enclosing loops. While stems are precisely captured by free-energy models, loops composed of non-canonical base pairs are not, nor are the distant interactions linking these secondary structure elements (SSEs) together. Databases of conserved 3D geometries (a.k.a. modules) not captured by energetic models are leveraged for structure prediction and design, but the computational complexity has limited their study to local elements, i.e., loops. Representing the RNA structure as a graph has recently made it possible to extend this work to pairs of SSEs, uncovering a hierarchical organization of these 3D modules, at great computational cost. Systematically capturing recurrent patterns on a large scale is a main challenge in the study of RNA structures. In this paper, we present an efficient algorithm to compute maximal isomorphisms in edge-colored graphs. We extend this algorithm to a framework well suited to identifying RNA modules, and fast enough to considerably generalize previous approaches. To exhibit the versatility of our framework, we first reproduce results identifying all common modules spanning more than 2 SSEs, in a few hours instead of weeks. The efficiency of our new algorithm is demonstrated by computing the maximal modules between any pair of entire RNAs in the non-redundant corpus of known RNA 3D structures. We observe that the biggest modules our method uncovers form large shared substructures spanning hundreds of nucleotides and base pairs between the ribosomes of Thermus thermophilus, Escherichia coli, and Pseudomonas aeruginosa.
Ribonucleic acids (RNAs) perform a broad range of essential molecular functions in cells, many of which rely on intricate folding properties of the molecule. Watson-Crick and Wobble base pairs form early and stack onto each other to create stems connected by loops, which are themselves stabilized by more sophisticated base interaction patterns. These networks are essential to shape RNA 3D structures but unfortunately still poorly understood. Here, we undertake the task of building a catalog of base interaction networks occurring in multiple structures. However, a pairwise comparison of all RNA structures is computationally heavy. Therefore, we devise an algorithm leveraging intrinsic properties of RNA base interaction networks that enables us to quickly mine full databases of 3D structures. Compared to previous methods, our techniques bring the total running time of the analysis from months to hours while performing more general searches. The data collected through this work will benefit molecular evolution studies and serve in structure prediction tools.
Affiliation(s)
- Antoine Soulé
- School of Computer Science, McGill University, Montréal, Canada
- LiX, École Polytechnique, Paris, France
- Vladimir Reinharz
- Department of Computer Science, Université du Québec à Montréal, Montréal, Canada
- Alain Denise
- Laboratoire de recherche en informatique, Université Paris-Saclay - CNRS, Orsay, France
- Institute for Integrative Biology of the Cell (I2BC), Université Paris-Saclay - CEA - CNRS, Gif-sur-Yvette, France
- Jérôme Waldispühl
- School of Computer Science, McGill University, Montréal, Canada
- * E-mail:
| |
Collapse
|
14
|
Zok T. BioCommons: a robust Java library for RNA structural bioinformatics. Bioinformatics 2021; 37:2766-2767. [PMID: 33532837 PMCID: PMC8428578 DOI: 10.1093/bioinformatics/btab069] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Reference Citation Analysis] [Abstract] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/22/2020] [Revised: 12/28/2020] [Accepted: 01/29/2021] [Indexed: 11/30/2022] Open
Abstract
Motivation: Biomolecular structures come in multiple representations and diverse data formats. Their incompatibility with the requirements of data analysis programs significantly hinders the analytics and the creation of new structure-oriented bioinformatic tools. Therefore, the need for robust libraries of data processing functions is still growing. Results: BioCommons is an open-source Java library for structural bioinformatics. It contains many functions working with the 2D and 3D structures of biomolecules, with a particular emphasis on RNA. Availability and implementation: The library is available in the Maven Central Repository and its source code is hosted on GitHub: https://github.com/tzok/BioCommons. Supplementary information: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Tomasz Zok
- Poznan University of Technology, Institute of Computing Science

15
Saaidi A, Allouche D, Regnier M, Sargueil B, Ponty Y. IPANEMAP: integrative probing analysis of nucleic acids empowered by multiple accessibility profiles. Nucleic Acids Res 2020; 48:8276-8289. [PMID: 32735675 PMCID: PMC7470984 DOI: 10.1093/nar/gkaa607] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.0] [Received: 07/16/2019] [Revised: 07/03/2020] [Accepted: 07/29/2020] [Indexed: 11/13/2022]
Abstract
The manual production of reliable RNA structure models from chemical probing experiments benefits from the integration of information derived from multiple protocols and reagents. However, the interpretation of multiple probing profiles remains a complex task, hindering the quality and reproducibility of modeling efforts. We introduce IPANEMAP, the first automated method for the modeling of RNA structure from multiple probing reactivity profiles. Input profiles can result from experiments based on diverse protocols, reagents, or collections of variants, and are jointly analyzed to predict the dominant conformations of an RNA. IPANEMAP combines sampling, clustering and multi-optimization to produce secondary structure models that are both stable and well supported by experimental evidence. The analysis of multiple reactivity profiles, both publicly available and produced in our study, demonstrates the good performance of IPANEMAP, even in a single-probing setting. It confirms the potential of integrating multiple sources of probing data, informing the design of informative probing assays.
Affiliation(s)
- Afaf Saaidi
- CNRS UMR 7161, LIX, Ecole Polytechnique, Institut Polytechnique de Paris, 1 rue Estienne d'Orves, 91120 Palaiseau, France
- Delphine Allouche
- CNRS UMR 8038, CitCoM, Université de Paris, 4 avenue de l'observatoire, 75006 Paris, France
- Mireille Regnier
- CNRS UMR 7161, LIX, Ecole Polytechnique, Institut Polytechnique de Paris, 1 rue Estienne d'Orves, 91120 Palaiseau, France
- Bruno Sargueil
- CNRS UMR 8038, CitCoM, Université de Paris, 4 avenue de l'observatoire, 75006 Paris, France
- Yann Ponty
- CNRS UMR 7161, LIX, Ecole Polytechnique, Institut Polytechnique de Paris, 1 rue Estienne d'Orves, 91120 Palaiseau, France

16
Chen X, Khan NS, Zhang S. LocalSTAR3D: a local stack-based RNA 3D structural alignment tool. Nucleic Acids Res 2020; 48:e77. [PMID: 32496533 PMCID: PMC7367197 DOI: 10.1093/nar/gkaa453] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.4] [Received: 02/23/2020] [Revised: 05/15/2020] [Accepted: 05/27/2020] [Indexed: 11/29/2022]
Abstract
A fast-growing number of non-coding RNA structures have been resolved and deposited in the Protein Data Bank (PDB). In contrast to the wide range of global alignment and motif search tools, there is still a lack of local alignment tools. Among all the global alignment tools for RNA 3D structures, STAR3D has become a valuable tool for its unprecedented speed and accuracy. STAR3D compares the 3D structures of RNA molecules using consecutive base pairs (stacks) as anchors and generates an optimal global alignment. In this article, we develop a local RNA 3D structural alignment tool, named LocalSTAR3D, which extends STAR3D and is designed to report multiple local alignments between two RNAs. The benchmarking results show that LocalSTAR3D has better accuracy and coverage than other local alignment tools. Furthermore, the utility of this tool is demonstrated by rediscovering kink-turn motif instances, conserved domains in group II intron RNAs, and the tRNA mimicry of IRES RNAs.
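STAR3D and LocalSTAR3D use stacks of consecutive base pairs as alignment anchors. The sketch below shows one minimal way to extract such stacks from a list of base pairs; the function name, the integer index convention and the `min_length` threshold are assumptions, and the tools themselves derive base pairs from 3D coordinates rather than from a ready-made list.

```python
def find_stacks(pairs, min_length=2):
    """Group base pairs into stacks: maximal runs (i, j), (i+1, j-1), ...

    `pairs` holds (i, j) index tuples; orientation is normalized to i < j."""
    pair_set = {(min(i, j), max(i, j)) for i, j in pairs}
    stacks = []
    for i, j in sorted(pair_set):
        # Only start a run at the outermost pair of each stack.
        if (i - 1, j + 1) in pair_set:
            continue
        run = [(i, j)]
        while (run[-1][0] + 1, run[-1][1] - 1) in pair_set:
            run.append((run[-1][0] + 1, run[-1][1] - 1))
        if len(run) >= min_length:
            stacks.append(run)
    return stacks

print(find_stacks([(1, 20), (2, 19), (3, 18), (6, 15), (7, 14), (10, 12)]))
# [[(1, 20), (2, 19), (3, 18)], [(6, 15), (7, 14)]]
```

The isolated pair (10, 12) is dropped because it does not reach the assumed minimum stack length of two pairs.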
Affiliation(s)
- Xiaoli Chen
- Department of Computer Science, University of Central Florida, Orlando, FL 32816, USA
- Nabila Shahnaz Khan
- Department of Computer Science, University of Central Florida, Orlando, FL 32816, USA
- Shaojie Zhang
- Department of Computer Science, University of Central Florida, Orlando, FL 32816, USA

17
Wang F, Akutsu T, Mori T. Comparison of Pseudoknotted RNA Secondary Structures by Topological Centroid Identification and Tree Edit Distance. J Comput Biol 2020; 27:1443-1451. [PMID: 32058802 DOI: 10.1089/cmb.2019.0512] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.4] [Indexed: 11/13/2022]
Abstract
Comparison of RNA structures is one of the most crucial analyses for elucidating their individual functions and promoting medical applications. Because it is widely accepted that their functions and structures are strongly correlated, and because predicting RNA three-dimensional structure directly from sequence is difficult, various methods for RNA secondary structure analysis have been proposed. However, few methods deal with RNA secondary structures containing a specific and complex partial structure called a pseudoknot, despite its significance in biological processes; this is a major obstacle to analyzing their functions. In this study, we propose a novel tree representation of pseudoknotted RNA secondary structures based on topological centroid identification, together with comparison methods based on the tree edit distance. In the proposed method, a given graph representing an RNA secondary structure is transformed into a tree rooted at one of the vertices constituting the topological centroid, which is identified by removing cycles from the graph through a peeling process. When tree-represented RNA secondary structures collected from a public database were compared using the tree edit distance and functional gene groups defined by Gene Ontology (GO), the proposed method yielded better clustering results with respect to GO terms than canonical RNA sequence-based comparison. In addition, we report a case in which combining the tree edit distance with the sequence edit distance gives a better classification of pseudoknotted RNA secondary structures.
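Pseudoknots, the structures this abstract focuses on, are usually defined through crossing base pairs: (i, j) and (k, l) cross when i < k < j < l. The sketch below only checks that standard condition; it does not reproduce the authors' topological-centroid tree representation or their tree edit distance, and the function names are hypothetical.

```python
from itertools import combinations

def crossing_pairs(pairs):
    """Return every couple of base pairs (i, j), (k, l) with i < k < j < l."""
    normalized = sorted((min(a, b), max(a, b)) for a, b in pairs)
    crossings = []
    for (i, j), (k, l) in combinations(normalized, 2):
        # Sorting guarantees i <= k, so a single test covers both orders.
        if i < k < j < l:
            crossings.append(((i, j), (k, l)))
    return crossings

def is_pseudoknotted(pairs):
    """A secondary structure is pseudoknotted iff at least two pairs cross."""
    return bool(crossing_pairs(pairs))

h_type = [(1, 10), (2, 9), (5, 15), (6, 14)]  # two stems whose pairs interleave
nested = [(1, 10), (2, 9), (4, 7)]
print(len(crossing_pairs(h_type)), is_pseudoknotted(nested))  # 4 False
```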
Affiliation(s)
- Feiqi Wang
- Bioinformatics Center, Institute for Chemical Research, Kyoto University, Kyoto, Japan
- Tatsuya Akutsu
- Bioinformatics Center, Institute for Chemical Research, Kyoto University, Kyoto, Japan
- Tomoya Mori
- Bioinformatics Center, Institute for Chemical Research, Kyoto University, Kyoto, Japan

18
Zhou G, Loper J, Geman S. Base-pair ambiguity and the kinetics of RNA folding. BMC Bioinformatics 2019; 20:666. [PMID: 31830902 PMCID: PMC6909616 DOI: 10.1186/s12859-019-3303-6] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Received: 08/06/2019] [Accepted: 12/02/2019] [Indexed: 01/28/2023]
Abstract
Background A […] pairings of nucleotide sequences. Given this forbidding free-energy landscape, mechanisms have evolved that contribute to a directed and efficient folding process, including catalytic proteins and error-detecting chaperones. Among structural RNA molecules we make a distinction between “bound” molecules, which are active as part of ribonucleoprotein (RNP) complexes, and “unbound” molecules, with physiological functions performed without necessarily being bound in RNP complexes. We hypothesized that unbound molecules, lacking the partnering structure of a protein, would be more vulnerable than bound molecules to kinetic traps that compete with native stem structures. We defined an “ambiguity index”, a normalized function of the primary and secondary structure of an individual molecule that measures the number of kinetic traps available to nucleotide sequences that are paired in the native structure, presuming that unbound molecules would have lower indexes. The ambiguity index depends on the purported secondary structure, and was computed under both the comparative (“gold standard”) structure and an equilibrium-based prediction that approximates the minimum free energy (MFE) structure. Arguing that kinetically accessible metastable structures might be more biologically relevant than thermodynamic equilibrium structures, we also hypothesized that MFE-derived ambiguities would be less effective in separating bound and unbound molecules. Results We have introduced an intuitive and easily computed function of primary and secondary structures, an ambiguity index, that measures the availability of complementary sequences that could disrupt the formation of native stems on a given molecule. Using comparative secondary structures, the ambiguity index is systematically smaller among unbound than bound molecules, as expected. Furthermore, the effect is lost when the presumably more accurate comparative structure is replaced by the MFE structure.
Conclusions A statistical analysis of the relationship between the primary and secondary structures of non-coding RNA molecules suggests that stem-disrupting kinetic traps are substantially less prevalent in molecules not participating in RNP complexes. In that this distinction is apparent under the comparative but not the MFE secondary structure, the results highlight a possible deficiency in structure predictions when based upon assumptions of thermodynamic equilibrium.
Affiliation(s)
- Jackson Loper
- Data Science Institute, Columbia University, New York, NY, USA
- Stuart Geman
- Division of Applied Mathematics, Brown University, Providence, RI, USA

19
Zok T, Antczak M, Zurkowski M, Popenda M, Blazewicz J, Adamiak RW, Szachniuk M. RNApdbee 2.0: multifunctional tool for RNA structure annotation. Nucleic Acids Res 2019; 46:W30-W35. [PMID: 29718468 PMCID: PMC6031003 DOI: 10.1093/nar/gky314] [Citation(s) in RCA: 71] [Impact Index Per Article: 11.8] [Received: 01/31/2018] [Accepted: 04/14/2018] [Indexed: 01/07/2023]
Abstract
In the field of RNA structural biology and bioinformatics, access to correctly annotated RNA structures is of crucial importance, especially for secondary and 3D structure prediction. The RNApdbee webserver, introduced in 2014, primarily aimed to address the problem of extracting RNA secondary structure from PDB files. Its new version, RNApdbee 2.0, is a highly advanced multifunctional tool for RNA structure annotation, revealing the relationship between RNA secondary and 3D structure given in the PDB or PDBx/mmCIF format. The upgraded version incorporates new algorithms for the recognition and classification of higher-order pseudoknots in large RNA structures. It allows analysis of the impact of isolated base pairs on RNA structure. It can visualize RNA secondary structures, including those of quadruplexes, with depiction of non-canonical interactions. It also annotates motifs to ease identification of stems, loops and single-stranded fragments in the input RNA structure. RNApdbee 2.0 is implemented as a publicly available webserver with an intuitive interface and can be freely accessed at http://rnapdbee.cs.put.poznan.pl/
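RNApdbee 2.0 classifies higher-order pseudoknots, i.e., it decides which bracket type each base pair receives in dot-bracket notation. RNApdbee's own algorithms are not reproduced here. As an assumption-laden illustration, the sketch below uses a simple greedy rule: scanning pairs by their 5' position, each pair takes the first bracket type whose already-placed pairs it does not cross. The function names, the bracket set and the 0-based indices are hypothetical choices.

```python
BRACKETS = ["()", "[]", "{}", "<>"]

def _cross(p, q):
    """True if base pairs p and q interleave (i < k < j < l)."""
    (i, j), (k, l) = sorted([p, q])
    return i < k < j < l

def to_dot_bracket(length, pairs):
    """Encode 0-based base pairs as an extended dot-bracket string.

    Greedy heuristic: each pair goes to the lowest bracket level whose
    already-placed pairs it does not cross."""
    levels = [[] for _ in BRACKETS]
    chars = ["."] * length
    for i, j in sorted((min(a, b), max(a, b)) for a, b in pairs):
        for level, placed in enumerate(levels):
            if not any(_cross((i, j), q) for q in placed):
                placed.append((i, j))
                chars[i], chars[j] = BRACKETS[level]
                break
        else:
            raise ValueError("too many pseudoknot levels for the bracket set")
    return "".join(chars)

print(to_dot_bracket(16, [(0, 9), (1, 8), (4, 14), (5, 13)]))
# ((..[[..))...]].
```

Greedy placement is easy to follow but is not guaranteed to use the fewest bracket types on complex structures, which is the kind of problem dedicated algorithms address.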
Affiliation(s)
- Tomasz Zok
- Institute of Computing Science, and European Centre for Bioinformatics and Genomics, Poznan University of Technology, Piotrowo 2, 60-965 Poznan, Poland
- Poznan Supercomputing and Networking Center, Jana Pawla II 10, 61-139 Poznan, Poland
- Maciej Antczak
- Institute of Computing Science, and European Centre for Bioinformatics and Genomics, Poznan University of Technology, Piotrowo 2, 60-965 Poznan, Poland
- Michal Zurkowski
- Institute of Computing Science, and European Centre for Bioinformatics and Genomics, Poznan University of Technology, Piotrowo 2, 60-965 Poznan, Poland
- Mariusz Popenda
- Institute of Bioorganic Chemistry, Polish Academy of Sciences, Noskowskiego 12/14, 61-704 Poznan, Poland
- Jacek Blazewicz
- Institute of Computing Science, and European Centre for Bioinformatics and Genomics, Poznan University of Technology, Piotrowo 2, 60-965 Poznan, Poland
- Institute of Bioorganic Chemistry, Polish Academy of Sciences, Noskowskiego 12/14, 61-704 Poznan, Poland
- Ryszard W Adamiak
- Institute of Computing Science, and European Centre for Bioinformatics and Genomics, Poznan University of Technology, Piotrowo 2, 60-965 Poznan, Poland
- Institute of Bioorganic Chemistry, Polish Academy of Sciences, Noskowskiego 12/14, 61-704 Poznan, Poland
- Marta Szachniuk
- Institute of Computing Science, and European Centre for Bioinformatics and Genomics, Poznan University of Technology, Piotrowo 2, 60-965 Poznan, Poland
- Institute of Bioorganic Chemistry, Polish Academy of Sciences, Noskowskiego 12/14, 61-704 Poznan, Poland

20
Reinharz V, Soulé A, Westhof E, Waldispühl J, Denise A. Mining for recurrent long-range interactions in RNA structures reveals embedded hierarchies in network families. Nucleic Acids Res 2019; 46:3841-3851. [PMID: 29608773 PMCID: PMC5934684 DOI: 10.1093/nar/gky197] [Citation(s) in RCA: 24] [Impact Index Per Article: 4.0] [Received: 10/30/2017] [Accepted: 03/22/2018] [Indexed: 11/14/2022]
Abstract
The wealth of combinatorial possibilities among nucleotide base pairs enables RNA molecules to assemble into sophisticated interaction networks, which are used to create complex 3D substructures. These interaction networks are essential in shaping the 3D architecture of the molecule, and also provide the key elements for carrying out molecular functions such as protein or ligand binding. They are made of organised sets of long-range tertiary interactions which connect distinct secondary structure elements in 3D structures. Here, we present a de novo data-driven approach to automatically extract recurrent interaction networks (RINs) from large data sets of full RNA 3D structures. Our methodology enables us for the first time to detect the interaction networks connecting distinct components of the RNA structure, highlighting their diversity and conservation across unrelated functional RNAs. We use a graphical model to perform pairwise comparisons of all available RNA structures and to extract RINs and modules. Our analysis yields a complete catalog of RNA 3D structures available in the Protein Data Bank and reveals the intricate hierarchical organization of the RNA interaction networks and modules. We assembled our results in an online database (http://carnaval.lri.fr) which will be regularly updated. Within the site, a tool allows users with a novel RNA structure to automatically detect whether it contains previously observed RINs.
Affiliation(s)
- Vladimir Reinharz
- Department of Computer Science, Ben-Gurion University of the Negev, P.O.B. 653 Beer-Sheva, 84105, Israel
- School of Computer Science, McGill University, 3480 University, Montreal, Quebec H3A 0E9, Canada
- Antoine Soulé
- School of Computer Science, McGill University, 3480 University, Montreal, Quebec H3A 0E9, Canada
- LIX, École Polytechnique, CNRS, Inria, Palaiseau 91120, France
- Eric Westhof
- ARN, Université de Strasbourg, IBMC-CNRS, 15 rue René Descartes, Strasbourg Cedex 67084, France
- Jérôme Waldispühl
- School of Computer Science, McGill University, 3480 University, Montreal, Quebec H3A 0E9, Canada
- Alain Denise
- LRI, Université Paris-Sud, CNRS, Université Paris-Saclay, Bâtiment 650, Orsay cedex 91405, France
- I2BC, Université Paris-Sud, CNRS, CEA, Université Paris-Saclay, Bâtiment 400, Orsay cedex 91405, France

21
Danaee P, Rouches M, Wiley M, Deng D, Huang L, Hendrix D. bpRNA: large-scale automated annotation and analysis of RNA secondary structure. Nucleic Acids Res 2019; 46:5381-5394. [PMID: 29746666 PMCID: PMC6009582 DOI: 10.1093/nar/gky285] [Citation(s) in RCA: 109] [Impact Index Per Article: 18.2] [Received: 02/07/2018] [Accepted: 04/11/2018] [Indexed: 01/04/2023]
Abstract
While RNA secondary structure prediction from sequence data has made remarkable progress, there is a need for improved strategies for annotating the features of RNA secondary structures. Here, we present bpRNA, a novel annotation tool capable of parsing RNA structures, including complex pseudoknot-containing RNAs, to yield an objective, precise, compact, unambiguous, easily-interpretable description of all loops, stems, and pseudoknots, along with the positions, sequence, and flanking base pairs of each such structural feature. We also introduce several new informative representations of RNA structure types to improve structure visualization and interpretation. We have further used bpRNA to generate a web-accessible meta-database, ‘bpRNA-1m’, of over 100 000 single-molecule, known secondary structures; this is both more fully and accurately annotated and over 20-times larger than existing databases. We use a subset of the database with highly similar (≥90% identical) sequences filtered out to report on statistical trends in sequence, flanking base pairs, and length. Both the bpRNA method and the bpRNA-1m database will be valuable resources both for specific analysis of individual RNA molecules and large-scale analyses such as are useful for updating RNA energy parameters for computational thermodynamic predictions, improving machine learning models for structure prediction, and for benchmarking structure-prediction algorithms.
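bpRNA parses RNA structures, including pseudoknotted ones, before annotating their stems, loops and pseudoknots. A common textual form for such structures is extended dot-bracket notation, where pseudoknotted pairs are written with a second bracket type. The sketch below is a minimal parser for that notation, not bpRNA's code; the bracket set and the 0-based indices are assumptions.

```python
OPENERS = {"(": ")", "[": "]", "{": "}", "<": ">"}
CLOSERS = {close: open_ for open_, close in OPENERS.items()}

def parse_dot_bracket(structure):
    """Return sorted 0-based base pairs from an extended dot-bracket string.

    Each bracket type keeps its own stack, so crossing (pseudoknotted) pairs
    written with different bracket types are recovered. Any other character
    is treated as unpaired."""
    stacks = {opener: [] for opener in OPENERS}
    pairs = []
    for position, symbol in enumerate(structure):
        if symbol in OPENERS:
            stacks[symbol].append(position)
        elif symbol in CLOSERS:
            stack = stacks[CLOSERS[symbol]]
            if not stack:
                raise ValueError(f"unmatched {symbol!r} at position {position}")
            pairs.append((stack.pop(), position))
    leftover = [pos for stack in stacks.values() for pos in stack]
    if leftover:
        raise ValueError(f"unmatched opening bracket(s) at {sorted(leftover)}")
    return sorted(pairs)

print(parse_dot_bracket("((..[[..))...]]."))
# [(0, 9), (1, 8), (4, 14), (5, 13)]
```

Keeping one stack per bracket type is what lets the parser recover crossing pairs; a single shared stack would mismatch them.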
Affiliation(s)
- Dezhong Deng
- School of Electrical Engineering and Computer Science
- Liang Huang
- School of Electrical Engineering and Computer Science
- David Hendrix
- School of Electrical Engineering and Computer Science
- Department of Biochemistry and Biophysics

22
Su C, Weir JD, Zhang F, Yan H, Wu T. ENTRNA: a framework to predict RNA foldability. BMC Bioinformatics 2019; 20:373. [PMID: 31269893 PMCID: PMC6610807 DOI: 10.1186/s12859-019-2948-5] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.7] [Received: 04/23/2018] [Accepted: 06/12/2019] [Indexed: 11/10/2022]
Abstract
BACKGROUND RNA molecules play many crucial roles in living systems. The spatial complexity of RNA structures determines their cellular functions. Therefore, understanding RNA folding conformations, in particular RNA secondary structures, is critical for elucidating biological functions. Existing literature has framed RNA design either as an RNA structure prediction problem or as an RNA inverse folding problem, in which free energy has played a key role. RESULTS In this research, we propose a Positive-Unlabeled data-driven framework termed ENTRNA. In addition to free energy and commonly studied sequence and structural features, we propose a new feature, Sequence Segment Entropy (SSE), to measure the diversity of RNA sequences. ENTRNA is trained and cross-validated using 1024 pseudoknot-free RNAs and 1060 pseudoknotted RNAs from the RNASTRAND database. To test the robustness of ENTRNA, the models are further blind-tested on 206 pseudoknot-free and 93 pseudoknotted RNAs from the PDB database. For pseudoknot-free RNAs, ENTRNA has 86.5% sensitivity on the training dataset and 80.6% sensitivity on the testing dataset. For pseudoknotted RNAs, ENTRNA shows 81.5% sensitivity on the training dataset and 71.0% on the testing dataset. To test the applicability of ENTRNA to long, structurally complex RNAs, we collected 5 laboratory-synthesized RNAs ranging from 1618 to 1790 nucleotides. ENTRNA is able to predict the foldability of 4 of these RNAs. CONCLUSION In this article, we reformulate the RNA design problem as a foldability prediction problem, which is to predict the likelihood of the co-existence of a sequence-structure pair. This new construct has potential for both RNA structure prediction and the inverse folding problem. In addition, this new construct enables us to explore data-driven approaches in RNA research.
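The abstract introduces Sequence Segment Entropy (SSE) as a measure of sequence diversity but does not define it. As a hedged stand-in, the sketch below computes the Shannon entropy of overlapping k-mer frequencies; treat it as an assumption about the general idea, not as ENTRNA's exact formula or segment lengths. The function name `kmer_entropy` is hypothetical.

```python
import math
from collections import Counter

def kmer_entropy(sequence, k=1):
    """Shannon entropy, in bits, of overlapping k-mer frequencies.

    Generic diversity measure, used here as an assumed stand-in for SSE."""
    if not 1 <= k <= len(sequence):
        raise ValueError("k must be between 1 and len(sequence)")
    counts = Counter(sequence[i:i + k] for i in range(len(sequence) - k + 1))
    total = sum(counts.values())
    # p * log2(1 / p) summed over observed k-mers; unobserved k-mers add nothing.
    return sum((n / total) * math.log2(total / n) for n in counts.values())

print(kmer_entropy("AAAAAAAA"), kmer_entropy("ACGUACGU"))  # 0.0 2.0
```

A homopolymer scores zero, while a sequence using all four bases equally reaches the maximum of 2 bits for k = 1.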
Affiliation(s)
- Congzhe Su
- School of Computing, Informatics, Decision Systems Engineering, Arizona State University, Tempe, AZ 85281 USA
- Jeffery D. Weir
- Department of Operational Sciences, Graduate School of Engineering and Management, Air Force Institute of Technology, Wright-Patterson AFB, Dayton, OH 45433 USA
- Fei Zhang
- Biodesign Center for Molecular Design and Biomimetics, The Biodesign Institute & School of Molecular Sciences, Arizona State University, Tempe, AZ 85281 USA
- Hao Yan
- Biodesign Center for Molecular Design and Biomimetics, The Biodesign Institute & School of Molecular Sciences, Arizona State University, Tempe, AZ 85281 USA
- Teresa Wu
- School of Computing, Informatics, Decision Systems Engineering, Arizona State University, Tempe, AZ 85281 USA

23
Thiel BC, Beckmann IK, Kerpedjiev P, Hofacker IL. 3D based on 2D: Calculating helix angles and stacking patterns using forgi 2.0, an RNA Python library centered on secondary structure elements. F1000Res 2019; 8:ISCB Comm J-287. [PMID: 31069053 PMCID: PMC6480952 DOI: 10.12688/f1000research.18458.2] [Citation(s) in RCA: 20] [Impact Index Per Article: 3.3] [Accepted: 04/04/2019] [Indexed: 01/01/2023]
Abstract
We present forgi, a Python library to analyze the tertiary structure of RNA secondary structure elements. Our representation of an RNA molecule is centered on secondary structure elements (stems, bulges and loops). By fitting a cylinder to the helix axis, these elements are carried over into a coarse-grained 3D structure representation. Integration with Biopython allows for handling of all-atom 3D information. forgi can deal with a variety of file formats, including dot-bracket strings, PDB and mmCIF files. We can handle modified residues, missing residues, cofold and multifold structures, as well as nucleotide numbers starting at arbitrary positions. We apply this library to the study of stacking helices in junctions and pseudoknots and investigate how far stacking helices in solved experimental structures can diverge from coaxial geometries.
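forgi carries helices into a coarse-grained 3D model by fitting a cylinder to the helix axis, and the abstract uses such axes to measure how far stacking helices deviate from coaxial geometry. The stdlib-only sketch below illustrates the geometric idea with a simpler substitute: a principal-axis line fit through 3D points (for example, assumed per-nucleotide coordinates of one helix) and the angle between two such axes. It is not forgi's implementation or API; `fit_axis` and `axis_angle_degrees` are hypothetical names.

```python
import math

def fit_axis(points, iterations=200):
    """Fit a straight axis to 3D points; return (centroid, unit direction).

    The direction is the dominant eigenvector of the scatter matrix, found by
    power iteration (a principal-axis fit, not forgi's own cylinder fit)."""
    n = len(points)
    centroid = [sum(p[d] for p in points) / n for d in range(3)]
    centered = [[p[d] - centroid[d] for d in range(3)] for p in points]
    scatter = [[sum(q[a] * q[b] for q in centered) for b in range(3)]
               for a in range(3)]
    # Start from the point farthest from the centroid, which is rarely
    # orthogonal to the principal axis.
    v = max(centered, key=lambda q: sum(x * x for x in q))
    for _ in range(iterations):
        w = [sum(scatter[a][b] * v[b] for b in range(3)) for a in range(3)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            raise ValueError("points do not define an axis")
        v = [x / norm for x in w]
    return centroid, v

def axis_angle_degrees(u, v):
    """Angle between two axes in degrees, ignoring their orientation (0 to 90)."""
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    cosine = abs(sum(a * b for a, b in zip(u, v))) / (nu * nv)
    return math.degrees(math.acos(min(1.0, cosine)))

helix = [(t, 2 * t, 3 * t) for t in range(5)]  # toy points on a straight axis
centroid, direction = fit_axis(helix)
print(round(axis_angle_degrees(direction, (1, 2, 3)), 3),
      round(axis_angle_degrees((1, 0, 0), (0, 1, 0)), 3))  # 0.0 90.0
```

Taking the absolute cosine treats an axis and its reverse as the same line, which suits helices whose direction sign is arbitrary after fitting.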
Affiliation(s)
- Bernhard C. Thiel
- Department of Theoretical Chemistry, Faculty of Chemistry, University of Vienna, Vienna, 1090, Austria
- Irene K. Beckmann
- Department of Theoretical Chemistry, Faculty of Chemistry, University of Vienna, Vienna, 1090, Austria
- Peter Kerpedjiev
- Department of Biomedical Informatics, Harvard Medical School, Boston, Massachusetts, 02115, USA
- Ivo L. Hofacker
- Department of Theoretical Chemistry, Faculty of Chemistry, University of Vienna, Vienna, 1090, Austria
- Research Group Bioinformatics and Computational Biology, Faculty of Computer Science, University of Vienna, Vienna, 1090, Austria

24
Thiel BC, Beckmann IK, Kerpedjiev P, Hofacker IL. 3D based on 2D: Calculating helix angles and stacking patterns using forgi 2.0, an RNA Python library centered on secondary structure elements. F1000Res 2019; 8:ISCB Comm J-287. [PMID: 31069053 PMCID: PMC6480952 DOI: 10.12688/f1000research.18458.1] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Accepted: 03/06/2019] [Indexed: 10/12/2023]
Abstract
We present forgi, a Python library to analyze the tertiary structure of RNA secondary structure elements. Our representation of an RNA molecule is centered on secondary structure elements (stems, bulges and loops). By fitting a cylinder to the helix axis, these elements are carried over into a coarse-grained 3D structure representation. Integration with Biopython allows for handling of all-atom 3D information. forgi can deal with a variety of file formats, including dot-bracket strings, PDB and mmCIF files. We can handle modified residues, missing residues, cofold and multifold structures, as well as nucleotide numbers starting at arbitrary positions. We apply this library to the study of stacking helices in junctions and pseudoknots and investigate how far stacking helices in solved experimental structures can diverge from coaxial geometries.
Affiliation(s)
- Bernhard C. Thiel
- Department of Theoretical Chemistry, Faculty of Chemistry, University of Vienna, Vienna, 1090, Austria
- Irene K. Beckmann
- Department of Theoretical Chemistry, Faculty of Chemistry, University of Vienna, Vienna, 1090, Austria
- Peter Kerpedjiev
- Department of Biomedical Informatics, Harvard Medical School, Boston, Massachusetts, 02115, USA
- Ivo L. Hofacker
- Department of Theoretical Chemistry, Faculty of Chemistry, University of Vienna, Vienna, 1090, Austria
- Research Group Bioinformatics and Computational Biology, Faculty of Computer Science, University of Vienna, Vienna, 1090, Austria

25
Bellaousov S, Kayedkhordeh M, Peterson RJ, Mathews DH. Accelerated RNA secondary structure design using preselected sequences for helices and loops. RNA (NEW YORK, N.Y.) 2018; 24:1555-1567. [PMID: 30097542 PMCID: PMC6191713 DOI: 10.1261/rna.066324.118] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.1] [Received: 03/07/2018] [Accepted: 08/06/2018] [Indexed: 06/08/2023]
Abstract
Nucleic acids can be designed to be nano-machines, pharmaceuticals, or probes. RNA secondary structures can form the basis of self-assembling nanostructures. Because there are only four natural RNA bases, it can be difficult to design sequences that fold to a single, specified structure: many other structures are often possible for a given sequence. One approach taken by state-of-the-art sequence design methods is to select sequences that fold to the specified structure using stochastic, iterative refinement. The goal of this work is to accelerate design. Many existing iterative methods select and refine sequences one base pair and one unpaired nucleotide at a time. Here, the hypothesis that sequences can be preselected in order to accelerate design was tested. To this end, a database was built of helix sequences that demonstrate thermodynamic features found in natural sequences and that also have little tendency to cross-hybridize. Additionally, a database was assembled of RNA loop sequences with low helix-formation propensity and little tendency to cross-hybridize with either the helices or other loops. These databases of preselected sequences accelerate the selection of sequences that fold with minimal ensemble defect by replacing some of the trial and error of current refinement approaches. When using the database of preselected sequences as compared to randomly chosen sequences, sequences for natural structures are designed 36 times faster, and random structures are designed six times faster. The sequences selected with the aid of the database have ensemble defects similar to those of sequences selected at random. The sequence database is part of the RNAstructure package at http://rna.urmc.rochester.edu/RNAstructure.html.
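The abstract requires preselected helix and loop sequences to have "little tendency to cross-hybridize" but does not state the criterion. A crude, hypothetical proxy is the longest stretch over which one sequence can pair antiparallel with another (Watson-Crick or GU wobble pairs). The sketch below computes it with a longest-common-substring dynamic program; it ignores thermodynamics entirely and is not the selection rule implemented in RNAstructure.

```python
# Assumed pairing rules: Watson-Crick pairs plus GU wobble.
PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}

def longest_complementary_stretch(seq_a, seq_b):
    """Length of the longest run where seq_a can pair antiparallel with seq_b.

    seq_a is read 5'->3' against seq_b read 3'->5' (i.e., reversed); the
    dynamic program extends runs along diagonals of matching positions."""
    rev_b = seq_b[::-1]
    best = 0
    prev = [0] * (len(rev_b) + 1)
    for i in range(1, len(seq_a) + 1):
        curr = [0] * (len(rev_b) + 1)
        for j in range(1, len(rev_b) + 1):
            if (seq_a[i - 1], rev_b[j - 1]) in PAIRS:
                curr[j] = prev[j - 1] + 1
                best = max(best, curr[j])
        prev = curr
    return best

print(longest_complementary_stretch("GGGAAA", "UUUCCC"),
      longest_complementary_stretch("AAAA", "AAAA"))  # 6 0
```

Under this proxy, a candidate helix would be rejected when its longest stretch against the other database entries exceeds some chosen threshold, which is itself an assumption here.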
Affiliation(s)
- Stanislav Bellaousov
- Department of Biochemistry and Biophysics and Center for RNA Biology, University of Rochester Medical Center, Rochester, New York 14642, USA
- Mohammad Kayedkhordeh
- Department of Biochemistry and Biophysics and Center for RNA Biology, University of Rochester Medical Center, Rochester, New York 14642, USA
- David H Mathews
- Department of Biochemistry and Biophysics and Center for RNA Biology, University of Rochester Medical Center, Rochester, New York 14642, USA
- Department of Biostatistics and Computational Biology, University of Rochester Medical Center, Rochester, New York 14642, USA

26
Lindow N, Baum D, Leborgne M, Hege HC. Interactive Visualization of RNA and DNA Structures. IEEE TRANSACTIONS ON VISUALIZATION AND COMPUTER GRAPHICS 2018; 25:967-976. [PMID: 30334794 DOI: 10.1109/tvcg.2018.2864507] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.4] [Indexed: 06/08/2023]
Abstract
The analysis and visualization of nucleic acids (RNA and DNA) are playing an increasingly important role due to their fundamental importance for all forms of life and the growing number of known 3D structures of such molecules. The great complexity of these structures, in particular those of RNA, demands interactive visualization to get deeper insights into the relationship between 2D secondary structure motifs and their 3D tertiary structures. Over the last decades, much research in molecular visualization has focused on the visual exploration of protein structures, while nucleic acids have only been marginally addressed. In contrast to proteins, which are composed of amino acids, nucleic acids are built from nucleotides. They form structuring patterns that differ from those of proteins and, hence, also require different visualization and exploration techniques. To support interactive exploration of nucleic acids, the computation of secondary structure motifs as well as their visualization in 2D and 3D must be fast. Therefore, in this paper, we focus on the performance of both the computation and visualization of nucleic acid structure. We present a ray casting-based visualization of RNA and DNA secondary and tertiary structures, which for the first time enables real-time visualization of even large molecular dynamics trajectories. Furthermore, we provide a detailed description of all important aspects of visualizing nucleic acid secondary and tertiary structures. With this, we close an important gap in molecular visualization.

27
Ge P, Islam S, Zhong C, Zhang S. De novo discovery of structural motifs in RNA 3D structures through clustering. Nucleic Acids Res 2018; 46:4783-4793. [PMID: 29534235 PMCID: PMC5961109 DOI: 10.1093/nar/gky139] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.6] [Received: 03/08/2017] [Revised: 02/09/2018] [Accepted: 02/16/2018] [Indexed: 11/16/2022]
Abstract
As functional components in three-dimensional (3D) conformation of an RNA, the RNA structural motifs provide an easy way to associate the molecular architectures with their biological mechanisms. In the past years, many computational tools have been developed to search motif instances by using the existing knowledge of well-studied families. Recently, with the rapidly increasing number of resolved RNA 3D structures, there is an urgent need to discover novel motifs with the newly presented information. In this work, we classify all the loops in non-redundant RNA 3D structures to detect plausible RNA structural motif families by using a clustering pipeline. Compared with other clustering approaches, our method has two benefits: first, the underlying alignment algorithm is tolerant to the variations in 3D structures. Second, sophisticated downstream analysis has been performed to ensure the clusters are valid and easily applied to further research. The final clustering results contain many interesting new variants of known motif families, such as GNAA tetraloop, kink-turn, sarcin-ricin and T-loop. We have also discovered potential novel functional motifs conserved in ribosomal RNA, sgRNA, SRP RNA, riboswitch and ribozyme.
Affiliation(s)
- Ping Ge
- Department of Computer Science, University of Central Florida, Orlando, FL 32816, USA
- Shahidul Islam
- Department of Computer Science, University of Central Florida, Orlando, FL 32816, USA
- Cuncong Zhong
- Department of Electrical Engineering and Computer Science, University of Kansas, Lawrence, KS 66045, USA
- Shaojie Zhang
- Department of Computer Science, University of Central Florida, Orlando, FL 32816, USA
28
Antczak M, Popenda M, Zok T, Zurkowski M, Adamiak RW, Szachniuk M. New algorithms to represent complex pseudoknotted RNA structures in dot-bracket notation. Bioinformatics 2018; 34:1304-1312. [PMID: 29236971 PMCID: PMC5905660 DOI: 10.1093/bioinformatics/btx783] [Citation(s) in RCA: 28] [Impact Index Per Article: 4.0] [Received: 04/11/2017] [Revised: 10/23/2017] [Accepted: 12/08/2017] [Indexed: 11/12/2022]
Abstract
Motivation: Understanding the formation, architecture and roles of pseudoknots in RNA structures is one of the most difficult challenges in RNA computational biology and structural bioinformatics. Methods that predict pseudoknots typically do so with poor accuracy, often despite the incorporation of experimental data. Existing bioinformatic approaches differ in how they recognize pseudoknots and reveal their nature. A few pseudoknot classifications exist; the most common ones refer to genus or order. Following the latter, we propose new algorithms that identify pseudoknots in an RNA structure provided in BPSEQ format, determine their order and encode them in dot-bracket-letter notation. The proposed encoding aims to illustrate the hierarchy of RNA folding. Results: The new algorithms are based on dynamic programming and hybrid (combining exhaustive search and random walk) approaches. They evolved from an elementary algorithm implemented within the workflow of RNA FRABASE 1.0, our database of RNA structure fragments. They use different scoring functions to rank dissimilar dot-bracket representations of RNA structure. Computational experiments show an advantage of the new methods over the others, especially for large RNA structures. Availability and implementation: The presented algorithms have been implemented as new functionality of the RNApdbee webserver and are ready to use at http://rnapdbee.cs.put.poznan.pl. Contact: mszachniuk@cs.put.poznan.pl. Supplementary information: Supplementary data are available at Bioinformatics online.
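The idea of grouping base pairs into pseudoknot orders and writing each order with its own bracket type can be illustrated with a short Python sketch. It assumes base pairs are already parsed (for example, from BPSEQ) as 1-based (i, j) tuples with i < j, and it uses a simple greedy first-fit assignment, not the dynamic-programming or hybrid algorithms described in the abstract, so it need not reproduce their (scored) encodings. The function names and the exact bracket sequence are illustrative assumptions.

```python
from typing import List, Tuple

# Bracket symbols per order: order 0 uses "()", order 1 "[]", order 2 "{}",
# order 3 "<>", and higher orders use letter pairs "Aa", "Bb", ... (assumed ordering).
BRACKETS = ["()", "[]", "{}", "<>"] + [c + c.lower() for c in "ABCDEFGHIJKLMNOPQRSTUVWXYZ"]


def crosses(p: Tuple[int, int], q: Tuple[int, int]) -> bool:
    """True if base pairs p = (i, j) and q = (k, l) cross, i.e. form a pseudoknot."""
    (i, j), (k, l) = p, q
    return i < k < j < l or k < i < l < j


def assign_orders(pairs: List[Tuple[int, int]]) -> List[int]:
    """Greedy first-fit: give each pair the lowest order whose pairs it does not cross."""
    orders: List[int] = []
    for idx, p in enumerate(pairs):
        order = 0
        while any(orders[m] == order and crosses(p, pairs[m]) for m in range(idx)):
            order += 1
        orders.append(order)
    return orders


def to_dot_bracket(length: int, pairs: List[Tuple[int, int]]) -> str:
    """Encode 1-based base pairs in dot-bracket-letter notation."""
    ordered = sorted(pairs)
    chars = ["."] * length
    for (i, j), order in zip(ordered, assign_orders(ordered)):
        if order >= len(BRACKETS):
            raise ValueError("pseudoknot order exceeds available bracket symbols")
        opening, closing = BRACKETS[order]
        chars[i - 1] = opening
        chars[j - 1] = closing
    return "".join(chars)


print(to_dot_bracket(10, [(1, 7), (2, 6), (4, 10), (5, 9)]))  # ((.[[)).]]
```

For this H-type pseudoknot the first stem stays in round brackets and the crossing stem is pushed to the first-order square brackets. A greedy pass depends on the processing order, which is one reason the published algorithms rank alternative representations with scoring functions instead.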
Affiliation(s)
- Maciej Antczak
- Institute of Computing Science, Poznan University of Technology, Poznan, Poland
- Mariusz Popenda
- Institute of Bioorganic Chemistry, Polish Academy of Sciences, Poznan, Poland
- Tomasz Zok
- Institute of Computing Science, Poznan University of Technology, Poznan, Poland
- Poznan Supercomputing and Networking Center, Poznan, Poland
- Michal Zurkowski
- Institute of Computing Science, Poznan University of Technology, Poznan, Poland
- Ryszard W Adamiak
- Institute of Computing Science, Poznan University of Technology, Poznan, Poland
- Institute of Bioorganic Chemistry, Polish Academy of Sciences, Poznan, Poland
- Marta Szachniuk
- Institute of Computing Science, Poznan University of Technology, Poznan, Poland
- Institute of Bioorganic Chemistry, Polish Academy of Sciences, Poznan, Poland
29
Shabash B, Wiese KC. RNA Visualization: Relevance and the Current State-of-the-Art Focusing on Pseudoknots. IEEE/ACM Trans Comput Biol Bioinform 2017; 14:696-712. [PMID: 26915129 DOI: 10.1109/tcbb.2016.2522421] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.9] [Indexed: 06/05/2023]
Abstract
RNA visualization is crucial for understanding the relationship between RNA structure and function, as well as for developing better RNA structure prediction algorithms. However, one key structure remains difficult to visualize: pseudoknots. Pseudoknots occur in RNA folding when two secondary structural components form base pairs with each other. The three-dimensional nature of these components makes them challenging to visualize in two-dimensional media, such as print or screens. In this review, we focus on the advances made in RNA visualization in two-dimensional media over the past two decades. The review aims to present all relevant aspects of pseudoknot visualization. We start with an overview of several pseudoknotted structures and their relevance to RNA function. Next, we discuss the theoretical basis for RNA structural topology classification and present RNA classification systems for both pseudoknotted and non-pseudoknotted RNAs. Each description of an RNA classification system is followed by a discussion of the software tools and algorithms developed to date to visualize RNA, comparing the strengths and shortcomings of the different tools.
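The definition of a pseudoknot as base pairs from two structural components interleaving can be made concrete in a few lines of Python. This sketch assumes an extended dot-bracket string in which each bracket type, (), [], {} and <>, closes its own most recent opening bracket, and it lists every pair of base pairs (i, j), (k, l) with i < k < j < l. The helper names are hypothetical and not taken from any tool discussed in the review.

```python
from itertools import combinations
from typing import Dict, List, Tuple

Pair = Tuple[int, int]
OPEN_TO_CLOSE = {"(": ")", "[": "]", "{": "}", "<": ">"}
CLOSE_TO_OPEN = {close: open_ for open_, close in OPEN_TO_CLOSE.items()}


def parse_pairs(db: str) -> List[Pair]:
    """Parse extended dot-bracket into sorted 0-based (i, j) pairs, one stack per bracket type."""
    stacks: Dict[str, List[int]] = {o: [] for o in OPEN_TO_CLOSE}
    pairs: List[Pair] = []
    for pos, ch in enumerate(db):
        if ch in OPEN_TO_CLOSE:
            stacks[ch].append(pos)
        elif ch in CLOSE_TO_OPEN:
            stack = stacks[CLOSE_TO_OPEN[ch]]
            if not stack:
                raise ValueError(f"unmatched {ch!r} at position {pos}")
            pairs.append((stack.pop(), pos))
    if any(stacks.values()):
        raise ValueError("unmatched opening bracket")
    return sorted(pairs)


def crossing_pairs(pairs: List[Pair]) -> List[Tuple[Pair, Pair]]:
    """All pairs of base pairs (i, j), (k, l) with i < k < j < l, i.e. pseudoknotted."""
    return [(p, q) for p, q in combinations(sorted(pairs), 2) if p[0] < q[0] < p[1] < q[1]]


print(len(crossing_pairs(parse_pairs("((.[[)).]]"))))  # 4
```

A structure is pseudoknot-free exactly when `crossing_pairs` returns an empty list; in the example every base pair of the round-bracket stem crosses every pair of the square-bracket stem.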
30
Holzhauser E, Ge P, Zhang S. WebSTAR3D: a web server for RNA 3D structural alignment. Bioinformatics 2016; 32:3673-3675. [DOI: 10.1093/bioinformatics/btw502] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/25/2016] [Revised: 06/17/2016] [Accepted: 07/26/2016] [Indexed: 11/13/2022]
31
Hua L, Song Y, Kim N, Laing C, Wang JTL, Schlick T. CHSalign: A Web Server That Builds upon Junction-Explorer and RNAJAG for Pairwise Alignment of RNA Secondary Structures with Coaxial Helical Stacking. PLoS One 2016; 11:e0147097. [PMID: 26789998 PMCID: PMC4720362 DOI: 10.1371/journal.pone.0147097] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.9] [Received: 07/06/2015] [Accepted: 12/29/2015] [Indexed: 01/01/2023]
Abstract
RNA junctions are important structural elements of RNA molecules. They are formed when three or more helices come together in three-dimensional space. Recent studies have focused on the annotation and prediction of coaxial helical stacking (CHS) motifs within junctions. Here we exploit such predictions to develop an efficient alignment tool to handle RNA secondary structures with CHS motifs. Specifically, we build upon our Junction-Explorer software for predicting coaxial stacking and RNAJAG for modelling junction topologies as tree graphs to incorporate constrained tree matching and dynamic programming algorithms into a new method, called CHSalign, for aligning the secondary structures of RNA molecules containing CHS motifs. Thus, CHSalign is intended to be an efficient alignment tool for RNAs containing similar junctions. Experimental results based on thousands of alignments demonstrate that CHSalign can align two RNA secondary structures containing CHS motifs more accurately than other RNA secondary structure alignment tools. CHSalign yields a high score when aligning two RNA secondary structures with similar CHS motifs or helical arrangement patterns, and a low score otherwise. This new method has been implemented in a web server, and the program is also made freely available, at http://bioinformatics.njit.edu/CHSalign/.
Affiliation(s)
- Lei Hua
- Bioinformatics Laboratory, Department of Computer Science, New Jersey Institute of Technology, Newark, New Jersey, United States of America
- Yang Song
- Bioinformatics Laboratory, Department of Computer Science, New Jersey Institute of Technology, Newark, New Jersey, United States of America
- Namhee Kim
- Department of Chemistry, New York University, New York, New York, United States of America
- Christian Laing
- Bioinformatics Laboratory, Department of Computer Science, New Jersey Institute of Technology, Newark, New Jersey, United States of America
- Jason T. L. Wang
- Bioinformatics Laboratory, Department of Computer Science, New Jersey Institute of Technology, Newark, New Jersey, United States of America
- Tamar Schlick
- Department of Chemistry, New York University, New York, New York, United States of America
- Courant Institute of Mathematical Sciences, New York University, New York, New York, United States of America
32
Chen JL, Bellaousov S, Tubbs JD, Kennedy SD, Lopez MJ, Mathews DH, Turner DH. Nuclear Magnetic Resonance-Assisted Prediction of Secondary Structure for RNA: Incorporation of Direction-Dependent Chemical Shift Constraints. Biochemistry 2015; 54:6769-82. [PMID: 26451676 PMCID: PMC4666457 DOI: 10.1021/acs.biochem.5b00833] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.3] [Indexed: 11/30/2022]
Abstract
Knowledge of RNA structure is necessary to determine structure–function relationships and to facilitate the design of potential therapeutics. RNA secondary structure prediction can be improved by applying constraints from nuclear magnetic resonance (NMR) experiments to a dynamic programming algorithm. Imino proton walks from NOESY spectra reveal double-stranded regions. Chemical shifts of protons in GH1, UH3, and UH5 of GU pairs, UH3, UH5, and AH2 of AU pairs, and GH1 of GC pairs were analyzed to identify constraints for the 5′ to 3′ directionality of base pairs in helices. The 5′ to 3′ directionality constraints were incorporated into an NMR-assisted prediction of secondary structure (NAPSS-CS) program. When it was tested on 18 structures, including nine pseudoknots, the sensitivity and positive predictive value were improved relative to those of three unrestrained programs. The prediction accuracy for the pseudoknots improved the most. The program also facilitates assignment of chemical shifts to individual nucleotides, a necessary step for determining three-dimensional structure.
Affiliation(s)
- Jonathan L Chen
- Department of Chemistry, University of Rochester, Rochester, New York 14627, United States
- Stanislav Bellaousov
- Department of Biochemistry and Biophysics, University of Rochester School of Medicine and Dentistry, Rochester, New York 14642, United States
- Jason D Tubbs
- Department of Chemistry, University of Rochester, Rochester, New York 14627, United States
- Scott D Kennedy
- Department of Biochemistry and Biophysics, University of Rochester School of Medicine and Dentistry, Rochester, New York 14642, United States
- Michael J Lopez
- Department of Chemistry, University of Rochester, Rochester, New York 14627, United States
- David H Mathews
- Department of Biochemistry and Biophysics, University of Rochester School of Medicine and Dentistry, Rochester, New York 14642, United States
- Center for RNA Biology, University of Rochester, Rochester, New York 14642, United States
- Douglas H Turner
- Department of Chemistry, University of Rochester, Rochester, New York 14627, United States
- Center for RNA Biology, University of Rochester, Rochester, New York 14642, United States
33
Abstract
We describe the first dynamic programming algorithm that computes the expected degree for the network, or graph $G = (V, E)$, of all secondary structures of a given RNA sequence $a = a_1, \ldots, a_n$. Here, the nodes $V$ correspond to all secondary structures of $a$, while an edge exists between nodes $s$ and $t$ if the secondary structure $t$ can be obtained from $s$ by adding, removing or shifting a base pair. Since secondary structure kinetics programs implement the Gillespie algorithm, which simulates a random walk on the network of secondary structures, the expected network degree may provide a better understanding of the kinetics of RNA folding when allowing defect diffusion, helix zippering, and related conformational transformations. We determine the correlation between expected network degree, contact order, conformational entropy, and expected number of native contacts for a benchmarking dataset of RNAs. Source code is available at http://bioinformatics.bc.edu/clotelab/RNAexpNumNbors.
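To make the network definition concrete, the Python sketch below counts the neighbours of a single secondary structure under the add and remove moves only. Shift moves are omitted and no expectation over the ensemble of structures is taken, so this is a brute-force illustration of the graph, not the dynamic programming algorithm described above. It assumes Watson-Crick and GU pairs, a minimum hairpin loop of three unpaired bases and a pseudoknot-free dot-bracket input; the function names are hypothetical.

```python
from typing import List, Tuple

# Assumed pairing rules: Watson-Crick plus GU wobble pairs.
CANONICAL = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}
MIN_HAIRPIN = 3  # assumed minimum number of unpaired bases enclosed by a pair


def pairs_from_dot_bracket(db: str) -> List[Tuple[int, int]]:
    """0-based (i, j) pairs of a pseudoknot-free dot-bracket string."""
    stack: List[int] = []
    pairs: List[Tuple[int, int]] = []
    for pos, ch in enumerate(db):
        if ch == "(":
            stack.append(pos)
        elif ch == ")":
            pairs.append((stack.pop(), pos))
    return pairs


def degree_add_remove(seq: str, db: str) -> int:
    """Neighbours of one structure reachable by adding or removing a single base pair.

    Shift moves are deliberately not counted in this sketch.
    """
    pairs = pairs_from_dot_bracket(db)
    paired = {pos for pair in pairs for pos in pair}
    n = len(seq)
    additions = 0
    for i in range(n):
        for j in range(i + MIN_HAIRPIN + 1, n):
            if i in paired or j in paired or (seq[i], seq[j]) not in CANONICAL:
                continue
            # The new pair must not cross any existing pair.
            if all(not (k < i < l < j or i < k < j < l) for k, l in pairs):
                additions += 1
    return len(pairs) + additions  # each existing pair gives one removal move


print(degree_add_remove("GGGAAACCC", "(.......)"))  # 5
```

Here the single existing pair contributes one removal move and the four G-C pairs that fit inside it contribute four addition moves. Enumerating every structure this way grows exponentially with sequence length, which is why an ensemble-level quantity such as the expected degree calls for a dynamic programming formulation.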
34
FASTR: A novel data format for concomitant representation of RNA sequence and secondary structure information. J Biosci 2015; 40:571-7. [PMID: 26333403 DOI: 10.1007/s12038-015-9546-0] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Indexed: 10/23/2022]
Abstract
Given the importance of RNA secondary structures in defining the biological role of RNAs, it would be convenient for researchers seeking RNA data if both sequence and structural information for RNA molecules were made available together. Current nucleotide data repositories archive only RNA sequence data. Furthermore, storage formats that can frugally represent both RNA sequence and structure data in a single file are currently unavailable. This article proposes a novel storage format, 'FASTR', for the concomitant representation of RNA sequence and structure. The storage efficiency of the proposed FASTR format has been evaluated using RNA data from various microorganisms. Results indicate that the size of FASTR-formatted files (containing both RNA sequence and structure information) is equivalent to that of FASTA-format files, which contain only RNA sequence information. RNA secondary structure is typically represented by a string of nucleotide characters together with a corresponding dot-bracket string indicating structural attributes. 'FASTR', the novel storage format proposed in the present study, enables a frugal representation of both RNA sequence and structural information in the form of a single string. Despite their relatively small storage footprint, the resulting 'fastr' strings retain all the sequence and secondary structure information that could be stored using dot-bracket notation. An implementation of the 'FASTR' methodology is available for download at http://metagenomics.atc.tcs.com/compression/fastr.
35
Zahran M, Sevim Bayrak C, Elmetwaly S, Schlick T. RAG-3D: a search tool for RNA 3D substructures. Nucleic Acids Res 2015; 43:9474-88. [PMID: 26304547 PMCID: PMC4627073 DOI: 10.1093/nar/gkv823] [Citation(s) in RCA: 26] [Impact Index Per Article: 2.6] [Received: 05/30/2014] [Accepted: 08/03/2015] [Indexed: 01/23/2023]
Abstract
To address many challenges in RNA structure/function prediction, the characterization of RNA's modular architectural units is required. Using the RNA-As-Graphs (RAG) database, we have previously explored the existence of secondary structure (2D) submotifs within larger RNA structures. Here we present RAG-3D—a dataset of RNA tertiary (3D) structures and substructures plus a web-based search tool—designed to exploit graph representations of RNAs for the goal of searching for similar 3D structural fragments. The objects in RAG-3D consist of 3D structures translated into 3D graphs, cataloged based on the connectivity between their secondary structure elements. Each graph is additionally described in terms of its subgraph building blocks. The RAG-3D search tool then compares a query RNA 3D structure to those in the database to obtain structurally similar structures and substructures. This comparison reveals conserved 3D RNA features and thus may suggest functional connections. Though RNA search programs based on similarity in sequence, 2D, and/or 3D structural elements are available, our graph-based search tool may be advantageous for illuminating similarities that are not obvious; using motifs rather than sequence space also reduces search times considerably. Ultimately, such substructuring could be useful for RNA 3D structure prediction, structure/function inference and inverse folding.
Affiliation(s)
- Mai Zahran
- Biological Sciences Department, New York City College of Technology, City University of New York, Brooklyn, NY 11201, USA
- Shereef Elmetwaly
- Department of Chemistry, New York University, New York, NY 10003, USA
- Tamar Schlick
- Department of Chemistry, New York University, New York, NY 10003, USA
- Courant Institute of Mathematical Sciences, New York University, New York, NY 10012, USA
36
Lu XJ, Bussemaker HJ, Olson WK. DSSR: an integrated software tool for dissecting the spatial structure of RNA. Nucleic Acids Res 2015; 43:e142. [PMID: 26184874 PMCID: PMC4666379 DOI: 10.1093/nar/gkv716] [Citation(s) in RCA: 151] [Impact Index Per Article: 15.1] [Received: 04/27/2015] [Accepted: 07/02/2015] [Indexed: 12/16/2022]
Abstract
Insight into the three-dimensional architecture of RNA is essential for understanding its cellular functions. However, even the classic transfer RNA structure contains features that are overlooked by existing bioinformatics tools. Here we present DSSR (Dissecting the Spatial Structure of RNA), an integrated and automated tool for analyzing and annotating RNA tertiary structures. The software identifies canonical and noncanonical base pairs, including those with modified nucleotides, in any tautomeric or protonation state. DSSR detects higher-order coplanar base associations, termed multiplets. It finds arrays of stacked pairs, classifies them by base-pair identity and backbone connectivity, and distinguishes a stem of covalently connected canonical pairs from a helix of stacked pairs of arbitrary type/linkage. DSSR identifies coaxial stacking of multiple stems within a single helix and lists isolated canonical pairs that lie outside of a stem. The program characterizes 'closed' loops of various types (hairpin, bulge, internal, and junction loops) and pseudoknots of arbitrary complexity. Notably, DSSR employs isolated pairs and the ends of stems, whether pseudoknotted or not, to define junction loops. This new, inclusive definition provides a novel perspective on the spatial organization of RNA. Tests on all nucleic acid structures in the Protein Data Bank confirm the efficiency and robustness of the software, and applications to representative RNA molecules illustrate its unique features. DSSR and related materials are freely available at http://x3dna.org/.
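DSSR's distinction between a stem of covalently connected pairs and isolated pairs can be approximated, for a list of already-identified canonical pairs, by grouping pairs (i, j) and (i + 1, j - 1) into runs. The Python sketch below assumes 0-based (i, j) tuples with i < j, that canonicity has already been checked, and that consecutive indices are covalently connected (no chain breaks). It is not DSSR's implementation, and the function name is hypothetical.

```python
from typing import Dict, List, Tuple

Pair = Tuple[int, int]


def group_stems(pairs: List[Pair]) -> List[List[Pair]]:
    """Group pairs into runs where (i, j) is directly followed by (i + 1, j - 1)."""
    partner: Dict[int, int] = {i: j for i, j in pairs}
    stems: List[List[Pair]] = []
    for i, j in sorted(pairs):
        # Skip pairs that extend a run already started by (i - 1, j + 1).
        if partner.get(i - 1) == j + 1:
            continue
        run = [(i, j)]
        while partner.get(run[-1][0] + 1) == run[-1][1] - 1:
            run.append((run[-1][0] + 1, run[-1][1] - 1))
        stems.append(run)
    return stems


stems = group_stems([(0, 9), (1, 8), (2, 7), (12, 20), (13, 19), (23, 30)])
print([len(s) for s in stems])  # [3, 2, 1]
```

Runs of length one roughly correspond to what DSSR reports as isolated canonical pairs; the abstract notes that DSSR also uses such isolated pairs, together with stem ends, to delimit junction loops.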
Affiliation(s)
- Xiang-Jun Lu
- Department of Biological Sciences, Columbia University, New York, NY 10027, USA
- Harmen J Bussemaker
- Department of Biological Sciences, Columbia University, New York, NY 10027, USA
- Department of Systems Biology, Columbia University, New York, NY 10032, USA
- Wilma K Olson
- Department of Chemistry and Chemical Biology, Rutgers - The State University of New Jersey, Piscataway, NJ 08854, USA
37
Ge P, Zhang S. STAR3D: a stack-based RNA 3D structural alignment tool. Nucleic Acids Res 2015; 43:e137. [PMID: 26184875 PMCID: PMC4787758 DOI: 10.1093/nar/gkv697] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.7] [Received: 02/27/2015] [Accepted: 06/26/2015] [Indexed: 01/08/2023]
Abstract
The various roles of versatile non-coding RNAs typically require the attainment of complex high-order structures. Therefore, comparing the 3D structures of RNA molecules can yield an in-depth understanding of their functional conservation and evolutionary history. Recently, many powerful tools have been developed to align RNA 3D structures. Although some methods rely on both backbone conformations and base pairing interactions, none of them considers the entire hierarchical formation of the RNA secondary structure. One of the major issues is that directly applying algorithms for matching 2D structures to 3D coordinates is particularly time-consuming. In this article, we propose a novel RNA 3D structural alignment tool, STAR3D, which takes full account of the 2D relations between stacks without the complicated comparison of secondary structures. First, the 3D conserved stacks in the inputs are identified and then combined into a tree-like consensus. Afterward, the loop regions are compared one-to-one according to their relative positions in the consensus tree. The experimental results show that STAR3D's predictions are more accurate than those of other state-of-the-art tools for both non-homologous and homologous RNAs, with a shorter running time.
Affiliation(s)
- Ping Ge
- Department of Electrical Engineering and Computer Science, University of Central Florida, Orlando, FL 32816, USA
- Shaojie Zhang
- Department of Electrical Engineering and Computer Science, University of Central Florida, Orlando, FL 32816, USA
38
Kerpedjiev P, Höner Zu Siederdissen C, Hofacker IL. Predicting RNA 3D structure using a coarse-grain helix-centered model. RNA 2015; 21:1110-1121. [PMID: 25904133 PMCID: PMC4436664 DOI: 10.1261/rna.047522.114] [Citation(s) in RCA: 54] [Impact Index Per Article: 5.4] [Received: 08/20/2014] [Accepted: 02/13/2015] [Indexed: 06/04/2023]
Abstract
A 3D model of RNA structure can provide information about its function and regulation that cannot be obtained from the sequence or secondary structure alone. Current models suffer from low accuracy and long running times, and either neglect or presume knowledge of the long-range interactions that stabilize the tertiary structure. Our coarse-grained, helix-based tertiary structure model operates with only a few degrees of freedom compared with all-atom models, while preserving the ability to sample tertiary structures given a secondary structure. It strikes a balance between the precision of an all-atom tertiary structure model and the simplicity and effectiveness of a secondary structure representation. It provides a simplified tool for exploring global arrangements of helices and loops within RNA structures. We provide an example of a novel energy function relying only on the positions of stems and loops. We show that coupling our model to this energy function produces predictions as good as or better than those of the current state-of-the-art tools. We propose that, given the wide range of conformational space that needs to be explored, a coarse-grain approach can explore more conformations in fewer iterations than an all-atom model coupled to a fine-grain energy function. Finally, we emphasize the overarching theme of providing an ensemble of predicted structures, at which our tool excels, rather than a handful of the lowest-energy structures.
Affiliation(s)
- Christian Höner Zu Siederdissen
- Institute for Theoretical Chemistry, A-1090 Vienna, Austria
- Bioinformatics Group, Department of Computer Science, Universität Leipzig, D-04107 Leipzig, Germany
- Interdisciplinary Center for Bioinformatics, Universität Leipzig, D-04107 Leipzig, Germany
- Ivo L Hofacker
- Institute for Theoretical Chemistry, A-1090 Vienna, Austria
- Research Group Bioinformatics and Computational Biology, University of Vienna, A-1090 Vienna, Austria
- Center for non-coding RNA in Technology and Health, Department of Veterinary Clinical and Animal Science, University of Copenhagen, DK-1870 Frederiksberg, Denmark
39
Górska A, Jasiński M, Trylska J. MINT: software to identify motifs and short-range interactions in trajectories of nucleic acids. Nucleic Acids Res 2015; 43:e114. [PMID: 26024667 PMCID: PMC4787793 DOI: 10.1093/nar/gkv559] [Citation(s) in RCA: 26] [Impact Index Per Article: 2.6] [Received: 01/15/2015] [Accepted: 05/15/2015] [Indexed: 12/18/2022]
Abstract
Structural biology experiments and structure prediction tools have provided many high-resolution three-dimensional structures of nucleic acids. Also, molecular dynamics force field parameters have been adapted to simulating charged and flexible nucleic acid structures on microsecond time scales. Therefore, we can generate the dynamics of DNA or RNA molecules, but we still lack adequate tools for analyzing the resulting huge amounts of data. We present MINT (Motif Identifier for Nucleic acids Trajectory), an automatic tool for analyzing three-dimensional structures of RNA and DNA and their full-atom molecular dynamics trajectories or other conformation sets (e.g. X-ray or nuclear magnetic resonance-derived structures). For each RNA or DNA conformation, MINT determines the hydrogen bonding network resolving the base pairing patterns, and identifies secondary structure motifs (helices, junctions, loops, etc.) and pseudoknots. MINT also estimates the energy of stacking and phosphate anion-base interactions. For many conformations, as in a molecular dynamics trajectory, MINT provides averages of the above structural and energetic features and their evolution. We demonstrate MINT's functionality on an all-atom explicit-solvent molecular dynamics trajectory of the 30S ribosomal subunit.
Affiliation(s)
- Anna Górska
- Centre of New Technologies, University of Warsaw, Banacha 2c, 02-097 Warsaw, Poland
- Master studies at the Faculty of Mathematics, Informatics, and Mechanics, University of Warsaw, Banacha 2, Warsaw, Poland
- Maciej Jasiński
- Centre of New Technologies, University of Warsaw, Banacha 2c, 02-097 Warsaw, Poland
- College of Inter-Faculty Individual Studies in Mathematics and Natural Sciences, University of Warsaw, Al. Żwirki i Wigury 93, 02-089 Warsaw, Poland
- Joanna Trylska
- Centre of New Technologies, University of Warsaw, Banacha 2c, 02-097 Warsaw, Poland
40
Chiu JKH, Chen YPP. Efficient conversion of RNA pseudoknots to knot-free structures using a graphical model. IEEE Trans Biomed Eng 2014; 62:1265-71. [PMID: 25474805 DOI: 10.1109/tbme.2014.2375360] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.5] [Indexed: 11/09/2022]
Abstract
RNA secondary structures are vital in determining the 3-D structures of noncoding RNA molecules, which in turn affect their functions. Computational RNA secondary structure alignment and analysis are biologically significant because they help identify numerous functionally important motifs. Unfortunately, many analysis methods become computationally intractable in the presence of pseudoknots. The conversion of knotted to knot-free secondary structures, known as pseudoknot removal, is an essential preprocessing step. Although exact methods have been proposed for this task, their computational complexities are undetermined, so their efficiency in processing complex pseudoknots is currently unknown. We transformed the pseudoknot removal problem into a circle graph maximum weight independent set (MWIS) problem, in which each MWIS represents a unique optimal deknotted structure. An existing circle graph MWIS algorithm was extended to report either a single solution or all solutions. Its time complexity depends on the number of MWISs, and it is guaranteed to report one solution in polynomial time. Experimental results suggest that our extended algorithm is much more efficient than the state-of-the-art tool. We also devised a novel concept called the structural scoring function and investigated its effectiveness for more accurate selection of solution candidates under certain criteria.
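Because crossing base pairs correspond to adjacent vertices of the circle graph, a maximum weight independent set is a heaviest subset of mutually non-crossing pairs. When each position takes part in at most one pair, one such optimal subset can be found with a simple interval dynamic program, sketched below in Python. This is a generic O(n^2) DP that returns a single solution, not the authors' extended circle graph algorithm that can enumerate all MWISs; the function name and the default unit weights are assumptions.

```python
from typing import Dict, List, Optional, Tuple

Pair = Tuple[int, int]


def max_weight_knot_free(n: int, pairs: List[Pair],
                         weights: Optional[Dict[Pair, float]] = None) -> Tuple[float, List[Pair]]:
    """Heaviest set of mutually non-crossing pairs (0-based, i < j, each base in at most one pair)."""
    w = weights if weights is not None else {p: 1.0 for p in pairs}
    partner = {i: j for i, j in pairs}
    # best[i][j] = optimal weight using only pairs lying entirely inside [i, j].
    best = [[0.0] * n for _ in range(n)]

    def get(a: int, b: int) -> float:
        return best[a][b] if a <= b else 0.0

    def take(i: int, p: int, j: int) -> float:
        # Keeping pair (i, p) splits [i, j] into an inside part and a right part.
        return w[(i, p)] + get(i + 1, p - 1) + get(p + 1, j)

    for i in range(n - 1, -1, -1):
        for j in range(i, n):
            value = get(i + 1, j)  # option 1: position i left unpaired
            p = partner.get(i)
            if p is not None and p <= j:
                value = max(value, take(i, p, j))  # option 2: keep pair (i, p)
            best[i][j] = value

    # Trace back one optimal solution.
    chosen: List[Pair] = []
    stack = [(0, n - 1)]
    while stack:
        i, j = stack.pop()
        if i > j:
            continue
        p = partner.get(i)
        if p is not None and p <= j and best[i][j] == take(i, p, j):
            chosen.append((i, p))
            stack.extend([(i + 1, p - 1), (p + 1, j)])
        else:
            stack.append((i + 1, j))
    return get(0, n - 1), sorted(chosen)


print(max_weight_knot_free(10, [(0, 6), (1, 5), (3, 9), (4, 8)]))  # (2.0, [(0, 6), (1, 5)])
```

With unit weights the two stems of this H-type pseudoknot tie, and the traceback keeps the 5′ stem; giving the other stem higher weights, e.g. `{(0, 6): 1, (1, 5): 1, (3, 9): 2, (4, 8): 2}`, makes it the unique optimum. Enumerating all optimal solutions, as the paper's extended algorithm does, would require following every tied branch in the traceback.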
41
Antczak M, Zok T, Popenda M, Lukasiak P, Adamiak RW, Blazewicz J, Szachniuk M. RNApdbee--a webserver to derive secondary structures from pdb files of knotted and unknotted RNAs. Nucleic Acids Res 2014; 42:W368-72. [PMID: 24771339 PMCID: PMC4086112 DOI: 10.1093/nar/gku330] [Citation(s) in RCA: 46] [Impact Index Per Article: 4.2] [Indexed: 01/21/2023]
Abstract
In RNA structural biology and bioinformatics, access to the correct RNA secondary structure and its proper representation is of crucial importance. This is especially true in the field of secondary and 3D RNA structure prediction. Here, we introduce RNApdbee, a new tool that extracts the RNA secondary structure from a PDB file and presents it in both textual and graphical form. RNApdbee supports processing of knotted and unknotted structures of large RNAs, including those within protein complexes. The method works not only for first-order but also for higher-order pseudoknots, and provides information about canonical and non-canonical base pairs. This combination of features is unique among existing applications for RNA structure analysis. Additionally, it provides conversion between the text notations BPSEQ, CT and extended dot-bracket. To facilitate a more comprehensive study, the webserver integrates the functionality of RNAView, MC-Annotate and 3DNA/DSSR, the most commonly used tools for automated identification and classification of RNA base pairs. RNApdbee is implemented as a publicly available webserver with an intuitive interface and can be freely accessed at http://rnapdbee.cs.put.poznan.pl/.
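The BPSEQ-to-CT direction of such a text-notation conversion is mostly bookkeeping. The Python sketch below assumes the common layouts: BPSEQ lines of the form "index base partner" (partner 0 for unpaired), and CT output with a "length title" header followed by six columns per nucleotide (index, base, previous index, next index, partner, natural numbering). It assumes contiguous 1-based indices and simply skips header or comment lines; it is not RNApdbee's converter, and the function name is hypothetical.

```python
def bpseq_to_ct(bpseq_text: str, title: str = "converted") -> str:
    """Convert BPSEQ text to CT text (assumes contiguous 1-based indices)."""
    rows = []
    for line in bpseq_text.splitlines():
        fields = line.split()
        # Skip blank lines and header/comment lines that are not "index base partner".
        if len(fields) != 3 or not fields[0].isdigit():
            continue
        rows.append((int(fields[0]), fields[1], int(fields[2])))

    n = len(rows)
    lines = [f"{n} {title}"]
    for idx, base, partner in rows:
        next_idx = idx + 1 if idx < n else 0  # 0 marks the 3' end
        # Columns: index, base, previous index (0 at the 5' end), next index, partner, numbering.
        lines.append(f"{idx} {base} {idx - 1} {next_idx} {partner} {idx}")
    return "\n".join(lines)


print(bpseq_to_ct("# example\n1 G 4\n2 A 0\n3 A 0\n4 C 1"))
```

Real CT files usually align columns with variable whitespace, so a reader for them should split on runs of whitespace rather than on single spaces.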
Collapse
Affiliation(s)
- Maciej Antczak
- Institute of Computing Science, Poznan University of Technology, Piotrowo 2, 60-965 Poznan, Poland
- Tomasz Zok
- Institute of Computing Science, Poznan University of Technology, Piotrowo 2, 60-965 Poznan, Poland
- Mariusz Popenda
- Institute of Bioorganic Chemistry, Polish Academy of Sciences, Noskowskiego 12/14, 61-704 Poznan, Poland
- Piotr Lukasiak
- Institute of Computing Science, Poznan University of Technology, Piotrowo 2, 60-965 Poznan, Poland; Institute of Bioorganic Chemistry, Polish Academy of Sciences, Noskowskiego 12/14, 61-704 Poznan, Poland
- Ryszard W Adamiak
- Institute of Computing Science, Poznan University of Technology, Piotrowo 2, 60-965 Poznan, Poland; Institute of Bioorganic Chemistry, Polish Academy of Sciences, Noskowskiego 12/14, 61-704 Poznan, Poland
- Jacek Blazewicz
- Institute of Computing Science, Poznan University of Technology, Piotrowo 2, 60-965 Poznan, Poland; Institute of Bioorganic Chemistry, Polish Academy of Sciences, Noskowskiego 12/14, 61-704 Poznan, Poland
- Marta Szachniuk
- Institute of Computing Science, Poznan University of Technology, Piotrowo 2, 60-965 Poznan, Poland; Institute of Bioorganic Chemistry, Polish Academy of Sciences, Noskowskiego 12/14, 61-704 Poznan, Poland
42
Petrov AS, Bernier CR, Gulen B, Waterbury CC, Hershkovits E, Hsiao C, Harvey SC, Hud NV, Fox GE, Wartell RM, Williams LD. Secondary structures of rRNAs from all three domains of life. PLoS One 2014; 9:e88222. [PMID: 24505437 PMCID: PMC3914948 DOI: 10.1371/journal.pone.0088222] [Citation(s) in RCA: 98] [Impact Index Per Article: 8.9] [Received: 10/22/2013] [Accepted: 01/03/2014] [Indexed: 12/19/2022]
Abstract
Accurate secondary structures are important for understanding ribosomes, which are extremely large and highly complex. Using 3D structures of ribosomes as input, we have revised and corrected traditional secondary (2°) structures of rRNAs. We identify helices by specific geometric and molecular interaction criteria, not by co-variation. The structural approach allows us to incorporate non-canonical base pairs on par with Watson-Crick base pairs. The resulting rRNA 2° structures are up-to-date, consistent with three-dimensional structures, and information-rich. These 2° structures are relatively simple to understand and are amenable to reproduction and modification by end-users. The 2° structures made available here broadly sample the phylogenetic tree and are mapped with a variety of data related to molecular interactions and geometry, phylogeny and evolution. We have generated 2° structures for both large subunit (LSU) 23S/28S and small subunit (SSU) 16S/18S rRNAs of Escherichia coli, Thermus thermophilus, Haloarcula marismortui (LSU rRNA only), Saccharomyces cerevisiae, Drosophila melanogaster, and Homo sapiens. We provide high-resolution editable versions of the 2° structures in several file formats. For the SSU rRNA, the 2° structures use an intuitive representation of the central pseudoknot in which base triples are presented as pairs of base pairs. Both LSU and SSU secondary maps are available (http://apollo.chemistry.gatech.edu/RibosomeGallery). Mapping of data onto 2° structures was performed on the RiboVision server (http://apollo.chemistry.gatech.edu/RiboVision).
Affiliation(s)
- Anton S Petrov
- Center for Ribosomal Origins and Evolution, Georgia Institute of Technology, Atlanta, Georgia, United States of America; School of Chemistry and Biochemistry, Georgia Institute of Technology, Atlanta, Georgia, United States of America
- Chad R Bernier
- Center for Ribosomal Origins and Evolution, Georgia Institute of Technology, Atlanta, Georgia, United States of America; School of Chemistry and Biochemistry, Georgia Institute of Technology, Atlanta, Georgia, United States of America
- Burak Gulen
- Center for Ribosomal Origins and Evolution, Georgia Institute of Technology, Atlanta, Georgia, United States of America; School of Chemistry and Biochemistry, Georgia Institute of Technology, Atlanta, Georgia, United States of America
- Chris C Waterbury
- Center for Ribosomal Origins and Evolution, Georgia Institute of Technology, Atlanta, Georgia, United States of America; School of Chemistry and Biochemistry, Georgia Institute of Technology, Atlanta, Georgia, United States of America
- Eli Hershkovits
- Center for Ribosomal Origins and Evolution, Georgia Institute of Technology, Atlanta, Georgia, United States of America; School of Chemistry and Biochemistry, Georgia Institute of Technology, Atlanta, Georgia, United States of America
- Chiaolong Hsiao
- Center for Ribosomal Origins and Evolution, Georgia Institute of Technology, Atlanta, Georgia, United States of America; School of Chemistry and Biochemistry, Georgia Institute of Technology, Atlanta, Georgia, United States of America
- Stephen C Harvey
- Center for Ribosomal Origins and Evolution, Georgia Institute of Technology, Atlanta, Georgia, United States of America; School of Chemistry and Biochemistry, Georgia Institute of Technology, Atlanta, Georgia, United States of America
- Nicholas V Hud
- Center for Ribosomal Origins and Evolution, Georgia Institute of Technology, Atlanta, Georgia, United States of America; School of Chemistry and Biochemistry, Georgia Institute of Technology, Atlanta, Georgia, United States of America
- George E Fox
- Center for Ribosomal Origins and Evolution, Georgia Institute of Technology, Atlanta, Georgia, United States of America; Department of Biology and Biochemistry, University of Houston, Houston, Texas, United States of America
- Roger M Wartell
- Center for Ribosomal Origins and Evolution, Georgia Institute of Technology, Atlanta, Georgia, United States of America; School of Chemistry and Biochemistry, Georgia Institute of Technology, Atlanta, Georgia, United States of America
- Loren Dean Williams
- Center for Ribosomal Origins and Evolution, Georgia Institute of Technology, Atlanta, Georgia, United States of America; School of Chemistry and Biochemistry, Georgia Institute of Technology, Atlanta, Georgia, United States of America
43
Abstract
Abstract shape analysis is a method to learn more about the complete Boltzmann ensemble of the secondary structures of a single RNA molecule. Abstract shapes classify competing secondary structures into classes that are defined by their arrangement of helices. This allows us to compute, in addition to the structure of minimal free energy, a set of structures that represents relevant and interesting structural alternatives. Furthermore, it allows us to compute the probabilities of all structures within a shape class. This makes it possible to ensure that our representative subset covers the complete Boltzmann ensemble, except for a portion of negligible probability. This chapter explains the main functions of abstract shape analysis, as implemented in the tool RNAshapes. It also reports on some other types of analysis that are based on the abstract shapes idea and shows how you can solve novel problems by creating your own shape abstractions.
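The shape idea can be illustrated with a small Python sketch of the most abstract level, in which only the arrangement of helices is kept: stacked pairs, bulges and interior loops collapse into a single `[]`, while hairpins and multiloops determine the bracket pattern. This is our reading of that abstraction, written for plain pseudoknot-free dot-bracket strings; the function names are hypothetical, and RNAshapes itself defines several shape levels whose exact conventions may differ (for example, in how a completely unpaired chain is written).

```python
def pair_table(db):
    """Map each opening position to its closing partner (0-based)."""
    stack, table = [], {}
    for pos, ch in enumerate(db):
        if ch == "(":
            stack.append(pos)
        elif ch == ")":
            if not stack:
                raise ValueError(f"unmatched ')' at position {pos}")
            table[stack.pop()] = pos
        elif ch != ".":
            raise ValueError(f"unsupported character {ch!r}")
    if stack:
        raise ValueError("unmatched '('")
    return table


def top_level_pairs(table, start, end):
    """Pairs (i, j) inside start..end that are not enclosed by another pair in that range."""
    found, pos = [], start
    while pos <= end:
        if pos in table:
            found.append((pos, table[pos]))
            pos = table[pos] + 1  # skip everything enclosed by this pair
        else:
            pos += 1
    return found


def helix_shape(table, i, j):
    """Shape of the helix that starts with pair (i, j)."""
    inner = top_level_pairs(table, i + 1, j - 1)
    if not inner:
        return "[]"  # helix closed by a hairpin loop
    if len(inner) == 1:
        # Stacked pair, bulge or interior loop: still the same helix at this level.
        return helix_shape(table, *inner[0])
    # Multiloop: keep the branching pattern.
    return "[" + "".join(helix_shape(table, *p) for p in inner) + "]"


def abstract_shape(db):
    """Most abstract shape of a dot-bracket string ('' if nothing is paired)."""
    table = pair_table(db)
    return "".join(helix_shape(table, *p) for p in top_level_pairs(table, 0, len(db) - 1))


for structure in ["((((...))))", "((..((...))..))", "((..((...))..((...))))"]:
    print(structure, "->", abstract_shape(structure))
```

The first two structures differ in loop sizes and stem composition yet share the shape `[]`, while the third contains a multiloop and maps to `[[][]]`; this many-to-one mapping is what lets competing structures be grouped into shape classes.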
44
Chojnowski G, Walen T, Bujnicki JM. RNA Bricks--a database of RNA 3D motifs and their interactions. Nucleic Acids Res 2013; 42:D123-31. [PMID: 24220091 PMCID: PMC3965019 DOI: 10.1093/nar/gkt1084] [Citation(s) in RCA: 56] [Impact Index Per Article: 4.7] [Indexed: 01/21/2023]
Abstract
The RNA Bricks database (http://iimcb.genesilico.pl/rnabricks) stores information about recurrent RNA 3D motifs and their interactions, found in experimentally determined RNA structures and in RNA–protein complexes. In contrast to other similar tools (RNA 3D Motif Atlas, RNA Frabase, Rloom), RNA motifs, i.e. ‘RNA bricks’, are presented in the molecular environment in which they were determined, including RNA, protein, metal ions, water molecules and ligands. All nucleotide residues in RNA bricks are annotated with structural quality scores that describe real-space correlation coefficients with the electron density data (if available), backbone geometry and possible steric conflicts, which can be used to identify poorly modeled residues. The database is also equipped with an algorithm for 3D motif search and comparison. The algorithm compares spatial positions of backbone atoms of the user-provided query structure and of stored RNA motifs, without relying on sequence or secondary structure information. This enables the identification of local structural similarities among evolutionarily related and unrelated RNA molecules. In addition, the search utility enables searching ‘RNA bricks’ according to sequence similarity, and makes it possible to identify motifs with modified ribonucleotide residues at specific positions.
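RNA Bricks compares the spatial positions of backbone atoms without using sequence or secondary structure. A common way to quantify this kind of geometric similarity is the root-mean-square deviation (RMSD) after optimal rigid superposition, computed with the Kabsch algorithm. The NumPy sketch below assumes the two motifs have already been matched atom-for-atom; we do not know the exact score RNA Bricks uses, so treat it as an illustration of the general idea rather than the database's method. The function name and coordinates are hypothetical.

```python
import numpy as np


def kabsch_rmsd(P, Q):
    """RMSD between matched N x 3 coordinate sets after optimal rotation and translation."""
    P = np.asarray(P, dtype=float)
    Q = np.asarray(Q, dtype=float)
    if P.shape != Q.shape or P.ndim != 2 or P.shape[1] != 3:
        raise ValueError("expected two N x 3 arrays of matched atoms")
    # Remove translation by centring both sets on their centroids.
    Pc = P - P.mean(axis=0)
    Qc = Q - Q.mean(axis=0)
    # Covariance matrix and its SVD give the optimal rotation.
    H = Pc.T @ Qc
    U, _, Vt = np.linalg.svd(H)
    # Flip the last axis if needed so R is a proper rotation, not a reflection.
    d = 1.0 if np.linalg.det(Vt.T @ U.T) >= 0 else -1.0
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    diff = Pc @ R.T - Qc
    return float(np.sqrt((diff ** 2).sum() / len(P)))


# Hypothetical backbone coordinates: Q is P rotated 90 degrees about z and shifted.
P = [[0, 0, 0], [1, 0, 0], [0, 2, 0], [0, 0, 3]]
Q = [[5, 5, 5], [5, 6, 5], [3, 5, 5], [5, 5, 8]]
print(round(kabsch_rmsd(P, Q), 6))
```

Because reflections are excluded by the determinant check, a motif and its mirror image do not superpose perfectly, which matches the handedness of real nucleic-acid backbones.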
Affiliation(s)
- Grzegorz Chojnowski
- International Institute of Molecular and Cell Biology, Trojdena 4, 02-109 Warsaw, Poland; Faculty of Mathematics, Informatics, and Mechanics, University of Warsaw, Banacha 2, 02-097 Warsaw, Poland; Institute of Molecular Biology and Biotechnology, Faculty of Biology, Adam Mickiewicz University, Umultowska 89, 61-614 Poznan, Poland
45
Bellaousov S, Reuter JS, Seetin MG, Mathews DH. RNAstructure: Web servers for RNA secondary structure prediction and analysis. Nucleic Acids Res 2013; 41:W471-4. [PMID: 23620284 PMCID: PMC3692136 DOI: 10.1093/nar/gkt290] [Citation(s) in RCA: 294] [Impact Index Per Article: 24.5] [Indexed: 01/03/2023]
Abstract
RNAstructure is a software package for RNA secondary structure prediction and analysis. This contribution describes a new set of web servers that provide its functionality. The web server offers RNA secondary structure prediction, including free energy minimization, maximum expected accuracy structure prediction and pseudoknot prediction. Bimolecular secondary structure prediction is also provided. Additionally, the server can predict secondary structures conserved in either two homologs or more than two homologs. Folding free energy changes can be predicted for a given RNA structure using nearest neighbor rules. Secondary structures can be compared using circular plots or scored by sensitivity and positive predictive value. Structure drawings can be rendered as SVG, PostScript, JPEG or PDF. The web server is freely available for public use at: http://rna.urmc.rochester.edu/RNAstructureWeb.
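Sensitivity and positive predictive value have simple definitions when base pairs are compared exactly: sensitivity is the fraction of known pairs that were predicted, and PPV is the fraction of predicted pairs that occur in the known structure. The Python sketch below implements exact matching only; the server's scoring may differ (for instance, by tolerating small positional shifts), and the function names and example pairs are hypothetical.

```python
def normalize(pairs):
    """Store each base pair as (smaller index, larger index) so order does not matter."""
    return {(min(i, j), max(i, j)) for i, j in pairs}


def sensitivity_ppv(predicted, known):
    """Exact-match sensitivity (TP / known pairs) and PPV (TP / predicted pairs)."""
    pred, ref = normalize(predicted), normalize(known)
    true_positives = len(pred & ref)
    sensitivity = true_positives / len(ref) if ref else 0.0
    ppv = true_positives / len(pred) if pred else 0.0
    return sensitivity, ppv


known = [(1, 20), (2, 19), (3, 18), (5, 15)]
predicted = [(1, 20), (2, 19), (4, 16)]
sens, ppv = sensitivity_ppv(predicted, known)
print(f"sensitivity={sens:.2f} PPV={ppv:.2f}")
```

Here two of the four known pairs are recovered (sensitivity 0.50) and two of the three predicted pairs are correct (PPV 0.67). Reporting both numbers matters because a predictor can raise one at the expense of the other by predicting more or fewer pairs.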
Affiliation(s)
- Stanislav Bellaousov
- Department of Biochemistry and Biophysics, University of Rochester Medical Center, 601 Elmwood Avenue, Box 712, Rochester, NY 14642, USA
46
Lamiable A, Barth D, Denise A, Quessette F, Vial S, Westhof É. Automated prediction of three-way junction topological families in RNA secondary structures. Comput Biol Chem 2012; 37:1-5. [DOI: 10.1016/j.compbiolchem.2011.11.001] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.2] [Received: 08/12/2011] [Revised: 11/14/2011] [Accepted: 11/16/2011] [Indexed: 11/24/2022]
47
Seetin MG, Mathews DH. TurboKnot: rapid prediction of conserved RNA secondary structures including pseudoknots. Bioinformatics 2012; 28:792-8. [PMID: 22285566 DOI: 10.1093/bioinformatics/bts044] [Citation(s) in RCA: 19] [Impact Index Per Article: 1.5] [Indexed: 01/23/2023]
Abstract
MOTIVATION: Many RNA molecules function without being translated into proteins, and their function depends on structure. Pseudoknots are motifs in RNA secondary structures that are difficult to predict but are often functionally important. RESULTS: TurboKnot is a new algorithm for predicting the secondary structure, including pseudoknotted pairs, conserved across multiple sequences. TurboKnot finds 81.6% of all known base pairs in the systems tested, and 75.6% of predicted pairs were found in the known structures. Pseudoknots are found with at most half the false-positive rate of previous methods.
Affiliation(s)
- Matthew G Seetin
- Department of Biochemistry and Biophysics, University of Rochester Medical Center, Rochester, NY 14642, USA
48
On the page number of RNA secondary structures with pseudoknots. J Math Biol 2011; 65:1337-57. [PMID: 22159642 DOI: 10.1007/s00285-011-0493-6] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.4] [Received: 03/20/2011] [Revised: 07/28/2011] [Indexed: 01/05/2023]
Abstract
Let S denote the set of (possibly noncanonical) base pairs {i, j} of an RNA tertiary structure; i.e. {i, j} ∈ S if there is a hydrogen bond between the ith and jth nucleotides. The page number of S, denoted π(S), is the minimum number k such that S can be decomposed into a disjoint union of k secondary structures. Here, we show that computing the page number is NP-complete; we describe an exact computation of the page number using constraint programming, and determine the page number of a collection of RNA tertiary structures for which the topological genus is known. We describe an approximation algorithm from which it follows that ω(S) ≤ π(S) ≤ ω(S) · log n, where the clique number of S, ω(S), denotes the maximum number of base pairs that pairwise cross each other.
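When no nucleotide takes part in two pairs, two base pairs can share a page exactly when they do not cross, so π(S) is the chromatic number of the graph joining crossing pairs and ω(S) is that graph's clique number. The brute-force Python sketch below computes both for tiny inputs, which makes the lower bound ω(S) ≤ π(S) concrete. It assumes each nucleotide occurs in at most one pair (the abstract allows more general tertiary contacts), it is exponential in |S| and therefore only usable on small examples, and the function names are hypothetical; it is not the constraint-programming method described in the paper.

```python
from itertools import combinations, product


def crosses(p, q):
    """True if base pairs p and q cross (a pseudoknot-style interleaving)."""
    (i, j), (k, l) = sorted(p), sorted(q)
    return i < k < j < l or k < i < l < j


def check_matching(pairs):
    # Assumption of this sketch: no nucleotide takes part in two pairs.
    seen = set()
    for i, j in pairs:
        if i == j or i in seen or j in seen:
            raise ValueError("each nucleotide may appear in at most one pair")
        seen.update((i, j))


def clique_number(pairs):
    """omega(S): size of the largest set of pairwise-crossing pairs (brute force)."""
    for size in range(len(pairs), 0, -1):
        for subset in combinations(pairs, size):
            if all(crosses(p, q) for p, q in combinations(subset, 2)):
                return size
    return 0


def page_number(pairs):
    """pi(S): fewest non-crossing classes that partition S (brute-force colouring)."""
    pairs = list(pairs)
    check_matching(pairs)
    n = len(pairs)
    for k in range(1, n + 1):
        for colouring in product(range(k), repeat=n):
            # Valid if no two crossing pairs were placed on the same page.
            if all(colouring[a] != colouring[b] or not crosses(pairs[a], pairs[b])
                   for a, b in combinations(range(n), 2)):
                return k
    return 0  # only reached when S is empty


h_type = [(1, 8), (2, 7), (3, 12), (4, 11)]
print(clique_number(h_type), page_number(h_type))
```

For the four pairs of an H-type pseudoknot both functions return 2, while three mutually crossing pairs such as (1, 4), (2, 5), (3, 6) need three pages.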
49
Zhong C, Zhang S. Clustering RNA structural motifs in ribosomal RNAs using secondary structural alignment. Nucleic Acids Res 2011; 40:1307-17. [PMID: 21976732 PMCID: PMC3273805 DOI: 10.1093/nar/gkr804] [Citation(s) in RCA: 20] [Impact Index Per Article: 1.4] [Indexed: 12/02/2022]
Abstract
RNA structural motifs are the building blocks of the complex RNA architecture. Identification of non-coding RNA structural motifs is a critical step towards understanding their structures and functions. In this article, we present a clustering approach for de novo RNA structural motif identification. We applied our approach to a data set containing 5S, 16S and 23S rRNAs and rediscovered many known motifs, including the GNRA tetraloop, kink-turn, C-loop, sarcin–ricin, reverse kink-turn, hook-turn, E-loop and tandem-sheared motifs, with higher accuracy than the state-of-the-art clustering method. We also identified a number of potential novel instances of GNRA tetraloop, kink-turn, sarcin–ricin and tandem-sheared motifs. More importantly, several novel structural motif families have been revealed by our clustering analysis. We identified a highly asymmetric bulge loop motif that resembles a rope sling. We also found an internal loop motif that can significantly increase the twist of the helix. Finally, we discovered a subfamily of the hexaloop motif whose geometry differs significantly from that of the currently known hexaloop motif. The discoveries presented in this article substantially expand current knowledge of RNA structural motifs.
Affiliation(s)
- Cuncong Zhong
- Department of Electrical Engineering and Computer Science, University of Central Florida, Orlando, FL 32816, USA
50
Rother K, Potrzebowski W, Puton T, Rother M, Wywial E, Bujnicki JM. A toolbox for developing bioinformatics software. Brief Bioinform 2011; 13:244-57. [PMID: 21803787 DOI: 10.1093/bib/bbr035] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.1] [Indexed: 11/13/2022]
Abstract
Creating useful software is a major activity of many scientists, including bioinformaticians. Nevertheless, software development in an academic setting is often unsystematic, which can lead to problems associated with maintenance and long-term availability. Unfortunately, well-documented software development methodology is difficult to adopt, and technical measures that directly improve bioinformatic programming have not been described comprehensively. We have examined 22 software projects and have identified a set of practices for software development in an academic environment. We found them useful for planning a project, supporting the involvement of experts (e.g. experimentalists), and promoting higher quality and maintainability of the resulting programs. This article describes 12 techniques that facilitate a quick start into software engineering. We describe 3 of the 22 projects in detail and give many examples to illustrate the usage of particular techniques. We expect this toolbox to be useful for many bioinformatics programming projects and for the training of scientific programmers.
Affiliation(s)
- Kristian Rother
- Laboratory of Structural Bioinformatics, Institute of Molecular Biology and Biotechnology, Collegium Biologicum, Adam Mickiewicz University, ul. Umultowska 89, 61-614 Poznan, Poland.