1
Rahaman MM, Khan NS, Zhang S. RNAMotifComp: a comprehensive method to analyze and identify structurally similar RNA motif families. Bioinformatics 2023; 39:i337-i346. [PMID: 37387191 DOI: 10.1093/bioinformatics/btad223] [Indexed: 07/01/2023] [Open access]
Abstract
MOTIVATION: The 3D structures of RNA play a critical role in understanding their functionalities. There exist several computational methods to study RNA 3D structures by identifying structural motifs and categorizing them into several motif families based on their structures. Although the number of such motif families is not limited, a few of them are well-studied. Out of these structural motif families, there exist several families that are visually similar or very close in structure, even with different base interactions. Alternatively, some motif families share a set of base interactions but maintain variation in their 3D formations. These similarities among different motif families, if known, can provide a better insight into the RNA 3D structural motifs as well as their characteristic functions in cell biology.
RESULTS: In this work, we proposed a method, RNAMotifComp, that analyzes the instances of well-known structural motif families and establishes a relational graph among them. We also have designed a method to visualize the relational graph where the families are shown as nodes and their similarity information is represented as edges. We validated our discovered correlations of the motif families using RNAMotifContrast. Additionally, we used a basic Naïve Bayes classifier to show the importance of RNAMotifComp. The relational analysis explains the functional analogies of divergent motif families and illustrates the situations where the motifs of disparate families are predicted to be of the same family.
AVAILABILITY AND IMPLEMENTATION: Source code publicly available at https://github.com/ucfcbb/RNAMotifFamilySimilarity.
Affiliation(s)
- Md Mahfuzur Rahaman
  - Department of Computer Science, University of Central Florida, Orlando, FL 32816, United States
- Nabila Shahnaz Khan
  - Department of Computer Science, University of Central Florida, Orlando, FL 32816, United States
- Shaojie Zhang
  - Department of Computer Science, University of Central Florida, Orlando, FL 32816, United States
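The relational graph described in the RNAMotifComp abstract (families as nodes, similarity information as edges) can be pictured with a small sketch. This is not the authors' implementation: the function name `build_relational_graph`, the score scale, the threshold, and every score in the example are assumptions made up for illustration, and the abstract does not say how similarity is actually computed.

```python
def build_relational_graph(similarity, threshold):
    """Build an undirected graph of motif families.

    similarity: dict mapping (family_a, family_b) -> score (hypothetical scale).
    threshold:  minimum score for an edge to be drawn (an assumed cutoff).
    Returns a dict: family -> {neighbouring family: score}.
    """
    graph = {}
    for (a, b), score in similarity.items():
        # Every family becomes a node, even if it ends up with no edges.
        graph.setdefault(a, {})
        graph.setdefault(b, {})
        if score >= threshold:
            graph[a][b] = score
            graph[b][a] = score
    return graph


# Made-up scores between well-known motif families (illustration only).
similarity = {
    ("kink-turn", "reverse kink-turn"): 0.8,
    ("kink-turn", "sarcin-ricin"): 0.2,
    ("sarcin-ricin", "E-loop"): 0.7,
}
graph = build_relational_graph(similarity, threshold=0.5)
for family, neighbours in sorted(graph.items()):
    print(family, "->", sorted(neighbours))
```

Keeping isolated families as nodes means a family with no sufficiently similar partner still appears in the visualization rather than vanishing from it.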
2
Chen X, Zhang S. CircularSTAR3D: a stack-based RNA 3D structural alignment tool for circular matching. Nucleic Acids Res 2023; 51:e53. [PMID: 36987885 PMCID: PMC10201423 DOI: 10.1093/nar/gkad222] [Received: 05/26/2022] [Revised: 03/04/2023] [Accepted: 03/28/2023] [Indexed: 03/30/2023] [Open access]
Abstract
The functions of non-coding RNAs usually depend on their 3D structures. Therefore, comparing RNA 3D structures is critical in analyzing their functions. We noticed an interesting phenomenon that two non-coding RNAs may share similar substructures when rotating their sequence order. To the best of our knowledge, no existing RNA 3D structural alignment tools can detect this type of matching. In this article, we defined the RNA 3D structure circular matching problem and developed a software tool named CircularSTAR3D to solve this problem. CircularSTAR3D first uses the conserved stacks (consecutive base pairs with similar 3D structures) in the input RNAs to identify the circular matched internal loops and multiloops. Then it performs a local extension iteratively to obtain the whole circular matched substructures. The computational experiments conducted on a non-redundant RNA structure dataset show that circular matching is ubiquitous. Furthermore, we demonstrated the utility of CircularSTAR3D by detecting the conserved substructures missed by regular alignment tools, including structural motifs and conserved structures between riboswitches and ribozymes from different classes. We anticipate CircularSTAR3D to be a valuable supplement to the existing RNA 3D structural analysis techniques.
Affiliation(s)
- Xiaoli Chen
  - Department of Computer Science, University of Central Florida, Orlando, FL 32816, USA
- Shaojie Zhang
  - Department of Computer Science, University of Central Florida, Orlando, FL 32816, USA
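The idea of circular matching in the CircularSTAR3D abstract, where two RNAs share a substructure once the sequence order of one is rotated, can be illustrated with a toy check on labelled stacks. This is only a sketch of the problem statement, not the tool's algorithm: the function `rotation_offset` and the labels `S1`, `S2`, `S3` are hypothetical, and real stacks are compared by 3D similarity rather than by label equality.

```python
def rotation_offset(a, b):
    """Return k such that rotating list a left by k positions gives b.

    Returns None when b is not a rotation of a.
    """
    if len(a) != len(b):
        return None
    if a == b:
        return 0
    for k in range(1, len(a)):
        if a[k:] + a[:k] == b:
            return k
    return None


# Stacks around a three-way multiloop, listed in sequence order (made-up labels).
loop_a = ["S1", "S2", "S3"]
loop_b = ["S2", "S3", "S1"]   # same loop, sequence order rotated by one
loop_c = ["S3", "S2", "S1"]   # reversed order, not a rotation

print(rotation_offset(loop_a, loop_b))
print(rotation_offset(loop_a, loop_c))
```

A regular sequence-order alignment would see `loop_a` and `loop_b` as mismatched at every position, which is the kind of case the abstract says existing tools miss.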
3
Ghani NSA, Emrizal R, Moffit SM, Hamdani HY, Ramlan EI, Firdaus-Raih M. GrAfSS: a webserver for substructure similarity searching and comparisons in the structures of proteins and RNA. Nucleic Acids Res 2022; 50:W375-W383. [PMID: 35639505 PMCID: PMC9252811 DOI: 10.1093/nar/gkac402] [Received: 03/28/2022] [Revised: 04/28/2022] [Accepted: 05/08/2022] [Indexed: 12/03/2022] [Open access]
Abstract
The GrAfSS (Graph theoretical Applications for Substructure Searching) webserver is a platform to search for three-dimensional substructures of: (i) amino acid side chains in protein structures; and (ii) base arrangements in RNA structures. The webserver interfaces the functions of five different graph theoretical algorithms – ASSAM, SPRITE, IMAAAGINE, NASSAM and COGNAC – into a single substructure searching suite. Users will be able to identify whether a three-dimensional (3D) arrangement of interest, such as a ligand binding site or 3D motif, observed in a protein or RNA structure can be found in other structures available in the Protein Data Bank (PDB). The webserver also allows users to determine whether a protein or RNA structure of interest contains substructural arrangements that are similar to known motifs or 3D arrangements. These capabilities allow for the functional annotation of new structures that were either experimentally determined or computationally generated (such as the coordinates generated by AlphaFold2) and can provide further insights into the diversity or conservation of functional mechanisms of structures in the PDB. The computed substructural superpositions are visualized using integrated NGL viewers. The GrAfSS server is available at http://mfrlab.org/grafss/.
Affiliation(s)
- Nur Syatila Ab Ghani
  - Institute of Systems Biology, Universiti Kebangsaan Malaysia, 43600 UKM Bangi, Selangor, Malaysia
- Reeki Emrizal
  - Department of Applied Physics, Faculty of Science and Technology, Universiti Kebangsaan Malaysia, 43600 UKM Bangi, Selangor, Malaysia
- Sabrina Mohamed Moffit
  - Department of Applied Physics, Faculty of Science and Technology, Universiti Kebangsaan Malaysia, 43600 UKM Bangi, Selangor, Malaysia
- Hazrina Yusof Hamdani
  - Advanced Medical and Dental Institute, Universiti Sains Malaysia, Bertam, Kepala Batas 13200, Pulau Pinang, Malaysia
- Mohd Firdaus-Raih
  - Institute of Systems Biology, Universiti Kebangsaan Malaysia, 43600 UKM Bangi, Selangor, Malaysia
  - Department of Applied Physics, Faculty of Science and Technology, Universiti Kebangsaan Malaysia, 43600 UKM Bangi, Selangor, Malaysia
4
Sun S, Yang J, Zhang Z. RNALigands: a database and web server for RNA-ligand interactions. RNA 2022; 28:115-122. [PMID: 34732566 PMCID: PMC8906548 DOI: 10.1261/rna.078889.121] [Received: 07/05/2021] [Accepted: 10/25/2021] [Indexed: 06/13/2023]
Abstract
RNA molecules can fold into complex and stable 3D structures, allowing them to carry out important genetic, structural, and regulatory roles inside the cell. These complex structures often contain 3D pockets made up of secondary structural motifs that can be potentially targeted by small molecule ligands. Indeed, many RNA structures in PDB contain bound small molecules, and high-throughput experimental studies have generated a large number of interacting RNA and ligand pairs. There is considerable interest in developing small molecule lead compounds targeting viral RNAs or those RNAs implicated in neurological diseases or cancer. We hypothesize that RNAs that have similar secondary structural motifs may bind to similar small molecule ligands. Toward this goal, we established a database collecting RNA secondary structural motifs and bound small molecule ligands. We further developed a computational pipeline, which takes as input an RNA sequence, predicts its secondary structure, extracts structural motifs, and searches the database for similar secondary structure motifs and interacting small molecules. We demonstrated the utility of the server by querying the α-synuclein mRNA 5' UTR sequence and finding potential matches which were validated as correct. The server is publicly available at http://RNALigands.ccbr.utoronto.ca. The source code can also be downloaded at https://github.com/SaisaiSun/RNALigands.
Affiliation(s)
- Saisai Sun
  - School of Computer Science and Technology, Xidian University, Xi'an, 710071, Shanxi, China
  - Terrence Donnelly Centre for Cellular and Biomolecular Research, University of Toronto, Toronto, Ontario M5S 3E1, Canada
  - Department of Molecular Genetics, University of Toronto, Toronto, Ontario M5S 3E1, Canada
- Jianyi Yang
  - School of Mathematical Sciences, Nankai University, Tianjin, 300071, China
- Zhaolei Zhang
  - Terrence Donnelly Centre for Cellular and Biomolecular Research, University of Toronto, Toronto, Ontario M5S 3E1, Canada
  - Department of Molecular Genetics, University of Toronto, Toronto, Ontario M5S 3E1, Canada
  - Department of Computer Science, University of Toronto, Toronto, Ontario M5S 3E1, Canada
5
Oliver C, Mallet V, Philippopoulos P, Hamilton WL, Waldispühl J. Vernal: a tool for mining fuzzy network motifs in RNA. Bioinformatics 2022; 38:970-976. [PMID: 34791045 DOI: 10.1093/bioinformatics/btab768] [Received: 03/01/2021] [Revised: 09/19/2021] [Accepted: 11/09/2021] [Indexed: 02/03/2023] [Open access]
Abstract
MOTIVATION: RNA 3D motifs are recurrent substructures, modeled as networks of base pair interactions, which are crucial for understanding structure-function relationships. The task of automatically identifying such motifs is computationally hard, and remains a key challenge in the field of RNA structural biology and network analysis. State-of-the-art methods solve special cases of the motif problem by constraining the structural variability in occurrences of a motif, and narrowing the substructure search space.
RESULTS: Here, we relax these constraints by posing the motif finding problem as a graph representation learning and clustering task. This framing takes advantage of the continuous nature of graph representations to model the flexibility and variability of RNA motifs in an efficient manner. We propose a set of node similarity functions, clustering methods and motif construction algorithms to recover flexible RNA motifs. Our tool, Vernal, can be easily customized by users to desired levels of motif flexibility, abundance and size. We show that Vernal is able to retrieve and expand known classes of motifs, as well as to propose novel motifs.
AVAILABILITY AND IMPLEMENTATION: The source code, data and a webserver are available at vernal.cs.mcgill.ca. We also provide a flexible interface and a user-friendly webserver to browse and download our results.
SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Carlos Oliver
  - School of Computer Science, McGill University, Montréal, QC H3A 0E9, Canada
  - Montreal Institute for Learning Algorithms (MILA), Montréal, QC H2S 3H1, Canada
- Vincent Mallet
  - Structural Bioinformatics Unit, Department of Structural Biology and Chemistry, Institut Pasteur, CNRS UMR3528, C3BI, USR3756, Paris, France
  - Mines ParisTech, Paris-Sciences-et-Lettres Research University, Center for Computational Biology, Paris 75272, France
- William L Hamilton
  - School of Computer Science, McGill University, Montréal, QC H3A 0E9, Canada
  - Montreal Institute for Learning Algorithms (MILA), Montréal, QC H2S 3H1, Canada
- Jérôme Waldispühl
  - School of Computer Science, McGill University, Montréal, QC H3A 0E9, Canada
6
Gianfrotta C, Reinharz V, Lespinet O, Barth D, Denise A. On the predictibility of A-minor motifs from their local contexts. RNA Biol 2022; 19:1208-1227. [PMID: 36384383 PMCID: PMC9673937 DOI: 10.1080/15476286.2022.2144611] [Indexed: 11/19/2022] [Open access]
Abstract
This study investigates the importance of the structural context in the formation of a type I/II A-minor motif. This very frequent structural motif has been shown to be important in the spatial folding of RNA molecules. We developed an automated method to classify A-minor motif occurrences according to their 3D context similarities, and we used a graph approach to represent both the structural A-minor motif occurrences and their classes at different scales. This approach leads us to uncover new subclasses of A-minor motif occurrences according to their local 3D similarities. The majority of classes are composed of homologous occurrences, but some of them are composed of non-homologous occurrences. The different classifications we obtain allow us to better understand the importance of the context in the formation of A-minor motifs. In a second step, we investigate how much knowledge of the context around an A-minor motif can help to infer its presence (and position). More specifically, we want to determine what kind of information, contained in the structural context, can be useful to characterize and predict A-minor motifs. We show that, for some A-minor motifs, the topology combined with a sequence signal is sufficient to predict the presence and the position of an A-minor motif occurrence. In most other cases, these signals are not sufficient for predicting the A-minor motif; however, we show that they are nonetheless good signals for this purpose. All the classification and prediction pipelines rely on automated processes, for which we describe the underlying algorithms and parameters.
Affiliation(s)
- Coline Gianfrotta (contact author)
  - Données et Algorithmes pour une Ville Intelligente et Durable (DAVID), Université de Versailles Saint-Quentin-en-Yvelines, Université Paris-Saclay, Versailles, France
  - Laboratoire Interdisciplinaire des Sciences du Numérique (LISN), Université Paris-Saclay, CNRS, Orsay, France
- Vladimir Reinharz
  - Department of Computer Science, Université du Québec à Montréal, Québec, Canada
- Olivier Lespinet
  - Institute for Integrative Biology of the Cell (I2BC), Université Paris-Saclay, CEA, CNRS, Gif-sur-Yvette, France
- Dominique Barth
  - Données et Algorithmes pour une Ville Intelligente et Durable (DAVID), Université de Versailles Saint-Quentin-en-Yvelines, Université Paris-Saclay, Versailles, France
- Alain Denise
  - Laboratoire Interdisciplinaire des Sciences du Numérique (LISN), Université Paris-Saclay, CNRS, Orsay, France
  - Institute for Integrative Biology of the Cell (I2BC), Université Paris-Saclay, CEA, CNRS, Gif-sur-Yvette, France
7
Emrizal R, Hamdani HY, Firdaus-Raih M. Graph Theoretical Methods and Workflows for Searching and Annotation of RNA Tertiary Base Motifs and Substructures. Int J Mol Sci 2021; 22:8553. [PMID: 34445259 PMCID: PMC8395288 DOI: 10.3390/ijms22168553] [Received: 06/16/2021] [Revised: 08/01/2021] [Accepted: 08/06/2021] [Indexed: 12/12/2022] [Open access]
Abstract
The increasing number and complexity of structures containing RNA chains in the Protein Data Bank (PDB) have led to the need for automated structure annotation methods to replace or complement expert visual curation. This is especially true when searching for tertiary base motifs and substructures. Such base arrangements and motifs have diverse roles that range from contributions to structural stability to more direct involvement in the molecule's functions, such as the sites for ligand binding and catalytic activity. We review the utility of computational approaches in annotating RNA tertiary base motifs in a dataset of PDB structures, particularly the use of graph theoretical algorithms that can search for such base motifs and annotate them or find and annotate clusters of hydrogen-bond-connected bases. We also demonstrate how such graph theoretical algorithms can be integrated into a workflow that allows for functional analysis and comparisons of base arrangements and sub-structures, such as those involved in ligand binding. The capacity to carry out such automatic curations has led to the discovery of novel motifs and can give new context to known motifs as well as enable the rapid compilation of RNA 3D motifs into a database.
Affiliation(s)
- Reeki Emrizal
  - Department of Applied Physics, Faculty of Science and Technology, Universiti Kebangsaan Malaysia, UKM Bangi, Bangi 43600, Selangor, Malaysia
  - Institute of Systems Biology, Universiti Kebangsaan Malaysia, UKM Bangi, Bangi 43600, Selangor, Malaysia
- Hazrina Yusof Hamdani (corresponding author)
  - Advanced Medical and Dental Institute, Universiti Sains Malaysia, Bertam, Kepala Batas 13200, Pulau Pinang, Malaysia
- Mohd Firdaus-Raih (corresponding author)
  - Department of Applied Physics, Faculty of Science and Technology, Universiti Kebangsaan Malaysia, UKM Bangi, Bangi 43600, Selangor, Malaysia
  - Institute of Systems Biology, Universiti Kebangsaan Malaysia, UKM Bangi, Bangi 43600, Selangor, Malaysia
8
Islam S, Rahaman MM, Zhang S. RNAMotifContrast: a method to discover and visualize RNA structural motif subfamilies. Nucleic Acids Res 2021; 49:e61. [PMID: 33693841 PMCID: PMC8216276 DOI: 10.1093/nar/gkab131] [Received: 10/11/2020] [Revised: 02/16/2021] [Accepted: 02/18/2021] [Indexed: 01/17/2023] [Open access]
Abstract
Understanding the 3D structural properties of RNAs will play a critical role in identifying their functional characteristics and designing new RNAs for RNA-based therapeutics and nanotechnology. While several existing computational methods can help in the analysis of RNA properties by recognizing structural motifs, they do not provide the means to compare and contrast those motifs extensively. We have developed a new method, RNAMotifContrast, which focuses on analyzing the similarities and variations of RNA structural motif characteristics. In this method, a graph is formed to represent the similarities among motifs, and a new traversal algorithm is applied to generate visualizations of their structural properties. Analyzing the structural features among motifs, we have recognized and generalized the concept of motif subfamilies. To assess its effectiveness, we have applied RNAMotifContrast to a dataset of known RNA structural motif families. From the results, we observed that the derived subfamilies possess unique structural variations while holding standard features of the families. Overall, the visualization approach of this method presents a new perspective to observe the relation among motifs more closely, and the discovered subfamilies provide opportunities to achieve valuable insights into RNA's diverse roles.
Affiliation(s)
- Shahidul Islam
  - Department of Computer Science, University of Central Florida, Orlando, FL 32816, USA
- Md Mahfuzur Rahaman
  - Department of Computer Science, University of Central Florida, Orlando, FL 32816, USA
- Shaojie Zhang
  - Department of Computer Science, University of Central Florida, Orlando, FL 32816, USA
9
Soulé A, Reinharz V, Sarrazin-Gendron R, Denise A, Waldispühl J. Finding recurrent RNA structural networks with fast maximal common subgraphs of edge-colored graphs. PLoS Comput Biol 2021; 17:e1008990. [PMID: 34048427 PMCID: PMC8191989 DOI: 10.1371/journal.pcbi.1008990] [Received: 04/30/2020] [Revised: 06/10/2021] [Accepted: 04/22/2021] [Indexed: 11/25/2022] [Open access]
Abstract
RNA tertiary structure is crucial to its many non-coding molecular functions. RNA architecture is shaped by its secondary structure composed of stems, stacked canonical base pairs, enclosing loops. While stems are precisely captured by free-energy models, loops composed of non-canonical base pairs are not. Nor are distant interactions linking together those secondary structure elements (SSEs). Databases of conserved 3D geometries (a.k.a. modules) not captured by energetic models are leveraged for structure prediction and design, but the computational complexity has limited their study to local elements, loops. Representing the RNA structure as a graph has recently made it possible to extend this work to pairs of SSEs, uncovering a hierarchical organization of these 3D modules, at great computational cost. Systematically capturing recurrent patterns on a large scale is a main challenge in the study of RNA structures. In this paper, we present an efficient algorithm to compute maximal isomorphisms in edge colored graphs. We extend this algorithm to a framework well suited to identify RNA modules, and fast enough to considerably generalize previous approaches. To exhibit the versatility of our framework, we first reproduce results identifying all common modules spanning more than 2 SSEs, in a few hours instead of weeks. The efficiency of our new algorithm is demonstrated by computing the maximal modules between any pair of entire RNAs in the non-redundant corpus of known RNA 3D structures. We observe that the biggest modules our method uncovers compose large shared substructures spanning hundreds of nucleotides and base pairs between the ribosomes of Thermus thermophilus, Escherichia coli, and Pseudomonas aeruginosa.

Ribonucleic Acids (RNAs) perform a broad range of essential molecular functions in cells, many of which rely on intricate folding properties of the molecule. Watson-Crick and Wobble base pairs form early and stack onto each other to create stems connected by loops, which are themselves stabilized by more sophisticated base interaction patterns. These networks are essential to shape RNA 3D structures but are unfortunately still poorly understood. Here, we undertake the task of building a catalog of base interaction networks occurring in multiple structures. However, a pairwise comparison of all RNA structures is computationally heavy. Therefore, we devise an algorithm leveraging intrinsic properties of RNA base interaction networks that enables us to quickly mine full databases of 3D structures. Compared to previous methods, our techniques bring the total running time of the analysis from months to hours while performing more general searches. The data collected through this work will benefit molecular evolution studies and serve in structure prediction tools.
Affiliation(s)
- Antoine Soulé
  - School of Computer Science, McGill University, Montréal, Canada
  - LiX, École Polytechnique, Paris, France
- Vladimir Reinharz
  - Department of Computer Science, Université du Québec à Montréal, Montréal, Canada
- Alain Denise
  - Laboratoire de recherche en informatique, Université Paris-Saclay - CNRS, Orsay, France
  - Institute for Integrative Biology of the Cell (I2BC), Université Paris-Saclay - CEA - CNRS, Gif-sur-Yvette, France
- Jérôme Waldispühl (corresponding author)
  - School of Computer Science, McGill University, Montréal, Canada
10
Liu B, Thippabhotla S, Zhang J, Zhong C. DRAGoM: Classification and Quantification of Noncoding RNA in Metagenomic Data. Front Genet 2021; 12:669495. [PMID: 34025724 PMCID: PMC8131839 DOI: 10.3389/fgene.2021.669495] [Received: 02/18/2021] [Accepted: 03/23/2021] [Indexed: 12/21/2022] [Open access]
Abstract
Noncoding RNAs (ncRNAs) play important regulatory and functional roles in microorganisms, such as regulation of gene expression, signaling, protein synthesis, and RNA processing. Hence, their classification and quantification are central tasks toward the understanding of the function of the microbial community. However, the majority of the current metagenomic sequencing technologies generate short reads, which may contain only a partial secondary structure that complicates ncRNA homology detection. Meanwhile, de novo assembly of the metagenomic sequencing data remains challenging for complex communities. To tackle these challenges, we developed a novel algorithm called DRAGoM (Detection of RNA using Assembly Graph from Metagenomic data). DRAGoM first constructs a hybrid graph by merging an assembly string graph and an assembly de Bruijn graph. Then, it classifies paths in the hybrid graph and their constituent reads into different ncRNA families based on both sequence and structural homology. Our benchmark experiments show that DRAGoM can improve the performance and robustness over traditional approaches on the classification and quantification of a wide class of ncRNA families.
Affiliation(s)
- Ben Liu
  - Department of Electrical Engineering and Computer Science, The University of Kansas, Lawrence, KS, United States
- Sirisha Thippabhotla
  - Department of Electrical Engineering and Computer Science, The University of Kansas, Lawrence, KS, United States
- Jun Zhang
  - Division of Medical Oncology, Department of Internal Medicine, University of Kansas Medical Center, Kansas City, KS, United States
  - Department of Cancer Biology, University of Kansas Medical Center, Kansas City, KS, United States
- Cuncong Zhong
  - Department of Electrical Engineering and Computer Science, The University of Kansas, Lawrence, KS, United States
  - Bioengineering Program, The University of Kansas, Lawrence, KS, United States
  - Center for Computational Biology, The University of Kansas, Lawrence, KS, United States
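One ingredient named in the DRAGoM abstract, the assembly de Bruijn graph, can be sketched in a few lines. The block below shows only the textbook construction (nodes are (k-1)-mers, each k-mer in a read contributes one edge) and is not DRAGoM's code: it does not build the string graph or the hybrid graph, and the function name `de_bruijn`, the reads, and the value of `k` are assumptions chosen for illustration.

```python
from collections import defaultdict


def de_bruijn(reads, k):
    """Build a de Bruijn graph from reads.

    Each k-mer found in a read adds an edge from its (k-1)-mer prefix
    to its (k-1)-mer suffix. Returns dict: node -> set of successor nodes.
    """
    graph = defaultdict(set)
    for read in reads:
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            graph[kmer[:-1]].add(kmer[1:])
    return dict(graph)


# Two short, overlapping, made-up reads.
reads = ["ACGTA", "CGTAC"]
db_graph = de_bruijn(reads, k=3)
for node in sorted(db_graph):
    print(node, "->", sorted(db_graph[node]))
```

Because overlapping reads contribute the same k-mers, the shared edges collapse into one path, which is what lets a graph recover sequence context that a single short read lacks.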
11
Chen X, Khan NS, Zhang S. LocalSTAR3D: a local stack-based RNA 3D structural alignment tool. Nucleic Acids Res 2020; 48:e77. [PMID: 32496533 PMCID: PMC7367197 DOI: 10.1093/nar/gkaa453] [Received: 02/23/2020] [Revised: 05/15/2020] [Accepted: 05/27/2020] [Indexed: 11/29/2022] [Open access]
Abstract
A fast-growing number of non-coding RNA structures have been resolved and deposited in Protein Data Bank (PDB). In contrast to the wide range of global alignment and motif search tools, there is still a lack of local alignment tools. Among all the global alignment tools for RNA 3D structures, STAR3D has become a valuable tool for its unprecedented speed and accuracy. STAR3D compares the 3D structures of RNA molecules using consecutive base-pairs (stacks) as anchors and generates an optimal global alignment. In this article, we developed a local RNA 3D structural alignment tool, named LocalSTAR3D, which was extended from STAR3D and designed to report multiple local alignments between two RNAs. The benchmarking results show that LocalSTAR3D has better accuracy and coverage than other local alignment tools. Furthermore, the utility of this tool has been demonstrated by rediscovering kink-turn motif instances, conserved domains in group II intron RNAs, and the tRNA mimicry of IRES RNAs.
Affiliation(s)
- Xiaoli Chen
  - Department of Computer Science, University of Central Florida, Orlando, FL 32816, USA
- Nabila Shahnaz Khan
  - Department of Computer Science, University of Central Florida, Orlando, FL 32816, USA
- Shaojie Zhang
  - Department of Computer Science, University of Central Florida, Orlando, FL 32816, USA
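The LocalSTAR3D abstract describes stacks as consecutive base pairs used as alignment anchors. A minimal sketch of extracting such stacks from a list of base-pair positions follows. It is an assumption-laden illustration, not STAR3D or LocalSTAR3D code: `find_stacks` and `min_len` are hypothetical names, pairs are given as `(i, j)` with `i < j`, and the 3D geometry the tools compare is ignored.

```python
def find_stacks(pairs, min_len=2):
    """Group base pairs into stacks of consecutive pairs.

    A stack is a maximal run (i, j), (i+1, j-1), (i+2, j-2), ...
    Runs shorter than min_len are discarded.
    """
    pair_set = set(pairs)
    stacks = []
    for i, j in sorted(pair_set):
        if (i - 1, j + 1) in pair_set:
            continue  # not the outermost pair of its run
        run = [(i, j)]
        # Extend inward while the next nested pair exists.
        while (run[-1][0] + 1, run[-1][1] - 1) in pair_set:
            run.append((run[-1][0] + 1, run[-1][1] - 1))
        if len(run) >= min_len:
            stacks.append(run)
    return stacks


# Made-up base pairs: one 3-pair stack, one isolated pair, one 2-pair stack.
pairs = [(1, 20), (2, 19), (3, 18), (5, 16), (7, 14), (8, 13)]
stacks = find_stacks(pairs)
for stack in stacks:
    print(stack)
```

Starting each run only at its outermost pair keeps every stack maximal and avoids reporting the same run several times.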
12
Černý J, Božíková P, Svoboda J, Schneider B. A unified dinucleotide alphabet describing both RNA and DNA structures. Nucleic Acids Res 2020; 48:6367-6381. [PMID: 32406923 PMCID: PMC7293047 DOI: 10.1093/nar/gkaa383] [Received: 04/11/2020] [Revised: 04/11/2020] [Accepted: 04/30/2020] [Indexed: 12/13/2022] [Open access]
Abstract
By analyzing almost 120 000 dinucleotides in over 2000 nonredundant nucleic acid crystal structures, we define 96+1 diNucleotide Conformers, NtCs, which describe the geometry of RNA and DNA dinucleotides. NtC classes are grouped into 15 codes of the structural alphabet CANA (Conformational Alphabet of Nucleic Acids) to simplify symbolic annotation of the prominent structural features of NAs and their intuitive graphical display. The search for nontrivial patterns of NtCs resulted in the identification of several types of RNA loops, some of them observed for the first time. Over 30% of the nearly six million dinucleotides in the PDB cannot be assigned to any NtC class but we demonstrate that up to a half of them can be re-refined with the help of proper refinement targets. A statistical analysis of the preferences of NtCs and CANA codes for the 16 dinucleotide sequences showed that neither the NtC class AA00, which forms the scaffold of RNA structures, nor BB00, the DNA most populated class, are sequence neutral but their distributions are significantly biased. The reported automated assignment of the NtC classes and CANA codes available at dnatco.org provides a powerful tool for unbiased analysis of nucleic acid structures by structural and molecular biologists.
Affiliation(s)
- Jiří Černý
  - Institute of Biotechnology of the Czech Academy of Sciences, BIOCEV, CZ-252 50 Vestec, Prague-West, Czech Republic
- Paulína Božíková
  - Institute of Biotechnology of the Czech Academy of Sciences, BIOCEV, CZ-252 50 Vestec, Prague-West, Czech Republic
- Jakub Svoboda
  - Institute of Biotechnology of the Czech Academy of Sciences, BIOCEV, CZ-252 50 Vestec, Prague-West, Czech Republic
- Bohdan Schneider
  - Institute of Biotechnology of the Czech Academy of Sciences, BIOCEV, CZ-252 50 Vestec, Prague-West, Czech Republic
13
Sarrazin-Gendron R, Reinharz V, Oliver CG, Moitessier N, Waldispühl J. Automated, customizable and efficient identification of 3D base pair modules with BayesPairing. Nucleic Acids Res 2019; 47:3321-3332. [PMID: 30828711 PMCID: PMC6468301 DOI: 10.1093/nar/gkz102] [Received: 10/03/2018] [Revised: 02/06/2019] [Accepted: 02/28/2019] [Indexed: 12/12/2022] [Open access]
Abstract
RNA structures possess multiple levels of structural organization. A secondary structure, made of Watson–Crick helices connected by loops, forms a scaffold for the tertiary structure. The 3D structures adopted by these loops are therefore critical determinants shaping the global 3D architecture. Earlier studies showed that these local 3D structures can be described as conserved sets of ordered non-Watson–Crick base pairs called RNA structural modules. Unfortunately, the computational efficiency and scope of the current 3D module identification methods are still too limited to benefit from all the knowledge accumulated in the module databases. We present BayesPairing, an automated, efficient and customizable tool for (i) building Bayesian networks representing RNA 3D modules and (ii) rapid identification of 3D modules in sequences. BayesPairing uses a flexible definition of RNA 3D modules that allows us to consider complex architectures such as multi-branched loops and features multiple algorithmic improvements. We benchmarked our methods using cross-validation techniques on 3409 RNA chains and show that BayesPairing achieves up to ~70% identification accuracy on module positions and base pair interactions. BayesPairing can handle a broader range of motifs (versatility) and offers considerable running time improvements (efficiency), opening the door to a broad range of large-scale applications.
Affiliation(s)
- Vladimir Reinharz
- Center for Soft and Living Matter, Institute for Basic Science, Ulsan 44919, Republic of Korea
- Carlos G Oliver
- School of Computer Science, McGill University, Montreal, QC H3A 0E9, Canada
- Nicolas Moitessier
- Department of Chemistry, McGill University, Montreal, QC H3A 0B8, Canada
- Jérôme Waldispühl
- School of Computer Science, McGill University, Montreal, QC H3A 0E9, Canada
|
14
|
Reinharz V, Soulé A, Westhof E, Waldispühl J, Denise A. Mining for recurrent long-range interactions in RNA structures reveals embedded hierarchies in network families. Nucleic Acids Res 2018; 46:3841-3851. [PMID: 29608773 PMCID: PMC5934684 DOI: 10.1093/nar/gky197] [Citation(s) in RCA: 24] [Impact Index Per Article: 4.8] [Received: 10/30/2017] [Accepted: 03/22/2018] [Indexed: 11/14/2022] Open
Abstract
The wealth of the combinatorics of nucleotide base pairs enables RNA molecules to assemble into sophisticated interaction networks, which are used to create complex 3D substructures. These interaction networks are essential to shape the 3D architecture of the molecule, and also to provide the key elements to carry molecular functions such as protein or ligand binding. They are made of organised sets of long-range tertiary interactions which connect distinct secondary structure elements in 3D structures. Here, we present a de novo data-driven approach to extract automatically from large data sets of full RNA 3D structures the recurrent interaction networks (RINs). Our methodology enables us for the first time to detect the interaction networks connecting distinct components of the RNA structure, highlighting their diversity and conservation through non-related functional RNAs. We use a graphical model to perform pairwise comparisons of all RNA structures available and to extract RINs and modules. Our analysis yields a complete catalog of RNA 3D structures available in the Protein Data Bank and reveals the intricate hierarchical organization of the RNA interaction networks and modules. We assembled our results in an online database (http://carnaval.lri.fr) which will be regularly updated. Within the site, a tool allows users with a novel RNA structure to detect automatically whether the novel structure contains previously observed RINs.
Affiliation(s)
- Vladimir Reinharz
- Department of Computer Science, Ben-Gurion University of the Negev, P.O.B. 653 Beer-Sheva, 84105, Israel; School of Computer Science, McGill University, 3480 University, Montreal, Quebec H3A 0E9, Canada
- Antoine Soulé
- School of Computer Science, McGill University, 3480 University, Montreal, Quebec H3A 0E9, Canada; LIX, École Polytechnique, CNRS, Inria, Palaiseau 91120, France
- Eric Westhof
- ARN, Université de Strasbourg, IBMC-CNRS, 15 rue René Descartes, Strasbourg Cedex 67084, France
- Jérôme Waldispühl
- School of Computer Science, McGill University, 3480 University, Montreal, Quebec H3A 0E9, Canada
- Alain Denise
- LRI, Université Paris-Sud, CNRS, Université Paris-Saclay, Bâtiment 650, Orsay cedex 91405, France; I2BC, Université Paris-Sud, CNRS, CEA, Université Paris-Saclay, Bâtiment 400, Orsay cedex 91405, France
|
15
|
Sallam T, Sandhu J, Tontonoz P. Long Noncoding RNA Discovery in Cardiovascular Disease: Decoding Form to Function. Circ Res 2018; 122:155-166. [PMID: 29301847 DOI: 10.1161/circresaha.117.311802] [Citation(s) in RCA: 187] [Impact Index Per Article: 37.4] [Indexed: 12/18/2022]
Abstract
Despite significant improvements during the past 3 decades, cardiovascular disease remains a leading worldwide health epidemic. The recent identification of a fascinating group of mediators known as long noncoding RNAs (lncRNAs) has provided a wealth of new biology to explore for cardiovascular risk mitigation. lncRNAs are expressed in a highly context-specific fashion, and multiple lines of evidence implicate them in diverse biological processes. Indeed, abnormalities of lncRNAs have been directly linked with human ailments, including cardiovascular biology and disease. Of particular interest to the cardiovascular research community, dysregulation in lncRNA regulatory circuits has been associated with cardiac pathological hypertrophy, vascular disease, cell fate programming and development, atherosclerosis, dyslipidemia, and metabolic syndrome. Although techniques for interrogating noncoding RNAs are rapidly evolving, a major challenge in studying lncRNAs remains navigating through multiple technical constraints. In this review, we provide a road map for lncRNA discovery and interrogation in biological systems relevant to cardiovascular disease and highlight approaches to decipher their modes of action.
Affiliation(s)
- Tamer Sallam
- From the Division of Cardiology, Department of Medicine (T.S.) and Department of Pathology and Laboratory Medicine, Howard Hughes Medical Institute (J.S., P.T.), University of California, Los Angeles.
- Jaspreet Sandhu
- From the Division of Cardiology, Department of Medicine (T.S.) and Department of Pathology and Laboratory Medicine, Howard Hughes Medical Institute (J.S., P.T.), University of California, Los Angeles
- Peter Tontonoz
- From the Division of Cardiology, Department of Medicine (T.S.) and Department of Pathology and Laboratory Medicine, Howard Hughes Medical Institute (J.S., P.T.), University of California, Los Angeles
|
16
|
Ge P, Islam S, Zhong C, Zhang S. De novo discovery of structural motifs in RNA 3D structures through clustering. Nucleic Acids Res 2018; 46:4783-4793. [PMID: 29534235 PMCID: PMC5961109 DOI: 10.1093/nar/gky139] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.7] [Received: 03/08/2017] [Revised: 02/09/2018] [Accepted: 02/16/2018] [Indexed: 11/16/2022] Open
Abstract
As functional components in three-dimensional (3D) conformation of an RNA, the RNA structural motifs provide an easy way to associate the molecular architectures with their biological mechanisms. In the past years, many computational tools have been developed to search motif instances by using the existing knowledge of well-studied families. Recently, with the rapidly increasing number of resolved RNA 3D structures, there is an urgent need to discover novel motifs with the newly presented information. In this work, we classify all the loops in non-redundant RNA 3D structures to detect plausible RNA structural motif families by using a clustering pipeline. Compared with other clustering approaches, our method has two benefits: first, the underlying alignment algorithm is tolerant to the variations in 3D structures. Second, sophisticated downstream analysis has been performed to ensure the clusters are valid and easily applied to further research. The final clustering results contain many interesting new variants of known motif families, such as GNAA tetraloop, kink-turn, sarcin-ricin and T-loop. We have also discovered potential novel functional motifs conserved in ribosomal RNA, sgRNA, SRP RNA, riboswitch and ribozyme.
Affiliation(s)
- Ping Ge
- Department of Computer Science, University of Central Florida, Orlando, FL 32816, USA
- Shahidul Islam
- Department of Computer Science, University of Central Florida, Orlando, FL 32816, USA
- Cuncong Zhong
- Department of Electrical Engineering and Computer Science, University of Kansas, Lawrence, KS 66045, USA
- Shaojie Zhang
- Department of Computer Science, University of Central Florida, Orlando, FL 32816, USA
|
17
|
Šponer J, Bussi G, Krepl M, Banáš P, Bottaro S, Cunha RA, Gil-Ley A, Pinamonti G, Poblete S, Jurečka P, Walter NG, Otyepka M. RNA Structural Dynamics As Captured by Molecular Simulations: A Comprehensive Overview. Chem Rev 2018; 118:4177-4338. [PMID: 29297679 PMCID: PMC5920944 DOI: 10.1021/acs.chemrev.7b00427] [Citation(s) in RCA: 326] [Impact Index Per Article: 54.3] [Received: 07/14/2017] [Indexed: 12/14/2022]
Abstract
With both catalytic and genetic functions, ribonucleic acid (RNA) is perhaps the most pluripotent chemical species in molecular biology, and its functions are intimately linked to its structure and dynamics. Computer simulations, and in particular atomistic molecular dynamics (MD), allow structural dynamics of biomolecular systems to be investigated with unprecedented temporal and spatial resolution. We here provide a comprehensive overview of the fast-developing field of MD simulations of RNA molecules. We begin with an in-depth, evaluatory coverage of the most fundamental methodological challenges that set the basis for the future development of the field, in particular, the current developments and inherent physical limitations of the atomistic force fields and the recent advances in a broad spectrum of enhanced sampling methods. We also survey the closely related field of coarse-grained modeling of RNA systems. After dealing with the methodological aspects, we provide an exhaustive overview of the available RNA simulation literature, ranging from studies of the smallest RNA oligonucleotides to investigations of the entire ribosome. Our review encompasses tetranucleotides, tetraloops, a number of small RNA motifs, A-helix RNA, kissing-loop complexes, the TAR RNA element, the decoding center and other important regions of the ribosome, as well as assorted other systems. Extended sections are devoted to RNA-ion interactions, ribozymes, riboswitches, and protein/RNA complexes. Our overview is written for as broad an audience as possible, aiming to provide a much-needed interdisciplinary bridge between computation and experiment, together with a perspective on the future of the field.
Affiliation(s)
- Jiří Šponer
- Institute of Biophysics of the Czech Academy of Sciences, Kralovopolska 135, Brno 612 65, Czech Republic
- Giovanni Bussi
- Scuola Internazionale Superiore di Studi Avanzati, Via Bonomea 265, Trieste 34136, Italy
- Miroslav Krepl
- Institute of Biophysics of the Czech Academy of Sciences, Kralovopolska 135, Brno 612 65, Czech Republic
- Regional Centre of Advanced Technologies and Materials, Department of Physical Chemistry, Faculty of Science, Palacky University Olomouc, 17. listopadu 12, Olomouc 771 46, Czech Republic
- Pavel Banáš
- Regional Centre of Advanced Technologies and Materials, Department of Physical Chemistry, Faculty of Science, Palacky University Olomouc, 17. listopadu 12, Olomouc 771 46, Czech Republic
- Sandro Bottaro
- Structural Biology and NMR Laboratory, Department of Biology, University of Copenhagen, Copenhagen 2200, Denmark
- Richard A Cunha
- Scuola Internazionale Superiore di Studi Avanzati, Via Bonomea 265, Trieste 34136, Italy
- Alejandro Gil-Ley
- Scuola Internazionale Superiore di Studi Avanzati, Via Bonomea 265, Trieste 34136, Italy
- Giovanni Pinamonti
- Scuola Internazionale Superiore di Studi Avanzati, Via Bonomea 265, Trieste 34136, Italy
- Simón Poblete
- Scuola Internazionale Superiore di Studi Avanzati, Via Bonomea 265, Trieste 34136, Italy
- Petr Jurečka
- Regional Centre of Advanced Technologies and Materials, Department of Physical Chemistry, Faculty of Science, Palacky University Olomouc, 17. listopadu 12, Olomouc 771 46, Czech Republic
- Nils G Walter
- Single Molecule Analysis Group and Center for RNA Biomedicine, Department of Chemistry, University of Michigan, Ann Arbor, Michigan 48109, United States
- Michal Otyepka
- Regional Centre of Advanced Technologies and Materials, Department of Physical Chemistry, Faculty of Science, Palacky University Olomouc, 17. listopadu 12, Olomouc 771 46, Czech Republic
|
18
|
Islam S, Ge P, Zhang S. CompAnnotate: a comparative approach to annotate base-pairing interactions in RNA 3D structures. Nucleic Acids Res 2017. [PMID: 28641399 PMCID: PMC5737500 DOI: 10.1093/nar/gkx538] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.6] [Indexed: 12/17/2022] Open
Abstract
The analysis of RNA tertiary structure is hindered by the fact that few structural data are available and a significant share of them are at low resolution. Due to the atomic coordinate errors posed by the limitations of low-resolution RNA three-dimensional structures, it becomes a critical challenge to extract key geometric characteristics of RNA, particularly the interactions of bases. To address this issue, we have devised a comparative method, named CompAnnotate, that utilizes more precise structural information of high-resolution homologs to annotate the base-pairing interactions in the low-resolution structures, by aligning and making comparative geometric assessments. The benchmarking results show that our method can improve the annotations of the existing methods significantly. We have achieved different levels of improvements for various methods and datasets, including an example of significant sensitivity and precision enhancement from 28 to 57% and from 53 to 82%, respectively.
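The sensitivity and precision figures quoted in this abstract can be illustrated with a small sketch. It assumes that a base-pair annotation is a set of (residue, residue, interaction type) triples and that a prediction counts as correct only when the whole triple matches the reference; the function name, the tuple layout, and the toy annotations are hypothetical and are not taken from CompAnnotate.

```python
from typing import Set, Tuple

# Hypothetical representation: (residue index i, residue index j, interaction label)
BasePair = Tuple[int, int, str]


def precision_sensitivity(predicted: Set[BasePair],
                          reference: Set[BasePair]) -> Tuple[float, float]:
    """Return (precision, sensitivity) of a predicted annotation set."""
    true_positives = len(predicted & reference)
    # Precision: fraction of predicted interactions that are in the reference.
    precision = true_positives / len(predicted) if predicted else 0.0
    # Sensitivity (recall): fraction of reference interactions that were found.
    sensitivity = true_positives / len(reference) if reference else 0.0
    return precision, sensitivity


# Toy data, invented purely for illustration.
reference = {(1, 20, "cWW"), (2, 19, "cWW"), (5, 15, "tHS"), (6, 14, "tSH")}
predicted = {(1, 20, "cWW"), (2, 19, "cWW"), (5, 15, "tWH")}

precision, sensitivity = precision_sensitivity(predicted, reference)
print(f"precision={precision:.2f} sensitivity={sensitivity:.2f}")
```

In this toy case two of the three predicted pairs are correct and two of the four reference pairs are recovered, so a stricter or looser matching rule (for example, ignoring the interaction label) would change both numbers.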
Affiliation(s)
- Shahidul Islam
- Department of Computer Science, University of Central Florida, Orlando, FL 32816, USA
- Ping Ge
- Department of Computer Science, University of Central Florida, Orlando, FL 32816, USA
- Shaojie Zhang
- Department of Computer Science, University of Central Florida, Orlando, FL 32816, USA
|
19
|
Zhong C, Zhang S. RNAMotifScanX: a graph alignment approach for RNA structural motif identification. RNA (NEW YORK, N.Y.) 2015; 21:333-346. [PMID: 25595715 PMCID: PMC4338331 DOI: 10.1261/rna.044891.114] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.2] [Received: 02/16/2014] [Accepted: 11/28/2014] [Indexed: 06/04/2023]
Abstract
RNA structural motifs are recurrent three-dimensional (3D) components found in the RNA architecture. These RNA structural motifs play important structural or functional roles and usually exhibit highly conserved 3D geometries and base-interaction patterns. Analysis of the RNA 3D structures and elucidation of their molecular functions heavily rely on efficient and accurate identification of these motifs. However, efficient RNA structural motif search tools are lacking due to the high complexity of these motifs. In this work, we present RNAMotifScanX, a motif search tool based on a base-interaction graph alignment algorithm. This novel algorithm enables automatic identification of both partially and fully matched motif instances. RNAMotifScanX considers noncanonical base-pairing interactions, base-stacking interactions, and sequence conservation of the motifs, which leads to significantly improved sensitivity and specificity as compared with other state-of-the-art search tools. RNAMotifScanX also adopts a carefully designed branch-and-bound technique, which enables ultra-fast search of large kink-turn motifs against a 23S rRNA. The software package RNAMotifScanX is implemented using GNU C++, and is freely available from http://genome.ucf.edu/RNAMotifScanX.
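The idea of matching a motif through its base-interaction graph can be sketched as follows. This is a brute-force stand-in under stated assumptions, not the RNAMotifScanX algorithm: the graph encoding, the function names, the undirected treatment of interaction labels, and the toy graphs are all hypothetical, and the branch-and-bound search, stacking scores, and sequence terms described in the abstract are not reproduced.

```python
from itertools import permutations
from typing import Dict, FrozenSet, List, Tuple

# Hypothetical encoding: an unordered nucleotide pair maps to an interaction label.
# Labels are treated as undirected here, which is a simplification.
Graph = Dict[FrozenSet[int], str]


def matched_edges(query: Graph, target: Graph, mapping: Dict[int, int]) -> int:
    """Count query interactions whose mapped pair carries the same label in the target."""
    count = 0
    for pair, label in query.items():
        a, b = tuple(pair)
        image = frozenset((mapping[a], mapping[b]))
        if target.get(image) == label:
            count += 1
    return count


def best_mapping(query: Graph, query_nodes: List[int],
                 target: Graph, target_nodes: List[int]) -> Tuple[int, Dict[int, int]]:
    """Try every injective node mapping (feasible only for tiny graphs)."""
    best_score, best_map = -1, {}
    for image in permutations(target_nodes, len(query_nodes)):
        mapping = dict(zip(query_nodes, image))
        score = matched_edges(query, target, mapping)
        if score > best_score:
            best_score, best_map = score, mapping
    return best_score, best_map


# Toy graphs, invented for illustration only.
query = {frozenset((0, 3)): "cWW", frozenset((1, 2)): "tHS", frozenset((0, 1)): "stack"}
target = {frozenset((10, 13)): "cWW", frozenset((11, 12)): "tHS", frozenset((10, 11)): "cSS"}

score, mapping = best_mapping(query, [0, 1, 2, 3], target, [10, 11, 12, 13])
print(score, "of", len(query), "query interactions matched")
```

The toy target lacks any stacking interaction, so at most two of the three query interactions can be matched, which corresponds to what the abstract calls a partially matched instance. The exhaustive search grows factorially with graph size, which is why a pruning strategy is needed in practice.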
Affiliation(s)
- Cuncong Zhong
- Department of Electrical Engineering and Computer Science, University of Central Florida, Orlando, Florida 32816, USA
- Shaojie Zhang
- Department of Electrical Engineering and Computer Science, University of Central Florida, Orlando, Florida 32816, USA
|
20
|
Bottaro S, Di Palma F, Bussi G. The role of nucleobase interactions in RNA structure and dynamics. Nucleic Acids Res 2014; 42:13306-14. [PMID: 25355509 PMCID: PMC4245972 DOI: 10.1093/nar/gku972] [Citation(s) in RCA: 93] [Impact Index Per Article: 9.3] [Indexed: 01/08/2023] Open
Abstract
The intricate network of interactions observed in RNA three-dimensional structures is often described in terms of a multitude of geometrical properties, including helical parameters, base pairing/stacking, hydrogen bonding and backbone conformation. We show that a simple molecular representation consisting of one oriented bead per nucleotide can account for the fundamental structural properties of RNA. In this framework, canonical Watson-Crick, non-Watson-Crick base-pairing and base-stacking interactions can be unambiguously identified within a well-defined interaction shell. We validate this representation by performing two independent, complementary tests. First, we use it to construct a sequence-independent, knowledge-based scoring function for RNA structural prediction, which compares favorably to fully atomistic, state-of-the-art techniques. Second, we define a metric to measure deviation between RNA structures that directly reports on the differences in the base–base interaction network. The effectiveness of this metric is tested with respect to its ability to discriminate between structurally and kinetically distant RNA conformations, where it performs better than standard techniques. Taken together, our results suggest that this minimalist, nucleobase-centric representation captures the main interactions that are relevant for describing RNA structure and dynamics.
Affiliation(s)
- Sandro Bottaro
- Scuola Internazionale Superiore di Studi Avanzati, International School for Advanced Studies, 265, Via Bonomea I-34136 Trieste, Italy
- Francesco Di Palma
- Scuola Internazionale Superiore di Studi Avanzati, International School for Advanced Studies, 265, Via Bonomea I-34136 Trieste, Italy
- Giovanni Bussi
- Scuola Internazionale Superiore di Studi Avanzati, International School for Advanced Studies, 265, Via Bonomea I-34136 Trieste, Italy
|
21
|
He G, Steppi A, Laborde J, Srivastava A, Zhao P, Zhang J. RASS: a web server for RNA alignment in the joint sequence-structure space. Nucleic Acids Res 2014; 42:W377-81. [PMID: 24831547 PMCID: PMC4086137 DOI: 10.1093/nar/gku429] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.7] [Indexed: 11/22/2022] Open
Abstract
Comparison of ribonucleic acid (RNA) molecules is important for revealing their evolutionary relationships, predicting their functions and predicting their structures. Many methods have been developed for comparing RNAs using either sequence or three-dimensional (3D) structure (backbone geometry) information. Sequences and 3D structures contain non-overlapping sets of information that both determine RNA functions. When comparing RNA 3D structures, both types of information need to be taken into account. However, few methods compare RNA structures using both sequence and 3D structure information. Recently, we have developed a new method based on elastic shape analysis (ESA) that compares RNA molecules by combining both sequence and 3D structure information. ESA treats RNA structures as 3D curves with sequence information encoded on additional coordinates so that the alignment can be performed in the joint sequence-structure space. The similarity between two RNA molecules is quantified by a formal distance, the geodesic distance. In this study, we implement a web server for the method, called RASS, to make it publicly available to the research community. The web server is located at http://cloud.stat.fsu.edu/RASS/.
Affiliation(s)
- Gewen He
- Department of Computer Science, Florida State University, Tallahassee, FL 32306, USA
- Albert Steppi
- Department of Statistics, Florida State University, Tallahassee, FL 32306, USA
- Jose Laborde
- Department of Statistics, Florida State University, Tallahassee, FL 32306, USA
- Anuj Srivastava
- Department of Statistics, Florida State University, Tallahassee, FL 32306, USA
- Peixiang Zhao
- Department of Computer Science, Florida State University, Tallahassee, FL 32306, USA
- Jinfeng Zhang
- Department of Statistics, Florida State University, Tallahassee, FL 32306, USA
|
22
|
Sheth P, Cervantes-Cervantes M, Nagula A, Laing C, Wang JTL. Novel features for identifying A-minors in three-dimensional RNA molecules. Comput Biol Chem 2013; 47:240-5. [PMID: 24211672 DOI: 10.1016/j.compbiolchem.2013.10.004] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.2] [Received: 05/18/2013] [Revised: 10/15/2013] [Accepted: 10/16/2013] [Indexed: 01/08/2023]
Abstract
RNA tertiary interactions or tertiary motifs are conserved structural patterns formed by pairwise interactions between nucleotides. They include base-pairing, base-stacking, and base-phosphate interactions. A-minor motifs are the most common tertiary interactions in the large ribosomal subunit. The A-minor motif is a nucleotide triple in which minor groove edges of an adenine base are inserted into the minor groove of neighboring helices, leading to interaction with a stabilizing base pair. We propose here novel features for identifying and predicting A-minor motifs in a given three-dimensional RNA molecule. By utilizing the features together with machine learning algorithms including random forests and support vector machines, we show experimentally that our approach is capable of predicting A-minor motifs in the given RNA molecule effectively, demonstrating the usefulness of the proposed approach. The techniques developed from this work will be useful for molecular biologists and biochemists to analyze RNA tertiary motifs, specifically A-minor interactions.
Affiliation(s)
- Palak Sheth
- Bioinformatics Program, New Jersey Institute of Technology, Newark, NJ 07102, USA
|
23
|
Efficient alignment of RNA secondary structures using sparse dynamic programming. BMC Bioinformatics 2013; 14:269. [PMID: 24011432 PMCID: PMC3871798 DOI: 10.1186/1471-2105-14-269] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.8] [Received: 01/17/2013] [Accepted: 09/03/2013] [Indexed: 12/11/2022] Open
Abstract
Background Current advances of the next-generation sequencing technology have revealed a large number of un-annotated RNA transcripts. Comparative study of the RNA structurome is an important approach to assess their biological functionalities. Due to the large sizes and abundance of the RNA transcripts, an efficient and accurate RNA structure-structure alignment algorithm is in urgent need to facilitate the comparative study. Despite the importance of the RNA secondary structure alignment problem, there are no computational tools available that provide high computational efficiency and accuracy. In this case, designing and implementing such an efficient and accurate RNA secondary structure alignment algorithm is highly desirable. Results In this work, through incorporating the sparse dynamic programming technique, we implemented an algorithm that has an O(n^3) expected time complexity, where n is the average number of base pairs in the RNA structures. This complexity, which can be shown assuming the polymer-zeta property, is confirmed by our experiments. The resulting new RNA secondary structure alignment tool is called ERA. Benchmark results indicate that ERA can significantly speed up RNA structure-structure alignments compared to other state-of-the-art RNA alignment tools, while maintaining high alignment accuracy. Conclusions Using the sparse dynamic programming technique, we are able to develop a new RNA secondary structure alignment tool that is both efficient and accurate. We anticipate that the new alignment algorithm ERA will significantly promote comparative RNA structure studies. The program, ERA, is freely available at http://genome.ucf.edu/ERA.
|
24
|
Theis C, Höner Zu Siederdissen C, Hofacker IL, Gorodkin J. Automated identification of RNA 3D modules with discriminative power in RNA structural alignments. Nucleic Acids Res 2013; 41:9999-10009. [PMID: 24005040 PMCID: PMC3905863 DOI: 10.1093/nar/gkt795] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.1] [Indexed: 12/03/2022] Open
Abstract
Recent progress in predicting RNA structure is moving towards filling the ‘gap’ in 2D RNA structure prediction where, for example, predicted internal loops often form non-canonical base pairs. This is increasingly recognized with the steady increase of known RNA 3D modules. There is a general interest in matching structural modules known from one molecule to other molecules for which the 3D structure is not known yet. We have created a pipeline, metaRNAmodules, which completely automates extracting putative modules from the FR3D database and mapping such modules to Rfam alignments to obtain comparative evidence. Subsequently, the modules, initially represented by a graph, are turned into models for the RMDetect program, which allows testing of their discriminative power using real and randomized Rfam alignments. An initial extraction of 22 495 3D modules in all PDB files results in 977 internal loop and 17 hairpin modules with clear discriminatory power. Many of these modules describe only minor variants of each other. Indeed, mapping of the modules onto Rfam families results in 35 unique locations in 11 different families. The metaRNAmodules pipeline source for the internal loop modules is available at http://rth.dk/resources/mrm.
Affiliation(s)
- Corinna Theis
- Center for non-coding RNA in Technology and Health, Department of Veterinary Clinical and Animal Science, University of Copenhagen, Grønnegårdsvej 3, DK-1870 Frederiksberg, Denmark, Institute for Theoretical Chemistry, University of Vienna, Währingerstraße 17, A-1090 Vienna, Austria and Research Group Bioinformatics and Computational Biology, University of Vienna, Währingerstraße 17, A-1090 Vienna, Austria
|
25
|
Shen Y, Wong HS, Zhang S, Zhang L. RNA structural motif recognition based on least-squares distance. RNA (NEW YORK, N.Y.) 2013; 19:1183-1191. [PMID: 23887146 PMCID: PMC3753925 DOI: 10.1261/rna.037648.112] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/09/2012] [Accepted: 06/13/2013] [Indexed: 06/02/2023]
Abstract
RNA structural motifs are recurrent structural elements occurring in RNA molecules. RNA structural motif recognition aims to find RNA substructures that are similar to a query motif, and it is important for RNA structure analysis and RNA function prediction. In view of this, we propose a new method known as RNA Structural Motif Recognition based on Least-Squares distance (LS-RSMR) to effectively recognize RNA structural motifs. A test set consisting of five types of RNA structural motifs occurring in Escherichia coli ribosomal RNA is compiled by us. Experiments are conducted for recognizing these five types of motifs. The experimental results fully reveal the superiority of the proposed LS-RSMR compared with four other state-of-the-art methods.
Affiliation(s)
- Ying Shen
- School of Software Engineering, Tongji University, Shanghai 200092, China
- Hau-San Wong
- Department of Computer Science, City University of Hong Kong, Kowloon, Hong Kong
- Shaohong Zhang
- Department of Computer Science, Guangzhou University, Guangzhou 510006, China
- Lin Zhang
- School of Software Engineering, Tongji University, Shanghai 200092, China
|
26
|
Ananth P, Goldsmith G, Yathindra N. An innate twist between Crick's wobble and Watson-Crick base pairs. RNA (NEW YORK, N.Y.) 2013; 19:1038-1053. [PMID: 23861536 PMCID: PMC3708525 DOI: 10.1261/rna.036905.112] [Citation(s) in RCA: 22] [Impact Index Per Article: 2.0] [Indexed: 06/02/2023]
Abstract
Non-Watson-Crick pairs like the G·U wobble are frequent in RNA duplexes. Their geometric dissimilarity (nonisostericity) with the Watson-Crick base pairs and among themselves imparts structural variations decisive for biological functions. Through a novel circular representation of base pairs, a simple and general metric scheme for quantification of base-pair nonisostericity, in terms of residual twist and radial difference that can also envisage its mechanistic effect, is proposed. The scheme is exemplified by G·U and U·G wobble pairs, and their predictable local effects on helical twist angle are validated by MD simulations. New insights into a possible rationale for contextual occurrence of G·U and other non-WC pairs, as well as the influence of a G·U pair on other non-Watson-Crick pair neighborhood and RNA-protein interactions are obtained from analysis of crystal structure data. A few instances of RNA-protein interactions along the major groove are documented in addition to the well-recognized interaction of the G·U pair along the minor groove. The nonisostericity-mediated influence of wobble pairs for facilitating helical packing through long-range interactions in ribosomal RNAs is also reviewed.
|
27
|
Laborde J, Robinson D, Srivastava A, Klassen E, Zhang J. RNA global alignment in the joint sequence-structure space using elastic shape analysis. Nucleic Acids Res 2013; 41:e114. [PMID: 23585278 PMCID: PMC3675459 DOI: 10.1093/nar/gkt187] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.2] [Received: 08/09/2012] [Revised: 02/26/2013] [Accepted: 02/27/2013] [Indexed: 01/22/2023] Open
Abstract
The functions of RNAs, like proteins, are determined by their structures, which, in turn, are determined by their sequences. Comparison/alignment of RNA molecules provides an effective means to predict their functions and understand their evolutionary relationships. For RNA sequence alignment, most methods developed for protein and DNA sequence alignment can be directly applied. RNA 3-dimensional structure alignment, on the other hand, tends to be more difficult than protein structure alignment due to the lack of regular secondary structures as observed in proteins. Most of the existing RNA 3D structure alignment methods use only the backbone geometry and ignore the sequence information. Using both the sequence and backbone geometry information in RNA alignment may not only produce more accurate classification, but also deepen our understanding of the sequence-structure-function relationship of RNA molecules. In this study, we developed a new RNA alignment method based on elastic shape analysis (ESA). ESA treats RNA structures as three-dimensional curves with sequence information encoded on additional dimensions so that the alignment can be performed in the joint sequence-structure space. The similarity between two RNA molecules is quantified by a formal distance, the geodesic distance. Based on ESA, a rigorous mathematical framework can be built for RNA structure comparison. Means and covariances of full structures can be defined and computed, and probability distributions on spaces of such structures can be constructed for a group of RNAs. Our method was further applied to predict functions of RNA molecules and showed superior performance compared with previous methods when tested on benchmark datasets. The programs are available at http://stat.fsu.edu/~jinfeng/ESA.html.
Affiliation(s)
- Jose Laborde
- Department of Statistics, Florida State University, FL, USA and Department of Mathematics, Florida State University, FL, USA
- Daniel Robinson
- Department of Statistics, Florida State University, FL, USA and Department of Mathematics, Florida State University, FL, USA
- Anuj Srivastava
- Department of Statistics, Florida State University, FL, USA and Department of Mathematics, Florida State University, FL, USA
- Eric Klassen
- Department of Statistics, Florida State University, FL, USA and Department of Mathematics, Florida State University, FL, USA
- Jinfeng Zhang
- Department of Statistics, Florida State University, FL, USA and Department of Mathematics, Florida State University, FL, USA
|
28
|
Abstract
The recent discoveries of regulatory non-coding RNAs changed our view of RNA as a simple information transfer molecule. Understanding the architecture and function of active RNA molecules requires methods for comparing and analyzing their 3D structures. While structural alignment of short RNAs is achievable in a reasonable amount of time, large structures represent a much bigger challenge. Here, we present the SETTER web server for pairwise RNA structure comparison utilizing the SETTER (SEcondary sTructure-based TERtiary Structure Similarity Algorithm) algorithm. The SETTER method divides an RNA structure into a set of non-overlapping structural elements called generalized secondary structure units (GSSUs). The SETTER algorithm scales as O(n^2) with the size of a GSSU and as O(n) with the number of GSSUs in the structure. This scaling gives SETTER its high speed as the average size of the GSSU remains constant irrespective of the size of the structure. However, the favorable speed of the algorithm does not compromise its accuracy. The SETTER web server together with the stand-alone implementation of the SETTER algorithm are freely accessible at http://siret.cz/setter.
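The scaling argument in this abstract can be illustrated with a toy cost model. The sketch below is not the SETTER implementation. It only assumes, for illustration, that work is quadratic inside each GSSU and additive across GSSUs, and it compares that with a hypothetical method that is quadratic in the size of the whole, undivided structure. The function names and the constant GSSU size of 20 nucleotides are assumptions.

```python
def decomposed_cost(gssu_sizes):
    """Toy cost: quadratic work inside each GSSU, summed over all GSSUs."""
    return sum(n * n for n in gssu_sizes)


def monolithic_cost(gssu_sizes):
    """Toy cost of a quadratic method applied to the undivided structure."""
    total = sum(gssu_sizes)
    return total * total


if __name__ == "__main__":
    # Assumed constant GSSU size (20 nucleotides); only the GSSU count grows.
    for count in (5, 50, 500):
        sizes = [20] * count
        print(count, decomposed_cost(sizes), monolithic_cost(sizes))
```

Under these assumptions, a tenfold increase in the number of GSSUs raises the decomposed cost tenfold but the monolithic cost a hundredfold, which is the sense in which a constant GSSU size keeps the overall growth linear.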
Affiliation(s)
- Petr Cech
- Laboratory of Informatics and Chemistry, Faculty of Chemical Technology, Prague, Czech Republic
|
29
|
Vanegas PL, Hudson GA, Davis AR, Kelly SC, Kirkpatrick CC, Znosko BM. RNA CoSSMos: Characterization of Secondary Structure Motifs--a searchable database of secondary structure motifs in RNA three-dimensional structures. Nucleic Acids Res 2011; 40:D439-44. [PMID: 22127861 PMCID: PMC3245015 DOI: 10.1093/nar/gkr943] [Citation(s) in RCA: 26] [Impact Index Per Article: 2.0] [Indexed: 01/24/2023]
Abstract
RNA secondary structure is important for designing therapeutics, understanding protein–RNA binding and predicting tertiary structure of RNA. Several databases and downloadable programs exist that specialize in the three-dimensional (3D) structure of RNA, but none focus specifically on secondary structural motifs such as internal, bulge and hairpin loops. The RNA Characterization of Secondary Structure Motifs (RNA CoSSMos) database is a freely accessible and searchable online database and website of 3D characteristics of secondary structure motifs. To create the RNA CoSSMos database, 2156 Protein Data Bank (PDB) files were searched for internal, bulge and hairpin loops, and each loop's structural information, including sugar pucker, glycosidic linkage, hydrogen bonding patterns and stacking interactions, was included in the database. False positives were defined, identified and reclassified or omitted from the database to ensure the most accurate results possible. Users can search via general PDB information, experimental parameters, sequence and specific motif and by specific structural parameters in the subquery page after the initial search. Returned results for each search can be viewed individually or a complete set can be downloaded into a spreadsheet to allow for easy comparison. The RNA CoSSMos database is automatically updated weekly and is available at http://cossmos.slu.edu.
Affiliation(s)
- Pamela L Vanegas
- Department of Chemistry, Saint Louis University, Saint Louis, MO 63103, USA
|
30
|
Höner zu Siederdissen C, Bernhart SH, Stadler PF, Hofacker IL. A folding algorithm for extended RNA secondary structures. Bioinformatics 2011; 27:i129-36. [PMID: 21685061 PMCID: PMC3117358 DOI: 10.1093/bioinformatics/btr220] [Citation(s) in RCA: 42] [Impact Index Per Article: 3.2] [Indexed: 12/23/2022]
Abstract
Motivation: RNA secondary structure contains many non-canonical base pairs of different pair families. Successful prediction of these structural features leads to improved secondary structures with applications in tertiary structure prediction and simultaneous folding and alignment. Results: We present a theoretical model capturing both RNA pair families and extended secondary structure motifs with shared nucleotides using 2-diagrams. We accompany this model with a number of programs for parameter optimization and structure prediction. Availability: All sources (optimization routines, RNA folding, RNA evaluation, extended secondary structure visualization) are published under the GPLv3 and available at www.tbi.univie.ac.at/software/rnawolf/. Contact: choener@tbi.univie.ac.at
|
31
|
Zhong C, Zhang S. Clustering RNA structural motifs in ribosomal RNAs using secondary structural alignment. Nucleic Acids Res 2011; 40:1307-17. [PMID: 21976732 PMCID: PMC3273805 DOI: 10.1093/nar/gkr804] [Citation(s) in RCA: 20] [Impact Index Per Article: 1.5] [Indexed: 12/02/2022]
Abstract
RNA structural motifs are the building blocks of the complex RNA architecture. Identification of non-coding RNA structural motifs is a critical step towards understanding their structures and functionalities. In this article, we present a clustering approach for de novo RNA structural motif identification. We applied our approach to a data set containing 5S, 16S and 23S rRNAs and rediscovered many known motifs including GNRA tetraloop, kink-turn, C-loop, sarcin–ricin, reverse kink-turn, hook-turn, E-loop and tandem-sheared motifs, with higher accuracy than the state-of-the-art clustering method. We also identified a number of potential novel instances of GNRA tetraloop, kink-turn, sarcin–ricin and tandem-sheared motifs. More importantly, several novel structural motif families have been revealed by our clustering analysis. We identified a highly asymmetric bulge loop motif that resembles a rope sling. We also found an internal loop motif that can significantly increase the twist of the helix. Finally, we discovered a subfamily of the hexaloop motif, which has a significantly different geometry compared to the currently known hexaloop motif. Our discoveries presented in this article have largely increased current knowledge of RNA structural motifs.
Affiliation(s)
- Cuncong Zhong
- Department of Electrical Engineering and Computer Science, University of Central Florida, Orlando, FL 32816, USA
|
32
|
Shen Y, Wong HS, Zhang S, Yu Z. Feature-based 3D motif filtering for ribosomal RNA. Bioinformatics 2011; 27:2828-35. [PMID: 21873638 DOI: 10.1093/bioinformatics/btr495] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.2] [Indexed: 11/14/2022]
Abstract
MOTIVATION RNA 3D motifs are recurrent substructures in an RNA subunit and are building blocks of the RNA architecture. They play an important role in binding proteins and consolidating RNA tertiary structures. RNA 3D motif searching consists of two steps: candidate generation and candidate filtering. We proposed a novel method, known as Feature-based RNA Motif Filtering (FRMF), for identifying motifs based on a set of moment invariants and the Earth Mover's Distance in the second step. RESULTS A positive set of RNA motifs belonging to six characteristic types, with eight subtypes occurring in HM 50S, is compiled by us. The proposed method is validated on this representative set. FRMF successfully finds most of the positive fragments. Besides the proposed new method and the compiled positive set, we also recognize some new motifs, in particular a π-turn and some non-standard A-minor motifs are found. These newly discovered motifs provide more information about RNA structure conformation. AVAILABILITY Matlab code can be downloaded from www.cs.cityu.edu.hk/~yingshen/FRMF.html CONTACT cshswong@cityu.edu.hk SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
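The Earth Mover's Distance named in this abstract can be sketched in its simplest form. The example below is not FRMF's filtering step: the abstract does not specify the feature layout or ground distance, so this assumes two one-dimensional histograms over the same bins with unit spacing between adjacent bins, in which case the distance reduces to the summed absolute difference of the two cumulative distributions. The function name `emd_1d` is hypothetical.

```python
from itertools import accumulate


def emd_1d(p, q):
    """Earth Mover's Distance between two 1D histograms on identical bins.

    Assumes unit spacing between adjacent bins. Both histograms are
    normalized to total mass 1 before comparison.
    """
    if len(p) != len(q):
        raise ValueError("histograms must have the same number of bins")
    total_p, total_q = sum(p), sum(q)
    if total_p <= 0 or total_q <= 0:
        raise ValueError("histograms must have positive total mass")
    norm_p = [x / total_p for x in p]
    norm_q = [x / total_q for x in q]
    # In 1D, the optimal transport cost equals the L1 distance between CDFs.
    return sum(abs(a - b) for a, b in zip(accumulate(norm_p), accumulate(norm_q)))


if __name__ == "__main__":
    # All mass moves from the first bin to the third bin: two bin widths.
    print(emd_1d([1, 0, 0], [0, 0, 1]))
```

A candidate fragment whose feature histogram lies within some chosen distance of a reference motif's histogram could then be kept, with the threshold being a further assumption that the abstract does not state.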
Affiliation(s)
- Ying Shen
- Department of Computer Science, City University of Hong Kong, Kowloon, Hong Kong
|
33
|
Sklenovský P, Florová P, Banáš P, Réblová K, Lankaš F, Otyepka M, Šponer J. Understanding RNA Flexibility Using Explicit Solvent Simulations: The Ribosomal and Group I Intron Reverse Kink-Turn Motifs. J Chem Theory Comput 2011; 7:2963-80. [PMID: 26605485 DOI: 10.1021/ct200204t] [Citation(s) in RCA: 43] [Impact Index Per Article: 3.3] [Indexed: 12/20/2022]
Abstract
Reverse kink-turn is a recurrent elbow-like RNA building block occurring in the ribosome and in the group I intron. Its sequence signature almost matches that of the conventional kink-turn. However, the reverse and conventional kink-turns have opposite directions of bending. The reverse kink-turn lacks basically any tertiary interaction between its stems. We report unrestrained, explicit solvent molecular dynamics simulations of ribosomal and intron reverse kink-turns (54 simulations with 7.4 μs of data in total) with different variants (ff94, ff99, ff99bsc0, ff99χOL, and ff99bsc0χOL) of the Cornell et al. force field. We test several ion conditions and two water models. The simulations characterize the directional intrinsic flexibility of reverse kink-turns pertinent to their folded functional geometries. The reverse kink-turns are the most flexible RNA motifs studied so far by explicit solvent simulations which are capable at the present simulation time scale to spontaneously and reversibly sample a wide range of geometries from tightly kinked ones through flexible intermediates up to extended, unkinked structures. A possible biochemical role of the flexibility is discussed. Among the tested force fields, the latest χOL variant is essential to obtaining stable trajectories while all force field versions lacking the χ correction are prone to a swift degradation toward senseless ladder-like structures of stems, characterized by high-anti glycosidic torsions. The type of explicit water model affects the simulations considerably more than concentration and the type of ions.
Affiliation(s)
- Petr Sklenovský
- Regional Centre of Advanced Technologies and Materials, Department of Physical Chemistry, Faculty of Science, Palacky University Olomouc, tr. 17. listopadu 12, 771 46 Olomouc, Czech Republic
- Petra Florová
- Regional Centre of Advanced Technologies and Materials, Department of Physical Chemistry, Faculty of Science, Palacky University Olomouc, tr. 17. listopadu 12, 771 46 Olomouc, Czech Republic
- Pavel Banáš
- Regional Centre of Advanced Technologies and Materials, Department of Physical Chemistry, Faculty of Science, Palacky University Olomouc, tr. 17. listopadu 12, 771 46 Olomouc, Czech Republic
- Kamila Réblová
- Institute of Biophysics, Academy of Sciences of the Czech Republic, Kralovopolska 135, 612 65 Brno, Czech Republic
- Filip Lankaš
- Centre for Complex Molecular Systems and Biomolecules, Institute of Organic Chemistry and Biochemistry, Flemingovo nam. 2, 166 10 Praha 6, Czech Republic
- Michal Otyepka
- Regional Centre of Advanced Technologies and Materials, Department of Physical Chemistry, Faculty of Science, Palacky University Olomouc, tr. 17. listopadu 12, 771 46 Olomouc, Czech Republic
- Jiří Šponer
- Institute of Biophysics, Academy of Sciences of the Czech Republic, Kralovopolska 135, 612 65 Brno, Czech Republic
|
34
|
Sequence-based identification of 3D structural modules in RNA with RMDetect. Nat Methods 2011; 8:513-21. [PMID: 21552257 DOI: 10.1038/nmeth.1603] [Citation(s) in RCA: 67] [Impact Index Per Article: 5.2] [Received: 11/08/2010] [Accepted: 04/11/2011] [Indexed: 01/24/2023]
Abstract
Structural RNA modules, sets of ordered non-Watson-Crick base pairs embedded between Watson-Crick pairs, have central roles as architectural organizers and sites of ligand binding in RNA molecules, and are recurrently observed in RNA families throughout the phylogeny. Here we describe a computational tool, RNA three-dimensional (3D) modules detection, or RMDetect, for identifying known 3D structural modules in single and multiple RNA sequences in the absence of any other information. Currently, four modules can be searched for: G-bulge loop, kink-turn, C-loop and tandem-GA loop. In control test sequences we found all of the known modules with a false discovery rate of 0.23. Scanning through 1,444 publicly available alignments, we identified 21 yet unreported modules and 141 known modules. RMDetect can be used to refine RNA 2D structure, assemble RNA 3D models, and search and annotate structured RNAs in genomic data.
|