1
Khan N, Rahaman M, Zhang S. GINClus: RNA structural motif clustering using graph isomorphism network. NAR Genom Bioinform 2025; 7:lqaf050. [PMID: 40290315 PMCID: PMC12034103 DOI: 10.1093/nargab/lqaf050] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/22/2024] [Revised: 04/02/2025] [Accepted: 04/15/2025] [Indexed: 04/30/2025] [Open Access]
Abstract
Ribonucleic acid (RNA) structural motif identification is a crucial step for understanding RNA structure and functionality. Due to the complexity and variations of RNA 3D structures, identifying RNA structural motifs is challenging and time-consuming. In particular, discovering new RNA structural motif families is a hard problem that still largely depends on manual analysis. In this paper, we propose an RNA structural motif clustering tool, named GINClus, which uses a semi-supervised deep learning model to cluster RNA motif candidates (RNA loop regions) based on both base interaction and 3D structure similarities. GINClus converts the base interactions and 3D structures of RNA motif candidates into graph representations and then clusters the candidates by structural similarity, using a graph isomorphism network (GIN) model in combination with K-means and hierarchical agglomerative clustering. GINClus has a clustering accuracy of 87.88% for known internal loop motifs and 97.69% for known hairpin loop motifs. Using GINClus, we successfully clustered the motifs of the same families together and were able to find 927 new instances of Sarcin-ricin, Kink-turn, Tandem-shear, Hook-turn, E-loop, C-loop, T-loop, and GNRA loop motif families. We also identified 12 new RNA structural motif families with unique structure and base-pair interactions.
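The graph-based step in this abstract can be illustrated with a toy example. The sketch below is not GINClus code: it assumes a hypothetical four-nucleotide motif candidate, one-hot base features, and untyped edges, and it shows only the neighbour-sum aggregation that characterises a GIN layer. The learned MLP and any interaction-type edge features are omitted.

```python
from typing import Dict, List

Adjacency = Dict[int, List[int]]
Features = Dict[int, List[float]]


def gin_aggregate(adj: Adjacency, feats: Features, eps: float = 0.0) -> Features:
    """One GIN-style aggregation: (1 + eps) * h_v plus the sum of neighbour features."""
    out: Features = {}
    for v, h_v in feats.items():
        agg = [(1.0 + eps) * x for x in h_v]
        for u in adj.get(v, []):
            agg = [a + b for a, b in zip(agg, feats[u])]
        out[v] = agg
    return out


def sum_readout(feats: Features) -> List[float]:
    """Graph-level embedding: element-wise sum over all node vectors."""
    dim = len(next(iter(feats.values())))
    return [sum(h[i] for h in feats.values()) for i in range(dim)]


# Hypothetical motif candidate G-A-A-C: backbone edges 0-1, 1-2, 2-3
# and one closing base pair 0-3 (illustrative, not taken from the paper).
adjacency = {0: [1, 3], 1: [0, 2], 2: [1, 3], 3: [2, 0]}
one_hot = {
    0: [0.0, 0.0, 1.0, 0.0],  # G
    1: [1.0, 0.0, 0.0, 0.0],  # A
    2: [1.0, 0.0, 0.0, 0.0],  # A
    3: [0.0, 1.0, 0.0, 0.0],  # C
}  # feature order: A, C, G, U

hidden = gin_aggregate(adjacency, one_hot)
embedding = sum_readout(hidden)
print(embedding)
```

Graph-level vectors of this kind are what a downstream clustering step (K-means followed by hierarchical agglomerative clustering, in the abstract's description) would operate on.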
Affiliation(s)
- Nabila Shahnaz Khan
  - Department of Computer Science, University of Central Florida, Orlando, FL 32816, United States
- Md Mahfuzur Rahaman
  - Department of Computer Science, University of Central Florida, Orlando, FL 32816, United States
- Shaojie Zhang
  - Department of Computer Science, University of Central Florida, Orlando, FL 32816, United States
2
Karan A, Rivas E. All-at-once RNA folding with 3D motif prediction framed by evolutionary information. Research Square 2025:rs.3.rs-5664139. [PMID: 40195991 PMCID: PMC11974997 DOI: 10.21203/rs.3.rs-5664139/v1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 04/09/2025]
Abstract
Structural RNAs exhibit a vast array of recurrent short 3D elements involving non-Watson-Crick interactions that help arrange canonical double helices into tertiary structures. We present CaCoFold-R3D, a probabilistic grammar that predicts these RNA 3D motifs (also termed modules) jointly with RNA secondary structure over a sequence or alignment. CaCoFold-R3D uses evolutionary information present in an RNA alignment to reliably identify canonical helices (including pseudoknots) by covariation. We further introduce the R3D grammars, which also exploit helix covariation that constrains the positioning of the mostly non-covarying RNA 3D motifs. Our method runs predictions over an almost-exhaustive list of over fifty known RNA motifs (everything). Motifs can appear in any non-helical loop region (including 3-way, 4-way and higher junctions) (everywhere). All structural motifs as well as the canonical helices are arranged into one single structure predicted by one single joint probabilistic grammar (all-at-once). Our results demonstrate that CaCoFold-R3D is a valid alternative for predicting the all-residue interactions present in an RNA 3D structure. Furthermore, CaCoFold-R3D is fast and easily customizable for novel motif discovery.
3
Karan A, Rivas E. All-at-once RNA folding with 3D motif prediction framed by evolutionary information. bioRxiv 2024:2024.12.17.628809. [PMID: 39764046 PMCID: PMC11702757 DOI: 10.1101/2024.12.17.628809] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/15/2025]
Abstract
Structural RNAs exhibit a vast array of recurrent short 3D elements involving non-Watson-Crick interactions that help arrange canonical double helices into tertiary structures. We present CaCoFold-R3D, a probabilistic grammar that predicts these RNA 3D motifs (also termed modules) jointly with RNA secondary structure over a sequence or alignment. CaCoFold-R3D uses evolutionary information present in an RNA alignment to reliably identify canonical helices (including pseudoknots) by covariation. We further introduce the R3D grammars, which also exploit helix covariation that constrains the positioning of the mostly non-covarying RNA 3D motifs. Our method runs predictions over an almost-exhaustive list of over fifty known RNA motifs (everything). Motifs can appear in any non-helical loop region (including 3-way, 4-way and higher junctions) (everywhere). All structural motifs as well as the canonical helices are arranged into one single structure predicted by one single joint probabilistic grammar (all-at-once). Our results demonstrate that CaCoFold-R3D is a valid alternative for predicting the all-residue interactions present in an RNA 3D structure. Furthermore, CaCoFold-R3D is fast and easily customizable for novel motif discovery.
4
Mondal M, Gao YQ. Atomistic Insights into Sequence-Mediated Spontaneous Association of Short RNA Chains. Biochemistry 2024; 63:2916-2936. [PMID: 39377398 DOI: 10.1021/acs.biochem.4c00293] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/09/2024]
Abstract
RNA-RNA association and phase separation appear to be essential for the assembly of stress granules and underlie RNA foci formation in repeat expansion disorders. RNA molecules are found to play a significant role in gene-regulatory functions via condensate formation among themselves or with RNA-binding proteins. The interplay between driven versus spontaneous processes is likely to be an important factor for controlling the formation of RNA-mediated biomolecular condensate. However, the sequence-specific interactions and molecular mechanisms that drive the spontaneous RNA-RNA association and help to form RNA-mediated phase-separated condensate remain unclear. With microseconds-long atomistic molecular simulations here, we report how essential aspects of RNA chains, namely, base composition, metal ion binding, and hydration properties, contribute to the association of the series of simplest biologically relevant homopolymeric and heteropolymeric short RNA chains. We show that spontaneous processes make the key contributions governed by the sequence-intrinsic properties of RNA chains, where the definite roles of base-specific hydrogen bonding and stacking interactions are prominent in the association of the RNA chains. Purine versus pyrimidine contents of RNA chains can directly influence the association properties of RNA chains by modulating hydrogen bonding and base stacking interactions. This study determines the impact of ionic environment in sequence-specific spontaneous association of short RNA chains, hydration features, and base-specific interactions of Na+, K+, and Mg2+ ions with RNA chains.
Affiliation(s)
- Manas Mondal
  - Institute of Systems and Physical Biology, Shenzhen Bay Laboratory, 518107 Shenzhen, China
- Yi Qin Gao
  - Institute of Systems and Physical Biology, Shenzhen Bay Laboratory, 518107 Shenzhen, China
  - Beijing National Laboratory for Molecular Sciences, College of Chemistry and Molecular Engineering, Peking University, 100871 Beijing, China
  - Biomedical Pioneering Innovation Center, Peking University, 100871 Beijing, China
  - Changping Laboratory, Beijing 102200, China
5
Appasamy SD, Zirbel CL. R3DMCS: a web server for visualizing structural variation in RNA motifs across experimental 3D structures from the same organism or across species. Bioinformatics 2024; 40:btae682. [PMID: 39546379 PMCID: PMC11588024 DOI: 10.1093/bioinformatics/btae682] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/02/2024] [Revised: 10/17/2024] [Accepted: 11/13/2024] [Indexed: 11/17/2024] [Open Access]
Abstract
MOTIVATION: The recent progress in RNA structure determination methods has resulted in a surge of newly solved RNA 3D structures. However, there is an absence of a user-friendly browser-based tool that can facilitate the comparison and visualization of RNA motifs across multiple 3D structures.
RESULTS: We introduce R3DMCS, a web server that allows users to compare selected RNA nucleotides across all 3D structures of a given molecule from a given species, or across all 3D structures mapped to a single Rfam family. Starting from one instance of the motif, R3DMCS retrieves, aligns, annotates, organizes, and displays 3D coordinates of corresponding sets of nucleotides from other 3D structures. With R3DMCS, one can explore conformational changes of motifs due to 3D structures being solved in different functional states or different experimental conditions. One can also investigate conservation of 3D structure across species, or changes in 3D structure due to changes in sequence.
AVAILABILITY AND IMPLEMENTATION: R3DMCS is open-source software and freely available at https://rna.bgsu.edu/correspondence/ and https://github.com/BGSU-RNA/RNA-3D-correspondence.
Affiliation(s)
- Sri Devan Appasamy
  - Protein Data Bank in Europe, European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, United Kingdom
  - Department of Biological Sciences, Bowling Green State University, Bowling Green, OH 43403, United States
- Craig L Zirbel
  - Department of Mathematics and Statistics, Bowling Green State University, Bowling Green, OH 43403, United States
6
Rahaman MM, Zhang S. RNAMotifProfile: a graph-based approach to build RNA structural motif profiles. NAR Genom Bioinform 2024; 6:lqae128. [PMID: 39328267 PMCID: PMC11426329 DOI: 10.1093/nargab/lqae128] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/14/2024] [Revised: 07/24/2024] [Accepted: 09/09/2024] [Indexed: 09/28/2024] [Open Access]
Abstract
RNA structural motifs are the recurrent segments in RNA three-dimensional structures that play a crucial role in the functional diversity of RNAs. Understanding the similarities and variations within these recurrent motif groups is essential for gaining insights into RNA structure and function. While recurrent structural motifs are generally assumed to be composed of the same isosteric base interactions, this consistent pattern is not observed across all examples of these motifs. Existing methods for analyzing and comparing RNA structural motifs may overlook variations in base interactions and associated nucleotides. RNAMotifProfile is a novel profile-to-profile alignment algorithm that generates a comprehensive profile from a group of structural motifs, incorporating all base interactions and associated nucleotides at each position. By structurally aligning input motif instances using a guide-tree-based approach, RNAMotifProfile captures the similarities and variations within recurrent motif groups. Additionally, RNAMotifProfile can function as a motif search tool, enabling the identification of instances of a specific motif family by searching with the corresponding profile. The ability to generate accurate and comprehensive profiles for RNA structural motif families, and to search for these motifs, facilitates a deeper understanding of RNA structure-function relationships and potential applications in RNA engineering and therapeutic design.
Affiliation(s)
- Md Mahfuzur Rahaman
  - Department of Computer Science, University of Central Florida, 4328 Scorpius Street, Orlando, FL 32816-2362, USA
- Shaojie Zhang
  - Department of Computer Science, University of Central Florida, 4328 Scorpius Street, Orlando, FL 32816-2362, USA
7
McCann HM, Meade CD, Banerjee B, Penev PI, Dean Williams L, Petrov AS. RiboVision2: A Web Server for Advanced Visualization of Ribosomal RNAs. J Mol Biol 2024; 436:168556. [PMID: 39237196 DOI: 10.1016/j.jmb.2024.168556] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 01/31/2024] [Revised: 03/24/2024] [Accepted: 03/25/2024] [Indexed: 09/07/2024]
Abstract
RiboVision2 is a web server designed to visualize phylogenetic, structural, and evolutionary properties of ribosomal RNAs simultaneously at the levels of primary, secondary, and three-dimensional structure and in the context of full ribosomal complexes. RiboVision2 instantly computes and displays a broad variety of data; it has no login requirements, is open-source, free for all users, and available at https://ribovision2.chemistry.gatech.edu.
Affiliation(s)
- Holly M McCann
  - NASA Center for the Origin of Life, Georgia Institute of Technology, Atlanta, GA 30332-0400, USA; School of Biological Sciences, Georgia Institute of Technology, Atlanta, GA 30332, USA
- Caeden D Meade
  - NASA Center for the Origin of Life, Georgia Institute of Technology, Atlanta, GA 30332-0400, USA; School of Biological Sciences, Georgia Institute of Technology, Atlanta, GA 30332, USA
- Biswajit Banerjee
  - NASA Center for the Origin of Life, Georgia Institute of Technology, Atlanta, GA 30332-0400, USA; School of Biological Sciences, Georgia Institute of Technology, Atlanta, GA 30332, USA
- Petar I Penev
  - NASA Center for the Origin of Life, Georgia Institute of Technology, Atlanta, GA 30332-0400, USA; School of Biological Sciences, Georgia Institute of Technology, Atlanta, GA 30332, USA
- Loren Dean Williams
  - NASA Center for the Origin of Life, Georgia Institute of Technology, Atlanta, GA 30332-0400, USA; School of Biological Sciences, Georgia Institute of Technology, Atlanta, GA 30332, USA
- Anton S Petrov
  - NASA Center for the Origin of Life, Georgia Institute of Technology, Atlanta, GA 30332-0400, USA; School of Biological Sciences, Georgia Institute of Technology, Atlanta, GA 30332, USA
8
Szikszai M, Magnus M, Sanghi S, Kadyan S, Bouatta N, Rivas E. RNA3DB: A structurally-dissimilar dataset split for training and benchmarking deep learning models for RNA structure prediction. J Mol Biol 2024; 436:168552. [PMID: 38552946 PMCID: PMC11377173 DOI: 10.1016/j.jmb.2024.168552] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/30/2024] [Revised: 03/19/2024] [Accepted: 03/22/2024] [Indexed: 04/09/2024]
Abstract
With advances in protein structure prediction thanks to deep learning models like AlphaFold, RNA structure prediction has recently received increased attention from deep learning researchers. RNAs introduce substantial challenges due to the sparser availability and lower structural diversity of the experimentally resolved RNA structures in comparison to protein structures. These challenges are often poorly addressed by the existing literature, much of which reports inflated performance due to using training and testing sets with significant structural overlap. Further, the most recent Critical Assessment of Structure Prediction (CASP15) has shown that deep learning models for RNA structure are currently outperformed by traditional methods. In this paper we present RNA3DB, a dataset of structured RNAs, derived from the Protein Data Bank (PDB), that is designed for training and benchmarking deep learning models. The RNA3DB method arranges the RNA 3D chains into distinct groups (Components) that are non-redundant with regard to both sequence and structure, providing a robust way of dividing training, validation, and testing sets. Any split of these structurally-dissimilar Components is guaranteed to produce test and validation sets that are distinct by sequence and structure from those in the training set. We provide the RNA3DB dataset, a particular train/test split of the RNA3DB Components (in an approximate 70/30 ratio) that will be updated periodically. We also provide the RNA3DB methodology along with the source-code, with the goal of creating a reproducible and customizable tool for producing structurally-dissimilar dataset splits for structural RNAs.
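The idea of splitting by whole Components rather than by individual chains can be sketched as follows. This is a minimal illustration, not the RNA3DB implementation: the function name `split_components`, the greedy fill, and the toy chain identifiers are all assumptions, and the Components themselves are taken as given (the sketch does not reproduce how RNA3DB derives them from sequence and structure).

```python
from typing import Dict, List, Tuple


def split_components(
    components: Dict[str, List[str]], train_fraction: float = 0.7
) -> Tuple[List[str], List[str]]:
    """Assign whole components to train or test; a component is never split."""
    total = sum(len(chains) for chains in components.values())
    target = train_fraction * total
    train: List[str] = []
    test: List[str] = []
    train_size = 0
    # Visit the largest components first so the greedy fill lands near the target.
    for name in sorted(components, key=lambda n: (-len(components[n]), n)):
        size = len(components[name])
        if train_size + size <= target:
            train.append(name)
            train_size += size
        else:
            test.append(name)
    return train, test


# Hypothetical components with made-up chain identifiers.
toy = {
    "component_0": ["1abc_A", "2def_B", "3ghi_C", "4jkl_A"],
    "component_1": ["5mno_A", "6pqr_A", "7stu_B"],
    "component_2": ["8vwx_A", "9yza_A", "1bcd_B"],
    "component_3": ["2cde_C"],
}
train, test = split_components(toy)
print(train, test)
```

Because every chain of a component ends up on the same side, no test chain shares a component with a training chain, which is the property the abstract relies on.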
Affiliation(s)
- Marcell Szikszai
  - Department of Molecular and Cellular Biology, Harvard University, Cambridge, 02138, MA, USA
- Marcin Magnus
  - Department of Molecular and Cellular Biology, Harvard University, Cambridge, 02138, MA, USA
- Siddhant Sanghi
  - Department of Systems Biology, Columbia University, New York 10027, NY, USA; College of Biological Sciences, UC Davis, Davis 95616, CA, USA
- Sachin Kadyan
  - Department of Systems Biology, Columbia University, New York 10027, NY, USA
- Nazim Bouatta
  - Laboratory of Systems Pharmacology, Harvard Medical School, Boston 02115, MA, USA
- Elena Rivas
  - Department of Molecular and Cellular Biology, Harvard University, Cambridge, 02138, MA, USA
9
Zhang C, Freddolino L. FURNA: A database for functional annotations of RNA structures. PLoS Biol 2024; 22:e3002476. [PMID: 39074139 PMCID: PMC11309384 DOI: 10.1371/journal.pbio.3002476] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/28/2023] [Revised: 08/08/2024] [Accepted: 06/24/2024] [Indexed: 07/31/2024] [Open Access]
Abstract
Despite the increasing number of 3D RNA structures in the Protein Data Bank, the majority of experimental RNA structures lack thorough functional annotations. As the significance of the functional roles played by noncoding RNAs becomes increasingly apparent, comprehensive annotation of RNA function is becoming a pressing concern. In response to this need, we have developed FURNA (Functions of RNAs), the first database for experimental RNA structures that aims to provide a comprehensive repository of high-quality functional annotations. These include Gene Ontology terms, Enzyme Commission numbers, ligand-binding sites, RNA families, protein-binding motifs, and cross-references to related databases. FURNA is available at https://seq2fun.dcmb.med.umich.edu/furna/ to enable quick discovery of RNA functions from their structures and sequences.
Affiliation(s)
- Chengxin Zhang
  - Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, Michigan, United States of America
  - Department of Biological Chemistry, University of Michigan, Ann Arbor, Michigan, United States of America
- Lydia Freddolino
  - Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, Michigan, United States of America
  - Department of Biological Chemistry, University of Michigan, Ann Arbor, Michigan, United States of America
10
Chol A, Sarrazin-Gendron R, Lécuyer É, Blanchette M, Waldispühl J. PERFUMES: pipeline to extract RNA functional motifs and exposed structures. Bioinformatics 2024; 40:btae056. [PMID: 38291894 PMCID: PMC10868343 DOI: 10.1093/bioinformatics/btae056] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/27/2023] [Revised: 11/28/2023] [Accepted: 01/28/2024] [Indexed: 02/01/2024] [Open Access]
Abstract
MOTIVATION: Up to 75% of the human genome encodes RNAs. The function of many non-coding RNAs relies on their ability to fold into 3D structures. Specifically, nucleotides inside secondary structure loops form non-canonical base pairs that help stabilize complex local 3D structures. These RNA 3D motifs can promote specific interactions with other molecules or serve as catalytic sites.
RESULTS: We introduce PERFUMES, a computational pipeline to identify 3D motifs that can be associated with observable features. Given a set of RNA sequences with associated binary experimental measurements, PERFUMES searches for RNA 3D motifs using BayesPairing2 and extracts those that are over-represented in the set of positive sequences. It also conducts a thermodynamics analysis of the structural context that can support the interpretation of the predictions. We illustrate PERFUMES' usage on the SNRPA protein binding site, for which the tool retrieved both previously known binder motifs and new ones.
AVAILABILITY AND IMPLEMENTATION: PERFUMES is an open-source Python package (https://jwgitlab.cs.mcgill.ca/arnaud_chol/perfumes).
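Over-representation of a motif among positive sequences can be quantified in several ways. The abstract does not say which statistic PERFUMES uses, so the sketch below simply assumes a one-sided hypergeometric test; the function name `hypergeom_upper_tail` and the counts in the example are hypothetical.

```python
from math import comb


def hypergeom_upper_tail(k: int, n_pos: int, n_with_motif: int, n_total: int) -> float:
    """P(X >= k), where X is the number of positive sequences carrying the motif.

    n_total sequences in all, n_pos of them positive, n_with_motif of them
    predicted to contain the motif; under the null hypothesis the motif is
    spread at random over the sequences.
    """
    denom = comb(n_total, n_pos)
    upper = min(n_pos, n_with_motif)
    numer = sum(
        comb(n_with_motif, i) * comb(n_total - n_with_motif, n_pos - i)
        for i in range(k, upper + 1)
    )
    return numer / denom


# Hypothetical counts: 10 sequences, 5 positive, motif found in 4 sequences,
# all 4 of which are positive.
p_value = hypergeom_upper_tail(k=4, n_pos=5, n_with_motif=4, n_total=10)
print(f"p = {p_value:.4f}")
```

A small p-value here would flag the motif as enriched in the positive set; with many candidate motifs, a multiple-testing correction would also be needed.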
Affiliation(s)
- Arnaud Chol
  - School of Computer Science, McGill University, Montréal, QC H3A 0E9, Canada
- Éric Lécuyer
  - Institut de Recherches Cliniques de Montréal (IRCM), Montréal, QC H2W 1R7, Canada
- Mathieu Blanchette
  - School of Computer Science, McGill University, Montréal, QC H3A 0E9, Canada
- Jérôme Waldispühl
  - School of Computer Science, McGill University, Montréal, QC H3A 0E9, Canada
11
Loyer G, Reinharz V. Concurrent prediction of RNA secondary structures with pseudoknots and local 3D motifs in an integer programming framework. Bioinformatics 2024; 40:btae022. [PMID: 38230755 PMCID: PMC10868335 DOI: 10.1093/bioinformatics/btae022] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/18/2023] [Revised: 11/30/2023] [Accepted: 01/12/2024] [Indexed: 01/18/2024] [Open Access]
Abstract
MOTIVATION: The prediction of RNA structure canonical base pairs from a single sequence, especially pseudoknotted ones, remains challenging in thermodynamic models that approximate the energy of the local 3D motifs joining canonical stems. It has become more and more apparent in recent years that the structural motifs in the loops, composed of noncanonical interactions, are essential for the final shape of the molecule enabling its multiple functions. Our capacity to predict accurate 3D structures is also limited when it comes to the organization of the large intricate network of interactions that form inside those loops.
RESULTS: We previously developed the integer programming framework RNA Motifs over Integer Programming (RNAMoIP) to reconcile RNA secondary structure and local 3D motif information available in databases. We further develop our model to now simultaneously predict the canonical base pairs (with pseudoknots) from base pair probability matrices with or without alignment. We benchmarked our new method over all nonredundant RNAs below 150 nucleotides. We show that the joint prediction of canonical base pairs structure and local conserved motifs (i) improves the ratio of well-predicted interactions in the secondary structure, (ii) predicts well canonical and Wobble pairs at the location where motifs are inserted, (iii) is greatly improved with evolutionary information, and (iv) noncanonical motifs at kink-turn locations.
AVAILABILITY AND IMPLEMENTATION: The source code of the framework is available at https://gitlab.info.uqam.ca/cbe/RNAMoIP and an interactive web server at https://rnamoip.cbe.uqam.ca/.
Affiliation(s)
- Gabriel Loyer
  - Department of Computer Science, Université du Québec à Montréal, Montréal, QC H2X 3Y7, Canada
- Vladimir Reinharz
  - Department of Computer Science, Université du Québec à Montréal, Montréal, QC H2X 3Y7, Canada
12
Sarrazin-Gendron R, Waldispühl J, Reinharz V. Classification and Identification of Non-canonical Base Pairs and Structural Motifs. Methods Mol Biol 2024; 2726:143-168. [PMID: 38780731 DOI: 10.1007/978-1-0716-3519-3_7] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/25/2024]
Abstract
The 3D structures of many ribonucleic acid (RNA) loops are characterized by highly organized networks of non-canonical interactions. Multiple computational methods have been developed to annotate structures with those interactions or automatically identify recurrent interaction networks. By contrast, the reverse problem that aims to retrieve the geometry of a loop from its sequence or ensemble of interactions remains much less explored. In this chapter, we will describe how to retrieve and build families of conserved structural motifs using their underlying network of non-canonical interactions. Then, we will show how to assign sequence alignments to those families and use the software BayesPairing to build statistical models of structural motifs with their associated sequence alignments. From this model, we will apply BayesPairing to identify in new sequences regions where those loop geometries can occur.
Affiliation(s)
- Vladimir Reinharz
  - Department of Computer Science, Université du Québec à Montréal, Montreal, QC, Canada
13
Zhang C, Freddolino PL. FURNA: a database for function annotations of RNA structures. bioRxiv 2023:2023.12.19.572314. [PMID: 38187637 PMCID: PMC10769261 DOI: 10.1101/2023.12.19.572314] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/09/2024]
Abstract
Despite the increasing number of 3D RNA structures in the Protein Data Bank, the majority of experimental RNA structures lack thorough functional annotations. As the significance of the functional roles played by non-coding RNAs becomes increasingly apparent, comprehensive annotation of RNA function is becoming a pressing concern. In response to this need, we have developed FURNA (Functions of RNAs), the first database for experimental RNA structures that aims to provide a comprehensive repository of high-quality functional annotations. These include Gene Ontology terms, Enzyme Commission numbers, ligand binding sites, RNA families, protein binding motifs, and cross-references to related databases. FURNA is available at https://seq2fun.dcmb.med.umich.edu/furna/ to enable quick discovery of RNA functions from their structures and sequences.
Affiliation(s)
- Chengxin Zhang
  - Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI 48109, USA
  - Department of Biological Chemistry, University of Michigan, Ann Arbor, MI 48109, USA
- P. Lydia Freddolino
  - Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI 48109, USA
  - Department of Biological Chemistry, University of Michigan, Ann Arbor, MI 48109, USA
14
Horlacher M, Cantini G, Hesse J, Schinke P, Goedert N, Londhe S, Moyon L, Marsico A. A systematic benchmark of machine learning methods for protein-RNA interaction prediction. Brief Bioinform 2023; 24:bbad307. [PMID: 37635383 PMCID: PMC10516373 DOI: 10.1093/bib/bbad307] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Received: 01/31/2023] [Revised: 06/15/2023] [Accepted: 07/18/2023] [Indexed: 08/29/2023] [Open Access]
Abstract
RNA-binding proteins (RBPs) are central actors of RNA post-transcriptional regulation. Experiments to profile binding sites of RBPs in vivo are limited to transcripts expressed in the experimental cell type, creating the need for computational methods to infer missing binding information. While numerous machine learning-based methods have been developed for this task, their use of heterogeneous training and evaluation datasets across different sets of RBPs and CLIP-seq protocols makes a direct comparison of their performance difficult. Here, we compile a set of 37 machine learning (primarily deep learning) methods for in vivo RBP-RNA interaction prediction and systematically benchmark a subset of 11 representative methods across hundreds of CLIP-seq datasets and RBPs. Using homogenized sample pre-processing and two negative-class sample generation strategies, we evaluate methods in terms of predictive performance and assess the impact of neural network architectures and input modalities on model performance. We believe that this study will not only enable researchers to choose the optimal prediction method for their tasks at hand, but also aid method developers in developing novel, high-performing methods by introducing a standardized framework for their evaluation.
Affiliation(s)
- Marc Horlacher
  - Computational Health Center, Helmholtz Center Munich, Germany
  - School of Computation, Information and Technology, Technical University Munich (TUM), Germany
- Giulia Cantini
  - Computational Health Center, Helmholtz Center Munich, Germany
- Julian Hesse
  - Computational Health Center, Helmholtz Center Munich, Germany
- Patrick Schinke
  - Computational Health Center, Helmholtz Center Munich, Germany
- Nicolas Goedert
  - Computational Health Center, Helmholtz Center Munich, Germany
- Lambert Moyon
  - Computational Health Center, Helmholtz Center Munich, Germany
15
Bohdan DR, Voronina VV, Bujnicki JM, Baulin EF. A comprehensive survey of long-range tertiary interactions and motifs in non-coding RNA structures. Nucleic Acids Res 2023; 51:8367-8382. [PMID: 37471030 PMCID: PMC10484739 DOI: 10.1093/nar/gkad605] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/14/2022] [Accepted: 07/07/2023] [Indexed: 07/21/2023] [Open Access]
Abstract
Understanding the 3D structure of RNA is key to understanding RNA function. RNA 3D structure is modular and can be seen as a composition of building blocks of various sizes called tertiary motifs. Currently, long-range motifs formed between distant loops and helical regions are far less studied than the local motifs determined by the RNA secondary structure. We surveyed long-range tertiary interactions and motifs in a non-redundant set of non-coding RNA 3D structures. A new dataset of annotated LOng-RAnge RNA 3D modules (LORA) was built using an approach that does not rely on the automatic annotations of non-canonical interactions. An original algorithm, ARTEM, was developed for annotation-, sequence- and topology-independent superposition of two arbitrary RNA 3D modules. The proposed methods allowed us to identify and describe the most common long-range RNA tertiary motifs. Along with the prevalent canonical A-minor interactions, a large number of previously undescribed staple interactions were observed. The most frequent long-range motifs were found to belong to three main motif families: planar staples, tilted staples, and helical packing motifs.
Affiliation(s)
- Davyd R Bohdan
  - Department of Innovation and High Technology, Moscow Institute of Physics and Technology, Dolgoprudny 141701, Russia
- Valeria V Voronina
  - Department of Information Systems, Ulyanovsk State Technical University, Ulyanovsk 432027, Russia
- Janusz M Bujnicki
  - Laboratory of Bioinformatics and Protein Engineering, International Institute of Molecular and Cell Biology in Warsaw, Warsaw 02-109, Poland
- Eugene F Baulin
  - Laboratory of Bioinformatics and Protein Engineering, International Institute of Molecular and Cell Biology in Warsaw, Warsaw 02-109, Poland
16
|
Rahaman MM, Khan NS, Zhang S. RNAMotifComp: a comprehensive method to analyze and identify structurally similar RNA motif families. Bioinformatics 2023; 39:i337-i346. [PMID: 37387191 DOI: 10.1093/bioinformatics/btad223] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 07/01/2023] Open
Abstract
MOTIVATION: The 3D structures of RNA play a critical role in understanding their functionalities. There exist several computational methods to study RNA 3D structures by identifying structural motifs and categorizing them into several motif families based on their structures. Although the number of such motif families is not limited, only a few of them are well-studied. Among these structural motif families, several are visually similar or very close in structure, even with different base interactions. Alternatively, some motif families share a set of base interactions but maintain variation in their 3D formations. These similarities among different motif families, if known, can provide better insight into RNA 3D structural motifs as well as their characteristic functions in cell biology.
RESULTS: In this work, we proposed a method, RNAMotifComp, that analyzes the instances of well-known structural motif families and establishes a relational graph among them. We have also designed a method to visualize the relational graph, where the families are shown as nodes and their similarity information is represented as edges. We validated our discovered correlations of the motif families using RNAMotifContrast. Additionally, we used a basic Naïve Bayes classifier to show the importance of RNAMotifComp. The relational analysis explains the functional analogies of divergent motif families and illustrates the situations where the motifs of disparate families are predicted to be of the same family.
AVAILABILITY AND IMPLEMENTATION: Source code is publicly available at https://github.com/ucfcbb/RNAMotifFamilySimilarity.
Affiliation(s)
- Md Mahfuzur Rahaman
  - Department of Computer Science, University of Central Florida, Orlando, FL 32816, United States
- Nabila Shahnaz Khan
  - Department of Computer Science, University of Central Florida, Orlando, FL 32816, United States
- Shaojie Zhang
  - Department of Computer Science, University of Central Florida, Orlando, FL 32816, United States
|
17
|
Li J, Chen SJ. RNAJP: enhanced RNA 3D structure predictions with non-canonical interactions and global topology sampling. Nucleic Acids Res 2023; 51:3341-3356. [PMID: 36864729 PMCID: PMC10123122 DOI: 10.1093/nar/gkad122] [Citation(s) in RCA: 10] [Impact Index Per Article: 5.0] [Received: 07/21/2022] [Revised: 01/14/2023] [Accepted: 02/25/2023] [Indexed: 03/04/2023] Open
Abstract
RNA 3D structures are critical for understanding their functions. However, only a limited number of RNA structures have been experimentally solved, so computational prediction methods are highly desirable. Nevertheless, accurate prediction of RNA 3D structures, especially those containing multiway junctions, remains a significant challenge, mainly due to the complicated non-canonical base pairing and stacking interactions in the junction loops and the possible long-range interactions between loop structures. Here we present RNAJP ('RNA Junction Prediction'), a nucleotide- and helix-level coarse-grained model for the prediction of RNA 3D structures, particularly junction structures, from a given 2D structure. By globally sampling the 3D arrangements of the helices in junctions with molecular dynamics simulations, and by explicitly considering non-canonical base pairing and base stacking interactions as well as long-range loop-loop interactions, the model can provide significantly better predictions for multibranched junction structures than existing methods. Moreover, integrated with additional restraints from experiments, such as junction topology and long-range interactions, the model may serve as a useful structure generator for various applications.
Affiliation(s)
- Jun Li
  - Department of Physics, Department of Biochemistry and Institute for Data Science and Informatics, University of Missouri, Columbia, MO 65211, USA
- Shi-Jie Chen
  - Department of Physics, Department of Biochemistry and Institute for Data Science and Informatics, University of Missouri, Columbia, MO 65211, USA
|
18
|
Saon MS, Kirkpatrick CC, Znosko BM. Identification and characterization of RNA pentaloop sequence families. NAR Genom Bioinform 2023; 5:lqac102. [PMID: 36632613 PMCID: PMC9830547 DOI: 10.1093/nargab/lqac102] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/05/2022] [Revised: 10/28/2022] [Accepted: 12/12/2022] [Indexed: 01/11/2023] Open
Abstract
One of the current methods for predicting RNA tertiary structure is fragment-based homology, which predicts tertiary structure from secondary structure. For a successful prediction, this method requires a library of the tertiary structures of small motifs clipped from previously solved RNA 3D structures. Because of the limited number of available tertiary structures, it is not practical to find structures for all sequences of all motifs. Identifying sequence families for motifs can fill the gaps because all sequences within a family are expected to have similar structural features. Currently, a collection of well-characterized sequence families has been identified for tetraloops. Because of their prevalence and biological functions, pentaloop structures should also be well-characterized. In this study, 10 pentaloop sequence families are identified. For each family, the common and distinguishing structural features are highlighted. These sequence families can be used to predict the tertiary structure of pentaloop sequences for which a solved structure is not available.
Affiliation(s)
- Md Sharear Saon
  - Department of Chemistry, Saint Louis University, Saint Louis, MO 63103, USA
- Brent M Znosko
  - Department of Chemistry, Saint Louis University, Saint Louis, MO 63103, USA
|
19
|
Zhao M, Wang R, Yang K, Jiang Y, Peng Y, Li Y, Zhang Z, Ding J, Shi S. Nucleic acid nanoassembly-enhanced RNA therapeutics and diagnosis. Acta Pharm Sin B 2023; 13:916-941. [PMID: 36970219 PMCID: PMC10031267 DOI: 10.1016/j.apsb.2022.10.019] [Citation(s) in RCA: 61] [Impact Index Per Article: 30.5] [Received: 06/25/2022] [Revised: 08/22/2022] [Accepted: 09/10/2022] [Indexed: 11/16/2022] Open
Abstract
RNAs are involved in crucial processes of disease progression and have emerged as powerful therapeutic targets and diagnostic biomarkers. However, efficient delivery of therapeutic RNA to the targeted location and precise detection of RNA markers remain challenging. Recently, increasing attention has been paid to applying nucleic acid nanoassemblies in diagnosis and treatment. Due to the flexibility and deformability of nucleic acids, the nanoassemblies can be fabricated with different shapes and structures. With hybridization, nucleic acid nanoassemblies, including DNA and RNA nanostructures, can be applied to enhance RNA therapeutics and diagnosis. This review briefly introduces the construction and properties of different nucleic acid nanoassemblies and their applications in RNA therapy and diagnosis, and outlines prospects for their development.
Affiliation(s)
- Mengnan Zhao
  - State Key Laboratory of Southwestern Chinese Medicine Resources, School of Pharmacy, Chengdu University of Traditional Chinese Medicine, Chengdu 611137, China
- Rujing Wang
  - State Key Laboratory of Southwestern Chinese Medicine Resources, School of Pharmacy, Chengdu University of Traditional Chinese Medicine, Chengdu 611137, China
- Kunmeng Yang
  - The First Norman Bethune College of Clinical Medicine, Jilin University, Changchun 130061, China
- Yuhong Jiang
  - School of Life Science and Engineering, Southwest Jiaotong University, Chengdu 610031, China
- Yachen Peng
  - Key Laboratory of Polymer Ecomaterials, Changchun Institute of Applied Chemistry, Chinese Academy of Sciences, Changchun 130022, China
  - Department of Orthopedics, China-Japan Union Hospital of Jilin University, Changchun 130033, China
- Yuke Li
  - State Key Laboratory of Southwestern Chinese Medicine Resources, School of Pharmacy, Chengdu University of Traditional Chinese Medicine, Chengdu 611137, China
- Zhen Zhang
  - State Key Laboratory of Southwestern Chinese Medicine Resources, School of Pharmacy, Chengdu University of Traditional Chinese Medicine, Chengdu 611137, China
- Jianxun Ding
  - Key Laboratory of Polymer Ecomaterials, Changchun Institute of Applied Chemistry, Chinese Academy of Sciences, Changchun 130022, China
- Sanjun Shi
  - State Key Laboratory of Southwestern Chinese Medicine Resources, School of Pharmacy, Chengdu University of Traditional Chinese Medicine, Chengdu 611137, China
|
20
|
Aliyev DA, Zirbel CL. Seriation Using Tree-penalized Path Length. Eur J Oper Res 2023; 305:617-629. [PMID: 36385922 PMCID: PMC9642984 DOI: 10.1016/j.ejor.2022.06.026] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/16/2023]
Abstract
Given a sample of n data points and an n by n dissimilarity matrix, data seriation methods produce a linear ordering of the objects, putting similar objects nearby in the ordering. One may visualize the reordered dissimilarity matrix with a heat map and thus understand the structure of the data, while still displaying the full matrix of dissimilarities. Good orderings produce heat maps that are easy to read and allow for clear interpretation. We consider two popular seriation methods, minimizing path length by solving the Traveling Salesman Problem (TSP), and Optimal Leaf Ordering (OLO), which minimizes path length among all orderings consistent with a given tree structure. Learning from the strengths and weaknesses of the two methods, we introduce a new hybrid seriation method, tree-penalized Path Length (tpPL). The objective is a linear combination of path length and the extent of violations of the tree structure, with a parameter that transitions the optimal paths smoothly from TSP to OLO. We present a detailed study over 44 synthetic datasets which are designed to bring out the strengths and weaknesses of the three methods, finding that the hybrid nature of tpPL enables it to overcome the weaknesses of TSP and OLO.
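As a rough illustration of the path-length criterion described in the abstract above, the following minimal Python sketch scores an ordering by summing the dissimilarities between consecutive objects and finds a minimum-path-length ordering by exhaustive search. The function names and the toy matrix are hypothetical and do not come from the paper; the exhaustive search only stands in for a real TSP solver and is feasible for very small n; and the tree-violation penalty of tpPL is left out because the abstract does not define it.

```python
from itertools import permutations


def path_length(order, d):
    """Sum of dissimilarities between consecutive objects in the ordering."""
    return sum(d[a][b] for a, b in zip(order, order[1:]))


def brute_force_seriation(d):
    """Try every ordering and return one with minimum path length (tiny n only)."""
    n = len(d)
    return min(permutations(range(n)), key=lambda order: path_length(order, d))


# Toy data (assumed for illustration): four points on a line, listed out of order.
positions = [2, 0, 3, 1]
# Dissimilarity matrix: absolute distance between every pair of points.
d = [[abs(p - q) for q in positions] for p in positions]

best = brute_force_seriation(d)
print("ordering:", best, "path length:", path_length(best, d))
```

In this toy case a minimum-path-length ordering lists the points monotonically along the line, which is the behaviour a seriation heat map relies on.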
Affiliation(s)
- Denis A Aliyev
  - Department of Applied Mathematics, Virginia Military Institute, Lexington, VA 24450
- Craig L Zirbel
  - Department of Mathematics and Statistics, Bowling Green State University, Bowling Green, OH 43403
|
21
|
Jurich CP, Yesselman JD. Automated 3D Design and Evaluation of RNA Nanostructures with RNAMake. Methods Mol Biol 2023; 2586:251-261. [PMID: 36705909 DOI: 10.1007/978-1-0716-2768-6_15] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/28/2023]
Abstract
Despite growing interest in applying RNA's unique structural characteristics to solve diverse biotechnology and nanotechnology problems, there are few computational tools for targeted tertiary design. As a result, RNA 3D design is traditionally slow, resource-consuming, and dependent on expert modeling. In this chapter, we discuss our recently developed software package: RNAMake, a set of applications capable of designing RNA tertiary structures to solve various relevant nanotechnology problems and provide basic thermodynamic calculations for the generated designs. We provide in-depth examples and instructions for designing example RNA nanostructures such as minimal RNA sequences containing a single tertiary contact, generating RNAs that stabilize small-molecule ligands, and building tethers that link ribosomal subunits together. We also highlight the addition of a new Monte Carlo design algorithm and the ability to estimate the thermodynamic contribution of helical elements in RNA 3D structures.
Affiliation(s)
- Chris P Jurich
  - Department of Chemistry, University of Nebraska-Lincoln, Lincoln, NE, USA
- Joseph D Yesselman
  - Department of Chemistry, University of Nebraska-Lincoln, Lincoln, NE, USA
|
22
|
Zhou L, Wang X, Yu S, Tan YL, Tan ZJ. FebRNA: An automated fragment-ensemble-based model for building RNA 3D structures. Biophys J 2022; 121:3381-3392. [PMID: 35978551 PMCID: PMC9515226 DOI: 10.1016/j.bpj.2022.08.017] [Citation(s) in RCA: 7] [Impact Index Per Article: 2.3] [Received: 04/30/2022] [Revised: 07/19/2022] [Accepted: 08/15/2022] [Indexed: 11/23/2022] Open
Abstract
Knowledge of RNA three-dimensional (3D) structures is critical to understanding the important biological functions of RNAs. Although various structure prediction models have been developed, high-accuracy predictions of RNA 3D structures are still limited to RNAs with short lengths or simple topology. In this work, we proposed a new model, namely FebRNA, for building RNA 3D structures through fragment assembly based on coarse-grained (CG) fragment ensembles. Specifically, FebRNA is composed of four processes: establishing the library of different types of non-redundant CG fragment ensembles regardless of the sequences, building a CG 3D structure ensemble through fragment assembly, identifying top-scored CG structures through a specific CG scoring function, and rebuilding the all-atom structures from the top-scored CG ones. Extensive examination against different types of RNA structures indicates that FebRNA consistently gives reliable predictions of RNA 3D structures, including pseudoknots, three-way junctions, four-way and five-way junctions, and RNAs in the RNA-Puzzles. FebRNA is available at https://github.com/Tan-group/FebRNA.
Affiliation(s)
- Li Zhou
  - Department of Physics and Key Laboratory of Artificial Micro & Nano-structures of Education, School of Physics and Technology, Wuhan University, Wuhan 430072, China
- Xunxun Wang
  - Department of Physics and Key Laboratory of Artificial Micro & Nano-structures of Education, School of Physics and Technology, Wuhan University, Wuhan 430072, China
- Shixiong Yu
  - Department of Physics and Key Laboratory of Artificial Micro & Nano-structures of Education, School of Physics and Technology, Wuhan University, Wuhan 430072, China
- Ya-Lan Tan
  - Research Center of Nonlinear Science, School of Mathematical and Physical Sciences, Wuhan Textile University, Wuhan 430073, China
- Zhi-Jie Tan
  - Department of Physics and Key Laboratory of Artificial Micro & Nano-structures of Education, School of Physics and Technology, Wuhan University, Wuhan 430072, China
|
23
|
Wiedemann J, Kaczor J, Milostan M, Zok T, Blazewicz J, Szachniuk M, Antczak M. RNAloops: a database of RNA multiloops. Bioinformatics 2022; 38:4200-4205. [PMID: 35809063 PMCID: PMC9438955 DOI: 10.1093/bioinformatics/btac484] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 04/05/2022] [Revised: 06/26/2022] [Accepted: 07/06/2022] [Indexed: 12/24/2022] Open
Abstract
MOTIVATION: Knowledge of the 3D structure of RNA supports discovering its functions and is crucial for designing drugs and modern therapeutic solutions. Thus, much attention is devoted to experimental determination and computational prediction targeting the global fold of RNA and its local substructures. The latter include multi-branched loops, functionally significant elements that highly affect the spatial shape of the entire molecule. Unfortunately, their computational modeling constitutes a weak point of structural bioinformatics. A remedy lies in collecting these motifs and analyzing their features.
RESULTS: RNAloops is a self-updating database that stores multi-branched loops identified in the PDB-deposited RNA structures. A description of each loop includes angular data (planar and Euler angles computed between pairs of adjacent helices) to allow studying their mutual arrangement in space. The system enables search and analysis of multiloops, presents their structure details numerically and visually, and computes data statistics.
AVAILABILITY AND IMPLEMENTATION: RNAloops is freely accessible at https://rnaloops.cs.put.poznan.pl.
SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
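To make the angular data mentioned in the RNAloops abstract above more concrete, here is a minimal Python sketch of one plausible reading of a planar angle: the angle between two direction vectors standing in for the axes of adjacent helices. This is an assumption for illustration only. The abstract does not say how RNAloops derives helix axes or defines its angles, and the function name and example vectors are hypothetical.

```python
import math


def planar_angle_deg(u, v):
    """Angle in degrees between two 3D direction vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    # Clamp to [-1, 1] to guard against floating-point drift before acos.
    cos_theta = max(-1.0, min(1.0, dot / (norm_u * norm_v)))
    return math.degrees(math.acos(cos_theta))


# Hypothetical helix axes (not taken from any real structure).
helix_a = (1.0, 0.0, 0.0)
helix_b = (0.0, 1.0, 0.0)
print(planar_angle_deg(helix_a, helix_b))
```

A single angle like this does not fix the full relative orientation of two helices, which may be why the database reports Euler angles as well.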
Affiliation(s)
- Jakub Wiedemann
  - Institute of Computing Science, Poznan University of Technology, 60-965 Poznan, Poland
- Jacek Kaczor
  - Institute of Computing Science, Poznan University of Technology, 60-965 Poznan, Poland
- Maciej Milostan
  - Institute of Computing Science, Poznan University of Technology, 60-965 Poznan, Poland
  - Poznan Supercomputing and Networking Center, 61-131 Poznan, Poland
- Tomasz Zok
  - Institute of Computing Science, Poznan University of Technology, 60-965 Poznan, Poland
  - Poznan Supercomputing and Networking Center, 61-131 Poznan, Poland
- Jacek Blazewicz
  - Institute of Computing Science, Poznan University of Technology, 60-965 Poznan, Poland
  - Institute of Bioorganic Chemistry, Polish Academy of Sciences, 61-704 Poznan, Poland
|
24
|
Developing Community Resources for Nucleic Acid Structures. Life (Basel) 2022; 12:life12040540. [PMID: 35455031 PMCID: PMC9031032 DOI: 10.3390/life12040540] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Received: 03/14/2022] [Revised: 03/28/2022] [Accepted: 03/31/2022] [Indexed: 01/14/2023] Open
Abstract
In this review, we describe the creation of the Nucleic Acid Database (NDB) at Rutgers University and how it became a testbed for the current infrastructure of the RCSB Protein Data Bank. We describe some of the special features of the NDB and how it has been used to enable research. Plans for the next phase as the Nucleic Acid Knowledgebase (NAKB) are summarized.
|
25
|
Takase N, Otsu M, Hirakata S, Ishizu H, Siomi MC, Kawai G. T-hairpin structure found in the RNA element involved in piRNA biogenesis. RNA 2022; 28:541-550. [PMID: 34987083 PMCID: PMC8925976 DOI: 10.1261/rna.078967.121] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 08/27/2021] [Accepted: 12/21/2021] [Indexed: 06/14/2023]
Abstract
PIWI-interacting RNAs (piRNAs) repress transposons to protect the germline genome from DNA damage caused by transposon transposition. In Drosophila, the Traffic jam (Tj) mRNA is consumed to produce piRNA in its 3'-UTR. A cis element located within the 3'-UTR, Tj-cis, is necessary for piRNA biogenesis. In this study, we analyzed the structure of the Tj-cis RNA, a 100-nt RNA corresponding to the Tj-cis element, by SHAPE and NMR analyses and found that a stable hairpin structure formed in the 5' half of the Tj-cis RNA. The tertiary structure of the 16-nt stable hairpin was analyzed by NMR, and a novel stem-loop structure, the T-hairpin, was found. In the T-hairpin, four uridine residues are exposed to the solvent, suggesting that this stem-loop is the target of Yb protein, a Tudor domain-containing piRNA biogenesis factor. The piRNA biogenesis assay showed that both the T-hairpin and the 3' half are required for the function of the Tj-cis element, suggesting that both are recognized by Yb protein.
Affiliation(s)
- Naomi Takase
  - Department of Life Science, Graduate School of Advanced Engineering, Chiba Institute of Technology, Chiba 275-0016, Japan
- Maina Otsu
  - Department of Life Science, Graduate School of Advanced Engineering, Chiba Institute of Technology, Chiba 275-0016, Japan
- Shigeki Hirakata
  - Department of Biological Sciences, Graduate School of Science, The University of Tokyo, Tokyo 113-0032, Japan
- Hirotsugu Ishizu
  - Department of Biological Sciences, Graduate School of Science, The University of Tokyo, Tokyo 113-0032, Japan
  - Department of Molecular Biology, Keio University School of Medicine, Tokyo 160-8582, Japan
- Mikiko C Siomi
  - Department of Biological Sciences, Graduate School of Science, The University of Tokyo, Tokyo 113-0032, Japan
- Gota Kawai
  - Department of Life Science, Graduate School of Advanced Engineering, Chiba Institute of Technology, Chiba 275-0016, Japan
|
26
|
Oliver C, Mallet V, Philippopoulos P, Hamilton WL, Waldispühl J. Vernal: a tool for mining fuzzy network motifs in RNA. Bioinformatics 2022; 38:970-976. [PMID: 34791045 DOI: 10.1093/bioinformatics/btab768] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 03/01/2021] [Revised: 09/19/2021] [Accepted: 11/09/2021] [Indexed: 02/03/2023] Open
Abstract
MOTIVATION: RNA 3D motifs are recurrent substructures, modeled as networks of base pair interactions, which are crucial for understanding structure-function relationships. The task of automatically identifying such motifs is computationally hard, and remains a key challenge in the field of RNA structural biology and network analysis. State-of-the-art methods solve special cases of the motif problem by constraining the structural variability in occurrences of a motif, and narrowing the substructure search space.
RESULTS: Here, we relax these constraints by posing the motif finding problem as a graph representation learning and clustering task. This framing takes advantage of the continuous nature of graph representations to model the flexibility and variability of RNA motifs in an efficient manner. We propose a set of node similarity functions, clustering methods and motif construction algorithms to recover flexible RNA motifs. Our tool, Vernal, can be easily customized by users to desired levels of motif flexibility, abundance and size. We show that Vernal is able to retrieve and expand known classes of motifs, as well as to propose novel motifs.
AVAILABILITY AND IMPLEMENTATION: The source code, data and a webserver are available at vernal.cs.mcgill.ca. We also provide a flexible interface and a user-friendly webserver to browse and download our results.
SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Carlos Oliver
  - School of Computer Science, McGill University, Montréal, QC H3A 0E9, Canada
  - Montreal Institute for Learning Algorithms (MILA), Montréal, QC H2S 3H1, Canada
- Vincent Mallet
  - Structural Bioinformatics Unit, Department of Structural Biology and Chemistry, Institut Pasteur, CNRS UMR3528, C3BI, USR3756, Paris, France
  - Mines ParisTech, Paris-Sciences-et-Lettres Research University, Center for Computational Biology, Paris 75272, France
- William L Hamilton
  - School of Computer Science, McGill University, Montréal, QC H3A 0E9, Canada
  - Montreal Institute for Learning Algorithms (MILA), Montréal, QC H2S 3H1, Canada
- Jérôme Waldispühl
  - School of Computer Science, McGill University, Montréal, QC H3A 0E9, Canada
|
27
|
Wei J, Chen S, Zong L, Gao X, Li Y. Protein-RNA interaction prediction with deep learning: structure matters. Brief Bioinform 2022; 23:bbab540. [PMID: 34929730 PMCID: PMC8790951 DOI: 10.1093/bib/bbab540] [Citation(s) in RCA: 32] [Impact Index Per Article: 10.7] [Received: 09/29/2021] [Revised: 11/14/2021] [Accepted: 11/22/2021] [Indexed: 12/11/2022] Open
Abstract
Protein-RNA interactions are of vital importance to a variety of cellular activities. Both experimental and computational techniques have been developed to study the interactions. Because of the limitation of the previous database, especially the lack of protein structure data, most of the existing computational methods rely heavily on the sequence data, with only a small portion of the methods utilizing the structural information. Recently, AlphaFold has revolutionized the entire protein and biology field. Foreseeably, the protein-RNA interaction prediction will also be promoted significantly in the upcoming years. In this work, we give a thorough review of this field, surveying both the binding site and binding preference prediction problems and covering the commonly used datasets, features and models. We also point out the potential challenges and opportunities in this field. This survey summarizes the development of the RNA-binding protein-RNA interaction field in the past and foresees its future development in the post-AlphaFold era.
Affiliation(s)
- Junkang Wei
  - Department of Computer Science and Engineering (CSE), The Chinese University of Hong Kong (CUHK), 999077, Hong Kong SAR, China
- Siyuan Chen
  - Computational Bioscience Research Center (CBRC), King Abdullah University of Science and Technology (KAUST), 23955-6900, Thuwal, Saudi Arabia
- Licheng Zong
  - Department of Computer Science and Engineering (CSE), The Chinese University of Hong Kong (CUHK), 999077, Hong Kong SAR, China
- Xin Gao
  - Computational Bioscience Research Center (CBRC), King Abdullah University of Science and Technology (KAUST), 23955-6900, Thuwal, Saudi Arabia
- Yu Li
  - Department of Computer Science and Engineering (CSE), The Chinese University of Hong Kong (CUHK), 999077, Hong Kong SAR, China
  - The CUHK Shenzhen Research Institute, Hi-Tech Park, 518057, Shenzhen, China
|
28
|
Seemann SE, Mirza AH, Bang-Berthelsen CH, Garde C, Christensen-Dalsgaard M, Workman CT, Pociot F, Tommerup N, Gorodkin J, Ruzzo WL. OUP accepted manuscript. Nucleic Acids Res 2022; 50:2452-2463. [PMID: 35188540 PMCID: PMC8934657 DOI: 10.1093/nar/gkac067] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/21/2021] [Revised: 01/07/2022] [Accepted: 01/25/2022] [Indexed: 12/01/2022] Open
Abstract
Accelerated evolution of any portion of the genome is of significant interest, potentially signaling positive selection of phenotypic traits and adaptation. Accelerated evolution remains understudied for structured RNAs, despite the fact that an RNA’s structure is often key to its function. RNA structures are typically characterized by compensatory (structure-preserving) basepair changes that are unexpected given the underlying sequence variation, i.e., they have evolved through negative selection on structure. We address the question of how fast the primary sequence of an RNA can change through evolution while conserving its structure. Specifically, we consider predicted and known structures in vertebrate genomes. After careful control of false discovery rates, we obtain 13 de novo structures (and three known Rfam structures) that we predict to have rapidly evolving sequences—defined as structures where the primary sequences of human and mouse have diverged at least twice as fast (1.5 times for Rfam) as nearby neutrally evolving sequences. Two of the three known structures function in translation inhibition related to infection and immune response. We conclude that rapid sequence divergence does not preclude RNA structure conservation in vertebrates, although these events are relatively rare.
Affiliation(s)
- Aashiq H Mirza
  - Center for non-coding RNA in Technology and Health (RTH), University of Copenhagen, Denmark
  - Steno Diabetes Center Copenhagen, Gentofte, Denmark
- Claus H Bang-Berthelsen
  - Center for non-coding RNA in Technology and Health (RTH), University of Copenhagen, Denmark
  - National Food Institute, Technical University of Denmark, Kgs. Lyngby, Denmark
- Christian Garde
  - Center for non-coding RNA in Technology and Health (RTH), University of Copenhagen, Denmark
- Christopher T Workman
  - Center for non-coding RNA in Technology and Health (RTH), University of Copenhagen, Denmark
  - Center for Biological Sequence Analysis, Technical University of Denmark, Denmark
- Flemming Pociot
  - Center for non-coding RNA in Technology and Health (RTH), University of Copenhagen, Denmark
  - Steno Diabetes Center Copenhagen, Gentofte, Denmark
- Niels Tommerup
  - Center for non-coding RNA in Technology and Health (RTH), University of Copenhagen, Denmark
  - Department of Cellular and Molecular Medicine (ICMM), University of Copenhagen, Denmark
- Jan Gorodkin
  - Center for non-coding RNA in Technology and Health (RTH), University of Copenhagen, Denmark
  - Department of Veterinary and Animal Sciences, University of Copenhagen, Denmark
- Walter L Ruzzo
  - Center for non-coding RNA in Technology and Health (RTH), University of Copenhagen, Denmark
  - Computer Science and Engineering and Genome Sciences, University of Washington, USA
  - Fred Hutchinson Cancer Research Center, Seattle, USA
|
29
|
Gianfrotta C, Reinharz V, Lespinet O, Barth D, Denise A. On the predictibility of A-minor motifs from their local contexts. RNA Biol 2022; 19:1208-1227. [PMID: 36384383 PMCID: PMC9673937 DOI: 10.1080/15476286.2022.2144611] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/19/2022] Open
Abstract
This study investigates the importance of the structural context in the formation of a type I/II A-minor motif. This very frequent structural motif has been shown to be important in the spatial folding of RNA molecules. We developed an automated method to classify A-minor motif occurrences according to their 3D context similarities, and we used a graph approach to represent both the structural A-minor motif occurrences and their classes at different scales. This approach leads us to uncover new subclasses of A-minor motif occurrences according to their local 3D similarities. The majority of classes are composed of homologous occurrences, but some of them are composed of non-homologous occurrences. The different classifications we obtain allow us to better understand the importance of the context in the formation of A-minor motifs. In a second step, we investigate how much knowledge of the context around an A-minor motif can help to infer its presence (and position). More specifically, we want to determine what kind of information, contained in the structural context, can be useful to characterize and predict A-minor motifs. We show that, for some A-minor motifs, the topology combined with a sequence signal is sufficient to predict the presence and the position of an A-minor motif occurrence. In most other cases, these signals are not sufficient for predicting the A-minor motif; however, we show that they are good signals for this purpose. All the classification and prediction pipelines rely on automated processes, for which we describe the underlying algorithms and parameters.
Affiliation(s)
- Coline Gianfrotta
- Données et Algorithmes pour une Ville Intelligente et Durable (DAVID), Université de Versailles Saint-Quentin-en-Yvelines, Université Paris-Saclay, Versailles, France; Laboratoire Interdisciplinaire des Sciences du Numérique (LISN), Université Paris-Saclay, CNRS, Orsay, France. Contact: Coline Gianfrotta, Données et Algorithmes pour une Ville Intelligente et Durable (DAVID), Université de Versailles Saint-Quentin-en-Yvelines, Université Paris-Saclay, France
| | - Vladimir Reinharz
- Department of Computer Science, Université du Québec à Montréal, Québec, Canada
| | - Olivier Lespinet
- Institute for Integrative Biology of the Cell (I2BC), Université Paris-Saclay, CEA, CNRS, Gif-sur-Yvette, France
| | - Dominique Barth
- Données et Algorithmes pour une Ville Intelligente et Durable (DAVID), Université de Versailles Saint-Quentin-en-Yvelines, Université Paris-Saclay, Versailles, France
| | - Alain Denise
- Laboratoire Interdisciplinaire des Sciences du Numérique (LISN), Université Paris-Saclay, CNRS, Orsay, France; Institute for Integrative Biology of the Cell (I2BC), Université Paris-Saclay, CEA, CNRS, Gif-sur-Yvette, France
|
30
|
Hong X, Zheng J, Xie J, Tong X, Liu X, Song Q, Liu S, Liu S. RR3DD: an RNA global structure-based RNA three-dimensional structural classification database. RNA Biol 2021; 18:738-746. [PMID: 34663179 DOI: 10.1080/15476286.2021.1989200] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 01/27/2023] Open
Abstract
The three-dimensional (3D) structure of RNA usually plays an important role in recognition by RNA-binding proteins. As more RNAs have been discovered, several RNA databases have been developed to study the functions of RNA based on sequence, secondary structure, local 3D structural motif and global structure. Based on RNA function and structure, different RNAs are classified and stored in SCOR and DARTS, respectively. The classification of RNA structures is useful in RNA structure prediction and function annotation. However, SCOR and DARTS are no longer updated. In this study, we present an RNA classification database, RR3DD, based on RNA folds with global 3D structural similarity. RR3DD includes 13,601 RNA chains from PDB and mmCIF format structures, which are classified into 780 RNA folds. The RNA chains from PDB and mmCIF format structures are aligned and clustered into 675 and 220 RNA folds, respectively. By analysing the RNA structures in RR3DD, we find that there are 11 clusters with more than 50 members. These clusters include rRNAs, riboswitches, tRNAs and so on. By mapping RR3DD to Rfam, we found that some RNAs without Rfam annotation can be annotated through structural alignment. For example, we analysed tRNAs and found that they were successfully grouped in RR3DD even though Rfam did not classify them into one family. Finally, we provide a web interface to RR3DD for browsing the database, annotating RNA 3D structures and finding templates for RNA homology modelling.
Affiliation(s)
- Xu Hong
- School of Physics, Huazhong University of Science and Technology, Wuhan, China
| | - Jinfang Zheng
- School of Physics, Huazhong University of Science and Technology, Wuhan, China
| | - Juan Xie
- School of Physics, Huazhong University of Science and Technology, Wuhan, China
| | - Xiaoxue Tong
- School of Physics, Huazhong University of Science and Technology, Wuhan, China
| | - Xudong Liu
- School of Physics, Huazhong University of Science and Technology, Wuhan, China
| | - Qi Song
- Key Laboratory of Fermentation Engineering (Ministry of Education), Hubei University of Technology, Wuhan, China
| | - Sen Liu
- Key Laboratory of Fermentation Engineering (Ministry of Education), Hubei University of Technology, Wuhan, China
| | - Shiyong Liu
- School of Physics, Huazhong University of Science and Technology, Wuhan, China
|
31
|
Emrizal R, Hamdani HY, Firdaus-Raih M. Graph Theoretical Methods and Workflows for Searching and Annotation of RNA Tertiary Base Motifs and Substructures. Int J Mol Sci 2021; 22:ijms22168553. [PMID: 34445259 PMCID: PMC8395288 DOI: 10.3390/ijms22168553] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Received: 06/16/2021] [Revised: 08/01/2021] [Accepted: 08/06/2021] [Indexed: 12/12/2022] Open
Abstract
The increasing number and complexity of structures containing RNA chains in the Protein Data Bank (PDB) have led to the need for automated structure annotation methods to replace or complement expert visual curation. This is especially true when searching for tertiary base motifs and substructures. Such base arrangements and motifs have diverse roles that range from contributions to structural stability to more direct involvement in the molecule's functions, such as the sites for ligand binding and catalytic activity. We review the utility of computational approaches in annotating RNA tertiary base motifs in a dataset of PDB structures, particularly the use of graph theoretical algorithms that can search for such base motifs and annotate them or find and annotate clusters of hydrogen-bond-connected bases. We also demonstrate how such graph theoretical algorithms can be integrated into a workflow that allows for functional analysis and comparisons of base arrangements and sub-structures, such as those involved in ligand binding. The capacity to carry out such automatic curations has led to the discovery of novel motifs and can give new context to known motifs as well as enable the rapid compilation of RNA 3D motifs into a database.
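One idea named in this abstract, finding clusters of hydrogen-bond-connected bases, can be read as a connected-components search on a graph whose nodes are bases and whose edges are hydrogen bonds. The following is a minimal sketch under that assumption, not the authors' implementation; the function name `hbond_clusters` and the base labels are hypothetical.

```python
from collections import defaultdict


def hbond_clusters(hbonds):
    """Group bases into clusters connected by hydrogen bonds.

    `hbonds` is an iterable of (base_a, base_b) pairs. The labels are
    hypothetical identifiers of the form "chain:residue".
    """
    adjacency = defaultdict(set)
    for a, b in hbonds:
        adjacency[a].add(b)
        adjacency[b].add(a)

    seen = set()
    clusters = []
    for start in adjacency:
        if start in seen:
            continue
        # Depth-first traversal collects one connected component.
        stack, component = [start], set()
        while stack:
            node = stack.pop()
            if node in component:
                continue
            component.add(node)
            stack.extend(adjacency[node] - component)
        seen |= component
        clusters.append(sorted(component))
    return clusters


# Toy input: one three-base cluster and one isolated base pair.
example = [("A:G101", "A:C120"), ("A:C120", "A:A57"), ("A:U8", "A:A14")]
clusters = hbond_clusters(example)
```

A real workflow would first derive the hydrogen-bond pairs from PDB coordinates with a structure annotation tool; that step is outside the scope of this sketch.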
Affiliation(s)
- Reeki Emrizal
- Department of Applied Physics, Faculty of Science and Technology, Universiti Kebangsaan Malaysia, UKM Bangi, Bangi 43600, Selangor, Malaysia
- Institute of Systems Biology, Universiti Kebangsaan Malaysia, UKM Bangi, Bangi 43600, Selangor, Malaysia
| | - Hazrina Yusof Hamdani
- Advanced Medical and Dental Institute, Universiti Sains Malaysia, Bertam, Kepala Batas 13200, Pulau Pinang, Malaysia
- Correspondence: (H.Y.H.); (M.F.-R.)
| | - Mohd Firdaus-Raih
- Department of Applied Physics, Faculty of Science and Technology, Universiti Kebangsaan Malaysia, UKM Bangi, Bangi 43600, Selangor, Malaysia
- Institute of Systems Biology, Universiti Kebangsaan Malaysia, UKM Bangi, Bangi 43600, Selangor, Malaysia
- Correspondence: (H.Y.H.); (M.F.-R.)
|
32
|
Binzel DW, Li X, Burns N, Khan E, Lee WJ, Chen LC, Ellipilli S, Miles W, Ho YS, Guo P. Thermostability, Tunability, and Tenacity of RNA as Rubbery Anionic Polymeric Materials in Nanotechnology and Nanomedicine-Specific Cancer Targeting with Undetectable Toxicity. Chem Rev 2021; 121:7398-7467. [PMID: 34038115 PMCID: PMC8312718 DOI: 10.1021/acs.chemrev.1c00009] [Citation(s) in RCA: 56] [Impact Index Per Article: 14.0] [Indexed: 02/07/2023]
Abstract
RNA nanotechnology is the bottom-up self-assembly of nanometer-scale architectures, resembling LEGOs, composed mainly of RNA. The ideal building material should be (1) versatile and controllable in shape and stoichiometry, (2) spontaneously self-assemble, and (3) thermodynamically, chemically, and enzymatically stable with a long shelf life. RNA building blocks exhibit each of the above. RNA is a polynucleic acid, making it a polymer, and its negative charge prevents nonspecific binding to negatively charged cell membranes. The thermostability makes it suitable for logic gates, resistive memory, sensor set-ups, and NEM devices. RNA can be designed and manipulated with a level of simplicity characteristic of DNA while displaying the versatile structure and enzymatic activity of proteins. RNA can fold into single-stranded loops or bulges to serve as mounting dovetails for intermolecular or domain interactions without external linking dowels. RNA nanoparticles display rubber- and amoeba-like properties and are stretchable and shrinkable through multiple repeats, leading to enhanced tumor targeting and fast renal excretion to reduce toxicities. It was predicted in 2014 that RNA would be the third milestone in pharmaceutical drug development. The recent approval of several RNA drugs and COVID-19 mRNA vaccines by the FDA suggests that this milestone is being realized. Here, we review the unique properties of RNA nanotechnology, summarize its recent advancements, describe its distinct attributes inside or outside the body and discuss potential applications in nanotechnology, medicine, and material science.
Affiliation(s)
- Daniel W Binzel
- Center for RNA Nanobiotechnology and Nanomedicine, College of Pharmacy, Dorothy M. Davis Heart and Lung Research Institute, James Comprehensive Cancer Center, College of Medicine, The Ohio State University, Columbus, Ohio 43210, United States
| | - Xin Li
- Center for RNA Nanobiotechnology and Nanomedicine, College of Pharmacy, Dorothy M. Davis Heart and Lung Research Institute, James Comprehensive Cancer Center, College of Medicine, The Ohio State University, Columbus, Ohio 43210, United States
| | - Nicolas Burns
- Center for RNA Nanobiotechnology and Nanomedicine, College of Pharmacy, Dorothy M. Davis Heart and Lung Research Institute, James Comprehensive Cancer Center, College of Medicine, The Ohio State University, Columbus, Ohio 43210, United States
| | - Eshan Khan
- Department of Cancer Biology and Genetics, The Ohio State University Comprehensive Cancer Center, College of Medicine, Center for RNA Biology, The Ohio State University, Columbus, Ohio 43210, United States
| | - Wen-Jui Lee
- TMU Research Center of Cancer Translational Medicine, School of Medical Laboratory Science and Biotechnology, College of Medical Science and Technology, Graduate Institute of Medical Sciences, College of Medicine, Taipei Medical University, Department of Laboratory Medicine, Taipei Medical University Hospital, Taipei 110, Taiwan
| | - Li-Ching Chen
- TMU Research Center of Cancer Translational Medicine, School of Medical Laboratory Science and Biotechnology, College of Medical Science and Technology, Graduate Institute of Medical Sciences, College of Medicine, Taipei Medical University, Department of Laboratory Medicine, Taipei Medical University Hospital, Taipei 110, Taiwan
| | - Satheesh Ellipilli
- Center for RNA Nanobiotechnology and Nanomedicine, College of Pharmacy, Dorothy M. Davis Heart and Lung Research Institute, James Comprehensive Cancer Center, College of Medicine, The Ohio State University, Columbus, Ohio 43210, United States
| | - Wayne Miles
- Department of Cancer Biology and Genetics, The Ohio State University Comprehensive Cancer Center, College of Medicine, Center for RNA Biology, The Ohio State University, Columbus, Ohio 43210, United States
| | - Yuan Soon Ho
- TMU Research Center of Cancer Translational Medicine, School of Medical Laboratory Science and Biotechnology, College of Medical Science and Technology, Graduate Institute of Medical Sciences, College of Medicine, Taipei Medical University, Department of Laboratory Medicine, Taipei Medical University Hospital, Taipei 110, Taiwan
| | - Peixuan Guo
- Center for RNA Nanobiotechnology and Nanomedicine, College of Pharmacy, Dorothy M. Davis Heart and Lung Research Institute, James Comprehensive Cancer Center, College of Medicine, The Ohio State University, Columbus, Ohio 43210, United States
|
33
|
Islam S, Rahaman MM, Zhang S. RNAMotifContrast: a method to discover and visualize RNA structural motif subfamilies. Nucleic Acids Res 2021; 49:e61. [PMID: 33693841 PMCID: PMC8216276 DOI: 10.1093/nar/gkab131] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 10/11/2020] [Revised: 02/16/2021] [Accepted: 02/18/2021] [Indexed: 01/17/2023] Open
Abstract
Understanding the 3D structural properties of RNAs will play a critical role in identifying their functional characteristics and designing new RNAs for RNA-based therapeutics and nanotechnology. While several existing computational methods can help in the analysis of RNA properties by recognizing structural motifs, they do not provide the means to compare and contrast those motifs extensively. We have developed a new method, RNAMotifContrast, which focuses on analyzing the similarities and variations of RNA structural motif characteristics. In this method, a graph is formed to represent the similarities among motifs, and a new traversal algorithm is applied to generate visualizations of their structural properties. Analyzing the structural features among motifs, we have recognized and generalized the concept of motif subfamilies. To assess its effectiveness, we have applied RNAMotifContrast to a dataset of known RNA structural motif families. From the results, we observed that the derived subfamilies possess unique structural variations while holding standard features of the families. Overall, the visualization approach of this method presents a new perspective to observe the relation among motifs more closely, and the discovered subfamilies provide opportunities to achieve valuable insights into RNA’s diverse roles.
Affiliation(s)
- Shahidul Islam
- Department of Computer Science, University of Central Florida, Orlando, FL 32816, USA
| | - Md Mahfuzur Rahaman
- Department of Computer Science, University of Central Florida, Orlando, FL 32816, USA
| | - Shaojie Zhang
- Department of Computer Science, University of Central Florida, Orlando, FL 32816, USA
|
34
|
Becquey L, Angel E, Tahi F. RNANet: an automatically built dual-source dataset integrating homologous sequences and RNA structures. Bioinformatics 2021; 37:1218-1224. [PMID: 33135044 PMCID: PMC8189678 DOI: 10.1093/bioinformatics/btaa944] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.5] [Received: 07/20/2020] [Revised: 10/09/2020] [Accepted: 10/27/2020] [Indexed: 11/12/2022] Open
Abstract
MOTIVATION: Applied research in machine learning progresses faster when a clean dataset is available and ready to use. Several datasets have been proposed and released over the years for specific tasks such as image classification, speech recognition and, more recently, protein structure prediction. However, for the fundamental problem of RNA structure prediction, information is spread across several databases depending on the level of interest: sequence, secondary structure, 3D structure or interactions with other macromolecules. To speed up advances in machine-learning-based approaches for RNA secondary and/or 3D structure prediction, a dataset integrating all this information is required, to avoid spending time on data gathering and cleaning. RESULTS: Here, we propose a first attempt at a standardized and automatically generated dataset dedicated to RNA, combining RNA sequences, homology information (in the form of position-specific scoring matrices) and information derived by annotation of available 3D structures (including secondary structure, canonical and non-canonical interactions and backbone torsion angles). The data are retrieved from the public databases PDB, Rfam and SILVA. The paper describes the procedure to build such a dataset and the RNA structure descriptors we provide. Some statistical descriptions of the resulting dataset are also provided. AVAILABILITY AND IMPLEMENTATION: The dataset is updated every month and available online (in flat-text file format) on the EvryRNA software platform (https://evryrna.ibisc.univ-evry.fr/evryrna/rnanet). An efficient parallel pipeline to build the dataset is also provided for easy reproduction or modification. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Louis Becquey
- Université Paris-Saclay, Univ Evry, IBISC, Evry-Courcouronnes 91020, France
| | - Eric Angel
- Université Paris-Saclay, Univ Evry, IBISC, Evry-Courcouronnes 91020, France
| | - Fariza Tahi
- Université Paris-Saclay, Univ Evry, IBISC, Evry-Courcouronnes 91020, France
|
35
|
Soulé A, Reinharz V, Sarrazin-Gendron R, Denise A, Waldispühl J. Finding recurrent RNA structural networks with fast maximal common subgraphs of edge-colored graphs. PLoS Comput Biol 2021; 17:e1008990. [PMID: 34048427 PMCID: PMC8191989 DOI: 10.1371/journal.pcbi.1008990] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.3] [Received: 04/30/2020] [Revised: 06/10/2021] [Accepted: 04/22/2021] [Indexed: 11/25/2022] Open
Abstract
RNA tertiary structure is crucial to its many non-coding molecular functions. RNA architecture is shaped by its secondary structure composed of stems, stacked canonical base pairs, enclosing loops. While stems are precisely captured by free-energy models, loops composed of non-canonical base pairs are not. Nor are distant interactions linking together those secondary structure elements (SSEs). Databases of conserved 3D geometries (a.k.a. modules) not captured by energetic models are leveraged for structure prediction and design, but the computational complexity has limited their study to local elements, loops. Representing the RNA structure as a graph has recently allowed this work to be expanded to pairs of SSEs, uncovering a hierarchical organization of these 3D modules, at great computational cost. Systematically capturing recurrent patterns on a large scale is a main challenge in the study of RNA structures. In this paper, we present an efficient algorithm to compute maximal isomorphisms in edge-colored graphs. We extend this algorithm to a framework well suited to identify RNA modules, and fast enough to considerably generalize previous approaches. To exhibit the versatility of our framework, we first reproduce results identifying all common modules spanning more than 2 SSEs, in a few hours instead of weeks. The efficiency of our new algorithm is demonstrated by computing the maximal modules between any pair of entire RNAs in the non-redundant corpus of known RNA 3D structures. We observe that the biggest modules our method uncovers compose large shared sub-structures spanning hundreds of nucleotides and base pairs between the ribosomes of Thermus thermophilus, Escherichia coli, and Pseudomonas aeruginosa.
Ribonucleic acids (RNAs) perform a broad range of essential molecular functions in cells, many of which rely on intricate folding properties of the molecule. Watson-Crick and wobble base pairs form early and stack onto each other to create stems connected by loops, which are themselves stabilized by more sophisticated base interaction patterns. These networks are essential to shape RNA 3D structures but are unfortunately still poorly understood. Here, we undertake the task of building a catalog of base interaction networks occurring in multiple structures. However, a pairwise comparison of all RNA structures is computationally heavy. Therefore, we devise an algorithm leveraging intrinsic properties of RNA base interaction networks that enables us to quickly mine full databases of 3D structures. Compared to previous methods, our techniques bring the total running time of the analysis from months to hours while performing more general searches. The data collected through this work will benefit molecular evolution studies and serve in structure prediction tools.
Affiliation(s)
- Antoine Soulé
- School of Computer Science, McGill University, Montréal, Canada
- LiX, École Polytechnique, Paris, France
| | - Vladimir Reinharz
- Department of Computer Science, Université du Québec à Montréal, Montréal, Canada
| | | | - Alain Denise
- Laboratoire de recherche en informatique, Université Paris-Saclay - CNRS, Orsay, France
- Institute for Integrative Biology of the Cell (I2BC), Université Paris-Saclay - CEA - CNRS, Gif-sur-Yvette, France
| | - Jérôme Waldispühl
- School of Computer Science, McGill University, Montréal, Canada
|
36
|
Zhang T, Singh J, Litfin T, Zhan J, Paliwal K, Zhou Y. RNAcmap: A Fully Automatic Pipeline for Predicting Contact Maps of RNAs by Evolutionary Coupling Analysis. Bioinformatics 2021; 37:3494-3500. [PMID: 34021744 DOI: 10.1093/bioinformatics/btab391] [Citation(s) in RCA: 16] [Impact Index Per Article: 4.0] [Received: 02/17/2020] [Revised: 03/27/2021] [Accepted: 05/18/2021] [Indexed: 11/13/2022] Open
Abstract
MOTIVATION: The accuracy of RNA secondary and tertiary structure prediction can be significantly improved by using structural restraints derived from evolutionary coupling or direct coupling analysis. Currently, these coupling analyses rely on manually curated multiple sequence alignments collected in the Rfam database, which contains 3016 families. By comparison, millions of non-coding RNA sequences are known. Here, we established RNAcmap, a fully automatic pipeline that enables evolutionary coupling analysis for any RNA sequence. The homology search was based on the covariance model built by INFERNAL according to two secondary structure predictors: a folding-based algorithm, RNAfold, and the latest deep-learning method, SPOT-RNA. RESULTS: We showed that the performance of RNAcmap is less dependent on the specific evolutionary coupling tool but is more dependent on the accuracy of the secondary structure predictor, with the best performance given by RNAcmap (SPOT-RNA). The performance of RNAcmap (SPOT-RNA) is comparable to that based on Rfam-supplied alignment and consistent for those sequences that are not in Rfam collections. Further improvement can be made with a simple meta predictor, RNAcmap (SPOT-RNA/RNAfold), depending on which secondary structure predictor can find more homologous sequences. Reliable base-pairing information generated from RNAcmap, for RNAs with high effective homologous sequences in particular, will be useful for aiding RNA structure prediction. AVAILABILITY: RNAcmap is available as a web server at https://sparks-lab.org/server/rnacmap/ and as a standalone application along with the datasets at https://github.com/sparks-lab-org/RNAcmap_standalone. A platform-independent and fully configured docker image of RNAcmap is also provided at https://hub.docker.com/r/jaswindersingh2/rnacmap.
Affiliation(s)
- Tongchuan Zhang
- Institute for Glycomics and School of Information and Communication Technology, Griffith University, Parklands Dr. Southport, QLD 4222, Australia
| | - Jaswinder Singh
- Signal Processing Laboratory, School of Engineering and Built Environment, Griffith University, Brisbane, QLD 4111, Australia
| | - Thomas Litfin
- Institute for Glycomics and School of Information and Communication Technology, Griffith University, Parklands Dr. Southport, QLD 4222, Australia
| | - Jian Zhan
- Institute for Glycomics and School of Information and Communication Technology, Griffith University, Parklands Dr. Southport, QLD 4222, Australia
| | - Kuldip Paliwal
- Signal Processing Laboratory, School of Engineering and Built Environment, Griffith University, Brisbane, QLD 4111, Australia
| | - Yaoqi Zhou
- Institute for Glycomics and School of Information and Communication Technology, Griffith University, Parklands Dr. Southport, QLD 4222, Australia; Institute for Systems and Physical Biology, Shenzhen Bay Laboratory, Shenzhen 518055, China
|
37
|
Studying RNA-Protein Complexes Using X-Ray Crystallography. Methods Mol Biol 2021; 2263:423-446. [PMID: 33877611 DOI: 10.1007/978-1-0716-1197-5_20] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 03/14/2023]
Abstract
A wide range of biological processes rely on complexes between ribonucleic acids (RNAs) and proteins. Determining the three-dimensional structures of RNA-protein complexes is crucial to elucidate the relationship between structure and biological function. X-ray crystallography represents the most widely used technique to characterize RNA-protein complexes at atomic resolution; however, determining their three-dimensional structures remains challenging. RNase contamination can ruin crystallization experiments by degrading RNA in complex with protein, leading to sample heterogeneity, and the conformational flexibility inherent in both protein and RNA can limit crystallizability. Furthermore, the three-dimensional structure can be difficult to accurately model at the typical diffraction limit of 2.5 Å resolution or lower for RNA-protein complex crystals. At this resolution, phosphates, which are electron dense, and bases, which are large, rigid, and planar, tend to be well resolved and easy to position in the electron density map, whereas other features, e.g., sugar atoms, can be difficult to accurately position. This chapter focuses on methods that can be used to overcome the unique problems faced when crystallizing RNA-protein complexes and determining their three-dimensional structures using X-ray crystallography.
|
38
|
Rangan R, Watkins AM, Chacon J, Kretsch R, Kladwang W, Zheludev IN, Townley J, Rynge M, Thain G, Das R. De novo 3D models of SARS-CoV-2 RNA elements from consensus experimental secondary structures. Nucleic Acids Res 2021; 49:3092-3108. [PMID: 33693814 PMCID: PMC8034642 DOI: 10.1093/nar/gkab119] [Citation(s) in RCA: 64] [Impact Index Per Article: 16.0] [Received: 12/16/2020] [Revised: 02/08/2021] [Accepted: 02/16/2021] [Indexed: 12/12/2022] Open
Abstract
The rapid spread of COVID-19 is motivating development of antivirals targeting conserved SARS-CoV-2 molecular machinery. The SARS-CoV-2 genome includes conserved RNA elements that offer potential small-molecule drug targets, but most of their 3D structures have not been experimentally characterized. Here, we provide a compilation of chemical mapping data from our and other labs, secondary structure models, and 3D model ensembles based on Rosetta's FARFAR2 algorithm for SARS-CoV-2 RNA regions including the individual stems SL1-8 in the extended 5' UTR; the reverse complement of the 5' UTR SL1-4; the frameshift stimulating element (FSE); and the extended pseudoknot, hypervariable region, and s2m of the 3' UTR. For eleven of these elements (the stems in SL1-8, reverse complement of SL1-4, FSE, s2m and 3' UTR pseudoknot), modeling convergence supports the accuracy of predicted low energy states; subsequent cryo-EM characterization of the FSE confirms modeling accuracy. To aid efforts to discover small molecule RNA binders guided by computational models, we provide a second set of similarly prepared models for RNA riboswitches that bind small molecules. Both datasets ('FARFAR2-SARS-CoV-2', at https://github.com/DasLab/FARFAR2-SARS-CoV-2, and 'FARFAR2-Apo-Riboswitch', at https://github.com/DasLab/FARFAR2-Apo-Riboswitch) include up to 400 models for each RNA element, which may facilitate drug discovery approaches targeting dynamic ensembles of RNA molecules.
Affiliation(s)
- Ramya Rangan
- Biophysics Program, Stanford University, Stanford, CA 94305, USA
| | - Andrew M Watkins
- Department of Biochemistry, Stanford University School of Medicine, Stanford CA 94305, USA
| | - Jose Chacon
- Department of Biochemistry, Stanford University School of Medicine, Stanford CA 94305, USA
| | - Rachael Kretsch
- Biophysics Program, Stanford University, Stanford, CA 94305, USA
| | - Wipapat Kladwang
- Department of Biochemistry, Stanford University School of Medicine, Stanford CA 94305, USA
| | - Ivan N Zheludev
- Department of Biochemistry, Stanford University School of Medicine, Stanford CA 94305, USA
| | | | - Mats Rynge
- Information Sciences Institute, University of Southern California, Marina Del Rey, CA 90292, USA
| | - Gregory Thain
- Department of Computer Sciences, University of Wisconsin–Madison, Madison, WI 53706 USA
| | - Rhiju Das
- Biophysics Program, Stanford University, Stanford, CA 94305, USA
- Department of Biochemistry, Stanford University School of Medicine, Stanford CA 94305, USA
- Department of Physics, Stanford University, Stanford, CA 94305, USA
|
39
|
An RNA-centric historical narrative around the Protein Data Bank. J Biol Chem 2021; 296:100555. [PMID: 33744291 PMCID: PMC8080527 DOI: 10.1016/j.jbc.2021.100555] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.5] [Received: 11/05/2020] [Revised: 02/17/2021] [Accepted: 03/16/2021] [Indexed: 01/06/2023] Open
Abstract
Some of the amazing contributions brought to the scientific community by the Protein Data Bank (PDB) are described. The focus is on nucleic acid structures with a bias toward RNA. The evolution and key roles in science of the PDB and other structural databases for nucleic acids illustrate how small initial ideas can become huge and indispensable resources with the unflinching willingness of scientists to cooperate globally. Progress in understanding the molecular interactions that drive RNA architectures followed the rapid increase in RNA structures in the PDB. That increase in turn followed improvements in chemical synthesis and purification of RNA molecules, as well as in biophysical methods for structure determination and computer technology. RNA modeling efforts since their early beginnings are also described, together with their links to the state of structural knowledge and technological development. Structures of RNA and of its assemblies are physical objects, which, together with genomic data, allow us to integrate present-day biological functions and the historical evolution in all living species on earth.
|
40
|
Moore PB, Petrov A, Westhof E, Zirbel CL. Neocles B. Leontis (1955 - 2020). RNA (NEW YORK, N.Y.) 2021; 27:rna.078673.121. [PMID: 33452229 PMCID: PMC7962483 DOI: 10.1261/rna.078673.121] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 01/05/2021] [Accepted: 01/07/2021] [Indexed: 06/12/2023]
Affiliation(s)
- Peter B Moore
- Department of Chemistry, Yale University, 225 Prospect St, New Haven, CT 06511-8499
| | - Anton Petrov
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
| | - Eric Westhof
- Université de Strasbourg, Institut de biologie moléculaire et cellulaire du CNRS, Architecture et Réactivité de l'ARN, Strasbourg, France
| | - Craig L Zirbel
- Department of Mathematics and Statistics, Bowling Green State University, Bowling Green, OH 43403
|
41
|
Reinharz V, Sarrazin-Gendron R, Waldispühl J. Modeling and Predicting RNA Three-Dimensional Structures. Methods Mol Biol 2021; 2284:17-42. [PMID: 33835435 DOI: 10.1007/978-1-0716-1307-8_2] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 06/12/2023]
Abstract
Modeling the three-dimensional structure of RNAs is a milestone toward better understanding and prediction of the molecular functions of nucleic acids. Physics-based approaches and molecular dynamics simulations are not tractable on large molecules with all-atom models. To address this issue, coarse-grained models of RNA three-dimensional structures have been developed. In this chapter, we describe a graphical model based on the Leontis-Westhof extended base pair classification. This representation of RNA structures enables us to identify highly conserved structural motifs with complex nucleotide interactions in structure databases. We show how to take advantage of this knowledge to quickly predict three-dimensional structures of large RNA molecules and present the RNA-MoIP web server (http://rnamoip.cs.mcgill.ca) that streamlines the computational and visualization processes. Finally, we show recent advances in the prediction of local 3D motifs from sequence data with the BayesPairing software and discuss its impact on complete 3D structure prediction.
Affiliation(s)
- Vladimir Reinharz
- Department of Computer Science, Université du Québec à Montréal, Montréal, QC, Canada
- Jérôme Waldispühl
- School of Computer Science, McGill University, Montréal, QC, Canada
|
42
|
Baulin E, Metelev V, Bogdanov A. Base-intercalated and base-wedged stacking elements in 3D-structure of RNA and RNA-protein complexes. Nucleic Acids Res 2020; 48:8675-8685. [PMID: 32687167 PMCID: PMC7470943 DOI: 10.1093/nar/gkaa610] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.6] [Received: 02/06/2020] [Revised: 07/05/2020] [Accepted: 07/15/2020] [Indexed: 12/25/2022] Open
Abstract
Along with nucleobase pairing, base-base stacking interactions are one of the two main types of strong non-covalent interactions that define the unique secondary and tertiary structure of RNA. In this paper, we studied two subfamilies of nucleobase-inserted stacking structures: (i) with any base intercalated between neighboring nucleotide residues (base-intercalated element, BIE, i + 1); (ii) with any base wedged into a hydrophobic cavity formed by heterocyclic bases of two nucleotides which are one nucleotide apart in sequence (base-wedged element, BWE, i + 2). We have exploited the growing database of natively folded RNA structures in the Protein Data Bank to analyze the distribution and structural role of these motifs in RNA. We found that these structural elements, initially found in yeast tRNAPhe, are quite widespread among the tertiary structures of various RNAs. These motifs perform diverse roles in RNA 3D structure formation and its maintenance. They contribute to the folding of RNA bulges and loops and participate in long-range interactions of single-stranded stretches within RNA macromolecules. Furthermore, both base-intercalated and base-wedged motifs participate directly or indirectly in the formation of RNA functional centers, which interact with various ligands, antibiotics and proteins.
Affiliation(s)
- Eugene Baulin
- Laboratory of Applied Mathematics, Institute of Mathematical Problems of Biology RAS - the Branch of Keldysh Institute of Applied Mathematics of Russian Academy of Sciences, Pushchino, Moscow Region 142290, Russia
- Valeriy Metelev
- Department of Chemistry, Lomonosov Moscow State University, Moscow 119991, Russia
- Alexey Bogdanov
- To whom correspondence should be addressed. Tel: +7 495 9393143; Fax: +7 495 9393181
|
43
|
Zhang H, Zhang H, Chen C. Simulation Study of the Plasticity of k-Turn Motif in Different Environments. Biophys J 2020; 119:1416-1426. [PMID: 32918889 DOI: 10.1016/j.bpj.2020.08.015] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/23/2020] [Revised: 07/15/2020] [Accepted: 08/12/2020] [Indexed: 10/23/2022] Open
Abstract
The k-turn is a widespread and important motif in RNA. Depending on its internal hydrogen bond network, it has two stable states, called N1 and N3. The relative stability of the two states changes with the environment, so the motif can adopt different conformations in different environments. This is called the "plasticity" of a molecule. In this work, we study the plasticity of the k-turn with the mixing REMD method in explicit solvent. The results are summarized as follows. First, N1 and N3 are almost equally stable when the k-turn is alone in solvent, and the molecule is quite flexible, acting as a hinge. However, after binding to different proteins, such as L7Ae and L24e, the k-turn falls into one global minimum. The preferred state can be either N1 or N3. In contrast, the nonpreferred state becomes unstable, with a weaker binding affinity to the protein. This reveals that an RNA-binding protein is able to modulate the representative state of the k-turn at equilibrium, in agreement with experimental findings. Moreover, free energy calculations show that the free energy barrier between the N1 and N3 states of the k-turn increases in the complexes, so the state-to-state transition is greatly impeded. We also discuss in depth the mechanism behind the high plasticity of the k-turn in different environments.
Affiliation(s)
- Haomiao Zhang
- Biomolecular Physics and Modeling Group, School of Physics, Huazhong University of Science and Technology, Wuhan, Hubei, China
- Haozhe Zhang
- Biomolecular Physics and Modeling Group, School of Physics, Huazhong University of Science and Technology, Wuhan, Hubei, China
- Changjun Chen
- Biomolecular Physics and Modeling Group, School of Physics, Huazhong University of Science and Technology, Wuhan, Hubei, China
|
44
|
Oliver C, Mallet V, Gendron RS, Reinharz V, Hamilton W, Moitessier N, Waldispühl J. Augmented base pairing networks encode RNA-small molecule binding preferences. Nucleic Acids Res 2020; 48:7690-7699. [PMID: 32652015 PMCID: PMC7430648 DOI: 10.1093/nar/gkaa583] [Citation(s) in RCA: 23] [Impact Index Per Article: 4.6] [Received: 03/18/2020] [Revised: 06/23/2020] [Accepted: 07/08/2020] [Indexed: 12/14/2022] Open
Abstract
RNA-small molecule binding is a key regulatory mechanism that can stabilize 3D structures and activate molecular functions. The discovery of RNA-targeting compounds is thus a current topic of interest for novel therapies. Our work is a first attempt at bringing the scalability and generalization abilities of machine learning methods to the problem of RNA drug discovery, as well as a step towards understanding the interactions which drive binding specificity. Our tool, RNAmigos, builds and encodes a network representation of RNA structures to predict likely ligands for novel binding sites. We subject ligand predictions to virtual screening and show that we are able to place the true ligand in the 71st-73rd percentile in two decoy libraries, showing a significant improvement over several baselines and a state-of-the-art method. Furthermore, we observe that augmenting structural networks with non-canonical base pairing data is the only representation able to uncover a significant signal, suggesting that such interactions are a necessary source of binding specificity. We also find that pre-training with an auxiliary graph representation learning task significantly boosts performance of ligand prediction. This finding can serve as a general principle for RNA structure-function prediction when data is scarce. RNAmigos shows that RNA binding data contains structural patterns with potential for drug discovery, and provides methodological insights for possible applications to other structure-function learning tasks. The source code, data and a Web server are freely available at http://rnamigos.cs.mcgill.ca.
Affiliation(s)
- Carlos Oliver
- School of Computer Science, McGill University, Montreal H3A 0E9, Canada
- Mila - Quebec Artificial Intelligence Institute, H2S 3S1, Canada
- Vincent Mallet
- Institut Pasteur, Structural Bioinformatics Unit, Paris, F-75015, France
- MINES ParisTech, PSL Research University, CBIO - Centre for Computational Biology, F-75006 Paris, France
- Vladimir Reinharz
- Department of Computer Science, Université du Québec à Montréal, Montreal H2X 3Y7, Canada
- William L Hamilton
- School of Computer Science, McGill University, Montreal H3A 0E9, Canada
- Mila - Quebec Artificial Intelligence Institute, H2S 3S1, Canada
- Jérôme Waldispühl
- School of Computer Science, McGill University, Montreal H3A 0E9, Canada
|
45
|
Gallego D, Darré L, Dans PD, Orozco M. VeriNA3d: an R package for nucleic acids data mining. Bioinformatics 2020; 35:5334-5336. [PMID: 31286135 DOI: 10.1093/bioinformatics/btz553] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.6] [Received: 04/01/2019] [Revised: 06/20/2019] [Accepted: 07/06/2019] [Indexed: 11/12/2022] Open
Abstract
SUMMARY: veriNA3d is an R package for the analysis of nucleic acid structural data, with an emphasis on complex RNA structures. In addition to single-structure analyses, veriNA3d also implements functions to handle whole datasets of mmCIF/PDB structures that can be retrieved from public or local repositories. Our package aims to fill a gap in the data mining of nucleic acid structures, enabling flexible, high-throughput analysis of structural databases. AVAILABILITY AND IMPLEMENTATION: http://mmb.irbbarcelona.org/gitlab/dgallego/veriNA3d. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Diego Gallego
- Computational Biology Node, Institute for Research in Biomedicine (IRB Barcelona), The Barcelona Institute of Science and Technology; Department of Biochemistry and Biomedicine, Faculty of Biology, University of Barcelona, Barcelona, Spain
- Leonardo Darré
- Computational Biology Node, Institute for Research in Biomedicine (IRB Barcelona), The Barcelona Institute of Science and Technology; Functional Genomics Laboratory and Biomolecular Simulations Laboratory, Institute Pasteur of Montevideo, Montevideo, Uruguay
- Pablo D Dans
- Computational Biology Node, Institute for Research in Biomedicine (IRB Barcelona), The Barcelona Institute of Science and Technology
- Modesto Orozco
- Computational Biology Node, Institute for Research in Biomedicine (IRB Barcelona), The Barcelona Institute of Science and Technology; Department of Biochemistry and Biomedicine, Faculty of Biology, University of Barcelona, Barcelona, Spain
|
46
|
Structural Insights into RNA Dimerization: Motifs, Interfaces and Functions. Molecules 2020; 25:molecules25122881. [PMID: 32585844 PMCID: PMC7357161 DOI: 10.3390/molecules25122881] [Citation(s) in RCA: 25] [Impact Index Per Article: 5.0] [Received: 05/10/2020] [Revised: 06/18/2020] [Accepted: 06/19/2020] [Indexed: 12/26/2022] Open
Abstract
In comparison with the pervasive use of protein dimers and multimers in all domains of life, functional RNA oligomers have so far rarely been observed in nature. Their diminished occurrence contrasts starkly with the robust intrinsic potential of RNA to multimerize through long-range base-pairing ("kissing") interactions, self-annealing of palindromic or complementary sequences, and stable tertiary contact motifs, such as the GNRA tetraloop-receptors. To explore the general mechanics of RNA dimerization, we performed a meta-analysis of a collection of exemplary RNA homodimer structures consisting of viral genomic elements, ribozymes, riboswitches, etc., encompassing both functional and fortuitous dimers. Globally, we found that domain-swapped dimers and antiparallel, head-to-tail arrangements are predominant architectural themes. Locally, we observed that the same structural motifs, interfaces and forces that enable tertiary RNA folding also drive their higher-order assemblies. These prominently feature long-range kissing loops, pseudoknots, reciprocal base intercalations and A-minor interactions. We postulate that the scarcity of functional RNA multimers and limited diversity in multimerization motifs may reflect evolutionary constraints imposed by host antiviral immune surveillance and stress sensing. A deepening mechanistic understanding of RNA multimerization is expected to facilitate investigations into RNA and RNP assemblies, condensates, and granules and enable their potential therapeutic targeting.
|
47
|
Černý J, Božíková P, Svoboda J, Schneider B. A unified dinucleotide alphabet describing both RNA and DNA structures. Nucleic Acids Res 2020; 48:6367-6381. [PMID: 32406923 PMCID: PMC7293047 DOI: 10.1093/nar/gkaa383] [Citation(s) in RCA: 22] [Impact Index Per Article: 4.4] [Received: 04/11/2020] [Revised: 04/11/2020] [Accepted: 04/30/2020] [Indexed: 12/13/2022] Open
Abstract
By analyzing almost 120 000 dinucleotides in over 2000 nonredundant nucleic acid crystal structures, we define 96+1 diNucleotide Conformers, NtCs, which describe the geometry of RNA and DNA dinucleotides. NtC classes are grouped into 15 codes of the structural alphabet CANA (Conformational Alphabet of Nucleic Acids) to simplify symbolic annotation of the prominent structural features of NAs and their intuitive graphical display. The search for nontrivial patterns of NtCs resulted in the identification of several types of RNA loops, some of them observed for the first time. Over 30% of the nearly six million dinucleotides in the PDB cannot be assigned to any NtC class, but we demonstrate that up to half of them can be re-refined with the help of proper refinement targets. A statistical analysis of the preferences of NtCs and CANA codes for the 16 dinucleotide sequences showed that neither the NtC class AA00, which forms the scaffold of RNA structures, nor BB00, the most populated DNA class, is sequence neutral; their distributions are significantly biased. The reported automated assignment of the NtC classes and CANA codes, available at dnatco.org, provides a powerful tool for unbiased analysis of nucleic acid structures by structural and molecular biologists.
Affiliation(s)
- Jiří Černý
- Institute of Biotechnology of the Czech Academy of Sciences, BIOCEV, CZ-252 50 Vestec, Prague-West, Czech Republic
- Paulína Božíková
- Institute of Biotechnology of the Czech Academy of Sciences, BIOCEV, CZ-252 50 Vestec, Prague-West, Czech Republic
- Jakub Svoboda
- Institute of Biotechnology of the Czech Academy of Sciences, BIOCEV, CZ-252 50 Vestec, Prague-West, Czech Republic
- Bohdan Schneider
- Institute of Biotechnology of the Czech Academy of Sciences, BIOCEV, CZ-252 50 Vestec, Prague-West, Czech Republic
|
48
|
Kasprzak WK, Ahmed NA, Shapiro BA. Modeling ligand docking to RNA in the design of RNA-based nanostructures. Curr Opin Biotechnol 2020; 63:16-25. [DOI: 10.1016/j.copbio.2019.10.010] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.8] [Received: 09/26/2019] [Accepted: 10/30/2019] [Indexed: 12/30/2022]
|
49
|
Becquey L, Angel E, Tahi F. BiORSEO: a bi-objective method to predict RNA secondary structures with pseudoknots using RNA 3D modules. Bioinformatics 2020; 36:2451-2457. [DOI: 10.1093/bioinformatics/btz962] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/30/2019] [Revised: 11/15/2019] [Accepted: 01/02/2020] [Indexed: 11/12/2022] Open
Abstract
Motivation
RNA loops have been modelled and clustered from solved 3D structures into ordered collections of recurrent non-canonical interactions called ‘RNA modules’, available in databases. This work explores what information from such modules can be used to improve secondary structure prediction. We propose a bi-objective method for predicting RNA secondary structures by minimizing both an energy-based and a knowledge-based potential. The tool, called BiORSEO, outputs secondary structures corresponding to the optimal solutions from the Pareto set.
Results
We compare several approaches to predict secondary structures using inserted RNA modules information: two module data sources, Rna3Dmotif and the RNA 3D Motif Atlas, and different ways to score the module insertions: module size, module complexity or module probability according to models like JAR3D and BayesPairing. We benchmark them against a large set of known secondary structures, including some state-of-the-art tools, and comment on the usefulness of the half physics-based, half data-based approach.
Availability and implementation
The software is available for download on the EvryRNA website, as well as the datasets.
Supplementary information
Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Louis Becquey
- Université Paris-Saclay, Univ Evry, IBISC, 91020, Evry, France
- Eric Angel
- Université Paris-Saclay, Univ Evry, IBISC, 91020, Evry, France
- Fariza Tahi
- Université Paris-Saclay, Univ Evry, IBISC, 91020, Evry, France
|
50
|
Stacking geometry between two sheared Watson-Crick basepairs: Computational chemistry and bioinformatics based prediction. Biochim Biophys Acta Gen Subj 2020; 1864:129600. [PMID: 32179130 DOI: 10.1016/j.bbagen.2020.129600] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.6] [Received: 10/04/2019] [Revised: 03/05/2020] [Accepted: 03/11/2020] [Indexed: 11/21/2022]
Abstract
BACKGROUND: Molecular modeling of RNA double helices is possible using the most probable values of basepair parameters obtained from the crystal structure database. The A:A w:wC non-canonical basepair, involving the Watson-Crick edges of two adenines in cis orientation, appears quite frequently in the database. The bimodal distribution of its Shear, due to two different H-bonding schemes, introduces confusion in assigning the most probable value. Its effect is pronounced when the A:A w:wC basepair stacks on Sheared wobble G:U W:WC basepairs. METHODS: We employed molecular dynamics simulations of three possible double helices with GAG, UAG and GAU sequence motifs at their centers, and quantum chemical calculations for the non-canonical A:A w:wC basepair stacked on a G:U W:WC basepair. RESULTS: We observed stable structures of the GAG motif, specifically with negative Shear of the A:A basepair, but the other motifs were not found to be stable with A:A w:wC basepairing. Hybrid DFT-D and MP2 stacking energy analyses on the dinucleotide step sequences A:A w:wC::G:U W:WC and A:A w:wC::U:G W:WC reveal that the viable orientation of A:A::G:U prefers one of the H-bonding modes with negative Shear, as supported by the crystal structure database. The A:A::U:G dinucleotide, however, prefers a structure with only positive Shear. CONCLUSIONS: The quantum chemical calculations explain why only the MD simulations of the GAG sequence motif appear stable. In the GAU and UAG motifs, a "tug of war" between positive and negative Shears of the A:A w:wC basepair induces conformational plasticity. GENERAL SIGNIFICANCE: We have presented a comprehensive explanation for the promiscuous nature of the A:A w:wC basepair, which brings occasional structural plasticity.
|