1
Bugnon LA, Di Persia L, Gerard M, Raad J, Prochetto S, Fenoy E, Chorostecki U, Ariel F, Stegmayer G, Milone DH. sincFold: end-to-end learning of short- and long-range interactions in RNA secondary structure. Brief Bioinform 2024; 25:bbae271. [PMID: 38855913 PMCID: PMC11163250 DOI: 10.1093/bib/bbae271] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/18/2024] [Revised: 05/03/2024] [Accepted: 05/24/2024] [Indexed: 06/11/2024]
Abstract
MOTIVATION: Coding and noncoding RNA molecules participate in many important biological processes. Noncoding RNAs fold into well-defined secondary structures to exert their functions. However, the computational prediction of the secondary structure from a raw RNA sequence is a long-standing unsolved problem, which after decades of almost unchanged performance has now re-emerged due to deep learning. Traditional RNA secondary structure prediction algorithms have been mostly based on thermodynamic models and dynamic programming for free energy minimization. More recently, deep learning methods have shown competitive performance compared with the classical ones, but there is still a wide margin for improvement. RESULTS: In this work we present sincFold, an end-to-end deep learning approach that predicts the nucleotide contact matrix using only the RNA sequence as input. The model is based on 1D and 2D residual neural networks that can learn short- and long-range interaction patterns. We show that structures can be accurately predicted with minimal physical assumptions. Extensive experiments were conducted on several benchmark datasets, considering sequence homology and cross-family validation. sincFold was compared with classical methods and recent deep learning models, showing that it can outperform the state-of-the-art methods.
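As an illustration of the contact matrix mentioned in the abstract, the sketch below converts a secondary structure written in dot-bracket notation into a symmetric 0/1 matrix. This is not sincFold code: the function name `dot_bracket_to_contact_matrix`, the choice of dot-bracket input, and the example structure are assumptions made for illustration, and pseudoknot brackets are not handled.

```python
def dot_bracket_to_contact_matrix(structure):
    """Build a symmetric 0/1 contact matrix from a dot-bracket string.

    Hypothetical helper for illustration only: entry [i][j] is 1 when
    positions i and j form a base pair, and 0 otherwise.
    """
    n = len(structure)
    matrix = [[0] * n for _ in range(n)]
    open_positions = []  # stack of unmatched "(" indices

    for j, symbol in enumerate(structure):
        if symbol == "(":
            open_positions.append(j)
        elif symbol == ")":
            if not open_positions:
                raise ValueError(f"unmatched ')' at position {j}")
            i = open_positions.pop()
            matrix[i][j] = 1
            matrix[j][i] = 1  # contacts are symmetric
        elif symbol != ".":
            raise ValueError(f"unsupported symbol {symbol!r} at position {j}")

    if open_positions:
        raise ValueError(f"unmatched '(' at position {open_positions[-1]}")
    return matrix


# Assumed toy example: a 9-nucleotide hairpin with three base pairs.
contacts = dot_bracket_to_contact_matrix("(((...)))")
for row in contacts:
    print("".join(str(value) for value in row))
```

Each base pair appears twice in the matrix, once above and once below the diagonal, so a structure with three pairs yields six nonzero entries.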
Affiliation(s)
- Leandro A Bugnon
  - Research Institute for Signals, Systems and Computational Intelligence, sinc(i), FICH-UNL, CONICET, Ciudad Universitaria UNL, 3000, Santa Fe, Argentina
- Leandro Di Persia
  - Research Institute for Signals, Systems and Computational Intelligence, sinc(i), FICH-UNL, CONICET, Ciudad Universitaria UNL, 3000, Santa Fe, Argentina
- Matias Gerard
  - Research Institute for Signals, Systems and Computational Intelligence, sinc(i), FICH-UNL, CONICET, Ciudad Universitaria UNL, 3000, Santa Fe, Argentina
- Jonathan Raad
  - Research Institute for Signals, Systems and Computational Intelligence, sinc(i), FICH-UNL, CONICET, Ciudad Universitaria UNL, 3000, Santa Fe, Argentina
- Santiago Prochetto
  - Research Institute for Signals, Systems and Computational Intelligence, sinc(i), FICH-UNL, CONICET, Ciudad Universitaria UNL, 3000, Santa Fe, Argentina
  - Instituto de Agrobiotecnología del Litoral, CONICET-UNL, CCT-Santa Fe, Ruta Nacional N° 168 Km 0, s/n, Paraje el Pozo, 3000, Santa Fe, Argentina
- Emilio Fenoy
  - Research Institute for Signals, Systems and Computational Intelligence, sinc(i), FICH-UNL, CONICET, Ciudad Universitaria UNL, 3000, Santa Fe, Argentina
- Uciel Chorostecki
  - Faculty of Medicine and Health Sciences, Universitat Internacional de Catalunya, Barcelona, Spain
- Federico Ariel
  - Instituto de Agrobiotecnología del Litoral, CONICET-UNL, CCT-Santa Fe, Ruta Nacional N° 168 Km 0, s/n, Paraje el Pozo, 3000, Santa Fe, Argentina
- Georgina Stegmayer
  - Research Institute for Signals, Systems and Computational Intelligence, sinc(i), FICH-UNL, CONICET, Ciudad Universitaria UNL, 3000, Santa Fe, Argentina
- Diego H Milone
  - Research Institute for Signals, Systems and Computational Intelligence, sinc(i), FICH-UNL, CONICET, Ciudad Universitaria UNL, 3000, Santa Fe, Argentina
2
Bohdan DR, Voronina VV, Bujnicki JM, Baulin EF. A comprehensive survey of long-range tertiary interactions and motifs in non-coding RNA structures. Nucleic Acids Res 2023; 51:8367-8382. [PMID: 37471030 PMCID: PMC10484739 DOI: 10.1093/nar/gkad605] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/14/2022] [Accepted: 07/07/2023] [Indexed: 07/21/2023]
Abstract
Understanding the 3D structure of RNA is key to understanding RNA function. RNA 3D structure is modular and can be seen as a composition of building blocks of various sizes called tertiary motifs. Currently, long-range motifs formed between distant loops and helical regions are far less studied than the local motifs determined by the RNA secondary structure. We surveyed long-range tertiary interactions and motifs in a non-redundant set of non-coding RNA 3D structures. A new dataset of annotated LOng-RAnge RNA 3D modules (LORA) was built using an approach that does not rely on the automatic annotations of non-canonical interactions. An original algorithm, ARTEM, was developed for annotation-, sequence- and topology-independent superposition of two arbitrary RNA 3D modules. The proposed methods allowed us to identify and describe the most common long-range RNA tertiary motifs. Along with the prevalent canonical A-minor interactions, a large number of previously undescribed staple interactions were observed. The most frequent long-range motifs were found to belong to three main motif families: planar staples, tilted staples, and helical packing motifs.
Affiliation(s)
- Davyd R Bohdan
  - Department of Innovation and High Technology, Moscow Institute of Physics and Technology, Dolgoprudny 141701, Russia
- Valeria V Voronina
  - Department of Information Systems, Ulyanovsk State Technical University, Ulyanovsk 432027, Russia
- Janusz M Bujnicki
  - Laboratory of Bioinformatics and Protein Engineering, International Institute of Molecular and Cell Biology in Warsaw, Warsaw 02-109, Poland
- Eugene F Baulin
  - Laboratory of Bioinformatics and Protein Engineering, International Institute of Molecular and Cell Biology in Warsaw, Warsaw 02-109, Poland
3
González Buitrón M, Tunque Cahui RR, García Ríos E, Hirsh L, Parisi G, Fornasari MS, Palopoli N. CoDNaS-RNA: a database of conformational diversity in the native state of RNA. Bioinformatics 2022; 38:1745-1748. [PMID: 34954795 DOI: 10.1093/bioinformatics/btab858] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 12/16/2020] [Revised: 09/29/2021] [Accepted: 12/23/2021] [Indexed: 02/03/2023]
Abstract
SUMMARY: Conformational changes in RNA native ensembles are central to fulfilling many of their biological roles. Systematic knowledge of the extent and possible modulators of this conformational diversity is desirable to better understand the relationship between RNA dynamics and function. We have developed CoDNaS-RNA as the first database of conformational diversity in RNA molecules. Known RNA structures are retrieved and clustered to identify alternative conformers of each molecule. Pairwise structural comparisons between all conformers within each cluster allow the variability of the molecule to be measured. Additional annotations about structural features, molecular interactions and biological function are provided. All data in CoDNaS-RNA are free to download and available through a public website that can be of interest to researchers in computational biology and other life science disciplines. AVAILABILITY AND IMPLEMENTATION: The data underlying this article are available at http://ufq.unq.edu.ar/codnasrna or https://codnas-rna.bioinformatica.org/. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Martín González Buitrón
  - Departamento de Ciencia y Tecnología, Universidad Nacional de Quilmes, Buenos Aires, Argentina
  - Consejo Nacional de Investigaciones Científicas y Técnicas, Buenos Aires, Argentina
- Emilio García Ríos
  - Departamento de Ingeniería, Pontificia Universidad Católica del Perú, Lima, Peru
- Layla Hirsh
  - Departamento de Ingeniería, Pontificia Universidad Católica del Perú, Lima, Peru
- Gustavo Parisi
  - Departamento de Ciencia y Tecnología, Universidad Nacional de Quilmes, Buenos Aires, Argentina
  - Consejo Nacional de Investigaciones Científicas y Técnicas, Buenos Aires, Argentina
- María Silvina Fornasari
  - Departamento de Ciencia y Tecnología, Universidad Nacional de Quilmes, Buenos Aires, Argentina
  - Consejo Nacional de Investigaciones Científicas y Técnicas, Buenos Aires, Argentina
- Nicolas Palopoli
  - Departamento de Ciencia y Tecnología, Universidad Nacional de Quilmes, Buenos Aires, Argentina
  - Consejo Nacional de Investigaciones Científicas y Técnicas, Buenos Aires, Argentina
4
Guo ZH, Yuan L, Tan YL, Zhang BG, Shi YZ. RNAStat: An Integrated Tool for Statistical Analysis of RNA 3D Structures. Front Bioinform 2022; 1:809082. [PMID: 36303785 PMCID: PMC9580920 DOI: 10.3389/fbinf.2021.809082] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/04/2021] [Accepted: 12/17/2021] [Indexed: 11/13/2022]
Abstract
The 3D architectures of RNAs are essential for understanding their cellular functions. While an accurate scoring function based on the statistics of known RNA structures is a key component for successful RNA structure prediction or evaluation, there are few tools or web servers that can be directly used for comprehensive statistical analysis of RNA 3D structures. In this work, we developed RNAStat, an integrated tool for computing statistics on RNA 3D structures. For given RNA structures, RNAStat automatically calculates RNA structural properties such as size and shape, and shows their distributions. Based on the RNA structure annotation from DSSR, RNAStat provides statistical information on RNA secondary structure motifs including canonical/non-canonical base pairs, stems, and various loops. In particular, the geometry of base-pairing/stacking can be calculated in RNAStat by constructing a local coordinate system for each base. In addition, RNAStat supplies the distribution of distances between any atoms to help users build distance-based RNA statistical potentials. To test the usability of the tool, we established a non-redundant RNA 3D structure dataset and, based on it, made a comprehensive statistical analysis of RNA structures, which could help guide RNA structure modeling. The Python code of RNAStat, the dataset used in this work, and corresponding statistical data files are freely available at GitHub (https://github.com/RNA-folding-lab/RNAStat).
Affiliation(s)
- Zhi-Hao Guo
  - Research Center of Nonlinear Science, School of Mathematical and Physical Sciences, Wuhan Textile University, Wuhan, China
  - School of Computer Science and Artificial Intelligence, Wuhan Textile University, Wuhan, China
- Li Yuan
  - Research Center of Nonlinear Science, School of Mathematical and Physical Sciences, Wuhan Textile University, Wuhan, China
  - School of Computer Science and Artificial Intelligence, Wuhan Textile University, Wuhan, China
- Ya-Lan Tan
  - Research Center of Nonlinear Science, School of Mathematical and Physical Sciences, Wuhan Textile University, Wuhan, China
- Ben-Gong Zhang
  - Research Center of Nonlinear Science, School of Mathematical and Physical Sciences, Wuhan Textile University, Wuhan, China
- Ya-Zhou Shi
  - Research Center of Nonlinear Science, School of Mathematical and Physical Sciences, Wuhan Textile University, Wuhan, China
  - *Correspondence: Ya-Zhou Shi
5
Baulin EF. Features and Functions of the A-Minor Motif, the Most Common Motif in RNA Structure. Biochemistry (Mosc) 2021; 86:952-961. [PMID: 34488572 DOI: 10.1134/s000629792108006x] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/23/2022]
Abstract
A-minor motifs are RNA tertiary structure motifs that generally involve a canonical base pair and an adenine base forming hydrogen bonds with the minor groove of the base pair. Such motifs are among the most common tertiary interactions in known RNA structures, comparable in number with the non-canonical base pairs. They are often found in functionally important regions of non-coding RNAs and, in particular, play a central role in protein synthesis. Here, we review local variations of the A-minor geometry and discuss difficulties associated with their annotation, as well as various structural contexts and common A-minor co-motifs, and diverse functions of A-minors in various processes in a living cell.
Affiliation(s)
- Eugene F Baulin
  - Institute of Mathematical Problems of Biology RAS - the Branch of Keldysh Institute of Applied Mathematics of the Russian Academy of Sciences, Pushchino, Moscow Region, 142290, Russia
  - Moscow Institute of Physics and Technology, Dolgoprudny, Moscow Region, 141701, Russia
6
Shalybkova AA, Mikhailova DS, Kulakovskiy IV, Fakhranurova LI, Baulin EF. Annotation of the local context of the RNA secondary structure improves the classification and prediction of A-minors. RNA 2021; 27:rna.078535.120. [PMID: 34016706 PMCID: PMC8284323 DOI: 10.1261/rna.078535.120] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Received: 12/02/2020] [Accepted: 05/17/2021] [Indexed: 05/15/2023]
Abstract
Non-coding RNAs play a crucial role in various cellular processes in living organisms, and RNA functions heavily depend on molecule structures composed of stems, loops, and various tertiary motifs. Among those, the most frequent are A-minor interactions, which are often involved in the formation of more complex motifs such as kink-turns and pseudoknots. We present a novel classification of A-minors in terms of RNA secondary structure where each nucleotide of an A-minor is attributed to the stem or loop, and each pair of nucleotides is attributed to their relative position within the secondary structure. By analyzing classes of A-minors in known RNA structures, we found that the largest classes are mostly homogeneous and preferably localize with known A-minor co-motifs, e.g. tetraloop-tetraloop receptor and coaxial stacking. Detailed analysis of local A-minors within internal loops revealed a novel recurrent RNA tertiary motif, the across-bulged motif. Interestingly, the motif resembles the previously known GAAA/11nt motif but with the local adenines performing the role of the GAAA-tetraloop. By using machine learning, we show that particular classes of local A-minors can be predicted from sequence and secondary structure. The proposed classification is the first step toward automatic annotation of not only A-minors and their co-motifs but various types of RNA tertiary motifs as well.
Affiliation(s)
- Ivan V Kulakovskiy
  - Engelhardt Institute of Molecular Biology, Russian Academy of Sciences; Vavilov Institute of General Genetics, Russian Academy of Sciences; Institute of Protein Research, Russian Academy of Sciences
- Liliia I Fakhranurova
  - Institute of Theoretical and Experimental Biophysics, Russian Academy of Sciences; Shemiakin and Ovchinnikov Institute of Bioorganic Chemistry of the Russian Academy of Sciences
- Eugene F Baulin
  - Institute of Mathematical Problems of Biology RAS; Moscow Institute of Physics and Technology
7
Baulin E, Metelev V, Bogdanov A. Base-intercalated and base-wedged stacking elements in 3D-structure of RNA and RNA-protein complexes. Nucleic Acids Res 2020; 48:8675-8685. [PMID: 32687167 PMCID: PMC7470943 DOI: 10.1093/nar/gkaa610] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.6] [Received: 02/06/2020] [Revised: 07/05/2020] [Accepted: 07/15/2020] [Indexed: 12/25/2022]
Abstract
Along with nucleobase pairing, base-base stacking interactions are one of the two main types of strong non-covalent interactions that define the unique secondary and tertiary structure of RNA. In this paper we studied two subfamilies of nucleobase-inserted stacking structures: (i) with any base intercalated between neighboring nucleotide residues (base-intercalated element, BIE, i + 1); (ii) with any base wedged into a hydrophobic cavity formed by heterocyclic bases of two nucleotides which are one nucleotide apart in sequence (base-wedged element, BWE, i + 2). We have exploited the growing database of natively folded RNA structures in Protein Data Bank to analyze the distribution and structural role of these motifs in RNA. We found that these structural elements initially found in yeast tRNAPhe are quite widespread among the tertiary structures of various RNAs. These motifs perform diverse roles in RNA 3D structure formation and its maintenance. They contribute to the folding of RNA bulges and loops and participate in long-range interactions of single-stranded stretches within RNA macromolecules. Furthermore, both base-intercalated and base-wedged motifs participate directly or indirectly in the formation of RNA functional centers, which interact with various ligands, antibiotics and proteins.
Affiliation(s)
- Eugene Baulin
  - Laboratory of Applied Mathematics, Institute of Mathematical Problems of Biology RAS - the Branch of Keldysh Institute of Applied Mathematics of Russian Academy of Sciences, Pushchino, Moscow Region 142290, Russia
- Valeriy Metelev
  - Department of Chemistry, Lomonosov Moscow State University, Moscow 119991, Russia
- Alexey Bogdanov
  - To whom correspondence should be addressed. Tel: +7 495 9393143; Fax: +7 495 9393181
8
Lu XJ. DSSR-enabled innovative schematics of 3D nucleic acid structures with PyMOL. Nucleic Acids Res 2020; 48:e74. [PMID: 32442277 PMCID: PMC7367123 DOI: 10.1093/nar/gkaa426] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.0] [Received: 03/22/2020] [Revised: 04/26/2020] [Accepted: 05/07/2020] [Indexed: 12/11/2022]
Abstract
Sophisticated analysis and simplified visualization are crucial for understanding complicated structures of biomacromolecules. DSSR (Dissecting the Spatial Structure of RNA) is an integrated computational tool that has streamlined the analysis and annotation of 3D nucleic acid structures. The program creates schematic block representations in diverse styles that can be seamlessly integrated into PyMOL and complement its other popular visualization options. In addition to portraying individual base blocks, DSSR can draw Watson-Crick pairs as long blocks and highlight the minor-groove edges. Notably, DSSR can dramatically simplify the depiction of G-quadruplexes by automatically detecting G-tetrads and treating them as large square blocks. The DSSR-enabled innovative schematics with PyMOL are aesthetically pleasing and highly informative: the base identity, pairing geometry, stacking interactions, double-helical stems, and G-quadruplexes are immediately obvious. These features can be accessed via four interfaces: the command-line interface, the DSSR plugin for PyMOL, the web application, and the web application programming interface. The supplemental PDF serves as a practical guide, with complete and reproducible examples. Thus, even beginners or occasional users can get started quickly, especially via the web application at http://skmatic.x3dna.org.
Affiliation(s)
- Xiang-Jun Lu
  - Department of Biological Sciences, Columbia University, New York, NY 10027, USA
9
Torkamanian-Afshar M, Lanjanian H, Nematzadeh S, Tabarzad M, Najafi A, Kiani F, Masoudi-Nejad A. RPINBASE: An online toolbox to extract features for predicting RNA-protein interactions. Genomics 2020; 112:2623-2632. [DOI: 10.1016/j.ygeno.2020.02.013] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.6] [Received: 11/05/2019] [Revised: 01/04/2020] [Accepted: 02/13/2020] [Indexed: 12/12/2022]