1
Wang J, Fan Y, Hong L, Hu Z, Li Y. Deep learning for RNA structure prediction. Curr Opin Struct Biol 2025; 91:102991. [PMID: 39933218 DOI: 10.1016/j.sbi.2025.102991] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Received: 09/02/2024] [Revised: 11/27/2024] [Accepted: 01/04/2025] [Indexed: 02/13/2025]
Abstract
Predicting RNA structures from sequences with computational approaches is of vital importance in RNA biology considering the high costs of experimental determination. AI methods have revolutionized this field in recent years, enabling RNA structure prediction with increasingly higher accuracy and efficiency. With an increase in the number of models proposed for this task, this review presents a timely summary of the applications of AI, particularly deep learning, in RNA structure prediction, highlighting their methodological advances as well as the challenges and opportunities for further work in this field.
Affiliation(s)
- Jiuming Wang
  - Department of Computer Science and Engineering, The Chinese University of Hong Kong, Hong Kong SAR, China
- Yimin Fan
  - Department of Computer Science and Engineering, The Chinese University of Hong Kong, Hong Kong SAR, China
- Liang Hong
  - Department of Computer Science and Engineering, The Chinese University of Hong Kong, Hong Kong SAR, China
- Zhihang Hu
  - Department of Computer Science and Engineering, The Chinese University of Hong Kong, Hong Kong SAR, China
- Yu Li
  - Department of Computer Science and Engineering, The Chinese University of Hong Kong, Hong Kong SAR, China
2
Anand R, Joshi CK, Morehead A, Jamasb AR, Harris C, Mathis SV, Didi K, Ying R, Hooi B, Liò P. RNA-FrameFlow: Flow Matching for de novo 3D RNA Backbone Design. arXiv 2025:arXiv:2406.13839v3. [PMID: 38947930 PMCID: PMC11213149] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 07/02/2024]
Abstract
We introduce RNA-FrameFlow, the first generative model for 3D RNA backbone design. We build upon SE(3) flow matching for protein backbone generation and establish protocols for data preparation and evaluation to address unique challenges posed by RNA modeling. We formulate RNA structures as a set of rigid-body frames and associated loss functions which account for larger, more conformationally flexible RNA backbones (13 atoms per nucleotide) vs. proteins (4 atoms per residue). Toward tackling the lack of diversity in 3D RNA datasets, we explore training with structural clustering and cropping augmentations. Additionally, we define a suite of evaluation metrics to measure whether the generated RNA structures are globally self-consistent (via inverse folding followed by forward folding) and locally recover RNA-specific structural descriptors. The most performant version of RNA-FrameFlow generates locally realistic RNA backbones of 40-150 nucleotides, over 40% of which pass our validity criteria as measured by a self-consistency TM-score ≥ 0.45, at which two RNAs have the same global fold. Open-source code: github.com/rish-16/rna-backbone-design.
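The validity criterion quoted in this abstract (a self-consistency TM-score of at least 0.45) can be illustrated with a minimal Python sketch. It assumes the self-consistency TM-scores have already been computed by some external inverse-folding and forward-folding pipeline; the function name `validity_fraction` and the example scores are hypothetical and are not taken from the RNA-FrameFlow code base.

```python
from typing import Sequence


def validity_fraction(sc_tm_scores: Sequence[float], threshold: float = 0.45) -> float:
    """Return the fraction of generated backbones whose self-consistency
    TM-score meets the threshold (0.45 is the cutoff quoted in the abstract)."""
    if not sc_tm_scores:
        raise ValueError("at least one score is required")
    # A backbone counts as valid when its score is >= threshold.
    passed = sum(1 for score in sc_tm_scores if score >= threshold)
    return passed / len(sc_tm_scores)


# Hypothetical scores for four generated backbones (illustration only).
example_scores = [0.50, 0.45, 0.30, 0.10]
print(validity_fraction(example_scores))
```

Under this reading, a reported validity of "over 40%" means that more than 40% of sampled backbones reach the threshold.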
3
Upadhyay U, Pucci F, Herold J, Schug A. NucleoSeeker-precision filtering of RNA databases to curate high-quality datasets. NAR Genom Bioinform 2025; 7:lqaf021. [PMID: 40104673 PMCID: PMC11915511 DOI: 10.1093/nargab/lqaf021] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/15/2024] [Revised: 01/28/2025] [Accepted: 02/24/2025] [Indexed: 03/20/2025] Open
Abstract
The structural prediction of biomolecules via computational methods complements the often involved wet-lab experiments. Unlike protein structure prediction, RNA structure prediction remains a significant challenge in bioinformatics, primarily due to the scarcity of annotated RNA structure data and its varying quality. Many methods have used this limited data to train deep learning models, but redundancy, data leakage and bad data quality hamper their performance. In this work, we present NucleoSeeker, a tool designed to curate high-quality, tailored datasets from the Protein Data Bank (PDB) database. It is a unified framework that combines multiple tools and streamlines an otherwise complicated process of data curation. It offers multiple filters at structure, sequence, and annotation levels, giving researchers full control over data curation. Further, we present several use cases. In particular, we demonstrate how NucleoSeeker allows the creation of a nonredundant RNA structure dataset to assess AlphaFold3's performance for RNA structure prediction. This demonstrates NucleoSeeker's effectiveness in curating valuable nonredundant tailored datasets to both train novel and judge existing methods. NucleoSeeker is very easy to use, highly flexible, and can significantly increase the quality of RNA structure datasets.
Affiliation(s)
- Utkarsh Upadhyay
  - John von Neumann Institute for Computing, Jülich Supercomputing Centre, 52428 Jülich, Germany
- Fabrizio Pucci
  - Computational Biology and Bioinformatics, Université Libre de Bruxelles, 1050 Brussels, Belgium
  - Interuniversity Institute of Bioinformatics, 1050 Brussels, Belgium
- Julian Herold
  - Scientific Computing Center, Karlsruhe Institute for Technology, 76344 Karlsruhe, Germany
- Alexander Schug
  - John von Neumann Institute for Computing, Jülich Supercomputing Centre, 52428 Jülich, Germany
  - Department of Biology, University of Duisburg-Essen, D-45141 Essen, Germany
4
Manzourolajdad A, Mohebbi M. Secondary-Structure-Informed RNA Inverse Design via Relational Graph Neural Networks. Noncoding RNA 2025; 11:18. [PMID: 40126342 PMCID: PMC11932209 DOI: 10.3390/ncrna11020018] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/24/2024] [Revised: 01/31/2025] [Accepted: 02/18/2025] [Indexed: 03/25/2025] Open
Abstract
RNA inverse design is an essential part of many RNA therapeutic strategies. To date, there have been great advances in computationally driven RNA design. The current machine learning approaches can predict the sequence of an RNA given its 3D structure with acceptable accuracy and at tremendous speed. The design and engineering of RNA regulators such as riboswitches, however, is often more difficult, partly due to their inherent conformational switching abilities. Although recent state-of-the-art models do incorporate information about the multiple structures that a sequence can fold into, there is great room for improvement in modeling structural switching. In this work, a relational geometric graph neural network is proposed that explicitly incorporates alternative structures to predict an RNA sequence. Converting the RNA structure into a geometric graph, the proposed model uses edge types to distinguish between the primary structure, secondary structure, and spatial positioning of the nucleotides in representing structures. The results show higher native sequence recovery rates over those of gRNAde across different test sets (e.g., 72% vs. 66%) and a benchmark from the literature (60% vs. 57%). Secondary-structure edge types had a more significant impact on the sequence recovery than the spatial edge types as defined in this work. Overall, these results suggest the need for more complex and case-specific characterization of RNA for successful inverse design.
Affiliation(s)
- Amirhossein Manzourolajdad
  - Department of Computer Science, State University of New York Polytechnic Institute, 100 Seymour Rd., Utica, NY 13502, USA
- Mohammad Mohebbi
  - Department of Computer Science and Information Science, University of North Georgia, Dahlonega, GA 30597, USA
5
Joshi CK, Jamasb AR, Viñas R, Harris C, Mathis SV, Morehead A, Anand R, Liò P. gRNAde: Geometric Deep Learning for 3D RNA inverse design. bioRxiv 2025:2024.03.31.587283. [PMID: 38826198 PMCID: PMC11142113 DOI: 10.1101/2024.03.31.587283] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/04/2024]
Abstract
Computational RNA design tasks are often posed as inverse problems, where sequences are designed based on adopting a single desired secondary structure without considering 3D conformational diversity. We introduce gRNAde, a geometric RNA design pipeline operating on 3D RNA backbones to design sequences that explicitly account for structure and dynamics. gRNAde uses a multi-state Graph Neural Network and autoregressive decoding to generate candidate RNA sequences conditioned on one or more 3D backbone structures where the identities of the bases are unknown. On a single-state fixed backbone re-design benchmark of 14 RNA structures from the PDB identified by Das et al. (2010), gRNAde obtains higher native sequence recovery rates (56% on average) compared to Rosetta (45% on average), taking under a second to produce designs compared to the reported hours for Rosetta. We further demonstrate the utility of gRNAde on a new benchmark of multi-state design for structurally flexible RNAs, as well as zero-shot ranking of mutational fitness landscapes in a retrospective analysis of a recent ribozyme. Experimental wet lab validation on 10 different structured RNA backbones finds that gRNAde has a success rate of 50% at designing pseudoknotted RNA structures, a significant advance over 35% for Rosetta. Open source code and tutorials are available at: github.com/chaitjo/geometric-rna-design.
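Native sequence recovery, the metric this abstract reports, is commonly understood as the fraction of positions at which a designed sequence matches the native one. The Python sketch below illustrates that definition only; the function name `sequence_recovery` and the toy sequences are hypothetical, and the exact evaluation protocol used by gRNAde (sampling, averaging over designs) may differ.

```python
def sequence_recovery(native: str, designed: str) -> float:
    """Fraction of positions where the designed base equals the native base.

    Assumes both sequences are aligned position by position and have
    equal, non-zero length (an assumption of this sketch).
    """
    if len(native) != len(designed):
        raise ValueError("sequences must have the same length")
    if not native:
        raise ValueError("sequences must be non-empty")
    # Compare case-insensitively, one position at a time.
    matches = sum(a == b for a, b in zip(native.upper(), designed.upper()))
    return matches / len(native)


# Toy example (not real benchmark data): four of five positions match.
print(sequence_recovery("GGCAU", "GGCUU"))
```

A benchmark-level figure such as "56% on average" would then correspond to averaging this per-structure value over all benchmark structures.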
6
Bernard C, Postic G, Ghannay S, Tahi F. Has AlphaFold3 achieved success for RNA? Acta Crystallogr D Struct Biol 2025; 81:49-62. [PMID: 39868559 PMCID: PMC11804252 DOI: 10.1107/s2059798325000592] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/18/2024] [Accepted: 01/21/2025] [Indexed: 01/28/2025] Open
Abstract
Predicting the 3D structure of RNA is a significant challenge despite ongoing advancements in the field. Although AlphaFold has successfully addressed this problem for proteins, RNA structure prediction raises difficulties due to the fundamental differences between proteins and RNA, which hinder its direct adaptation. The latest release of AlphaFold, AlphaFold3, has broadened its scope to include multiple different molecules such as DNA, ligands and RNA. While the AlphaFold3 article discussed the results for the last CASP-RNA data set, the scope of its performance and the limitations for RNA are unclear. In this article, we provide a comprehensive analysis of the performance of AlphaFold3 in the prediction of 3D structures of RNA. Through an extensive benchmark over five different test sets, we discuss the performance and limitations of AlphaFold3. We also compare its performance with ten existing state-of-the-art ab initio, template-based and deep-learning approaches. Our results are freely available on the EvryRNA platform at https://evryrna.ibisc.univ-evry.fr/evryrna/alphafold3/.
Affiliation(s)
- Clément Bernard
  - Université Paris-Saclay, Université Evry, IBISC, 91020 Evry-Courcouronnes, France
  - LISN – CNRS/Université Paris-Saclay, 91400 Orsay, France
- Guillaume Postic
  - Université Paris-Saclay, Université Evry, IBISC, 91020 Evry-Courcouronnes, France
- Sahar Ghannay
  - LISN – CNRS/Université Paris-Saclay, 91400 Orsay, France
- Fariza Tahi
  - Université Paris-Saclay, Université Evry, IBISC, 91020 Evry-Courcouronnes, France
7
Antczak M, Szachniuk M. Toward Increasing the Credibility of RNA Design. Methods Mol Biol 2025; 2847:137-151. [PMID: 39312141 DOI: 10.1007/978-1-0716-4079-1_9] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 09/25/2024]
Abstract
In the problem of RNA design, also known as inverse folding, RNA sequences are predicted that achieve the desired secondary structure at the lowest possible free energy and under certain constraints. The designed sequences have applications in synthetic biology and RNA-based nanotechnologies. There are also known cases of the successful use of inverse folding to discover previously unknown noncoding RNAs. Several computational methods have been dedicated to the problem of RNA design. They differ by algorithm and additional parameters, e.g., those determining the goal function in the sequence optimization process. Users can obtain many promising RNA sequences quite easily. The more difficult issue is to critically evaluate them and select the most favorable and reliable sequence that forms the expected RNA structure. The latter problem is addressed in this paper. We propose an RNA design protocol extended to include sequence evaluation, for which a 3D structure is used. Experiments show that the accuracy of RNA design can be improved by adding a 3D structure prediction and analysis step.
Affiliation(s)
- Maciej Antczak
  - Institute of Computing Science, Poznan University of Technology, Poznan, Poland
- Marta Szachniuk
  - Institute of Bioorganic Chemistry, Polish Academy of Sciences, Poznan, Poland
8
Joshi CK, Liò P. gRNAde: A Geometric Deep Learning Pipeline for 3D RNA Inverse Design. Methods Mol Biol 2025; 2847:121-135. [PMID: 39312140 DOI: 10.1007/978-1-0716-4079-1_8] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 09/25/2024]
Abstract
Fundamental to the diverse biological functions of RNA are its 3D structure and conformational flexibility, which enable single sequences to adopt a variety of distinct 3D states. Currently, computational RNA design tasks are often posed as inverse problems, where sequences are designed based on adopting a single desired secondary structure without considering 3D geometry and conformational diversity. In this tutorial, we present gRNAde, a geometric RNA design pipeline operating on sets of 3D RNA backbone structures to design sequences that explicitly account for RNA 3D structure and dynamics. gRNAde is a graph neural network that uses an SE(3)-equivariant encoder-decoder framework for generating RNA sequences conditioned on backbone structures where the identities of the bases are unknown. We demonstrate the utility of gRNAde for fixed-backbone re-design of existing RNA structures of interest from the PDB, including riboswitches, aptamers, and ribozymes. gRNAde is more accurate in terms of native sequence recovery while being significantly faster compared to existing physics-based tools for 3D RNA inverse design, such as Rosetta.
Affiliation(s)
- Chaitanya K Joshi
  - Department of Computer Science and Technology, University of Cambridge, Cambridge, UK
- Pietro Liò
  - Department of Computer Science and Technology, University of Cambridge, Cambridge, UK
9
Bahai A, Kwoh CK, Mu Y, Li Y. Systematic benchmarking of deep-learning methods for tertiary RNA structure prediction. PLoS Comput Biol 2024; 20:e1012715. [PMID: 39775239 PMCID: PMC11723642 DOI: 10.1371/journal.pcbi.1012715] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/11/2024] [Revised: 01/10/2025] [Accepted: 12/10/2024] [Indexed: 01/11/2025] Open
Abstract
The 3D structure of RNA critically influences its functionality, and understanding this structure is vital for deciphering RNA biology. Experimental methods for determining RNA structures are labour-intensive, expensive, and time-consuming. Computational approaches have emerged as valuable tools, leveraging physics-based principles and machine learning to predict RNA structures rapidly. Despite advancements, the accuracy of computational methods remains modest, especially when compared to protein structure prediction. Deep learning methods, while successful in protein structure prediction, have shown some promise for RNA structure prediction as well, but face unique challenges. This study systematically benchmarks state-of-the-art deep learning methods for RNA structure prediction across diverse datasets. Our aim is to identify factors influencing performance variation, such as RNA family diversity, sequence length, RNA type, multiple sequence alignment (MSA) quality, and deep learning model architecture. We show that generally ML-based methods perform much better than non-ML methods on most RNA targets, although the performance difference isn't substantial when working with unseen novel or synthetic RNAs. The quality of the MSA and secondary structure prediction both play an important role and most methods aren't able to predict non-Watson-Crick pairs in the RNAs. Overall, among the automated 3D RNA structure prediction methods, DeepFoldRNA has the best prediction results, followed by DRFold as the second best method. Finally, we also suggest possible mitigations to improve the quality of the prediction for future method development.
Affiliation(s)
- Akash Bahai
  - School of Biological Sciences (SBS), Nanyang Technological University, Singapore, Singapore
- Chee Keong Kwoh
  - School of Computer Science and Engineering, Nanyang Technological University, Singapore, Singapore
- Yuguang Mu
  - School of Biological Sciences (SBS), Nanyang Technological University, Singapore, Singapore
- Yinghui Li
  - School of Biological Sciences (SBS), Nanyang Technological University, Singapore, Singapore
10
Liu CX, Yang L, Chen LL. Dynamic conformation: Marching toward circular RNA function and application. Mol Cell 2024; 84:3596-3609. [PMID: 39366349 DOI: 10.1016/j.molcel.2024.08.020] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/10/2024] [Revised: 07/01/2024] [Accepted: 08/15/2024] [Indexed: 10/06/2024]
Abstract
Circular RNA is a group of covalently closed, single-stranded transcripts with unique biogenesis, stability, and conformation that play distinct roles in modulating cellular functions and also possess a great potential for developing circular RNA-based therapies. Importantly, due to its circular conformation, circular RNA generates distinct intramolecular base pairing that is different from the linear transcript. In this perspective, we review how circular RNA conformation can affect its turnover and modes of action, as well as what factors can modulate circular RNA conformation. We also discuss how understanding circular RNA conformation can facilitate learning about their functions as well as the remaining technological issues to further address their conformation. These efforts will ultimately inform the design of circular RNA-based platforms for biomedical applications.
Affiliation(s)
- Chu-Xiao Liu
  - Key Laboratory of RNA Innovation, Science and Engineering, CAS Center for Excellence in Molecular Cell Science, Shanghai Institute of Biochemistry and Cell Biology, University of Chinese Academy of Sciences, Chinese Academy of Sciences, Shanghai, China
- Li Yang
  - Center for Molecular Medicine, Children's Hospital of Fudan University and Shanghai Key Laboratory of Medical Epigenetics, International Laboratory of Medical Epigenetics and Metabolism, Ministry of Science and Technology, Institutes of Biomedical Sciences, Fudan University, Shanghai, China
- Ling-Ling Chen
  - Key Laboratory of RNA Innovation, Science and Engineering, CAS Center for Excellence in Molecular Cell Science, Shanghai Institute of Biochemistry and Cell Biology, University of Chinese Academy of Sciences, Chinese Academy of Sciences, Shanghai, China; New Cornerstone Science Laboratory, Shenzhen, China; School of Life Science and Technology, ShanghaiTech University, Shanghai, China
11
Bernard C, Postic G, Ghannay S, Tahi F. State-of-the-RNArt: benchmarking current methods for RNA 3D structure prediction. NAR Genom Bioinform 2024; 6:lqae048. [PMID: 38745991 PMCID: PMC11091930 DOI: 10.1093/nargab/lqae048] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/08/2024] [Revised: 04/05/2024] [Accepted: 05/08/2024] [Indexed: 05/16/2024] Open
Abstract
RNAs are essential molecules involved in numerous biological functions. Understanding RNA functions requires the knowledge of their 3D structures. Computational methods have been developed for over two decades to predict the 3D conformations from RNA sequences. These computational methods have been widely used and are usually categorised as either ab initio or template-based. The performances remain to be improved. Recently, the rise of deep learning has changed the sight of novel approaches. Deep learning methods are promising, but their adaptation to RNA 3D structure prediction remains difficult. In this paper, we give a brief review of the ab initio, template-based and novel deep learning approaches. We highlight the different available tools and provide a benchmark on nine methods using the RNA-Puzzles dataset. We provide an online dashboard that shows the predictions made by benchmarked methods, freely available on the EvryRNA platform: https://evryrna.ibisc.univ-evry.fr/evryrna/state_of_the_rnart/.
Affiliation(s)
- Clément Bernard
  - Université Paris-Saclay, Univ. Evry, IBISC, 91020 Evry-Courcouronnes, France
  - LISN - CNRS/Université Paris-Saclay, 91400 Orsay, France
- Guillaume Postic
  - Université Paris-Saclay, Univ. Evry, IBISC, 91020 Evry-Courcouronnes, France
- Sahar Ghannay
  - LISN - CNRS/Université Paris-Saclay, 91400 Orsay, France
- Fariza Tahi
  - Université Paris-Saclay, Univ. Evry, IBISC, 91020 Evry-Courcouronnes, France
12
Bugnon LA, Di Persia L, Gerard M, Raad J, Prochetto S, Fenoy E, Chorostecki U, Ariel F, Stegmayer G, Milone DH. sincFold: end-to-end learning of short- and long-range interactions in RNA secondary structure. Brief Bioinform 2024; 25:bbae271. [PMID: 38855913 PMCID: PMC11163250 DOI: 10.1093/bib/bbae271] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/18/2024] [Revised: 05/03/2024] [Accepted: 05/24/2024] [Indexed: 06/11/2024] Open
Abstract
MOTIVATION: Coding and noncoding RNA molecules participate in many important biological processes. Noncoding RNAs fold into well-defined secondary structures to exert their functions. However, the computational prediction of the secondary structure from a raw RNA sequence is a long-standing unsolved problem, which after decades of almost unchanged performance has now re-emerged due to deep learning. Traditional RNA secondary structure prediction algorithms have been mostly based on thermodynamic models and dynamic programming for free energy minimization. More recently, deep learning methods have shown competitive performance compared with the classical ones, but there is still a wide margin for improvement.
RESULTS: In this work we present sincFold, an end-to-end deep learning approach that predicts the nucleotide contact matrix using only the RNA sequence as input. The model is based on 1D and 2D residual neural networks that can learn short- and long-range interaction patterns. We show that structures can be accurately predicted with minimal physical assumptions. Extensive experiments were conducted on several benchmark datasets, considering sequence homology and cross-family validation. sincFold was compared with classical methods and recent deep learning models, showing that it can outperform the state-of-the-art methods.
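The nucleotide contact matrix mentioned in this abstract is an L×L binary matrix in which entry (i, j) is 1 when nucleotides i and j are paired. As a minimal sketch of that representation only (not of the sincFold model or its code), the hypothetical Python function below converts a dot-bracket secondary structure into such a matrix. It handles plain parentheses only, so pseudoknots, which a contact matrix itself can encode, are outside this sketch.

```python
from typing import List


def dot_bracket_to_contacts(structure: str) -> List[List[int]]:
    """Convert a dot-bracket string such as "((..))" into a symmetric
    0/1 contact matrix. Only '(', ')' and '.' are accepted (assumption)."""
    n = len(structure)
    contacts = [[0] * n for _ in range(n)]
    open_positions: List[int] = []  # stack of unmatched '(' indices
    for i, symbol in enumerate(structure):
        if symbol == "(":
            open_positions.append(i)
        elif symbol == ")":
            if not open_positions:
                raise ValueError("unbalanced structure: unmatched ')'")
            j = open_positions.pop()
            # Base pairs are undirected, so fill both triangles.
            contacts[i][j] = 1
            contacts[j][i] = 1
        elif symbol != ".":
            raise ValueError(f"unsupported symbol: {symbol!r}")
    if open_positions:
        raise ValueError("unbalanced structure: unmatched '('")
    return contacts


# A small hairpin: positions 0-5 and 1-4 are paired.
for row in dot_bracket_to_contacts("((..))"):
    print(row)
```

A model that predicts this matrix directly from sequence can, in principle, be trained and scored against matrices built this way from annotated structures.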
Affiliation(s)
- Leandro A Bugnon
  - Research Institute for Signals, Systems and Computational Intelligence, sinc(i), FICH-UNL, CONICET, Ciudad Universitaria UNL, 3000, Santa Fe, Argentina
- Leandro Di Persia
  - Research Institute for Signals, Systems and Computational Intelligence, sinc(i), FICH-UNL, CONICET, Ciudad Universitaria UNL, 3000, Santa Fe, Argentina
- Matias Gerard
  - Research Institute for Signals, Systems and Computational Intelligence, sinc(i), FICH-UNL, CONICET, Ciudad Universitaria UNL, 3000, Santa Fe, Argentina
- Jonathan Raad
  - Research Institute for Signals, Systems and Computational Intelligence, sinc(i), FICH-UNL, CONICET, Ciudad Universitaria UNL, 3000, Santa Fe, Argentina
- Santiago Prochetto
  - Research Institute for Signals, Systems and Computational Intelligence, sinc(i), FICH-UNL, CONICET, Ciudad Universitaria UNL, 3000, Santa Fe, Argentina
  - Instituto de Agrobiotecnología del Litoral, CONICET-UNL, CCT-Santa Fe, Ruta Nacional N° 168 Km 0, s/n, Paraje el Pozo, 3000, Santa Fe, Argentina
- Emilio Fenoy
  - Research Institute for Signals, Systems and Computational Intelligence, sinc(i), FICH-UNL, CONICET, Ciudad Universitaria UNL, 3000, Santa Fe, Argentina
- Uciel Chorostecki
  - Faculty of Medicine and Health Sciences, Universitat Internacional de Catalunya, Barcelona, Spain
- Federico Ariel
  - Instituto de Agrobiotecnología del Litoral, CONICET-UNL, CCT-Santa Fe, Ruta Nacional N° 168 Km 0, s/n, Paraje el Pozo, 3000, Santa Fe, Argentina
- Georgina Stegmayer
  - Research Institute for Signals, Systems and Computational Intelligence, sinc(i), FICH-UNL, CONICET, Ciudad Universitaria UNL, 3000, Santa Fe, Argentina
- Diego H Milone
  - Research Institute for Signals, Systems and Computational Intelligence, sinc(i), FICH-UNL, CONICET, Ciudad Universitaria UNL, 3000, Santa Fe, Argentina
13
Ramakers J, Blum CF, König S, Harmeling S, Kollmann M. De novo prediction of RNA 3D structures with deep generative models. PLoS One 2024; 19:e0297105. [PMID: 38358972 PMCID: PMC10868834 DOI: 10.1371/journal.pone.0297105] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/31/2023] [Accepted: 12/24/2023] [Indexed: 02/17/2024] Open
Abstract
We present a Deep Learning approach to predict 3D folding structures of RNAs from their nucleic acid sequence. Our approach combines an autoregressive Deep Generative Model, Monte Carlo Tree Search, and a score model to find and rank the most likely folding structures for a given RNA sequence. We show that RNA de novo structure prediction by deep learning is possible at atom resolution, despite the low number of experimentally measured structures that can be used for training. We confirm the predictive power of our approach by achieving competitive results in a retrospective evaluation of the RNA-Puzzles prediction challenges, without using structural contact information from multiple sequence alignments or additional data from chemical probing experiments. Blind predictions for recent RNA-Puzzle challenges under the name "Dfold" further support the competitive performance of our approach.
Affiliation(s)
- Julius Ramakers
  - Department of Computer Science, Heinrich-Heine-Universität Düsseldorf, Düsseldorf, Germany
- Sabrina König
  - Department of Computer Science, Heinrich-Heine-Universität Düsseldorf, Düsseldorf, Germany
- Stefan Harmeling
  - Department of Computer Science, Technical University Dortmund, Dortmund, Germany
- Markus Kollmann
  - Department of Computer Science, Heinrich-Heine-Universität Düsseldorf, Düsseldorf, Germany
14
Du Z, Peng Z, Yang J. RNA threading with secondary structure and sequence profile. Bioinformatics 2024; 40:btae080. [PMID: 38341662 PMCID: PMC10893584 DOI: 10.1093/bioinformatics/btae080] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/30/2022] [Revised: 01/05/2024] [Accepted: 02/09/2024] [Indexed: 02/12/2024] Open
Abstract
MOTIVATION: RNA threading aims to identify remote homologies for template-based modeling of RNA 3D structure. Existing RNA alignment methods primarily rely on secondary structure alignment. They are often time- and memory-consuming, limiting large-scale applications. In addition, the accuracy is far from satisfactory.
RESULTS: Using RNA secondary structure and sequence profile, we developed a novel RNA threading algorithm, named RNAthreader. To enhance the alignment process and minimize memory usage, a novel approach has been introduced to simplify RNA secondary structures into compact diagrams. RNAthreader employs a two-step methodology. Initially, integer programming and dynamic programming are combined to create an initial alignment for the simplified diagram. Subsequently, the final alignment is obtained using dynamic programming, taking into account the initial alignment derived from the previous step. The benchmark test on 80 RNAs illustrates that RNAthreader generates more accurate alignments than other methods, especially for RNAs with pseudoknots. Another benchmark, involving 30 RNAs from the RNA-Puzzles experiments, exhibits that the models constructed using RNAthreader templates have a lower average RMSD than those created by alternative methods. Remarkably, RNAthreader takes less than two hours to complete alignments with ∼5000 RNAs, which is 3-40 times faster than other methods. These compelling results suggest that RNAthreader is a promising algorithm for RNA template detection.
AVAILABILITY AND IMPLEMENTATION: https://yanglab.qd.sdu.edu.cn/RNAthreader.
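The RMSD used in this abstract to compare models is the root-mean-square deviation between corresponding atom positions. The Python sketch below shows that calculation only, under the assumption that the two coordinate sets are already optimally superposed and listed in matching atom order; the superposition step itself (for example, a Kabsch alignment) is deliberately omitted, and the function name `rmsd` and the toy coordinates are hypothetical.

```python
import math
from typing import Sequence, Tuple

Point = Tuple[float, float, float]


def rmsd(coords_a: Sequence[Point], coords_b: Sequence[Point]) -> float:
    """Root-mean-square deviation between two equally long lists of 3D
    points, assuming they are already superposed (assumption of this sketch)."""
    if len(coords_a) != len(coords_b):
        raise ValueError("coordinate lists must have the same length")
    if not coords_a:
        raise ValueError("coordinate lists must be non-empty")
    # Sum of squared distances between corresponding atoms.
    squared = sum(
        (ax - bx) ** 2 + (ay - by) ** 2 + (az - bz) ** 2
        for (ax, ay, az), (bx, by, bz) in zip(coords_a, coords_b)
    )
    return math.sqrt(squared / len(coords_a))


# Toy coordinates in ångströms: the second set is shifted by 1 Å along z.
model = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
reference = [(0.0, 0.0, 1.0), (1.0, 0.0, 1.0)]
print(rmsd(model, reference))
```

A lower average RMSD across a benchmark, as reported here for the 30 RNA-Puzzles targets, indicates models that lie closer to the experimental structures.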
Affiliation(s)
- Zongyang Du
  - Chongqing Key Laboratory of Big Data for Bio Intelligence, Chongqing University of Posts and Telecommunications, Chongqing 400065, China
- Zhenling Peng
  - MOE Frontiers Science Center for Nonlinear Expectations, Research Center for Mathematics and Interdisciplinary Sciences, Shandong University, Qingdao 266237, China
- Jianyi Yang
  - MOE Frontiers Science Center for Nonlinear Expectations, Research Center for Mathematics and Interdisciplinary Sciences, Shandong University, Qingdao 266237, China
15
Sarzynska J, Popenda M, Antczak M, Szachniuk M. RNA tertiary structure prediction using RNAComposer in CASP15. Proteins 2023; 91:1790-1799. [PMID: 37615316 DOI: 10.1002/prot.26578] [Citation(s) in RCA: 26] [Impact Index Per Article: 13.0] [Received: 04/30/2023] [Revised: 06/14/2023] [Accepted: 08/08/2023] [Indexed: 08/25/2023]
Abstract
As CASP15 participants, in the new category of 3D RNA structure prediction, we applied expert modeling with the support of our proprietary system RNAComposer. Although RNAComposer is primarily known as an automated web server, its features allow it to be used interactively, for example, for homology-based modeling or assembling models from user-provided structural elements. In the paper, we present various scenarios of applying the system to predict the 3D RNA structures that we employed. Their combination with expert input, comparative analysis of models, and routines to select representative resultant structures form a ready-for-reuse workflow. With selected examples, we demonstrate its application for the in silico modeling of natural and synthetic RNA molecules targeted in CASP15.
Affiliation(s)
- Joanna Sarzynska
- Institute of Bioorganic Chemistry, Polish Academy of Sciences, Poznan, Poland
- Mariusz Popenda
- Institute of Bioorganic Chemistry, Polish Academy of Sciences, Poznan, Poland
- Maciej Antczak
- Institute of Bioorganic Chemistry, Polish Academy of Sciences, Poznan, Poland
- Institute of Computing Science, Poznan University of Technology, Poznan, Poland
- Marta Szachniuk
- Institute of Bioorganic Chemistry, Polish Academy of Sciences, Poznan, Poland
- Institute of Computing Science, Poznan University of Technology, Poznan, Poland
|
16
|
Schneider B, Sweeney BA, Bateman A, Cerny J, Zok T, Szachniuk M. When will RNA get its AlphaFold moment? Nucleic Acids Res 2023; 51:9522-9532. [PMID: 37702120 PMCID: PMC10570031 DOI: 10.1093/nar/gkad726] [Citation(s) in RCA: 48] [Impact Index Per Article: 24.0] [Received: 05/10/2023] [Revised: 08/13/2023] [Accepted: 08/22/2023] [Indexed: 09/14/2023] Open
Abstract
The protein structure prediction problem has been solved for many types of proteins by AlphaFold. Recently, there has been considerable excitement about building on the success of AlphaFold to predict the 3D structures of RNAs. RNA prediction methods use a variety of techniques, from physics-based to machine learning approaches. We believe that there are challenges preventing the successful development of deep learning-based methods like AlphaFold for RNA in the short term. Broadly speaking, the main challenge is the limited number of structures and alignments, which makes data-hungry deep learning methods unlikely to succeed. Additionally, there are several issues with the existing structure and sequence data, which are often of insufficient quality, highly biased and missing key information. Here, we discuss these challenges in detail and suggest some steps to remedy the situation. We believe that it is possible to create an accurate RNA structure prediction method, but it will require solving several data quality and volume issues, using data beyond simple sequence alignments, or developing new, less data-hungry machine learning methods.
Affiliation(s)
- Bohdan Schneider
- Institute of Biotechnology of the Czech Academy of Sciences, Prumyslova 595, CZ-252 50 Vestec, Czech Republic
- Blake Alexander Sweeney
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, CB10 1SD, UK
- Alex Bateman
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, CB10 1SD, UK
- Jiri Cerny
- Institute of Biotechnology of the Czech Academy of Sciences, Prumyslova 595, CZ-252 50 Vestec, Czech Republic
- Tomasz Zok
- Institute of Computing Science and European Centre for Bioinformatics and Genomics, Poznan University of Technology, Piotrowo 2, 60-965 Poznan, Poland
- Marta Szachniuk
- Institute of Computing Science and European Centre for Bioinformatics and Genomics, Poznan University of Technology, Piotrowo 2, 60-965 Poznan, Poland
- Institute of Bioorganic Chemistry, Polish Academy of Sciences, Noskowskiego 12/14, 61-704 Poznan, Poland
|
17
|
Ibéné M, Legendre A, Postic G, Angel E, Tahi F. C-RCPred: a multi-objective algorithm for interactive secondary structure prediction of RNA complexes integrating user knowledge and SHAPE data. Brief Bioinform 2023:bbad225. [PMID: 37337745 DOI: 10.1093/bib/bbad225] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/11/2022] [Revised: 04/12/2023] [Accepted: 05/26/2023] [Indexed: 06/21/2023] Open
Abstract
RNAs can interact with other molecules in their environment, such as ions, proteins or other RNAs, to form complexes with important biological roles. Predicting the structure of these complexes is therefore an important and difficult task. We are interested in RNA complexes composed of several (more than two) interacting RNAs. We show how available knowledge on the considered RNAs can help predict their secondary structure. We propose an interactive tool for the prediction of RNA complexes, called C-RCPred, that considers user knowledge and probing data (which can be generated experimentally or artificially). C-RCPred is based on a multi-objective optimization algorithm. Through an extensive benchmarking procedure, which includes state-of-the-art methods, we show the efficiency of the multi-objective approach and the positive impact of considering user knowledge and probing data on the prediction results. C-RCPred is freely available as an open-source program and web server on the EvryRNA website (https://evryrna.ibisc.univ-evry.fr).
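To make the multi-objective idea concrete, the following minimal sketch filters candidate structures down to a Pareto front, that is, the candidates that no other candidate beats on every objective at once. It is a generic illustration, not C-RCPred's algorithm; the candidate names and the two objectives (a stability score and an agreement-with-probing-data score, both to be maximized) are hypothetical.

```python
def dominates(a, b):
    """True if score tuple a is at least as good as b everywhere and better somewhere.

    All objectives are assumed to be maximized.
    """
    return (all(x >= y for x, y in zip(a, b))
            and any(x > y for x, y in zip(a, b)))


def pareto_front(candidates):
    """Keep the candidates whose scores are not dominated by any other candidate."""
    return [
        c for c in candidates
        if not any(dominates(o["scores"], c["scores"])
                   for o in candidates if o is not c)
    ]


# Hypothetical candidates: (stability score, agreement with probing data).
candidates = [
    {"name": "structure_A", "scores": (3.0, 1.0)},
    {"name": "structure_B", "scores": (1.0, 3.0)},
    {"name": "structure_C", "scores": (1.0, 1.0)},  # dominated by A and by B
]
front = pareto_front(candidates)
print([c["name"] for c in front])
```

Returning a set of non-dominated candidates rather than a single optimum fits an interactive setting, where a user can pick among trade-offs.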
Affiliation(s)
- Mandy Ibéné
- Université Paris-Saclay, Univ Evry, IBISC, 91020, Evry-Courcouronnes, France
- Audrey Legendre
- Université Paris-Saclay, Univ Evry, IBISC, 91020, Evry-Courcouronnes, France
- Guillaume Postic
- Université Paris-Saclay, Univ Evry, IBISC, 91020, Evry-Courcouronnes, France
- Eric Angel
- Université Paris-Saclay, Univ Evry, IBISC, 91020, Evry-Courcouronnes, France
- Fariza Tahi
- Université Paris-Saclay, Univ Evry, IBISC, 91020, Evry-Courcouronnes, France
|
18
|
Justyna M, Antczak M, Szachniuk M. Machine learning for RNA 2D structure prediction benchmarked on experimental data. Brief Bioinform 2023; 24:7140288. [PMID: 37096592 DOI: 10.1093/bib/bbad153] [Citation(s) in RCA: 19] [Impact Index Per Article: 9.5] [Received: 01/13/2023] [Revised: 03/15/2023] [Accepted: 03/29/2023] [Indexed: 04/26/2023] Open
Abstract
Since the 1980s, dozens of computational methods have addressed the problem of predicting RNA secondary structure. Among them are those that follow standard optimization approaches and, more recently, machine learning (ML) algorithms. The former have been repeatedly benchmarked on various datasets. The latter, on the other hand, have not yet undergone an extensive analysis that could suggest to the user which algorithm best fits the problem to be solved. In this review, we compare 15 methods that predict the secondary structure of RNA, of which 6 are based on deep learning (DL), 3 on shallow learning (SL) and 6 are control methods based on non-ML approaches. We discuss the ML strategies implemented and perform three experiments in which we evaluate the prediction of (I) representatives of the RNA equivalence classes, (II) selected Rfam sequences and (III) RNAs from new Rfam families. We show that DL-based algorithms (such as SPOT-RNA and UFold) can outperform SL and traditional methods if the data distribution is similar in the training and testing sets. However, when predicting 2D structures for new RNA families, the advantage of DL is no longer clear, and its performance is inferior or equal to that of SL and non-ML methods.
Affiliation(s)
- Marek Justyna
- Institute of Computing Science, Poznan University of Technology, Piotrowo 2, 60-965 Poznan, Poland
- Maciej Antczak
- Institute of Computing Science, Poznan University of Technology, Piotrowo 2, 60-965 Poznan, Poland
- Institute of Bioorganic Chemistry, Polish Academy of Sciences, Noskowskiego 12/14, 61-704 Poznan, Poland
- Marta Szachniuk
- Institute of Computing Science, Poznan University of Technology, Piotrowo 2, 60-965 Poznan, Poland
- Institute of Bioorganic Chemistry, Polish Academy of Sciences, Noskowskiego 12/14, 61-704 Poznan, Poland
|
19
|
Tan YL, Wang X, Yu S, Zhang B, Tan ZJ. cgRNASP: coarse-grained statistical potentials with residue separation for RNA structure evaluation. NAR Genom Bioinform 2023; 5:lqad016. [PMID: 36879898 PMCID: PMC9985339 DOI: 10.1093/nargab/lqad016] [Citation(s) in RCA: 8] [Impact Index Per Article: 4.0] [Received: 09/21/2022] [Revised: 01/21/2023] [Accepted: 02/03/2023] [Indexed: 03/07/2023] Open
Abstract
Knowledge-based statistical potentials are very important for RNA 3-dimensional (3D) structure prediction and evaluation. In recent years, various coarse-grained (CG) and all-atom models have been developed for predicting RNA 3D structures, but reliable CG statistical potentials are still lacking, both for CG structure evaluation and for efficient all-atom structure evaluation. In this work, we have developed a series of residue-separation-based CG statistical potentials at different CG levels for RNA 3D structure evaluation, namely cgRNASP, which is composed of long-ranged and short-ranged interactions distinguished by residue separation. Compared with the recently developed all-atom rsRNASP, cgRNASP treats the short-ranged interaction more subtly and completely. Our examinations show that the performance of cgRNASP varies with CG level and that, compared with rsRNASP, cgRNASP performs similarly well on a wide range of test datasets and slightly better on the realistic RNA-Puzzles dataset. Furthermore, cgRNASP is strikingly more efficient than all-atom statistical potentials and scoring functions, and on the RNA-Puzzles dataset it can be clearly superior to other all-atom statistical potentials and to scoring functions trained from neural networks. cgRNASP is available at https://github.com/Tan-group/cgRNASP.
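As a rough illustration of how a distance-dependent statistical potential with residue separation might be assembled, the sketch below uses the textbook inverse-Boltzmann form, E(r) = -kT ln(P_obs(r) / P_ref(r)), and picks a short-range or long-range table by sequence separation. This is a generic construction, not the cgRNASP implementation: the bin counts, the reference state, the `cutoff` of 4 residues, the bin width and the function names are all assumptions.

```python
import math


def statistical_potential(obs_counts, ref_counts, kT=0.593, pseudo=1e-6):
    """Inverse-Boltzmann energies per distance bin.

    obs_counts: distance histogram from known structures.
    ref_counts: histogram for the chosen reference state.
    kT is roughly 0.593 kcal/mol near room temperature; pseudo avoids log(0).
    """
    obs_total = sum(obs_counts)
    ref_total = sum(ref_counts)
    energies = []
    for o, r in zip(obs_counts, ref_counts):
        p_obs = o / obs_total + pseudo
        p_ref = r / ref_total + pseudo
        energies.append(-kT * math.log(p_obs / p_ref))
    return energies


def score_structure(distances, short_table, long_table, bin_width=1.0, cutoff=4):
    """Sum bin energies over residue pairs.

    distances maps (i, j) residue index pairs to a bead-bead distance.
    Pairs with |i - j| <= cutoff use the short-range table (hypothetical cutoff).
    """
    total = 0.0
    for (i, j), d in distances.items():
        table = short_table if abs(i - j) <= cutoff else long_table
        b = min(int(d / bin_width), len(table) - 1)  # clamp to the last bin
        total += table[b]
    return total


# Toy histograms: three distance bins, uniform reference state.
energies = statistical_potential([10, 30, 60], [1, 1, 1])
total = score_structure({(0, 1): 2.5, (0, 10): 0.5}, energies, energies)
print(energies, total)
```

Bins that are more populated in known structures than in the reference state receive negative (favorable) energies, so lower totals indicate more native-like geometry under this kind of scheme.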
Affiliation(s)
- Ya-Lan Tan
- Research Center of Nonlinear Science, School of Mathematical and Physical Sciences, Wuhan Textile University, Wuhan 430073, China
- Department of Physics and Key Laboratory of Artificial Micro & Nano-structures of Education, School of Physics and Technology, Wuhan University, Wuhan 430072, China
- Xunxun Wang
- Department of Physics and Key Laboratory of Artificial Micro & Nano-structures of Education, School of Physics and Technology, Wuhan University, Wuhan 430072, China
- Shixiong Yu
- Department of Physics and Key Laboratory of Artificial Micro & Nano-structures of Education, School of Physics and Technology, Wuhan University, Wuhan 430072, China
- Bengong Zhang
- Research Center of Nonlinear Science, School of Mathematical and Physical Sciences, Wuhan Textile University, Wuhan 430073, China
- Zhi-Jie Tan
- Department of Physics and Key Laboratory of Artificial Micro & Nano-structures of Education, School of Physics and Technology, Wuhan University, Wuhan 430072, China
|
20
|
Sato R, Suzuki K, Yasuda Y, Suenaga A, Fukui K. RNAapt3D: RNA aptamer 3D-structural modeling database. Biophys J 2022; 121:4770-4776. [PMID: 36146935 PMCID: PMC9808543 DOI: 10.1016/j.bpj.2022.09.023] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 04/28/2022] [Revised: 08/17/2022] [Accepted: 09/20/2022] [Indexed: 01/07/2023] Open
Abstract
RNA aptamers are oligonucleotides with high binding affinity and specificity for target molecules and are expected to be a new generation of therapeutic molecules and targeted delivery materials. The tertiary structure of RNA molecules and RNA-protein interaction sites are increasingly important as potential targets for new drugs. The pathological mechanisms of diseases must be understood in detail to guide drug design. In developing RNA aptamers as drugs, information about the interaction mechanisms and structures of RNA aptamer-target protein complexes is useful. We constructed a database, RNA aptamer 3D-structural modeling (RNAapt3D), consisting of data on RNA aptamers that are potential drug candidates. The database includes RNA sequences and RNA tertiary structures computationally predicted from secondary structures, and it implements methods that can be used to predict unknown structures of RNA aptamer-target molecule complexes. RNAapt3D should enable the design of RNA aptamers for target molecules and improve the efficiency and productivity of candidate drug selection. RNAapt3D can be accessed at https://rnaapt3d.medals.jp.
Affiliation(s)
- Ryuma Sato
- Cellular and Molecular Biotechnology Research Institute, National Institute of Advanced Industrial Science and Technology (AIST), Tokyo, Japan
- Koji Suzuki
- Cellular and Molecular Biotechnology Research Institute, National Institute of Advanced Industrial Science and Technology (AIST), Tokyo, Japan
- Yuichi Yasuda
- College of Humanities and Science, Department of Biosciences, Nihon University, Tokyo, Japan
- Atsushi Suenaga
- College of Humanities and Science, Department of Biosciences, Nihon University, Tokyo, Japan
- Kazuhiko Fukui
- Cellular and Molecular Biotechnology Research Institute, National Institute of Advanced Industrial Science and Technology (AIST), Tokyo, Japan.
|
21
|
Wiedemann J, Kaczor J, Milostan M, Zok T, Blazewicz J, Szachniuk M, Antczak M. RNAloops: a database of RNA multiloops. Bioinformatics 2022; 38:4200-4205. [PMID: 35809063 PMCID: PMC9438955 DOI: 10.1093/bioinformatics/btac484] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 04/05/2022] [Revised: 06/26/2022] [Accepted: 07/06/2022] [Indexed: 12/24/2022] Open
Abstract
MOTIVATION: Knowledge of the 3D structure of RNA supports discovering its functions and is crucial for designing drugs and modern therapeutic solutions. Thus, much attention is devoted to experimental determination and computational prediction targeting the global fold of RNA and its local substructures. The latter include multi-branched loops, functionally significant elements that strongly affect the spatial shape of the entire molecule. Unfortunately, their computational modeling remains a weak point of structural bioinformatics. One remedy is to collect these motifs and analyze their features. RESULTS: RNAloops is a self-updating database that stores multi-branched loops identified in the PDB-deposited RNA structures. The description of each loop includes angular data (planar and Euler angles computed between pairs of adjacent helices) to allow studying their mutual arrangement in space. The system enables search and analysis of multiloops, presents their structure details numerically and visually, and computes data statistics. AVAILABILITY AND IMPLEMENTATION: RNAloops is freely accessible at https://rnaloops.cs.put.poznan.pl. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
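The planar angle between two adjacent helices can be sketched as the angle between their axis vectors. The example below is only an illustration under stated assumptions: each helix axis is approximated by a vector between two hypothetical 3D points, and the functions `helix_axis` and `planar_angle` are made up for this sketch. How RNAloops actually derives helix axes, and which Euler-angle convention it uses, is not stated in the abstract, so Euler angles are not covered here.

```python
import math


def helix_axis(start, end):
    """Axis vector of a helix, approximated by two points (an assumption)."""
    return tuple(e - s for s, e in zip(start, end))


def planar_angle(u, v):
    """Angle in degrees between two non-zero 3D vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    # Clamp to [-1, 1] so rounding error cannot push acos out of its domain.
    cos_angle = max(-1.0, min(1.0, dot / (norm_u * norm_v)))
    return math.degrees(math.acos(cos_angle))


# Hypothetical coordinates for two helices meeting at a loop.
axis_1 = helix_axis((0.0, 0.0, 0.0), (10.0, 0.0, 0.0))
axis_2 = helix_axis((0.0, 0.0, 0.0), (0.0, 8.0, 0.0))
angle = planar_angle(axis_1, axis_2)
print(angle)
```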
Affiliation(s)
- Jakub Wiedemann
- Institute of Computing Science, Poznan University of Technology, 60-965 Poznan, Poland
- Jacek Kaczor
- Institute of Computing Science, Poznan University of Technology, 60-965 Poznan, Poland
- Maciej Milostan
- Institute of Computing Science, Poznan University of Technology, 60-965 Poznan, Poland
- Poznan Supercomputing and Networking Center, 61-131 Poznan, Poland
- Tomasz Zok
- Institute of Computing Science, Poznan University of Technology, 60-965 Poznan, Poland
- Poznan Supercomputing and Networking Center, 61-131 Poznan, Poland
- Jacek Blazewicz
- Institute of Computing Science, Poznan University of Technology, 60-965 Poznan, Poland
- Institute of Bioorganic Chemistry, Polish Academy of Sciences, 61-704 Poznan, Poland
|