1
Qi F, Chen J, Chen Y, Sun J, Lin Y, Chen Z, Kapranov P. Evaluating Performance of Different RNA Secondary Structure Prediction Programs Using Self-cleaving Ribozymes. Genomics, Proteomics & Bioinformatics 2024; 22:qzae043. [PMID: 39317944 PMCID: PMC12016570 DOI: 10.1093/gpbjnl/qzae043] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/23/2022] [Revised: 03/02/2024] [Accepted: 06/05/2024] [Indexed: 09/26/2024]
Abstract
Accurate identification of the correct, biologically relevant RNA structures is critical to understanding various aspects of RNA biology, since proper folding is key to the functionality of all types of RNA molecules and plays pivotal roles in many essential biological processes. Thus, a plethora of approaches have been developed to predict, identify, or solve RNA structures based on various computational, molecular, genetic, chemical, or physicochemical strategies. Purely computational approaches hold distinct advantages over all other strategies in terms of ease of implementation, time, speed, cost, and throughput, but they strongly underperform in terms of accuracy, which significantly limits their broader application. Nonetheless, the advantages of these methods have led to a steady development of multiple in silico RNA secondary structure prediction approaches, including recent deep learning-based programs. Here, we compared the accuracy of predictions of biologically relevant secondary structures of dozens of self-cleaving ribozyme sequences using seven in silico RNA folding prediction tools with tasks of varying complexity. We found that while many programs performed well in relatively simple tasks, their performance varied significantly in more complex RNA folding problems. However, in general, a modern deep learning method outperformed the other programs in the complex tasks in predicting the RNA secondary structures, at least based on the specific class of sequences tested, suggesting that it may represent the future of RNA structure prediction algorithms.
Affiliation(s)
- Fei Qi
  - State Key Laboratory of Cellular Stress Biology, School of Life Sciences, Faculty of Medicine and Life Sciences, Xiamen University, Xiamen 361102, China
  - Institute of Genomics, School of Medicine, Huaqiao University, Xiamen 361021, China
- Junjie Chen
  - Institute of Genomics, School of Medicine, Huaqiao University, Xiamen 361021, China
- Yue Chen
  - Institute of Genomics, School of Medicine, Huaqiao University, Xiamen 361021, China
- Jianfeng Sun
  - Botnar Research Centre, University of Oxford, Oxford, OX3 7LD, United Kingdom
- Yiting Lin
  - Institute of Genomics, School of Medicine, Huaqiao University, Xiamen 361021, China
- Zipeng Chen
  - Institute of Genomics, School of Medicine, Huaqiao University, Xiamen 361021, China
- Philipp Kapranov
  - State Key Laboratory of Cellular Stress Biology, School of Life Sciences, Faculty of Medicine and Life Sciences, Xiamen University, Xiamen 361102, China
2
Röder K, Pasquali S. Assessing RNA atomistic force fields via energy landscape explorations in implicit solvent. Biophys Rev 2024; 16:285-295. [PMID: 39099837 PMCID: PMC11297004 DOI: 10.1007/s12551-024-01202-9] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/30/2024] [Accepted: 05/29/2024] [Indexed: 08/06/2024]
Abstract
Predicting the structure and dynamics of RNA molecules still proves challenging because of the relative scarcity of experimental RNA structures on which to train models and the high sensitivity of RNA to its environment. In the last decade, several atomistic force fields specifically designed for RNA have been proposed and are commonly used for simulations. However, it is not necessarily clear which force field is the most suitable for a given RNA molecule. In this contribution, we propose the use of the computational energy landscape (EL) framework to explore the energy landscape of RNA systems, as it can bring complementary information to the more standard approaches of enhanced sampling simulations based on molecular dynamics. We apply the EL framework to the study of a small RNA pseudoknot, the Aquifex aeolicus tmRNA pseudoknot PK1, and we compare the results of five different RNA force fields currently available in the AMBER simulation software, in implicit solvent. With this computational approach, we can not only compare the predicted 'native' states for the different force fields, but also study metastable states. As a result, our comparison not only looks at structural features of low energy folded structures, but also provides insight into folding pathways and higher energy excited states, opening up the possibility of assessing the validity of force fields also based on kinetics and on experiments providing information on metastable and unfolded states. Supplementary Information: The online version contains supplementary material available at 10.1007/s12551-024-01202-9.
Affiliation(s)
- Konstantin Röder
  - Randall Centre for Cell & Molecular Biophysics, King’s College London, London SE1 1UL, UK
- Samuela Pasquali
  - Laboratoire Biologie Functionnelle Et Adaptative, CNRS UMR 8251, Inserm ERL U1133, Université Paris Cité, 35 Rue Hélène Brion, Paris, France
3
Tosti Guerra F, Poppleton E, Šulc P, Rovigatti L. ANNaMo: Coarse-grained modeling for folding and assembly of RNA and DNA systems. J Chem Phys 2024; 160:205102. [PMID: 38814009 DOI: 10.1063/5.0202829] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/07/2024] [Accepted: 05/04/2024] [Indexed: 05/31/2024]
Abstract
The folding of RNA and DNA strands plays crucial roles in biological systems and bionanotechnology. However, studying these processes with high-resolution numerical models is beyond current computational capabilities due to the timescales and system sizes involved. In this article, we present a new coarse-grained model for investigating the folding dynamics of nucleic acids. Our model represents three nucleotides with a patchy particle and is parameterized using well-established nearest-neighbor models. Thanks to the reduction of degrees of freedom and to a bond-swapping mechanism, our model allows for simulations at timescales and length scales that are currently inaccessible to more detailed models. To validate the performance of our model, we conducted extensive simulations of various systems: We examined the thermodynamics of DNA hairpins, capturing their stability and structural transitions, the folding of an MMTV pseudoknot, which is a complex RNA structure involved in viral replication, and also explored the folding of an RNA tile containing a k-type pseudoknot. Finally, we evaluated the performance of the new model in reproducing the melting temperatures of oligomers and the dependence on the toehold length of the displacement rate in toehold-mediated displacement processes, a key reaction used in molecular computing. All in all, the successful reproduction of experimental data and favorable comparisons with existing coarse-grained models validate the effectiveness of the new model.
Affiliation(s)
- F Tosti Guerra
  - Department of Physics, Sapienza University of Rome, Roma, Italy
- E Poppleton
  - School of Molecular Sciences and Center for Molecular Design and Biomimetics, The Biodesign Institute, Arizona State University, Tempe, Arizona 85281, USA
  - Biophysical Engineering Group, Max Planck Institute for Medical Research, Heidelberg, Germany
- P Šulc
  - School of Molecular Sciences and Center for Molecular Design and Biomimetics, The Biodesign Institute, Arizona State University, Tempe, Arizona 85281, USA
  - Department of Bioscience, School of Natural Sciences, Technical University Munich, Munich, Germany
- L Rovigatti
  - Department of Physics, Sapienza University of Rome, Roma, Italy
4
Kolaitis A, Makris E, Karagiannis AA, Tsanakas P, Pavlatos C. Knotify_V2.0: Deciphering RNA Secondary Structures with H-Type Pseudoknots and Hairpin Loops. Genes (Basel) 2024; 15:670. [PMID: 38927606 PMCID: PMC11203014 DOI: 10.3390/genes15060670] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/26/2024] [Revised: 05/19/2024] [Accepted: 05/22/2024] [Indexed: 06/28/2024]
Abstract
Accurately predicting the pairing order of bases in RNA molecules is essential for anticipating RNA secondary structures. Consequently, this task holds significant importance in unveiling previously unknown biological processes. The urgent need to comprehend RNA structures has been accentuated by the unprecedented impact of the widespread COVID-19 pandemic. This paper presents a framework, Knotify_V2.0, which makes use of syntactic pattern recognition techniques in order to predict RNA structures, with a specific emphasis on tackling the demanding task of predicting H-type pseudoknots that encompass bulges and hairpins. By leveraging the expressive capabilities of a Context-Free Grammar (CFG), the suggested framework integrates the inherent benefits of CFG and makes use of minimum free energy and maximum base pairing criteria. This integration enables the effective management of this inherently ambiguous task. The main contribution of Knotify_V2.0 compared to earlier versions lies in its capacity to identify additional motifs like bulges and hairpins within the internal loops of the pseudoknot. Notably, the proposed methodology, Knotify_V2.0, demonstrates superior accuracy in predicting core stems compared to state-of-the-art frameworks. Knotify_V2.0 exhibited exceptional performance by accurately identifying both core base pairings that form the ground truth pseudoknot in 70% of the examined sequences. Furthermore, Knotify_V2.0 narrowed the performance gap with Knotty, which had demonstrated better performance than Knotify, and even surpassed it in Recall and F1-score metrics. Knotify_V2.0 achieved a higher count of true positives (tp) and a significantly lower count of false negatives (fn) compared to Knotify, highlighting improvements in Precision and Recall metrics, respectively. Consequently, Knotify_V2.0 achieved a higher F1-score than any other platform. The source code and comprehensive implementation details of Knotify_V2.0 are publicly available on GitHub.
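The true-positive and false-negative counts quoted in this abstract feed into standard base-pair metrics. The following is a minimal sketch, not code from Knotify_V2.0: it assumes a structure is given as a collection of 0-based index pairs `(i, j)`, and the function name `pair_metrics` is hypothetical.

```python
def pair_metrics(predicted, reference):
    """Compare two collections of base pairs (i, j) and return count-based metrics.

    Assumption for illustration: a pair counts as a true positive only if it
    matches a reference pair exactly (no tolerance for shifted pairs).
    """
    pred = {tuple(sorted(p)) for p in predicted}
    ref = {tuple(sorted(p)) for p in reference}

    tp = len(pred & ref)   # pairs predicted and present in the reference
    fp = len(pred - ref)   # pairs predicted but absent from the reference
    fn = len(ref - pred)   # reference pairs that were missed

    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {"tp": tp, "fp": fp, "fn": fn,
            "precision": precision, "recall": recall, "f1": f1}


if __name__ == "__main__":
    reference = [(0, 9), (1, 8), (2, 7), (4, 12)]
    predicted = [(0, 9), (1, 8), (3, 6)]
    print(pair_metrics(predicted, reference))
```

In this toy input, two of the three predicted pairs are correct and two reference pairs are missed, so raising tp and lowering fn both push recall (and with it the F1-score) upward.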
Affiliation(s)
- Angelos Kolaitis
  - School of Electrical and Computer Engineering, National Technical University of Athens, 9 Iroon Polytechniou St., 15780 Athens, Greece
- Evangelos Makris
  - School of Electrical and Computer Engineering, National Technical University of Athens, 9 Iroon Polytechniou St., 15780 Athens, Greece
- Alexandros Anastasios Karagiannis
  - School of Electrical and Computer Engineering, National Technical University of Athens, 9 Iroon Polytechniou St., 15780 Athens, Greece
- Panayiotis Tsanakas
  - School of Electrical and Computer Engineering, National Technical University of Athens, 9 Iroon Polytechniou St., 15780 Athens, Greece
- Christos Pavlatos
  - Hellenic Air Force Academy, Dekelia Air Base, Acharnes, 13671 Athens, Greece
5
Gong T, Ju F, Bu D. Accurate prediction of RNA secondary structure including pseudoknots through solving minimum-cost flow with learned potentials. Commun Biol 2024; 7:297. [PMID: 38461362 PMCID: PMC10924946 DOI: 10.1038/s42003-024-05952-w] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/07/2023] [Accepted: 02/21/2024] [Indexed: 03/11/2024]
Abstract
Pseudoknots are key structure motifs of RNA and pseudoknotted RNAs play important roles in a variety of biological processes. Here, we present KnotFold, an accurate approach to the prediction of RNA secondary structure including pseudoknots. The key elements of KnotFold include a learned potential function and a minimum-cost flow algorithm to find the secondary structure with the lowest potential. KnotFold learns the potential from the RNAs with known structures using an attention-based neural network, thus avoiding the inaccuracy of hand-crafted energy functions. The specially designed minimum-cost flow algorithm used by KnotFold considers all possible combinations of base pairs and selects from them the optimal combination. The algorithm breaks the restriction of nested base pairs required by the widely used dynamic programming algorithms, thus enabling the identification of pseudoknots. Using 1,009 pseudoknotted RNAs as representatives, we demonstrate the successful application of KnotFold in predicting RNA secondary structures including pseudoknots with accuracy higher than the state-of-the-art approaches. We anticipate that KnotFold, with its superior accuracy, will greatly facilitate the understanding of RNA structures and functionalities.
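The "restriction of nested base pairs" mentioned in this abstract can be made concrete: two base pairs (i, j) and (k, l) cross when i < k < j < l, and a structure with at least one crossing is pseudoknotted. The sketch below only illustrates that definition. It is not part of KnotFold, and the function names and the 0-based pair representation are assumptions.

```python
from itertools import combinations


def crossing_pairs(pairs):
    """Return every couple of base pairs (i, j), (k, l) with i < k < j < l."""
    # Normalise each pair so that i < j, then sort so the pair with the
    # smaller 5' index always comes first in each combination.
    norm = sorted(tuple(sorted(p)) for p in pairs)
    return [(a, b) for a, b in combinations(norm, 2)
            if a[0] < b[0] < a[1] < b[1]]


def is_pseudoknotted(pairs):
    """True if at least two base pairs cross, i.e. the structure is not nested."""
    return bool(crossing_pairs(pairs))


if __name__ == "__main__":
    hairpin = [(0, 10), (1, 9), (2, 8)]           # fully nested stem
    h_type = [(0, 6), (1, 5), (3, 10), (4, 9)]    # two interleaved stems
    print(is_pseudoknotted(hairpin))
    print(is_pseudoknotted(h_type))
```

Dynamic programming over nested structures never has to consider such crossing couples, which is why allowing them requires a different search strategy.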
Affiliation(s)
- Tiansu Gong
  - Key Lab of Intelligent Information Processing, Institute of Computing Technology, Chinese Academy of Sciences, 100190, Beijing, China
  - University of Chinese Academy of Sciences, 100190, Beijing, China
- Fusong Ju
  - Key Lab of Intelligent Information Processing, Institute of Computing Technology, Chinese Academy of Sciences, 100190, Beijing, China
  - University of Chinese Academy of Sciences, 100190, Beijing, China
- Dongbo Bu
  - Key Lab of Intelligent Information Processing, Institute of Computing Technology, Chinese Academy of Sciences, 100190, Beijing, China
  - University of Chinese Academy of Sciences, 100190, Beijing, China
  - Central China Artificial Intelligence Research Institute, Henan Academy of Sciences, Zhengzhou, 450046, Henan, China
6
Makris E, Kolaitis A, Andrikos C, Moulos V, Tsanakas P, Pavlatos C. Knotify+: Toward the Prediction of RNA H-Type Pseudoknots, Including Bulges and Internal Loops. Biomolecules 2023; 13:biom13020308. [PMID: 36830677 PMCID: PMC9953189 DOI: 10.3390/biom13020308] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/17/2022] [Revised: 01/25/2023] [Accepted: 02/01/2023] [Indexed: 02/09/2023]
Abstract
Accurate "base pairing" in RNA molecules, which leads to the prediction of RNA secondary structures, is crucial in order to explain unknown biological operations. Recently, COVID-19, a widespread disease, has caused many deaths, affecting humanity in an unprecedented way. SARS-CoV-2, a single-stranded RNA virus, has shown the significance of analyzing these molecules and their structures. This paper aims to create a pioneering framework in the direction of predicting specific RNA structures, leveraging syntactic pattern recognition. The proposed framework, Knotify+, addresses the problem of predicting H-type pseudoknots, including bulges and internal loops, by harnessing the power of context-free grammar (CFG). We combine the grammar's advantages with maximum base pairing and minimum free energy to tackle this ambiguous task in a performant way. Specifically, our proposed methodology, Knotify+, outperforms state-of-the-art frameworks with regard to its accuracy in core stems prediction. Additionally, it performs more accurately on small sequences and presents a comparable accuracy rate on larger ones, while it requires a smaller execution time compared to well-known platforms. The Knotify+ source code and implementation details are available as a public repository on GitHub.
Affiliation(s)
- Evangelos Makris
  - School of Electrical and Computer Engineering, National Technical University of Athens, 9 Iroon Polytechniou St., 15780 Athens, Greece
- Angelos Kolaitis
  - School of Electrical and Computer Engineering, National Technical University of Athens, 9 Iroon Polytechniou St., 15780 Athens, Greece
- Christos Andrikos
  - School of Electrical and Computer Engineering, National Technical University of Athens, 9 Iroon Polytechniou St., 15780 Athens, Greece
- Vrettos Moulos
  - School of Electrical and Computer Engineering, National Technical University of Athens, 9 Iroon Polytechniou St., 15780 Athens, Greece
- Panayiotis Tsanakas
  - School of Electrical and Computer Engineering, National Technical University of Athens, 9 Iroon Polytechniou St., 15780 Athens, Greece
- Christos Pavlatos
  - Hellenic Air Force Academy, Dekelia Air Base, Acharnes, 13671 Athens, Greece
  - Correspondence: Tel.: +30-210-7722541
7
Kimchi O, Brenner MP, Colwell LJ. Nucleic Acid Structure Prediction Including Pseudoknots Through Direct Enumeration of States: A User's Guide to the LandscapeFold Algorithm. Methods Mol Biol 2023; 2586:49-77. [PMID: 36705898 DOI: 10.1007/978-1-0716-2768-6_4] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/28/2023]
Abstract
Here we detail the LandscapeFold secondary structure prediction algorithm and how it is used. The algorithm was previously described and tested in (Kimchi O et al., Biophys J 117(3):520-532, 2019), though it was not named there. The algorithm directly enumerates all possible secondary structures into which up to two RNA or single-stranded DNA sequences can fold. It uses a polymer physics model to estimate the configurational entropy of structures including complex pseudoknots. We detail each of these steps and ways in which the user can adjust the algorithm as desired. The code is available on the GitHub repository https://github.com/ofer-kimchi/LandscapeFold.
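To give a feel for what "directly enumerates all possible secondary structures" involves, here is a deliberately naive sketch. It is not the LandscapeFold code and does not reproduce its stem-based bookkeeping or entropy model. It assumes a single RNA strand, canonical and G-U wobble pairs only, a minimum hairpin loop of three unpaired nucleotides, and hypothetical function names. Crossing (pseudoknotted) pair sets are allowed.

```python
# Pairing rules assumed for this illustration: Watson-Crick plus G-U wobble.
CANONICAL = {("A", "U"), ("U", "A"), ("G", "C"),
             ("C", "G"), ("G", "U"), ("U", "G")}


def candidate_pairs(seq, min_loop=3):
    """All index pairs (i, j) that may pair, with at least min_loop bases between them."""
    n = len(seq)
    return [(i, j)
            for i in range(n)
            for j in range(i + min_loop + 1, n)
            if (seq[i], seq[j]) in CANONICAL]


def enumerate_structures(seq, min_loop=3):
    """Enumerate every set of candidate pairs in which each base pairs at most once.

    The empty (fully unfolded) structure is included. The number of results
    grows exponentially with sequence length, so this is only usable on
    very short sequences.
    """
    cands = candidate_pairs(seq, min_loop)
    results = []

    def extend(start, chosen, used):
        results.append(tuple(chosen))
        for idx in range(start, len(cands)):
            i, j = cands[idx]
            if i in used or j in used:
                continue  # one of the two bases is already paired
            chosen.append((i, j))
            extend(idx + 1, chosen, used | {i, j})
            chosen.pop()

    extend(0, [], frozenset())
    return results


if __name__ == "__main__":
    for structure in enumerate_structures("GGAAACC"):
        print(structure)
```

Even this seven-nucleotide example yields both a nested pair set and a crossing one, which is the kind of state space an enumeration-based method has to score.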
Affiliation(s)
- Ofer Kimchi
  - School of Engineering and Applied Sciences, Harvard University, Cambridge, MA, USA
  - Lewis-Sigler Institute for Integrative Genomics, Princeton University, Princeton, NJ, USA
- Michael P Brenner
  - School of Engineering and Applied Sciences, Harvard University, Cambridge, MA, USA
- Lucy J Colwell
  - Department of Chemistry, University of Cambridge, Cambridge, UK
8
Andrikos C, Makris E, Kolaitis A, Rassias G, Pavlatos C, Tsanakas P. Knotify: An Efficient Parallel Platform for RNA Pseudoknot Prediction Using Syntactic Pattern Recognition. Methods Protoc 2022; 5:mps5010014. [PMID: 35200530 PMCID: PMC8876629 DOI: 10.3390/mps5010014] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 01/03/2022] [Revised: 01/27/2022] [Accepted: 01/30/2022] [Indexed: 11/16/2022]
Abstract
Obtaining valuable clues for noncoding RNA (ribonucleic acid) subsequences remains a significant challenge, given that most of the human genome transcribes into noncoding RNA parts related to unknown biological operations. Capturing these clues relies on accurate “base pairing” prediction, also known as “RNA secondary structure prediction”. As COVID-19 is considered a severe global threat, the single-stranded SARS-CoV-2 virus reveals the importance of establishing an efficient RNA analysis toolkit. This work aimed to contribute to that by introducing a novel system committed to predicting RNA secondary structure patterns (i.e., RNA pseudoknots) that leverages syntactic pattern-recognition strategies. Having focused on pseudoknot prediction, we formalized the secondary structure prediction of RNA as primarily a parsing problem and secondarily an optimization problem. The proposed methodology addresses the problem of predicting pseudoknots of the first order (H-type). We introduce a context-free grammar (CFG) that affords enough expressive power to recognize potential pseudoknot patterns. In addition, an alternative methodology for detecting possible pseudoknots is implemented, using a brute-force algorithm. Any input sequence may highlight multiple potential folding patterns, requiring a strict methodology to determine the single biologically realistic one. We employed a novel heuristic over the widely accepted notion of free-energy minimization to tackle such ambiguity in a performant way by utilizing each pattern’s context to unveil the most prominent pseudoknot pattern. The overall process features polynomial-time complexity, while its parallel implementation enhances the end performance in proportion to the deployed hardware. The proposed methodology succeeds in predicting the core stems of any RNA pseudoknot of the test dataset, achieving a 76.4% recall ratio. The methodology achieved an F1-score equal to 0.774 and an MCC equal to 0.543 in discovering all the stems of an RNA sequence, outperforming the particular task. Measurements were taken using a dataset of 262 RNA sequences, establishing a performance speed of 1.31, 3.45, and 7.76 compared to three well-known platforms. The implementation source code is publicly available in the knotify GitHub repository.
Affiliation(s)
- Christos Andrikos
  - School of Electrical and Computer Engineering, National Technical University of Athens, 9 Iroon Polytechniou St., 15780 Athens, Greece
- Evangelos Makris
  - School of Electrical and Computer Engineering, National Technical University of Athens, 9 Iroon Polytechniou St., 15780 Athens, Greece
- Angelos Kolaitis
  - School of Electrical and Computer Engineering, National Technical University of Athens, 9 Iroon Polytechniou St., 15780 Athens, Greece
- Georgios Rassias
  - School of Electrical and Computer Engineering, National Technical University of Athens, 9 Iroon Polytechniou St., 15780 Athens, Greece
- Christos Pavlatos
  - Hellenic Air Force Academy, Dekelia Air Base, Acharnes, 13671 Athens, Greece
  - Correspondence: Tel.: +30-210-7722541
- Panayiotis Tsanakas
  - School of Electrical and Computer Engineering, National Technical University of Athens, 9 Iroon Polytechniou St., 15780 Athens, Greece
9
Winkler J, Urgese G, Ficarra E, Reinert K. LaRA 2: parallel and vectorized program for sequence-structure alignment of RNA sequences. BMC Bioinformatics 2022; 23:18. [PMID: 34991448 PMCID: PMC8734264 DOI: 10.1186/s12859-021-04532-7] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 09/08/2020] [Accepted: 12/13/2021] [Indexed: 11/10/2022]
Abstract
BACKGROUND: The function of non-coding RNA sequences is largely determined by their spatial conformation, namely the secondary structure of the molecule, formed by Watson-Crick interactions between nucleotides. Hence, modern RNA alignment algorithms routinely take structural information into account. In order to discover yet unknown RNA families and infer their possible functions, the structural alignment of RNAs is an essential task. This task demands a lot of computational resources, especially for aligning many long sequences, and it therefore requires efficient algorithms that utilize modern hardware when available. A subset of the secondary structures contains overlapping interactions (called pseudoknots), which add complexity to the problem and are often ignored in available software. RESULTS: We present the SeqAn-based software LaRA 2 that is significantly faster than comparable software for accurate pairwise and multiple alignments of structured RNA sequences. In contrast to other programs, our approach can handle arbitrary pseudoknots. As an improved re-implementation of the LaRA tool for structural alignments, LaRA 2 uses multi-threading and vectorization for parallel execution and a new heuristic for computing a lower boundary of the solution. Our algorithmic improvements yield a program that is up to 130 times faster than the previous version. CONCLUSIONS: With LaRA 2 we provide a tool to analyse large sets of RNA secondary structures in relatively short time, based on structural alignment. The produced alignments can be used to derive structural motifs for the search in genomic databases.
Affiliation(s)
- Jörg Winkler
  - Department of Mathematics and Computer Science, Free University Berlin, Takustraße 9, 14195 Berlin, Germany
  - Max Planck Institute for Molecular Genetics, Ihnestraße 63-73, 14195 Berlin, Germany
- Gianvito Urgese
  - Interuniversity Department of Regional and Urban Studies and Planning, Politecnico di Torino, C.so Duca degli Abruzzi 24, 10129 Turin, Italy
- Elisa Ficarra
  - Department of Control and Computer Science, Politecnico di Torino, C.so Duca degli Abruzzi 24, 10129 Turin, Italy
- Knut Reinert
  - Department of Mathematics and Computer Science, Free University Berlin, Takustraße 9, 14195 Berlin, Germany
  - Max Planck Institute for Molecular Genetics, Ihnestraße 63-73, 14195 Berlin, Germany
10
Magnus M, Antczak M, Zok T, Wiedemann J, Lukasiak P, Cao Y, Bujnicki JM, Westhof E, Szachniuk M, Miao Z. RNA-Puzzles toolkit: a computational resource of RNA 3D structure benchmark datasets, structure manipulation, and evaluation tools. Nucleic Acids Res 2020; 48:576-588. [PMID: 31799609 PMCID: PMC7145511 DOI: 10.1093/nar/gkz1108] [Citation(s) in RCA: 24] [Impact Index Per Article: 4.8] [Received: 08/19/2019] [Revised: 11/06/2019] [Accepted: 11/15/2019] [Indexed: 12/12/2022]
Abstract
Significant improvements have been made in the efficiency and accuracy of RNA 3D structure prediction methods during the succeeding challenges of RNA-Puzzles, a community-wide effort on the assessment of blind prediction of RNA tertiary structures. The RNA-Puzzles contest has shown, among others, that the development and validation of computational methods for RNA fold prediction strongly depend on the benchmark datasets and the structure comparison algorithms. Yet, there has been no systematic benchmark set or decoy structures available for the 3D structure prediction of RNA, hindering the standardization of comparative tests in the modeling of RNA structure. Furthermore, there has not been a unified set of tools that allows deep and complete RNA structure analysis, and at the same time, that is easy to use. Here, we present RNA-Puzzles toolkit, a computational resource including (i) decoy sets generated by different RNA 3D structure prediction methods (raw, for-evaluation and standardized datasets), (ii) 3D structure normalization, analysis, manipulation, visualization tools (RNA_format, RNA_normalizer, rna-tools) and (iii) 3D structure comparison metric tools (RNAQUA, MCQ4Structures). This resource provides a full list of computational tools as well as a standard RNA 3D structure prediction assessment protocol for the community.
Affiliation(s)
- Marcin Magnus
  - International Institute of Molecular and Cell Biology in Warsaw, 02-109 Warsaw, Poland
  - ReMedy-International Research Agenda Unit, Centre of New Technologies, University of Warsaw, 02-097 Warsaw, Poland
- Maciej Antczak
  - Institute of Computing Science & European Centre for Bioinformatics and Genomics, Poznan University of Technology, 60-965 Poznan, Poland
  - Institute of Bioorganic Chemistry, Polish Academy of Sciences, 61-704 Poznan, Poland
- Tomasz Zok
  - Institute of Computing Science & European Centre for Bioinformatics and Genomics, Poznan University of Technology, 60-965 Poznan, Poland
- Jakub Wiedemann
  - Institute of Computing Science & European Centre for Bioinformatics and Genomics, Poznan University of Technology, 60-965 Poznan, Poland
  - Institute of Bioorganic Chemistry, Polish Academy of Sciences, 61-704 Poznan, Poland
- Piotr Lukasiak
  - Institute of Computing Science & European Centre for Bioinformatics and Genomics, Poznan University of Technology, 60-965 Poznan, Poland
  - Institute of Bioorganic Chemistry, Polish Academy of Sciences, 61-704 Poznan, Poland
- Yang Cao
  - Center of Growth, Metabolism and Aging, Key Laboratory of Bio-Resource and Eco-Environment of Ministry of Education, College of Life Sciences, Sichuan University, Chengdu 610065, PR China
- Janusz M Bujnicki
  - International Institute of Molecular and Cell Biology in Warsaw, 02-109 Warsaw, Poland
  - Institute of Molecular Biology and Biotechnology, Faculty of Biology, Adam Mickiewicz University, Poznan, Poland
- Eric Westhof
  - Architecture et Réactivité de l’ARN, Université de Strasbourg, Institut de biologie moléculaire et cellulaire du CNRS, 12 allée Konrad Roentgen, 67084 Strasbourg, France
- Marta Szachniuk
  - Institute of Computing Science & European Centre for Bioinformatics and Genomics, Poznan University of Technology, 60-965 Poznan, Poland
  - Institute of Bioorganic Chemistry, Polish Academy of Sciences, 61-704 Poznan, Poland
- Zhichao Miao
  - Translational Research Institute of Brain and Brain-Like Intelligence and Department of Anesthesiology, Shanghai Fourth People's Hospital Affiliated to Tongji University School of Medicine, Shanghai 200081, China
  - European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Cambridge CB10 1SD, UK
  - Newcastle Fibrosis Research Group, Institute of Cellular Medicine, Faculty of Medical Sciences, Newcastle University, Newcastle upon Tyne, UK
11
Kimchi O, Cragnolini T, Brenner MP, Colwell LJ. A Polymer Physics Framework for the Entropy of Arbitrary Pseudoknots. Biophys J 2019; 117:520-532. [PMID: 31353036 PMCID: PMC6697467 DOI: 10.1016/j.bpj.2019.06.037] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.8] [Received: 11/08/2018] [Revised: 06/21/2019] [Accepted: 06/27/2019] [Indexed: 11/18/2022]
Abstract
The accurate prediction of RNA secondary structure from primary sequence has had enormous impact on research over the past 40 years. Although many algorithms are available to make these predictions, the inclusion of non-nested loops, termed pseudoknots, still poses challenges arising from two main factors: 1) no physical model exists to estimate the loop entropies of complex intramolecular pseudoknots, and 2) their NP-complete enumeration has impeded their study. Here, we address both challenges. First, we develop a polymer physics model that can address arbitrarily complex pseudoknots using only two parameters corresponding to concrete physical quantities, over an order of magnitude fewer than the sparsest state-of-the-art phenomenological methods. Second, by coupling this model to exhaustive enumeration of the set of possible structures, we compute the entire free energy landscape of secondary structures resulting from a primary RNA sequence. We demonstrate that for RNA structures of ∼80 nucleotides, with minimal heuristics, the complete enumeration of possible secondary structures can be accomplished quickly despite the NP-complete nature of the problem. We further show that despite our loop entropy model's parametric sparsity, it performs better than or on par with previously published methods in predicting both pseudoknotted and non-pseudoknotted structures on a benchmark data set of RNA structures of ≤80 nucleotides. We suggest ways in which the accuracy of the model can be further improved.
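Once every structure in an enumerated landscape has a free energy, equilibrium populations follow from standard Boltzmann weighting, p_i = exp(-G_i / RT) / Z. The sketch below illustrates only that textbook step. It is not the authors' code, and the free energies in the usage example are made-up placeholder values in kcal/mol rather than results from the paper.

```python
import math

# Gas constant in kcal/(mol*K), matching free energies given in kcal/mol.
R_KCAL = 0.0019872


def boltzmann_probabilities(free_energies, temperature=310.15):
    """Convert free energies (kcal/mol) into equilibrium probabilities.

    Energies are shifted by their minimum before exponentiation so that
    large negative values do not overflow. The shift cancels in the ratio.
    """
    beta = 1.0 / (R_KCAL * temperature)
    g_min = min(free_energies)
    weights = [math.exp(-beta * (g - g_min)) for g in free_energies]
    z = sum(weights)  # partition function (up to the constant shift)
    return [w / z for w in weights]


if __name__ == "__main__":
    # Hypothetical landscape: unfolded state, a hairpin, and a pseudoknot.
    energies = [0.0, -2.1, -3.4]
    for g, p in zip(energies, boltzmann_probabilities(energies)):
        print(f"G = {g:5.1f} kcal/mol  ->  p = {p:.3f}")
```

The minimum free energy structure is simply the entry with the largest probability, while the full list also exposes how much population sits in alternative states.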
Affiliation(s)
- Ofer Kimchi
  - Harvard Graduate Program in Biophysics, Harvard University, Cambridge, Massachusetts
- Tristan Cragnolini
  - Department of Chemistry, University of Cambridge, Cambridge, United Kingdom
- Michael P Brenner
  - School of Engineering and Applied Sciences, Cambridge, Massachusetts; Kavli Institute for Bionano Science and Technology, Harvard University, Cambridge, Massachusetts
- Lucy J Colwell
  - Department of Chemistry, University of Cambridge, Cambridge, United Kingdom
|
12
|
Steger G, Riesner D. Viroid research and its significance for RNA technology and basic biochemistry. Nucleic Acids Res 2018; 46:10563-10576. [PMID: 30304486 PMCID: PMC6237808 DOI: 10.1093/nar/gky903] [Citation(s) in RCA: 15] [Impact Index Per Article: 2.5] [Received: 07/26/2018] [Accepted: 09/24/2018] [Indexed: 12/27/2022]
Abstract
Viroids were described 47 years ago as the smallest RNA molecules capable of infecting plants and autonomously self-replicating without an encoded protein. Work on viroids initiated the development of a number of innovative methods. Novel chromatographic and gel electrophoretic methods were developed for the purification and characterization of viroids; these methods were later used in molecular biology, gene technology and in prion research. Theoretical and experimental studies of RNA folding demonstrated the general biological importance of metastable structures, and nuclear magnetic resonance spectroscopy of viroid RNA showed the partially covalent nature of hydrogen bonds in biological macromolecules. RNA biochemistry and molecular biology profited from viroid research, such as in the detection of RNA as a template of DNA-dependent polymerases and in mechanisms of gene silencing. Viroids, the first circular RNA detected in nature, are important for studies on the much wider spectrum of circular RNAs and other non-coding RNAs.
Affiliation(s)
- Gerhard Steger
  - Department of Biology, Institut für Physikalische Biologie, Heinrich-Heine-Universität Düsseldorf, Universitätsstr. 1, 40225 Düsseldorf, Germany
- Detlev Riesner
  - Department of Biology, Institut für Physikalische Biologie, Heinrich-Heine-Universität Düsseldorf, Universitätsstr. 1, 40225 Düsseldorf, Germany
|
13
|
Jain S, Bayrak CS, Petingi L, Schlick T. Dual Graph Partitioning Highlights a Small Group of Pseudoknot-Containing RNA Submotifs. Genes (Basel) 2018; 9:E371. [PMID: 30044451 PMCID: PMC6115904 DOI: 10.3390/genes9080371] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.4] [Received: 05/07/2018] [Revised: 06/26/2018] [Accepted: 06/26/2018] [Indexed: 12/31/2022]
Abstract
RNA molecules are composed of modular architectural units that define their unique structural and functional properties. Characterization of these building blocks can help interpret RNA structure/function relationships. We present an RNA secondary structure motif and submotif library using dual graph representation and partitioning. Dual graphs represent RNA helices as vertices and loops as edges. Unlike tree graphs, dual graphs can represent RNA pseudoknots (intertwined base pairs). For a representative set of RNA structures, we construct dual graphs from their secondary structures, and apply our partitioning algorithm to identify non-separable subgraphs (or blocks) without breaking pseudoknots. We report 56 subgraph blocks up to nine vertices; among them, 22 are frequently occurring, 15 of which contain pseudoknots. We then catalog atomic fragments corresponding to the subgraph blocks to define a library of building blocks that can be used for RNA design, which we call RAG-3Dual, as we have done for tree graphs. As an application, we analyze the distribution of these subgraph blocks within ribosomal RNAs of various prokaryotic and eukaryotic species to identify common subgraphs and possible ancestry relationships. Other applications of dual graph partitioning and motif library can be envisioned for RNA structure analysis and design.
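The partitioning into non-separable subgraphs (blocks) described in this abstract can be illustrated with a generic graph routine. The sketch below is not the authors' partitioning algorithm and makes no claim about how it avoids breaking pseudoknots; it is the standard Hopcroft-Tarjan depth-first search for biconnected components, applied to a hypothetical edge list in which vertices stand for helices and edges for loops. The function name `blocks` and the example graph are assumptions made for illustration.

```python
from collections import defaultdict


def blocks(edges):
    """Return the blocks (biconnected components) of an undirected graph.

    `edges` is a list of (u, v) tuples; parallel edges and self-loops are
    tolerated. Each block is returned as a set of vertices.
    Generic Hopcroft-Tarjan sketch, not the published RAG partitioning code.
    """
    adj = defaultdict(list)
    for idx, (u, v) in enumerate(edges):
        adj[u].append((v, idx))
        adj[v].append((u, idx))

    disc, low = {}, {}        # discovery time and low-link value per vertex
    stack, result = [], []    # stack of edges seen so far, finished blocks
    counter = [0]

    def dfs(u, parent_edge):
        disc[u] = low[u] = counter[0]
        counter[0] += 1
        for v, idx in adj[u]:
            if idx == parent_edge:
                continue                      # do not reuse the entering edge
            if v not in disc:                 # tree edge
                stack.append((u, v))
                dfs(v, idx)
                low[u] = min(low[u], low[v])
                if low[v] >= disc[u]:         # u separates the subtree of v
                    comp = set()
                    while True:
                        a, b = stack.pop()
                        comp.update((a, b))
                        if (a, b) == (u, v):
                            break
                    result.append(comp)
            elif disc[v] < disc[u]:           # back edge to an ancestor
                stack.append((u, v))
                low[u] = min(low[u], disc[v])

    for start in list(adj):
        if start not in disc:
            dfs(start, None)
    return result


# Hypothetical graph: a three-vertex cycle with one pendant vertex.
example = [(1, 2), (2, 3), (3, 1), (3, 4)]
print(sorted(sorted(b) for b in blocks(example)))
```

The recursion is adequate for the small graphs discussed in the abstract (blocks of up to nine vertices); much larger graphs would call for an iterative traversal.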
Affiliation(s)
- Swati Jain
  - Department of Chemistry, New York University, New York, NY 10003, USA
- Cigdem S Bayrak
  - Department of Chemistry, New York University, New York, NY 10003, USA
- Louis Petingi
  - Computer Science Department, College of Staten Island, City University of New York, Staten Island, New York, NY 10314, USA
- Tamar Schlick
  - Department of Chemistry, New York University, New York, NY 10003, USA
  - Courant Institute of Mathematical Sciences, New York University, New York, NY 10012, USA
  - NYU-East China Normal University Center for Computational Chemistry, New York University Shanghai, Shanghai 3663, China
|
14
|
Antczak M, Popenda M, Zok T, Zurkowski M, Adamiak RW, Szachniuk M. New algorithms to represent complex pseudoknotted RNA structures in dot-bracket notation. Bioinformatics 2018; 34:1304-1312. [PMID: 29236971 PMCID: PMC5905660 DOI: 10.1093/bioinformatics/btx783] [Citation(s) in RCA: 20] [Impact Index Per Article: 2.9] [Received: 04/11/2017] [Revised: 10/23/2017] [Accepted: 12/08/2017] [Indexed: 11/12/2022]
Abstract
Motivation: Understanding the formation, architecture and roles of pseudoknots in RNA structures is one of the most difficult challenges in RNA computational biology and structural bioinformatics. Methods predicting pseudoknots typically perform this with poor accuracy, often despite experimental data incorporation. Existing bioinformatic approaches differ in how they recognize pseudoknots and reveal their nature. A few ways of pseudoknot classification exist; the most common ones refer to a genus or order. Following the latter, we propose new algorithms that identify pseudoknots in an RNA structure provided in BPSEQ format, determine their order, and encode them in dot-bracket-letter notation. The proposed encoding aims to illustrate the hierarchy of RNA folding. Results: New algorithms are based on dynamic programming and hybrid (combining exhaustive search and random walk) approaches. They evolved from an elementary algorithm implemented within the workflow of RNA FRABASE 1.0, our database of RNA structure fragments. They use different scoring functions to rank dissimilar dot-bracket representations of RNA structure. Computational experiments show an advantage of new methods over the others, especially for large RNA structures. Availability and implementation: Presented algorithms have been implemented as new functionality of RNApdbee webserver and are ready to use at http://rnapdbee.cs.put.poznan.pl. Contact: mszachniuk@cs.put.poznan.pl. Supplementary information: Supplementary data are available at Bioinformatics online.
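To make the idea of encoding pseudoknots with several bracket types concrete, here is a minimal sketch. It is not one of the dynamic programming or hybrid algorithms from the paper and does not reproduce their scoring functions; it is a plain greedy first-fit heuristic that places each base pair on the lowest bracket level where it crosses no pair already on that level. The function names, the bracket alphabet, and the example pair list are assumptions made for illustration; positions are 1-based, as in a BPSEQ listing.

```python
# Bracket alphabet per level: round, square, curly, angle, then letters.
BRACKETS = ["()", "[]", "{}", "<>", "Aa", "Bb", "Cc"]


def crosses(p, q):
    """Return True if base pairs p=(i, j) and q=(k, l) interleave (are non-nested)."""
    (i, j), (k, l) = p, q
    return i < k < j < l or k < i < l < j


def to_dot_bracket(length, pairs):
    """Encode base pairs as a dot-bracket string using greedy level assignment.

    `pairs` holds 1-based (i, j) tuples with i < j. A greedy assignment is a
    simple heuristic and need not give the minimal number of levels.
    """
    levels = []
    for pair in sorted(pairs):
        for level in levels:
            if not any(crosses(pair, other) for other in level):
                level.append(pair)
                break
        else:
            levels.append([pair])     # open a new bracket level
    if len(levels) > len(BRACKETS):
        raise ValueError("structure needs more bracket types than defined")

    out = ["."] * length
    for (open_ch, close_ch), level in zip(BRACKETS, levels):
        for i, j in level:
            out[i - 1] = open_ch
            out[j - 1] = close_ch
    return "".join(out)


# Hypothetical H-type pseudoknot: two helices whose pairs interleave.
print(to_dot_bracket(12, [(1, 7), (2, 6), (4, 11), (5, 10)]))  # ((.[[))..]].
```

Because pairs are processed in order of their 5' position, the helix that opens first always receives round brackets; choosing which helix should count as the pseudoknot is exactly the ranking problem the abstract says the published algorithms address with scoring functions.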
Affiliation(s)
- Maciej Antczak
  - Institute of Computing Science, Poznan University of Technology, Poznan, Poland
- Mariusz Popenda
  - Institute of Bioorganic Chemistry, Polish Academy of Sciences, Poznan, Poland
- Tomasz Zok
  - Institute of Computing Science, Poznan University of Technology, Poznan, Poland
  - Poznan Supercomputing and Networking Center, Poznan, Poland
- Michal Zurkowski
  - Institute of Computing Science, Poznan University of Technology, Poznan, Poland
- Ryszard W Adamiak
  - Institute of Computing Science, Poznan University of Technology, Poznan, Poland
  - Institute of Bioorganic Chemistry, Polish Academy of Sciences, Poznan, Poland
- Marta Szachniuk
  - Institute of Computing Science, Poznan University of Technology, Poznan, Poland
  - Institute of Bioorganic Chemistry, Polish Academy of Sciences, Poznan, Poland
|
15
|
RNA structure prediction: from 2D to 3D. Emerg Top Life Sci 2017; 1:275-285. [DOI: 10.1042/etls20160027] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.4] [Received: 04/16/2017] [Revised: 07/27/2017] [Accepted: 08/10/2017] [Indexed: 11/17/2022]
Abstract
We summarize different levels of RNA structure prediction, from classical 2D structure to extended secondary structure and motif-based research toward 3D structure prediction of RNA. We outline the importance of classical secondary structure during all those levels of structure prediction.
|