1
Bernard C, Postic G, Ghannay S, Tahi F. Has AlphaFold3 achieved success for RNA? Acta Crystallogr D Struct Biol 2025; 81:49-62. [PMID: 39868559 PMCID: PMC11804252 DOI: 10.1107/s2059798325000592] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/18/2024] [Accepted: 01/21/2025] [Indexed: 01/28/2025] Open
Abstract
Predicting the 3D structure of RNA is a significant challenge despite ongoing advancements in the field. Although AlphaFold has successfully addressed this problem for proteins, RNA structure prediction raises difficulties due to the fundamental differences between proteins and RNA, which hinder its direct adaptation. The latest release of AlphaFold, AlphaFold3, has broadened its scope to include multiple different molecules such as DNA, ligands and RNA. While the AlphaFold3 article discussed the results for the last CASP-RNA data set, the scope of its performance and the limitations for RNA are unclear. In this article, we provide a comprehensive analysis of the performance of AlphaFold3 in the prediction of 3D structures of RNA. Through an extensive benchmark over five different test sets, we discuss the performance and limitations of AlphaFold3. We also compare its performance with ten existing state-of-the-art ab initio, template-based and deep-learning approaches. Our results are freely available on the EvryRNA platform at https://evryrna.ibisc.univ-evry.fr/evryrna/alphafold3/.
Affiliation(s)
- Clément Bernard
  - Université Paris-Saclay, Université Evry, IBISC, 91020 Evry-Courcouronnes, France
  - LISN – CNRS/Université Paris-Saclay, 91400 Orsay, France
- Guillaume Postic
  - Université Paris-Saclay, Université Evry, IBISC, 91020 Evry-Courcouronnes, France
- Sahar Ghannay
  - LISN – CNRS/Université Paris-Saclay, 91400 Orsay, France
- Fariza Tahi
  - Université Paris-Saclay, Université Evry, IBISC, 91020 Evry-Courcouronnes, France
2
Szikszai M, Magnus M, Sanghi S, Kadyan S, Bouatta N, Rivas E. RNA3DB: A structurally-dissimilar dataset split for training and benchmarking deep learning models for RNA structure prediction. J Mol Biol 2024; 436:168552. [PMID: 38552946 PMCID: PMC11377173 DOI: 10.1016/j.jmb.2024.168552] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/30/2024] [Revised: 03/19/2024] [Accepted: 03/22/2024] [Indexed: 04/09/2024]
Abstract
With advances in protein structure prediction thanks to deep learning models like AlphaFold, RNA structure prediction has recently received increased attention from deep learning researchers. RNAs introduce substantial challenges due to the sparser availability and lower structural diversity of the experimentally resolved RNA structures in comparison to protein structures. These challenges are often poorly addressed by the existing literature, much of which reports inflated performance due to using training and testing sets with significant structural overlap. Further, the most recent Critical Assessment of Structure Prediction (CASP15) has shown that deep learning models for RNA structure are currently outperformed by traditional methods. In this paper we present RNA3DB, a dataset of structured RNAs, derived from the Protein Data Bank (PDB), that is designed for training and benchmarking deep learning models. The RNA3DB method arranges the RNA 3D chains into distinct groups (Components) that are non-redundant with regard to both sequence and structure, providing a robust way of dividing training, validation, and testing sets. Any split of these structurally-dissimilar Components is guaranteed to produce test and validation sets that are distinct by sequence and structure from those in the training set. We provide the RNA3DB dataset, a particular train/test split of the RNA3DB Components (in an approximate 70/30 ratio) that will be updated periodically. We also provide the RNA3DB methodology along with the source code, with the goal of creating a reproducible and customizable tool for producing structurally-dissimilar dataset splits for structural RNAs.
Affiliation(s)
- Marcell Szikszai
  - Department of Molecular and Cellular Biology, Harvard University, Cambridge, 02138, MA, USA
- Marcin Magnus
  - Department of Molecular and Cellular Biology, Harvard University, Cambridge, 02138, MA, USA
- Siddhant Sanghi
  - Department of Systems Biology, Columbia University, New York 10027, NY, USA; College of Biological Sciences, UC Davis, Davis 95616, CA, USA
- Sachin Kadyan
  - Department of Systems Biology, Columbia University, New York 10027, NY, USA
- Nazim Bouatta
  - Laboratory of Systems Pharmacology, Harvard Medical School, Boston 02115, MA, USA
- Elena Rivas
  - Department of Molecular and Cellular Biology, Harvard University, Cambridge, 02138, MA, USA
3
Bugnon LA, Di Persia L, Gerard M, Raad J, Prochetto S, Fenoy E, Chorostecki U, Ariel F, Stegmayer G, Milone DH. sincFold: end-to-end learning of short- and long-range interactions in RNA secondary structure. Brief Bioinform 2024; 25:bbae271. [PMID: 38855913 PMCID: PMC11163250 DOI: 10.1093/bib/bbae271] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/18/2024] [Revised: 05/03/2024] [Accepted: 05/24/2024] [Indexed: 06/11/2024] Open
Abstract
MOTIVATION: Coding and noncoding RNA molecules participate in many important biological processes. Noncoding RNAs fold into well-defined secondary structures to exert their functions. However, the computational prediction of the secondary structure from a raw RNA sequence is a long-standing unsolved problem, which after decades of almost unchanged performance has now re-emerged due to deep learning. Traditional RNA secondary structure prediction algorithms have been mostly based on thermodynamic models and dynamic programming for free energy minimization. More recently, deep learning methods have shown competitive performance compared with the classical ones, but there is still a wide margin for improvement. RESULTS: In this work we present sincFold, an end-to-end deep learning approach that predicts the nucleotide contact matrix using only the RNA sequence as input. The model is based on 1D and 2D residual neural networks that can learn short- and long-range interaction patterns. We show that structures can be accurately predicted with minimal physical assumptions. Extensive experiments were conducted on several benchmark datasets, considering sequence homology and cross-family validation. sincFold was compared with classical methods and recent deep learning models, showing that it can outperform the state-of-the-art methods.
Affiliation(s)
- Leandro A Bugnon
  - Research Institute for Signals, Systems and Computational Intelligence, sinc(i), FICH-UNL, CONICET, Ciudad Universitaria UNL, 3000, Santa Fe, Argentina
- Leandro Di Persia
  - Research Institute for Signals, Systems and Computational Intelligence, sinc(i), FICH-UNL, CONICET, Ciudad Universitaria UNL, 3000, Santa Fe, Argentina
- Matias Gerard
  - Research Institute for Signals, Systems and Computational Intelligence, sinc(i), FICH-UNL, CONICET, Ciudad Universitaria UNL, 3000, Santa Fe, Argentina
- Jonathan Raad
  - Research Institute for Signals, Systems and Computational Intelligence, sinc(i), FICH-UNL, CONICET, Ciudad Universitaria UNL, 3000, Santa Fe, Argentina
- Santiago Prochetto
  - Research Institute for Signals, Systems and Computational Intelligence, sinc(i), FICH-UNL, CONICET, Ciudad Universitaria UNL, 3000, Santa Fe, Argentina
  - Instituto de Agrobiotecnología del Litoral, CONICET-UNL, CCT-Santa Fe, Ruta Nacional N° 168 Km 0, s/n, Paraje el Pozo, 3000, Santa Fe, Argentina
- Emilio Fenoy
  - Research Institute for Signals, Systems and Computational Intelligence, sinc(i), FICH-UNL, CONICET, Ciudad Universitaria UNL, 3000, Santa Fe, Argentina
- Uciel Chorostecki
  - Faculty of Medicine and Health Sciences, Universitat Internacional de Catalunya, Barcelona, Spain
- Federico Ariel
  - Instituto de Agrobiotecnología del Litoral, CONICET-UNL, CCT-Santa Fe, Ruta Nacional N° 168 Km 0, s/n, Paraje el Pozo, 3000, Santa Fe, Argentina
- Georgina Stegmayer
  - Research Institute for Signals, Systems and Computational Intelligence, sinc(i), FICH-UNL, CONICET, Ciudad Universitaria UNL, 3000, Santa Fe, Argentina
- Diego H Milone
  - Research Institute for Signals, Systems and Computational Intelligence, sinc(i), FICH-UNL, CONICET, Ciudad Universitaria UNL, 3000, Santa Fe, Argentina
4
Jin Z, Sheng J, Hu Y, Zhang Y, Wang X, Huang Y. Shining a spotlight on m6A and the vital role of RNA modification in endometrial cancer: a review. Front Genet 2023; 14:1247309. [PMID: 37886684 PMCID: PMC10598767 DOI: 10.3389/fgene.2023.1247309] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/25/2023] [Accepted: 09/19/2023] [Indexed: 10/28/2023] Open
Abstract
RNA modifications are mostly dynamically reversible post-transcriptional modifications, of which m6A is the most prevalent in eukaryotic mRNAs. A growing number of studies indicate that RNA modification can finely tune gene expression and modulate RNA metabolic homeostasis, which in turn affects the self-renewal, proliferation, apoptosis, migration, and invasion of tumor cells. Endometrial carcinoma (EC) is the most common gynecologic tumor in developed countries. Although it can be diagnosed early and then has a favorable prognosis, some cases progress and become metastatic or recurrent, with a worse prognosis. Fortunately, immunotherapy and targeted therapy are promising methods of treating endometrial cancer patients. Gene modifications may also contribute to these treatments, as is especially the case with recent developments of new targeted therapeutic genes and diagnostic biomarkers for EC, even though current findings on the relationship between RNA modification and EC are still very limited, especially for m6A. For example, what is the elaborate mechanism by which RNA modification affects EC progression? Taking m6A modification as an example, what is the conversion mode of methylation and demethylation for RNAs, and how is selective recognition of specific RNAs achieved? Understanding how they cope with various stimuli as part of in vivo and in vitro biological development, disease or tumor occurrence and development, and other processes is valuable, and RNA modifications provide a distinctive insight into genetic information. The roles of these processes in coping with various stimuli, biological development, disease, or tumor development in vivo and in vitro are self-evident and may become a new direction for cancer in the future. In this review, we summarize the category, characteristics, and therapeutic precis of RNA modification, m6A in particular, with the purpose of seeking the systematic regulation axis related to RNA modification to provide a better solution for the treatment of EC.
Affiliation(s)
- Zujian Jin
  - Department of Gynecology and Obstetrics, The Fourth Affiliated Hospital, Zhejiang Provincial Clinical Research Center for Obstetrics and Gynecology, Zhejiang University School of Medicine, Yiwu, Zhejiang, China
- Jingjing Sheng
  - Department of Gynecology and Obstetrics, The Fourth Affiliated Hospital, Zhejiang Provincial Clinical Research Center for Obstetrics and Gynecology, Zhejiang University School of Medicine, Yiwu, Zhejiang, China
- Yingying Hu
  - Department of Gynecology and Obstetrics, The Fourth Affiliated Hospital, Zhejiang Provincial Clinical Research Center for Obstetrics and Gynecology, Zhejiang University School of Medicine, Yiwu, Zhejiang, China
- Yu Zhang
  - Department of Gynecology and Obstetrics, The Fourth Affiliated Hospital, Zhejiang Provincial Clinical Research Center for Obstetrics and Gynecology, Zhejiang University School of Medicine, Yiwu, Zhejiang, China
- Xiaoxia Wang
  - Reproductive Medicine Center, School of Medicine, The Fourth Affiliated Hospital, Zhejiang University, Yiwu, Zhejiang, China
- Yiping Huang
  - Department of Gynecology and Obstetrics, The Fourth Affiliated Hospital, Zhejiang Provincial Clinical Research Center for Obstetrics and Gynecology, Zhejiang University School of Medicine, Yiwu, Zhejiang, China
5
Tan YL, Wang X, Yu S, Zhang B, Tan ZJ. cgRNASP: coarse-grained statistical potentials with residue separation for RNA structure evaluation. NAR Genom Bioinform 2023; 5:lqad016. [PMID: 36879898 PMCID: PMC9985339 DOI: 10.1093/nargab/lqad016] [Citation(s) in RCA: 8] [Impact Index Per Article: 4.0] [Received: 09/21/2022] [Revised: 01/21/2023] [Accepted: 02/03/2023] [Indexed: 03/07/2023] Open
Abstract
Knowledge-based statistical potentials are very important for RNA 3-dimensional (3D) structure prediction and evaluation. In recent years, various coarse-grained (CG) and all-atom models have been developed for predicting RNA 3D structures, while there is still a lack of reliable CG statistical potentials not only for CG structure evaluation but also for all-atom structure evaluation at high efficiency. In this work, we have developed a series of residue-separation-based CG statistical potentials at different CG levels for RNA 3D structure evaluation, namely cgRNASP, which is composed of long-ranged and short-ranged interactions by residue separation. Compared with the newly developed all-atom rsRNASP, the short-ranged interaction in cgRNASP was involved more subtly and completely. Our examinations show that the performance of cgRNASP varies with CG level and that, compared with rsRNASP, cgRNASP has similarly good performance for a wide range of test datasets and can have slightly better performance for the realistic RNA-Puzzles dataset. Furthermore, cgRNASP is strikingly more efficient than all-atom statistical potentials/scoring functions, and can be apparently superior to other all-atom statistical potentials and scoring functions trained from neural networks for the RNA-Puzzles dataset. cgRNASP is available at https://github.com/Tan-group/cgRNASP.
Affiliation(s)
- Ya-Lan Tan
  - Research Center of Nonlinear Science, School of Mathematical and Physical Sciences, Wuhan Textile University, Wuhan 430073, China; Department of Physics and Key Laboratory of Artificial Micro & Nano-structures of Education, School of Physics and Technology, Wuhan University, Wuhan 430072, China
- Xunxun Wang
  - Department of Physics and Key Laboratory of Artificial Micro & Nano-structures of Education, School of Physics and Technology, Wuhan University, Wuhan 430072, China
- Shixiong Yu
  - Department of Physics and Key Laboratory of Artificial Micro & Nano-structures of Education, School of Physics and Technology, Wuhan University, Wuhan 430072, China
- Bengong Zhang
  - Research Center of Nonlinear Science, School of Mathematical and Physical Sciences, Wuhan Textile University, Wuhan 430073, China
- Zhi-Jie Tan
  - Department of Physics and Key Laboratory of Artificial Micro & Nano-structures of Education, School of Physics and Technology, Wuhan University, Wuhan 430072, China
6
Sato R, Suzuki K, Yasuda Y, Suenaga A, Fukui K. RNAapt3D: RNA aptamer 3D-structural modeling database. Biophys J 2022; 121:4770-4776. [PMID: 36146935 PMCID: PMC9808543 DOI: 10.1016/j.bpj.2022.09.023] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 04/28/2022] [Revised: 08/17/2022] [Accepted: 09/20/2022] [Indexed: 01/07/2023] Open
Abstract
RNA aptamers are oligonucleotides with high binding affinity and specificity for target molecules and are expected to be a new generation of therapeutic molecules and targeted delivery materials. The tertiary structure of RNA molecules and RNA-protein interaction sites are increasingly important as potential targets for new drugs. The pathological mechanisms of diseases must be understood in detail to guide drug design. In developing RNA aptamers as drugs, information about the interaction mechanisms and structures of RNA aptamer-target protein complexes is useful. We constructed a database, RNA aptamer 3D-structural modeling (RNAapt3D), consisting of data on RNA aptamers that are potential drug candidates. The database includes RNA sequences and computationally predicted RNA tertiary structures based on secondary structures and implements methods that can be used to predict unknown structures of RNA aptamer-target molecule complexes. RNAapt3D should enable the design of RNA aptamers for target molecules and improve the efficiency and productivity of candidate drug selection. RNAapt3D can be accessed at https://rnaapt3d.medals.jp.
Affiliation(s)
- Ryuma Sato
  - Cellular and Molecular Biotechnology Research Institute, National Institute of Advanced Industrial Science and Technology (AIST), Tokyo, Japan
- Koji Suzuki
  - Cellular and Molecular Biotechnology Research Institute, National Institute of Advanced Industrial Science and Technology (AIST), Tokyo, Japan
- Yuichi Yasuda
  - College of Humanities and Science, Department of Biosciences, Nihon University, Tokyo, Japan
- Atsushi Suenaga
  - College of Humanities and Science, Department of Biosciences, Nihon University, Tokyo, Japan
- Kazuhiko Fukui
  - Cellular and Molecular Biotechnology Research Institute, National Institute of Advanced Industrial Science and Technology (AIST), Tokyo, Japan
7
Moniot A, Guermeur Y, de Vries SJ, Chauvot de Beauchene I. ProtNAff: protein-bound Nucleic Acid filters and fragment libraries. Bioinformatics 2022; 38:3911-3917. [PMID: 35775902 DOI: 10.1093/bioinformatics/btac430] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 11/16/2021] [Revised: 04/25/2022] [Accepted: 06/28/2022] [Indexed: 12/24/2022] Open
Abstract
MOTIVATION: Atomistic models of nucleic acid (NA) fragments can be used to model the 3D structures of specific protein-NA interactions and address the problem of great NA flexibility, especially in their single-stranded regions. One way to obtain relevant NA fragments is to extract them from existing 3D structures corresponding to the targeted context (e.g. specific 2D structures, protein families, sequences) and to learn from them. Several databases exist for specific NA 3D motifs, especially in RNA, but none can handle the variety of possible contexts. RESULTS: This article presents protNAff (protein-bound Nucleic Acids filters and fragments), a new pipeline for the conception of searchable databases on the 2D and 3D structures of protein-bound NA, the selection of context-specific (regions of) NA structures by combinations of filters, and the creation of context-specific NA fragment libraries. The strength of this pipeline is its modularity, allowing users to adapt it to many specific modeling problems. As examples, the pipeline is applied to the quantitative analysis of (i) the sequence-specificity of trinucleotide conformations, (ii) the conformational diversity of RNA at several levels of resolution, (iii) the effect of protein binding on RNA local conformations and (iv) the protein-binding propensity of RNA hairpin loops of various lengths. AVAILABILITY AND IMPLEMENTATION: The source code is freely available for download at https://github.com/isaureCdB/protNAff. The database and the trinucleotide fragment library are downloadable at https://zenodo.org/record/6483823#.YmbVhFxByV4. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Antoine Moniot
  - LORIA (CNRS - INRIA - Université de Lorraine), Nancy 54000, France
- Yann Guermeur
  - LORIA (CNRS - INRIA - Université de Lorraine), Nancy 54000, France
- Sjoerd Jacob de Vries
  - Ressource Parisienne en Bioinformatique Structurale (RPBS), Paris 75013, France; BFA, CNRS UMR 8251, INSERM ERL U1133, Paris 75013, France
8
Szikszai M, Wise M, Datta A, Ward M, Mathews DH. Deep learning models for RNA secondary structure prediction (probably) do not generalize across families. Bioinformatics 2022; 38:3892-3899. [PMID: 35748706 PMCID: PMC9364374 DOI: 10.1093/bioinformatics/btac415] [Citation(s) in RCA: 28] [Impact Index Per Article: 9.3] [Received: 03/28/2022] [Revised: 06/09/2022] [Accepted: 06/21/2022] [Indexed: 12/24/2022] Open
Abstract
MOTIVATION: The secondary structure of RNA is of importance to its function. Over the last few years, several papers attempted to use machine learning to improve de novo RNA secondary structure prediction. Many of these papers report impressive results for intra-family predictions but seldom address the much more difficult (and practical) inter-family problem. RESULTS: We demonstrate that it is nearly trivial with convolutional neural networks to generate pseudo-free energy changes, modelled after structure mapping data, that improve the accuracy of structure prediction for intra-family cases. We propose a more rigorous method for inter-family cross-validation that can be used to assess the performance of learning-based models. Using this method, we further demonstrate that intra-family performance is insufficient proof of generalization, despite the widespread assumption in the literature, and provide strong evidence that many existing learning-based models have not generalized inter-family. AVAILABILITY AND IMPLEMENTATION: Source code and data are available at https://github.com/marcellszi/dl-rna. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Marcell Szikszai
  - Department of Computer Science & Software Engineering, The University of Western Australia, Perth, WA 6009, Australia
- Michael Wise
  - Department of Computer Science & Software Engineering, The University of Western Australia, Perth, WA 6009, Australia
  - The Marshall Centre for Infectious Diseases Research and Training, The University of Western Australia, Perth, WA 6009, Australia
- Amitava Datta
  - Department of Computer Science & Software Engineering, The University of Western Australia, Perth, WA 6009, Australia
- Max Ward
  - Department of Computer Science & Software Engineering, The University of Western Australia, Perth, WA 6009, Australia
  - Department of Molecular and Cellular Biology, Harvard University, Cambridge, MA 02138, USA
- David H Mathews
  - Department of Biochemistry & Biophysics, Center for RNA Biology, and Department of Biostatistics & Computational Biology, University of Rochester, Rochester, NY 14642, USA
9
Wiedemann J, Kaczor J, Milostan M, Zok T, Blazewicz J, Szachniuk M, Antczak M. RNAloops: a database of RNA multiloops. Bioinformatics 2022; 38:4200-4205. [PMID: 35809063 PMCID: PMC9438955 DOI: 10.1093/bioinformatics/btac484] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 04/05/2022] [Revised: 06/26/2022] [Accepted: 07/06/2022] [Indexed: 12/24/2022] Open
Abstract
MOTIVATION: Knowledge of the 3D structure of RNA supports discovering its functions and is crucial for designing drugs and modern therapeutic solutions. Thus, much attention is devoted to experimental determination and computational prediction targeting the global fold of RNA and its local substructures. The latter include multi-branched loops, functionally significant elements that highly affect the spatial shape of the entire molecule. Unfortunately, their computational modeling constitutes a weak point of structural bioinformatics. A remedy for this is in collecting these motifs and analyzing their features. RESULTS: RNAloops is a self-updating database that stores multi-branched loops identified in the PDB-deposited RNA structures. A description of each loop includes angular data (planar and Euler angles computed between pairs of adjacent helices) to allow studying their mutual arrangement in space. The system enables search and analysis of multiloops, presents their structure details numerically and visually, and computes data statistics. AVAILABILITY AND IMPLEMENTATION: RNAloops is freely accessible at https://rnaloops.cs.put.poznan.pl. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Jakub Wiedemann
  - Institute of Computing Science, Poznan University of Technology, 60-965 Poznan, Poland
- Jacek Kaczor
  - Institute of Computing Science, Poznan University of Technology, 60-965 Poznan, Poland
- Maciej Milostan
  - Institute of Computing Science, Poznan University of Technology, 60-965 Poznan, Poland; Poznan Supercomputing and Networking Center, 61-131 Poznan, Poland
- Tomasz Zok
  - Institute of Computing Science, Poznan University of Technology, 60-965 Poznan, Poland; Poznan Supercomputing and Networking Center, 61-131 Poznan, Poland
- Jacek Blazewicz
  - Institute of Computing Science, Poznan University of Technology, 60-965 Poznan, Poland; Institute of Bioorganic Chemistry, Polish Academy of Sciences, 61-704 Poznan, Poland
10
Adamczyk B, Antczak M, Szachniuk M. RNAsolo: a repository of cleaned PDB-derived RNA 3D structures. Bioinformatics 2022; 38:3668-3670. [PMID: 35674373 PMCID: PMC9272803 DOI: 10.1093/bioinformatics/btac386] [Citation(s) in RCA: 21] [Impact Index Per Article: 7.0] [Received: 01/12/2022] [Revised: 04/22/2022] [Accepted: 06/02/2022] [Indexed: 11/14/2022] Open
Abstract
MOTIVATION: The development of algorithms dedicated to RNA 3D structures contributes to the demand for training, testing, and benchmarking data. A reliable source of such data derived from computational prediction is the RNA-Puzzles repository. In contrast, the largest resource with experimentally determined structures is the Protein Data Bank. However, files in this archive often contain other molecular data in addition to the RNA structure itself, which, to be used by RNA processing algorithms, should be removed. RESULTS: RNAsolo is a self-updating database dedicated to RNA bioinformatics. It systematically collects experimentally determined RNA 3D structures stored in the PDB, cleans them from non-RNA chains, and groups them into equivalence classes. It allows users to download various subsets of data (clustered by resolution, source, data format, etc.) for further processing and analysis with a single click. AVAILABILITY: The repository is publicly available at https://rnasolo.cs.put.poznan.pl.
Affiliation(s)
- Bartosz Adamczyk
  - Institute of Computing Science, Poznan University of Technology, Piotrowo 2, Poznan, 60-965, Poland
- Maciej Antczak
  - Institute of Computing Science, Poznan University of Technology, Piotrowo 2, Poznan, 60-965, Poland; Institute of Bioorganic Chemistry, Polish Academy of Sciences, Noskowskiego 12/14, Poznan, 61-704, Poland
- Marta Szachniuk
  - Institute of Computing Science, Poznan University of Technology, Piotrowo 2, Poznan, 60-965, Poland; Institute of Bioorganic Chemistry, Polish Academy of Sciences, Noskowskiego 12/14, Poznan, 61-704, Poland