1
Gong T, Ju F, Bu D. Accurate prediction of RNA secondary structure including pseudoknots through solving minimum-cost flow with learned potentials. Commun Biol 2024; 7:297. [PMID: 38461362 PMCID: PMC10924946 DOI: 10.1038/s42003-024-05952-w] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/07/2023] [Accepted: 02/21/2024] [Indexed: 03/11/2024]
Abstract
Pseudoknots are key structure motifs of RNA and pseudoknotted RNAs play important roles in a variety of biological processes. Here, we present KnotFold, an accurate approach to the prediction of RNA secondary structure including pseudoknots. The key elements of KnotFold include a learned potential function and a minimum-cost flow algorithm to find the secondary structure with the lowest potential. KnotFold learns the potential from the RNAs with known structures using an attention-based neural network, thus avoiding the inaccuracy of hand-crafted energy functions. The specially designed minimum-cost flow algorithm used by KnotFold considers all possible combinations of base pairs and selects from them the optimal combination. The algorithm breaks the restriction of nested base pairs required by the widely used dynamic programming algorithms, thus enabling the identification of pseudoknots. Using 1,009 pseudoknotted RNAs as representatives, we demonstrate the successful application of KnotFold in predicting RNA secondary structures including pseudoknots with accuracy higher than the state-of-the-art approaches. We anticipate that KnotFold, with its superior accuracy, will greatly facilitate the understanding of RNA structures and functionalities.
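The "restriction of nested base pairs" mentioned in this abstract can be made concrete with a small check. The sketch below is not KnotFold's algorithm; it only illustrates, under the usual definition, that two base pairs (i, j) and (k, l) cross when i < k < j < l, and that a structure containing such a crossing is pseudoknotted. The function names and the toy structures are hypothetical.

```python
from itertools import combinations


def crossing_pairs(pairs):
    """Return every couple of base pairs (i, j), (k, l) with i < k < j < l."""
    # Normalise each pair so the smaller index comes first, then sort by it.
    normalized = sorted(tuple(sorted(p)) for p in pairs)
    crossings = []
    for (i, j), (k, l) in combinations(normalized, 2):
        # Sorting guarantees i <= k, so a crossing needs k inside (i, j)
        # and l outside it. Nested pairs have l < j instead.
        if i < k < j < l:
            crossings.append(((i, j), (k, l)))
    return crossings


def has_pseudoknot(pairs):
    """A structure is pseudoknotted if at least two base pairs cross."""
    return bool(crossing_pairs(pairs))


# Hypothetical toy structures (0-based positions), not taken from the paper.
nested = [(0, 9), (1, 8), (2, 7)]
h_type = [(0, 6), (1, 5), (3, 10), (4, 9)]

print(has_pseudoknot(nested))
print(has_pseudoknot(h_type))
```

Dynamic programming methods that assume nested pairs can only ever return structures like `nested`; a structure like `h_type` lies outside their search space.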
Affiliation(s)
- Tiansu Gong
  - Key Lab of Intelligent Information Processing, Institute of Computing Technology, Chinese Academy of Sciences, 100190, Beijing, China
  - University of Chinese Academy of Sciences, 100190, Beijing, China
- Fusong Ju
  - Key Lab of Intelligent Information Processing, Institute of Computing Technology, Chinese Academy of Sciences, 100190, Beijing, China
  - University of Chinese Academy of Sciences, 100190, Beijing, China
- Dongbo Bu
  - Key Lab of Intelligent Information Processing, Institute of Computing Technology, Chinese Academy of Sciences, 100190, Beijing, China
  - University of Chinese Academy of Sciences, 100190, Beijing, China
  - Central China Artificial Intelligence Research Institute, Henan Academy of Sciences, Zhengzhou, 450046, Henan, China
2
Tieng FYF, Abdullah-Zawawi MR, Md Shahri NAA, Mohamed-Hussein ZA, Lee LH, Mutalib NSA. A Hitchhiker's guide to RNA-RNA structure and interaction prediction tools. Brief Bioinform 2023; 25:bbad421. [PMID: 38040490 PMCID: PMC10753535 DOI: 10.1093/bib/bbad421] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 07/31/2023] [Revised: 10/16/2023] [Accepted: 10/26/2023] [Indexed: 12/03/2023]
Abstract
RNA biology has risen to prominence after a remarkable discovery of diverse functions of noncoding RNA (ncRNA). Most untranslated transcripts often exert their regulatory functions into RNA-RNA complexes via base pairing with complementary sequences in other RNAs. An interplay between RNAs is essential, as it possesses various functional roles in human cells, including genetic translation, RNA splicing, editing, ribosomal RNA maturation, RNA degradation and the regulation of metabolic pathways/riboswitches. Moreover, the pervasive transcription of the human genome allows for the discovery of novel genomic functions via RNA interactome investigation. The advancement of experimental procedures has resulted in an explosion of documented data, necessitating the development of efficient and precise computational tools and algorithms. This review provides an extensive update on RNA-RNA interaction (RRI) analysis via thermodynamic- and comparative-based RNA secondary structure prediction (RSP) and RNA-RNA interaction prediction (RIP) tools and their general functions. We also highlighted the current knowledge of RRIs and the limitations of RNA interactome mapping via experimental data. Then, the gap between RSP and RIP, the importance of RNA homologues, the relationship between pseudoknots, and RNA folding thermodynamics are discussed. It is hoped that these emerging prediction tools will deepen the understanding of RNA-associated interactions in human diseases and hasten treatment processes.
Affiliation(s)
- Francis Yew Fu Tieng
  - UKM Medical Molecular Biology Institute (UMBI), Universiti Kebangsaan Malaysia (UKM), Kuala Lumpur 56000, Malaysia
- Nur Alyaa Afifah Md Shahri
  - UKM Medical Molecular Biology Institute (UMBI), Universiti Kebangsaan Malaysia (UKM), Kuala Lumpur 56000, Malaysia
- Zeti-Azura Mohamed-Hussein
  - Institute of Systems Biology (INBIOSIS), UKM, Selangor 43600, Malaysia
  - Department of Applied Physics, Faculty of Science and Technology, UKM, Selangor 43600, Malaysia
- Learn-Han Lee
  - Sunway Microbiomics Centre, School of Medical and Life Sciences, Sunway University, Sunway City 47500, Malaysia
  - Novel Bacteria and Drug Discovery Research Group, Microbiome and Bioresource Research Strength, Jeffrey Cheah School of Medicine and Health Sciences, Monash University of Malaysia, Selangor 47500, Malaysia
- Nurul-Syakima Ab Mutalib
  - UKM Medical Molecular Biology Institute (UMBI), Universiti Kebangsaan Malaysia (UKM), Kuala Lumpur 56000, Malaysia
  - Novel Bacteria and Drug Discovery Research Group, Microbiome and Bioresource Research Strength, Jeffrey Cheah School of Medicine and Health Sciences, Monash University of Malaysia, Selangor 47500, Malaysia
  - Faculty of Health Sciences, UKM, Kuala Lumpur 50300, Malaysia
3
Sato K, Kato Y. Prediction of RNA secondary structure including pseudoknots for long sequences. Brief Bioinform 2021; 23:6380459. [PMID: 34601552 PMCID: PMC8769711 DOI: 10.1093/bib/bbab395] [Citation(s) in RCA: 24] [Impact Index Per Article: 6.0] [Received: 06/16/2021] [Revised: 08/13/2021] [Accepted: 08/30/2021] [Indexed: 12/28/2022]
Abstract
RNA structural elements called pseudoknots are involved in various biological phenomena including ribosomal frameshifts. Because it is infeasible to construct an efficiently computable secondary structure model including pseudoknots, secondary structure prediction methods considering pseudoknots are not yet widely available. We developed IPknot, which uses heuristics to speed up computations, but it has remained difficult to apply it to long sequences, such as messenger RNA and viral RNA, because it requires cubic computational time with respect to sequence length and has threshold parameters that need to be manually adjusted. Here, we propose an improvement of IPknot that enables calculation in linear time by employing the LinearPartition model and automatically selects the optimal threshold parameters based on the pseudo-expected accuracy. In addition, IPknot showed favorable prediction accuracy across a wide range of conditions in our exhaustive benchmarking, not only for single sequences but also for multiple alignments.
Affiliation(s)
- Kengo Sato
  - Department of Biosciences and Informatics, Keio University, 3-14-1 Hiyoshi, Kohoku-ku, Yokohama 223-8522, Japan
- Yuki Kato
  - Department of RNA Biology and Neuroscience, Graduate School of Medicine, Osaka University, Suita, Osaka 565-0871, Japan
  - Integrated Frontier Research for Medical Science Division, Institute for Open and Transdisciplinary Research Initiatives, Osaka University, Suita, Osaka 565-0871, Japan
4
Islam MR, Islam MS, Sakeef N. RNA Secondary Structure Prediction with Pseudoknots Using Chemical Reaction Optimization Algorithm. IEEE/ACM Transactions on Computational Biology and Bioinformatics 2021; 18:1195-1207. [PMID: 31443047 DOI: 10.1109/tcbb.2019.2936570] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Indexed: 06/10/2023]
Abstract
RNA molecules play a significant role in cell function especially including pseudoknots. In past decades, several methods have been developed to predict RNA secondary structure with pseudoknots and the most popular one uses minimum free energy. It is a nondeterministic polynomial-time hard (NP-hard) problem. We have proposed an approach based on a metaheuristic algorithm named Chemical Reaction Optimization (CRO) to solve the RNA pseudoknotted structure prediction problem. The reaction operators of CRO algorithm have been redesigned and used on the generated population to find the structure with the minimum free energy. Besides, we have developed an additional operator called Repair operator which has a great influence on our algorithm in increasing accuracy. It helps to increase the true positive base pairs while decreasing the false positive and false negative base pairs. Four energy models have been applied to calculate the energy. To evaluate the performance, we have used four datasets containing RNA pseudoknotted sequences taken from the RNA STRAND and Pseudobase++ database. We have compared the proposed approach with some existing algorithms and shown that our CRO based model is a better prediction method in terms of accuracy and speed.
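The true positive, false positive and false negative base pairs mentioned in this abstract are the usual ingredients of base-pair accuracy scores. The following is a minimal sketch of how such counts are commonly turned into sensitivity, positive predictive value (PPV) and F1; it is not the evaluation code of the cited paper, and the function name and toy structures are hypothetical.

```python
def base_pair_metrics(predicted, reference):
    """Compare two sets of base pairs and return counts and accuracy scores."""
    # Treat (i, j) and (j, i) as the same base pair.
    pred = {tuple(sorted(p)) for p in predicted}
    ref = {tuple(sorted(p)) for p in reference}

    tp = len(pred & ref)   # predicted pairs that are in the reference
    fp = len(pred - ref)   # predicted pairs that are not in the reference
    fn = len(ref - pred)   # reference pairs that were missed

    sensitivity = tp / (tp + fn) if tp + fn else 0.0
    ppv = tp / (tp + fp) if tp + fp else 0.0
    denom = sensitivity + ppv
    f1 = 2 * sensitivity * ppv / denom if denom else 0.0
    return {"tp": tp, "fp": fp, "fn": fn,
            "sensitivity": sensitivity, "ppv": ppv, "f1": f1}


# Hypothetical toy example: four reference pairs, three predicted pairs.
reference = [(0, 9), (1, 8), (2, 7), (3, 6)]
predicted = [(0, 9), (1, 8), (2, 6)]
scores = base_pair_metrics(predicted, reference)
print(scores)
```

An operator that raises true positives while lowering false positives and false negatives, as the abstract describes for the Repair operator, would raise all three scores.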
5
Liu Y, Zhao Q, Zhang H, Xu R, Li Y, Wei L. A New Method to Predict RNA Secondary Structure Based on RNA Folding Simulation. IEEE/ACM Transactions on Computational Biology and Bioinformatics 2016; 13:990-995. [PMID: 26552091 DOI: 10.1109/tcbb.2015.2496347] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.0] [Indexed: 06/05/2023]
Abstract
RNA plays an important role in various biological processes; hence, it is essential when determining the functions of RNA to research its secondary structures. So far, the accuracy of RNA secondary structure prediction remains an area in need of improvement. This paper presents a novel method for predicting RNA secondary structure based on an RNA folding simulation model. This model assumes that the process of RNA folding from the random coil state to full structure is staged and in every stage of folding, the final state of an RNA is determined by the optimal combination of helical regions, which are urgently essential to dynamics of RNA formation. This paper proposes the First Large Free Energy Difference (FLED) in order to find the helical regions most urgently needed for optimal final state formation among all the possible helical regions. Tests on the datasets with known structures from public databases demonstrate that our method can outperform other current RNA secondary structure prediction methods in terms of prediction accuracy.
6
Abstract
RNA secondary structure is often predicted using folding thermodynamics. RNAstructure is a software package that includes structure prediction by free energy minimization, prediction of base pairing probabilities, prediction of structures composed of highly probable base pairs, and prediction of structures with pseudoknots. A user-friendly graphical user interface is provided, and this interface works on Windows, Apple OS X, and Linux. This chapter provides protocols for using RNAstructure for structure prediction.
7
Lim NCH, Jackson SE. Molecular knots in biology and chemistry. Journal of Physics. Condensed Matter: An Institute of Physics Journal 2015; 27:354101. [PMID: 26291690 DOI: 10.1088/0953-8984/27/35/354101] [Citation(s) in RCA: 109] [Impact Index Per Article: 10.9] [Indexed: 06/04/2023]
Abstract
Knots and entanglements are ubiquitous. Beyond their aesthetic appeal, these fascinating topological entities can be either useful or cumbersome. In recent decades, the importance and prevalence of molecular knots have been increasingly recognised by scientists from different disciplines. In this review, we provide an overview on the various molecular knots found in naturally occurring biological systems (DNA, RNA and proteins), and those created by synthetic chemists. We discuss the current knowledge in these fields, including recent developments in experimental and, in some cases, computational studies which are beginning to shed light into the complex interplay between the structure, formation and properties of these topologically intricate molecules.
Affiliation(s)
- Nicole C H Lim
  - Department of Chemistry, University of Cambridge, Lensfield Road, Cambridge CB2 1EW, UK
  - Faculty of Sciences, Universiti Brunei Darussalam, Gadong BE 1410, Brunei Darussalam
8
Li H, Zhu D, Zhang C, Han H, Crandall KA. Characteristics and prediction of RNA structure. BioMed Research International 2014; 2014:690340. [PMID: 25110687 PMCID: PMC4109605 DOI: 10.1155/2014/690340] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Received: 02/28/2014] [Accepted: 06/11/2014] [Indexed: 11/18/2022]
Abstract
RNA secondary structures with pseudoknots are often predicted by minimizing free energy, which is NP-hard. Most RNAs fold during transcription from DNA into RNA through a hierarchical pathway wherein secondary structures form prior to tertiary structures. Real RNA secondary structures often have local instead of global optimization because of kinetic reasons. The performance of RNA structure prediction may be improved by considering dynamic and hierarchical folding mechanisms. This study is a novel report on RNA folding that accords with the golden mean characteristic based on the statistical analysis of the real RNA secondary structures of all 480 sequences from RNA STRAND, which are validated by NMR or X-ray. The length ratios of domains in these sequences are approximately 0.382L, 0.5L, 0.618L, and L, where L is the sequence length. These points are just the important golden sections of sequence. With this characteristic, an algorithm is designed to predict RNA hierarchical structures and simulate RNA folding by dynamically folding RNA structures according to the above golden section points. The sensitivity and number of predicted pseudoknots of our algorithm are better than those of the Mfold, HotKnots, McQfold, ProbKnot, and Lhw-Zhu algorithms. Experimental results reflect the folding rules of RNA from a new angle that is close to natural folding.
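The golden section points named in this abstract (0.382L, 0.5L, 0.618L and L) are simple to compute for a given sequence length. The sketch below only shows that arithmetic; rounding to the nearest integer position is an assumption made here for illustration, and the function name is hypothetical. It does not reproduce the folding algorithm of the cited study.

```python
# Length ratios quoted in the abstract; 0.382 and 0.618 are the golden
# section ratios, 0.5 is the midpoint and 1.0 is the full sequence.
GOLDEN_RATIOS = (0.382, 0.5, 0.618, 1.0)


def golden_section_points(length):
    """Return the candidate domain boundaries for a sequence of `length` nt."""
    if length <= 0:
        raise ValueError("sequence length must be positive")
    # Rounding to whole nucleotide positions is an assumption of this sketch.
    return [round(ratio * length) for ratio in GOLDEN_RATIOS]


points = golden_section_points(100)
print(points)
```

For a 100-nucleotide sequence the candidate boundaries fall at positions 38, 50, 62 and 100.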
Affiliation(s)
- Hengwu Li
  - School of Computer Science and Technology, Shandong Provincial Key Laboratory of Digital Media Technology, Shandong University of Finance and Economics, Jinan 250014, China
  - Computational Biology Institute, George Washington University, Ashburn, VA 20147, USA
- Daming Zhu
  - School of Computer Science and Technology, Shandong Provincial Key Laboratory of Software Engineering, Shandong University, Jinan 250101, China
- Caiming Zhang
  - School of Computer Science and Technology, Shandong Provincial Key Laboratory of Software Engineering, Shandong University, Jinan 250101, China
- Huijian Han
  - School of Computer Science and Technology, Shandong Provincial Key Laboratory of Digital Media Technology, Shandong University of Finance and Economics, Jinan 250014, China
- Keith A. Crandall
  - Computational Biology Institute, George Washington University, Ashburn, VA 20147, USA
9
Chen J, Gong S, Wang Y, Zhang W. Kinetic partitioning mechanism of HDV ribozyme folding. J Chem Phys 2014; 140:025102. [DOI: 10.1063/1.4861037] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.4] [Indexed: 01/15/2023]
10
Large-scale study of long non-coding RNA functions based on structure and expression features. Science China Life Sciences 2013; 56:953-9. [PMID: 24091687 DOI: 10.1007/s11427-013-4556-3] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.7] [Received: 08/14/2013] [Accepted: 09/02/2013] [Indexed: 02/01/2023]
Abstract
Mammals and other complex organisms can transcribe an abundance of long non-coding RNAs (lncRNAs) that fulfill a wide variety of regulatory roles in many biological processes. These roles, including as scaffolds and as guides for protein-coding genes, mainly depend on the structure and expression level of lncRNAs. In this review, we focus on the current methods for analyzing lncRNA structure and expression, which is basic but necessary information for in-depth, large-scale analysis of lncRNA functions.
11
Doose G, Metzler D. Bayesian sampling of evolutionarily conserved RNA secondary structures with pseudoknots. Bioinformatics 2012; 28:2242-8. [PMID: 22796961 DOI: 10.1093/bioinformatics/bts369] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.5] [Indexed: 12/21/2022]
Affiliation(s)
- Gero Doose
  - Department of Biology, LMU Biocenter, Ludwig-Maximilians-Universität München, Planegg-Martinsried, Germany
12
Zhang J, Bian Y, Lin H, Wang W. RNA fragment modeling with a nucleobase discrete-state model. Physical Review. E, Statistical, Nonlinear, and Soft Matter Physics 2012; 85:021909. [PMID: 22463246 DOI: 10.1103/physreve.85.021909] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.2] [Received: 07/01/2011] [Revised: 12/30/2011] [Indexed: 05/24/2023]
Abstract
In this work we develop an approach for predicting the tertiary structures of RNA fragments by combining an RNA nucleobase discrete state (RNAnbds) model, a sequential Monte Carlo method, and a statistical potential. The RNAnbds model is designed for optimizing the configuration of nucleobases with respect to their preceding ones along the sequence and their spatial neighbors, in contrast to previous works that focus on RNA backbones. The tests of our approach with the fragments taken from a small RNA pseudoknot and a 23S ribosome RNA show that for short fragments (<10 nucleotides), the root mean square deviations (RMSDs) between the predicted and the experimental ones are generally smaller than 3 Å; for slightly longer fragments (10-15 nucleotides), most RMSDs are smaller than 4 Å. The comparison of our method with another physics-based predictor with a testing set containing nine loops shows that ours is superior in both accuracy and efficiency. Our approach is useful in facilitating RNA three-dimensional structure prediction as well as loop modeling. It also holds the promise of providing insight into the structural ensembles of RNA loops.
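The root mean square deviation (RMSD) used in this abstract to compare predicted and experimental fragments can be written in a few lines. The sketch below assumes the two coordinate sets are already superimposed and list the same atoms in the same order; real comparisons normally perform an optimal superposition first, which is omitted here. The function name and coordinates are hypothetical.

```python
import math


def rmsd(coords_a, coords_b):
    """RMSD between two equally long lists of (x, y, z) coordinates.

    No superposition is performed: both structures are assumed to be
    in the same frame already.
    """
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("coordinate lists must be non-empty and equally long")
    squared = sum(
        (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
        for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b)
    )
    return math.sqrt(squared / len(coords_a))


# Hypothetical two-atom example: every atom is displaced by 3 units along z,
# so the RMSD is exactly 3 (in the same length unit as the coordinates).
predicted = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
experimental = [(0.0, 0.0, 3.0), (1.0, 0.0, 3.0)]
print(rmsd(predicted, experimental))
```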
Affiliation(s)
- Jian Zhang
  - National Laboratory of Solid State Microstructure and School of Business, Nanjing University, China
13
Sato K, Kato Y, Hamada M, Akutsu T, Asai K. IPknot: fast and accurate prediction of RNA secondary structures with pseudoknots using integer programming. Bioinformatics 2011; 27:i85-93. [PMID: 21685106 PMCID: PMC3117384 DOI: 10.1093/bioinformatics/btr215] [Citation(s) in RCA: 181] [Impact Index Per Article: 12.9] [Indexed: 12/20/2022]
Abstract
MOTIVATION: Pseudoknots found in secondary structures of a number of functional RNAs play various roles in biological processes. Recent methods for predicting RNA secondary structures cover certain classes of pseudoknotted structures, but only a few of them achieve satisfying predictions in terms of both speed and accuracy. RESULTS: We propose IPknot, a novel computational method for predicting RNA secondary structures with pseudoknots based on maximizing expected accuracy of a predicted structure. IPknot decomposes a pseudoknotted structure into a set of pseudoknot-free substructures and approximates a base-pairing probability distribution that considers pseudoknots, leading to the capability of modeling a wide class of pseudoknots and running quite fast. In addition, we propose a heuristic algorithm for refining base-pairing probabilities to improve the prediction accuracy of IPknot. The problem of maximizing expected accuracy is solved by using integer programming with threshold cut. We also extend IPknot so that it can predict the consensus secondary structure with pseudoknots when a multiple sequence alignment is given. IPknot is validated through extensive experiments on various datasets, showing that IPknot achieves better prediction accuracy and faster running time as compared with several competitive prediction methods. AVAILABILITY: The program of IPknot is available at http://www.ncrna.org/software/ipknot/. IPknot is also available as a web server at http://rna.naist.jp/ipknot/. CONTACT: satoken@k.u-tokyo.ac.jp; ykato@is.naist.jp SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
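Two ideas from this abstract, the threshold cut on base-pairing probabilities and the selection of a consistent set of pairs, can be illustrated on a toy input. The sketch below is not IPknot's integer programming formulation: it drops candidate pairs below a threshold and then finds, by exhaustive search, the set of remaining pairs with the largest total probability in which every base pairs at most once. Exhaustive search stands in for an integer programming solver and is only feasible for tiny inputs. All names, probabilities and the threshold value are hypothetical.

```python
from itertools import combinations


def threshold_cut(pair_probs, threshold):
    """Keep only candidate base pairs whose probability reaches the threshold."""
    return {pair: p for pair, p in pair_probs.items() if p >= threshold}


def best_pairing(pair_probs, threshold=0.5):
    """Exhaustively pick the highest-scoring set of non-conflicting pairs.

    Crossing pairs are allowed, so the result may contain a pseudoknot.
    """
    candidates = sorted(threshold_cut(pair_probs, threshold).items())
    best_score, best_set = 0.0, []
    for size in range(1, len(candidates) + 1):
        for subset in combinations(candidates, size):
            bases = [base for pair, _ in subset for base in pair]
            if len(bases) != len(set(bases)):
                continue  # some base would pair twice: infeasible
            score = sum(p for _, p in subset)
            if score > best_score:
                best_score, best_set = score, [pair for pair, _ in subset]
    return best_set, best_score


# Hypothetical base-pairing probabilities for a short sequence.
probs = {(0, 8): 0.9, (1, 7): 0.8, (1, 5): 0.6, (3, 10): 0.7, (2, 6): 0.3}
pairs, score = best_pairing(probs, threshold=0.5)
print(pairs, round(score, 2))
```

In this toy input the pair (2, 6) is removed by the threshold, (1, 5) loses to the conflicting pair (1, 7), and the selected pairs (1, 7) and (3, 10) cross each other.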
Affiliation(s)
- Kengo Sato
  - Graduate School of Frontier Sciences, University of Tokyo, 5-1-5 Kashiwanoha, Kashiwa, Chiba 277-8561, Japan
14
Wan Y, Kertesz M, Spitale RC, Segal E, Chang HY. Understanding the transcriptome through RNA structure. Nat Rev Genet 2011; 12:641-55. [PMID: 21850044 DOI: 10.1038/nrg3049] [Citation(s) in RCA: 351] [Impact Index Per Article: 25.1] [Indexed: 12/13/2022]
Abstract
RNA structure is crucial for gene regulation and function. In the past, transcriptomes have largely been parsed by primary sequences and expression levels, but it is now becoming feasible to annotate and compare transcriptomes based on RNA structure. In addition to computational prediction methods, the recent advent of experimental techniques to probe RNA structure by high-throughput sequencing has enabled genome-wide measurements of RNA structure and has provided the first picture of the structural organization of a eukaryotic transcriptome - the 'RNA structurome'. With additional advances in method refinement and interpretation, structural views of the transcriptome should help to identify and validate regulatory RNA motifs that are involved in diverse cellular processes and thereby increase understanding of RNA function.
Affiliation(s)
- Yue Wan
  - Howard Hughes Medical Institute and Program in Epithelial Biology, Stanford University School of Medicine, Stanford, California 94305, USA
15
Al-Khatib RM, Rashid NAA, Abdullah R. Thermodynamic Heuristics with Case-Based Reasoning: Combined Insights for RNA Pseudoknot Secondary Structure. J Biomol Struct Dyn 2011; 29:1-26. [DOI: 10.1080/07391102.2011.10507373] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/28/2022]
16
Zhao Y, Gong Z, Xiao Y. Improvements of the Hierarchical Approach for Predicting RNA Tertiary Structure. J Biomol Struct Dyn 2011; 28:815-26. [DOI: 10.1080/07391102.2011.10508609] [Citation(s) in RCA: 23] [Impact Index Per Article: 1.6] [Indexed: 10/28/2022]
17
Sperschneider J, Datta A, Wise MJ. Heuristic RNA pseudoknot prediction including intramolecular kissing hairpins. RNA (New York, N.Y.) 2011; 17:27-38. [PMID: 21098139 PMCID: PMC3004063 DOI: 10.1261/rna.2394511] [Citation(s) in RCA: 35] [Impact Index Per Article: 2.5] [Received: 07/29/2010] [Accepted: 10/17/2010] [Indexed: 05/30/2023]
Abstract
Pseudoknots are an essential feature of RNA tertiary structures. Simple H-type pseudoknots have been studied extensively in terms of biological functions, computational prediction, and energy models. Intramolecular kissing hairpins are a more complex and biologically important type of pseudoknot in which two hairpin loops form base pairs. They are hard to predict using free energy minimization due to high computational requirements. Heuristic methods that allow arbitrary pseudoknots strongly depend on the quality of energy parameters, which are not yet available for complex pseudoknots. We present an extension of the heuristic pseudoknot prediction algorithm DotKnot, which covers H-type pseudoknots and intramolecular kissing hairpins. Our framework allows for easy integration of advanced H-type pseudoknot energy models. For a test set of RNA sequences containing kissing hairpins and other types of pseudoknot structures, DotKnot outperforms competing methods from the literature. DotKnot is available as a web server under http://dotknot.csse.uwa.edu.au.
Affiliation(s)
- Jana Sperschneider
  - School of Computer Science and Software Engineering, University of Western Australia, Perth WA 6009, Australia
18
Koessler DR, Knisley DJ, Knisley J, Haynes T. A predictive model for secondary RNA structure using graph theory and a neural network. BMC Bioinformatics 2010; 11 Suppl 6:S21. [PMID: 20946605 PMCID: PMC3026369 DOI: 10.1186/1471-2105-11-s6-s21] [Citation(s) in RCA: 27] [Impact Index Per Article: 1.8] [Indexed: 11/10/2022]
Abstract
Background: Determining the secondary structure of RNA from the primary structure is a challenging computational problem. A number of algorithms have been developed to predict the secondary structure from the primary structure. It is agreed that there is still room for improvement in each of these approaches. In this work we build a predictive model for secondary RNA structure using a graph-theoretic tree representation of secondary RNA structure. We model the bonding of two RNA secondary structures to form a larger secondary structure with a graph operation we call merge. We consider all combinatorial possibilities using all possible tree inputs, both those that are RNA-like in structure and those that are not. The resulting data from each tree merge operation is represented by a vector. We use these vectors as input values for a neural network and train the network to recognize a tree as RNA-like or not, based on the merge data vector. The network estimates the probability of a tree being RNA-like. Results: The network correctly assigned a high probability of RNA-likeness to trees previously identified as RNA-like and a low probability of RNA-likeness to those classified as not RNA-like. We then used the neural network to predict the RNA-likeness of the unclassified trees. Conclusions: There are a number of secondary RNA structure prediction algorithms available online. These programs are based on finding the secondary structure with the lowest total free energy. In this work, we create a predictive tool for secondary RNA structures using graph-theoretic values as input for a neural network. The use of a graph operation to theoretically describe the bonding of secondary RNA is novel and is an entirely different approach to the prediction of secondary RNA structures. Our method correctly predicted trees to be RNA-like or not RNA-like for all known cases. In addition, our results convey a measure of likelihood that a tree is RNA-like or not RNA-like. Given that the majority of secondary RNA folding algorithms return more than one possible outcome, our method provides a means of determining the best or most likely structures among all of the possible outcomes.
Affiliation(s)
- Denise R Koessler
  - Department of Mathematics and Statistics, East Tennessee State University, Johnson City, TN 37614, USA
19
Al-Khatib RM, Abdullah R, Rashid NA. A comparative taxonomy of parallel algorithms for RNA secondary structure prediction. Evol Bioinform Online 2010; 6:27-45. [PMID: 20458364 PMCID: PMC2865774 DOI: 10.4137/ebo.s4058] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.2] [Indexed: 11/29/2022]
Abstract
RNA molecules have been found to play crucial roles in numerous biological and medical procedures and processes. RNA structure determination has become a major problem in biology. Recently, computer scientists have provided biologists with predicted RNA secondary structures that ease the understanding of RNA functions and roles. Detecting RNA secondary structure is an NP-hard problem, especially in pseudoknotted RNA structures. The detection process is also time-consuming; as a result, an alternative approach such as using parallel architectures is a desirable option. The main goal of this paper is an intensive investigation of the parallel methods used in the literature to address the demanding issues related to RNA secondary structure prediction methods. Then, we introduce a new taxonomy for the parallel RNA folding methods. Based on this proposed taxonomy, a systematic and scientific comparison is performed among these existing methods.
Affiliation(s)
- Ra’ed M. Al-Khatib
  - The Parallel and Distributed Computing Center (PDCC), School of Computer Sciences, University Sains Malaysia, 11800 Penang, Malaysia
- Rosni Abdullah
  - The Parallel and Distributed Computing Center (PDCC), School of Computer Sciences, University Sains Malaysia, 11800 Penang, Malaysia
- Nur’Aini Abdul Rashid
  - The Parallel and Distributed Computing Center (PDCC), School of Computer Sciences, University Sains Malaysia, 11800 Penang, Malaysia
20
Engelen S, Tahi F. Tfold: efficient in silico prediction of non-coding RNA secondary structures. Nucleic Acids Res 2010; 38:2453-66. [PMID: 20047957 PMCID: PMC2853104 DOI: 10.1093/nar/gkp1067] [Citation(s) in RCA: 14] [Impact Index Per Article: 0.9] [Received: 05/28/2009] [Revised: 10/30/2009] [Accepted: 11/02/2009] [Indexed: 11/12/2022]
Abstract
Predicting RNA secondary structures is a very important task, and continues to be a challenging problem, even though several methods and algorithms are proposed in the literature. In this article, we propose an algorithm called Tfold, for predicting non-coding RNA secondary structures. Tfold takes as input an RNA sequence for which the secondary structure is searched and a set of aligned homologous sequences. It combines criteria of stability, conservation and covariation in order to search for stems and pseudoknots (whatever their type). Stems are searched recursively, from the most to the least stable. Tfold uses an algorithm called SSCA for selecting the most appropriate sequences from a large set of homologous sequences (taken from a database for example) to use for the prediction. Tfold can take into account one or several stems considered by the user as belonging to the secondary structure. Tfold can return several structures (if requested by the user) when 'rival' stems are found. Tfold has a complexity of O(n^2), with n the sequence length. The developed software, which offers several different uses, is available on the web site: http://tfold.ibisc.univ-evry.fr/TFold.
Affiliation(s)
- Fariza Tahi
- IBISC laboratory CNRS FRE 3190, University of Evry/Genopole, 523 place des Terrasses, 91000 Evry, France
21
Sperschneider J, Datta A. DotKnot: pseudoknot prediction using the probability dot plot under a refined energy model. Nucleic Acids Res 2010; 38:e103. [PMID: 20123730 PMCID: PMC2853144 DOI: 10.1093/nar/gkq021] [Citation(s) in RCA: 71] [Impact Index Per Article: 4.7] [Indexed: 01/17/2023] Open
Abstract
RNA pseudoknots are functional structure elements with key roles in viral and cellular processes. Prediction of a pseudoknotted minimum free energy structure is an NP-complete problem. Practical algorithms for RNA structure prediction including restricted classes of pseudoknots suffer from high runtime and poor accuracy for longer sequences. A heuristic approach is to search for promising pseudoknot candidates in a sequence and verify those. Afterwards, the detected pseudoknots can be further analysed using bioinformatics or laboratory techniques. We present a novel pseudoknot detection method called DotKnot that extracts stem regions from the secondary structure probability dot plot and assembles pseudoknot candidates in a constructive fashion. We evaluate pseudoknot free energies using novel parameters, which have recently become available. We show that the conventional probability dot plot makes a wide class of pseudoknots including those with bulged stems manageable in an explicit fashion. The energy parameters now become the limiting factor in pseudoknot prediction. DotKnot is an efficient method for long sequences, which finds pseudoknots with higher accuracy compared to other known prediction algorithms. DotKnot is accessible as a web server at http://dotknot.csse.uwa.edu.au.
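The idea of reading stem regions off a probability dot plot can be illustrated with a small sketch. This is not DotKnot's actual procedure or energy model; the function names `extract_stems` and `crosses`, the probability threshold, and the minimum stem length are all hypothetical choices made only for illustration. The sketch groups stacked pairs (i, j), (i+1, j-1), ... whose probability exceeds a threshold, and tests whether two pairs interleave, which is the defining feature of a pseudoknot.

```python
from typing import Dict, List, Tuple

Pair = Tuple[int, int]  # (i, j) with i < j, zero-based sequence positions


def extract_stems(probs: Dict[Pair, float],
                  threshold: float = 0.5,
                  min_len: int = 3) -> List[List[Pair]]:
    """Group stacked pairs (i, j), (i+1, j-1), ... with probability >= threshold.

    `threshold` and `min_len` are illustrative assumptions, not published values.
    """
    kept = {pair for pair, p in probs.items() if p >= threshold}
    stems: List[List[Pair]] = []
    for (i, j) in sorted(kept):
        # Start only at the outermost pair so each stem is reported once.
        if (i - 1, j + 1) in kept:
            continue
        stem: List[Pair] = []
        while i < j and (i, j) in kept:
            stem.append((i, j))
            i, j = i + 1, j - 1
        if len(stem) >= min_len:
            stems.append(stem)
    return stems


def crosses(a: Pair, b: Pair) -> bool:
    """True if the two pairs interleave (i < k < j < l), i.e. form a pseudoknot."""
    (i, j), (k, l) = sorted([a, b])
    return i < k < j < l


if __name__ == "__main__":
    # Toy dot plot: two stems whose pairs interleave, plus one weak pair.
    dot_plot = {
        (0, 20): 0.90, (1, 19): 0.80, (2, 18): 0.70,
        (10, 30): 0.90, (11, 29): 0.90, (12, 28): 0.95,
        (5, 15): 0.20,
    }
    stems = extract_stems(dot_plot)
    print(stems)
    print(crosses(stems[0][0], stems[1][0]))
```

Two stems whose outermost pairs cross would be a pseudoknot candidate in this toy setting; a real method would then score such candidates with an energy model.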
Affiliation(s)
- Jana Sperschneider
- School of Computer Science and Software Engineering, The University of Western Australia, Perth, WA 6009, Australia.
22
Zhang J, Dundas J, Lin M, Chen R, Wang W, Liang J. Prediction of geometrically feasible three-dimensional structures of pseudoknotted RNA through free energy estimation. RNA 2009; 15:2248-63. [PMID: 19864433 PMCID: PMC2779689 DOI: 10.1261/rna.1723609] [Citation(s) in RCA: 35] [Impact Index Per Article: 2.2] [Received: 05/08/2009] [Accepted: 09/05/2009] [Indexed: 05/07/2023]
Abstract
Accurate free energy estimation is essential for RNA structure prediction. The widely used Turner's energy model works well for nested structures. For pseudoknotted RNAs, however, there is no effective rule for estimation of loop entropy and free energy. In this work we present a new free energy estimation method, termed the pseudoknot predictor in three-dimensional space (pk3D), which goes beyond Turner's model. Our approach treats nested and pseudoknotted structures alike in one unifying physical framework, regardless of how complex the RNA structures are. We first test the ability of pk3D in selecting native structures from a large number of decoys for a set of 43 pseudoknotted RNA molecules, with lengths ranging from 23 to 113. We find that pk3D performs slightly better than the Dirks and Pierce extension of Turner's rule. We then test pk3D for blind secondary structure prediction, and find that pk3D gives the best sensitivity and comparable positive predictive value (related to specificity) in predicting pseudoknotted RNA secondary structures, when compared with other methods. A unique strength of pk3D is that it also generates spatial arrangement of structural elements of the RNA molecule. Comparison of three-dimensional structures predicted by pk3D with the native structure measured by nuclear magnetic resonance or X-ray experiments shows that the predicted spatial arrangement of stems and loops is often similar to that found in the native structure. These close-to-native structures can be used as starting points for further refinement to derive accurate three-dimensional structures of RNA molecules, including those with pseudoknots.
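The sensitivity and positive predictive value (PPV) mentioned in this abstract are conventionally computed over sets of base pairs. The following is a minimal sketch of that standard calculation, not the evaluation code used by the authors; the function name `pair_metrics` and the toy base pairs are hypothetical.

```python
from typing import Iterable, Tuple

Pair = Tuple[int, int]


def pair_metrics(predicted: Iterable[Pair],
                 native: Iterable[Pair]) -> Tuple[float, float]:
    """Return (sensitivity, PPV) for predicted versus native base pairs.

    sensitivity = TP / (TP + FN): fraction of native pairs that were predicted
    PPV         = TP / (TP + FP): fraction of predicted pairs that are native
    """
    pred, nat = set(predicted), set(native)
    true_positives = len(pred & nat)
    sensitivity = true_positives / len(nat) if nat else 0.0
    ppv = true_positives / len(pred) if pred else 0.0
    return sensitivity, ppv


if __name__ == "__main__":
    native = [(1, 10), (2, 9), (3, 8), (4, 7)]
    predicted = [(1, 10), (2, 9), (5, 12)]
    sens, ppv = pair_metrics(predicted, native)
    print(f"sensitivity={sens:.2f} PPV={ppv:.2f}")
```

In this toy case two of the four native pairs are recovered and two of the three predicted pairs are correct. Published benchmarks sometimes also accept pairs shifted by one position, which this sketch does not do.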
Affiliation(s)
- Jian Zhang
- Department of Bioengineering, University of Illinois at Chicago, Chicago, Illinois 60607, USA
23
Cao S, Chen SJ. Predicting structures and stabilities for H-type pseudoknots with interhelix loops. RNA 2009; 15:696-706. [PMID: 19237463 PMCID: PMC2661829 DOI: 10.1261/rna.1429009] [Citation(s) in RCA: 81] [Impact Index Per Article: 5.1] [Received: 10/21/2008] [Accepted: 01/10/2009] [Indexed: 05/20/2023]
Abstract
RNA pseudoknots play a critical role in RNA-related biology, from the assembly of the ribosome to the regulation of viral gene expression. A predictive model for pseudoknot structure and stability is essential for understanding and designing RNA structure and function. A previous statistical mechanical theory allows us to treat canonical H-type RNA pseudoknots that contain no intervening loop between the helices (see S. Cao and S.J. Chen [2006] in Nucleic Acids Research, Vol. 34; pp. 2634-2652). Biologically significant RNA pseudoknots often contain interhelix loops. Predicting the structure and stability of such more general pseudoknots remains an unsolved problem. In the present study, we develop a predictive model for pseudoknots with interhelix loops. The model gives conformational entropy, stability, and the free-energy landscape from RNA sequences. The main features of this new model are the computation of the conformational entropy and folding free energy based on the complete conformational ensemble and a rigorous treatment of the excluded volume effects. Extensive tests of the structural predictions show overall good accuracy, with average sensitivity and specificity equal to 0.91 and 0.91, respectively. The theory developed here may be a solid starting point for first-principles modeling of more complex, larger RNAs.
Affiliation(s)
- Song Cao
- Department of Physics, University of Missouri, Columbia, 65211, USA