1
|
Chen CC, Chan YM, Jeong H. REDalign: accurate RNA structural alignment using residual encoder-decoder network. BMC Bioinformatics 2024; 25:346. [PMID: 39501155 PMCID: PMC11539752 DOI: 10.1186/s12859-024-05956-7] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/15/2024] [Accepted: 10/11/2024] [Indexed: 11/08/2024] Open
Abstract
BACKGROUND RNA secondary structural alignment serves as a foundational procedure in identifying conserved structural motifs among RNA sequences, crucially advancing our understanding of novel RNAs via comparative genomic analysis. While various computational strategies for RNA structural alignment exist, they often come with high computational complexity. Specifically, when addressing a set of RNAs with unknown structures, the task of simultaneously predicting their consensus secondary structure and determining the optimal sequence alignment requires an overwhelming computational effort of O ( L 6 ) for each RNA pair. Such an extremely high computational complexity makes these methods impractical for large-scale analysis despite their accurate alignment capabilities. RESULTS In this paper, we introduce REDalign, an innovative approach based on deep learning for RNA secondary structural alignment. By utilizing a residual encoder-decoder network, REDalign can efficiently capture consensus structures and optimize structural alignments. In this learning model, the encoder network leverages a hierarchical pyramid to assimilate high-level structural features. Concurrently, the decoder network, enhanced with residual skip connections, integrates multi-level encoded features to learn detailed feature hierarchies with fewer parameter sets. REDalign significantly reduces computational complexity compared to Sankoff-style algorithms and effectively handles non-nested structures, including pseudoknots, which are challenging for traditional alignment methods. Extensive evaluations demonstrate that REDalign provides superior accuracy and substantial computational efficiency. CONCLUSION REDalign presents a significant advancement in RNA secondary structural alignment, balancing high alignment accuracy with lower computational demands. Its ability to handle complex RNA structures, including pseudoknots, makes it an effective tool for large-scale RNA analysis, with potential implications for accelerating discoveries in RNA research and comparative genomics.
Collapse
Affiliation(s)
- Chun-Chi Chen
- Department of Electrical Engineering, National Chiayi University, No.300 Xuefu Rd, Chiayi City, 600355, Taiwan.
| | - Yi-Ming Chan
- MindtronicAI Co., 7 F., No. 218, Sec. 6, Roosevelt Road, Taipei, 11674, Taiwan
| | - Hyundoo Jeong
- Biomedical and Robotics Engineering, Incheon National University, 119 Academy-ro, Incheon, 22012, Yeonsu-gu, South Korea.
| |
Collapse
|
2
|
Mittal A, Ali SE, Mathews DH. Using the RNAstructure Software Package to Predict Conserved RNA Structures. Curr Protoc 2024; 4:e70054. [PMID: 39540715 DOI: 10.1002/cpz1.70054] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/16/2024]
Abstract
The structures of many non-coding RNAs (ncRNA) are conserved by evolution to a greater extent than their sequences. By predicting the conserved structure of two or more homologous sequences, the accuracy of secondary structure prediction can be improved as compared to structure prediction for a single sequence. Here, we provide protocols for the use of four programs in the RNAstructure suite to predict conserved structures: Multilign, TurboFold, Dynalign, and PARTS. TurboFold iteratively aligns multiple homologous sequences and estimates the pairing probabilities for the conserved structure. Dynalign, PARTS, and Multilign are dynamic programming algorithms that simultaneously align sequences and identify the common secondary structure. Dynalign uses a pair of homologs and finds the lowest free energy common structure. PARTS uses a pair of homologs and estimates pairing probabilities from the base pairing probabilities estimated for each sequence. Multilign uses two or more homologs and finds the lowest free energy common structure using multiple pairwise calculations with Dynalign. It scales linearly with the number of sequences. We outline the strengths of each program. These programs can be run through web servers, on the command line, or with graphical user interfaces. © 2024 Wiley Periodicals LLC. Basic Protocol 1: Predicting a structure conserved in three or more sequences with the RNAstructure web server Basic Protocol 2: Predicting a structure conserved in two sequences with the RNAstructure web server Alternative Protocol 1: Predicting a structure conserved in multiple sequences in the RNAstructure graphical user interface Alternative Protocol 2: Predicting a structure conserved in two sequences with Dynalign in the RNAstructure graphical user interface Alternative Protocol 3: Running TurboFold on the command line.
Collapse
Affiliation(s)
- Abhinav Mittal
- Department of Biochemistry & Biophysics and Center for RNA Biology, University of Rochester Medical Center, Rochester, New York
| | - Sara E Ali
- Department of Biochemistry & Biophysics and Center for RNA Biology, University of Rochester Medical Center, Rochester, New York
| | - David H Mathews
- Department of Biochemistry & Biophysics and Center for RNA Biology, University of Rochester Medical Center, Rochester, New York
| |
Collapse
|
3
|
Abstract
RNAstructure is a user-friendly program for the prediction and analysis of RNA secondary structure. It is available as a web server, a program with a graphical user interface, or a set of command line tools. The programs are available for Microsoft Windows, macOS, or Linux. This article provides protocols for prediction of RNA secondary structure (using the web server, the graphical user interface, or the command line) and high-affinity oligonucleotide binding sites to a structured RNA target (using the graphical user interface). © 2023 Wiley Periodicals LLC. Basic Protocol 1: Predicting RNA secondary structure using the RNAstructure web server Alternate Protocol 1: Predicting secondary structure and base pair probabilities using the RNAstructure graphical user interface Alternate Protocol 2: Predicting secondary structure and base pair probabilities using the RNAstructure command line interface Basic Protocol 2: Predicting binding affinities of oligonucleotides complementary to an RNA target using OligoWalk.
Collapse
Affiliation(s)
- Sara E. Ali
- Department of Biochemistry & Biophysics and Center for RNA Biology, University of Rochester Medical Center, 601 Elmwood Avenue, Box 712, Rochester, New York 14642
| | - Abhinav Mittal
- Department of Biochemistry & Biophysics and Center for RNA Biology, University of Rochester Medical Center, 601 Elmwood Avenue, Box 712, Rochester, New York 14642
| | - David H. Mathews
- Department of Biochemistry & Biophysics and Center for RNA Biology, University of Rochester Medical Center, 601 Elmwood Avenue, Box 712, Rochester, New York 14642
| |
Collapse
|
4
|
Network-Based Structural Alignment of RNA Sequences Using TOPAS. Methods Mol Biol 2023; 2586:147-162. [PMID: 36705903 DOI: 10.1007/978-1-0716-2768-6_9] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 01/28/2023]
Abstract
TOPAS (TOPological network-based Alignment of Structural RNAs) is a network-based alignment algorithm that predicts structurally sound pairwise alignment of RNAs. In order to take advantage of recent advances in comparative network analysis for efficient structurally sound RNA alignment, TOPAS constructs topological network representations for RNAs, which consist of sequential edges connecting nucleotide bases as well as structural edges reflecting the underlying folding structure. Structural edges are weighted by the estimated base-pairing probabilities. Next, the constructed networks are aligned using probabilistic network alignment techniques, which yield a structurally sound RNA alignment that considers both the sequence similarity and the structural similarity between the given RNAs. Compared to traditional Sankoff-style algorithms, this network-based alignment scheme leads to a significant reduction in the overall computational cost while yielding favorable alignment results. Another important benefit is its capability to handle arbitrary folding structures, which can potentially lead to more accurate alignment for RNAs with pseudoknots.
Collapse
|
5
|
Akiyama M, Sakakibara Y. Informative RNA base embedding for RNA structural alignment and clustering by deep representation learning. NAR Genom Bioinform 2022; 4:lqac012. [PMID: 35211670 PMCID: PMC8862729 DOI: 10.1093/nargab/lqac012] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/03/2021] [Revised: 01/08/2022] [Accepted: 02/05/2022] [Indexed: 01/17/2023] Open
Abstract
Effective embedding is actively conducted by applying deep learning to biomolecular information. Obtaining better embeddings enhances the quality of downstream analyses, such as DNA sequence motif detection and protein function prediction. In this study, we adopt a pre-training algorithm for the effective embedding of RNA bases to acquire semantically rich representations and apply this algorithm to two fundamental RNA sequence problems: structural alignment and clustering. By using the pre-training algorithm to embed the four bases of RNA in a position-dependent manner using a large number of RNA sequences from various RNA families, a context-sensitive embedding representation is obtained. As a result, not only base information but also secondary structure and context information of RNA sequences are embedded for each base. We call this ‘informative base embedding’ and use it to achieve accuracies superior to those of existing state-of-the-art methods on RNA structural alignment and RNA family clustering tasks. Furthermore, upon performing RNA sequence alignment by combining this informative base embedding with a simple Needleman–Wunsch alignment algorithm, we succeed in calculating structural alignments with a time complexity of O(n2) instead of the O(n6) time complexity of the naive implementation of Sankoff-style algorithm for input RNA sequence of length n.
Collapse
Affiliation(s)
- Manato Akiyama
- Department of Biosciences and Informatics, Keio University, 223-8522, Japan
| | | |
Collapse
|
6
|
Zambrano RAI, Hernandez-Perez C, Takahashi MK. RNA Structure Prediction, Analysis, and Design: An Introduction to Web-Based Tools. Methods Mol Biol 2022; 2518:253-269. [PMID: 35666450 DOI: 10.1007/978-1-0716-2421-0_15] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 06/15/2023]
Abstract
Understanding RNA structure has become critical in the study of RNA in their roles as mediators of biological processes. To aid in these studies, computational algorithms that utilize thermodynamics have been developed to predict RNA secondary structure. Due to the importance of intermolecular interactions, the algorithms have been expanded to determine and predict RNA-RNA hybridization. This chapter discusses popular webservers with the tools for RNA secondary structure prediction, RNA-RNA hybridization, and design. We address key features that distinguish common-functioning programs and their purposes for the interests of the user. Ultimately, we hope this review elucidates web-based tools researchers may take advantage of in their investigations of RNA structure and function.
Collapse
Affiliation(s)
| | | | - Melissa K Takahashi
- Department of Biology, California State University Northridge, Northridge, CA, USA.
| |
Collapse
|
7
|
Barcytė D, Eikrem W, Engesmo A, Seoane S, Wohlmann J, Horák A, Yurchenko T, Eliáš M. Olisthodiscus represents a new class of Ochrophyta. JOURNAL OF PHYCOLOGY 2021; 57:1094-1118. [PMID: 33655496 DOI: 10.1111/jpy.13155] [Citation(s) in RCA: 11] [Impact Index Per Article: 2.8] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Received: 06/09/2020] [Revised: 12/08/2020] [Accepted: 01/04/2021] [Indexed: 06/12/2023]
Abstract
The phylogenetic diversity of Ochrophyta, a diverse and ecologically important radiation of algae, is still incompletely understood even at the level of the principal lineages. One taxon that has eluded simple classification is the marine flagellate genus Olisthodiscus. We investigated Olisthodiscus luteus K-0444 and documented its morphological and genetic differences from the NIES-15 strain, which we described as Olisthodiscus tomasii sp. nov. Phylogenetic analyses of combined 18S and 28S rRNA sequences confirmed that Olisthodiscus constitutes a separate, deep, ochrophyte lineage, but its position could not be resolved. To overcome this problem, we sequenced the plastid genome of O. luteus K-0444 and used the new data in multigene phylogenetic analyses, which suggested that Olisthodiscus is a sister lineage of the class Pinguiophyceae within a broader clade additionally including Chrysophyceae, Synchromophyceae, and Eustigmatophyceae. Surprisingly, the Olisthodiscus plastid genome contained three genes, ycf80, cysT, and cysW, inherited from the rhodophyte ancestor of the ochrophyte plastid yet lost from all other ochrophyte groups studied so far. Combined with nuclear genes for CysA and Sbp proteins, Olisthodiscus is the only known ochrophyte possessing a plastidial sulfate transporter SulT. In addition, the finding of a cemA gene in the Olisthodiscus plastid genome and an updated phylogenetic analysis ruled out the previously proposed hypothesis invoking horizontal cemA transfer from a green algal plastid into Synurales. Altogether, Olisthodiscus clearly represents a novel phylogenetically distinct ochrophyte lineage, which we have proposed as a new class, Olisthodiscophyceae.
Collapse
Affiliation(s)
- Dovilė Barcytė
- Department of Biology and Ecology, Faculty of Science, University of Ostrava, 710 00, Ostrava, Czech Republic
| | - Wenche Eikrem
- Norwegian Institute for Water Research, Gaustadallèen 21, 0349, Oslo, Norway
- Natural history Museum, University of Oslo, P.O. Box 1172 Blindern, 0318, Oslo, Norway
- Department of Biosciences, University of Oslo, P.O. Box 1066 Blindern, 0316, Oslo, Norway
| | - Anette Engesmo
- Norwegian Institute for Water Research, Gaustadallèen 21, 0349, Oslo, Norway
- Department of Biosciences, University of Oslo, P.O. Box 1066 Blindern, 0316, Oslo, Norway
| | - Sergio Seoane
- Department of Plant Biology and Ecology, University of the Basque Country (UPV/EHU), 48940, Leioa, Spain
| | - Jens Wohlmann
- Department of Biosciences, University of Oslo, P.O. Box 1066 Blindern, 0316, Oslo, Norway
| | - Aleš Horák
- Biology Centre, Czech Academy of Sciences, Institute of Parasitology, Branišovská 31, 37005, České Budějovice, Czech Republic
- Department of Molecular Biology, Faculty of Science, University of South Bohemia, Branišovská 31, 37005, České Budějovice, Czech Republic
| | - Tatiana Yurchenko
- Department of Biology and Ecology, Faculty of Science, University of Ostrava, 710 00, Ostrava, Czech Republic
| | - Marek Eliáš
- Department of Biology and Ecology, Faculty of Science, University of Ostrava, 710 00, Ostrava, Czech Republic
| |
Collapse
|
8
|
Chen CC, Jeong H, Qian X, Yoon BJ. TOPAS: network-based structural alignment of RNA sequences. Bioinformatics 2020; 35:2941-2948. [PMID: 30629122 DOI: 10.1093/bioinformatics/btz001] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.2] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/10/2017] [Revised: 12/07/2018] [Accepted: 01/04/2019] [Indexed: 11/14/2022] Open
Abstract
MOTIVATION For many RNA families, the secondary structure is known to be better conserved among the member RNAs compared to the primary sequence. For this reason, it is important to consider the underlying folding structures when aligning RNA sequences, especially for those with relatively low sequence identity. Given a set of RNAs with unknown structures, simultaneous RNA alignment and folding algorithms aim to accurately align the RNAs by jointly predicting their consensus secondary structure and the optimal sequence alignment. Despite the improved accuracy of the resulting alignment, the computational complexity of simultaneous alignment and folding for a pair of RNAs is O(N6), which is too costly to be used for large-scale analysis. RESULTS In order to address this shortcoming, in this work, we propose a novel network-based scheme for pairwise structural alignment of RNAs. The proposed algorithm, TOPAS, builds on the concept of topological networks that provide structural maps of the RNAs to be aligned. For each RNA sequence, TOPAS first constructs a topological network based on the predicted folding structure, which consists of sequential edges and structural edges weighted by the base-pairing probabilities. The obtained networks can then be efficiently aligned by using probabilistic network alignment techniques, thereby yielding the structural alignment of the RNAs. The computational complexity of our proposed method is significantly lower than that of the Sankoff-style dynamic programming approach, while yielding favorable alignment results. Furthermore, another important advantage of the proposed algorithm is its capability of handling RNAs with pseudoknots while predicting the RNA structural alignment. We demonstrate that TOPAS generally outperforms previous RNA structural alignment methods on RNA benchmarks in terms of both speed and accuracy. AVAILABILITY AND IMPLEMENTATION Source code of TOPAS and the benchmark data used in this paper are available at https://github.com/bjyoontamu/TOPAS.
Collapse
Affiliation(s)
- Chun-Chi Chen
- Department of Electrical and Computer Engineering, Texas A&M University, College Station, TX, USA.,TEES-AgriLife Center for Bioinformatics & Genomic Systems Engineering, Texas A&M University, College Station, TX, USA
| | - Hyundoo Jeong
- Department of Electronic Engineering, Chosun University, Gwangju, Republic of Korea
| | - Xiaoning Qian
- Department of Electrical and Computer Engineering, Texas A&M University, College Station, TX, USA.,TEES-AgriLife Center for Bioinformatics & Genomic Systems Engineering, Texas A&M University, College Station, TX, USA
| | - Byung-Jun Yoon
- Department of Electrical and Computer Engineering, Texas A&M University, College Station, TX, USA.,TEES-AgriLife Center for Bioinformatics & Genomic Systems Engineering, Texas A&M University, College Station, TX, USA
| |
Collapse
|
9
|
Tan Z, Fu Y, Sharma G, Mathews DH. TurboFold II: RNA structural alignment and secondary structure prediction informed by multiple homologs. Nucleic Acids Res 2017; 45:11570-11581. [PMID: 29036420 PMCID: PMC5714223 DOI: 10.1093/nar/gkx815] [Citation(s) in RCA: 67] [Impact Index Per Article: 8.4] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/06/2017] [Accepted: 09/12/2017] [Indexed: 12/26/2022] Open
Abstract
This paper presents TurboFold II, an extension of the TurboFold algorithm for predicting secondary structures for multiple RNA homologs. TurboFold II augments the structure prediction capabilities of TurboFold by additionally providing multiple sequence alignments. Probabilities for alignment of nucleotide positions between all pairs of input sequences are iteratively estimated in TurboFold II by incorporating information from both the sequence identity and secondary structures. A multiple sequence alignment is obtained from these probabilities by using a probabilistic consistency transformation and a hierarchically computed guide tree. To assess TurboFold II, its sequence alignment and structure predictions were compared with leading tools, including methods that focus on alignment alone and methods that provide both alignment and structure prediction. TurboFold II has comparable alignment accuracy with MAFFT and higher accuracy than other tools. TurboFold II also has comparable structure prediction accuracy as the original TurboFold algorithm, which is one of the most accurate methods. TurboFold II is part of the RNAstructure software package, which is freely available for download at http://rna.urmc.rochester.edu under a GPL license.
Collapse
Affiliation(s)
- Zhen Tan
- Department of Biochemistry and Biophysics, University of Rochester Medical Center, 601 Elmwood Avenue, Box 712, Rochester, NY 14642, USA.,Center for RNA Biology, University of Rochester Medical Center, 601 Elmwood Avenue, Box 712, Rochester, NY 14642, USA
| | - Yinghan Fu
- Department of Biochemistry and Biophysics, University of Rochester Medical Center, 601 Elmwood Avenue, Box 712, Rochester, NY 14642, USA.,Center for RNA Biology, University of Rochester Medical Center, 601 Elmwood Avenue, Box 712, Rochester, NY 14642, USA
| | - Gaurav Sharma
- Center for RNA Biology, University of Rochester Medical Center, 601 Elmwood Avenue, Box 712, Rochester, NY 14642, USA.,Department of Electrical and Computer Engineering, University of Rochester, Hopeman 204, RC Box 270126, Rochester, NY 14627, USA.,Department of Biostatistics and Computational Biology, University of Rochester Medical Center, 601 Elmwood Avenue, Box 630, Rochester, NY 14642, USA
| | - David H Mathews
- Department of Biochemistry and Biophysics, University of Rochester Medical Center, 601 Elmwood Avenue, Box 712, Rochester, NY 14642, USA.,Center for RNA Biology, University of Rochester Medical Center, 601 Elmwood Avenue, Box 712, Rochester, NY 14642, USA.,Department of Biostatistics and Computational Biology, University of Rochester Medical Center, 601 Elmwood Avenue, Box 630, Rochester, NY 14642, USA
| |
Collapse
|
10
|
Abstract
The secondary structure of an RNA molecule represents the base-pairing interactions within the molecule and fundamentally determines its overall structure. In this chapter, we overview the main approaches and existing tools for predicting RNA secondary structures, as well as methods for identifying noncoding RNAs from genomic sequences or RNA sequencing data. We then focus on the identification of a well-known class of small noncoding RNAs, namely microRNAs, which play very important roles in many biological processes through regulating post-transcriptionally the expression of genes and which dysregulation has been shown to be involved in several human diseases.
Collapse
Affiliation(s)
- Fariza Tahi
- IBISC, UEVE/Genopole, 23 bv. de France, 91000, Evry, France.
- IPS2, University of Paris-Saclay, 91190, Gif-sur-Yvette, France.
| | - Van Du T Tran
- Vital-IT group, SIB Swiss Institute of Bioinformatics, 1015, Lausanne, Switzerland
| | - Anouar Boucheham
- IBISC, UEVE/Genopole, 23 bv. de France, 91000, Evry, France
- College of NTIC, Constantine University 2, Constantine, Algeria
| |
Collapse
|
11
|
Abstract
Deciphering the folding pathways and predicting the structures of complex three-dimensional biomolecules is central to elucidating biological function. RNA is single-stranded, which gives it the freedom to fold into complex secondary and tertiary structures. These structures endow RNA with the ability to perform complex chemistries and functions ranging from enzymatic activity to gene regulation. Given that RNA is involved in many essential cellular processes, it is critical to understand how it folds and functions in vivo. Within the last few years, methods have been developed to probe RNA structures in vivo and genome-wide. These studies reveal that RNA often adopts very different structures in vivo and in vitro, and provide profound insights into RNA biology. Nonetheless, both in vitro and in vivo approaches have limitations: studies in the complex and uncontrolled cellular environment make it difficult to obtain insight into RNA folding pathways and thermodynamics, and studies in vitro often lack direct cellular relevance, leaving a gap in our knowledge of RNA folding in vivo. This gap is being bridged by biophysical and mechanistic studies of RNA structure and function under conditions that mimic the cellular environment. To date, most artificial cytoplasms have used various polymers as molecular crowding agents and a series of small molecules as cosolutes. Studies under such in vivo-like conditions are yielding fresh insights, such as cooperative folding of functional RNAs and increased activity of ribozymes. These observations are accounted for in part by molecular crowding effects and interactions with other molecules. In this review, we report milestones in RNA folding in vitro and in vivo and discuss ongoing experimental and computational efforts to bridge the gap between these two conditions in order to understand how RNA folds in the cell.
Collapse
|
12
|
Abstract
RNA structure is conserved by evolution to a greater extent than sequence. Predicting the conserved structure for multiple homologous sequences can be much more accurate than predicting the structure for a single sequence. RNAstructure is a software package that includes the programs Dynalign, Multilign, TurboFold, and PARTS for predicting conserved RNA secondary structure. This chapter provides protocols for using these programs.
Collapse
|
13
|
DiChiacchio L, Sloma MF, Mathews DH. AccessFold: predicting RNA-RNA interactions with consideration for competing self-structure. Bioinformatics 2015; 32:1033-9. [PMID: 26589271 DOI: 10.1093/bioinformatics/btv682] [Citation(s) in RCA: 21] [Impact Index Per Article: 2.1] [Reference Citation Analysis] [Abstract] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/02/2014] [Accepted: 11/15/2015] [Indexed: 01/09/2023] Open
Abstract
MOTIVATION There are numerous examples of RNA-RNA complexes, including microRNA-mRNA and small RNA-mRNA duplexes for regulation of translation, guide RNA interactions with target RNA for post-transcriptional modification and small nuclear RNA duplexes for splicing. Predicting the base pairs formed between two interacting sequences remains difficult, at least in part because of the competition between unimolecular and bimolecular structure. RESULTS Two algorithms were developed for improved prediction of bimolecular RNA structure that consider the competition between self-structure and bimolecular structure. These algorithms utilize two novel approaches to evaluate accessibility: free energy density minimization and pseudo-energy minimization. Free energy density minimization minimizes the folding free energy change per nucleotide involved in an intermolecular secondary structure. Pseudo-energy minimization (called AccessFold) minimizes the sum of free energy change and a pseudo-free energy penalty for bimolecular pairing of nucleotides that are unlikely to be accessible for bimolecular structure. The pseudo-free energy, derived from unimolecular pairing probabilities, is applied per nucleotide in bimolecular pairs, and this approach is able to predict binding sites that are split by unimolecular structures. A benchmark set of 17 bimolecular RNA structures was assembled to assess structure prediction. Pseudo-energy minimization provides a statistically significant improvement in sensitivity over the method that was found in a benchmark to be the most accurate previously available method, with an improvement from 36.8% to 57.8% in mean sensitivity for base pair prediction. AVAILABILITY AND IMPLEMENTATION Pseudo-energy minimization is available for download as AccessFold, under an open-source license and as part of the RNAstructure package, at: http://rna.urmc.rochester.edu/RNAstructure.html CONTACT david_mathews@urmc.rochester.edu SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
Collapse
Affiliation(s)
- Laura DiChiacchio
- Department of Biochemistry and Biophysics and Center for RNA Biology and
| | - Michael F Sloma
- Department of Biochemistry and Biophysics and Center for RNA Biology and
| | - David H Mathews
- Department of Biochemistry and Biophysics and Center for RNA Biology and Department of Biostatistics and Computational Biology, University of Rochester Medical Center, Rochester, NY 14642, USA
| |
Collapse
|
14
|
Fu Y, Sharma G, Mathews DH. Dynalign II: common secondary structure prediction for RNA homologs with domain insertions. Nucleic Acids Res 2015; 42:13939-48. [PMID: 25416799 PMCID: PMC4267632 DOI: 10.1093/nar/gku1172] [Citation(s) in RCA: 34] [Impact Index Per Article: 3.4] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 01/22/2023] Open
Abstract
Homologous non-coding RNAs frequently exhibit domain insertions, where a branch of secondary structure is inserted in a sequence with respect to its homologs. Dynamic programming algorithms for common secondary structure prediction of multiple RNA homologs, however, do not account for these domain insertions. This paper introduces a novel dynamic programming algorithm methodology that explicitly accounts for the possibility of inserted domains when predicting common RNA secondary structures. The algorithm is implemented as Dynalign II, an update to the Dynalign software package for predicting the common secondary structure of two RNA homologs. This update is accomplished with negligible increase in computational cost. Benchmarks on ncRNA families with domain insertions validate the method. Over base pairs occurring in inserted domains, Dynalign II improves accuracy over Dynalign, attaining 80.8% sensitivity (compared with 14.4% for Dynalign) and 91.4% positive predictive value (PPV) for tRNA; 66.5% sensitivity (compared with 38.9% for Dynalign) and 57.0% PPV for RNase P RNA; and 50.1% sensitivity (compared with 24.3% for Dynalign) and 58.5% PPV for SRP RNA. Compared with Dynalign, Dynalign II also exhibits statistically significant improvements in overall sensitivity and PPV. Dynalign II is available as a component of RNAstructure, which can be downloaded from http://rna.urmc.rochester.edu/RNAstructure.html.
Collapse
Affiliation(s)
- Yinghan Fu
- Department of Biochemistry and Biophysics, University of Rochester Medical Center, 601 Elmwood Avenue, Box 712, Rochester, NY 14642, USA
- Center for RNA Biology, University of Rochester Medical Center, 601 Elmwood Avenue, Box 712, Rochester, NY 14642, USA
| | - Gaurav Sharma
- Center for RNA Biology, University of Rochester Medical Center, 601 Elmwood Avenue, Box 712, Rochester, NY 14642, USA
- Department of Electrical and Computer Engineering, University of Rochester, Hopeman 204, RC Box 270126, Rochester, NY 14627, USA
- Department of Biostatistics and Computational Biology, University of Rochester Medical Center, 601 Elmwood Avenue, Box 630, Rochester, NY 14642, USA
- To whom correspondence should be addressed. Tel: +1 585 275 1734; Fax: +1 585 275 6007;
| | - David H. Mathews
- Department of Biochemistry and Biophysics, University of Rochester Medical Center, 601 Elmwood Avenue, Box 712, Rochester, NY 14642, USA
- Center for RNA Biology, University of Rochester Medical Center, 601 Elmwood Avenue, Box 712, Rochester, NY 14642, USA
- Department of Biostatistics and Computational Biology, University of Rochester Medical Center, 601 Elmwood Avenue, Box 630, Rochester, NY 14642, USA
- To whom correspondence should be addressed. Tel: +1 585 275 1734; Fax: +1 585 275 6007;
| |
Collapse
|
15
|
Affiliation(s)
- David H. Mathews
- Department of Biochemistry & Biophysics and Center for RNA Biology, University of Rochester Medical Center Rochester New York
| |
Collapse
|
16
|
Mathews DH. Using the RNAstructure Software Package to Predict Conserved RNA Structures. ACTA ACUST UNITED AC 2014; 46:12.4.1-12.4.22. [PMID: 24939126 DOI: 10.1002/0471250953.bi1204s46] [Citation(s) in RCA: 26] [Impact Index Per Article: 2.4] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/10/2022]
Abstract
The structures of many non-coding RNA (ncRNA) are conserved by evolution to a greater extent than their sequences. By predicting the conserved structure of two or more homologous sequences, the accuracy of secondary structure prediction can be improved as compared to structure prediction for a single sequence. This unit provides protocols for the use of four programs in the RNAstructure suite for prediction of conserved structures, Multilign, TurboFold, Dynalign, and PARTS. These programs can be run via Web servers, on the command line, or with graphical interfaces.
Collapse
Affiliation(s)
- David H Mathews
- Department of Biochemistry & Biophysics and Center for RNA Biology, University of Rochester Medical Center, Rochester, New York
| |
Collapse
|
17
|
Stern HA, Mathews DH. Accelerating calculations of RNA secondary structure partition functions using GPUs. Algorithms Mol Biol 2013; 8:29. [PMID: 24180434 PMCID: PMC4175106 DOI: 10.1186/1748-7188-8-29] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.4] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/31/2013] [Accepted: 10/14/2013] [Indexed: 01/06/2023] Open
Abstract
Background RNA performs many diverse functions in the cell in addition to its role as a messenger of genetic information. These functions depend on its ability to fold to a unique three-dimensional structure determined by the sequence. The conformation of RNA is in part determined by its secondary structure, or the particular set of contacts between pairs of complementary bases. Prediction of the secondary structure of RNA from its sequence is therefore of great interest, but can be computationally expensive. In this work we accelerate computations of base-pair probababilities using parallel graphics processing units (GPUs). Results Calculation of the probabilities of base pairs in RNA secondary structures using nearest-neighbor standard free energy change parameters has been implemented using CUDA to run on hardware with multiprocessor GPUs. A modified set of recursions was introduced, which reduces memory usage by about 25%. GPUs are fastest in single precision, and for some hardware, restricted to single precision. This may introduce significant roundoff error. However, deviations in base-pair probabilities calculated using single precision were found to be negligible compared to those resulting from shifting the nearest-neighbor parameters by a random amount of magnitude similar to their experimental uncertainties. For large sequences running on our particular hardware, the GPU implementation reduces execution time by a factor of close to 60 compared with an optimized serial implementation, and by a factor of 116 compared with the original code. Conclusions Using GPUs can greatly accelerate computation of RNA secondary structure partition functions, allowing calculation of base-pair probabilities for large sequences in a reasonable amount of time, with a negligible compromise in accuracy due to working in single precision. The source code is integrated into the RNAstructure software package and available for download at http://rna.urmc.rochester.edu.
Collapse
|
18
|
Bellaousov S, Reuter JS, Seetin MG, Mathews DH. RNAstructure: Web servers for RNA secondary structure prediction and analysis. Nucleic Acids Res 2013; 41:W471-4. [PMID: 23620284 PMCID: PMC3692136 DOI: 10.1093/nar/gkt290] [Citation(s) in RCA: 298] [Impact Index Per Article: 24.8] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 01/03/2023] Open
Abstract
RNAstructure is a software package for RNA secondary structure prediction and analysis. This contribution describes a new set of web servers to provide its functionality. The web server offers RNA secondary structure prediction, including free energy minimization, maximum expected accuracy structure prediction and pseudoknot prediction. Bimolecular secondary structure prediction is also provided. Additionally, the server can predict secondary structures conserved in either two homologs or more than two homologs. Folding free energy changes can be predicted for a given RNA structure using nearest neighbor rules. Secondary structures can be compared using circular plots or the scoring methods, sensitivity and positive predictive value. Additionally, structure drawings can be rendered as SVG, postscript, jpeg or pdf. The web server is freely available for public use at: http://rna.urmc.rochester.edu/RNAstructureWeb.
Collapse
Affiliation(s)
- Stanislav Bellaousov
- Department of Biochemistry and Biophysics, University of Rochester Medical Center, 601 Elmwood Avenue, Box 712, Rochester, NY 14642, USA
| | | | | | | |
Collapse
|
19
|
Will S, Joshi T, Hofacker IL, Stadler PF, Backofen R. LocARNA-P: accurate boundary prediction and improved detection of structural RNAs. RNA (NEW YORK, N.Y.) 2012; 18:900-14. [PMID: 22450757 PMCID: PMC3334699 DOI: 10.1261/rna.029041.111] [Citation(s) in RCA: 269] [Impact Index Per Article: 20.7] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Received: 07/01/2011] [Accepted: 01/18/2012] [Indexed: 05/18/2023]
Abstract
Current genomic screens for noncoding RNAs (ncRNAs) predict a large number of genomic regions containing potential structural ncRNAs. The analysis of these data requires highly accurate prediction of ncRNA boundaries and discrimination of promising candidate ncRNAs from weak predictions. Existing methods struggle with these goals because they rely on sequence-based multiple sequence alignments, which regularly misalign RNA structure and therefore do not support identification of structural similarities. To overcome this limitation, we compute columnwise and global reliabilities of alignments based on sequence and structure similarity; we refer to these structure-based alignment reliabilities as STARs. The columnwise STARs of alignments, or STAR profiles, provide a versatile tool for the manual and automatic analysis of ncRNAs. In particular, we improve the boundary prediction of the widely used ncRNA gene finder RNAz by a factor of 3 from a median deviation of 47 to 13 nt. Post-processing RNAz predictions, LocARNA-P's STAR score allows much stronger discrimination between true- and false-positive predictions than RNAz's own evaluation. The improved accuracy, in this scenario increased from AUC 0.71 to AUC 0.87, significantly reduces the cost of successive analysis steps. The ready-to-use software tool LocARNA-P produces structure-based multiple RNA alignments with associated columnwise STARs and predicts ncRNA boundaries. We provide additional results, a web server for LocARNA/LocARNA-P, and the software package, including documentation and a pipeline for refining screens for structural ncRNA, at http://www.bioinf.uni-freiburg.de/Supplements/LocARNA-P/.
Collapse
Affiliation(s)
- Sebastian Will
- Chair for Bioinformatics, Institute of Computer Science, Albert-Ludwigs-Universität, D-79110 Freiburg, Germany
- Computation and Biology Group, CSAIL and Mathematics Department, MIT, Cambridge, Massachusetts 02139, USA
| | - Tejal Joshi
- Center for Biological Sequence Analysis, Department of Systems Biology, Technical University of Denmark, DK-2800 Kongens Lyngby, Denmark
| | - Ivo L. Hofacker
- Department of Theoretical Chemistry, University of Vienna, A-1090 Wien, Austria
| | - Peter F. Stadler
- Department of Theoretical Chemistry, University of Vienna, A-1090 Wien, Austria
- Bioinformatics Group, Department of Computer Science, Interdisciplinary Center of Bioinformatics, University of Leipzig, D-04107 Leipzig, Germany
- Max-Planck-Institute for Mathematics in the Sciences, D-04103 Leipzig, Germany
- Fraunhofer Institute for Cell Therapy and Immunology, D-04103 Leipzig, Germany
- Santa Fe Institute, Santa Fe, New Mexico 87501, USA
| | - Rolf Backofen
- Chair for Bioinformatics, Institute of Computer Science, Albert-Ludwigs-Universität, D-79110 Freiburg, Germany
- Center for Biological Signaling Studies (BIOSS), University of Freiburg, D-79104 Freiburg, Germany
| |
Collapse
|
20
|
Hamada M, Asai K. A classification of bioinformatics algorithms from the viewpoint of maximizing expected accuracy (MEA). J Comput Biol 2012; 19:532-49. [PMID: 22313125 DOI: 10.1089/cmb.2011.0197] [Citation(s) in RCA: 12] [Impact Index Per Article: 0.9] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/12/2022] Open
Abstract
Many estimation problems in bioinformatics are formulated as point estimation problems in a high-dimensional discrete space. In general, it is difficult to design reliable estimators for this type of problem, because the number of possible solutions is immense, which leads to an extremely low probability for every solution-even for the one with the highest probability. Therefore, maximum score and maximum likelihood estimators do not work well in this situation although they are widely employed in a number of applications. Maximizing expected accuracy (MEA) estimation, in which accuracy measures of the target problem and the entire distribution of solutions are considered, is a more successful approach. In this review, we provide an extensive discussion of algorithms and software based on MEA. We describe how a number of algorithms used in previous studies can be classified from the viewpoint of MEA. We believe that this review will be useful not only for users wishing to utilize software to solve the estimation problems appearing in this article, but also for developers wishing to design algorithms on the basis of MEA.
Collapse
Affiliation(s)
- Michiaki Hamada
- Graduate School of Frontier Sciences, University of Tokyo, Kashiwa, Japan.
| | | |
Collapse
|
21
|
Seetin MG, Mathews DH. TurboKnot: rapid prediction of conserved RNA secondary structures including pseudoknots. ACTA ACUST UNITED AC 2012; 28:792-8. [PMID: 22285566 DOI: 10.1093/bioinformatics/bts044] [Citation(s) in RCA: 19] [Impact Index Per Article: 1.5] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 01/23/2023]
Abstract
MOTIVATION Many RNA molecules function without being translated into proteins, and function depends on structure. Pseudoknots are motifs in RNA secondary structures that are difficult to predict but are also often functionally important. RESULTS TurboKnot is a new algorithm for predicting the secondary structure, including pseudoknotted pairs, conserved across multiple sequences. TurboKnot finds 81.6% of all known base pairs in the systems tested, and 75.6% of predicted pairs were found in the known structures. Pseudoknots are found with half or better of the false-positive rate of previous methods.
Collapse
Affiliation(s)
- Matthew G Seetin
- Department of Biochemistry and Biophysics, University of Rochester Medical Center, Rochester, NY 14642, USA
| | | |
Collapse
|
22
|
Abstract
RNA is now appreciated to serve numerous cellular roles, and understanding RNA structure is important for understanding a mechanism of action. This contribution discusses the methods available for predicting RNA structure. Secondary structure is the set of the canonical base pairs, and secondary structure can be accurately determined by comparative sequence analysis. Secondary structure can also be predicted. The most commonly used method is free energy minimization. The accuracy of structure prediction is improved either by using experimental mapping data or by predicting a structure conserved in a set of homologous sequences. Additionally, tertiary structure, the three-dimensional arrangement of atoms, can be modeled with guidance from comparative analysis and experimental techniques. New approaches are also available for predicting tertiary structure.
Collapse
Affiliation(s)
- Matthew G Seetin
- Department of Biochemistry & Biophysics, University of Rochester Medical Center, Rochester, NY, USA
| | | |
Collapse
|
23
|
Wang X, Cai Y, Sun Y, Knight R, Mai V. Secondary structure information does not improve OTU assignment for partial 16s rRNA sequences. ISME JOURNAL 2011; 6:1277-80. [PMID: 22170430 DOI: 10.1038/ismej.2011.187] [Citation(s) in RCA: 21] [Impact Index Per Article: 1.5] [Reference Citation Analysis] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 11/09/2022]
Affiliation(s)
- Xiaoyu Wang
- Department of Microbiology and Cell Science, University of Florida, Gainesville, FL, USA
| | | | | | | | | |
Collapse
|
24
|
Harmanci AO, Sharma G, Mathews DH. TurboFold: iterative probabilistic estimation of secondary structures for multiple RNA sequences. BMC Bioinformatics 2011; 12:108. [PMID: 21507242 PMCID: PMC3120699 DOI: 10.1186/1471-2105-12-108] [Citation(s) in RCA: 69] [Impact Index Per Article: 4.9] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/03/2010] [Accepted: 04/20/2011] [Indexed: 01/07/2023] Open
Abstract
Background The prediction of secondary structure, i.e. the set of canonical base pairs between nucleotides, is a first step in developing an understanding of the function of an RNA sequence. The most accurate computational methods predict conserved structures for a set of homologous RNA sequences. These methods usually suffer from high computational complexity. In this paper, TurboFold, a novel and efficient method for secondary structure prediction for multiple RNA sequences, is presented. Results TurboFold takes, as input, a set of homologous RNA sequences and outputs estimates of the base pairing probabilities for each sequence. The base pairing probabilities for a sequence are estimated by combining intrinsic information, derived from the sequence itself via the nearest neighbor thermodynamic model, with extrinsic information, derived from the other sequences in the input set. For a given sequence, the extrinsic information is computed by using pairwise-sequence-alignment-based probabilities for co-incidence with each of the other sequences, along with estimated base pairing probabilities, from the previous iteration, for the other sequences. The extrinsic information is introduced as free energy modifications for base pairing in a partition function computation based on the nearest neighbor thermodynamic model. This process yields updated estimates of base pairing probability. The updated base pairing probabilities in turn are used to recompute extrinsic information, resulting in the overall iterative estimation procedure that defines TurboFold. TurboFold is benchmarked on a number of ncRNA datasets and compared against alternative secondary structure prediction methods. The iterative procedure in TurboFold is shown to improve estimates of base pairing probability with each iteration, though only small gains are obtained beyond three iterations. Secondary structures composed of base pairs with estimated probabilities higher than a significance threshold are shown to be more accurate for TurboFold than for alternative methods that estimate base pairing probabilities. TurboFold-MEA, which uses base pairing probabilities from TurboFold in a maximum expected accuracy algorithm for secondary structure prediction, has accuracy comparable to the best performing secondary structure prediction methods. The computational and memory requirements for TurboFold are modest and, in terms of sequence length and number of sequences, scale much more favorably than joint alignment and folding algorithms. Conclusions TurboFold is an iterative probabilistic method for predicting secondary structures for multiple RNA sequences that efficiently and accurately combines the information from the comparative analysis between sequences with the thermodynamic folding model. Unlike most other multi-sequence structure prediction methods, TurboFold does not enforce strict commonality of structures and is therefore useful for predicting structures for homologous sequences that have diverged significantly. TurboFold can be downloaded as part of the RNAstructure package at http://rna.urmc.rochester.edu.
Collapse
Affiliation(s)
- Arif O Harmanci
- Department of Electrical and Computer Engineering, University of Rochester, Rochester, NY, USA
| | | | | |
Collapse
|
25
|
Sahraeian SME, Yoon BJ. PicXAA-R: efficient structural alignment of multiple RNA sequences using a greedy approach. BMC Bioinformatics 2011; 12 Suppl 1:S38. [PMID: 21342569 PMCID: PMC3044294 DOI: 10.1186/1471-2105-12-s1-s38] [Citation(s) in RCA: 58] [Impact Index Per Article: 4.1] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 01/30/2023] Open
Abstract
Background Accurate and efficient structural alignment of non-coding RNAs (ncRNAs) has grasped more and more attentions as recent studies unveiled the significance of ncRNAs in living organisms. While the Sankoff style structural alignment algorithms cannot efficiently serve for multiple sequences, mostly progressive schemes are used to reduce the complexity. However, this idea tends to propagate the early stage errors throughout the entire process, thereby degrading the quality of the final alignment. For multiple protein sequence alignment, we have recently proposed PicXAA which constructs an accurate alignment in a non-progressive fashion. Results Here, we propose PicXAA-R as an extension to PicXAA for greedy structural alignment of ncRNAs. PicXAA-R efficiently grasps both folding information within each sequence and local similarities between sequences. It uses a set of probabilistic consistency transformations to improve the posterior base-pairing and base alignment probabilities using the information of all sequences in the alignment. Using a graph-based scheme, we greedily build up the structural alignment from sequence regions with high base-pairing and base alignment probabilities. Conclusions Several experiments on datasets with different characteristics confirm that PicXAA-R is one of the fastest algorithms for structural alignment of multiple RNAs and it consistently yields accurate alignment results, especially for datasets with locally similar sequences. PicXAA-R source code is freely available at: http://www.ece.tamu.edu/~bjyoon/picxaa/.
Collapse
|
26
|
Xu Z, Mathews DH. Multilign: an algorithm to predict secondary structures conserved in multiple RNA sequences. ACTA ACUST UNITED AC 2010; 27:626-32. [PMID: 21193521 DOI: 10.1093/bioinformatics/btq726] [Citation(s) in RCA: 48] [Impact Index Per Article: 3.2] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 01/28/2023]
Abstract
MOTIVATION With recent advances in sequencing, structural and functional studies of RNA lag behind the discovery of sequences. Computational analysis of RNA is increasingly important to reveal structure-function relationships with low cost and speed. The purpose of this study is to use multiple homologous sequences to infer a conserved RNA structure. RESULTS A new algorithm, called Multilign, is presented to find the lowest free energy RNA secondary structure common to multiple sequences. Multilign is based on Dynalign, which is a program that simultaneously aligns and folds two sequences to find the lowest free energy conserved structure. For Multilign, Dynalign is used to progressively construct a conserved structure from multiple pairwise calculations, with one sequence used in all pairwise calculations. A base pair is predicted only if it is contained in the set of low free energy structures predicted by all Dynalign calculations. In this way, Multilign improves prediction accuracy by keeping the genuine base pairs and excluding competing false base pairs. Multilign has computational complexity that scales linearly in the number of sequences. Multilign was tested on extensive datasets of sequences with known structure and its prediction accuracy is among the best of available algorithms. Multilign can run on long sequences (> 1500 nt) and an arbitrarily large number of sequences. AVAILABILITY The algorithm is implemented in ANSI C++ and can be downloaded as part of the RNAstructure package at: http://rna.urmc.rochester.edu.
Collapse
Affiliation(s)
- Zhenjiang Xu
- Department of Biochemistry and Biophysics, University of Rochester Medical Center, Rochester, NY, USA
| | | |
Collapse
|
27
|
Andronescu M, Condon A, Hoos HH, Mathews DH, Murphy KP. Computational approaches for RNA energy parameter estimation. RNA (NEW YORK, N.Y.) 2010; 16:2304-18. [PMID: 20940338 PMCID: PMC2995392 DOI: 10.1261/rna.1950510] [Citation(s) in RCA: 82] [Impact Index Per Article: 5.5] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 05/07/2023]
Abstract
Methods for efficient and accurate prediction of RNA structure are increasingly valuable, given the current rapid advances in understanding the diverse functions of RNA molecules in the cell. To enhance the accuracy of secondary structure predictions, we developed and refined optimization techniques for the estimation of energy parameters. We build on two previous approaches to RNA free-energy parameter estimation: (1) the Constraint Generation (CG) method, which iteratively generates constraints that enforce known structures to have energies lower than other structures for the same molecule; and (2) the Boltzmann Likelihood (BL) method, which infers a set of RNA free-energy parameters that maximize the conditional likelihood of a set of reference RNA structures. Here, we extend these approaches in two main ways: We propose (1) a max-margin extension of CG, and (2) a novel linear Gaussian Bayesian network that models feature relationships, which effectively makes use of sparse data by sharing statistical strength between parameters. We obtain significant improvements in the accuracy of RNA minimum free-energy pseudoknot-free secondary structure prediction when measured on a comprehensive set of 2518 RNA molecules with reference structures. Our parameters can be used in conjunction with software that predicts RNA secondary structures, RNA hybridization, or ensembles of structures. Our data, software, results, and parameter sets in various formats are freely available at http://www.cs.ubc.ca/labs/beta/Projects/RNA-Params.
Collapse
Affiliation(s)
- Mirela Andronescu
- Department of Genome Sciences, University of Washington, Seattle, Washington 98195, USA.
| | | | | | | | | |
Collapse
|
28
|
Mathews DH, Moss WN, Turner DH. Folding and finding RNA secondary structure. Cold Spring Harb Perspect Biol 2010; 2:a003665. [PMID: 20685845 DOI: 10.1101/cshperspect.a003665] [Citation(s) in RCA: 106] [Impact Index Per Article: 7.1] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/24/2022]
Abstract
Optimal exploitation of the expanding database of sequences requires rapid finding and folding of RNAs. Methods are reviewed that automate folding and discovery of RNAs with algorithms that couple thermodynamics with chemical mapping, NMR, and/or sequence comparison. New functional noncoding RNAs in genome sequences can be found by combining sequence comparison with the assumption that functional noncoding RNAs will have more favorable folding free energies than other RNAs. When a new RNA is discovered, experiments and sequence comparison can restrict folding space so that secondary structure can be rapidly determined with the help of predicted free energies. In turn, secondary structure restricts folding in three dimensions, which allows modeling of three-dimensional structure. An example from a domain of a retrotransposon is described. Discovery of new RNAs and their structures will provide insights into evolution, biology, and design of therapeutics. Applications to studies of evolution are also reviewed.
Collapse
Affiliation(s)
- David H Mathews
- Department of Biochemistry and Biophysics and Center for RNA Biology, University of Rochester School of Medicine and Dentistry, Rochester, New York 14642, USA
| | | | | |
Collapse
|
29
|
|
30
|
Engelen S, Tahi F. Tfold: efficient in silico prediction of non-coding RNA secondary structures. Nucleic Acids Res 2010; 38:2453-66. [PMID: 20047957 PMCID: PMC2853104 DOI: 10.1093/nar/gkp1067] [Citation(s) in RCA: 14] [Impact Index Per Article: 0.9] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/28/2009] [Revised: 10/30/2009] [Accepted: 11/02/2009] [Indexed: 11/12/2022] Open
Abstract
Predicting RNA secondary structures is a very important task, and continues to be a challenging problem, even though several methods and algorithms are proposed in the literature. In this article, we propose an algorithm called Tfold, for predicting non-coding RNA secondary structures. Tfold takes as input a RNA sequence for which the secondary structure is searched and a set of aligned homologous sequences. It combines criteria of stability, conservation and covariation in order to search for stems and pseudoknots (whatever their type). Stems are searched recursively, from the most to the least stable. Tfold uses an algorithm called SSCA for selecting the most appropriate sequences from a large set of homologous sequences (taken from a database for example) to use for the prediction. Tfold can take into account one or several stems considered by the user as belonging to the secondary structure. Tfold can return several structures (if requested by the user) when 'rival' stems are found. Tfold has a complexity of O(n(2)), with n the sequence length. The developed software, which offers several different uses, is available on the web site: http://tfold.ibisc.univ-evry.fr/TFold.
Collapse
Affiliation(s)
| | - Fariza Tahi
- IBISC laboratory CNRS FRE 3190, University of Evry/Genopole, 523 place des Terrasses, 91000 Evry, France
| |
Collapse
|
31
|
Reuter JS, Mathews DH. RNAstructure: software for RNA secondary structure prediction and analysis. BMC Bioinformatics 2010; 11:129. [PMID: 20230624 PMCID: PMC2984261 DOI: 10.1186/1471-2105-11-129] [Citation(s) in RCA: 1399] [Impact Index Per Article: 93.3] [Reference Citation Analysis] [Abstract] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/23/2009] [Accepted: 03/15/2010] [Indexed: 11/16/2022] Open
Abstract
Background To understand an RNA sequence's mechanism of action, the structure must be known. Furthermore, target RNA structure is an important consideration in the design of small interfering RNAs and antisense DNA oligonucleotides. RNA secondary structure prediction, using thermodynamics, can be used to develop hypotheses about the structure of an RNA sequence. Results RNAstructure is a software package for RNA secondary structure prediction and analysis. It uses thermodynamics and utilizes the most recent set of nearest neighbor parameters from the Turner group. It includes methods for secondary structure prediction (using several algorithms), prediction of base pair probabilities, bimolecular structure prediction, and prediction of a structure common to two sequences. This contribution describes new extensions to the package, including a library of C++ classes for incorporation into other programs, a user-friendly graphical user interface written in JAVA, and new Unix-style text interfaces. The original graphical user interface for Microsoft Windows is still maintained. Conclusion The extensions to RNAstructure serve to make RNA secondary structure prediction user-friendly. The package is available for download from the Mathews lab homepage at http://rna.urmc.rochester.edu/RNAstructure.html.
Collapse
Affiliation(s)
- Jessica S Reuter
- Department of Biochemistry & Biophysics and Center for RNA Biology, University of Rochester Medical Center, 601 Elmwood Avenue, Box 712, Rochester, NY 14642, USA
| | | |
Collapse
|
32
|
Hamada M, Sato K, Kiryu H, Mituyama T, Asai K. CentroidAlign: fast and accurate aligner for structured RNAs by maximizing expected sum-of-pairs score. Bioinformatics 2009; 25:3236-43. [DOI: 10.1093/bioinformatics/btp580] [Citation(s) in RCA: 40] [Impact Index Per Article: 2.5] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/23/2022] Open
|
33
|
Harmanci AO, Sharma G, Mathews DH. Stochastic sampling of the RNA structural alignment space. Nucleic Acids Res 2009; 37:4063-75. [PMID: 19429694 PMCID: PMC2709569 DOI: 10.1093/nar/gkp276] [Citation(s) in RCA: 14] [Impact Index Per Article: 0.9] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/13/2022] Open
Abstract
A novel method is presented for predicting the common secondary structures and alignment of two homologous RNA sequences by sampling the 'structural alignment' space, i.e. the joint space of their alignments and common secondary structures. The structural alignment space is sampled according to a pseudo-Boltzmann distribution based on a pseudo-free energy change that combines base pairing probabilities from a thermodynamic model and alignment probabilities from a hidden Markov model. By virtue of the implicit comparative analysis between the two sequences, the method offers an improvement over single sequence sampling of the Boltzmann ensemble. A cluster analysis shows that the samples obtained from joint sampling of the structural alignment space cluster more closely than samples generated by the single sequence method. On average, the representative (centroid) structure and alignment of the most populated cluster in the sample of structures and alignments generated by joint sampling are more accurate than single sequence sampling and alignment based on sequence alone, respectively. The 'best' centroid structure that is closest to the known structure among all the centroids is, on average, more accurate than structure predictions of other methods. Additionally, cluster analysis identifies, on average, a few clusters, whose centroids can be presented as alternative candidates. The source code for the proposed method can be downloaded at http://rna.urmc.rochester.edu.
Collapse
Affiliation(s)
- Arif Ozgun Harmanci
- Department of Electrical and Computer Engineering, University of Rochester, Hopeman 204, RC Box 270126, Rochester, NY 14627, USA
| | | | | |
Collapse
|
34
|
Taneda A. An efficient genetic algorithm for structural RNA pairwise alignment and its application to non-coding RNA discovery in yeast. BMC Bioinformatics 2008; 9:521. [PMID: 19061486 PMCID: PMC2630964 DOI: 10.1186/1471-2105-9-521] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.4] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/02/2008] [Accepted: 12/05/2008] [Indexed: 11/30/2022] Open
Abstract
Background Aligning RNA sequences with low sequence identity has been a challenging problem since such a computation essentially needs an algorithm with high complexities for taking structural conservation into account. Although many sophisticated algorithms for the purpose have been proposed to date, further improvement in efficiency is necessary to accelerate its large-scale applications including non-coding RNA (ncRNA) discovery. Results We developed a new genetic algorithm, Cofolga2, for simultaneously computing pairwise RNA sequence alignment and consensus folding, and benchmarked it using BRAliBase 2.1. The benchmark results showed that our new algorithm is accurate and efficient in both time and memory usage. Then, combining with the originally trained SVM, we applied the new algorithm to novel ncRNA discovery where we compared S. cerevisiae genome with six related genomes in a pairwise manner. By focusing our search to the relatively short regions (50 bp to 2,000 bp) sandwiched by conserved sequences, we successfully predict 714 intergenic and 1,311 sense or antisense ncRNA candidates, which were found in the pairwise alignments with stable consensus secondary structure and low sequence identity (≤ 50%). By comparing with the previous predictions, we found that > 92% of the candidates is novel candidates. The estimated rate of false positives in the predicted candidates is 51%. Twenty-five percent of the intergenic candidates has supports for expression in cell, i.e. their genomic positions overlap those of the experimentally determined transcripts in literature. By manual inspection of the results, moreover, we obtained four multiple alignments with low sequence identity which reveal consensus structures shared by three species/sequences. Conclusion The present method gives an efficient tool complementary to sequence-alignment-based ncRNA finders.
Collapse
Affiliation(s)
- Akito Taneda
- Graduate School of Science and Technology, Hirosaki University, Hirosaki, Japan.
| |
Collapse
|
35
|
Seemann SE, Gorodkin J, Backofen R. Unifying evolutionary and thermodynamic information for RNA folding of multiple alignments. Nucleic Acids Res 2008; 36:6355-62. [PMID: 18836192 PMCID: PMC2582601 DOI: 10.1093/nar/gkn544] [Citation(s) in RCA: 65] [Impact Index Per Article: 3.8] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/13/2022] Open
Abstract
Computational methods for determining the secondary structure of RNA sequences from given alignments are currently either based on thermodynamic folding, compensatory base pair substitutions or both. However, there is currently no approach that combines both sources of information in a single optimization problem. Here, we present a model that formally integrates both the energy-based and evolution-based approaches to predict the folding of multiple aligned RNA sequences. We have implemented an extended version of Pfold that identifies base pairs that have high probabilities of being conserved and of being energetically favorable. The consensus structure is predicted using a maximum expected accuracy scoring scheme to smoothen the effect of incorrectly predicted base pairs. Parameter tuning revealed that the probability of base pairing has a higher impact on the RNA structure prediction than the corresponding probability of being single stranded. Furthermore, we found that structurally conserved RNA motifs are mostly supported by folding energies. Other problems (e.g. RNA-folding kinetics) may also benefit from employing the principles of the model we introduce. Our implementation, PETfold, was tested on a set of 46 well-curated Rfam families and its performance compared favorably to that of Pfold and RNAalifold.
Collapse
Affiliation(s)
- Stefan E Seemann
- Division of Genetics and Bioinformatics, IBHV and Center for Applied Bioinformatics, University of Copenhagen, Groennegårdsvej 3, DK-1870 Frederiksberg C, Denmark
| | | | | |
Collapse
|
36
|
Abstract
RNAstructure is a user-friendly program for the prediction and analysis of RNA secondary structure under Microsoft Windows. This unit provides protocols for RNA secondary structure prediction and prediction of high-affinity oligonucleotide binding sites to a structured RNA target.
Collapse
Affiliation(s)
- David H Mathews
- University of Rochester Medical Center, Rochester, New York, USA
| |
Collapse
|