1
|
Tieng FYF, Abdullah-Zawawi MR, Md Shahri NAA, Mohamed-Hussein ZA, Lee LH, Mutalib NSA. A Hitchhiker's guide to RNA-RNA structure and interaction prediction tools. Brief Bioinform 2023; 25:bbad421. [PMID: 38040490 PMCID: PMC10753535 DOI: 10.1093/bib/bbad421] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/31/2023] [Revised: 10/16/2023] [Accepted: 10/26/2023] [Indexed: 12/03/2023] Open
Abstract
RNA biology has risen to prominence after a remarkable discovery of diverse functions of noncoding RNA (ncRNA). Most untranslated transcripts often exert their regulatory functions into RNA-RNA complexes via base pairing with complementary sequences in other RNAs. An interplay between RNAs is essential, as it possesses various functional roles in human cells, including genetic translation, RNA splicing, editing, ribosomal RNA maturation, RNA degradation and the regulation of metabolic pathways/riboswitches. Moreover, the pervasive transcription of the human genome allows for the discovery of novel genomic functions via RNA interactome investigation. The advancement of experimental procedures has resulted in an explosion of documented data, necessitating the development of efficient and precise computational tools and algorithms. This review provides an extensive update on RNA-RNA interaction (RRI) analysis via thermodynamic- and comparative-based RNA secondary structure prediction (RSP) and RNA-RNA interaction prediction (RIP) tools and their general functions. We also highlighted the current knowledge of RRIs and the limitations of RNA interactome mapping via experimental data. Then, the gap between RSP and RIP, the importance of RNA homologues, the relationship between pseudoknots, and RNA folding thermodynamics are discussed. It is hoped that these emerging prediction tools will deepen the understanding of RNA-associated interactions in human diseases and hasten treatment processes.
Collapse
Affiliation(s)
- Francis Yew Fu Tieng
- UKM Medical Molecular Biology Institute (UMBI), Universiti Kebangsaan Malaysia (UKM), Kuala Lumpur 56000, Malaysia
| | | | - Nur Alyaa Afifah Md Shahri
- UKM Medical Molecular Biology Institute (UMBI), Universiti Kebangsaan Malaysia (UKM), Kuala Lumpur 56000, Malaysia
| | - Zeti-Azura Mohamed-Hussein
- Institute of Systems Biology (INBIOSIS), UKM, Selangor 43600, Malaysia
- Department of Applied Physics, Faculty of Science and Technology, UKM, Selangor 43600, Malaysia
| | - Learn-Han Lee
- Sunway Microbiomics Centre, School of Medical and Life Sciences, Sunway University, Sunway City 47500, Malaysia
- Novel Bacteria and Drug Discovery Research Group, Microbiome and Bioresource Research Strength, Jeffrey Cheah School of Medicine and Health Sciences, Monash University of Malaysia, Selangor 47500, Malaysia
| | - Nurul-Syakima Ab Mutalib
- UKM Medical Molecular Biology Institute (UMBI), Universiti Kebangsaan Malaysia (UKM), Kuala Lumpur 56000, Malaysia
- Novel Bacteria and Drug Discovery Research Group, Microbiome and Bioresource Research Strength, Jeffrey Cheah School of Medicine and Health Sciences, Monash University of Malaysia, Selangor 47500, Malaysia
- Faculty of Health Sciences, UKM, Kuala Lumpur 50300, Malaysia
| |
Collapse
|
2
|
El Fatmi A, Bekri MA, Benhlima S. RNAknot: A new algorithm for RNA secondary structure prediction based on genetic algorithm and GRASP method. J Bioinform Comput Biol 2020; 17:1950031. [PMID: 31856666 DOI: 10.1142/s0219720019500318] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/13/2022]
Abstract
The prediction of the optimal secondary structure for a given RNA sequence represents a challenging computational problem in bioinformatics. This challenge becomes harder especially with the discovery of different pseudoknot classes, which is a complex topology that plays diverse roles in biological processes. Many recent studies have been proposed to predict RNA secondary structure with some pseudoknot classes, but only a few of them have reached satisfying results in terms of both complexity and accuracy. Here we present RNAknot, a new method for predicting RNA secondary structure that contains the following components: stems, hairpin loops, multi-branched loops or multi-loops, bulge loops, and internal loops, in addition to two types of pseudoknots, H-type pseudoknot and Hairpin kissing. RNAknot is based on a genetic algorithm and Greedy Randomized Adaptive Search Procedure (GRASP), and it uses the free energy as fitness function to evaluate the obtained structures. In order to validate the performance of the presented method 131 tests have been performed using two datasets of 26 and 105 RNA sequences, which have been taken from the two data bases RNAstrand and Pseudobase respectively. The obtained results are compared with those of some RNA secondary structure prediction programs such as Vs_subopt, CyloFold, IPknot, Kinefold, RNAstructure, and Sfold. The results of this comparative study show that the prediction accuracy of our proposed approach is significantly improved compared to those obtained by the other programs. For the first dataset, RNAknot has the highest specificity (SP) (71.23%) and sensitivity (SN) (72.15%) averages compared to the other programs. Concerning the second dataset, the RNA secondary structure predictions obtained by the RNAknot correspond to the highest averages of SP (85.49%) and F-measure (79.97%) compared to the other programs. The program is available as a jar file in the link: www.bachmek.umi.ac.ma/wp-content/uploads/RNAknot.0.0.2.rar.
Collapse
Affiliation(s)
- Abdelhakim El Fatmi
- Computer Science Department, MACS Lab, Faculty of Science, Moulay Ismail University, Meknes, BP 11201, Morocco
| | - M Ali Bekri
- Computer Science Department, MACS Lab, Faculty of Science, Moulay Ismail University, Meknes, BP 11201, Morocco
| | - Said Benhlima
- Computer Science Department, MACS Lab, Faculty of Science, Moulay Ismail University, Meknes, BP 11201, Morocco
| |
Collapse
|
3
|
Barquist L, Burge SW, Gardner PP. Studying RNA Homology and Conservation with Infernal: From Single Sequences to RNA Families. CURRENT PROTOCOLS IN BIOINFORMATICS 2016; 54:12.13.1-12.13.25. [PMID: 27322404 PMCID: PMC5010141 DOI: 10.1002/cpbi.4] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.8] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 11/09/2022]
Abstract
Emerging high-throughput technologies have led to a deluge of putative non-coding RNA (ncRNA) sequences identified in a wide variety of organisms. Systematic characterization of these transcripts will be a tremendous challenge. Homology detection is critical to making maximal use of functional information gathered about ncRNAs: identifying homologous sequence allows us to transfer information gathered in one organism to another quickly and with a high degree of confidence. ncRNA presents a challenge for homology detection, as the primary sequence is often poorly conserved and de novo secondary structure prediction and search remain difficult. This unit introduces methods developed by the Rfam database for identifying "families" of homologous ncRNAs starting from single "seed" sequences, using manually curated sequence alignments to build powerful statistical models of sequence and structure conservation known as covariance models (CMs), implemented in the Infernal software package. We provide a step-by-step iterative protocol for identifying ncRNA homologs and then constructing an alignment and corresponding CM. We also work through an example for the bacterial small RNA MicA, discovering a previously unreported family of divergent MicA homologs in genus Xenorhabdus in the process. © 2016 by John Wiley & Sons, Inc.
Collapse
Affiliation(s)
- Lars Barquist
- Institute for Molecular Infection Biology, University of Würzburg, Würzburg, D-97080 Germany
- Wellcome Trust Sanger Institute, Hinxton, Cambridge, CB10 1SA United Kingdom; Fax: +44 (0)1223 494919
| | - Sarah W. Burge
- Wellcome Trust Sanger Institute, Hinxton, Cambridge, CB10 1SA United Kingdom; Fax: +44 (0)1223 494919
| | - Paul P. Gardner
- School of Biological Sciences, University of Canterbury, Private Bag 4800, Christchurch, New Zealand
- Biomolecular Interaction Centre, University of Canterbury, Private Bag 4800, Christchurch, New Zealand
| |
Collapse
|
4
|
Zhang Y, Huang H, Dong X, Fang Y, Wang K, Zhu L, Wang K, Huang T, Yang J. A Dynamic 3D Graphical Representation for RNA Structure Analysis and Its Application in Non-Coding RNA Classification. PLoS One 2016; 11:e0152238. [PMID: 27213271 PMCID: PMC4877074 DOI: 10.1371/journal.pone.0152238] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.5] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/22/2015] [Accepted: 03/10/2016] [Indexed: 12/21/2022] Open
Abstract
With the development of new technologies in transcriptome and epigenetics, RNAs have been identified to play more and more important roles in life processes. Consequently, various methods have been proposed to assess the biological functions of RNAs and thus classify them functionally, among which comparative study of RNA structures is perhaps the most important one. To measure the structural similarity of RNAs and classify them, we propose a novel three dimensional (3D) graphical representation of RNA secondary structure, in which an RNA secondary structure is first transformed into a characteristic sequence based on chemical property of nucleic acids; a dynamic 3D graph is then constructed for the characteristic sequence; and lastly a numerical characterization of the 3D graph is used to represent the RNA secondary structure. We tested our algorithm on three datasets: (1) Dataset I consisting of nine RNA secondary structures of viruses, (2) Dataset II consisting of complex RNA secondary structures including pseudo-knots, and (3) Dataset III consisting of 18 non-coding RNA families. We also compare our method with other nine existing methods using Dataset II and III. The results demonstrate that our method is better than other methods in similarity measurement and classification of RNA secondary structures.
Collapse
Affiliation(s)
- Yi Zhang
- Department of Mathematics, Hebei University of Science and Technology, Shijiazhuang, Hebei 050018, People's Republic of China
- Hebei Laboratory of Pharmaceutic Molecular Chemistry, Shijiazhuang, Hebei 050018, People's Republic of China
- * E-mail: (JY); (YZ); (TH)
| | - Haiyun Huang
- Department of Information Retrieval of Library, Hebei University of Science and Technology, Shijiazhuang, Hebei 050018, People's Republic of China
| | - Xiaoqing Dong
- Department of Mathematics, Hebei University of Science and Technology, Shijiazhuang, Hebei 050018, People's Republic of China
| | - Yiliang Fang
- International Travel Healthcare Center, Fuzhou, Fujian 350001, People's Republic of China
| | - Kejing Wang
- Department of Mathematics, Hebei University of Science and Technology, Shijiazhuang, Hebei 050018, People's Republic of China
| | - Lijuan Zhu
- Department of Mathematics, Hebei University of Science and Technology, Shijiazhuang, Hebei 050018, People's Republic of China
| | - Ke Wang
- Department of Mathematics, Hebei University of Science and Technology, Shijiazhuang, Hebei 050018, People's Republic of China
| | - Tao Huang
- Institute of Health Sciences, Shanghai Institutes for Biological Sciences, Chinese Academy of Sciences, Shanghai 200031, People's Republic of China
- * E-mail: (JY); (YZ); (TH)
| | - Jialiang Yang
- Department of Mathematics, Hebei University of Science and Technology, Shijiazhuang, Hebei 050018, People's Republic of China
- Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, NY 10029, United States of America
- * E-mail: (JY); (YZ); (TH)
| |
Collapse
|
5
|
Abstract
RNA structure is conserved by evolution to a greater extent than sequence. Predicting the conserved structure for multiple homologous sequences can be much more accurate than predicting the structure for a single sequence. RNAstructure is a software package that includes the programs Dynalign, Multilign, TurboFold, and PARTS for predicting conserved RNA secondary structure. This chapter provides protocols for using these programs.
Collapse
|
6
|
Chatzou M, Magis C, Chang JM, Kemena C, Bussotti G, Erb I, Notredame C. Multiple sequence alignment modeling: methods and applications. Brief Bioinform 2015; 17:1009-1023. [PMID: 26615024 DOI: 10.1093/bib/bbv099] [Citation(s) in RCA: 84] [Impact Index Per Article: 9.3] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/10/2015] [Revised: 10/16/2015] [Indexed: 12/20/2022] Open
Abstract
This review provides an overview on the development of Multiple sequence alignment (MSA) methods and their main applications. It is focused on progress made over the past decade. The three first sections review recent algorithmic developments for protein, RNA/DNA and genomic alignments. The fourth section deals with benchmarks and explores the relationship between empirical and simulated data, along with the impact on method developments. The last part of the review gives an overview on available MSA local reliability estimators and their dependence on various algorithmic properties of available methods.
Collapse
|
7
|
Song Y, Hua L, Shapiro BA, Wang JTL. Effective alignment of RNA pseudoknot structures using partition function posterior log-odds scores. BMC Bioinformatics 2015; 16:39. [PMID: 25727492 PMCID: PMC4339682 DOI: 10.1186/s12859-015-0464-9] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.2] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/28/2014] [Accepted: 01/13/2015] [Indexed: 11/18/2022] Open
Abstract
Background RNA pseudoknots play important roles in many biological processes. Previous methods for comparative pseudoknot analysis mainly focus on simultaneous folding and alignment of RNA sequences. Little work has been done to align two known RNA secondary structures with pseudoknots taking into account both sequence and structure information of the two RNAs. Results In this article we present a novel method for aligning two known RNA secondary structures with pseudoknots. We adopt the partition function methodology to calculate the posterior log-odds scores of the alignments between bases or base pairs of the two RNAs with a dynamic programming algorithm. The posterior log-odds scores are then used to calculate the expected accuracy of an alignment between the RNAs. The goal is to find an optimal alignment with the maximum expected accuracy. We present a heuristic to achieve this goal. The performance of our method is investigated and compared with existing tools for RNA structure alignment. An extension of the method to multiple alignment of pseudoknot structures is also discussed. Conclusions The method described here has been implemented in a tool named RKalign, which is freely accessible on the Internet. As more and more pseudoknots are revealed, collected and stored in public databases, we anticipate a tool like RKalign will play a significant role in data comparison, annotation, analysis, and retrieval in these databases. Electronic supplementary material The online version of this article (doi:10.1186/s12859-015-0464-9) contains supplementary material, which is available to authorized users.
Collapse
|
8
|
Sloma MF, Mathews DH. Improving RNA secondary structure prediction with structure mapping data. Methods Enzymol 2015; 553:91-114. [PMID: 25726462 DOI: 10.1016/bs.mie.2014.10.053] [Citation(s) in RCA: 41] [Impact Index Per Article: 4.6] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/14/2022]
Abstract
Methods to probe RNA secondary structure, such as small molecule modifying agents, secondary structure-specific nucleases, inline probing, and SHAPE chemistry, are widely used to study the structure of functional RNA. Computational secondary structure prediction programs can incorporate probing data to predict structure with high accuracy. In this chapter, an overview of current methods for probing RNA secondary structure is provided, including modern high-throughput methods. Methods for guiding secondary structure prediction algorithms using these data are explained, and best practices for using these data are provided. This chapter concludes by listing a number of open questions about how to best use probing data, and what these data can provide.
Collapse
Affiliation(s)
- Michael F Sloma
- Department of Biochemistry & Biophysics, University of Rochester Medical Center, Box 712, Rochester, New York, USA; Center for RNA Biology, University of Rochester Medical Center, Box 712, Rochester, New York, USA
| | - David H Mathews
- Department of Biochemistry & Biophysics, University of Rochester Medical Center, Box 712, Rochester, New York, USA; Center for RNA Biology, University of Rochester Medical Center, Box 712, Rochester, New York, USA.
| |
Collapse
|
9
|
Theil Have C, Zambach S, Christiansen H. Effects of using coding potential, sequence conservation and mRNA structure conservation for predicting pyrrolysine containing genes. BMC Bioinformatics 2013; 14:118. [PMID: 23557142 PMCID: PMC3639795 DOI: 10.1186/1471-2105-14-118] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/11/2012] [Accepted: 03/19/2013] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND Pyrrolysine (the 22nd amino acid) is in certain organisms and under certain circumstances encoded by the amber stop codon, UAG. The circumstances driving pyrrolysine translation are not well understood. The involvement of a predicted mRNA structure in the region downstream UAG has been suggested, but the structure does not seem to be present in all pyrrolysine incorporating genes. RESULTS We propose a strategy to predict pyrrolysine encoding genes in genomes of archaea and bacteria. We cluster open reading frames interrupted by the amber codon based on sequence similarity. We rank these clusters according to several features that may influence pyrrolysine translation. The ranking effects of different features are assessed and we propose a weighted combination of these features which best explains the currently known pyrrolysine incorporating genes. We devote special attention to the effect of structural conservation and provide further substantiation to support that structural conservation may be influential - but is not a necessary factor. Finally, from the weighted ranking, we identify a number of potentially pyrrolysine incorporating genes. CONCLUSIONS We propose a method for prediction of pyrrolysine incorporating genes in genomes of bacteria and archaea leading to insights about the factors driving pyrrolysine translation and identification of new gene candidates. The method predicts known conserved genes with high recall and predicts several other promising candidates for experimental verification. The method is implemented as a computational pipeline which is available on request.
Collapse
Affiliation(s)
- Christian Theil Have
- Research group PLIS: Programming, Logic and Intelligent Systems, Department of Communication, Business and Information Technologies, Roskilde University, P.O. Box 260, Roskilde, DK-4000, Denmark.
| | | | | |
Collapse
|
10
|
Puton T, Kozlowski LP, Rother KM, Bujnicki JM. CompaRNA: a server for continuous benchmarking of automated methods for RNA secondary structure prediction. Nucleic Acids Res 2013; 41:4307-23. [PMID: 23435231 PMCID: PMC3627593 DOI: 10.1093/nar/gkt101] [Citation(s) in RCA: 81] [Impact Index Per Article: 7.4] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/26/2022] Open
Abstract
We present a continuous benchmarking approach for the assessment of RNA secondary structure prediction methods implemented in the CompaRNA web server. As of 3 October 2012, the performance of 28 single-sequence and 13 comparative methods has been evaluated on RNA sequences/structures released weekly by the Protein Data Bank. We also provide a static benchmark generated on RNA 2D structures derived from the RNAstrand database. Benchmarks on both data sets offer insight into the relative performance of RNA secondary structure prediction methods on RNAs of different size and with respect to different types of structure. According to our tests, on the average, the most accurate predictions obtained by a comparative approach are generated by CentroidAlifold, MXScarna, RNAalifold and TurboFold. On the average, the most accurate predictions obtained by single-sequence analyses are generated by CentroidFold, ContextFold and IPknot. The best comparative methods typically outperform the best single-sequence methods if an alignment of homologous RNA sequences is available. This article presents the results of our benchmarks as of 3 October 2012, whereas the rankings presented online are continuously updated. We will gladly include new prediction methods and new measures of accuracy in the new editions of CompaRNA benchmarks.
Collapse
Affiliation(s)
- Tomasz Puton
- Bioinformatics Laboratory, Institute for Molecular Biology and Biotechnology, Faculty of Biology, Adam Mickiewicz University, ul. Umultowska 89, 61-614 Poznan, Poland
| | | | | | | |
Collapse
|
11
|
Shao J, Zhang J, Zhang Z, Jiang H, Lou X, Huang B, Foltz G, Lan Q, Huang Q, Lin B. Alternative polyadenylation in glioblastoma multiforme and changes in predicted RNA binding protein profiles. OMICS-A JOURNAL OF INTEGRATIVE BIOLOGY 2013; 17:136-49. [PMID: 23421905 DOI: 10.1089/omi.2012.0098] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.2] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 11/12/2022]
Abstract
Alternative polyadenylation (APA) is widely present in the human genome and plays a key role in carcinogenesis. We conducted a comprehensive analysis of the APA products in glioblastoma multiforme (GBM, one of the most lethal brain tumors) and normal brain tissues and further developed a computational pipeline, RNAelements (http://sysbio.zju.edu.cn/RNAelements/), using covariance model from known RNA binding protein (RBP) targets acquired by RNA Immunoprecipitation (RIP) analysis. We identified 4530 APA isoforms for 2733 genes in GBM, and found that 182 APA isoforms from 148 genes showed significant differential expression between normal and GBM brain tissues. We then focused on three genes with long and short APA isoforms that show inconsistent expression changes between normal and GBM brain tissues. These were myocyte enhancer factor 2D, heat shock factor binding protein 1, and polyhomeotic homolog 1 (Drosophila). Using the RNAelements program, we found that RBP binding sites were enriched in the alternative regions between the first and the last polyadenylation sites, which would result in the short APA forms escaping regulation from those RNA binding proteins. To the best of our knowledge, this report is the first comprehensive APA isoform dataset for GBM and normal brain tissues. Additionally, we demonstrated a putative novel APA-mediated mechanism for controlling RNA stability and translation for APA isoforms. These observations collectively lay a foundation for novel diagnostics and molecular mechanisms that can inform future therapeutic interventions for GBM.
Collapse
Affiliation(s)
- Jiaofang Shao
- Systems Biology Division, Zhejiang-California International NanoSystems Institute, Zhejiang University, Hangzhou, China
| | | | | | | | | | | | | | | | | | | |
Collapse
|
12
|
Sato K, Kato Y, Akutsu T, Asai K, Sakakibara Y. DAFS: simultaneous aligning and folding of RNA sequences via dual decomposition. ACTA ACUST UNITED AC 2012; 28:3218-24. [PMID: 23060618 DOI: 10.1093/bioinformatics/bts612] [Citation(s) in RCA: 34] [Impact Index Per Article: 2.8] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/23/2022]
Abstract
MOTIVATION It is well known that the accuracy of RNA secondary structure prediction from a single sequence is limited, and thus a comparative approach that predicts a common secondary structure from aligned sequences is a better choice if homologous sequences with reliable alignments are available. However, correct secondary structure information is needed to produce reliable alignments of RNA sequences. To tackle this dilemma, we require a fast and accurate aligner that takes structural information into consideration to yield reliable structural alignments, which are suitable for common secondary structure prediction. RESULTS We develop DAFS, a novel algorithm that simultaneously aligns and folds RNA sequences based on maximizing expected accuracy of a predicted common secondary structure and its alignment. DAFS decomposes the pairwise structural alignment problem into two independent secondary structure prediction problems and one pairwise (non-structural) alignment problem by the dual decomposition technique, and maintains the consistency of a pairwise structural alignment by imposing penalties on inconsistent base pairs and alignment columns that are iteratively updated. Furthermore, we extend DAFS to consider pseudoknots in RNA structural alignments by integrating IPknot for predicting a pseudoknotted structure. The experiments on publicly available datasets showed that DAFS can produce reliable structural alignments from unaligned sequences in terms of accuracy of common secondary structure prediction.
Collapse
Affiliation(s)
- Kengo Sato
- Department of Biosciences and Informatics, Keio University, 3-14-1 Hiyoshi, Kohoku-ku, Yokohama 223-8522, Japan.
| | | | | | | | | |
Collapse
|
13
|
Seetin MG, Mathews DH. TurboKnot: rapid prediction of conserved RNA secondary structures including pseudoknots. ACTA ACUST UNITED AC 2012; 28:792-8. [PMID: 22285566 DOI: 10.1093/bioinformatics/bts044] [Citation(s) in RCA: 19] [Impact Index Per Article: 1.6] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 01/23/2023]
Abstract
MOTIVATION Many RNA molecules function without being translated into proteins, and function depends on structure. Pseudoknots are motifs in RNA secondary structures that are difficult to predict but are also often functionally important. RESULTS TurboKnot is a new algorithm for predicting the secondary structure, including pseudoknotted pairs, conserved across multiple sequences. TurboKnot finds 81.6% of all known base pairs in the systems tested, and 75.6% of predicted pairs were found in the known structures. Pseudoknots are found with half or better of the false-positive rate of previous methods.
Collapse
Affiliation(s)
- Matthew G Seetin
- Department of Biochemistry and Biophysics, University of Rochester Medical Center, Rochester, NY 14642, USA
| | | |
Collapse
|
14
|
Xu Z, Almudevar A, Mathews DH. Statistical evaluation of improvement in RNA secondary structure prediction. Nucleic Acids Res 2011; 40:e26. [PMID: 22139940 PMCID: PMC3287165 DOI: 10.1093/nar/gkr1081] [Citation(s) in RCA: 23] [Impact Index Per Article: 1.8] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/13/2022] Open
Abstract
With discovery of diverse roles for RNA, its centrality in cellular functions has become increasingly apparent. A number of algorithms have been developed to predict RNA secondary structure. Their performance has been benchmarked by comparing structure predictions to reference secondary structures. Generally, algorithms are compared against each other and one is selected as best without statistical testing to determine whether the improvement is significant. In this work, it is demonstrated that the prediction accuracies of methods correlate with each other over sets of sequences. One possible reason for this correlation is that many algorithms use the same underlying principles. A set of benchmarks published previously for programs that predict a structure common to three or more sequences is statistically analyzed as an example to show that it can be rigorously evaluated using paired two-sample t-tests. Finally, a pipeline of statistical analyses is proposed to guide the choice of data set size and performance assessment for benchmarks of structure prediction. The pipeline is applied using 5S rRNA sequences as an example.
Collapse
Affiliation(s)
- Zhenjiang Xu
- Department of Biochemistry and Biophysics and Center for RNA Biology, University of Rochester Medical Center, Rochester, NY, USA
| | | | | |
Collapse
|
15
|
Achawanantakun R, Sun Y, Takyar SS. ncRNA consensus secondary structure derivation using grammar strings. J Bioinform Comput Biol 2011; 9:317-37. [PMID: 21523935 DOI: 10.1142/s0219720011005501] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/02/2011] [Revised: 02/28/2011] [Accepted: 03/01/2011] [Indexed: 11/18/2022]
Abstract
Many noncoding RNAs (ncRNAs) function through both their sequences and secondary structures. Thus, secondary structure derivation is an important issue in today's RNA research. The state-of-the-art structure annotation tools are based on comparative analysis, which derives consensus structure of homologous ncRNAs. Despite promising results from existing ncRNA aligning and consensus structure derivation tools, there is a need for more efficient and accurate ncRNA secondary structure modeling and alignment methods. In this work, we introduce a consensus structure derivation approach based on grammar string, a novel ncRNA secondary structure representation that encodes an ncRNA's sequence and secondary structure in the parameter space of a context-free grammar (CFG) and a full RNA grammar including pseudoknots. Being a string defined on a special alphabet constructed from a grammar, grammar string converts ncRNA alignment into sequence alignment. We derive consensus secondary structures from hundreds of ncRNA families from BraliBase 2.1 and 25 families containing pseudoknots using grammar string alignment. Our experiments have shown that grammar string-based structure derivation competes favorably in consensus structure quality with Murlet and RNASampler. Source code and experimental data are available at http://www.cse.msu.edu/~yannisun/grammar-string.
Collapse
Affiliation(s)
- Rujira Achawanantakun
- Computer Science and Engineering Department, Michigan State University, East Lansing, Michigan 48824, USA
| | | | | |
Collapse
|
16
|
Wei D, Alpert LV, Lawrence CE. RNAG: a new Gibbs sampler for predicting RNA secondary structure for unaligned sequences. ACTA ACUST UNITED AC 2011; 27:2486-93. [PMID: 21788211 PMCID: PMC3167047 DOI: 10.1093/bioinformatics/btr421] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.2] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/14/2022]
Abstract
MOTIVATION RNA secondary structure plays an important role in the function of many RNAs, and structural features are often key to their interaction with other cellular components. Thus, there has been considerable interest in the prediction of secondary structures for RNA families. In this article, we present a new global structural alignment algorithm, RNAG, to predict consensus secondary structures for unaligned sequences. It uses a blocked Gibbs sampling algorithm, which has a theoretical advantage in convergence time. This algorithm iteratively samples from the conditional probability distributions P(Structure | Alignment) and P(Alignment | Structure). Not surprisingly, there is considerable uncertainly in the high-dimensional space of this difficult problem, which has so far received limited attention in this field. We show how the samples drawn from this algorithm can be used to more fully characterize the posterior space and to assess the uncertainty of predictions. RESULTS Our analysis of three publically available datasets showed a substantial improvement in RNA structure prediction by RNAG over extant prediction methods. Additionally, our analysis of 17 RNA families showed that the RNAG sampled structures were generally compact around their ensemble centroids, and at least 11 families had at least two well-separated clusters of predicted structures. In general, the distance between a reference structure and our predicted structure was large relative to the variation among structures within an ensemble. AVAILABILITY The Perl implementation of the RNAG algorithm and the data necessary to reproduce the results described in Sections 3.1 and 3.2 are available at http://ccmbweb.ccv.brown.edu/rnag.html CONTACT charles_lawrence@brown.edu SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
Collapse
Affiliation(s)
- Donglai Wei
- Department of Mathematics, Center for Computational Molecular Biology, Brown University, Providence, RI 02912, USA
| | | | | |
Collapse
|
17
|
Sahraeian SME, Yoon BJ. PicXAA-Web: a web-based platform for non-progressive maximum expected accuracy alignment of multiple biological sequences. Nucleic Acids Res 2011; 39:W8-12. [PMID: 21515632 PMCID: PMC3125727 DOI: 10.1093/nar/gkr244] [Citation(s) in RCA: 10] [Impact Index Per Article: 0.8] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 01/16/2023] Open
Abstract
In this article, we introduce PicXAA-Web, a web-based platform for accurate probabilistic alignment of multiple biological sequences. The core of PicXAA-Web consists of PicXAA, a multiple protein/DNA sequence alignment algorithm, and PicXAA-R, an extension of PicXAA for structural alignment of RNA sequences. Both PicXAA and PicXAA-R are probabilistic non-progressive alignment algorithms that aim to find the optimal alignment of multiple biological sequences by maximizing the expected accuracy. PicXAA and PicXAA-R greedily build up the alignment from sequence regions with high local similarity, thereby yielding an accurate global alignment that effectively captures local similarities among sequences. PicXAA-Web integrates these two algorithms in a user-friendly web platform for accurate alignment and analysis of multiple protein, DNA and RNA sequences. PicXAA-Web can be freely accessed at http://gsp.tamu.edu/picxaa/.
Collapse
|
18
|
Harmanci AO, Sharma G, Mathews DH. TurboFold: iterative probabilistic estimation of secondary structures for multiple RNA sequences. BMC Bioinformatics 2011; 12:108. [PMID: 21507242 PMCID: PMC3120699 DOI: 10.1186/1471-2105-12-108] [Citation(s) in RCA: 69] [Impact Index Per Article: 5.3] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/03/2010] [Accepted: 04/20/2011] [Indexed: 01/07/2023] Open
Abstract
Background The prediction of secondary structure, i.e. the set of canonical base pairs between nucleotides, is a first step in developing an understanding of the function of an RNA sequence. The most accurate computational methods predict conserved structures for a set of homologous RNA sequences. These methods usually suffer from high computational complexity. In this paper, TurboFold, a novel and efficient method for secondary structure prediction for multiple RNA sequences, is presented. Results TurboFold takes, as input, a set of homologous RNA sequences and outputs estimates of the base pairing probabilities for each sequence. The base pairing probabilities for a sequence are estimated by combining intrinsic information, derived from the sequence itself via the nearest neighbor thermodynamic model, with extrinsic information, derived from the other sequences in the input set. For a given sequence, the extrinsic information is computed by using pairwise-sequence-alignment-based probabilities for co-incidence with each of the other sequences, along with estimated base pairing probabilities, from the previous iteration, for the other sequences. The extrinsic information is introduced as free energy modifications for base pairing in a partition function computation based on the nearest neighbor thermodynamic model. This process yields updated estimates of base pairing probability. The updated base pairing probabilities in turn are used to recompute extrinsic information, resulting in the overall iterative estimation procedure that defines TurboFold. TurboFold is benchmarked on a number of ncRNA datasets and compared against alternative secondary structure prediction methods. The iterative procedure in TurboFold is shown to improve estimates of base pairing probability with each iteration, though only small gains are obtained beyond three iterations. Secondary structures composed of base pairs with estimated probabilities higher than a significance threshold are shown to be more accurate for TurboFold than for alternative methods that estimate base pairing probabilities. TurboFold-MEA, which uses base pairing probabilities from TurboFold in a maximum expected accuracy algorithm for secondary structure prediction, has accuracy comparable to the best performing secondary structure prediction methods. The computational and memory requirements for TurboFold are modest and, in terms of sequence length and number of sequences, scale much more favorably than joint alignment and folding algorithms. Conclusions TurboFold is an iterative probabilistic method for predicting secondary structures for multiple RNA sequences that efficiently and accurately combines the information from the comparative analysis between sequences with the thermodynamic folding model. Unlike most other multi-sequence structure prediction methods, TurboFold does not enforce strict commonality of structures and is therefore useful for predicting structures for homologous sequences that have diverged significantly. TurboFold can be downloaded as part of the RNAstructure package at http://rna.urmc.rochester.edu.
Collapse
Affiliation(s)
- Arif O Harmanci
- Department of Electrical and Computer Engineering, University of Rochester, Rochester, NY, USA
| | | | | |
Collapse
|
19
|
Sahraeian SME, Yoon BJ. PicXAA-R: efficient structural alignment of multiple RNA sequences using a greedy approach. BMC Bioinformatics 2011; 12 Suppl 1:S38. [PMID: 21342569 PMCID: PMC3044294 DOI: 10.1186/1471-2105-12-s1-s38] [Citation(s) in RCA: 61] [Impact Index Per Article: 4.7] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 01/30/2023] Open
Abstract
Background Accurate and efficient structural alignment of non-coding RNAs (ncRNAs) has grasped more and more attentions as recent studies unveiled the significance of ncRNAs in living organisms. While the Sankoff style structural alignment algorithms cannot efficiently serve for multiple sequences, mostly progressive schemes are used to reduce the complexity. However, this idea tends to propagate the early stage errors throughout the entire process, thereby degrading the quality of the final alignment. For multiple protein sequence alignment, we have recently proposed PicXAA which constructs an accurate alignment in a non-progressive fashion. Results Here, we propose PicXAA-R as an extension to PicXAA for greedy structural alignment of ncRNAs. PicXAA-R efficiently grasps both folding information within each sequence and local similarities between sequences. It uses a set of probabilistic consistency transformations to improve the posterior base-pairing and base alignment probabilities using the information of all sequences in the alignment. Using a graph-based scheme, we greedily build up the structural alignment from sequence regions with high base-pairing and base alignment probabilities. Conclusions Several experiments on datasets with different characteristics confirm that PicXAA-R is one of the fastest algorithms for structural alignment of multiple RNAs and it consistently yields accurate alignment results, especially for datasets with locally similar sequences. PicXAA-R source code is freely available at: http://www.ece.tamu.edu/~bjyoon/picxaa/.
Collapse
|
20
|
Abstract
Like protein coding sequences, functional motifs in RNA elements are frequently conserved, but this conservation is most often at the structure level rather than sequence based. Proper characterization of these structural RNA motifs is both the key and the limiting step to understanding the nature of RNA-protein interactions. The discovery of elements targeted by RNA-binding proteins and how they function remains one of the most active, yet elusive areas of RNA biology. Only a limited number of these elements have been well characterized with many of the fundamental rules yet to be discovered. Here we present a comprehensive list of web based resources that can be used in the study and identification of RNA-based structural and regulatory motifs and provide a survey of the informatic resources that can have been developed to facilitate this research.
Collapse
Affiliation(s)
- Ajish D George
- Department of Biomedical Sciences, School of Public Health, Gen∗NY∗Sis Center for Excellence in Cancer Genomics, University at Albany-SUNY, Rensselaer, NY, USA.
| | | |
Collapse
|
21
|
Xu Z, Mathews DH. Multilign: an algorithm to predict secondary structures conserved in multiple RNA sequences. ACTA ACUST UNITED AC 2010; 27:626-32. [PMID: 21193521 DOI: 10.1093/bioinformatics/btq726] [Citation(s) in RCA: 48] [Impact Index Per Article: 3.4] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 01/28/2023]
Abstract
MOTIVATION With recent advances in sequencing, structural and functional studies of RNA lag behind the discovery of sequences. Computational analysis of RNA is increasingly important to reveal structure-function relationships with low cost and speed. The purpose of this study is to use multiple homologous sequences to infer a conserved RNA structure. RESULTS A new algorithm, called Multilign, is presented to find the lowest free energy RNA secondary structure common to multiple sequences. Multilign is based on Dynalign, which is a program that simultaneously aligns and folds two sequences to find the lowest free energy conserved structure. For Multilign, Dynalign is used to progressively construct a conserved structure from multiple pairwise calculations, with one sequence used in all pairwise calculations. A base pair is predicted only if it is contained in the set of low free energy structures predicted by all Dynalign calculations. In this way, Multilign improves prediction accuracy by keeping the genuine base pairs and excluding competing false base pairs. Multilign has computational complexity that scales linearly in the number of sequences. Multilign was tested on extensive datasets of sequences with known structure and its prediction accuracy is among the best of available algorithms. Multilign can run on long sequences (> 1500 nt) and an arbitrarily large number of sequences. AVAILABILITY The algorithm is implemented in ANSI C++ and can be downloaded as part of the RNAstructure package at: http://rna.urmc.rochester.edu.
Collapse
Affiliation(s)
- Zhenjiang Xu
- Department of Biochemistry and Biophysics, University of Rochester Medical Center, Rochester, NY, USA
| | | |
Collapse
|
22
|
Chen Q, Li G, Phoebe Chen YP. Interval-based distance function for identifying RNA structure candidates. J Theor Biol 2010; 269:280-6. [PMID: 21056578 DOI: 10.1016/j.jtbi.2010.11.002] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/15/2010] [Revised: 11/01/2010] [Accepted: 11/01/2010] [Indexed: 10/18/2022]
Abstract
Many clustering approaches have been developed for biological data analysis, however, the application of traditional clustering algorithms for RNA structure data analysis is still a challenging issue. This arises from the existence of complex secondary structures while clustering. One of the most critical issues of cluster analysis is the development of appropriate distance measures in high dimensional space. The traditional distance measures focus on scale issues, but ignores the correlation between two values. This article develops a novel interval-based distance (Hausdorff) measure for computing the similarity between characterized structures. Three relationships including perfect match, partially overlapped and non-overlapped are considered. Finally, we demonstrate the methods by analyzing a data set of RNA secondary structures from the Rfam database.
Collapse
Affiliation(s)
- Qingfeng Chen
- School of Computer, Electronic and Information, Guangxi University, Nanning 530004, China.
| | | | | |
Collapse
|
23
|
|
24
|
Iacoangeli A, Bianchi R, Tiedge H. Regulatory RNAs in brain function and disorders. Brain Res 2010; 1338:36-47. [PMID: 20307503 PMCID: PMC3524968 DOI: 10.1016/j.brainres.2010.03.042] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/11/2010] [Revised: 03/10/2010] [Accepted: 03/15/2010] [Indexed: 11/17/2022]
Abstract
Regulatory RNAs are being increasingly investigated in neurons, and important roles in brain function have been revealed. Regulatory RNAs are non-protein-coding RNAs (npcRNAs) that comprise a heterogeneous group of molecules, varying in size and mechanism of action. Regulatory RNAs often exert post-transcriptional control of gene expression, resulting in gene silencing or gene expression stimulation. Here, we review evidence that regulatory RNAs are implicated in neuronal development, differentiation, and plasticity. We will also discuss npcRNA dysregulation that may be involved in pathological states of the brain such as neurodevelopmental disorders, neurodegeneration, and epilepsy.
Collapse
Affiliation(s)
- Anna Iacoangeli
- The Robert F. Furchgott Center for Neural and Behavioral Science, Department of Physiology and Pharmacology, State University of New York, Health Science Center at Brooklyn, 450 Clarkson Avenue, Brooklyn, New York 11203, USA
| | - Riccardo Bianchi
- The Robert F. Furchgott Center for Neural and Behavioral Science, Department of Physiology and Pharmacology, State University of New York, Health Science Center at Brooklyn, 450 Clarkson Avenue, Brooklyn, New York 11203, USA
- Program in Neural and Behavioral Science, State University of New York, Health Science Center at Brooklyn, 450 Clarkson Avenue, Brooklyn, New York 11203, USA
| | - Henri Tiedge
- The Robert F. Furchgott Center for Neural and Behavioral Science, Department of Physiology and Pharmacology, State University of New York, Health Science Center at Brooklyn, 450 Clarkson Avenue, Brooklyn, New York 11203, USA
- Program in Neural and Behavioral Science, State University of New York, Health Science Center at Brooklyn, 450 Clarkson Avenue, Brooklyn, New York 11203, USA
| |
Collapse
|
25
|
Bernhart SH, Hofacker IL. From consensus structure prediction to RNA gene finding. BRIEFINGS IN FUNCTIONAL GENOMICS AND PROTEOMICS 2009; 8:461-71. [PMID: 19833701 DOI: 10.1093/bfgp/elp043] [Citation(s) in RCA: 25] [Impact Index Per Article: 1.7] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 11/14/2022]
Abstract
Reliable structure prediction is a prerequisite for most types of bioinformatical analysis of RNA. Since the accuracy of structure prediction from single sequences is limited, one often resorts to computing the consensus structure for a set of related RNA sequences. Since functionally important RNA structures are expected to evolve much more slowly than the underlying sequences, the pattern of sequence (co-)variation can be exploited to dramatically improve structure prediction. Since a conserved common structure is only expected when the RNA structure is under selective pressure, consensus structure prediction also provides an ideal starting point for the de novo detection of structured non-coding RNAs. Here, we review different strategies for the prediction of consensus secondary structures, and show how these approaches can be used to predict non-coding RNA genes.
Collapse
Affiliation(s)
- Stephan H Bernhart
- Department of Theoretical Chemistry, University of Vienna, Währingerstrasse 17, A-1090 Wien, Austria.
| | | |
Collapse
|
26
|
Fan D, Bitterman PB, Larsson O. Regulatory element identification in subsets of transcripts: comparison and integration of current computational methods. RNA (NEW YORK, N.Y.) 2009; 15:1469-82. [PMID: 19553345 PMCID: PMC2714745 DOI: 10.1261/rna.1617009] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.5] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Received: 02/25/2009] [Accepted: 05/20/2009] [Indexed: 05/20/2023]
Abstract
Regulatory elements in mRNA play an often pivotal role in post-transcriptional regulation of gene expression. However, a systematic approach to efficiently identify putative regulatory elements from sets of post-transcriptionally coregulated genes is lacking, hampering studies of coregulation mechanisms. Although there are several analytical methods that can be used to detect conserved mRNA regulatory elements in a set of transcripts, there has been no systematic study of how well any of these methods perform individually or as a group. We therefore compared how well three algorithms, each based on a different principle (enumeration, optimization, or structure/sequence profiles), can identify elements in unaligned untranslated sequence regions. Two algorithms were originally designed to detect transcription factor binding sites, Weeder and BioProspector; and one was designed to detect RNA elements conserved in structure, RNAProfile. Three types of elements were examined: (1) elements conserved in both primary sequence and secondary structure; (2) elements conserved only in primary sequence; and (3) microRNA targets. Our results indicate that all methods can uniquely identify certain known RNA elements, and therefore, integrating the output from all algorithms leads to the most complete identification of elements. We therefore developed an approach to integrate results and guide selection of candidate elements from several algorithms presented as a web service (https://dbw.msi.umn.edu:8443/recit). These findings together with the approach for integration can be used to identify candidate elements from genome-wide post-transcriptional profiling data sets.
Collapse
Affiliation(s)
- Danhua Fan
- Department of Medicine, University of Minnesota, Minneapolis, Minnesota 55455, USA
| | | | | |
Collapse
|
27
|
Harmanci AO, Sharma G, Mathews DH. Stochastic sampling of the RNA structural alignment space. Nucleic Acids Res 2009; 37:4063-75. [PMID: 19429694 PMCID: PMC2709569 DOI: 10.1093/nar/gkp276] [Citation(s) in RCA: 14] [Impact Index Per Article: 0.9] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/13/2022] Open
Abstract
A novel method is presented for predicting the common secondary structures and alignment of two homologous RNA sequences by sampling the 'structural alignment' space, i.e. the joint space of their alignments and common secondary structures. The structural alignment space is sampled according to a pseudo-Boltzmann distribution based on a pseudo-free energy change that combines base pairing probabilities from a thermodynamic model and alignment probabilities from a hidden Markov model. By virtue of the implicit comparative analysis between the two sequences, the method offers an improvement over single sequence sampling of the Boltzmann ensemble. A cluster analysis shows that the samples obtained from joint sampling of the structural alignment space cluster more closely than samples generated by the single sequence method. On average, the representative (centroid) structure and alignment of the most populated cluster in the sample of structures and alignments generated by joint sampling are more accurate than single sequence sampling and alignment based on sequence alone, respectively. The 'best' centroid structure that is closest to the known structure among all the centroids is, on average, more accurate than structure predictions of other methods. Additionally, cluster analysis identifies, on average, a few clusters, whose centroids can be presented as alternative candidates. The source code for the proposed method can be downloaded at http://rna.urmc.rochester.edu.
Collapse
Affiliation(s)
- Arif Ozgun Harmanci
- Department of Electrical and Computer Engineering, University of Rochester, Hopeman 204, RC Box 270126, Rochester, NY 14627, USA
| | | | | |
Collapse
|
28
|
Tabei Y, Asai K. A local multiple alignment method for detection of non-coding RNA sequences. ACTA ACUST UNITED AC 2009; 25:1498-505. [PMID: 19376823 DOI: 10.1093/bioinformatics/btp261] [Citation(s) in RCA: 14] [Impact Index Per Article: 0.9] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/12/2022]
Abstract
MOTIVATION Non-coding RNAs (ncRNAs) show a unique evolutionary process in which the substitutions of distant bases are correlated in order to conserve the secondary structure of the ncRNA molecule. Therefore, the multiple alignment method for the detection of ncRNAs should take into account both the primary sequence and the secondary structure. Recently, there has been intense focus on multiple alignment investigations for the detection of ncRNAs; however, most of the proposed methods are designed for global multiple alignments. For this reason, these methods are not appropriate to identify locally conserved ncRNAs among genomic sequences. A more efficient local multiple alignment method for the detection of ncRNAs is required. RESULTS We propose a new local multiple alignment method for the detection of ncRNAs. This method uses a local multiple alignment construction procedure inspired by ProDA, which is a local multiple aligner program for protein sequences with repeated and shuffled elements. To align sequences based on secondary structure information, we propose a new alignment model which incorporates secondary structure features. We define the conditional probability of an alignment via a conditional random field and use a gamma-centroid estimator to align sequences. The locally aligned subsequences are clustered into blocks of approximately globally alignable subsequences between pairwise alignments. Finally, these blocks are multiply aligned via MXSCARNA. In benchmark experiments, we demonstrate the high ability of the implemented software, SCARNA_LM, for local multiple alignment for the detection of ncRNAs. AVAILABILITY The C++ source code for SCARNA_LM and its experimental datasets are available at http://www.ncrna.org/software/scarna_lm/download. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
Collapse
Affiliation(s)
- Yasuo Tabei
- Department of Computational biology, Graduate School of Frontier Sciences, University of Tokyo, Kashiwa, Chiba, Japan.
| | | |
Collapse
|
29
|
Xu X, Ji Y, Stormo GD. Discovering cis-regulatory RNAs in Shewanella genomes by Support Vector Machines. PLoS Comput Biol 2009; 5:e1000338. [PMID: 19343219 PMCID: PMC2659441 DOI: 10.1371/journal.pcbi.1000338] [Citation(s) in RCA: 18] [Impact Index Per Article: 1.2] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/18/2008] [Accepted: 02/24/2009] [Indexed: 12/31/2022] Open
Abstract
An increasing number of cis-regulatory RNA elements have been found to regulate gene expression post-transcriptionally in various biological processes in bacterial systems. Effective computational tools for large-scale identification of novel regulatory RNAs are strongly desired to facilitate our exploration of gene regulation mechanisms and regulatory networks. We present a new computational program named RSSVM (RNA Sampler+Support Vector Machine), which employs Support Vector Machines (SVMs) for efficient identification of functional RNA motifs from random RNA secondary structures. RSSVM uses a set of distinctive features to represent the common RNA secondary structure and structural alignment predicted by RNA Sampler, a tool for accurate common RNA secondary structure prediction, and is trained with functional RNAs from a variety of bacterial RNA motif/gene families covering a wide range of sequence identities. When tested on a large number of known and random RNA motifs, RSSVM shows a significantly higher sensitivity than other leading RNA identification programs while maintaining the same false positive rate. RSSVM performs particularly well on sets with low sequence identities. The combination of RNA Sampler and RSSVM provides a new, fast, and efficient pipeline for large-scale discovery of regulatory RNA motifs. We applied RSSVM to multiple Shewanella genomes and identified putative regulatory RNA motifs in the 5′ untranslated regions (UTRs) in S. oneidensis, an important bacterial organism with extraordinary respiratory and metal reducing abilities and great potential for bioremediation and alternative energy generation. From 1002 sets of 5′-UTRs of orthologous operons, we identified 166 putative regulatory RNA motifs, including 17 of the 19 known RNA motifs from Rfam, an additional 21 RNA motifs that are supported by literature evidence, 72 RNA motifs overlapping predicted transcription terminators or attenuators, and other candidate regulatory RNA motifs. Our study provides a list of promising novel regulatory RNA motifs potentially involved in post-transcriptional gene regulation. Combined with the previous cis-regulatory DNA motif study in S. oneidensis, this genome-wide discovery of cis-regulatory RNA motifs may offer more comprehensive views of gene regulation at a different level in this organism. The RSSVM software, predictions, and analysis results on Shewanella genomes are available at http://ural.wustl.edu/resources.html#RSSVM. RNA is remarkably versatile, acting not only as messengers to transfer genetic information from DNA to protein but also as critical structural components and catalytic enzymes in the cell. More intriguingly, RNA elements in messenger RNAs have been widely found in bacteria to control the expression of their downstream genes. The functions of these RNA elements are intrinsically linked to their secondary structures, which are usually conserved across multiple closely related species during evolution and often shared by genes in the same metabolic pathways. We developed a new computational approach to find putative functional RNA elements by looking for conserved RNA secondary structures that are distinguished from random RNA secondary structures in the orthologous RNA sequences from related species. We applied this approach to multiple Shewanella genomes and predicted putative regulatory RNA elements in Shewanella oneidensis, a bacterium that has extraordinary respiratory and metal reducing abilities and great potential for bioremediation and alternative energy generation. Our findings not only recovered many RNA elements that are known or supported by literature evidence but also included exciting novel RNA elements for further exploration.
Collapse
Affiliation(s)
- Xing Xu
- Department of Genetics, Washington University School of Medicine, St. Louis, Missouri, United States of America
| | - Yongmei Ji
- Department of Genetics, Washington University School of Medicine, St. Louis, Missouri, United States of America
| | - Gary D. Stormo
- Department of Genetics, Washington University School of Medicine, St. Louis, Missouri, United States of America
- * E-mail:
| |
Collapse
|
30
|
Singh V, Somvanshi P. Computational modeling analyses of RNA secondary structures and phylogenetic inference of evolutionary conserved 5S rRNA in the prokaryotes. J Mol Graph Model 2009; 27:770-6. [PMID: 19217331 DOI: 10.1016/j.jmgm.2008.11.012] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.1] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/04/2008] [Revised: 11/16/2008] [Accepted: 11/19/2008] [Indexed: 10/21/2022]
Abstract
Bacteria are unicellular, ubiquitous microorganisms which grow on soil, acidic hot springs, radioactive wastes, etc. The genome of bacteria constitutes species specific conserved region. The 5S rRNA is one of the most conserved region determined in each bacteria and the size ranges between 110 and 148 bp. On this basis phylogenetic study of 37 bacterial strains was done which results in formation of seven clades and furthermore RNA secondary structure from each clade was made. The lowest free energy (delta G) of the 5S rRNA may divulge the most primitive bacteria and slow changes occurs throughout the evolution whereas the higher free energy indicates less stability during the evolution. The RNA secondary structure may provide new insights to understand bacteria evolution and stability.
Collapse
Affiliation(s)
- Vijai Singh
- Bioinformatics Centre, Biotech Park, Sector-G Jankipuram, Lucknow 226021, Uttar Pradesh, India
| | | |
Collapse
|
31
|
Abstract
Chemical probing of RNA structure has become one of the most popular approaches to map the conformation of RNA molecules of various sizes under well-defined experimental conditions. The method monitors the sensitivity of each nucleotide to various chemicals, which reflects its hydrogen-bonding environment within the RNA molecule. The goal of this chapter is to provide the reader with an experimental guide to mapping the secondary structure of RNA thermosensors in vitro with the most suitable chemical probes.
Collapse
Affiliation(s)
- Claude Chiaruttini
- UPR9073 du CNRS associée à l'Université de Paris VII, Institut de Biologie Physico-Chimique, 75005 Paris, France.
| | | | | |
Collapse
|
32
|
|
33
|
Taneda A. An efficient genetic algorithm for structural RNA pairwise alignment and its application to non-coding RNA discovery in yeast. BMC Bioinformatics 2008; 9:521. [PMID: 19061486 PMCID: PMC2630964 DOI: 10.1186/1471-2105-9-521] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.4] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/02/2008] [Accepted: 12/05/2008] [Indexed: 11/30/2022] Open
Abstract
Background Aligning RNA sequences with low sequence identity has been a challenging problem since such a computation essentially needs an algorithm with high complexities for taking structural conservation into account. Although many sophisticated algorithms for the purpose have been proposed to date, further improvement in efficiency is necessary to accelerate its large-scale applications including non-coding RNA (ncRNA) discovery. Results We developed a new genetic algorithm, Cofolga2, for simultaneously computing pairwise RNA sequence alignment and consensus folding, and benchmarked it using BRAliBase 2.1. The benchmark results showed that our new algorithm is accurate and efficient in both time and memory usage. Then, combining with the originally trained SVM, we applied the new algorithm to novel ncRNA discovery where we compared S. cerevisiae genome with six related genomes in a pairwise manner. By focusing our search to the relatively short regions (50 bp to 2,000 bp) sandwiched by conserved sequences, we successfully predict 714 intergenic and 1,311 sense or antisense ncRNA candidates, which were found in the pairwise alignments with stable consensus secondary structure and low sequence identity (≤ 50%). By comparing with the previous predictions, we found that > 92% of the candidates is novel candidates. The estimated rate of false positives in the predicted candidates is 51%. Twenty-five percent of the intergenic candidates has supports for expression in cell, i.e. their genomic positions overlap those of the experimentally determined transcripts in literature. By manual inspection of the results, moreover, we obtained four multiple alignments with low sequence identity which reveal consensus structures shared by three species/sequences. Conclusion The present method gives an efficient tool complementary to sequence-alignment-based ncRNA finders.
Collapse
Affiliation(s)
- Akito Taneda
- Graduate School of Science and Technology, Hirosaki University, Hirosaki, Japan.
| |
Collapse
|
34
|
Comparative analysis of sequences and secondary structures of the rRNA internal transcribed spacer 2 (ITS2) in pollen beetles of the subfamily Meligethinae (Coleoptera, Nitidulidae): potential use of slippage-derived sequences in molecular systematics. Mol Phylogenet Evol 2008; 51:215-26. [PMID: 19059352 DOI: 10.1016/j.ympev.2008.11.004] [Citation(s) in RCA: 12] [Impact Index Per Article: 0.8] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/15/2008] [Revised: 11/05/2008] [Accepted: 11/06/2008] [Indexed: 11/21/2022]
Abstract
A comparative analysis of ITS2 sequences and secondary structures in 89 species of pollen beetles of the subfamily Meligethinae (Coleoptera, Nitidulidae) was performed. The ITS2 folding pattern was highly conserved and comparable with the general model proposed for eukaryotes. Simple sequence repeats (SSRs) were responsible for most of the observed nucleotide variability (approximately 1-3%) and length variation (359-459bp). When plotted on secondary structures, SSRs mapped in expansion segments positioned at the apices of three ITS2 helices ('A', 'B' and 'D1') and appeared to have evolved under mechanisms of compensatory slippage. Homologies among SSRs nucleotides could not be unambiguously assigned, and thus were not useful to resolve phylogeny. However, slippage-derived motifs provided some preliminary genetic support for newly proposed taxonomic arrangements of several genera and subgenera of Meligethinae, corroborating existing morphological and ecological datasets.
Collapse
|
35
|
Informatic resources for identifying and annotating structural RNA motifs. Mol Biotechnol 2008; 41:180-93. [PMID: 18979204 DOI: 10.1007/s12033-008-9114-z] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.2] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/30/2008] [Accepted: 10/01/2008] [Indexed: 10/21/2022]
Abstract
Post-transcriptional regulation of genes and transcripts is a vital aspect of cellular processes, and unlike transcriptional regulation, remains a largely unexplored domain. One of the most obvious and most important questions to explore is the discovery of functional RNA elements. Many RNA elements have been characterized to date ranging from cis-regulatory motifs within mRNAs to large families of non-coding RNAs. Like protein coding genes, the functional motifs of these RNA elements are highly conserved, but unlike protein coding genes, it is most often the structure and not the sequence that is conserved. Proper characterization of these structural RNA motifs is both the key and the limiting step to understanding the post-transcriptional aspects of the genomic world. Here, we focus on the task of structural motif discovery and provide a survey of the informatics resources geared towards this task.
Collapse
|
36
|
Bradley RK, Pachter L, Holmes I. Specific alignment of structured RNA: stochastic grammars and sequence annealing. ACTA ACUST UNITED AC 2008; 24:2677-83. [PMID: 18796475 DOI: 10.1093/bioinformatics/btn495] [Citation(s) in RCA: 27] [Impact Index Per Article: 1.7] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 01/22/2023]
Abstract
MOTIVATION Whole-genome screens suggest that eukaryotic genomes are dense with non-coding RNAs (ncRNAs). We introduce a novel approach to RNA multiple alignment which couples a generative probabilistic model of sequence and structure with an efficient sequence annealing approach for exploring the space of multiple alignments. This leads to a new software program, Stemloc-AMA, that is both accurate and specific in the alignment of multiple related RNA sequences. RESULTS When tested on the benchmark datasets BRalibase II and BRalibase 2.1, Stemloc-AMA has comparable sensitivity to and better specificity than the best competing methods. We use a large-scale random sequence experiment to show that while most alignment programs maximize sensitivity at the expense of specificity, even to the point of giving complete alignments of non-homologous sequences, Stemloc-AMA aligns only sequences with detectable homology and leaves unrelated sequences largely unaligned. Such accurate and specific alignments are crucial for comparative-genomics analysis, from inferring phylogeny to estimating substitution rates across different lineages. AVAILABILITY Stemloc-AMA is available from http://biowiki.org/StemLocAMA as part of the dart software package for sequence analysis.
Collapse
Affiliation(s)
- Robert K Bradley
- Biophysics Graduate Group, Department of Mathematics and Department of Bioengineering, University of California, Berkeley, CA 94720, USA
| | | | | |
Collapse
|
37
|
Do CB, Foo CS, Batzoglou S. A max-margin model for efficient simultaneous alignment and folding of RNA sequences. Bioinformatics 2008; 24:i68-76. [PMID: 18586747 PMCID: PMC2718655 DOI: 10.1093/bioinformatics/btn177] [Citation(s) in RCA: 69] [Impact Index Per Article: 4.3] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/13/2022] Open
Abstract
MOTIVATION The need for accurate and efficient tools for computational RNA structure analysis has become increasingly apparent over the last several years: RNA folding algorithms underlie numerous applications in bioinformatics, ranging from microarray probe selection to de novo non-coding RNA gene prediction. In this work, we present RAF (RNA Alignment and Folding), an efficient algorithm for simultaneous alignment and consensus folding of unaligned RNA sequences. Algorithmically, RAF exploits sparsity in the set of likely pairing and alignment candidates for each nucleotide (as identified by the CONTRAfold or CONTRAlign programs) to achieve an effectively quadratic running time for simultaneous pairwise alignment and folding. RAF's fast sparse dynamic programming, in turn, serves as the inference engine within a discriminative machine learning algorithm for parameter estimation. RESULTS In cross-validated benchmark tests, RAF achieves accuracies equaling or surpassing the current best approaches for RNA multiple sequence secondary structure prediction. However, RAF requires nearly an order of magnitude less time than other simultaneous folding and alignment methods, thus making it especially appropriate for high-throughput studies. AVAILABILITY Source code for RAF is available at:http://contra.stanford.edu/contrafold/.
Collapse
Affiliation(s)
- Chuong B Do
- Computer Science Department, Stanford University, Stanford, CA 94305, USA.
| | | | | |
Collapse
|
38
|
Abstract
We present an easy-to-use webserver that makes it possible to simultaneously use a number of state of the art methods for performing multiple alignment and secondary structure prediction for noncoding RNA sequences. This makes it possible to use the programs without having to download the code and get the programs to run. The results of all the programs are presented on a webpage and can easily be downloaded for further analysis. Additional measures are calculated for each program to make it easier to judge the individual predictions, and a consensus prediction taking all the programs into account is also calculated. This website is free and open to all users and there is no login requirement. The webserver can be found at: http://genome.ku.dk/resources/war.
Collapse
Affiliation(s)
- Elfar Torarinsson
- Division of Genetics and Bioinformatics, IBHV, Faculty of Life Sciences, University of Copenhagen, Groennegaardsvej 3, DK-1870 Frederiksberg C, Denmark
| | | |
Collapse
|
39
|
Moretti S, Wilm A, Higgins DG, Xenarios I, Notredame C. R-Coffee: a web server for accurately aligning noncoding RNA sequences. Nucleic Acids Res 2008; 36:W10-3. [PMID: 18483080 PMCID: PMC2447777 DOI: 10.1093/nar/gkn278] [Citation(s) in RCA: 51] [Impact Index Per Article: 3.2] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/24/2022] Open
Abstract
The R-Coffee web server produces highly accurate multiple alignments of noncoding RNA (ncRNA) sequences, taking into account predicted secondary structures. R-Coffee uses a novel algorithm recently incorporated in the T-Coffee package. R-Coffee works along the same lines as T-Coffee: it uses pairwise or multiple sequence alignment (MSA) methods to compute a primary library of input alignments. The program then computes an MSA highly consistent with both the alignments contained in the library and the secondary structures associated with the sequences. The secondary structures are predicted using RNAplfold. The server provides two modes. The slow/accurate mode is restricted to small datasets (less than 5 sequences less than 150 nucleotides) and combines R-Coffee with Consan, a very accurate pairwise RNA alignment method. For larger datasets a fast method can be used (RM-Coffee mode), that uses R-Coffee to combine the output of the three packages which combines the outputs from programs found to perform best on RNA (MUSCLE, MAFFT and ProbConsRNA). Our BRAliBase benchmarks indicate that the R-Coffee/Consan combination is one of the best ncRNA alignment methods for short sequences, while the RM-Coffee gives comparable results on longer sequences. The R-Coffee web server is available at http://www.tcoffee.org.
Collapse
Affiliation(s)
- Sébastien Moretti
- Swiss Institute of Bioinformatics (SIB), Quartier Sorge - Genopode, UNIL, CH-1015 Lausanne, Switzerland
| | | | | | | | | |
Collapse
|
40
|
Katoh K, Toh H. Improved accuracy of multiple ncRNA alignment by incorporating structural information into a MAFFT-based framework. BMC Bioinformatics 2008; 9:212. [PMID: 18439255 PMCID: PMC2387179 DOI: 10.1186/1471-2105-9-212] [Citation(s) in RCA: 444] [Impact Index Per Article: 27.8] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/04/2007] [Accepted: 04/25/2008] [Indexed: 11/10/2022] Open
Abstract
Background Structural alignment of RNAs is becoming important, since the discovery of functional non-coding RNAs (ncRNAs). Recent studies, mainly based on various approximations of the Sankoff algorithm, have resulted in considerable improvement in the accuracy of pairwise structural alignment. In contrast, for the cases with more than two sequences, the practical merit of structural alignment remains unclear as compared to traditional sequence-based methods, although the importance of multiple structural alignment is widely recognized. Results We took a different approach from a straightforward extension of the Sankoff algorithm to the multiple alignments from the viewpoints of accuracy and time complexity. As a new option of the MAFFT alignment program, we developed a multiple RNA alignment framework, X-INS-i, which builds a multiple alignment with an iterative method incorporating structural information through two components: (1) pairwise structural alignments by an external pairwise alignment method such as SCARNA or LaRA and (2) a new objective function, Four-way Consistency, derived from the base-pairing probability of every sub-aligned group at every multiple alignment stage. Conclusion The BRAliBASE benchmark showed that X-INS-i outperforms other methods currently available in the sum-of-pairs score (SPS) criterion. As a basis for predicting common secondary structure, the accuracy of the present method is comparable to or rather higher than those of the current leading methods such as RNA Sampler. The X-INS-i framework can be used for building a multiple RNA alignment from any combination of algorithms for pairwise RNA alignment and base-pairing probability. The source code is available at the webpage found in the Availability and requirements section.
Collapse
Affiliation(s)
- Kazutaka Katoh
- Digital Medicine Initiative, Kyushu University, Fukuoka, 812-8582, Japan.
| | | |
Collapse
|
41
|
Wilm A, Higgins DG, Notredame C. R-Coffee: a method for multiple alignment of non-coding RNA. Nucleic Acids Res 2008; 36:e52. [PMID: 18420654 PMCID: PMC2396437 DOI: 10.1093/nar/gkn174] [Citation(s) in RCA: 91] [Impact Index Per Article: 5.7] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/11/2022] Open
Abstract
R-Coffee is a multiple RNA alignment package, derived from T-Coffee, designed to align RNA sequences while exploiting secondary structure information. R-Coffee uses an alignment-scoring scheme that incorporates secondary structure information within the alignment. It works particularly well as an alignment improver and can be combined with any existing sequence alignment method. In this work, we used R-Coffee to compute multiple sequence alignments combining the pairwise output of sequence aligners and structural aligners. We show that R-Coffee can improve the accuracy of all the sequence aligners. We also show that the consistency-based component of T-Coffee can improve the accuracy of several structural aligners. R-Coffee was tested on 388 BRAliBase reference datasets and on 11 longer Cmfinder datasets. Altogether our results suggest that the best protocol for aligning short sequences (less than 200 nt) is the combination of R-Coffee with the RNA pairwise structural aligner Consan. We also show that the simultaneous combination of the four best sequence alignment programs with R-Coffee produces alignments almost as accurate as those obtained with R-Coffee/Consan. Finally, we show that R-Coffee can also be used to align longer datasets beyond the usual scope of structural aligners. R-Coffee is freely available for download, along with documentation, from the T-Coffee web site (www.tcoffee.org).
Collapse
Affiliation(s)
- Andreas Wilm
- The Conway Institute of Biomolecular and Biomedical Research, University College Dublin, Ireland
| | | | | |
Collapse
|
42
|
Tabei Y, Kiryu H, Kin T, Asai K. A fast structural multiple alignment method for long RNA sequences. BMC Bioinformatics 2008; 9:33. [PMID: 18215258 PMCID: PMC2375124 DOI: 10.1186/1471-2105-9-33] [Citation(s) in RCA: 132] [Impact Index Per Article: 8.3] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/14/2007] [Accepted: 01/23/2008] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND Aligning multiple RNA sequences is essential for analyzing non-coding RNAs. Although many alignment methods for non-coding RNAs, including Sankoff's algorithm for strict structural alignments, have been proposed, they are either inaccurate or computationally too expensive. Faster methods with reasonable accuracies are required for genome-scale analyses. RESULTS We propose a fast algorithm for multiple structural alignments of RNA sequences that is an extension of our pairwise structural alignment method (implemented in SCARNA). The accuracies of the implemented software, MXSCARNA, are at least as favorable as those of state-of-art algorithms that are computationally much more expensive in time and memory. CONCLUSION The proposed method for structural alignment of multiple RNA sequences is fast enough for large-scale analyses with accuracies at least comparable to those of existing algorithms. The source code of MXSCARNA and its web server are available at http://mxscarna.ncrna.org.
Collapse
Affiliation(s)
- Yasuo Tabei
- Graduate School of Frontier Science, University of Tokyo, CB04 Kiban-tou 5-1-5 Kashiwanoha, Kashiwa, Chiba, 277-8561, Japan.
| | | | | | | |
Collapse
|
43
|
Lindgreen S, Gardner PP, Krogh A. MASTR: multiple alignment and structure prediction of non-coding RNAs using simulated annealing. Bioinformatics 2007; 23:3304-11. [PMID: 18006551 DOI: 10.1093/bioinformatics/btm525] [Citation(s) in RCA: 56] [Impact Index Per Article: 3.3] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/13/2022] Open
Abstract
MOTIVATION As more non-coding RNAs are discovered, the importance of methods for RNA analysis increases. Since the structure of ncRNA is intimately tied to the function of the molecule, programs for RNA structure prediction are necessary tools in this growing field of research. Furthermore, it is known that RNA structure is often evolutionarily more conserved than sequence. However, few existing methods are capable of simultaneously considering multiple sequence alignment and structure prediction. RESULT We present a novel solution to the problem of simultaneous structure prediction and multiple alignment of RNA sequences. Using Markov chain Monte Carlo in a simulated annealing framework, the algorithm MASTR (Multiple Alignment of STructural RNAs) iteratively improves both sequence alignment and structure prediction for a set of RNA sequences. This is done by minimizing a combined cost function that considers sequence conservation, covariation and basepairing probabilities. The results show that the method is very competitive to similar programs available today, both in terms of accuracy and computational efficiency. AVAILABILITY Source code available from http://mastr.binf.ku.dk/
Collapse
Affiliation(s)
- Stinus Lindgreen
- Bioinformatics Centre, Department of Molecular Biology, University of Copenhagen, Ole Maaloes Vej 5, 2200 Copenhagen N, Denmark.
| | | | | |
Collapse
|