1
Gupta S, Pal D. Detection of intrinsic transcription termination sites in bacteria: consensus from hairpin detection approaches. J Biomol Struct Dyn 2024:1-11. [PMID: 38605579 DOI: 10.1080/07391102.2024.2325107] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/19/2023] [Accepted: 02/23/2024] [Indexed: 04/13/2024]
Abstract
We compare the WebGeSTer and INtrinsic transcription TERmination hairPIN (INTERPIN) databases used for intrinsic transcription termination (ITT) site prediction in bacteria. The former uses inverted nucleotide repeat detection to identify RNA hairpins, while the latter uses a pair-potential function; the hairpin energy score evaluation is identical for both. We find INTERPIN more sensitive than WebGeSTer, with about 6% and 51% additional ITT predictions in chromosomal and plasmid operons, respectively. INTERPIN hairpins are relatively short, have ungapped stems, and can even be located in AT-rich segments, whereas WebGeSTer hairpins are longer, GC-rich, and have gapped stems. The GC%, length, and energy score from INTERPIN transcription units (TUs) are the best inter-correlated, while those of the lowest-energy single hairpins from WebGeSTer, considered suitable for ITT, are the worst. Around 72% of TUs from the two databases overlap, and ∼60% of all alternate ITT sites downstream of TUs overlap, of which 65% are cluster hairpins. This helps highlight hairpin features that can be used to identify termination sites in bacteria across different prediction methods. Overall, the hairpins screened with the pair-potential function appear to be more consistent with the kinetic and thermodynamic processes of ITT known to date. Communicated by Ramaswamy H. Sarma.
Affiliation(s)
- Swati Gupta
- Department of Computational and Data Sciences, Indian Institute of Science, Bengaluru, India
- Debnath Pal
- Department of Computational and Data Sciences, Indian Institute of Science, Bengaluru, India
2
Rinaldi S, Moroni E, Rozza R, Magistrato A. Frontiers and Challenges of Computing ncRNAs Biogenesis, Function and Modulation. J Chem Theory Comput 2024; 20:993-1018. [PMID: 38287883 DOI: 10.1021/acs.jctc.3c01239] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/31/2024]
Abstract
Non-coding RNAs (ncRNAs), generated from nonprotein coding DNA sequences, constitute 98-99% of the human genome. Non-coding RNAs encompass diverse functional classes, including microRNAs, small interfering RNAs, PIWI-interacting RNAs, small nuclear RNAs, small nucleolar RNAs, and long non-coding RNAs. With critical involvement in gene expression and regulation across various biological and physiopathological contexts, such as neuronal disorders, immune responses, cardiovascular diseases, and cancer, non-coding RNAs are emerging as disease biomarkers and therapeutic targets. In this review, after providing an overview of non-coding RNAs' role in cell homeostasis, we illustrate the potential and the challenges of state-of-the-art computational methods exploited to study non-coding RNAs biogenesis, function, and modulation. This can be done by directly targeting them with small molecules or by altering their expression by targeting the cellular engines underlying their biosynthesis. Drawing from applications, also taken from our work, we showcase the significance and role of computer simulations in uncovering fundamental facets of ncRNA mechanisms and modulation. This information may set the basis to advance gene modulation tools and therapeutic strategies to address unmet medical needs.
Affiliation(s)
- Silvia Rinaldi
- National Research Council of Italy (CNR) - Institute of Chemistry of OrganoMetallic Compounds (ICCOM), c/o Area di Ricerca CNR di Firenze Via Madonna del Piano 10, 50019 Sesto Fiorentino, Florence, Italy
- Elisabetta Moroni
- National Research Council of Italy (CNR) - Institute of Chemical Sciences and Technologies (SCITEC), via Mario Bianco 9, 20131 Milano, Italy
- Riccardo Rozza
- National Research Council of Italy (CNR) - Institute of Material Foundry (IOM) c/o International School for Advanced Studies (SISSA), Via Bonomea, 265, 34136 Trieste, Italy
- Alessandra Magistrato
- National Research Council of Italy (CNR) - Institute of Material Foundry (IOM) c/o International School for Advanced Studies (SISSA), Via Bonomea, 265, 34136 Trieste, Italy
3
Sato K, Hamada M. Recent trends in RNA informatics: a review of machine learning and deep learning for RNA secondary structure prediction and RNA drug discovery. Brief Bioinform 2023; 24:bbad186. [PMID: 37232359 PMCID: PMC10359090 DOI: 10.1093/bib/bbad186] [Citation(s) in RCA: 8] [Impact Index Per Article: 8.0] [Received: 01/16/2023] [Revised: 04/24/2023] [Accepted: 04/25/2023] [Indexed: 05/27/2023]
Abstract
Computational analysis of RNA sequences constitutes a crucial step in the field of RNA biology. As in other domains of the life sciences, the incorporation of artificial intelligence and machine learning techniques into RNA sequence analysis has gained significant traction in recent years. Historically, thermodynamics-based methods were widely employed for the prediction of RNA secondary structures; however, machine learning-based approaches have demonstrated remarkable advancements in recent years, enabling more accurate predictions. Consequently, the precision of sequence analysis pertaining to RNA secondary structures, such as RNA-protein interactions, has also been enhanced, making a substantial contribution to the field of RNA biology. Additionally, artificial intelligence and machine learning are also introducing technical innovations in the analysis of RNA-small molecule interactions for RNA-targeted drug discovery and in the design of RNA aptamers, where RNA serves as its own ligand. This review will highlight recent trends in the prediction of RNA secondary structure, RNA aptamers and RNA drug discovery using machine learning, deep learning and related technologies, and will also discuss potential future avenues in the field of RNA informatics.
Affiliation(s)
- Kengo Sato
- School of System Design and Technology, Tokyo Denki University, 5 Senju Asahi-cho, Adachi-ku, Tokyo 120-8551, Japan
- Michiaki Hamada
- Department of Electrical Engineering and Bioscience, Faculty of Science and Engineering, Waseda University, 55N-06-10, 3-4-1, Okubo, Shinjuku-ku, Tokyo 169-8555, Japan
- Computational Bio Big-Data Open Innovation Laboratory (CBBD-OIL), National Institute of Advanced Industrial Science and Technology (AIST), 3-4-1, Okubo, Shinjuku-ku, Tokyo 169-8555, Japan
- Graduate School of Medicine, Nippon Medical School, 1-1-5, Sendagi, Bunkyo-ku, Tokyo 113-8602, Japan
4
Ono Y, Asai K. Rtools: A Web Server for Various Secondary Structural Analyses on Single RNA Sequences. Methods Mol Biol 2023; 2586:1-14. [PMID: 36705895 DOI: 10.1007/978-1-0716-2768-6_1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/28/2023]
Abstract
Predicting the secondary structures of RNA molecules is an essential step in characterizing their functions, but the thermodynamic probability of any prediction is generally small. On the other hand, there are a few tools for calculating and visualizing various secondary structural information from RNA sequences. We implemented a web server that calculates in parallel various features of secondary structures: different types of secondary structure predictions, the marginal probabilities for local structural contexts, accessibilities of the subsequences, the energy changes caused by arbitrary base mutations, and measures for validating the predicted secondary structures. The web server is available at http://rtools.cbrc.jp and integrates the software tools CentroidFold, CentroidHomfold, IPknot, CapR, Raccess, Rchange, RintD, and RintW.
Affiliation(s)
- Yukiteru Ono
- Department of Computational Biology and Medical Sciences, Graduate School of Frontier Sciences, University of Tokyo, Tokyo, Japan
- Kiyoshi Asai
- Department of Computational Biology and Medical Sciences, Graduate School of Frontier Sciences, University of Tokyo, Tokyo, Japan
5
Morishita EC. Discovery of RNA-targeted small molecules through the merging of experimental and computational technologies. Expert Opin Drug Discov 2023; 18:207-226. [PMID: 36322542 DOI: 10.1080/17460441.2022.2134852] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/05/2022]
Abstract
INTRODUCTION: The field of RNA-targeted small molecules is rapidly evolving, owing to the advances in experimental and computational technologies. With the identification of several bioactive small molecules that target RNA, including the FDA-approved risdiplam, the biopharmaceutical industry is gaining confidence in the field. This review, based on the literature obtained from PubMed, aims to disseminate information about the various technologies developed for targeting RNA with small molecules and propose areas for improvement to develop drugs more efficiently, particularly those linked to diseases with unmet medical needs. AREAS COVERED: The technologies for the identification of RNA targets, screening of chemical libraries against RNA, assessing the bioactivity and target engagement of the hit compounds, structure determination, and hit-to-lead optimization are reviewed. Along with the description of the technologies, their strengths, limitations, and examples of how they can impact drug discovery are provided. EXPERT OPINION: Many existing technologies employed for protein targets have been repurposed for use in the discovery of RNA-targeted small molecules. In addition, technologies tailored for RNA targets have been developed. Nevertheless, more improvements are necessary, such as artificial intelligence to dissect important RNA structures and RNA-small-molecule interactions and more powerful chemical probing and structure prediction techniques.
6
Fukunaga T, Hamada M. LinAliFold and CentroidLinAliFold: fast RNA consensus secondary structure prediction for aligned sequences using beam search methods. Bioinformatics Advances 2022; 2:vbac078. [PMID: 36699418 PMCID: PMC9710674 DOI: 10.1093/bioadv/vbac078] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/28/2022] [Revised: 10/13/2022] [Accepted: 10/21/2022] [Indexed: 11/05/2022]
Abstract
Motivation: RNA consensus secondary structure prediction from aligned sequences is a powerful approach for improving the secondary structure prediction accuracy. However, because the computational complexities of conventional prediction tools scale with the cube of the alignment lengths, their application to long RNA sequences, such as viral RNAs or long non-coding RNAs, requires significant computational time. Results: In this study, we developed LinAliFold and CentroidLinAliFold, fast RNA consensus secondary structure prediction tools based on minimum free energy and maximum expected accuracy principles, respectively. We achieved software acceleration using beam search methods that were successfully used for fast secondary structure prediction from a single RNA sequence. Benchmark analyses showed that LinAliFold and CentroidLinAliFold were much faster than the existing methods while preserving the prediction accuracy. As an empirical application, we predicted the consensus secondary structure of coronaviruses with approximately 30 000 nt in 5 and 79 min by LinAliFold and CentroidLinAliFold, respectively. We confirmed that the predicted consensus secondary structure of coronaviruses was consistent with the experimental results. Availability and implementation: The source codes of LinAliFold and CentroidLinAliFold are freely available at https://github.com/fukunagatsu/LinAliFold-CentroidLinAliFold. Supplementary information: Supplementary data are available at Bioinformatics Advances online.
Affiliation(s)
- Michiaki Hamada
- Department of Electrical Engineering and Bioscience, Graduate School of Advanced Science and Engineering, Waseda University, Tokyo 1698555, Japan; Computational Bio Big-Data Open Innovation Laboratory, AIST-Waseda University, Tokyo 1698555, Japan
7
Tagashira M, Asai K. ConsAlifold: considering RNA structural alignments improves prediction accuracy of RNA consensus secondary structures. Bioinformatics 2022; 38:710-719. [PMID: 34694364 DOI: 10.1093/bioinformatics/btab738] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Received: 12/22/2020] [Revised: 08/24/2021] [Accepted: 10/20/2021] [Indexed: 02/03/2023]
Abstract
MOTIVATION: By detecting homology among RNAs, the probabilistic consideration of RNA structural alignments has improved the prediction accuracy of significant RNA prediction problems. Predicting an RNA consensus secondary structure from an RNA sequence alignment is a fundamental research objective because in the detection of conserved base-pairings among RNA homologs, predicting an RNA consensus secondary structure is more convenient than predicting an RNA structural alignment. RESULTS: We developed and implemented ConsAlifold, a dynamic programming-based method that predicts the consensus secondary structure of an RNA sequence alignment. ConsAlifold considers RNA structural alignments. ConsAlifold achieves moderate running time and the best prediction accuracy of RNA consensus secondary structures among available prediction methods. AVAILABILITY AND IMPLEMENTATION: ConsAlifold, data and Python scripts for generating both figures and tables are freely available at https://github.com/heartsh/consalifold. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Masaki Tagashira
- Department of Computational Biology and Medical Sciences, University of Tokyo, Chiba 277-8561, Japan; Artificial Intelligence Research Center, AIST, Tokyo 135-0064, Japan
- Kiyoshi Asai
- Department of Computational Biology and Medical Sciences, University of Tokyo, Chiba 277-8561, Japan; Artificial Intelligence Research Center, AIST, Tokyo 135-0064, Japan
8
Schwarz M, Vohradský J, Modrák M, Pánek J. rboAnalyzer: A Software to Improve Characterization of Non-coding RNAs From Sequence Database Search Output. Front Genet 2020; 11:675. [PMID: 32849767 PMCID: PMC7401326 DOI: 10.3389/fgene.2020.00675] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 10/02/2019] [Accepted: 06/02/2020] [Indexed: 12/12/2022]
Abstract
Searching for similar sequences in a database via BLAST or a similar tool is one of the most common bioinformatics tasks, applied in general and to non-coding RNAs in particular. However, the results of the search might be difficult to interpret due to the presence of partial matches to the database subject sequences. Here, we present rboAnalyzer, a tool that helps with interpreting sequence search results by (1) extending partial matches into plausible full-length subject sequences, (2) predicting homology of RNAs represented by full-length subject sequences to the query RNA, (3) pooling information across homologous RNAs found in the search results and public databases such as Rfam to predict more reliable secondary structures for all matches, and (4) contextualizing the matches by providing the prediction results and other relevant information in a rich graphical output. Using predicted full-length matches improves secondary structure prediction and makes rboAnalyzer robust with regard to identification of homology. The output of the tool should help the user to reliably characterize non-coding RNAs in BLAST output. The usefulness of rboAnalyzer and its ability to correctly extend partial matches to full length are demonstrated on known homologous RNAs. To allow the user to use custom databases and search options, rboAnalyzer accepts any search results as a text file in the BLAST format. The main output is an interactive HTML page displaying the computed characteristics and other context of the matches. The output can also be exported in appropriate sequence and/or secondary structure formats.
Affiliation(s)
- Marek Schwarz
- Laboratory of Bioinformatics, Institute of Microbiology, Czech Academy of Sciences, Prague, Czechia
- Jiří Vohradský
- Laboratory of Bioinformatics, Institute of Microbiology, Czech Academy of Sciences, Prague, Czechia
- Martin Modrák
- Laboratory of Bioinformatics, Institute of Microbiology, Czech Academy of Sciences, Prague, Czechia
- Josef Pánek
- Laboratory of Bioinformatics, Institute of Microbiology, Czech Academy of Sciences, Prague, Czechia
9
Pánek J, Modrák M, Schwarz M. An Algorithm for Template-Based Prediction of Secondary Structures of Individual RNA Sequences. Front Genet 2017; 8:147. [PMID: 29067038 PMCID: PMC5641303 DOI: 10.3389/fgene.2017.00147] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Received: 07/24/2017] [Accepted: 09/25/2017] [Indexed: 11/24/2022]
Abstract
While understanding the structure of RNA molecules is vital for deciphering their functions, determining RNA structures experimentally is exceptionally hard. At the same time, extant approaches to computational RNA structure prediction have limited applicability and reliability. In this paper we provide a method to solve a simpler yet still biologically relevant problem: prediction of secondary RNA structure using the structures of different molecules as a template. Our method identifies conserved and unconserved subsequences within an RNA molecule. For conserved subsequences, the template structure is directly transferred into the generated structure and combined with de novo predicted structure for the unconserved subsequences. The method also determines when the generated structure is unreliable. The method is validated using experimentally identified structures. The accuracy of the method exceeds that of classical prediction algorithms and constrained prediction methods. This is demonstrated by comparison using a large number of heterogeneous RNAs. The presented method is fast and robust, and useful for various applications requiring knowledge of secondary structures of individual RNA sequences.
Affiliation(s)
- Josef Pánek
- Laboratory of Bioinformatics, Institute of Microbiology of the Academy of Sciences of Czech Republic, Prague, Czechia
- Martin Modrák
- Laboratory of Bioinformatics, Institute of Microbiology of the Academy of Sciences of Czech Republic, Prague, Czechia
- Marek Schwarz
- Laboratory of Bioinformatics, Institute of Microbiology of the Academy of Sciences of Czech Republic, Prague, Czechia
10
Sato T, Higuchi H, Yokota SI, Tamura Y. Mycoplasma bovis isolates from dairy calves in Japan have less susceptibility than a reference strain to all approved macrolides associated with a point mutation (G748A) combined with multiple species-specific nucleotide alterations in 23S rRNA. Microbiol Immunol 2017; 61:215-224. [PMID: 28504455 DOI: 10.1111/1348-0421.12490] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.0] [Received: 02/08/2017] [Revised: 04/04/2017] [Accepted: 05/09/2017] [Indexed: 11/29/2022]
Abstract
Erythromycin, tylosin and tilmicosin are approved for use in cattle in Japan, the latter two being used to treat Mycoplasma bovis infection. In this study, 58 M. bovis isolates obtained from Japanese dairy calves all exhibited reduced susceptibility to these macrolides, this widespread reduced susceptibility being attributable to a few dominant lineages. All 58 isolates contained the G748A variant in both the rrl3 and rrl4 alleles of 23S rRNA, whereas a reference strain (PG45) did not. G748 localizes in the central loop of domain II (from C744 to A753) of 23S rRNA, which participates in binding to mycinose, a sugar residue present in both tylosin and tilmicosin. A number of in vitro-selected mutants derived from M. bovis PG45 showed reduced susceptibility to tylosin and tilmicosin and contained a nucleotide insertion within the central loop of domain II of rrl3 (U747-G748Ins_CU/GU or A743-U744Ins_UA), suggesting that mutations around G748 confer this reduced susceptibility phenotype. However, other Mycoplasma species containing G748A were susceptible to tylosin and tilmicosin. Sequence comparison with Escherichia coli revealed that M. bovis PG45 and isolates harbored five nucleotide alterations (U744C, G745A, U746C, A752C and A753G) in the central loop of domain II of 23S rRNA, whereas other Mycoplasma species lacked at least two of these five nucleotide alterations. It was therefore concluded that G748 mutations in combination with species-specific nucleotide alterations in the central loop of domain II of 23S rRNA are likely sufficient to reduce susceptibility of M. bovis to tylosin and tilmicosin.
Affiliation(s)
- Toyotaka Sato
- Laboratory of Food Microbiology and Food Safety, Department of Health and Environmental Sciences, School of Veterinary Medicine, Rakuno Gakuen University, 582 Bunkyoudai-Midorimachi, Ebetsu, 069-8501, Japan; Department of Microbiology, Sapporo Medical University School of Medicine, S1 W17, Chuo-ku, Sapporo, 060-8556, Japan
- Hidetoshi Higuchi
- Laboratory of Animal Health, Department of Health and Environmental Sciences, School of Veterinary Medicine, Rakuno Gakuen University, Ebetsu, 069-8501, Japan
- Shin-Ichi Yokota
- Department of Microbiology, Sapporo Medical University School of Medicine, S1 W17, Chuo-ku, Sapporo, 060-8556, Japan
- Yutaka Tamura
- Laboratory of Food Microbiology and Food Safety, Department of Health and Environmental Sciences, School of Veterinary Medicine, Rakuno Gakuen University, 582 Bunkyoudai-Midorimachi, Ebetsu, 069-8501, Japan
11
Murakami K, Zhao J, Yamasaki K, Miyagishi M. Biochemical and structural features of extracellular vesicle-binding RNA aptamers. Biomed Rep 2017; 6:615-626. [PMID: 28584632 PMCID: PMC5449965 DOI: 10.3892/br.2017.899] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.9] [Received: 01/12/2017] [Accepted: 02/16/2017] [Indexed: 01/08/2023]
Abstract
Extracellular vesicles are particles in mammalian body fluids that have attracted considerable attention as biomarkers for various diseases. In the present study, the authors isolated RNA aptamers with an affinity for extracellular vesicles from two library pools that encoded randomized sequences of different lengths. After several rounds of selection, two conserved motifs were identified in the sequences obtained by next-generation sequencing. Most of the sequences were predicted to adopt a secondary structure that consisted of a non-conserved stem structure and a conserved loop sequence. Two similar minimal sequences were synthesized, and their ability to bind to extracellular vesicles was confirmed. Circular dichroism spectroscopy and melting temperature analysis demonstrated that the aptamers were able to form a G-quadruplex structure in their loop regions and that these structures were stabilized by potassium ions. Consistent with these structural data, the affinity of each aptamer for extracellular vesicles was dependent on potassium ions. The aptamers that were identified may be useful molecular tools for the development of diagnostic methods that utilize body fluids, such as blood, saliva and urine.
Affiliation(s)
- Kazuyoshi Murakami
- Molecular Composite Medicine Research Group, Biomedical Research Institute, National Institute of Advanced Industrial Science and Technology, Tsukuba-shi, Ibaraki 305-8566, Japan
- Jing Zhao
- Molecular Composite Medicine Research Group, Biomedical Research Institute, National Institute of Advanced Industrial Science and Technology, Tsukuba-shi, Ibaraki 305-8566, Japan
- Kazuhiko Yamasaki
- Molecular Composite Medicine Research Group, Biomedical Research Institute, National Institute of Advanced Industrial Science and Technology, Tsukuba-shi, Ibaraki 305-8566, Japan
- Makoto Miyagishi
- Molecular Composite Medicine Research Group, Biomedical Research Institute, National Institute of Advanced Industrial Science and Technology, Tsukuba-shi, Ibaraki 305-8566, Japan
12
Hiruta SF, Kobayashi N, Katoh T, Kajihara H. Molecular Phylogeny of Cypridoid Freshwater Ostracods (Crustacea: Ostracoda), Inferred from 18S and 28S rDNA Sequences. Zoolog Sci 2016; 33:179-85. [PMID: 27032683 DOI: 10.2108/zs150103] [Citation(s) in RCA: 17] [Impact Index Per Article: 2.1] [Indexed: 11/17/2022]
Abstract
With the aim of exploring phylogenetic relationships within Cypridoidea, the most species-rich superfamily among the podocopidan ostracods, we sequenced nearly the entire 18S rRNA gene (18S) and part of the 28S rRNA gene (28S) for 22 species in the order Podocopida, with representatives from all the major cypridoid families. We conducted phylogenetic analyses using the methods of maximum likelihood, minimum evolution, and Bayesian analysis. Our analyses showed monophyly for Cyprididae, one of the four families currently recognized in Cypridoidea. Candonidae turned out to be paraphyletic, and included three clades corresponding to the subfamilies Candoninae, Paracypridinae, and Cyclocypridinae. We propose restricting the name Candonidae s. str. to comprise what is now Candoninae, and raising Paracypridinae and Cyclocypridinae to family rank within the superfamily Cypridoidea.
Affiliation(s)
- Shimpei F Hiruta
- Faculty of Science, Hokkaido University, Sapporo 060-0810, Japan
- Toru Katoh
- Faculty of Science, Hokkaido University, Sapporo 060-0810, Japan
- Hiroshi Kajihara
- Faculty of Science, Hokkaido University, Sapporo 060-0810, Japan
13
Eukaryotic elongation factor 1-beta interacts with the 5' untranslated region of the M gene of Nipah virus to promote mRNA translation. Arch Virol 2016; 161:2361-8. [PMID: 27236461 DOI: 10.1007/s00705-016-2903-y] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Received: 02/10/2016] [Accepted: 05/14/2016] [Indexed: 12/12/2022]
Abstract
Nipah virus belongs to the genus Henipavirus in the family Paramyxoviridae, and its RNA genome is larger than those of other paramyxoviruses because it has long untranslated regions (UTRs) in each gene. However, the functions of these UTRs are not fully understood. In this study, we investigated the functions of the 5' UTRs and found that the 5' UTR of the M gene upregulated the translation of a reporter gene. Using an RNA pull-down assay, we showed that eukaryotic elongation factor 1-beta (EEF1B2) interacts with nucleotides 81-100 of the M 5' UTR and specifically enhances its translation efficiency. Our results suggest that the M 5' UTR promotes the production of M protein and viral budding by recruiting EEF1B2.
14
Hamada M, Ono Y, Kiryu H, Sato K, Kato Y, Fukunaga T, Mori R, Asai K. Rtools: a web server for various secondary structural analyses on single RNA sequences. Nucleic Acids Res 2016; 44:W302-7. [PMID: 27131356 PMCID: PMC4987903 DOI: 10.1093/nar/gkw337] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.8] [Received: 02/12/2016] [Accepted: 04/15/2016] [Indexed: 11/12/2022]
Abstract
Secondary structures, as well as nucleotide sequences, are important features of RNA molecules for characterizing their functions. According to the thermodynamic model, however, the probability of any secondary structure is very small. As a consequence, any tool to predict the secondary structures of RNAs has limited accuracy. On the other hand, there are a few tools that compensate for the imperfect predictions by calculating and visualizing secondary structural information from RNA sequences. It is desirable to obtain the rich information from those tools through a friendly interface. We implemented a web server of tools to predict secondary structures and to calculate various structural features based on the energy models of secondary structures. By just giving an RNA sequence to the web server, the user can get different types of solutions of the secondary structures; marginal probabilities such as base-pairing probabilities, loop probabilities and accessibilities of the local bases; the energy changes caused by arbitrary base mutations; and measures for validating the predicted secondary structures. The web server is available at http://rtools.cbrc.jp and integrates the software tools CentroidFold, CentroidHomfold, IPknot, CapR, Raccess, Rchange and RintD.
Affiliation(s)
- Michiaki Hamada
- Department of Electrical Engineering and Bioscience, Faculty of Science and Engineering, Waseda University, 55N-06-10, 3-4-1, Okubo Shinjuku-ku, Tokyo 169-8555, Japan; Artificial Intelligence Research Center (AIRC), National Institute of Advanced Industrial Science and Technology (AIST), 2-4-7 Aomi, Koto-ku, 135-0064 Tokyo, Japan
- Yukiteru Ono
- IMSBIO Co., Ltd, 4-21-1-601 Higashi-Ikebukuro, Toshima-ku, Tokyo 170-0013, Japan
- Hisanori Kiryu
- Graduate School of Frontier Sciences, University of Tokyo, 5-1-5 Kashiwanoha, Kashiwa 277-8562, Japan
- Kengo Sato
- Department of Biosciences and Informatics, Keio University, 3-14-1 Hiyoshi, Kohoku-ku, Yokohama 223-8522, Japan
- Yuki Kato
- Center for iPS Cell Research and Application (CiRA), Kyoto University, Shogoin, Sakyo-ku, Kyoto 606-8507, Japan
- Tsukasa Fukunaga
- Graduate School of Frontier Sciences, University of Tokyo, 5-1-5 Kashiwanoha, Kashiwa 277-8562, Japan
- Ryota Mori
- Graduate School of Frontier Sciences, University of Tokyo, 5-1-5 Kashiwanoha, Kashiwa 277-8562, Japan
- Kiyoshi Asai
- Artificial Intelligence Research Center (AIRC), National Institute of Advanced Industrial Science and Technology (AIST), 2-4-7 Aomi, Koto-ku, 135-0064 Tokyo, Japan; Graduate School of Frontier Sciences, University of Tokyo, 5-1-5 Kashiwanoha, Kashiwa 277-8562, Japan
15
Bioinformatics tools for lncRNA research. Biochimica et Biophysica Acta - Gene Regulatory Mechanisms 2016; 1859:23-30. [DOI: 10.1016/j.bbagrm.2015.07.014] [Citation(s) in RCA: 38] [Impact Index Per Article: 4.8] [Received: 03/30/2015] [Revised: 07/07/2015] [Accepted: 07/14/2015] [Indexed: 12/28/2022]
16
|
Local Mutational Pressures in Genomes of Zaire Ebolavirus and Marburg Virus. Adv Bioinformatics 2015; 2015:678587. [PMID: 26798338 PMCID: PMC4698526 DOI: 10.1155/2015/678587] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.6] [Received: 07/29/2015] [Revised: 10/30/2015] [Accepted: 11/03/2015] [Indexed: 11/18/2022] Open
Abstract
Heterogeneities in nucleotide content distribution along the length of Zaire ebolavirus and Marburg virus genomes have been analyzed. Results showed that there is asymmetric mutational A-pressure in the majority of Zaire ebolavirus genes; there is mutational AC-pressure in the coding region of the matrix protein VP40, probably, caused by its high expression at the end of the infection process; there is also AC-pressure in the 3'-part of the nucleoprotein (NP) coding gene associated with low amount of secondary structure formed by the 3'-part of its mRNA; in the middle of the glycoprotein (GP) coding gene that kind of mutational bias is linked with the high amount of secondary structure formed by the corresponding fragment of RNA negative (-) strand; there is relatively symmetric mutational AU-pressure in the polymerase (Pol) coding gene caused by its low expression level. In Marburg virus all genes, including C-rich fragment of GP coding region, demonstrate asymmetric mutational A-bias, while the last gene (Pol) demonstrates more symmetric mutational AU-pressure. The hypothesis of a newly synthesized RNA negative (-) strand shielding by complementary fragments of mRNAs has been described in this work: shielded fragments of RNA negative (-) strand should be better protected from oxidative damage and prone to ADAR-editing.
|
17
|
Yonemoto H, Asai K, Hamada M. A semi-supervised learning approach for RNA secondary structure prediction. Comput Biol Chem 2015; 57:72-9. [PMID: 25748534 DOI: 10.1016/j.compbiolchem.2015.02.002] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.0] [Received: 01/02/2015] [Accepted: 02/03/2015] [Indexed: 12/25/2022]
Abstract
RNA secondary structure prediction is a key technology in RNA bioinformatics. Most algorithms for RNA secondary structure prediction use probabilistic models, in which the model parameters are trained with reliable RNA secondary structures. Because of the difficulty of determining RNA secondary structures by experimental procedures, such as NMR or X-ray crystal structural analyses, there are still many RNA sequences that could be useful for training whose secondary structures have not been experimentally determined. In this paper, we introduce a novel semi-supervised learning approach for training parameters in a probabilistic model of RNA secondary structures in which we employ not only RNA sequences with annotated secondary structures but also ones with unknown secondary structures. Our model is based on a hybrid of generative (stochastic context-free grammars) and discriminative models (conditional random fields) that has been successfully applied to natural language processing. Computational experiments indicate that the accuracy of secondary structure prediction is improved by incorporating RNA sequences with unknown secondary structures into training. To our knowledge, this is the first study of a semi-supervised learning approach for RNA secondary structure prediction. This technique will be useful when the number of reliable structures is limited.
Affiliation(s)
- Haruka Yonemoto
- Graduate School of Frontier Sciences, University of Tokyo, 5-1-5 Kashiwanoha, Kashiwa 277-8562, Japan
- Kiyoshi Asai
- Graduate School of Frontier Sciences, University of Tokyo, 5-1-5 Kashiwanoha, Kashiwa 277-8562, Japan; Computational Biology Research Center (CBRC), National Institute of Advanced Industrial Science and Technology (AIST), 2-4-7, Aomi, Koto-ku, Tokyo 135-0064, Japan
- Michiaki Hamada
- Department of Electrical Engineering and Bioscience, Faculty of Science and Engineering, Waseda University, 55N-06-10, 3-4-1, Okubo Shinjuku-ku, Tokyo 169-8555, Japan; Computational Biology Research Center (CBRC), National Institute of Advanced Industrial Science and Technology (AIST), 2-4-7, Aomi, Koto-ku, Tokyo 135-0064, Japan.
|
18
|
DNASynth: a computer program for assembly of artificial gene parts in decreasing temperature. BIOMED RESEARCH INTERNATIONAL 2015; 2015:413262. [PMID: 25629047 PMCID: PMC4300049 DOI: 10.1155/2015/413262] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.3] [Received: 05/30/2014] [Revised: 10/08/2014] [Accepted: 10/11/2014] [Indexed: 11/23/2022]
Abstract
Artificial gene synthesis requires consideration of nucleotide sequence development as well as long DNA molecule assembly protocols. The nucleotide sequence of the molecule must meet many conditions including particular preferences of the host organism for certain codons, avoidance of specific regulatory subsequences, and a lack of secondary structures that inhibit expression. The chemical synthesis of DNA molecule has limitations in terms of strand length; thus, the creation of artificial genes requires the assembly of long DNA molecules from shorter fragments.
In the approach presented, the algorithm and the computer program address both tasks: developing the optimal nucleotide sequence to encode a given peptide for a given host organism and determining the long DNA assembly protocol. These tasks are closely connected; a change in codon usage may lead to changes in the optimal assembly protocol, and the lack of a simple assembly protocol may be addressed by changing the nucleotide sequence. The computer program presented in this study was tested with real data from an experiment in a wet biological laboratory to synthesize a peptide. The benefit of the presented algorithm and its application is the shorter time, compared to polymerase cycling assembly, needed to produce a ready synthetic gene.
|
19
|
Abstract
It has been well accepted that the RNA secondary structures of most functional non-coding RNAs (ncRNAs) are closely related to their functions and are conserved during evolution. Hence, prediction of conserved secondary structures from evolutionarily related sequences is one important task in RNA bioinformatics; the methods are useful not only to further functional analyses of ncRNAs but also to improve the accuracy of secondary structure predictions and to find novel functional RNAs from the genome. In this review, I focus on common secondary structure prediction from a given aligned RNA sequence, in which one secondary structure whose length is equal to that of the input alignment is predicted. I systematically review and classify existing tools and algorithms for the problem, by utilizing the information employed in the tools and by adopting a unified viewpoint based on maximum expected gain (MEG) estimators. I believe that this classification will allow a deeper understanding of each tool and provide users with useful information for selecting tools for common secondary structure predictions.
|
20
|
Khrustalev VV, Barkovsky EV, Khrustaleva TA, Lelevich SV. Intragenic isochores (intrachores) in the platelet phosphofructokinase gene of Passeriform birds. Gene 2014; 546:16-24. [PMID: 24861647 DOI: 10.1016/j.gene.2014.05.045] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.8] [Received: 02/18/2014] [Revised: 05/09/2014] [Accepted: 05/21/2014] [Indexed: 10/25/2022]
Abstract
Total GC-content in the platelet phosphofructokinase gene of Zebra Finch (Taeniopygia guttata) is low (37.53±0.51%), while there are short areas (about 300 nucleotides in length) with increased GC-content overlapping its exon 4 and exon 17. GC-content in third codon positions (3GC) of those two exons is equal to 88.42 and 80.00%, respectively, while overall 3GC of the coding region is equal to 49.9%. A similar distribution of GC-content has been found in platelet phosphofructokinase genes of other birds from the Passeriformes order. According to the results of phylogenetic analysis, formation of those areas with high G+C started from 91.4 to 47.1 million years ago, since there are no such peaks of GC-content in homologous genes of other birds and reptiles. There are clusters of transcription factor binding sites in those areas with higher GC-content, as well as microRNA precursors conserved in Zebra Finch and Flycatcher genes. According to our hypothesis, those intragenic isochores (intrachores) may be consequences of autonomous microRNA precursor transcription at certain period(s) of embryogenesis and gametogenesis, when the platelet phosphofructokinase gene itself is not expressed. Transcription-associated mutational pressure existing during those periods may cause an increase in rates of AT to GC mutations in those genes which are transcribed.
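The quantities reported in this abstract (total GC-content, GC-content in third codon positions, and local peaks over windows of about 300 nucleotides) can be illustrated with a minimal Python sketch. The function names, the default window of 300 and the step of 50 are assumptions made for illustration; they are not taken from the authors' software.

```python
def gc_content(seq: str) -> float:
    """Percentage of G and C among the A/C/G/T bases of a sequence."""
    bases = [b for b in seq.upper() if b in "ACGT"]
    if not bases:
        return 0.0
    return 100.0 * sum(b in "GC" for b in bases) / len(bases)


def gc3_content(cds: str) -> float:
    """GC percentage at third codon positions (3GC) of an in-frame coding sequence."""
    cds = cds.upper()
    usable = len(cds) - len(cds) % 3   # ignore a trailing partial codon
    return gc_content(cds[2:usable:3])  # every third base, starting at codon position 3


def sliding_gc(seq: str, window: int = 300, step: int = 50) -> list:
    """(start, GC%) for each window; local peaks hint at GC-rich areas inside a gene."""
    return [
        (start, gc_content(seq[start:start + window]))
        for start in range(0, len(seq) - window + 1, step)
    ]
```

Comparing `gc3_content` of a single exon against that of the whole coding region mirrors the contrast the abstract draws between exons 4 and 17 and the overall 3GC value.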
Affiliation(s)
- Sergey Vladimirovich Lelevich
- Department of Clinical Laboratory Diagnostics, Allergology and Immunology, Grodno State Medical University, Gorkogo 80, Grodno, Belarus
|
21
|
Abstract
Efforts to understand the molecular basis of mycobacterial gene regulation are dominated by a protein-centric view. However, there is a growing appreciation that noncoding RNA, i.e., RNA that is not translated, plays a role in a wide variety of molecular mechanisms. Noncoding RNA comprises rRNA, tRNA, 4.5S RNA, RnpB, and transfer-messenger RNA, as well as a vast population of regulatory RNA, often dubbed "the dark matter of gene regulation." The regulatory RNA species comprise 5' and 3' untranslated regions and a rapidly expanding category of transcripts with the ability to base-pair with mRNAs or to interact with proteins. Regulatory RNA plays a central role in the bacterium's response to changes in the environment, and in this article we review emerging information on the presence and abundance of different types of noncoding RNA in mycobacteria.
|
22
|
Abstract
Many bioinformatics problems, such as sequence alignment, gene prediction, phylogenetic tree estimation and RNA secondary structure prediction, are often affected by the 'uncertainty' of a solution, that is, the probability of the solution is extremely small. This situation arises for estimation problems on high-dimensional discrete spaces in which the number of possible discrete solutions is immense. In the analysis of biological data or the development of prediction algorithms, this uncertainty should be handled carefully and appropriately. In this review, I will explain several methods to combat this uncertainty, presenting a number of examples in bioinformatics. The methods include (i) avoiding point estimation, (ii) maximum expected accuracy (MEA) estimations and (iii) several strategies to design a pipeline involving several prediction methods. I believe that the basic concepts and ideas described in this review will be generally useful for estimation problems in various areas of bioinformatics.
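The claim that even the best solution has an extremely small probability can be made concrete with a toy calculation. The sketch below assumes a deliberately simplified model that is not part of the review: a solution consists of `n_sites` independent binary decisions, each of which is correct with probability `q` under the model.

```python
def best_solution_probability(n_sites: int, q: float) -> float:
    """Probability of the single most likely joint solution in the toy model."""
    return q ** n_sites


def expected_correct_sites(n_sites: int, q: float) -> float:
    """Expected number of correctly predicted sites for that same solution."""
    return n_sites * q


if __name__ == "__main__":
    # Each site is predicted with 90% confidence, yet the joint solution is very unlikely.
    print(best_solution_probability(100, 0.9))
    print(expected_correct_sites(100, 0.9))
```

In this toy setting the joint probability of the top solution collapses with problem size while its expected per-site accuracy stays high, which is the intuition behind scoring solutions by expected accuracy rather than by their joint probability alone.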
|
23
|
Downregulation of Nipah virus N mRNA occurs through interaction between its 3' untranslated region and hnRNP D. J Virol 2013; 87:6582-8. [PMID: 23514888 DOI: 10.1128/jvi.02495-12] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.2] [Indexed: 11/20/2022] Open
Abstract
Nipah virus (NiV) is a nonsegmented, single-stranded, negative-sense RNA virus belonging to the genus Henipavirus, family Paramyxoviridae. NiV causes acute encephalitis and respiratory disease in humans, is associated with high mortality, and poses a threat in southern Asia. The genomes of henipaviruses are about 18,246 nucleotides (nt) long, which is longer than those of other paramyxoviruses (around 15,384 nt). This difference is caused by the noncoding RNA region, particularly the 3' untranslated region (UTR), which occupies more than half of the noncoding RNA region. To determine the function(s) of the NiV noncoding RNA region, we investigated the effects of NiV 3' UTRs on reporter gene expression. The NiV N 3' UTR (nt 1 to 100) demonstrated strong repressor activity associated with hnRNP D protein binding to that region. Mutation of the hnRNP D binding site or knockdown of hnRNP D resulted in increased expression of the NiV N 3' UTR reporter. Our findings suggest that NiV N expression is repressed by hnRNP D through the NiV N 3' UTR and demonstrate the involvement of posttranscriptional regulation in the NiV life cycle. To the best of our knowledge, this provides the first report of the functions of the NiV noncoding RNA region.
|
24
|
Sakuragi JI, Ode H, Sakuragi S, Shioda T, Sato H. A proposal for a new HIV-1 DLS structural model. Nucleic Acids Res 2012; 40:5012-22. [PMID: 22328732 PMCID: PMC3367192 DOI: 10.1093/nar/gks156] [Citation(s) in RCA: 22] [Impact Index Per Article: 1.8] [Indexed: 11/13/2022] Open
Abstract
The dimer initiation site/dimer linkage sequence (DIS/DLS) region of the human immunodeficiency virus type 1 (HIV-1) RNA genome is suggested to play essential roles at various stages of the viral life cycle. Through a novel assay we had recently developed, we reported on the necessary and sufficient region for RNA dimerization in the HIV-1 virion. Using this system, we performed further detailed mapping of the functional base pairs necessary for HIV-1 DLS structure. Interestingly, the study revealed a previously unnoticed stem formation between two distantly positioned regions. Based on this and other findings on functional base pairing in vivo, we propose new 3D models of the HIV-1 DLS which contain a unique pseudoknot-like conformation. Since this pseudoknot-like conformation appears to be thermodynamically stable, forms a foundational skeleton for the DLS and sterically restricts the spontaneous diversification of DLS conformations, its unique shape may contribute to the viral life cycle and potentially serve as a novel target for anti-HIV-1 therapies.
Affiliation(s)
- Jun-ichi Sakuragi
- Department of Viral Infections, RIMD, Osaka Univ. 3-1 Yamadaoka, Suita, Osaka 565-0871, Japan.
|
25
|
Hamada M, Asai K. A classification of bioinformatics algorithms from the viewpoint of maximizing expected accuracy (MEA). J Comput Biol 2012; 19:532-49. [PMID: 22313125 DOI: 10.1089/cmb.2011.0197] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.0] [Indexed: 11/12/2022] Open
Abstract
Many estimation problems in bioinformatics are formulated as point estimation problems in a high-dimensional discrete space. In general, it is difficult to design reliable estimators for this type of problem, because the number of possible solutions is immense, which leads to an extremely low probability for every solution, even for the one with the highest probability. Therefore, maximum score and maximum likelihood estimators do not work well in this situation although they are widely employed in a number of applications. Maximizing expected accuracy (MEA) estimation, in which accuracy measures of the target problem and the entire distribution of solutions are considered, is a more successful approach. In this review, we provide an extensive discussion of algorithms and software based on MEA. We describe how a number of algorithms used in previous studies can be classified from the viewpoint of MEA. We believe that this review will be useful not only for users wishing to utilize software to solve the estimation problems appearing in this article, but also for developers wishing to design algorithms on the basis of MEA.
Affiliation(s)
- Michiaki Hamada
- Graduate School of Frontier Sciences, University of Tokyo, Kashiwa, Japan.
|
26
|
Wang Z, Xu J. A conditional random fields method for RNA sequence-structure relationship modeling and conformation sampling. Bioinformatics 2011; 27:i102-10. [PMID: 21685058 PMCID: PMC3117333 DOI: 10.1093/bioinformatics/btr232] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.7] [Indexed: 01/25/2023]
Abstract
Accurate tertiary structures are very important for the functional study of non-coding RNA molecules. However, predicting RNA tertiary structures is extremely challenging, because of the large conformation space to be explored and the lack of an accurate scoring function that differentiates the native structure from decoys. The fragment-based conformation sampling method (e.g. FARNA) has the shortcoming that the limited size of a fragment library makes it infeasible to represent all possible conformations well. A recent dynamic Bayesian network method, BARNACLE, overcomes the issue of fragment assembly. In addition, neither of these methods makes use of sequence information in sampling conformations. Here, we present a new probabilistic graphical model, conditional random fields (CRFs), to model RNA sequence–structure relationship, which enables us to accurately estimate the probability of an RNA conformation from sequence. Coupled with a novel tree-guided sampling scheme, our CRF model is then applied to RNA conformation sampling. Experimental results show that our CRF method can model RNA sequence–structure relationship well and that sequence information is important for conformation sampling. Our method, named TreeFolder, generates a much higher percentage of native-like decoys than FARNA and BARNACLE, although we use the same simple energy function as BARNACLE. Contact: zywang@ttic.edu; j3xu@ttic.edu. Supplementary Information: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Zhiyong Wang
- Toyota Technological Institute at Chicago, IL, USA.
|
27
|
Hamada M, Yamada K, Sato K, Frith MC, Asai K. CentroidHomfold-LAST: accurate prediction of RNA secondary structure using automatically collected homologous sequences. Nucleic Acids Res 2011; 39:W100-6. [PMID: 21565800 PMCID: PMC3125741 DOI: 10.1093/nar/gkr290] [Citation(s) in RCA: 18] [Impact Index Per Article: 1.4] [Indexed: 11/12/2022] Open
Abstract
Although secondary structure predictions of an individual RNA sequence have been widely used in a number of sequence analyses of RNAs, accuracy is still limited. Recently, we proposed a method (called 'CentroidHomfold'), which includes information about homologous sequences into the prediction of the secondary structure of the target sequence, and showed that it substantially improved the performance of secondary structure predictions. CentroidHomfold, however, forces users to prepare homologous sequences of the target sequence. We have developed a Web application (CentroidHomfold-LAST) that predicts the secondary structure of the target sequence using automatically collected homologous sequences. LAST, which is a fast and sensitive local aligner, and CentroidHomfold are employed in the Web application. Computational experiments with a commonly-used data set indicated that CentroidHomfold-LAST substantially outperformed conventional secondary structure predictions including CentroidFold and RNAfold.
Affiliation(s)
- Michiaki Hamada
- Graduate School of Frontier Sciences, The University of Tokyo, 5-1-5 Kashiwanoha, Kashiwa, Chiba 277-8562, Japan.
|
28
|
Kiryu H, Terai G, Imamura O, Yoneyama H, Suzuki K, Asai K. A detailed investigation of accessibilities around target sites of siRNAs and miRNAs. Bioinformatics 2011; 27:1788-97. [PMID: 21531769 DOI: 10.1093/bioinformatics/btr276] [Citation(s) in RCA: 44] [Impact Index Per Article: 3.4] [Indexed: 12/14/2022]
Abstract
MOTIVATION The importance of RNA sequence analysis has been increasing since the discovery of various types of non-coding RNAs transcribed in animal cells. Conventional RNA sequence analyses have mainly focused on structured regions, which are stabilized by the stacking energies acting on adjacent base pairs. On the other hand, recent findings regarding the mechanisms of small interfering RNAs (siRNAs) and transcription regulation by microRNAs (miRNAs) indicate the importance of analyzing accessible regions where no base pairs exist. So far, relatively few studies have investigated the nature of such regions. RESULTS We have conducted a detailed investigation of accessibilities around the target sites of siRNAs and miRNAs. We have exhaustively calculated the correlations between the accessibilities around the target sites and the repression levels of the corresponding mRNAs. We have computed the accessibilities with an originally developed software package, called 'Raccess', which computes the accessibility of all the segments of a fixed length for a given RNA sequence when the maximal distance between base pairs is limited to a fixed size W. We show that the computed accessibilities are relatively insensitive to the choice of the maximal span W. We have found that the efficacy of siRNAs depends strongly on the accessibility of the very 3'-end of their binding sites, which might reflect a target site recognition mechanism in the RNA-induced silencing complex. We also show that the efficacy of miRNAs has a similar dependence on the accessibilities, but some miRNAs also show positive correlations between the efficacy and the accessibilities in broad regions downstream of their putative binding sites, which might imply that the downstream regions of the target sites are bound by other proteins that allow the miRNAs to implement their functions. We have also investigated the off-target effects of an siRNA as a potential RNAi therapeutic. 
We show that the off-target effects of the siRNA have similar correlations to the miRNA repression, indicating that they are caused by the same mechanism. AVAILABILITY The C++ source code of the Raccess software is available at http://www.ncrna.org/software/Raccess/. The microarray data on the measurements of the siRNA off-target effects are also available at the same site. CONTACT kiryu-h@k.u-tokyo.ac.jp
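The notion of accessibility used above (the probability that a region carries no base pairs) can be illustrated for single bases with a short Python sketch. This is not the Raccess algorithm, which works on whole segments through the partition function. The sketch assumes that a table of base-pairing probabilities `bpp`, keyed by 1-based position pairs, has already been produced by an external folding tool under the chosen maximal span W; all function names are hypothetical.

```python
def unpaired_probabilities(bpp: dict, length: int) -> list:
    """Per-base probability of being unpaired: 1 minus the summed pairing probabilities."""
    paired = [0.0] * (length + 1)          # index 0 unused; positions are 1-based
    for (i, j), p in bpp.items():
        paired[i] += p
        paired[j] += p
    return [1.0 - paired[k] for k in range(1, length + 1)]


def segment_accessibility_bounds(unpaired: list, start: int, seg_len: int) -> tuple:
    """Bounds on the probability that a whole segment is unpaired.

    The exact value needs the full structure ensemble; from per-base values only,
    the union bound gives the lower limit and the least accessible base the upper limit.
    """
    seg = unpaired[start - 1:start - 1 + seg_len]
    lower = max(0.0, 1.0 - sum(1.0 - u for u in seg))
    upper = min(seg)
    return lower, upper
```

Because the events "base i is unpaired" are correlated along a segment, multiplying per-base values would be unjustified, which is why only bounds are returned here.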
Affiliation(s)
- Hisanori Kiryu
- Department of Computational Biology, Faculty of Frontier Science, The University of Tokyo, Chiba 277-8561, Japan.
|
29
|
Harmanci AO, Sharma G, Mathews DH. TurboFold: iterative probabilistic estimation of secondary structures for multiple RNA sequences. BMC Bioinformatics 2011; 12:108. [PMID: 21507242 PMCID: PMC3120699 DOI: 10.1186/1471-2105-12-108] [Citation(s) in RCA: 69] [Impact Index Per Article: 5.3] [Received: 09/03/2010] [Accepted: 04/20/2011] [Indexed: 01/07/2023] Open
Abstract
Background The prediction of secondary structure, i.e. the set of canonical base pairs between nucleotides, is a first step in developing an understanding of the function of an RNA sequence. The most accurate computational methods predict conserved structures for a set of homologous RNA sequences. These methods usually suffer from high computational complexity. In this paper, TurboFold, a novel and efficient method for secondary structure prediction for multiple RNA sequences, is presented. Results TurboFold takes, as input, a set of homologous RNA sequences and outputs estimates of the base pairing probabilities for each sequence. The base pairing probabilities for a sequence are estimated by combining intrinsic information, derived from the sequence itself via the nearest neighbor thermodynamic model, with extrinsic information, derived from the other sequences in the input set. For a given sequence, the extrinsic information is computed by using pairwise-sequence-alignment-based probabilities for co-incidence with each of the other sequences, along with estimated base pairing probabilities, from the previous iteration, for the other sequences. The extrinsic information is introduced as free energy modifications for base pairing in a partition function computation based on the nearest neighbor thermodynamic model. This process yields updated estimates of base pairing probability. The updated base pairing probabilities in turn are used to recompute extrinsic information, resulting in the overall iterative estimation procedure that defines TurboFold. TurboFold is benchmarked on a number of ncRNA datasets and compared against alternative secondary structure prediction methods. The iterative procedure in TurboFold is shown to improve estimates of base pairing probability with each iteration, though only small gains are obtained beyond three iterations. 
Secondary structures composed of base pairs with estimated probabilities higher than a significance threshold are shown to be more accurate for TurboFold than for alternative methods that estimate base pairing probabilities. TurboFold-MEA, which uses base pairing probabilities from TurboFold in a maximum expected accuracy algorithm for secondary structure prediction, has accuracy comparable to the best performing secondary structure prediction methods. The computational and memory requirements for TurboFold are modest and, in terms of sequence length and number of sequences, scale much more favorably than joint alignment and folding algorithms. Conclusions TurboFold is an iterative probabilistic method for predicting secondary structures for multiple RNA sequences that efficiently and accurately combines the information from the comparative analysis between sequences with the thermodynamic folding model. Unlike most other multi-sequence structure prediction methods, TurboFold does not enforce strict commonality of structures and is therefore useful for predicting structures for homologous sequences that have diverged significantly. TurboFold can be downloaded as part of the RNAstructure package at http://rna.urmc.rochester.edu.
Affiliation(s)
- Arif O Harmanci
- Department of Electrical and Computer Engineering, University of Rochester, Rochester, NY, USA
|
30
|
Hamada M, Kiryu H, Iwasaki W, Asai K. Generalized centroid estimators in bioinformatics. PLoS One 2011; 6:e16450. [PMID: 21365017 PMCID: PMC3041832 DOI: 10.1371/journal.pone.0016450] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.0] [Received: 09/01/2010] [Accepted: 12/22/2010] [Indexed: 11/27/2022] Open
Abstract
In a number of estimation problems in bioinformatics, accuracy measures of the target problem are usually given, and it is important to design estimators that are suitable to those accuracy measures. However, there is often a discrepancy between an employed estimator and a given accuracy measure of the problem. In this study, we introduce a general class of efficient estimators for estimation problems on high-dimensional binary spaces, which represent many fundamental problems in bioinformatics. Theoretical analysis reveals that the proposed estimators generally fit with commonly-used accuracy measures (e.g. sensitivity, PPV, MCC and F-score), can be computed efficiently in many cases, and cover a wide range of problems in bioinformatics from the viewpoint of the principle of maximum expected accuracy (MEA). It is also shown that some important algorithms in bioinformatics can be interpreted in a unified manner. The concept presented in this paper not only gives a useful framework for designing MEA-based estimators but is also highly extendable and sheds new light on many problems in bioinformatics.
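For readers unfamiliar with the accuracy measures named in this abstract, the following Python sketch computes sensitivity, PPV, F-score and MCC for a prediction on a binary space, using predicted and reference base pairs as the example. The function name and the representation of pairs as sets of position tuples are assumptions for illustration; `n_candidates` is the number of binary variables, for instance all possible position pairs.

```python
from math import sqrt


def accuracy_measures(predicted: set, reference: set, n_candidates: int) -> dict:
    """Sensitivity, PPV, F-score and MCC of a predicted set against a reference set."""
    tp = len(predicted & reference)      # predicted and present in the reference
    fp = len(predicted - reference)      # predicted but absent from the reference
    fn = len(reference - predicted)      # in the reference but missed
    tn = n_candidates - tp - fp - fn     # correctly left out
    sen = tp / (tp + fn) if tp + fn else 0.0
    ppv = tp / (tp + fp) if tp + fp else 0.0
    f_score = 2 * tp / (2 * tp + fp + fn) if 2 * tp + fp + fn else 0.0
    denom = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {"sensitivity": sen, "ppv": ppv, "f_score": f_score, "mcc": mcc}
```

Sensitivity and PPV pull in opposite directions as more pairs are predicted, which is why balanced measures such as F-score and MCC are commonly reported alongside them.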
Affiliation(s)
- Michiaki Hamada
- Graduate School of Frontier Sciences, The University of Tokyo, Chiba, Japan.
|
31
|
Sahraeian SME, Yoon BJ. PicXAA-R: efficient structural alignment of multiple RNA sequences using a greedy approach. BMC Bioinformatics 2011; 12 Suppl 1:S38. [PMID: 21342569 PMCID: PMC3044294 DOI: 10.1186/1471-2105-12-s1-s38] [Citation(s) in RCA: 61] [Impact Index Per Article: 4.7] [Indexed: 01/30/2023] Open
Abstract
Background Accurate and efficient structural alignment of non-coding RNAs (ncRNAs) has attracted growing attention as recent studies have unveiled the significance of ncRNAs in living organisms. Because Sankoff-style structural alignment algorithms cannot efficiently handle multiple sequences, progressive schemes are mostly used to reduce the complexity. However, this idea tends to propagate early-stage errors throughout the entire process, thereby degrading the quality of the final alignment. For multiple protein sequence alignment, we have recently proposed PicXAA, which constructs an accurate alignment in a non-progressive fashion. Results Here, we propose PicXAA-R as an extension to PicXAA for greedy structural alignment of ncRNAs. PicXAA-R efficiently captures both folding information within each sequence and local similarities between sequences. It uses a set of probabilistic consistency transformations to improve the posterior base-pairing and base alignment probabilities using the information of all sequences in the alignment. Using a graph-based scheme, we greedily build up the structural alignment from sequence regions with high base-pairing and base alignment probabilities. Conclusions Several experiments on datasets with different characteristics confirm that PicXAA-R is one of the fastest algorithms for structural alignment of multiple RNAs and that it consistently yields accurate alignment results, especially for datasets with locally similar sequences. PicXAA-R source code is freely available at: http://www.ece.tamu.edu/~bjyoon/picxaa/.
|
32
|
Hamada M, Sato K, Asai K. Prediction of RNA secondary structure by maximizing pseudo-expected accuracy. BMC Bioinformatics 2010; 11:586. [PMID: 21118522 PMCID: PMC3003279 DOI: 10.1186/1471-2105-11-586] [Citation(s) in RCA: 19] [Impact Index Per Article: 1.4] [Received: 07/02/2010] [Accepted: 11/30/2010] [Indexed: 12/17/2022] Open
Abstract
Background Recent studies have revealed the importance of considering the entire distribution of possible secondary structures in RNA secondary structure predictions; therefore, new types of estimators, including the maximum expected accuracy (MEA) estimator, have been proposed. The MEA-based estimators have been designed to maximize the expected accuracy of the base-pairs and have achieved the highest level of accuracy. Those methods, however, do not give a single best prediction of the structure, but employ parameters to control the trade-off between the sensitivity and the positive predictive value (PPV). It is unclear what parameter value should be used, and even the well-trained default parameter value does not, in general, give the best result in popular accuracy measures for each RNA sequence. Results Instead of using the expected values of the popular accuracy measures for RNA secondary structure prediction, which are difficult to calculate, the pseudo-expected accuracy, which can easily be computed from base-pairing probabilities, is introduced. It is shown that the pseudo-expected accuracy is a good approximation in terms of sensitivity, PPV, MCC, or F-score. The pseudo-expected accuracy can be approximately maximized for each RNA sequence by stochastic sampling. It is also shown that well-balanced secondary structures between sensitivity and PPV can be predicted with a small computational overhead by combining the pseudo-expected accuracy of MCC or F-score with the γ-centroid estimator. Conclusions This study gives not only a method for predicting the secondary structure that balances between sensitivity and PPV, but also a general method for approximately maximizing the (pseudo-)expected accuracy with respect to various evaluation measures including MCC and F-score.
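The combination described in the Results can be sketched in Python under stated assumptions. The sketch takes a table of base-pairing probabilities `bpp` as given, interprets the pseudo-expected F-score as the F-score formula evaluated on expected true positives, false positives and false negatives, and uses the commonly cited threshold form of the γ-centroid estimator (keep pairs with probability above 1/(γ+1)). Plain thresholding is assumed to yield a consistent structure only for γ ≤ 1, so the sketch rejects larger values rather than implementing the dynamic programming needed there. All function names are hypothetical and this is not the authors' implementation.

```python
def gamma_centroid_pairs(bpp: dict, gamma: float) -> set:
    """Base pairs whose probability exceeds 1/(gamma + 1); thresholding only, gamma <= 1."""
    if not 0.0 < gamma <= 1.0:
        raise ValueError("this sketch supports 0 < gamma <= 1 only")
    threshold = 1.0 / (gamma + 1.0)
    return {pair for pair, p in bpp.items() if p > threshold}


def pseudo_expected_f(bpp: dict, predicted: set) -> float:
    """F-score formula applied to expected TP, FP and FN counts."""
    tp = sum(bpp.get(pair, 0.0) for pair in predicted)   # expected true positives
    fp = len(predicted) - tp                             # expected false positives
    fn = sum(bpp.values()) - tp                          # expected false negatives
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def pick_gamma(bpp: dict, gammas: list) -> float:
    """Choose, per sequence, the gamma whose prediction has the best pseudo-expected F-score."""
    return max(gammas, key=lambda g: pseudo_expected_f(bpp, gamma_centroid_pairs(bpp, g)))
```

Selecting γ per sequence in this way addresses the point raised in the Background that a single default parameter value does not give the best result for every RNA sequence.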
Affiliation(s)
- Michiaki Hamada
- Graduate School of Frontier Sciences, The University of Tokyo, 5-1-5 Kashiwanoha, Kashiwa, Chiba, Japan.
|
33
|
Hamada M, Sato K, Asai K. Improving the accuracy of predicting secondary structure for aligned RNA sequences. Nucleic Acids Res 2010; 39:393-402. [PMID: 20843778 PMCID: PMC3025558 DOI: 10.1093/nar/gkq792] [Citation(s) in RCA: 48] [Impact Index Per Article: 3.4] [Indexed: 11/14/2022]
Abstract
Considerable attention has been focused on predicting the secondary structure of aligned RNA sequences, since it is useful not only for improving the limiting accuracy of conventional secondary structure prediction but also for finding non-coding RNAs in genomic sequences. Although many algorithms exist for predicting the secondary structure of aligned RNA sequences, further improvement in accuracy is still awaited. In this article, toward improving the accuracy, a theoretical classification of state-of-the-art algorithms for predicting the secondary structure of aligned RNA sequences is presented. The classification is based on the viewpoint of maximum expected accuracy (MEA), which has been successfully applied to various problems in bioinformatics. The classification reveals several disadvantages of the current algorithms, and we propose an improvement of a previously introduced algorithm (CentroidAlifold). Finally, computational experiments strongly support the theoretical classification and indicate that the improved CentroidAlifold substantially outperforms other algorithms.
Affiliation(s)
- Michiaki Hamada
- Mizuho Information & Research Institute, Inc, Chiyoda-ku, Tokyo, Japan.
|
34
|
Hamada M, Sato K, Kiryu H, Mituyama T, Asai K. CentroidAlign: fast and accurate aligner for structured RNAs by maximizing expected sum-of-pairs score. Bioinformatics 2009; 25:3236-43. [DOI: 10.1093/bioinformatics/btp580] [Citation(s) in RCA: 40] [Impact Index Per Article: 2.7] [Indexed: 12/23/2022]
|