1
|
Edelbroek B, Kjellin J, Biryukova I, Liao Z, Lundberg T, Noegel A, Eichinger L, Friedländer M, Söderbom F. Evolution of microRNAs in Amoebozoa and implications for the origin of multicellularity. Nucleic Acids Res 2024; 52:3121-3136. [PMID: 38375870 PMCID: PMC11014262 DOI: 10.1093/nar/gkae109] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/14/2023] [Revised: 01/31/2024] [Accepted: 02/05/2024] [Indexed: 02/21/2024] [Open access]
Abstract
MicroRNAs (miRNAs) are important and ubiquitous regulators of gene expression in both plants and animals. They are thought to have evolved convergently in these lineages and hypothesized to have played a role in the evolution of multicellularity. In line with this hypothesis, miRNAs have so far only been described in few unicellular eukaryotes. Here, we investigate the presence and evolution of miRNAs in Amoebozoa, focusing on species belonging to Acanthamoeba, Physarum and dictyostelid taxonomic groups, representing a range of unicellular and multicellular lifestyles. miRNAs that adhere to both the stringent plant and animal miRNA criteria were identified in all examined amoebae, expanding the total number of protists harbouring miRNAs from 7 to 15. We found conserved miRNAs between closely related species, but the majority of species feature only unique miRNAs. This shows rapid gain and/or loss of miRNAs in Amoebozoa, further illustrated by a detailed comparison between two evolutionarily closely related dictyostelids. Additionally, loss of miRNAs in the Dictyostelium discoideum drnB mutant did not seem to affect multicellular development and, hence, demonstrates that the presence of miRNAs does not appear to be a strict requirement for the transition from uni- to multicellular life.
Affiliation(s)
- Bart Edelbroek
  - Department of Cell and Molecular Biology, Uppsala Biomedical Centre, Uppsala University, 75124 Uppsala, Sweden
- Jonas Kjellin
  - Department of Cell and Molecular Biology, Uppsala Biomedical Centre, Uppsala University, 75124 Uppsala, Sweden
- Inna Biryukova
  - Science for Life Laboratory, The Department of Molecular Biosciences, The Wenner-Gren Institute, Stockholm University, 10691 Stockholm, Sweden
- Zhen Liao
  - Department of Cell and Molecular Biology, Uppsala Biomedical Centre, Uppsala University, 75124 Uppsala, Sweden
- Torgny Lundberg
  - Department of Cell and Molecular Biology, Uppsala Biomedical Centre, Uppsala University, 75124 Uppsala, Sweden
- Angelika A Noegel
  - Centre for Biochemistry, Medical Faculty, University of Cologne, 50931 Cologne, Germany
- Ludwig Eichinger
  - Centre for Biochemistry, Medical Faculty, University of Cologne, 50931 Cologne, Germany
- Marc R Friedländer
  - Science for Life Laboratory, The Department of Molecular Biosciences, The Wenner-Gren Institute, Stockholm University, 10691 Stockholm, Sweden
- Fredrik Söderbom
  - Department of Cell and Molecular Biology, Uppsala Biomedical Centre, Uppsala University, 75124 Uppsala, Sweden
|
2
|
Backofen R, Gorodkin J, Hofacker IL, Stadler PF. Comparative RNA Genomics. Methods Mol Biol 2024; 2802:347-393. [PMID: 38819565 DOI: 10.1007/978-1-0716-3838-5_12] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/01/2024]
Abstract
Over the last quarter of a century it has become clear that RNA is much more than just a boring intermediate in protein expression. Ancient RNAs still appear in the core information metabolism and comprise a surprisingly large component in bacterial gene regulation. A common theme with these types of mostly small RNAs is their reliance on conserved secondary structures. Large-scale sequencing projects, on the other hand, have profoundly changed our understanding of eukaryotic genomes. Pervasively transcribed, they give rise to a plethora of large and evolutionarily extremely flexible non-coding RNAs that exert a vastly diverse array of molecular functions. In this chapter we provide a (necessarily incomplete) overview of the current state of comparative analysis of non-coding RNAs, emphasizing computational approaches as a means to gain a global picture of the modern RNA world.
Affiliation(s)
- Rolf Backofen
  - Bioinformatics Group, Department of Computer Science, University of Freiburg, Freiburg, Germany
  - Center for Non-coding RNA in Technology and Health, University of Copenhagen, Frederiksberg, Denmark
- Jan Gorodkin
  - Center for Non-coding RNA in Technology and Health, Department of Veterinary and Animal Sciences, University of Copenhagen, Frederiksberg, Denmark
- Ivo L Hofacker
  - Institute for Theoretical Chemistry, University of Vienna, Wien, Austria
  - Bioinformatics and Computational Biology research group, University of Vienna, Vienna, Austria
  - Center for Non-coding RNA in Technology and Health, University of Copenhagen, Frederiksberg, Denmark
- Peter F Stadler
  - Bioinformatics Group, Department of Computer Science, University of Leipzig, Leipzig, Germany
  - Interdisciplinary Center for Bioinformatics, University of Leipzig, Leipzig, Germany
  - Max Planck Institute for Mathematics in the Sciences, Leipzig, Germany
  - Universidad Nacional de Colombia, Bogotá, Colombia
  - Institute for Theoretical Chemistry, University of Vienna, Wien, Austria
  - Center for Non-coding RNA in Technology and Health, University of Copenhagen, Frederiksberg, Denmark
  - Santa Fe Institute, Santa Fe, NM, USA
|
3
|
Revisiting the Relationships Between Genomic G + C Content, RNA Secondary Structures, and Optimal Growth Temperature. J Mol Evol 2020; 89:165-171. [PMID: 33216148 DOI: 10.1007/s00239-020-09974-w] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.8] [Received: 09/16/2020] [Accepted: 11/09/2020] [Indexed: 10/23/2022]
Abstract
Over twenty years ago Galtier and Lobry published a manuscript entitled "Relationships between Genomic G + C Content, RNA Secondary Structure, and Optimal Growth Temperature" in the Journal of Molecular Evolution that showcased the lack of a relationship between genomic G + C content and optimal growth temperature (OGT) in a set of about 200 prokaryotes. Galtier and Lobry also assessed the relationship between RNA secondary structures (rRNA stems, tRNAs) and OGT, and in this case a clear relationship emerged. Increasing structured RNA G + C content (particularly in regions that are double-stranded) correlates with increased OGT. Both of these fundamental relationships have withstood the test of many additional sequences and spawned a variety of different applications that include prediction of OGT from rRNA sequence and computational ncRNA identification approaches. In this work, I present the motivation behind Galtier and Lobry's original paper and the larger questions addressed by the work, how these questions have evolved over the last two decades, and the impact of Galtier and Lobry's manuscript in fields beyond these questions.
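The quantity contrasted in this abstract, overall G + C content versus G + C content restricted to double-stranded positions, can be illustrated with a minimal Python sketch. It assumes the secondary structure is supplied as a dot-bracket string aligned to the sequence; the function names `gc_fraction` and `paired_gc_fraction` are hypothetical and do not come from the cited paper.

```python
def gc_fraction(seq: str) -> float:
    """Fraction of G and C among the unambiguous nucleotides in seq."""
    seq = seq.upper().replace("T", "U")
    counted = [base for base in seq if base in "ACGU"]
    if not counted:
        return 0.0
    return sum(base in "GC" for base in counted) / len(counted)


def paired_gc_fraction(seq: str, structure: str) -> float:
    """G + C fraction restricted to base-paired positions.

    `structure` is assumed to be a dot-bracket string of the same length
    as `seq`, where '(' and ')' mark paired positions.
    """
    if len(seq) != len(structure):
        raise ValueError("sequence and structure must have equal length")
    paired = "".join(b for b, s in zip(seq, structure) if s in "()")
    return gc_fraction(paired)


if __name__ == "__main__":
    hairpin = "GGGAAAUCCC"
    fold = "(((....)))"
    print(gc_fraction(hairpin), paired_gc_fraction(hairpin, fold))
```

In the toy hairpin the stem is entirely G-C pairs while the loop is A/U-rich, so the paired-region value is higher than the whole-sequence value.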
|
4
|
Liao Z, Kjellin J, Hoeppner MP, Grabherr M, Söderbom F. Global characterization of the Dicer-like protein DrnB roles in miRNA biogenesis in the social amoeba Dictyostelium discoideum. RNA Biol 2018; 15:937-954. [PMID: 29966484 PMCID: PMC6161686 DOI: 10.1080/15476286.2018.1481697] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.0] [Indexed: 12/02/2022] [Open access]
Abstract
Micro (mi)RNAs regulate gene expression in many eukaryotic organisms where they control diverse biological processes. Their biogenesis, from primary transcripts to mature miRNAs, has been extensively characterized in animals and plants, showing distinct differences between these phylogenetically distant groups of organisms. However, comparatively little is known about miRNA biogenesis in organisms whose evolutionary position is placed in between plants and animals and/or in unicellular organisms. Here, we investigate miRNA maturation in the unicellular amoeba Dictyostelium discoideum, belonging to Amoebozoa, which branched out after plants but before animals. High-throughput sequencing of small RNAs and poly(A)-selected RNAs demonstrated that the Dicer-like protein DrnB is required, and essentially specific, for global miRNA maturation in D. discoideum. Our RNA-seq data also showed that longer miRNA transcripts, generally preceded by a T-rich putative promoter motif, accumulate in a drnB knock-out strain. For two model miRNAs we defined the transcriptional start sites (TSSs) of primary (pri)-miRNAs and showed that they carry the RNA polymerase II specific m7G-cap. The generation of the 3′-ends of these pri-miRNAs differs, with pri-mir-1177 reading into the downstream gene, and pri-mir-1176 displaying a distinct end. This 3′-end is processed to shorter intermediates, stabilized in DrnB-depleted cells, of which some carry a short oligo(A)-tail. Furthermore, we identified 10 new miRNAs, all DrnB dependent and developmentally regulated. Thus, the miRNA machinery in D. discoideum shares features with both plants and animals, which is in agreement with its evolutionary position and perhaps also an adaptation to its complex lifestyle: unicellular growth and multicellular development.
Affiliation(s)
- Zhen Liao
  - Department of Cell and Molecular Biology, Uppsala University, Uppsala, Sweden
- Jonas Kjellin
  - Department of Cell and Molecular Biology, Uppsala University, Uppsala, Sweden
- Marc P Hoeppner
  - Department of Medical Biochemistry and Microbiology, Uppsala University, Uppsala, Sweden
  - Christian-Albrechts-University of Kiel, Institute of Clinical Molecular Biology, Kiel, Germany
- Manfred Grabherr
  - Department of Medical Biochemistry and Microbiology, Uppsala University, Uppsala, Sweden
- Fredrik Söderbom
  - Department of Cell and Molecular Biology, Uppsala University, Uppsala, Sweden
|
5
|
Abstract
Over the last two decades it has become clear that RNA is much more than just a boring intermediate in protein expression. Ancient RNAs still appear in the core information metabolism and comprise a surprisingly large component in bacterial gene regulation. A common theme with these types of mostly small RNAs is their reliance on conserved secondary structures. Large-scale sequencing projects, on the other hand, have profoundly changed our understanding of eukaryotic genomes. Pervasively transcribed, they give rise to a plethora of large and evolutionarily extremely flexible noncoding RNAs that exert a vastly diverse array of molecular functions. In this chapter we provide a (necessarily incomplete) overview of the current state of comparative analysis of noncoding RNAs, emphasizing computational approaches as a means to gain a global picture of the modern RNA world.
Affiliation(s)
- Rolf Backofen
  - Bioinformatics Group, Department of Computer Science, University of Freiburg, Georges-Köhler-Allee 106, D-79110 Freiburg, Germany
  - Center for non-coding RNA in Technology and Health, Department of Veterinary and Animal Sciences, University of Copenhagen, Grønnegårdsvej 3, DK-1870 Frederiksberg C, Denmark
- Jan Gorodkin
  - Center for non-coding RNA in Technology and Health, Department of Veterinary and Animal Sciences, University of Copenhagen, Grønnegårdsvej 3, DK-1870 Frederiksberg C, Denmark
- Ivo L Hofacker
  - Center for non-coding RNA in Technology and Health, Department of Veterinary and Animal Sciences, University of Copenhagen, Grønnegårdsvej 3, DK-1870 Frederiksberg C, Denmark
  - Institute for Theoretical Chemistry, University of Vienna, Währingerstraße 17, A-1090 Wien, Austria
  - Bioinformatics and Computational Biology Research Group, University of Vienna, Währingerstraße 17, A-1090 Vienna, Austria
- Peter F Stadler
  - Center for non-coding RNA in Technology and Health, Department of Veterinary and Animal Sciences, University of Copenhagen, Grønnegårdsvej 3, DK-1870 Frederiksberg C, Denmark
  - Institute for Theoretical Chemistry, University of Vienna, Währingerstraße 17, A-1090 Wien, Austria
  - Bioinformatics Group, Department of Computer Science, Interdisciplinary Center for Bioinformatics, University of Leipzig, Härtelstraße 16-18, D-04107 Leipzig, Germany
  - Max Planck Institute for Mathematics in the Sciences, Inselstraße 22, D-04103 Leipzig, Germany
  - Fraunhofer Institute for Cell Therapy and Immunology, Perlickstraße 1, D-04103 Leipzig, Germany
  - Santa Fe Institute, 1399 Hyde Park Rd, Santa Fe, NM 87501, USA
|
6
|
The Long Noncoding RNA Transcriptome of Dictyostelium discoideum Development. G3: Genes, Genomes, Genetics 2017; 7:387-398. [PMID: 27932387 PMCID: PMC5295588 DOI: 10.1534/g3.116.037150] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.0] [Indexed: 12/14/2022]
Abstract
Dictyostelium discoideum live in the soil as single cells, engulfing bacteria and growing vegetatively. Upon starvation, tens of thousands of amoebae enter a developmental program that includes aggregation, multicellular differentiation, and sporulation. Major shifts across the protein-coding transcriptome accompany these developmental changes. However, no study has presented a global survey of long noncoding RNAs (ncRNAs) in D. discoideum. To characterize the antisense and long intergenic noncoding RNA (lncRNA) transcriptome, we analyzed previously published developmental time course samples using an RNA-sequencing (RNA-seq) library preparation method that selectively depletes ribosomal RNAs (rRNAs). We detected the accumulation of transcripts for 9833 protein-coding messenger RNAs (mRNAs), 621 lncRNAs, and 162 putative antisense RNAs (asRNAs). The noncoding RNAs were interspersed throughout the genome, and were distinct in expression level, length, and nucleotide composition. The noncoding transcriptome displayed a temporal profile similar to the coding transcriptome, with stages of gradual change interspersed with larger leaps. The transcription profiles of some noncoding RNAs were strongly correlated with known differentially expressed coding RNAs, hinting at a functional role for these molecules during development. Examining the mitochondrial transcriptome, we modeled two novel antisense transcripts. We applied yet another ribosomal depletion method to a subset of the samples to better retain transfer RNA (tRNA) transcripts. We observed polymorphisms in tRNA anticodons that suggested a post-transcriptional means by which D. discoideum compensates for codons missing in the genomic complement of tRNAs. We concluded that the prevalence and characteristics of long ncRNAs indicate that these molecules are relevant to the progression of molecular and cellular phenotypes during development.
|
7
|
Siqueira FM, de Morais GL, Higashi S, Beier LS, Breyer GM, de Sá Godinho CP, Sagot MF, Schrank IS, Zaha A, de Vasconcelos ATR. Mycoplasma non-coding RNA: identification of small RNAs and targets. BMC Genomics 2016; 17:743. [PMID: 27801290 PMCID: PMC5088518 DOI: 10.1186/s12864-016-3061-z] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.7] [Indexed: 12/29/2022] [Open access]
Abstract
Background: Bacterial non-coding RNAs act by base-pairing as regulatory elements in crucial biological processes. We performed the identification of trans-encoded small RNAs (sRNA) from the genomes of Mycoplasma hyopneumoniae, Mycoplasma flocculare and Mycoplasma hyorhinis, which are Mycoplasma species that have been identified in the porcine respiratory system. Results: A total of 47, 15 and 11 putative sRNAs were predicted in M. hyopneumoniae, M. flocculare and M. hyorhinis, respectively. A comparative genomic analysis revealed the presence of species or lineage specific sRNA candidates. Furthermore, the expression profile of some M. hyopneumoniae sRNAs was determined by a reverse transcription amplification approach, in three different culture conditions. All tested sRNAs were transcribed in at least one condition. A detailed investigation revealed a differential expression profile for two M. hyopneumoniae sRNAs in response to oxidative and heat shock stress conditions, suggesting that their expression is influenced by environmental signals. Moreover, we analyzed sRNA-mRNA hybrids and accessed putative target genes for the novel sRNA candidates. The majority of the sRNAs showed interaction with multiple target genes, some of which could be linked to pathogenesis and cell homeostasis activity. Conclusion: This study contributes to our knowledge of Mycoplasma sRNAs and their response to environmental changes. Furthermore, the mRNA target prediction provides a perspective for the characterization and comprehension of the function of the sRNA regulatory mechanisms. Electronic supplementary material: The online version of this article (doi:10.1186/s12864-016-3061-z) contains supplementary material, which is available to authorized users.
Affiliation(s)
- Franciele Maboni Siqueira
  - Centro de Biotecnologia (CBiot), Universidade Federal do Rio Grande do Sul (UFRGS), Porto Alegre, Rio Grande do Sul, Brazil
- Guilherme Loss de Morais
  - Laboratório Nacional de Computação Científica (LNCC), Laboratório de Bioinformática (LABINFO), Petrópolis, Rio de Janeiro, Brazil
- Susan Higashi
  - Inria Grenoble Rhône-Alpes, 38330, Montbonnot Saint-Martin, France
  - Université Lyon 1, Villeurbanne, France
  - CNRS, UMR5558, Laboratoire de Biométrie et Biologie Évolutive, F-69622, Villeurbanne, France
- Laura Scherer Beier
  - Centro de Biotecnologia (CBiot), Universidade Federal do Rio Grande do Sul (UFRGS), Porto Alegre, Rio Grande do Sul, Brazil
- Gabriela Merker Breyer
  - Centro de Biotecnologia (CBiot), Universidade Federal do Rio Grande do Sul (UFRGS), Porto Alegre, Rio Grande do Sul, Brazil
- Caio Padoan de Sá Godinho
  - Laboratório Nacional de Computação Científica (LNCC), Laboratório de Bioinformática (LABINFO), Petrópolis, Rio de Janeiro, Brazil
- Marie-France Sagot
  - Inria Grenoble Rhône-Alpes, 38330, Montbonnot Saint-Martin, France
  - Université Lyon 1, Villeurbanne, France
  - CNRS, UMR5558, Laboratoire de Biométrie et Biologie Évolutive, F-69622, Villeurbanne, France
- Irene Silveira Schrank
  - Centro de Biotecnologia (CBiot), Universidade Federal do Rio Grande do Sul (UFRGS), Porto Alegre, Rio Grande do Sul, Brazil
- Arnaldo Zaha
  - Centro de Biotecnologia (CBiot), Universidade Federal do Rio Grande do Sul (UFRGS), Porto Alegre, Rio Grande do Sul, Brazil
|
8
|
Lertampaiporn S, Thammarongtham C, Nukoolkit C, Kaewkamnerdpong B, Ruengjitchatchawalya M. Identification of non-coding RNAs with a new composite feature in the Hybrid Random Forest Ensemble algorithm. Nucleic Acids Res 2014; 42:e93. [PMID: 24771344 PMCID: PMC4066759 DOI: 10.1093/nar/gku325] [Citation(s) in RCA: 31] [Impact Index Per Article: 2.8] [Received: 02/10/2014] [Revised: 04/02/2014] [Accepted: 04/07/2014] [Indexed: 12/13/2022] [Open access]
Abstract
To identify non-coding RNA (ncRNA) signals within genomic regions, a classification tool was developed based on a hybrid random forest (RF) with a logistic regression model to efficiently discriminate short ncRNA sequences as well as long complex ncRNA sequences. This RF-based classifier was trained on a well-balanced dataset with a discriminative set of features and achieved an accuracy, sensitivity and specificity of 92.11%, 90.7% and 93.5%, respectively. The selected feature set includes a new proposed feature, SCORE. This feature is generated based on a logistic regression function that combines five significant features (structure, sequence, modularity, structural robustness and coding potential) to enable improved characterization of long ncRNA (lncRNA) elements. The use of SCORE improved the performance of the RF-based classifier in the identification of Rfam lncRNA families. A genome-wide ncRNA classification framework was applied to a wide variety of organisms, with an emphasis on those of economic, social, public health, environmental and agricultural significance, such as various bacteria genomes, the Arthrospira (Spirulina) genome, and rice and human genomic regions. Our framework was able to identify known ncRNAs with sensitivities of greater than 90% and 77.7% for prokaryotic and eukaryotic sequences, respectively. Our classifier is available at http://ncrna-pred.com/HLRF.htm.
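As a rough illustration of two ideas in this abstract, the Python sketch below shows (a) how a logistic function can fold several numeric features into a single composite value between 0 and 1 and (b) the standard definitions of accuracy, sensitivity and specificity. The weights, bias and function names are hypothetical placeholders; the abstract does not give the fitted coefficients or feature encodings behind SCORE, so this is not the authors' implementation.

```python
import math


def composite_score(features, weights, bias=0.0):
    """Logistic combination of numeric features into one value in (0, 1).

    `weights` and `bias` are illustrative assumptions, not published values.
    """
    if len(features) != len(weights):
        raise ValueError("features and weights must have equal length")
    z = bias + sum(w * x for w, x in zip(weights, features))
    return 1.0 / (1.0 + math.exp(-z))


def classification_metrics(tp, fp, tn, fn):
    """Accuracy, sensitivity and specificity from confusion-matrix counts."""
    return {
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
        "sensitivity": tp / (tp + fn),  # true positive rate
        "specificity": tn / (tn + fp),  # true negative rate
    }


if __name__ == "__main__":
    # Five made-up feature values standing in for structure, sequence,
    # modularity, structural robustness and coding potential.
    print(composite_score([0.8, 0.1, 0.5, 0.3, -0.2], [1.0] * 5))
    print(classification_metrics(tp=90, fp=7, tn=93, fn=10))
```

A composite value of this kind can then be supplied to a downstream classifier as one additional feature, which is the role the abstract describes for SCORE.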
Affiliation(s)
- Supatcha Lertampaiporn
  - Biological Engineering Program, Faculty of Engineering, King Mongkut's University of Technology Thonburi, 126 Pracha Uthit Rd, Bangmod, Thung Khru, Bangkok 10140, Thailand
- Chinae Thammarongtham
  - Biochemical Engineering and Pilot Plant Research and Development Unit, National Center for Genetic Engineering and Biotechnology at King Mongkut's University of Technology Thonburi (Bang Khun Thian Campus), 49 Soi Thian Thale 25, Bang Khun Thian Chai Thale Rd, Tha Kham, Bangkok 10150, Thailand
- Chakarida Nukoolkit
  - School of Information Technology, King Mongkut's University of Technology Thonburi, 126 Pracha Uthit Rd, Bangmod, Thung Khru, Bangkok 10140, Thailand
- Boonserm Kaewkamnerdpong
  - Biological Engineering Program, Faculty of Engineering, King Mongkut's University of Technology Thonburi, 126 Pracha Uthit Rd, Bangmod, Thung Khru, Bangkok 10140, Thailand
- Marasri Ruengjitchatchawalya
  - Biotechnology Program, School of Bioresources and Technology, King Mongkut's University of Technology Thonburi (Bang Khun Thian Campus), 49 Soi Thian Thale 25, Bang Khun Thian Chai Thale Rd, Tha Kham, Bangkok 10150, Thailand
  - Bioinformatics and Systems Biology Program, King Mongkut's University of Technology Thonburi (Bang Khun Thian Campus), 49 Soi Thian Thale 25, Bang Khun Thian Chai Thale Rd, Tha Kham, Bangkok 10150, Thailand
|
9
|
Avesson L, Reimegård J, Wagner EGH, Söderbom F. MicroRNAs in Amoebozoa: deep sequencing of the small RNA population in the social amoeba Dictyostelium discoideum reveals developmentally regulated microRNAs. RNA (New York, N.Y.) 2012; 18:1771-1782. [PMID: 22875808 PMCID: PMC3446702 DOI: 10.1261/rna.033175.112] [Citation(s) in RCA: 37] [Impact Index Per Article: 2.8] [Received: 03/09/2012] [Accepted: 06/11/2012] [Indexed: 06/01/2023]
Abstract
The RNA interference machinery has served as a guardian of eukaryotic genomes since the divergence from prokaryotes. Although the basic components have a shared origin, silencing pathways directed by small RNAs have evolved in diverse directions in different eukaryotic lineages. Micro (mi)RNAs regulate protein-coding genes and play vital roles in plants and animals, but less is known about their functions in other organisms. Here, we report, for the first time, deep sequencing of small RNAs from the social amoeba Dictyostelium discoideum. RNA from growing single-cell amoebae as well as from two multicellular developmental stages was sequenced. Computational analyses combined with experimental data reveal the expression of miRNAs, several of them exhibiting distinct expression patterns during development. To our knowledge, this is the first report of miRNAs in the Amoebozoa supergroup. We also show that overexpressed miRNA precursors generate miRNAs and, in most cases, miRNA* sequences, whose biogenesis is dependent on the Dicer-like protein DrnB, further supporting the presence of miRNAs in D. discoideum. In addition, we find miRNAs processed from hairpin structures originating from an intron as well as from a class of repetitive elements. We believe that these repetitive elements are sources for newly evolved miRNAs.
Affiliation(s)
- Lotta Avesson
  - Department of Molecular Biology, Biomedical Center, Swedish University of Agricultural Sciences, S-75124 Uppsala, Sweden
- Johan Reimegård
  - Department of Cell and Molecular Biology, Biomedical Center, Uppsala University, S-75124 Uppsala, Sweden
- E. Gerhart H. Wagner
  - Department of Cell and Molecular Biology, Biomedical Center, Uppsala University, S-75124 Uppsala, Sweden
  - Science for Life Laboratory, SE-75124 Uppsala, Sweden
- Fredrik Söderbom
  - Department of Molecular Biology, Biomedical Center, Swedish University of Agricultural Sciences, S-75124 Uppsala, Sweden
  - Science for Life Laboratory, SE-75124 Uppsala, Sweden
|
10
|
Panneerselvam P, Bawankar P, Kulkarni S, Patankar S. In Silico Prediction of Evolutionarily Conserved GC-Rich Elements Associated with Antigenic Proteins of Plasmodium falciparum. Evol Bioinform Online 2011; 7:235-55. [PMID: 22375094 PMCID: PMC3283219 DOI: 10.4137/ebo.s8162] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.4] [Indexed: 12/02/2022] [Open access]
Abstract
The Plasmodium falciparum genome being AT-rich, the presence of GC-rich regions suggests functional significance. Evolution imposes selection pressure to retain functionally important coding and regulatory elements. Hence searching for evolutionarily conserved GC-rich, intergenic regions in an AT-rich genome will help in discovering new coding regions and regulatory elements. We have used elevated GC content in intergenic regions coupled with sequence conservation against P. reichenowi, which is evolutionarily closely related to P. falciparum, to identify potential sequences of functional importance. Interestingly, ~30% of the GC-rich, conserved sequences were associated with antigenic proteins encoded by var and rifin genes. The majority of sequences identified in the 5′ UTR of var genes are represented by short expressed sequence tags (ESTs) in cDNA libraries, signifying that they are transcribed in the parasite. Additionally, 19 sequences were located in the 3′ UTR of rifins and 4 also have overlapping ESTs. Further analysis showed that several sequences associated with var genes have the capacity to encode small peptides. A previous report has shown that upstream peptides can regulate the expression of var genes; hence, we propose that these conserved GC-rich sequences may play roles in regulation of gene expression.
Affiliation(s)
- Porkodi Panneerselvam
  - Centre for Biotechnology, Anna University, Sardar Patel Road, Guindy, Chennai 600025, India
|
11
|
Cros MJ, de Monte A, Mariette J, Bardou P, Grenier-Boley B, Gautheret D, Touzet H, Gaspin C. RNAspace.org: An integrated environment for the prediction, annotation, and analysis of ncRNA. RNA (New York, N.Y.) 2011; 17:1947-56. [PMID: 21947200 PMCID: PMC3198588 DOI: 10.1261/rna.2844911] [Citation(s) in RCA: 21] [Impact Index Per Article: 1.5] [Received: 05/30/2011] [Accepted: 08/07/2011] [Indexed: 05/22/2023]
Abstract
The annotation of noncoding RNA genes remains a major bottleneck in genome sequencing projects. Most genome sequences released today still come with sets of tRNAs and rRNAs as the only annotated RNA elements, ignoring hundreds of other RNA families. We have developed a web environment that is dedicated to noncoding RNA (ncRNA) prediction, annotation, and analysis and allows users to run a variety of tools in an integrated and flexible manner. This environment offers complementary ncRNA gene finders and a set of tools for the comparison, visualization, editing, and export of ncRNA candidates. Predictions can be filtered according to a large set of characteristics. Based on this environment, we created a public website located at http://RNAspace.org. It accepts genomic sequences up to 5 Mb, which permits online annotation of a complete bacterial genome or a small eukaryotic chromosome. The project is hosted as a SourceForge project (http://rnaspace.sourceforge.net/).
Affiliation(s)
- Antoine de Monte
  - LIFL, UMR CNRS 8022 Université Lille 1 and INRIA Lille Nord Europe, 59655 Villeneuve d'Ascq cedex, France
- Jérôme Mariette
  - INRA, Plateforme Bioinformatique, F-31320, UR 875, Castanet-Tolosan, France
- Benjamin Grenier-Boley
  - LIFL, UMR CNRS 8022 Université Lille 1 and INRIA Lille Nord Europe, 59655 Villeneuve d'Ascq cedex, France
- Hélène Touzet
  - LIFL, UMR CNRS 8022 Université Lille 1 and INRIA Lille Nord Europe, 59655 Villeneuve d'Ascq cedex, France
- Christine Gaspin
  - INRA, UBIA, UR 875, F-31320 Castanet-Tolosan, France
  - INRA, Plateforme Bioinformatique, F-31320, UR 875, Castanet-Tolosan, France
|
12
|
Abstract
Non-coding RNAs (ncRNAs) are receiving more and more attention not only as an abundant class of genes, but also as regulatory structural elements (some located in mRNAs). A key feature of RNA function is its structure. Computational methods were developed early for folding and prediction of RNA structure with the aim of assisting in functional analysis. With the discovery of more and more ncRNAs, it has become clear that a large fraction of these are highly structured. Interestingly, a large part of the structure is comprised of regular Watson-Crick and GU wobble base pairs. This and the increased amount of available genomes have made it possible to employ structure-based methods for genomic screens. The field has moved from folding prediction of single sequences to computational screens for ncRNAs in genomic sequence using the RNA structure as the main characteristic feature. Whereas early methods focused on energy-directed folding of single sequences, comparative analysis based on structure preserving changes of base pairs has been efficient in improving accuracy, and today this constitutes a key component in genomic screens. Here, we cover the basic principles of RNA folding and touch upon some of the concepts in current methods that have been applied in genomic screens for de novo RNA structures in searches for novel ncRNA genes and regulatory RNA structure on mRNAs. We discuss the strengths and weaknesses of the different strategies and how they can complement each other.
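To make the "basic principles of RNA folding" mentioned above concrete, here is a minimal Python sketch of the classic Nussinov-style dynamic program, which maximizes the number of Watson-Crick and GU wobble pairs in a single sequence. It is a textbook simplification offered for illustration only: the energy-directed and comparative methods the abstract refers to use thermodynamic parameters and alignments rather than a pair count, and the function name and `min_loop` default are assumptions made here.

```python
def max_base_pairs(seq: str, min_loop: int = 3) -> int:
    """Maximum number of nested base pairs (Nussinov-style recursion).

    Allowed pairs are Watson-Crick (A-U, G-C) and G-U wobble. `min_loop`
    is the minimum number of unpaired bases enclosed by any pair.
    """
    seq = seq.upper().replace("T", "U")
    n = len(seq)
    if n == 0:
        return 0
    allowed = {("A", "U"), ("U", "A"), ("G", "C"),
               ("C", "G"), ("G", "U"), ("U", "G")}
    # dp[i][j] holds the best pair count for the subsequence seq[i..j].
    dp = [[0] * n for _ in range(n)]
    for span in range(min_loop + 1, n):
        for i in range(n - span):
            j = i + span
            best = dp[i][j - 1]  # case 1: position j stays unpaired
            for k in range(i, j - min_loop):  # case 2: j pairs with k
                if (seq[k], seq[j]) in allowed:
                    left = dp[i][k - 1] if k > i else 0
                    best = max(best, left + 1 + dp[k + 1][j - 1])
            dp[i][j] = best
    return dp[0][n - 1]


if __name__ == "__main__":
    print(max_base_pairs("GGGAAAUCCC"))
```

The cubic running time of this recursion is one reason genome-wide screens typically fold sliding windows rather than whole chromosomes.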
|
13
|
Wong TKF, Lam TW, Sung WK, Yiu SM. Adjacent nucleotide dependence in ncRNA and order-1 SCFG for ncRNA identification. PLoS One 2010; 5. [PMID: 20927402 PMCID: PMC2946929 DOI: 10.1371/journal.pone.0012848] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.1] [Received: 03/24/2010] [Accepted: 08/25/2010] [Indexed: 12/31/2022] [Open access]
Abstract
Background Non-coding RNAs (ncRNAs) are known to be involved in many critical biological processes, and identification of ncRNAs is an important task in biological research. A popular software, Infernal, is the most successful prediction tool and exhibits high sensitivity. The application of Infernal has been mainly focused on small suspected regions. We tried to apply Infernal on a chromosome level; the results have high sensitivity, yet contain many false positives. Further enhancing Infernal for chromosome level or genome wide study is desirable. Methodology Based on the conjecture that adjacent nucleotide dependence affects the stability of the secondary structure of an ncRNA, we first conduct a systematic study on human ncRNAs and find that adjacent nucleotide dependence in human ncRNA should be useful for identifying ncRNAs. We then incorporate this dependence in the SCFG model and develop a new order-1 SCFG model for identifying ncRNAs. Conclusions With respect to our experiments on human chromosomes, the proposed new model can eliminate more than 50% false positives reported by Infernal while maintaining the same sensitivity. The executable and the source code of programs are freely available at http://i.cs.hku.hk/~kfwong/order1scfg.
Affiliation(s)
- Thomas K F Wong
- Department of Computer Science, The University of Hong Kong, Hong Kong, Special Administrative Region, People's Republic of China.
14
Gorodkin J, Hofacker IL, Torarinsson E, Yao Z, Havgaard JH, Ruzzo WL. De novo prediction of structured RNAs from genomic sequences. Trends Biotechnol 2009; 28:9-19. [PMID: 19942311 DOI: 10.1016/j.tibtech.2009.09.006] [Citation(s) in RCA: 50] [Impact Index Per Article: 3.1] [Received: 05/29/2009] [Revised: 08/31/2009] [Accepted: 09/22/2009] [Indexed: 12/29/2022]
Abstract
Growing recognition of the numerous, diverse and important roles played by non-coding RNA in all organisms motivates better elucidation of these cellular components. Comparative genomics is a powerful tool for this task and is arguably preferable to any high-throughput experimental technology currently available, because evolutionary conservation highlights functionally important regions. Conserved secondary structure, rather than primary sequence, is the hallmark of many functionally important RNAs, because compensatory substitutions in base-paired regions preserve structure. Unfortunately, such substitutions also obscure sequence identity and confound alignment algorithms, which complicates analysis greatly. This paper surveys recent computational advances in this difficult arena, which have enabled genome-scale prediction of cross-species conserved RNA elements. These predictions suggest that a wealth of these elements indeed exist.
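The idea that compensatory substitutions in base-paired regions preserve structure while changing sequence can be illustrated with a small sketch. This is not the method of any tool surveyed in the paper. It assumes two already-aligned, ungapped RNA sequences and a shared dot-bracket consensus structure; the function names `paired_columns` and `classify_pairs` are hypothetical.

```python
# Canonical pairs: Watson-Crick plus the G-U wobble.
PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}


def paired_columns(structure):
    """Return (i, j) index pairs encoded by a balanced dot-bracket string."""
    stack, pairs = [], []
    for i, ch in enumerate(structure):
        if ch == "(":
            stack.append(i)
        elif ch == ")":
            pairs.append((stack.pop(), i))
    return pairs


def classify_pairs(seq_a, seq_b, structure):
    """Classify each paired column of two aligned sequences (illustrative only).

    identical    : same base pair in both sequences
    compensatory : both bases differ, pairing is kept (e.g. G-C -> A-U)
    consistent   : one base differs, pairing is kept (e.g. A-U -> G-U)
    disrupted    : at least one sequence cannot form a canonical pair
    """
    counts = {"identical": 0, "compensatory": 0, "consistent": 0, "disrupted": 0}
    for i, j in paired_columns(structure):
        a = (seq_a[i], seq_a[j])
        b = (seq_b[i], seq_b[j])
        if a not in PAIRS or b not in PAIRS:
            counts["disrupted"] += 1
        elif a == b:
            counts["identical"] += 1
        elif a[0] != b[0] and a[1] != b[1]:
            counts["compensatory"] += 1
        else:
            counts["consistent"] += 1
    return counts


if __name__ == "__main__":
    # Toy hairpin: four stacked pairs around a three-base loop.
    print(classify_pairs("GGCAGAAUGCC", "GACGGAAUGUC", "((((...))))"))
```

In the toy hairpin, one stem position changes from G-C to A-U. The two sequences differ at both columns there, yet the pair is maintained. This is the kind of signal that supports a conserved structure and, at the same time, lowers sequence identity for alignment.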
Affiliation(s)
- Jan Gorodkin
- Section for Genetics and Bioinformatics, IBHV and Center for Applied Bioinformatics, University of Copenhagen, Grønnegårdsvej 3, DK-1870 Frederiksberg C, Denmark.
15
Tran TT, Zhou F, Marshburn S, Stead M, Kushner SR, Xu Y. De novo computational prediction of non-coding RNA genes in prokaryotic genomes. Bioinformatics 2009; 25:2897-905. [PMID: 19744996 PMCID: PMC2773258 DOI: 10.1093/bioinformatics/btp537] [Citation(s) in RCA: 32] [Impact Index Per Article: 2.0] [Indexed: 12/21/2022]
Abstract
Motivation: The computational identification of non-coding RNA (ncRNA) genes represents one of the most important and challenging problems in computational biology. Existing methods for ncRNA gene prediction rely mostly on homology information, thus limiting their applications to ncRNA genes with known homologues. Results: We present a novel de novo prediction algorithm for ncRNA genes using features derived from the sequences and structures of known ncRNA genes in comparison to decoys. Using these features, we have trained a neural network-based classifier and have applied it to Escherichia coli and Sulfolobus solfataricus for genome-wide prediction of ncRNAs. Our method has an average prediction sensitivity and specificity of 68% and 70%, respectively, for identifying windows with potential for ncRNA genes in E. coli. By combining windows of different sizes and using positional filtering strategies, we predicted 601 candidate ncRNAs and recovered 41% of known ncRNAs in E. coli. We experimentally investigated six novel candidates using Northern blot analysis and found expression of three candidates: one represents a potential new ncRNA, one is associated with stable mRNA decay intermediates and one is a case of either a potential riboswitch or transcription attenuator involved in the regulation of cell division. In general, our approach enables the identification of both cis- and trans-acting ncRNAs in partially or completely sequenced microbial genomes without requiring homology or structural conservation. Availability: The source code and results are available at http://csbl.bmb.uga.edu/publications/materials/tran/. Contact: xyn@bmb.uga.edu. Supplementary information: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Thao T Tran
- School of Electrical and Computer Engineering, Georgia Institute of Technology, Atlanta, GA, USA
16
Meyer MM, Ames TD, Smith DP, Weinberg Z, Schwalbach MS, Giovannoni SJ, Breaker RR. Identification of candidate structured RNAs in the marine organism 'Candidatus Pelagibacter ubique'. BMC Genomics 2009; 10:268. [PMID: 19531245 PMCID: PMC2704228 DOI: 10.1186/1471-2164-10-268] [Citation(s) in RCA: 52] [Impact Index Per Article: 3.3] [Received: 01/06/2009] [Accepted: 06/16/2009] [Indexed: 02/04/2023] Open
Abstract
Background: Metagenomic sequence data are proving to be a vast resource for the discovery of biological components. Yet analysis of this data to identify functional RNAs lags behind efforts to characterize protein diversity. The genome of 'Candidatus Pelagibacter ubique' HTCC 1062 is the closest match for approximately 20% of marine metagenomic sequence reads. It is also small, contains little non-coding DNA, and has strikingly low GC content. Results: To aid the discovery of RNA motifs within the marine metagenome we exploited the genomic properties of 'Cand. P. ubique' by targeting our search to long intergenic regions (IGRs) with relatively high GC content. Analysis of known RNAs (rRNA, tRNA, riboswitches etc.) shows that structured RNAs are significantly enriched in such IGRs. To identify additional candidate structured RNAs, we examined other IGRs with similar characteristics from 'Cand. P. ubique' using comparative genomics approaches in conjunction with marine metagenomic data. Employing this strategy, we discovered four candidate structured RNAs including a new riboswitch class as well as three additional likely cis-regulatory elements that precede genes encoding ribosomal proteins S2 and S12, and the cytoplasmic protein component of the signal recognition particle. We also describe four additional potential RNA motifs with few or no examples occurring outside the metagenomic data. Conclusion: This work begins the process of identifying functional RNA motifs present in the metagenomic data and illustrates how existing completed genomes may be used to aid in this task.
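The screening step described in this abstract, targeting long intergenic regions with relatively high GC content, might be sketched roughly as follows. This is only an illustration, not the authors' pipeline. It assumes a linear genome string and gene coordinates given as half-open `(start, end)` intervals. The function names and the `min_length` and `min_gc` defaults are hypothetical, since the abstract does not state the cutoffs used.

```python
def gc_content(seq):
    """Fraction of G and C bases in a sequence (0.0 for an empty string)."""
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq) if seq else 0.0


def intergenic_regions(genome_length, genes):
    """Yield half-open (start, end) gaps between genes on a linear genome.

    `genes` is an iterable of (start, end) intervals; overlapping genes are
    merged implicitly by tracking the furthest end seen so far.
    """
    cursor = 0
    for start, end in sorted(genes):
        if start > cursor:
            yield cursor, start
        cursor = max(cursor, end)
    if cursor < genome_length:
        yield cursor, genome_length


def candidate_igrs(genome, genes, min_length=100, min_gc=0.35):
    """Return (start, end, gc) for IGRs passing both hypothetical cutoffs."""
    hits = []
    for start, end in intergenic_regions(len(genome), genes):
        region = genome[start:end]
        gc = gc_content(region)
        if len(region) >= min_length and gc >= min_gc:
            hits.append((start, end, round(gc, 3)))
    return hits


if __name__ == "__main__":
    # Toy genome: two "genes" (T runs) flanking one GC-rich intergenic stretch.
    genome = "A" * 10 + "T" * 5 + "GCGCGCATAT" + "T" * 5 + "A" * 4
    genes = [(10, 15), (25, 30)]
    print(candidate_igrs(genome, genes, min_length=8, min_gc=0.5))
```

For a genome with strikingly low overall GC content, a cutoff defined relative to the genome-wide average would probably be more appropriate than a fixed value. The fixed default here is only a placeholder.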
Affiliation(s)
- Michelle M Meyer
- Department of Molecular Cellular and Developmental Biology, Yale University, New Haven, CT 06520, USA.