1
|
González-Tortuero E, Anthon C, Havgaard JH, Geissler AS, Breüner A, Hjort C, Gorodkin J, Seemann SE. The Bacillaceae-1 RNA motif comprises two distinct classes. Gene 2022; 841:146756. [PMID: 35905857 DOI: 10.1016/j.gene.2022.146756] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [What about the content of this article? (0)] [Affiliation(s)] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/16/2022] [Revised: 06/10/2022] [Accepted: 07/24/2022] [Indexed: 11/04/2022]
Abstract
Non-coding RNAs are key regulatory players in bacteria. Many computationally predicted non-coding RNAs, however, lack functional associations. An example is the Bacillaceae-1 RNA motif, whose Rfam model consists of two hairpin loops. We find the motif conserved in nine of 13 non-pathogenic strains of the genus Bacillus but only in one pathogenic strain. To elucidate functional characteristics, we studied 118 hits of the Rfam model in 11 Bacillus spp. and found two distinct classes based on the ensemble diversity of their RNA secondary structure and the genomic context concerning the ribosomal RNA (rRNA) cluster. Forty hits are associated with the rRNA cluster, of which all 19 hits upstream flanking of 16S rRNA have a reverse complementary structure of low structural diversity. Fifty-two hits have large ensemble diversity, of which 38 are located between two coding genes. For eight hits in Bacillus subtilis, we investigated public expression data under various conditions and observed either the forward or the reverse complementary motif expressed. Five hits are associated with the rRNA cluster. Four of them are located upstream of the 16S rRNA and are not transcriptionally active, but instead, their reverse complements with low structural diversity are expressed together with the rRNA cluster. The three other hits are located between two coding genes in non-conserved genomic loci. Two of them are independently expressed from their surrounding genes and are structurally diverse. In summary, we found that Bacillaceae-1 RNA motifs upstream flanking of ribosomal RNA clusters tend to have one stable structure with the reverse complementary motif expressed in B. subtilis. In contrast, a subgroup of intergenic motifs has the thermodynamic potential for structural switches.
Collapse
Affiliation(s)
- Enrique González-Tortuero
- Center for non-coding RNA in Technology and Health (RTH), Department of Veterinary and Animal Sciences, Faculty of Health and Medical Sciences, University of Copenhagen, Frederiksberg, Denmark
| | - Christian Anthon
- Center for non-coding RNA in Technology and Health (RTH), Department of Veterinary and Animal Sciences, Faculty of Health and Medical Sciences, University of Copenhagen, Frederiksberg, Denmark
| | - Jakob H Havgaard
- Center for non-coding RNA in Technology and Health (RTH), Department of Veterinary and Animal Sciences, Faculty of Health and Medical Sciences, University of Copenhagen, Frederiksberg, Denmark
| | - Adrian S Geissler
- Center for non-coding RNA in Technology and Health (RTH), Department of Veterinary and Animal Sciences, Faculty of Health and Medical Sciences, University of Copenhagen, Frederiksberg, Denmark
| | | | | | - Jan Gorodkin
- Center for non-coding RNA in Technology and Health (RTH), Department of Veterinary and Animal Sciences, Faculty of Health and Medical Sciences, University of Copenhagen, Frederiksberg, Denmark.
| | - Stefan E Seemann
- Center for non-coding RNA in Technology and Health (RTH), Department of Veterinary and Animal Sciences, Faculty of Health and Medical Sciences, University of Copenhagen, Frederiksberg, Denmark.
| |
Collapse
|
2
|
Jacobsen MJ, Havgaard JH, Anthon C, Mentzel CMJ, Cirera S, Krogh PM, Pundhir S, Karlskov-Mortensen P, Bruun CS, Lesnik P, Guerin M, Gorodkin J, Jørgensen CB, Fredholm M, Barrès R. Epigenetic and Transcriptomic Characterization of Pure Adipocyte Fractions From Obese Pigs Identifies Candidate Pathways Controlling Metabolism. Front Genet 2019; 10:1268. [PMID: 31921306 PMCID: PMC6927937 DOI: 10.3389/fgene.2019.01268] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.6] [Reference Citation Analysis] [What about the content of this article? (0)] [Affiliation(s)] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/19/2019] [Accepted: 11/18/2019] [Indexed: 12/11/2022] Open
Abstract
Reprogramming of adipocyte function in obesity is implicated in metabolic disorders like type 2 diabetes. Here, we used the pig, an animal model sharing many physiological and pathophysiological similarities with humans, to perform in-depth epigenomic and transcriptomic characterization of pure adipocyte fractions. Using a combined DNA methylation capture sequencing and Reduced Representation bisulfite sequencing (RRBS) strategy in 11 lean and 12 obese pigs, we identified in 3529 differentially methylated regions (DMRs) located at close proximity to-, or within genes in the adipocytes. By sequencing of the transcriptome from the same fraction of isolated adipocytes, we identified 276 differentially expressed transcripts with at least one or more DMR. These transcripts were over-represented in gene pathways related to MAPK, metabolic and insulin signaling. Using a candidate gene approach, we further characterized 13 genes potentially regulated by DNA methylation and identified putative transcription factor binding sites that could be affected by the differential methylation in obesity. Our data constitute a valuable resource for further investigations aiming to delineate the epigenetic etiology of metabolic disorders.
Collapse
Affiliation(s)
- Mette Juul Jacobsen
- Department of Veterinary and Animal Sciences, Faculty of Health and Medical Sciences, University of Copenhagen, Copenhagen, Denmark
| | - Jakob H Havgaard
- Department of Veterinary and Animal Sciences, Faculty of Health and Medical Sciences, University of Copenhagen, Copenhagen, Denmark
| | - Christian Anthon
- Department of Veterinary and Animal Sciences, Faculty of Health and Medical Sciences, University of Copenhagen, Copenhagen, Denmark
| | - Caroline M Junker Mentzel
- Department of Veterinary and Animal Sciences, Faculty of Health and Medical Sciences, University of Copenhagen, Copenhagen, Denmark
| | - Susanna Cirera
- Department of Veterinary and Animal Sciences, Faculty of Health and Medical Sciences, University of Copenhagen, Copenhagen, Denmark
| | - Poula Maltha Krogh
- Department of Veterinary and Animal Sciences, Faculty of Health and Medical Sciences, University of Copenhagen, Copenhagen, Denmark
| | - Sachin Pundhir
- Department of Veterinary and Animal Sciences, Faculty of Health and Medical Sciences, University of Copenhagen, Copenhagen, Denmark
| | - Peter Karlskov-Mortensen
- Department of Veterinary and Animal Sciences, Faculty of Health and Medical Sciences, University of Copenhagen, Copenhagen, Denmark
| | - Camilla S Bruun
- Department of Veterinary and Animal Sciences, Faculty of Health and Medical Sciences, University of Copenhagen, Copenhagen, Denmark
| | - Philippe Lesnik
- Institute of Cardiometabolism and Nutrition (ICAN), Pierre and Marie Curie University, Pitié-Salpetrière Hospital, Paris, France
| | - Maryse Guerin
- Institute of Cardiometabolism and Nutrition (ICAN), Pierre and Marie Curie University, Pitié-Salpetrière Hospital, Paris, France
| | - Jan Gorodkin
- Department of Veterinary and Animal Sciences, Faculty of Health and Medical Sciences, University of Copenhagen, Copenhagen, Denmark
| | - Claus B Jørgensen
- Department of Veterinary and Animal Sciences, Faculty of Health and Medical Sciences, University of Copenhagen, Copenhagen, Denmark
| | - Merete Fredholm
- Department of Veterinary and Animal Sciences, Faculty of Health and Medical Sciences, University of Copenhagen, Copenhagen, Denmark
| | - Romain Barrès
- Novo Nordisk Foundation Centre for Basic Metabolic Research, Faculty of Health and Medical Sciences, University of Copenhagen, Copenhagen, Denmark
| |
Collapse
|
3
|
Sundfeld D, Havgaard JH, de Melo ACMA, Gorodkin J. Foldalign 2.5: multithreaded implementation for pairwise structural RNA alignment. Bioinformatics 2015; 32:1238-40. [PMID: 26704597 PMCID: PMC4824132 DOI: 10.1093/bioinformatics/btv748] [Citation(s) in RCA: 24] [Impact Index Per Article: 2.7] [Reference Citation Analysis] [What about the content of this article? (0)] [Affiliation(s)] [Abstract] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/23/2015] [Accepted: 12/16/2015] [Indexed: 11/13/2022] Open
Abstract
Motivation: Structured RNAs can be hard to search for as they often are not well conserved in their primary structure and are local in their genomic or transcriptomic context. Thus, the need for tools which in particular can make local structural alignments of RNAs is only increasing. Results: To meet the demand for both large-scale screens and hands on analysis through web servers, we present a new multithreaded version of Foldalign. We substantially improve execution time while maintaining all previous functionalities, including carrying out local structural alignments of sequences with low similarity. Furthermore, the improvements allow for comparing longer RNAs and increasing the sequence length. For example, lengths in the range 2000–6000 nucleotides improve execution up to a factor of five. Availability and implementation: The Foldalign software and the web server are available at http://rth.dk/resources/foldalign Contact:gorodkin@rth.dk Supplementary information:Supplementary data are available at Bioinformatics online.
Collapse
Affiliation(s)
- Daniel Sundfeld
- Center for Non-Coding RNA in Technology and Health, IKVH, University of Copenhagen, Frederiksberg, Denmark and Department of Computer Science, University of Brasilia, Brasília, DF, Brazil
| | - Jakob H Havgaard
- Center for Non-Coding RNA in Technology and Health, IKVH, University of Copenhagen, Frederiksberg, Denmark and
| | - Alba C M A de Melo
- Department of Computer Science, University of Brasilia, Brasília, DF, Brazil
| | - Jan Gorodkin
- Center for Non-Coding RNA in Technology and Health, IKVH, University of Copenhagen, Frederiksberg, Denmark and
| |
Collapse
|
4
|
Hecker N, Christensen-Dalsgaard M, Seemann SE, Havgaard JH, Stadler PF, Hofacker IL, Nielsen H, Gorodkin J. Optimizing RNA structures by sequence extensions using RNAcop. Nucleic Acids Res 2015; 43:8135-45. [PMID: 26283181 PMCID: PMC4787817 DOI: 10.1093/nar/gkv813] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.6] [Reference Citation Analysis] [What about the content of this article? (0)] [Affiliation(s)] [Abstract] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/17/2015] [Revised: 07/28/2015] [Accepted: 07/30/2015] [Indexed: 12/26/2022] Open
Abstract
A key aspect of RNA secondary structure prediction is the identification of novel functional elements. This is a challenging task because these elements typically are embedded in longer transcripts where the borders between the element and flanking regions have to be defined. The flanking sequences impact the folding of the functional elements both at the level of computational analyses and when the element is extracted as a transcript for experimental analysis. Here, we analyze how different flanking region lengths impact folding into a constrained structure by computing probabilities of folding for different sizes of flanking regions. Our method, RNAcop (RNA context optimization by probability), is tested on known and de novo predicted structures. In vitro experiments support the computational analysis and suggest that for a number of structures, choosing proper lengths of flanking regions is critical. RNAcop is available as web server and stand-alone software via http://rth.dk/resources/rnacop.
Collapse
Affiliation(s)
- Nikolai Hecker
- Center for non-coding RNA in Technology and Health, University of Copenhagen, Grønnegårdsvej 3, 1870 Frederiksberg C, Denmark Department of Veterinary Clinical and Animal Science, University of Copenhagen, Grønnegårdsvej 3, 1870 Frederiksberg C, Denmark
| | - Mikkel Christensen-Dalsgaard
- Center for non-coding RNA in Technology and Health, University of Copenhagen, Grønnegårdsvej 3, 1870 Frederiksberg C, Denmark Department of Cellular and Molecular Medicine, Panum Institute, University of Copenhagen, Bledgamsvej 3, 2200 Copenhagen N, Denmark
| | - Stefan E Seemann
- Center for non-coding RNA in Technology and Health, University of Copenhagen, Grønnegårdsvej 3, 1870 Frederiksberg C, Denmark Department of Veterinary Clinical and Animal Science, University of Copenhagen, Grønnegårdsvej 3, 1870 Frederiksberg C, Denmark
| | - Jakob H Havgaard
- Center for non-coding RNA in Technology and Health, University of Copenhagen, Grønnegårdsvej 3, 1870 Frederiksberg C, Denmark Department of Veterinary Clinical and Animal Science, University of Copenhagen, Grønnegårdsvej 3, 1870 Frederiksberg C, Denmark
| | - Peter F Stadler
- Center for non-coding RNA in Technology and Health, University of Copenhagen, Grønnegårdsvej 3, 1870 Frederiksberg C, Denmark Bioinformatics Group, Department of Computer Science & IZBI-Interdisciplinary Center for Bioinformatics & LIFE-Leipzig Research Center for Civilization Diseases, University Leipzig, Härtelstraße 16-18, 04107 Leipzig, Germany Institute for Theoretical Chemistry, University of Vienna, Währingerstraße 17, 1090 Wien, Austria Max Planck Institute for Mathematics in the Sciences, Inselstraße 22, 04103 Leipzig, Germany Santa Fe Institute, 1399 Hyde Park Road, Santa Fe, NM 87501, USA
| | - Ivo L Hofacker
- Center for non-coding RNA in Technology and Health, University of Copenhagen, Grønnegårdsvej 3, 1870 Frederiksberg C, Denmark Institute for Theoretical Chemistry, University of Vienna, Währingerstraße 17, 1090 Wien, Austria
| | - Henrik Nielsen
- Center for non-coding RNA in Technology and Health, University of Copenhagen, Grønnegårdsvej 3, 1870 Frederiksberg C, Denmark Department of Cellular and Molecular Medicine, Panum Institute, University of Copenhagen, Bledgamsvej 3, 2200 Copenhagen N, Denmark
| | - Jan Gorodkin
- Center for non-coding RNA in Technology and Health, University of Copenhagen, Grønnegårdsvej 3, 1870 Frederiksberg C, Denmark Department of Veterinary Clinical and Animal Science, University of Copenhagen, Grønnegårdsvej 3, 1870 Frederiksberg C, Denmark
| |
Collapse
|
5
|
Anthon C, Tafer H, Havgaard JH, Thomsen B, Hedegaard J, Seemann SE, Pundhir S, Kehr S, Bartschat S, Nielsen M, Nielsen RO, Fredholm M, Stadler PF, Gorodkin J. Structured RNAs and synteny regions in the pig genome. BMC Genomics 2014; 15:459. [PMID: 24917120 PMCID: PMC4124155 DOI: 10.1186/1471-2164-15-459] [Citation(s) in RCA: 18] [Impact Index Per Article: 1.8] [Reference Citation Analysis] [What about the content of this article? (0)] [Affiliation(s)] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/28/2013] [Accepted: 05/02/2014] [Indexed: 11/25/2022] Open
Abstract
Background Annotating mammalian genomes for noncoding RNAs (ncRNAs) is nontrivial since far from all ncRNAs are known and the computational models are resource demanding. Currently, the human genome holds the best mammalian ncRNA annotation, a result of numerous efforts by several groups. However, a more direct strategy is desired for the increasing number of sequenced mammalian genomes of which some, such as the pig, are relevant as disease models and production animals. Results We present a comprehensive annotation of structured RNAs in the pig genome. Combining sequence and structure similarity search as well as class specific methods, we obtained a conservative set with a total of 3,391 structured RNA loci of which 1,011 and 2,314, respectively, hold strong sequence and structure similarity to structured RNAs in existing databases. The RNA loci cover 139 cis-regulatory element loci, 58 lncRNA loci, 11 conflicts of annotation, and 3,183 ncRNA genes. The ncRNA genes comprise 359 miRNAs, 8 ribozymes, 185 rRNAs, 638 snoRNAs, 1,030 snRNAs, 810 tRNAs and 153 ncRNA genes not belonging to the here fore mentioned classes. When running the pipeline on a local shuffled version of the genome, we obtained no matches at the highest confidence level. Additional analysis of RNA-seq data from a pooled library from 10 different pig tissues added another 165 miRNA loci, yielding an overall annotation of 3,556 structured RNA loci. This annotation represents our best effort at making an automated annotation. To further enhance the reliability, 571 of the 3,556 structured RNAs were manually curated by methods depending on the RNA class while 1,581 were declared as pseudogenes. We further created a multiple alignment of pig against 20 representative vertebrates, from which RNAz predicted 83,859 de novo RNA loci with conserved RNA structures. 528 of the RNAz predictions overlapped with the homology based annotation or novel miRNAs. We further present a substantial synteny analysis which includes 1,004 lineage specific de novo RNA loci and 4 ncRNA loci in the known annotation specific for Laurasiatheria (pig, cow, dolphin, horse, cat, dog, hedgehog). Conclusions We have obtained one of the most comprehensive annotations for structured ncRNAs of a mammalian genome, which is likely to play central roles in both health modelling and production. The core annotation is available in Ensembl 70 and the complete annotation is available at
http://rth.dk/resources/rnannotator/susscr102/version1.02. Electronic supplementary material The online version of this article (doi:10.1186/1471-2164-15-459) contains supplementary material, which is available to authorized users.
Collapse
Affiliation(s)
| | | | | | | | | | | | | | | | | | | | | | | | | | - Jan Gorodkin
- Center for non-coding RNA in Technology and Health, University of Copenhagen, DK-1870 Frederiksberg, Denmark.
| |
Collapse
|
6
|
Gorodkin J, Hofacker IL, Torarinsson E, Yao Z, Havgaard JH, Ruzzo WL. De novo prediction of structured RNAs from genomic sequences. Trends Biotechnol 2009; 28:9-19. [PMID: 19942311 DOI: 10.1016/j.tibtech.2009.09.006] [Citation(s) in RCA: 50] [Impact Index Per Article: 3.3] [Reference Citation Analysis] [What about the content of this article? (0)] [Affiliation(s)] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/29/2009] [Revised: 08/31/2009] [Accepted: 09/22/2009] [Indexed: 12/29/2022]
Abstract
Growing recognition of the numerous, diverse and important roles played by non-coding RNA in all organisms motivates better elucidation of these cellular components. Comparative genomics is a powerful tool for this task and is arguably preferable to any high-throughput experimental technology currently available, because evolutionary conservation highlights functionally important regions. Conserved secondary structure, rather than primary sequence, is the hallmark of many functionally important RNAs, because compensatory substitutions in base-paired regions preserve structure. Unfortunately, such substitutions also obscure sequence identity and confound alignment algorithms, which complicates analysis greatly. This paper surveys recent computational advances in this difficult arena, which have enabled genome-scale prediction of cross-species conserved RNA elements. These predictions suggest that a wealth of these elements indeed exist.
Collapse
Affiliation(s)
- Jan Gorodkin
- Section for Genetics and Bioinformatics, IBHV and Center for Applied Bioinformatics, University of Copenhagen, Grønnegårdsvej 3, DK-1870 Frederiksberg C, Denmark.
| | | | | | | | | | | |
Collapse
|
7
|
Gorodkin J, Cirera S, Hedegaard J, Gilchrist MJ, Panitz F, Jørgensen C, Scheibye-Knudsen K, Arvin T, Lumholdt S, Sawera M, Green T, Nielsen BJ, Havgaard JH, Rosenkilde C, Wang J, Li H, Li R, Liu B, Hu S, Dong W, Li W, Yu J, Wang J, Stærfeldt HH, Wernersson R, Madsen LB, Thomsen B, Hornshøj H, Bujie Z, Wang X, Wang X, Bolund L, Brunak S, Yang H, Bendixen C, Fredholm M. Porcine transcriptome analysis based on 97 non-normalized cDNA libraries and assembly of 1,021,891 expressed sequence tags. Genome Biol 2007; 8:R45. [PMID: 17407547 PMCID: PMC1895994 DOI: 10.1186/gb-2007-8-4-r45] [Citation(s) in RCA: 62] [Impact Index Per Article: 3.6] [Reference Citation Analysis] [What about the content of this article? (0)] [Affiliation(s)] [Abstract] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/08/2006] [Revised: 01/18/2007] [Accepted: 04/02/2007] [Indexed: 12/05/2022] Open
Abstract
A resource consisting of one million porcine ESTs is described, providing an essential resource for annotation, comparative genomics, assembly of the pig genome sequence, and further porcine transcription studies. Background Knowledge of the structure of gene expression is essential for mammalian transcriptomics research. We analyzed a collection of more than one million porcine expressed sequence tags (ESTs), of which two-thirds were generated in the Sino-Danish Pig Genome Project and one-third are from public databases. The Sino-Danish ESTs were generated from one normalized and 97 non-normalized cDNA libraries representing 35 different tissues and three developmental stages. Results Using the Distiller package, the ESTs were assembled to roughly 48,000 contigs and 73,000 singletons, of which approximately 25% have a high confidence match to UniProt. Approximately 6,000 new porcine gene clusters were identified. Expression analysis based on the non-normalized libraries resulted in the following findings. The distribution of cluster sizes is scaling invariant. Brain and testes are among the tissues with the greatest number of different expressed genes, whereas tissues with more specialized function, such as developing liver, have fewer expressed genes. There are at least 65 high confidence housekeeping gene candidates and 876 cDNA library-specific gene candidates. We identified differential expression of genes between different tissues, in particular brain/spinal cord, and found patterns of correlation between genes that share expression in pairs of libraries. Finally, there was remarkable agreement in expression between specialized tissues according to Gene Ontology categories. Conclusion This EST collection, the largest to date in pig, represents an essential resource for annotation, comparative genomics, assembly of the pig genome sequence, and further porcine transcription studies.
Collapse
Affiliation(s)
- Jan Gorodkin
- Division of Genetics and Bioinformatics, IBHV, Grønnegärdsvej 3, The Royal Veterinary and Agricultural University, DK-1870 Frederiksberg C, Denmark
| | - Susanna Cirera
- Division of Genetics and Bioinformatics, IBHV, Grønnegärdsvej 3, The Royal Veterinary and Agricultural University, DK-1870 Frederiksberg C, Denmark
| | - Jakob Hedegaard
- Department of Genetics and Biotechnology, Danish Institute of Agricultural Sciences, Blichers Alle, DK-8830 Tjele, Denmark
| | - Michael J Gilchrist
- The Wellcome Trust/Cancer Research UK Gurdon Institute, Cambridge, CB2 1QN, UK
| | - Frank Panitz
- Department of Genetics and Biotechnology, Danish Institute of Agricultural Sciences, Blichers Alle, DK-8830 Tjele, Denmark
| | - Claus Jørgensen
- Division of Genetics and Bioinformatics, IBHV, Grønnegärdsvej 3, The Royal Veterinary and Agricultural University, DK-1870 Frederiksberg C, Denmark
| | - Karsten Scheibye-Knudsen
- Division of Genetics and Bioinformatics, IBHV, Grønnegärdsvej 3, The Royal Veterinary and Agricultural University, DK-1870 Frederiksberg C, Denmark
| | - Troels Arvin
- Division of Genetics and Bioinformatics, IBHV, Grønnegärdsvej 3, The Royal Veterinary and Agricultural University, DK-1870 Frederiksberg C, Denmark
| | - Steen Lumholdt
- Division of Genetics and Bioinformatics, IBHV, Grønnegärdsvej 3, The Royal Veterinary and Agricultural University, DK-1870 Frederiksberg C, Denmark
| | - Milena Sawera
- Division of Genetics and Bioinformatics, IBHV, Grønnegärdsvej 3, The Royal Veterinary and Agricultural University, DK-1870 Frederiksberg C, Denmark
| | - Trine Green
- Division of Genetics and Bioinformatics, IBHV, Grønnegärdsvej 3, The Royal Veterinary and Agricultural University, DK-1870 Frederiksberg C, Denmark
| | - Bente J Nielsen
- Division of Genetics and Bioinformatics, IBHV, Grønnegärdsvej 3, The Royal Veterinary and Agricultural University, DK-1870 Frederiksberg C, Denmark
| | - Jakob H Havgaard
- Division of Genetics and Bioinformatics, IBHV, Grønnegärdsvej 3, The Royal Veterinary and Agricultural University, DK-1870 Frederiksberg C, Denmark
| | - Carina Rosenkilde
- Division of Genetics and Bioinformatics, IBHV, Grønnegärdsvej 3, The Royal Veterinary and Agricultural University, DK-1870 Frederiksberg C, Denmark
| | - Jun Wang
- Beijing Genomics Institute, The Airport Industrial Road, Beijing 101300, PR China
- Institute of Human Genetics, University of Aarhus, Nordre Ringgade 1, DK-8000 Aarhus C, Denmark
- Department of Biochemistry and Molecular Biology, University of Southern Denmark, Campus Vej 55, DK-5230 Odense M, Denmark
| | - Heng Li
- Beijing Genomics Institute, The Airport Industrial Road, Beijing 101300, PR China
- Institute of Human Genetics, University of Aarhus, Nordre Ringgade 1, DK-8000 Aarhus C, Denmark
| | - Ruiqiang Li
- Beijing Genomics Institute, The Airport Industrial Road, Beijing 101300, PR China
- Department of Biochemistry and Molecular Biology, University of Southern Denmark, Campus Vej 55, DK-5230 Odense M, Denmark
| | - Bin Liu
- Beijing Genomics Institute, The Airport Industrial Road, Beijing 101300, PR China
| | - Songnian Hu
- Beijing Genomics Institute, The Airport Industrial Road, Beijing 101300, PR China
| | - Wei Dong
- Beijing Genomics Institute, The Airport Industrial Road, Beijing 101300, PR China
| | - Wei Li
- Beijing Genomics Institute, The Airport Industrial Road, Beijing 101300, PR China
| | - Jun Yu
- Beijing Genomics Institute, The Airport Industrial Road, Beijing 101300, PR China
| | - Jian Wang
- Beijing Genomics Institute, The Airport Industrial Road, Beijing 101300, PR China
| | - Hans-Henrik Stærfeldt
- Center for Biological Sequence Analysis, BioCentrum-DTU, Building 208, DK-2800 Lyngby, Denmark
| | - Rasmus Wernersson
- Center for Biological Sequence Analysis, BioCentrum-DTU, Building 208, DK-2800 Lyngby, Denmark
| | - Lone B Madsen
- Department of Genetics and Biotechnology, Danish Institute of Agricultural Sciences, Blichers Alle, DK-8830 Tjele, Denmark
| | - Bo Thomsen
- Department of Genetics and Biotechnology, Danish Institute of Agricultural Sciences, Blichers Alle, DK-8830 Tjele, Denmark
| | - Henrik Hornshøj
- Department of Genetics and Biotechnology, Danish Institute of Agricultural Sciences, Blichers Alle, DK-8830 Tjele, Denmark
| | - Zhan Bujie
- Department of Genetics and Biotechnology, Danish Institute of Agricultural Sciences, Blichers Alle, DK-8830 Tjele, Denmark
| | - Xuegang Wang
- Department of Genetics and Biotechnology, Danish Institute of Agricultural Sciences, Blichers Alle, DK-8830 Tjele, Denmark
| | - Xuefei Wang
- Department of Genetics and Biotechnology, Danish Institute of Agricultural Sciences, Blichers Alle, DK-8830 Tjele, Denmark
| | - Lars Bolund
- Beijing Genomics Institute, The Airport Industrial Road, Beijing 101300, PR China
- Institute of Human Genetics, University of Aarhus, Nordre Ringgade 1, DK-8000 Aarhus C, Denmark
| | - Søren Brunak
- Center for Biological Sequence Analysis, BioCentrum-DTU, Building 208, DK-2800 Lyngby, Denmark
| | - Huanming Yang
- Beijing Genomics Institute, The Airport Industrial Road, Beijing 101300, PR China
| | - Christian Bendixen
- Department of Genetics and Biotechnology, Danish Institute of Agricultural Sciences, Blichers Alle, DK-8830 Tjele, Denmark
| | - Merete Fredholm
- Division of Genetics and Bioinformatics, IBHV, Grønnegärdsvej 3, The Royal Veterinary and Agricultural University, DK-1870 Frederiksberg C, Denmark
| |
Collapse
|
8
|
Andersen ES, Lind-Thomsen A, Knudsen B, Kristensen SE, Havgaard JH, Torarinsson E, Larsen N, Zwieb C, Sestoft P, Kjems J, Gorodkin J. Semiautomated improvement of RNA alignments. RNA 2007; 13:1850-9. [PMID: 17804647 PMCID: PMC2040093 DOI: 10.1261/rna.215407] [Citation(s) in RCA: 24] [Impact Index Per Article: 1.4] [Reference Citation Analysis] [What about the content of this article? (0)] [Affiliation(s)] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 05/10/2023]
Abstract
We have developed a semiautomated RNA sequence editor (SARSE) that integrates tools for analyzing RNA alignments. The editor highlights different properties of the alignment by color, and its integrated analysis tools prevent the introduction of errors when doing alignment editing. SARSE readily connects to external tools to provide a flexible semiautomatic editing environment. A new method, Pcluster, is introduced for dividing the sequences of an RNA alignment into subgroups with secondary structure differences. Pcluster was used to evaluate 574 seed alignments obtained from the Rfam database and we identified 71 alignments with significant prediction of inconsistent base pairs and 102 alignments with significant prediction of novel base pairs. Four RNA families were used to illustrate how SARSE can be used to manually or automatically correct the inconsistent base pairs detected by Pcluster: the mir-399 RNA, vertebrate telomase RNA (vert-TR), bacterial transfer-messenger RNA (tmRNA), and the signal recognition particle (SRP) RNA. The general use of the method is illustrated by the ability to accommodate pseudoknots and handle even large and divergent RNA families. The open architecture of the SARSE editor makes it a flexible tool to improve all RNA alignments with relatively little human intervention. Online documentation and software are available at (http://sarse.ku.dk).
Collapse
Affiliation(s)
- Ebbe S Andersen
- Department of Molecular Biology, University of Aarhus, Arhus C, Denmark
| | | | | | | | | | | | | | | | | | | | | |
Collapse
|
9
|
Panitz F, Stengaard H, Hornshøj H, Gorodkin J, Hedegaard J, Cirera S, Thomsen B, Madsen LB, Høj A, Vingborg RK, Zahn B, Wang X, Wang X, Wernersson R, Jørgensen CB, Scheibye-Knudsen K, Arvin T, Lumholdt S, Sawera M, Green T, Nielsen BJ, Havgaard JH, Brunak S, Fredholm M, Bendixen C. SNP mining porcine ESTs with MAVIANT, a novel tool for SNP evaluation and annotation. Bioinformatics 2007; 23:i387-91. [PMID: 17646321 DOI: 10.1093/bioinformatics/btm192] [Citation(s) in RCA: 26] [Impact Index Per Article: 1.5] [Reference Citation Analysis] [What about the content of this article? (0)] [Affiliation(s)] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 01/04/2023] Open
Abstract
MOTIVATION Single nucleotide polymorphisms (SNPs) analysis is an important means to study genetic variation. A fast and cost-efficient approach to identify large numbers of novel candidates is the SNP mining of large scale sequencing projects. The increasing availability of sequence trace data in public repositories makes it feasible to evaluate SNP predictions on the DNA chromatogram level. MAVIANT, a platform-independent Multipurpose Alignment VIewing and Annotation Tool, provides DNA chromatogram and alignment views and facilitates evaluation of predictions. In addition, it supports direct manual annotation, which is immediately accessible and can be easily shared with external collaborators. RESULTS Large-scale SNP mining of polymorphisms bases on porcine EST sequences yielded more than 7900 candidate SNPs in coding regions (cSNPs), which were annotated relative to the human genome. Non-synonymous SNPs were analyzed for their potential effect on the protein structure/function using the PolyPhen and SIFT prediction programs. Predicted SNPs and annotations are stored in a web-based database. Using MAVIANT SNPs can visually be verified based on the DNA sequencing traces. A subset of candidate SNPs was selected for experimental validation by resequencing and genotyping. This study provides a web-based DNA chromatogram and contig browser that facilitates the evaluation and selection of candidate SNPs, which can be applied as genetic markers for genome wide genetic studies. AVAILABILITY The stand-alone version of MAVIANT program for local use is freely available under GPL license terms at http://snp.agrsci.dk/maviant. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
Collapse
Affiliation(s)
- Frank Panitz
- Department of Genetics and Biotechnology, Faculty of Agricultural Sciences, University of Aarhus, DK-8830 Tjele, Denmark
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | |
Collapse
|
10
|
Abstract
MOTIVATION An apparent paradox in computational RNA structure prediction is that many methods, in advance, require a multiple alignment of a set of related sequences, when searching for a common structure between them. However, such a multiple alignment is hard to obtain even for few sequences with low sequence similarity without simultaneously folding and aligning them. Furthermore, it is of interest to conduct a multiple alignment of RNA sequence candidates found from searching as few as two genomic sequences. RESULTS Here, based on the PMcomp program, we present a global multiple alignment program, foldalignM, which performs especially well on few sequences with low sequence similarity, and is comparable in performance with state of the art programs in general. In addition, it can cluster sequences based on sequence and structure similarity and output a multiple alignment for each cluster. Furthermore, preliminary results with local datasets indicate that the program is useful for post processing foldalign pairwise scans. AVAILABILITY The program foldalignM is implemented in JAVA and is, along with some accompanying PERL scripts, available at http://foldalign.ku.dk/
Collapse
Affiliation(s)
- Elfar Torarinsson
- Division of Genetics and Bioinformatics, IBHV and Center for Bioinformatics, University of Copenhagen, Frederiksberg C, Denmark
| | | | | |
Collapse
|
11
|
Gorodkin J, Havgaard JH, Ensterö M, Sawera M, Jensen P, Ohman M, Fredholm M. MicroRNA sequence motifs reveal asymmetry between the stem arms. Comput Biol Chem 2006; 30:249-54. [PMID: 16798093 DOI: 10.1016/j.compbiolchem.2006.04.006] [Citation(s) in RCA: 17] [Impact Index Per Article: 0.9] [Reference Citation Analysis] [What about the content of this article? (0)] [Affiliation(s)] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/17/2006] [Accepted: 04/24/2006] [Indexed: 12/31/2022]
Abstract
The processing of micro RNAs (miRNAs) from their stemloop precursor have revealed asymmetry in the processing of the mature and its star sequence. Furthermore, the miRNA processing system between organism differ. To assess this at the sequence level we have investigated mature miRNAs in their genomic contexts. We have compared profiles of mature miRNAs within their genomic context of the 5' and 3' stemloop precursor arms and we find asymmetry between mature sequences of the 5' and 3' stemloop precursor arms. The main observation is that vertebrate organisms have a characteristic motif on the 5' arm which is in contrast to the 3' arm motif which mainly show the conserved U at the position of the mature start. Also the vertebrate 5' arm motif show a semi-conserved G 13 nucleotides upstream from the first position. We compared the 5' and 3' arm profiles using the average log likelihood ratio (ALLR) score, as defined by Wang and Stormo (2003) [Wang T., Stormo, G.D., 2003. Combining phylogenetic data with co-regulated genes to identify regulatory motifs. Bioinformatics 2369-2380.] and computing a p-value we find that the two profiles differs significantly in their 3' end where the 5' arm motif (in contrast to the 3' arm motif) has a semi-conserved GU rich region. Similar findings are also obtained for other organisms, such as fly, worm and plants. The observed similarities and differences between closely and distantly related organisms are discussed and related to current knowledge of miRNA processing.
Collapse
Affiliation(s)
- J Gorodkin
- Division of Genetics and Bioinformatics, IBHV and Center for Bioinformatics, Royal Veterinary and Agricultural University, Grønnegårdsvej 3, DK-1870 Frederiksberg C, Denmark.
| | | | | | | | | | | | | |
Collapse
|
12
|
Torarinsson E, Sawera M, Havgaard JH, Fredholm M, Gorodkin J. Thousands of corresponding human and mouse genomic regions unalignable in primary sequence contain common RNA structure. Genome Res 2006; 16:885-9. [PMID: 16751343 PMCID: PMC1484455 DOI: 10.1101/gr.5226606] [Citation(s) in RCA: 138] [Impact Index Per Article: 7.7] [Reference Citation Analysis] [What about the content of this article? (0)] [Affiliation(s)] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/24/2022]
Abstract
Human and mouse genome sequences contain roughly 100,000 regions that are unalignable in primary sequence and neighbor corresponding alignable regions between both organisms. These pairs are generally assumed to be nonconserved, although the level of structural conservation between these has never been investigated. Owing to the limitations in computational methods, comparative genomics has been lacking the ability to compare such nonconserved sequence regions for conserved structural RNA elements. We have investigated the presence of structural RNA elements by conducting a local structural alignment, using FOLDALIGN, on a subset of these 100,000 corresponding regions and estimate that 1800 contain common RNA structures. Comparing our results with the recent mapping of transcribed fragments (transfrags) in human, we find that high-scoring candidates are twice as likely to be found in regions overlapped by transfrags than regions that are not overlapped by transfrags. To verify the coexpression between predicted candidates in human and mouse, we conducted expression studies by RT-PCR and Northern blotting on mouse candidates, which overlap with transfrags on human chromosome 20. RT-PCR results confirmed expression of 32 out of 36 candidates, whereas Northern blots confirmed four out of 12 candidates. Furthermore, many RT-PCR results indicate differential expression in different tissues. Hence, our findings suggest that there are corresponding regions between human and mouse, which contain expressed non-coding RNA sequences not alignable in primary sequence.
Collapse
Affiliation(s)
- Elfar Torarinsson
- Division of Genetics and Bioinformatics, IBHV, The Royal Veterinary and Agricultural University, 1870 Frederiksberg C, Denmark
- Department of Natural Sciences, The Royal Veterinary and Agricultural University, 1870 Frederiksberg C, Denmark
| | - Milena Sawera
- Division of Genetics and Bioinformatics, IBHV, The Royal Veterinary and Agricultural University, 1870 Frederiksberg C, Denmark
| | - Jakob H. Havgaard
- Division of Genetics and Bioinformatics, IBHV, The Royal Veterinary and Agricultural University, 1870 Frederiksberg C, Denmark
| | - Merete Fredholm
- Division of Genetics and Bioinformatics, IBHV, The Royal Veterinary and Agricultural University, 1870 Frederiksberg C, Denmark
| | - Jan Gorodkin
- Division of Genetics and Bioinformatics, IBHV, The Royal Veterinary and Agricultural University, 1870 Frederiksberg C, Denmark
- Corresponding author.E-mail ; fax 45-3528-3042
| |
Collapse
|
13
|
Abstract
Foldalign is a Sankoff-based algorithm for making structural alignments of RNA sequences. Here, we present a web server for making pairwise alignments between two RNA sequences, using the recently updated version of foldalign. The server can be used to scan two sequences for a common structural RNA motif of limited size, or the entire sequences can be aligned locally or globally. The web server offers a graphical interface, which makes it simple to make alignments and manually browse the results. The web server can be accessed at .
Collapse
Affiliation(s)
| | - Rune B. Lyngsø
- Department of Statistics, Oxford University1 South Parks Road, Oxford OX1 3TG, UK
| | - Jan Gorodkin
- To whom correspondence should be addressed: Tel: +45 3528 3578; Fax: +45 3528 3042;
| |
Collapse
|