1
Abstract
DNA methylation is a crucial, abundant mechanism of gene regulation in vertebrates. It is less prevalent in many other metazoan organisms and completely absent in some key model species, such as Drosophila melanogaster and Caenorhabditis elegans. We report here a comprehensive study of the presence and absence of DNA methyltransferases (DNMTs) in 138 Ecdysozoa, covering Arthropoda, Nematoda, Priapulida, Onychophora, and Tardigrada. Three of these phyla had not previously been investigated for the presence of DNA methylation. We observe that individual DNMTs were lost independently multiple times across ecdysozoan phyla. We computationally predict the presence of DNA methylation from CpG rates in coding sequences using MethMod, an implementation of Gaussian Mixture Modeling. Integrating both analyses, we predict two previously unknown losses of DNA methylation in Ecdysozoa, one within Chelicerata (Mesostigmata) and one in Tardigrada. In the early-branching ecdysozoan Priapulus caudatus, we predict the presence of a full set of DNMTs and of DNA methylation. These results reveal a highly diverse and independent evolution of DNA methylation across ecdysozoan phyla spanning a phylogenetic range of more than 700 million years.
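The abstract names the ingredients (CpG rates in coding sequences, a Gaussian mixture) but not the exact procedure. The sketch below is an assumption about how such a predictor can work, not MethMod's actual code. It computes the CpG observed/expected ratio per coding sequence and fits a two-component 1-D Gaussian mixture by expectation-maximization. The helper names `cpg_obs_exp` and `fit_two_gaussians` are hypothetical.

```python
import math
from typing import List, Tuple


def cpg_obs_exp(seq: str) -> float:
    """CpG observed/expected ratio: (#CG * length) / (#C * #G)."""
    s = seq.upper()
    c, g = s.count("C"), s.count("G")
    if c == 0 or g == 0:
        return 0.0
    # "CG" cannot overlap itself, so str.count gives the dinucleotide count.
    return s.count("CG") * len(s) / (c * g)


def _normal_pdf(x: float, mu: float, var: float) -> float:
    return math.exp(-((x - mu) ** 2) / (2 * var)) / math.sqrt(2 * math.pi * var)


def fit_two_gaussians(xs: List[float], iters: int = 200) -> List[Tuple[float, float, float]]:
    """EM for a two-component 1-D Gaussian mixture.

    Returns (weight, mean, variance) per component, sorted by mean.
    """
    n = len(xs)
    xs_sorted = sorted(xs)
    mean = sum(xs) / n
    var0 = max(sum((x - mean) ** 2 for x in xs) / n, 1e-6)
    # Start the two means at the lower and upper quartiles of the data.
    mus = [xs_sorted[n // 4], xs_sorted[(3 * n) // 4]]
    vs = [var0, var0]
    ws = [0.5, 0.5]
    for _ in range(iters):
        # E-step: posterior responsibility of each component for each point.
        resp = []
        for x in xs:
            p = [ws[k] * _normal_pdf(x, mus[k], vs[k]) for k in range(2)]
            total = sum(p)
            resp.append([pk / total for pk in p] if total > 0 else [0.5, 0.5])
        # M-step: re-estimate weights, means and variances (variance floored to avoid collapse).
        for k in range(2):
            nk = sum(r[k] for r in resp)
            if nk == 0:
                continue
            ws[k] = nk / n
            mus[k] = sum(r[k] * x for r, x in zip(resp, xs)) / nk
            vs[k] = max(sum(r[k] * (x - mus[k]) ** 2 for r, x in zip(resp, xs)) / nk, 1e-6)
    return sorted(zip(ws, mus, vs), key=lambda comp: comp[1])
```

A common heuristic is to read a clearly bimodal CpG o/e distribution as a trace of historical germline methylation. In that reading, one component has a depleted mean and the other sits closer to 1. Where to draw the line for "clearly bimodal" is a modelling choice that the abstract does not specify.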
Affiliation(s)
- Jan Engelhardt
- Bioinformatics Group, Department of Computer Science, University of Leipzig, Härtelstraße 16-18, 04107, Leipzig, Germany.,Computational EvoDevo Group, Department of Computer Science, University of Leipzig, Härtelstraße 16-18, 04107, Leipzig, Germany.,Interdisciplinary Centre for Bioinformatics, University of Leipzig, Härtelstraße 16-18, 04107, Leipzig, Germany.,Department of Evolutionary Biology, University of Vienna, Djerassiplatz 1, 1030, Vienna, Austria.
- Oliver Scheer
- Computational EvoDevo Group, Department of Computer Science, University of Leipzig, Härtelstraße 16-18, 04107, Leipzig, Germany.,Interdisciplinary Centre for Bioinformatics, University of Leipzig, Härtelstraße 16-18, 04107, Leipzig, Germany
- Peter F Stadler
- Bioinformatics Group, Department of Computer Science, University of Leipzig, Härtelstraße 16-18, 04107, Leipzig, Germany.,Interdisciplinary Centre for Bioinformatics, University of Leipzig, Härtelstraße 16-18, 04107, Leipzig, Germany.,The Santa Fe Institute, 1399 Hyde Park Rd., Santa Fe, NM, 87501, USA.,Max Planck Institute for Mathematics in the Sciences, Inselstraße 22, 04103, Leipzig, Germany.,Institute for Theoretical Chemistry, University of Vienna, Währingerstraße 17, 1090, Vienna, Austria.,Facultad de Ciencias, Universidad Nacional de Colombia, Sede Bogotá, Colombia
- Sonja J Prohaska
- Computational EvoDevo Group, Department of Computer Science, University of Leipzig, Härtelstraße 16-18, 04107, Leipzig, Germany.,Interdisciplinary Centre for Bioinformatics, University of Leipzig, Härtelstraße 16-18, 04107, Leipzig, Germany.,The Santa Fe Institute, 1399 Hyde Park Rd., Santa Fe, NM, 87501, USA.,Complexity Science Hub Vienna, Josefstädter Str. 39, 1080, Vienna, Austria
2
Laubichler MD, Prohaska SJ, Stadler PF. Toward a mechanistic explanation of phenotypic evolution: The need for a theory of theory integration. J Exp Zool B Mol Dev Evol 2018; 330:5-14. [DOI: 10.1002/jez.b.22785] [Citation(s) in RCA: 12] [Impact Index Per Article: 2.0] [Received: 12/26/2016] [Revised: 11/03/2017] [Accepted: 11/15/2017] [Indexed: 01/01/2023]
Affiliation(s)
- Manfred D. Laubichler
- School of Life Sciences; Arizona State University; Tempe Arizona
- Marine Biological Laboratory; Woods Hole; Massachusetts
- Santa Fe Institute; Santa Fe New Mexico
- Sonja J. Prohaska
- Santa Fe Institute; Santa Fe New Mexico
- Computational EvoDevo Group; Department of Computer Science; Leipzig Germany
- Interdisciplinary Center for Bioinformatics; University of Leipzig; Leipzig Germany
- Peter F. Stadler
- Santa Fe Institute; Santa Fe New Mexico
- Interdisciplinary Center for Bioinformatics; University of Leipzig; Leipzig Germany
- Bioinformatics Group, Department of Computer Science; University of Leipzig; Leipzig Germany
- Max-Planck Institute for Mathematics in the Sciences; Leipzig Germany
- Fraunhofer Institut für Zelltherapie und Immunologie-IZI; Leipzig Germany
- Department of Theoretical Chemistry; University of Vienna; Wien Austria
- Center for Non-Coding RNA in Technology and Health; University of Copenhagen; Frederiksberg Denmark
3
Prohaska SJ, Berkemer SJ, Gärtner F, Gatter T, Retzlaff N, Höner Zu Siederdissen C, Stadler PF. Expansion of gene clusters, circular orders, and the shortest Hamiltonian path problem. J Math Biol 2017; 77:313-341. [PMID: 29260295 PMCID: PMC6060901 DOI: 10.1007/s00285-017-1197-3] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Received: 04/24/2017] [Revised: 12/02/2017] [Indexed: 11/30/2022]
Abstract
Clusters of paralogous genes such as the famous HOX cluster of developmental transcription factors tend to evolve by stepwise duplication of their members, often involving unequal crossing over. Gene conversion and possibly other mechanisms of concerted evolution further obscure the phylogenetic relationships. As a consequence, it is very difficult or even impossible to disentangle the detailed history of gene duplications in gene clusters. In this contribution we show that the expansion of gene clusters by unequal crossing over as proposed by Walter Gehring leads to distinctive patterns of genetic distances, namely a subclass of circular split systems. Furthermore, when the gene cluster has been left undisturbed by genome rearrangements, the shortest Hamiltonian paths with respect to genetic distances coincide with the genomic order. This observation can be used to detect ancient genomic rearrangements of gene clusters and to distinguish gene clusters whose evolution was dominated by unequal crossing over within genes from those that expanded through other mechanisms.
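The central observation can be tried on toy data: in an undisturbed cluster, the shortest Hamiltonian path through the matrix of genetic distances should visit the genes in genomic order. The sketch below is a generic Held-Karp dynamic program over vertex subsets, with O(2^n n^2) time, and is not the authors' implementation. The distance matrices in the test are invented. Any start and end vertex is allowed, which matches a path rather than a tour.

```python
from itertools import combinations
from typing import Dict, List, Tuple


def shortest_hamiltonian_path(d: List[List[float]]) -> Tuple[float, List[int]]:
    """Exact shortest Hamiltonian path (free endpoints) via Held-Karp DP."""
    n = len(d)
    # best[(mask, j)] = (cost, predecessor) of the cheapest path that visits
    # exactly the vertices in `mask` and ends at vertex j.
    best: Dict[Tuple[int, int], Tuple[float, int]] = {}
    for j in range(n):
        best[(1 << j, j)] = (0.0, -1)
    for size in range(2, n + 1):
        for subset in combinations(range(n), size):
            mask = 0
            for v in subset:
                mask |= 1 << v
            for j in subset:
                prev_mask = mask ^ (1 << j)
                best[(mask, j)] = min(
                    (best[(prev_mask, i)][0] + d[i][j], i) for i in subset if i != j
                )
    full = (1 << n) - 1
    cost, end = min((best[(full, j)][0], j) for j in range(n))
    # Walk the predecessor pointers back to the start vertex.
    path, mask, j = [], full, end
    while j != -1:
        path.append(j)
        prev = best[(mask, j)][1]
        mask ^= 1 << j
        j = prev
    return cost, path[::-1]
```

As a usage example, a hypothetical cluster where gene k sits at position `pos[k]` and distance grows with positional separation returns the genes sorted by position, in either direction. Because the table has 2^n entries per end vertex, this exact approach is only practical for clusters of a few dozen genes at most.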
Affiliation(s)
- Sonja J Prohaska
- Computational EvoDevo Group, Department of Computer Science and Interdisciplinary Center for Bioinformatics, Universität Leipzig, Härtelstraße 16-18, 04107, Leipzig, Germany
- Sarah J Berkemer
- Max Planck Institute for Mathematics in the Sciences, Inselstraße 22, 04103, Leipzig, Germany.,Bioinformatics Group, Department of Computer Science and Interdisciplinary Center for Bioinformatics, Universität Leipzig, Härtelstraße 16-18, 04107, Leipzig, Germany
- Fabian Gärtner
- Competence Center for Scalable Data Services and Solutions Dresden/Leipzig and Bioinformatics Group, Department of Computer Science, Universität Leipzig, Härtelstraße 16-18, 04107, Leipzig, Germany
- Thomas Gatter
- Bioinformatics Group, Department of Computer Science and Interdisciplinary Center for Bioinformatics, Universität Leipzig, Härtelstraße 16-18, 04107, Leipzig, Germany
- Nancy Retzlaff
- Max Planck Institute for Mathematics in the Sciences, Inselstraße 22, 04103, Leipzig, Germany.,Bioinformatics Group, Department of Computer Science and Interdisciplinary Center for Bioinformatics, Universität Leipzig, Härtelstraße 16-18, 04107, Leipzig, Germany
- Christian Höner Zu Siederdissen
- Bioinformatics Group, Department of Computer Science and Interdisciplinary Center for Bioinformatics, Universität Leipzig, Härtelstraße 16-18, 04107, Leipzig, Germany
- Peter F Stadler
- Max Planck Institute for Mathematics in the Sciences, Inselstraße 22, 04103, Leipzig, Germany.,Bioinformatics Group, Department of Computer Science and Interdisciplinary Center for Bioinformatics, Universität Leipzig, Härtelstraße 16-18, 04107, Leipzig, Germany.,RNomics Group, Fraunhofer Institute for Cell Therapy and Immunology, 04103, Leipzig, Germany.,Institute for Theoretical Chemistry, University of Vienna, Währingerstrasse 17, 1090, Wien, Austria.,Santa Fe Institute, 1399 Hyde Park Rd., Santa Fe, NM, 87501, USA.
4
Richards CL, Alonso C, Becker C, Bossdorf O, Bucher E, Colomé-Tatché M, Durka W, Engelhardt J, Gaspar B, Gogol-Döring A, Grosse I, van Gurp TP, Heer K, Kronholm I, Lampei C, Latzel V, Mirouze M, Opgenoorth L, Paun O, Prohaska SJ, Rensing SA, Stadler PF, Trucchi E, Ullrich K, Verhoeven KJF. Ecological plant epigenetics: Evidence from model and non-model species, and the way forward. Ecol Lett 2017; 20:1576-1590. [PMID: 29027325 DOI: 10.1111/ele.12858] [Citation(s) in RCA: 175] [Impact Index Per Article: 25.0] [Received: 04/26/2017] [Revised: 06/15/2017] [Accepted: 09/04/2017] [Indexed: 12/15/2022]
Abstract
Growing evidence shows that epigenetic mechanisms contribute to complex traits, with implications across many fields of biology. In plant ecology, recent studies have attempted to merge ecological experiments with epigenetic analyses to elucidate the contribution of epigenetics to plant phenotypes, stress responses, adaptation to habitat, and range distributions. While there has been some progress in revealing the role of epigenetics in ecological processes, studies with non-model species have so far been limited to describing broad patterns based on anonymous markers of DNA methylation. In contrast, studies with model species have benefited from powerful genomic resources, which contribute to a more mechanistic understanding but have limited ecological realism. Understanding the significance of epigenetics for plant ecology requires increased transfer of knowledge and methods from model species research to genomes of evolutionarily divergent species, and examination of responses to complex natural environments at a more mechanistic level. This requires adapting genomics tools specifically for studying non-model species, which is challenging given the large and often polyploid genomes of plants. Collaboration among molecular geneticists, ecologists and bioinformaticians promises to enhance our understanding of the mutual links between genome function and ecological processes.
Affiliation(s)
- Christina L Richards
- Department of Integrative Biology, University of South Florida, Tampa, FL, 33620, USA
- Claude Becker
- Gregor Mendel Institute of Molecular Plant Biology, Austrian Academy of Sciences, Vienna Biocenter (VBC), 1030, Vienna, Austria
- Oliver Bossdorf
- Plant Evolutionary Ecology, University of Tübingen, 72076, Tübingen, Germany
- Etienne Bucher
- Institut de Recherche en Horticulture et Semences, 49071, Beaucouzé Cedex, France
- Maria Colomé-Tatché
- European Research Institute for the Biology of Ageing, University Medical Center Groningen, 9713, Groningen, The Netherlands.,Institute of Computational Biology, Helmholtz Zentrum München, 85764, Neuherberg, Germany.,School of Life Sciences Weihenstephan, Technical University of Munich, 85354, Freising, Germany
- Walter Durka
- Department of Community Ecology, Helmholtz Centre for Environmental Research - UFZ, 06120, Halle, Germany.,German Centre for Integrative Biodiversity Research (iDiv) Halle-Jena-Leipzig, 04103, Leipzig, Germany
- Jan Engelhardt
- Institut für Informatik, University of Leipzig, 04107, Leipzig, Germany
- Bence Gaspar
- Plant Evolutionary Ecology, University of Tübingen, 72076, Tübingen, Germany
- Andreas Gogol-Döring
- German Centre for Integrative Biodiversity Research (iDiv) Halle-Jena-Leipzig, 04103, Leipzig, Germany.,Institute of Computer Science, University of Halle, 06120, Halle, Germany
- Ivo Grosse
- German Centre for Integrative Biodiversity Research (iDiv) Halle-Jena-Leipzig, 04103, Leipzig, Germany.,Institute of Computer Science, University of Halle, 06120, Halle, Germany
- Thomas P van Gurp
- Netherlands Institute of Ecology (NIOO-KNAW), Wageningen, The Netherlands
- Katrin Heer
- Conservation Biology, Philipps-University of Marburg, 35037, Marburg, Germany
- Ilkka Kronholm
- Department of Biological and Environmental Sciences, Center of Excellence in Biological Interactions, University of Jyväskylä, 40014, Jyväskylän yliopisto, Finland
- Christian Lampei
- Institute of Plant Breeding, Seed Science and Population Genetics, 70599, Stuttgart, Germany
- Vít Latzel
- Institute of Botany, The Czech Academy of Sciences, 25243, Průhonice, Czech Republic
- Marie Mirouze
- Institut de Recherche pour le Développement, Laboratoire Génome et Développement des Plantes, 66860, Perpignan, France
- Lars Opgenoorth
- Department of Ecology, Philipps-University Marburg, 35037, Marburg, Germany
- Ovidiu Paun
- Plant Ecological Genomics, University of Vienna, 1030, Vienna, Austria
- Sonja J Prohaska
- Institut für Informatik, University of Leipzig, 04107, Leipzig, Germany.,The Santa Fe Institute, Santa Fe NM, 87501, USA
- Stefan A Rensing
- Plant Cell Biology, Philipps-University Marburg, 35037, Marburg, Germany.,BIOSS Centre for Biological Signaling Studies, University of Freiburg, 79098, Freiburg, Germany
- Peter F Stadler
- German Centre for Integrative Biodiversity Research (iDiv) Halle-Jena-Leipzig, 04103, Leipzig, Germany.,Institut für Informatik, University of Leipzig, 04107, Leipzig, Germany.,The Santa Fe Institute, Santa Fe NM, 87501, USA.,Max Planck Institute for Mathematics in the Sciences, 04103, Leipzig, Germany
- Emiliano Trucchi
- Plant Ecological Genomics, University of Vienna, 1030, Vienna, Austria
- Kristian Ullrich
- Plant Cell Biology, Philipps-University Marburg, 35037, Marburg, Germany
- Koen J F Verhoeven
- Netherlands Institute of Ecology (NIOO-KNAW), Wageningen, The Netherlands
5
Indrischek H, Prohaska SJ, Gurevich VV, Gurevich EV, Stadler PF. Uncovering missing pieces: duplication and deletion history of arrestins in deuterostomes. BMC Evol Biol 2017; 17:163. [PMID: 28683816 PMCID: PMC5501109 DOI: 10.1186/s12862-017-1001-4] [Citation(s) in RCA: 37] [Impact Index Per Article: 5.3] [Received: 11/14/2016] [Accepted: 06/19/2017] [Indexed: 12/13/2022]
Abstract
BACKGROUND The cytosolic arrestin proteins mediate desensitization of activated G protein-coupled receptors (GPCRs) via competition with G proteins for the active phosphorylated receptors. Arrestins in active, including receptor-bound, conformation are also transducers of signaling. Therefore, this protein family is an attractive therapeutic target. The signaling outcome is believed to be a result of structural and sequence-dependent interactions of arrestins with GPCRs and other protein partners. Here we elucidated the detailed evolution of arrestins in deuterostomes. RESULTS The identity and number of arrestin paralogs were determined by searching deuterostome genomes and gene expression data. In contrast to standard gene prediction methods, our strategy first detects exons situated on different scaffolds and then solves the problem of assigning them to the correct gene. This increases both the completeness and the accuracy of the annotation compared with the conventional database search strategies applied by the community. The employed strategy enabled us to map in detail the duplication and deletion history of arrestin paralogs, including tandem duplications, pseudogenizations and the formation of retrogenes. The two rounds of whole genome duplication in the vertebrate stem lineage gave rise to four arrestin paralogs. Surprisingly, visual arrestin ARR3 was lost in the mammalian clades Afrotheria and Xenarthra. Duplications in specific clades, on the other hand, must have given rise to new paralogs that show signatures of diversification in functional elements important for receptor binding and phosphate sensing. CONCLUSION The current study traces the functional evolution of deuterostome arrestins in unprecedented detail. Based on a precise re-annotation of the exon-intron structure at nucleotide resolution, we infer the gain and loss of paralogs and patterns of conservation, co-variation and selection.
Affiliation(s)
- Henrike Indrischek
- Computational EvoDevo Group, Department of Computer Science, Universität Leipzig, Härtelstraße 16-18, Leipzig, D-04107, Germany.,Bioinformatics Group, Department of Computer Science, Universität Leipzig, Härtelstraße 16-18, Leipzig, D-04107, Germany.,Interdisciplinary Center for Bioinformatics, Universität Leipzig, Härtelstraße 16-18, Leipzig, D-04107, Germany.
- Sonja J Prohaska
- Computational EvoDevo Group, Department of Computer Science, Universität Leipzig, Härtelstraße 16-18, Leipzig, D-04107, Germany.,Interdisciplinary Center for Bioinformatics, Universität Leipzig, Härtelstraße 16-18, Leipzig, D-04107, Germany
- Vsevolod V Gurevich
- Department of Pharmacology, Vanderbilt University, 2200 Pierce Ave, Nashville, TN 37232, USA
- Eugenia V Gurevich
- Department of Pharmacology, Vanderbilt University, 2200 Pierce Ave, Nashville, TN 37232, USA
- Peter F Stadler
- Bioinformatics Group, Department of Computer Science, Universität Leipzig, Härtelstraße 16-18, Leipzig, D-04107, Germany.,Interdisciplinary Center for Bioinformatics, Universität Leipzig, Härtelstraße 16-18, Leipzig, D-04107, Germany.,Max Planck Institute for Mathematics in the Sciences, Inselstraße 22, Leipzig, D-04103, Germany.,Fraunhofer Institute for Cell Therapy and Immunology, Perlickstraße 1, Leipzig, D-04103, Germany.,Department of Theoretical Chemistry, University of Vienna, Währinger Straße 17, Vienna, A-1090, Austria.,Center for non-coding RNA in Technology and Health, Grønnegårdsvej 3, Frederiksberg C, DK-1870, Denmark.,Santa Fe Institute, 1399 Hyde Park Rd., Santa Fe, NM 87501, USA
6
Indrischek H, Wieseke N, Stadler PF, Prohaska SJ. The paralog-to-contig assignment problem: high quality gene models from fragmented assemblies. Algorithms Mol Biol 2016; 11:1. [PMID: 26913054 PMCID: PMC4765045 DOI: 10.1186/s13015-016-0063-y] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.9] [Received: 11/16/2015] [Accepted: 02/02/2016] [Indexed: 11/10/2022]
Abstract
Background The accurate annotation of genes in newly sequenced genomes remains a challenge. Although sophisticated comparative pipelines are available, computationally derived gene models are often less than perfect. This is particularly true when multiple similar paralogs are present. The issue is aggravated further when genomes are assembled only at a preliminary draft level to contigs or short scaffolds. However, these genomes deliver valuable information for studying gene families. High accuracy models of protein coding genes are needed in particular for phylogenetics and for the analysis of gene family histories. Results We present a pipeline, ExonMatchSolver, that is designed to help the user produce and curate high quality models of the protein-coding part of genes. The tool in particular tackles the problem of identifying those groups of coding exons that belong to the same paralogous gene in a fragmented genome assembly. This paralog-to-contig assignment problem is shown to be NP-complete. It is phrased and solved as an Integer Linear Programming problem. Conclusions The ExonMatchSolver pipeline can be employed to build highly accurate models of protein coding genes even when they span several genomic fragments. This sets the stage for a better understanding of the evolutionary history of gene families that possess a large number of paralogs and in which frequent gene duplication events occurred.
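The abstract does not spell out the formal problem. The sketch below uses a simplified version, assumed here for illustration: each contig carries a set of exon indices and a score against each paralog. Each contig is assigned to at most one paralog, no paralog may receive the same exon twice, and the total score is maximized. The paper solves its NP-complete formulation as an Integer Linear Program. This sketch instead enumerates every assignment, which is only feasible for a handful of contigs. The function name `assign_contigs` and the toy scores are hypothetical.

```python
from itertools import product
from typing import Dict, List, Optional, Set, Tuple


def assign_contigs(
    contig_exons: List[Set[int]],
    scores: List[List[float]],
    n_paralogs: int,
) -> Tuple[float, List[Optional[int]]]:
    """Brute-force paralog-to-contig assignment (simplified model).

    scores[c][p] is the score of placing contig c in paralog p; None means
    the contig is left unassigned. A paralog may not receive any exon twice.
    """
    best_score: float = -1.0
    best_assign: List[Optional[int]] = []
    options: List[Optional[int]] = list(range(n_paralogs)) + [None]
    for assign in product(options, repeat=len(contig_exons)):
        used: Dict[int, Set[int]] = {p: set() for p in range(n_paralogs)}
        ok, total = True, 0.0
        for c, p in enumerate(assign):
            if p is None:
                continue
            # Conflict: this paralog already has one of the contig's exons.
            if used[p] & contig_exons[c]:
                ok = False
                break
            used[p] |= contig_exons[c]
            total += scores[c][p]
        if ok and total > best_score:
            best_score, best_assign = total, list(assign)
    return best_score, best_assign
```

The exponential enumeration, (paralogs + 1) to the power of the number of contigs, is exactly why an ILP solver is the practical choice for real assemblies.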
7
Abstract
Background Dynamic programming algorithms provide exact solutions to many problems in computational biology, such as sequence alignment, RNA folding, hidden Markov models (HMMs), and scoring of phylogenetic trees. Structurally analogous algorithms compute optimal solutions, evaluate score distributions, and perform stochastic sampling. The theory of Algebraic Dynamic Programming (ADP) explains this by a strict separation of state space traversal (usually represented by a context-free grammar), scoring (encoded as an algebra), and choice rule. A key ingredient in this theory is the use of yield parsers that operate on the ordered input data structure, usually strings or ordered trees. The computation of ensemble properties, such as a posteriori probabilities of HMMs or partition functions in RNA folding, requires the combination of two distinct but intimately related algorithms, known as the inside and the outside recursion. Only the inside recursions are covered by the classical ADP theory. Results The ideas of ADP are generalized to a much wider scope of data structures by relaxing the concept of parsing. This allows us to formalize the conceptual complementarity of inside and outside variables in a natural way. We demonstrate that outside recursions are generically derivable from inside decomposition schemes. In addition to rephrasing the well-known algorithms for HMMs, pairwise sequence alignment, and RNA folding, we show how the traveling salesman problem (TSP) and the shortest Hamiltonian path problem can be implemented efficiently in the extended ADP framework. As a showcase application we investigate the ancient evolution of HOX gene clusters in terms of shortest Hamiltonian paths. Conclusions The generalized ADP framework presented here greatly facilitates the development and implementation of dynamic programming algorithms for a wide spectrum of applications.
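For HMMs, the inside/outside pair reduces to the familiar forward (inside) and backward (outside) recursions. Their product, divided by the total likelihood Z, gives the a posteriori state probabilities the abstract mentions. The following plain-Python sketch is not written in the ADP formalism. It simply makes the two recursions and their combination explicit, and adds a brute-force sum over all state paths to check Z. The two-state model in the test is hypothetical.

```python
from itertools import product
from typing import Dict, List, Sequence, Tuple

Prob = Dict[str, Dict[str, float]]


def forward_backward(obs: Sequence[str], states: List[str], start: Dict[str, float],
                     trans: Prob, emit: Prob) -> Tuple[float, List[Dict[str, float]]]:
    """Return the total likelihood Z and the posterior state probabilities per position."""
    n = len(obs)
    # Inside (forward): fwd[t][s] = P(obs[0..t], state_t = s)
    fwd = [{s: start[s] * emit[s][obs[0]] for s in states}]
    for t in range(1, n):
        fwd.append({s: emit[s][obs[t]] * sum(fwd[t - 1][r] * trans[r][s] for r in states)
                    for s in states})
    # Outside (backward): bwd[t][s] = P(obs[t+1..n-1] | state_t = s)
    bwd = [dict.fromkeys(states, 1.0) for _ in range(n)]
    for t in range(n - 2, -1, -1):
        bwd[t] = {s: sum(trans[s][r] * emit[r][obs[t + 1]] * bwd[t + 1][r] for r in states)
                  for s in states}
    z = sum(fwd[n - 1][s] for s in states)
    # Posterior = inside * outside / Z
    post = [{s: fwd[t][s] * bwd[t][s] / z for s in states} for t in range(n)]
    return z, post


def brute_force_likelihood(obs: Sequence[str], states: List[str], start: Dict[str, float],
                           trans: Prob, emit: Prob) -> float:
    """Sum over all state paths explicitly; exponential, for checking only."""
    total = 0.0
    for path in product(states, repeat=len(obs)):
        p = start[path[0]] * emit[path[0]][obs[0]]
        for t in range(1, len(obs)):
            p *= trans[path[t - 1]][path[t]] * emit[path[t]][obs[t]]
        total += p
    return total
```

The outside recursion here was written by hand. The abstract's point is that, within the generalized ADP framework, such outside recursions can be derived generically from the inside decomposition rather than coded separately for each problem.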
8
Le Duc D, Renaud G, Krishnan A, Almén MS, Huynen L, Prohaska SJ, Ongyerth M, Bitarello BD, Schiöth HB, Hofreiter M, Stadler PF, Prüfer K, Lambert D, Kelso J, Schöneberg T. Kiwi genome provides insights into evolution of a nocturnal lifestyle. Genome Biol 2015; 16:147. [PMID: 26201466 PMCID: PMC4511969 DOI: 10.1186/s13059-015-0711-4] [Citation(s) in RCA: 59] [Impact Index Per Article: 6.6] [Received: 02/13/2015] [Accepted: 07/01/2015] [Indexed: 12/12/2022]
Abstract
BACKGROUND Kiwi, comprising five species from the genus Apteryx, are endangered, ground-dwelling bird species endemic to New Zealand. They are the smallest and only nocturnal representatives of the ratites. The timing of kiwi adaptation to a nocturnal niche and the genomic innovations that shaped sensory systems and morphology to allow this adaptation are not yet fully understood. RESULTS We sequenced and assembled the brown kiwi genome to 150-fold coverage and annotated the genome using kiwi transcript data and non-redundant protein information from multiple bird species. We identified evolutionary sequence changes that underlie adaptation to nocturnality and estimated the onset time of these adaptations. Several opsin genes involved in color vision are inactivated in the kiwi. We date this inactivation to the Oligocene epoch, likely after the arrival of the ancestor of modern kiwi in New Zealand. Genome comparisons between kiwi and representatives of ratites, Galloanserae, and Neoaves, including nocturnal and song birds, show diversification of the kiwi's odorant receptor repertoire, which may reflect an increased reliance on olfaction rather than sight during foraging. Further, genes influencing mitochondrial function and energy expenditure are enriched among the genes that are rapidly evolving specifically on the kiwi branch, which may also be linked to its nocturnal lifestyle. CONCLUSIONS The genomic changes in kiwi vision and olfaction are consistent with changes that are hypothesized to occur during adaptation to a nocturnal lifestyle in mammals. The kiwi genome provides a valuable genomic resource for future genome-wide comparative analyses with other extinct and extant diurnal ratites.
Affiliation(s)
- Diana Le Duc
- Institute of Biochemistry, Medical Faculty, University of Leipzig, Johannisallee 30, Leipzig, 04103, Germany.
- Department of Evolutionary Genetics, Max Planck Institute for Evolutionary Anthropology, Leipzig, 04103, Germany.
- Gabriel Renaud
- Department of Evolutionary Genetics, Max Planck Institute for Evolutionary Anthropology, Leipzig, 04103, Germany.
- Arunkumar Krishnan
- Department of Neuroscience, Unit of Functional Pharmacology, Uppsala University, Box 593, Husargatan 3, Uppsala, 751 24, Sweden.
- Markus Sällman Almén
- Department of Neuroscience, Unit of Functional Pharmacology, Uppsala University, Box 593, Husargatan 3, Uppsala, 751 24, Sweden.
- Leon Huynen
- Griffith School of Environment and School of Biomolecular and Physical Sciences, Griffith University, Nathan, Queensland, 4111, Australia.
- Sonja J Prohaska
- Department of Computer Science and Interdisciplinary Center for Bioinformatics, University of Leipzig, Leipzig, 04103, Germany.
- Matthias Ongyerth
- Department of Evolutionary Genetics, Max Planck Institute for Evolutionary Anthropology, Leipzig, 04103, Germany.
- Bárbara D Bitarello
- Department of Genetics and Evolutionary Biology, University of São Paulo, São Paulo, SP, 05508-090, Brazil.
- Helgi B Schiöth
- Department of Neuroscience, Unit of Functional Pharmacology, Uppsala University, Box 593, Husargatan 3, Uppsala, 751 24, Sweden.
- Michael Hofreiter
- Adaptive Evolutionary Genomics, Institute for Biochemistry and Biology, University of Potsdam, Potsdam, 14469, Germany.
- Peter F Stadler
- Department of Computer Science and Interdisciplinary Center for Bioinformatics, University of Leipzig, Leipzig, 04103, Germany.
- Kay Prüfer
- Department of Evolutionary Genetics, Max Planck Institute for Evolutionary Anthropology, Leipzig, 04103, Germany.
- David Lambert
- Griffith School of Environment and School of Biomolecular and Physical Sciences, Griffith University, Nathan, Queensland, 4111, Australia.
- Janet Kelso
- Department of Evolutionary Genetics, Max Planck Institute for Evolutionary Anthropology, Leipzig, 04103, Germany.
- Torsten Schöneberg
- Institute of Biochemistry, Medical Faculty, University of Leipzig, Johannisallee 30, Leipzig, 04103, Germany.
9
Betat H, Mede T, Tretbar S, Steiner L, Stadler PF, Mörl M, Prohaska SJ. The ancestor of modern Holozoa acquired the CCA-adding enzyme from Alphaproteobacteria by horizontal gene transfer. Nucleic Acids Res 2015; 43:6739-46. [PMID: 26117543 PMCID: PMC4538823 DOI: 10.1093/nar/gkv631] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.2] [Received: 01/30/2015] [Accepted: 06/07/2015] [Indexed: 12/03/2022]
Abstract
Transfer RNAs (tRNAs) require the absolutely conserved sequence motif CCA at their 3′-ends, representing the site of aminoacylation. In the majority of organisms, this trinucleotide sequence is not encoded in the genome and thus has to be added post-transcriptionally by the CCA-adding enzyme, a specialized nucleotidyltransferase. In eukaryotic genomes this ubiquitous and highly conserved enzyme family is usually represented by a single gene copy. Analysis of published sequence data allows us to pin down the unusual evolution of eukaryotic CCA-adding enzymes. We show that the CCA-adding enzymes of animals originated from a horizontal gene transfer event in the stem lineage of Holozoa, i.e. Metazoa (animals) and their unicellular relatives, the Choanozoa. The tRNA nucleotidyltransferase, acquired from an α-proteobacterium, replaced the ancestral enzyme in Metazoa. However, in Choanoflagellata, the group of Choanozoa that is closest to Metazoa, both the ancestral and the horizontally transferred CCA-adding enzymes have survived. Furthermore, our data refute a mitochondrial origin of the animal tRNA nucleotidyltransferases.
Affiliation(s)
- Heike Betat
- Institute for Biochemistry, University of Leipzig, Brüderstraße 34, D-04103 Leipzig, Germany
- Tobias Mede
- Computational EvoDevo Group, Department of Computer Science, University of Leipzig, Härtelstraße 16-18, D-04107 Leipzig, Germany
- Sandy Tretbar
- Institute for Biochemistry, University of Leipzig, Brüderstraße 34, D-04103 Leipzig, Germany
- Lydia Steiner
- Computational EvoDevo Group, Department of Computer Science, University of Leipzig, Härtelstraße 16-18, D-04107 Leipzig, Germany
- Peter F Stadler
- Bioinformatics Group, Department of Computer Science, University of Leipzig, Härtelstraße 16-18, D-04107 Leipzig, Germany.,Interdisciplinary Center for Bioinformatics, University of Leipzig, Härtelstraße 16-18, D-04107 Leipzig, Germany.,Max-Planck-Institute for Mathematics in the Sciences, Inselstraße 22, D-04103 Leipzig, Germany.,Fraunhofer Institut für Zelltherapie und Immunologie, Perlickstraße 1, D-04103 Leipzig, Germany.,Department of Theoretical Chemistry, University of Vienna, Währingerstraße 17, A-1090 Wien, Austria.,Center for non-coding RNA in Technology and Health, University of Copenhagen, Grønnegårdsvej 3, DK-1870 Frederiksberg, Denmark.,Santa Fe Institute, 1399 Hyde Park Road, Santa Fe, NM 87501, USA
- Mario Mörl
- Institute for Biochemistry, University of Leipzig, Brüderstraße 34, D-04103 Leipzig, Germany
- Sonja J Prohaska
- Computational EvoDevo Group, Department of Computer Science, University of Leipzig, Härtelstraße 16-18, D-04107 Leipzig, Germany.,Interdisciplinary Center for Bioinformatics, University of Leipzig, Härtelstraße 16-18, D-04107 Leipzig, Germany
10
Misof B, Meusemann K, von Reumont BM, Kück P, Prohaska SJ, Stadler PF. A priori assessment of data quality in molecular phylogenetics. Algorithms Mol Biol 2014. [DOI: 10.1186/s13015-014-0022-4] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.1] [Indexed: 01/24/2023]
11
Lechner M, Hernandez-Rosales M, Doerr D, Wieseke N, Thévenin A, Stoye J, Hartmann RK, Prohaska SJ, Stadler PF. Orthology detection combining clustering and synteny for very large datasets. PLoS One 2014; 9:e105015. [PMID: 25137074 PMCID: PMC4138177 DOI: 10.1371/journal.pone.0105015] [Citation(s) in RCA: 73] [Impact Index Per Article: 7.3] [Received: 04/24/2014] [Accepted: 07/14/2014] [Indexed: 11/18/2022]
Abstract
The elucidation of orthology relationships is an important step both in gene function prediction and towards understanding patterns of sequence evolution. For large datasets, orthology assignments are usually derived directly from sequence similarities because more exact approaches are computationally too costly. Here we present PoFF, an extension for the standalone tool Proteinortho, which enhances orthology detection by combining clustering, sequence similarity, and synteny. In the course of this work, FFAdj-MCS, a heuristic that assesses pairwise gene order using adjacencies (a similarity measure related to the breakpoint distance), was adapted to support multiple linear chromosomes and extended to detect duplicated regions. PoFF substantially reduces the number of false positives and enables more fine-grained predictions than purely similarity-based approaches. The extension maintains the low memory requirements and the efficient concurrency options of its basis, Proteinortho, making the software applicable to very large datasets.
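The adjacency measure that FFAdj-MCS builds on counts neighbouring gene pairs shared by two genomes. For a single linear chromosome with n genes, n - 1 minus the shared-adjacency count gives the number of breakpoints. The sketch below is a minimal illustration under stated assumptions. Every gene is assumed to carry its orthology-group label already. Chromosomes are linear, so there is no wrap-around and no pair spans two chromosomes. It does not reproduce FFAdj-MCS's heuristic gene matching or its detection of duplicated regions. The function names are hypothetical.

```python
from typing import FrozenSet, List, Set

Genome = List[List[str]]  # list of linear chromosomes, each an ordered list of gene labels


def adjacencies(genome: Genome) -> Set[FrozenSet[str]]:
    """Unordered neighbour pairs within each linear chromosome."""
    adj: Set[FrozenSet[str]] = set()
    for chrom in genome:
        for left, right in zip(chrom, chrom[1:]):
            adj.add(frozenset((left, right)))
    return adj


def conserved_adjacency_count(genome_a: Genome, genome_b: Genome) -> int:
    """Number of adjacencies present in both genomes (orientation ignored)."""
    return len(adjacencies(genome_a) & adjacencies(genome_b))
```

Because pairs are stored as unordered sets, an inverted segment keeps its internal adjacencies and only loses the two at its borders. That robustness is what makes adjacency counts a convenient synteny signal.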
Affiliation(s)
- Marcus Lechner
- Institut für Pharmazeutische Chemie, Philipps-Universität Marburg, Marburg, Germany
- Maribel Hernandez-Rosales
- Bioinformatics Group, Department of Computer Science, Universität Leipzig, Leipzig, Germany
- Interdisciplinary Center for Bioinformatics, Universität Leipzig, Leipzig, Germany
- Max Planck Institute for Mathematics in the Sciences, Leipzig, Germany
- Departamento de Ciência da Computação, Instituto de Ciências Exatas, Universidade de Brasília, Brasília, Brasil
- Daniel Doerr
- Genome Informatics, Faculty of Technology, Bielefeld University, Bielefeld, Germany
- Institute for Bioinformatics, Center for Biotechnology, Bielefeld University, Bielefeld, Germany
- Nicolas Wieseke
- Faculty of Mathematics and Computer Science, University of Leipzig, Leipzig, Germany
- Annelyse Thévenin
- Genome Informatics, Faculty of Technology, Bielefeld University, Bielefeld, Germany
- Institute for Bioinformatics, Center for Biotechnology, Bielefeld University, Bielefeld, Germany
- Jens Stoye
- Genome Informatics, Faculty of Technology, Bielefeld University, Bielefeld, Germany
- Institute for Bioinformatics, Center for Biotechnology, Bielefeld University, Bielefeld, Germany
- Roland K. Hartmann
- Institut für Pharmazeutische Chemie, Philipps-Universität Marburg, Marburg, Germany
- Sonja J. Prohaska
- Computational EvoDevo Group, Department of Computer Science, Universität Leipzig, Leipzig, Germany
- Peter F. Stadler
- Bioinformatics Group, Department of Computer Science, Universität Leipzig, Leipzig, Germany
- Interdisciplinary Center for Bioinformatics, Universität Leipzig, Leipzig, Germany
- Max Planck Institute for Mathematics in the Sciences, Leipzig, Germany
- Institute for Theoretical Chemistry, University of Vienna, Vienna, Austria
- Center for non-coding RNA in Technology and Health, University of Copenhagen, Frederiksberg, Denmark
- The Santa Fe Institute, Santa Fe, New Mexico, United States of America
- RNomics Group, Fraunhofer Institute for Cell Therapy and Immunology, Leipzig, Germany
12
Müller GA, Wintsche A, Stangner K, Prohaska SJ, Stadler PF, Engeland K. The CHR site: definition and genome-wide identification of a cell cycle transcriptional element. Nucleic Acids Res 2014; 42:10331-50. [PMID: 25106871 PMCID: PMC4176359 DOI: 10.1093/nar/gku696]
Abstract
The cell cycle genes homology region (CHR) has been identified as a DNA element with an important role in transcriptional regulation of late cell cycle genes. It has been shown that such genes are controlled by DREAM, MMB and FOXM1-MuvB and that these protein complexes can contact DNA via CHR sites. However, it has not been elucidated which sequence variations of the canonical CHR are functional and how frequent CHR-based regulation is utilized in mammalian genomes. Here, we define the spectrum of functional CHR elements. As the basis for a computational meta-analysis, we identify new CHR sequences and compile phylogenetic motif conservation as well as genome-wide protein-DNA binding and gene expression data. We identify CHR elements in most late cell cycle genes binding DREAM, MMB, or FOXM1-MuvB. In contrast, Myb- and forkhead-binding sites are underrepresented in both early and late cell cycle genes. Our findings support a general mechanism: sequential binding of DREAM, MMB and FOXM1-MuvB complexes to late cell cycle genes requires CHR elements. Taken together, we define the group of CHR-regulated genes in mammalian genomes and provide evidence that the CHR is the central promoter element in transcriptional regulation of late cell cycle genes by DREAM, MMB and FOXM1-MuvB.
Affiliation(s)
- Gerd A Müller
- Molecular Oncology, Medical School, University of Leipzig, Semmelweisstr. 14, 04103 Leipzig, Germany
- Axel Wintsche
- Computational EvoDevo Group, Department of Computer Science and Interdisciplinary Center for Bioinformatics, University of Leipzig, Härtelstraße 16-18, 04107 Leipzig, Germany
- Konstanze Stangner
- Molecular Oncology, Medical School, University of Leipzig, Semmelweisstr. 14, 04103 Leipzig, Germany
- Sonja J Prohaska
- Computational EvoDevo Group, Department of Computer Science and Interdisciplinary Center for Bioinformatics, University of Leipzig, Härtelstraße 16-18, 04107 Leipzig, Germany
- Peter F Stadler
- Bioinformatics Group, Department of Computer Science and Interdisciplinary Center for Bioinformatics, University of Leipzig, Härtelstrasse 16-18, 04107 Leipzig, Germany
- Max Planck Institute for Mathematics in the Sciences, Inselstraße 22, 04103 Leipzig, Germany
- Center for Non-coding RNA in Technology and Health, Department of Basic Veterinary and Animal Sciences, Faculty of Life Sciences, University of Copenhagen, Grønnegårdsvej 3, 1870 Frederiksberg C, Denmark
- Santa Fe Institute, 1399 Hyde Park Rd, Santa Fe, NM 87501, USA
- Kurt Engeland
- Molecular Oncology, Medical School, University of Leipzig, Semmelweisstr. 14, 04103 Leipzig, Germany
13
Parikesit AA, Steiner L, Stadler PF, Prohaska SJ. Pitfalls of ascertainment biases in genome annotations—computing comparable protein domain distributions in eukarya. Mal J Fund Appl Sci 2014. [DOI: 10.11113/mjfas.v10n2.57]
Abstract
Most investigations into the large-scale patterns of protein evolution are based on gene annotations that have been compiled in reference databases. The use of these resources for quantitative comparisons, however, is complicated by sometimes vast differences in coverage. More importantly, we also observe substantial ascertainment biases that cannot be removed by simple normalization procedures. A striking example is provided by the correlations between protein domains. We observe that statistics derived from different computational gene annotation procedures show dramatic discrepancies, and even qualitative changes from negative to positive correlation, when compared to statistics obtained from annotation databases.
14
Arnold C, Stadler PF, Prohaska SJ. Chromatin computation: epigenetic inheritance as a pattern reconstruction problem. J Theor Biol 2013; 336:61-74. [PMID: 23880640 DOI: 10.1016/j.jtbi.2013.07.012] [Received: 03/19/2013] [Revised: 07/02/2013] [Accepted: 07/15/2013]
Abstract
Eukaryotic histones carry a diverse set of specific chemical modifications that accumulate over the life-time of a cell and have a crucial impact on the cell state in general and the transcriptional program in particular. Replication constitutes a dramatic disruption of the chromatin states that effectively amounts to partial erasure of stored information. To preserve its epigenetic state the cell reconstructs (at least part of) the histone modifications by means of processes that are still very poorly understood. A plausible hypothesis is that the different combinations of reader and writer domains in histone-modifying enzymes implement local rewriting rules that are capable of "recomputing" the desired parental modification patterns on the basis of the partial information contained in that half of the nucleosomes that predate replication. To test whether such a mechanism is theoretically feasible, we have developed a flexible stochastic simulation system (available at http://www.bioinf.uni-leipzig.de/Software/StoChDyn) for studying the dynamics of histone modification states. The implementation is based on Gillespie's approach, i.e., it models the master equation of a detailed chemical model. It is efficient enough to use an evolutionary algorithm to find patterns across multiple cell divisions with high accuracy. We found that it is easy to evolve a system of enzymes that can maintain a particular chromatin state roughly stable, even without explicit boundary elements separating differentially modified chromatin domains. However, the success of this task depends on several previously unanticipated factors, such as the length of the initial state, the specific pattern that should be maintained, the time between replications, and chemical parameters such as enzymatic binding and dissociation rates. All these factors also influence the accumulation of errors in the wake of cell divisions.
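The abstract states that the simulator is built on Gillespie's approach, i.e. it samples trajectories of the chemical master equation. As a rough illustration only, the sketch below implements Gillespie's direct method for a hypothetical two-state toy system (unmodified `U` and modified `M` nucleosomes with made-up write and erase rates). It is not the StoChDyn model; the names `gillespie`, `toy_propensities`, `STOICH` and all rate values are assumptions chosen for the example.

```python
import math
import random
from typing import Callable, List, Sequence, Tuple

# Hypothetical toy system: species (U, M) = (unmodified, modified) nucleosomes.
# Reaction 0: U -> M (writer), reaction 1: M -> U (eraser).
STOICH: List[Tuple[int, int]] = [(-1, +1), (+1, -1)]


def toy_propensities(x: Sequence[int], k_write: float = 1.0, k_erase: float = 0.5) -> List[float]:
    """Mass-action propensities for the hypothetical toy reactions."""
    u, m = x
    return [k_write * u, k_erase * m]


def gillespie(
    x0: Sequence[int],
    stoich: Sequence[Sequence[int]],
    propensities: Callable[[Sequence[int]], List[float]],
    t_max: float,
    seed: int = 0,
) -> List[Tuple[float, Tuple[int, ...]]]:
    """Gillespie's direct method: one exact sample path of the master equation."""
    rng = random.Random(seed)
    t, x = 0.0, list(x0)
    path = [(t, tuple(x))]
    while True:
        a = propensities(x)
        a0 = sum(a)
        if a0 <= 0.0:
            break  # no reaction can fire any more
        # Waiting time to the next reaction is exponential with rate a0.
        t += -math.log(1.0 - rng.random()) / a0
        if t > t_max:
            break
        # Choose reaction j with probability a[j] / a0.
        r = rng.random() * a0
        j = max(i for i, aj in enumerate(a) if aj > 0)  # guard against round-off
        cum = 0.0
        for i, aj in enumerate(a):
            cum += aj
            if r < cum:
                j = i
                break
        x = [xi + d for xi, d in zip(x, stoich[j])]
        path.append((t, tuple(x)))
    return path


if __name__ == "__main__":
    path = gillespie((50, 0), STOICH, toy_propensities, t_max=5.0, seed=1)
    print(len(path), path[-1])
```

Each step draws an exponential waiting time from the total propensity and then picks one reaction in proportion to its propensity; the total number of nucleosomes is conserved by construction of the toy stoichiometry.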
Affiliation(s)
- Christian Arnold
- Computational EvoDevo Group, Department of Computer Science, Universität Leipzig, Härtelstraße 16-18, 04107 Leipzig, Germany; Interdisciplinary Center for Bioinformatics, Universität Leipzig, Härtelstraße 16-18, 04107 Leipzig, Germany; Harvard University, Department of Human Evolutionary Biology, 11 Divinity Avenue, Cambridge, MA 02138, USA.
15
Amemiya CT, Alföldi J, Lee AP, Fan S, Philippe H, Maccallum I, Braasch I, Manousaki T, Schneider I, Rohner N, Organ C, Chalopin D, Smith JJ, Robinson M, Dorrington RA, Gerdol M, Aken B, Biscotti MA, Barucca M, Baurain D, Berlin AM, Blatch GL, Buonocore F, Burmester T, Campbell MS, Canapa A, Cannon JP, Christoffels A, De Moro G, Edkins AL, Fan L, Fausto AM, Feiner N, Forconi M, Gamieldien J, Gnerre S, Gnirke A, Goldstone JV, Haerty W, Hahn ME, Hesse U, Hoffmann S, Johnson J, Karchner SI, Kuraku S, Lara M, Levin JZ, Litman GW, Mauceli E, Miyake T, Mueller MG, Nelson DR, Nitsche A, Olmo E, Ota T, Pallavicini A, Panji S, Picone B, Ponting CP, Prohaska SJ, Przybylski D, Saha NR, Ravi V, Ribeiro FJ, Sauka-Spengler T, Scapigliati G, Searle SMJ, Sharpe T, Simakov O, Stadler PF, Stegeman JJ, Sumiyama K, Tabbaa D, Tafer H, Turner-Maier J, van Heusden P, White S, Williams L, Yandell M, Brinkmann H, Volff JN, Tabin CJ, Shubin N, Schartl M, Jaffe DB, Postlethwait JH, Venkatesh B, Di Palma F, Lander ES, Meyer A, Lindblad-Toh K. The African coelacanth genome provides insights into tetrapod evolution. Nature 2013; 496:311-6. [PMID: 23598338 PMCID: PMC3633110 DOI: 10.1038/nature12027] [Received: 09/05/2012] [Accepted: 02/20/2013]
Abstract
It was a zoological sensation when a living specimen of the coelacanth was first discovered in 1938, as this lineage of lobe-finned fish was thought to have gone extinct 70 million years ago. The modern coelacanth looks remarkably similar to many of its ancient relatives, and its evolutionary proximity to our own fish ancestors provides a glimpse of the fish that first walked on land. Here we report the genome sequence of the African coelacanth, Latimeria chalumnae. Through a phylogenomic analysis, we conclude that the lungfish, and not the coelacanth, is the closest living relative of tetrapods. Coelacanth protein-coding genes are significantly more slowly evolving than those of tetrapods, unlike other genomic features. Analyses of changes in genes and regulatory elements during the vertebrate adaptation to land highlight genes involved in immunity, nitrogen excretion and the development of fins, tail, ear, eye, brain, and olfaction. Functional assays of enhancers involved in the fin-to-limb transition and in the emergence of extra-embryonic tissues demonstrate the importance of the coelacanth genome as a blueprint for understanding tetrapod evolution.
Affiliation(s)
- Chris T Amemiya
- Molecular Genetics Program, Benaroya Research Institute, Seattle, Washington 98101, USA.
16
Steiner L, Hopp L, Wirth H, Galle J, Binder H, Prohaska SJ, Rohlf T. A global genome segmentation method for exploration of epigenetic patterns. PLoS One 2012; 7:e46811. [PMID: 23077526 PMCID: PMC3470578 DOI: 10.1371/journal.pone.0046811] [Received: 05/18/2012] [Accepted: 09/05/2012]
Abstract
Current genome-wide ChIP-seq experiments on different epigenetic marks aim at unraveling the interplay between their regulation mechanisms. Published evaluation tools, however, allow testing for predefined hypotheses only. Here, we present a novel method for annotation-independent exploration of epigenetic data and their inter-correlation with other genome-wide features. Our method is based on a combinatorial genome segmentation solely using information on combinations of epigenetic marks. It does not require prior knowledge about the data (e.g. gene positions), but allows integrating the data in a straightforward manner. Thereby, it combines compression, clustering and visualization of the data in a single tool. Our method provides intuitive maps of epigenetic patterns across multiple levels of organization, e.g. of the co-occurrence of different epigenetic marks in different cell types. Thus, it facilitates the formulation of new hypotheses on the principles of epigenetic regulation. We apply our method to histone modification data on trimethylation of histone H3 at lysine 4, 9 and 27 in multi-potent and lineage-primed mouse cells, analyzing their combinatorial modification pattern as well as differentiation-related changes of single modifications. We demonstrate that our method is capable of reproducing recent findings of gene centered approaches, e.g. correlations between CpG-density and the analyzed histone modifications. Moreover, combining the clustered epigenetic data with information on the expression status of associated genes we classify differences in epigenetic status of e.g. house-keeping genes versus differentiation-related genes. Visualizing the distribution of modification states on the chromosomes, we discover strong patterns for chromosome X. For example, exclusively H3K9me3 marked segments are enriched, while poised and active states are rare. 
Hence, our method also provides new insights into chromosome-specific epigenetic patterns, opening up new questions about how "epigenetic computation" is distributed over the genome in space and time.
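The central idea of segmenting the genome solely by combinations of epigenetic marks can be illustrated with a deliberately simplified sketch, assuming binned signals and a fixed per-mark threshold: each bin is reduced to a presence/absence vector over the marks, and adjacent bins with identical vectors are merged into one segment. This is not the published method, which additionally compresses, clusters and visualizes the data; the function name `segment_by_mark_combination`, the thresholds and the toy signal values are hypothetical.

```python
from itertools import groupby
from typing import List, Sequence, Tuple

# (start_bin, end_bin_exclusive, mark_combination); a hypothetical representation.
Segment = Tuple[int, int, Tuple[int, ...]]


def segment_by_mark_combination(
    signals: Sequence[Sequence[float]], thresholds: Sequence[float]
) -> List[Segment]:
    """Binarize each bin per mark, then merge runs of bins with the same combination.

    signals[i][k] is the (hypothetical) signal of mark k in genomic bin i;
    a mark counts as present when its signal reaches thresholds[k].
    """
    states = [tuple(int(v >= th) for v, th in zip(row, thresholds)) for row in signals]
    segments: List[Segment] = []
    start = 0
    for state, run in groupby(states):  # groupby yields maximal runs of equal states
        length = sum(1 for _ in run)
        segments.append((start, start + length, state))
        start += length
    return segments


if __name__ == "__main__":
    # Columns: H3K4me3, H3K27me3, H3K9me3 (toy values, five bins).
    signals = [[5, 0, 0], [6, 0, 0], [4, 3, 0], [0, 0, 7], [0, 0, 8]]
    for seg in segment_by_mark_combination(signals, thresholds=[2, 2, 2]):
        print(seg)
```

With these toy values the five bins collapse into three segments: an H3K4me3-only region, a bivalent H3K4me3/H3K27me3 bin, and an H3K9me3-only region. No gene positions are needed, in line with the annotation-independent approach described above.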
Affiliation(s)
- Lydia Steiner
- Junior Professorship for Computational EvoDevo, Institute of Computer Science, University of Leipzig, Leipzig, Germany
- Interdisciplinary Center for Bioinformatics, University of Leipzig, Leipzig, Germany
- Lydia Hopp
- Interdisciplinary Center for Bioinformatics, University of Leipzig, Leipzig, Germany
- Leipzig Research Center for Civilization Diseases, University of Leipzig, Leipzig, Germany
- Henry Wirth
- Interdisciplinary Center for Bioinformatics, University of Leipzig, Leipzig, Germany
- Leipzig Research Center for Civilization Diseases, University of Leipzig, Leipzig, Germany
- Jörg Galle
- Interdisciplinary Center for Bioinformatics, University of Leipzig, Leipzig, Germany
- Hans Binder
- Interdisciplinary Center for Bioinformatics, University of Leipzig, Leipzig, Germany
- Sonja J. Prohaska
- Junior Professorship for Computational EvoDevo, Institute of Computer Science, University of Leipzig, Leipzig, Germany
- Interdisciplinary Center for Bioinformatics, University of Leipzig, Leipzig, Germany
- Thimo Rohlf
- Interdisciplinary Center for Bioinformatics, University of Leipzig, Leipzig, Germany
- Max-Planck-Institute for Mathematics in the Sciences, Leipzig, Germany
17
Bompfünewerer AF, Flamm C, Fried C, Fritzsch G, Hofacker IL, Lehmann J, Missal K, Mosig A, Müller B, Prohaska SJ, Stadler BMR, Stadler PF, Tanzer A, Washietl S, Witwer C. Evolutionary patterns of non-coding RNAs. Theory Biosci 2005; 123:301-69. [PMID: 18202870 DOI: 10.1016/j.thbio.2005.01.002] [Received: 12/22/2004] [Accepted: 01/24/2005]
Abstract
A plethora of new functions of non-coding RNAs (ncRNAs) have been discovered in the past few years. In fact, RNA is emerging as the central player in cellular regulation, taking on active roles in multiple regulatory layers from transcription, RNA maturation, and RNA modification to translational regulation. Nevertheless, very little is known about the evolution of this "Modern RNA World" and its components. In this contribution, we attempt to provide at least a cursory overview of the diversity of ncRNAs and functional RNA motifs in non-translated regions of regular messenger RNAs (mRNAs) with an emphasis on evolutionary questions. This survey is complemented by an in-depth analysis of examples from different classes of RNAs focusing mostly on their evolution in the vertebrate lineage. We present a survey of Y RNA genes in vertebrates and study the molecular evolution of the U7 snRNA, the snoRNAs E1/U17, E2, and E3, the Y RNA family, the let-7 microRNA (miRNA) family, and the mRNA-like evf-1 gene. We furthermore discuss the statistical distribution of miRNAs in metazoans, which suggests an explosive increase in the miRNA repertoire in vertebrates. The analysis of the transcription of ncRNAs suggests that small RNAs in general are genetically mobile in the sense that their association with a host gene (e.g. when transcribed from introns of an mRNA) can change on evolutionary time scales. The let-7 family demonstrates that even the mode of transcription (as intron or as exon) can change among paralogous ncRNAs.
18
Lozada-Chávez I, Stadler PF, Prohaska SJ. "Hypothesis for the modern RNA world": a pervasive non-coding RNA-based genetic regulation is a prerequisite for the emergence of multicellular complexity. Orig Life Evol Biosph 2011; 41:587-607. [PMID: 22322874 DOI: 10.1007/s11084-011-9262-1] [Received: 07/29/2011] [Accepted: 12/12/2011]
Abstract
The transitions to multicellularity mark the most pivotal and distinctive events in life's history on Earth. Although several transitions to "simple" multicellularity (SM) have been recorded in both bacterial and eukaryotic clades, transitions to complex multicellularity (CM) have only happened a few times in eukaryotes. A large number of cell types (associated with large body size), increased energy consumption per gene expressed, and an increment of non-protein-coding DNA positively correlate with CM. These three factors can indeed be understood as the causes and consequences of the regulation of gene expression. Here, we discuss how a vast expansion of non-protein-coding RNA (ncRNAs) regulators rather than large numbers of novel protein regulators can easily contribute to the emergence of CM. We also propose that the evolutionary advantage of RNA-based gene regulation derives from the robustness of the RNA structure that makes it easy to combine genetic drift with functional exploration. We describe a model which aims to explain how the evolutionary dynamic of ncRNAs becomes dominated by the accessibility of advantageous mutations to innovate regulation in complex multicellular organisms. The information and models discussed here outline the hypothesis that pervasive ncRNA-based regulatory systems, only capable of being expanded and explored in higher eukaryotes, are prerequisite to complex multicellularity. Thereby, regulatory RNA molecules in Eukarya have allowed intensification of morphological complexity by stabilizing critical phenotypes and controlling developmental precision. Although the origin of RNA on early Earth is still controversial, it is becoming clear that once RNA emerged into a protocellular system, its relevance within the evolution of biological systems has been greater than we previously thought.
Affiliation(s)
- Irma Lozada-Chávez
- Computational EvoDevo Group, University of Leipzig, Härtelstrasse 16-18, 04107, Leipzig, Germany.
19
Parikesit AA, Stadler PF, Prohaska SJ. Evolution and quantitative comparison of genome-wide protein domain distributions. Genes (Basel) 2011; 2:912-24. [PMID: 24710298 PMCID: PMC3927604 DOI: 10.3390/genes2040912] [Received: 08/29/2011] [Revised: 10/07/2011] [Accepted: 10/25/2011]
Abstract
The metabolic and regulatory capabilities of an organism are implicit in its protein content. This is often hard to estimate, however, due to ascertainment biases inherent in the available genome annotations. Its complement of recognizable functional protein domains and their combinations convey essentially the same information and at the same time are much more readily accessible, although protein domain models trained for one phylogenetic group frequently fail on distantly related sequences. Pooling related domain models based on their GO-annotation in combination with de novo gene prediction methods provides estimates that seem to be less affected by phylogenetic biases. We show here for 18 diverse representatives from all eukaryotic kingdoms that a pooled analysis of the tendencies for co-occurrence or avoidance of protein domains is indeed feasible. This type of analysis can reveal general large-scale patterns in the domain co-occurrence and helps to identify lineage-specific variations in the evolution of protein domains. Somewhat surprisingly, we do not find strong ubiquitous patterns governing the evolutionary behavior of specific functional classes. Instead, there are strong variations between the major groups of Eukaryotes, pointing at systematic differences in their evolutionary constraints.
Affiliation(s)
- Arli A Parikesit
- Computational EvoDevo Group, Department of Computer Science, University of Leipzig, Härtelstraße 16-18, D-04107 Leipzig, Germany.
- Peter F Stadler
- Interdisciplinary Center for Bioinformatics, University of Leipzig, Härtelstraße 16-18, D-04107 Leipzig, Germany.
- Sonja J Prohaska
- Computational EvoDevo Group, Department of Computer Science, University of Leipzig, Härtelstraße 16-18, D-04107 Leipzig, Germany.
20
Findeiss S, Engelhardt J, Prohaska SJ, Stadler PF. Protein-coding structured RNAs: A computational survey of conserved RNA secondary structures overlapping coding regions in drosophilids. Biochimie 2011; 93:2019-23. [PMID: 21835221 DOI: 10.1016/j.biochi.2011.07.023] [Received: 05/28/2011] [Accepted: 07/19/2011]
Abstract
Functional RNA elements can also be embedded within exonic sequences coding for functional proteins. While not uncommon in viruses, only a few examples of this type have been described in some detail for eukaryotic genomes. Here we use RNAz and RNAcode, two comparative genomics methods that measure signatures of stabilizing selection acting on RNA secondary structure and peptide sequence, respectively, to survey the fruit fly genomes. We estimate that there might be on the order of 1000 loci that are subject to dual selection pressure. The genome-wide screens used here also expose the limitations of the currently available methods.
Affiliation(s)
- Sven Findeiss
- Department of Theoretical Chemistry, University of Vienna, Währingerstraße 17, A-1090 Wien, Austria.
21
Raincrow JD, Dewar K, Stocsits C, Prohaska SJ, Amemiya CT, Stadler PF, Chiu CH. Hox clusters of the bichir (Actinopterygii, Polypterus senegalus) highlight unique patterns of sequence evolution in gnathostome phylogeny. J Exp Zool B Mol Dev Evol 2011; 316:451-64. [PMID: 21688387 DOI: 10.1002/jez.b.21420] [Received: 02/02/2011] [Revised: 03/27/2011] [Accepted: 04/24/2011]
Abstract
Teleost fishes have extra Hox gene clusters owing to shared or lineage-specific genome duplication events in ray-finned fish (actinopterygian) phylogeny. Hence, extrapolating between genome function of teleosts and human or even between different fish species is difficult. We have sequenced and analyzed Hox gene clusters of the Senegal bichir (Polypterus senegalus), an extant representative of the most basal actinopterygian lineage. Bichir possesses four Hox gene clusters (A, B, C, D); phylogenetic analysis supports their orthology to the four Hox gene clusters of the gnathostome ancestor. We have generated a comprehensive database of conserved Hox noncoding sequences that include cartilaginous, lobe-finned, and ray-finned fishes (bichir and teleosts). Our analysis identified putative and known Hox cis-regulatory sequences with differing depths of conservation in Gnathostoma. We found that although bichir possesses four Hox gene clusters, its pattern of conservation of noncoding sequences is mosaic between outgroups, such as human, coelacanth, and shark, with four Hox gene clusters and teleosts, such as zebrafish and pufferfish, with seven or eight Hox gene clusters. Notably, bichir Hox gene clusters have been invaded by DNA transposons and this trend is further exemplified in teleosts, suggesting an as yet unrecognized mechanism of genome evolution that may explain Hox cluster plasticity in actinopterygians. Taken together, our results suggest that actinopterygian Hox gene clusters experienced a reduction in selective constraints that surprisingly predates the teleost-specific genome duplication.
Affiliation(s)
- Jeremy D Raincrow
- Department of Genetics, Rutgers University, Piscataway, New Jersey, USA
22
Lechner M, Findeiss S, Steiner L, Marz M, Stadler PF, Prohaska SJ. Proteinortho: detection of (co-)orthologs in large-scale analysis. BMC Bioinformatics 2011; 12:124. [PMID: 21526987 PMCID: PMC3114741 DOI: 10.1186/1471-2105-12-124] [Received: 12/13/2010] [Accepted: 04/28/2011]
Abstract
Background: Orthology analysis is an important part of data analysis in many areas of bioinformatics such as comparative genomics and molecular phylogenetics. The ever-increasing flood of sequence data, and hence the rapidly increasing number of genomes that can be compared simultaneously, calls for efficient software tools as brute-force approaches with quadratic memory requirements become infeasible in practise. The rapid pace at which new data become available, furthermore, makes it desirable to compute genome-wide orthology relations for a given dataset rather than relying on relations listed in databases.
Results: The program Proteinortho described here is a stand-alone tool that is geared towards large datasets and makes use of distributed computing techniques when run on multi-core hardware. It implements an extended version of the reciprocal best alignment heuristic. We apply Proteinortho to compute orthologous proteins in the complete set of all 717 eubacterial genomes available at NCBI at the beginning of 2009. We identified thirty proteins present in 99% of all bacterial proteomes.
Conclusions: Proteinortho significantly reduces the required amount of memory for orthology analysis compared to existing tools, allowing such computations to be performed on off-the-shelf hardware.
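Proteinortho is described as implementing an extended version of the reciprocal best alignment heuristic. The plain, unextended heuristic can be sketched as follows, assuming the all-versus-all alignment scores between two proteomes A and B have already been computed and stored as dictionaries. This is only the basic reciprocal-best-hit rule, not Proteinortho's extension or its clustering step; the protein identifiers, scores and function names are hypothetical.

```python
from typing import Dict, Set, Tuple

# (query_id, target_id) -> alignment score; identifiers and scores are hypothetical.
Scores = Dict[Tuple[str, str], float]


def best_hits(scores: Scores) -> Dict[str, str]:
    """Map each query to its highest-scoring target (first one wins on ties)."""
    best: Dict[str, Tuple[str, float]] = {}
    for (query, target), score in scores.items():
        if query not in best or score > best[query][1]:
            best[query] = (target, score)
    return {query: target for query, (target, _) in best.items()}


def reciprocal_best_hits(a_vs_b: Scores, b_vs_a: Scores) -> Set[Tuple[str, str]]:
    """Pairs (a, b) where b is a's best hit in B and a is b's best hit in A."""
    ab = best_hits(a_vs_b)
    ba = best_hits(b_vs_a)
    return {(a, b) for a, b in ab.items() if ba.get(b) == a}


if __name__ == "__main__":
    a_vs_b = {("A1", "B1"): 250.0, ("A1", "B2"): 90.0, ("A2", "B2"): 180.0, ("A3", "B2"): 60.0}
    b_vs_a = {("B1", "A1"): 245.0, ("B2", "A2"): 175.0, ("B2", "A3"): 58.0}
    print(sorted(reciprocal_best_hits(a_vs_b, b_vs_a)))  # [('A1', 'B1'), ('A2', 'B2')]
```

In the toy data, A3's best hit is B2, but B2's best hit is A2, so only (A1, B1) and (A2, B2) are reported. Only the best hit per query is kept in memory here, which hints at why best-hit heuristics scale better than storing full similarity matrices.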
Affiliation(s)
- Marcus Lechner
- RNA Bioinformatics Group, Department of Pharmaceutical Chemistry, Philipps-University Marburg, Germany.
23
Krakauer DC, Collins JP, Erwin D, Flack JC, Fontana W, Laubichler MD, Prohaska SJ, West GB, Stadler PF. The challenges and scope of theoretical biology. J Theor Biol 2011; 276:269-76. [PMID: 21315730 DOI: 10.1016/j.jtbi.2011.01.051] [Received: 08/15/2010] [Revised: 12/03/2010] [Accepted: 01/31/2011]
Abstract
Scientific theories seek to provide simple explanations for significant empirical regularities based on fundamental physical and mechanistic constraints. Biological theories have rarely reached a level of generality and predictive power comparable to physical theories. This discrepancy is explained through a combination of frozen accidents, environmental heterogeneity, and widespread non-linearities observed in adaptive processes. At the same time, model building has proven to be very successful when it comes to explaining and predicting the behavior of particular biological systems. In this respect biology resembles alternative model-rich frameworks, such as economics and engineering. In this paper we explore the prospects for general theories in biology, and suggest that these take inspiration not only from physics, but also from the information sciences. Future theoretical biology is likely to represent a hybrid of parsimonious reasoning and algorithmic or rule-based explanation. An open question is whether these new frameworks will remain transparent to human reason. In this context, we discuss the role of machine learning in the early stages of scientific discovery. We argue that evolutionary history is not only a source of uncertainty, but also provides the basis, through conserved traits, for very general explanations for biological regularities, and the prospect of unified theories of life.
24
Abstract
The diverse fields of Omics research share a common logical structure combining a cataloging effort for a particular class of molecules or interactions, the underlying -ome, and a quantitative aspect attempting to record spatiotemporal patterns of concentration, expression, or variation. Consequently, these fields also share a common set of difficulties and limitations. In spite of the great success stories of Omics projects over the last decade, much remains to be understood not only at the technological, but also at the conceptual level. Here, we focus on the dark corners of Omics research, where the problems, limitations, conceptual difficulties, and lack of knowledge are hidden.
Affiliation(s)
- Sonja J Prohaska
- Department of Computer Science and Interdisciplinary Center for Bioinformatics, University of Leipzig, Leipzig, Germany
25
Prohaska SJ, Stadler PF, Krakauer DC. Innovation in gene regulation: The case of chromatin computation. J Theor Biol 2010; 265:27-44. [DOI: 10.1016/j.jtbi.2010.03.011] [Citation(s) in RCA: 35] [Impact Index Per Article: 2.5] [Received: 12/22/2009] [Accepted: 03/06/2010] [Indexed: 11/17/2022]
26
Bermudez-Santana C, Attolini CSO, Kirsten T, Engelhardt J, Prohaska SJ, Steigele S, Stadler PF. Genomic organization of eukaryotic tRNAs. BMC Genomics 2010; 11:270. [PMID: 20426822 PMCID: PMC2888827 DOI: 10.1186/1471-2164-11-270] [Citation(s) in RCA: 59] [Impact Index Per Article: 4.2] [Received: 02/17/2010] [Accepted: 04/28/2010] [Indexed: 01/20/2023]
Abstract
BACKGROUND Surprisingly little is known about the organization and distribution of tRNA genes and tRNA-related sequences on a genome-wide scale. While tRNA gene complements are usually reported in passing as part of genome annotation efforts, and peculiar features such as the tandem arrangements of tRNA genes in Entamoeba histolytica have been described in some detail, systematic comparative studies are rare and mostly restricted to bacteria. We therefore set out to survey the genomic arrangement of tRNA genes and pseudogenes in a wide range of eukaryotes to identify common patterns and taxon-specific peculiarities. RESULTS In line with previous reports, we find that tRNA complements evolve rapidly and tRNA gene and pseudogene locations are subject to rapid turnover. At phylum level, the distributions of tRNA gene and pseudogene numbers are very broad, with standard deviations on the order of the mean. Even among closely related species we observe dramatic changes in local organization. For instance, 65% and 87% of the tRNA genes and pseudogenes are located in genomic clusters in zebrafish and stickleback, respectively, while such arrangements are relatively rare in the other three sequenced teleost fish genomes. Among basal metazoa, Trichoplax adhaerens has hardly any duplicated tRNA genes, while the sea anemone Nematostella vectensis boasts more than 17000 tRNA genes and pseudogenes. Dramatic variations are observed even within the eutherian mammals. Higher primates, for instance, have 616 +/- 120 tRNA genes and pseudogenes, of which 17% to 36% are arranged in clusters, while the genome of the bushbaby Otolemur garnettii has 45225 tRNA genes and pseudogenes, of which only 5.6% appear in clusters. In contrast, the distribution is surprisingly uniform across plant genomes. Consistent with this variability, syntenic conservation of tRNA genes and pseudogenes is also poor in general, with turnover rates comparable to those of unconstrained sequence elements.
Despite this large variation in abundance in Eukarya, we observe a significant correlation between the number of tRNA genes, tRNA pseudogenes, and genome size. CONCLUSIONS The genomic organization of tRNA genes and pseudogenes shows complex lineage-specific patterns characterized by extensive variability that is in striking contrast to the extreme levels of sequence conservation of the tRNAs themselves. The comprehensive analysis of the genomic organization of tRNA genes and pseudogenes in Eukarya provides a basis for further studies into the interplay of tRNA gene arrangements and genome organization in general.
Affiliation(s)
- Clara Bermudez-Santana
- Bioinformatics Group, Department of Computer Science and Interdisciplinary Center for Bioinformatics, University of Leipzig, Härtelstraße 16-18, D-04107, Leipzig, Germany
- Department of Biology, Universidad Nacional de Colombia, Carrera 45 # 26-85 - Edificio Uriel Gutiérrez, Bogotá D.C., Colombia
- Camille Stephan-Otto Attolini
- Bioinformatics Group, Department of Computer Science and Interdisciplinary Center for Bioinformatics, University of Leipzig, Härtelstraße 16-18, D-04107, Leipzig, Germany
- Biostatistics and Bioinformatics unit, Institute for Research in Biomedicine (IRB Barcelona), Barcelona, Spain
- Toralf Kirsten
- Bioinformatics Group, Department of Computer Science and Interdisciplinary Center for Bioinformatics, University of Leipzig, Härtelstraße 16-18, D-04107, Leipzig, Germany
- Jan Engelhardt
- Bioinformatics Group, Department of Computer Science and Interdisciplinary Center for Bioinformatics, University of Leipzig, Härtelstraße 16-18, D-04107, Leipzig, Germany
- Sonja J Prohaska
- Bioinformatics Group, Department of Computer Science and Interdisciplinary Center for Bioinformatics, University of Leipzig, Härtelstraße 16-18, D-04107, Leipzig, Germany
- Peter F Stadler
- Bioinformatics Group, Department of Computer Science and Interdisciplinary Center for Bioinformatics, University of Leipzig, Härtelstraße 16-18, D-04107, Leipzig, Germany
- Max Planck Institute for Mathematics in the Sciences, Inselstraße 22, D-04103 Leipzig, Germany
- Fraunhofer Institute for Cell Therapy and Immunology, Perlickstraße 1, D-04103 Leipzig, Germany
- Santa Fe Institute, 1399 Hyde Park Rd, Santa Fe, NM 87501, USA
- Institute for Theoretical Chemistry, University of Vienna, Währingerstraße 17, A-1090 Wien, Austria
27
Abstract
The precise elucidation of the gene concept has become the subject of intense discussion in light of results from several large high-throughput surveys of transcriptomes and proteomes. In previous work, we proposed an approach for constructing gene concepts that combines genomic heritability with elements of function. Here, we introduce a definition of the gene within a computational framework of cellular interactions. The definition seeks to satisfy the practical requirements imposed by annotation, capture logical aspects of regulation, and encompass the evolutionary property of homology.
Affiliation(s)
- Peter F. Stadler
- Bioinformatics Group, Department of Computer Science, Interdisciplinary Center for Bioinformatics, University of Leipzig, Härtelstraße 16-18, 04107 Leipzig, Germany
- Max Planck Institute for Mathematics in the Sciences, Inselstrasse 22, 04103 Leipzig, Germany
- Fraunhofer Institut für Zelltherapie und Immunologie, IZI Perlickstraße 1, 04103 Leipzig, Germany
- Department of Theoretical Chemistry, University of Vienna, Währingerstraße 17, 1090 Vienna, Austria
- Santa Fe Institute, 1399 Hyde Park Rd., Santa Fe, NM 87501 USA
- Sonja J. Prohaska
- Bioinformatics Group, Department of Computer Science, Interdisciplinary Center for Bioinformatics, University of Leipzig, Härtelstraße 16-18, 04107 Leipzig, Germany
- Christian V. Forst
- University of Texas Southwestern Medical Center, 5323 Harry Hines Blvd., Dallas, TX 75390-9066 USA
28
Hiller M, Findeiss S, Lein S, Marz M, Nickel C, Rose D, Schulz C, Backofen R, Prohaska SJ, Reuter G, Stadler PF. Conserved introns reveal novel transcripts in Drosophila melanogaster. Genome Res 2009; 19:1289-300. [PMID: 19458021 DOI: 10.1101/gr.090050.108] [Citation(s) in RCA: 37] [Impact Index Per Article: 2.5] [Indexed: 01/17/2023]
Abstract
Noncoding RNAs that are, like mRNAs, spliced, capped, and polyadenylated have important functions in cellular processes. The inventory of these mRNA-like noncoding RNAs (mlncRNAs), however, is incomplete even in well-studied organisms, and so far no computational methods exist to predict such RNAs from genomic sequences alone. The subclass of these transcripts that is evolutionarily conserved usually has conserved intron positions. We demonstrate here that a genome-wide comparative genomics approach searching for short conserved introns is capable of identifying conserved transcripts with high specificity. Our approach requires neither an open reading frame nor substantial sequence or secondary structure conservation in the surrounding exons. Thus it identifies spliced transcripts in an unbiased way. After applying our approach to insect genomes, we predict 369 introns outside annotated coding transcripts, of which 131 are confirmed by expressed sequence tags (ESTs) and/or noncoding FlyBase transcripts. Of the remaining 238 novel introns, about half are associated with protein-coding genes, either extending coding or untranslated regions or likely belonging to unannotated coding genes. The remaining 129 introns belong to novel mlncRNAs that are largely unstructured. Using RT-PCR, we verified seven of 12 tested introns in novel mlncRNAs and 11 of 17 introns in novel coding genes. The expression level of all verified mlncRNA transcripts is low but varies during development, which suggests regulation. As conserved introns indicate both purifying selection on the exon-intron structure and conserved expression of the transcript in related species, the novel mlncRNAs are good candidates for functional transcripts.
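The intron counts in this abstract can be cross-checked with simple arithmetic. The figure of 109 coding-associated introns below is derived here from the stated totals; it is not quoted from the paper:

$$369 - 131 = 238, \qquad 238 - 129 = 109, \qquad \frac{109}{238} \approx 0.46,$$

which is consistent with "about half" of the novel introns being associated with protein-coding genes.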
Affiliation(s)
- Michael Hiller
- Bioinformatics Group, Albert-Ludwigs-University Freiburg, 79110 Freiburg, Germany.
29
Heffel A, Stadler PF, Prohaska SJ, Kauer G, Kuska JP. Process flow for classification and clustering of fruit fly gene expression patterns. Proc Int Conf Image Proc 2008; 1:721-724. [PMID: 20046820 PMCID: PMC2800053 DOI: 10.1109/icip.2008.4711856] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.3] [Indexed: 11/08/2022]
Abstract
The rapidly growing collection of fruit fly embryo images makes automated image segmentation and classification an indispensable requirement for a large-scale analysis of in situ hybridization (ISH) gene expression patterns (GEP). We present here such an automated process flow for the segmentation, classification, and clustering of large-scale sets of Drosophila melanogaster GEP that is capable of dealing with most of the complications present in the images.
Affiliation(s)
- Andreas Heffel
- Interdisciplinary Centre for Bioinformatics, University of Leipzig, Härtelstraße 16-18, 04107 Leipzig
30
Lehmann J, Stadler PF, Prohaska SJ. SynBlast: assisting the analysis of conserved synteny information. BMC Bioinformatics 2008; 9:351. [PMID: 18721485 PMCID: PMC2543028 DOI: 10.1186/1471-2105-9-351] [Citation(s) in RCA: 11] [Impact Index Per Article: 0.7] [Received: 04/03/2008] [Accepted: 08/24/2008] [Indexed: 01/06/2023]
Abstract
Motivation In recent years more than 20 vertebrate genomes have been sequenced, and the rate at which genomic DNA information becomes available is rapidly accelerating. Gene duplication and gene loss events inherently limit the accuracy of orthology detection based on sequence similarity alone. Fully automated methods for orthology annotation do exist but often fail to identify individual members in cases of large gene families, or to distinguish missing data from traceable gene losses. This situation can be improved in many cases by including conserved synteny information. Results Here we present the SynBlast pipeline that is designed to construct and evaluate local synteny information. SynBlast uses the genomic region around a focal reference gene to retrieve candidates for homologous regions from a collection of target genomes and ranks them in accordance with the available evidence for homology. The pipeline is intended as a tool to aid high-quality manual annotation, in particular in those cases where automatic procedures fail. We demonstrate how SynBlast is applied to retrieving orthologous and paralogous clusters using the vertebrate Hox and ParaHox clusters as examples. Software The SynBlast package written in Perl is available under the GNU General Public License at .
Affiliation(s)
- Jörg Lehmann
- Bioinformatics Group, Department of Computer Science, University of Leipzig, Härtelstrasse 16-18, D-04107 Leipzig, Germany.
31
Amemiya CT, Prohaska SJ, Hill-Force A, Cook A, Wasserscheid J, Ferrier DE, Pascual-Anaya J, Garcia-Fernàndez J, Dewar K, Stadler PF. The amphioxus Hox cluster: characterization, comparative genomics, and evolution. J Exp Zool 2008; 310:465-77. [DOI: 10.1002/jez.b.21213] [Citation(s) in RCA: 46] [Impact Index Per Article: 2.9] [Indexed: 12/22/2022]
32
Dress AWM, Flamm C, Fritzsch G, Grünewald S, Kruspe M, Prohaska SJ, Stadler PF. Noisy: identification of problematic columns in multiple sequence alignments. Algorithms Mol Biol 2008; 3:7. [PMID: 18577231 PMCID: PMC2464588 DOI: 10.1186/1748-7188-3-7] [Citation(s) in RCA: 100] [Impact Index Per Article: 6.3] [Received: 04/08/2008] [Accepted: 06/24/2008] [Indexed: 11/10/2022]
Abstract
MOTIVATION Sequence-based methods for phylogenetic reconstruction from (nucleic acid) sequence data are notoriously plagued by two effects: homoplasies and alignment errors. Large evolutionary distances imply a large number of homoplastic sites. As most protein-coding genes show dramatic variations in substitution rates that are not uncorrelated across the sequence, this often leads to a patchwork pattern of (i) phylogenetically informative and (ii) effectively randomized regions. In highly variable regions, furthermore, alignment errors accumulate, resulting in sometimes misleading signals in phylogenetic reconstruction. RESULTS We present here a method that, based on assessing the distribution of character states along a cyclic ordering of the taxa, allows the identification of phylogenetically uninformative homoplastic sites in a multiple sequence alignment. Removal of these sites appears to improve the performance of phylogenetic reconstruction algorithms as measured by various indices of "tree quality". In particular, we obtain more stable trees due to the exclusion of phylogenetically incompatible sites that most likely represent strongly randomized characters. SOFTWARE The computer program noisy implements this approach. It can be employed to improve phylogenetic reconstruction, with a considerable success rate, whenever (1) the average bootstrap support obtained from the original alignment is low, and (2) there are sufficiently many taxa in the data set, at least, say, 12 to 15 taxa. The software can be obtained under the GNU Public License from http://www.bioinf.uni-leipzig.de/Software/noisy/.
Affiliation(s)
- Andreas WM Dress
- Department of Combinatorics and Geometry (DCG), MPG/CAS Partner Institute for Computational Biology (PICB), Shanghai Institutes for Biological Sciences (SIBS), Shanghai, PR China
- Max Planck Institute for Mathematics in the Sciences, Inselstrasse 22 -26, D 04103 Leipzig, Germany
- Christoph Flamm
- Institut für Theoretische Chemie und Molekulare Strukturbiologie Universität Wien, Währingerstraße 17, A-1090 Wien, Austria
- Guido Fritzsch
- Institute of Biology II: Zoologie, Molekulare Evolution und Systematik der Tiere, University of Leipzig, Talstrasse 33, D-04103 Leipzig, Germany
- Interdisciplinary Center for Bioinformatics, Universität Leipzig, Härtelstraße 16-18, D-04107 Leipzig, Germany
- Stefan Grünewald
- Department of Combinatorics and Geometry (DCG), MPG/CAS Partner Institute for Computational Biology (PICB), Shanghai Institutes for Biological Sciences (SIBS), Shanghai, PR China
- Max Planck Institute for Mathematics in the Sciences, Inselstrasse 22 -26, D 04103 Leipzig, Germany
- Matthias Kruspe
- Interdisciplinary Center for Bioinformatics, Universität Leipzig, Härtelstraße 16-18, D-04107 Leipzig, Germany
- Sonja J Prohaska
- Institut für Theoretische Chemie und Molekulare Strukturbiologie Universität Wien, Währingerstraße 17, A-1090 Wien, Austria
- Santa Fe Institute, 1399 Hyde Park Rd., Santa Fe NM 87501, USA
- Biomedical Informatics, Arizona State University, PO-Box 878809, Tempe, AZ 85287, USA
- Peter F Stadler
- Institut für Theoretische Chemie und Molekulare Strukturbiologie Universität Wien, Währingerstraße 17, A-1090 Wien, Austria
- Interdisciplinary Center for Bioinformatics, Universität Leipzig, Härtelstraße 16-18, D-04107 Leipzig, Germany
- Santa Fe Institute, 1399 Hyde Park Rd., Santa Fe NM 87501, USA
- Bioinformatics Group, Department of Computer Science, Universität Leipzig, Härtelstraße 16-18, D-04107 Leipzig, Germany
- RNomics Group, Fraunhofer Institut for Cell Therapy and Immunology (IZI), Perlickstraße 1, D-04103 Leipzig, Germany
33
Abstract
In order to describe a cell at the molecular level, a notion of a “gene” is neither necessary nor helpful. It is sufficient to consider the molecules (i.e., chromosomes, transcripts, proteins) and their interactions to describe cellular processes. The downside of the resulting high resolution is that it becomes very tedious to address features on the organismal and phenotypic levels with a language based on molecular terms. Looking for the missing link between biological disciplines dealing with different levels of biological organization, we suggest returning to the original intent behind the term “gene”. To this end, we propose to investigate whether a useful notion of “gene” can be constructed based on an underlying notion of function, and whether this can serve as the necessary link and embed the various distinct gene concepts of biological (sub)disciplines in a coherent theoretical framework. In reply to the Genon Theory recently put forward by Klaus Scherrer and Jürgen Jost in this journal, we discuss a general approach to assessing a gene definition, which should then be tested for its expressiveness and potential cross-disciplinary relevance.
Affiliation(s)
- Sonja J Prohaska
- Santa Fe Institute, 1399 Hyde Park Rd., Santa Fe, NM, 87501, USA.
34
Rose D, Hackermüller J, Washietl S, Reiche K, Hertel J, Findeiss S, Stadler PF, Prohaska SJ. Computational RNomics of drosophilids. BMC Genomics 2007; 8:406. [PMID: 17996037 PMCID: PMC2216035 DOI: 10.1186/1471-2164-8-406] [Citation(s) in RCA: 34] [Impact Index Per Article: 2.0] [Received: 01/15/2007] [Accepted: 11/08/2007] [Indexed: 11/11/2022]
Abstract
Background Recent experimental and computational studies have provided overwhelming evidence for a plethora of diverse transcripts that are unrelated to protein-coding genes. One subclass consists of those RNAs that require distinctive secondary structure motifs to exert their biological function and hence exhibit distinctive patterns of sequence conservation characteristic of positive selection on RNA secondary structure. The deep sequencing of 12 drosophilid species coordinated by the NHGRI provides an ideal data set for comparative computational approaches to determine those genomic loci that code for evolutionarily conserved RNA motifs. This class of loci includes the majority of the known small ncRNAs as well as structured RNA motifs in mRNAs. We report here on a genome-wide survey using RNAz. Results We obtain 16 000 high-quality predictions, among which we recover the majority of the known ncRNAs. Taking a pessimistically estimated false discovery rate of 40% into account, this implies that at least some ten thousand loci in the Drosophila genome show the hallmarks of stabilizing selection acting on RNA structure, and hence are most likely functional at the RNA level. A subset of RNAz predictions overlapping with TRF1 and BRF binding sites [Isogai et al., EMBO J. 26: 79–89 (2007)], which are plausible candidates for Pol III transcripts, has been studied in more detail. Among these sequences we identify several "clusters" of ncRNA candidates with striking structural similarities. Conclusion The statistical evaluation of the RNAz predictions in comparison with a similar analysis of vertebrate genomes [Washietl et al., Nat. Biotech. 23: 1383–1390 (2005)] shows that qualitatively similar fractions of structured RNAs are found in introns, UTRs, and intergenic regions.
The intergenic RNA structures, however, are concentrated much more closely around known protein-coding loci, suggesting that flies have a significantly smaller complement of independent structured ncRNAs than mammals.
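The step from 16 000 predictions to "at least some ten thousand" functional loci follows from discounting the stated false discovery rate. As a back-of-the-envelope check (not a calculation reported in the paper):

$$16\,000 \times (1 - 0.40) = 9\,600 \approx 10^{4}.$$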
Affiliation(s)
- Dominic Rose
- Bioinformatics Group, Department of Computer Science, University of Leipzig, Härtelstrasse 16-18, Leipzig, Germany.
35
Backofen R, Bernhart SH, Flamm C, Fried C, Fritzsch G, Hackermüller J, Hertel J, Hofacker IL, Missal K, Mosig A, Prohaska SJ, Rose D, Stadler PF, Tanzer A, Washietl S, Will S. RNAs everywhere: genome-wide annotation of structured RNAs. J Exp Zool B Mol Dev Evol 2007; 308:1-25. [PMID: 17171697 DOI: 10.1002/jez.b.21130] [Citation(s) in RCA: 38] [Impact Index Per Article: 2.2] [Indexed: 12/26/2022]
Abstract
Starting with the discovery of microRNAs and the advent of genome-wide transcriptomics, non-protein-coding transcripts have moved from a fringe topic to a central research field in molecular biology. In this contribution we review the state of the art of "computational RNomics", i.e., the bioinformatics approaches to genome-wide RNA annotation. Instead of rehashing results from recently published surveys in detail, we focus here on the open problem in the field, namely the (functional) annotation of the plethora of putative RNAs. A series of exploratory studies is used to provide non-trivial examples for the discussion of some of the difficulties.
36
Morgenstern B, Prohaska SJ, Pöhler D, Stadler PF. Multiple sequence alignment with user-defined anchor points. Algorithms Mol Biol 2006; 1:6. [PMID: 16722533 PMCID: PMC1481597 DOI: 10.1186/1748-7188-1-6] [Citation(s) in RCA: 47] [Impact Index Per Article: 2.6] [Received: 02/15/2006] [Accepted: 04/19/2006] [Indexed: 11/15/2022]
Abstract
Background Automated software tools for multiple alignment often fail to produce biologically meaningful results. In such situations, expert knowledge can help to improve the quality of alignments. Results Herein, we describe a semi-automatic version of the alignment program DIALIGN that can take pre-defined constraints into account. It is possible for the user to specify parts of the sequences that are assumed to be homologous and should therefore be aligned to each other. Our software program can use these sites as anchor points by creating a multiple alignment respecting these constraints. This way, our alignment method can produce alignments that are biologically more meaningful than alignments produced by fully automated procedures. As a demonstration of how our method works, we apply our approach to genomic sequences around the Hox gene cluster and to a set of DNA-binding proteins. As a by-product, we obtain insights about the performance of the greedy algorithm that our program uses for multiple alignment and about the underlying objective function. This information will be useful for the further development of DIALIGN. The described alignment approach has been integrated into the TRACKER software system.
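User-defined anchor points carry an implicit consistency requirement: for a pair of sequences, all anchors can only be respected by one alignment if they are collinear and non-overlapping. The Python sketch below checks that condition for a single sequence pair. The tuple format `(start_in_seq1, start_in_seq2, length)` and the name `anchors_consistent` are hypothetical illustrations, not DIALIGN's anchor-file format or code, and a multiple alignment additionally needs consistency across all sequence pairs jointly, which this sketch does not attempt.

```python
from typing import List, Tuple

# Hypothetical anchor format: (start in sequence 1, start in sequence 2, length),
# 0-based positions, gap-free anchor of equal length in both sequences.
Anchor = Tuple[int, int, int]


def anchors_consistent(anchors: List[Anchor]) -> bool:
    """Return True if all anchors can be respected by one pairwise alignment.

    Ordered by position in sequence 1, each anchor must start after the
    previous one ends in *both* sequences, i.e. no two anchors cross or overlap.
    """
    ordered = sorted(anchors)
    for (i1, j1, n1), (i2, j2, _) in zip(ordered, ordered[1:]):
        if i2 < i1 + n1 or j2 < j1 + n1:
            return False
    return True


if __name__ == "__main__":
    print(anchors_consistent([(0, 0, 5), (10, 12, 3)]))  # collinear anchors
    print(anchors_consistent([(0, 10, 3), (10, 0, 3)]))  # crossing anchors
```

Sorting by the sequence-1 coordinate reduces the check to adjacent pairs, so it runs in O(k log k) for k anchors.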
Affiliation(s)
- Burkhard Morgenstern
- Universität Göttingen, Institut für Mikrobiologie und Genetik, Abteilung für Bioinformatik, Goldschmidtstrasse 1, D-37077 Göttingen, Germany
- Sonja J Prohaska
- Universität Leipzig, Institut für Informatik und Interdisziplinäres Zentrum für Bioinformatik, Kreuzstrasse 7b, D-04103 Leipzig, Germany
- Dirk Pöhler
- Universität Göttingen, Institut für Mikrobiologie und Genetik, Abteilung für Bioinformatik, Goldschmidtstrasse 1, D-37077 Göttingen, Germany
- Peter F Stadler
- Universität Leipzig, Institut für Informatik und Interdisziplinäres Zentrum für Bioinformatik, Kreuzstrasse 7b, D-04103 Leipzig, Germany
37
Abstract
The ParaHox cluster contains three Hox-related homeobox genes. The evolution of this sister of the Hox-gene clusters has been studied extensively in metazoans with a focus on its early evolution. Its fate within the vertebrate lineage, and in particular following the teleost-specific genome duplication, however, has not received much attention. Three of the four human ParaHox loci are linked with PDGFR family tyrosine kinases. We demonstrate that these loci arose as duplications in an ancestral vertebrate and trace the subsequent history of gene losses. Surprisingly, teleost fishes have not expanded their ParaHox repertoire following the teleost-specific genome duplication, while duplicates of the associated tyrosine kinases have survived, supporting the hypothesis of a large-scale duplication followed by extensive gene loss.
Affiliation(s)
- Sonja J Prohaska
- Bioinformatics Group, Department of Computer Science, and Interdisciplinary Center for Bioinformatics, University of Leipzig, D-04107 Leipzig, Germany.
38
Wagner GP, Takahashi K, Lynch V, Prohaska SJ, Fried C, Stadler PF, Amemiya C. Molecular evolution of duplicated ray finned fish HoxA clusters: increased synonymous substitution rate and asymmetrical co-divergence of coding and non-coding sequences. J Mol Evol 2005; 60:665-76. [PMID: 15983874 DOI: 10.1007/s00239-004-0252-z] [Citation(s) in RCA: 30] [Impact Index Per Article: 1.6] [Received: 08/10/2004] [Accepted: 11/22/2004] [Indexed: 01/01/2023]
Abstract
In this study the molecular evolution of duplicated HoxA genes in zebrafish and fugu has been investigated. All 18 duplicated HoxA genes studied have a higher non-synonymous substitution rate than the corresponding genes in either bichir or paddlefish, where these genes are not duplicated. The higher rate of evolution is not due solely to a higher non-synonymous-to-synonymous rate ratio but to an increase in both the non-synonymous as well as the synonymous substitution rate. The synonymous rate increase can be explained by a change in base composition, codon usage, or mutation rate. We found no changes in nucleotide composition or codon bias. Thus, we suggest that the HoxA genes may experience an increased mutation rate following cluster duplication. In the non-Hox nuclear gene RAG1 only an increase in non-synonymous substitutions could be detected, suggesting that the increased mutation rate is specific to duplicated Hox clusters and might be related to the structural instability of Hox clusters following duplication. The divergence among paralog genes tends to be asymmetric, with one paralog diverging faster than the other. In fugu, all b-paralogs diverge faster than the a-paralogs, while in zebrafish Hoxa-13a diverges faster. This asymmetry corresponds to the asymmetry in the divergence rate of conserved non-coding sequences, i.e., putative cis-regulatory elements. These results suggest that the 5' HoxA genes in the same cluster belong to a co-evolutionary unit in which genes have a tendency to diverge together.
Affiliation(s)
- Günter P Wagner
- Department of Ecology and Evolutionary Biology, Yale University, New Haven, CT 06520-8106, USA.
39
Stadler PF, Fried C, Prohaska SJ, Bailey WJ, Misof BY, Ruddle FH, Wagner GP. Evidence for independent Hox gene duplications in the hagfish lineage: a PCR-based gene inventory of Eptatretus stoutii. Mol Phylogenet Evol 2005; 32:686-94. [PMID: 15288047 DOI: 10.1016/j.ympev.2004.03.015] [Citation(s) in RCA: 70] [Impact Index Per Article: 3.7] [Received: 10/24/2003] [Revised: 02/13/2004] [Indexed: 11/22/2022]
Abstract
Hox genes code for transcription factors that play a major role in the development of all animal phyla. In invertebrates these genes usually occur as tightly linked clusters, with a few exceptions where the clusters have been dissolved. Only in vertebrates have multiple clusters been demonstrated, which arose by duplication from a single ancestral cluster. This history of Hox cluster duplications, in particular during the early elaboration of the vertebrate body plan, is still poorly understood. In this paper we report the results of a PCR survey on genomic DNA of the Pacific hagfish Eptatretus stoutii. Hagfishes are one of two clades of recent jawless fishes that are an offshoot of the early radiation of jawless vertebrates. Our data provide evidence for at least 33 distinct Hox genes in the hagfish genome, which is most compatible with the hypothesis of multiple Hox clusters. The largest number of distinct homeobox fragments, seven, could be assigned to paralog group 9, which could imply that the hagfish has more than four clusters. Quartet mapping reveals that within each paralog group the hagfish sequences are statistically more closely related to gnathostome Hox genes than to either amphioxus or lamprey genes. These results support two assumptions about the history of Hox genes: (1) The association of hagfish homeobox sequences with gnathostome sequences suggests that at least one Hox cluster duplication event happened in the stem of vertebrates, i.e., prior to the most recent common ancestor of jawed and jawless vertebrates. (2) The high number of paralog group 9 sequences in hagfish and the phylogenetic position of hagfish suggest that the hagfish lineage underwent additional independent Hox cluster/-gene duplication events.
Affiliation(s)
- Peter F Stadler
- Lehrstuhl für Bioinformatik, Institut für Informatik, Universität Leipzig, Kreuzstrasse 7b, D-04103 Leipzig, Germany.
40
Abstract
Phylogenetic footprints are short pieces of noncoding DNA sequence in the vicinity of a gene that are conserved between evolutionarily distant species. A seemingly simple problem is to sort footprints by their order along the genomes. It is complicated by the fact that not all footprints are collinear: they may cross each other. The problem thus becomes the identification of the crossing footprints, the sorting of the remaining collinear cliques, and finally the insertion of the noncollinear ones at "reasonable" positions. We show that solving the footprint sorting problem requires the solution of the "Minimum Weight Vertex Feedback Set Problem", which is known to be NP-complete and APX-hard. Nevertheless, good approximations can be obtained for data sets of interest. The remaining steps of the sorting process are straightforward: computation of the transitive closure of an acyclic graph, linear extension of the resulting partial order, and finally sorting with respect to the linear extension. Alternatively, the footprint sorting problem can be rephrased as a combinatorial optimization problem for which approximate solutions can be obtained by means of general-purpose heuristics. Footprint sortings obtained with different methods can be compared using a version of multiple sequence alignment that allows the identification of unambiguously ordered sublists. As an application we show that the rat has a slightly increased insertion/deletion rate in comparison to the mouse genome.
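The "straightforward" final steps named in this abstract (acyclic precedence graph, linear extension, sort) can be sketched with Python's standard `graphlib` module (Python 3.9+). This is a minimal sketch under stated assumptions: the crossing footprints are assumed to have been removed already (the NP-complete feedback-set step is not attempted), and the input format (one list of footprint labels per genome, in genomic order) and all function names are hypothetical. A topological sort produces a linear extension directly, so the transitive closure is implicit rather than computed.

```python
from graphlib import CycleError, TopologicalSorter
from itertools import combinations
from typing import Dict, List, Set, Tuple


def crossing_pairs(order_a: List[str], order_b: List[str]) -> Set[Tuple[str, str]]:
    """Footprint pairs shared by two genomes whose relative order differs."""
    pos_b = {fp: k for k, fp in enumerate(order_b)}
    shared = [fp for fp in order_a if fp in pos_b]
    # combinations() keeps order_a's order, so x precedes y in genome A.
    return {(x, y) for x, y in combinations(shared, 2) if pos_b[x] > pos_b[y]}


def precedence_graph(orders: List[List[str]]) -> Dict[str, Set[str]]:
    """Map each footprint to the footprints directly preceding it in some genome."""
    preds: Dict[str, Set[str]] = {}
    for order in orders:
        for fp in order:
            preds.setdefault(fp, set())
        for prev, nxt in zip(order, order[1:]):
            preds[nxt].add(prev)
    return preds


def sort_footprints(orders: List[List[str]]) -> List[str]:
    """Return one linear extension of the footprint partial order.

    Raises graphlib.CycleError if the footprints are not collinear.
    """
    return list(TopologicalSorter(precedence_graph(orders)).static_order())


if __name__ == "__main__":
    genomes = [["a", "b", "c", "d"], ["a", "c", "d"], ["b", "c", "e"]]
    print(sort_footprints(genomes))  # one valid linear extension
    print(crossing_pairs(["a", "b", "c"], ["a", "c", "b"]))  # {('b', 'c')}
```

When several linear extensions exist, `static_order` returns just one of them; footprints that only some genomes contain are still placed consistently with every genome that has them.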
Affiliation(s)
- Claudia Fried
- Bioinformatics, Department of Computer Science, University of Leipzig, Germany
41
Morgenstern B, Werner N, Prohaska SJ, Steinkamp R, Schneider I, Subramanian AR, Stadler PF, Weyer-Menkhoff J. Multiple sequence alignment with user-defined constraints at GOBICS. Bioinformatics 2004; 21:1271-3. [PMID: 15546937 DOI: 10.1093/bioinformatics/bti142]
Abstract
Most multiple-alignment methods are fully automated, i.e., they are based on a fixed set of mathematical rules. For various reasons, such methods may fail to produce biologically meaningful alignments. Herein, we describe a semi-automatic approach to multiple sequence alignment in which biological expert knowledge can be used to influence the alignment procedure. The user can specify parts of the sequences that are biologically related to each other; our software uses these sites as anchor points and creates a multiple alignment that respects these user-defined constraints. By using known functionally, structurally, or evolutionarily related positions of the input sequences as anchor points, our method can produce alignments that reflect the true biological relationships among the input sequences more accurately than fully automated procedures.
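The anchor-point idea can be illustrated in a simplified pairwise setting. The Python sketch below is not the algorithm used at GOBICS: it assumes that each user-defined anchor is a pair `(i, j)` of 0-based positions that must share a column, that anchors are strictly increasing in both sequences, and that the segments between anchors are aligned independently with plain Needleman-Wunsch using toy scores (match +1, mismatch -1, gap -1). Because each segment is aligned on its own, every anchored pair is guaranteed to end up in the same column of the final alignment.

```python
def needleman_wunsch(a, b, match=1, mismatch=-1, gap=-1):
    """Global alignment of two strings; returns the two gapped strings."""
    n, m = len(a), len(b)
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap

    def sub(i, j):
        return match if a[i - 1] == b[j - 1] else mismatch

    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(score[i - 1][j - 1] + sub(i, j),
                              score[i - 1][j] + gap,
                              score[i][j - 1] + gap)
    # Traceback from the bottom-right cell to recover one optimal alignment.
    out_a, out_b, i, j = [], [], n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub(i, j):
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
    return "".join(reversed(out_a)), "".join(reversed(out_b))


def anchored_align(a, b, anchors):
    """Align a and b so that a[i] and b[j] share a column for every anchor (i, j).

    anchors: hypothetical 0-based position pairs, strictly increasing in both sequences.
    """
    parts_a, parts_b = [], []
    prev_i, prev_j = 0, 0
    for i, j in anchors:
        # Align the segment before the anchor, then place the anchored pair.
        seg_a, seg_b = needleman_wunsch(a[prev_i:i], b[prev_j:j])
        parts_a.append(seg_a + a[i])
        parts_b.append(seg_b + b[j])
        prev_i, prev_j = i + 1, j + 1
    seg_a, seg_b = needleman_wunsch(a[prev_i:], b[prev_j:])
    parts_a.append(seg_a)
    parts_b.append(seg_b)
    return "".join(parts_a), "".join(parts_b)


top, bottom = anchored_align("GATTACA", "GCATGCA", anchors=[(3, 3)])
print(top)
print(bottom)
```

Splitting at anchors also shrinks the dynamic-programming work, since only the segments between consecutive anchors have to be aligned.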
Affiliation(s)
- Burkhard Morgenstern
- Institut für Mikrobiologie und Genetik, Universität Göttingen, Abteilung für Bioinformatik Goldschmidtstrasse 1, D-37077 Göttingen, Germany.
42
Prohaska SJ, Fried C, Flamm C, Wagner GP, Stadler PF. Surveying phylogenetic footprints in large gene clusters: applications to Hox cluster duplications. Mol Phylogenet Evol 2004; 31:581-604. [PMID: 15062796 DOI: 10.1016/j.ympev.2003.08.009] [Received: 05/07/2003] [Revised: 08/07/2003]
Abstract
Evolutionarily conserved non-coding genomic sequences represent a potentially rich source for the discovery of gene regulatory regions. Since these elements are subject to stabilizing selection, they evolve much more slowly than adjacent non-functional DNA. These so-called phylogenetic footprints can be detected by comparing the sequences surrounding orthologous genes in different species. Therefore, the loss of phylogenetic footprints, as well as the acquisition of conserved non-coding sequences in some lineages but not in others, can provide evidence for the evolutionary modification of cis-regulatory elements. We introduce here a statistical model of footprint evolution that allows us to estimate the loss of sequence conservation that can be attributed to gene loss and other structural causes. This approach to studying the pattern of cis-regulatory element evolution, however, requires the comparison of relatively long sequences from many species. We have therefore developed an efficient software tool for the identification of corresponding footprints in long sequences from multiple species. We apply this novel method to the published sequences of the HoxA clusters of shark, human, and the duplicated zebrafish and Takifugu clusters, as well as to the published HoxB cluster sequences. We find that there is a massive loss of sequence conservation in the intergenic regions of the HoxA clusters, consistent with the finding in [Chiu et al., PNAS 99 (2002) 5492]. The loss of conservation after cluster duplication is more extensive than expected on structural grounds alone. This suggests that binding site turnover and/or adaptive modification may also contribute to the loss of sequence conservation.
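As a rough illustration of how conserved blocks can be picked out of a comparison of orthologous regions, the Python sketch below scans a pairwise alignment with a sliding window and reports merged intervals whose identity reaches a threshold. This is a generic window-based scan, not the statistical model or the software tool described in the abstract; the input is assumed to be an already computed gapped alignment, and the window length and identity cutoff are arbitrary illustrative parameters.

```python
def conserved_windows(aln_a, aln_b, window=10, min_identity=0.8):
    """Return merged [start, end) alignment intervals of highly conserved windows."""
    if len(aln_a) != len(aln_b):
        raise ValueError("aligned sequences must have equal length")
    intervals = []
    for start in range(len(aln_a) - window + 1):
        # Count identical, non-gap columns in this window.
        ident = sum(1 for x, y in zip(aln_a[start:start + window],
                                      aln_b[start:start + window])
                    if x == y and x != "-")
        if ident / window >= min_identity:
            if intervals and start <= intervals[-1][1]:
                intervals[-1][1] = start + window  # extend an overlapping interval
            else:
                intervals.append([start, start + window])
    return [tuple(iv) for iv in intervals]


print(conserved_windows("AAAAACCCCCGGGGG", "AAAAATTTTTGGGGG", window=5, min_identity=1.0))
# [(0, 5), (10, 15)]
```

Comparing which intervals survive in each lineage is then one crude way to see footprints that were lost after a cluster duplication.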
Affiliation(s)
- Sonja J Prohaska
- Lehrstuhl für Bioinformatik, Institut für Informatik, Universität Leipzig, Germany.
43
Abstract
The statistical analysis of phylogenetic footprints in the two known horn shark Hox clusters and the four mammalian clusters shows that the shark HoxN cluster is HoxD-like. This finding implies that the most recent common ancestor of jawed vertebrates had at least four Hox clusters, including those which are orthologous to the four mammalian Hox clusters.
Affiliation(s)
- Sonja J Prohaska
- Bioinformatik, Institut für Informatik, Universität Leipzig, Kreuzstrasse 7b, D-04103 Leipzig, Germany
44
Abstract
In many eukaryotic genomes only a small fraction of the DNA codes for proteins, but the non-protein-coding DNA harbors important genetic elements directing the development and physiology of the organism, such as promoters, enhancers, insulators, and microRNA genes. The molecular evolution of these genetic elements is difficult to study because their functional significance is hard to deduce from sequence information alone. Here we propose an approach to the study of the rate of evolution of functional non-coding sequences at a macro-evolutionary scale. We identify functionally important non-coding sequences as Conserved Non-Coding Nucleotide (CNCN) sequences from the comparison of two outgroup species. The CNCN sequences so identified are then compared to their homologous sequences in a pair of ingroup species, and we monitor the degree of modification these sequences underwent in the two ingroup lineages. We propose a method to test for rate differences in the modification of CNCN sequences between the two ingroup lineages, as well as a method to estimate their rate of modification. We apply this method to the full sequences of the HoxA clusters from six gnathostome species: a shark, Heterodontus francisci; a basal ray-finned fish, Polypterus senegalus; the amphibian Xenopus tropicalis; and three mammalian species, human, rat, and mouse. The results show that the evolutionary rate of CNCN sequences is not distinguishable among the three mammalian lineages, while the Xenopus lineage has a significantly increased rate of evolution. Furthermore, the estimates of the rate parameters suggest that in the stem lineage of mammals the rate of CNCN sequence evolution was more than twice the rate observed within the placental amniotes clade, which points to a high rate of evolution of cis-regulatory elements during the origin of amniotes and mammals.
We conclude that the proposed methods can be used for testing hypotheses about the rate and pattern of evolution of putative cis-regulatory elements.
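The abstract does not spell out the relative rate test, so the following Python sketch is only a generic stand-in: Tajima's (1993) one-degree-of-freedom relative rate test, applied to hypothetical counts `m1` and `m2` of outgroup-defined CNCN sites that were modified in only the first or only the second ingroup lineage. Under equal rates the two counts are expected to be equal, and the statistic (m1 - m2)^2 / (m1 + m2) is approximately chi-square distributed with one degree of freedom, whose upper tail probability at x equals erfc(sqrt(x / 2)).

```python
import math


def relative_rate_test(m1, m2):
    """Tajima-style 1-df relative rate test on lineage-specific modification counts.

    m1, m2: hypothetical numbers of CNCN sites modified only in ingroup lineage 1
    or only in ingroup lineage 2. Returns (chi-square statistic, p-value).
    """
    if m1 + m2 == 0:
        raise ValueError("no lineage-specific modifications to compare")
    chi2 = (m1 - m2) ** 2 / (m1 + m2)
    # Upper tail of the chi-square distribution with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


print(relative_rate_test(30, 10))  # chi2 = 10.0, p ≈ 0.00157
```

A test of this kind only detects a rate difference; estimating the rates themselves, as the abstract also describes, needs an explicit model of CNCN modification.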
Affiliation(s)
- Günter P Wagner
- Department of Ecology and Evolutionary Biology Yale University, New Haven, Connecticut, USA.
45
Abstract
Higher teleost fishes, including zebrafish and fugu, have duplicated their Hox genes relative to the gene inventory of other gnathostome lineages. The most widely accepted theory contends that the duplicate Hox clusters originated synchronously during a single genome duplication event in the early history of ray-finned fishes. In this contribution we collect and re-evaluate all publicly available sequence information. In particular, we show that the short Hox gene fragments from published PCR surveys of the killifish Fundulus heteroclitus, the medaka Oryzias latipes, and the goldfish Carassius auratus can be used to determine with little ambiguity not only their paralog group but also their membership in a particular cluster.

Together with a survey of the genomic sequence data from the pufferfish Tetraodon nigroviridis, we show that at least Percomorpha, and possibly all euteleosts, share a system of 7 or 8 orthologous Hox gene clusters. There is little doubt about the orthology of the two teleost duplicates of the HoxA and HoxB clusters. A careful analysis of both the coding sequences of Hox genes and conserved non-coding sequences provides additional support for the "duplication early" hypothesis, according to which the Hox clusters in teleosts are derived from eight ancestral clusters by subsequent gene loss; the data remain ambiguous, however, in particular for the HoxC clusters.

Assuming the "duplication early" hypothesis, we use the new evidence on the Hox gene complements to determine the phylogenetic positions of gene-loss events in the wake of the cluster duplication. Surprisingly, we find that the resolution of redundancy seems to be a slow process that is still ongoing. We also discuss which additional sequence data would be most informative for resolving the history of the teleostean Hox genes.
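Placing gene-loss events on a tree under the "duplication early" assumption can be sketched as Dollo parsimony: every duplicate is taken to be present at the root, a lost gene is never regained, and one loss is placed on the stem of each maximal clade in which no species retains the gene. The Python sketch below illustrates that reasoning only; it is not the authors' procedure, and the tree (nested tuples of leaf names) and the presence data are hypothetical.

```python
def leaves(node):
    """All leaf names below a node of a nested-tuple tree."""
    if isinstance(node, str):
        return {node}
    return set().union(*(leaves(child) for child in node))


def dollo_losses(tree, present):
    """Place losses on the stems of maximal clades lacking the gene.

    Assumes the gene was present at the root (duplication early) and never regained.
    tree: nested tuples of leaf names; present: set of leaves that retain the gene.
    """
    losses = []

    def visit(node):
        clade = leaves(node)
        if not clade & present:
            losses.append(frozenset(clade))  # a single loss explains the whole clade
        elif not isinstance(node, str):
            for child in node:
                visit(child)

    visit(tree)
    return losses


# Hypothetical four-taxon tree in which only taxon "A" retains the duplicate.
print(dollo_losses((("A", "B"), ("C", "D")), {"A"}))
```

Here two losses are inferred, one on the branch to B and one on the stem of the (C, D) clade, rather than three separate losses in B, C, and D.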
Affiliation(s)
- Sonja J Prohaska
- Lehrstuhl für Bioinformatik am Institut für Informatik, Universität Leipzig, Kreuzstraße 7b, D-04103, Leipzig, Germany
46
Chiu CH, Dewar K, Wagner GP, Takahashi K, Ruddle F, Ledje C, Bartsch P, Scemama JL, Stellwag E, Fried C, Prohaska SJ, Stadler PF, Amemiya CT. Bichir HoxA cluster sequence reveals surprising trends in ray-finned fish genomic evolution. Genome Res 2004; 14:11-7. [PMID: 14707166 PMCID: PMC314268 DOI: 10.1101/gr.1712904]
Abstract
The study of Hox clusters and genes provides insights into the evolution of genomic regulation of development. Derived ray-finned fishes (Actinopterygii, Teleostei) such as zebrafish and pufferfish possess duplicated Hox clusters that have undergone considerable sequence evolution. Whether these changes are associated with the duplication(s) that produced extra Hox clusters is unresolved because comparison with basal lineages is unavailable. We sequenced and analyzed the HoxA cluster of the bichir (Polypterus senegalus), a phylogenetically basal actinopterygian. Independent lines of evidence indicate that bichir has one HoxA cluster that is mosaic in its patterns of noncoding sequence conservation and gene retention relative to the HoxA clusters of human and shark, and the HoxAα and HoxAβ clusters of zebrafish, pufferfish, and striped bass. HoxA cluster noncoding sequences conserved between bichir and euteleosts indicate that novel cis-sequences were acquired in the stem actinopterygians and maintained after cluster duplication. Hence, in the earliest actinopterygians, evolution of the single HoxA cluster was already more dynamic than in human and shark. This tendency peaked among teleosts after HoxA cluster duplication.
Affiliation(s)
- Chi-Hua Chiu
- Department of Genetics, Rutgers University, Piscataway, New Jersey 08854, USA.
47
Abstract
Despite their homology and analogous function, the Hox gene clusters of vertebrates and invertebrates are subject to different constraints on their structural organization. This is demonstrated by a drastically different distribution of repetitive DNA elements in the Hox cluster regions. While gnathostomes have a strong tendency to exclude repetitive DNA elements from the inside of their Hox clusters, no such trend can be detected in the Hox gene clusters of protostomes. Repeats "invade" the gnathostome Hox clusters from the 5' and 3' ends while the core of the clusters remains virtually free of repetitive DNA. This invasion appears to be correlated with relaxed constraints associated with gene loss after cluster duplications.
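One simple way to make the "invasion from the ends" pattern visible is to split a cluster into equal bins and compute the fraction of each bin covered by annotated repeats. The Python sketch below assumes hypothetical repeat annotations given as half-open coordinate intervals and a cluster delimited by hypothetical start and end coordinates; overlapping repeats are merged first so that no base is counted twice. It illustrates the measurement only and is not the analysis performed in the paper.

```python
def repeat_coverage_profile(start, end, repeats, n_bins=10):
    """Fraction of each of n_bins equal bins of [start, end) covered by repeats.

    repeats: iterable of half-open (begin, end) intervals (hypothetical annotation).
    """
    # Merge overlapping repeat intervals so that no base is counted twice.
    merged = []
    for r0, r1 in sorted(repeats):
        if merged and r0 <= merged[-1][1]:
            merged[-1][1] = max(merged[-1][1], r1)
        else:
            merged.append([r0, r1])

    width = (end - start) / n_bins
    profile = []
    for k in range(n_bins):
        b0 = start + k * width
        b1 = b0 + width
        # Overlap of every merged repeat with the current bin.
        covered = sum(max(0.0, min(b1, r1) - max(b0, r0)) for r0, r1 in merged)
        profile.append(covered / width)
    return profile


# Toy cluster: repeats near both ends, none in the core.
print(repeat_coverage_profile(0, 100, [(0, 10), (5, 15), (90, 100)], n_bins=5))
# [0.75, 0.0, 0.0, 0.0, 0.5]
```

A profile that is high in the outer bins and near zero in the central ones is the gnathostome-like pattern the abstract describes; a flat profile would match the protostome case.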
Affiliation(s)
- Claudia Fried
- Bioinformatics Group, Department of Computer Science, University of Leipzig, Kreuzstraße 7b, D-04103 Leipzig, Germany.
48
Abstract
The analysis of the publicly available Hox gene sequences from the sea lamprey Petromyzon marinus provides evidence that the Hox clusters in lampreys and other vertebrate species arose from independent duplications. In particular, our analysis supports the hypothesis that the last common ancestor of agnathans and gnathostomes had only a single Hox cluster which was subsequently duplicated independently in the two lineages.
Affiliation(s)
- Claudia Fried
- Bioinformatik, Institut für Informatik, Universität Leipzig, Kreuzstrasse 7b, D-04103 Leipzig, Germany