1
Dröge J, Gregor I, McHardy AC. Taxator-tk: precise taxonomic assignment of metagenomes by fast approximation of evolutionary neighborhoods. Bioinformatics 2015; 31:817-24. [PMID: 25388150 PMCID: PMC4380030 DOI: 10.1093/bioinformatics/btu745] [Citation(s) in RCA: 91] [Impact Index Per Article: 10.1] [Received: 06/03/2014] [Revised: 11/04/2014] [Accepted: 11/05/2014] [Indexed: 01/17/2023]
Abstract
MOTIVATION: Metagenomics characterizes microbial communities by random shotgun sequencing of DNA isolated directly from an environment of interest. An essential step in computational metagenome analysis is taxonomic sequence assignment, which allows identifying the sequenced community members and reconstructing taxonomic bins with sequence data for the individual taxa. For the massive datasets generated by next-generation sequencing technologies, this cannot be performed with de novo phylogenetic inference methods. We describe an algorithm and the accompanying software, taxator-tk, which performs taxonomic sequence assignment by fast approximate determination of evolutionary neighbors from sequence similarities.

RESULTS: Taxator-tk was precise in its taxonomic assignment across all ranks and taxa for a range of evolutionary distances and for short as well as for long sequences. In addition to the taxonomic binning of metagenomes, it is well suited for profiling microbial communities from metagenome samples because it identifies bacterial, archaeal and eukaryotic community members without being affected by varying primer binding strengths, as in marker gene amplification, or copy number variations of marker genes across different taxa. Taxator-tk has an efficient, parallelized implementation that allows the assignment of 6 Gb of sequence data per day on a standard multiprocessor system with 10 CPU cores and microbial RefSeq as the genomic reference data.
Affiliation(s)
- J Dröge
- Department for Algorithmic Bioinformatics, Heinrich Heine University, Universitätsstraße 1, 40225 Düsseldorf, Germany, Max-Planck Research Group for Computational Genomics and Epidemiology, Max-Planck Institute for Informatics, University Campus E1 4, 66123 Saarbrücken, Germany and Computational Biology of Infection Research, Helmholtz Centre for Infection Research, Inhoffenstraße 7, 38124 Braunschweig, Germany
- I Gregor
- Department for Algorithmic Bioinformatics, Heinrich Heine University, Universitätsstraße 1, 40225 Düsseldorf, Germany, Max-Planck Research Group for Computational Genomics and Epidemiology, Max-Planck Institute for Informatics, University Campus E1 4, 66123 Saarbrücken, Germany and Computational Biology of Infection Research, Helmholtz Centre for Infection Research, Inhoffenstraße 7, 38124 Braunschweig, Germany
- A C McHardy
- Department for Algorithmic Bioinformatics, Heinrich Heine University, Universitätsstraße 1, 40225 Düsseldorf, Germany, Max-Planck Research Group for Computational Genomics and Epidemiology, Max-Planck Institute for Informatics, University Campus E1 4, 66123 Saarbrücken, Germany and Computational Biology of Infection Research, Helmholtz Centre for Infection Research, Inhoffenstraße 7, 38124 Braunschweig, Germany
2
Dröge J, Buczek D, Suzuki Y, Makałowski W. Amoebozoa possess lineage-specific globin gene repertoires gained by individual horizontal gene transfers. Int J Biol Sci 2014; 10:689-701. [PMID: 25013378 PMCID: PMC4081604 DOI: 10.7150/ijbs.8327] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.3] [Received: 12/10/2013] [Accepted: 03/24/2014] [Indexed: 12/13/2022]
Abstract
The Amoebozoa represent a clade of unicellular amoeboid organisms that display a wide variety of lifestyles, including free-living and parasitic species. For example, the social amoeba Dictyostelium discoideum has the ability to aggregate into a multicellular fruiting body upon starvation, while the pathogenic amoeba Entamoeba histolytica is a parasite of humans. Globins are small heme proteins that are present in almost all extant organisms. Although several genomes of amoebozoan species have been sequenced, little is known about the phyletic distribution of globin genes within this phylum. Only two flavohemoglobins (FHbs) of D. discoideum have been reported and characterized previously while the genomes of Entamoeba species are apparently devoid of globin genes. We investigated eleven amoebozoan species for the presence of globin genes by genomic and phylogenetic in silico analyses. Additional FHb genes were identified in the genomes of four social amoebas and the true slime mold Physarum polycephalum. Moreover, a single-domain globin (SDFgb) of Hartmannella vermiformis, as well as two truncated hemoglobins (trHbs) of Acanthamoeba castellanii were identified. Phylogenetic evidence suggests that these globin genes were independently acquired via horizontal gene transfer from some ancestral bacteria. Furthermore, the phylogenetic tree of amoebozoan FHbs indicates that they do not share a common ancestry and that a transfer of FHbs from bacteria to amoeba occurred multiple times.
Affiliation(s)
- Jasmin Dröge
- 1. Institute of Bioinformatics, Faculty of Medicine, University of Muenster, Niels Stensen Str. 14, 48149 Muenster, Germany
- Dorota Buczek
- 1. Institute of Bioinformatics, Faculty of Medicine, University of Muenster, Niels Stensen Str. 14, 48149 Muenster, Germany; 2. Institute of Molecular Biology and Biotechnology, A. Mickiewicz University, Poznan, Poland
- Yutaka Suzuki
- 3. Department of Medical Genomic Sciences, University of Tokyo, Tokyo, Japan
- Wojciech Makałowski
- 1. Institute of Bioinformatics, Faculty of Medicine, University of Muenster, Niels Stensen Str. 14, 48149 Muenster, Germany; 3. Department of Medical Genomic Sciences, University of Tokyo, Tokyo, Japan
3
Abstract
BACKGROUND: Neuroglobin (Ngb) is a hexacoordinated globin expressed mainly in the central and peripheral nervous system of vertebrates. Although several hypotheses have been put forward regarding the role of neuroglobin, its definite function remains uncertain. Ngb appears to have a neuroprotective role, enhancing cell viability under hypoxia and other types of oxidative stress. Ngb is phylogenetically ancient and has a substitution rate nearly four times lower than that of other vertebrate globins, e.g. hemoglobin. Despite its high sequence conservation among vertebrates, Ngb seems to be elusive in invertebrates.

PRINCIPAL FINDINGS: We determined candidate orthologs in invertebrates and identified a globin of the placozoan Trichoplax adhaerens that is most likely orthologous to vertebrate Ngb, and confirmed the orthologous relationship of the polymeric globin of the sea urchin Strongylocentrotus purpuratus to Ngb. The putative orthologous globin genes are located next to genes orthologous to vertebrate POMT2, similar to the localization of vertebrate Ngb. The shared syntenic position of the globins from Trichoplax, the sea urchin and of vertebrate Ngb strongly suggests that they are orthologous. A search for conserved transcription factor binding sites (TFBSs) in the promoter regions of the Ngb genes of different vertebrates via phylogenetic footprinting revealed several TFBSs, which may contribute to the specific expression of Ngb, whereas a comparative analysis with myoglobin revealed several common TFBSs, suggestive of regulatory mechanisms common to globin genes.

SIGNIFICANCE: Identification of the placozoan and echinoderm genes orthologous to vertebrate neuroglobin strongly supports the hypothesis of the early evolutionary origin of this globin, as it shows that neuroglobin was already present in the placozoan-bilaterian last common ancestor. Computational determination of the transcription factor binding site repertoire provides, on the one hand, a set of transcription factors that are responsible for the specific expression of the Ngb genes and, on the other hand, a set of factors potentially controlling expression of a couple of different globin genes.
Affiliation(s)
- Jasmin Dröge
- Institute of Bioinformatics, Faculty of Medicine, University of Muenster, Muenster, Germany
- Amit Pande
- Institute of Bioinformatics, Faculty of Medicine, University of Muenster, Muenster, Germany
- Ella W. Englander
- Department of Surgery, University of Texas Medical Branch, Galveston, Texas, United States of America
- Wojciech Makałowski
- Institute of Bioinformatics, Faculty of Medicine, University of Muenster, Muenster, Germany
4
Abstract
The vertebrate globin gene repertoire consists of seven members that differ in terms of structure, function and phyletic distribution. While hemoglobin, myoglobin, cytoglobin, and neuroglobin are present in almost all gnathostomes examined so far, other globin genes, like globin X, are much more restricted in their phyletic distribution. To date, globin X has only been found in teleost fish and Xenopus. Here, we report that globin X is also present in the genomes of the sea lamprey, ghost shark and reptiles. Moreover, the identification of orthologs of globin X in crustaceans, insects, platyhelminthes, and hemichordates confirms its ancient origin.
Affiliation(s)
- Jasmin Dröge
- Institute of Bioinformatics, Faculty of Medicine, University of Muenster, Niels Stensen Str. 14, 48149 Muenster, Germany