1
García-López M, Meier-Kolthoff JP, Tindall BJ, Gronow S, Woyke T, Kyrpides NC, Hahnke RL, Göker M. Analysis of 1,000 Type-Strain Genomes Improves Taxonomic Classification of Bacteroidetes. Front Microbiol 2019; 10:2083. [PMID: 31608019 PMCID: PMC6767994 DOI: 10.3389/fmicb.2019.02083] [Citation(s) in RCA: 190] [Impact Index Per Article: 38.0] [Received: 07/23/2018] [Accepted: 08/23/2019] [Indexed: 11/25/2022] Open
Abstract
Although considerable progress has been made in recent years regarding the classification of bacteria assigned to the phylum Bacteroidetes, there remains a need to further clarify taxonomic relationships within a diverse assemblage that includes organisms of clinical, piscicultural, and ecological importance. Bacteroidetes classification has proved to be difficult, not least when taxonomic decisions rested heavily on interpretation of poorly resolved 16S rRNA gene trees and a limited number of phenotypic features. Here, draft genome sequences of a greatly enlarged collection of genomes of more than 1,000 Bacteroidetes and outgroup type strains were used to infer phylogenetic trees from genome-scale data using principles drawn from phylogenetic systematics. The majority of taxa were found to be monophyletic but several orders, families and genera, including taxa proposed long ago such as Bacteroides, Cytophaga, and Flavobacterium but also quite recent taxa, as well as a few species were shown to be in need of revision. Corresponding proposals are made for the recognition of new orders, families and genera, as well as the transfer of a variety of species to other genera. In addition, emended descriptions are given for many species mainly involving information on DNA G+C content and (approximate) genome size, both of which can be considered valuable taxonomic markers. We detected many incongruities when comparing the results of the present study with existing classifications, which appear to be caused by insufficiently resolved 16S rRNA gene trees or incomplete taxon sampling. The few significant incongruities found between 16S rRNA gene and whole genome trees underline the pitfalls inherent in phylogenies based upon single gene sequences and the impediment in using ordinary bootstrapping in phylogenomic studies, particularly when combined with too narrow gene selections. While a significant degree of phylogenetic conservation was detected in all phenotypic characters investigated, the overall fit to the tree varied considerably, which is one of the probable causes of misclassifications in the past, much like the use of plesiomorphic character states as diagnostic features.
Affiliation(s)
- Marina García-López
  - Department of Microorganisms, Leibniz Institute DSMZ – German Collection of Microorganisms and Cell Cultures, Braunschweig, Germany
- Jan P. Meier-Kolthoff
  - Department of Microorganisms, Leibniz Institute DSMZ – German Collection of Microorganisms and Cell Cultures, Braunschweig, Germany
- Brian J. Tindall
  - Department of Microorganisms, Leibniz Institute DSMZ – German Collection of Microorganisms and Cell Cultures, Braunschweig, Germany
- Sabine Gronow
  - Department of Microorganisms, Leibniz Institute DSMZ – German Collection of Microorganisms and Cell Cultures, Braunschweig, Germany
- Tanja Woyke
  - Department of Energy, Joint Genome Institute, Walnut Creek, CA, United States
- Nikos C. Kyrpides
  - Department of Energy, Joint Genome Institute, Walnut Creek, CA, United States
- Richard L. Hahnke
  - Department of Microorganisms, Leibniz Institute DSMZ – German Collection of Microorganisms and Cell Cultures, Braunschweig, Germany
- Markus Göker
  - Department of Microorganisms, Leibniz Institute DSMZ – German Collection of Microorganisms and Cell Cultures, Braunschweig, Germany
2
Meier-Kolthoff JP, Göker M. TYGS is an automated high-throughput platform for state-of-the-art genome-based taxonomy. Nat Commun 2019; 10:2182. [PMID: 31097708 PMCID: PMC6522516 DOI: 10.1038/s41467-019-10210-3] [Citation(s) in RCA: 1604] [Impact Index Per Article: 320.8] [Received: 08/17/2018] [Accepted: 04/29/2019] [Indexed: 02/07/2023] Open
Abstract
Microbial taxonomy is increasingly influenced by genome-based computational methods. Yet such analyses can be complex and require expert knowledge. Here we introduce TYGS, the Type (Strain) Genome Server, a user-friendly high-throughput web server for genome-based prokaryote taxonomy, connected to a large, continuously growing database of genomic, taxonomic and nomenclatural information. It infers genome-scale phylogenies and state-of-the-art estimates for species and subspecies boundaries from user-defined and automatically determined closest type genome sequences. TYGS also provides comprehensive access to nomenclature, synonymy and associated taxonomic literature. Clinically important examples demonstrate how TYGS can yield new insights into microbial classification, such as evidence for a species-level separation of previously proposed subspecies of Salmonella enterica. TYGS is an integrated approach for the classification of microbes that unlocks novel scientific approaches to microbiologists worldwide and is particularly helpful for the rapidly expanding field of genome-based taxonomic descriptions of new genera, species or subspecies.
Affiliation(s)
- Jan P Meier-Kolthoff
  - Leibniz Institute DSMZ-German Collection of Microorganisms and Cell Cultures, Inhoffenstraße 7B, 38124, Braunschweig, Germany
- Markus Göker
  - Leibniz Institute DSMZ-German Collection of Microorganisms and Cell Cultures, Inhoffenstraße 7B, 38124, Braunschweig, Germany
3
Ren J, Bai X, Lu YY, Tang K, Wang Y, Reinert G, Sun F. Alignment-Free Sequence Analysis and Applications. Annu Rev Biomed Data Sci 2018; 1:93-114. [PMID: 31828235 PMCID: PMC6905628 DOI: 10.1146/annurev-biodatasci-080917-013431] [Citation(s) in RCA: 58] [Impact Index Per Article: 9.7] [Indexed: 01/06/2023]
Abstract
Genome and metagenome comparisons based on large amounts of next generation sequencing (NGS) data pose significant challenges for alignment-based approaches due to the huge data size and the relatively short length of the reads. Alignment-free approaches based on the counts of word patterns in NGS data do not depend on the complete genome and are generally computationally efficient. Thus, they contribute significantly to genome and metagenome comparison. Recently, novel statistical approaches have been developed for the comparison of both long and shotgun sequences. These approaches have been applied to many problems including the comparison of gene regulatory regions, genome sequences, metagenomes, binning contigs in metagenomic data, identification of virus-host interactions, and detection of horizontal gene transfers. We provide an updated review of these applications and other related developments of word-count based approaches for alignment-free sequence analysis.
Affiliation(s)
- Jie Ren
  - Molecular and Computational Biology Program, University of Southern California, Los Angeles, California, USA
- Xin Bai
  - Molecular and Computational Biology Program, University of Southern California, Los Angeles, California, USA
  - Centre for Computational Systems Biology, School of Mathematical Sciences, Fudan University, Shanghai, China
- Yang Young Lu
  - Molecular and Computational Biology Program, University of Southern California, Los Angeles, California, USA
- Kujin Tang
  - Molecular and Computational Biology Program, University of Southern California, Los Angeles, California, USA
- Ying Wang
  - Department of Automation, Xiamen University, Xiamen, Fujian, China
- Gesine Reinert
  - Department of Statistics, University of Oxford, Oxford, United Kingdom
- Fengzhu Sun
  - Molecular and Computational Biology Program, University of Southern California, Los Angeles, California, USA
  - Centre for Computational Systems Biology, School of Mathematical Sciences, Fudan University, Shanghai, China
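
The word-count idea summarized in the abstract of this entry can be illustrated with a minimal sketch. It is not one of the statistics reviewed by Ren et al.; it only assumes a plain cosine dissimilarity between overlapping k-mer count vectors, and the function names `kmer_counts` and `cosine_dissimilarity` as well as the default `k = 3` are hypothetical choices made for illustration.

```python
from collections import Counter
from math import sqrt


def kmer_counts(seq: str, k: int) -> Counter:
    """Count every overlapping word of length k in seq (case-insensitive)."""
    seq = seq.upper()
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def cosine_dissimilarity(a: str, b: str, k: int = 3) -> float:
    """Return 1 - cosine similarity of the two k-mer count vectors.

    0 means identical word profiles, 1 means no shared words.
    This is an assumed, generic measure, not a statistic from the review.
    """
    ca, cb = kmer_counts(a, k), kmer_counts(b, k)
    # Only words present in both sequences contribute to the dot product.
    dot = sum(ca[w] * cb[w] for w in ca.keys() & cb.keys())
    norm = sqrt(sum(v * v for v in ca.values())) * sqrt(sum(v * v for v in cb.values()))
    if norm == 0:
        raise ValueError("both sequences must be at least k letters long")
    return 1.0 - dot / norm
```

No alignment is computed at any point, which is why such measures scale to short reads and incomplete genomes; the price is that word order beyond length k is ignored.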
4
Beisser D, Graupner N, Bock C, Wodniok S, Grossmann L, Vos M, Sures B, Rahmann S, Boenigk J. Comprehensive transcriptome analysis provides new insights into nutritional strategies and phylogenetic relationships of chrysophytes. PeerJ 2017; 5:e2832. [PMID: 28097055 PMCID: PMC5228505 DOI: 10.7717/peerj.2832] [Citation(s) in RCA: 24] [Impact Index Per Article: 3.4] [Received: 07/25/2016] [Accepted: 11/27/2016] [Indexed: 02/02/2023] Open
Abstract
Background: Chrysophytes are protist model species in ecology and ecophysiology and important grazers of bacteria-sized microorganisms and primary producers. However, they have not yet been investigated in detail at the molecular level, and no genomic and only limited transcriptomic information is available. Chrysophytes exhibit different trophic modes: while phototrophic chrysophytes perform only photosynthesis, mixotrophs can gain carbon from bacterial food as well as from photosynthesis, and heterotrophs solely feed on bacteria-sized microorganisms. Recent phylogenies and megasystematics demonstrate an immense complexity of eukaryotic diversity with numerous transitions between phototrophic and heterotrophic organisms. The question we aim to answer is how the diverse nutritional strategies, accompanied or brought about by a reduction of the plastid and size reduction in heterotrophic strains, affect physiology and molecular processes.
Results: We sequenced the mRNA of 18 chrysophyte strains on the Illumina HiSeq platform and analysed the transcriptomes to determine relations between the trophic mode (mixotrophic vs. heterotrophic) and gene expression. We observed an enrichment of genes for photosynthesis, porphyrin and chlorophyll metabolism for phototrophic and mixotrophic strains that can perform photosynthesis. Genes involved in nutrient absorption, environmental information processing and various transporters (e.g., monosaccharide, peptide, lipid transporters) were present or highly expressed only in heterotrophic strains that have to sense, digest and absorb bacterial food. We furthermore present a transcriptome-based alignment-free phylogeny construction approach using transcripts assembled from short reads to determine the evolutionary relationships between the strains and the possible influence of nutritional strategies on the reconstructed phylogeny. We discuss the resulting phylogenies in comparison to those from established approaches based on ribosomal RNA and orthologous genes. Finally, we make functionally annotated reference transcriptomes of each strain available to the community, significantly enhancing publicly available data on Chrysophyceae.
Conclusions: Our study is the first comprehensive transcriptomic characterisation of a diverse set of Chrysophyceaen strains. In addition, we showcase the possibility of inferring phylogenies from assembled transcriptomes using an alignment-free approach. The raw and functionally annotated data we provide will prove beneficial for further examination of the diversity within this taxon. Our molecular characterisation of different trophic modes presents a first such example.
Affiliation(s)
- Daniela Beisser
  - Genome Informatics, University of Duisburg-Essen, Essen, Germany
- Nadine Graupner
  - Biodiversity, University of Duisburg-Essen, Essen, Germany; Centre for Water and Environmental Research (ZWU), University of Duisburg-Essen, Essen, Germany
- Christina Bock
  - Biodiversity, University of Duisburg-Essen, Essen, Germany; Centre for Water and Environmental Research (ZWU), University of Duisburg-Essen, Essen, Germany
- Sabina Wodniok
  - Biodiversity, University of Duisburg-Essen, Essen, Germany; Centre for Water and Environmental Research (ZWU), University of Duisburg-Essen, Essen, Germany
- Lars Grossmann
  - Biodiversity, University of Duisburg-Essen, Essen, Germany; Centre for Water and Environmental Research (ZWU), University of Duisburg-Essen, Essen, Germany
- Matthijs Vos
  - Theoretical and Applied Biodiversity, Ruhr-University Bochum, Bochum, Germany
- Bernd Sures
  - Aquatic Ecology, University of Duisburg-Essen, Essen, Germany
- Sven Rahmann
  - Genome Informatics, University of Duisburg-Essen, Essen, Germany
- Jens Boenigk
  - Biodiversity, University of Duisburg-Essen, Essen, Germany; Centre for Water and Environmental Research (ZWU), University of Duisburg-Essen, Essen, Germany
5
Hua K, Yu Q, Zhang R. A Guaranteed Similarity Metric Learning Framework for Biological Sequence Comparison. IEEE/ACM Transactions on Computational Biology and Bioinformatics 2016; 13:868-877. [PMID: 26529778 DOI: 10.1109/tcbb.2015.2495186] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Indexed: 06/05/2023]
Abstract
Similarity of sequences is a key mathematical notion for classification and phylogenetic studies in biology. The distance and similarity between two sequences are very important and widely studied. Over the last decades, similarity (distance) metric learning has been one of the hottest topics in machine learning and data mining, as well as in their applications to bioinformatics. It is feasible to introduce machine learning technology to learn a similarity metric from biological data. In this paper, we propose a novel framework of guaranteed similarity metric learning (GMSL) to perform alignment of biological sequences in any feature vector space. It introduces the (ϵ, γ, τ)-goodness similarity theory to Mahalanobis metric learning. As a theoretically guaranteed similarity metric learning approach, GMSL guarantees that the learned similarity function performs well in classification and clustering. Our experiments on the most used datasets demonstrate that our approach outperforms the state-of-the-art biological sequence alignment methods and other similarity metric learning algorithms in both accuracy and stability.
6
Garrido-Sanz D, Meier-Kolthoff JP, Göker M, Martín M, Rivilla R, Redondo-Nieto M. Genomic and Genetic Diversity within the Pseudomonas fluorescens Complex. PLoS One 2016; 11:e0150183. [PMID: 26915094 PMCID: PMC4767706 DOI: 10.1371/journal.pone.0150183] [Citation(s) in RCA: 130] [Impact Index Per Article: 16.3] [Received: 11/02/2015] [Accepted: 02/10/2016] [Indexed: 01/22/2023] Open
Abstract
The Pseudomonas fluorescens complex includes Pseudomonas strains that have been taxonomically assigned to more than fifty different species, many of which have been described as plant growth-promoting rhizobacteria (PGPR) with potential applications in biocontrol and biofertilization. So far the phylogeny of this complex has been analyzed according to phenotypic traits, 16S rDNA, MLSA and inferred by whole-genome analysis. However, since most of the type strains have not been fully sequenced and new species are frequently described, correlation between taxonomy and phylogenomic analysis is missing. In recent years, the genomes of a large number of strains have been sequenced, showing important genomic heterogeneity and providing information suitable for genomic studies that are important to understand the genomic and genetic diversity shown by strains of this complex. Based on MLSA and several whole-genome sequence-based analyses of 93 sequenced strains, we have divided the P. fluorescens complex into eight phylogenomic groups that agree with previous works based on type strains. Digital DDH (dDDH) identified 69 species and 75 subspecies within the 93 genomes. The eight groups corresponded to clustering with a threshold of 31.8% dDDH, in full agreement with our MLSA. The Average Nucleotide Identity (ANI) approach showed inconsistencies regarding the assignment to species and to the eight groups. The small core genome of 1,334 CDSs and the large pan-genome of 30,848 CDSs, show the large diversity and genetic heterogeneity of the P. fluorescens complex. However, a low number of strains were enough to explain most of the CDSs diversity at core and strain-specific genomic fractions. Finally, the identification and analysis of group-specific genome and the screening for distinctive characters revealed a phylogenomic distribution of traits among the groups that provided insights into biocontrol and bioremediation applications as well as their role as PGPR.
Affiliation(s)
- Daniel Garrido-Sanz
  - Departamento de Biología, Facultad de Ciencias, Universidad Autónoma de Madrid, c/Darwin, 2, Madrid, 28049, Spain
- Jan P. Meier-Kolthoff
  - Leibniz Institute DSMZ–German Collection of Microorganisms and Cell Cultures, Inhoffenstraße 7B, 38124, Braunschweig, Germany
- Markus Göker
  - Leibniz Institute DSMZ–German Collection of Microorganisms and Cell Cultures, Inhoffenstraße 7B, 38124, Braunschweig, Germany
- Marta Martín
  - Departamento de Biología, Facultad de Ciencias, Universidad Autónoma de Madrid, c/Darwin, 2, Madrid, 28049, Spain
- Rafael Rivilla
  - Departamento de Biología, Facultad de Ciencias, Universidad Autónoma de Madrid, c/Darwin, 2, Madrid, 28049, Spain
- Miguel Redondo-Nieto
  - Departamento de Biología, Facultad de Ciencias, Universidad Autónoma de Madrid, c/Darwin, 2, Madrid, 28049, Spain
7
Liu Y, Lai Q, Göker M, Meier-Kolthoff JP, Wang M, Sun Y, Wang L, Shao Z. Genomic insights into the taxonomic status of the Bacillus cereus group. Sci Rep 2015; 5:14082. [PMID: 26373441 PMCID: PMC4571650 DOI: 10.1038/srep14082] [Citation(s) in RCA: 169] [Impact Index Per Article: 18.8] [Received: 02/25/2015] [Accepted: 08/17/2015] [Indexed: 02/01/2023] Open
Abstract
The identification and phylogenetic relationships of bacteria within the Bacillus cereus group are controversial. This study aimed at determining the taxonomic affiliations of these strains using the whole-genome sequence-based Genome BLAST Distance Phylogeny (GBDP) approach. The GBDP analysis clearly separated 224 strains into 30 clusters, representing eleven known, partially merged species and accordingly 19–20 putative novel species. Additionally, 16S rRNA gene analysis, a novel variant of multi-locus sequence analysis (nMLSA) and screening of virulence genes were performed. The 16S rRNA gene sequence was not sufficient to differentiate the bacteria within this group due to its high conservation. The nMLSA results were consistent with GBDP. Moreover, a fast typing method was proposed using the pycA gene, and where necessary, the ccpA gene. The pXO plasmids and cry genes were widely distributed, suggesting little correlation with the phylogenetic positions of the host bacteria. This might explain why classifications based on virulence characteristics proved unsatisfactory in the past. In summary, this is the first large-scale and systematic study of the taxonomic status of the bacteria within the B. cereus group using whole-genome sequences, and is likely to contribute to further insights into their pathogenicity, phylogeny and adaptation to diverse environments.
Affiliation(s)
- Yang Liu
  - State Key Laboratory Breeding Base of Marine Genetic Resources; Key Laboratory of Marine Genetic Resources, Third Institute of Oceanography, SOA; South China Sea Bio-Resource Exploitation and Utilization Collaborative Innovation Centre; Fujian Collaborative Innovation Center for Exploitation and Utilization of Marine Biological Resources; Key Laboratory of Marine Genetic Resources of Fujian Province, Xiamen 361005, China
- Qiliang Lai
  - State Key Laboratory Breeding Base of Marine Genetic Resources; Key Laboratory of Marine Genetic Resources, Third Institute of Oceanography, SOA; South China Sea Bio-Resource Exploitation and Utilization Collaborative Innovation Centre; Fujian Collaborative Innovation Center for Exploitation and Utilization of Marine Biological Resources; Key Laboratory of Marine Genetic Resources of Fujian Province, Xiamen 361005, China
- Markus Göker
  - Leibniz Institute DSMZ-German Collection of Microorganisms and Cell Cultures GmbH, Inhoffenstraße 7B, 38124, Braunschweig, Germany
- Jan P Meier-Kolthoff
  - Leibniz Institute DSMZ-German Collection of Microorganisms and Cell Cultures GmbH, Inhoffenstraße 7B, 38124, Braunschweig, Germany
- Meng Wang
  - TEDA School of Biological Sciences and Biotechnology, Nankai University, Tianjin, China
- Yamin Sun
  - TEDA School of Biological Sciences and Biotechnology, Nankai University, Tianjin, China
- Lei Wang
  - TEDA School of Biological Sciences and Biotechnology, Nankai University, Tianjin, China
- Zongze Shao
  - State Key Laboratory Breeding Base of Marine Genetic Resources; Key Laboratory of Marine Genetic Resources, Third Institute of Oceanography, SOA; South China Sea Bio-Resource Exploitation and Utilization Collaborative Innovation Centre; Fujian Collaborative Innovation Center for Exploitation and Utilization of Marine Biological Resources; Key Laboratory of Marine Genetic Resources of Fujian Province, Xiamen 361005, China
8
Yin C, Yau SST. An improved model for whole genome phylogenetic analysis by Fourier transform. J Theor Biol 2015; 382:99-110. [PMID: 26151589 DOI: 10.1016/j.jtbi.2015.06.033] [Citation(s) in RCA: 20] [Impact Index Per Article: 2.2] [Received: 03/03/2015] [Revised: 06/19/2015] [Accepted: 06/22/2015] [Indexed: 01/07/2023]
Abstract
DNA sequence similarity comparison is one of the major steps in computational phylogenetic studies. The sequence comparison of closely related DNA sequences and genomes is usually performed by multiple sequence alignments (MSA). While the MSA method is accurate for some types of sequences, it may produce incorrect results when DNA sequences have undergone rearrangements, as in many bacterial and viral genomes. It is also limited by its computational complexity for comparing large volumes of data. Previously, we proposed an alignment-free method that exploits the full information contents of DNA sequences by Discrete Fourier Transform (DFT), but still with some limitations. Here, we present a significantly improved method for the similarity comparison of DNA sequences by DFT. In this method, we map DNA sequences into 2-dimensional (2D) numerical sequences and then apply DFT to transform the 2D numerical sequences into the frequency domain. In the 2D mapping, the nucleotide composition of a DNA sequence is a determinant factor, and the 2D mapping reduces the nucleotide composition bias in the distance measure, thus improving the similarity measure of DNA sequences. To compare the DFT power spectra of DNA sequences with different lengths, we propose an improved even scaling algorithm to extend shorter DFT power spectra to the longest length of the underlying sequences. After the DFT power spectra are evenly scaled, the spectra have the same dimensionality in the Fourier frequency space, and the Euclidean distances between the full Fourier power spectra of the DNA sequences are used as the dissimilarity metrics. The improved DFT method, with increased computational performance by 2D numerical representation, is applicable to DNA sequences of any length range. We assess the accuracy of the improved DFT similarity measure in hierarchical clustering of different DNA sequences including simulated and real datasets. The method yields accurate and reliable phylogenetic trees and demonstrates that the improved DFT dissimilarity measure is an efficient and effective similarity measure of DNA sequences. Due to its high efficiency and accuracy, the proposed DFT similarity measure is successfully applied to phylogenetic analysis of individual genes and large whole bacterial genomes.
Affiliation(s)
- Changchuan Yin
  - Department of Mathematics, Statistics and Computer Science, The University of Illinois at Chicago, Chicago, IL 60607-7045, USA
- Stephen S-T Yau
  - Department of Mathematical Sciences, Tsinghua University, Beijing 100084, China
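
The pipeline outlined in the abstract of this entry (numerical mapping, DFT, power spectrum, scaling to a common length, Euclidean distance) can be sketched roughly as follows. Several parts are assumptions rather than the authors' method: the abstract does not give the 2D mapping, so the `ENCODING` table is a hypothetical complex-plane encoding, and `stretch` is plain linear interpolation standing in for the paper's improved even scaling algorithm. The DFT is written naively in O(n²) to stay dependency-free, so it is only suitable for short, non-empty toy sequences.

```python
import cmath
from math import sqrt

# Hypothetical 2D (complex-plane) encoding; the abstract does not specify the real mapping.
ENCODING = {"A": 1 + 0j, "T": -1 + 0j, "C": 0 + 1j, "G": 0 - 1j}


def power_spectrum(seq: str) -> list[float]:
    """Naive O(n^2) DFT of the encoded sequence; returns |X(k)|^2 for k = 0..n-1."""
    x = [ENCODING[base] for base in seq.upper()]
    n = len(x)
    spectrum = []
    for k in range(n):
        coeff = sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
        spectrum.append(abs(coeff) ** 2)
    return spectrum


def stretch(spectrum: list[float], m: int) -> list[float]:
    """Linearly interpolate a spectrum of length n onto m >= n points (assumed scaling)."""
    n = len(spectrum)
    if n == m:
        return list(spectrum)
    out = []
    for k in range(m):
        q = k * (n - 1) / (m - 1)  # fractional position in the original index range
        lo = int(q)
        hi = min(lo + 1, n - 1)
        out.append(spectrum[lo] + (q - lo) * (spectrum[hi] - spectrum[lo]))
    return out


def dft_distance(a: str, b: str) -> float:
    """Euclidean distance between the two power spectra after scaling to equal length."""
    pa, pb = power_spectrum(a), power_spectrum(b)
    m = max(len(pa), len(pb))
    pa, pb = stretch(pa, m), stretch(pb, m)
    return sqrt(sum((u - v) ** 2 for u, v in zip(pa, pb)))
```

A pairwise matrix of such distances could then be passed to any hierarchical clustering routine to obtain a tree, which is the use the abstract describes.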
9
Meier-Kolthoff JP, Hahnke RL, Petersen J, Scheuner C, Michael V, Fiebig A, Rohde C, Rohde M, Fartmann B, Goodwin LA, Chertkov O, Reddy TBK, Pati A, Ivanova NN, Markowitz V, Kyrpides NC, Woyke T, Göker M, Klenk HP. Complete genome sequence of DSM 30083(T), the type strain (U5/41(T)) of Escherichia coli, and a proposal for delineating subspecies in microbial taxonomy. Stand Genomic Sci 2014; 9:2. [PMID: 25780495 PMCID: PMC4334874 DOI: 10.1186/1944-3277-9-2] [Citation(s) in RCA: 364] [Impact Index Per Article: 36.4] [Received: 06/06/2014] [Accepted: 06/16/2014] [Indexed: 12/02/2022] Open
Abstract
Although Escherichia coli is the most widely studied bacterial model organism and often considered to be the model bacterium per se, its type strain had until now been overlooked in microbial genomics. As a part of the Genomic Encyclopedia of Bacteria and Archaea project, we here describe the features of E. coli DSM 30083(T) together with its genome sequence and annotation as well as novel aspects of its phenotype. The genome sequence, comprising 5,038,133 bp, includes 4,762 protein-coding genes and 175 RNA genes as well as a single plasmid. Affiliation of a set of 250 genome-sequenced E. coli strains, Shigella and outgroup strains to the type strain of E. coli was investigated using digital DNA:DNA-hybridization (dDDH) similarities and differences in genomic G+C content. As in the majority of previous studies, results show Shigella spp. embedded within E. coli and in most cases forming a single subgroup of it. Phylogenomic trees also recover the proposed E. coli phylotypes as monophyla with minor exceptions and place DSM 30083(T) in phylotype B2 with E. coli S88 as its closest neighbor. The widely used lab strain K-12 is not only genomically but also physiologically strongly different from the type strain. The phylotypes do not, however, express a uniform level of character divergence as measured using dDDH; thus, an alternative arrangement is proposed and discussed in the context of bacterial subspecies. Analyses of the genome sequences of a large number of E. coli strains and of strains from > 100 other bacterial genera indicate a value of 79-80% dDDH as the most promising threshold for delineating subspecies, which in turn suggests the presence of five subspecies within E. coli.
Affiliation(s)
- Jan P Meier-Kolthoff
  - Leibniz Institute DSMZ – German Collection of Microorganisms and Cell Cultures, Inhoffenstraße 7B, 38124 Braunschweig, Germany
- Richard L Hahnke
  - Leibniz Institute DSMZ – German Collection of Microorganisms and Cell Cultures, Inhoffenstraße 7B, 38124 Braunschweig, Germany
- Jörn Petersen
  - Leibniz Institute DSMZ – German Collection of Microorganisms and Cell Cultures, Inhoffenstraße 7B, 38124 Braunschweig, Germany
- Carmen Scheuner
  - Leibniz Institute DSMZ – German Collection of Microorganisms and Cell Cultures, Inhoffenstraße 7B, 38124 Braunschweig, Germany
- Victoria Michael
  - Leibniz Institute DSMZ – German Collection of Microorganisms and Cell Cultures, Inhoffenstraße 7B, 38124 Braunschweig, Germany
- Anne Fiebig
  - Leibniz Institute DSMZ – German Collection of Microorganisms and Cell Cultures, Inhoffenstraße 7B, 38124 Braunschweig, Germany
- Christine Rohde
  - Leibniz Institute DSMZ – German Collection of Microorganisms and Cell Cultures, Inhoffenstraße 7B, 38124 Braunschweig, Germany
- Manfred Rohde
  - Helmholtz Centre for Infection Research, Inhoffenstraße 7, 38124 Braunschweig, Germany
- TBK Reddy
  - DOE Joint Genome Institute, Walnut Creek, CA, USA
- Amrita Pati
  - DOE Joint Genome Institute, Walnut Creek, CA, USA
- Nikos C Kyrpides
  - DOE Joint Genome Institute, Walnut Creek, CA, USA
  - Department of Biological Sciences, King Abdulaziz University, Jeddah, Saudi Arabia
- Tanja Woyke
  - DOE Joint Genome Institute, Walnut Creek, CA, USA
- Markus Göker
  - Leibniz Institute DSMZ – German Collection of Microorganisms and Cell Cultures, Inhoffenstraße 7B, 38124 Braunschweig, Germany
- Hans-Peter Klenk
  - Leibniz Institute DSMZ – German Collection of Microorganisms and Cell Cultures, Inhoffenstraße 7B, 38124 Braunschweig, Germany
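
How a dDDH threshold turns pairwise values into groups, as in the subspecies proposal of this entry, can be shown with a small sketch. It does not compute dDDH itself (that requires a genome comparison service or tool), and it assumes single-linkage grouping, which the abstract does not state. The strain names and percentages are invented for illustration; 79% follows the 79-80% subspecies range in the abstract, while 70% is the conventional species-level cutoff and is not taken from the abstract.

```python
def cluster_by_threshold(names, pairwise, threshold=79.0):
    """Single-linkage grouping: link two genomes when their dDDH value is >= threshold.

    `pairwise` maps (name_a, name_b) tuples to dDDH percentages.
    """
    parent = {name: name for name in names}

    def find(x):
        # Follow parent links to the group representative, shortening the path on the way.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for (a, b), value in pairwise.items():
        if value >= threshold:
            parent[find(a)] = find(b)

    groups = {}
    for name in names:
        groups.setdefault(find(name), []).append(name)
    return sorted(sorted(g) for g in groups.values())


# Made-up dDDH percentages for four hypothetical strains, for illustration only.
example = {
    ("s1", "s2"): 92.5,
    ("s1", "s3"): 74.0,
    ("s2", "s3"): 75.1,
    ("s1", "s4"): 40.2,
    ("s2", "s4"): 41.0,
    ("s3", "s4"): 39.8,
}
names = ["s1", "s2", "s3", "s4"]
subspecies = cluster_by_threshold(names, example, threshold=79.0)
species = cluster_by_threshold(names, example, threshold=70.0)
```

With these invented values, `s3` shares a species-level group with `s1` and `s2` at 70% but falls into its own group at 79%, which is the kind of within-species split the subspecies threshold is meant to expose.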
10
Schwende I, Pham TD. Pattern recognition and probabilistic measures in alignment-free sequence analysis. Brief Bioinform 2013; 15:354-68. [PMID: 24096012 DOI: 10.1093/bib/bbt070] [Citation(s) in RCA: 19] [Impact Index Per Article: 1.7] [Indexed: 12/30/2022] Open
Abstract
With the massive production of genomic and proteomic data, the number of available biological sequences in databases has reached a level that is not feasible anymore for exact alignments even when just a fraction of all sequences is used. To overcome this inevitable time complexity, ultrafast alignment-free methods are studied. Within the past two decades, a broad variety of nonalignment methods have been proposed including dissimilarity measures on classical representations of sequences like k-words or Markov models. Furthermore, articles were published that describe distance measures on alternative representations such as compression complexity, spectral time series or chaos game representation. However, alignments are still the standard method for real world applications in biological sequence analysis, and the time efficient alignment-free approaches are usually applied in cases when the accustomed algorithms turn out to fail or be too inconvenient.
Affiliation(s)
- Isabel Schwende
  - PhD, Aizu Research Cluster for Medical Informatics and Engineering (ARC-Medical), Research Center for Advanced Information Science and Technology (CAIST), The University of Aizu, Aizuwakamatsu, Fukushima 965-8580, Japan