1. Darrington M, Leftwich PT, Holmes NA, Friend LA, Clarke NVE, Worsley SF, Margaritopolous JT, Hogenhout SA, Hutchings MI, Chapman T. Characterisation of the symbionts in the Mediterranean fruit fly gut. Microb Genom 2022; 8. [PMID: 35446250 PMCID: PMC9453069 DOI: 10.1099/mgen.0.000801]
Abstract
Symbioses between bacteria and their insect hosts can range from loose associations through to obligate interdependence. While fundamental evolutionary insights have been gained from the in-depth study of obligate mutualisms, there is increasing interest in the evolutionary potential of flexible symbiotic associations between hosts and their gut microbiomes. Understanding relationships between microbes and hosts also offers the potential for exploitation for insect control. Here, we investigate the gut microbiome of a global agricultural pest, the Mediterranean fruit fly (Ceratitis capitata). We used 16S rRNA profiling to compare the gut microbiomes of laboratory and wild strains raised on different diets and from flies collected from various natural plant hosts. The results showed that medfly guts harbour a simple microbiome that is primarily determined by the larval diet. However, regardless of the laboratory diet or natural plant host on which flies were raised, Klebsiella spp. dominated medfly microbiomes and were resistant to removal by antibiotic treatment. We sequenced the genome of the dominant putative Klebsiella spp. (‘Medkleb’) isolated from the gut of the Toliman wild-type strain. Genome-wide ANI analysis placed Medkleb within the K. oxytoca / michiganensis group. Species level taxonomy for Medkleb was resolved using a multi-locus phylogenetic approach, and molecular, sequence and phenotypic analyses all supported its identity as K. michiganensis. Medkleb has a genome size (5825435 bp) which is 1.6 standard deviations smaller than the mean genome size of free-living Klebsiella spp. Medkleb also lacks some genes involved in environmental sensing. Moreover, the Medkleb genome contains at least two recently acquired unique genomic islands as well as genes that encode pectinolytic enzymes capable of degrading plant cell walls. This may be advantageous given that the medfly diet includes unripe fruits containing high proportions of pectin. The results suggest that the medfly harbours a commensal gut bacterium that may have developed a mutualistic association with its host and provide nutritional benefits.
Affiliation(s)
- Mike Darrington
  - School of Biological Sciences, University of East Anglia, Norwich Research Park, Norwich, NR4 7TJ, UK
- Philip T Leftwich
  - School of Biological Sciences, University of East Anglia, Norwich Research Park, Norwich, NR4 7TJ, UK
- Neil A Holmes
  - School of Biological Sciences, University of East Anglia, Norwich Research Park, Norwich, NR4 7TJ, UK
  - Department of Molecular Microbiology, John Innes Centre, Norwich Research Park, Norwich, NR4 7UH, UK
- Lucy A Friend
  - School of Biological Sciences, University of East Anglia, Norwich Research Park, Norwich, NR4 7TJ, UK
- Naomi V E Clarke
  - School of Biological Sciences, University of East Anglia, Norwich Research Park, Norwich, NR4 7TJ, UK
- Sarah F Worsley
  - School of Biological Sciences, University of East Anglia, Norwich Research Park, Norwich, NR4 7TJ, UK
- John T Margaritopolous
  - Department of Plant Protection, Institute of Industrial and Fodder Crops, Hellenic Agricultural Organization-DEMETER, Volos, Greece
- Saskia A Hogenhout
  - Department of Crop Genetics, John Innes Centre, Norwich Research Park, NR4 7UH, Norwich, UK
- Matthew I Hutchings
  - School of Biological Sciences, University of East Anglia, Norwich Research Park, Norwich, NR4 7TJ, UK
  - Department of Molecular Microbiology, John Innes Centre, Norwich Research Park, Norwich, NR4 7UH, UK
- Tracey Chapman
  - School of Biological Sciences, University of East Anglia, Norwich Research Park, Norwich, NR4 7TJ, UK
2. Goussarov G, Cleenwerck I, Mysara M, Leys N, Monsieurs P, Tahon G, Carlier A, Vandamme P, Van Houdt R. PaSiT: a novel approach based on short-oligonucleotide frequencies for efficient bacterial identification and typing. Bioinformatics 2020; 36:2337-2344. [PMID: 31899493 PMCID: PMC7178395 DOI: 10.1093/bioinformatics/btz964] [Received: 08/19/2019] [Revised: 11/21/2019] [Accepted: 12/30/2019]
Abstract
MOTIVATION One of the most widespread methods used in taxonomy studies to distinguish between strains or taxa is the calculation of average nucleotide identity. It requires a computationally expensive alignment step and is therefore not suitable for large-scale comparisons. Short oligonucleotide-based methods offer a faster alternative, but at the expense of accuracy. Here, we aim to address this shortcoming by providing software that implements a novel method based on short-oligonucleotide frequencies to compute inter-genomic distances. RESULTS Our tetranucleotide and hexanucleotide implementations, which were optimized based on a taxonomically well-defined set of over 200 newly sequenced bacterial genomes, are as accurate as the short oligonucleotide-based method TETRA and average nucleotide identity, for identifying bacterial species and strains, respectively. Moreover, the lightweight nature of this method makes it applicable for large-scale analyses. AVAILABILITY AND IMPLEMENTATION The method introduced here was implemented, together with other existing methods, in GenDisCal, a dependency-free program written in C and available as source code from https://github.com/LM-UGent/GenDisCal. The software supports multithreading and has been tested on Windows and Linux (CentOS). In addition, a Java-based graphical user interface that acts as a wrapper for the software is also available. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
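Illustrative sketch (not part of the abstract): the general idea of comparing genomes by short-oligonucleotide frequencies can be shown with a plain tetranucleotide profile and a Euclidean distance. This is a generic example and is not the PaSiT or TETRA algorithm; the function names `kmer_frequencies` and `frequency_distance` and the choice of Euclidean distance are assumptions made here for illustration only.

```python
from collections import Counter
from itertools import product
import math


def kmer_frequencies(seq, k=4):
    """Relative frequency of every DNA k-mer (single strand) in seq."""
    seq = seq.upper()
    counts = Counter(
        seq[i:i + k]
        for i in range(len(seq) - k + 1)
        if set(seq[i:i + k]) <= set("ACGT")  # skip windows with ambiguous bases
    )
    total = sum(counts.values())
    all_kmers = ("".join(p) for p in product("ACGT", repeat=k))
    return {kmer: (counts[kmer] / total if total else 0.0) for kmer in all_kmers}


def frequency_distance(seq_a, seq_b, k=4):
    """Euclidean distance between two k-mer frequency profiles (hypothetical metric)."""
    fa = kmer_frequencies(seq_a, k)
    fb = kmer_frequencies(seq_b, k)
    return math.sqrt(sum((fa[kmer] - fb[kmer]) ** 2 for kmer in fa))
```

Because no alignment is needed, the cost grows linearly with sequence length, which is the property the abstract contrasts with average nucleotide identity.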
Affiliation(s)
- Gleb Goussarov
  - Microbiology Unit, Belgian Nuclear Research Centre (SCK•CEN), Mol, Belgium
  - Laboratory of Microbiology and BCCM/LMG Bacteria Collection, Department of Biochemistry and Microbiology, Faculty of Sciences, Ghent University, Ghent, Belgium
- Ilse Cleenwerck
  - Laboratory of Microbiology and BCCM/LMG Bacteria Collection, Department of Biochemistry and Microbiology, Faculty of Sciences, Ghent University, Ghent, Belgium
- Mohamed Mysara
  - Microbiology Unit, Belgian Nuclear Research Centre (SCK•CEN), Mol, Belgium
- Natalie Leys
  - Microbiology Unit, Belgian Nuclear Research Centre (SCK•CEN), Mol, Belgium
- Pieter Monsieurs
  - Microbiology Unit, Belgian Nuclear Research Centre (SCK•CEN), Mol, Belgium
- Guillaume Tahon
  - Laboratory of Microbiology and BCCM/LMG Bacteria Collection, Department of Biochemistry and Microbiology, Faculty of Sciences, Ghent University, Ghent, Belgium
- Aurélien Carlier
  - Laboratory of Microbiology and BCCM/LMG Bacteria Collection, Department of Biochemistry and Microbiology, Faculty of Sciences, Ghent University, Ghent, Belgium
  - LIPM, Université de Toulouse, INRAE, CNRS, Castanet-Tolosan, France
- Peter Vandamme
  - Laboratory of Microbiology and BCCM/LMG Bacteria Collection, Department of Biochemistry and Microbiology, Faculty of Sciences, Ghent University, Ghent, Belgium
- Rob Van Houdt
  - Microbiology Unit, Belgian Nuclear Research Centre (SCK•CEN), Mol, Belgium
3. Bohlin J, Pettersson JHO. Evolution of Genomic Base Composition: From Single Cell Microbes to Multicellular Animals. Comput Struct Biotechnol J 2019; 17:362-370. [PMID: 30949307 PMCID: PMC6429543 DOI: 10.1016/j.csbj.2019.03.001] [Received: 09/28/2018] [Revised: 02/28/2019] [Accepted: 03/01/2019]
Abstract
Whole genome sequencing (WGS) of thousands of microbial genomes has provided considerable insight into evolutionary mechanisms in the microbial world. While substantially fewer eukaryotic genomes are available for analysis, the number is rapidly increasing. This mini-review broadly summarizes the evolutionary dynamics of base composition in the different domains of life from the perspective of prokaryotes. Common and different evolutionary mechanisms influencing genomic base composition in eukaryotes and prokaryotes are discussed. The data currently available suggest that, while there are similarities, there are also striking differences in how genomic base composition has evolved within prokaryotes and eukaryotes. For instance, homologous recombination appears to increase GC content locally in eukaryotes due to a non-selective process termed GC-biased gene conversion (gBGC). For prokaryotes, on the other hand, an increase in genomic GC content seems to be driven by the environment and selection. We find that similar phenomena observed for some organisms in each respective domain may be caused by very different mechanisms: while gBGC and recombination rates appear to explain the negative correlation between GC3 (GC content based on the third codon nucleotides) and genome size in some eukaryotes, uptake of AT-rich DNA sequences is the main reason for a similar negative correlation observed in prokaryotes. We provide further examples that indicate that base composition in prokaryotes and eukaryotes has evolved under very different constraints.
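Illustrative sketch (not part of the abstract): the GC3 quantity defined above, GC content at third codon positions, can be computed from an in-frame coding sequence as follows. The helper names `gc_content` and `gc3` are hypothetical, and the code assumes the sequence starts at codon position 1.

```python
def gc_content(seq):
    """Fraction of G and C among the unambiguous bases (A, C, G, T) in seq."""
    seq = seq.upper()
    acgt = sum(seq.count(base) for base in "ACGT")
    return (seq.count("G") + seq.count("C")) / acgt if acgt else 0.0


def gc3(cds):
    """GC content at third codon positions of an in-frame coding sequence."""
    usable = len(cds) - len(cds) % 3   # ignore a trailing partial codon
    third_positions = cds[2:usable:3]  # bases at codon position 3
    return gc_content(third_positions)
```

For the toy sequence `ATGGCCTAA` the third-position bases are G, C and A, so `gc3` returns 2/3.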
Affiliation(s)
- Jon Bohlin
  - Norwegian Institute of Public Health, Division of Infection Control and Environmental Health, Department of Infectious Disease Epidemiology and Modelling, Lovisenberggata 8, 0456 Oslo, Norway
  - Centre for Fertility and Health, Norwegian Institute of Public Health, PO-Box 222 Skøyen, N-0213 Oslo, Norway
  - Norwegian University of Life Sciences, Faculty of Veterinary Sciences, Production Animal Clinical Sciences, Ullevålsveien 72, 0454 Oslo, Norway
- John H-O Pettersson
  - Marie Bashir Institute for Infectious Diseases and Biosecurity, Charles Perkins Centre, School of Life and Environmental Sciences and Sydney Medical School, the University of Sydney, New South Wales 2006, Australia
  - Zoonosis Science Center, Department of Medical Biochemistry and Microbiology, Uppsala University, Uppsala, Sweden
  - Public Health Agency of Sweden, Nobels väg 18, SE-171 82 Solna, Sweden
4. Bohlin J, Eldholm V, Brynildsrud O, Petterson JHO, Alfsnes K. Modeling of the GC content of the substituted bases in bacterial core genomes. BMC Genomics 2018; 19:589. [PMID: 30081825 PMCID: PMC6080486 DOI: 10.1186/s12864-018-4984-3] [Received: 03/01/2018] [Accepted: 07/31/2018]
Abstract
Background The purpose of the present study was to examine the GC content of substituted bases (sbGC) in the core genomes of 35 bacterial species. Each species, or core genome, constituted genomes from at least 10 strains. We also wanted to explore whether sbGC for each strain was associated with the corresponding species’ core genome GC content (cgGC). We present a simple mathematical model that estimates sbGC from cgGC. The model assumes only that the estimated sbGC is a function of cgGC proportional to fixed AT→GC (α) and GC → AT (β) mutation rates. Non-linear regression was used to estimate parameters α and β from the empirical data described above. Results We found that sbGC for each strain showed a non-linear association with the corresponding cgGC with a bias towards higher GC content for most core genomes (66.3% of the strains), assuming as a null-hypothesis that sbGC should be approximately equal to cgGC. The most GC rich core genomes (i.e. approximately %GC > 60), on the other hand, exhibited slightly less GC-biased sbGC than expected. The best fitted regression model indicates that GC → AT mutation rates β = (1.91 ± 0.13) p < 0.001 are approximately (1.91/0.79) = 2.42 times as high, on average, as AT→GC α = (− 0.79 ± 0.25) p < 0.001 mutation rates. Whether the observed sbGC GC-bias for all but the most GC-rich prokaryotic species is due to selection, compensating for the GC → AT mutation bias, and/or selective neutral processes is currently debated. Residual standard error was found to be σ = 0.076 indicating estimated errors of sbGC to be approximately within ±15.2% GC (95% confidence interval) for the strains of all species in the study. Conclusion Not only did our mathematical model give reasonable estimates of sbGC it also provides further support to previous observations that mutation rates in prokaryotes exhibit a universal GC → AT bias that appears to be remarkably consistent between taxa. 
Electronic supplementary material The online version of this article (10.1186/s12864-018-4984-3) contains supplementary material, which is available to authorized users.
Affiliation(s)
- Jon Bohlin
  - Norwegian Institute of Public Health, Lovisenberggata 8, P.O. Box 4404, 0403, Oslo, Norway
- Vegard Eldholm
  - Norwegian Institute of Public Health, Lovisenberggata 8, P.O. Box 4404, 0403, Oslo, Norway
- Ola Brynildsrud
  - Norwegian Institute of Public Health, Lovisenberggata 8, P.O. Box 4404, 0403, Oslo, Norway
- John H-O Petterson
  - Norwegian Institute of Public Health, Lovisenberggata 8, P.O. Box 4404, 0403, Oslo, Norway
- Kristian Alfsnes
  - Norwegian Institute of Public Health, Lovisenberggata 8, P.O. Box 4404, 0403, Oslo, Norway
5. Yu X, Reva ON. SWPhylo - A Novel Tool for Phylogenomic Inferences by Comparison of Oligonucleotide Patterns and Integration of Genome-Based and Gene-Based Phylogenetic Trees. Evol Bioinform Online 2018; 14:1176934318759299. [PMID: 29511354 PMCID: PMC5826093 DOI: 10.1177/1176934318759299] [Received: 11/14/2017] [Accepted: 01/24/2018]
Abstract
Modern phylogenetic studies may benefit from the analysis of complete genome sequences of various microorganisms. Evolutionary inferences based on genome-scale analysis are believed to be more accurate than the gene-based alternative. However, the computational complexity of current phylogenomic procedures, the unsuitability of standard phylogenetic tools for processing genome-wide data, and the lack of reliable substitution models that correlate with alignment-free phylogenomic approaches deter microbiologists from using these opportunities. For example, the super-matrix and super-tree approaches of phylogenomics use multiple integrated genomic loci or individual gene-based trees to infer an overall consensus tree. However, these approaches potentially multiply errors of gene annotation and sequence alignment, not to mention the computational complexity and laboriousness of the methods. In this article, we demonstrate that the annotation- and alignment-free comparison of genome-wide tetranucleotide frequencies, termed oligonucleotide usage patterns (OUPs), allowed a fast and reliable inference of phylogenetic trees. These were congruent to the corresponding whole genome super-matrix trees in terms of tree topology when compared with other known approaches including 16S ribosomal RNA and GyrA protein sequence comparison, complete genome-based MAUVE, and CVTree methods. A Web-based program to perform the alignment-free OUP-based phylogenomic inferences was implemented at http://swphylo.bi.up.ac.za/. Applicability of the tool was tested on different taxa from subspecies to intergeneric levels. Distinguishing between closely related taxonomic units may be enforced by providing the program with alignments of marker protein sequences, eg, GyrA.
Affiliation(s)
- Xiaoyu Yu
  - Department of Biochemistry, Centre for Bioinformatics and Computational Biology, University of Pretoria, Pretoria, South Africa
- Oleg N Reva
  - Department of Biochemistry, Centre for Bioinformatics and Computational Biology, University of Pretoria, Pretoria, South Africa
6. Beisser D, Graupner N, Bock C, Wodniok S, Grossmann L, Vos M, Sures B, Rahmann S, Boenigk J. Comprehensive transcriptome analysis provides new insights into nutritional strategies and phylogenetic relationships of chrysophytes. PeerJ 2017; 5:e2832. [PMID: 28097055 PMCID: PMC5228505 DOI: 10.7717/peerj.2832] [Received: 07/25/2016] [Accepted: 11/27/2016]
Abstract
Background Chrysophytes are protist model species in ecology and ecophysiology and important grazers of bacteria-sized microorganisms and primary producers. However, they have not yet been investigated in detail at the molecular level, and no genomic and only little transcriptomic information is available. Chrysophytes exhibit different trophic modes: while phototrophic chrysophytes perform only photosynthesis, mixotrophs can gain carbon from bacterial food as well as from photosynthesis, and heterotrophs solely feed on bacteria-sized microorganisms. Recent phylogenies and megasystematics demonstrate an immense complexity of eukaryotic diversity with numerous transitions between phototrophic and heterotrophic organisms. The question we aim to answer is how the diverse nutritional strategies, accompanied or brought about by a reduction of the plasmid and size reduction in heterotrophic strains, affect physiology and molecular processes. Results We sequenced the mRNA of 18 chrysophyte strains on the Illumina HiSeq platform and analysed the transcriptomes to determine relations between the trophic mode (mixotrophic vs. heterotrophic) and gene expression. We observed an enrichment of genes for photosynthesis, porphyrin and chlorophyll metabolism for phototrophic and mixotrophic strains that can perform photosynthesis. Genes involved in nutrient absorption, environmental information processing and various transporters (e.g., monosaccharide, peptide, lipid transporters) were present or highly expressed only in heterotrophic strains that have to sense, digest and absorb bacterial food. We furthermore present a transcriptome-based alignment-free phylogeny construction approach using transcripts assembled from short reads to determine the evolutionary relationships between the strains and the possible influence of nutritional strategies on the reconstructed phylogeny. 
We discuss the resulting phylogenies in comparison to those from established approaches based on ribosomal RNA and orthologous genes. Finally, we make functionally annotated reference transcriptomes of each strain available to the community, significantly enhancing publicly available data on Chrysophyceae. Conclusions Our study is the first comprehensive transcriptomic characterisation of a diverse set of Chrysophyceaen strains. In addition, we showcase the possibility of inferring phylogenies from assembled transcriptomes using an alignment-free approach. The raw and functionally annotated data we provide will prove beneficial for further examination of the diversity within this taxon. Our molecular characterisation of different trophic modes presents a first such example.
Affiliation(s)
- Daniela Beisser
  - Genome Informatics, University of Duisburg-Essen, Essen, Germany
- Nadine Graupner
  - Biodiversity, University of Duisburg-Essen, Essen, Germany
  - Centre for Water and Environmental Research (ZWU), University of Duisburg-Essen, Essen, Germany
- Christina Bock
  - Biodiversity, University of Duisburg-Essen, Essen, Germany
  - Centre for Water and Environmental Research (ZWU), University of Duisburg-Essen, Essen, Germany
- Sabina Wodniok
  - Biodiversity, University of Duisburg-Essen, Essen, Germany
  - Centre for Water and Environmental Research (ZWU), University of Duisburg-Essen, Essen, Germany
- Lars Grossmann
  - Biodiversity, University of Duisburg-Essen, Essen, Germany
  - Centre for Water and Environmental Research (ZWU), University of Duisburg-Essen, Essen, Germany
- Matthijs Vos
  - Theoretical and Applied Biodiversity, Ruhr-University Bochum, Bochum, Germany
- Bernd Sures
  - Aquatic Ecology, University of Duisburg-Essen, Essen, Germany
- Sven Rahmann
  - Genome Informatics, University of Duisburg-Essen, Essen, Germany
- Jens Boenigk
  - Biodiversity, University of Duisburg-Essen, Essen, Germany
  - Centre for Water and Environmental Research (ZWU), University of Duisburg-Essen, Essen, Germany
7. The genome of Pseudomonas fluorescens strain R124 demonstrates phenotypic adaptation to the mineral environment. J Bacteriol 2013; 195:4793-803. [PMID: 23995634 DOI: 10.1128/jb.00825-13]
Abstract
Microbial adaptation to environmental conditions is a complex process, including acquisition of positive traits through horizontal gene transfer or the modification of existing genes through duplication and/or mutation. In this study, we examined the adaptation of a Pseudomonas fluorescens isolate (R124) from the nutrient-limited mineral environment of a silica cave in comparison with P. fluorescens isolates from surface soil and the rhizosphere. Examination of metal homeostasis gene pathways demonstrated a high degree of conservation, suggesting that such systems remain functionally similar across chemical environments. The examination of genomic islands unique to our strain revealed the presence of genes involved in carbohydrate metabolism, aromatic carbon metabolism, and carbon turnover, confirmed through phenotypic assays, suggesting the acquisition of potentially novel mechanisms for energy metabolism in this strain. We also identified a twitching motility phenotype active at low-nutrient concentrations that may allow alternative exploratory mechanisms for this organism in a geochemical environment. Two sets of candidate twitching motility genes are present within the genome, one on the chromosome and one on a plasmid; however, a plasmid knockout identified the functional gene as being present on the chromosome. This work highlights the plasticity of the Pseudomonas genome, allowing the acquisition of novel nutrient-scavenging pathways across diverse geochemical environments while maintaining a core of functional stress response genes.
8. Skewes AD, Welch RD. A Markovian analysis of bacterial genome sequence constraints. PeerJ 2013; 1:e127. [PMID: 24010012 PMCID: PMC3757466 DOI: 10.7717/peerj.127] [Received: 04/27/2013] [Accepted: 07/18/2013]
Abstract
The arrangement of nucleotides within a bacterial chromosome is influenced by numerous factors. The degeneracy of the third codon within each reading frame allows some flexibility of nucleotide selection; however, the third nucleotide in the triplet of each codon is at least partly determined by the preceding two. This is most evident in organisms with a strong G + C bias, as the degenerate codon must contribute disproportionately to maintaining that bias. Therefore, a correlation exists between the first two nucleotides and the third in all open reading frames. If the arrangement of nucleotides in a bacterial chromosome is represented as a Markov process, we would expect that the correlation would be completely captured by a second-order Markov model and an increase in the order of the model (e.g., third-, fourth-…order) would not capture any additional uncertainty in the process. In this manuscript, we present the results of a comprehensive study of the Markov property that exists in the DNA sequences of 906 bacterial chromosomes. All of the 906 bacterial chromosomes studied exhibit a statistically significant Markov property that extends beyond second-order, and therefore cannot be fully explained by codon usage. An unrooted tree containing all 906 bacterial chromosomes based on their transition probability matrices of third-order shares ∼25% similarity to a tree based on sequence homologies of 16S rRNA sequences. This congruence to the 16S rRNA tree is greater than for trees based on lower-order models (e.g., second-order), and higher-order models result in diminishing improvements in congruence. A nucleotide correlation most likely exists within every bacterial chromosome that extends past three nucleotides. This correlation places significant limits on the number of nucleotide sequences that can represent probable bacterial chromosomes. 
Transition matrix usage is largely conserved by taxa, indicating that this property is likely inherited, however some important exceptions exist that may indicate the convergent evolution of some bacteria.
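Illustrative sketch (not part of the abstract): an order-n Markov representation of a DNA sequence amounts to estimating, for each n-base context, the probability of the next base. The snippet below estimates such a transition table by simple counting. The function name `transition_matrix` and the maximum-likelihood counting scheme are assumptions made for illustration; the abstract does not describe how the authors estimated or compared their matrices.

```python
from collections import defaultdict


def transition_matrix(seq, order=3):
    """Estimate P(next base | previous `order` bases) by counting in seq."""
    seq = seq.upper()
    counts = defaultdict(lambda: defaultdict(int))
    for i in range(len(seq) - order):
        context, nxt = seq[i:i + order], seq[i + order]
        if set(context + nxt) <= set("ACGT"):  # skip ambiguous bases
            counts[context][nxt] += 1

    matrix = {}
    for context, next_counts in counts.items():
        total = sum(next_counts.values())
        # Each observed context gets a probability row over A, C, G, T.
        matrix[context] = {base: next_counts.get(base, 0) / total for base in "ACGT"}
    return matrix
```

Only contexts that actually occur in the sequence receive a row, so a third-order table has at most 64 rows of four probabilities each.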
Affiliation(s)
- Aaron D Skewes
  - Department of Biology, Syracuse University, Syracuse, NY, United States
  - Department of Mathematics, Syracuse University, Syracuse, NY, United States
9. Bohlin J, Brynildsrud O, Vesth T, Skjerve E, Ussery DW. Amino acid usage is asymmetrically biased in AT- and GC-rich microbial genomes. PLoS One 2013; 8:e69878. [PMID: 23922837 PMCID: PMC3724673 DOI: 10.1371/journal.pone.0069878] [Received: 02/19/2013] [Accepted: 06/14/2013]
Abstract
INTRODUCTION Genomic base composition ranges from less than 25% AT to more than 85% AT in prokaryotes. Since only a small fraction of prokaryotic genomes is not protein coding even a minor change in genomic base composition will induce profound protein changes. We examined how amino acid and codon frequencies were distributed in over 2000 microbial genomes and how these distributions were affected by base compositional changes. In addition, we wanted to know how genome-wide amino acid usage was biased in the different genomes and how changes to base composition and mutations affected this bias. To carry this out, we used a Generalized Additive Mixed-effects Model (GAMM) to explore non-linear associations and strong data dependences in closely related microbes; principal component analysis (PCA) was used to examine genomic amino acid- and codon frequencies, while the concept of relative entropy was used to analyze genomic mutation rates. RESULTS We found that genomic amino acid frequencies carried a stronger phylogenetic signal than codon frequencies, but that this signal was weak compared to that of genomic %AT. Further, in contrast to codon usage bias (CUB), amino acid usage bias (AAUB) was differently distributed in AT- and GC-rich genomes in the sense that AT-rich genomes did not prefer specific amino acids over others to the same extent as GC-rich genomes. AAUB was also associated with relative entropy; genomes with low AAUB contained more random mutations as a consequence of relaxed purifying selection than genomes with higher AAUB. CONCLUSION Genomic base composition has a substantial effect on both amino acid- and codon frequencies in bacterial genomes. While phylogeny influenced amino acid usage more in GC-rich genomes, AT-content was driving amino acid usage in AT-rich genomes. We found the GAMM model to be an excellent tool to analyze the genomic data used in this study.
Affiliation(s)
- Jon Bohlin
  - Centre for Epidemiology and Biostatistics, Department of Food Safety and Infection Biology, Norwegian School of Veterinary Science, Oslo, Norway
10. Teeling H, Glöckner FO. Current opportunities and challenges in microbial metagenome analysis--a bioinformatic perspective. Brief Bioinform 2012; 13:728-42. [PMID: 22966151 PMCID: PMC3504927 DOI: 10.1093/bib/bbs039] [Received: 03/30/2012] [Accepted: 06/09/2012]
Abstract
Metagenomics has become an indispensable tool for studying the diversity and metabolic potential of environmental microbes, whose bulk is as yet non-cultivable. Continual progress in next-generation sequencing allows for generating increasingly large metagenomes and studying multiple metagenomes over time or space. Recently, a new type of holistic ecosystem study has emerged that seeks to combine metagenomics with biodiversity, meta-expression and contextual data. Such 'ecosystems biology' approaches bear the potential to not only advance our understanding of environmental microbes to a new level but also impose challenges due to increasing data complexities, in particular with respect to bioinformatic post-processing. This mini review aims to address selected opportunities and challenges of modern metagenomics from a bioinformatics perspective and hopefully will serve as a useful resource for microbial ecologists and bioinformaticians alike.
11. Bohlin J, van Passel MWJ, Snipen L, Kristoffersen AB, Ussery D, Hardy SP. Relative entropy differences in bacterial chromosomes, plasmids, phages and genomic islands. BMC Genomics 2012; 13:66. [PMID: 22325062 PMCID: PMC3305612 DOI: 10.1186/1471-2164-13-66] [Received: 11/15/2011] [Accepted: 02/10/2012]
Abstract
BACKGROUND We sought to assess whether the concept of relative entropy (information capacity), could aid our understanding of the process of horizontal gene transfer in microbes. We analyzed the differences in information capacity between prokaryotic chromosomes, genomic islands (GI), phages, and plasmids. Relative entropy was estimated using the Kullback-Leibler measure. RESULTS Relative entropy was highest in bacterial chromosomes and had the sequence chromosomes > GI > phage > plasmid. There was an association between relative entropy and AT content in chromosomes, phages, plasmids and GIs with the strongest association being in phages. Relative entropy was also found to be lower in the obligate intracellular Mycobacterium leprae than in the related M. tuberculosis when measured on a shared set of highly conserved genes. CONCLUSIONS We argue that relative entropy differences reflect how plasmids, phages and GIs interact with microbial host chromosomes and that all these biological entities are, or have been, subjected to different selective pressures. The rate at which amelioration of horizontally acquired DNA occurs within the chromosome is likely to account for the small differences between chromosomes and stably incorporated GIs compared to the transient or independent replicons such as phages and plasmids.
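Illustrative sketch (not part of the abstract): the Kullback-Leibler measure mentioned above compares an observed distribution with a reference distribution. The example computes it for observed dinucleotide frequencies against the frequencies expected if neighbouring bases were independent. That choice of reference, the dinucleotide word length, and the function names are assumptions made here; the abstract does not specify the exact formulation the authors used.

```python
import math
from collections import Counter


def kl_divergence(p, q):
    """Kullback-Leibler divergence D(p || q) in bits; q must cover p's keys."""
    return sum(pi * math.log2(pi / q[key]) for key, pi in p.items() if pi > 0)


def dinucleotide_relative_entropy(seq):
    """KL divergence of observed dinucleotides from an independent-base model."""
    seq = seq.upper()
    n = len(seq)
    mono = Counter(seq)                                    # single-base counts
    pairs = Counter(seq[i:i + 2] for i in range(n - 1))    # overlapping pairs
    total = sum(pairs.values())
    observed = {pair: count / total for pair, count in pairs.items()}
    # Expected frequency of a pair if its two bases occurred independently.
    expected = {pair: (mono[pair[0]] / n) * (mono[pair[1]] / n) for pair in pairs}
    return kl_divergence(observed, expected)
```

A value near zero means the sequence looks like independent draws from its base composition; larger values indicate stronger compositional structure.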
Affiliation(s)
- Jon Bohlin
- Norwegian School of Veterinary Science, EpiCentre, Department of Food Safety and Infection biology, Ullevålsveien 72, Oslo, Norway.
|
12
|
Bezuidt O, Pierneef R, Mncube K, Lima-Mendez G, Reva ON. Mainstreams of horizontal gene exchange in enterobacteria: consideration of the outbreak of enterohemorrhagic E. coli O104:H4 in Germany in 2011. PLoS One 2011; 6:e25702. [PMID: 22022434 PMCID: PMC3195076 DOI: 10.1371/journal.pone.0025702] [Citation(s) in RCA: 26] [Impact Index Per Article: 2.0] [Received: 07/14/2011] [Accepted: 09/08/2011] [Indexed: 11/18/2022] Open
Abstract
BACKGROUND Escherichia coli O104:H4 caused a severe outbreak in Europe in 2011. The strain TY-2482 sequenced from this outbreak allowed the discovery of its closest relatives but failed to resolve the ways in which it originated and evolved. Given this, can similar outbreaks be expected to occur recurrently or spontaneously in the future? The inability to answer these questions shows the limitations of current comparative and evolutionary genomics methods. PRINCIPAL FINDINGS The study revealed oscillations of gene exchange in enterobacteria, which originated from marine γ-Proteobacteria. These mobile genetic elements have become recombination hotspots and effective 'vehicles' ensuring a wide distribution of successful combinations of fitness and virulence genes among enterobacteria. Two remarkable peculiarities of the strain TY-2482 and its relatives were observed: i) these strains retained their genetic primitiveness, as they somehow avoided the main fluxes of horizontal gene transfer which effectively penetrated other enterobacteria; ii) they acquired antibiotic resistance genes in a plasmid genomic island of β-Proteobacteria origin which is ontologically unrelated to the predominant genomic islands of enterobacteria. CONCLUSIONS Oscillations of horizontal gene exchange activity were reported which result from a counterbalance between the acquired resistance of bacteria towards existing mobile vectors and the generation of new vectors in the environmental microflora. We hypothesized that TY-2482 may originate from a genetically primitive lineage of E. coli that has evolved in confined geographical areas and was brought by human migration or cattle trade onto an intersection of several independent streams of horizontal gene exchange. Development of a system for monitoring the new and most active gene exchange events was proposed.
Affiliation(s)
- Oliver Bezuidt
- Bioinformatics and Computational Biology Unit, Department of Biochemistry, University of Pretoria, Pretoria, South Africa
- Rian Pierneef
- Bioinformatics and Computational Biology Unit, Department of Biochemistry, University of Pretoria, Pretoria, South Africa
- Kingdom Mncube
- Bioinformatics and Computational Biology Unit, Department of Biochemistry, University of Pretoria, Pretoria, South Africa
- Gipsi Lima-Mendez
- Laboratoire de Bioinformatique des Génomes et des Réseaux (BiGRe), Université Libre de Bruxelles, Bruxelles, Belgium
- Oleg N. Reva
- Bioinformatics and Computational Biology Unit, Department of Biochemistry, University of Pretoria, Pretoria, South Africa
|
13
|
Klockgether J, Cramer N, Wiehlmann L, Davenport CF, Tümmler B. Pseudomonas aeruginosa Genomic Structure and Diversity. Front Microbiol 2011; 2:150. [PMID: 21808635 PMCID: PMC3139241 DOI: 10.3389/fmicb.2011.00150] [Citation(s) in RCA: 199] [Impact Index Per Article: 15.3] [Received: 03/21/2011] [Accepted: 06/27/2011] [Indexed: 12/23/2022] Open
Abstract
The Pseudomonas aeruginosa genome (G + C content 65–67%, size 5.5–7 Mbp) is made up of a single circular chromosome and a variable number of plasmids. Sequencing of complete genomes or blocks of the accessory genome has revealed that the genome encodes a large repertoire of transporters, transcriptional regulators, and two-component regulatory systems which reflects its metabolic diversity to utilize a broad range of nutrients. The conserved core component of the genome is largely collinear among P. aeruginosa strains and exhibits an interclonal sequence diversity of 0.5–0.7%. Only a few loci of the core genome are subject to diversifying selection. Genome diversity is mainly caused by accessory DNA elements located in 79 regions of genome plasticity that are scattered around the genome and show an anomalous usage of mono- to tetradecanucleotides. Genomic islands of the pKLC102/PAGI-2 family that integrate into tRNALys or tRNAGly genes represent hotspots of inter- and intraclonal genomic diversity. The individual islands differ in their repertoire of metabolic genes that make a large contribution to the pangenome. In order to unravel intraclonal diversity of P. aeruginosa, the genomes of two members of the PA14 clonal complex from diverse habitats and geographic origin were compared. The genome sequences differed by less than 0.01% from each other. One hundred ninety-eight of the 231 single nucleotide substitutions (SNPs) were non-randomly distributed in the genome. Non-synonymous SNPs were mainly found in an integrated Pf1-like phage and in genes involved in transcriptional regulation, membrane and extracellular constituents, transport, and secretion. In summary, P. aeruginosa is endowed with a highly conserved core genome of low sequence diversity and a highly variable accessory genome that communicates with other pseudomonads and genera via horizontal gene transfer.
Affiliation(s)
- Jens Klockgether
- Klinik für Pädiatrische Pneumologie, Allergologie und Neonatologie, Klinische Forschergruppe Hannover, Germany
|
14
|
Practical application of self-organizing maps to interrelate biodiversity and functional data in NGS-based metagenomics. ISME JOURNAL 2010; 5:918-28. [PMID: 21160538 DOI: 10.1038/ismej.2010.180] [Citation(s) in RCA: 44] [Impact Index Per Article: 3.1] [Indexed: 01/31/2023]
Abstract
Next-generation sequencing (NGS) technologies have enabled the application of broad-scale sequencing in microbial biodiversity and metagenome studies. Biodiversity is usually targeted by classifying 16S ribosomal RNA genes, while metagenomic approaches target metabolic genes. However, both approaches remain isolated as long as the taxonomic and functional information cannot be interrelated. Techniques like self-organizing maps (SOMs) have been applied to cluster metagenomes into taxon-specific bins in order to link biodiversity with functions, but have not yet been applied to broad-scale NGS-based metagenomics. Here, we provide a novel implementation, demonstrate its potential and practicability, and provide a web-based service for public usage. Evaluation with published data sets mimicking varyingly complex habitats resulted in classification specificities and sensitivities of close to 100% to above 90% from phylum to genus level for assemblies exceeding 8 kb for low and medium complexity data. When applied to five real-world metagenomes of medium complexity from direct pyrosequencing of marine subsurface waters, classifications of assemblies above 2.5 kb were in good agreement with fluorescence in situ hybridizations, indicating that biodiversity was mostly retained within the metagenomes, and confirming high classification specificities. This was validated by two protein-based classification (PBC) methods. SOMs were able to retrieve the relevant taxa down to the genus level, while surpassing PBCs in resolution. In order to make the approach accessible to a broad audience, we implemented a feature-rich web-based SOM application named TaxSOM, which is freely available at http://www.megx.net/toolbox/taxsom. TaxSOM can classify reads or assemblies exceeding 2.5 kb with high accuracy and thus assists in linking biodiversity and functions in metagenome studies, which is a precondition to study microbial ecology in a holistic fashion.
|
15
|
Beloqui A, Nechitaylo TY, López-Cortés N, Ghazi A, Guazzaroni ME, Polaina J, Strittmatter AW, Reva O, Waliczek A, Yakimov MM, Golyshina OV, Ferrer M, Golyshin PN. Diversity of glycosyl hydrolases from cellulose-depleting communities enriched from casts of two earthworm species. Appl Environ Microbiol 2010; 76:5934-46. [PMID: 20622123 PMCID: PMC2935051 DOI: 10.1128/aem.00902-10] [Citation(s) in RCA: 58] [Impact Index Per Article: 4.1] [Received: 04/13/2010] [Accepted: 07/01/2010] [Indexed: 11/20/2022] Open
Abstract
The guts and casts of earthworms contain microbial assemblages that process large amounts of organic polymeric substrates from plant litter and soil; however, the enzymatic potential of these microbial communities remains largely unexplored. In the present work, we retrieved carbohydrate-modifying enzymes through the activity screening of metagenomic fosmid libraries from cellulose-depleting microbial communities established with the fresh casts of two earthworm species, Aporrectodea caliginosa and Lumbricus terrestris, as inocula. Eight glycosyl hydrolases (GHs) from the A. caliginosa-derived community were multidomain endo-beta-glucanases, beta-glucosidases, beta-cellobiohydrolases, beta-galactosidase, and beta-xylosidases of known GH families. In contrast, two GHs derived from the L. terrestris microbiome had no similarity to any known GHs and represented two novel families of beta-galactosidases/alpha-arabinopyranosidases. Members of these families were annotated in public databases as conserved hypothetical proteins, with one being structurally related to isomerases/dehydratases. This study provides insight into their biochemistry, domain structures, and active-site architecture. The two communities were similar in bacterial composition but significantly different with regard to their eukaryotic inhabitants. Further sequence analysis of fosmids and plasmids bearing the GH-encoding genes, along with oligonucleotide usage pattern analysis, suggested that those apparently originated from Gammaproteobacteria (pseudomonads and Cellvibrio-like organisms), Betaproteobacteria (Comamonadaceae), and Alphaproteobacteria (Rhizobiales).
Affiliation(s)
- Ana Beloqui, Taras Y. Nechitaylo, Nieves López-Cortés, Azam Ghazi, María-Eugenia Guazzaroni, Julio Polaina, Axel W. Strittmatter, Oleg Reva, Agnes Waliczek, Michail M. Yakimov, Olga V. Golyshina, Manuel Ferrer, Peter N. Golyshin
- CSIC, Institute of Catalysis, 28049 Madrid, Spain, HZI-Helmholtz Centre for Infection Research, 38124 Braunschweig, Germany, CSIC, Instituto de Agroquímica y Tecnología de Alimentos, 46980 Valencia, Spain, Eurofins MWG Operon, 85560 Ebersberg, Germany, Department of Biochemistry, University of Pretoria, 0002 Pretoria, South Africa, Istituto per l'Ambiente Marino Costiero, CNR, Messina 98122, Italy, School of Biological Sciences, Bangor University, Gwynedd LL57 2UW, United Kingdom, Centre for Integrated Research in the Rural Environment (CRRE), Aberystwyth University-Bangor University Partnership, Aberystwyth, Ceredigion SY23 3BF, United Kingdom
|
16
|
Bohlin J, Snipen L, Cloeckaert A, Lagesen K, Ussery D, Kristoffersen AB, Godfroid J. Genomic comparisons of Brucella spp. and closely related bacteria using base compositional and proteome based methods. BMC Evol Biol 2010; 10:249. [PMID: 20707916 PMCID: PMC2928237 DOI: 10.1186/1471-2148-10-249] [Citation(s) in RCA: 23] [Impact Index Per Article: 1.6] [Received: 03/16/2010] [Accepted: 08/13/2010] [Indexed: 11/30/2022] Open
Abstract
Background Classification of bacteria within the genus Brucella has been difficult due in part to considerable genomic homogeneity between the different species and biovars, in spite of clear differences in phenotypes. Therefore, many different methods have been used to assess Brucella taxonomy. In the current work, we examine 32 sequenced genomes from genus Brucella representing the six classical species, as well as more recently described species, using bioinformatical methods. Comparisons were made at the level of genomic DNA using oligonucleotide based methods (Markov chain based genomic signatures, genomic codon and amino acid frequencies based comparisons) and proteomes (all-against-all BLAST protein comparisons and pan-genomic analyses). Results We found that the oligonucleotide based methods gave different results compared to that of the proteome based methods. Differences were also found between the oligonucleotide based methods used. Whilst the Markov chain based genomic signatures grouped the different species in genus Brucella according to host preference, the codon and amino acid frequencies based methods reflected small differences between the Brucella species. Only minor differences could be detected between all genera included in this study using the codon and amino acid frequencies based methods. Proteome comparisons were found to be in strong accordance with current Brucella taxonomy indicating a remarkable association between gene gain or loss on one hand and mutations in marker genes on the other. The proteome based methods found greater similarity between Brucella species and Ochrobactrum species than between species within genus Agrobacterium compared to each other. In other words, proteome comparisons of species within genus Agrobacterium were found to be more diverse than proteome comparisons between species in genus Brucella and genus Ochrobactrum. 
Pan-genomic analyses indicated that uptake of DNA from outside genus Brucella appears to be limited. Conclusions While both the proteome based methods and the Markov chain based genomic signatures were able to reflect environmental diversity between the different species and strains of genus Brucella, the genomic codon and amino acid frequencies based comparisons were not found adequate for such comparisons. The proteome comparison based phylogenies of the species in genus Brucella showed a surprising consistency with current Brucella taxonomy.
Affiliation(s)
- Jon Bohlin
- Norwegian School of Veterinary Science, Department of Food Safety and Infection Biology, Epicenter, Ullevålsveien 72, PO Box 8146 Dep, NO-0033 Oslo, Norway.
|
17
|
Bohlin J, Snipen L, Hardy SP, Kristoffersen AB, Lagesen K, Dønsvik T, Skjerve E, Ussery DW. Analysis of intra-genomic GC content homogeneity within prokaryotes. BMC Genomics 2010; 11:464. [PMID: 20691090 PMCID: PMC3091660 DOI: 10.1186/1471-2164-11-464] [Citation(s) in RCA: 35] [Impact Index Per Article: 2.5] [Received: 01/08/2010] [Accepted: 08/06/2010] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND Bacterial genomes possess varying GC content (total guanines (Gs) and cytosines (Cs) per total of the four bases within the genome), but within a given genome GC content can vary locally along the chromosome, with some regions significantly more or less GC rich than average. We have examined how the GC content varies within microbial genomes to assess whether this property can be associated with certain biological functions related to the organism's environment and phylogeny. We utilize a new quantity, GCVAR, the intra-genomic GC content variability with respect to the average GC content of the total genome. A low GCVAR indicates intra-genomic GC homogeneity and a high GCVAR indicates heterogeneity. RESULTS The regression analyses indicated that GCVAR was significantly associated with domain (i.e. archaea or bacteria), phylum, and oxygen requirement. GCVAR was significantly higher among anaerobes than among both aerobic and facultative microbes. Although an association has previously been found between mean genomic GC content and oxygen requirement, our analysis suggests that no such association exists when phylogenetic bias is accounted for. A significant association between GCVAR and mean GC content was also found but appears to be non-linear and varies greatly among phyla. CONCLUSIONS Our findings show that GCVAR is linked with oxygen requirement, while mean genomic GC content is not. We therefore suggest that GCVAR should be used as a complement to mean GC content.
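The abstract defines GCVAR only in words, as GC variability relative to the genome-wide average. One plausible reading, assumed here, is the mean squared deviation of windowed GC content from the whole-sequence GC content. The window size of 100, the use of non-overlapping windows, and the names `gc_fraction` and `gc_variability` are illustrative assumptions; the paper's exact formula may differ.

```python
def gc_fraction(seq):
    """Fraction of G and C among the A/C/G/T symbols in seq."""
    seq = seq.upper()
    acgt = sum(seq.count(b) for b in "ACGT")
    if acgt == 0:
        return 0.0
    return (seq.count("G") + seq.count("C")) / acgt


def gc_variability(seq, window=100):
    """Mean squared deviation of per-window GC content from the
    whole-sequence GC content (an assumed stand-in for GCVAR)."""
    genome_gc = gc_fraction(seq)
    # Non-overlapping windows; a trailing partial window is dropped.
    windows = [
        seq[i:i + window]
        for i in range(0, len(seq) - window + 1, window)
    ]
    if not windows:
        return 0.0
    return sum((gc_fraction(w) - genome_gc) ** 2 for w in windows) / len(windows)
```

With this definition a compositionally uniform sequence scores zero, and a sequence made of an AT-only half and a GC-only half scores 0.25, matching the abstract's reading of low values as homogeneity and high values as heterogeneity.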
Affiliation(s)
- Jon Bohlin
- Norwegian School of Veterinary Science, Department of Food Safety and Infection Biology, Ullevålsveien 72, P.O. Box 8146 Dep, NO-0033 Oslo, Norway.
|
18
|
Davenport C, Ussery DW, Tümmler B. Comparative genomics of green sulfur bacteria. PHOTOSYNTHESIS RESEARCH 2010; 104:137-152. [PMID: 20099081 DOI: 10.1007/s11120-009-9515-2] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.4] [Received: 08/20/2009] [Accepted: 12/07/2009] [Indexed: 05/28/2023]
Abstract
Eleven completely sequenced Chlorobi genomes were compared in oligonucleotide usage, gene contents, and synteny. The green sulfur bacteria (GSB) are equipped with a core genome that sustains their anoxygenic phototrophic lifestyle by photosynthesis, sulfur oxidation, and CO2 fixation. Whole-genome gene family and single gene sequence comparisons yielded similar phylogenetic trees of the sequenced chromosomes indicating a concerted vertical evolution of large gene sets. Chromosomal synteny of genes is not preserved in the phylum Chlorobi. The accessory genome is characterized by anomalous oligonucleotide usage and endows the strains with individual features for transport, secretion, cell wall, extracellular constituents, and a few elements of the biosynthetic apparatus. Giant genes are a peculiar feature of the genera Chlorobium and Prosthecochloris. The predicted proteins have a huge molecular weight of 10^6, and are probably instrumental for the bacteria to generate their own intimate (micro)environment.
Affiliation(s)
- Colin Davenport
- Klinische Forschergruppe, Klinik für Pädiatrische Pneumologie und Neonatologie, Medizinische Hochschule Hannover, Carl-Neuberg-Strasse 1, Hannover, Germany
|
19
|
Davenport CF, Tümmler B. Abundant oligonucleotides common to most bacteria. PLoS One 2010; 5:e9841. [PMID: 20352124 PMCID: PMC2843746 DOI: 10.1371/journal.pone.0009841] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.4] [Received: 08/05/2009] [Accepted: 03/03/2010] [Indexed: 11/25/2022] Open
Abstract
Background Bacteria show a bias in their genomic oligonucleotide composition far beyond that dictated by G+C content. Patterns of over- and underrepresented oligonucleotides carry a phylogenetic signal and are thus diagnostic for individual species. Patterns of short oligomers have been investigated by multiple groups in large numbers of bacterial genomes. However, global distributions of the most highly overrepresented mid-sized oligomers have not been assessed across all prokaryotes to date. We surveyed overrepresented mid-length oligomers across all prokaryotes and normalised for base composition and embedded oligomers using zero and second order Markov models. Principal Findings Here we report a presumably ancient set of oligomers conserved and overrepresented in nearly all branches of prokaryotic life, including Archaea. These oligomers are either adenine rich homopurines with one to three guanine nucleosides, or homopyrimidines with one to four cytosine nucleosides. They do not show a consistent preference for coding or non-coding regions or aggregate in any coding frame, implying a role in DNA structure and as polypeptide binding sites. Structural parameters indicate these oligonucleotides to be an extreme and rigid form of B-DNA prone to forming triple stranded helices under common physiological conditions. Moreover, the narrow minor grooves of these structures are recognised by DNA binding and nucleoid associated proteins such as HU. Conclusion Homopurine and homopyrimidine oligomers exhibit distinct and unusual structural features and are present at high copy number in nearly all prokaryotic lineages. This fact suggests a non-neutral role of these oligonucleotides for bacterial genome organization that has been maintained throughout evolution.
Affiliation(s)
- Colin F Davenport
- Pediatric Pneumology and Neonatology, Hanover Medical School, Hanover, Lower Saxony, Germany.
|
20
|
Examination of genome homogeneity in prokaryotes using genomic signatures. PLoS One 2009; 4:e8113. [PMID: 19956556 PMCID: PMC2781299 DOI: 10.1371/journal.pone.0008113] [Citation(s) in RCA: 18] [Impact Index Per Article: 1.2] [Received: 07/31/2009] [Accepted: 11/05/2009] [Indexed: 01/17/2023] Open
Abstract
BACKGROUND DNA word frequencies, normalized for genomic AT content, are remarkably stable within prokaryotic genomes and are therefore said to reflect a "genomic signature." The genomic signatures can be used to phylogenetically classify organisms from arbitrary sampled DNA. Genomic signatures can also be used to search for horizontally transferred DNA or DNA regions subjected to special selection forces. Thus, the stability of the genomic signature can be used as a measure of genomic homogeneity. The factors associated with the stability of the genomic signatures are not known, and this motivated us to investigate further. We analyzed the intra-genomic variance of genomic signatures based on AT content normalization (0th-order Markov model) as well as genomic signatures normalized by smaller DNA words (1st- and 2nd-order Markov models) for 636 sequenced prokaryotic genomes. Regression models were fitted, with intra-genomic signature variance as the response variable, to a set of factors representing genomic properties such as genomic AT content, genome size, habitat, phylum, oxygen requirement, optimal growth temperature and oligonucleotide usage variance (OUV, a measure of oligonucleotide usage bias), measured as the variance between genomic tetranucleotide frequencies and Markov chain approximated tetranucleotide frequencies, as predictors. PRINCIPAL FINDINGS Regression analysis revealed that OUV was the most important factor (p<0.001) determining intra-genomic homogeneity as measured using genomic signatures. This means that the less random the oligonucleotide usage is in the sense of higher OUV, the more homogeneous the genome is in terms of the genomic signature. The other factors influencing variance in the genomic signature (p<0.001) were genomic AT content, phylum and oxygen requirement. 
CONCLUSIONS Genomic homogeneity in prokaryotes is intimately linked to genomic GC content, oligonucleotide usage bias (OUV) and aerobiosis, while oligonucleotide usage bias (OUV) is associated with genomic GC content, aerobiosis and habitat.
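The abstract describes OUV as the variance between observed tetranucleotide frequencies and their Markov-chain approximations, without giving the formula. The sketch below assumes the common second-order approximation, in which the expected count of a word w1w2w3w4 is N(w1w2w3) · N(w2w3w4) / N(w2w3), and takes the variance of the observed-minus-expected differences over the words that occur. The names `count_kmers` and `oligo_usage_variance`, the restriction to observed words, and the normalisation by the total tetranucleotide count are assumptions for illustration only.

```python
from collections import Counter


def count_kmers(seq, k):
    """Counts of overlapping k-mers made up only of A, C, G and T."""
    seq = seq.upper()
    return Counter(
        seq[i:i + k]
        for i in range(len(seq) - k + 1)
        if set(seq[i:i + k]) <= set("ACGT")
    )


def oligo_usage_variance(seq):
    """Variance of (observed - expected) tetranucleotide frequencies,
    with expectations from an assumed second-order Markov approximation."""
    c2, c3, c4 = (count_kmers(seq, k) for k in (2, 3, 4))
    total = sum(c4.values())
    if total == 0:
        return 0.0
    diffs = []
    for word, observed in c4.items():
        # Expected count: N(w1w2w3) * N(w2w3w4) / N(w2w3)
        expected = c3[word[:3]] * c3[word[1:]] / c2[word[1:3]]
        diffs.append((observed - expected) / total)
    mean = sum(diffs) / len(diffs)
    return sum((d - mean) ** 2 for d in diffs) / len(diffs)
```

A sequence whose tetranucleotide counts are fully predicted by its di- and trinucleotide counts scores zero under this sketch; larger values indicate word usage that the shorter words do not explain.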
|
21
|
Bohlin J, Skjerve E, Ussery DW. Analysis of genomic signatures in prokaryotes using multinomial regression and hierarchical clustering. BMC Genomics 2009; 10:487. [PMID: 19845945 PMCID: PMC2770534 DOI: 10.1186/1471-2164-10-487] [Citation(s) in RCA: 14] [Impact Index Per Article: 0.9] [Received: 06/16/2009] [Accepted: 10/21/2009] [Indexed: 11/26/2022] Open
Abstract
Background Recently there has been an explosion in the availability of bacterial genomic sequences, making possible an analysis of genomic signatures across more than 800 different bacterial chromosomes, from a wide variety of environments. Using genomic signatures, we pair-wise compared 867 different genomic DNA sequences, taken from chromosomes and plasmids more than 100,000 base-pairs in length. Hierarchical clustering was performed on the outcome of the comparisons before a multinomial regression model was fitted. The regression model included the cluster groups as the response variable with AT content, phyla, growth temperature, selective pressure, habitat, sequence size, oxygen requirement and pathogenicity as predictors. Results Many significant factors were associated with the genomic signature, most notably AT content. Phylum was also an important factor, although considerably less so than AT content. Small improvements to the regression model, although significant, were also obtained by factors such as sequence size, habitat, growth temperature, selective pressure measured as oligonucleotide usage variance, and oxygen requirement. Conclusion The statistics obtained using hierarchical clustering and multinomial regression analysis indicate that the genomic signature is shaped by many factors, and this may explain the varying ability to classify prokaryotic organisms below genus level.
Affiliation(s)
- Jon Bohlin
- Norwegian School of Veterinary Science, Oslo, Norway.
|
22
|
Bohlin J, Hardy SP, Ussery DW. Stretches of alternating pyrimidine/purines and purines are respectively linked with pathogenicity and growth temperature in prokaryotes. BMC Genomics 2009; 10:346. [PMID: 19646265 PMCID: PMC2728739 DOI: 10.1186/1471-2164-10-346] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.3] [Received: 03/02/2009] [Accepted: 07/31/2009] [Indexed: 02/02/2023] Open
Abstract
Background The genomic fractions of purine (RR) and alternating pyrimidine/purine (YR) stretches of 10 base pairs or more have been linked to genomic AT content, the formation of different DNA helices, strand-biased gene distribution, DNA structure, and more. Although some of these factors are a consequence of the chemical properties of purines and pyrimidines, a thorough statistical examination of the distributions of YR/RR stretches in sequenced prokaryotic chromosomes has, to the best of our knowledge, not been undertaken. The aim of this study is to expand upon previous research by using regression analysis to investigate how AT content, habitat, growth temperature, pathogenicity, phyla, oxygen requirement and halotolerance correlate with the distribution of RR and YR stretches in prokaryotes. Results Our results indicate that RR and YR stretches are differently distributed in prokaryotic phyla. RR stretches are overrepresented in all phyla except for the Actinobacteria and β-Proteobacteria. In contrast, YR stretches are underrepresented in all phyla except for the β-Proteobacterial group. YR stretches are associated with phylum, pathogenicity and habitat, whilst RR stretches are associated with phylum, AT content, oxygen requirement, growth temperature and halotolerance. All associations described were statistically significant with p < 0.001. Conclusion Analysis of chromosomal distributions of RR/YR sequences in prokaryotes reveals a set of associations with environmental factors not observed with mono- and oligonucleotide frequencies. This implies that important information can be found in the distribution of RR/YR stretches that is more difficult to obtain from genomic mono- and oligonucleotide frequencies. The association between pathogenicity and fractions of YR stretches is assumed to be linked to recombination and horizontal transfer.
Affiliation(s)
- Jon Bohlin
- Norwegian School of Veterinary Science, Oslo, Norway.
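The genomic fractions of RR and YR stretches described in this entry can be approximated with two regular expressions. The following is a minimal sketch under stated assumptions: `stretch_fractions` is a hypothetical name, only the given strand is scanned (a purine run on one strand is a pyrimidine run on the other, which is ignored here), and a stretch counts only if it is at least `min_len` bases long.

```python
import re

# Maximal runs of purines (A or G).
PURINE_RUN = re.compile(r"[AG]+")
# Maximal runs in which pyrimidines (C/T) and purines (A/G) strictly alternate.
ALTERNATING_RUN = re.compile(r"[AG]?(?:[CT][AG])+[CT]?")


def stretch_fractions(seq, min_len=10):
    """Fraction of the sequence covered by RR and YR stretches >= min_len."""
    seq = seq.upper()
    if not seq:
        return {"RR": 0.0, "YR": 0.0}

    def covered(pattern):
        # Sum the lengths of all sufficiently long matches.
        return sum(
            len(m.group())
            for m in pattern.finditer(seq)
            if len(m.group()) >= min_len
        )

    return {
        "RR": covered(PURINE_RUN) / len(seq),
        "YR": covered(ALTERNATING_RUN) / len(seq),
    }
```

For example, `stretch_fractions("A" * 12 + "CCC" + "CA" * 5 + "GGG")` counts the 12-base purine run as RR and the 10-base `CACACACACA` run as YR, each divided by the total length of 28.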
|
23
|
Davenport CF, Wiehlmann L, Reva ON, Tümmler B. Visualization of Pseudomonas genomic structure by abundant 8-14mer oligonucleotides. Environ Microbiol 2009; 11:1092-104. [PMID: 19161433 DOI: 10.1111/j.1462-2920.2008.01839.x] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.4] [Indexed: 12/29/2022]
Abstract
Under- and over-represented mono- to hexanucleotides are signatures of bacterial genomes, but the compositional biases of octa- to tetradecanucleotides have not yet been explored. Thirteen completely sequenced genomes of the Pseudomonas genus were searched for highly overrepresented 8-14mers. Between 59 and 989 overrepresented 8-14mers were found to exceed the applied threshold value. All genomic data sets of the 13 strains showed a consistent pattern, with individual oligomers clustering in either non-coding or coding regions. Non-coding oligonucleotides were typically part of longer repeats. Coding oligonucleotides were evenly distributed in the core genome, preferred one reading frame and matched the local tetranucleotide usage patterns. Genomic islands were recognized by the depletion of overrepresented oligonucleotides. Several mainly coding 8-14mers occurred in genomes on average every 10 000 bp or less. Such frequently occurring 8-14mers could become useful markers for species identification. In the future of next-generation ultra-high throughput DNA sequencing, the composition of bacterial metagenomes may be quantified by scanning the primary sequence reads for these 8-14mer markers.
Affiliation(s)
- Colin F Davenport
- Klinische Forschergruppe, OE 6711, Medizinische Hochschule Hannover, Hanover, Germany.
|
24
|
Pride DT, Schoenfeld T. Genome signature analysis of thermal virus metagenomes reveals Archaea and thermophilic signatures. BMC Genomics 2008; 9:420. [PMID: 18798991 PMCID: PMC2556352 DOI: 10.1186/1471-2164-9-420] [Citation(s) in RCA: 26] [Impact Index Per Article: 1.6] [Received: 04/10/2008] [Accepted: 09/17/2008] [Indexed: 11/18/2022] Open
Abstract
Background Metagenomic analysis provides a rich source of biological information for otherwise intractable viral communities. However, study of viral metagenomes has been hampered by its nearly complete reliance on BLAST algorithms for identification of DNA sequences. We sought to develop algorithms for examination of viral metagenomes to identify the origin of sequences independent of BLAST algorithms. We chose viral metagenomes obtained from two hot springs, Bear Paw and Octopus, in Yellowstone National Park, as they represent simple microbial populations where comparatively large contigs were obtained. Thermal spring metagenomes have high proportions of sequences without significant GenBank homology, which has hampered identification of viruses and their linkage with hosts. To analyze each metagenome, we developed a method to classify DNA fragments using genome signature-based phylogenetic classification (GSPC), where metagenomic fragments are compared to a database of oligonucleotide signatures for all previously sequenced Bacteria, Archaea, and viruses. Results From both Bear Paw and Octopus hot springs, each assembled contig had more similarity to other metagenome contigs than to any sequenced microbial genome based on GSPC analysis, suggesting a genome signature common to each of these extreme environments. While viral metagenomes from Bear Paw and Octopus share some similarity, the genome signatures from each locale are largely unique. GSPC using a microbial database predicts most of the Octopus metagenome has archaeal signatures, while bacterial signatures predominate in Bear Paw; a finding consistent with those of GenBank BLAST. When using a viral database, the majority of the Octopus metagenome is predicted to belong to archaeal virus families Globuloviridae and Fuselloviridae, while none of the Bear Paw metagenome is predicted to belong to archaeal viruses. As expected, when microbial and viral databases are combined, each of the Octopus and Bear Paw metagenomic contigs are predicted to belong to viruses rather than to any Bacteria or Archaea, consistent with the apparent viral origin of both metagenomes. Conclusion That BLAST searches identify no significant homologs for most metagenome contigs, while GSPC suggests their origin as archaeal viruses or bacteriophages, indicates GSPC provides a complementary approach in viral metagenomic analysis.
Affiliation(s)
- David T Pride
- Department of Medicine, Division of Infectious Diseases and Geographic Medicine, Stanford University School of Medicine, Stanford, CA, USA.
|
25
|
Ganesan H, Rakitianskaia AS, Davenport CF, Tümmler B, Reva ON. The SeqWord Genome Browser: an online tool for the identification and visualization of atypical regions of bacterial genomes through oligonucleotide usage. BMC Bioinformatics 2008; 9:333. [PMID: 18687122 PMCID: PMC2528017 DOI: 10.1186/1471-2105-9-333] [Citation(s) in RCA: 35] [Impact Index Per Article: 2.2] [Received: 03/29/2008] [Accepted: 08/07/2008] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND Data mining in large DNA sequences is a major challenge in microbial genomics and bioinformatics. Oligonucleotide usage (OU) patterns provide a wealth of information for large scale sequence analysis and visualization. The purpose of this research was to make OU statistical analysis available as a novel web-based tool for functional genomics and annotation. The tool is also available as a downloadable package. RESULTS The SeqWord Genome Browser (SWGB) was developed to visualize the natural compositional variation of DNA sequences. The applet is also used for identification of divergent genomic regions both in annotated sequences of bacterial chromosomes, plasmids, phages and viruses, and in raw DNA sequences prior to annotation by comparing local and global OU patterns. The applet allows fast and reliable identification of clusters of horizontally transferred genomic islands, large multi-domain genes and genes for ribosomal RNA. Within the majority of genomic fragments (also termed genomic core sequence), regions enriched with housekeeping genes, ribosomal proteins and the regions rich in pseudogenes or genetic vestiges may be contrasted. CONCLUSION The SWGB applet presents a range of comprehensive OU statistical parameters calculated for a range of bacterial species, plasmids and phages. It is available on the Internet at http://www.bi.up.ac.za/SeqWord/mhhapplet.php.
Affiliation(s)
- Hamilton Ganesan
- Dept. of Biochemistry, Bioinformatics and Computational Biology Unit, University of Pretoria, Lynnwood Road, Hillcrest, Pretoria, 0002, South Africa.
|
26
|
Bohlin J, Skjerve E, Ussery DW. Investigations of oligonucleotide usage variance within and between prokaryotes. PLoS Comput Biol 2008; 4:e1000057. [PMID: 18421372 PMCID: PMC2289840 DOI: 10.1371/journal.pcbi.1000057] [Citation(s) in RCA: 48] [Impact Index Per Article: 3.0] [Received: 10/12/2007] [Accepted: 03/12/2008] [Indexed: 11/18/2022] Open
Abstract
Oligonucleotide usage in archaeal and bacterial genomes can be linked to a number of properties, including codon usage (trinucleotides), DNA base-stacking energy (dinucleotides), and DNA structural conformation (di- to tetranucleotides). We wanted to assess the statistical information potential of different DNA 'word-sizes' and explore how oligonucleotide frequencies differ in coding and non-coding regions. In addition, we used oligonucleotide frequencies to investigate DNA composition and how DNA sequence patterns change within and between prokaryotic organisms. Among the results found was that prokaryotic chromosomes can be described by hexanucleotide frequencies, suggesting that prokaryotic DNA is predominantly short range correlated, i.e., information in prokaryotic genomes is encoded in short oligonucleotides. Oligonucleotide usage varied more within AT-rich and host-associated genomes than in GC-rich and free-living genomes, and this variation was mainly located in non-coding regions. Bias (selectional pressure) in tetranucleotide usage correlated with GC content, and coding regions were more biased than non-coding regions. Non-coding regions were also found to be approximately 5.5% more AT-rich than coding regions, on average, in the 402 chromosomes examined. Pronounced DNA compositional differences were found both within and between AT-rich and GC-rich genomes. GC-rich genomes were more similar and biased in terms of tetranucleotide usage in non-coding regions than AT-rich genomes. The differences found between AT-rich and GC-rich genomes may possibly be attributed to lifestyle, since tetranucleotide usage within host-associated bacteria was, on average, more dissimilar and less biased than free-living archaea and bacteria.
Affiliation(s)
- Jon Bohlin
- Norwegian School of Veterinary Science, Oslo, Norway
|
27
|
|
28
|
Bohlin J, Skjerve E, Ussery DW. Reliability and applications of statistical methods based on oligonucleotide frequencies in bacterial and archaeal genomes. BMC Genomics 2008; 9:104. [PMID: 18307761 PMCID: PMC2289816 DOI: 10.1186/1471-2164-9-104] [Citation(s) in RCA: 28] [Impact Index Per Article: 1.8] [Received: 09/28/2007] [Accepted: 02/28/2008] [Indexed: 11/22/2022] Open
Abstract
Background The increasing number of sequenced prokaryotic genomes contains a wealth of genomic data that needs to be effectively analysed. A set of statistical tools exists for such analysis, but their strengths and weaknesses have not been fully explored. The statistical methods we are concerned with here are mainly used to examine similarities between archaeal and bacterial DNA from different genomes. These methods compare observed genomic frequencies of fixed-sized oligonucleotides with expected values, which can be determined by genomic nucleotide content, smaller oligonucleotide frequencies, or be based on specific statistical distributions. Advantages of these statistical methods include measurements of phylogenetic relationship with relatively small pieces of DNA sampled from almost anywhere within genomes, detection of foreign/conserved DNA, and homology searches. Our aim was to explore the reliability and best suited applications for some popular methods, which include relative oligonucleotide frequencies (ROF), di- to hexanucleotide zeroth-order Markov methods (ZOM) and the 2nd-order Markov chain method (MCM). Tests were performed on distant homology searches with large DNA sequences, detection of foreign/conserved DNA, and plasmid-host similarity comparisons. Additionally, the reliability of the methods was tested by comparing both real and random genomic DNA. Results Our findings show that the optimal method is context dependent. ROFs were best suited for distant homology searches, whilst the hexanucleotide ZOM and MCM measures were more reliable measures in terms of phylogeny. The dinucleotide ZOM method produced high correlation values when used to compare real genomes to an artificially constructed random genome with similar %GC, and should therefore be used with care. The tetranucleotide ZOM measure was a good measure to detect horizontally transferred regions, and when used to compare the phylogenetic relationships between plasmids and hosts, significant correlation (R² = 0.4) was found with genomic GC content and intra-chromosomal homogeneity. Conclusion The statistical methods examined are fast, easy to implement, and powerful for a number of different applications involving genomic sequence comparisons. However, none of the measures examined were superior in all tests, and therefore the choice of the statistical method should depend on the task at hand.
Affiliation(s)
- Jon Bohlin
- Norwegian School of Veterinary Science, P.O. Box 8146 Dep., N-0033 Oslo, Norway.
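The entry above contrasts expected values derived from nucleotide content with those derived from smaller oligonucleotides. A common way to write the second kind for tetranucleotides is the 2nd-order Markov approximation E(w1w2w3w4) = N(w1w2w3) · N(w2w3w4) / N(w2w3). The sketch below implements that textbook formula; it is an assumption that this matches the paper's MCM in detail, the function names are hypothetical, and the input is assumed to contain only A, C, G and T.

```python
from collections import Counter


def count_words(seq, k):
    """Count every overlapping k-mer in seq."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def markov_expected_tetra(seq):
    """Map each observed tetranucleotide to (observed, expected) counts.

    Expected counts come from tri- and dinucleotide counts:
    E(w1w2w3w4) = N(w1w2w3) * N(w2w3w4) / N(w2w3).
    """
    seq = seq.upper()
    di = count_words(seq, 2)
    tri = count_words(seq, 3)
    tetra = count_words(seq, 4)
    result = {}
    for word, observed in tetra.items():
        # The middle dinucleotide is always present if the tetramer was seen.
        expected = tri[word[:3]] * tri[word[1:]] / di[word[1:3]]
        result[word] = (observed, expected)
    return result
```

For `"ACGTACGTACGT"`, the observed and expected counts agree exactly (for instance 3 and 3.0 for `ACGT`), because a strictly periodic sequence is fully predicted by its trinucleotides; in real genomes the gap between the two values is what measures such as OUV summarize.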
|
29
|
Reva ON, Hallin PF, Willenbrock H, Sicheritz-Ponten T, Tümmler B, Ussery DW. Global features of the Alcanivorax borkumensis SK2 genome. Environ Microbiol 2007; 10:614-25. [PMID: 18081853 DOI: 10.1111/j.1462-2920.2007.01483.x] [Citation(s) in RCA: 24] [Impact Index Per Article: 1.4] [Indexed: 01/07/2023]
Abstract
The global feature of the completely sequenced Alcanivorax borkumensis SK2 type strain chromosome is its symmetry and homogeneity. The origin and terminus of replication are located opposite to each other in the chromosome and are discerned with high signal to noise ratios by maximal oligonucleotide usage biases on the leading and lagging strand. Genomic DNA structure is rather uniform throughout the chromosome with respect to intrinsic curvature, position preference or base stacking energy. The orthologs and paralogs of A. borkumensis genes with the highest sequence homology were found in most cases among gamma-Proteobacteria, with Acinetobacter and P. aeruginosa as closest relatives. A. borkumensis shares a similar oligonucleotide usage and promoter structure with the Pseudomonadales. A comparatively low number of only 18 genome islands with atypical oligonucleotide usage was detected in the A. borkumensis chromosome. The gene clusters that confer the assimilation of aliphatic hydrocarbons are localized in two genome islands which were probably acquired from an ancestor of the Yersinia lineage, whereas the alk genes of Pseudomonas putida still exhibit the typical Alcanivorax oligonucleotide signature, indicating a complex evolution of this major hydrocarbonoclastic trait.
Affiliation(s)
- Oleg N Reva
- Klinische Forschergruppe, OE6711, Medizinische Hochschule Hannover, Carl-Neuberg-Strasse 1, D-30625 Hannover, Germany
|
30
|
Abstract
Characterisation of new viruses is often hindered by difficulties in amplifying them in cell culture, limited antigenic/serological cross-reactivity or the lack of nucleic acid hybridisation to known viral sequences. Numerous molecular methods have been used to genetically characterise new viruses without prior in vitro replication or the use of virus-specific reagents. In the recent metagenomic studies viral particles from uncultured environmental and clinical samples have been purified and their nucleic acids randomly amplified prior to subcloning and sequencing. Already known and novel viruses were then identified by comparing their translated sequence to those of viral proteins in public sequence databases. Metagenomic approaches to viral characterisation have been applied to seawater, near shore sediments, faeces, serum, plasma and respiratory secretions and have broadened the range of known viral diversity. Selection of samples with high viral loads, purification of viral particles, removal of cellular nucleic acids, efficient sequence-independent amplification of viral RNA and DNA, recognisable sequence similarities to known viral sequences and deep sampling of the nucleic acid populations through large scale sequencing can all improve the yield of new viruses. This review lists some of the animal viruses recently identified using sequence-independent methods, current laboratory and bioinformatics methods, together with their limitations and potential improvements. Viral metagenomic approaches provide novel opportunities to generate an unbiased characterisation of the viral populations in various organisms and environments.
Affiliation(s)
- Eric L Delwart
- Blood Systems Research Institute, University of California, San Francisco, CA 94118, USA.
|
31
|
Klockgether J, Würdemann D, Reva O, Wiehlmann L, Tümmler B. Diversity of the abundant pKLC102/PAGI-2 family of genomic islands in Pseudomonas aeruginosa. J Bacteriol 2007; 189:2443-59. [PMID: 17194795 PMCID: PMC1899365 DOI: 10.1128/jb.01688-06] [Citation(s) in RCA: 88] [Impact Index Per Article: 5.2] [Received: 11/01/2006] [Accepted: 01/08/2007] [Indexed: 12/27/2022] Open
Abstract
The known genomic islands of Pseudomonas aeruginosa clone C strains are integrated into tRNA(Lys) (pKLC102) or tRNA(Gly) (PAGI-2 and PAGI-3) genes and differ from their core genomes by distinctive tetranucleotide usage patterns. pKLC102 and the related island PAPI-1 from P. aeruginosa PA14 were spontaneously mobilized from their host chromosomes at frequencies of 10% and 0.3%, making pKLC102 the most mobile genomic island known with a copy number of 30 episomal circular pKLC102 molecules per cell. The incidence of islands of the pKLC102/PAGI-2 type was investigated in 71 unrelated P. aeruginosa strains from diverse habitats and geographic origins. pKLC102- and PAGI-2-like islands were identified in 50 and 31 strains, respectively, and 15 and 10 subtypes were differentiated by hybridization on pKLC102 and PAGI-2 macroarrays. The diversity of PAGI-2-type islands was mainly caused by one large block of strain-specific genes, whereas the diversity of pKLC102-type islands was primarily generated by subtype-specific combination of gene cassettes. Chromosomal loss of PAGI-2 could be documented in sequential P. aeruginosa isolates from individuals with cystic fibrosis. PAGI-2 was present in most tested Cupriavidus metallidurans and Cupriavidus campinensis isolates from polluted environments, demonstrating the spread of PAGI-2 across habitats and species barriers. The pKLC102/PAGI-2 family is prevalent in numerous beta- and gammaproteobacteria and is characterized by high asymmetry of the cDNA strands. This evolutionarily ancient family of genomic islands retained its oligonucleotide signature during horizontal spread within and among taxa.
Affiliation(s)
- Jens Klockgether
- Klinische Forschergruppe, OE 6710, Medizinische Hochschule Hannover, Carl-Neuberg-Str. 1, D-30625 Hannover, Germany
|
32
|
Chen XH, Vater J, Piel J, Franke P, Scholz R, Schneider K, Koumoutsi A, Hitzeroth G, Grammel N, Strittmatter AW, Gottschalk G, Süssmuth RD, Borriss R. Structural and functional characterization of three polyketide synthase gene clusters in Bacillus amyloliquefaciens FZB 42. J Bacteriol 2006; 188:4024-36. [PMID: 16707694 PMCID: PMC1482889 DOI: 10.1128/jb.00052-06] [Citation(s) in RCA: 250] [Impact Index Per Article: 13.9] [Indexed: 11/20/2022] Open
Abstract
Although bacterial polyketides are of considerable biomedical interest, the molecular biology of polyketide biosynthesis in Bacillus spp., one of the richest bacterial sources of bioactive natural products, remains largely unexplored. Here we assign for the first time complete polyketide synthase (PKS) gene clusters to Bacillus antibiotics. Three giant modular PKS systems of the trans-acyltransferase type were identified in Bacillus amyloliquefaciens FZB 42. One of them, pks1, is an ortholog of the pksX operon with a previously unknown function in the sequenced model strain Bacillus subtilis 168, while the pks2 and pks3 clusters are novel gene clusters. Cassette mutagenesis combined with advanced mass spectrometric techniques such as matrix-assisted laser desorption ionization-time of flight mass spectrometry and liquid chromatography-electrospray ionization mass spectrometry revealed that the pks1 (bae) and pks3 (dif) gene clusters encode the biosynthesis of the polyene antibiotics bacillaene and difficidin or oxydifficidin, respectively. In addition, B. subtilis OKB105 (pheA sfp(0)), a transformant of the B. subtilis 168 derivative JH642, was shown to produce bacillaene, demonstrating that the pksX gene cluster directs the synthesis of that polyketide. The GenBank accession numbers for gene clusters pks1(bae), pks2, and pks3(dif) are AJ 634060.2, AJ 6340601.2, and AJ 6340602.2, respectively.
Affiliation(s)
- Xiao-Hua Chen
- Institut für Biologie, AG Bakteriengenetik, Humboldt-Universität Berlin, Chausseestrasse 115, D-10115 Berlin, Germany
|
33
|
Pride DT, Wassenaar TM, Ghose C, Blaser MJ. Evidence of host-virus co-evolution in tetranucleotide usage patterns of bacteriophages and eukaryotic viruses. BMC Genomics 2006; 7:8. [PMID: 16417644 PMCID: PMC1360066 DOI: 10.1186/1471-2164-7-8] [Citation(s) in RCA: 94] [Impact Index Per Article: 5.2] [Received: 09/27/2005] [Accepted: 01/18/2006] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND Virus taxonomy is based on morphologic characteristics, as there are no widely used non-phenotypic measures for comparison among virus families. We examined whether there is phylogenetic signal in virus nucleotide usage patterns that can be used to determine ancestral relationships. The well-studied model of tail morphology in bacteriophage classification was used for comparison with nucleotide usage patterns. Tetranucleotide usage deviation (TUD) patterns were chosen since they have previously been shown to contain phylogenetic signal similar to that of 16S rRNA. RESULTS We found that bacteriophages have unique TUD patterns, representing genomic signatures that are relatively conserved among those with similar host range. Analysis of TUD-based phylogeny indicates that host influences are important in bacteriophage evolution, and phylogenies containing both phages and their hosts support their co-evolution. TUD-based phylogeny of eukaryotic viruses indicates that they cluster largely based on nucleic acid type and genome size. Similarities between eukaryotic virus phylogenies based on TUD and gene content substantiate the TUD methodology. CONCLUSION Differences between phenotypic and TUD analysis may provide clues to virus ancestry not previously inferred. As such, TUD analysis provides a complementary approach to morphology-based systems in analysis of virus evolution.
Affiliation(s)
- David T Pride
- Department of Medicine, Division of Infectious Diseases And Geographic Medicine, Stanford University School of Medicine, Stanford, CA, USA
- Trudy M Wassenaar
- Molecular Microbiology and Genomics Consultants, Zotzenheim, Germany
- Chandrabali Ghose
- Department of Medicine, Division of Infectious Diseases, Harvard Medical School, Boston, MA, USA
- Martin J Blaser
- Departments of Medicine and Microbiology, New York University School of Medicine and VA Medical Center, New York, NY, USA
|
34
|
Reva ON, Tümmler B. Differentiation of regions with atypical oligonucleotide composition in bacterial genomes. BMC Bioinformatics 2005; 6:251. [PMID: 16225667 PMCID: PMC1274298 DOI: 10.1186/1471-2105-6-251] [Citation(s) in RCA: 37] [Impact Index Per Article: 1.9] [Received: 06/07/2005] [Accepted: 10/14/2005] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND Complete sequencing of bacterial genomes has become a common technique of present day microbiology. Thereafter, data mining in the complete sequence is an essential step. New in silico methods are needed that rapidly identify the major features of genome organization and facilitate the prediction of the functional class of ORFs. We tested the usefulness of local oligonucleotide usage (OU) patterns to recognize and differentiate types of atypical oligonucleotide composition in DNA sequences of bacterial genomes. RESULTS A total of 163 bacterial genomes of eubacteria and archaea published in the NCBI database were analyzed. Local OU patterns exhibit substantial intrachromosomal variation in bacteria. Loci with alternative OU patterns were parts of horizontally acquired gene islands or ancient regions such as genes for ribosomal proteins and RNAs. OU statistical parameters, such as local pattern deviation (D), pattern skew (PS) and OU variance (OUV) enabled the detection and visualization of gene islands of different functional classes. CONCLUSION A set of approaches has been designed for the statistical analysis of nucleotide sequences of bacterial genomes. These methods are useful for the visualization and differentiation of regions with atypical oligonucleotide composition prior to or accompanying gene annotation.
Affiliation(s)
- Oleg N Reva
- Klinische Forschergruppe, OE6711, Medizinische Hochschule Hannover, Carl-Neuberg-Strasse 1, D-30625 Hannover, Germany
- Danylo Zabolotny Institute of Microbiology and Virology of the National Academy of Science of Ukraine, Dep. of Antibiotics, 154 Zabolotnogo Str., D03680, Kyiv GSP, Ukraine
- Burkhard Tümmler
- Klinische Forschergruppe, OE6711, Medizinische Hochschule Hannover, Carl-Neuberg-Strasse 1, D-30625 Hannover, Germany
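The idea of comparing local with global oligonucleotide usage, as described in this entry, can be sketched with a sliding window. The code below is a simplified stand-in, not the paper's statistics: instead of the local pattern deviation (D), pattern skew (PS) or OUV, it uses the sum of absolute differences between local and genome-wide tetranucleotide frequencies, and the names `word_freqs` and `window_deviation` are hypothetical.

```python
from collections import Counter


def word_freqs(seq, k=4):
    """Relative frequencies of overlapping k-mers in seq."""
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    total = sum(counts.values())
    return {word: n / total for word, n in counts.items()} if total else {}


def window_deviation(seq, window, step, k=4):
    """Return (start, deviation) for each window along the sequence.

    Deviation is the Manhattan distance between the window's k-mer
    frequencies and those of the whole sequence (a swapped-in measure).
    """
    seq = seq.upper()
    genome_wide = word_freqs(seq, k)
    results = []
    for start in range(0, len(seq) - window + 1, step):
        local = word_freqs(seq[start:start + window], k)
        words = set(genome_wide) | set(local)
        deviation = sum(
            abs(local.get(w, 0.0) - genome_wide.get(w, 0.0)) for w in words
        )
        results.append((start, deviation))
    return results
```

Windows with unusually high deviation are candidates for atypical regions such as horizontally acquired islands; in practice the window size, step and threshold would need tuning for each genome.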
|