Reference Citation Analysis: Find an Article, Find a Category, Find a Journal, Find a Scholar

For: Reva ON, Tümmler B. Global features of sequences of bacterial chromosomes, plasmids and phages revealed by analysis of oligonucleotide usage patterns. BMC Bioinformatics 2004;5:90. [PMID: 15239845 PMCID: PMC487896 DOI: 10.1186/1471-2105-5-90] [Citation(s) in RCA: 40] [Impact Index Per Article: 2.0] [Received: 06/28/2004] [Accepted: 07/07/2004] [Indexed: 11/29/2022]

Cited by Other Article(s)

Darrington M, Leftwich PT, Holmes NA, Friend LA, Clarke NVE, Worsley SF, Margaritopolous JT, Hogenhout SA, Hutchings MI, Chapman T. Characterisation of the symbionts in the Mediterranean fruit fly gut. Microb Genom 2022;8. [PMID: 35446250 PMCID: PMC9453069 DOI: 10.1099/mgen.0.000801] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Indexed: 12/25/2022]

Abstract

Symbioses between bacteria and their insect hosts can range from loose associations through to obligate interdependence. While fundamental evolutionary insights have been gained from the in-depth study of obligate mutualisms, there is increasing interest in the evolutionary potential of flexible symbiotic associations between hosts and their gut microbiomes. Understanding relationships between microbes and hosts also offers the potential for exploitation for insect control. Here, we investigate the gut microbiome of a global agricultural pest, the Mediterranean fruit fly (Ceratitis capitata). We used 16S rRNA profiling to compare the gut microbiomes of laboratory and wild strains raised on different diets and from flies collected from various natural plant hosts. The results showed that medfly guts harbour a simple microbiome that is primarily determined by the larval diet. However, regardless of the laboratory diet or natural plant host on which flies were raised, Klebsiella spp. dominated medfly microbiomes and were resistant to removal by antibiotic treatment. We sequenced the genome of the dominant putative Klebsiella spp. (‘Medkleb’) isolated from the gut of the Toliman wild-type strain. Genome-wide ANI analysis placed Medkleb within the K. oxytoca / michiganensis group. Species level taxonomy for Medkleb was resolved using a multi-locus phylogenetic approach, and molecular, sequence and phenotypic analyses all supported its identity as K. michiganensis. Medkleb has a genome size (5825435 bp) which is 1.6 standard deviations smaller than the mean genome size of free-living Klebsiella spp. Medkleb also lacks some genes involved in environmental sensing. Moreover, the Medkleb genome contains at least two recently acquired unique genomic islands as well as genes that encode pectinolytic enzymes capable of degrading plant cell walls. This may be advantageous given that the medfly diet includes unripe fruits containing high proportions of pectin.
The results suggest that the medfly harbours a commensal gut bacterium that may have developed a mutualistic association with its host and provide nutritional benefits.

Goussarov G, Cleenwerck I, Mysara M, Leys N, Monsieurs P, Tahon G, Carlier A, Vandamme P, Van Houdt R. PaSiT: a novel approach based on short-oligonucleotide frequencies for efficient bacterial identification and typing. Bioinformatics 2020;36:2337-2344. [PMID: 31899493 PMCID: PMC7178395 DOI: 10.1093/bioinformatics/btz964] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Received: 08/19/2019] [Revised: 11/21/2019] [Accepted: 12/30/2019] [Indexed: 11/13/2022]

Bohlin J, Pettersson JHO. Evolution of Genomic Base Composition: From Single Cell Microbes to Multicellular Animals. Comput Struct Biotechnol J 2019;17:362-370. [PMID: 30949307 PMCID: PMC6429543 DOI: 10.1016/j.csbj.2019.03.001] [Citation(s) in RCA: 19] [Impact Index Per Article: 3.8] [Received: 09/28/2018] [Revised: 02/28/2019] [Accepted: 03/01/2019] [Indexed: 01/07/2023]

Bohlin J, Eldholm V, Brynildsrud O, Petterson JHO, Alfsnes K. Modeling of the GC content of the substituted bases in bacterial core genomes. BMC Genomics 2018;19:589. [PMID: 30081825 PMCID: PMC6080486 DOI: 10.1186/s12864-018-4984-3] [Citation(s) in RCA: 16] [Impact Index Per Article: 2.7] [Received: 03/01/2018] [Accepted: 07/31/2018] [Indexed: 12/13/2022]

Abstract

Background

The purpose of the present study was to examine the GC content of substituted bases (sbGC) in the core genomes of 35 bacterial species. Each species, or core genome, constituted genomes from at least 10 strains. We also wanted to explore whether sbGC for each strain was associated with the corresponding species’ core genome GC content (cgGC). We present a simple mathematical model that estimates sbGC from cgGC. The model assumes only that the estimated sbGC is a function of cgGC proportional to fixed AT→GC (α) and GC → AT (β) mutation rates. Non-linear regression was used to estimate parameters α and β from the empirical data described above.

Results

We found that sbGC for each strain showed a non-linear association with the corresponding cgGC with a bias towards higher GC content for most core genomes (66.3% of the strains), assuming as a null-hypothesis that sbGC should be approximately equal to cgGC. The most GC rich core genomes (i.e. approximately %GC > 60), on the other hand, exhibited slightly less GC-biased sbGC than expected. The best fitted regression model indicates that GC → AT mutation rates β = (1.91 ± 0.13) p < 0.001 are approximately (1.91/0.79) = 2.42 times as high, on average, as AT→GC α = (− 0.79 ± 0.25) p < 0.001 mutation rates. Whether the observed sbGC GC-bias for all but the most GC-rich prokaryotic species is due to selection, compensating for the GC → AT mutation bias, and/or selective neutral processes is currently debated. Residual standard error was found to be σ = 0.076 indicating estimated errors of sbGC to be approximately within ±15.2% GC (95% confidence interval) for the strains of all species in the study.
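The two derived figures quoted in the Results above can be re-checked with a few lines of arithmetic. This is only a sketch: it assumes the ratio is taken on the absolute value of α, and that the ±15.2% interval comes from a ±2σ rule of thumb. Neither assumption is stated in the abstract, and the variable names are illustrative.

```python
# Values quoted in the abstract.
beta = 1.91        # GC -> AT rate parameter
alpha_abs = 0.79   # magnitude of the AT -> GC rate parameter (sign dropped, an assumption)
sigma = 0.076      # residual standard error, as a fraction of GC content

# Ratio of GC -> AT to AT -> GC rates.
ratio = beta / alpha_abs

# Approximate 95% half-width under an assumed +/- 2 sigma rule, in percent GC.
half_width_pct = 2 * sigma * 100

print(round(ratio, 2), round(half_width_pct, 1))
```

Under these assumptions the script reproduces the reported ratio of about 2.42 and the half-width of about 15.2% GC.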

Conclusion

Not only did our mathematical model give reasonable estimates of sbGC it also provides further support to previous observations that mutation rates in prokaryotes exhibit a universal GC → AT bias that appears to be remarkably consistent between taxa.

Electronic supplementary material

The online version of this article (10.1186/s12864-018-4984-3) contains supplementary material, which is available to authorized users.

Yu X, Reva ON. SWPhylo - A Novel Tool for Phylogenomic Inferences by Comparison of Oligonucleotide Patterns and Integration of Genome-Based and Gene-Based Phylogenetic Trees. Evol Bioinform Online 2018;14:1176934318759299. [PMID: 29511354 PMCID: PMC5826093 DOI: 10.1177/1176934318759299] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Received: 11/14/2017] [Accepted: 01/24/2018] [Indexed: 11/17/2022]

Beisser D, Graupner N, Bock C, Wodniok S, Grossmann L, Vos M, Sures B, Rahmann S, Boenigk J. Comprehensive transcriptome analysis provides new insights into nutritional strategies and phylogenetic relationships of chrysophytes. PeerJ 2017;5:e2832. [PMID: 28097055 PMCID: PMC5228505 DOI: 10.7717/peerj.2832] [Citation(s) in RCA: 24] [Impact Index Per Article: 3.4] [Received: 07/25/2016] [Accepted: 11/27/2016] [Indexed: 02/02/2023]

Abstract

Background

Chrysophytes are protist model species in ecology and ecophysiology and important grazers of bacteria-sized microorganisms and primary producers. However, they have not yet been investigated in detail at the molecular level, and no genomic and only little transcriptomic information is available. Chrysophytes exhibit different trophic modes: while phototrophic chrysophytes perform only photosynthesis, mixotrophs can gain carbon from bacterial food as well as from photosynthesis, and heterotrophs solely feed on bacteria-sized microorganisms. Recent phylogenies and megasystematics demonstrate an immense complexity of eukaryotic diversity with numerous transitions between phototrophic and heterotrophic organisms. The question we aim to answer is how the diverse nutritional strategies, accompanied or brought about by a reduction of the plastid and size reduction in heterotrophic strains, affect physiology and molecular processes.

Results

We sequenced the mRNA of 18 chrysophyte strains on the Illumina HiSeq platform and analysed the transcriptomes to determine relations between the trophic mode (mixotrophic vs. heterotrophic) and gene expression. We observed an enrichment of genes for photosynthesis, porphyrin and chlorophyll metabolism for phototrophic and mixotrophic strains that can perform photosynthesis. Genes involved in nutrient absorption, environmental information processing and various transporters (e.g., monosaccharide, peptide, lipid transporters) were present or highly expressed only in heterotrophic strains that have to sense, digest and absorb bacterial food. We furthermore present a transcriptome-based alignment-free phylogeny construction approach using transcripts assembled from short reads to determine the evolutionary relationships between the strains and the possible influence of nutritional strategies on the reconstructed phylogeny. We discuss the resulting phylogenies in comparison to those from established approaches based on ribosomal RNA and orthologous genes. Finally, we make functionally annotated reference transcriptomes of each strain available to the community, significantly enhancing publicly available data on Chrysophyceae.

Conclusions

Our study is the first comprehensive transcriptomic characterisation of a diverse set of Chrysophyceaen strains. In addition, we showcase the possibility of inferring phylogenies from assembled transcriptomes using an alignment-free approach. The raw and functionally annotated data we provide will prove beneficial for further examination of the diversity within this taxon. Our molecular characterisation of different trophic modes presents a first such example.

The genome of Pseudomonas fluorescens strain R124 demonstrates phenotypic adaptation to the mineral environment. J Bacteriol 2013;195:4793-803. [PMID: 23995634 DOI: 10.1128/jb.00825-13] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.4] [Indexed: 01/08/2023]

Skewes AD, Welch RD. A Markovian analysis of bacterial genome sequence constraints. PeerJ 2013;1:e127. [PMID: 24010012 PMCID: PMC3757466 DOI: 10.7717/peerj.127] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.6] [Received: 04/27/2013] [Accepted: 07/18/2013] [Indexed: 11/20/2022]

Abstract

The arrangement of nucleotides within a bacterial chromosome is influenced by numerous factors. The degeneracy of the third codon within each reading frame allows some flexibility of nucleotide selection; however, the third nucleotide in the triplet of each codon is at least partly determined by the preceding two. This is most evident in organisms with a strong G + C bias, as the degenerate codon must contribute disproportionately to maintaining that bias. Therefore, a correlation exists between the first two nucleotides and the third in all open reading frames. If the arrangement of nucleotides in a bacterial chromosome is represented as a Markov process, we would expect that the correlation would be completely captured by a second-order Markov model and an increase in the order of the model (e.g., third-, fourth-…order) would not capture any additional uncertainty in the process. In this manuscript, we present the results of a comprehensive study of the Markov property that exists in the DNA sequences of 906 bacterial chromosomes. All of the 906 bacterial chromosomes studied exhibit a statistically significant Markov property that extends beyond second-order, and therefore cannot be fully explained by codon usage. An unrooted tree containing all 906 bacterial chromosomes based on their transition probability matrices of third-order shares ∼25% similarity to a tree based on sequence homologies of 16S rRNA sequences. This congruence to the 16S rRNA tree is greater than for trees based on lower-order models (e.g., second-order), and higher-order models result in diminishing improvements in congruence. A nucleotide correlation most likely exists within every bacterial chromosome that extends past three nucleotides. This correlation places significant limits on the number of nucleotide sequences that can represent probable bacterial chromosomes. 
Transition matrix usage is largely conserved by taxa, indicating that this property is likely inherited; however, some important exceptions exist that may indicate the convergent evolution of some bacteria.
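The k-th order transition probabilities this abstract refers to can be estimated from a sequence by counting which base follows each k-base context. The sketch below illustrates that general idea only and is not the authors' pipeline; the function name `transition_probabilities`, the toy sequence, and the choice to skip windows containing non-ACGT symbols are all assumptions made for this example.

```python
from collections import Counter, defaultdict

def transition_probabilities(seq, order=3):
    """Estimate P(next base | preceding `order` bases) from one DNA string."""
    seq = seq.upper()
    counts = defaultdict(Counter)
    for i in range(len(seq) - order):
        context = seq[i:i + order]
        nxt = seq[i + order]
        # Skip windows with ambiguous symbols such as N (an assumption of this sketch).
        if set(context + nxt) <= set("ACGT"):
            counts[context][nxt] += 1
    probs = {}
    for context, nxt_counts in counts.items():
        total = sum(nxt_counts.values())
        # Each observed context gets one row that sums to 1 over A, C, G, T.
        probs[context] = {base: nxt_counts[base] / total for base in "ACGT"}
    return probs

# Hypothetical toy sequence, far too short for real inference.
toy = "ACGTACGTACGAACGT"
matrix = transition_probabilities(toy, order=3)
print(matrix["ACG"])
```

In the toy sequence the context ACG is followed by T three times and by A once, so its row puts 0.75 on T and 0.25 on A. On a real chromosome, the rows for all 64 three-base contexts would form the kind of third-order matrix that the abstract compares across genomes.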

Bohlin J, Brynildsrud O, Vesth T, Skjerve E, Ussery DW. Amino acid usage is asymmetrically biased in AT- and GC-rich microbial genomes. PLoS One 2013;8:e69878. [PMID: 23922837 PMCID: PMC3724673 DOI: 10.1371/journal.pone.0069878] [Citation(s) in RCA: 22] [Impact Index Per Article: 2.0] [Received: 02/19/2013] [Accepted: 06/14/2013] [Indexed: 11/18/2022]

Abstract

INTRODUCTION

Genomic base composition ranges from less than 25% AT to more than 85% AT in prokaryotes. Since only a small fraction of prokaryotic genomes is not protein coding even a minor change in genomic base composition will induce profound protein changes. We examined how amino acid and codon frequencies were distributed in over 2000 microbial genomes and how these distributions were affected by base compositional changes. In addition, we wanted to know how genome-wide amino acid usage was biased in the different genomes and how changes to base composition and mutations affected this bias. To carry this out, we used a Generalized Additive Mixed-effects Model (GAMM) to explore non-linear associations and strong data dependences in closely related microbes; principal component analysis (PCA) was used to examine genomic amino acid- and codon frequencies, while the concept of relative entropy was used to analyze genomic mutation rates.

RESULTS

We found that genomic amino acid frequencies carried a stronger phylogenetic signal than codon frequencies, but that this signal was weak compared to that of genomic %AT. Further, in contrast to codon usage bias (CUB), amino acid usage bias (AAUB) was differently distributed in AT- and GC-rich genomes in the sense that AT-rich genomes did not prefer specific amino acids over others to the same extent as GC-rich genomes. AAUB was also associated with relative entropy; genomes with low AAUB contained more random mutations as a consequence of relaxed purifying selection than genomes with higher AAUB.

CONCLUSION

Genomic base composition has a substantial effect on both amino acid- and codon frequencies in bacterial genomes. While phylogeny influenced amino acid usage more in GC-rich genomes, AT-content was driving amino acid usage in AT-rich genomes. We found the GAMM model to be an excellent tool to analyze the genomic data used in this study.

Teeling H, Glöckner FO. Current opportunities and challenges in microbial metagenome analysis--a bioinformatic perspective. Brief Bioinform 2012;13:728-42. [PMID: 22966151 PMCID: PMC3504927 DOI: 10.1093/bib/bbs039] [Citation(s) in RCA: 148] [Impact Index Per Article: 12.3] [Received: 03/30/2012] [Accepted: 06/09/2012] [Indexed: 12/21/2022]

Bohlin J, van Passel MWJ, Snipen L, Kristoffersen AB, Ussery D, Hardy SP. Relative entropy differences in bacterial chromosomes, plasmids, phages and genomic islands. BMC Genomics 2012;13:66. [PMID: 22325062 PMCID: PMC3305612 DOI: 10.1186/1471-2164-13-66] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.2] [Received: 11/15/2011] [Accepted: 02/10/2012] [Indexed: 11/10/2022]

Bezuidt O, Pierneef R, Mncube K, Lima-Mendez G, Reva ON. Mainstreams of horizontal gene exchange in enterobacteria: consideration of the outbreak of enterohemorrhagic E. coli O104:H4 in Germany in 2011. PLoS One 2011;6:e25702. [PMID: 22022434 PMCID: PMC3195076 DOI: 10.1371/journal.pone.0025702] [Citation(s) in RCA: 26] [Impact Index Per Article: 2.0] [Received: 07/14/2011] [Accepted: 09/08/2011] [Indexed: 11/18/2022]

Abstract

BACKGROUND

Escherichia coli O104:H4 caused a severe outbreak in Europe in 2011. The strain TY-2482 sequenced from this outbreak allowed the discovery of its closest relatives but failed to resolve ways in which it originated and evolved. On account of the previous statement, may we expect similar upcoming outbreaks to occur recurrently or spontaneously in the future? The inability to answer these questions shows limitations of the current comparative and evolutionary genomics methods.

PRINCIPAL FINDINGS

The study revealed oscillations of gene exchange in enterobacteria, which originated from marine γ-Proteobacteria. These mobile genetic elements have become recombination hotspots and effective 'vehicles' ensuring a wide distribution of successful combinations of fitness and virulence genes among enterobacteria. Two remarkable peculiarities of the strain TY-2482 and its relatives were observed: i) retaining the genetic primitiveness by these strains as they somehow avoided the main fluxes of horizontal gene transfer which effectively penetrated other enterobacteria; ii) acquisition of antibiotic resistance genes in a plasmid genomic island of β-Proteobacteria origin which ontologically is unrelated to the predominant genomic islands of enterobacteria.

CONCLUSIONS

Oscillations of horizontal gene exchange activity were reported which result from a counterbalance between the acquired resistance of bacteria towards existing mobile vectors and the generation of new vectors in the environmental microflora. We hypothesized that TY-2482 may originate from a genetically primitive lineage of E. coli that has evolved in confined geographical areas and brought by human migration or cattle trade onto an intersection of several independent streams of horizontal gene exchange. Development of a system for monitoring the new and most active gene exchange events was proposed.

Klockgether J, Cramer N, Wiehlmann L, Davenport CF, Tümmler B. Pseudomonas aeruginosa Genomic Structure and Diversity. Front Microbiol 2011;2:150. [PMID: 21808635 PMCID: PMC3139241 DOI: 10.3389/fmicb.2011.00150] [Citation(s) in RCA: 199] [Impact Index Per Article: 15.3] [Received: 03/21/2011] [Accepted: 06/27/2011] [Indexed: 12/23/2022]

Abstract

The Pseudomonas aeruginosa genome (G + C content 65–67%, size 5.5–7 Mbp) is made up of a single circular chromosome and a variable number of plasmids. Sequencing of complete genomes or blocks of the accessory genome has revealed that the genome encodes a large repertoire of transporters, transcriptional regulators, and two-component regulatory systems which reflects its metabolic diversity to utilize a broad range of nutrients. The conserved core component of the genome is largely collinear among P. aeruginosa strains and exhibits an interclonal sequence diversity of 0.5–0.7%. Only a few loci of the core genome are subject to diversifying selection. Genome diversity is mainly caused by accessory DNA elements located in 79 regions of genome plasticity that are scattered around the genome and show an anomalous usage of mono- to tetradecanucleotides. Genomic islands of the pKLC102/PAGI-2 family that integrate into tRNA^Lys or tRNA^Gly genes represent hotspots of inter- and intraclonal genomic diversity. The individual islands differ in their repertoire of metabolic genes that make a large contribution to the pangenome. In order to unravel intraclonal diversity of P. aeruginosa, the genomes of two members of the PA14 clonal complex from diverse habitats and geographic origin were compared. The genome sequences differed by less than 0.01% from each other. One hundred ninety-eight of the 231 single nucleotide substitutions (SNPs) were non-randomly distributed in the genome. Non-synonymous SNPs were mainly found in an integrated Pf1-like phage and in genes involved in transcriptional regulation, membrane and extracellular constituents, transport, and secretion. In summary, P. aeruginosa is endowed with a highly conserved core genome of low sequence diversity and a highly variable accessory genome that communicates with other pseudomonads and genera via horizontal gene transfer.

Practical application of self-organizing maps to interrelate biodiversity and functional data in NGS-based metagenomics. ISME JOURNAL 2010;5:918-28. [PMID: 21160538 DOI: 10.1038/ismej.2010.180] [Citation(s) in RCA: 44] [Impact Index Per Article: 3.1] [Indexed: 01/31/2023]

Abstract

Next-generation sequencing (NGS) technologies have enabled the application of broad-scale sequencing in microbial biodiversity and metagenome studies. Biodiversity is usually targeted by classifying 16S ribosomal RNA genes, while metagenomic approaches target metabolic genes. However, both approaches remain isolated, as long as the taxonomic and functional information cannot be interrelated. Techniques like self-organizing maps (SOMs) have been applied to cluster metagenomes into taxon-specific bins in order to link biodiversity with functions, but have not been applied to broad-scale NGS-based metagenomics yet. Here, we provide a novel implementation, demonstrate its potential and practicability, and provide a web-based service for public usage. Evaluation with published data sets mimicking varyingly complex habitats resulted in classification specificities and sensitivities of close to 100% to above 90% from phylum to genus level for assemblies exceeding 8 kb for low and medium complexity data. When applied to five real-world metagenomes of medium complexity from direct pyrosequencing of marine subsurface waters, classifications of assemblies above 2.5 kb were in good agreement with fluorescence in situ hybridizations, indicating that biodiversity was mostly retained within the metagenomes, and confirming high classification specificities. This was validated by two protein-based classification (PBC) methods. SOMs were able to retrieve the relevant taxa down to the genus level, while surpassing PBCs in resolution. In order to make the approach accessible to a broad audience, we implemented a feature-rich web-based SOM application named TaxSOM, which is freely available at http://www.megx.net/toolbox/taxsom. TaxSOM can classify reads or assemblies exceeding 2.5 kb with high accuracy and thus assists in linking biodiversity and functions in metagenome studies, which is a precondition to study microbial ecology in a holistic fashion.

Beloqui A, Nechitaylo TY, López-Cortés N, Ghazi A, Guazzaroni ME, Polaina J, Strittmatter AW, Reva O, Waliczek A, Yakimov MM, Golyshina OV, Ferrer M, Golyshin PN. Diversity of glycosyl hydrolases from cellulose-depleting communities enriched from casts of two earthworm species. Appl Environ Microbiol 2010;76:5934-46. [PMID: 20622123 PMCID: PMC2935051 DOI: 10.1128/aem.00902-10] [Citation(s) in RCA: 58] [Impact Index Per Article: 4.1] [Received: 04/13/2010] [Accepted: 07/01/2010] [Indexed: 11/20/2022]

Affiliation(s)

Ana Beloqui CSIC, Institute of Catalysis, 28049 Madrid, Spain, HZI-Helmholtz Centre for Infection Research, 38124 Braunschweig, Germany, CSIC, Instituto de Agroquímica y Tecnología de Alimentos, 46980 Valencia, Spain, Eurofins MWG Operon, 85560 Ebersberg, Germany, Department of Biochemistry, University of Pretoria, 0002 Pretoria, South Africa, Istituto per l'Ambiente Marino Costiero, CNR, Messina 98122, Italy, School of Biological Sciences, Bangor University, Gwynedd LL57 2UW, United Kingdom, Centre for Integrated Research in the Rural Environment (CRRE), Aberystwyth University-Bangor University Partnership, Aberystwyth, Ceredigion SY23 3BF, United Kingdom
Taras Y. Nechitaylo CSIC, Institute of Catalysis, 28049 Madrid, Spain, HZI-Helmholtz Centre for Infection Research, 38124 Braunschweig, Germany, CSIC, Instituto de Agroquímica y Tecnología de Alimentos, 46980 Valencia, Spain, Eurofins MWG Operon, 85560 Ebersberg, Germany, Department of Biochemistry, University of Pretoria, 0002 Pretoria, South Africa, Istituto per l'Ambiente Marino Costiero, CNR, Messina 98122, Italy, School of Biological Sciences, Bangor University, Gwynedd LL57 2UW, United Kingdom, Centre for Integrated Research in the Rural Environment (CRRE), Aberystwyth University-Bangor University Partnership, Aberystwyth, Ceredigion SY23 3BF, United Kingdom
Nieves López-Cortés CSIC, Institute of Catalysis, 28049 Madrid, Spain, HZI-Helmholtz Centre for Infection Research, 38124 Braunschweig, Germany, CSIC, Instituto de Agroquímica y Tecnología de Alimentos, 46980 Valencia, Spain, Eurofins MWG Operon, 85560 Ebersberg, Germany, Department of Biochemistry, University of Pretoria, 0002 Pretoria, South Africa, Istituto per l'Ambiente Marino Costiero, CNR, Messina 98122, Italy, School of Biological Sciences, Bangor University, Gwynedd LL57 2UW, United Kingdom, Centre for Integrated Research in the Rural Environment (CRRE), Aberystwyth University-Bangor University Partnership, Aberystwyth, Ceredigion SY23 3BF, United Kingdom
Azam Ghazi CSIC, Institute of Catalysis, 28049 Madrid, Spain, HZI-Helmholtz Centre for Infection Research, 38124 Braunschweig, Germany, CSIC, Instituto de Agroquímica y Tecnología de Alimentos, 46980 Valencia, Spain, Eurofins MWG Operon, 85560 Ebersberg, Germany, Department of Biochemistry, University of Pretoria, 0002 Pretoria, South Africa, Istituto per l'Ambiente Marino Costiero, CNR, Messina 98122, Italy, School of Biological Sciences, Bangor University, Gwynedd LL57 2UW, United Kingdom, Centre for Integrated Research in the Rural Environment (CRRE), Aberystwyth University-Bangor University Partnership, Aberystwyth, Ceredigion SY23 3BF, United Kingdom
María-Eugenia Guazzaroni CSIC, Institute of Catalysis, 28049 Madrid, Spain, HZI-Helmholtz Centre for Infection Research, 38124 Braunschweig, Germany, CSIC, Instituto de Agroquímica y Tecnología de Alimentos, 46980 Valencia, Spain, Eurofins MWG Operon, 85560 Ebersberg, Germany, Department of Biochemistry, University of Pretoria, 0002 Pretoria, South Africa, Istituto per l'Ambiente Marino Costiero, CNR, Messina 98122, Italy, School of Biological Sciences, Bangor University, Gwynedd LL57 2UW, United Kingdom, Centre for Integrated Research in the Rural Environment (CRRE), Aberystwyth University-Bangor University Partnership, Aberystwyth, Ceredigion SY23 3BF, United Kingdom
Julio Polaina CSIC, Institute of Catalysis, 28049 Madrid, Spain, HZI-Helmholtz Centre for Infection Research, 38124 Braunschweig, Germany, CSIC, Instituto de Agroquímica y Tecnología de Alimentos, 46980 Valencia, Spain, Eurofins MWG Operon, 85560 Ebersberg, Germany, Department of Biochemistry, University of Pretoria, 0002 Pretoria, South Africa, Istituto per l'Ambiente Marino Costiero, CNR, Messina 98122, Italy, School of Biological Sciences, Bangor University, Gwynedd LL57 2UW, United Kingdom, Centre for Integrated Research in the Rural Environment (CRRE), Aberystwyth University-Bangor University Partnership, Aberystwyth, Ceredigion SY23 3BF, United Kingdom
Axel W. Strittmatter CSIC, Institute of Catalysis, 28049 Madrid, Spain, HZI-Helmholtz Centre for Infection Research, 38124 Braunschweig, Germany, CSIC, Instituto de Agroquímica y Tecnología de Alimentos, 46980 Valencia, Spain, Eurofins MWG Operon, 85560 Ebersberg, Germany, Department of Biochemistry, University of Pretoria, 0002 Pretoria, South Africa, Istituto per l'Ambiente Marino Costiero, CNR, Messina 98122, Italy, School of Biological Sciences, Bangor University, Gwynedd LL57 2UW, United Kingdom, Centre for Integrated Research in the Rural Environment (CRRE), Aberystwyth University-Bangor University Partnership, Aberystwyth, Ceredigion SY23 3BF, United Kingdom
Oleg Reva CSIC, Institute of Catalysis, 28049 Madrid, Spain, HZI-Helmholtz Centre for Infection Research, 38124 Braunschweig, Germany, CSIC, Instituto de Agroquímica y Tecnología de Alimentos, 46980 Valencia, Spain, Eurofins MWG Operon, 85560 Ebersberg, Germany, Department of Biochemistry, University of Pretoria, 0002 Pretoria, South Africa, Istituto per l'Ambiente Marino Costiero, CNR, Messina 98122, Italy, School of Biological Sciences, Bangor University, Gwynedd LL57 2UW, United Kingdom, Centre for Integrated Research in the Rural Environment (CRRE), Aberystwyth University-Bangor University Partnership, Aberystwyth, Ceredigion SY23 3BF, United Kingdom
Agnes Waliczek
Michail M. Yakimov
Olga V. Golyshina
Manuel Ferrer
Peter N. Golyshin

Bohlin J, Snipen L, Cloeckaert A, Lagesen K, Ussery D, Kristoffersen AB, Godfroid J. Genomic comparisons of Brucella spp. and closely related bacteria using base compositional and proteome based methods. BMC Evol Biol 2010;10:249. [PMID: 20707916 PMCID: PMC2928237 DOI: 10.1186/1471-2148-10-249] [Citation(s) in RCA: 23] [Impact Index Per Article: 1.6] [Received: 03/16/2010] [Accepted: 08/13/2010] [Indexed: 11/30/2022] Open

Abstract

Background

Classification of bacteria within the genus Brucella has been difficult due in part to considerable genomic homogeneity between the different species and biovars, in spite of clear differences in phenotypes. Therefore, many different methods have been used to assess Brucella taxonomy. In the current work, we examine 32 sequenced genomes from genus Brucella representing the six classical species, as well as more recently described species, using bioinformatical methods. Comparisons were made at the level of genomic DNA using oligonucleotide based methods (Markov chain based genomic signatures, genomic codon and amino acid frequencies based comparisons) and proteomes (all-against-all BLAST protein comparisons and pan-genomic analyses).

Results

We found that the oligonucleotide based methods gave different results compared to that of the proteome based methods. Differences were also found between the oligonucleotide based methods used. Whilst the Markov chain based genomic signatures grouped the different species in genus Brucella according to host preference, the codon and amino acid frequencies based methods reflected small differences between the Brucella species. Only minor differences could be detected between all genera included in this study using the codon and amino acid frequencies based methods.

Proteome comparisons were found to be in strong accordance with current Brucella taxonomy indicating a remarkable association between gene gain or loss on one hand and mutations in marker genes on the other. The proteome based methods found greater similarity between Brucella species and Ochrobactrum species than between species within genus Agrobacterium compared to each other. In other words, proteome comparisons of species within genus Agrobacterium were found to be more diverse than proteome comparisons between species in genus Brucella and genus Ochrobactrum. Pan-genomic analyses indicated that uptake of DNA from outside genus Brucella appears to be limited.

Conclusions

While both the proteome based methods and the Markov chain based genomic signatures were able to reflect environmental diversity between the different species and strains of genus Brucella, the genomic codon and amino acid frequencies based comparisons were not found adequate for such comparisons. The proteome comparison based phylogenies of the species in genus Brucella showed a surprising consistency with current Brucella taxonomy.

Bohlin J, Snipen L, Hardy SP, Kristoffersen AB, Lagesen K, Dønsvik T, Skjerve E, Ussery DW. Analysis of intra-genomic GC content homogeneity within prokaryotes. BMC Genomics 2010;11:464. [PMID: 20691090 PMCID: PMC3091660 DOI: 10.1186/1471-2164-11-464] [Citation(s) in RCA: 35] [Impact Index Per Article: 2.5] [Received: 01/08/2010] [Accepted: 08/06/2010] [Indexed: 11/10/2022] Open

Davenport C, Ussery DW, Tümmler B. Comparative genomics of green sulfur bacteria. Photosynth Res 2010;104:137-152. [PMID: 20099081 DOI: 10.1007/s11120-009-9515-2] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.4] [Received: 08/20/2009] [Accepted: 12/07/2009] [Indexed: 05/28/2023]

Davenport CF, Tümmler B. Abundant oligonucleotides common to most bacteria. PLoS One 2010;5:e9841. [PMID: 20352124 PMCID: PMC2843746 DOI: 10.1371/journal.pone.0009841] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.4] [Received: 08/05/2009] [Accepted: 03/03/2010] [Indexed: 11/25/2022] Open

Bohlin J, Skjerve E. Examination of genome homogeneity in prokaryotes using genomic signatures. PLoS One 2009;4:e8113. [PMID: 19956556 PMCID: PMC2781299 DOI: 10.1371/journal.pone.0008113] [Citation(s) in RCA: 18] [Impact Index Per Article: 1.2] [Received: 07/31/2009] [Accepted: 11/05/2009] [Indexed: 01/17/2023] Open

Abstract

BACKGROUND

DNA word frequencies, normalized for genomic AT content, are remarkably stable within prokaryotic genomes and are therefore said to reflect a "genomic signature." The genomic signatures can be used to phylogenetically classify organisms from arbitrary sampled DNA. Genomic signatures can also be used to search for horizontally transferred DNA or DNA regions subjected to special selection forces. Thus, the stability of the genomic signature can be used as a measure of genomic homogeneity. The factors associated with the stability of the genomic signatures are not known, and this motivated us to investigate further. We analyzed the intra-genomic variance of genomic signatures based on AT content normalization (0th-order Markov model) as well as genomic signatures normalized by smaller DNA words (1st- and 2nd-order Markov models) for 636 sequenced prokaryotic genomes. Regression models were fitted, with intra-genomic signature variance as the response variable, to a set of factors representing genomic properties such as genomic AT content, genome size, habitat, phylum, oxygen requirement, optimal growth temperature and oligonucleotide usage variance (OUV, a measure of oligonucleotide usage bias), measured as the variance between genomic tetranucleotide frequencies and Markov chain approximated tetranucleotide frequencies, as predictors.

PRINCIPAL FINDINGS

Regression analysis revealed that OUV was the most important factor (p<0.001) determining intra-genomic homogeneity as measured using genomic signatures. This means that the less random the oligonucleotide usage is in the sense of higher OUV, the more homogeneous the genome is in terms of the genomic signature. The other factors influencing variance in the genomic signature (p<0.001) were genomic AT content, phylum and oxygen requirement.

CONCLUSIONS

Genomic homogeneity in prokaryotes is intimately linked to genomic GC content, oligonucleotide usage bias (OUV) and aerobiosis, while oligonucleotide usage bias (OUV) is associated with genomic GC content, aerobiosis and habitat.
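
The zero-order normalization described in this abstract can be illustrated with a short Python sketch. It is not the authors' implementation: the function names are hypothetical, only the given strand is counted, and the variance around 1.0 is a simplified stand-in for OUV, which the abstract defines relative to Markov chain approximated frequencies.

```python
from collections import Counter
from itertools import product


def kmer_counts(seq, k):
    """Count overlapping k-mers that consist only of A, C, G and T."""
    seq = seq.upper()
    counts = Counter()
    for i in range(len(seq) - k + 1):
        word = seq[i:i + k]
        if set(word) <= set("ACGT"):
            counts[word] += 1
    return counts


def zero_order_signature(seq, k=4):
    """Observed/expected ratio per k-mer; expected comes from base frequencies."""
    base = kmer_counts(seq, 1)
    total_bases = sum(base.values())
    p = {b: base[b] / total_bases for b in "ACGT"}
    words = kmer_counts(seq, k)
    total_words = sum(words.values())
    signature = {}
    for letters in product("ACGT", repeat=k):
        word = "".join(letters)
        expected = float(total_words)
        for b in word:
            expected *= p[b]
        if expected > 0:  # skip words containing a base absent from seq
            signature[word] = words[word] / expected
    return signature


def signature_variance(signature):
    """Mean squared deviation of the ratios from 1.0 (assumed bias measure)."""
    values = list(signature.values())
    if not values:
        return 0.0
    return sum((v - 1.0) ** 2 for v in values) / len(values)
```

Computing the signature in sliding windows along a chromosome and comparing each window with the whole-genome signature would give a rough view of the intra-genomic variance the abstract refers to.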

Bohlin J, Skjerve E, Ussery DW. Analysis of genomic signatures in prokaryotes using multinomial regression and hierarchical clustering. BMC Genomics 2009;10:487. [PMID: 19845945 PMCID: PMC2770534 DOI: 10.1186/1471-2164-10-487] [Citation(s) in RCA: 14] [Impact Index Per Article: 0.9] [Received: 06/16/2009] [Accepted: 10/21/2009] [Indexed: 11/26/2022] Open

Bohlin J, Hardy SP, Ussery DW. Stretches of alternating pyrimidine/purines and purines are respectively linked with pathogenicity and growth temperature in prokaryotes. BMC Genomics 2009;10:346. [PMID: 19646265 PMCID: PMC2728739 DOI: 10.1186/1471-2164-10-346] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.3] [Received: 03/02/2009] [Accepted: 07/31/2009] [Indexed: 02/02/2023] Open

Abstract

Background

The genomic fractions of purine (RR) and alternating pyrimidine/purine (YR) stretches of 10 base pairs or more have been linked to genomic AT content, the formation of different DNA helices, strand-biased gene distribution, DNA structure, and more. Although some of these factors are a consequence of the chemical properties of purines and pyrimidines, a thorough statistical examination of the distributions of YR/RR stretches in sequenced prokaryotic chromosomes has, to the best of our knowledge, not been undertaken. The aim of this study is to expand upon previous research by using regression analysis to investigate how AT content, habitat, growth temperature, pathogenicity, phyla, oxygen requirement and halotolerance correlate with the distribution of RR and YR stretches in prokaryotes.

Results

Our results indicate that RR and YR-stretches are differently distributed in prokaryotic phyla. RR stretches are overrepresented in all phyla except for the Actinobacteria and β-Proteobacteria. In contrast, YR tracts are underrepresented in all phyla except for the β-Proteobacterial group. YR-stretches are associated with phylum, pathogenicity and habitat, whilst RR-tracts are associated with phylum, AT content, oxygen requirement, growth temperature and halotolerance. All associations described were statistically significant with p < 0.001.

Conclusion

Analysis of chromosomal distributions of RR/YR sequences in prokaryotes reveals a set of associations with environmental factors not observed with mono- and oligonucleotide frequencies. This implies that important information can be found in the distribution of RR/YR stretches that is more difficult to obtain from genomic mono- and oligonucleotide frequencies. The association between pathogenicity and fractions of YR stretches is assumed to be linked to recombination and horizontal transfer.
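
As a rough illustration of the quantity studied in this abstract, the sketch below computes the fraction of a sequence covered by purine (RR) or alternating pyrimidine/purine (YR) stretches of at least 10 bases. The regular expressions, the function name and the single-strand scan are assumptions made for this example, not the authors' procedure.

```python
import re

# Runs of purines (A or G); hypothetical pattern for RR stretches.
PURINE_RUN = r"[AG]+"
# Strictly alternating pyrimidine/purine runs, starting with either class.
ALTERNATING_YR = r"(?:[CT][AG])+[CT]?|(?:[AG][CT])+[AG]?"


def stretch_fraction(seq, pattern, min_len=10):
    """Fraction of seq covered by matches of pattern that are >= min_len long."""
    seq = seq.upper()
    if not seq:
        return 0.0
    covered = sum(
        len(m.group(0))
        for m in re.finditer(pattern, seq)
        if len(m.group(0)) >= min_len
    )
    return covered / len(seq)
```

For example, `stretch_fraction(genome, PURINE_RUN)` and `stretch_fraction(genome, ALTERNATING_YR)` would give two per-genome values that could then serve as response variables in a regression of the kind the abstract describes.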

Davenport CF, Wiehlmann L, Reva ON, Tümmler B. Visualization of Pseudomonas genomic structure by abundant 8-14mer oligonucleotides. Environ Microbiol 2009;11:1092-104. [PMID: 19161433 DOI: 10.1111/j.1462-2920.2008.01839.x] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.4] [Indexed: 12/29/2022]

Pride DT, Schoenfeld T. Genome signature analysis of thermal virus metagenomes reveals Archaea and thermophilic signatures. BMC Genomics 2008;9:420. [PMID: 18798991 PMCID: PMC2556352 DOI: 10.1186/1471-2164-9-420] [Citation(s) in RCA: 26] [Impact Index Per Article: 1.6] [Received: 04/10/2008] [Accepted: 09/17/2008] [Indexed: 11/18/2022] Open

Abstract

Background

Metagenomic analysis provides a rich source of biological information for otherwise intractable viral communities. However, study of viral metagenomes has been hampered by its nearly complete reliance on BLAST algorithms for identification of DNA sequences. We sought to develop algorithms for examination of viral metagenomes to identify the origin of sequences independent of BLAST algorithms. We chose viral metagenomes obtained from two hot springs, Bear Paw and Octopus, in Yellowstone National Park, as they represent simple microbial populations where comparatively large contigs were obtained. Thermal spring metagenomes have high proportions of sequences without significant Genbank homology, which has hampered identification of viruses and their linkage with hosts. To analyze each metagenome, we developed a method to classify DNA fragments using genome signature-based phylogenetic classification (GSPC), where metagenomic fragments are compared to a database of oligonucleotide signatures for all previously sequenced Bacteria, Archaea, and viruses.

Results

From both Bear Paw and Octopus hot springs, each assembled contig had more similarity to other metagenome contigs than to any sequenced microbial genome based on GSPC analysis, suggesting a genome signature common to each of these extreme environments. While viral metagenomes from Bear Paw and Octopus share some similarity, the genome signatures from each locale are largely unique. GSPC using a microbial database predicts most of the Octopus metagenome has archaeal signatures, while bacterial signatures predominate in Bear Paw; a finding consistent with those of Genbank BLAST. When using a viral database, the majority of the Octopus metagenome is predicted to belong to archaeal virus Families Globuloviridae and Fuselloviridae, while none of the Bear Paw metagenome is predicted to belong to archaeal viruses. As expected, when microbial and viral databases are combined, each of the Octopus and Bear Paw metagenomic contigs are predicted to belong to viruses rather than to any Bacteria or Archaea, consistent with the apparent viral origin of both metagenomes.

Conclusion

That BLAST searches identify no significant homologs for most metagenome contigs, while GSPC suggests their origin as archaeal viruses or bacteriophages, indicates GSPC provides a complementary approach in viral metagenomic analysis.

Ganesan H, Rakitianskaia AS, Davenport CF, Tümmler B, Reva ON. The SeqWord Genome Browser: an online tool for the identification and visualization of atypical regions of bacterial genomes through oligonucleotide usage. BMC Bioinformatics 2008;9:333. [PMID: 18687122 PMCID: PMC2528017 DOI: 10.1186/1471-2105-9-333] [Citation(s) in RCA: 35] [Impact Index Per Article: 2.2] [Received: 03/29/2008] [Accepted: 08/07/2008] [Indexed: 11/10/2022] Open

Bohlin J, Skjerve E, Ussery DW. Investigations of oligonucleotide usage variance within and between prokaryotes. PLoS Comput Biol 2008;4:e1000057. [PMID: 18421372 PMCID: PMC2289840 DOI: 10.1371/journal.pcbi.1000057] [Citation(s) in RCA: 48] [Impact Index Per Article: 3.0] [Received: 10/12/2007] [Accepted: 03/12/2008] [Indexed: 11/18/2022] Open

Reva O, Tümmler B. Think big – giant genes in bacteria. Environ Microbiol 2008;10:768-77. [DOI: 10.1111/j.1462-2920.2007.01500.x] [Citation(s) in RCA: 52] [Impact Index Per Article: 3.3] [Indexed: 11/27/2022]

Bohlin J, Skjerve E, Ussery DW. Reliability and applications of statistical methods based on oligonucleotide frequencies in bacterial and archaeal genomes. BMC Genomics 2008;9:104. [PMID: 18307761 PMCID: PMC2289816 DOI: 10.1186/1471-2164-9-104] [Citation(s) in RCA: 28] [Impact Index Per Article: 1.8] [Received: 09/28/2007] [Accepted: 02/28/2008] [Indexed: 11/22/2022] Open

Abstract

Background

The increasing number of sequenced prokaryotic genomes contains a wealth of genomic data that needs to be effectively analysed. A set of statistical tools exists for such analysis, but their strengths and weaknesses have not been fully explored. The statistical methods we are concerned with here are mainly used to examine similarities between archaeal and bacterial DNA from different genomes. These methods compare observed genomic frequencies of fixed-sized oligonucleotides with expected values, which can be determined by genomic nucleotide content, smaller oligonucleotide frequencies, or be based on specific statistical distributions. Advantages of these statistical methods include measurements of phylogenetic relationship with relatively small pieces of DNA sampled from almost anywhere within genomes, detection of foreign/conserved DNA, and homology searches. Our aim was to explore the reliability and best suited applications for some popular methods, which include relative oligonucleotide frequencies (ROF), di- to hexanucleotide zeroth-order Markov methods (ZOM) and the second-order Markov chain method (MCM). Tests were performed on distant homology searches with large DNA sequences, detection of foreign/conserved DNA, and plasmid-host similarity comparisons. Additionally, the reliability of the methods was tested by comparing both real and random genomic DNA.

Results

Our findings show that the optimal method is context dependent. ROFs were best suited for distant homology searches, whilst the hexanucleotide ZOM and MCM measures were more reliable measures in terms of phylogeny. The dinucleotide ZOM method produced high correlation values when used to compare real genomes to an artificially constructed random genome with similar %GC, and should therefore be used with care. The tetranucleotide ZOM measure was a good measure to detect horizontally transferred regions, and when used to compare the phylogenetic relationships between plasmids and hosts, significant correlation (R² = 0.4) was found with genomic GC content and intra-chromosomal homogeneity.

Conclusion

The statistical methods examined are fast, easy to implement, and powerful for a number of different applications involving genomic sequence comparisons. However, none of the measures examined were superior in all tests, and therefore the choice of the statistical method should depend on the task at hand.
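
One common way to obtain expected tetranucleotide counts from "smaller oligonucleotide frequencies", as mentioned in this abstract, is a second-order Markov estimate: E(w1w2w3w4) = N(w1w2w3) · N(w2w3w4) / N(w2w3). The Python sketch below assumes this formula and uses hypothetical function names. It is an illustration of the general idea and may differ in detail from the MCM measure evaluated in the paper.

```python
from collections import Counter


def count_words(seq, k):
    """Count overlapping k-mers that consist only of A, C, G and T."""
    seq = seq.upper()
    counts = Counter()
    for i in range(len(seq) - k + 1):
        word = seq[i:i + k]
        if set(word) <= set("ACGT"):
            counts[word] += 1
    return counts


def markov_ratios(seq):
    """Observed/expected ratios for tetranucleotides seen in seq.

    Expected counts use the assumed second-order Markov estimate
    N(w1w2w3) * N(w2w3w4) / N(w2w3).
    """
    n4 = count_words(seq, 4)
    n3 = count_words(seq, 3)
    n2 = count_words(seq, 2)
    ratios = {}
    for word, observed in n4.items():
        # Every observed 4-mer implies its 3-mers and middle 2-mer occur,
        # so the denominator and the expected count are never zero here.
        expected = n3[word[:3]] * n3[word[1:]] / n2[word[1:3]]
        ratios[word] = observed / expected
    return ratios
```

Ratios close to 1.0 indicate words whose frequency is explained by their shorter constituents. Two sequences could then be compared by correlating their ratio profiles, which is the kind of similarity measure the abstract evaluates.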

Reva ON, Hallin PF, Willenbrock H, Sicheritz-Ponten T, Tümmler B, Ussery DW. Global features of the Alcanivorax borkumensis SK2 genome. Environ Microbiol 2007;10:614-25. [PMID: 18081853 DOI: 10.1111/j.1462-2920.2007.01483.x] [Citation(s) in RCA: 24] [Impact Index Per Article: 1.4] [Indexed: 01/07/2023]

Delwart EL. Viral metagenomics. Rev Med Virol 2007;17:115-31. [PMID: 17295196 PMCID: PMC7169062 DOI: 10.1002/rmv.532] [Citation(s) in RCA: 224] [Impact Index Per Article: 13.2] [Indexed: 02/06/2023]

Klockgether J, Würdemann D, Reva O, Wiehlmann L, Tümmler B. Diversity of the abundant pKLC102/PAGI-2 family of genomic islands in Pseudomonas aeruginosa. J Bacteriol 2007;189:2443-59. [PMID: 17194795 PMCID: PMC1899365 DOI: 10.1128/jb.01688-06] [Citation(s) in RCA: 88] [Impact Index Per Article: 5.2] [Received: 11/01/2006] [Accepted: 01/08/2007] [Indexed: 12/27/2022] Open

Chen XH, Vater J, Piel J, Franke P, Scholz R, Schneider K, Koumoutsi A, Hitzeroth G, Grammel N, Strittmatter AW, Gottschalk G, Süssmuth RD, Borriss R. Structural and functional characterization of three polyketide synthase gene clusters in Bacillus amyloliquefaciens FZB 42. J Bacteriol 2006;188:4024-36. [PMID: 16707694 PMCID: PMC1482889 DOI: 10.1128/jb.00052-06] [Citation(s) in RCA: 250] [Impact Index Per Article: 13.9] [Indexed: 11/20/2022] Open

Pride DT, Wassenaar TM, Ghose C, Blaser MJ. Evidence of host-virus co-evolution in tetranucleotide usage patterns of bacteriophages and eukaryotic viruses. BMC Genomics 2006;7:8. [PMID: 16417644 PMCID: PMC1360066 DOI: 10.1186/1471-2164-7-8] [Citation(s) in RCA: 94] [Impact Index Per Article: 5.2] [Received: 09/27/2005] [Accepted: 01/18/2006] [Indexed: 11/10/2022] Open

Reva ON, Tümmler B. Differentiation of regions with atypical oligonucleotide composition in bacterial genomes. BMC Bioinformatics 2005;6:251. [PMID: 16225667 PMCID: PMC1274298 DOI: 10.1186/1471-2105-6-251] [Citation(s) in RCA: 37] [Impact Index Per Article: 1.9] [Received: 06/07/2005] [Accepted: 10/14/2005] [Indexed: 11/10/2022] Open