301
|
López-Guerrero MG, Ormeño-Orrillo E, Velázquez E, Rogel MA, Acosta JL, González V, Martínez J, Martínez-Romero E. Rhizobium etli taxonomy revised with novel genomic data and analyses. Syst Appl Microbiol 2012; 35:353-8. [DOI: 10.1016/j.syapm.2012.06.009] [Citation(s) in RCA: 46] [Impact Index Per Article: 3.5] [Received: 03/08/2012] [Revised: 05/14/2012] [Accepted: 06/14/2012] [Indexed: 11/16/2022]
|
302
|
Loman NJ, Constantinidou C, Chan JZM, Halachev M, Sergeant M, Penn CW, Robinson ER, Pallen MJ. High-throughput bacterial genome sequencing: an embarrassment of choice, a world of opportunity. Nat Rev Microbiol 2012; 10:599-606. [DOI: 10.1038/nrmicro2850] [Citation(s) in RCA: 332] [Impact Index Per Article: 25.5] [Indexed: 01/26/2023]
|
303
|
Quail MA, Smith M, Coupland P, Otto TD, Harris SR, Connor TR, Bertoni A, Swerdlow HP, Gu Y. A tale of three next generation sequencing platforms: comparison of Ion Torrent, Pacific Biosciences and Illumina MiSeq sequencers. BMC Genomics 2012; 13:341. [PMID: 22827831 PMCID: PMC3431227 DOI: 10.1186/1471-2164-13-341] [Citation(s) in RCA: 1264] [Impact Index Per Article: 97.2] [Received: 03/16/2012] [Accepted: 07/12/2012] [Indexed: 02/06/2023]
Abstract
BACKGROUND: Next generation sequencing (NGS) technology has revolutionized genomic and genetic research. The pace of change in this area is rapid with three major new sequencing platforms having been released in 2011: Ion Torrent's PGM, Pacific Biosciences' RS and the Illumina MiSeq. Here we compare the results obtained with those platforms to the performance of the Illumina HiSeq, the current market leader. In order to compare these platforms, and get sufficient coverage depth to allow meaningful analysis, we have sequenced a set of 4 microbial genomes with mean GC content ranging from 19.3 to 67.7%. Together, these represent a comprehensive range of genome content. Here we report our analysis of that sequence data in terms of coverage distribution, bias, GC distribution, variant detection and accuracy. RESULTS: Sequence generated by Ion Torrent, MiSeq and Pacific Biosciences technologies displays near perfect coverage behaviour on GC-rich, neutral and moderately AT-rich genomes, but a profound bias was observed upon sequencing the extremely AT-rich genome of Plasmodium falciparum on the PGM, resulting in no coverage for approximately 30% of the genome. We analysed the ability to call variants from each platform and found that we could call slightly more variants from Ion Torrent data compared to MiSeq data, but at the expense of a higher false positive rate. Variant calling from Pacific Biosciences data was possible but higher coverage depth was required. Context specific errors were observed in both PGM and MiSeq data, but not in that from the Pacific Biosciences platform. CONCLUSIONS: All three fast turnaround sequencers evaluated here were able to generate usable sequence. However, there are key differences between the quality of that data and the applications it will support.
Affiliation(s)
- Yong Gu
- Wellcome Trust Sanger Institute, Hinxton, UK
|
304
|
Abstract
Whole-genome alignment (WGA) is the prediction of evolutionary relationships at the nucleotide level between two or more genomes. It combines aspects of both colinear sequence alignment and gene orthology prediction, and is typically more challenging to address than either of these tasks due to the size and complexity of whole genomes. Despite the difficulty of this problem, numerous methods have been developed for its solution because WGAs are valuable for genome-wide analyses, such as phylogenetic inference, genome annotation, and function prediction. In this chapter, we discuss the meaning and significance of WGA and present an overview of the methods that address it. We also examine the problem of evaluating whole-genome aligners and offer a set of methodological challenges that need to be tackled in order to make the most effective use of our rapidly growing databases of whole genomes.
Affiliation(s)
- Colin N Dewey
- Biostatistics and Medical Informatics and Computer Sciences, Genome Center of Wisconsin, University of Wisconsin-Madison, Madison, WI, USA.
|
305
|
Vogel V, Falquet L, Calderon-Copete SP, Basset P, Blanc DS. Short term evolution of a highly transmissible methicillin-resistant Staphylococcus aureus clone (ST228) in a tertiary care hospital. PLoS One 2012; 7:e38969. [PMID: 22720005 PMCID: PMC3377700 DOI: 10.1371/journal.pone.0038969] [Citation(s) in RCA: 20] [Impact Index Per Article: 1.5] [Received: 12/23/2011] [Accepted: 05/15/2012] [Indexed: 11/25/2022]
Abstract
Staphylococcus aureus is recognized as one of the major human pathogens and is by far one of the most common nosocomial organisms. The genetic basis for the emergence of highly epidemic strains remains mysterious. Studying the microevolution of the different clones of S. aureus is essential for identifying the forces driving pathogen emergence and spread. The aim of the present study was to determine the genetic changes characterizing a lineage belonging to the South German clone (ST228) that spread over ten years in a tertiary care hospital in Switzerland. For this reason, we compared the whole genome of eight isolates recovered between 2001 and 2008 at the Lausanne hospital. The genetic comparison of these isolates revealed that their genomes are extremely closely related. Yet, a few more important genetic changes, such as the replacement of a plasmid, the loss of large fragments of DNA, or the insertion of transposases, were observed. These transfers of mobile genetic elements shaped the evolution of the ST228 lineage that spread within the Lausanne hospital. Nevertheless, although the strains analyzed differed in their dynamics, we have not been able to link a particular genetic element with spreading success. Finally, the present study showed that new sequencing technologies improve considerably the quality and quantity of information obtained for a single strain; but this information is still difficult to interpret and important investments are required for the technology to become accessible for routine investigations.
Affiliation(s)
- Valérie Vogel
- Service of Hospital Preventive Medicine, Lausanne University Hospital, Lausanne, Switzerland.
|
306
|
Ågren J, Sundström A, Håfström T, Segerman B. Gegenees: fragmented alignment of multiple genomes for determining phylogenomic distances and genetic signatures unique for specified target groups. PLoS One 2012; 7:e39107. [PMID: 22723939 PMCID: PMC3377601 DOI: 10.1371/journal.pone.0039107] [Citation(s) in RCA: 196] [Impact Index Per Article: 15.1] [Received: 03/13/2012] [Accepted: 05/17/2012] [Indexed: 11/25/2022]
Abstract
The rapid development of Next Generation Sequencing technologies leads to the accumulation of huge amounts of sequencing data. The scientific community faces an enormous challenge in how to deal with this explosion. Here we present a software tool, ‘Gegenees’, that uses a fragmented alignment approach to facilitate the comparative analysis of hundreds of microbial genomes. The genomes are fragmented and compared, all against all, by a multithreaded BLAST control engine. Ready-made alignments can be complemented with new genomes without recalculating the existing data points. Gegenees gives a phylogenomic overview of the genomes and the alignment can then be mined for genomic regions with conservation patterns matching a defined target group and absent from a background group. The genomic regions are given biomarker scores forming a uniqueness signature that can be viewed and explored, graphically and in tabular form. A primer/probe alignment tool is also included for specificity verification of currently used or new primers. We exemplify the use of Gegenees on the Bacillus cereus group, on Foot and Mouth Disease Viruses, and on strains from the 2011 Escherichia coli O104:H4 outbreak. Gegenees contributes towards an increased capacity of fast and efficient data mining as more and more genomes become sequenced.
Affiliation(s)
- Joakim Ågren
- Department of Bacteriology, National Veterinary Institute (SVA), Uppsala, Sweden
- Department of Biomedical Sciences and Veterinary Public Health, Swedish University of Agricultural Sciences (SLU), Uppsala, Sweden
- Anders Sundström
- Department of Bacteriology, National Veterinary Institute (SVA), Uppsala, Sweden
- Therese Håfström
- Department of Bacteriology, National Veterinary Institute (SVA), Uppsala, Sweden
- Bo Segerman
- Department of Bacteriology, National Veterinary Institute (SVA), Uppsala, Sweden
|
307
|
Croucher NJ, Harris SR, Barquist L, Parkhill J, Bentley SD. A high-resolution view of genome-wide pneumococcal transformation. PLoS Pathog 2012; 8:e1002745. [PMID: 22719250 PMCID: PMC3375284 DOI: 10.1371/journal.ppat.1002745] [Citation(s) in RCA: 78] [Impact Index Per Article: 6.0] [Received: 02/08/2012] [Accepted: 04/27/2012] [Indexed: 01/03/2023]
Abstract
Transformation is an important mechanism of microbial evolution through which bacteria have been observed to rapidly adapt in response to clinical interventions; examples include facilitating vaccine evasion and the development of penicillin resistance in the major respiratory pathogen Streptococcus pneumoniae. To characterise the process in detail, the genomes of 124 S. pneumoniae isolates produced through in vitro transformation were sequenced and recombination events detected. Those recombinations importing the selected marker were independent of unselected events elsewhere in the genome, the positions of which were not significantly affected by local sequence similarity between donor and recipient or mismatch repair processes. However, both types of recombinations were sometimes mosaic, with multiple non-contiguous segments originating from the same molecule of donor DNA. The lengths of the unselected events were exponentially distributed with a mean of 2.3 kb, implying that recombinations are stochastically resolved with a fixed per base probability of 4.4×10⁻⁴ bp⁻¹. This distribution of recombination sizes, coupled with an observed under-representation of large insertions within transferred sequence, suggests transformation has the potential to reduce the size of bacterial genomes, and is unlikely to act as an efficient mechanism for the uptake of accessory genomic loci.
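As a rough check on the two figures quoted in this abstract: if each imported segment ends with a fixed probability p per base, segment lengths follow a geometric distribution (close to exponential for small p) with mean 1/p. The sketch below is illustrative only and is not the authors' analysis; the function names are hypothetical.

```python
import math

# Per-base resolution probability quoted in the abstract (bp^-1).
P_PER_BASE = 4.4e-4


def mean_length_bp(p):
    """Mean segment length in bp for a geometric length distribution."""
    return 1.0 / p


def fraction_longer_than(length_bp, p):
    """Probability that a segment extends beyond length_bp bases."""
    return (1.0 - p) ** length_bp


if __name__ == "__main__":
    mean_bp = mean_length_bp(P_PER_BASE)
    print(f"mean length: {mean_bp / 1000:.1f} kb")
    # Geometric survival versus its exponential approximation at 10 kb.
    print(f"geometric:   {fraction_longer_than(10_000, P_PER_BASE):.4f}")
    print(f"exponential: {math.exp(-P_PER_BASE * 10_000):.4f}")
```

With p = 4.4×10⁻⁴ per base the mean comes out at about 2.3 kb, matching the abstract, and under this model only a small fraction of events would exceed 10 kb.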
Affiliation(s)
- Nicholas J Croucher
- Pathogen Genomics, Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge, United Kingdom.
|
308
|
Church PC, Goscinski A, Holt K, Inouye M, Ghoting A, Makarychev K, Reumann M. Design of multiple sequence alignment algorithms on parallel, distributed memory supercomputers. Annu Int Conf IEEE Eng Med Biol Soc 2012; 2011:924-7. [PMID: 22254462 DOI: 10.1109/iembs.2011.6090208] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.5] [Indexed: 11/09/2022]
Abstract
The challenge of comparing two or more genomes that have undergone recombination and substantial amounts of segmental loss and gain has recently been addressed for small numbers of genomes. However, datasets of hundreds of genomes are now common and their sizes will only increase in the future. Multiple sequence alignment of hundreds of genomes remains an intractable problem due to quadratic increases in compute time and memory footprint. To date, most alignment algorithms are designed for commodity clusters without parallelism. Hence, we propose the design of a multiple sequence alignment algorithm on massively parallel, distributed memory supercomputers to enable research into comparative genomics on large data sets. Following the methodology of the sequential progressiveMauve algorithm, we design data structures including sequences and sorted k-mer lists on the IBM Blue Gene/P supercomputer (BG/P). Preliminary results show that we can reduce the memory footprint so that we can potentially align over 250 bacterial genomes on a single BG/P compute node. We verify our results on a dataset of E. coli, Shigella and S. pneumoniae genomes. Our implementation returns results matching those of the original algorithm but in 1/2 the time and with 1/4 the memory footprint for scaffold building. In this study, we have laid the basis for multiple sequence alignment of large-scale datasets on a massively parallel, distributed memory supercomputer, thus enabling comparison of hundreds instead of a few genome sequences within reasonable time.
|
309
|
Phylomark, a tool to identify conserved phylogenetic markers from whole-genome alignments. Appl Environ Microbiol 2012; 78:4884-92. [PMID: 22582056 DOI: 10.1128/aem.00929-12] [Citation(s) in RCA: 34] [Impact Index Per Article: 2.6] [Indexed: 11/20/2022]
Abstract
The sequencing and analysis of multiple housekeeping genes has been routinely used to phylogenetically compare closely related bacterial isolates. Recent studies using whole-genome alignment (WGA) and phylogenetics from >100 Escherichia coli genomes have demonstrated that tree topologies from WGA and multilocus sequence typing (MLST) markers differ significantly. A nonrepresentative phylogeny can lead to incorrect conclusions regarding important evolutionary relationships. In this study, the Phylomark algorithm was developed to identify a minimal number of useful phylogenetic markers that recapitulate the WGA phylogeny. To test the algorithm, we used a set of diverse draft and complete E. coli genomes. The algorithm identified more than 100,000 potential markers of different fragment lengths (500 to 900 nucleotides). Three molecular markers were ultimately chosen to determine the phylogeny based on a low Robinson-Foulds (RF) distance compared to the WGA phylogeny. A phylogenetic analysis demonstrated that a more representative phylogeny was inferred for a concatenation of these markers compared to all other MLST schemes for E. coli. As a functional test of the algorithm, the three markers (genomic guided E. coli markers, or GIG-EM) were amplified and sequenced from a set of environmental E. coli strains (ECOR collection) and informatically extracted from a set of 78 diarrheagenic E. coli strains (DECA collection). In the instances of the 40-genome test set and the DECA collection, the GIG-EM system outperformed other E. coli MLST systems in terms of recapitulating the WGA phylogeny. This algorithm can be employed to determine the minimal marker set for any organism that has sufficient genome sequencing.
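The Robinson-Foulds distance mentioned in this abstract counts the bipartitions (splits of the leaf set induced by internal branches) that appear in one tree but not the other. The sketch below is a minimal illustration of that definition, not Phylomark's implementation; it assumes trees are written as nested Python tuples of leaf names over the same leaf set, and the function names and toy trees are hypothetical.

```python
def bipartitions(tree):
    """Return the non-trivial leaf splits induced by a nested-tuple tree."""
    def leaves_of(node):
        if isinstance(node, str):
            return frozenset([node])
        return frozenset().union(*(leaves_of(child) for child in node))

    all_leaves = leaves_of(tree)
    splits = set()

    def visit(node):
        if isinstance(node, str):
            return frozenset([node])
        below = frozenset().union(*(visit(child) for child in node))
        other = all_leaves - below
        # Keep only splits with at least two leaves on each side.
        if len(below) > 1 and len(other) > 1:
            splits.add(frozenset([below, other]))
        return below

    visit(tree)
    return splits


def robinson_foulds(tree_a, tree_b):
    """Number of splits present in exactly one of the two trees."""
    return len(bipartitions(tree_a) ^ bipartitions(tree_b))


if __name__ == "__main__":
    reference = ((("A", "B"), "C"), ("D", "E"))
    candidate = ((("A", "C"), "B"), ("D", "E"))
    print(robinson_foulds(reference, candidate))
```

A distance of zero means the two topologies agree on every internal branch, which is the sense in which a small marker set can be said to recapitulate a whole-genome phylogeny.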
|
310
|
Shapiro BJ, Friedman J, Cordero OX, Preheim SP, Timberlake SC, Szabó G, Polz MF, Alm EJ. Population genomics of early events in the ecological differentiation of bacteria. Science 2012; 336:48-51. [PMID: 22491847 PMCID: PMC3337212 DOI: 10.1126/science.1218198] [Citation(s) in RCA: 355] [Impact Index Per Article: 27.3] [Indexed: 01/26/2023]
Abstract
Genetic exchange is common among bacteria, but its effect on population diversity during ecological differentiation remains controversial. A fundamental question is whether advantageous mutations lead to selection of clonal genomes or, as in sexual eukaryotes, sweep through populations on their own. Here, we show that in two recently diverged populations of ocean bacteria, ecological differentiation has occurred akin to a sexual mechanism: A few genome regions have swept through subpopulations in a habitat-specific manner, accompanied by gradual separation of gene pools as evidenced by increased habitat specificity of the most recent recombinations. These findings reconcile previous, seemingly contradictory empirical observations of the genetic structure of bacterial populations and point to a more unified process of differentiation in bacteria and sexual eukaryotes than previously thought.
Affiliation(s)
- B. Jesse Shapiro
- Program in Computational and Systems Biology, Massachusetts Institute of Technology, Cambridge, MA 02139, USA
- Broad Institute, Cambridge, MA 02142, USA
- Jonathan Friedman
- Program in Computational and Systems Biology, Massachusetts Institute of Technology, Cambridge, MA 02139, USA
- Otto X. Cordero
- Department of Civil and Environmental Engineering, Massachusetts Institute of Technology, Cambridge, MA 02139, USA
- Sarah P. Preheim
- Department of Civil and Environmental Engineering, Massachusetts Institute of Technology, Cambridge, MA 02139, USA
- Sonia C. Timberlake
- Department of Biological Engineering, Massachusetts Institute of Technology, Cambridge, MA 02139, USA
- Gitta Szabó
- Department of Civil and Environmental Engineering, Massachusetts Institute of Technology, Cambridge, MA 02139, USA
- Martin F. Polz
- Department of Civil and Environmental Engineering, Massachusetts Institute of Technology, Cambridge, MA 02139, USA
- Eric J. Alm
- Program in Computational and Systems Biology, Massachusetts Institute of Technology, Cambridge, MA 02139, USA
- Department of Civil and Environmental Engineering, Massachusetts Institute of Technology, Cambridge, MA 02139, USA
- Department of Biological Engineering, Massachusetts Institute of Technology, Cambridge, MA 02139, USA
- Broad Institute, Cambridge, MA 02142, USA
|
311
|
Chaudhuri RR, Henderson IR. The evolution of the Escherichia coli phylogeny. Infect Genet Evol 2012; 12:214-26. [DOI: 10.1016/j.meegid.2012.01.005] [Citation(s) in RCA: 105] [Impact Index Per Article: 8.1] [Received: 11/02/2011] [Revised: 01/04/2012] [Accepted: 01/05/2012] [Indexed: 10/14/2022]
|
312
|
Hatje K, Kollmar M. A phylogenetic analysis of the Brassicales clade based on an alignment-free sequence comparison method. Front Plant Sci 2012; 3:192. [PMID: 22952468 PMCID: PMC3429886 DOI: 10.3389/fpls.2012.00192] [Citation(s) in RCA: 31] [Impact Index Per Article: 2.4] [Received: 05/25/2012] [Accepted: 08/06/2012] [Indexed: 05/06/2023]
Abstract
Phylogenetic analyses reveal the evolutionary derivation of species. A phylogenetic tree can be inferred from multiple sequence alignments of proteins or genes. The alignment of whole genome sequences of higher eukaryotes is a computationally intensive and ambitious task, as is the computation of phylogenetic trees based on these alignments. To overcome these limitations, we here used an alignment-free method to compare genomes of the Brassicales clade. For each nucleotide sequence a Chaos Game Representation (CGR) can be computed, which represents each nucleotide of the sequence as a point in a square defined by the four nucleotides as vertices. Each CGR is therefore a unique fingerprint of the underlying sequence. If the CGRs are divided by grid lines, each grid square denotes the occurrence of oligonucleotides of a specific length in the sequence (Frequency Chaos Game Representation, FCGR). Here, we used distance measures between FCGRs to infer phylogenetic trees of Brassicales species. Three types of data were analyzed because of their different characteristics: (A) Whole genome assemblies as far as available for species belonging to the Malvidae taxon. (B) EST data of species of the Brassicales clade. (C) Mitochondrial genomes of the Rosids branch, a supergroup of the Malvidae. The trees reconstructed based on the Euclidean distance method are in general agreement with single gene trees. The Fitch-Margoliash and Neighbor joining algorithms resulted in similar to identical trees. Here, for the first time we have applied the bootstrap re-sampling concept to trees based on FCGRs to determine the support of the branchings. FCGRs have the advantage that they are fast to calculate, and can be used as additional information to alignment based data and morphological characteristics to improve the phylogenetic classification of species in ambiguous cases.
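The CGR and FCGR constructions described in this abstract can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' code: the corner assignment (A, C, G, T) is one common convention and is assumed here, ambiguous bases are simply skipped, and the function names are hypothetical.

```python
import math

# Assumed corner layout of the unit square (a common CGR convention).
VERTICES = {"A": (0.0, 0.0), "C": (0.0, 1.0), "G": (1.0, 1.0), "T": (1.0, 0.0)}


def cgr_points(seq):
    """Move halfway toward each base's corner, starting from the centre."""
    x, y = 0.5, 0.5
    points = []
    for base in seq.upper():
        if base not in VERTICES:
            continue  # simplification: ambiguous bases are ignored
        vx, vy = VERTICES[base]
        x, y = (x + vx) / 2.0, (y + vy) / 2.0
        points.append((x, y))
    return points


def fcgr(seq, k):
    """Relative k-mer frequencies on a 2^k x 2^k grid laid over the CGR."""
    size = 2 ** k
    grid = [[0] * size for _ in range(size)]
    # Points before index k-1 do not yet correspond to a full k-mer.
    for x, y in cgr_points(seq)[k - 1:]:
        grid[int(y * size)][int(x * size)] += 1
    total = sum(sum(row) for row in grid)
    if total == 0:
        return [[0.0] * size for _ in range(size)]
    return [[count / total for count in row] for row in grid]


def euclidean_distance(fcgr_a, fcgr_b):
    """Euclidean distance between two FCGR grids of equal size."""
    return math.sqrt(sum((a - b) ** 2
                         for row_a, row_b in zip(fcgr_a, fcgr_b)
                         for a, b in zip(row_a, row_b)))


if __name__ == "__main__":
    print(euclidean_distance(fcgr("ACGTACGTAC", 2), fcgr("AAAATTTTGG", 2)))
```

Pairwise distances computed this way could then be passed to a distance-based tree method such as neighbor joining, which is the general workflow the abstract describes.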
Affiliation(s)
- Klas Hatje
- Abteilung NMR-Basierte Strukturbiologie, Max-Planck-Institut für Biophysikalische Chemie, Göttingen, Germany
- Martin Kollmar
- Abteilung NMR-Basierte Strukturbiologie, Max-Planck-Institut für Biophysikalische Chemie, Göttingen, Germany
- *Correspondence: Martin Kollmar, Abteilung NMR-Basierte Strukturbiologie, Max-Planck-Institut für Biophysikalische Chemie, Am Fassberg 11, D-37077 Göttingen, Germany.
|
313
|
Riley DR, Angiuoli SV, Crabtree J, Dunning Hotopp JC, Tettelin H. Using Sybil for interactive comparative genomics of microbes on the web. Bioinformatics 2011; 28:160-6. [PMID: 22121156 PMCID: PMC3259440 DOI: 10.1093/bioinformatics/btr652] [Citation(s) in RCA: 46] [Impact Index Per Article: 3.3] [Indexed: 12/03/2022]
Abstract
Motivation: Analysis of multiple genomes requires sophisticated tools that provide search, visualization, interactivity and data export. Comparative genomics datasets tend to be large and complex, making development of these tools difficult. In addition to scalability, comparative genomics tools must also provide user-friendly interfaces such that the research scientist can explore complex data with minimal technical expertise. Results: We describe a new version of the Sybil software package and its application to the important human pathogen Streptococcus pneumoniae. This new software provides a feature-rich set of comparative genomics tools for inspection of multiple genome structures, mining of orthologous gene families and identification of potential vaccine candidates. Availability: The S. pneumoniae resource is online at http://strepneumo-sybil.igs.umaryland.edu. The software, database and website are available for download as a portable virtual machine and from http://sourceforge.net/projects/sybil. Contact: driley@som.umaryland.edu Supplementary information: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- David R Riley
- Institute for Genome Sciences, University of Maryland School of Medicine, Baltimore, MD, USA.
|
314
|
Laing C, Villegas A, Taboada EN, Kropinski A, Thomas JE, Gannon VPJ. Identification of Salmonella enterica species- and subgroup-specific genomic regions using Panseq 2.0. Infect Genet Evol 2011; 11:2151-61. [PMID: 22001825 DOI: 10.1016/j.meegid.2011.09.021] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.6] [Received: 05/01/2011] [Revised: 09/02/2011] [Accepted: 09/22/2011] [Indexed: 01/04/2023]
Abstract
The pan-genome of a taxonomic group consists of evolutionarily conserved core genes shared by all members and accessory genes that are present only in some members of the group. Group- and subgroup-specific core genes are thought to contribute to shared phenotypes such as virulence and niche specificity. In this study we analyzed 39 Salmonella enterica genomes (16 closed, 23 draft), a species that contains two human-specific serovars that cause typhoid fever, as well as a large number of zoonotic serovars that cause gastroenteritis in humans. Panseq 2.0 was used to define the pan-genome by adjusting the threshold at which group-specific "core" loci are defined. We found the pan-genome to be 9.03 Mbp in size, and that the core genome size decreased, while the number of SNPs/100 bp increased, as the number of strains used to define the core genome increased, suggesting substantial divergence among S. enterica subgroups. Subgroup-specific "core" genes, in contrast, had fewer SNPs/100 bp, likely reflecting their more recent acquisition. Phylogenetic trees were created from the concatenated and aligned pan-genome, the core genome, and multi-locus-sequence typing (MLST) loci. Branch support increased among the trees, and strains of the same serovar grouped closer together as the number of loci used to create the tree increased. Further, high levels of discrimination were achieved even amongst the most closely related strains of S. enterica Typhi, suggesting that the data generated by Panseq may also be of value in short-term epidemiological studies. Panseq provides an easy and fast way of performing pan-genomic analyses, which can include the identification of group-dominant as well as group-specific loci and is available as a web-server and a standalone version at http://lfz.corefacility.ca/panseq/.
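The pan-genome, core and accessory definitions used in this abstract reduce to set operations on per-strain locus lists. The sketch below is a toy illustration under stated assumptions, not Panseq's algorithm: each strain is represented as a set of locus identifiers, the strain and locus names are invented, and `core_at_threshold` is a hypothetical helper showing how relaxing the presence threshold changes what counts as "core".

```python
def pan_core_accessory(genomes):
    """Split loci into pan-genome, strict core, and accessory sets."""
    locus_sets = [set(loci) for loci in genomes.values()]
    if not locus_sets:
        return set(), set(), set()
    pan = set().union(*locus_sets)        # present in at least one strain
    core = set.intersection(*locus_sets)  # present in every strain
    return pan, core, pan - core


def core_at_threshold(genomes, min_fraction):
    """Loci present in at least min_fraction of the strains."""
    counts = {}
    for loci in genomes.values():
        for locus in set(loci):
            counts[locus] = counts.get(locus, 0) + 1
    needed = min_fraction * len(genomes)
    return {locus for locus, n in counts.items() if n >= needed}


if __name__ == "__main__":
    # Invented toy data: three strains and five loci.
    toy = {
        "strain1": {"locA", "locB", "locC"},
        "strain2": {"locA", "locB", "locD"},
        "strain3": {"locA", "locC", "locD", "locE"},
    }
    pan, core, accessory = pan_core_accessory(toy)
    print(sorted(pan), sorted(core), sorted(accessory))
    print(sorted(core_at_threshold(toy, 0.6)))
```

Because the strict core is an intersection, it can only stay the same or shrink as strains are added, which is consistent with the decreasing core size reported in the abstract.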
Affiliation(s)
- Chad Laing
- Laboratory for Foodborne Zoonoses, Public Health Agency of Canada, Lethbridge, AB, Canada.
|
315
|
Fricke WF, Mammel MK, McDermott PF, Tartera C, White DG, Leclerc JE, Ravel J, Cebula TA. Comparative genomics of 28 Salmonella enterica isolates: evidence for CRISPR-mediated adaptive sublineage evolution. J Bacteriol 2011; 193:3556-68. [PMID: 21602358 PMCID: PMC3133335 DOI: 10.1128/jb.00297-11] [Citation(s) in RCA: 126] [Impact Index Per Article: 9.0] [Received: 03/02/2011] [Accepted: 05/09/2011] [Indexed: 12/27/2022]
Abstract
Despite extensive surveillance, food-borne Salmonella enterica infections continue to be a significant burden on public health systems worldwide. As the S. enterica species comprises sublineages that differ greatly in antigenic representation, virulence, and antimicrobial resistance phenotypes, a better understanding of the species' evolution is critical for the prediction and prevention of future outbreaks. The roles that virulence and resistance phenotype acquisition, exchange, and loss play in the evolution of S. enterica sublineages, which to a certain extent are represented by serotypes, remain mostly uncharacterized. Here, we compare 17 newly sequenced and phenotypically characterized nontyphoidal S. enterica strains to 11 previously sequenced S. enterica genomes to carry out the most comprehensive comparative analysis of this species so far. These phenotypic and genotypic data comparisons in the phylogenetic species context suggest that the evolution of known S. enterica sublineages is mediated mostly by two mechanisms, (i) the loss of coding sequences with known metabolic functions, which leads to functional reduction, and (ii) the acquisition of horizontally transferred phage and plasmid DNA, which provides virulence and resistance functions and leads to increasing specialization. Matches between S. enterica clustered regularly interspaced short palindromic repeats (CRISPR), part of a defense mechanism against invading plasmid and phage DNA, and plasmid and prophage regions suggest that CRISPR-mediated immunity could control short-term phenotype changes and mediate long-term sublineage evolution. CRISPR analysis could therefore be critical in assessing the evolutionary potential of S. enterica sublineages and aid in the prediction and prevention of future S. enterica outbreaks.
Affiliation(s)
- W Florian Fricke
- Institute for Genome Sciences, University of Maryland School of Medicine, Baltimore, Maryland 21201, USA.
|
316
|
Angiuoli SV, Dunning Hotopp JC, Salzberg SL, Tettelin H. Improving pan-genome annotation using whole genome multiple alignment. BMC Bioinformatics 2011; 12:272. [PMID: 21718539 PMCID: PMC3142524 DOI: 10.1186/1471-2105-12-272] [Citation(s) in RCA: 33] [Impact Index Per Article: 2.4] [Received: 03/14/2011] [Accepted: 06/30/2011] [Indexed: 11/10/2022]
Abstract
BACKGROUND: Rapid annotation and comparisons of genomes from multiple isolates (pan-genomes) is becoming commonplace due to advances in sequencing technology. Genome annotations can contain inconsistencies and errors that hinder comparative analysis even within a single species. Tools are needed to compare and improve annotation quality across sets of closely related genomes. RESULTS: We introduce a new tool, Mugsy-Annotator, that identifies orthologs and evaluates annotation quality in prokaryotic genomes using whole genome multiple alignment. Mugsy-Annotator identifies anomalies in annotated gene structures, including inconsistently located translation initiation sites and disrupted genes due to draft genome sequencing or pseudogenes. An evaluation of species pan-genomes using the tool indicates that such anomalies are common, especially at translation initiation sites. Mugsy-Annotator reports alternate annotations that improve consistency and are candidates for further review. CONCLUSIONS: Whole genome multiple alignment can be used to efficiently identify orthologs and annotation problem areas in a bacterial pan-genome. Comparisons of annotated gene structures within a species may show more variation than is actually present in the genome, indicating errors in genome annotation. Our new tool Mugsy-Annotator assists re-annotation efforts by highlighting edits that improve annotation consistency.
Affiliation(s)
- Samuel V Angiuoli
- Center for Bioinformatics and Computational Biology, University of Maryland, College Park, MD 20742, USA
- Institute for Genome Sciences (IGS), University of Maryland Baltimore, Baltimore, Maryland 21201, USA
- Julie C Dunning Hotopp
- Institute for Genome Sciences (IGS), University of Maryland Baltimore, Baltimore, Maryland 21201, USA
- Steven L Salzberg
- Center for Bioinformatics and Computational Biology, University of Maryland, College Park, MD 20742, USA
- Hervé Tettelin
- Institute for Genome Sciences (IGS), University of Maryland Baltimore, Baltimore, Maryland 21201, USA
|
317
|
Everything at once: comparative analysis of the genomes of bacterial pathogens. Vet Microbiol 2011; 153:13-26. [PMID: 21764529 DOI: 10.1016/j.vetmic.2011.06.014] [Citation(s) in RCA: 10] [Impact Index Per Article: 0.7] [Received: 01/30/2011] [Revised: 06/15/2011] [Accepted: 06/16/2011] [Indexed: 12/12/2022]
Abstract
The sum of unique genes in all genomes of a bacterial species is referred to as the pan-genome and is comprised of variably absent or present accessory genes and universally present core genes. The accessory genome is an important source of genetic variability in bacterial populations, allowing sub-populations of bacteria to better adapt to specific niches. Such subgroups may themselves have a relatively stable core genome that may influence host preference, virulence, or an association with specific disease syndromes. The core genome provides a useful means of phylogenetic reconstruction as well as contributing to phenotypic heterogeneity. Variation within the pan-genome forms the basis of comparative genotyping techniques, which have evolved alongside technology. Current high-throughput sequencing platforms have created an unprecedented opportunity for comparisons among multiple, closely related genomes. The computer algorithms and software for such comparisons continue to evolve and promise exciting advances in the world of bacterial comparative genomics. We review genotyping techniques based upon phenotypic traits, both core and accessory genomes, and look at some of the software programs currently available to perform whole-genome comparative analyses.
|
318
|
Dewey CN. Positional orthology: putting genomic evolutionary relationships into context. Brief Bioinform 2011; 12:401-12. [PMID: 21705766 PMCID: PMC3178058 DOI: 10.1093/bib/bbr040] [Citation(s) in RCA: 66] [Impact Index Per Article: 4.7] [Indexed: 12/31/2022]
Abstract
Orthology is a powerful refinement of homology that allows us to describe more precisely the evolution of genomes and understand the function of the genes they contain. However, because orthology is not concerned with genomic position, it is limited in its ability to describe genes that are likely to have equivalent roles in different genomes. Because of this limitation, the concept of ‘positional orthology’ has emerged, which describes the relation between orthologous genes that retain their ancestral genomic positions. In this review, we formally define this concept, for which we introduce the shorter term ‘toporthology’, with respect to the evolutionary events experienced by a gene’s ancestors. Through a discussion of recent studies on the role of genomic context in gene evolution, we show that the distinction between orthology and toporthology is biologically significant. We then review a number of orthology prediction methods that take genomic context into account and thus that may be used to infer the important relation of toporthology.
Affiliation(s)
- Colin N Dewey
- Department of Biostatistics and Medical Informatics, University of Wisconsin-Madison, 5785 Medical Sciences Center, 1300 University Ave, Madison, WI 53706, USA.
| |
|
319
|
Sahl JW, Steinsland H, Redman JC, Angiuoli SV, Nataro JP, Sommerfelt H, Rasko DA. A comparative genomic analysis of diverse clonal types of enterotoxigenic Escherichia coli reveals pathovar-specific conservation. Infect Immun 2011; 79:950-60. [PMID: 21078854 PMCID: PMC3028850 DOI: 10.1128/iai.00932-10] [Citation(s) in RCA: 100] [Impact Index Per Article: 7.1] [Received: 08/25/2010] [Revised: 10/06/2010] [Accepted: 11/01/2010] [Indexed: 11/20/2022]
Abstract
Enterotoxigenic Escherichia coli (ETEC) is a major cause of diarrheal illness in children less than 5 years of age in low- and middle-income nations, whereas it is an emerging enteric pathogen in industrialized nations. Despite being an important cause of diarrhea, little is known about the genomic composition of ETEC. To address this, we sequenced the genomes of five ETEC isolates obtained from children in Guinea-Bissau with diarrhea. These five isolates represent distinct and globally dominant ETEC clonal groups. Comparative genomic analyses utilizing a gene-independent whole-genome alignment method demonstrated that sequenced ETEC strains share approximately 2.7 million bases of genomic sequence. Phylogenetic analysis of this "core genome" confirmed the diverse history of the ETEC pathovar and provides a finer resolution of the E. coli relationships than multilocus sequence typing. No identified genomic regions were conserved exclusively in all ETEC genomes; however, we identified more genomic content conserved among ETEC genomes than among non-ETEC E. coli genomes, suggesting that ETEC isolates share a genomic core. Comparisons of known virulence and of surface-exposed and colonization factor genes across all sequenced ETEC genomes not only identified variability but also indicated that some antigens are restricted to the ETEC pathovar. Overall, the generation of these five genome sequences, in addition to the two previously generated ETEC genomes, highlights the genomic diversity of ETEC. These studies increase our understanding of ETEC evolution, as well as provide insight into virulence factors and conserved proteins, which may be targets for vaccine development.
Affiliation(s)
- Jason W. Sahl, Hans Steinsland, Julia C. Redman, Samuel V. Angiuoli, James P. Nataro, Halvor Sommerfelt, David A. Rasko
- Institute for Genome Sciences, Department of Pediatrics, Center for Vaccine Development, Department of Microbiology and Immunology, University of Maryland School of Medicine, Baltimore, Maryland, Centre for International Health, Department of Biomedicine, University of Bergen, Bergen, Norway, Division of Infectious Disease Control, Norwegian Institute of Public Health, Oslo, Norway
| |
|
320
|
Sahl JW, Lloyd AL, Redman JC, Cebula TA, Wood DP, Mobley HLT, Rasko DA. Genomic characterization of asymptomatic Escherichia coli isolated from the neobladder. MICROBIOLOGY-SGM 2011; 157:1088-1102. [PMID: 21252277 DOI: 10.1099/mic.0.043018-0] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.6] [Indexed: 11/18/2022]
Abstract
The replacement of the bladder with a neobladder made from ileal tissue is the prescribed treatment in some cases of bladder cancer or trauma. Studies have demonstrated that individuals with an ileal neobladder have recurrent colonization by Escherichia coli and other species that are commonly associated with urinary tract infections; however, pyelonephritis and complicated symptomatic infections with ileal neobladders are relatively rare. This study examines the genomic content of two E. coli isolates from individuals with neobladders using comparative genomic hybridization (CGH) with a pan-E. coli/Shigella microarray. Comparisons of the neobladder genome hybridization patterns with reference genomes demonstrate that the neobladder isolates are more similar to the commensal, laboratory-adapted E. coli and a subset of enteroaggregative E. coli than they are to uropathogenic E. coli isolates. Genes identified by CGH as exclusively present in the neobladder isolates among the 30 examined isolates were primarily from large enteric isolate plasmids. Isolations identified a large plasmid in each isolate, and sequencing confirmed similarity to previously identified plasmids of enteric species. Screening, via PCR, of more than 100 isolates of E. coli from environmental, diarrhoeagenic and urinary tract sources did not identify neobladder-specific genes that were widely distributed in these populations. These results taken together demonstrate that the neobladder isolates, while distinct, are genomically more similar to gastrointestinal or commensal E. coli, suggesting why they can colonize the transplanted intestinal tissue but rarely progress to acute pyelonephritis or more severe disease.
Affiliation(s)
- Jason W Sahl
- Institute for Genome Sciences, University of Maryland School of Medicine, Baltimore, MD 21201, USA
| | - Amanda L Lloyd
- Department of Microbiology and Immunology, University of Michigan Medical School, 1150 West Medical Center Drive, 5641 Medical Science II, Ann Arbor, MI 48109, USA
| | - Julia C Redman
- Institute for Genome Sciences, University of Maryland School of Medicine, Baltimore, MD 21201, USA
| | - Thomas A Cebula
- Johns Hopkins University, Department of Biology, 3400 North Charles Street, Baltimore, MD 21218, USA
| | - David P Wood
- University of Michigan Medical School, Department of Urology, 1500 East Medical Center Drive, Ann Arbor, MI 48109, USA
| | - Harry L T Mobley
- Department of Microbiology and Immunology, University of Michigan Medical School, 1150 West Medical Center Drive, 5641 Medical Science II, Ann Arbor, MI 48109, USA
| | - David A Rasko
- Department of Microbiology and Immunology, University of Maryland School of Medicine, Baltimore, MD 21201, USA
- Institute for Genome Sciences, University of Maryland School of Medicine, Baltimore, MD 21201, USA
| |
|