1
Abstract
Codon usage depends on mutation bias, tRNA-mediated selection, and the need for efficient and accurate translation. One codon in a synonymous codon family is often strongly over-used, especially in highly expressed genes, which often leads to a high dN/dS ratio because dS is very small. Many codon usage indices have been proposed to measure codon usage and codon adaptation. In rare cases, sense codons misread by release factors and stop codons misread by tRNAs also contribute to codon usage. This chapter outlines a conceptual framework for codon evolution, illustrates codon-specific and gene-specific codon usage indices, and presents their applications. A new index of codon adaptation that accounts for background mutation bias (the Index of Translation Elongation) is presented and contrasted with the codon adaptation index (CAI), which does not consider background mutation bias. Both indices are used to re-analyze data from a recent paper claiming that translation elongation efficiency matters little in protein production; the reanalysis disproves that claim.
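The codon adaptation index (CAI) mentioned above can be sketched as follows. This is a minimal illustration of the classic Sharp and Li (1987) definition, not the chapter's own implementation, and it does not compute the Index of Translation Elongation. Each codon's relative adaptiveness w is its count in a reference set of highly expressed genes divided by the count of the most-used synonymous codon; a gene's CAI is the geometric mean of w over its informative codons. The function names, the use of the standard genetic code, and the 0.5 pseudocount for unobserved codons are assumptions of this sketch.

```python
import math
from collections import Counter
from itertools import product

# Standard genetic code (NCBI table 1); codons enumerated in TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}
# Stop codons and single-codon amino acids carry no synonymous-choice information.
EXCLUDED = {"*", "M", "W"}


def codons(seq):
    """Split an in-frame coding sequence into complete codons."""
    seq = seq.upper()
    return [seq[i:i + 3] for i in range(0, len(seq) - 2, 3)]


def relative_adaptiveness(reference_genes, pseudocount=0.5):
    """w[codon] = count / count of the most-used synonymous codon in the reference set."""
    counts = Counter(c for g in reference_genes for c in codons(g) if c in CODE)
    w = {}
    for aa in set(CODE.values()):
        family = [c for c, a in CODE.items() if a == aa]
        # Assumption: unobserved codons get a pseudocount so that log(w) stays finite.
        top = max(max(counts[c], pseudocount) for c in family)
        for c in family:
            w[c] = max(counts[c], pseudocount) / top
    return w


def cai(gene, w):
    """Geometric mean of w over the informative codons of a gene."""
    logs = [math.log(w[c]) for c in codons(gene) if c in CODE and CODE[c] not in EXCLUDED]
    return math.exp(sum(logs) / len(logs)) if logs else float("nan")


w = relative_adaptiveness(["CTGCTGCTGAAA"])
print(round(cai("CTGAAA", w), 4))  # 1.0
print(round(cai("CTGAAG", w), 4))  # 0.7071
```

Because w is normalized only within each synonymous family, CAI rewards similarity to the reference set's preferred codons regardless of whether that preference reflects selection or mutation bias, which is the limitation the abstract points to.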
2
Gene Prediction in Metagenomic Fragments with Deep Learning. Biomed Res Int 2017; 2017:4740354. [PMID: 29250541 PMCID: PMC5698827 DOI: 10.1155/2017/4740354] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.3] [Received: 06/30/2017] [Accepted: 10/08/2017] [Indexed: 01/14/2023]
Abstract
Next-generation sequencing technologies used in metagenomics yield numerous sequencing fragments from thousands of different species. Accurately identifying genes in metagenomic fragments is one of the most fundamental problems in metagenomics. In this article, we present a novel method, Meta-MFDL, that predicts metagenomic genes by fusing multiple features (monocodon usage, monoamino acid usage, ORF length coverage, and Z-curve features) and using a deep stacking network learning model. Results from 10-fold cross-validation and independent tests show that Meta-MFDL is a powerful tool for identifying genes in metagenomic fragments.
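Z-curve features, one of the inputs listed above, map nucleotide composition onto three axes: purine versus pyrimidine (x), amino versus keto (y), and weak versus strong hydrogen bonding (z). The sketch below computes a phase-specific variant, one (x, y, z) triple per codon position, of the kind often used in gene finding. The exact Z-curve feature set used by Meta-MFDL is not given here, so this function and its name are illustrative assumptions.

```python
def z_curve_phase_features(seq):
    """Return 9 features: (x, y, z) for codon positions 1, 2 and 3.

    x = purine - pyrimidine   = (A + G) - (C + T)
    y = amino - keto          = (A + C) - (G + T)
    z = weak - strong H-bond  = (A + T) - (G + C)
    computed on base frequencies at each codon position.
    """
    seq = seq.upper()
    features = []
    for phase in range(3):
        sub = seq[phase::3]  # bases at this codon position
        n = sum(sub.count(b) for b in "ACGT") or 1  # avoid division by zero
        a, c, g, t = (sub.count(b) / n for b in "ACGT")
        features += [(a + g) - (c + t), (a + c) - (g + t), (a + t) - (g + c)]
    return features


print(z_curve_phase_features("ACG"))
# [1.0, 1.0, 1.0, -1.0, 1.0, -1.0, 1.0, -1.0, -1.0]
```

Protein-coding regions tend to show position-specific biases in these values, which is why they can help a classifier separate coding from noncoding fragments.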
3
Abstract
Gene finding is the process of identifying genome sequence regions representing stretches of DNA that encode biologically active products, such as proteins or functional noncoding RNAs. Because this is usually the first step in analyzing any novel genomic sequence or resequenced sample of a well-known organism, it is very important: all downstream analyses depend on its results. This chapter describes the biological basis for gene finding, and the programs and computational approaches that are available for the automated identification of protein-coding genes. For bacterial, archaeal, and eukaryotic genomes, as well as for multi-species sequence data originating from environmental community studies, the state of the art in automated gene finding is described.
Affiliation(s)
- Alice Carolyn McHardy
- Department for Algorithmic Bioinformatics, Heinrich Heine University, Düsseldorf, Germany.
- Computational Biology of Infection Research, Helmholtz Center for Infection Research, Braunschweig, Germany.
- Andreas Kloetgen
- Department for Algorithmic Bioinformatics, Heinrich Heine University, Düsseldorf, Germany
- Department of Pediatric Oncology, Hematology and Clinical Immunology, Heinrich Heine University, Düsseldorf, Germany
4
Abstract
The number of large-scale genomics projects is increasing due to the availability of affordable high-throughput sequencing (HTS) technologies. The use of HTS for bacterial infectious disease research is attractive because one whole-genome sequencing (WGS) run can replace multiple assays for bacterial typing, molecular epidemiology investigations, and more in-depth pathogenomic studies. The computational resources and bioinformatics expertise required to accommodate and analyze the large amounts of data pose new challenges for researchers embarking on genomics projects for the first time. Here, we present a comprehensive overview of bacterial genomics projects from beginning to end, with a particular focus on planning and computational requirements for HTS data, and provide a general understanding of the analytical concepts needed to develop a workflow that meets the objectives and goals of HTS projects.
5
Complete Genome Sequence of Xanthomonas campestris pv. campestris Strain 17 from Taiwan. Genome Announc 2015; 3:e01466-15. [PMID: 26679582 PMCID: PMC4683227 DOI: 10.1128/genomea.01466-15] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.6] [Indexed: 11/20/2022]
Abstract
Xanthomonas campestris pv. campestris 17 is a Gram-negative bacterium that is phytopathogenic to cruciferous plants in Taiwan. The 4,994,426-bp-long genome consists of 24 contigs with 4,050 protein-coding genes, 1 noncoding RNA (ncRNA) gene, 6 rRNA genes, and 55 tRNA genes.
6
Kumar D, Mondal AK, Kutum R, Dash D. Proteogenomics of rare taxonomic phyla: A prospective treasure trove of protein coding genes. Proteomics 2015; 16:226-40. [PMID: 26773550 DOI: 10.1002/pmic.201500263] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.2] [Received: 06/30/2015] [Revised: 09/18/2015] [Accepted: 09/28/2015] [Indexed: 01/04/2023]
Abstract
Sustained innovation in sequencing technologies has resulted in a torrent of microbial genome sequencing projects. However, the prokaryotic genomes sequenced so far are unevenly distributed across the phylogenetic tree: a few phyla contain the majority, while the rest have only a few representatives. Accurate genome annotation lags far behind genome sequencing. While automated computational prediction, aided by comparative genomics, remains a popular choice for genome annotation, a substantial fraction of these annotations are erroneous. Proteogenomics uses protein-level experimental observations to annotate protein-coding genes on a genome-wide scale. Benefits of proteogenomics include the discovery and correction of gene annotations regardless of their phylogenetic conservation. This allows detection not only of common, conserved proteins but also of the protein products of rare genes that may be horizontally transferred or taxonomy specific. Such genes are more likely to be encountered in rare phyla, which comprise only a small number of complete genome sequences. We collated all bacterial and archaeal proteogenomic studies carried out to date and reviewed them in the context of genome sequencing projects. Here, we present a comprehensive list of microbial proteogenomic studies and their taxonomic distribution, and urge targeted proteogenomics of underexplored taxa to build an extensive reference of protein-coding genes.
Affiliation(s)
- Dhirendra Kumar
- G. N. Ramachandran Knowledge Center of Genome Informatics, CSIR-Institute of Genomics and Integrative Biology, South Campus, Sukhdev Vihar, Delhi, India
- Anupam Kumar Mondal
- G. N. Ramachandran Knowledge Center of Genome Informatics, CSIR-Institute of Genomics and Integrative Biology, South Campus, Sukhdev Vihar, Delhi, India
- Rintu Kutum
- G. N. Ramachandran Knowledge Center of Genome Informatics, CSIR-Institute of Genomics and Integrative Biology, South Campus, Sukhdev Vihar, Delhi, India
- Debasis Dash
- G. N. Ramachandran Knowledge Center of Genome Informatics, CSIR-Institute of Genomics and Integrative Biology, South Campus, Sukhdev Vihar, Delhi, India
7
SearchDOGS bacteria, software that provides automated identification of potentially missed genes in annotated bacterial genomes. J Bacteriol 2014; 196:2030-42. [PMID: 24659774 DOI: 10.1128/jb.01368-13] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.1] [Indexed: 01/15/2023]
Abstract
We report the development of SearchDOGS Bacteria, software that automatically detects missing genes in annotated bacterial genomes by combining BLAST searches with comparative genomics. Having successfully applied the approach to yeast genomes, we redeveloped SearchDOGS as a standalone, downloadable package that requires only a set of GenBank annotation files as input. The software automatically generates a homology structure using reciprocal BLAST and a synteny-based method; it then scans the entire genome of each species for unannotated genes. Results are provided in an HTML interface, giving coordinates, BLAST results, syntenic location, omega values (Ka/Ks, where Ks is the number of synonymous substitutions per synonymous site and Ka is the number of nonsynonymous substitutions per nonsynonymous site) for protein conservation estimates, and other information for each candidate gene. Using SearchDOGS Bacteria, we identified 155 gene candidates in the Shigella boydii sb227 genome, including 56 candidates shorter than 60 codons. SearchDOGS Bacteria has two major advantages over currently available annotation software. First, it outperforms current methods in terms of sensitivity and is highly effective at identifying small or highly diverged genes. Second, as a freely downloadable package, it can be used with unpublished or confidential data.
8
Vorontsov IE, Kulakovskiy IV, Makeev VJ. Jaccard index based similarity measure to compare transcription factor binding site models. Algorithms Mol Biol 2013; 8:23. [PMID: 24074225 PMCID: PMC3851813 DOI: 10.1186/1748-7188-8-23] [Citation(s) in RCA: 45] [Impact Index Per Article: 4.1] [Received: 05/25/2012] [Accepted: 09/18/2013] [Indexed: 11/10/2022]
Abstract
BACKGROUND The positional weight matrix (PWM) remains the most popular model for quantifying transcription factor (TF) binding. A PWM supplied with a score threshold defines a set of putative transcription factor binding sites (TFBS), thus providing a TFBS model. TF-binding DNA fragments obtained by different experimental methods usually give similar but not identical PWMs. This is also common for different TFs from the same structural family. Thus it is often necessary to measure the similarity between PWMs. Popular tools compare PWMs directly using matrix elements. Yet, for log-odds PWMs, negative elements do not contribute to the scores of high-scoring TFBS and thus may differ without affecting the sets of the best-recognized binding sites. Moreover, the two TFBS sets recognized by a given pair of PWMs can be more or less different depending on the score thresholds. RESULTS We propose a practical approach for comparing two TFBS models, each consisting of a PWM and the respective scoring threshold. The proposed measure is a variant of the Jaccard index between two TFBS sets. The measure defines a metric space for TFBS models of all finite lengths. The algorithm can compare TFBS models constructed using substantially different approaches, such as PWMs with raw positional counts and log-odds PWMs. We present an efficient software implementation: MACRO-APE (MAtrix CompaRisOn by Approximate P-value Estimation). CONCLUSIONS MACRO-APE can be used effectively to compute the Jaccard index-based similarity of two TFBS models. A two-pass scanning algorithm is presented to scan a given collection of PWMs for PWMs similar to a given query. AVAILABILITY AND IMPLEMENTATION MACRO-APE is implemented in Ruby 1.9; software, including source code and a manual, is freely available at http://autosome.ru/macroape/ and in the supplementary materials.
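The core idea of comparing two TFBS models by the Jaccard index of the site sets they recognize can be illustrated by brute force for short, equal-length motifs: enumerate every word, keep those scoring at or above each model's threshold, and divide the size of the intersection by the size of the union. This is only a conceptual sketch; MACRO-APE itself relies on approximate P-value estimation and also handles models of different lengths, none of which is reproduced here. The PWM representation (one base-to-score dict per position) and the function names are assumptions.

```python
from itertools import product


def site_set(pwm, threshold):
    """All words of length len(pwm) whose additive PWM score is >= threshold.

    pwm: list of dicts, one per position, mapping 'A', 'C', 'G', 'T' to a score.
    Brute force over 4**len(pwm) words, so only practical for short motifs.
    """
    return {
        "".join(word)
        for word in product("ACGT", repeat=len(pwm))
        if sum(col[b] for col, b in zip(pwm, word)) >= threshold
    }


def jaccard_similarity(pwm1, thr1, pwm2, thr2):
    """Jaccard index |A & B| / |A | B| of the site sets of two equal-length models."""
    if len(pwm1) != len(pwm2):
        raise ValueError("this sketch only compares equal-length PWMs")
    a, b = site_set(pwm1, thr1), site_set(pwm2, thr2)
    union = a | b
    return len(a & b) / len(union) if union else 1.0


# Toy 2-position PWM that rewards 'A' at each position.
pwm = [{"A": 1.0, "C": 0.0, "G": 0.0, "T": 0.0}] * 2
print(jaccard_similarity(pwm, 1.0, pwm, 2.0))  # 1 shared word ("AA") out of 7

# Same 'A' scores, different (negative) scores elsewhere.
pwm_neg = [{"A": 1.0, "C": -3.0, "G": -1.0, "T": -2.0}] * 2
print(jaccard_similarity(pwm, 2.0, pwm_neg, 2.0))  # 1.0
```

The second comparison mirrors the abstract's point: two matrices whose low-scoring elements differ can still recognize exactly the same top-scoring sites, which an element-by-element matrix comparison would miss.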
9
Liu Y, Guo J, Hu G, Zhu H. Gene prediction in metagenomic fragments based on the SVM algorithm. BMC Bioinformatics 2013; 14 Suppl 5:S12. [PMID: 23735199 PMCID: PMC3622649 DOI: 10.1186/1471-2105-14-s5-s12] [Citation(s) in RCA: 59] [Impact Index Per Article: 5.4] [Indexed: 11/25/2022]
Abstract
Background Metagenomic sequencing is becoming a powerful technology for exploring microorganisms from various environments, such as the human body, without isolation and cultivation. Accurately identifying genes in metagenomic fragments is one of the most fundamental issues. Results In this article, we present a novel gene prediction method for metagenomic fragments, named MetaGUN, based on the support vector machine (SVM) approach. It implements a three-stage strategy to predict genes. First, it classifies input fragments into phylogenetic groups using a k-mer-based sequence binning method. Then, protein-coding sequences are identified for each group independently with SVM classifiers that integrate entropy density profiles (EDP) of codon usage, translation initiation site (TIS) scores, and open reading frame (ORF) length as input patterns. Finally, the TISs are adjusted by employing a modified version of MetaTISA. To identify protein-coding sequences, MetaGUN builds a universal module and a novel module. The former is based on a set of representative species, while the latter is designed to find potentially functional DNA sequences with conserved domains. Conclusions Comparisons on artificial shotgun fragments with multiple current metagenomic gene finders show that MetaGUN predicts both the 3' and 5' ends of genes better on fragments of various lengths. In particular, it makes the most reliable predictions among these methods. As an application, MetaGUN was used to predict genes in two human gut microbiome samples. It identifies thousands of additional genes with significant evidence. Further analysis indicates that MetaGUN tends to predict more potential novel genes than other current metagenomic gene finders.
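The entropy density profile (EDP) of codon usage, one of MetaGUN's SVM inputs, can be sketched as below, assuming the definition used in the MED family of gene finders: with p_i the frequency of codon i among the 64 codons and H = -Σ p_i log p_i, the profile component is s_i = -p_i log p_i / H, so the 64 components sum to 1. MetaGUN's exact preprocessing is not described above; the function name and the handling of the zero-entropy case are assumptions of this sketch.

```python
import math
from collections import Counter
from itertools import product

CODONS = ["".join(c) for c in product("ACGT", repeat=3)]  # fixed order of the 64 codons


def entropy_density_profile(orf):
    """64-dimensional EDP: s_i = -p_i * log(p_i) / H, H = Shannon entropy of codon usage."""
    orf = orf.upper()
    counts = Counter(orf[i:i + 3] for i in range(0, len(orf) - 2, 3))
    total = sum(counts[c] for c in CODONS)  # codons with non-ACGT symbols are ignored
    if total == 0:
        return [0.0] * 64
    terms = [-(counts[c] / total) * math.log(counts[c] / total) if counts[c] else 0.0
             for c in CODONS]
    h = sum(terms)
    if h == 0:  # only one codon type: entropy is zero, profile left as all zeros here
        return [0.0] * 64
    return [t / h for t in terms]


edp = entropy_density_profile("ATGAAA")
print(round(sum(edp), 6), edp[CODONS.index("ATG")])  # 1.0 0.5
```

Normalizing by H turns raw codon frequencies into a fixed-length vector that is comparable across ORFs of different lengths, which suits a fixed-size SVM input.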
Affiliation(s)
- Yongchu Liu
- State Key Laboratory for Turbulence and Complex Systems and Department of Biomedical Engineering, College of Engineering, Peking University, Beijing, China
10
Ivankov DN, Payne SH, Galperin MY, Bonissone S, Pevzner PA, Frishman D. How many signal peptides are there in bacteria? Environ Microbiol 2013; 15:983-90. [PMID: 23556536 PMCID: PMC3621014 DOI: 10.1111/1462-2920.12105] [Citation(s) in RCA: 27] [Impact Index Per Article: 2.5] [Indexed: 12/17/2022]
Abstract
Over the last 5 years, proteogenomics (using mass spectrometry to identify proteins predicted from genomic sequences) has emerged as a promising approach to the high-throughput identification of protein N-termini, which remains a problem in genome annotation. Comparing experimentally determined N-termini with those predicted by sequence analysis tools allows signal peptides to be identified and therefore supports conclusions about the cytoplasmic or extracytoplasmic (periplasmic or extracellular) localization of the respective proteins. We present here the results of a proteogenomic study of the signal peptides in Escherichia coli K-12 and compare them with the available experimental data and with predictions by software tools such as SignalP and Phobius. A single proteogenomics experiment recovered more than a third of all signal peptides that had been experimentally determined during the past three decades and confirmed at least 31 additional signal peptides, mostly in known exported proteins, which had been previously predicted but not validated. Filtering putative signal peptides by peptide length, the presence of an eight-residue hydrophobic patch, and a typical signal peptidase cleavage site proved sufficient to eliminate false-positive hits. Surprisingly, the results of this proteogenomics study, as well as a re-analysis of the E. coli genome with the latest version of the SignalP program, show that only about 10% of proteins contain signal peptides, half of previous estimates.
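The filtering step described above can be sketched as a rule-based check on a candidate signal peptide (the residues before a putative cleavage site): a length window, a run of at least eight consecutive hydrophobic residues, and small residues at positions -1 and -3 relative to the cleavage site (a simplified form of the classic "-3, -1" rule for signal peptidase I sites). The hydrophobic and small-residue sets, the length bounds, and the function names are illustrative assumptions, not the parameters used in the study.

```python
# Illustrative residue classes (assumptions, not the study's definitions).
HYDROPHOBIC = set("AVLIMFWC")
SMALL = set("AGSC")  # small residues tolerated at positions -1 and -3


def has_hydrophobic_patch(peptide, patch_len=8):
    """True if the peptide contains >= patch_len consecutive hydrophobic residues."""
    run = 0
    for aa in peptide:
        run = run + 1 if aa in HYDROPHOBIC else 0
        if run >= patch_len:
            return True
    return False


def passes_signal_peptide_filter(protein, cleavage_pos, min_len=15, max_len=40):
    """Check the putative signal peptide protein[:cleavage_pos].

    min_len and max_len are hypothetical length bounds chosen for illustration.
    """
    if cleavage_pos < max(3, min_len) or cleavage_pos > min(max_len, len(protein)):
        return False
    sp = protein[:cleavage_pos]
    return has_hydrophobic_patch(sp) and sp[-1] in SMALL and sp[-3] in SMALL


candidate = "MKK" + "L" * 10 + "GASA" + "DEKRDEKR"
print(passes_signal_peptide_filter(candidate, 17))  # True
print(passes_signal_peptide_filter("MKKDEDEDEDEDEGASADEKR", 17))  # False: no hydrophobic patch
```

Each rule is cheap and independent, so in a sketch like this they can be combined with a plain logical AND and tuned separately.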
Affiliation(s)
- Dmitry N. Ivankov
- Technische Universität München, Department of Genome-Oriented Bioinformatics, 85354 Freising, Germany
- Samuel H. Payne
- Biological Sciences Division, Pacific Northwest National Laboratory, Richland, WA 99352, USA
- Michael Y. Galperin
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD 20894, USA
- Dmitrij Frishman
- Technische Universität München, Department of Genome-Oriented Bioinformatics, 85354 Freising, Germany
- Helmholtz Zentrum Munich, National Research Center for Environment and Health, Institute for Bioinformatics, 85764 Neuherberg, Germany
11
Campanaro S, Pascale FD, Telatin A, Schiavon R, Bartlett DH, Valle G. The transcriptional landscape of the deep-sea bacterium Photobacterium profundum in both a toxR mutant and its parental strain. BMC Genomics 2012; 13:567. [PMID: 23107454 PMCID: PMC3505737 DOI: 10.1186/1471-2164-13-567] [Citation(s) in RCA: 23] [Impact Index Per Article: 1.9] [Received: 08/21/2012] [Accepted: 10/16/2012] [Indexed: 02/08/2023]
Abstract
Background The deep-sea bacterium Photobacterium profundum is an established model for studying high-pressure adaptation. In this paper we analyse the parental strain DB110 and the toxR mutant TW30 by massively parallel cDNA sequencing (RNA-seq). ToxR is a transmembrane DNA-binding protein first discovered in Vibrio cholerae, where it regulates a considerable number of genes involved in environmental adaptation and virulence. In P. profundum the abundance and activity of this protein are influenced by hydrostatic pressure, and its role is related to regulating genes in a pressure-dependent manner. Results To better characterize the ToxR regulon, we compared the expression profiles of wild-type and toxR strains in response to pressure changes. Our results revealed a complex expression pattern, with a group of 22 genes having expression profiles similar to that of OmpH, an outer membrane protein transcribed in response to high hydrostatic pressure. Moreover, RNA-seq allowed deep characterization of the transcriptional landscape, leading to the identification of 460 putative small RNA genes and the detection of 298 previously unknown protein-coding genes. We were also able to perform a genome-wide prediction of operon structure and of transcription start and termination sites, revealing an unexpectedly high number of genes (992) with large 5′-UTRs, long enough to harbour cis-regulatory RNA structures, and suggesting a correlation between intergenic region size and UTR length. Conclusion This work led to a better understanding of the high-pressure response in P. profundum. Furthermore, the high-resolution RNA-seq analysis revealed several unexpected features of the transcriptional landscape and of general mechanisms controlling bacterial gene expression.
Affiliation(s)
- Stefano Campanaro
- Department of Biology and CRIBI Biotechnology Centre, University of Padua, Via Ugo Bassi 58/B, Padova 35131, Italy.
12
Abstract
We present here a novel methodology for predicting new genes in prokaryotic genomes on the basis of the inherent energetics of DNA. Regions of higher thermodynamic stability were identified and filtered against existing annotations to yield a set of potentially new genes. These were then checked for compatibility with the stereochemical properties of proteins and with the tripeptide frequencies of proteins in Swiss-Prot, yielding a reliable set of new genes in a genome. Surprisingly, the methodology identifies new genes even in well-annotated genomes. The methodology can also handle genomes of any GC content, size, and number of annotated genes.
13
A small predatory core genome in the divergent marine Bacteriovorax marinus SJ and the terrestrial Bdellovibrio bacteriovorus. ISME J 2012; 7:148-60. [PMID: 22955231 PMCID: PMC3526173 DOI: 10.1038/ismej.2012.90] [Citation(s) in RCA: 38] [Impact Index Per Article: 3.2] [Indexed: 12/31/2022]
Abstract
Bacteriovorax marinus SJ is a predatory delta-proteobacterium isolated from a marine environment. The genome sequence of this strain provides an interesting contrast to that of the terrestrial predatory bacterium Bdellovibrio bacteriovorus HD100. Based on their predatory lifestyle, Bacteriovorax were originally designated as members of the genus Bdellovibrio but were subsequently reassigned to a new genus and family based on genetic and phenotypic differences. As shown by microscopy, B. marinus attaches to Gram-negative bacteria and penetrates the cell wall to form a bdelloplast, in which it replicates. Bacteriovorax is distinct, sharing only 30% of its gene products with its closest sequenced relatives. Remarkably, 34% of predicted genes over 500 nt in length were completely unique, with no significant matches in the databases. As expected, Bacteriovorax shares several characteristic loci with the other delta-proteobacteria. A gene set was identified that is shared between Bacteriovorax and Bdellovibrio but not conserved among other delta-proteobacteria, such as the Myxobacteria (which destroy prey bacteria externally via lysis) or the non-predatory Desulfo-bacteria and Geobacter species. These 291 gene orthologues common to both Bacteriovorax and Bdellovibrio may be key indicators of the host-interaction, predatory-specific processes required for prey entry. The locus from Bdellovibrio bacteriovorus is implicated in the switch from predatory to prey/host-independent growth. Although the locus is conserved in B. marinus, the sequence has only limited similarity. The results of this study advance understanding of both the similarities and differences between Bdellovibrio and Bacteriovorax and confirm the distant relationship between the two and their separation into different families.
14
Goli B, Nair AS. The elusive short gene – an ensemble method for recognition for prokaryotic genome. Biochem Biophys Res Commun 2012; 422:36-41. [DOI: 10.1016/j.bbrc.2012.04.090] [Citation(s) in RCA: 10] [Impact Index Per Article: 0.8] [Received: 04/13/2012] [Accepted: 04/17/2012] [Indexed: 10/28/2022]
15
Abstract
With the development of ultra-high-throughput technologies, the cost of sequencing bacterial genomes has been vastly reduced. As more genomes are sequenced, less time can be spent manually annotating each of them, resulting in increased reliance on automatic annotation pipelines. However, automatic pipelines can produce inaccurate genome annotations, and their results often require manual curation. Here, we discuss the automatic and manual annotation of bacterial genomes, identify common problems introduced by the current genome annotation process, and suggest potential solutions.
Affiliation(s)
- Emily J Richardson
- The Roslin Institute, University of Edinburgh, Easter Bush, EH25 9RG, UK
16
Ellis JT, Sims RC, Miller CD. Monitoring microbial diversity of bioreactors using metagenomic approaches. Subcell Biochem 2012; 64:73-94. [PMID: 23080246 DOI: 10.1007/978-94-007-5055-5_4] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Indexed: 01/15/2023]
Abstract
With the rapid development of molecular techniques, particularly 'omics' technologies, the field of microbial ecology is growing quickly. Applications of next-generation sequencing have allowed researchers to produce massive amounts of genetic data on individual microbes, providing information about microbial communities and their interactions through in situ and in vitro measurements. The ability to identify novel microbes, functions, and enzymes, along with an understanding of microbial interactions and functions, is necessary for the efficient production of useful, high-value products in bioreactors. Fully optimizing bioreactors and understanding the microbial interactions and functions within them will establish highly efficient industrial processes for the production of bioproducts. This chapter provides an overview of bioreactors and metagenomic technologies to help the reader understand microbial communities, interactions, and functions in bioreactors.
Affiliation(s)
- Joshua T Ellis
- Department of Biological Engineering, Utah State University, 4105 Old Main Hill, Logan, UT, 84322-4105, USA
17
Okamoto A, Yamada K. Proteome driven re-evaluation and functional annotation of the Streptococcus pyogenes SF370 genome. BMC Microbiol 2011; 11:249. [PMID: 22070424 PMCID: PMC3224786 DOI: 10.1186/1471-2180-11-249] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.5] [Received: 05/11/2011] [Accepted: 11/10/2011] [Indexed: 12/02/2022]
Abstract
Background The genome data of Streptococcus pyogenes SF370 have been widely used by many researchers and have yielded a vast array of interesting findings. Nevertheless, approximately 40% of genes remain classified as hypothetical proteins, and several coding sequences (CDSs) have gone unrecognized. In this study, we performed a shotgun proteomic analysis with a six-frame database that was independent of genome annotation. Results Nine proteins encoded by novel ORFs were found by shotgun proteomic analysis, and their specific mRNAs were verified by reverse transcription PCR (RT-PCR). We also provided functional annotations for hypothetical genes using proteomic analysis of samples from three different culture conditions, separated into three fractions: supernatant, soluble, and insoluble. Consequently, we identified 567 proteins on re-evaluation of the proteomic data using an in-house database comprising 1,697 annotated and nine non-annotated CDSs. We provided functional annotations for 126 hypothetical proteins (18.9% of the 668 hypothetical proteins) based on their cellular fractions and expression profiles under different culture conditions. Conclusions The list of amino acid sequences annotated by genome analysis contains outdated information and misses some protein-coding sequences. We suggest that a six-frame database derived from actual DNA sequences be used for reliable proteomic analysis. In addition, experimental evidence from functional proteomic analysis is useful for the re-evaluation of previously sequenced genomes.
Affiliation(s)
- Akira Okamoto
- Department of Molecular Bacteriology, Nagoya University Graduate School of Medicine, 65 Tsurumai-cho, Showa-ku, Nagoya, Aichi 466-8550, Japan.
18
Holt DC, Holden MTG, Tong SYC, Castillo-Ramirez S, Clarke L, Quail MA, Currie BJ, Parkhill J, Bentley SD, Feil EJ, Giffard PM. A very early-branching Staphylococcus aureus lineage lacking the carotenoid pigment staphyloxanthin. Genome Biol Evol 2011; 3:881-95. [PMID: 21813488 PMCID: PMC3175761 DOI: 10.1093/gbe/evr078] [Citation(s) in RCA: 129] [Impact Index Per Article: 9.9] [Indexed: 01/30/2023]
Abstract
Here we discuss the evolution of the genome of the northern Australian Staphylococcus aureus isolate MSHR1132. MSHR1132 belongs to the divergent clonal complex 75 lineage. The average nucleotide divergence between orthologous genes in MSHR1132 and typical S. aureus is approximately sevenfold greater than the maximum divergence observed in this species to date. MSHR1132 has a small accessory genome, which includes the well-characterized genomic islands νSaα and νSaβ, suggesting that these elements were acquired well before the expansion of the typical S. aureus population. Other mobile elements show mosaic structure (the prophage φSa3) or evidence of recent acquisition from a typical S. aureus lineage (SCCmec, ICE6013 and plasmid pMSHR1132). There are two differences in gene repertoire compared with typical S. aureus that may be significant clues to the genetic basis underlying the successful emergence of S. aureus as a pathogen. First, MSHR1132 lacks the genes for production of staphyloxanthin, the carotenoid pigment that confers upon S. aureus its characteristic golden color and protects against oxidative stress. The lack of pigment was demonstrated in 126 of 126 CC75 isolates. Second, a mobile clustered regularly interspaced short palindromic repeat (CRISPR) element is inserted into orfX of MSHR1132. Although common in other staphylococcal species, these elements are very rare within S. aureus and may affect accessory genome acquisition. The CRISPR spacer sequences reveal a history of attempted invasion by known S. aureus mobile elements. There is a case for the creation of a new taxon to accommodate this and related isolates.
Affiliation(s)
- Deborah C Holt
- Menzies School of Health Research, Charles Darwin University, Darwin, Northern Territory, Australia
19
Abstract
Vaccine informatics is an emerging research area that focuses on development and applications of bioinformatics methods that can be used to facilitate every aspect of the preclinical, clinical, and postlicensure vaccine enterprises. Many immunoinformatics algorithms and resources have been developed to predict T- and B-cell immune epitopes for epitope vaccine development and protective immunity analysis. Vaccine protein candidates are predictable in silico from genome sequences using reverse vaccinology. Systematic transcriptomics and proteomics gene expression analyses facilitate rational vaccine design and identification of gene responses that are correlates of protection in vivo. Mathematical simulations have been used to model host-pathogen interactions and improve vaccine production and vaccination protocols. Computational methods have also been used for development of immunization registries or immunization information systems, assessment of vaccine safety and efficacy, and immunization modeling. Computational literature mining and databases effectively process, mine, and store large amounts of vaccine literature and data. Vaccine Ontology (VO) has been initiated to integrate various vaccine data and support automated reasoning.
20
Samayoa J, Yildiz FH, Karplus K. Identification of prokaryotic small proteins using a comparative genomic approach. Bioinformatics 2011; 27:1765-71. [PMID: 21551138 DOI: 10.1093/bioinformatics/btr275] [Citation(s) in RCA: 33] [Impact Index Per Article: 2.5] [Indexed: 11/14/2022]
Abstract
MOTIVATION Accurate prediction of genes encoding small proteins (on the order of 50 amino acids or less) remains an elusive open problem in bioinformatics. Some of the best methods for gene prediction use either sequence composition analysis or sequence similarity to a known protein-coding sequence. These methods often fail for small proteins, however, either because of a lack of experimentally verified small protein-coding genes or because statistics computed on short sequences have limited significance. Our approach is based on the hypothesis that true small proteins will be under selective pressure for encoding the particular amino acid sequence, for ease of translation by the ribosome, and for structural stability. This stability can be achieved either independently or as part of a larger protein complex. Given this assumption, it follows that small proteins should display conserved local protein structure properties, much like larger proteins. Our method incorporates neural-net predictions for three local structure alphabets within a comparative genomic approach, using a genomic alignment of 22 closely related bacterial genomes to predict whether a given open reading frame (ORF) encodes a small protein. RESULTS We have applied this method to the complete genome of Escherichia coli strain K12 and assessed its performance on a set of 60 experimentally verified small proteins from this organism. Out of a total of 11 407 possible ORFs, 6 of the top 10 and 27 of the top 100 predictions belonged to the set of 60 experimentally verified small proteins. We found 35 of all the true small proteins within the top 200 predictions. We compared our method with Glimmer, using a default Glimmer protocol and a modified small-ORF Glimmer protocol with a lower minimum size cutoff.
The default Glimmer protocol identified 16 of the true small proteins (all in the top 200 predictions), but failed to predict on 34 due to size cutoffs. The small ORF Glimmer protocol made predictions for all the experimentally verified small proteins but only contained 9 of the 60 true small proteins within the top 200 predictions. CONTACT jsamayoa@jhu.edu
Affiliation(s)
- Josue Samayoa
- Department of Biomedical Engineering, Institute for Computational Medicine, Johns Hopkins University, Baltimore, MD 21218, USA.
21
Kumar K, Desai V, Cheng L, Khitrov M, Grover D, Satya RV, Yu C, Zavaljevski N, Reifman J. AGeS: a software system for microbial genome sequence annotation. PLoS One 2011; 6:e17469. [PMID: 21408217 PMCID: PMC3049762 DOI: 10.1371/journal.pone.0017469] [Citation(s) in RCA: 16] [Impact Index Per Article: 1.2] [Received: 11/23/2010] [Accepted: 02/01/2011] [Indexed: 11/19/2022]
Abstract
BACKGROUND The annotation of genomes from next-generation sequencing platforms needs to be rapid, high-throughput, and fully integrated and automated. Although a few Web-based annotation services have recently become available, they may not be the best solution for researchers who need to annotate a large number of genomes, possibly including proprietary data, and store them locally for further analysis. To address this need, we developed a standalone software application, the Annotation of microbial Genome Sequences (AGeS) system, which incorporates publicly available and in-house-developed bioinformatics tools and databases, many of which are parallelized for high-throughput performance. METHODOLOGY The AGeS system supports three main capabilities. The first is the storage of input contig sequences and the resulting annotation data in a central, customized database. The second is the annotation of microbial genomes using an integrated software pipeline, which first analyzes contigs from high-throughput sequencing by locating genomic regions that code for proteins, RNA, and other genomic elements through the Do-It-Yourself Annotation (DIYA) framework. The identified protein-coding regions are then functionally annotated using the in-house-developed Pipeline for Protein Annotation (PIPA). The third capability is the visualization of annotated sequences using GBrowse. To date, we have implemented these capabilities for bacterial genomes. AGeS was evaluated by comparing its genome annotations with those provided by three other methods. Our results indicate that the software tools integrated into AGeS provide annotations that are in general agreement with those provided by the compared methods. This is demonstrated by a >94% overlap in the number of identified genes, a significant number of identical annotated features, and a >90% agreement in enzyme function predictions.
Affiliation(s)
- Kamal Kumar
- DoD Biotechnology High Performance Computing Software Applications Institute, Telemedicine and Advanced Technology Research Center, U.S. Army Medical Research and Materiel Command, Ft. Detrick, Maryland, United States of America
- Valmik Desai
- DoD Biotechnology High Performance Computing Software Applications Institute, Telemedicine and Advanced Technology Research Center, U.S. Army Medical Research and Materiel Command, Ft. Detrick, Maryland, United States of America
- Li Cheng
- DoD Biotechnology High Performance Computing Software Applications Institute, Telemedicine and Advanced Technology Research Center, U.S. Army Medical Research and Materiel Command, Ft. Detrick, Maryland, United States of America
- Maxim Khitrov
- DoD Biotechnology High Performance Computing Software Applications Institute, Telemedicine and Advanced Technology Research Center, U.S. Army Medical Research and Materiel Command, Ft. Detrick, Maryland, United States of America
- Deepak Grover
- DoD Biotechnology High Performance Computing Software Applications Institute, Telemedicine and Advanced Technology Research Center, U.S. Army Medical Research and Materiel Command, Ft. Detrick, Maryland, United States of America
- Ravi Vijaya Satya
- DoD Biotechnology High Performance Computing Software Applications Institute, Telemedicine and Advanced Technology Research Center, U.S. Army Medical Research and Materiel Command, Ft. Detrick, Maryland, United States of America
- Chenggang Yu
- DoD Biotechnology High Performance Computing Software Applications Institute, Telemedicine and Advanced Technology Research Center, U.S. Army Medical Research and Materiel Command, Ft. Detrick, Maryland, United States of America
- Nela Zavaljevski
- DoD Biotechnology High Performance Computing Software Applications Institute, Telemedicine and Advanced Technology Research Center, U.S. Army Medical Research and Materiel Command, Ft. Detrick, Maryland, United States of America
- Jaques Reifman
- DoD Biotechnology High Performance Computing Software Applications Institute, Telemedicine and Advanced Technology Research Center, U.S. Army Medical Research and Materiel Command, Ft. Detrick, Maryland, United States of America
22
A commensal gone bad: complete genome sequence of the prototypical enterotoxigenic Escherichia coli strain H10407. J Bacteriol 2010; 192:5822-31. [PMID: 20802035 DOI: 10.1128/jb.00710-10] [Citation(s) in RCA: 140] [Impact Index Per Article: 10.0] [Indexed: 02/07/2023]
Abstract
In most cases, Escherichia coli exists as a harmless commensal organism, but it may on occasion cause intestinal and/or extraintestinal disease. Enterotoxigenic E. coli (ETEC) is the predominant cause of E. coli-mediated diarrhea in the developing world and is responsible for a significant portion of pediatric deaths. In this study, we determined the complete genomic sequence of E. coli H10407, a prototypical strain of enterotoxigenic E. coli, which reproducibly elicits diarrhea in human volunteer studies. We performed genomic and phylogenetic comparisons with other E. coli strains, revealing that the chromosome is closely related to that of the nonpathogenic commensal strain E. coli HS and to those of the laboratory strains E. coli K-12 and C. Furthermore, these analyses demonstrated that there were no chromosomally encoded factors unique to any sequenced ETEC strains. Comparison of the E. coli H10407 plasmids with those from several ETEC strains revealed that the plasmids had a mosaic structure but that several loci were conserved among ETEC strains. This study provides a genetic context for the vast amount of experimental and epidemiological data that have been published.
23
Amthauer HA, Tsatsoulis C. Classifying genes to the correct Gene Ontology Slim term in Saccharomyces cerevisiae using neighbouring genes with classification learning. BMC Genomics 2010; 11:340. [PMID: 20509921 PMCID: PMC2890565 DOI: 10.1186/1471-2164-11-340] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.3] [Received: 11/12/2008] [Accepted: 05/28/2010] [Indexed: 11/10/2022]
Abstract
BACKGROUND There is increasing evidence that gene location and surrounding genes influence the functionality of genes in the eukaryotic genome. Knowing the Gene Ontology Slim terms associated with a gene gives us insight into a gene's functionality by informing us how its gene product behaves in a cellular context using three different ontologies: molecular function, biological process, and cellular component. In this study, we analyzed whether we could classify a gene in Saccharomyces cerevisiae to its correct Gene Ontology Slim term using information about its location in the genome and information from its nearest-neighbouring genes using classification learning. RESULTS We performed experiments to establish that the MultiBoostAB algorithm using the J48 classifier could correctly classify Gene Ontology Slim terms of a gene given information regarding the gene's location and information from its nearest-neighbouring genes for training. Different neighbourhood sizes were examined to determine how many nearest neighbours should be included around each gene to provide better classification rules. Our results show that by just incorporating neighbour information from each gene's two-nearest neighbours, the percentage of genes classified to their correct Gene Ontology Slim term exceeds 80% for each ontology, with high accuracy (reflected in F-measures over 0.80) of the classification rules produced. CONCLUSIONS We confirmed that in classifying genes to their correct Gene Ontology Slim term, the inclusion of neighbour information from those genes is beneficial. Knowing the location of a gene and the Gene Ontology Slim information from neighbouring genes gives us insight into that gene's functionality. This benefit is seen by just including information from a gene's two-nearest neighbouring genes.
Affiliation(s)
- Heather A Amthauer
- Department of Computer Science, Frostburg State University, Frostburg, Maryland, USA.
24
O'Toole PW, Snelling WJ, Canchaya C, Forde BM, Hardie KR, Josenhans C, Graham RL, McMullan G, Parkhill J, Belda E, Bentley SD. Comparative genomics and proteomics of Helicobacter mustelae, an ulcerogenic and carcinogenic gastric pathogen. BMC Genomics 2010; 11:164. [PMID: 20219135 PMCID: PMC2846917 DOI: 10.1186/1471-2164-11-164] [Citation(s) in RCA: 34] [Impact Index Per Article: 2.4] [Received: 09/30/2009] [Accepted: 03/10/2010] [Indexed: 12/11/2022]
Abstract
Background Helicobacter mustelae causes gastritis, ulcers and gastric cancer in ferrets and other mustelids. H. mustelae remains the only helicobacter other than H. pylori that causes gastric ulceration and cancer in its natural host. To improve understanding of H. mustelae pathogenesis, and the ulcerogenic and carcinogenic potential of helicobacters in general, we sequenced the H. mustelae genome, and identified 425 expressed proteins in the envelope and cytosolic proteome. Results The H. mustelae genome lacks orthologs of major H. pylori virulence factors including CagA, VacA, BabA, SabA and OipA. However, it encodes ten autotransporter surface proteins, seven of which were detected in the expressed proteome, and which, except for the Hsr protein, are of unknown function. There are 26 putative outer membrane proteins in H. mustelae, some of which are most similar to the Hof proteins of H. pylori. Although homologs of putative virulence determinants of H. pylori (NapA, plasminogen adhesin, collagenase) and Campylobacter jejuni (CiaB, Peb4a) are present in the H. mustelae genome, it also includes a distinct complement of virulence-related genes including a haemagglutinin/haemolysin protein, and a glycosyl transferase for producing blood group A/B on its lipopolysaccharide. The most highly expressed 264 proteins in the cytosolic proteome included many corresponding proteins from H. pylori, but the rank profile in H. mustelae was distinctive. Of 27 genes shown to be essential for H. pylori colonization of the gerbil, all but three had orthologs in H. mustelae, identifying a shared set of core proteins for gastric persistence. Conclusions The determination of the genome sequence and expressed proteome of the ulcerogenic species H. mustelae provides a comparative model for H. pylori to investigate bacterial gastric carcinogenesis in mammals, and to suggest ways whereby cag minus H. pylori strains might cause ulceration and cancer.
The genome sequence was deposited in EMBL/GenBank/DDBJ under accession number FN555004.
Affiliation(s)
- Paul W O'Toole
- Department of Microbiology, & Alimentary Pharmabiotic Centre, University College Cork, Cork, Ireland.
25
Glöckner J, Kube M, Shrestha PM, Weber M, Glöckner FO, Reinhardt R, Liesack W. Phylogenetic diversity and metagenomics of candidate division OP3. Environ Microbiol 2010; 12:1218-29. [DOI: 10.1111/j.1462-2920.2010.02164.x] [Citation(s) in RCA: 77] [Impact Index Per Article: 5.5] [Indexed: 11/30/2022]
26
Identifying translation initiation sites in prokaryotes using support vector machine. J Theor Biol 2010; 262:644-9. [DOI: 10.1016/j.jtbi.2009.10.023] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.4] [Received: 04/24/2009] [Revised: 10/12/2009] [Accepted: 10/12/2009] [Indexed: 11/17/2022]
27
Luo C, Hu GQ, Zhu H. Genome reannotation of Escherichia coli CFT073 with new insights into virulence. BMC Genomics 2009; 10:552. [PMID: 19930606 PMCID: PMC2785843 DOI: 10.1186/1471-2164-10-552] [Citation(s) in RCA: 37] [Impact Index Per Article: 2.5] [Received: 03/25/2009] [Accepted: 11/22/2009] [Indexed: 11/30/2022]
Abstract
Background The genome of uropathogenic Escherichia coli strain CFT073, a human pathogen, was sequenced and published in 2002, a significant contribution to pathogenic bacterial genomics research. However, the current RefSeq annotation of this pathogen is now outdated to some degree, because some essential genes associated with its virulence are missing or misannotated. We carried out a systematic reannotation by combining automated annotation tools with manual effort to provide a comprehensive understanding of virulence for the CFT073 genome. Results The reannotation excluded 608 coding sequences from the RefSeq annotation. Meanwhile, a total of 299 coding sequences were newly added; about one third of them are found in genomic island (GI) regions, and more than one fifth are located in virulence-related pathogenicity islands (PAIs). Furthermore, the translation initiation sites (TISs) of a total of 341 genes were relocated, resulting in high-quality gene start annotation. In addition, 94 pseudogenes annotated in RefSeq were thoroughly inspected and updated. The number of miscellaneous genes (sRNAs) has been updated from 6 in RefSeq to 46 in the reannotation. Based on the adjustments in the reannotation, subsequent analyses, comprising both general and case studies, were conducted on new virulence factors or new virulence-associated genes that are crucial during the urinary tract infection (UTI) process, including invasion, colonization, nutrient uptake and population density control. Furthermore, miscellaneous RNAs collected in the reannotation are believed to contribute to the virulence of strain CFT073. The reannotation, including the nucleotide data, the original RefSeq annotation, and all reannotated results, is freely available via http://mech.ctb.pku.edu.cn/CFT073/. Conclusion As a result, the reannotation presents a more comprehensive picture of the mechanisms of uropathogenicity of UPEC strain CFT073.
The new genes change the view of its uropathogenicity in many respects, particularly the new genes in GI regions and the new virulence-associated factors. The reannotation thus serves as an important resource, providing new information about genomic structure and organization, and gene function. Moreover, we expect that the detailed analysis will facilitate the exploration of novel virulence mechanisms and help guide experimental design.
Affiliation(s)
- Chengwei Luo
- State Key Laboratory for Turbulence and Complex Systems, and Department of Biomedical Engineering, College of Engineering, Peking University, Beijing 100871, China
28
Cao XJ, Dai J, Xu H, Nie S, Chang X, Hu BY, Sheng QH, Wang LS, Ning ZB, Li YX, Guo XK, Zhao GP, Zeng R. High-coverage proteome analysis reveals the first insight of protein modification systems in the pathogenic spirochete Leptospira interrogans. Cell Res 2009; 20:197-210. [PMID: 19918266 DOI: 10.1038/cr.2009.127] [Citation(s) in RCA: 52] [Impact Index Per Article: 3.5] [Indexed: 11/09/2022]
Abstract
Leptospirosis is a widespread zoonotic disease, caused by pathogenic spirochetes of the genus Leptospira, that infects humans and a wide range of animals. By combining computational prediction and high-accuracy tandem mass spectra, we revised the genome annotation of Leptospira interrogans serovar Lai, a free-living pathogenic spirochete responsible for leptospirosis, providing substantial peptide evidence for novel genes and new gene boundaries. Subsequently, we presented a high-coverage proteome analysis of protein expression and multiple posttranslational modifications (PTMs). Approximately 64.3% of the predicted L. interrogans proteins were cataloged by detecting 2,540 proteins. Meanwhile, a profile of multiple PTMs was concurrently established, containing in total 32 phosphorylated, 46 acetylated and 155 methylated proteins. The PTM systems in the serovar Lai show unique features. Unique eukaryotic-like features of L. interrogans protein modifications were demonstrated in both phosphorylation and arginine methylation. This systematic analysis provides not only comprehensive information on high-coverage protein expression and multiple modifications in prokaryotes but also a view suggesting that the evolutionarily primitive L. interrogans shares significant similarities in protein modification systems with eukaryotes.
Affiliation(s)
- Xing-Jun Cao
- Key Laboratory of Systems Biology, Institute of Biochemistry and Cell Biology, Shanghai Institute for Biological Sciences, Chinese Academy of Sciences, 320 Yue Yang Road, Shanghai 200031, China
29
Jeong H, Barbe V, Lee CH, Vallenet D, Yu DS, Choi SH, Couloux A, Lee SW, Yoon SH, Cattolico L, Hur CG, Park HS, Ségurens B, Kim SC, Oh TK, Lenski RE, Studier FW, Daegelen P, Kim JF. Genome sequences of Escherichia coli B strains REL606 and BL21(DE3). J Mol Biol 2009; 394:644-52. [PMID: 19786035 DOI: 10.1016/j.jmb.2009.09.052] [Citation(s) in RCA: 261] [Impact Index Per Article: 17.4] [Received: 09/21/2009] [Accepted: 09/21/2009] [Indexed: 12/23/2022]
Abstract
Escherichia coli K-12 and B have been the subjects of classical experiments from which much of our understanding of molecular genetics has emerged. We present here complete genome sequences of two E. coli B strains, REL606, used in a long-term evolution experiment, and BL21(DE3), widely used to express recombinant proteins. The two genomes differ in length by 72,304 bp and have 426 single base pair differences, a seemingly large difference for laboratory strains having a common ancestor within the last 67 years. Transpositions by IS1 and IS150 have occurred in both lineages. Integration of the DE3 prophage in BL21(DE3) apparently displaced a defective prophage in the lambda attachment site of B. As might have been anticipated from the many genetic and biochemical experiments comparing B and K-12 over the years, the B genomes are similar in size and organization to the genome of E. coli K-12 MG1655 and have >99% sequence identity over approximately 92% of their genomes. E. coli B and K-12 differ considerably in distribution of IS elements and in location and composition of larger mobile elements. An unexpected difference is the absence of a large cluster of flagella genes in B, due to a 41 kbp IS1-mediated deletion. Gene clusters that specify the LPS core, O antigen, and restriction enzymes differ substantially, presumably because of horizontal transfer. Comparative analysis of 32 independently isolated E. coli and Shigella genomes, both commensal and pathogenic strains, identifies a minimal set of genes in common plus many strain-specific genes that constitute a large E. coli pan-genome.
Affiliation(s)
- Haeyoung Jeong
- Korea Research Institute of Bioscience and Biotechnology (KRIBB), 111 Gwahangno, Yuseong, Daejeon 305-806, Korea
30
Holden MTG, Hauser H, Sanders M, Ngo TH, Cherevach I, Cronin A, Goodhead I, Mungall K, Quail MA, Price C, Rabbinowitsch E, Sharp S, Croucher NJ, Chieu TB, Mai NTH, Diep TS, Chinh NT, Kehoe M, Leigh JA, Ward PN, Dowson CG, Whatmore AM, Chanter N, Iversen P, Gottschalk M, Slater JD, Smith HE, Spratt BG, Xu J, Ye C, Bentley S, Barrell BG, Schultsz C, Maskell DJ, Parkhill J. Rapid evolution of virulence and drug resistance in the emerging zoonotic pathogen Streptococcus suis. PLoS One 2009; 4:e6072. [PMID: 19603075 PMCID: PMC2705793 DOI: 10.1371/journal.pone.0006072] [Citation(s) in RCA: 188] [Impact Index Per Article: 12.5] [Received: 03/11/2009] [Accepted: 04/22/2009] [Indexed: 01/02/2023]
Abstract
BACKGROUND Streptococcus suis is a zoonotic pathogen that infects pigs and can occasionally cause serious infections in humans. S. suis infections occur sporadically in humans in Europe and North America, but a recent major outbreak has been described in China with high levels of mortality. The mechanisms of S. suis pathogenesis in humans and pigs are poorly understood. METHODOLOGY/PRINCIPAL FINDINGS The sequencing of whole genomes of S. suis isolates provides opportunities to investigate the genetic basis of infection. Here we describe whole genome sequences of three S. suis strains from the same lineage: one from European pigs, and two from human cases from China and Vietnam. Comparative genomic analysis was used to investigate the variability of these strains. S. suis is phylogenetically distinct from other Streptococcus species for which genome sequences are currently available. Accordingly, approximately 40% of the approximately 2 Mb genome is unique in comparison to other Streptococcus species. Finer genomic comparisons within the species showed a high level of sequence conservation; virtually all of the genome is common to the S. suis strains. The only exceptions are three approximately 90 kb regions, present in the two isolates from humans, composed of integrative conjugative elements and transposons. Carried in these regions are coding sequences associated with drug resistance. In addition, small-scale sequence variation has generated pseudogenes in putative virulence and colonization factors. CONCLUSIONS/SIGNIFICANCE The genomic inventories of genetically related S. suis strains, isolated from distinct hosts and diseases, exhibit high levels of conservation. However, the genomes provide evidence that horizontal gene transfer has contributed to the evolution of drug resistance.
Affiliation(s)
- Matthew T G Holden
- The Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge, United Kingdom.
31
Wilkinson P, Waterfield NR, Crossman L, Corton C, Sanchez-Contreras M, Vlisidou I, Barron A, Bignell A, Clark L, Ormond D, Mayho M, Bason N, Smith F, Simmonds M, Churcher C, Harris D, Thompson NR, Quail M, Parkhill J, Ffrench-Constant RH. Comparative genomics of the emerging human pathogen Photorhabdus asymbiotica with the insect pathogen Photorhabdus luminescens. BMC Genomics 2009; 10:302. [PMID: 19583835 PMCID: PMC2717986 DOI: 10.1186/1471-2164-10-302] [Citation(s) in RCA: 85] [Impact Index Per Article: 5.7] [Received: 12/03/2008] [Accepted: 07/07/2009] [Indexed: 01/05/2023]
Abstract
Background The Gram-negative bacterium Photorhabdus asymbiotica (Pa) has been recovered from human infections in both North America and Australia. Recently, Pa has been shown to have a nematode vector that can also infect insects, like its sister species the insect pathogen P. luminescens (Pl). To understand the relationship between pathogenicity to insects and humans in Photorhabdus we have sequenced the complete genome of Pa strain ATCC43949 from North America. This strain (formerly referred to as Xenorhabdus luminescens strain 2) was isolated in 1977 from the blood of an 80-year-old female patient with endocarditis, in Maryland, USA. Here we compare the complete genome of Pa ATCC43949 with that of the previously sequenced insect pathogen P. luminescens strain TT01 which was isolated from its entomopathogenic nematode vector collected from soil in Trinidad and Tobago. Results We found that the human pathogen Pa had a smaller genome (5,064,808 bp) than that of the insect pathogen Pl (5,688,987 bp) but that each pathogen carries approximately one megabase of DNA that is unique to each strain. The reduced size of the Pa genome is associated with a smaller diversity in insecticidal genes such as those encoding the Toxin complexes (Tc's), Makes caterpillars floppy (Mcf) toxins and the Photorhabdus Virulence Cassettes (PVCs). The Pa genome, however, also shows the addition of a plasmid related to pMT1 from Yersinia pestis and several novel pathogenicity islands including a novel Type Three Secretion System (TTSS) encoding island. Together these data suggest that Pa may show virulence against man via the acquisition of the pMT1-like plasmid and specific effectors, such as SopB, that promote its persistence inside human macrophages. Interestingly, the loss of insecticidal genes in Pa is not reflected by a loss of pathogenicity towards insects.
Conclusion Our results suggest that North American isolates of Pa have acquired virulence against man via the acquisition of a plasmid and specific virulence factors with similarity to those shown to play roles in pathogenicity against humans in other bacteria.
Affiliation(s)
- Paul Wilkinson
- School of Biosciences, University of Exeter in Cornwall, Penryn TR10 9EZ, UK.
32
Zihler A, Le Blay G, de Wouters T, Lacroix C, Braegger C, Lehner A, Tischler P, Rattei T, Hächler H, Stephan R. In vitro inhibition activity of different bacteriocin-producing Escherichia coli against Salmonella strains isolated from clinical cases. Lett Appl Microbiol 2009; 49:31-8. [DOI: 10.1111/j.1472-765x.2009.02614.x] [Citation(s) in RCA: 22] [Impact Index Per Article: 1.5] [Indexed: 11/28/2022]
33
Abstract
As random shotgun metagenomic projects proliferate and become the dominant source of publicly available sequence data, procedures for the best practices in their execution and analysis become increasingly important. Based on our experience at the Joint Genome Institute, we describe the chain of decisions accompanying a metagenomic project from the viewpoint of the bioinformatic analysis step by step. We guide the reader through a standard workflow for a metagenomic project beginning with presequencing considerations such as community composition and sequence data type that will greatly influence downstream analyses. We proceed with recommendations for sampling and data generation including sample and metadata collection, community profiling, construction of shotgun libraries, and sequencing strategies. We then discuss the application of generic sequence processing steps (read preprocessing, assembly, and gene prediction and annotation) to metagenomic data sets in contrast to genome projects. Different types of data analyses particular to metagenomes are then presented, including binning, dominant population analysis, and gene-centric analysis. Finally, data management issues are presented and discussed. We hope that this review will assist bioinformaticians and biologists in making better-informed decisions on their journey during a metagenomic project.
34
Evolutionary history of the phl gene cluster in the plant-associated bacterium Pseudomonas fluorescens. Appl Environ Microbiol 2009; 75:2122-31. [PMID: 19181839 DOI: 10.1128/aem.02052-08] [Citation(s) in RCA: 46] [Impact Index Per Article: 3.1] [Indexed: 11/20/2022]
Abstract
Pseudomonas fluorescens is of agricultural and economic importance as a biological control agent largely because of its plant association and production of secondary metabolites, in particular 2,4-diacetylphloroglucinol (2,4-DAPG). This polyketide, which is encoded by the eight-gene phl cluster, has antimicrobial effects on phytopathogens, promotes amino acid exudation from plant roots, and induces systemic resistance in plants. Despite its importance, 2,4-DAPG production is limited to a subset of P. fluorescens strains. Determination of the evolution of the phl cluster and understanding the selective pressures promoting its retention or loss in lineages of P. fluorescens will help in the development of P. fluorescens as a viable and effective inoculant for application in agriculture. In this study, genomic and sequence-based approaches were integrated to reconstruct the phylogeny of P. fluorescens and the phl cluster. It was determined that 2,4-DAPG production is an ancestral trait in the species P. fluorescens but that most lineages have lost this capacity through evolution. Furthermore, intragenomic recombination has relocated the phl cluster within the P. fluorescens genome at least three times, but the integrity of the cluster has always been maintained. The possible evolutionary and functional implications for retention of the phl cluster and 2,4-DAPG production in some lineages of P. fluorescens are discussed.
35
Ward PN, Holden MTG, Leigh JA, Lennard N, Bignell A, Barron A, Clark L, Quail MA, Woodward J, Barrell BG, Egan SA, Field TR, Maskell D, Kehoe M, Dowson CG, Chanter N, Whatmore AM, Bentley SD, Parkhill J. Evidence for niche adaptation in the genome of the bovine pathogen Streptococcus uberis. BMC Genomics 2009; 10:54. [PMID: 19175920 PMCID: PMC2657157 DOI: 10.1186/1471-2164-10-54] [Citation(s) in RCA: 75] [Impact Index Per Article: 5.0] [Received: 08/29/2008] [Accepted: 01/28/2009] [Indexed: 01/28/2023]
Abstract
BACKGROUND Streptococcus uberis, a Gram-positive bacterial pathogen responsible for a significant proportion of bovine mastitis in commercial dairy herds, colonises multiple body sites of the cow including the gut, genital tract and mammary gland. Comparative analysis of the complete genome sequence of S. uberis strain 0140J was undertaken to help elucidate the biology of this effective bovine pathogen. RESULTS The genome revealed 1,825 predicted coding sequences (CDSs) of which 62 were identified as pseudogenes or gene fragments. Comparisons with related pyogenic streptococci identified a conserved core (40%) of orthologous CDSs. Intriguingly, S. uberis 0140J displayed a lower number of mobile genetic elements when compared with other pyogenic streptococci; however, bacteriophage-derived islands and a putative genomic island were identified. Comparative genomics analysis revealed most similarity to the genomes of Streptococcus agalactiae and Streptococcus equi subsp. zooepidemicus. In contrast, streptococcal orthologs were not identified for 11% of the CDSs, indicating either unique retention of ancestral sequence, or acquisition of sequence from alternative sources. Functions including transport, catabolism, regulation and CDSs encoding cell envelope proteins were over-represented in this unique gene set; a limited array of putative virulence CDSs were identified. CONCLUSION S. uberis utilises nutritional flexibility derived from a diversity of metabolic options to successfully occupy a discrete ecological niche. The features observed in S. uberis are strongly suggestive of an opportunistic pathogen adapted to challenging and changing environmental parameters.
Affiliation(s)
- Philip N Ward
- Nuffield Department of Clinical Laboratory Sciences, Oxford University, John Radcliffe Hospital, Headington, Oxford, OX3 9DU, UK
- Matthew TG Holden
- The Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge, CB10 1SA, UK
- James A Leigh
- The School of Veterinary Medicine and Science, The University of Nottingham, Sutton Bonington Campus, Sutton Bonington, Leicestershire, LE12 5RD, UK
- Nicola Lennard
- The Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge, CB10 1SA, UK
- Alexandra Bignell
- The Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge, CB10 1SA, UK
- Andy Barron
- The Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge, CB10 1SA, UK
- Louise Clark
- The Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge, CB10 1SA, UK
- Michael A Quail
- The Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge, CB10 1SA, UK
- John Woodward
- The Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge, CB10 1SA, UK
- Bart G Barrell
- The Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge, CB10 1SA, UK
- Sharon A Egan
- The School of Veterinary Medicine and Science, The University of Nottingham, Sutton Bonington Campus, Sutton Bonington, Leicestershire, LE12 5RD, UK
- Terence R Field
- Institute for Animal Health, Compton Laboratory, Compton, Newbury, Berks, RG20 7NN, UK
- Duncan Maskell
- Dept. of Veterinary Medicine, The University of Cambridge, Cambridge, CB3 0ES, UK
- Michael Kehoe
- Institute for Cell and Molecular Biosciences, The Medical School, University of Newcastle upon Tyne, Framlington Place, Newcastle upon Tyne, NE2 4HH, UK
- Neil Chanter
- Centre for Preventative Medicine, Animal Health Trust, Newmarket, Suffolk, CB8 7UU, UK
- Adrian M Whatmore
- Department of Biological Sciences, University of Warwick, Coventry, CV4 7AL, UK
- Veterinary Laboratories Agency, Weybridge, UK
- Stephen D Bentley
- The Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge, CB10 1SA, UK
- Julian Parkhill
- The Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge, CB10 1SA, UK
36
Hjerde E, Lorentzen MS, Holden MT, Seeger K, Paulsen S, Bason N, Churcher C, Harris D, Norbertczak H, Quail MA, Sanders S, Thurston S, Parkhill J, Willassen NP, Thomson NR. The genome sequence of the fish pathogen Aliivibrio salmonicida strain LFI1238 shows extensive evidence of gene decay. BMC Genomics 2008; 9:616. [PMID: 19099551 PMCID: PMC2627896 DOI: 10.1186/1471-2164-9-616] [Citation(s) in RCA: 71] [Impact Index Per Article: 4.4] [Received: 09/17/2008] [Accepted: 12/19/2008] [Indexed: 01/05/2023]
Abstract
Background The fish pathogen Aliivibrio salmonicida is the causative agent of cold-water vibriosis in marine aquaculture. The Gram-negative bacterium causes tissue degradation, hemolysis and sepsis in vivo. Results In total, 4 286 protein-coding sequences were identified, and the 4.6 Mb genome of A. salmonicida has a six-partite architecture with two chromosomes and four plasmids. Sequence analysis revealed a highly fragmented genome structure caused by the insertion of an extensive number of insertion sequence (IS) elements. The IS elements can be related to important evolutionary events such as gene acquisition, gene loss and chromosomal rearrangements. New A. salmonicida functional capabilities that may have been acquired through horizontal DNA transfer include genes involved in iron acquisition and protein secretion, which may play roles in pathogenicity. On the other hand, the degeneration of 370 genes and the consequent loss of specific functions suggest that A. salmonicida has a reduced metabolic and physiological capacity in comparison to related Vibrionaceae species. Conclusion Most prominent is the loss of several genes involved in the utilisation of the polysaccharide chitin. In particular, the disruption of three extracellular chitinases responsible for enzymatic breakdown of chitin makes A. salmonicida unable to grow on the polymer form of chitin. These and other losses could restrict the variety of carrier organisms A. salmonicida can attach to and associate with. Gene acquisition and gene loss may be related to the emergence of A. salmonicida as a fish pathogen.
Affiliation(s)
- Erik Hjerde
- Department of Molecular Biotechnology, Institute of Medical Biology, Faculty of Medicine, University of Tromsø, N-9037 Tromsø, Norway.
37
Descorps-Declère S, Barba M, Labedan B. Matching curated genome databases: a non trivial task. BMC Genomics 2008; 9:501. [PMID: 18950477 PMCID: PMC2596144 DOI: 10.1186/1471-2164-9-501] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Received: 06/12/2008] [Accepted: 10/24/2008] [Indexed: 12/02/2022]
Abstract
Background Curated databases of completely sequenced genomes have been designed independently at the NCBI (RefSeq) and EBI (Genome Reviews) to cope with non-standard annotation found in the version of the sequenced genome published in the GenBank/EMBL/DDBJ databanks. These curation attempts were expected to review the annotations and to improve their pertinence when using them to annotate newly released genome sequences by homology to previously annotated genomes. However, we observed that such an uncoordinated effort has two unwanted consequences. First, it is not trivial to map the protein identifiers of the same sequence in both databases. Second, the two reannotated versions of the same genome differ at the level of their structural annotation. Results Here, we propose CorBank, a program devised to cross-reference protein identifiers whatever the level of identity between their matching sequences. Approximately 98% of the 1,983,258 amino acid sequences match, allowing instantaneous retrieval of their respective cross-references. CorBank also detects any differences between the independently curated versions of the same genome. We found that the RefSeq and Genome Reviews versions match perfectly for only 50 of the 641 complete genomes we analyzed. In all other cases there are differences at the level of the coding sequence (CDS), and/or in the total number of CDS in the respective version of the same genome. CorBank is freely accessible at . The CorBank site also contains an updated publication of the exhaustive results obtained by comparing the RefSeq and Genome Reviews versions of each genome. Accordingly, this web site allows easy search of cross-references between RefSeq, Genome Reviews, and UniProt, for either a single CDS or a whole replicon.
Conclusion CorBank is very efficient in rapidly detecting the numerous differences between the RefSeq and Genome Reviews versions of the same curated genome. Although such differences are acceptable as reflecting different views, we suggest that curators of both genome databases could help reduce further divergence by agreeing on a minimal dialogue and attempting to publish the point of view of the other database whenever it is technically possible.
Affiliation(s)
- Stéphane Descorps-Declère
- Institut de Génétique et Microbiologie, Université Paris Sud XI, CNRS UMR 8621, Bât, 400, 91405 Orsay Cedex, France.
38
The genome of Burkholderia cenocepacia J2315, an epidemic pathogen of cystic fibrosis patients. J Bacteriol 2008; 191:261-77. [PMID: 18931103 PMCID: PMC2612433 DOI: 10.1128/jb.01230-08] [Citation(s) in RCA: 273] [Impact Index Per Article: 17.1] [Indexed: 01/10/2023]
Abstract
Bacterial infections of the lungs of cystic fibrosis (CF) patients cause major complications in the treatment of this common genetic disease. Burkholderia cenocepacia infection is particularly problematic since this organism has high levels of antibiotic resistance, making it difficult to eradicate; the resulting chronic infections are associated with severe declines in lung function and increased mortality rates. B. cenocepacia strain J2315 was isolated from a CF patient and is a member of the epidemic ET12 lineage that originated in Canada or the United Kingdom and spread to Europe. The 8.06-Mb genome of this highly transmissible pathogen comprises three circular chromosomes and a plasmid and encodes a broad array of functions typical of this metabolically versatile genus, as well as numerous virulence and drug resistance functions. Although B. cenocepacia strains can be isolated from soil and can be pathogenic to both plants and man, J2315 is representative of a lineage of B. cenocepacia rarely isolated from the environment and which spreads between CF patients. Comparative analysis revealed that ca. 21% of the genome is unique in comparison to other strains of B. cenocepacia, highlighting the genomic plasticity of this species. Pseudogenes in virulence determinants suggest that the pathogenic response of J2315 may have been recently selected to promote persistence in the CF lung. The J2315 genome contains evidence that its unique and highly adapted genetic content has played a significant role in its success as an epidemic CF pathogen.
39
Klasson L, Walker T, Sebaihia M, Sanders MJ, Quail MA, Lord A, Sanders S, Earl J, O'Neill SL, Thomson N, Sinkins SP, Parkhill J. Genome evolution of Wolbachia strain wPip from the Culex pipiens group. Mol Biol Evol 2008; 25:1877-87. [PMID: 18550617 PMCID: PMC2515876 DOI: 10.1093/molbev/msn133] [Citation(s) in RCA: 189] [Impact Index Per Article: 11.8] [Indexed: 01/02/2023]
Abstract
The obligate intracellular bacterium Wolbachia pipientis strain wPip induces cytoplasmic incompatibility (CI), patterns of crossing sterility, in the Culex pipiens group of mosquitoes. The complete sequence of the 1.48-Mbp genome of wPip, which encodes 1386 coding sequences (CDSs), is presented; it is the first genome sequence of a B-supergroup Wolbachia. Comparisons were made with the smaller genomes of Wolbachia strains wMel of Drosophila melanogaster, an A-supergroup Wolbachia that is also a CI inducer, and wBm, a mutualist of Brugia malayi nematodes that belongs to the D-supergroup of Wolbachia. Despite extensive gene order rearrangement, a core set of Wolbachia genes shared between the 3 genomes can be identified and contrasts with a flexible gene pool where rapid evolution has taken place. The prophage and ankyrin repeat-encoding (ANK) gene components of the wPip genome are much more extensive than those of wMel and wBm, and both are likely to be of considerable importance in wPip biology. Five WO-B-like prophage regions are present and contain some genes that are identical or highly similar in multiple prophage copies, whereas other genes are unique, and it is likely that extensive recombination, duplication, and insertion have occurred between copies. A much larger number of genes encode ankyrin repeat (ANK) proteins in wPip, with 60 present compared with 23 in wMel, many of which are within or close to the prophage regions. It is likely that this pattern is partly a result of expansions in the wPip lineage, due for example to gene duplication, but their presence is in some cases more ancient. The wPip genome underlines the considerable evolutionary flexibility of Wolbachia, providing clear evidence for the rapid evolution of ANK-encoding genes and of prophage regions.
This host-Wolbachia system, with its complex patterns of sterility induced between populations, now provides an excellent model for unraveling the molecular systems underlying host reproductive manipulation.
Affiliation(s)
- Lisa Klasson
- Peter Medawar Building for Pathogen Research and Department of Zoology, University of Oxford, Oxford, United Kingdom
40
Naito M, Hirakawa H, Yamashita A, Ohara N, Shoji M, Yukitake H, Nakayama K, Toh H, Yoshimura F, Kuhara S, Hattori M, Hayashi T, Nakayama K. Determination of the genome sequence of Porphyromonas gingivalis strain ATCC 33277 and genomic comparison with strain W83 revealed extensive genome rearrangements in P. gingivalis. DNA Res 2008; 15:215-25. [PMID: 18524787 PMCID: PMC2575886 DOI: 10.1093/dnares/dsn013] [Citation(s) in RCA: 202] [Impact Index Per Article: 12.6] [Indexed: 01/23/2023]
Abstract
The Gram-negative anaerobic bacterium Porphyromonas gingivalis is a major causative agent of chronic periodontitis. Porphyromonas gingivalis strains have been classified into virulent and less-virulent strains by mouse subcutaneous soft tissue abscess model analysis. Here, we present the whole genome sequence of P. gingivalis ATCC 33277, which is classified as a less-virulent strain. We identified 2090 protein-coding sequences (CDSs), 4 RNA operons, and 53 tRNA genes in the ATCC 33277 genome. By genomic comparison with the virulent strain W83, we identified 461 ATCC 33277-specific and 415 W83-specific CDSs. Extensive genomic rearrangements were observed between the two strains: 175 regions in which genomic rearrangements have occurred were identified. Thirty-five of those genomic rearrangements were inversions or translocations and 140 were simple insertions, deletions, or replacements. Both strains contained large numbers of mobile elements, such as insertion sequences, miniature inverted-repeat transposable elements (MITEs), and conjugative transposons, which are frequently associated with genomic rearrangements. These findings indicate that the mobile genetic elements have been deeply involved in the extensive genome rearrangement of P. gingivalis and the occurrence of many of the strain-specific CDSs. We also describe here a unique feature of MITE400, which we renamed MITEPgRS (MITE of P. gingivalis with Repeating Sequences).
Affiliation(s)
- Mariko Naito
- Division of Microbiology and Oral Infection, Department of Molecular Microbiology and Immunology, Nagasaki University Graduate School of Biomedical Sciences, Sakamoto 1-7-1, Nagasaki, Japan
41
Singhal P, Jayaram B, Dixit SB, Beveridge DL. Prokaryotic gene finding based on physicochemical characteristics of codons calculated from molecular dynamics simulations. Biophys J 2008; 94:4173-83. [PMID: 18326660 PMCID: PMC2480686 DOI: 10.1529/biophysj.107.116392] [Citation(s) in RCA: 29] [Impact Index Per Article: 1.8] [Received: 06/29/2007] [Accepted: 11/29/2007] [Indexed: 01/27/2023]
Abstract
An ab initio model for gene prediction in prokaryotic genomes is proposed based on physicochemical characteristics of codons calculated from molecular dynamics (MD) simulations. The model requires a specification of three calculated quantities for each codon: the double-helical trinucleotide base pairing energy, the base pair stacking energy, and an index of the propensity of a codon for protein-nucleic acid interactions. The base pairing and stacking energies for each codon are obtained from recently reported MD simulations on all unique tetranucleotide steps, and the third parameter is assigned based on the conjugate rule previously proposed to account for the wobble hypothesis with respect to degeneracies in the genetic code. The values of this third, interaction-propensity parameter correlate well with ab initio MD-calculated solvation energies and flexibility of codon sequences, as well as with codon usage in genes and amino acid composition frequencies in approximately 175,000 protein sequences in the Swissprot database. Assigning these three parameters to each codon enables calculation of the magnitude and orientation of a cumulative three-dimensional vector for a DNA sequence of any length in each of the six genomic reading frames. Analysis of 372 genomes comprising approximately 350,000 genes shows that the orientations of the gene and nongene vectors are well differentiated, making a clear distinction between genic and nongenic sequences feasible at a level equivalent to or better than currently available knowledge-based models trained on empirical data. This provides strong support for the possibility of a unique and useful physicochemical characterization of DNA sequences from codons to genomes.
Affiliation(s)
- Poonam Singhal
- Department of Chemistry and Supercomputing Facility for Bioinformatics and Computational Biology, Indian Institute of Technology, Hauz Khas, New Delhi 110016, India
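The six-frame cumulative vector described in the abstract above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the per-codon 3-vectors returned by the hypothetical helper `toy_codon_params` are made-up placeholders (they are not the MD-derived pairing and stacking energies or the interaction-propensity index), only A/C/G/T input is handled, and the gene/nongene decision based on vector orientation is not modelled.

```python
import math
from itertools import product
from typing import Dict, List, Tuple

Vec3 = Tuple[float, float, float]

COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def reverse_complement(seq: str) -> str:
    """Reverse complement of an A/C/G/T string (other symbols raise KeyError)."""
    return "".join(COMPLEMENT[base] for base in reversed(seq))


def toy_codon_params() -> Dict[str, Vec3]:
    """HYPOTHETICAL placeholder 3-vectors for all 64 codons.

    NOT the paper's MD-derived energies or propensity index; they only
    give every codon some vector so the demo runs.
    """
    table: Dict[str, Vec3] = {}
    for codon in ("".join(p) for p in product("ACGT", repeat=3)):
        gc = sum(base in "GC" for base in codon)
        purines = sum(base in "AG" for base in codon)
        wobble = 1.0 if codon[2] in "GC" else -1.0
        table[codon] = (-float(gc), -float(purines), wobble)
    return table


def cumulative_vector(seq: str, offset: int, params: Dict[str, Vec3]) -> Vec3:
    """Sum the per-codon vectors of every complete codon read from `offset`."""
    x = y = z = 0.0
    for i in range(offset, len(seq) - 2, 3):
        px, py, pz = params[seq[i:i + 3]]
        x, y, z = x + px, y + py, z + pz
    return (x, y, z)


def magnitude_and_direction(v: Vec3) -> Tuple[float, Vec3]:
    """Return |v| and the unit vector along v (zero vector if |v| == 0)."""
    norm = math.sqrt(v[0] ** 2 + v[1] ** 2 + v[2] ** 2)
    if norm == 0.0:
        return 0.0, (0.0, 0.0, 0.0)
    return norm, (v[0] / norm, v[1] / norm, v[2] / norm)


def six_frame_vectors(seq: str, params: Dict[str, Vec3]) -> List[Tuple[str, float, Vec3]]:
    """Magnitude and orientation of the cumulative vector in all six frames."""
    results = []
    forward = seq.upper()
    for strand, s in (("+", forward), ("-", reverse_complement(forward))):
        for offset in range(3):
            mag, direction = magnitude_and_direction(cumulative_vector(s, offset, params))
            results.append((f"{strand}{offset + 1}", mag, direction))
    return results


if __name__ == "__main__":
    for frame, mag, direction in six_frame_vectors("ATGGCTGAAACCGGTTAA", toy_codon_params()):
        print(frame, round(mag, 3), tuple(round(c, 3) for c in direction))
```

Summing vectors codon by codon makes the cumulative vector additive over sequence length, which is what lets the abstract compare orientations for sequences "of any length"; a real classifier would replace the placeholder table with the published parameters.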
42
Stinear TP, Seemann T, Harrison PF, Jenkin GA, Davies JK, Johnson PDR, Abdellah Z, Arrowsmith C, Chillingworth T, Churcher C, Clarke K, Cronin A, Davis P, Goodhead I, Holroyd N, Jagels K, Lord A, Moule S, Mungall K, Norbertczak H, Quail MA, Rabbinowitsch E, Walker D, White B, Whitehead S, Small PLC, Brosch R, Ramakrishnan L, Fischbach MA, Parkhill J, Cole ST. Insights from the complete genome sequence of Mycobacterium marinum on the evolution of Mycobacterium tuberculosis. Genome Res 2008; 18:729-41. [PMID: 18403782 PMCID: PMC2336800 DOI: 10.1101/gr.075069.107] [Citation(s) in RCA: 387] [Impact Index Per Article: 24.2] [Indexed: 11/25/2022]
Abstract
Mycobacterium marinum, a ubiquitous pathogen of fish and amphibia, is a near relative of Mycobacterium tuberculosis, the etiologic agent of tuberculosis in humans. The genome of the M strain of M. marinum comprises a 6,636,827-bp circular chromosome with 5424 CDS, 10 prophages, and a 23-kb mercury-resistance plasmid. Prominent features are the very large number of genes (57) encoding polyketide synthases (PKSs) and nonribosomal peptide synthases (NRPSs) and the most extensive repertoire yet reported of the mycobacteria-restricted PE and PPE proteins and related ESX secretion systems. Some of the NRPS genes comprise a novel family and seem to have been acquired horizontally. M. marinum is widely used as a model organism to study M. tuberculosis pathogenesis, and genome comparisons confirmed the close genetic relationship between these two species, which share 3000 orthologs with an average amino acid identity of 85%. Comparisons with the more distantly related Mycobacterium avium subspecies paratuberculosis and Mycobacterium smegmatis reveal how an ancestral generalist mycobacterium evolved into M. tuberculosis and M. marinum. M. tuberculosis has undergone genome downsizing and extensive lateral gene transfer to become a specialized pathogen of humans and other primates without retaining an environmental niche. M. marinum has maintained a large genome so as to retain the capacity for environmental survival while becoming a broad host range pathogen that produces disease strikingly similar to M. tuberculosis. The work described herein provides a foundation for using M. marinum to better understand the determinants of pathogenesis of tuberculosis.
Affiliation(s)
- Timothy P Stinear
- Department of Microbiology, Monash University, Clayton 3800, Australia.
43
Evolution in the laboratory: the genome of Halobacterium salinarum strain R1 compared to that of strain NRC-1. Genomics 2008; 91:335-46. [PMID: 18313895 DOI: 10.1016/j.ygeno.2008.01.001] [Citation(s) in RCA: 114] [Impact Index Per Article: 7.1] [Received: 08/02/2007] [Revised: 12/12/2007] [Accepted: 01/02/2008] [Indexed: 01/23/2023]
Abstract
We report the sequence of the Halobacterium salinarum strain R1 chromosome and its four megaplasmids. Our set of protein-coding genes is supported by extensive proteomic and sequence homology data. The structures of the plasmids, which show three large-scale duplications (adding up to 100 kb), were unequivocally confirmed by cosmid analysis. The chromosome of strain R1 is completely colinear and virtually identical to that of strain NRC-1. Correlation of the plasmid sequences revealed 210 kb of sequence that occurs only in strain R1. The remaining 350 kb shows virtual sequence identity in the two strains. Nevertheless, the number and overall structure of the plasmids are largely incompatible. Also, 20% of the protein sequences differ despite the near identity at the DNA sequence level. Finally, we report genome-wide mobility data for insertion sequences from which we conclude that strains R1 and NRC-1 originate from the same natural isolate. This exemplifies evolution in the laboratory.
44
Bentley SD, Corton C, Brown SE, Barron A, Clark L, Doggett J, Harris B, Ormond D, Quail MA, May G, Francis D, Knudson D, Parkhill J, Ishimaru CA. Genome of the actinomycete plant pathogen Clavibacter michiganensis subsp. sepedonicus suggests recent niche adaptation. J Bacteriol 2008; 190:2150-60. [PMID: 18192393 PMCID: PMC2258862 DOI: 10.1128/jb.01598-07] [Citation(s) in RCA: 75] [Impact Index Per Article: 4.7] [Received: 10/02/2007] [Accepted: 01/01/2008] [Indexed: 12/21/2022]
Abstract
Clavibacter michiganensis subsp. sepedonicus is a plant-pathogenic bacterium and the causative agent of bacterial ring rot, a devastating agricultural disease under strict quarantine control and zero tolerance in the seed potato industry. This organism appears to be largely restricted to an endophytic lifestyle, proliferating within plant tissues and unable to persist in the absence of plant material. Analysis of the genome sequence of C. michiganensis subsp. sepedonicus and comparison with the genome sequences of related plant pathogens revealed a dramatic recent evolutionary history. The genome contains 106 insertion sequence elements, which appear to have been active in extensive rearrangement of the chromosome compared to that of Clavibacter michiganensis subsp. michiganensis. There are 110 pseudogenes with overrepresentation in functions associated with carbohydrate metabolism, transcriptional regulation, and pathogenicity. Genome comparisons also indicated that there is substantial gene content diversity within the species, probably due to differential gene acquisition and loss. These genomic features and evolutionary dating suggest that there was recent adaptation for life in a restricted niche where nutrient diversity and perhaps competition are low, correlated with a reduced ability to exploit previously occupied complex niches outside the plant. Toleration of factors such as multiplication and integration of insertion sequence elements, genome rearrangements, and functional disruption of many genes and operons seems to indicate that there has been general relaxation of selective pressure on a large proportion of the genome.
Affiliation(s)
- Stephen D Bentley
- Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton CB10 1SA, United Kingdom
45
Bechtel JM, Rajesh P, Ilikchyan I, Deng Y, Mishra PK, Wang Q, Wu X, Afonin KA, Grose WE, Wang Y, Khuder S, Fedorov A. Calculation of splicing potential from the Alternative Splicing Mutation Database. BMC Res Notes 2008; 1:4. [PMID: 18611287 PMCID: PMC2518266 DOI: 10.1186/1756-0500-1-4] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.4] [Received: 02/06/2008] [Accepted: 02/26/2008] [Indexed: 12/15/2022]
Abstract
BACKGROUND The Alternative Splicing Mutation Database (ASMD) presents a collection of all known mutations inside human exons which affect splicing enhancers and silencers and cause changes in the alternative splicing pattern of the corresponding genes. FINDINGS An algorithm was developed to derive a Splicing Potential (SP) table from the ASMD information. This table characterizes the influence of each oligonucleotide on the splicing effectiveness of the exon containing it. If the SP value for an oligonucleotide is positive, it promotes exon retention, while negative SP values mean the sequence favors exon skipping. The merit of the SP approach is the ability to separate splicing signals from a wide range of sequence motifs enriched in exonic sequences that are attributed to protein-coding properties and/or translation efficiency. Due to its direct derivation from observed splice site selection, SP has an advantage over other computational approaches for predicting alternative splicing. CONCLUSION We show that a vast majority of known exonic splicing enhancers have highly positive cumulative SP values, while known splicing silencers have core motifs with strongly negative cumulative SP values. Our approach allows for computation of the cumulative SP value of any sequence segment and, thus, gives researchers the ability to measure the possible contribution of any sequence to the pattern of splicing.
Affiliation(s)
- Jason M Bechtel
- Program in Bioinformatics and Proteomics/Genomics, University of Toledo Health Science Campus, Toledo, Ohio 43614, USA.
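The cumulative SP value mentioned in the abstract above can be sketched as a sum of per-oligonucleotide SP values over a sequence segment. This is a minimal sketch under assumptions that do not come from the paper: every overlapping k-mer is counted, k-mers missing from the table contribute 0.0, and the trinucleotide table `TOY_SP` holds hypothetical values chosen only for illustration (the real SP table is derived from the ASMD). The sign interpretation follows the abstract: positive favors exon retention, negative favors exon skipping.

```python
from typing import Dict


def cumulative_sp(segment: str, sp_table: Dict[str, float], k: int) -> float:
    """Sum SP values over every overlapping k-mer of `segment`.

    Assumptions (not taken from the paper): all overlapping k-mers are
    counted, and k-mers absent from the table contribute 0.0.
    """
    if k <= 0:
        raise ValueError("k must be a positive integer")
    seq = segment.upper()
    return sum((sp_table.get(seq[i:i + k], 0.0) for i in range(len(seq) - k + 1)), 0.0)


def interpret(score: float) -> str:
    """Map the sign of a cumulative SP value to the reading given in the abstract."""
    if score > 0:
        return "favors exon retention"
    if score < 0:
        return "favors exon skipping"
    return "no net effect"


# HYPOTHETICAL trinucleotide SP values, for illustration only.
TOY_SP = {"GAA": 0.75, "AAG": 0.5, "TAG": -1.0}

if __name__ == "__main__":
    for seg in ("GAAGAA", "TAGTAG"):
        score = cumulative_sp(seg, TOY_SP, k=3)
        print(seg, score, interpret(score))
    # GAAGAA 2.0 favors exon retention
    # TAGTAG -2.0 favors exon skipping
```

Because the score is additive over k-mers, the same function gives the contribution of any sub-segment, which matches the abstract's point that the cumulative SP value can be computed for any sequence segment.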
46
Goto T, Yamashita A, Hirakawa H, Matsutani M, Todo K, Ohshima K, Toh H, Miyamoto K, Kuhara S, Hattori M, Shimizu T, Akimoto S. Complete genome sequence of Finegoldia magna, an anaerobic opportunistic pathogen. DNA Res 2008; 15:39-47. [PMID: 18263572 PMCID: PMC2650633 DOI: 10.1093/dnares/dsm030] [Citation(s) in RCA: 36] [Impact Index Per Article: 2.3] [Indexed: 11/12/2022]
Abstract
Finegoldia magna (formerly Peptostreptococcus magnus), a member of the Gram-positive anaerobic cocci (GPAC), is a commensal bacterium colonizing human skin and mucous membranes. It is also recognized as an opportunistic pathogen responsible for various infectious diseases. Here, we report the complete genome sequence of F. magna ATCC 29328. The genome consists of a 1 797 577 bp circular chromosome and a 189 163 bp plasmid (pPEP1). The metabolic maps constructed from the genome information confirmed that most F. magna strains cannot ferment most sugars, except fructose, and have various aminopeptidase activities. Three homologs of albumin-binding protein, a known virulence factor useful for antiphagocytosis, are encoded on the chromosome, and one albumin-binding protein homolog is encoded on the plasmid. A unique feature of the genome is that F. magna encodes many sortase genes, whose substrates may be involved in bacterial pathogenesis, such as antiphagocytosis and adherence to the host cell. The plasmid pPEP1 encodes seven sortase and seven substrate genes, whereas the chromosome encodes four sortase and 19 substrate genes. These plasmid-encoded sortases may play important roles in the pathogenesis of F. magna by enriching the variety of cell wall-anchored surface proteins.
Affiliation(s)
- Takatsugu Goto
- Department of Microbiology, Wakayama Medical University, 811-1 Kimiidera, Wakayama, Wakayama 641-0012, Japan.
47
Grimm M, Stephan R, Iversen C, Manzardo GGG, Rattei T, Riedel K, Ruepp A, Frishman D, Lehner A. Cellulose as an extracellular matrix component present in Enterobacter sakazakii biofilms. J Food Prot 2008; 71:13-8. [PMID: 18236657 DOI: 10.4315/0362-028x-71.1.13] [Citation(s) in RCA: 33] [Impact Index Per Article: 2.1] [Indexed: 11/11/2022]
Abstract
Cellulose was identified and characterized as an extracellular matrix component present in the biofilm of an Enterobacter sakazakii clinical isolate grown in nutrient-deficient (M9) medium. Using a bacterial artificial cloning approach in Escherichia coli and subsequent screening of transformants for fluorescence on calcofluor plates, nine genes organized in two operons were identified as putatively responsible for the biosynthesis of cellulose. In addition to the genes already described for cellulose production, two more genes were identified, putatively transcribed together with the genes from the first operon. Putative cellulose in E. sakazakii ES5 biofilm grown on glass coverslips was visualized by calcofluor staining and confocal fluorescence laser scanning microscopy. For the first time, the presence of cellulose in biofilms produced by E. sakazakii was confirmed by methylation analysis.
Affiliation(s)
- Maya Grimm
- Institute for Food Safety and Hygiene, Vetsuisse Faculty, University of Zurich, CH-8057 Zurich, Switzerland
48
Kang S, Yang SJ, Kim S, Bhak J. CONSORF: a consensus prediction system for prokaryotic coding sequences. Bioinformatics 2007; 23:3088-90. [DOI: 10.1093/bioinformatics/btm512] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.4] [Indexed: 11/14/2022]
49
Hu GQ, Zheng X, Yang YF, Ortet P, She ZS, Zhu H. ProTISA: a comprehensive resource for translation initiation site annotation in prokaryotic genomes. Nucleic Acids Res 2007; 36:D114-9. [PMID: 17942412 PMCID: PMC2238952 DOI: 10.1093/nar/gkm799] [Citation(s) in RCA: 19] [Impact Index Per Article: 1.1] [Indexed: 11/30/2022]
Abstract
Correct annotation of the translation initiation site (TIS) is essential for both experimental and bioinformatics studies of the prokaryotic translation initiation mechanism, as well as for understanding gene regulation and gene structure. Here we describe a comprehensive database, ProTISA, which collects TISs confirmed through a variety of available evidence for prokaryotic genomes, including Swiss-Prot experimental records, literature, conserved domain hits and sequence alignment between orthologous genes. Moreover, by combining the predictions from our recently developed TIS post-processor, ProTISA provides a refined annotation for the public database RefSeq. Furthermore, the database annotates the potential regulatory signals associated with translation initiation in the region upstream of the TIS. As of July 2007, ProTISA includes 440 microbial genomes with more than 390 000 confirmed TISs. The database is available at http://mech.ctb.pku.edu.cn/protisa
Affiliation(s)
- Gang-Qing Hu
- State Key Lab for Turbulence and Complex System and Department of Biomedical Engineering, Peking University, Beijing 100871, China
50
Qin T, Hirakawa H, Iida KI, Oshima K, Hattori M, Tashiro K, Kuhara S, Yoshida SI. Complete nucleotide sequence of pLD-TEX-KL, a 66-kb plasmid of Legionella dumoffii TEX-KL strain. Plasmid 2007; 58:261-8. [PMID: 17881053 DOI: 10.1016/j.plasmid.2007.08.001] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.3] [Received: 03/20/2007] [Revised: 07/12/2007] [Accepted: 08/01/2007] [Indexed: 10/22/2022]
Abstract
The complete nucleotide sequence of a large (66 kb) plasmid, pLD-TEX-KL, of the Legionella dumoffii TEX-KL strain was determined. Of the 57 predicted open reading frames (ORFs), 39 (68%) encoded proteins similar to previously known proteins, five (9%) were assigned putative functions, three (5%) encoded conserved hypothetical proteins, and 10 (18%) had no homology to any genes present in the current open databases. The ORFs with similar functions were organized in a modular structure; thus, a transfer region was identified, as well as a putative heavy-metal ion transporter system (hel). The transfer region encoded homologs of the Salmonella enterica serovar Typhi conjugative system components involved in conjugation. In addition, we found a potential protein analogous to the DNA polymerase III epsilon subunit. Plasmids rarely encode a DNA polymerase.
Affiliation(s)
- Tian Qin
- Department of Bacteriology, Faculty of Medical Sciences, Kyushu University, Fukuoka 812-8582, Japan.