1
Greenberg G, Shomorony I. Improving bacterial genome assembly using a test of strand orientation. Bioinformatics 2022; 38:ii34-ii41. [PMID: 36124787 DOI: 10.1093/bioinformatics/btac516] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 12/25/2022] Open
Abstract
SUMMARY The complexity of genome assembly is due in large part to the presence of repeats. In particular, large reverse-complemented repeats can lead to incorrect inversions of large segments of the genome. To detect and correct such inversions in finished bacterial genomes, we propose a statistical test based on tetranucleotide frequency (TNF), which determines whether two segments from the same genome are of the same or opposite orientation. In most cases, the test neatly partitions the genome into two segments of roughly equal length with seemingly opposite orientations. This corresponds to the segments between the DNA replication origin and terminus, which were previously known to have distinct nucleotide compositions. We show that, in several cases where this balanced partition is not observed, the test identifies a potential inverted misassembly, which is validated by the presence of a reverse-complemented repeat at the boundaries of the inversion. After inverting the sequence between the repeat, the balance of the misassembled genome is restored. Our method identifies 31 potential misassemblies in the NCBI database, several of which are further supported by a reassembly of the read data. AVAILABILITY AND IMPLEMENTATION A GitHub repository is available at https://github.com/gcgreenberg/Oriented-TNF.git. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
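The abstract does not spell out the test statistic, so the following is only a minimal sketch of the general idea of comparing tetranucleotide frequencies (TNF) between a segment and its reverse complement. The function names (`tnf`, `same_orientation`) and the use of Euclidean distance are assumptions made for illustration; they are not the authors' method or the API of their repository.

```python
from itertools import product
from math import sqrt

# All 256 tetranucleotides in a fixed order (illustrative choice).
KMERS = ["".join(p) for p in product("ACGT", repeat=4)]
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def tnf(seq):
    """Return the normalized tetranucleotide frequency vector of seq."""
    seq = seq.upper()
    counts = dict.fromkeys(KMERS, 0)
    for i in range(len(seq) - 3):
        kmer = seq[i:i + 4]
        if kmer in counts:  # skip windows containing N or other symbols
            counts[kmer] += 1
    total = sum(counts.values()) or 1
    return [counts[k] / total for k in KMERS]


def distance(u, v):
    """Euclidean distance between two frequency vectors."""
    return sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def same_orientation(reference, segment):
    """Assumed heuristic, not the published test: the segment is called
    'same orientation' if its TNF is at least as close to the reference
    as the TNF of its reverse complement."""
    ref = tnf(reference)
    forward = distance(ref, tnf(segment))
    flipped = distance(ref, tnf(reverse_complement(segment)))
    return forward <= flipped
```

A real test would need a significance threshold and much longer segments than toy strings; the sketch only shows where the strand asymmetry enters the comparison.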
Affiliation(s)
- Grant Greenberg: Department of Electrical and Computer Engineering, University of Illinois at Urbana-Champaign
- Ilan Shomorony: Department of Electrical and Computer Engineering, University of Illinois at Urbana-Champaign
2
Sperlea T, Muth L, Martin R, Weigel C, Waldminghaus T, Heider D. gammaBOriS: Identification and Taxonomic Classification of Origins of Replication in Gammaproteobacteria using Motif-based Machine Learning. Sci Rep 2020; 10:6727. [PMID: 32317695 PMCID: PMC7174414 DOI: 10.1038/s41598-020-63424-7] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Received: 11/18/2019] [Accepted: 03/31/2020] [Indexed: 01/23/2023] Open
Abstract
The biology of bacterial cells is, in general, based on information encoded on circular chromosomes. Regulation of chromosome replication is an essential process that mostly takes place at the origin of replication (oriC), a locus unique per chromosome. Identification of high numbers of oriC is a prerequisite for systematic studies that could lead to insights into oriC functioning as well as the identification of novel drug targets for antibiotic development. Current methods for identifying oriC sequences rely on chromosome-wide nucleotide disparities and are therefore limited to fully sequenced genomes, leaving a large number of genomic fragments unstudied. Here, we present gammaBOriS (Gammaproteobacterial oriC Searcher), which identifies oriC sequences on gammaproteobacterial chromosomal fragments. It does so by employing motif-based machine learning methods. Using gammaBOriS, we created BOriS DB, which currently contains 25,827 gammaproteobacterial oriC sequences from 1,217 species, thus making it the largest available database for oriC sequences to date. Furthermore, we present gammaBOriTax, a machine-learning based approach for taxonomic classification of oriC sequences, which was trained on the sequences in BOriS DB. Finally, we extracted the motifs relevant for identification and classification decisions of the models. Our results suggest that machine learning sequence classification approaches can offer great support in functional motif identification.
Affiliation(s)
- Theodor Sperlea: Faculty of Mathematics and Computer Science, University of Marburg, Hans-Meerwein-Str. 6, D-35032, Marburg, Lahn, Germany
- Lea Muth: Faculty of Mathematics and Computer Science, University of Marburg, Hans-Meerwein-Str. 6, D-35032, Marburg, Lahn, Germany
- Roman Martin: Faculty of Mathematics and Computer Science, University of Marburg, Hans-Meerwein-Str. 6, D-35032, Marburg, Lahn, Germany
- Christoph Weigel: Institute of Biotechnology, Faculty III, Technische Universität Berlin (TUB), Straße des 17. Juni 135, D-10623, Berlin, Germany
- Torsten Waldminghaus: Chromosome Biology Group, LOEWE Center for Synthetic Microbiology (SYNMIKRO), Philipps-Universität Marburg, D-35043, Marburg, Lahn, Germany
- Dominik Heider: Faculty of Mathematics and Computer Science, University of Marburg, Hans-Meerwein-Str. 6, D-35032, Marburg, Lahn, Germany
3
Distinct evolutionary origins of common multi-drug resistance phenotypes in Salmonella typhimurium DT104: a convergent process for adaptation under stress. Mol Genet Genomics 2019; 294:597-605. [PMID: 30710177 DOI: 10.1007/s00438-019-01531-5] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Received: 06/22/2018] [Accepted: 01/11/2019] [Indexed: 10/27/2022]
Abstract
Antimicrobial resistance makes pathogenic bacteria hard to control, but little is known about the general processes of resistance gain or loss. Here, we compared distinct S. typhimurium DT104 strains resistant to zero, two, five, or more of the tested antimicrobials. We found that common resistance phenotypes could be encoded by distinct genes, on SGI-1 or plasmid. We also demonstrated close clonality among all the tested non-resistant and differently resistant DT104 strains, demonstrating dynamic acquisition or loss (by total deletion or gradual decaying of multi-drug resistance gene clusters) of the genetic traits. These findings reflect convergent processes to make the bacteria resistant to multiple antimicrobials by acquiring the needed traits from stochastically available origins. When the antimicrobial stress is absent, the resistance genes may be dropped off quickly, so the bacteria can save the cost for maintaining unneeded genes. Therefore, this work reiterates the importance of strictly controlled use of antimicrobials.
4
diCenzo GC, Finan TM. The Divided Bacterial Genome: Structure, Function, and Evolution. Microbiol Mol Biol Rev 2017; 81:e00019-17. [PMID: 28794225 PMCID: PMC5584315 DOI: 10.1128/mmbr.00019-17] [Citation(s) in RCA: 134] [Impact Index Per Article: 19.1] [Indexed: 12/18/2022] Open
Abstract
Approximately 10% of bacterial genomes are split between two or more large DNA fragments, a genome architecture referred to as a multipartite genome. This multipartite organization is found in many important organisms, including plant symbionts, such as the nitrogen-fixing rhizobia, and plant, animal, and human pathogens, including the genera Brucella, Vibrio, and Burkholderia. The availability of many complete bacterial genome sequences means that we can now examine on a broad scale the characteristics of the different types of DNA molecules in a genome. Recent work has begun to shed light on the unique properties of each class of replicon, the unique functional role of chromosomal and nonchromosomal DNA molecules, and how the exploitation of novel niches may have driven the evolution of the multipartite genome. The aims of this review are to (i) outline the literature regarding bacterial genomes that are divided into multiple fragments, (ii) provide a meta-analysis of completed bacterial genomes from 1,708 species as a way of reviewing the abundant information present in these genome sequences, and (iii) provide an encompassing model to explain the evolution and function of the multipartite genome structure. This review covers, among other topics, salient genome terminology; mechanisms of multipartite genome formation; the phylogenetic distribution of multipartite genomes; how each part of a genome differs with respect to genomic signatures, genetic variability, and gene functional annotation; how each DNA molecule may interact; as well as the costs and benefits of this genome structure.
Affiliation(s)
- George C diCenzo: Department of Biology, McMaster University, Hamilton, Ontario, Canada
- Turlough M Finan: Department of Biology, McMaster University, Hamilton, Ontario, Canada
5
Saini S, Dewan L. Application of discrete wavelet transform for analysis of genomic sequences of Mycobacterium tuberculosis. SpringerPlus 2016; 5:64. [PMID: 26839757 PMCID: PMC4722049 DOI: 10.1186/s40064-016-1668-9] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.1] [Received: 09/05/2015] [Accepted: 01/04/2016] [Indexed: 12/04/2022]
Abstract
This paper highlights the potential of discrete wavelet transforms in the analysis and comparison of genomic sequences of Mycobacterium tuberculosis (MTB) with different resistance characteristics. Graphical representations of wavelet coefficients and statistical estimates of their parameters have been used to determine the extent of similarity between different sequences of MTB without the use of conventional methods such as Basic Local Alignment Search Tool. Based on the calculation of the energy of wavelet decomposition coefficients of complete genomic sequences, their broad classification of the type of resistance can be done. All the given genomic sequences can be grouped into two broad categories wherein the drug resistant and drug susceptible sequences form one group while the multidrug resistant and extensive drug resistant sequences form the other group. This method of segregation of the sequences is faster than conventional laboratory methods which require 3–4 weeks of culture of sputum samples. Thus the proposed method can be used as a tool to enhance clinical diagnostic investigations in near real-time.
Affiliation(s)
- Shiwani Saini: Department of Electrical Engineering, National Institute of Technology, Kurukshetra, Haryana 136119 India
- Lillie Dewan: Department of Electrical Engineering, National Institute of Technology, Kurukshetra, Haryana 136119 India
6
Larson MA, Nalbantoglu U, Sayood K, Zentz EB, Bartling AM, Francesconi SC, Fey PD, Dempsey MP, Hinrichs SH. Francisella tularensis Subtype A.II Genomic Plasticity in Comparison with Subtype A.I. PLoS One 2015; 10:e0124906. [PMID: 25918839 PMCID: PMC4412822 DOI: 10.1371/journal.pone.0124906] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.7] [Received: 01/20/2015] [Accepted: 03/09/2015] [Indexed: 11/26/2022] Open
Abstract
Although Francisella tularensis is considered a monomorphic intracellular pathogen, molecular genotyping and virulence studies have demonstrated important differences within the tularensis subspecies (type A). To evaluate genetic variation within type A strains, sequencing and assembly of a new subtype A.II genome was achieved for comparison to other completed F. tularensis type A genomes. In contrast with the F. tularensis A.I strains (SCHU S4, FSC198, NE061598, and TI0902), substantial genomic variation was observed between the newly sequenced F. tularensis A.II strain (WY-00W4114) and the only other publicly available A.II strain (WY96-3418). Genome differences between WY-00W4114 and WY96-3418 included three major chromosomal translocations, 1580 indels, and 286 nucleotide substitutions, of which 159 were observed in predicted open reading frames and 127 were located in intergenic regions. The majority of WY-00W4114 nucleotide deletions occurred in intergenic regions, whereas most of the insertions and substitutions occurred in predicted genes. Of the nucleotide substitutions, 48 (30%) were synonymous and 111 (70%) were nonsynonymous. WY-00W4114 and WY96-3418 nucleotide polymorphisms were predominantly G/C to A/T allelic mutations, with WY-00W4114 having more A+T enrichment. In addition, the A.II genomes contained a considerably higher number of intact genes and longer repetitive sequences, including transposon remnants, than the A.I genomes. Together these findings support the premise that F. tularensis A.II may have a fitness advantage compared to the A.I subtype due to the higher abundance of functional genes and repeated chromosomal sequences. A better understanding of the selective forces driving F. tularensis genetic diversity and plasticity is needed.
Affiliation(s)
- Marilynn A. Larson: Department of Pathology and Microbiology, University of Nebraska Medical Center, Omaha, Nebraska, United States of America
- Ufuk Nalbantoglu: Department of Electrical Engineering, University of Nebraska-Lincoln, Lincoln, Nebraska, United States of America
- Khalid Sayood: Department of Electrical Engineering, University of Nebraska-Lincoln, Lincoln, Nebraska, United States of America
- Emily B. Zentz: OpGen Inc., Gaithersburg, Maryland, United States of America
- Amanda M. Bartling: Department of Pathology and Microbiology, University of Nebraska Medical Center, Omaha, Nebraska, United States of America
- Paul D. Fey: Department of Pathology and Microbiology, University of Nebraska Medical Center, Omaha, Nebraska, United States of America
- Michael P. Dempsey: United States Air Force School of Aerospace Medicine, Wright-Patterson Air Force Base, Ohio, United States of America
- Steven H. Hinrichs: Department of Pathology and Microbiology, University of Nebraska Medical Center, Omaha, Nebraska, United States of America
7
diCenzo GC, MacLean AM, Milunovic B, Golding GB, Finan TM. Examination of prokaryotic multipartite genome evolution through experimental genome reduction. PLoS Genet 2014; 10:e1004742. [PMID: 25340565 PMCID: PMC4207669 DOI: 10.1371/journal.pgen.1004742] [Citation(s) in RCA: 74] [Impact Index Per Article: 7.4] [Received: 07/15/2014] [Accepted: 09/08/2014] [Indexed: 01/12/2023] Open
Abstract
Many bacteria carry two or more chromosome-like replicons. This occurs in pathogens such as Vibrio cholerae and Brucella abortus as well as in many N2-fixing plant symbionts including all isolates of the alfalfa root-nodule bacteria Sinorhizobium meliloti. Understanding the evolution and role of this multipartite genome organization will provide significant insight into these important organisms; yet this knowledge remains incomplete, in part, because technical challenges of large-scale genome manipulations have limited experimental analyses. The distinct evolutionary histories and characteristics of the three replicons that constitute the S. meliloti genome (the chromosome (3.65 Mb), pSymA megaplasmid (1.35 Mb), and pSymB chromid (1.68 Mb)) make this a good model to examine this topic. We transferred essential genes from pSymB into the chromosome, and constructed strains that lack pSymB as well as both pSymA and pSymB. This is the largest reduction (45.4%, 3.04 megabases, 2866 genes) of a prokaryotic genome to date and the first removal of an essential chromid. Strikingly, strains lacking pSymA and pSymB (ΔpSymAB) lost the ability to utilize 55 of 74 carbon sources and various sources of nitrogen, phosphorus and sulfur, yet the ΔpSymAB strain grew well in minimal salts media and in sterile soil. This suggests that the core chromosome is sufficient for growth in a bulk soil environment and that the pSymA and pSymB replicons carry genes with more specialized functions such as growth in the rhizosphere and interaction with the plant. These experimental data support a generalized evolutionary model, in which non-chromosomal replicons primarily carry genes with more specialized functions. These large secondary replicons increase the organism's niche range, which offsets their metabolic burden on the cell (e.g. pSymA). Subsequent co-evolution with the chromosome then leads to the formation of a chromid through the acquisition of functions core to all niches (e.g. pSymB).
Rhizobia are free-living bacteria of agricultural and environmental importance that form root-nodules on leguminous plants and provide these plants with fixed nitrogen. Many of the rhizobia have a multipartite genome, as do several plant and animal pathogens. All isolates of the alfalfa symbiont, Sinorhizobium meliloti, carry three large replicons, the chromosome (∼3.7 Mb), pSymA megaplasmid (∼1.4 Mb), and pSymB chromid (∼1.7 Mb). To gain insight into the role and evolutionary history of these replicons, we have ‘reversed evolution’ by constructing an S. meliloti strain consisting solely of the chromosome and lacking the pSymB chromid and pSymA megaplasmid. As the resulting strain was viable, we could perform a detailed phenotypic analysis and these data provided significant insight into the biology and metabolism of S. meliloti. The data lend direct experimental evidence in understanding the evolution and role of the multipartite genome. Specifically the large secondary replicons increase the organism's niche range, and this advantage offsets the metabolic burden of these replicons on the cell. Additionally, the single-chromosome strain offers a useful platform to facilitate future forward genetic approaches to understanding and manipulating the symbiosis and plant-microbe interactions.
Affiliation(s)
- George C. diCenzo: Department of Biology, McMaster University, Hamilton, Ontario, Canada
- G. Brian Golding: Department of Biology, McMaster University, Hamilton, Ontario, Canada
- Turlough M. Finan: Department of Biology, McMaster University, Hamilton, Ontario, Canada
8
Saha SK, Goswami A, Dutta C. Association of purine asymmetry, strand-biased gene distribution and PolC within Firmicutes and beyond: a new appraisal. BMC Genomics 2014; 15:430. [PMID: 24899249 PMCID: PMC4070872 DOI: 10.1186/1471-2164-15-430] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.9] [Received: 08/07/2013] [Accepted: 05/08/2014] [Indexed: 11/10/2022] Open
Abstract
Background The Firmicutes often possess three conspicuous genome features: marked Purine Asymmetry (PAS) across two strands of replication, Strand-biased Gene Distribution (SGD) and presence of two isoforms of DNA polymerase III alpha subunit, PolC and DnaE. Despite considerable research efforts, it is not clear whether the co-existence of PAS, PolC and/or SGD is an essential and exclusive characteristic of the Firmicutes. The nature of correlations, if any, between these three features within and beyond the lineages of Firmicutes has also remained elusive. The present study has been designed to address these issues. Results A large-scale analysis of diverse bacterial genomes indicates that PAS, PolC and SGD are neither essential nor exclusive features of the Firmicutes. PolC prevails in four bacterial phyla: Firmicutes, Fusobacteria, Tenericutes and Thermotogae, while PAS occurs only in subsets of Firmicutes, Fusobacteria and Tenericutes. There are five major compositional trends in Firmicutes: (I) an explicit PAS or G + A-dominance along the entire leading strand (II) only G-dominance in the leading strand, (III) alternate stretches of purine-rich and pyrimidine-rich sequences, (IV) G + T dominance along the leading strand, and (V) no identifiable patterns in base usage. Presence of strong SGD has been observed not only in genomes having PAS, but also in genomes with G-dominance along their leading strands – an observation that defies the notion of co-occurrence of PAS and SGD in Firmicutes. The PolC-containing non-Firmicutes organisms often have alternate stretches of R-dominant and Y-dominant sequences along their genomes and most of them show relatively weak, but significant SGD. Firmicutes having G + A-dominance or G-dominance along LeS usually show distinct base usage patterns in three codon sites of genes. Probable molecular mechanisms that might have incurred such usage patterns have been proposed. 
Conclusion Co-occurrence of PAS, strong SGD and PolC should not be regarded as a genome signature of the Firmicutes. Presence of PAS in a species may warrant PolC and strong SGD, but PolC and/or SGD do not necessarily imply PAS.
Affiliation(s)
- Chitra Dutta: Structural Biology & Bioinformatics Division, CSIR-Indian Institute of Chemical Biology, 4, Raja S. C. Mullick Road, Kolkata 700032, India
9
The alternative translational profile that underlies the immune-evasive state of persistence in Chlamydiaceae exploits differential tryptophan contents of the protein repertoire. Microbiol Mol Biol Rev 2012; 76:405-43. [PMID: 22688818 DOI: 10.1128/mmbr.05013-11] [Citation(s) in RCA: 20] [Impact Index Per Article: 1.7] [Indexed: 11/20/2022] Open
Abstract
One form of immune evasion is a developmental state called "persistence" whereby chlamydial pathogens respond to the host-mediated withdrawal of L-tryptophan (Trp). A sophisticated survival mode of reversible quiescence is implemented. A mechanism has evolved which suppresses gene products necessary for rapid pathogen proliferation but allows expression of gene products that underlie the morphological and developmental characteristics of persistence. This switch from one translational profile to an alternative translational profile of newly synthesized proteins is proposed to be accomplished by maximizing the Trp content of some proteins needed for rapid proliferation (e.g., ADP/ATP translocase, hexose-phosphate transporter, phosphoenolpyruvate [PEP] carboxykinase, the Trp transporter, the Pmp protein superfamily for cell adhesion and antigenic variation, and components of the cell division pathway) while minimizing the Trp content of other proteins supporting the state of persistence. The Trp starvation mechanism is best understood in the human-Chlamydia trachomatis relationship, but the similarity of up-Trp and down-Trp proteomic profiles in all of the pathogenic Chlamydiaceae suggests that Trp availability is an underlying cue relied upon by this family of pathogens to trigger developmental transitions. The biochemically expensive pathogen strategy of selectively increased Trp usage to guide the translational profile can be leveraged significantly with minimal overall Trp usage by (i) regional concentration of Trp residue placements, (ii) amplified Trp content of a single protein that is required for expression or maturation of multiple proteins with low Trp content, and (iii) Achilles'-heel vulnerabilities of complex pathways to high Trp content of one or a few enzymes.
10
Arakawa K, Tomita M. Measures of compositional strand bias related to replication machinery and its applications. Curr Genomics 2012; 13:4-15. [PMID: 22942671 PMCID: PMC3269016 DOI: 10.2174/138920212799034749] [Citation(s) in RCA: 16] [Impact Index Per Article: 1.3] [Received: 07/23/2011] [Revised: 09/10/2011] [Accepted: 09/20/2011] [Indexed: 11/22/2022] Open
Abstract
The compositional asymmetry of complementary bases in nucleotide sequences implies the existence of a mutational or selectional bias in the two strands of the DNA duplex, which is commonly shaped by strand-specific mechanisms in transcription or replication. Such strand bias in genomes, frequently visualized by GC skew graphs, is used for the computational prediction of transcription start sites and replication origins, as well as for comparative evolutionary genomics studies. The use of measures of compositional strand bias in order to quantify the degree of strand asymmetry is crucial, as it is the basis for determining the applicability of compositional analysis and comparing the strength of the mutational bias in different biological machineries in various species. Here, we review the measures of strand bias that have been proposed to date, including the ∆GC skew, the B1 index, the predictability score of linear discriminant analysis for gene orientation, the signal-to-noise ratio of the oligonucleotide bias, and the GC skew index. These measures have been predominantly designed for and applied to the analysis of replication-related mutational processes in prokaryotes, but we also give research examples in eukaryotes.
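As a concrete illustration of the GC skew graphs mentioned above, the sketch below computes the commonly used windowed skew, (G − C) / (G + C), and its running sum. The window size and function names are illustrative assumptions, and none of the review's other measures (ΔGC skew, B1 index, GC skew index) is implemented here.

```python
def gc_skew(seq, window=1000):
    """Return (G - C) / (G + C) for consecutive non-overlapping windows."""
    seq = seq.upper()
    skews = []
    for start in range(0, len(seq) - window + 1, window):
        chunk = seq[start:start + window]
        g = chunk.count("G")
        c = chunk.count("C")
        # A window with no G or C has undefined skew; report 0.0 here.
        skews.append((g - c) / (g + c) if g + c else 0.0)
    return skews


def cumulative_skew(skews):
    """Return the running sum of the windowed skew values."""
    total = 0.0
    running = []
    for value in skews:
        total += value
        running.append(total)
    return running
```

In many bacterial chromosomes the extremes of the cumulative curve are used as candidate positions for the replication origin and terminus, which is the kind of prediction the abstract refers to; how reliable that is depends on the strength of the strand bias in the genome at hand.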
Affiliation(s)
- Kazuharu Arakawa: Institute for Advanced Biosciences, Keio University, Fujisawa 252-8520, Japan
11
Xu L, Kuo J, Liu JK, Wong TY. Bacterial phylogenetic tree construction based on genomic translation stop signals. Microbial Informatics and Experimentation 2012; 2:6. [PMID: 22651236 PMCID: PMC3466146 DOI: 10.1186/2042-5783-2-6] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.7] [Received: 01/23/2012] [Accepted: 04/15/2012] [Indexed: 11/10/2022]
Abstract
Background The efficiencies of the stop codons TAA, TAG, and TGA in protein synthesis termination are not the same. These variations could allow many genes to be regulated. There are many similar nucleotide trimers found on the second and third reading-frames of a gene. They are called premature stop codons (PSC). Like stop codons, the PSC in bacterial genomes are also highly biased in terms of their quantities and qualities on the genes. Phylogenetically related species often share a similar PSC profile. We want to know whether the selective forces that influence the stop codons and the PSC usage biases in a genome are related. We also wish to know how strongly these trimers in a genome are related to the natural history of the bacterium. Knowing these relations may provide better knowledge of the phylogeny of bacteria. Results A 16S rRNA-alignment tree of 19 well-studied α-, β- and γ-Proteobacteria Type species is used as standard reference for bacterial phylogeny. The genomes of sixty-one bacteria, belonging to the α-, β- and γ-Proteobacteria subphyla, are used for this study. The stop codons and PSC are collectively termed “Translation Stop Signals” (TSS). A gene is represented by nine scalars corresponding to the numbers of counts of TAA, TAG, and TGA on each of the three reading-frames of that gene. “Translation Stop Signals Ratio” (TSSR) is the ratio between the TSS counts. Four types of TSSR are investigated. The TSSR-1, TSSR-2 and TSSR-3 are each a 3-scalar series corresponding respectively to the average ratio of TAA: TAG: TGA on the first, second, and third reading-frames of all genes in a genome. The Genomic-TSSR is a 9-scalar series representing the ratio of distribution of all TSS on the three reading-frames of all genes in a genome. Results show that bacteria grouped by their similarities based on TSSR-1, TSSR-2, or TSSR-3 values could only partially resolve the phylogeny of the species. However, grouping bacteria based on their Genomic-TSSR values resulted in clusters of bacteria identical to those bacterial clusters of the reference tree. Unlike the 16S rRNA method, the Genomic-TSSR tree is also able to separate closely related species/strains at high resolution. Species and strains separated by the Genomic-TSSR grouping method are often in good agreement with those classified by other taxonomic methods. Correspondence analysis of individual genes shows that most genes in a bacterial genome share a similar TSSR value. However, within a chromosome, the Genic-TSSR values of genes near the replication origin region (Ori) are more similar to each other than those genes near the terminus region (Ter).
Conclusion The translation stop signals on the three reading-frames of the genes on a bacterial genome are interrelated, possibly due to frequent off-frame recombination facilitated by translational-associated recombination (TSR). However, TSR may not occur randomly in a bacterial chromosome. Genes near the Ori region are often highly expressed and a bacterium always maintains multiple copies of Ori. Frequent collisions between DNA polymerase and RNA polymerase would create many DNA strand-breaks on the genes, whereas DNA strand-break induced homologous recombination is more likely to take place between genes with similar sequence. Thus, localized recombination could explain why the TSSR of genes near the Ori region are more similar to each other. The quantity and quality of these TSS in a genome strongly reflect the natural history of a bacterium. We propose that the Genomic-TSSR can be used as a subjective biomarker to represent the phyletic status of a bacterium.
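The nine-scalar gene representation described above can be sketched as follows. This is a minimal illustration that only counts TAA, TAG, and TGA in each of the three reading frames of a single gene sequence; the function names are hypothetical, and the way the paper turns these counts into TSSR ratios is not reproduced because the abstract does not specify it.

```python
STOP_TRIMERS = ("TAA", "TAG", "TGA")


def tss_counts(gene):
    """Count TAA, TAG and TGA in each reading frame (0, 1, 2) of a gene.

    Frame 0 is taken to be the gene's own reading frame, so trimers found
    in frames 1 and 2 correspond to the off-frame 'premature stop codons'.
    """
    gene = gene.upper()
    counts = {frame: dict.fromkeys(STOP_TRIMERS, 0) for frame in range(3)}
    for frame in range(3):
        # Step through complete trimers only, starting at the frame offset.
        for i in range(frame, len(gene) - 2, 3):
            trimer = gene[i:i + 3]
            if trimer in counts[frame]:
                counts[frame][trimer] += 1
    return counts


def nine_scalars(gene):
    """Flatten the counts to [frame0 TAA, TAG, TGA, frame1 ..., frame2 ...]."""
    counts = tss_counts(gene)
    return [counts[frame][t] for frame in range(3) for t in STOP_TRIMERS]
```

For example, `nine_scalars("ATGCTAGCTGATAA")` reports one TAG in the second frame and one TAA and one TGA in the third frame, with no stop trimers in the first frame.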
Affiliation(s)
- Lijing Xu: Department of Biological Sciences, Bioinformatics Program, The University of Memphis, Memphis, TN, USA
- Jimmy Kuo: Department of Planning and Research, National Museum of Marine Biology and Aquarium, Pingtung, Taiwan
- Jong-Kang Liu: Department of Biological Sciences, National Sun Yat-sen University, Kaohsiung, Taiwan
- Tit-Yee Wong: Department of Biological Sciences, Bioinformatics Program, The University of Memphis, Memphis, TN, USA
12
Shah K, Krishnamachari A. Nucleotide correlation based measure for identifying origin of replication in genomic sequences. Biosystems 2012; 107:52-5. [PMID: 21945744 DOI: 10.1016/j.biosystems.2011.09.003] [Citation(s) in RCA: 10] [Impact Index Per Article: 0.8] [Received: 07/13/2011] [Revised: 08/30/2011] [Accepted: 09/10/2011] [Indexed: 12/18/2022]
Abstract
Computational prediction of the origin of replication is a challenging problem and of immense interest to biologists. Several methods have been proposed for identifying the replicon site for various classes of organisms. However, these methods have limited applicability since the replication mechanism is different in different organisms. We propose a correlation measure and show that it is correctly able to predict the origin of replication in most of the bacterial genomes. When applied to Methanocaldococcus jannaschii, Plasmodium falciparum apicoplast and Nicotiana tabacum plastid, this correlation based method is able to correctly predict the origin of replication whereas the generally used GC skew measure fails. Thus, this correlation based measure is a novel and promising tool for predicting the origin of replication in a wide class of organisms. This could have important implications in not only gaining a deeper understanding of the replication machinery in higher organisms, but also for drug discovery.
Collapse
Affiliation(s)
- Kushal Shah
- School of Computational and Integrative Sciences, Jawaharlal Nehru University, New Delhi, India.
13
CAGO: a software tool for dynamic visual comparison and correlation measurement of genome organization. PLoS One 2011; 6:e27080. [PMID: 22114666 PMCID: PMC3219657 DOI: 10.1371/journal.pone.0027080] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Received: 06/16/2011] [Accepted: 10/10/2011] [Indexed: 11/26/2022] Open
Abstract
CAGO (Comparative Analysis of Genome Organization) was developed to address two critical shortcomings of conventional genome atlas plotters: lack of dynamic exploratory functions and absence of signal analysis for genomic properties. With dynamic exploratory functions, users can directly manipulate chromosome tracks of a genome atlas and intuitively identify distinct genomic signals by visual comparison. Signal analysis of genomic properties can further detect inconspicuous patterns from noisy genomic properties and calculate correlations between genomic properties across various genomes. To implement dynamic exploratory functions, CAGO presents each genome atlas in Scalable Vector Graphics (SVG) format and allows users to interact with it using an SVG viewer through JavaScript. Signal analysis functions are implemented using R statistical software and a discrete wavelet transformation package waveslim. CAGO is not only a plotter for generating complex genome atlases, but also a platform for exploring genome atlases with dynamic exploratory functions for visual comparison and with signal analysis for comparing genomic properties across multiple organisms. The web-based application of CAGO, its source code, user guides, video demos, and live examples are publicly available and can be accessed at http://cbs.ym.edu.tw/cago.
14
Guo FB. [Strong strand specific composition bias-a genomic character of some obligate parasites or symbionts]. YI CHUAN = HEREDITAS 2011; 33:1039-1047. [PMID: 21993278 DOI: 10.3724/sp.j.1005.2011.01039] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/31/2023]
Abstract
DNA replication includes a set of asymmetric mechanisms, among which is the division into leading and lagging strands. The former is synthesized continuously whereas the synthesis of the latter is discontinuous. Such an asymmetric mechanism leads to distinct nucleotide compositions of these two strands. Strand-specific nucleotide composition bias was originally found in genomes of echinoderm and vertebrate mitochondria and then in several bacterial genomes. With the rapid growth in the number of sequenced genomes, many bacteria and even eukaryotes have been found to have consistent strand composition bias. In some bacteria, the strand-specific composition bias is so strong that genes on the two replicating strands can be separated according to their codon usage. To date, 11 obligate intracellular bacteria have been found to have separate codon usages according to whether genes are located on the leading or lagging strand. However, there is still no well-accepted theory that explains why separate codon usages occur in some bacterial genomes and not in others. This paper reviews the related work and points out open problems.
Affiliation(s)
- Feng-Biao Guo
- University of Electronic Science and Technology of China, Chengdu, China.
15
Matthews TD, Edwards R, Maloy S. Chromosomal rearrangements formed by rrn recombination do not improve replichore balance in host-specific Salmonella enterica serovars. PLoS One 2010; 5:e13503. [PMID: 20976060 PMCID: PMC2957434 DOI: 10.1371/journal.pone.0013503] [Citation(s) in RCA: 12] [Impact Index Per Article: 0.9] [Received: 07/07/2010] [Accepted: 09/23/2010] [Indexed: 01/16/2023] Open
Abstract
Background Most of the ∼2,600 serovars of Salmonella enterica have a broad host range as well as a conserved gene order. In contrast, some Salmonella serovars are host-specific and frequently exhibit large chromosomal rearrangements from recombination between rrn operons. One hypothesis explaining these rearrangements suggests that replichore imbalance introduced from horizontal transfer of pathogenicity islands and prophages drives chromosomal rearrangements in an attempt to improve balance. Methodology/Principal Findings This hypothesis was directly tested by comparing the naturally-occurring chromosomal arrangement types to the theoretically possible arrangement types, and estimating their replichore balance using a calculator. In addition to previously characterized strains belonging to host-specific serovars, the arrangement types of 22 serovar Gallinarum strains were also determined. Only 48 out of 1,440 possible arrangement types were identified in 212 host-specific strains. While the replichores of most naturally-occurring arrangement types were well-balanced, most theoretical arrangement types had imbalanced replichores. Furthermore, the most common types of rearrangements did not change replichore balance. Conclusions/Significance The results did not support the hypothesis that replichore imbalance causes these rearrangements, and suggest that the rearrangements could be explained by aspects of a host-specific lifestyle.
Affiliation(s)
- T. David Matthews
- Center for Microbial Sciences, Department of Biology, San Diego State University, San Diego, California, United States of America
- Robert Edwards
- Center for Microbial Sciences, Department of Biology, San Diego State University, San Diego, California, United States of America
- Department of Computer Science, San Diego State University, San Diego, California, United States of America
- Mathematics and Computer Science Division, Argonne National Laboratory, Argonne, Illinois, United States of America
- Stanley Maloy
- Center for Microbial Sciences, Department of Biology, San Diego State University, San Diego, California, United States of America
16
Song JZ, Duan KM, Ware T, Surette M. The wavelet-based cluster analysis for temporal gene expression data. EURASIP JOURNAL ON BIOINFORMATICS & SYSTEMS BIOLOGY 2010:39382. [PMID: 17713589 PMCID: PMC3171337 DOI: 10.1155/2007/39382] [Citation(s) in RCA: 10] [Impact Index Per Article: 0.7] [Received: 12/04/2005] [Revised: 10/01/2006] [Accepted: 03/04/2007] [Indexed: 11/17/2022]
Abstract
A variety of high-throughput methods have made it possible to generate detailed temporal expression data for a single gene or large numbers of genes. Common methods for analysis of these large data sets can be problematic. One challenge is the comparison of temporal expression data obtained from different growth conditions where the patterns of expression may be shifted in time. We propose the use of wavelet analysis to transform the data obtained under different growth conditions to permit comparison of expression patterns from experiments that have time shifts or delays. We demonstrate this approach using detailed temporal data for a single bacterial gene obtained under 72 different growth conditions. This general strategy can be applied in the analysis of data sets of thousands of genes under different conditions.
Affiliation(s)
- JZ Song
- Department of Animal and Avian Science, 2413 Animal Science Center, University of Maryland, College Park, MD 20742, USA
- KM Duan
- Department of Microbiology and Infectious Diseases, and Department of Biochemistry and Molecular Biology, Health Sciences Centre, University of Calgary, Calgary, AB T2N 4N1, Canada
- T Ware
- Department of Mathematics, University of Calgary, Calgary, AB T2N 4N1, Canada
- M Surette
- Department of Microbiology and Infectious Diseases, and Department of Biochemistry and Molecular Biology, Health Sciences Centre, University of Calgary, Calgary, AB T2N 4N1, Canada
17
Mitra A, Liu G, Song J. A genome-wide analysis of array-based comparative genomic hybridization (CGH) data to detect intra-species variations and evolutionary relationships. PLoS One 2009; 4:e7978. [PMID: 19956659 PMCID: PMC2777320 DOI: 10.1371/journal.pone.0007978] [Citation(s) in RCA: 10] [Impact Index Per Article: 0.7] [Received: 06/15/2009] [Accepted: 10/13/2009] [Indexed: 11/18/2022] Open
Abstract
Array-based comparative genomic hybridization (aCGH) has gained prevalence as an effective technique for measuring structural variations in the genome. Copy-number variations (CNVs) form a large source of genomic structural variation, but it is not known whether phenotypic differences between intra-species groups, such as divergent human populations, or breeds of a domestic animal, can be attributed to CNVs. Several computational methods have been proposed to improve the detection of CNVs from array CGH data, but few population studies have used CGH data for identification of intra-species differences. In this paper we propose a novel method of genome-wide comparison and classification using CGH data that condenses whole genome information, aimed at quantification of intra-species variations and discovery of shared ancestry. Our strategy included smoothing CGH data using an appropriate denoising algorithm, extracting features via wavelets, quantifying the information via wavelet power spectrum and hierarchical clustering of the resultant profile. To evaluate the classification efficiency of our method, we used simulated data sets. We applied it to aCGH data from human and bovine individuals and showed that it successfully detects existing intra-specific variations with additional evolutionary implications.
Affiliation(s)
- Apratim Mitra
- Department of Animal and Avian Sciences, University of Maryland, College Park, Maryland, United States of America
- George Liu
- Bovine Functional Genomics Lab, Animal and Natural Resources Institute, Agricultural Research Service, United States Department of Agriculture, Beltsville, Maryland, United States of America
- Jiuzhou Song
- Department of Animal and Avian Sciences, University of Maryland, College Park, Maryland, United States of America
18
Rosenstein R, Götz F. Genomic differences between the food-grade Staphylococcus carnosus and pathogenic staphylococcal species. Int J Med Microbiol 2009; 300:104-8. [PMID: 19818681 DOI: 10.1016/j.ijmm.2009.08.014] [Citation(s) in RCA: 16] [Impact Index Per Article: 1.1] [Indexed: 10/20/2022] Open
Abstract
By comparative analyses based on the newly sequenced genome of the meat starter bacterium Staphylococcus carnosus TM300, we observed remarkable differences in the content of mobile genetic elements between non-pathogenic and pathogenic staphylococci. While the latter reveal highly flexible genomes with various mobile elements indicating frequent exchange and rearrangement of genomic material, S. carnosus shows a conspicuous lack of those elements, carrying only remnants of a prophage and a genomic island in its genome. Furthermore, the S. carnosus genome is significantly poor in repetitive sequences. Despite being known as completely avirulent, S. carnosus also reveals various gene products with similarity to proteins annotated as virulence factors in S. aureus. In addition, the genome carries a number of mutationally inactivated genes including those of the global regulatory systems agr and sae. Our data indicate that S. carnosus has adapted to the constant environmental conditions encountered as part of a starter culture population by a reductive evolution leading to gene loss and inactivation.
Affiliation(s)
- Ralf Rosenstein
- Lehrstuhl Mikrobielle Genetik, Universität Tübingen, Waldhäuser Strasse 70/8, D-72076 Tübingen, Germany
19
Maruyama F, Kobata M, Kurokawa K, Nishida K, Sakurai A, Nakano K, Nomura R, Kawabata S, Ooshima T, Nakai K, Hattori M, Hamada S, Nakagawa I. Comparative genomic analyses of Streptococcus mutans provide insights into chromosomal shuffling and species-specific content. BMC Genomics 2009; 10:358. [PMID: 19656368 PMCID: PMC2907686 DOI: 10.1186/1471-2164-10-358] [Citation(s) in RCA: 65] [Impact Index Per Article: 4.3] [Received: 09/24/2008] [Accepted: 08/05/2009] [Indexed: 11/20/2022] Open
Abstract
Background Streptococcus mutans is the major pathogen of dental caries, and it occasionally causes infective endocarditis. While the pathogenicity of this species is distinct from other human pathogenic streptococci, the species-specific evolution of the genus Streptococcus and its genomic diversity are poorly understood. Results We have sequenced the complete genome of S. mutans serotype c strain NN2025, and compared it with the genome of UA159. The NN2025 genome is composed of 2,013,587 bp, and the two strains show a highly conserved core genome. However, comparison of the two S. mutans strains showed a large genomic inversion across the replication axis producing an X-shaped symmetrical DNA dot plot. This phenomenon was also observed between other streptococcal species, indicating that streptococcal genetic rearrangements across the replication axis play an important role in Streptococcus genetic shuffling. We further confirmed the genomic diversity among 95 clinical isolates using long-PCR analysis. Genomic diversity in S. mutans appears to occur frequently between insertion sequence (IS) elements and transposons, and these diversity regions consist of restriction/modification systems, antimicrobial peptide synthesis systems, and transporters. S. mutans may preferentially reject phage infection by clustered regularly interspaced short palindromic repeats (CRISPRs). In particular, the CRISPR-2 region, which is highly divergent between strains, in NN2025 has long repeated spacer sequences corresponding to the streptococcal phage genome. Conclusion These observations suggest that S. mutans strains evolve through chromosomal shuffling and that phage infection is not needed for gene acquisition. In contrast, S. pyogenes tolerates phage infection for acquisition of virulence determinants for niche adaptation.
Affiliation(s)
- Fumito Maruyama
- Division of Bacteriology, Department of Infectious Diseases Control, International Research Center for Infectious Diseases, The Institute of Medical Science, The University of Tokyo, Tokyo 108-8639, Japan.
20
Sernova NV, Gelfand MS. Identification of replication origins in prokaryotic genomes. Brief Bioinform 2008; 9:376-91. [PMID: 18660512 DOI: 10.1093/bib/bbn031] [Citation(s) in RCA: 47] [Impact Index Per Article: 2.9] [Indexed: 11/12/2022] Open
Abstract
The availability of hundreds of complete bacterial genomes has created new challenges and, simultaneously, opportunities for bioinformatics. In the area of statistical analysis of genomic sequences, the studies of nucleotide compositional bias and gene bias between strands and replichores paved the way for the development of tools for prediction of bacterial replication origins. Only a few (about 20) origin regions for eubacteria and archaea have been proven experimentally. One reason for that may be that this is now considered essentially a bioinformatics problem, where predictions are sufficiently reliable not to run labor-intensive experiments, unless specifically needed. Here we describe the main existing approaches to the identification of replication origin (oriC) and termination (terC) loci in prokaryotic chromosomes and characterize a number of computational tools based on various skew types and other types of evidence. We also classify the eubacterial and archaeal chromosomes by predictability of their replication origins using skew plots. Finally, we discuss possible combined approaches to the identification of the oriC sites that may be used to improve the prediction tools, in particular, the analysis of DnaA binding sites using the comparative genomic methods.
Affiliation(s)
- Natalia V Sernova
- Institute for Information Transmission Problems (Kharkevich Institute), Russian Academy of Sciences, Bolshoi Karetny pereulok, 19, Moscow, 127994, Russia
21
The relaxing ori-ter balance of Mycoplasma genomes. ACTA ACUST UNITED AC 2008; 51:182-9. [PMID: 18239897 DOI: 10.1007/s11427-008-0017-9] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Received: 11/21/2006] [Accepted: 12/05/2007] [Indexed: 10/22/2022]
Abstract
Mycoplasma are wall-less bacteria with small genomes, which are thought to have resulted from massive genome reductive processes, during which the ori-ter balance may be disrupted. Because of technical difficulties, ori and ter have been located in only a few Mycoplasma strains. Using the Z curve method, we were able to locate turning points on the Mycoplasma genomes, with the minimum and maximum points co-locating with ori or ter in the reference genomes. Assuming the Z curve correctly located ori and ter, we calculated the distances from ori to ter in both directions on the circular genome and calculated the ori-ter balance status. The Mycoplasma genomes were not balanced, possibly as a result of the close association of Mycoplasma with hosts, where there would be no other microbes for Mycoplasma to compete with for nutrients, so the fastest possible growth related to balanced genomes might not be needed by Mycoplasma, leading to a relaxing ori-ter balance.
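As a rough illustration of the ori-ter distance calculation this abstract describes, the Python sketch below computes the two ori-to-ter arm lengths on a circular genome. The function names and the imbalance score (absolute difference of the two arm lengths divided by genome length) are assumptions made for illustration; the abstract does not state the formula the authors used.

```python
def replichore_lengths(genome_length, ori, ter):
    """Return the two ori-to-ter distances (clockwise, counterclockwise) in bp.

    Coordinates are 0-based positions on a circular genome, so the clockwise
    distance wraps around the end of the sequence when ter < ori.
    """
    clockwise = (ter - ori) % genome_length
    return clockwise, genome_length - clockwise


def imbalance(genome_length, ori, ter):
    """Hypothetical score: 0.0 for equal arms, approaching 1.0 as one arm vanishes."""
    arm_a, arm_b = replichore_lengths(genome_length, ori, ter)
    return abs(arm_a - arm_b) / genome_length
```

For example, `replichore_lengths(1000, 900, 200)` wraps around the sequence end and gives arms of 300 bp and 700 bp.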
22
Morton RA, Morton BR. Separating the effects of mutation and selection in producing DNA skew in bacterial chromosomes. BMC Genomics 2007; 8:369. [PMID: 17935620 PMCID: PMC2099444 DOI: 10.1186/1471-2164-8-369] [Citation(s) in RCA: 20] [Impact Index Per Article: 1.2] [Received: 04/11/2007] [Accepted: 10/12/2007] [Indexed: 01/01/2023] Open
Abstract
Background Many bacterial chromosomes display nucleotide asymmetry, or skew, between the leading and lagging strands of replication. Mutational differences between these strands result in an overall pattern of skew that is centered about the origin of replication. Such a pattern could also arise from selection coupled with a bias for genes coded on the leading strand. The relative contributions of selection and mutation in producing compositional skew are largely unknown. Results We describe a model to quantify the contribution of mutational differences between the leading and lagging strands in producing replication-induced skew. When the origin and terminus of replication are known, the model can be used to estimate the relative accumulation of G over C and of A over T on the leading strand due to replication effects in a chromosome with bidirectional replication arms. The model may also be implemented in a maximum likelihood framework to estimate the locations of origin and terminus. We find that our estimations for the origin and terminus agree very well with the location of genes that are thought to be associated with the replication origin. This indicates that our model provides an accurate, objective method of determining the replication arms and also provides support for the hypothesis that these genes represent an ancestral cluster of origin-associated genes. Conclusion The model has several advantages over other methods of analyzing genome skew. First, it quantifies the role of mutation in generating skew so that its effect on composition, for example codon bias, can be assessed. Second, it provides an objective method for locating origin and terminus, one that is based on chromosome-wide accumulation of leading vs lagging strand nucleotide differences. Finally, the model has the potential to be utilized in a maximum likelihood framework in order to analyze the effect of chromosome rearrangements on nucleotide composition.
Affiliation(s)
- Richard A Morton
- Department of Biology, McMaster University, 1280 Main Street West, Hamilton ON L8S 4K1, Canada.
23
Touchon M, Rocha EPC. From GC skews to wavelets: a gentle guide to the analysis of compositional asymmetries in genomic data. Biochimie 2007; 90:648-59. [PMID: 17988781 DOI: 10.1016/j.biochi.2007.09.015] [Citation(s) in RCA: 32] [Impact Index Per Article: 1.9] [Received: 06/18/2007] [Accepted: 09/21/2007] [Indexed: 12/29/2022]
Abstract
Compositional asymmetries are pervasive in DNA sequences. They are the result of the asymmetric interactions between DNA and cellular mechanisms such as replication and transcription. Here, we review many of the methods that have been proposed over the years to analyse compositional asymmetries in DNA sequences. Among these we list GC skews, oligonucleotide skews and wavelets, which among other uses have been extensively employed to delimit origins and termini of replication in genomes. We also review the use of multivariate methods, such as factorial correspondence analysis, discriminant analysis and analysis of variance, which allow assigning compositional strand asymmetries to the different biological processes shaping sequence composition. Finally, we review methods that have been used to infer substitution matrices and allow understanding the mutational processes underlying strand asymmetry. We focus on replication asymmetries because they have been more thoroughly studied, but the methods may be adapted, and often are, to other problems. Although strand asymmetry has been studied more frequently through compositional skews of nucleotides or oligonucleotides, we recall that, depending on the goal of the analysis, other methods may be more appropriate to answer certain biological questions. We also refer to programs freely available to analyse strand asymmetry.
Affiliation(s)
- Marie Touchon
- Atelier de Bioinformatique, Université Pierre et Marie Curie-Paris 6, Paris, France
24
Chen C, Chen CW. Quantitative analysis of mutation and selection pressures on base composition skews in bacterial chromosomes. BMC Genomics 2007; 8:286. [PMID: 17711583 PMCID: PMC2031905 DOI: 10.1186/1471-2164-8-286] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.4] [Received: 04/24/2007] [Accepted: 08/21/2007] [Indexed: 11/24/2022] Open
Abstract
Background Most bacterial chromosomes exhibit asymmetry of base composition with respect to leading vs. lagging strands (GC and AT skews). These skews reflect mainly those in protein coding sequences, which are driven by asymmetric mutation pressures during replication and transcription (notably asymmetric cytosine deamination) plus subsequent selection for preferred structures, signals, amino acids or codons. The transcription-associated effects but not the replication-associated effects contribute to the overall skews through the uneven distribution of the coding sequences on the leading and lagging strands. Results Analysis of 185 representative bacterial chromosomes showed diverse and characteristic patterns of skews among different clades. The base composition skews in the coding sequences were used to derive quantitatively the effect of replication-driven mutation plus subsequent selection ('replication-associated pressure', RAP), and the effect of transcription-driven mutation plus subsequent selection at translation level ('transcription-associated pressure', TAP). While different clades exhibit distinct patterns of RAP and TAP, RAP is absent or nearly absent in some bacteria, but TAP is present in all. The selection pressure at the translation level is evident in all bacteria based on the analysis of the skews at the three codon positions. The contribution of asymmetric cytosine deamination was found to be weak to TAP in most phyla, and strong to RAP in all the Proteobacteria but weak in most of the Firmicutes. This possibly reflects the differences in their chromosomal replication machineries. Strong negative correlations between TAP and G+C content and between TAP and chromosomal size were also revealed. Conclusion The study reveals the diverse mutation and selection forces associated with replication and transcription in various groups of bacteria that shape the distinct patterns of base composition skews in the chromosomes during evolution. Some closely related species with distinct base composition parameters are uncovered in this study, which also provides opportunities for comparative bioinformatic and genetic investigations to uncover the underlying principles for mutation and selection.
Affiliation(s)
- Chi Chen
- Institute of Biomedical Informatics, National Yang-Ming University, Shih-Pai, Taipei 111, Taiwan
- Carton W Chen
- Institute of Biomedical Informatics, National Yang-Ming University, Shih-Pai, Taipei 111, Taiwan
- Department of Life Sciences and Institute of Genome Sciences, National Yang-Ming University, Shih-Pai, Taipei 111, Taiwan
25
Arakawa K, Saito R, Tomita M. Noise-reduction filtering for accurate detection of replication termini in bacterial genomes. FEBS Lett 2006; 581:253-8. [PMID: 17188685 DOI: 10.1016/j.febslet.2006.12.021] [Citation(s) in RCA: 13] [Impact Index Per Article: 0.7] [Received: 09/11/2006] [Revised: 12/05/2006] [Accepted: 12/08/2006] [Indexed: 11/27/2022]
Abstract
Bacterial chromosomes are highly polarized in their nucleotide composition through mutational selection related to replication. Using compositional skews such as the GC skew, replication origin and terminus can be predicted in silico by observing the shift points. However, the genome sequence is affected by myriad functional requirements and selection on numerous subgenomic features, and elimination of this "noise" should lead to better predictions. Here, we present a noise-reduction approach that uses low-pass filtering through Fast Fourier transform coupled with cumulative skew graphs. It increases the prediction accuracy of the replication termini compared with previously documented methods based on genomic base composition.
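The general idea in this abstract (low-pass filtering a GC skew signal, then reading shift points off a cumulative skew curve) can be sketched in Python as follows. This is not the authors' implementation: the window size, the number of retained frequencies, and all function names are assumptions, and a naive O(n^2) discrete Fourier transform stands in for a Fast Fourier transform so that the sketch needs only the standard library. It also assumes the common convention that the cumulative GC skew minimum marks the origin and the maximum marks the terminus.

```python
import cmath


def gc_skew_windows(seq, window):
    """GC skew (G - C) / (G + C) for consecutive non-overlapping windows."""
    skews = []
    for start in range(0, len(seq) - window + 1, window):
        chunk = seq[start:start + window].upper()
        g, c = chunk.count("G"), chunk.count("C")
        skews.append((g - c) / (g + c) if g + c else 0.0)
    return skews


def low_pass(values, keep):
    """Zero every frequency above `keep` (naive DFT, standing in for an FFT)."""
    n = len(values)
    spectrum = [
        sum(v * cmath.exp(-2j * cmath.pi * k * t / n) for t, v in enumerate(values))
        for k in range(n)
    ]
    # Index k and index n - k are the same frequency with opposite sign.
    kept = [x if min(k, n - k) <= keep else 0 for k, x in enumerate(spectrum)]
    return [
        sum(x * cmath.exp(2j * cmath.pi * k * t / n) for k, x in enumerate(kept)).real / n
        for t in range(n)
    ]


def shift_points(seq, window=100, keep=2):
    """Return (origin, terminus) estimates in bp from the filtered cumulative skew."""
    smoothed = low_pass(gc_skew_windows(seq, window), keep)
    curve, total = [], 0.0
    for value in smoothed:
        total += value
        curve.append(total)
    i_min = min(range(len(curve)), key=curve.__getitem__)
    i_max = max(range(len(curve)), key=curve.__getitem__)
    # Each cumulative value describes the sequence up to the end of its window.
    return (i_min + 1) * window, (i_max + 1) * window
```

On a toy sequence whose first half is C-rich and whose second half is G-rich, the cumulative curve falls and then rises, so the estimated origin lands at the midpoint. Real genomes would need much larger windows and a proper FFT.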
Affiliation(s)
- Kazuharu Arakawa
- Institute for Advanced Biosciences, Keio University, Fujisawa 252-8520, Japan
26
Bai X, Zhang J, Ewing A, Miller SA, Jancso Radek A, Shevchenko DV, Tsukerman K, Walunas T, Lapidus A, Campbell JW, Hogenhout SA. Living with genome instability: the adaptation of phytoplasmas to diverse environments of their insect and plant hosts. J Bacteriol 2006; 188:3682-96. [PMID: 16672622 PMCID: PMC1482866 DOI: 10.1128/jb.188.10.3682-3696.2006] [Citation(s) in RCA: 205] [Impact Index Per Article: 11.4] [Indexed: 11/20/2022] Open
Abstract
Phytoplasmas ("Candidatus Phytoplasma," class Mollicutes) cause disease in hundreds of economically important plants and are obligately transmitted by sap-feeding insects of the order Hemiptera, mainly leafhoppers and psyllids. The 706,569-bp chromosome and four plasmids of aster yellows phytoplasma strain witches' broom (AY-WB) were sequenced and compared to the onion yellows phytoplasma strain M (OY-M) genome. The phytoplasmas have small repeat-rich genomes. This comparative analysis revealed that the repeated DNAs are organized into large clusters of potential mobile units (PMUs), which contain tra5 insertion sequences (ISs) and genes for specialized sigma factors and membrane proteins. So far, these PMUs appear to be unique to phytoplasmas. Compared to mycoplasmas, phytoplasmas lack several recombination and DNA modification functions, and therefore, phytoplasmas may use different mechanisms of recombination, likely involving PMUs, for the creation of variability, allowing phytoplasmas to adjust to the diverse environments of plants and insects. The irregular GC skews and the presence of ISs and large repeated sequences in the AY-WB and OY-M genomes are indicative of high genomic plasticity. Nevertheless, segments of approximately 250 kb located between the lplA and glnQ genes are syntenic between the two phytoplasmas and contain the majority of the metabolic genes and no ISs. AY-WB appears to be further along in the reductive evolution process than OY-M. The AY-WB genome is approximately 154 kb smaller than the OY-M genome, primarily as a result of fewer multicopy sequences, including PMUs. Furthermore, AY-WB lacks genes that are truncated and are part of incomplete pathways in OY-M.
Affiliation(s)
- Xiaodong Bai
- Department of Entomology, The Ohio State University, Ohio Agricultural Research and Development Center, Wooster, 44691, USA
27
Liu GR, Liu WQ, Johnston RN, Sanderson KE, Li SX, Liu SL. Genome plasticity and ori-ter rebalancing in Salmonella typhi. Mol Biol Evol 2005; 23:365-71. [PMID: 16237205 DOI: 10.1093/molbev/msj042] [Citation(s) in RCA: 39] [Impact Index Per Article: 2.1] [Indexed: 11/14/2022] Open
Abstract
Genome plasticity resulting from frequent rearrangement of the bacterial genome is a fascinating but poorly understood phenomenon. First reported in Salmonella typhi, it has been observed only in a small number of Salmonella serovars, although the more than 2,500 known Salmonella serovars are all very closely related. To gain insights into this phenomenon and elucidate its roles in bacterial evolution, especially those involved in the formation of particular pathogens, we systematically analyzed the genomes of 127 wild-type S. typhi strains isolated from many parts of the world and compared them with the two sequenced strains, Ty2 and CT18, attempting to find possible associations between genome rearrangement and other significant genomic features. Like other host-adapted Salmonella serovars, S. typhi contained large genome insertions, including the 134 kb Salmonella pathogenicity island, SPI7. Our analyses showed that SPI7 disrupted the physical balance of the bacterial genome between the replication origin (ori) and terminus (ter) when this DNA segment was inserted into the genome, and rearrangement in individual strains further changed the genome balance status, with a general tendency toward a better balanced genome structure. In a given S. typhi strain, genome diversification occurred and resulted in different structures among cells in the culture. Under a stressed condition, bacterial cells with better balanced genome structures were selected to greatly increase in proportion; in such cases, bacteria with better balanced genomes formed larger colonies and grew with shorter generation times. Our results support the hypothesis that genome plasticity as a result of frequent rearrangement provides the opportunity for the bacterial genome to adopt a better balanced structure and thus eventually stabilizes the genome during evolution.
Affiliation(s)
- Gui-Rong Liu
- Department of Microbiology, Peking University Health Science Center, Beijing, China
|
28
|
Wu KY, Liu GR, Liu WQ, Wang AQ, Zhan S, Sanderson KE, Johnston RN, Liu SL. The genome of Salmonella enterica serovar gallinarum: distinct insertions/deletions and rare rearrangements. J Bacteriol 2005; 187:4720-7. [PMID: 15995186 PMCID: PMC1169526 DOI: 10.1128/jb.187.14.4720-4727.2005] [Citation(s) in RCA: 22] [Impact Index Per Article: 1.2] [Indexed: 12/28/2022] Open
Abstract
Salmonella enterica serovar Gallinarum is a fowl-adapted pathogen, causing typhoid fever in chickens. It has the same antigenic formula (1,9,12:--:--) as S. enterica serovar Pullorum, which is also adapted to fowl but causes pullorum disease (diarrhea). The close relatedness but distinct pathogeneses make this pair of fowl pathogens good models for studies of bacterial genomic evolution and the way these organisms acquired pathogenicity. To locate and characterize the genomic differences between serovar Gallinarum and other salmonellae, we constructed a physical map of serovar Gallinarum strain SARB21 by using I-CeuI, XbaI, and AvrII with pulsed-field gel electrophoresis techniques. In the 4,740-kb genome, we located two insertions and six deletions relative to the genome of S. enterica serovar Typhimurium LT2, which we used as a reference Salmonella genome. Four of the genomic regions with reduced lengths corresponded to the four prophages in the genome of serovar Typhimurium LT2, and the others contained several smaller deletions relative to serovar Typhimurium LT2, including regions containing srfJ, std, and stj and gene clusters encoding a type I restriction system in serovar Typhimurium LT2. The map also revealed some rare rearrangements, including two inversions and several translocations. Further characterization of these insertions, deletions, and rearrangements will provide new insights into the molecular basis for the specific host-pathogen interactions and mechanisms of genomic evolution to create a new pathogen.
Affiliation(s)
- Kai-Yu Wu
- Department of Microbiology and Infectious Diseases, University of Calgary, Alberta, Canada
|
29
|
Lewenza S, Falsafi RK, Winsor G, Gooderham WJ, McPhee JB, Brinkman FSL, Hancock REW. Construction of a mini-Tn5-luxCDABE mutant library in Pseudomonas aeruginosa PAO1: a tool for identifying differentially regulated genes. Genome Res 2005; 15:583-9. [PMID: 15805499 PMCID: PMC1074373 DOI: 10.1101/gr.3513905] [Citation(s) in RCA: 141] [Impact Index Per Article: 7.4] [Indexed: 11/25/2022]
Abstract
Pseudomonas aeruginosa is a major cause of nosocomial (hospital-derived) infections, is the predominant pathogen in chronic cystic fibrosis lung infections, and remains difficult to treat due to its high intrinsic antibiotic resistance. The completion of the P. aeruginosa PAO1 genome sequence provides the opportunity for genome-wide studies to increase our understanding of the pathogenesis and biology of this important pathogen. In this report, we describe the construction of a mini-Tn5-luxCDABE mutant library and a high-throughput inverse PCR method to amplify DNA flanking the site of insertion for sequencing and insertion site mapping. In addition to producing polar knockout mutations in nonessential genes, the promoterless luxCDABE reporter present in the transposon serves as a real-time reporter of gene expression for the inactivated gene. A total of 2519 transposon insertion sites were mapped, 77% of which were nonredundant insertions. Of the insertions within an ORF, ~55% of total and unique insertion sites were transcriptional luxCDABE fusions. A bias toward low insertion-site density in the genome region that surrounds the predicted terminus of replication was observed. To demonstrate the utility of chromosomal lux fusions, we performed extensive regulatory screens to identify genes that were differentially regulated under magnesium or phosphate limitation. This approach led to the discovery of many known and novel genes necessary for these environmental adaptations, including genes involved in resistance to cationic antimicrobial peptides. This dual-purpose mutant library allows for functional and regulation studies and will serve as a resource for the research community to further our understanding of P. aeruginosa biology.
Affiliation(s)
- Shawn Lewenza
- Department of Microbiology and Immunology, University of British Columbia, Vancouver, British Columbia, Canada V6T 1Z4
|
30
|
Inter-genomic displacement via lateral gene transfer of bacterial trp operons in an overall context of vertical genealogy. BMC Biol 2004; 2:15. [PMID: 15214963 PMCID: PMC471576 DOI: 10.1186/1741-7007-2-15] [Citation(s) in RCA: 18] [Impact Index Per Article: 0.9] [Received: 01/27/2004] [Accepted: 06/23/2004] [Indexed: 11/19/2022] Open
Abstract
Background The growing conviction that lateral gene transfer plays a significant role in prokaryote genealogy opens up a need for comprehensive evaluations of gene-enzyme systems on a case-by-case basis. Genes of tryptophan biosynthesis are frequently organized as whole-pathway operons, an attribute that is expected to facilitate multi-gene transfer in a single step. We have asked whether events of lateral gene transfer are sufficient to have obscured our ability to track the vertical genealogy that underpins tryptophan biosynthesis. Results In 47 complete-genome Bacteria, the genes encoding the seven catalytic domains that participate in primary tryptophan biosynthesis were distinguished from any paralogs or xenologs engaged in other specialized functions. A reliable list of orthologs with carefully ascertained functional roles has thus been assembled and should be valuable as an annotation resource. The protein domains associated with primary tryptophan biosynthesis were then concatenated, yielding single amino-acid sequence strings that represent the entire tryptophan pathway. Lateral gene transfer of several whole-pathway trp operons was demonstrated by use of phylogenetic analysis. Lateral gene transfer of partial-pathway trp operons was also shown, with newly recruited genes functioning either in primary biosynthesis (rarely) or specialized metabolism (more frequently). Conclusions (i) Concatenated tryptophan protein trees are congruent with 16S rRNA subtrees provided that the genomes represented are of sufficiently close phylogenetic spacing. There are currently seven tryptophan congruency groups in the Bacteria. Recognition of a succession of others can be expected in the near future, but ultimately these should coalesce to a single grouping that parallels the 16S rRNA tree (except for cases of lateral gene transfer). (ii) The vertical trace of evolution for tryptophan biosynthesis can be deduced. 
The daunting complexities engendered by paralogy, xenology, and idiosyncrasies of nomenclature at this point in time have necessitated an expert-assisted manual effort to achieve a correct analysis. Once recognized and sorted out, paralogy and xenology can be viewed as features that enrich evolutionary histories.
|