1
Arakawa K, Tomita M. The GC Skew Index: A Measure of Genomic Compositional Asymmetry and the Degree of Replicational Selection. Evol Bioinform Online 2017. [DOI: 10.1177/117693430700300006] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.8] [Indexed: 11/17/2022] Open
Abstract
Circular bacterial chromosomes have highly polarized nucleotide composition in the two replichores, and this genomic strand asymmetry can be visualized using GC skew graphs. Here we propose and discuss the GC skew index (GCSI) for the quantification of genomic compositional skew, which combines a normalized measure of fast Fourier transform to capture the shape of the skew graph and Euclidean distance between the two vertices in a cumulative skew graph to represent the degree of skew. We calculated GCSI for all available bacterial genomes, and GCSI correlated well with the visibility of GC skew. This novel index is useful for estimating confidence levels for the prediction of replication origin and terminus by methods based on GC skew and for measuring the strength of replicational selection in a genome.
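As a rough illustration of the two ingredients named in this abstract, the Python sketch below takes a series of windowed skew values and computes (a) the share of spectral power at the lowest nonzero frequency, as a simple proxy for the shape of the skew graph, and (b) the Euclidean distance between the maximum and minimum vertices of the cumulative skew graph. The function names, the naive discrete Fourier transform used in place of an FFT, and the axis units (window index and cumulative skew) are assumptions made for illustration. The normalization used in the published GCSI is not given in the abstract and is not reproduced here.

```python
import cmath
import math
from itertools import accumulate


def first_harmonic_share(skew):
    """Fraction of non-DC spectral power found at the lowest nonzero frequency.

    Uses a naive O(n^2) discrete Fourier transform; an FFT would give the
    same coefficients faster. Hypothetical helper, not the published measure.
    """
    n = len(skew)
    powers = []
    for k in range(1, n // 2 + 1):
        coeff = sum(v * cmath.exp(-2j * math.pi * k * t / n)
                    for t, v in enumerate(skew))
        powers.append(abs(coeff) ** 2)
    total = sum(powers)
    return powers[0] / total if total else 0.0


def vertex_distance(skew):
    """Euclidean distance between the max and min points of the cumulative skew.

    Assumes a non-empty series. The x axis is the window index and the y axis
    is the cumulative skew, both unscaled (an assumption for this sketch).
    """
    cumulative = list(accumulate(skew))
    i_max = max(range(len(cumulative)), key=cumulative.__getitem__)
    i_min = min(range(len(cumulative)), key=cumulative.__getitem__)
    return math.hypot(i_max - i_min, cumulative[i_max] - cumulative[i_min])


if __name__ == "__main__":
    # Synthetic series: one polarized (two "replichores"), one with no polarity.
    polarized = [0.1] * 8 + [-0.1] * 8
    alternating = [0.1, -0.1] * 8
    print(first_harmonic_share(polarized), first_harmonic_share(alternating))
    print(vertex_distance(polarized))
```

A series with one sign change per half, as expected for a chromosome with a single origin and terminus, concentrates its power at the lowest frequency, whereas a series without polarity does not.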
Affiliation(s)
- Kazuharu Arakawa: Institute for Advanced Biosciences, Keio University, Fujisawa, Kanagawa 252-8520, Japan
- Masaru Tomita: Institute for Advanced Biosciences, Keio University, Fujisawa, Kanagawa 252-8520, Japan
2
Arakawa K, Tomita M. Selection Effects on the Positioning of Genes and Gene Structures from the Interplay of Replication and Transcription in Bacterial Genomes. Evol Bioinform Online 2017. [DOI: 10.1177/117693430700300005] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.4] [Indexed: 11/16/2022] Open
Abstract
Bacterial chromosomes are partly shaped by the functional requirements for efficient replication, which lead to strand bias as commonly characterized by the excess of guanines over cytosines in the leading strand. Gene structures are also highly organized within bacterial genomes as a result of such functional constraints, displaying characteristic positioning and structuring along the genome. Here we analyze the gene structures in completely sequenced bacterial chromosomes to observe the positional constraints on gene orientation, length, and codon usage with regard to the positions of replication origin and terminus. Selection on these gene features is different in regions surrounding the terminus of replication from the rest of the genome, but the selection could be either positive or negative depending on the species, and these positional effects are partly attributed to the A-T enrichment near the terminus. Characteristic gene structuring relative to the position of replication origin and terminus is commonly observed among most bacterial species with circular chromosomes, and therefore we argue that the highly organized gene positioning as well as the strand bias should be considered for genomics studies of bacteria.
Affiliation(s)
- Kazuharu Arakawa: Institute for Advanced Biosciences, Keio University, Fujisawa, Kanagawa 252-8520, Japan
- Masaru Tomita: Institute for Advanced Biosciences, Keio University, Fujisawa, Kanagawa 252-8520, Japan
3
Dimude JU, Midgley-Smith SL, Stein M, Rudolph CJ. Replication Termination: Containing Fork Fusion-Mediated Pathologies in Escherichia coli. Genes (Basel) 2016; 7:genes7080040. [PMID: 27463728 PMCID: PMC4999828 DOI: 10.3390/genes7080040] [Citation(s) in RCA: 28] [Impact Index Per Article: 3.1] [Received: 05/27/2016] [Revised: 07/12/2016] [Accepted: 07/19/2016] [Indexed: 01/18/2023] Open
Abstract
Duplication of bacterial chromosomes is initiated via the assembly of two replication forks at a single defined origin. Forks proceed bi-directionally until they fuse in a specialised termination area opposite the origin. This area is flanked by polar replication fork pause sites that allow forks to enter but not to leave. The precise function of this replication fork trap has remained enigmatic, as no obvious phenotypes have been associated with its inactivation. However, the fork trap becomes a serious problem to cells if the second fork is stalled at an impediment, as replication cannot be completed, suggesting that a significant evolutionary advantage for maintaining this chromosomal arrangement must exist. Recently, we demonstrated that head-on fusion of replication forks can trigger over-replication of the chromosome. This over-replication is normally prevented by a number of proteins including RecG helicase and 3’ exonucleases. However, even in the absence of these proteins it can be safely contained within the replication fork trap, highlighting that multiple systems might be involved in coordinating replication fork fusions. Here, we discuss whether considering the problems associated with head-on replication fork fusion events helps us to better understand the important role of the replication fork trap in cellular metabolism.
Affiliation(s)
- Juachi U Dimude: Division of Biosciences, College of Health and Life Sciences, Brunel University London, Uxbridge UB8 3PH, UK.
- Sarah L Midgley-Smith: Division of Biosciences, College of Health and Life Sciences, Brunel University London, Uxbridge UB8 3PH, UK.
- Monja Stein: Division of Biosciences, College of Health and Life Sciences, Brunel University London, Uxbridge UB8 3PH, UK.
- Christian J Rudolph: Division of Biosciences, College of Health and Life Sciences, Brunel University London, Uxbridge UB8 3PH, UK.
4
GC skew and mitochondrial origins of replication. Mitochondrion 2014; 17:56-66. [DOI: 10.1016/j.mito.2014.05.009] [Citation(s) in RCA: 26] [Impact Index Per Article: 2.4] [Received: 09/23/2013] [Revised: 05/09/2014] [Accepted: 05/28/2014] [Indexed: 11/18/2022]
5
GEMBASSY: an EMBOSS associated software package for comprehensive genome analyses. Source Code Biol Med 2013; 8:17. [PMID: 23987304 PMCID: PMC3847652 DOI: 10.1186/1751-0473-8-17] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.0] [Received: 03/31/2013] [Accepted: 08/28/2013] [Indexed: 11/10/2022]
Abstract
The popular European Molecular Biology Open Software Suite (EMBOSS) currently contains over 400 tools used in a wide range of bioinformatics research, equipped with sophisticated development frameworks for interoperability and tool discoverability as well as rich documentation and various user interfaces. In order to further strengthen EMBOSS in the field of genomics, we here present a novel EMBOSS associated software (EMBASSY) package named GEMBASSY, which adds more than 50 analysis tools from the G-language Genome Analysis Environment and its Representational State Transfer (REST) and SOAP web services. GEMBASSY consists mainly of wrapper programs for the G-language REST/SOAP web services that provide intuitive and easy access to various annotations within complete genome flatfiles, as well as tools for analyzing nucleotide composition, calculating codon usage, and visualizing genomic information. For example, analysis methods such as those for calculating the distance between sequences by genomic signatures and for predicting gene expression levels from codon usage bias are effective in the interpretation of meta-genomic and meta-transcriptomic data. GEMBASSY tools can be used seamlessly with other EMBOSS tools and UNIX command line tools. The source code written in C is available from GitHub (https://github.com/celery-kotone/GEMBASSY/) and the distribution package is freely available from the GEMBASSY web site (http://www.g-language.org/gembassy/).
6
Arakawa K, Tomita M. Measures of compositional strand bias related to replication machinery and its applications. Curr Genomics 2012; 13:4-15. [PMID: 22942671 PMCID: PMC3269016 DOI: 10.2174/138920212799034749] [Citation(s) in RCA: 16] [Impact Index Per Article: 1.2] [Received: 07/23/2011] [Revised: 09/10/2011] [Accepted: 09/20/2011] [Indexed: 11/22/2022] Open
Abstract
The compositional asymmetry of complementary bases in nucleotide sequences implies the existence of a mutational or selectional bias in the two strands of the DNA duplex, which is commonly shaped by strand-specific mechanisms in transcription or replication. Such strand bias in genomes, frequently visualized by GC skew graphs, is used for the computational prediction of transcription start sites and replication origins, as well as for comparative evolutionary genomics studies. The use of measures of compositional strand bias in order to quantify the degree of strand asymmetry is crucial, as it is the basis for determining the applicability of compositional analysis and comparing the strength of the mutational bias in different biological machineries in various species. Here, we review the measures of strand bias that have been proposed to date, including the ∆GC skew, the B1 index, the predictability score of linear discriminant analysis for gene orientation, the signal-to-noise ratio of the oligonucleotide bias, and the GC skew index. These measures have been predominantly designed for and applied to the analysis of replication-related mutational processes in prokaryotes, but we also give research examples in eukaryotes.
Affiliation(s)
- Kazuharu Arakawa: Institute for Advanced Biosciences, Keio University, Fujisawa 252-8520, Japan
7
Kono N, Arakawa K, Tomita M. Validation of bacterial replication termination models using simulation of genomic mutations. PLoS One 2012; 7:e34526. [PMID: 22509315 PMCID: PMC3317982 DOI: 10.1371/journal.pone.0034526] [Citation(s) in RCA: 12] [Impact Index Per Article: 0.9] [Received: 11/16/2011] [Accepted: 03/05/2012] [Indexed: 11/21/2022] Open
Abstract
In bacterial circular chromosomes and most plasmids, the replication is known to be terminated when either of the following occurs: the forks progressing in opposite directions meet at the distal end of the chromosome or the replication forks become trapped by Tus proteins bound to Ter sites. Most bacterial genomes have various polarities in their genomic structures. The most notable feature is polar genomic compositional asymmetry of the bases G and C in the leading and lagging strands, called GC skew. This asymmetry is caused by replication-associated mutation bias, and this "footprint" of the replication machinery suggests that, in contrast to the two known mechanisms, replication termination occurs near the chromosome dimer resolution site dif. To understand this difference between the known replication machinery and genomic compositional bias, we undertook a simulation study of genomic mutations, and we report here how different replication termination models contribute to the generation of replication-related genomic compositional asymmetry. Contrary to naive expectations, our results show that a single finite termination site at dif or at the GC skew shift point is not sufficient to reconstruct the genomic compositional bias as observed in published sequences. The results also show that the known replication mechanisms are sufficient to explain the position of the GC skew shift point.
Affiliation(s)
- Nobuaki Kono: Institute for Advanced Biosciences, Keio University, Fujisawa, Kanagawa, Japan
- Kazuharu Arakawa: Institute for Advanced Biosciences, Keio University, Fujisawa, Kanagawa, Japan
- Masaru Tomita: Institute for Advanced Biosciences, Keio University, Fujisawa, Kanagawa, Japan
8
Comprehensive prediction of chromosome dimer resolution sites in bacterial genomes. BMC Genomics 2011; 12:19. [PMID: 21223577 PMCID: PMC3025954 DOI: 10.1186/1471-2164-12-19] [Citation(s) in RCA: 51] [Impact Index Per Article: 3.6] [Received: 09/24/2010] [Accepted: 01/11/2011] [Indexed: 11/30/2022] Open
Abstract
Background: During the replication process of bacteria with circular chromosomes, an odd number of homologous recombination events results in concatenated dimer chromosomes that cannot be partitioned into daughter cells. However, many bacteria harbor a conserved dimer resolution machinery consisting of one or two tyrosine recombinases, XerC and XerD, and their 28-bp target site, dif. Results: To study the evolution of the dif/XerCD system and its relationship with replication termination, we report the comprehensive prediction of dif sequences in silico using a phylogenetic prediction approach based on iterated hidden Markov modeling. Using this method, dif sites were identified in 641 organisms among 16 phyla, with a 97.64% identification rate for single-chromosome strains. The dif sequence positions were shown to be strongly correlated with the GC skew shift-point that is induced by replicational mutation/selection pressures, but the difference in the positions of the predicted dif sites and the GC skew shift-points did not correlate with the degree of replicational mutation/selection pressures. Conclusions: The sequence of dif sites is widely conserved among many bacterial phyla, and they can be computationally identified using our method. The lack of correlation between dif position and the degree of GC skew suggests that replication termination does not occur strictly at dif sites.
9
Arakawa K, Kido N, Oshita K, Tomita M. G-language genome analysis environment with REST and SOAP web service interfaces. Nucleic Acids Res 2010; 38:W700-5. [PMID: 20439313 PMCID: PMC2896103 DOI: 10.1093/nar/gkq315] [Citation(s) in RCA: 12] [Impact Index Per Article: 0.8] [Indexed: 11/12/2022] Open
Abstract
G-language genome analysis environment (G-language GAE) contains more than 100 programs that focus on the analysis of bacterial genomes, including programs for the identification of binding sites by means of information theory, analysis of nucleotide composition bias and the distribution of particular oligonucleotides, calculation of codon bias and prediction of expression levels, and visualization of genomic information. We have provided a collection of web services for these programs by utilizing REST and SOAP technologies. The REST interface, available at http://rest.g-language.org/, provides access to all 145 functions of the G-language GAE. These functions can be accessed from other online resources. All analysis functions are represented by unique uniform resource identifiers. Users can access the functions directly via the corresponding uniform resource locators (URLs), and biological web sites can readily embed the functions by simply linking to these URLs. The SOAP services, available at http://www.g-language.org/wiki/soap/, provide language-independent programmatic access to 77 analysis programs. The SOAP service Web Services Description Language file can be readily loaded into graphical clients such as the Taverna workbench to integrate the programs with other services and workflows.
Affiliation(s)
- Kazuharu Arakawa: Institute for Advanced Biosciences, Keio University, Fujisawa 252-8520, Japan.
10
Arakawa K, Suzuki H, Tomita M. Quantitative analysis of replication-related mutation and selection pressures in bacterial chromosomes and plasmids using generalised GC skew index. BMC Genomics 2009; 10:640. [PMID: 20042086 PMCID: PMC2804667 DOI: 10.1186/1471-2164-10-640] [Citation(s) in RCA: 29] [Impact Index Per Article: 1.8] [Received: 06/19/2009] [Accepted: 12/30/2009] [Indexed: 11/10/2022] Open
Abstract
Background: Due to their bi-directional replication machinery starting from a single finite origin, bacterial genomes show characteristic nucleotide compositional bias between the two replichores, which can be visualised through GC skew or (C-G)/(C+G). Although this polarisation is used for computational prediction of replication origins in many bacterial genomes, the degree of GC skew visibility varies widely among different species, necessitating a quantitative measurement of GC skew strength in order to provide confidence measures for GC skew-based predictions of replication origins. Results: Here we discuss a quantitative index for the measurement of GC skew strength, named the generalised GC skew index (gGCSI), which is applicable to genomes of any length, including bacterial chromosomes and plasmids. We demonstrate that gGCSI is independent of the window size and can thus be used to compare genomes with different sizes, such as bacterial chromosomes and plasmids. It can suggest the existence of different replication mechanisms in archaea and of rolling-circle replication in plasmids. Correlation of gGCSI values between plasmids and their corresponding host chromosomes suggests that within the same strain, these replicons have reproduced using the same replication machinery and thus exhibit similar strengths of replication strand skew. Conclusions: gGCSI can be applied to genomes of any length and thus allows comparative study of replication-related mutation and selection pressures in genomes of different lengths such as bacterial chromosomes and plasmids. Using gGCSI, we showed that replication-related mutation or selection pressure is similar for replicons with similar machinery.
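A minimal Python sketch of the windowed skew (C-G)/(C+G) mentioned in this abstract, and of locating the shift points of the cumulative skew graph, might look as follows. The function names, the non-overlapping window scheme, and the choice to return 0.0 for windows containing no C or G are assumptions made for illustration. This is not gGCSI itself, whose formula is not stated in the abstract.

```python
from itertools import accumulate


def gc_skew_windows(seq, window):
    """(C - G) / (C + G) for consecutive non-overlapping windows.

    A trailing partial window is ignored. Windows without any C or G
    yield 0.0 (an assumption for this sketch).
    """
    seq = seq.upper()
    values = []
    for start in range(0, len(seq) - window + 1, window):
        chunk = seq[start:start + window]
        c, g = chunk.count("C"), chunk.count("G")
        values.append((c - g) / (c + g) if c + g else 0.0)
    return values


def shift_points(skew):
    """Window indices of the minimum and maximum of the cumulative skew.

    Assumes a non-empty series. Which extreme corresponds to the origin and
    which to the terminus depends on the sign convention, so both are returned.
    """
    cumulative = list(accumulate(skew))
    lowest = min(range(len(cumulative)), key=cumulative.__getitem__)
    highest = max(range(len(cumulative)), key=cumulative.__getitem__)
    return lowest, highest


if __name__ == "__main__":
    # Synthetic sequence: a G-rich half followed by a C-rich half.
    toy = "GGGA" * 50 + "CCCA" * 50
    skew = gc_skew_windows(toy, window=40)
    print(skew)
    print(shift_points(skew))
```

Real genomes are circular, so in practice the sequence would typically be rotated or the window scheme adapted before interpreting the extremes. That step is omitted here.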
Affiliation(s)
- Kazuharu Arakawa: Institute for Advanced Biosciences, Keio University, Fujisawa 252-8520, Japan.
11
Duggin IG, Wake RG, Bell SD, Hill TM. The replication fork trap and termination of chromosome replication. Mol Microbiol 2008; 70:1323-33. [PMID: 19019156 DOI: 10.1111/j.1365-2958.2008.06500.x] [Citation(s) in RCA: 82] [Impact Index Per Article: 4.8] [Indexed: 11/28/2022]
Abstract
Bacteria that have a circular chromosome with a bidirectional DNA replication origin are thought to utilize a 'replication fork trap' to control termination of replication. The fork trap is an arrangement of replication pause sites that ensures that the two replication forks fuse within the terminus region of the chromosome, approximately opposite the origin on the circular map. However, the biological significance of the replication fork trap has been mysterious, as its inactivation has no obvious consequence. Here we review the research that led to the replication fork trap theory, and we aim to integrate several recent findings that contribute towards an understanding of the physiological roles of the replication fork trap. Likely roles include the prevention of over-replication, and the optimization of post-replicative mechanisms of chromosome segregation, such as that involving FtsK in Escherichia coli.
Affiliation(s)
- Iain G Duggin: Sir William Dunn School of Pathology, University of Oxford, Oxford OX1 3RE, UK.
12
The mitochondrial genome of the phytopathogenic basidiomycete Moniliophthora perniciosa is 109kb in size and contains a stable integrated plasmid. Mycol Res 2008; 112:1136-52. [DOI: 10.1016/j.mycres.2008.04.014] [Citation(s) in RCA: 61] [Impact Index Per Article: 3.6] [Received: 10/29/2007] [Revised: 03/19/2008] [Accepted: 04/24/2008] [Indexed: 11/17/2022]
13
Touchon M, Rocha EPC. From GC skews to wavelets: a gentle guide to the analysis of compositional asymmetries in genomic data. Biochimie 2007; 90:648-59. [PMID: 17988781 DOI: 10.1016/j.biochi.2007.09.015] [Citation(s) in RCA: 32] [Impact Index Per Article: 1.8] [Received: 06/18/2007] [Accepted: 09/21/2007] [Indexed: 12/29/2022]
Abstract
Compositional asymmetries are pervasive in DNA sequences. They are the result of the asymmetric interactions between DNA and cellular mechanisms such as replication and transcription. Here, we review many of the methods that have been proposed over the years to analyse compositional asymmetries in DNA sequences. Among these we list GC skews, oligonucleotide skews and wavelets, which among other uses have been extensively employed to delimitate origins and termini of replication in genomes. We also review the use of multivariate methods, such as factorial correspondence analysis, discriminant analysis and analysis of variance, which allow assigning compositional strand asymmetries to the different biological processes shaping sequence composition. Finally, we review methods that have been used to infer substitution matrices and allow understanding the mutational processes underlying strand asymmetry. We focus on replication asymmetries because they have been more thoroughly studied, but the methods may be adapted, and often are, to other problems. Although strand asymmetry has been studied more frequently through compositional skews of nucleotides or oligonucleotides, we recall that, depending on the goal of the analysis, other methods may be more appropriate to answer certain biological questions. We also refer to programs freely available to analyse strand asymmetry.
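To make the notion of an oligonucleotide skew concrete, the Python sketch below compares, window by window, the count of a chosen word with the count of its reverse complement on the same strand. This is one simple variant chosen for illustration and is not necessarily the formulation used in the review. The function names, the normalized difference, and the non-overlapping windows are assumptions.

```python
def reverse_complement(seq):
    """Reverse complement of an uppercase DNA string over A, C, G, T."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def count_overlapping(seq, word):
    """Number of occurrences of word in seq, counting overlapping matches."""
    count = 0
    pos = seq.find(word)
    while pos != -1:
        count += 1
        pos = seq.find(word, pos + 1)
    return count


def oligo_skew(seq, word, window):
    """(n_word - n_rc) / (n_word + n_rc) per non-overlapping window.

    n_rc counts the reverse complement of word, i.e. occurrences of word on
    the opposite strand. Windows with neither form yield 0.0. A word that is
    its own reverse complement always gives 0.0.
    """
    seq, word = seq.upper(), word.upper()
    rc = reverse_complement(word)
    values = []
    for start in range(0, len(seq) - window + 1, window):
        chunk = seq[start:start + window]
        fwd = count_overlapping(chunk, word)
        rev = count_overlapping(chunk, rc)
        values.append((fwd - rev) / (fwd + rev) if fwd + rev else 0.0)
    return values


if __name__ == "__main__":
    # Synthetic sequence: the word dominates one half, its reverse complement the other.
    toy = "GAGGTT" * 5 + "CCTCTT" * 5
    print(oligo_skew(toy, "GAGG", window=30))
```

A sign change along the sequence in such a series is the kind of signal that, per the abstract, has been used to delimit origins and termini of replication.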
Affiliation(s)
- Marie Touchon: Atelier de Bioinformatique, Université Pierre et Marie Curie-Paris 6, Paris, France