1
Moeckel C, Zaravinos A, Georgakopoulos-Soares I. Strand asymmetries across genomic processes. Comput Struct Biotechnol J 2023; 21:2036-2047. [PMID: 36968020 PMCID: PMC10030826 DOI: 10.1016/j.csbj.2023.03.007] [Received: 01/18/2023] [Revised: 03/08/2023] [Accepted: 03/08/2023]
Abstract
Across biological systems, a number of genomic processes, including transcription, replication, DNA repair, and transcription factor binding, display intrinsic directionalities. These directionalities are reflected in the asymmetric distribution of nucleotides, motifs, genes, transposon integration sites, and other functional elements across the two complementary strands. Strand asymmetries, including GC skews and mutational biases, have shaped the nucleotide composition of diverse organisms. The investigation of strand asymmetries often serves as a method to understand underlying biological mechanisms, including protein binding preferences, transcription factor interactions, retrotransposition, DNA damage and repair preferences, transcription-replication collisions, and mutagenesis mechanisms. Research into this subject also enables the identification of functional genomic sites, such as replication origins and transcription start sites. Improvements in our ability to detect and quantify DNA strand asymmetries will provide insights into diverse functionalities of the genome, the contribution of different mutational mechanisms in germline and somatic mutagenesis, and our knowledge of genome instability and evolution, which all have significant clinical implications in human disease, including cancer. In this review, we describe key developments that have been made across the field of genomic strand asymmetries, as well as the discovery of associated mechanisms.
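GC skew, one of the strand asymmetries named above, is commonly computed as (G - C)/(G + C) along one strand, and the analogous AT skew as (A - T)/(A + T). A minimal sketch in Python, assuming a plain DNA string and fixed-size sliding windows; the helper names `skew` and `windowed_skews` are hypothetical and not taken from the review:

```python
def skew(a: int, b: int) -> float:
    """Return (a - b) / (a + b), or 0.0 when both counts are zero."""
    total = a + b
    return (a - b) / total if total else 0.0


def windowed_skews(seq: str, window: int, step: int) -> list[tuple[int, float, float]]:
    """Slide a window along one strand of `seq` and report
    (window_start, GC skew, AT skew) for each full-length window."""
    seq = seq.upper()
    results = []
    for start in range(0, len(seq) - window + 1, step):
        w = seq[start:start + window]
        gc = skew(w.count("G"), w.count("C"))  # (G - C) / (G + C)
        at = skew(w.count("A"), w.count("T"))  # (A - T) / (A + T)
        results.append((start, gc, at))
    return results
```

For example, `windowed_skews("GGGCATAT", 4, 4)` returns `[(0, 0.5, 0.0), (4, 0.0, 0.0)]`. Reading the reverse complement flips the sign of both skews, so a skew value is only meaningful relative to a stated strand.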
Affiliation(s)
- Camille Moeckel
- Institute for Personalized Medicine, Department of Biochemistry and Molecular Biology, The Pennsylvania State University College of Medicine, Hershey, PA, USA
- Apostolos Zaravinos
- Department of Life Sciences, European University Cyprus, Diogenis Str., 6, Nicosia 2404, Cyprus
- Cancer Genetics, Genomics and Systems Biology laboratory, Basic and Translational Cancer Research Center (BTCRC), Nicosia 1516, Cyprus
- Ilias Georgakopoulos-Soares
- Institute for Personalized Medicine, Department of Biochemistry and Molecular Biology, The Pennsylvania State University College of Medicine, Hershey, PA, USA
2
The Kaumoebavirus LCC10 Genome Reveals a Unique Gene Strand Bias among "Extended Asfarviridae". Viruses 2021; 13:v13020148. [PMID: 33498382 PMCID: PMC7909422 DOI: 10.3390/v13020148] [Received: 11/23/2020] [Revised: 01/13/2021] [Accepted: 01/15/2021]
Abstract
Kaumoebavirus infects the amoeba Vermamoeba vermiformis and has recently been described as a distant relative of the African swine fever virus. To characterize the diversity and evolution of this novel viral genus, we report here on the isolation and genome sequencing of a second strain of Kaumoebavirus, namely LCC10. Detailed analysis of the sequencing data suggested that its 362-kb genome is linear with covalently closed hairpin termini, so that the DNA forms a single continuous polynucleotide chain. Comparative genomic analysis indicated that although the two sequenced Kaumoebavirus strains share extensive gene collinearity, 180 predicted genes were either gained or lost in only one genome. As already observed in another distant relative, Faustovirus, which infects the same host, the center and extremities of the Kaumoebavirus genome exhibited a higher rate of sequence divergence, and the major capsid protein gene was colonized by type-I introns. A possible role of the Vermamoeba host in the genesis of these evolutionary traits is hypothesized. The Kaumoebavirus genome exhibited a significant gene strand bias over two-thirds of its length, a feature not seen in the other members of the “extended Asfarviridae” clade. We suggest that this gene strand bias was induced by a putative single origin of DNA replication located near the genome extremity that imparted a selective force favoring the genes positioned on the leading strand.
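A gene strand bias of this kind can be framed as a test of whether the genes in a region split between the two strands more unevenly than a 50:50 expectation. A minimal sketch, assuming each gene is reduced to a '+' or '-' label and using a plain exact binomial test; the abstract does not describe the authors' statistical procedure, and the function names are hypothetical:

```python
from math import comb


def binomial_two_sided_p(k: int, n: int, p: float = 0.5) -> float:
    """Exact two-sided binomial test: sum the probabilities of all outcomes
    that are no more likely than the observed count k."""
    probs = [comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(n + 1)]
    observed = probs[k]
    # The small relative tolerance keeps outcomes with equal probability
    # from being dropped because of floating-point rounding.
    return min(1.0, sum(q for q in probs if q <= observed * (1 + 1e-7)))


def gene_strand_bias(strands: list[str]) -> tuple[float, float]:
    """Given '+'/'-' labels for the genes in a region, return the fraction
    on '+' and the two-sided p-value against a 50:50 expectation."""
    n = len(strands)
    k = sum(1 for s in strands if s == "+")
    return k / n, binomial_two_sided_p(k, n)
```

For ten genes that all lie on '+', this gives a fraction of 1.0 and p ≈ 0.002. Once an origin position is assumed, the same count can be restated as genes on the leading versus the lagging strand.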
3
Georgakopoulos-Soares I, Mouratidis I, Parada GE, Matharu N, Hemberg M, Ahituv N. Asymmetron: a toolkit for the identification of strand asymmetry patterns in biological sequences. Nucleic Acids Res 2021; 49:e4. [PMID: 33211865 PMCID: PMC7797064 DOI: 10.1093/nar/gkaa1052] [Received: 08/19/2020] [Revised: 10/15/2020] [Accepted: 10/20/2020]
Abstract
DNA strand asymmetries can have a major effect on several biological functions, including replication, transcription and transcription factor binding. As such, DNA strand asymmetries and mutational strand bias can provide information about biological function. However, a versatile tool to explore this does not exist. Here, we present Asymmetron, a user-friendly computational tool that performs statistical analysis and visualizations for the evaluation of strand asymmetries. Asymmetron takes as input DNA features provided with strand annotation and outputs strand asymmetries for consecutive occurrences of a single DNA feature or between pairs of features. We illustrate the use of Asymmetron by identifying transcriptional and replicative strand asymmetries of germline structural variant breakpoints. We also show that the orientation of the binding sites of 45% of the human transcription factors analyzed has a significant DNA strand bias in transcribed regions, which is also corroborated in ChIP-seq analyses and is likely associated with transcription. In summary, we provide a novel tool to assess DNA strand asymmetries and show how it can be used to derive new insights across a variety of biological disciplines.
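Asymmetron's own input formats, options and statistics are documented with the tool; the sketch below only illustrates the underlying idea of classifying consecutive occurrences of one strand-annotated feature. The function name and the (chromosome, start, strand) tuple format are assumptions. Adjacent pairs are called same-strand, convergent ('+' then '-') or divergent ('-' then '+'):

```python
from collections import Counter


def consecutive_orientations(features: list[tuple[str, int, str]]) -> Counter:
    """Classify each pair of adjacent features on the same chromosome.

    features: (chromosome, start, strand) tuples with strand '+' or '-'.
    'same' = both on one strand; 'convergent' = '+' followed by '-'
    (pointing toward each other); 'divergent' = '-' followed by '+'.
    """
    ordered = sorted(features, key=lambda f: (f[0], f[1]))
    counts = Counter()
    for (chrom1, _, s1), (chrom2, _, s2) in zip(ordered, ordered[1:]):
        if chrom1 != chrom2:
            continue  # never pair features across chromosomes
        if s1 == s2:
            counts["same"] += 1
        elif s1 == "+":
            counts["convergent"] += 1
        else:
            counts["divergent"] += 1
    return counts
```

Under random orientation, same-strand and opposite-strand pairs are expected in roughly equal numbers, so a binomial test of the "same" count against the total number of pairs is one simple way to ask whether the split departs from 50:50.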
Affiliation(s)
- Ilias Georgakopoulos-Soares
- Department of Bioengineering and Therapeutic Sciences, University of California San Francisco, San Francisco, CA, USA
- Institute for Human Genetics, University of California San Francisco, San Francisco, CA, USA
- Ioannis Mouratidis
- Aristotle University of Thessaloniki, Department of Mathematics, Thessaloniki, GR, Greece
- Guillermo E Parada
- Wellcome Sanger Institute, Wellcome Genome Campus, Hinxton, CB10 1SA, UK
- Wellcome Trust Cancer Research UK Gurdon Institute, University of Cambridge, Tennis Court Road, Cambridge CB2 1QN, UK
- Navneet Matharu
- Department of Bioengineering and Therapeutic Sciences, University of California San Francisco, San Francisco, CA, USA
- Institute for Human Genetics, University of California San Francisco, San Francisco, CA, USA
- Innovative Genomics Institute, University of California San Francisco, San Francisco, CA, USA
- Martin Hemberg
- Wellcome Sanger Institute, Wellcome Genome Campus, Hinxton, CB10 1SA, UK
- Wellcome Trust Cancer Research UK Gurdon Institute, University of Cambridge, Tennis Court Road, Cambridge CB2 1QN, UK
- Nadav Ahituv
- Department of Bioengineering and Therapeutic Sciences, University of California San Francisco, San Francisco, CA, USA
- Institute for Human Genetics, University of California San Francisco, San Francisco, CA, USA
4
Rashmi M, Swati D. Comparative Genomics of Trypanosomatid Pathogens using Codon Usage Bias. Bioinformation 2013; 9:912-8. [PMID: 24307769 PMCID: PMC3842577 DOI: 10.6026/97320630009912] [Received: 10/15/2013] [Accepted: 10/19/2013]
Abstract
It is well known that an amino acid can be encoded by more than one codon; such codons are called synonymous codons. The preferential use of one particular codon for coding an amino acid is referred to as codon usage bias (CUB). A quantitative analytical method, CUB, and a related tool, the Codon Adaptation Index (CAI), have been applied to comparatively study the whole genomes of a few pathogenic Trypanosomatid species. This quantitative approach is of direct help in comparing qualitative features such as mutational and translational selection. Pathogens of the Leishmania and Trypanosoma genera cause debilitating disease and suffering in human beings and animals. Of these, whole genome sequences are available for only five species. The complete coding sequences (CDS), as well as highly expressed, essential and low-expressed genes, have all been studied for their CUB signature. The codon usage bias of essential genes and highly expressed genes shows a distribution similar to that of all CDSs in Trypanosomatids. Translational selection is the dominant force selecting the preferred codon, and selection due to mutation is negligible. In contrast to an earlier study of these pathogens, this work finds that CUB and CAI may be used to distinguish the Trypanosomatid genomes at the sub-genus level. Further, CUB may effectively be used as a signature of species differentiation by using Principal Component Analysis (PCA). ABBREVIATIONS: CUB, Codon Usage Bias; CAI, Codon Adaptation Index; CDS, coding sequences; tRNA, transfer RNA; PCA, Principal Component Analysis.
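The CAI is usually defined as the geometric mean, over a gene's codons, of each codon's relative adaptiveness w: its count in a reference set of highly expressed genes divided by the count of the most used synonymous codon. A minimal sketch under the standard genetic code; the reference set, the 0.5 pseudocount for codons unseen in the reference, and the function names are assumptions, not details taken from this study:

```python
from collections import Counter, defaultdict
from math import exp, log

# Standard genetic code (NCBI table 1); codon index = 16*i + 4*j + k over "TCAG".
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {a + b + c: AMINO[16 * i + 4 * j + k]
        for i, a in enumerate(BASES)
        for j, b in enumerate(BASES)
        for k, c in enumerate(BASES)}


def codons(seq: str) -> list[str]:
    """Split an in-frame coding sequence into codons (RNA U read as T)."""
    seq = seq.upper().replace("U", "T")
    return [seq[i:i + 3] for i in range(0, len(seq) - 2, 3)]


def relative_adaptiveness(reference: list[str], pseudocount: float = 0.5) -> dict[str, float]:
    """w = count(codon) / count(most used synonym) in the reference genes.
    Stop codons and single-codon amino acids (Met, Trp) are skipped;
    the pseudocount keeps every w above zero so log(w) is defined."""
    counts = Counter(c for seq in reference for c in codons(seq))
    families = defaultdict(list)
    for codon, aa in CODE.items():
        families[aa].append(codon)
    w = {}
    for aa, fam in families.items():
        if aa == "*" or len(fam) == 1:
            continue
        adjusted = {c: counts[c] + pseudocount for c in fam}
        top = max(adjusted.values())
        w.update({c: n / top for c, n in adjusted.items()})
    return w


def cai(seq: str, w: dict[str, float]) -> float:
    """Geometric mean of w over the scorable codons of one gene."""
    logs = [log(w[c]) for c in codons(seq) if c in w]
    return exp(sum(logs) / len(logs)) if logs else float("nan")
```

A gene that uses only the preferred codons of the reference set scores 1.0, and the score falls toward 0 as rarer synonyms accumulate.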
Affiliation(s)
- Mayank Rashmi
- Department of Bioinformatics, MMV, Banaras Hindu University, Varanasi-221005, India
- D Swati
- Department of Bioinformatics, MMV, Banaras Hindu University, Varanasi-221005, India
- Department of Physics, MMV, Banaras Hindu University, Varanasi 221005, India
5
Kreuzer KN, Brister JR. Initiation of bacteriophage T4 DNA replication and replication fork dynamics: a review in the Virology Journal series on bacteriophage T4 and its relatives. Virol J 2010; 7:358. [PMID: 21129203 PMCID: PMC3016281 DOI: 10.1186/1743-422x-7-358] [Received: 08/17/2010] [Accepted: 12/03/2010]
Abstract
Bacteriophage T4 initiates DNA replication from specialized structures that form in its genome. Immediately after infection, RNA-DNA hybrids (R-loops) occur on (at least some) replication origins, with the annealed RNA serving as a primer for leading-strand synthesis in one direction. As the infection progresses, replication initiation becomes dependent on recombination proteins in a process called recombination-dependent replication (RDR). RDR occurs when the replication machinery is assembled onto D-loop recombination intermediates, and in this case, the invading 3' DNA end is used as a primer for leading strand synthesis. Over the last 15 years, these two modes of T4 DNA replication initiation have been studied in vivo using a variety of approaches, including replication of plasmids with segments of the T4 genome, analysis of replication intermediates by two-dimensional gel electrophoresis, and genomic approaches that measure DNA copy number as the infection progresses. In addition, biochemical approaches have reconstituted replication from origin R-loop structures and have clarified some detailed roles of both replication and recombination proteins in the process of RDR and related pathways. We will also discuss the parallels between T4 DNA replication modes and similar events in cellular and eukaryotic organelle DNA replication, and close with some current questions of interest concerning the mechanisms of replication, recombination and repair in phage T4.
Affiliation(s)
- Kenneth N Kreuzer
- Department of Biochemistry, Duke University Medical Center, Durham, NC 27710 USA
- J Rodney Brister
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD 20894 USA
6
Abstract
A regional analysis of nucleotide substitution rates along human genes and their flanking regions allows us to quantify the effect of mutational mechanisms associated with transcription in germ line cells. Our analysis reveals three distinct patterns of substitution rates. First, there is a sharp decline in the deamination rate of methylated CpG dinucleotides in the vicinity of the 5' end of genes. Second, there is a strand asymmetry in complementary substitution rates, extending from the 5' end to 1 kbp downstream of the 3' end, which is associated with transcription-coupled repair. Finally, there is a localized strand asymmetry, an excess of C→T over G→A substitutions on the nontemplate strand, confined to the first 1-2 kbp downstream of the 5' end of genes. We hypothesize that higher exposure of the nontemplate strand near the 5' end of genes leads to a higher cytosine deamination rate. Until now, only the somatic hypermutation (SHM) pathway has been known to mediate localized and strand-specific mutagenic processes associated with transcription in mammals. The mutational patterns in SHM are induced by a cytosine deaminase that targets only single-stranded DNA. This DNA conformation is induced by R-loops, which preferentially occur at the 5' ends of genes. We predict that R-loops are extensively formed at the beginning of transcribed regions in germ line cells.
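Because a G→A change read on the nontemplate strand is a C→T change on the template strand, a strand-symmetric process should produce C→T and G→A at equal rates when both are read on the same strand. A minimal sketch of that comparison, assuming substitutions have already been inferred and oriented to the nontemplate strand; it works on raw counts, whereas rate estimates such as those in this study would also divide by the number of available C and G sites. The function name is hypothetical:

```python
from collections import Counter

COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def complementary_asymmetry(subs: list[tuple[str, str]]) -> dict[str, float]:
    """subs: (ancestral, derived) base pairs, both read on the nontemplate
    strand. Each type X>Y is paired with its complement comp(X)>comp(Y);
    the score (n1 - n2) / (n1 + n2) is 0 when the two strands mutate alike."""
    counts = Counter(subs)
    scores = {}
    for ref in "CA":  # C>* and A>* together cover all six complementary pairs
        for alt in "ACGT":
            if alt == ref:
                continue
            comp = (COMPLEMENT[ref], COMPLEMENT[alt])
            n1, n2 = counts[(ref, alt)], counts[comp]
            label = f"{ref}>{alt} vs {comp[0]}>{comp[1]}"
            scores[label] = (n1 - n2) / (n1 + n2) if n1 + n2 else 0.0
    return scores
```

A positive "C>T vs G>A" score means an excess of C→T over G→A on the nontemplate strand, the localized signal described above.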
7
Nolan JM, Petrov V, Bertrand C, Krisch HM, Karam JD. Genetic diversity among five T4-like bacteriophages. Virol J 2006; 3:30. [PMID: 16716236 PMCID: PMC1524935 DOI: 10.1186/1743-422x-3-30] [Received: 03/31/2006] [Accepted: 05/23/2006]
Abstract
BACKGROUND Bacteriophages are an important repository of genetic diversity. As one of the major constituents of terrestrial biomass, they exert profound effects on the earth's ecology and microbial evolution by mediating horizontal gene transfer between bacteria and controlling their growth. Only limited genomic sequence data are currently available for phages, but even these reveal an overwhelming diversity in their gene sequences and genomes. The contribution of the T4-like phages to this overall phage diversity is difficult to assess, since only a few complete genome sequences exist for these phages. Our analysis of five T4-like genomes represents half of the known T4-like genomes in GenBank. RESULTS Here, we have examined in detail the genetic diversity of the genomes of five relatives of bacteriophage T4: the Escherichia coli phages RB43, RB49 and RB69, the Aeromonas salmonicida phage 44RR2.8t (or 44RR) and the Aeromonas hydrophila phage Aeh1. Our data define a core set of conserved genes common to these genomes as well as hundreds of additional open reading frames (ORFs) that are nonconserved. Although some of these ORFs resemble known genes from bacterial hosts or other phages, most show no significant similarity to any known sequence in the databases. The five genomes analyzed here all have similarities in gene regulation to T4. Sequence motifs resembling T4 early and late consensus promoters were observed in all five genomes. In contrast, only two of these genomes, RB69 and 44RR, showed similarities to T4 middle-mode promoter sequences and to the T4 motA gene product required for their recognition. In addition, we observed that each phage differed in the number and assortment of putative genes encoding host-like metabolic enzymes, tRNA species, and homing endonucleases. CONCLUSION Our observations suggest that evolution of the T4-like phages has drawn on a highly diverged pool of genes in the microbial world. 
The T4-like phages harbour a wealth of genetic material that has not been identified previously. The mechanisms by which these genes may have arisen may differ from those previously proposed for the evolution of other bacteriophage genomes.
Affiliation(s)
- James M Nolan
- Department of Biological Sciences, University of New Orleans, 2000 Lakeshore Dr., New Orleans, LA 70148, USA
- Department of Biochemistry, Tulane University Health Sciences Center, 1430 Tulane Ave., New Orleans, LA 70112, USA
- Vasiliy Petrov
- Department of Biochemistry, Tulane University Health Sciences Center, 1430 Tulane Ave., New Orleans, LA 70112, USA
- Claire Bertrand
- LMGM-CNRS UMR 5100, 118 route de Narbonne, 31062 Toulouse cedex 09, France
- Henry M Krisch
- LMGM-CNRS UMR 5100, 118 route de Narbonne, 31062 Toulouse cedex 09, France
- Jim D Karam
- Department of Biochemistry, Tulane University Health Sciences Center, 1430 Tulane Ave., New Orleans, LA 70112, USA
8
Guy L, Karamata D, Moreillon P, Roten CAH. Genometrics as an essential tool for the assembly of whole genome sequences: the example of the chromosome of Bifidobacterium longum NCC2705. BMC Microbiol 2005; 5:60. [PMID: 16223444 PMCID: PMC1285363 DOI: 10.1186/1471-2180-5-60] [Received: 04/29/2005] [Accepted: 10/13/2005]
Abstract
BACKGROUND Analysis of the first reported complete genome sequence of Bifidobacterium longum NCC2705, an actinobacterium colonizing the gastrointestinal tract, uncovered its proteomic relatedness to Streptomyces coelicolor and Mycobacterium tuberculosis. However, a rapid genometric analysis revealed a genome organization totally different from that of all high-GC Gram-positive chromosomes sequenced so far. RESULTS Generally, the cumulative GC and ORF orientation skew curves of prokaryotic genomes consist of two linear segments of opposite slope: the minimum and the maximum of the curves correspond to the origin and the terminus of chromosome replication, respectively. However, analyses of the B. longum NCC2705 chromosome yielded six, instead of two, linear segments, and its dnaA locus, usually associated with the origin of replication, was not located at the minimum of the curves. Furthermore, the co-orientation of gene transcription with replication was very low. Comparison with closely related actinobacteria strongly suggested that the chromosome of B. longum was misassembled, and the identification of two pairs of relatively long homologous DNA sequences allows an alternative genome assembly, which we propose here. By genometric criteria, this configuration displays all of the characteristics common to bacteria, in particular to related high-GC Gram-positives. In addition, it is compatible with the partially sequenced genome of B. longum strain DJO10A. Recently, a corrected sequence of B. longum NCC2705, with a configuration similar to the one proposed here, has been deposited in GenBank, confirming our predictions. CONCLUSION Genometric analyses, in conjunction with standard bioinformatic tools and knowledge of bacterial chromosome architecture, represent fast and straightforward methods for the evaluation of chromosome assembly.
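The cumulative GC skew curve described above can be sketched as a running sum of per-window GC skew, with the curve's minimum and maximum taken as the predicted origin and terminus. A minimal sketch, assuming a circular chromosome given as a single string; the window size and function names are hypothetical, this is not the authors' genometric software, and an ORF orientation skew curve could be built analogously by summing +1 or -1 per gene strand:

```python
def cumulative_gc_skew(seq: str, window: int) -> list[float]:
    """Running sum of (G - C) / (G + C) over consecutive windows."""
    seq = seq.upper()
    running, curve = 0.0, []
    for start in range(0, len(seq), window):
        w = seq[start:start + window]
        g, c = w.count("G"), w.count("C")
        running += (g - c) / (g + c) if g + c else 0.0
        curve.append(running)
    return curve


def predict_origin_terminus(seq: str, window: int) -> tuple[int, int]:
    """Return (origin, terminus) as the window boundaries where the
    cumulative curve reaches its minimum and maximum, respectively."""
    curve = cumulative_gc_skew(seq, window)
    i_min = min(range(len(curve)), key=curve.__getitem__)
    i_max = max(range(len(curve)), key=curve.__getitem__)
    # curve[i] already includes window i, so each turning point lies at the
    # end of that window; the modulo wraps positions on a circular genome.
    return ((i_min + 1) * window) % len(seq), ((i_max + 1) * window) % len(seq)
```

For the toy sequence `"C" * 100 + "G" * 200 + "C" * 100` with 50-bp windows, the function returns `(100, 300)`: the G-rich stretch begins at the predicted origin and ends at the predicted terminus. A curve with more than two linear segments, as reported for the misassembled chromosome, would show up as extra turning points.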
Affiliation(s)
- Lionel Guy
- Département de Microbiologie Fondamentale, Faculté de Biologie et Médecine, Université de Lausanne, CH-1015 Lausanne, Switzerland
- Dimitri Karamata
- Département de Microbiologie Fondamentale, Faculté de Biologie et Médecine, Université de Lausanne, CH-1015 Lausanne, Switzerland
- Philippe Moreillon
- Département de Microbiologie Fondamentale, Faculté de Biologie et Médecine, Université de Lausanne, CH-1015 Lausanne, Switzerland
- Claude-Alain H Roten
- Département de Microbiologie Fondamentale, Faculté de Biologie et Médecine, Université de Lausanne, CH-1015 Lausanne, Switzerland
9
McClellan DA, Whiting DG, Christensen R, Sailsbery J. Genetic codes as evolutionary filters: subtle differences in the structure of genetic codes result in significant differences in patterns of nucleotide substitution. J Theor Biol 2004; 226:393-400. [PMID: 14759645 DOI: 10.1016/j.jtbi.2003.09.019] [Received: 03/21/2003] [Revised: 09/13/2003] [Accepted: 09/15/2003]
Abstract
The codon-degeneracy model (CDM) predicts that patterns of nucleotide substitution in protein-coding genes are largely determined by the relative frequencies of four-fold (4f), two-fold, and non-degenerate sites, the attributes of which are determined by the structure of the governing genetic code. The CDM thus further predicts that genetic codes with alternative structures will "filter" molecular evolution differentially. A method, therefore, is presented by which the CDM may be applied to the unique structure of any genetic code. The mathematical relationship between the proportion of transitions at 4f degenerate nucleotide sites and the transition-to-transversion ratio is described. Predictions for five individual genetic codes, relative to the relationship between code structure and expected patterns of nucleotide substitution, are clearly defined. To test this "filter" hypothesis of genetic codes, simulated DNA sequence data sets were generated with a variety of input parameter values to estimate the relationship between patterns of nucleotide substitution and best-fit estimates of transition bias at 4f degenerate sites for both the universal genetic code and the vertebrate mitochondrial genetic code. These analyses confirm the prediction of the CDM that, all else being equal, even small differences in the structure of alternative genetic codes may result in significant shifts in the overall pattern of nucleotide substitution.
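As a concrete instance of the quantities the CDM relates, the sketch below counts transitions (ts) and transversions (tv) at fourfold-degenerate sites between two aligned coding sequences, from which the proportion of transitions ts/(ts + tv) and the ratio ts/tv follow. It assumes gap-free, in-frame alignments under the standard genetic code, does not reproduce the analytical relationship derived in the paper, and uses hypothetical names. Passing a different code's amino-acid string would change the site classes wherever that code reassigns codons:

```python
BASES = "TCAG"
# Standard genetic code (NCBI table 1); codon index = 16*i + 4*j + k over "TCAG".
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {a + b + c: AMINO[16 * i + 4 * j + k]
        for i, a in enumerate(BASES)
        for j, b in enumerate(BASES)
        for k, c in enumerate(BASES)}
PURINES = {"A", "G"}


def is_fourfold(codon: str) -> bool:
    """True if every base at the third position gives the same amino acid."""
    return len({CODE[codon[:2] + b] for b in BASES}) == 1


def fourfold_ts_tv(seq1: str, seq2: str) -> tuple[int, int]:
    """Count transitions and transversions at fourfold-degenerate third
    positions between two aligned, in-frame, gap-free coding sequences."""
    ts = tv = 0
    for i in range(0, min(len(seq1), len(seq2)) - 2, 3):
        c1, c2 = seq1[i:i + 3].upper(), seq2[i:i + 3].upper()
        if c1 not in CODE or c2 not in CODE:
            continue  # skip codons containing ambiguous bases
        if c1[:2] != c2[:2] or not is_fourfold(c1):
            continue  # compare only third positions inside a fourfold box
        a, b = c1[2], c2[2]
        if a == b:
            continue
        if (a in PURINES) == (b in PURINES):
            ts += 1  # purine<->purine or pyrimidine<->pyrimidine
        else:
            tv += 1
    return ts, tv
```

For example, `fourfold_ts_tv("GCTGCTGCTCTG", "GCCGCAGCTCTA")` returns `(2, 1)`: two transitions (T↔C in an Ala codon, G↔A in a Leu codon) and one transversion (T↔A in an Ala codon).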
Affiliation(s)
- David A McClellan
- Department of Integrative Biology, Brigham Young University, WIDB 401, Provo, UT 84602-5181, USA.
10
Abstract
The frequencies of individual nucleotides exhibit significant fluctuations across eukaryotic genes. In this paper, we investigate nucleotide variation across an averaged representation of all known human genes. Such a representation allows us to average out random fluctuations that constitute noise and uncover remarkable systematic trends in nucleotide distributions, particularly near boundaries between genetic elements--the promoter, exons, and introns. We propose that such variations result from differential mutational pressures and from the presence of specific regulatory motifs, such as transcription and splicing factor binding sites. Specifically, we observe significant GC and TA biases (excess of G over C and T over A) in noncoding regions of genes. Such biases are most probably caused by transcription-coupled mismatch repair, an effect that has recently been detected in mammalian genes. Subsequently, we examine the distribution of all hexanucleotides and identify motifs that are overrepresented within regulatory regions. By clustering and aligning such sequences, we recognize families of putative regulatory elements involved in exonic and intronic splicing control, and 3' mRNA processing. Some of our motifs have been identified in prior theoretical and experimental studies, thus validating our approach, but we detect several novel sequences that we propose as candidates for future functional assays and mutation screens for genetic disorders.
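The abstract does not state how hexanucleotide overrepresentation was scored. One simple option, assumed here rather than taken from the paper, compares observed counts with those expected if bases occurred independently at the pooled base frequencies; the function name is hypothetical:

```python
from collections import Counter
from math import prod


def kmer_enrichment(seqs: list[str], k: int = 6) -> dict[str, float]:
    """Observed / expected ratio for every k-mer seen in `seqs`, where the
    expectation assumes independent bases at the pooled base frequencies."""
    seqs = [s.upper() for s in seqs]
    bases = Counter(b for s in seqs for b in s if b in "ACGT")
    total = sum(bases.values())
    if total == 0:
        return {}
    freq = {b: bases[b] / total for b in "ACGT"}
    observed = Counter(
        s[i:i + k]
        for s in seqs
        for i in range(len(s) - k + 1)
        if set(s[i:i + k]) <= set("ACGT")  # skip k-mers with N or other symbols
    )
    n = sum(observed.values())
    return {kmer: count / (n * prod(freq[b] for b in kmer))
            for kmer, count in observed.items()}
```

Ratios well above 1 flag candidate motifs. In practice, a higher-order background (for example, one conditioned on dinucleotide frequencies to absorb CpG depletion) is often preferable to this zero-order model.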
11
Miller ES, Heidelberg JF, Eisen JA, Nelson WC, Durkin AS, Ciecko A, Feldblyum TV, White O, Paulsen IT, Nierman WC, Lee J, Szczypinski B, Fraser CM. Complete genome sequence of the broad-host-range vibriophage KVP40: comparative genomics of a T4-related bacteriophage. J Bacteriol 2003; 185:5220-33. [PMID: 12923095 PMCID: PMC180978 DOI: 10.1128/jb.185.17.5220-5233.2003] [Received: 02/10/2003] [Accepted: 04/30/2003]
Abstract
The complete genome sequence of the T4-like, broad-host-range vibriophage KVP40 has been determined. The genome sequence is 244,835 bp, with an overall G+C content of 42.6%. It encodes 386 putative protein-encoding open reading frames (CDSs), 30 tRNAs, 33 T4-like late promoters, and 57 potential rho-independent terminators. Overall, 92.1% of the KVP40 genome is coding, with an average CDS size of 587 bp. While 65% of the CDSs were unique to KVP40 and had no known function, the genome sequence and organization show specific regions of extensive conservation with phage T4. At least 99 KVP40 CDSs have homologs in the T4 genome (Blast alignments of 45 to 68% amino acid similarity). The shared CDSs represent 36% of all T4 CDSs but only 26% of those from KVP40. There is extensive representation of the DNA replication, recombination, and repair enzymes as well as the viral capsid and tail structural genes. KVP40 lacks several T4 enzymes involved in host DNA degradation, appears not to synthesize the modified cytosine (hydroxymethyl glucose) present in T-even phages, and lacks group I introns. KVP40 likely utilizes the T4-type sigma-55 late transcription apparatus, but features of early- or middle-mode transcription were not identified. There are 26 CDSs that have no viral homolog, and many did not necessarily originate from Vibrio spp., suggesting an even broader host range for KVP40. From these latter CDSs, an NAD salvage pathway was inferred that appears to be unique among bacteriophages. Features of the KVP40 genome that distinguish it from T4 are presented, as well as those, such as the replication and virion gene clusters, that are substantially conserved.
Affiliation(s)
- Eric S Miller
- Department of Microbiology, North Carolina State University, Raleigh, NC 27695-7615, USA
12
Majewski J. Dependence of mutational asymmetry on gene-expression levels in the human genome. Am J Hum Genet 2003; 73:688-92. [PMID: 12881777 PMCID: PMC1180696 DOI: 10.1086/378134] [Received: 05/09/2003] [Accepted: 07/01/2003]
Abstract
A great deal of effort has been devoted to measuring the rates of different types of nucleotide substitutions. Mutation rates are known to depend on factors such as methylation status and nearest-neighbor nucleotide effects. However, until recently, in eukaryotes, the rates have not been considered to be strand specific. In a recent analysis of mammalian lineages, Green et al. (2003) uncovered an asymmetry in the frequencies of substitutions on the coding and noncoding strands of genes and showed that this resulted in a nucleotide-content asymmetry within most genes. The authors argue that this bias may be caused by the mammalian transcription-coupled repair in germ cells, but they did not demonstrate an association with germ-cell gene expression. In this work, I analyze nucleotide contents in genes with known expression patterns and levels and provide evidence that the observed asymmetry in mutation rates is, in fact, caused by transcription. The results also imply that germline transcription may occur in a large percentage, 71%-91%, of all human genes.
Affiliation(s)
- Jacek Majewski
- Laboratory of Statistical Genetics, Rockefeller University, New York, NY, 10021, USA.
13
Miller ES, Kutter E, Mosig G, Arisaka F, Kunisawa T, Rüger W. Bacteriophage T4 genome. Microbiol Mol Biol Rev 2003; 67:86-156. [PMID: 12626685 PMCID: PMC150520 DOI: 10.1128/mmbr.67.1.86-156.2003]
Abstract
Phage T4 has provided countless contributions to the paradigms of genetics and biochemistry. Its complete genome sequence of 168,903 bp encodes about 300 gene products. T4 biology and its genomic sequence provide the best-understood model for modern functional genomics and proteomics. Variations on gene expression, including overlapping genes, internal translation initiation, spliced genes, translational bypassing, and RNA processing, alert us to the caveats of purely computational methods. The T4 transcriptional pattern reflects its dependence on the host RNA polymerase and the use of phage-encoded proteins that sequentially modify RNA polymerase; transcriptional activator proteins, a phage sigma factor, anti-sigma, and sigma decoy proteins also act to specify early, middle, and late promoter recognition. Posttranscriptional controls by T4 provide excellent systems for the study of RNA-dependent processes, particularly at the structural level. The redundancy of DNA replication and recombination systems of T4 reveals how phage and other genomes are stably replicated and repaired in different environments, providing insight into genome evolution and adaptations to new hosts and growth environments. Moreover, genomic sequence analysis has provided new insights into tail fiber variation, lysis, gene duplications, and membrane localization of proteins, while high-resolution structural determination of the "cell-puncturing device," combined with the three-dimensional image reconstruction of the baseplate, has revealed the mechanism of penetration during infection. Despite these advances, nearly 130 potential T4 genes remain uncharacterized. Current phage-sequencing initiatives are now revealing the similarities and differences among members of the T4 family, including those that infect bacteria other than Escherichia coli. 
T4 functional genomics will aid in the interpretation of these newly sequenced T4-related genomes and in broadening our understanding of the complex evolution and ecology of phages, the most abundant and among the most ancient biological entities on Earth.
Affiliation(s)
- Eric S Miller
- Department of Microbiology, North Carolina State University, Raleigh, North Carolina 27695-7615, USA.
14
Sueoka N. Wide intra-genomic G+C heterogeneity in human and chicken is mainly due to strand-symmetric directional mutation pressures: dGTP-oxidation and symmetric cytosine-deamination hypotheses. Gene 2002; 300:141-54. [PMID: 12468095 DOI: 10.1016/s0378-1119(02)01046-6]
Abstract
The intra-strand Parity Rule 2 of DNA (PR2) states that A=T and G=C within each strand. Useful corollaries of PR2 are G/(G+C)=A/(A+T)=0.5, G/(G+A)=C/(C+T)=G+C, and G/(G+T)=C/(C+A)=G+C. Here, A, T, G, and C represent the relative contents of the four nucleotides in a specific strand of DNA, so that A+T+G+C=1. Thus, deviations from PR2 are a sign of strand-specific (or asymmetric) mutation and/or selection pressures. The present study delineates the symmetric and asymmetric effects of mutations on the intra-genomic heterogeneity of G+C content in the human genome. The results of this study on the human genome are as follows: (1) When both two- and four-codon amino acids were combined, only slight departures from PR2 were observed across the total range of G+C content at the third codon position. Thus, the G+C heterogeneity is likely to be caused by mutagenesis that is symmetric between the two strands. (2) This result makes the deamination of cytosine due to double-strand breathing of DNA [Mol. Biol. Evol. 17 (2000) 1371] and/or the incorporation of oxidized guanine (8-oxo-guanine) opposite adenine during DNA replication (the dGTP-oxidation hypothesis) the most likely candidates for the major cause of the diversity in G+C content. (3) Patterns of amino-acid-specific PR2 biases, detected by plotting PR2 corollaries against the G+C content of the third codon position, revealed that the eight four-codon amino acids can be divided into three types by the second codon letter: (a) C(2)-type (Ala, Pro, Ser4, and Thr), (b) G(2)-type (Arg4 and Gly), and (c) T(2)-type (Leu4 and Val). (4) Most of the asymmetric plot patterns of these three classes of PR2 biases can be explained by C(2)→T(2) deamination of C(2)pG(3) of the C(2)-type to T(2)pG(3) (T(2)-type) in both human and chicken. This explains the existence of some preferred codons in human and chicken. However, these (asymmetric) biases hardly contribute to the overall G+C content diversity of the third codon position.
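The PR2 corollaries can be checked directly from sequence: under PR2, both G3/(G3+C3) and A3/(A3+T3) at the third positions of fourfold-degenerate codons should be close to 0.5, and plotting them against GC3 gives bias plots of the kind described above. A minimal sketch, assuming the standard genetic code and an in-frame coding sequence; the labels follow the abstract's eight four-codon amino acids, and the function names are hypothetical:

```python
from collections import Counter
from typing import Optional

# Fourfold-degenerate codon boxes of the standard code, keyed by the first two bases.
FOURFOLD = {"GC": "Ala", "CC": "Pro", "TC": "Ser4", "AC": "Thr",
            "CG": "Arg4", "GG": "Gly", "CT": "Leu4", "GT": "Val"}


def ratio(x: int, y: int) -> float:
    """x / (x + y), or NaN when both counts are zero."""
    return x / (x + y) if x + y else float("nan")


def pr2_indices(cds: str, only: Optional[str] = None) -> dict[str, float]:
    """PR2 statistics at third positions of fourfold-degenerate codons.
    `only` restricts the count to one amino-acid label, e.g. 'Ala'."""
    cds = cds.upper()
    third = Counter(
        cds[i + 2]
        for i in range(0, len(cds) - 2, 3)
        if cds[i:i + 2] in FOURFOLD and (only is None or FOURFOLD[cds[i:i + 2]] == only)
    )
    g, c, a, t = (third[b] for b in "GCAT")
    return {"G3/(G3+C3)": ratio(g, c), "A3/(A3+T3)": ratio(a, t), "GC3": ratio(g + c, a + t)}
```

A sequence whose four Ala codons end in G, C, A and T gives 0.5 for all three statistics, the PR2 expectation; computing the indices per amino acid with `only` mirrors the amino-acid-specific analysis in point (3).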
Affiliation(s)
- Noboru Sueoka
- Department of Molecular, Cellular, and Developmental Biology, University of Colorado, Boulder, CO 80309-0347, USA.
15
Lobry JR, Sueoka N. Asymmetric directional mutation pressures in bacteria. Genome Biol 2002; 3:RESEARCH0058. [PMID: 12372146 PMCID: PMC134625 DOI: 10.1186/gb-2002-3-10-research0058] [Citation(s) in RCA: 127] [Impact Index Per Article: 5.5] [Received: 11/13/2001] [Revised: 06/18/2002] [Accepted: 08/15/2002] [Indexed: 11/20/2022]
Abstract
BACKGROUND: When there are no strand-specific biases in mutation and selection rates (that is, in the substitution rates) between the two strands of DNA, the average nucleotide composition is theoretically expected to be A = T and G = C within each strand. Deviations from these equalities are therefore evidence for an asymmetry in selection and/or mutation between the two strands. By focusing on weakly selected regions that could be oriented with respect to replication in 43 out of 51 completely sequenced bacterial chromosomes, we have been able to detect asymmetric directional mutation pressures.
RESULTS: Most of the 43 chromosomes were found to be relatively enriched in G over C and T over A, and slightly depleted in G+C, in their weakly selected positions (intergenic regions and third codon positions) in the leading strand compared with the lagging strand. Deviations from A = T and G = C were highly correlated between third codon positions and intergenic regions, with a lower degree of deviation in intergenic regions, and were not correlated with overall genomic G+C content.
CONCLUSIONS: During the course of bacterial chromosome evolution, the effects of asymmetric directional mutation pressures are commonly observed in weakly selected positions. The degree of deviation from equality is highly variable among species, and within species is higher in third codon positions than in intergenic regions. The orientation of these effects is almost universal and is compatible in most cases with the hypothesis of an excess of cytosine deamination in the single-stranded state during DNA replication. However, the variation in G+C content between species is influenced by factors other than asymmetric mutation pressure.
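The leading-versus-lagging comparison described above is commonly summarized as a GC skew, (G−C)/(G+C), and an AT skew, here written as (T−A)/(T+A) so that a positive value means the T-over-A enrichment the abstract reports. The sketch below is an illustration under stated assumptions, not the authors' pipeline: the helper names are hypothetical, and extracting weakly selected positions and orienting them relative to the replication origin are left to the caller. Because the lagging strand is the reverse complement of the leading strand, both skews flip sign between them.

```python
from typing import Tuple

# Translation table for complementing A/C/G/T.
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Sequence of the opposite strand, read 5' to 3'."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def third_codon_positions(cds: str) -> str:
    """Third codon positions of an in-frame coding sequence (a weakly selected set)."""
    return cds.upper()[2::3]


def skews(seq: str) -> Tuple[float, float]:
    """Return (GC skew, TA skew); positive values mean G > C and T > A."""
    s = seq.upper()
    g, c, a, t = (s.count(b) for b in "GCAT")
    gc_skew = (g - c) / (g + c) if g + c else 0.0
    ta_skew = (t - a) / (t + a) if t + a else 0.0
    return gc_skew, ta_skew
```

For a hypothetical leading-strand segment "GGGCTTTA", `skews` returns (0.5, 0.5), and its reverse complement (the lagging strand) returns (-0.5, -0.5).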
Affiliation(s)
- Jean R Lobry
- Laboratoire BBE CNRS UMR 5558, Université Claude Bernard, 43 Bd du 11 Novembre 1918, F-69622 Villeurbanne cedex, France.
16
Abstract
The human genome, as in other eukaryotes, has a wide heterogeneity in the DNA base composition. The evolutionary basis for this heterogeneity has been unknown. A previous study of the human genome (846 genes analyzed) has shown that, in the major range of the G+C content in the third codon position (0.25-0.75), biases from the Parity Rule 2 (PR2) among the synonymous codons of the four-codon amino acids are similar except in the highest G+C range (Sueoka, N., 1999. Translation-coupled violation of Parity Rule 2 in human genes is not the cause of heterogeneity of the DNA G+C content of third codon position. Gene 238, 53-58.). PR2 is an intra-strand rule where A = T and G = C are expected when there are no biases between the two complementary strands of DNA in mutation and selection rates (substitution rates). In this study, 14,026 human genes were analyzed. In addition, the third codon positions of two-codon amino acids were analyzed. New results show the following: (a) The G+C content of the third codon position of human genes is scattered over the range 0.22-0.96. (b) The PR2 biases are similar in the range of 0.25-0.75, whereas in the high G+C range (0.75-0.96; 13% of the genes), the PR2-bias fingerprints differ from those of the major range. (c) Unlike the PR2 biases, the G+C contents of the third codon position for both four-codon and two-codon amino acids are all correlated almost perfectly with the overall G+C content of the third codon position over the entire G+C range. These results support the notion that the directional mutation pressure, rather than the directional selection pressure, is mainly responsible for the heterogeneity of the G+C content of the third codon position.
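The per-gene quantity used throughout this abstract is GC3, the G+C fraction at third codon positions. A minimal sketch of computing it and summarizing its spread across a gene set follows; the function names are hypothetical, each input is assumed to be an in-frame coding sequence with at least one complete codon, and the strict `> 0.75` cut-off for the "high G+C range" is an assumption rather than the study's stated rule. Codon filtering (for example, excluding start and stop codons) is not specified in the abstract and is not attempted here.

```python
from typing import Dict, List


def gc3(cds: str) -> float:
    """G+C fraction at third codon positions of an in-frame coding sequence."""
    s = cds.upper()
    # Step through complete codons only; ignore ambiguous bases at position 3.
    third = [s[i + 2] for i in range(0, len(s) - 2, 3) if s[i + 2] in "ACGT"]
    return sum(b in "GC" for b in third) / len(third)


def summarize_gc3(genes: List[str], high_threshold: float = 0.75) -> Dict[str, float]:
    """Hypothetical summary: GC3 range and the share of genes above the threshold."""
    values = [gc3(g) for g in genes]
    return {
        "min": min(values),
        "max": max(values),
        "high_fraction": sum(v > high_threshold for v in values) / len(values),
    }
```

For the two toy sequences "ATGGCCGCG" (GC3 = 1.0) and "ATGAAATTT" (GC3 = 1/3), the summary reports a range of 1/3 to 1.0 and a high-GC3 fraction of 0.5.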
Affiliation(s)
- N Sueoka
- University of Colorado, Department of Molecular, Cellular, and Developmental Biology, Boulder, CO 80309-0347, USA.
17
Beletskii A, Grigoriev A, Joyce S, Bhagwat AS. Mutations induced by bacteriophage T7 RNA polymerase and their effects on the composition of the T7 genome. J Mol Biol 2000; 300:1057-65. [PMID: 10903854 DOI: 10.1006/jmbi.2000.3944] [Citation(s) in RCA: 40] [Impact Index Per Article: 1.6] [Indexed: 11/22/2022]
Abstract
We show here that transcription by the bacteriophage T7 RNA polymerase increases the deamination of cytosine bases in the non-transcribed strand to uracil, causing C to T mutations in that strand. Under optimal conditions, the mutation frequency increases about fivefold over background, and is similar to that seen with the Escherichia coli RNA polymerase. Further, we found that a mutant T7 RNA polymerase with a slower rate of elongation caused more cytosine deaminations than its wild-type parent. These results suggest that promoting cytosine deamination in the non-transcribed strand is a general property of transcription in E. coli and is dependent on the length of time the transcription bubble stays open during elongation. To see if transcription-induced mutations have influenced the evolution of bacteriophage T7, we analyzed its genome for a bias in base composition. Our analysis showed a significant excess of thymine over cytosine bases in the highly transcribed regions of the genome. Moreover, the average value of this bias correlated well with the levels of transcription of different genomic regions. Our results indicate that transcription-induced mutations have altered the composition of bacteriophage T7 genome and suggest that this may be a significant force in genome evolution.
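The compositional test described above can be illustrated with a T-versus-C skew, (T−C)/(T+C), computed on the non-transcribed (coding) strand of each region, followed by a correlation with transcription levels. This is a sketch under assumptions, not the authors' analysis: the function names are hypothetical, region sequences and transcription levels are caller-supplied inputs, and Pearson correlation is used here as one reasonable choice of statistic (the abstract does not name one).

```python
import math
from typing import List


def tc_skew(non_transcribed_strand: str) -> float:
    """(T - C) / (T + C) on the non-transcribed strand; > 0 means T excess over C."""
    s = non_transcribed_strand.upper()
    t, c = s.count("T"), s.count("C")
    return (t - c) / (t + c) if t + c else 0.0


def pearson(xs: List[float], ys: List[float]) -> float:
    """Pearson correlation; assumes equal-length inputs with non-zero variance."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)
```

In use, one would compute `tc_skew` for each region and pass the resulting list, together with the matching list of transcription levels, to `pearson`; a strongly positive value would be consistent with transcription-associated C to T mutation in the non-transcribed strand.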
Affiliation(s)
- A Beletskii
- Department of Chemistry, Wayne State University, Detroit, MI 48202, USA
18
Sueoka N. Translation-coupled violation of Parity Rule 2 in human genes is not the cause of heterogeneity of the DNA G+C content of third codon position. Gene 1999; 238:53-8. [PMID: 10570983 DOI: 10.1016/s0378-1119(99)00320-0] [Citation(s) in RCA: 137] [Impact Index Per Article: 5.3] [Indexed: 10/18/2022]
Abstract
The genome of higher eukaryotes consists of genes having a widely heterogeneous base composition at the third codon position. Ubiquitous variability of the DNA base composition has the following two aspects: intragenomic heterogeneity of the G+C content and the amino-acid-specific translation-coupled biases from the Parity Rule 2 (PR2). PR2 is an intrastrand rule where A = T and G = C are expected if there is no bias in mutation and selection between the two complementary strands of DNA. To examine whether the biases from PR2 are responsible for the wide heterogeneity of the DNA G+C content in human, the third codon position of 846 human genes was analyzed. Genes were separated into six groups according to the G+C content of their third codon position, and each group was examined for translation-coupled PR2 biases in the nucleotide composition of the third codon position for two- and four-codon amino acids. The results show that genes in the different G+C content groups have similar PR2 biases, indicating that the intragenomic heterogeneity of the G+C content is not correlated with translation-coupled biases from PR2. Therefore, the heterogeneity of the G+C content is likely to be determined by a mechanism other than amino-acid-specific selection for codon preference (e.g., locally variable directional mutation pressures).
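The grouping-and-comparison design above can be sketched as follows: assign each gene to one of six GC3 groups, then pool third-position counts per four-codon amino acid and report G3/(G3+C3) and A3/(A3+T3), which are both 0.5 when there is no PR2 bias. This is a hypothetical illustration, not the study's procedure: the six equal-width bins over [0, 1] are an assumption (the abstract does not give the group boundaries), only the eight four-codon amino acids are handled, and each gene is assumed to contain at least one complete codon.

```python
import math
from collections import Counter, defaultdict
from typing import Dict, List, Tuple

# The eight four-codon amino acids, keyed by their first two codon letters.
FOUR_CODON_BOXES = {
    "GC": "Ala", "CC": "Pro", "TC": "Ser4", "AC": "Thr",
    "CG": "Arg4", "GG": "Gly", "CT": "Leu4", "GT": "Val",
}


def codons(cds: str) -> List[str]:
    """Complete codons of an in-frame coding sequence."""
    s = cds.upper()
    return [s[i:i + 3] for i in range(0, len(s) - 2, 3)]


def gc3(cds: str) -> float:
    third = [c[2] for c in codons(cds) if c[2] in "ACGT"]
    return sum(b in "GC" for b in third) / len(third)


def gc3_group(value: float, n_groups: int = 6) -> int:
    """Hypothetical equal-width binning over [0, 1]; a value of 1.0 goes to the last bin."""
    return min(int(value * n_groups), n_groups - 1)


def pooled_pr2_bias(genes: List[str], n_groups: int = 6) -> Dict[int, Dict[str, Tuple[float, float]]]:
    """Per GC3 group and amino acid: (G3/(G3+C3), A3/(A3+T3)); NaN where undefined."""
    counts: Dict[int, Dict[str, Counter]] = defaultdict(lambda: defaultdict(Counter))
    for cds in genes:
        group = gc3_group(gc3(cds), n_groups)
        for codon in codons(cds):
            aa = FOUR_CODON_BOXES.get(codon[:2])
            if aa is not None and codon[2] in "ACGT":
                counts[group][aa][codon[2]] += 1
    result: Dict[int, Dict[str, Tuple[float, float]]] = {}
    for group, per_aa in counts.items():
        result[group] = {}
        for aa, c in per_aa.items():
            gc, at = c["G"] + c["C"], c["A"] + c["T"]
            result[group][aa] = (c["G"] / gc if gc else math.nan,
                                 c["A"] / at if at else math.nan)
    return result
```

Comparing the per-amino-acid ratios across groups mirrors the abstract's question: if they stay similar while GC3 changes from group to group, translation-coupled PR2 biases are unlikely to explain the GC3 heterogeneity.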
Affiliation(s)
- N Sueoka
- University of Colorado, Department of Molecular, Cellular, and Developmental Biology, Boulder 80309-0347, USA.