1
Subramaniam S, Smith GR. RecBCD enzyme and Chi recombination hotspots as determinants of self vs. non-self: Myths and mechanisms. Adv Genet 2022; 109:1-37. [PMID: 36334915 PMCID: PMC10047805 DOI: 10.1016/bs.adgen.2022.06.001] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Indexed: 01/27/2023]
Abstract
Bacteria face a challenge when DNA enters their cells by transformation, mating, or phage infection. Should they treat this DNA as an invasive foreigner and destroy it, or consider it one of their own and potentially benefit from incorporating new genes or alleles to gain useful functions? It is frequently stated that the short nucleotide sequence Chi (5' GCTGGTGG 3'), a hotspot of homologous genetic recombination recognized by Escherichia coli's RecBCD helicase-nuclease, allows E. coli to distinguish its DNA (self) from any other DNA (non-self) and to destroy non-self DNA, and that Chi is "over-represented" in the E. coli genome. We show here that these latter statements (dogmas) are not supported by available evidence. We note Chi's wide-spread occurrence and activity in distantly related bacterial species and phages. We illustrate multiple, highly non-random features of the genomes of E. coli and coliphage P1 that account for Chi's high frequency and genomic position, leading us to propose that P1 selects for Chi's enhancement of recombination, whereas E. coli selects for the preferred codons in Chi. We discuss other, previously described mechanisms for self vs. non-self determination involving RecBCD and for RecBCD's destruction of DNA that cannot recombine, whether foreign or domestic, with or without Chi.
Affiliation(s)
- Gerald R Smith
- Division of Basic Sciences, Fred Hutchinson Cancer Center, Seattle, WA, United States.
2
Tatarinova TV, Chekalin E, Nikolsky Y, Bruskin S, Chebotarov D, McNally KL, Alexandrov N. Nucleotide diversity analysis highlights functionally important genomic regions. Sci Rep 2016; 6:35730. [PMID: 27774999 PMCID: PMC5075931 DOI: 10.1038/srep35730] [Citation(s) in RCA: 34] [Impact Index Per Article: 4.3] [Received: 05/12/2016] [Accepted: 09/30/2016] [Indexed: 12/15/2022] Open
Abstract
We analyzed functionality and relative distribution of genetic variants across the complete Oryza sativa genome, using the 40 million single nucleotide polymorphisms (SNPs) dataset from the 3,000 Rice Genomes Project (http://snp-seek.irri.org), the largest and highest density SNP collection for any higher plant. We have shown that the DNA-binding transcription factors (TFs) are the most conserved group of genes, whereas kinases and membrane-localized transporters are the most variable ones. TFs may be conserved because they belong to some of the most connected regulatory hubs that modulate transcription of vast downstream gene networks, whereas signaling kinases and transporters need to adapt rapidly to changing environmental conditions. In general, the observed profound patterns of nucleotide variability reveal functionally important genomic regions. As expected, nucleotide diversity is much higher in intergenic regions than within gene bodies (regions spanning gene models), and protein-coding sequences are more conserved than untranslated gene regions. We have observed a sharp decline in nucleotide diversity that begins at about 250 nucleotides upstream of the transcription start and reaches minimal diversity exactly at the transcription start. We found the transcription termination sites to have remarkably symmetrical patterns of SNP density, implying presence of functional sites near transcription termination. Also, nucleotide diversity was significantly lower near 3′ UTRs, the area rich with regulatory regions.
Affiliation(s)
- Tatiana V Tatarinova
- Center for Personalized Medicine and Spatial Sciences Institute, University of Southern California, Los Angeles, CA, USA.,Kharkevich Institute for Information Transmission Problems, Russian Academy of Sciences, Moscow, Russian Federation
- Yuri Nikolsky
- Vavilov Institute of General Genetics, Moscow, Russia; F1 Genomics, San Diego, CA, USA; School of Systems Biology, George Mason University, VA, USA
- Dmitry Chebotarov
- International Rice Research Institute, Los Baños, Laguna 4031, Philippines
- Kenneth L McNally
- International Rice Research Institute, Los Baños, Laguna 4031, Philippines
3
Characterization of the intergenic spacer rDNAs of two pig nodule worms, Oesophagostomum dentatum and O. quadrispinulatum. ScientificWorldJournal 2014; 2014:147963. [PMID: 25197691 PMCID: PMC4147281 DOI: 10.1155/2014/147963] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Received: 03/10/2014] [Revised: 07/16/2014] [Accepted: 07/17/2014] [Indexed: 01/22/2023] Open
Abstract
The characteristics of the intergenic spacer rDNAs (IGS rDNAs) of Oesophagostomum dentatum and O. quadrispinulatum isolated from pigs in different geographical locations in Mainland China were determined, and the phylogenetic relationships of the two species were reconstructed using the IGS rDNA sequences. The organization of the IGS rDNA sequences was similar to their organization in other eukaryotes. The 28S-18S IGS rDNA sequences of both O. dentatum and O. quadrispinulatum were found to have variable lengths, that is, 759-762 bp and 937-1128 bp, respectively. All of the sequences contained direct repeats and inverted repeats. The length polymorphisms were related to the different numbers and organization of repetitive elements. Different types and numbers of repeats were found between the two pig nodule species, and two IGS structures were found within O. quadrispinulatum. Phylogenetic analysis showed that all O. dentatum isolates were clustered into one clade, but O. quadrispinulatum isolates from different origins were grouped into two distinct clusters. These results suggested independent species and the existence of genotypes or subspecies within pig nodule worms. Different types and numbers of repeats and IGS rDNA structures could serve as potential markers for differentiating these two species of pig nodule worms.
4
Mahale KN, Kempraj V, Dasgupta D. Does the growth temperature of a prokaryote influence the purine content of its mRNAs? Gene 2012; 497:83-9. [PMID: 22305982 DOI: 10.1016/j.gene.2012.01.040] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.3] [Received: 11/22/2011] [Accepted: 01/19/2012] [Indexed: 11/20/2022]
Abstract
The formation and breaking of hydrogen bonds between nucleic acid bases are dependent on temperature. The high G+C content of organisms was surmised to be an adaptation for high temperature survival because of the thermal stability of G:C pairs. However, a survey of genomic GC% and optimum growth temperature (OGT) of several prokaryotes revoked any direct relation between them. Significantly high purine (R=A or G) content in mRNAs is also seen as a selective response for survival among thermophiles. Nevertheless, the biological relevance of thermophiles loading their unstable mRNAs with excess purines (purine-loading or R-loading) is not persuasive. Here, we analysed the mRNA sequences from the genomes of 168 prokaryotes (as obtained from NCBI Genome database) with their OGTs ranging from -5 °C to 100 °C to verify the relation between R-loading and OGT. Our analysis fails to demonstrate any correlation between R-loading of the mRNA pool and OGT of a prokaryote. The percentage of purine-loaded mRNAs in prokaryotes is found to be in a rough negative correlation with the genomic GC% (r² = 0.655, slope = -1.478, P < 0.001). We conclude that genomic GC% and bias against certain combinations of nucleotides drive the mRNA-synonymous (sense) strands of DNA towards variations in R-loading.
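The quantity surveyed in this abstract can be illustrated with a short sketch. It assumes that purine loading is measured as the excess of purines over pyrimidines per kilobase, and that an mRNA counts as "purine-loaded" when that excess is positive. Both the index and the cutoff are assumptions made for illustration, and the function names are hypothetical, not taken from the paper.

```python
def purine_loading_per_kb(mrna: str) -> float:
    """Excess of purines (A, G) over pyrimidines (C, U/T) per 1000 bases."""
    seq = mrna.upper().replace("U", "T")
    purines = seq.count("A") + seq.count("G")
    pyrimidines = seq.count("C") + seq.count("T")
    total = purines + pyrimidines
    if total == 0:
        raise ValueError("sequence contains no A/C/G/T/U bases")
    return 1000.0 * (purines - pyrimidines) / total


def percent_purine_loaded(mrnas: list[str]) -> float:
    """Percentage of mRNAs with a positive purine excess (assumed cutoff of 0)."""
    if not mrnas:
        raise ValueError("no mRNA sequences supplied")
    loaded = sum(1 for m in mrnas if purine_loading_per_kb(m) > 0)
    return 100.0 * loaded / len(mrnas)
```

Computed per genome, `percent_purine_loaded` is the kind of value that could then be regressed against genomic GC% or OGT.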
5
Tatarinova TV, Alexandrov NN, Bouck JB, Feldmann KA. GC3 biology in corn, rice, sorghum and other grasses. BMC Genomics 2010; 11:308. [PMID: 20470436 PMCID: PMC2895627 DOI: 10.1186/1471-2164-11-308] [Citation(s) in RCA: 105] [Impact Index Per Article: 7.5] [Received: 09/16/2009] [Accepted: 05/16/2010] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND The third, or wobble, position in a codon provides a high degree of possible degeneracy and is an elegant fault-tolerance mechanism. Nucleotide biases between organisms at the wobble position have been documented and correlated with the abundances of the complementary tRNAs. We and others have noticed a bias for cytosine and guanine at the third position in a subset of transcripts within a single organism. The bias is present in some plant species and warm-blooded vertebrates but not in all plants, or in invertebrates or cold-blooded vertebrates. RESULTS Here we demonstrate that in certain organisms the amount of GC at the wobble position (GC3) can be used to distinguish two classes of genes. We highlight the following features of genes with high GC3 content: they (1) provide more targets for methylation, (2) exhibit more variable expression, (3) more frequently possess upstream TATA boxes, (4) are predominant in certain classes of genes (e.g., stress responsive genes) and (5) have a GC3 content that increases from 5' to 3'. These observations led us to formulate a hypothesis to explain GC3 bimodality in grasses. CONCLUSIONS Our findings suggest that high levels of GC3 typify a class of genes whose expression is regulated through DNA methylation or are a legacy of accelerated evolution through gene conversion. We discuss the three most probable explanations for GC3 bimodality: biased gene conversion, transcriptional and translational advantage and gene methylation.
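GC3 as described here can be computed directly from a coding sequence. The sketch below is a minimal illustration, assuming an in-frame coding sequence that starts at codon position 1. The function names and the equal-sized segmentation used to look at the 5' to 3' trend are hypothetical choices, not the authors' pipeline.

```python
def third_bases(cds: str) -> str:
    """Return the third (wobble) base of every complete codon."""
    seq = cds.upper()
    usable = len(seq) - len(seq) % 3  # drop any trailing partial codon
    return seq[2:usable:3]


def gc3(cds: str) -> float:
    """Fraction of codons whose third base is G or C."""
    wobble = third_bases(cds)
    if not wobble:
        raise ValueError("sequence is shorter than one codon")
    return sum(base in "GC" for base in wobble) / len(wobble)


def gc3_by_segment(cds: str, n_segments: int = 2) -> list[float]:
    """GC3 in consecutive 5'-to-3' segments of roughly equal codon count."""
    wobble = third_bases(cds)
    if len(wobble) < n_segments:
        raise ValueError("fewer codons than segments")
    size = len(wobble) // n_segments
    values = []
    for i in range(n_segments):
        # The last segment absorbs any leftover codons.
        end = (i + 1) * size if i < n_segments - 1 else len(wobble)
        chunk = wobble[i * size:end]
        values.append(sum(base in "GC" for base in chunk) / len(chunk))
    return values
```

A histogram of `gc3` over all genes of a genome would show whether the distribution is unimodal or bimodal; `gc3_by_segment` gives a crude view of the within-gene gradient.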
Affiliation(s)
- Tatiana V Tatarinova
- Department of Biomedical Engineering, Georgia Institute of Technology, Atlanta, Georgia 30332, USA.
6
RecBCD enzyme and the repair of double-stranded DNA breaks. Microbiol Mol Biol Rev 2009; 72:642-71, Table of Contents. [PMID: 19052323 DOI: 10.1128/mmbr.00020-08] [Citation(s) in RCA: 404] [Impact Index Per Article: 26.9] [Indexed: 12/22/2022] Open
Abstract
The RecBCD enzyme of Escherichia coli is a helicase-nuclease that initiates the repair of double-stranded DNA breaks by homologous recombination. It also degrades linear double-stranded DNA, protecting the bacteria from phages and extraneous chromosomal DNA. The RecBCD enzyme is, however, regulated by a cis-acting DNA sequence known as Chi (crossover hotspot instigator) that activates its recombination-promoting functions. Interaction with Chi causes an attenuation of the RecBCD enzyme's vigorous nuclease activity, switches the polarity of the attenuated nuclease activity to the 5' strand, changes the operation of its motor subunits, and instructs the enzyme to begin loading the RecA protein onto the resultant Chi-containing single-stranded DNA. This enzyme is a prototypical example of a molecular machine: the protein architecture incorporates several autonomous functional domains that interact with each other to produce a complex, sequence-regulated, DNA-processing machine. In this review, we discuss the biochemical mechanism of the RecBCD enzyme with particular emphasis on new developments relating to the enzyme's structure and DNA translocation mechanism.
7
Microsatellites that violate Chargaff's second parity rule have base order-dependent asymmetries in the folding energies of complementary DNA strands and may not drive speciation. J Theor Biol 2008; 254:168-77. [DOI: 10.1016/j.jtbi.2008.05.013] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.6] [Received: 04/07/2008] [Revised: 05/16/2008] [Accepted: 05/16/2008] [Indexed: 11/21/2022]
8
Sernova NV, Gelfand MS. Identification of replication origins in prokaryotic genomes. Brief Bioinform 2008; 9:376-91. [PMID: 18660512 DOI: 10.1093/bib/bbn031] [Citation(s) in RCA: 47] [Impact Index Per Article: 2.9] [Indexed: 11/12/2022] Open
Abstract
The availability of hundreds of complete bacterial genomes has created new challenges and simultaneously opportunities for bioinformatics. In the area of statistical analysis of genomic sequences, the studies of nucleotide compositional bias and gene bias between strands and replichores paved way to the development of tools for prediction of bacterial replication origins. Only a few (about 20) origin regions for eubacteria and archaea have been proven experimentally. One reason for that may be that this is now considered as an essentially bioinformatics problem, where predictions are sufficiently reliable not to run labor-intensive experiments, unless specifically needed. Here we describe the main existing approaches to the identification of replication origin (oriC) and termination (terC) loci in prokaryotic chromosomes and characterize a number of computational tools based on various skew types and other types of evidence. We also classify the eubacterial and archaeal chromosomes by predictability of their replication origins using skew plots. Finally, we discuss possible combined approaches to the identification of the oriC sites that may be used to improve the prediction tools, in particular, the analysis of DnaA binding sites using the comparative genomic methods.
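One of the skew-based approaches mentioned here can be sketched as a cumulative GC-skew walk. The example assumes the common convention that the leading strand is G-rich, so the cumulative skew reaches its minimum near the origin and its maximum near the terminus. It treats the sequence as a linear string and uses a hypothetical default window size. It is a toy illustration rather than any of the published tools.

```python
def cumulative_gc_skew(genome: str, window: int = 1000) -> list[float]:
    """Running sum of (G - C) / (G + C) over consecutive windows."""
    seq = genome.upper()
    totals = []
    running = 0.0
    for start in range(0, len(seq), window):
        chunk = seq[start:start + window]
        g, c = chunk.count("G"), chunk.count("C")
        if g + c:  # windows without G or C contribute nothing
            running += (g - c) / (g + c)
        totals.append(running)
    return totals


def predict_origin_terminus(genome: str, window: int = 1000) -> tuple[int, int]:
    """Approximate (origin, terminus) coordinates from the skew extremes.

    Each cumulative value describes the end of its window, so the
    coordinate reported is the window's end position.
    """
    skew = cumulative_gc_skew(genome, window)
    if not skew:
        raise ValueError("empty sequence")
    ori_window = min(range(len(skew)), key=skew.__getitem__)
    ter_window = max(range(len(skew)), key=skew.__getitem__)
    return (ori_window + 1) * window, (ter_window + 1) * window
```

On a real chromosome the prediction would normally be cross-checked against other evidence, such as the DnaA binding sites the abstract mentions.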
Affiliation(s)
- Natalia V Sernova
- Institute for Information Transmission Problems (Kharkevich Institute), Russian Academy of Sciences, Bolshoi Karetny pereulok, 19, Moscow, 127994, Russia
9
Dreszer TR, Wall GD, Haussler D, Pollard KS. Biased clustered substitutions in the human genome: the footprints of male-driven biased gene conversion. Genome Res 2007; 17:1420-30. [PMID: 17785536 PMCID: PMC1987345 DOI: 10.1101/gr.6395807] [Citation(s) in RCA: 89] [Impact Index Per Article: 5.2] [Indexed: 11/25/2022]
Abstract
We examined fixed substitutions in the human lineage since divergence from the common ancestor with the chimpanzee, and determined what fraction are AT to GC (weak-to-strong). Substitutions that are densely clustered on the chromosomes show a remarkable excess of weak-to-strong "biased" substitutions. These unexpected biased clustered substitutions (UBCS) are common near the telomeres of all autosomes but not the sex chromosomes. Regions of extreme bias are enriched for genes. Human and chimp orthologous regions show a striking similarity in the shape and magnitude of their respective UBCS maps, suggesting a relatively stable force leads to clustered bias. The strong and stable signal near telomeres may have participated in the evolution of isochores. One exception to the UBCS pattern found in all autosomes is chromosome 2, which shows a UBCS peak midchromosome, mapping to the fusion site of two ancestral chromosomes. This provides evidence that the fusion occurred as recently as 740,000 years ago and no more than approximately 3 million years ago. No biased clustering was found in SNPs, suggesting that clusters of biased substitutions are selected from mutations. UBCS is strongly correlated with male (and not female) recombination rates, which explains the lack of UBCS signal on chromosome X. These observations support the hypothesis that biased gene conversion (BGC), specifically in the male germline, played a significant role in the evolution of the human genome.
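The weak-to-strong measure can be illustrated with a few lines of code. The sketch assumes substitutions are supplied as (ancestral, derived) base pairs and reports the weak-to-strong share among substitutions that change GC content. That denominator is an assumption made for illustration; the paper may normalise differently.

```python
WEAK = {"A", "T"}
STRONG = {"G", "C"}


def weak_to_strong_fraction(substitutions) -> float:
    """Share of AT->GC changes among all GC-altering substitutions.

    `substitutions` is an iterable of (ancestral_base, derived_base) pairs.
    Changes within a class (A<->T, G<->C) are ignored.
    """
    w2s = s2w = 0
    for ancestral, derived in substitutions:
        ancestral, derived = ancestral.upper(), derived.upper()
        if ancestral in WEAK and derived in STRONG:
            w2s += 1
        elif ancestral in STRONG and derived in WEAK:
            s2w += 1
    if w2s + s2w == 0:
        raise ValueError("no GC-altering substitutions supplied")
    return w2s / (w2s + s2w)
```

Applying the function separately to clustered and isolated substitutions would show whether the clustered set is enriched for weak-to-strong changes.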
MESH Headings
- Animals
- Chromosomes, Human, Pair 2/genetics
- Chromosomes, Human, X/genetics
- Chromosomes, Human, Y/genetics
- Evolution, Molecular
- Female
- Gene Conversion
- Gene Fusion
- Genome, Human
- Humans
- Male
- Models, Genetic
- Pan troglodytes/genetics
- Polymorphism, Single Nucleotide
- Recombination, Genetic
- Sex Characteristics
- Species Specificity
- Telomere/genetics
- Time Factors
Affiliation(s)
- Timothy R. Dreszer
- Department of Biomolecular Engineering, University of California, Santa Cruz, California 95064, USA
- Gregory D. Wall
- Department of Statistics, University of California, Davis, California 95616, USA
- David Haussler
- Department of Biomolecular Engineering, University of California, Santa Cruz, California 95064, USA
- Howard Hughes Medical Institute, University of California, Santa Cruz, California 95064, USA
- Corresponding authors. Fax (831) 459-1809; fax (530) 754-9658.
- Katherine S. Pollard
- Department of Statistics, University of California, Davis, California 95616, USA
- UC Davis Genome Center, University of California, Davis, California 95616, USA
10
Arakawa K, Uno R, Nakayama Y, Tomita M. Validating the significance of genomic properties of Chi sites from the distribution of all octamers in Escherichia coli. Gene 2007; 392:239-46. [PMID: 17270364 DOI: 10.1016/j.gene.2006.12.022] [Citation(s) in RCA: 14] [Impact Index Per Article: 0.8] [Received: 08/19/2006] [Revised: 12/15/2006] [Accepted: 12/18/2006] [Indexed: 10/23/2022]
Abstract
Chi sites (5'-GCTGGTGG-3') are homologous recombinational hotspot octamer sequences, which attenuate the exonuclease activity of RecBCD in Escherichia coli. They are overrepresented in the genome (1008 occurrences), preferentially located within coding regions (98%), oriented in the direction of replication (75%), and occur most commonly on the mRNA-synonymous sense strand of the double helix (79%). Previous statistical studies of the genome sequence suggested that these genomic properties of Chi sites appear to be related to their role in recombinational repair and therefore to replication and transcription. In this study, we employ three mathematical models to predict the properties of Chi sites from single nucleotide and multi-nucleotide compositions, and validate them statistically using the distribution of all octamer sequences in the entire genome, or exclusively within ORFs. The model based on the overall distribution of all octamers provided better predictions than the single nucleotide composition model, and the ORF and sense strand preference of Chi sites were shown to be within the standard deviation of all octamers. In contrast, the orientation bias of the Chi sites in the direction of replication was significant, although the bias was not as pronounced as with the single nucleotide composition model, suggesting a selective pressure related to the role of RecBCD in replication.
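The simplest of the models described, the single-nucleotide composition model, can be sketched as follows. The code counts an octamer on both strands and compares it with the count expected if bases were drawn independently at their genomic frequencies. Function names are hypothetical. The motif is assumed to be non-palindromic, as Chi is; otherwise occurrences would be double-counted. The sequence is treated as linear.

```python
import math

CHI = "GCTGGTGG"
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    return seq.translate(_COMPLEMENT)[::-1]


def count_overlapping(seq: str, motif: str) -> int:
    """Count occurrences of motif in seq, allowing overlaps."""
    count, pos = 0, seq.find(motif)
    while pos != -1:
        count += 1
        pos = seq.find(motif, pos + 1)
    return count


def observed_both_strands(genome: str, motif: str = CHI) -> int:
    """Occurrences of the motif on the given strand plus its complement."""
    seq = genome.upper()
    return count_overlapping(seq, motif) + count_overlapping(
        seq, reverse_complement(motif)
    )


def expected_from_base_composition(genome: str, motif: str = CHI) -> float:
    """Expected two-strand count under independent base frequencies."""
    seq = genome.upper()
    freq = {base: seq.count(base) / len(seq) for base in "ACGT"}
    positions = len(seq) - len(motif) + 1

    def word_probability(word: str) -> float:
        return math.prod(freq[base] for base in word)

    return positions * (
        word_probability(motif) + word_probability(reverse_complement(motif))
    )
```

The ratio of `observed_both_strands` to `expected_from_base_composition` is one crude over-representation score. The abstract's point is that ranking Chi against the distribution of all octamers is a stricter test than this baseline.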
Affiliation(s)
- Kazuharu Arakawa
- Institute for Advanced Biosciences, Keio University, Fujisawa 252-8520, Japan
11
Koepsell SA, Larson MA, Griep MA, Hinrichs SH. Staphylococcus aureus helicase but not Escherichia coli helicase stimulates S. aureus primase activity and maintains initiation specificity. J Bacteriol 2006; 188:4673-80. [PMID: 16788176 PMCID: PMC1482979 DOI: 10.1128/jb.00316-06] [Citation(s) in RCA: 24] [Impact Index Per Article: 1.3] [Received: 03/03/2006] [Accepted: 04/17/2006] [Indexed: 11/20/2022] Open
Abstract
Bacterial primases are essential for DNA replication due to their role in polymerizing the formation of short RNA primers repeatedly on the lagging-strand template and at least once on the leading-strand template. The ability of recombinant Staphylococcus aureus DnaG primase to utilize different single-stranded DNA templates was tested using oligonucleotides of the sequence 5'-CAGA (CA)5 XYZ (CA)3-3', where XYZ represented the variable trinucleotide. These experiments demonstrated that S. aureus primase synthesized RNA primers predominately on templates containing 5'-d(CTA)-3' or TTA and to a much lesser degree on GTA-containing templates, in contrast to results seen with the Escherichia coli DnaG primase recognition sequence 5'-d(CTG)-3'. Primer synthesis was initiated complementarily to the middle nucleotide of the recognition sequence, while the third nucleotide, an adenosine, was required to support primer synthesis but was not copied into the RNA primer. The replicative helicases from both S. aureus and E. coli were tested for their ability to stimulate either S. aureus or E. coli primase. Results showed that each bacterial helicase could only stimulate the cognate bacterial primase. In addition, S. aureus helicase stimulated the production of full-length primers, whereas E. coli helicase increased the synthesis of only short RNA polymers. These studies identified important differences between E. coli and S. aureus related to DNA replication and suggest that each bacterial primase and helicase may have adapted unique properties optimized for replication.
Affiliation(s)
- Scott A Koepsell
- Department of Microbiology and Pathology, University of Nebraska Medical Center, Omaha, Nebraska 68198-6495, USA
12
Abstract
The replication of the chromosome is among the most essential functions of the bacterial cell and influences many other cellular mechanisms, from gene expression to cell division. Yet the way it impacts on the bacterial chromosome was not fully acknowledged until the availability of complete genomes allowed one to look upon genomes as more than bags of genes. Chromosomal replication includes a set of asymmetric mechanisms, among which are a division in a lagging and a leading strand and a gradient between early and late replicating regions. These differences are the causes of many of the organizational features observed in bacterial genomes, in terms of both gene distribution and sequence composition along the chromosome. When asymmetries or gradients increase in some genomes, e.g. due to a different composition of the DNA polymerase or to a higher growth rate, so do the corresponding biases. As some of the features of the chromosome structure seem to be under strong selection, understanding such biases is important for the understanding of chromosome organization and adaptation. Inversely, understanding chromosome organization may shed further light on questions relating to replication and cell division. Ultimately, the understanding of the interplay between these different elements will allow a better understanding of bacterial genetics and evolution.
Affiliation(s)
- Eduardo P C Rocha
- Atelier de Bioinformatique, Université Pierre et Marie Curie, 12, Rue Cuvier, 75005 Paris, and Unité Génétique des Génomes Bactériens, Institut Pasteur, 28 rue du Dr Roux, 75724 Paris Cedex 15, France
13
Vinogradov AE. Isochores and tissue-specificity. Nucleic Acids Res 2003; 31:5212-20. [PMID: 12930973 PMCID: PMC212799 DOI: 10.1093/nar/gkg699] [Citation(s) in RCA: 97] [Impact Index Per Article: 4.6] [Received: 03/21/2003] [Revised: 05/11/2003] [Accepted: 07/03/2003] [Indexed: 11/13/2022] Open
Abstract
The housekeeping (ubiquitously expressed) genes in the mammal genome were shown here to be on average slightly GC-richer than tissue-specific genes. Both housekeeping and tissue-specific genes occupy similar ranges of GC content, but the former tend to concentrate in the upper part of the range. In the human genome, tissue-specific genes show two maxima, GC-poor and GC-rich. The strictly tissue-specific human genes tend to concentrate in the GC-poor region; their distribution is left-skewed and thus reciprocal to the distribution of housekeeping genes. The intermediately tissue-specific genes show an intermediate GC content and the right-skewed distribution. Both in the human and mouse, genes specific for some tissues (e.g., parts of the central nervous system) have a higher average GC content than housekeeping genes. Since they are not transcribed in the germ line (in contrast to housekeeping genes), and therefore have a lower probability of inheritable gene conversion, this finding contradicts the biased gene conversion (BGC) explanation for elevated GC content in the heavy isochores of mammal genome. Genes specific for germ-line tissues (ovary, testes) show a low average GC content, which is also in contradiction to the BGC explanation. Both for the total data set and for the most part of tissues taken separately, a weak positive correlation was found between gene GC content and expression level. The fraction of ubiquitously expressed genes is nearly 1.5-fold higher in the mouse than in the human. This suggests that mouse tissues are comparatively less differentiated (on the molecular level), which can be related to a less pronounced isochoric structure of the mouse genome. 
In each separate tissue (in both species), tissue-specific genes do not form a clear-cut frequency peak (in contrast to housekeeping genes), but constitute a continuum with a gradually increasing degree of tissue-specificity, which probably reflects the path of cell differentiation and/or an independent use of the same protein in several unrelated tissues.
Affiliation(s)
- Alexander E Vinogradov
- Institute of Cytology, Russian Academy of Sciences, Tikhoretsky Avenue 4, St Petersburg 194064, Russia.
14
Tayebi N, Stubblefield BK, Park JK, Orvisky E, Walker JM, LaMarca ME, Sidransky E. Reciprocal and nonreciprocal recombination at the glucocerebrosidase gene region: implications for complexity in Gaucher disease. Am J Hum Genet 2003; 72:519-34. [PMID: 12587096 PMCID: PMC1180228 DOI: 10.1086/367850] [Citation(s) in RCA: 91] [Impact Index Per Article: 4.3] [Received: 09/30/2002] [Accepted: 11/26/2002] [Indexed: 11/03/2022] Open
Abstract
Gaucher disease results from an autosomal recessive deficiency of the lysosomal enzyme glucocerebrosidase. The glucocerebrosidase gene is located in a gene-rich region of 1q21 that contains six genes and two pseudogenes within 75 kb. The presence of contiguous, highly homologous pseudogenes for both glucocerebrosidase and metaxin at the locus increases the likelihood of DNA rearrangements in this region. These recombinations can complicate genotyping in patients with Gaucher disease and contribute to the difficulty in interpreting genotype-phenotype correlations in this disorder. In the present study, DNA samples from 240 patients with Gaucher disease were examined using several complementary approaches to identify and characterize recombinant alleles, including direct sequencing, long-template polymerase chain reaction, polymorphic microsatellite repeats, and Southern blots. Among the 480 alleles studied, 59 recombinant alleles were identified, including 34 gene conversions, 18 fusions, and 7 downstream duplications. Twenty-two percent of the patients evaluated had at least one recombinant allele. Twenty-six recombinant alleles were found among 310 alleles from patients with type 1 disease, 18 among 74 alleles from patients with type 2 disease, and 15 among 96 alleles from patients with type 3 disease. Several patients carried two recombinations or mutations on the same allele. Generally, alleles resulting from nonreciprocal recombination (gene conversion) could be distinguished from those arising by reciprocal recombination (crossover and exchange), and the length of the converted sequence was determined. Homozygosity for a recombinant allele was associated with early lethality. Ten different sites of crossover and a shared pentamer motif sequence (CACCA) that could be a hotspot for recombination were identified. 
These findings contribute to a better understanding of genotype-phenotype relationships in Gaucher disease and may provide insights into the mechanisms of DNA rearrangement in other disorders.
Affiliation(s)
- Nahid Tayebi
- Clinical Neuroscience Branch, National Institute of Mental Health, National Institutes of Health, Bethesda, MD 20892, USA
15
Cristillo AD, Mortimer JR, Barrette IH, Lillicrap TP, Forsdyke DR. Double-stranded RNA as a not-self alarm signal: to evade, most viruses purine-load their RNAs, but some (HTLV-1, Epstein-Barr) pyrimidine-load. J Theor Biol 2001; 208:475-91. [PMID: 11222051 DOI: 10.1006/jtbi.2000.2233] [Citation(s) in RCA: 37] [Impact Index Per Article: 1.6] [Indexed: 11/22/2022]
Abstract
For double-stranded RNA (dsRNA) to signal the presence of foreign (non-self) nucleic acid, self-RNA-self-RNA interactions should be minimized. Indeed, self-RNAs appear to have been fine-tuned over evolutionary time by the introduction of purines in clusters in the loop regions of stem-loop structures. This adaptation should militate against the "kissing" interactions which initiate formation of dsRNA. Our analyses of virus base compositions suggest that, to avoid triggering the host cell's dsRNA surveillance mechanism, most viruses purine-load their RNAs to resemble host RNAs ("stealth" strategy). However, some GC-rich latent viruses (HTLV-1, EBV) pyrimidine-load their RNAs. It is suggested that when virus production begins, these RNAs suddenly increase in concentration and impair host mRNA function by virtue of an excess of complementary "kissing" interactions ("surprise" strategy). Remarkably, the only mRNA expressed in the most fundamental form of EBV latency (the "EBNA-1 program") is purine-loaded. This apparent stealth strategy is reinforced by a simple sequence repeat which prefers purine-rich codons. During latent infection the EBNA-1 protein may evade recognition by cytotoxic T-cells, not by virtue of containing a simple sequence amino acid repeat as has been proposed, but by virtue of the encoding mRNA being purine-loaded to prevent interactions with host RNAs of either genic or non-genic origin.
Affiliation(s)
- A D Cristillo
- Department of Biochemistry, Queen's University, Kingston, Ontario, K7L3N6, Canada
16
Uno R, Nakayama Y, Arakawa K, Tomita M. The orientation bias of Chi sequences is a general tendency of G-rich oligomers. Gene 2000; 259:207-15. [PMID: 11163978 DOI: 10.1016/s0378-1119(00)00430-3] [Citation(s) in RCA: 18] [Impact Index Per Article: 0.8] [Indexed: 10/18/2022]
Abstract
The Chi sequences are specific oligomers that stimulate DNA repair by homologous recombination, and are different sequences in each organism. Approximately 75% of the copies of the Chi sequence (5'-GCTGGTGG-3') of Escherichia coli reside on the leading strand, and this orientation bias is often believed to be a consequence of the biological role of Chi sequences as the signal sequence of RecBCD pathway in DNA replication. However, our computer analysis found that many G-rich oligomers also show this asymmetric orientation pattern. The shift in the Chi orientation bias appears around the replication origin and terminus, but these locations are also coincident with the shift points in GC content or GC skew. We conducted the same analysis with the genome of Bacillus subtilis, and found that in addition to Chi, other G-rich oligomers show similar asymmetric orientation patterns, whose shift points were coincident with those of the GC skew. However, the genome of Haemophilus influenzae Rd, whose GC skew is not so pronounced, does not clearly show asymmetric orientation patterns of Chi or other G-rich oligomers. These results lead us to suggest that the uneven distribution of the Chi orientation between the two strands of the double helix is mostly due to the uneven distribution of G content (GC skew) and that the replication-related function of Chi sequences is not the primary factor responsible for the evolutionary pressure causing the orientation bias.
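The orientation statistic discussed here, the share of oligomer copies that lie on the leading strand, can be sketched as below. The example assumes known origin and terminus coordinates on a chromosome written as a linear string, and takes the written strand to be the leading strand between origin and terminus and the complementary strand to be leading elsewhere. Function names and coordinates are hypothetical. Occurrences spanning the end of the string are not detected.

```python
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    return seq.translate(_COMPLEMENT)[::-1]


def find_all(seq: str, motif: str) -> list[int]:
    """Start positions of every (possibly overlapping) occurrence."""
    hits, pos = [], seq.find(motif)
    while pos != -1:
        hits.append(pos)
        pos = seq.find(motif, pos + 1)
    return hits


def leading_strand_fraction(genome: str, motif: str, ori: int, ter: int) -> float:
    """Fraction of motif copies (both strands) lying on the leading strand."""
    seq = genome.upper()

    def in_ori_to_ter(pos: int) -> bool:
        # Handles a replichore that wraps past the end of the string.
        if ori <= ter:
            return ori <= pos < ter
        return pos >= ori or pos < ter

    leading = total = 0
    for pos in find_all(seq, motif.upper()):  # copy on the written strand
        total += 1
        leading += in_ori_to_ter(pos)
    for pos in find_all(seq, reverse_complement(motif.upper())):  # other strand
        total += 1
        leading += not in_ori_to_ter(pos)
    if total == 0:
        raise ValueError("motif not found on either strand")
    return leading / total
```

Running the same function over Chi and over other G-rich octamers would allow the comparison the abstract describes: if the other oligomers show a similar fraction, the bias is not specific to Chi.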
Affiliation(s)
- R Uno
- Laboratory for Bioinformatics, Keio University, 5322 Endo, Fujisawa, Kanagawa 252-8502, Japan
|
17
|
Lao PJ, Forsdyke DR. Crossover hot-spot instigator (Chi) sequences in Escherichia coli occupy distinct recombination/transcription islands. Gene 2000; 243:47-57. [PMID: 10675612 DOI: 10.1016/s0378-1119(99)00564-8] [Citation(s) in RCA: 15] [Impact Index Per Article: 0.6] [Indexed: 11/28/2022]
Abstract
Crossover hot-spot instigator (Chi) sequences (5'-GCTGGTGG-3') are orientation-dependent, strand-specific sequences implicated in RecA-mediated DNA recombination. In Escherichia coli and Haemophilus influenzae Chi and Chi-like sequences preferentially locate to approx. 1kb recombination 'islands' in the mRNA-synonymous strands of open reading frames (ORFs). Since mRNA-synonymous strands follow Szybalski's transcription direction rule in being G-rich, and the average ORF is about 1kb, then, on this basis alone, Chi sequences are seen to reside in 1kb G-rich 'islands'. However, RecA preferentially binds GT-rich sequences, suggesting that genomic context might potentiate Chi action. Consistent with this, we report for E. coli that 1kb sequence windows with Chi near their centres are a distinct subset of total 1kb windows, the mRNA-synonymous strands being preferentially enriched in both G and T. Chi function might be particularly important for bacteria that survive high temperature and radiation. These often exist in habitats where recombination with E. coli DNA would be unlikely, so canonical Chi sequences might not confer a selective disadvantage in this respect. In general, Chi sequences are not more frequent in thermophilic bacteria and Deinococcus radiodurans, than in E. coli and other mesophilic bacteria. Only two of five thermophilic bacteria examined showed preferential location of Chi sequences to mRNA-synonymous strands. In the thermophile Methanococcus jannaschii, windows containing the canonical Chi sequence do not form a distinct subset. We suggest that in thermophilic bacteria and D. radiodurans the Chi function may be achieved by sequences that differ from the canonical Chi sequence, or that the number of these sequences is sufficient, or that the Chi function is unnecessary.
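The window analysis mentioned in this abstract (sequence windows with Chi near their centres, compared for G and T content) can be sketched as follows. This is only an illustration under stated assumptions: the helper names are hypothetical, the window is centred on the motif as closely as integer arithmetic allows, and windows that would run past either end of the sequence are skipped. The paper's actual windowing and statistics may differ.

```python
def centred_windows(seq, motif, width):
    """Yield windows of the given width with each motif hit near the centre.

    Hits too close to either end of seq for a full window are skipped.
    """
    seq = seq.upper()
    flank = (width - len(motif)) // 2  # bases to the left of the motif
    pos = seq.find(motif)
    while pos != -1:
        start = pos - flank
        end = start + width
        if start >= 0 and end <= len(seq):
            yield seq[start:end]
        pos = seq.find(motif, pos + 1)


def g_and_t_fractions(window):
    """Return (fraction of G, fraction of T) in a window."""
    n = len(window)
    if n == 0:
        return (0.0, 0.0)
    return (window.count("G") / n, window.count("T") / n)
```

With a real genome one would call `centred_windows(genome, "GCTGGTGG", 1000)` and compare the resulting G and T fractions with those of all 1 kb windows.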
Affiliation(s)
- P J Lao
- Department of Biochemistry, Queen's University, Kingston, Canada
|
18
|
Lao PJ, Forsdyke DR. Thermophilic bacteria strictly obey Szybalski's transcription direction rule and politely purine-load RNAs with both adenine and guanine. Genome Res 2000; 10:228-36. [PMID: 10673280 PMCID: PMC310832 DOI: 10.1101/gr.10.2.228] [Citation(s) in RCA: 83] [Impact Index Per Article: 3.5] [Received: 08/23/1999] [Accepted: 12/16/1999] [Indexed: 11/24/2022]
Abstract
When transcription is to the right of the promoter, the "top," mRNA-synonymous strand of DNA tends to be purine-rich. When transcription is to the left of the promoter, the top, mRNA-template strand tends to be pyrimidine-rich. This transcription-direction rule suggests that there has been an evolutionary selection pressure for the purine-loading of RNAs. The politeness hypothesis states that purine-loading prevents distracting RNA-RNA interactions and excessive formation of double-stranded RNA, which might trigger various intracellular alarms. Because RNA-RNA interactions have a distinct entropy-driven component, the pressure for the evolution of purine-loading might be greater in organisms living at high temperatures. In support of this, we find that Chargaff differences (a measure of purine-loading) are greater in thermophiles than in nonthermophiles and extend to both purine bases. In thermophiles the pressure to purine-load affects codon choice, indicating that some features of their amino acid composition (e.g., high levels of glutamic acid) might reflect purine-loading pressure (i.e., constraints on mRNA) rather than direct constraints on protein structure and function.
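As a rough illustration of the "Chargaff differences" named in this abstract, the sketch below measures how far a single strand departs from A = T and G = C. The normalization used here, (A - T) / (A + T) and (G - C) / (G + C), is an assumption chosen for simplicity and may not match the exact definition in the paper; the function name is hypothetical. Positive values on the mRNA-synonymous strand indicate purine-loading.

```python
def chargaff_differences(strand):
    """Return ((A-T)/(A+T), (G-C)/(G+C)) for one DNA strand.

    Each term is 0.0 when the corresponding base pair class is absent.
    """
    s = strand.upper()
    a, t = s.count("A"), s.count("T")
    g, c = s.count("G"), s.count("C")
    at_diff = (a - t) / (a + t) if (a + t) else 0.0
    gc_diff = (g - c) / (g + c) if (g + c) else 0.0
    return (at_diff, gc_diff)
```

Applied to the mRNA-synonymous strand of each open reading frame, the two values separate adenine-loading from guanine-loading, which is the distinction the abstract draws for thermophiles.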
Affiliation(s)
- P J Lao
- Department of Biochemistry, Queen's University, Kingston, Ontario, K7L 3N6, Canada
|
19
|
Forsdyke DR. Two levels of information in DNA: relationship of Romanes' "intrinsic" variability of the reproductive system, and Bateson's "residue" to the species-dependent component of the base composition, (C+G)%. J Theor Biol 1999; 201:47-61. [PMID: 10534435 DOI: 10.1006/jtbi.1999.1013] [Citation(s) in RCA: 15] [Impact Index Per Article: 0.6] [Indexed: 11/22/2022]
Abstract
In 1886 Charles Darwin's research associate George Romanes published a paper entitled "Physiological Selection: An Additional Suggestion on the Origin of Species". This was criticized by his Victorian contemporaries and largely ignored by those who followed. However, the recent recognition of two levels of information in DNA suggests that Romanes had solved the major problems with Darwin's theory. It was apparent from the outset that the form of reproductive isolation likely to apply most generally to initial species divergence (hybrid sterility), would depend on differences, not in "primary" information ("genic"), but in "secondary" information ("chromosomal"). This viewpoint, further elaborated by Bateson & Saunders (1902), White (1978), and King (1993), is criticized by the genic school (Coyne & Orr, 1998) because it requires visible differences between chromosomes, and appears not to explain Haldane's rule. However, chromosomal differentiation with respect to the species-dependent component of base composition [(C+G)%; Forsdyke, 1996] appears to resolve these problems. Because it explained so much, it was easy to believe that the genic viewpoint explained everything. Romanes and Bateson thought otherwise. We are only just beginning to recognize what they were trying to tell us.
Affiliation(s)
- D R Forsdyke
- Department of Biochemistry, Queen's University, Kingston, Ontario, K7L 3N6, Canada.
|
20
|
El Karoui M, Biaudet V, Schbath S, Gruss A. Characteristics of Chi distribution on different bacterial genomes. Res Microbiol 1999; 150:579-87. [PMID: 10672998 DOI: 10.1016/s0923-2508(99)00132-1] [Citation(s) in RCA: 73] [Impact Index Per Article: 2.9] [Indexed: 10/16/2022]
Abstract
The availability of full genome sequences provides the bases for analyzing global properties of the genetic text. For example, oligonucleotide sequences that are over- or underrepresented can be identified by taking into account the overall genome composition and organization. One of the most overrepresented oligonucleotides in Escherichia coli is the Chi site, an octanucleotide that stimulates DNA repair by homologous recombination. Here we analyze the genomic distribution of Chi in E. coli and in the three other bacteria where a Chi sequence has been identified; note that Chi is a different sequence in each organism. For each bacterial genome, Chi sequences are frequent, regularly distributed, and overrepresented. This suggests that selection for Chi may have occurred during evolution to favor efficient repair of a damaged chromosome. Other characteristics of Chi distribution are not conserved and might reflect specific features of DNA repair in each host. The different sequence and characteristics of Chi in each microorganism suggest that selection for Chi occurred independently in different bacteria.
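To make the notion of an over- or underrepresented oligonucleotide concrete, the sketch below compares the observed count of a motif with the count expected from base composition alone. This is the simplest possible null model (independent bases at their genome-wide frequencies) and is an assumption made here for illustration; it is not the statistical method of the paper, which takes genome composition and organization into account in more detail. All function names are hypothetical.

```python
from collections import Counter


def count_overlapping(seq, motif):
    """Count occurrences of motif in seq, allowing overlaps."""
    count = 0
    pos = seq.find(motif)
    while pos != -1:
        count += 1
        pos = seq.find(motif, pos + 1)
    return count


def expected_count(seq, motif):
    """Expected motif count if bases were independent at observed frequencies."""
    seq = seq.upper()
    n = len(seq)
    if n < len(motif):
        return 0.0
    freqs = Counter(seq)
    prob = 1.0
    for base in motif:
        prob *= freqs[base] / n
    # Number of positions where the motif could start, times its probability.
    return (n - len(motif) + 1) * prob


def observed_to_expected(seq, motif):
    """Return observed / expected; values above 1 suggest over-representation."""
    observed = count_overlapping(seq.upper(), motif)
    expected = expected_count(seq, motif)
    return observed / expected if expected else float("nan")
```

A ratio well above 1 under this model is only a first indication; models that condition on shorter sub-words of the motif can give different answers for the same genome.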
Affiliation(s)
- M El Karoui
- Laboratoire de génétique appliquée-URLGA, INRA, Domaine de Vilvert, Jouy en Josas, France
|