151
|
Eppinger M, Baar C, Raddatz G, Huson DH, Schuster SC. Comparative analysis of four Campylobacterales. Nat Rev Microbiol 2004; 2:872-85. [PMID: 15494744 DOI: 10.1038/nrmicro1024] [Citation(s) in RCA: 77] [Impact Index Per Article: 3.7] [Indexed: 12/24/2022]
Abstract
Comparative genome analysis can be used to identify species-specific genes and gene clusters, and analysis of these genes can give an insight into the mechanisms involved in a specific bacteria-host interaction. Comparative analysis can also provide important information on the genome dynamics and degree of recombination in a particular species. This article describes the comparative genome analysis of representatives of four different Campylobacterales species - two pathogens of humans, Helicobacter pylori and Campylobacter jejuni, as well as Helicobacter hepaticus, which is associated with liver cancer in rodents, and the non-pathogenic commensal species, Wolinella succinogenes.
Affiliation(s)
- Mark Eppinger
- Max-Planck-Institute for Developmental Biology, Genome Centre, Spemannstr. 35, 72076 Tübingen, Germany
|
152
|
Touchon M, Arneodo A, d'Aubenton-Carafa Y, Thermes C. Transcription-coupled and splicing-coupled strand asymmetries in eukaryotic genomes. Nucleic Acids Res 2004; 32:4969-78. [PMID: 15388799 PMCID: PMC521644 DOI: 10.1093/nar/gkh823] [Citation(s) in RCA: 66] [Impact Index Per Article: 3.1] [Indexed: 11/13/2022] [Open Access]
Abstract
Under no-strand bias conditions, each genomic DNA strand should present equimolarities of A and T and of G and C. Deviations from these rules are attributed to asymmetric properties intrinsic to DNA mutation-repair processes. In bacteria, strand biases are associated with replication or transcription. In eukaryotes, recent studies demonstrate that human genes present transcription-coupled biases that might reflect transcription-coupled repair processes. Here, we study strand asymmetries in intron sequences of evolutionarily distant eukaryotes, and show that two superimposed intron biases can be distinguished. (i) Biases that are maximum at intron extremities and decrease over large distances to zero values in internal regions, possibly reflecting interactions between pre-mRNA and splicing machinery; these extend over approximately 0.5 kb in mammals and Arabidopsis thaliana, and over 1 kb in Caenorhabditis elegans and Drosophila melanogaster. (ii) Biases that are constant along introns, possibly associated with transcription. Strikingly, in C. elegans, these latter biases extend over intergenic regions that separate co-oriented genes. When appropriately examined, all genomes present transcription-coupled excess of T over A in the coding strand. In contrast, GC skews are either positive (mammals, plants) or negative (invertebrates). These results suggest that transcription-coupled asymmetries result from mutation-repair mechanisms that differ between vertebrates and invertebrates.
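The strand asymmetries discussed in this abstract are commonly quantified as normalized skews. The following is a minimal sketch, assuming the usual definitions TA skew = (T - A)/(T + A) and GC skew = (G - C)/(G + C); the function names are illustrative and are not taken from the paper.

```python
from collections import Counter

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the sequence of the opposite strand, read 5' to 3'."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def strand_skews(seq: str) -> tuple[float, float]:
    """Return (TA skew, GC skew) for one strand.

    Assumed definitions: (T - A) / (T + A) and (G - C) / (G + C).
    A skew is reported as 0.0 when its denominator is zero.
    """
    counts = Counter(seq.upper())
    a, c, g, t = (counts[base] for base in "ACGT")
    ta = (t - a) / (t + a) if (t + a) else 0.0
    gc = (g - c) / (g + c) if (g + c) else 0.0
    return ta, gc


if __name__ == "__main__":
    coding = "TTTAGGGC"  # toy sequence with an excess of T over A and G over C
    print(strand_skews(coding))
    print(strand_skews(reverse_complement(coding)))  # signs flip on the other strand
```

Under no-strand-bias conditions both values would be close to zero on either strand; a skew on one strand appears with the opposite sign on its reverse complement.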
Affiliation(s)
- Marie Touchon
- Centre de Génétique Moléculaire (CNRS), Allée de la Terrasse, 91198 Gif-sur-Yvette, France
|
153
|
Contursi P, Pisani FM, Grigoriev A, Cannio R, Bartolucci S, Rossi M. Identification and autonomous replication capability of a chromosomal replication origin from the archaeon Sulfolobus solfataricus. Extremophiles 2004; 8:385-91. [PMID: 15480865 DOI: 10.1007/s00792-004-0399-y] [Citation(s) in RCA: 20] [Impact Index Per Article: 1.0] [Received: 11/18/2003] [Accepted: 05/10/2004] [Indexed: 11/29/2022]
Abstract
Here, we describe the identification of a chromosomal DNA replication origin (oriC) from the hyperthermophilic archaeon Sulfolobus solfataricus (subdomain of Crenarchaeota). By means of a cumulative GC-skew analysis of the Sulfolobus genome sequence, a candidate oriC was mapped within a 1.12-kb region located between the two divergently transcribed MCM- and cdc6-like genes. We demonstrated that plasmids containing the Sulfolobus oriC sequence and a hygromycin-resistance selectable marker were maintained in an episomal state in transformed S. solfataricus cells under selective pressure. The proposed location of the origin was confirmed by 2-D gel electrophoresis experiments. This is the first report on the functional cloning of a chromosomal oriC from an archaeon and represents an important step toward the reconstitution of an archaeal in vitro DNA replication system.
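A cumulative GC-skew analysis of the kind mentioned above can be sketched as follows. This is an illustration under stated assumptions, not the authors' procedure: the window size, the function names, and the choice to report both extremes of the curve are all hypothetical.

```python
def cumulative_gc_skew(seq: str, window: int = 1000) -> list[float]:
    """Running sum of (G - C) / (G + C) over consecutive, non-overlapping windows."""
    total = 0.0
    curve = []
    for start in range(0, len(seq), window):
        chunk = seq[start:start + window].upper()
        g, c = chunk.count("G"), chunk.count("C")
        if g + c:  # windows without G or C leave the running sum unchanged
            total += (g - c) / (g + c)
        curve.append(total)
    return curve


def skew_extremes(seq: str, window: int = 1000) -> tuple[int, int]:
    """Return the positions (window ends, in bp) of the global minimum and maximum."""
    curve = cumulative_gc_skew(seq, window)
    if not curve:
        raise ValueError("empty sequence")
    i_min = min(range(len(curve)), key=curve.__getitem__)
    i_max = max(range(len(curve)), key=curve.__getitem__)

    def to_position(i: int) -> int:
        return min((i + 1) * window, len(seq))

    return to_position(i_min), to_position(i_max)


if __name__ == "__main__":
    toy = "C" * 3000 + "G" * 3000  # skew switches sign at 3000 bp
    print(cumulative_gc_skew(toy))
    print(skew_extremes(toy))
```

Which extreme is taken as the candidate origin depends on the strand convention and the organism; the abstract does not say which was used, so the sketch returns both and leaves the interpretation to the reader.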
Affiliation(s)
- Patrizia Contursi
- Dipartimento di Chimica Biologica, Università degli Studi di Napoli, Via Mezzocannone, 16, 80134, Napoli, Italy
|
154
|
Abstract
Tracing the history of molecular changes in coronaviruses using phylogenetic methods can provide powerful insights into the patterns of modification to sequences that underlie alteration to selective pressure and molecular function in the SARS-CoV (severe acute respiratory syndrome coronavirus) genome. The topology and branch lengths of the phylogenetic relationships among the family Coronaviridae, including SARS-CoV, have been estimated using the replicase polyprotein. The spike protein fragments S1 (involved in receptor-binding) and S2 (involved in membrane fusion) have been found to have different mutation rates. Fragment S1 can be further divided into two regions (S1A, which comprises approximately the first 400 nucleotides, and S1B, comprising the next 280) that also show different rates of mutation. The phylogeny presented on the basis of S1B shows that SARS-CoV is closely related to MHV (murine hepatitis virus), which is known to bind the murine receptor CEACAM1. The predicted structure, accessibility and mutation rate of the S1B region are also presented. Because anti-SARS drugs based on S2 heptads have short half-lives and are difficult to manufacture, our findings suggest that the S1B region might be of interest for anti-SARS drug discovery.
Affiliation(s)
- Pietro Liò
- EMBL European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton CB10 1SD, UK.
|
155
|
Mackiewicz P, Zakrzewska-Czerwinska J, Zawilak A, Dudek MR, Cebrat S. Where does bacterial replication start? Rules for predicting the oriC region. Nucleic Acids Res 2004; 32:3781-91. [PMID: 15258248 PMCID: PMC506792 DOI: 10.1093/nar/gkh699] [Citation(s) in RCA: 143] [Impact Index Per Article: 6.8] [Indexed: 11/13/2022] [Open Access]
Abstract
Three methods, based on DNA asymmetry, the distribution of DnaA boxes and dnaA gene location, were applied to identify the putative replication origins in 120 chromosomes. The chromosomes were classified according to the agreement of these methods and the applicability of these methods was evaluated. DNA asymmetry is the most universal method of putative oriC identification in bacterial chromosomes, but it should be applied together with other methods to achieve better prediction. The three methods identify the same region as a putative origin in all Bacilli and Clostridia, many Actinobacteria and gamma Proteobacteria. The organization of clusters of DnaA boxes was analysed in detail. For 76 chromosomes, a DNA fragment containing multiple DnaA boxes was identified as a putative origin region. Most bacterial chromosomes exhibit an overrepresentation of DnaA boxes; many of them contain at least two clusters of DnaA boxes in the vicinity of the oriC region. The additional clusters of DnaA boxes are probably involved in controlling replication initiation. Surprisingly, the characteristic features of the initiation of replication, i.e. a cluster of DnaA boxes, a dnaA gene and a switch in asymmetry, were not found in some of the analysed chromosomes, particularly those of obligatory intracellular parasites or endosymbionts. This is presumably connected with many mechanisms disturbing DNA asymmetry, translocation or disappearance of the dnaA gene and decay of the Escherichia coli perfect DnaA box pattern.
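One of the three methods, locating DnaA boxes, can be illustrated with a simple motif scan. The sketch below assumes the 9-mer TTATCCACA as the perfect Escherichia coli DnaA box and tolerates a configurable number of mismatches on either strand; the function names and the default mismatch threshold are illustrative choices, not the parameters used in the paper.

```python
DNAA_BOX = "TTATCCACA"  # assumed E. coli "perfect" DnaA box consensus

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def find_boxes(seq: str, box: str = DNAA_BOX, max_mismatches: int = 1):
    """Return sorted (start, strand, mismatches) tuples for box-like sites.

    start is 0-based on the given sequence; strand is '+' when the box
    itself matches and '-' when its reverse complement matches.
    """
    seq = seq.upper()
    hits = []
    for strand, motif in (("+", box), ("-", reverse_complement(box))):
        for i in range(len(seq) - len(motif) + 1):
            site = seq[i:i + len(motif)]
            mismatches = sum(x != y for x, y in zip(site, motif))
            if mismatches <= max_mismatches:
                hits.append((i, strand, mismatches))
    return sorted(hits)


if __name__ == "__main__":
    toy = "GGGTTATCCACAGGGTGTGGATAAGGG"  # one box on each strand
    for hit in find_boxes(toy, max_mismatches=0):
        print(hit)
```

A cluster of such hits near a switch in DNA asymmetry and near the dnaA gene is the kind of agreement between methods that the abstract describes; a practical scan would also group hits by distance, which is omitted here.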
Affiliation(s)
- Pawel Mackiewicz
- Department of Genomics, Institute of Genetics and Microbiology, University of Wrocław, Przybyszewskiego 63/77, 51-148 Wrocław, Poland
|
156
|
Zhang CT, Zhang R. A nucleotide composition constraint of genome sequences. Comput Biol Chem 2004; 28:149-53. [PMID: 15130543 DOI: 10.1016/j.compbiolchem.2004.02.002] [Citation(s) in RCA: 17] [Impact Index Per Article: 0.8] [Received: 01/18/2004] [Revised: 02/15/2004] [Accepted: 02/15/2004] [Indexed: 11/23/2022]
Abstract
Let a, c, g and t denote the occurrence frequencies of A, C, G and T, respectively, in a genome. We calculated the statistical quantity $S = a^2 + c^2 + g^2 + t^2$ for each of 809 genomes (11 archaea, 42 bacteria, 3 eukaryota, 90 phages, 36 viroids and 627 viruses) and 236 plasmids. We found that $S < 1/3$ is strictly valid for almost all of the above genomes or plasmids. As a direct deduction of the above observation, it is shown that (i) the statistical quantity S is a kind of genome order index, which is negatively correlated with the Shannon H function; (ii) $S < 1/3$ suggests that a minimal value of the Shannon H function is required for each genome; (iii) S defined above would be a new biological statistical quantity, useful to describe the composition features of genomes; (iv) By jointly considering the Chargaff Parity Rule 2, it is shown that the genomic G + C content should be in between 0.211 and 0.789.
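The quantities in this abstract are easy to compute, and point (iv) can be checked numerically. The sketch below is a hedged illustration: the function names are invented, the entropy is taken in bits (the abstract does not state a logarithm base), and Parity Rule 2 is assumed to mean a = t and c = g. Under that assumption, writing x for the G + C content gives S = ((1 - x)^2 + x^2) / 2, so S < 1/3 holds exactly when x lies between (1 - sqrt(1/3)) / 2 and (1 + sqrt(1/3)) / 2.

```python
import math
from collections import Counter


def base_frequencies(seq: str) -> dict[str, float]:
    """Occurrence frequencies of A, C, G, T, ignoring any other symbols."""
    counts = Counter(base for base in seq.upper() if base in "ACGT")
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sequence contains no A, C, G or T")
    return {base: counts[base] / total for base in "ACGT"}


def order_index(freqs: dict[str, float]) -> float:
    """S = a^2 + c^2 + g^2 + t^2."""
    return sum(p * p for p in freqs.values())


def shannon_entropy(freqs: dict[str, float]) -> float:
    """Shannon H in bits (base 2 is an assumption made for this sketch)."""
    return -sum(p * math.log2(p) for p in freqs.values() if p > 0)


def gc_bounds(s_max: float = 1 / 3) -> tuple[float, float]:
    """G + C range allowed by S < s_max when a = t and c = g.

    Solving ((1 - x)^2 + x^2) / 2 = s_max for x gives
    x = (1 +/- sqrt(4 * s_max - 1)) / 2.
    """
    root = math.sqrt(4 * s_max - 1)
    return (1 - root) / 2, (1 + root) / 2


if __name__ == "__main__":
    freqs = base_frequencies("ACGTACGTAATT")
    print(order_index(freqs), shannon_entropy(freqs))
    print(tuple(round(x, 3) for x in gc_bounds()))
```

A uniform composition gives the smallest possible S (0.25) and the largest H (2 bits), which is the negative correlation described in point (i).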
Affiliation(s)
- Chun-Ting Zhang
- Department of Physics, Tianjin University, Tianjin 300072, China.
|
157
|
Aerts S, Thijs G, Dabrowski M, Moreau Y, De Moor B. Comprehensive analysis of the base composition around the transcription start site in Metazoa. BMC Genomics 2004; 5:34. [PMID: 15171795 PMCID: PMC436054 DOI: 10.1186/1471-2164-5-34] [Citation(s) in RCA: 58] [Impact Index Per Article: 2.8] [Received: 01/27/2004] [Accepted: 06/01/2004] [Indexed: 11/29/2022] [Open Access]
Abstract
Background: The transcription start site of a metazoan gene remains poorly understood, mostly because there is no clear signal present in all genes. Now that several sequenced metazoan genomes have been annotated, we have been able to compare the base composition around the transcription start site for all annotated genes across multiple genomes. Results: The most prominent feature in the base compositions is a significant local variation in G+C content over a large region around the transcription start site. The change is present in all animal phyla but the extent of variation is different between distinct classes of vertebrates, and the shape of the variation is completely different between vertebrates and arthropods. Furthermore, the height of the variation correlates with CpG frequencies in vertebrates but not in invertebrates and it also correlates with gene expression, especially in mammals. We also detect GC and AT skews in all clades (where %G is not equal to %C or %A is not equal to %T respectively) but these occur in a more confined region around the transcription start site and in the coding region. Conclusions: The dramatic changes in nucleotide composition in humans are a consequence of CpG nucleotide frequencies and of gene expression, the changes in Fugu could point to primordial CpG islands, and the changes in the fly are of a totally different kind and unrelated to dinucleotide frequencies.
Affiliation(s)
- Stein Aerts
- Department of Electrical Engineering (ESAT-SCD), Katholieke Universiteit Leuven, Belgium
- Gert Thijs
- Department of Electrical Engineering (ESAT-SCD), Katholieke Universiteit Leuven, Belgium
- Michal Dabrowski
- Laboratory of Transcription Regulation, Nencki Institute, Warsaw, Poland
- Yves Moreau
- Department of Electrical Engineering (ESAT-SCD), Katholieke Universiteit Leuven, Belgium
- On leave at Center for Biological Sequence Analysis, BioCentrum, Technical University of Denmark, Lyngby, Denmark
- Bart De Moor
- Department of Electrical Engineering (ESAT-SCD), Katholieke Universiteit Leuven, Belgium
|
158
|
Abstract
The replication of the chromosome is among the most essential functions of the bacterial cell and influences many other cellular mechanisms, from gene expression to cell division. Yet the way it impacts on the bacterial chromosome was not fully acknowledged until the availability of complete genomes allowed one to look upon genomes as more than bags of genes. Chromosomal replication includes a set of asymmetric mechanisms, among which are a division in a lagging and a leading strand and a gradient between early and late replicating regions. These differences are the causes of many of the organizational features observed in bacterial genomes, in terms of both gene distribution and sequence composition along the chromosome. When asymmetries or gradients increase in some genomes, e.g. due to a different composition of the DNA polymerase or to a higher growth rate, so do the corresponding biases. As some of the features of the chromosome structure seem to be under strong selection, understanding such biases is important for the understanding of chromosome organization and adaptation. Inversely, understanding chromosome organization may shed further light on questions relating to replication and cell division. Ultimately, the understanding of the interplay between these different elements will allow a better understanding of bacterial genetics and evolution.
Affiliation(s)
- Eduardo P C Rocha
- Atelier de Bioinformatique, Université Pierre et Marie Curie, 12, Rue Cuvier, 75005 Paris, and Unité Génétique des Génomes Bactériens, Institut Pasteur, 28 rue du Dr Roux, 75724 Paris Cedex 15, France
|
159
|
Abstract
Focused efforts by several international laboratories have resulted in the sequencing of the genome of the causative agent of severe acute respiratory syndrome (SARS), novel coronavirus SARS-CoV, in record time. Using cumulative skew diagrams, I found that mutational patterns in the SARS-CoV genome were strikingly different from other coronaviruses in terms of mutation rates, although they were in general agreement with the model of the coronavirus lifecycle. These findings might be relevant for the development of sequence-based diagnostics and the design of agents to treat SARS.
|
160
|
Zhang R, Zhang CT. Identification of replication origins in the genome of the methanogenic archaeon, Methanocaldococcus jannaschii. Extremophiles 2004; 8:253-8. [PMID: 15197606 DOI: 10.1007/s00792-004-0385-4] [Citation(s) in RCA: 19] [Impact Index Per Article: 0.9] [Received: 09/29/2003] [Accepted: 02/10/2004] [Indexed: 10/26/2022]
Abstract
Methanocaldococcus jannaschii has been notorious as an archaeon in which the replication origins are difficult to identify. Although extensive efforts have been exerted on this issue, the locations of replication origins still remain elusive 7 years after the publication of its complete genome sequence in 1996. Ambiguous results were obtained in identifying the replication origins of M. jannaschii based on all theoretical and experimental approaches. In the genome of M. jannaschii, we found that an ORF (MJ0774), annotated as a hypothetical protein, is a homologue of the Cdc6 protein. The position of the gene is at a global minimum of the x component of the Z curve, i.e., RY disparity curve, which has been used to identify replication origins in other Archaea. In addition, an intergenic region (694,540-695,226 bp) that is between the cdc6 gene and an adjacent ORF shows almost all the characteristics of known replication origins, i.e., it is highly rich in AT composition (80%) and contains multiple copies of repeat elements and AT stretches. Therefore, these lines of evidence strongly suggest that the identified region is a replication origin, which is designated as oriC1. The analysis of the y component of the Z curve, i.e., MK disparity curve, suggests the presence of another replication origin corresponding to one of the peaks in the MK disparity curve at around 1,388 kb of the genome.
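The RY and MK disparity curves named above are the x and y components of the Z curve. A minimal sketch follows, assuming the standard definitions: x counts purines (A, G) minus pyrimidines (C, T), and y counts amino bases (A, C) minus keto bases (G, T), both cumulated along the sequence. Function names are illustrative.

```python
def disparity_curves(seq: str) -> tuple[list[int], list[int]]:
    """Return the cumulative RY (x) and MK (y) disparity curves of a sequence."""
    x = y = 0
    xs, ys = [], []
    for base in seq.upper():
        if base in "AG":      # purine
            x += 1
        elif base in "CT":    # pyrimidine
            x -= 1
        if base in "AC":      # amino
            y += 1
        elif base in "GT":    # keto
            y -= 1
        xs.append(x)          # ambiguous symbols such as N leave both curves flat
        ys.append(y)
    return xs, ys


def global_minimum(curve: list[int]) -> int:
    """Return the 1-based position of the first global minimum of a curve."""
    if not curve:
        raise ValueError("empty curve")
    return min(range(len(curve)), key=curve.__getitem__) + 1


if __name__ == "__main__":
    xs, ys = disparity_curves("CTCTAGAG")
    print(xs, global_minimum(xs))
```

On a real genome the curves would normally be computed per window or smoothed before looking for extremes; the abstract reports a global minimum of the RY curve at the cdc6-like gene and a peak of the MK curve near 1,388 kb.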
Affiliation(s)
- Ren Zhang
- Department of Epidemiology and Biostatistics, Tianjin Cancer Institute and Hospital, 300060 Tianjin, China
|
161
|
Touchon M, Nicolay S, Arneodo A, d'Aubenton-Carafa Y, Thermes C. Transcription-coupled TA and GC strand asymmetries in the human genome. FEBS Lett 2004; 555:579-82. [PMID: 14675777 DOI: 10.1016/s0014-5793(03)01306-1] [Citation(s) in RCA: 59] [Impact Index Per Article: 2.8] [Indexed: 11/30/2022]
Abstract
Analysis of the whole set of human genes reveals that most of them present TA and GC skews, that these biases are correlated to each other and are specific to gene sequences, exhibiting sharp transitions between transcribed and non-transcribed regions. The GC asymmetries cannot be explained solely by a model previously proposed for (G+T) skew based on transitions measured in a small set of human genes. We propose that the GC skew results from an additional transcription-coupled mutation process that would include transversions. During evolution, both processes acting on a large majority of genes in germline cells would have produced these transcription-coupled strand asymmetries.
Affiliation(s)
- M Touchon
- Centre de Génétique Moléculaire, CNRS, Allée de la Terrasse, 91198, Gif-sur-Yvette, France
|
162
|
Singer GAC, Hickey DA. Thermophilic prokaryotes have characteristic patterns of codon usage, amino acid composition and nucleotide content. Gene 2004; 317:39-47. [PMID: 14604790 DOI: 10.1016/s0378-1119(03)00660-7] [Citation(s) in RCA: 126] [Impact Index Per Article: 6.0] [Indexed: 11/20/2022]
Abstract
A number of recent studies have shown that thermophilic prokaryotes have distinguishable patterns of both synonymous codon usage and amino acid composition, indicating the action of natural selection related to thermophily. On the other hand, several other studies of whole genomes have illustrated that nucleotide bias can have dramatic effects on synonymous codon usage and also on the amino acid composition of the encoded proteins. This raises the possibility that the thermophile-specific patterns observed at both the codon and protein levels are merely reflections of a single underlying effect at the level of nucleotide composition. Moreover, such an effect at the nucleotide level might be due entirely to mutational bias. In this study, we have compared the genomes of thermophiles and mesophiles at three levels: nucleotide content, codon usage and amino acid composition. Our results indicate that the genomes of thermophiles are distinguishable from mesophiles at all three levels and that the codon and amino acid frequency differences cannot be explained simply by the patterns of nucleotide composition. At the nucleotide level, we see a consistent tendency for the frequency of adenine to increase at all codon positions within the thermophiles. Thermophiles are also distinguished by their pattern of synonymous codon usage for several amino acids, particularly arginine and isoleucine. At the protein level, the most dramatic effect is a two-fold decrease in the frequency of glutamine residues among thermophiles. These results indicate that adaptation to growth at high temperature requires a coordinated set of evolutionary changes affecting (i) mRNA thermostability, (ii) stability of codon-anticodon interactions and (iii) increased thermostability of the protein products. We conclude that elevated growth temperature imposes selective constraints at all three molecular levels: nucleotide content, codon usage and amino acid composition. 
In addition to these multiple selective effects, however, the genomes of both thermophiles and mesophiles are often subject to superimposed large changes in composition due to mutational bias.
Affiliation(s)
- Gregory A C Singer
- Department of Biology, University of Ottawa, 30 Marie Curie, Ottawa, Ontario, Canada K1N 6N5.
|
163
|
|
164
|
Rispe C, Delmotte F, van Ham RCHJ, Moya A. Mutational and selective pressures on codon and amino acid usage in Buchnera, endosymbiotic bacteria of aphids. Genome Res 2004; 14:44-53. [PMID: 14672975 PMCID: PMC314276 DOI: 10.1101/gr.1358104] [Citation(s) in RCA: 73] [Impact Index Per Article: 3.5] [Received: 03/21/2003] [Revised: 10/08/2003] [Indexed: 02/07/2023]
Abstract
We have explored compositional variation at synonymous (codon usage) and nonsynonymous (amino acid usage) positions in three complete genomes of Buchnera, endosymbiotic bacteria of aphids, and also in their orthologs in Escherichia coli, a close free-living relative. We sought to discriminate genes of variable expression levels in order to weigh the relative contributions of mutational bias and selection in the genomic changes following symbiosis. We identified clear strand asymmetries, distribution biases (putative high-expression genes were found more often on the leading strand), and a residual slight codon bias within each strand. Amino acid usage was strongly biased in putative high-expression genes, characterized by avoidance of aromatic amino acids, but above all by greater conservation and resistance to AT enrichment. Despite the almost complete loss of codon bias and heavy mutational pressure, selective forces are still strong at nonsynonymous sites of a fraction of the genome. However, Buchnera from Baizongia pistaciae appears to have suffered a stronger symbiotic syndrome than the two other species.
Affiliation(s)
- Claude Rispe
- UMR BIO3P, Institut National de la Recherche Agronomique, BP35327, 35653 Le Rheu cedex, France.
|
165
|
Nilsson D, Andersson B. A graphical tool for parasite genome annotation. Comput Methods Programs Biomed 2004; 73:55-60. [PMID: 14715167 DOI: 10.1016/s0169-2607(02)00162-1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/24/2023]
Abstract
A graphical tool to facilitate rapid primary annotation of genomic sequence has been developed. Within a single interface the user can import sequences or database entries, run feature prediction programs and similarity searches, filter results, add additional manually found features and notes, and finally export annotations for database submission. Integrated rule-based feature corroboration and a novel decision support heuristic using ORF orientation, length and base-composition further enhances the efficiency of the annotation process without compromising flexibility. The program has been explicitly tailored to use in protozoan parasite genome projects, but can constitute a useful tool for prokaryote annotation as well. It is successfully being used by our lab in the Trypanosoma cruzi genome project, and can be obtained from the authors upon request.
Affiliation(s)
- Daniel Nilsson
- Center for Genomics and Bioinformatics, Karolinska Institutet, S-17177 Stockholm, Sweden
|
166
|
Dermitzakis ET, Reymond A, Scamuffa N, Ucla C, Kirkness E, Rossier C, Antonarakis SE. Evolutionary Discrimination of Mammalian Conserved Non-Genic Sequences (CNGs). Science 2003; 302:1033-5. [PMID: 14526086 DOI: 10.1126/science.1087047] [Citation(s) in RCA: 136] [Impact Index Per Article: 6.2] [Indexed: 11/02/2022]
Abstract
Analysis of the human and mouse genomes identified an abundance of conserved non-genic sequences (CNGs). The significance and evolutionary depth of their conservation remain unanswered. We have quantified levels and patterns of conservation of 191 CNGs of human chromosome 21 in 14 mammalian species. We found that CNGs are significantly more conserved than protein-coding genes and noncoding RNAs (ncRNAs) within the mammalian class from primates to monotremes to marsupials. The pattern of substitutions in CNGs differed from that seen in protein-coding and ncRNA genes and resembled that of protein-binding regions. About 0.3% to 1% of the human genome corresponds to a previously unknown class of extremely constrained CNGs shared among mammals.
Affiliation(s)
- Emmanouil T Dermitzakis
- Division of Medical Genetics and National Center of Competence in Research (NCCR) Frontiers in Genetics, University of Geneva Medical School and University Hospitals, 1211 Geneva, Switzerland.
|
167
|
Majewski J. Dependence of mutational asymmetry on gene-expression levels in the human genome. Am J Hum Genet 2003; 73:688-92. [PMID: 12881777 PMCID: PMC1180696 DOI: 10.1086/378134] [Citation(s) in RCA: 61] [Impact Index Per Article: 2.8] [Received: 05/09/2003] [Accepted: 07/01/2003] [Indexed: 11/03/2022] [Open Access]
Abstract
A great deal of effort has been devoted to measuring the rates of different types of nucleotide substitutions. Mutation rates are known to depend on factors such as methylation status and nearest-neighbor nucleotide effects. However, until recently, in eukaryotes, the rates have not been considered to be strand specific. In a recent analysis of mammalian lineages, Green et al. (2003) uncovered an asymmetry in the frequencies of substitutions on the coding and noncoding strands of genes and showed that this resulted in a nucleotide-content asymmetry within most genes. The authors argue that this bias may be caused by the mammalian transcription-coupled repair in germ cells, but they did not demonstrate an association with germ-cell gene expression. In this work, I analyze nucleotide contents in genes with known expression patterns and levels and provide evidence that the observed asymmetry in mutation rates is, in fact, caused by transcription. The results also imply that germline transcription may occur in a large percentage, 71%-91%, of all human genes.
Affiliation(s)
- Jacek Majewski
- Laboratory of Statistical Genetics, Rockefeller University, New York, NY, 10021, USA.
|
168
|
El-Sayed NMA, Ghedin E, Song J, MacLeod A, Bringaud F, Larkin C, Wanless D, Peterson J, Hou L, Taylor S, Tweedie A, Biteau N, Khalak HG, Lin X, Mason T, Hannick L, Caler E, Blandin G, Bartholomeu D, Simpson AJ, Kaul S, Zhao H, Pai G, Van Aken S, Utterback T, Haas B, Koo HL, Umayam L, Suh B, Gerrard C, Leech V, Qi R, Zhou S, Schwartz D, Feldblyum T, Salzberg S, Tait A, Turner CMR, Ullu E, White O, Melville S, Adams MD, Fraser CM, Donelson JE. The sequence and analysis of Trypanosoma brucei chromosome II. Nucleic Acids Res 2003; 31:4856-63. [PMID: 12907728 PMCID: PMC169936 DOI: 10.1093/nar/gkg673] [Citation(s) in RCA: 50] [Impact Index Per Article: 2.3] [Received: 04/15/2003] [Revised: 05/29/2003] [Accepted: 06/09/2003] [Indexed: 11/14/2022] [Open Access]
Abstract
We report here the sequence of chromosome II from Trypanosoma brucei, the causative agent of African sleeping sickness. The 1.2-Mb pairs encode about 470 predicted genes organised in 17 directional clusters on either strand, the largest cluster of which has 92 genes lined up over a 284-kb region. An analysis of the GC skew reveals strand compositional asymmetries that coincide with the distribution of protein-coding genes, suggesting these asymmetries may be the result of transcription-coupled repair on coding versus non-coding strand. A 5-cM genetic map of the chromosome reveals recombinational 'hot' and 'cold' regions, the latter of which is predicted to include the putative centromere. One end of the chromosome consists of a 250-kb region almost exclusively composed of RHS (pseudo)genes that belong to a newly characterised multigene family containing a hot spot of insertion for retroelements. Interspersed with the RHS genes are a few copies of truncated RNA polymerase pseudogenes as well as expression site associated (pseudo)genes (ESAGs) 3 and 4, and 76 bp repeats. These features are reminiscent of a vestigial variant surface glycoprotein (VSG) gene expression site. The other end of the chromosome contains a 30-kb array of VSG genes, the majority of which are pseudogenes, suggesting that this region may be a site for modular de novo construction of VSG gene diversity during transposition/gene conversion events.
|
169
|
Achaz G, Coissac E, Netter P, Rocha EPC. Associations between inverted repeats and the structural evolution of bacterial genomes. Genetics 2003; 164:1279-89. [PMID: 12930739 PMCID: PMC1462642 DOI: 10.1093/genetics/164.4.1279] [Citation(s) in RCA: 61] [Impact Index Per Article: 2.8] [Indexed: 11/14/2022] [Open Access]
Abstract
The stability of the structure of bacterial genomes is challenged by recombination events. Since major rearrangements (i.e., inversions) are thought to frequently operate by homologous recombination between inverted repeats, we analyzed the presence and distribution of such repeats in bacterial genomes and their relation to the conservation of chromosomal structure. First, we show that there is a strong under-representation of inverted repeats, relative to direct repeats, in most chromosomes, especially among the ones regarded as most stable. Second, we show that the avoidance of repeats is frequently associated with the stability of the genomes. Closely related genomes reported to differ in terms of stability are also found to differ in the number of inverted repeats. Third, when using replication strand bias as a proxy for genome stability, we find a significant negative correlation between this strand bias and the abundance of inverted repeats. Fourth, when measuring the recombining potential of inverted repeats and their eventual impact on different features of the chromosomal structure, we observe a tendency of repeats to be located in the chromosome in such a way that rearrangements produce a smaller strand switch and smaller asymmetries than expected by chance. Finally, we discuss the limitations of our analysis and the influence of factors such as the nature of repeats, e.g., transposases, or the differences in the recombination machinery among bacteria. These results shed light on the challenges imposed on the genome structure by the presence of inverted repeats.
Affiliation(s)
- Guillaume Achaz
- Structure et Dynamique des Génomes, Institut Jacques Monod, 75251 Paris, France
| | | | | | | |
Collapse
|
170
|
Chen LL, Zhang CT. Seven GC-rich microbial genomes adopt similar codon usage patterns regardless of their phylogenetic lineages. Biochem Biophys Res Commun 2003; 306:310-7. [PMID: 12788106 DOI: 10.1016/s0006-291x(03)00973-2] [Citation(s) in RCA: 20] [Impact Index Per Article: 0.9] [Indexed: 11/29/2022]
Abstract
Seven GC-rich (group I) and three AT-rich (group II) microbial genomes are analyzed in this paper. The seven microbes in group I belong to different phylogenetic lineages, even different domains of life. The common feature is that they are highly GC-rich organisms, with more than 60% genomic GC content. Group II includes three bacteria, which belong to the same subdivision as Pseudomonas aeruginosa in group I. The genomic GC content of the three bacteria is in the range of 26-50%. It is shown that although the phylogenetic lineages of the organisms in group I are remote, the common feature of high genomic GC content forces them to adopt similar codon usage patterns, which constitutes the basis of an algorithm using a set of universal parameters to recognize known genes in the seven genomes. The common codon usage pattern of genes of known function in the seven genomes is the GḠS type, where G, Ḡ, and S denote G, non-G, and G/C, respectively. In contrast, although the phylogenetic lineages of the three bacteria in group II are quite close, the codon usage patterns of genes of known function in these genomes are clearly distinct. There are no universal parameters to identify known genes in the three genomes in group II. It can be deduced that the genomic GC content is more important than phylogenetic lineage in gene recognition programs. We hope that the work might be useful for understanding the common characteristics in the organization of microbial genomes.
Affiliation(s)
- Ling-Ling Chen
- Department of Physics, Tianjin University, 300072, Tianjin, China
|
171
|
Song J, Ware A, Liu SL. Wavelet to predict bacterial ori and ter: a tendency towards a physical balance. BMC Genomics 2003; 4:17. [PMID: 12732098 PMCID: PMC156607 DOI: 10.1186/1471-2164-4-17] [Citation(s) in RCA: 35] [Impact Index Per Article: 1.6] [Received: 01/06/2003] [Accepted: 05/05/2003] [Indexed: 12/16/2022] Open
Abstract
BACKGROUND Chromosomal DNA replication in bacteria starts at the origin (ori) and the two replicores propagate in opposite directions up to the terminus (ter) region. We hypothesize that the two replicores need to reach ter at the same time to maintain a physical balance; DNA insertion would disrupt such a balance, requiring chromosomal rearrangements to restore the balance. To test this hypothesis, we needed to demonstrate that ori and ter are in a physical balance in bacterial chromosomes. Using wavelet analysis, we documented GC skew, AT skew, purine excess and keto excess on the published bacterial genomic sequences to locate the turning (minimum and maximum) points on the curves. Previously, the minimum point had been supposed to correlate with ori and the maximum to correlate with ter. RESULTS We observed a strong tendency of the bacterial chromosomes towards a physical balance, with the minima and maxima corresponding to the known or putative ori and ter and being about half chromosome separated in most of the bacteria studied. A nonparametric method based on wavelet transformation was employed to perform significance tests for the predicted loci. CONCLUSIONS The wavelet approach can reliably predict the ori and ter regions and the bacterial chromosomes have a strong tendency towards a physical balance between ori and ter.
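The four curves named in this abstract can be illustrated with a short sketch. It is not the authors' wavelet method: it only builds the cumulative walks for GC skew, AT skew, purine excess and keto excess, then reports the raw minimum and maximum of one curve. The function names and the toy sequence are hypothetical, and real genomes would need smoothing and a significance test (the steps the abstract assigns to wavelet analysis) before a turning point could be trusted.

```python
def cumulative_skews(seq):
    """Build four cumulative walks along a DNA sequence (illustrative only)."""
    # +1 / -1 contribution of each base to each walk; other symbols add 0.
    steps = {
        "gc": {"G": 1, "C": -1},                          # G minus C
        "at": {"A": 1, "T": -1},                          # A minus T
        "purine": {"A": 1, "G": 1, "C": -1, "T": -1},     # purines minus pyrimidines
        "keto": {"G": 1, "T": 1, "A": -1, "C": -1},       # keto minus amino bases
    }
    totals = {name: 0 for name in steps}
    curves = {name: [] for name in steps}
    for base in seq.upper():
        for name, step in steps.items():
            totals[name] += step.get(base, 0)
            curves[name].append(totals[name])
    return curves


def turning_points(curve):
    """Return (index of minimum, index of maximum) of a cumulative curve."""
    lowest = min(range(len(curve)), key=curve.__getitem__)
    highest = max(range(len(curve)), key=curve.__getitem__)
    return lowest, highest


# Hypothetical toy "chromosome": a C-rich stretch, a G-rich half, a C-rich stretch.
toy = "CCCC" + "GGGGGGGG" + "CCCC"
curves = cumulative_skews(toy)
minimum, maximum = turning_points(curves["gc"])
print(minimum, maximum, maximum - minimum)
```

In this toy case the two turning points are eight positions apart on a 16-base sequence, which mimics the half-chromosome separation the abstract describes as a physical balance.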
Affiliation(s)
- Jiuzhou Song
- Departments of Microbiology and Infectious Diseases, University of Calgary, Calgary, Canada
- Antony Ware
- Mathematics and Statistics, University of Calgary, Calgary, Canada
- Shu-Lin Liu
- Departments of Microbiology and Infectious Diseases, University of Calgary, Calgary, Canada
- Department of Microbiology, Peking University School of Basic Medical Sciences, Beijing, China
|
172
|
Green P, Ewing B, Miller W, Thomas PJ, Green ED. Transcription-associated mutational asymmetry in mammalian evolution. Nat Genet 2003; 33:514-7. [PMID: 12612582 DOI: 10.1038/ng1103] [Citation(s) in RCA: 206] [Impact Index Per Article: 9.4] [Received: 12/30/2002] [Accepted: 01/19/2003] [Indexed: 11/09/2022]
Abstract
Although mutation is commonly thought of as a random process, evolutionary studies show that different types of nucleotide substitution occur with widely varying rates that presumably reflect biases intrinsic to mutation and repair mechanisms. A strand asymmetry, the occurrence of particular substitution types at higher rates than their complementary types, that is associated with DNA replication has been found in bacteria and mitochondria. A strand asymmetry that is associated with transcription and attributable to higher rates of cytosine deamination on the coding strand has been observed in enterobacteria. Here, we describe a qualitatively different transcription-associated strand asymmetry in mammals, which may be a byproduct of transcription-coupled repair in germline cells. This mutational asymmetry has acted over long periods of time to produce a compositional asymmetry, an excess of G+T over A+C on the coding strand, in most genes. The mutational and compositional asymmetries can be used to detect the orientations and approximate extents of transcribed regions.
Affiliation(s)
- Phil Green
- Howard Hughes Medical Institute and Department of Genome Sciences, University of Washington, Seattle, Washington 98195, USA.
|
173
|
Abstract
In many prokaryotes, asymmetrical mutational or selective pressures have caused compositional skews between complementary strands of replication arms, especially sensitive in the distribution of guanine and cytosine. In Escherichia coli, most of the guanine/cytosine skew is caused by mutation rates differing on leading and lagging strands, but contribution of skewed functionally important guanine-rich motifs (Chi and Rag sites), which control chromosome repair or positioning, is noticeable. Interference between replication and gene expression plays a minor role. The situation may be different in other bacteria. Studies of chromosome processing and bacterial taxonomy might profit from consideration of chromosome polarisation.
Affiliation(s)
- Jean R Lobry
- Laboratoire de Biométrie et Biologie évolutive, CNRS UMR 5558, Université Claude Bernard, 43 Boulevard du 11 Novembre 1918, F-69622 Villeurbanne cedex, France.
|
174
|
Zhang R, Zhang CT. Multiple replication origins of the archaeon Halobacterium species NRC-1. Biochem Biophys Res Commun 2003; 302:728-34. [PMID: 12646230 DOI: 10.1016/s0006-291x(03)00252-3] [Citation(s) in RCA: 66] [Impact Index Per Article: 3.0] [Indexed: 10/27/2022]
Abstract
The genomic sequence of the halophilic archaeon Halobacterium NRC-1 has been analyzed by the Z curve method. The Z curve is a three-dimensional curve that uniquely represents a given DNA sequence. Based on the known behaviors of the Z curves for the archaea whose replication origins have been identified, the analysis of the Z curve for the genome of Halobacterium NRC-1 strongly suggests that the large genome has two replication origins, oriC1 (921,863-922,014) and oriC2 (1,806,444-1,807,229), which are located at two sharp peaks of the Z curve. These two regions are next to the cdc6 genes and contain multiple copies of stretches of G and C, i.e., ggggtgggg and ccccacccc, which may also be regarded as direct and inverted repeats. Based on the above analysis, a model of replication of Halobacterium NRC-1 with two replication origins and two termini has been proposed. The experimental confirmation of this model would constitute the first example of multiple replication origins of archaea, which will finally provide much insight into the understanding of replication mechanisms of eukaryotic organisms, including human. In addition, the potential multiple replication origins of the archaeon Sulfolobus solfataricus are suggested by the analysis based on the Z curve method.
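The claim that the Z curve uniquely represents a DNA sequence can be made concrete with a small sketch. It assumes the standard Z curve definition (x counts purines minus pyrimidines, y counts amino minus keto bases, z counts weak minus strong hydrogen-bond bases), which the abstract does not spell out. The function names are hypothetical, and the sketch does not attempt the peak analysis used to propose oriC1 and oriC2.

```python
# Step taken along (x, y, z) for each base, assuming the standard definition:
#   x: purine (A, G) vs pyrimidine (C, T)
#   y: amino (A, C) vs keto (G, T)
#   z: weak H-bond (A, T) vs strong H-bond (G, C)
STEPS = {"A": (1, 1, 1), "C": (-1, 1, -1), "G": (1, -1, -1), "T": (-1, -1, 1)}


def z_curve(seq):
    """Return the list of cumulative (x, y, z) points for an A/C/G/T sequence."""
    x = y = z = 0
    points = []
    for base in seq.upper():
        dx, dy, dz = STEPS[base]  # raises KeyError on ambiguous symbols such as N
        x, y, z = x + dx, y + dy, z + dz
        points.append((x, y, z))
    return points


def sequence_from_z_curve(points):
    """Recover the sequence from its Z curve, showing the mapping is invertible."""
    inverse = {step: base for base, step in STEPS.items()}
    previous = (0, 0, 0)
    bases = []
    for point in points:
        step = tuple(p - q for p, q in zip(point, previous))
        bases.append(inverse[step])
        previous = point
    return "".join(bases)


points = z_curve("GGGGTGGGG")  # one of the G-rich repeats quoted in the abstract
print(points[-1])
print(sequence_from_z_curve(points))
```

Because each base takes a distinct step, consecutive differences of the curve give back the original sequence, which is the sense in which the representation is unique.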
Affiliation(s)
- Ren Zhang
- Department of Epidemiology and Biostatistics, Tianjin Cancer Institute and Hospital, Tianjin 300060, China
|
175
|
Abstract
Changes in technology in the past decade have had such an impact on the way that molecular evolution research is done that it is difficult now to imagine working in a world without genomics or the Internet. In 1992, GenBank was less than a hundredth of its current size and was updated every three months on a huge spool of tape. Homology searches took 30 minutes and rarely found a hit. Now it is difficult to find sequences with only a few homologs to use as examples for teaching bioinformatics. For molecular evolution researchers, the genomics revolution has showered us with raw data and the information revolution has given us the wherewithal to analyze it. In broad terms, the most significant outcome from these changes has been our newfound ability to examine the evolution of genomes as a whole, enabling us to infer genome-wide evolutionary patterns and to identify subsets of genes whose evolution has been in some way atypical.
Affiliation(s)
- Kenneth H Wolfe
- Department of Genetics, Smurfit Institute, University of Dublin, Trinity College, Dublin 2, Ireland.
|
176
|
Sánchez J, José MV. Analysis of bilateral inverse symmetry in whole bacterial chromosomes. Biochem Biophys Res Commun 2002; 299:126-34. [PMID: 12435398 DOI: 10.1016/s0006-291x(02)02583-4] [Citation(s) in RCA: 14] [Impact Index Per Article: 0.6] [Indexed: 11/30/2022]
Abstract
The positions of the 64 DNA tri-nucleotides (triplets) along the Borrelia burgdorferi chromosome were determined and cumulative position plots (CPP) were obtained. Analysis of CPP for complementary triplets revealed close correlations in complementary triplet frequencies (CTF) between opposing leading and lagging strands. Such bilateral inverse symmetry (BIS) applied also to complementary mono- and di-nucleotides and to some >3 n-tuples. At the level of individual bases BIS explains Chargaff's second parity rule for whole bacterial chromosomes. Using shuffled control sequences we show that single-base BIS was not the source of higher-order BIS. Analysis of CTF in 45 other chromosomes suggests that BIS is a general property of eubacteria. BIS at the various levels may be due to the very similar numbers of codons used in chromosomal halves. Evolutionarily, BIS could have resulted from asymmetric substitution of bases combined with genetic rearrangements. However, the provocative theoretical alternative of whole-genome inverse duplication is here considered.
Affiliation(s)
- J Sánchez
- Department of Medical Microbiology and Immunology, University of Gothenburg, SE413-46, Gothenburg, Sweden.
|
177
|
Mokkapati SK, Bhagwat AS. Lack of dependence of transcription-induced cytosine deaminations on protein synthesis. Mutat Res 2002; 508:131-6. [PMID: 12379468 DOI: 10.1016/s0027-5107(02)00192-6] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.1] [Indexed: 11/29/2022]
Abstract
Transcription-induced mutations (TIM) is a phenomenon in Escherichia coli in which transcription promotes C to T and other mutations in a strand-specific manner. Because the processes of transcription and translation are coupled in prokaryotes and some models regarding creating a hypermutagenic state in E. coli require new protein synthesis, we tested the possibility that TIM was dependent on efficient synthesis of proteins. We used puromycin to reversibly inhibit protein synthesis and found that it had little effect on mRNA synthesis, plasmid copy-number or TIM. Our results show that TIM is not dependent on efficient translation of mRNA and this helps eliminate certain models concerning the mechanism underlying TIM.
|
178
|
Zhang R, Zhang CT. Single replication origin of the archaeon Methanosarcina mazei revealed by the Z curve method. Biochem Biophys Res Commun 2002; 297:396-400. [PMID: 12237132 DOI: 10.1016/s0006-291x(02)02214-3] [Citation(s) in RCA: 21] [Impact Index Per Article: 0.9] [Indexed: 11/16/2022]
Abstract
The genomic sequence of the archaeon Methanosarcina mazei has been analyzed by the Z curve method. The Z curve is a three-dimensional curve that uniquely represents the given DNA sequence. The three-dimensional Z curve and its x and y components for the genome of M. mazei show a sharp peak and relatively broad peak, respectively. The cdc6 gene is located exactly at the position of the sharp peak. Based on the known behavior of the Z curves for the archaea whose replication origins have been identified, we hypothesize that the replication origin and termination sites correspond to the positions of the sharp peak and broad peak, respectively. We have located an intergenic region that is between the cdc6 gene (MM1314) and the gene for an adjacent protein (MM1315), which shows strong characteristics of the known replication origins. This region is highly rich in AT and contains multiple copies of consecutive repeats. Our results strongly suggest that the single replication origin of M. mazei is situated at the intergenic region between the cdc6 gene and the gene for the adjacent protein, from 1,564,657 to 1,566,241 bp of the genome.
Affiliation(s)
- Ren Zhang
- Department of Epidemiology and Biostatistics, Tianjin Cancer Institute and Hospital, China
|
179
|
Rocha E. Is there a role for replication fork asymmetry in the distribution of genes in bacterial genomes? Trends Microbiol 2002; 10:393-5. [PMID: 12217498 DOI: 10.1016/s0966-842x(02)02420-4] [Citation(s) in RCA: 71] [Impact Index Per Article: 3.1] [Indexed: 11/16/2022]
Abstract
Replication generates bacterial chromosomes with strands that differ in the number of genes and base composition. It has been suggested that in bacteria such as Bacillus subtilis, PolC is responsible for the synthesis of the leading strand and DnaE for the lagging strand, whereas in many other bacteria DnaE is responsible for the synthesis of both strands. Here, I show that the possession of PolC correlates with leading strands that contain an average of 78% of genes compared with 58% for genomes that do not contain PolC. This suggests that asymmetrical replication forks could have a major role in defining and constraining the structure of the bacterial chromosome. The presence of PolC is not correlated with compositional strand bias, suggesting that the two biases result from different types of structural asymmetry.
Affiliation(s)
- Eduardo Rocha
- Unité GGB, URA CNRS 2171, Institut Pasteur, 28 rue Dr. Roux, 75015, Paris, France.
|
180
|
Li W, Bernaola-Galván P, Haghighi F, Grosse I. Applications of recursive segmentation to the analysis of DNA sequences. Comput Chem 2002; 26:491-510. [PMID: 12144178 DOI: 10.1016/s0097-8485(02)00010-4] [Citation(s) in RCA: 64] [Impact Index Per Article: 2.8] [Indexed: 11/23/2022]
Abstract
Recursive segmentation is a procedure that partitions a DNA sequence into domains with a homogeneous composition of the four nucleotides A, C, G and T. This procedure can also be applied to any sequence converted from a DNA sequence, such as to a binary strong(G + C)/weak(A + T) sequence, to a binary sequence indicating the presence or absence of the dinucleotide CpG, or to a sequence indicating both the base and the codon position information. We apply various conversion schemes in order to address the following five DNA sequence analysis problems: isochore mapping, CpG island detection, locating the origin and terminus of replication in bacterial genomes, finding complex repeats in telomere sequences, and delineating coding and noncoding regions. We find that the recursive segmentation procedure can successfully detect isochore borders, CpG islands, and the origin and terminus of replication, but it needs improvement for detecting complex repeats as well as borders between coding and noncoding regions.
Affiliation(s)
- Wentian Li
- Center for Genomics and Human Genetics, North Shore-LIJ Research Institute, Manhasset, NY 11030, USA.
|
181
|
Karlin S, Brocchieri L, Trent J, Blaisdell BE, Mrázek J. Heterogeneity of genome and proteome content in bacteria, archaea, and eukaryotes. Theor Popul Biol 2002; 61:367-90. [PMID: 12167359 DOI: 10.1006/tpbi.2002.1606] [Citation(s) in RCA: 37] [Impact Index Per Article: 1.6] [Indexed: 11/22/2022]
Abstract
Our analysis compares bacteria, archaea, and eukaryota with respect to a wide assortment of genome and proteome properties. These properties include ribosomal protein gene distributions, chaperone protein contrasts, major variation of transcription/translation factors, gene encoding pathways of energy metabolism, and predicted protein expression levels. Significant differences within and between the three domains of life include protein lengths, information processing procedures, many metabolic and lipid biosynthesis pathways, cellular controls, and regulatory proteins. Differences among genomes are influenced by lifestyle, habitat, physiology, energy sources, and other factors.
Affiliation(s)
- Samuel Karlin
- Department of Mathematics, Stanford University, California 94305-2125, USA
|
182
|
Tang SL, Nuttall S, Ngui K, Fisher C, Lopez P, Dyall-Smith M. HF2: a double-stranded DNA tailed haloarchaeal virus with a mosaic genome. Mol Microbiol 2002; 44:283-96. [PMID: 11967086 DOI: 10.1046/j.1365-2958.2002.02890.x] [Citation(s) in RCA: 74] [Impact Index Per Article: 3.2] [Indexed: 11/20/2022]
Abstract
HF2 is a haloarchaeal virus infecting two Halorubrum species (Family Halobacteriaceae). It is lytic, has a head-and-tail morphology and belongs to the Myoviridae (contractile tails). The linear double-stranded DNA genome was sequenced and found to be 77 670 bp in length, with a mol% G+C of 55.8. A total of 121 likely open reading frames (ORFs) were identified, of which 37 overlapped at start and stop codons. The predicted proteins were usually acidic (average pI of 4.8), and less than about 12% of them had homologues in the sequence databases. Four complete tRNA-like sequences (tRNA-Arg, -Asx, -Pro and -Tyr) and an incomplete tRNA-Thr were detected. A transcription map showed that most of the genome was transcribed and that the synthesis of transcripts occurred in a highly organized and reproducible pattern over a 5 h infection cycle. Transcripts often spanned multiple ORFs, suggesting that viral genes were organized into operons. The predicted ORF and observed transcript directions matched well and showed that transcription is mainly directed inwards from the genome termini, meeting at about 45-48 kb, and this was also a turning point in a cumulative GC-skew plot. The low point in cumulative GC-skew, near the left end, was a region rich in short repeats and lacking ORFs, which is likely to be an origin of replication. The HF2 genome is a mosaic of components from widely different sources, demonstrating clearly that viruses of haloarchaea, like their bacteriophage counterparts, are vectors for the exchange and transmission of genetic material between wide taxonomic distances, even across domains.
Affiliation(s)
- Sen-Lin Tang
- Department of Microbiology and Immunology, University of Melbourne, Parkville, Victoria 3010, Australia
|
183
|
Nicolas P, Bize L, Muri F, Hoebeke M, Rodolphe F, Ehrlich SD, Prum B, Bessières P. Mining Bacillus subtilis chromosome heterogeneities using hidden Markov models. Nucleic Acids Res 2002; 30:1418-26. [PMID: 11884641 PMCID: PMC101363 DOI: 10.1093/nar/30.6.1418] [Citation(s) in RCA: 50] [Impact Index Per Article: 2.2] [Received: 08/06/2001] [Revised: 01/24/2002] [Accepted: 01/24/2002] [Indexed: 11/14/2022] Open
Abstract
We present here the use of a new statistical segmentation method on the Bacillus subtilis chromosome sequence. Maximum likelihood parameter estimation of a hidden Markov model, based on the expectation-maximization algorithm, enables one to segment the DNA sequence according to its local composition. This approach is not based on sliding windows; it enables different compositional classes to be separated without prior knowledge of their content, size and localization. We compared these compositional classes, obtained from the sequence, with the annotated DNA physical map, sequence homologies and repeat regions. The first heterogeneity revealed discriminates between the two coding strands and the non-coding regions. Other main heterogeneities arise; some are related to horizontal gene transfer, some to t-enriched composition of hydrophobic protein coding strands, and others to the codon usage fitness of highly expressed genes. Concerning potential and established gene transfers, we found 9 of the 10 known prophages, plus 14 new regions of atypical composition. Some of them are surrounded by repeats, most of their genes have unknown function or possess homology to genes involved in secondary catabolism, metal and antibiotic resistance. Surprisingly, we notice that all of these detected regions are a + t-richer than the host genome, raising the question of their remote sources.
Affiliation(s)
- Pierre Nicolas
- Laboratoire de Mathématique, Informatique et Génome, INRA, Route de Saint-Cyr, F-78026 Versailles cedex, France.
|
184
|
Gerber AS, Loggins R, Kumar S, Dowling TE. Does nonneutral evolution shape observed patterns of DNA variation in animal mitochondrial genomes? Annu Rev Genet 2002; 35:539-66. [PMID: 11700293 DOI: 10.1146/annurev.genet.35.102401.091106] [Citation(s) in RCA: 105] [Impact Index Per Article: 4.6] [Indexed: 11/09/2022]
Abstract
Early studies of animal mitochondrial DNA (mtDNA) assumed that nucleotide sequence variation was neutral. Recent analyses of sequences from a variety of taxa have brought the validity of this assumption into question. Here we review analytical methods used to test for neutrality and evidence for nonneutral evolution of animal mtDNA. Evaluations of mitochondrial haplotypes in different nuclear backgrounds identified differences in performance, typically favoring coevolved mitochondrial and nuclear genomes. Experimental manipulations also indicated that certain haplotypes have an advantage over others; however, biotic and historical effects and cyto-nuclear interactions make it difficult to assess the relative importance of nonneutral factors. Statistical analyses of sequences have been used to argue for nonneutrality of mtDNA; however, rejection of neutral patterns in the published literature is common but not predominant. Patterns of replacement and synonymous substitutions within and between species identified a trend toward an excess of replacement mutations within species. This pattern has been viewed as support for the existence of mildly deleterious mutations within species; however, other alternative explanations that can produce similar patterns cannot be eliminated.
Affiliation(s)
- A S Gerber
- Department of Biology, University of North Dakota, Grand Forks, North Dakota 58202-9019, USA
|
185
|
Lobry JR, Sueoka N. Asymmetric directional mutation pressures in bacteria. Genome Biol 2002; 3:RESEARCH0058. [PMID: 12372146 PMCID: PMC134625 DOI: 10.1186/gb-2002-3-10-research0058] [Citation(s) in RCA: 127] [Impact Index Per Article: 5.5] [Received: 11/13/2001] [Revised: 06/18/2002] [Accepted: 08/15/2002] [Indexed: 11/20/2022] Open
Abstract
BACKGROUND When there are no strand-specific biases in mutation and selection rates (that is, in the substitution rates) between the two strands of DNA, the average nucleotide composition is theoretically expected to be A = T and G = C within each strand. Deviations from these equalities are therefore evidence for an asymmetry in selection and/or mutation between the two strands. By focusing on weakly selected regions that could be oriented with respect to replication in 43 out of 51 completely sequenced bacterial chromosomes, we have been able to detect asymmetric directional mutation pressures. RESULTS Most of the 43 chromosomes were found to be relatively enriched in G over C and T over A, and slightly depleted in G+C, in their weakly selected positions (intergenic regions and third codon positions) in the leading strand compared with the lagging strand. Deviations from A = T and G = C were highly correlated between third codon positions and intergenic regions, with a lower degree of deviation in intergenic regions, and were not correlated with overall genomic G+C content. CONCLUSIONS During the course of bacterial chromosome evolution, the effects of asymmetric directional mutation pressures are commonly observed in weakly selected positions. The degree of deviation from equality is highly variable among species, and within species is higher in third codon positions than in intergenic regions. The orientation of these effects is almost universal and is compatible in most cases with the hypothesis of an excess of cytosine deamination in the single-stranded state during DNA replication. However, the variation in G+C content between species is influenced by factors other than asymmetric mutation pressure.
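The deviations from A = T and G = C discussed in this abstract can be expressed as two normalized skews. The sketch below is one common way to write them, not necessarily the statistic used in the paper. It measures the skews at third codon positions of a single coding sequence, and the function names and the toy gene are hypothetical.

```python
from collections import Counter


def third_codon_positions(cds):
    """Return the bases at third codon positions of an in-frame coding sequence."""
    return cds[2::3]


def strand_skews(seq):
    """Return (GC skew, TA skew) as (G-C)/(G+C) and (T-A)/(T+A).

    Both are 0 when G = C and A = T, and positive when the sequence is
    enriched in G over C and in T over A.
    """
    counts = Counter(seq.upper())
    g, c, t, a = counts["G"], counts["C"], counts["T"], counts["A"]
    gc_skew = (g - c) / (g + c) if g + c else 0.0
    ta_skew = (t - a) / (t + a) if t + a else 0.0
    return gc_skew, ta_skew


# Hypothetical four-codon gene: ATG GCT GGT TAA
toy_cds = "ATGGCTGGTTAA"
print(third_codon_positions(toy_cds))
print(strand_skews(third_codon_positions(toy_cds)))
```

Comparing these two values between leading-strand and lagging-strand genes, or between third codon positions and intergenic regions, is the kind of contrast the abstract reports.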
Affiliation(s)
- Jean R Lobry
- Laboratoire BBE CNRS UMR 5558, Université Claude Bernard, 43 Bd du 11 Novembre 1918, F-69622 Villeurbanne cedex, France.
|
186
|
Kowalczuk M, Mackiewicz P, Mackiewicz D, Nowicka A, Dudkiewicz M, Dudek MR, Cebrat S. High correlation between the turnover of nucleotides under mutational pressure and the DNA composition. BMC Evol Biol 2001; 1:13. [PMID: 11801180 PMCID: PMC64649 DOI: 10.1186/1471-2148-1-13] [Citation(s) in RCA: 22] [Impact Index Per Article: 0.9] [Received: 09/10/2001] [Accepted: 12/17/2001] [Indexed: 11/23/2022] Open
Abstract
BACKGROUND Any DNA sequence is a result of compromise between the selection and mutation pressures exerted on it during evolution. It is difficult to estimate the relative influence of each of these pressures on the rate of accumulation of substitutions. However, it is important to discriminate between the effect of mutations, and the effect of selection, when studying the phylogenetic relations between taxa. RESULTS We have tested in computer simulations, and analytically, the available substitution matrices for many genomes, and we have found that DNA strands in equilibrium under mutational pressure have a unique feature: the fraction of each type of nucleotide is linearly dependent on the time needed for substitution of half of the nucleotides of a given type, with a correlation coefficient close to 1. Substitution matrices found for sequences under selection pressure do not have this property. A substitution matrix for the leading strand of the Borrelia burgdorferi genome, having reached equilibrium in computer simulation, gives a DNA sequence with nucleotide composition and asymmetry corresponding precisely to the third positions in codons of protein coding genes located on the leading strand. CONCLUSIONS Parameters of mutational pressure allow us to calculate the DNA composition in equilibrium with this mutational pressure. By comparing any real DNA sequence with the sequence in equilibrium, it is possible to estimate the distance between these sequences, which could be used as a measure of the selection pressure. Furthermore, the parameters of the mutational pressure enable direct estimation of the relative mutation rates in any DNA sequence in the studied genome.
Affiliation(s)
- Maria Kowalczuk
- Institute of Microbiology, Wroclaw University, ul. Przybyszewskiego 63/77, 51-148 Wroclaw, Poland
- Pawel Mackiewicz
- Institute of Microbiology, Wroclaw University, ul. Przybyszewskiego 63/77, 51-148 Wroclaw, Poland
- Dorota Mackiewicz
- Institute of Microbiology, Wroclaw University, ul. Przybyszewskiego 63/77, 51-148 Wroclaw, Poland
- Aleksandra Nowicka
- Institute of Microbiology, Wroclaw University, ul. Przybyszewskiego 63/77, 51-148 Wroclaw, Poland
- Malgorzata Dudkiewicz
- Institute of Microbiology, Wroclaw University, ul. Przybyszewskiego 63/77, 51-148 Wroclaw, Poland
- Stanislaw Cebrat
- Institute of Microbiology, Wroclaw University, ul. Przybyszewskiego 63/77, 51-148 Wroclaw, Poland
|
187
|
Abstract
We tried to identify the substitutions involved in the establishment of replication strand bias, which has been recognized as an important evolutionary factor in the evolution of bacterial genomes. First, we analyzed the composition asymmetry of 28 complete bacterial genomes and used it to test the possibility that asymmetric deamination of cytosine might be at the origin of the bias. The model showed significant correlation to the data but left unexplained a significant portion of the variance and indicated a systematic underestimation of GC skews in comparison with TA skews. Second, we analyzed the substitutions acting on the genes from five fully sequenced Chlamydia genomes that had not suffered strand switch since speciation. This analysis showed that substitutions were not at equilibrium in Chlamydia trachomatis or in C. muridarum and that strand bias is still an on-going process in these genes. Third, we identified substitutions involved in the adaptation of genes that had switched strands after speciation. These genes adapted quickly to the skewed composition of the new strand, mostly due to C→T, A→G, and C→G asymmetric substitutions. This observation was reinforced by the analysis of genes that switched strands after divergence between Bacillus subtilis and B. halodurans. Finally, we propose a more extended model based on the analysis of the substitution asymmetries of Chlamydia. This model fits well with the data provided by bacterial genomes presenting strong strand bias.
Affiliation(s)
- E P Rocha
- Atelier de BioInformatique, Université Paris VI, Paris, France.
|
188
|
Francino MP, Ochman H. Deamination as the basis of strand-asymmetric evolution in transcribed Escherichia coli sequences. Mol Biol Evol 2001; 18:1147-50. [PMID: 11371605 DOI: 10.1093/oxfordjournals.molbev.a003888] [Citation(s) in RCA: 77] [Impact Index Per Article: 3.2] [Indexed: 11/14/2022] Open
|
189
|
Abstract
We calculated nucleotide distribution curves along the DNA molecules of the human chromosomes 21 and 22, their correlations in more than 10,000 equidistant positions, and subjected the correlations to cluster analysis. The cluster analysis demonstrated that both DNA molecules were composed of two types of segments exhibiting qualitatively different correlations. The segments differed most in the correlation of the distribution curves of cytosine and guanine, which was very high in type I segments but weak in type II segments. The type I and II segments also significantly differed in the correlations of the distribution curves of adenine with thymine. In addition, adenine strongly anticorrelated with cytosine but this anticorrelation was uniform along both chromosomes and, therefore, it did not contribute to the distinction of the two types of segments. The segments were up to 100 kbp long but they had nothing in common with isochores. Building blocks of the mosaic structure of the DNA molecules of the human chromosomes 21 and 22 are very similar but different in several interesting aspects from those of E. coli.
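The correlation step described in this abstract can be illustrated with a minimal sketch. It is not the authors' procedure: the non-overlapping window scheme, the window size, and the helper names `window_counts` and `pearson` are assumptions made for illustration, and the cluster analysis is omitted.

```python
from math import sqrt


def window_counts(seq, base, window):
    """Count `base` in consecutive non-overlapping windows (hypothetical helper)."""
    seq = seq.upper()
    return [
        seq[start:start + window].count(base)
        for start in range(0, len(seq) - window + 1, window)
    ]


def pearson(xs, ys):
    """Pearson correlation of two equal-length series; None if undefined."""
    n = len(xs)
    if n < 2 or n != len(ys):
        raise ValueError("need two series of equal length with at least 2 points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return None  # a constant series has no defined correlation
    return sxy / sqrt(sxx * syy)


# Toy sequence of three 8-base windows in which C and G rise and fall together.
toy = "GGCCAAAA" + "GCAAAAAA" + "GGGCCCAA"
c_curve = window_counts(toy, "C", 8)
g_curve = window_counts(toy, "G", 8)
a_curve = window_counts(toy, "A", 8)
r_cg = pearson(c_curve, g_curve)
r_ac = pearson(a_curve, c_curve)
```

In this toy input the C and G curves are perfectly correlated and the A and C curves perfectly anticorrelated; real chromosomes would need far more windows and a clustering step over the resulting correlations.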
Affiliation(s)
- D Häring
- Institute of Biophysics, Academy of Sciences of the Czech Republic, Brno
|
190
|
Lopez P, Philippe H. Composition strand asymmetries in prokaryotic genomes: mutational bias and biased gene orientation. C R Acad Sci III 2001; 324:201-8. [PMID: 11291306 DOI: 10.1016/s0764-4469(00)01298-1] [Citation(s) in RCA: 20] [Impact Index Per Article: 0.8] [Indexed: 12/24/2022]
Abstract
Most prokaryotic genomes display strand compositional asymmetries, but the reasons for these biases remain unclear. When the distribution of gene orientation is biased, as it often is, this may induce a bias in composition, as codon frequencies are not identical. We show here that this effect can be estimated and removed, and that the residual base skews are the highest at third base codon positions and lower at first and second positions. This strongly suggests that compositional asymmetries result from 1) a replication-related mutational bias that is filtered through selective pressure and/or from 2) an uneven distribution of gene orientation. In most cases, the mutational bias alters the codon usage and amino acid frequencies of the leading and the lagging strand. However, these features are not ubiquitous amongst prokaryotes, and the biological reasons for them remain to be found.
Affiliation(s)
- P Lopez
- Equipe phylogénie, bio-informatique et génome, UMR 7622, bâtiment B, 6e étage, case 24, 9, quai Saint-Bernard, 75252 Paris, France
|
191
|
Capiaux H, Cornet F, Corre J, Guijo MI, Pérals K, Rebollo JE, Louarn JM. Polarization of the Escherichia coli chromosome. A view from the terminus. Biochimie 2001; 83:161-70. [PMID: 11278065 DOI: 10.1016/s0300-9084(00)01202-5] [Citation(s) in RCA: 24] [Impact Index Per Article: 1.0] [Indexed: 11/17/2022]
Abstract
The E. coli chromosome replication arms are polarized by motifs such as RRNAGGGS oligomers, found preferentially on leading strands. Their skew increases regularly from the origin to dif (the site in the center of the terminus where chromosome dimer resolution occurs), to reach a value of 90% near dif. Convergent information indicates that polarization in opposite directions from the dif region tightly controls the activity of dif, probably by orienting mobilization of the terminus at cell division. Another example of polarization is the presence, in the region peripheral to the terminus, of small non-divisible zones whose inversion interferes with spatial separation of sister nucleoids. The two phenomena may contribute to the organization of the Ter macrodomain.
Affiliation(s)
- H Capiaux
- Laboratoire de Microbiologie et de Génétique moléculaires du CNRS, 118, route de Narbonne, 31320 Toulouse cedex, France
|
192
|
Mackiewicz P, Mackiewicz D, Kowalczuk M, Cebrat S. Flip-flop around the origin and terminus of replication in prokaryotic genomes. Genome Biol 2001; 2:INTERACTIONS1004. [PMID: 11790247 PMCID: PMC138987 DOI: 10.1186/gb-2001-2-12-interactions1004] [Citation(s) in RCA: 42] [Impact Index Per Article: 1.8] [Indexed: 11/16/2022] Open
Abstract
A response to "Evidence for symmetric chromosomal inversions around the replication origin in bacteria" by JA Eisen, JF Heidelberg, O White, SL Salzberg. Genome Biology 2000, 1:research0011.1-0011.9.
Affiliation(s)
- Paweł Mackiewicz
- Institute of Microbiology, Department of Genetics, Wrocław University, ul. Przybyszewskiego 63/77, 51-148 Wrocław, Poland
- Dorota Mackiewicz
- Institute of Microbiology, Department of Genetics, Wrocław University, ul. Przybyszewskiego 63/77, 51-148 Wrocław, Poland
- Maria Kowalczuk
- Institute of Microbiology, Department of Genetics, Wrocław University, ul. Przybyszewskiego 63/77, 51-148 Wrocław, Poland
- Stanisław Cebrat
- Institute of Microbiology, Department of Genetics, Wrocław University, ul. Przybyszewskiego 63/77, 51-148 Wrocław, Poland
|
193
|
Abstract
Of Chargaff's four rules on DNA base composition, only his first parity rule was incorporated into mainstream biology as the DNA double helix. Now, the cluster rule, the second parity rule, and the GC rule, reveal the multiple levels of information in our genomes and potential conflicts between them. In these terms we can understand how double-stranded RNA became an intracellular alarm signal, how potentially recombining nucleic acids can distinguish between 'self' and 'not-self' so leading to the origin of species, how isochores evolved to facilitate gene duplication, and how unlikely it is that any mutation can ever remain truly neutral.
Affiliation(s)
- D R Forsdyke
- Department of Biochemistry, Queen's University, Kingston, Ontario K7L3N6, Canada.
|
194
|
Abstract
Experimental approaches, as well as computer analysis on genomic sequences, have revealed a large variability in base composition between regions in the same genome or between genomes of different species. In most cases, however, the biological causes of these compositional biases remain unknown. The recent large increase in the availability of completely sequenced genomes can give new insight into evolution processes involved in these compositional biases.
Affiliation(s)
- C Gautier
- Biometry and Evolutionary Biology Laboratory (bâtiment 741), Université Claude Bernard Lyon 1 and CNRS, 43 bd 11 nov, 69622 Villeurbanne Cedex, France.
|
195
|
Abstract
In some ciliates, the DNA sequences of the germline genomes have been profoundly modified during evolution, providing unprecedented examples of germline DNA malleability. Although the significance of the modifications and malleability is unclear, they may reflect the evolution of mechanisms that facilitate evolution. Because of the modifications, these ciliates must perform remarkable feats of cutting, splicing, rearrangement and elimination of DNA sequences to convert the chromosomal DNA in the germline genome (micronuclear genome) into gene-sized DNA molecules in the somatic genome (macronuclear genome). How these manipulations of DNA are guided and carried out is largely unknown. However, the organization and manipulation of ciliate DNA sequences are new phenomena that expand a general appreciation for the flexibility of DNA in evolution and development.
Affiliation(s)
- D M Prescott
- Department of Molecular, Cellular and Developmental Biology, University of Colorado, Boulder, Colorado 80309-0347, USA.
|
196
|
Picardeau M, Lobry JR, Hinnebusch BJ. Analyzing DNA strand compositional asymmetry to identify candidate replication origins of Borrelia burgdorferi linear and circular plasmids. Genome Res 2000; 10:1594-604. [PMID: 11042157 PMCID: PMC310945 DOI: 10.1101/gr.124000] [Citation(s) in RCA: 57] [Impact Index Per Article: 2.3] [Indexed: 11/25/2022]
Abstract
The Lyme disease agent Borrelia burgdorferi has a genome composed of a linear chromosome and a series of linear and circular plasmids. We previously mapped the oriC of the linear chromosome to the center of the molecule, where a pronounced switch in CG skew occurs. In this study, we analyzed B. burgdorferi plasmid sequences for AT and CG skew in an effort to similarly identify plasmid replication origins. Cumulative skew diagrams of the plasmids suggested that they, like the linear chromosome, replicate bidirectionally from an internal origin. The B. burgdorferi linear chromosome contains homologs to partitioning protein genes soj and spoOJ, which are closely linked to oriC at the minimum cumulative skew point of the 1-Mb molecule. A soj/parA homolog also maps to cumulative skew minima of the B. burgdorferi linear and circular plasmids, further suggesting that these regions contain the replication origin. The heterogeneity in these genes and in the nucleotide sequences of the putative origin regions could account for the mutual compatibility of the multiple DNA elements in B. burgdorferi.
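As an illustration of the cumulative skew diagrams mentioned in this abstract, the following is a minimal sketch and not the authors' code. It assumes the skew of each window is defined as (G - C)/(G + C), that windows are non-overlapping, and that the candidate origin sits at the minimum of the running sum; the function names and the default window size are hypothetical.

```python
def cumulative_skew(seq, window=1000):
    """Running sum of per-window (G - C)/(G + C) skew (assumed sign convention)."""
    seq = seq.upper()
    running = 0.0
    curve = []
    for start in range(0, len(seq) - window + 1, window):
        chunk = seq[start:start + window]
        g, c = chunk.count("G"), chunk.count("C")
        if g + c:  # windows without G or C contribute zero skew
            running += (g - c) / (g + c)
        curve.append(running)
    return curve


def skew_minimum_position(seq, window=1000):
    """End coordinate of the window where the cumulative skew is lowest.

    Returns None when the sequence is shorter than one window.
    """
    curve = cumulative_skew(seq, window)
    if not curve:
        return None
    best = min(range(len(curve)), key=curve.__getitem__)
    return (best + 1) * window


# Toy molecule: a C-rich arm followed by a G-rich arm, so the skew switches
# sign in the middle, as it would around an internal origin.
toy = "C" * 3000 + "G" * 3000
curve = cumulative_skew(toy, window=1000)
origin_guess = skew_minimum_position(toy, window=1000)  # 3000
```

On real plasmid sequences the curve is noisy, so the window size and any smoothing would need tuning, and a skew minimum is only a candidate that requires experimental confirmation.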
Affiliation(s)
- M Picardeau
- National Institute of Allergy and Infectious Diseases, National Institutes of Health, Rocky Mountain Laboratories, Laboratory of Human Bacterial Pathogenesis, Hamilton, Montana 59840, USA
|
197
|
Hooper SD, Berg OG. Gradients in nucleotide and codon usage along Escherichia coli genes. Nucleic Acids Res 2000; 28:3517-23. [PMID: 10982871 PMCID: PMC110745 DOI: 10.1093/nar/28.18.3517] [Citation(s) in RCA: 54] [Impact Index Per Article: 2.2] [Indexed: 11/13/2022] Open
Abstract
The usage of codons and nucleotide combinations varies along genes and systematic variation causes gradients in usage. We have studied such gradients of nucleotides and nucleotide combinations and their immediate context in Escherichia coli. To distinguish mutational and selectional effects, the genes were subdivided into three groups with different codon usage bias and the gradients of nucleotide usage were studied in each group. Some combinations that can be associated with a propensity for processivity errors show strong negative gradients that become weaker in genes with low codon bias, consistent with a selection on translational efficiency. One of the strongest gradients is for third position G, which shows a pervasive positive gradient in usage in most contexts of surrounding bases.
Affiliation(s)
- S D Hooper
- Department of Molecular Evolution, EBC, Uppsala University, Norbyvägen 18C, SE-75236, Uppsala, Sweden
|
198
|
Beletskii A, Grigoriev A, Joyce S, Bhagwat AS. Mutations induced by bacteriophage T7 RNA polymerase and their effects on the composition of the T7 genome. J Mol Biol 2000; 300:1057-65. [PMID: 10903854 DOI: 10.1006/jmbi.2000.3944] [Citation(s) in RCA: 40] [Impact Index Per Article: 1.6] [Indexed: 11/22/2022]
Abstract
We show here that transcription by the bacteriophage T7 RNA polymerase increases the deamination of cytosine bases in the non-transcribed strand to uracil, causing C to T mutations in that strand. Under optimal conditions, the mutation frequency increases about fivefold over background, and is similar to that seen with the Escherichia coli RNA polymerase. Further, we found that a mutant T7 RNA polymerase with a slower rate of elongation caused more cytosine deaminations than its wild-type parent. These results suggest that promoting cytosine deamination in the non-transcribed strand is a general property of transcription in E. coli and is dependent on the length of time the transcription bubble stays open during elongation. To see if transcription-induced mutations have influenced the evolution of bacteriophage T7, we analyzed its genome for a bias in base composition. Our analysis showed a significant excess of thymine over cytosine bases in the highly transcribed regions of the genome. Moreover, the average value of this bias correlated well with the levels of transcription of different genomic regions. Our results indicate that transcription-induced mutations have altered the composition of bacteriophage T7 genome and suggest that this may be a significant force in genome evolution.
Affiliation(s)
- A Beletskii
- Department of Chemistry, Wayne State University, Detroit, MI 48202, USA
|
199
|
Gierlik A, Kowalczuk M, Mackiewicz P, Dudek MR, Cebrat S. Is there replication-associated mutational pressure in the Saccharomyces cerevisiae genome? J Theor Biol 2000; 202:305-14. [PMID: 10666362 DOI: 10.1006/jtbi.1999.1062] [Citation(s) in RCA: 30] [Impact Index Per Article: 1.2] [Indexed: 11/22/2022]
Abstract
Compositional bias of yeast chromosomes was analysed using detrended DNA walks. Unlike eubacterial chromosomes, the yeast chromosomes did not show the specific asymmetry correlated with origin and terminus of replication. It is probably a result of a relative excess of autonomously replicating sequences (ARS) and of random choice of these sequences in each replication cycle. Nevertheless, the last ARS from both ends of chromosomes are responsible for unidirectional replication of subtelomeric sequences with pre-established leading/lagging roles of DNA strands. In these sequences a specific asymmetry is observed, resembling the asymmetry introduced by replication-associated mutational pressure into eubacterial chromosomes.
Affiliation(s)
- A Gierlik
- Institute of Microbiology, Wroclaw University, ul. Przybyszewskiego 63/77, Wroclaw, 54-148, Poland
|
200
|
Sueoka N. Translation-coupled violation of Parity Rule 2 in human genes is not the cause of heterogeneity of the DNA G+C content of third codon position. Gene 1999; 238:53-8. [PMID: 10570983 DOI: 10.1016/s0378-1119(99)00320-0] [Citation(s) in RCA: 137] [Impact Index Per Article: 5.3] [Indexed: 10/18/2022]
Abstract
The genome of higher eukaryotes consists of genes having a widely heterogeneous base composition at the third codon position. Ubiquitous variability of the DNA base composition has the following two aspects: intragenomic heterogeneity of the G+C content and the amino-acid-specific translation-coupled biases from the Parity Rule 2 (PR2). PR2 is an intrastrand rule where A = T and G = C are expected if there is no bias in mutation and selection between the two complementary strands of DNA. To examine whether or not the biases from PR2 are responsible for the wide heterogeneity of the DNA G+C content in human, the third codon position of 846 human genes was analyzed. Genes were separated into six groups according to their G+C content of the third codon position, and each group was examined for the translation-coupled PR2 biases in the nucleotide composition of the third codon position for two- and four-codon amino acids. The results show that genes in the different G+C content groups have similar PR2 biases, indicating that the intragenomic heterogeneity of the G+C content is not correlated with translation-coupled biases from the PR2. Therefore, the heterogeneity of the G+C content is likely to be determined by some other mechanism (e.g. locally variable directional mutation pressures) than amino-acid-specific selections for the codon preference.
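The PR2 expectation stated in this abstract (A = T and G = C within a strand) can be checked at third codon positions with a short sketch. This is an illustration under assumptions, not the paper's analysis: it uses the ratios A3/(A3+T3) and G3/(G3+C3), which both equal 0.5 when PR2 holds, takes an in-frame coding sequence as input, and does not split codons by amino acid class. The function name and dictionary keys are hypothetical.

```python
def pr2_third_position(cds):
    """Third-codon-position PR2 ratios and G+C content for an in-frame CDS.

    Ratios are None when the relevant bases are absent.
    """
    third = cds.upper()[2::3]  # every third base, starting at codon position 3
    counts = {base: third.count(base) for base in "ACGT"}
    at = counts["A"] + counts["T"]
    gc = counts["G"] + counts["C"]
    return {
        "a3_ratio": counts["A"] / at if at else None,   # A3/(A3+T3), 0.5 under PR2
        "g3_ratio": counts["G"] / gc if gc else None,   # G3/(G3+C3), 0.5 under PR2
        "gc3": gc / (at + gc) if at + gc else None,     # G+C content at position 3
    }


# Codons ATG GCC GCT AAA have third bases G, C, T, A: no PR2 deviation.
balanced = pr2_third_position("ATGGCCGCTAAA")
# Codons AAA AAA GGG have third bases A, A, G: both ratios deviate fully.
skewed = pr2_third_position("AAAAAAGGG")
```

Grouping genes by `gc3` and comparing the two ratios across groups would mirror the comparison described in the abstract, although the study additionally separated two-codon and four-codon amino acids.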
Affiliation(s)
- N Sueoka
- University of Colorado, Department of Molecular, Cellular, and Developmental Biology, Boulder 80309-0347, USA.
|