51
Nikolaou C, Almirantis Y. A study on the correlation of nucleotide skews and the positioning of the origin of replication: different modes of replication in bacterial species. Nucleic Acids Res 2005; 33:6816-22. [PMID: 16321966 PMCID: PMC1301597 DOI: 10.1093/nar/gki988] [Citation(s) in RCA: 40] [Impact Index Per Article: 2.0] [Indexed: 11/14/2022] Open
Abstract
Deviations from Chargaff's 2nd parity rule, according to which A ≈ T and G ≈ C in single-stranded DNA, have been associated with replication as well as with transcription in prokaryotes. Based on observations regarding mainly the transcription-replication co-linearity in a large number of prokaryotic species, we formulate the hypothesis that the replication procedure may follow different modes between genomes throughout which the skews clearly follow different patterns. We draw the conclusion that multiple functional sites of origin of replication may exist in the genomes of most archaea and in some exceptional cases of eubacteria, while in the majority of eubacteria, replication occurs through a single fixed origin.
Affiliation(s)
- Christoforos Nikolaou
- Institute of Biology, National Centre of Scientific Research Demokritos, 15310 Athens, Greece.
52
Guy L, Karamata D, Moreillon P, Roten CAH. Genometrics as an essential tool for the assembly of whole genome sequences: the example of the chromosome of Bifidobacterium longum NCC2705. BMC Microbiol 2005; 5:60. [PMID: 16223444 PMCID: PMC1285363 DOI: 10.1186/1471-2180-5-60] [Citation(s) in RCA: 10] [Impact Index Per Article: 0.5] [Received: 04/29/2005] [Accepted: 10/13/2005] [Indexed: 12/03/2022] Open
Abstract
BACKGROUND Analysis of the first reported complete genome sequence of Bifidobacterium longum NCC2705, an actinobacterium colonizing the gastrointestinal tract, uncovered its proteomic relatedness to Streptomyces coelicolor and Mycobacterium tuberculosis. However, a rapid scrutiny by genometric methods revealed a genome organization totally different from all so far sequenced high-GC Gram-positive chromosomes. RESULTS Generally, the cumulative GC- and ORF orientation skew curves of prokaryotic genomes consist of two linear segments of opposite slope: the minimum and the maximum of the curves correspond to the origin and the terminus of chromosome replication, respectively. However, analyses of the B. longum NCC2705 chromosome yielded six, instead of two, linear segments, while its dnaA locus, usually associated with the origin of replication, was not located at the minimum of the curves. Furthermore, the coorientation of gene transcription with replication was very low. Comparison with closely related actinobacteria strongly suggested that the chromosome of B. longum was misassembled, and the identification of two pairs of relatively long homologous DNA sequences offers the possibility for an alternative genome assembly proposed here below. By genometric criteria, this configuration displays all of the characters common to bacteria, in particular to related high-GC Gram-positives. In addition, it is compatible with the partially sequenced genome of DJO10A B. longum strain. Recently, a corrected sequence of B. longum NCC2705, with a configuration similar to the one proposed here below, has been deposited in GenBank, confirming our predictions. CONCLUSION Genometric analyses, in conjunction with standard bioinformatic tools and knowledge of bacterial chromosome architecture, represent fast and straightforward methods for the evaluation of chromosome assembly.
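The cumulative GC skew curve described in this abstract can be illustrated with a minimal Python sketch. The windowed skew formula (G - C)/(G + C), the function names, and the window size are assumptions made for illustration; they are not taken from the paper, and the toy sequence is far too short to stand in for a real chromosome.

```python
def cumulative_gc_skew(seq, window=1000):
    """Running sum of the per-window skew (G - C) / (G + C).

    Hypothetical helper: the window size and formula are illustrative
    assumptions, not the authors' published procedure.
    """
    seq = seq.upper()
    total = 0.0
    curve = []
    for start in range(0, len(seq), window):
        chunk = seq[start:start + window]
        g = chunk.count("G")
        c = chunk.count("C")
        if g + c:  # windows without G or C contribute nothing
            total += (g - c) / (g + c)
        curve.append(total)
    return curve


def predicted_origin_and_terminus(curve, window=1000):
    """Return the sequence positions at which the curve reaches its
    minimum (putative origin) and maximum (putative terminus)."""
    i_min = min(range(len(curve)), key=curve.__getitem__)
    i_max = max(range(len(curve)), key=curve.__getitem__)
    # Each curve value is recorded at the end of its window.
    return (i_min + 1) * window, (i_max + 1) * window


# Toy sequence: G-rich, then C-rich, then G-rich again.
toy = "G" * 30 + "C" * 40 + "G" * 30
curve = cumulative_gc_skew(toy, window=10)
origin, terminus = predicted_origin_and_terminus(curve, window=10)
print(curve)
print(origin, terminus)
```

A curve with two linear segments of opposite slope gives a single minimum and a single maximum; additional slope changes, such as the six segments reported for the original B. longum assembly, would show up as extra turning points.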
Affiliation(s)
- Lionel Guy
- Département de Microbiologie Fondamentale, Faculté de Biologie et Médecine, Université de Lausanne, CH-1015 Lausanne, Switzerland
- Dimitri Karamata
- Département de Microbiologie Fondamentale, Faculté de Biologie et Médecine, Université de Lausanne, CH-1015 Lausanne, Switzerland
- Philippe Moreillon
- Département de Microbiologie Fondamentale, Faculté de Biologie et Médecine, Université de Lausanne, CH-1015 Lausanne, Switzerland
- Claude-Alain H Roten
- Département de Microbiologie Fondamentale, Faculté de Biologie et Médecine, Université de Lausanne, CH-1015 Lausanne, Switzerland
53
Bradshaw PC, Rathi A, Samuels DC. Mitochondrial-encoded membrane protein transcripts are pyrimidine-rich while soluble protein transcripts and ribosomal RNA are purine-rich. BMC Genomics 2005; 6:136. [PMID: 16185363 PMCID: PMC1262711 DOI: 10.1186/1471-2164-6-136] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.3] [Received: 05/12/2005] [Accepted: 09/26/2005] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND Eukaryotic organisms contain mitochondria, organelles capable of producing large amounts of ATP by oxidative phosphorylation. Each cell contains many mitochondria with many copies of mitochondrial DNA in each organelle. The mitochondrial DNA encodes a small but functionally critical portion of the oxidative phosphorylation machinery, a few other species-specific proteins, and the rRNA and tRNA used for the translation of these transcripts. Because the microenvironment of the mitochondrion is unique, mitochondrial genes may be subject to different selectional pressures than those affecting nuclear genes. RESULTS From an analysis of the mitochondrial genomes of a wide range of eukaryotic species we show that there are three simple rules for the pyrimidine and purine abundances in mitochondrial DNA transcripts. Mitochondrial membrane protein transcripts are pyrimidine rich, rRNA transcripts are purine-rich and the soluble protein transcripts are purine-rich. The transitions between pyrimidine and purine-rich regions of the genomes are rapid and are easily visible on a pyrimidine-purine walk graph. These rules are followed, with few exceptions, independent of which strand encodes the gene. Despite the robustness of these rules across a diverse set of species, the magnitude of the differences between the pyrimidine and purine content is fairly small. Typically, the mitochondrial membrane protein transcripts have a pyrimidine richness of 56%, the rRNA transcripts are 55% purine, and the soluble protein transcripts are only 53% purine. CONCLUSION The pyrimidine richness of mitochondrial-encoded membrane protein transcripts is partly driven by U nucleotides in the second codon position in all species, which yields hydrophobic amino acids. The purine-richness of soluble protein transcripts is mainly driven by A nucleotides in the first codon position. The purine-richness of rRNA is also due to an abundance of A nucleotides. Possible mechanisms as to how these trends are maintained in mtDNA genomes of such diverse ancestry, size and variability of A-T richness are discussed.
Affiliation(s)
- Patrick C Bradshaw
- Virginia Bioinformatics Institute, Virginia Polytechnic Institute and State University, Blacksburg, VA 24061, USA
- Anand Rathi
- Virginia Bioinformatics Institute, Virginia Polytechnic Institute and State University, Blacksburg, VA 24061, USA
- David C Samuels
- Virginia Bioinformatics Institute, Virginia Polytechnic Institute and State University, Blacksburg, VA 24061, USA
54
Biochemical characterization of a novel thermostable glucose-1-phosphate thymidylyltransferase from Thermus caldophilus: Probing the molecular basis for its unusual thermostability. Enzyme Microb Technol 2005. [DOI: 10.1016/j.enzmictec.2005.02.024] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Indexed: 11/19/2022]
55
Basak S, Ghosh TC. On the origin of genomic adaptation at high temperature for prokaryotic organisms. Biochem Biophys Res Commun 2005; 330:629-32. [PMID: 15809043 DOI: 10.1016/j.bbrc.2005.02.134] [Citation(s) in RCA: 26] [Impact Index Per Article: 1.3] [Received: 02/06/2005] [Indexed: 10/25/2022]
Abstract
For a long time, the central issue of evolutionary genomics was to find out the adaptive strategy of nucleic acid molecules of various microorganisms having different optimal growth temperatures (Topt). Long-standing controversies exist regarding the correlations between genomic G+C content and Topt, and this debate has not yet been settled. We address this problem by considering the fact that adaptation to growth at high temperature requires a coordinated set of evolutionary changes affecting: (i) nucleic acid thermostability and (ii) stability of codon-anticodon interactions. In the present study, we analyzed 16 prokaryotic genomes having intermediate G+C content and widely varying optimal growth temperatures. Results show that elevated growth temperature not only imposes selective constraints at the nucleic acid level but also affects the stability of codon-anticodon interaction. We observed a decrease in the frequency of SSC and SSG codons with the increase in Topt to avoid the formation of side-by-side GC base pairs in the codon-anticodon interaction, thereby making it impossible for a genome to increase GC composition uniformly through the whole coding sequence. Thus, we suggest that any attempt to obtain a generalized relation between genomic GC composition and optimal growth temperature would hardly yield any satisfactory result.
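As a rough illustration of the codon statistic mentioned in this abstract, the sketch below computes the fraction of SSC and SSG codons in a coding sequence. It assumes that S is the IUPAC symbol for a strong base (G or C) and that the sequence is given in frame; the function name is hypothetical and the calculation is not claimed to reproduce the authors' analysis.

```python
def ssc_ssg_fraction(cds):
    """Fraction of in-frame codons of the form SSC or SSG.

    Assumption: S denotes G or C (IUPAC "strong"), so SSC and SSG together
    are the codons with G or C at all three positions.
    """
    cds = cds.upper()
    usable = len(cds) - len(cds) % 3  # ignore a trailing partial codon
    codons = [cds[i:i + 3] for i in range(0, usable, 3)]
    if not codons:
        raise ValueError("sequence contains no complete codon")
    hits = sum(
        1 for codon in codons
        if codon[0] in "GC" and codon[1] in "GC" and codon[2] in "CG"
    )
    return hits / len(codons)


# Codons: GGC (counted), ATG, CCG (counted), AAA
print(ssc_ssg_fraction("GGCATGCCGAAA"))
```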
Affiliation(s)
- Surajit Basak
- Bioinformatics Centre, Bose Institute, P 1/12, C.I.T. Scheme VII M, Kolkata 700 054, India
56
Friedman R, Drake JW, Hughes AL. Genome-wide patterns of nucleotide substitution reveal stringent functional constraints on the protein sequences of thermophiles. Genetics 2005; 167:1507-12. [PMID: 15280258 PMCID: PMC1470942 DOI: 10.1534/genetics.104.026344] [Citation(s) in RCA: 58] [Impact Index Per Article: 2.9] [Indexed: 11/18/2022] Open
Abstract
To test the hypothesis that the proteins of thermophilic prokaryotes are subject to unusually stringent functional constraints, we estimated the numbers of synonymous and nonsynonymous nucleotide substitutions per site between 17,957 pairs of orthologous genes from 22 pairs of closely related species of Archaea and Bacteria. The average ratio of nonsynonymous to synonymous substitutions was significantly lower in thermophiles than in nonthermophiles, and this effect was observed in both Archaea and Bacteria. There was no evidence that this difference could be explained by factors such as nucleotide content bias. Rather, the results support the hypothesis that proteins of thermophiles are subject to unusually strong purifying selection, leading to a reduced overall level of amino acid evolution per mutational event. The results show that genome-wide patterns of sequence evolution can be influenced by natural selection exerted by a species' environment and shed light on a previous observation that relatively few of the mutations arising in a thermophilic archaeon were nucleotide substitutions in contrast to indels.
Affiliation(s)
- Robert Friedman
- Department of Biological Sciences, University of South Carolina, Columbia, South Carolina 29208, USA
57
Basak S, Banerjee T, Gupta SK, Ghosh TC. Investigation on the causes of codon and amino acid usages variation between thermophilic Aquifex aeolicus and mesophilic Bacillus subtilis. J Biomol Struct Dyn 2005; 22:205-14. [PMID: 15317481 DOI: 10.1080/07391102.2004.10506996] [Citation(s) in RCA: 29] [Impact Index Per Article: 1.5] [Indexed: 10/28/2022]
Abstract
Base composition, codon usages and amino acid usages have been analyzed by taking 529 orthologous sequences of Aquifex aeolicus and Bacillus subtilis, having different optimal growth temperatures. These two bacteria do not have significant difference in overall GC composition, but GC(1+2) and GC3 levels were found to vary significantly. Significantly higher purine content and GC3 composition have been observed in the coding sequences of Aquifex aeolicus than in their Bacillus subtilis counterparts. Correspondence analyses on codon and amino acid usages reveal that variation in base composition actually influences their codon and amino acid usages. Two selection pressures acting at the nucleotide level (GC3 and purine enrichment) cause amino acid usage to vary differently in different protein secondary structures. Our results suggest that adaptation of amino acid usages in coil structure of Aquifex aeolicus proteins is under the control of both purine increment and GC3 composition, whereas the adaptation of the amino acids in the helical region of thermophilic bacteria is strongly influenced by the purine content. Evolutionary perspectives concerning the temperature adaptation of DNA and protein molecules of these two bacteria have been discussed on the basis of these results.
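The GC(1+2) and GC3 quantities named in this abstract are position-specific GC fractions of a coding sequence. The following Python sketch shows one common way to compute them, assuming an in-frame sequence; the function name is hypothetical and nothing here is taken from the authors' own pipeline.

```python
def gc_by_codon_position(cds):
    """Return (GC12, GC3): the G+C fraction at codon positions 1 and 2
    combined, and at codon position 3.

    Illustrative sketch; assumes the sequence starts in frame.
    """
    cds = cds.upper()
    usable = len(cds) - len(cds) % 3  # ignore a trailing partial codon
    if usable == 0:
        raise ValueError("sequence contains no complete codon")
    first_two = []
    third = []
    for i in range(0, usable, 3):
        codon = cds[i:i + 3]
        first_two.extend(codon[:2])
        third.append(codon[2])

    def gc_fraction(bases):
        return sum(1 for b in bases if b in "GC") / len(bases)

    return gc_fraction(first_two), gc_fraction(third)


# Codons ATG GCC AAA: positions 1+2 are A,T,G,C,A,A; position 3 is G,C,A.
gc12, gc3 = gc_by_codon_position("ATGGCCAAA")
print(round(gc12, 3), round(gc3, 3))
```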
Affiliation(s)
- S Basak
- Bioinformatics Centre, Bose Institute, P 1/12, C.I.T. Scheme VII M, Kolkata 700 054, India
58
Guy L, Roten CAH. Genometric analyses of the organization of circular chromosomes: a universal pressure determines the direction of ribosomal RNA genes transcription relative to chromosome replication. Gene 2004; 340:45-52. [PMID: 15556293 DOI: 10.1016/j.gene.2004.06.056] [Citation(s) in RCA: 33] [Impact Index Per Article: 1.6] [Received: 03/24/2004] [Revised: 06/08/2004] [Accepted: 06/29/2004] [Indexed: 10/26/2022]
Abstract
Selective pressures related to gene function and chromosomal architecture are acting on genome sequences and can be revealed, for instance, by appropriate genometric methods. Cumulative nucleotide skew analyses, i.e., GC, TA, and ORF orientation skews, predict the location of the origin of DNA replication for 88 out of 100 completely sequenced bacterial chromosomes. These methods appear fully reliable for proteobacteria, Gram-positives, and spirochetes as well as for euryarchaeotes. Based on this genome architecture information, coorientation analyses reveal that in prokaryotes, ribosomal RNA (rRNA) genes encoding the small and large ribosomal subunits are all transcribed in the same direction as DNA replication; that is, they are located along the leading strand. This result offers a simple and reliable method for circumscribing the region containing the origin of the DNA replication and reveals a strong selective pressure acting on the orientation of rRNA genes similar to the weaker one acting on the orientation of ORFs. Rate of coorientation of transfer RNA (tRNA) genes with DNA replication appears to be taxon-specific. Analyzing nucleotide biases such as GC and TA skews of genes and plotting one against the other reveals a taxonomic clusterization of species. All ribosomal RNA genes are enriched in Gs and depleted in Cs, the only so far known exception being the rRNA genes of deuterostomian mitochondria. However, this exception can be explained by the fact that in the chromosome of the human mitochondrion, the model of the deuterostomian organelle genome, DNA replication, and rRNA transcription proceed in opposite directions. A general rule is deduced from prokaryotic and mitochondrial genomes: ribosomal RNA genes that are transcribed in the same direction as the DNA replication are enriched in Gs, and those transcribed in the opposite direction are depleted in Gs.
MESH Headings
- Base Composition/genetics
- Chromosomes, Archaeal/genetics
- Chromosomes, Bacterial/genetics
- DNA Replication/genetics
- DNA, Circular/genetics
- DNA, Mitochondrial/genetics
- Databases, Nucleic Acid
- Genome, Archaeal
- Genome, Bacterial
- Humans
- Models, Genetic
- Phylogeny
- RNA, Ribosomal/genetics
- Replication Origin/genetics
- Transcription, Genetic/genetics
Affiliation(s)
- Lionel Guy
- Département de Microbiologie Fondamentale, Faculté de Biologie et de Médecine, Université de Lausanne, CH-1015 Lausanne, Switzerland
59
Rabus R, Ruepp A, Frickey T, Rattei T, Fartmann B, Stark M, Bauer M, Zibat A, Lombardot T, Becker I, Amann J, Gellner K, Teeling H, Leuschner WD, Glöckner FO, Lupas AN, Amann R, Klenk HP. The genome of Desulfotalea psychrophila, a sulfate-reducing bacterium from permanently cold Arctic sediments. Environ Microbiol 2004; 6:887-902. [PMID: 15305914 DOI: 10.1111/j.1462-2920.2004.00665.x] [Citation(s) in RCA: 158] [Impact Index Per Article: 7.5] [Indexed: 11/30/2022]
Abstract
Desulfotalea psychrophila is a marine sulfate-reducing delta-proteobacterium that is able to grow at in situ temperatures below 0 °C. As abundant members of the microbial community in permanently cold marine sediments, D. psychrophila-like bacteria contribute to the global cycles of carbon and sulfur. Here, we describe the genome sequence of D. psychrophila strain LSv54, which consists of a 3 523 383 bp circular chromosome with 3118 predicted genes and two plasmids of 121 586 bp and 14 663 bp. Analysis of the genome gave insight into the metabolic properties of the organism, e.g. the presence of TRAP-T systems as a major route for the uptake of C4-dicarboxylates, the unexpected presence of genes from the TCA cycle, a TAT secretion system, the lack of a beta-oxidation complex and typical Desulfovibrio cytochromes, such as c553, c3 and ncc. D. psychrophila encodes more than 30 two-component regulatory systems, including a new Ntr subcluster of hybrid kinases, nine putative cold shock proteins and nine potentially cold shock-inducible proteins. A comparison of D. psychrophila's genome features with those of the only other published genome from a sulfate reducer, the hyperthermophilic archaeon Archaeoglobus fulgidus, revealed many striking differences, but only a few shared features.
Affiliation(s)
- R Rabus
- Max-Planck-Institute for Marine Microbiology, Celsiusstrasse 1, 28359 Bremen, Germany
60
Abstract
Most positively selected mutations cause changes in metabolism, resulting in a better-adapted phenotype. But as well as acting on the information content of genes, natural selection may also act directly on nucleic acid and protein molecules. We review the evidence for direct temperature-dependent natural selection acting on genomes, transcriptomes and proteomes.
Affiliation(s)
- Donal A Hickey
- Department of Biology, Concordia University, 7141 Sherbrooke Street, Montreal, Quebec, H4B 1R6, Canada.
61
Paz A, Mester D, Baca I, Nevo E, Korol A. Adaptive role of increased frequency of polypurine tracts in mRNA sequences of thermophilic prokaryotes. Proc Natl Acad Sci U S A 2004; 101:2951-6. [PMID: 14973185 PMCID: PMC365726 DOI: 10.1073/pnas.0308594100] [Citation(s) in RCA: 63] [Impact Index Per Article: 3.0] [Indexed: 11/18/2022] Open
Abstract
The mechanism of an organism's adaptation to high temperatures has been investigated intensively in recent years. It was suggested that the macromolecules of thermophilic microorganisms (especially proteins) have structural features that enhance their thermostability. We compared mRNA sequences of 72 fully sequenced prokaryotic proteomes (14 thermophilic and 58 mesophilic species). Although the differences between the percentage of adenine plus guanine content of whole mRNAs of different prokaryotic species are much lower than those of guanine plus cytosine content, the thermophile purine-pyrimidine (R/Y) ratio within their mRNAs is significantly higher than that of the mesophiles. The first and third codon positions of both thermophiles and mesophiles are purine-biased, with the bias more pronounced in the thermophiles. Thermophile mRNAs that display the highest R/Y ratio (1.43-1.69) are those of the ribosomal proteins, histone-like proteins, DNA-dependent RNA polymerase subunits, and heat-shock proteins. Within mesophilic prokaryotes and five eukaryotic species, the R/Y ratio of the mRNAs of heat-shock proteins is higher than their average over the coding part of the genome. Polypurine tracts (R)n (with n ≥ 5) are much more abundant within the thermophile mRNAs compared with mesophiles. Between two sequential pure-purinic codons of thermophile mRNAs, there is a rather strong tendency for the occurrence of adenine but not guanine tracts. The data suggest that mixed adenine·guanine and polyadenine tracts in mRNAs increase the thermostability beyond the contribution of amino acids encoded by purine tracts, which highlights the importance of ecological stress in the evolution of genome architecture.
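The two sequence statistics used in this abstract, the purine-pyrimidine (R/Y) ratio and polypurine tracts of at least five bases, can be sketched in a few lines of Python. The function names are hypothetical, and the handling of DNA versus RNA alphabets (T is treated as U) is an assumption made for convenience rather than a detail from the paper.

```python
import re


def ry_ratio(seq):
    """Purine (A+G) to pyrimidine (C+U) ratio of an mRNA-like sequence.

    Illustrative helper; T is treated as U so DNA input also works.
    """
    seq = seq.upper().replace("T", "U")
    purines = seq.count("A") + seq.count("G")
    pyrimidines = seq.count("C") + seq.count("U")
    if pyrimidines == 0:
        return float("inf")
    return purines / pyrimidines


def polypurine_tracts(seq, min_len=5):
    """Return every maximal run of purines with length >= min_len."""
    pattern = re.compile(r"[AG]{%d,}" % min_len)
    return [match.group() for match in pattern.finditer(seq.upper())]


toy = "AAGGACUAGAGAGCUU"
print(ry_ratio(toy))
print(polypurine_tracts(toy))
```

Because the regular expression is greedy, each reported tract is a maximal purine run, so overlapping sub-tracts are not counted separately.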
Affiliation(s)
- Arnon Paz
- Institute of Evolution, Haifa University, Mount Carmel, Haifa 31905, Israel
62
Chen SL, Lee W, Hottes AK, Shapiro L, McAdams HH. Codon usage between genomes is constrained by genome-wide mutational processes. Proc Natl Acad Sci U S A 2004; 101:3480-5. [PMID: 14990797 PMCID: PMC373487 DOI: 10.1073/pnas.0307827100] [Citation(s) in RCA: 230] [Impact Index Per Article: 11.0] [Indexed: 11/18/2022] Open
Abstract
Analysis of genome-wide codon bias shows that only two parameters effectively differentiate the genome-wide codon bias of 100 eubacterial and archaeal organisms. The first parameter correlates with genome GC content, and the second parameter correlates with context-dependent nucleotide bias. Both of these parameters may be calculated from intergenic sequences. Therefore, genome-wide codon bias in eubacteria and archaea may be predicted from intergenic sequences that are not translated. When these two parameters are calculated for genes from nonmammalian eukaryotic organisms, genes from the same organism again have similar values, and genome-wide codon bias may also be predicted from intergenic sequences. In mammals, genes from the same organism are similar only in the second parameter, because GC content varies widely among isochores. Our results suggest that, in general, genome-wide codon bias is determined primarily by mutational processes that act throughout the genome, and only secondarily by selective forces acting on translated sequences.
Affiliation(s)
- Swaine L Chen
- Department of Developmental Biology, Stanford University School of Medicine, Beckman Center, B300, Stanford, CA 94304, USA.
63
Singer GAC, Hickey DA. Thermophilic prokaryotes have characteristic patterns of codon usage, amino acid composition and nucleotide content. Gene 2004; 317:39-47. [PMID: 14604790 DOI: 10.1016/s0378-1119(03)00660-7] [Citation(s) in RCA: 126] [Impact Index Per Article: 6.0] [Indexed: 11/20/2022]
Abstract
A number of recent studies have shown that thermophilic prokaryotes have distinguishable patterns of both synonymous codon usage and amino acid composition, indicating the action of natural selection related to thermophily. On the other hand, several other studies of whole genomes have illustrated that nucleotide bias can have dramatic effects on synonymous codon usage and also on the amino acid composition of the encoded proteins. This raises the possibility that the thermophile-specific patterns observed at both the codon and protein levels are merely reflections of a single underlying effect at the level of nucleotide composition. Moreover, such an effect at the nucleotide level might be due entirely to mutational bias. In this study, we have compared the genomes of thermophiles and mesophiles at three levels: nucleotide content, codon usage and amino acid composition. Our results indicate that the genomes of thermophiles are distinguishable from mesophiles at all three levels and that the codon and amino acid frequency differences cannot be explained simply by the patterns of nucleotide composition. At the nucleotide level, we see a consistent tendency for the frequency of adenine to increase at all codon positions within the thermophiles. Thermophiles are also distinguished by their pattern of synonymous codon usage for several amino acids, particularly arginine and isoleucine. At the protein level, the most dramatic effect is a two-fold decrease in the frequency of glutamine residues among thermophiles. These results indicate that adaptation to growth at high temperature requires a coordinated set of evolutionary changes affecting (i) mRNA thermostability, (ii) stability of codon-anticodon interactions and (iii) increased thermostability of the protein products. We conclude that elevated growth temperature imposes selective constraints at all three molecular levels: nucleotide content, codon usage and amino acid composition. In addition to these multiple selective effects, however, the genomes of both thermophiles and mesophiles are often subject to superimposed large changes in composition due to mutational bias.
Affiliation(s)
- Gregory A C Singer
- Department of Biology, University of Ottawa, 30 Marie Curie, Ottawa, Ontario, Canada K1N 6N5.
64
Lambros RJ, Mortimer JR, Forsdyke DR. Optimum growth temperature and the base composition of open reading frames in prokaryotes. Extremophiles 2003; 7:443-50. [PMID: 14666404 DOI: 10.1007/s00792-003-0353-4] [Citation(s) in RCA: 35] [Impact Index Per Article: 1.6] [Received: 02/12/2003] [Accepted: 06/20/2003] [Indexed: 11/27/2022]
Abstract
The purine-loading index (PLI) is the difference between the numbers of purines (A+G) and pyrimidines (T+C) per kilobase of single-stranded nucleic acid. By purine-loading their mRNAs organisms may minimize unnecessary RNA-RNA interactions and prevent inadvertent formation of "self" double-stranded RNA. Since RNA-RNA interactions have a strong entropy-driven component, this need to minimize should increase as temperature increases. Consistent with this, we report for 550 prokaryotic species that optimum growth temperature is related to the average PLI of open reading frames. With increasing temperature prokaryotes tend to acquire base A and lose base C, while keeping bases T and G relatively constant. Accordingly, while the PLI increases, the (G+C)% decreases. The previously observed positive correlation between (G+C)% and optimum growth temperature, which applies to RNA species whose structure is of major importance for their function (ribosomal and transfer RNAs) does not apply to mRNAs, and hence is unlikely to apply generally to genomic DNA.
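The purine-loading index defined at the start of this abstract translates directly into a short calculation. The sketch below is a minimal Python version; the function name is hypothetical, and normalizing by the number of unambiguous bases (so that N and other ambiguity codes are ignored) is an assumption of this example, not a detail given in the abstract.

```python
def purine_loading_index(seq):
    """Purines (A+G) minus pyrimidines (T+C) per kilobase of sequence.

    Assumption: only unambiguous bases are counted, and U is read as T so
    that RNA input is accepted.
    """
    seq = seq.upper().replace("U", "T")
    purines = seq.count("A") + seq.count("G")
    pyrimidines = seq.count("T") + seq.count("C")
    counted = purines + pyrimidines
    if counted == 0:
        raise ValueError("sequence contains no A, C, G or T/U")
    return 1000.0 * (purines - pyrimidines) / counted


# 6 purines and 2 pyrimidines in 8 bases: (6 - 2) / 8 * 1000
print(purine_loading_index("AAGGAATC"))
```

A positive value indicates purine loading of the strand examined, and a sequence with equal purine and pyrimidine counts scores zero.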
Affiliation(s)
- R J Lambros
- Department of Biochemistry, Queen's University, Kingston, Ontario K7L3N6, Canada
65
Xue HY, Forsdyke DR. Low-complexity segments in Plasmodium falciparum proteins are primarily nucleic acid level adaptations. Mol Biochem Parasitol 2003; 128:21-32. [PMID: 12706793 DOI: 10.1016/s0166-6851(03)00039-2] [Citation(s) in RCA: 45] [Impact Index Per Article: 2.0] [Indexed: 11/16/2022]
Abstract
Protein segments that contain few of the possible 20 amino acids, sometimes in tandem repeat arrays, are referred to as containing "simple" or "low-complexity" sequence. Many Plasmodium falciparum proteins are longer than their homologs in other species by virtue of their content of such low-complexity segments that have no known function; these are interspersed among segments of higher complexity to which function can often be ascribed. If there is low complexity at the protein level, there is likely to be low complexity at the corresponding nucleic acid level (departure from equifrequency of the four bases). Thus, low complexity may have been selected primarily at the nucleic acid level and low complexity at the protein level may be secondary. In this case, the amino acid composition of low-complexity segments should be more reflective than that of high complexity segments on forces operating at the nucleic acid level, which include GC-pressure and AG-pressure. Consistent with this, for amino acid determining first and second codon positions, open reading frames containing low-complexity segments show increased contributions to downward GC-pressure (revealed as decreased percentage of G+C) and to upward AG-pressure (revealed as increased percentage A+G). When not countermanded by high contributions to AG-pressure, low-complexity segments can contribute to base order-dependent fold potential; in this respect, they resemble introns. Thus, in P. falciparum, low-complexity segments appear as adaptations primarily serving nucleic acid level functions.
Affiliation(s)
- H Y Xue
- Department of Biochemistry, Queen's University, Kingston, Ont, K7L3N6, Canada
66
Forsdyke DR, Madill CA, Smith SD. Immunity as a function of the unicellular state: implications of emerging genomic data. Trends Immunol 2002; 23:575-9. [PMID: 12464568 DOI: 10.1016/s1471-4906(02)02329-3] [Citation(s) in RCA: 12] [Impact Index Per Article: 0.5] [Indexed: 10/27/2022]
Abstract
Instead of being greeted as supporting the growing corpus of immunological theory, recent advances in the bioinformatic analysis of genomes have often surprised the discoverers and failed to attract the attention of immunologists. In fact, the view that multicellular immune systems are adaptations of already highly evolved unicellular immune systems that are capable of self/not-self discrimination can assist our comprehension of phenomena, such as 'junk' DNA, genetic polymorphism and the ubiquity of repetitive elements. For instance, the 'hidden transcriptome', revealed by run-on transcription of genes or repetitive elements, contains a diverse repertoire of RNA 'immune receptors' with the potential to form double-stranded RNA with viral RNA 'antigens', thus triggering intracellular alarms.
Affiliation(s)
- Donald R Forsdyke
- Dept of Biochemistry, Queen's University, Kingston, Ontario, Canada K7L 3N6.
67
Lynn DJ, Singer GAC, Hickey DA. Synonymous codon usage is subject to selection in thermophilic bacteria. Nucleic Acids Res 2002; 30:4272-7. [PMID: 12364606 PMCID: PMC140546 DOI: 10.1093/nar/gkf546] [Citation(s) in RCA: 148] [Impact Index Per Article: 6.4] [Indexed: 11/12/2022] Open
Abstract
The patterns of synonymous codon usage, both within and among genomes, have been extensively studied over the past two decades. Despite the accumulating evidence that natural selection can shape codon usage, it has not been possible to link a particular pattern of codon usage to a specific external selective force. Here, we have analyzed the patterns of synonymous codon usage in 40 completely sequenced prokaryotic genomes. By combining the genes from several genomes (more than 80 000 genes in all) into a single dataset for this analysis, we were able to investigate variations in codon usage, both within and between genomes. The results show that synonymous codon usage is affected by two major factors: (i) the overall G+C content of the genome and (ii) growth at high temperature. This study focused on the relationship between synonymous codon usage and the ability to grow at high temperature. We have been able to eliminate both phylogenetic history and lateral gene transfer as possible explanations for the characteristic pattern of codon usage among the thermophiles. Thus, these results demonstrate a clear link between a particular pattern of codon usage and an external selective force.
Affiliation(s)
- David J Lynn
- Department of Biology, University of Ottawa, 30 Marie Curie, Ottawa, ON K1N 6N5, Canada
|
68
|
Abstract
Rich and Ayala propose that the zero rate of non-amino-acid-changing (synonymous) mutations in some proteins of Plasmodium falciparum reflects a recent population bottleneck. Alternatively, Arnot and Saul propose sequence conservation in response to selective pressures other than the pressure to encode protein. Among these are fold pressure and purine-loading pressure. Genomes adapt to these by acquisition of introns and/or low-complexity (simple-sequence) segments in proteins. Adaptive explanations include facilitation of intragenic recombination (and hence diversification of the encoded protein) by DNA stem-loop secondary structures.
Affiliation(s)
- Donald R Forsdyke
- Dept of Biochemistry, Queen's University, Kingston, Ontario, Canada K7L3N6.
|
69
|
Abstract
The hypothesis that genomic regions rich in non-protein-coding RNAs (ncRNAs) can be identified using local variations in single-base and dinucleotide statistics has been investigated. (G+C)%, (G-C)% difference, (A-T)% difference and dinucleotide-frequency statistics were compared among seven classes of ncRNAs and three genomes. Significant variations were observed in (G+C)% and, in Methanococcus jannaschii, in the frequency of the dinucleotide 'CG'. Screening programs based on these two base-composition statistics were developed. With (G+C)% screening alone, a 1% fraction of the M.jannaschii genome containing all 44 known transfer RNAs, ribosomal RNAs and signal recognition particle RNAs could be identified. When (G+C)% combined with CG dinucleotide-frequency screening was used, 43 of the 44 known M.jannaschii structural ncRNAs were again identified, while the number of presumably false hits overlapping a known or putative protein-coding gene was reduced from 15 to 6. In addition, 19 candidate ncRNAs were identified including one with significant homology to several known archaeal RNaseP RNAs.
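The screening idea in this abstract, flagging genome windows by (G+C)% and by the frequency of the dinucleotide 'CG', can be sketched as follows. This is an illustration under stated assumptions, not the published screening program: the function names, window size, step and thresholds are all hypothetical placeholders.

```python
def window_stats(seq: str, window: int, step: int):
    """Yield (start, gc_percent, cg_frequency) for each sliding window.

    cg_frequency is the fraction of overlapping dinucleotide positions
    in the window that read 'CG'.
    """
    if window < 2:
        raise ValueError("window must cover at least one dinucleotide")
    seq = seq.upper()
    for start in range(0, len(seq) - window + 1, step):
        w = seq[start:start + window]
        gc_percent = 100.0 * sum(b in "GC" for b in w) / window
        cg_count = sum(1 for i in range(window - 1) if w[i:i + 2] == "CG")
        yield start, gc_percent, cg_count / (window - 1)


def screen(seq: str, window: int = 50, step: int = 10,
           min_gc: float = 50.0, min_cg: float = 0.0):
    """Return start positions of windows passing both (hypothetical) cutoffs."""
    return [
        start
        for start, gc, cg in window_stats(seq, window, step)
        if gc >= min_gc and cg >= min_cg
    ]


if __name__ == "__main__":
    toy = "ATATATAT" + "GCGCGCGC"
    # Only the second 8-base window is GC-rich and contains CG dinucleotides.
    print(screen(toy, window=8, step=8, min_gc=50.0, min_cg=0.1))
```

Setting `min_cg` to 0.0 reduces the sketch to (G+C)% screening alone; raising it adds the second filter, which the abstract reports as reducing presumed false hits.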
Affiliation(s)
- Peter Schattner
- Center for Biomolecular Science and Engineering, 227 Sinsheimer Laboratories, University of California, 1156 High Street, Santa Cruz, CA 95064, USA.
|
70
|
Abstract
A microbial pathogen species can adapt to its host species to the extent that members of the host species are uniform. Loss of this uniformity would make it difficult for a pathogen species to transfer, from one member of the host species to another, what it had "learned" through selection of its members with advantageous mutations. The existence of major histocompatibility complex (MHC) polymorphism indicates that non-uniformity within a species is an effective host defence strategy. By virtue of this molecular discontinuity among its members the host species can "present a moving target" to the pathogen. Many proteins other than MHC proteins show polymorphism - a phenomenon which has suggested that mutations in regions of protein molecules which do not affect overt function are neutral. However, in the context of the author's differential aggregation theory of intracellular self/not-self discrimination as previously applied to the problem of the antigenicity of cancer cells, such polymorphism should serve for the recruitment of subsets of self-antigens into the antigenic repertoire of an infected cell. These would act as "intracellular antibodies" by virtue of their weak, but specific, aggregation with pathogen proteins. Peptides from the self-antigens, as well as (or instead of) those from the antigens of the pathogen, would then serve as targets for attack by cytotoxic T cells. Thus, polymorphism of intracellular proteins should be of adaptive value, serving to amplify and individualize the immune response to intracellular pathogens.
Affiliation(s)
- D R Forsdyke
- Department of Biochemistry, Queen's University, Kingston, Ontario, K7L 3N6, Canada.
|
71
|
Cristillo AD, Mortimer JR, Barrette IH, Lillicrap TP, Forsdyke DR. Double-stranded RNA as a not-self alarm signal: to evade, most viruses purine-load their RNAs, but some (HTLV-1, Epstein-Barr) pyrimidine-load. J Theor Biol 2001; 208:475-91. [PMID: 11222051 DOI: 10.1006/jtbi.2000.2233] [Citation(s) in RCA: 37] [Impact Index Per Article: 1.5] [Indexed: 11/22/2022]
Abstract
For double-stranded RNA (dsRNA) to signal the presence of foreign (non-self) nucleic acid, self-RNA-self-RNA interactions should be minimized. Indeed, self-RNAs appear to have been fine-tuned over evolutionary time by the introduction of purines in clusters in the loop regions of stem-loop structures. This adaptation should militate against the "kissing" interactions which initiate formation of dsRNA. Our analyses of virus base compositions suggest that, to avoid triggering the host cell's dsRNA surveillance mechanism, most viruses purine-load their RNAs to resemble host RNAs ("stealth" strategy). However, some GC-rich latent viruses (HTLV-1, EBV) pyrimidine-load their RNAs. It is suggested that when virus production begins, these RNAs suddenly increase in concentration and impair host mRNA function by virtue of an excess of complementary "kissing" interactions ("surprise" strategy). Remarkably, the only mRNA expressed in the most fundamental form of EBV latency (the "EBNA-1 program") is purine-loaded. This apparent stealth strategy is reinforced by a simple sequence repeat which prefers purine-rich codons. During latent infection the EBNA-1 protein may evade recognition by cytotoxic T-cells, not by virtue of containing a simple sequence amino acid repeat as has been proposed, but by virtue of the encoding mRNA being purine-loaded to prevent interactions with host RNAs of either genic or non-genic origin.
Affiliation(s)
- A D Cristillo
- Department of Biochemistry, Queen's University, Kingston, Ontario, K7L3N6, Canada
|
72
|
Abstract
Of Chargaff's four rules on DNA base composition, only his first parity rule was incorporated into mainstream biology as the DNA double helix. Now, the cluster rule, the second parity rule, and the GC rule, reveal the multiple levels of information in our genomes and potential conflicts between them. In these terms we can understand how double-stranded RNA became an intracellular alarm signal, how potentially recombining nucleic acids can distinguish between 'self' and 'not-self' so leading to the origin of species, how isochores evolved to facilitate gene duplication, and how unlikely it is that any mutation can ever remain truly neutral.
Affiliation(s)
- D R Forsdyke
- Department of Biochemistry, Queen's University, Kingston, Ontario K7L3N6, Canada.
|
73
|
Lao PJ, Forsdyke DR. Crossover hot-spot instigator (Chi) sequences in Escherichia coli occupy distinct recombination/transcription islands. Gene 2000; 243:47-57. [PMID: 10675612 DOI: 10.1016/s0378-1119(99)00564-8] [Citation(s) in RCA: 15] [Impact Index Per Article: 0.6] [Indexed: 11/28/2022]
Abstract
Crossover hot-spot instigator (Chi) sequences (5'-GCTGGTGG-3') are orientation-dependent, strand-specific sequences implicated in RecA-mediated DNA recombination. In Escherichia coli and Haemophilus influenzae Chi and Chi-like sequences preferentially locate to approx. 1kb recombination 'islands' in the mRNA-synonymous strands of open reading frames (ORFs). Since mRNA-synonymous strands follow Szybalski's transcription direction rule in being G-rich, and the average ORF is about 1kb, then, on this basis alone, Chi sequences are seen to reside in 1kb G-rich 'islands'. However, RecA preferentially binds GT-rich sequences, suggesting that genomic context might potentiate Chi action. Consistent with this, we report for E. coli that 1kb sequence windows with Chi near their centres are a distinct subset of total 1kb windows, the mRNA-synonymous strands being preferentially enriched in both G and T. Chi function might be particularly important for bacteria that survive high temperature and radiation. These often exist in habitats where recombination with E. coli DNA would be unlikely, so canonical Chi sequences might not confer a selective disadvantage in this respect. In general, Chi sequences are not more frequent in thermophilic bacteria and Deinococcus radiodurans, than in E. coli and other mesophilic bacteria. Only two of five thermophilic bacteria examined showed preferential location of Chi sequences to mRNA-synonymous strands. In the thermophile Methanococcus jannaschii, windows containing the canonical Chi sequence do not form a distinct subset. We suggest that in thermophilic bacteria and D. radiodurans the Chi function may be achieved by sequences that differ from the canonical Chi sequence, or that the number of these sequences is sufficient, or that the Chi function is unnecessary.
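The window analysis described in this abstract can be illustrated with a short sketch that locates the canonical Chi sequence (5'-GCTGGTGG-3') on either strand and reports the G and T content of a window centred on each hit. The helper names are hypothetical. As a simplifying assumption, composition is read on the strand that carries Chi; the abstract instead refers to mRNA-synonymous strands, which would require gene annotation that this sketch does not use.

```python
CHI = "GCTGGTGG"
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq: str) -> str:
    """Reverse complement of an upper-case A/C/G/T string."""
    return seq.translate(_COMPLEMENT)[::-1]


def find_chi(seq: str):
    """Return sorted (start, strand) pairs for every Chi occurrence.

    Starts are 0-based on the given strand; '-' marks Chi on the
    complementary strand.
    """
    seq = seq.upper()
    hits = []
    for strand, motif in (("+", CHI), ("-", revcomp(CHI))):
        pos = seq.find(motif)
        while pos != -1:
            hits.append((pos, strand))
            pos = seq.find(motif, pos + 1)  # allow overlapping matches
    return sorted(hits)


def gt_content_around(seq: str, pos: int, strand: str, window: int = 1000):
    """Return (%G, %T) in a window centred on a Chi hit.

    The window is clipped at the sequence ends and read on the
    Chi-bearing strand (an assumption of this sketch).
    """
    seq = seq.upper()
    centre = pos + len(CHI) // 2
    lo = max(0, centre - window // 2)
    hi = min(len(seq), centre + window // 2)
    w = seq[lo:hi]
    if strand == "-":
        w = revcomp(w)
    return 100.0 * w.count("G") / len(w), 100.0 * w.count("T") / len(w)


if __name__ == "__main__":
    toy = "AAAA" + CHI + "AAAA" + revcomp(CHI) + "AAAA"
    for pos, strand in find_chi(toy):
        print(pos, strand, gt_content_around(toy, pos, strand, window=8))
```

Comparing these percentages for Chi-centred windows against all windows of the same size is the kind of contrast the abstract reports for E. coli; the default of 1000 bases mirrors the 1kb windows mentioned there.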
Affiliation(s)
- P J Lao
- Department of Biochemistry, Queen's University, Kingston, Canada
|