51
A computational prediction of isochores based on hidden Markov models. Gene 2006; 385:41-9. [PMID: 17020791 DOI: 10.1016/j.gene.2006.04.032] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.4] [Received: 11/30/2005] [Revised: 03/17/2006] [Accepted: 04/03/2006] [Indexed: 11/30/2022]
Abstract
Mammalian genomes are organised into a mosaic of regions (in general more than 300 kb in length), with differing, relatively homogeneous G+C contents. The G+C content is the basic characteristic of isochores, but they have also been associated with many other biological properties. For instance, the genes are more compact and their density is highest in G+C rich isochores. Various ways of locating isochores in the human genome have been developed, but such methods use only the base composition of the DNA sequences. The present paper proposes a new method, based on a hidden Markov model, which takes into account several of the biological properties associated with the isochore structure of a genome. This method leads to good segmentation of the human genome into isochores, and also permits a new analysis of the known heterogeneity of G+C rich isochores: most (60%) of the G+C poor genes embedded in G+C rich isochores have UTR sequences characteristic of G+C rich genes. This genomic feature is discussed in the context of both evolution and genome function.
52
Fortes GG, Bouza C, Martínez P, Sánchez L. Diversity in isochore structure among cold-blooded vertebrates based on GC content of coding and non-coding sequences. Genetica 2006; 129:281-9. [PMID: 16897446 DOI: 10.1007/s10709-006-0009-2] [Citation(s) in RCA: 17] [Impact Index Per Article: 0.9] [Received: 07/01/2005] [Accepted: 04/19/2006] [Indexed: 11/29/2022]
Abstract
To review the general view of the differing compositional structure of warm- and cold-blooded vertebrate genomes, we made use of the increasing number of genetic sequences, including coding (exon) and non-coding (intron) regions, that have been deposited in the databases in recent years. The nucleotide distributions of the third codon positions (GC3) were analyzed in 1510 coding sequences (CDS) of fish, 1414 CDS of amphibians and 320 CDS of reptiles. The relationship between the GC content of 74, 56 and 25 CDS of fish, amphibians and reptiles, respectively, and that of their corresponding introns (GCI) was also considered. In accordance with recent data, sequence analysis showed the presence of very GC3-rich CDS in these poikilothermic vertebrates. However, very high diversity in compositional patterns was found among different orders of fish, amphibians and reptiles. A significant positive correlation between GC3 and GCI was also confirmed for the genes analyzed. Nevertheless, introns turned out to be poorer in GC than their corresponding CDS, and this difference was larger than in the human genome. Because of the limited number of available sequences including both exons and introns, we must be cautious about the results derived from them. However, the indication that coding sequences are GC-richer than their corresponding introns could help explain the discrepancy between sequence analysis and the ultracentrifugation studies in cold-blooded vertebrates, which did not predict the existence of GC-rich isochores.
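As an illustration of the two quantities compared in this abstract, the following minimal sketch computes GC3 for a coding sequence and overall GC content for an intron. The function names `gc3` and `gc_content` are hypothetical and are not taken from the paper; the sketch assumes an in-frame CDS given as a plain string of A, C, G and T.

```python
def gc3(cds: str) -> float:
    """Fraction of G or C at third codon positions of an in-frame CDS."""
    seq = cds.upper()
    usable = len(seq) - len(seq) % 3  # ignore a trailing partial codon
    thirds = seq[2:usable:3]          # every third base, starting at codon position 3
    if not thirds:
        raise ValueError("sequence is shorter than one codon")
    return sum(base in "GC" for base in thirds) / len(thirds)


def gc_content(seq: str) -> float:
    """Fraction of G or C over a whole sequence, e.g. an intron (GCI)."""
    seq = seq.upper()
    if not seq:
        raise ValueError("empty sequence")
    return sum(base in "GC" for base in seq) / len(seq)
```

Applied to each gene's CDS and introns in turn, the paired values could then be correlated across genes; the toy inputs in any such run are the user's own, not data from the study.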
Affiliation(s)
- Gloria G Fortes
- Departamento de Genética, Facultad de Veterinaria, Universidad de Santiago de Compostela, Lugo, Spain
53
Cohanim AB, Trifonov EN, Kashi Y. Specific Selection Pressure at the Third Codon Positions: Contribution to 10- to 11-Base Periodicity in Prokaryotic Genomes. J Mol Evol 2006; 63:393-400. [PMID: 16897261 DOI: 10.1007/s00239-005-0258-1] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.2] [Received: 11/02/2005] [Accepted: 04/03/2006] [Indexed: 10/24/2022]
Abstract
Prokaryotic sequences are responsible for more than just protein coding. There are two 10- to 11-base periodical patterns superimposed on the protein coding message within the same sequence. Positional auto- and cross-correlation analysis of the sequences shows that these two patterns are a short-range counter-phase oscillation of AA and TT dinucleotides and a medium-range in-phase oscillation of the same dinucleotides, spanning distances of up to approximately 30 and approximately 100 bases, respectively. The short-range oscillation is encoded by the amino acid sequences themselves, apparently, due to the presence of amphipathic alpha-helices in the proteins. The medium-range oscillation, related to DNA folding in the cell, is created largely by a special choice of the bases in the third positions of the codons. Interestingly, the amino acid sequences do contribute to that signal as well. That is, the very amino acid sequences are, to some extent, degenerate to serve the same oscillating pattern that is associated with the degenerate third codon positions.
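The positional correlation analysis mentioned here can be approximated, under simplifying assumptions, by a histogram of distances between dinucleotide occurrences. The sketch below is not the authors' procedure; `positions` and `distance_histogram` are hypothetical helper names, and no normalization against a random expectation is attempted.

```python
from collections import Counter


def positions(seq: str, dinuc: str) -> list:
    """Start indices of every (possibly overlapping) occurrence of a dinucleotide."""
    return [i for i in range(len(seq) - 1) if seq[i:i + 2] == dinuc]


def distance_histogram(seq: str, first: str, second: str, max_dist: int) -> Counter:
    """Count pairs with `first` at i and `second` at i + d, for 1 <= d <= max_dist.

    first == second gives an auto-correlation profile (e.g. AA vs AA);
    first != second gives a cross-correlation profile (e.g. AA vs TT).
    """
    seq = seq.upper()
    second_pos = set(positions(seq, second))
    hist = Counter()
    for i in positions(seq, first):
        for d in range(1, max_dist + 1):
            if i + d in second_pos:
                hist[d] += 1
    return hist
```

A periodic signal of the kind described would show up as peaks in the histogram spaced roughly 10 to 11 bases apart; whether the AA and TT profiles peak at the same or at offset distances distinguishes in-phase from counter-phase oscillation.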
Affiliation(s)
- Amir B Cohanim
- Department of Biotechnology and Food Engineering, Technion, Haifa, 32000, Israel
54
Joy F, Basak S, Gupta SK, Das PJ, Ghosh SK, Ghosh TC. Compositional correlations in canine genome reflects similarity with human genes. BMB Rep 2006; 39:240-6. [PMID: 16756751 DOI: 10.5483/bmbrep.2006.39.3.240] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.4] [Indexed: 11/20/2022] Open
Abstract
The base compositional correlations that hold among various coding and noncoding regions of the canine genome have been analysed. The distribution of genes by GC(3) composition shows a wide range similar to that observed in human. However, the largest number of genes was observed in the range of 65-75% GC(3). The correlations between canine coding DNA sequences and the different noncoding regions (introns and flanking regions) are significant, and in many cases the degree of correlation is similar to that in the human genome. We found that these correlations are not limited to GC content alone but also hold at the level of the frequencies of individual bases. The present study suggests that canines belong to the predicted 'general mammalian pattern' of genome composition, along with human beings.
Affiliation(s)
- Faustin Joy
- Bioinformatics Centre, Bose Institute, Kolkata, India
55
Chain FJJ, Evans BJ. Multiple mechanisms promote the retained expression of gene duplicates in the tetraploid frog Xenopus laevis. PLoS Genet 2006; 2:e56. [PMID: 16683033 PMCID: PMC1449897 DOI: 10.1371/journal.pgen.0020056] [Citation(s) in RCA: 61] [Impact Index Per Article: 3.2] [Received: 11/10/2005] [Accepted: 02/28/2006] [Indexed: 01/19/2023] Open
Abstract
Gene duplication provides a window of opportunity for biological variants to persist under the protection of a co-expressed copy with similar or redundant function. Duplication catalyzes innovation (neofunctionalization), subfunction degeneration (subfunctionalization), and genetic buffering (redundancy), and the genetic survival of each paralog is triggered by mechanisms that add, compromise, or do not alter protein function. We tested the applicability of three types of mechanisms for promoting the retained expression of duplicated genes in 290 expressed paralogs of the tetraploid clawed frog, Xenopus laevis. Tests were based on explicit expectations concerning the ka/ks ratio, and the number and location of nonsynonymous substitutions after duplication. Functional constraints on the majority of paralogs are not significantly different from a singleton ortholog. However, we recover strong support that some of them have an asymmetric rate of nonsynonymous substitution: 6% match predictions of the neofunctionalization hypothesis in that (1) each paralog accumulated nonsynonymous substitutions at a significantly different rate and (2) the one that evolves faster has a higher ka/ks ratio than the other paralog and than a singleton ortholog. Fewer paralogs (3%) exhibit a complementary pattern of substitution at the protein level that is predicted by enhancement or degradation of different functional domains, and the remaining 13% have a higher average ka/ks ratio in both paralogs that is consistent with altered functional constraints, diversifying selection, or activity-reducing mutations after duplication. We estimate that these paralogs have been retained since they originated by genome duplication between 21 and 41 million years ago. Multiple mechanisms operate to promote the retained expression of duplicates in the same genome, in genes in the same functional class, over the same period of time following duplication, and sometimes in the same pair of paralogs. 
None of these paralogs are superfluous; degradation or enhancement of different protein subfunctions and neofunctionalization are plausible hypotheses for the retained expression of some of them. Evolution of most X. laevis paralogs, however, is consistent with retained expression via mechanisms that do not radically alter functional constraints, such as selection to preserve post-duplication stoichiometry or temporal, quantitative, or spatial subfunctionalization. Gene duplication plays a fundamental role in biological innovation but it is not clear how both copies of a duplicated gene manage to circumvent degradation by mutation if neither is unique. This study explores genetic mechanisms that could make each copy of a duplicate gene different, and therefore distinguishable and potentially preserved by natural selection. It is based on DNA sequences of the protein-coding region of 290 expressed duplicated genes in a frog, Xenopus laevis, that underwent complete duplication of its entire genome. Results provide evidence for multiple mechanisms acting within the same genome, within the same functional classes of genes, within the same period of time following duplication, and even on the same set of duplicated genes. Each copy of a duplicate gene may be subject to distinct evolutionary constraints, and this could be associated with degradation or enhancement of function. Functional constraints of most of these duplicates, however, are not substantially different from a single copy gene; their persistence in the first dozens of millions of years after duplication may more frequently be explained by mechanisms acting on their expression rather than their function.
Affiliation(s)
- Frédéric J. J Chain
- Center for Environmental Genomics, Department of Biology, McMaster University, Hamilton, Ontario, Canada
- Ben J Evans
- Center for Environmental Genomics, Department of Biology, McMaster University, Hamilton, Ontario, Canada
- * To whom correspondence should be addressed. E-mail:
56
Abstract
Extensive DNA sequence analysis of three eukaryotes, S. cerevisiae, C. elegans, and D. melanogaster, reveals two different AA/TT periodical patterns associated with the nucleosome positioning. The first pattern is the counter-phase oscillation of AA and TT dinucleotides, which has been frequently considered as the nucleosome DNA pattern. This represents the sequence rule I for chromatin structure. The second pattern is the in-phase oscillation of the AA and TT dinucleotides with the same nucleosome DNA period, 10.4 bases. This pattern apparently corresponds to curved DNA, that also participates in the nucleosome formation, and represents the sequence rule II for chromatin. The positional correlations of AA and TT dinucleotides also indicate that the nucleosomes are separated by specific linker sizes (preferably 8, 18, ... bases), dictated by the steric exclusion rules. Thus, the sequence positions of the neighboring nucleosomes are correlated, and this represents the sequence rule III.
Affiliation(s)
- Amir B Cohanim
- Department of Biotechnology and Food Engineering, Technion, Haifa 32000, Israel
57
Mitreva M, Wendl MC, Martin J, Wylie T, Yin Y, Larson A, Parkinson J, Waterston RH, McCarter JP. Codon usage patterns in Nematoda: analysis based on over 25 million codons in thirty-two species. Genome Biol 2006; 7:R75. [PMID: 26271136 PMCID: PMC1779591 DOI: 10.1186/gb-2006-7-8-r75] [Citation(s) in RCA: 47] [Impact Index Per Article: 2.5] [Received: 04/20/2006] [Revised: 06/30/2006] [Accepted: 08/14/2006] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND Codon usage has direct utility in molecular characterization of species and is also a marker for molecular evolution. To understand codon usage within the diverse phylum Nematoda, we analyzed a total of 265,494 expressed sequence tags (ESTs) from 30 nematode species. The full genomes of Caenorhabditis elegans and C. briggsae were also examined. A total of 25,871,325 codons were analyzed and a comprehensive codon usage table for all species was generated. This is the first codon usage table available for 24 of these organisms. RESULTS Codon usage similarity in Nematoda usually persists over the breadth of a genus but then rapidly diminishes even within each clade. Globodera, Meloidogyne, Pristionchus, and Strongyloides have the most highly derived patterns of codon usage. The major factor affecting differences in codon usage between species is the coding sequence GC content, which varies in nematodes from 32% to 51%. Coding GC content (measured as GC3) also explains much of the observed variation in the effective number of codons (R = 0.70), which is a measure of codon bias, and it even accounts for differences in amino acid frequency. Codon usage is also affected by neighboring nucleotides (N1 context). Coding GC content correlates strongly with estimated noncoding genomic GC content (R = 0.92). On examining abundant clusters in five species, candidate optimal codons were identified that may be preferred in highly expressed transcripts. CONCLUSION Evolutionary models indicate that total genomic GC content, probably the product of directional mutation pressure, drives codon usage rather than the converse, a conclusion that is supported by examination of nematode genomes.
Affiliation(s)
- Makedonka Mitreva
- Genome Sequencing Center, Washington University School of Medicine, St Louis, Missouri 63108, USA
- Michael C Wendl
- Genome Sequencing Center, Washington University School of Medicine, St Louis, Missouri 63108, USA
- John Martin
- Genome Sequencing Center, Washington University School of Medicine, St Louis, Missouri 63108, USA
- Todd Wylie
- Genome Sequencing Center, Washington University School of Medicine, St Louis, Missouri 63108, USA
- Yong Yin
- Genome Sequencing Center, Washington University School of Medicine, St Louis, Missouri 63108, USA
- Allan Larson
- Department of Biology, Washington University, St. Louis, Missouri 63130, USA
- John Parkinson
- Hospital for Sick Children, Toronto, and Departments of Biochemistry/Medical Genetics and Microbiology, University of Toronto, M5G 1X8, Canada
- Robert H Waterston
- Department of Genome Sciences, University of Washington, Seattle, Washington 98195, USA
- James P McCarter
- Genome Sequencing Center, Washington University School of Medicine, St Louis, Missouri 63108, USA
- Divergence Inc., St Louis, Missouri 63141, USA
58
Banerjee T, Gupta SK, Ghosh TC. Role of mutational bias and natural selection on genome-wide nucleotide bias in prokaryotic organisms. Biosystems 2005; 81:11-8. [PMID: 15917123 DOI: 10.1016/j.biosystems.2005.01.002] [Citation(s) in RCA: 20] [Impact Index Per Article: 1.0] [Received: 10/05/2004] [Revised: 01/08/2005] [Accepted: 01/12/2005] [Indexed: 11/24/2022]
Abstract
Correlations between genomic GC contents and amino acid frequencies were studied in the homologous sequences of 12 eubacterial genomes. Results show that the frequencies of amino acids encoded by GC-rich codons increase significantly with genomic GC content, whereas the opposite trend was observed for amino acids encoded by GC-poor codons. Further studies show that not all amino acids change in the direction predicted by genomic GC pressure, suggesting that protein evolution is not entirely dictated by nucleotide frequencies. An amino acid substitution matrix calculated among hydrophobic, amphipathic and hydrophilic amino acid groups shows that amphipathic and hydrophilic amino acids are substituted by hydrophobic amino acids more frequently than hydrophobic amino acids are substituted by hydrophilic or amphipathic ones. This indicates that nucleotide bias induces directional changes in proteome composition that strongly alter hydropathy values. In fact, significant increases in hydrophobicity values were also observed with increasing genomic GC content. Correlations between GC content and amino acid composition in three different predicted protein secondary structures show that hydropathy values increase significantly with GC content in aperiodic and helix structures, whereas strand structure remains insensitive to genomic GC levels. The relative importance of mutation and selection in the evolution of proteins is discussed on the basis of these results.
Affiliation(s)
- T Banerjee
- Bioinformatics Centre, Bose Institute, P 1/12, C.I.T. Scheme VII M, Kolkata 700 054, India
59
Raghava GPS, Han JH. Correlation and prediction of gene expression level from amino acid and dipeptide composition of its protein. BMC Bioinformatics 2005; 6:59. [PMID: 15773999 PMCID: PMC1083413 DOI: 10.1186/1471-2105-6-59] [Citation(s) in RCA: 41] [Impact Index Per Article: 2.1] [Received: 08/14/2004] [Accepted: 03/17/2005] [Indexed: 11/29/2022] Open
Abstract
Background A large number of papers have been published on the analysis of microarray data, with particular emphasis on normalization of data, detection of differentially expressed genes, clustering of genes and regulatory networks. On the other hand, there are only a few studies on the relation between expression level and the composition of nucleotide/protein sequences using expression data. There is a need to understand why particular genes/proteins are expressed more in particular conditions. In this study, we analyze 3468 genes of Saccharomyces cerevisiae obtained from Holstege et al. (1998) to understand the relationship between expression level and amino acid composition. Results We compute the correlation between the expression of a gene and the amino acid composition of its protein. It was observed that some residues (like Ala, Gly, Arg and Val) have a significant positive correlation (r > 0.20) and some other residues (like Asp, Leu, Asn and Ser) have a negative correlation (r < -0.15) with the expression of genes. A significant negative correlation (r = -0.18) was also found between length and gene expression. These observations indicate a relationship between percent composition and gene expression level. Thus, attempts have been made to develop a Support Vector Machine (SVM) based method for predicting the expression level of a gene from its protein sequence. In this method the SVM is trained with proteins whose gene expression data are known in a given condition. The trained SVM is then used to predict the gene expression of other proteins of the same organism in the same condition. A correlation coefficient of r = 0.70 was obtained between predicted and experimentally determined expression of genes, which improved to r = 0.72 when dipeptide composition was used instead of residue composition. The method was evaluated using a 5-fold cross-validation test.
We also demonstrate that amino acid composition information along with gene expression data can be used for improving the functional classification of proteins. Conclusion There is a correlation between gene expression and amino acid composition that can be used to predict the expression level of genes to a certain extent. A web server based on the above strategy has been developed for calculating the correlation between amino acid composition and gene expression and for predicting expression level. This server will allow users to study the evolution from expression data.
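The per-residue correlations reported in this abstract are of the following general form. This is a minimal sketch, assuming a Pearson coefficient between the percent composition of one residue and an expression value per gene; the helper names `percent_composition` and `pearson` are hypothetical, and the SVM prediction step is not reproduced here.

```python
import math


def percent_composition(protein: str, residue: str) -> float:
    """Percentage of positions in a protein occupied by one residue (one-letter code)."""
    if not protein:
        raise ValueError("empty protein sequence")
    return 100.0 * protein.upper().count(residue.upper()) / len(protein)


def pearson(xs, ys) -> float:
    """Pearson correlation coefficient between two equal-length numeric sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sd_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    sd_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (sd_x * sd_y)
```

For a given residue, one would build a list of `percent_composition` values over all proteins and pass it to `pearson` together with the matching expression levels; a coefficient above 0.20 or below -0.15 would correspond to the thresholds quoted in the abstract.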
Affiliation(s)
- Gajendra PS Raghava
- Department of Computer Science and Engineering, Pohang University of Science and Technology, San 31 Hyo-Ja Dong, Pohang 790–784, Republic of Korea
- Bioinformatics Centre, Institute of Microbial Technology, Sector 39A, Chandigarh-160036, India
- Joon H Han
- Department of Computer Science and Engineering, Pohang University of Science and Technology, San 31 Hyo-Ja Dong, Pohang 790–784, Republic of Korea
60
D'Onofrio G, Ghosh TC. The compositional transition of vertebrate genomes: an analysis of the secondary structure of the proteins encoded by human genes. Gene 2005; 345:27-33. [PMID: 15716110 DOI: 10.1016/j.gene.2004.11.037] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.3] [Received: 10/21/2004] [Revised: 11/12/2004] [Accepted: 11/23/2004] [Indexed: 11/25/2022]
Abstract
Fluctuations and increments of both C(3) and G(3) levels along the human coding sequences were investigated by comparing two sets of Xenopus/human orthologous genes. The first set of genes shows minor differences in GC(3) levels; the second shows considerable increments of the GC(3) levels in the human genes. In both data sets, the fluctuations of C(3) and G(3) levels along the coding sequences correlated with the secondary structures of the encoded proteins. The human genes that underwent the compositional transition showed a different increment of the C(3) and G(3) levels within and among the structural units of the proteins. The relative synonymous codon usage (RSCU) of several amino acids was also affected during the compositional transition, showing that there is a correlation between RSCU and protein secondary structures in human genes. The importance of natural selection for the formation of the isochore organization of the human genome is discussed on the basis of these results.
Affiliation(s)
- Giuseppe D'Onofrio
- Laboratorio di Evoluzione Molecolare, Stazione Zoologica A. Dohrn, 80121 Napoli, Italy.
61
Abstract
The existence of a well conserved linear relationship between GC levels of genes' second and third codon positions (GC2, GC3) prompted us to focus on the landscape, or joint distribution, spanned by these two variables. In human, well curated coding sequences now cover at least 15%-30% of the estimated total gene set. Our analysis of the landscape defined by this gene set revealed not only the well documented linear crest, but also the presence of several peaks and valleys along that crest, a property that was also indicated in two other warm-blooded vertebrates represented by large gene databases, that is, mouse and chicken. GC2 is the sum of eight amino acid frequencies, whereas GC3 is linearly related to the GC level of the chromosomal region containing the gene. The landscapes therefore portray relations between proteins and the DNA environments of the genes that encode them.
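The statement that GC2 is the sum of eight amino acid frequencies can be checked directly against the standard genetic code. The sketch below assumes the standard code (NCBI translation table 1) and uses hypothetical names; it lists the amino acids whose codons carry G or C at the second position and confirms that no amino acid is split between GC and AT second positions (stop codons aside).

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code in TCAG order (first base varies slowest, third fastest).
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def gc2_amino_acids():
    """Amino acids encoded by codons with G or C at the second position."""
    return sorted(
        {aa for codon, aa in CODON_TABLE.items() if codon[1] in "GC" and aa != "*"}
    )


def mixed_amino_acids():
    """Amino acids that have both GC and AT second-position codons (stops excluded)."""
    gc = {aa for codon, aa in CODON_TABLE.items() if codon[1] in "GC"}
    at = {aa for codon, aa in CODON_TABLE.items() if codon[1] in "AT"}
    return sorted((gc & at) - {"*"})
```

`gc2_amino_acids()` yields Ala, Cys, Gly, Pro, Arg, Ser, Thr and Trp in one-letter code, and `mixed_amino_acids()` is empty, so over sense codons GC2 is determined entirely by the combined frequency of those eight residues.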
Affiliation(s)
- Stéphane Cruveiller
- Laboratorio di Evoluzione Molecolare, Stazione Zoologica Anton Dohrn, 80121 Napoli, Italy
62
Wan XF, Xu D, Kleinhofs A, Zhou J. Quantitative relationship between synonymous codon usage bias and GC composition across unicellular genomes. BMC Evol Biol 2004; 4:19. [PMID: 15222899 PMCID: PMC476735 DOI: 10.1186/1471-2148-4-19] [Citation(s) in RCA: 96] [Impact Index Per Article: 4.6] [Received: 01/28/2004] [Accepted: 06/28/2004] [Indexed: 11/25/2022] Open
Abstract
Background Codon usage bias has been widely reported to correlate with GC composition. However, the quantitative relationship between codon usage bias and GC composition across species has not been reported. Results Based on an informatics method (SCUO) we developed previously using Shannon informational theory and maximum entropy theory, we investigated the quantitative relationship between codon usage bias and GC composition. The regression based on 70 bacterial and 16 archaeal genomes showed that in bacteria, $\mathrm{SCUO} = -2.06\,\mathrm{GC3} + 2.05\,(\mathrm{GC3})^2 + 0.65$, r = 0.91, and that in archaea, $\mathrm{SCUO} = -1.79\,\mathrm{GC3} + 1.85\,(\mathrm{GC3})^2 + 0.56$, r = 0.89. We developed an analytical model to quantify synonymous codon usage bias by GC compositions based on SCUO. The parameters within this model were inferred by inspecting the relationship between codon usage bias and GC composition across 70 bacterial and 16 archaeal genomes. We further simplified this relationship using only GC3. This simple model was supported by computational simulation. Conclusions The synonymous codon usage bias could be simply expressed as $1 + (p/2)\log_2(p/2) + ((1-p)/2)\log_2((1-p)/2)$, where p = GC3. The software we developed for measuring SCUO (codonO) is available at .
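The two closed-form relationships quoted in this abstract can be evaluated side by side. The following is a minimal sketch: the coefficients and the simplified expression are taken from the abstract, while the function names and the handling of the endpoints p = 0 and p = 1 (using the limit x log2 x → 0) are assumptions made here for illustration.

```python
import math


def _xlog2x(x: float) -> float:
    """x * log2(x), with the limiting value 0 at x = 0."""
    return 0.0 if x == 0 else x * math.log2(x)


def scuo_simplified(gc3: float) -> float:
    """Simplified expression: 1 + (p/2)log2(p/2) + ((1-p)/2)log2((1-p)/2), p = GC3."""
    if not 0.0 <= gc3 <= 1.0:
        raise ValueError("GC3 must be a fraction between 0 and 1")
    return 1.0 + _xlog2x(gc3 / 2) + _xlog2x((1 - gc3) / 2)


# Quadratic regression coefficients (linear, quadratic, intercept) from the abstract.
REGRESSION = {
    "bacteria": (-2.06, 2.05, 0.65),
    "archaea": (-1.79, 1.85, 0.56),
}


def scuo_regression(gc3: float, domain: str = "bacteria") -> float:
    """Fitted SCUO as a quadratic function of GC3 for the given domain."""
    linear, quadratic, intercept = REGRESSION[domain]
    return linear * gc3 + quadratic * gc3 ** 2 + intercept
```

The simplified expression is symmetric about GC3 = 0.5, where it equals 0, and rises to 0.5 at either extreme, which matches the U-shaped trend implied by the fitted quadratics.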
Affiliation(s)
- Xiu-Feng Wan
- Environmental Sciences Division, Oak Ridge National Laboratory, Oak Ridge, TN 37831, USA
- Digital Biology Laboratory, Department of Computer Science, University of Missouri, Columbia, MO 65211, USA
- Dong Xu
- Digital Biology Laboratory, Department of Computer Science, University of Missouri, Columbia, MO 65211, USA
- Andris Kleinhofs
- Department of Genetics and Cell Biology, Washington State University, Pullman, WA 99164, USA
- Jizhong Zhou
- Environmental Sciences Division, Oak Ridge National Laboratory, Oak Ridge, TN 37831, USA
63
Bharanidharan D, Bhargavi GR, Uthanumallian K, Gautham N. Correlations between nucleotide frequencies and amino acid composition in 115 bacterial species. Biochem Biophys Res Commun 2004; 315:1097-103. [PMID: 14985126 DOI: 10.1016/j.bbrc.2004.01.129] [Citation(s) in RCA: 22] [Impact Index Per Article: 1.0] [Received: 01/24/2004] [Indexed: 11/21/2022]
Abstract
We studied the correlations between amino acid composition and mononucleotide and dinucleotide frequencies in 115 bacterial genomes of varying G+C content. Observed amino acid frequencies were compared with those expected from the actual mononucleotide and dinucleotide frequencies. Both mononucleotide and dinucleotide frequencies correlate well with the amino acid frequency, with dinucleotide frequencies doing so better. Despite the strong correlations, some of the observed amino acid frequencies, in particular for Arg, Val, Asp, Glu, Ser, and Cys, were consistently different from predicted values in all genomes. We suggest that this variation from predicted values is a consequence of selection pressure at the level of amino acids, while the close correspondence to the predictions in residues such as Thr, Phe, Lys, and Asn arises only from mutation and selection pressure at the level of the nucleic acid sequences.
Affiliation(s)
- D Bharanidharan
- Department of Crystallography and Biophysics, University of Madras, Guindy Campus, Chennai 600 025, India
64
Abstract
We analyzed the codon usage bias of eight open reading frames (ORFs) across up to 79 human papillomavirus (HPV) genotypes from three distinct phylogenetic groups. All eight ORFs across HPV genotypes show a strong codon usage bias, amongst degenerately encoded amino acids, toward 18 codons, mainly with T at the 3rd position. For all 18 degenerately encoded amino acids, codon preferences amongst human and animal PV ORFs are significantly different from those averaged across mammalian genes. Across the HPV types, the L2 ORFs show the highest codon usage bias (73.2 ± 1.6%) and the E4 ORFs the lowest (51.1 ± 0.5%), reflecting a similar bias in codon 3rd position A+T content (L2: 76.1 ± 4.2%; E4: 58.6 ± 4.5%). The E4 ORF, uniquely amongst the HPV ORFs, is G+C rich, while the other ORFs are A+T rich. Codon usage bias correlates positively with A+T content at the codon 3rd position in the E2, E6, L1 and L2 ORFs, but negatively in the E4 ORFs. A general conservation of preferred codon usage across human and non-human PV genotypes, whether or not they originate from the same supergroup, together with the observed difference between the preferred codon usage for HPV ORFs and for genes of the cells they infect, suggests that specific codon usage bias and A+T content variation may somehow increase the replicational fitness of HPVs in mammalian epithelial cells, and may have practical implications for gene therapy of HPV infection.
Affiliation(s)
- Kong-Nan Zhao
- Centre for Immunology and Cancer Research, Princess Alexandra Hospital, University of Queensland, Qld 4102, Woolloongabba, Australia.
65
Abstract
It is well known that the gene distribution is non-uniform in the human genome, reaching the highest concentration in the GC-rich isochores. The amino acid frequencies and the hydrophobicity of the encoded proteins are also affected by the high GC level of the genes located in the GC-rich isochores. It was hypothesized that the gene expression level is likewise higher in GC-rich than in GC-poor isochores [Mol. Biol. Evol. 10 (1993) 186]. Several features of human genes and proteins, namely expression level, coding and non-coding lengths, and hydrophobicity, were investigated in the present paper. The results support the hypothesis reported above, since all the parameters studied so far converge on the same conclusion: the average expression level of the GC-rich genes is significantly higher than that of the GC-poor genes.
Affiliation(s)
- Stilianos Arhondakis
- Laboratorio di Evoluzione Molecolare, Stazione Zoologica Anton Dohrn, Villa Comunale, 80121 Naples, Italy
66
Chen SL, Lee W, Hottes AK, Shapiro L, McAdams HH. Codon usage between genomes is constrained by genome-wide mutational processes. Proc Natl Acad Sci U S A 2004; 101:3480-5. [PMID: 14990797 PMCID: PMC373487 DOI: 10.1073/pnas.0307827100] [Citation(s) in RCA: 230] [Impact Index Per Article: 11.0] [Indexed: 11/18/2022] Open
Abstract
Analysis of genome-wide codon bias shows that only two parameters effectively differentiate the genome-wide codon bias of 100 eubacterial and archaeal organisms. The first parameter correlates with genome GC content, and the second parameter correlates with context-dependent nucleotide bias. Both of these parameters may be calculated from intergenic sequences. Therefore, genome-wide codon bias in eubacteria and archaea may be predicted from intergenic sequences that are not translated. When these two parameters are calculated for genes from nonmammalian eukaryotic organisms, genes from the same organism again have similar values, and genome-wide codon bias may also be predicted from intergenic sequences. In mammals, genes from the same organism are similar only in the second parameter, because GC content varies widely among isochores. Our results suggest that, in general, genome-wide codon bias is determined primarily by mutational processes that act throughout the genome, and only secondarily by selective forces acting on translated sequences.
Affiliation(s)
- Swaine L Chen
- Department of Developmental Biology, Stanford University School of Medicine, Beckman Center, B300, Stanford, CA 94304, USA.
|
67
|
Abstract
A positive correlation holds between the GC level of third codon positions of human genes (GC(3)) and hydropathy of the encoded proteins. This correlation may appear counterintuitive, since it links a physical property of proteins to the base composition of 'synonymous' sites. We here establish the nontriviality of the correlation, which has recently been contested. In particular, the correlation cannot simply be a consequence of an analogous correlation for first and second codon positions, since no such correlation exists. More generally, for any explanation via two chained correlations, the intermediate property would need to be strongly correlated with hydrophobicity and/or GC(3).
Affiliation(s)
- Kamel Jabbari
- Laboratoire de Génétique Moléculaire, Institut Jacques Monod, 2 Place Jussieu, 75005 Paris, France
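GC(3), as used in the abstract above, is the G+C fraction at third codon positions. As a minimal sketch of that quantity (the function name `gc3` and the choice to ignore any trailing partial codon are illustrative assumptions, not the authors' procedure), it could be computed as:

```python
def gc3(cds: str) -> float:
    """Fraction of G or C at third codon positions of a coding sequence.

    Illustrative helper: assumes `cds` starts in frame and silently
    ignores a trailing partial codon.
    """
    seq = cds.upper()
    usable = len(seq) - len(seq) % 3      # drop any incomplete final codon
    third = seq[2:usable:3]               # every third base, starting at index 2
    if not third:
        raise ValueError("sequence is shorter than one codon")
    return sum(base in "GC" for base in third) / len(third)


if __name__ == "__main__":
    # Codons ATG, GCC, AAA have third bases G, C, A.
    print(gc3("ATGGCCAAA"))
```

Published analyses often use variants such as GC3s, which excludes codons without synonymous alternatives; this sketch does not make that correction.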
|
68
|
Chattopadhyay S, Chakrabarti J. Temporal changes in phosphoglycerate kinase coding sequences: a quantitative measure. J Comput Biol 2003; 10:83-93. [PMID: 12676052 DOI: 10.1089/106652703763255688] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.1] [Indexed: 11/12/2022]
Abstract
The ratio of the average of the square of the number of the nucleotides to that of the random sequence of the same strand bias is proposed as a quantitative measure of evolution in some coding DNA sequences. Applying this measure to the phosphoglycerate kinase gene we observe a monotonic rise of the ratio with evolution. We present an interpretation of this data on some bacteria.
Affiliation(s)
- Sujay Chattopadhyay
- Department of Theoretical Physics, Indian Association for the Cultivation of Science, Calcutta 700 032,
|
69
|
Herbeck JT, Wall DP, Wernegreen JJ. Gene expression level influences amino acid usage, but not codon usage, in the tsetse fly endosymbiont Wigglesworthia. Microbiology (Reading, England) 2003; 149:2585-2596. [PMID: 12949182 DOI: 10.1099/mic.0.26381-0] [Citation(s) in RCA: 44] [Impact Index Per Article: 2.0] [Indexed: 11/18/2022]
Abstract
Wigglesworthia glossinidia brevipalpis, the obligate bacterial endosymbiont of the tsetse fly Glossina brevipalpis, is characterized by extreme genome reduction and AT nucleotide composition bias. Here, multivariate statistical analyses are used to test the hypothesis that mutational bias and genetic drift shape synonymous codon usage and amino acid usage of Wigglesworthia. The results show that synonymous codon usage patterns vary little across the genome and do not distinguish genes of putative high and low expression levels, thus indicating a lack of translational selection. Extreme AT composition bias across the genome also drives relative amino acid usage, but predicted high-expression genes (ribosomal proteins and chaperonins) use GC-rich amino acids more frequently than do low-expression genes. The levels and configuration of amino acid differences between Wigglesworthia and Escherichia coli were compared to test the hypothesis that the relatively GC-rich amino acid profiles of high-expression genes reflect greater amino acid conservation at these loci. This hypothesis is supported by reduced levels of protein divergence at predicted high-expression Wigglesworthia genes and similar configurations of amino acid changes across expression categories. Combined, the results suggest that codon and amino acid usage in the Wigglesworthia genome reflect a strong AT mutational bias and elevated levels of genetic drift, consistent with expected effects of an endosymbiotic lifestyle and repeated population bottlenecks. However, these impacts of mutation and drift are apparently attenuated by selection on amino acid composition at high-expression genes.
Affiliation(s)
- Joshua T Herbeck
- Josephine Bay Paul Center for Comparative Molecular Biology and Evolution, Marine Biological Laboratory, 7 MBL Street, Woods Hole, MA 02543, USA
- Dennis P Wall
- Department of Biological Sciences, Stanford University, Stanford, CA 94305, USA
- Jennifer J Wernegreen
- Josephine Bay Paul Center for Comparative Molecular Biology and Evolution, Marine Biological Laboratory, 7 MBL Street, Woods Hole, MA 02543, USA
|
70
|
Abstract
Synonymous codon usage bias is determined by a combination of mutational biases, selection at the level of translation, and genetic drift. In a study of mtDNA in insects, we analyzed patterns of codon usage across a phylogeny of 88 insect species spanning 12 orders. We employed a likelihood-based method for estimating levels of codon bias and determining major codon preference that removes the possible effects of genome nucleotide composition bias. Three questions are addressed: (1) How variable are codon bias levels across the phylogeny? (2) How variable are major codon preferences? and (3) Are there phylogenetic constraints on codon bias or preference? There is high variation in the level of codon bias values among the 88 taxa, but few readily apparent phylogenetic patterns. Bias level shifts within the lepidopteran genus Papilio are most likely a result of population size effects. Shifts in major codon preference occur across the tree in all of the amino acids in which there was bias of some level. The vast majority of changes involves double-preference models, however, and shifts between single preferred codons within orders occur only 11 times. These shifts among codons in double-preference models are phylogenetically conservative.
Affiliation(s)
- Joshua T Herbeck
- Division of Insect Biology, University of California, Berkeley, Berkeley, CA 94720, USA.
|
71
|
Som A, Sahoo S, Mukhopadhyay I, Chakrabarti J, Chaudhury R. Scaling violations in coding DNA. Europhysics Letters (EPL) 2003; 62:271-277. [DOI: 10.1209/epl/i2003-00341-6] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.1] [Indexed: 07/19/2023]
|
72
|
Duret L, Semon M, Piganeau G, Mouchiroud D, Galtier N. Vanishing GC-rich isochores in mammalian genomes. Genetics 2002; 162:1837-47. [PMID: 12524353 PMCID: PMC1462357 DOI: 10.1093/genetics/162.4.1837] [Citation(s) in RCA: 123] [Impact Index Per Article: 5.3] [Indexed: 11/14/2022]
Abstract
To understand the origin and evolution of isochores (the peculiar spatial distribution of GC content within mammalian genomes), we analyzed the synonymous substitution pattern in coding sequences from closely related species in different mammalian orders. In primates and cetartiodactyls, GC-rich genes are undergoing a large excess of GC --> AT substitutions over AT --> GC substitutions: GC-rich isochores are slowly disappearing from the genome of these two mammalian orders. In rodents, our analyses suggest both a decrease in GC content of GC-rich isochores and an increase in GC-poor isochores, but more data will be necessary to assess the significance of this pattern. These observations question the conclusions of previous works that assumed that base composition was at equilibrium. Analysis of allele frequency in human polymorphism data, however, confirmed that in the GC-rich parts of the genome, GC alleles have a higher probability of fixation than AT alleles. This fixation bias appears not strong enough to overcome the large excess of GC --> AT mutations. Thus, whatever the evolutionary force (neutral or selective) at the origin of GC-rich isochores, this force is no longer effective in mammals. We propose a model based on the biased gene conversion hypothesis that accounts for the origin of GC-rich isochores in the ancestral amniote genome and for their decline in present-day mammals.
Affiliation(s)
- Laurent Duret
- Laboratoire de Biométrie et Biologie Evolutive, UMR CNRS 5558 Université Claude Bernard Lyon 1, 69622 Villeurbanne Cedex, France.
|
73
|
D'Onofrio G, Ghosh TC, Bernardi G. The base composition of the genes is correlated with the secondary structures of the encoded proteins. Gene 2002; 300:179-87. [PMID: 12468099 DOI: 10.1016/s0378-1119(02)01045-4] [Citation(s) in RCA: 37] [Impact Index Per Article: 1.6] [Indexed: 10/27/2022]
Abstract
The analysis of a non-redundant set of human proteins, for which both the crystallographic structures and the corresponding gene sequences are available, shows that bases at third codon positions are non-uniformly distributed along the coding sequences. Significant compositional differences are found by comparing the gene regions corresponding to the different secondary structures of the proteins. Inter- and intra-structure differences were most pronounced in the GC-richest genes. These results are not compatible with any proposed hypotheses based on a neutral process of formation/maintenance of the high GC(3) levels of the genes localized in the GC-richest isochores of the human genome.
Affiliation(s)
- Giuseppe D'Onofrio
- Laboratorio di Evoluzione Molecolare, Stazione Zoologica A. Dohrn, Naples, Italy.
|
74
|
Abstract
Genes are non-uniformly distributed in the human genome, reaching the highest concentration in GC-rich isochores. This is one of the fundamental aspects of the human genome organization (Gene 241/259 (2000a,b) 3/31, for a review). In the present paper the gene distribution was analyzed in relationship to the gene expression pattern and levels. In this study evidence is produced showing: (i) that a biased gene distribution towards GC-rich isochores applies to both tissue-specific and housekeeping genes; and (ii) that genes localized in GC-rich isochores have high transcriptional levels. Since gene density and transcriptional levels are correlated with each other and both are correlated with the GC content of the isochores, the biased gene distribution in the human genome presumably is the result of selection at the gene expression levels.
Affiliation(s)
- Giuseppe D'Onofrio
- Laboratorio di Evoluzione Molecolare, Stazione Zoologica Anton Dohrn, Naples, Italy.
|
75
|
Birdsell JA. Integrating genomics, bioinformatics, and classical genetics to study the effects of recombination on genome evolution. Mol Biol Evol 2002; 19:1181-97. [PMID: 12082137 DOI: 10.1093/oxfordjournals.molbev.a004176] [Citation(s) in RCA: 183] [Impact Index Per Article: 8.0] [Indexed: 11/13/2022]
Abstract
This study presents compelling evidence that recombination significantly increases the silent GC content of a genome in a selectively neutral manner, resulting in a highly significant positive correlation between recombination and "GC3s" in the yeast Saccharomyces cerevisiae. Neither selection nor mutation can explain this relationship. A highly significant GC-biased mismatch repair system is documented for the first time in any member of the Kingdom Fungi. Much of the variation in the GC3s within yeast appears to result from GC-biased gene conversion. Evidence suggests that GC-biased mismatch repair exists in numerous organisms spanning six kingdoms. This transkingdom GC mismatch repair bias may have evolved in response to a ubiquitous AT mutational bias. A significant positive correlation between recombination and GC content is found in many of these same organisms, suggesting that the processes influencing the evolution of the yeast genome may be a general phenomenon. Nonrecombining regions of the genome and nonrecombining genomes would not be subject to this type of molecular drive. It is suggested that the low GC content characteristic of many nonrecombining genomes may be the result of three processes (1) a prevailing AT mutational bias, (2) random fixation of the most common types of mutation, and (3) the absence of the GC-biased gene conversion which, in recombining organisms, permits the reversal of the most common types of mutation. A model is proposed to explain the observation that introns, intergenic regions, and pseudogenes typically have lower GC content than the silent sites of corresponding open reading frames. This model is based on the observation that the greater the heterology between two sequences, the less likely it is that recombination will occur between them. 
According to this "Constraint" hypothesis, the formation and propagation of heteroduplex DNA is expected to occur, on average, more frequently within conserved coding and regulatory regions of the genome. In organisms possessing GC-biased mismatch repair, this would enhance the GC content of these regions through biased gene conversion. These findings have a number of important implications for the way we view genome evolution and suggest a new model for the evolution of sex.
Affiliation(s)
- John A Birdsell
- Department of Ecology and Evolutionary Biology, University of Arizona, Tucson, Arizona 85121, USA.
|
76
|
Bronner G, Spataro B, Gautier C. Cartographie génomique comparée chez les mammifères. Med Sci (Paris) 2002. [DOI: 10.1051/medsci/20021867767] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/14/2022]
|
77
|
Berkhout B, Grigoriev A, Bakker M, Lukashov VV. Codon and amino acid usage in retroviral genomes is consistent with virus-specific nucleotide pressure. AIDS Res Hum Retroviruses 2002; 18:133-41. [PMID: 11839146 DOI: 10.1089/08892220252779674] [Citation(s) in RCA: 63] [Impact Index Per Article: 2.7] [Indexed: 11/12/2022]
Abstract
Retroviral RNA genomes are known to have a biased nucleotide composition. For instance, the plus-strand RNA of human immunodeficiency virus (HIV) is A-rich, and the genome of human T cell leukemia virus (HTLV) is C-rich, and other retroviruses have a U-rich or G-rich genome. The biased composition of these genomes is most likely caused by directional mutational pressure of the respective reverse transcriptase enzymes. Using a set of retroviral genomes with a distinct nucleotide composition, we performed skew analyses of the nucleotide bias along the complete viral genome. Distinct nucleotide signatures were apparent, and these typical patterns were generally conserved across the viral genome. Furthermore, it is demonstrated that this typical nucleotide bias, combined with a profound discrimination against the CpG dinucleotide sequence, strongly influences the codon usage of the retroviruses in a direct manner, and their amino acid usage in an indirect manner. The fact that both codon usage and amino acid usage are so closely entwined with the genome composition has important practical implications. For instance, the typical trends in nucleotide usage could influence the molecular phylogenetic reconstruction of the family Retroviridae.
Affiliation(s)
- Ben Berkhout
- Department of Human Retrovirology, Academic Medical Center, University of Amsterdam, 1105 AZ Amsterdam, The Netherlands.
|
78
|
Ponger L, Duret L, Mouchiroud D. Determinants of CpG islands: expression in early embryo and isochore structure. Genome Res 2001; 11:1854-60. [PMID: 11691850 PMCID: PMC311164 DOI: 10.1101/gr.174501] [Citation(s) in RCA: 87] [Impact Index Per Article: 3.6] [Indexed: 01/14/2023]
Abstract
In an attempt to understand the origin of CpG islands (CGIs) in mammalian genomes, we have studied their location and structure according to the expression pattern of genes and to the G + C content of the isochores in which they are embedded. We show that CGIs located over the transcription start site (named start CGIs) are very different structurally from the others (named no-start CGIs): (1) 61.6% of the no-start CGIs are due to repeated sequences (79% are due to Alus), whereas only 5.6% of the start CGIs are due to such repeats; (2) start CGIs are longer and display a higher CpGo/e ratio and G + C level than no-start CGIs. The frequency of tissue-specific genes associated with a start CGI varies according to the genomic G + C content, from 25% in G + C-poor isochores to 64% in G + C-rich isochores. Conversely, the frequency of housekeeping genes associated with a start CGI (90%) is independent of the isochore context. Interestingly, the structure of start CGIs is very similar for tissue-specific and housekeeping genes. Moreover, 93% of genes expressed in early embryo are found to exhibit a CpG island over their transcription start point. These observations are consistent with the hypothesis that the occurrence of these CGIs is the consequence of gene expression at this stage, when the methylation pattern is installed.
Affiliation(s)
- L Ponger
- Laboratoire de Biométrie et Biologie Evolutive, Unité Mixte de Recherche Centre National de la Recherche Scientifique 5558-Université Claude Bernard, 69622 Villeurbanne Cedex, France.
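The abstract above compares CGIs by their CpGo/e (observed/expected) ratio without defining it. A minimal sketch follows, assuming the widely used definition (number of CpG dinucleotides times sequence length, divided by the product of the C and G counts); the abstract does not say which definition the authors used, and the function name `cpg_obs_exp` is hypothetical.

```python
def cpg_obs_exp(seq: str) -> float:
    """Observed/expected CpG ratio of a DNA sequence.

    Assumed definition: (CpG count * length) / (C count * G count).
    Returns 0.0 when the sequence has no C or no G, since the
    expected count is then zero.
    """
    s = seq.upper()
    c_count = s.count("C")
    g_count = s.count("G")
    if c_count == 0 or g_count == 0:
        return 0.0
    cpg_count = s.count("CG")   # "CG" cannot overlap itself, so count() is exact
    return cpg_count * len(s) / (c_count * g_count)


if __name__ == "__main__":
    print(cpg_obs_exp("ACGT"))      # one CpG, one C, one G, length 4
    print(cpg_obs_exp("CCGGAATT"))  # one CpG, two C, two G, length 8
```

A ratio near 1 means CpG occurs about as often as base composition alone predicts; bulk mammalian DNA is typically well below 1, which is why a high ratio is used as a CpG island criterion.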
|
79
|
Abstract
Within-intron difference of correlation with base composition of the adjacent exons was studied in the genomes of 34 species. For this purpose, GC-percent was determined for segments of 50 bp in length taken at both intron margins and in the internal part of the intron. It was found that in certain genomes the coefficient of correlation with GC-percent of the adjacent exon was significantly higher for the intron margin than for the internal part of the intron (homeotherms, cereals). Only part of this difference can be explained by unequal probability of insertion of transposable elements. Those multicellular organisms which have a low or no within-intron difference in correlation with the adjacent exons (anamniotes, invertebrates, dicots) show a higher local compositional heterogeneity (a greater exon/intron contrast in the GC-content). These results are evidence against the mutational bias being a possible explanation for the compositional genome heterogeneity. Thus, in the genomes with a high global heterogeneity there seems to be a selective force for compliance of intron base composition with the adjacent exons. This force is stronger in those parts of the intron that are closer to exons. In addition, the previously found positive general correlation between the genome size and average intron length was confirmed with a much larger dataset. However, within separate phylogenetic groups this rule can be broken, as it occurs in the cereals (family Poaceae), where a negative correlation was found.
Affiliation(s)
- A E Vinogradov
- Institute of Cytology, Russian Academy of Sciences, Tikhoretsky Avenue 4, 194064, St. Petersburg, Russia.
|
80
|
Abstract
A few months ago the International Human Genome Sequencing Consortium (IHGSC) published a 61-page paper on the human genome (IHGSC, Nature 409 (2001) 860). Here comments will be presented on some points of the paper that were previously investigated in our laboratory, and some misunderstandings and misconceptions about the organization and the evolutionary history of the human genome will be discussed. A very recent article on the same subject (Eyre-Walker and Hurst, Nat. Rev. Genet. 2 (2001) 549) will also be addressed. The present paper is a complement to two review articles which were published last year (Bernardi, Gene 241 (2000) 3; Gene 259(1) (2000) 31).
Affiliation(s)
- G Bernardi
- Laboratory of Molecular Evolution, Stazione Zoologica Anton Dohrn, Villa Comunale, 80121, Naples, Italy.
|
81
|
Musto H, Cruveiller S, D'Onofrio G, Romero H, Bernardi G. Translational selection on codon usage in Xenopus laevis. Mol Biol Evol 2001; 18:1703-7. [PMID: 11504850 DOI: 10.1093/oxfordjournals.molbev.a003958] [Citation(s) in RCA: 51] [Impact Index Per Article: 2.1] [Indexed: 11/12/2022]
Abstract
A correspondence analysis of codon usage in Xenopus laevis revealed that the first axis is strongly correlated with the base composition at third codon positions. The second axis discriminates between putatively highly expressed genes and the other coding sequences, with expression levels being confirmed by the analysis of expressed sequence tag frequencies. The comparison of codon usage of the sequences displaying the extreme values on the second axis indicates that several codons are statistically more frequent among the highly expressed (mainly housekeeping) genes. Translational selection appears, therefore, to influence synonymous codon usage in Xenopus.
Affiliation(s)
- H Musto
- Laboratorio di Evoluzione Molecolare, Stazione Zoologica Anton Dohrn, Naples, Italy
|
82
|
Abstract
One of the most striking features of mammalian chromosomes is the variation in G+C content that occurs over scales of hundreds of kilobases to megabases, the so-called 'isochore' structure of the human genome. This variation in base composition affects both coding and non-coding sequences and seems to reflect a fundamental level of genome organization. However, although we have known about isochores for over 25 years, we still have a poor understanding of why they exist. In this article, we review the current evidence for the three main hypotheses.
Affiliation(s)
- A Eyre-Walker
- Centre for the Study of Evolution and School of Biological Sciences, University of Sussex, Brighton BN1 9QG, UK.
|
83
|
Duret L, Hurst LD. The elevated GC content at exonic third sites is not evidence against neutralist models of isochore evolution. Mol Biol Evol 2001; 18:757-62. [PMID: 11319260 DOI: 10.1093/oxfordjournals.molbev.a003858] [Citation(s) in RCA: 38] [Impact Index Per Article: 1.6] [Indexed: 11/14/2022]
Abstract
The human genome is divided into isochores, large stretches (>>300 kb) of genomic DNA with more or less consistent GC content. Mutational/neutralist and selectionist models have been put forward to explain their existence. A major criticism of the mutational models is that they cannot account for the higher GC content at fourfold-redundant silent sites within exons (GC4) than in flanking introns (GCi). Indeed, it has been asserted that it is hard to envisage a mutational bias explanation, as it is difficult to see how repair enzymes might act differently in exons and their flanking introns. However, this rejection, we note, ignores the effects of transposable elements (TEs), which are a major component of introns and tend to cause them to have a GC content different from (usually lower than) that dictated by point mutational processes alone. As TEs tend not to insert at the extremities of introns, this model predicts that GC content at the extremities of introns should be more like that at GC4 than are the intronic interiors. This we show to be true. The model also correctly predicts that small introns should have a composition more like that at GC4 than large introns. We conclude that the logic of the previous rejection of neutralist models is unsafe.
Affiliation(s)
- L Duret
- Pole BioInformatique Lyonnais, Laboratoire BBE-UMR Centre National de la Recherche Scientifique 5558, Universite Claude Bernard-Lyon 1, Villeurbanne, France
|
84
|
Takano-Shimizu T. Local changes in GC/AT substitution biases and in crossover frequencies on Drosophila chromosomes. Mol Biol Evol 2001; 18:606-19. [PMID: 11264413 DOI: 10.1093/oxfordjournals.molbev.a003841] [Citation(s) in RCA: 50] [Impact Index Per Article: 2.1] [Indexed: 11/12/2022]
Abstract
I present here evidence of remarkable local changes in GC/AT substitution biases and in crossover frequencies on Drosophila chromosomes. The substitution pattern at 10 loci in the telomeric region of the X chromosome was studied for four species of the Drosophila melanogaster species subgroup. Drosophila orena and Drosophila erecta are clearly the most closely related species pair (the erecta complex) among the four species studied; however, the overall data at the 10 loci revealed a clear dichotomy in the silent substitution patterns between the AT-biased-substitution melanogaster and erecta lineages and the GC-biased-substitution yakuba and orena lineages, suggesting two or more independent changes in GC/AT substitution biases. More importantly, the results indicated a between-loci heterogeneity in GC/AT substitution bias in this small region independently in the yakuba and orena lineages. Indeed, silent substitutions in the orena lineage were significantly biased toward G and C at the consecutive yellow, lethal of scute, and asense loci, but they were significantly biased toward A and T at sta. The substitution bias toward G and C was centered in different areas in yakuba (significantly biased at EG:165H7.3, EG:171D11.2, and suppressor of sable). The similar silent substitution patterns in coding and noncoding regions, furthermore, suggested mutational biases as a cause of the substitution biases. On the other hand, a previous study revealed that Drosophila yakuba has about 20-fold higher crossover frequencies in the telomeric region of the X chromosome than does D. melanogaster; this study revealed that the total genetic map length of the yakuba X chromosome was only about 1.5 times as large as that of melanogaster and that the map length of the X-telomeric y-sta region did not differ between Drosophila yakuba and D. erecta. Taken together, the data strongly suggested that an approximately 20-fold reduction in the X-telomeric crossover frequencies occurred in the ancestral population of D. melanogaster after the melanogaster-yakuba divergence but before the melanogaster-simulans divergence.
Affiliation(s)
- T Takano-Shimizu
- Department of Population Genetics, National Institute of Genetics, Mishima, Shizuoka-ken, Japan.
|
85
|
Knight RD, Freeland SJ, Landweber LF. A simple model based on mutation and selection explains trends in codon and amino-acid usage and GC composition within and across genomes. Genome Biol 2001; 2:RESEARCH0010. [PMID: 11305938 PMCID: PMC31479 DOI: 10.1186/gb-2001-2-4-research0010] [Citation(s) in RCA: 206] [Impact Index Per Article: 8.6] [Received: 11/21/2000] [Revised: 02/01/2001] [Accepted: 02/13/2001] [Indexed: 11/28/2022]
Abstract
BACKGROUND Correlations between genome composition (in terms of GC content) and usage of particular codons and amino acids have been widely reported, but poorly explained. We show here that a simple model of processes acting at the nucleotide level explains codon usage across a large sample of species (311 bacteria, 28 archaea and 257 eukaryotes). The model quantitatively predicts responses (slope and intercept of the regression line on genome GC content) of individual codons and amino acids to genome composition. RESULTS Codons respond to genome composition on the basis of their GC content relative to their synonyms (explaining 71-87% of the variance in response among the different codons, depending on measure). Amino-acid responses are determined by the mean GC content of their codons (explaining 71-79% of the variance). Similar trends hold for genes within a genome. Position-dependent selection for error minimization explains why individual bases respond differently to directional mutation pressure. CONCLUSIONS Our model suggests that GC content drives codon usage (rather than the converse). It unifies a large body of empirical evidence concerning relationships between GC content and amino-acid or codon usage in disparate systems. The relationship between GC content and codon and amino-acid usage is ahistorical; it is replicated independently in the three domains of living organisms, reinforcing the idea that genes and genomes at mutation/selection equilibrium reproduce a unique relationship between nucleic acid and protein composition. Thus, the model may be useful in predicting amino-acid or nucleotide sequences in poorly characterized taxa.
Affiliation(s)
- Robin D Knight
- Department of Ecology and Evolutionary Biology, Princeton University, Princeton, NJ 08544, USA
- Stephen J Freeland
- Department of Ecology and Evolutionary Biology, Princeton University, Princeton, NJ 08544, USA
- Laura F Landweber
- Department of Ecology and Evolutionary Biology, Princeton University, Princeton, NJ 08544, USA
|
86
|
Abstract
Experimental approaches, as well as computer analysis on genomic sequences, have revealed a large variability in base composition between regions in the same genome or between genomes of different species. In most cases, however, the biological causes of these compositional biases remain unknown. The recent large increase in the availability of completely sequenced genomes can give new insight into evolution processes involved in these compositional biases.
Affiliation(s)
- C Gautier
- Biometry and Evolutionary Biology Laboratory (bâtiment 741), Université Claude Bernard Lyon 1 and CNRS, 43 bd 11 nov, 69622 Villeurbanne Cedex, France.
|
87
|
Singer GA, Hickey DA. Nucleotide bias causes a genomewide bias in the amino acid composition of proteins. Mol Biol Evol 2000; 17:1581-8. [PMID: 11070046 DOI: 10.1093/oxfordjournals.molbev.a026257] [Citation(s) in RCA: 184] [Impact Index Per Article: 7.4] [Indexed: 11/14/2022]
Abstract
We analyzed the nucleotide contents of several completely sequenced genomes, and we show that nucleotide bias can have a dramatic effect on the amino acid composition of the encoded proteins. By surveying the genes in 21 completely sequenced eubacterial and archaeal genomes, along with the entire Saccharomyces cerevisiae genome and two Plasmodium falciparum chromosomes, we show that biased DNA encodes biased proteins on a genomewide scale. The predicted bias affects virtually all genes within the genome, and it could be clearly seen even when we limited the analysis to sets of homologous gene sequences. Parallel patterns of compositional bias were found within the archaea and the eubacteria. We also found a positive correlation between the degree of amino acid bias and the magnitude of protein sequence divergence. We conclude that mutational bias can have a major effect on the molecular evolution of proteins. These results could have important implications for the interpretation of protein-based molecular phylogenies and for the inference of functional protein adaptation from comparative sequence data.
Affiliation(s)
- G A Singer
- Department of Biology, University of Ottawa, Ottawa, Ontario, Canada
|
88
|
Abstract
We present a new approach to DNA segmentation into compositionally homogeneous blocks. The Bayesian estimator, which is applicable for both short and long segments, is used to obtain the measure of homogeneity. An exact optimal segmentation is found via the dynamic programming technique. After completion of the segmentation procedure, the sequence composition on different scales can be analyzed with filtration of boundaries via the partition function approach.
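As a rough illustration of exact segmentation by dynamic programming, the sketch below splits a sequence into compositionally homogeneous blocks. It is not the method of the abstract: the Bayesian homogeneity estimator is replaced here by a plain maximum-likelihood multinomial cost, and the per-segment `penalty`, the function name `segment`, and the default alphabet are all assumptions made for the example.

```python
import math

def segment(seq: str, penalty: float = 3.0, alphabet: str = "ACGT"):
    """Return the optimal list of half-open (start, end) blocks for seq.

    Assumed cost of a block: negative maximised multinomial log-likelihood
    of its base counts, plus a fixed penalty per block (hypothetical choice).
    """
    seq = seq.upper()
    n = len(seq)
    # prefix[k][i] = number of occurrences of alphabet[k] in seq[:i]
    prefix = [[0] * (n + 1) for _ in alphabet]
    for i, base in enumerate(seq):
        for k, sym in enumerate(alphabet):
            prefix[k][i + 1] = prefix[k][i] + (base == sym)

    def cost(i: int, j: int) -> float:
        # Base counts of seq[i:j] from prefix sums; symbols outside the
        # alphabet are ignored.
        counts = [p[j] - p[i] for p in prefix]
        total = sum(counts)
        return -sum(c * math.log(c / total) for c in counts if c > 0)

    # best[j] = minimal total cost of segmenting seq[:j]
    best = [0.0] + [math.inf] * n
    back = [0] * (n + 1)
    for j in range(1, n + 1):
        for i in range(j):
            score = best[i] + cost(i, j) + penalty
            if score < best[j]:
                best[j], back[j] = score, i

    # Trace the optimal boundaries back from the end of the sequence.
    blocks = []
    j = n
    while j > 0:
        blocks.append((back[j], j))
        j = back[j]
    return blocks[::-1]
```

The double loop is quadratic in sequence length, so this form is only practical for short sequences; a larger `penalty` yields fewer, longer blocks.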
Affiliation(s)
- V E Ramensky
- Engelhardt Institute of Molecular Biology, Vavilova, Russia.
|
89
|
Abstract
The nuclear genomes of vertebrates are mosaics of isochores, very long stretches (>>300 kb) of DNA that are homogeneous in base composition and are compositionally correlated with the coding sequences that they embed. Isochores can be partitioned into a small number of families that cover a range of GC levels (GC is the molar ratio of guanine+cytosine in DNA), which is narrow in cold-blooded vertebrates, but broad in warm-blooded vertebrates. This difference is essentially due to the fact that the GC-richest 10-15% of the genomes of the ancestors of mammals and birds underwent two independent compositional transitions characterized by strong increases in GC levels. The similarity of isochore patterns across mammalian orders, on the one hand, and across avian orders, on the other, indicates that these higher GC levels were then maintained, at least since the appearance of ancestors of warm-blooded vertebrates. After a brief review of our current knowledge on the organization of the vertebrate genome, evidence will be presented here in favor of the idea that the generation and maintenance of the GC-richest isochores in the genomes of warm-blooded vertebrates were due to natural selection.
Affiliation(s)
- G Bernardi
- Laboratorio di Evoluzione Molecolare, Stazione Zoologica Anton Dohrn, Napoli, Italy.
|
90
|
Wan H, Wootton JC. A global compositional complexity measure for biological sequences: AT-rich and GC-rich genomes encode less complex proteins. Comput Chem 2000. [DOI: 10.1016/s0097-8485(00)80008-x] [Citation(s) in RCA: 32] [Impact Index Per Article: 1.3] [Indexed: 11/15/2022]
|
91
|
Nishizawa K, Nishizawa M, Kim KS. Tendency for local repetitiveness in amino acid usages in modern proteins. J Mol Biol 1999; 294:937-53. [PMID: 10588898 DOI: 10.1006/jmbi.1999.3275] [Citation(s) in RCA: 10] [Impact Index Per Article: 0.4] [Indexed: 11/22/2022]
Abstract
Systematic analyses of human proteins show that neural and immune system-specific, and therefore, relatively "modern" proteins have a tendency for repetitive use of amino acids at a local scale (approximately 1-20 residues), while ancient proteins (human homologues of Escherichia coli proteins) do not. Those protein subsegments which are unique based on homology search account for the repetitiveness. Simulation shows that such repetitiveness can be maintained by frequent duplication on a very short scale (one to two codons) in the presence of substitutive point mutation, while the latter tends to mitigate the repetitiveness. DNA analyses also show the presence of cryptic (i.e. "out of the codon frame") repetitiveness, which cannot fully be explained by features in protein sequences. Simulative modification of the amino acid sequences of immune system-specific proteins estimates that 2.4 duplication events occur during the period equivalent to ten events of substitution mutation. It is also suggested that the repetitiveness leads to longitudinal unevenness within a given peptide domain. Those peptide motifs which contain similarly charged residues are likely to be generated more frequently in the presence of the tendency for repetitiveness than in its absence. Therefore, the neutral propensity of DNA for duplication, which can also tend to generate repetitiveness in amino acid sequences, seems to be manifested primarily when the constraints on amino acid sequences are relatively weak, and yet may be positively contributing to generation of unevenness in modern proteins.
Affiliation(s)
- K Nishizawa
- Department of Biochemistry, Teikyo University School of Medicine, Kaga, Itabashi, Tokyo, 173, Japan.
|
92
|
Majumdar S, Gupta SK, Sundararajan VS, Ghosh TC. Compositional correlation studies among the three different codon positions in 12 bacterial genomes. Biochem Biophys Res Commun 1999; 266:66-71. [PMID: 10581166 DOI: 10.1006/bbrc.1999.1774] [Citation(s) in RCA: 24] [Impact Index Per Article: 0.9] [Indexed: 11/22/2022]
Abstract
Compositional distributions in the three codon positions of the coding sequences of 12 fully sequenced prokaryotic genomes, which are publicly available, were investigated. A universal compositional correlation was observed in most of the genomes under investigation irrespective of their overall genomic GC contents. In all the genomes, the GC contents at the first codon positions are always greater than the overall GC contents of the genomes whereas the reverse is true in the case of second codon positions. GC contents at the third codon positions are higher than the overall genomic GC contents in high-GC genomes, and the opposite situation was found in the case of low-GC genomes except for Helicobacter pylori. In high-GC genomes, the GC contents at the first + second codon positions are less than the GC contents at the third codon positions, and they are low in low-GC genomes except for Helicobacter pylori. The distributions of four bases at the three different positions were also investigated for all 12 organisms. It was observed that in high-GC genomes G is the most dominant base and in low-GC genomes A is the most dominant base in the first codon positions. But purine bases, i.e., (A + G), predominantly occur in the first codon position. In the second codon position, A is the most dominant base in most of the organisms and G is the least dominant base in all the organisms. There is no unique regular pattern of individual bases at the third codon positions; however, there are significant differences in the occurrences of (G + C) contents in the third codon positions among the different organisms. Calculations of dinucleotide frequencies in 12 different organisms indicate that in GC-rich genomes GG, GC, CC, and CG dinucleotides are the most dominant whereas the reverse is true in the case of low-GC genomes. Biological implications of these results are discussed in this paper.
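The per-position GC contents discussed in this abstract (often written GC1, GC2 and GC3) can be computed as in the minimal sketch below. The function name `gc_by_codon_position` and the handling of ambiguous symbols are assumptions for illustration, and the input is assumed to be a coding sequence that starts in frame.

```python
def gc_by_codon_position(cds: str) -> dict:
    """Return the GC fraction at codon positions 1, 2 and 3 of an in-frame CDS."""
    seq = cds.upper()
    usable = len(seq) - len(seq) % 3  # ignore a trailing partial codon
    result = {}
    for pos in range(3):
        # Every third base, starting at this codon position.
        bases = seq[pos:usable:3]
        # Assumed policy: skip ambiguous symbols such as N.
        counted = [b for b in bases if b in "ACGT"]
        gc = sum(b in "GC" for b in counted)
        result[f"GC{pos + 1}"] = gc / len(counted) if counted else float("nan")
    return result


# Codons ATG GCC GCG TAA: positions 1 and 2 are half G/C, position 3 is 3/4 G/C.
print(gc_by_codon_position("ATGGCCGCGTAA"))
# {'GC1': 0.5, 'GC2': 0.5, 'GC3': 0.75}
```

Averaging these values over all coding sequences of a genome gives genome-level figures that can be compared with the overall genomic GC content.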
Affiliation(s)
- S Majumdar
- Distributed Information Centre, Bose Institute, P 1/12, C.I.T. Scheme, VII M Calcutta, 700 054, India
|
93
|
D'Onofrio G, Jabbari K, Musto H, Bernardi G. The correlation of protein hydropathy with the base composition of coding sequences. Gene 1999; 238:3-14. [PMID: 10570978 DOI: 10.1016/s0378-1119(99)00257-7] [Citation(s) in RCA: 67] [Impact Index Per Article: 2.6] [Indexed: 11/20/2022]
Abstract
The "universal correlation" (D'Onofrio, G., Bernardi, G., 1992. A universal compositional correlation among codon positions. Gene 110, 81-88.) that holds between <GC3> and <GC1> or <GC2> (<GC> values are the average values of the coding sequences of each genome analyzed) at both the inter- and intra-genomic level, was re-analyzed on a vastly larger dataset. The results showed a slight, but significant, difference in the <GC3> vs. <GC1> correlations exhibited by prokaryotes and eukaryotes. This finding prompted an analysis of the correlation between <GC3> and the amino acid frequencies in the encoded proteins, which has shown that positive correlations exist between <GC3> values of coding sequences and the hydropathy of the corresponding proteins. These correlations are due to the fact that hydrophobic and amphipathic amino acids increase, whereas hydrophilic amino acids decrease with increasing <GC3> values. Hydropathy values of prokaryotic proteins are systematically higher than those of eukaryotes, but the slopes of the regression lines are identical. The lower hydrophobicity of eukaryotic proteins is due to differences in the amino acid composition. In particular, the twofold higher cysteine (and disulfide bond) level of eukaryotic proteins compared to prokaryotic proteins most probably compensates for their lower hydrophobicity. This supports the viewpoint that hydrophobicity plays a structural and functional role as far as protein stability is concerned.
Affiliation(s)
- G D'Onofrio
- Laboratorio di Evoluzione Molecolare, Stazione Zoologica Anton Dohrn, Napoli, Italy
|
94
|
Cruveiller S, Jabbari K, D'Onofrio G, Bernardi G. Different hydrophobicities of orthologous proteins from Xenopus and human. Gene 1999; 238:15-21. [PMID: 10570979 DOI: 10.1016/s0378-1119(99)00259-0] [Citation(s) in RCA: 17] [Impact Index Per Article: 0.7] [Indexed: 11/24/2022]
Abstract
A compositional transition was previously detected by comparing orthologous coding sequences from cold- and warm-blooded vertebrates (see Bernardi, G., Hughes, S., Mouchiroud, D., 1997. The major compositional transitions in the vertebrate genome. J. Mol. Evol. 44, S44-S51 for a review). The transition is characterized by higher GC levels (GC is the molar ratio of guanine+cytosine in DNA) and, especially, by higher GC3 levels (GC3 is the GC level of third codon positions) in coding sequences from warm-blooded vertebrates. This transition essentially affects GC-rich genes, although the nucleotide substitution rate is of the same order of magnitude in both GC-poor and GC-rich genes. In order to understand the evolutionary basis of the changes, we have compared the hydrophobicity of orthologous proteins from Xenopus and human. Although the differences are small in proteins encoded by coding sequences ranging from 0 to 65% in GC3, they are large in the proteins encoded by sequences characterized by GC3 values higher than 65%. The latter proteins are more hydrophobic in human than in Xenopus.
Affiliation(s)
- S Cruveiller
- Laboratorio di Evoluzione Molecolare, Stazione Zoologica Anton Dohrn, Naples, Italy
|
95
|
Rodríguez-Trelles F, Tarrío R, Ayala FJ. Switch in codon bias and increased rates of amino acid substitution in the Drosophila saltans species group. Genetics 1999; 153:339-50. [PMID: 10471717 PMCID: PMC1460741 DOI: 10.1093/genetics/153.1.339] [Citation(s) in RCA: 35] [Impact Index Per Article: 1.3] [Indexed: 11/13/2022] Open
Abstract
We investigated the nucleotide composition of five genes, Xdh, Adh, Sod, Per, and 28SrRNA, in nine species of Drosophila (subgenus Sophophora) and one of Scaptodrosophila. The six species of the Drosophila saltans group markedly differ from the others in GC content and codon use bias. The GC content in the third codon position, and to a lesser extent in the first position and the introns, is higher in the D. melanogaster and D. obscura groups than in the D. saltans group (in Scaptodrosophila it is intermediate but closer to the melanogaster and obscura species). Differences are greater for Xdh than for Adh, Sod, Per, and 28SrRNA, which are functionally more constrained. We infer that rapid evolution of GC content in the saltans lineage is largely due to a shift in mutation pressure, which may have been associated with diminished natural selection due to smaller effective population numbers rather than reduced recombination rates. The rate of GC content evolution impacts the rate of protein evolution and may distort phylogenetic inferences. Previous observations suggesting that GC content evolution is very limited in Drosophila may have been distorted due to the restricted number of genes and species (mostly D. melanogaster) investigated.
Affiliation(s)
- F Rodríguez-Trelles
- Department of Ecology and Evolutionary Biology, University of California, Irvine, California 92697-2525, USA.
|
96
|
Abstract
BACKGROUND: Nucleotide substitution rates and G + C content vary considerably among mammalian genes. It has been proposed that the mammalian genome comprises a mosaic of regions - termed isochores - with differing G + C content. The regional variation in gene G + C content might therefore be a reflection of the isochore structure of chromosomes, but the factors influencing the variation of nucleotide substitution rate are still open to question. RESULTS: To examine whether nucleotide substitution rates and gene G + C content are influenced by the chromosomal location of genes, we compared human and murid (mouse or rat) orthologues known to belong to one of the chromosomal (autosomal) segments conserved between these species. Multiple members of gene families were excluded from the dataset. Sets of neighbouring genes were defined as those lying within 1 centiMorgan (cM) of each other on the mouse genetic map. For both synonymous substitution rates and G + C content at silent sites, neighbouring genes were found to be significantly more similar to each other than sets of genes randomly drawn from the dataset. Moreover, we demonstrated that the regional similarities in G + C content (isochores) and synonymous substitution rate were independent of each other. CONCLUSIONS: Our results provide the first substantial statistical evidence for the existence of a regional variation in the synonymous substitution rate within the mammalian genome, indicating that different chromosomal regions evolve at different rates. This regional phenomenon which shapes gene evolution could reflect the existence of 'evolutionary rate units' along the chromosome.
Affiliation(s)
- G Matassi
- Institute of Genetics, University of Nottingham, Queens Medical Centre, Nottingham, NG7 2UH, UK.
|
97
|
D'Onofrio G, Jabbari K, Musto H, Alvarez-Valin F, Cruveiller S, Bernardi G. Evolutionary genomics of vertebrates and its implications. Ann N Y Acad Sci 1999; 870:81-94. [PMID: 10415475 DOI: 10.1111/j.1749-6632.1999.tb08867.x] [Citation(s) in RCA: 29] [Impact Index Per Article: 1.1] [Indexed: 12/01/2022]
Abstract
The discovery that the genomes of warm-blooded vertebrates are mosaics of isochores, long DNA segments homogeneous in base composition, yet belonging to families covering a broad spectrum of GC levels, has led to two major observations. The first is that gene density is strikingly non-uniform in the genome of all vertebrates, gene concentration increasing with increasing GC levels. (Although the genomes of cold-blooded vertebrates are characterized by smaller compositional heterogeneities than those of warm-blooded vertebrates and high GC levels are not attained, their gene distribution is basically similar to that of warm-blooded vertebrates.) The second observation is that the GC-richest and gene-richest isochores underwent a compositional transition (characterized by a strong increase in GC level) between cold- and warm-blooded vertebrates. Evidence to be discussed favors the idea that this compositional transition and the ensuing highly heterogeneous compositional pattern were due to, and were maintained by, natural selection.
Affiliation(s)
- G D'Onofrio
- Laboratoire de Génétique Moléculaire, Institut Jacques Monod 2, Paris, France.
|
98
|
Morton BR. Strand asymmetry and codon usage bias in the chloroplast genome of Euglena gracilis. Proc Natl Acad Sci U S A 1999; 96:5123-8. [PMID: 10220429 PMCID: PMC21827 DOI: 10.1073/pnas.96.9.5123] [Citation(s) in RCA: 48] [Impact Index Per Article: 1.8] [Received: 11/15/1998] [Accepted: 02/22/1999] [Indexed: 11/18/2022] Open
Abstract
It is shown that the two strands of the chloroplast genome from Euglena gracilis are asymmetric with regard to nucleotide composition. This asymmetry switches at both the origin of replication and a location that is halfway around the circular genome from the origin. In both halves of the genome the leading strand is G+T-rich, having a bias toward G over C and T over A, and the lagging strand is A+C-rich. This asymmetry is probably the result of a difference in mutation dynamics between the leading and lagging strands. In addition to composition asymmetry, the two strands differ with regard to coding content. In both halves of the genome the vast majority of genes are coded by the leading strand. These two aspects of strand asymmetry are then applied to a statistical test for selection on codon usage. The results indicate that selection on codon usage is limited to genes on the leading strand; no gene on the A+C-rich lagging strand shows evidence for selection, suggesting that highly expressed genes are coded predominantly on the strand of DNA that is the leading strand during replication. On the basis of these observations it is proposed that the coding strand bias is generated by selection to code highly expressed genes on the leading strand to coordinate the direction of replication and transcription, thereby increasing the potential rate of both reactions.
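A bias toward G over C and T over A along one strand is commonly summarised as windowed skew values, (G-C)/(G+C) and (T-A)/(T+A). The sketch below is only an illustration of that generic calculation, not the analysis used in the paper; the function name `skews` and the window and step sizes are assumptions.

```python
def skews(seq: str, window: int = 1000, step: int = 1000):
    """Yield (start, gc_skew, ta_skew) for successive windows along one strand."""
    seq = seq.upper()
    for start in range(0, len(seq) - window + 1, step):
        w = seq[start:start + window]
        g, c, t, a = (w.count(b) for b in "GCTA")
        # Skew is defined as 0.0 here when a window lacks both bases (assumed policy).
        gc_skew = (g - c) / (g + c) if g + c else 0.0
        ta_skew = (t - a) / (t + a) if t + a else 0.0
        yield start, gc_skew, ta_skew


# Toy sequence: a G+T-rich half followed by an A+C-rich half.
print(list(skews("GGGT" * 2 + "CCCA" * 2, window=8, step=8)))
# [(0, 1.0, 1.0), (8, -1.0, -1.0)]
```

On a circular genome read along a single strand, a sign change in both skews between neighbouring windows is the kind of switch the abstract describes at the origin and at the point opposite it.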
Affiliation(s)
- B R Morton
- Department of Biological Sciences, Barnard College, Columbia University, 3009 Broadway, New York, NY 10027, USA.
|
99
|
Abstract
Transcriptional repression in eukaryotes often involves tens or hundreds of kilobase pairs, two to three orders of magnitude more than the bacterial operator/repressor model does. Classical repression, represented by this model, was maintained over the whole span of evolution under different guises, and consists of repressor factors interacting primarily with promoters and, in later evolution, also with enhancers. The use of much larger amounts of DNA in the other mode of repression, here called the sectorial mode ('superrepression'), results in the conceptual transfer of so-called junk DNA to the domain of functional DNA. This contribution to the solution of the c-value paradox involves perhaps 15% of genomic 'junk,' and encompasses the bulk of the introns, thought to fill a stabilizing role in sectorially repressed chromatin structures. In the case of developmental genes, such structures appear to be heterochromatoid in character. However, solid clues regarding general structural features of superrepressed terminal differentiation genes remain elusive. The competition among superrepressible DNA sectors for sectorially binding factors offers, in principle, a molecular mechanism for developmental switches. Position effect variegation may be considered an abnormal manifestation of normal processes that underlie development and involve heterochromatoid sectorial repression, which is apparently required for local elimination or modulation of morphological features (morpholysis). Sectorial repression of genes participating either in development or in terminal differentiation is considered instrumental in establishing stable cell types, and provides a basis for the distinction between determination and cell type specification. The gamut of possible stable cell types may have been broadened by the appearance in evolution of heavy isochores.
Additional types of relatively frequent GC-rich cis-acting DNA motifs may offer reiterated binding sites to factors endowed with a selective (though not individually strong) affinity for these motifs. The majority of sequence motifs thought to be used in superrepression need not be individually maintained by natural selection. It is re-emphasized that the dispensability of sequences is not an indicator of their nonfunctionality and that in many cases, along noncoding sequences, nucleotides tend to fill functions collectively, rather than individually.
Affiliation(s)
- E Zuckerkandl
- Institute of Molecular Medical Sciences, Palo Alto, CA 94306, USA
|
100
|
Lobry JR. Influence of genomic G+C content on average amino-acid composition of proteins from 59 bacterial species. Gene 1997; 205:309-16. [PMID: 9461405 DOI: 10.1016/s0378-1119(97)00403-4] [Citation(s) in RCA: 106] [Impact Index Per Article: 3.8] [Indexed: 02/06/2023] Open
Abstract
The amino-acid composition of 23,490 proteins from 59 bacterial species was analyzed as a function of genomic G+C content. Observed amino-acid frequencies were compared with those expected from a neutral model assuming the absence of selection on average protein composition. Integral membrane proteins and non-integral membrane proteins were analyzed separately. The average deviation from this neutral model shows that there is a selective pressure increasing content in charged amino acids for non-integral membrane proteins, and content in hydrophobic amino acids for integral membrane proteins. Amino-acid frequencies were greatly influenced by genomic G+C content, but the influence was found to be often weaker than predicted. This may be evidence for a selective pressure, maintaining most amino-acid frequencies close to an optimal value. Concordance between the genetic code and protein composition is discussed in the light of this observation.
Affiliation(s)
- J R Lobry
- CNRS UMR 5558-Laboratoire BGBP, Université Claude Bernard, Villeurbanne, France.
|