1
Lamolle G, Simón D, Iriarte A, Musto H. Main Factors Shaping Amino Acid Usage Across Evolution. J Mol Evol 2023:10.1007/s00239-023-10120-5. [PMID: 37264211 DOI: 10.1007/s00239-023-10120-5] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/22/2023] [Accepted: 05/17/2023] [Indexed: 06/03/2023]
Abstract
The standard genetic code determines that in most species, including viruses, there are 20 amino acids that are coded by 61 codons, while the other three codons are stop triplets. Considering the whole proteome, each species features its own amino acid frequencies; given the slow rate of change, closely related species display similar GC content and amino acid usage. In contrast, distantly related species display different amino acid frequencies. Furthermore, within certain multicellular species, such as mammals, intragenomic differences in the usage of amino acids are evident. In this communication, we shall summarize some of the most prominent and well-established factors that determine the differences found in amino acid usage, both across evolution and intragenomically.
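The codon arithmetic stated in this abstract (61 sense codons, three stop triplets, 20 amino acids) can be checked with a minimal Python sketch. The names `CODON_TABLE` and `amino_acid_frequencies` are illustrative and not taken from the article; the table is the standard genetic code written in the usual T, C, A, G ordering, and the frequency helper assumes proteins are given as plain one-letter strings.

```python
from collections import Counter

BASES = "TCAG"
# Standard genetic code; codons ordered TTT, TTC, TTA, TTG, TCT, ... (T, C, A, G at each position).
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}

# Split the 64 codons into sense codons and stop triplets.
sense_codons = [codon for codon, aa in CODON_TABLE.items() if aa != "*"]
stop_codons = sorted(codon for codon, aa in CODON_TABLE.items() if aa == "*")
amino_acids = {CODON_TABLE[codon] for codon in sense_codons}


def amino_acid_frequencies(proteins):
    """Relative amino acid frequencies pooled over a list of protein strings."""
    counts = Counter(aa for protein in proteins for aa in protein)
    total = sum(counts.values())
    return {aa: n / total for aa, n in counts.items()}
```

Applied to a whole proteome, `amino_acid_frequencies` gives the kind of per-species profile the abstract refers to; the toy inputs used to exercise it are hypothetical.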
Affiliation(s)
- Guillermo Lamolle
- Laboratorio de Genómica Evolutiva, Facultad de Ciencias, Universidad de La República, Montevideo, Uruguay
- Diego Simón
- Laboratorio de Genómica Evolutiva, Facultad de Ciencias, Universidad de La República, Montevideo, Uruguay
- Laboratorio de Virología Molecular, Centro de Investigaciones Nucleares, Facultad de Ciencias, Universidad de La República, Montevideo, Uruguay
- Laboratorio de Evolución Experimental de Virus, Institut Pasteur de Montevideo, Montevideo, Uruguay
- Andrés Iriarte
- Laboratorio de Genómica Evolutiva, Facultad de Ciencias, Universidad de La República, Montevideo, Uruguay
- Laboratorio de Biología Computacional, Departamento de Desarrollo Biotecnológico, Instituto de Higiene, Facultad de Medicina, Universidad de La República, Montevideo, Uruguay
- Héctor Musto
- Laboratorio de Genómica Evolutiva, Facultad de Ciencias, Universidad de La República, Montevideo, Uruguay.
2
Musto H. How Many Factors Influence Genomic GC Content Among Prokaryotes? J Mol Evol 2023; 91:6-9. [PMID: 36370165 DOI: 10.1007/s00239-022-10077-x] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/27/2022] [Accepted: 11/04/2022] [Indexed: 11/14/2022]
3
Musto H. In Memoriam of Giorgio Bernardi and Noboru Sueoka: A Personal View. J Mol Evol 2022; 90:325-327. [PMID: 35838772 DOI: 10.1007/s00239-022-10066-0] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/21/2022] [Accepted: 07/08/2022] [Indexed: 11/24/2022]
Affiliation(s)
- Héctor Musto
- Laboratorio de Genómica Evolutiva, Facultad de Ciencias, Universidad de la República, Montevideo, Uruguay.
4
Lamolle G, Iriarte A, Musto H. Codon usage in the flatworm Schistosoma mansoni is shaped by the mutational bias towards A+T and translational selection, which increases GC-ending codons in highly expressed genes. Mol Biochem Parasitol 2021; 247:111445. [PMID: 34942292 DOI: 10.1016/j.molbiopara.2021.111445] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Received: 07/22/2021] [Revised: 12/14/2021] [Accepted: 12/17/2021] [Indexed: 11/30/2022]
Abstract
Schistosoma mansoni is a trematode flatworm that parasitizes humans and causes a disease called bilharzia. At the genomic level, it is characterized by a low genomic GC content and an "isochore-like" structure, where the GC-richest regions, mainly located at the ends of the chromosomes, are interspersed with low-GC regions. Furthermore, the GC-richest regions are also the gene-richest and contain the most heavily expressed genes. Taking these features into account, we decided to reanalyze the codon usage of this flatworm. Our results show that a) when all genes are considered together, the strong mutational bias towards A + T leads to a predominance of A/T-ending codons, b) a multivariate analysis discriminates between highly and lowly expressed genes, c) the sequences expressed at the highest levels display a significant increase in G/C-ending codons, d) when molecular distances to a closely related species are compared, the synonymous distance in highly expressed genes is significantly lower than in lowly expressed sequences. Therefore, we conclude that, in contrast to previous results, which were based on a small sample of genes, codon usage in S. mansoni is the result of two forces that operate in opposite directions: while mutational bias leads to a predominance of A/T-ending codons, translational selection, acting at the level of speed, increases G/C-ending triplets.
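The contrast between A/T-ending and G/C-ending codons in this abstract is usually quantified as GC3, the fraction of codons with G or C at the third position. The following is a minimal sketch, assuming an in-frame coding sequence given as a plain string; the function name `gc3` is illustrative, and the sketch does not reproduce the multivariate analysis used in the article.

```python
def gc3(cds):
    """Fraction of codons ending in G or C in an in-frame coding sequence."""
    cds = cds.upper()
    # Drop any trailing partial codon before splitting into triplets.
    usable = len(cds) - len(cds) % 3
    codons = [cds[i:i + 3] for i in range(0, usable, 3)]
    if not codons:
        raise ValueError("sequence is shorter than one codon")
    return sum(codon[2] in "GC" for codon in codons) / len(codons)
```

Comparing `gc3` between sets of highly and lowly expressed genes would be one simple way to look for the increase in G/C-ending codons described above. Note that this plain GC3 counts every codon, including ATG, TGG and stop codons, which some analyses exclude.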
Affiliation(s)
- Guillermo Lamolle
- Unidad de Genómica Evolutiva, Facultad de Ciencias, Universidad de la República, Iguá 4225, 11400 Montevideo, Uruguay
- Andrés Iriarte
- Laboratorio de Biología Computacional, Departamento de Desarrollo Biotecnológico, Instituto de Higiene, Facultad de Medicina, Universidad de la República, Avenida A. Navarro 3051, 11600 Montevideo, Uruguay.
- Héctor Musto
- Unidad de Genómica Evolutiva, Facultad de Ciencias, Universidad de la República, Iguá 4225, 11400 Montevideo, Uruguay.
5
Abstract
Since the genetic code is degenerate, several codons are translated to the same amino acid. Although these triplets were historically considered to be "synonymous" and therefore expected to be used at roughly equal frequencies in all genomes, we now know that this is not the case. Indeed, once several coding sequences had been obtained in the late '70s and early '80s of the last century, coming from either the same or different species, it became evident that (a) each genome, taken globally, displays its own codon usage pattern, which means that different genomes display a particular global codon usage table when all genes are considered together, and (b) there is a strong intragenomic diversity: in other words, within a given species the codon usage pattern can (and usually does) differ greatly among genes in the same genome. These different patterns were attributed to two main factors. First, the mutational bias characteristic of each genome determines that GC-poor species display a general bias towards A/T-ending codons, while the reverse is true for GC-rich species. Second, the differences in codon usage among genes from the same species are due to natural selection acting at the level of translation, in such a way that highly expressed genes tend to use codons that match the most abundant isoacceptor tRNAs. Thus, these genes are translated at a higher rate, which in turn helps to overcome the limiting factor in translation, namely the number of available ribosomes per cell. Although these explanations are still valid, new factors are almost constantly postulated to affect codon usage. In this mini review, we shall try to summarize them.
Affiliation(s)
- Andrés Iriarte
- Laboratorio de Genómica Evolutiva, Depto. de Biología Celular y Molecular, Facultad de Ciencias, Universidad de la República, 11400, Montevideo, Uruguay
- Laboratorio de Biología Computacional, Depto. de Desarrollo Biotecnológico, Instituto de Higiene, Facultad de Medicina, Universidad de la República, 11600, Montevideo, Uruguay
- Guillermo Lamolle
- Laboratorio de Genómica Evolutiva, Depto. de Biología Celular y Molecular, Facultad de Ciencias, Universidad de la República, 11400, Montevideo, Uruguay
- Héctor Musto
- Laboratorio de Genómica Evolutiva, Depto. de Biología Celular y Molecular, Facultad de Ciencias, Universidad de la República, 11400, Montevideo, Uruguay.
6
Abstract
Eukaryotic genomes are compositionally heterogeneous, that is, composed of regions that differ in guanine-cytosine (GC) content (isochores). The best-documented case is that of vertebrates (mainly mammals), although it has also been noted among unicellular eukaryotes and invertebrates. In the human genome, regarded as a typical mammal, this heterogeneity is associated with several features. Specifically, genes located in the GC-richest regions are the GC3-richest, display CpG islands and have shorter introns. Furthermore, these genes are more heavily expressed and tend to be located at the ends of the chromosomes. Although compositional heterogeneity seems to be widespread among eukaryotes, the associated properties noted in the human genome and other mammals have not been investigated in depth in other taxa. Here we provide evidence that the genome of the parasitic flatworm Schistosoma mansoni is compositionally heterogeneous and exhibits an isochore-like structure, displaying some features associated, until now, only with the human and other vertebrate genomes, with the exception of gene concentration.
Affiliation(s)
- Guillermo Lamolle
- Laboratorio de Organización y Evolución del Genoma, Facultad de Ciencias, Udelar, Montevideo, Uruguay
- Anna V Protasio
- Wellcome Trust Genome Campus, Wellcome Trust Sanger Institute, Cambridge, United Kingdom
- Andrés Iriarte
- Laboratorio de Organización y Evolución del Genoma, Facultad de Ciencias, Udelar, Montevideo, Uruguay
- Dpto. de Desarrollo Biotecnológico, Facultad de Medicina, Instituto de Higiene, Udelar, Montevideo, Uruguay
- Eugenio Jara
- Laboratorio de Organización y Evolución del Genoma, Facultad de Ciencias, Udelar, Montevideo, Uruguay
- Diego Simón
- Laboratorio de Organización y Evolución del Genoma, Facultad de Ciencias, Udelar, Montevideo, Uruguay
- Héctor Musto
- Laboratorio de Organización y Evolución del Genoma, Facultad de Ciencias, Udelar, Montevideo, Uruguay
7
8
Qiu H, Hildebrand F, Kuraku S, Meyer A. Unresolved orthology and peculiar coding sequence properties of lamprey genes: the KCNA gene family as test case. BMC Genomics 2011; 12:325. [PMID: 21699680 PMCID: PMC3141671 DOI: 10.1186/1471-2164-12-325] [Citation(s) in RCA: 54] [Impact Index Per Article: 4.2] [Received: 10/28/2010] [Accepted: 06/23/2011] [Indexed: 11/10/2022]
Abstract
BACKGROUND: In understanding the evolutionary process of vertebrates, cyclostomes (hagfishes and lampreys) occupy crucial positions. Resolving the molecular phylogenetic relationships of cyclostome genes with gnathostome (jawed vertebrate) genes is indispensable in deciphering both the species tree and gene trees. However, molecular phylogenetic analyses, especially those including lamprey genes, have produced highly discordant results between gene families. To efficiently scrutinize this problem using partial genome assemblies of early vertebrates, we focused on the potassium voltage-gated channel, shaker-related (KCNA) family, whose members are mostly single-exon. RESULTS: Seven sea lamprey KCNA genes as well as six elephant shark genes were identified, and their orthologies to bony vertebrate subgroups were assessed. In contrast to the robustly supported orthology of the elephant shark genes to gnathostome subgroups, clear orthology of any sea lamprey gene could not be established. Notably, sea lamprey KCNA sequences displayed a unique codon usage pattern and amino acid composition, probably associated with exceptionally high GC-content in their coding regions. This lamprey-specific property of coding sequences was also observed generally for genes outside this gene family. CONCLUSIONS: Our results suggest that secondary modifications of sequence properties unique to the lamprey lineage may be one of the factors preventing robust orthology assessments of lamprey genes, which deserves further genome-wide validation. The lamprey lineage-specific alteration of protein-coding sequence properties needs to be taken into consideration in tackling the key questions about early vertebrate evolution.
Affiliation(s)
- Huan Qiu
- Department of Biology, University of Konstanz, Konstanz, Germany
9
10
Abstract
The isochore theory depicts the genomes of warm-blooded vertebrates as a mosaic of long genomic regions that are characterized by relatively homogeneous GC content. In the absence of genomic data, the GC content at third-codon positions of protein-coding genes (GC3) was commonly used as a proxy for the GC content of isochores. Oddly, in the postgenomic era, GC3 is still sometimes used as a proxy for the GC composition of isochores. Here, we use genic and genomic sequences from human, chimpanzee, cow, mouse, rat, chicken, and zebrafish to show that GC3 explains only a very small proportion of the variation in GC content of long genomic sequences flanking the genes (GCf), and that what little correlation there is between GC3 and GCf decays rapidly with distance from the gene. The coefficient of variation of GC3 was found to be much larger than that of GCf and, therefore, GC3 and GCf values are not comparable with each other. Comparisons of orthologous gene pairs from 1) human and chimpanzee and 2) mouse and rat show strong correlations between their GC3 values, but very weak correlations between their GCf values. We conclude that the GC content of the third-codon position cannot be used as a stand-in for isochoric composition.
Affiliation(s)
- Eran Elhaik
- Department of Biology and Biochemistry, University of Houston, TX, USA
11
Abstract
Vertebrate genomes are composed of isochores, which are relatively long (>100 kb) regions with a relatively homogeneous (either GC-rich or AT-rich) base composition and with rather sharp boundaries with neighboring isochores. Mammals and living archosaurs (birds and crocodilians) have heterogeneous genomes that include very GC-rich isochores. In sharp contrast, the genomes of amphibians and fishes are more homogeneous and they have a lower overall GC content. Because DNA with higher GC content is more thermostable, the elevated GC content of mammalian and archosaurian DNA has been hypothesized to be an adaptation to higher body temperatures. This hypothesis can be tested by examining the structure of isochores across the reptilian clade, which includes the archosaurs, testudines (turtles), and lepidosaurs (lizards and snakes), because reptiles exhibit diverse body sizes, metabolic rates, and patterns of thermoregulation. This study focuses on a comparative analysis of a new set of expressed genes of the red-eared slider turtle and orthologs of the turtle genes in mammalian (human, mouse, dog, and opossum), archosaurian (chicken and alligator), and amphibian (western clawed frog) genomes. EST (expressed sequence tag) data from a turtle cDNA library enriched for genes that have specialized functions (developmental genes) revealed that using the GC content of the third codon position to examine isochore structure requires careful consideration of the types of genes examined. The more highly expressed genes (e.g., housekeeping genes) are more likely to be GC-rich than are genes with specialized functions. However, the set of highly expressed turtle genes demonstrated that the turtle genome has a GC content that is intermediate between the GC-poor amphibians and the GC-rich mammals and archosaurs.
There was a strong correlation between the GC content of all turtle genes and the GC content of other vertebrate genes, with the slope of the line describing this relationship also indicating that the isochore structure of turtles is intermediate between that of amphibians and other amniotes. These data are consistent with some thermal hypotheses of isochore evolution, but we believe that the credible set of models for isochore evolution still includes a variety of models. These data expand the amount of genomic data available from reptiles upon which future studies of reptilian genomics can build.
Affiliation(s)
- Jena L Chojnowski
- Department of Zoology, University of Florida, 223 Bartram Hall, PO Box 118525, Gainesville, FL 32611, USA
12
Rispe C, Legeai F, Gauthier JP, Tagu D. Strong heterogeneity in nucleotidic composition and codon bias in the pea aphid (Acyrthosiphon pisum) shown by EST-based coding genome reconstruction. J Mol Evol 2007; 65:413-24. [PMID: 17928936 DOI: 10.1007/s00239-007-9023-y] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.4] [Received: 12/22/2006] [Revised: 06/08/2007] [Accepted: 07/02/2007] [Indexed: 10/22/2022]
Abstract
The aim of this study was to analyze patterns of nucleotidic composition and codon usage in the pea aphid genome (Acyrthosiphon pisum). A collection of 60,000 expressed sequence tags (ESTs) in the pea aphid has been used to automatically reconstruct 5809 coding sequences (CDSs), based on similarity with known proteins and on coding style recognition. Reconstructions were manually checked for ribosomal proteins, leading to a tentative reconstruction of the near-complete set of this category. Pea aphid coding sequences showed a shift toward AT (especially at the third codon position) compared to Drosophila homologues. Genes with a putative high level of expression (ribosomal and other genes with high EST support) remained more GC3-rich and had a distinct codon usage from bulk sequences: they exhibited a preference for C-ending codons and CGT (for arginine), which thus appeared optimal for translation. However, the discrimination was not as strong as in Drosophila, suggesting a reduced degree of translational selection. The space of variation in codon usage for A. pisum appeared to be larger than in Drosophila, with a substantial fraction of genes that remained GC3-rich. Some of those (in particular some structural proteins) also showed high levels of codon bias and a very strong preference for C-ending codons, which could be explained either by strong translational selection or by other mechanisms. Finally, genomic traces were analyzed to build 206 fragments containing a full CDS, which allowed studying the correlations between GC contents of coding and those of noncoding (flanking and introns) sequences.
Affiliation(s)
- Claude Rispe
- Institut National de la Recherche Agronomique, Domaine de la Motte, Unité Mixte de Recherche 1099 BIO3P, Le Rheu, France.
13
Chojnowski JL, Franklin J, Katsu Y, Iguchi T, Guillette LJ, Kimball RT, Braun EL. Patterns of Vertebrate Isochore Evolution Revealed by Comparison of Expressed Mammalian, Avian, and Crocodilian Genes. J Mol Evol 2007; 65:259-66. [PMID: 17674077 DOI: 10.1007/s00239-007-9003-2] [Citation(s) in RCA: 27] [Impact Index Per Article: 1.6] [Received: 10/19/2006] [Accepted: 05/18/2007] [Indexed: 10/23/2022]
Abstract
Vertebrate genomes are mosaics of isochores, defined as long (>100 kb) regions with relatively homogeneous within-region base composition. Birds and mammals have more GC-rich isochores than amphibians and fish, and the GC-rich isochores of birds and mammals have been suggested to be an adaptation to homeothermy. If this hypothesis is correct, all poikilothermic (cold-blooded) vertebrates, including the nonavian reptiles, are expected to lack a GC-rich isochore structure. Previous studies using various methods to examine isochore structure in crocodilians, turtles, and squamates have led to different conclusions. We collected more than 6000 expressed sequence tags (ESTs) from the American alligator to overcome sample size limitations suggested to be the fundamental problem in the previous reptilian studies. The alligator ESTs were assembled and aligned with their human, mouse, chicken, and western clawed frog orthologs, resulting in 366 alignments. Analyses of third-codon-position GC content provided conclusive evidence that the poikilothermic alligator has GC-rich isochores, like homeothermic birds and mammals. We placed these results in a theoretical framework able to unify available models of isochore evolution. The data collected for this study allowed us to reject the models that explain the evolution of GC content using changes in body temperature associated with the transition from poikilothermy to homeothermy. Falsification of these models places fundamental constraints upon the plausible pathways for the evolution of isochores.
Affiliation(s)
- Jena L Chojnowski
- Department of Zoology, University of Florida, Gainesville, FL 32611, USA.
14
Abstract
The Cyclostomata consists of the two orders Myxiniformes (hagfishes) and Petromyzoniformes (lampreys), and its monophyly has been unequivocally supported by recent molecular phylogenetic studies. Under this updated vertebrate phylogeny, we performed in silico evolutionary analyses using currently available cDNA sequences of cyclostomes. We first calculated the GC-content at four-fold degenerate sites (GC(4)), which revealed that an extremely high GC-content is shared by all the lamprey species we surveyed, whereas no striking pattern in GC-content was observed in any of the hagfish species surveyed. We then estimated the timing of diversification in cyclostome evolution using nucleotide and amino acid sequences. We obtained divergence times of 470-390 million years ago (Mya) in the Ordovician-Silurian-Devonian Periods for the interordinal split between Myxiniformes and Petromyzoniformes; 90-60 Mya in the Cretaceous-Tertiary Periods for the split between the two hagfish subfamilies, Myxininae and Eptatretinae; 280-220 Mya in the Permian-Triassic Periods for the split between the two lamprey subfamilies, Geotriinae and Petromyzoninae; and 30-10 Mya in the Tertiary Period for the split between the two lamprey genera, Petromyzon and Lethenteron. This evolutionary configuration indicates that Myxiniformes and Petromyzoniformes diverged shortly after the common ancestor of cyclostomes split from the future gnathostome lineage. Our results also suggest that intra-subfamilial diversification in hagfish and lamprey lineages (especially those distributed in the northern hemisphere) occurred in the Cretaceous or Tertiary Periods.
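GC content at four-fold degenerate sites, as used in this abstract, counts only third codon positions where any of the four bases encodes the same amino acid. A minimal sketch under the standard genetic code follows; the names `FOURFOLD_PREFIXES` and `gc4` are illustrative, and the sketch assumes an in-frame coding sequence rather than the cDNA alignments analysed in the study.

```python
BASES = "TCAG"
# Standard genetic code; codons ordered TTT, TTC, TTA, TTG, TCT, ... (T, C, A, G at each position).
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}

# Two-base prefixes whose four codons all encode the same amino acid.
FOURFOLD_PREFIXES = {
    a + b
    for a in BASES
    for b in BASES
    if len({CODON_TABLE[a + b + c] for c in BASES}) == 1
}


def gc4(cds):
    """GC fraction at the third position of four-fold degenerate codons."""
    cds = cds.upper()
    sites = [
        cds[i + 2]
        for i in range(0, len(cds) - 2, 3)
        if cds[i:i + 2] in FOURFOLD_PREFIXES
    ]
    if not sites:
        raise ValueError("no four-fold degenerate codons found")
    return sum(base in "GC" for base in sites) / len(sites)
```

Because substitutions at these sites never change the encoded amino acid, GC4 is often preferred over plain GC3 when the aim is to separate base composition from protein-level constraints.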
Affiliation(s)
- Shigehiro Kuraku
- Laboratory for Evolutionary Morphology, RIKEN Center for Developmental Biology, Kobe 650-0047, Japan.
15
Abstract
The base compositional correlations that hold among various coding and noncoding regions of the canine genome have been analysed. The distribution pattern of genes, on the basis of GC(3) composition, shows a wide range similar to that observed in human. However, the largest number of genes was observed in the range of 65-75% GC(3). The correlation between the canine coding DNA sequences and the different noncoding regions (introns and flanking regions) is significant, and in many cases the degree of correlation is similar to that in the human genome. We found that these correlations are not limited to GC content alone but also hold at the level of the frequency of individual bases. The present study suggests that canines belong to the predicted 'general mammalian pattern' of genome composition, along with humans.
Affiliation(s)
- Faustin Joy
- Bioinformatics Centre, Bose Institute, Kolkata, India
16
Kuraku S, Ishijima J, Nishida-Umehara C, Agata K, Kuratani S, Matsuda Y. cDNA-based gene mapping and GC3 profiling in the soft-shelled turtle suggest a chromosomal size-dependent GC bias shared by sauropsids. Chromosome Res 2006; 14:187-202. [PMID: 16544192 DOI: 10.1007/s10577-006-1035-8] [Citation(s) in RCA: 38] [Impact Index Per Article: 2.1] [Received: 11/03/2005] [Accepted: 01/10/2006] [Indexed: 10/24/2022]
Abstract
Mammalian and avian genomes comprise several classes of chromosomal segments that vary dramatically in GC-content. Especially in chicken, microchromosomes exhibit a higher GC-content and a higher gene density than macrochromosomes. To understand the evolutionary history of the intra-genome GC heterogeneity in amniotes, it is necessary to examine the equivalence of this GC heterogeneity at the nucleotide level between these animals including reptiles, from which birds diverged. We isolated cDNAs for 39 protein-coding genes from the Chinese soft-shelled turtle, Pelodiscus sinensis, and performed chromosome mapping of 31 genes. The GC-content of exonic third positions (GC3) of P. sinensis genes showed a heterogeneous distribution, and exhibited a significant positive correlation with that of chicken and human orthologs, indicating that the last common ancestor of extant amniotes had already established a GC-compartmentalized genomic structure. Furthermore, chromosome mapping in P. sinensis revealed that microchromosomes tend to contain more GC-rich genes than GC-poor genes, as in chicken. These results illustrate two modes of genome evolution in amniotes: mammals elaborated the genomic configuration in which GC-rich and GC-poor regions coexist in individual chromosomes, whereas sauropsids (reptiles and birds) refined the chromosomal size-dependent GC compartmentalization in which GC-rich genomic fractions tend to be confined to microchromosomes.
Affiliation(s)
- Shigehiro Kuraku
- Laboratory for Evolutionary Morphology, RIKEN Center for Developmental Biology, 2-2-3 Minatojima-minamimachi, Chuo-ku, Kobe, 650-0047, Japan.
17
Abstract
A sequence analysis of the genomes of Anopheles gambiae and Drosophila melanogaster reveals that Anopheles DNA is more heterogeneous and GC-richer than Drosophila DNA. The gene concentration across the Anopheles genome is characterized by low levels in the GC-poor part of the genome and a 3-fold increase in the GC-richest part; this gene density gradient is approximately half that of Drosophila. GC levels of introns and flanking sequences are correlated with GC(3) values (GC levels of third codon positions) of the corresponding genes with slopes much lower than unity; in other words, most introns and intergenic sequences are less GC-rich than the corresponding GC(3) values. These findings, which describe a compositional shift within Diptera, are of interest because of their parallels with the well-studied major shift in vertebrates.
Affiliation(s)
- Kamel Jabbari
- Laboratoire de Génétique Moléculaire, Institut Jacques Monod, 2 Place Jussieu, F-75005 Paris, France
18
Abstract
The existence of a well conserved linear relationship between GC levels of genes' second and third codon positions (GC2, GC3) prompted us to focus on the landscape, or joint distribution, spanned by these two variables. In human, well curated coding sequences now cover at least 15%-30% of the estimated total gene set. Our analysis of the landscape defined by this gene set revealed not only the well documented linear crest, but also the presence of several peaks and valleys along that crest, a property that was also indicated in two other warm-blooded vertebrates represented by large gene databases, that is, mouse and chicken. GC2 is the sum of eight amino acid frequencies, whereas GC3 is linearly related to the GC level of the chromosomal region containing the gene. The landscapes therefore portray relations between proteins and the DNA environments of the genes that encode them.
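The statement that GC2 is the sum of eight amino acid frequencies follows directly from the standard genetic code: every sense codon with G or C at the second position encodes one of eight amino acids, and all codons for those amino acids have G or C there. The sketch below derives that set and checks the identity on a toy sequence; the function names and the example sequence are illustrative assumptions, and stop codons are excluded because TGA also carries G at the second position.

```python
BASES = "TCAG"
# Standard genetic code; codons ordered TTT, TTC, TTA, TTG, TCT, ... (T, C, A, G at each position).
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}

# Amino acids encoded by codons with G or C at the second position.
GC2_AMINO_ACIDS = {
    aa for codon, aa in CODON_TABLE.items() if aa != "*" and codon[1] in "GC"
}


def translate(cds):
    """Translate an in-frame, uppercase CDS, stopping before the first stop codon."""
    protein = []
    for i in range(0, len(cds) - 2, 3):
        aa = CODON_TABLE[cds[i:i + 3]]
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)


def gc2_from_codons(cds):
    """GC fraction at second codon positions, counted over sense codons only."""
    codons = [cds[i:i + 3] for i in range(0, len(cds) - 2, 3)]
    sense = [codon for codon in codons if CODON_TABLE[codon] != "*"]
    return sum(codon[1] in "GC" for codon in sense) / len(sense)


def gc2_from_protein(protein):
    """The same quantity computed as a sum of amino acid frequencies."""
    return sum(aa in GC2_AMINO_ACIDS for aa in protein) / len(protein)


# Hypothetical toy sequence: Met-Ala-Ser-Ser-Lys-Arg followed by a stop codon.
example_cds = "ATGGCTTCAAGTAAACGGTAA"
```

For `example_cds`, the nucleotide-level and protein-level calculations agree, which illustrates why GC2 reflects the amino acid composition of the protein while GC3 can vary freely through synonymous changes.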
Affiliation(s)
- Stéphane Cruveiller
- Laboratorio di Evoluzione Molecolare, Stazione Zoologica Anton Dohrn, 80121 Napoli, Italy
19
Abstract
In this study, the codon usage bias of all experimentally known genes of Lactococcus lactis has been analyzed. Since Lactococcus lactis is an AT-rich organism, A and/or T are expected to occur at the third position of codons, and detailed analysis of overall codon usage data indicates that A- and/or T-ending codons are predominant in this organism. However, multivariate statistical analyses based both on codon counts and on relative synonymous codon usage (RSCU) show that a large number of genes that are presumed to be highly expressed are clustered at one end of the first major axis, while the majority of the putatively lowly expressed genes are clustered at the other end of the first major axis. It was observed that C- and T-ending codons are significantly more frequent in the highly expressed genes than in the lowly expressed genes, and also that C-ending codons are predominant in the duets of highly expressed genes, whereas T-ending codons are abundant in the quartets. The abundance of C- and T-ending codons in the highly expressed genes suggests that, besides compositional biases, translational selection is also operating in shaping the codon usage variation among the genes in this organism, as observed in other compositionally skewed organisms. The second major axis generated by correspondence analysis on simple codon counts differentiates the genes into two distinct groups according to their hydrophobicity values, but the same analysis computed with relative synonymous codon usage values could not discriminate the genes according to the hydropathy values. This suggests that amino acid composition exerts constraints on codon usage in this organism. On the other hand, the second major axis produced by correspondence analysis on RSCU values differentiates the genes into two groups according to the synonymous codon usage for cysteine residues (the rarest amino acid in this organism), which is nothing but an artifactual effect induced by the RSCU values.
Other factors, such as the length of the genes and the positions of the genes in the leading and lagging strands of replication, have practically no influence on the codon usage variation among the genes in this organism.
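Relative synonymous codon usage (RSCU), on which part of the analysis above is based, is commonly defined as the observed count of a codon divided by the count expected if all synonymous codons for that amino acid were used equally. The following is a minimal sketch under that common definition and the standard genetic code; the function name `rscu` is illustrative and the correspondence analysis itself is not reproduced.

```python
from collections import Counter, defaultdict

BASES = "TCAG"
# Standard genetic code; codons ordered TTT, TTC, TTA, TTG, TCT, ... (T, C, A, G at each position).
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}

# Group sense codons by the amino acid they encode.
SYNONYMS = defaultdict(list)
for codon, aa in CODON_TABLE.items():
    if aa != "*":
        SYNONYMS[aa].append(codon)


def rscu(cds):
    """RSCU per codon; amino acids absent from the sequence are skipped."""
    cds = cds.upper()
    counts = Counter(cds[i:i + 3] for i in range(0, len(cds) - 2, 3))
    values = {}
    for aa, codons in SYNONYMS.items():
        total = sum(counts[codon] for codon in codons)
        if total == 0:
            continue
        for codon in codons:
            # Observed count divided by the count expected under equal usage.
            values[codon] = counts[codon] * len(codons) / total
    return values
```

A gene containing a single cysteine codon, for example, gets an RSCU of 2 for that codon and 0 for the alternative, which shows how rare amino acids can produce extreme RSCU values of the kind the abstract describes as an artifactual effect.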
Affiliation(s)
- S K Gupta
- Bioinformatics Centre Bose Institute, P 1/12, CIT Scheme VII M, Kolkata 700 054, India
20
Abstract
This study presents compelling evidence that recombination significantly increases the silent GC content of a genome in a selectively neutral manner, resulting in a highly significant positive correlation between recombination and "GC3s" in the yeast Saccharomyces cerevisiae. Neither selection nor mutation can explain this relationship. A highly significant GC-biased mismatch repair system is documented for the first time in any member of the Kingdom Fungi. Much of the variation in the GC3s within yeast appears to result from GC-biased gene conversion. Evidence suggests that GC-biased mismatch repair exists in numerous organisms spanning six kingdoms. This transkingdom GC mismatch repair bias may have evolved in response to a ubiquitous AT mutational bias. A significant positive correlation between recombination and GC content is found in many of these same organisms, suggesting that the processes influencing the evolution of the yeast genome may be a general phenomenon. Nonrecombining regions of the genome and nonrecombining genomes would not be subject to this type of molecular drive. It is suggested that the low GC content characteristic of many nonrecombining genomes may be the result of three processes (1) a prevailing AT mutational bias, (2) random fixation of the most common types of mutation, and (3) the absence of the GC-biased gene conversion which, in recombining organisms, permits the reversal of the most common types of mutation. A model is proposed to explain the observation that introns, intergenic regions, and pseudogenes typically have lower GC content than the silent sites of corresponding open reading frames. This model is based on the observation that the greater the heterology between two sequences, the less likely it is that recombination will occur between them. 
According to this "Constraint" hypothesis, the formation and propagation of heteroduplex DNA is expected to occur, on average, more frequently within conserved coding and regulatory regions of the genome. In organisms possessing GC-biased mismatch repair, this would enhance the GC content of these regions through biased gene conversion. These findings have a number of important implications for the way we view genome evolution and suggest a new model for the evolution of sex.
Affiliation(s)
- John A Birdsell
- Department of Ecology and Evolutionary Biology, University of Arizona, Tucson, Arizona 85121, USA.
|
21
|
Abstract
Within-intron differences in the correlation with the base composition of the adjacent exons were studied in the genomes of 34 species. For this purpose, GC-percent was determined for 50-bp segments taken at both intron margins and in the internal part of the intron. In certain genomes (homeotherms, cereals), the coefficient of correlation with the GC-percent of the adjacent exon was significantly higher for the intron margin than for the internal part of the intron. Only part of this difference can be explained by unequal probability of insertion of transposable elements. Those multicellular organisms that have little or no within-intron difference in correlation with the adjacent exons (anamniotes, invertebrates, dicots) show a higher local compositional heterogeneity (a greater exon/intron contrast in GC content). These results are evidence against mutational bias as an explanation for compositional genome heterogeneity. Thus, in genomes with a high global heterogeneity there seems to be a selective force for compliance of intron base composition with the adjacent exons. This force is stronger in those parts of the intron that are closer to exons. In addition, the previously found positive general correlation between genome size and average intron length was confirmed with a much larger dataset. However, within separate phylogenetic groups this rule can be broken, as occurs in the cereals (family Poaceae), where a negative correlation was found.
Affiliation(s)
- A E Vinogradov
- Institute of Cytology, Russian Academy of Sciences, Tikhoretsky Avenue 4, 194064, St. Petersburg, Russia.
|
22
|
Abstract
Codon usage bias of Entamoeba histolytica, a protozoan parasite, was investigated using the available DNA sequence data. Entamoeba histolytica, having an AT-rich genome, is expected to have A and/or T at the third position of codons. Overall codon usage data analysis indicates that A- and/or T-ending codons are strongly favored in the coding regions of this organism. However, multivariate statistical analysis suggests that there is a single major trend in codon usage variation among the genes. The genes that are presumed to be highly expressed are clustered at one end, while the majority of the putatively lowly expressed genes are clustered at the other end. The codon usage pattern is distinctly different in these two sets of genes. C-ending codons are significantly more frequent in the putatively highly expressed genes, suggesting that C-ending codons are translationally optimal in this organism. In the putatively lowly expressed genes, A- and/or T-ending codons are predominant, which suggests that compositional constraints play the major role in shaping codon usage variation among the lowly expressed genes. These results suggest that both mutational bias and translational selection are operational in the codon usage variation in this organism.
Affiliation(s)
- T C Ghosh
- Distributed Information Centre, Bose Institute, P 1/12, C.I.T. Scheme, VII M, 700 054, Calcutta, India.
|
23
|
Abstract
The nuclear genomes of vertebrates are mosaics of isochores, very long stretches (>>300 kb) of DNA that are homogeneous in base composition and are compositionally correlated with the coding sequences that they embed. Isochores can be partitioned into a small number of families that cover a range of GC levels (GC is the molar ratio of guanine+cytosine in DNA), which is narrow in cold-blooded vertebrates but broad in warm-blooded vertebrates. This difference is essentially due to the fact that the GC-richest 10-15% of the genomes of the ancestors of mammals and birds underwent two independent compositional transitions characterized by strong increases in GC levels. The similarity of isochore patterns across mammalian orders, on the one hand, and across avian orders, on the other, indicates that these higher GC levels were then maintained, at least since the appearance of ancestors of warm-blooded vertebrates. After a brief review of our current knowledge on the organization of the vertebrate genome, evidence will be presented here in favor of the idea that the generation and maintenance of the GC-richest isochores in the genomes of warm-blooded vertebrates were due to natural selection.
Affiliation(s)
- G Bernardi
- Laboratorio di Evoluzione Molecolare, Stazione Zoologica Anton Dohrn, Napoli, Italy.
|