1
|
The Patterns of Codon Usage between Chordates and Arthropods are Different but Co-evolving with Mutational Biases. Mol Biol Evol 2024; 41:msae080. [PMID: 38667829 PMCID: PMC11108087 DOI: 10.1093/molbev/msae080] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/05/2023] [Revised: 03/22/2024] [Accepted: 04/15/2024] [Indexed: 05/22/2024] Open
Abstract
Different frequencies amongst codons that encode the same amino acid (i.e. synonymous codons) have been observed in multiple species. Studies focused on uncovering the forces that drive such codon usage showed that a combined effect of mutational biases and translational selection works to produce different frequencies of synonymous codons. However, only a few studies have been able to measure and distinguish between these forces, which may leave similar traces on the coding regions. Here, we have developed a codon model that allows the disentangling of mutation, selection on amino acids and synonymous codons, and GC-biased gene conversion (gBGC), which we applied to an extensive dataset of 415 chordates and 191 arthropods. We found that chordates need 15 more synonymous codon categories than arthropods to explain the empirical codon frequencies, which suggests that the extent of codon usage can vary greatly between animal phyla. Moreover, methylation at CpG sites seems to partially explain these patterns of codon usage in chordates but not in arthropods. Despite the differences between the two phyla, our findings demonstrate that in both, GC-rich codons are disfavored when mutations are GC-biased, and the opposite is true when mutations are AT-biased. This indicates that selection on the genomic coding regions might act primarily to stabilize their GC/AT content on a genome-wide level. Our study shows that the degree of synonymous codon usage varies considerably among animals, but is likely governed by a common underlying dynamic.
|
2
|
Genome-wide analysis of codon usage in sesame (Sesamum indicum L.). Heliyon 2022; 8:e08687. [PMID: 35106386 PMCID: PMC8789531 DOI: 10.1016/j.heliyon.2021.e08687] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 10/15/2021] [Revised: 11/20/2021] [Accepted: 12/24/2021] [Indexed: 10/28/2022] Open
Abstract
Sesamum indicum is an ancient oil crop grown in tropical and subtropical areas of the world. We have analyzed 23,538 coding sequences (CDS) of S. indicum to understand the factors shaping codon usage in this important oil crop plant. We identified eleven highly preferred codons in S. indicum that have AT-endings. The slope of the neutrality plot was less than one, while the effective number of codons (ENC) plot showed a distribution above and below the standard curve. There is a significant relationship between protein length and relative synonymous codon usage (RSCU) at the primary axis, while there is a weak correlation between protein length and Nc values. Correspondence analysis conducted on RSCU values differentiated CDS based on their GC content and their characteristic features and showed a discrete distribution. Moreover, by determining codon usage, we found that the majority of the lignan biosynthesis related genes showed a weaker codon usage bias. These results provide insights into understanding codon evolution in sesame.
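For readers unfamiliar with the RSCU metric mentioned in this abstract, the following is a minimal sketch of the usual definition: the observed count of a codon divided by the count expected if all synonymous codons for that amino acid were used equally. It is not the authors' pipeline. The function names, the two-family codon table, and the toy sequence are illustrative assumptions.

```python
from collections import Counter

# Hypothetical, deliberately tiny table: two synonymous codon families
# from the standard genetic code. A real analysis would cover all families.
SYNONYMS = {
    "Lys": ["AAA", "AAG"],
    "Ala": ["GCT", "GCC", "GCA", "GCG"],
}


def codon_counts(cds):
    """Count in-frame codons, ignoring any trailing partial codon."""
    cds = cds.upper()
    usable = len(cds) - len(cds) % 3
    return Counter(cds[i:i + 3] for i in range(0, usable, 3))


def rscu(cds, families=SYNONYMS):
    """Return {codon: RSCU} for every codon in the supplied families.

    RSCU = observed count / (family total / family size).
    Families that do not occur in the sequence are skipped.
    """
    counts = codon_counts(cds)
    values = {}
    for codons in families.values():
        total = sum(counts[c] for c in codons)
        if total == 0:
            continue
        expected = total / len(codons)
        for c in codons:
            values[c] = counts[c] / expected
    return values


# Toy sequence (an assumption): AAA AAA AAG GCT GCT GCT GCA
toy_cds = "AAAAAAAAGGCTGCTGCTGCA"
for codon, value in sorted(rscu(toy_cds).items()):
    print(codon, round(value, 2))
```

An RSCU above 1 means a codon is used more often than expected under equal usage, and a value below 1 means it is used less often.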
|
3
|
The Adenine/Thymine Deleterious Selection Model for GC Content Evolution at the Third Codon Position of the Histone Genes in Drosophila. Genes (Basel) 2021; 12:genes12050721. [PMID: 34065869 PMCID: PMC8150595 DOI: 10.3390/genes12050721] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/29/2021] [Revised: 05/07/2021] [Accepted: 05/07/2021] [Indexed: 12/02/2022] Open
Abstract
The evolution of the GC (guanine cytosine) content at the third codon position of the histone genes (H1, H2A, H2B, H3, H4, H2AvD, H3.3A, H3.3B, and H4r) in 12 or more Drosophila species is reviewed. For explaining the evolution of the GC content at the third codon position of the genes, a model assuming selection with a deleterious effect for adenine/thymine and a size effect is presented. The applicability of the model to whole-genome genes is also discussed.
|
4
|
Analysis of synonymous codon usage of transcriptome database in Rheum palmatum. PeerJ 2021; 9:e10450. [PMID: 33505783 PMCID: PMC7789865 DOI: 10.7717/peerj.10450] [Citation(s) in RCA: 10] [Impact Index Per Article: 3.3] [Received: 06/26/2020] [Accepted: 11/09/2020] [Indexed: 12/21/2022] Open
Abstract
Background: Rheum palmatum is an endangered and important medicinal plant in Asian countries, especially in China. However, little is known about codon usage bias in R. palmatum CDSs. In this project, codon usage bias was determined from 2,626 predicted CDSs in the R. palmatum transcriptome. Methods: All codon usage bias parameters and nucleotide compositions were calculated with a Python script, Codon W, DNA Star, and CUSP of EMBOSS. Results: The average GC and GC3 contents are 46.57% and 46.6%, respectively. These results suggest that R. palmatum genes contain slightly more AT than GC and that codons in R. palmatum genes preferentially end with A/T. We concluded that codon bias in R. palmatum is affected by nucleotide composition, mutation pressure, natural selection, and gene expression levels, with mutation pressure being the prominent factor. In addition, we identified 28 optimal codons, most of which end with A or U. This project offers important information for further studies on enhancing gene expression through codon optimization in heterologous expression systems and on predicting the genetic and evolutionary mechanisms in R. palmatum.
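The abstract reports GC and GC3 values computed in part with a Python script, but that script is not shown here. The sketch below only illustrates how the two quantities are commonly defined: GC is the fraction of G or C bases in a sequence, and GC3 is the same fraction restricted to third codon positions. The function names and the toy sequence are assumptions. As a simplification, no codons are excluded, although some tools omit Met, Trp, and stop codons when computing GC3.

```python
def gc_content(seq):
    """Fraction of G or C bases in seq (0.0 for an empty sequence)."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return sum(base in "GC" for base in seq) / len(seq)


def gc3_content(cds):
    """GC fraction at third codon positions of an in-frame CDS.

    Simplification (assumption): every complete codon is counted,
    including Met, Trp, and stop codons.
    """
    cds = cds.upper()
    usable = len(cds) - len(cds) % 3  # drop any trailing partial codon
    third_positions = cds[2:usable:3]
    return gc_content(third_positions)


# Toy sequence (an assumption): ATG GCC AAA TTT
toy_cds = "ATGGCCAAATTT"
print("GC :", round(gc_content(toy_cds), 3))
print("GC3:", round(gc3_content(toy_cds), 3))
```

Averaging these two values over all CDSs in a dataset gives summary figures of the kind quoted in the abstract.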
|
5
|
Abstract
Choanoflagellates and filastereans are the closest known single celled relatives of Metazoa within Holozoa and provide insight into how animals evolved from their unicellular ancestors. Codon usage bias has been extensively studied in metazoans, with both natural selection and mutation pressure playing important roles in different species. The disparate nature of metazoan codon usage patterns prevents the reconstruction of ancestral traits. However, traits conserved across holozoan protists highlight characteristics in the unicellular ancestors of Metazoa. Presented here are the patterns of codon usage in the choanoflagellates Monosiga brevicollis and Salpingoeca rosetta, as well as the filasterean Capsaspora owczarzaki. Codon usage is shown to be remarkably conserved. Highly biased genes preferentially use GC-ending codons, however there is limited evidence this is driven by local mutation pressure. The analyses presented provide strong evidence that natural selection, for both translational accuracy and efficiency, dominates codon usage bias in holozoan protists. In particular, the signature of selection for translational accuracy can be detected even in the most weakly biased genes. Biased codon usage is shown to have coevolved with the tRNA species, with optimal codons showing complementary binding to the highest copy number tRNA genes. Furthermore, tRNA modification is shown to be a common feature for amino acids with higher levels of degeneracy and highly biased genes show a strong preference for using modified tRNAs in translation. The translationally optimal codons defined here will be of benefit to future transgenics work in holozoan protists, as their use should maximise protein yields from edited transgenes.
|
6
|
Comparative genomics of Bacteria commonly identified in the built environment. BMC Genomics 2019; 20:92. [PMID: 30691394 PMCID: PMC6350394 DOI: 10.1186/s12864-018-5389-z] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.4] [Received: 08/13/2018] [Accepted: 12/18/2018] [Indexed: 02/01/2023] Open
Abstract
BACKGROUND The microbial community of the built environment (BE) can impact the lives of people and has been studied for a variety of indoor, outdoor, underground, and extreme locations. Thus far, these microorganisms have mainly been investigated by culture-based methods or amplicon sequencing. However, both methods have limitations, complicating multi-study comparisons and limiting the knowledge gained regarding in-situ microbial lifestyles. A greater understanding of BE microorganisms can be achieved through basic information derived from the complete genome. Here, we investigate the level of diversity and genomic features (genome size, GC content, replication strand skew, and codon usage bias) from complete genomes of bacteria commonly identified in the BE, providing a first step towards understanding these bacterial lifestyles. RESULTS Here, we selected bacterial genera commonly identified in the BE (or "Common BE genomes") and compared them against other prokaryotic genera ("Other genomes"). The "Common BE genomes" were identified in various climates and in indoor, outdoor, underground, or extreme built environments. The diversity level of the 16S rRNA varied greatly between genera. The genome size, GC content and GC skew strength of the "Common BE genomes" were statistically larger than those of the "Other genomes" but were not practically significant. In contrast, the strength of selected codon usage bias (S value) was statistically higher with a large effect size in the "Common BE genomes" compared to the "Other genomes." CONCLUSION Of the four genomic features tested, the S value could play a more important role in understanding the lifestyles of bacteria living in the BE. This parameter could be indicative of bacterial growth rates, gene expression, and other factors, potentially affected by BE growth conditions (e.g., temperature, humidity, and nutrients). 
However, further experimental evidence, species-level BE studies, and classification by BE location are needed to define the relationship between genomic features and the lifestyles of BE bacteria more robustly.
|
7
|
Whole genome analysis of codon usage in Echinococcus. Mol Biochem Parasitol 2018; 225:54-66. [DOI: 10.1016/j.molbiopara.2018.08.001] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.8] [Received: 06/16/2018] [Revised: 07/20/2018] [Accepted: 08/01/2018] [Indexed: 01/15/2023]
|
8
|
Analytical Biases Associated with GC-Content in Molecular Evolution. Front Genet 2017; 8:16. [PMID: 28261263 PMCID: PMC5309256 DOI: 10.3389/fgene.2017.00016] [Citation(s) in RCA: 37] [Impact Index Per Article: 5.3] [Received: 12/01/2016] [Accepted: 02/06/2017] [Indexed: 12/19/2022] Open
Abstract
Molecular evolution is being revolutionized by high-throughput sequencing, which has increased the amount of genome-wide data available for multiple species. While base composition summarized by GC-content is one of the first metrics measured in genomes, its genomic distribution is a frequently neglected feature in downstream analyses based on DNA sequence comparisons. Here, we show how base composition heterogeneity among loci and taxa can bias common molecular evolution analyses such as phylogenetic tree reconstruction, detection of natural selection and estimation of codon usage. We then discuss the biological, technical and methodological causes of these GC-associated biases and suggest approaches to overcome them.
|
9
|
New views on the selection acting on genetic polymorphism in central metabolic genes. Ann N Y Acad Sci 2016; 1389:108-123. [PMID: 27859384 DOI: 10.1111/nyas.13285] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.4] [Received: 06/09/2016] [Revised: 09/20/2016] [Accepted: 09/29/2016] [Indexed: 12/14/2022]
Abstract
Studies of the polymorphism of central metabolic genes as a source of fitness variation in natural populations date back to the discovery of allozymes in the 1960s. The unique features of these genes and their enzymes and our knowledge base greatly facilitate the systems-level study of this group. The expectation that pathway flux control is central to understanding the molecular evolution of genes is discussed, as well as studies that attempt to place gene-specific molecular evolution and polymorphism into a context of pathway and network architecture. There is an increasingly complex picture of the metabolic genes assuming additional roles beyond their textbook anabolic and catabolic reactions. In particular, this review emphasizes the potential role of these genes as part of the energy-sensing machinery. It is underscored that the concentrations of key cellular metabolites reflect cellular energy status and nutritional input. These metabolites are the top-down messengers that set signaling through the pathways involved in energy economy. I propose that the polymorphisms in central metabolic genes shift metabolite concentrations and in that fashion act as genetic modifiers of the energy-state coupling to the transcriptional networks that affect physiological trade-offs with significant fitness consequences.
|
10
|
Evolution of GC content in the histone gene repeating units from Drosophila lutescens, D. takahashii and D. pseudoobscura. Genes Genet Syst 2016; 91:27-36. [PMID: 27021916 DOI: 10.1266/ggs.15-00018] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Indexed: 11/23/2022] Open
Abstract
A subset of histone genes (H1, H2A, H2B and H4), which are encoded along with H3 within repeating units, were analyzed in Drosophila lutescens, D. takahashii and D. pseudoobscura to investigate the evolutionary mechanisms influencing this multigene family and its GC content. Nucleotide divergence among species was more marked in the less functional regions. A strong inverse relationship was observed between the extent of evolutionary divergence and GC content within the repeating units; this finding indicated that the functional constraint on a region must be associated with both divergence and GC content. The GC content at 3rd codon positions in the histone genes from D. lutescens and D. takahashii was higher than that from D. melanogaster, while that from D. pseudoobscura was similar. These evolutionary patterns were similar to those of H3 gene regions. Based on these findings, we propose that the evolutionary mechanisms governing nucleotide content at 3rd codon positions tend to eliminate A and T nucleotides more frequently than G and C nucleotides. These changes might be the consequence of negative selection and would result in GC-rich 3rd codon positions. In addition, interspecific differences in GC content, which exhibited the same pattern for all histone genes, could be explained by different selection efficiencies that result from changes in population size.
|
11
|
The Selective Advantage of Synonymous Codon Usage Bias in Salmonella. PLoS Genet 2016; 12:e1005926. [PMID: 26963725 PMCID: PMC4786093 DOI: 10.1371/journal.pgen.1005926] [Citation(s) in RCA: 47] [Impact Index Per Article: 5.9] [Received: 09/15/2015] [Accepted: 02/18/2016] [Indexed: 11/18/2022] Open
Abstract
The genetic code in mRNA is redundant, with 61 sense codons translated into 20 different amino acids. Individual amino acids are encoded by up to six different codons but within codon families some are used more frequently than others. This phenomenon is referred to as synonymous codon usage bias. The genomes of free-living unicellular organisms such as bacteria have an extreme codon usage bias and the degree of bias differs between genes within the same genome. The strong positive correlation between codon usage bias and gene expression levels in many microorganisms is attributed to selection for translational efficiency. However, this putative selective advantage has never been measured in bacteria and theoretical estimates vary widely. By systematically exchanging optimal codons for synonymous codons in the tuf genes we quantified the selective advantage of biased codon usage in highly expressed genes to be in the range 0.2-4.2 × 10⁻⁴ per codon per generation. These data quantify for the first time the potential for selection on synonymous codon choice to drive genome-wide sequence evolution in bacteria, and in particular to optimize the sequences of highly expressed genes. This quantification may have predictive applications in the design of synthetic genes and for heterologous gene expression in biotechnology.
|
12
|
Analysis of synonymous codon usage patterns in sixty-four different bivalve species. PeerJ 2015; 3:e1520. [PMID: 26713259 PMCID: PMC4690358 DOI: 10.7717/peerj.1520] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.1] [Received: 04/06/2015] [Accepted: 11/28/2015] [Indexed: 12/21/2022] Open
Abstract
Synonymous codon usage bias (CUB) is defined as the non-random usage of codons encoding the same amino acid across different genomes. This phenomenon is common to all organisms and the real weight of the many factors involved in its shaping still remains to be fully determined. So far, relatively little attention has been paid to the analysis of CUB in bivalve mollusks due to the limited genomic data available. Taking advantage of the massive sequence data generated from next generation sequencing projects, we explored codon preferences in 64 different species pertaining to the six major evolutionary lineages in Bivalvia. We detected remarkable differences across species, which are only partially dependent on phylogeny. While the intensity of CUB is mild in most organisms, a heterogeneous group of species (including Arcida and Mytilida, among others) display higher bias and a strong preference for AT-ending codons. We show that the relative strength and direction of mutational bias, selection for translational efficiency and for translational accuracy contribute to the establishment of synonymous codon usage in bivalves. Although many aspects underlying bivalve CUB still remain obscure, we provide for the first time an overview of this phenomenon in this large, commercially and environmentally important, class of marine invertebrates.
|
13
|
Computational methods of identification of pseudogenes based on functionality: entropy and GC content. Methods Mol Biol 2014; 1167:41-62. [PMID: 24823770 DOI: 10.1007/978-1-4939-0835-6_4] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.2] [Indexed: 06/03/2023]
Abstract
Spectral entropy and GC content analyses reveal comprehensive structural features of DNA sequences. To illustrate the significance of these features, we analyze the β-esterase gene cluster, including the Est-6 gene and the ψEst-6 putative pseudogene, in seven species of the Drosophila melanogaster subgroup. The spectral entropies show distinctly lower structural ordering for ψEst-6 than for Est-6 in all species studied. However, entropy accumulation is not a completely random process for either gene and is shown to be nucleotide dependent. Furthermore, GC content in synonymous positions is uniformly higher in Est-6 than in ψEst-6, in agreement with the reduced GC content generally observed in pseudogenes and nonfunctional sequences. The observed differences in entropy and GC content reflect an evolutionary shift associated with the process of pseudogenization and subsequent functional divergence of ψEst-6 and Est-6 after the duplication event. The data obtained show the relevance and significance of entropy and GC content analyses for pseudogene identification and for the comparative study of gene-pseudogene evolution.
|
14
|
Codon usage bias: causative factors, quantification methods and genome-wide patterns: with emphasis on insect genomes. Biol Rev Camb Philos Soc 2012; 88:49-61. [PMID: 22889422 DOI: 10.1111/j.1469-185x.2012.00242.x] [Citation(s) in RCA: 124] [Impact Index Per Article: 10.3] [Indexed: 01/27/2023]
Abstract
Codon usage bias refers to the phenomenon where specific codons are used more often than other synonymous codons during translation of genes, the extent of which varies within and among species. Molecular evolutionary investigations suggest that codon bias is manifested as a result of balance between mutational and translational selection of such genes and that this phenomenon is widespread across species and may contribute to genome evolution in a significant manner. With the advent of whole-genome sequencing of numerous species, both prokaryotes and eukaryotes, genome-wide patterns of codon bias are emerging in different organisms. Various factors such as expression level, GC content, recombination rates, RNA stability, codon position, gene length and others (including environmental stress and population size) can influence codon usage bias within and among species. Moreover, there has been a continuous quest towards developing new concepts and tools to measure the extent of codon usage bias of genes. In this review, we outline the fundamental concepts of evolution of the genetic code, discuss various factors that may influence biased usage of synonymous codons and then outline different principles and methods of measurement of codon usage bias. Finally, we discuss selected studies performed using whole-genome sequences of different insect species to show how codon bias patterns vary within and among genomes. We conclude with generalized remarks on specific emerging aspects of codon bias studies and highlight the recent explosion of genome-sequencing efforts on arthropods (such as twelve Drosophila species, species of ants, honeybee, Nasonia and Anopheles mosquitoes as well as the recent launch of a genome-sequencing project involving 5000 insects and other arthropods) that may help us to understand better the evolution of codon bias and its biological significance.
|
15
|
Studying patterns of recent evolution at synonymous sites and intronic sites in Drosophila melanogaster. J Mol Evol 2009; 70:116-28. [PMID: 20041239 DOI: 10.1007/s00239-009-9314-6] [Citation(s) in RCA: 44] [Impact Index Per Article: 2.9] [Received: 08/06/2009] [Accepted: 12/07/2009] [Indexed: 10/20/2022]
Abstract
Most previous studies of the evolution of codon usage bias (CUB) and intronic GC content (iGC) in Drosophila melanogaster were based on between-species comparisons, reflecting long-term evolutionary events. However, a complete picture of the evolution of CUB and iGC cannot be drawn without knowledge of their more recent evolutionary history. Here, we used a polymorphism dataset collected from Zimbabwe to study patterns of the recent evolution of CUB and iGC. Analyzing coding and intronic data jointly with a model which can simultaneously estimate selection, mutational, and demographic parameters, we have found that: (1) natural selection is probably acting on synonymous codons; (2) a constant population size model seems to be sufficient to explain most of the observed synonymous polymorphism patterns; (3) GC is favored over AT in introns. In agreement with the long-term evolutionary patterns, ongoing selection acting on X-linked synonymous codons is stronger than that acting on autosomal codons. The selective differences between preferred and unpreferred codons tend to be greater than the differences between GC and AT in introns, suggesting that natural selection, not just biased gene conversion, may have influenced the evolution of CUB. Interestingly, evidence for non-equilibrium evolution comes exclusively from the intronic data. However, three different models, an equilibrium model with two classes of selected sites and two non-equilibrium models with changes in either population size or mutational parameters, fit the intronic data equally well. These results show that using inadequate selection (or demographic) models can result in incorrect estimates of demographic (or selection) parameters.
|
16
|
Measuring and detecting molecular adaptation in codon usage against nonsense errors during protein translation. Genetics 2009; 183:1493-505. [PMID: 19822731 PMCID: PMC2787434 DOI: 10.1534/genetics.109.108209] [Citation(s) in RCA: 30] [Impact Index Per Article: 2.0] [Received: 08/07/2009] [Accepted: 09/26/2009] [Indexed: 11/18/2022] Open
Abstract
Codon usage bias (CUB) has been documented across a wide range of taxa and is the subject of numerous studies. While most explanations of CUB invoke some type of natural selection, most measures of CUB adaptation are heuristically defined. In contrast, we present a novel and mechanistic method for defining and contextualizing CUB adaptation to reduce the cost of nonsense errors during protein translation. Using a model of protein translation, we develop a general approach for measuring the protein production cost in the face of nonsense errors of a given allele as well as the mean and variance of these costs across its coding synonyms. We then use these results to define the nonsense error adaptation index (NAI) of the allele or a contiguous subset thereof. Conceptually, the NAI value of an allele is a relative measure of its elevation on a specific and well-defined adaptive landscape. To illustrate its utility, we calculate NAI values for the entire coding sequence and across a set of nonoverlapping windows for each gene in the Saccharomyces cerevisiae S288c genome. Our results provide clear evidence of adaptation to reduce the cost of nonsense errors and increasing adaptation with codon position and expression. The magnitude and nature of this adaptation are also largely consistent with simulation results in which nonsense errors are the only selective force driving CUB evolution. Because NAI is derived from mechanistic models, it is both easier to interpret and more amenable to future refinement than other commonly used measures of codon bias. Further, our approach can also be used as a starting point for developing other mechanistically derived measures of adaptation such as for translational accuracy.
|
17
|
Effect of exonic splicing regulation on synonymous codon usage in alternatively spliced exons of Dscam. BMC Evol Biol 2009; 9:214. [PMID: 19709440 PMCID: PMC2741454 DOI: 10.1186/1471-2148-9-214] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.5] [Received: 12/22/2008] [Accepted: 08/27/2009] [Indexed: 12/31/2022] Open
Abstract
Background Synonymous codon usage is typically biased towards translationally superior codons in many organisms. In Drosophila, genomic data indicates that translationally optimal codons and splice optimal codons are mostly mutually exclusive, and adaptation to translational efficiency is reduced in the intron-exon boundary regions where potential exonic splicing enhancers (ESEs) reside. In contrast to genomic scale analyses on large datasets, a refined study on a well-controlled set of samples can be effective in demonstrating the effects of particular splice-related factors. Down syndrome cell adhesion molecule (Dscam) has the largest number of alternatively spliced exons (ASEs) known to date, and the splicing frequency of each ASE is accessible from the relative abundance of the transcript. Thus, these ASEs comprise a unique model system for studying the effect of splicing regulation on synonymous codon usage. Results Codon Bias Indices (CBI) in the 3' boundary regions were reduced compared to the rest of the exonic regions among 48 and 33 ASEs of exon 6 and 9 clusters, respectively. These regional differences in CBI were affected by splicing frequency and distance from adjacent exons. Synonymous divergence levels between the 3' boundary region and the remaining exonic region of exon 6 ASEs were similar. Additionally, another sensitive comparison of paralogous exonic regions in recently retrotransposed processed genes and their parental genes revealed that, in the former, the differences in CBI between what were formerly the central regions and the boundary regions gradually became smaller over time. Conclusion Analyses of the multiple ASEs of Dscam allowed direct tests of the effect of splice-related factors on synonymous codon usage and provided clear evidence that synonymous codon usage bias is restricted by exonic splicing signals near the intron-exon boundary. 
A similar synonymous divergence level between the different exonic regions suggests that the intensity of splice-related selection is generally weak and comparable to that of translational selection. Finally, the leveling off of differences in codon bias over time in retrotransposed genes meets the direct prediction of the tradeoff model that invokes conflict between translational superiority and splicing regulation, and strengthens the conclusions obtained from Dscam.
|
18
|
Abstract
Codon usage bias (CUB) is a ubiquitous observation in molecular evolution. As a model, Drosophila has been particularly well-studied and indications show that selection at least partially controls codon usage, probably through selection for translational efficiency. Although many aspects of Drosophila CUB have been studied, this is the first study relating codon usage to development in this holometabolous insect with very different life stages. Here we ask the question: What developmental stage of Drosophila melanogaster has the greatest CUB? Genes with maximum expression in the larval stage have the greatest overall CUB when compared with embryos, pupae, and adults. (The same pattern was observed in Drosophila pseudoobscura, see Supplementary Material online.) We hypothesize this is related to the very rapid growth of larvae, placing increased selective pressure to produce large amounts of protein: a 300-fold increase requiring an approximate doubling of protein content every 10 h. Genes with highest expression in adult males and early embryos, stages with the least de novo protein synthesis, display the least CUB. These results are consistent with the hypothesis that CUB is caused (at least in part) by selection for efficient protein production. This seems to hold on the individual gene level (highly expressed genes are more biased than lowly expressed genes) as well as on a more global scale where genes with maximum expression during times of very rapid growth and protein synthesis are more biased than genes with maximum expression during times of low growth.
|
19
|
Accelerated sequence divergence of conserved genomic elements in Drosophila melanogaster. Genome Res 2008; 18:1592-601. [PMID: 18583644 DOI: 10.1101/gr.077131.108] [Citation(s) in RCA: 20] [Impact Index Per Article: 1.3] [Indexed: 02/07/2023]
Abstract
Recent genomic sequencing of 10 additional Drosophila genomes provides a rich resource for comparative genomics analyses aimed at understanding the similarities and differences between species and between Drosophila and mammals. Using a phylogenetic approach, we identified 64 genomic elements that have been highly conserved over most of the Drosophila tree, but that have experienced a recent burst of evolution along the Drosophila melanogaster lineage. Compared to similarly defined elements in humans, these regions of rapid lineage-specific evolution in Drosophila differ dramatically in location, mechanism of evolution, and functional properties of associated genes. Notably, the majority reside in protein-coding regions and primarily result from rapid adaptive synonymous site evolution. In fact, adaptive evolution appears to be driving substitutions to unpreferred codons. Our analysis also highlights interesting noncoding genomic regions, such as regulatory regions in the gene gooseberry-neuro and a putative novel miRNA.
|
20
|
Codon bias is a major factor explaining phage evolution in translationally biased hosts. J Mol Evol 2008; 66:210-23. [PMID: 18286220 DOI: 10.1007/s00239-008-9068-6] [Citation(s) in RCA: 59] [Impact Index Per Article: 3.7] [Received: 07/27/2007] [Revised: 11/20/2007] [Accepted: 12/07/2007] [Indexed: 11/28/2022]
Abstract
The size and diversity of bacteriophage populations require methodologies to quantitatively study the landscape of phage differences. Statistical approaches are limited by small genome sizes, which preclude significant single-phage analysis. Comparative methods analyzing full phage genomes are an alternative, but they are difficult to interpret because lateral gene transfer creates a mosaic spectrum of related phage species. Based on a large-scale codon bias analysis of 116 DNA phages hosted by 11 translationally biased bacteria belonging to different phylogenetic families, we observe that phage genomes are almost always under codon selective pressure imposed by translationally biased hosts, and we propose a classification of phages with translationally biased hosts which is based on adaptation patterns. We introduce a computational method for comparing phages sharing homologous proteins, possibly accepted by different hosts. We observe that throughout phages, independently of the host, capsid genes appear to be the most affected by host translational bias. For coliphages, genes involved in virion morphogenesis, host interaction and ssDNA binding are also affected by adaptive pressure. Adaptation significantly affects both long and small phages. We analyze in more detail the Microviridae phage space to illustrate the potential of the approach. The small number of directions in adaptation observed in phages grouped around phi X174 is discussed in the light of functional bias. The adaptation analysis of the set of Microviridae phages defined around phi MH2K shows that phage classification based on adaptation does not reflect bacterial phylogeny.
|
21
|
The molecular basis of host adaptation in cactophilic Drosophila: molecular evolution of a glutathione S-transferase gene (GstD1) in Drosophila mojavensis. Genetics 2008; 178:1073-83. [PMID: 18245335 DOI: 10.1534/genetics.107.083287] [Citation(s) in RCA: 61] [Impact Index Per Article: 3.8] [Indexed: 11/18/2022] Open
Abstract
Drosophila mojavensis is a cactophilic fly endemic to the northwestern deserts of North America. This species includes four genetically isolated cactus host races each individually specializing on the necrotic tissues of a different cactus species. The necrosis of each cactus species provides the resident D. mojavensis populations with a distinct chemical environment. A previous investigation of the role of transcriptional variation in the adaptation of D. mojavensis to its hosts produced a set of candidate loci that are differentially expressed in response to host shifts, and among them was glutathione S-transferase D1 (GstD1). In both D. melanogaster and Anopheles gambiae, GstD1 has been implicated in the resistance of these species to the insecticide dichloro-diphenyl-trichloroethane (DDT). The pattern of sequence variation of the GstD1 locus from all four D. mojavensis populations, D. arizonae (sister species), and D. navojoa (outgroup) has been examined. The data suggest that in two populations of D. mojavensis GstD1 has gone through a period of adaptive amino acid evolution. Further analyses indicate that of the seven amino acid fixations that occurred in the D. mojavensis lineage, two of them occur in the active site pocket, potentially having a significant effect on substrate specificity and in the adaptation to alternative cactus hosts.
|
22
|
Variable strength of translational selection among 12 Drosophila species. Genetics 2007; 177:1337-48. [PMID: 18039870 PMCID: PMC2147958 DOI: 10.1534/genetics.107.070466] [Citation(s) in RCA: 35] [Impact Index Per Article: 2.1] [Received: 01/04/2007] [Accepted: 09/05/2007] [Indexed: 01/06/2023] Open
Abstract
Codon usage bias in Drosophila melanogaster genes has been attributed to negative selection of those codons whose cellular tRNA abundance restricts rates of mRNA translation. Previous studies, which involved limited numbers of genes, can now be compared against analyses of the entire gene complements of 12 Drosophila species whose genome sequences have become available. Using large numbers (6138) of orthologs represented in all 12 species, we establish that the codon preferences of more closely related species are better correlated. Differences between codon usage biases are attributed, in part, to changes in mutational biases. These biases are apparent from the strong correlation (r = 0.92, P < 0.001) among these genomes' intronic G + C contents and exonic G + C contents at degenerate third codon positions. To perform a cross-species comparison of selection on codon usage, while accounting for changes in mutational biases, we calibrated each genome in turn using the codon usage bias indices of highly expressed ribosomal protein genes. The strength of translational selection was predicted to have varied between species largely according to their phylogeny, with the D. melanogaster group species exhibiting the strongest degree of selection.
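As a rough illustration of the third-position G + C content (GC3) referred to above, a minimal Python sketch follows. It is a simplification: it counts every third codon position rather than only the degenerate ones used in the study, and the function name `gc3` and the toy sequence are assumptions made for illustration.

```python
def gc3(cds: str) -> float:
    """Fraction of codons whose third base is G or C.

    Simplification: all third positions are counted, not only the
    degenerate third positions analysed in the abstract above.
    """
    seq = cds.upper()
    usable = len(seq) - len(seq) % 3  # ignore a trailing partial codon
    thirds = [seq[i + 2] for i in range(0, usable, 3)]
    if not thirds:
        raise ValueError("sequence is shorter than one codon")
    return sum(base in "GC" for base in thirds) / len(thirds)


# Hypothetical toy coding sequence: ATG GCC AAA TTG
print(gc3("ATGGCCAAATTG"))
```

Applying the same function to intronic and exonic sequence of each genome would give paired values of the kind that could then be correlated across species.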
|
23
|
Evidence for a trade-off between translational efficiency and splicing regulation in determining synonymous codon usage in Drosophila melanogaster. Mol Biol Evol 2007; 24:2755-62. [PMID: 17905999 DOI: 10.1093/molbev/msm210] [Citation(s) in RCA: 62] [Impact Index Per Article: 3.6] [Indexed: 12/17/2022] Open
Abstract
In Drosophila melanogaster, synonymous codons corresponding to the most abundant cognate tRNAs are used more frequently, especially in highly expressed genes. Increased use of such "optimal" codons is considered an adaptation for translational efficiency. Need it always be the case that selection should favor the use of a translationally optimal codon? Here, we investigate one possible confounding factor, namely, the need to specify information in exons necessary to enable correct splicing. As expected from such a model, in Drosophila many codons show different usage near intron-exon boundaries versus exon core regions. This finding is, in principle, also consistent with Hill-Robertson effects modulating usage of translationally optimal codons. Nevertheless, several results support the splice model over the translational selection model: 1) the trends in codon usage are strikingly similar to those in mammals in which codon usage near boundaries correlates with abundance in exonic splice enhancers (ESEs), 2) codons preferred near boundaries tend to be enriched for A and avoid C (conversely those avoided near boundaries prefer C rather than A), as expected were ESEs involved, and 3) codons preferred near boundaries are typically not translationally optimal. We conclude that usage of translationally optimal codons is compromised in the vicinity of splice junctions in intron-containing genes, with the result that we observe higher levels of usage of translationally optimal codons at the center of exons. On the gene level, however, controlling for known correlates of codon bias, the impact on codon usage patterns is quantitatively small. These results have implications for inferring aspects of the mechanism of splicing given nothing more than a well-annotated genome.
|
24
|
Abstract
Background: Currently, there is little data available regarding the role of gender-specific gene expression on synonymous codon usage (translational selection) in most organisms, and particularly plants. Using gender-specific EST libraries (with > 4000 ESTs) from Zea mays and Triticum aestivum, we assessed whether gender-specific gene expression per se and gender-specific gene expression level are associated with selection on codon usage. Results: We found clear evidence of a greater bias in codon usage for genes expressed in female than in male organs and gametes, based on the variation in GC content at third codon positions and the frequency of species-preferred codons. This finding holds true for both highly and lowly expressed genes. In addition, we found that highly expressed genes have greater codon bias than lowly expressed genes for both female- and male-specific genes. Moreover, in both species, genes with female-specific expression show a greater usage of species-specific preferred codons for each of the 18 amino acids having synonymous codons. A supplemental analysis of Brassica napus suggests that bias in codon usage could also be higher in genes expressed in male gametophytic tissues than in heterogeneous (flower) tissues. Conclusion: This study reports gender-specific bias in codon usage in plants. The findings reported here, based on the analysis of 1 497 876 codons, are not caused either by differences in the biological functions of the genes or by differences in protein lengths, nor are they likely attributable to mutational bias. The data are best explained by gender-specific translational selection. Plausible explanations for these findings and the relevance to these and other organisms are discussed.
|
25
|
Abstract
A strong negative correlation between the rate of amino-acid substitution and codon usage bias in Drosophila has been attributed to interference between positive selection at nonsynonymous sites and weak selection on codon usage. To further explore this possibility we have investigated polymorphism and divergence at three kinds of sites (synonymous, nonsynonymous and intronic) in relation to codon bias in D. melanogaster and D. simulans. We confirmed that protein evolution is one of the main explanatory parameters for interlocus codon bias variation (r² ≈ 40%). However, intron or synonymous diversities, which could have been expected to be good indicators of local interference [here defined as the additional increase of drift due to selection on tightly linked sites, also called 'genetic draft' by Gillespie (2000)], did not covary significantly with codon bias or with protein evolution. Concurrently, levels of polymorphism were reduced in regions of low recombination rates whereas codon bias was not. Finally, while nonsynonymous diversities were very well correlated between species, neither synonymous nor intron diversities observed in D. melanogaster were correlated with those observed in D. simulans. Altogether, our results suggest that the selective constraint on the protein is a stable component of gene evolution while local interference is not. The pattern of variation in genetic draft along the genome therefore seems to be unstable over evolutionary time and should be considered a minor determinant of codon bias variance. We argue that selective constraints for optimal codon usage are likely to be correlated with selective constraints on the protein, both between codons within a gene, as previously suggested, and also between genes within a genome.
|
26
|
Latitudinal clines for nucleotide polymorphisms in the Esterase 6 gene of Drosophila melanogaster. Genetica 2006; 129:259-71. [PMID: 16955332 DOI: 10.1007/s10709-006-0006-5] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.1] [Received: 08/24/2005] [Accepted: 03/31/2006] [Indexed: 11/27/2022]
Abstract
Previous studies have found non-neutral patterns of nucleotide polymorphism in the promoter and coding regions of Est6 in D. melanogaster. Coding region polymorphism peaks around two closely linked replacement differences associated with the EST6-F/EST6-S allozyme polymorphism. The promoter contains two common, highly diverged haplotype groups, P1 and P7, that differentially affect Est6 expression. Allozyme studies have also revealed latitudinal clines in EST6-F and EST6-S frequencies that recur across continents. Here we analyse nucleotide polymorphisms across the promoter and the region of peak coding sequence polymorphism in 10 Australian populations along a 25° latitudinal gradient in order to examine the basis for the allozyme clines. As with the earlier studies, we find an excess of intermediate to high frequency variants in both the P1/P7 region and around the two EST6-F/EST6-S replacements in some populations. The two EST6-F/EST6-S replacement polymorphisms show latitudinal clines whereas the P1 and P7 groups of promoter haplotypes do not. However, the strongest clines are for three co-segregating silent site polymorphisms in a 4 bp stretch at the 3' end of the sequenced region. Monte Carlo simulations show that the clines for those three sites can explain all others in the data but none of the others can explain those three. Thus the allozyme clines may not reflect selection on either the P1/P7 polymorphism or the two replacements previously associated with the EST6-F/EST6-S difference.
|
27
|
A model of protein translation including codon bias, nonsense errors, and ribosome recycling. J Theor Biol 2006; 239:417-34. [PMID: 16171830 DOI: 10.1016/j.jtbi.2005.08.007] [Citation(s) in RCA: 72] [Impact Index Per Article: 4.0] [Received: 01/18/2005] [Revised: 08/05/2005] [Accepted: 08/08/2005] [Indexed: 11/15/2022]
Abstract
We present and analyse a model of protein translation at the scale of an individual messenger RNA (mRNA) transcript. The model we develop is unique in that it incorporates the phenomena of ribosome recycling and nonsense errors. The model conceptualizes translation as a probabilistic wave of ribosome occupancy traveling down a heterogeneous medium, the mRNA transcript. Our results show that the heterogeneity of the codon translation rates along the mRNA results in short-scale spikes and dips in the wave. Nonsense errors attenuate this wave on a longer scale while ribosome recycling reinforces it. We find that the combination of nonsense errors and codon usage bias can have a large effect on the probability that a ribosome will completely translate a transcript. We also elucidate how these forces interact with ribosome recycling to determine the overall translation rate of an mRNA transcript. We derive a simple cost function for nonsense errors using our model and apply this function to the yeast (Saccharomyces cerevisiae) genome. Using this function we are able to detect position-dependent selection on codon bias which correlates with gene expression levels as predicted a priori. These results indirectly validate our underlying model assumptions and confirm that nonsense errors can play an important role in shaping codon usage bias.
|
28
|
Determination of mutation trend in proteins by means of translation probability between RNA codes and mutated amino acids. Biochem Biophys Res Commun 2005; 337:692-700. [PMID: 16202392 PMCID: PMC7117410 DOI: 10.1016/j.bbrc.2005.09.106] [Citation(s) in RCA: 13] [Impact Index Per Article: 0.7] [Received: 09/05/2005] [Revised: 09/19/2005] [Accepted: 09/19/2005] [Indexed: 11/30/2022]
Abstract
In this study, we estimate the translation probabilities from RNA codons to amino acids. Using the 183 translation probabilities so determined and the amino-acid composition of eight highly mutated proteins, we construct the theoretical distributions of mutated amino acids in these proteins and then compare them with their actual distributions affected by mutations. Thereafter we trace the pattern of translation probabilities from RNA codons to mutated amino acids for 1053 point missense mutations. Finally, we conclude on statistical grounds that the natural mutation trend follows the theoretical translation probability.
|
29
|
Relationships among stop codon usage bias, its context, isochores, and gene expression level in various eukaryotes. J Mol Evol 2005; 61:437-44. [PMID: 16170455 DOI: 10.1007/s00239-004-0277-3] [Citation(s) in RCA: 48] [Impact Index Per Article: 2.5] [Received: 09/09/2004] [Accepted: 01/25/2005] [Indexed: 11/25/2022]
Abstract
It is well known that stop codons play a critical role in the process of protein synthesis. However, little effort has been made to investigate whether stop codon usage exhibits biases, such as widely seen for synonymous codon usage. Here we systematically investigate stop codon usage bias in various eukaryotes as well as its relationships with its context, GC3 content, gene expression level, and secondary structure. The results show that there is a strong bias for stop codon usage in different eukaryotes, i.e., UAA is overrepresented in the lower eukaryotes, UGA is overrepresented in the higher eukaryotes, and UAG is least used in all eukaryotes. Different conserved patterns for each stop codon in different eukaryotic classes are found based on information content and logo analysis. GC3 contents increase with increasing complexity of organisms. Secondary structure prediction revealed that UAA is generally associated with loop structures, whereas UGA is more uniformly present in loop and stem structures, i.e., UGA is less biased toward having a particular structure. The stop codon usage bias, however, shows no significant relationship with GC3 content and gene expression level in individual eukaryotes. The results indicate that genomic complexity and GC3 content might contribute to stop codon usage bias in different eukaryotes. Our results indicate that stop codons, like synonymous codons, exhibit biases in usage. Additional work will be needed to understand the causes of these biases and their relationship to the mechanism of protein termination.
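The basic quantity behind such a survey, the relative usage of UAA, UAG and UGA, can be sketched in a few lines of Python. This is only an illustrative tally, not the study's pipeline: the function name `stop_codon_usage` and the toy sequences are assumptions, and each input is assumed to be a complete coding sequence ending in its stop codon.

```python
from collections import Counter

STOP_CODONS = ("UAA", "UAG", "UGA")


def stop_codon_usage(cds_list):
    """Relative frequency of each stop codon among the terminal codons.

    Sequences may be given as DNA or RNA; those that do not end in a
    stop codon are ignored.
    """
    counts = Counter()
    for cds in cds_list:
        last = cds.upper().replace("T", "U")[-3:]
        if last in STOP_CODONS:
            counts[last] += 1
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no sequence ends in a stop codon")
    return {stop: counts[stop] / total for stop in STOP_CODONS}


# Hypothetical toy coding sequences
usage = stop_codon_usage(["ATGAAATAA", "ATGTGA", "AUGCCCUAA", "ATGTAG"])
print(usage)
```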
|
30
|
Global mRNA stability is not associated with levels of gene expression in Drosophila melanogaster but shows a negative correlation with codon bias. J Mol Evol 2005; 61:306-14. [PMID: 16044249 DOI: 10.1007/s00239-004-0271-9] [Citation(s) in RCA: 14] [Impact Index Per Article: 0.7] [Received: 09/02/2004] [Accepted: 03/16/2005] [Indexed: 11/26/2022]
Abstract
A multitude of factors contribute to the regulation of gene expression in living cells. The relationship between codon usage bias and gene expression has been extensively studied, and it has been shown that codon bias may have adaptive significance in many unicellular and multicellular organisms. Given the central role of mRNA in post-transcriptional regulation, we hypothesize that mRNA stability is another important factor associated either with positive or negative regulation of gene expression. We have conducted genome-wide studies of the association between gene expression (measured as transcript abundance in public EST databases), mRNA stability, codon bias, GC content, and gene length in Drosophila melanogaster. To remove potential bias of gene length inherently present in EST libraries, gene expression is measured as normalized transcript abundance. It is demonstrated that codon bias and GC content in second codon position are positively associated with transcript abundance. Gene length is negatively associated with transcript abundance. The stability of thermodynamically predicted mRNA secondary structures is not associated with transcript abundance, but there is a negative correlation between mRNA stability and codon bias. This finding does not support the hypothesis that codon bias has evolved as an indirect consequence of selection favoring thermodynamically stable mRNA molecules.
|
31
|
Entropy and GC Content in the beta-esterase gene cluster of the Drosophila melanogaster subgroup. Mol Biol Evol 2005; 22:2063-72. [PMID: 15972847 DOI: 10.1093/molbev/msi197] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.4] [Indexed: 11/15/2022] Open
Abstract
We perform spectral entropy and GC content analyses in the beta-esterase gene cluster, including the Est-6 gene and the psiEst-6 putative pseudogene, in seven species of the Drosophila melanogaster species subgroup. psiEst-6 combines features of functional and nonfunctional genes. The spectral entropies show distinctly lower structural ordering for psiEst-6 than for Est-6 in all species studied. Our observations agree with previous results for D. melanogaster and provide additional support to our hypothesis that after the duplication event Est-6 retained the esterase-coding function and its role during copulation, while psiEst-6 lost that function but now operates in conjunction with Est-6 as an intergene. Entropy accumulation is not a completely random process for either gene. Structural entropy is nucleotide dependent. The relative normalized deviations for structural entropy are higher for G than for C nucleotides. The entropy values are similar for Est-6 and psiEst-6 in the case of A and T but are lower for Est-6 in the case of G and C. The GC content in synonymous positions is uniformly higher in Est-6 than in psiEst-6, which agrees with the reduced GC content generally observed in pseudogenes and nonfunctional sequences. The observed differences in entropy and GC content reflect an evolutionary shift associated with the process of pseudogenization and subsequent functional divergence of psiEst-6 and Est-6 after the duplication event.
|
32
|
Abstract
To study the roles of translational accuracy, translational efficiency, and the Hill-Robertson effect in codon usage bias, we studied the intragenic spatial distribution of synonymous codon usage bias in four prokaryotic (Escherichia coli, Bacillus subtilis, Sulfolobus tokodaii, and Thermotoga maritima) and two eukaryotic (Saccharomyces cerevisiae and Drosophila melanogaster) genomes. We generated supersequences at each codon position across genes in a genome and computed the overall bias at each codon position. By quantitatively evaluating the trend of spatial patterns using isotonic regression, we show that in yeast and prokaryotic genomes, codon usage bias increases along translational direction, which is consistent with purifying selection against nonsense errors. Fruit fly genes show a nearly symmetric M-shaped spatial pattern of codon usage bias, with less bias in the middle and both ends. The low codon usage bias in the middle region is best explained by interference (the Hill-Robertson effect) between selections at different codon positions. In both yeast and fruit fly, spatial patterns of codon usage bias are characteristically different from patterns of GC-content variations. Effect of expression level on the strength of codon usage bias is more conspicuous than its effect on the shape of the spatial distribution.
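To make the trend-fitting step concrete, the sketch below implements a non-decreasing isotonic regression with the standard pool-adjacent-violators algorithm and equal weights. It is an assumption that this matches the study's procedure in detail; the function name `isotonic_increasing` and the toy per-position bias profile are hypothetical.

```python
def isotonic_increasing(values):
    """Least-squares non-decreasing fit (pool-adjacent-violators, equal weights)."""
    blocks = []  # each block holds [sum of values, number of values]
    for v in values:
        blocks.append([float(v), 1])
        # Merge backwards while the previous block's mean exceeds the last one's.
        while (len(blocks) > 1
               and blocks[-2][0] / blocks[-2][1] > blocks[-1][0] / blocks[-1][1]):
            s, n = blocks.pop()
            blocks[-1][0] += s
            blocks[-1][1] += n
    fitted = []
    for s, n in blocks:
        fitted.extend([s / n] * n)  # every point in a block gets the block mean
    return fitted


# Hypothetical codon-bias values at successive codon positions
print(isotonic_increasing([1, 3, 2, 4]))
```

A profile that rises along the direction of translation is left almost unchanged by the fit, whereas an M-shaped profile like the one reported for fruit fly would be heavily flattened, which is one way to quantify how monotone a spatial pattern is.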
|
33
|
Abstract
The nonrandom use of synonymous codons (codon bias) is a well-established phenomenon in Drosophila. Recent reports suggest that levels of codon bias differ among genes that are differentially expressed between the sexes, with male-expressed genes showing less codon bias than female-expressed genes. To examine the relationship between sex-biased gene expression and level of codon bias on a genomic scale, we surveyed synonymous codon usage in 7276 D. melanogaster genes that were classified as male-, female-, or non-sex-biased in their expression in microarray experiments. We found that male-biased genes have significantly less codon bias than both female- and non-sex-biased genes. This pattern holds for both germline and somatically expressed genes. Furthermore, we find a significantly negative correlation between level of codon bias and degree of sex-biased expression for male-biased genes. In contrast, female-biased genes do not differ from non-sex-biased genes in their level of codon bias and show a significantly positive correlation between codon bias and degree of sex-biased expression. These observations cannot be explained by differences in chromosomal distribution, mutational processes, recombinational environment, gene length, or absolute expression level among genes of the different expression classes. We propose that the observed codon bias differences result from differences in selection at synonymous and/or linked nonsynonymous sites between genes with male- and female-biased expression.
|
34
|
Abstract
We have investigated patterns of within-species polymorphism and between-species divergence for synonymous and nonsynonymous variants at a set of autosomal and X-linked loci of Drosophila miranda. D. pseudoobscura and D. affinis were used for the between-species comparisons. The results suggest the action of purifying selection on nonsynonymous, polymorphic variants. Among synonymous polymorphisms, there is a significant excess of synonymous mutations from preferred to unpreferred codons and of GC to AT mutations. There was no excess of GC to AT mutations among polymorphisms at noncoding sites. This suggests that selection is acting to maintain the use of preferred codons. Indirect evidence suggests that biased gene conversion in favor of GC base pairs may also be operating. The joint intensity of selection and biased gene conversion, in terms of the product of effective population size and the sum of the selection and conversion coefficients, was estimated to be approximately 0.65.
|
35
|
Abstract
New and simple numerical criteria based on a codon adaptation index are applied to the complete genomic sequences of 80 Eubacteria and 16 Archaea, to infer weak and strong genome tendencies toward content bias, translational bias, and strand bias. These criteria can be applied to all microbial genomes, even those for which little biological information is known, and a codon bias signature, that is the collection of strong biases displayed by a genome, can be automatically derived. A codon bias space, where genomes are identified by their preferred codons, is proposed as a novel formal framework to interpret genomic relationships. Principal component analysis confirms that although GC content has a dominant effect on codon bias space, thermophilic and mesophilic species can be identified and separated by codon preferences. Two more examples concerning lifestyle are studied with linear discriminant analysis: suitable separating functions characterized by sets of preferred codons are provided to discriminate: translationally biased (hyper)thermophiles from mesophiles, and organisms with different respiratory characteristics, aerobic, anaerobic, facultative aerobic and facultative anaerobic. These results suggest that codon bias space might reflect the geometry of a prokaryotic "physiology space." Evolutionary perspectives are noted, numerical criteria and distances among organisms are validated on known cases, and various results and predictions are discussed both on methodological and biological grounds.
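For readers unfamiliar with the codon adaptation index on which these criteria build, a minimal Python sketch of the classical index follows: each codon is weighted by its frequency in a reference gene set relative to the most frequent synonymous codon, and the index of a gene is the geometric mean of those weights. The abstract does not give the authors' criteria, so this shows only the underlying index. The two-family codon table, the pseudocount for unseen codons, and all names here are assumptions for illustration; a real analysis would use the full genetic code.

```python
import math
from collections import Counter

# Hypothetical, deliberately tiny table of synonymous codon families.
FAMILIES = {"Lys": ["AAA", "AAG"], "Phe": ["TTT", "TTC"]}


def codons(seq):
    """Split a sequence into codons, ignoring a trailing partial codon."""
    seq = seq.upper()
    return [seq[i:i + 3] for i in range(0, len(seq) - len(seq) % 3, 3)]


def relative_adaptiveness(reference_seqs, families=FAMILIES, pseudocount=0.5):
    """Weight of each codon: its count divided by the top count in its family."""
    counts = Counter(c for s in reference_seqs for c in codons(s))
    weights = {}
    for family in families.values():
        # Assumption: codons unseen in the reference get a small pseudocount.
        fam_counts = {c: counts[c] or pseudocount for c in family}
        top = max(fam_counts.values())
        for c, n in fam_counts.items():
            weights[c] = n / top
    return weights


def cai(seq, weights):
    """Geometric mean of the weights of the scored codons in seq."""
    logs = [math.log(weights[c]) for c in codons(seq) if c in weights]
    if not logs:
        raise ValueError("no scorable codons in sequence")
    return math.exp(sum(logs) / len(logs))


# Hypothetical reference set of highly expressed genes (one toy sequence)
w = relative_adaptiveness(["AAGAAGAAGAAATTCTTCTTT"])
print(cai("AAGTTC", w), cai("AAATTT", w))
```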
|
36
|
Intragenic codon bias in a set of mouse and human genes. J Theor Biol 2004; 230:215-25. [PMID: 15302553 DOI: 10.1016/j.jtbi.2004.05.003] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.2] [Received: 01/28/2004] [Revised: 05/06/2004] [Accepted: 05/06/2004] [Indexed: 11/20/2022]
Abstract
To better conceptualize the mechanism underlying the evolution of synonymous codons, we have analysed intragenic codon usage in chosen "regions" of some mouse and human genes. We divided a given gene into two regions: one consisting of a trinucleotide repeat (TNR) and the other consisting of the "rest of the coding region" (RCR). Usually, a TNR is composed of a repetitive single codon, which may reflect its frequency in a gene. In contrast, a non-random frequency of a codon in the RCR versus TNR (or vice versa) of a gene should indicate a bias for that codon within the TNR. We examined this scenario by comparing codon frequency between the RCR and the cognate TNR(s) for a set of human and mouse genes. A TNR length of six amino acids or more was used to identify genes from the Genbank database. Twenty nine human and twenty one mouse genes containing TNRs coding for nine different amino acid runs were identified. The ratio of codon frequency in a TNR versus the corresponding RCR was expressed as "fold change" which was also regarded as a measure of codon bias (defined as preferential use either in TNR or in RCR). Chi-square values were then determined from the distribution of codon frequency in a TNR vs. the cognate RCR. At p<0.001, 22% and 27%, respectively, of human and mouse TNRs showed codon bias. Greater than 40% of the TNRs (29 out of 69 in human, and 18 of 42 in mouse) showed codon bias at p<0.05. In addition, we identify eight single-codon TNRs in mouse and ten in human genes. Thus, our results show intragenic codon bias in both mouse and human genes expressed in diverse tissue types. Since our results are independent of the Codon Adaptation Index (CAI) and starvation CAI, and since the tRNA repertoire in a cell or in a tissue is constant, our data suggest that other constraints besides tRNA abundance played a role in creating intragenic codon bias in these genes.
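The two statistics described above, the TNR-versus-RCR fold change and the chi-square comparison of codon distributions, can be sketched as follows. This is a hedged illustration rather than the authors' exact procedure: it assumes frequencies are taken among the synonymous codons of a single amino acid, uses a plain Pearson statistic without continuity correction, and the function names and glutamine counts are hypothetical.

```python
def fold_change(tnr_counts, rcr_counts, codon):
    """Relative frequency of a codon in the TNR divided by that in the RCR."""
    tnr_freq = tnr_counts.get(codon, 0) / sum(tnr_counts.values())
    rcr_freq = rcr_counts.get(codon, 0) / sum(rcr_counts.values())
    if rcr_freq == 0:
        raise ValueError("codon absent from the RCR; fold change undefined")
    return tnr_freq / rcr_freq


def chi_square(tnr_counts, rcr_counts):
    """Pearson chi-square statistic and degrees of freedom for a 2 x k table."""
    codons = sorted(c for c in set(tnr_counts) | set(rcr_counts)
                    if tnr_counts.get(c, 0) + rcr_counts.get(c, 0) > 0)
    rows = [[counts.get(c, 0) for c in codons]
            for counts in (tnr_counts, rcr_counts)]
    row_totals = [sum(row) for row in rows]
    if min(row_totals) == 0:
        raise ValueError("both regions need at least one codon")
    col_totals = [sum(col) for col in zip(*rows)]
    grand = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(rows):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / grand
            stat += (observed - expected) ** 2 / expected
    return stat, len(codons) - 1


# Hypothetical counts of the two glutamine codons in a repeat and in the rest of the gene
tnr = {"CAG": 18, "CAA": 2}
rcr = {"CAG": 30, "CAA": 30}
print(fold_change(tnr, rcr, "CAG"))
print(chi_square(tnr, rcr))
```

The statistic would then be compared against the chi-square distribution with the returned degrees of freedom to obtain the significance levels quoted in the abstract.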
|
37
|
Abstract
Synonymous codons are not used at random, significantly influencing the base composition of the genome. The selection-mutation-drift model proposes that this bias reflects natural selection in favor of a subset of preferred codons. Previous estimates in Drosophila of the intensity of selective forces involved seem too large to be reconciled with theoretical predictions of the level of codon bias. This probably results from confounding effects of the demographic histories of the species concerned. We have studied three species of the virilis group of Drosophila, which are more likely to satisfy the assumptions of the evolutionary models. We analyzed the patterns of polymorphism and divergence in a sample of 18 genes and applied a new method for estimating the intensity of selection on synonymous mutations based on the frequencies of unpreferred mutations among polymorphic sites. This yielded estimates of selection intensities (N(e)s) of the order of 0.65, which is more compatible with the observed levels of codon bias. Our results support the action of both selection and mutational bias on codon usage bias and suggest that codon usage and genome base composition in the D. americana lineage are in approximate equilibrium. Biased gene conversion may also contribute to the observed patterns.
|
38
|
Abstract
The primary structures of peptides may be adapted for efficient synthesis as well as proper function. Here, the Saccharomyces cerevisiae genome sequence, DNA microarray expression data, tRNA gene numbers, and functional categorizations of proteins are employed to determine whether the amino acid composition of peptides reflects natural selection to optimize the speed and accuracy of translation. Strong relationships between synonymous codon usage bias and estimates of transcript abundance suggest that DNA array data serve as adequate predictors of translation rates. Amino acid usage also shows striking relationships with expression levels. Stronger correlations between tRNA concentrations and amino acid abundances among highly expressed proteins than among less abundant proteins support adaptation of both tRNA abundances and amino acid usage to enhance the speed and accuracy of protein synthesis. Natural selection for efficient synthesis appears to also favor shorter proteins as a function of their expression levels. Comparisons restricted to proteins within functional classes are employed to control for differences in amino acid composition and protein size that reflect differences in the functional requirements of proteins expressed at different levels.
|
39
|
Abstract
Synonymous codon usage in yeast appears to be influenced by natural selection on gene expression, as well as regional variation in compositional bias. Because of the large number of potential targets of selection (i.e., most of the codons in the genome) and presumed small selection coefficients, codon usage is an excellent model for studying factors that limit the effectiveness of selection. We use factor analysis to identify major trends in codon usage for 5836 genes in Saccharomyces cerevisiae. The primary factor is strongly correlated with gene expression, consistent with the model that a subset of codons allows for more efficient translation. The secondary factor is very strongly correlated with third codon position GC content and probably reflects regional variation in compositional bias. We find that preferred codon usage decreases in the face of three potential limitations on the effectiveness of selection: reduced recombination rate, increased gene length, and reduced intergenic spacing. All three patterns are consistent with the Hill-Robertson effect (reduced effectiveness of selection among linked targets). A reduction in gene expression in closely spaced genes may also reflect selection conflicts due to antagonistic pleiotropy.
|
40
|
Abstract
Revealing the determinants of codon usage bias is central to the understanding of factors governing viral evolution. Herein, we report the results of a survey of codon usage bias in a wide range of genetically and ecologically diverse human RNA viruses. This analysis showed that the overall extent of codon usage bias in RNA viruses is low and that there is little variation in bias between genes. Furthermore, the strong correlation between base and dinucleotide composition and codon usage bias suggested that mutation pressure rather than natural (translational) selection is the most important determinant of the codon bias observed. However, we also detected correlations between codon usage bias and some characteristics of viral genome structure and ecology, with increased bias in segmented and aerosol-transmitted viruses and decreased bias in vector-borne viruses. This suggests that translational selection may also have some influence in shaping codon usage bias.
|
41
|
Abstract
The molecular evolution of the histone multigene family was studied by cloning and determining the nucleotide sequences of the histone 3 genes in seven Drosophila species, D. takahashii, D. lutescens, D. ficusphila, D. persimilis, D. pseudoobscura, D. americana and D. immigrans. CT repeats, a TATA box and an AGTG motif in the 5' region, and a hairpin loop and purine-rich motifs (CAA(T/G)GAGA) in the 3' region were conserved even in distantly related species. In D. hydei and D. americana, the GC content at the third codon position in the protein coding region was relatively low (49% and 45%), while in D. takahashii and D. lutescens it was relatively high (64% and 65%). The non-significant correlation between the GC contents in the 3' region and at the third codon position, as well as the evidence of less constraint in the 3' region, suggested that mutational bias may not be the major mechanism responsible for the biased nucleotide change at the third codon position or for codon usage bias.
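The GC content at the third codon position (often abbreviated GC3) quoted above is a simple proportion. A minimal sketch, assuming an in-frame coding sequence given as a plain string; the function name `gc3` is hypothetical and the example sequence is invented, not taken from the histone 3 genes:

```python
def gc3(cds):
    """Fraction of G or C among third codon positions of an in-frame coding sequence."""
    cds = cds.upper()
    usable = len(cds) - len(cds) % 3      # drop a trailing partial codon
    thirds = cds[:usable][2::3]           # every third base, starting at position 3
    if not thirds:
        raise ValueError("sequence contains no complete codon")
    return sum(1 for base in thirds if base in "GC") / len(thirds)

# Invented example: third positions are G, C, A, so two of three are G or C.
print(gc3("ATGGCCAAA"))
```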
|
42
|
Hill-Robertson interference is a minor determinant of variations in codon bias across Drosophila melanogaster and Caenorhabditis elegans genomes. Mol Biol Evol 2002; 19:1399-406. [PMID: 12200468 DOI: 10.1093/oxfordjournals.molbev.a004203] [Citation(s) in RCA: 44] [Impact Index Per Article: 2.0] [Indexed: 11/13/2022] Open
Abstract
According to population genetics models, genomic regions with lower crossing-over rates are expected to experience less effective selection because of Hill-Robertson interference (HRi). The effect of genetic linkage is thought to be particularly important for a selection of weak intensity such as selection affecting codon usage. Consistent with this model, codon bias correlates positively with recombination rate in Drosophila melanogaster and Caenorhabditis elegans. However, in these species, the G+C content of both noncoding DNA and synonymous sites correlates positively with recombination, which suggests that mutation patterns and recombination are associated. To remove this effect of mutation patterns on codon bias, we used the synonymous sites of lowly expressed genes that are expected to be effectively neutral sites. We measured the differences between codon biases of highly expressed genes and their lowly expressed neighbors. In D. melanogaster we find that HRi weakly reduces selection on codon usage of genes located in regions of very low recombination; but these genes only comprise 4% of the total. In C. elegans we do not find any evidence for the effect of recombination on selection for codon bias. Computer simulations indicate that HRi poorly enhances codon bias if the local recombination rate is greater than the mutation rate. This prediction of the model is consistent with our data and with the current estimate of the mutation rate in D. melanogaster. The case of C. elegans, which is highly self-fertilizing, is discussed. Our results suggest that HRi is a minor determinant of variations in codon bias across the genome.
|
43
|
Shannon information theoretic computation of synonymous codon usage biases in coding regions of human and mouse genomes. Genome Res 2002; 12:944-55. [PMID: 12045147 PMCID: PMC1383734 DOI: 10.1101/gr.213402] [Citation(s) in RCA: 40] [Impact Index Per Article: 1.8] [Received: 08/31/2001] [Accepted: 03/06/2002] [Indexed: 11/24/2022]
Abstract
Exonic GC of human mRNA reference sequences (RefSeqs), as well as A, C, G, and T in codon position 3 are linearly correlated with genomic GC. These observations utilize information from the completed human genome sequence and a large, high-quality set of human and mouse coding sequences, and are in accord with similar determinations published by others. A Shannon Information Theoretic measure of bias in synonymous codon usage was developed. When applied to either human or mouse RefSeqs, this measure is nonlinearly correlated with genomic, exonic, and third codon position A, C, G, and T. Information values between orthologous mouse and human RefSeqs are linearly correlated: mouse = 0.092 + 0.55 human. Mouse genes were consistently placed in genomic regions whose GC content was closer to 50% than was the GC content of the human ortholog. Since the (nonlinear) information versus percent GC curve has a minimum at 50% GC and monotonically increases with increasing distance from 50% GC, this phenomenon directly results in the low slope of 0.55. This appears to be a manifestation of an evolutionary strategy for placement of genes in regions of the genome with a GC content that relates synonymous codon bias and protein folding.
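The abstract does not give the formula for its Shannon-based measure, so the following is only a generic sketch of how an entropy-based codon bias score can be defined. It assumes the information for one amino acid is the maximum possible entropy of its synonymous codon family minus the observed entropy, in bits; the function name `codon_information` and the example counts are hypothetical.

```python
import math

def codon_information(counts):
    """Information (bits) in the usage of one synonymous codon family.

    `counts` maps each synonymous codon of a single amino acid to its
    observed count. The score is log2(k) minus the Shannon entropy of the
    observed frequencies, where k is the family size: 0 for uniform usage,
    log2(k) when a single codon is used exclusively.
    """
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no codons observed for this amino acid")
    entropy = -sum((n / total) * math.log2(n / total)
                   for n in counts.values() if n > 0)
    return math.log2(len(counts)) - entropy

# Invented glutamate counts: uniform usage carries no information,
# exclusive use of one codon carries one full bit.
print(codon_information({"GAA": 5, "GAG": 5}))
print(codon_information({"GAA": 10, "GAG": 0}))
```

Under this definition the score is lowest when synonymous codons are used evenly, which is at least qualitatively in line with the abstract's statement that the information curve has its minimum at 50% GC.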
|
44
|
Abstract
The combination of complete genome sequence information and estimates of mRNA abundances has begun to reveal causes of both silent and protein sequence evolution. Translational selection appears to explain patterns of synonymous codon usage in many prokaryotes as well as a number of eukaryotic model organisms (with the notable exception of vertebrates). Relationships between gene length and codon usage bias, however, remain unexplained. Intriguing correlations between expression patterns and protein divergence suggest some general mechanisms underlying protein evolution.
|
45
|
The evolutionary analysis of "orphans" from the Drosophila genome identifies rapidly diverging and incorrectly annotated genes. Genetics 2001; 159:589-98. [PMID: 11606536 PMCID: PMC1461820 DOI: 10.1093/genetics/159.2.589] [Citation(s) in RCA: 54] [Impact Index Per Article: 2.3] [Indexed: 11/14/2022] Open
Abstract
In genome projects of eukaryotic model organisms, a large number of novel genes of unknown function and evolutionary history ("orphans") are being identified. Since many orphans have no known homologs in distant species, it is unclear whether they are restricted to certain taxa or evolve rapidly, either because of a lack of constraints or positive Darwinian selection. Here we use three criteria for the selection of putatively rapidly evolving genes from a single sequence of Drosophila melanogaster. Thirteen candidate genes were chosen from the Adh region on the second chromosome and 1 from the tip of the X chromosome. We succeeded in obtaining sequence from 6 of these in the closely related species D. simulans and D. yakuba. Only 1 of the 6 genes showed a large number of amino acid replacements and in-frame insertions/deletions. A population survey of this gene suggests that its rapid evolution is due to the fixation of many neutral or nearly neutral mutations. Two other genes showed "normal" levels of divergence between species. Four genes had insertions/deletions that destroy the putative reading frame within exons, suggesting that these exons have been incorrectly annotated. The evolutionary analysis of orphan genes in closely related species is useful for the identification of both rapidly evolving and incorrectly annotated genes.
|
46
|
Selection at the amino acid level can influence synonymous codon usage: implications for the study of codon adaptation in plastid genes. Genetics 2001; 159:347-58. [PMID: 11560910 PMCID: PMC1461792 DOI: 10.1093/genetics/159.1.347] [Citation(s) in RCA: 14] [Impact Index Per Article: 0.6] [Indexed: 11/12/2022] Open
Abstract
A previously employed method that uses the composition of noncoding DNA as the basis of a test for selection between synonymous codons in plastid genes is reevaluated. The test requires the assumption that in the absence of selective differences between synonymous codons the composition of silent sites in coding sequences will match the composition of noncoding sites. It is demonstrated here that this assumption is not necessarily true and, more generally, that using compositional properties to draw inferences about selection on silent changes in coding sequences is much more problematic than commonly assumed. This is so because selection on nonsynonymous changes can influence the composition of synonymous sites (i.e., codon usage) in a complex manner, meaning that the composition biases of different silent sites, including neutral noncoding DNA, are not comparable. These findings also draw into question the commonly utilized method of investigating how selection to increase translation accuracy influences codon usage. The work then focuses on implications for studies that assess codon adaptation, which is selection on codon usage to enhance translation rate, in plastid genes. A new test that does not require the use of noncoding DNA is proposed and applied. The results of this test suggest that far fewer plastid genes display codon adaptation than previously thought.
|
47
|
Abstract
I present here evidence of remarkable local changes in GC/AT substitution biases and in crossover frequencies on Drosophila chromosomes. The substitution pattern at 10 loci in the telomeric region of the X chromosome was studied for four species of the Drosophila melanogaster species subgroup. Drosophila orena and Drosophila erecta are clearly the most closely related species pair (the erecta complex) among the four species studied; however, the overall data at the 10 loci revealed a clear dichotomy in the silent substitution patterns between the AT-biased-substitution melanogaster and erecta lineages and the GC-biased-substitution yakuba and orena lineages, suggesting two or more independent changes in GC/AT substitution biases. More importantly, the results indicated a between-loci heterogeneity in GC/AT substitution bias in this small region independently in the yakuba and orena lineages. Indeed, silent substitutions in the orena lineage were significantly biased toward G and C at the consecutive yellow, lethal of scute, and asense loci, but they were significantly biased toward A and T at sta. The substitution bias toward G and C was centered in different areas in yakuba (significantly biased at EG:165H7.3, EG:171D11.2, and suppressor of sable). The similar silent substitution patterns in coding and noncoding regions, furthermore, suggested mutational biases as a cause of the substitution biases. On the other hand, a previous study revealed that Drosophila yakuba has about 20-fold higher crossover frequencies in the telomeric region of the X chromosome than does D. melanogaster; this study revealed that the total genetic map length of the yakuba X chromosome was only about 1.5 times as large as that of melanogaster and that the map length of the X-telomeric y-sta region did not differ between Drosophila yakuba and D. erecta. Taken together, the data strongly suggested that an approximately 20-fold reduction in the X-telomeric crossover frequencies occurred in the ancestral population of D. melanogaster after the melanogaster-yakuba divergence but before the melanogaster-simulans divergence.
|
48
|
Abstract
The relationships between synonymous and nonsynonymous substitution rates and between synonymous rate and codon usage bias are important to our understanding of the roles of mutation and selection in the evolution of Drosophila genes. Previous studies used approximate estimation methods that ignore codon bias. In this study we reexamine those relationships using maximum-likelihood methods to estimate substitution rates, which accommodate the transition/transversion rate bias and codon usage bias. We compiled a sample of homologous DNA sequences at 83 nuclear loci from Drosophila melanogaster and at least one other species of Drosophila. Our analysis was consistent with previous studies in finding that synonymous rates were positively correlated with nonsynonymous rates. Our analysis differed from previous studies, however, in that synonymous rates were unrelated to codon bias. We therefore conducted a simulation study to investigate the differences between approaches. The results suggested that failure to properly account for multiple substitutions at the same site and for biased codon usage by approximate methods can lead to an artifactual correlation between synonymous rate and codon bias. Implications of the results for translational selection are discussed.
|
49
|
Inferring parameters of mutation, selection and demography from patterns of synonymous site evolution in Drosophila. Genetics 2001; 157:245-57. [PMID: 11139506 PMCID: PMC1461462 DOI: 10.1093/genetics/157.1.245] [Citation(s) in RCA: 117] [Impact Index Per Article: 5.1] [Indexed: 11/12/2022] Open
Abstract
Selection acting on codon usage can cause patterns of synonymous evolution to deviate considerably from those expected under neutrality. To investigate the quantitative relationship between parameters of mutation, selection, and demography, and patterns of synonymous site divergence, we have developed a novel combination of population genetic models and likelihood methods of phylogenetic sequence analysis. Comparing 50 orthologous gene pairs from Drosophila melanogaster and D. virilis and 27 from D. melanogaster and D. simulans, we show considerable variation between amino acids and genes in the strength of selection acting on codon usage and find evidence for both long-term and short-term changes in the strength of selection between species. Remarkably, D. melanogaster shows no evidence of current selection on codon usage, while its sister species D. simulans experiences only half the selection pressure for codon usage of their common ancestor. We also find evidence for considerable base asymmetries in the rate of mutation, such that the average synonymous mutation rate is 20-30% higher than in noncoding regions. A Bayesian approach is adopted to investigate how accounting for selection on codon usage influences estimates of the parameters of mutation.
|
50
|
A test of translational selection at 'silent' sites in the human genome: base composition comparisons in alternatively spliced genes. Gene 2000; 261:93-105. [PMID: 11164041 DOI: 10.1016/s0378-1119(00)00482-0] [Citation(s) in RCA: 78] [Impact Index Per Article: 3.3] [Indexed: 11/21/2022]
Abstract
Natural selection appears to discriminate among synonymous codons to enhance translational efficiency in a wide range of prokaryotes and eukaryotes. Codon bias is strongly related to gene expression levels in these species. In addition, between-gene variation in silent DNA divergence is inversely correlated with codon bias. However, in mammals, between-gene comparisons are complicated by distinctive nucleotide-content bias (isochores) throughout the genome. In this study, we attempted to identify translational selection by analyzing the DNA sequences of alternatively spliced genes in humans and in Drosophila melanogaster. Among codons in an alternatively spliced gene, those in constitutively expressed exons are translated more often than those in alternatively spliced exons. Thus, translational selection should act more strongly to bias codon usage and reduce silent divergence in constitutive than in alternative exons. By controlling for regional forces affecting base-composition evolution, this within-gene comparison makes it possible to detect codon selection at synonymous sites in mammals. We found that GC-ending codons are more abundant in constitutive than alternatively spliced exons in both Drosophila and humans. Contrary to our expectation, however, silent DNA divergence between mammalian species is higher in constitutive than in alternative exons.
|