1
Zhang S, Lin R, Cui L, Jiang T, Shi J, Lu C, Li P, Zhou M. Alter codon bias of the P. pastoris genome to overcome a bottleneck in codon optimization strategy development and improve protein expression. Microbiol Res 2024; 282:127629. [PMID: 38330819 DOI: 10.1016/j.micres.2024.127629] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/08/2023] [Revised: 12/27/2023] [Accepted: 01/21/2024] [Indexed: 02/10/2024]
Abstract
Apart from its role in translation, codon bias is also an important mechanism for regulating mRNA levels. The traditional frequency-based codon optimization strategy is quite efficient in organisms such as N. crassa, but much less so in the yeast P. pastoris, a popular host for heterologous protein expression. This is because, unlike those of N. crassa, the preferred codons of P. pastoris are AU-rich; codon optimization therefore yields extremely low GC content, which brings problems of premature transcriptional termination or low RNA stability despite its translational advantages. To overcome this bottleneck, we first focused on three reporter genes in P. pastoris and confirmed the great advantage of GC-prone codon optimization on mRNA levels. We then altered the codon bias profile of P. pastoris by introducing additional copies of rare tRNA genes. Prior to that, we constructed IPTG-regulated tRNA species to enable chassis cells to switch between different codon bias states. As demonstrated again with reporter genes, the protein yields of luc and 0788 were increased 4- to 5-fold in the chassis cells. In summary, we provide an alternative codon optimization strategy for genes that perform poorly under traditional codon frequency-based optimization.
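The trade-off described above, where frequency-based optimization inherits the host's AU-rich codon preference, can be sketched as follows. This is a minimal illustration under stated assumptions: `USAGE` is a hypothetical toy codon table, not measured P. pastoris data, and `optimize_gc_prone` with its `min_freq` cutoff is a hypothetical heuristic for illustration, not the procedure used in the study.

```python
# Hypothetical toy codon-usage table (relative frequency of each synonymous
# codon per amino acid). Values are illustrative, NOT measured P. pastoris data.
USAGE = {
    "M": {"ATG": 1.00},
    "K": {"AAA": 0.47, "AAG": 0.53},
    "F": {"TTT": 0.54, "TTC": 0.46},
    "E": {"GAA": 0.57, "GAG": 0.43},
    "L": {"TTG": 0.33, "CTG": 0.16, "TTA": 0.16,
          "CTT": 0.16, "CTA": 0.11, "CTC": 0.08},
}


def gc_count(codon: str) -> int:
    """Number of G or C bases in a codon."""
    return sum(base in "GC" for base in codon)


def gc_content(seq: str) -> float:
    """Fraction of G + C in a nucleotide sequence."""
    return (seq.count("G") + seq.count("C")) / len(seq)


def optimize_by_frequency(protein: str, usage: dict) -> str:
    """Frequency-based back-translation: always pick the most used synonym."""
    return "".join(max(usage[aa], key=usage[aa].get) for aa in protein)


def optimize_gc_prone(protein: str, usage: dict, min_freq: float) -> str:
    """Hypothetical GC-prone variant: among synonyms used at >= min_freq,
    pick the most GC-rich codon, breaking ties by usage frequency."""
    codons = []
    for aa in protein:
        family = usage[aa]
        candidates = [c for c, f in family.items() if f >= min_freq] or list(family)
        codons.append(max(candidates, key=lambda c: (gc_count(c), family[c])))
    return "".join(codons)


freq_seq = optimize_by_frequency("MKFEL", USAGE)
gc_seq = optimize_gc_prone("MKFEL", USAGE, min_freq=0.15)
print(freq_seq, round(gc_content(freq_seq), 3))  # ATGAAGTTTGAATTG 0.267
print(gc_seq, round(gc_content(gc_seq), 3))      # ATGAAGTTCGAGCTG 0.467
```

On this toy table the frequency-based sequence has a GC content of about 0.27, while the GC-prone choice raises it to about 0.47 using only codons above the usage cutoff.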
Affiliation(s)
- Siyu Zhang
- State Key Laboratory of Bioreactor Engineering, East China University of Science and Technology, Shanghai 200237, China
- Ru Lin
- State Key Laboratory of Bioreactor Engineering, East China University of Science and Technology, Shanghai 200237, China
- Luyao Cui
- State Key Laboratory of Bioreactor Engineering, East China University of Science and Technology, Shanghai 200237, China
- Tianyi Jiang
- China Innovation Center of Roche, Shanghai 201203, China
- Jiacheng Shi
- State Key Laboratory of Bioreactor Engineering, East China University of Science and Technology, Shanghai 200237, China
- Chaoyu Lu
- State Key Laboratory of Bioreactor Engineering, East China University of Science and Technology, Shanghai 200237, China
- Pengfei Li
- State Key Laboratory of Bioreactor Engineering, East China University of Science and Technology, Shanghai 200237, China
- Mian Zhou
- State Key Laboratory of Bioreactor Engineering, East China University of Science and Technology, Shanghai 200237, China
2
Li Y, Wang R, Wang H, Pu F, Feng X, Jin L, Ma Z, Ma XX. Codon Usage Bias in Autophagy-Related Gene 13 in Eukaryotes: Uncovering the Genetic Divergence by the Interplay Between Nucleotides and Codon Usages. Front Cell Infect Microbiol 2021; 11:771010. [PMID: 34804999 PMCID: PMC8602353 DOI: 10.3389/fcimb.2021.771010] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Received: 09/05/2021] [Accepted: 10/12/2021] [Indexed: 12/15/2022]
Abstract
Synonymous codon usage bias is a universal characteristic of genomes across organisms. Autophagy-related gene 13 (atg13) is essential for autophagy initiation, yet the evolutionary trends of atg13 in nucleotide and synonymous codon usage remain unexplored. Phylogenetic analyses of the atg13 gene from 226 eukaryotic organisms at the nucleotide and amino acid levels show that nucleotide usage carries more genetic information than amino acid usage. Specifically, the overall nucleotide usage bias, quantified by information entropy, showed that usage biases at the first and second codon positions are stronger than at the third position of the atg13 genes. Furthermore, the bias level of nucleotide ‘G’ usage is the highest, and that of nucleotide ‘C’ usage the lowest, in the atg13 genes. In addition, genetic features represented by synonymous codon usage show a species-specific pattern in atg13 evolution to some extent. Interestingly, the codon usage of atg13 genes in the ancestral animals (Latimeria chalumnae, Petromyzon marinus, and Rhinatrema bivittatum) is strongly influenced by mutation pressure arising from nucleotide composition constraints. However, the distribution of nucleotide composition at different codon positions in atg13 shows that natural selection still dominates atg13 codon usage during the evolution of these organisms.
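Positional nucleotide bias of the kind described above can be quantified with Shannon entropy at each codon position, where lower entropy means stronger bias. This is a minimal sketch assuming base-2 Shannon entropy over raw A/C/G/T counts; the exact entropy formulation used in the study may differ, and `position_entropy` is a hypothetical helper name.

```python
import math
from collections import Counter


def position_entropy(cds: str) -> list[float]:
    """Shannon entropy (bits) of nucleotide usage at codon positions 1, 2 and 3.

    0.0 means a single nucleotide is used at that position (maximal bias);
    2.0 means A, C, G and T are used equally (no bias).
    """
    cds = cds.upper()
    if len(cds) % 3:
        raise ValueError("coding sequence length must be a multiple of 3")
    entropies = []
    for pos in range(3):
        counts = Counter(cds[pos::3])  # every third base, starting at this position
        total = sum(counts.values())
        # H = sum over bases of p * log2(1/p)
        h = sum((n / total) * math.log2(total / n) for n in counts.values())
        entropies.append(h)
    return entropies


print(position_entropy("AAAAAC"))        # [0.0, 0.0, 1.0]
print(position_entropy("ACGTACGTACGT"))  # [2.0, 2.0, 2.0]
```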
Affiliation(s)
- Yicong Li
- Biomedical Research Center, Northwest Minzu University, Lanzhou, China
- Rui Wang
- Viterbi School of Engineering, University of Southern California, Los Angeles, CA, United States
- Huihui Wang
- Biomedical Research Center, Northwest Minzu University, Lanzhou, China
- Feiyang Pu
- Biomedical Research Center, Northwest Minzu University, Lanzhou, China
- Xili Feng
- Biomedical Research Center, Northwest Minzu University, Lanzhou, China
- Li Jin
- Biomedical Research Center, Northwest Minzu University, Lanzhou, China
- Zhongren Ma
- Biomedical Research Center, Northwest Minzu University, Lanzhou, China
- Xiao-Xia Ma
- Biomedical Research Center, Northwest Minzu University, Lanzhou, China
3
Liu Y, Yang Q, Zhao F. Synonymous but Not Silent: The Codon Usage Code for Gene Expression and Protein Folding. Annu Rev Biochem 2021; 90:375-401. [PMID: 33441035 DOI: 10.1146/annurev-biochem-071320-112701] [Citation(s) in RCA: 109] [Impact Index Per Article: 27.3] [Indexed: 12/11/2022]
Abstract
Codon usage bias, the preference for certain synonymous codons, is found in all genomes. Although synonymous mutations were previously thought to be silent, a large body of evidence has demonstrated that codon usage can play major roles in determining gene expression levels and protein structures. Codon usage influences translation elongation speed and regulates translation efficiency and accuracy. Adaptation of codon usage to tRNA expression determines the proteome landscape. In addition, codon usage biases result in nonuniform ribosome decoding rates on mRNAs, which in turn influence the cotranslational protein folding process that is critical for protein function in diverse biological processes. Conserved genome-wide correlations have also been found between codon usage and protein structures. Furthermore, codon usage is a major determinant of mRNA levels through translation-dependent effects on mRNA decay and translation-independent effects on transcriptional and posttranscriptional processes. Here, we discuss the multifaceted roles and mechanisms of codon usage in different gene regulatory processes.
Affiliation(s)
- Yi Liu
- Department of Physiology, University of Texas Southwestern Medical Center, Dallas, Texas 75390-9040, USA
- Qian Yang
- Department of Physiology, University of Texas Southwestern Medical Center, Dallas, Texas 75390-9040, USA
- Fangzhou Zhao
- Department of Physiology, University of Texas Southwestern Medical Center, Dallas, Texas 75390-9040, USA
4
Abstract
Codon usage depends on mutation bias, tRNA-mediated selection, and the need for high efficiency and accuracy in translation. One codon in a synonymous codon family is often strongly over-used, especially in highly expressed genes, which often leads to a high dN/dS ratio because dS is very small. Many different codon usage indices have been proposed to measure codon usage and codon adaptation. Sense codons can be misread by release factors and stop codons misread by tRNAs, which in rare cases also contributes to codon usage. This chapter outlines the conceptual framework of codon evolution, illustrates codon-specific and gene-specific codon usage indices, and presents their applications. A new index of codon adaptation that accounts for background mutation bias (the Index of Translation Elongation) is presented and contrasted with the codon adaptation index (CAI), which does not consider background mutation bias. Both indices are used to reanalyze data from a recent paper claiming that translation elongation efficiency matters little in protein production; the reanalysis disproves that claim.
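For reference, the classic codon adaptation index (CAI) mentioned above is the geometric mean of relative adaptiveness values w, where each codon's count in a reference set of highly expressed genes is divided by the count of its most used synonym. This is a minimal sketch of standard CAI only, not of the Index of Translation Elongation introduced in the chapter. The three-family code excerpt, the 0.5 pseudocount for unobserved codons, and the one-gene reference set are illustrative assumptions; real analyses use all multi-codon families and a curated reference set.

```python
import math
from collections import Counter

# Illustrative excerpt of the standard genetic code (synonymous families only).
# Real CAI calculations use every multi-codon family; Met and Trp are excluded.
FAMILIES = {
    "Lys": ["AAA", "AAG"],
    "Glu": ["GAA", "GAG"],
    "Phe": ["TTT", "TTC"],
}


def codons(seq: str) -> list[str]:
    """Split an in-frame coding sequence into complete codons."""
    return [seq[i:i + 3] for i in range(0, len(seq) - 2, 3)]


def relative_adaptiveness(reference: list[str]) -> dict[str, float]:
    """w(codon) = count(codon) / count(most used synonym) in a reference set.

    Codons absent from the reference are given a count of 0.5 so that
    log(w) stays finite (a commonly used convention).
    """
    counts = Counter(c for seq in reference for c in codons(seq))
    w = {}
    for family in FAMILIES.values():
        adjusted = {c: counts[c] if counts[c] > 0 else 0.5 for c in family}
        best = max(adjusted.values())
        for c, n in adjusted.items():
            w[c] = n / best
    return w


def cai(seq: str, w: dict[str, float]) -> float:
    """CAI = geometric mean of w over the codons of seq that have a w value."""
    logs = [math.log(w[c]) for c in codons(seq) if c in w]
    if not logs:
        raise ValueError("sequence contains no scorable codons")
    return math.exp(sum(logs) / len(logs))


w = relative_adaptiveness(["AAGAAGAAAGAAGAA"])  # toy "highly expressed" reference
print(w["AAA"], w["GAG"])          # 0.5 0.25
print(cai("AAGGAA", w))            # 1.0
print(round(cai("AAAGAG", w), 4))  # 0.3536
```

A sequence built only from the reference's preferred codons scores 1.0; substituting less used synonyms pulls the geometric mean down.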
5
Hanson G, Coller J. Codon optimality, bias and usage in translation and mRNA decay. Nat Rev Mol Cell Biol 2017; 19:20-30. [PMID: 29018283 DOI: 10.1038/nrm.2017.91] [Citation(s) in RCA: 495] [Impact Index Per Article: 61.9] [Indexed: 02/07/2023]
Abstract
The advent of ribosome profiling and other tools to probe mRNA translation has revealed that codon bias - the uneven use of synonymous codons in the transcriptome - serves as a secondary genetic code: a code that guides the efficiency of protein production, the fidelity of translation and the metabolism of mRNAs. Recent advancements in our understanding of mRNA decay have revealed a tight coupling between ribosome dynamics and the stability of mRNA transcripts; this coupling integrates codon bias into the concept of codon optimality, or the effects that specific codons and tRNA concentrations have on the efficiency and fidelity of the translation machinery. In this Review, we first discuss the evidence for codon-dependent effects on translation, beginning with the basic mechanisms through which translation perturbation can affect translation efficiency, protein folding and transcript stability. We then discuss how codon effects are leveraged by the cell to tailor the proteome to maintain homeostasis, execute specific gene expression programmes of growth or differentiation and optimize the efficiency of protein production.
Affiliation(s)
- Gavin Hanson
- Center for RNA Science and Therapeutics, Case Western Reserve University, Cleveland, Ohio 44106, USA
- Jeff Coller
- Center for RNA Science and Therapeutics, Case Western Reserve University, Cleveland, Ohio 44106, USA
6
Multiple Transcript Properties Related to Translation Affect mRNA Degradation Rates in Saccharomyces cerevisiae. G3 (Bethesda) 2016; 6:3475-3483. [PMID: 27633789 PMCID: PMC5100846 DOI: 10.1534/g3.116.032276] [Citation(s) in RCA: 31] [Impact Index Per Article: 3.4] [Indexed: 01/07/2023]
Abstract
Degradation of mRNA contributes to variation in transcript abundance. Studies of individual mRNAs have shown that both cis and trans factors affect mRNA degradation rates. However, the factors underlying transcriptome-wide variation in mRNA degradation rates are poorly understood. We investigated the contribution of different transcript properties to transcriptome-wide degradation rate variation in the budding yeast, Saccharomyces cerevisiae, using multiple regression analysis. We find that multiple transcript properties are significantly associated with variation in mRNA degradation rates, and that a model incorporating these properties explains ∼50% of the genome-wide variance. Predictors of mRNA degradation rates include transcript length, ribosome density, biased codon usage, and GC content of the third position in codons. To experimentally validate these factors, we studied individual transcripts expressed from identical promoters. We find that decreasing ribosome density by mutating the first translational start site of a transcript increases its degradation rate. Using coding sequence variants of green fluorescent protein (GFP) that differ only at synonymous sites, we show that increased GC content of the third position of codons results in decreased rates of mRNA degradation. Thus, in steady-state conditions, a large fraction of genome-wide variation in mRNA degradation rates is determined by inherent properties of transcripts, many of which are related to translation, rather than specific regulatory mechanisms.
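GC3, the GC content at the third codon position, is one of the predictors listed above. A minimal sketch of how it is computed, using two hypothetical synonymous encodings of the same short peptide (Lys-Glu-Phe-Leu) that differ only at third positions; these toy sequences are not the GFP variants from the study, and `gc3` is a hypothetical helper name.

```python
def gc3(cds: str) -> float:
    """GC content at the third (mostly synonymous) position of each codon."""
    cds = cds.upper()
    if len(cds) % 3:
        raise ValueError("coding sequence length must be a multiple of 3")
    third = cds[2::3]  # bases at codon position 3
    return (third.count("G") + third.count("C")) / len(third)


# Both encode Lys-Glu-Phe-Leu (AAA/AAG, GAA/GAG, TTT/TTC, TTA/CTG).
at_rich = "AAAGAATTTTTA"
gc_rich = "AAGGAGTTCCTG"
print(gc3(at_rich), gc3(gc_rich))  # 0.0 1.0
```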
7
Shabalina SA, Spiridonov NA, Kashina A. Sounds of silence: synonymous nucleotides as a key to biological regulation and complexity. Nucleic Acids Res 2013; 41:2073-94. [PMID: 23293005 PMCID: PMC3575835 DOI: 10.1093/nar/gks1205] [Citation(s) in RCA: 194] [Impact Index Per Article: 16.2] [Indexed: 01/10/2023]
Abstract
Messenger RNA is a key component of an intricate regulatory network of its own. It accommodates numerous nucleotide signals that overlap protein coding sequences and are responsible for multiple levels of regulation and the generation of biological complexity. The wealth of structural and regulatory information that mRNA carries in addition to the encoded amino acid sequence raises the question of how these signals and overlapping codes are delineated along non-synonymous and synonymous positions in protein coding regions, especially in eukaryotes. Silent, or synonymous, codon positions, which do not determine the amino acid sequences of the encoded proteins, define mRNA secondary structure and stability and affect the rate of translation, folding and post-translational modifications of nascent polypeptides. RNA-level selection acts on synonymous sites in both prokaryotes and eukaryotes and is more common than previously thought. Selection pressure on coding regions follows a three-nucleotide periodic pattern of nucleotide base-pairing in mRNA, which is imposed by the genetic code. Synonymous positions of coding regions have a higher hybridization potential than non-synonymous positions, and are multifunctional in their regulatory and structural roles. Recent experimental evidence and analyses of mRNA structure and interspecies conservation suggest an evolutionary tradeoff between the selective pressures acting at the RNA and protein levels. Here we provide a comprehensive overview of the studies that define the role of silent positions in regulating RNA structure and processing, with downstream effects on proteins and their functions.
Affiliation(s)
- Svetlana A Shabalina
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD 20894, USA
8
Baycin-Hizal D, Tabb DL, Chaerkady R, Chen L, Lewis NE, Nagarajan H, Sarkaria V, Kumar A, Wolozny D, Colao J, Jacobson E, Tian Y, O'Meally RN, Krag SS, Cole RN, Palsson BO, Zhang H, Betenbaugh M. Proteomic analysis of Chinese hamster ovary cells. J Proteome Res 2012; 11:5265-76. [PMID: 22971049 DOI: 10.1021/pr300476w] [Citation(s) in RCA: 142] [Impact Index Per Article: 10.9] [Indexed: 02/07/2023]
Abstract
To complement the recent genomic sequencing of Chinese hamster ovary (CHO) cells, proteomic analysis was performed on CHO cells including the cellular proteome, secretome, and glycoproteome using tandem mass spectrometry (MS/MS) of multiple fractions obtained from gel electrophoresis, multidimensional liquid chromatography, and solid phase extraction of glycopeptides (SPEG). From the 120 different mass spectrometry analyses generating 682,097 MS/MS spectra, 93,548 unique peptide sequences were identified at a false discovery rate (FDR) of at most 0.02. A total of 6164 grouped proteins were identified from both glycoproteome and proteome analysis, representing an 8-fold increase in the number of proteins currently identified in the CHO proteome. Furthermore, this is the first proteomic study done using the CHO genome exclusively, which provides for more accurate identification of proteins. From this analysis, the CHO codon frequency was determined and found to be distinct from that of humans, which will facilitate expression of human proteins in CHO cells. Analysis of the combined proteomic and mRNA data sets indicated the enrichment of a number of pathways including protein processing and apoptosis but depletion of proteins involved in steroid hormone and glycosphingolipid metabolism. Five hundred four of the detected proteins carried N-acetylation modifications, and 1292 different proteins were observed to be N-glycosylated. This first large-scale proteomic analysis will enhance the knowledge base about CHO capabilities for recombinant expression and provide information useful in cell engineering efforts aimed at modifying CHO cellular functions.
Affiliation(s)
- Deniz Baycin-Hizal
- Department of Chemical and Biomolecular Engineering, Johns Hopkins University, Baltimore, Maryland, USA
9
TPS1 terminator increases mRNA and protein yield in a Saccharomyces cerevisiae expression system. Biosci Biotechnol Biochem 2011; 75:2234-6. [PMID: 22056446 DOI: 10.1271/bbb.110246] [Citation(s) in RCA: 20] [Impact Index Per Article: 1.4] [Indexed: 11/08/2022]
Abstract
Both terminators and promoters regulate gene expression. In Saccharomyces cerevisiae, the TPS1 terminator (TPS1t), coupled to a gene encoding a fluorescent protein, produced more transgenic mRNA and protein than did similar constructs containing other terminators, such as CYC1t, TDH3t, and PGK1t. This suggests that TPS1t can be used as a general terminator in the development of metabolically engineered yeast in high-yield systems.
10
Dikicioglu D, Karabekmez E, Rash B, Pir P, Kirdar B, Oliver SG. How yeast re-programmes its transcriptional profile in response to different nutrient impulses. BMC Syst Biol 2011; 5:148. [PMID: 21943358 PMCID: PMC3224505 DOI: 10.1186/1752-0509-5-148] [Citation(s) in RCA: 36] [Impact Index Per Article: 2.6] [Received: 05/05/2011] [Accepted: 09/25/2011] [Indexed: 01/18/2023]
Abstract
Background: A microorganism's ability to adapt to changes in its physicochemical or nutritional environment is crucial for its survival. The yeast Saccharomyces cerevisiae has developed mechanisms to respond to such environmental changes rapidly and effectively; such responses may demand a widespread re-programming of gene activity. The dynamics of the re-organization of the cellular activities of S. cerevisiae in response to the sudden and transient removal of either carbon or nitrogen limitation were studied by following both the short- and long-term changes in the yeast's transcriptomic profiles.
Results: The study, which spans timescales from seconds to hours, revealed the hierarchy of metabolic and genetic regulatory switches that allow yeast to adapt to, and recover from, a pulse of a previously limiting nutrient. At the transcriptome level, a glucose impulse evoked significant changes in the expression of genes concerned with glycolysis, carboxylic acid metabolism, oxidative phosphorylation, and nucleic acid and sulphur metabolism. In ammonium-limited cultures, an ammonium impulse resulted in significant changes in the expression of genes involved in nitrogen metabolism and ion transport. Although both perturbations evoked significant changes in the expression of genes involved in the machinery and process of protein synthesis, the transcriptomic response was delayed and less complex in the case of an ammonium impulse. Analysis of the regulatory events by two different system-level, network-based approaches provided further information about the dynamic organization of yeast cells in response to a nutritional change.
Conclusions: The study provided important information on the temporal organization of the transcriptome and the underlying regulatory events in response to both carbon and nitrogen impulses. It also revealed the importance of a long-term dynamic analysis of the response to the relaxation of a nutritional limitation for understanding the molecular basis of the cells' dynamic behaviour.
Affiliation(s)
- Duygu Dikicioglu
- Department of Chemical Engineering, Bogazici University, Bebek 34342, Istanbul, Turkey
11
Deane CM, Saunders R. The imprint of codons on protein structure. Biotechnol J 2011; 6:641-9. [DOI: 10.1002/biot.201000329] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.1] [Received: 01/21/2011] [Revised: 03/10/2011] [Accepted: 03/23/2011] [Indexed: 12/23/2022]
12
Kahali B, Ahmad S, Ghosh TC. Selective constraints in yeast genes with differential expressivity: codon pair usage and mRNA stability perspectives. Gene 2011; 481:76-82. [PMID: 21554930 DOI: 10.1016/j.gene.2011.04.009] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.4] [Received: 01/06/2011] [Revised: 04/18/2011] [Accepted: 04/19/2011] [Indexed: 01/22/2023]
Abstract
Protein translation has been shown to be dictated by evolutionary constraints, namely variations in tRNA availability and/or in codon-anticodon binding, which manifest as biased codon usage. Taking advantage of publicly available mRNA expression and protein abundance data for Saccharomyces cerevisiae, we performed a comprehensive analysis of the diverse factors that guide translation toward desired protein levels irrespective of whether the corresponding mRNA levels are high or low. This study shows that different combinations of most abundant and non-abundant tRNA isoacceptors are selected for in S. cerevisiae, which helps achieve optimum speed and accuracy in protein translation. This is accompanied by the strategic location of codon pairs in coherence with mRNA secondary structure folding stability for these combinations of tRNA isoacceptors. We thus find that codon pair contextual effects, in addition to tRNA abundance and mRNA folding stability during translation elongation, play plausible roles in maintaining the translation accuracy and speed needed to achieve desired protein levels.
Affiliation(s)
- Bratati Kahali
- Bioinformatics Centre, Bose Institute, C.I.T. Scheme VII M, Kolkata, India.
13
Features of recent codon evolution: a comparative polymorphism-fixation study. J Biomed Biotechnol 2010; 2010:202918. [PMID: 20622912 PMCID: PMC2896653 DOI: 10.1155/2010/202918] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.2] [Received: 03/14/2010] [Accepted: 03/31/2010] [Indexed: 11/17/2022]
Abstract
Features of amino acid and codon changes can provide important insights into protein evolution. So far, investigators have often examined mutation patterns at the level of either interspecies fixed substitutions or intraspecies nucleotide polymorphisms, but not both. Here, we performed a unique analysis of a combined set of intraspecies polymorphisms and interspecies substitutions in human codons. Strong differences in mutational pattern were found at codon positions 1, 2, and 3 between the polymorphism and fixation data. Fixation was strongly biased toward increasing the rarest codons and decreasing the most frequently used codons, suggesting that codon equilibrium has not yet been reached. We detected a strong CpG effect on CG-containing codons and its subsequent suppression by fixation. Finally, we detected the signature of purifying selection against Amid R:U dinucleotides at synonymous dicodon boundaries. Overall, the fixation process could effectively and quickly correct the volatile changes introduced by polymorphisms, so that codon changes could be gradual and directional and codon composition could be kept relatively stable during evolution.
14
Chen W, Luo LF, Zhang LR, Xing YQ. Nucleosome Positioning and RNA Splicing. Prog Biochem Biophys 2009. [DOI: 10.3724/sp.j.1206.2008.00816] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/25/2022]
15
Huang Y, Koonin EV, Lipman DJ, Przytycka TM. Selection for minimization of translational frameshifting errors as a factor in the evolution of codon usage. Nucleic Acids Res 2009; 37:6799-810. [PMID: 19745054 PMCID: PMC2777431 DOI: 10.1093/nar/gkp712] [Citation(s) in RCA: 27] [Impact Index Per Article: 1.7] [Indexed: 11/14/2022]
Abstract
In a wide range of genomes, it was observed that the usage of synonymous codons is biased toward specific codons and codon patterns. Factors that are implicated in the selection for codon usage include facilitation of fast and accurate translation. There are two types of translational errors: missense errors and processivity errors. There is considerable evidence in support of the hypothesis that codon usage is optimized to minimize missense errors. In contrast, little is known about the relationship between codon usage and frameshifting errors, an important form of processivity errors, which appear to occur at frequencies comparable to the frequencies of missense errors. Based on the recently proposed pause-and-slip model of frameshifting, we developed the Frameshifting Robustness Score (FRS). We used this measure to test if the pattern of codon usage indicates optimization against frameshifting errors. We found that the FRS values of protein-coding sequences from four analyzed genomes (the bacteria Bacillus subtilis and Escherichia coli, and the yeasts Saccharomyces cerevisiae and Schizosaccharomyces pombe) were typically higher than expected by chance. Other properties of FRS patterns observed in B. subtilis, S. cerevisiae and S. pombe, such as the tendency of FRS to increase from the 5′- to 3′-end of protein-coding sequences, were also consistent with the hypothesis of optimization against frameshifting errors in translation. For E. coli, the results of different tests were less consistent, suggestive of a much weaker optimization, if any. Collectively, the results fit the concept of selection against mistranslation-induced protein misfolding being one of the factors shaping the evolution of both coding and non-coding sequences.
Affiliation(s)
- Yang Huang
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD 20894, USA
16
Warnecke T, Weber CC, Hurst LD. Why there is more to protein evolution than protein function: splicing, nucleosomes and dual-coding sequence. Biochem Soc Trans 2009; 37:756-61. [PMID: 19614589 DOI: 10.1042/bst0370756] [Citation(s) in RCA: 33] [Impact Index Per Article: 2.1] [Indexed: 12/19/2022]
Abstract
There is considerable variation in the rate at which different proteins evolve. Why is this? Classically, it has been considered that the density of functionally important sites must predict rates of protein evolution. Likewise, amino acid choice is usually assumed to reflect optimal protein function. In the present article, we briefly review evidence suggesting that this protein function-centred view is too simplistic. In particular, we concentrate on how selection acting during the protein's production history can also affect protein evolutionary rates and amino acid choice. Exploring the role of selection at the DNA and RNA level, we specifically address how the need (i) to specify exonic splice enhancer motifs in pre-mRNA, and (ii) to ensure nucleosome positioning on DNA have an impact on amino acid choice and rates of evolution. For both, we review evidence that sequence affected by more than one coding demand is particularly constrained. Strikingly, in mammals, splicing-related constraints are quantitatively as important as expression parameters in predicting rates of protein evolution. These results indicate that there is substantially more to protein evolution than protein functional constraints.
Affiliation(s)
- Tobias Warnecke
- Department of Biology and Biochemistry, University of Bath, Bath BA2 7AY, UK
17
Suzuki H, Brown CJ, Forney LJ, Top EM. Comparison of correspondence analysis methods for synonymous codon usage in bacteria. DNA Res 2008; 15:357-65. [PMID: 18940873 PMCID: PMC2608848 DOI: 10.1093/dnares/dsn028] [Citation(s) in RCA: 55] [Impact Index Per Article: 3.2] [Received: 06/21/2008] [Accepted: 09/24/2008] [Indexed: 12/02/2022]
Abstract
Synonymous codon usage varies both between organisms and among genes within a genome, and arises due to differences in G + C content, replication strand skew, or gene expression levels. Correspondence analysis (CA) is widely used to identify major sources of variation in synonymous codon usage among genes and provides a way to identify horizontally transferred or highly expressed genes. Four methods of CA have been developed based on three kinds of input data: absolute codon frequency, relative codon frequency, and relative synonymous codon usage (RSCU) as well as within-group CA (WCA). Although different CA methods have been used in the past, no comprehensive comparative study has been performed to evaluate their effectiveness. Here, the four CA methods were evaluated by applying them to 241 bacterial genome sequences. The results indicate that WCA is more effective than the other three methods in generating axes that reflect variations in synonymous codon usage. Furthermore, WCA reveals sources that were previously unnoticed in some genomes; e.g. synonymous codon usage related to replication strand skew was detected in Rickettsia prowazekii. Though CA based on RSCU is widely used, our evaluation indicates that this method does not perform as well as WCA.
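RSCU, one of the input types compared above, divides each codon's observed count by the count expected if all synonyms of its amino acid were used equally, so RSCU = 1 means no bias. A minimal sketch, assuming a two-family excerpt of the standard genetic code for brevity (`FAMILIES` and `rscu` are hypothetical names); the correspondence analysis that would consume these values is not shown.

```python
from collections import Counter

# Illustrative excerpt of the standard genetic code (synonymous families only).
FAMILIES = {
    "Lys": ["AAA", "AAG"],
    "Gly": ["GGT", "GGC", "GGA", "GGG"],
}


def rscu(cds: str) -> dict[str, float]:
    """Relative synonymous codon usage for the families in FAMILIES.

    RSCU(codon) = observed count / (family total / number of synonyms).
    """
    counts = Counter(cds[i:i + 3] for i in range(0, len(cds) - 2, 3))
    values = {}
    for family in FAMILIES.values():
        total = sum(counts[c] for c in family)
        if total == 0:
            continue  # amino acid absent from the sequence: RSCU undefined
        expected = total / len(family)
        for c in family:
            values[c] = counts[c] / expected
    return values


print(rscu("AAAAAGAAGAAGGGTGGC"))
# {'AAA': 0.5, 'AAG': 1.5, 'GGT': 2.0, 'GGC': 2.0, 'GGA': 0.0, 'GGG': 0.0}
```

Because RSCU normalizes within each amino acid, it removes differences in amino acid composition between genes, which is why it is a common input for correspondence analysis of codon usage.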
Affiliation(s)
- Haruo Suzuki
- Department of Biological Sciences and Initiative for Bioinformatics and Evolutionary Studies, University of Idaho, PO Box 443051, Moscow, Idaho 83844-3051, USA.
18
Abstract
The persistent difficulties in the production of protein at high levels in heterologous systems, as well as the inability to understand pathologies associated with protein aggregation, highlight our limited knowledge on the mechanisms of protein folding in vivo. Attempts to improve yield and quality of recombinant proteins are diverse, frequently involving optimization of the cell growth temperature, the use of synonymous codons and/or the co-expression of tRNAs, chaperones and folding catalysts among others. Although protein secondary structure can be determined largely by the amino acid sequence, protein folding within the cell is affected by a range of factors beyond amino acid sequence. The folding pathway of a nascent polypeptide can be affected by transient interactions with other proteins and ligands, the ribosome, translocation through a pore membrane, redox conditions, among others. The translation rate as well as the translation machinery itself can dramatically affect protein folding, and thus the structure and function of the protein product. This review addresses current efforts to better understand how the use of synonymous codons in the mRNA and the availability of tRNAs can modulate translation kinetics, affecting the folding, the structure and the biological activity of proteins.
Affiliation(s)
- Monica Marin
- Facultad de Ciencias, Universidad de la República, Montevideo, Uruguay.
19
Sánchez J, López-Villaseñor I. A simple model to explain three-base periodicity in coding DNA. FEBS Lett 2006; 580:6413-22. [PMID: 17097640 DOI: 10.1016/j.febslet.2006.10.056] [Received: 07/04/2006] [Revised: 10/04/2006] [Accepted: 10/19/2006]
Abstract
A simple model is put forward to explain the long-known three-base periodicity in coding DNA. We propose the concept of same-phase triplet clustering, i.e. a condition wherein a triplet appears several times in one phase without interruption by the two other possible phases. For instance, in sequence (i): NTT_GNN_NTT_GNN_NTT_GNN_NNN_NTT_GNN (where N is any nucleotide, but combinations producing TTG are excluded) same-phase TTG triplets are clustered, because this triplet appears uninterruptedly in phase 2. In contrast, in sequence (ii): TTG_NTT_GNN_NNT_TGN_NNN_NTT_GNN there is no same-phase clustering, because neighboring TTGs are all in different phases. Note also that in sequence (i) the TTG triplets are separated by 3, 3 and 6 nucleotides (3n distances), whereas in sequence (ii) they are separated by 1, 4 and 5 nucleotides (non-3n distances). In this work, we demonstrate that in coding DNA the 3n distances generated by (i)-type sequences proportionally outnumber the non-3n distances generated by (ii)-type sequences, and we propose that this excess is the basis of three-base periodicity. Randomized sequences also contained (i)- and (ii)-type sequences, but their clustering was statistically different. To prove our model, we generated (i)-type sequences in a randomized sequence by inducing clustering of same-phase triplets. In agreement with the model, this sequence displayed three-base periodicity. Furthermore, two- and four-base periodicities could also be induced by artificially clustering duplets and tetraplets.
Affiliation(s)
- Joaquín Sánchez
- Facultad de Medicina, UAEM, Av. Universidad 1001, Cuernavaca, Morelos, CP 62210, México D.F., Mexico.
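The distance bookkeeping in the Sánchez and López-Villaseñor abstract is easy to reproduce. The sketch below is an illustration, not the authors' software: it finds every start position of a triplet, then reports the number of nucleotides separating consecutive copies, the reading phase (1, 2 or 3) of each copy, and how many separations are multiples of three. The two test sequences are sequences (i) and (ii) from the abstract, with N as a placeholder that never forms TTG; all function names are hypothetical.

```python
import re


def triplet_starts(seq: str, triplet: str) -> list[int]:
    """0-based start positions of every (possibly overlapping) occurrence."""
    return [m.start() for m in re.finditer(f"(?={re.escape(triplet)})", seq)]


def separations(seq: str, triplet: str) -> list[int]:
    """Nucleotides lying between consecutive occurrences of the triplet."""
    starts = triplet_starts(seq, triplet)
    return [b - a - len(triplet) for a, b in zip(starts, starts[1:])]


def phases(seq: str, triplet: str) -> list[int]:
    """Reading phase (1, 2 or 3) of each occurrence, counted from the first base."""
    return [s % 3 + 1 for s in triplet_starts(seq, triplet)]


def split_3n(distances: list[int]) -> tuple[int, int]:
    """Return (number of 3n distances, number of non-3n distances)."""
    n3 = sum(1 for d in distances if d % 3 == 0)
    return n3, len(distances) - n3


# Sequences (i) and (ii) from the abstract; underscores mark codon boundaries.
seq_i = "NTT_GNN_NTT_GNN_NTT_GNN_NNN_NTT_GNN".replace("_", "")
seq_ii = "TTG_NTT_GNN_NNT_TGN_NNN_NTT_GNN".replace("_", "")

for name, seq in (("(i)", seq_i), ("(ii)", seq_ii)):
    d = separations(seq, "TTG")
    print(name, "separations:", d, "phases:", phases(seq, "TTG"), "3n/non-3n:", split_3n(d))
```

For (i) this gives separations [3, 3, 6] with every TTG in phase 2, and for (ii) separations [1, 4, 5] with phases 1, 2, 3, 2, matching the abstract. The lookahead pattern also catches self-overlapping triplets such as TTT, whose adjacent copies yield negative separations that are counted as non-3n.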
20
Qu HQ, Lawrence SG, Guo F, Majewski J, Polychronakos C. Strand bias in complementary single-nucleotide polymorphisms of transcribed human sequences: evidence for functional effects of synonymous polymorphisms. BMC Genomics 2006; 7:213. [PMID: 16916449 PMCID: PMC1559705 DOI: 10.1186/1471-2164-7-213] [Received: 03/09/2006] [Accepted: 08/17/2006]
Abstract
Background: Complementary single-nucleotide polymorphisms (SNPs) may not be distributed equally between the two DNA strands if the strands are functionally distinct, as in transcribed genes. In introns, an excess of A↔G over the complementary C↔T substitutions had previously been found and attributed to transcription-coupled repair (TCR), demonstrating the valuable functional clues that such asymmetry can provide. Here we studied the asymmetry of human synonymous SNPs (sSNPs) at fourfold degenerate (FFD) sites, compared with intronic SNPs (iSNPs). Results: The ancestral bases and the direction of mutations were inferred from human-chimpanzee genomic alignment. After correction for background nucleotide composition, an excess of A→G over the complementary T→C polymorphisms, which had been observed previously and can be explained by TCR, was confirmed in both FFD SNPs and iSNPs. However, when SNPs were examined separately according to whether or not they mapped to a CpG dinucleotide, an excess of C→T over G→A polymorphisms was found in non-CpG-site FFD SNPs but was absent from iSNPs and CpG-site FFD SNPs. Conclusion: This genome-wide discrepancy in human FFD SNPs provides novel evidence for widespread selective pressure due to functional effects of sSNPs. The similar asymmetry patterns of iSNPs and of FFD SNPs that map to a CpG can be explained by transcription-coupled mechanisms, including TCR and transcription-coupled mutation. Because CpG sites are hypermutable, a larger proportion of CpG-site FFD SNPs are recent and have been exposed to less selection than non-CpG FFD SNPs, which can explain the asymmetric discrepancy between CpG-site and non-CpG-site FFD SNPs.
Affiliation(s)
- Hui-Qi Qu
- Endocrine Genetics Laboratory, The McGill University Health Center (Montreal Children's Hospital), Montréal, Québec, Canada
- Steve G Lawrence
- Department of Human Genetics, McGill University, Montréal, Québec, Canada
- Fan Guo
- Endocrine Genetics Laboratory, The McGill University Health Center (Montreal Children's Hospital), Montréal, Québec, Canada
- Jacek Majewski
- Department of Human Genetics, McGill University, Montréal, Québec, Canada
- Constantin Polychronakos
- Endocrine Genetics Laboratory, The McGill University Health Center (Montreal Children's Hospital), Montréal, Québec, Canada
- Department of Pediatrics, The McGill University Health Center (Montreal Children's Hospital), 2300 Tupper, Montréal, Québec H3H 1P3, Canada
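The core comparison in the Qu et al. abstract, a substitution versus its complement on the same (transcribed) strand after correcting for base composition, can be sketched as below. This is an assumed, simplified scheme rather than the authors' pipeline: each SNP is an (ancestral, derived) base pair read on the coding strand, the ancestral state is taken as already inferred (for example from a chimpanzee outgroup), and the background correction simply divides by the number of surveyed sites carrying the ancestral base. A ratio above 1 indicates an excess of the first change over its complement. The function names and the toy counts are hypothetical.

```python
from collections import Counter

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def rate(snps, background, anc, der):
    """Number of anc->der SNPs per surveyed site whose ancestral base is `anc`."""
    hits = sum(1 for a, d in snps if (a, d) == (anc, der))
    return hits / background[anc]


def strand_asymmetry(snps, background, anc, der):
    """Ratio of anc->der to its complement (e.g. A->G vs T->C) on one strand.

    Raises ZeroDivisionError if a base is absent from the background or the
    complementary change is never observed.
    """
    comp_anc, comp_der = anc.translate(COMPLEMENT), der.translate(COMPLEMENT)
    return rate(snps, background, anc, der) / rate(snps, background, comp_anc, comp_der)


if __name__ == "__main__":
    # Toy data: sense-strand base counts at the surveyed sites, and SNP calls.
    background = Counter({"A": 100, "T": 50, "C": 80, "G": 80})
    snps = [("A", "G")] * 20 + [("T", "C")] * 5 + [("C", "T")] * 8 + [("G", "A")] * 4
    print(strand_asymmetry(snps, background, "A", "G"))  # A->G vs T->C
    print(strand_asymmetry(snps, background, "C", "T"))  # C->T vs G->A
```

With these toy counts both ratios are 2: raw A→G calls outnumber T→C calls fourfold, but half of that excess disappears once the larger number of ancestral A sites is taken into account, which is why the composition correction matters.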
21
Chamary JV, Parmley JL, Hurst LD. Hearing silence: non-neutral evolution at synonymous sites in mammals. Nat Rev Genet 2006; 7:98-108. [PMID: 16418745 DOI: 10.1038/nrg1770]
Abstract
Although the assumption of the neutral theory of molecular evolution - that some classes of mutation have too small an effect on fitness to be affected by natural selection - seems intuitively reasonable, over the past few decades the theory has been in retreat. At least in species with large populations, even synonymous mutations in exons are not neutral. By contrast, in mammals, neutrality of these mutations is still commonly assumed. However, new evidence indicates that even some synonymous mutations are subject to constraint, often because they affect splicing and/or mRNA stability. This has implications for understanding disease, optimizing transgene design, detecting positive selection and estimating the mutation rate.
Affiliation(s)
- J V Chamary
- Center for Integrative Genomics, University of Lausanne, Switzerland.
22
Zhao H, Li QZ, Zeng CQ, Yang HM, Yu J. Neighboring-nucleotide effects on the mutation patterns of the rice genome. Genomics Proteomics Bioinformatics 2006; 3:158-68. [PMID: 16487081 PMCID: PMC5172528 DOI: 10.1016/s1672-0229(05)03021-4]
Abstract
DNA composition dynamics across taxonomically diverse genomes are a major subject of genome analysis. Changes in DNA composition reflect the characteristics of both the replication and the repair machineries. We investigated 3,611,007 single nucleotide polymorphisms (SNPs) generated by comparing two sequenced rice genomes from distant inbred lines (subspecies), including those from 242,811 introns and 45,462 protein-coding sequences (CDSs). The neighboring-nucleotide effects (NNEs) of these SNPs are diverse, depending on the structural content-based classification (genome-wide, intronic, or CDS) and the sequence context-based category (A/C, A/G, A/T, C/G, C/T, or G/T substitution) of the analyzed SNPs. Strong NNEs and nucleotide proportion biases were observed within 1-3 bp on both sides of a SNP. Neighboring nucleotides of protein-coding SNPs were strongly biased and exhibited a three-base periodicity in nucleotide content, constrained by the combined effect of codon-related rules and DNA repair mechanisms. Unlike a previous finding in the human genome, we found a negative correlation between the GC content of chromosomes and the magnitude of the corresponding bias toward C at the −1 site and G at the +1 site. These results will further our understanding of the mutation mechanism in rice and its evolutionary implications.
Affiliation(s)
- Hui Zhao
- Beijing Genomics Institute, Chinese Academy of Sciences (CAS), Beijing 101300, China
- Graduate School, CAS, Beijing 100039, China
- Qi-Zhai Li
- Beijing Genomics Institute, Chinese Academy of Sciences (CAS), Beijing 101300, China
- Academy of Mathematics and Systems Science, CAS, Beijing 100080, China
- Graduate School, CAS, Beijing 100039, China
- Chang-Qing Zeng
- Beijing Genomics Institute, Chinese Academy of Sciences (CAS), Beijing 101300, China
- Huan-Ming Yang
- Beijing Genomics Institute, Chinese Academy of Sciences (CAS), Beijing 101300, China
- Corresponding authors.
- Jun Yu
- Beijing Genomics Institute, Chinese Academy of Sciences (CAS), Beijing 101300, China
- Corresponding authors.
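A neighboring-nucleotide profile of the kind analysed by Zhao et al. tabulates which bases occur at fixed offsets (here −3 to +3) around each SNP. A minimal sketch follows; the function name and the use of simple per-offset proportions are assumptions, and it reproduces neither the authors' statistics nor their split by substitution category.

```python
from collections import Counter


def neighbor_profile(seq: str, snp_positions, flank: int = 3) -> dict:
    """Base proportions at offsets -flank..-1 and +1..+flank around each SNP.

    Offsets that fall outside the sequence are skipped for that SNP, and
    offsets with no observations are omitted from the result.
    """
    counts = {off: Counter() for off in range(-flank, flank + 1) if off != 0}
    for pos in snp_positions:
        for off, counter in counts.items():
            q = pos + off
            if 0 <= q < len(seq):
                counter[seq[q]] += 1
    return {
        off: {base: n / sum(c.values()) for base, n in sorted(c.items())}
        for off, c in counts.items()
        if c
    }


if __name__ == "__main__":
    reference = "ACGTTACGCAGTCCGA"
    for offset, props in neighbor_profile(reference, [4, 9, 12]).items():
        print(f"{offset:+d}", props)
```

Comparing such profiles against the genome-wide base composition, separately for intronic and coding SNPs, is what would reveal the biases and the coding-region periodicity described in the abstract.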
23
Siwach P, Pophaly SD, Ganesh S. Genomic and evolutionary insights into genes encoding proteins with single amino acid repeats. Mol Biol Evol 2006; 23:1357-69. [PMID: 16618963 DOI: 10.1093/molbev/msk022]
Abstract
Mutations causing expansion of amino acid repeats are responsible for 19 hereditary disorders. Repeats in several other proteins also show length variation. These observations prompted us to identify single amino acid repeat-containing proteins (SARPs) in humans and to understand their functional and evolutionary significance. We identified 8812 SARPs containing 17 146 repeat domains, each harboring 4 or more residues. In all, 5% of SARPs (471) showed repeat length variation, and nearly 84% of these (394) have repeats of 10 residues or fewer. We find that SARPs are involved in functions that require the formation of multiprotein complexes. Nearly 78% (6859) of the SARPs had no paralogue in the human proteome; such proteins are termed orphan SARPs. Orphan SARPs show longer repeat stretches, longer peptide length, and lower expression levels than SARPs belonging to protein families. Because expression level is known to correlate inversely with the rate of protein sequence evolution, our results suggest that orphan SARPs evolve faster than the familial forms and are therefore under weaker selection pressure. We also find that although GC-rich codons are favored for coding the repeat tracts of SARPs, specific codons, and not nucleotide motifs per se, are selected, suggesting functional constraints on codon usage. One such constraint could be mRNA stability, as clustering of rare codons is known to destabilize transcripts and rare codons are not favored for coding repeat tracts. Genes encoding polymorphic SARPs show preferential localization toward telomeric segments. Further, the sex-specific recombination rates of the chromosomal locus correlate strongly with the parental sex that influences repeat instability in disorders caused by dynamic mutation. Therefore, instability associated with repeats might be driven by processes specific to sperm or oocyte development, and recombination frequency might play a positive role in this process.
Affiliation(s)
- Pratibha Siwach
- Department of Biological Sciences and Bioengineering, Indian Institute of Technology, Kanpur, India
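Scanning a protein for single amino acid repeats with the threshold used by Siwach et al. (4 or more identical consecutive residues) takes one regular expression with a backreference. The sketch is illustrative only: the abstract does not say whether interrupted (impure) repeats were counted, so this version reports pure runs, and the function name is hypothetical.

```python
import re


def find_repeats(protein: str, min_len: int = 4) -> list[tuple[str, int, int]]:
    """Return (residue, 0-based start, length) for each run of >= min_len identical residues."""
    # ([A-Z]) captures one residue; \1{n,} requires at least n further copies of it.
    pattern = re.compile(r"([A-Z])\1{%d,}" % (min_len - 1))
    return [(m.group(1), m.start(), len(m.group(0))) for m in pattern.finditer(protein.upper())]


if __name__ == "__main__":
    print(find_repeats("MAQQQQLPPPPPKEEEG"))
```

On this toy sequence the polyglutamine run of 4 and the polyproline run of 5 are reported, while the run of three glutamates falls below the threshold; lowering `min_len` to 3 would include it.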
24
Current awareness on yeast. Yeast 2005; 22:1249-56. [PMID: 16320446 DOI: 10.1002/yea.1170]
25
Chamary JV, Hurst LD. Evidence for selection on synonymous mutations affecting stability of mRNA secondary structure in mammals. Genome Biol 2005; 6:R75. [PMID: 16168082 PMCID: PMC1242210 DOI: 10.1186/gb-2005-6-9-r75] [Received: 04/27/2005] [Revised: 06/08/2005] [Accepted: 07/20/2005]
Abstract
BACKGROUND In mammals, contrary to what is usually assumed, recent evidence suggests that synonymous mutations may not be selectively neutral. This position has proven contentious, not least because of the absence of a viable mechanism. Here we test whether synonymous mutations might be under selection owing to their effects on the thermodynamic stability of mRNA, mediated by changes in secondary structure. RESULTS We provide numerous lines of evidence that are all consistent with the above hypothesis. Most notably, by simulating evolution and reallocating the substitutions observed in the mouse lineage, we show that the location of synonymous mutations is non-random with respect to stability. Importantly, the preference for cytosine at 4-fold degenerate sites, diagnostic of selection, can be explained by its effect on mRNA stability. Likewise, by interchanging synonymous codons, we find naturally occurring mRNAs to be more stable than simulant transcripts. Housekeeping genes, whose proteins are under strong purifying selection, are also under the greatest pressure to maintain stability. CONCLUSION Taken together, our results provide evidence that, in mammals, synonymous sites do not evolve neutrally, at least in part owing to selection on mRNA stability. This has implications for the application of synonymous divergence in estimating the mutation rate.
Affiliation(s)
- JV Chamary
- Department of Biology and Biochemistry, University of Bath, Bath BA2 7AY, UK
- Laurence D Hurst
- Department of Biology and Biochemistry, University of Bath, Bath BA2 7AY, UK
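The control described in the Chamary and Hurst results, interchanging synonymous codons to build simulant transcripts, can be approximated by permuting codons among the positions that encode the same amino acid, which keeps both the protein and the gene's overall codon usage fixed. The sketch below assumes this particular scheme (the abstract gives no details) and the standard genetic code; the function names are hypothetical. Scoring each simulant's folding energy would additionally need an RNA folding tool, for example the `RNA.fold` function in the ViennaRNA Python bindings, which is left out to keep the example self-contained.

```python
import random
from collections import defaultdict
from itertools import product
from typing import Optional

# Standard genetic code; codons enumerated in TCAG order (TTT, TTC, TTA, TTG, ...).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def codons_of(cds: str) -> list[str]:
    """Split an in-frame coding sequence into codons (trailing partial codon dropped)."""
    cds = cds.upper()
    return [cds[i:i + 3] for i in range(0, len(cds) - 2, 3)]


def translate(cds: str) -> str:
    return "".join(CODE[c] for c in codons_of(cds))


def synonymous_shuffle(cds: str, rng: Optional[random.Random] = None) -> str:
    """Permute codons among positions encoding the same amino acid.

    The protein and the gene's codon counts are unchanged; only the order
    in which synonymous codons appear along the mRNA is randomized.
    """
    rng = rng or random.Random()
    codons = codons_of(cds)
    positions = defaultdict(list)
    for i, codon in enumerate(codons):
        positions[CODE[codon]].append(i)
    shuffled = list(codons)
    for idx in positions.values():
        pool = [codons[i] for i in idx]
        rng.shuffle(pool)
        for i, codon in zip(idx, pool):
            shuffled[i] = codon
    return "".join(shuffled)


if __name__ == "__main__":
    native = "ATGGCTGCAGCCGCGCTTCTGTTATAA"
    rng = random.Random(1)
    for _ in range(3):
        print(synonymous_shuffle(native, rng))
```

Shuffling within the gene, rather than drawing codons from genome-wide frequencies, means any stability difference between native and simulant transcripts reflects codon placement alone, not a change in the gene's codon composition.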