1
Hernández G, García A, Weingarten-Gabbay S, Mishra R, Hussain T, Amiri M, Moreno-Hagelsieb G, Montiel-Dávalos A, Lasko P, Sonenberg N. Functional analysis of the AUG initiator codon context reveals novel conserved sequences that disfavor mRNA translation in eukaryotes. Nucleic Acids Res 2024; 52:1064-1079. [PMID: 38038264 PMCID: PMC10853783 DOI: 10.1093/nar/gkad1152] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Received: 02/22/2023] [Revised: 11/09/2023] [Accepted: 11/15/2023] [Indexed: 12/02/2023]
Abstract
mRNA translation is a fundamental process for life. Selection of the translation initiation site (TIS) is crucial, as it establishes the correct open reading frame for mRNA decoding. Studies in vertebrate mRNAs discovered that a purine at -3 and a G at +4 (where the A of the AUG initiator codon is numbered +1) promote TIS recognition. However, the TIS context in other eukaryotes has been poorly analyzed experimentally. We analyzed in vitro the influence of the -3, -2, -1 and +4 positions of the TIS context in rabbit, Drosophila, wheat, and yeast. We observed that -3A conferred the best translational efficiency across these species. However, we found variability at the +4 position for optimal translation. In addition, the Kozak motif that was defined from mammalian cells was only weakly predictive for wheat and essentially non-predictive for yeast. We discovered eight conserved sequences that significantly disfavored translation. Due to the big differences in translational efficiency observed among weak TIS context sequences, we define a novel category that we termed 'barren AUG context sequences (BACS)', which represent sequences disfavoring translation. Analysis of mRNA-ribosomal complex structures provided insights into the function of BACS. The gene ontology of the BACS-containing mRNAs is presented.
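The position numbering used in this abstract can be made concrete with a small sketch. The Python below reads the -3, -2, -1 and +4 nucleotides around a given AUG and checks for the two vertebrate features the abstract mentions (a purine at -3 and a G at +4). The function names `tis_context` and `has_vertebrate_features` are hypothetical and chosen for illustration; this is not the authors' analysis pipeline, and it says nothing about measured translational efficiency.

```python
def tis_context(mrna: str, aug_index: int) -> dict:
    """Return the nucleotides at -3, -2, -1 and +4 around an AUG.

    Numbering follows the abstract: the A of AUG is +1 and there is no
    position 0, so -1 is the base immediately upstream of that A.
    aug_index is the 0-based index of the A (an assumption of this sketch).
    """
    seq = mrna.upper().replace("T", "U")  # accept DNA or RNA spelling
    if seq[aug_index:aug_index + 3] != "AUG":
        raise ValueError("no AUG at the given index")
    if aug_index < 3 or aug_index + 3 >= len(seq):
        raise ValueError("sequence too short to read positions -3 to +4")
    return {
        "-3": seq[aug_index - 3],
        "-2": seq[aug_index - 2],
        "-1": seq[aug_index - 1],
        "+4": seq[aug_index + 3],  # first base after the AUG
    }


def has_vertebrate_features(context: dict) -> bool:
    """True when -3 is a purine (A or G) and +4 is G."""
    return context["-3"] in "AG" and context["+4"] == "G"
```

For example, `tis_context("GCCACCAUGGCG", 6)` reads A at -3 and G at +4, so `has_vertebrate_features` returns `True` for that context.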
Affiliation(s)
- Greco Hernández
  - mRNA and Cancer Laboratory, Unit of Biomedical Research on Cancer, National Institute of Cancer (INCan), Mexico City 14080, Mexico
- Alejandra García
  - mRNA and Cancer Laboratory, Unit of Biomedical Research on Cancer, National Institute of Cancer (INCan), Mexico City 14080, Mexico
- Shira Weingarten-Gabbay
  - Broad Institute of MIT and Harvard, Cambridge, MA, USA
  - Department of Organismic and Evolutionary Biology, Harvard University, Cambridge, MA, USA
  - Laboratory of Virology and Infectious Disease, The Rockefeller University, New York, NY, USA
- Rishi Kumar Mishra
  - Department of Developmental Biology and Genetics, Indian Institute of Science, Bengaluru-560012, India
- Tanweer Hussain
  - Department of Developmental Biology and Genetics, Indian Institute of Science, Bengaluru-560012, India
- Mehdi Amiri
  - Department of Biochemistry and Goodman Cancer Institute, McGill University, Montreal, QC H3A 1A3, Canada
- Gabriel Moreno-Hagelsieb
  - Department of Biology, Wilfrid Laurier University, 75 University Ave. W, Waterloo, ON N2L 3C5, Canada
- Angélica Montiel-Dávalos
  - mRNA and Cancer Laboratory, Unit of Biomedical Research on Cancer, National Institute of Cancer (INCan), Mexico City 14080, Mexico
- Paul Lasko
  - Department of Biology, McGill University, Montreal, QC H3G 0B1, Canada
- Nahum Sonenberg
  - Department of Biochemistry and Goodman Cancer Institute, McGill University, Montreal, QC H3A 1A3, Canada
2
Hernández G, Osnaya VG, Pérez-Martínez X. Conservation and Variability of the AUG Initiation Codon Context in Eukaryotes. Trends Biochem Sci 2019; 44:1009-1021. [PMID: 31353284 DOI: 10.1016/j.tibs.2019.07.001] [Citation(s) in RCA: 71] [Impact Index Per Article: 11.8] [Received: 11/02/2018] [Revised: 06/22/2019] [Accepted: 07/02/2019] [Indexed: 01/30/2023]
Abstract
Selection of the translation initiation site (TIS) is a crucial step during translation. In the 1980s Marilyn Kozak performed key studies on vertebrate mRNAs to characterize the optimal TIS consensus sequence, the Kozak motif. Within this motif, conservation of nucleotides in crucial positions, namely a purine at -3 and a G at +4 (where the A of the AUG is numbered +1), is essential for TIS recognition. Ever since its characterization the Kozak motif has been regarded as the optimal sequence to initiate translation in all eukaryotes. We revisit here published in silico data on TIS consensus sequences, as well as experimental studies from diverse eukaryotic lineages, and propose that, while the -3A/G position is universally conserved, the remaining variability of the consensus sequences enables their classification as optimal, strong, and moderate TIS sequences.
Affiliation(s)
- Greco Hernández
  - Translation and Cancer Laboratory, Unit of Biomedical Research on Cancer, National Institute of Cancer (Instituto Nacional de Cancerología, INCan), 22 San Fernando Avenue, Tlalpan, 14080 Mexico City, Mexico
- Vincent G Osnaya
  - Translation and Cancer Laboratory, Unit of Biomedical Research on Cancer, National Institute of Cancer (Instituto Nacional de Cancerología, INCan), 22 San Fernando Avenue, Tlalpan, 14080 Mexico City, Mexico
- Xochitl Pérez-Martínez
  - Department of Molecular Genetics, Cell Physiology Institute (Instituto de Fisiología Celular), Universidad Nacional Autónoma de México (UNAM), 04510 Mexico City, Mexico
3
Cuperus JT, Groves B, Kuchina A, Rosenberg AB, Jojic N, Fields S, Seelig G. Deep learning of the regulatory grammar of yeast 5' untranslated regions from 500,000 random sequences. Genome Res 2017; 27:2015-2024. [PMID: 29097404 PMCID: PMC5741052 DOI: 10.1101/gr.224964.117] [Citation(s) in RCA: 126] [Impact Index Per Article: 15.8] [Received: 05/12/2017] [Accepted: 10/18/2017] [Indexed: 11/25/2022]
Abstract
Our ability to predict protein expression from DNA sequence alone remains poor, reflecting our limited understanding of cis-regulatory grammar and hampering the design of engineered genes for synthetic biology applications. Here, we generate a model that predicts the protein expression of the 5′ untranslated region (UTR) of mRNAs in the yeast Saccharomyces cerevisiae. We constructed a library of half a million 50-nucleotide-long random 5′ UTRs and assayed their activity in a massively parallel growth selection experiment. The resulting data allow us to quantify the impact on protein expression of Kozak sequence composition, upstream open reading frames (uORFs), and secondary structure. We trained a convolutional neural network (CNN) on the random library and showed that it performs well at predicting the protein expression of both a held-out set of the random 5′ UTRs as well as native S. cerevisiae 5′ UTRs. The model additionally was used to computationally evolve highly active 5′ UTRs. We confirmed experimentally that the great majority of the evolved sequences led to higher protein expression rates than the starting sequences, demonstrating the predictive power of this model.
Affiliation(s)
- Josh T Cuperus
  - Department of Genome Sciences, University of Washington, Seattle, Washington 98195, USA; Howard Hughes Medical Institute, University of Washington, Seattle, Washington 98195, USA
- Benjamin Groves
  - Department of Electrical Engineering, University of Washington, Seattle, Washington 98195, USA
- Anna Kuchina
  - Department of Electrical Engineering, University of Washington, Seattle, Washington 98195, USA
- Alexander B Rosenberg
  - Department of Electrical Engineering, University of Washington, Seattle, Washington 98195, USA
- Stanley Fields
  - Department of Genome Sciences, University of Washington, Seattle, Washington 98195, USA; Howard Hughes Medical Institute, University of Washington, Seattle, Washington 98195, USA; Department of Medicine, University of Washington, Seattle, Washington 98195, USA
- Georg Seelig
  - Department of Electrical Engineering, University of Washington, Seattle, Washington 98195, USA; Department of Computer Science & Engineering, University of Washington, Seattle, Washington 98195, USA
4
Pop C, Rouskin S, Ingolia NT, Han L, Phizicky EM, Weissman JS, Koller D. Causal signals between codon bias, mRNA structure, and the efficiency of translation and elongation. Mol Syst Biol 2014; 10:770. [PMID: 25538139 PMCID: PMC4300493 DOI: 10.15252/msb.20145524] [Citation(s) in RCA: 169] [Impact Index Per Article: 15.4] [Indexed: 01/30/2023]
Abstract
Ribosome profiling data report on the distribution of translating ribosomes, at steady-state, with codon-level resolution. We present a robust method to extract codon translation rates and protein synthesis rates from these data, and identify causal features associated with elongation and translation efficiency in physiological conditions in yeast. We show that neither elongation rate nor translational efficiency is improved by experimental manipulation of the abundance or body sequence of the rare AGG tRNA. Deletion of three of the four copies of the heavily used ACA tRNA shows a modest efficiency decrease that could be explained by other rate-reducing signals at gene start. This suggests that correlation between codon bias and efficiency arises as selection for codons to utilize translation machinery efficiently in highly translated genes. We also show a correlation between efficiency and RNA structure calculated both computationally and from recent structure probing data, as well as the Kozak initiation motif, which may comprise a mechanism to regulate initiation.
Affiliation(s)
- Cristina Pop
  - Computer Science Department, Stanford University, Stanford, CA, USA
- Silvi Rouskin
  - Department of Cellular and Molecular Pharmacology, California Institute of Quantitative Biology, Center for RNA Systems Biology, Howard Hughes Medical Institute, University of California, San Francisco, CA, USA
- Nicholas T Ingolia
  - Department of Molecular and Cell Biology, University of California, Berkeley, Berkeley, CA, USA
- Lu Han
  - School of Medicine and Dentistry, University of Rochester Medical Center, Rochester, NY, USA
- Eric M Phizicky
  - School of Medicine and Dentistry, University of Rochester Medical Center, Rochester, NY, USA
- Jonathan S Weissman
  - Department of Cellular and Molecular Pharmacology, California Institute of Quantitative Biology, Center for RNA Systems Biology, Howard Hughes Medical Institute, University of California, San Francisco, CA, USA
- Daphne Koller
  - Computer Science Department, Stanford University, Stanford, CA, USA
5
Deciphering the rules by which 5'-UTR sequences affect protein expression in yeast. Proc Natl Acad Sci U S A 2013; 110:E2792-801. [PMID: 23832786 DOI: 10.1073/pnas.1222534110] [Citation(s) in RCA: 178] [Impact Index Per Article: 14.8] [Indexed: 12/12/2022]
Abstract
The 5'-untranslated region (5'-UTR) of mRNAs contains elements that affect expression, yet the rules by which these regions exert their effect are poorly understood. Here, we studied the impact of 5'-UTR sequences on protein levels in yeast, by constructing a large-scale library of mutants that differ only in the 10 bp preceding the translational start site of a fluorescent reporter. Using a high-throughput sequencing strategy, we obtained highly accurate measurements of protein abundance for over 2,000 unique sequence variants. The resulting pool spanned an approximately sevenfold range of protein levels, demonstrating the powerful consequences of sequence manipulations of even 1-10 nucleotides immediately upstream of the start codon. We devised computational models that predicted over 70% of the measured expression variability in held-out sequence variants. Notably, a combined model of the most prominent features successfully explained protein abundance in an additional, independently constructed library, whose nucleotide composition differed greatly from the library used to parameterize the model. Our analysis reveals the dominant contribution of the start codon context at positions -3 to -1, mRNA secondary structure, and out-of-frame upstream AUGs (uAUGs) to phenotypic diversity, thereby advancing our understanding of how protein levels are modulated by 5'-UTR sequences, and paving the way toward predictably tuning protein expression through manipulations of 5'-UTRs.
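One of the features named in this abstract, out-of-frame upstream AUGs (uAUGs), is simple to detect from sequence alone. The sketch below lists every AUG upstream of a chosen start codon and flags whether it shares the reading frame of that start. The function name `upstream_augs` and the 0-based indexing are assumptions made for illustration; this is not the computational model described in the paper.

```python
def upstream_augs(mrna: str, start_index: int) -> list[tuple[int, bool]]:
    """List (position, in_frame) for each AUG that begins upstream of the main start.

    start_index is the 0-based index of the A of the main AUG.
    An uAUG is in frame when its distance to the main start is a multiple of 3.
    """
    seq = mrna.upper().replace("T", "U")  # accept DNA or RNA spelling
    hits = []
    pos = seq.find("AUG")
    while pos != -1 and pos < start_index:
        in_frame = (start_index - pos) % 3 == 0
        hits.append((pos, in_frame))
        pos = seq.find("AUG", pos + 1)  # step by one so overlapping matches are found
    return hits
```

Filtering the result for entries whose second element is `False` gives the out-of-frame uAUGs that the abstract singles out.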
6
Schofield DA, Westwater C, Barth JL, DiNovo AA. Development of a yeast biosensor–biocatalyst for the detection and biodegradation of the organophosphate paraoxon. Appl Microbiol Biotechnol 2007; 76:1383-94. [PMID: 17665192 DOI: 10.1007/s00253-007-1107-5] [Citation(s) in RCA: 13] [Impact Index Per Article: 0.7] [Received: 04/25/2007] [Revised: 06/27/2007] [Accepted: 06/28/2007] [Indexed: 10/23/2022]
Abstract
Organophosphate (OP) poisoning can occur through unintentional exposure to OP pesticides, or by the deliberate release of OP nerve agents. Consequently, there is considerable interest in the development of systems that can detect and/or biodegrade these agents. The aim of this study was to generate a prototype fluorescent reporter yeast biosensor that could detect and biodegrade the model OP pesticide, paraoxon, and subsequently detect paraoxon hydrolysis. Saccharomyces cerevisiae was engineered to hydrolyze paraoxon through the heterologous expression of the Flavobacterium species opd (organophosphate degrading) gene. Global transcription profiling was subsequently used to identify yeast genes, which were induced in the presence of paraoxon, and genes, which were associated with paraoxon hydrolysis. Paraoxon-inducible genes and genes associated with paraoxon hydrolysis were identified. Candidate paraoxon-inducible promoters were cloned and fused to the yeast-enhanced green fluorescent protein (yEGFP), and candidate promoters associated with paraoxon hydrolysis were fused to the red fluorescent protein (yDsRed). The ability of the yeast biosensor to detect paraoxon and paraoxon hydrolysis was demonstrated by the specific induction of the fluorescent reporter (yEGFP and yDsRed, respectively). Biosensors responded to paraoxon in a dose- and time-dependent manner, and detection was rapid (15 to 30 min). yDsRed induction occurred only in the recombinant opd(+) strains suggesting that yDsRed induction was strictly associated with paraoxon hydrolysis. Together, these results indicate that the yeast biocatalyst-biosensor can detect and degrade paraoxon and potentially also monitor the decontamination process.
Affiliation(s)
- David A Schofield
- Guild Associates Inc., 1313B Ashley River Road, Charleston, SC 29407, USA.
7
Kurz M, Cowieson NP, Robin G, Hume DA, Martin JL, Kobe B, Listwan P. Incorporating a TEV cleavage site reduces the solubility of nine recombinant mouse proteins. Protein Expr Purif 2006; 50:68-73. [PMID: 16798010 DOI: 10.1016/j.pep.2006.05.006] [Citation(s) in RCA: 15] [Impact Index Per Article: 0.8] [Received: 03/31/2006] [Revised: 05/12/2006] [Accepted: 05/14/2006] [Indexed: 11/22/2022]
Abstract
Failure to express soluble proteins in bacteria is mainly attributed to the properties of the target protein itself, as well as the choice of the vector, the purification tag and the linker between the tag and protein, and codon usage. The expression of proteins with fusion tags to facilitate subsequent purification steps is a widely used procedure in the production of recombinant proteins. However, the additional residues can affect the properties of the protein; therefore, it is often desirable to remove the tag after purification. This is usually done by engineering a cleavage site between the tag and the encoded protein that is recognised by a site-specific protease, such as the one from tobacco etch virus (TEV). In this study, we investigated the effect of four different tags on the bacterial expression and solubility of nine mouse proteins. Two of the four engineered constructs contained hexahistidine tags with either a long or short linker. The other two constructs contained a TEV cleavage site engineered into the linker region. Our data show that inclusion of the TEV recognition site directly downstream of the recombination site of the Invitrogen Gateway vector resulted in a loss of solubility of the nine mouse proteins. Our work suggests that one needs to be very careful when making modifications to expression vectors and combining different affinity and fusion tags and cleavage sites.
Affiliation(s)
- Mareike Kurz
- Institute for Molecular Bioscience, University of Queensland, Brisbane, Qld 4072, Australia
8
Abstract
Great advances have been made in the past three decades in understanding the molecular mechanics underlying protein synthesis in bacteria, but our understanding of the corresponding events in eukaryotic organisms is only beginning to catch up. In this review we describe the current state of our knowledge and ignorance of the molecular mechanics underlying eukaryotic translation. We discuss the mechanisms conserved across the three kingdoms of life as well as the important divergences that have taken place in the pathway.
Affiliation(s)
- Lee D Kapp
- Department of Biophysics and Biophysical Chemistry, Johns Hopkins University School of Medicine, 725 North Wolfe Street, Baltimore, Maryland 21205-2185, USA.
9
Johansson B, Christensson C, Hobley T, Hahn-Hägerdal B. Xylulokinase overexpression in two strains of Saccharomyces cerevisiae also expressing xylose reductase and xylitol dehydrogenase and its effect on fermentation of xylose and lignocellulosic hydrolysate. Appl Environ Microbiol 2001; 67:4249-55. [PMID: 11526030 PMCID: PMC93154 DOI: 10.1128/aem.67.9.4249-4255.2001] [Citation(s) in RCA: 91] [Impact Index Per Article: 3.8] [Received: 01/08/2001] [Accepted: 06/21/2001] [Indexed: 11/20/2022]
Abstract
Fermentation of the pentose sugar xylose to ethanol in lignocellulosic biomass would make bioethanol production economically more competitive. Saccharomyces cerevisiae, an efficient ethanol producer, can utilize xylose only when expressing the heterologous genes XYL1 (xylose reductase) and XYL2 (xylitol dehydrogenase). Xylose reductase and xylitol dehydrogenase convert xylose to its isomer xylulose. The gene XKS1 encodes the xylulose-phosphorylating enzyme xylulokinase. In this study, we determined the effect of XKS1 overexpression on two different S. cerevisiae host strains, H158 and CEN.PK, also expressing XYL1 and XYL2. H158 has been previously used as a host strain for the construction of recombinant xylose-utilizing S. cerevisiae strains. CEN.PK is a new strain specifically developed to serve as a host strain for the development of metabolic engineering strategies. Fermentation was carried out in defined and complex media containing a hexose and pentose sugar mixture or a birch wood lignocellulosic hydrolysate. XKS1 overexpression increased the ethanol yield by a factor of 2 and reduced the xylitol yield by 70 to 100% and the final acetate concentrations by 50 to 100%. However, XKS1 overexpression reduced the total xylose consumption by half for CEN.PK and to as little as one-fifth for H158. Yeast extract and peptone partly restored sugar consumption in hydrolysate medium. CEN.PK consumed more xylose but produced more xylitol than H158 and thus gave lower ethanol yields on consumed xylose. The results demonstrate that strain background and modulation of XKS1 expression are important for generating an efficient xylose-fermenting recombinant strain of S. cerevisiae.
Affiliation(s)
- B Johansson
- Department of Applied Microbiology, Lund University, 221 00 Lund, Sweden
10
Abstract
The relationship between the codon usage bias and the sequence context surrounding the AUG translation initiation codon was examined in 211 Saccharomyces cerevisiae mRNA sequences. The codon usage bias and the number of matches to the optimal AUG context, (A/U)A(A/C)AA(A/C)AUGUC(U/C), for translation initiation showed a positive relationship, indicating that these two factors are evolutionarily under a similar natural selection constraint at the translation level. A new index (AUGCAI = AUG Context Adaptation Index) for the measure of optimal AUG context was devised, and the importance of each position of the AUG context was also examined.
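The "number of matches to optimal AUG context" can be illustrated with a short sketch that scores the six positions upstream (-6 to -1) and three positions downstream (+4 to +6) of an AUG against the consensus quoted in the abstract. The names `UPSTREAM`, `DOWNSTREAM`, and `count_context_matches` are hypothetical. The abstract does not give the AUGCAI formula, so this sketch only counts matches and should not be read as an implementation of that index.

```python
# Allowed nucleotides per position, read off the consensus in the abstract:
# (A/U)A(A/C)AA(A/C) AUG UC(U/C)
UPSTREAM = ["AU", "A", "AC", "A", "A", "AC"]  # positions -6 .. -1
DOWNSTREAM = ["U", "C", "UC"]                 # positions +4 .. +6


def count_context_matches(mrna: str, aug_index: int) -> int:
    """Count flanking positions (0 to 9) that agree with the consensus.

    aug_index is the 0-based index of the A of the AUG (an assumption
    of this sketch); the AUG itself is not scored.
    """
    seq = mrna.upper().replace("T", "U")  # accept DNA or RNA spelling
    if seq[aug_index:aug_index + 3] != "AUG":
        raise ValueError("no AUG at the given index")
    if aug_index < 6 or aug_index + 6 > len(seq):
        raise ValueError("need 6 bases upstream and 3 downstream of the AUG")
    flank = seq[aug_index - 6:aug_index] + seq[aug_index + 3:aug_index + 6]
    allowed = UPSTREAM + DOWNSTREAM
    return sum(base in options for base, options in zip(flank, allowed))
```

A sequence such as `AAAAAAAUGUCU` matches all nine scored positions, while `GGGGGGAUGGGG` matches none.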
Affiliation(s)
- H Miyasaka
- Kansai Electric Power Co., Environmental Research Center, Hyogo, Japan.
11
Ranjan A, Hasnain SE. Influence of codon usage and translational initiation codon context in the AcNPV-based expression system: computer analysis using homologous and heterologous genes. Virus Genes 1995; 9:149-53. [PMID: 7537424 DOI: 10.1007/bf01702657] [Citation(s) in RCA: 17] [Impact Index Per Article: 0.6] [Indexed: 01/25/2023]
Abstract
Codon usage by all the known gene sequences from Autographa californica nuclear polyhedrosis virus (AcNPV) was compared with that of firefly luciferase (luc) and the beta subunit of human chorionic gonadotropin (beta hCG) expressed to contrasting levels in the baculovirus system. The highly expressed luc gene showed a codon usage similar to AcNPV genes, as reflected by a very low D-squared statistic value (0.78) and a similar G/C usage (45%) at wobble positions. However, the underexpressed beta hCG gene displayed a high D-squared value (7.3) and G/C usage (82.5%) at the wobble base position. Alignment of the 20 nucleotides around the initiation codon of 23 AcNPV genes identified a novel consensus translation initiation sequence aag/ta/tat/aa/cAAaATGaa/ct/ag/aAan, which was quite different from the Kozak consensus sequence (GCC)GCCA/GCCATGG. An extension of these analyses to a sample of other heterologous genes overexpressed and underexpressed in BEVS suggested similar trends. These theoretical analyses have important implications for heterologous gene expression in this system.
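The G/C usage at wobble positions quoted in this abstract is a percentage over third codon positions. A minimal sketch of that calculation follows; the function name `wobble_gc_percent` is hypothetical, and the sketch assumes the input is an in-frame coding sequence. It makes no attempt to exclude the stop codon or to reproduce the D-squared statistic, whose definition the abstract does not give.

```python
def wobble_gc_percent(cds: str) -> float:
    """Percentage of codons whose third (wobble) base is G or C.

    Assumes cds starts in frame; any trailing partial codon is ignored.
    """
    seq = cds.upper().replace("U", "T")  # accept DNA or RNA spelling
    usable = len(seq) - len(seq) % 3     # drop an incomplete final codon
    if usable == 0:
        raise ValueError("sequence shorter than one codon")
    third_bases = seq[2:usable:3]        # every third base, starting at codon position 3
    gc = sum(base in "GC" for base in third_bases)
    return 100.0 * gc / len(third_bases)
```

For `ATGGCCAAA` the wobble bases are G, C and A, so two of three codons end in G or C.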
Affiliation(s)
- A Ranjan
- Eukaryotic Gene Expression Laboratory, National Institute of Immunology, New Delhi-India
12
Abstract
This article reviews current knowledge on the mechanisms affecting the fidelity of initiation codon selection, and discusses the effects of structural features in the 5′-non-coding region on the efficiency of translation of messenger RNA molecules.
Affiliation(s)
- M Kozak
- Department of Biochemistry, University of Medicine and Dentistry of New Jersey, Piscataway 08854