1
Lobel JH, Ingolia NT. Precise measurement of molecular phenotypes with barcode-based CRISPRi systems. Genome Biol 2025; 26:142. [PMID: 40414878 DOI: 10.1186/s13059-025-03610-w] [Received: 07/31/2024] [Accepted: 05/07/2025] [Indexed: 05/27/2025]
Abstract
Genome-wide CRISPR-Cas9 screens have untangled regulatory networks driving diverse biological processes. Their success relies on interrogating specific molecular phenotypes and distinguishing key regulators from background effects. Here, we realize these goals by optimizing CRISPR interference with barcoded expression reporter sequencing (CiBER-seq) to dramatically improve the sensitivity and scope of genome-wide screens. We systematically address technical factors that distort phenotypic measurements by normalizing expression reporters against closely matched promoters. We use our improved CiBER-seq to accurately capture known components of well-studied RNA and protein quality control systems. These results demonstrate the precision and versatility of CiBER-seq for dissecting cellular pathways.
Affiliation(s)
- Joseph H Lobel
  - Department of Molecular and Cell Biology, University of California Berkeley, Berkeley, CA, 94720, USA
- Nicholas T Ingolia
  - Department of Molecular and Cell Biology, University of California Berkeley, Berkeley, CA, 94720, USA
  - Center for Computational Biology and California Institute for Quantitative Biosciences, University of California, Berkeley, Berkeley, CA, 94720, USA

2
Zhang J, Shao W, Xu Y, Tian F, Chen J, Wang D, Lin X, He C, Yang X, Staiger D, Ding Y, Yu X, Xiao J. Unveiling the regulatory role of GRP7 in ABA signal-mediated mRNA translation efficiency regulation. Nat Commun 2025; 16:3947. [PMID: 40287405 PMCID: PMC12033289 DOI: 10.1038/s41467-025-59329-6] [Received: 01/12/2024] [Accepted: 04/15/2025] [Indexed: 04/29/2025]
Abstract
Abscisic acid (ABA) is a crucial phytohormone involved in plant growth and stress responses. While the transcriptional regulation triggered by ABA is well-documented, its effects on translational regulation have been less studied. Through Ribo-seq and RNA-seq analyses, we find that ABA treatment not only influences gene expression at the mRNA level but also significantly impacts mRNA translation efficiency (TE) in Arabidopsis thaliana. ABA inhibits global mRNA translation via its core signaling pathway, which includes ABA receptors, protein phosphatase 2Cs (PP2Cs), and SNF1-related protein kinase 2s (SnRK2s). Upon ABA treatment, Glycine-rich RNA-binding proteins 7 and 8 (GRP7&8) protein levels decrease due to both reduced mRNA level and decreased TE, which diminishes their association with polysomes and leads to a global decline in mRNA TE. The absence of GRP7&8 results in a global impairment of ABA-regulated translational changes, linking ABA signaling to GRP7-dependent modulation of mRNA translation. The regulation of GRP7 on TE relies significantly on its direct binding to target mRNAs. Moreover, mRNA translation efficiency under drought stress is partially dependent on the ABA-GRP7&8 pathways. Collectively, our study reveals GRP7's role downstream of SnRK2s in mediating translation regulation in ABA signaling, offering a model for ABA-triggered multi-route regulation of environmental adaptation.
Affiliation(s)
- Jing Zhang
  - Key Laboratory of Plant Cell and Chromosome Engineering, Institute of Genetics and Developmental Biology, Chinese Academy of Sciences, Beijing, China
  - University of Chinese Academy of Sciences, Beijing, China
- Wenna Shao
  - School of Life Sciences and Biotechnology, Shanghai Jiao Tong University, Shanghai, China
- Yongxin Xu
  - Key Laboratory of Plant Cell and Chromosome Engineering, Institute of Genetics and Developmental Biology, Chinese Academy of Sciences, Beijing, China
  - University of Chinese Academy of Sciences, Beijing, China
- Fa'an Tian
  - University of Chinese Academy of Sciences, Beijing, China
  - Key Laboratory of Plant Genomics, Institute of Genetics and Developmental Biology, Chinese Academy of Sciences, Beijing, China
- Jinchao Chen
  - Key Laboratory of Plant Cell and Chromosome Engineering, Institute of Genetics and Developmental Biology, Chinese Academy of Sciences, Beijing, China
  - University of Chinese Academy of Sciences, Beijing, China
- Dongzhi Wang
  - Key Laboratory of Plant Cell and Chromosome Engineering, Institute of Genetics and Developmental Biology, Chinese Academy of Sciences, Beijing, China
- Xuelei Lin
  - Key Laboratory of Plant Cell and Chromosome Engineering, Institute of Genetics and Developmental Biology, Chinese Academy of Sciences, Beijing, China
- Xiaofei Yang
  - John Innes Centre, Norwich Research Park, Norwich, UK
  - Center for Excellence in Molecular Plant Sciences, Chinese Academy of Sciences, Shanghai, China
- Dorothee Staiger
  - RNA Biology and Molecular Physiology, Faculty of Biology, Bielefeld University, Bielefeld, Germany
- Yiliang Ding
  - John Innes Centre, Norwich Research Park, Norwich, UK
- Xiang Yu
  - School of Life Sciences and Biotechnology, Shanghai Jiao Tong University, Shanghai, China
- Jun Xiao
  - Key Laboratory of Plant Cell and Chromosome Engineering, Institute of Genetics and Developmental Biology, Chinese Academy of Sciences, Beijing, China
  - University of Chinese Academy of Sciences, Beijing, China
  - CAS-JIC Centre of Excellence for Plant and Microbial Science (CEPAMS), Institute of Genetics and Developmental Biology, CAS, Beijing, China

3
Fan X, Chang T, Chen C, Hafner M, Wang Z. Analysis of RNA translation with a deep learning architecture provides new insight into translation control. Nucleic Acids Res 2025; 53:gkaf277. [PMID: 40219965 PMCID: PMC11992669 DOI: 10.1093/nar/gkaf277] [Received: 06/24/2024] [Revised: 02/20/2025] [Accepted: 04/01/2025] [Indexed: 04/14/2025]
Abstract
Accurate annotation of coding regions in RNAs is essential for understanding gene translation. We developed a deep neural network to directly predict and analyze translation initiation and termination sites from RNA sequences. Trained with human transcripts, our model learned hidden rules of translation control and achieved a near perfect prediction of canonical translation sites across entire human transcriptome. Surprisingly, this model revealed a new role of codon usage in regulating translation termination, which was experimentally validated. We also identified thousands of new open reading frames in mRNAs or lncRNAs, some of which were confirmed experimentally. The model trained with human mRNAs achieved high prediction accuracy of canonical translation sites in all eukaryotes and good prediction in polycistronic transcripts from prokaryotes or RNA viruses, suggesting a high degree of conservation in translation control. Collectively, we present TranslationAI (https://www.biosino.org/TranslationAI/), a general and efficient deep learning model for RNA translation that generates new insights into the complexity of translation regulation.
Affiliation(s)
- Xiaojuan Fan
  - Shanghai Institute of Nutrition and Health, University of Chinese Academy of Sciences, Chinese Academy of Sciences, Shanghai, 200031, China
  - RNA Molecular Biology Laboratory, National Institute of Arthritis and Musculoskeletal and Skin Disease, Bethesda, MD 20814, United States
- Tiangen Chang
  - Laboratory of Cancer Data Science, National Cancer Institute, Bethesda, MD 20814, United States
- Chuyun Chen
  - Shanghai Institute of Nutrition and Health, University of Chinese Academy of Sciences, Chinese Academy of Sciences, Shanghai, 200031, China
- Markus Hafner
  - RNA Molecular Biology Laboratory, National Institute of Arthritis and Musculoskeletal and Skin Disease, Bethesda, MD 20814, United States
- Zefeng Wang
  - Shanghai Institute of Nutrition and Health, University of Chinese Academy of Sciences, Chinese Academy of Sciences, Shanghai, 200031, China
  - School of Life Science, Southern University of Science and Technology, Shenzhen, Guangdong, 518055, China

4
Ravi S, Sharma T, Yip M, Yang H, Xie J, Gao G, Tai PL. A deep learning model trained on expressed transcripts across different tissue types reveals cell-type codon-optimization preferences. Nucleic Acids Res 2025; 53:gkaf233. [PMID: 40156867 PMCID: PMC11954528 DOI: 10.1093/nar/gkaf233] [Received: 09/19/2024] [Revised: 03/03/2025] [Accepted: 03/28/2025] [Indexed: 04/01/2025]
Abstract
Species-specific differences in protein translation can affect the design of protein-based drugs. Consequently, efficient expression of recombinant proteins often requires codon optimization. Publicly available optimization tools do not always result in higher expression levels and can lead to protein misfolding and reduced expression. Here, we aimed to develop a novel deep learning (DL) tool using a recurrent neural network (RNN) to define cell type-dependent codon biases. Using gene expression data from three different tissue types (brain, liver, and muscle) and all secretory genes, we trained DL models to predict optimal codon usage. Codon-optimized sequences for test reporter genes exhibited enhanced protein expression compared to their original sequences and those optimized using a publicly available tool. Interestingly, DL models trained on genes expressed in liver cells (hepatocytes) resulted in the highest levels of expression when tested in vitro, irrespective of the cell type. Our findings also demonstrate that DL-based codon optimization algorithms can significantly enhance protein translation, particularly for secretory proteins, which are crucial for therapeutic applications. This research represents a novel approach to codon optimization with broader implications for protein-based pharmaceuticals, vaccine manufacturing, gene therapy, and other recombinant DNA products.
Affiliation(s)
- Sandhiya Ravi
  - Department of Genetic and Cellular Medicine, UMass Chan Medical School, Worcester, MA 01605, United States
  - Department of Microbiology, UMass Chan Medical School, Worcester, MA 01605, United States
- Tapan Sharma
  - Department of Genetic and Cellular Medicine, UMass Chan Medical School, Worcester, MA 01605, United States
  - Department of Microbiology, UMass Chan Medical School, Worcester, MA 01605, United States
- Mitchell Yip
  - Department of Genetic and Cellular Medicine, UMass Chan Medical School, Worcester, MA 01605, United States
- Huiya Yang
  - Department of Genetic and Cellular Medicine, UMass Chan Medical School, Worcester, MA 01605, United States
- Jun Xie
  - Department of Genetic and Cellular Medicine, UMass Chan Medical School, Worcester, MA 01605, United States
  - Department of Microbiology, UMass Chan Medical School, Worcester, MA 01605, United States
- Guangping Gao
  - Department of Genetic and Cellular Medicine, UMass Chan Medical School, Worcester, MA 01605, United States
  - Department of Microbiology, UMass Chan Medical School, Worcester, MA 01605, United States
  - Li Weibo Institute of Rare Diseases Research, UMass Chan Medical School, Worcester, MA 01605, United States
- Phillip W L Tai
  - Department of Genetic and Cellular Medicine, UMass Chan Medical School, Worcester, MA 01605, United States
  - Department of Microbiology, UMass Chan Medical School, Worcester, MA 01605, United States
  - Li Weibo Institute of Rare Diseases Research, UMass Chan Medical School, Worcester, MA 01605, United States

5
Ferguson L, Upton HE, Pimentel SC, Jeans C, Ingolia NT, Collins K. Improved precision, sensitivity, and adaptability of ordered two-template relay cDNA library preparation for RNA sequencing. RNA 2025; 31:224-244. [PMID: 39626888 PMCID: PMC11789487 DOI: 10.1261/rna.080318.124] [Received: 11/09/2024] [Accepted: 11/13/2024] [Indexed: 12/11/2024]
Abstract
Sequencing RNAs that are biologically processed or degraded to less than ∼100 nt typically involves multistep, low-yield protocols with bias and information loss inherent to ligation and/or polynucleotide tailing. We recently introduced ordered two-template relay (OTTR), a method that captures obligatorily end-to-end sequences of input molecules and, in the same reverse transcription step, also appends 5' and 3' sequencing adapters of choice. OTTR has been thoroughly benchmarked for optimal production of microRNA, tRNA and tRNA fragments, and ribosome-protected mRNA footprint libraries. Here we sought to characterize, quantify, and ameliorate any remaining bias or imprecision in the end-to-end capture of RNA sequences. We introduce new metrics for the evaluation of sequence capture and use them to optimize reaction buffers, reverse transcriptase sequence, adapter oligonucleotides, and overall workflow. Modifications of the reverse transcriptase and adapter oligonucleotides increased the 3' and 5' end-precision of sequence capture and minimized overall library bias. Improvements in recombinant expression and purification of the truncated Bombyx mori R2 reverse transcriptase used in OTTR reduced nonproductive sequencing reads by minimizing bacterial nucleic acids that compete with low-input RNA molecules for cDNA synthesis, such that with miRNA input of 3 pg (<1 fmol), fewer than 10% of sequencing reads are bacterial nucleic acid contaminants. We also introduce a rapid, automation-compatible OTTR protocol that enables gel-free, length-agnostic enrichment of cDNA duplexes from unwanted adapter-only side products. Overall, this work informs considerations for unbiased end-to-end capture and annotation of RNAs independent of their sequence, structure, or posttranscriptional modifications.
Affiliation(s)
- Lucas Ferguson
  - Department of Molecular and Cell Biology, University of California, Berkeley, Berkeley, California 94720, USA
  - Center for Computational Biology, University of California, Berkeley, Berkeley, California 94720, USA
- Heather E Upton
  - Department of Molecular and Cell Biology, University of California, Berkeley, Berkeley, California 94720, USA
- Sydney C Pimentel
  - Department of Molecular and Cell Biology, University of California, Berkeley, Berkeley, California 94720, USA
- Chris Jeans
  - MacroLab, University of California, Berkeley, Berkeley, California 94720, USA
  - California Institute for Quantitative Biosciences, University of California, Berkeley, Berkeley, California 94720, USA
- Nicholas T Ingolia
  - Department of Molecular and Cell Biology, University of California, Berkeley, Berkeley, California 94720, USA
  - California Institute for Quantitative Biosciences, University of California, Berkeley, Berkeley, California 94720, USA
- Kathleen Collins
  - Department of Molecular and Cell Biology, University of California, Berkeley, Berkeley, California 94720, USA
  - California Institute for Quantitative Biosciences, University of California, Berkeley, Berkeley, California 94720, USA

6
Sidi T, Bahiri-Elitzur S, Tuller T, Kolodny R. Predicting gene sequences with AI to study codon usage patterns. Proc Natl Acad Sci U S A 2025; 122:e2410003121. [PMID: 39739812 PMCID: PMC11725940 DOI: 10.1073/pnas.2410003121] [Received: 05/23/2024] [Accepted: 11/27/2024] [Indexed: 01/02/2025]
Abstract
Selective pressure acts on the codon use, optimizing multiple, overlapping signals that are only partially understood. We trained AI models to predict codons given their amino acid sequence in the eukaryotes Saccharomyces cerevisiae and Schizosaccharomyces pombe and the bacteria Escherichia coli and Bacillus subtilis to study the extent to which we can learn patterns in naturally occurring codons to improve predictions. We trained our models on a subset of the proteins and evaluated their predictions on large, separate sets of proteins of varying lengths and expression levels. Our models significantly outperformed naïve frequency-based approaches, demonstrating that there are learnable dependencies in evolutionary-selected codon usage. The prediction accuracy advantage of our models is greater for highly expressed genes and is greater in bacteria than eukaryotes, supporting the hypothesis that there is a monotonic relationship between selective pressure for complex codon patterns and effective population size. In S. cerevisiae and bacteria, our models were more accurate for longer proteins, suggesting that the learned patterns may be related to cotranslational folding. Gene functionality and conservation were also important determinants that affect the performance of our models. Finally, we showed that using information encoded in homologous proteins has only a minor effect on prediction accuracy, perhaps due to complex codon-usage codes in genes undergoing rapid evolution. Our study employing contemporary AI methods offers a unique perspective and a deep-learning-based prediction tool for evolutionary-selected codons. We hope that these can be useful to optimize codon usage in endogenous and heterologous proteins.
Affiliation(s)
- Tomer Sidi
  - Department of Computer Science, University of Haifa, Haifa 3303221, Israel
- Shir Bahiri-Elitzur
  - Department of Biomedical Engineering, Tel-Aviv University, Tel Aviv 6139001, Israel
- Tamir Tuller
  - Department of Biomedical Engineering, Tel-Aviv University, Tel Aviv 6139001, Israel
  - The Sagol School of Neuroscience, Tel-Aviv University, Tel Aviv 6139001, Israel
- Rachel Kolodny
  - Department of Computer Science, University of Haifa, Haifa 3303221, Israel

7
Ferguson L, Upton HE, Pimentel SC, Jeans C, Ingolia NT, Collins K. Improved precision, sensitivity, and adaptability of Ordered Two-Template Relay cDNA library preparation for RNA sequencing. bioRxiv 2024:2024.11.09.622813. [PMID: 39574714 PMCID: PMC11581009 DOI: 10.1101/2024.11.09.622813] [Indexed: 11/29/2024]
Abstract
Sequencing RNAs that are biologically processed or degraded to less than ~100 nucleotides typically involves multi-step, low-yield protocols with bias and information loss inherent to ligation and/or polynucleotide tailing. We recently introduced Ordered Two-Template Relay (OTTR), a method that captures obligatorily end-to-end sequences of input molecules and, in the same reverse transcription step, also appends 5' and 3' sequencing adapters of choice. OTTR has been thoroughly benchmarked for optimal production of microRNA, tRNA and tRNA fragments, and ribosome-protected mRNA footprint libraries. Here we sought to characterize, quantify, and ameliorate any remaining bias or imprecision in the end-to-end capture of RNA sequences. We introduce new metrics for the evaluation of sequence capture and use them to optimize reaction buffers, reverse transcriptase sequence, adapter oligonucleotides, and overall workflow. Modifications of the reverse transcriptase and adapter oligonucleotides increased the 3' and 5' end-precision of sequence capture and minimized overall library bias. Improvements in recombinant expression and purification of the truncated Bombyx mori R2 reverse transcriptase used in OTTR reduced non-productive sequencing reads by minimizing bacterial nucleic acids that compete with low-input RNA molecules for cDNA synthesis, such that with miRNA input of 3 picograms (less than 1 fmol), fewer than 10% of sequencing reads are bacterial nucleic acid contaminants. We also introduce a rapid, automation-compatible OTTR protocol that enables gel-free, length-agnostic enrichment of cDNA duplexes from unwanted adapter-only side products. Overall, this work informs considerations for unbiased end-to-end capture and annotation of RNAs independent of their sequence, structure, or post-transcriptional modifications.
Affiliation(s)
- Lucas Ferguson
  - Department of Molecular and Cell Biology, University of California, Berkeley, Berkeley, USA
  - Center for Computational Biology, University of California, Berkeley, Berkeley, CA, USA
- Heather E Upton
  - Department of Molecular and Cell Biology, University of California, Berkeley, Berkeley, USA
  - Present address: Addition Therapeutics, 201 Haskins Way, South San Francisco, CA 94080
- Sydney C Pimentel
  - Department of Molecular and Cell Biology, University of California, Berkeley, Berkeley, USA
  - Present address: NYU Grossman School of Medicine, 550 First Avenue, New York, NY 10016
- Chris Jeans
  - MacroLab, University of California, Berkeley, Berkeley, CA, USA
  - California Institute for Quantitative Biosciences, University of California, Berkeley, Berkeley, USA
- Nicholas T Ingolia
  - Department of Molecular and Cell Biology, University of California, Berkeley, Berkeley, USA
  - California Institute for Quantitative Biosciences, University of California, Berkeley, Berkeley, USA
- Kathleen Collins
  - Department of Molecular and Cell Biology, University of California, Berkeley, Berkeley, USA
  - California Institute for Quantitative Biosciences, University of California, Berkeley, Berkeley, USA

8
Lyons EF, Devanneaux LC, Muller RY, Freitas AV, Meacham ZA, McSharry MV, Trinh VN, Rogers AJ, Ingolia NT, Lareau LF. Translation elongation as a rate limiting step of protein production. bioRxiv 2024:2023.11.27.568910. [PMID: 38076849 PMCID: PMC10705293 DOI: 10.1101/2023.11.27.568910] [Indexed: 12/19/2023]
Abstract
The impact of synonymous codon choice on protein output has important implications for understanding endogenous gene expression and design of synthetic mRNAs. Synonymous codons are decoded at different speeds, but simple models predict that this should not drive protein output. Instead, translation initiation should be the rate limiting step for production of protein per mRNA, with little impact of codon choice. Previously, we used a neural network model to design a series of synonymous fluorescent reporters and showed that their protein output in yeast spanned a seven-fold range corresponding to their predicted translation elongation speed. Here, we show that this effect is not due primarily to the established impact of slow elongation on mRNA stability, but rather, that slow elongation further decreases the number of proteins made per mRNA. We combine simulations and careful experiments on fluorescent reporters to show that translation is limited on non-optimally encoded transcripts. Using a genome-wide CRISPRi screen, we find that impairing translation initiation attenuates the impact of slow elongation, showing a dynamic balance between rate limiting steps of protein production. Our results show that codon choice can directly limit protein production across the full range of endogenous variability in codon usage.
Affiliation(s)
- Elijah F Lyons
  - Department of Molecular and Cell Biology, University of California, Berkeley, California
- Lou C Devanneaux
  - Department of Molecular and Cell Biology, University of California, Berkeley, California
- Ryan Y Muller
  - Department of Molecular and Cell Biology, University of California, Berkeley, California
- Anna V Freitas
  - Department of Molecular and Cell Biology, University of California, Berkeley, California
- Zuriah A Meacham
  - Department of Molecular and Cell Biology, University of California, Berkeley, California
- Maria V McSharry
  - Department of Molecular and Cell Biology, University of California, Berkeley, California
- Van N Trinh
  - Department of Bioengineering, University of California, Berkeley, California
- Anna J Rogers
  - Department of Molecular and Cell Biology, University of California, Berkeley, California
- Nicholas T Ingolia
  - Department of Molecular and Cell Biology, University of California, Berkeley, California
- Liana F Lareau
  - Department of Molecular and Cell Biology, University of California, Berkeley, California
  - Department of Bioengineering, University of California, Berkeley, California
  - Chan Zuckerberg Biohub, San Francisco, California

9
Sejour R, Leatherwood J, Yurovsky A, Futcher B. Enrichment of rare codons at 5' ends of genes is a spandrel caused by evolutionary sequence turnover and does not improve translation. eLife 2024; 12:RP89656. [PMID: 39008347 PMCID: PMC11249729 DOI: 10.7554/elife.89656] [Indexed: 07/16/2024]
Abstract
Previously, Tuller et al. found that the first 30-50 codons of the genes of yeast and other eukaryotes are slightly enriched for rare codons. They argued that this slowed translation, and was adaptive because it queued ribosomes to prevent collisions. Today, the translational speeds of different codons are known, and indeed rare codons are translated slowly. We re-examined this 5' slow translation 'ramp.' We confirm that 5' regions are slightly enriched for rare codons; in addition, they are depleted for downstream Start codons (which are fast), with both effects contributing to slow 5' translation. However, we also find that the 5' (and 3') ends of yeast genes are poorly conserved in evolution, suggesting that they are unstable and turnover relatively rapidly. When a new 5' end forms de novo, it is likely to include codons that would otherwise be rare. Because evolution has had a relatively short time to select against these codons, 5' ends are typically slightly enriched for rare, slow codons. Opposite to the expectation of Tuller et al., we show by direct experiment that genes with slowly translated codons at the 5' end are expressed relatively poorly, and that substituting faster synonymous codons improves expression. Direct experiment shows that slow codons do not prevent downstream ribosome collisions. Further informatic studies suggest that for natural genes, slow 5' ends are correlated with poor gene expression, opposite to the expectation of Tuller et al. Thus, we conclude that slow 5' translation is a 'spandrel'--a non-adaptive consequence of something else, in this case, the turnover of 5' ends in evolution, and it does not improve translation.
Affiliation(s)
- Richard Sejour
  - Department of Pharmacological Sciences, Stony Brook University, Stony Brook, United States
- Janet Leatherwood
  - Department of Microbiology and Immunology, Stony Brook University, Stony Brook, United States
- Alisa Yurovsky
  - Department of Biomedical Informatics, Stony Brook University, Stony Brook, United States
- Bruce Futcher
  - Department of Microbiology and Immunology, Stony Brook University, Stony Brook, United States

10
Fan X, Chang T, Chen C, Hafner M, Wang Z. Analysis of RNA translation with a deep learning architecture provides new insight into translation control. bioRxiv 2024:2023.07.08.548206. [PMID: 39005319 PMCID: PMC11244891 DOI: 10.1101/2023.07.08.548206] [Indexed: 07/16/2024]
Abstract
Accurate annotation of coding regions in RNAs is essential for understanding gene translation. We developed a deep neural network to directly predict and analyze translation initiation and termination sites from RNA sequences. Trained with human transcripts, our model learned hidden rules of translation control and achieved a near perfect prediction of canonical translation sites across entire human transcriptome. Surprisingly, this model revealed a new role of codon usage in regulating translation termination, which was experimentally validated. We also identified thousands of new open reading frames in mRNAs or lncRNAs, some of which were confirmed experimentally. The model trained with human mRNAs achieved high prediction accuracy of canonical translation sites in all eukaryotes and good prediction in polycistronic transcripts from prokaryotes or RNA viruses, suggesting a high degree of conservation in translation control. Collectively, we present a general and efficient deep learning model for RNA translation, generating new insights into the complexity of translation regulation.
Affiliation(s)
- Xiaojuan Fan
  - Bio-med Big Data Center, CAS Key Laboratory of Computational Biology, CAS Center for Excellence in Molecular Cell Science, Shanghai Institute of Nutrition and Health
  - RNA Molecular Biology Laboratory, National Institute of Arthritis and Musculoskeletal and Skin Disease, Bethesda, MD, USA
- Tiangen Chang
  - Laboratory of Cancer Data Science, National Cancer Institute, Bethesda, MD, USA
- Chuyun Chen
  - Bio-med Big Data Center, CAS Key Laboratory of Computational Biology, CAS Center for Excellence in Molecular Cell Science, Shanghai Institute of Nutrition and Health
  - University of Chinese Academy of Sciences, Chinese Academy of Sciences, Shanghai 200031, China
- Markus Hafner
  - RNA Molecular Biology Laboratory, National Institute of Arthritis and Musculoskeletal and Skin Disease, Bethesda, MD, USA
- Zefeng Wang
  - Bio-med Big Data Center, CAS Key Laboratory of Computational Biology, CAS Center for Excellence in Molecular Cell Science, Shanghai Institute of Nutrition and Health
  - University of Chinese Academy of Sciences, Chinese Academy of Sciences, Shanghai 200031, China

11
Goshisht MK. Machine Learning and Deep Learning in Synthetic Biology: Key Architectures, Applications, and Challenges. ACS Omega 2024; 9:9921-9945. [PMID: 38463314 PMCID: PMC10918679 DOI: 10.1021/acsomega.3c05913] [Received: 08/11/2023] [Revised: 01/19/2024] [Accepted: 01/30/2024] [Indexed: 03/12/2024]
Abstract
Machine learning (ML), particularly deep learning (DL), has made rapid and substantial progress in synthetic biology in recent years. Biotechnological applications of biosystems, including pathways, enzymes, and whole cells, are being probed frequently with time. The intricacy and interconnectedness of biosystems make it challenging to design them with the desired properties. ML and DL have a synergy with synthetic biology. Synthetic biology can be employed to produce large data sets for training models (for instance, by utilizing DNA synthesis), and ML/DL models can be employed to inform design (for example, by generating new parts or advising unrivaled experiments to perform). This potential has recently been brought to light by research at the intersection of engineering biology and ML/DL through achievements like the design of novel biological components, best experimental design, automated analysis of microscopy data, protein structure prediction, and biomolecular implementations of ANNs (Artificial Neural Networks). I have divided this review into three sections. In the first section, I describe predictive potential and basics of ML along with myriad applications in synthetic biology, especially in engineering cells, activity of proteins, and metabolic pathways. In the second section, I describe fundamental DL architectures and their applications in synthetic biology. Finally, I describe different challenges causing hurdles in the progress of ML/DL and synthetic biology along with their solutions.
Affiliation(s)
- Manoj Kumar Goshisht
- Department of Chemistry, Natural and Applied Sciences, University of Wisconsin-Green Bay, Green Bay, Wisconsin 54311-7001, United States
| |
|
12
|
Shao B, Yan J, Zhang J, Liu L, Chen Y, Buskirk AR. Riboformer: a deep learning framework for predicting context-dependent translation dynamics. Nat Commun 2024; 15:2011. [PMID: 38443396 PMCID: PMC10915169 DOI: 10.1038/s41467-024-46241-8] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/10/2023] [Accepted: 02/18/2024] [Indexed: 03/07/2024] Open
Abstract
Translation elongation is essential for maintaining cellular proteostasis, and alterations in the translational landscape are associated with a range of diseases. Ribosome profiling allows detailed measurements of translation at the genome scale. However, it remains unclear how to disentangle biological variations from technical artifacts in these data and identify sequence determinants of translation dysregulation. Here we present Riboformer, a deep learning-based framework for modeling context-dependent changes in translation dynamics. Riboformer leverages the transformer architecture to accurately predict ribosome densities at codon resolution. When trained on an unbiased dataset, Riboformer corrects experimental artifacts in previously unseen datasets, which reveals subtle differences in synonymous codon translation and uncovers a bottleneck in translation elongation. Further, we show that Riboformer can be combined with in silico mutagenesis to identify sequence motifs that contribute to ribosome stalling across various biological contexts, including aging and viral infection. Our tool offers a context-aware and interpretable approach for standardizing ribosome profiling datasets and elucidating the regulatory basis of translation kinetics.
Affiliation(s)
- Bin Shao
- Department of Molecular and Cellular Biology, Harvard University, Cambridge, MA, USA.
- Klarman Cell Observatory, Broad Institute of MIT and Harvard, Cambridge, MA, USA.
| | - Jiawei Yan
- Department of Chemistry, Stanford University, Stanford, CA, USA
| | - Jing Zhang
- Biological Design Center, Boston University, Boston, MA, USA
| | - Lili Liu
- Key Laboratory of Quantitative Synthetic Biology, Shenzhen Institute of Synthetic Biology, Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences, Shenzhen, China
| | - Ye Chen
- Key Laboratory of Quantitative Synthetic Biology, Shenzhen Institute of Synthetic Biology, Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences, Shenzhen, China
| | - Allen R Buskirk
- Department of Molecular Biology and Genetics, Johns Hopkins University School of Medicine, Baltimore, MD, USA
| |
|
13
|
Powers EN, Kuwayama N, Sousa C, Reynaud K, Jovanovic M, Ingolia NT, Brar GA. Dbp1 is a low performance paralog of RNA helicase Ded1 that drives impaired translation and heat stress response. bioRxiv 2024:2024.01.12.575095. [PMID: 38260653 PMCID: PMC10802583 DOI: 10.1101/2024.01.12.575095] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/24/2024]
Abstract
Ded1 and Dbp1 are paralogous conserved RNA helicases that enable translation initiation in yeast. Ded1 has been heavily studied but the role of Dbp1 is poorly understood. We find that the expression of these two helicases is controlled in an inverse and condition-specific manner. In meiosis and other long-term starvation states, Dbp1 expression is upregulated and Ded1 is downregulated, whereas in mitotic cells, Dbp1 expression is extremely low. Inserting the DBP1 ORF in place of the DED1 ORF cannot replace the function of Ded1 in supporting translation, partly due to inefficient mitotic translation of the DBP1 mRNA, dependent on features of its ORF sequence but independent of codon optimality. Global measurements of translation rates and 5' leader translation, activity of mRNA-tethered helicases, ribosome association, and low temperature growth assays show that, even at matched protein levels, Ded1 is more effective than Dbp1 at activating translation, especially for mRNAs with structured 5' leaders. Ded1 supports halting of translation and cell growth in response to heat stress, but Dbp1 lacks this function, as well. These functional differences in the ability to efficiently mediate translation activation and braking can be ascribed to the divergent, disordered N- and C-terminal regions of these two helicases. Altogether, our data show that Dbp1 is a "low performance" version of Ded1 that cells employ in place of Ded1 under long-term conditions of nutrient deficiency.
|
14
|
Barrington CL, Galindo G, Koch AL, Horton ER, Morrison EJ, Tisa S, Stasevich TJ, Rissland OS. Synonymous codon usage regulates translation initiation. Cell Rep 2023; 42:113413. [PMID: 38096059 PMCID: PMC10790568 DOI: 10.1016/j.celrep.2023.113413] [Citation(s) in RCA: 15] [Impact Index Per Article: 7.5] [Received: 05/13/2023] [Revised: 08/30/2023] [Accepted: 10/25/2023] [Indexed: 12/30/2023] Open
Abstract
Nonoptimal synonymous codons repress gene expression, but the underlying mechanisms are poorly understood. We and others have previously shown that nonoptimal codons slow translation elongation speeds and thereby trigger messenger RNA (mRNA) degradation. Nevertheless, transcript levels are often insufficient to explain protein levels, suggesting additional mechanisms by which codon usage regulates gene expression. Using reporters in human and Drosophila cells, we find that transcript levels account for less than half of the variation in protein abundance due to codon usage. This discrepancy is explained by translational differences whereby nonoptimal codons repress translation initiation. Nonoptimal transcripts are also less bound by the translation initiation factors eIF4E and eIF4G1, providing a mechanistic explanation for their reduced initiation rates. Importantly, translational repression can occur without mRNA decay and deadenylation, and it does not depend on the known nonoptimality sensor, CNOT3. Our results reveal a potent mechanism of regulation by codon usage where nonoptimal codons repress further rounds of translation.
Affiliation(s)
- Chloe L Barrington
- Department of Biochemistry & Molecular Genetics, University of Colorado School of Medicine, Aurora, CO 80045, USA; RNA Bioscience Initiative, University of Colorado School of Medicine, Aurora, CO 80045, USA
| | - Gabriel Galindo
- Department of Biochemistry & Molecular Biology, Colorado State University, Fort Collins, CO 80523, USA
| | - Amanda L Koch
- Department of Biochemistry & Molecular Biology, Colorado State University, Fort Collins, CO 80523, USA
| | - Emma R Horton
- Department of Biochemistry & Molecular Genetics, University of Colorado School of Medicine, Aurora, CO 80045, USA; RNA Bioscience Initiative, University of Colorado School of Medicine, Aurora, CO 80045, USA
| | - Evan J Morrison
- Department of Biochemistry & Molecular Genetics, University of Colorado School of Medicine, Aurora, CO 80045, USA; RNA Bioscience Initiative, University of Colorado School of Medicine, Aurora, CO 80045, USA
| | - Samantha Tisa
- Department of Biochemistry & Molecular Genetics, University of Colorado School of Medicine, Aurora, CO 80045, USA; RNA Bioscience Initiative, University of Colorado School of Medicine, Aurora, CO 80045, USA
| | - Timothy J Stasevich
- Department of Biochemistry & Molecular Biology, Colorado State University, Fort Collins, CO 80523, USA
| | - Olivia S Rissland
- Department of Biochemistry & Molecular Genetics, University of Colorado School of Medicine, Aurora, CO 80045, USA; RNA Bioscience Initiative, University of Colorado School of Medicine, Aurora, CO 80045, USA.
| |
|
15
|
Ferguson L, Upton HE, Pimentel SC, Mok A, Lareau LF, Collins K, Ingolia NT. Streamlined and sensitive mono- and di-ribosome profiling in yeast and human cells. Nat Methods 2023; 20:1704-1715. [PMID: 37783882 PMCID: PMC11276118 DOI: 10.1038/s41592-023-02028-1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/18/2023] [Accepted: 08/23/2023] [Indexed: 10/04/2023]
Abstract
Ribosome profiling has unveiled diverse regulation and perturbations of translation through a transcriptome-wide survey of ribosome occupancy, read out by sequencing of ribosome-protected messenger RNA fragments. Generation of ribosome footprints and their conversion into sequencing libraries is technically demanding and sensitive to biases that distort the representation of physiological ribosome occupancy. We address these challenges by producing ribosome footprints with P1 nuclease rather than RNase I and replacing RNA ligation with ordered two-template relay, a single-tube protocol for sequencing library preparation that incorporates adaptors by reverse transcription. Our streamlined approach reduced sequence bias and enhanced enrichment of ribosome footprints relative to ribosomal RNA. Furthermore, P1 nuclease preserved distinct juxtaposed ribosome complexes informative about yeast and human ribosome fates during translation initiation, stalling and termination. Our optimized methods for mRNA footprint generation and capture provide a richer translatome profile with low input and fewer technical challenges.
Affiliation(s)
- Lucas Ferguson
- Department of Molecular and Cell Biology, University of California, Berkeley, CA, USA.
- Center for Computational Biology, University of California, Berkeley, CA, USA.
| | - Heather E Upton
- Department of Molecular and Cell Biology, University of California, Berkeley, CA, USA
| | - Sydney C Pimentel
- Department of Molecular and Cell Biology, University of California, Berkeley, CA, USA
| | - Amanda Mok
- Center for Computational Biology, University of California, Berkeley, CA, USA
| | - Liana F Lareau
- Center for Computational Biology, University of California, Berkeley, CA, USA
- Department of Bioengineering, University of California, Berkeley, CA, USA
- California Institute for Quantitative Biosciences, University of California, Berkeley, CA, USA
| | - Kathleen Collins
- Department of Molecular and Cell Biology, University of California, Berkeley, CA, USA.
- California Institute for Quantitative Biosciences, University of California, Berkeley, CA, USA.
| | - Nicholas T Ingolia
- Department of Molecular and Cell Biology, University of California, Berkeley, CA, USA.
- California Institute for Quantitative Biosciences, University of California, Berkeley, CA, USA.
| |
|
16
|
Weber M, Sogues A, Yus E, Burgos R, Gallo C, Martínez S, Lluch‐Senar M, Serrano L. Comprehensive quantitative modeling of translation efficiency in a genome-reduced bacterium. Mol Syst Biol 2023; 19:e11301. [PMID: 37642167 PMCID: PMC10568206 DOI: 10.15252/msb.202211301] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Received: 08/16/2022] [Revised: 07/17/2023] [Accepted: 07/24/2023] [Indexed: 08/31/2023] Open
Abstract
Translation efficiency has been mainly studied by ribosome profiling, which only provides an incomplete picture of translation kinetics. Here, we integrated the absolute quantifications of tRNAs, mRNAs, RNA half-lives, proteins, and protein half-lives with ribosome densities and derived the initiation and elongation rates for 475 genes (67% of all genes), 73 with high precision, in the bacterium Mycoplasma pneumoniae (Mpn). We found that, although the initiation rate varied over 160-fold among genes, most of the known factors had little impact on translation efficiency. Local codon elongation rates could not be fully explained by the adaptation to tRNA abundances, which varied over 100-fold among tRNA isoacceptors. We provide a comprehensive quantitative view of translation efficiency, which suggests the existence of unidentified mechanisms of translational regulation in Mpn.
Affiliation(s)
- Marc Weber
- Centre for Genomic Regulation (CRG), The Barcelona Institute of Science and Technology, Barcelona, Spain
| | - Adrià Sogues
- Centre for Genomic Regulation (CRG), The Barcelona Institute of Science and Technology, Barcelona, Spain
| | - Eva Yus
- Centre for Genomic Regulation (CRG), The Barcelona Institute of Science and Technology, Barcelona, Spain
| | - Raul Burgos
- Centre for Genomic Regulation (CRG), The Barcelona Institute of Science and Technology, Barcelona, Spain
| | - Carolina Gallo
- Centre for Genomic Regulation (CRG), The Barcelona Institute of Science and Technology, Barcelona, Spain
| | - Sira Martínez
- Centre for Genomic Regulation (CRG), The Barcelona Institute of Science and Technology, Barcelona, Spain
| | - Maria Lluch‐Senar
- Centre for Genomic Regulation (CRG), The Barcelona Institute of Science and Technology, Barcelona, Spain
| | - Luis Serrano
- Centre for Genomic Regulation (CRG), The Barcelona Institute of Science and Technology, Barcelona, Spain
- Universitat Pompeu Fabra (UPF), Barcelona, Spain
- ICREA, Barcelona, Spain
| |
|
17
|
Umemoto S, Kondo T, Fujino T, Hayashi G, Murakami H. Large-scale analysis of mRNA sequences localized near the start and amber codons and their impact on the diversity of mRNA display libraries. Nucleic Acids Res 2023; 51:7465-7479. [PMID: 37395404 PMCID: PMC10415131 DOI: 10.1093/nar/gkad555] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 07/04/2022] [Revised: 06/14/2023] [Accepted: 06/18/2023] [Indexed: 07/04/2023] Open
Abstract
Extremely diverse libraries are essential for effectively selecting functional peptides or proteins, and mRNA display technology is a powerful tool for generating such libraries with over 10^12-10^13 diversity. Particularly, the protein-puromycin linker (PuL)/mRNA complex formation yield is determining for preparing the libraries. However, how mRNA sequences affect the complex formation yield remains unclear. To study the effects of N-terminal and C-terminal coding sequences on the complex formation yield, puromycin-attached mRNAs containing three random codons after the start codon (32768 sequences) or seven random bases next to the amber codon (6480 sequences) were translated. Enrichment scores were calculated by dividing the appearance rate of every sequence in protein-PuL/mRNA complexes by that in total mRNAs. The wide range of enrichment scores (0.09-2.10 for N-terminal and 0.30-4.23 for C-terminal coding sequences) indicated that the N-terminal and C-terminal coding sequences strongly affected the complex formation yield. Using C-terminal GGC-CGA-UAG-U sequences, which resulted in the highest enrichment scores, we constructed highly diverse libraries of monobodies and macrocyclic peptides. The present study provides insights into how mRNA sequences affect the protein/mRNA complex formation yield and will accelerate the identification of functional peptides and proteins involved in various biological processes and having therapeutic applications.
Affiliation(s)
- Shun Umemoto
- Department of Biomolecular Engineering, Graduate School of Engineering, Nagoya University, Furo-cho, Chikusa-ku, Nagoya 464-8603, Japan
| | - Taishi Kondo
- Department of Biomolecular Engineering, Graduate School of Engineering, Nagoya University, Furo-cho, Chikusa-ku, Nagoya 464-8603, Japan
| | - Tomoshige Fujino
- Department of Biomolecular Engineering, Graduate School of Engineering, Nagoya University, Furo-cho, Chikusa-ku, Nagoya 464-8603, Japan
| | - Gosuke Hayashi
- Department of Biomolecular Engineering, Graduate School of Engineering, Nagoya University, Furo-cho, Chikusa-ku, Nagoya 464-8603, Japan
- Japan Science and Technology Agency (JST), PRESTO, 4-1-8 Honcho, Kawaguchi, Saitama 332-0012, Japan
| | - Hiroshi Murakami
- Department of Biomolecular Engineering, Graduate School of Engineering, Nagoya University, Furo-cho, Chikusa-ku, Nagoya 464-8603, Japan
- Institute of Nano-Life-Systems, Institutes of Innovation for Future Society, Nagoya University, Furo-cho, Chikusa-ku, Nagoya 464-8603, Japan
| |
|
18
|
Wienecke AN, Barry ML, Pollard DA. Natural variation in codon bias and mRNA folding strength interact synergistically to modify protein expression in Saccharomyces cerevisiae. Genetics 2023; 224:iyad113. [PMID: 37310925 PMCID: PMC10411576 DOI: 10.1093/genetics/iyad113] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/10/2023] [Revised: 04/10/2023] [Accepted: 05/15/2023] [Indexed: 06/15/2023] Open
Abstract
Codon bias and mRNA folding strength (mF) are hypothesized molecular mechanisms by which polymorphisms in genes modify protein expression. Natural patterns of codon bias and mF across genes as well as effects of altering codon bias and mF suggest that the influence of these 2 mechanisms may vary depending on the specific location of polymorphisms within a transcript. Despite the central role codon bias and mF may play in natural trait variation within populations, systematic studies of how polymorphic codon bias and mF relate to protein expression variation are lacking. To address this need, we analyzed genomic, transcriptomic, and proteomic data for 22 Saccharomyces cerevisiae isolates, estimated protein accumulation for each allele of 1,620 genes as the log of protein molecules per RNA molecule (logPPR), and built linear mixed-effects models associating allelic variation in codon bias and mF with allelic variation in logPPR. We found that codon bias and mF interact synergistically in a positive association with logPPR, and this interaction explains almost all the effects of codon bias and mF. We examined how the locations of polymorphisms within transcripts influence their effects and found that codon bias primarily acts through polymorphisms in domain-encoding and 3' coding sequences, while mF acts most significantly through coding sequences with weaker effects from untranslated regions. Our results present the most comprehensive characterization to date of how polymorphisms in transcripts influence protein expression.
Affiliation(s)
- Anastacia N Wienecke
- Biology Department, Western Washington University, Bellingham, WA 98225, USA
- Department of Biology, University of North Carolina at Chapel Hill, Chapel Hill, NC 27599, USA
- Curriculum in Bioinformatics and Computational Biology, University of North Carolina at Chapel Hill, Chapel Hill, NC 27599, USA
| | - Margaret L Barry
- Biology Department, Western Washington University, Bellingham, WA 98225, USA
| | - Daniel A Pollard
- Biology Department, Western Washington University, Bellingham, WA 98225, USA
| |
|
19
|
Shao B, Yan J, Zhang J, Buskirk AR. Riboformer: A Deep Learning Framework for Predicting Context-Dependent Translation Dynamics. bioRxiv 2023:2023.04.24.538053. [PMID: 37163112 PMCID: PMC10168224 DOI: 10.1101/2023.04.24.538053] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Indexed: 05/11/2023]
Abstract
Translation elongation is essential for maintaining cellular proteostasis, and alterations in the translational landscape are associated with a range of diseases. Ribosome profiling allows detailed measurement of translation at genome scale. However, it remains unclear how to disentangle biological variations from technical artifacts and identify sequence determinants of translation dysregulation. Here we present Riboformer, a deep learning-based framework for modeling context-dependent changes in translation dynamics. Riboformer leverages the transformer architecture to accurately predict ribosome densities at codon resolution. It corrects experimental artifacts in previously unseen datasets, reveals subtle differences in synonymous codon translation and uncovers a bottleneck in protein synthesis. Further, we show that Riboformer can be combined with in silico mutagenesis analysis to identify sequence motifs that contribute to ribosome stalling across various biological contexts, including aging and viral infection. Our tool offers a context-aware and interpretable approach for standardizing ribosome profiling datasets and elucidating the regulatory basis of translation kinetics.
Affiliation(s)
- Bin Shao
- Department of Molecular and Cellular Biology, Harvard University, Cambridge, MA, USA
- Present address: Klarman Cell Observatory, Broad Institute of Harvard and MIT, Cambridge, MA, USA
| | - Jiawei Yan
- Department of Chemistry, Stanford University, Stanford, CA, USA
| | - Jing Zhang
- Biological Design Center, Boston University, Boston, MA, USA
| | - Allen R. Buskirk
- Department of Molecular Biology and Genetics, Johns Hopkins University School of Medicine, Baltimore, USA
| |
|
20
|
Shiraishi C, Matsumoto A, Ichihara K, Yamamoto T, Yokoyama T, Mizoo T, Hatano A, Matsumoto M, Tanaka Y, Matsuura-Suzuki E, Iwasaki S, Matsushima S, Tsutsui H, Nakayama KI. RPL3L-containing ribosomes determine translation elongation dynamics required for cardiac function. Nat Commun 2023; 14:2131. [PMID: 37080962 PMCID: PMC10119107 DOI: 10.1038/s41467-023-37838-6] [Citation(s) in RCA: 12] [Impact Index Per Article: 6.0] [Received: 07/12/2022] [Accepted: 04/03/2023] [Indexed: 04/22/2023] Open
Abstract
Although several ribosomal protein paralogs are expressed in a tissue-specific manner, how these proteins affect translation and why they are required only in certain tissues have remained unclear. Here we show that RPL3L, a paralog of RPL3 specifically expressed in heart and skeletal muscle, influences translation elongation dynamics. Deficiency of RPL3L-containing ribosomes in RPL3L knockout male mice resulted in impaired cardiac contractility. Ribosome occupancy at mRNA codons was found to be altered in the RPL3L-deficient heart, and the changes were negatively correlated with those observed in myoblasts overexpressing RPL3L. RPL3L-containing ribosomes were less prone to collisions compared with RPL3-containing canonical ribosomes. Although the loss of RPL3L-containing ribosomes altered translation elongation dynamics for the entire transcriptome, its effects were most pronounced for transcripts related to cardiac muscle contraction and dilated cardiomyopathy, with the abundance of the encoded proteins being correspondingly decreased. Our results provide further insight into the mechanisms and physiological relevance of tissue-specific translational regulation.
Affiliation(s)
- Chisa Shiraishi
- Division of Cell Biology, Medical Institute of Bioregulation, Kyushu University, Fukuoka, Fukuoka, 812-8582, Japan
| | - Akinobu Matsumoto
- Division of Cell Biology, Medical Institute of Bioregulation, Kyushu University, Fukuoka, Fukuoka, 812-8582, Japan.
| | - Kazuya Ichihara
- Division of Cell Biology, Medical Institute of Bioregulation, Kyushu University, Fukuoka, Fukuoka, 812-8582, Japan
| | - Taishi Yamamoto
- Department of Cardiovascular Medicine, Faculty of Medical Sciences, Kyushu University, Fukuoka, Fukuoka, 812-8582, Japan
| | - Takeshi Yokoyama
- Graduate School of Life Sciences, Tohoku University, Sendai, Miyagi, 980-8577, Japan
| | - Taisuke Mizoo
- Division of Cell Biology, Medical Institute of Bioregulation, Kyushu University, Fukuoka, Fukuoka, 812-8582, Japan
| | - Atsushi Hatano
- Department of Omics and Systems Biology, Graduate School of Medical and Dental Sciences, Niigata University, Niigata, Niigata, 951-8510, Japan
| | - Masaki Matsumoto
- Department of Omics and Systems Biology, Graduate School of Medical and Dental Sciences, Niigata University, Niigata, Niigata, 951-8510, Japan
| | - Yoshikazu Tanaka
- Graduate School of Life Sciences, Tohoku University, Sendai, Miyagi, 980-8577, Japan
| | - Eriko Matsuura-Suzuki
- RNA Systems Biochemistry Laboratory, RIKEN Cluster for Pioneering Research, Wako, Saitama, 351-0198, Japan
| | - Shintaro Iwasaki
- RNA Systems Biochemistry Laboratory, RIKEN Cluster for Pioneering Research, Wako, Saitama, 351-0198, Japan
- Department of Computational Biology and Medical Sciences, Graduate School of Frontier Sciences, The University of Tokyo, Kashiwa, Chiba, 277-8561, Japan
| | - Shouji Matsushima
- Department of Cardiovascular Medicine, Faculty of Medical Sciences, Kyushu University, Fukuoka, Fukuoka, 812-8582, Japan
| | - Hiroyuki Tsutsui
- Department of Cardiovascular Medicine, Faculty of Medical Sciences, Kyushu University, Fukuoka, Fukuoka, 812-8582, Japan
| | - Keiichi I Nakayama
- Division of Cell Biology, Medical Institute of Bioregulation, Kyushu University, Fukuoka, Fukuoka, 812-8582, Japan.
| |
|
21
|
Hernandez-Alias X, Benisty H, Radusky LG, Serrano L, Schaefer MH. Using protein-per-mRNA differences among human tissues in codon optimization. Genome Biol 2023; 24:34. [PMID: 36829202 PMCID: PMC9951436 DOI: 10.1186/s13059-023-02868-2] [Citation(s) in RCA: 11] [Impact Index Per Article: 5.5] [Received: 04/26/2022] [Accepted: 02/07/2023] [Indexed: 02/26/2023] Open
Abstract
BACKGROUND Codon usage and nucleotide composition of coding sequences have profound effects on protein expression. However, while it is recognized that different tissues have distinct tRNA profiles and codon usages in their transcriptomes, the effect of tissue-specific codon optimality on protein synthesis remains elusive. RESULTS We leverage existing state-of-the-art transcriptomics and proteomics datasets from the GTEx project and the Human Protein Atlas to compute the protein-to-mRNA ratios of 36 human tissues. Using this as a proxy of translational efficiency, we build a machine learning model that identifies codons enriched or depleted in specific tissues. We detect two clusters of tissues with an opposite pattern of codon preferences. We then use these identified patterns for the development of CUSTOM, a codon optimizer algorithm which suggests a synonymous codon design in order to optimize protein production in a tissue-specific manner. In human cell-line models, we provide evidence that codon optimization should take into account particularities of the translational machinery of the tissues in which the target proteins are expressed and that our approach can design genes with tissue-optimized expression profiles. CONCLUSIONS We provide proof-of-concept evidence that codon preferences exist in tissue-specific protein synthesis and demonstrate its application to synthetic gene design. We show that CUSTOM can be of benefit in biological and biotechnological applications, such as in the design of tissue-targeted therapies and vaccines.
Affiliation(s)
- Xavier Hernandez-Alias
- Centre for Genomic Regulation (CRG), The Barcelona Institute of Science and Technology, Dr. Aiguader 88, 08003, Barcelona, Spain.
| | - Hannah Benisty
- Centre for Genomic Regulation (CRG), The Barcelona Institute of Science and Technology, Dr. Aiguader 88, 08003, Barcelona, Spain
| | - Leandro G Radusky
- Centre for Genomic Regulation (CRG), The Barcelona Institute of Science and Technology, Dr. Aiguader 88, 08003, Barcelona, Spain
| | - Luis Serrano
- Centre for Genomic Regulation (CRG), The Barcelona Institute of Science and Technology, Dr. Aiguader 88, 08003, Barcelona, Spain.
- Universitat Pompeu Fabra (UPF), 08002, Barcelona, Spain.
- ICREA, Pg. Lluís Companys 23, 08010, Barcelona, Spain.
| | - Martin H Schaefer
- IEO European Institute of Oncology IRCCS, Department of Experimental Oncology, Via Adamello 16, 20139, Milan, Italy.
| |
|
22
|
Mok A, Tunney R, Benegas G, Wallace EWJ, Lareau LF. choros: correction of sequence-based biases for accurate quantification of ribosome profiling data. bioRxiv 2023:2023.02.21.529452. [PMID: 36865295 PMCID: PMC9980091 DOI: 10.1101/2023.02.21.529452] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Indexed: 02/24/2023]
Abstract
Ribosome profiling quantifies translation genome-wide by sequencing ribosome-protected fragments, or footprints. Its single-codon resolution allows identification of translation regulation, such as ribosome stalls or pauses, on individual genes. However, enzyme preferences during library preparation lead to pervasive sequence artifacts that obscure translation dynamics. Widespread over- and under-representation of ribosome footprints can dominate local footprint densities and skew estimates of elongation rates by up to fivefold. To address these biases and uncover true patterns of translation, we present choros, a computational method that models ribosome footprint distributions to provide bias-corrected footprint counts. choros uses negative binomial regression to accurately estimate two sets of parameters: (i) biological contributions from codon-specific translation elongation rates; and (ii) technical contributions from nuclease digestion and ligation efficiencies. We use these parameter estimates to generate bias correction factors that eliminate sequence artifacts. Applying choros to multiple ribosome profiling datasets, we are able to accurately quantify and attenuate ligation biases to provide more faithful measurements of ribosome distribution. We show that a pattern interpreted as pervasive ribosome pausing near the beginning of coding regions is likely to arise from technical biases. Incorporating choros into standard analysis pipelines will improve biological discovery from measurements of translation.
Affiliation(s)
- Amanda Mok
- Center for Computational Biology, University of California, Berkeley
| | - Robert Tunney
- Center for Computational Biology, University of California, Berkeley
| | - Gonzalo Benegas
- Center for Computational Biology, University of California, Berkeley
| | | | - Liana F. Lareau
- Center for Computational Biology, University of California, Berkeley
- Department of Bioengineering, University of California, Berkeley
| |
|
23
|
Goulet DR, Yan Y, Agrawal P, Waight AB, Mak ANS, Zhu Y. Codon Optimization Using a Recurrent Neural Network. J Comput Biol 2023; 30:70-81. [PMID: 35727687 DOI: 10.1089/cmb.2021.0458] [Citation(s) in RCA: 8] [Impact Index Per Article: 4.0] [Indexed: 01/11/2023] Open
Abstract
Codon optimization of a DNA sequence can significantly increase efficiency of protein expression, reducing the cost to manufacture biologic pharmaceuticals. Although directed methods based on such factors as codon usage bias and GC nucleotide content are often used to optimize protein expression, undirected optimization using machine learning could further improve the process by capitalizing on undiscovered patterns that exist within real DNA sequences. To explore this hypothesis, Chinese hamster DNA sequences were used to train a recurrent neural network (RNN) model of codon optimization. The model was used to generate optimized DNA sequence based on an input amino acid sequence for the example receptor programmed death-ligand 1 and for an example monoclonal antibody. When RNN-optimized sequences were transfected transiently or stably into Chinese hamster ovary cells, the resulting protein expression was as high or higher than that produced by DNA sequences optimized by conventional algorithms.
Affiliation(s)
- Dennis R Goulet
- Department of Protein Engineering, and SysImmune, Inc., Redmond, Washington, USA
| | - Yongqi Yan
- Department of Cell Science, SysImmune, Inc., Redmond, Washington, USA
| | - Palak Agrawal
- Department of Cell Science, SysImmune, Inc., Redmond, Washington, USA
| | - Andrew B Waight
- Department of Protein Engineering, and SysImmune, Inc., Redmond, Washington, USA
| | - Amanda Nga-Sze Mak
- Department of Protein Engineering, and SysImmune, Inc., Redmond, Washington, USA
- Department of Cell Science, SysImmune, Inc., Redmond, Washington, USA
| | - Yi Zhu
- Department of Protein Engineering, and SysImmune, Inc., Redmond, Washington, USA
- Department of Cell Science, SysImmune, Inc., Redmond, Washington, USA
| |
|
24
|
Abstract
This chapter outlines the myriad applications of machine learning (ML) in synthetic biology, specifically in engineering cell and protein activity, and metabolic pathways. Though by no means comprehensive, the chapter highlights several prominent computational tools applied in the field and their potential use cases. The examples detailed reinforce how ML algorithms can enhance synthetic biology research by providing data-driven insights into the behavior of living systems, even without detailed knowledge of their underlying mechanisms. By doing so, ML promises to increase the efficiency of research projects by modeling hypotheses in silico that can then be tested through experiments. While challenges related to training dataset generation and computational costs remain, ongoing improvements in ML tools are paving the way for smarter and more streamlined synthetic biology workflows that can be readily employed to address grand challenges across manufacturing, medicine, engineering, agriculture, and beyond.
Affiliation(s)
- Brendan Fu-Long Sieow
- NUS Synthetic Biology for Clinical and Technological Innovation (SynCTI), National University of Singapore, Singapore, Singapore
- Synthetic Biology Translational Research Programme, Yong Loo Lin School of Medicine, National University of Singapore, Singapore, Singapore
- Department of Biochemistry, Yong Loo Lin School of Medicine, National University of Singapore, Singapore, Singapore
- NUS Graduate School for Integrative Sciences and Engineering Programme, National University of Singapore, Singapore, Singapore
| | - Ryan De Sotto
- NUS Synthetic Biology for Clinical and Technological Innovation (SynCTI), National University of Singapore, Singapore, Singapore
- Synthetic Biology Translational Research Programme, Yong Loo Lin School of Medicine, National University of Singapore, Singapore, Singapore
- Department of Biochemistry, Yong Loo Lin School of Medicine, National University of Singapore, Singapore, Singapore
| | - Zhi Ren Darren Seet
- NUS Synthetic Biology for Clinical and Technological Innovation (SynCTI), National University of Singapore, Singapore, Singapore
- Synthetic Biology Translational Research Programme, Yong Loo Lin School of Medicine, National University of Singapore, Singapore, Singapore
- Department of Biochemistry, Yong Loo Lin School of Medicine, National University of Singapore, Singapore, Singapore
| | - In Young Hwang
- NUS Synthetic Biology for Clinical and Technological Innovation (SynCTI), National University of Singapore, Singapore, Singapore
- Synthetic Biology Translational Research Programme, Yong Loo Lin School of Medicine, National University of Singapore, Singapore, Singapore
- Department of Biochemistry, Yong Loo Lin School of Medicine, National University of Singapore, Singapore, Singapore
| | - Matthew Wook Chang
- NUS Synthetic Biology for Clinical and Technological Innovation (SynCTI), National University of Singapore, Singapore, Singapore.
- Synthetic Biology Translational Research Programme, Yong Loo Lin School of Medicine, National University of Singapore, Singapore, Singapore.
- Department of Biochemistry, Yong Loo Lin School of Medicine, National University of Singapore, Singapore, Singapore.
| |
|
25
|
Eisen TJ, Li JJ, Bartel DP. The interplay between translational efficiency, poly(A) tails, microRNAs, and neuronal activation. RNA (NEW YORK, N.Y.) 2022; 28:808-831. [PMID: 35273099 PMCID: PMC9074895 DOI: 10.1261/rna.079046.121] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.7] [Received: 11/09/2021] [Accepted: 02/21/2022] [Indexed: 06/14/2023]
Abstract
Neurons provide a rich setting for studying post-transcriptional control. Here, we investigate the landscape of translational control in neurons and search for mRNA features that explain differences in translational efficiency (TE), considering the interplay between TE, mRNA poly(A)-tail lengths, microRNAs, and neuronal activation. In neurons and brain tissues, TE correlates with tail length, and a few dozen mRNAs appear to undergo cytoplasmic polyadenylation upon light or chemical stimulation. However, the correlation between TE and tail length is modest, explaining <5% of TE variance, and even this modest relationship diminishes when accounting for other mRNA features. Thus, tail length appears to affect TE only minimally. Accordingly, miRNAs, which accelerate deadenylation of their mRNA targets, primarily influence target mRNA levels, with no detectable effect on either steady-state tail lengths or TE. Larger correlates with TE include codon composition and predicted mRNA folding energy. When combined in a model, the identified correlates explain 38%-45% of TE variance. These results provide a framework for considering the relative impact of factors that contribute to translational control in neurons. They indicate that when examined in bulk, translational control in neurons largely resembles that of other types of post-embryonic cells. Thus, detection of more specialized control might require analyses that can distinguish translation occurring in neuronal processes from that occurring in cell bodies.
Affiliation(s)
- Timothy J Eisen
- Howard Hughes Medical Institute, Cambridge, Massachusetts 02142, USA
- Whitehead Institute for Biomedical Research, Cambridge, Massachusetts 02142, USA
| | - Jingyi Jessica Li
- Department of Statistics, Department of Biostatistics, Department of Computational Medicine, and Department of Human Genetics, University of California, Los Angeles, California 90095, USA
| | - David P Bartel
- Howard Hughes Medical Institute, Cambridge, Massachusetts 02142, USA
- Whitehead Institute for Biomedical Research, Cambridge, Massachusetts 02142, USA
| |
|
26
|
Fujita T, Yokoyama T, Shirouzu M, Taguchi H, Ito T, Iwasaki S. The landscape of translational stall sites in bacteria revealed by monosome and disome profiling. RNA (NEW YORK, N.Y.) 2022; 28:290-302. [PMID: 34906996 PMCID: PMC8848927 DOI: 10.1261/rna.078188.120] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Received: 11/17/2020] [Accepted: 11/24/2021] [Indexed: 05/29/2023]
Abstract
Ribosome pauses are associated with various cotranslational events and determine the fate of mRNAs and proteins. Thus, the identification of precise pause sites across the transcriptome is desirable; however, the landscape of ribosome pauses in bacteria remains ambiguous. Here, we harness monosome and disome (or collided ribosome) profiling strategies to survey ribosome pause sites in Escherichia coli. Compared to eukaryotes, ribosome collisions in bacteria showed remarkable differences: a low frequency of disomes at stop codons, collisions occurring immediately after 70S assembly on start codons, and shorter queues of ribosomes trailing upstream. The pause sites corresponded with the biochemical validation by integrated nascent chain profiling (iNP) to detect polypeptidyl-tRNA, an elongation intermediate. Moreover, a subset of those sites showed puromycin resistance, indicating slow peptidyl transfer. Among the identified sites, the ribosome pause at Asn586 of ycbZ was validated by biochemical reporter assay, tRNA sequencing (tRNA-seq), and cryo-electron microscopy (cryo-EM) experiments. Our results provide a useful resource for ribosome stalling sites in bacteria.
Affiliation(s)
- Tomoya Fujita
- RNA Systems Biochemistry Laboratory, RIKEN Cluster for Pioneering Research, Wako, Saitama 351-0198, Japan
- School of Life Science and Technology, Tokyo Institute of Technology, Midori-ku, Yokohama 226-8503, Japan
| | - Takeshi Yokoyama
- Laboratory for Protein Functional and Structural Biology, RIKEN Center for Biosystems Dynamics Research, Tsurumi-ku, Yokohama 230-0045, Japan
- Graduate School of Life Sciences, Tohoku University, Aoba-ku, Sendai 980-8577, Japan
| | - Mikako Shirouzu
- Laboratory for Protein Functional and Structural Biology, RIKEN Center for Biosystems Dynamics Research, Tsurumi-ku, Yokohama 230-0045, Japan
| | - Hideki Taguchi
- School of Life Science and Technology, Tokyo Institute of Technology, Midori-ku, Yokohama 226-8503, Japan
- Cell Biology Center, Institute of Innovative Research, Tokyo Institute of Technology, Yokohama, Midori-ku, Yokohama 226-8503, Japan
| | - Takuhiro Ito
- Laboratory for Translation Structural Biology, RIKEN Center for Biosystems Dynamics Research, Tsurumi-ku, Yokohama 230-0045, Japan
| | - Shintaro Iwasaki
- RNA Systems Biochemistry Laboratory, RIKEN Cluster for Pioneering Research, Wako, Saitama 351-0198, Japan
- Department of Computational Biology and Medical Sciences, Graduate School of Frontier Sciences, The University of Tokyo, Kashiwa, Chiba 277-8561, Japan
| |
|
27
|
Ding Z, Guan F, Xu G, Wang Y, Yan Y, Zhang W, Wu N, Yao B, Huang H, Tuller T, Tian J. MPEPE, a predictive approach to improve protein expression in E. coli based on deep learning. Comput Struct Biotechnol J 2022; 20:1142-1153. [PMID: 35317239 PMCID: PMC8913310 DOI: 10.1016/j.csbj.2022.02.030] [Citation(s) in RCA: 14] [Impact Index Per Article: 4.7] [Received: 12/08/2021] [Revised: 02/27/2022] [Accepted: 02/28/2022] [Indexed: 12/20/2022] Open
Abstract
The expression of proteins in Escherichia coli is often essential for their characterization, modification, and subsequent application. Gene sequence is the major factor contributing to expression. In this study, we used the expression data from 6438 heterologous proteins under the same expression condition in E. coli to construct a deep learning classifier for screening high- and low-expression proteins. In conjunction with conserved residue analysis to minimize functional disruption, a mutation predictor for enhanced protein expression (MPEPE) was proposed to identify mutations conducive to protein expression. MPEPE identified mutation sites in laccase 13B22 and the glucose dehydrogenase FAD-AtGDH that significantly increased both expression levels and activity of these proteins. Additionally, a significant correlation of 0.46 was detected between the high-level expression propensity predicted by the constructed models and the protein abundance of endogenous genes in E. coli. Therefore, the study provides foundational insights into the relationship between specific amino acid usage, codon usage, and protein expression, and is essential for research and industrial applications.
Affiliation(s)
- Zundan Ding
- Biotechnology Research Institute, Chinese Academy of Agricultural Sciences, Beijing 100081, China
| | - Feifei Guan
- Biotechnology Research Institute, Chinese Academy of Agricultural Sciences, Beijing 100081, China
| | - Guoshun Xu
- Biotechnology Research Institute, Chinese Academy of Agricultural Sciences, Beijing 100081, China
- Institute of Animal Science, Chinese Academy of Agricultural Sciences, Beijing 100193, China
| | - Yuchen Wang
- College of Life Science, Northwest Normal University, Lanzhou 730070, China
- Biotechnology Research Institute, Chinese Academy of Agricultural Sciences, Beijing 100081, China
| | - Yaru Yan
- Biotechnology Research Institute, Chinese Academy of Agricultural Sciences, Beijing 100081, China
| | - Wei Zhang
- Biotechnology Research Institute, Chinese Academy of Agricultural Sciences, Beijing 100081, China
| | - Ningfeng Wu
- Biotechnology Research Institute, Chinese Academy of Agricultural Sciences, Beijing 100081, China
| | - Bin Yao
- Institute of Animal Science, Chinese Academy of Agricultural Sciences, Beijing 100193, China
| | - Huoqing Huang
- Institute of Animal Science, Chinese Academy of Agricultural Sciences, Beijing 100193, China
| | - Tamir Tuller
- Department of Biomedical Engineering, the Engineering Faculty, Tel Aviv University, Tel-Aviv, Israel
| | - Jian Tian
- Biotechnology Research Institute, Chinese Academy of Agricultural Sciences, Beijing 100081, China
| |
|
28
|
Wright G, Rodriguez A, Li J, Milenkovic T, Emrich SJ, Clark PL. CHARMING: Harmonizing synonymous codon usage to replicate a desired codon usage pattern. Protein Sci 2022; 31:221-231. [PMID: 34738275 PMCID: PMC8740841 DOI: 10.1002/pro.4223] [Citation(s) in RCA: 12] [Impact Index Per Article: 4.0] [Received: 07/31/2021] [Revised: 10/31/2021] [Accepted: 11/02/2021] [Indexed: 01/03/2023]
Abstract
There is a growing appreciation that synonymous codon usage, although historically regarded as phenotypically silent, can instead alter a wide range of mechanisms related to functional protein production, a term we use here to describe the net effect of transcription (mRNA synthesis), mRNA half-life, translation (protein synthesis) and the probability of a protein folding correctly to its active, functional structure. In particular, recent discoveries have highlighted the important role that sub-optimal codons can play in modifying co-translational protein folding. These results have drawn increased attention to the patterns of synonymous codon usage within coding sequences, particularly in light of the discovery that these patterns can be conserved across evolution for homologous proteins. Because synonymous codon usage differs between organisms, for heterologous gene expression it can be desirable to make synonymous codon substitutions to match the codon usage pattern from the original organism in the heterologous expression host. Here we present CHARMING (for Codon HARMonizING), a robust and versatile algorithm to design mRNA sequences for heterologous gene expression and other related codon harmonization tasks. CHARMING can be run as a downloadable Python script or via a web portal at http://www.codons.org.
Affiliation(s)
- Gabriel Wright
- Department of Computer Science & Engineering, University of Notre Dame, Notre Dame, Indiana, USA
- Present address: Department of Electrical Engineering and Computer Science, Milwaukee School of Engineering, Milwaukee, WI, USA
| | - Anabel Rodriguez
- Department of Chemistry & Biochemistry, University of Notre Dame, Notre Dame, Indiana, USA
| | - Jun Li
- Department of Applied and Computational Mathematics & Statistics, University of Notre Dame, Notre Dame, Indiana, USA
| | - Tijana Milenkovic
- Department of Computer Science & Engineering, University of Notre Dame, Notre Dame, Indiana, USA
| | - Scott J. Emrich
- Department of Electrical Engineering & Computer Science, University of Tennessee, Knoxville, Tennessee, USA
| | - Patricia L. Clark
- Department of Chemistry & Biochemistry, University of Notre Dame, Notre Dame, Indiana, USA
| |
|
29
|
Riepe C, Zelin E, Frankino PA, Meacham ZA, Fernandez S, Ingolia NT, Corn JE. Double stranded DNA breaks and genome editing trigger loss of ribosomal protein RPS27A. FEBS J 2021; 289:3101-3114. [PMID: 34914197 PMCID: PMC9295824 DOI: 10.1111/febs.16321] [Citation(s) in RCA: 14] [Impact Index Per Article: 3.5] [Received: 05/26/2021] [Revised: 11/09/2021] [Accepted: 12/14/2021] [Indexed: 11/03/2022]
Abstract
DNA damage activates a robust transcriptional stress response, but much less is known about how DNA damage impacts translation. The advent of genome editing with Cas9 has intensified interest in understanding cellular responses to DNA damage. Here, we find that DNA double-strand breaks (DSBs), including those induced by Cas9, trigger the loss of ribosomal protein RPS27A from ribosomes via p53-independent proteasomal degradation. Comparisons of Cas9 and dCas9 ribosome profiling and mRNA-seq experiments reveal a global translational response to DSBs that precedes changes in transcript abundance. Our results demonstrate that even a single double-strand break can lead to altered translational output and ribosome remodeling, suggesting caution in interpreting cellular phenotypes measured immediately after genome editing.
Affiliation(s)
- Celeste Riepe
- Department of Molecular and Cell Biology, University of California, Berkeley, Berkeley, California, USA
| | - Elena Zelin
- Innovative Genomics Institute, University of California, Berkeley, Berkeley, California, USA
| | - Phillip A Frankino
- Department of Molecular and Cell Biology, University of California, Berkeley, Berkeley, California, USA
| | - Zuriah A Meacham
- Department of Molecular and Cell Biology, University of California, Berkeley, Berkeley, California, USA
| | - Samantha Fernandez
- Department of Molecular and Cell Biology, University of California, Berkeley, Berkeley, California, USA
| | - Nicholas T Ingolia
- Department of Molecular and Cell Biology, University of California, Berkeley, Berkeley, California, USA
| | - Jacob E Corn
- Department of Molecular and Cell Biology, University of California, Berkeley, Berkeley, California, USA
- Innovative Genomics Institute, University of California, Berkeley, Berkeley, California, USA
- Department of Biology, ETH, Zürich, Switzerland
| |
|
30
|
Gobet C, Naef F. Ribo-DT: An automated pipeline for inferring codon dwell times from ribosome profiling data. Methods 2021; 203:10-16. [PMID: 34673173 DOI: 10.1016/j.ymeth.2021.10.004] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 05/04/2021] [Revised: 10/08/2021] [Accepted: 10/11/2021] [Indexed: 11/16/2022] Open
Abstract
Protein synthesis is an energy consuming process characterised as a pivotal and highly regulated step in gene expression. The net protein output is dictated by a combination of translation initiation, elongation and termination rates that have remained difficult to measure. Recently, the development of ribosome profiling has enabled the inference of translation parameters through modelling, as this method informs on the ribosome position along the mRNA. Here, we present an automated, reproducible and portable computational pipeline to infer relative single-codon and codon-pair dwell times as well as gene flux from raw ribosome profiling sequencing data. As a case study, we applied our workflow to a publicly available yeast ribosome profiling dataset consisting of 57 independent gene knockouts related to RNA and tRNA modifications. We uncovered the effects of those modifications on translation elongation and codon selection during decoding. In particular, knocking out mod5 and trm7 increases codon-specific dwell times which indicates their potential tRNA targets, and highlights effects of nucleotide modifications on ribosome decoding rate.
Affiliation(s)
- Cédric Gobet
- Institute of Bioengineering (IBI), Ecole Polytechnique Fédérale de Lausanne (EPFL), 1015 Lausanne, Switzerland.
| | - Félix Naef
- Institute of Bioengineering (IBI), Ecole Polytechnique Fédérale de Lausanne (EPFL), 1015 Lausanne, Switzerland.
| |
|
31
|
Bhandari BK, Lim CS, Remus DM, Chen A, van Dolleweerd C, Gardner PP. Analysis of 11,430 recombinant protein production experiments reveals that protein yield is tunable by synonymous codon changes of translation initiation sites. PLoS Comput Biol 2021; 17:e1009461. [PMID: 34610008 PMCID: PMC8519471 DOI: 10.1371/journal.pcbi.1009461] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.5] [Received: 06/02/2021] [Revised: 10/15/2021] [Accepted: 09/19/2021] [Indexed: 12/16/2022] Open
Abstract
Recombinant protein production is a key process in generating proteins of interest in the pharmaceutical industry and biomedical research. However, about 50% of recombinant proteins fail to be expressed in a variety of host cells. Here we show that the accessibility of translation initiation sites modelled using the mRNA base-unpairing across the Boltzmann's ensemble significantly outperforms alternative features. This approach accurately predicts the successes or failures of expression experiments, which utilised Escherichia coli cells to express 11,430 recombinant proteins from over 189 diverse species. On this basis, we develop TIsigner that uses simulated annealing to modify up to the first nine codons of mRNAs with synonymous substitutions. We show that accessibility captures the key propensity beyond the target region (initiation sites in this case), as a modest number of synonymous changes is sufficient to tune the recombinant protein expression levels. We build a stochastic simulation model and show that higher accessibility leads to higher protein production and slower cell growth, supporting the idea of protein cost, where cell growth is constrained by protein circuits during overexpression.
Affiliation(s)
- Bikash K. Bhandari
- Department of Biochemistry, School of Biomedical Sciences, University of Otago, Dunedin, New Zealand
| | - Chun Shen Lim
- Department of Biochemistry, School of Biomedical Sciences, University of Otago, Dunedin, New Zealand
| | - Daniela M. Remus
- Callaghan Innovation Protein Science and Engineering, University of Canterbury, Christchurch, New Zealand
| | - Augustine Chen
- Department of Biochemistry, School of Biomedical Sciences, University of Otago, Dunedin, New Zealand
| | - Craig van Dolleweerd
- Biomolecular Interaction Center, University of Canterbury, Christchurch, New Zealand
| | - Paul P. Gardner
- Department of Biochemistry, School of Biomedical Sciences, University of Otago, Dunedin, New Zealand
- Biomolecular Interaction Center, University of Canterbury, Christchurch, New Zealand
| |
|
32
|
Pavlov MY, Ullman G, Ignatova Z, Ehrenberg M. Estimation of peptide elongation times from ribosome profiling spectra. Nucleic Acids Res 2021; 49:5124-5142. [PMID: 33885812 PMCID: PMC8136808 DOI: 10.1093/nar/gkab260] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.5] [Received: 01/28/2021] [Revised: 03/25/2021] [Accepted: 04/15/2021] [Indexed: 11/13/2022] Open
Abstract
Ribosome profiling spectra bear rich information on translation control and dynamics. Yet, due to technical biases in library generation, extracting quantitative measures of discrete translation events has remained elusive. Using maximum likelihood statistics and a data set from Escherichia coli, we develop a robust method for neutralizing technical biases (e.g. base-specific RNase preferences in the generation of ribosome-protected mRNA fragments (RPFs)), which allows for correct estimation of translation times at single-codon resolution. Furthermore, we validated the method with available datasets from E. coli treated with an antibiotic to inhibit isoleucyl-tRNA synthetase, and two datasets from Saccharomyces cerevisiae treated with two RNases with distinct cleavage signatures. We demonstrate that our approach accounts for RNase cleavage preferences and provides bias-corrected translation time estimates. Our approach provides a solution to the long-standing problem of extracting reliable information about peptide elongation times from highly noisy and technically biased ribosome profiling spectra.
Affiliation(s)
- Michael Y Pavlov
- Department of Cell and Molecular Biology, Biomedical Center, University of Uppsala, 75237 Uppsala, Sweden
| | - Gustaf Ullman
- Department of Cell and Molecular Biology, Biomedical Center, University of Uppsala, 75237 Uppsala, Sweden
| | - Zoya Ignatova
- Institute for Biochemistry & Molecular Biology, University of Hamburg, 20146 Hamburg, Germany
| | - Måns Ehrenberg
- Department of Cell and Molecular Biology, Biomedical Center, University of Uppsala, 75237 Uppsala, Sweden
| |
|
33
|
Yadav V, Ullah Irshad I, Kumar H, Sharma AK. Quantitative Modeling of Protein Synthesis Using Ribosome Profiling Data. Front Mol Biosci 2021; 8:688700. [PMID: 34262940 PMCID: PMC8274658 DOI: 10.3389/fmolb.2021.688700] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Received: 03/31/2021] [Accepted: 05/25/2021] [Indexed: 12/12/2022] Open
Abstract
Quantitative prediction of protein synthesis requires accurate translation initiation and codon translation rates. Ribosome profiling data, which provide the steady-state distribution of relative ribosome occupancies along a transcript, can be used to extract these rate parameters. Various methods have been developed in the past few years to measure translation initiation and codon translation rates from ribosome profiling data. In this review, we provide a detailed analysis of the key methods employed to extract the translation rate parameters from ribosome profiling data. We further discuss how these approaches were used to decipher the role of various structural and sequence-based features of mRNA molecules in the regulation of gene expression. The utilization of these accurate rate parameters in computational modeling of protein synthesis may provide new insights into the kinetic control of the process of gene expression.
Affiliation(s)
- Vandana Yadav
- Department of Physics, Indian Institute of Technology Madras, Chennai, India
| | | | - Hemant Kumar
- School of Basic Sciences, Indian Institute of Technology Bhubaneswar, Bhubaneswar, India
| | - Ajeet K Sharma
- Department of Physics, Indian Institute of Technology Jammu, Jammu, India
| |
|
34
|
Zrimec J, Buric F, Kokina M, Garcia V, Zelezniak A. Learning the Regulatory Code of Gene Expression. Front Mol Biosci 2021; 8:673363. [PMID: 34179082 PMCID: PMC8223075 DOI: 10.3389/fmolb.2021.673363] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.5] [Received: 02/27/2021] [Accepted: 05/24/2021] [Indexed: 11/13/2022] Open
Abstract
Data-driven machine learning is the method of choice for predicting molecular phenotypes from nucleotide sequence, modeling gene expression events including protein-DNA binding, chromatin states as well as mRNA and protein levels. Deep neural networks automatically learn informative sequence representations and interpreting them enables us to improve our understanding of the regulatory code governing gene expression. Here, we review the latest developments that apply shallow or deep learning to quantify molecular phenotypes and decode the cis-regulatory grammar from prokaryotic and eukaryotic sequencing data. Our approach is to build from the ground up, first focusing on the initiating protein-DNA interactions, then specific coding and non-coding regions, and finally on advances that combine multiple parts of the gene and mRNA regulatory structures, achieving unprecedented performance. We thus provide a quantitative view of gene expression regulation from nucleotide sequence, concluding with an information-centric overview of the central dogma of molecular biology.
Affiliation(s)
- Jan Zrimec
- Department of Biology and Biological Engineering, Chalmers University of Technology, Gothenburg, Sweden
| | - Filip Buric
- Department of Biology and Biological Engineering, Chalmers University of Technology, Gothenburg, Sweden
| | - Mariia Kokina
- Department of Biology and Biological Engineering, Chalmers University of Technology, Gothenburg, Sweden
- Novo Nordisk Foundation Center for Biosustainability, Technical University of Denmark, Kongens Lyngby, Denmark
| | - Victor Garcia
- School of Life Sciences and Facility Management, Zurich University of Applied Sciences, Wädenswil, Switzerland
| | - Aleksej Zelezniak
- Department of Biology and Biological Engineering, Chalmers University of Technology, Gothenburg, Sweden
- Science for Life Laboratory, Stockholm, Sweden
| |
|
35
|
Dalvie NC, Brady JR, Crowell LE, Tracey MK, Biedermann AM, Kaur K, Hickey JM, Kristensen DL, Bonnyman AD, Rodriguez-Aponte SA, Whittaker CA, Bok M, Vega C, Mukhopadhyay TK, Joshi SB, Volkin DB, Parreño V, Love KR, Love JC. Molecular engineering improves antigen quality and enables integrated manufacturing of a trivalent subunit vaccine candidate for rotavirus. Microb Cell Fact 2021; 20:94. [PMID: 33933073 PMCID: PMC8088319 DOI: 10.1186/s12934-021-01583-6] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.5] [Received: 11/18/2020] [Accepted: 04/21/2021] [Indexed: 12/15/2022] Open
Abstract
BACKGROUND: Vaccines comprising recombinant subunit proteins are well-suited to low-cost and high-volume production for global use. The design of manufacturing processes to produce subunit vaccines depends, however, on the inherent biophysical traits presented by an individual antigen of interest. New candidate antigens typically require developing custom processes for each one and may require unique steps to ensure sufficient yields without product-related variants. RESULTS: We describe a holistic approach for the molecular design of recombinant protein antigens, considering both their manufacturability and antigenicity, informed by bioinformatic analyses such as RNA-seq, ribosome profiling, and sequence-based prediction tools. We demonstrate this approach by engineering the product sequences of a trivalent non-replicating rotavirus vaccine (NRRV) candidate to improve titers and mitigate product variants caused by N-terminal truncation, hypermannosylation, and aggregation. The three engineered NRRV antigens retained their original antigenicity and immunogenicity, while their improved manufacturability enabled concomitant production and purification of all three serotypes in a single, end-to-end perfusion-based process using the biotechnical yeast Komagataella phaffii. CONCLUSIONS: This study demonstrates that molecular engineering of subunit antigens using advanced genomic methods can facilitate their manufacturing in continuous production. Such capabilities have potential to lower the cost and volumetric requirements in manufacturing vaccines based on recombinant protein subunits.
Affiliation(s)
- Neil C Dalvie
- Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, MA, 02139, USA
- The Koch Institute for Integrative Cancer Research, Massachusetts Institute of Technology, Cambridge, MA, 02139, USA
| | - Joseph R Brady
- Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, MA, 02139, USA
- The Koch Institute for Integrative Cancer Research, Massachusetts Institute of Technology, Cambridge, MA, 02139, USA
| | - Laura E Crowell
- Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, MA, 02139, USA
- The Koch Institute for Integrative Cancer Research, Massachusetts Institute of Technology, Cambridge, MA, 02139, USA
| | - Mary Kate Tracey
- The Koch Institute for Integrative Cancer Research, Massachusetts Institute of Technology, Cambridge, MA, 02139, USA
| | - Andrew M Biedermann
- Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, MA, 02139, USA
- The Koch Institute for Integrative Cancer Research, Massachusetts Institute of Technology, Cambridge, MA, 02139, USA
| | - Kawaljit Kaur
- Department of Pharmaceutical Chemistry, Vaccine Analytics and Formulation Center, University of Kansas, Lawrence, KS, 66047, USA
| | - John M Hickey
- Department of Pharmaceutical Chemistry, Vaccine Analytics and Formulation Center, University of Kansas, Lawrence, KS, 66047, USA
| | - D Lee Kristensen
- The Koch Institute for Integrative Cancer Research, Massachusetts Institute of Technology, Cambridge, MA, 02139, USA
| | - Alexandra D Bonnyman
- Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, MA, 02139, USA
- The Koch Institute for Integrative Cancer Research, Massachusetts Institute of Technology, Cambridge, MA, 02139, USA
| | - Sergio A Rodriguez-Aponte
- The Koch Institute for Integrative Cancer Research, Massachusetts Institute of Technology, Cambridge, MA, 02139, USA
- Department of Biological Engineering, Massachusetts Institute of Technology, Cambridge, MA, 02139, USA
| | - Charles A Whittaker
- The Koch Institute for Integrative Cancer Research, Massachusetts Institute of Technology, Cambridge, MA, 02139, USA
| | - Marina Bok
- Instituto de Virología E Innovaciones Tecnológicas, IVIT, CONICET-INTA, Hurlingham, Buenos Aires, Argentina
| | - Celina Vega
- Instituto de Virología E Innovaciones Tecnológicas, IVIT, CONICET-INTA, Hurlingham, Buenos Aires, Argentina
| | - Tarit K Mukhopadhyay
- Department of Biochemical Engineering, University College London, Gower Street, London, WC1E 6BT, UK
| | - Sangeeta B Joshi
- Department of Pharmaceutical Chemistry, Vaccine Analytics and Formulation Center, University of Kansas, Lawrence, KS, 66047, USA
| | - David B Volkin
- Department of Pharmaceutical Chemistry, Vaccine Analytics and Formulation Center, University of Kansas, Lawrence, KS, 66047, USA
| | - Viviana Parreño
- Instituto de Virología E Innovaciones Tecnológicas, IVIT, CONICET-INTA, Hurlingham, Buenos Aires, Argentina
| | - Kerry R Love
- Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, MA, 02139, USA
- The Koch Institute for Integrative Cancer Research, Massachusetts Institute of Technology, Cambridge, MA, 02139, USA
| | - J Christopher Love
- Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, MA, 02139, USA.
- The Koch Institute for Integrative Cancer Research, Massachusetts Institute of Technology, Cambridge, MA, 02139, USA.
| |
|
36
|
Tian T, Li S, Lang P, Zhao D, Zeng J. Full-length ribosome density prediction by a multi-input and multi-output model. PLoS Comput Biol 2021; 17:e1008842. [PMID: 33770074 PMCID: PMC8026034 DOI: 10.1371/journal.pcbi.1008842] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Received: 04/18/2020] [Revised: 04/07/2021] [Accepted: 03/01/2021] [Indexed: 11/29/2022] Open
Abstract
Translation elongation is regulated by a series of complicated mechanisms in both prokaryotes and eukaryotes. Although recent advances in ribosome profiling techniques have made it possible to capture genome-wide ribosome footprints along transcripts at codon resolution, the regulatory codes of elongation dynamics are still not fully understood. Most existing computational approaches for modeling translation elongation from ribosome profiling data focus on local contextual patterns, while ignoring the continuity of the elongation process and relations between ribosome densities of remote codons. To the best of our knowledge, modeling the translation elongation process at the full-length coding sequence (CDS) level has not been studied. In this paper, we developed a deep learning-based approach with a multi-input and multi-output framework, named RiboMIMO, for modeling the ribosome density distributions of full-length mRNA CDS regions. By considering the underlying correlations in translation efficiency among neighboring and remote codons and extracting hidden features from the input full-length coding sequence, RiboMIMO can greatly outperform the state-of-the-art baseline approaches and accurately predict the ribosome density distributions along the whole mRNA CDS regions. In addition, RiboMIMO explores the contributions of individual input codons to the predictions of output ribosome densities, which can help reveal important biological factors influencing the translation elongation process. The analyses, based on our interpretable metric named codon impact score, not only identified several patterns consistent with the previously published literature, but also for the first time (to the best of our knowledge) revealed that codons located at a long distance from the ribosomal A site may also be associated with the translation elongation rate. This finding of long-range impact on translation elongation velocity may shed new light on the regulatory mechanisms of protein synthesis. Overall, these results indicated that RiboMIMO can provide a useful tool for studying the regulation of translation elongation in the range of full-length CDS.
Translation elongation is a process in which amino acids are linked into proteins by ribosomes in cells. Translation elongation rates along the mRNAs are not constant, and are regulated by a series of mechanisms, such as codon rarity and mRNA stability. In this study, we modeled the translation elongation process at a full-length coding sequence level and developed a deep learning-based approach to predict the translation elongation rates from mRNA sequences, by extracting the regulatory codes of elongation rates from the contextual sequences. The analyses, based on our interpretable metric named codon impact score, for the first time (to the best of our knowledge) revealed that in addition to the neighboring codons of the ribosomal A sites, the remote codons may also have an important impact on the translation elongation rates. This new finding may stimulate additional experiments and shed light on the regulatory mechanisms of protein synthesis.
Affiliation(s)
- Tingzhong Tian
- Institute for Interdisciplinary Information Sciences, Tsinghua University, Beijing, China
| | - Shuya Li
- Institute for Interdisciplinary Information Sciences, Tsinghua University, Beijing, China
| | - Peng Lang
- Institute for Interdisciplinary Information Sciences, Tsinghua University, Beijing, China
| | - Dan Zhao
- Institute for Interdisciplinary Information Sciences, Tsinghua University, Beijing, China
- * E-mail: (DZ); (JZ)
| | - Jianyang Zeng
- Institute for Interdisciplinary Information Sciences, Tsinghua University, Beijing, China
- MOE Key Laboratory of Bioinformatics, Tsinghua University, Beijing, China
- * E-mail: (DZ); (JZ)
| |
|
37
|
do Couto Bordignon P, Pechmann S. Inferring translational heterogeneity from Saccharomyces cerevisiae ribosome profiling. FEBS J 2021; 288:4541-4559. [PMID: 33539640 DOI: 10.1111/febs.15748] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Received: 01/11/2021] [Revised: 01/27/2021] [Accepted: 02/02/2021] [Indexed: 11/30/2022]
Abstract
Translation of mRNAs into proteins by the ribosome is the most important step of protein biosynthesis. Accordingly, translation is tightly controlled and heavily regulated to maintain cellular homeostasis. Ribosome profiling (Ribo-seq) has revolutionized the study of translation by revealing many of its underlying mechanisms. However, equally many aspects of translation remain mysterious, in part also due to persisting challenges in the interpretation of data obtained from Ribo-seq experiments. Here, we show that some of the variability observed in Ribo-seq data has biological origins and reflects programmed heterogeneity of translation. Through a comparative analysis of Ribo-seq data from Saccharomyces cerevisiae, we systematically identify short 3-codon sequences that are differentially translated (DT) across mRNAs, that is, identical sequences that are translated sometimes fast and sometimes slowly beyond what can be attributed to variability between experiments. Remarkably, the thus identified DT sequences link to mechanisms known to regulate translation elongation and are enriched in genes important for protein and organelle biosynthesis. Our results thus highlight examples of translational heterogeneity that are encoded in the genomic sequences and tuned to optimizing cellular homeostasis. More generally, our work highlights the power of Ribo-seq to understand the complexities of translation regulation.
|
38
|
Rauscher R, Bampi GB, Guevara-Ferrer M, Santos LA, Joshi D, Mark D, Strug LJ, Rommens JM, Ballmann M, Sorscher EJ, Oliver KE, Ignatova Z. Positive epistasis between disease-causing missense mutations and silent polymorphism with effect on mRNA translation velocity. Proc Natl Acad Sci U S A 2021; 118:e2010612118. [PMID: 33468668 PMCID: PMC7848603 DOI: 10.1073/pnas.2010612118] [Citation(s) in RCA: 24] [Impact Index Per Article: 6.0] [Indexed: 01/27/2023] Open
Abstract
Epistasis refers to the dependence of a mutation on other mutation(s) and the genetic context in general. In the context of human disorders, epistasis complicates the spectrum of disease symptoms and has been proposed as a major contributor to variations in disease outcome. The nonadditive relationship between mutations and the lack of complete understanding of the underlying physiological effects limit our ability to predict phenotypic outcome. Here, we report positive epistasis between intragenic mutations in the cystic fibrosis transmembrane conductance regulator (CFTR), the gene responsible for cystic fibrosis (CF) pathology. We identified a synonymous single-nucleotide polymorphism (sSNP) that is invariant for the CFTR amino acid sequence but inverts translation speed at the affected codon. This sSNP in cis exhibits positive epistatic effects on some CF disease-causing missense mutations. Individually, both mutations alter CFTR structure and function, yet when combined, they lead to enhanced protein expression and activity. The most robust effect was observed when the sSNP was present in combination with missense mutations that, along with the primary amino acid change, also alter the speed of translation at the affected codon. Functional studies revealed that synergistic alteration in ribosomal velocity is the underlying mechanism; alteration of translation speed likely increases the time window for establishing crucial domain-domain interactions that are otherwise perturbed by each individual mutation.
Affiliation(s)
- Robert Rauscher
- Biochemistry and Molecular Biology, Department of Chemistry, University of Hamburg, 20146 Hamburg, Germany
| | - Giovana B Bampi
- Biochemistry and Molecular Biology, Department of Chemistry, University of Hamburg, 20146 Hamburg, Germany
| | - Marta Guevara-Ferrer
- Biochemistry and Molecular Biology, Department of Chemistry, University of Hamburg, 20146 Hamburg, Germany
| | - Leonardo A Santos
- Biochemistry and Molecular Biology, Department of Chemistry, University of Hamburg, 20146 Hamburg, Germany
| | - Disha Joshi
- Department of Pediatrics, Emory University School of Medicine, Atlanta, GA 30322
- Children's Healthcare of Atlanta, Atlanta, GA 30322
| | - David Mark
- Biochemistry and Molecular Biology, Department of Chemistry, University of Hamburg, 20146 Hamburg, Germany
| | - Lisa J Strug
- Program in Genetics & Genome Biology, The Hospital for Sick Children, Toronto M5G 0A4, Canada
- Department of Statistical Sciences, Computer Science and Division of Biostatistics, University of Toronto, Toronto M5G 0A4, Canada
| | - Johanna M Rommens
- Program in Genetics & Genome Biology, The Hospital for Sick Children, Toronto M5G 0A4, Canada
| | | | - Eric J Sorscher
- Department of Pediatrics, Emory University School of Medicine, Atlanta, GA 30322
- Children's Healthcare of Atlanta, Atlanta, GA 30322
| | - Kathryn E Oliver
- Department of Pediatrics, Emory University School of Medicine, Atlanta, GA 30322
- Children's Healthcare of Atlanta, Atlanta, GA 30322
| | - Zoya Ignatova
- Biochemistry and Molecular Biology, Department of Chemistry, University of Hamburg, 20146 Hamburg, Germany
| |
|
39
|
Hu H, Liu X, Xiao A, Li Y, Zhang C, Jiang T, Zhao D, Song S, Zeng J. Riboexp: an interpretable reinforcement learning framework for ribosome density modeling. Brief Bioinform 2021; 22:6105941. [PMID: 33479731 DOI: 10.1093/bib/bbaa412] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Received: 11/17/2020] [Revised: 12/11/2020] [Indexed: 11/13/2022] Open
Abstract
Translation elongation is a crucial phase during protein biosynthesis. In this study, we develop a novel deep reinforcement learning-based framework, named Riboexp, to model the determinants of the uneven distribution of ribosomes on mRNA transcripts during translation elongation. In particular, our model employs a policy network to perform a context-dependent feature selection in the setting of ribosome density prediction. Our extensive tests demonstrated that Riboexp can significantly outperform the state-of-the-art methods in predicting ribosome density by up to 5.9% in terms of per-gene Pearson correlation coefficient on the datasets from three species. In addition, Riboexp can indicate more informative sequence features for the prediction task than other commonly used attribution methods in deep learning. In-depth analyses also revealed the meaningful biological insights generated by the Riboexp framework. Moreover, the application of Riboexp in codon optimization resulted in an increase of protein production by around 31% over the previous state-of-the-art method that models ribosome density. These results have established Riboexp as a powerful and useful computational tool in the studies of translation dynamics and protein synthesis. Availability: The data and code of this study are available on GitHub: https://github.com/Liuxg16/Riboexp. Contact: zengjy321@tsinghua.edu.cn; songsen@tsinghua.edu.cn.
Affiliation(s)
- Hailin Hu
- School of Medicine, Tsinghua University, Beijing, 100084, China
| | - Xianggen Liu
- Laboratory for Brain and Intelligence and Department of Biomedical Engineering, Tsinghua University, Beijing, 100084, China; Beijing Innovation Center for Future Chip, Tsinghua University, Beijing, 100084, China
| | - An Xiao
- Institute for Interdisciplinary Information Sciences, Tsinghua University, Beijing, 100084, China
| | - YangYang Li
- Comprehensive AIDS Research Center, Collaborative Innovation Center for Diagnosis and Treatment of Infectious Diseases, School of Life Sciences, and School of Medicine, Tsinghua University, Beijing, 100084, China
| | | | - Tao Jiang
- Department of Computer Science and Engineering, University of California, Riverside, CA 92521, USA; Bioinformatics Division, BNRIST/Department of Computer Science and Technology, Tsinghua University, Beijing, 100084, China; Institute of Integrative Genome Biology, University of California, Riverside, CA 92521, USA
| | - Dan Zhao
- Institute for Interdisciplinary Information Sciences, Tsinghua University, Beijing, 100084, China
| | - Sen Song
- Laboratory for Brain and Intelligence and Department of Biomedical Engineering, Tsinghua University, Beijing, 100084, China; Beijing Innovation Center for Future Chip, Tsinghua University, Beijing, 100084, China
| | - Jianyang Zeng
- School of Medicine, Tsinghua University, Beijing, 100084, China
| |
|
40
|
Ahmed N, Friedrich UA, Sormanni P, Ciryam P, Altman NS, Bukau B, Kramer G, O'Brien EP. Pairs of amino acids at the P- and A-sites of the ribosome predictably and causally modulate translation-elongation rates. J Mol Biol 2020; 432:166696. [PMID: 33152326 DOI: 10.1016/j.jmb.2020.10.030] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.4] [Received: 03/03/2020] [Revised: 08/30/2020] [Accepted: 10/19/2020] [Indexed: 12/31/2022]
Abstract
Variation in translation-elongation kinetics along a transcript's coding sequence plays an important role in the maintenance of cellular protein homeostasis by regulating co-translational protein folding, localization, and maturation. Translation-elongation speed is influenced by molecular factors within mRNA and protein sequences. For example, the presence of proline in the ribosome's P- or A-site slows down translation, but the effect of other pairs of amino acids, in the context of all 400 possible pairs, has not been characterized. Here, we study Saccharomyces cerevisiae using a combination of bioinformatics, mutational experiments, and evolutionary analyses, and show that many different pairs of amino acids and their associated tRNA molecules predictably and causally encode translation rate information when these pairs are present in the A- and P-sites of the ribosome independent of other factors known to influence translation speed including mRNA structure, wobble base pairing, tripeptide motifs, positively charged upstream nascent chain residues, and cognate tRNA concentration. The fast-translating pairs of amino acids that we identify are enriched four-fold relative to the slow-translating pairs across Saccharomyces cerevisiae's proteome, while the slow-translating pairs are enriched downstream of domain boundaries. Thus, the chemical identity of amino acid pairs contributes to variability in translation rates, elongation kinetics are causally encoded in the primary structure of proteins, and signatures of evolutionary selection indicate their potential role in co-translational processes.
Affiliation(s)
- Nabeel Ahmed
- Bioinformatics and Genomics Graduate Program, The Huck Institutes of the Life Sciences, Pennsylvania State University, University Park, PA 16802, USA
| | - Ulrike A Friedrich
- Center for Molecular Biology of the Heidelberg University (ZMBH), DKFZ-ZMBH Alliance, Im Neuenheimer Feld 282, 69120 Heidelberg, Germany; German Cancer Research Center (DKFZ), Im Neuenheimer Feld 280, 69120 Heidelberg, Germany
| | - Pietro Sormanni
- Centre for Misfolding Diseases, Department of Chemistry, University of Cambridge, Cambridge CB2 1EW, UK
| | - Prajwal Ciryam
- Centre for Misfolding Diseases, Department of Chemistry, University of Cambridge, Cambridge CB2 1EW, UK
| | - Naomi S Altman
- Bioinformatics and Genomics Graduate Program, The Huck Institutes of the Life Sciences, Pennsylvania State University, University Park, PA 16802, USA; Department of Statistics, Pennsylvania State University, University Park, PA, 16802, USA
| | - Bernd Bukau
- Center for Molecular Biology of the Heidelberg University (ZMBH), DKFZ-ZMBH Alliance, Im Neuenheimer Feld 282, 69120 Heidelberg, Germany; German Cancer Research Center (DKFZ), Im Neuenheimer Feld 280, 69120 Heidelberg, Germany
| | - Günter Kramer
- Center for Molecular Biology of the Heidelberg University (ZMBH), DKFZ-ZMBH Alliance, Im Neuenheimer Feld 282, 69120 Heidelberg, Germany; German Cancer Research Center (DKFZ), Im Neuenheimer Feld 280, 69120 Heidelberg, Germany
| | - Edward P O'Brien
- Bioinformatics and Genomics Graduate Program, The Huck Institutes of the Life Sciences, Pennsylvania State University, University Park, PA 16802, USA; Department of Chemistry, Pennsylvania State University, University Park, PA 16802, USA; Institute for Computational and Data Sciences, Pennsylvania State University, University Park, PA 16802, USA.
| |
|
41
|
Computational discovery and modeling of novel gene expression rules encoded in the mRNA. Biochem Soc Trans 2020; 48:1519-1528. [PMID: 32662820 DOI: 10.1042/bst20191048] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Received: 03/18/2020] [Revised: 06/15/2020] [Accepted: 06/17/2020] [Indexed: 11/17/2022]
Abstract
The transcript is populated with numerous overlapping codes that regulate all steps of gene expression. Deciphering these codes is very challenging due to the large number of variables involved, the non-modular nature of the codes, biases and limitations in current experimental approaches, our limited knowledge in gene expression regulation across the tree of life, and other factors. In recent years, it has been shown that computational modeling and algorithms can significantly accelerate the discovery of novel gene expression codes. Here, we briefly summarize the latest developments and different approaches in the field.
|
42
|
Wright G, Rodriguez A, Li J, Clark PL, Milenković T, Emrich SJ. Analysis of computational codon usage models and their association with translationally slow codons. PLoS One 2020; 15:e0232003. [PMID: 32352987 PMCID: PMC7192439 DOI: 10.1371/journal.pone.0232003] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.4] [Received: 11/07/2019] [Accepted: 04/05/2020] [Indexed: 11/19/2022] Open
Abstract
Improved computational modeling of protein translation rates, including better prediction of where translational slowdowns along an mRNA sequence may occur, is critical for understanding co-translational folding. Because codons within a synonymous codon group are translated at different rates, many computational translation models rely on analyzing synonymous codons. Some models rely on genome-wide codon usage bias (CUB), believing that globally rare and common codons are the most informative of slow and fast translation, respectively. Others use the CUB observed only in highly expressed genes, which should be under selective pressure to be translated efficiently (and whose CUB may therefore be more indicative of translation rates). No prior work has analyzed these models for their ability to predict translational slowdowns. Here, we evaluate five models for their association with slowly translated positions as denoted by two independent ribosome footprint (RFP) count experiments from S. cerevisiae, because RFP data is often considered as a “ground truth” for translation rates across mRNA sequences. We show that all five considered models strongly associate with the RFP data and therefore have potential for estimating translational slowdowns. However, we also show that there is a weak correlation between RFP counts for the same genes originating from independent experiments, even when their experimental conditions are similar. This raises concerns about the efficacy of using current RFP experimental data for estimating translation rates and highlights a potential advantage of using computational models to understand translation rates instead.
Affiliation(s)
- Gabriel Wright
- Department of Computer Science & Engineering, University of Notre Dame, Notre Dame, IN, United States of America
| | - Anabel Rodriguez
- Department of Chemistry & Biochemistry, University of Notre Dame, Notre Dame, IN, United States of America
| | - Jun Li
- Department of Applied and Computational Mathematics and Statistics, University of Notre Dame, Notre Dame, IN, United States of America
| | - Patricia L. Clark
- Department of Chemistry & Biochemistry, University of Notre Dame, Notre Dame, IN, United States of America
| | - Tijana Milenković
- Department of Computer Science & Engineering, University of Notre Dame, Notre Dame, IN, United States of America
| | - Scott J. Emrich
- Department of Electrical Engineering & Computer Science, University of Tennessee, Knoxville, TN, United States of America
| |
|
43
|
Kiniry SJ, Michel AM, Baranov PV. Computational methods for ribosome profiling data analysis. Wiley Interdiscip Rev RNA 2020; 11:e1577. [PMID: 31760685 DOI: 10.1002/wrna.1577] [Citation(s) in RCA: 20] [Impact Index Per Article: 4.0] [Received: 07/30/2019] [Revised: 10/12/2019] [Accepted: 10/16/2019] [Indexed: 12/15/2022]
Abstract
Since the introduction of the ribosome profiling technique in 2009, its popularity has greatly increased. It is widely used for the comprehensive assessment of gene expression and for studying the mechanisms of regulation at the translational level. As the number of ribosome profiling datasets being produced continues to grow, so too does the need for reliable software that can provide answers to the biological questions it can address. This review describes the computational methods and tools that have been developed to analyze ribosome profiling data at the different stages of the process. It starts with initial routine processing of raw data and follows with more specific tasks such as the identification of translated open reading frames, differential gene expression analysis, or evaluation of local or global codon decoding rates. The review pinpoints challenges associated with each step and explains the ways in which they are currently addressed. In addition, it provides a comprehensive, albeit incomplete, list of publicly available software applicable to each step, which may be a beneficial starting point to those unexposed to ribosome profiling analysis. The outline of current challenges in ribosome profiling data analysis may inspire computational biologists to search for novel, potentially superior, solutions that will improve and expand the bioinformatician's toolbox for ribosome profiling data analysis. This article is categorized under: Translation > Ribosome Structure/Function; RNA Evolution and Genomics > Computational Analyses of RNA; Translation > Translation Mechanisms; Translation > Translation Regulation.
Affiliation(s)
- Stephen J Kiniry
- School of Biochemistry and Cell Biology, University College Cork, Cork, Ireland
| | - Audrey M Michel
- School of Biochemistry and Cell Biology, University College Cork, Cork, Ireland
| | - Pavel V Baranov
- School of Biochemistry and Cell Biology, University College Cork, Cork, Ireland
- Shemyakin-Ovchinnikov Institute of Bioorganic Chemistry, RAS, Moscow, Russia
| |
|
44
|
Gobet C, Weger BD, Marquis J, Martin E, Neelagandan N, Gachon F, Naef F. Robust landscapes of ribosome dwell times and aminoacyl-tRNAs in response to nutrient stress in liver. Proc Natl Acad Sci U S A 2020; 117:9630-9641. [PMID: 32295881 PMCID: PMC7196831 DOI: 10.1073/pnas.1918145117] [Citation(s) in RCA: 23] [Impact Index Per Article: 4.6] [Indexed: 12/15/2022] Open
Abstract
Translation depends on messenger RNA (mRNA)-specific initiation, elongation, and termination rates. While translation elongation is well studied in bacteria and yeast, less is known in higher eukaryotes. Here we combined ribosome and transfer RNA (tRNA) profiling to investigate the relations between translation elongation rates, (aminoacyl-) tRNA levels, and codon usage in mammals. We modeled codon-specific ribosome dwell times from ribosome profiling, considering codon pair interactions between ribosome sites. In mouse liver, the model revealed site- and codon-specific dwell times that differed from those in yeast, as well as pairs of adjacent codons in the P and A site that markedly slow down or speed up elongation. While translation efficiencies vary across diurnal time and feeding regimen, codon dwell times were highly stable and conserved in human. Measured tRNA levels correlated with codon usage and several tRNAs showed reduced aminoacylation, which was conserved in fasted mice. Finally, we uncovered that the longest codon dwell times could be explained by aminoacylation levels or high codon usage relative to tRNA abundance.
Affiliation(s)
- Cédric Gobet
- Institute of Bioengineering, School of Life Sciences, Ecole Polytechnique Fédérale de Lausanne, Lausanne CH-1015, Switzerland
- Nestlé Research, CH-1015 Lausanne, Switzerland
| | - Benjamin Dieter Weger
- Institute of Bioengineering, School of Life Sciences, Ecole Polytechnique Fédérale de Lausanne, Lausanne CH-1015, Switzerland
- Nestlé Research, CH-1015 Lausanne, Switzerland
| | | | - Eva Martin
- Nestlé Research, CH-1015 Lausanne, Switzerland
| | - Nagammal Neelagandan
- Institute of Bioengineering, School of Life Sciences, Ecole Polytechnique Fédérale de Lausanne, Lausanne CH-1015, Switzerland
| | | | - Felix Naef
- Institute of Bioengineering, School of Life Sciences, Ecole Polytechnique Fédérale de Lausanne, Lausanne CH-1015, Switzerland
| |
|
45
|
Recent advances in ribosome profiling for deciphering translational regulation. Methods 2020; 176:46-54. [DOI: 10.1016/j.ymeth.2019.05.011] [Citation(s) in RCA: 11] [Impact Index Per Article: 2.2] [Received: 12/04/2018] [Revised: 05/02/2019] [Accepted: 05/15/2019] [Indexed: 12/16/2022] Open
|
46
|
Alexaki A, Kames J, Hettiarachchi GK, Athey JC, Katneni UK, Hunt RC, Hamasaki-Katagiri N, Holcomb DD, DiCuccio M, Bar H, Komar AA, Kimchi-Sarfaty C. Ribosome profiling of HEK293T cells overexpressing codon optimized coagulation factor IX. F1000Res 2020; 9:174. [PMID: 33014344 PMCID: PMC7509596 DOI: 10.12688/f1000research.22400.2] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Accepted: 09/09/2020] [Indexed: 12/30/2022] Open
Abstract
Ribosome profiling provides the opportunity to evaluate translation kinetics at codon level resolution. Here, we describe ribosome profiling data, generated from two HEK293T cell lines. The ribosome profiling data are composed of Ribo-seq (mRNA sequencing data from ribosome protected fragments) and RNA-seq data (total RNA sequencing). The two HEK293T cell lines each express a version of the F9 gene, both of which are translated into identical proteins in terms of their amino acid sequences. However, these F9 genes vary drastically in their codon usage and predicted mRNA structure. We also provide the pipeline that we used to analyze the data. Further analyzing this dataset holds great potential as it can be used i) to unveil insights into the composition and regulation of the transcriptome, ii) for comparison with other ribosome profiling datasets, iii) to measure the rate of protein synthesis across the proteome and identify differences in elongation rates, iv) to discover previously unidentified translation of peptides, v) to explore the effects of codon usage or codon context in translational kinetics and vi) to investigate cotranslational folding. Importantly, a unique feature of this dataset, compared to other available ribosome profiling data, is the presence of the F9 gene in two very distinct coding sequences.
Affiliation(s)
- Aikaterini Alexaki
- Center for Biologics Evaluation and Research, Food and Drug Administration, USA, Silver Spring, MD, 20993, USA
| | - Jacob Kames
- Center for Biologics Evaluation and Research, Food and Drug Administration, USA, Silver Spring, MD, 20993, USA
| | - Gaya K Hettiarachchi
- Center for Biologics Evaluation and Research, Food and Drug Administration, USA, Silver Spring, MD, 20993, USA
| | - John C Athey
- Center for Biologics Evaluation and Research, Food and Drug Administration, USA, Silver Spring, MD, 20993, USA
| | - Upendra K Katneni
- Center for Biologics Evaluation and Research, Food and Drug Administration, USA, Silver Spring, MD, 20993, USA
| | - Ryan C Hunt
- Center for Biologics Evaluation and Research, Food and Drug Administration, USA, Silver Spring, MD, 20993, USA
| | - Nobuko Hamasaki-Katagiri
- Center for Biologics Evaluation and Research, Food and Drug Administration, USA, Silver Spring, MD, 20993, USA
| | - David D Holcomb
- Center for Biologics Evaluation and Research, Food and Drug Administration, USA, Silver Spring, MD, 20993, USA
| | - Michael DiCuccio
- National Center of Biotechnology Information, National Institutes of Health, USA, Bethesda, MD, 20892, USA
| | - Haim Bar
- Department of Statistics, University of Connecticut, Storrs, CT, 06269, USA
| | - Anton A Komar
- Center for Gene Regulation in Health and Disease, Cleveland State University, Cleveland, OH, 44115, USA
| | - Chava Kimchi-Sarfaty
- Center for Biologics Evaluation and Research, Food and Drug Administration, USA, Silver Spring, MD, 20993, USA
| |
|
47
|
Koo PK, Ploenzke M. Deep learning for inferring transcription factor binding sites. Current Opinion in Systems Biology 2020; 19:16-23. [PMID: 32905524 PMCID: PMC7469942 DOI: 10.1016/j.coisb.2020.04.001] [Citation(s) in RCA: 42] [Impact Index Per Article: 8.4] [Indexed: 01/28/2023]
Abstract
Deep learning is a powerful tool for predicting transcription factor binding sites from DNA sequence. Despite their high predictive accuracy, there are no guarantees that a high-performing deep learning model will learn causal sequence-function relationships. Thus a move beyond performance comparisons on benchmark datasets is needed. Interpreting model predictions is a powerful approach to identify which features drive performance gains and ideally provide insight into the underlying biological mechanisms. Here we highlight timely advances in deep learning for genomics, with a focus on inferring transcription factor binding sites. We describe recent applications, model architectures, and advances in local and global model interpretability methods, then conclude with a discussion on future research directions.
Affiliation(s)
- Peter K Koo
- Simons Center for Quantitative Biology, Cold Spring Harbor Laboratory, Cold Spring Harbor, NY, USA
| | - Matt Ploenzke
- Department of Biostatistics, Harvard University, Cambridge, MA, USA
| |
|
48
|
XPRESSyourself: Enhancing, standardizing, and automating ribosome profiling computational analyses yields improved insight into data. PLoS Comput Biol 2020; 16:e1007625. [PMID: 32004313 PMCID: PMC7015430 DOI: 10.1371/journal.pcbi.1007625] [Citation(s) in RCA: 11] [Impact Index Per Article: 2.2] [Received: 08/11/2019] [Revised: 02/12/2020] [Accepted: 12/20/2019] [Indexed: 11/19/2022] Open
Abstract
Ribosome profiling, an application of nucleic acid sequencing for monitoring ribosome activity, has revolutionized our understanding of protein translation dynamics. This technique has been available for a decade, yet the current state and standardization of publicly available computational tools for these data is bleak. We introduce XPRESSyourself, an analytical toolkit that eliminates barriers and bottlenecks associated with this specialized data type by filling gaps in the computational toolset for both experts and non-experts of ribosome profiling. XPRESSyourself automates and standardizes analysis procedures, decreasing time-to-discovery and increasing reproducibility. This toolkit acts as a reference implementation of current best practices in ribosome profiling analysis. We demonstrate this toolkit’s performance on publicly available ribosome profiling data by rapidly identifying hypothetical mechanisms related to neurodegenerative phenotypes and neuroprotective mechanisms of the small-molecule ISRIB during acute cellular stress. XPRESSyourself brings robust, rapid analysis of ribosome-profiling data to a broad and ever-expanding audience and will lead to more reproducible and accessible measurements of translation regulation. XPRESSyourself software is perpetually open-source under the GPL-3.0 license and is hosted at https://github.com/XPRESSyourself, where users can access additional documentation and report software issues.
|
49
|
McGeary SE, Lin KS, Shi CY, Pham TM, Bisaria N, Kelley GM, Bartel DP. The biochemical basis of microRNA targeting efficacy. Science 2019; 366:eaav1741. [PMID: 31806698 PMCID: PMC7051167 DOI: 10.1126/science.aav1741] [Citation(s) in RCA: 847] [Impact Index Per Article: 141.2] [Received: 08/21/2018] [Revised: 09/24/2019] [Accepted: 11/16/2019] [Indexed: 12/12/2022]
Abstract
MicroRNAs (miRNAs) act within Argonaute proteins to guide repression of messenger RNA targets. Although various approaches have provided insight into target recognition, the sparsity of miRNA-target affinity measurements has limited understanding and prediction of targeting efficacy. Here, we adapted RNA bind-n-seq to enable measurement of relative binding affinities between Argonaute-miRNA complexes and all sequences ≤12 nucleotides in length. This approach revealed noncanonical target sites specific to each miRNA, miRNA-specific differences in canonical target-site affinities, and a 100-fold impact of dinucleotides flanking each site. These data enabled construction of a biochemical model of miRNA-mediated repression, which was extended to all miRNA sequences using a convolutional neural network. This model substantially improved prediction of cellular repression, thereby providing a biochemical basis for quantitatively integrating miRNAs into gene-regulatory networks.
Affiliation(s)
- Sean E McGeary
- Howard Hughes Medical Institute, Whitehead Institute for Biomedical Research, Cambridge, MA 02142, USA
- Whitehead Institute for Biomedical Research, Cambridge, MA 02142, USA
- Department of Biology, Massachusetts Institute of Technology, Cambridge, MA 02139, USA
| | - Kathy S Lin
- Howard Hughes Medical Institute, Whitehead Institute for Biomedical Research, Cambridge, MA 02142, USA
- Whitehead Institute for Biomedical Research, Cambridge, MA 02142, USA
- Department of Biology, Massachusetts Institute of Technology, Cambridge, MA 02139, USA
- Computational and Systems Biology Program, Massachusetts Institute of Technology, Cambridge, MA 02139, USA
| | - Charlie Y Shi
- Howard Hughes Medical Institute, Whitehead Institute for Biomedical Research, Cambridge, MA 02142, USA
- Whitehead Institute for Biomedical Research, Cambridge, MA 02142, USA
- Department of Biology, Massachusetts Institute of Technology, Cambridge, MA 02139, USA
| | - Thy M Pham
- Howard Hughes Medical Institute, Whitehead Institute for Biomedical Research, Cambridge, MA 02142, USA
- Whitehead Institute for Biomedical Research, Cambridge, MA 02142, USA
- Department of Biology, Massachusetts Institute of Technology, Cambridge, MA 02139, USA
| | - Namita Bisaria
- Howard Hughes Medical Institute, Whitehead Institute for Biomedical Research, Cambridge, MA 02142, USA
- Whitehead Institute for Biomedical Research, Cambridge, MA 02142, USA
- Department of Biology, Massachusetts Institute of Technology, Cambridge, MA 02139, USA
| | - Gina M Kelley
- Howard Hughes Medical Institute, Whitehead Institute for Biomedical Research, Cambridge, MA 02142, USA
- Whitehead Institute for Biomedical Research, Cambridge, MA 02142, USA
- Department of Biology, Massachusetts Institute of Technology, Cambridge, MA 02139, USA
| | - David P Bartel
- Howard Hughes Medical Institute, Whitehead Institute for Biomedical Research, Cambridge, MA 02142, USA.
- Whitehead Institute for Biomedical Research, Cambridge, MA 02142, USA
- Department of Biology, Massachusetts Institute of Technology, Cambridge, MA 02139, USA
- Computational and Systems Biology Program, Massachusetts Institute of Technology, Cambridge, MA 02139, USA
| |
|
50
|
Zinshteyn B, Chan D, England W, Feng C, Green R, Spitale RC. Assaying RNA structure with LASER-Seq. Nucleic Acids Res 2019; 47:43-55. [PMID: 30476193 PMCID: PMC6326810 DOI: 10.1093/nar/gky1172] [Citation(s) in RCA: 32] [Impact Index Per Article: 5.3] [Received: 04/24/2018] [Accepted: 11/17/2018] [Indexed: 01/06/2023] Open
Abstract
Chemical probing methods are crucial to our understanding of the structure and function of RNA molecules. The majority of chemical methods used to probe RNA structure report on Watson–Crick pairing, but tertiary structure parameters such as solvent accessibility can provide an additional layer of structural information, particularly in RNA-protein complexes. Herein we report the development of Light Activated Structural Examination of RNA by high-throughput sequencing, or LASER-Seq, for measuring RNA structure in cells with deep sequencing. LASER relies on a light-generated nicotinoyl nitrenium ion to form covalent adducts with the C8 position of adenosine and guanosine. Reactivity is governed by the accessibility of C8 to the light-generated probe. We compare structure probing by RT-stop and mutational profiling (MaP), demonstrating that LASER can be integrated with both platforms for RNA structure analyses. We find that LASER reactivity correlates with solvent accessibility across the entire ribosome, and that LASER can be used to rapidly survey for ligand binding sites in an unbiased fashion. LASER has a particular advantage in this last application, as it readily modifies paired nucleotides, enabling the identification of binding sites and conformational changes in highly structured RNA.
Affiliation(s)
- Boris Zinshteyn
- Department of Molecular Biology and Genetics, Johns Hopkins University, Baltimore, MD 21205, USA
| | - Dalen Chan
- Department of Pharmaceutical Sciences, University of California, Irvine, Irvine, CA 92697, USA
| | - Whitney England
- Department of Pharmaceutical Sciences, University of California, Irvine, Irvine, CA 92697, USA
| | - Chao Feng
- Department of Pharmaceutical Sciences, University of California, Irvine, Irvine, CA 92697, USA
| | - Rachel Green
- Department of Molecular Biology and Genetics, Johns Hopkins University, Baltimore, MD 21205, USA
- Howard Hughes Medical Institute, Johns Hopkins University School of Medicine, Baltimore, MD 21205, USA
| | - Robert C Spitale
- Department of Pharmaceutical Sciences, University of California, Irvine, Irvine, CA 92697, USA
- Department of Chemistry, University of California, Irvine, Irvine, CA 92697, USA
| |
|