1
Li D, Li P, Shi Y, Sheerin ED, Zhang Z, Yang L, Xiao L, Hill C, Gordon C, Ruether M, Pepper J, Sader JE, Morris MA, Wang JJ, Boland JJ. Stress-induced phase separation in plastics drives the release of amorphous polymer micropollutants into water. Nat Commun 2025; 16:3814. [PMID: 40268905 PMCID: PMC12018937 DOI: 10.1038/s41467-025-58898-w] [Received: 05/13/2024] [Accepted: 04/03/2025] [Indexed: 04/25/2025]
Abstract
Residual stress is an intrinsic property of semicrystalline plastics such as polypropylene and polyethylene. However, there is no fundamental understanding of the role intrinsic residual stress plays in the generation of plastic pollutants that threaten the environment and human health. Here, we show that the processing-induced compressive residual stress typically found in polypropylene and polyethylene plastics forces internal nano- and microscale segregation of low molecular weight (MW) amorphous polymer droplets onto the plastic's surface. Squeeze flow simulations reveal this stress-driven volumetric flow is consistent with that of a Bingham plastic material, with a temperature-dependent threshold yield stress. We confirm that flow is thermally activated and stress dependent, with a reduced energy barrier at higher compressive stresses. Transfer of surface-segregated droplets into water generates amorphous polymer micropollutants (APMPs) that are denatured, with structure and composition different from those of traditional polycrystalline microplastics. Studies with water-containing plastic bottles show that the highly compressed bottle neck and mouth regions are predominantly responsible for the release of APMPs. Our findings reveal a stress-induced mechanism of plastic degradation and underscore the need to modify current plastic processing technologies to reduce residual stress levels and suppress phase separation of low MW APMPs in plastics.
Affiliation(s)
- Dunzhu Li
  - Jiyang College, Zhejiang A&F University, Zhuji, China
  - AMBER Research Centre and Centre for Research on Adaptive Nanostructures and Nanodevices (CRANN), Trinity College Dublin, Dublin, Ireland
  - Department of Civil, Structural and Environmental Engineering, Trinity College Dublin, Dublin, Ireland
- Peijing Li
  - School of Mathematics and Statistics, The University of Melbourne, Victoria, Australia
- Yunhong Shi
  - Department of Civil, Structural and Environmental Engineering, Trinity College Dublin, Dublin, Ireland
- Emmet D Sheerin
  - AMBER Research Centre and Centre for Research on Adaptive Nanostructures and Nanodevices (CRANN), Trinity College Dublin, Dublin, Ireland
  - School of Chemistry, Trinity College Dublin, Dublin, Ireland
- Zihan Zhang
  - Department of Civil, Structural and Environmental Engineering, Trinity College Dublin, Dublin, Ireland
- Luming Yang
  - AMBER Research Centre and Centre for Research on Adaptive Nanostructures and Nanodevices (CRANN), Trinity College Dublin, Dublin, Ireland
  - Department of Civil, Structural and Environmental Engineering, Trinity College Dublin, Dublin, Ireland
- Liwen Xiao
  - Department of Civil, Structural and Environmental Engineering, Trinity College Dublin, Dublin, Ireland
  - TrinityHaus, Trinity College Dublin, Dublin, Ireland
- Christopher Hill
  - AMBER Research Centre and Centre for Research on Adaptive Nanostructures and Nanodevices (CRANN), Trinity College Dublin, Dublin, Ireland
  - School of Chemistry, Trinity College Dublin, Dublin, Ireland
- Conall Gordon
  - AMBER Research Centre and Centre for Research on Adaptive Nanostructures and Nanodevices (CRANN), Trinity College Dublin, Dublin, Ireland
  - School of Chemistry, Trinity College Dublin, Dublin, Ireland
- Manuel Ruether
  - School of Chemistry, Trinity College Dublin, Dublin, Ireland
- Joshua Pepper
  - AMBER Research Centre and Centre for Research on Adaptive Nanostructures and Nanodevices (CRANN), Trinity College Dublin, Dublin, Ireland
  - School of Chemistry, Trinity College Dublin, Dublin, Ireland
- John E Sader
  - Graduate Aerospace Laboratories and Department of Applied Physics, California Institute of Technology, Pasadena, USA
- Michael A Morris
  - AMBER Research Centre and Centre for Research on Adaptive Nanostructures and Nanodevices (CRANN), Trinity College Dublin, Dublin, Ireland
  - School of Chemistry, Trinity College Dublin, Dublin, Ireland
- Jing Jing Wang
  - AMBER Research Centre and Centre for Research on Adaptive Nanostructures and Nanodevices (CRANN), Trinity College Dublin, Dublin, Ireland
- John J Boland
  - AMBER Research Centre and Centre for Research on Adaptive Nanostructures and Nanodevices (CRANN), Trinity College Dublin, Dublin, Ireland
  - School of Chemistry, Trinity College Dublin, Dublin, Ireland
2
Herynek Š, Svoboda J, Huličiak M, Peleg Y, Škultétyová Ľ, Mikulecký P, Schneider B. Increasing recombinant protein production in E. coli via FACS-based selection of N-terminal coding DNA libraries. FEBS J 2025; 292:1070-1085. [PMID: 39726159 PMCID: PMC11880969 DOI: 10.1111/febs.17376] [Received: 07/18/2024] [Revised: 08/28/2024] [Accepted: 11/25/2024] [Indexed: 12/28/2024]
Abstract
Here, we present a previously undescribed approach to modify N-terminal sequences of recombinant proteins to increase their production yield in Escherichia coli. Prior research has demonstrated that the nucleotides immediately following the start codon can significantly influence protein expression. However, the impact of these sequences is construct-specific and is not universally applicable to all proteins. Most of the previous research has been limited to selecting from a few rationally designed sequences. In contrast, we used a directed evolution-based methodology, screening large numbers of diversified sequences derived from DNA libraries coding for the N-termini of investigated proteins. To facilitate the identification of cells with increased expression of the target construct, we cloned a GFP gene at the C-terminus of the expressed genes and used fluorescence-activated cell sorting (FACS) to separate cells based on their fluorescence. By following this systematic workflow, we increased the yield of soluble recombinant protein for multiple constructs, in some cases by more than 30-fold.
Affiliation(s)
- Štěpán Herynek
  - Institute of Biotechnology, Czech Academy of Sciences, BIOCEV, Prague, Czech Republic
- Jakub Svoboda
  - Institute of Biotechnology, Czech Academy of Sciences, BIOCEV, Prague, Czech Republic
- Maroš Huličiak
  - Institute of Biotechnology, Czech Academy of Sciences, BIOCEV, Prague, Czech Republic
- Yoav Peleg
  - Structural Proteomics Unit (SPU), Department of Life Sciences Core Facilities (LSCF), Weizmann Institute of Science, Rehovot, Israel
- Ľubica Škultétyová
  - Institute of Biotechnology, Czech Academy of Sciences, BIOCEV, Prague, Czech Republic
- Pavel Mikulecký
  - Institute of Biotechnology, Czech Academy of Sciences, BIOCEV, Prague, Czech Republic
- Bohdan Schneider
  - Institute of Biotechnology, Czech Academy of Sciences, BIOCEV, Prague, Czech Republic
3
Ren J, Oh SH, Na D. Untranslated region engineering strategies for gene overexpression, fine-tuning, and dynamic regulation. J Microbiol 2025; 63:e2501033. [PMID: 40195839 DOI: 10.71150/jm.2501033] [Received: 01/30/2025] [Accepted: 03/10/2025] [Indexed: 04/09/2025]
Abstract
Precise and tunable gene expression is crucial for various biotechnological applications, including protein overexpression, fine-tuned metabolic pathway engineering, and dynamic gene regulation. Untranslated regions (UTRs) of mRNAs have emerged as key regulatory elements that modulate transcription and translation. In this review, we explore recent advances in UTR engineering strategies for bacterial gene expression optimization. We discuss approaches for enhancing protein expression through AU-rich elements, RG4 structures, and synthetic dual UTRs, as well as ProQC systems that improve translation fidelity. Additionally, we examine strategies for fine-tuning gene expression using UTR libraries and synthetic terminators that balance metabolic flux. Finally, we highlight riboswitches and toehold switches, which enable dynamic gene regulation in response to environmental or metabolic cues. The integration of these UTR-based regulatory tools provides a versatile and modular framework for optimizing bacterial gene expression, enhancing metabolic engineering, and advancing synthetic biology applications.
Affiliation(s)
- Jun Ren
  - Department of Biomedical Engineering, Chung-Ang University, Seoul 06974, Republic of Korea
- So Hee Oh
  - Department of Biomedical Engineering, Chung-Ang University, Seoul 06974, Republic of Korea
- Dokyun Na
  - Department of Biomedical Engineering, Chung-Ang University, Seoul 06974, Republic of Korea
4
Boyko K, Bernstein RA, Kim M, Cate JHD. Role of Ribosomal Protein bS1 in Orthogonal mRNA Start Codon Selection. Biochemistry 2025; 64:710-718. [PMID: 39854700 PMCID: PMC11800381 DOI: 10.1021/acs.biochem.4c00688] [Received: 10/14/2024] [Revised: 01/01/2025] [Accepted: 01/08/2025] [Indexed: 01/26/2025]
Abstract
In many bacteria, the location of the mRNA start codon is determined by a short ribosome binding site sequence that base pairs with the 3'-end of 16S ribosomal RNA (rRNA) in the 30S subunit. Many groups have changed these short sequences, termed the Shine-Dalgarno (SD) sequence in the mRNA and the anti-Shine-Dalgarno (ASD) sequence in 16S rRNA, to create "orthogonal" ribosomes to enable the synthesis of orthogonal polymers in the presence of the endogenous translation machinery. However, orthogonal ribosomes are prone to SD-independent translation. Ribosomal protein bS1, which binds to the 30S ribosomal subunit, is thought to promote translation initiation by shuttling the mRNA to the ribosome. Thus, a better understanding of how the SD and bS1 contribute to start codon selection could help efforts to improve the orthogonality of ribosomes. Here, we engineered the Escherichia coli ribosome to prevent binding of bS1 to the 30S subunit and separate the activity of bS1 binding to the ribosome from the role of the mRNA SD sequence in start codon selection. We find that ribosomes lacking bS1 are slightly less active than wild-type ribosomes in vitro. Furthermore, orthogonal 30S subunits lacking bS1 do not show improved orthogonality. Our findings suggest that mRNA features outside the SD sequence and independent of binding of bS1 to the ribosome likely contribute to start codon selection and the lack of orthogonality of present orthogonal ribosomes.
Affiliation(s)
- Kristina V. Boyko
  - Biophysics Graduate Group, University of California, Berkeley, California 94720, United States
- Rebecca A. Bernstein
  - Department of Chemistry, University of California, Berkeley, California 94720, United States
- Minji Kim
  - Department of Molecular and Cell Biology, University of California, Berkeley, California 94720, United States
- Jamie H. D. Cate
  - Department of Chemistry, University of California, Berkeley, California 94720, United States
  - Department of Molecular and Cell Biology, University of California, Berkeley, California 94720, United States
  - Innovative Genomics Institute, University of California, Berkeley, California 94720, United States
  - Molecular Biophysics and Integrated Bioimaging, Lawrence Berkeley National Laboratory, Berkeley, California 94720, United States
5
Calis S, Gevaert K. The role of Nα-terminal acetylation in protein conformation. FEBS J 2025; 292:453-467. [PMID: 38923676 DOI: 10.1111/febs.17209] [Received: 03/28/2024] [Accepted: 06/12/2024] [Indexed: 06/28/2024]
Abstract
Especially in higher eukaryotes, the N termini of proteins are subject to enzymatic modifications, with the acetylation of the alpha-amino group of nascent polypeptides being a prominent one. In recent years, the specificities and substrates of the enzymes responsible for this modification, the Nα-terminal acetyltransferases, have been mapped in several proteomic studies. Aberrant expression of, and mutations in, these enzymes were found to be associated with several human diseases, explaining the growing interest in protein Nα-terminal acetylation. With some enzymes, such as the Nα-terminal acetyltransferase A complex, having thousands of possible substrates, researchers are now trying to decipher the functional outcome of Nα-terminal protein acetylation. In this review, we zoom in on one possible functional consequence of Nα-terminal protein acetylation: its effect on protein folding. Using selected examples of proteins associated with human diseases, such as alpha-synuclein and huntingtin, we discuss the sometimes contradictory findings on the effects of Nα-terminal protein acetylation on protein (mis)folding and aggregation.
Affiliation(s)
- Sam Calis
  - VIB Center for Medical Biotechnology, VIB, Ghent, Belgium
  - Department of Biomolecular Medicine, Ghent University, Belgium
- Kris Gevaert
  - VIB Center for Medical Biotechnology, VIB, Ghent, Belgium
  - Department of Biomolecular Medicine, Ghent University, Belgium
6
Valentin-Alvarado LE, Knott GJ. From Code to Comprehension: AI Captures the Language of Life. CRISPR J 2025; 8:2-4. [PMID: 39879534 DOI: 10.1089/crispr.2025.0008] [Indexed: 01/31/2025]
Affiliation(s)
- Luis E Valentin-Alvarado
  - Department of Biochemistry and Molecular Biology, Biomedicine Discovery Institute, Monash University, Clayton, Australia
- Gavin J Knott
  - Department of Biochemistry and Molecular Biology, Biomedicine Discovery Institute, Monash University, Clayton, Australia
7
Paget-Bailly P, Helpiquet A, Decourcelle M, Bories R, Bravo IG. Translation of the downstream ORF from bicistronic mRNAs by human cells: Impact of codon usage and splicing in the upstream ORF. Protein Sci 2025; 34:e70036. [PMID: 39840808 PMCID: PMC11751868 DOI: 10.1002/pro.70036] [Received: 05/23/2024] [Revised: 11/19/2024] [Accepted: 01/03/2025] [Indexed: 01/23/2025]
Abstract
Biochemistry textbooks describe eukaryotic mRNAs as monocistronic. However, increasing evidence reveals the widespread presence and translation of upstream open reading frames (ORFs) preceding the "main" ORF. DNA and RNA viruses infecting eukaryotes often produce polycistronic mRNAs, and viruses have evolved multiple ways of manipulating the host's translation machinery. Here, we introduce an experimental model to study gene expression regulation from virus-like bicistronic mRNAs in human cells. The model consists of a short upstream ORF and a reporter downstream ORF encoding a fluorescent protein. We have engineered synonymous variants of the upstream ORF to explore a large parameter space, including codon usage preferences, mRNA folding features, and splicing propensity. We show that the human translation machinery can translate the downstream ORF from bicistronic mRNAs, although reporter protein levels are a thousand times lower than those from the upstream ORF. Furthermore, synonymous recoding of the upstream ORF exclusively during elongation significantly influences its own translation efficiency, reveals cryptic splice signals, and modulates the probability of downstream ORF translation. Our results are consistent with a leaky scanning mechanism facilitating downstream ORF translation from bicistronic mRNAs in human cells, offering new insights into the role of upstream ORFs in translation regulation.
Affiliation(s)
- Philippe Paget-Bailly
  - Laboratory MIVEGEC (Univ. Montpellier, CNRS, IRD), French National Center for Scientific Research (CNRS), Montpellier, France
- Alexandre Helpiquet
  - Laboratory MIVEGEC (Univ. Montpellier, CNRS, IRD), French National Center for Scientific Research (CNRS), Montpellier, France
- Mathilde Decourcelle
  - Functional Proteomics Platform, BioCampus Montpellier (University of Montpellier, CNRS, INSERM), Montpellier, France
- Roxane Bories
  - Laboratory MIVEGEC (Univ. Montpellier, CNRS, IRD), French National Center for Scientific Research (CNRS), Montpellier, France
- Ignacio G. Bravo
  - Laboratory MIVEGEC (Univ. Montpellier, CNRS, IRD), French National Center for Scientific Research (CNRS), Montpellier, France
8
Wong DPH, Wong KH, Park S, Boël G, Hunt JF, Aalberts DP. OPT: Codon optimize gene sequences for E. coli protein overexpression. J Mol Biol 2025:168965. [PMID: 40133777 DOI: 10.1016/j.jmb.2025.168965] [Received: 12/01/2024] [Revised: 01/23/2025] [Accepted: 01/23/2025] [Indexed: 03/27/2025]
Abstract
The ability to overexpress proteins is valuable for biotechnology, but not all sequences are compatible with high yield. We previously analyzed the sequence features and mRNA folding stability of a large data set of 6,384 distinct gene constructs, and developed a model for protein yield. Our OPT.williams.edu server (1) predicts the probability an input sequence will produce protein at a high level when overexpressed in E. coli, and (2) returns optimized synonymous sequences designed to boost protein expression. Here we also present experimental evidence of the high yields of our OPT constructs for eight commercially produced proteins.
Affiliation(s)
- Daniel P H Wong
  - Physics Department, Williams College, Williamstown, MA 01267, USA
- Kam-Ho Wong
  - Department of Biological Sciences, Columbia University, New York, NY 10027, USA
- Sunjae Park
  - Department of Biological Sciences, Columbia University, New York, NY 10027, USA
- Grégory Boël
  - Expression Génétique Microbienne, CNRS, Universite Paris Cite, Institut de Biologie Physio-Chimique, F-75005 Paris, France
- John F Hunt
  - Department of Biological Sciences, Columbia University, New York, NY 10027, USA
9
Shen Y, Kudla G, Oyarzún DA. Improving the generalization of protein expression models with mechanistic sequence information. Nucleic Acids Res 2025; 53:gkaf020. [PMID: 39873269 PMCID: PMC11773361 DOI: 10.1093/nar/gkaf020] [Received: 09/05/2024] [Revised: 12/12/2024] [Accepted: 01/08/2025] [Indexed: 01/30/2025]
Abstract
The growing demand for biological products drives many efforts to maximize expression of heterologous proteins. Advances in high-throughput sequencing can produce data suitable for building sequence-to-expression models with machine learning. The most accurate models have been trained on one-hot encodings, a mechanism-agnostic representation of nucleotide sequences. Moreover, studies have consistently shown that training on mechanistic sequence features leads to much poorer predictions, even with features that are known to correlate with expression, such as DNA sequence motifs, codon usage, or properties of mRNA secondary structures. However, despite their excellent local accuracy, current sequence-to-expression models can fail to generalize predictions far away from the training data. Through a comparative study across datasets in Escherichia coli and Saccharomyces cerevisiae, here we show that mechanistic sequence features can provide gains in model generalization, and thus improve their utility for predictive sequence design. We explore several strategies to integrate one-hot encodings and mechanistic features into a single predictive model, including feature stacking, ensemble model stacking, and geometric stacking, a novel architecture based on graph convolutional neural networks. Our work casts new light on mechanistic sequence features, underscoring the importance of domain knowledge and feature engineering for accurate prediction of protein expression levels.
Affiliation(s)
- Yuxin Shen
  - School of Biological Sciences, University of Edinburgh, Edinburgh, EH9 3JH, United Kingdom
- Grzegorz Kudla
  - Institute for Genetics and Cancer, University of Edinburgh, Edinburgh, EH4 2XU, United Kingdom
- Diego A Oyarzún
  - School of Biological Sciences, University of Edinburgh, Edinburgh, EH9 3JH, United Kingdom
  - School of Informatics, University of Edinburgh, Edinburgh, EH8 9AB, United Kingdom
10
Li J, Li J, Li P, Zhang J, Liu Q, Qi H. Influence of 5'-UTR nucleotide composition on translation efficiency in Escherichia coli. Res Microbiol 2025; 176:104260. [PMID: 39551118 DOI: 10.1016/j.resmic.2024.104260] [Received: 07/27/2024] [Revised: 11/13/2024] [Accepted: 11/14/2024] [Indexed: 11/19/2024]
Abstract
Translation initiation at the 5'-UTR contributes substantially to efficient protein expression in Escherichia coli. Many studies have focused on constructing random 5'-UTR libraries to investigate the impact of mRNA features on protein translation efficiency. However, the effect on translation efficiency of omitting a specific nucleotide type from the entire 5'-UTR has not yet been reported. Here, we constructed four reporter plasmid libraries encoding the sfGFP fluorescent protein, each preceded by 5'-UTRs that lack one specific nucleotide (25B, 25D, 25H, 25V). Each library was transformed into E. coli cells, and the fluorescence distribution among the different libraries was analyzed by flow cytometry. Additionally, we quantified the activity of 256 unique 5'-UTR sequences and analyzed the impact of the corresponding mRNA sequence features on translation efficiency. We found that the 25D library, which lacks the C nucleotide, exhibited the highest overall translation efficiency compared to the other three libraries. Moreover, the minimum free energy and 16S rRNA hybridization energy of the 5'-UTR sequence could work coordinately to influence translation efficiency. The 5'-UTR sequences lacking the C nucleotide also achieve efficient protein translation. These findings may provide several guiding principles for precisely tuning protein expression.
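As an illustrative aside that is not taken from the paper: the library names above appear to follow the IUPAC degenerate nucleotide codes, in which B, D, H and V each stand for "any base except" A, C, G and T, respectively. This is consistent with the abstract's statement that the 25D library lacks C. The sketch below assumes that "25" denotes 25 degenerate positions, which the abstract does not state, and the names `IUPAC_NOT_ONE`, `missing_base` and `library_size` are hypothetical.

```python
# IUPAC degenerate codes that exclude exactly one nucleotide (DNA alphabet).
IUPAC_NOT_ONE = {
    "B": "CGT",  # not A
    "D": "AGT",  # not C
    "H": "ACT",  # not G
    "V": "ACG",  # not T
}


def missing_base(code: str) -> str:
    """Return the single nucleotide excluded by a B/D/H/V code."""
    allowed = IUPAC_NOT_ONE[code]
    (missing,) = set("ACGT") - set(allowed)
    return missing


def library_size(code: str, length: int = 25) -> int:
    """Count distinct sequences when every position uses the same code.

    The default length of 25 is an assumption read from the library names.
    """
    return len(IUPAC_NOT_ONE[code]) ** length


if __name__ == "__main__":
    for code in "BDHV":
        print(code, "lacks", missing_base(code), "size", library_size(code))
```

Under these assumptions each library spans the same theoretical number of sequences, so differences between libraries would reflect nucleotide composition rather than library size.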
Affiliation(s)
- Jinjin Li
  - School of Chemical Engineering and Technology, Tianjin University, Tianjin, China; Frontiers Science Center for Synthetic Biology (Ministry of Education), Tianjin University, Tianjin, China
- Jiaojiao Li
  - School of Chemical Engineering and Technology, Tianjin University, Tianjin, China; Frontiers Science Center for Synthetic Biology (Ministry of Education), Tianjin University, Tianjin, China
- Peixian Li
  - School of Chemical Engineering and Technology, Tianjin University, Tianjin, China; Frontiers Science Center for Synthetic Biology (Ministry of Education), Tianjin University, Tianjin, China
- Jie Zhang
  - School of Chemical Engineering and Technology, Tianjin University, Tianjin, China; Frontiers Science Center for Synthetic Biology (Ministry of Education), Tianjin University, Tianjin, China
- Qian Liu
  - School of Chemical Engineering and Technology, Tianjin University, Tianjin, China; Frontiers Science Center for Synthetic Biology (Ministry of Education), Tianjin University, Tianjin, China
- Hao Qi
  - School of Chemical Engineering and Technology, Tianjin University, Tianjin, China; Frontiers Science Center for Synthetic Biology (Ministry of Education), Tianjin University, Tianjin, China
11
Jodlbauer J, Schmal M, Waltl C, Rohr T, Mach-Aigner AR, Mihovilovic MD, Rudroff F. Unlocking the potential of cyanobacteria: a high-throughput strategy for enhancing biocatalytic performance through genetic optimization. Trends Biotechnol 2024; 42:1795-1818. [PMID: 39214789 DOI: 10.1016/j.tibtech.2024.07.011] [Received: 05/03/2024] [Revised: 07/15/2024] [Accepted: 07/19/2024] [Indexed: 09/04/2024]
Abstract
Cyanobacteria show promise as hosts for whole-cell biocatalysis. Their photoautotrophic metabolism can be leveraged for a sustainable production process. Despite advancements, performance still lags behind heterotrophic hosts. A key challenge is the limited ability to overexpress recombinant enzymes, which also hinders their biocatalytic efficiency. To address this, we generated large-scale expression libraries and developed a high-throughput method combining fluorescence-activated cell sorting (FACS) and deep sequencing in Synechocystis sp. PCC 6803 (Syn. 6803) to screen and optimize its genetic background. We apply this approach to enhance expression and biocatalyst performance for three enzymes: the ketoreductase LfSDR1M50, enoate reductase YqjM, and Baeyer-Villiger monooxygenase (BVMO) CHMOmut. Diverse genetic combinations yielded significant improvements: optimizing LfSDR1M50 expression showed a 17-fold increase to 39.2 U g⁻¹ cell dry weight (CDW). In vivo activity of Syn. YqjM was improved 16-fold to 58.7 U g⁻¹ CDW and, for Syn. CHMOmut, a 1.5-fold increase to 7.3 U g⁻¹ CDW was achieved by tailored genetic design. Thus, this strategy offers a pathway to optimize cyanobacteria as expression hosts, paving the way for broader applications in other cyanobacteria strains and larger libraries.
Affiliation(s)
- Julia Jodlbauer
  - Institute of Applied Synthetic Chemistry, TU Wien, Getreidemarkt 9, 1060, Vienna, Austria
- Matthias Schmal
  - Institute of Chemical, Environmental, and Bioscience Engineering, TU Wien, Gumpendorfer Str. 1a, 1060, Vienna, Austria
- Christian Waltl
  - Institute of Applied Synthetic Chemistry, TU Wien, Getreidemarkt 9, 1060, Vienna, Austria
- Thomas Rohr
  - Institute of Applied Synthetic Chemistry, TU Wien, Getreidemarkt 9, 1060, Vienna, Austria
- Astrid R Mach-Aigner
  - Institute of Chemical, Environmental, and Bioscience Engineering, TU Wien, Gumpendorfer Str. 1a, 1060, Vienna, Austria
- Marko D Mihovilovic
  - Institute of Applied Synthetic Chemistry, TU Wien, Getreidemarkt 9, 1060, Vienna, Austria
- Florian Rudroff
  - Institute of Applied Synthetic Chemistry, TU Wien, Getreidemarkt 9, 1060, Vienna, Austria
12
Liu M, Jin Z, Xiang Q, He H, Huang Y, Long M, Wu J, Zhi Huang C, Mao C, Zuo H. Rational Design of Untranslated Regions to Enhance Gene Expression. J Mol Biol 2024; 436:168804. [PMID: 39326490 DOI: 10.1016/j.jmb.2024.168804] [Received: 06/01/2024] [Revised: 09/19/2024] [Accepted: 09/20/2024] [Indexed: 09/28/2024]
Abstract
How to improve gene expression by optimizing mRNA structures is a crucial question for various medical and biotechnological applications. Previous efforts have focused largely on the 5' UTR hairpin structures. In this study, we present a rational strategy that enhances mRNA stability and translation by engineering both the 5' and 3' UTR sequences. We have successfully demonstrated this strategy using green fluorescent protein (GFP) as a model in Escherichia coli and across different expression vectors. We further validated it with luciferase and Plasmodium falciparum lactate dehydrogenase (PfLDH). To elucidate the underlying mechanism, we quantitatively analyzed protein levels, mRNA levels, and half-life. We have identified several key aspects of UTRs that significantly influence mRNA stability and protein expression in our system: (1) The optimal length of the single-stranded spacer between the stabilizer hairpin and ribosome binding site (RBS) in the 5' UTR is 25-30 nucleotides (nt). An optimal 32% GC content in the spacer yielded the highest levels of GFP protein production. (2) The insertion of a homodimerizable, G-quadruplex-containing RNA aptamer, "Corn", in the 3' UTR markedly increased protein expression. Our findings indicate that carefully engineered 5' UTRs and 3' UTRs significantly boost gene expression. Specifically, the inclusion of 5 × Corn in the 3' UTR appeared to facilitate the local aggregation of mRNA, leading to the formation of mRNA condensates. Aside from shedding light on the regulation of mRNA stability and expression, this study is expected to substantially increase biological protein production.
Affiliation(s)
- Mingchun Liu
  - College of Pharmaceutical Sciences, Southwest University, Chongqing 400715, China
- Zhuoer Jin
  - College of Pharmaceutical Sciences, Southwest University, Chongqing 400715, China
- Qing Xiang
  - College of Pharmaceutical Sciences, Southwest University, Chongqing 400715, China
- Huawei He
  - Biological Sciences Research Center, State Key Laboratory of Silkworm Genome Biology, Southwest University, Chongqing 400715, China
- Yuhan Huang
  - College of Pharmaceutical Sciences, Southwest University, Chongqing 400715, China
- Mengfei Long
  - College of Pharmaceutical Sciences, Southwest University, Chongqing 400715, China
- Jicheng Wu
  - Chongqing Key Laboratory of Natural Product Synthesis and Drug Research, School of Pharmaceutical Sciences, Chongqing University, Chongqing 401331, China
- Cheng Zhi Huang
  - College of Pharmaceutical Sciences, Southwest University, Chongqing 400715, China
- Chengde Mao
  - Department of Chemistry, Purdue University, West Lafayette 47907, IN, USA
- Hua Zuo
  - College of Pharmaceutical Sciences, Southwest University, Chongqing 400715, China
13
Xiong S, Huang Z, Ding J, Ni D, Mu W. Improvement of cellobiose 2-epimerase expression in Bacillus subtilis for efficient bioconversion of lactose to epilactose. Int J Biol Macromol 2024; 280:136063. [PMID: 39341311 DOI: 10.1016/j.ijbiomac.2024.136063] [Received: 03/21/2024] [Revised: 09/22/2024] [Accepted: 09/25/2024] [Indexed: 10/01/2024]
Abstract
Epilactose, a lactose derivative known for its prebiotic properties and potential health benefits, has garnered significant interest. Cellobiose 2-epimerase (CEase) catalyzes the conversion of lactose to epilactose. In this study, the enhancement of food-grade CEase expression in Bacillus subtilis WB600 was systematically investigated. Among seven selected epilactose-producing CEases, Rhodothermus marinus CEase (RmCE) exhibited the highest epimerization activity when expressed in B. subtilis. Translational and transcriptional regulation was employed to enhance CEase expression by screening effective N-terminal coding sequences (NCSs) and promoters. The final strain demonstrated efficient production of CEase, with epimerization activity reaching 273.6 ± 6.5 U/mL and 1255 ± 26.4 U/mL in shake-flask and fed-batch cultivation, respectively. Utilizing only 0.25% (v/v) of the fed-batch cultivation broth for lactose biotransformation, epilactose was efficiently produced from 300 g/L of lactose within 4 h, achieving a yield of 29.5%. These findings provide significant support for the potential industrialization of enzymatic epilactose production.
Affiliation(s)
- Suchun Xiong
- Engineering Research Center of Sustainable Development and Utilization of Biomass Energy, Ministry of Education, Yunnan Normal University, Kunming 650500, China; State Key Laboratory of Food Science and Resources, Jiangnan University, Wuxi, Jiangsu 214122, China
| | - Zhaolin Huang
- State Key Laboratory of Food Science and Resources, Jiangnan University, Wuxi, Jiangsu 214122, China; School of Food Science and Technology, Jiangnan University, Wuxi, Jiangsu 214122, China
| | - Junmei Ding
- Engineering Research Center of Sustainable Development and Utilization of Biomass Energy, Ministry of Education, Yunnan Normal University, Kunming 650500, China.
| | - Dawei Ni
- State Key Laboratory of Food Science and Resources, Jiangnan University, Wuxi, Jiangsu 214122, China; School of Food Science and Technology, Jiangnan University, Wuxi, Jiangsu 214122, China.
| | - Wanmeng Mu
- State Key Laboratory of Food Science and Resources, Jiangnan University, Wuxi, Jiangsu 214122, China; School of Food Science and Technology, Jiangnan University, Wuxi, Jiangsu 214122, China
| |
|
14
|
Carvalho A, Hipólito A, Trigo da Roza F, García-Pastor L, Vergara E, Buendía A, García-Seco T, Escudero JA. The expression of integron arrays is shaped by the translation rate of cassettes. Nat Commun 2024; 15:9232. [PMID: 39455579 PMCID: PMC11511950 DOI: 10.1038/s41467-024-53525-6] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/13/2024] [Accepted: 10/15/2024] [Indexed: 10/28/2024] Open
Abstract
Integrons are key elements in the rise and spread of multidrug resistance in Gram-negative bacteria. These genetic platforms capture cassettes containing promoterless genes and stockpile them in arrays of variable length. In the current integron model, expression of cassettes is granted by the Pc promoter in the platform and is assumed to decrease as a function of its distance. Here we explored this model using a large collection of 136 antibiotic resistance cassettes and show that the effect of distance is in fact negligible. Instead, cassettes have a strong impact on the expression of downstream genes because their translation rate affects the stability of the whole polycistronic mRNA molecule. Hence, cassettes with reduced translation rates decrease the expression and resistance phenotype of cassettes downstream. Our data put forward an integron model in which expression is contingent on the translation of cassettes upstream, rather than on the distance to the Pc.
Affiliation(s)
- André Carvalho
- Molecular Basis of Adaptation. Departamento de Sanidad Animal. Universidad Complutense de Madrid, Madrid, Spain.
- VISAVET Health Surveillance Centre, Universidad Complutense de Madrid, Madrid, Spain.
| | - Alberto Hipólito
- Molecular Basis of Adaptation. Departamento de Sanidad Animal. Universidad Complutense de Madrid, Madrid, Spain
- VISAVET Health Surveillance Centre, Universidad Complutense de Madrid, Madrid, Spain
| | - Filipa Trigo da Roza
- Molecular Basis of Adaptation. Departamento de Sanidad Animal. Universidad Complutense de Madrid, Madrid, Spain
- VISAVET Health Surveillance Centre, Universidad Complutense de Madrid, Madrid, Spain
| | - Lucía García-Pastor
- Molecular Basis of Adaptation. Departamento de Sanidad Animal. Universidad Complutense de Madrid, Madrid, Spain
- VISAVET Health Surveillance Centre, Universidad Complutense de Madrid, Madrid, Spain
| | - Ester Vergara
- Molecular Basis of Adaptation. Departamento de Sanidad Animal. Universidad Complutense de Madrid, Madrid, Spain
- VISAVET Health Surveillance Centre, Universidad Complutense de Madrid, Madrid, Spain
| | - Aranzazu Buendía
- VISAVET Health Surveillance Centre, Universidad Complutense de Madrid, Madrid, Spain
| | - Teresa García-Seco
- VISAVET Health Surveillance Centre, Universidad Complutense de Madrid, Madrid, Spain
| | - José Antonio Escudero
- Molecular Basis of Adaptation. Departamento de Sanidad Animal. Universidad Complutense de Madrid, Madrid, Spain.
- VISAVET Health Surveillance Centre, Universidad Complutense de Madrid, Madrid, Spain.
| |
|
15
|
La Fleur A, Shi Y, Seelig G. Decoding biology with massively parallel reporter assays and machine learning. Genes Dev 2024; 38:843-865. [PMID: 39362779 PMCID: PMC11535156 DOI: 10.1101/gad.351800.124] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/05/2024]
Abstract
Massively parallel reporter assays (MPRAs) are powerful tools for quantifying the impacts of sequence variation on gene expression. Reading out molecular phenotypes with sequencing enables interrogating the impact of sequence variation beyond genome scale. Machine learning models integrate and codify information learned from MPRAs and enable generalization by predicting sequences outside the training data set. Models can provide a quantitative understanding of cis-regulatory codes controlling gene expression, enable variant stratification, and guide the design of synthetic regulatory elements for applications from synthetic biology to mRNA and gene therapy. This review focuses on cis-regulatory MPRAs, particularly those that interrogate cotranscriptional and post-transcriptional processes: alternative splicing, cleavage and polyadenylation, translation, and mRNA decay.
Affiliation(s)
- Alyssa La Fleur
- Paul G. Allen School of Computer Science and Engineering, University of Washington, Seattle, Washington 98195, USA
| | - Yongsheng Shi
- Department of Microbiology and Molecular Genetics, School of Medicine, University of California, Irvine, Irvine, California 92697, USA;
| | - Georg Seelig
- Paul G. Allen School of Computer Science and Engineering, University of Washington, Seattle, Washington 98195, USA;
- Department of Electrical & Computer Engineering, University of Washington, Seattle, Washington 98195, USA
| |
|
16
|
Jiang R, Yuan S, Zhou Y, Wei Y, Li F, Wang M, Chen B, Yu H. Strategies to overcome the challenges of low or no expression of heterologous proteins in Escherichia coli. Biotechnol Adv 2024; 75:108417. [PMID: 39038691 DOI: 10.1016/j.biotechadv.2024.108417] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/21/2024] [Revised: 07/18/2024] [Accepted: 07/19/2024] [Indexed: 07/24/2024]
Abstract
Protein expression is a critical process in diverse biological systems. For Escherichia coli, a widely employed microbial host in industrial catalysis and healthcare, researchers often face significant challenges in constructing recombinant expression systems. To maximize the potential of E. coli expression systems, it is essential to address problems regarding the low or absent production of certain target proteins. This article presents viable solutions to the main factors posing challenges to heterologous protein expression in E. coli, which include protein toxicity, the intrinsic influence of gene sequences, and mRNA structure. These strategies include specialized approaches for managing toxic protein expression, addressing issues related to mRNA structure and codon bias, advanced codon optimization methodologies that consider multiple factors, and emerging optimization techniques facilitated by big data and machine learning.
Affiliation(s)
- Ruizhao Jiang
- Department of Chemical Engineering, Tsinghua University, Beijing 100084, China; Key Laboratory of Industrial Biocatalysis (Tsinghua University), the Ministry of Education, Beijing 100084, China
| | - Shuting Yuan
- Department of Chemical Engineering, Tsinghua University, Beijing 100084, China; Key Laboratory of Industrial Biocatalysis (Tsinghua University), the Ministry of Education, Beijing 100084, China
| | - Yilong Zhou
- Tanwei College, Tsinghua University, Beijing 100084, China
| | - Yuwen Wei
- Department of Chemical Engineering, Tsinghua University, Beijing 100084, China; Key Laboratory of Industrial Biocatalysis (Tsinghua University), the Ministry of Education, Beijing 100084, China
| | - Fulong Li
- Beijing Evolyzer Co.,Ltd., 100176, China
| | | | - Bo Chen
- Beijing Evolyzer Co.,Ltd., 100176, China
| | - Huimin Yu
- Department of Chemical Engineering, Tsinghua University, Beijing 100084, China; Key Laboratory of Industrial Biocatalysis (Tsinghua University), the Ministry of Education, Beijing 100084, China; Center for Synthetic and Systems Biology, Tsinghua University, Beijing 100084, China.
| |
|
17
|
Yan Z, Chu W, Sheng Y, Tang K, Wang S, Liu Y, Wong WF. Integrating Deep Learning and Synthetic Biology: A Co-Design Approach for Enhancing Gene Expression via N-Terminal Coding Sequences. ACS Synth Biol 2024; 13:2960-2968. [PMID: 39229974 DOI: 10.1021/acssynbio.4c00371] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 09/05/2024]
Abstract
N-terminal coding sequence (NCS) influences gene expression by impacting the translation initiation rate. The NCS optimization problem is to find an NCS that maximizes gene expression. The problem is important in genetic engineering. However, current methods for NCS optimization such as rational design and statistics-guided approaches are labor-intensive and yield only relatively small improvements. This paper introduces a deep learning/synthetic biology codesigned few-shot training workflow for NCS optimization. Our method utilizes k-nearest encoding followed by word2vec to encode the NCS, then performs feature extraction using attention mechanisms, before constructing a time-series network for predicting gene expression intensity, and finally a direct search algorithm identifies the optimal NCS with limited training data. We took green fluorescent protein (GFP) expressed by Bacillus subtilis as a reporting protein of NCSs, and employed the fluorescence enhancement factor as the metric of NCS optimization. Within just six iterative experiments, our model generated an NCS (MLD62) that increased average GFP expression by 5.41-fold, outperforming the state-of-the-art NCS designs. Extending our findings beyond GFP, we showed that our engineered NCS (MLD62) can effectively boost the production of N-acetylneuraminic acid by enhancing the expression of the crucial rate-limiting GNA1 gene, demonstrating its practical utility. We have open-sourced our NCS expression database and experimental procedures for public use.
Affiliation(s)
- Zhanglu Yan
- School of Computing, National University of Singapore, Singapore 117417, Singapore
| | - Weiran Chu
- Science Center for Future Foods, Jiangnan University, Wuxi 214122, PR China
| | - Yuhua Sheng
- Science Center for Future Foods, Jiangnan University, Wuxi 214122, PR China
| | - Kaiwen Tang
- School of Computing, National University of Singapore, Singapore 117417, Singapore
| | - Shida Wang
- Department of Mathematics, National University of Singapore, Singapore 119077, Singapore
| | - Yanfeng Liu
- Science Center for Future Foods, Jiangnan University, Wuxi 214122, PR China
| | - Weng-Fai Wong
- School of Computing, National University of Singapore, Singapore 117417, Singapore
| |
|
18
|
Pang B, Song M, Yang J, Mo H, Wang K, Chen X, Huang Y, Gu R, Guan C. Efficient production of a highly active lysozyme from European flat oyster Ostrea edulis. J Biotechnol 2024; 391:40-49. [PMID: 38848819 DOI: 10.1016/j.jbiotec.2024.05.011] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/25/2024] [Revised: 05/21/2024] [Accepted: 05/29/2024] [Indexed: 06/09/2024]
Abstract
Lysozyme, an antimicrobial agent, is extensively employed in the food and healthcare sectors to facilitate the breakdown of peptidoglycan. However, the methods to improve its catalytic activity and secretory expression still need to be studied. In the present study, twelve lysozymes from different origins were heterologously expressed using the Komagataella phaffii expression system. Among them, the lysozyme from the European flat oyster Ostrea edulis (oeLYZ) showed the highest activity. Via a semi-rational approach to reduce the structural free energy, the double mutant Y15A/S39R (oeLYZdm), with a catalytic activity 1.8-fold greater than that of the wild type, was generated. Subsequently, different N-terminal fusion tags were employed to enhance oeLYZdm expression. The fusion with peptide tag 6×Glu resulted in a remarkable increase in the recombinant oeLYZdm expression, from 2.81 × 10³ U mL⁻¹ to 2.11 × 10⁴ U mL⁻¹ in shake flask culture, and eventually reaching 2.05 × 10⁵ U mL⁻¹ in a 3-L fermenter. To the authors' knowledge, this is the highest heterologous oeLYZ expression reported in microbial systems. Reducing the structural free energy and employing the N-terminal fusion tags are effective strategies to improve the catalytic activity and secretory expression of lysozyme.
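The expression gains quoted in this abstract are easier to compare as fold changes. The sketch below reads the reported activities as 2.81 × 10^3, 2.11 × 10^4 and 2.05 × 10^5 U/mL (powers of ten, which is an interpretation of the flattened superscripts in the listing) and uses illustrative variable names that do not come from the paper.

```python
# Activities as reported in the abstract, in U/mL; names are illustrative only.
untagged_flask = 2.81e3     # oeLYZdm without the 6xGlu tag, shake flask
tagged_flask = 2.11e4       # oeLYZdm with the 6xGlu tag, shake flask
tagged_fermenter = 2.05e5   # oeLYZdm with the 6xGlu tag, 3-L fermenter


def fold_change(after: float, before: float) -> float:
    """Return how many times larger `after` is than `before`."""
    return after / before


tag_gain = fold_change(tagged_flask, untagged_flask)        # effect of the tag
scale_up_gain = fold_change(tagged_fermenter, tagged_flask)  # flask to fermenter
overall_gain = fold_change(tagged_fermenter, untagged_flask)

print(f"tag: {tag_gain:.1f}x, scale-up: {scale_up_gain:.1f}x, overall: {overall_gain:.0f}x")
```

On this reading, the tag alone accounts for about a 7.5-fold increase in shake flasks, and moving to the fermenter adds roughly another 9.7-fold, for about 73-fold overall.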
Affiliation(s)
- Bo Pang
- School of Food Science and Engineering, Yangzhou University, Yangzhou, Jiangsu 225127, China; Key Lab of Dairy Biotechnology and Safety Control, Yangzhou University, Yangzhou, Jiangsu 225127, China
| | - Manxi Song
- School of Food Science and Engineering, Yangzhou University, Yangzhou, Jiangsu 225127, China; Key Lab of Dairy Biotechnology and Safety Control, Yangzhou University, Yangzhou, Jiangsu 225127, China
| | - Jiahao Yang
- School of Food Science and Engineering, Yangzhou University, Yangzhou, Jiangsu 225127, China; Key Lab of Dairy Biotechnology and Safety Control, Yangzhou University, Yangzhou, Jiangsu 225127, China
| | - Haobin Mo
- School of Food Science and Engineering, Yangzhou University, Yangzhou, Jiangsu 225127, China; Key Lab of Dairy Biotechnology and Safety Control, Yangzhou University, Yangzhou, Jiangsu 225127, China
| | - Kai Wang
- School of Food Science and Engineering, Yangzhou University, Yangzhou, Jiangsu 225127, China; Key Lab of Dairy Biotechnology and Safety Control, Yangzhou University, Yangzhou, Jiangsu 225127, China
| | - Xia Chen
- School of Food Science and Engineering, Yangzhou University, Yangzhou, Jiangsu 225127, China; Key Lab of Dairy Biotechnology and Safety Control, Yangzhou University, Yangzhou, Jiangsu 225127, China
| | - Yujun Huang
- School of Food Science and Engineering, Yangzhou University, Yangzhou, Jiangsu 225127, China; Key Lab of Dairy Biotechnology and Safety Control, Yangzhou University, Yangzhou, Jiangsu 225127, China
| | - Ruixia Gu
- School of Food Science and Engineering, Yangzhou University, Yangzhou, Jiangsu 225127, China; Key Lab of Dairy Biotechnology and Safety Control, Yangzhou University, Yangzhou, Jiangsu 225127, China
| | - Chengran Guan
- School of Food Science and Engineering, Yangzhou University, Yangzhou, Jiangsu 225127, China; Key Lab of Dairy Biotechnology and Safety Control, Yangzhou University, Yangzhou, Jiangsu 225127, China.
| |
|
19
|
Gilliot PA, Gorochowski TE. Transfer learning for cross-context prediction of protein expression from 5'UTR sequence. Nucleic Acids Res 2024; 52:e58. [PMID: 38864396 PMCID: PMC11260469 DOI: 10.1093/nar/gkae491] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/10/2023] [Revised: 04/28/2024] [Accepted: 05/28/2024] [Indexed: 06/13/2024] Open
Abstract
Model-guided DNA sequence design can accelerate the reprogramming of living cells. It allows us to engineer more complex biological systems by removing the need to physically assemble and test each potential design. While mechanistic models of gene expression have seen some success in supporting this goal, data-centric, deep learning-based approaches often provide more accurate predictions. This accuracy, however, comes at a cost - a lack of generalization across genetic and experimental contexts that has limited their wider use outside the context in which they were trained. Here, we address this issue by demonstrating how a simple transfer learning procedure can effectively tune a pre-trained deep learning model to predict protein translation rate from 5' untranslated region (5'UTR) sequence for diverse contexts in Escherichia coli using a small number of new measurements. This allows for important model features learnt from expensive massively parallel reporter assays to be easily transferred to new settings. By releasing our trained deep learning model and complementary calibration procedure, this study acts as a starting point for continually refined model-based sequence design that builds on previous knowledge and future experimental efforts.
Affiliation(s)
- Pierre-Aurélien Gilliot
- School of Biological Sciences, University of Bristol, 24 Tyndall Avenue, Bristol BS8 1TQ, UK
| | - Thomas E Gorochowski
- School of Biological Sciences, University of Bristol, 24 Tyndall Avenue, Bristol BS8 1TQ, UK
- BrisEngBio, School of Chemistry, University of Bristol, Cantock’s Close, Bristol BS8 1TS, UK
| |
|
20
|
Zelenka NR, Di Cara N, Sharma K, Sarvaharman S, Ghataora JS, Parmeggiani F, Nivala J, Abdallah ZS, Marucci L, Gorochowski TE. Data hazards in synthetic biology. Synth Biol (Oxf) 2024; 9:ysae010. [PMID: 38973982 PMCID: PMC11227101 DOI: 10.1093/synbio/ysae010] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/15/2024] [Revised: 05/17/2024] [Accepted: 06/19/2024] [Indexed: 07/09/2024] Open
Abstract
Data science is playing an increasingly important role in the design and analysis of engineered biology. This has been fueled by the development of high-throughput methods like massively parallel reporter assays, data-rich microscopy techniques, computational protein structure prediction and design, and the development of whole-cell models able to generate huge volumes of data. Although the ability to apply data-centric analyses in these contexts is appealing and increasingly simple to do, it comes with potential risks. For example, how might biases in the underlying data affect the validity of a result and what might the environmental impact of large-scale data analyses be? Here, we present a community-developed framework for assessing data hazards to help address these concerns and demonstrate its application to two synthetic biology case studies. We show the diversity of considerations that arise in common types of bioengineering projects and provide some guidelines and mitigating steps. Understanding potential issues and dangers when working with data and proactively addressing them will be essential for ensuring the appropriate use of emerging data-intensive AI methods and help increase the trustworthiness of their applications in synthetic biology.
Affiliation(s)
- Natalie R Zelenka
- Jean Golding Institute, University of Bristol, Bristol, UK
- BrisEngBio, University of Bristol, Bristol, UK
| | - Nina Di Cara
- School of Psychological Science, University of Bristol, Bristol, UK
| | - Kieren Sharma
- School of Engineering Mathematics and Technology, University of Bristol, Bristol, UK
| | | | - Jasdeep S Ghataora
- BrisEngBio, University of Bristol, Bristol, UK
- School of Biological Sciences, University of Bristol, Bristol, UK
| | - Fabio Parmeggiani
- BrisEngBio, University of Bristol, Bristol, UK
- School of Biochemistry, University of Bristol, Bristol, UK
- School of Pharmacy and Pharmaceutical Sciences, Cardiff University, Cardiff, UK
| | - Jeff Nivala
- Paul G. Allen School of Computer Science and Engineering, University of Washington, Seattle, WA, USA
| | - Zahraa S Abdallah
- School of Engineering Mathematics and Technology, University of Bristol, Bristol, UK
| | - Lucia Marucci
- BrisEngBio, University of Bristol, Bristol, UK
- School of Engineering Mathematics and Technology, University of Bristol, Bristol, UK
| | - Thomas E Gorochowski
- BrisEngBio, University of Bristol, Bristol, UK
- School of Biological Sciences, University of Bristol, Bristol, UK
| |
|
21
|
Alcantar MA, English MA, Valeri JA, Collins JJ. A high-throughput synthetic biology approach for studying combinatorial chromatin-based transcriptional regulation. Mol Cell 2024; 84:2382-2396.e9. [PMID: 38906116 DOI: 10.1016/j.molcel.2024.05.025] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/05/2023] [Revised: 04/11/2024] [Accepted: 05/24/2024] [Indexed: 06/23/2024]
Abstract
The construction of synthetic gene circuits requires the rational combination of multiple regulatory components, but predicting their behavior can be challenging due to poorly understood component interactions and unexpected emergent behaviors. In eukaryotes, chromatin regulators (CRs) are essential regulatory components that orchestrate gene expression. Here, we develop a screening platform to investigate the impact of CR pairs on transcriptional activity in yeast. We construct a combinatorial library consisting of over 1,900 CR pairs and use a high-throughput workflow to characterize the impact of CR co-recruitment on gene expression. We recapitulate known interactions and discover several instances of CR pairs with emergent behaviors. We also demonstrate that supervised machine learning models trained with low-dimensional amino acid embeddings accurately predict the impact of CR co-recruitment on transcriptional activity. This work introduces a scalable platform and machine learning approach that can be used to study how networks of regulatory components impact gene expression.
Affiliation(s)
- Miguel A Alcantar
- Department of Biological Engineering, Massachusetts Institute of Technology (MIT), Cambridge, MA 02139, USA; Institute for Medical Engineering and Science, MIT, Cambridge, MA 02139, USA
| | - Max A English
- Department of Biological Engineering, Massachusetts Institute of Technology (MIT), Cambridge, MA 02139, USA; Institute for Medical Engineering and Science, MIT, Cambridge, MA 02139, USA
| | - Jacqueline A Valeri
- Department of Biological Engineering, Massachusetts Institute of Technology (MIT), Cambridge, MA 02139, USA; Institute for Medical Engineering and Science, MIT, Cambridge, MA 02139, USA; Wyss Institute for Biologically Inspired Engineering, Harvard University, Boston, MA 02115, USA; Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA
| | - James J Collins
- Department of Biological Engineering, Massachusetts Institute of Technology (MIT), Cambridge, MA 02139, USA; Institute for Medical Engineering and Science, MIT, Cambridge, MA 02139, USA; Wyss Institute for Biologically Inspired Engineering, Harvard University, Boston, MA 02115, USA; Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA.
| |
|
22
|
Radrizzani S, Kudla G, Izsvák Z, Hurst LD. Selection on synonymous sites: the unwanted transcript hypothesis. Nat Rev Genet 2024; 25:431-448. [PMID: 38297070 DOI: 10.1038/s41576-023-00686-7] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Accepted: 12/04/2023] [Indexed: 02/02/2024]
Abstract
Although translational selection to favour codons that match the most abundant tRNAs is not readily observed in humans, there is nonetheless selection in humans on synonymous mutations. We hypothesize that much of this synonymous site selection can be explained in terms of protection against unwanted RNAs - spurious transcripts, mis-spliced forms or RNAs derived from transposable elements or viruses. We propose not only that selection on synonymous sites functions to reduce the rate of creation of unwanted transcripts (for example, through selection on exonic splice enhancers and cryptic splice sites) but also that high-GC content (but low-CpG content), together with intron presence and position, is both particular to functional native mRNAs and used to recognize transcripts as native. In support of this hypothesis, transcription, nuclear export, liquid phase condensation and RNA degradation have all recently been shown to promote GC-rich transcripts and suppress AU/CpG-rich ones. With such 'traps' being set against AU/CpG-rich transcripts, the codon usage of native genes has, in turn, evolved to avoid such suppression. That parallel filters against AU/CpG-rich transcripts also affect the endosomal import of RNAs further supports the unwanted transcript hypothesis of synonymous site selection and explains the similar design rules that have enabled the successful use of transgenes and RNA vaccines.
Affiliation(s)
- Sofia Radrizzani
- Milner Centre for Evolution, Department of Life Sciences, University of Bath, Bath, UK
- Milner Therapeutics Institute, Jeffrey Cheah Biomedical Centre, University of Cambridge, Cambridge, UK
| | - Grzegorz Kudla
- MRC Human Genetics Unit, Institute for Genetics and Cancer, The University of Edinburgh, Edinburgh, UK
| | - Zsuzsanna Izsvák
- Max-Delbrück-Center for Molecular Medicine in the Helmholtz Society, Berlin, Germany
| | - Laurence D Hurst
- Milner Centre for Evolution, Department of Life Sciences, University of Bath, Bath, UK.
| |
|
23
|
Zhang W, Xiong S, Ni D, Huang Z, Ding J, Mu W. Engineering Bacillus subtilis for highly efficient production of functional disaccharide lactulose from lactose. Int J Biol Macromol 2024; 271:132478. [PMID: 38772465 DOI: 10.1016/j.ijbiomac.2024.132478] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/05/2024] [Revised: 05/05/2024] [Accepted: 05/15/2024] [Indexed: 05/23/2024]
Abstract
Bioconversion of lactose to functional lactose derivatives attracts increasing attention. Lactulose is an important high-value lactose derivative, which has been widely used in pharmaceutical, nutraceutical, and food industries. Lactulose can be enzymatically produced from lactose by cellobiose 2-epimerase (CEase). Several studies have already focused on the food-grade expression of CEase, but they are all aimed at the biosynthesis of epilactose. Herein, we reported for the first time the biosynthesis of lactulose using the recombinant food-grade Bacillus subtilis. Lactulose biosynthesis was optimized by varying lactulose-producing CEases and expression vectors. Caldicellulosiruptor saccharolyticus CEase and pP43NMK were determined to be the optimal CEase and expression vector. Fine-tuning of CEase expression was investigated by screening a beneficial N-terminal coding sequence. After fed-batch cultivation, the highest fermentation isomerization activity reached 11.6 U/mL. Lactulose was successfully produced by the broth of the engineered B. subtilis with a yield of 52.1 %.
Affiliation(s)
- Wenli Zhang
- State Key Laboratory of Food Science and Resources, Jiangnan University, Wuxi, Jiangsu 214122, China
| | - Suchun Xiong
- State Key Laboratory of Food Science and Resources, Jiangnan University, Wuxi, Jiangsu 214122, China; Engineering Research Center of Sustainable Development and Utilization of Biomass Energy, Ministry of Education, Yunnan Normal University, Kunming 650500, China
| | - Dawei Ni
- State Key Laboratory of Food Science and Resources, Jiangnan University, Wuxi, Jiangsu 214122, China
| | - Zhaolin Huang
- State Key Laboratory of Food Science and Resources, Jiangnan University, Wuxi, Jiangsu 214122, China
| | - Junmei Ding
- Engineering Research Center of Sustainable Development and Utilization of Biomass Energy, Ministry of Education, Yunnan Normal University, Kunming 650500, China
| | - Wanmeng Mu
- State Key Laboratory of Food Science and Resources, Jiangnan University, Wuxi, Jiangsu 214122, China.
| |
|
24
|
Kobo A, Taguchi H, Chadani Y. Nonspecific N-terminal tetrapeptide insertions disrupt the translation arrest induced by ribosome-arresting peptide sequences. J Biol Chem 2024; 300:107360. [PMID: 38735477 PMCID: PMC11190716 DOI: 10.1016/j.jbc.2024.107360] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/10/2024] [Revised: 04/19/2024] [Accepted: 05/03/2024] [Indexed: 05/14/2024] Open
Abstract
The nascent polypeptide chains passing through the ribosome tunnel not only serve as an intermediate of protein synthesis but also, in some cases, act as dynamic genetic information, controlling translation through interaction with the ribosome. One notable example is Escherichia coli SecM, in which translation of the ribosome arresting peptide (RAP) sequence in SecM leads to robust elongation arrest. Translation regulations, including the SecM-induced translation arrest, play regulatory roles such as gene expression control. Recent investigations have indicated that the insertion of a peptide sequence, SKIK (or MSKIK), into the adjacent N-terminus of the RAP sequence of SecM behaves as an "arrest canceler". As the study did not provide a direct assessment of the strength of translation arrest, we conducted detailed biochemical analyses. The results revealed that the effect of SKIK insertion on weakening SecM-induced translation arrest was not specific to the SKIK sequence, that is, other tetrapeptide sequences inserted just before the RAP sequence also attenuated the arrest. Our data suggest that SKIK or other tetrapeptide insertions disrupt the context of the RAP sequence rather than canceling or preventing the translation arrest.
Affiliation(s)
- Akinao Kobo
- School of Life Science and Technology, Tokyo Institute of Technology, Yokohama, Japan
| | - Hideki Taguchi
- School of Life Science and Technology, Tokyo Institute of Technology, Yokohama, Japan; Cell Biology Center, Institute of Innovative Research, Tokyo Institute of Technology, Yokohama, Japan.
| | - Yuhei Chadani
- Faculty of Environmental, Life, Natural Science and Technology, Okayama University, Okayama, Japan.
| |
|
25
|
Calabrese L, Ciandrini L, Cosentino Lagomarsino M. How total mRNA influences cell growth. Proc Natl Acad Sci U S A 2024; 121:e2400679121. [PMID: 38753514 PMCID: PMC11126920 DOI: 10.1073/pnas.2400679121] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/23/2024] [Accepted: 04/10/2024] [Indexed: 05/18/2024] Open
Abstract
Experimental observations tracing back to the 1960s imply that ribosome quantities play a prominent role in determining a cell's growth. Nevertheless, in biologically relevant scenarios, growth can also be influenced by the levels of mRNA and RNA polymerase. Here, we construct a quantitative model of biosynthesis providing testable scenarios for these situations. The model explores a theoretically motivated regime where RNA polymerases compete for genes and ribosomes for transcripts and gives general expressions relating growth rate, mRNA concentrations, ribosome, and RNA polymerase levels. On general grounds, the model predicts how the fraction of ribosomes in the proteome depends on total mRNA concentration and inspects an underexplored regime in which the trade-off between transcript levels and ribosome abundances sets the cellular growth rate. In particular, we show that the model predicts and clarifies three important experimental observations, in budding yeast and Escherichia coli bacteria: i) that the growth-rate cost of unneeded protein expression can be affected by mRNA levels, ii) that resource optimization leads to decreasing trends in mRNA levels at slow growth, and iii) that ribosome allocation may increase, stay constant, or decrease, in response to transcription-inhibiting antibiotics. Since the data indicate that a regime of joint limitation may apply in physiological conditions and not only to perturbations, we speculate that this regime is likely self-imposed.
Affiliation(s)
- Ludovico Calabrese
  - IFOM-ETS–The AIRC Institute of Molecular Oncology, The Associazione Italiana di Ricerca sul Cancro (AIRC) Institute of Molecular Oncology, Milan 20139, Italy
- Luca Ciandrini
  - Centre de Biologie Structurale, Université de Montpellier, CNRS, INSERM, Montpellier, France
  - Institut Universitaire de France
- Marco Cosentino Lagomarsino
  - IFOM-ETS–The AIRC Institute of Molecular Oncology, The Associazione Italiana di Ricerca sul Cancro (AIRC) Institute of Molecular Oncology, Milan 20139, Italy
  - Dipartimento di Fisica, Università degli Studi di Milano, Milano 20133, Italy
  - Istituto Nazionale di Fisica Nucleare (INFN) Sezione di Milano, Milano 20133, Italy
26
Wagner A. Genotype sampling for deep-learning assisted experimental mapping of a combinatorially complete fitness landscape. Bioinformatics 2024; 40:btae317. [PMID: 38745436 PMCID: PMC11132821 DOI: 10.1093/bioinformatics/btae317] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/22/2024] [Revised: 03/21/2024] [Accepted: 05/14/2024] [Indexed: 05/16/2024] Open
Abstract
MOTIVATION: Experimental characterization of fitness landscapes, which map genotypes onto fitness, is important for both evolutionary biology and protein engineering. It faces a fundamental obstacle in the astronomical number of genotypes whose fitness needs to be measured for any one protein. Deep learning may help to predict the fitness of many genotypes from a smaller neural network training sample of genotypes with experimentally measured fitness. Here I use a recently published experimentally mapped fitness landscape of more than 260 000 protein genotypes to ask how such sampling is best performed. RESULTS: I show that multilayer perceptrons, recurrent neural networks, convolutional networks, and transformers can explain more than 90% of fitness variance in the data. In addition, 90% of this performance is reached with a training sample comprising merely ≈10³ sequences. Generalization to unseen test data is best when training data is sampled randomly and uniformly, or sampled to minimize the number of synonymous sequences. In contrast, sampling to maximize sequence diversity or codon usage bias reduces performance substantially. These observations hold for more than one network architecture. Simple sampling strategies may perform best when training deep learning neural networks to map fitness landscapes from experimental data. AVAILABILITY AND IMPLEMENTATION: The fitness landscape data analyzed here is publicly available as described previously (Papkou et al. 2023). All code used to analyze this landscape is publicly available at https://github.com/andreas-wagner-uzh/fitness_landscape_sampling.
Affiliation(s)
- Andreas Wagner
  - Department of Evolutionary Biology and Environmental Studies, University of Zurich, 8057 Zurich, Switzerland
  - Swiss Institute of Bioinformatics, Quartier Sorge-Batiment Genopode, 1015 Lausanne, Switzerland
  - The Santa Fe Institute, Santa Fe, 87501 NM, United States
27
Castle SD, Stock M, Gorochowski TE. Engineering is evolution: a perspective on design processes to engineer biology. Nat Commun 2024; 15:3640. [PMID: 38684714 PMCID: PMC11059173 DOI: 10.1038/s41467-024-48000-1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/11/2023] [Accepted: 04/18/2024] [Indexed: 05/02/2024] Open
Abstract
Careful consideration of how we approach design is crucial to all areas of biotechnology. However, choosing or developing an effective design methodology is not always easy as biology, unlike most areas of engineering, is able to adapt and evolve. Here, we put forward that design and evolution follow a similar cyclic process and therefore all design methods, including traditional design, directed evolution, and even random trial and error, exist within an evolutionary design spectrum. This contrasts with conventional views that often place these methods at odds and provides a valuable framework for unifying engineering approaches for challenging biological design problems.
Affiliation(s)
- Simeon D Castle
  - School of Biological Sciences, University of Bristol, Life Sciences Building, 24 Tyndall Avenue, Bristol, UK.
- Michiel Stock
  - KERMIT, Department of Data Analysis and Mathematical Modelling, Ghent University, Ghent, Belgium
- Thomas E Gorochowski
  - School of Biological Sciences, University of Bristol, Life Sciences Building, 24 Tyndall Avenue, Bristol, UK.
  - BrisEngBio, School of Chemistry, University of Bristol, Cantock's Close, Bristol, UK.
28
Zhang W, Ren H, Chen J, Ni D, Xu W, Mu W. Enhancement of the d-Allulose 3-Epimerase Expression in Bacillus subtilis through Both Transcriptional and Translational Regulations. J Agric Food Chem 2024; 72:8052-8059. [PMID: 38563420 DOI: 10.1021/acs.jafc.4c01122] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 04/04/2024]
Abstract
d-Allulose, a functional bulk sweetener, has recently attracted increasing attention because of its low-calorie properties and diverse health effects. d-Allulose is industrially produced by the enzymatic epimerization of d-fructose, which is catalyzed by ketose 3-epimerase (KEase). In this study, the food-grade expression of KEase was studied using Bacillus subtilis as the host. Clostridium sp. d-allulose 3-epimerase (Clsp-DAEase) was screened from nine d-allulose-producing KEases, showing better potential for expression in B. subtilis WB600. Promoter-based transcriptional regulation and N-terminal coding sequence (NCS)-based translational regulation were studied to enhance the DAEase expression level. In addition, the synergistic effect of promoter and NCS on the Clsp-DAEase expression was studied. Finally, the strain with the combination of a PHapII promoter and glnA-Up NCS was selected as the best Clsp-DAEase-producing strain. It efficiently produced Clsp-DAEase with a total activity of 333.2 and 1860.6 U/mL by shake-flask and fed-batch cultivations, respectively.
Affiliation(s)
- Wenli Zhang
  - State Key Laboratory of Food Science and Resources, Jiangnan University, Wuxi, Jiangsu 214122, People's Republic of China
- Hu Ren
  - State Key Laboratory of Food Science and Resources, Jiangnan University, Wuxi, Jiangsu 214122, People's Republic of China
- JiaJun Chen
  - State Key Laboratory of Food Science and Resources, Jiangnan University, Wuxi, Jiangsu 214122, People's Republic of China
- Dawei Ni
  - State Key Laboratory of Food Science and Resources, Jiangnan University, Wuxi, Jiangsu 214122, People's Republic of China
- Wei Xu
  - State Key Laboratory of Food Science and Resources, Jiangnan University, Wuxi, Jiangsu 214122, People's Republic of China
- Wanmeng Mu
  - State Key Laboratory of Food Science and Resources, Jiangnan University, Wuxi, Jiangsu 214122, People's Republic of China
29
Sinzger-D'Angelo M, Hanst M, Reinhardt F, Koeppl H. Effects of mRNA conformational switching on translational noise in gene circuits. J Chem Phys 2024; 160:134108. [PMID: 38573847 DOI: 10.1063/5.0186927] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/09/2023] [Accepted: 03/08/2024] [Indexed: 04/06/2024] Open
Abstract
Intragenic translational heterogeneity describes the variation in translation at the level of transcripts for an individual gene. A factor that contributes to this source of variation is the mRNA structure. Both the composition of the thermodynamic ensemble, i.e., the stationary distribution of mRNA structures, and the switching dynamics between those play a role. The effect of the switching dynamics on intragenic translational heterogeneity remains poorly understood. We present a stochastic translation model that accounts for mRNA structure switching and is derived from a Markov model via approximate stochastic filtering. We assess the approximation on various timescales and provide a method to quantify how mRNA structure dynamics contributes to translational heterogeneity. With our approach, we allow quantitative information on mRNA switching from biophysical experiments or coarse-grain molecular dynamics simulations of mRNA structures to be included in gene regulatory chemical reaction network models without an increase in the number of species. Thereby, our model bridges a gap between mRNA structure kinetics and gene expression models, which we hope will further improve our understanding of gene regulatory networks and facilitate genetic circuit design.
Affiliation(s)
- Maleen Hanst
  - Centre for Synthetic Biology, Technische Universität Darmstadt, Darmstadt, Germany
- Felix Reinhardt
  - Centre for Synthetic Biology, Technische Universität Darmstadt, Darmstadt, Germany
- Heinz Koeppl
  - Centre for Synthetic Biology, Technische Universität Darmstadt, Darmstadt, Germany
30
Zabolotskii AI, Riabkova NS. A new look at the fluorescent protein-based approach for identifying optimal coding sequence for recombinant protein expression in E. coli. Biotechnol J 2024; 19:e2300343. [PMID: 38622786 DOI: 10.1002/biot.202300343] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/14/2023] [Revised: 03/06/2024] [Accepted: 03/20/2024] [Indexed: 04/17/2024]
Abstract
Due to the degeneracy of the genetic code, most amino acids are encoded by several codons. The choice among synonymous codons at the N-terminus of genes has a profound effect on protein expression in Escherichia coli. This is often explained by the different contributions of synonymous codons to mRNA secondary structure formation. Strong secondary structures at the 5'-end of mRNA interfere with ribosome binding and affect the process of translation initiation. In silico optimization of the gene 5'-end can significantly increase the level of protein expression; however, this method is not always effective due to the uncertainty of the exact mechanism by which synonymous substitutions affect expression; thus, it may produce nonoptimal variants as well as miss some of the best producers. In this paper, an alternative approach is proposed based on screening a partially randomized library of expression constructs comprising hundreds of selected synonymous variants. The effect of such substitutions was evaluated using the gene of interest fused to the reporter gene of the fluorescent protein, with subsequent screening for the most promising candidates according to the reporter's signal intensity. The power of the approach is demonstrated by a significant increase in the prokaryotic expression of three proteins: canine cystatin C, human BCL2-associated athanogene 3, and human cardiac troponin I. This simple approach may provide an efficient, easy, and inexpensive optimization method for poorly expressed proteins in bacteria.
31
Love AM, Nair NU. Specific codons control cellular resources and fitness. Sci Adv 2024; 10:eadk3485. [PMID: 38381824 PMCID: PMC10881034 DOI: 10.1126/sciadv.adk3485] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/16/2023] [Accepted: 01/18/2024] [Indexed: 02/23/2024]
Abstract
As cellular engineering progresses from simply overexpressing proteins to imparting complex phenotypes through multigene expression, judicious appropriation of cellular resources is essential. Since codon use is degenerate and biased, codons may control cellular resources at a translational level. We investigate how partitioning transfer RNA (tRNA) resources by incorporating dissimilar codon usage can drastically alter interdependence of expression level and burden on the host. By isolating the effect of individual codons' use during translation elongation while eliminating confounding factors, we show that codon choice can trans-regulate fitness of the host and expression of other heterologous or native genes. We correlate specific codon usage patterns with host fitness and derive a coding scheme for multigene expression called the Codon Health Index (CHI, χ). This empirically derived coding scheme (χ) enables the design of multigene expression systems that avoid catastrophic cellular burden and is robust across several proteins and conditions.
Affiliation(s)
- Aaron M. Love
  - Manus Bio, Waltham, MA 02453, USA
  - Department of Chemical and Biological Engineering, Tufts University, Medford, MA 02155, USA
- Nikhil U. Nair
  - Department of Chemical and Biological Engineering, Tufts University, Medford, MA 02155, USA
32
Ahsan A, Wagner D, Varaljay VA, Roman V, Kelley-Loughnane N, Reuel NF. Screening putative polyester polyurethane degrading enzymes with semi-automated cell-free expression and nitrophenyl probes. Synth Biol (Oxf) 2024; 9:ysae005. [PMID: 38414826 PMCID: PMC10898825 DOI: 10.1093/synbio/ysae005] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/13/2023] [Revised: 12/26/2023] [Accepted: 02/09/2024] [Indexed: 02/29/2024] Open
Abstract
Cell-free expression (CFE) has shown recent utility in prototyping enzymes for discovery efforts. In this work, CFE is demonstrated as an effective tool to screen putative polyester polyurethane degrading enzyme sequences sourced from metagenomic analysis of biofilms prospected on aircraft and vehicles. An automated fluid handler with a controlled temperature block is used to assemble the numerous 30 µL CFE reactions to provide more consistent results over human assembly. In sum, 13 putative hydrolase enzymes from the biofilm organisms as well as a previously verified, polyester-degrading cutinase were expressed using in-house E. coli extract and minimal linear templates. The enzymes were then tested for esterase activity directly in extract using nitrophenyl conjugated substrates, showing highest sensitivity to shorter substrates (4-nitrophenyl hexanoate and 4-nitrophenyl valerate). This screen identified 10 enzymes with statistically significant activities against these substrates; however, all were lower in measured relative activity, on a CFE volume basis, than the established cutinase control. This approach portends the use of CFE and reporter probes to rapidly prototype, screen and design for synthetic polymer degrading enzymes from environmental consortia.
Affiliation(s)
- Afrin Ahsan
  - Department of Chemical and Biological Engineering, Iowa State University, Ames, IA, USA
- Dominique Wagner
  - Materials and Manufacturing Directorate, Air Force Research Laboratory, Wright-Patterson AFB, OH, USA
  - UES Inc., Dayton, OH, USA
- Vanessa A Varaljay
  - Materials and Manufacturing Directorate, Air Force Research Laboratory, Wright-Patterson AFB, OH, USA
- Victor Roman
  - Materials and Manufacturing Directorate, Air Force Research Laboratory, Wright-Patterson AFB, OH, USA
- Nancy Kelley-Loughnane
  - Materials and Manufacturing Directorate, Air Force Research Laboratory, Wright-Patterson AFB, OH, USA
- Nigel F Reuel
  - Department of Chemical and Biological Engineering, Iowa State University, Ames, IA, USA
33
Mishra S, Perkovich PM, Mitchell WP, Venkataraman M, Pfleger BF. Expanding the synthetic biology toolbox of Cupriavidus necator for establishing fatty acid production. J Ind Microbiol Biotechnol 2024; 51:kuae008. [PMID: 38366943 PMCID: PMC10926325 DOI: 10.1093/jimb/kuae008] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/13/2023] [Accepted: 02/15/2024] [Indexed: 02/19/2024]
Abstract
The Gram-negative betaproteobacterium Cupriavidus necator is a chemolithotroph that can convert carbon dioxide into biomass. Cupriavidus necator has been engineered to produce a variety of high-value chemicals in the past. However, there is still a lack of a well-characterized toolbox for gene expression and genome engineering. Development and optimization of biosynthetic pathways in metabolically engineered microorganisms necessitates control of gene expression via functional genetic elements such as promoters, ribosome binding sites (RBSs), and codon optimization. In this work, a set of inducible and constitutive promoters were validated and characterized in C. necator, and a library of RBSs was designed and tested to show a 50-fold range of expression for green fluorescent protein (gfp). The effect of codon optimization on gene expression in C. necator was studied by expressing gfp and mCherry genes with varied codon-adaptation indices and was validated by expressing codon-optimized variants of a C12-specific fatty acid thioesterase to produce dodecanoic acid. We discuss further hurdles that will need to be overcome for C. necator to be widely used for biosynthetic processes.
Affiliation(s)
- Shivangi Mishra
  - Department of Chemical and Biological Engineering, University of Wisconsin–Madison, Madison, WI 53706, USA
- Paul M Perkovich
  - Department of Chemical and Biological Engineering, University of Wisconsin–Madison, Madison, WI 53706, USA
- Maya Venkataraman
  - Department of Chemical and Biological Engineering, University of Wisconsin–Madison, Madison, WI 53706, USA
- Brian F Pfleger
  - Department of Chemical and Biological Engineering, University of Wisconsin–Madison, Madison, WI 53706, USA
34
Moon S, Saboe A, Smanski MJ. Using design of experiments to guide genetic optimization of engineered metabolic pathways. J Ind Microbiol Biotechnol 2024; 51:kuae010. [PMID: 38490746 PMCID: PMC10981448 DOI: 10.1093/jimb/kuae010] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/12/2023] [Accepted: 03/14/2024] [Indexed: 03/17/2024]
Abstract
Design of experiments (DoE) is a term used to describe the application of statistical approaches to interrogate the impact of many variables on the performance of a multivariate system. It is commonly used for process optimization in fields such as chemical engineering and material science. Recent advances in the ability to quantitatively control the expression of genes in biological systems open up the possibility to apply DoE for genetic optimization. In this review targeted to genetic and metabolic engineers, we introduce several approaches in DoE at a high level and describe instances wherein these were applied to interrogate or optimize engineered genetic systems. We discuss the challenges of applying DoE and propose strategies to mitigate these challenges. ONE-SENTENCE SUMMARY: This is a review of literature related to applying Design of Experiments for genetic optimization.
Affiliation(s)
- Seonyun Moon
  - Department of Biochemistry, Molecular Biology, and Biophysics, University of Minnesota, St Paul, MN 55108, USA
  - Biotechnology Institute, University of Minnesota, St Paul, MN 55108, USA
- Anna Saboe
  - Biotechnology Institute, University of Minnesota, St Paul, MN 55108, USA
- Michael J Smanski
  - Department of Biochemistry, Molecular Biology, and Biophysics, University of Minnesota, St Paul, MN 55108, USA
  - Biotechnology Institute, University of Minnesota, St Paul, MN 55108, USA
35
Connors BM, Thompson J, Ertmer S, Clark RL, Pfleger BF, Venturelli OS. Control points for design of taxonomic composition in synthetic human gut communities. Cell Syst 2023; 14:1044-1058.e13. [PMID: 38091992 PMCID: PMC10752370 DOI: 10.1016/j.cels.2023.11.007] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/03/2022] [Revised: 06/22/2023] [Accepted: 11/20/2023] [Indexed: 12/23/2023]
Abstract
Microbial communities offer vast potential across numerous sectors but remain challenging to systematically control. We develop a two-stage approach to guide the taxonomic composition of synthetic microbiomes by precisely manipulating media components and initial species abundances. By combining high-throughput experiments and computational modeling, we demonstrate the ability to predict and design the diversity of a 10-member synthetic human gut community. We reveal that critical environmental factors governing monoculture growth can be leveraged to steer microbial communities to desired states. Furthermore, systematically varied initial abundances drive variation in community assembly and enable inference of pairwise inter-species interactions via a dynamic ecological model. These interactions are overall consistent with conditioned media experiments, demonstrating that specific perturbations to a high-richness community can provide rich information for building dynamic ecological models. This model is subsequently used to design low-richness communities that display low or high temporal taxonomic variability over an extended period. A record of this paper's transparent peer review process is included in the supplemental information.
Affiliation(s)
- Bryce M Connors
  - Department of Biochemistry, University of Wisconsin-Madison, Madison, WI 53706, USA; Department of Chemical & Biological Engineering, University of Wisconsin-Madison, Madison, WI 53706, USA
- Jaron Thompson
  - Department of Biochemistry, University of Wisconsin-Madison, Madison, WI 53706, USA; Department of Chemical & Biological Engineering, University of Wisconsin-Madison, Madison, WI 53706, USA
- Sarah Ertmer
  - Department of Biochemistry, University of Wisconsin-Madison, Madison, WI 53706, USA; Department of Chemical & Biological Engineering, University of Wisconsin-Madison, Madison, WI 53706, USA
- Ryan L Clark
  - Department of Biochemistry, University of Wisconsin-Madison, Madison, WI 53706, USA
- Brian F Pfleger
  - Department of Chemical & Biological Engineering, University of Wisconsin-Madison, Madison, WI 53706, USA
- Ophelia S Venturelli
  - Department of Biochemistry, University of Wisconsin-Madison, Madison, WI 53706, USA; Department of Chemical & Biological Engineering, University of Wisconsin-Madison, Madison, WI 53706, USA; Department of Bacteriology, University of Wisconsin-Madison, Madison, WI 53706, USA.
36
Mao Y, Jia L, Dong L, Shu XE, Qian SB. Start codon-associated ribosomal frameshifting mediates nutrient stress adaptation. Nat Struct Mol Biol 2023; 30:1816-1825. [PMID: 37957305 DOI: 10.1038/s41594-023-01119-z] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/06/2022] [Accepted: 09/07/2023] [Indexed: 11/15/2023]
Abstract
A translating ribosome is typically thought to follow the reading frame defined by the selected start codon. Using super-resolution ribosome profiling, here we report pervasive out-of-frame translation immediately from the start codon. Start codon-associated ribosomal frameshifting (SCARF) stems from the slippage of ribosomes during the transition from initiation to elongation. Using a massively paralleled reporter assay, we uncovered sequence elements acting as SCARF enhancers or repressors, implying that start codon recognition is coupled with reading frame fidelity. This finding explains thousands of mass spectrometry spectra that are unannotated in the human proteome. Mechanistically, we find that the eukaryotic initiation factor 5B (eIF5B) maintains the reading frame fidelity by stabilizing initiating ribosomes. Intriguingly, amino acid starvation induces SCARF by proteasomal degradation of eIF5B. The stress-induced SCARF protects cells from starvation by enabling amino acid recycling and selective mRNA translation. Our findings illustrate a beneficial effect of translational 'noise' in nutrient stress adaptation.
Affiliation(s)
- Yuanhui Mao
  - Division of Nutritional Sciences, Cornell University, Ithaca, NY, USA
  - Liangzhu Laboratory, Zhejiang University, Hangzhou, China
- Longfei Jia
  - Division of Nutritional Sciences, Cornell University, Ithaca, NY, USA
- Leiming Dong
  - Division of Nutritional Sciences, Cornell University, Ithaca, NY, USA
- Xin Erica Shu
  - Division of Nutritional Sciences, Cornell University, Ithaca, NY, USA
- Shu-Bing Qian
  - Division of Nutritional Sciences, Cornell University, Ithaca, NY, USA.
37
Korenskaia AY, Matushkin YG, Mustafin ZS, Lashin SA, Klimenko AI. Bioinformatic Analysis Reveals the Role of Translation Elongation Efficiency Optimisation in the Evolution of Ralstonia Genus. Biology (Basel) 2023; 12:1338. [PMID: 37887048 PMCID: PMC10604486 DOI: 10.3390/biology12101338] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/14/2023] [Revised: 10/09/2023] [Accepted: 10/11/2023] [Indexed: 10/28/2023]
Abstract
Translation efficiency modulates gene expression in prokaryotes. The comparative analysis of translation elongation efficiency characteristics of Ralstonia genus bacteria genomes revealed that these characteristics diverge in accordance with the phylogeny of Ralstonia. The first branch of this genus is a group of bacteria commonly found in moist environments such as soil and water that includes the species R. mannitolilytica, R. insidiosa, and R. pickettii, which are also described as nosocomial infection pathogens. In contrast, the second branch is plant pathogenic bacteria consisting of R. solanacearum, R. pseudosolanacearum, and R. syzygii. We found that the soil Ralstonia have a significantly lower number and energy of potential secondary structures in mRNA and an increased role of codon usage bias in the optimization of highly expressed genes' translation elongation efficiency, not only compared to phytopathogenic Ralstonia but also to Cupriavidus necator, which is closely related to the Ralstonia genus. The observed alterations in translation elongation efficiency of orthologous genes are also reflected in the difference of potentially highly expressed gene sets' content among Ralstonia branches with different lifestyles. Analysis of translation elongation efficiency characteristics can be considered a promising approach for studying complex mechanisms that determine the evolution and adaptation of bacteria in various environments.
Affiliation(s)
- Aleksandra Y. Korenskaia
  - Systems Biology Department, Institute of Cytology and Genetics, Siberian Branch of the Russian Academy of Science, Lavrentiev Avenue 10, Novosibirsk 630090, Russia
  - Kurchatov Genomics Center, Institute of Cytology and Genetics, Siberian Branch of the Russian Academy of Science, Lavrentiev Avenue 10, Novosibirsk 630090, Russia
  - Department of Natural Sciences, Novosibirsk National Research State University, Pirogova St. 1, Novosibirsk 630090, Russia
- Yury G. Matushkin
  - Systems Biology Department, Institute of Cytology and Genetics, Siberian Branch of the Russian Academy of Science, Lavrentiev Avenue 10, Novosibirsk 630090, Russia
  - Department of Natural Sciences, Novosibirsk National Research State University, Pirogova St. 1, Novosibirsk 630090, Russia
- Zakhar S. Mustafin
  - Systems Biology Department, Institute of Cytology and Genetics, Siberian Branch of the Russian Academy of Science, Lavrentiev Avenue 10, Novosibirsk 630090, Russia
- Sergey A. Lashin
  - Systems Biology Department, Institute of Cytology and Genetics, Siberian Branch of the Russian Academy of Science, Lavrentiev Avenue 10, Novosibirsk 630090, Russia
  - Kurchatov Genomics Center, Institute of Cytology and Genetics, Siberian Branch of the Russian Academy of Science, Lavrentiev Avenue 10, Novosibirsk 630090, Russia
  - Department of Natural Sciences, Novosibirsk National Research State University, Pirogova St. 1, Novosibirsk 630090, Russia
- Alexandra I. Klimenko
  - Systems Biology Department, Institute of Cytology and Genetics, Siberian Branch of the Russian Academy of Science, Lavrentiev Avenue 10, Novosibirsk 630090, Russia
  - Kurchatov Genomics Center, Institute of Cytology and Genetics, Siberian Branch of the Russian Academy of Science, Lavrentiev Avenue 10, Novosibirsk 630090, Russia
38
Weber M, Sogues A, Yus E, Burgos R, Gallo C, Martínez S, Lluch‐Senar M, Serrano L. Comprehensive quantitative modeling of translation efficiency in a genome-reduced bacterium. Mol Syst Biol 2023; 19:e11301. [PMID: 37642167 PMCID: PMC10568206 DOI: 10.15252/msb.202211301] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Received: 08/16/2022] [Revised: 07/17/2023] [Accepted: 07/24/2023] [Indexed: 08/31/2023] Open
Abstract
Translation efficiency has been mainly studied by ribosome profiling, which only provides an incomplete picture of translation kinetics. Here, we integrated the absolute quantifications of tRNAs, mRNAs, RNA half-lives, proteins, and protein half-lives with ribosome densities and derived the initiation and elongation rates for 475 genes (67% of all genes), 73 with high precision, in the bacterium Mycoplasma pneumoniae (Mpn). We found that, although the initiation rate varied over 160-fold among genes, most of the known factors had little impact on translation efficiency. Local codon elongation rates could not be fully explained by the adaptation to tRNA abundances, which varied over 100-fold among tRNA isoacceptors. We provide a comprehensive quantitative view of translation efficiency, which suggests the existence of unidentified mechanisms of translational regulation in Mpn.
Affiliation(s)
- Marc Weber
  - Centre for Genomic Regulation (CRG), The Barcelona Institute of Science and Technology, Barcelona, Spain
- Adrià Sogues
  - Centre for Genomic Regulation (CRG), The Barcelona Institute of Science and Technology, Barcelona, Spain
- Eva Yus
  - Centre for Genomic Regulation (CRG), The Barcelona Institute of Science and Technology, Barcelona, Spain
- Raul Burgos
  - Centre for Genomic Regulation (CRG), The Barcelona Institute of Science and Technology, Barcelona, Spain
- Carolina Gallo
  - Centre for Genomic Regulation (CRG), The Barcelona Institute of Science and Technology, Barcelona, Spain
- Sira Martínez
  - Centre for Genomic Regulation (CRG), The Barcelona Institute of Science and Technology, Barcelona, Spain
- Maria Lluch‐Senar
  - Centre for Genomic Regulation (CRG), The Barcelona Institute of Science and Technology, Barcelona, Spain
- Luis Serrano
  - Centre for Genomic Regulation (CRG), The Barcelona Institute of Science and Technology, Barcelona, Spain
  - Universitat Pompeu Fabra (UPF), Barcelona, Spain
  - ICREA, Barcelona, Spain
39
Hara K, Iwano N, Fukunaga T, Hamada M. DeepRaccess: high-speed RNA accessibility prediction using deep learning. Frontiers in Bioinformatics 2023; 3:1275787. [PMID: 37881622 PMCID: PMC10597636 DOI: 10.3389/fbinf.2023.1275787] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/10/2023] [Accepted: 09/29/2023] [Indexed: 10/27/2023] Open
Abstract
RNA accessibility is a useful RNA secondary structural feature for predicting RNA-RNA interactions and translation efficiency in prokaryotes. However, conventional accessibility calculation tools, such as Raccess, are computationally expensive and require considerable computational time to perform transcriptome-scale analysis. In this study, we developed DeepRaccess, which predicts RNA accessibility based on deep learning methods. DeepRaccess was trained to take artificial RNA sequences as input and to predict the accessibility of these sequences as calculated by Raccess. Simulation and empirical dataset analyses showed that the accessibility predicted by DeepRaccess was highly correlated with the accessibility calculated by Raccess. In addition, we confirmed that DeepRaccess could predict protein abundance in E. coli with moderate accuracy from the sequences around the start codon. We also demonstrated that DeepRaccess achieved a speed-up of tens to hundreds of times in a GPU environment. The source code and the trained models of DeepRaccess are freely available at https://github.com/hmdlab/DeepRaccess.
Affiliation(s)
- Kaisei Hara
  - Department of Electrical Engineering and Bioscience, Graduate School of Advanced Science and Engineering, Waseda University, Tokyo, Japan
  - Computational Bio Big-Data Open Innovation Laboratory, AIST-Waseda University, Tokyo, Japan
- Natsuki Iwano
  - Department of Electrical Engineering and Bioscience, Graduate School of Advanced Science and Engineering, Waseda University, Tokyo, Japan
- Tsukasa Fukunaga
  - Waseda Institute for Advanced Study, Waseda University, Tokyo, Japan
- Michiaki Hamada
  - Department of Electrical Engineering and Bioscience, Graduate School of Advanced Science and Engineering, Waseda University, Tokyo, Japan
  - Computational Bio Big-Data Open Innovation Laboratory, AIST-Waseda University, Tokyo, Japan
  - Graduate School of Medicine, Nippon Medical School, Tokyo, Japan
|
40
|
Tang M, Pan X, Yang T, You J, Zhu R, Yang T, Zhang X, Xu M, Rao Z. Multidimensional engineering of Escherichia coli for efficient synthesis of L-tryptophan. Bioresource Technology 2023; 386:129475. [PMID: 37451510 DOI: 10.1016/j.biortech.2023.129475] [Citation(s) in RCA: 6] [Impact Index Per Article: 3.0] [Received: 06/11/2023] [Revised: 07/05/2023] [Accepted: 07/07/2023] [Indexed: 07/18/2023]
Abstract
The development of microbial cell factories for L-tryptophan (L-trp) production has received widespread attention but still requires extensive effort because of weak metabolic flux distribution and low yield. Here, a riboswitch-based high-throughput screening (HTS) platform was established to construct a powerful L-trp-producing chassis cell. To facilitate L-trp biosynthesis, gene expression was regulated by engineering promoters and N-terminal coding sequences (NCS). Modules of degradation, transport and by-product synthesis related to L-trp production were also fine-tuned. Next, a novel transcription factor, YihL, was found to negatively regulate L-trp biosynthesis. Self-regulated promoter-mediated dynamic regulation of branch pathways was performed and cofactor supply was improved to further enhance L-trp biosynthesis. Finally, without extra addition, the yield of strain Trp30 reached 42.5 g/L and 0.178 g/g glucose after 48 h of cultivation in a 5-L bioreactor. Overall, the strategies described here establish a promising method that combines HTS and multidimensional regulation for developing cell factories for products of interest.
Affiliation(s)
- Mi Tang
  - Key Laboratory of Industrial Biotechnology of the Ministry of Education, Laboratory of Applied Microorganisms and Metabolic Engineering, School of Biotechnology, Jiangnan University, Wuxi 214122, China; Yixing Institute of Food and Biotechnology Co., Ltd, Yixing 214200, China
- Xuewei Pan
  - Key Laboratory of Industrial Biotechnology of the Ministry of Education, Laboratory of Applied Microorganisms and Metabolic Engineering, School of Biotechnology, Jiangnan University, Wuxi 214122, China; Yixing Institute of Food and Biotechnology Co., Ltd, Yixing 214200, China
- Tianjin Yang
  - Key Laboratory of Industrial Biotechnology of the Ministry of Education, Laboratory of Applied Microorganisms and Metabolic Engineering, School of Biotechnology, Jiangnan University, Wuxi 214122, China; Yixing Institute of Food and Biotechnology Co., Ltd, Yixing 214200, China
- Jiajia You
  - Key Laboratory of Industrial Biotechnology of the Ministry of Education, Laboratory of Applied Microorganisms and Metabolic Engineering, School of Biotechnology, Jiangnan University, Wuxi 214122, China; Yixing Institute of Food and Biotechnology Co., Ltd, Yixing 214200, China
- Rongshuai Zhu
  - Key Laboratory of Industrial Biotechnology of the Ministry of Education, Laboratory of Applied Microorganisms and Metabolic Engineering, School of Biotechnology, Jiangnan University, Wuxi 214122, China; Yixing Institute of Food and Biotechnology Co., Ltd, Yixing 214200, China
- Taowei Yang
  - Key Laboratory of Industrial Biotechnology of the Ministry of Education, Laboratory of Applied Microorganisms and Metabolic Engineering, School of Biotechnology, Jiangnan University, Wuxi 214122, China; Yixing Institute of Food and Biotechnology Co., Ltd, Yixing 214200, China
- Xian Zhang
  - Key Laboratory of Industrial Biotechnology of the Ministry of Education, Laboratory of Applied Microorganisms and Metabolic Engineering, School of Biotechnology, Jiangnan University, Wuxi 214122, China; Yixing Institute of Food and Biotechnology Co., Ltd, Yixing 214200, China
- Meijuan Xu
  - Key Laboratory of Industrial Biotechnology of the Ministry of Education, Laboratory of Applied Microorganisms and Metabolic Engineering, School of Biotechnology, Jiangnan University, Wuxi 214122, China; Yixing Institute of Food and Biotechnology Co., Ltd, Yixing 214200, China
- Zhiming Rao
  - Key Laboratory of Industrial Biotechnology of the Ministry of Education, Laboratory of Applied Microorganisms and Metabolic Engineering, School of Biotechnology, Jiangnan University, Wuxi 214122, China; Yixing Institute of Food and Biotechnology Co., Ltd, Yixing 214200, China
|
41
|
Lewin LE, Daniels KG, Hurst LD. Genes for highly abundant proteins in Escherichia coli avoid 5' codons that promote ribosomal initiation. PLoS Comput Biol 2023; 19:e1011581. [PMID: 37878567 PMCID: PMC10599525 DOI: 10.1371/journal.pcbi.1011581] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/10/2023] [Accepted: 10/09/2023] [Indexed: 10/27/2023] Open
Abstract
In many species highly expressed genes (HEGs) over-employ the synonymous codons that match the more abundant iso-acceptor tRNAs. Bacterial transgene codon randomization experiments report, however, that enrichment with such "translationally optimal" codons has little to no effect on the resultant protein level. By contrast, consistent with the view that ribosomal initiation is rate limiting, synonymous codon usage following the 5' ATG greatly influences protein levels, at least in part by modifying RNA stability. For the design of bacterial transgenes, for simple codon based in silico inference of protein levels and for understanding selection on synonymous mutations, it would be valuable to computationally determine initiation optimality (IO) scores for codons for any given species. One attractive approach is to characterize the 5' codon enrichment of HEGs compared with the most lowly expressed genes, just as translational optimality scores of codons have been similarly defined employing the full gene body. Here we determine the viability of this approach employing a unique opportunity: for Escherichia coli there is both the most extensive protein abundance data for native genes and a unique large-scale transgene codon randomization experiment enabling objective definition of the 5' codons that cause, rather than just correlate with, high protein abundance (that we equate with initiation optimality, broadly defined). Surprisingly, the 5' ends of native genes that specify highly abundant proteins avoid such initiation optimal codons. We find that this is probably owing to conflicting selection pressures particular to native HEGs, including selection favouring low initiation rates, this potentially enabling high efficiency of ribosomal usage and low noise. 
While the classical HEG enrichment approach does not work, rendering simple prediction of native protein abundance from 5' codon content futile, we report evidence that initiation optimality scores derived from the transgene experiment may hold relevance for in silico transgene design for a broad spectrum of bacteria.
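The "classical HEG enrichment approach" mentioned in this abstract can be illustrated with a minimal sketch. The abstract does not give the statistic used, so the log odds ratio, the 10-codon window, the pseudocount, and the function names below are all assumptions chosen for illustration, not the authors' method. Note that the abstract itself reports that scores of this kind fail to recover initiation-optimal codons in native E. coli genes.

```python
import math
from collections import Counter


def five_prime_codons(cds, n_codons=10):
    """Return up to n_codons codons that follow the start codon of a CDS.

    The window length is an illustrative assumption.
    """
    codons = [cds[i:i + 3] for i in range(3, 3 + 3 * n_codons, 3)]
    return [c for c in codons if len(c) == 3]  # drop a trailing partial codon


def codon_log_odds(high_genes, low_genes, n_codons=10, pseudocount=0.5):
    """Log odds ratio of each 5' codon in highly vs lowly expressed genes.

    Positive values mean the codon is enriched at the 5' end of the
    highly expressed set. The pseudocount (an assumption) avoids
    division by zero for codons absent from one set.
    """
    high = Counter(c for g in high_genes for c in five_prime_codons(g, n_codons))
    low = Counter(c for g in low_genes for c in five_prime_codons(g, n_codons))
    n_high, n_low = sum(high.values()), sum(low.values())
    scores = {}
    for codon in set(high) | set(low):
        a = high[codon] + pseudocount            # codon in high set
        b = n_high - high[codon] + pseudocount   # other codons in high set
        c = low[codon] + pseudocount             # codon in low set
        d = n_low - low[codon] + pseudocount     # other codons in low set
        scores[codon] = math.log((a / b) / (c / d))
    return scores


# Toy sequences, not real genes.
toy_scores = codon_log_odds(["ATGAAAAAAGCG"], ["ATGGCGGCGAAA"], n_codons=3)
```

In the toy data, AAA is more frequent after the start codon in the "high" set and therefore receives a positive score, while GCG receives a negative one.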
Affiliation(s)
- Loveday E. Lewin
  - The Milner Centre for Evolution, Department of Life Sciences, University of Bath, Bath, United Kingdom
- Kate G. Daniels
  - The Milner Centre for Evolution, Department of Life Sciences, University of Bath, Bath, United Kingdom
- Laurence D. Hurst
  - The Milner Centre for Evolution, Department of Life Sciences, University of Bath, Bath, United Kingdom
|
42
|
Han Y, Li W, Filko A, Li J, Zhang F. Genome-wide promoter responses to CRISPR perturbations of regulators reveal regulatory networks in Escherichia coli. Nat Commun 2023; 14:5757. [PMID: 37717013 PMCID: PMC10505187 DOI: 10.1038/s41467-023-41572-4] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Received: 12/07/2022] [Accepted: 09/08/2023] [Indexed: 09/18/2023] Open
Abstract
Elucidating genome-scale regulatory networks requires a comprehensive collection of gene expression profiles, yet measuring gene expression responses for every transcription factor (TF)-gene pair in living prokaryotic cells remains challenging. Here, we develop pooled promoter responses to TF perturbation sequencing (PPTP-seq) via CRISPR interference to address this challenge. Using PPTP-seq, we systematically measure the activity of 1372 Escherichia coli promoters under single knockdown of 183 TF genes, illustrating more than 200,000 possible TF-gene responses in one experiment. We perform PPTP-seq for E. coli growing in three different media. The PPTP-seq data reveal robust steady-state promoter activities under most single TF knockdown conditions. PPTP-seq also enables identifications of, to the best of our knowledge, previously unknown TF autoregulatory responses and complex transcriptional control on one-carbon metabolism. We further find context-dependent promoter regulation by multiple TFs whose relative binding strengths determined promoter activities. Additionally, PPTP-seq reveals different promoter responses in different growth media, suggesting condition-specific gene regulation. Overall, PPTP-seq provides a powerful method to examine genome-wide transcriptional regulatory networks and can be potentially expanded to reveal gene expression responses to other genetic elements.
Affiliation(s)
- Yichao Han
  - Department of Energy, Environmental and Chemical Engineering, Washington University in St. Louis, Saint Louis, Missouri, USA
- Wanji Li
  - Department of Energy, Environmental and Chemical Engineering, Washington University in St. Louis, Saint Louis, Missouri, USA
- Alden Filko
  - Department of Energy, Environmental and Chemical Engineering, Washington University in St. Louis, Saint Louis, Missouri, USA
- Jingyao Li
  - Department of Energy, Environmental and Chemical Engineering, Washington University in St. Louis, Saint Louis, Missouri, USA
- Fuzhong Zhang
  - Department of Energy, Environmental and Chemical Engineering, Washington University in St. Louis, Saint Louis, Missouri, USA
  - Division of Biological and Biomedical Sciences, Washington University in St. Louis, Saint Louis, Missouri, USA
  - Institute of Materials Science and Engineering, Washington University in St. Louis, Saint Louis, Missouri, USA
|
43
|
Umemoto S, Kondo T, Fujino T, Hayashi G, Murakami H. Large-scale analysis of mRNA sequences localized near the start and amber codons and their impact on the diversity of mRNA display libraries. Nucleic Acids Res 2023; 51:7465-7479. [PMID: 37395404 PMCID: PMC10415131 DOI: 10.1093/nar/gkad555] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 07/04/2022] [Revised: 06/14/2023] [Accepted: 06/18/2023] [Indexed: 07/04/2023] Open
Abstract
Extremely diverse libraries are essential for effectively selecting functional peptides or proteins, and mRNA display technology is a powerful tool for generating such libraries with a diversity of over 10^12-10^13. In particular, the protein-puromycin linker (PuL)/mRNA complex formation yield is a determining factor in preparing the libraries. However, how mRNA sequences affect the complex formation yield remains unclear. To study the effects of N-terminal and C-terminal coding sequences on the complex formation yield, puromycin-attached mRNAs containing three random codons after the start codon (32768 sequences) or seven random bases next to the amber codon (6480 sequences) were translated. Enrichment scores were calculated by dividing the appearance rate of every sequence in protein-PuL/mRNA complexes by that in total mRNAs. The wide range of enrichment scores (0.09-2.10 for N-terminal and 0.30-4.23 for C-terminal coding sequences) indicated that the N-terminal and C-terminal coding sequences strongly affected the complex formation yield. Using C-terminal GGC-CGA-UAG-U sequences, which resulted in the highest enrichment scores, we constructed highly diverse libraries of monobodies and macrocyclic peptides. The present study provides insights into how mRNA sequences affect the protein/mRNA complex formation yield and will accelerate the identification of functional peptides and proteins involved in various biological processes and having therapeutic applications.
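The enrichment score defined in this abstract (the appearance rate of a sequence among protein-PuL/mRNA complexes divided by its appearance rate among total mRNAs) can be sketched as follows. This is a minimal illustration under stated assumptions: the function name, the input format (one read per list element), and the choice to score only sequences observed in the total mRNA pool are hypothetical, not taken from the paper.

```python
from collections import Counter


def enrichment_scores(complex_reads, total_reads):
    """Enrichment = (rate in complexes) / (rate in total mRNA), per sequence.

    Only sequences seen in the total mRNA pool are scored, so the
    denominator is never zero (an illustrative design choice).
    """
    if not complex_reads or not total_reads:
        raise ValueError("both read lists must be non-empty")
    complex_counts = Counter(complex_reads)
    total_counts = Counter(total_reads)
    n_complex = len(complex_reads)
    n_total = len(total_reads)
    scores = {}
    for seq, count in total_counts.items():
        rate_total = count / n_total
        rate_complex = complex_counts.get(seq, 0) / n_complex
        scores[seq] = rate_complex / rate_total
    return scores


# Toy reads, not real data: GGC is over-represented in complexes.
toy_total = ["GGC", "GGC", "AAA", "AAA"]
toy_complexes = ["GGC", "GGC", "GGC", "AAA"]
toy_enrichment = enrichment_scores(toy_complexes, toy_total)
```

A score above 1 means the sequence is over-represented among complexes relative to the input pool, and a score below 1 means it is depleted. In the toy data, GGC scores 1.5 and AAA scores 0.5.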
Affiliation(s)
- Shun Umemoto
  - Department of Biomolecular Engineering, Graduate School of Engineering, Nagoya University, Furo-cho, Chikusa-ku, Nagoya 464-8603, Japan
- Taishi Kondo
  - Department of Biomolecular Engineering, Graduate School of Engineering, Nagoya University, Furo-cho, Chikusa-ku, Nagoya 464-8603, Japan
- Tomoshige Fujino
  - Department of Biomolecular Engineering, Graduate School of Engineering, Nagoya University, Furo-cho, Chikusa-ku, Nagoya 464-8603, Japan
- Gosuke Hayashi
  - Department of Biomolecular Engineering, Graduate School of Engineering, Nagoya University, Furo-cho, Chikusa-ku, Nagoya 464-8603, Japan
  - Japan Science and Technology Agency (JST), PRESTO, 4-1-8 Honcho, Kawaguchi, Saitama 332-0012, Japan
- Hiroshi Murakami
  - Department of Biomolecular Engineering, Graduate School of Engineering, Nagoya University, Furo-cho, Chikusa-ku, Nagoya 464-8603, Japan
  - Institute of Nano-Life-Systems, Institutes of Innovation for Future Society, Nagoya University, Furo-cho, Chikusa-ku, Nagoya 464-8603, Japan
|
44
|
Nikolados EM, Oyarzún DA. Deep learning for optimization of protein expression. Curr Opin Biotechnol 2023; 81:102941. [PMID: 37087839 DOI: 10.1016/j.copbio.2023.102941] [Citation(s) in RCA: 6] [Impact Index Per Article: 3.0] [Received: 11/22/2022] [Revised: 02/02/2023] [Accepted: 03/17/2023] [Indexed: 04/25/2023]
Abstract
Recent progress in high-throughput DNA synthesis and sequencing has enabled the development of massively parallel reporter assays for strain characterization. These datasets map a large number of DNA sequences to protein expression levels, sparking increased interest in data-driven methods for sequence-to-expression modeling. Here, we highlight advances in deep learning models of protein expression and their potential for optimizing strains engineered to produce recombinant proteins. We review recent works that built highly accurate models and discuss challenges that hinder adoption by end users. There is a need to better align this technology with the constraints encountered in strain engineering, particularly the cost of acquiring large amounts of data and the requirement for interpretable models that generalize beyond the training data. Overcoming these barriers will help to incentivize academic and industrial laboratories to tap into a new era of data-centric strain engineering.
Affiliation(s)
- Diego A Oyarzún
  - School of Biological Sciences, University of Edinburgh, Edinburgh EH9 3JH, UK; School of Informatics, University of Edinburgh, Edinburgh EH8 9AB, UK; The Alan Turing Institute, London NW1 2DB, UK
|
45
|
Höllerer S, Jeschek M. Ultradeep characterisation of translational sequence determinants refutes rare-codon hypothesis and unveils quadruplet base pairing of initiator tRNA and transcript. Nucleic Acids Res 2023; 51:2377-2396. [PMID: 36727459 PMCID: PMC10018350 DOI: 10.1093/nar/gkad040] [Citation(s) in RCA: 7] [Impact Index Per Article: 3.5] [Received: 06/17/2022] [Revised: 12/05/2022] [Accepted: 01/13/2023] [Indexed: 02/03/2023] Open
Abstract
Translation is a key determinant of gene expression and an important biotechnological engineering target. In bacteria, the 5'-untranslated region (5'-UTR) and coding sequence (CDS) are well-known mRNA parts controlling translation and thus cellular protein levels. However, the complex interaction of 5'-UTR and CDS has so far been studied for only a few sequences, leading to non-generalisable and partly contradictory conclusions. Herein, we systematically assess the dynamic translation from over 1.2 million 5'-UTR-CDS pairs in Escherichia coli to investigate their collective effect using a new method for ultradeep sequence-function mapping. This allows us to disentangle and precisely quantify effects of various sequence determinants of translation. We find that 5'-UTR and CDS individually account for 53% and 20% of variance in translation, respectively, and show conclusively that, contrary to a common hypothesis, tRNA abundance does not explain expression changes between CDSs with different synonymous codons. Moreover, the obtained large-scale data provide clear experimental evidence for a base-pairing interaction between initiator tRNA and mRNA beyond the anticodon-codon interaction, an effect that is often masked for individual sequences and therefore inaccessible to low-throughput approaches. Our study highlights the indispensability of ultradeep sequence-function mapping to accurately determine the contribution of parts and phenomena involved in gene regulation.
Affiliation(s)
- Simon Höllerer
  - Department of Biosystems Science and Engineering, Swiss Federal Institute of Technology – ETH Zurich, Basel CH-4058, Switzerland
- Markus Jeschek
  - Department of Biosystems Science and Engineering, Swiss Federal Institute of Technology – ETH Zurich, Basel CH-4058, Switzerland
  - Institute of Microbiology, Synthetic Microbiology Group, University of Regensburg, Regensburg D-93053, Germany
|
46
|
Picard MAL, Leblay F, Cassan C, Willemsen A, Daron J, Bauffe F, Decourcelle M, Demange A, Bravo IG. Transcriptomic, proteomic, and functional consequences of codon usage bias in human cells during heterologous gene expression. Protein Sci 2023; 32:e4576. [PMID: 36692287 PMCID: PMC9926478 DOI: 10.1002/pro.4576] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 11/17/2022] [Revised: 01/12/2023] [Accepted: 01/14/2023] [Indexed: 01/25/2023]
Abstract
Differences in codon frequency between genomes, genes, or positions along a gene, modulate transcription and translation efficiency, leading to phenotypic and functional differences. Here, we present a multiscale analysis of the effects of synonymous codon recoding during heterologous gene expression in human cells, quantifying the phenotypic consequences of codon usage bias at different molecular and cellular levels, with an emphasis on translation elongation. Six synonymous versions of an antibiotic resistance gene were generated, fused to a fluorescent reporter, and independently expressed in HEK293 cells. Multiscale phenotype was analyzed by means of quantitative transcriptome and proteome assessment, as proxies for gene expression; cellular fluorescence, as a proxy for single-cell level expression; and real-time cell proliferation in absence or presence of antibiotic, as a proxy for the cell fitness. We show that differences in codon usage bias strongly impact the molecular and cellular phenotype: (i) they result in large differences in mRNA levels and protein levels, leading to differences of over 15 times in translation efficiency; (ii) they introduce unpredicted splicing events; (iii) they lead to reproducible phenotypic heterogeneity; and (iv) they lead to a trade-off between the benefit of antibiotic resistance and the burden of heterologous expression. In human cells in culture, codon usage bias modulates gene expression by modifying mRNA availability and suitability for translation, leading to differences in protein levels and eventually eliciting functional phenotypic changes.
Affiliation(s)
- Marion A. L. Picard
  - French National Center for Scientific Research, Laboratory MIVEGEC (CNRS, IRD, University of Montpellier), Montpellier, France
- Fiona Leblay
  - French National Center for Scientific Research, Laboratory MIVEGEC (CNRS, IRD, University of Montpellier), Montpellier, France
- Cécile Cassan
  - French National Center for Scientific Research, Laboratory MIVEGEC (CNRS, IRD, University of Montpellier), Montpellier, France
- Anouk Willemsen
  - French National Center for Scientific Research, Laboratory MIVEGEC (CNRS, IRD, University of Montpellier), Montpellier, France
- Josquin Daron
  - French National Center for Scientific Research, Laboratory MIVEGEC (CNRS, IRD, University of Montpellier), Montpellier, France
- Frédérique Bauffe
  - French National Center for Scientific Research, Laboratory MIVEGEC (CNRS, IRD, University of Montpellier), Montpellier, France
- Mathilde Decourcelle
  - BioCampus Montpellier (University of Montpellier, CNRS, INSERM), Montpellier, France
- Antonin Demange
  - French National Center for Scientific Research, Laboratory MIVEGEC (CNRS, IRD, University of Montpellier), Montpellier, France
- Ignacio G. Bravo
  - French National Center for Scientific Research, Laboratory MIVEGEC (CNRS, IRD, University of Montpellier), Montpellier, France
|
47
|
Mao Y, Jia L, Dong L, Shu XE, Qian SB. Start codon-associated ribosomal frameshifting mediates nutrient stress adaptation. bioRxiv: the preprint server for biology 2023:2023.02.15.528768. [PMID: 36824937 PMCID: PMC9949036 DOI: 10.1101/2023.02.15.528768] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 02/18/2023]
Abstract
A translating ribosome is typically thought to follow the reading frame defined by the selected start codon. Using super-resolution ribosome profiling, here we report pervasive out-of-frame translation immediately from the start codon. The start codon-associated ribosome frameshifting (SCARF) stems from the slippage of ribosomes during the transition from initiation to elongation. Using a massively parallel reporter assay, we uncovered sequence elements acting as SCARF enhancers or repressors, implying that start codon recognition is coupled with reading frame fidelity. This finding explains thousands of unannotated mass spectrometry spectra from the human proteome. Mechanistically, we find that the eukaryotic initiation factor 5B (eIF5B) maintains the reading frame fidelity by stabilizing initiating ribosomes. Intriguingly, amino acid starvation induces SCARF by proteasomal degradation of eIF5B. The stress-induced SCARF protects cells from starvation by enabling amino acid recycling and selective mRNA translation. Our findings illustrate a beneficial effect of translational "noise" in nutrient stress adaptation.
|
48
|
Zabolotskii AI, Kozlovskiy SV, Katrukha AG. The Influence of the Nucleotide Composition of Genes and Gene Regulatory Elements on the Efficiency of Protein Expression in Escherichia coli. Biochemistry (Moscow) 2023; 88:S176-S191. [PMID: 37069120 DOI: 10.1134/s0006297923140109] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 02/22/2023]
Abstract
Recombinant proteins expressed in Escherichia coli are widely used in biochemical research and industrial processes. At the same time, achieving higher protein expression levels and correct protein folding remains the key problem, since optimization of nutrient media, growth conditions, and methods for induction of protein synthesis does not always lead to the desired result. Often, low protein expression is determined by the sequences of the expressed genes and their regulatory regions. The genetic code is degenerate; 18 out of 20 amino acids are encoded by more than one codon. Choosing between synonymous codons in the coding sequence can significantly affect the level of protein expression and protein folding. This is because the nucleotide composition of the gene influences the probability that secondary mRNA structures form. Such structures affect ribosome binding at the translation initiation phase as well as ribosome movement along the mRNA during elongation, which, in turn, influences mRNA degradation and the folding of the nascent protein. The nucleotide composition of the mRNA untranslated regions, in particular the promoter and Shine-Dalgarno sequences, also affects the efficiency of mRNA transcription, translation, and degradation. In this review, we describe the genetic principles that determine the efficiency of protein production in Escherichia coli.
Affiliation(s)
- Artur I Zabolotskii
  - Faculty of Biology, Lomonosov Moscow State University, Moscow, 119991, Russia
- Alexey G Katrukha
  - Faculty of Biology, Lomonosov Moscow State University, Moscow, 119991, Russia
|
49
|
Gilliot PA, Gorochowski TE. Design and Analysis of Massively Parallel Reporter Assays Using FORECAST. Methods Mol Biol 2023; 2553:41-56. [PMID: 36227538 DOI: 10.1007/978-1-0716-2617-7_3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/16/2023]
Abstract
Machine learning is revolutionizing molecular biology and bioengineering by providing powerful insights and predictions. Massively parallel reporter assays (MPRAs) have emerged as a particularly valuable class of high-throughput technique to support such algorithms. MPRAs enable the simultaneous characterization of thousands or even millions of genetic constructs and provide the large amounts of data needed to train models. However, while the scale of this approach is impressive, the design of effective MPRA experiments is challenging due to the many factors that can be varied and the difficulty in predicting how these will impact the quality and quantity of data obtained. Here, we present a computational tool called FORECAST, which can simulate MPRA experiments based on fluorescence-activated cell sorting and subsequent sequencing (commonly referred to as Flow-seq or Sort-seq experiments), as well as carry out rigorous statistical estimation of construct performance from this type of experimental data. FORECAST can be used to develop workflows to aid the design of MPRA experiments and reanalyze existing MPRA data sets.
|
50
|
Accuracy and data efficiency in deep learning models of protein expression. Nat Commun 2022; 13:7755. [PMID: 36517468 PMCID: PMC9751117 DOI: 10.1038/s41467-022-34902-5] [Citation(s) in RCA: 16] [Impact Index Per Article: 5.3] [Received: 03/21/2022] [Accepted: 11/10/2022] [Indexed: 12/23/2022] Open
Abstract
Synthetic biology often involves engineering microbial strains to express high-value proteins. Thanks to progress in rapid DNA synthesis and sequencing, deep learning has emerged as a promising approach to build sequence-to-expression models for strain optimization. But such models need large and costly training data that create steep entry barriers for many laboratories. Here we study the relation between accuracy and data efficiency in an atlas of machine learning models trained on datasets of varied size and sequence diversity. We show that deep learning can achieve good prediction accuracy with much smaller datasets than previously thought. We demonstrate that controlled sequence diversity leads to substantial gains in data efficiency and employ Explainable AI to show that convolutional neural networks can finely discriminate between input DNA sequences. Our results provide guidelines for designing genotype-phenotype screens that balance cost and quality of training data, thus helping promote the wider adoption of deep learning in the biotechnology sector.
|