1
Shen Y, Kudla G, Oyarzún DA. Improving the generalization of protein expression models with mechanistic sequence information. Nucleic Acids Res 2025; 53:gkaf020. [PMID: 39873269 PMCID: PMC11773361 DOI: 10.1093/nar/gkaf020] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/05/2024] [Revised: 12/12/2024] [Accepted: 01/08/2025] [Indexed: 01/30/2025] Open
Abstract
The growing demand for biological products drives many efforts to maximize expression of heterologous proteins. Advances in high-throughput sequencing can produce data suitable for building sequence-to-expression models with machine learning. The most accurate models have been trained on one-hot encodings, a mechanism-agnostic representation of nucleotide sequences. Moreover, studies have consistently shown that training on mechanistic sequence features leads to much poorer predictions, even with features that are known to correlate with expression, such as DNA sequence motifs, codon usage, or properties of mRNA secondary structures. However, despite their excellent local accuracy, current sequence-to-expression models can fail to generalize predictions far away from the training data. Through a comparative study across datasets in Escherichia coli and Saccharomyces cerevisiae, here we show that mechanistic sequence features can provide gains on model generalization, and thus improve their utility for predictive sequence design. We explore several strategies to integrate one-hot encodings and mechanistic features into a single predictive model, including feature stacking, ensemble model stacking, and geometric stacking, a novel architecture based on graph convolutional neural networks. Our work casts new light on mechanistic sequence features, underscoring the importance of domain-knowledge and feature engineering for accurate prediction of protein expression levels.
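As a rough illustration of the feature stacking idea mentioned in this abstract, the sketch below concatenates a flattened one-hot encoding with a single sequence-derived feature. It is not the authors' implementation: the function names are hypothetical, and GC content is only an assumed stand-in for the mechanistic features the abstract names (sequence motifs, codon usage, mRNA secondary structure properties).

```python
# Illustrative sketch only; not the authors' pipeline.
NUCLEOTIDES = "ACGT"


def one_hot(seq):
    """Flattened one-hot encoding: four binary values per nucleotide."""
    encoding = []
    for base in seq.upper():
        if base not in NUCLEOTIDES:
            raise ValueError(f"unexpected nucleotide: {base!r}")
        encoding.extend(1.0 if base == n else 0.0 for n in NUCLEOTIDES)
    return encoding


def gc_content(seq):
    """Assumed stand-in for a mechanistic feature: fraction of G or C bases."""
    if not seq:
        raise ValueError("sequence must not be empty")
    seq = seq.upper()
    return sum(base in "GC" for base in seq) / len(seq)


def stack_features(seq):
    """Feature stacking: append mechanistic features to the one-hot vector."""
    return one_hot(seq) + [gc_content(seq)]


features = stack_features("ATGC")
print(len(features))  # 4 bases * 4 channels + 1 extra feature
```

A single regression model would then be trained on the stacked vector. The ensemble and graph-based variants named in the abstract are not sketched here.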
Affiliation(s)
- Yuxin Shen
  - School of Biological Sciences, University of Edinburgh, Edinburgh, EH9 3JH, United Kingdom
- Grzegorz Kudla
  - Institute for Genetics and Cancer, University of Edinburgh, Edinburgh, EH4 2XU, United Kingdom
- Diego A Oyarzún
  - School of Biological Sciences, University of Edinburgh, Edinburgh, EH9 3JH, United Kingdom
  - School of Informatics, University of Edinburgh, Edinburgh, EH8 9AB, United Kingdom
2
Jiang K, Yan Z, Di Bernardo M, Sgrizzi SR, Villiger L, Kayabolen A, Kim BJ, Carscadden JK, Hiraizumi M, Nishimasu H, Gootenberg JS, Abudayyeh OO. Rapid in silico directed evolution by a protein language model with EVOLVEpro. Science 2025; 387:eadr6006. [PMID: 39571002 DOI: 10.1126/science.adr6006] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Received: 07/09/2024] [Accepted: 11/12/2024] [Indexed: 01/25/2025]
Abstract
Directed protein evolution is central to biomedical applications but faces challenges such as experimental complexity, inefficient multiproperty optimization, and local maxima traps. Although in silico methods that use protein language models (PLMs) can provide modeled fitness landscape guidance, they struggle to generalize across diverse protein families and map to protein activity. We present EVOLVEpro, a few-shot active learning framework that combines PLMs and regression models to rapidly improve protein activity. EVOLVEpro surpasses current methods, yielding up to 100-fold improvements in desired properties. We demonstrate its effectiveness across six proteins in RNA production, genome editing, and antibody binding applications. These results highlight the advantages of few-shot active learning with minimal experimental data over zero-shot predictions. EVOLVEpro opens new possibilities for artificial intelligence-guided protein engineering in biology and medicine.
Affiliation(s)
- Kaiyi Jiang
  - Department of Medicine, Division of Engineering in Medicine, Brigham and Women's Hospital, Harvard Medical School, Boston, MA, USA
  - Gene and Cell Therapy Institute, Mass General Brigham, Cambridge, MA, USA
  - Center for Virology and Vaccine Research, Beth Israel Deaconess Medical Center, Harvard Medical School, Boston, MA, USA
  - Department of Bioengineering, Massachusetts Institute of Technology, Cambridge, MA, USA
- Zhaoqing Yan
  - Department of Medicine, Division of Engineering in Medicine, Brigham and Women's Hospital, Harvard Medical School, Boston, MA, USA
  - Gene and Cell Therapy Institute, Mass General Brigham, Cambridge, MA, USA
  - Center for Virology and Vaccine Research, Beth Israel Deaconess Medical Center, Harvard Medical School, Boston, MA, USA
- Matteo Di Bernardo
  - Whitehead Institute, Massachusetts Institute of Technology, Cambridge, MA, USA
- Samantha R Sgrizzi
  - Department of Medicine, Division of Engineering in Medicine, Brigham and Women's Hospital, Harvard Medical School, Boston, MA, USA
  - Gene and Cell Therapy Institute, Mass General Brigham, Cambridge, MA, USA
  - Center for Virology and Vaccine Research, Beth Israel Deaconess Medical Center, Harvard Medical School, Boston, MA, USA
- Lukas Villiger
  - Department of Dermatology and Allergology, Kantonspital St. Gallen, St. Gallen, Switzerland
- Alisan Kayabolen
  - Department of Medicine, Division of Engineering in Medicine, Brigham and Women's Hospital, Harvard Medical School, Boston, MA, USA
  - Gene and Cell Therapy Institute, Mass General Brigham, Cambridge, MA, USA
  - Center for Virology and Vaccine Research, Beth Israel Deaconess Medical Center, Harvard Medical School, Boston, MA, USA
- B J Kim
  - Koch Institute for Integrative Cancer Research at MIT, Massachusetts Institute of Technology, Cambridge, MA, USA
- Josephine K Carscadden
  - Department of Medicine, Division of Engineering in Medicine, Brigham and Women's Hospital, Harvard Medical School, Boston, MA, USA
  - Gene and Cell Therapy Institute, Mass General Brigham, Cambridge, MA, USA
  - Center for Virology and Vaccine Research, Beth Israel Deaconess Medical Center, Harvard Medical School, Boston, MA, USA
- Masahiro Hiraizumi
  - Department of Chemistry and Biotechnology, Graduate School of Engineering, The University of Tokyo, 7-3-1 Hongo, Bunkyo-ku, Tokyo, Japan
- Hiroshi Nishimasu
  - Department of Chemistry and Biotechnology, Graduate School of Engineering, The University of Tokyo, 7-3-1 Hongo, Bunkyo-ku, Tokyo, Japan
  - Structural Biology Division, Research Center for Advanced Science and Technology, The University of Tokyo, 4-6-1 Komaba, Meguro-ku, Tokyo, Japan
  - Inamori Research Institute for Science, 620 Suiginya-cho, Shimogyo-ku, Kyoto, Japan
- Jonathan S Gootenberg
  - Department of Medicine, Division of Engineering in Medicine, Brigham and Women's Hospital, Harvard Medical School, Boston, MA, USA
  - Gene and Cell Therapy Institute, Mass General Brigham, Cambridge, MA, USA
  - Center for Virology and Vaccine Research, Beth Israel Deaconess Medical Center, Harvard Medical School, Boston, MA, USA
- Omar O Abudayyeh
  - Department of Medicine, Division of Engineering in Medicine, Brigham and Women's Hospital, Harvard Medical School, Boston, MA, USA
  - Gene and Cell Therapy Institute, Mass General Brigham, Cambridge, MA, USA
  - Center for Virology and Vaccine Research, Beth Israel Deaconess Medical Center, Harvard Medical School, Boston, MA, USA
3
Nguyen E, Poli M, Durrant MG, Kang B, Katrekar D, Li DB, Bartie LJ, Thomas AW, King SH, Brixi G, Sullivan J, Ng MY, Lewis A, Lou A, Ermon S, Baccus SA, Hernandez-Boussard T, Ré C, Hsu PD, Hie BL. Sequence modeling and design from molecular to genome scale with Evo. Science 2024; 386:eado9336. [PMID: 39541441 PMCID: PMC12057570 DOI: 10.1126/science.ado9336] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/27/2024] [Accepted: 09/09/2024] [Indexed: 11/16/2024]
Abstract
The genome is a sequence that encodes the DNA, RNA, and proteins that orchestrate an organism's function. We present Evo, a long-context genomic foundation model with a frontier architecture trained on millions of prokaryotic and phage genomes, and report scaling laws on DNA to complement observations in language and vision. Evo generalizes across DNA, RNA, and proteins, enabling zero-shot function prediction competitive with domain-specific language models and the generation of functional CRISPR-Cas and transposon systems, representing the first examples of protein-RNA and protein-DNA codesign with a language model. Evo also learns how small mutations affect whole-organism fitness and generates megabase-scale sequences with plausible genomic architecture. These prediction and generation capabilities span molecular to genomic scales of complexity, advancing our understanding and control of biology.
Affiliation(s)
- Eric Nguyen
  - Arc Institute, Palo Alto, CA, USA
  - Department of Bioengineering, Stanford University, Stanford, CA, USA
- Michael Poli
  - Department of Computer Science, Stanford University, Stanford, CA, USA
  - TogetherAI, San Francisco, CA, USA
- Brian Kang
  - Arc Institute, Palo Alto, CA, USA
  - Department of Bioengineering, Stanford University, Stanford, CA, USA
- David B. Li
  - Arc Institute, Palo Alto, CA, USA
  - Department of Bioengineering, Stanford University, Stanford, CA, USA
- Armin W. Thomas
  - Stanford Data Science, Stanford University, Stanford, CA, USA
- Samuel H. King
  - Arc Institute, Palo Alto, CA, USA
  - Department of Bioengineering, Stanford University, Stanford, CA, USA
- Garyk Brixi
  - Arc Institute, Palo Alto, CA, USA
  - Department of Genetics, Stanford University, Stanford, CA, USA
- Madelena Y. Ng
  - Stanford Center for Biomedical Informatics Research, Stanford, CA, USA
- Ashley Lewis
  - Department of Biomedical Data Science, Stanford University, Stanford, CA, USA
- Aaron Lou
  - Department of Computer Science, Stanford University, Stanford, CA, USA
- Stefano Ermon
  - Department of Computer Science, Stanford University, Stanford, CA, USA
  - CZ Biohub, San Francisco, CA, USA
- Christopher Ré
  - Department of Computer Science, Stanford University, Stanford, CA, USA
- Patrick D. Hsu
  - Arc Institute, Palo Alto, CA, USA
  - Department of Bioengineering and Center for Computational Biology, University of California, Berkeley, Berkeley, CA, USA
- Brian L. Hie
  - Arc Institute, Palo Alto, CA, USA
  - Stanford Data Science, Stanford University, Stanford, CA, USA
  - Department of Chemical Engineering, Stanford University, Stanford, CA, USA
4
Shao B, Yan J. A long-context language model for deciphering and generating bacteriophage genomes. Nat Commun 2024; 15:9392. [PMID: 39477977 PMCID: PMC11525655 DOI: 10.1038/s41467-024-53759-4] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/13/2024] [Accepted: 10/22/2024] [Indexed: 11/02/2024] Open
Abstract
Inspired by the success of large language models (LLMs), we develop a long-context generative model for genomes. Our multiscale transformer model, megaDNA, is pre-trained on unannotated bacteriophage genomes with nucleotide-level tokenization. We demonstrate the foundational capabilities of our model including the prediction of essential genes, genetic variant effects, regulatory element activity and taxonomy of unannotated sequences. Furthermore, it generates de novo sequences up to 96 K base pairs, which contain potential regulatory elements and annotated proteins with phage-related functions.
Affiliation(s)
- Bin Shao
  - Advanced Research Institute of Multidisciplinary Science, Beijing Institute of Technology, Beijing, 100081, China
  - Department of Molecular and Cellular Biology, Harvard University, Cambridge, MA, 02138, USA
- Jiawei Yan
  - Independent researcher, 100 N Gushan Rd, Shanghai, 200135, China
5
Torrillo PA, Lieberman TD. Reversions mask the contribution of adaptive evolution in microbiomes. eLife 2024; 13:e93146. [PMID: 39240756 PMCID: PMC11379459 DOI: 10.7554/elife.93146] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/29/2023] [Accepted: 07/30/2024] [Indexed: 09/08/2024] Open
Abstract
When examining bacterial genomes for evidence of past selection, the results depend heavily on the mutational distance between chosen genomes. Even within a bacterial species, genomes separated by larger mutational distances exhibit stronger evidence of purifying selection as assessed by dN/dS, the normalized ratio of nonsynonymous to synonymous mutations. Here, we show that the classical interpretation of this scale dependence, weak purifying selection, leads to problematic mutation accumulation when applied to available gut microbiome data. We propose an alternative, adaptive reversion model with opposite implications for dynamical intuition and applications of dN/dS. Reversions that occur and sweep within-host populations are nearly guaranteed in microbiomes due to large population sizes, short generation times, and variable environments. Using analytical and simulation approaches, we show that adaptive reversion can explain the dN/dS decay given only dozens of locally fluctuating selective pressures, which is realistic in the context of Bacteroides genomes. The success of the adaptive reversion model argues for interpreting low values of dN/dS obtained from long timescales with caution as they may emerge even when adaptive sweeps are frequent. Our work thus inverts the interpretation of an old observation in bacterial evolution, illustrates the potential of mutational reversions to shape genomic landscapes over time, and highlights the importance of studying bacterial genomic evolution on short timescales.
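For orientation, the ratio referred to in this abstract can be written in its simplest, uncorrected form as below. The symbols are notation assumed here rather than taken from the paper, and practical estimators usually also correct for multiple substitutions at the same site.

```latex
% N_d, S_d : observed nonsynonymous and synonymous differences between two genomes
% N, S     : numbers of nonsynonymous and synonymous sites available for mutation
\[
  \frac{dN}{dS} \;=\; \frac{N_d / N}{S_d / S}
\]
```

Under the classical reading, values below 1 are taken as evidence of purifying selection, which is the interpretation the abstract argues should be applied with caution over long timescales.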
Affiliation(s)
- Paul A Torrillo
  - Institute for Medical Engineering and Sciences, Massachusetts Institute of Technology, Cambridge, United States
  - Department of Civil and Environmental Engineering, Massachusetts Institute of Technology, Cambridge, United States
- Tami D Lieberman
  - Institute for Medical Engineering and Sciences, Massachusetts Institute of Technology, Cambridge, United States
  - Department of Civil and Environmental Engineering, Massachusetts Institute of Technology, Cambridge, United States
  - Broad Institute of MIT and Harvard, Cambridge, United States
  - Ragon Institute of MGH, MIT and Harvard, Cambridge, United States
6
Jiang K, Yan Z, Di Bernardo M, Sgrizzi SR, Villiger L, Kayabolen A, Kim B, Carscadden JK, Hiraizumi M, Nishimasu H, Gootenberg JS, Abudayyeh OO. Rapid protein evolution by few-shot learning with a protein language model. bioRxiv 2024:2024.07.17.604015. [PMID: 39071429 PMCID: PMC11275896 DOI: 10.1101/2024.07.17.604015] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 07/30/2024]
Abstract
Directed evolution of proteins is critical for applications in basic biological research, therapeutics, diagnostics, and sustainability. However, directed evolution methods are labor intensive, cannot efficiently optimize over multiple protein properties, and are often trapped by local maxima. In silico-directed evolution methods incorporating protein language models (PLMs) have the potential to accelerate this engineering process, but current approaches fail to generalize across diverse protein families. We introduce EVOLVEpro, a few-shot active learning framework to rapidly improve protein activity using a combination of PLMs and protein activity predictors, achieving improved activity with as few as four rounds of evolution. EVOLVEpro substantially enhances the efficiency and effectiveness of in silico protein evolution, surpassing current state-of-the-art methods and yielding proteins with up to 100-fold improvement of desired properties. We showcase EVOLVEpro for five proteins across three applications: T7 RNA polymerase for RNA production, a miniature CRISPR nuclease, a prime editor, and an integrase for genome editing, and a monoclonal antibody for epitope binding. These results demonstrate the advantages of few-shot active learning with small amounts of experimental data over zero-shot predictions. EVOLVEpro paves the way for broader applications of AI-guided protein engineering in biology and medicine.
Affiliation(s)
- Kaiyi Jiang
  - Department of Medicine, Division of Engineering in Medicine, Brigham and Women’s Hospital, Harvard Medical School, Boston, 02115 MA, USA
  - Gene and Cell Therapy Institute, Mass General Brigham, Cambridge, 02139 MA, USA
  - Center for Virology and Vaccine Research, Beth Israel Deaconess Medical Center, Harvard Medical School, Boston, 02115 MA, USA
  - Department of Bioengineering, Massachusetts Institute of Technology, Cambridge, 02139 MA, USA
- Zhaoqing Yan
  - Department of Medicine, Division of Engineering in Medicine, Brigham and Women’s Hospital, Harvard Medical School, Boston, 02115 MA, USA
  - Gene and Cell Therapy Institute, Mass General Brigham, Cambridge, 02139 MA, USA
  - Center for Virology and Vaccine Research, Beth Israel Deaconess Medical Center, Harvard Medical School, Boston, 02115 MA, USA
- Matteo Di Bernardo
  - Department of Bioengineering, Massachusetts Institute of Technology, Cambridge, 02139 MA, USA
- Samantha R. Sgrizzi
  - Department of Medicine, Division of Engineering in Medicine, Brigham and Women’s Hospital, Harvard Medical School, Boston, 02115 MA, USA
  - Gene and Cell Therapy Institute, Mass General Brigham, Cambridge, 02139 MA, USA
  - Center for Virology and Vaccine Research, Beth Israel Deaconess Medical Center, Harvard Medical School, Boston, 02115 MA, USA
- Lukas Villiger
  - Department of Dermatology and Allergology, Kantonspital St. Gallen, St. Gallen, 9000, Switzerland
- Alisan Kayabolen
  - Department of Medicine, Division of Engineering in Medicine, Brigham and Women’s Hospital, Harvard Medical School, Boston, 02115 MA, USA
  - Gene and Cell Therapy Institute, Mass General Brigham, Cambridge, 02139 MA, USA
  - Center for Virology and Vaccine Research, Beth Israel Deaconess Medical Center, Harvard Medical School, Boston, 02115 MA, USA
- Byungji Kim
  - Koch Institute for Integrative Cancer Research at MIT, Massachusetts Institute of Technology, Cambridge, 02139 MA, USA
- Josephine K. Carscadden
  - Department of Medicine, Division of Engineering in Medicine, Brigham and Women’s Hospital, Harvard Medical School, Boston, 02115 MA, USA
  - Gene and Cell Therapy Institute, Mass General Brigham, Cambridge, 02139 MA, USA
  - Center for Virology and Vaccine Research, Beth Israel Deaconess Medical Center, Harvard Medical School, Boston, 02115 MA, USA
- Masahiro Hiraizumi
  - Department of Chemistry and Biotechnology, Graduate School of Engineering, The University of Tokyo, 7-3-1 Hongo, Bunkyo-ku, Tokyo 113-8656, Japan
- Hiroshi Nishimasu
  - Department of Chemistry and Biotechnology, Graduate School of Engineering, The University of Tokyo, 7-3-1 Hongo, Bunkyo-ku, Tokyo 113-8656, Japan
  - Structural Biology Division, Research Center for Advanced Science and Technology, The University of Tokyo, 4-6-1 Komaba, Meguro-ku, Tokyo 153-8904, Japan
  - Inamori Research Institute for Science, 620 Suiginya-cho, Shimogyo-ku, Kyoto 600-8411, Japan
- Jonathan S. Gootenberg
  - Department of Medicine, Division of Engineering in Medicine, Brigham and Women’s Hospital, Harvard Medical School, Boston, 02115 MA, USA
  - Gene and Cell Therapy Institute, Mass General Brigham, Cambridge, 02139 MA, USA
  - Center for Virology and Vaccine Research, Beth Israel Deaconess Medical Center, Harvard Medical School, Boston, 02115 MA, USA
- Omar O. Abudayyeh
  - Department of Medicine, Division of Engineering in Medicine, Brigham and Women’s Hospital, Harvard Medical School, Boston, 02115 MA, USA
  - Gene and Cell Therapy Institute, Mass General Brigham, Cambridge, 02139 MA, USA
  - Center for Virology and Vaccine Research, Beth Israel Deaconess Medical Center, Harvard Medical School, Boston, 02115 MA, USA
7
Banayan NE, Loughlin BJ, Singh S, Forouhar F, Lu G, Wong K, Neky M, Hunt HS, Bateman LB, Tamez A, Handelman SK, Price WN, Hunt JF. Systematic enhancement of protein crystallization efficiency by bulk lysine-to-arginine (KR) substitution. Protein Sci 2024; 33:e4898. [PMID: 38358135 PMCID: PMC10868448 DOI: 10.1002/pro.4898] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/18/2023] [Revised: 01/01/2024] [Accepted: 01/02/2024] [Indexed: 02/16/2024]
Abstract
Structural genomics consortia established that protein crystallization is the primary obstacle to structure determination using x-ray crystallography. We previously demonstrated that crystallization propensity is systematically related to primary sequence, and we subsequently performed computational analyses showing that arginine is the most overrepresented amino acid in crystal-packing interfaces in the Protein Data Bank. Given the similar physicochemical characteristics of arginine and lysine, we hypothesized that multiple lysine-to-arginine (KR) substitutions should improve crystallization. To test this hypothesis, we developed software that ranks lysine sites in a target protein based on the redundancy-corrected KR substitution frequency in homologs. This software can be run interactively on the worldwide web at https://www.pxengineering.org/. We demonstrate that three unrelated single-domain proteins can tolerate 5-11 KR substitutions with at most minor destabilization, and, for two of these three proteins, the construct with the largest number of KR substitutions exhibits significantly enhanced crystallization propensity. This approach rapidly produced a 1.9 Å crystal structure of a human protein domain refractory to crystallization with its native sequence. Structures from Bulk KR-substituted domains show the engineered arginine residues frequently make hydrogen-bonds across crystal-packing interfaces. We thus demonstrate that Bulk KR substitution represents a rational and efficient method for probabilistic engineering of protein surface properties to improve crystallization.
Affiliation(s)
- Nooriel E. Banayan
  - Department of Biological Sciences, 702A Sherman Fairchild Center, MC2434, Columbia University, New York, New York, USA
- Blaine J. Loughlin
  - Department of Biological Sciences, 702A Sherman Fairchild Center, MC2434, Columbia University, New York, New York, USA
- Shikha Singh
  - Department of Biological Sciences, 702A Sherman Fairchild Center, MC2434, Columbia University, New York, New York, USA
- Farhad Forouhar
  - Department of Biological Sciences, 702A Sherman Fairchild Center, MC2434, Columbia University, New York, New York, USA
- Guanqi Lu
  - Department of Biological Sciences, 702A Sherman Fairchild Center, MC2434, Columbia University, New York, New York, USA
- Kam‐Ho Wong
  - Department of Biological Sciences, 702A Sherman Fairchild Center, MC2434, Columbia University, New York, New York, USA
  - Present address: Vaccine Research and Development, Pfizer Inc., Pearl River, New York, USA
- Matthew Neky
  - Department of Biological Sciences, 702A Sherman Fairchild Center, MC2434, Columbia University, New York, New York, USA
  - Present address: Columbia University, New York, New York, USA
- Henry S. Hunt
  - Department of Physics, Stanford University, Stanford, California, USA
- Samuel K. Handelman
  - Department of Biological Sciences, 702A Sherman Fairchild Center, MC2434, Columbia University, New York, New York, USA
  - Present address: Department of Pain & Neuronal Health, Eli Lilly & Co., 893 Delaware St, Indianapolis, Indiana, USA
- W. Nicholson Price
  - Department of Biological Sciences, 702A Sherman Fairchild Center, MC2434, Columbia University, New York, New York, USA
  - Present address: University of Michigan Law School, Ann Arbor, Michigan, USA
- John F. Hunt
  - Department of Biological Sciences, 702A Sherman Fairchild Center, MC2434, Columbia University, New York, New York, USA
8
Hie BL, Shanker VR, Xu D, Bruun TUJ, Weidenbacher PA, Tang S, Wu W, Pak JE, Kim PS. Efficient evolution of human antibodies from general protein language models. Nat Biotechnol 2024; 42:275-283. [PMID: 37095349 PMCID: PMC10869273 DOI: 10.1038/s41587-023-01763-2] [Citation(s) in RCA: 135] [Impact Index Per Article: 135.0] [Received: 11/23/2022] [Accepted: 03/28/2023] [Indexed: 04/26/2023]
Abstract
Natural evolution must explore a vast landscape of possible sequences for desirable yet rare mutations, suggesting that learning from natural evolutionary strategies could guide artificial evolution. Here we report that general protein language models can efficiently evolve human antibodies by suggesting mutations that are evolutionarily plausible, despite providing the model with no information about the target antigen, binding specificity or protein structure. We performed language-model-guided affinity maturation of seven antibodies, screening 20 or fewer variants of each antibody across only two rounds of laboratory evolution, and improved the binding affinities of four clinically relevant, highly mature antibodies up to sevenfold and three unmatured antibodies up to 160-fold, with many designs also demonstrating favorable thermostability and viral neutralization activity against Ebola and severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) pseudoviruses. The same models that improve antibody binding also guide efficient evolution across diverse protein families and selection pressures, including antibiotic resistance and enzyme activity, suggesting that these results generalize to many settings.
Affiliation(s)
- Brian L Hie
  - Department of Biochemistry, Stanford University School of Medicine, Stanford, CA, USA
  - Sarafan ChEM-H, Stanford University, Stanford, CA, USA
- Varun R Shanker
  - Sarafan ChEM-H, Stanford University, Stanford, CA, USA
  - Stanford Medical Scientist Training Program, Stanford University School of Medicine, Stanford, CA, USA
- Duo Xu
  - Department of Biochemistry, Stanford University School of Medicine, Stanford, CA, USA
  - Sarafan ChEM-H, Stanford University, Stanford, CA, USA
- Theodora U J Bruun
  - Department of Biochemistry, Stanford University School of Medicine, Stanford, CA, USA
  - Sarafan ChEM-H, Stanford University, Stanford, CA, USA
  - Stanford Medical Scientist Training Program, Stanford University School of Medicine, Stanford, CA, USA
- Payton A Weidenbacher
  - Sarafan ChEM-H, Stanford University, Stanford, CA, USA
  - Department of Chemistry, Stanford University, Stanford, CA, USA
- Shaogeng Tang
  - Department of Biochemistry, Stanford University School of Medicine, Stanford, CA, USA
  - Sarafan ChEM-H, Stanford University, Stanford, CA, USA
- Wesley Wu
  - Chan Zuckerberg Biohub, San Francisco, CA, USA
- John E Pak
  - Chan Zuckerberg Biohub, San Francisco, CA, USA
- Peter S Kim
  - Department of Biochemistry, Stanford University School of Medicine, Stanford, CA, USA
  - Sarafan ChEM-H, Stanford University, Stanford, CA, USA
  - Chan Zuckerberg Biohub, San Francisco, CA, USA
9
Doud MB, Gupta A, Li V, Medina SJ, De La Fuente CA, Meyer JR. Competition-driven eco-evolutionary feedback reshapes bacteriophage lambda's fitness landscape and enables speciation. Nat Commun 2024; 15:863. [PMID: 38286804 PMCID: PMC10825149 DOI: 10.1038/s41467-024-45008-5] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/21/2023] [Accepted: 01/11/2024] [Indexed: 01/31/2024] Open
Abstract
A major challenge in evolutionary biology is explaining how populations navigate rugged fitness landscapes without getting trapped on local optima. One idea illustrated by adaptive dynamics theory is that as populations adapt, their newly enhanced capacities to exploit resources alter fitness payoffs and restructure the landscape in ways that promote speciation by opening new adaptive pathways. While there have been indirect tests of this theory, to our knowledge none have measured how fitness landscapes deform during adaptation, or tested whether these shifts promote diversification. Here, we achieve this by studying bacteriophage λ, a virus that readily speciates into co-existing receptor specialists under controlled laboratory conditions. We use a high-throughput gene editing-phenotyping technology to measure λ's fitness landscape in the presence of different evolved-λ competitors and find that the fitness effects of individual mutations, and their epistatic interactions, depend on the competitor. Using these empirical data, we simulate λ's evolution on an unchanging landscape and one that recapitulates how the landscape deforms during evolution. λ heterogeneity only evolves in the shifting landscape regime. This study provides a test of adaptive dynamics, and, more broadly, shows how fitness landscapes dynamically change during adaptation, potentiating phenomena like speciation by opening new adaptive pathways.
Affiliation(s)
- Michael B Doud
  - Department of Medicine, Division of Infectious Diseases and Global Public Health, University of California San Diego, San Diego, CA, USA
  - Department of Ecology, Behavior and Evolution, University of California San Diego, San Diego, CA, USA
- Animesh Gupta
  - Department of Ecology, Behavior and Evolution, University of California San Diego, San Diego, CA, USA
- Victor Li
  - Department of Ecology, Behavior and Evolution, University of California San Diego, San Diego, CA, USA
- Sarah J Medina
  - Department of Ecology, Behavior and Evolution, University of California San Diego, San Diego, CA, USA
- Caesar A De La Fuente
  - Department of Ecology, Behavior and Evolution, University of California San Diego, San Diego, CA, USA
- Justin R Meyer
  - Department of Ecology, Behavior and Evolution, University of California San Diego, San Diego, CA, USA
10
Notin P, Kollasch AW, Ritter D, van Niekerk L, Paul S, Spinner H, Rollins N, Shaw A, Weitzman R, Frazer J, Dias M, Franceschi D, Orenbuch R, Gal Y, Marks DS. ProteinGym: Large-Scale Benchmarks for Protein Design and Fitness Prediction. bioRxiv 2023:2023.12.07.570727. [PMID: 38106144 PMCID: PMC10723403 DOI: 10.1101/2023.12.07.570727] [Citation(s) in RCA: 18] [Impact Index Per Article: 9.0] [Indexed: 12/19/2023]
Abstract
Predicting the effects of mutations in proteins is critical to many applications, from understanding genetic disease to designing novel proteins that can address our most pressing challenges in climate, agriculture and healthcare. Despite a surge in machine learning-based protein models to tackle these questions, an assessment of their respective benefits is challenging due to the use of distinct, often contrived, experimental datasets, and the variable performance of models across different protein families. Addressing these challenges requires scale. To that end we introduce ProteinGym, a large-scale and holistic set of benchmarks specifically designed for protein fitness prediction and design. It encompasses both a broad collection of over 250 standardized deep mutational scanning assays, spanning millions of mutated sequences, and curated clinical datasets providing high-quality expert annotations about mutation effects. We devise a robust evaluation framework that combines metrics for both fitness prediction and design, factors in known limitations of the underlying experimental methods, and covers both zero-shot and supervised settings. We report the performance of a diverse set of over 70 high-performing models from various subfields (e.g., alignment-based, inverse folding) in a unified benchmark suite. We open source the corresponding codebase, datasets, MSAs, structures, and model predictions, and develop a user-friendly website that facilitates data access and analysis.
Affiliation(s)
- Ada Shaw
- Applied Mathematics, Harvard University
- Mafalda Dias
- Centre for Genomic Regulation, Universitat Pompeu Fabra
- Yarin Gal
- Computer Science, University of Oxford
11
Yang KB, Cameranesi M, Gowder M, Martinez C, Shamovsky Y, Epshtein V, Hao Z, Nguyen T, Nirenstein E, Shamovsky I, Rasouly A, Nudler E. High-resolution landscape of an antibiotic binding site. Nature 2023; 622:180-187. [PMID: 37648864 PMCID: PMC10550828 DOI: 10.1038/s41586-023-06495-6] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 08/03/2022] [Accepted: 07/28/2023] [Indexed: 09/01/2023]
Abstract
Antibiotic binding sites are located in important domains of essential enzymes and have been extensively studied in the context of resistance mutations; however, their study is limited by positive selection. Using multiplex genome engineering to overcome this constraint, we generate and characterize a collection of 760 single-residue mutants encompassing the entire rifampicin binding site of Escherichia coli RNA polymerase (RNAP). By genetically mapping drug-enzyme interactions, we identify an alpha helix where mutations considerably enhance or disrupt rifampicin binding. We find mutations in this region that prolong antibiotic binding, converting rifampicin from a bacteriostatic to bactericidal drug by inducing lethal DNA breaks. The latter are replication dependent, indicating that rifampicin kills by causing detrimental transcription-replication conflicts at promoters. We also identify additional binding site mutations that greatly increase the speed of RNAP. Fast RNAP depletes the cell of nucleotides, alters cell sensitivity to different antibiotics and provides a cold growth advantage. Finally, by mapping natural rpoB sequence diversity, we discover that functional rifampicin binding site mutations that alter RNAP properties or confer drug resistance occur frequently in nature.
Affiliation(s)
- Kevin B Yang
- Department of Biochemistry and Molecular Pharmacology, New York University Grossman School of Medicine, New York, NY, USA
- Maria Cameranesi
- Department of Biochemistry and Molecular Pharmacology, New York University Grossman School of Medicine, New York, NY, USA
- Manjunath Gowder
- Department of Biochemistry and Molecular Pharmacology, New York University Grossman School of Medicine, New York, NY, USA
- Criseyda Martinez
- Department of Biochemistry and Molecular Pharmacology, New York University Grossman School of Medicine, New York, NY, USA
- Yosef Shamovsky
- Department of Biochemistry and Molecular Pharmacology, New York University Grossman School of Medicine, New York, NY, USA
- Vitaliy Epshtein
- Department of Biochemistry and Molecular Pharmacology, New York University Grossman School of Medicine, New York, NY, USA
- Zhitai Hao
- Department of Biochemistry and Molecular Pharmacology, New York University Grossman School of Medicine, New York, NY, USA
- Thao Nguyen
- Department of Biochemistry and Molecular Pharmacology, New York University Grossman School of Medicine, New York, NY, USA
- Eric Nirenstein
- Department of Biochemistry and Molecular Pharmacology, New York University Grossman School of Medicine, New York, NY, USA
- Ilya Shamovsky
- Department of Biochemistry and Molecular Pharmacology, New York University Grossman School of Medicine, New York, NY, USA
- Aviram Rasouly
- Department of Biochemistry and Molecular Pharmacology, New York University Grossman School of Medicine, New York, NY, USA
- Howard Hughes Medical Institute, New York University School of Medicine, New York, NY, USA
- Evgeny Nudler
- Department of Biochemistry and Molecular Pharmacology, New York University Grossman School of Medicine, New York, NY, USA
- Howard Hughes Medical Institute, New York University School of Medicine, New York, NY, USA
12
Doud MB, Gupta A, Li V, Medina SJ, De La Fuente CA, Meyer JR. Competition-driven eco-evolutionary feedback reshapes bacteriophage lambda's fitness landscape and enables speciation. bioRxiv 2023:2023.08.11.553017. [PMID: 37645887 PMCID: PMC10461988 DOI: 10.1101/2023.08.11.553017] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 08/31/2023]
Abstract
A major challenge in evolutionary biology is explaining how populations navigate rugged fitness landscapes without getting trapped on local optima. One idea illustrated by adaptive dynamics theory is that as populations adapt, their newly enhanced capacities to exploit resources alter fitness payoffs and restructure the landscape in ways that promote speciation by opening new adaptive pathways. While there have been indirect tests of this theory, none have measured how fitness landscapes deform during adaptation, or tested whether these shifts promote diversification. Here, we achieve this by studying bacteriophage λ, a virus that readily speciates into co-existing receptor specialists under controlled laboratory conditions. We used a high-throughput gene editing-phenotyping technology to measure λ's fitness landscape in the presence of different evolved-λ competitors and found that the fitness effects of individual mutations, and their epistatic interactions, depend on the competitor. Using these empirical data, we simulated λ's evolution on an unchanging landscape and one that recapitulates how the landscape deforms during evolution. λ heterogeneity only evolved in the shifting landscape regime. This study provides a test of adaptive dynamics, and, more broadly, shows how fitness landscapes dynamically change during adaptation, potentiating phenomena like speciation by opening new adaptive pathways.
Affiliation(s)
- Michael B. Doud
- Department of Medicine, Division of Infectious Diseases and Global Public Health, University of California San Diego, San Diego, CA, USA
- Department of Ecology, Behavior and Evolution, University of California San Diego, San Diego, CA, USA
- Animesh Gupta
- Department of Ecology, Behavior and Evolution, University of California San Diego, San Diego, CA, USA
- Victor Li
- Department of Ecology, Behavior and Evolution, University of California San Diego, San Diego, CA, USA
- Sarah J. Medina
- Department of Ecology, Behavior and Evolution, University of California San Diego, San Diego, CA, USA
- Caesar A. De La Fuente
- Department of Ecology, Behavior and Evolution, University of California San Diego, San Diego, CA, USA
- Justin R. Meyer
- Department of Ecology, Behavior and Evolution, University of California San Diego, San Diego, CA, USA
13
Umemoto S, Kondo T, Fujino T, Hayashi G, Murakami H. Large-scale analysis of mRNA sequences localized near the start and amber codons and their impact on the diversity of mRNA display libraries. Nucleic Acids Res 2023; 51:7465-7479. [PMID: 37395404 PMCID: PMC10415131 DOI: 10.1093/nar/gkad555] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 07/04/2022] [Revised: 06/14/2023] [Accepted: 06/18/2023] [Indexed: 07/04/2023] Open
Abstract
Extremely diverse libraries are essential for effectively selecting functional peptides or proteins, and mRNA display technology is a powerful tool for generating such libraries with over 10^12-10^13 diversity. Particularly, the protein-puromycin linker (PuL)/mRNA complex formation yield is determining for preparing the libraries. However, how mRNA sequences affect the complex formation yield remains unclear. To study the effects of N-terminal and C-terminal coding sequences on the complex formation yield, puromycin-attached mRNAs containing three random codons after the start codon (32768 sequences) or seven random bases next to the amber codon (6480 sequences) were translated. Enrichment scores were calculated by dividing the appearance rate of every sequence in protein-PuL/mRNA complexes by that in total mRNAs. The wide range of enrichment scores (0.09-2.10 for N-terminal and 0.30-4.23 for C-terminal coding sequences) indicated that the N-terminal and C-terminal coding sequences strongly affected the complex formation yield. Using C-terminal GGC-CGA-UAG-U sequences, which resulted in the highest enrichment scores, we constructed highly diverse libraries of monobodies and macrocyclic peptides. The present study provides insights into how mRNA sequences affect the protein/mRNA complex formation yield and will accelerate the identification of functional peptides and proteins involved in various biological processes and having therapeutic applications.
Affiliation(s)
- Shun Umemoto
- Department of Biomolecular Engineering, Graduate School of Engineering, Nagoya University, Furo-cho, Chikusa-ku, Nagoya 464-8603, Japan
- Taishi Kondo
- Department of Biomolecular Engineering, Graduate School of Engineering, Nagoya University, Furo-cho, Chikusa-ku, Nagoya 464-8603, Japan
- Tomoshige Fujino
- Department of Biomolecular Engineering, Graduate School of Engineering, Nagoya University, Furo-cho, Chikusa-ku, Nagoya 464-8603, Japan
- Gosuke Hayashi
- Department of Biomolecular Engineering, Graduate School of Engineering, Nagoya University, Furo-cho, Chikusa-ku, Nagoya 464-8603, Japan
- Japan Science and Technology Agency (JST), PRESTO, 4-1-8 Honcho, Kawaguchi, Saitama 332-0012, Japan
- Hiroshi Murakami
- Department of Biomolecular Engineering, Graduate School of Engineering, Nagoya University, Furo-cho, Chikusa-ku, Nagoya 464-8603, Japan
- Institute of Nano-Life-Systems, Institutes of Innovation for Future Society, Nagoya University, Furo-cho, Chikusa-ku, Nagoya 464-8603, Japan
14
Limdi A, Baym M. Resolving Deleterious and Near-Neutral Effects Requires Different Pooled Fitness Assay Designs. J Mol Evol 2023; 91:325-333. [PMID: 37160452 DOI: 10.1007/s00239-023-10110-7] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/19/2022] [Accepted: 04/06/2023] [Indexed: 05/11/2023]
Abstract
Pooled sequencing-based fitness assays are a powerful and widely used approach to quantifying fitness of thousands of genetic variants in parallel. Despite the throughput of such assays, they are prone to biases in fitness estimates, and errors in measurements are typically larger for deleterious fitness effects, relative to neutral effects. In practice, designing pooled fitness assays involves tradeoffs between the number of timepoints, the sequencing depth, and other parameters to gain as much information as possible within a feasible experiment. Here, we combined simulations and reanalysis of an existing experimental dataset to explore how assay parameters impact measurements of near-neutral and deleterious fitness effects using a standard fitness estimator. We found that sequencing multiple timepoints at relatively modest depth improved estimates of near-neutral fitness effects, but systematically biased measurements of deleterious effects. We showed that, for a fixed total number of reads, deeper sequencing at fewer timepoints improved resolution of deleterious fitness effects. Our results highlight a tradeoff between measurement of deleterious and near-neutral effect sizes for a fixed amount of data and suggest that fitness assay design should be tuned for fitness effects that are relevant to the specific biological question.
Affiliation(s)
- Anurag Limdi
- Department of Biomedical Informatics and Laboratory of Systems Pharmacology, Harvard Medical School, Boston, MA, 02115, USA
- Michael Baym
- Department of Biomedical Informatics and Laboratory of Systems Pharmacology, Harvard Medical School, Boston, MA, 02115, USA
15
Nieuwkoop T, Terlouw BR, Stevens KG, Scheltema R, de Ridder D, van der Oost J, Claassens N. Revealing determinants of translation efficiency via whole-gene codon randomization and machine learning. Nucleic Acids Res 2023; 51:2363-2376. [PMID: 36718935 PMCID: PMC10018363 DOI: 10.1093/nar/gkad035] [Citation(s) in RCA: 18] [Impact Index Per Article: 9.0] [Received: 06/20/2022] [Revised: 12/14/2022] [Accepted: 01/16/2023] [Indexed: 02/01/2023] Open
Abstract
It has been known for decades that codon usage contributes to translation efficiency and hence to protein production levels. However, its role in protein synthesis is still only partly understood. This lack of understanding hampers the design of synthetic genes for efficient protein production. In this study, we generated a synonymous codon-randomized library of the complete coding sequence of red fluorescent protein. Protein production levels and the full coding sequences were determined for 1459 gene variants in Escherichia coli. Using different machine learning approaches, these data were used to reveal correlations between codon usage and protein production. Interestingly, protein production levels can be relatively accurately predicted (Pearson correlation of 0.762) by a Random Forest model that only relies on the sequence information of the first eight codons. In this region, close to the translation initiation site, mRNA secondary structure rather than Codon Adaptation Index (CAI) is the key determinant of protein production. This study clearly demonstrates the key role of codons at the start of the coding sequence. Furthermore, these results imply that commonly used CAI-based codon optimization of the full coding sequence is not a very effective strategy. One should rather focus on optimizing protein production via reducing mRNA secondary structure formation with the first few codons.
Affiliation(s)
- Katherine G Stevens
- Biomolecular Mass Spectrometry and Proteomics, Bijvoet Center for Biomolecular Research and Utrecht Institute for Pharmaceutical Sciences, University of Utrecht, Padualaan 8, 3584 CH Utrecht, The Netherlands
- Netherlands Proteomics Center, Padualaan 8, 3584 CH Utrecht, The Netherlands
- Richard A Scheltema
- Biomolecular Mass Spectrometry and Proteomics, Bijvoet Center for Biomolecular Research and Utrecht Institute for Pharmaceutical Sciences, University of Utrecht, Padualaan 8, 3584 CH Utrecht, The Netherlands
- Netherlands Proteomics Center, Padualaan 8, 3584 CH Utrecht, The Netherlands
- Dick de Ridder
- Bioinformatics Group, Wageningen University, Wageningen, Droevendaalsesteeg 1, 6708 PB, The Netherlands
- John van der Oost
- Correspondence may also be addressed to John van der Oost. Tel: +31 317483740
16
Zabolotskii AI, Kozlovskiy SV, Katrukha AG. The Influence of the Nucleotide Composition of Genes and Gene Regulatory Elements on the Efficiency of Protein Expression in Escherichia coli. Biochemistry (Moscow) 2023; 88:S176-S191. [PMID: 37069120 DOI: 10.1134/s0006297923140109] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 02/22/2023]
Abstract
Recombinant proteins expressed in Escherichia coli are widely used in biochemical research and industrial processes. At the same time, achieving higher protein expression levels and correct protein folding still remains the key problem, since optimization of nutrient media, growth conditions, and methods for induction of protein synthesis does not always lead to the desired result. Often, low protein expression is determined by the sequences of the expressed genes and their regulatory regions. The genetic code is degenerate; 18 out of 20 amino acids are encoded by more than one codon. Choosing between synonymous codons in the coding sequence can significantly affect the level of protein expression and protein folding due to the influence of the gene nucleotide composition on the probability of formation of secondary mRNA structures that affect the ribosome binding at the translation initiation phase, as well as the ribosome movement along the mRNA during elongation, which, in turn, influences the mRNA degradation and the folding of the nascent protein. The nucleotide composition of the mRNA untranslated regions, in particular the promoter and Shine-Dalgarno sequences, also affects the efficiency of mRNA transcription, translation, and degradation. In this review, we describe the genetic principles that determine the efficiency of protein production in Escherichia coli.
Affiliation(s)
- Artur I Zabolotskii
- Faculty of Biology, Lomonosov Moscow State University, Moscow, 119991, Russia
- Alexey G Katrukha
- Faculty of Biology, Lomonosov Moscow State University, Moscow, 119991, Russia
17
Gupta A, Zaman L, Strobel HM, Gallie J, Burmeister AR, Kerr B, Tamar ES, Kishony R, Meyer JR. Host-parasite coevolution promotes innovation through deformations in fitness landscapes. eLife 2022; 11:e76162. [PMID: 35793223 PMCID: PMC9259030 DOI: 10.7554/elife.76162] [Citation(s) in RCA: 11] [Impact Index Per Article: 3.7] [Received: 12/06/2021] [Accepted: 06/20/2022] [Indexed: 01/14/2023] Open
Abstract
During the struggle for survival, populations occasionally evolve new functions that give them access to untapped ecological opportunities. Theory suggests that coevolution between species can promote the evolution of such innovations by deforming fitness landscapes in ways that open new adaptive pathways. We directly tested this idea by using high-throughput gene editing-phenotyping technology (MAGE-Seq) to measure the fitness landscape of a virus, bacteriophage λ, as it coevolved with its host, the bacterium Escherichia coli. An analysis of the empirical fitness landscape revealed mutation-by-mutation-by-host-genotype interactions that demonstrate coevolution modified the contours of λ's landscape. Computer simulations of λ's evolution on a static versus shifting fitness landscape showed that the changes in contours increased λ's chances of evolving the ability to use a new host receptor. By coupling sequencing and pairwise competition experiments, we demonstrated that the first mutation λ evolved en route to the innovation would only evolve in the presence of the ancestral host, whereas later steps in λ's evolution required the shift to a resistant host. When time-shift replays of the coevolution experiment were run where host evolution was artificially accelerated, λ did not innovate to use the new receptor. This study provides direct evidence for the role of coevolution in driving evolutionary novelty and provides a quantitative framework for predicting evolution in coevolving ecological communities.
Affiliation(s)
- Animesh Gupta
- Department of Physics, University of California San Diego, La Jolla, United States
- Luis Zaman
- Department of Ecology and Evolutionary Biology, University of Michigan, Ann Arbor, United States
- Hannah M Strobel
- Department of Ecology, Behavior and Evolution, University of California San Diego, La Jolla, United States
- Jenna Gallie
- Department of Evolutionary Theory, Max Planck Institute for Evolutionary Biology, Plön, Germany
- Alita R Burmeister
- Department of Ecology and Evolutionary Biology, Yale University, New Haven, United States
- Benjamin Kerr
- Department of Biology, University of Washington, Seattle, United States
- Einat S Tamar
- Department of Biology, Technion – Israel Institute of Technology, Haifa, Israel
- Roy Kishony
- Department of Biology, Technion – Israel Institute of Technology, Haifa, Israel
- Justin R Meyer
- Department of Ecology, Behavior and Evolution, University of California San Diego, La Jolla, United States
18
Weissenow K, Heinzinger M, Rost B. Protein language-model embeddings for fast, accurate, and alignment-free protein structure prediction. Structure 2022; 30:1169-1177.e4. [DOI: 10.1016/j.str.2022.05.001] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 09/09/2021] [Revised: 02/25/2022] [Accepted: 04/29/2022] [Indexed: 01/27/2023]
19
Rapid expansion and extinction of antibiotic resistance mutations during treatment of acute bacterial respiratory infections. Nat Commun 2022; 13:1231. [PMID: 35264582 PMCID: PMC8907320 DOI: 10.1038/s41467-022-28188-w] [Citation(s) in RCA: 23] [Impact Index Per Article: 7.7] [Received: 11/05/2021] [Accepted: 01/07/2022] [Indexed: 11/18/2022] Open
Abstract
Acute bacterial infections are often treated empirically, with the choice of antibiotic therapy updated during treatment. The effects of such rapid antibiotic switching on the evolution of antibiotic resistance in individual patients are poorly understood. Here we find that low-frequency antibiotic resistance mutations emerge, contract, and even go to extinction within days of changes in therapy. We analyzed Pseudomonas aeruginosa populations in sputum samples collected serially from 7 mechanically ventilated patients at the onset of respiratory infection. Combining short- and long-read sequencing and resistance phenotyping of 420 isolates revealed that while new infections are near-clonal, reflecting a recent colonization bottleneck, resistance mutations could emerge at low frequencies within days of therapy. We then measured the in vivo frequencies of select resistance mutations in intact sputum samples with resistance-targeted deep amplicon sequencing (RETRA-Seq), which revealed that rare resistance mutations not detected by clinically used culture-based methods can increase by nearly 40-fold over 5–12 days in response to antibiotic changes. Conversely, mutations conferring resistance to antibiotics not administered diminish and even go to extinction. Our results underscore how therapy choice shapes the dynamics of low-frequency resistance mutations at short time scales, and the findings provide a possibility for driving resistance mutations to extinction during early stages of infection by designing patient-specific antibiotic cycling strategies informed by deep genomic surveillance. It remains unclear how rapid antibiotic switching affects the evolution of antibiotic resistance in individual patients. Here, Chung et al. combine short- and long-read sequencing and resistance phenotyping of 420 serial isolates of Pseudomonas aeruginosa collected from the onset of respiratory infection, and show that rare resistance mutations can increase by nearly 40-fold over 5–12 days in response to antibiotic changes, while mutations conferring resistance to antibiotics not administered diminish and even go to extinction.
20
Høie MH, Cagiada M, Beck Frederiksen AH, Stein A, Lindorff-Larsen K. Predicting and interpreting large-scale mutagenesis data using analyses of protein stability and conservation. Cell Rep 2022; 38:110207. [PMID: 35021073 DOI: 10.1016/j.celrep.2021.110207] [Citation(s) in RCA: 64] [Impact Index Per Article: 21.3] [Received: 07/19/2021] [Revised: 10/01/2021] [Accepted: 12/13/2021] [Indexed: 01/23/2023] Open
Abstract
Understanding and predicting the functional consequences of single amino acid changes is central in many areas of protein science. Here, we collect and analyze experimental measurements of effects of >150,000 variants in 29 proteins. We use biophysical calculations to predict changes in stability for each variant and assess them in light of sequence conservation. We find that the sequence analyses give more accurate prediction of variant effects than predictions of stability and that about half of the variants that show loss of function do so due to stability effects. We construct a machine learning model to predict variant effects from protein structure and sequence alignments and show how the two sources of information support one another and enable mechanistic interpretations. Together, our results show how one can leverage large-scale experimental assessments of variant effects to gain deeper and general insights into the mechanisms that cause loss of function.
Affiliation(s)
- Magnus Haraldson Høie
- Linderstrøm-Lang Centre for Protein Science, Department of Biology, University of Copenhagen, DK-2200 Copenhagen N, Denmark
- Matteo Cagiada
- Linderstrøm-Lang Centre for Protein Science, Department of Biology, University of Copenhagen, DK-2200 Copenhagen N, Denmark
- Anders Haagen Beck Frederiksen
- Linderstrøm-Lang Centre for Protein Science, Department of Biology, University of Copenhagen, DK-2200 Copenhagen N, Denmark
- Amelie Stein
- Linderstrøm-Lang Centre for Protein Science, Department of Biology, University of Copenhagen, DK-2200 Copenhagen N, Denmark
- Kresten Lindorff-Larsen
- Linderstrøm-Lang Centre for Protein Science, Department of Biology, University of Copenhagen, DK-2200 Copenhagen N, Denmark
21
Park J, Wang HH. Systematic dissection of σ70 sequence diversity and function in bacteria. Cell Rep 2021; 36:109590. [PMID: 34433066 PMCID: PMC8716302 DOI: 10.1016/j.celrep.2021.109590] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Received: 05/18/2020] [Revised: 04/19/2021] [Accepted: 08/02/2021] [Indexed: 10/29/2022] Open
Abstract
Primary σ70 factors are key conserved bacterial regulatory proteins that interact with regulatory DNA to control gene expression. It is, however, poorly understood whether σ70 sequence diversity in different bacteria reflects functional differences. Here, we employ comparative and functional genomics to explore the sequence and function relationship of primary σ70. Using multiplex automated genome engineering and deep sequencing (MAGE-seq), we generate a saturation mutagenesis library and high-resolution fitness map of E. coli σ70 in domains 2-4. Mapping natural σ70 sequence diversity to the E. coli σ70 fitness landscape reveals significant predicted fitness deficits across σ70 orthologs. Interestingly, these predicted deficits are larger than observed fitness changes for 15 σ70 orthologs introduced into E. coli. Finally, we use a multiplexed transcriptional reporter assay and RNA sequencing (RNA-seq) to explore functional differences of several σ70 orthologs. This work provides an in-depth analysis of σ70 sequence and function to improve efforts to understand the evolution and engineering potential of this global regulator.
Affiliation(s)
- Jimin Park
- Department of Systems Biology, Columbia University Irving Medical Center, New York, NY, USA; Integrated Program in Cellular, Molecular and Biomedical Studies, Columbia University Irving Medical Center, New York, NY, USA
- Harris H Wang
- Department of Systems Biology, Columbia University Irving Medical Center, New York, NY, USA; Department of Pathology and Cell Biology, Columbia University Irving Medical Center, New York, NY, USA
22
Abstract
Bacterial protein synthesis rates have evolved to maintain preferred stoichiometries at striking precision, from the components of protein complexes to constituents of entire pathways. Setting relative protein production rates to be well within a factor of two requires concerted tuning of transcription, RNA turnover, and translation, allowing many potential regulatory strategies to achieve the preferred output. The last decade has seen a greatly expanded capacity for precise interrogation of each step of the central dogma genome-wide. Here, we summarize how these technologies have shaped the current understanding of diverse bacterial regulatory architectures underpinning stoichiometric protein synthesis. We focus on the emerging expanded view of bacterial operons, which encode diverse primary and secondary mRNA structures for tuning protein stoichiometry. Emphasis is placed on how quantitative tuning is achieved. We discuss the challenges and open questions in the application of quantitative, genome-wide methodologies to the problem of precise protein production. Expected final online publication date for the Annual Review of Microbiology, Volume 75 is October 2021. Please see http://www.annualreviews.org/page/journal/pubdates for revised estimates.
Affiliation(s)
- James C Taggart
- Department of Biology, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, USA
- Jean-Benoît Lalanne
- Department of Biology, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, USA; Department of Physics, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, USA; Current affiliation: Department of Genome Sciences, University of Washington, Seattle, Washington 98195, USA
- Gene-Wei Li
- Department of Biology, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, USA
23
Filsinger GT, Wannier TM, Pedersen FB, Lutz ID, Zhang J, Stork DA, Debnath A, Gozzi K, Kuchwara H, Volf V, Wang S, Rios X, Gregg CJ, Lajoie MJ, Shipman SL, Aach J, Laub MT, Church GM. Characterizing the portability of phage-encoded homologous recombination proteins. Nat Chem Biol 2021; 17:394-402. [PMID: 33462496 PMCID: PMC7990699 DOI: 10.1038/s41589-020-00710-5] [Citation(s) in RCA: 37] [Impact Index Per Article: 9.3] [Received: 04/14/2020] [Revised: 11/02/2020] [Accepted: 11/13/2020] [Indexed: 01/29/2023]
Abstract
Efficient genome editing methods are essential for biotechnology and fundamental research. Homologous recombination (HR) is the most versatile method of genome editing, but techniques that rely on host RecA-mediated pathways are inefficient and laborious. Phage-encoded single-stranded DNA annealing proteins (SSAPs) improve HR 1,000-fold above endogenous levels. However, they are not broadly functional. Using Escherichia coli, Lactococcus lactis, Mycobacterium smegmatis, Lactobacillus rhamnosus and Caulobacter crescentus, we investigated the limited portability of SSAPs. We find that these proteins specifically recognize the C-terminal tail of the host's single-stranded DNA-binding protein (SSB) and are portable between species only if compatibility with this host domain is maintained. Furthermore, we find that co-expressing SSAPs with SSBs can significantly improve genome editing efficiency, in some species enabling SSAP functionality even without host compatibility. Finally, we find that high-efficiency HR far surpasses the mutational capacity of commonly used random mutagenesis methods, generating exceptional phenotypes that are inaccessible through sequential nucleotide conversions.
Collapse
Affiliation(s)
- Gabriel T. Filsinger (corresponding author)
- Department of Systems Biology, Harvard Medical School, Boston, Massachusetts, USA
- Wyss Institute for Biologically Inspired Engineering, Harvard University, Cambridge, Massachusetts, USA
- Timothy M. Wannier
- Wyss Institute for Biologically Inspired Engineering, Harvard University, Cambridge, Massachusetts, USA
- Department of Genetics, Harvard Medical School, Boston, Massachusetts, USA
- Felix B. Pedersen
- Department of Biochemistry and Molecular Biology, University of Southern Denmark, Odense M, Denmark
- Isaac D. Lutz
- Institute for Protein Design, University of Washington, Seattle, Washington, USA
- Department of Bioengineering, University of Washington, Seattle, Washington, USA
- Julie Zhang
- Department of Mathematics, Massachusetts Institute of Technology, Cambridge, Massachusetts, USA
- Devon A. Stork
- Wyss Institute for Biologically Inspired Engineering, Harvard University, Cambridge, Massachusetts, USA
- Department of Molecular and Cellular Biology, Harvard University, Cambridge, Massachusetts, USA
- Anik Debnath
- Department of Genetics, Harvard Medical School, Boston, Massachusetts, USA
- Tenza Inc., Cambridge, MA
- Kevin Gozzi
- Department of Biology, Massachusetts Institute of Technology, Cambridge, Massachusetts, USA
- Helene Kuchwara
- Department of Genetics, Harvard Medical School, Boston, Massachusetts, USA
- Verena Volf
- Wyss Institute for Biologically Inspired Engineering, Harvard University, Cambridge, Massachusetts, USA
- Harvard University John A. Paulson School of Engineering and Applied Sciences, Cambridge, Massachusetts, USA
- Stan Wang
- Wyss Institute for Biologically Inspired Engineering, Harvard University, Cambridge, Massachusetts, USA
- Department of Genetics, Harvard Medical School, Boston, Massachusetts, USA
- Xavier Rios
- Department of Genetics, Harvard Medical School, Boston, Massachusetts, USA
- Marc J. Lajoie
- Department of Genetics, Harvard Medical School, Boston, Massachusetts, USA
- Seth L. Shipman
- Gladstone Institutes, San Francisco, CA
- Department of Bioengineering and Therapeutic Sciences, University of California, San Francisco, CA
- John Aach
- Department of Genetics, Harvard Medical School, Boston, Massachusetts, USA
- Michael T. Laub
- Department of Biology, Massachusetts Institute of Technology, Cambridge, Massachusetts, USA
- George M. Church (corresponding author)
- Wyss Institute for Biologically Inspired Engineering, Harvard University, Cambridge, Massachusetts, USA
- Department of Genetics, Harvard Medical School, Boston, Massachusetts, USA
24
Wannier TM, Ciaccia PN, Ellington AD, Filsinger GT, Isaacs FJ, Javanmardi K, Jones MA, Kunjapur AM, Nyerges A, Pal C, Schubert MG, Church GM. Recombineering and MAGE. Nat Rev Methods Primers 2021; 1:7. [PMID: 35540496 PMCID: PMC9083505 DOI: 10.1038/s43586-020-00006-x] [Citation(s) in RCA: 51] [Impact Index Per Article: 12.8] [Accepted: 11/19/2020] [Indexed: 12/17/2022]
Abstract
Recombination-mediated genetic engineering, also known as recombineering, is the genomic incorporation of homologous single-stranded or double-stranded DNA into bacterial genomes. Recombineering and its derivative methods have radically improved genome engineering capabilities, perhaps none more so than multiplex automated genome engineering (MAGE). MAGE is representative of a set of highly multiplexed single-stranded DNA-mediated technologies. First described in Escherichia coli, both MAGE and recombineering are being rapidly translated into diverse prokaryotes and even into eukaryotic cells. Together, this modern set of tools offers the promise of radically improving the scope and throughput of experimental biology by providing powerful new methods to ease the genetic manipulation of model and non-model organisms. In this Primer, we describe recombineering and MAGE, their optimal use, their diverse applications and methods for pairing them with other genetic editing tools. We then look forward to the future of genetic engineering.
Affiliation(s)
- Timothy M. Wannier
- Department of Genetics, Harvard Medical School, Boston, MA, USA
- Wyss Institute for Biologically Inspired Engineering, Harvard University, Boston, MA, USA
- Peter N. Ciaccia
- Department of Molecular, Cellular & Developmental Biology, Yale University, New Haven, CT, USA
- Systems Biology Institute, Yale University, West Haven, CT, USA
- Andrew D. Ellington
- Department of Molecular Biosciences, College of Natural Sciences, University of Texas at Austin, Austin, TX, USA
- Gabriel T. Filsinger
- Wyss Institute for Biologically Inspired Engineering, Harvard University, Boston, MA, USA
- Department of Systems Biology, Harvard University, Cambridge, MA, USA
- Farren J. Isaacs
- Department of Molecular, Cellular & Developmental Biology, Yale University, New Haven, CT, USA
- Systems Biology Institute, Yale University, West Haven, CT, USA
- Department of Biomedical Engineering, Yale University, New Haven, CT, USA
- Kamyab Javanmardi
- Department of Molecular Biosciences, College of Natural Sciences, University of Texas at Austin, Austin, TX, USA
- Michaela A. Jones
- Department of Chemical and Biomolecular Engineering, University of Delaware, Newark, DE, USA
- Aditya M. Kunjapur
- Department of Chemical and Biomolecular Engineering, University of Delaware, Newark, DE, USA
- Akos Nyerges
- Department of Genetics, Harvard Medical School, Boston, MA, USA
- Wyss Institute for Biologically Inspired Engineering, Harvard University, Boston, MA, USA
- Csaba Pal
- Synthetic and Systems Biology Unit, Institute of Biochemistry, Biological Research Centre, Szeged, Hungary
- Max G. Schubert
- Department of Genetics, Harvard Medical School, Boston, MA, USA
- Wyss Institute for Biologically Inspired Engineering, Harvard University, Boston, MA, USA
- George M. Church
- Department of Genetics, Harvard Medical School, Boston, MA, USA
- Wyss Institute for Biologically Inspired Engineering, Harvard University, Boston, MA, USA
25
Munro D, Singh M. DeMaSk: a deep mutational scanning substitution matrix and its use for variant impact prediction. Bioinformatics 2020; 36:5322-5329. [PMID: 33325500 PMCID: PMC8016454 DOI: 10.1093/bioinformatics/btaa1030] [Citation(s) in RCA: 23] [Impact Index Per Article: 4.6] [Received: 07/21/2020] [Revised: 10/16/2020] [Accepted: 11/30/2020] [Indexed: 01/27/2023] Open
Abstract
Motivation: Accurately predicting the quantitative impact of a substitution on a protein’s molecular function would be a great aid in understanding the effects of observed genetic variants across populations. While this remains a challenging task, new approaches can leverage data from the increasing numbers of comprehensive deep mutational scanning (DMS) studies that systematically mutate proteins and measure fitness. Results: We introduce DeMaSk, an intuitive and interpretable method based only upon DMS datasets and sequence homologs that predicts the impact of missense mutations within any protein. DeMaSk first infers a directional amino acid substitution matrix from DMS datasets and then fits a linear model that combines these substitution scores with measures of per-position evolutionary conservation and variant frequency across homologs. Despite its simplicity, DeMaSk has state-of-the-art performance in predicting the impact of amino acid substitutions, and can easily and rapidly be applied to any protein sequence. Availability and implementation: https://demask.princeton.edu generates fitness impact predictions and visualizations for any user-submitted protein sequence. Supplementary information: Supplementary data are available at Bioinformatics online.
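The first step named in this abstract, inferring a directional substitution matrix from DMS data, can be sketched as follows. This is a minimal illustration and not the DeMaSk implementation: it assumes each matrix entry is simply the mean score over all observations of an ordered (wild-type, mutant) pair, and the function name and toy scores are hypothetical.

```python
from collections import defaultdict


def directional_matrix(observations):
    """Average DMS scores per ordered (wild-type, mutant) amino acid pair.

    `observations` is an iterable of (wt, mut, score) tuples. Plain averaging
    is an assumption made for this sketch, not a detail from the paper.
    """
    totals = defaultdict(float)
    counts = defaultdict(int)
    for wt, mut, score in observations:
        totals[(wt, mut)] += score
        counts[(wt, mut)] += 1
    return {pair: totals[pair] / counts[pair] for pair in totals}


# Toy scores (hypothetical, not real DMS measurements).
toy = [("A", "V", -0.25), ("A", "V", -0.75), ("V", "A", -1.0)]
matrix = directional_matrix(toy)
```

Because the keys are ordered pairs, A to V and V to A keep separate entries, which is what makes the matrix directional rather than symmetric.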
Affiliation(s)
- Daniel Munro
- Lewis-Sigler Institute for Integrative Genomics, Princeton University, Princeton, 08544, USA
- Mona Singh
- Lewis-Sigler Institute for Integrative Genomics, Princeton University, Princeton, 08544, USA
- Department of Computer Science, Princeton University, Princeton, 08544, USA
26
Van Leuven JT, Ederer MM, Burleigh K, Scott L, Hughes RA, Codrea V, Ellington AD, Wichman HA, Miller CR. ΦX174 Attenuation by Whole-Genome Codon Deoptimization. Genome Biol Evol 2020; 13:5921183. [PMID: 33045052 PMCID: PMC7881332 DOI: 10.1093/gbe/evaa214] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.8] [Accepted: 10/07/2020] [Indexed: 12/11/2022] Open
Abstract
Natural selection acting on synonymous mutations in protein-coding genes influences genome composition and evolution. In viruses, introducing synonymous mutations in genes encoding structural proteins can drastically reduce viral growth, providing a means to generate potent, live-attenuated vaccine candidates. However, an improved understanding of what compositional features are under selection and how combinations of synonymous mutations affect viral growth is needed to predictably attenuate viruses and make them resistant to reversion. We systematically recoded all nonoverlapping genes of the bacteriophage ΦX174 with codons rarely used in its Escherichia coli host. The fitness of recombinant viruses decreases as additional deoptimizing mutations are made to the genome, although not always linearly, and not consistently across genes. Combining deoptimizing mutations may reduce viral fitness more or less than expected from the effect size of the constituent mutations and we point out difficulties in untangling correlated compositional features. We test our model by optimizing the same genes and find that the relationship between codon usage and fitness does not hold for optimization, suggesting that wild-type ΦX174 is at a fitness optimum. This work highlights the need to better understand how selection acts on patterns of synonymous codon usage across the genome and provides a convenient system to investigate the genetic determinants of virulence.
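The recoding idea in this abstract, replacing each codon with a rarely used synonymous one, can be sketched as follows. This is a minimal illustration under stated assumptions and not the authors' design pipeline: the codon table covers only four amino acids, the usage weights are placeholders rather than measured E. coli frequencies, and the function names are hypothetical.

```python
# Partial standard genetic code: only the amino acids used in this toy example.
CODON_TO_AA = {
    "ATG": "M",
    "AAA": "K", "AAG": "K",
    "CTG": "L", "CTA": "L", "CTT": "L", "CTC": "L", "TTA": "L", "TTG": "L",
    "CGT": "R", "CGC": "R", "CGA": "R", "CGG": "R", "AGA": "R", "AGG": "R",
}

# Illustrative relative usage weights (placeholders, NOT measured values).
USAGE = {
    "ATG": 1.00,
    "AAA": 0.74, "AAG": 0.26,
    "CTG": 0.47, "CTA": 0.04, "CTT": 0.12, "CTC": 0.10, "TTA": 0.14, "TTG": 0.13,
    "CGT": 0.36, "CGC": 0.36, "CGA": 0.07, "CGG": 0.11, "AGA": 0.07, "AGG": 0.03,
}


def codons(seq):
    """Split an in-frame coding sequence into consecutive codons."""
    return [seq[i:i + 3] for i in range(0, len(seq), 3)]


def translate(seq):
    """Translate a coding sequence using the partial table above."""
    return "".join(CODON_TO_AA[c] for c in codons(seq))


def deoptimize(seq):
    """Replace every codon with the lowest-weight synonymous codon."""
    rarest = {}
    for codon, aa in CODON_TO_AA.items():
        if aa not in rarest or USAGE[codon] < USAGE[rarest[aa]]:
            rarest[aa] = codon
    return "".join(rarest[CODON_TO_AA[c]] for c in codons(seq))


gene = "ATGCTGCGTAAA"        # toy sequence encoding M-L-R-K
recoded = deoptimize(gene)   # nucleotides change, the protein does not
```

Checking that `translate(gene)` equals `translate(recoded)` confirms the recoding is synonymous: only the nucleotide sequence changes.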
Affiliation(s)
- James T Van Leuven
- Department of Biological Science, University of Idaho
- Institute for Modeling Collaboration and Innovation, University of Idaho
- Katelyn Burleigh
- Department of Biological Science, University of Idaho
- Present address: Seattle Children's Research Institute, Seattle, WA
- LuAnn Scott
- Department of Biological Science, University of Idaho
- Randall A Hughes
- Applied Research Laboratories, University of Texas, Austin
- Present address: Biotechnology Branch, CCDC US Army Research Laboratory, Adelphi, MD
- Vlad Codrea
- Institute for Cellular and Molecular Biology, University of Texas, Austin
- Andrew D Ellington
- Applied Research Laboratories, University of Texas, Austin
- Institute for Cellular and Molecular Biology, University of Texas, Austin
- Holly A Wichman
- Department of Biological Science, University of Idaho
- Institute for Modeling Collaboration and Innovation, University of Idaho
- Craig R Miller
- Department of Biological Science, University of Idaho
- Institute for Modeling Collaboration and Innovation, University of Idaho
27
Nieuwkoop T, Finger-Bou M, van der Oost J, Claassens NJ. The Ongoing Quest to Crack the Genetic Code for Protein Production. Mol Cell 2020; 80:193-209. [PMID: 33010203 DOI: 10.1016/j.molcel.2020.09.014] [Citation(s) in RCA: 21] [Impact Index Per Article: 4.2] [Received: 06/02/2020] [Revised: 08/10/2020] [Accepted: 09/10/2020] [Indexed: 01/05/2023]
Abstract
Understanding the genetic design principles that determine protein production remains a major challenge. Although the key principles of gene expression were discovered 50 years ago, additional factors are still being uncovered. Both protein-coding and non-coding sequences harbor elements that collectively influence the efficiency of protein production by modulating transcription, mRNA decay, and translation. The influences of many contributing elements are intertwined, which complicates a full understanding of the individual factors. In natural genes, a functional balance between these factors has been obtained in the course of evolution, whereas for genetic-engineering projects, our incomplete understanding still limits optimal design of synthetic genes. However, notable advances have recently been made, supported by high-throughput analysis of synthetic gene libraries as well as by state-of-the-art biomolecular techniques. We discuss here how these advances further strengthen understanding of the gene expression process and how they can be harnessed to optimize protein production.
Affiliation(s)
- Thijs Nieuwkoop
- Laboratory of Microbiology, Wageningen University, Stippeneng 4, 6708 WE Wageningen, the Netherlands
- Max Finger-Bou
- Laboratory of Microbiology, Wageningen University, Stippeneng 4, 6708 WE Wageningen, the Netherlands
- John van der Oost
- Laboratory of Microbiology, Wageningen University, Stippeneng 4, 6708 WE Wageningen, the Netherlands
- Nico J Claassens
- Laboratory of Microbiology, Wageningen University, Stippeneng 4, 6708 WE Wageningen, the Netherlands
28
Csörgő B, Nyerges A, Pál C. Targeted mutagenesis of multiple chromosomal regions in microbes. Curr Opin Microbiol 2020; 57:22-30. [PMID: 32599531 PMCID: PMC7613694 DOI: 10.1016/j.mib.2020.05.010] [Citation(s) in RCA: 12] [Impact Index Per Article: 2.4] [Received: 02/02/2020] [Revised: 05/18/2020] [Accepted: 05/20/2020] [Indexed: 12/20/2022]
Abstract
Directed evolution allows the effective engineering of proteins, biosynthetic pathways, and cellular functions. Traditional plasmid-based methods generally subject one or occasionally multiple genes-of-interest to mutagenesis, require time-consuming manual interventions, and the genes that are subjected to mutagenesis are outside of their native genomic context. Other methods mutagenize the whole genome unselectively which may distort the outcome. Recent recombineering- and CRISPR-based technologies radically change this field by allowing exceedingly high mutation rates at multiple, predefined loci in their native genomic context. In this review, we focus on recent technologies that potentially allow accelerated tunable mutagenesis at multiple genomic loci in the native genomic context of these target sequences. These technologies will be compared by four main criteria, including the scale of mutagenesis, portability to multiple microbial species, off-target mutagenesis, and cost-effectiveness. Finally, we discuss how these technical advances open new avenues in basic research and biotechnology.
Affiliation(s)
- Bálint Csörgő
- Department of Microbiology and Immunology, University of California, San Francisco, 94143, San Francisco, CA, USA
- Genome Biology Unit, European Molecular Biology Laboratory, 69117, Heidelberg, Germany
- Akos Nyerges
- Synthetic and Systems Biology Unit, Biological Research Centre, 6726, Szeged, Hungary
- Department of Genetics, Harvard Medical School, 02115, Boston, MA, USA
- Csaba Pál
- Synthetic and Systems Biology Unit, Biological Research Centre, 6726, Szeged, Hungary
29
Chen JZ, Fowler DM, Tokuriki N. Comprehensive exploration of the translocation, stability and substrate recognition requirements in VIM-2 lactamase. eLife 2020; 9:e56707. [PMID: 32510322 PMCID: PMC7308095 DOI: 10.7554/elife.56707] [Citation(s) in RCA: 26] [Impact Index Per Article: 5.2] [Received: 03/06/2020] [Accepted: 06/06/2020] [Indexed: 12/12/2022] Open
Abstract
Metallo-β-lactamases (MBLs) degrade a broad spectrum of β-lactam antibiotics, and are a major disseminating source for multidrug resistant bacteria. Despite many biochemical studies in diverse MBLs, molecular understanding of the roles of residues in the enzyme's stability and function, and especially substrate specificity, is lacking. Here, we employ deep mutational scanning (DMS) to generate comprehensive single amino acid variant data on a major clinical MBL, VIM-2, by measuring the effect of thousands of VIM-2 mutants on the degradation of three representative classes of β-lactams (ampicillin, cefotaxime, and meropenem) and at two different temperatures (25°C and 37°C). We revealed residues responsible for expression and translocation, and mutations that increase resistance and/or alter substrate specificity. The distribution of specificity-altering mutations unveiled distinct molecular recognition of the three substrates. Moreover, these function-altering mutations are frequently observed among naturally occurring variants, suggesting that the enzymes have continuously evolved to become more potent resistance genes.
Affiliation(s)
- John Z Chen
- Michael Smith Laboratories, University of British Columbia, Vancouver, Canada
- Douglas M Fowler
- Department of Genome Sciences, University of Washington, Seattle, United States
- Department of Bioengineering, University of Washington, Seattle, United States
- Nobuhiko Tokuriki
- Michael Smith Laboratories, University of British Columbia, Vancouver, Canada
30
Martínez MA, Jordan-Paiz A, Franco S, Nevot M. Synonymous genome recoding: a tool to explore microbial biology and new therapeutic strategies. Nucleic Acids Res 2019; 47:10506-10519. [PMID: 31584076 PMCID: PMC6846928 DOI: 10.1093/nar/gkz831] [Citation(s) in RCA: 11] [Impact Index Per Article: 2.2] [Received: 08/13/2019] [Revised: 09/12/2019] [Accepted: 09/30/2019] [Indexed: 12/18/2022] Open
Abstract
Synthetic genome recoding is a new means of generating designed organisms with altered phenotypes. Synonymous mutations introduced into the protein coding region tolerate modifications in DNA or mRNA without modifying the encoded proteins. Synonymous genome-wide recoding has allowed the synthetic generation of different small-genome viruses with modified phenotypes and biological properties. Recently, a decreased cost of chemically synthesizing DNA and improved methods for assembling DNA fragments (e.g. lambda red recombination and CRISPR-based editing) have enabled the construction of an Escherichia coli variant with a 4-Mb synthetic synonymously recoded genome with a reduced number of sense codons (n = 59) encoding the 20 canonical amino acids. Synonymous genome recoding is increasing our knowledge of microbial interactions with innate immune responses, identifying functional genome structures, and strategically ameliorating cis-inhibitory signaling sequences related to splicing, replication (in eukaryotes), and complex microbe functions, unraveling the relevance of codon usage for the temporal regulation of gene expression and the microbe mutant spectrum and adaptability. New biotechnological and therapeutic applications of this methodology can easily be envisaged. In this review, we discuss how synonymous genome recoding may impact our knowledge of microbial biology and the development of new and better therapeutic methodologies.
Affiliation(s)
- Miguel Angel Martínez
- IrsiCaixa, Hospital Universitari Germans Trias i Pujol, Universitat Autònoma de Barcelona (UAB), Badalona, Spain
- Ana Jordan-Paiz
- IrsiCaixa, Hospital Universitari Germans Trias i Pujol, Universitat Autònoma de Barcelona (UAB), Badalona, Spain
- Sandra Franco
- IrsiCaixa, Hospital Universitari Germans Trias i Pujol, Universitat Autònoma de Barcelona (UAB), Badalona, Spain
- Maria Nevot
- IrsiCaixa, Hospital Universitari Germans Trias i Pujol, Universitat Autònoma de Barcelona (UAB), Badalona, Spain
31
Russ D, Glaser F, Shaer Tamar E, Yelin I, Baym M, Kelsic ED, Zampaloni C, Haldimann A, Kishony R. Escape mutations circumvent a tradeoff between resistance to a beta-lactam and resistance to a beta-lactamase inhibitor. Nat Commun 2020; 11:2029. [PMID: 32332717 PMCID: PMC7181632 DOI: 10.1038/s41467-020-15666-2] [Citation(s) in RCA: 25] [Impact Index Per Article: 5.0] [Received: 08/24/2019] [Accepted: 03/13/2020] [Indexed: 11/09/2022] Open
Abstract
Beta-lactamase inhibitors are increasingly used to counteract antibiotic resistance mediated by beta-lactamase enzymes. These inhibitors compete with the beta-lactam antibiotic for the same binding site on the beta-lactamase, thus generating an evolutionary tradeoff: mutations that increase the enzyme's beta-lactamase activity tend to increase also its susceptibility to the inhibitor. Here, we investigate how common and accessible are mutants that escape this adaptive tradeoff. Screening a deep mutant library of the blaampC beta-lactamase gene of Escherichia coli, we identified mutations that allow growth at beta-lactam concentrations far exceeding those inhibiting growth of the wildtype strain, even in the presence of the enzyme inhibitor (avibactam). These escape mutations are rare and drug-specific, and some combinations of avibactam with beta-lactam drugs appear to prevent such escape phenotypes. Our results, showing differential adaptive potential of blaampC to combinations of avibactam and different beta-lactam antibiotics, suggest that it may be possible to identify treatments that are more resilient to evolution of resistance.
Affiliation(s)
- Dor Russ
- Faculty of Biology, Technion-Israel Institute of Technology, Haifa, Israel
- Fabian Glaser
- Lorry I. Lokey Interdisciplinary Center for Life Sciences and Engineering, Technion-Israel Institute of Technology, Haifa, Israel
- Einat Shaer Tamar
- Faculty of Biology, Technion-Israel Institute of Technology, Haifa, Israel
- Idan Yelin
- Faculty of Biology, Technion-Israel Institute of Technology, Haifa, Israel
- Michael Baym
- Department of Biomedical Informatics, Harvard Medical School, Boston, MA, USA
- Eric D Kelsic
- Department of Genetics, Harvard Medical School, Boston, MA, USA
- Claudia Zampaloni
- Roche Pharma Research and Early Development, Immunology, Infectious Diseases, and Ophthalmology, Roche Innovation Center Basel, F. Hoffmann-La Roche Ltd, Grenzacherstrasse 124, 4070, Basel, Switzerland
- Andreas Haldimann
- Roche Pharma Research and Early Development, Immunology, Infectious Diseases, and Ophthalmology, Roche Innovation Center Basel, F. Hoffmann-La Roche Ltd, Grenzacherstrasse 124, 4070, Basel, Switzerland
- Roy Kishony
- Faculty of Biology, Technion-Israel Institute of Technology, Haifa, Israel
- Faculty of Computer Science, Technion-Israel Institute of Technology, Haifa, Israel
32
Blazejewski T, Ho HI, Wang HH. Synthetic sequence entanglement augments stability and containment of genetic information in cells. Science 2019; 365:595-598. [PMID: 31395784 DOI: 10.1126/science.aav5477] [Citation(s) in RCA: 40] [Impact Index Per Article: 8.0] [Received: 09/28/2018] [Revised: 06/21/2019] [Accepted: 07/15/2019] [Indexed: 12/28/2022]
Abstract
In synthetic biology, methods for stabilizing genetically engineered functions and confining recombinant DNA to intended hosts are necessary to cope with natural mutation accumulation and pervasive lateral gene flow. We present a generalizable strategy to preserve and constrain genetic information through the computational design of overlapping genes. Overlapping a sequence with an essential gene altered its fitness landscape and produced a constrained evolutionary path, even for synonymous mutations. Embedding a toxin gene in a gene of interest restricted its horizontal propagation. We further demonstrated a multiplex and scalable approach to build and test >7500 overlapping sequence designs, yielding functional yet highly divergent variants from natural homologs. This work enables deeper exploration of natural and engineered overlapping genes and facilitates enhanced genetic stability and biocontainment in emerging applications.
Affiliation(s)
- Tomasz Blazejewski
- Department of Systems Biology, Columbia University, New York, NY, USA
- Integrated Program in Cellular, Molecular, and Biomedical Studies, Columbia University, New York, NY, USA
- Hsing-I Ho
- Department of Systems Biology, Columbia University, New York, NY, USA
- Harris H Wang
- Department of Systems Biology, Columbia University, New York, NY, USA
- Department of Pathology and Cell Biology, Columbia University, New York, NY, USA
33
Moreira MH, Barros GC, Requião RD, Rossetto S, Domitrovic T, Palhano FL. From reporters to endogenous genes: the impact of the first five codons on translation efficiency in Escherichia coli. RNA Biol 2019; 16:1806-1816. [PMID: 31470761 PMCID: PMC6844562 DOI: 10.1080/15476286.2019.1661213] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.2] [Received: 08/01/2019] [Accepted: 08/23/2019] [Indexed: 12/29/2022] Open
Abstract
Translation initiation is a critical step in the regulation of protein synthesis, and it is subject to different control mechanisms, such as 5' UTR secondary structure and initiation codon context, that can influence the rates at which initiation and consequently translation occur. For some genes, translation elongation also affects the rate of protein synthesis. With a GFP library containing nearly all possible combinations of nucleotides from the 3rd to the 5th codon positions in the protein coding region of the mRNA, it was previously demonstrated that some nucleotide combinations increased GFP expression up to four orders of magnitude. While it is clear that the codon region from positions 3 to 5 can influence protein expression levels of artificial constructs, its impact on endogenous proteins is still unknown. Through bioinformatics analysis, we identified the nucleotide combinations of the GFP library in Escherichia coli genes and examined the correlation between the expected levels of translation according to the GFP data and the experimental measures of protein expression. We observed that E. coli genes were enriched with the nucleotide compositions that enhanced protein expression in the GFP library, but surprisingly, it seemed to affect the translation efficiency only marginally. Nevertheless, our data indicate that different enterobacteria present similar nucleotide composition enrichment as E. coli, suggesting an evolutionary pressure towards the conservation of short translational enhancer sequences.
Affiliation(s)
- Mariana H. Moreira
- Programa de Biologia Estrutural, Instituto de Bioquímica Médica Leopoldo de Meis, Universidade Federal do Rio de Janeiro, Rio de Janeiro, Brazil
- Géssica C. Barros
- Programa de Biologia Estrutural, Instituto de Bioquímica Médica Leopoldo de Meis, Universidade Federal do Rio de Janeiro, Rio de Janeiro, Brazil
- Rodrigo D. Requião
- Programa de Biologia Estrutural, Instituto de Bioquímica Médica Leopoldo de Meis, Universidade Federal do Rio de Janeiro, Rio de Janeiro, Brazil
- Silvana Rossetto
- Departamento de Ciência da Computação, Universidade Federal do Rio de Janeiro, Rio de Janeiro, Brazil
- Tatiana Domitrovic
- Departamento de Virologia, Instituto de Microbiologia Paulo de Góes, Universidade Federal do Rio de Janeiro, Rio de Janeiro, Brazil
- Fernando L. Palhano
- Programa de Biologia Estrutural, Instituto de Bioquímica Médica Leopoldo de Meis, Universidade Federal do Rio de Janeiro, Rio de Janeiro, Brazil
34
Esposito D, Weile J, Shendure J, Starita LM, Papenfuss AT, Roth FP, Fowler DM, Rubin AF. MaveDB: an open-source platform to distribute and interpret data from multiplexed assays of variant effect. Genome Biol 2019; 20:223. [PMID: 31679514 PMCID: PMC6827219 DOI: 10.1186/s13059-019-1845-6] [Citation(s) in RCA: 154] [Impact Index Per Article: 25.7] [Received: 03/31/2019] [Accepted: 10/01/2019] [Indexed: 11/10/2022] Open
Abstract
Multiplex assays of variant effect (MAVEs), such as deep mutational scans and massively parallel reporter assays, test thousands of sequence variants in a single experiment. Despite the importance of MAVE data for basic and clinical research, there is no standard resource for their discovery and distribution. Here, we present MaveDB (https://www.mavedb.org), a public repository for large-scale measurements of sequence variant impact, designed for interoperability with applications to interpret these datasets. We also describe the first such application, MaveVis, which retrieves, visualizes, and contextualizes variant effect maps. Together, the database and applications will empower the community to mine these powerful datasets.
Affiliation(s)
- Daniel Esposito
- Bioinformatics Division, The Walter and Eliza Hall Institute of Medical Research, Parkville, VIC, Australia
- Jochen Weile
- The Donnelly Centre, University of Toronto, Toronto, ON, Canada
- Lunenfeld-Tanenbaum Research Institute, Sinai Health System, Toronto, ON, Canada
- Department of Molecular Genetics, University of Toronto, Toronto, ON, Canada
- Department of Computer Science, University of Toronto, Toronto, ON, Canada
- Jay Shendure
- Department of Genome Sciences, University of Washington, Seattle, WA, USA
- Brotman Baty Institute for Precision Medicine, Seattle, WA, USA
- Howard Hughes Medical Institute, University of Washington, Seattle, WA, USA
- Lea M Starita
- Department of Genome Sciences, University of Washington, Seattle, WA, USA
- Brotman Baty Institute for Precision Medicine, Seattle, WA, USA
- Anthony T Papenfuss
- Bioinformatics Division, The Walter and Eliza Hall Institute of Medical Research, Parkville, VIC, Australia
- Department of Medical Biology, University of Melbourne, Melbourne, VIC, Australia
- Bioinformatics and Cancer Genomics Laboratory, Peter MacCallum Cancer Centre, Melbourne, VIC, Australia
- Sir Peter MacCallum Department of Oncology, University of Melbourne, Melbourne, VIC, Australia
- Department of Mathematics and Statistics, University of Melbourne, Melbourne, VIC, Australia
- Frederick P Roth
- The Donnelly Centre, University of Toronto, Toronto, ON, Canada
- Lunenfeld-Tanenbaum Research Institute, Sinai Health System, Toronto, ON, Canada
- Department of Molecular Genetics, University of Toronto, Toronto, ON, Canada
- Department of Computer Science, University of Toronto, Toronto, ON, Canada
- Canadian Institute for Advanced Research, Toronto, ON, Canada
- Douglas M Fowler
- Department of Genome Sciences, University of Washington, Seattle, WA, USA
- Canadian Institute for Advanced Research, Toronto, ON, Canada
- Department of Bioengineering, University of Washington, Seattle, WA, USA
- Alan F Rubin
- Bioinformatics Division, The Walter and Eliza Hall Institute of Medical Research, Parkville, VIC, Australia
- Department of Medical Biology, University of Melbourne, Melbourne, VIC, Australia
- Bioinformatics and Cancer Genomics Laboratory, Peter MacCallum Cancer Centre, Melbourne, VIC, Australia
| |
|
35
|
Konaté MM, Plata G, Park J, Usmanova DR, Wang H, Vitkup D. Molecular function limits divergent protein evolution on planetary timescales. eLife 2019; 8:e39705. [PMID: 31532392 PMCID: PMC6750897 DOI: 10.7554/elife.39705] [Citation(s) in RCA: 17] [Impact Index Per Article: 2.8] [Received: 06/29/2018] [Accepted: 08/07/2019] [Indexed: 01/25/2023] Open
Abstract
Functional conservation is known to constrain protein evolution. Nevertheless, the long-term divergence patterns of proteins maintaining the same molecular function and the possible limits of this divergence have not been explored in detail. We investigate these fundamental questions by characterizing the divergence between ancient protein orthologs with conserved molecular function. Our results demonstrate that the decline of sequence and structural similarities between such orthologs significantly slows down after ~1-2 billion years of independent evolution. As a result, the sequence and structural similarities between ancient orthologs have not substantially decreased for the past billion years. The effective divergence limit (>25% sequence identity) is not primarily due to protein sites universally conserved in all lineages. Instead, less than four amino acid types are accepted, on average, per site across orthologous protein sequences. Our analysis also reveals different divergence patterns for protein sites with experimentally determined small and large fitness effects of mutations. Editorial note: This article has been through an editorial process in which the authors decide how to respond to the issues raised during peer review. The Reviewing Editor's assessment is that all the issues have been addressed (see decision letter).
Affiliation(s)
- Mariam M Konaté
- Department of Systems Biology, Columbia University, New York, United States
- Division of Cancer Treatment and Diagnosis, National Cancer Institute, Bethesda, United States
| | - Germán Plata
- Department of Systems Biology, Columbia University, New York, United States
| | - Jimin Park
- Department of Systems Biology, Columbia University, New York, United States
- Department of Pathology and Cell Biology, Columbia University, New York, United States
| | - Dinara R Usmanova
- Department of Systems Biology, Columbia University, New York, United States
| | - Harris Wang
- Department of Systems Biology, Columbia University, New York, United States
- Department of Pathology and Cell Biology, Columbia University, New York, United States
| | - Dennis Vitkup
- Department of Systems Biology, Columbia University, New York, United States
- Department of Biomedical Informatics, Columbia University, New York, United States
| |
|
36
|
Abstract
Synonymous variations in protein-coding sequences alter protein expression dynamics, which has important implications for cellular physiology and evolutionary fitness, but disentangling the underlying molecular mechanisms remains challenging.
Affiliation(s)
| | - Gregory Boël
- Institut de Biologie Physico-Chemique, CNRS, 75005 Paris, France.
| | - John F Hunt
- Department of Biological Sciences, Columbia University, New York, NY 10024, USA.
| |
|
37
|
Mahajan S, Agashe D. Translational Selection for Speed Is Not Sufficient to Explain Variation in Bacterial Codon Usage Bias. Genome Biol Evol 2018; 10:562-576. [PMID: 29385509 PMCID: PMC5800062 DOI: 10.1093/gbe/evy018] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.4] [Accepted: 01/16/2017] [Indexed: 02/05/2023] Open
Abstract
Increasing growth rate across bacteria strengthens selection for faster translation, concomitantly increasing the total number of tRNA genes and codon usage bias (CUB: enrichment of specific synonymous codons in highly expressed genes). Typically, enriched codons are translated by tRNAs with higher gene copy numbers (GCN). A model of tRNA–CUB coevolution based on fast growth-associated selection on translational speed recapitulates these patterns. A key untested implication of the coevolution model is that translational selection should favor higher tRNA GCN for more frequently used amino acids, potentially weakening the effect of growth-associated selection on CUB. Surprisingly, we find that CUB saturates with increasing growth rate across γ-proteobacteria, even as the number of tRNA genes continues to increase. As predicted, amino acid-specific tRNA GCN is positively correlated with the usage of corresponding amino acids, but there is no correlation between growth rate associated changes in CUB and amino acid usage. Instead, we find that some amino acids—cysteine and those in the NNA/G codon family—show weak CUB that does not increase with growth rate, despite large variation in the corresponding tRNA GCN. We suggest that amino acid-specific variation in CUB is not explained by tRNA GCN because GCN does not influence the difference between translation times of synonymous codons as expected. Thus, selection on translational speed alone cannot fully explain quantitative variation in overall or amino acid-specific CUB, suggesting a significant role for other functional constraints and amino acid-specific codon features.
Affiliation(s)
- Saurabh Mahajan
- National Centre for Biological Sciences, Tata Institute of Fundamental Research, Bangalore, Karnataka, India
| | - Deepa Agashe
- National Centre for Biological Sciences, Tata Institute of Fundamental Research, Bangalore, Karnataka, India
| |
|
38
|
Characterizing posttranslational modifications in prokaryotic metabolism using a multiscale workflow. Proc Natl Acad Sci U S A 2018; 115:11096-11101. [PMID: 30301795 DOI: 10.1073/pnas.1811971115] [Citation(s) in RCA: 33] [Impact Index Per Article: 4.7] [Indexed: 12/11/2022] Open
Abstract
Understanding the complex interactions of protein posttranslational modifications (PTMs) represents a major challenge in metabolic engineering, synthetic biology, and the biomedical sciences. Here, we present a workflow that integrates multiplex automated genome editing (MAGE), genome-scale metabolic modeling, and atomistic molecular dynamics to study the effects of PTMs on metabolic enzymes and microbial fitness. This workflow incorporates complementary approaches across scientific disciplines; provides molecular insight into how PTMs influence cellular fitness during nutrient shifts; and demonstrates how mechanistic details of PTMs can be explored at different biological scales. As a proof of concept, we present a global analysis of PTMs on enzymes in the metabolic network of Escherichia coli. Based on our workflow results, we conduct a more detailed, mechanistic analysis of the PTMs in three proteins: enolase, serine hydroxymethyltransferase, and transaldolase. Application of this workflow identified the roles of specific PTMs in observed experimental phenomena and demonstrated how individual PTMs regulate enzymes, pathways, and, ultimately, cell phenotypes.
|
39
|
Riesselman AJ, Ingraham JB, Marks DS. Deep generative models of genetic variation capture the effects of mutations. Nat Methods 2018; 15:816-822. [PMID: 30250057 DOI: 10.1038/s41592-018-0138-4] [Citation(s) in RCA: 320] [Impact Index Per Article: 45.7] [Received: 05/01/2018] [Accepted: 07/29/2018] [Indexed: 01/05/2023]
Abstract
The functions of proteins and RNAs are defined by the collective interactions of many residues, and yet most statistical models of biological sequences consider sites nearly independently. Recent approaches have demonstrated benefits of including interactions to capture pairwise covariation, but leave higher-order dependencies out of reach. Here we show how it is possible to capture higher-order, context-dependent constraints in biological sequences via latent variable models with nonlinear dependencies. We found that DeepSequence (https://github.com/debbiemarkslab/DeepSequence), a probabilistic model for sequence families, predicted the effects of mutations across a variety of deep mutational scanning experiments substantially better than existing methods based on the same evolutionary data. The model, learned in an unsupervised manner solely on the basis of sequence information, is grounded with biologically motivated priors, reveals the latent organization of sequence families, and can be used to explore new parts of sequence space.
Affiliation(s)
- Adam J Riesselman
- Department of Systems Biology, Harvard Medical School, Boston, MA, USA
- Program in Biomedical Informatics, Harvard Medical School, Boston, MA, USA
| | - John B Ingraham
- Department of Systems Biology, Harvard Medical School, Boston, MA, USA
- Program in Systems Biology, Harvard University, Cambridge, MA, USA
| | - Debora S Marks
- Department of Systems Biology, Harvard Medical School, Boston, MA, USA.
| |
|
40
|
Cambray G, Guimaraes JC, Arkin AP. Evaluation of 244,000 synthetic sequences reveals design principles to optimize translation in Escherichia coli. Nat Biotechnol 2018; 36:1005-1015. [DOI: 10.1038/nbt.4238] [Citation(s) in RCA: 135] [Impact Index Per Article: 19.3] [Received: 11/24/2017] [Accepted: 08/02/2018] [Indexed: 01/01/2023]
|
41
|
Mittal P, Brindle J, Stephen J, Plotkin JB, Kudla G. Codon usage influences fitness through RNA toxicity. Proc Natl Acad Sci U S A 2018; 115:8639-8644. [PMID: 30082392 PMCID: PMC6112741 DOI: 10.1073/pnas.1810022115] [Citation(s) in RCA: 63] [Impact Index Per Article: 9.0] [Indexed: 11/24/2022] Open
Abstract
Many organisms are subject to selective pressure that gives rise to unequal usage of synonymous codons, known as codon bias. To experimentally dissect the mechanisms of selection on synonymous sites, we expressed several hundred synonymous variants of the GFP gene in Escherichia coli, and used quantitative growth and viability assays to estimate bacterial fitness. Unexpectedly, we found many synonymous variants whose expression was toxic to E. coli. Unlike previously studied effects of synonymous mutations, the effect that we discovered is independent of translation, but it depends on the production of toxic mRNA molecules. We identified RNA sequence determinants of toxicity and evolved suppressor strains that can tolerate the expression of toxic GFP variants. Genome sequencing of these suppressor strains revealed a cluster of promoter mutations that prevented toxicity by reducing mRNA levels. We conclude that translation-independent RNA toxicity is a previously unrecognized obstacle in bacterial gene expression.
Affiliation(s)
- Pragya Mittal
- Medical Research Council Human Genetics Unit, Institute of Genetics and Molecular Medicine, University of Edinburgh, EH4 2XU Edinburgh, United Kingdom
| | - James Brindle
- Medical Research Council Human Genetics Unit, Institute of Genetics and Molecular Medicine, University of Edinburgh, EH4 2XU Edinburgh, United Kingdom
| | - Julie Stephen
- Medical Research Council Human Genetics Unit, Institute of Genetics and Molecular Medicine, University of Edinburgh, EH4 2XU Edinburgh, United Kingdom
| | - Joshua B Plotkin
- Department of Biology, University of Pennsylvania, Philadelphia, PA 19104
| | - Grzegorz Kudla
- Medical Research Council Human Genetics Unit, Institute of Genetics and Molecular Medicine, University of Edinburgh, EH4 2XU Edinburgh, United Kingdom
| |
|
42
|
Directed evolution of multiple genomic loci allows the prediction of antibiotic resistance. Proc Natl Acad Sci U S A 2018; 115:E5726-E5735. [PMID: 29871954 PMCID: PMC6016788 DOI: 10.1073/pnas.1801646115] [Citation(s) in RCA: 63] [Impact Index Per Article: 9.0] [Indexed: 12/13/2022] Open
Abstract
Antibiotic development is frequently plagued by the rapid emergence of drug resistance. However, assessing the risk of resistance development in the preclinical stage is difficult. Standard laboratory evolution approaches explore only a small fraction of the sequence space and fail to identify exceedingly rare resistance mutations and combinations thereof. Therefore, new rapid and exhaustive methods are needed to accurately assess the potential of resistance evolution and uncover the underlying mutational mechanisms. Here, we introduce directed evolution with random genomic mutations (DIvERGE), a method that allows an up to million-fold increase in mutation rate along the full lengths of multiple predefined loci in a range of bacterial species. In a single day, DIvERGE generated specific mutation combinations, yielding clinically significant resistance against trimethoprim and ciprofloxacin. Many of these mutations have remained previously undetected or provide resistance in a species-specific manner. These results indicate pathogen-specific resistance mechanisms and the necessity of future narrow-spectrum antibacterial treatments. In contrast to prior claims, we detected the rapid emergence of resistance against gepotidacin, a novel antibiotic currently in clinical trials. Based on these properties, DIvERGE could be applicable to identify less resistance-prone antibiotics at an early stage of drug development. Finally, we discuss potential future applications of DIvERGE in synthetic and evolutionary biology.
|
43
|
Park J, Wang HH. Systematic and synthetic approaches to rewire regulatory networks. Current Opinion in Systems Biology 2018; 8:90-96. [PMID: 30637352 PMCID: PMC6329604 DOI: 10.1016/j.coisb.2017.12.009] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.0] [Indexed: 12/11/2022]
Abstract
Microbial gene regulatory networks are composed of cis- and trans-components that in concert act to control essential and adaptive cellular functions. Regulatory components and interactions evolve to adopt new configurations through mutations and network rewiring events, resulting in novel phenotypes that may benefit the cell. Advances in high-throughput DNA synthesis and sequencing have enabled the development of new tools and approaches to better characterize and perturb various elements of regulatory networks. Here, we highlight key recent approaches to systematically dissect the sequence space of cis-regulatory elements and trans-regulators as well as their inter-connections. These efforts yield fundamental insights into the architecture, robustness, and dynamics of gene regulation and provide models and design principles for building synthetic regulatory networks for a variety of practical applications.
Affiliation(s)
- Jimin Park
- Department of Systems Biology, Columbia University Medical Center, New York, USA
- Integrated Program in Cellular, Molecular and Biomedical Studies, Columbia University Medical Center, New York, USA
| | - Harris H Wang
- Department of Systems Biology, Columbia University Medical Center, New York, USA
- Department of Pathology and Cell Biology, Columbia University Medical Center, New York, USA
| |
|
44
|
Mauro VP, Chappell SA. Considerations in the Use of Codon Optimization for Recombinant Protein Expression. Methods Mol Biol 2018; 1850:275-288. [PMID: 30242693 DOI: 10.1007/978-1-4939-8730-6_18] [Citation(s) in RCA: 15] [Impact Index Per Article: 2.1] [Indexed: 12/13/2022]
Abstract
Codon optimization is a gene engineering approach that is commonly used for enhancing recombinant protein expression. This approach is possible because (1) degeneracy of the genetic code enables most amino acids to be encoded by multiple codons and (2) different mRNAs encoding the same protein can vary dramatically in the amount of protein expressed. However, because codon optimization potentially disrupts overlapping information encoded in mRNA coding regions, protein structure and function may be altered. This chapter discusses the use of codon optimization for various applications in mammalian cells as well as potential consequences, so that informed decisions can be made on the appropriateness of using this approach in each case.
|
45
|
Lau YH, Stirling F, Kuo J, Karrenbelt MAP, Chan YA, Riesselman A, Horton CA, Schäfer E, Lips D, Weinstock MT, Gibson DG, Way JC, Silver PA. Large-scale recoding of a bacterial genome by iterative recombineering of synthetic DNA. Nucleic Acids Res 2017; 45:6971-6980. [PMID: 28499033 PMCID: PMC5499800 DOI: 10.1093/nar/gkx415] [Citation(s) in RCA: 45] [Impact Index Per Article: 5.6] [Received: 03/15/2017] [Accepted: 05/02/2017] [Indexed: 01/02/2023] Open
Abstract
The ability to rewrite large stretches of genomic DNA enables the creation of new organisms with customized functions. However, few methods currently exist for accumulating such widespread genomic changes in a single organism. In this study, we demonstrate a rapid approach for rewriting bacterial genomes with modified synthetic DNA. We recode 200 kb of the Salmonella typhimurium LT2 genome through a process we term SIRCAS (stepwise integration of rolling circle amplified segments), towards constructing an attenuated and genetically isolated bacterial chassis. The SIRCAS process involves direct iterative recombineering of 10–25 kb synthetic DNA constructs which are assembled in yeast and amplified by rolling circle amplification. Using SIRCAS, we create a Salmonella with 1557 synonymous leucine codon replacements across 176 genes, the largest number of cumulative recoding changes in a single bacterial strain to date. We demonstrate reproducibility over sixteen two-day cycles of integration and parallelization for hierarchical construction of a synthetic genome by conjugation. The resulting recoded strain grows at a similar rate to the wild-type strain and does not exhibit any major growth defects. This work is the first instance of synthetic bacterial recoding beyond the Escherichia coli genome, and reveals that Salmonella is remarkably amenable to genome-scale modification.
Affiliation(s)
- Yu Heng Lau
- Wyss Institute for Biologically Inspired Engineering, Harvard University, 3 Blackfan Circle, 5th Floor, Boston, MA 02115, USA
- Department of Systems Biology, Harvard Medical School, 200 Longwood Avenue, Alpert 536, Boston, MA 02115, USA
| | - Finn Stirling
- Wyss Institute for Biologically Inspired Engineering, Harvard University, 3 Blackfan Circle, 5th Floor, Boston, MA 02115, USA
- Department of Systems Biology, Harvard Medical School, 200 Longwood Avenue, Alpert 536, Boston, MA 02115, USA
| | - James Kuo
- Wyss Institute for Biologically Inspired Engineering, Harvard University, 3 Blackfan Circle, 5th Floor, Boston, MA 02115, USA
- Department of Systems Biology, Harvard Medical School, 200 Longwood Avenue, Alpert 536, Boston, MA 02115, USA
| | - Michiel A P Karrenbelt
- Wyss Institute for Biologically Inspired Engineering, Harvard University, 3 Blackfan Circle, 5th Floor, Boston, MA 02115, USA
- Systems and Synthetic Biology, Wageningen University, Dreijenplein 10, 6703 HB Wageningen, The Netherlands
| | - Yujia A Chan
- Wyss Institute for Biologically Inspired Engineering, Harvard University, 3 Blackfan Circle, 5th Floor, Boston, MA 02115, USA
- Department of Systems Biology, Harvard Medical School, 200 Longwood Avenue, Alpert 536, Boston, MA 02115, USA
| | - Adam Riesselman
- Program in Biomedical Informatics, Harvard Medical School, Boston, MA 02115, USA
| | - Connor A Horton
- Wyss Institute for Biologically Inspired Engineering, Harvard University, 3 Blackfan Circle, 5th Floor, Boston, MA 02115, USA
- Department of Systems Biology, Harvard Medical School, 200 Longwood Avenue, Alpert 536, Boston, MA 02115, USA
| | - Elena Schäfer
- Wyss Institute for Biologically Inspired Engineering, Harvard University, 3 Blackfan Circle, 5th Floor, Boston, MA 02115, USA
- Department of Systems Biology, Harvard Medical School, 200 Longwood Avenue, Alpert 536, Boston, MA 02115, USA
| | - David Lips
- Wyss Institute for Biologically Inspired Engineering, Harvard University, 3 Blackfan Circle, 5th Floor, Boston, MA 02115, USA
- Department of Systems Biology, Harvard Medical School, 200 Longwood Avenue, Alpert 536, Boston, MA 02115, USA
| | - Matthew T Weinstock
- Synthetic Genomics, Inc., 11149 North Torrey Pines Road, La Jolla, CA 92037, USA
| | - Daniel G Gibson
- Synthetic Genomics, Inc., 11149 North Torrey Pines Road, La Jolla, CA 92037, USA
- Synthetic Biology and Bioenergy Group, J. Craig Venter Institute, 4120 Capricorn Lane, La Jolla, CA 92037, USA
| | - Jeffrey C Way
- Wyss Institute for Biologically Inspired Engineering, Harvard University, 3 Blackfan Circle, 5th Floor, Boston, MA 02115, USA
- Department of Systems Biology, Harvard Medical School, 200 Longwood Avenue, Alpert 536, Boston, MA 02115, USA
| | - Pamela A Silver
- Wyss Institute for Biologically Inspired Engineering, Harvard University, 3 Blackfan Circle, 5th Floor, Boston, MA 02115, USA
- Department of Systems Biology, Harvard Medical School, 200 Longwood Avenue, Alpert 536, Boston, MA 02115, USA
| |
|
46
|
Burkhardt DH, Rouskin S, Zhang Y, Li GW, Weissman JS, Gross CA. Operon mRNAs are organized into ORF-centric structures that predict translation efficiency. eLife 2017; 6. [PMID: 28139975 PMCID: PMC5318159 DOI: 10.7554/elife.22037] [Citation(s) in RCA: 112] [Impact Index Per Article: 14.0] [Received: 10/03/2016] [Accepted: 01/27/2017] [Indexed: 02/02/2023] Open
Abstract
Bacterial mRNAs are organized into operons consisting of discrete open reading frames (ORFs) in a single polycistronic mRNA. Individual ORFs on the mRNA are differentially translated, with rates varying as much as 100-fold. The signals controlling differential translation are poorly understood. Our genome-wide mRNA secondary structure analysis indicated that operonic mRNAs are comprised of ORF-wide units of secondary structure that vary across ORF boundaries such that adjacent ORFs on the same mRNA molecule are structurally distinct. ORF translation rate is strongly correlated with its mRNA structure in vivo, and correlation persists, albeit in a reduced form, with its structure when translation is inhibited and with that of in vitro refolded mRNA. These data suggest that intrinsic ORF mRNA structure encodes a rough blueprint for translation efficiency. This structure is then amplified by translation, in a self-reinforcing loop, to provide the structure that ultimately specifies the translation of each ORF.
Proteins make up much of the biological machinery inside cells and perform the essential tasks needed to keep each cell alive. Cells contain thousands of different proteins and the instructions needed to build each protein are encoded in genes. However, these instructions cannot be used directly to manufacture the proteins. Instead, a messenger molecule called mRNA is needed to carry the information stored within genes to the parts of the cell where proteins are made. In bacteria, one mRNA molecule can include information from several genes. This group of genes is called an operon and produces a set of proteins that perform a shared task. Although these proteins work together, some of them are needed in greater numbers than others. Because they are all made using information from the same mRNA, some instructions on the mRNA must be read more times than others.
It is unclear how bacterial cells control how many proteins are produced from each part of one mRNA but it is thought to relate to the three-dimensional shape of the molecule itself. Burkhardt, Rouskin, Zhang et al. have now examined the production of proteins from mRNAs in the commonly studied bacterium, Escherichia coli. The results showed that each set of instructions on the mRNA formed a three-dimensional structure that corresponds to the amount of protein produced from that portion of the mRNA. When this three-dimensional structure is more stable or rigid, the corresponding instructions tended to produce fewer proteins than if the structure was relatively simple and unstable. Further investigation showed that these three-dimensional mRNA structures could form spontaneously outside of cells, suggesting that molecules other than the mRNA itself have a relatively small role in controlling the number of proteins produced. This also suggests that the entire structure of each mRNA is important and is likely to be essential for cell survival. The next step is to understand why bacteria organise their genes in this way and how the different mRNA structures control how proteins are produced. Moreover, because many bacteria are used like biological factories to produce a variety of commercially useful molecules, these new insights have the potential to enhance a number of manufacturing processes.
Affiliation(s)
- David H Burkhardt
- Graduate Group in Biophysics, University of California, San Francisco, San Francisco, United States
- Department of Microbiology and Immunology, University of California, San Francisco, San Francisco, United States
- California Institute of Quantitative Biology, University of California, San Francisco, San Francisco, United States
| | - Silvi Rouskin
- California Institute of Quantitative Biology, University of California, San Francisco, San Francisco, United States
- Department of Cellular and Molecular Pharmacology, Howard Hughes Medical Institute, University of California, San Francisco, San Francisco, United States
- Center for RNA Systems Biology, University of California, San Francisco, San Francisco, United States
| | - Yan Zhang
- Department of Microbiology and Immunology, University of California, San Francisco, San Francisco, United States
- Department of Cell and Tissue Biology, University of California, San Francisco, San Francisco, United States
| | - Gene-Wei Li
- California Institute of Quantitative Biology, University of California, San Francisco, San Francisco, United States
- Department of Cellular and Molecular Pharmacology, Howard Hughes Medical Institute, University of California, San Francisco, San Francisco, United States
- Center for RNA Systems Biology, University of California, San Francisco, San Francisco, United States
| | - Jonathan S Weissman
- California Institute of Quantitative Biology, University of California, San Francisco, San Francisco, United States
- Department of Cellular and Molecular Pharmacology, Howard Hughes Medical Institute, University of California, San Francisco, San Francisco, United States
- Center for RNA Systems Biology, University of California, San Francisco, San Francisco, United States
| | - Carol A Gross
- Department of Microbiology and Immunology, University of California, San Francisco, San Francisco, United States
- California Institute of Quantitative Biology, University of California, San Francisco, San Francisco, United States
- Department of Cell and Tissue Biology, University of California, San Francisco, San Francisco, United States
| |
|