1
|
Cetin E, Abdizadeh H, Atilgan AR, Atilgan C. A Thermodynamic Cycle to Predict the Competitive Inhibition Outcomes of an Evolving Enzyme. J Chem Theory Comput 2025. [PMID: 40268874 DOI: 10.1021/acs.jctc.5c00193] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 04/25/2025]
Abstract
Understanding competitive inhibition at the molecular level is essential for unraveling the dynamics of enzyme-inhibitor interactions and predicting the evolutionary outcomes of resistance mutations. In this study, we present a framework linking competitive inhibition to alchemical free energy perturbation (FEP) calculations, focusing on Escherichia coli dihydrofolate reductase (DHFR) and its inhibition by trimethoprim (TMP). Using thermodynamic cycles, we relate experimentally measured binding constants (Ki and Km) to free energy differences associated with wild-type and mutant forms of DHFR with a mean error of 0.9 kcal/mol, providing insight into the molecular underpinnings of TMP resistance. Our findings highlight the importance of local conformational dynamics in competitive inhibition. Mutations in DHFR affect substrate and inhibitor binding affinities differently, influencing the fitness landscape under selective pressure from TMP. Our FEP simulations reveal that resistance mutations stabilize inhibitor-bound or substrate-bound states through specific structural and/or dynamical effects. The interplay of these effects showcases significant molecular-level epistasis in certain cases. The ability to separately assess substrate and inhibitor binding provides valuable insights, allowing for a more precise interpretation of mutation effects and epistatic interactions. Furthermore, we identify key challenges in FEP simulations, including convergence issues arising from charge-changing mutations and long-range allosteric effects. By integrating computational and experimental data, we provide an effective approach for predicting the functional impact of resistance mutations and their contributions to evolutionary fitness landscapes. These insights pave the way for constructing robust mutational scanning protocols and designing more effective therapeutic strategies against resistant bacterial strains.
Collapse
Affiliation(s)
- Ebru Cetin
- Faculty of Engineering and Natural Sciences, Sabanci University, 34956 Istanbul, Türkiye
| | - Haleh Abdizadeh
- Faculty of Engineering and Natural Sciences, Sabanci University, 34956 Istanbul, Türkiye
| | - Ali Rana Atilgan
- Faculty of Engineering and Natural Sciences, Sabanci University, 34956 Istanbul, Türkiye
| | - Canan Atilgan
- Faculty of Engineering and Natural Sciences, Sabanci University, 34956 Istanbul, Türkiye
| |
Collapse
|
2
|
Posfai A, Zhou J, McCandlish DM, Kinney JB. Gauge fixing for sequence-function relationships. PLoS Comput Biol 2025; 21:e1012818. [PMID: 40111986 PMCID: PMC11957564 DOI: 10.1371/journal.pcbi.1012818] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/16/2024] [Accepted: 01/22/2025] [Indexed: 03/22/2025] Open
Abstract
Quantitative models of sequence-function relationships are ubiquitous in computational biology, e.g., for modeling the DNA binding of transcription factors or the fitness landscapes of proteins. Interpreting these models, however, is complicated by the fact that the values of model parameters can often be changed without affecting model predictions. Before the values of model parameters can be meaningfully interpreted, one must remove these degrees of freedom (called "gauge freedoms" in physics) by imposing additional constraints (a process called "fixing the gauge"). However, strategies for fixing the gauge of sequence-function relationships have received little attention. Here we derive an analytically tractable family of gauges for a large class of sequence-function relationships. These gauges are derived in the context of models with all-order interactions, but an important subset of these gauges can be applied to diverse types of models, including additive models, pairwise-interaction models, and models with higher-order interactions. Many commonly used gauges are special cases of gauges within this family. We demonstrate the utility of this family of gauges by showing how different choices of gauge can be used both to explore complex activity landscapes and to reveal simplified models that are approximately correct within localized regions of sequence space. The results provide practical gauge-fixing strategies and demonstrate the utility of gauge-fixing for model exploration and interpretation.
Collapse
Affiliation(s)
- Anna Posfai
- Simons Center for Quantitative Biology, Cold Spring Harbor Laboratory, Cold Spring Harbor, New York, United States of America
| | - Juannan Zhou
- Simons Center for Quantitative Biology, Cold Spring Harbor Laboratory, Cold Spring Harbor, New York, United States of America
- Department of Biology, University of Florida, Gainesville, Florida, United States of America
| | - David M. McCandlish
- Simons Center for Quantitative Biology, Cold Spring Harbor Laboratory, Cold Spring Harbor, New York, United States of America
| | - Justin B. Kinney
- Simons Center for Quantitative Biology, Cold Spring Harbor Laboratory, Cold Spring Harbor, New York, United States of America
| |
Collapse
|
3
|
Li Z, Luo Y. Rewiring protein sequence and structure generative models to enhance protein stability prediction. BIORXIV : THE PREPRINT SERVER FOR BIOLOGY 2025:2025.02.13.638154. [PMID: 40027759 PMCID: PMC11870403 DOI: 10.1101/2025.02.13.638154] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Subscribe] [Scholar Register] [Indexed: 03/05/2025]
Abstract
Predicting changes in protein thermostability due to amino acid substitutions is essential for understanding human diseases and engineering useful proteins for clinical and industrial applications. While recent advances in protein generative models, which learn probability distributions over amino acids conditioned on structural or evolutionary sequence contexts, have shown impressive performance in predicting various protein properties without task-specific training, their strong unsupervised prediction ability does not extend to all protein functions. In particular, their potential to improve protein stability prediction remains underexplored. In this work, we present SPURS, a novel deep learning framework that adapts and integrates two general-purpose protein generative models-a protein language model (ESM) and an inverse folding model (ProteinMPNN)-into an effective stability predictor. SPURS employs a lightweight neural network module to rewire per-residue structure representations learned by ProteinMPNN into the attention layers of ESM, thereby informing and enhancing ESM's sequence representation learning. This rewiring strategy enables SPURS to harness evolutionary patterns from both sequence and structure data, where the sequence like-lihood distribution learned by ESM is conditioned on structure priors encoded by ProteinMPNN to predict mutation effects. We steer this integrated framework to a stability prediction model through supervised training on a recently released mega-scale thermostability dataset. Evaluations across 12 benchmark datasets showed that SPURS delivers accurate, rapid, scalable, and generalizable stability predictions, consistently outperforming current state-of-the-art methods. Notably, SPURS demonstrates remarkable versatility in protein stability and function analyses: when combined with a protein language model, it accurately identifies protein functional sites in an unsupervised manner. Additionally, it enhances current low- N protein fitness prediction models by serving as a stability prior model to improve accuracy. These results highlight SPURS as a powerful tool to advance current protein stability prediction and machine learning-guided protein engineering workflows. The source code of SPURS is available at https://github.com/luo-group/SPURS .
Collapse
|
4
|
Cetin E, Abdizadeh H, Atilgan AR, Atilgan C. A thermodynamic cycle to predict the competitive inhibition outcomes of an evolving enzyme. BIORXIV : THE PREPRINT SERVER FOR BIOLOGY 2025:2025.02.03.636225. [PMID: 39975265 PMCID: PMC11838402 DOI: 10.1101/2025.02.03.636225] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 02/21/2025]
Abstract
Understanding competitive inhibition at the molecular level is essential for unraveling the dynamics of enzyme-inhibitor interactions and predicting the evolutionary outcomes of resistance mutations. In this study, we present a framework linking competitive inhibition to alchemical free energy perturbation (FEP) calculations, focusing on E. coli dihydrofolate reductase (DHFR) and its inhibition by trimethoprim (TMP). Using thermodynamic cycles, we relate experimentally measured binding constants ( K i and K m ) to free energy differences associated with wild-type and mutant forms of DHFR with a mean error of 0.9 kcal/mol, providing insights into the molecular underpinnings of TMP resistance. Our findings highlight the importance of local conformational dynamics in competitive inhibition. Mutations in DHFR affect substrate and inhibitor binding affinities differently, influencing the fitness landscape under selective pressure from TMP. Our FEP simulations reveal that resistance mutations stabilize inhibitor-bound or substrate-bound states through specific structural and/or dynamical effects. The interplay of these effects showcases significant epistasis in certain cases. The ability to separately assess substrate and inhibitor binding provides valuable insights, allowing for a more precise interpretation of mutation effects and epistatic interactions. Furthermore, we identify key challenges in FEP simulations, including convergence issues arising from charge-changing mutations and long-range allosteric effects. By integrating computational and experimental data, we provide an effective approach for predicting the functional impact of resistance mutations and their contributions to evolutionary fitness landscapes. These insights pave the way for constructing robust mutational scanning protocols and designing more effective therapeutic strategies against resistant bacterial strains.
Collapse
|
5
|
Vila JA. The origin of mutational epistasis. EUROPEAN BIOPHYSICS JOURNAL : EBJ 2024; 53:473-480. [PMID: 39443382 DOI: 10.1007/s00249-024-01725-9] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Received: 06/20/2024] [Revised: 10/03/2024] [Accepted: 10/06/2024] [Indexed: 10/25/2024]
Abstract
The interconnected processes of protein folding, mutations, epistasis, and evolution have all been the subject of extensive analysis throughout the years due to their significance for structural and evolutionary biology. The origin (molecular basis) of epistasis-the non-additive interactions between mutations-is still, nonetheless, unknown. The existence of a new perspective on protein folding, a problem that needs to be conceived as an 'analytic whole', will enable us to shed light on the origin of mutational epistasis at the simplest level-within proteins-while also uncovering the reasons why the genetic background in which they occur, a key component of molecular evolution, could foster changes in epistasis effects. Additionally, because mutations are the source of epistasis, more research is needed to determine the impact of post-translational modifications, which can potentially increase the proteome's diversity by several orders of magnitude, on mutational epistasis and protein evolvability. Finally, a protein evolution thermodynamic-based analysis that does not consider specific mutational steps or epistasis effects will be briefly discussed. Our study explores the complex processes behind the evolution of proteins upon mutations, clearing up some previously unresolved issues, and providing direction for further research.
Collapse
Affiliation(s)
- Jorge A Vila
- IMASL-CONICET, Ejército de Los Andes 950, 5700, San Luis, Argentina.
| |
Collapse
|
6
|
Faure AJ, Martí-Aranda A, Hidalgo-Carcedo C, Beltran A, Schmiedel JM, Lehner B. The genetic architecture of protein stability. Nature 2024; 634:995-1003. [PMID: 39322666 PMCID: PMC11499273 DOI: 10.1038/s41586-024-07966-0] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/27/2023] [Accepted: 08/20/2024] [Indexed: 09/27/2024]
Abstract
There are more ways to synthesize a 100-amino acid (aa) protein (20100) than there are atoms in the universe. Only a very small fraction of such a vast sequence space can ever be experimentally or computationally surveyed. Deep neural networks are increasingly being used to navigate high-dimensional sequence spaces1. However, these models are extremely complicated. Here, by experimentally sampling from sequence spaces larger than 1010, we show that the genetic architecture of at least some proteins is remarkably simple, allowing accurate genetic prediction in high-dimensional sequence spaces with fully interpretable energy models. These models capture the nonlinear relationships between free energies and phenotypes but otherwise consist of additive free energy changes with a small contribution from pairwise energetic couplings. These energetic couplings are sparse and associated with structural contacts and backbone proximity. Our results indicate that protein genetics is actually both rather simple and intelligible.
Collapse
Affiliation(s)
- Andre J Faure
- Centre for Genomic Regulation (CRG), The Barcelona Institute of Science and Technology, Barcelona, Spain.
- ALLOX, Barcelona, Spain.
| | - Aina Martí-Aranda
- Centre for Genomic Regulation (CRG), The Barcelona Institute of Science and Technology, Barcelona, Spain
- Wellcome Sanger Institute, Wellcome Genome Campus, Hinxton, UK
| | - Cristina Hidalgo-Carcedo
- Centre for Genomic Regulation (CRG), The Barcelona Institute of Science and Technology, Barcelona, Spain
| | - Antoni Beltran
- Centre for Genomic Regulation (CRG), The Barcelona Institute of Science and Technology, Barcelona, Spain
| | - Jörn M Schmiedel
- Centre for Genomic Regulation (CRG), The Barcelona Institute of Science and Technology, Barcelona, Spain
- factorize.bio, Berlin, Germany
| | - Ben Lehner
- Centre for Genomic Regulation (CRG), The Barcelona Institute of Science and Technology, Barcelona, Spain.
- Wellcome Sanger Institute, Wellcome Genome Campus, Hinxton, UK.
- Universitat Pompeu Fabra (UPF), Barcelona, Spain.
- Institució Catalana de Recerca i Estudis Avançats (ICREA), Barcelona, Spain.
| |
Collapse
|
7
|
Fernandez-de-Cossio-Diaz J, Uguzzoni G, Ricard K, Anselmi F, Nizak C, Pagnani A, Rivoire O. Inference and design of antibody specificity: From experiments to models and back. PLoS Comput Biol 2024; 20:e1012522. [PMID: 39401247 PMCID: PMC11501025 DOI: 10.1371/journal.pcbi.1012522] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/08/2024] [Revised: 10/24/2024] [Accepted: 09/29/2024] [Indexed: 10/26/2024] Open
Abstract
Exquisite binding specificity is essential for many protein functions but is difficult to engineer. Many biotechnological or biomedical applications require the discrimination of very similar ligands, which poses the challenge of designing protein sequences with highly specific binding profiles. Experimental methods for generating specific binders rely on in vitro selection, which is limited in terms of library size and control over specificity profiles. Additional control was recently demonstrated through high-throughput sequencing and downstream computational analysis. Here we follow such an approach to demonstrate the design of specific antibodies beyond those probed experimentally. We do so in a context where very similar epitopes need to be discriminated, and where these epitopes cannot be experimentally dissociated from other epitopes present in the selection. Our approach involves the identification of different binding modes, each associated with a particular ligand against which the antibodies are either selected or not. Using data from phage display experiments, we show that the model successfully disentangles these modes, even when they are associated with chemically very similar ligands. Additionally, we demonstrate and validate experimentally the computational design of antibodies with customized specificity profiles, either with specific high affinity for a particular target ligand, or with cross-specificity for multiple target ligands. Overall, our results showcase the potential of leveraging a biophysical model learned from selections against multiple ligands to design proteins with tailored specificity, with applications to protein engineering extending beyond the design of antibodies.
Collapse
Affiliation(s)
- Jorge Fernandez-de-Cossio-Diaz
- Laboratoire de physique de l’Ecole normale supérieure, CNRS, PSL University, Sorbonne Université, Université Paris-Cité, Paris, France
| | - Guido Uguzzoni
- Italian Institute for Genomic Medicine, IRCCS Candiolo, Candiolo, Italy
| | - Kévin Ricard
- Center for Interdisciplinary Research in Biology (CIRB), Collège de France, CNRS, INSERM, Université PSL, Paris, France
| | - Francesca Anselmi
- Italian Institute for Genomic Medicine, IRCCS Candiolo, Candiolo, Italy
- Department of Life Sciences and Systems Biology & Molecular Biotechnology Center - MBC, Universita di Torino, Via Nizza, Torino, Italy
| | - Clément Nizak
- Sorbonne Université, CNRS, Institut de Biologie Paris-Seine, Laboratoire Jean Perrin, Paris, France
| | - Andrea Pagnani
- Italian Institute for Genomic Medicine, IRCCS Candiolo, Candiolo, Italy
- DISAT, Politecnico di Torino, Corso Duca degli Abruzzi, Torino, Italy
- INFN, Sezione di Torino, Torino, Via Pietro Giuria, Torino Italy
| | - Olivier Rivoire
- Center for Interdisciplinary Research in Biology (CIRB), Collège de France, CNRS, INSERM, Université PSL, Paris, France
- Gulliver, CNRS, ESPCI Paris, Université PSL, Paris, France
| |
Collapse
|
8
|
Di Bari L, Bisardi M, Cotogno S, Weigt M, Zamponi F. Emergent time scales of epistasis in protein evolution. Proc Natl Acad Sci U S A 2024; 121:e2406807121. [PMID: 39325427 PMCID: PMC11459137 DOI: 10.1073/pnas.2406807121] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/04/2024] [Accepted: 08/17/2024] [Indexed: 09/27/2024] Open
Abstract
We introduce a data-driven epistatic model of protein evolution, capable of generating evolutionary trajectories spanning very different time scales reaching from individual mutations to diverged homologs. Our in silico evolution encompasses random nucleotide mutations, insertions and deletions, and models selection using a fitness landscape, which is inferred via a generative probabilistic model for protein families. We show that the proposed framework accurately reproduces the sequence statistics of both short-time (experimental) and long-time (natural) protein evolution, suggesting applicability also to relatively data-poor intermediate evolutionary time scales, which are currently inaccessible to evolution experiments. Our model uncovers a highly collective nature of epistasis, gradually changing the fitness effect of mutations in a diverging sequence context, rather than acting via strong interactions between individual mutations. This collective nature triggers the emergence of a long evolutionary time scale, separating fast mutational processes inside a given sequence context, from the slow evolution of the context itself. The model quantitatively reproduces epistatic phenomena such as contingency and entrenchment, as well as the loss of predictability in protein evolution observed in deep mutational scanning experiments of distant homologs. It thereby deepens our understanding of the interplay between mutation and selection in shaping protein diversity and functions, allows one to statistically forecast evolution, and challenges the prevailing independent-site models of protein evolution, which are unable to capture the fundamental importance of epistasis.
Collapse
Affiliation(s)
- Leonardo Di Bari
- Dipartimento Scienza Applicata e Tecnologia, Politecnico di Torino, I-10129Torino, Italy
| | - Matteo Bisardi
- Sorbonne Université, CNRS, Institut de Biologie Paris-Seine, Laboratoire de Biologie Computationnelle et Quantitative, ParisF-75005, France
| | - Sabrina Cotogno
- Sorbonne Université, CNRS, Institut de Biologie Paris-Seine, Laboratoire de Biologie Computationnelle et Quantitative, ParisF-75005, France
| | - Martin Weigt
- Sorbonne Université, CNRS, Institut de Biologie Paris-Seine, Laboratoire de Biologie Computationnelle et Quantitative, ParisF-75005, France
| | - Francesco Zamponi
- Dipartimento di Fisica, Sapienza Università di Roma, 00185Rome, Italy
| |
Collapse
|
9
|
Li P, Liu ZP. MuToN Quantifies Binding Affinity Changes upon Protein Mutations by Geometric Deep Learning. ADVANCED SCIENCE (WEINHEIM, BADEN-WURTTEMBERG, GERMANY) 2024; 11:e2402918. [PMID: 38995072 PMCID: PMC11425207 DOI: 10.1002/advs.202402918] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Received: 03/20/2024] [Revised: 06/04/2024] [Indexed: 07/13/2024]
Abstract
Assessing changes in protein-protein binding affinity due to mutations helps understanding a wide range of crucial biological processes within cells. Despite significant efforts to create accurate computational models, predicting how mutations affect affinity remains challenging due to the complexity of the biological mechanisms involved. In the present work, a geometric deep learning framework called MuToN is introduced for quantifying protein binding affinity change upon residue mutations. The method, designed with geometric attention networks, is mechanism-aware. It captures changes in the protein binding interfaces of mutated complexes and assesses the allosteric effects of amino acids. Experimental results highlight MuToN's superiority compared to existing methods. Additionally, MuToN's flexibility and effectiveness are illustrated by its precise predictions of binding affinity changes between SARS-CoV-2 variants and the ACE2 complex.
Collapse
Affiliation(s)
- Pengpai Li
- Department of Biomedical Engineering, School of Control Science and Engineering, Shandong University, Jinan, Shandong, 250061, China
| | - Zhi-Ping Liu
- Department of Biomedical Engineering, School of Control Science and Engineering, Shandong University, Jinan, Shandong, 250061, China
| |
Collapse
|
10
|
Guclu TF, Atilgan AR, Atilgan C. Deciphering GB1's Single Mutational Landscape: Insights from MuMi Analysis. J Phys Chem B 2024; 128:7987-7996. [PMID: 39115184 PMCID: PMC11671028 DOI: 10.1021/acs.jpcb.4c04916] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/22/2024] [Revised: 08/02/2024] [Accepted: 08/02/2024] [Indexed: 08/23/2024]
Abstract
Mutational changes that affect the binding of the C2 fragment of Streptococcal protein G (GB1) to the Fc domain of human IgG (IgG-Fc) have been extensively studied using deep mutational scanning (DMS), and the binding affinity of all single mutations has been measured experimentally in the literature. To investigate the underlying molecular basis, we perform in silico mutational scanning for all possible single mutations, along with 2 μs-long molecular dynamics (WT-MD) of the wild-type (WT) GB1 in both unbound and IgG-Fc bound forms. We compute the hydrogen bonds between GB1 and IgG-Fc in WT-MD to identify the dominant hydrogen bonds for binding, which we then assess in conformations produced by Mutation and Minimization (MuMi) to explain the fitness landscape of GB1 and IgG-Fc binding. Furthermore, we analyze MuMi and WT-MD to investigate the dynamics of binding, focusing on the relative solvent accessibility of residues and the probability of residues being located at the binding interface. With these analyses, we explain the interactions between GB1 and IgG-Fc and display the structural features of binding. In sum, our findings highlight the potential of MuMi as a reliable and computationally efficient tool for predicting protein fitness landscapes, offering significant advantages over traditional methods. The methodologies and results presented in this study pave the way for improved predictive accuracy in protein stability and interaction studies, which are crucial for advancements in drug design and synthetic biology.
Collapse
Affiliation(s)
- Tandac F. Guclu
- Faculty of Natural Sciences
and Engineering, Sabanci University, Tuzla, Istanbul 34956, Turkey
| | - Ali Rana Atilgan
- Faculty of Natural Sciences
and Engineering, Sabanci University, Tuzla, Istanbul 34956, Turkey
| | - Canan Atilgan
- Faculty of Natural Sciences
and Engineering, Sabanci University, Tuzla, Istanbul 34956, Turkey
| |
Collapse
|
11
|
Posfai A, Zhou J, McCandlish DM, Kinney JB. Gauge fixing for sequence-function relationships. BIORXIV : THE PREPRINT SERVER FOR BIOLOGY 2024:2024.05.12.593772. [PMID: 38798671 PMCID: PMC11118547 DOI: 10.1101/2024.05.12.593772] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 05/29/2024]
Abstract
Quantitative models of sequence-function relationships are ubiquitous in computational biology, e.g., for modeling the DNA binding of transcription factors or the fitness landscapes of proteins. Interpreting these models, however, is complicated by the fact that the values of model parameters can often be changed without affecting model predictions. Before the values of model parameters can be meaningfully interpreted, one must remove these degrees of freedom (called "gauge freedoms" in physics) by imposing additional constraints (a process called "fixing the gauge"). However, strategies for fixing the gauge of sequence-function relationships have received little attention. Here we derive an analytically tractable family of gauges for a large class of sequence-function relationships. These gauges are derived in the context of models with all-order interactions, but an important subset of these gauges can be applied to diverse types of models, including additive models, pairwise-interaction models, and models with higher-order interactions. Many commonly used gauges are special cases of gauges within this family. We demonstrate the utility of this family of gauges by showing how different choices of gauge can be used both to explore complex activity landscapes and to reveal simplified models that are approximately correct within localized regions of sequence space. The results provide practical gauge-fixing strategies and demonstrate the utility of gauge-fixing for model exploration and interpretation.
Collapse
Affiliation(s)
- Anna Posfai
- Simons Center for Quantitative Biology, Cold Spring Harbor Laboratory, Cold Spring Harbor, NY, 11724
| | - Juannan Zhou
- Simons Center for Quantitative Biology, Cold Spring Harbor Laboratory, Cold Spring Harbor, NY, 11724
- Department of Biology, University of Florida, Gainesville, FL, 32611
| | - David M. McCandlish
- Simons Center for Quantitative Biology, Cold Spring Harbor Laboratory, Cold Spring Harbor, NY, 11724
| | - Justin B. Kinney
- Simons Center for Quantitative Biology, Cold Spring Harbor Laboratory, Cold Spring Harbor, NY, 11724
| |
Collapse
|
12
|
Bendel AM, Skendo K, Klein D, Shimada K, Kauneckaite-Griguole K, Diss G. Optimization of a deep mutational scanning workflow to improve quantification of mutation effects on protein-protein interactions. BMC Genomics 2024; 25:630. [PMID: 38914936 PMCID: PMC11194945 DOI: 10.1186/s12864-024-10524-7] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/14/2023] [Accepted: 06/14/2024] [Indexed: 06/26/2024] Open
Abstract
Deep Mutational Scanning (DMS) assays are powerful tools to study sequence-function relationships by measuring the effects of thousands of sequence variants on protein function. During a DMS experiment, several technical artefacts might distort non-linearly the functional score obtained, potentially biasing the interpretation of the results. We therefore tested several technical parameters in the deepPCA workflow, a DMS assay for protein-protein interactions, in order to identify technical sources of non-linearities. We found that parameters common to many DMS assays such as amount of transformed DNA, timepoint of harvest and library composition can cause non-linearities in the data. Designing experiments in a way to minimize these non-linear effects will improve the quantification and interpretation of mutation effects.
Collapse
Affiliation(s)
- Alexandra M Bendel
- Friedrich Miescher Institute for Biomedical Research (FMI), Basel, Switzerland
- University of Basel, Basel, Switzerland
| | | | - Dominique Klein
- Friedrich Miescher Institute for Biomedical Research (FMI), Basel, Switzerland
| | - Kenji Shimada
- Friedrich Miescher Institute for Biomedical Research (FMI), Basel, Switzerland
| | - Kotryna Kauneckaite-Griguole
- Friedrich Miescher Institute for Biomedical Research (FMI), Basel, Switzerland
- University of Basel, Basel, Switzerland
| | - Guillaume Diss
- Friedrich Miescher Institute for Biomedical Research (FMI), Basel, Switzerland.
| |
Collapse
|
13
|
Diaz-Colunga J, Skwara A, Vila JCC, Bajic D, Sanchez A. Global epistasis and the emergence of function in microbial consortia. Cell 2024; 187:3108-3119.e30. [PMID: 38776921 DOI: 10.1016/j.cell.2024.04.016] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/09/2023] [Revised: 12/06/2023] [Accepted: 04/16/2024] [Indexed: 05/25/2024]
Abstract
The many functions of microbial communities emerge from a complex web of interactions between organisms and their environment. This poses a significant obstacle to engineering microbial consortia, hindering our ability to harness the potential of microorganisms for biotechnological applications. In this study, we demonstrate that the collective effect of ecological interactions between microbes in a community can be captured by simple statistical models that predict how adding a new species to a community will affect its function. These predictive models mirror the patterns of global epistasis reported in genetics, and they can be quantitatively interpreted in terms of pairwise interactions between community members. Our results illuminate an unexplored path to quantitatively predicting the function of microbial consortia from their composition, paving the way to optimizing desirable community properties and bringing the tasks of predicting biological function at the genetic, organismal, and ecological scales under the same quantitative formalism.
Collapse
Affiliation(s)
- Juan Diaz-Colunga
- Department of Ecology & Evolutionary Biology, Yale University, New Haven, CT 06511, USA; Microbial Sciences Institute, Yale University, New Haven, CT 06511, USA; Department of Microbial Biotechnology, National Center for Biotechnology CNB-CSIC, 28049 Madrid, Spain; Institute of Functional Biology and Genomics IBFG-CSIC, University of Salamanca, 37007 Salamanca, Spain.
| | - Abigail Skwara
- Department of Ecology & Evolutionary Biology, Yale University, New Haven, CT 06511, USA; Microbial Sciences Institute, Yale University, New Haven, CT 06511, USA
| | - Jean C C Vila
- Department of Ecology & Evolutionary Biology, Yale University, New Haven, CT 06511, USA; Microbial Sciences Institute, Yale University, New Haven, CT 06511, USA; Department of Biology, Stanford University, Stanford, CA 94305, USA
| | - Djordje Bajic
- Department of Ecology & Evolutionary Biology, Yale University, New Haven, CT 06511, USA; Microbial Sciences Institute, Yale University, New Haven, CT 06511, USA; Department of Biotechnology, Delft University of Technology, Delft 2628 CD, the Netherlands.
| | - Alvaro Sanchez
- Department of Ecology & Evolutionary Biology, Yale University, New Haven, CT 06511, USA; Microbial Sciences Institute, Yale University, New Haven, CT 06511, USA; Department of Microbial Biotechnology, National Center for Biotechnology CNB-CSIC, 28049 Madrid, Spain; Institute of Functional Biology and Genomics IBFG-CSIC, University of Salamanca, 37007 Salamanca, Spain.
| |
Collapse
|
14
|
Daffern N, Johansson KE, Baumer ZT, Robertson NR, Woojuh J, Bedewitz MA, Davis Z, Wheeldon I, Cutler SR, Lindorff-Larsen K, Whitehead TA. GMMA Can Stabilize Proteins Across Different Functional Constraints. J Mol Biol 2024; 436:168586. [PMID: 38663544 DOI: 10.1016/j.jmb.2024.168586] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/30/2024] [Revised: 04/16/2024] [Accepted: 04/17/2024] [Indexed: 05/06/2024]
Abstract
Stabilizing proteins without otherwise hampering their function is a central task in protein engineering and design. PYR1 is a plant hormone receptor that has been engineered to bind diverse small molecule ligands. We sought a set of generalized mutations that would provide stability without affecting functionality for PYR1 variants with diverse ligand-binding capabilities. To do this we used a global multi-mutant analysis (GMMA) approach, which can identify substitutions that have stabilizing effects and do not lower function. GMMA has the added benefit of finding substitutions that are stabilizing in different sequence contexts and we hypothesized that applying GMMA to PYR1 with different functionalities would identify this set of generalized mutations. Indeed, conducting FACS and deep sequencing of libraries for PYR1 variants with two different functionalities and applying a GMMA analysis identified 5 substitutions that, when inserted into four PYR1 variants that each bind a unique ligand, provided an increase of 2-6 °C in thermal inactivation temperature and no decrease in functionality.
Collapse
Affiliation(s)
- Nicolas Daffern
- Department of Chemical and Biological Engineering, University of Colorado Boulder, Boulder, CO 80305, USA
| | - Kristoffer E Johansson
- Linderstrøm-Lang Centre for Protein Science, Department of Biology, University of Copenhagen, Copenhagen, Denmark
| | - Zachary T Baumer
- Department of Chemical and Biological Engineering, University of Colorado Boulder, Boulder, CO 80305, USA
| | | | - Janty Woojuh
- Department of Botany and Plant Sciences, University of California, Riverside, USA
| | - Matthew A Bedewitz
- Department of Chemical and Biological Engineering, University of Colorado Boulder, Boulder, CO 80305, USA
| | - Zoë Davis
- Department of Chemical and Biological Engineering, University of Colorado Boulder, Boulder, CO 80305, USA
| | - Ian Wheeldon
- Department of Chemical and Environmental Engineering, University of California, Riverside, USA; Institute for Integrative Genome Biology, University of California, Riverside, Riverside, CA, USA
| | - Sean R Cutler
- Department of Botany and Plant Sciences, University of California, Riverside, USA; Institute for Integrative Genome Biology, University of California, Riverside, Riverside, CA, USA; Center for Plant Cell Biology, University of California, Riverside, Riverside, CA, USA
| | - Kresten Lindorff-Larsen
- Linderstrøm-Lang Centre for Protein Science, Department of Biology, University of Copenhagen, Copenhagen, Denmark.
| | - Timothy A Whitehead
- Department of Chemical and Biological Engineering, University of Colorado Boulder, Boulder, CO 80305, USA.
| |
Collapse
|
15
|
Nguyen TN, Ingle C, Thompson S, Reynolds KA. The genetic landscape of a metabolic interaction. Nat Commun 2024; 15:3351. [PMID: 38637543 PMCID: PMC11026382 DOI: 10.1038/s41467-024-47671-0] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/17/2023] [Accepted: 04/09/2024] [Indexed: 04/20/2024] Open
Abstract
While much prior work has explored the constraints on protein sequence and evolution induced by physical protein-protein interactions, the sequence-level constraints emerging from non-binding functional interactions in metabolism remain unclear. To quantify how variation in the activity of one enzyme constrains the biochemical parameters and sequence of another, we focus on dihydrofolate reductase (DHFR) and thymidylate synthase (TYMS), a pair of enzymes catalyzing consecutive reactions in folate metabolism. We use deep mutational scanning to quantify the growth rate effect of 2696 DHFR single mutations in 3 TYMS backgrounds under conditions selected to emphasize biochemical epistasis. Our data are well-described by a relatively simple enzyme velocity to growth rate model that quantifies how metabolic context tunes enzyme mutational tolerance. Together our results reveal the structural distribution of epistasis in a metabolic enzyme and establish a foundation for the design of multi-enzyme systems.
Collapse
Affiliation(s)
- Thuy N Nguyen
- The Green Center for Systems Biology, University of Texas Southwestern Medical Center, Dallas, TX, 75390, USA
- The Lyda Hill Department of Bioinformatics, University of Texas Southwestern Medical Center, Dallas, TX, 75390, USA
- The Department of Biophysics, University of Texas Southwestern Medical Center, Dallas, TX, 75390, USA
- Form Bio, Dallas, TX, 75226, USA
| | - Christine Ingle
- The Green Center for Systems Biology, University of Texas Southwestern Medical Center, Dallas, TX, 75390, USA
- The Lyda Hill Department of Bioinformatics, University of Texas Southwestern Medical Center, Dallas, TX, 75390, USA
- The Department of Biophysics, University of Texas Southwestern Medical Center, Dallas, TX, 75390, USA
| | - Samuel Thompson
- Department of Bioengineering and Therapeutic Sciences, University of California, San Francisco, CA, 94158, USA
- Department of Bioengineering, Stanford University, Stanford, CA, 94305, USA
| | - Kimberly A Reynolds
- The Green Center for Systems Biology, University of Texas Southwestern Medical Center, Dallas, TX, 75390, USA.
- The Lyda Hill Department of Bioinformatics, University of Texas Southwestern Medical Center, Dallas, TX, 75390, USA.
- The Department of Biophysics, University of Texas Southwestern Medical Center, Dallas, TX, 75390, USA.
| |
Collapse
|
16
|
Sesta L, Pagnani A, Fernandez-de-Cossio-Diaz J, Uguzzoni G. Inference of annealed protein fitness landscapes with AnnealDCA. PLoS Comput Biol 2024; 20:e1011812. [PMID: 38377054 PMCID: PMC10878520 DOI: 10.1371/journal.pcbi.1011812] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/06/2023] [Accepted: 01/08/2024] [Indexed: 02/22/2024] Open
Abstract
The design of proteins with specific tasks is a major challenge in molecular biology with important diagnostic and therapeutic applications. High-throughput screening methods have been developed to systematically evaluate protein activity, but only a small fraction of possible protein variants can be tested using these techniques. Computational models that explore the sequence space in-silico to identify the fittest molecules for a given function are needed to overcome this limitation. In this article, we propose AnnealDCA, a machine-learning framework to learn the protein fitness landscape from sequencing data derived from a broad range of experiments that use selection and sequencing to quantify protein activity. We demonstrate the effectiveness of our method by applying it to antibody Rep-Seq data of immunized mice and screening experiments, assessing the quality of the fitness landscape reconstructions. Our method can be applied to several experimental cases where a population of protein variants undergoes various rounds of selection and sequencing, without relying on the computation of variants enrichment ratios, and thus can be used even in cases of disjoint sequence samples.
Collapse
Affiliation(s)
- Luca Sesta
- Department of Applied Science and Technology, Politecnico di Torino, Torino, Italy
| | - Andrea Pagnani
- Department of Applied Science and Technology, Politecnico di Torino, Torino, Italy
- Italian Institute for Genomic Medicine, Torino, Italy
- INFN, Sezione di Torino, Torino, Italy
| | | | | |
Collapse
|
17
|
Weng C, Faure AJ, Escobedo A, Lehner B. The energetic and allosteric landscape for KRAS inhibition. Nature 2024; 626:643-652. [PMID: 38109937 PMCID: PMC10866706 DOI: 10.1038/s41586-023-06954-0] [Citation(s) in RCA: 27] [Impact Index Per Article: 27.0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/06/2022] [Accepted: 12/07/2023] [Indexed: 12/20/2023]
Abstract
Thousands of proteins have been validated genetically as therapeutic targets for human diseases1. However, very few have been successfully targeted, and many are considered 'undruggable'. This is particularly true for proteins that function via protein-protein interactions-direct inhibition of binding interfaces is difficult and requires the identification of allosteric sites. However, most proteins have no known allosteric sites, and a comprehensive allosteric map does not exist for any protein. Here we address this shortcoming by charting multiple global atlases of inhibitory allosteric communication in KRAS. We quantified the effects of more than 26,000 mutations on the folding of KRAS and its binding to six interaction partners. Genetic interactions in double mutants enabled us to perform biophysical measurements at scale, inferring more than 22,000 causal free energy changes. These energy landscapes quantify how mutations tune the binding specificity of a signalling protein and map the inhibitory allosteric sites for an important therapeutic target. Allosteric propagation is particularly effective across the central β-sheet of KRAS, and multiple surface pockets are genetically validated as allosterically active, including a distal pocket in the C-terminal lobe of the protein. Allosteric mutations typically inhibit binding to all tested effectors, but they can also change the binding specificity, revealing the regulatory, evolutionary and therapeutic potential to tune pathway activation. Using the approach described here, it should be possible to rapidly and comprehensively identify allosteric target sites in many proteins.
Collapse
Affiliation(s)
- Chenchun Weng
- Centre for Genomic Regulation (CRG), The Barcelona Institute of Science and Technology, Barcelona, Spain
| | - Andre J Faure
- Centre for Genomic Regulation (CRG), The Barcelona Institute of Science and Technology, Barcelona, Spain
| | - Albert Escobedo
- Centre for Genomic Regulation (CRG), The Barcelona Institute of Science and Technology, Barcelona, Spain
| | - Ben Lehner
- Centre for Genomic Regulation (CRG), The Barcelona Institute of Science and Technology, Barcelona, Spain.
- University Pompeu Fabra (UPF), Barcelona, Spain.
- Institució Catalana de Recerca i Estudis Avançats (ICREA), Barcelona, Spain.
- Wellcome Sanger Institute, Wellcome Genome Campus, Hinxton, UK.
| |
Collapse
|
18
|
Nemoto T, Ocari T, Planul A, Tekinsoy M, Zin EA, Dalkara D, Ferrari U. ACIDES: on-line monitoring of forward genetic screens for protein engineering. Nat Commun 2023; 14:8504. [PMID: 38148337 PMCID: PMC10751290 DOI: 10.1038/s41467-023-43967-9] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/07/2023] [Accepted: 11/24/2023] [Indexed: 12/28/2023] Open
Abstract
Forward genetic screens of mutated variants are a versatile strategy for protein engineering and investigation, which has been successfully applied to various studies like directed evolution (DE) and deep mutational scanning (DMS). While next-generation sequencing can track millions of variants during the screening rounds, the vast and noisy nature of the sequencing data impedes the estimation of the performance of individual variants. Here, we propose ACIDES that combines statistical inference and in-silico simulations to improve performance estimation in the library selection process by attributing accurate statistical scores to individual variants. We tested ACIDES first on a random-peptide-insertion experiment and then on multiple public datasets from DE and DMS studies. ACIDES allows experimentalists to reliably estimate variant performance on the fly and can aid protein engineering and research pipelines in a range of applications, including gene therapy.
Collapse
Affiliation(s)
- Takahiro Nemoto
- Institut de la Vision, Sorbonne Université, INSERM, CNRS, 17 rue Moreau, 75012, Paris, France.
- Graduate School of Informatics, Kyoto University, Yoshida Hon-machi, Sakyo-ku, Kyoto, 606-8501, Japan.
- Premium Research Institute for Human Metaverse Medicine (WPI-PRIMe), Osaka University, Suita, Osaka, 565-0871, Japan.
| | - Tommaso Ocari
- Institut de la Vision, Sorbonne Université, INSERM, CNRS, 17 rue Moreau, 75012, Paris, France
| | - Arthur Planul
- Institut de la Vision, Sorbonne Université, INSERM, CNRS, 17 rue Moreau, 75012, Paris, France
| | - Muge Tekinsoy
- Institut de la Vision, Sorbonne Université, INSERM, CNRS, 17 rue Moreau, 75012, Paris, France
| | - Emilia A Zin
- Institut de la Vision, Sorbonne Université, INSERM, CNRS, 17 rue Moreau, 75012, Paris, France
| | - Deniz Dalkara
- Institut de la Vision, Sorbonne Université, INSERM, CNRS, 17 rue Moreau, 75012, Paris, France.
| | - Ulisse Ferrari
- Institut de la Vision, Sorbonne Université, INSERM, CNRS, 17 rue Moreau, 75012, Paris, France.
| |
Collapse
|
19
|
Ogbunugafor CB, Guerrero RF, Miller-Dickson MD, Shakhnovich EI, Shoulders MD. Epistasis and pleiotropy shape biophysical protein subspaces associated with drug resistance. Phys Rev E 2023; 108:054408. [PMID: 38115433 PMCID: PMC10935598 DOI: 10.1103/physreve.108.054408] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/08/2023] [Accepted: 09/19/2023] [Indexed: 12/21/2023]
Abstract
Protein space is a rich analogy for genotype-phenotype maps, where amino acid sequence is organized into a high-dimensional space that highlights the connectivity between protein variants. It is a useful abstraction for understanding the process of evolution, and for efforts to engineer proteins towards desirable phenotypes. Few mentions of protein space consider how protein phenotypes can be described in terms of their biophysical components, nor do they rigorously interrogate how forces like epistasis-describing the nonlinear interaction between mutations and their phenotypic consequences-manifest across these components. In this study, we deconstruct a low-dimensional protein space of a bacterial enzyme (dihydrofolate reductase; DHFR) into "subspaces" corresponding to a set of kinetic and thermodynamic traits [k_{cat}, K_{M}, K_{i}, and T_{m} (melting temperature)]. We then examine how combinations of three mutations (eight alleles in total) display pleiotropy, or unique effects on individual subspace traits. We examine protein spaces across three orthologous DHFR enzymes (Escherichia coli, Listeria grayi, and Chlamydia muridarum), adding a genotypic context dimension through which epistasis occurs across subspaces. In doing so, we reveal that protein space is a deceptively complex notion, and that future applications to bioengineering should consider how interactions between amino acid substitutions manifest across different phenotypic subspaces.
Collapse
Affiliation(s)
- C. Brandon Ogbunugafor
- Department of Ecology and Evolutionary Biology, Yale University, New Haven, Connecticut, USA
- Department of Chemistry, Massachusetts Institute of Technology, Cambridge, Massachusetts, USA
- Santa Fe Institute, Santa Fe, New Mexico, USA
| | - Rafael F. Guerrero
- Department of Biological Sciences, North Carolina State University, Raleigh, North Carolina, USA
| | | | - Eugene I. Shakhnovich
- Department of Chemistry and Chemical Biology, Harvard University, Cambridge, Massachusetts, USA
| | - Matthew D. Shoulders
- Department of Chemistry, Massachusetts Institute of Technology, Cambridge, Massachusetts, USA
| |
Collapse
|
20
|
Xie X, Sun X, Wang Y, Lehner B, Li X. Dominance vs epistasis: the biophysical origins and plasticity of genetic interactions within and between alleles. Nat Commun 2023; 14:5551. [PMID: 37689712 PMCID: PMC10492795 DOI: 10.1038/s41467-023-41188-8] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/08/2022] [Accepted: 08/25/2023] [Indexed: 09/11/2023] Open
Abstract
An important challenge in genetics, evolution and biotechnology is to understand and predict how mutations combine to alter phenotypes, including molecular activities, fitness and disease. In diploids, mutations in a gene can combine on the same chromosome or on different chromosomes as a "heteroallelic combination". However, a direct comparison of the extent, sign, and stability of the genetic interactions between variants within and between alleles is lacking. Here we use thermodynamic models of protein folding and ligand-binding to show that interactions between mutations within and between alleles are expected in even very simple biophysical systems. Protein folding alone generates within-allele interactions and a single molecular interaction is sufficient to cause between-allele interactions and dominance. These interactions change differently, quantitatively and qualitatively as a system becomes more complex. Altering the concentration of a ligand can, for example, switch alleles from dominant to recessive. Our results show that intra-molecular epistasis and dominance should be widely expected in even the simplest biological systems but also reinforce the view that they are plastic system properties and so a formidable challenge to predict. Accurate prediction of both intra-molecular epistasis and dominance will require either detailed mechanistic understanding and experimental parameterization or brute-force measurement and learning.
Collapse
Affiliation(s)
- Xuan Xie
- Zhejiang University - University of Edinburgh Institute, Zhejiang University School of Medicine, Haining, 314400, P. R. China
| | - Xia Sun
- Zhejiang University - University of Edinburgh Institute, Zhejiang University School of Medicine, Haining, 314400, P. R. China
- Deanery of Biomedical Sciences, College of Medicine & Veterinary Medicine, University of Edinburgh, Edinburgh, EH8 9XD, UK
| | - Yuheng Wang
- Zhejiang University - University of Edinburgh Institute, Zhejiang University School of Medicine, Haining, 314400, P. R. China
- Deanery of Biomedical Sciences, College of Medicine & Veterinary Medicine, University of Edinburgh, Edinburgh, EH8 9XD, UK
| | - Ben Lehner
- Center for Genomic Regulation (CRG), The Barcelona Institute of Science and Technology, Dr. Aiguader 88, Barcelona, 08003, Spain.
- Universitat Pompeu Fabra (UPF), Barcelona, 08003, Spain.
- ICREA, Pg. Luis Companys 23, Barcelona, 08010, Spain.
- Wellcome Sanger Institute, Wellcome Genome Campus Hinxton, Cambridge, CB10 1SA, UK.
| | - Xianghua Li
- Zhejiang University - University of Edinburgh Institute, Zhejiang University School of Medicine, Haining, 314400, P. R. China.
- Wellcome Sanger Institute, Wellcome Genome Campus Hinxton, Cambridge, CB10 1SA, UK.
- Deanery of Biomedical Sciences, College of Medicine & Veterinary Medicine, University of Edinburgh, Edinburgh, EH8 9XD, UK.
- Biomedical and Health Translational Centre of Zhejiang Province, Haizhou East Road 718, Haining, 314400, P. R. China.
| |
Collapse
|
21
|
Nguyen TN, Ingle C, Thompson S, Reynolds KA. The Genetic Landscape of a Metabolic Interaction. BIORXIV : THE PREPRINT SERVER FOR BIOLOGY 2023:2023.05.28.542639. [PMID: 37645784 PMCID: PMC10461916 DOI: 10.1101/2023.05.28.542639] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 08/31/2023]
Abstract
Enzyme abundance, catalytic activity, and ultimately sequence are all shaped by the need of growing cells to maintain metabolic flux while minimizing accumulation of deleterious intermediates. While much prior work has explored the constraints on protein sequence and evolution induced by physical protein-protein interactions, the sequence-level constraints emerging from non-binding functional interactions in metabolism remain unclear. To quantify how variation in the activity of one enzyme constrains the biochemical parameters and sequence of another, we focused on dihydrofolate reductase (DHFR) and thymidylate synthase (TYMS), a pair of enzymes catalyzing consecutive reactions in folate metabolism. We used deep mutational scanning to quantify the growth rate effect of 2,696 DHFR single mutations in 3 TYMS backgrounds under conditions selected to emphasize biochemical epistasis. Our data are well-described by a relatively simple enzyme velocity to growth rate model that quantifies how metabolic context tunes enzyme mutational tolerance. Together our results reveal the structural distribution of epistasis in a metabolic enzyme and establish a foundation for the design of multi-enzyme systems.
Collapse
Affiliation(s)
- Thuy N. Nguyen
- The Green Center for Systems Biology, University of Texas Southwestern Medical Center, Dallas, USA, 75390
- Department of Bioinformatics, University of Texas Southwestern Medical Center, Dallas, USA, 75390
- Department of Biophysics, University of Texas Southwestern Medical Center, Dallas, USA, 75390
| | - Christine Ingle
- The Green Center for Systems Biology, University of Texas Southwestern Medical Center, Dallas, USA, 75390
- Department of Bioinformatics, University of Texas Southwestern Medical Center, Dallas, USA, 75390
- Department of Biophysics, University of Texas Southwestern Medical Center, Dallas, USA, 75390
| | - Samuel Thompson
- Department of Bioengineering and Therapeutic Sciences, University of California, San Francisco, CA 94158
| | - Kimberly A. Reynolds
- The Green Center for Systems Biology, University of Texas Southwestern Medical Center, Dallas, USA, 75390
- Department of Bioinformatics, University of Texas Southwestern Medical Center, Dallas, USA, 75390
- Department of Biophysics, University of Texas Southwestern Medical Center, Dallas, USA, 75390
| |
Collapse
|
22
|
Vila JA. Protein folding rate evolution upon mutations. Biophys Rev 2023; 15:661-669. [PMID: 37681091 PMCID: PMC10480377 DOI: 10.1007/s12551-023-01088-z] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/10/2023] [Accepted: 06/24/2023] [Indexed: 09/09/2023] Open
Abstract
Despite the spectacular success of cutting-edge protein fold prediction methods, many critical questions remain unanswered, including why proteins can reach their native state in a biologically reasonable time. A satisfactory answer to this simple question could shed light on the slowest folding rate of proteins as well as how mutations-amino-acid substitutions and/or post-translational modifications-might affect it. Preliminary results indicate that (i) Anfinsen's dogma validity ensures that proteins reach their native state on a reasonable timescale regardless of their sequence or length, and (ii) it is feasible to determine the evolution of protein folding rates without accounting for epistasis effects or the mutational trajectories between the starting and target sequences. These results have direct implications for evolutionary biology because they lay the groundwork for a better understanding of why, and to what extent, mutations-a crucial element of evolution and a factor influencing it-affect protein evolvability. Furthermore, they may spur significant progress in our efforts to solve crucial structural biology problems, such as how a sequence encodes its folding.
Collapse
Affiliation(s)
- Jorge A. Vila
- IMASL-CONICET, Universidad Nacional de San Luis, Ejército de Los Andes 950, 5700 San Luis, Argentina
| |
Collapse
|
23
|
Cagiada M, Bottaro S, Lindemose S, Schenstrøm SM, Stein A, Hartmann-Petersen R, Lindorff-Larsen K. Discovering functionally important sites in proteins. Nat Commun 2023; 14:4175. [PMID: 37443362 PMCID: PMC10345196 DOI: 10.1038/s41467-023-39909-0] [Citation(s) in RCA: 29] [Impact Index Per Article: 14.5] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/19/2023] [Accepted: 07/02/2023] [Indexed: 07/15/2023] Open
Abstract
Proteins play important roles in biology, biotechnology and pharmacology, and missense variants are a common cause of disease. Discovering functionally important sites in proteins is a central but difficult problem because of the lack of large, systematic data sets. Sequence conservation can highlight residues that are functionally important but is often convoluted with a signal for preserving structural stability. We here present a machine learning method to predict functional sites by combining statistical models for protein sequences with biophysical models of stability. We train the model using multiplexed experimental data on variant effects and validate it broadly. We show how the model can be used to discover active sites, as well as regulatory and binding sites. We illustrate the utility of the model by prospective prediction and subsequent experimental validation on the functional consequences of missense variants in HPRT1 which may cause Lesch-Nyhan syndrome, and pinpoint the molecular mechanisms by which they cause disease.
Collapse
Affiliation(s)
- Matteo Cagiada
- Linderstrøm-Lang Centre for Protein Science, Department of Biology, University of Copenhagen, Copenhagen, Denmark
| | - Sandro Bottaro
- Linderstrøm-Lang Centre for Protein Science, Department of Biology, University of Copenhagen, Copenhagen, Denmark
| | - Søren Lindemose
- Linderstrøm-Lang Centre for Protein Science, Department of Biology, University of Copenhagen, Copenhagen, Denmark
| | - Signe M Schenstrøm
- Linderstrøm-Lang Centre for Protein Science, Department of Biology, University of Copenhagen, Copenhagen, Denmark
| | - Amelie Stein
- Linderstrøm-Lang Centre for Protein Science, Department of Biology, University of Copenhagen, Copenhagen, Denmark
| | - Rasmus Hartmann-Petersen
- Linderstrøm-Lang Centre for Protein Science, Department of Biology, University of Copenhagen, Copenhagen, Denmark.
| | - Kresten Lindorff-Larsen
- Linderstrøm-Lang Centre for Protein Science, Department of Biology, University of Copenhagen, Copenhagen, Denmark.
| |
Collapse
|
24
|
Wonderlick DR, Widom JR, Harms MJ. Disentangling contact and ensemble epistasis in a riboswitch. Biophys J 2023; 122:1600-1612. [PMID: 36710492 PMCID: PMC10183321 DOI: 10.1016/j.bpj.2023.01.033] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/27/2022] [Revised: 01/09/2023] [Accepted: 01/24/2023] [Indexed: 01/29/2023] Open
Abstract
Mutations introduced into macromolecules often exhibit epistasis, where the effect of one mutation alters the effect of another. Knowing the mechanisms that lead to epistasis is important for understanding how macromolecules work and evolve, as well as for effective macromolecular engineering. Here, we investigate the interplay between "contact epistasis" (epistasis arising from physical interactions between mutated residues) and "ensemble epistasis" (epistasis that occurs when a mutation redistributes the conformational ensemble of a macromolecule, thus changing the effect of the second mutation). We argue that the two mechanisms can be distinguished in allosteric macromolecules by measuring epistasis at differing allosteric effector concentrations. Contact epistasis manifests as nonadditivity in the microscopic equilibrium constants describing the conformational ensemble. This epistatic effect is independent of allosteric effector concentration. Ensemble epistasis manifests as nonadditivity in thermodynamic observables-such as ligand binding-that are determined by the distribution of ensemble conformations. This epistatic effect strongly depends on allosteric effector concentration. Using this framework, we experimentally investigated the origins of epistasis in three pairwise mutant cycles introduced into the adenine riboswitch aptamer domain by measuring ligand binding as a function of allosteric effector concentration. We found evidence for both contact and ensemble epistasis in all cycles. Furthermore, we found that the two mechanisms of epistasis could interact with each other. For example, in one mutant cycle we observed 6 kcal/mol of contact epistasis in a microscopic equilibrium constant. In that same cycle, the maximum epistasis in ligand binding was only 1.5 kcal/mol: shifts in the ensemble masked the contribution of contact epistasis. Finally, our work yields simple heuristics for identifying contact and ensemble epistasis based on measurements of a biochemical observable as a function of allosteric effector concentration.
Collapse
Affiliation(s)
- Daria R Wonderlick
- Department of Chemistry and Biochemistry, University of Oregon, Eugene, Oregon
| | - Julia R Widom
- Department of Chemistry and Biochemistry, University of Oregon, Eugene, Oregon; Institute for Molecular Biology, University of Oregon, Eugene, Oregon; Oregon Center for Optical, Molecular, & Quantum Science, University of Oregon, Eugene, Oregon
| | - Michael J Harms
- Department of Chemistry and Biochemistry, University of Oregon, Eugene, Oregon; Institute for Molecular Biology, University of Oregon, Eugene, Oregon.
| |
Collapse
|
25
|
Johansson KE, Lindorff-Larsen K, Winther JR. Global Analysis of Multi-Mutants to Improve Protein Function. J Mol Biol 2023; 435:168034. [PMID: 36863661 DOI: 10.1016/j.jmb.2023.168034] [Citation(s) in RCA: 6] [Impact Index Per Article: 3.0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/09/2022] [Revised: 02/22/2023] [Accepted: 02/22/2023] [Indexed: 03/04/2023]
Abstract
The identification of amino acid substitutions that both enhance the stability and function of a protein is a key challenge in protein engineering. Technological advances have enabled assaying thousands of protein variants in a single high-throughput experiment, and more recent studies use such data in protein engineering. We present a Global Multi-Mutant Analysis (GMMA) that exploits the presence of multiply-substituted variants to identify individual amino acid substitutions that are beneficial for the stability and function across a large library of protein variants. We have applied GMMA to a previously published experiment reporting on >54,000 variants of green fluorescent protein (GFP), each with known fluorescence output, and each carrying 1-15 amino acid substitutions (Sarkisyan et al., 2016). The GMMA method achieves a good fit to this dataset while being analytically transparent. We show experimentally that the six top-ranking substitutions progressively enhance GFP. More broadly, using only a single experiment as input our analysis recovers nearly all the substitutions previously reported to be beneficial for GFP folding and function. In conclusion, we suggest that large libraries of multiply-substituted variants may provide a unique source of information for protein engineering.
Collapse
Affiliation(s)
- Kristoffer E Johansson
- Linderstrøm-Lang Centre for Protein Science, Section for Biomolecular Sciences, Department of Biology of (University of Copenhagen), Ole Maaloes Vej 5, DK-2200 Copenhagen N, Denmark.
| | - Kresten Lindorff-Larsen
- Linderstrøm-Lang Centre for Protein Science, Section for Biomolecular Sciences, Department of Biology of (University of Copenhagen), Ole Maaloes Vej 5, DK-2200 Copenhagen N, Denmark.
| | - Jakob R Winther
- Linderstrøm-Lang Centre for Protein Science, Section for Biomolecular Sciences, Department of Biology of (University of Copenhagen), Ole Maaloes Vej 5, DK-2200 Copenhagen N, Denmark.
| |
Collapse
|
26
|
Yu TC, Thornton ZT, Hannon WW, DeWitt WS, Radford CE, Matsen FA, Bloom JD. A biophysical model of viral escape from polyclonal antibodies. Virus Evol 2022; 8:veac110. [PMID: 36582502 PMCID: PMC9793855 DOI: 10.1093/ve/veac110] [Citation(s) in RCA: 22] [Impact Index Per Article: 7.3] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/17/2022] [Revised: 11/12/2022] [Accepted: 11/29/2022] [Indexed: 12/14/2022] Open
Abstract
A challenge in studying viral immune escape is determining how mutations combine to escape polyclonal antibodies, which can potentially target multiple distinct viral epitopes. Here we introduce a biophysical model of this process that partitions the total polyclonal antibody activity by epitope and then quantifies how each viral mutation affects the antibody activity against each epitope. We develop software that can use deep mutational scanning data to infer these properties for polyclonal antibody mixtures. We validate this software using a computationally simulated deep mutational scanning experiment and demonstrate that it enables the prediction of escape by arbitrary combinations of mutations. The software described in this paper is available at https://jbloomlab.github.io/polyclonal.
Collapse
Affiliation(s)
- Timothy C Yu
- Basic Sciences Division, Fred Hutchinson Cancer Center, 1100 Fairview Ave N, Seattle, WA 98109, USA
- Computational Biology Program, Fred Hutchinson Cancer Center, 1100 Fairview Ave N, Seattle, WA 98109, USA
- Molecular and Cellular Biology Graduate Program, University of Washington, 1959 NE Pacifc Street, Seattle, WA 98195, USA
| | - Zorian T Thornton
- Computational Biology Program, Fred Hutchinson Cancer Center, 1100 Fairview Ave N, Seattle, WA 98109, USA
- Department of Genome Sciences, University of Washington, 3720 15th Ave NE, Seattle, WA 98195, USA
| | - William W Hannon
- Basic Sciences Division, Fred Hutchinson Cancer Center, 1100 Fairview Ave N, Seattle, WA 98109, USA
- Computational Biology Program, Fred Hutchinson Cancer Center, 1100 Fairview Ave N, Seattle, WA 98109, USA
- Molecular and Cellular Biology Graduate Program, University of Washington, 1959 NE Pacifc Street, Seattle, WA 98195, USA
| | - William S DeWitt
- Computational Biology Program, Fred Hutchinson Cancer Center, 1100 Fairview Ave N, Seattle, WA 98109, USA
- Department of Genome Sciences, University of Washington, 3720 15th Ave NE, Seattle, WA 98195, USA
| | - Caelan E Radford
- Basic Sciences Division, Fred Hutchinson Cancer Center, 1100 Fairview Ave N, Seattle, WA 98109, USA
- Computational Biology Program, Fred Hutchinson Cancer Center, 1100 Fairview Ave N, Seattle, WA 98109, USA
- Molecular and Cellular Biology Graduate Program, University of Washington, 1959 NE Pacifc Street, Seattle, WA 98195, USA
| | - Frederick A Matsen
- Computational Biology Program, Fred Hutchinson Cancer Center, 1100 Fairview Ave N, Seattle, WA 98109, USA
- Department of Genome Sciences, University of Washington, 3720 15th Ave NE, Seattle, WA 98195, USA
- Howard Hughes Medical Institute, 1100 Fairview Ave N, Seattle, WA 98109, USA
| | - Jesse D Bloom
- Basic Sciences Division, Fred Hutchinson Cancer Center, 1100 Fairview Ave N, Seattle, WA 98109, USA
- Computational Biology Program, Fred Hutchinson Cancer Center, 1100 Fairview Ave N, Seattle, WA 98109, USA
- Department of Genome Sciences, University of Washington, 3720 15th Ave NE, Seattle, WA 98195, USA
- Howard Hughes Medical Institute, 1100 Fairview Ave N, Seattle, WA 98109, USA
| |
Collapse
|
27
|
Norrild RK, Johansson KE, O’Shea C, Morth JP, Lindorff-Larsen K, Winther JR. Increasing protein stability by inferring substitution effects from high-throughput experiments. CELL REPORTS METHODS 2022; 2:100333. [PMID: 36452862 PMCID: PMC9701609 DOI: 10.1016/j.crmeth.2022.100333] [Citation(s) in RCA: 6] [Impact Index Per Article: 2.0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Figures] [Subscribe] [Scholar Register] [Received: 06/01/2022] [Revised: 06/22/2022] [Accepted: 10/19/2022] [Indexed: 06/17/2023]
Abstract
We apply a computational model, global multi-mutant analysis (GMMA), to inform on effects of most amino acid substitutions from a randomly mutated gene library. Using a high mutation frequency, the method can determine mutations that increase the stability of even very stable proteins for which conventional selection systems have reached their limit. As a demonstration of this, we screened a mutant library of a highly stable and computationally redesigned model protein using an in vivo genetic sensor for folding and assigned a stability effect to 374 of 912 possible single amino acid substitutions. Combining the top 9 substitutions increased the unfolding energy 47 to 69 kJ/mol in a single engineering step. Crystal structures of stabilized variants showed small perturbations in helices 1 and 2, which rendered them closer in structure to the redesign template. This case study illustrates the capability of the method, which is applicable to any screen for protein function.
Collapse
Affiliation(s)
- Rasmus Krogh Norrild
- Linderstrøm-Lang Centre for Protein Science, Department of Biology, University of Copenhagen, 2200 Copenhagen N, Denmark
- Department of Biotechnology and Biomedicine, Technical University of Denmark, 2800 Kgs. Lyngby, Denmark
| | - Kristoffer Enøe Johansson
- Linderstrøm-Lang Centre for Protein Science, Department of Biology, University of Copenhagen, 2200 Copenhagen N, Denmark
| | - Charlotte O’Shea
- Linderstrøm-Lang Centre for Protein Science, Department of Biology, University of Copenhagen, 2200 Copenhagen N, Denmark
| | - Jens Preben Morth
- Department of Biotechnology and Biomedicine, Technical University of Denmark, 2800 Kgs. Lyngby, Denmark
| | - Kresten Lindorff-Larsen
- Linderstrøm-Lang Centre for Protein Science, Department of Biology, University of Copenhagen, 2200 Copenhagen N, Denmark
| | - Jakob Rahr Winther
- Linderstrøm-Lang Centre for Protein Science, Department of Biology, University of Copenhagen, 2200 Copenhagen N, Denmark
| |
Collapse
|
28
|
Interpretable modeling of genotype-phenotype landscapes with state-of-the-art predictive power. Proc Natl Acad Sci U S A 2022; 119:e2114021119. [PMID: 35733251 PMCID: PMC9245639 DOI: 10.1073/pnas.2114021119] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.7] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/18/2022] Open
Abstract
Large-scale measurements linking genetic background to biological function have driven a need for models that can incorporate these data for reliable predictions and insight into the underlying biophysical system. Recent modeling efforts, however, prioritize predictive accuracy at the expense of model interpretability. Here, we present LANTERN (landscape interpretable nonparametric model, https://github.com/usnistgov/lantern), a hierarchical Bayesian model that distills genotype-phenotype landscape (GPL) measurements into a low-dimensional feature space that represents the fundamental biological mechanisms of the system while also enabling straightforward, explainable predictions. Across a benchmark of large-scale datasets, LANTERN equals or outperforms all alternative approaches, including deep neural networks. LANTERN furthermore extracts useful insights of the landscape, including its inherent dimensionality, a latent space of additive mutational effects, and metrics of landscape structure. LANTERN facilitates straightforward discovery of fundamental mechanisms in GPLs, while also reliably extrapolating to unexplored regions of genotypic space.
Collapse
|
29
|
Tareen A, Kooshkbaghi M, Posfai A, Ireland WT, McCandlish DM, Kinney JB. MAVE-NN: learning genotype-phenotype maps from multiplex assays of variant effect. Genome Biol 2022; 23:98. [PMID: 35428271 PMCID: PMC9011994 DOI: 10.1186/s13059-022-02661-7] [Citation(s) in RCA: 21] [Impact Index Per Article: 7.0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/16/2021] [Revised: 03/21/2022] [Accepted: 03/24/2022] [Indexed: 12/17/2022] Open
Abstract
Multiplex assays of variant effect (MAVEs) are a family of methods that includes deep mutational scanning experiments on proteins and massively parallel reporter assays on gene regulatory sequences. Despite their increasing popularity, a general strategy for inferring quantitative models of genotype-phenotype maps from MAVE data is lacking. Here we introduce MAVE-NN, a neural-network-based Python package that implements a broadly applicable information-theoretic framework for learning genotype-phenotype maps-including biophysically interpretable models-from MAVE datasets. We demonstrate MAVE-NN in multiple biological contexts, and highlight the ability of our approach to deconvolve mutational effects from otherwise confounding experimental nonlinearities and noise.
Collapse
Affiliation(s)
- Ammar Tareen
- Simons Center for Quantitative Biology, Cold Spring Harbor Laboratory, Cold Spring Harbor, 11724, NY, USA
- Present Address: Regeneron Pharmaceuticals, Inc., Tarrytown, 10591, NY, USA
| | - Mahdi Kooshkbaghi
- Simons Center for Quantitative Biology, Cold Spring Harbor Laboratory, Cold Spring Harbor, 11724, NY, USA
| | - Anna Posfai
- Simons Center for Quantitative Biology, Cold Spring Harbor Laboratory, Cold Spring Harbor, 11724, NY, USA
| | - William T Ireland
- Department of Physics, California Institute of Technology, Pasadena, 91125, CA, USA
- Present Address: Department of Applied Physics, Harvard University, Cambridge, 02134, MA, USA
| | - David M McCandlish
- Simons Center for Quantitative Biology, Cold Spring Harbor Laboratory, Cold Spring Harbor, 11724, NY, USA
| | - Justin B Kinney
- Simons Center for Quantitative Biology, Cold Spring Harbor Laboratory, Cold Spring Harbor, 11724, NY, USA.
| |
Collapse
|
30
|
Faure AJ, Domingo J, Schmiedel JM, Hidalgo-Carcedo C, Diss G, Lehner B. Mapping the energetic and allosteric landscapes of protein binding domains. Nature 2022; 604:175-183. [PMID: 35388192 DOI: 10.1038/s41586-022-04586-4] [Citation(s) in RCA: 117] [Impact Index Per Article: 39.0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/14/2021] [Accepted: 02/25/2022] [Indexed: 11/09/2022]
Abstract
Allosteric communication between distant sites in proteins is central to biological regulation but still poorly characterized, limiting understanding, engineering and drug development1-6. An important reason for this is the lack of methods to comprehensively quantify allostery in diverse proteins. Here we address this shortcoming and present a method that uses deep mutational scanning to globally map allostery. The approach uses an efficient experimental design to infer en masse the causal biophysical effects of mutations by quantifying multiple molecular phenotypes-here we examine binding and protein abundance-in multiple genetic backgrounds and fitting thermodynamic models using neural networks. We apply the approach to two of the most common protein interaction domains found in humans, an SH3 domain and a PDZ domain, to produce comprehensive atlases of allosteric communication. Allosteric mutations are abundant, with a large mutational target space of network-altering 'edgetic' variants. Mutations are more likely to be allosteric closer to binding interfaces, at glycine residues and at specific residues connecting to an opposite surface within the PDZ domain. This general approach of quantifying mutational effects for multiple molecular phenotypes and in multiple genetic backgrounds should enable the energetic and allosteric landscapes of many proteins to be rapidly and comprehensively mapped.
Collapse
Affiliation(s)
- Andre J Faure
- Center for Genomic Regulation (CRG), The Barcelona Institute of Science and Technology, Barcelona, Spain
| | - Júlia Domingo
- Center for Genomic Regulation (CRG), The Barcelona Institute of Science and Technology, Barcelona, Spain.,New York Genome Center (NYGC), New York, NY, USA
| | - Jörn M Schmiedel
- Center for Genomic Regulation (CRG), The Barcelona Institute of Science and Technology, Barcelona, Spain
| | - Cristina Hidalgo-Carcedo
- Center for Genomic Regulation (CRG), The Barcelona Institute of Science and Technology, Barcelona, Spain
| | - Guillaume Diss
- Center for Genomic Regulation (CRG), The Barcelona Institute of Science and Technology, Barcelona, Spain.,Friedrich Miescher Institute for Biomedical Research (FMI), Basel, Switzerland
| | - Ben Lehner
- Center for Genomic Regulation (CRG), The Barcelona Institute of Science and Technology, Barcelona, Spain. .,Universitat Pompeu Fabra (UPF), Barcelona, Spain. .,Institució Catalana de Recerca i Estudis Avançats (ICREA), Barcelona, Spain.
| |
Collapse
|
31
|
On the sparsity of fitness functions and implications for learning. Proc Natl Acad Sci U S A 2022; 119:2109649118. [PMID: 34937698 PMCID: PMC8740588 DOI: 10.1073/pnas.2109649118] [Citation(s) in RCA: 17] [Impact Index Per Article: 5.7] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Accepted: 11/11/2021] [Indexed: 01/05/2023] Open
Abstract
The properties of proteins and other biological molecules are encoded in large part in the sequence of amino acids or nucleotides that defines them. Increasingly, researchers estimate functions that map sequences to a particular property using machine learning and related statistical approaches. However, an important question remains unanswered: How many experimental measurements are needed in order to accurately learn these “fitness” functions? We leverage perspectives from the fields of biophysics, evolutionary biology, and signal processing to develop a theoretical framework that enables us to make progress on answering this question. We demonstrate that this framework can be used to make useful calculations on real-world data and suggest how these calculations may be used to guide experiments. Fitness functions map biological sequences to a scalar property of interest. Accurate estimation of these functions yields biological insight and sets the foundation for model-based sequence design. However, the fitness datasets available to learn these functions are typically small relative to the large combinatorial space of sequences; characterizing how much data are needed for accurate estimation remains an open problem. There is a growing body of evidence demonstrating that empirical fitness functions display substantial sparsity when represented in terms of epistatic interactions. Moreover, the theory of Compressed Sensing provides scaling laws for the number of samples required to exactly recover a sparse function. Motivated by these results, we develop a framework to study the sparsity of fitness functions sampled from a generalization of the NK model, a widely used random field model of fitness functions. In particular, we present results that allow us to test the effect of the Generalized NK (GNK) model’s interpretable parameters—sequence length, alphabet size, and assumed interactions between sequence positions—on the sparsity of fitness functions sampled from the model and, consequently, the number of measurements required to exactly recover these functions. We validate our framework by demonstrating that GNK models with parameters set according to structural considerations can be used to accurately approximate the number of samples required to recover two empirical protein fitness functions and an RNA fitness function. In addition, we show that these GNK models identify important higher-order epistatic interactions in the empirical fitness functions using only structural information.
Collapse
|
32
|
Zutz A, Hamborg L, Pedersen LE, Kassem MM, Papaleo E, Koza A, Herrgård MJ, Jensen SI, Teilum K, Lindorff-Larsen K, Nielsen AT. A dual-reporter system for investigating and optimizing protein translation and folding in E. coli. Nat Commun 2021; 12:6093. [PMID: 34667164 PMCID: PMC8526717 DOI: 10.1038/s41467-021-26337-1] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.3] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/13/2020] [Accepted: 10/01/2021] [Indexed: 01/29/2023] Open
Abstract
Strategies for investigating and optimizing the expression and folding of proteins for biotechnological and pharmaceutical purposes are in high demand. Here, we describe a dual-reporter biosensor system that simultaneously assesses in vivo protein translation and protein folding, thereby enabling rapid screening of mutant libraries. We have validated the dual-reporter system on five different proteins and find an excellent correlation between reporter signals and the levels of protein expression and solubility of the proteins. We further demonstrate the applicability of the dual-reporter system as a screening assay for deep mutational scanning experiments. The system enables high throughput selection of protein variants with high expression levels and altered protein stability. Next generation sequencing analysis of the resulting libraries of protein variants show a good correlation between computationally predicted and experimentally determined protein stabilities. We furthermore show that the mutational experimental data obtained using this system may be useful for protein structure calculations.
Collapse
Affiliation(s)
- Ariane Zutz
- The Novo Nordisk Foundation Center for Biosustainability, Technical University of Denmark, Kemitorvet 220, 2800 Kgs, Lyngby, Denmark
| | - Louise Hamborg
- The Novo Nordisk Foundation Center for Biosustainability, Technical University of Denmark, Kemitorvet 220, 2800 Kgs, Lyngby, Denmark
- Structural Biology and NMR Laboratory, Department of Biology, University of Copenhagen, Ole Maaloes Vej 5, 2200, Copenhagen N, Denmark
| | - Lasse Ebdrup Pedersen
- The Novo Nordisk Foundation Center for Biosustainability, Technical University of Denmark, Kemitorvet 220, 2800 Kgs, Lyngby, Denmark
| | - Maher M Kassem
- Structural Biology and NMR Laboratory, Department of Biology, University of Copenhagen, Ole Maaloes Vej 5, 2200, Copenhagen N, Denmark
| | - Elena Papaleo
- Structural Biology and NMR Laboratory, Department of Biology, University of Copenhagen, Ole Maaloes Vej 5, 2200, Copenhagen N, Denmark
| | - Anna Koza
- The Novo Nordisk Foundation Center for Biosustainability, Technical University of Denmark, Kemitorvet 220, 2800 Kgs, Lyngby, Denmark
| | - Markus J Herrgård
- The Novo Nordisk Foundation Center for Biosustainability, Technical University of Denmark, Kemitorvet 220, 2800 Kgs, Lyngby, Denmark
| | - Sheila Ingemann Jensen
- The Novo Nordisk Foundation Center for Biosustainability, Technical University of Denmark, Kemitorvet 220, 2800 Kgs, Lyngby, Denmark
| | - Kaare Teilum
- Structural Biology and NMR Laboratory, Department of Biology, University of Copenhagen, Ole Maaloes Vej 5, 2200, Copenhagen N, Denmark
| | - Kresten Lindorff-Larsen
- Structural Biology and NMR Laboratory, Department of Biology, University of Copenhagen, Ole Maaloes Vej 5, 2200, Copenhagen N, Denmark
| | - Alex Toftgaard Nielsen
- The Novo Nordisk Foundation Center for Biosustainability, Technical University of Denmark, Kemitorvet 220, 2800 Kgs, Lyngby, Denmark.
| |
Collapse
|
33
|
Sesta L, Uguzzoni G, Fernandez-de-Cossio-Diaz J, Pagnani A. AMaLa: Analysis of Directed Evolution Experiments via Annealed Mutational Approximated Landscape. Int J Mol Sci 2021; 22:10908. [PMID: 34681569 PMCID: PMC8535593 DOI: 10.3390/ijms222010908] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/28/2021] [Revised: 09/24/2021] [Accepted: 09/27/2021] [Indexed: 01/12/2023] Open
Abstract
We present Annealed Mutational approximated Landscape (AMaLa), a new method to infer fitness landscapes from Directed Evolution experiments sequencing data. Such experiments typically start from a single wild-type sequence, which undergoes Darwinian in vitro evolution via multiple rounds of mutation and selection for a target phenotype. In the last years, Directed Evolution is emerging as a powerful instrument to probe fitness landscapes under controlled experimental conditions and as a relevant testing ground to develop accurate statistical models and inference algorithms (thanks to high-throughput screening and sequencing). Fitness landscape modeling either uses the enrichment of variants abundances as input, thus requiring the observation of the same variants at different rounds or assuming the last sequenced round as being sampled from an equilibrium distribution. AMaLa aims at effectively leveraging the information encoded in the whole time evolution. To do so, while assuming statistical sampling independence between sequenced rounds, the possible trajectories in sequence space are gauged with a time-dependent statistical weight consisting of two contributions: (i) an energy term accounting for the selection process and (ii) a generalized Jukes-Cantor model for the purely mutational step. This simple scheme enables accurately describing the Directed Evolution dynamics and inferring a fitness landscape that correctly reproduces the measures of the phenotype under selection (e.g., antibiotic drug resistance), notably outperforming widely used inference strategies. In addition, we assess the reliability of AMaLa by showing how the inferred statistical model could be used to predict relevant structural properties of the wild-type sequence.
Collapse
Affiliation(s)
- Luca Sesta
- Politecnico di Torino, Corso Duca degli Abruzzi 24, I-10129 Torino, Italy; (L.S.); (G.U.); (A.P.)
| | - Guido Uguzzoni
- Politecnico di Torino, Corso Duca degli Abruzzi 24, I-10129 Torino, Italy; (L.S.); (G.U.); (A.P.)
| | - Jorge Fernandez-de-Cossio-Diaz
- Laboratory of Physics of the Ecole Normale Supérieure, CNRS UMR 8023 & PSL Research, Sorbonne Université, 24 rue Lhomond, 75005 Paris, France
- Center of Molecular Immunology, Systems Biology Department, Playa, Havana CP 11600, Cuba
| | - Andrea Pagnani
- Politecnico di Torino, Corso Duca degli Abruzzi 24, I-10129 Torino, Italy; (L.S.); (G.U.); (A.P.)
- Italian Institute for Genomic Medicine, IRCCS Candiolo, SP-142, I-10060 Candiolo, Italy
- INFN, Sezione di Torino, I-10125 Torino, Italy
| |
Collapse
|
34
|
Phillips AM, Lawrence KR, Moulana A, Dupic T, Chang J, Johnson MS, Cvijovic I, Mora T, Walczak AM, Desai MM. Binding affinity landscapes constrain the evolution of broadly neutralizing anti-influenza antibodies. eLife 2021; 10:71393. [PMID: 34491198 PMCID: PMC8476123 DOI: 10.7554/elife.71393] [Citation(s) in RCA: 37] [Impact Index Per Article: 9.3] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/17/2021] [Accepted: 09/05/2021] [Indexed: 12/12/2022] Open
Abstract
Over the past two decades, several broadly neutralizing antibodies (bnAbs) that confer protection against diverse influenza strains have been isolated. Structural and biochemical characterization of these bnAbs has provided molecular insight into how they bind distinct antigens. However, our understanding of the evolutionary pathways leading to bnAbs, and thus how best to elicit them, remains limited. Here, we measure equilibrium dissociation constants of combinatorially complete mutational libraries for two naturally isolated influenza bnAbs (CR9114, 16 heavy-chain mutations; CR6261, 11 heavy-chain mutations), reconstructing all possible evolutionary intermediates back to the unmutated germline sequences. We find that these two libraries exhibit strikingly different patterns of breadth: while many variants of CR6261 display moderate affinity to diverse antigens, those of CR9114 display appreciable affinity only in specific, nested combinations. By examining the extensive pairwise and higher order epistasis between mutations, we find key sites with strong synergistic interactions that are highly similar across antigens for CR6261 and different for CR9114. Together, these features of the binding affinity landscapes strongly favor sequential acquisition of affinity to diverse antigens for CR9114, while the acquisition of breadth to more similar antigens for CR6261 is less constrained. These results, if generalizable to other bnAbs, may explain the molecular basis for the widespread observation that sequential exposure favors greater breadth, and such mechanistic insight will be essential for predicting and eliciting broadly protective immune responses.
Collapse
Affiliation(s)
- Angela M Phillips
- Department of Organismic and Evolutionary Biology, Harvard University, Cambridge, United States
| | - Katherine R Lawrence
- Department of Organismic and Evolutionary Biology, Harvard University, Cambridge, United States.,NSF-Simons Center for Mathematical and Statistical Analysis of Biology, Harvard University, Cambridge, United States.,Quantitative Biology Initiative, Harvard University, Cambridge, United States.,Department of Physics, Massachusetts Institute of Technology, Cambridge, United States
| | - Alief Moulana
- Department of Organismic and Evolutionary Biology, Harvard University, Cambridge, United States
| | - Thomas Dupic
- Department of Organismic and Evolutionary Biology, Harvard University, Cambridge, United States
| | - Jeffrey Chang
- Department of Physics, Harvard University, Cambridge, United States
| | - Milo S Johnson
- Department of Organismic and Evolutionary Biology, Harvard University, Cambridge, United States
| | - Ivana Cvijovic
- Department of Applied Physics, Stanford University, Stanford, United States
| | - Thierry Mora
- Laboratoire de physique de ÍÉcole Normale Supérieure, CNRS, PSL University, Sorbonne Université, and Université de Paris, Paris, France
| | - Aleksandra M Walczak
- Laboratoire de physique de ÍÉcole Normale Supérieure, CNRS, PSL University, Sorbonne Université, and Université de Paris, Paris, France
| | - Michael M Desai
- Department of Organismic and Evolutionary Biology, Harvard University, Cambridge, United States.,NSF-Simons Center for Mathematical and Statistical Analysis of Biology, Harvard University, Cambridge, United States.,Quantitative Biology Initiative, Harvard University, Cambridge, United States.,Department of Physics, Harvard University, Cambridge, United States
| |
Collapse
|
35
|
Morrison AJ, Wonderlick DR, Harms MJ. Ensemble epistasis: thermodynamic origins of nonadditivity between mutations. Genetics 2021; 219:iyab105. [PMID: 34849909 PMCID: PMC8633102 DOI: 10.1093/genetics/iyab105] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.5] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/06/2021] [Accepted: 06/19/2021] [Indexed: 01/02/2023] Open
Abstract
Epistasis-when mutations combine nonadditively-is a profoundly important aspect of biology. It is often difficult to understand its mechanistic origins. Here, we show that epistasis can arise from the thermodynamic ensemble, or the set of interchanging conformations a protein adopts. Ensemble epistasis occurs because mutations can have different effects on different conformations of the same protein, leading to nonadditive effects on its average, observable properties. Using a simple analytical model, we found that ensemble epistasis arises when two conditions are met: (1) a protein populates at least three conformations and (2) mutations have differential effects on at least two conformations. To explore the relative magnitude of ensemble epistasis, we performed a virtual deep-mutational scan of the allosteric Ca2+ signaling protein S100A4. We found that 47% of mutation pairs exhibited ensemble epistasis with a magnitude on the order of thermal fluctuations. We observed many forms of epistasis: magnitude, sign, and reciprocal sign epistasis. The same mutation pair could even exhibit different forms of epistasis under different environmental conditions. The ubiquity of thermodynamic ensembles in biology and the pervasiveness of ensemble epistasis in our dataset suggests that it may be a common mechanism of epistasis in proteins and other macromolecules.
Collapse
Affiliation(s)
- Anneliese J Morrison
- Institute of Molecular Biology, University of Oregon, Eugene, OR 97403, USA
- Department of Chemistry and Biochemistry, University of Oregon, Eugene OR 97403, USA
| | - Daria R Wonderlick
- Institute of Molecular Biology, University of Oregon, Eugene, OR 97403, USA
- Department of Chemistry and Biochemistry, University of Oregon, Eugene OR 97403, USA
| | - Michael J Harms
- Institute of Molecular Biology, University of Oregon, Eugene, OR 97403, USA
- Department of Chemistry and Biochemistry, University of Oregon, Eugene OR 97403, USA
| |
Collapse
|
36
|
Fernandez-de-Cossio-Diaz J, Uguzzoni G, Pagnani A. Unsupervised Inference of Protein Fitness Landscape from Deep Mutational Scan. Mol Biol Evol 2021; 38:318-328. [PMID: 32770229 PMCID: PMC7783173 DOI: 10.1093/molbev/msaa204] [Citation(s) in RCA: 16] [Impact Index Per Article: 4.0] [Reference Citation Analysis] [Abstract] [Key Words] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/28/2022] Open
Abstract
The recent technological advances underlying the screening of large combinatorial libraries in high-throughput mutational scans deepen our understanding of adaptive protein evolution and boost its applications in protein design. Nevertheless, the large number of possible genotypes requires suitable computational methods for data analysis, the prediction of mutational effects, and the generation of optimized sequences. We describe a computational method that, trained on sequencing samples from multiple rounds of a screening experiment, provides a model of the genotype-fitness relationship. We tested the method on five large-scale mutational scans, yielding accurate predictions of the mutational effects on fitness. The inferred fitness landscape is robust to experimental and sampling noise and exhibits high generalization power in terms of broader sequence space exploration and higher fitness variant predictions. We investigate the role of epistasis and show that the inferred model provides structural information about the 3D contacts in the molecular fold.
Collapse
Affiliation(s)
- Jorge Fernandez-de-Cossio-Diaz
- Systems Biology Department, Center of Molecular Immunology, Havana, Cuba.,Laboratory of Physics of the Ecole Normale Superieure, CNRS UMR 8023 & PSL Research, Paris, France
| | | | - Andrea Pagnani
- Politecnico di Torino, Torino, Italy.,Italian Institute for Genomic Medicine, IRCCS Candiolo, Candiolo, TO, Italy.,INFN, Sezione di Torino, Torino, Italy
| |
Collapse
|
37
|
Zeng HL, Aurell E. Inferring genetic fitness from genomic data. Phys Rev E 2021; 101:052409. [PMID: 32575265 DOI: 10.1103/physreve.101.052409] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/10/2020] [Accepted: 05/04/2020] [Indexed: 11/07/2022]
Abstract
The genetic composition of a naturally developing population is considered as due to mutation, selection, genetic drift, and recombination. Selection is modeled as single-locus terms (additive fitness) and two-loci terms (pairwise epistatic fitness). The problem is posed to infer epistatic fitness from population-wide whole-genome data from a time series of a developing population. We generate such data in silico and show that in the quasilinkage equilibrium phase of Kimura, Neher, and Shraiman, which pertains at high enough recombination rates and low enough mutation rates, epistatic fitness can be quantitatively correctly inferred using inverse Ising-Potts methods.
Collapse
Affiliation(s)
- Hong-Li Zeng
- School of Science, and New Energy Technology Engineering Laboratory of Jiangsu Province, Nanjing University of Posts and Telecommunications, Nanjing 210023, China.,Nordita, Royal Institute of Technology, and Stockholm University, SE-10691 Stockholm, Sweden
| | - Erik Aurell
- KTH-Royal Institute of Technology, AlbaNova University Center, SE-106 91 Stockholm, Sweden.,Faculty of Physics, Astronomy and Applied Computer Science, Jagiellonian University, 30-348 Kraków, Poland
| |
Collapse
|
38
|
Eccleston RC, Pollock DD, Goldstein RA. Selection for cooperativity causes epistasis predominately between native contacts and enables epistasis-based structure reconstruction. Proc Natl Acad Sci U S A 2021; 118:e2010057118. [PMID: 33879570 PMCID: PMC8072402 DOI: 10.1073/pnas.2010057118] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/18/2022] Open
Abstract
Epistasis and cooperativity of folding both result from networks of energetic interactions in proteins. Epistasis results from energetic interactions among mutants, whereas cooperativity results from energetic interactions during folding that reduce the presence of intermediate states. The two concepts seem intuitively related, but it is unknown how they are related, particularly in terms of selection. To investigate their relationship, we simulated protein evolution under selection for cooperativity and separately under selection for epistasis. Strong selection for cooperativity created strong epistasis between contacts in the native structure but weakened epistasis between nonnative contacts. In contrast, selection for epistasis increased epistasis in both native and nonnative contacts and reduced cooperativity. Because epistasis can be used to predict protein structure only if it preferentially occurs in native contacts, this result indicates that selection for cooperativity may be key for predicting structure using epistasis. To evaluate this inference, we simulated the evolution of guanine nucleotide-binding protein (GB1) with and without cooperativity. With cooperativity, strong epistatic interactions clearly map out the native GB1 structure, while allowing the presence of intermediate states (low cooperativity) obscured the structure. This indicates that using epistasis measurements to reconstruct protein structure may be inappropriate for proteins with stable intermediates.
Collapse
Affiliation(s)
- R Charlotte Eccleston
- Division of Infection and Immunity, University College London, London WC1E 6BT, United Kingdom
| | - David D Pollock
- Department of Biochemistry and Molecular Genetics, University of Colorado School of Medicine, Aurora, CO 80045
| | - Richard A Goldstein
- Division of Infection and Immunity, University College London, London WC1E 6BT, United Kingdom;
| |
Collapse
|
39
|
Reddy G, Desai MM. Global epistasis emerges from a generic model of a complex trait. eLife 2021; 10:64740. [PMID: 33779543 PMCID: PMC8057814 DOI: 10.7554/elife.64740] [Citation(s) in RCA: 47] [Impact Index Per Article: 11.8] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/09/2020] [Accepted: 03/26/2021] [Indexed: 11/20/2022] Open
Abstract
Epistasis between mutations can make adaptation contingent on evolutionary history. Yet despite widespread ‘microscopic’ epistasis between the mutations involved, microbial evolution experiments show consistent patterns of fitness increase between replicate lines. Recent work shows that this consistency is driven in part by global patterns of diminishing-returns and increasing-costs epistasis, which make mutations systematically less beneficial (or more deleterious) on fitter genetic backgrounds. However, the origin of this ‘global’ epistasis remains unknown. Here, we show that diminishing-returns and increasing-costs epistasis emerge generically as a consequence of pervasive microscopic epistasis. Our model predicts a specific quantitative relationship between the magnitude of global epistasis and the stochastic effects of microscopic epistasis, which we confirm by reanalyzing existing data. We further show that the distribution of fitness effects takes on a universal form when epistasis is widespread and introduce a novel fitness landscape model to show how phenotypic evolution can be repeatable despite sequence-level stochasticity.
Collapse
Affiliation(s)
- Gautam Reddy
- NSF-Simons Center for Mathematical and Statistical Analysis of Biology, Harvard University, Cambridge, United States
| | - Michael M Desai
- NSF-Simons Center for Mathematical and Statistical Analysis of Biology, Harvard University, Cambridge, United States.,Department of Organismic and Evolutionary Biology, Harvard University, Cambridge, United States.,Quantitative Biology Initiative, Harvard University, Cambridge, United States.,Department of Physics, Harvard University, Cambridge, United States
| |
Collapse
|
40
|
Schulz S, Boyer S, Smerlak M, Cocco S, Monasson R, Nizak C, Rivoire O. Parameters and determinants of responses to selection in antibody libraries. PLoS Comput Biol 2021; 17:e1008751. [PMID: 33765014 PMCID: PMC7993935 DOI: 10.1371/journal.pcbi.1008751] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/06/2020] [Accepted: 01/31/2021] [Indexed: 12/01/2022] Open
Abstract
The sequences of antibodies from a given repertoire are highly diverse at few sites located on the surface of a genome-encoded larger scaffold. The scaffold is often considered to play a lesser role than highly diverse, non-genome-encoded sites in controlling binding affinity and specificity. To gauge the impact of the scaffold, we carried out quantitative phage display experiments where we compare the response to selection for binding to four different targets of three different antibody libraries based on distinct scaffolds but harboring the same diversity at randomized sites. We first show that the response to selection of an antibody library may be captured by two measurable parameters. Second, we provide evidence that one of these parameters is determined by the degree of affinity maturation of the scaffold, affinity maturation being the process by which antibodies accumulate somatic mutations to evolve towards higher affinities during the natural immune response. In all cases, we find that libraries of antibodies built around maturated scaffolds have a lower response to selection to other arbitrary targets than libraries built around germline-based scaffolds. We thus propose that germline-encoded scaffolds have a higher selective potential than maturated ones as a consequence of a selection for this potential over the long-term evolution of germline antibody genes. Our results are a first step towards quantifying the evolutionary potential of biomolecules. Antibodies in the immune system consist of a genetically encoded scaffold that exposes a few highly diverse, non-genetically encoded sites. This focused diversity is sufficient to produce antibodies that bind to any target molecule. To understand the role of the scaffold, which acquires hypermutations during the immune response, over the selective response, we analyze quantitative in vitro experiments where large antibody populations based on different scaffolds are selected against different targets. We show that selective responses are described statistically by two parameters, one of which depends on prior evolution of the scaffold as part of a previous response. Our work provides methods to assay whether naïve antibody scaffolds are endowed with a distinctively high selective potential.
Collapse
Affiliation(s)
- Steven Schulz
- Center for Interdisciplinary Research in Biology (CIRB), Collège de France, CNRS UMR 7241, INSERM U1050, PSL University, Paris, France
| | - Sébastien Boyer
- Département de biochimie, Faculté de Médecine, Université de Montréal, Montréal, Canada
| | - Matteo Smerlak
- Max Planck Institute for Mathematics in the Sciences, Leipzig, Germany
| | - Simona Cocco
- Laboratory of Physics of École Normale Supérieure, UMR 8023, CNRS & PSL University, Paris, France
| | - Rémi Monasson
- Laboratory of Physics of École Normale Supérieure, UMR 8023, CNRS & PSL University, Paris, France
| | - Clément Nizak
- Laboratory of Biochemistry, CBI, UMR 8231, ESPCI Paris, PSL University, CNRS, Paris, France
- * E-mail: (CN); (OR)
| | - Olivier Rivoire
- Center for Interdisciplinary Research in Biology (CIRB), Collège de France, CNRS UMR 7241, INSERM U1050, PSL University, Paris, France
- * E-mail: (CN); (OR)
| |
Collapse
|
41
|
Abstract
An accurate estimation of the Protein Space size, in light of the factors that govern it, is a long-standing problem and of paramount importance in evolutionary biology, since it determines the nature of protein evolvability. A simple analysis will enable us to, firstly, reduce an unrealistic Protein Space size of ~ 10130 sequences, for a 100-residues polypeptide chain, to ~ 109 functional proteins and, secondly, estimate a robust average-mutation rate per amino acid (ξ ~ 1.23) and infer from it, in light of the protein marginal stability, that only a fraction of the sequence will be available at any one time for a functional protein to evolve. Although this result does not solve the Protein Space vastness problem frames it in a more rational one and illustrates the impact of the marginal stability on protein evolvability.
Collapse
|
42
|
Li X, Lehner B. Biophysical ambiguities prevent accurate genetic prediction. Nat Commun 2020; 11:4923. [PMID: 33004824 PMCID: PMC7529754 DOI: 10.1038/s41467-020-18694-0] [Citation(s) in RCA: 13] [Impact Index Per Article: 2.6] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/22/2020] [Accepted: 09/04/2020] [Indexed: 12/27/2022] Open
Abstract
A goal of biology is to predict how mutations combine to alter phenotypes, fitness and disease. It is often assumed that mutations combine additively or with interactions that can be predicted. Here, we show using simulations that, even for the simple example of the lambda phage transcription factor CI repressing a gene, this assumption is incorrect and that perfect measurements of the effects of mutations on a trait and mechanistic understanding can be insufficient to predict what happens when two mutations are combined. This apparent paradox arises because mutations can have different biophysical effects to cause the same change in a phenotype and the outcome in a double mutant depends upon what these hidden biophysical changes actually are. Pleiotropy and non-monotonic functions further confound prediction of how mutations interact. Accurate prediction of phenotypes and disease will sometimes not be possible unless these biophysical ambiguities can be resolved using additional measurements.
Collapse
Affiliation(s)
- Xianghua Li
- Centre for Genomic Regulation (CRG), The Barcelona Institute of Science and Technology, Dr. Aiguader 88, Barcelona, 08003, Spain
| | - Ben Lehner
- Centre for Genomic Regulation (CRG), The Barcelona Institute of Science and Technology, Dr. Aiguader 88, Barcelona, 08003, Spain. .,Universitat Pompeu Fabra (UPF), Barcelona, Spain. .,ICREA, Pg. Luis Companys 23, Barcelona, 08010, Spain.
| |
Collapse
|
43
|
Abstract
Living systems evolve one mutation at a time, but a single mutation can alter the effect of subsequent mutations. The underlying mechanistic determinants of such epistasis are unclear. Here, we demonstrate that the physical dynamics of a biological system can generically constrain epistasis. We analyze models and experimental data on proteins and regulatory networks. In each, we find that if the long-time physical dynamics is dominated by a slow, collective mode, then the dimensionality of mutational effects is reduced. Consequently, epistatic coefficients for different combinations of mutations are no longer independent, even if individually strong. Such epistasis can be summarized as resulting from a global nonlinearity applied to an underlying linear trait, that is, as global epistasis. This constraint, in turn, reduces the ruggedness of the sequence-to-function map. By providing a generic mechanistic origin for experimentally observed global epistasis, our work suggests that slow collective physical modes can make biological systems evolvable.
Collapse
Affiliation(s)
- Kabir Husain
- Department of Physics, University of Chicago, Chicago, IL
| | - Arvind Murugan
- Department of Physics, University of Chicago, Chicago, IL
| |
Collapse
|
44
|
DiMSum: an error model and pipeline for analyzing deep mutational scanning data and diagnosing common experimental pathologies. Genome Biol 2020; 21:207. [PMID: 32799905 PMCID: PMC7429474 DOI: 10.1186/s13059-020-02091-3] [Citation(s) in RCA: 32] [Impact Index Per Article: 6.4] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/04/2019] [Accepted: 07/05/2020] [Indexed: 12/30/2022] Open
Abstract
Deep mutational scanning (DMS) enables multiplexed measurement of the effects of thousands of variants of proteins, RNAs, and regulatory elements. Here, we present a customizable pipeline, DiMSum, that represents an end-to-end solution for obtaining variant fitness and error estimates from raw sequencing data. A key innovation of DiMSum is the use of an interpretable error model that captures the main sources of variability arising in DMS workflows, outperforming previous methods. DiMSum is available as an R/Bioconda package and provides summary reports to help researchers diagnose common DMS pathologies and take remedial steps in their analyses.
Collapse
|
45
|
Zhou J, McCandlish DM. Minimum epistasis interpolation for sequence-function relationships. Nat Commun 2020; 11:1782. [PMID: 32286265 PMCID: PMC7156698 DOI: 10.1038/s41467-020-15512-5] [Citation(s) in RCA: 13] [Impact Index Per Article: 2.6] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/02/2019] [Accepted: 03/12/2020] [Indexed: 12/17/2022] Open
Abstract
Massively parallel phenotyping assays have provided unprecedented insight into how multiple mutations combine to determine biological function. While such assays can measure phenotypes for thousands to millions of genotypes in a single experiment, in practice these measurements are not exhaustive, so that there is a need for techniques to impute values for genotypes whose phenotypes have not been directly assayed. Here, we present an imputation method based on inferring the least epistatic possible sequence-function relationship compatible with the data. In particular, we infer the reconstruction where mutational effects change as little as possible across adjacent genetic backgrounds. The resulting models can capture complex higher-order genetic interactions near the data, but approach additivity where data is sparse or absent. We apply the method to high-throughput transcription factor binding assays and use it to explore a fitness landscape for protein G.
Collapse
Affiliation(s)
- Juannan Zhou
- Simons Center for Quantitative Biology, Cold Spring Harbor Laboratory, Cold Spring Harbor, NY, 11724, USA
| | - David M McCandlish
- Simons Center for Quantitative Biology, Cold Spring Harbor Laboratory, Cold Spring Harbor, NY, 11724, USA.
| |
Collapse
|
46
|
Kemble H, Nghe P, Tenaillon O. Recent insights into the genotype-phenotype relationship from massively parallel genetic assays. Evol Appl 2019; 12:1721-1742. [PMID: 31548853 PMCID: PMC6752143 DOI: 10.1111/eva.12846] [Citation(s) in RCA: 34] [Impact Index Per Article: 5.7] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/27/2019] [Revised: 06/21/2019] [Accepted: 07/02/2019] [Indexed: 12/20/2022] Open
Abstract
With the molecular revolution in Biology, a mechanistic understanding of the genotype-phenotype relationship became possible. Recently, advances in DNA synthesis and sequencing have enabled the development of deep mutational scanning assays, capable of scoring comprehensive libraries of genotypes for fitness and a variety of phenotypes in massively parallel fashion. The resulting empirical genotype-fitness maps pave the way to predictive models, potentially accelerating our ability to anticipate the behaviour of pathogen and cancerous cell populations from sequencing data. Besides from cellular fitness, phenotypes of direct application in industry (e.g. enzyme activity) and medicine (e.g. antibody binding) can be quantified and even selected directly by these assays. This review discusses the technological basis of and recent developments in massively parallel genetics, along with the trends it is uncovering in the genotype-phenotype relationship (distribution of mutation effects, epistasis), their possible mechanistic bases and future directions for advancing towards the goal of predictive genetics.
Collapse
Affiliation(s)
- Harry Kemble
- Infection, Antimicrobials, Modelling, Evolution, INSERM, Unité Mixte de Recherche 1137Université Paris Diderot, Université Paris NordParisFrance
- École Supérieure de Physique et de Chimie Industrielles de la Ville de Paris (ESPCI Paris), UMR CNRS‐ESPCI CBI 8231PSL Research UniversityParis Cedex 05France
| | - Philippe Nghe
- École Supérieure de Physique et de Chimie Industrielles de la Ville de Paris (ESPCI Paris), UMR CNRS‐ESPCI CBI 8231PSL Research UniversityParis Cedex 05France
| | - Olivier Tenaillon
- Infection, Antimicrobials, Modelling, Evolution, INSERM, Unité Mixte de Recherche 1137Université Paris Diderot, Université Paris NordParisFrance
| |
Collapse
|
47
|
Ferrada E. Gene Families, Epistasis and the Amino Acid Preferences of Protein Homologs. Evol Bioinform Online 2019; 15:1176934319870485. [PMID: 31452598 PMCID: PMC6698995 DOI: 10.1177/1176934319870485] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/19/2019] [Accepted: 07/27/2019] [Indexed: 11/16/2022] Open
Abstract
In order to preserve structure and function, proteins tend to preferentially conserve amino acids at particular sites along the sequence. Because mutations can affect structure and function, the question arises whether the preference of a protein site for a particular amino acid varies between protein homologs, and to what extent that variation depends on sequence divergence. Answering these questions can help in the development of models of sequence evolution, as well as provide insights on the dependence of the fitness effects of mutations on the genetic background of sequences, a phenomenon known as epistasis. Here, I comment on recent computational work providing a systematic analysis of the extent to which the amino acid preferences of proteins depend on the background mutations of protein homologs.
Collapse
Affiliation(s)
- Evandro Ferrada
- Center for Genomics and Bioinformatics, Faculty of Science, Universidad Mayor, Santiago, Chile
| |
Collapse
|
48
|
Protein stability engineering insights revealed by domain-wide comprehensive mutagenesis. Proc Natl Acad Sci U S A 2019; 116:16367-16377. [PMID: 31371509 DOI: 10.1073/pnas.1903888116] [Citation(s) in RCA: 116] [Impact Index Per Article: 19.3] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/21/2022] Open
Abstract
The accurate prediction of protein stability upon sequence mutation is an important but unsolved challenge in protein engineering. Large mutational datasets are required to train computational predictors, but traditional methods for collecting stability data are either low-throughput or measure protein stability indirectly. Here, we develop an automated method to generate thermodynamic stability data for nearly every single mutant in a small 56-residue protein. Analysis reveals that most single mutants have a neutral effect on stability, mutational sensitivity is largely governed by residue burial, and unexpectedly, hydrophobics are the best tolerated amino acid type. Correlating the output of various stability-prediction algorithms against our data shows that nearly all perform better on boundary and surface positions than for those in the core and are better at predicting large-to-small mutations than small-to-large ones. We show that the most stable variants in the single-mutant landscape are better identified using combinations of 2 prediction algorithms and including more algorithms can provide diminishing returns. In most cases, poor in silico predictions were tied to compositional differences between the data being analyzed and the datasets used to train the algorithm. Finally, we find that strategies to extract stabilities from high-throughput fitness data such as deep mutational scanning are promising and that data produced by these methods may be applicable toward training future stability-prediction tools.
Collapse
|
49
|
Kinney JB, McCandlish DM. Massively Parallel Assays and Quantitative Sequence-Function Relationships. Annu Rev Genomics Hum Genet 2019; 20:99-127. [PMID: 31091417 DOI: 10.1146/annurev-genom-083118-014845] [Citation(s) in RCA: 88] [Impact Index Per Article: 14.7] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/09/2022]
Abstract
Over the last decade, a rich variety of massively parallel assays have revolutionized our understanding of how biological sequences encode quantitative molecular phenotypes. These assays include deep mutational scanning, high-throughput SELEX, and massively parallel reporter assays. Here, we review these experimental methods and how the data they produce can be used to quantitatively model sequence-function relationships. In doing so, we touch on a diverse range of topics, including the identification of clinically relevant genomic variants, the modeling of transcription factor binding to DNA, the functional and evolutionary landscapes of proteins, and cis-regulatory mechanisms in both transcription and mRNA splicing. We further describe a unified conceptual framework and a core set of mathematical modeling strategies that studies in these diverse areas can make use of. Finally, we highlight key aspects of experimental design and mathematical modeling that are important for the results of such studies to be interpretable and reproducible.
Collapse
Affiliation(s)
- Justin B Kinney
- Simons Center for Quantitative Biology, Cold Spring Harbor Laboratory, Cold Spring Harbor, New York 11724, USA; ,
| | - David M McCandlish
- Simons Center for Quantitative Biology, Cold Spring Harbor Laboratory, Cold Spring Harbor, New York 11724, USA; ,
| |
Collapse
|
50
|
Domingo J, Baeza-Centurion P, Lehner B. The Causes and Consequences of Genetic Interactions (Epistasis). Annu Rev Genomics Hum Genet 2019; 20:433-460. [PMID: 31082279 DOI: 10.1146/annurev-genom-083118-014857] [Citation(s) in RCA: 137] [Impact Index Per Article: 22.8] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/09/2022]
Abstract
The same mutation can have different effects in different individuals. One important reason for this is that the outcome of a mutation can depend on the genetic context in which it occurs. This dependency is known as epistasis. In recent years, there has been a concerted effort to quantify the extent of pairwise and higher-order genetic interactions between mutations through deep mutagenesis of proteins and RNAs. This research has revealed two major components of epistasis: nonspecific genetic interactions caused by nonlinearities in genotype-to-phenotype maps, and specific interactions between particular mutations. Here, we provide an overview of our current understanding of the mechanisms causing epistasis at the molecular level, the consequences of genetic interactions for evolution and genetic prediction, and the applications of epistasis for understanding biology and determining macromolecular structures.
Collapse
Affiliation(s)
- Júlia Domingo
- Systems Biology Program, Centre for Genomic Regulation, Barcelona Institute of Science and Technology, 08003 Barcelona, Spain; , ,
| | - Pablo Baeza-Centurion
- Systems Biology Program, Centre for Genomic Regulation, Barcelona Institute of Science and Technology, 08003 Barcelona, Spain; , ,
| | - Ben Lehner
- Systems Biology Program, Centre for Genomic Regulation, Barcelona Institute of Science and Technology, 08003 Barcelona, Spain; , , .,Universitat Pompeu Fabra, 08003 Barcelona, Spain.,Institució Catalana de Recerca i Estudis Avançats (ICREA), 08010 Barcelona, Spain
| |
Collapse
|