1
Sandhu M, Chen JZ, Matthews DS, Spence MA, Pulsford SB, Gall B, Kaczmarski JA, Nichols J, Tokuriki N, Jackson CJ. Computational and Experimental Exploration of Protein Fitness Landscapes: Navigating Smooth and Rugged Terrains. Biochemistry 2025; 64:1673-1684. [PMID: 40132127 DOI: 10.1021/acs.biochem.4c00673] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 03/27/2025]
Abstract
Proteins evolve through complex sequence spaces, with fitness landscapes serving as a conceptual framework that links sequence to function. Fitness landscapes can be smooth, where multiple similarly accessible evolutionary paths are available, or rugged, where the presence of multiple local fitness optima complicates evolution and prediction. Indeed, many proteins, especially those with complex functions or under multiple selection pressures, exist on rugged fitness landscapes. Here we discuss the theoretical framework that underpins our understanding of fitness landscapes, alongside recent work that has advanced our understanding, particularly of the biophysical basis for smoothness versus ruggedness. Finally, we address the rapid advances that have been made in computational and experimental exploration and exploitation of fitness landscapes, and how these can identify efficient routes to protein optimization.
Affiliation(s)
- Mahakaran Sandhu
  - Research School of Chemistry, Australian National University, Canberra ACT 2601, Australia
  - ARC Centre of Excellence for Innovations in Peptide & Protein Science, Research School of Chemistry, Australian National University, Canberra ACT 2601, Australia
- John Z Chen
  - Research School of Chemistry, Australian National University, Canberra ACT 2601, Australia
  - ARC Centre of Excellence in Synthetic Biology, Research School of Biology, Australian National University, Canberra ACT 2601, Australia
- Dana S Matthews
  - Research School of Chemistry, Australian National University, Canberra ACT 2601, Australia
  - ARC Centre of Excellence for Innovations in Peptide & Protein Science, Research School of Chemistry, Australian National University, Canberra ACT 2601, Australia
- Matthew A Spence
  - Research School of Chemistry, Australian National University, Canberra ACT 2601, Australia
  - ARC Centre of Excellence for Innovations in Peptide & Protein Science, Research School of Chemistry, Australian National University, Canberra ACT 2601, Australia
- Sacha B Pulsford
  - Research School of Chemistry, Australian National University, Canberra ACT 2601, Australia
  - ARC Centre of Excellence for Innovations in Peptide & Protein Science, Research School of Chemistry, Australian National University, Canberra ACT 2601, Australia
- Barnabas Gall
  - Research School of Chemistry, Australian National University, Canberra ACT 2601, Australia
  - ARC Centre of Excellence for Innovations in Peptide & Protein Science, Research School of Chemistry, Australian National University, Canberra ACT 2601, Australia
- Joe A Kaczmarski
  - Research School of Chemistry, Australian National University, Canberra ACT 2601, Australia
  - ARC Centre of Excellence in Synthetic Biology, Research School of Biology, Australian National University, Canberra ACT 2601, Australia
- James Nichols
  - Biological Data Science Institute, Australian National University, Canberra ACT 2601, Australia
- Nobuhiko Tokuriki
  - Michael Smith Laboratories, University of British Columbia, Vancouver, British Columbia V6T 1Z4, Canada
- Colin J Jackson
  - Research School of Chemistry, Australian National University, Canberra ACT 2601, Australia
  - ARC Centre of Excellence for Innovations in Peptide & Protein Science, Research School of Chemistry, Australian National University, Canberra ACT 2601, Australia
  - Biological Data Science Institute, Australian National University, Canberra ACT 2601, Australia
  - ARC Centre of Excellence in Synthetic Biology, Research School of Biology, Australian National University, Canberra ACT 2601, Australia
2
Martí-Gómez C, Zhou J, Chen WC, Kinney JB, McCandlish DM. Inference and visualization of complex genotype-phenotype maps with gpmap-tools. bioRxiv 2025:2025.03.09.642267. [PMID: 40161830 PMCID: PMC11952336 DOI: 10.1101/2025.03.09.642267] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 04/02/2025]
Abstract
Multiplex assays of variant effect (MAVEs) allow the functional characterization of an unprecedented number of sequence variants in both gene regulatory regions and protein coding sequences. This has enabled the study of nearly complete combinatorial libraries of mutational variants and revealed the widespread influence of higher-order genetic interactions that arise when multiple mutations are combined. However, the lack of appropriate tools for exploratory analysis of this high-dimensional data limits our overall understanding of the main qualitative properties of complex genotype-phenotype maps. To fill this gap, we have developed gpmap-tools (https://github.com/cmarti/gpmap-tools), a Python library that integrates Gaussian process models for inference, phenotypic imputation, and error estimation from incomplete and noisy MAVE data and collections of natural sequences, together with methods for summarizing patterns of higher-order epistasis and non-linear dimensionality reduction techniques that allow visualization of genotype-phenotype maps containing up to millions of genotypes. Here, we used gpmap-tools to study the genotype-phenotype map of the Shine-Dalgarno sequence, a motif that modulates binding of the 16S rRNA to the 5' untranslated region (UTR) of mRNAs through base pair complementarity during translation initiation in prokaryotes. We inferred full combinatorial landscapes containing 262,144 different sequences from the sequences of 5,311 5'UTRs in the E. coli genome and from experimental MAVE data. Visualizations of the inferred landscapes were largely consistent with each other, and unveiled a simple molecular mechanism underlying the highly epistatic genotype-phenotype map of the Shine-Dalgarno sequence.
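The size of the full combinatorial landscape quoted above can be checked by simple counting: 262,144 equals 4^9, the number of sequences of nine positions over a four-letter nucleotide alphabet. The sketch below only verifies that arithmetic. The length of 9 is an assumption inferred from the number itself, not a value stated in the abstract, and the code does not use or imitate the gpmap-tools API.

```python
from itertools import product

ALPHABET = "ACGU"  # the four RNA bases
LENGTH = 9         # assumption: inferred from 262,144 == 4**9, not stated in the abstract


def sequence_space_size(alphabet: str, length: int) -> int:
    """Number of distinct sequences of the given length over the alphabet."""
    return len(alphabet) ** length


def enumerate_sequences(alphabet: str, length: int):
    """Lazily yield every sequence in the full combinatorial space."""
    for letters in product(alphabet, repeat=length):
        yield "".join(letters)


n_total = sequence_space_size(ALPHABET, LENGTH)
n_enumerated = sum(1 for _ in enumerate_sequences(ALPHABET, LENGTH))
print(n_total, n_enumerated)
```

Enumerating lazily keeps memory use flat, which matters once the alphabet or length grows and the space can no longer be held as a list.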
Affiliation(s)
- Carlos Martí-Gómez
  - Simons Center for Quantitative Biology, Cold Spring Harbor Laboratory, Cold Spring Harbor, NY, 11724
- Juannan Zhou
  - Department of Biology, University of Florida, Gainesville, FL, 32611
- Wei-Chia Chen
  - Department of Physics, National Chung Cheng University, Chiayi 62102, Taiwan, Republic of China
- Justin B Kinney
  - Simons Center for Quantitative Biology, Cold Spring Harbor Laboratory, Cold Spring Harbor, NY, 11724
- David M McCandlish
  - Simons Center for Quantitative Biology, Cold Spring Harbor Laboratory, Cold Spring Harbor, NY, 11724
3
Dieckhaus H, Kuhlman B. Protein stability models fail to capture epistatic interactions of double point mutations. Protein Sci 2025; 34:e70003. [PMID: 39704075 DOI: 10.1002/pro.70003] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/26/2024] [Revised: 11/06/2024] [Accepted: 12/05/2024] [Indexed: 12/21/2024]
Abstract
There is strong interest in accurate methods for predicting changes in protein stability resulting from amino acid mutations to the protein sequence. Recombinant proteins must often be stabilized to be used as therapeutics or reagents, and destabilizing mutations are implicated in a variety of diseases. Due to increased data availability and improved modeling techniques, recent studies have shown advancements in predicting changes in protein stability when a single-point mutation is made. Less focus has been directed toward predicting changes in protein stability when there are two or more mutations. Here, we analyze the largest available dataset of double point mutation stability and benchmark several widely used protein stability models on this and other datasets. We find that additive models of protein stability perform surprisingly well on this task, achieving similar performance to comparable non-additive predictors according to most metrics. Accordingly, we find that neither artificial intelligence-based nor physics-based protein stability models consistently capture epistatic interactions between single mutations. We observe one notable deviation from this trend, which is that epistasis-aware models provide marginally better predictions than additive models on stabilizing double point mutations. We develop an extension of the ThermoMPNN framework for double mutant modeling, as well as a novel data augmentation scheme, which mitigates some of the limitations in currently available datasets. Collectively, our findings indicate that current protein stability models fail to capture the nuanced epistatic interactions between concurrent mutations due to several factors, including training dataset limitations and insufficient model sensitivity.
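The additive baseline discussed in this abstract can be sketched in a few lines: under the usual definition, an additive model predicts the stability change of a double mutant as the sum of the two single-mutant changes, and epistasis is whatever the measured double mutant deviates from that sum. The function names and the ddG values (in kcal/mol) below are hypothetical illustrations, not data or code from the paper or from ThermoMPNN.

```python
def additive_ddg(ddg_a: float, ddg_b: float) -> float:
    """Additive prediction for a double mutant: sum of the single-mutant ddG values."""
    return ddg_a + ddg_b


def epistasis(ddg_ab: float, ddg_a: float, ddg_b: float) -> float:
    """Deviation of the measured double-mutant ddG from the additive prediction."""
    return ddg_ab - additive_ddg(ddg_a, ddg_b)


# Hypothetical measurements in kcal/mol, chosen only for illustration.
ddg_a, ddg_b, ddg_ab = 1.0, 0.5, 2.25

predicted = additive_ddg(ddg_a, ddg_b)
deviation = epistasis(ddg_ab, ddg_a, ddg_b)
print(predicted, deviation)
```

A model that captures epistasis would need to predict the deviation term as well as the single-mutant terms; the abstract reports that the benchmarked models mostly do not.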
Affiliation(s)
- Henry Dieckhaus
  - Department of Biochemistry and Biophysics, University of North Carolina School of Medicine, Chapel Hill, North Carolina, USA
  - Division of Chemical Biology and Medicinal Chemistry, University of North Carolina Eshelman School of Pharmacy, Chapel Hill, North Carolina, USA
- Brian Kuhlman
  - Department of Biochemistry and Biophysics, University of North Carolina School of Medicine, Chapel Hill, North Carolina, USA
  - Department of Bioinformatics and Computational Biology, University of North Carolina School of Medicine, Chapel Hill, North Carolina, USA
  - Lineberger Comprehensive Cancer Center, University of North Carolina School of Medicine, Chapel Hill, North Carolina, USA
4
Zhu XX, Zheng WQ, Xia ZW, Chen XR, Jin T, Ding XW, Chen FF, Chen Q, Xu JH, Kong XD, Zheng GW. Evolutionary insights into the stereoselectivity of imine reductases based on ancestral sequence reconstruction. Nat Commun 2024; 15:10330. [PMID: 39609402 PMCID: PMC11605051 DOI: 10.1038/s41467-024-54613-3] [Citation(s) in RCA: 4] [Impact Index Per Article: 4.0] [Received: 01/30/2024] [Accepted: 11/14/2024] [Indexed: 11/30/2024] [Open access]
Abstract
The stereoselectivity of enzymes plays a central role in asymmetric biocatalytic reactions, but there remains a dearth of evolution-driven biochemistry studies investigating the evolutionary trajectory of this vital property. Imine reductases (IREDs) are one such enzyme family that possesses excellent stereoselectivity, and stereocomplementary members are pervasive in the family. However, the regulatory mechanism behind stereocomplementarity remains cryptic. Herein, we reconstruct a panel of active ancestral IREDs and trace the evolution of stereoselectivity from ancestors to extant IREDs. Combined with coevolution analysis, we reveal six historical mutations capable of recapitulating stereoselectivity evolution. An investigation of the mechanism with X-ray crystallography shows that they collectively reshape the substrate-binding pocket to regulate stereoselectivity inversion. In addition, we construct an empirical fitness landscape and discover that epistasis is prevalent in stereoselectivity evolution. Our findings emphasize the power of ASR in circumventing the time-consuming large-scale mutagenesis library screening for identifying mutations that change functions and support a Darwinian premise from a molecular perspective that the evolution of biological functions is a stepwise process.
Affiliation(s)
- Xin-Xin Zhu
  - State Key Laboratory of Bioreactor Engineering, Shanghai Collaborative Innovation Center for Biomanufacturing, East China University of Science and Technology, Shanghai, China
- Wen-Qing Zheng
  - State Key Laboratory of Bioreactor Engineering, Shanghai Collaborative Innovation Center for Biomanufacturing, East China University of Science and Technology, Shanghai, China
- Zi-Wei Xia
  - State Key Laboratory of Bioreactor Engineering, Shanghai Collaborative Innovation Center for Biomanufacturing, East China University of Science and Technology, Shanghai, China
- Xin-Ru Chen
  - State Key Laboratory of Bioreactor Engineering, Shanghai Collaborative Innovation Center for Biomanufacturing, East China University of Science and Technology, Shanghai, China
- Tian Jin
  - State Key Laboratory of Bioreactor Engineering, Shanghai Collaborative Innovation Center for Biomanufacturing, East China University of Science and Technology, Shanghai, China
- Xu-Wei Ding
  - State Key Laboratory of Bioreactor Engineering, Shanghai Collaborative Innovation Center for Biomanufacturing, East China University of Science and Technology, Shanghai, China
- Fei-Fei Chen
  - State Key Laboratory of Bioreactor Engineering, Shanghai Collaborative Innovation Center for Biomanufacturing, East China University of Science and Technology, Shanghai, China
- Qi Chen
  - State Key Laboratory of Bioreactor Engineering, Shanghai Collaborative Innovation Center for Biomanufacturing, East China University of Science and Technology, Shanghai, China
- Jian-He Xu
  - State Key Laboratory of Bioreactor Engineering, Shanghai Collaborative Innovation Center for Biomanufacturing, East China University of Science and Technology, Shanghai, China
- Xu-Dong Kong
  - State Key Laboratory of Microbial Metabolism, School of Life Sciences and Biotechnology, and Zhangjiang Institute for Advanced Study, Shanghai Jiao Tong University, Shanghai, China
- Gao-Wei Zheng
  - State Key Laboratory of Bioreactor Engineering, Shanghai Collaborative Innovation Center for Biomanufacturing, East China University of Science and Technology, Shanghai, China
5
Vila JA. The origin of mutational epistasis. Eur Biophys J 2024; 53:473-480. [PMID: 39443382 DOI: 10.1007/s00249-024-01725-9] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/20/2024] [Revised: 10/03/2024] [Accepted: 10/06/2024] [Indexed: 10/25/2024]
Abstract
The interconnected processes of protein folding, mutations, epistasis, and evolution have all been the subject of extensive analysis throughout the years due to their significance for structural and evolutionary biology. The origin (molecular basis) of epistasis, the non-additive interactions between mutations, is nonetheless still unknown. The existence of a new perspective on protein folding, a problem that needs to be conceived as an 'analytic whole', will enable us to shed light on the origin of mutational epistasis at the simplest level (within proteins) while also uncovering the reasons why the genetic background in which they occur, a key component of molecular evolution, could foster changes in epistasis effects. Additionally, because mutations are the source of epistasis, more research is needed to determine the impact of post-translational modifications, which can potentially increase the proteome's diversity by several orders of magnitude, on mutational epistasis and protein evolvability. Finally, a protein evolution thermodynamic-based analysis that does not consider specific mutational steps or epistasis effects will be briefly discussed. Our study explores the complex processes behind the evolution of proteins upon mutations, clearing up some previously unresolved issues, and providing direction for further research.
Affiliation(s)
- Jorge A Vila
  - IMASL-CONICET, Ejército de Los Andes 950, 5700, San Luis, Argentina
6
Shimagaki KS, Barton JP. Efficient epistasis inference via higher-order covariance matrix factorization. bioRxiv 2024:2024.10.14.618287. [PMID: 39464126 PMCID: PMC11507688 DOI: 10.1101/2024.10.14.618287] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/29/2024]
Abstract
Epistasis can profoundly influence evolutionary dynamics. Temporal genetic data, consisting of sequences sampled repeatedly from a population over time, provides a unique resource to understand how epistasis shapes evolution. However, detecting epistatic interactions from sequence data is technically challenging. Existing methods for identifying epistasis are computationally demanding, limiting their applicability to real-world data. Here, we present a novel computational method for inferring epistasis that significantly reduces computational costs without sacrificing accuracy. We validated our approach in simulations and applied it to study HIV-1 evolution over multiple years in a data set of 16 individuals. There we observed a strong excess of negative epistatic interactions between beneficial mutations, especially mutations involved in immune escape. Our method is general and could be used to characterize epistasis in other large data sets.
Affiliation(s)
- Kai S. Shimagaki
  - Department of Computational and Systems Biology, University of Pittsburgh School of Medicine, USA
  - Department of Physics and Astronomy, University of Pittsburgh, USA
- John P. Barton
  - Department of Computational and Systems Biology, University of Pittsburgh School of Medicine, USA
  - Department of Physics and Astronomy, University of Pittsburgh, USA
7
Lipsh-Sokolik R, Fleishman SJ. Addressing epistasis in the design of protein function. Proc Natl Acad Sci U S A 2024; 121:e2314999121. [PMID: 39133844 PMCID: PMC11348311 DOI: 10.1073/pnas.2314999121] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Indexed: 08/29/2024] [Open access]
Abstract
Mutations in protein active sites can dramatically improve function. The active site, however, is densely packed and extremely sensitive to mutations. Therefore, some mutations may only be tolerated in combination with others in a phenomenon known as epistasis. Epistasis reduces the likelihood of obtaining improved functional variants and dramatically slows natural and lab evolutionary processes. Research has shed light on the molecular origins of epistasis and its role in shaping evolutionary trajectories and outcomes. In addition, sequence- and AI-based strategies that infer epistatic relationships from mutational patterns in natural or experimental evolution data have been used to design functional protein variants. In recent years, combinations of such approaches and atomistic design calculations have successfully predicted highly functional combinatorial mutations in active sites. These were used to design thousands of functional active-site variants, demonstrating that, while our understanding of epistasis remains incomplete, some of the determinants that are critical for accurate design are now sufficiently understood. We conclude that the space of active-site variants that has been explored by evolution may be expanded dramatically to enhance natural activities or discover new ones. Furthermore, design opens the way to systematically exploring sequence and structure space and mutational impacts on function, deepening our understanding and control over protein activity.
Affiliation(s)
- Rosalie Lipsh-Sokolik
  - Department of Biomolecular Sciences, Weizmann Institute of Science, Rehovot 7610001, Israel
- Sarel J Fleishman
  - Department of Biomolecular Sciences, Weizmann Institute of Science, Rehovot 7610001, Israel
8
Vila JA. Analysis of proteins in the light of mutations. Eur Biophys J 2024; 53:255-265. [PMID: 38955858 DOI: 10.1007/s00249-024-01714-y] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/09/2023] [Revised: 05/23/2024] [Accepted: 06/18/2024] [Indexed: 07/04/2024]
Abstract
Proteins have evolved through mutations (amino acid substitutions) since life appeared on Earth, some 10^9 years ago. The study of these phenomena has been of particular significance because of their impact on protein stability, function, and structure. This study offers a new viewpoint on how the most recent findings in these areas can be used to explore the impact of mutations on protein sequence, stability, and evolvability. Preliminary results indicate that: (1) mutations can be viewed as sensitive probes to identify 'typos' in the amino acid sequence, and also to assess the resistance of naturally occurring proteins to unwanted sequence alterations; (2) the presence of 'typos' in the amino acid sequence, rather than being an evolutionary obstacle, could promote faster evolvability and, in turn, increase the likelihood of higher protein stability; (3) the mutation site is far more important than the substituted amino acid in terms of the marginal stability changes of the protein; and (4) the unpredictability of protein evolution at the molecular level, by mutations, exists even in the absence of epistasis effects. Finally, the Darwinian concept of evolution, "descent with modification", and experimental evidence endorse one of the results of this study, which suggests that some regions of any protein sequence are susceptible to mutations while others are not. This work contributes to our general understanding of protein responses to mutations and may spur significant progress in our efforts to develop methods to accurately forecast changes in protein stability, their propensity for metamorphism, and their ability to evolve.
Affiliation(s)
- Jorge A Vila
  - IMASL-CONICET, Universidad Nacional de San Luis, Ejército de los Andes 950, 5700, San Luis, Argentina
9
Chitra U, Arnold BJ, Raphael BJ. Quantifying higher-order epistasis: beware the chimera. bioRxiv 2024:2024.07.17.603976. [PMID: 39071303 PMCID: PMC11275791 DOI: 10.1101/2024.07.17.603976] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 07/30/2024]
Abstract
Epistasis, or interactions in which alleles at one locus modify the fitness effects of alleles at other loci, plays a fundamental role in genetics, protein evolution, and many other areas of biology. Epistasis is typically quantified by computing the deviation from the expected fitness under an additive or multiplicative model using one of several formulae. However, these formulae are not all equivalent. Importantly, one widely used formula - which we call the chimeric formula - measures deviations from a multiplicative fitness model on an additive scale, thus mixing two measurement scales. We show that for pairwise interactions, the chimeric formula yields a different magnitude, but the same sign (synergistic vs. antagonistic) of epistasis compared to the multiplicative formula that measures both fitness and deviations on a multiplicative scale. However, for higher-order interactions, we show that the chimeric formula can have both different magnitude and sign compared to the multiplicative formula - thus confusing negative epistatic interactions with positive interactions, and vice versa. We resolve these inconsistencies by deriving fundamental connections between the different epistasis formulae and the parameters of the multivariate Bernoulli distribution. Our results demonstrate that the additive and multiplicative epistasis formulae are more mathematically sound than the chimeric formula. Moreover, we demonstrate that the mathematical issues with the chimeric epistasis formula lead to markedly different biological interpretations of real data. Analyzing multi-gene knockout data in yeast, multi-way drug interactions in E. coli, and deep mutational scanning (DMS) of several proteins, we find that 10-60% of higher-order interactions have a change in sign with the multiplicative or additive epistasis formula. These sign changes result in qualitatively different findings on functional divergence in the yeast genome, synergistic vs. antagonistic drug interactions, and epistasis between protein mutations. In particular, in the yeast data, the more appropriate multiplicative formula identifies nearly 500 additional negative three-way interactions, thus extending the trigenic interaction network by 25%.
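The pairwise case described in this abstract can be illustrated with a short sketch. The three functions below use commonly seen textbook forms for two loci; they are assumptions for illustration, since the abstract does not write the formulae out, and the fitness values are hypothetical. The multiplicative measure is expressed as a log ratio here so that its sign can be compared with zero.

```python
import math


def additive_epistasis(f00: float, f10: float, f01: float, f11: float) -> float:
    """Deviation from an additive model, measured on the additive scale."""
    return f11 - f10 - f01 + f00


def multiplicative_epistasis(f00: float, f10: float, f01: float, f11: float) -> float:
    """Deviation from a multiplicative model, as the log of (f11*f00)/(f10*f01)."""
    return math.log((f11 * f00) / (f10 * f01))


def chimeric_epistasis(f00: float, f10: float, f01: float, f11: float) -> float:
    """Deviation from a multiplicative expectation, measured on the additive scale.

    Fitness values are first normalized to the wild type (f00).
    """
    w10, w01, w11 = f10 / f00, f01 / f00, f11 / f00
    return w11 - w10 * w01


# Hypothetical fitness values: wild type, two single mutants, double mutant.
wt, a, b, ab = 1.0, 1.2, 1.5, 2.0

eps_add = additive_epistasis(wt, a, b, ab)
eps_mult = multiplicative_epistasis(wt, a, b, ab)
eps_chim = chimeric_epistasis(wt, a, b, ab)
print(eps_add, eps_mult, eps_chim)
```

For these two-locus inputs the chimeric and multiplicative values share a sign but differ in magnitude, which matches the pairwise statement in the abstract. The sign disagreements the authors report arise only for three or more loci, which this sketch does not cover.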
10
Diaz-Colunga J, Skwara A, Vila JCC, Bajic D, Sanchez A. Global epistasis and the emergence of function in microbial consortia. Cell 2024; 187:3108-3119.e30. [PMID: 38776921 DOI: 10.1016/j.cell.2024.04.016] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/09/2023] [Revised: 12/06/2023] [Accepted: 04/16/2024] [Indexed: 05/25/2024]
Abstract
The many functions of microbial communities emerge from a complex web of interactions between organisms and their environment. This poses a significant obstacle to engineering microbial consortia, hindering our ability to harness the potential of microorganisms for biotechnological applications. In this study, we demonstrate that the collective effect of ecological interactions between microbes in a community can be captured by simple statistical models that predict how adding a new species to a community will affect its function. These predictive models mirror the patterns of global epistasis reported in genetics, and they can be quantitatively interpreted in terms of pairwise interactions between community members. Our results illuminate an unexplored path to quantitatively predicting the function of microbial consortia from their composition, paving the way to optimizing desirable community properties and bringing the tasks of predicting biological function at the genetic, organismal, and ecological scales under the same quantitative formalism.
Affiliation(s)
- Juan Diaz-Colunga
  - Department of Ecology & Evolutionary Biology, Yale University, New Haven, CT 06511, USA; Microbial Sciences Institute, Yale University, New Haven, CT 06511, USA; Department of Microbial Biotechnology, National Center for Biotechnology CNB-CSIC, 28049 Madrid, Spain; Institute of Functional Biology and Genomics IBFG-CSIC, University of Salamanca, 37007 Salamanca, Spain
- Abigail Skwara
  - Department of Ecology & Evolutionary Biology, Yale University, New Haven, CT 06511, USA; Microbial Sciences Institute, Yale University, New Haven, CT 06511, USA
- Jean C C Vila
  - Department of Ecology & Evolutionary Biology, Yale University, New Haven, CT 06511, USA; Microbial Sciences Institute, Yale University, New Haven, CT 06511, USA; Department of Biology, Stanford University, Stanford, CA 94305, USA
- Djordje Bajic
  - Department of Ecology & Evolutionary Biology, Yale University, New Haven, CT 06511, USA; Microbial Sciences Institute, Yale University, New Haven, CT 06511, USA; Department of Biotechnology, Delft University of Technology, Delft 2628 CD, the Netherlands
- Alvaro Sanchez
  - Department of Ecology & Evolutionary Biology, Yale University, New Haven, CT 06511, USA; Microbial Sciences Institute, Yale University, New Haven, CT 06511, USA; Department of Microbial Biotechnology, National Center for Biotechnology CNB-CSIC, 28049 Madrid, Spain; Institute of Functional Biology and Genomics IBFG-CSIC, University of Salamanca, 37007 Salamanca, Spain
11
Metzger BPH, Park Y, Starr TN, Thornton JW. Epistasis facilitates functional evolution in an ancient transcription factor. eLife 2024; 12:RP88737. [PMID: 38767330 PMCID: PMC11105156 DOI: 10.7554/elife.88737] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/22/2024] [Open access]
Abstract
A protein's genetic architecture - the set of causal rules by which its sequence produces its functions - also determines its possible evolutionary trajectories. Prior research has proposed that the genetic architecture of proteins is very complex, with pervasive epistatic interactions that constrain evolution and make function difficult to predict from sequence. Most of this work has analyzed only the direct paths between two proteins of interest - excluding the vast majority of possible genotypes and evolutionary trajectories - and has considered only a single protein function, leaving unaddressed the genetic architecture of functional specificity and its impact on the evolution of new functions. Here, we develop a new method based on ordinal logistic regression to directly characterize the global genetic determinants of multiple protein functions from 20-state combinatorial deep mutational scanning (DMS) experiments. We use it to dissect the genetic architecture and evolution of a transcription factor's specificity for DNA, using data from a combinatorial DMS of an ancient steroid hormone receptor's capacity to activate transcription from two biologically relevant DNA elements. We show that the genetic architecture of DNA recognition consists of a dense set of main and pairwise effects that involve virtually every possible amino acid state in the protein-DNA interface, but higher-order epistasis plays only a tiny role. Pairwise interactions enlarge the set of functional sequences and are the primary determinants of specificity for different DNA elements. They also massively expand the number of opportunities for single-residue mutations to switch specificity from one DNA target to another. By bringing variants with different functions close together in sequence space, pairwise epistasis therefore facilitates rather than constrains the evolution of new functions.
Affiliation(s)
- Brian PH Metzger
  - Department of Ecology and Evolution, University of Chicago, Chicago, United States
- Yeonwoo Park
  - Program in Genetics, Genomics, and Systems Biology, University of Chicago, Chicago, United States
- Tyler N Starr
  - Department of Biochemistry and Molecular Biophysics, University of Chicago, Chicago, United States
- Joseph W Thornton
  - Department of Ecology and Evolution, University of Chicago, Chicago, United States
  - Department of Human Genetics, University of Chicago, Chicago, United States
12
Dupic T, Phillips AM, Desai MM. Protein sequence landscapes are not so simple: on reference-free versus reference-based inference. bioRxiv 2024:2024.01.29.577800. [PMID: 38352387 PMCID: PMC10862727 DOI: 10.1101/2024.01.29.577800] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 02/23/2024]
Abstract
In a recent preprint, Park, Metzger, and Thornton reanalyze 20 empirical protein sequence-function landscapes using a "reference-free analysis" (RFA) method they recently developed. They argue that these empirical landscapes are simpler and less epistatic than earlier work suggested, and attribute the difference to limitations of the methods used in the original analyses of these landscapes, which they claim are more sensitive to measurement noise, missing data, and other artifacts. Here, we show that these claims are incorrect. Instead, we find that the RFA method introduced by Park et al. is exactly equivalent to the reference-based least-squares methods used in the original analysis of many of these empirical landscapes (and also equivalent to a Hadamard-based approach they implement). Because the reanalyzed and original landscapes are in fact identical, the different conclusions drawn by Park et al. instead reflect different interpretations of the parameters describing the inferred landscapes; we argue that these do not support the conclusion that epistasis plays only a small role in protein sequence-function landscapes.
Affiliation(s)
- Thomas Dupic
  - Department of Organismic and Evolutionary Biology, Harvard University, Cambridge, MA
- Angela M Phillips
  - Department of Microbiology and Immunology, University of California San Francisco, San Francisco, CA
- Michael M Desai
  - Department of Organismic and Evolutionary Biology, Harvard University, Cambridge, MA
13
Buda K, Miton CM, Tokuriki N. Pervasive epistasis exposes intramolecular networks in adaptive enzyme evolution. Nat Commun 2023; 14:8508. [PMID: 38129396 PMCID: PMC10739712 DOI: 10.1038/s41467-023-44333-5] [Citation(s) in RCA: 11] [Impact Index Per Article: 5.5] [Received: 06/03/2023] [Accepted: 12/08/2023] [Indexed: 12/23/2023] Open
Abstract
Enzyme evolution is characterized by constant alterations of the intramolecular residue networks supporting their functions. The rewiring of these network interactions can give rise to epistasis. As mutations accumulate, the epistasis observed across diverse genotypes may appear idiosyncratic, that is, exhibit unique effects in different genetic backgrounds. Here, we unveil a quantitative picture of the prevalence and patterns of epistasis in enzyme evolution by analyzing 41 fitness landscapes generated from seven enzymes. We show that >94% of all mutational and epistatic effects appear highly idiosyncratic, which greatly distorted the functional prediction of the evolved enzymes. By examining seemingly idiosyncratic changes in epistasis along adaptive trajectories, we expose several instances of higher-order, intramolecular rewiring. Using complementary structural data, we outline putative molecular mechanisms explaining higher-order epistasis along two enzyme trajectories. Our work emphasizes the prevalence of epistasis and provides an approach to exploring this phenomenon through a molecular lens.
Affiliation(s)
- Karol Buda
  - Michael Smith Laboratories, University of British Columbia, Vancouver, Canada
- Charlotte M Miton
  - Michael Smith Laboratories, University of British Columbia, Vancouver, Canada
- Nobuhiko Tokuriki
  - Michael Smith Laboratories, University of British Columbia, Vancouver, Canada.
14
Eble H, Joswig M, Lamberti L, Ludington WB. Master regulators of biological systems in higher dimensions. Proc Natl Acad Sci U S A 2023; 120:e2300634120. [PMID: 38096409 PMCID: PMC10743376 DOI: 10.1073/pnas.2300634120] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/12/2023] [Accepted: 10/23/2023] [Indexed: 12/18/2023] Open
Abstract
A longstanding goal of biology is to identify the key genes and species that critically impact evolution, ecology, and health. Network analysis has revealed keystone species that regulate ecosystems and master regulators that regulate cellular genetic networks. Yet these studies have focused on pairwise biological interactions, which can be affected by the context of genetic background and other species present, generating higher-order interactions. The important regulators of higher-order interactions are unstudied. To address this, we applied a high-dimensional geometry approach that quantifies epistasis in a fitness landscape to ask how individual genes and species influence the interactions in the rest of the biological network. We then generated and also reanalyzed 5-dimensional datasets (two genetic, two microbiome). We identified key genes (e.g., the rbs locus and pykF) and species (e.g., Lactobacilli) that control the interactions of many other genes and species. These higher-order master regulators can induce or suppress evolutionary and ecological diversification by controlling the topography of the fitness landscape. Thus, we provide a method and mathematical justification for exploration of biological networks in higher dimensions.
Affiliation(s)
- Holger Eble
  - Chair of Discrete Mathematics/Geometry, Technical University Berlin, Berlin 10623, Germany
- Michael Joswig
  - Chair of Discrete Mathematics/Geometry, Technical University Berlin, Berlin 10623, Germany
  - Max Planck Institute for Mathematics in the Sciences, Leipzig 04103, Germany
- Lisa Lamberti
  - Department of Biosystems Science and Engineering, Federal Institute of Technology (ETH Zürich), Basel 4058, Switzerland
  - Swiss Institute of Bioinformatics, Basel 4058, Switzerland
- William B. Ludington
  - Department of Biosphere Sciences and Engineering, Carnegie Institution for Science, Baltimore, MD 21218
  - Department of Biology, Johns Hopkins University, Baltimore, MD 21218
15
Carpenter AC, Feist AM, Harrison FS, Paulsen IT, Williams TC. Have you tried turning it off and on again? Oscillating selection to enhance fitness-landscape traversal in adaptive laboratory evolution experiments. Metab Eng Commun 2023; 17:e00227. [PMID: 37538933 PMCID: PMC10393799 DOI: 10.1016/j.mec.2023.e00227] [Citation(s) in RCA: 7] [Impact Index Per Article: 3.5] [Received: 04/25/2023] [Revised: 06/05/2023] [Accepted: 07/11/2023] [Indexed: 08/05/2023] Open
Abstract
Adaptive Laboratory Evolution (ALE) is a powerful tool for engineering and understanding microbial physiology. ALE relies on the selection and enrichment of mutations that enable survival or faster growth under a selective condition imposed by the experimental setup. Phenotypic fitness landscapes are often underpinned by complex genotypes involving multiple genes, with combinatorial positive and negative effects on fitness. Such genotype relationships result in mutational fitness landscapes with multiple local fitness maxima and valleys. Moving from a local maximum to a global maximum often requires an individual or sub-population of cells to traverse fitness valleys. Traversing involves gaining mutations that are not adaptive for a given local maximum but are necessary to 'peak shift' to another local maximum, or eventually a global maximum. Despite these relatively well understood evolutionary principles, and the combinatorial genotypes that underlie most metabolic phenotypes, the majority of applied ALE experiments are conducted using constant selection pressures. The use of constant pressure can result in populations becoming trapped within local maxima, and often precludes the attainment of optimum phenotypes associated with global maxima. Here, we argue that oscillating selection pressure is an easily accessible mechanism for traversing fitness landscapes in ALE experiments, and provide theoretical and practical frameworks for implementation.
Affiliation(s)
- Alexander C. Carpenter
  - Department of Molecular Sciences and ARC Centre of Excellence in Synthetic Biology, Centre Headquarters, Macquarie University, Sydney, NSW, 2109, Australia
  - CSIRO Synthetic Biology Future Science Platform, Canberra, ACT, 2601, Australia
- Adam M. Feist
  - Department of Bioengineering, University of California San Diego, 9500 Gilman Dr., La Jolla, CA, 92093, USA
  - Joint BioEnergy Institute, 5885 Hollis Street, 4th Floor, Emeryville, CA, 94608, USA
  - Novo Nordisk Foundation Center for Biosustainability, Technical University of Denmark, 2800 Kgs. Lyngby, Denmark
- Fergus S.M. Harrison
  - Department of Molecular Sciences and ARC Centre of Excellence in Synthetic Biology, Centre Headquarters, Macquarie University, Sydney, NSW, 2109, Australia
- Ian T. Paulsen
  - Department of Molecular Sciences and ARC Centre of Excellence in Synthetic Biology, Centre Headquarters, Macquarie University, Sydney, NSW, 2109, Australia
- Thomas C. Williams
  - Department of Molecular Sciences and ARC Centre of Excellence in Synthetic Biology, Centre Headquarters, Macquarie University, Sydney, NSW, 2109, Australia
  - CSIRO Synthetic Biology Future Science Platform, Canberra, ACT, 2601, Australia
16
Santorsola M, Lescai F. The promise of explainable deep learning for omics data analysis: Adding new discovery tools to AI. N Biotechnol 2023; 77:1-11. [PMID: 37329982 DOI: 10.1016/j.nbt.2023.06.002] [Citation(s) in RCA: 10] [Impact Index Per Article: 5.0] [Received: 02/27/2023] [Revised: 06/01/2023] [Accepted: 06/14/2023] [Indexed: 06/19/2023]
Abstract
Deep learning has already revolutionised the way a wide range of data is processed in many areas of daily life. The ability to learn abstractions and relationships from heterogeneous data has provided impressively accurate prediction and classification tools to handle increasingly big datasets. This has a significant impact on the growing wealth of omics datasets, with the unprecedented opportunity for a better understanding of the complexity of living organisms. While this revolution is transforming the way these data are analyzed, explainable deep learning is emerging as an additional tool with the potential to change the way biological data are interpreted. Explainability addresses critical issues such as transparency, which is especially important when computational tools are introduced in clinical environments. Moreover, it empowers artificial intelligence with the capability to provide new insights into the input data, thus adding an element of discovery to these already powerful resources. In this review, we provide an overview of the transformative effects explainable deep learning is having on multiple sectors, ranging from genome engineering and genomics to radiomics, drug design, and clinical trials. We offer a perspective to life scientists, to better understand the potential of these tools, and a motivation to implement them in their research, by suggesting learning resources they can use to take their first steps in this field.
Affiliation(s)
- Francesco Lescai
  - Department of Biology and Biotechnology, University of Pavia, Pavia, Italy.
17
Chen L, Zhang Z, Li Z, Li R, Huo R, Chen L, Wang D, Luo X, Chen K, Liao C, Zheng M. Learning protein fitness landscapes with deep mutational scanning data from multiple sources. Cell Syst 2023; 14:706-721.e5. [PMID: 37591206 DOI: 10.1016/j.cels.2023.07.003] [Citation(s) in RCA: 9] [Impact Index Per Article: 4.5] [Received: 02/15/2023] [Revised: 05/30/2023] [Accepted: 07/18/2023] [Indexed: 08/19/2023]
Abstract
One of the key points of machine learning-assisted directed evolution (MLDE) is the accurate learning of the fitness landscape, a conceptual mapping from sequence variants to the desired function. Here, we describe a multi-protein training scheme that leverages the existing deep mutational scanning data from diverse proteins to aid in understanding the fitness landscape of a new protein. Proof-of-concept trials are designed to validate this training scheme in three aspects: random and positional extrapolation for single-variant effects, zero-shot fitness predictions for new proteins, and extrapolation for higher-order variant effects from single-variant effects. Moreover, our study identified previously overlooked strong baselines, and their unexpectedly good performance brings our attention to the pitfalls of MLDE. Overall, these results may improve our understanding of the association between different protein fitness profiles and shed light on developing better machine learning-assisted approaches to the directed evolution of proteins. A record of this paper's transparent peer review process is included in the supplemental information.
Affiliation(s)
- Lin Chen
  - Drug Discovery and Design Center, State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, Shanghai 201203, China; University of Chinese Academy of Sciences, Beijing 100049, China
- Zehong Zhang
  - Drug Discovery and Design Center, State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, Shanghai 201203, China; University of Chinese Academy of Sciences, Beijing 100049, China
- Zhenghao Li
  - Drug Discovery and Design Center, State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, Shanghai 201203, China; Shanghai Institute for Advanced Immunochemical Studies, School of Life Science and Technology, ShanghaiTech University, Shanghai 201210, China
- Rui Li
  - Drug Discovery and Design Center, State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, Shanghai 201203, China; School of Pharmacy, China Pharmaceutical University, Nanjing 211198, China
- Ruifeng Huo
  - School of Chinese Materia Medica, Nanjing University of Chinese Medicine, Nanjing 210023, China
- Lifan Chen
  - Drug Discovery and Design Center, State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, Shanghai 201203, China; University of Chinese Academy of Sciences, Beijing 100049, China
- Xiaomin Luo
  - Drug Discovery and Design Center, State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, Shanghai 201203, China; University of Chinese Academy of Sciences, Beijing 100049, China
- Kaixian Chen
  - Drug Discovery and Design Center, State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, Shanghai 201203, China; University of Chinese Academy of Sciences, Beijing 100049, China; School of Pharmacy, China Pharmaceutical University, Nanjing 211198, China
- Cangsong Liao
  - University of Chinese Academy of Sciences, Beijing 100049, China; Chemical Biology Research Center, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, Shanghai 201203, China.
- Mingyue Zheng
  - Drug Discovery and Design Center, State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, Shanghai 201203, China; University of Chinese Academy of Sciences, Beijing 100049, China; School of Pharmacy, China Pharmaceutical University, Nanjing 211198, China; School of Chinese Materia Medica, Nanjing University of Chinese Medicine, Nanjing 210023, China.
18
Vila JA. Protein folding rate evolution upon mutations. Biophys Rev 2023; 15:661-669. [PMID: 37681091 PMCID: PMC10480377 DOI: 10.1007/s12551-023-01088-z] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/10/2023] [Accepted: 06/24/2023] [Indexed: 09/09/2023] Open
Abstract
Despite the spectacular success of cutting-edge protein fold prediction methods, many critical questions remain unanswered, including why proteins can reach their native state in a biologically reasonable time. A satisfactory answer to this simple question could shed light on the slowest folding rate of proteins as well as how mutations (amino-acid substitutions and/or post-translational modifications) might affect it. Preliminary results indicate that (i) the validity of Anfinsen's dogma ensures that proteins reach their native state on a reasonable timescale regardless of their sequence or length, and (ii) it is feasible to determine the evolution of protein folding rates without accounting for epistasis effects or the mutational trajectories between the starting and target sequences. These results have direct implications for evolutionary biology because they lay the groundwork for a better understanding of why, and to what extent, mutations (a crucial element of evolution and a factor influencing it) affect protein evolvability. Furthermore, they may spur significant progress in our efforts to solve crucial structural biology problems, such as how a sequence encodes its folding.
Affiliation(s)
- Jorge A. Vila
  - IMASL-CONICET, Universidad Nacional de San Luis, Ejército de Los Andes 950, 5700 San Luis, Argentina
19
Densi A, Iyer RS, Bhat PJ. Synonymous and Nonsynonymous Substitutions in Dictyostelium discoideum Ammonium Transporter amtA Are Necessary for Functional Complementation in Saccharomyces cerevisiae. Microbiol Spectr 2023; 11:e0384722. [PMID: 36840598 PMCID: PMC10100761 DOI: 10.1128/spectrum.03847-22] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/21/2022] [Accepted: 01/24/2023] [Indexed: 02/24/2023] Open
Abstract
Ammonium transporters are present in all three domains of life. They have undergone extensive horizontal gene transfer (HGT), gene duplication, and functional diversification and therefore offer an excellent paradigm to study protein evolution. We attempted to complement a mep1Δmep2Δmep3Δ strain of Saccharomyces cerevisiae (triple-deletion strain), which otherwise cannot grow on ammonium as a sole nitrogen source at concentrations of <3 mM, with amtA of Dictyostelium discoideum, an orthologue of S. cerevisiae MEP2. We observed that amtA did not complement the triple-deletion strain of S. cerevisiae for growth on low-ammonium medium. We isolated two mutant derivatives of amtA (amtA M1 and amtA M2) from a PCR-generated mutant plasmid library that complemented the triple-deletion strain of S. cerevisiae. amtA M1 bears three nonsynonymous and two synonymous substitutions, which are necessary for its functionality. amtA M2 bears two nonsynonymous substitutions and one synonymous substitution, all of which are necessary for functionality. Interestingly, AmtA M1 transports ammonium but does not confer methylamine toxicity, while AmtA M2 transports ammonium and confers methylamine toxicity, demonstrating functional diversification. Preliminary biochemical analyses indicated that the mutants differ in their conformations as well as their mechanisms of ammonium transport. These intriguing results clearly point out that protein evolution cannot be fathomed by studying nonsynonymous and synonymous substitutions in isolation. The above-described observations have significant implications for various facets of biological processes and are discussed in detail.
IMPORTANCE Functional diversification following gene duplication is one of the major driving forces of protein evolution. While the role of nonsynonymous substitutions in the functional diversification of proteins is well recognized, knowledge of the role of synonymous substitutions in protein evolution is in its infancy. Using functional complementation, we isolated two functional alleles of the D. discoideum ammonium transporter gene (amtA), which otherwise does not function in S. cerevisiae as an ammonium transporter. One of them is an ammonium transporter, while the other is an ammonium transporter that also confers methylammonium (ammonium analogue) toxicity, suggesting functional diversification. Surprisingly, both alleles require a combination of synonymous and nonsynonymous substitutions for their functionality. These results bring out a hitherto-unknown pathway of protein evolution and pave the way for not only understanding protein evolution but also interpreting single nucleotide polymorphisms (SNPs).
Affiliation(s)
- Asha Densi
  - Department of Biosciences and Bioengineering, Indian Institute of Technology Bombay, Mumbai, India
- Revathi S. Iyer
  - Department of Biosciences and Bioengineering, Indian Institute of Technology Bombay, Mumbai, India
- Paike Jayadeva Bhat
  - Department of Biosciences and Bioengineering, Indian Institute of Technology Bombay, Mumbai, India
20
Moulana A, Dupic T, Phillips AM, Chang J, Roffler AA, Greaney AJ, Starr TN, Bloom JD, Desai MM. The landscape of antibody binding affinity in SARS-CoV-2 Omicron BA.1 evolution. eLife 2023; 12:e83442. [PMID: 36803543 PMCID: PMC9949795 DOI: 10.7554/elife.83442] [Citation(s) in RCA: 31] [Impact Index Per Article: 15.5] [Received: 09/13/2022] [Accepted: 02/06/2023] [Indexed: 02/16/2023] Open
Abstract
The Omicron BA.1 variant of SARS-CoV-2 escapes convalescent sera and monoclonal antibodies that are effective against earlier strains of the virus. This immune evasion is largely a consequence of mutations in the BA.1 receptor binding domain (RBD), the major antigenic target of SARS-CoV-2. Previous studies have identified several key RBD mutations leading to escape from most antibodies. However, little is known about how these escape mutations interact with each other and with other mutations in the RBD. Here, we systematically map these interactions by measuring the binding affinity of all possible combinations of these 15 RBD mutations (2^15 = 32,768 genotypes) to 4 monoclonal antibodies (LY-CoV016, LY-CoV555, REGN10987, and S309) with distinct epitopes. We find that BA.1 can lose affinity to diverse antibodies by acquiring a few large-effect mutations and can reduce affinity to others through several small-effect mutations. However, our results also reveal alternative pathways to antibody escape that do not include every large-effect mutation. Moreover, epistatic interactions are shown to constrain affinity decline in S309 but only modestly shape the affinity landscapes of other antibodies. Together with previous work on the ACE2 affinity landscape, our results suggest that the escape of each antibody is mediated by distinct groups of mutations, whose deleterious effects on ACE2 affinity are compensated by another distinct group of mutations (most notably Q498R and N501Y).
Affiliation(s)
- Alief Moulana
  - Department of Organismic and Evolutionary Biology, Harvard University, Cambridge, United States
- Thomas Dupic
  - Department of Organismic and Evolutionary Biology, Harvard University, Cambridge, United States
- Angela M Phillips
  - Department of Organismic and Evolutionary Biology, Harvard University, Cambridge, United States
- Jeffrey Chang
  - Department of Organismic and Evolutionary Biology, Harvard University, Cambridge, United States
  - Department of Physics, Harvard University, Cambridge, United States
- Anne A Roffler
  - Biological and Biomedical Sciences, Harvard Medical School, Boston, United States
- Allison J Greaney
  - Basic Sciences Division and Computational Biology Program, Fred Hutchinson Cancer Research Center, Seattle, United States
  - Department of Genome Sciences, University of Washington, Seattle, United States
  - Medical Scientist Training Program, University of Washington, Seattle, United States
- Tyler N Starr
  - Basic Sciences Division and Computational Biology Program, Fred Hutchinson Cancer Research Center, Seattle, United States
- Jesse D Bloom
  - Basic Sciences Division and Computational Biology Program, Fred Hutchinson Cancer Research Center, Seattle, United States
  - Department of Genome Sciences, University of Washington, Seattle, United States
  - Howard Hughes Medical Institute, Seattle, United States
- Michael M Desai
  - Department of Organismic and Evolutionary Biology, Harvard University, Cambridge, United States
  - Department of Physics, Harvard University, Cambridge, United States
  - NSF-Simons Center for Mathematical and Statistical Analysis of Biology, Harvard University, Cambridge, United States
  - Quantitative Biology Initiative, Harvard University, Cambridge, United States
21
Phillips AM, Maurer DP, Brooks C, Dupic T, Schmidt AG, Desai MM. Hierarchical sequence-affinity landscapes shape the evolution of breadth in an anti-influenza receptor binding site antibody. eLife 2023; 12:e83628. [PMID: 36625542 PMCID: PMC9995116 DOI: 10.7554/elife.83628] [Citation(s) in RCA: 15] [Impact Index Per Article: 7.5] [Received: 09/21/2022] [Accepted: 01/09/2023] [Indexed: 01/11/2023] Open
Abstract
Broadly neutralizing antibodies (bnAbs) that neutralize diverse variants of a particular virus are of considerable therapeutic interest. Recent advances have enabled us to isolate and engineer these antibodies as therapeutics, but eliciting them through vaccination remains challenging, in part due to our limited understanding of how antibodies evolve breadth. Here, we analyze the landscape by which an anti-influenza receptor binding site (RBS) bnAb, CH65, evolved broad affinity to diverse H1 influenza strains. We do this by generating an antibody library of all possible evolutionary intermediates between the unmutated common ancestor (UCA) and the affinity-matured CH65 antibody and measure the affinity of each intermediate to three distinct H1 antigens. We find that affinity to each antigen requires a specific set of mutations - distributed across the variable light and heavy chains - that interact non-additively (i.e., epistatically). These sets of mutations form a hierarchical pattern across the antigens, with increasingly divergent antigens requiring additional epistatic mutations beyond those required to bind less divergent antigens. We investigate the underlying biochemical and structural basis for these hierarchical sets of epistatic mutations and find that epistasis between heavy chain mutations and a mutation in the light chain at the VH-VL interface is essential for binding a divergent H1. Collectively, this is the first work to comprehensively characterize epistasis between heavy and light chain mutations and shows that such interactions are both strong and widespread. Together with our previous study analyzing a different class of anti-influenza antibodies, our results implicate epistasis as a general feature of antibody sequence-affinity landscapes that can potentiate and constrain the evolution of breadth.
Affiliation(s)
- Angela M Phillips
  - Department of Organismic and Evolutionary Biology, Harvard University, Cambridge, United States
  - Department of Microbiology and Immunology, University of California, San Francisco, San Francisco, United States
- Daniel P Maurer
  - Ragon Institute of MGH, MIT, and Harvard, Cambridge, United States
  - Department of Microbiology, Harvard Medical School, Boston, United States
- Caelan Brooks
  - Department of Physics, Harvard University, Cambridge, United States
- Thomas Dupic
  - Department of Organismic and Evolutionary Biology, Harvard University, Cambridge, United States
- Aaron G Schmidt
  - Ragon Institute of MGH, MIT, and Harvard, Cambridge, United States
  - Department of Microbiology, Harvard Medical School, Boston, United States
- Michael M Desai
  - Department of Organismic and Evolutionary Biology, Harvard University, Cambridge, United States
  - Department of Physics, Harvard University, Cambridge, United States
  - NSF-Simons Center for Mathematical and Statistical Analysis of Biology, Harvard University, Cambridge, United States
  - Quantitative Biology Initiative, Harvard University, Cambridge, United States
22
Draghi JA, Ogbunugafor CB. Exploring the expanse between theoretical questions and experimental approaches in the modern study of evolvability. J Exp Zool B Mol Dev Evol 2023; 340:8-17. [PMID: 35451559 PMCID: PMC10083935 DOI: 10.1002/jez.b.23134] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/07/2021] [Revised: 03/04/2022] [Accepted: 03/11/2022] [Indexed: 12/16/2022]
Abstract
Despite several decades of computational and experimental work across many systems, evolvability remains on the periphery with regard to its status as a widely accepted and regularly applied theoretical concept. Here we propose that its marginal status is partly a result of large gaps between the diverse but disconnected theoretical treatments of evolvability and the relatively narrower range of studies that have tested it empirically. To make this case, we draw on a range of examples, from experimental evolution in microbes to molecular evolution in proteins, where attempts have been made to mend this disconnect. We highlight some examples of progress that has been made and point to areas where synthesis and translation of existing theory can lead to further progress in the still-new field of empirical measurements of evolvability.
Affiliation(s)
- Jeremy A Draghi
  - Department of Biological Sciences, Virginia Tech, Blacksburg, Virginia, USA
- C Brandon Ogbunugafor
  - Department of Ecology & Evolutionary Biology, Yale University, New Haven, Connecticut, USA
23
Moulana A, Dupic T, Phillips AM, Chang J, Nieves S, Roffler AA, Greaney AJ, Starr TN, Bloom JD, Desai MM. Compensatory epistasis maintains ACE2 affinity in SARS-CoV-2 Omicron BA.1. Nat Commun 2022; 13:7011. [PMID: 36384919 PMCID: PMC9668218 DOI: 10.1038/s41467-022-34506-z] [Citation(s) in RCA: 85] [Impact Index Per Article: 28.3] [Received: 09/08/2022] [Accepted: 10/26/2022] [Indexed: 11/17/2022] Open
Abstract
The Omicron BA.1 variant emerged in late 2021 and quickly spread across the world. Compared to the earlier SARS-CoV-2 variants, BA.1 has many mutations, some of which are known to enable antibody escape. Many of these antibody-escape mutations individually decrease the spike receptor-binding domain (RBD) affinity for ACE2, but BA.1 still binds ACE2 with high affinity. The fitness and evolution of the BA.1 lineage are therefore driven by the combined effects of numerous mutations. Here, we systematically map the epistatic interactions between the 15 mutations in the RBD of BA.1 relative to the Wuhan Hu-1 strain. Specifically, we measure the ACE2 affinity of all possible combinations of these 15 mutations (2^15 = 32,768 genotypes), spanning all possible evolutionary intermediates from the ancestral Wuhan Hu-1 strain to BA.1. We find that immune escape mutations in BA.1 individually reduce ACE2 affinity but are compensated by epistatic interactions with other affinity-enhancing mutations, including Q498R and N501Y. Thus, the ability of BA.1 to evade immunity while maintaining ACE2 affinity is contingent on acquiring multiple interacting mutations. Our results implicate compensatory epistasis as a key factor driving substantial evolutionary change for SARS-CoV-2 and are consistent with Omicron BA.1 arising from a chronic infection.
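The size of this combinatorial library follows directly from treating each of the 15 sites as either ancestral or mutated. The sketch below is only an illustration of that counting, not the study's library design: the 0/1 encoding and the variable names are assumptions. It enumerates every genotype and groups them by the number of mutations they carry.

```python
from collections import Counter
from itertools import product

N_SITES = 15  # number of RBD mutations stated in the abstract

# Each genotype is a tuple of 0/1 flags (assumed encoding):
# 0 = ancestral state, 1 = BA.1 state at that site.
genotypes = list(product((0, 1), repeat=N_SITES))

# Group genotypes by how many mutations they carry,
# i.e. their mutational distance from the ancestral sequence.
by_mutation_count = Counter(sum(g) for g in genotypes)

print(len(genotypes))         # total genotypes, 2**15
print(by_mutation_count[1])   # single mutants
print(by_mutation_count[2])   # double mutants
```

Every evolutionary intermediate between the ancestral and the fully mutated sequence appears exactly once in `genotypes`, which is what makes a complete map of pairwise and higher-order interactions possible.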
Affiliation(s)
- Alief Moulana
- Department of Organismic and Evolutionary Biology, Harvard University, Cambridge, MA, 02138, USA
| | - Thomas Dupic
- Department of Organismic and Evolutionary Biology, Harvard University, Cambridge, MA, 02138, USA
| | - Angela M Phillips
- Department of Organismic and Evolutionary Biology, Harvard University, Cambridge, MA, 02138, USA.
| | - Jeffrey Chang
- Department of Physics, Harvard University, Cambridge, MA, 02138, USA
| | - Serafina Nieves
- Department of Molecular and Cellular Biology, Harvard University, Cambridge, MA, 02138, USA
| | - Anne A Roffler
- Biological and Biomedical Sciences, Harvard Medical School, Boston, MA, 02115, USA
| | - Allison J Greaney
- Basic Sciences Division and Computational Biology Program, Fred Hutchinson Cancer Research Center, Seattle, WA, 98109, USA
- Department of Genome Sciences, University of Washington, Seattle, WA, 98195, USA
- Medical Scientist Training Program, University of Washington, Seattle, WA, 98195, USA
| | - Tyler N Starr
- Basic Sciences Division and Computational Biology Program, Fred Hutchinson Cancer Research Center, Seattle, WA, 98109, USA
| | - Jesse D Bloom
- Basic Sciences Division and Computational Biology Program, Fred Hutchinson Cancer Research Center, Seattle, WA, 98109, USA
- Department of Genome Sciences, University of Washington, Seattle, WA, 98195, USA
- Howard Hughes Medical Institute, Seattle, WA, 98109, USA
| | - Michael M Desai
- Department of Organismic and Evolutionary Biology, Harvard University, Cambridge, MA, 02138, USA.
- Department of Physics, Harvard University, Cambridge, MA, 02138, USA.
- NSF-Simons Center for Mathematical and Statistical Analysis of Biology, Harvard University, Cambridge, MA, 02138, USA.
- Quantitative Biology Initiative, Harvard University, Cambridge, MA, 02138, USA.
| |
|
24
|
Abstract
One core goal of genetics is to systematically understand the mapping between the DNA sequence of an organism (genotype) and its measurable characteristics (phenotype). Understanding this mapping is often challenging because of interactions between mutations, where the result of combining several different mutations can be very different than the sum of their individual effects. Here we provide a statistical framework for modeling complex genetic interactions of this type. The key idea is to ask how fast the effects of mutations change when introducing the same mutation in increasingly distant genetic backgrounds. We then propose a model for phenotypic prediction that takes into account this tendency for the effects of mutations to be more similar in nearby genetic backgrounds. Contemporary high-throughput mutagenesis experiments are providing an increasingly detailed view of the complex patterns of genetic interaction that occur between multiple mutations within a single protein or regulatory element. By simultaneously measuring the effects of thousands of combinations of mutations, these experiments have revealed that the genotype–phenotype relationship typically reflects not only genetic interactions between pairs of sites but also higher-order interactions among larger numbers of sites. However, modeling and understanding these higher-order interactions remains challenging. Here we present a method for reconstructing sequence-to-function mappings from partially observed data that can accommodate all orders of genetic interaction. The main idea is to make predictions for unobserved genotypes that match the type and extent of epistasis found in the observed data. 
This information on the type and extent of epistasis can be extracted by considering how phenotypic correlations change as a function of mutational distance, which is equivalent to estimating the fraction of phenotypic variance due to each order of genetic interaction (additive, pairwise, three-way, etc.). Using these estimated variance components, we then define an empirical Bayes prior that in expectation matches the observed pattern of epistasis and reconstruct the genotype–phenotype mapping by conducting Gaussian process regression under this prior. To demonstrate the power of this approach, we present an application to the antibody-binding domain GB1 and also provide a detailed exploration of a dataset consisting of high-throughput measurements for the splicing efficiency of human pre-mRNA 5′ splice sites, for which we also validate our model predictions via additional low-throughput experiments.
|
25
|
Smith CE, Smith ANH, Cooper TF, Moore FBG. Fitness of evolving bacterial populations is contingent on deep and shallow history but only shallow history creates predictable patterns. Proc Biol Sci 2022; 289:20221292. [PMID: 36100026 PMCID: PMC9470251 DOI: 10.1098/rspb.2022.1292] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/12/2022] Open
Abstract
Long-term evolution experiments have tested the importance of genetic and environmental factors in influencing evolutionary outcomes. Differences in phylogenetic history, recent adaptation to distinct environments and chance events, all influence the fitness of a population. However, the interplay of these factors on a population's evolutionary potential remains relatively unexplored. We tracked the outcome of 2000 generations of evolution of four natural isolates of Escherichia coli bacteria that were engineered to also create differences in shallow history by adding previously identified mutations selected in a separate long-term experiment. Replicate populations started from each progenitor evolved in four environments. We found that deep and shallow phylogenetic histories both contributed significantly to differences in evolved fitness, though by different amounts in different selection environments. With one exception, chance effects were not significant. Whereas the effect of deep history did not follow any detectable pattern, effects of shallow history followed a pattern of diminishing returns whereby fitter ancestors had smaller fitness increases. These results are consistent with adaptive evolution being contingent on the interaction of several evolutionary forces but demonstrate that the nature of these interactions is not fixed and may not be predictable even when the role of chance is small.
Affiliation(s)
- Chelsea E Smith
- Department of Biological Sciences, Kent State University, Kent, OH 44242, USA
| | - Adam N H Smith
- School of Mathematical and Computational Sciences, Massey University, Auckland 0634, New Zealand
| | - Tim F Cooper
- School of Natural Sciences, Massey University, Auckland 0634, New Zealand
| | - Francisco B-G Moore
- Department of Biological Sciences, Kent State University, Kent, OH 44242, USA
- Department of Biology, University of Akron, Akron, OH 44325, USA
| |
|
26
|
Ponte-Fernandez C, Gonzalez-Dominguez J, Carvajal-Rodriguez A, Martin MJ. Evaluation of Existing Methods for High-Order Epistasis Detection. IEEE/ACM Transactions on Computational Biology and Bioinformatics 2022; 19:912-926. [PMID: 33055017 DOI: 10.1109/tcbb.2020.3030312] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Indexed: 06/11/2023]
Abstract
Finding epistatic interactions among loci when expressing a phenotype is a widely employed strategy to understand the genetic architecture of complex traits in GWAS. The abundance of methods dedicated to the same purpose, however, makes it increasingly difficult for scientists to decide which method is more suitable for their studies. This work compares the different epistasis detection methods published during the last decade in terms of runtime, detection power and type I error rate, with a special emphasis on high-order interactions. Results show that in terms of detection power, the only methods that perform well across all experiments are the exhaustive methods, although their computational cost may be prohibitive in large-scale studies. Regarding non-exhaustive methods, not one could consistently find epistasis interactions when marginal effects are absent. If marginal effects are present, there are methods that perform well for high-order interactions, such as BADTrees, FDHE-IW, SingleMI or SNPHarvester. As for false-positive control, only SNPHarvester, FDHE-IW and DCHE show good results. The study concludes that there is no single epistasis detection method to recommend in all scenarios. Authors should prioritize exhaustive methods when sufficient computational resources are available considering the data set size, and resort to non-exhaustive methods when the analysis time is prohibitive.
|
27
|
Baquero F, Martínez JL, F. Lanza V, Rodríguez-Beltrán J, Galán JC, San Millán A, Cantón R, Coque TM. Evolutionary Pathways and Trajectories in Antibiotic Resistance. Clin Microbiol Rev 2021; 34:e0005019. [PMID: 34190572 PMCID: PMC8404696 DOI: 10.1128/cmr.00050-19] [Citation(s) in RCA: 115] [Impact Index Per Article: 28.8] [Indexed: 12/15/2022] Open
Abstract
Evolution is the hallmark of life. Descriptions of the evolution of microorganisms have provided a wealth of information, but knowledge regarding "what happened" has precluded a deeper understanding of "how" evolution has proceeded, as in the case of antimicrobial resistance. The difficulty in answering the "how" question lies in the multihierarchical dimensions of evolutionary processes, nested in complex networks, encompassing all units of selection, from genes to communities and ecosystems. At the simplest ontological level (as resistance genes), evolution proceeds by random (mutation and drift) and directional (natural selection) processes; however, sequential pathways of adaptive variation can occasionally be observed, and under fixed circumstances (particular fitness landscapes), evolution is predictable. At the highest level (such as that of plasmids, clones, species, microbiotas), the systems' degrees of freedom increase dramatically, related to the variable dispersal, fragmentation, relatedness, or coalescence of bacterial populations, depending on heterogeneous and changing niches and selective gradients in complex environments. Evolutionary trajectories of antibiotic resistance find their way in these changing landscapes subjected to random variations, becoming highly entropic and therefore unpredictable. However, experimental, phylogenetic, and ecogenetic analyses reveal preferential frequented paths (highways) where antibiotic resistance flows and propagates, allowing some understanding of evolutionary dynamics, modeling and designing interventions. Studies on antibiotic resistance have an applied aspect in improving individual health, One Health, and Global Health, as well as an academic value for understanding evolution. Most importantly, they have a heuristic significance as a model to reduce the negative influence of anthropogenic effects on the environment.
Affiliation(s)
- F. Baquero
- Department of Microbiology, Ramón y Cajal University Hospital, Ramón y Cajal Institute for Health Research (IRYCIS), Network Center for Research in Epidemiology and Public Health (CIBERESP), Madrid, Spain
| | - J. L. Martínez
- National Center for Biotechnology (CNB-CSIC), Madrid, Spain
| | - V. F. Lanza
- Department of Microbiology, Ramón y Cajal University Hospital, Ramón y Cajal Institute for Health Research (IRYCIS), Network Center for Research in Epidemiology and Public Health (CIBERESP), Madrid, Spain
- Central Bioinformatics Unit, Ramón y Cajal Institute for Health Research (IRYCIS), Madrid, Spain
| | - J. Rodríguez-Beltrán
- Department of Microbiology, Ramón y Cajal University Hospital, Ramón y Cajal Institute for Health Research (IRYCIS), Network Center for Research in Epidemiology and Public Health (CIBERESP), Madrid, Spain
| | - J. C. Galán
- Department of Microbiology, Ramón y Cajal University Hospital, Ramón y Cajal Institute for Health Research (IRYCIS), Network Center for Research in Epidemiology and Public Health (CIBERESP), Madrid, Spain
| | - A. San Millán
- National Center for Biotechnology (CNB-CSIC), Madrid, Spain
| | - R. Cantón
- Department of Microbiology, Ramón y Cajal University Hospital, Ramón y Cajal Institute for Health Research (IRYCIS), Network Center for Research in Epidemiology and Public Health (CIBERESP), Madrid, Spain
| | - T. M. Coque
- Department of Microbiology, Ramón y Cajal University Hospital, Ramón y Cajal Institute for Health Research (IRYCIS), Network Center for Research in Epidemiology and Public Health (CIBERESP), Madrid, Spain
| |
|
28
|
High-Order Epistasis and Functional Coupling of Infection Steps Drive Virus Evolution toward Independence from a Host Pathway. Microbiol Spectr 2021; 9:e0080021. [PMID: 34468191 PMCID: PMC8557862 DOI: 10.1128/spectrum.00800-21] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.3] [Indexed: 12/24/2022] Open
Abstract
The phosphatidylinositol-4 kinase IIIβ (PI4KB)/oxysterol-binding protein (OSBP) family I pathway serves as an essential host pathway for the formation of viral replication complex for viral plus-strand RNA synthesis; however, poliovirus (PV) could evolve toward substantial independence from this host pathway with four mutations. Recessive epistasis of the two mutations (3A-R54W and 2B-F17L) is essential for viral RNA replication. Quantitative analysis of effects of the other two mutations (2B-Q20H and 2C-M187V) on each step of infection reveals functional couplings between viral replication, growth, and spread conferred by the 2B-Q20H mutation, while no enhancing effect was conferred by the 2C-M187V mutation. The effects of the 2B-Q20H mutation occur only via another recessive epistasis between the 3A-R54W/2B-F17L mutations. These mutations confer enhanced replication in PI4KB/OSBP-independent infection concomitantly with an increased ratio of viral plus-strand RNA to the minus-strand RNA. This work reveals the essential roles of the functional coupling and high-order, multi-tiered recessive epistasis in viral evolution toward independence from an obligatory host pathway. IMPORTANCE Each virus has a different strategy for its replication, which requires different host factors. Enterovirus, a model RNA virus, requires host factors PI4KB and OSBP, which form an obligatory functional axis to support viral replication. In an experimental evolution system in vitro, virus mutants that do not depend on these host factors could arise only with four mutations. The two mutations (3A-R54W and 2B-F17L) are required for the replication but are not sufficient to support efficient infection. Another mutation (2B-Q20H) is essential for efficient spread of the virus. 
The order of introduction of the mutations in the viral genome is essential (known as “epistasis”), and functional couplings of infection steps (i.e., viral replication, growth, and spread) have substantial roles to show the effects of the 2B-Q20H mutation. These observations would provide novel insights into an evolutionary pathway of the virus to require host factors for infection.
|
29
|
Youssef N, Susko E, Roger AJ, Bielawski JP. Shifts in amino acid preferences as proteins evolve: A synthesis of experimental and theoretical work. Protein Sci 2021; 30:2009-2028. [PMID: 34322924 PMCID: PMC8442975 DOI: 10.1002/pro.4161] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Received: 05/13/2021] [Revised: 07/19/2021] [Accepted: 07/26/2021] [Indexed: 11/08/2022]
Abstract
Amino acid preferences vary across sites and time. While variation across sites is widely accepted, the extent and frequency of temporal shifts are contentious. Our understanding of the drivers of amino acid preference change is incomplete: To what extent are temporal shifts driven by adaptive versus nonadaptive evolutionary processes? We review phenomena that cause preferences to vary (e.g., evolutionary Stokes shift, contingency, and entrenchment) and clarify how they differ. To determine the extent and prevalence of shifted preferences, we review experimental and theoretical studies. Analyses of natural sequence alignments often detect decreases in homoplasy (convergence and reversions) rates, and variation in replacement rates with time, signals that are consistent with temporally changing preferences. While approaches inferring shifts in preferences from patterns in natural alignments are valuable, they are indirect since multiple mechanisms (both adaptive and nonadaptive) could lead to the observed signal. Alternatively, site-directed mutagenesis experiments allow for a more direct assessment of shifted preferences. They corroborate evidence from multiple sequence alignments, revealing that the preference for an amino acid at a site varies depending on the background sequence. However, shifts in preferences are usually minor in magnitude and sites with significantly shifted preferences are low in frequency. The small yet consistent perturbations in preferences could, nevertheless, jeopardize the accuracy of inference procedures, which assume constant preferences. We conclude by discussing if and how such shifts in preferences might influence widely used time-homogenous inference procedures and potential ways to mitigate such effects.
Affiliation(s)
- Noor Youssef
- Department of Biology, Dalhousie University, Halifax, Nova Scotia, Canada
| | - Edward Susko
- Department of Mathematics and Statistics, Dalhousie University, Halifax, Nova Scotia, Canada
| | - Andrew J. Roger
- Department of Biochemistry and Molecular Biology, Dalhousie University, Halifax, Nova Scotia, Canada
| | - Joseph P. Bielawski
- Department of Biology, Dalhousie University, Halifax, Nova Scotia, Canada
- Department of Mathematics and Statistics, Dalhousie University, Halifax, Nova Scotia, Canada
| |
|
30
|
Phillips AM, Lawrence KR, Moulana A, Dupic T, Chang J, Johnson MS, Cvijovic I, Mora T, Walczak AM, Desai MM. Binding affinity landscapes constrain the evolution of broadly neutralizing anti-influenza antibodies. eLife 2021; 10:71393. [PMID: 34491198 PMCID: PMC8476123 DOI: 10.7554/elife.71393] [Citation(s) in RCA: 38] [Impact Index Per Article: 9.5] [Received: 06/17/2021] [Accepted: 09/05/2021] [Indexed: 12/12/2022] Open
Abstract
Over the past two decades, several broadly neutralizing antibodies (bnAbs) that confer protection against diverse influenza strains have been isolated. Structural and biochemical characterization of these bnAbs has provided molecular insight into how they bind distinct antigens. However, our understanding of the evolutionary pathways leading to bnAbs, and thus how best to elicit them, remains limited. Here, we measure equilibrium dissociation constants of combinatorially complete mutational libraries for two naturally isolated influenza bnAbs (CR9114, 16 heavy-chain mutations; CR6261, 11 heavy-chain mutations), reconstructing all possible evolutionary intermediates back to the unmutated germline sequences. We find that these two libraries exhibit strikingly different patterns of breadth: while many variants of CR6261 display moderate affinity to diverse antigens, those of CR9114 display appreciable affinity only in specific, nested combinations. By examining the extensive pairwise and higher order epistasis between mutations, we find key sites with strong synergistic interactions that are highly similar across antigens for CR6261 and different for CR9114. Together, these features of the binding affinity landscapes strongly favor sequential acquisition of affinity to diverse antigens for CR9114, while the acquisition of breadth to more similar antigens for CR6261 is less constrained. These results, if generalizable to other bnAbs, may explain the molecular basis for the widespread observation that sequential exposure favors greater breadth, and such mechanistic insight will be essential for predicting and eliciting broadly protective immune responses.
Affiliation(s)
- Angela M Phillips
- Department of Organismic and Evolutionary Biology, Harvard University, Cambridge, United States
| | - Katherine R Lawrence
- Department of Organismic and Evolutionary Biology, Harvard University, Cambridge, United States
- NSF-Simons Center for Mathematical and Statistical Analysis of Biology, Harvard University, Cambridge, United States
- Quantitative Biology Initiative, Harvard University, Cambridge, United States
- Department of Physics, Massachusetts Institute of Technology, Cambridge, United States
| | - Alief Moulana
- Department of Organismic and Evolutionary Biology, Harvard University, Cambridge, United States
| | - Thomas Dupic
- Department of Organismic and Evolutionary Biology, Harvard University, Cambridge, United States
| | - Jeffrey Chang
- Department of Physics, Harvard University, Cambridge, United States
| | - Milo S Johnson
- Department of Organismic and Evolutionary Biology, Harvard University, Cambridge, United States
| | - Ivana Cvijovic
- Department of Applied Physics, Stanford University, Stanford, United States
| | - Thierry Mora
- Laboratoire de physique de l'École Normale Supérieure, CNRS, PSL University, Sorbonne Université, and Université de Paris, Paris, France
| | - Aleksandra M Walczak
- Laboratoire de physique de l'École Normale Supérieure, CNRS, PSL University, Sorbonne Université, and Université de Paris, Paris, France
| | - Michael M Desai
- Department of Organismic and Evolutionary Biology, Harvard University, Cambridge, United States
- NSF-Simons Center for Mathematical and Statistical Analysis of Biology, Harvard University, Cambridge, United States
- Quantitative Biology Initiative, Harvard University, Cambridge, United States
- Department of Physics, Harvard University, Cambridge, United States
| |
|
31
|
Aghazadeh A, Nisonoff H, Ocal O, Brookes DH, Huang Y, Koyluoglu OO, Listgarten J, Ramchandran K. Epistatic Net allows the sparse spectral regularization of deep neural networks for inferring fitness functions. Nat Commun 2021; 12:5225. [PMID: 34471113 PMCID: PMC8410946 DOI: 10.1038/s41467-021-25371-3] [Citation(s) in RCA: 23] [Impact Index Per Article: 5.8] [Received: 12/28/2020] [Accepted: 07/27/2021] [Indexed: 11/18/2022] Open
Abstract
Despite recent advances in high-throughput combinatorial mutagenesis assays, the number of labeled sequences available to predict molecular functions has remained small for the vastness of the sequence space combined with the ruggedness of many fitness functions. While deep neural networks (DNNs) can capture high-order epistatic interactions among the mutational sites, they tend to overfit to the small number of labeled sequences available for training. Here, we developed Epistatic Net (EN), a method for spectral regularization of DNNs that exploits evidence that epistatic interactions in many fitness functions are sparse. We built a scalable extension of EN, usable for larger sequences, which enables spectral regularization using fast sparse recovery algorithms informed by coding theory. Results on several biological landscapes show that EN consistently improves the prediction accuracy of DNNs and enables them to outperform competing models which assume other priors. EN estimates the higher-order epistatic interactions of DNNs trained on massive sequence spaces, a computational problem that otherwise takes years to solve.
Affiliation(s)
- Amirali Aghazadeh
- Department of Electrical Engineering and Computer Sciences, Berkeley, CA, USA
| | | | - Orhan Ocal
- Department of Electrical Engineering and Computer Sciences, Berkeley, CA, USA
| | - David H Brookes
- Biophysics Graduate Group, University of California, Berkeley, CA, USA
| | - Yijie Huang
- Department of Electrical Engineering and Computer Sciences, Berkeley, CA, USA
| | - O Ozan Koyluoglu
- Department of Electrical Engineering and Computer Sciences, Berkeley, CA, USA
| | - Jennifer Listgarten
- Department of Electrical Engineering and Computer Sciences, Berkeley, CA, USA
- Center for Computational Biology, Berkeley, CA, USA
| | - Kannan Ramchandran
- Department of Electrical Engineering and Computer Sciences, Berkeley, CA, USA.
| |
|
32
|
Morrison AJ, Wonderlick DR, Harms MJ. Ensemble epistasis: thermodynamic origins of nonadditivity between mutations. Genetics 2021; 219:iyab105. [PMID: 34849909 PMCID: PMC8633102 DOI: 10.1093/genetics/iyab105] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.5] [Received: 04/06/2021] [Accepted: 06/19/2021] [Indexed: 01/02/2023] Open
Abstract
Epistasis, when mutations combine nonadditively, is a profoundly important aspect of biology. It is often difficult to understand its mechanistic origins. Here, we show that epistasis can arise from the thermodynamic ensemble, or the set of interchanging conformations a protein adopts. Ensemble epistasis occurs because mutations can have different effects on different conformations of the same protein, leading to nonadditive effects on its average, observable properties. Using a simple analytical model, we found that ensemble epistasis arises when two conditions are met: (1) a protein populates at least three conformations and (2) mutations have differential effects on at least two conformations. To explore the relative magnitude of ensemble epistasis, we performed a virtual deep-mutational scan of the allosteric Ca²⁺ signaling protein S100A4. We found that 47% of mutation pairs exhibited ensemble epistasis with a magnitude on the order of thermal fluctuations. We observed many forms of epistasis: magnitude, sign, and reciprocal sign epistasis. The same mutation pair could even exhibit different forms of epistasis under different environmental conditions. The ubiquity of thermodynamic ensembles in biology and the pervasiveness of ensemble epistasis in our dataset suggest that it may be a common mechanism of epistasis in proteins and other macromolecules.
Affiliation(s)
- Anneliese J Morrison
- Institute of Molecular Biology, University of Oregon, Eugene, OR 97403, USA
- Department of Chemistry and Biochemistry, University of Oregon, Eugene OR 97403, USA
| | - Daria R Wonderlick
- Institute of Molecular Biology, University of Oregon, Eugene, OR 97403, USA
- Department of Chemistry and Biochemistry, University of Oregon, Eugene OR 97403, USA
| | - Michael J Harms
- Institute of Molecular Biology, University of Oregon, Eugene, OR 97403, USA
- Department of Chemistry and Biochemistry, University of Oregon, Eugene OR 97403, USA
| |
|
33
|
The adaptive landscape of a metallo-enzyme is shaped by environment-dependent epistasis. Nat Commun 2021; 12:3867. [PMID: 34162839 PMCID: PMC8222346 DOI: 10.1038/s41467-021-23943-x] [Citation(s) in RCA: 14] [Impact Index Per Article: 3.5] [Received: 05/31/2020] [Accepted: 05/18/2021] [Indexed: 11/08/2022] Open
Abstract
Enzymes can evolve new catalytic activity when environmental changes present them with novel substrates. Despite this seemingly straightforward relationship, factors other than the direct catalytic target can also impact adaptation. Here, we characterize the catalytic activity of a recently evolved bacterial methyl-parathion hydrolase for all possible combinations of the five functionally relevant mutations under eight different laboratory conditions (in which an alternative divalent metal is supplemented). The resultant adaptive landscapes across this historical evolutionary transition vary in terms of both the number of “fitness peaks” as well as the genotype(s) at which they are found as a result of genotype-by-environment interactions and environment-dependent epistasis. This suggests that adaptive landscapes may be fluid and molecular adaptation is highly contingent not only on obvious factors (such as catalytic targets), but also on less obvious secondary environmental factors that can direct it towards distinct outcomes. The metaphor of an adaptive landscape is presented quantitatively by looking at molecular adaptations and their catalytic consequences in a recently evolved bacterial enzyme. The study identifies both genotype-by-environment interactions and environment-dependent epistasis as factors that can alter the fitness of functional mutations.
|
34
|
Swint-Kruse L, Martin TA, Page BM, Wu T, Gerhart PM, Dougherty LL, Tang Q, Parente DJ, Mosier BR, Bantis LE, Fenton AW. Rheostat functional outcomes occur when substitutions are introduced at nonconserved positions that diverge with speciation. Protein Sci 2021; 30:1833-1853. [PMID: 34076313 DOI: 10.1002/pro.4136] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.5] [Received: 03/23/2021] [Revised: 05/25/2021] [Accepted: 05/28/2021] [Indexed: 12/14/2022]
Abstract
When amino acids vary during evolution, the outcome can be functionally neutral or biologically important. We previously found that substituting a subset of nonconserved positions, "rheostat" positions, can have surprising effects on protein function. Since changes at rheostat positions can facilitate functional evolution or cause disease, more examples are needed to understand their unique biophysical characteristics. Here, we explored whether "phylogenetic" patterns of change in multiple sequence alignments (such as positions with subfamily specific conservation) predict the locations of functional rheostat positions. To that end, we experimentally tested eight phylogenetic positions in human liver pyruvate kinase (hLPYK), using 10-15 substitutions per position and biochemical assays that yielded five functional parameters. Five positions were strongly rheostatic and three were non-neutral. To test the corollary that positions with low phylogenetic scores were not rheostat positions, we combined these phylogenetic positions with previously identified hLPYK rheostat, "toggle" (most substitutions abolished function), and "neutral" (all substitutions were like wild-type) positions. Despite representing 428 variants, this set of 33 positions was poorly statistically powered. Thus, we turned to the in vivo phenotypic dataset for E. coli lactose repressor protein (LacI), which comprised 12-13 substitutions at 329 positions and could be used to identify rheostat, toggle, and neutral positions. Combined hLPYK and LacI results show that positions with strong phylogenetic patterns of change are more likely to exhibit rheostat substitution outcomes than neutral or toggle outcomes. Furthermore, phylogenetic patterns were more successful at identifying rheostat positions than were co-evolutionary or eigenvector centrality measures of evolutionary change.
Affiliation(s)
- Liskin Swint-Kruse
- Department of Biochemistry and Molecular Biology, The University of Kansas Medical Center, Kansas City, Kansas, USA
| | - Tyler A Martin
- Department of Biochemistry and Molecular Biology, The University of Kansas Medical Center, Kansas City, Kansas, USA
| | - Braelyn M Page
- Department of Biochemistry and Molecular Biology, The University of Kansas Medical Center, Kansas City, Kansas, USA
| | - Tiffany Wu
- Department of Biochemistry and Molecular Biology, The University of Kansas Medical Center, Kansas City, Kansas, USA
| | - Paige M Gerhart
- Department of Biochemistry and Molecular Biology, The University of Kansas Medical Center, Kansas City, Kansas, USA
| | - Larissa L Dougherty
- Department of Biochemistry and Molecular Biology, The University of Kansas Medical Center, Kansas City, Kansas, USA
- Department of Biochemistry and Cell Biology, Geisel School of Medicine at Dartmouth College, Hanover, New Hampshire, USA
| | - Qingling Tang
- Department of Biochemistry and Molecular Biology, The University of Kansas Medical Center, Kansas City, Kansas, USA
| | - Daniel J Parente
- Department of Family Medicine and Community Health, The University of Kansas Medical Center, Kansas City, Kansas, USA
| | - Brian R Mosier
- Department of Biostatistics and Data Science, The University of Kansas Medical Center, Kansas City, Kansas, USA
| | - Leonidas E Bantis
- Department of Biostatistics and Data Science, The University of Kansas Medical Center, Kansas City, Kansas, USA
| | - Aron W Fenton
- Department of Biochemistry and Molecular Biology, The University of Kansas Medical Center, Kansas City, Kansas, USA
|
35
|
Xie VC, Pu J, Metzger BP, Thornton JW, Dickinson BC. Contingency and chance erase necessity in the experimental evolution of ancestral proteins. eLife 2021; 10:67336. [PMID: 34061027 PMCID: PMC8282340 DOI: 10.7554/elife.67336] [Citation(s) in RCA: 27] [Impact Index Per Article: 6.8] [Received: 02/08/2021] [Accepted: 05/30/2021] [Indexed: 12/13/2022]
Abstract
The roles of chance, contingency, and necessity in evolution are unresolved because they have never been assessed in a single system or on timescales relevant to historical evolution. We combined ancestral protein reconstruction and a new continuous evolution technology to mutate and select proteins in the B-cell lymphoma-2 (BCL-2) family to acquire protein–protein interaction specificities that occurred during animal evolution. By replicating evolutionary trajectories from multiple ancestral proteins, we found that contingency generated over long historical timescales steadily erased necessity and overwhelmed chance as the primary cause of acquired sequence variation; trajectories launched from phylogenetically distant proteins yielded virtually no common mutations, even under strong and identical selection pressures. Chance arose because many sets of mutations could alter specificity at any timepoint; contingency arose because historical substitutions changed these sets. Our results suggest that patterns of variation in BCL-2 sequences – and likely other proteins, too – are idiosyncratic products of a particular and unpredictable course of historical events.
One of the most fundamental and unresolved questions in evolutionary biology is whether the outcomes of evolution are predictable. Is the diversity of life we see today the expected result of organisms adapting to their environment throughout history (also known as natural selection) or the product of random chance? Or did chance events early in history shape the paths that evolution could take next, determining the biological forms that emerged under natural selection much later? These questions are hard to study because evolution happened only once, long ago. To overcome this barrier, Xie, Pu, Metzger et al. developed an experimental approach that can evolve reconstructed ancestral proteins that existed deep in the past.
Using this method, it is possible to replay evolution multiple times, from various historical starting points, under conditions similar to those that existed long ago. The end products of the evolutionary trajectories can then be compared to determine how predictable evolution actually is. Xie, Pu, Metzger et al. studied proteins belonging to the BCL-2 family, which originated some 800 million years ago. These proteins have diversified greatly over time in both their genetic sequences and their ability to bind to specific partner proteins called co-regulators. Xie, Pu, Metzger et al. synthesized BCL-2 proteins that existed at various times in the past. Each ancestral protein was then allowed to evolve repeatedly under natural selection to acquire the same co-regulator binding functions that evolved during history. At the end of each evolutionary trajectory, the genetic sequence of the resulting BCL-2 proteins was recorded. This revealed that the outcomes of evolution were almost completely unpredictable: trajectories initiated from the same ancestral protein produced proteins with very different sequences, and proteins launched from different ancestral starting points were even more dissimilar. Further experiments identified the mutations in each trajectory that caused changes in coregulator binding. When these mutations were introduced into other ancestral proteins, they did not yield the same change in function. This suggests that early chance events influenced each protein’s evolution in an unpredictable way by opening and closing the paths available to it in the future. This research expands our understanding of evolution on a molecular level whilst providing a new experimental approach for studying evolutionary drivers in more detail. The results suggest that BCL-2 proteins, in all their various forms, are unique products of a particular, unpredictable course of history set in motion by ancient chance events.
Affiliation(s)
| | - Jinyue Pu
- Department of Chemistry, University of Chicago, Chicago, United States
| | - Brian Ph Metzger
- Department of Ecology and Evolution, University of Chicago, Chicago, United States
| | - Joseph W Thornton
- Department of Ecology and Evolution, University of Chicago, Chicago, United States.,Department of Human Genetics, University of Chicago, Chicago, United States
| | - Bryan C Dickinson
- Department of Chemistry, University of Chicago, Chicago, United States
|
36
|
Miton CM, Buda K, Tokuriki N. Epistasis and intramolecular networks in protein evolution. Curr Opin Struct Biol 2021; 69:160-168. [PMID: 34077895 DOI: 10.1016/j.sbi.2021.04.007] [Citation(s) in RCA: 40] [Impact Index Per Article: 10.0] [Received: 01/18/2021] [Revised: 04/01/2021] [Accepted: 04/21/2021] [Indexed: 12/01/2022]
Abstract
Proteins are molecular machines composed of complex, highly connected amino acid networks. Their functional optimization requires the reorganization of these intramolecular networks by evolution. In this review, we discuss the mechanisms by which epistasis, that is, the dependence of the effect of a mutation on the genetic background, rewires intramolecular interactions to alter protein function. Deciphering the biophysical basis of epistasis is crucial to our understanding of evolutionary dynamics and the elucidation of sequence-structure-function relationships. We featured recent studies that provide insights into the molecular mechanisms giving rise to epistasis, particularly at the structural level. These studies illustrate the convoluted and fascinating nature of the intramolecular networks co-opted by epistasis during the evolution of protein function.
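The definition of epistasis given in this abstract (the effect of a mutation depends on the genetic background) can be made concrete with a small sketch. The function names and the phenotype values below are hypothetical illustrations, not taken from the review, and the additive scale is an assumption; many studies work on a log or free-energy scale instead.

```python
def pairwise_epistasis(f_wt, f_a, f_b, f_ab):
    """Additive-scale pairwise epistasis between mutations A and B.

    Computed as the effect of A in the B background minus the effect
    of A in the wild-type background. Zero means the two mutations
    combine additively.
    """
    effect_in_wt = f_a - f_wt   # effect of A on the wild-type background
    effect_in_b = f_ab - f_b    # effect of A once B is already present
    return effect_in_b - effect_in_wt


def is_sign_epistasis_for_a(f_wt, f_a, f_b, f_ab):
    """True if mutation A is beneficial in one background and deleterious in the other."""
    return (f_a - f_wt) * (f_ab - f_b) < 0


# Hypothetical phenotype values (arbitrary units), for illustration only.
print(pairwise_epistasis(1.0, 2.0, 3.0, 2.5))       # A helps alone, hurts with B
print(is_sign_epistasis_for_a(1.0, 2.0, 3.0, 2.5))
```

Only mutation A is checked for a sign change here; a fuller treatment would test both mutations and account for measurement error.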
Affiliation(s)
- Charlotte M Miton
- Michael Smith Laboratories, University of British Columbia, Vancouver, V6T 1Z4, BC, Canada
| | - Karol Buda
- Michael Smith Laboratories, University of British Columbia, Vancouver, V6T 1Z4, BC, Canada
| | - Nobuhiko Tokuriki
- Michael Smith Laboratories, University of British Columbia, Vancouver, V6T 1Z4, BC, Canada.
|
37
|
Damry AM, Jackson CJ. The evolution and engineering of enzyme activity through tuning conformational landscapes. Protein Eng Des Sel 2021; 34:6254467. [PMID: 33903911 DOI: 10.1093/protein/gzab009] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.3] [Received: 01/13/2021] [Revised: 03/22/2021] [Accepted: 03/23/2021] [Indexed: 11/12/2022]
Abstract
Proteins are dynamic molecules whose structures consist of an ensemble of conformational states. Dynamics contribute to protein function and a link to protein evolution has begun to emerge. This increased appreciation for the evolutionary impact of conformational sampling has grown from our developing structural biology capabilities and the exploration of directed evolution approaches, which have allowed evolutionary trajectories to be mapped. Recent studies have provided empirical examples of how proteins can evolve via conformational landscape alterations. Moreover, minor conformational substates have been shown to be involved in the emergence of new enzyme functions as they can become enriched through evolution. The role of remote mutations in stabilizing new active site geometries has also granted insight into the molecular basis underpinning poorly understood epistatic effects that guide protein evolution. Finally, we discuss how the growth of our understanding of remote mutations is beginning to refine our approach to engineering enzymes.
Affiliation(s)
- Adam M Damry
- Research School of Chemistry, The Australian National University, Canberra, 2601, Australia
| | - Colin J Jackson
- Research School of Chemistry, The Australian National University, Canberra, 2601, Australia.,Australian Research Council Centre of Excellence for Innovations in Peptide and Protein Science, Research School of Chemistry, Australian National University, Canberra, 2601, ACT, Australia.,Australian Research Council Centre of Excellence in Synthetic Biology, Research School of Chemistry, Australian National University, Canberra, 2601, ACT, Australia
|
38
|
Kryazhimskiy S. Emergence and propagation of epistasis in metabolic networks. eLife 2021; 10:e60200. [PMID: 33527897 PMCID: PMC7924954 DOI: 10.7554/elife.60200] [Citation(s) in RCA: 17] [Impact Index Per Article: 4.3] [Received: 06/19/2020] [Accepted: 02/01/2021] [Indexed: 12/11/2022]
Abstract
Epistasis is often used to probe functional relationships between genes, and it plays an important role in evolution. However, we lack theory to understand how functional relationships at the molecular level translate into epistasis at the level of whole-organism phenotypes, such as fitness. Here, I derive two rules for how epistasis between mutations with small effects propagates from lower- to higher-level phenotypes in a hierarchical metabolic network with first-order kinetics and how such epistasis depends on topology. Most importantly, weak epistasis at a lower level may be distorted as it propagates to higher levels. Computational analyses show that epistasis in more realistic models likely follows similar, albeit more complex, patterns. These results suggest that pairwise inter-gene epistasis should be common, and it should generically depend on the genetic background and environment. Furthermore, the epistasis coefficients measured for high-level phenotypes may not be sufficient to fully infer the underlying functional relationships.
Affiliation(s)
- Sergey Kryazhimskiy
- Division of Biological Sciences, University of California, San Diego, La Jolla, United States
|
39
|
Stouffer DB, Novak M. Hidden layers of density dependence in consumer feeding rates. Ecol Lett 2021; 24:520-532. [PMID: 33404158 DOI: 10.1111/ele.13670] [Citation(s) in RCA: 12] [Impact Index Per Article: 3.0] [Received: 08/26/2020] [Revised: 11/26/2020] [Accepted: 12/07/2020] [Indexed: 01/16/2023]
Abstract
Functional responses relate a consumer's feeding rates to variation in its abiotic and biotic environment, providing insight into consumer behaviour and fitness, and underpinning population and food-web dynamics. Despite their broad relevance and long-standing history, we show here that the types of density dependence found in classic resource- and consumer-dependent functional-response models equate to strong and often untenable assumptions about the independence of processes underlying feeding rates. We first demonstrate mathematically how to quantify non-independence between feeding and consumer interference and between feeding on multiple resources. We then analyse two large collections of functional-response data sets to show that non-independence is pervasive and borne out in previously hidden forms of density dependence. Our results provide a new lens through which to view variation in consumer feeding rates and disentangle the biological underpinnings of species interactions in multi-species contexts.
Affiliation(s)
- Daniel B Stouffer
- Centre for Integrative Ecology, School of Biological Sciences, University of Canterbury, Christchurch, 8041, New Zealand
| | - Mark Novak
- Department of Integrative Biology, Oregon State University, Corvallis, OR, 97331, USA
|
40
|
Chen J, Wong KC. Analyzing High-Order Epistasis from Genotype-Phenotype Maps Using 'Epistasis' Package. Methods Mol Biol 2021; 2212:265-275. [PMID: 33733361 DOI: 10.1007/978-1-0716-0947-7_16] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/12/2023]
Abstract
Epistasis refers to interactions between genes that lead to complex phenotypic effects. Interactions among three or more mutations, called "high-order epistasis", have attracted significant interest in recent studies. However, the analysis of high-order epistasis is still debated because of nonlinear model complexity and statistical artifacts. A recent "epistasis" Python package was therefore developed to characterize high-order epistasis by estimating a nonlinear scale for mutation effects and then extracting high-order epistasis using linear models. This method successfully discovered statistically significant high-order epistasis in several real genotype-phenotype maps. We provide a concise, step-by-step guide to applying the "epistasis" package by reproducing the high-order epistasis discoveries on real genotype-phenotype data using the latest API of the package.
Affiliation(s)
- Junyi Chen
- Department of Computer Science, City University of Hong Kong, Kowloon Tong, Hong Kong
| | - Ka-Chun Wong
- Department of Computer Science, City University of Hong Kong, Kowloon Tong, Hong Kong.
|
41
|
Sailer ZR, Shafik SH, Summers RL, Joule A, Patterson-Robert A, Martin RE, Harms MJ. Inferring a complete genotype-phenotype map from a small number of measured phenotypes. PLoS Comput Biol 2020; 16:e1008243. [PMID: 32991585 PMCID: PMC7546491 DOI: 10.1371/journal.pcbi.1008243] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.6] [Received: 10/05/2019] [Revised: 10/09/2020] [Accepted: 08/13/2020] [Indexed: 01/02/2023]
Abstract
Understanding evolution requires detailed knowledge of genotype-phenotype maps; however, it can be a herculean task to measure every phenotype in a combinatorial map. We have developed a computational strategy to predict the missing phenotypes from an incomplete, combinatorial genotype-phenotype map. As a test case, we used an incomplete genotype-phenotype dataset previously generated for the malaria parasite’s ‘chloroquine resistance transporter’ (PfCRT). Wild-type PfCRT (PfCRT3D7) lacks significant chloroquine (CQ) transport activity, but the introduction of the eight mutations present in the ‘Dd2’ isoform of PfCRT (PfCRTDd2) enables the protein to transport CQ away from its site of antimalarial action. This gain of a transport function imparts CQ resistance to the parasite. A combinatorial map between PfCRT3D7 and PfCRTDd2 consists of 256 genotypes, of which only 52 have had their CQ transport activities measured through expression in the Xenopus laevis oocyte. We trained a statistical model with these 52 measurements to infer the CQ transport activity for the remaining 204 combinatorial genotypes between PfCRT3D7 and PfCRTDd2. Our best-performing model incorporated a binary classifier, a nonlinear scale, and additive effects for each mutation. The addition of specific pairwise- and high-order-epistatic coefficients decreased the predictive power of the model. We evaluated our predictions by experimentally measuring the CQ transport activities of 24 additional PfCRT genotypes. The R² value between our predicted and newly-measured phenotypes was 0.90. We then used the model to probe the accessibility of evolutionary trajectories through the map. Approximately 1% of the possible trajectories between PfCRT3D7 and PfCRTDd2 are accessible; however, none of the trajectories entailed eight successive increases in CQ transport activity. These results demonstrate that phenotypes can be inferred with known uncertainty from a partial genotype-phenotype dataset.
We also validated our approach against a collection of previously published genotype-phenotype maps. The model therefore appears general and should be applicable to a large number of genotype-phenotype maps.
Biological macromolecules are built from chains of building blocks. The function of a macromolecule depends on the specific chemical properties of the building blocks that make it up. Macromolecules evolve through mutations that swap one building block for another. Understanding how biomolecules work and evolve therefore requires knowledge of the effects of mutations. The effects of mutations can be measured experimentally; however, because there are a vast number of possible combinations of mutations, it is often difficult to make enough measurements to understand biomolecular function and evolution. In this paper, we describe a simple method to predict the effects of mutations on biomolecules from a small number of measurements. This method works by appropriately averaging the effects of mutations seen in different contexts. We test the method by predicting the effects of mutations on PfCRT, a macromolecule from the malarial parasite that confers drug resistance. We find that our method is fast and effective. Using a small number of measurements, we were able to gain insight into the evolutionary steps by which this macromolecule conferred drug resistance. To make this method accessible to other researchers, we have released it as an open-source software package: https://gpseer.readthedocs.io.
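The idea of probing trajectory accessibility through a combinatorial map can be illustrated with a minimal sketch. It assumes a binary map with genotypes encoded as integer bitmasks and counts direct paths in which every step strictly increases the phenotype. That strict criterion is one common convention and is not necessarily the one used by the authors, and the function name and toy phenotype values are hypothetical. This sketch does not use the gpseer package.

```python
def count_accessible_paths(phenotype, n_sites):
    """Count direct mutational paths from genotype 0 to the all-mutant
    genotype in which every single-mutation step strictly increases
    the phenotype.

    phenotype: dict mapping each genotype (int bitmask over n_sites
               binary sites) to a measured or predicted value.
    """
    full = (1 << n_sites) - 1
    paths = {0: 1}  # one (empty) path reaches the starting genotype
    # Removing a set bit always gives a smaller integer, so every
    # predecessor of g is processed before g in this loop.
    for g in range(1, full + 1):
        total = 0
        for i in range(n_sites):
            bit = 1 << i
            if g & bit:
                parent = g ^ bit
                if phenotype[g] > phenotype[parent]:
                    total += paths[parent]
        paths[g] = total
    return paths[full]


# Toy 3-site example (hypothetical values): additive landscape,
# phenotype equals the number of mutations present.
additive = {g: bin(g).count("1") for g in range(8)}
print(count_accessible_paths(additive, 3))  # all 3! = 6 paths are accessible

# Make one single mutant deleterious: paths starting with it are blocked.
rugged = dict(additive)
rugged[0b001] = -1
print(count_accessible_paths(rugged, 3))
```

For an eight-site map such as the 256-genotype example in the abstract, the same loop covers all 8! = 40,320 direct paths without enumerating them one by one.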
Affiliation(s)
- Zachary R. Sailer
- Institute for Molecular Biology, University of Oregon, Eugene, OR, United States of America
- Department of Chemistry and Biochemistry, University of Oregon, Eugene, OR, United States of America
| | - Sarah H. Shafik
- Research School of Biology, Australian National University, Canberra, ACT, Australia
| | - Robert L. Summers
- Research School of Biology, Australian National University, Canberra, ACT, Australia
| | - Alex Joule
- Research School of Biology, Australian National University, Canberra, ACT, Australia
| | | | - Rowena E. Martin
- Research School of Biology, Australian National University, Canberra, ACT, Australia
| | - Michael J. Harms
- Institute for Molecular Biology, University of Oregon, Eugene, OR, United States of America
- Department of Chemistry and Biochemistry, University of Oregon, Eugene, OR, United States of America
|
42
|
Genotype networks of 80 quantitative Arabidopsis thaliana phenotypes reveal phenotypic evolvability despite pervasive epistasis. PLoS Comput Biol 2020; 16:e1008082. [PMID: 32790763 PMCID: PMC7447023 DOI: 10.1371/journal.pcbi.1008082] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.4] [Received: 08/19/2019] [Revised: 08/25/2020] [Accepted: 06/22/2020] [Indexed: 12/23/2022]
Abstract
We study the genotype-phenotype maps of 80 quantitative phenotypes in the model plant Arabidopsis thaliana, by representing the genotypes affecting each phenotype as a genotype network. In such a network, each vertex or node corresponds to an individual's genotype at all those genomic loci that affect a given phenotype. Two vertices are connected by an edge if the associated genotypes differ in exactly one nucleotide. The 80 genotype networks we analyze are based on data from genome-wide association studies of 199 A. thaliana accessions. They form connected graphs whose topography differs substantially among phenotypes. We focus our analysis on the incidence of epistasis (non-additive interactions among mutations) because a high incidence of epistasis can reduce the accessibility of evolutionary paths towards high or low phenotypic values. We find epistatic interactions in 67 phenotypes, and in 51 phenotypes every pairwise mutant interaction is epistatic. Moreover, we find phenotype-specific differences in the fraction of accessible mutational paths to maximum phenotypic values. However, even though epistasis affects the accessibility of maximum phenotypic values, the relationships between genotypic and phenotypic change of our analyzed phenotypes are sufficiently smooth that some evolutionary paths remain accessible for most phenotypes, even where epistasis is pervasive. The genotype network representation we use can complement existing approaches to understand the genetic architecture of polygenic traits in many different organisms.
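The genotype network construction described in this abstract (vertices are genotypes, edges join genotypes that differ in exactly one nucleotide) can be sketched in a few lines. The function names and the example sequences are hypothetical and chosen only for illustration; the study's own pipeline and data are not reproduced here.

```python
from collections import deque
from itertools import combinations


def hamming(a, b):
    """Number of positions at which two equal-length genotypes differ."""
    return sum(x != y for x, y in zip(a, b))


def build_genotype_network(genotypes):
    """Return an adjacency dict: two genotypes are connected by an edge
    if they differ in exactly one nucleotide."""
    adjacency = {g: set() for g in genotypes}
    for a, b in combinations(adjacency, 2):
        if len(a) == len(b) and hamming(a, b) == 1:
            adjacency[a].add(b)
            adjacency[b].add(a)
    return adjacency


def is_connected(adjacency):
    """Breadth-first search to check that the network forms one connected graph."""
    if not adjacency:
        return True
    start = next(iter(adjacency))
    seen = {start}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for neighbour in adjacency[node]:
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append(neighbour)
    return len(seen) == len(adjacency)


# Hypothetical genotypes at three phenotype-associated loci.
network = build_genotype_network(["AAG", "AAT", "ACT", "GCT"])
print(network["AAT"])         # neighbours of AAT
print(is_connected(network))  # the four genotypes form a single chain
```

The pairwise comparison is quadratic in the number of genotypes, which is adequate for a few hundred accessions but would need indexing for much larger sets.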
|
43
|
Ballal A, Laurendon C, Salmon M, Vardakou M, Cheema J, Defernez M, O'Maille PE, Morozov AV. Sparse Epistatic Patterns in the Evolution of Terpene Synthases. Mol Biol Evol 2020; 37:1907-1924. [PMID: 32119077 DOI: 10.1093/molbev/msaa052] [Citation(s) in RCA: 13] [Impact Index Per Article: 2.6] [Indexed: 12/22/2022]
Abstract
We explore sequence determinants of enzyme activity and specificity in a major enzyme family of terpene synthases. Most enzymes in this family catalyze reactions that produce cyclic terpenes, complex hydrocarbons widely used by plants and insects in diverse biological processes such as defense, communication, and symbiosis. To analyze the molecular mechanisms of emergence of terpene cyclization, we have carried out in-depth examination of mutational space around (E)-β-farnesene synthase, an Artemisia annua enzyme which catalyzes production of a linear hydrocarbon chain. Each mutant enzyme in our synthetic libraries was characterized biochemically, and the resulting reaction rate data were used as input to the Michaelis-Menten model of enzyme kinetics, in which free energies were represented as sums of one-amino-acid contributions and two-amino-acid couplings. Our model predicts measured reaction rates with high accuracy and yields free energy landscapes characterized by relatively few coupling terms. As a result, the Michaelis-Menten free energy landscapes have simple, interpretable structure and exhibit little epistasis. We have also developed biophysical fitness models based on the assumption that highly fit enzymes have evolved to maximize the output of correct products, such as cyclic products or a specific product of interest, while minimizing the output of byproducts. This approach results in nonlinear fitness landscapes that are considerably more epistatic. Overall, our experimental and computational framework provides focused characterization of evolutionary emergence of novel enzymatic functions in the context of microevolutionary exploration of sequence space around naturally occurring enzymes.
Affiliation(s)
- Aditya Ballal
- Department of Physics & Astronomy and Center for Quantitative Biology, Rutgers University, Piscataway, NJ
| | - Caroline Laurendon
- John Innes Centre, Department of Metabolic Biology, Norwich Research Park, Norwich, United Kingdom.,Food & Health Programme, Institute of Food Research, Norwich Research Park, Norwich, United Kingdom
| | - Melissa Salmon
- John Innes Centre, Department of Metabolic Biology, Norwich Research Park, Norwich, United Kingdom.,Food & Health Programme, Institute of Food Research, Norwich Research Park, Norwich, United Kingdom.,Earlham Institute, Norwich Research Park, Norwich, United Kingdom
| | - Maria Vardakou
- John Innes Centre, Department of Metabolic Biology, Norwich Research Park, Norwich, United Kingdom.,Food & Health Programme, Institute of Food Research, Norwich Research Park, Norwich, United Kingdom.,School of Biological Sciences, University of East Anglia, Norwich Research Park, Norwich, United Kingdom
| | - Jitender Cheema
- John Innes Centre, Department of Computational and Systems Biology, Norwich Research Park, Norwich, United Kingdom
| | - Marianne Defernez
- Core Science Resources, Quadram Institute, Norwich Research Park, Norwich, United Kingdom
| | - Paul E O'Maille
- John Innes Centre, Department of Metabolic Biology, Norwich Research Park, Norwich, United Kingdom.,Food & Health Programme, Institute of Food Research, Norwich Research Park, Norwich, United Kingdom.,SRI International, Menlo Park, CA
| | - Alexandre V Morozov
- Department of Physics & Astronomy and Center for Quantitative Biology, Rutgers University, Piscataway, NJ
|
44
|
Abstract
Cells adapt to changing environments. Perturb a cell and it returns to a point of homeostasis. Perturb a population and it evolves toward a fitness peak. We review quantitative models of the forces of adaptation and their visualizations on landscapes. While some adaptations result from single mutations or few-gene effects, others are more cooperative, more delocalized in the genome, and more universal and physical. For example, homeostasis and evolution depend on protein folding and aggregation, energy and protein production, protein diffusion, molecular motor speeds and efficiencies, and protein expression levels. Models provide a way to learn about the fitness of cells and cell populations by making and testing hypotheses.
Affiliation(s)
- Luca Agozzino
- The Louis and Beatrice Laufer Center for Physical and Quantitative Biology, Stony Brook University, Stony Brook, New York 11794, USA.,Department of Physics and Astronomy, Stony Brook University, Stony Brook, New York 11794, USA
| | - Gábor Balázsi
- The Louis and Beatrice Laufer Center for Physical and Quantitative Biology, Stony Brook University, Stony Brook, New York 11794, USA.,Department of Biomedical Engineering, Stony Brook University, Stony Brook, New York 11794, USA
| | - Jin Wang
- The Louis and Beatrice Laufer Center for Physical and Quantitative Biology, Stony Brook University, Stony Brook, New York 11794, USA.,Department of Physics and Astronomy, Stony Brook University, Stony Brook, New York 11794, USA.,Department of Chemistry, Stony Brook University, Stony Brook, New York 11790, USA
| | - Ken A Dill
- The Louis and Beatrice Laufer Center for Physical and Quantitative Biology, Stony Brook University, Stony Brook, New York 11794, USA.,Department of Physics and Astronomy, Stony Brook University, Stony Brook, New York 11794, USA.,Department of Chemistry, Stony Brook University, Stony Brook, New York 11790, USA
|
45
|
Zhou J, McCandlish DM. Minimum epistasis interpolation for sequence-function relationships. Nat Commun 2020; 11:1782. [PMID: 32286265 PMCID: PMC7156698 DOI: 10.1038/s41467-020-15512-5] [Citation(s) in RCA: 13] [Impact Index Per Article: 2.6] [Received: 08/02/2019] [Accepted: 03/12/2020] [Indexed: 12/17/2022]
Abstract
Massively parallel phenotyping assays have provided unprecedented insight into how multiple mutations combine to determine biological function. While such assays can measure phenotypes for thousands to millions of genotypes in a single experiment, in practice these measurements are not exhaustive, so that there is a need for techniques to impute values for genotypes whose phenotypes have not been directly assayed. Here, we present an imputation method based on inferring the least epistatic possible sequence-function relationship compatible with the data. In particular, we infer the reconstruction where mutational effects change as little as possible across adjacent genetic backgrounds. The resulting models can capture complex higher-order genetic interactions near the data, but approach additivity where data is sparse or absent. We apply the method to high-throughput transcription factor binding assays and use it to explore a fitness landscape for protein G.
Affiliation(s)
- Juannan Zhou
- Simons Center for Quantitative Biology, Cold Spring Harbor Laboratory, Cold Spring Harbor, NY, 11724, USA
| | - David M McCandlish
- Simons Center for Quantitative Biology, Cold Spring Harbor Laboratory, Cold Spring Harbor, NY, 11724, USA.
|
46
|
Crona K, Luo M, Greene D. An uncertainty law for microbial evolution. J Theor Biol 2020; 489:110155. [PMID: 31926205 DOI: 10.1016/j.jtbi.2020.110155] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.4] [Received: 07/28/2019] [Revised: 01/05/2020] [Accepted: 01/07/2020] [Indexed: 11/28/2022]
Abstract
Medical practice would benefit from a thorough understanding of constraints and uncertainty in microbial evolution. Higher order epistasis refers to unexpected effects of multiple mutations even if both single mutations and pairwise effects have been accounted for. Recent studies show that higher order epistasis is abundant in nature, for bacteria as well as higher organisms. However, the importance of higher order effects has been debated. It has been suggested that such effects cannot be interpreted, and should not be considered. Here, we show conclusively that higher order epistasis changes the adaptive prospects for a population. The conclusion is based on an exhaustive search of 193,270,310 hyper-cube graphs and applications of graph theory. Our results are more precise, yet more universal, than related research since they depend on mathematical theory, rather than sampling or simulations. Moreover, the uncertainty we establish for microbial evolution due to higher order epistasis is not sensitive to detailed model assumptions, such as the baseline being additive or log-additive fitness.
Affiliation(s)
- Kristina Crona
- Department of Mathematics and Statistics, 4400 Massachusetts Avenue NW, Washington, DC 20016-8050, United States.
| | - Mengming Luo
- University of California at San Diego, CA, United States.
| | - Devin Greene
- Department of Mathematics and Statistics, 4400 Massachusetts Avenue NW, Washington, DC 20016-8050, United States.
|
47
|
Miton CM, Chen JZ, Ost K, Anderson DW, Tokuriki N. Statistical analysis of mutational epistasis to reveal intramolecular interaction networks in proteins. Methods Enzymol 2020; 643:243-280. [DOI: 10.1016/bs.mie.2020.07.012] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.4] [Indexed: 01/22/2023]
|
48
|
Ding X, Zou Z, Brooks CL III. Deciphering protein evolution and fitness landscapes with latent space models. Nat Commun 2019; 10:5644. [PMID: 31822668 PMCID: PMC6904478 DOI: 10.1038/s41467-019-13633-0] [Citation(s) in RCA: 47] [Impact Index Per Article: 7.8] [Received: 01/15/2019] [Accepted: 11/12/2019] [Indexed: 12/03/2022]
Abstract
Protein sequences contain rich information about protein evolution, fitness landscapes, and stability. Here we investigate how latent space models trained using variational auto-encoders can infer these properties from sequences. Using both simulated and real sequences, we show that the low dimensional latent space representation of sequences, calculated using the encoder model, captures both evolutionary and ancestral relationships between sequences. Together with experimental fitness data and Gaussian process regression, the latent space representation also enables learning the protein fitness landscape in a continuous low dimensional space. Moreover, the model is also useful in predicting protein mutational stability landscapes and quantifying the importance of stability in shaping protein evolution. Overall, we illustrate that the latent space models learned using variational auto-encoders provide a mechanism for exploration of the rich data contained in protein sequences regarding evolution, fitness and stability and hence are well-suited to help guide protein engineering efforts.
Affiliation(s)
- Xinqiang Ding
  - Department of Computational Medicine & Bioinformatics, University of Michigan, Ann Arbor, MI, 48109, USA
- Zhengting Zou
  - Department of Ecology and Evolutionary Biology, University of Michigan, Ann Arbor, MI, 48109, USA
- Charles L Brooks III
  - Department of Computational Medicine & Bioinformatics, University of Michigan, Ann Arbor, MI, 48109, USA
  - Department of Chemistry, University of Michigan, Ann Arbor, MI, 48109, USA
  - Biophysics Program, University of Michigan, Ann Arbor, MI, 48109, USA
|
49
|
Esteban L, Lonishin LR, Bobrovskiy DM, Leleytner G, Bogatyreva NS, Kondrashov FA, Ivankov DN. HypercubeME: two hundred million combinatorially complete datasets from a single experiment. Bioinformatics 2019; 36:btz841. [PMID: 31742320 PMCID: PMC7703787 DOI: 10.1093/bioinformatics/btz841] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Received: 08/29/2019] [Revised: 11/01/2019] [Accepted: 11/07/2019] [Indexed: 11/17/2022] Open
Abstract
MOTIVATION: Epistasis, the context-dependence of the contribution of an amino acid substitution to fitness, is common in evolution. To detect epistasis, fitness must be measured for at least four genotypes: the reference genotype, two different single mutants and a double mutant with both of the single mutations. For higher-order epistasis of the order n, fitness has to be measured for all 2^n genotypes of an n-dimensional hypercube in genotype space forming a "combinatorially complete dataset". So far, only a handful of such datasets have been produced by manual curation. Concurrently, random mutagenesis experiments have produced measurements of fitness and other phenotypes in a high-throughput manner, potentially containing a number of combinatorially complete datasets. RESULTS: We present an effective recursive algorithm for finding all hypercube structures in random mutagenesis experimental data. To test the algorithm, we applied it to the data from a recent HIS3 protein dataset and found all 199,847,053 unique combinatorially complete genotype combinations of dimensionality ranging from two to twelve. The algorithm may be useful for researchers looking for higher-order epistasis in their high-throughput experimental data. AVAILABILITY: https://github.com/ivankovlab/HypercubeME.git. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
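The four-genotype requirement and the notion of a combinatorially complete dataset can be sketched as follows. This is a minimal illustration under an additive baseline with made-up fitness values; the function names are hypothetical, and it is not the recursive HypercubeME algorithm, which searches large mutagenesis datasets for such hypercubes.

```python
from itertools import product


def pairwise_epistasis(f_ref, f_a, f_b, f_ab):
    """Deviation of the double mutant from the additive expectation."""
    return f_ab - f_a - f_b + f_ref


def is_combinatorially_complete(fitness, n):
    """True if fitness holds a value for all 2**n binary genotypes of length n."""
    return all(genotype in fitness for genotype in product((0, 1), repeat=n))


# Made-up fitness values keyed by (site A mutated, site B mutated).
fitness = {(0, 0): 1.0, (1, 0): 1.2, (0, 1): 1.1, (1, 1): 1.6}

complete = is_combinatorially_complete(fitness, 2)
eps = pairwise_epistasis(
    fitness[(0, 0)], fitness[(1, 0)], fitness[(0, 1)], fitness[(1, 1)]
)
print(complete, eps)
```

A positive `eps` here means the double mutant is fitter than the two single-mutant effects would predict additively; removing any one of the four entries makes the completeness check fail.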
Affiliation(s)
- Lyubov R Lonishin
  - Faculty of Medical Physics, Institute of Biomedical System and Technologies, Peter the Great Saint Petersburg Polytechnic University, Saint Petersburg 195251, Russia
- Daniil M Bobrovskiy
  - Faculty of Bioengineering and Bioinformatics, Moscow State University, Moscow 119234, Russia
- Gregory Leleytner
  - Department of Innovation and High Technology, Moscow Institute of Physics and Technology, Moscow 141701, Russia
- Natalya S Bogatyreva
  - Universitat Pompeu Fabra (UPF), Barcelona 08003, Spain
  - Bioinformatics and Genomics Programme, Center for Genomic Regulation (CRG), The Barcelona Institute of Science and Technology, 08003 Barcelona, Spain
  - Laboratory of Protein Physics, Institute of Protein Research of the Russian Academy of Sciences, Moscow 142290, Russia
- Dmitry N Ivankov
  - Center of Life Sciences, Skolkovo Institute of Science and Technology, Moscow 121205, Russia
|
50
|
Higher-order epistasis shapes the fitness landscape of a xenobiotic-degrading enzyme. Nat Chem Biol 2019; 15:1120-1128. [DOI: 10.1038/s41589-019-0386-3] [Citation(s) in RCA: 46] [Impact Index Per Article: 7.7] [Received: 12/12/2018] [Accepted: 09/06/2019] [Indexed: 12/31/2022]
|