1. Swint-Kruse L, Fenton AW. Rheostats, toggles, and neutrals, Oh my! A new framework for understanding how amino acid changes modulate protein function. J Biol Chem 2024; 300:105736. [PMID: 38336297 PMCID: PMC10914490 DOI: 10.1016/j.jbc.2024.105736] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/15/2023] [Revised: 01/09/2024] [Accepted: 01/25/2024] [Indexed: 02/12/2024]
Abstract
Advances in personalized medicine and protein engineering require accurately predicting outcomes of amino acid substitutions. Many algorithms correctly predict that evolutionarily-conserved positions show "toggle" substitution phenotypes, which is defined when a few substitutions at that position retain function. In contrast, predictions often fail for substitutions at the less-studied "rheostat" positions, which are defined when different amino acid substitutions at a position sample at least half of the possible functional range. This review describes efforts to understand the impact and significance of rheostat positions: (1) They have been observed in globular soluble, integral membrane, and intrinsically disordered proteins; within single proteins, their prevalence can be up to 40%. (2) Substitutions at rheostat positions can have biological consequences and ∼10% of substitutions gain function. (3) Although both rheostat and "neutral" (defined when all substitutions exhibit wild-type function) positions are nonconserved, the two classes have different evolutionary signatures. (4) Some rheostat positions have pleiotropic effects on function, simultaneously modulating multiple parameters (e.g., altering both affinity and allosteric coupling). (5) In structural studies, substitutions at rheostat positions appear to cause only local perturbations; the overall conformations appear unchanged. (6) Measured functional changes show promising correlations with predicted changes in protein dynamics; the emergent properties of predicted, dynamically coupled amino acid networks might explain some of the complex functional outcomes observed when substituting rheostat positions. Overall, rheostat positions provide unique opportunities for using single substitutions to tune protein function. Future studies of these positions will yield important insights into the protein sequence/function relationship.
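The three position classes defined in this abstract can be illustrated with a minimal sketch. It assumes each substitution at a position has one measured value on a scale bounded by a wild-type value and a "dead" (no function) value. The function name `classify_position`, the bin count, the tolerance, and the toggle fraction are hypothetical choices made for illustration; they are not thresholds taken from the review.

```python
def classify_position(values, wt, dead, n_bins=10, tol=0.05, toggle_frac=0.7):
    """Label one position from the measured function of its substitutions.

    values: one measurement per substitution, on the same scale as wt and dead.
    All thresholds below are illustrative assumptions.
    """
    if not values:
        raise ValueError("need at least one substitution measurement")
    span = abs(wt - dead)

    # Neutral: every substitution stays within a small tolerance of wild type.
    if all(abs(v - wt) <= tol * span for v in values):
        return "neutral"

    # Toggle: most substitutions abolish function, few retain it.
    n_dead = sum(abs(v - dead) <= tol * span for v in values)
    if n_dead / len(values) >= toggle_frac:
        return "toggle"

    # Rheostat: substitutions sample at least half of the functional range,
    # approximated here by the fraction of occupied histogram bins.
    lo = min(wt, dead)
    occupied = set()
    for v in values:
        frac = (v - lo) / span
        occupied.add(min(n_bins - 1, max(0, int(frac * n_bins))))
    if len(occupied) / n_bins >= 0.5:
        return "rheostat"

    return "unclassified"


print(classify_position([0.1, 0.25, 0.45, 0.65, 0.85, 1.0], wt=1.0, dead=0.0))
```

Values outside the wild-type-to-dead range (for example, gain-of-function substitutions) are clamped into the end bins here, which is a simplification of this sketch.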
Affiliation(s)
- Liskin Swint-Kruse
  - Department of Biochemistry and Molecular Biology, The University of Kansas Medical Center, Kansas City, Kansas, USA
- Aron W Fenton
  - Department of Biochemistry and Molecular Biology, The University of Kansas Medical Center, Kansas City, Kansas, USA

2. Koch J, Romero‐Romero S, Höcker B. Stepwise introduction of stabilizing mutations reveals nonlinear additive effects in de novo TIM barrels. Protein Sci 2024; 33:e4926. [PMID: 38380781 PMCID: PMC10880431 DOI: 10.1002/pro.4926] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/05/2023] [Revised: 01/29/2024] [Accepted: 01/30/2024] [Indexed: 02/22/2024]
Abstract
Over the past decades, the TIM-barrel fold has served as a model system for the exploration of how changes in protein sequences affect their structural, stability, and functional characteristics, and moreover, how this information can be leveraged to design proteins from the ground up. After numerous attempts to design de novo proteins with this specific fold, sTIM11 was the first validated de novo design of an idealized four-fold symmetric TIM barrel. Subsequent efforts to enhance the stability of this initial design resulted in the development of DeNovoTIMs, a family of de novo TIM barrels with various stabilizing mutations. In this study, we present an investigation into the biophysical and thermodynamic effects upon introducing a varying number of stabilizing mutations per quarter along the sequence of a four-fold symmetric TIM barrel. We compared the base design DeNovoTIM0 without any stabilizing mutations with variants containing mutations in one, two, three, and all four quarters, designated TIM1q, TIM2q, TIM3q, and DeNovoTIM6, respectively. This analysis revealed a stepwise and nonlinear change in the thermodynamic properties that correlated with the number of mutated quarters, suggesting positive nonadditive effects. To shed light on the significance of the location of stabilized quarters, we engineered two variants of TIM2q which contain the same number of mutations but positioned in different quarter locations. Characterization of these TIM2q variants revealed that the mutations exhibit varying effects on the overall protein stability, contingent upon the specific region in which they are introduced. These findings emphasize that the amount and location of stabilized interfaces among the four quarters play a crucial role in shaping the conformational stability of these four-fold symmetric TIM barrels. Analysis of de novo proteins, as described in this study, enhances our understanding of how sequence variations can finely modulate stability in both naturally occurring and computationally designed proteins.
Affiliation(s)
- Birte Höcker
  - Department of Biochemistry, University of Bayreuth, Bayreuth, Germany

3. Notin P, Kollasch AW, Ritter D, van Niekerk L, Paul S, Spinner H, Rollins N, Shaw A, Weitzman R, Frazer J, Dias M, Franceschi D, Orenbuch R, Gal Y, Marks DS. ProteinGym: Large-Scale Benchmarks for Protein Design and Fitness Prediction. bioRxiv 2023:2023.12.07.570727. [PMID: 38106144 PMCID: PMC10723403 DOI: 10.1101/2023.12.07.570727] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 12/19/2023]
Abstract
Predicting the effects of mutations in proteins is critical to many applications, from understanding genetic disease to designing novel proteins that can address our most pressing challenges in climate, agriculture and healthcare. Despite a surge in machine learning-based protein models to tackle these questions, an assessment of their respective benefits is challenging due to the use of distinct, often contrived, experimental datasets, and the variable performance of models across different protein families. Addressing these challenges requires scale. To that end we introduce ProteinGym, a large-scale and holistic set of benchmarks specifically designed for protein fitness prediction and design. It encompasses both a broad collection of over 250 standardized deep mutational scanning assays, spanning millions of mutated sequences, as well as curated clinical datasets providing high-quality expert annotations about mutation effects. We devise a robust evaluation framework that combines metrics for both fitness prediction and design, factors in known limitations of the underlying experimental methods, and covers both zero-shot and supervised settings. We report the performance of a diverse set of over 70 high-performing models from various subfields (e.g., alignment-based, inverse folding) in a unified benchmark suite. We open source the corresponding codebase, datasets, MSAs, structures, and model predictions, and develop a user-friendly website that facilitates data access and analysis.
Affiliation(s)
- Ada Shaw
  - Applied Mathematics, Harvard University
- Mafalda Dias
  - Centre for Genomic Regulation, Universitat Pompeu Fabra
- Yarin Gal
  - Computer Science, University of Oxford

4. Haddox HK, Galloway JG, Dadonaite B, Bloom JD, Matsen IV FA, DeWitt WS. Jointly modeling deep mutational scans identifies shifted mutational effects among SARS-CoV-2 spike homologs. bioRxiv 2023:2023.07.31.551037. [PMID: 37577604 PMCID: PMC10418112 DOI: 10.1101/2023.07.31.551037] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 08/15/2023]
Abstract
Deep mutational scanning (DMS) is a high-throughput experimental technique that measures the effects of thousands of mutations to a protein. These experiments can be performed on multiple homologs of a protein or on the same protein selected under multiple conditions. It is often of biological interest to identify mutations with shifted effects across homologs or conditions. However, it is challenging to determine if observed shifts arise from biological signal or experimental noise. Here, we describe a method for jointly inferring mutational effects across multiple DMS experiments while also identifying mutations that have shifted in their effects among experiments. A key aspect of our method is to regularize the inferred shifts, so that they are nonzero only when strongly supported by the data. We apply this method to DMS experiments that measure how mutations to spike proteins from SARS-CoV-2 variants (Delta, Omicron BA.1, and Omicron BA.2) affect cell entry. Most mutational effects are conserved between these spike homologs, but a fraction have markedly shifted. We experimentally validate a subset of the mutations inferred to have shifted effects, and confirm differences of > 1,000-fold in the impact of the same mutation on spike-mediated viral infection across spikes from different SARS-CoV-2 variants. Overall, our work establishes a general approach for comparing sets of DMS experiments to identify biologically important shifts in mutational effects.
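The idea of regularizing shifts so that they are nonzero only when well supported can be illustrated with a toy sketch. This is not the authors' model: it assumes one measured effect per mutation in a reference homolog and one in a second homolog, a squared-error loss, and an L1 (lasso-style) penalty on the shift, all of which are assumptions made here. Under those assumptions, minimizing `(x_ref - beta)**2 + (x_hom - beta - delta)**2 + lam * abs(delta)` has a closed-form solution in which the shift `delta` is a soft-thresholded difference. The name `fit_shift` is hypothetical.

```python
def fit_shift(x_ref, x_hom, lam):
    """Jointly fit a shared effect (beta) and a penalized shift (delta).

    Minimizes (x_ref - beta)^2 + (x_hom - beta - delta)^2 + lam * |delta|.
    Eliminating beta leaves (d - delta)^2 / 2 + lam * |delta| with
    d = x_hom - x_ref, whose minimizer is the soft-threshold of d at lam.
    """
    d = x_hom - x_ref
    magnitude = max(abs(d) - lam, 0.0)      # shrink the raw difference toward zero
    delta = magnitude if d >= 0 else -magnitude
    beta = (x_ref + x_hom - delta) / 2.0    # shared effect given the fitted shift
    return beta, delta


# A small difference is absorbed as noise; a large one survives as a shift.
print(fit_shift(-1.0, -1.2, lam=0.5))
print(fit_shift(-1.0, 2.0, lam=0.5))
```

With a larger `lam`, more mutations are assigned a shift of exactly zero, which mirrors the abstract's goal of reporting shifts only when the data strongly support them.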
Affiliation(s)
- Hugh K. Haddox
  - Computational Biology Program, Fred Hutchinson Cancer Research Center, Seattle, WA 98102, USA
- Jared G. Galloway
  - Computational Biology Program, Fred Hutchinson Cancer Research Center, Seattle, WA 98102, USA
- Bernadeta Dadonaite
  - Basic Sciences Division, Fred Hutchinson Cancer Research Center, Seattle, WA 98109, USA
- Jesse D. Bloom
  - Computational Biology Program, Fred Hutchinson Cancer Research Center, Seattle, WA 98102, USA
  - Basic Sciences Division, Fred Hutchinson Cancer Research Center, Seattle, WA 98109, USA
  - Department of Genome Sciences, University of Washington, Seattle, WA 98195, USA
  - Howard Hughes Medical Institute, Seattle, WA 98109, USA
- Frederick A. Matsen IV
  - Computational Biology Program, Fred Hutchinson Cancer Research Center, Seattle, WA 98102, USA
  - Basic Sciences Division, Fred Hutchinson Cancer Research Center, Seattle, WA 98109, USA
  - Howard Hughes Medical Institute, Seattle, WA 98109, USA
  - Department of Statistics, University of Washington, Seattle, WA 98195, USA
- William S. DeWitt
  - Department of Electrical Engineering and Computer Sciences, University of California, Berkeley, CA 94720, USA

5. Swint-Kruse L, Dougherty LL, Page B, Wu T, O’Neil PT, Prasannan CB, Timmons C, Tang Q, Parente DJ, Sreenivasan S, Holyoak T, Fenton AW. PYK-SubstitutionOME: an integrated database containing allosteric coupling, ligand affinity and mutational, structural, pathological, bioinformatic and computational information about pyruvate kinase isozymes. Database (Oxford) 2023; 2023:baad030. [PMID: 37171062 PMCID: PMC10176505 DOI: 10.1093/database/baad030] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Received: 11/23/2022] [Revised: 03/29/2023] [Accepted: 04/11/2023] [Indexed: 05/13/2023]
Abstract
Interpreting changes in patient genomes, understanding how viruses evolve and engineering novel protein function all depend on accurately predicting the functional outcomes that arise from amino acid substitutions. To that end, the development of first-generation prediction algorithms was guided by historic experimental datasets. However, these datasets were heavily biased toward substitutions at positions that have not changed much throughout evolution (i.e. conserved). Although newer datasets include substitutions at positions that span a range of evolutionary conservation scores, these data are largely derived from assays that agglomerate multiple aspects of function. To facilitate predictions from the foundational chemical properties of proteins, large substitution databases with biochemical characterizations of function are needed. We report here a database derived from mutational, biochemical, bioinformatic, structural, pathological and computational studies of a highly studied protein family, pyruvate kinase (PYK). A centerpiece of this database is the biochemical characterization, including quantitative evaluation of allosteric regulation, of the changes that accompany substitutions at positions that sample the full conservation range observed in the PYK family. We have used these data to facilitate critical advances in the foundational studies of allosteric regulation and protein evolution and as rigorous benchmarks for testing protein predictions. We trust that the collected dataset will be useful for the broader scientific community in the further development of prediction algorithms. Database URL: https://github.com/djparente/PYK-DB.
Affiliation(s)
- Liskin Swint-Kruse
  - Department of Biochemistry and Molecular Biology, The University of Kansas Medical Center, 3901 Rainbow Blvd., Kansas City, KS 66160, USA
- Larissa L Dougherty
  - Department of Biochemistry and Molecular Biology, The University of Kansas Medical Center, 3901 Rainbow Blvd., Kansas City, KS 66160, USA
- Braelyn Page
  - Department of Biochemistry and Molecular Biology, The University of Kansas Medical Center, 3901 Rainbow Blvd., Kansas City, KS 66160, USA
- Tiffany Wu
  - Department of Biochemistry and Molecular Biology, The University of Kansas Medical Center, 3901 Rainbow Blvd., Kansas City, KS 66160, USA
- Pierce T O’Neil
  - Department of Biochemistry and Molecular Biology, The University of Kansas Medical Center, 3901 Rainbow Blvd., Kansas City, KS 66160, USA
- Charulata B Prasannan
  - Department of Biochemistry and Molecular Biology, The University of Kansas Medical Center, 3901 Rainbow Blvd., Kansas City, KS 66160, USA
- Cody Timmons
  - Chemistry Department, Southwestern Oklahoma State University, 100 Campus Dr., Weatherford, OK 73096, USA
- Qingling Tang
  - Department of Biochemistry and Molecular Biology, The University of Kansas Medical Center, 3901 Rainbow Blvd., Kansas City, KS 66160, USA
- Daniel J Parente
  - Department of Biochemistry and Molecular Biology, The University of Kansas Medical Center, 3901 Rainbow Blvd., Kansas City, KS 66160, USA
  - Department of Family Medicine and Community Health, The University of Kansas Medical Center, 3901 Rainbow Blvd., Kansas City, KS 66160, USA
- Shwetha Sreenivasan
  - Department of Biochemistry and Molecular Biology, The University of Kansas Medical Center, 3901 Rainbow Blvd., Kansas City, KS 66160, USA
- Todd Holyoak
  - Department of Biochemistry and Molecular Biology, The University of Kansas Medical Center, 3901 Rainbow Blvd., Kansas City, KS 66160, USA
  - Department of Biology, University of Waterloo, 200 University Ave. W, Waterloo, ON N2L 3G1, Canada
- Aron W Fenton
  - Department of Biochemistry and Molecular Biology, The University of Kansas Medical Center, 3901 Rainbow Blvd., Kansas City, KS 66160, USA

6. Page BM, Martin TA, Wright CL, Fenton LA, Villar MT, Tang Q, Artigues A, Lamb A, Fenton AW, Swint-Kruse L. Odd one out? Functional tuning of Zymomonas mobilis pyruvate kinase is narrower than its allosteric, human counterpart. Protein Sci 2022; 31:e4336. [PMID: 35762709 DOI: 10.1002/pro.4336] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 01/29/2022] [Revised: 04/29/2022] [Accepted: 05/03/2022] [Indexed: 11/08/2022]
Abstract
Various protein properties are often illuminated using sequence comparisons of protein homologs. For example, in analyses of the pyruvate kinase multiple sequence alignment, the set of positions that changed during speciation ("phylogenetic" positions) were enriched for "rheostat" positions in human liver pyruvate kinase (hLPYK). (Rheostat positions are those which, when substituted with various amino acids, yield a range of functional outcomes.) However, the correlation was moderate, which could result from multiple biophysical constraints acting on the same position during evolution and/or various sources of noise. To further examine this correlation, we here tested Zymomonas mobilis PYK (ZmPYK), which has <65% sequence identity to any other PYK sequence. Twenty-six ZmPYK positions were selected based on their phylogenetic scores, substituted with multiple amino acids, and assessed for changes in Kapp-PEP. Although we expected to identify multiple, strong rheostat positions, only one moderate rheostat position was detected. Instead, nearly half of the 271 ZmPYK variants were inactive and most others showed near wild-type function. Indeed, for the active ZmPYK variants, the total range of Kapp-PEP values ("tunability") was 40-fold less than that observed for hLPYK variants. The combined functional studies and sequence comparisons suggest that ZmPYK has evolved functional and/or structural attributes that differ from the rest of the family. We hypothesize that including such "orphan" sequences in MSA analyses obscures the correlations used to predict rheostat positions. Finally, results raise the intriguing biophysical question as to how the same protein fold can support rheostat positions in one homolog but not another.
Affiliation(s)
- Braelyn M Page
  - Department of Biochemistry and Molecular Biology, The University of Kansas Medical Center, Kansas City, Kansas, USA
- Tyler A Martin
  - Department of Biochemistry and Molecular Biology, The University of Kansas Medical Center, Kansas City, Kansas, USA
- Collette L Wright
  - Department of Biochemistry and Molecular Biology, The University of Kansas Medical Center, Kansas City, Kansas, USA
  - Department of Molecular Biosciences, The University of Kansas, Lawrence, Kansas, USA
- Lauren A Fenton
  - Department of Biochemistry and Molecular Biology, The University of Kansas Medical Center, Kansas City, Kansas, USA
- Maite T Villar
  - Department of Biochemistry and Molecular Biology, The University of Kansas Medical Center, Kansas City, Kansas, USA
- Qingling Tang
  - Department of Biochemistry and Molecular Biology, The University of Kansas Medical Center, Kansas City, Kansas, USA
- Antonio Artigues
  - Department of Biochemistry and Molecular Biology, The University of Kansas Medical Center, Kansas City, Kansas, USA
- Audrey Lamb
  - Department of Molecular Biosciences, The University of Kansas, Lawrence, Kansas, USA
  - Department of Chemistry, University of Texas at San Antonio, San Antonio, Texas, USA
- Aron W Fenton
  - Department of Biochemistry and Molecular Biology, The University of Kansas Medical Center, Kansas City, Kansas, USA
- Liskin Swint-Kruse
  - Department of Biochemistry and Molecular Biology, The University of Kansas Medical Center, Kansas City, Kansas, USA

7. Gonzalez Somermeyer L, Fleiss A, Mishin AS, Bozhanova NG, Igolkina AA, Meiler J, Alaball Pujol ME, Putintseva EV, Sarkisyan KS, Kondrashov FA. Heterogeneity of the GFP fitness landscape and data-driven protein design. eLife 2022; 11:75842. [PMID: 35510622 PMCID: PMC9119679 DOI: 10.7554/elife.75842] [Citation(s) in RCA: 10] [Impact Index Per Article: 5.0] [Received: 11/25/2021] [Accepted: 03/25/2022] [Indexed: 11/24/2022]
Abstract
Studies of protein fitness landscapes reveal biophysical constraints guiding protein evolution and empower prediction of functional proteins. However, generalisation of these findings is limited due to scarceness of systematic data on fitness landscapes of proteins with a defined evolutionary relationship. We characterized the fitness peaks of four orthologous fluorescent proteins with a broad range of sequence divergence. While two of the four studied fitness peaks were sharp, the other two were considerably flatter, being almost entirely free of epistatic interactions. Mutationally robust proteins, characterized by a flat fitness peak, were not optimal templates for machine-learning-driven protein design - instead, predictions were more accurate for fragile proteins with epistatic landscapes. Our work paves insights for practical application of fitness landscape heterogeneity in protein engineering.
Affiliation(s)
- Aubin Fleiss
  - Synthetic Biology Group, MRC London Institute of Medical Sciences, London, United Kingdom
  - Institute of Clinical Sciences, Faculty of Medicine and Imperial College Centre for Synthetic Biology, Imperial College London, London, United Kingdom
- Alexander S Mishin
  - Shemyakin-Ovchinnikov Institute of Bioorganic Chemistry, Russian Academy of Sciences, Moscow, Russian Federation
- Nina G Bozhanova
  - Department of Chemistry, Center for Structural Biology, Vanderbilt University, Nashville, United States
- Anna A Igolkina
  - Gregor Mendel Institute, Austrian Academy of Sciences, Vienna BioCenter, Vienna, Austria
- Jens Meiler
  - Department of Chemistry, Center for Structural Biology, Vanderbilt University, Nashville, United States
  - Institute for Drug Discovery, Medical School, Leipzig University, Leipzig, Germany
- Maria-Elisenda Alaball Pujol
  - Synthetic Biology Group, MRC London Institute of Medical Sciences, London, United Kingdom
  - Institute of Clinical Sciences, Faculty of Medicine and Imperial College Centre for Synthetic Biology, Imperial College London, London, United Kingdom
- Karen S Sarkisyan
  - Synthetic Biology Group, MRC London Institute of Medical Sciences, London, United Kingdom
  - Institute of Clinical Sciences, Faculty of Medicine and Imperial College Centre for Synthetic Biology, Imperial College London, London, United Kingdom
  - Shemyakin-Ovchinnikov Institute of Bioorganic Chemistry, Russian Academy of Sciences, Moscow, Russian Federation
- Fyodor A Kondrashov
  - Institute of Science and Technology Austria, Klosterneuburg, Austria
  - Evolutionary and Synthetic Biology Unit, Okinawa Institute of Science and Technology Graduate University, Okinawa, Japan

8. Youssef N, Susko E, Roger AJ, Bielawski JP. Shifts in amino acid preferences as proteins evolve: A synthesis of experimental and theoretical work. Protein Sci 2021; 30:2009-2028. [PMID: 34322924 PMCID: PMC8442975 DOI: 10.1002/pro.4161] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 05/13/2021] [Revised: 07/19/2021] [Accepted: 07/26/2021] [Indexed: 11/08/2022]
Abstract
Amino acid preferences vary across sites and time. While variation across sites is widely accepted, the extent and frequency of temporal shifts are contentious. Our understanding of the drivers of amino acid preference change is incomplete: To what extent are temporal shifts driven by adaptive versus nonadaptive evolutionary processes? We review phenomena that cause preferences to vary (e.g., evolutionary Stokes shift, contingency, and entrenchment) and clarify how they differ. To determine the extent and prevalence of shifted preferences, we review experimental and theoretical studies. Analyses of natural sequence alignments often detect decreases in homoplasy (convergence and reversions) rates, and variation in replacement rates with time, signals that are consistent with temporally changing preferences. While approaches inferring shifts in preferences from patterns in natural alignments are valuable, they are indirect since multiple mechanisms (both adaptive and nonadaptive) could lead to the observed signal. Alternatively, site-directed mutagenesis experiments allow for a more direct assessment of shifted preferences. They corroborate evidence from multiple sequence alignments, revealing that the preference for an amino acid at a site varies depending on the background sequence. However, shifts in preferences are usually minor in magnitude and sites with significantly shifted preferences are low in frequency. The small yet consistent perturbations in preferences could, nevertheless, jeopardize the accuracy of inference procedures, which assume constant preferences. We conclude by discussing if and how such shifts in preferences might influence widely used time-homogenous inference procedures and potential ways to mitigate such effects.
Affiliation(s)
- Noor Youssef
  - Department of Biology, Dalhousie University, Halifax, Nova Scotia, Canada
- Edward Susko
  - Department of Mathematics and Statistics, Dalhousie University, Halifax, Nova Scotia, Canada
- Andrew J. Roger
  - Department of Biochemistry and Molecular Biology, Dalhousie University, Halifax, Nova Scotia, Canada
- Joseph P. Bielawski
  - Department of Biology, Dalhousie University, Halifax, Nova Scotia, Canada
  - Department of Mathematics and Statistics, Dalhousie University, Halifax, Nova Scotia, Canada

9. Yazhini A, Sandhya S, Srinivasan N. Rewards of divergence in sequences, 3-D structures and dynamics of yeast and human spliceosome SF3b complexes. Curr Res Struct Biol 2021; 3:133-145. [PMID: 35028595 PMCID: PMC8714771 DOI: 10.1016/j.crstbi.2021.05.003] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 08/06/2020] [Revised: 05/21/2021] [Accepted: 05/26/2021] [Indexed: 12/21/2022]
Abstract
The evolution of homologous and functionally equivalent multiprotein assemblies is intriguing considering sequence divergence of constituent proteins. Here, we studied the implications of protein sequence divergence on the structure, dynamics and function of homologous yeast and human SF3b spliceosomal subcomplexes. Human and yeast SF3b comprise 7 and 6 proteins, respectively, with all yeast proteins homologous to their human counterparts at moderate sequence identity. SF3b6, an additional component in the human SF3b, interacts with the N-terminal extension of SF3b1 while the yeast homologue Hsh155 lacks the equivalent region. Through detailed homology studies, we show that SF3b6 is absent not only in yeast but in multiple lineages of eukaryotes implying that it is critical in specific organisms. We probed for the potential role of SF3b6 in the spliceosome assembled form through structural and flexibility analyses. By analysing normal modes derived from anisotropic network models of SF3b1, we demonstrate that when SF3b1 is bound to SF3b6, similarities in the magnitude of residue motions (0.86) and inter-residue correlated motions (0.94) with Hsh155 are significantly higher than when SF3b1 is considered in isolation (0.21 and 0.89 respectively). We observed that SF3b6 promotes functionally relevant 'open-to-close' transition in SF3b1 by enhancing concerted residue motions. Such motions are found to occur in the Hsh155 without SF3b6. The presence of SF3b6 influences motions of 16 residues that interact with U2 snRNA/branchpoint duplex and supports the participation of its interface residues in long-range communication in the SF3b1. These results advocate that SF3b6 potentially acts as an allosteric regulator of SF3b1 for BPS selection and might play a role in alternative splicing. Furthermore, we observe variability in the relative orientation of SF3b4 and in the local structure of three β-propeller domains of SF3b3 with reference to their yeast counterparts. Such differences influence the inter-protein interactions of SF3b between these two organisms. Together, our findings highlight features of SF3b evolution and suggest that the human SF3b may have evolved sophisticated mechanisms to fine tune its molecular function.
Key Words
- Allostery
- BPS, branch-point sequence
- Bact, activated B spliceosome assembly
- Cryo-EM structure
- Cryo-EM, cryo-electron microscopy
- DOPE, discrete optimized protein energy
- NMA, normal mode analysis
- PDB, protein data bank
- Protein dynamics
- RMSD, root mean square deviation
- RRM, RNA recognition motif
- SF3b complex
- SF3b1
- SF3b1SF3b6−bound, SF3b1 bound to SF3b6
- SF3b1iso, SF3b1 in isolation
- SIP, square inner product
- Spliceosome
Affiliation(s)
- Arangasamy Yazhini
  - Molecular Biophysics Unit, Indian Institute of Science, Bangalore, Karnataka, 560012, India
- Sankaran Sandhya
  - Molecular Biophysics Unit, Indian Institute of Science, Bangalore, Karnataka, 560012, India

10. Romero-Romero S, Kordes S, Michel F, Höcker B. Evolution, folding, and design of TIM barrels and related proteins. Curr Opin Struct Biol 2021; 68:94-104. [PMID: 33453500 PMCID: PMC8250049 DOI: 10.1016/j.sbi.2020.12.007] [Citation(s) in RCA: 16] [Impact Index Per Article: 5.3] [Received: 10/26/2020] [Revised: 12/13/2020] [Accepted: 12/14/2020] [Indexed: 12/16/2022]
Abstract
Proteins are chief actors in life that perform a myriad of exquisite functions. This diversity has been enabled through the evolution and diversification of protein folds. Analysis of sequences and structures strongly suggest that numerous protein pieces have been reused as building blocks and propagated to many modern folds. This information can be traced to understand how the protein world has diversified. In this review, we discuss the latest advances in the analysis of protein evolutionary units, and we use as a model system one of the most abundant and versatile topologies, the TIM-barrel fold, to highlight the existing common principles that interconnect protein evolution, structure, folding, function, and design.
Affiliation(s)
- Sina Kordes
  - Department of Biochemistry, University of Bayreuth, 95447 Bayreuth, Germany
- Florian Michel
  - Department of Biochemistry, University of Bayreuth, 95447 Bayreuth, Germany
- Birte Höcker
  - Department of Biochemistry, University of Bayreuth, 95447 Bayreuth, Germany

11. A conserved folding nucleus sculpts the free energy landscape of bacterial and archaeal orthologs from a divergent TIM barrel family. Proc Natl Acad Sci U S A 2021; 118:2019571118. [PMID: 33875592 PMCID: PMC8092565 DOI: 10.1073/pnas.2019571118] [Citation(s) in RCA: 6] [Impact Index Per Article: 2.0] [Indexed: 01/07/2023]
Abstract
Orthologous proteins from the three superkingdoms have conserved their structures and functions over evolutionary time. We ask whether their folding mechanisms and the structures of their partially folded states are similarly conserved, using bacterial and archaeal representatives of the IGPS TIM barrel enzyme. Comparison of circular dichroism and fluorescence spectroscopic studies reveal a highly conserved mechanism, and hydrogen–deuterium exchange mass spectrometry analyses highlight similar cores of stability in regions dominated by clusters of branched aliphatic side chains. A bioinformatics analysis of hundreds of IGPS sequences from each superkingdom shows a very highly conserved sequence, V/ILLI, that nucleates the formation of a misfolded, microsecond intermediate and has existed since the last universal common ancestor of the IGPS family of proteins.

The amino acid sequences of proteins have evolved over billions of years, preserving their structures and functions while responding to evolutionary forces. Are there conserved sequence and structural elements that preserve the protein folding mechanisms? The functionally diverse and ancient (βα)1–8 TIM barrel motif may answer this question. We mapped the complex six-state folding free energy surface of a ∼3.6 billion y old, bacterial indole-3-glycerol phosphate synthase (IGPS) TIM barrel enzyme by equilibrium and kinetic hydrogen–deuterium exchange mass spectrometry (HDX-MS). HDX-MS on the intact protein reported exchange in the native basin and the presence of two thermodynamically distinct on- and off-pathway intermediates in slow but dynamic equilibrium with each other. Proteolysis revealed protection in a small (α1β2) and a large cluster (β5α5β6α6β7) and that these clusters form cores of stability in Ia and Ibp. The strongest protection in both states resides in β4α4 with the highest density of branched aliphatic side chain contacts in the folded structure. Similar correlations were observed previously for an evolutionarily distinct archaeal IGPS, emphasizing a key role for hydrophobicity in stabilizing common high-energy folding intermediates. A bioinformatics analysis of IGPS sequences from the three superkingdoms revealed an exceedingly high hydrophobicity and surprising α-helix propensity for β4, preceded by a highly conserved βα-hairpin clamp that links β3 and β4. The conservation of the folding mechanisms for archaeal and bacterial IGPS proteins reflects the conservation of key elements of sequence and structure that first appeared in the last universal common ancestor of these ancient proteins.
Collapse
|
12
|
Munro D, Singh M. DeMaSk: a deep mutational scanning substitution matrix and its use for variant impact prediction. Bioinformatics 2020; 36:5322-5329. [PMID: 33325500 PMCID: PMC8016454 DOI: 10.1093/bioinformatics/btaa1030] [Citation(s) in RCA: 17] [Impact Index Per Article: 4.3] [Received: 07/21/2020] [Revised: 10/16/2020] [Accepted: 11/30/2020] [Indexed: 01/27/2023] Open
Abstract
Motivation: Accurately predicting the quantitative impact of a substitution on a protein’s molecular function would be a great aid in understanding the effects of observed genetic variants across populations. While this remains a challenging task, new approaches can leverage data from the increasing numbers of comprehensive deep mutational scanning (DMS) studies that systematically mutate proteins and measure fitness. Results: We introduce DeMaSk, an intuitive and interpretable method based only upon DMS datasets and sequence homologs that predicts the impact of missense mutations within any protein. DeMaSk first infers a directional amino acid substitution matrix from DMS datasets and then fits a linear model that combines these substitution scores with measures of per-position evolutionary conservation and variant frequency across homologs. Despite its simplicity, DeMaSk has state-of-the-art performance in predicting the impact of amino acid substitutions, and can easily and rapidly be applied to any protein sequence. Availability and implementation: https://demask.princeton.edu generates fitness impact predictions and visualizations for any user-submitted protein sequence. Supplementary information: Supplementary data are available at Bioinformatics online.
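The linear-model step described in this abstract can be illustrated with a minimal sketch. This is not the DeMaSk implementation: the feature ordering (substitution-matrix score, per-position conservation, variant frequency in homologs), the function names, and the synthetic training data are all assumptions made for illustration only.

```python
import numpy as np


def fit_linear_model(features, scores):
    """Ordinary least squares with an intercept.

    features: (n_variants, n_features) array; here the three hypothetical
    columns are substitution score, conservation, and homolog frequency.
    scores: (n_variants,) array of measured DMS fitness values.
    Returns (intercept, coefficients).
    """
    design = np.column_stack([np.ones(len(features)), features])
    beta, *_ = np.linalg.lstsq(design, scores, rcond=None)
    return beta[0], beta[1:]


def predict(intercept, coefs, features):
    """Predicted fitness impact for new variants."""
    return intercept + np.asarray(features) @ coefs


# Synthetic, noise-free toy data (NOT real DMS measurements).
rng = np.random.default_rng(0)
features = rng.random((50, 3))
true_intercept = -0.5
true_coefs = np.array([0.8, -0.3, 0.4])
scores = true_intercept + features @ true_coefs

intercept, coefs = fit_linear_model(features, scores)
predicted = predict(intercept, coefs, features[:5])
```

Because the toy data contain no noise, the fit recovers the generating coefficients; with real DMS data the coefficients would only be estimates.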
Affiliation(s)
- Daniel Munro
  - Lewis-Sigler Institute for Integrative Genomics, Princeton University, Princeton, 08544, USA
- Mona Singh
  - Lewis-Sigler Institute for Integrative Genomics, Princeton University, Princeton, 08544, USA
  - Department of Computer Science, Princeton University, Princeton, 08544, USA
|
13
|
Chan YH, Zeldovich KB, Matthews CR. An allosteric pathway explains beneficial fitness in yeast for long-range mutations in an essential TIM barrel enzyme. Protein Sci 2020; 29:1911-1923. [PMID: 32643222 PMCID: PMC7454521 DOI: 10.1002/pro.3911] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Received: 04/21/2020] [Revised: 07/03/2020] [Accepted: 07/07/2020] [Indexed: 11/06/2022]
Abstract
Protein evolution proceeds by a complex response of organismal fitness to mutations that can simultaneously affect protein stability, structure, and enzymatic activity. To probe the relationship between genotype and phenotype, we chose a fundamental paradigm for protein evolution, folding, and design, the (βα)8 TIM barrel fold. Here, we demonstrate the role of long-range allosteric interactions in the adaptation of an essential hyperthermophilic TIM barrel enzyme to mesophilic conditions in a yeast host. Beneficial fitness effects observed with single and double mutations of the canonical βα-hairpin clamps and the α-helical shell distal to the active site revealed an underlying energy network between opposite faces of the cylindrical β-barrel. We experimentally determined the fitness of multiple mutants in the energetic phase plane, contrasting the energy barrier of the chemical reaction and the folding free energy of the protein. For the system studied, the reaction energy barrier was the primary determinant of organism fitness. Our observations of long-range epistatic interactions uncovered an allosteric pathway in an ancient and ubiquitous enzyme that may provide a novel way of designing proteins with a desired activity and stability profile.
Affiliation(s)
- Yvonne H Chan
  - Department of Biochemistry and Molecular Pharmacology, University of Massachusetts Medical School, Worcester, Massachusetts, USA
  - Program in Bioinformatics and Integrative Biology, University of Massachusetts Medical School, Worcester, Massachusetts, USA
  - Sanofi Pasteur, Cambridge, Massachusetts, USA
- Konstantin B Zeldovich
  - Program in Bioinformatics and Integrative Biology, University of Massachusetts Medical School, Worcester, Massachusetts, USA
  - Sanofi Pasteur, Cambridge, Massachusetts, USA
- Charles R Matthews
  - Department of Biochemistry and Molecular Pharmacology, University of Massachusetts Medical School, Worcester, Massachusetts, USA
|
14
|
Martin TA, Wu T, Tang Q, Dougherty LL, Parente DJ, Swint-Kruse L, Fenton AW. Identification of biochemically neutral positions in liver pyruvate kinase. Proteins 2020; 88:1340-1350. [PMID: 32449829 DOI: 10.1002/prot.25953] [Citation(s) in RCA: 11] [Impact Index Per Article: 2.8] [Received: 12/18/2019] [Revised: 03/10/2020] [Accepted: 05/16/2020] [Indexed: 01/08/2023]
Abstract
Understanding how each residue position contributes to protein function has been a long-standing goal in protein science. Substitution studies have historically focused on conserved protein positions. However, substitutions of nonconserved positions can also modify function. Indeed, we recently identified nonconserved positions that have large substitution effects in human liver pyruvate kinase (hLPYK), including altered allosteric coupling. To facilitate a comparison of which characteristics determine when a nonconserved position does vs does not contribute to function, the goal of the current work was to identify neutral positions in hLPYK. However, existing hLPYK data showed that three features commonly associated with neutral positions (high sequence entropy, high surface exposure, and alanine scanning) lacked the sensitivity needed to guide experimental studies. We used multiple evolutionary patterns identified in a sequence alignment of the PYK family to identify which positions were least patterned, reasoning that these were most likely to be neutral. Nine positions were tested with a total of 117 amino acid substitutions. Although exploring all potential functions is not feasible for any protein, five parameters associated with substrate/effector affinities and allosteric coupling were measured for hLPYK variants. For each position, the aggregate functional outcomes of all variants were used to quantify a "neutrality" score. Three positions showed perfect neutral scores for all five parameters. Furthermore, the nine positions showed larger neutral scores than 17 positions located near allosteric binding sites. Thus, our strategy successfully enriched the dataset for positions with neutral and modest substitutions.
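The "sequence entropy" feature named in this abstract is commonly computed as the Shannon entropy of one alignment column. The sketch below shows that standard calculation; the function name, the gap character, and the choice to ignore gaps are assumptions for illustration and are not taken from the cited study.

```python
import math
from collections import Counter


def column_entropy(column, gap="-"):
    """Shannon entropy (in bits) of one alignment column.

    column: iterable of one-letter residue codes, one per aligned sequence.
    Gap characters are ignored (an assumption of this sketch).
    Returns 0.0 for an empty or fully conserved column.
    """
    residues = [c for c in column if c != gap]
    if not residues:
        return 0.0
    total = len(residues)
    counts = Counter(residues)
    entropy = 0.0
    for count in counts.values():
        p = count / total
        entropy -= p * math.log2(p)
    return entropy


conserved = column_entropy("AAAA")   # one residue type only
variable = column_entropy("ACDE")    # four equally frequent residue types
```

A fully conserved column scores zero, and the score rises as more residue types appear at similar frequencies. As the abstract notes, a high value alone did not reliably flag neutral positions.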
Affiliation(s)
- Tyler A Martin
  - Department of Biochemistry and Molecular Biology, The University of Kansas Medical Center, Kansas City, Kansas, USA
- Tiffany Wu
  - Department of Biochemistry and Molecular Biology, The University of Kansas Medical Center, Kansas City, Kansas, USA
- Qingling Tang
  - Department of Biochemistry and Molecular Biology, The University of Kansas Medical Center, Kansas City, Kansas, USA
- Larissa L Dougherty
  - Department of Biochemistry and Molecular Biology, The University of Kansas Medical Center, Kansas City, Kansas, USA
- Daniel J Parente
  - Department of Biochemistry and Molecular Biology, The University of Kansas Medical Center, Kansas City, Kansas, USA
  - Department of Family and Community Medicine, The University of Kansas Medical Center, Kansas City, Kansas, USA
- Liskin Swint-Kruse
  - Department of Biochemistry and Molecular Biology, The University of Kansas Medical Center, Kansas City, Kansas, USA
- Aron W Fenton
  - Department of Biochemistry and Molecular Biology, The University of Kansas Medical Center, Kansas City, Kansas, USA
|
15
|
Abstract
To achieve the full potential of pharmacogenomics, one must accurately predict the functional outcomes that arise from amino acid substitutions in proteins. Classically, researchers have focused on understanding the consequences of individual substitutions. However, literature surveys have shown that most substitutions were created at evolutionarily conserved positions. Awareness of this bias leads to a shift in perspective, from considering the outcomes of individual substitutions to understanding the roles of individual protein positions. Conserved positions tend to act as “toggle” switches, with most substitutions abolishing function. However, nonconserved positions have been found equally capable of affecting protein function. Indeed, many nonconserved positions act like functional dimmer switches (“rheostat” positions): this is revealed when multiple substitutions are made at a single position. Each substitution has a different functional outcome; the set of substitutions spans a range of outcomes. Finally, some nonconserved positions appear neutral, capable of accommodating all amino acid types without modifying function. This paper reviews the currently known properties of rheostat positions, with examples shown for pyruvate kinase, organic anion transporting polypeptide 1B1, the beta-lactamase inhibitory protein, and angiotensin-converting enzyme 2. Outcomes observed for rheostat positions have implications for the rational design of drug analogs and allosteric drugs. Furthermore, this new framework, comprising three types of protein positions, provides a new approach to interpreting disease and population-based databases of amino acid changes. In conclusion, although a full understanding of substitution outcomes at rheostat positions poses a challenge, utilization of this new frame of reference will further advance the application of pharmacogenomics.
|
16
|
Esposito D, Weile J, Shendure J, Starita LM, Papenfuss AT, Roth FP, Fowler DM, Rubin AF. MaveDB: an open-source platform to distribute and interpret data from multiplexed assays of variant effect. Genome Biol 2019; 20:223. [PMID: 31679514 PMCID: PMC6827219 DOI: 10.1186/s13059-019-1845-6] [Citation(s) in RCA: 96] [Impact Index Per Article: 19.2] [Received: 03/31/2019] [Accepted: 10/01/2019] [Indexed: 11/10/2022] Open
Abstract
Multiplex assays of variant effect (MAVEs), such as deep mutational scans and massively parallel reporter assays, test thousands of sequence variants in a single experiment. Despite the importance of MAVE data for basic and clinical research, there is no standard resource for their discovery and distribution. Here, we present MaveDB (https://www.mavedb.org), a public repository for large-scale measurements of sequence variant impact, designed for interoperability with applications to interpret these datasets. We also describe the first such application, MaveVis, which retrieves, visualizes, and contextualizes variant effect maps. Together, the database and applications will empower the community to mine these powerful datasets.
Affiliation(s)
- Daniel Esposito
  - Bioinformatics Division, The Walter and Eliza Hall Institute of Medical Research, Parkville, VIC, Australia
- Jochen Weile
  - The Donnelly Centre, University of Toronto, Toronto, ON, Canada
  - Lunenfeld-Tanenbaum Research Institute, Sinai Health System, Toronto, ON, Canada
  - Department of Molecular Genetics, University of Toronto, Toronto, ON, Canada
  - Department of Computer Science, University of Toronto, Toronto, ON, Canada
- Jay Shendure
  - Department of Genome Sciences, University of Washington, Seattle, WA, USA
  - Brotman Baty Institute for Precision Medicine, Seattle, WA, USA
  - Howard Hughes Medical Institute, University of Washington, Seattle, WA, USA
- Lea M Starita
  - Department of Genome Sciences, University of Washington, Seattle, WA, USA
  - Brotman Baty Institute for Precision Medicine, Seattle, WA, USA
- Anthony T Papenfuss
  - Bioinformatics Division, The Walter and Eliza Hall Institute of Medical Research, Parkville, VIC, Australia
  - Department of Medical Biology, University of Melbourne, Melbourne, VIC, Australia
  - Bioinformatics and Cancer Genomics Laboratory, Peter MacCallum Cancer Centre, Melbourne, VIC, Australia
  - Sir Peter MacCallum Department of Oncology, University of Melbourne, Melbourne, VIC, Australia
  - Department of Mathematics and Statistics, University of Melbourne, Melbourne, VIC, Australia
- Frederick P Roth
  - The Donnelly Centre, University of Toronto, Toronto, ON, Canada
  - Lunenfeld-Tanenbaum Research Institute, Sinai Health System, Toronto, ON, Canada
  - Department of Molecular Genetics, University of Toronto, Toronto, ON, Canada
  - Department of Computer Science, University of Toronto, Toronto, ON, Canada
  - Canadian Institute for Advanced Research, Toronto, ON, Canada
- Douglas M Fowler
  - Department of Genome Sciences, University of Washington, Seattle, WA, USA
  - Canadian Institute for Advanced Research, Toronto, ON, Canada
  - Department of Bioengineering, University of Washington, Seattle, WA, USA
- Alan F Rubin
  - Bioinformatics Division, The Walter and Eliza Hall Institute of Medical Research, Parkville, VIC, Australia
  - Department of Medical Biology, University of Melbourne, Melbourne, VIC, Australia
  - Bioinformatics and Cancer Genomics Laboratory, Peter MacCallum Cancer Centre, Melbourne, VIC, Australia
|
17
|
Kemble H, Nghe P, Tenaillon O. Recent insights into the genotype-phenotype relationship from massively parallel genetic assays. Evol Appl 2019; 12:1721-1742. [PMID: 31548853 PMCID: PMC6752143 DOI: 10.1111/eva.12846] [Citation(s) in RCA: 32] [Impact Index Per Article: 6.4] [Received: 03/27/2019] [Revised: 06/21/2019] [Accepted: 07/02/2019] [Indexed: 12/20/2022] Open
Abstract
With the molecular revolution in biology, a mechanistic understanding of the genotype-phenotype relationship became possible. Recently, advances in DNA synthesis and sequencing have enabled the development of deep mutational scanning assays, capable of scoring comprehensive libraries of genotypes for fitness and a variety of phenotypes in massively parallel fashion. The resulting empirical genotype-fitness maps pave the way to predictive models, potentially accelerating our ability to anticipate the behaviour of pathogen and cancerous cell populations from sequencing data. Besides cellular fitness, phenotypes of direct application in industry (e.g. enzyme activity) and medicine (e.g. antibody binding) can be quantified and even selected directly by these assays. This review discusses the technological basis of and recent developments in massively parallel genetics, along with the trends it is uncovering in the genotype-phenotype relationship (distribution of mutation effects, epistasis), their possible mechanistic bases and future directions for advancing towards the goal of predictive genetics.
Affiliation(s)
- Harry Kemble
  - Infection, Antimicrobials, Modelling, Evolution, INSERM, Unité Mixte de Recherche 1137, Université Paris Diderot, Université Paris Nord, Paris, France
  - École Supérieure de Physique et de Chimie Industrielles de la Ville de Paris (ESPCI Paris), UMR CNRS-ESPCI CBI 8231, PSL Research University, Paris Cedex 05, France
- Philippe Nghe
  - École Supérieure de Physique et de Chimie Industrielles de la Ville de Paris (ESPCI Paris), UMR CNRS-ESPCI CBI 8231, PSL Research University, Paris Cedex 05, France
- Olivier Tenaillon
  - Infection, Antimicrobials, Modelling, Evolution, INSERM, Unité Mixte de Recherche 1137, Université Paris Diderot, Université Paris Nord, Paris, France
|
18
|
Konaté MM, Plata G, Park J, Usmanova DR, Wang H, Vitkup D. Molecular function limits divergent protein evolution on planetary timescales. eLife 2019; 8:e39705. [PMID: 31532392 PMCID: PMC6750897 DOI: 10.7554/elife.39705] [Citation(s) in RCA: 17] [Impact Index Per Article: 3.4] [Received: 06/29/2018] [Accepted: 08/07/2019] [Indexed: 01/25/2023] Open
Abstract
Functional conservation is known to constrain protein evolution. Nevertheless, the long-term divergence patterns of proteins maintaining the same molecular function and the possible limits of this divergence have not been explored in detail. We investigate these fundamental questions by characterizing the divergence between ancient protein orthologs with conserved molecular function. Our results demonstrate that the decline of sequence and structural similarities between such orthologs significantly slows down after ~1-2 billion years of independent evolution. As a result, the sequence and structural similarities between ancient orthologs have not substantially decreased for the past billion years. The effective divergence limit (>25% sequence identity) is not primarily due to protein sites universally conserved in all lineages. Instead, fewer than four amino acid types are accepted, on average, per site across orthologous protein sequences. Our analysis also reveals different divergence patterns for protein sites with experimentally determined small and large fitness effects of mutations. Editorial note: This article has been through an editorial process in which the authors decide how to respond to the issues raised during peer review. The Reviewing Editor's assessment is that all the issues have been addressed (see decision letter).
Affiliation(s)
- Mariam M Konaté
  - Department of Systems Biology, Columbia University, New York, United States
  - Division of Cancer Treatment and Diagnosis, National Cancer Institute, Bethesda, United States
- Germán Plata
  - Department of Systems Biology, Columbia University, New York, United States
- Jimin Park
  - Department of Systems Biology, Columbia University, New York, United States
  - Department of Pathology and Cell Biology, Columbia University, New York, United States
- Dinara R Usmanova
  - Department of Systems Biology, Columbia University, New York, United States
- Harris Wang
  - Department of Systems Biology, Columbia University, New York, United States
  - Department of Pathology and Cell Biology, Columbia University, New York, United States
- Dennis Vitkup
  - Department of Systems Biology, Columbia University, New York, United States
  - Department of Biomedical Informatics, Columbia University, New York, United States
|
19
|
Ferrada E. The Site-Specific Amino Acid Preferences of Homologous Proteins Depend on Sequence Divergence. Genome Biol Evol 2019; 11:121-135. [PMID: 30496400 PMCID: PMC6326188 DOI: 10.1093/gbe/evy261] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.8] [Accepted: 11/26/2018] [Indexed: 12/20/2022] Open
Abstract
The propensity of protein sites to be occupied by any of the 20 amino acids is known as site-specific amino acid preferences (SSAP). Under the assumption that SSAP are conserved among homologs, they can be used to parameterize evolutionary models for the reconstruction of accurate phylogenetic trees. However, simulations and experimental studies have not been able to fully assess the relative conservation of SSAP as a function of sequence divergence between protein homologs. Here, we implement a computational procedure to predict the SSAP of proteins based on the effect of changes in thermodynamic stability upon mutation. An advantage of this computational approach is that it allows us to interrogate a large and unbiased sample of homologous proteins, over the entire spectrum of sequence divergence, and under selection for the same molecular trait. We show that computational predictions have reproducibilities that resemble those obtained in experimental replicates, and can largely recapitulate the SSAP observed in a large-scale mutagenesis experiment. Our results support recent experimental reports on the conservation of SSAP of related homologs, with a slowly increasing fraction of up to 15% of different sites at sequence distances lower than 40%. However, even under the sole contribution of thermodynamic stability, our conservative approach identifies up to 30% of significantly different sites between divergent homologs. We show that this relation holds for homologs of diverse sizes and structural classes. Analyses of residue contact networks suggest that an important determinant of these differences is the increasing accumulation of structural deviations that results from sequence divergence.
Affiliation(s)
- Evandro Ferrada
  - Center for Genomics and Bioinformatics, Faculty of Science, Universidad Mayor, Camino La Pirámide 5750, Huechuraba, 8580745, Santiago, Chile
|
20
|
Riesselman AJ, Ingraham JB, Marks DS. Deep generative models of genetic variation capture the effects of mutations. Nat Methods 2018; 15:816-822. [PMID: 30250057 DOI: 10.1038/s41592-018-0138-4] [Citation(s) in RCA: 239] [Impact Index Per Article: 39.8] [Received: 05/01/2018] [Accepted: 07/29/2018] [Indexed: 01/05/2023]
Abstract
The functions of proteins and RNAs are defined by the collective interactions of many residues, and yet most statistical models of biological sequences consider sites nearly independently. Recent approaches have demonstrated benefits of including interactions to capture pairwise covariation, but leave higher-order dependencies out of reach. Here we show how it is possible to capture higher-order, context-dependent constraints in biological sequences via latent variable models with nonlinear dependencies. We found that DeepSequence (https://github.com/debbiemarkslab/DeepSequence), a probabilistic model for sequence families, predicted the effects of mutations across a variety of deep mutational scanning experiments substantially better than existing methods based on the same evolutionary data. The model, learned in an unsupervised manner solely on the basis of sequence information, is grounded with biologically motivated priors, reveals the latent organization of sequence families, and can be used to explore new parts of sequence space.
Affiliation(s)
- Adam J Riesselman
  - Department of Systems Biology, Harvard Medical School, Boston, MA, USA
  - Program in Biomedical Informatics, Harvard Medical School, Boston, MA, USA
- John B Ingraham
  - Department of Systems Biology, Harvard Medical School, Boston, MA, USA
  - Program in Systems Biology, Harvard University, Cambridge, MA, USA
- Debora S Marks
  - Department of Systems Biology, Harvard Medical School, Boston, MA, USA
|
21
|
Hodges AM, Fenton AW, Dougherty LL, Overholt AC, Swint-Kruse L. RheoScale: A tool to aggregate and quantify experimentally determined substitution outcomes for multiple variants at individual protein positions. Hum Mutat 2018; 39:1814-1826. [PMID: 30117637 DOI: 10.1002/humu.23616] [Citation(s) in RCA: 15] [Impact Index Per Article: 2.5] [Received: 03/30/2018] [Revised: 07/31/2018] [Accepted: 08/13/2018] [Indexed: 12/25/2022]
Abstract
Human mutations often cause amino acid changes (variants) that can alter protein function or stability. Some variants fall at protein positions that experimentally exhibit "rheostatic" mutation outcomes (different amino acid substitutions lead to a range of functional outcomes). In ongoing studies of rheostat positions, we encountered the need to aggregate experimental results from multiple variants, to describe the overall roles of individual positions. Here, we present "RheoScale" which generates quantitative scores to discriminate rheostat positions from those with "toggle" (most substitutions abolish function) or "neutral" (most substitutions have wild-type function) outcomes. RheoScale scores facilitate correlations of experimental data (such as binding affinity or stability) with structural and bioinformatic analyses. The RheoScale calculator is encoded into a Microsoft Excel workbook and an R script. Example analyses are shown for three model protein systems, including one assessed via deep mutational scanning. The RheoScale calculator quickly and efficiently provided quantitative descriptions that were in good agreement with prior qualitative observations. As an example application, scores were compared to the example proteins' structures; strong rheostat positions tended to occur in dynamic locations. In the future, RheoScale scores can be easily integrated into computational studies to facilitate improved algorithms for predicting outcomes of human variants.
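The idea of aggregating many variants at one position into toggle, neutral, and rheostat scores can be sketched as follows. This is a simplified illustration, not the RheoScale calculator: the binning scheme, the `tolerance` parameter, the clipping of gain-of-function values, and all function and parameter names are assumptions. It only borrows the qualitative definitions quoted in these abstracts (toggle: most substitutions abolish function; neutral: most substitutions retain wild-type function; rheostat: substitutions sample a broad part of the functional range).

```python
def position_scores(values, wild_type, dead, n_bins=10, tolerance=0.05):
    """Aggregate variant measurements at one position into three scores.

    values:    measured function for each substitution at the position
    wild_type: measured function of the wild-type protein
    dead:      value representing abolished function
    Returns a dict of fractions in [0, 1]:
      neutral  - fraction of variants within `tolerance` of wild type
      toggle   - fraction of variants within `tolerance` of dead
      rheostat - fraction of the n_bins range bins occupied by any variant
    """
    if wild_type == dead:
        raise ValueError("wild_type and dead must differ")
    if not values:
        raise ValueError("at least one variant is required")

    # Rescale so 0.0 = dead and 1.0 = wild type; values outside the range
    # (e.g. gain of function) are clipped in this simplified sketch.
    fractions = [
        min(max((v - dead) / (wild_type - dead), 0.0), 1.0) for v in values
    ]
    n = len(fractions)
    neutral = sum(f >= 1.0 - tolerance for f in fractions) / n
    toggle = sum(f <= tolerance for f in fractions) / n
    occupied = {min(int(f * n_bins), n_bins - 1) for f in fractions}
    return {"neutral": neutral, "toggle": toggle, "rheostat": len(occupied) / n_bins}


# Hypothetical measurements (arbitrary units, NOT experimental data).
toggle_like = position_scores([0.0, 0.02, 0.01, 1.0], wild_type=1.0, dead=0.0)
rheostat_like = position_scores(
    [0.05, 0.15, 0.35, 0.55, 0.75, 0.95], wild_type=1.0, dead=0.0
)
neutral_like = position_scores([1.0, 0.98, 1.0], wild_type=1.0, dead=0.0)
```

In this sketch a position whose variants occupy at least half of the bins would count as rheostat-like, mirroring the "at least half of the possible functional range" criterion; the published tool should be consulted for the actual scoring rules.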
Affiliation(s)
- Abby M Hodges
  - Department of Natural, Health, and Mathematical Sciences, MidAmerica Nazarene University, Olathe, Kansas, USA
- Aron W Fenton
  - The Department of Biochemistry and Molecular Biology, The University of Kansas Medical Center, Kansas City, Kansas, USA
- Larissa L Dougherty
  - The Department of Biochemistry and Molecular Biology, The University of Kansas Medical Center, Kansas City, Kansas, USA
- Andrew C Overholt
  - Department of Natural, Health, and Mathematical Sciences, MidAmerica Nazarene University, Olathe, Kansas, USA
- Liskin Swint-Kruse
  - The Department of Biochemistry and Molecular Biology, The University of Kansas Medical Center, Kansas City, Kansas, USA
|
22
|
Multiplexed assays of variant effects contribute to a growing genotype-phenotype atlas. Hum Genet 2018; 137:665-678. [PMID: 30073413 PMCID: PMC6153521 DOI: 10.1007/s00439-018-1916-x] [Citation(s) in RCA: 59] [Impact Index Per Article: 9.8] [Received: 06/21/2018] [Accepted: 07/21/2018] [Indexed: 12/12/2022]
Abstract
Given the constantly improving cost and speed of genome sequencing, it is reasonable to expect that personal genomes will soon be known for many millions of humans. This stands in stark contrast with our limited ability to interpret the sequence variants which we find. Although it is, perhaps, easiest to interpret variants in coding regions, the functional impact is unknown for the vast majority of missense variants. While many computational approaches can predict the impact of coding variants, they are given little weight in the current guidelines for interpreting clinical variants. Laboratory assays produce comparatively more trustworthy results, but until recently did not scale to the space of all possible mutations. The development of deep mutational scanning and other multiplexed assays of variant effect has now brought feasibility of this endeavour within view. Here, we review progress in this field over the last decade, break down the different approaches into their components, and compare methodological differences.
|
23
|
Risso VA, Sanchez-Ruiz JM, Ozkan SB. Biotechnological and protein-engineering implications of ancestral protein resurrection. Curr Opin Struct Biol 2018; 51:106-115. [PMID: 29660672 DOI: 10.1016/j.sbi.2018.02.007] [Citation(s) in RCA: 26] [Impact Index Per Article: 4.3] [Received: 01/21/2018] [Revised: 02/18/2018] [Accepted: 02/20/2018] [Indexed: 10/17/2022]
Abstract
Approximations to the sequences of ancestral proteins can be derived from the sequences of their modern descendants. Proteins encoded by such reconstructed sequences can be prepared in the laboratory and subjected to experimental scrutiny. These 'resurrected' ancestral proteins often display remarkable properties, reflecting ancestral adaptations to intra-cellular and extra-cellular environments that differed from the environments hosting modern/extant proteins. Recent experimental and computational work has specifically discussed high stability, substrate and catalytic promiscuity, conformational flexibility/diversity and altered patterns of interaction with other sub-cellular components. In this review, we discuss these remarkable properties as well as recent attempts to explore their biotechnological and protein-engineering potential.
Affiliation(s)
- Valeria A Risso
  - Departamento de Quimica Fisica, Facultad de Ciencias, University of Granada, 18071 Granada, Spain
- Jose M Sanchez-Ruiz
  - Departamento de Quimica Fisica, Facultad de Ciencias, University of Granada, 18071 Granada, Spain
- S Banu Ozkan
  - Department of Physics and Center for Biological Physics, Arizona State University, Tempe, AZ 85281, United States
|
24
|
Haddox HK, Dingens AS, Hilton SK, Overbaugh J, Bloom JD. Mapping mutational effects along the evolutionary landscape of HIV envelope. eLife 2018; 7:34420. [PMID: 29590010 PMCID: PMC5910023 DOI: 10.7554/elife.34420] [Citation(s) in RCA: 71] [Impact Index Per Article: 11.8] [Received: 12/17/2017] [Accepted: 03/15/2018] [Indexed: 01/04/2023] Open
Abstract
The immediate evolutionary space accessible to HIV is largely determined by how single amino acid mutations affect fitness. These mutational effects can shift as the virus evolves. However, the prevalence of such shifts in mutational effects remains unclear. Here, we quantify the effects on viral growth of all amino acid mutations to two HIV envelope (Env) proteins that differ at >100 residues. Most mutations similarly affect both Envs, but the amino acid preferences of a minority of sites have clearly shifted. These shifted sites usually prefer a specific amino acid in one Env, but tolerate many amino acids in the other. Surprisingly, shifts are only slightly enriched at sites that have substituted between the Envs, and many occur at residues that do not even contact substitutions. Therefore, long-range epistasis can unpredictably shift Env’s mutational tolerance during HIV evolution, although the amino acid preferences of most sites are conserved between moderately diverged viral strains.
The virus that causes AIDS, or HIV, has a protein called Env on its surface, which is essential for the virus to infect cells. Env can also be recognized by the immune system, which then targets the virus for destruction or blocks it from infecting cells. Unfortunately, Env evolves very quickly, which means that HIV can evade our defenses. However, there are limits to how much this protein can change, since it still needs to perform its essential role in helping viruses enter cells. In the century since HIV first appeared in human populations, the virus has evolved considerably. There are now many HIV strains that infect people, and they bear Env proteins with substantially different sequences. However, it is not clear if these changes in sequence have resulted in Envs from distinct strains being able to tolerate different mutations. To examine this question, Haddox et al. compared how the Envs from two strains of HIV react to modifications in their sequences. They created all possible individual mutations in the proteins, and the resulting collections of mutated viruses were then tested for their ability to infect cells in the laboratory. Most mutations had similar effects in both Env proteins. This allowed Haddox et al. to identify portions of the protein that easily accommodate changes, and portions that must remain unchanged for viruses to remain infectious, at least in the laboratory. Some of these mutations are under different types of pressures when the virus faces the immune system, and those were identified using computational approaches. However, some mutations were tolerated differently by the two Env proteins. Therefore, viral strains differ in how their Env proteins can evolve. The parts of Env that showed differences in mutational tolerance between the strains were not necessarily the parts that differ in sequence. This shows that changes in sequence in one part of the protein can modify how other portions evolve. It remains to be determined whether changes in tolerance to mutations translate into differences in how the virus can escape immunity. This is an important question given that the rapid evolution of Env is a major obstacle to creating a vaccine for HIV.
Affiliation(s)
- Hugh K Haddox
- Basic Sciences Division and Computational Biology Program, Fred Hutchinson Cancer Research Center, Seattle, United States; Molecular and Cellular Biology PhD program, University of Washington, Seattle, United States
- Adam S Dingens
- Basic Sciences Division and Computational Biology Program, Fred Hutchinson Cancer Research Center, Seattle, United States; Molecular and Cellular Biology PhD program, University of Washington, Seattle, United States
- Sarah K Hilton
- Basic Sciences Division and Computational Biology Program, Fred Hutchinson Cancer Research Center, Seattle, United States; Department of Genome Sciences, University of Washington, Seattle, United States
- Julie Overbaugh
- Human Biology Division, Fred Hutchinson Cancer Research Center, Seattle, United States; Epidemiology Program, Fred Hutchinson Cancer Research Center, Seattle, United States
- Jesse D Bloom
- Basic Sciences Division and Computational Biology Program, Fred Hutchinson Cancer Research Center, Seattle, United States; Department of Genome Sciences, University of Washington, Seattle, United States
|
25
|
Getting Momentum: From Biocatalysis to Advanced Synthetic Biology. Trends Biochem Sci 2018; 43:180-198. [DOI: 10.1016/j.tibs.2018.01.003] [Citation(s) in RCA: 58] [Impact Index Per Article: 9.7] [Received: 11/03/2017] [Revised: 01/08/2018] [Accepted: 01/10/2018] [Indexed: 11/20/2022]
|
26
|
Boehr DD, D'Amico RN, O'Rourke KF. Engineered control of enzyme structural dynamics and function. Protein Sci 2018; 27:825-838. [PMID: 29380452 DOI: 10.1002/pro.3379] [Citation(s) in RCA: 23] [Impact Index Per Article: 3.8] [Received: 12/23/2017] [Revised: 01/20/2018] [Accepted: 01/24/2018] [Indexed: 12/20/2022]
Abstract
Enzymes undergo a range of internal motions from local, active site fluctuations to large-scale, global conformational changes. These motions are often important for enzyme function, including in ligand binding and dissociation and even preparing the active site for chemical catalysis. Protein engineering efforts have been directed towards manipulating enzyme structural dynamics and conformational changes, including targeting specific amino acid interactions and creation of chimeric enzymes with new regulatory functions. Post-translational covalent modification can provide an additional level of enzyme control. These studies have not only provided insights into the functional role of protein motions, but they offer opportunities to create stimulus-responsive enzymes. These enzymes can be engineered to respond to a number of external stimuli, including light, pH, and the presence of novel allosteric modulators. Altogether, the ability to engineer and control enzyme structural dynamics can provide new tools for biotechnology and medicine.
Affiliation(s)
- David D Boehr
- Department of Chemistry, The Pennsylvania State University, University Park, Pennsylvania, 16802, USA
- Rebecca N D'Amico
- Department of Chemistry, The Pennsylvania State University, University Park, Pennsylvania, 16802, USA
- Kathleen F O'Rourke
- Department of Chemistry, The Pennsylvania State University, University Park, Pennsylvania, 16802, USA
|
27
|
Evolutionary mechanisms studied through protein fitness landscapes. Curr Opin Struct Biol 2018; 48:141-148. [DOI: 10.1016/j.sbi.2018.01.001] [Citation(s) in RCA: 28] [Impact Index Per Article: 4.7] [Received: 11/16/2017] [Revised: 12/26/2017] [Accepted: 01/01/2018] [Indexed: 12/15/2022]
|
28
|
Khromov P, Malliaris CD, Morozov AV. Generalization of the Ewens sampling formula to arbitrary fitness landscapes. PLoS One 2018; 13:e0190186. [PMID: 29324850 PMCID: PMC5764269 DOI: 10.1371/journal.pone.0190186] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Received: 04/13/2017] [Accepted: 12/08/2017] [Indexed: 11/30/2022] Open
Abstract
In considering evolution of transcribed regions, regulatory sequences, and other genomic loci, we are often faced with a situation in which the number of allelic states greatly exceeds the size of the population. In this limit, the population eventually adopts a steady state characterized by mutation-selection-drift balance. Although new alleles continue to be explored through mutation, the statistics of the population, and in particular the probabilities of seeing specific allelic configurations in samples taken from the population, do not change with time. In the absence of selection, the probabilities of allelic configurations are given by the Ewens sampling formula, widely used in population genetics to detect deviations from neutrality. Here we develop an extension of this formula to arbitrary fitness distributions. Although our approach is general, we focus on the class of fitness landscapes, inspired by recent high-throughput genotype-phenotype maps, in which alleles can be in several distinct phenotypic states. This class of landscapes yields sampling probabilities that are computationally more tractable and can form a basis for inference of selection signatures from genomic data. Using an efficient numerical implementation of the sampling probabilities, we demonstrate that, for a sizable range of mutation rates and selection coefficients, the steady-state allelic diversity is not neutral. Therefore, it may be used to infer selection coefficients, as well as other evolutionary parameters from population data. We also carry out numerical simulations to challenge various approximations involved in deriving our sampling formulas, such as the infinite-allele limit and the “full connectivity” assumption inherent in the Ewens theory, in which each allele can mutate into any other allele. We find that, at least for the specific numerical examples studied, our theory remains sufficiently accurate even if these assumptions are relaxed. Thus our framework establishes both theoretical and practical foundations for inferring selection signatures from population-level genomic sequence samples.
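For orientation, the neutral baseline that this abstract refers to can be sketched in a few lines. The code below evaluates the classical Ewens sampling formula only; it is not the authors' generalization to arbitrary fitness landscapes. The function name `ewens_probability`, the argument layout, and the symbol `theta` (the scaled mutation rate) are illustrative assumptions, not taken from the paper.

```python
from fractions import Fraction
from math import factorial


def ewens_probability(counts, theta):
    """Classical (neutral) Ewens sampling probability.

    counts[j-1] is a_j, the number of distinct alleles seen exactly j times
    in the sample; theta is the scaled mutation rate (hypothetical name).
    Formula: n! / theta^(n) * prod_j theta^a_j / (j^a_j * a_j!),
    where theta^(n) = theta * (theta + 1) * ... * (theta + n - 1).
    """
    # Sample size implied by the allelic configuration.
    n = sum(j * a for j, a in enumerate(counts, start=1))
    theta = Fraction(theta)

    # Rising factorial theta^(n).
    rising = Fraction(1)
    for i in range(n):
        rising *= theta + i

    prob = Fraction(factorial(n)) / rising
    for j, a in enumerate(counts, start=1):
        prob *= theta ** a / (Fraction(j) ** a * factorial(a))
    return prob


# Sample of size 3 with theta = 1: the three possible configurations
# (three singletons, one singleton plus one pair, one triple) sum to 1.
configs = [[3, 0, 0], [1, 1, 0], [0, 0, 1]]
total = sum(ewens_probability(c, 1) for c in configs)
```

Exact rational arithmetic is used here so that the probabilities over all configurations of a given sample size can be checked to sum to exactly one; a floating-point version would be the more usual choice for large samples.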
Affiliation(s)
- Pavel Khromov
- Department of Physics and Astronomy and Center for Quantitative Biology, Rutgers University, Piscataway, New Jersey, United States of America
- Constantin D. Malliaris
- Department of Physics and Astronomy and Center for Quantitative Biology, Rutgers University, Piscataway, New Jersey, United States of America
- Alexandre V. Morozov
- Department of Physics and Astronomy and Center for Quantitative Biology, Rutgers University, Piscataway, New Jersey, United States of America
|