1
Buric F, Viknander S, Fu X, Lemke O, Carmona OG, Zrimec J, Szyrwiel L, Mülleder M, Ralser M, Zelezniak A. Amino acid sequence encodes protein abundance shaped by protein stability at reduced synthesis cost. Protein Sci 2025; 34:e5239. [PMID: 39665261 PMCID: PMC11635393 DOI: 10.1002/pro.5239] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/28/2024] [Revised: 10/11/2024] [Accepted: 11/14/2024] [Indexed: 12/13/2024]
Abstract
Understanding what drives protein abundance is essential to biology, medicine, and biotechnology. Driven by evolutionary selection, an amino acid sequence is tailored to meet the required abundance of a proteome, underscoring the intricate relationship between sequence and functional demand. Yet, the specific role of amino acid sequences in determining proteome abundance remains elusive. Here we show that the amino acid sequence alone encodes over half of protein abundance variation across all domains of life, ranging from bacteria to mouse and human. In an attempt to go beyond prediction, we trained a manageable-size Transformer model to interpret latent factors predictive of protein abundances. Intuitively, the model's attention focused on the protein's structural features linked to stability and metabolic costs related to protein synthesis. To probe these relationships, we introduce MGEM (Mutation Guided by an Embedded Manifold), a methodology for guiding protein abundance through sequence modifications. We find that mutations that increase predicted abundance significantly alter protein polarity and hydrophobicity, underscoring a connection between protein structural features and abundance. Molecular dynamics simulations suggest that abundance-enhancing mutations may contribute to protein thermostability by increasing rigidity, which occurs at a lower synthesis cost.
Affiliation(s)
- Filip Buric
  - Department of Biology and Biological Engineering, Chalmers University of Technology, Gothenburg, Sweden
- Sandra Viknander
  - Department of Biology and Biological Engineering, Chalmers University of Technology, Gothenburg, Sweden
- Xiaozhi Fu
  - Department of Biology and Biological Engineering, Chalmers University of Technology, Gothenburg, Sweden
- Oliver Lemke
  - Department of Biochemistry, Charité – Universitätsmedizin Berlin, Berlin, Germany
- Oriol Gracia Carmona
  - Randall Centre for Cell & Molecular Biophysics, King's College London, London, UK
  - Institute of Structural and Molecular Biology, University College London, London, UK
- Jan Zrimec
  - Department of Biology and Biological Engineering, Chalmers University of Technology, Gothenburg, Sweden
  - Department of Biotechnology and Systems Biology, National Institute of Biology, Ljubljana, Slovenia
- Lukasz Szyrwiel
  - Department of Biochemistry, Charité – Universitätsmedizin Berlin, Berlin, Germany
- Michael Mülleder
  - Core Facility High Throughput Mass Spectrometry, Charité – Universitätsmedizin Berlin, Berlin, Germany
- Markus Ralser
  - Department of Biochemistry, Charité – Universitätsmedizin Berlin, Berlin, Germany
- Aleksej Zelezniak
  - Department of Biology and Biological Engineering, Chalmers University of Technology, Gothenburg, Sweden
  - Randall Centre for Cell & Molecular Biophysics, King's College London, London, UK
  - Institute of Biotechnology, Life Sciences Centre, Vilnius University, Vilnius, Lithuania
2
Blaabjerg LM, Jonsson N, Boomsma W, Stein A, Lindorff-Larsen K. SSEmb: A joint embedding of protein sequence and structure enables robust variant effect predictions. Nat Commun 2024; 15:9646. [PMID: 39511177 PMCID: PMC11544099 DOI: 10.1038/s41467-024-53982-z] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/21/2024] [Accepted: 10/28/2024] [Indexed: 11/15/2024] Open
Abstract
The ability to predict how amino acid changes affect proteins has a wide range of applications including in disease variant classification and protein engineering. Many existing methods focus on learning from patterns found in either protein sequences or protein structures. Here, we present a method for integrating information from sequence and structure in a single model that we term SSEmb (Sequence Structure Embedding). SSEmb combines a graph representation for the protein structure with a transformer model for processing multiple sequence alignments. We show that by integrating both types of information we obtain a variant effect prediction model that is robust when sequence information is scarce. We also show that SSEmb learns embeddings of the sequence and structure that are useful for other downstream tasks such as to predict protein-protein binding sites. We envisage that SSEmb may be useful both for variant effect predictions and as a representation for learning to predict protein properties that depend on sequence and structure.
Affiliation(s)
- Lasse M Blaabjerg
  - Linderstrøm-Lang Centre for Protein Science, Department of Biology, University of Copenhagen, Copenhagen N, Denmark
- Nicolas Jonsson
  - Linderstrøm-Lang Centre for Protein Science, Department of Biology, University of Copenhagen, Copenhagen N, Denmark
- Wouter Boomsma
  - Center for Basic Machine Learning Research in Life Science, Department of Computer Science, University of Copenhagen, Copenhagen N, Denmark
- Amelie Stein
  - Linderstrøm-Lang Centre for Protein Science, Department of Biology, University of Copenhagen, Copenhagen N, Denmark
- Kresten Lindorff-Larsen
  - Linderstrøm-Lang Centre for Protein Science, Department of Biology, University of Copenhagen, Copenhagen N, Denmark
3
McShea H, Weibel C, Wehbi S, Goodman P, James JE, Wheeler AL, Masel J. The effectiveness of selection in a species affects the direction of amino acid frequency evolution. bioRxiv 2024:2023.02.01.526552. [PMID: 38948853 PMCID: PMC11212923 DOI: 10.1101/2023.02.01.526552] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 07/02/2024]
Abstract
Nearly neutral theory predicts that species with higher effective population size (Ne) are better able to purge slightly deleterious mutations. We compare evolution in high-Ne vs. low-Ne vertebrates to reveal which amino acid frequencies are subject to subtle selective preferences. We take three complementary approaches, two measuring flux and one measuring outcomes. First, we fit non-stationary substitution models of amino acid flux using maximum likelihood, comparing the high-Ne clade of rodents and lagomorphs to its low-Ne sister clade of primates and colugos. Second, we compare evolutionary outcomes across a wider range of vertebrates, via correlations between amino acid frequencies and Ne. Third, we dissect the details of flux in human, chimpanzee, mouse, and rat, as scored by parsimony - this also enables comparison to a historical paper. All three methods agree on which amino acids are preferred under more effective selection. Preferred amino acids tend to be smaller, less costly to synthesize, and to promote intrinsic structural disorder. Parsimony-induced bias in the historical study produces an apparent reduction in structural disorder, perhaps driven by slightly deleterious substitutions. Within highly exchangeable pairs of amino acids, arginine is strongly preferred over lysine, and valine over isoleucine, consistent with more effective selection preferring a marginally larger free energy of folding. These two preferences match differences between thermophiles and mesophilic relatives. These results reveal the biophysical consequences of mutation-selection-drift balance, and demonstrate the utility of nearly neutral theory for understanding protein evolution.
Affiliation(s)
- Hanon McShea
  - Department of Earth System Science, Stanford University
- Catherine Weibel
  - Department of Ecology & Evolutionary Biology, University of Arizona
  - Department of Applied Physics, Stanford University
- Sawsan Wehbi
  - Graduate Interdisciplinary Program in Genetics, University of Arizona
- Jennifer E James
  - Department of Ecology & Evolutionary Biology, University of Arizona
  - Department of Ecology and Genetics, Uppsala University
- Andrew L Wheeler
  - Graduate Interdisciplinary Program in Genetics, University of Arizona
- Joanna Masel
  - Department of Ecology & Evolutionary Biology, University of Arizona
4
Araujo NA, Bubis J. Analysis of a Novel Peptide That Is Capable of Inhibiting the Enzymatic Activity of the Protein Kinase A Catalytic Subunit-Like Protein from Trypanosoma equiperdum. Protein J 2023; 42:709-727. [PMID: 37713008 DOI: 10.1007/s10930-023-10153-1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Accepted: 08/15/2023] [Indexed: 09/16/2023]
Abstract
A 26-residue peptide possessing the αN-helix motif of the protein kinase A (PKA) regulatory subunit-like proteins from the Trypanozoon subgenus (VAP26, sequence = VAPYFEKSEDETALILKLLTYNVLFS) was shown to inhibit the enzymatic activity of the Trypanosoma equiperdum PKA catalytic subunit-like protein, in a manner similar to that of the mammalian heat-stable soluble PKA inhibitor known as PKI. However, VAP26 does not contain the PKI inhibitory sequence. Bioinformatics analyses of the αN-helix motif from various Trypanozoon PKA regulatory subunit-like proteins suggested that the sequence could form favorable peptide-protein interactions of hydrophobic nature with the PKA catalytic subunit-like protein, which may represent an alternative PKA inhibitory mechanism. The sequence of the αN-helix motif of the Trypanozoon proteins was shown to be highly homologous but significantly divergent from the corresponding αN-helix motifs of their Leishmania and mammalian counterparts. This sequence divergence contrasted with the proposed secondary structure of the αN-helix motif, which appeared conserved in every analyzed regulatory subunit-like protein. In silico mutation experiments at positions I234, L238 and F244 of the αN-helix motif from the Trypanozoon proteins destabilized both the specific motif and the protein. In contrast, mutations at positions T239 and Y240 stabilized the motif and the protein. These results suggested that the αN-helix motif from the Trypanozoon proteins probably followed a different evolutionary path than their Leishmania and mammalian counterparts. Moreover, finding stabilizing mutations indicated that new inhibitory peptides may be designed based on the αN-helix motif from the Trypanozoon PKA regulatory subunit-like proteins.
Affiliation(s)
- Nelson A Araujo
  - Escuela de Ciencias Agroalimentarias, Animales y Ambientales, Universidad de O'Higgins, Campus Colchagua, ruta I-90, Km 3, San Fernando, Chile
- José Bubis
  - Unidad de Polimorfismo Genético, Genómica y Proteómica, Dirección de Salud, Fundación Instituto de Estudios Avanzados IDEA, Caracas, 1015-A, Venezuela
  - Unidad de Señalización Celular y Bioquímica de Parásitos, Dirección de Salud, Fundación Instituto de Estudios Avanzados IDEA, Caracas, 1015-A, Venezuela
  - Departamento de Biología Celular, Universidad Simón Bolívar, Apartado 89.000, Caracas, 1081‑A, Venezuela
5
Catching A, Te Yeh M, Bianco S, Capponi S, Andino R. A tradeoff between enterovirus A71 particle stability and cell entry. Nat Commun 2023; 14:7450. [PMID: 37978288 PMCID: PMC10656440 DOI: 10.1038/s41467-023-43029-0] [Citation(s) in RCA: 7] [Impact Index Per Article: 3.5] [Received: 09/07/2022] [Accepted: 10/26/2023] [Indexed: 11/19/2023] Open
Abstract
A central role of viral capsids is to protect the viral genome from the harsh extracellular environment while facilitating initiation of infection when the virus encounters a target cell. Viruses are thought to have evolved an optimal equilibrium between particle stability and efficiency of cell entry. In this study, we genetically perturb this equilibrium in a non-enveloped virus, enterovirus A71 to determine its structural basis. We isolate a single-point mutation variant with increased particle thermotolerance and decreased efficiency of cell entry. Using cryo-electron microscopy and molecular dynamics simulations, we determine that the thermostable native particles have acquired an expanded conformation that results in a significant increase in protein dynamics. Examining the intermediate states of the thermostable variant reveals a potential pathway for uncoating. We propose a sequential release of the lipid pocket factor, followed by internal VP4 and ultimately the viral RNA.
Affiliation(s)
- Adam Catching
  - Department of Microbiology and Immunology, University of California in San Francisco, San Francisco, CA, 94158, USA
  - Graduate Program in Biophysics, University of California in San Francisco, San Francisco, CA, 94158, USA
- Ming Te Yeh
  - Department of Microbiology and Immunology, University of California in San Francisco, San Francisco, CA, 94158, USA
- Simone Bianco
  - Industrial and Applied Genomics, AI and Cognitive Software, IBM Almaden Research Center, San Jose, CA, 95120, USA
  - Center for Cellular Construction, San Francisco, CA, 94158, USA
  - Altos Labs, Redwood City, CA, 94022, USA
- Sara Capponi
  - Industrial and Applied Genomics, AI and Cognitive Software, IBM Almaden Research Center, San Jose, CA, 95120, USA
  - Center for Cellular Construction, San Francisco, CA, 94158, USA
- Raul Andino
  - Department of Microbiology and Immunology, University of California in San Francisco, San Francisco, CA, 94158, USA
6
Luzuriaga-Neira AR, Ritchie AM, Payne BL, Carrillo-Parramon O, Liberles DA, Alvarez-Ponce D. Highly Abundant Proteins Are Highly Thermostable. Genome Biol Evol 2023; 15:evad112. [PMID: 37399326 DOI: 10.1093/gbe/evad112] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Accepted: 06/08/2023] [Indexed: 07/05/2023] Open
Abstract
Highly abundant proteins tend to evolve slowly (a trend called E-R anticorrelation), and a number of hypotheses have been proposed to explain this phenomenon. The misfolding avoidance hypothesis attributes the E-R anticorrelation to the abundance-dependent toxic effects of protein misfolding. To avoid these toxic effects, protein sequences (particularly those of highly expressed proteins) would be under selection to fold properly. One prediction of the misfolding avoidance hypothesis is that highly abundant proteins should exhibit high thermostability (i.e., a highly negative free energy of folding, ΔG). Thus far, only a handful of analyses have tested for a relationship between protein abundance and thermostability, producing contradictory results. These analyses have been limited by 1) the scarcity of ΔG data, 2) the fact that these data have been obtained by different laboratories and under different experimental conditions, 3) the problems associated with using proteins' melting energy (Tm) as a proxy for ΔG, and 4) the difficulty of controlling for potentially confounding variables. Here, we use computational methods to compare the free energy of folding of pairs of human-mouse orthologous proteins with different expression levels. Even though the effect size is limited, the most highly expressed ortholog is often the one with a more negative ΔG of folding, indicating that highly expressed proteins are often more thermostable.
Affiliation(s)
- Andrew M Ritchie
  - Department of Biology and Center for Computational Genetics and Genomics, Temple University, Philadelphia, Pennsylvania, USA
- David A Liberles
  - Department of Biology and Center for Computational Genetics and Genomics, Temple University, Philadelphia, Pennsylvania, USA
7
Aledo P, Aledo JC. Proteome-Wide Structural Computations Provide Insights into Empirical Amino Acid Substitution Matrices. Int J Mol Sci 2023; 24:ijms24010796. [PMID: 36614247 PMCID: PMC9821064 DOI: 10.3390/ijms24010796] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 11/07/2022] [Revised: 12/24/2022] [Accepted: 12/29/2022] [Indexed: 01/04/2023] Open
Abstract
The relative contribution of mutation and selection to the amino acid substitution rates observed in empirical matrices is unclear. Herein, we present a neutral continuous fitness-stability model, inspired by the Arrhenius law ($q_{ij} = a_{ij} e^{-\Delta\Delta G_{ij}}$). The model postulates that the rate of amino acid substitution ($i \to j$) is determined by the product of a pre-exponential factor, which is influenced by the genetic code structure, and an exponential term reflecting the relative fitness of the amino acid substitutions. To assess the validity of our model, we computed changes in stability of 14,094 proteins, for which 137,073,638 in silico mutants were analyzed. These site-specific data were summarized into a 20 × 20 matrix, whose entries, $\Delta\Delta G_{ij}$, were obtained after averaging through all the sites in all the proteins. We found a significant positive correlation between these energy values and the disease-causing potential of each substitution, suggesting that the exponential term accurately summarizes the fitness effect. A remarkable observation was that amino acids that were highly destabilizing when acting as the source tended to have little effect when acting as the destination, and vice versa (source → destination). The Arrhenius model accurately reproduced the pattern of substitution rates collected in the empirical matrices, suggesting a relevant role for the genetic code structure and a tuning role for purifying selection exerted via protein stability.
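As a reading aid, the Arrhenius-like rate described in this abstract (a pre-exponential factor multiplied by an exponential stability term) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `substitution_rate` and all numeric values are hypothetical, and the stability change is assumed to be already expressed in dimensionless units, as the abstract's formula implies. It is not the authors' implementation.

```python
import math


def substitution_rate(a_ij: float, ddg_ij: float) -> float:
    """Arrhenius-like rate: pre-exponential factor times exp(-ddG).

    a_ij   -- pre-exponential factor (hypothetical value; in the abstract it
              reflects the structure of the genetic code)
    ddg_ij -- average stability change for the substitution i -> j, assumed
              here to be dimensionless
    """
    return a_ij * math.exp(-ddg_ij)


# Hypothetical inputs chosen only to show the direction of the effect.
a = 1.0
neutral = substitution_rate(a, 0.0)         # no stability change
destabilizing = substitution_rate(a, 2.0)   # positive ddG lowers the rate
stabilizing = substitution_rate(a, -0.5)    # negative ddG raises the rate

print(neutral, destabilizing, stabilizing)
```

With the same pre-exponential factor, a more destabilizing substitution always receives a lower rate, which is the "tuning role" for purifying selection that the abstract describes.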
8
Bédard C, Cisneros AF, Jordan D, Landry CR. Correlation between protein abundance and sequence conservation: what do recent experiments say? Curr Opin Genet Dev 2022; 77:101984. [PMID: 36162152 DOI: 10.1016/j.gde.2022.101984] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/23/2022] [Revised: 08/23/2022] [Accepted: 08/26/2022] [Indexed: 01/27/2023]
Abstract
Cells evolve in a space of parameter values set by physical and chemical forces. These constraints create associations among cellular properties. A particularly strong association is the negative correlation between the rate of evolution of proteins and their abundance in the cell. Highly expressed proteins evolve slower than lowly expressed ones. Multiple hypotheses have been put forward to explain this relationship, including, for instance, the requirement for higher mRNA stability, misfolding avoidance, and misinteraction avoidance for highly expressed proteins. Here, we review some of these hypotheses, their predictions, and how they are supported to finally discuss recent experiments that have been performed to test these predictions.
Affiliation(s)
- Camille Bédard
  - Département de Biologie, Faculté des Sciences et de Génie, Université Laval, G1V 0A6, Canada; Institut de Biologie Intégrative et des Systèmes, Université Laval, G1V 0A6, Canada; PROTEO, Le regroupement québécois de recherche sur la fonction, l'ingénierie et les applications des protéines, Université Laval, G1V 0A6, Canada; Centre de Recherche sur les Données Massives, Université Laval, G1V 0A6, Canada. https://twitter.com/@CamilleBed17
- Angel F Cisneros
  - Institut de Biologie Intégrative et des Systèmes, Université Laval, G1V 0A6, Canada; PROTEO, Le regroupement québécois de recherche sur la fonction, l'ingénierie et les applications des protéines, Université Laval, G1V 0A6, Canada; Centre de Recherche sur les Données Massives, Université Laval, G1V 0A6, Canada; Département de Biochimie, de Microbiologie et de Bio-informatique, Faculté des Sciences et de Génie, Université Laval, G1V 0A6, Canada. https://twitter.com/@AngelFCC119
- David Jordan
  - Institut de Biologie Intégrative et des Systèmes, Université Laval, G1V 0A6, Canada; PROTEO, Le regroupement québécois de recherche sur la fonction, l'ingénierie et les applications des protéines, Université Laval, G1V 0A6, Canada; Centre de Recherche sur les Données Massives, Université Laval, G1V 0A6, Canada; Département de Biochimie, de Microbiologie et de Bio-informatique, Faculté des Sciences et de Génie, Université Laval, G1V 0A6, Canada. https://twitter.com/@DavidJordan1997
- Christian R Landry
  - Département de Biologie, Faculté des Sciences et de Génie, Université Laval, G1V 0A6, Canada; Institut de Biologie Intégrative et des Systèmes, Université Laval, G1V 0A6, Canada; PROTEO, Le regroupement québécois de recherche sur la fonction, l'ingénierie et les applications des protéines, Université Laval, G1V 0A6, Canada; Centre de Recherche sur les Données Massives, Université Laval, G1V 0A6, Canada; Département de Biochimie, de Microbiologie et de Bio-informatique, Faculté des Sciences et de Génie, Université Laval, G1V 0A6, Canada
9
Karamycheva S, Wolf YI, Persi E, Koonin EV, Makarova KS. Analysis of lineage-specific protein family variability in prokaryotes combined with evolutionary reconstructions. Biol Direct 2022; 17:22. [PMID: 36042479 PMCID: PMC9425974 DOI: 10.1186/s13062-022-00337-7] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 07/18/2022] [Accepted: 08/13/2022] [Indexed: 12/24/2022] Open
Abstract
Background: Evolutionary rate is a key characteristic of gene families that is linked to the functional importance of the respective genes as well as specific biological functions of the proteins they encode. Accurate estimation of evolutionary rates is a challenging task that requires precise phylogenetic analysis. Here we present an easy-to-estimate, protein family-level measure of sequence variability based on alignment column homogeneity in multiple alignments of protein sequences from Clade-Specific Clusters of Orthologous Genes (csCOGs).
Results: We report genome-wide estimates of variability for 8 diverse groups of bacteria and archaea and investigate the connection between variability and various genomic and biological features. The variability estimates are based on homogeneity distributions across amino acid sequence alignments and can be obtained for multiple groups of genomes at minimal computational expense. About half of the variance in variability values can be explained by the analyzed features, with the greatest contribution coming from the extent of gene paralogy in the given csCOG. The correlation between variability and paralogy appears to originate, primarily, not from gene duplication, but from acquisition of distant paralogs and xenologs, introducing sequence variants that are more divergent than those that could have evolved in situ during the lifetime of the given group of organisms. Both high-variability and low-variability csCOGs were identified in all functional categories, but as expected, proteins encoded by integrated mobile elements as well as proteins involved in defense functions and cell motility are, on average, more variable than proteins with housekeeping functions. Additionally, using linear discriminant analysis, we found that variability and fraction of genomes carrying a given gene are the two variables that provide the best prediction of gene essentiality as compared to the results of transposon mutagenesis in Sulfolobus islandicus.
Conclusions: Variability, a measure of sequence diversity within an alignment relative to the overall diversity within a group of organisms, offers a convenient proxy for evolutionary rate estimates and is informative with respect to prediction of functional properties of proteins. In particular, variability is a strong predictor of gene essentiality for the respective organisms and indicative of sub- or neofunctionalization of paralogs.
Affiliation(s)
- Svetlana Karamycheva
  - National Center for Biotechnology Information, National Library of Medicine, Bethesda, MD, 20894, USA
- Yuri I Wolf
  - National Center for Biotechnology Information, National Library of Medicine, Bethesda, MD, 20894, USA
- Erez Persi
  - National Center for Biotechnology Information, National Library of Medicine, Bethesda, MD, 20894, USA
- Eugene V Koonin
  - National Center for Biotechnology Information, National Library of Medicine, Bethesda, MD, 20894, USA
- Kira S Makarova
  - National Center for Biotechnology Information, National Library of Medicine, Bethesda, MD, 20894, USA
10
Low protein expression enhances phenotypic evolvability by intensifying selection on folding stability. Nat Ecol Evol 2022; 6:1155-1164. [PMID: 35798838 PMCID: PMC7613228 DOI: 10.1038/s41559-022-01797-w] [Citation(s) in RCA: 6] [Impact Index Per Article: 2.0] [Received: 12/17/2021] [Accepted: 05/19/2022] [Indexed: 01/09/2023]
Abstract
Protein abundance affects the evolution of protein genotypes, but we do not know how it affects the evolution of protein phenotypes. Here we investigate the role of protein abundance in the evolvability of green fluorescent protein (GFP) towards the novel phenotype of cyan fluorescence. We evolve GFP in E. coli through multiple cycles of mutation and selection and show that low GFP expression facilitates the evolution of cyan fluorescence. A computational model whose predictions we test experimentally helps explain why: lowly expressed proteins are under stronger selection for proper folding, which facilitates their evolvability on short evolutionary time scales. The reason is that high fluorescence can be achieved by either few proteins that fold well or by many proteins that fold less well. In other words, we observe a synergy between a protein's scarcity and its stability. Because many proteins meet the essential requirements for this scarcity-stability synergy, it may be a widespread mechanism by which low expression helps proteins evolve new phenotypes and functions.
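The scarcity-stability argument in this abstract (the same fluorescence from few well-folded proteins or from many poorly folded ones) can be illustrated with a toy two-state folding calculation. This is a hedged sketch and not the computational model used in the paper: the two-state assumption, the names `fraction_folded` and `copies_needed`, and all numeric values are hypothetical choices made for illustration.

```python
import math

# Thermal energy RT in kcal/mol at roughly 298 K (assumed temperature).
RT = 0.593


def fraction_folded(dg_fold: float) -> float:
    """Two-state fraction of folded molecules.

    dg_fold is the free energy of folding in kcal/mol; more negative
    values mean a more stable protein.
    """
    return 1.0 / (1.0 + math.exp(dg_fold / RT))


def copies_needed(target_folded: float, dg_fold: float) -> float:
    """Protein copies required to obtain target_folded folded molecules."""
    return target_folded / fraction_folded(dg_fold)


# Hypothetical target: 100 folded (fluorescent) molecules per cell.
stable = copies_needed(100, -3.0)    # well-folding variant
marginal = copies_needed(100, 0.0)   # variant folded only half of the time

print(round(stable, 1), marginal)
```

In this toy picture a highly expressed protein can reach the target signal even when it folds poorly, so only a lowly expressed protein is forced to fold well. That matches the direction of the abstract's claim, although the actual model may differ in its details.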
11
Samant N, Nachum G, Tsepal T, Bolon DNA. Sequence dependencies and biophysical features both govern cleavage of diverse cut-sites by HIV protease. Protein Sci 2022; 31:e4366. [PMID: 35762719 PMCID: PMC9207908 DOI: 10.1002/pro.4366] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/03/2022] [Revised: 05/18/2022] [Accepted: 05/27/2022] [Indexed: 11/12/2022]
Abstract
The infectivity of HIV-1 requires that its protease (PR) cleave multiple cut-sites with low sequence similarity. The diversity of cleavage sites has made it challenging to investigate the underlying sequence properties that determine binding and turnover of substrates by PR. We engineered a mutational scanning approach utilizing yeast display, flow cytometry, and deep sequencing to systematically measure the impacts of all individual amino acid changes at 12 positions in three different cut-sites (MA/CA, NC/p1, and p1/p6). The resulting fitness landscapes revealed common physical features that underlie cutting of all three cut-sites at the amino acid positions closest to the scissile bond. In contrast, positions more than two amino acids away from the scissile bond exhibited a strong dependence on the sequence background of the rest of the cut-site. We observed multiple amino acid changes in cut-sites that led to faster cleavage rates, including a preference for negative charge five and six amino acids away from the scissile bond at locations where the surface of protease is positively charged. Analysis of individual cut-sites using full-length matrix-capsid proteins indicates that long-distance sequence context can contribute to cutting efficiency, such that analyses of peptides or shorter engineered constructs, including those in this work, should be considered carefully. This work provides a framework for understanding how diverse substrates interact with HIV-1 PR and can be extended to investigate other viral PRs with similar properties.
Affiliation(s)
- Neha Samant
  - Biochemistry and Molecular Biotechnology, University of Massachusetts Chan Medical School, Worcester, Massachusetts, USA
- Gily Nachum
  - Biochemistry and Molecular Biotechnology, University of Massachusetts Chan Medical School, Worcester, Massachusetts, USA
- Tenzin Tsepal
  - Biochemistry and Molecular Biotechnology, University of Massachusetts Chan Medical School, Worcester, Massachusetts, USA
- Daniel N. A. Bolon
  - Biochemistry and Molecular Biotechnology, University of Massachusetts Chan Medical School, Worcester, Massachusetts, USA
12
Besse S, Poujol R, Hussin JG. Comparative Study of Protein Aggregation Propensity and Mutation Tolerance Between Naked Mole-Rat and Mouse. Genome Biol Evol 2022; 14:evac057. [PMID: 35482036 PMCID: PMC9086952 DOI: 10.1093/gbe/evac057] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Accepted: 04/20/2022] [Indexed: 11/13/2022] Open
Abstract
The molecular mechanisms of aging and life expectancy have been studied in model organisms with short lifespans. However, long-lived species may provide insights into successful strategies for healthy aging, potentially opening the door for novel therapeutic interventions in age-related diseases. Notably, naked mole-rats, the longest-lived rodent, present attenuated aging phenotypes compared with mice. Their resistance toward oxidative stress has been proposed as one hallmark of their healthy aging, suggesting their ability to maintain cell homeostasis, specifically their protein homeostasis. To identify the general principles behind their protein homeostasis robustness, we compared the aggregation propensity and mutation tolerance of naked mole-rat and mouse orthologous proteins. Our analysis showed no proteome-wide differential effects in aggregation propensity and mutation tolerance between these species, but several subsets of proteins with a significant difference in aggregation propensity. We found an enrichment of proteins with higher aggregation propensity in naked mole-rat, and these are functionally involved in the inflammasome complex and nucleic acid binding. On the other hand, proteins with lower aggregation propensity in naked mole-rat have a significantly higher mutation tolerance compared with the rest of the proteins. Among them, we identified proteins known to be associated with neurodegenerative and age-related diseases. These findings highlight the intriguing hypothesis about the capacity of the naked mole-rat proteome to delay aging through its proteomic intrinsic architecture.
Affiliation(s)
- Savandara Besse
  - Département de Biochimie et Médecine Moléculaire, Faculté de Médecine, Université de Montréal, Québec, Canada
  - Centre Robert-Cedergren en Bioinformatique et Génomique, Université de Montréal, Québec, Canada
- Julie G. Hussin
  - Institut de Cardiologie de Montréal, Québec, Canada
  - Département de Médecine, Faculté de Médecine, Université de Montréal, Québec, Canada
13
Palenchar PM. The Influence of Codon Usage, Protein Abundance, and Protein Stability on Protein Evolution Vary by Evolutionary Distance and the Type of Protein. Protein J 2022; 41:216-229. [PMID: 35147896 DOI: 10.1007/s10930-022-10045-w] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Accepted: 02/01/2022] [Indexed: 12/01/2022]
Abstract
In general, the evolutionary rate of proteins is not primarily related to protein and amino acid functions, and factors such as protein abundance, codon usage, and the protein's TM are more important. To better understand the factors that affect protein evolution, E. coli MG1655 orthologs were compared to those in closely related bacteria and to more distantly related prokaryotes, eukaryotes, and archaea. Also, the evolution of different types of proteins was studied. The analyses indicate that the amino acid conservation of enzymes that do not use macromolecules (e.g. DNA, RNA, and proteins) as substrates and that carry out metabolic processes involving small molecules (i.e. small molecule enzymes) is different from that of other enzymes. For example, the small molecule enzymes have a lower percent identity than other enzymes when sequences from closely related bacteria are compared. Analyses indicate the lower percent identity is not a result of the amino acid or codon usage of the small molecule enzymes. The small molecule enzymes also don't have a significantly lower protein abundance, indicating that this is also not likely an important factor driving differences in amino acid conservation. Analyses indicate that different methods to measure the TM of proteins have different relationships with amino acid conservation over different evolutionary distances. In totality, the results demonstrate that the relationship between the factors thought to affect protein evolution (protein abundance, codon usage, and protein TMs) and protein evolution is complex and depends on the factor, the organisms, and the type of proteins being analyzed.
Affiliation(s)
- Peter M Palenchar
  - Department of Chemistry, Villanova University, 800 E. Lancaster Ave, Villanova, PA, 19805, USA
14
Baek KT, Kepp KP. Data set and fitting dependencies when estimating protein mutant stability: Toward simple, balanced, and interpretable models. J Comput Chem 2022; 43:504-518. [PMID: 35040492 DOI: 10.1002/jcc.26810] [Citation(s) in RCA: 10] [Impact Index Per Article: 3.3] [Received: 10/06/2021] [Revised: 12/13/2021] [Accepted: 01/03/2022] [Indexed: 12/27/2022]
Abstract
Accurate prediction of protein stability changes upon mutation (ΔΔG) is increasingly important to evolution studies, protein engineering, and screening of disease-causing gene variants but is challenged by biases in training data. We investigated 45 linear regression models trained on data sets that account systematically for destabilization bias and mutation-type bias BM. The models were externally validated on three test data sets probing different pathologies and for internal consistency (symmetry and neutrality). Model structure and performance substantially depended on training data and even fitting method. We developed two final models: SimBa-IB for typical natural mutations and SimBa-SYM for situations where stabilizing and destabilizing mutations occur to a similar extent. SimBa-SYM, despite its simplicity, is essentially non-biased (vs. the Ssym data set) while still performing well for all data sets (R ~ 0.46-0.54, MAE = 1.16-1.24 kcal/mol). The simple models provide advantages in terms of interpretability, use and future improvement, and are freely available on GitHub.
Affiliation(s)
- Kasper P Kepp
  - DTU Chemistry, Technical University of Denmark, Lyngby, Denmark
15
Coton C, Talbot G, Louarn ML, Dillmann C, Vienne D. Evolution of enzyme levels in metabolic pathways: A theoretical approach. J Theor Biol 2022; 538:111015. [PMID: 35016894 DOI: 10.1016/j.jtbi.2022.111015] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 05/07/2021] [Revised: 12/03/2021] [Accepted: 01/03/2022] [Indexed: 10/19/2022]
Abstract
The central role of metabolism in cell functioning and adaptation has given rise to countless studies on the evolution of enzyme-coding genes and network topology. However, very few studies have addressed the question of how enzyme concentrations change in response to positive selective pressure on the flux, considered a proxy of fitness. In particular, the way cellular constraints, such as resource limitations and co-regulation, affect the adaptive landscape of a pathway under selection has never been analyzed theoretically. To fill this gap, we developed a model of the evolution of enzyme concentrations that combines metabolic control theory and an adaptive dynamics approach, and integrates possible dependencies between enzyme concentrations. We determined the evolutionary equilibria of enzyme concentrations and their range of neutral variation, and showed that they differ with the properties of the enzymes, the constraints applied to the system and the initial enzyme concentrations. Simulations of long-term evolution confirmed all analytical and numerical predictions, even though we relaxed the simplifying assumptions used in the analytical treatment.
Affiliation(s)
- Charlotte Coton
  - Université Paris-Saclay, INRAE, CNRS, AgroParisTech, GQE - Le Moulon, 91190, Gif-sur-Yvette, France
- Grégoire Talbot
  - Université Paris-Saclay, INRAE, CNRS, AgroParisTech, GQE - Le Moulon, 91190, Gif-sur-Yvette, France
- Maud Le Louarn
  - Université Paris-Saclay, INRAE, CNRS, AgroParisTech, GQE - Le Moulon, 91190, Gif-sur-Yvette, France
- Christine Dillmann
  - Université Paris-Saclay, INRAE, CNRS, AgroParisTech, GQE - Le Moulon, 91190, Gif-sur-Yvette, France
- Dominique Vienne
  - Université Paris-Saclay, INRAE, CNRS, AgroParisTech, GQE - Le Moulon, 91190, Gif-sur-Yvette, France
16
Nassar R, Dignon GL, Razban RM, Dill KA. The Protein Folding Problem: The Role of Theory. J Mol Biol 2021; 433:167126. [PMID: 34224747 PMCID: PMC8547331 DOI: 10.1016/j.jmb.2021.167126] [Citation(s) in RCA: 52] [Impact Index Per Article: 13.0] [Received: 04/30/2021] [Revised: 06/21/2021] [Accepted: 06/26/2021] [Indexed: 10/20/2022]
Abstract
The protein folding problem was first articulated as a question of how order arose from disorder in proteins: How did the various native structures of proteins arise from interatomic driving forces encoded within their amino acid sequences, and how did they fold so fast? These matters have now been largely resolved by theory and statistical mechanics combined with experiments. There are general principles. Chain randomness is overcome by solvation-based codes. And in the needle-in-a-haystack metaphor, native states are found efficiently because protein haystacks (conformational ensembles) are funnel-shaped. Order-disorder theory has now grown to encompass a large swath of protein physical science across biology.
Affiliation(s)
- Roy Nassar
  - Laufer Center for Physical and Quantitative Biology, Stony Brook University, Stony Brook, NY, USA; Department of Chemistry, Stony Brook University, Stony Brook, NY, USA
- Gregory L Dignon
  - Laufer Center for Physical and Quantitative Biology, Stony Brook University, Stony Brook, NY, USA
- Rostam M Razban
  - Laufer Center for Physical and Quantitative Biology, Stony Brook University, Stony Brook, NY, USA
- Ken A Dill
  - Laufer Center for Physical and Quantitative Biology, Stony Brook University, Stony Brook, NY, USA; Department of Chemistry, Stony Brook University, Stony Brook, NY, USA; Department of Physics and Astronomy, Stony Brook University, Stony Brook, NY, USA
17
Latrille T, Lartillot N. Quantifying the impact of changes in effective population size and expression level on the rate of coding sequence evolution. Theor Popul Biol 2021; 142:57-66. [PMID: 34563555 DOI: 10.1016/j.tpb.2021.09.005] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 01/27/2021] [Revised: 09/08/2021] [Accepted: 09/11/2021] [Indexed: 02/07/2023]
Abstract
Molecular sequences are shaped by selection, where the strength of selection relative to drift is determined by effective population size (Ne). Populations with high Ne are expected to undergo stronger purifying selection, and consequently to show a lower substitution rate for selected mutations relative to the substitution rate for neutral mutations (ω). However, computational models based on biophysics of protein stability have suggested that ω can also be independent of Ne. Together, the response of ω to changes in Ne depends on the specific mapping from sequence to fitness. Importantly, an increase in protein expression level has been found empirically to result in a decrease of ω, an observation predicted by theoretical models assuming selection for protein stability. Here, we derive a theoretical approximation for the response of ω to changes in Ne and expression level, under an explicit genotype-phenotype-fitness map. The method is generally valid for additive traits and log-concave fitness functions. We applied these results to proteins undergoing selection for their conformational stability and corroborated our findings with simulations under more complex models. We predict a weak response of ω to changes in either Ne or expression level, which are interchangeable. Based on empirical data, we propose that fitness based on the conformational stability may not be a sufficient mechanism to explain the empirically observed variation in ω across species. Other aspects of protein biophysics might be explored, such as protein-protein interactions, which can lead to a stronger response of ω to changes in Ne.
Affiliation(s)
- T Latrille
  - Université de Lyon, Université Lyon 1, CNRS, Laboratoire de Biométrie et Biologie Évolutive UMR 5558, F-69622 Villeurbanne, France; École Normale Supérieure de Lyon, Université de Lyon, Université Lyon 1, Lyon, France
- N Lartillot
  - Université de Lyon, Université Lyon 1, CNRS, Laboratoire de Biométrie et Biologie Évolutive UMR 5558, F-69622 Villeurbanne, France
18
Razban RM, Dasmeh P, Serohijos AWR, Shakhnovich EI. Avoidance of protein unfolding constrains protein stability in long-term evolution. Biophys J 2021; 120:2413-2424. [PMID: 33932438 PMCID: PMC8390877 DOI: 10.1016/j.bpj.2021.03.042] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.3] [Received: 10/16/2020] [Revised: 02/24/2021] [Accepted: 03/17/2021] [Indexed: 11/28/2022]
Abstract
Every amino acid residue can influence a protein's overall stability, making stability highly susceptible to change throughout evolution. We consider the distribution of protein stabilities evolutionarily permittable under two previously reported protein fitness functions: flux dynamics and misfolding avoidance. We develop an evolutionary dynamics theory and find that it agrees better with an extensive protein stability data set for dihydrofolate reductase orthologs under the misfolding avoidance fitness function rather than the flux dynamics fitness function. Further investigation with ribonuclease H data demonstrates that not any misfolded state is avoided; rather, it is only the unfolded state. At the end, we discuss how our work pertains to the universal protein abundance-evolutionary rate correlation seen across organisms' proteomes. We derive a closed-form expression relating protein abundance to evolutionary rate that captures Escherichia coli, Saccharomyces cerevisiae, and Homo sapiens experimental trends without fitted parameters.
Affiliation(s)
- Rostam M Razban
  - Department of Chemistry and Chemical Biology, Harvard University, Cambridge, Massachusetts
- Pouria Dasmeh
  - Department of Chemistry and Chemical Biology, Harvard University, Cambridge, Massachusetts; Departement de Biochimie, Université de Montréal, Montreal, Quebec, Canada
- Eugene I Shakhnovich
  - Department of Chemistry and Chemical Biology, Harvard University, Cambridge, Massachusetts
19
Maddamsetti R. Universal Constraints on Protein Evolution in the Long-Term Evolution Experiment with Escherichia coli. Genome Biol Evol 2021; 13:evab070. [PMID: 33856016 PMCID: PMC8233687 DOI: 10.1093/gbe/evab070] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.3] [Accepted: 03/30/2021] [Indexed: 12/18/2022]
Abstract
Although it is well known that abundant proteins evolve slowly across the tree of life, there is little consensus for why this is true. Here, I report that abundant proteins evolve slowly in the hypermutator populations of Lenski's long-term evolution experiment with Escherichia coli (LTEE). Specifically, the density of all observed mutations per gene, as measured in metagenomic time series covering 60,000 generations of the LTEE, significantly anticorrelates with mRNA abundance, protein abundance, and degree of protein-protein interaction. The same pattern holds for nonsynonymous mutation density. However, synonymous mutation density, measured across the LTEE hypermutator populations, positively correlates with protein abundance. These results show that universal constraints on protein evolution are visible in data spanning three decades of experimental evolution. Therefore, it should be possible to design experiments to answer why abundant proteins evolve slowly.
Affiliation(s)
- Rohan Maddamsetti
  - Department of Biomedical Engineering, Duke University, Durham, North Carolina, USA
20
Manrubia S, Cuesta JA, Aguirre J, Ahnert SE, Altenberg L, Cano AV, Catalán P, Diaz-Uriarte R, Elena SF, García-Martín JA, Hogeweg P, Khatri BS, Krug J, Louis AA, Martin NS, Payne JL, Tarnowski MJ, Weiß M. From genotypes to organisms: State-of-the-art and perspectives of a cornerstone in evolutionary dynamics. Phys Life Rev 2021; 38:55-106. [PMID: 34088608 DOI: 10.1016/j.plrev.2021.03.004] [Citation(s) in RCA: 44] [Impact Index Per Article: 11.0] [Received: 11/24/2020] [Accepted: 03/01/2021] [Indexed: 12/21/2022]
Abstract
Understanding how genotypes map onto phenotypes, fitness, and eventually organisms is arguably the next major missing piece in a fully predictive theory of evolution. We refer to this generally as the problem of the genotype-phenotype map. Though we are still far from achieving a complete picture of these relationships, our current understanding of simpler questions, such as the structure induced in the space of genotypes by sequences mapped to molecular structures, has revealed important facts that deeply affect the dynamical description of evolutionary processes. Empirical evidence supporting the fundamental relevance of features such as phenotypic bias is mounting as well, while the synthesis of conceptual and experimental progress leads to questioning current assumptions on the nature of evolutionary dynamics, with cancer progression models or synthetic biology approaches being notable examples. This work delves with a critical and constructive attitude into our current knowledge of how genotypes map onto molecular phenotypes and organismal functions, and discusses theoretical and empirical avenues to broaden and improve this comprehension. As a final goal, this community should aim at deriving an updated picture of evolutionary processes soundly relying on the structural properties of genotype spaces, as revealed by modern techniques of molecular and functional analysis.
Affiliation(s)
- Susanna Manrubia
  - Department of Systems Biology, Centro Nacional de Biotecnología (CSIC), Madrid, Spain; Grupo Interdisciplinar de Sistemas Complejos (GISC), Madrid, Spain
- José A Cuesta
  - Grupo Interdisciplinar de Sistemas Complejos (GISC), Madrid, Spain; Departamento de Matemáticas, Universidad Carlos III de Madrid, Leganés, Spain; Instituto de Biocomputación y Física de Sistemas Complejos (BiFi), Universidad de Zaragoza, Spain; UC3M-Santander Big Data Institute (IBiDat), Getafe, Madrid, Spain
- Jacobo Aguirre
  - Grupo Interdisciplinar de Sistemas Complejos (GISC), Madrid, Spain; Centro de Astrobiología, CSIC-INTA, ctra. de Ajalvir km 4, 28850 Torrejón de Ardoz, Madrid, Spain
- Sebastian E Ahnert
  - Department of Chemical Engineering and Biotechnology, University of Cambridge, Philippa Fawcett Drive, Cambridge CB3 0AS, UK; The Alan Turing Institute, British Library, 96 Euston Road, London NW1 2DB, UK
- Alejandro V Cano
  - Institute of Integrative Biology, ETH Zurich, Zurich, Switzerland; Swiss Institute of Bioinformatics, Lausanne, Switzerland
- Pablo Catalán
  - Grupo Interdisciplinar de Sistemas Complejos (GISC), Madrid, Spain; Departamento de Matemáticas, Universidad Carlos III de Madrid, Leganés, Spain
- Ramon Diaz-Uriarte
  - Department of Biochemistry, Universidad Autónoma de Madrid, Madrid, Spain; Instituto de Investigaciones Biomédicas "Alberto Sols" (UAM-CSIC), Madrid, Spain
- Santiago F Elena
  - Instituto de Biología Integrativa de Sistemas, I(2)SysBio (CSIC-UV), València, Spain; The Santa Fe Institute, Santa Fe, NM, USA
- Paulien Hogeweg
  - Theoretical Biology and Bioinformatics Group, Utrecht University, the Netherlands
- Bhavin S Khatri
  - The Francis Crick Institute, London, UK; Department of Life Sciences, Imperial College London, London, UK
- Joachim Krug
  - Institute for Biological Physics, University of Cologne, Köln, Germany
- Ard A Louis
  - Rudolf Peierls Centre for Theoretical Physics, University of Oxford, Oxford, UK
- Nora S Martin
  - Theory of Condensed Matter Group, Cavendish Laboratory, University of Cambridge, Cambridge, UK; Sainsbury Laboratory, University of Cambridge, Cambridge, UK
- Joshua L Payne
  - Institute of Integrative Biology, ETH Zurich, Zurich, Switzerland; Swiss Institute of Bioinformatics, Lausanne, Switzerland
- Marcel Weiß
  - Theory of Condensed Matter Group, Cavendish Laboratory, University of Cambridge, Cambridge, UK; Sainsbury Laboratory, University of Cambridge, Cambridge, UK
21
Labourel F, Rajon E. Resource uptake and the evolution of moderately efficient enzymes. Mol Biol Evol 2021; 38:3938-3952. [PMID: 33964160 PMCID: PMC8382906 DOI: 10.1093/molbev/msab132] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.3] [Indexed: 12/27/2022]
Abstract
Enzymes speed up reactions that would otherwise be too slow to sustain the metabolism of self-replicators. Yet, most enzymes seem only moderately efficient, exhibiting kinetic parameters orders of magnitude lower than their expected physically achievable maxima and spanning surprisingly large ranges of values. Here, we question how these parameters evolve using a mechanistic model where enzyme efficiency is a key component of individual competition for resources. We show that kinetic parameters are under strong directional selection only up to a point, above which enzymes appear to evolve under near-neutrality, thereby confirming the qualitative observation of other modeling approaches. While the existence of a large fitness plateau could potentially explain the extensive variation in enzyme features reported, we show using a population genetics model that such a widespread distribution is an unlikely outcome of evolution on a common landscape, as mutation–selection–drift balance occupies a narrow area even when very moderate biases towards lower efficiency are considered. Instead, differences in the evolutionary context encountered by each enzyme should be involved, such that each evolves on an individual, unique landscape. Our results point to drift and effective population size playing an important role, along with the kinetics of nutrient transporters, the tolerance to high concentrations of intermediate metabolites, and the reversibility of reactions. Enzyme concentration also shapes selection on kinetic parameters, but we show that the joint evolution of concentration and efficiency does not yield extensive variance in evolutionary outcomes when documented costs to protein expression are applied.
Affiliation(s)
- Florian Labourel
  - Univ Lyon, Université Lyon 1, CNRS, Laboratoire de Biométrie et Biologie Evolutive UMR5558, Villeurbanne, F-69622, France
- Etienne Rajon
  - Univ Lyon, Université Lyon 1, CNRS, Laboratoire de Biométrie et Biologie Evolutive UMR5558, Villeurbanne, F-69622, France
22
Dubreuil B, Levy ED. Abundance Imparts Evolutionary Constraints of Similar Magnitude on the Buried, Surface, and Disordered Regions of Proteins. Front Mol Biosci 2021; 8:626729. [PMID: 33996892 PMCID: PMC8119896 DOI: 10.3389/fmolb.2021.626729] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Received: 11/06/2020] [Accepted: 03/29/2021] [Indexed: 12/02/2022]
Abstract
An understanding of the forces shaping protein conservation is key, both for the fundamental knowledge it represents and to allow for optimal use of evolutionary information in practical applications. Sequence conservation is typically examined at one of two levels. The first is a residue-level, where intra-protein differences are analyzed and the second is a protein-level, where inter-protein differences are studied. At a residue level, we know that solvent-accessibility is a prime determinant of conservation. By inverting this logic, we inferred that disordered regions are slightly more solvent-accessible on average than the most exposed surface residues in domains. By integrating abundance information with evolutionary data within and across proteins, we confirmed a previously reported strong surface-core association in the evolution of structured regions, but we found a comparatively weak association between disordered and structured regions. The facts that disordered and structured regions experience different structural constraints and evolve independently provide a unique setup to examine an outstanding question: why is a protein’s abundance the main determinant of its sequence conservation? Indeed, any structural or biophysical property linked to the abundance-conservation relationship should increase the relative conservation of regions concerned with that property (e.g., disordered residues with mis-interactions, domain residues with misfolding). Surprisingly, however, we found the conservation of disordered and structured regions to increase in equal proportion with abundance. This observation implies that either abundance-related constraints are structure-independent, or multiple constraints apply to different regions and perfectly balance each other.
Affiliation(s)
- Benjamin Dubreuil
  - Department of Structural Biology, Weizmann Institute of Science, Rehovot, Israel
- Emmanuel D Levy
  - Department of Structural Biology, Weizmann Institute of Science, Rehovot, Israel
23
Pathogenic missense protein variants affect different functional pathways and proteomic features than healthy population variants. PLoS Biol 2021; 19:e3001207. [PMID: 33909605 PMCID: PMC8110273 DOI: 10.1371/journal.pbio.3001207] [Citation(s) in RCA: 12] [Impact Index Per Article: 3.0] [Received: 05/31/2020] [Revised: 05/10/2021] [Accepted: 03/26/2021] [Indexed: 12/27/2022]
Abstract
Missense variants are present amongst the healthy population, but some of them are causative of human diseases. A classification of variants associated with “healthy” or “diseased” states is therefore not always straightforward. A deeper understanding of the nature of missense variants in health and disease, the cellular processes they may affect, and the general molecular principles which underlie these differences is essential to offer mechanistic explanations of the true impact of pathogenic variants. Here, we have formalised a statistical framework which enables robust probabilistic quantification of variant enrichment across full-length proteins, their domains, and 3D structure-defined regions. Using this framework, we validate and extend previously reported trends of variant enrichment in different protein structural regions (surface/core/interface). By examining the association of variant enrichment with available functional pathways and transcriptomic and proteomic (protein half-life, thermal stability, abundance) data, we have mined a rich set of molecular features which distinguish between pathogenic and population variants: Pathogenic variants mainly affect proteins involved in cell proliferation and nucleotide processing and are enriched in more abundant proteins. Additionally, rare population variants display features closer to common than pathogenic variants. We validate the association between these molecular features and variant pathogenicity by comparing against existing in silico variant impact annotations. This study provides molecular details into how different proteins exhibit resilience and/or sensitivity towards missense variants and provides the rationale to prioritise variant-enriched proteins and protein domains for therapeutic targeting and development. The ZoomVar database, which we created for this study, is available at fraternalilab.kcl.ac.uk/ZoomVar. 
It allows users to programmatically annotate missense variants with protein structural information and to calculate variant enrichment in different protein structural regions. How can one improve the classification of genetic variants as harmful or harmless? This study uses a robust statistical analysis to exploit the interplay between protein structure, proteomic measurements and functional pathways to enable better discrimination between missense variants in health and disease.
24
Chaudhuri D, Majumder S, Datta J, Giri K. In Silico Study of Mutational Stability of SARS-CoV-2 Proteins. Protein J 2021; 40:328-340. [PMID: 33890205 PMCID: PMC8061876 DOI: 10.1007/s10930-021-09988-3] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Accepted: 04/12/2021] [Indexed: 11/03/2022]
Abstract
Severe acute respiratory syndrome coronavirus-2 (SARS-CoV-2), an enveloped RNA virus, is transmitted by droplet infection and thus affects the respiratory system. Different genomes have been reported globally for SARS-CoV-2 with a moderate level of mutation, which makes it harder to combat the virus. Mutational profiling and the relevant evolutionary aspects of coronavirus proteins, namely spike glycoprotein, membrane protein, envelope protein, nucleoprotein, ORF1ab, ORF3a, ORF6, ORF7a, ORF7b and ORF8, were studied by in silico experiments. Clustering of the protein sequences and calculation of residue relative abundance were done to get an idea of protein conservation and to find representative sequences for phylogenetic and ancestral reconstruction. By mutational profiling and mutation analysis, the effects of mutations on protein stability and their functional implications were studied. This study indicates the mutational effect on the proteins and their relevance in evolution, which directs us towards a better understanding of these variations and the diversification of SARS-CoV-2 for useful future therapeutic study, and should thus aid in designing therapeutic agents with the highly variable regions in mind.
Affiliation(s)
- Dwaipayan Chaudhuri
  - Department of Life Sciences, Presidency University, 86/1 College Street, Kolkata, 700073, India
- Satyabrata Majumder
  - Department of Life Sciences, Presidency University, 86/1 College Street, Kolkata, 700073, India
- Joyeeta Datta
  - Department of Life Sciences, Presidency University, 86/1 College Street, Kolkata, 700073, India
- Kalyan Giri
  - Department of Life Sciences, Presidency University, 86/1 College Street, Kolkata, 700073, India
25
Berger D, Stångberg J, Baur J, Walters RJ. Elevated temperature increases genome-wide selection on de novo mutations. Proc Biol Sci 2021; 288:20203094. [PMID: 33529558 DOI: 10.1098/rspb.2020.3094] [Citation(s) in RCA: 20] [Impact Index Per Article: 5.0] [Indexed: 12/19/2022]
Abstract
Adaptation in new environments depends on the amount of genetic variation available for evolution, and the efficacy by which natural selection discriminates among this variation. However, whether some ecological factors reveal more genetic variation, or impose stronger selection pressures than others, is typically not known. Here, we apply the enzyme kinetic theory to show that rising global temperatures are predicted to intensify natural selection throughout the genome by increasing the effects of DNA sequence variation on protein stability. We test this prediction by (i) estimating temperature-dependent fitness effects of induced mutations in seed beetles adapted to ancestral or elevated temperature, and (ii) calculating 100 paired selection estimates on mutations in benign versus stressful environments from unicellular and multicellular organisms. Environmental stress per se did not increase mean selection on de novo mutation, suggesting that the cost of adaptation does not generally increase in new ecological settings to which the organism is maladapted. However, elevated temperature increased the mean strength of selection on genome-wide polymorphism, signified by increases in both mutation load and mutational variance in fitness. These results have important implications for genetic diversity gradients and the rate and repeatability of evolution under climate change.
Affiliation(s)
- David Berger
  - Department of Ecology and Genetics, Evolutionary Biology Centre, Uppsala University, Norbyvägen 18D, 75236 Uppsala, Sweden
- Josefine Stångberg
  - Department of Ecology and Genetics, Evolutionary Biology Centre, Uppsala University, Norbyvägen 18D, 75236 Uppsala, Sweden
- Julian Baur
  - Department of Ecology and Genetics, Evolutionary Biology Centre, Uppsala University, Norbyvägen 18D, 75236 Uppsala, Sweden
- Richard J Walters
  - Centre for Environmental and Climate Research, Lund University, Sölvegatan 37, 223 62 Lund, Sweden
26
Usmanova DR, Plata G, Vitkup D. The Relationship between the Misfolding Avoidance Hypothesis and Protein Evolutionary Rates in the Light of Empirical Evidence. Genome Biol Evol 2021; 13:6081017. [PMID: 33432359 PMCID: PMC7874998 DOI: 10.1093/gbe/evab006] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.0] [Accepted: 01/07/2021] [Indexed: 12/14/2022]
Abstract
For more than a decade, the misfolding avoidance hypothesis (MAH) and related theories have dominated evolutionary discussions aimed at explaining the variance of the molecular clock across cellular proteins. In this study, we use various experimental data to further investigate the consistency of the MAH predictions with empirical evidence. We also critically discuss experimental results that motivated the MAH development and that are often viewed as evidence of its major contribution to the variability of protein evolutionary rates. We demonstrate, in Escherichia coli and Homo sapiens, the lack of a substantial negative correlation between protein evolutionary rates and Gibbs free energies of unfolding, a direct measure of protein stability. We then analyze multiple new genome-scale data sets characterizing protein aggregation and interaction propensities, the properties that are likely optimized in evolution to alleviate deleterious effects associated with toxic protein misfolding and misinteractions. Our results demonstrate that the propensity of proteins to aggregate, the fraction of charged amino acids, and protein stickiness do correlate with protein abundances. Nevertheless, across multiple organisms and various data sets we do not observe substantial correlations between proteins’ aggregation- and stability-related properties and evolutionary rates. Therefore, diverse empirical data support the conclusion that the MAH and similar hypotheses do not play a major role in mediating a strong negative correlation between protein expression and the molecular clock, and thus in explaining the variability of evolutionary rates across cellular proteins.
Affiliation(s)
- Dinara R Usmanova
  - Department of Systems Biology, Columbia University, New York, NY, USA
- Germán Plata
  - Department of Systems Biology, Columbia University, New York, NY, USA
  - Elanco Animal Health, Greenfield, IN, USA
- Dennis Vitkup
  - Department of Systems Biology, Columbia University, New York, NY, USA
  - Department of Biomedical Informatics, Columbia University, New York, NY, USA
|
27
|
Schwersensky M, Rooman M, Pucci F. Large-scale in silico mutagenesis experiments reveal optimization of genetic code and codon usage for protein mutational robustness. BMC Biol 2020; 18:146. [PMID: 33081759 PMCID: PMC7576759 DOI: 10.1186/s12915-020-00870-9] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.0] [Received: 04/14/2020] [Accepted: 09/16/2020] [Indexed: 12/31/2022]
Abstract
BACKGROUND
How, and the extent to which, evolution acts on DNA and protein sequences to ensure mutational robustness and evolvability is a long-standing open question in the field of molecular evolution. We addressed this issue through the first structurome-scale computational investigation, in which we estimated the change in folding free energy upon all possible single-site mutations introduced in more than 20,000 protein structures, as well as through available experimental stability and fitness data.
RESULTS
At the amino acid level, we found the protein surface to be more robust against random mutations than the core, this difference being stronger for small proteins. The destabilizing and neutral mutations are more numerous in the core and on the surface, respectively, whereas the stabilizing mutations are about 4% in both regions. At the genetic code level, we observed smallest destabilization for mutations that are due to substitutions of base III in the codon, followed by base I, bases I+III, base II, and other multiple base substitutions. This ranking highly anticorrelates with the codon-anticodon mispairing frequency in the translation process. This suggests that the standard genetic code is optimized to limit the impact of random mutations, but even more so to limit translation errors. At the codon level, both the codon usage and the usage bias appear to optimize mutational robustness and translation accuracy, especially for surface residues.
CONCLUSION
Our results highlight the non-universality of mutational robustness and its multiscale dependence on protein features, the structure of the genetic code, and the codon usage. Our analyses and approach are strongly supported by available experimental mutagenesis data.
Affiliation(s)
- Martin Schwersensky
  - Computational Biology and Bioinformatics, Université Libre de Bruxelles, CP 165/61, Roosevelt Ave. 50, Brussels, 1050, Belgium
- Marianne Rooman
  - Computational Biology and Bioinformatics, Université Libre de Bruxelles, CP 165/61, Roosevelt Ave. 50, Brussels, 1050, Belgium
  - Interuniversity Institute of Bioinformatics in Brussels, Boulevard du Triomphe, Brussels, 1050, Belgium
- Fabrizio Pucci
  - Computational Biology and Bioinformatics, Université Libre de Bruxelles, CP 165/61, Roosevelt Ave. 50, Brussels, 1050, Belgium
  - Interuniversity Institute of Bioinformatics in Brussels, Boulevard du Triomphe, Brussels, 1050, Belgium
|
28
|
Abstract
Darwin's theory of evolution emphasized that positive selection of functional proficiency provides the fitness that ultimately determines the structure of life, a view that has dominated biochemical thinking of enzymes as perfectly optimized for their specific functions. The 20th-century modern synthesis, structural biology, and the central dogma explained the machinery of evolution, and nearly neutral theory explained how selection competes with random fixation dynamics that produce molecular clocks essential e.g. for dating evolutionary histories. However, quantitative proteomics revealed that selection pressures not relating to optimal function play much larger roles than previously thought, acting perhaps most importantly via protein expression levels. This paper first summarizes recent progress in the 21st century toward recovering this universal selection pressure. Then, the paper argues that proteome cost minimization is the dominant, underlying 'non-function' selection pressure controlling most of the evolution of already functionally adapted living systems. A theory of proteome cost minimization is described and argued to have consequences for understanding evolutionary trade-offs, aging, cancer, and neurodegenerative protein-misfolding diseases.
|
29
|
Abstract
Cells adapt to changing environments. Perturb a cell and it returns to a point of homeostasis. Perturb a population and it evolves toward a fitness peak. We review quantitative models of the forces of adaptation and their visualizations on landscapes. While some adaptations result from single mutations or few-gene effects, others are more cooperative, more delocalized in the genome, and more universal and physical. For example, homeostasis and evolution depend on protein folding and aggregation, energy and protein production, protein diffusion, molecular motor speeds and efficiencies, and protein expression levels. Models provide a way to learn about the fitness of cells and cell populations by making and testing hypotheses.
Affiliation(s)
- Luca Agozzino
  - The Louis and Beatrice Laufer Center for Physical and Quantitative Biology, Stony Brook University, Stony Brook, New York 11794, USA
  - Department of Physics and Astronomy, Stony Brook University, Stony Brook, New York 11794, USA
- Gábor Balázsi
  - The Louis and Beatrice Laufer Center for Physical and Quantitative Biology, Stony Brook University, Stony Brook, New York 11794, USA
  - Department of Biomedical Engineering, Stony Brook University, Stony Brook, New York 11794, USA
- Jin Wang
  - The Louis and Beatrice Laufer Center for Physical and Quantitative Biology, Stony Brook University, Stony Brook, New York 11794, USA
  - Department of Physics and Astronomy, Stony Brook University, Stony Brook, New York 11794, USA
  - Department of Chemistry, Stony Brook University, Stony Brook, New York 11790, USA
- Ken A Dill
  - The Louis and Beatrice Laufer Center for Physical and Quantitative Biology, Stony Brook University, Stony Brook, New York 11794, USA
  - Department of Physics and Astronomy, Stony Brook University, Stony Brook, New York 11794, USA
  - Department of Chemistry, Stony Brook University, Stony Brook, New York 11790, USA
|
30
|
Effects of Single Mutations on Protein Stability Are Gaussian Distributed. Biophys J 2020; 118:2872-2878. [PMID: 32416078 DOI: 10.1016/j.bpj.2020.04.027] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.8] [Received: 12/17/2019] [Revised: 04/14/2020] [Accepted: 04/24/2020] [Indexed: 12/16/2022]
Abstract
The distribution of protein stability effects is known to be well approximated by a Gaussian distribution from previous empirical fits. Starting from first-principles statistical mechanics, we more rigorously motivate this empirical observation by deriving per-residue-position protein stability effects to be Gaussian. Our derivation requires the number of amino acids to be large, which is satisfied by the standard set of 20 amino acids found in nature. No assumption is needed on the number of residues in close proximity in space, in contrast to previous applications of the central limit theorem to protein energetics. We support our derivation results with computational and experimental data on mutant protein stabilities across all types of protein residues.
|
31
|
Kemble H, Nghe P, Tenaillon O. Recent insights into the genotype-phenotype relationship from massively parallel genetic assays. Evol Appl 2019; 12:1721-1742. [PMID: 31548853 PMCID: PMC6752143 DOI: 10.1111/eva.12846] [Citation(s) in RCA: 34] [Impact Index Per Article: 5.7] [Received: 03/27/2019] [Revised: 06/21/2019] [Accepted: 07/02/2019] [Indexed: 12/20/2022]
Abstract
With the molecular revolution in Biology, a mechanistic understanding of the genotype-phenotype relationship became possible. Recently, advances in DNA synthesis and sequencing have enabled the development of deep mutational scanning assays, capable of scoring comprehensive libraries of genotypes for fitness and a variety of phenotypes in massively parallel fashion. The resulting empirical genotype-fitness maps pave the way to predictive models, potentially accelerating our ability to anticipate the behaviour of pathogen and cancerous cell populations from sequencing data. Besides cellular fitness, phenotypes of direct application in industry (e.g. enzyme activity) and medicine (e.g. antibody binding) can be quantified and even selected directly by these assays. This review discusses the technological basis of and recent developments in massively parallel genetics, along with the trends it is uncovering in the genotype-phenotype relationship (distribution of mutation effects, epistasis), their possible mechanistic bases and future directions for advancing towards the goal of predictive genetics.
Affiliation(s)
- Harry Kemble
  - Infection, Antimicrobials, Modelling, Evolution, INSERM, Unité Mixte de Recherche 1137, Université Paris Diderot, Université Paris Nord, Paris, France
  - École Supérieure de Physique et de Chimie Industrielles de la Ville de Paris (ESPCI Paris), UMR CNRS‐ESPCI CBI 8231, PSL Research University, Paris Cedex 05, France
- Philippe Nghe
  - École Supérieure de Physique et de Chimie Industrielles de la Ville de Paris (ESPCI Paris), UMR CNRS‐ESPCI CBI 8231, PSL Research University, Paris Cedex 05, France
- Olivier Tenaillon
  - Infection, Antimicrobials, Modelling, Evolution, INSERM, Unité Mixte de Recherche 1137, Université Paris Diderot, Université Paris Nord, Paris, France
|
32
|
Razban RM. Protein Melting Temperature Cannot Fully Assess Whether Protein Folding Free Energy Underlies the Universal Abundance-Evolutionary Rate Correlation Seen in Proteins. Mol Biol Evol 2019; 36:1955-1963. [PMID: 31093676 PMCID: PMC6736436 DOI: 10.1093/molbev/msz119] [Citation(s) in RCA: 14] [Impact Index Per Article: 2.3] [Indexed: 02/07/2023]
Abstract
The protein misfolding avoidance hypothesis explains the universal negative correlation between protein abundance and sequence evolutionary rate across the proteome by identifying protein folding free energy (ΔG) as the confounding variable. Abundant proteins resist toxic misfolding events by being more stable, and more stable proteins evolve slower because their mutations are more destabilizing. Direct supporting evidence consists only of computer simulations. A study taking advantage of a recent experimental breakthrough in measuring protein stability proteome-wide through melting temperature (Tm) (Leuenberger et al. 2017), found weak misfolding avoidance hypothesis support for the Escherichia coli proteome, and no support for the Saccharomyces cerevisiae, Homo sapiens, and Thermus thermophilus proteomes (Plata and Vitkup 2018). I find that the nontrivial relationship between Tm and ΔG and inaccuracy in Tm measurements by Leuenberger et al. 2017 can be responsible for not observing strong positive abundance-Tm and strong negative Tm-evolutionary rate correlations.
Affiliation(s)
- Rostam M Razban
  - Department of Chemistry and Chemical Biology, Harvard University, Cambridge, MA
|
33
|
Guin D, Gruebele M. Weak Chemical Interactions That Drive Protein Evolution: Crowding, Sticking, and Quinary Structure in Folding and Function. Chem Rev 2019; 119:10691-10717. [PMID: 31356058 DOI: 10.1021/acs.chemrev.8b00753] [Citation(s) in RCA: 89] [Impact Index Per Article: 14.8] [Indexed: 01/22/2023]
Abstract
In recent years, better instrumentation and greater computing power have enabled the imaging of elusive biomolecule dynamics in cells, driving many advances in understanding the chemical organization of biological systems. The focus of this Review is on interactions in the cell that affect both biomolecular stability and function and modulate them. The same protein or nucleic acid can behave differently depending on the time in the cell cycle, the location in a specific compartment, or the stresses acting on the cell. We describe in detail the crowding, sticking, and quinary structure in the cell and the current methods to quantify them both in vitro and in vivo. Finally, we discuss protein evolution in the cell in light of current biophysical evidence. We describe the factors that drive protein evolution and shape protein interaction networks. These interactions can significantly affect the free energy, ΔG, of marginally stable and low-population proteins and, due to epistasis, direct the evolutionary pathways in an organism. We finally conclude by providing an outlook on experiments to come and the possibility of collaborative evolutionary biology and biophysical efforts.
Affiliation(s)
- Drishti Guin
  - Department of Chemistry, University of Illinois, Urbana, Illinois 61801, United States
- Martin Gruebele
  - Department of Chemistry, University of Illinois, Urbana, Illinois 61801, United States
  - Department of Physics, University of Illinois, Urbana, Illinois 61801, United States
  - Center for Biophysics and Quantitative Biology, University of Illinois, Urbana, Illinois 61801, United States
|
34
|
Loos MS, Ramakrishnan R, Vranken W, Tsirigotaki A, Tsare EP, Zorzini V, Geyter JD, Yuan B, Tsamardinos I, Klappa M, Schymkowitz J, Rousseau F, Karamanou S, Economou A. Structural Basis of the Subcellular Topology Landscape of Escherichia coli. Front Microbiol 2019; 10:1670. [PMID: 31404336 PMCID: PMC6677119 DOI: 10.3389/fmicb.2019.01670] [Citation(s) in RCA: 21] [Impact Index Per Article: 3.5] [Received: 04/12/2019] [Accepted: 07/08/2019] [Indexed: 11/21/2022]
Abstract
Cellular proteomes are distributed in multiple compartments: on DNA, ribosomes, on and inside membranes, or they become secreted. Structural properties that allow polypeptides to occupy subcellular niches, particularly after crossing membranes, remain unclear. We compared intrinsic and extrinsic features in cytoplasmic and secreted polypeptides of the Escherichia coli K-12 proteome. Structural features between the cytoplasmome and secretome are sharply distinct, such that a signal peptide-agnostic machine learning tool distinguishes cytoplasmic from secreted proteins with 95.5% success. Cytoplasmic polypeptides are enriched in aliphatic, aromatic, charged and hydrophobic residues, unique folds and higher early folding propensities. Secretory polypeptides are enriched in polar/small amino acids, β folds, have higher backbone dynamics, higher disorder and contact order and are more often intrinsically disordered. These non-random distributions and experimental evidence imply that evolutionary pressure selected enhanced secretome flexibility, slow folding and looser structures, placing the secretome in a distinct protein class. These adaptations protect the secretome from premature folding during its cytoplasmic transit, optimize its lipid bilayer crossing and allow it to acquire cell envelope specific chemistries. The latter may favor promiscuous multi-ligand binding, sensing of stress and cell envelope structure changes. In conclusion, enhanced flexibility, slow folding, looser structures and unique folds differentiate the secretome from the cytoplasmome. These findings have wide implications on the structural diversity and evolution of modern proteomes and the protein folding problem.
Affiliation(s)
- Maria S Loos
  - Department of Microbiology and Immunology, Laboratory of Molecular Bacteriology, Rega Institute, KU Leuven, Leuven, Belgium
- Reshmi Ramakrishnan
  - Department of Microbiology and Immunology, Laboratory of Molecular Bacteriology, Rega Institute, KU Leuven, Leuven, Belgium
  - VIB Switch Laboratory, Department for Cellular and Molecular Medicine, VIB-KU Leuven Center for Brain & Disease Research, KU Leuven, Leuven, Belgium
- Wim Vranken
  - Interuniversity Institute of Bioinformatics in Brussels, Free University of Brussels, Brussels, Belgium
  - Structural Biology Brussels, Vrije Universiteit Brussel and Center for Structural Biology, Brussels, Belgium
- Alexandra Tsirigotaki
  - Department of Microbiology and Immunology, Laboratory of Molecular Bacteriology, Rega Institute, KU Leuven, Leuven, Belgium
- Evrydiki-Pandora Tsare
  - Metabolic Engineering & Systems Biology Laboratory, Institute of Chemical Engineering Sciences, Foundation for Research and Technology-Hellas, Patras, Greece
- Valentina Zorzini
  - Department of Microbiology and Immunology, Laboratory of Molecular Bacteriology, Rega Institute, KU Leuven, Leuven, Belgium
- Jozefien De Geyter
  - Department of Microbiology and Immunology, Laboratory of Molecular Bacteriology, Rega Institute, KU Leuven, Leuven, Belgium
- Biao Yuan
  - Department of Microbiology and Immunology, Laboratory of Molecular Bacteriology, Rega Institute, KU Leuven, Leuven, Belgium
- Ioannis Tsamardinos
  - Gnosis Data Analysis PC, Heraklion, Greece
  - Department of Computer Science, University of Crete, Heraklion, Greece
- Maria Klappa
  - Metabolic Engineering & Systems Biology Laboratory, Institute of Chemical Engineering Sciences, Foundation for Research and Technology-Hellas, Patras, Greece
- Joost Schymkowitz
  - VIB Switch Laboratory, Department for Cellular and Molecular Medicine, VIB-KU Leuven Center for Brain & Disease Research, KU Leuven, Leuven, Belgium
- Frederic Rousseau
  - VIB Switch Laboratory, Department for Cellular and Molecular Medicine, VIB-KU Leuven Center for Brain & Disease Research, KU Leuven, Leuven, Belgium
- Spyridoula Karamanou
  - Department of Microbiology and Immunology, Laboratory of Molecular Bacteriology, Rega Institute, KU Leuven, Leuven, Belgium
- Anastassios Economou
  - Department of Microbiology and Immunology, Laboratory of Molecular Bacteriology, Rega Institute, KU Leuven, Leuven, Belgium
  - Gnosis Data Analysis PC, Heraklion, Greece
|
35
|
Teufel AI, Johnson MM, Laurent JM, Kachroo AH, Marcotte EM, Wilke CO. The Many Nuanced Evolutionary Consequences of Duplicated Genes. Mol Biol Evol 2019; 36:304-314. [PMID: 30428072 DOI: 10.1093/molbev/msy210] [Citation(s) in RCA: 21] [Impact Index Per Article: 3.5] [Indexed: 11/12/2022]
Abstract
Gene duplication is seen as a major source of structural and functional divergence in genome evolution. Under the conventional models of sub or neofunctionalization, functional changes arise in one of the duplicates after duplication. However, we suggest here that the presence of a duplicated gene can result in functional changes to its interacting partners. We explore this hypothesis by in silico evolution of a heterodimer when one member of the interacting pair is duplicated. We examine how a range of selection pressures and protein structures leads to differential patterns of evolutionary divergence. We find that a surprising number of distinct evolutionary trajectories can be observed even in a simple three member system. Further, we observe that selection to correct dosage imbalance can affect the evolution of the initial function in several unexpected ways. For example, if a duplicate is under selective pressure to avoid binding its original binding partner, this can lead to changes in the binding interface of a nonduplicated interacting partner to exclude the duplicate. Hence, independent of the fate of the duplicate, its presence can impact how the original function operates. Additionally, we introduce a conceptual framework to describe how interacting partners cope with dosage imbalance after duplication. Contextualizing our results within this framework reveals that the evolutionary path taken by a duplicate's interacting partners is highly stochastic in nature. Consequently, the fate of duplicate genes may not only be controlled by their own ability to accumulate mutations but also by how interacting partners cope with them.
Affiliation(s)
- Ashley I Teufel
  - Department of Integrative Biology, The University of Texas at Austin, Austin, TX
  - Institute for Cellular and Molecular Biology, The University of Texas at Austin, Austin, TX
- Mackenzie M Johnson
  - Department of Integrative Biology, The University of Texas at Austin, Austin, TX
  - Institute for Cellular and Molecular Biology, The University of Texas at Austin, Austin, TX
- Jon M Laurent
  - Institute for Cellular and Molecular Biology, The University of Texas at Austin, Austin, TX
  - Department of Molecular Biosciences, The University of Texas at Austin, Austin, TX
  - Department of Biochemistry and Molecular Pharmacology, Institute for Systems Genetics, New York University Langone Health, New York, NY
- Aashiq H Kachroo
  - Institute for Cellular and Molecular Biology, The University of Texas at Austin, Austin, TX
  - Department of Molecular Biosciences, The University of Texas at Austin, Austin, TX
  - The Department of Biology, Centre for Applied Synthetic Biology, Concordia University, Montreal, QC, Canada
- Edward M Marcotte
  - Institute for Cellular and Molecular Biology, The University of Texas at Austin, Austin, TX
  - Department of Molecular Biosciences, The University of Texas at Austin, Austin, TX
- Claus O Wilke
  - Department of Integrative Biology, The University of Texas at Austin, Austin, TX
  - Institute for Cellular and Molecular Biology, The University of Texas at Austin, Austin, TX
|
36
|
How Often Do Protein Genes Navigate Valleys of Low Fitness? Genes (Basel) 2019; 10:genes10040283. [PMID: 30965625 PMCID: PMC6523826 DOI: 10.3390/genes10040283] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/07/2019] [Revised: 03/27/2019] [Accepted: 04/02/2019] [Indexed: 11/17/2022]
Abstract
To escape from local fitness peaks, a population must navigate across valleys of low fitness. How these transitions occur, and what role they play in adaptation, have been subjects of active interest in evolutionary genetics for almost a century. However, to our knowledge, this problem has never been addressed directly by considering the evolution of a gene, or group of genes, as a whole, including the complex effects of fitness interactions among multiple loci. Here, we use a precise model of protein fitness to compute the probability P(s, Δt) that an allele, randomly sampled from a population at time t, has crossed a fitness valley of depth s during an interval [t − Δt, t] in the immediate past. We study populations of model genes evolving under equilibrium conditions consistent with those in mammalian mitochondria. From this data, we estimate that genes encoding small protein motifs navigate fitness valleys of depth 2Ns ≳ 30 with probability P ≳ 0.1 on a time scale of human evolution, where N is the (mitochondrial) effective population size. The results are consistent with recent findings for Watson-Crick switching in mammalian mitochondrial tRNA molecules.
|
37
|
Gauthier L, Di Franco R, Serohijos AWR. SodaPop: a forward simulation suite for the evolutionary dynamics of asexual populations on protein fitness landscapes. Bioinformatics 2019; 35:4053-4062. [DOI: 10.1093/bioinformatics/btz175] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.5] [Received: 05/28/2018] [Revised: 01/21/2019] [Accepted: 03/12/2019] [Indexed: 11/14/2022]
Abstract
Motivation
Protein evolution is determined by forces at multiple levels of biological organization. Random mutations have an immediate effect on the biophysical properties, structure and function of proteins. These same mutations also affect the fitness of the organism. However, the evolutionary fate of mutations, whether they succeed to fixation or are purged, also depends on population size and dynamics. There is an emerging interest, both theoretically and experimentally, to integrate these two factors in protein evolution. Although there are several tools available for simulating protein evolution, most of them focus on either the biophysical or the population-level determinants, but not both. Hence, there is a need for a publicly available computational tool to explore both the effects of protein biophysics and population dynamics on protein evolution.
Results
To address this need, we developed SodaPop, a computational suite to simulate protein evolution in the context of the population dynamics of asexual populations. SodaPop accepts as input several fitness landscapes based on protein biochemistry or other user-defined fitness functions. The user can also provide as input experimental fitness landscapes derived from deep mutational scanning approaches or theoretical landscapes derived from physical force field estimates. Here, we demonstrate the broad utility of SodaPop with different applications describing the interplay of selection for protein properties and population dynamics. SodaPop is designed such that population geneticists can explore the influence of protein biochemistry on patterns of genetic variation, and that biochemists and biophysicists can explore the role of population size and demography on protein evolution.
Availability and implementation
Source code and binaries are freely available at https://github.com/louisgt/SodaPop under the GNU GPLv3 license. The software is implemented in C++ and supported on Linux, Mac OS/X and Windows.
Supplementary information
Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Louis Gauthier
  - Département de Biochimie, Université de Montréal, Montréal, QC, Canada
  - Centre Robert-Cedergren en Bioinformatique et Génomique, Université de Montréal, Montréal, QC, Canada
- Rémicia Di Franco
  - Département de Biochimie, Université de Montréal, Montréal, QC, Canada
  - Centre Robert-Cedergren en Bioinformatique et Génomique, Université de Montréal, Montréal, QC, Canada
  - Enseirb-Matmeca, Bordeaux Institute of Technology, Talence, France
- Adrian W R Serohijos
  - Département de Biochimie, Université de Montréal, Montréal, QC, Canada
  - Centre Robert-Cedergren en Bioinformatique et Génomique, Université de Montréal, Montréal, QC, Canada
|
38
|
Venev SV, Zeldovich KB. Thermophilic Adaptation in Prokaryotes Is Constrained by Metabolic Costs of Proteostasis. Mol Biol Evol 2019; 35:211-224. [PMID: 29106597 PMCID: PMC5850847 DOI: 10.1093/molbev/msx282] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.8] [Indexed: 11/13/2022]
Abstract
Prokaryotes evolved to thrive in an extremely diverse set of habitats, and their proteomes bear signatures of environmental conditions. Although correlations between amino acid usage and environmental temperature are well-documented, understanding of the mechanisms of thermal adaptation remains incomplete. Here, we couple the energetic costs of protein folding and protein homeostasis to build a microscopic model explaining both the overall amino acid composition and its temperature trends. Low biosynthesis costs lead to low diversity of physical interactions between amino acid residues, which in turn makes proteins less stable and drives up chaperone activity to maintain appropriate levels of folded, functional proteins. Assuming that the cost of chaperone activity is proportional to the fraction of unfolded client proteins, we simulated thermal adaptation of model proteins subject to minimization of the total cost of amino acid synthesis and chaperone activity. For the first time, we predicted both the proteome-average amino acid abundances and their temperature trends simultaneously, and found strong correlations between model predictions and 402 genomes of bacteria and archaea. The energetic constraint on protein evolution is more apparent in highly expressed proteins, selected by codon adaptation index. We found that in bacteria, highly expressed proteins are similar in composition to thermophilic ones, whereas in archaea no correlation between predicted expression level and thermostability was observed. At the same time, thermal adaptations of highly expressed proteins in bacteria and archaea are nearly identical, suggesting that universal energetic constraints prevail over the phylogenetic differences between these domains of life.
Affiliation(s)
- Sergey V Venev
  - Program in Bioinformatics and Integrative Biology, University of Massachusetts Medical School, 368 Plantation St, Worcester, MA
- Konstantin B Zeldovich
  - Program in Bioinformatics and Integrative Biology, University of Massachusetts Medical School, 368 Plantation St, Worcester, MA
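The trade-off described in the Venev and Zeldovich abstract (amino acid synthesis cost plus a chaperone cost proportional to the unfolded fraction) can be illustrated with a toy calculation. This is a minimal sketch and not the authors' model: the two-state folding formula, the temperature-independent folding free energy, the function names, and every number below are assumptions made for illustration.

```python
import math

R_KCAL = 0.0019872  # gas constant in kcal/(mol*K)


def unfolded_fraction(delta_g_fold, temp_k):
    """Two-state unfolded fraction for a folding free energy in kcal/mol.

    Negative delta_g_fold means the folded state is favoured.
    """
    return 1.0 / (1.0 + math.exp(-delta_g_fold / (R_KCAL * temp_k)))


def total_cost(residue_costs, delta_g_fold, temp_k, chaperone_weight):
    """Synthesis cost plus a chaperone term proportional to the unfolded fraction."""
    synthesis = sum(residue_costs)
    chaperone = chaperone_weight * unfolded_fraction(delta_g_fold, temp_k)
    return synthesis + chaperone


# Hypothetical numbers: a cheap but marginally stable sequence versus a
# costlier, more stable one, compared at two temperatures (in kelvin).
cheap = {"residue_costs": [10.0] * 100, "delta_g_fold": -1.0}
stable = {"residue_costs": [11.0] * 100, "delta_g_fold": -6.0}
for temp_k in (300.0, 350.0):
    for name, protein in (("cheap", cheap), ("stable", stable)):
        cost = total_cost(protein["residue_costs"], protein["delta_g_fold"],
                          temp_k, chaperone_weight=1000.0)
        print(f"{name} at {temp_k:.0f} K: {cost:.1f}")
```

The chaperone weight sets how strongly instability is penalised relative to synthesis; in this sketch it is a free parameter rather than a fitted value.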
|
39
|
Dasmeh P, Serohijos AWR. Estimating the contribution of folding stability to nonspecific epistasis in protein evolution. Proteins 2018; 86:1242-1250. [DOI: 10.1002/prot.25588] [Citation(s) in RCA: 14] [Impact Index Per Article: 2.0] [Received: 03/28/2018] [Revised: 06/28/2018] [Accepted: 07/18/2018] [Indexed: 12/28/2022]
Affiliation(s)
- Pouria Dasmeh
  - Department of Biochemistry, University of Montreal, Montreal, Quebec, Canada
  - Cedergren Center for Bioinformatics and Genomics, University of Montreal, Montreal, Quebec, Canada
  - Department of Biochemistry and Institute for Data Valorization (IVADO), University of Montreal, Montreal, Quebec, Canada
- Adrian W. R. Serohijos
  - Department of Biochemistry, University of Montreal, Montreal, Quebec, Canada
  - Cedergren Center for Bioinformatics and Genomics, University of Montreal, Montreal, Quebec, Canada
|
40
|
Abrahams L, Hurst LD. Refining the Ambush Hypothesis: Evidence That GC- and AT-Rich Bacteria Employ Different Frameshift Defence Strategies. Genome Biol Evol 2018; 10:1153-1173. [PMID: 29617761 PMCID: PMC5909447 DOI: 10.1093/gbe/evy075] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.6] [Accepted: 03/30/2018] [Indexed: 12/13/2022]
Abstract
Stop codons are frequently selected for beyond their regular termination function for error control. The “ambush hypothesis” proposes out-of-frame stop codons (OSCs) terminating frameshifted translations are selected for. Although early indirect evidence was partially supportive, recent evidence suggests OSC frequencies are not exceptional when considering underlying nucleotide content. However, prior null tests fail to control amino acid/codon usages or possible local mutational biases. We therefore return to the issue using bacterial genomes, considering several tests defining and testing against a null. We employ simulation approaches preserving amino acid order but shuffling synonymous codons or preserving codons while shuffling amino acid order. Additionally, we compare codon usage in amino acid pairs, where one codon can but the next, otherwise identical codon, cannot encode an OSC. OSC frequencies exceed expectations typically in AT-rich genomes, the +1 frame and for TGA/TAA but not TAG. With this complex evidence, simply rejecting or accepting the ambush hypothesis is not warranted. We propose a refined post hoc model, whereby AT-rich genomes have more accidental frameshifts, handled by RF2–RF3 complexes (associated with TGA/TAA) and are mostly +1 (or −2) slips. Supporting this, excesses positively correlate with in silico predicted frameshift probabilities. Thus, we propose a more viable framework, whereby genomes broadly adopt one of the two strategies to combat frameshifts: preventing frameshifting (GC-rich) or permitting frameshifts but minimizing impacts when most are caught early (AT-rich). Our refined framework holds promise yet some features, such as the bias of out-of-frame sense codons, remain unexplained.
Affiliation(s)
- Liam Abrahams
  - Department of Biology and Biochemistry, The Milner Centre for Evolution, University of Bath, United Kingdom
- Laurence D Hurst
  - Department of Biology and Biochemistry, The Milner Centre for Evolution, University of Bath, United Kingdom
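One of the null models named in the Abrahams and Hurst abstract, preserving amino acid order while shuffling synonymous codons, can be sketched as follows. This is an illustrative sketch only, not the authors' pipeline: the function names are hypothetical, the standard genetic code is assumed, and the input is assumed to be an in-frame coding sequence whose length is a multiple of three.

```python
import random
from collections import defaultdict

BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
# Standard genetic code, built from the usual TCAG ordering.
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}
STOPS = {"TAA", "TAG", "TGA"}


def split_codons(seq):
    """Split an in-frame sequence into complete codons."""
    return [seq[i:i + 3] for i in range(0, len(seq) - 2, 3)]


def translate(seq):
    return "".join(CODON_TABLE[codon] for codon in split_codons(seq))


def count_osc(seq, frame):
    """Count stop codons read in an offset frame (1 or 2) of the sequence."""
    return sum(seq[i:i + 3] in STOPS for i in range(frame, len(seq) - 2, 3))


def shuffle_synonymous(seq, rng):
    """Permute codons among positions encoding the same amino acid."""
    codons = split_codons(seq)
    positions = defaultdict(list)
    for idx, codon in enumerate(codons):
        positions[CODON_TABLE[codon]].append(idx)
    shuffled = list(codons)
    for idxs in positions.values():
        pool = [codons[i] for i in idxs]
        rng.shuffle(pool)
        for i, codon in zip(idxs, pool):
            shuffled[i] = codon
    return "".join(shuffled)


def osc_excess(seq, frame, n_shuffles, rng):
    """Observed out-of-frame stop count minus the mean count under the null."""
    observed = count_osc(seq, frame)
    null = [count_osc(shuffle_synonymous(seq, rng), frame)
            for _ in range(n_shuffles)]
    return observed - sum(null) / n_shuffles


rng = random.Random(0)
gene = "ATGCTGTTAAAACTCTAA"  # hypothetical toy gene
print(osc_excess(gene, frame=1, n_shuffles=100, rng=rng))
```

Because the shuffle only swaps codons between positions that encode the same amino acid, both the protein sequence and the gene's codon usage are unchanged, so any excess reflects codon placement rather than composition.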
|
41
|
Protein evolution speed depends on its stability and abundance and on chaperone concentrations. Proc Natl Acad Sci U S A 2018; 115:9092-9097. [PMID: 30150386 PMCID: PMC6140491 DOI: 10.1073/pnas.1810194115] [Citation(s) in RCA: 49] [Impact Index Per Article: 7.0] [Indexed: 12/17/2022]
Abstract
Some biological evolution is slow (millions of years), and some is fast (months to years). The speed at which a protein evolves depends on how stable a protein’s folded structure is, how well it avoids aggregation, and how well-chaperoned it is. What are the mechanisms? We compute fitness landscapes by combining a model of protein-folding equilibria with sequence-change dynamics. We find that adapting to a new environment is fastest for proteins that are least stably folded, because those sit on steep downhill parts of fitness potentials. The modeling shows that cells should adapt to warmer environments faster than to colder ones, explains why increasing a protein’s abundance slows cell evolution, and explains how chaperones accelerate evolution by mitigating this effect. Proteins evolve at different rates. What drives the speed of protein sequence changes? Two main factors are a protein’s folding stability and aggregation propensity. By combining the hydrophobic–polar (HP) model with the Zwanzig–Szabo–Bagchi rate theory, we find that: (i) Adaptation is strongly accelerated by selection pressure, explaining the broad variation from days to thousands of years over which organisms adapt to new environments. (ii) The proteins that adapt fastest are those that are not very stably folded, because their fitness landscapes are steepest. And because heating destabilizes folded proteins, we predict that cells should adapt faster when put into warmer rather than cooler environments. (iii) Increasing protein abundance slows down evolution (the substitution rate of the sequence) because a typical protein is not perfectly fit, so increasing its number of copies reduces the cell’s fitness. (iv) However, chaperones can mitigate this abundance effect and accelerate evolution (also called evolutionary capacitance) by effectively enhancing protein stability. This model explains key observations about protein evolution rates.
|
42
|
Saarman NP, Kober KM, Simison WB, Pogson GH. Sequence-Based Analysis of Thermal Adaptation and Protein Energy Landscapes in an Invasive Blue Mussel (Mytilus galloprovincialis). Genome Biol Evol 2018; 9:2739-2751. [PMID: 28985307 PMCID: PMC5647807 DOI: 10.1093/gbe/evx190] [Citation(s) in RCA: 14] [Impact Index Per Article: 2.0] [Accepted: 09/13/2017] [Indexed: 12/12/2022]
Abstract
Adaptive responses to thermal stress in poikilotherms play an important role in determining competitive ability and species distributions. Amino acid substitutions that affect protein stability and modify the thermal optima of orthologous proteins may be particularly important in this context. Here, we examine a set of 2,770 protein-coding genes to determine if proteins in a highly invasive heat tolerant blue mussel (Mytilus galloprovincialis) contain signals of adaptive increases in protein stability relative to orthologs in a more cold tolerant M. trossulus. Such thermal adaptations might help to explain, mechanistically, the success with which the invasive marine mussel M. galloprovincialis has displaced native species in contact zones in the eastern (California) and western (Japan) Pacific. We tested for stabilizing amino acid substitutions in warm tolerant M. galloprovincialis relative to cold tolerant M. trossulus with a generalized linear model that compares in silico estimates of recent changes in protein stability among closely related congeners. Fixed substitutions in M. galloprovincialis were 3,180.0 calories per mol per substitution more stabilizing at genes with both elevated dN/dS ratios and transcriptional responses to heat stress, and 705.8 calories per mol per substitution more stabilizing across all 2,770 loci investigated. Amino acid substitutions concentrated in a small number of genes were more stabilizing in M. galloprovincialis compared with cold tolerant M. trossulus. We also tested for, but did not find, enrichment of a priori GO terms in genes with elevated dN/dS ratios in M. galloprovincialis. This might indicate that selection for thermodynamic stability is generic across all lineages, and suggests that the high change in estimated protein stability that we observed in M. galloprovincialis is driven by selection for extra stabilizing substitutions, rather than by higher incidence of selection in a greater number of genes in this lineage. Nonetheless, our finding of more stabilizing amino acid changes in the warm adapted lineage is important because it suggests that adaptation for thermal stability has contributed to M. galloprovincialis’ superior tolerance to heat stress, and that pairing tests for positive selection and tests for transcriptional response to heat stress can identify candidates of protein stability adaptation.
Affiliation(s)
- Norah P Saarman
  - Department of Ecology and Evolutionary Biology, University of California, Santa Cruz
  - Department of Ecology and Evolutionary Biology, Yale University
- Kord M Kober
  - Department of Ecology and Evolutionary Biology, University of California, Santa Cruz
  - Department of Physiological Nursing, University of California, San Francisco
  - Institute for Computational Health Sciences, University of California, San Francisco
- W Brian Simison
  - Center for Comparative Genomics, California Academy of Sciences, San Francisco, California
- Grant H Pogson
  - Department of Ecology and Evolutionary Biology, University of California, Santa Cruz
|
43
|
Beyond Thermodynamic Constraints: Evolutionary Sampling Generates Realistic Protein Sequence Variation. Genetics 2018; 208:1387-1395. [PMID: 29382650 DOI: 10.1534/genetics.118.300699] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.7] [Received: 01/04/2018] [Accepted: 01/25/2018] [Indexed: 01/01/2023]
Abstract
Biological evolution generates a surprising amount of site-specific variability in protein sequences. Yet, attempts at modeling this process have been only moderately successful, and current models based on protein structural metrics explain, at best, 60% of the observed variation. Surprisingly, simple measures of protein structure, such as solvent accessibility, are often better predictors of site-specific variability than more complex models employing all-atom energy functions and detailed structural modeling. We suggest here that these more complex models perform poorly because they lack consideration of the evolutionary process, which is, in part, captured by the simpler metrics. We compare protein sequences that are computationally designed to sequences that are computationally evolved using the same protein-design energy function and to homologous natural sequences. We find that, by a wide variety of metrics, evolved sequences are much more similar to natural sequences than are designed sequences. In particular, designed sequences are too conserved on the protein surface relative to natural sequences, whereas evolved sequences are not. Our results suggest that evolutionary simulation produces a realistic sampling of sequence space. By contrast, protein design, at least as currently implemented, does not. Existing energy functions seem to be sufficiently accurate to correctly describe the key thermodynamic constraints acting on protein sequences, but they need to be paired with realistic sampling schemes to generate realistic sequence alignments.
|
44
|
Plata G, Vitkup D. Protein Stability and Avoidance of Toxic Misfolding Do Not Explain the Sequence Constraints of Highly Expressed Proteins. Mol Biol Evol 2017; 35:700-703. [PMID: 29309671 DOI: 10.1093/molbev/msx323] [Citation(s) in RCA: 18] [Impact Index Per Article: 2.3] [Indexed: 11/13/2022]
Abstract
The avoidance of cytotoxic effects associated with protein misfolding has been proposed as a dominant constraint on the sequence evolution and molecular clock of highly expressed proteins. Recently, Leuenberger et al. developed an elegant experimental approach to measure protein thermal stability at the proteome scale. The collected data allow us to rigorously test the predictions of the misfolding avoidance hypothesis that highly expressed proteins have evolved to be more stable, and that maintaining thermodynamic stability significantly constrains their evolution. Notably, reanalysis of the Leuenberger et al. data across four different organisms reveals no substantial correlation between protein stability and protein abundance. Therefore, the key predictions of the misfolding toxicity and related hypotheses are not supported by available empirical data. The data also suggest that, regardless of protein expression, protein stability does not substantially affect the protein molecular clock across organisms.
Affiliation(s)
- Germán Plata
  - Department of Systems Biology, Columbia University, New York, NY
- Dennis Vitkup
  - Department of Systems Biology, Columbia University, New York, NY
  - Department of Biomedical Informatics, Columbia University, New York, NY
|
45
|
Dasmeh P, Girard É, Serohijos AWR. Highly expressed genes evolve under strong epistasis from a proteome-wide scan in E. coli. Sci Rep 2017; 7:15844. [PMID: 29158562 PMCID: PMC5696520 DOI: 10.1038/s41598-017-16030-z] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.6] [Received: 07/10/2017] [Accepted: 11/06/2017] [Indexed: 11/11/2022]
Abstract
Epistasis or the non-additivity of mutational effects is a major force in protein evolution, but it has not been systematically quantified at the level of a proteome. Here, we estimated the extent of epistasis for 2,382 genes in E. coli using several hundreds of orthologs for each gene within the class Gammaproteobacteria. We found that the average epistasis is ~41% across genes in the proteome and that epistasis is stronger among highly expressed genes. This trend is quantitatively explained by the prevailing model of sequence evolution based on minimizing the fitness cost of protein unfolding and aggregation. The genes with the highest epistasis are also functionally involved in the maintenance of proteostasis, translation and central metabolism. In contrast, genes evolving with low epistasis mainly encode for membrane proteins and are involved in transport activity. Our results highlight the coupling between selection and epistasis in the long-term evolution of a proteome.
Affiliation(s)
- Pouria Dasmeh
  - Département de Biochimie, Université de Montréal, 2900 Édouard-Montpetit, Montréal, Québec, H3T 1J4, Canada
  - Centre Robert Cedergren en Bioinformatique et Génomique, Université de Montréal, 2900 Édouard-Montpetit, Montréal, Québec, H3T 1J4, Canada
- Éric Girard
  - Département de Biochimie, Université de Montréal, 2900 Édouard-Montpetit, Montréal, Québec, H3T 1J4, Canada
  - Centre Robert Cedergren en Bioinformatique et Génomique, Université de Montréal, 2900 Édouard-Montpetit, Montréal, Québec, H3T 1J4, Canada
- Adrian W R Serohijos
  - Département de Biochimie, Université de Montréal, 2900 Édouard-Montpetit, Montréal, Québec, H3T 1J4, Canada
  - Centre Robert Cedergren en Bioinformatique et Génomique, Université de Montréal, 2900 Édouard-Montpetit, Montréal, Québec, H3T 1J4, Canada
|
46
|
Selection originating from protein stability/foldability: Relationships between protein folding free energy, sequence ensemble, and fitness. J Theor Biol 2017; 433:21-38. [DOI: 10.1016/j.jtbi.2017.08.018] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.8] [Received: 03/27/2017] [Revised: 07/27/2017] [Accepted: 08/21/2017] [Indexed: 11/19/2022]
|
47
|
Dasmeh P, Kepp KP. Superoxide dismutase 1 is positively selected to minimize protein aggregation in great apes. Cell Mol Life Sci 2017; 74:3023-3037. [PMID: 28389720 PMCID: PMC11107616 DOI: 10.1007/s00018-017-2519-8] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.8] [Received: 12/01/2016] [Revised: 03/17/2017] [Accepted: 04/03/2017] [Indexed: 12/14/2022]
Abstract
Positive (adaptive) selection has recently been implied in human superoxide dismutase 1 (SOD1), a highly abundant antioxidant protein with energy signaling and antiaging functions, one of very few examples of direct selection on a human protein product (exon); the molecular drivers of this selection are unknown. We mapped 30 extant SOD1 sequences to the recently established mammalian species tree and inferred ancestors, key substitutions, and signatures of selection during the protein's evolution. We detected elevated substitution rates leading to great apes (Hominidae) at ~1 per 2 million years, significantly higher than in other primates and rodents, although these paradoxically generally evolve much faster. The high evolutionary rate was partly due to relaxation of some selection pressures and partly to distinct positive selection of SOD1 in great apes. We then show that higher stability and net charge and changes at the dimer interface were selectively introduced upon separation from old world monkeys and lesser apes (gibbons). Consequently, human, chimpanzee and gorilla SOD1s have a net charge of -6 at physiological pH, whereas the closely related gibbons and macaques have -3. These features consistently point towards selection against the malicious aggregation effects of elevated SOD1 levels in long-living great apes. The findings mirror the impact of human SOD1 mutations that reduce net charge and/or stability and cause ALS, a motor neuron disease characterized by oxidative stress and SOD1 aggregates and triggered by aging. Our study thus marks an example of direct selection for a particular chemical phenotype (high net charge and stability) in a single human protein with possible implications for the evolution of aging.
Affiliation(s)
- Pouria Dasmeh
  - Department of Chemistry and Chemical Biology, Harvard University, Cambridge, MA, USA
  - Department of Biochemistry and Cedergren Center for Bioinformatics and Genomics, Faculty of Medicine, University of Montreal, 2900 Edouard-Montpetit, Montreal, QC, H3T 1J4, Canada
- Kasper P Kepp
  - Technical University of Denmark, DTU Chemistry, 2800, Kongens Lyngby, Denmark
|
48
|
The Role of Evolutionary Selection in the Dynamics of Protein Structure Evolution. Biophys J 2017; 112:1350-1365. [PMID: 28402878 DOI: 10.1016/j.bpj.2017.02.029] [Citation(s) in RCA: 24] [Impact Index Per Article: 3.0] [Received: 10/31/2016] [Revised: 02/16/2017] [Accepted: 02/22/2017] [Indexed: 02/05/2023]
Abstract
Homology modeling is a powerful tool for predicting a protein's structure. This approach is successful because proteins whose sequences are only 30% identical still adopt the same structure, while structure similarity rapidly deteriorates beyond the 30% threshold. By studying the divergence of protein structure as sequence evolves in real proteins and in evolutionary simulations, we show that this nonlinear sequence-structure relationship emerges as a result of selection for protein folding stability in divergent evolution. Fitness constraints prevent the emergence of unstable protein evolutionary intermediates, thereby enforcing evolutionary paths that preserve protein structure despite broad sequence divergence. However, on longer timescales, evolution is punctuated by rare events where the fitness barriers obstructing structure evolution are overcome and discovery of new structures occurs. We outline biophysical and evolutionary rationale for broad variation in protein family sizes, prevalence of compact structures among ancient proteins, and more rapid structure evolution of proteins with lower packing density.
|
49
|
Echave J, Wilke CO. Biophysical Models of Protein Evolution: Understanding the Patterns of Evolutionary Sequence Divergence. Annu Rev Biophys 2017; 46:85-103. [PMID: 28301766 DOI: 10.1146/annurev-biophys-070816-033819] [Citation(s) in RCA: 75] [Impact Index Per Article: 9.4] [Indexed: 12/30/2022]
Abstract
For decades, rates of protein evolution have been interpreted in terms of the vague concept of functional importance. Slowly evolving proteins or sites within proteins were assumed to be more functionally important and thus subject to stronger selection pressure. More recently, biophysical models of protein evolution, which combine evolutionary theory with protein biophysics, have completely revolutionized our view of the forces that shape sequence divergence. Slowly evolving proteins have been found to evolve slowly because of selection against toxic misfolding and misinteractions, linking their rate of evolution primarily to their abundance. Similarly, most slowly evolving sites in proteins are not directly involved in function, but mutating these sites has a large impact on protein structure and stability. In this article, we review the studies in the emerging field of biophysical protein evolution that have shaped our current understanding of sequence divergence patterns. We also propose future research directions to develop this nascent field.
Affiliation(s)
- Julian Echave
  - Escuela de Ciencia y Tecnología, Universidad Nacional de San Martín, 1650 San Martín, Buenos Aires, Argentina
  - Consejo Nacional de Investigaciones Científicas y Técnicas (CONICET), Argentina
- Claus O Wilke
  - Department of Integrative Biology, The University of Texas at Austin, Texas 78712
|
50
|
Teufel AI, Wilke CO. Accelerated simulation of evolutionary trajectories in origin-fixation models. J R Soc Interface 2017; 14:20160906. [PMID: 28228542 PMCID: PMC5332577 DOI: 10.1098/rsif.2016.0906] [Citation(s) in RCA: 19] [Impact Index Per Article: 2.4] [Received: 11/10/2016] [Accepted: 01/31/2017] [Indexed: 11/12/2022]
Abstract
We present an accelerated algorithm to forward-simulate origin-fixation models. Our algorithm requires, on average, only about two fitness evaluations per fixed mutation, whereas traditional algorithms require, per one fixed mutation, a number of fitness evaluations of the order of the effective population size, Ne. Our accelerated algorithm yields the exact same steady state as the original algorithm but produces a different order of fixed mutations. By comparing several relevant evolutionary metrics, such as the distribution of fixed selection coefficients and the probability of reversion, we find that the two algorithms behave equivalently in many respects. However, the accelerated algorithm yields less variance in fixed selection coefficients. Notably, we are able to recover the expected amount of variance by rescaling population size, and we find a linear relationship between the rescaled population size and the population size used by the original algorithm. Considering the widespread usage of origin-fixation simulations across many areas of evolutionary biology, we introduce our accelerated algorithm as a useful tool for increasing the computational complexity of fitness functions without sacrificing much in terms of accuracy of the evolutionary simulation.
Affiliation(s)
- Ashley I Teufel
  - Department of Integrative Biology, Institute for Cellular and Molecular Biology, and Center for Computational Biology and Bioinformatics, The University of Texas at Austin, Austin, TX 78712, USA
- Claus O Wilke
  - Department of Integrative Biology, Institute for Cellular and Molecular Biology, and Center for Computational Biology and Bioinformatics, The University of Texas at Austin, Austin, TX 78712, USA
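For orientation, the traditional origin-fixation scheme that the Teufel and Wilke abstract contrasts with its accelerated algorithm can be sketched as a propose-and-accept loop. This is a hedged illustration, not the accelerated algorithm and not the authors' code: it assumes a haploid population, Kimura's diffusion approximation for the fixation probability, a binary genome, and a caller-supplied fitness function, and all names are hypothetical.

```python
import math
import random


def fixation_probability(s, n_e):
    """Kimura's diffusion approximation for a haploid population of size n_e."""
    if abs(s) < 1e-12:
        return 1.0 / n_e  # neutral limit
    exponent = -2.0 * n_e * s
    if exponent > 700.0:
        return 0.0  # strongly deleterious: avoid overflow in expm1
    return math.expm1(-2.0 * s) / math.expm1(exponent)


def simulate(fitness, genome, n_e, n_fixed, rng):
    """Propose single-site flips until n_fixed mutations have fixed.

    Returns the final genome and the number of fitness evaluations used.
    """
    genome = list(genome)
    current = fitness(genome)
    evaluations = 1
    fixed = 0
    while fixed < n_fixed:
        site = rng.randrange(len(genome))
        mutant = list(genome)
        mutant[site] = 1 - mutant[site]
        mutant_fitness = fitness(mutant)
        evaluations += 1
        s = mutant_fitness / current - 1.0  # selection coefficient
        if rng.random() < fixation_probability(s, n_e):
            genome, current = mutant, mutant_fitness
            fixed += 1
    return genome, evaluations


# Hypothetical additive fitness function on a 20-site binary genome.
final, n_eval = simulate(lambda g: 1.0 + 0.1 * sum(g), [0] * 20,
                         n_e=50, n_fixed=5, rng=random.Random(0))
print(sum(final), n_eval)
```

In this scheme every proposed mutation costs one fitness evaluation whether or not it fixes, which is why nearly neutral landscapes need on the order of Ne evaluations per fixed mutation.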
|