1
Bouvier JW, Emms DM, Kelly S. Rubisco is evolving for improved catalytic efficiency and CO2 assimilation in plants. Proc Natl Acad Sci U S A 2024; 121:e2321050121. [PMID: 38442173 PMCID: PMC10945770 DOI: 10.1073/pnas.2321050121] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/30/2023] [Accepted: 01/25/2024] [Indexed: 03/07/2024] Open
Abstract
Rubisco is the primary entry point for carbon into the biosphere. However, rubisco is widely regarded as inefficient, leading many to question whether the enzyme can adapt to become a better catalyst. Through a phylogenetic investigation of the molecular and kinetic evolution of Form I rubisco, we uncover the evolutionary trajectory of rubisco kinetic evolution in angiosperms. We show that rbcL is among the 1% of slowest-evolving genes and enzymes on Earth, accumulating one nucleotide substitution every 0.9 My and one amino acid mutation every 7.2 My. Despite this, rubisco catalysis has been continually evolving toward improved CO2/O2 specificity, carboxylase turnover, and carboxylation efficiency. Consistent with this kinetic adaptation, increased rubisco evolution has led to a concomitant improvement in leaf-level CO2 assimilation. Thus, rubisco has been slowly but continually evolving toward improved catalytic efficiency and CO2 assimilation in plants.
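The two waiting times quoted in this abstract can be turned into rates with simple arithmetic. The sketch below is only an illustration of that conversion; the function name `events_per_my` and the variable names are hypothetical and do not come from the paper.

```python
def events_per_my(interval_my: float) -> float:
    """Convert a mean waiting time between events (in My) into a rate (events per My)."""
    if interval_my <= 0:
        raise ValueError("interval must be positive")
    return 1.0 / interval_my


# Waiting times quoted in the abstract for rbcL.
NUCLEOTIDE_INTERVAL_MY = 0.9   # one nucleotide substitution every 0.9 My
AMINO_ACID_INTERVAL_MY = 7.2   # one amino acid mutation every 7.2 My

nucleotide_rate = events_per_my(NUCLEOTIDE_INTERVAL_MY)   # roughly 1.1 per My
amino_acid_rate = events_per_my(AMINO_ACID_INTERVAL_MY)   # roughly 0.14 per My

# Ratio of the two waiting times: nucleotide substitutions per amino acid change.
nucleotide_per_amino_acid = AMINO_ACID_INTERVAL_MY / NUCLEOTIDE_INTERVAL_MY

print(f"{nucleotide_rate:.2f} nucleotide substitutions per My")
print(f"{amino_acid_rate:.3f} amino acid changes per My")
print(f"about {nucleotide_per_amino_acid:.0f} nucleotide substitutions per amino acid change")
```

Taken at face value, these figures imply roughly eight nucleotide substitutions for every amino acid change in rbcL.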
Affiliation(s)
- Jacques W Bouvier
  - Department of Biology, University of Oxford, Oxford OX1 3RB, United Kingdom
- David M Emms
  - Department of Biology, University of Oxford, Oxford OX1 3RB, United Kingdom
- Steven Kelly
  - Department of Biology, University of Oxford, Oxford OX1 3RB, United Kingdom
2
Thakur S, Planeta Kepp K, Mehra R. Predicting virus Fitness: Towards a structure-based computational model. J Struct Biol 2023; 215:108042. [PMID: 37931730 DOI: 10.1016/j.jsb.2023.108042] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/25/2023] [Revised: 10/12/2023] [Accepted: 11/03/2023] [Indexed: 11/08/2023]
Abstract
Predicting the impact of new emerging virus mutations is of major interest in surveillance and for understanding the evolutionary forces of the pathogens. The SARS-CoV-2 surface spike-protein (S-protein) binds to human ACE2 receptors as a critical step in host cell infection. At the same time, S-protein binding to human antibodies neutralizes the virus and prevents interaction with ACE2. Here we combine these two binding properties in a simple virus fitness model, using structure-based computation of all possible mutation effects averaged over 10 ACE2 complexes and 10 antibody complexes of the S-protein (∼380,000 computed mutations), and validate the approach against diverse experimental binding/escape data of ACE2 and antibodies. The ACE2-antibody selectivity change caused by mutation (i.e., the differential change in binding to ACE2 vs. immunity-inducing antibodies) is proposed to be a key metric of the fitness model, enabling systematic error cancelation when evaluated. In this model, new mutations become fixated if they increase the selective binding to ACE2 relative to circulating antibodies, assuming that both are present in the host in a competitive binding situation. We use this model to categorize viral mutations that may best reach ACE2 before being captured by antibodies. Our model may aid the understanding of variant-specific vaccines and molecular mechanisms of viral evolution in the context of a human host.
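The selectivity metric described in this abstract can be sketched as a difference of two averaged binding changes. The code below is a minimal illustration and not the authors' implementation: the function name, the sign convention (negative ΔΔG means stronger binding), and the numbers are all assumptions made for the example.

```python
from statistics import mean
from typing import Sequence


def selectivity_change(ddg_ace2: Sequence[float], ddg_antibody: Sequence[float]) -> float:
    """Differential change in binding for one mutation.

    Each argument holds computed binding free-energy changes (ddG) for the same
    mutation across several structures of the respective complex. Assumed sign
    convention: negative ddG means the mutation strengthens binding, so a
    negative return value means binding shifts toward ACE2 relative to antibodies.
    """
    if not ddg_ace2 or not ddg_antibody:
        raise ValueError("need at least one ddG value per binding partner")
    return mean(ddg_ace2) - mean(ddg_antibody)


# Made-up illustrative values (not data from the paper).
ace2_values = [-0.5, -0.3]       # mutation slightly strengthens ACE2 binding
antibody_values = [0.4, 0.6]     # mutation weakens antibody binding

print(round(selectivity_change(ace2_values, antibody_values), 3))
```

Subtracting the two averages is what lets errors shared by both computations cancel, which is the motivation the abstract gives for using a selectivity metric rather than either binding change alone.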
Affiliation(s)
- Shivani Thakur
  - Department of Chemistry, Indian Institute of Technology Bhilai, Kutelabhata, Durg - 491001, Chhattisgarh, India
- Kasper Planeta Kepp
  - DTU Chemistry, Technical University of Denmark, Building 206, 2800 Kongens Lyngby, Denmark
- Rukmankesh Mehra
  - Department of Chemistry, Indian Institute of Technology Bhilai, Kutelabhata, Durg - 491001, Chhattisgarh, India; Department of Bioscience and Biomedical Engineering, Indian Institute of Technology Bhilai, Kutelabhata, Durg - 491001, Chhattisgarh, India
3
Tilk S, Tkachenko S, Curtis C, Petrov DA, McFarland CD. Most cancers carry a substantial deleterious load due to Hill-Robertson interference. eLife 2022; 11:67790. [PMID: 36047771 PMCID: PMC9499534 DOI: 10.7554/elife.67790] [Citation(s) in RCA: 12] [Impact Index Per Article: 4.0] [Received: 02/23/2021] [Accepted: 08/31/2022] [Indexed: 11/13/2022] Open
Abstract
Cancer genomes exhibit surprisingly weak signatures of negative selection [1,2]. This may be because selective pressures are relaxed or because genome-wide linkage prevents deleterious mutations from being removed (Hill-Robertson interference) [3]. By stratifying tumors by their genome-wide mutational burden, we observe negative selection (dN/dS ~ 0.56) in low mutational burden tumors, while remaining cancers exhibit dN/dS ratios ~1. This suggests that most tumors do not remove deleterious passengers. To buffer against deleterious passengers, tumors upregulate heat shock pathways as their mutational burden increases. Finally, evolutionary modeling finds that Hill-Robertson interference alone can reproduce patterns of attenuated selection and estimates the total fitness cost of passengers to be 46% per cell on average. Collectively, our findings suggest that the lack of observed negative selection in most tumors is not due to relaxed selective pressures, but rather the inability of selection to remove deleterious mutations in the presence of genome-wide linkage.
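For readers unfamiliar with the dN/dS statistic used in this abstract, the sketch below shows the textbook form of the ratio. It is a simplification under stated assumptions: the paper's own estimator is not described here and likely corrects for mutational biases, and the function name and counts are hypothetical.

```python
def dn_ds(nonsyn_subs: float, nonsyn_sites: float,
          syn_subs: float, syn_sites: float) -> float:
    """Textbook dN/dS: nonsynonymous substitutions per nonsynonymous site,
    divided by synonymous substitutions per synonymous site.

    Values below 1 suggest negative (purifying) selection; values near 1 are
    what neutral evolution would produce.
    """
    if nonsyn_sites <= 0 or syn_sites <= 0:
        raise ValueError("site counts must be positive")
    if syn_subs <= 0:
        raise ValueError("dN/dS is undefined without synonymous substitutions")
    dn = nonsyn_subs / nonsyn_sites
    ds = syn_subs / syn_sites
    return dn / ds


# Toy counts chosen only to illustrate the calculation (not data from the paper).
print(dn_ds(nonsyn_subs=30, nonsyn_sites=300, syn_subs=20, syn_sites=100))
```

With these toy counts, dN is 0.1 and dS is 0.2, so the ratio falls below 1, the direction the abstract reports for low mutational burden tumors.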
Affiliation(s)
- Susanne Tilk
  - Department of Biology, Stanford University, Stanford, United States
- Svyatoslav Tkachenko
  - Department of Genetics and Genome Sciences, Case Western Reserve University, Cleveland, United States
- Christina Curtis
  - Department of Genetics, Stanford University, Stanford, United States
- Dmitri A Petrov
  - Department of Biology, Stanford University, Stanford, United States
- Christopher D McFarland
  - Department of Genetics and Genome Sciences, Case Western Reserve University, Cleveland, United States
4
Karamycheva S, Wolf YI, Persi E, Koonin EV, Makarova KS. Analysis of lineage-specific protein family variability in prokaryotes combined with evolutionary reconstructions. Biol Direct 2022; 17:22. [PMID: 36042479 PMCID: PMC9425974 DOI: 10.1186/s13062-022-00337-7] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 07/18/2022] [Accepted: 08/13/2022] [Indexed: 12/24/2022] Open
Abstract
Background: Evolutionary rate is a key characteristic of gene families that is linked to the functional importance of the respective genes as well as specific biological functions of the proteins they encode. Accurate estimation of evolutionary rates is a challenging task that requires precise phylogenetic analysis. Here we present an easy to estimate protein family level measure of sequence variability based on alignment column homogeneity in multiple alignments of protein sequences from Clade-Specific Clusters of Orthologous Genes (csCOGs).
Results: We report genome-wide estimates of variability for 8 diverse groups of bacteria and archaea and investigate the connection between variability and various genomic and biological features. The variability estimates are based on homogeneity distributions across amino acid sequence alignments and can be obtained for multiple groups of genomes at minimal computational expense. About half of the variance in variability values can be explained by the analyzed features, with the greatest contribution coming from the extent of gene paralogy in the given csCOG. The correlation between variability and paralogy appears to originate, primarily, not from gene duplication, but from acquisition of distant paralogs and xenologs, introducing sequence variants that are more divergent than those that could have evolved in situ during the lifetime of the given group of organisms. Both high-variability and low-variability csCOGs were identified in all functional categories, but as expected, proteins encoded by integrated mobile elements as well as proteins involved in defense functions and cell motility are, on average, more variable than proteins with housekeeping functions. Additionally, using linear discriminant analysis, we found that variability and fraction of genomes carrying a given gene are the two variables that provide the best prediction of gene essentiality as compared to the results of transposon mutagenesis in Sulfolobus islandicus.
Conclusions: Variability, a measure of sequence diversity within an alignment relative to the overall diversity within a group of organisms, offers a convenient proxy for evolutionary rate estimates and is informative with respect to prediction of functional properties of proteins. In particular, variability is a strong predictor of gene essentiality for the respective organisms and indicative of sub- or neofunctionalization of paralogs.
Affiliation(s)
- Svetlana Karamycheva
  - National Center for Biotechnology Information, National Library of Medicine, Bethesda, MD, 20894, USA
- Yuri I Wolf
  - National Center for Biotechnology Information, National Library of Medicine, Bethesda, MD, 20894, USA
- Erez Persi
  - National Center for Biotechnology Information, National Library of Medicine, Bethesda, MD, 20894, USA
- Eugene V Koonin
  - National Center for Biotechnology Information, National Library of Medicine, Bethesda, MD, 20894, USA
- Kira S Makarova
  - National Center for Biotechnology Information, National Library of Medicine, Bethesda, MD, 20894, USA
5
Magi Meconi G, Sasselli IR, Bianco V, Onuchic JN, Coluzza I. Key aspects of the past 30 years of protein design. Rep Prog Phys 2022; 85:086601. [PMID: 35704983 DOI: 10.1088/1361-6633/ac78ef] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 09/03/2021] [Accepted: 06/15/2022] [Indexed: 06/15/2023]
Abstract
Proteins are the workhorse of life. They are the building infrastructure of living systems; they are the most efficient molecular machines known, and their enzymatic activity is still unmatched in versatility by any artificial system. Perhaps proteins' most remarkable feature is their modularity. The large amount of information required to specify each protein's function is analogically encoded with an alphabet of just ∼20 letters. The protein folding problem is how to encode all such information in a sequence of 20 letters. In this review, we go through the last 30 years of research to summarize the state of the art and highlight some applications related to fundamental problems of protein evolution.
Affiliation(s)
- Giulia Magi Meconi
  - Computational Biophysics Lab, Center for Cooperative Research in Biomaterials (CIC biomaGUNE), Basque Research and Technology Alliance (BRTA), Paseo de Miramon 182, 20014, Donostia-San Sebastián, Spain
- Ivan R Sasselli
  - Computational Biophysics Lab, Center for Cooperative Research in Biomaterials (CIC biomaGUNE), Basque Research and Technology Alliance (BRTA), Paseo de Miramon 182, 20014, Donostia-San Sebastián, Spain
- Jose N Onuchic
  - Center for Theoretical Biological Physics, Department of Physics & Astronomy, Department of Chemistry, Department of Biosciences, Rice University, Houston, TX 77251, United States of America
- Ivan Coluzza
  - BCMaterials, Basque Center for Materials, Applications and Nanostructures, Bld. Martina Casiano, UPV/EHU Science Park, Barrio Sarriena s/n, 48940 Leioa, Spain
  - Basque Foundation for Science, Ikerbasque, 48009, Bilbao, Spain
6
Chikunova A, Ubbink M. The roles of highly conserved, non‐catalytic residues in class A β‐lactamases. Protein Sci 2022; 31:e4328. [PMID: 35634774 PMCID: PMC9112487 DOI: 10.1002/pro.4328] [Citation(s) in RCA: 11] [Impact Index Per Article: 3.7] [Received: 02/09/2022] [Revised: 04/03/2022] [Accepted: 04/20/2022] [Indexed: 11/12/2022]
7
Abstract
The spike protein (S-protein) of SARS-CoV-2, the protein that enables the virus to infect human cells, is the basis for many vaccines and a hotspot of concerning virus evolution. Here, we discuss the outstanding progress in structural characterization of the S-protein and how these structures facilitate analysis of virus function and evolution. We emphasize the differences in reported structures and that analysis of structure-function relationships is sensitive to the structure used. We show that the average residue solvent exposure in nearly complete structures is a good descriptor of open vs closed conformation states. Because of structural heterogeneity of functionally important surface-exposed residues, we recommend using averages of a group of high-quality protein structures rather than a single structure before reaching conclusions on specific structure-function relationships. To illustrate these points, we analyze some significant chemical tendencies of prominent S-protein mutations in the context of the available structures. In the discussion of new variants, we emphasize the selectivity of binding to ACE2 vs prominent antibodies rather than simply the antibody escape or ACE2 affinity separately. We note that larger chemical changes, in particular increased electrostatic charge or side-chain volume of exposed surface residues, are recurring in mutations of concern, plausibly related to adaptation to the negative surface potential of human ACE2. We also find indications that the fixated mutations of the S-protein in the main variants are less destabilizing than would be expected on average, possibly pointing toward a selection pressure on the S-protein. The richness of available structures for all of these situations provides an enormously valuable basis for future research into these structure-function relationships.
Affiliation(s)
- Rukmankesh Mehra
  - Department of Chemistry, Indian Institute of Technology Bhilai, Sejbahar, Raipur 492015, Chhattisgarh, India
- Kasper P. Kepp
  - DTU Chemistry, Technical University of Denmark, Building 206, 2800 Kongens Lyngby, Denmark
8
Maddamsetti R. Universal Constraints on Protein Evolution in the Long-Term Evolution Experiment with Escherichia coli. Genome Biol Evol 2021; 13:evab070. [PMID: 33856016 PMCID: PMC8233687 DOI: 10.1093/gbe/evab070] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.3] [Accepted: 03/30/2021] [Indexed: 12/18/2022] Open
Abstract
Although it is well known that abundant proteins evolve slowly across the tree of life, there is little consensus for why this is true. Here, I report that abundant proteins evolve slowly in the hypermutator populations of Lenski's long-term evolution experiment with Escherichia coli (LTEE). Specifically, the density of all observed mutations per gene, as measured in metagenomic time series covering 60,000 generations of the LTEE, significantly anticorrelates with mRNA abundance, protein abundance, and degree of protein-protein interaction. The same pattern holds for nonsynonymous mutation density. However, synonymous mutation density, measured across the LTEE hypermutator populations, positively correlates with protein abundance. These results show that universal constraints on protein evolution are visible in data spanning three decades of experimental evolution. Therefore, it should be possible to design experiments to answer why abundant proteins evolve slowly.
Affiliation(s)
- Rohan Maddamsetti
  - Department of Biomedical Engineering, Duke University, Durham, North Carolina, USA
9
Abstract
Darwin's theory of evolution emphasized that positive selection of functional proficiency provides the fitness that ultimately determines the structure of life, a view that has dominated biochemical thinking of enzymes as perfectly optimized for their specific functions. The 20th-century modern synthesis, structural biology, and the central dogma explained the machinery of evolution, and nearly neutral theory explained how selection competes with random fixation dynamics that produce molecular clocks essential e.g. for dating evolutionary histories. However, quantitative proteomics revealed that selection pressures not relating to optimal function play much larger roles than previously thought, acting perhaps most importantly via protein expression levels. This paper first summarizes recent progress in the 21st century toward recovering this universal selection pressure. Then, the paper argues that proteome cost minimization is the dominant, underlying 'non-function' selection pressure controlling most of the evolution of already functionally adapted living systems. A theory of proteome cost minimization is described and argued to have consequences for understanding evolutionary trade-offs, aging, cancer, and neurodegenerative protein-misfolding diseases.
10
Venev SV, Zeldovich KB. Thermophilic Adaptation in Prokaryotes Is Constrained by Metabolic Costs of Proteostasis. Mol Biol Evol 2019; 35:211-224. [PMID: 29106597 PMCID: PMC5850847 DOI: 10.1093/molbev/msx282] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.8] [Indexed: 11/13/2022] Open
Abstract
Prokaryotes evolved to thrive in an extremely diverse set of habitats, and their proteomes bear signatures of environmental conditions. Although correlations between amino acid usage and environmental temperature are well-documented, understanding of the mechanisms of thermal adaptation remains incomplete. Here, we couple the energetic costs of protein folding and protein homeostasis to build a microscopic model explaining both the overall amino acid composition and its temperature trends. Low biosynthesis costs lead to low diversity of physical interactions between amino acid residues, which in turn makes proteins less stable and drives up chaperone activity to maintain appropriate levels of folded, functional proteins. Assuming that the cost of chaperone activity is proportional to the fraction of unfolded client proteins, we simulated thermal adaptation of model proteins subject to minimization of the total cost of amino acid synthesis and chaperone activity. For the first time, we predicted both the proteome-average amino acid abundances and their temperature trends simultaneously, and found strong correlations between model predictions and 402 genomes of bacteria and archaea. The energetic constraint on protein evolution is more apparent in highly expressed proteins, selected by codon adaptation index. We found that in bacteria, highly expressed proteins are similar in composition to thermophilic ones, whereas in archaea no correlation between predicted expression level and thermostability was observed. At the same time, thermal adaptations of highly expressed proteins in bacteria and archaea are nearly identical, suggesting that universal energetic constraints prevail over the phylogenetic differences between these domains of life.
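The cost trade-off described in this abstract can be sketched as a sum of a synthesis term and a chaperone term proportional to the unfolded fraction. The abstract states only that proportionality; the two-state Boltzmann form for the unfolded fraction, the function names, and every numerical value below are assumptions made for illustration, not the authors' model.

```python
import math

# Gas constant in kcal/(mol*K), so stabilities are expressed in kcal/mol.
R_KCAL = 0.0019872


def fraction_unfolded(stability: float, temperature_k: float) -> float:
    """Unfolded fraction of a two-state protein (an assumed model).

    `stability` is G_unfolded - G_folded in kcal/mol; positive means the
    folded state is favoured.
    """
    return 1.0 / (1.0 + math.exp(stability / (R_KCAL * temperature_k)))


def total_cost(synthesis_cost: float, stability: float,
               temperature_k: float, chaperone_weight: float) -> float:
    """Synthesis cost plus a chaperone cost proportional to the unfolded fraction."""
    return synthesis_cost + chaperone_weight * fraction_unfolded(stability, temperature_k)


# Hypothetical comparison: a cheap, marginally stable sequence versus a
# costlier, more stable one, at a mesophilic and a thermophilic temperature.
for temperature in (310.0, 350.0):
    cheap = total_cost(synthesis_cost=10.0, stability=1.0,
                       temperature_k=temperature, chaperone_weight=20.0)
    stable = total_cost(synthesis_cost=12.0, stability=5.0,
                        temperature_k=temperature, chaperone_weight=20.0)
    print(temperature, round(cheap, 3), round(stable, 3))
```

In this toy setting the unfolded fraction grows with temperature for a fixed positive stability, so the chaperone term pushes the balance toward sequences that cost more to synthesize but fold more stably.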
Affiliation(s)
- Sergey V Venev
  - Program in Bioinformatics and Integrative Biology, University of Massachusetts Medical School, 368 Plantation St, Worcester, MA
- Konstantin B Zeldovich
  - Program in Bioinformatics and Integrative Biology, University of Massachusetts Medical School, 368 Plantation St, Worcester, MA
11
Yan Z, Wang J. Superfunneled Energy Landscape of Protein Evolution Unifies the Principles of Protein Evolution, Folding, and Design. Phys Rev Lett 2019; 122:018103. [PMID: 31012725 DOI: 10.1103/physrevlett.122.018103] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.0] [Received: 12/06/2017] [Revised: 11/08/2018] [Indexed: 06/09/2023]
Abstract
Evolution is essential for shaping biological functions. Darwin proposed selection acting upon mutations as the driving force of evolution. While mutations are well understood, quantifying the selection force remains challenging. In this study, we identified and quantified both thermodynamic stability and kinetic accessibility as the selection forces for protein evolution. Protein evolution can be viewed and quantified as a trajectory moving along a superfunneled energy landscape with a line attractor at the bottom. The resulting evolved sequences and structures show strong protein characteristics including the hydrophobic core, high designability, and fast folding. The evolution principle uncovered here is validated on real proteins and sheds light on protein design.
Affiliation(s)
- Zhiqiang Yan
  - State Key Laboratory of Electroanalytical Chemistry, Changchun Institute of Applied Chemistry, Chinese Academy of Sciences, Changchun, Jilin 130022, China
- Jin Wang
  - State Key Laboratory of Electroanalytical Chemistry, Changchun Institute of Applied Chemistry, Chinese Academy of Sciences, Changchun, Jilin 130022, China
  - Department of Chemistry & Physics, State University of New York at Stony Brook, Stony Brook, New York 11790, USA
12
Cipriani V, Debono J, Goldenberg J, Jackson TNW, Arbuckle K, Dobson J, Koludarov I, Li B, Hay C, Dunstan N, Allen L, Hendrikx I, Kwok HF, Fry BG. Correlation between ontogenetic dietary shifts and venom variation in Australian brown snakes (Pseudonaja). Comp Biochem Physiol C Toxicol Pharmacol 2017; 197:53-60. [PMID: 28457945 DOI: 10.1016/j.cbpc.2017.04.007] [Citation(s) in RCA: 48] [Impact Index Per Article: 6.0] [Received: 02/25/2017] [Revised: 04/19/2017] [Accepted: 04/25/2017] [Indexed: 01/17/2023]
Abstract
Venom is a key evolutionary trait, as evidenced by its widespread convergent evolution across the animal kingdom. In an escalating prey-predator arms race, venoms evolve rapidly to guarantee predatory or defensive success. Variation in venom composition is ubiquitous among snakes. Here, we tested variation in venom activity on substrates relevant to blood coagulation among Pseudonaja (brown snake) species, Australian elapids responsible for the majority of medically important human envenomations in Australia. A functional approach was employed to elucidate interspecific variation in venom activity in all nine currently recognised species of Pseudonaja. Fluorometric enzymatic activity assays were performed to test variation in whole venom procoagulant activity among species. Analyses confirmed the previously documented ontogenetic shift from non-coagulopathic venom in juveniles to coagulopathic venom as adults, except for the case of P. modesta, which retains non-coagulopathic venom as an adult. These shifts in venom activity correlate with documented ontogenetic shifts in diet among brown snakes from specialisation on reptilian prey as juveniles (and throughout the life cycle of P. modesta), to a more generalised diet in adults that includes mammals. The results of this study bring to light findings relevant to both clinical and evolutionary toxinology.
Affiliation(s)
- Vittoria Cipriani
  - Venom Evolution Lab, School of Biological Sciences, University of Queensland, St Lucia, QLD 4072, Australia
- Jordan Debono
  - Venom Evolution Lab, School of Biological Sciences, University of Queensland, St Lucia, QLD 4072, Australia
- Jonathan Goldenberg
  - Venom Evolution Lab, School of Biological Sciences, University of Queensland, St Lucia, QLD 4072, Australia
- Timothy N W Jackson
  - Venom Evolution Lab, School of Biological Sciences, University of Queensland, St Lucia, QLD 4072, Australia; Australian Venom Research Unit, Department of Pharmacology and Therapeutics, University of Melbourne, Parkville, Victoria 3010, Australia
- Kevin Arbuckle
  - Department of Biosciences, College of Science, Swansea University, Swansea SA2 8PP, UK
- James Dobson
  - Venom Evolution Lab, School of Biological Sciences, University of Queensland, St Lucia, QLD 4072, Australia
- Ivan Koludarov
  - Venom Evolution Lab, School of Biological Sciences, University of Queensland, St Lucia, QLD 4072, Australia
- Bin Li
  - Faculty of Health Sciences, University of Macau, Avenida da Universidade, Taipa, Macau
- Chris Hay
  - Venom Evolution Lab, School of Biological Sciences, University of Queensland, St Lucia, QLD 4072, Australia
- Nathan Dunstan
  - Venom Supplies, Tanunda, South Australia 5352, Australia
- Luke Allen
  - Venom Supplies, Tanunda, South Australia 5352, Australia
- Iwan Hendrikx
  - Venom Evolution Lab, School of Biological Sciences, University of Queensland, St Lucia, QLD 4072, Australia
- Hang Fai Kwok
  - Faculty of Health Sciences, University of Macau, Avenida da Universidade, Taipa, Macau
- Bryan G Fry
  - Venom Evolution Lab, School of Biological Sciences, University of Queensland, St Lucia, QLD 4072, Australia
13
The Role of Evolutionary Selection in the Dynamics of Protein Structure Evolution. Biophys J 2017; 112:1350-1365. [PMID: 28402878 DOI: 10.1016/j.bpj.2017.02.029] [Citation(s) in RCA: 24] [Impact Index Per Article: 3.0] [Received: 10/31/2016] [Revised: 02/16/2017] [Accepted: 02/22/2017] [Indexed: 02/05/2023] Open
Abstract
Homology modeling is a powerful tool for predicting a protein's structure. This approach is successful because proteins whose sequences are only 30% identical still adopt the same structure, while structure similarity rapidly deteriorates beyond the 30% threshold. By studying the divergence of protein structure as sequence evolves in real proteins and in evolutionary simulations, we show that this nonlinear sequence-structure relationship emerges as a result of selection for protein folding stability in divergent evolution. Fitness constraints prevent the emergence of unstable protein evolutionary intermediates, thereby enforcing evolutionary paths that preserve protein structure despite broad sequence divergence. However, on longer timescales, evolution is punctuated by rare events where the fitness barriers obstructing structure evolution are overcome and discovery of new structures occurs. We outline biophysical and evolutionary rationale for broad variation in protein family sizes, prevalence of compact structures among ancient proteins, and more rapid structure evolution of proteins with lower packing density.
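The 30% threshold in this abstract refers to pairwise sequence identity. As a rough illustration of how that quantity is computed from an alignment, the sketch below counts identical residues over columns where neither sequence has a gap. Definitions of identity differ in the choice of denominator, so this convention, the function name, and the toy sequences are assumptions rather than the paper's procedure.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percent identity between two pre-aligned sequences of equal length.

    Columns where either sequence has a gap are skipped; the denominator is
    the number of remaining columns (one of several conventions in use).
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    compared = 0
    matches = 0
    for residue_a, residue_b in zip(aligned_a, aligned_b):
        if residue_a == gap or residue_b == gap:
            continue
        compared += 1
        if residue_a == residue_b:
            matches += 1
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return 100.0 * matches / compared


# Toy alignment: five ungapped columns, four of them identical.
print(percent_identity("ACDE-G", "ACDFHG"))
```

A pair scoring near 30% under a measure like this would sit at the boundary the abstract describes, where structural similarity starts to deteriorate rapidly.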
14
Echave J, Wilke CO. Biophysical Models of Protein Evolution: Understanding the Patterns of Evolutionary Sequence Divergence. Annu Rev Biophys 2017; 46:85-103. [PMID: 28301766 DOI: 10.1146/annurev-biophys-070816-033819] [Citation(s) in RCA: 75] [Impact Index Per Article: 9.4] [Indexed: 12/30/2022]
Abstract
For decades, rates of protein evolution have been interpreted in terms of the vague concept of functional importance. Slowly evolving proteins or sites within proteins were assumed to be more functionally important and thus subject to stronger selection pressure. More recently, biophysical models of protein evolution, which combine evolutionary theory with protein biophysics, have completely revolutionized our view of the forces that shape sequence divergence. Slowly evolving proteins have been found to evolve slowly because of selection against toxic misfolding and misinteractions, linking their rate of evolution primarily to their abundance. Similarly, most slowly evolving sites in proteins are not directly involved in function, but mutating these sites has a large impact on protein structure and stability. In this article, we review the studies in the emerging field of biophysical protein evolution that have shaped our current understanding of sequence divergence patterns. We also propose future research directions to develop this nascent field.
Affiliation(s)
- Julian Echave
  - Escuela de Ciencia y Tecnología, Universidad Nacional de San Martín, 1650 San Martín, Buenos Aires, Argentina; Consejo Nacional de Investigaciones Científicas y Técnicas (CONICET), Argentina
- Claus O Wilke
  - Department of Integrative Biology, The University of Texas at Austin, Texas 78712
15
Bershtein S, Serohijos AW, Shakhnovich EI. Bridging the physical scales in evolutionary biology: from protein sequence space to fitness of organisms and populations. Curr Opin Struct Biol 2016; 42:31-40. [PMID: 27810574 DOI: 10.1016/j.sbi.2016.10.013] [Citation(s) in RCA: 46] [Impact Index Per Article: 5.1] [Received: 08/25/2016] [Accepted: 10/14/2016] [Indexed: 01/11/2023]
Abstract
Bridging the gap between the molecular properties of proteins and organismal/population fitness is essential for understanding evolutionary processes. This task requires the integration of the several physical scales of biological organization, each defined by a distinct set of mechanisms and constraints, into a single unifying model. The molecular scale is dominated by the constraints imposed by the physico-chemical properties of proteins and their substrates, which give rise to trade-offs and epistatic (non-additive) effects of mutations. At the systems scale, biological networks modulate protein expression and can either buffer or enhance the fitness effects of mutations. The population scale is influenced by the mutational input, selection regimes, and stochastic changes affecting the size and structure of populations, which eventually determine the evolutionary fate of mutations. Here, we summarize the recent advances in theory, computer simulations, and experiments that advance our understanding of the links between various physical scales in biology.
Affiliation(s)
- Shimon Bershtein
  - Department of Life Sciences, Ben-Gurion University of the Negev, Beer-Sheva 84501, Israel
- Adrian Wr Serohijos
  - Département de Biochimie, Centre Robert-Cedergren en Bioinformatique & Génomique, Université de Montréal, Montréal, QC H3T 1J4, Canada
- Eugene I Shakhnovich
  - Department of Chemistry and Chemical Biology, Harvard University, 12 Oxford Street, Cambridge, MA 02138, United States
16
Rapid Radiations and the Race to Redundancy: An Investigation of the Evolution of Australian Elapid Snake Venoms. Toxins (Basel) 2016; 8:toxins8110309. [PMID: 27792190 PMCID: PMC5127106 DOI: 10.3390/toxins8110309] [Citation(s) in RCA: 56] [Impact Index Per Article: 6.2] [Received: 08/29/2016] [Revised: 10/17/2016] [Accepted: 10/17/2016] [Indexed: 01/06/2023] Open
Abstract
Australia is the stronghold of the front-fanged venomous snake family Elapidae. The Australasian elapid snake radiation, which includes approximately 100 terrestrial species in Australia, as well as Melanesian species and all the world's sea snakes, is less than 12 million years old. The incredible phenotypic and ecological diversity of the clade is matched by considerable diversity in venom composition. The clade's evolutionary youth and dynamic evolution should make it of particular interest to toxinologists; however, the majority of species, which are small, typically inoffensive, and seldom encountered by non-herpetologists, have been almost completely neglected by researchers. The present study investigates the venom composition of 28 species proteomically, revealing several interesting trends in venom composition, and reports, for the first time in elapid snakes, the existence of an ontogenetic shift in the venom composition and activity of brown snakes (Pseudonaja sp.). Trends in venom composition are compared to the snakes' feeding ecology, and the paper concludes with an extended discussion of the selection pressures shaping the evolution of snake venom.
|
17
|
Nelson ED, Grishin NV. Evolution of off-lattice model proteins under ligand binding constraints. Phys Rev E 2016; 94:022410. [PMID: 27627338 DOI: 10.1103/physreve.94.022410] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.1] [Received: 03/10/2016] [Indexed: 12/12/2022]
Abstract
We investigate protein evolution using an off-lattice polymer model evolved to imitate the behavior of small enzymes. Model proteins evolve through mutations to nucleotide sequences (including insertions and deletions) and are selected to fold and maintain a specific binding site compatible with a model ligand. We show that this requirement is, in itself, sufficient to maintain an ordered folding domain, and we compare it to the requirement of folding an ordered (but otherwise unrestricted) domain. We measure rates of amino acid change as a function of local environment properties such as solvent exposure, packing density, and distance from the active site, as well as overall rates of sequence and structure change, both along and among model lineages in star phylogenies. The model recapitulates essentially all of the behavior found in protein phylogenetic analyses, and predicts that amino acid substitution rates vary linearly with distance from the binding site.
Affiliation(s)
- Erik D Nelson
- Howard Hughes Medical Institute, University of Texas Southwestern Medical Center, 6001 Forest Park Blvd., Room ND10.124, Dallas, Texas 75235-9050, USA
- Nick V Grishin
- Howard Hughes Medical Institute, University of Texas Southwestern Medical Center, 6001 Forest Park Blvd., Room ND10.124, Dallas, Texas 75235-9050, USA
|
18
|
Venev SV, Zeldovich KB. Massively parallel sampling of lattice proteins reveals foundations of thermal adaptation. J Chem Phys 2016; 143:055101. [PMID: 26254668 DOI: 10.1063/1.4927565] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.2] [Indexed: 01/01/2023] Open
Abstract
Evolution of proteins in bacteria and archaea living in different conditions leads to significant correlations between amino acid usage and environmental temperature. The origins of these correlations are poorly understood, and an important question of protein theory, physics-based prediction of types of amino acids overrepresented in highly thermostable proteins, remains largely unsolved. Here, we extend the random energy model of protein folding by weighting the interaction energies of amino acids by their frequencies in protein sequences and predict the energy gap of proteins designed to fold well at elevated temperatures. To test the model, we present a novel scalable algorithm for simultaneous energy calculation for many sequences in many structures, targeting massively parallel computing architectures such as graphics processing units. The energy calculation is performed by multiplying two matrices, one representing the complete set of sequences, and the other describing the contact maps of all structural templates. An implementation of the algorithm for the CUDA platform is available at http://www.github.com/kzeldovich/galeprot and calculates protein folding energies over 250 times faster than a single central processing unit. Analysis of amino acid usage in 64-mer cubic lattice proteins designed to fold well at different temperatures demonstrates an excellent agreement between theoretical and simulated values of energy gap. The theoretical predictions of temperature trends of amino acid frequencies are significantly correlated with bioinformatics data on 191 bacteria and archaea, and highlight protein folding constraints as a fundamental selection pressure during thermal adaptation in biological evolution.
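The matrix formulation described in this abstract can be illustrated with a small sketch. This is not the authors' CUDA implementation: the two-letter alphabet, the `PAIR_ENERGY` table, and all function names below are hypothetical, and the product is written in plain Python rather than on a GPU. Each sequence is flattened into a row of pairwise interaction energies, each structure into a 0/1 column over the same residue pairs, and the energy of every sequence in every structure is then one entry of the matrix product.

```python
from itertools import combinations

# Hypothetical two-letter alphabet and contact energies (illustration only).
PAIR_ENERGY = {("H", "H"): -1.0, ("H", "P"): 0.0, ("P", "H"): 0.0, ("P", "P"): 0.0}


def pair_energy_row(seq):
    """Flatten a sequence into one row: the energy of every residue pair (i < j)."""
    return [PAIR_ENERGY[(seq[i], seq[j])] for i, j in combinations(range(len(seq)), 2)]


def contact_column(contacts, length):
    """Flatten a contact map into a 0/1 column over the same (i < j) pair ordering."""
    contact_set = {tuple(sorted(c)) for c in contacts}
    return [1.0 if pair in contact_set else 0.0 for pair in combinations(range(length), 2)]


def energies(sequences, structures, length):
    """Return E[s][k], the energy of sequence s in structure k, as a matrix product."""
    seq_rows = [pair_energy_row(s) for s in sequences]            # n_seq x n_pairs
    struct_cols = [contact_column(c, length) for c in structures]  # n_struct columns
    return [[sum(a * b for a, b in zip(row, col)) for col in struct_cols]
            for row in seq_rows]


if __name__ == "__main__":
    # Two 4-residue sequences scored against two hypothetical contact maps.
    print(energies(["HHHH", "HPPH"], [[(0, 3)], [(0, 2), (1, 3)]], 4))
```

Because every entry of the result is an independent dot product, the same layout maps naturally onto parallel hardware, which is presumably why the matrix form was chosen.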
Affiliation(s)
- Sergey V Venev
- Program in Bioinformatics and Integrative Biology, University of Massachusetts Medical School, 368 Plantation St, Worcester, Massachusetts 01605, USA
- Konstantin B Zeldovich
- Program in Bioinformatics and Integrative Biology, University of Massachusetts Medical School, 368 Plantation St, Worcester, Massachusetts 01605, USA
|
19
|
Woodard JC, Dunatunga S, Shakhnovich EI. A Simple Model of Protein Domain Swapping in Crowded Cellular Environments. Biophys J 2016; 110:2367-2376. [PMID: 27276255 DOI: 10.1016/j.bpj.2016.04.033] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.2] [Received: 01/19/2016] [Revised: 04/07/2016] [Accepted: 04/20/2016] [Indexed: 11/25/2022] Open
Abstract
Domain swapping in proteins is an important mechanism of functional and structural innovation. However, despite its ubiquity and importance, the physical mechanisms that lead to domain swapping are poorly understood. Here, we present a simple two-dimensional coarse-grained model of protein domain swapping in the cytoplasm. In our model, two-domain proteins partially unfold and diffuse in continuous space. Monte Carlo multiprotein simulations of the model reveal that domain swapping occurs at intermediate temperatures, whereas folded dimers and folded monomers prevail at low temperatures, and partially unfolded monomers predominate at high temperatures. We use a simplified amino acid alphabet consisting of four residue types, and find that the oligomeric state at a given temperature depends on the sequence of the protein. We also show that hinge strain between domains can promote domain swapping, consistent with experimental observations for real proteins. Domain swapping depends nonmonotonically on the protein concentration, with domain-swapped dimers occurring at intermediate concentrations and nonspecific interactions between partially unfolded proteins occurring at high concentrations. For folded proteins, we recover the result obtained in three-dimensional lattice simulations, i.e., that functional dimerization is most prevalent at intermediate temperatures and nonspecific interactions increase at low temperatures.
Affiliation(s)
- Jaie C Woodard
- Graduate Program in Biophysics, Harvard University, Cambridge, Massachusetts; Department of Chemistry and Chemical Biology, Harvard University, Cambridge, Massachusetts
- Sachith Dunatunga
- Department of Mechanical Engineering, Massachusetts Institute of Technology, Cambridge, Massachusetts
- Eugene I Shakhnovich
- Department of Chemistry and Chemical Biology, Harvard University, Cambridge, Massachusetts
|
20
|
Anti-Icing Superhydrophobic Surfaces: Controlling Entropic Molecular Interactions to Design Novel Icephobic Concrete. Entropy 2016. [DOI: 10.3390/e18040132] [Citation(s) in RCA: 27] [Impact Index Per Article: 3.0] [Indexed: 02/05/2023]
|
21
|
Sikosek T, Chan HS. Biophysics of protein evolution and evolutionary protein biophysics. J R Soc Interface 2015; 11:20140419. [PMID: 25165599 DOI: 10.1098/rsif.2014.0419] [Citation(s) in RCA: 163] [Impact Index Per Article: 16.3] [Indexed: 01/05/2023] Open
Abstract
The study of molecular evolution at the level of protein-coding genes often entails comparing large datasets of sequences to infer their evolutionary relationships. Despite the importance of a protein's structure and conformational dynamics to its function and thus its fitness, common phylogenetic methods embody minimal biophysical knowledge of proteins. To underscore the biophysical constraints on natural selection, we survey effects of protein mutations, highlighting the physical basis for marginal stability of natural globular proteins and how requirement for kinetic stability and avoidance of misfolding and misinteractions might have affected protein evolution. The biophysical underpinnings of these effects have been addressed by models with an explicit coarse-grained spatial representation of the polypeptide chain. Sequence-structure mappings based on such models are powerful conceptual tools that rationalize mutational robustness, evolvability, epistasis, promiscuous function performed by 'hidden' conformational states, resolution of adaptive conflicts and conformational switches in the evolution from one protein fold to another. Recently, protein biophysics has been applied to derive more accurate evolutionary accounts of sequence data. Methods have also been developed to exploit sequence-based evolutionary information to predict biophysical behaviours of proteins. The success of these approaches demonstrates a deep synergy between the fields of protein biophysics and protein evolution.
Affiliation(s)
- Tobias Sikosek
- Department of Biochemistry, University of Toronto, Toronto, Ontario, Canada M5S 1A8; Department of Molecular Genetics, University of Toronto, Toronto, Ontario, Canada M5S 1A8; Department of Physics, University of Toronto, Toronto, Ontario, Canada M5S 1A8
- Hue Sun Chan
- Department of Biochemistry, University of Toronto, Toronto, Ontario, Canada M5S 1A8; Department of Molecular Genetics, University of Toronto, Toronto, Ontario, Canada M5S 1A8; Department of Physics, University of Toronto, Toronto, Ontario, Canada M5S 1A8
|
22
|
Nelson ED, Grishin NV. Structural evolution of proteinlike heteropolymers. Phys Rev E Stat Nonlin Soft Matter Phys 2014; 90:062715. [PMID: 25615137 DOI: 10.1103/physreve.90.062715] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Received: 06/24/2014] [Indexed: 06/04/2023]
Abstract
The biological function of a protein often depends on the formation of an ordered structure in order to support a smaller, chemically active configuration of amino acids against thermal fluctuations. Here we explore the development of proteins evolving to satisfy this requirement using an off-lattice polymer model in which monomers interact as low resolution amino acids. To evolve the model, we construct a Markov process in which sequences are subjected to random replacements, insertions, and deletions and are selected to recover a predefined minimum number of solid-ordered monomers using the Lindemann melting criterion. We show that polymers generated by this process consistently fold into soluble, ordered globules of similar length and complexity to small protein motifs. To compare the evolution of the globules with proteins, we analyze the statistics of amino acid replacements, the dependence of site mutation rates on solvent exposure, and the dependence of structural distance on sequence distance for homologous alignments. Despite the simplicity of the model, the results display a surprisingly close correspondence with protein data.
Affiliation(s)
- Erik D Nelson
- Howard Hughes Medical Institute, University of Texas Southwestern Medical Center, 6001 Forest Park Boulevard, Room ND10.124, Dallas, Texas 75235-9050, USA
- Nick V Grishin
- Howard Hughes Medical Institute, University of Texas Southwestern Medical Center, 6001 Forest Park Boulevard, Room ND10.124, Dallas, Texas 75235-9050, USA
|
23
|
Dasmeh P, Serohijos AWR, Kepp KP, Shakhnovich EI. The influence of selection for protein stability on dN/dS estimations. Genome Biol Evol 2014; 6:2956-67. [PMID: 25355808 PMCID: PMC4224349 DOI: 10.1093/gbe/evu223] [Citation(s) in RCA: 43] [Impact Index Per Article: 3.9] [Indexed: 11/12/2022] Open
Abstract
Understanding the relative contributions of various evolutionary processes (purifying selection, neutral drift, and adaptation) is fundamental to evolutionary biology. A common metric to distinguish these processes is the ratio of nonsynonymous to synonymous substitutions (i.e., dN/dS), interpreted with the neutral theory as a null model. However, from biophysical considerations, mutations have non-negligible effects on the biophysical properties of proteins such as folding stability. In this work, we investigated how stability affects the rate of protein evolution in phylogenetic trees by using simulations that combine explicit protein sequences with associated stability changes. We first simulated myoglobin evolution in phylogenetic trees with a biophysically realistic approach that accounts for 3D structural information and estimates of changes in stability upon mutation. We then compared evolutionary rates inferred directly from simulation to those estimated using maximum-likelihood (ML) methods. We found that the dN/dS estimated by ML methods (ωML) is highly predictive of the per gene dN/dS inferred from the simulated phylogenetic trees. This agreement is strong in the regime of high stability where protein evolution is neutral. At low folding stabilities and under mutation-selection balance, we observe deviations from neutrality (per gene dN/dS > 1 and dN/dS < 1). We showed that although per gene dN/dS is robust to these deviations, ML tests for positive selection detect statistically significant per site dN/dS > 1. Altogether, we show how protein biophysics affects dN/dS estimations and their subsequent interpretation. These results are important for improving the current approaches for detecting positive selection.
Affiliation(s)
- Pouria Dasmeh
- Department of Chemistry and Chemical Biology, Harvard University; DTU Chemistry, Technical University of Denmark, Kongens Lyngby, Denmark; Present address: Max Planck Institute of Immunobiology and Epigenetics, Stübeweg, Freiburg, Germany
- Kasper P Kepp
- DTU Chemistry, Technical University of Denmark, Kongens Lyngby, Denmark
|
24
|
Merging molecular mechanism and evolution: theory and computation at the interface of biophysics and evolutionary population genetics. Curr Opin Struct Biol 2014; 26:84-91. [PMID: 24952216 DOI: 10.1016/j.sbi.2014.05.005] [Citation(s) in RCA: 60] [Impact Index Per Article: 5.5] [Received: 02/03/2014] [Revised: 04/19/2014] [Accepted: 05/16/2014] [Indexed: 11/24/2022]
Abstract
The variation among sequences and structures in nature is determined both by physical laws and by evolutionary history. However, these two factors are traditionally investigated by disciplines with different emphases and philosophies: molecular biophysics on the one hand and evolutionary population genetics on the other. Here, we review recent theoretical and computational approaches that address the crucial need to integrate these two disciplines. We first articulate the elements of these approaches. Then, we survey their contribution to our mechanistic understanding of molecular evolution, the polymorphisms in coding regions, the distribution of fitness effects (DFE) of mutations, the observed folding stability of proteins in nature, and the distribution of protein folds in genomes.
|
25
|
Loss of quaternary structure is associated with rapid sequence divergence in the OSBS family. Proc Natl Acad Sci U S A 2014; 111:8535-40. [PMID: 24872444 DOI: 10.1073/pnas.1318703111] [Citation(s) in RCA: 25] [Impact Index Per Article: 2.3] [Indexed: 01/03/2023] Open
Abstract
The rate of protein evolution is determined by a combination of selective pressure on protein function and biophysical constraints on protein folding and structure. Determining the relative contributions of these properties is an unsolved problem in molecular evolution with broad implications for protein engineering and function prediction. As a case study, we examined the structural divergence of the rapidly evolving o-succinylbenzoate synthase (OSBS) family, which catalyzes a step in menaquinone synthesis in diverse microorganisms and plants. On average, the OSBS family is much more divergent than other protein families from the same set of species, with the most divergent family members sharing <15% sequence identity. Comparing 11 representative structures revealed that loss of quaternary structure and large deletions or insertions are associated with the family's rapid evolution. Neither of these properties has been investigated in previous studies to identify factors that affect the rate of protein evolution. Intriguingly, one subfamily retained a multimeric quaternary structure and has small insertions and deletions compared with related enzymes that catalyze diverse reactions. Many proteins in this subfamily catalyze both OSBS and N-succinylamino acid racemization (NSAR). Retention of ancestral structural characteristics in the NSAR/OSBS subfamily suggests that the rate of protein evolution is not proportional to the capacity to evolve new protein functions. Instead, structural features that are conserved among proteins with diverse functions might contribute to the evolution of new functions.
|
26
|
Galzitskaya OV, Pereyaslavets LB, Glyakina AV. Folding of Right- and Left-Handed Three-Helix Proteins. Isr J Chem 2014. [DOI: 10.1002/ijch.201300146] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.3] [Indexed: 11/10/2022]
|
27
|
Kepp KP, Dasmeh P. A model of proteostatic energy cost and its use in analysis of proteome trends and sequence evolution. PLoS One 2014; 9:e90504. [PMID: 24587382 PMCID: PMC3938754 DOI: 10.1371/journal.pone.0090504] [Citation(s) in RCA: 28] [Impact Index Per Article: 2.5] [Received: 11/08/2013] [Accepted: 02/03/2014] [Indexed: 12/25/2022] Open
Abstract
A model of proteome-associated chemical energetic costs of cells is derived from protein-turnover kinetics and protein folding. Minimization of the proteostatic maintenance cost can explain a range of trends of proteomes and combines protein function, stability, size, proteostatic cost, temperature, resource availability, and turnover rates in one simple framework. We then explore the ansatz that the chemical energy remaining after proteostatic maintenance is available for reproduction (or cell division) and thus proportional to organism fitness. Selection for lower proteostatic costs is then shown to be significant vs. typical effective population sizes of yeast. The model explains and quantifies evolutionary conservation of highly abundant proteins as arising both from functional mutations and from changes in other properties such as stability, cost, or turnover rates. We show that typical hypomorphic mutations can be selected against due to increased cost of compensatory protein expression (both in the mutated gene and in related genes, i.e. epistasis) rather than compromised function itself, although this compensation depends on the protein's importance. Such mutations exhibit larger selective disadvantage in abundant, large, synthetically costly, and/or short-lived proteins. Selection against increased turnover costs of less stable proteins rather than misfolding toxicity per se can explain equilibrium protein stability distributions, in agreement with recent findings in E. coli. The proteostatic selection pressure is stronger at low metabolic rates (i.e. scarce environments) and in hot habitats, explaining proteome adaptations towards rough environments as a question of energy. The model may also explain several trade-offs observed in protein evolution and suggests how protein properties can coevolve to maintain low proteostatic cost.
Affiliation(s)
- Kasper P. Kepp
- Department of Chemistry, Technical University of Denmark, Kongens Lyngby, Denmark
- Pouria Dasmeh
- Department of Chemistry, Technical University of Denmark, Kongens Lyngby, Denmark
|
28
|
Çetinbaş M, Shakhnovich EI. Catalysis of protein folding by chaperones accelerates evolutionary dynamics in adapting cell populations. PLoS Comput Biol 2013; 9:e1003269. [PMID: 24244114 PMCID: PMC3820506 DOI: 10.1371/journal.pcbi.1003269] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.3] [Received: 04/24/2013] [Accepted: 08/23/2013] [Indexed: 11/19/2022] Open
Abstract
Although molecular chaperones are essential components of protein homeostatic machinery, their mechanism of action and impact on adaptation and evolutionary dynamics remain controversial. Here we developed a physics-based ab initio multi-scale model of a living cell for population dynamics simulations to elucidate the effect of chaperones on adaptive evolution. The 6-loci genomes of model cells encode model proteins, whose folding and interactions in the cellular milieu can be evaluated exactly from their genome sequences. A genotype-phenotype relationship that is based on a simple yet non-trivially postulated protein-protein interaction (PPI) network determines the cell division rate. Model proteins can exist in native and molten globule states and participate in functional and all possible promiscuous non-functional PPIs. We find that an active chaperone mechanism, whereby chaperones directly catalyze protein folding, has a significant impact on the cellular fitness and the rate of evolutionary dynamics, while passive chaperones, which just maintain misfolded proteins in soluble complexes, have a negligible effect on the fitness. We find that by partially releasing the constraint on protein stability, active chaperones promote a deeper exploration of sequence space to strengthen functional PPIs and diminish the non-functional PPIs. A key experimentally testable prediction emerging from our analysis is that down-regulation of chaperones that catalyze protein folding significantly slows down the adaptation dynamics.
Author Summary
Molecular chaperones or heat-shock proteins are essential components of protein homeostatic machinery in all three domains of life, whose role is not only to prevent protein aggregation but also to catalyze the protein folding process by decreasing the energetic barrier for folding. Importantly, chaperones have often been implicated as phenotypic capacitors since they buffer the deleterious effects of mutations, promote genetic diversity, and thus speed up adaptive evolution. Here we explore computationally the consequences of chaperone activity in the cytoplasm via long-time evolutionary dynamics simulations. We use a 6-loci multi-scale model of cell populations, where the fitness of each cell is determined from its genome, based on statistical mechanical principles of protein folding and protein-protein interactions. We find that by catalyzing protein folding, chaperones buffer the deleterious effect of mutations on folding stability and thus open up a sequence space for efficient and simultaneous optimization of multiple molecular traits determining the cellular fitness. As a result, chaperones dramatically accelerate adaptation dynamics.
Affiliation(s)
- Murat Çetinbaş
- Department of Chemistry and Chemical Biology, Harvard University, Cambridge, Massachusetts, United States of America
- Eugene I. Shakhnovich
- Department of Chemistry and Chemical Biology, Harvard University, Cambridge, Massachusetts, United States of America
|
29
|
Lobkovsky AE, Wolf YI, Koonin EV. Gene frequency distributions reject a neutral model of genome evolution. Genome Biol Evol 2013; 5:233-42. [PMID: 23315380 PMCID: PMC3595032 DOI: 10.1093/gbe/evt002] [Citation(s) in RCA: 54] [Impact Index Per Article: 4.5] [Indexed: 12/30/2022] Open
Abstract
Evolution of prokaryotes involves extensive loss and gain of genes, which lead to substantial differences in the gene repertoires even among closely related organisms. Through a wide range of phylogenetic depths, gene frequency distributions in prokaryotic pangenomes bear a characteristic, asymmetrical U-shape, with a core of (nearly) universal genes, a “shell” of moderately common genes, and a “cloud” of rare genes. We employ mathematical modeling to investigate evolutionary processes that might underlie this universal pattern. Gene frequency distributions for almost 400 groups of 10 bacterial or archaeal species each over a broad range of evolutionary distances were fit to steady-state, infinite allele models based on the distribution of gene replacement rates and the phylogenetic tree relating the species in each group. The fits of the theoretical frequency distributions to the empirical ones yield model parameters and estimates of the goodness of fit. Using the Akaike Information Criterion, we show that the neutral model of genome evolution, with the same replacement rate for all genes, can be confidently rejected. Of the three tested models with purifying selection, the one in which the distribution of replacement rates is derived from a stochastic population model with additive per-gene fitness yields the best fits to the data. The selection strength estimated from the fits declines with evolutionary divergence while staying well outside the neutral regime. These findings indicate that, unlike some other universal distributions of genomic variables, for example, the distribution of paralogous gene family membership, the gene frequency distribution is substantially affected by selection.
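The model comparison step mentioned in this abstract relies on the Akaike Information Criterion, AIC = 2k - 2 ln L, where k is the number of fitted parameters and L the maximized likelihood. A minimal sketch follows; the model names, log-likelihoods, and parameter counts are hypothetical placeholders, not values from the paper.

```python
def aic(log_likelihood, n_params):
    """Akaike Information Criterion: lower values indicate a better trade-off."""
    return 2 * n_params - 2 * log_likelihood


def rank_models(fits):
    """Rank models by AIC difference from the best model.

    `fits` maps a model name to (maximized log-likelihood, number of parameters).
    """
    scores = {name: aic(ll, k) for name, (ll, k) in fits.items()}
    best = min(scores.values())
    return sorted(((name, score - best) for name, score in scores.items()),
                  key=lambda item: item[1])


if __name__ == "__main__":
    # Hypothetical fits: a one-parameter neutral model vs. a three-parameter selection model.
    fits = {"neutral": (-120.0, 1), "selection": (-100.0, 3)}
    for name, delta in rank_models(fits):
        print(f"{name}: delta AIC = {delta:.1f}")
```

A large AIC difference in favour of a model with more parameters means the improvement in fit outweighs the penalty for the extra parameters, which is the sense in which a simpler model can be rejected.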
Affiliation(s)
- Alexander E Lobkovsky
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, Maryland, USA
|
30
|
Lobkovsky AE, Wolf YI, Koonin EV. Quantifying the similarity of monotonic trajectories in rough and smooth fitness landscapes. Mol Biosyst 2013; 9:1627-31. [PMID: 23460358 DOI: 10.1039/c3mb25553k] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.4] [Indexed: 11/21/2022]
Abstract
When selection is strong and mutations are rare, evolution can be thought of as an uphill trajectory in a rugged fitness landscape. In this context the fitness landscape is a directed acyclic graph in which nodes are genotypes and edges lead from lower to higher fitness genotypes that differ by a single mutation. Because the space of genotypes is vastly multi-dimensional, classification of fitness landscapes is challenging. Many proposed summary characteristics of fitness landscapes attempt to quantify biologically relevant and intuitive notions such as roughness or peak accessibility in alternative ways. Here we explore, in different types of landscapes, the behavior of the recently introduced mean path divergence which quantifies the degree of similarity among evolutionary trajectories with the same endpoints. We find that monotonic trajectories in empirical and model fitness landscapes are significantly more constrained, with low median path divergence, than those in purely additive landscapes. By contrast, transcription factor sequence specificity (aptamer binding affinity) landscapes are markedly smoother and allow substantial variability in monotonic paths that can be greater than that in fully additive landscapes. We propose that the smoothness of the specificity landscapes is a consequence of the simple dependence of the transcription factor binding affinity on the aptamer sequence in contrast to the complex sequence-fitness mapping in folding landscapes.
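The graph picture in this abstract (genotypes as nodes, edges pointing from lower to higher fitness across single mutations) can be sketched for a small binary genotype space. The code below only builds that directed acyclic graph and counts monotonic uphill paths between two genotypes; it does not compute the mean path divergence, whose definition is not given here. The fitness functions and all names are hypothetical.

```python
from itertools import product


def build_landscape_dag(n_sites, fitness):
    """Map each binary genotype to its single-mutation neighbors of strictly higher fitness."""
    edges = {}
    for genotype in product((0, 1), repeat=n_sites):
        uphill = []
        for i in range(n_sites):
            neighbor = genotype[:i] + (1 - genotype[i],) + genotype[i + 1:]
            if fitness(neighbor) > fitness(genotype):
                uphill.append(neighbor)
        edges[genotype] = uphill
    return edges


def count_monotonic_paths(edges, start, end):
    """Count uphill paths from start to end; the graph is acyclic, so memoization is safe."""
    memo = {}

    def walk(node):
        if node == end:
            return 1
        if node not in memo:
            memo[node] = sum(walk(nxt) for nxt in edges[node])
        return memo[node]

    return walk(start)


if __name__ == "__main__":
    # Hypothetical additive landscape: every mutation is beneficial.
    effects = (1.0, 2.0, 4.0)
    additive = lambda g: sum(e for bit, e in zip(g, effects) if bit)
    dag = build_landscape_dag(3, additive)
    print(count_monotonic_paths(dag, (0, 0, 0), (1, 1, 1)))
```

In a purely additive landscape with only beneficial mutations, every ordering of the mutations is an accessible path, whereas sign epistasis removes edges and so constrains the set of monotonic trajectories.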
Affiliation(s)
- Alexander E Lobkovsky
- National Center for Biotechnology Information, National Institutes of Health, Bethesda, MD 20894, USA
|
31
|
Differential requirements for mRNA folding partially explain why highly expressed proteins evolve slowly. Proc Natl Acad Sci U S A 2013; 110:E678-86. [PMID: 23382244 DOI: 10.1073/pnas.1218066110] [Citation(s) in RCA: 92] [Impact Index Per Article: 7.7] [Indexed: 11/18/2022] Open
Abstract
The cause of the tremendous among-protein variation in the rate of sequence evolution is a central subject of molecular evolution. Expression level has been identified as a leading determinant of this variation among genes encoded in the same genome, but the underlying mechanisms are not fully understood. We here propose and demonstrate that a requirement for stronger folding of more abundant mRNAs results in slower evolution of more highly expressed genes and proteins. Specifically, we show that: (i) the higher the expression level of a gene, the greater the selective pressure for its mRNA to fold; (ii) random mutations are more likely to decrease mRNA folding when occurring in highly expressed genes than in lowly expressed genes; and (iii) amino acid substitution rate is negatively correlated with mRNA folding strength, with or without the control of expression level. Furthermore, synonymous (d(S)) and nonsynonymous (d(N)) nucleotide substitution rates are both negatively correlated with mRNA folding strength. However, counterintuitively, d(S) and d(N) are differentially constrained by selection for mRNA folding, resulting in a significant correlation between mRNA folding strength and d(N)/d(S), even when gene expression level is controlled. The direction and magnitude of this correlation is determined primarily by the G+C frequency at third codon positions. Together, these findings explain why highly expressed genes evolve slowly, demonstrate a major role of natural selection at the mRNA level in constraining protein evolution, and reveal a previously unrecognized and unexpected form of nonprotein-level selection that impacts d(N)/d(S).
|
32
|
|
33
|
Lobkovsky AE, Koonin EV. Replaying the tape of life: quantification of the predictability of evolution. Front Genet 2012; 3:246. [PMID: 23226153 PMCID: PMC3509945 DOI: 10.3389/fgene.2012.00246] [Citation(s) in RCA: 66] [Impact Index Per Article: 5.1] [Received: 07/17/2012] [Accepted: 10/23/2012] [Indexed: 12/11/2022] Open
Abstract
The question whether adaptation follows a deterministic route largely prescribed by the environment or can proceed along a large number of alternative trajectories has engaged extensive research over the recent years. Experimental evolution studies enabled by advances in high throughput techniques for genome sequencing and manipulation, along with increasingly detailed mathematical modeling of fitness landscapes, are beginning to allow quantitative exploration of the repeatability of evolutionary trajectories. It is becoming clear that evolutionary trajectories in static correlated fitness landscapes are substantially non-random but the relative contributions of determinism and stochasticity in the evolution of specific phenotypes strongly depend on the specific conditions, particularly the magnitude of the selective pressure and the number of available beneficial mutations.
Affiliation(s)
- Alexander E Lobkovsky
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD, USA
34
Protein biophysics explains why highly abundant proteins evolve slowly. Cell Rep 2012; 2:249-56. [PMID: 22938865 DOI: 10.1016/j.celrep.2012.06.022] [Citation(s) in RCA: 92] [Impact Index Per Article: 7.1] [Received: 03/20/2012] [Revised: 05/03/2012] [Accepted: 06/21/2012] [Indexed: 12/26/2022] Open
Abstract
The consistent observation across all kingdoms of life that highly abundant proteins evolve slowly demonstrates that cellular abundance is a key determinant of protein evolutionary rate. However, other empirical findings, such as the broad distribution of evolutionary rates, suggest that additional variables determine the rate of protein evolution. Here, we report that under the global selection against the cytotoxic effects of misfolded proteins, folding stability (ΔG), simultaneous with abundance, is a causal variable of evolutionary rate. Using both theoretical analysis and multiscale simulations, we demonstrate that the anticorrelation between the premutation ΔG and the arising mutational effect (ΔΔG), purely biophysical in origin, is a necessary requirement for abundance-evolutionary rate covariation. Additionally, we predict and demonstrate in bacteria that the strength of abundance-evolutionary rate correlation depends on the divergence time separating reference genomes. Altogether, these results highlight the intrinsic role of protein biophysics in the emerging universal patterns of molecular evolution.
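As a rough illustration of the quantities named in this abstract (not the authors' model), the sketch below uses the textbook two-state folding relation to turn a folding stability ΔG and a mutational shift ΔΔG into an expected number of unfolded copies. The sign convention (ΔG < 0 means stable), the function names, and all numeric values are assumptions chosen for the example.

```python
import math

KT_KCAL_PER_MOL = 0.593  # approximate kT at 298 K, in kcal/mol


def fraction_unfolded(delta_g: float, kt: float = KT_KCAL_PER_MOL) -> float:
    """Two-state model: P_unfolded = 1 / (1 + exp(-dG / kT)).

    delta_g is G_folded - G_unfolded, so more negative means more stable.
    """
    return 1.0 / (1.0 + math.exp(-delta_g / kt))


def unfolded_copies(abundance: float, delta_g: float, ddg: float = 0.0) -> float:
    """Expected unfolded molecules after a mutation shifts stability by ddg."""
    return abundance * fraction_unfolded(delta_g + ddg)


# Hypothetical numbers: the same destabilizing mutation (ddG = +1 kcal/mol)
# applied to a low-abundance and a high-abundance protein of equal stability.
low = unfolded_copies(abundance=1e2, delta_g=-5.0, ddg=1.0)
high = unfolded_copies(abundance=1e5, delta_g=-5.0, ddg=1.0)
```

Under these assumptions the unfolded load scales linearly with abundance, so an identical ΔΔG produces far more unfolded molecules for the abundant protein; that is the intuition behind selection against misfolding acting more strongly on abundant proteins.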
35
Abstract
Much molecular-evolution research is concerned with sequence analysis. Yet these sequences represent real, three-dimensional molecules with complex structure and function. Here I highlight a growing trend in the field to incorporate molecular structure and function into computational molecular-evolution work. I consider three focus areas: reconstruction and analysis of past evolutionary events, such as phylogenetic inference or methods to infer selection pressures; development of toy models and simulations to identify fundamental principles of molecular evolution; and atom-level, highly realistic computational modeling of molecular structure and function aimed at making predictions about possible future evolutionary events.
Affiliation(s)
- Claus O Wilke
- Institute of Cell and Molecular Biology, The University of Texas at Austin, Austin, Texas, United States of America.
36
Mathematical modelling of transformations of asymmetrically distributed biological data: An application to a quantitative classification of spiny neurons of the human putamen. J Theor Biol 2012; 302:81-8. [DOI: 10.1016/j.jtbi.2012.02.027] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.5] [Received: 01/26/2012] [Revised: 02/24/2012] [Accepted: 02/28/2012] [Indexed: 11/23/2022]
37
Toll-Riera M, Bostick D, Albà MM, Plotkin JB. Structure and age jointly influence rates of protein evolution. PLoS Comput Biol 2012; 8:e1002542. [PMID: 22693443 PMCID: PMC3364943 DOI: 10.1371/journal.pcbi.1002542] [Citation(s) in RCA: 17] [Impact Index Per Article: 1.3] [Received: 01/27/2012] [Accepted: 04/17/2012] [Indexed: 12/01/2022] Open
Abstract
What factors determine a protein's rate of evolution are actively debated. Especially unclear is the relative role of intrinsic factors of present-day proteins versus historical factors such as protein age. Here we study the interplay of structural properties and evolutionary age, as determinants of protein evolutionary rate. We use a large set of one-to-one orthologs between human and mouse proteins, with mapped PDB structures. We report that previously observed structural correlations also hold within each age group – including relationships between solvent accessibility, designability, and evolutionary rates. However, age also plays a crucial role: age modulates the relationship between solvent accessibility and rate. Additionally, younger proteins, despite being less designable, tend to evolve faster than older proteins. We show that previously reported relationships between age and rate cannot be explained by structural biases among age groups. Finally, we introduce a knowledge-based potential function to study the stability of proteins through large-scale computation. We find that older proteins are more stable for their native structure, and more robust to mutations, than younger ones. Our results underscore that several determinants, both intrinsic and historical, can interact to determine rates of protein evolution.

Rates of protein evolution vary dramatically within and between organisms. But the factors that determine a protein's evolutionary rate are still under debate, despite extensive studies over the past decade. Several determinants have been proposed, for example gene expression, the importance of the gene for the organism, the number of physical or genetic interactions it has, its structural characteristics, or when it originated. Here we study how age and structural characteristics interact with one another to influence evolutionary rates. We use a set of one-to-one orthologs of human and mouse proteins, with known crystal structures. We find that these two determinants interact: for example, the age of a protein modulates how its structure correlates with evolutionary rate. Nonetheless, the influence of age on evolutionary rate cannot be explained by its interplay with structure.
Affiliation(s)
- Macarena Toll-Riera
- Evolutionary Genomics Group, Fundació Institut Municipal d'Investigació Mèdica (FIMIM)- Universitat Pompeu Fabra (UPF), Barcelona, Spain
- Department of Biology, University of Pennsylvania, Philadelphia, Pennsylvania, United States of America
- David Bostick
- Department of Biology, University of Pennsylvania, Philadelphia, Pennsylvania, United States of America
- M. Mar Albà
- Evolutionary Genomics Group, Fundació Institut Municipal d'Investigació Mèdica (FIMIM)- Universitat Pompeu Fabra (UPF), Barcelona, Spain
- Catalan Institution for Research and Advanced Studies (ICREA), Barcelona, Spain
- Joshua B. Plotkin
- Department of Biology, University of Pennsylvania, Philadelphia, Pennsylvania, United States of America
38
Analytic markovian rates for generalized protein structure evolution. PLoS One 2012; 7:e34228. [PMID: 22693543 PMCID: PMC3367531 DOI: 10.1371/journal.pone.0034228] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.4] [Received: 12/16/2011] [Accepted: 02/26/2012] [Indexed: 12/24/2022] Open
Abstract
A general understanding of the complex phenomenon of protein evolution requires the accurate description of the constraints that define the sub-space of proteins with mutations that do not appreciably reduce the fitness of the organism. Such constraints can have multiple origins; in this work we present a model for constrained evolutionary trajectories represented by a Markovian process throughout a set of protein-like structures artificially constructed to be topological intermediates between the structures of two naturally occurring proteins. The number and type of intermediate steps define how constrained the total evolutionary process is. By using a coarse-grained representation for the protein structures, we derive an analytic formulation of the transition rates between each of the intermediate structures. The results indicate that compact structures with a high number of hydrogen bonds are more probable and have a higher likelihood to arise during evolution. Knowledge of the transition rates allows for the study of complex evolutionary pathways represented by trajectories through a set of intermediate structures.
39
Lobkovsky AE, Wolf YI, Koonin EV. Predictability of evolutionary trajectories in fitness landscapes. PLoS Comput Biol 2011; 7:e1002302. [PMID: 22194675 PMCID: PMC3240586 DOI: 10.1371/journal.pcbi.1002302] [Citation(s) in RCA: 59] [Impact Index Per Article: 4.2] [Received: 06/03/2011] [Accepted: 10/29/2011] [Indexed: 11/19/2022] Open
Abstract
Experimental studies on enzyme evolution show that only a small fraction of all possible mutation trajectories are accessible to evolution. However, these experiments deal with individual enzymes and explore a tiny part of the fitness landscape. We report an exhaustive analysis of fitness landscapes constructed with an off-lattice model of protein folding where fitness is equated with robustness to misfolding. This model mimics the essential features of the interactions between amino acids, is consistent with the key paradigms of protein folding and reproduces the universal distribution of evolutionary rates among orthologous proteins. We introduce mean path divergence as a quantitative measure of the degree to which the starting and ending points determine the path of evolution in fitness landscapes. Global measures of landscape roughness are good predictors of path divergence in all studied landscapes: the mean path divergence is greater in smooth landscapes than in rough ones. The model-derived and experimental landscapes are significantly smoother than random landscapes and resemble additive landscapes perturbed with moderate amounts of noise; thus, these landscapes are substantially robust to mutation. The model landscapes show a deficit of suboptimal peaks even compared with noisy additive landscapes with similar overall roughness. We suggest that smoothness and the substantial deficit of peaks in the fitness landscapes of protein evolution are fundamental consequences of the physics of protein folding.

Is evolution deterministic, hence predictable, or stochastic, that is unpredictable? What would happen if one could “replay the tape of evolution”: will the outcomes of evolution be completely different or is evolution so constrained that history will be repeated? Arguably, these questions are among the most intriguing and most difficult in evolutionary biology. In other words, the predictability of evolution depends on the fraction of the trajectories on fitness landscapes that are accessible for evolutionary exploration. Because direct experimental investigation of fitness landscapes is technically challenging, the available studies only explore a minuscule portion of the landscape for individual enzymes. We therefore sought to investigate the topography of fitness landscapes within the framework of a previously developed model of protein folding and evolution where fitness is equated with robustness to misfolding. We show that model-derived and experimental landscapes are significantly smoother than random landscapes and resemble moderately perturbed additive landscapes; thus, these landscapes are substantially robust to mutation. The model landscapes show a deficit of suboptimal peaks even compared with noisy additive landscapes with similar overall roughness. Thus, the smoothness and substantial deficit of peaks in fitness landscapes of protein evolution could be fundamental consequences of the physics of protein folding.
Affiliation(s)
- Alexander E. Lobkovsky
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, Maryland, United States of America
- Yuri I. Wolf
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, Maryland, United States of America
- Eugene V. Koonin
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, Maryland, United States of America
40
Galzitskaya OV, Bogatyreva NS, Glyakina AV. Bacterial proteins fold faster than eukaryotic proteins with simple folding kinetics. Biochemistry (Mosc) 2011; 76:225-35. [PMID: 21568856 DOI: 10.1134/s000629791102009x] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.2] [Indexed: 11/23/2022]
Abstract
Protein domain frequency and distribution among kingdoms was statistically analyzed using the SCOP structural database. It appeared that among chosen protein domains with the best resolution, eukaryotic proteins more often belong to α-helical and β-structural proteins, while proteins of bacterial origin belong to α/β structural class. Statistical analysis of folding rates of 73 proteins with known experimental data revealed that bacterial proteins with simple kinetics (23 proteins) exhibit a higher folding rate compared to eukaryotic proteins with simple folding kinetics (27 proteins). Analysis of protein domain amino acid composition showed that the frequency of amino acid residues in proteins of eukaryotic and bacterial origin is different for proteins with simple and complex folding kinetics.
Affiliation(s)
- O V Galzitskaya
- Institute of Protein Research, Russian Academy of Sciences, Pushchino, Moscow Region, Russia.
41
Managadze D, Rogozin IB, Chernikova D, Shabalina SA, Koonin EV. Negative correlation between expression level and evolutionary rate of long intergenic noncoding RNAs. Genome Biol Evol 2011; 3:1390-404. [PMID: 22071789 PMCID: PMC3242500 DOI: 10.1093/gbe/evr116] [Citation(s) in RCA: 74] [Impact Index Per Article: 5.3] [Indexed: 12/11/2022] Open
Abstract
Mammalian genomes contain numerous genes for long noncoding RNAs (lncRNAs). The functions of the lncRNAs remain largely unknown but their evolution appears to be constrained by purifying selection, albeit relatively weakly. To gain insights into the mode of evolution and the functional range of the lncRNAs, they can be compared with much better characterized protein-coding genes. The evolutionary rate of the protein-coding genes shows a universal negative correlation with expression: highly expressed genes are on average more conserved during evolution than the genes with lower expression levels. This correlation was conceptualized in the misfolding-driven protein evolution hypothesis according to which misfolding is the principal cost incurred by protein expression. We sought to determine whether long intergenic ncRNAs (lincRNAs) follow the same evolutionary trend and indeed detected a moderate but statistically significant negative correlation between the evolutionary rate and expression level of human and mouse lincRNA genes. The magnitude of the correlation for the lincRNAs is similar to that for equal-sized sets of protein-coding genes with similar levels of sequence conservation. Additionally, the expression level of the lincRNAs is significantly and positively correlated with the predicted extent of lincRNA molecule folding (base-pairing); however, the contributions of evolutionary rates and folding to the expression level are independent. Thus, the anticorrelation between evolutionary rate and expression level appears to be a general feature of gene evolution that might be caused by similar deleterious effects of protein and RNA misfolding and/or other factors, for example, the number of interacting partners of the gene product.
Affiliation(s)
- David Managadze
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, Maryland, USA
42
Abstract
Research in quantitative evolutionary genomics and systems biology led to the discovery of several universal regularities connecting genomic and molecular phenomic variables. These universals include the log-normal distribution of the evolutionary rates of orthologous genes; the power law–like distributions of paralogous family size and node degree in various biological networks; the negative correlation between a gene's sequence evolution rate and expression level; and differential scaling of functional classes of genes with genome size. The universals of genome evolution can be accounted for by simple mathematical models similar to those used in statistical physics, such as the birth-death-innovation model. These models do not explicitly incorporate selection; therefore, the observed universal regularities do not appear to be shaped by selection but rather are emergent properties of gene ensembles. Although a complete physical theory of evolutionary biology is inconceivable, the universals of genome evolution might qualify as “laws of evolutionary genomics” in the same sense “law” is understood in modern physics.
Affiliation(s)
- Eugene V Koonin
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, Maryland, United States of America.
43
Hamacher K. Free energy of contact formation in proteins: efficient computation in the elastic network approximation. Phys Rev E Stat Nonlin Soft Matter Phys 2011; 84:016703. [PMID: 21867339 DOI: 10.1103/physreve.84.016703] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.1] [Received: 03/23/2011] [Indexed: 05/31/2023]
Abstract
Biomolecular simulations have become a major tool in understanding biomolecules and their complexes. However, one can typically only investigate a few mutants or scenarios due to the severe computational demands of such simulations, leading to a great interest in method development to overcome this restriction. One way to achieve this is to reduce the complexity of the systems by an approximation of the forces acting upon the constituents of the molecule. The harmonic approximation used in elastic network models simplifies the physical complexity to the most reduced dynamics of these molecular systems. The reduced polymer modeled this way is typically comprised of mass points representing coarse-grained versions of, e.g., amino acids. In this work, we show how the computation of free energy contributions of contacts between two residues within the molecule can be reduced to a simple lookup operation in a precomputable matrix. Being able to compute such contributions is of great importance: protein design or molecular evolution changes introduce perturbations to these pair interactions, so we need to understand their impact. Perturbation to the interactions occurs due to randomized and fixated changes (in molecular evolution) or designed modifications of the protein structures (in bioengineering). These perturbations are modifications in the topology and the strength of the interactions modeled by the elastic network models. We apply the new algorithm to (1) the bovine trypsin inhibitor, a well-known enzyme in biomedicine, and show the connection to folding properties and the hydrophobic collapse hypothesis and (2) the serine proteinase inhibitor CI-2 and show the correlation to Φ values to characterize folding importance. Furthermore, we discuss the computational complexity and show empirical results for the average case, sampled over a library of 77 structurally diverse proteins. 
We found a relative speedup of up to 10 000-fold for large proteins with respect to repeated application of the initial model.
Affiliation(s)
- Kay Hamacher
- TU Darmstadt, Departments of Biology, Physics, and Computer Science, Darmstadt, Germany.
44
Koonin EV, Wolf YI. Constraints and plasticity in genome and molecular-phenome evolution. Nat Rev Genet 2011; 11:487-98. [PMID: 20548290 DOI: 10.1038/nrg2810] [Citation(s) in RCA: 106] [Impact Index Per Article: 7.6] [Indexed: 02/01/2023]
Abstract
Multiple constraints variously affect different parts of the genomes of diverse life forms. The selective pressures that shape the evolution of viral, archaeal, bacterial and eukaryotic genomes differ markedly, even among relatively closely related animal and bacterial lineages; by contrast, constraints affecting protein evolution seem to be more universal. The constraints that shape the evolution of genomes and phenomes are complemented by the plasticity and robustness of genome architecture, expression and regulation. Taken together, these findings are starting to reveal complex networks of evolutionary processes that must be integrated to attain a new synthesis of evolutionary biology.
Affiliation(s)
- Eugene V Koonin
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, Maryland 20894, USA.
45
The relationship between relative solvent accessibility and evolutionary rate in protein evolution. Genetics 2011; 188:479-88. [PMID: 21467571 DOI: 10.1534/genetics.111.128025] [Citation(s) in RCA: 87] [Impact Index Per Article: 6.2] [Indexed: 11/18/2022] Open
Abstract
Recent work with Saccharomyces cerevisiae shows a linear relationship between the evolutionary rate of sites and the relative solvent accessibility (RSA) of the corresponding residues in the folded protein. Here, we aim to develop a mathematical model that can reproduce this linear relationship. We first demonstrate that two models that both seem reasonable choices (a simple model in which selection strength correlates with RSA and a more complex model based on RSA-dependent amino acid distributions) fail to reproduce the observed relationship. We then develop a model on the basis of observed site-specific amino acid distributions and show that this model behaves appropriately. We conclude that evolutionary rates are directly linked to the distribution of amino acids at individual sites. Because of this link, any future insight into the biophysical mechanisms that determine amino acid distributions will improve our understanding of evolutionary rates.
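To make the "linear relationship" between site-wise evolutionary rate and RSA concrete, the following sketch fits a straight line by ordinary least squares. It is only an illustration of how such a relationship might be checked, not the model developed in the paper; the helper `fit_line` and the data values are invented for the example and carry no empirical meaning.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Invented, exactly linear toy data: RSA bins and relative rates per bin.
rsa = [0.0, 0.2, 0.4, 0.6, 0.8, 1.0]
rate = [0.5 + 1.0 * x for x in rsa]

slope, intercept = fit_line(rsa, rate)
```

With real data one would bin sites by RSA, average the per-site rates within each bin, and inspect the residuals of the fit to judge whether a straight line is adequate.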
46
Marrero Coto J, Ehrenhofer-Murray AE, Pons T, Siebers B. Functional analysis of archaeal MBF1 by complementation studies in yeast. Biol Direct 2011; 6:18. [PMID: 21392374 PMCID: PMC3062615 DOI: 10.1186/1745-6150-6-18] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.6] [Received: 02/28/2011] [Accepted: 03/10/2011] [Indexed: 11/21/2022] Open
Abstract
Background: Multiprotein-bridging factor 1 (MBF1) is a transcriptional co-activator that bridges a sequence-specific activator (basic-leucine zipper (bZIP) like proteins (e.g. Gcn4 in yeast) or steroid/nuclear-hormone receptor family (e.g. FTZ-F1 in insect)) and the TATA-box binding protein (TBP) in Eukaryotes. MBF1 is absent in Bacteria, but is well-conserved in Eukaryotes and Archaea and harbors a C-terminal Cro-like Helix Turn Helix (HTH) domain, which is the only highly conserved, classical HTH domain that is vertically inherited in all Eukaryotes and Archaea. The main structural difference between archaeal MBF1 (aMBF1) and eukaryotic MBF1 is the presence of a Zn ribbon motif in aMBF1. In addition, MBF1-interacting activators are absent in the archaeal domain. To study the function and therefore the evolutionary conservation of MBF1 and its single domains, complementation studies in yeast (mbf1Δ) as well as domain swap experiments between aMBF1 and yMbf1 were performed.

Results: In contrast to previous reports for eukaryotic MBF1 (i.e. Arabidopsis thaliana, insect and human), the two archaeal MBF1 orthologs, TMBF1 from the hyperthermophile Thermoproteus tenax and MMBF1 from the mesophile Methanosarcina mazei, were not functional for complementation of a Saccharomyces cerevisiae mutant lacking Mbf1 (mbf1Δ). Of twelve chimeric proteins representing different combinations of the N-terminal, core domain, and the C-terminal extension from yeast and aMBF1, only the chimeric MBF1 comprising the yeast N-terminal and core domain fused to the archaeal C-terminal part was able to restore full wild-type activity of MBF1. However, as reported previously for Bombyx mori, the C-terminal part of yeast Mbf1 was shown to be not essential for function. In addition, phylogenetic analyses revealed a common distribution of MBF1 in all Archaea with available genome sequence, except for two of the three Thaumarchaeota: Cenarchaeum symbiosum A and Nitrosopumilus maritimus SCM1.

Conclusions: The absence of MBF1-interacting activators in the archaeal domain, the presence of a Zn ribbon motif in the divergent N-terminal domain of aMBF1 and the complementation experiments using archaeal-yeast chimeric proteins presented here suggest that archaeal MBF1 is not able to functionally interact with the transcription machinery and/or Gcn4 of S. cerevisiae. Based on modeling and structural prediction it is tempting to speculate that aMBF1 might act as a single regulator or non-essential transcription factor, which directly interacts with DNA via the positively charged linker or the basal transcription machinery via its Zn ribbon motif and the HTH domain. However, alternative functions in ribosome biosynthesis and/or functionality have also been discussed and therefore further experiments are required to unravel the function of MBF1 in Archaea.

Reviewers: This article was reviewed by William Martin, Patrick Forterre, John van der Oost and Fabian Blombach (nominated by Eugene V Koonin (United States)). For the full reviews, please go to the Reviewer's Reports section.
Affiliation(s)
- Jeannette Marrero Coto
- Faculty of Chemistry, Biofilm Centre, Molecular Enzyme Technology and Biochemistry, University of Duisburg-Essen, Universitätsstr. 5, (S05 V03 F41), 45141 Essen, Germany
47
Yang L, Gaut BS. Factors that contribute to variation in evolutionary rate among Arabidopsis genes. Mol Biol Evol 2011; 28:2359-69. [PMID: 21389272 DOI: 10.1093/molbev/msr058] [Citation(s) in RCA: 125] [Impact Index Per Article: 8.9] [Indexed: 12/23/2022] Open
Abstract
Surprisingly, few studies have described evolutionary rate variation among plant nuclear genes, with little investigation of the causes of rate variation. Here, we describe evolutionary rates for 11,492 ortholog pairs between Arabidopsis thaliana and A. lyrata and investigate possible contributors to rate variation among these genes. Rates of evolution at synonymous sites vary along chromosomes, suggesting that mutation rates vary on genomic scales, perhaps as a function of recombination rate. Rates of evolution at nonsynonymous sites correlate most strongly with expression patterns, but they also vary as to whether a gene is duplicated and retained after a whole-genome duplication (WGD) event. WGD genes evolve more slowly, on average, than nonduplicated genes and non-WGD duplicates. We hypothesize that levels and patterns of expression are not only the major determinants that explain nonsynonymous rate variation among genes but also a critical determinant of gene retention after duplication.
Affiliation(s)
- Liang Yang
- Department of Ecology and Evolutionary Biology, University of California Irvine, Irvine, USA
48
Yang JR, Zhuang SM, Zhang J. Impact of translational error-induced and error-free misfolding on the rate of protein evolution. Mol Syst Biol 2011; 6:421. [PMID: 20959819 PMCID: PMC2990641 DOI: 10.1038/msb.2010.78] [Citation(s) in RCA: 76] [Impact Index Per Article: 5.4] [Received: 05/11/2010] [Accepted: 08/31/2010] [Indexed: 11/26/2022] Open
Abstract
Theoretical calculations suggest that, in addition to translational error-induced protein misfolding, a non-negligible fraction of misfolded proteins are error free. We propose that the anticorrelation between the expression level of a protein and its rate of sequence evolution be explained by an overarching protein-misfolding-avoidance hypothesis that includes selection against both error-induced and error-free protein misfolding, and verify this model by a molecular-level evolutionary simulation. We provide strong empirical evidence for the protein-misfolding-avoidance hypothesis, including a positive correlation between protein expression level and stability, enrichment of misfolding-minimizing codons and amino acids in highly expressed genes, and stronger evolutionary conservation of residues in which nonsynonymous changes are more likely to increase protein misfolding.
The rate of protein sequence evolution has long been of central interest to molecular evolutionists. Different proteins of the same species evolve at vastly different rates, which is commonly explained by a variation in functional constraint among different proteins (Kimura and Ohta, 1974). However, it is unclear how to quantify the functional constraint of a protein from the knowledge of its function. In the past decade, various types of genomic data from model organisms have been examined to look for the determinants of the rate of protein sequence evolution. The most unexpected discovery was a very strong anticorrelation between the expression level and evolutionary rate of a protein (E–R anticorrelation) (Pal et al, 2001). The prevailing explanation of the E–R anticorrelation is the translational robustness hypothesis (Drummond et al, 2005). This hypothesis posits that mistranslation induces protein misfolding, which is toxic to cells (Figure 1). Consequently, highly expressed proteins are under stronger pressures to be translationally robust and thus are more constrained in sequence evolution. However, the impact of the other source of misfolded proteins, translational error-free proteins (Figure 1), has not been evaluated. By theoretical calculation, computer simulation, and empirical data analysis, we examined the role of selection against both error-induced and error-free protein misfolding in creating the E–R correlation. Our theoretical calculations suggested that a non-negligible fraction of misfolded proteins are error free. We estimated that when a protein is not very stable, on average ∼20% of misfolded molecules are error free. However, when a protein is very stable, this fraction reduces to ∼5%, which is probably a result of natural selection against protein misfolding. 
We conducted a molecular-level evolutionary simulation (Figure 2A) using three different schemes: error-induced misfolding only, error-free misfolding only, and both types of misfolding. As expected, results from the first simulation are similar to those from a previous study that considers only error-induced misfolding (Drummond and Wilke, 2008). Interestingly, the second and third simulations can also generate the same patterns, including a positive correlation between the protein expression level and the unfolding energy (ΔG) of the error-free protein (Figure 2B), a negative correlation between the expression level and the fraction of protein molecules that misfold after being mistranslated (Figure 2C), a negative correlation between ΔG and the evolutionary rate (Figure 2D), and a negative correlation between the expression level and the evolutionary rate (i.e., the E–R anticorrelation) (Figure 2E). Furthermore, we found that selection against protein misfolding is more effective in reducing error-free misfolding than error-induced misfolding. Based on these results, we propose that an overarching protein-misfolding-avoidance hypothesis that includes both sources of misfolding is superior to the prevailing translational robustness hypothesis, which considers only error-induced misfolding.
We tested three key predictions of the protein-misfolding-avoidance hypothesis using yeast data. First, we showed that, consistent with our prediction, a positive correlation exists between the protein expression level and stability, which is measured by the unfolding energy or melting temperature. In addition, protein expression level is negatively correlated with protein aggregation propensity. Second, we found that codons minimizing protein misfolding are used more frequently in highly expressed proteins than in lowly expressed ones. Third, we showed that, within the same protein, amino acid residues in which random nonsynonymous mutations are more likely to increase protein misfolding are evolutionarily more conserved. Together, these results provide unambiguous evidence that avoidance of both error-induced and error-free protein misfolding is a major source of the E–R anticorrelation and that protein stability and mistranslation have important roles in protein evolution.
What determines the rate of protein evolution is a fundamental question in biology. Recent genomic studies revealed a surprisingly strong anticorrelation between the expression level of a protein and its rate of sequence evolution. This observation is currently explained by the translational robustness hypothesis, in which the toxicity of translational error-induced protein misfolding selects for higher translational robustness of more abundant proteins, which constrains sequence evolution. However, the impact of error-free protein misfolding has not been evaluated. We estimate that a non-negligible fraction of misfolded proteins are error free and demonstrate by a molecular-level evolutionary simulation that selection against protein misfolding results in a greater reduction of error-free misfolding than error-induced misfolding. Thus, an overarching protein-misfolding-avoidance hypothesis that includes both sources of misfolding is superior to the translational robustness hypothesis. We show that misfolding-minimizing amino acids are preferentially used in highly abundant yeast proteins and that these residues are evolutionarily more conserved than other residues of the same proteins. These findings provide unambiguous support for the role of protein-misfolding-avoidance in determining the rate of protein sequence evolution.
Affiliation(s)
- Jian-Rong Yang
- Key Laboratory of Gene Engineering of the Ministry of Education, State Key Laboratory of Biocontrol, School of Life Sciences, Sun Yat-sen University, Guangzhou, PR China
|
49
|
Elias M. Patterns and processes in the evolution of the eukaryotic endomembrane system. Mol Membr Biol 2010; 27:469-89. [DOI: 10.3109/09687688.2010.521201] [Citation(s) in RCA: 23] [Impact Index Per Article: 1.5] [Indexed: 12/19/2022]
|
50
|
Vishnoi A, Kryazhimskiy S, Bazykin GA, Hannenhalli S, Plotkin JB. Young proteins experience more variable selection pressures than old proteins. Genome Res 2010; 20:1574-81. [PMID: 20921233 DOI: 10.1101/gr.109595.110] [Citation(s) in RCA: 58] [Impact Index Per Article: 3.9] [Indexed: 02/06/2023]
Abstract
It is well known that young proteins tend to experience weaker purifying selection and evolve more quickly than old proteins. Here, we show that, in addition, young proteins tend to experience more variable selection pressures over time than old proteins. We demonstrate this pattern in three independent taxonomic groups: yeast, Drosophila, and mammals. The increased variability of selection pressures on young proteins is highly significant even after controlling for the fact that young proteins are typically shorter and experience weaker purifying selection than old proteins. The majority of our results are consistent with the hypothesis that the function of a young gene tends to change over time more readily than that of an old gene. At the same time, our results may be caused in part by young genes that serve constant functions over time, but nevertheless appear to evolve under changing selection pressures due to depletion of adaptive mutations. In either case, our results imply that the evolution of a protein-coding sequence is partly determined by its age and origin, and not only by the phenotypic properties of the encoded protein. We discuss, via specific examples, the consequences of these findings for understanding of the sources of evolutionary novelty.
Affiliation(s)
- Anchal Vishnoi
- Department of Biology, University of Pennsylvania, Philadelphia, Pennsylvania 19104, USA
|