76
Deeds EJ, Shakhnovich EI. A structure-centric view of protein evolution, design, and adaptation. Advances in Enzymology and Related Areas of Molecular Biology 2010; 75:133-91, xi-xii. [PMID: 17124867 DOI: 10.1002/9780471224464.ch2] [Citation(s) in RCA: 13] [Impact Index Per Article: 0.9] [Indexed: 01/28/2023]
Abstract
Proteins, by virtue of their central role in most biological processes, represent one of the key subjects of the study of molecular evolution. Inherent in the indispensability of proteins for living cells is the fact that a given protein can adopt a specific three-dimensional shape that is specified solely by the protein's sequence of amino acids. Over the past several decades, structural biologists have demonstrated that the array of structures that proteins may adopt is quite astounding, and this has led to a strong interest in understanding how protein structures change and evolve over time. In this review we consider a large body of recent work that attempts to illuminate this structure-centric picture of protein evolution. Much of this work has focused on the question of how completely new protein structures (i.e., new folds or topologies) are discovered by protein sequences as they evolve. Pursuant to this question of structural innovation has been a desire to describe and understand the observation that certain types of protein structures are far more abundant than others, and what this uneven distribution of proteins implies about the process through which new shapes are discovered. We consider a number of theoretical models that have been successful at explaining this heterogeneity in protein populations and discuss the increasing amount of evidence that indicates that the process of structural evolution involves the divergence of protein sequences and structures from one another. We also consider the topic of protein designability, which concerns itself with understanding how a protein's structure influences the number of sequences that can fold successfully into that structure. Understanding and quantifying the relationship between the physical features of a structure and its designability has been a long-standing goal of the study of protein structure and evolution, and we discuss a number of recent advances that have yielded a promising answer to this question. Finally, we review the relatively new field of protein structural phylogeny, an area of study in which information about the distribution of protein structures among different organisms is used to reconstruct the evolutionary relationships between them. Taken together, the work that we review presents an increasingly coherent picture of how these unique polymers have evolved over the course of life on Earth.
77
Choi PJ, Xie XS, Shakhnovich EI. Stochastic switching in gene networks can occur by a single-molecule event or many molecular steps. J Mol Biol 2009; 396:230-44. [PMID: 19931280 DOI: 10.1016/j.jmb.2009.11.035] [Citation(s) in RCA: 19] [Impact Index Per Article: 1.3] [Received: 05/29/2009] [Revised: 11/09/2009] [Accepted: 11/13/2009] [Indexed: 10/20/2022]
Abstract
Due to regulatory feedback, biological networks can exist stably in multiple states, leading to heterogeneous phenotypes among genetically identical cells. Random fluctuations in protein numbers, tuned by specific molecular mechanisms, have been hypothesized to drive transitions between these different states. We develop a minimal theoretical framework to analyze the limits of switching in terms of simple experimental parameters. Our model identifies and distinguishes between two distinct molecular mechanisms for generating stochastic switches. In one class of switches, the stochasticity of a single-molecule event, a specific and rare molecular reaction, directly controls the macroscopic change in a cell's state. In the second class, no individual molecular event is significant, and stochasticity arises from the propagation of biochemical noise through many molecular pathways and steps. As an example, we explore switches based on protein-DNA binding fluctuations and predict relations between transcription factor kinetics, absolute switching rate, robustness, and efficiency that differentiate between switching by single-molecule events or many molecular steps. Finally, we apply our methods to recent experimental data on switching in Escherichia coli lactose metabolism, providing quantitative interpretations of a single-molecule switching mechanism.
78
Kosmrlj A, Chakraborty AK, Kardar M, Shakhnovich EI. Thymic selection of T-cell receptors as an extreme value problem. Physical Review Letters 2009; 103:068103. [PMID: 19792616 DOI: 10.1103/physrevlett.103.068103] [Citation(s) in RCA: 22] [Impact Index Per Article: 1.5] [Received: 03/30/2009] [Indexed: 05/28/2023]
Abstract
T lymphocytes (T cells) orchestrate adaptive immune responses upon activation. T-cell activation requires sufficiently strong binding of T-cell receptors on their surface to short peptides (p) derived from foreign proteins, which are bound to major histocompatibility gene products (displayed on antigen-presenting cells). A diverse and self-tolerant T-cell repertoire is selected in the thymus. We map thymic selection processes to an extreme value problem and provide an analytic expression for the amino acid compositions of selected T-cell receptors (which enable their recognition functions).
79
Kutchukian PS, Yang JS, Verdine GL, Shakhnovich EI. All-atom model for stabilization of alpha-helical structure in peptides by hydrocarbon staples. J Am Chem Soc 2009; 131:4622-7. [PMID: 19334772 DOI: 10.1021/ja805037p] [Citation(s) in RCA: 92] [Impact Index Per Article: 6.1] [Indexed: 11/28/2022]
Abstract
Recent work has shown that the incorporation of an all-hydrocarbon "staple" into peptides can greatly increase their alpha-helix propensity, leading to an improvement in pharmaceutical properties such as proteolytic stability, receptor affinity, and cell permeability. Stapled peptides thus show promise as a new class of drugs capable of accessing intractable targets such as those that engage in intracellular protein-protein interactions. The extent of alpha-helix stabilization provided by stapling has proven to be substantially context dependent, requiring cumbersome screening to identify the optimal site for staple incorporation. In certain cases, a staple encompassing one turn of the helix (attached at residues i and i+4) furnishes greater helix stabilization than one encompassing two turns (i,i+7 staple), which runs counter to expectation based on polymer theory. These findings highlight the need for a more thorough understanding of the forces that underlie helix stabilization by hydrocarbon staples. Here we report all-atom Monte Carlo folding simulations comparing unmodified peptides derived from RNase A and BID BH3 with various i,i+4 and i,i+7 stapled versions thereof. The results of these simulations were found to be in quantitative agreement with experimentally determined helix propensities. We also discovered that staples can stabilize quasi-stable decoy conformations, and that the removal of these states plays a major role in determining the helix stability of stapled peptides. Finally, we critically investigate why our method works, exposing the underlying physical forces that stabilize stapled peptides.
80
Kutchukian PS, Lou D, Shakhnovich EI. FOG: Fragment Optimized Growth Algorithm for the de Novo Generation of Molecules Occupying Druglike Chemical Space. J Chem Inf Model 2009; 49:1630-42. [DOI: 10.1021/ci9000458] [Citation(s) in RCA: 42] [Impact Index Per Article: 2.8] [Indexed: 02/06/2023]
81
Roland CB, Hatch KA, Prentiss M, Shakhnovich EI. DNA unzipping phase diagram calculated via replica theory. Physical Review E, Statistical, Nonlinear, and Soft Matter Physics 2009; 79:051923. [PMID: 19518496 DOI: 10.1103/physreve.79.051923] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/01/2007] [Revised: 12/07/2008] [Indexed: 05/27/2023]
Abstract
We show how single-molecule unzipping experiments can provide strong evidence that the zero-force melting transition of long molecules of natural dsDNA should be classified as a phase transition of the higher-order type (continuous). Toward this end, we study a statistical-mechanics model for the fluctuating structure of a long molecule of dsDNA, and compute the equilibrium phase diagram for the experiment in which the molecule is unzipped under applied force. We consider a perfect-matching dsDNA model, in which the loops are volume-excluding chains with arbitrary loop exponent c. We include stacking interactions, hydrogen bonds, and main-chain entropy. We include sequence heterogeneity at the level of random sequences; in particular, there is no correlation in the base-pairing (bp) energy from one sequence position to the next. We present heuristic arguments to demonstrate that the low-temperature macrostate does not exhibit degenerate ergodicity breaking. We use this claim to understand the results of our replica-theoretic calculation of the equilibrium properties of the system. As a function of temperature, we obtain the minimal force at which the molecule separates completely. This critical-force curve is a line in the temperature-force phase diagram that marks the regions where the molecule exists primarily as a double helix versus the region where the molecule exists as two separate strands. We compare our random-sequence model to magnetic tweezer experiments performed on the 48,502 bp genome of bacteriophage lambda. We find good agreement with the experimental data, which is restricted to temperatures between 24 and 50 degrees C. At higher temperatures, the critical-force curve of our random-sequence model is very different from that of the homogeneous-sequence version of our model. For both sequence models, the critical force falls to zero at the melting temperature T_{c} like |T-T_{c}|^{alpha}. For the homogeneous-sequence model, alpha=1/2 almost exactly, while for the random-sequence model, alpha is approximately 0.9. Importantly, the shape of the critical-force curve is connected, via our theory, to the manner in which the helix fraction falls to zero at T_{c}. The helix fraction is the property that is used to classify the melting transition as a type of phase transition. In our calculation, the shape of the critical-force curve holds strong evidence that the zero-force melting transition of long natural dsDNA should be classified as a higher-order (continuous) phase transition. Specifically, the order is 3rd or greater.
82
Zhang J, Shakhnovich EI. Slowly replicating lytic viruses: pseudolysogenic persistence and within-host competition. Physical Review Letters 2009; 102:178103. [PMID: 19518838 DOI: 10.1103/physrevlett.102.178103] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.1] [Received: 10/15/2008] [Indexed: 05/27/2023]
Abstract
We study the population dynamics of lytic viruses which replicate slowly in dividing host cells within an organism or cell culture, and find a range of viral replication rates that allows viruses to persist, avoiding both the extinction of host cells that occurs when viral replication is too rapid and the dilution of viruses that occurs when it is too slow. For the within-host competition between viral strains with different replication rates, a strain with a "stable" replication rate in the persistence range could outcompete another strain. However, when strains with replication rates higher and lower than the stable value are both present, competition between strains does not result in the dominance of one strain, but in their coexistence.
83
Faísca PFN, Travasso RDM, Ball RC, Shakhnovich EI. Identifying critical residues in protein folding: Insights from phi-value and P(fold) analysis. J Chem Phys 2009; 129:095108. [PMID: 19044896 DOI: 10.1063/1.2973624] [Citation(s) in RCA: 19] [Impact Index Per Article: 1.3] [Indexed: 11/14/2022]
Abstract
We apply a simulational proxy of the phi-value analysis and perform extensive mutagenesis experiments to identify the nucleating residues in the folding "reactions" of two small lattice Go polymers with different native geometries. Our findings show that for the more complex native fold (i.e., the one that is rich in nonlocal, long-range bonds), mutation of the residues that form the folding nucleus leads to a considerably larger increase in the folding time than the corresponding mutations in the geometry that is predominantly local. These results are compared to data obtained from an accurate analysis based on the reaction coordinate folding probability P(fold) and on structural clustering methods. Our study reveals a complex picture of the transition state ensemble. For both protein models, the transition state ensemble is rather heterogeneous and splits up into structurally different populations. For the more complex geometry the identified subpopulations are actually structurally disjoint. For the less complex native geometry we found a broad transition state with microscopic heterogeneity. These findings suggest that the existence of multiple transition state structures may be linked to the geometric complexity of the native fold. For both geometries, the identification of the folding nucleus via the P(fold) analysis agrees with the identification of the folding nucleus carried out with the phi-value analysis. For the more complex geometry, however, the applied methodologies give more consistent results than for the more local geometry. The study of the transition state structure reveals that the nucleus residues are not necessarily fully native in the transition state. Indeed, it is only for the more complex geometry that two of the five critical residues show a considerably high probability of having all their native bonds formed in the transition state. Therefore, one concludes that, in general, the phi-value correlates with the acceleration/deceleration of folding induced by mutation, rather than with the degree of nativeness of the transition state, and that the "traditional" interpretation of phi-values may provide a more realistic picture of the structure of the transition state only for more complex native geometries.
84
Donald JE, Shakhnovich EI. SDR: a database of predicted specificity-determining residues in proteins. Nucleic Acids Res 2008; 37:D191-4. [PMID: 18927118 PMCID: PMC2686543 DOI: 10.1093/nar/gkn716] [Citation(s) in RCA: 14] [Impact Index Per Article: 0.9] [Indexed: 01/17/2023]
Abstract
The specificity-determining residue database (SDR database) presents residue positions where mutations are predicted to have changed protein function in large protein families. Because the database pre-calculates predictions on existing protein sequence alignments, users can quickly find the predictions by selecting the appropriate protein family or searching by protein sequence. Predictions can be used to guide mutagenesis or to gain a better understanding of specificity changes in a protein family. The database is available on the web at http://paradox.harvard.edu/sdr.
85
Abstract
The scale-free structure p(k) ~ k^{-gamma} of protein-protein interaction networks can be reproduced by a static physical model in simulation. We inspect the model theoretically, and find the key reason for the model generating apparent scale-free degree distributions. This explanation provides a generic mechanism of 'scale-free' networks. Moreover, we predict the dependence of gamma on experimental protein concentrations or other sensitivity factors in detecting interactions, and find experimental evidence to support the prediction.
86
Zhang J, Maslov S, Shakhnovich EI. Constraints imposed by non-functional protein-protein interactions on gene expression and proteome size. Mol Syst Biol 2008; 4:210. [PMID: 18682700 PMCID: PMC2538908 DOI: 10.1038/msb.2008.48] [Citation(s) in RCA: 84] [Impact Index Per Article: 5.3] [Received: 02/07/2008] [Accepted: 06/21/2008] [Indexed: 12/21/2022]
Abstract
Crowded intracellular environments present a challenge for proteins to form functional specific complexes while reducing non-functional interactions with promiscuous non-functional partners. Here we show how the need to minimize the waste of resources on non-functional interactions limits the proteome diversity and the average concentration of co-expressed and co-localized proteins. Using the results of high-throughput Yeast 2-Hybrid experiments, we estimate the characteristic strength of non-functional protein–protein interactions. By combining these data with the strengths of specific interactions, we assess the fraction of time proteins spend tied up in non-functional interactions as a function of their overall concentration. This allows us to sketch the phase diagram for baker's yeast cells using the experimentally measured concentrations and subcellular localization of their proteins. The positions of yeast compartments on the phase diagram are consistent with our hypothesis that the yeast proteome has evolved to operate close to the upper limit of its size, while keeping individual protein concentrations sufficiently low to reduce non-functional interactions. These findings have implications for the conceptual understanding of intracellular compartmentalization, multicellularity and differentiation.
87
Zeldovich KB, Chen P, Shakhnovich BE, Shakhnovich EI. A first-principles model of early evolution: emergence of gene families, species, and preferred protein folds. PLoS Comput Biol 2008; 3:e139. [PMID: 17630830 PMCID: PMC1914367 DOI: 10.1371/journal.pcbi.0030139] [Citation(s) in RCA: 52] [Impact Index Per Article: 3.3] [Received: 04/04/2007] [Accepted: 06/04/2007] [Indexed: 11/19/2022]
Abstract
In this work we develop a microscopic physical model of early evolution where phenotype (organism life expectancy) is directly related to genotype (the stability of its proteins in their native conformations), which can be determined exactly in the model. Simulating the model on a computer, we consistently observe the “Big Bang” scenario whereby exponential population growth ensues as soon as favorable sequence–structure combinations (precursors of stable proteins) are discovered. Upon that, random diversity of the structural space abruptly collapses into a small set of preferred proteins. We observe that protein folds remain stable and abundant in the population at timescales much greater than mutation or organism lifetime, and the distribution of the lifetimes of dominant folds in a population approximately follows a power law. The separation of evolutionary timescales between discovery of new folds and generation of new sequences gives rise to emergence of protein families and superfamilies whose sizes are power-law distributed, closely matching the same distributions for real proteins. On the population level we observe emergence of species, i.e., subpopulations that carry similar genomes. Further, we present a simple theory that relates stability of evolving proteins to the sizes of emerging genomes. Together, these results provide a microscopic first-principles picture of how first-gene families developed in the course of early evolution.
Here, we address the question of how Darwinian evolution of organisms determines molecular evolution of their proteins and genomes. We developed a microscopic ab initio model of early biological evolution where the fitness (essentially lifetime) of an organism is explicitly related to the evolving sequences of its proteins. The main assumption of the model is that the death rate of an organism is determined by the stability of the least stable of its proteins. A lattice model is used to calculate stability of all proteins in a genome from their amino acid sequence. The simulation of the model starts from 100 identical organisms, each carrying the same random gene, and proceeds via random mutations, gene duplication, organism births via replication, and organism deaths. We find that exponential population growth is possible only after the discovery of a very small number of specific advantageous protein structures. The number of genes in the evolving organisms depends on the mutation rate, demonstrating the intricate relationship between the genome sizes and protein stability requirements. Further, the model explains the observed power-law distributions of protein family and superfamily sizes, as well as the scale-free character of protein structural similarity graphs. Together, these results and their analysis suggest a plausible comprehensive scenario of emergence of the protein universe in early biological evolution.
88
Zeldovich KB, Shakhnovich EI. Understanding protein evolution: from protein physics to Darwinian selection. Annu Rev Phys Chem 2008; 59:105-27. [PMID: 17937598 DOI: 10.1146/annurev.physchem.58.032806.104449] [Citation(s) in RCA: 51] [Impact Index Per Article: 3.2] [Indexed: 11/09/2022]
Abstract
Efforts in whole-genome sequencing and structural proteomics start to provide a global view of the protein universe, the set of existing protein structures and sequences. However, approaches based on the selection of individual sequences have not been entirely successful at the quantitative description of the distribution of structures and sequences in the protein universe because evolutionary pressure acts on the entire organism, rather than on a particular molecule. In parallel to this line of study, studies in population genetics and phenomenological molecular evolution established a mathematical framework to describe the changes in genome sequences in populations of organisms over time. Here, we review both microscopic (physics-based) and macroscopic (organism-level) models of protein-sequence evolution and demonstrate that bridging the two scales provides the most complete description of the protein universe starting from clearly defined, testable, and physiologically relevant assumptions.
89
Shakhnovich BE, Shakhnovich EI. Improvisation in evolution of genes and genomes: whose structure is it anyway? Curr Opin Struct Biol 2008; 18:375-81. [PMID: 18487041 DOI: 10.1016/j.sbi.2008.02.007] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.3] [Received: 01/10/2008] [Accepted: 02/13/2008] [Indexed: 01/31/2023]
Abstract
Significant progress has been made in recent years in a variety of seemingly unrelated fields such as sequencing, protein structure prediction, and high-throughput transcriptomics and metabolomics. At the same time, new microscopic models have been developed that make it possible to analyze the evolution of genes and genomes from first principles. The results from these efforts enable, for the first time, a comprehensive insight into the evolution of complex systems and organisms on all scales, from sequences to organisms and populations. Every newly sequenced genome uncovers new genes, families, and folds. Where do these new genes come from? How do gene duplication and subsequent divergence of sequence and structure affect the fitness of the organism? What role does regulation play in the evolution of proteins and folds? Emerging synergism between data and modeling provides the first robust answers to these questions.
90
Pereira de Araújo AF, Gomes ALC, Bursztyn AA, Shakhnovich EI. Native atomic burials, supplemented by physically motivated hydrogen bond constraints, contain sufficient information to determine the tertiary structure of small globular proteins. Proteins 2008; 70:971-83. [PMID: 17847091 DOI: 10.1002/prot.21571] [Citation(s) in RCA: 12] [Impact Index Per Article: 0.8] [Indexed: 11/09/2022]
Abstract
We investigate the possibility that atomic burials, as measured by their distances from the structural geometrical center, contain sufficient information to determine the tertiary structure of globular proteins. We report Monte Carlo simulated annealing results of all-atom hard-sphere models in continuous space for four small proteins: the all-beta WW-domain 1E0L, the alpha/beta protein-G 1IGD, the all-alpha engrailed homeo-domain 1ENH, and the alpha + beta engineered monomeric form of the Cro protein 1ORC. We used as energy function the sum over all atoms, labeled by i, of |R(i) - R*(i)|, where R(i) is the atomic distance from the center of coordinates, or central distance, and R*(i) is the "ideal" central distance obtained from the native structure. Hydrogen bonds were taken into consideration by the assignment of two ideal distances for backbone atoms forming hydrogen bonds in the native structure depending on the formation of a geometrically defined bond, independently of bond partner. Lowest energy final conformations turned out to be very similar to the native structure for the four proteins under investigation and a strong correlation was observed between energy and distance root mean square deviation (DRMS) from the native in the case of all-beta 1E0L and alpha/beta 1IGD. For all-alpha 1ENH and alpha + beta 1ORC the overall correlation between energy and DRMS among final conformations was not as high because some trajectories resulted in high DRMS but low energy final conformations in which alpha-helices adopted a non-native mutual orientation. Comparison between central distances and actual accessible surface areas corroborated the implicit assumption of correlation between these two quantities. The Z-score obtained with this native-centric potential in the discrimination of native 1ORC from a set of random compact structures confirmed that it contains a much smaller amount of native information when compared to a traditional contact Go potential but indicated that simple sequence-dependent burial potentials still need some improvement in order to attain a similar discriminability. Taken together, our results suggest that central distances, in conjunction with physically motivated hydrogen bond constraints, contain sufficient information to determine the native conformation of these small proteins and that a solution to the folding problem for globular proteins could arise from sufficiently accurate burial predictions from sequence followed by minimization of a burial-dependent energy function.
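The burial energy described in this abstract, the sum over atoms of |R(i) - R*(i)|, can be illustrated with a minimal Python sketch. This is only an illustration under stated assumptions, not the authors' implementation: the function names `central_distances` and `burial_energy` are hypothetical, the center of coordinates is assumed to be the unweighted mean atomic position, the toy coordinates are invented, and the hydrogen-bond rule (two ideal distances for bonded backbone atoms) is left out.

```python
import math

def central_distances(coords):
    """Distance of each atom from the center of coordinates.

    Assumption: the center is the unweighted mean of all atomic positions.
    """
    n = len(coords)
    center = [sum(atom[k] for atom in coords) / n for k in range(3)]
    return [math.dist(atom, center) for atom in coords]

def burial_energy(coords, ideal):
    """Sum over atoms i of |R(i) - R*(i)|.

    `ideal` holds the central distances R*(i) taken from the native structure.
    The hydrogen-bond term mentioned in the abstract is not modeled here.
    """
    if len(coords) != len(ideal):
        raise ValueError("coords and ideal must have one entry per atom")
    return sum(abs(r - r_star)
               for r, r_star in zip(central_distances(coords), ideal))

# Invented four-atom toy "native" structure, used only to exercise the functions.
native = [(0.0, 0.0, 0.0), (2.0, 0.0, 0.0), (0.0, 2.0, 0.0), (0.0, 0.0, 2.0)]
ideal = central_distances(native)

# A uniformly expanded copy: every central distance is three times the ideal one.
expanded = [(3.0 * x, 3.0 * y, 3.0 * z) for x, y, z in native]

print(burial_energy(native, ideal))    # zero by construction
print(burial_energy(expanded, ideal))  # positive: atoms sit too far from the center
```

Because the energy depends only on each atom's distance from the center, it is unchanged by rigid rotations and translations of the whole conformation, and it says nothing about which atoms are in contact with which.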
91
Lukatsky DB, Shakhnovich EI. Statistically enhanced promiscuity of structurally correlated patterns. Physical Review E, Statistical, Nonlinear, and Soft Matter Physics 2008; 77:020901. [PMID: 18351980 DOI: 10.1103/physreve.77.020901] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.6] [Received: 10/13/2007] [Indexed: 05/26/2023]
Abstract
We predict that patterns with correlated surface density of atoms have statistically higher promiscuity (ability to bind more strongly to an arbitrary pattern) as compared with noncorrelated patterns with the same average surface density. We suggest that this constitutes a generic design principle for highly connected proteins (hubs) in protein interaction networks. We develop an analytical theory for this effect. We show that our key predictions are generic and qualitatively independent of the specific form of the interatomic interaction potential, provided it has a finite range.
92
Zeldovich KB, Chen P, Shakhnovich EI. Protein stability imposes limits on organism complexity and speed of molecular evolution. Proc Natl Acad Sci U S A 2007; 104:16152-7. [PMID: 17913881 PMCID: PMC2042177 DOI: 10.1073/pnas.0705366104] [Citation(s) in RCA: 187] [Impact Index Per Article: 11.0] [Received: 06/11/2007] [Indexed: 01/18/2023]
Abstract
Classical population genetics a priori assigns fitness to alleles without considering molecular or functional properties of proteins that these alleles encode. Here we study population dynamics in a model where fitness can be inferred from physical properties of proteins under a physiological assumption that loss of stability of any protein encoded by an essential gene confers a lethal phenotype. Accumulation of mutations in organisms containing Gamma genes can then be represented as diffusion within the Gamma-dimensional hypercube with adsorbing boundaries determined, in each dimension, by loss of a protein's stability and, at higher stability, by lack of protein sequences. Solving the diffusion equation whose parameters are derived from the data on point mutations in proteins, we determine a universal distribution of protein stabilities, in agreement with existing data. The theory provides a fundamental relation between mutation rate, maximal genome size, and thermodynamic response of proteins to point mutations. It establishes a universal speed limit on rate of molecular evolution by predicting that populations go extinct (via lethal mutagenesis) when mutation rate exceeds approximately six mutations per essential part of genome per replication for mesophilic organisms and one to two mutations per genome per replication for thermophilic ones. Several RNA viruses function close to the evolutionary speed limit, whereas error correction mechanisms used by DNA viruses and nonmutant strains of bacteria featuring various genome lengths and mutation rates have brought these organisms universally approximately 1,000-fold below the natural speed limit.
93
Deeds EJ, Ashenberg O, Gerardin J, Shakhnovich EI. Robust protein-protein interactions in crowded cellular environments. Proc Natl Acad Sci U S A 2007; 104:14952-7. [PMID: 17848524 PMCID: PMC1986594 DOI: 10.1073/pnas.0702766104] [Citation(s) in RCA: 66] [Impact Index Per Article: 3.9] [Indexed: 11/18/2022]
Abstract
The capacity of proteins to interact specifically with one another underlies our conceptual understanding of how living systems function. Systems-level study of specificity in protein-protein interactions is complicated by the fact that the cellular environment is crowded and heterogeneous; interaction pairs may exist at low relative concentrations and thus be presented with many more opportunities for promiscuous interactions compared with specific interaction possibilities. Here we address these questions by using a simple computational model that includes specifically designed interacting model proteins immersed in a mixture containing hundreds of different unrelated ones; all of them undergo simulated diffusion and interaction. We find that specific complexes are quite robust to interference from promiscuous interaction partners only in the range of temperatures T(design) > T > T(rand). At T > T(design), specific complexes become unstable, whereas at T < T(rand), formation of specific complexes is suppressed by promiscuous interactions. Specific interactions can form only if T(design) > T(rand). This condition requires an energy gap between binding energy in a specific complex and set of binding energies between randomly associating proteins, providing a general physical constraint on evolutionary selection or design of specific interacting protein interfaces. This work has implications for our understanding of how the protein repertoire functions and evolves within the context of cellular systems.
|
94
|
Perlstein EO, Deeds EJ, Ashenberg O, Shakhnovich EI, Schreiber SL. Quantifying fitness distributions and phenotypic relationships in recombinant yeast populations. Proc Natl Acad Sci U S A 2007; 104:10553-8. [PMID: 17566105 PMCID: PMC1965551 DOI: 10.1073/pnas.0704037104] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.5] [Indexed: 11/18/2022]
Abstract
Studies of the role of sex in evolution typically involve a longitudinal comparison of a single ancestor to several intermediate descendants and to one terminally evolved descendant after many generations of adaptation under a given selective regime. Here we take a complementary, statistical approach to sex in evolution, by describing the distribution of phenotypic similarity in a population of yeast F1 meiotic recombinants. By applying graph theory to fitness measurements of thousands of Saccharomyces cerevisiae recombinants treated with 10 mechanistically distinct, growth-inhibitory small-molecule perturbagens (SMPs), we show that the network of phenotypic similarity among F1 recombinants exhibits a scale-free degree distribution. F1 recombinants are often phenotypically unique and sometimes exceptional, and their fitness strengths are unevenly distributed across the 10 compound treatments. By contrast, highly phenotypically similar F1 recombinants constitute failing hubs that display below-average fitness across all compound treatments and are candidate substrates for purifying selection. Comparison of the F1 generation with the parental strains reveals that (i) there is a specialist more fit in any given single condition than any of the parents but (ii) only rarely are there generalists that exhibit greater fitness than both parental strains across a majority of conditions. This analysis allows us to evaluate and to gain better theoretical understanding of the costs and benefits of sex in the F1 generation.
|
95
|
Yang JS, Chen WW, Skolnick J, Shakhnovich EI. All-atom ab initio folding of a diverse set of proteins. Structure 2007; 15:53-63. [PMID: 17223532 DOI: 10.1016/j.str.2006.11.010] [Citation(s) in RCA: 72] [Impact Index Per Article: 4.2] [Received: 07/04/2006] [Revised: 11/15/2006] [Accepted: 11/18/2006] [Indexed: 11/30/2022]
Abstract
Natural proteins fold to a unique, thermodynamically dominant state. Modeling of the folding process and prediction of the native fold of proteins are two major unsolved problems in biophysics. Here, we show successful all-atom ab initio folding of a representative diverse set of proteins by using a minimalist transferable-energy model that consists of two-body atom-atom interactions, hydrogen bonding, and a local sequence-energy term that models sequence-specific chain stiffness. Starting from a random coil, the native-like structure was observed during replica exchange Monte Carlo (REMC) simulation for most proteins regardless of their structural classes; the lowest energy structure was close to native, in the range of 2-6 Å root-mean-square deviation (RMSD). Our results demonstrate that the successful folding of a protein chain to its native state is governed by only a few crucial energetic terms.
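For readers unfamiliar with replica exchange Monte Carlo, the exchange step can be sketched as follows. This is the standard parallel-tempering Metropolis criterion, not code or parameters from the paper; the function names are hypothetical, and a real simulation would exchange full conformations rather than bare energies.

```python
import math
import random


def swap_probability(beta_i, beta_j, energy_i, energy_j):
    """Metropolis acceptance probability for exchanging the configurations
    held at inverse temperatures beta_i and beta_j."""
    delta = (beta_i - beta_j) * (energy_i - energy_j)
    return 1.0 if delta >= 0.0 else math.exp(delta)


def attempt_neighbor_swaps(betas, energies, rng):
    """Try one sweep of exchanges between adjacent temperature rungs.

    betas    -- inverse temperatures, one per rung (these stay fixed)
    energies -- energy of the configuration currently held at each rung
    Returns a new list of energies after any accepted exchanges.
    """
    energies = list(energies)
    for k in range(len(betas) - 1):
        p = swap_probability(betas[k], betas[k + 1], energies[k], energies[k + 1])
        if rng.random() < p:
            energies[k], energies[k + 1] = energies[k + 1], energies[k]
    return energies
```

An exchange is always accepted when the colder replica (larger beta) currently holds the higher-energy configuration, which is how low-energy conformations found at high temperature migrate down the temperature ladder.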
|
96
|
Gomes ALC, de Rezende JR, Pereira de Araújo AF, Shakhnovich EI. Description of atomic burials in compact globular proteins by Fermi-Dirac probability distributions. Proteins 2007; 66:304-20. [PMID: 17109406 DOI: 10.1002/prot.21137] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.4] [Indexed: 11/11/2022]
Abstract
We perform a statistical analysis of atomic distributions as a function of the distance $R$ from the molecular geometrical center in a nonredundant set of compact globular proteins. The number of atoms increases quadratically for small $R$, indicating a constant average density inside the core, reaches a maximum at a size-dependent distance $R_{\max}$, and falls rapidly for larger $R$. The empirical curves turn out to be consistent with the volume increase of spherical concentric solid shells and a Fermi-Dirac distribution in which the distance $R$ plays the role of an effective atomic energy $\epsilon(R) = R$. The effective chemical potential $\mu$ governing the distribution increases with the number of residues, reflecting the size of the protein globule, while the temperature parameter $\beta$ decreases. Interestingly, $\beta\mu$ is not as strongly dependent on protein size and appears to be tuned to maintain approximately half of the atoms in the high density interior and the other half in the exterior region of rapidly decreasing density. A normalized size-independent distribution was obtained for the atomic probability as a function of the reduced distance, $r = R/R_g$, where $R_g$ is the radius of gyration. The global normalized Fermi distribution, $F(r)$, can be reasonably decomposed into Fermi-like subdistributions for different atomic types $\tau$, $F_\tau(r)$, with $\sum_\tau F_\tau(r) = F(r)$, which depend on two additional parameters $\mu_\tau$ and $h_\tau$. The chemical potential $\mu_\tau$ affects a scaling prefactor and depends on the overall frequency of the corresponding atomic type, while the maximum position of the subdistribution is determined by $h_\tau$, which appears in a type-dependent atomic effective energy, $\epsilon_\tau(r) = h_\tau r$, and is strongly correlated with available hydrophobicity scales. Better adjustments are obtained when the effective energy is not assumed to be necessarily linear, or $\epsilon_\tau^*(r) = h_\tau^* r^{\alpha_\tau}$, in which case a correlation with hydrophobicity scales is found for the product $\alpha_\tau h_\tau^*$. These results indicate that compact globular proteins are consistent with a thermodynamic system governed by hydrophobic-like energy functions, with reduced distances from the geometrical center, reflecting atomic burials, and provide a conceptual framework for the eventual prediction from sequence of a few parameters from which whole atomic probability distributions and potentials of mean force can be reconstructed.
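One way to write the functional form this abstract describes, as a sketch rather than the authors' exact expression, is a spherical-shell volume factor proportional to $R^2$ multiplied by a Fermi-Dirac occupation with effective energy $\epsilon(R) = R$, chemical potential $\mu$, and temperature parameter $\beta$. The normalization constant $A$ is an assumption introduced here for illustration.

```latex
n(R) \;\approx\; A \, \frac{4\pi R^{2}}{1 + e^{\beta\,(R - \mu)}}
```

For $R \ll \mu$ the denominator is close to 1, so $n(R)$ grows as $R^2$; for $R \gg \mu$ the exponential dominates and $n(R)$ decays rapidly, which matches the qualitative behaviour reported for the empirical curves.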
|
97
|
Wallin S, Zeldovich KB, Shakhnovich EI. The folding mechanics of a knotted protein. J Mol Biol 2007; 368:884-93. [PMID: 17368671 PMCID: PMC2692925 DOI: 10.1016/j.jmb.2007.02.035] [Citation(s) in RCA: 119] [Impact Index Per Article: 7.0] [Received: 11/29/2006] [Revised: 02/07/2007] [Accepted: 02/09/2007] [Indexed: 11/18/2022]
Abstract
An increasing number of proteins are being discovered with a remarkable and somewhat surprising feature, a knot in their native structures. How the polypeptide chain is able to "knot" itself during the folding process to form these highly intricate protein topologies is not known. Here we perform a computational study on the 160-amino-acid homodimeric protein YibK, which, like other proteins in the SpoU family of MTases, contains a deep trefoil knot in its C-terminal region. In this study, we use a coarse-grained Cα-chain representation and Langevin dynamics to study folding kinetics. We find that specific, attractive nonnative interactions are critical for knot formation. In the absence of these interactions, i.e., in an energetics driven entirely by native interactions, knot formation is exceedingly unlikely. Further, we find, in concert with recent experimental data on YibK, two parallel folding pathways that we attribute to an early and a late formation of the trefoil knot, respectively. For both pathways, knot formation occurs before dimerization. A bioinformatics analysis of the SpoU family of proteins reveals further that the critical nonnative interactions may originate from evolutionarily conserved hydrophobic segments around the knotted region.
|
98
|
Abstract
The free energy landscape of protein folding is rugged, occasionally characterized by compact, intermediate states of low free energy. In computational folding, this landscape leads to trapped, compact states with incorrect secondary structure. We devised a residue-specific, protein backbone move set for efficient sampling of protein-like conformations in computational folding simulations. The move set is based on the selection of a small set of backbone dihedral angles, derived from clustering dihedral angles sampled from experimental structures. We show in both simulated annealing and replica exchange Monte Carlo (REMC) simulations that the knowledge-based move set, when compared with a conventional move set, shows a statistically significant improvement in overcoming kinetic barriers, reaching deeper energy minima, and achieving correspondingly lower RMSDs to native structures. The new move set is also more efficient, being able to reach low energy states considerably faster. Use of this move set in determining the energy minimum state and for calculating thermodynamic quantities is discussed.
|
99
|
Berezovsky IN, Zeldovich KB, Shakhnovich EI. Positive and negative design in stability and thermal adaptation of natural proteins. PLoS Comput Biol 2007; 3:e52. [PMID: 17381236 PMCID: PMC1829478 DOI: 10.1371/journal.pcbi.0030052] [Citation(s) in RCA: 105] [Impact Index Per Article: 6.2] [Received: 11/17/2006] [Accepted: 01/31/2007] [Indexed: 11/18/2022]
Abstract
The aim of this work is to elucidate how physical principles of protein design are reflected in natural sequences that evolved in response to the thermal conditions of the environment. Using an exactly solvable lattice model, we design sequences with selected thermal properties. Compositional analysis of designed model sequences and natural proteomes reveals a specific trend in amino acid compositions in response to the requirement of stability at elevated environmental temperature: the increase of fractions of hydrophobic and charged amino acid residues at the expense of polar ones. We show that this “from both ends of the hydrophobicity scale” trend is due to positive (to stabilize the native state) and negative (to destabilize misfolded states) components of protein design. Negative design strengthens specific repulsive non-native interactions that appear in misfolded structures. A pressure to preserve specific repulsive interactions in non-native conformations may result in correlated mutations between amino acids that are far apart in the native state but may be in contact in misfolded conformations. Such correlated mutations are indeed found in TIM barrel and other proteins.
What mechanisms does Nature use in her quest for thermophilic proteins? It is known that stability of a protein is mainly determined by the energy gap, or the difference in energy, between native state and a set of incorrectly folded (misfolded) conformations. Here we show that Nature makes thermophilic proteins by widening this gap from both ends. The energy of the native state of a protein is decreased by selecting strongly attractive amino acids at positions that are in contact in the native state (positive design). Simultaneously, energies of the misfolded conformations are increased by selection of strongly repulsive amino acids at positions that are distant in native structure; however, these amino acids will interact repulsively in the misfolded conformations (negative design). These fundamental principles of protein design are manifested in the “from both ends of the hydrophobicity scale” trend observed in thermophilic adaptation, whereby proteomes of thermophilic proteins are enriched in extreme amino acids, hydrophobic and charged, at the expense of polar ones. Hydrophobic amino acids contribute mostly to the positive design, while charged amino acids that repel each other in non-native conformations of proteins contribute to negative design. Our results provide guidance in rational design of proteins with selected thermal properties.
|
100
|
Abstract
Protein–DNA interactions are vital for many processes in living cells, especially transcriptional regulation and DNA modification. To further our understanding of these important processes on the microscopic level, it is necessary that theoretical models describe the macromolecular interaction energetics accurately. While several methods have been proposed, there has not been a careful comparison of how well the different methods are able to predict biologically important quantities such as the correct DNA binding sequence, total binding free energy and free energy changes caused by DNA mutation. In addition to carrying out the comparison, we present two important theoretical models developed initially in protein folding that have not yet been tried on protein–DNA interactions. In the process, we find that the results of these knowledge-based potentials show a strong dependence on the interaction distance and the derivation method. Finally, we present a knowledge-based potential that gives comparable or superior results to the best of the other methods, including the molecular mechanics force field AMBER99.
|