126
|
Hubner IA, Edmonds KA, Shakhnovich EI. Nucleation and the transition state of the SH3 domain. J Mol Biol 2005; 349:424-34. [PMID: 15890206 DOI: 10.1016/j.jmb.2005.03.050] [Citation(s) in RCA: 46] [Impact Index Per Article: 2.4] [Received: 01/13/2005] [Revised: 03/16/2005] [Accepted: 03/18/2005] [Indexed: 11/17/2022]
Abstract
We present a verified computational model of the SH3 domain transition state (TS) ensemble. This model was built for three separate SH3 domains using experimental phi-values as structural constraints in all-atom protein folding simulations. While averaging over all conformations incorrectly considers non-TS conformations as transition states, quantifying structures as pre-TS, TS, and post-TS by measurement of their transmission coefficient ("probability to fold", or p(fold)) allows for rigorous conclusions regarding the structure of the folding nucleus and a full mechanistic analysis of the folding process. Through analysis of the TS, we observe a highly polarized nucleus in which many residues are solvent-exposed. Mechanistic analysis suggests the hydrophobic core forms largely after an early nucleation step. SH3 presents an ideal system for studying the nucleation-condensation mechanism and highlights the synergistic relationship between experiment and simulation in the study of protein folding.
|
127
|
Tannenbaum E, Sherley JL, Shakhnovich EI. Evolutionary dynamics of adult stem cells: comparison of random and immortal-strand segregation mechanisms. Phys Rev E Stat Nonlin Soft Matter Phys 2005; 71:041914. [PMID: 15903708 DOI: 10.1103/physreve.71.041914] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.5] [Received: 11/28/2004] [Indexed: 05/02/2023]
Abstract
This paper develops a point-mutation model describing the evolutionary dynamics of a population of adult stem cells. Such a model may prove useful for quantitative studies of tissue aging and the emergence of cancer. We consider two modes of chromosome segregation: (1) random segregation, where the daughter chromosomes of a given parent chromosome segregate randomly into the stem cell and its differentiating sister cell and (2) "immortal DNA strand" co-segregation, for which the stem cell retains the daughter chromosomes with the oldest parent strands. Immortal strand co-segregation is a mechanism, originally proposed by Cairns [Nature (London) 255, 197 (1975)], by which stem cells preserve the integrity of their genomes. For random segregation, we develop an ordered strand pair formulation of the dynamics, analogous to the ordered strand pair formalism developed for quasispecies dynamics involving semiconservative replication with imperfect lesion repair (in this context, lesion repair is taken to mean repair of postreplication base-pair mismatches). Interestingly, a similar formulation is possible with immortal strand co-segregation, despite the fact that this segregation mechanism is age dependent. From our model we are able to show mathematically that, when lesion repair is imperfect, immortal strand co-segregation leads to better preservation of the stem cell lineage than random chromosome segregation. Furthermore, our model allows us to estimate the optimal lesion repair efficiency for preserving an adult stem cell population for a given period of time. For human stem cells, we obtain that mispaired bases still present after replication and cell division should be left untouched, to avoid potentially fixing a mutation in both DNA strands.
|
128
|
Deeds EJ, Shakhnovich EI. The emergence of scaling in sequence-based physical models of protein evolution. Biophys J 2005; 88:3905-11. [PMID: 15805176 PMCID: PMC1305622 DOI: 10.1529/biophysj.104.051433] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.4] [Indexed: 11/18/2022] Open
Abstract
It has recently been discovered that many biological systems, when represented as graphs, exhibit a scale-free topology. One such system is the set of structural relationships among protein domains. The scale-free nature of this and other systems has previously been explained using network growth models that, although motivated by biological processes, do not explicitly consider the underlying physics or biology. In this work we explore a sequence-based model for the evolution of protein structures and demonstrate that this model is able to recapitulate the scale-free nature observed in graphs of real protein structures. We find that this model also reproduces other statistical features of the protein domain graph. This represents, to our knowledge, the first such microscopic, physics-based evolutionary model for a scale-free network of biological importance and as such has strong implications for our understanding of the evolution of protein structures and of other biological networks.
|
129
|
Deeds EJ, Hennessey H, Shakhnovich EI. Prokaryotic phylogenies inferred from protein structural domains. Genome Res 2005; 15:393-402. [PMID: 15741510 PMCID: PMC551566 DOI: 10.1101/gr.3033805] [Citation(s) in RCA: 31] [Impact Index Per Article: 1.6] [Indexed: 11/24/2022]
Abstract
The determination of the phylogenetic relationships among microorganisms has long relied primarily on gene sequence information. Given that prokaryotic organisms often lack morphological characteristics amenable to phylogenetic analysis, prokaryotic phylogenies, in particular, are often based on sequence data. In this work, we explore a new source of phylogenetic information, the distribution of protein structural domains within fully sequenced prokaryotic genomes. The evolution of the structural domains we use has been studied extensively, allowing us to base our phylogenetic methods on testable theoretical models of structural evolution. We find that the methods that produce reasonable phylogenetic relationships are indeed the methods that are most consistent with theoretical evolutionary models. This work represents, to our knowledge, the first such theoretically motivated phylogeny, as well as the first application of structural information to phylogeny on this scale. Our results have strong implications for the phylogenetic relationships among prokaryotic organisms and for the understanding of protein evolution as a whole.
|
130
|
Abstract
MOTIVATION: Given a large family of homologous protein sequences, many methods can divide the family into smaller groups that correspond to the different functions carried out by proteins within the family. One important problem, however, has been the absence of a general method for selecting an appropriate level of granularity, or size of the groups. RESULTS: We propose a consistent way of choosing the granularity that is independent of the sequence similarity and sequence clustering method used. We study three large, well-investigated protein families: basic leucine zippers, nuclear receptors and proteins with three consecutive C2H2 zinc fingers. Our method is tested against known functional information, the experimentally determined binding specificities, using a simple scoring method. The significance of the groups is also measured by randomizing the data. Finally, we compare our algorithm against a popular method of grouping proteins, the TRIBE-MCL method. In the end, we determine that dividing the families at the proposed level of granularity creates very significant and useful groups of proteins that correspond to the different DNA-binding motifs. We expect that such groupings will be useful in studying not only DNA binding but also other protein interactions.
|
131
|
Donald JE, Hubner IA, Rotemberg VM, Shakhnovich EI, Mirny LA. CoC: a database of universally conserved residues in protein folds. Bioinformatics 2005; 21:2539-40. [PMID: 15746286 DOI: 10.1093/bioinformatics/bti360] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.5] [Indexed: 11/15/2022] Open
Abstract
The conservatism of conservatism (CoC) database presents statistically analyzed information about the conservation of residue positions in folds across protein families. AVAILABILITY: On the web at http://kulibin.mit.edu/coc/
|
132
|
Brumer Y, Shakhnovich EI. Selective advantage for conservative viruses. Phys Rev E Stat Nonlin Soft Matter Phys 2005; 71:031903. [PMID: 15903455 DOI: 10.1103/physreve.71.031903] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.1] [Received: 01/16/2004] [Revised: 12/23/2004] [Indexed: 05/02/2023]
Abstract
In this article we study the full semiconservative treatment of a model for the coevolution of a virus and an adaptive immune system. Regions of viability are calculated for both conservatively and semiconservatively replicating viruses interacting with a realistic semiconservatively replicating immune system. The conservative virus is found to have a selective advantage in the form of an ability to survive in regions with a wider range of mutation rates than its semiconservative counterpart, as well as an increased replication rate where both species can survive. This may help explain the existence of a rich range of viruses with conservatively replicating genomes, a trait that is found nowhere else in nature.
|
133
|
Nivón LG, Shakhnovich EI. All-atom Monte Carlo simulation of GCAA RNA folding. J Mol Biol 2004; 344:29-45. [PMID: 15504400 DOI: 10.1016/j.jmb.2004.09.041] [Citation(s) in RCA: 15] [Impact Index Per Article: 0.8] [Received: 07/20/2004] [Revised: 09/16/2004] [Accepted: 09/17/2004] [Indexed: 10/26/2022]
Abstract
We report a detailed all-atom simulation of the folding of the GCAA RNA tetraloop. The GCAA tetraloop motif is a very common and thermodynamically stable secondary structure in natural RNAs. We use our simulation methods to study the folding behavior of a 12-base GCAA tetraloop structure with a four-base helix adjacent to the tetraloop proper. We implement an all-atom Monte Carlo (MC) simulation of RNA structural dynamics using a Go potential. Molecular dynamics (MD) simulation of RNA and protein has realistic energetics and sterics, but is extremely expensive in terms of computational time. By coarsely treating non-covalent energetics, but retaining all-atom sterics and entropic effects, all-atom MC techniques are a useful method for the study of protein and now RNA. We observe a sharp folding transition for this structure, and in simulations at room temperature the state histogram shows three distinct minima: an unfolded state (U), a narrower intermediate state (I), and a narrow folded state (F). The intermediate consists primarily of structures with the GCAA loop and some helix hydrogen bonds formed. Repeated kinetic folding simulations reveal that the number of helix base-pairs forms a simple 1D reaction coordinate for the I→N transition.
|
134
|
Tannenbaum E, Sherley JL, Shakhnovich EI. Imperfect DNA lesion repair in the semiconservative quasispecies model: derivation of the Hamming class equations and solution of the single-fitness peak landscape. Phys Rev E Stat Nonlin Soft Matter Phys 2004; 70:061915. [PMID: 15697410 DOI: 10.1103/physreve.70.061915] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.3] [Received: 07/05/2004] [Indexed: 05/24/2023]
Abstract
This paper develops a Hamming class formalism for the semiconservative quasispecies equations with imperfect lesion repair, first presented and analytically solved in Y. Brumer and E.I. Shakhnovich (q-bio.GN/0403018, 2004). Starting from the quasispecies dynamics over the space of genomes, we derive an equivalent dynamics over the space of ordered sequence pairs. From this set of equations, we are able to derive the infinite sequence length form of the dynamics for a class of fitness landscapes defined by a master genome. We use these equations to solve for a generalized single-fitness-peak landscape, where the master genome can sustain a maximum number of lesions and remain viable. We determine the mean equilibrium fitness and error threshold for this class of landscapes, and show that when lesion repair is imperfect, semiconservative replication displays characteristics from both conservative replication and semiconservative replication with perfect lesion repair. The work presented here provides a formulation of the model which greatly facilitates the analysis of a relatively broad class of fitness landscapes, and thus serves as a convenient springboard into biological applications of imperfect lesion repair.
|
135
|
Brumer Y, Shakhnovich EI. Importance of DNA repair in tumor suppression. Phys Rev E Stat Nonlin Soft Matter Phys 2004; 70:061912. [PMID: 15697407 DOI: 10.1103/physreve.70.061912] [Citation(s) in RCA: 11] [Impact Index Per Article: 0.6] [Received: 09/18/2004] [Indexed: 05/24/2023]
Abstract
The transition from a normal to cancerous cell requires a number of highly specific mutations that affect cell cycle regulation, apoptosis, differentiation, and many other cell functions. One hallmark of cancerous genomes is genomic instability, with mutation rates far greater than those of normal cells. In microsatellite instability (MIN) tumors, these are often caused by damage to mismatch repair genes, allowing further mutation of the genome and tumor progression. These mutation rates may lie near the error catastrophe found in the quasispecies model of adaptive RNA genomes, suggesting that further increasing mutation rates will destroy cancerous genomes. However, recent results have demonstrated that DNA genomes exhibit an error threshold at mutation rates far lower than their conservative counterparts. Furthermore, while the maximum viable mutation rate in conservative systems increases indefinitely with increasing master sequence fitness, the semiconservative threshold plateaus at a relatively low value. This implies a paradox, wherein inaccessible mutation rates are found in viable tumor cells. In this paper, we address this paradox, demonstrating an isomorphism between the conservatively replicating (RNA) quasispecies model and the semiconservative (DNA) model with post-methylation DNA repair mechanisms impaired. Thus, as DNA repair becomes inactivated, the maximum viable mutation rate increases smoothly to that of a conservatively replicating system on a transformed landscape, with an upper bound that is dependent on replication rates. On a specific single fitness peak landscape, the repair-free semiconservative system is shown to mimic a conservative system exactly. We postulate that inactivation of post-methylation repair mechanisms is fundamental to the progression of a tumor cell and hence these mechanisms act as a method for the prevention and destruction of cancerous genomes.
|
136
|
Liu Z, Dominy BN, Shakhnovich EI. Structural mining: self-consistent design on flexible protein-peptide docking and transferable binding affinity potential. J Am Chem Soc 2004; 126:8515-28. [PMID: 15238009 DOI: 10.1021/ja032018q] [Citation(s) in RCA: 47] [Impact Index Per Article: 2.4] [Indexed: 11/28/2022]
Abstract
A flexible protein-peptide docking method has been designed to consider not only ligand flexibility but also the flexibility of the protein. The method is based on a Monte Carlo annealing process. Simulations with a distance root-mean-square (dRMS) virtual energy function revealed that the flexibility of protein side chains was as important as ligand flexibility for successful protein-peptide docking. On the basis of mean field theory, a transferable potential was designed to evaluate distance-dependent protein-ligand interactions and atomic solvation energies. The potential parameters were developed using a self-consistent process based on only 10 known complex structures. The effectiveness of each intermediate potential was judged on the basis of a Z score, approximating the gap between the energy of the native complex and the average energy of a decoy set. The Z score was determined using experimentally determined native structures and decoys generated by docking with the intermediate potentials. Using 6600 generated decoys and the Z score optimization criterion proposed in this work, the developed potential yielded an acceptable correlation of R² = 0.77, with binding free energies determined for known MHC I complexes (Class I Major Histocompatibility protein HLA-A*0201) which were not present in the training set. Test docking on 25 complexes further revealed a significant correlation between energy and dRMS, important for identifying native-like conformations. The near-native structures always belonged to one of the conformational classes with lower predicted binding energy. The lowest energy docked conformations are generally associated with near-native conformations, less than 3.0 Å dRMS (and in many cases less than 1.0 Å) from the experimentally determined structures.
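As an illustration of the Z score criterion described in this abstract, the following is a minimal sketch in Python. The abstract says only that the Z score approximates the gap between the native energy and the average decoy energy; normalizing that gap by the standard deviation of the decoy energies is the conventional definition and is an assumption here, as are the function name `z_score` and the example energies.

```python
import statistics


def z_score(native_energy: float, decoy_energies: list[float]) -> float:
    """Gap between the native energy and the mean decoy energy.

    Assumption: the gap is expressed in units of the decoy standard
    deviation (the conventional Z score); the abstract does not state
    the normalization used by the authors.
    """
    mean_decoy = statistics.fmean(decoy_energies)
    spread = statistics.pstdev(decoy_energies)  # population std. deviation
    if spread == 0.0:
        raise ValueError("decoy energies must not all be identical")
    return (native_energy - mean_decoy) / spread


if __name__ == "__main__":
    # Hypothetical energies in arbitrary units, for illustration only.
    decoys = [-1.0, 1.0]
    print(z_score(-5.0, decoys))
```

Under this convention a more negative Z score means the native complex is better separated from the decoy set, which is the direction a potential would be optimized in.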
|
137
|
Dominy BN, Shakhnovich EI. Native Atom Types for Knowledge-Based Potentials: Application to Binding Energy Prediction. J Med Chem 2004; 47:4538-58. [PMID: 15317465 DOI: 10.1021/jm0498046] [Citation(s) in RCA: 17] [Impact Index Per Article: 0.9] [Indexed: 11/29/2022]
Abstract
Knowledge-based potentials have been found useful in a variety of biophysical studies of macromolecules. Recently, it has also been shown in self-consistent studies that it is possible to extract quantities consistent with pair potentials from model structural databases. In this study, we attempt to extend the results obtained from these self-consistent studies toward the extraction of realistic pair potentials from the Protein Data Bank (PDB). The new method utilizes a clustering approach to define atom types within the PDB consistent with the optimal effective pairwise potential. The method has been integrated into the SMoG drug design package, resulting in an improved approach for the rapid and accurate estimation of binding affinities from structural information. Using this approach, it is possible to generate simple knowledge-based potentials that correlate (R = 0.61) with experimental binding affinities in a database of 118 diverse complexes. Furthermore, predictions performed on a random 1/3 of the database consistently show an average unsigned error of 1.5 log Ki units. It is also possible to generate specialized knowledge-based potentials, targeted to specific protein families. This approach is capable of generating potentials that correlate strongly with experimental binding affinities within these families (R = 0.8-0.9). Predictions on 1/3 of these family databases yield average unsigned errors ranging from 1.1 to 1.3 log Ki units. In summary, we describe a physically motivated approach to optimizing knowledge-based potentials for binding energy prediction that can be integrated into a variety of stages within a lead discovery protocol.
|
138
|
Tannenbaum E, Shakhnovich EI. Solution of the quasispecies model for an arbitrary gene network. Phys Rev E Stat Nonlin Soft Matter Phys 2004; 70:021903. [PMID: 15447511 DOI: 10.1103/physreve.70.021903] [Citation(s) in RCA: 16] [Impact Index Per Article: 0.8] [Received: 02/28/2004] [Indexed: 05/24/2023]
Abstract
In this paper, we study the equilibrium behavior of Eigen's quasispecies equations for an arbitrary gene network. We consider a genome consisting of $N$ genes, so that the full genome sequence $\sigma$ may be written as $\sigma = \sigma_1 \sigma_2 \cdots \sigma_N$, where $\sigma_i$ are sequences of individual genes. We assume a single fitness peak model for each gene, so that gene $i$ has some "master" sequence $\sigma_{i,0}$ for which it is functioning. The fitness landscape is then determined by which genes in the genome are functioning and which are not. The equilibrium behavior of this model may be solved in the limit of infinite sequence length. The central result is that, instead of a single error catastrophe, the model exhibits a series of localization to delocalization transitions, which we term an "error cascade." As the mutation rate is increased, the selective advantage for maintaining functional copies of certain genes in the network disappears, and the population distribution delocalizes over the corresponding sequence spaces. The network goes through a series of such transitions, as more and more genes become inactivated, until eventually delocalization occurs over the entire genome space, resulting in a final error catastrophe. This model provides a criterion for determining the conditions under which certain genes in a genome will lose functionality due to genetic drift. It also provides insight into the response of gene networks to mutagens. In particular, it suggests an approach for determining the relative importance of various genes to the fitness of an organism, in a more accurate manner than the standard "deletion set" method. The results in this paper also have implications for mutational robustness and what C.O. Wilke termed "survival of the flattest."
|
139
|
Geissler PL, Shakhnovich EI, Grosberg AY. Solvation versus freezing in a heteropolymer globule. Phys Rev E Stat Nonlin Soft Matter Phys 2004; 70:021802. [PMID: 15447508 DOI: 10.1103/physreve.70.021802] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.1] [Received: 05/05/2003] [Revised: 04/09/2004] [Indexed: 05/24/2023]
Abstract
We address the response of a random heteropolymer to preferential solvation of certain monomer types at the globule-solvent interface. For each set of monomers that can comprise the molecule's surface, we represent the ensemble of allowed configurations by a Gaussian distribution of energy levels, whose mean and variance depend on the set's composition. Within such a random energy model, mean surface composition is proportional to solvation strength under most conditions. The breadth of this linear response regime arises from the approximate statistical independence of surface and volume energies. Fluctuations play a crucial role in determining the excess of solvophilic monomers at the surface, and for a diverse set of monomer types can be overcome only by very strong solvent preference.
|
140
|
Brumer Y, Shakhnovich EI. Host-parasite coevolution and optimal mutation rates for semiconservative quasispecies. Phys Rev E Stat Nonlin Soft Matter Phys 2004; 69:061909. [PMID: 15244619 DOI: 10.1103/physreve.69.061909] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.5] [Received: 01/20/2004] [Indexed: 05/24/2023]
Abstract
In this paper, we extend a model of host-parasite coevolution to incorporate the semiconservative nature of DNA replication for both the host and the parasite. We find that the optimal mutation rates for the semiconservative and conservative hosts converge for realistic genome lengths, thus maintaining the admirable agreement between theory and experiment found previously for the conservative model and justifying the conservative approximation in some cases. We demonstrate that, while the optimal mutation rate for a semiconservative parasite interacting with a given immune system is similar to that of a conservative parasite, the properties away from this optimum differ significantly. We suspect that this difference, coupled with the requirement that a parasite optimize survival in a range of viable hosts, may help explain why semiconservative viruses are known to have significantly lower mutation rates than their conservative counterparts.
|
141
|
Tannenbaum E, Deeds EJ, Shakhnovich EI. Semiconservative replication in the quasispecies model. Phys Rev E Stat Nonlin Soft Matter Phys 2004; 69:061916. [PMID: 15244626 DOI: 10.1103/physreve.69.061916] [Citation(s) in RCA: 15] [Impact Index Per Article: 0.8] [Received: 09/28/2003] [Revised: 01/23/2004] [Indexed: 05/24/2023]
Abstract
This paper extends Eigen's quasispecies equations to account for the semiconservative nature of DNA replication. We solve the equations in the limit of infinite sequence length for the simplest case of a static, sharply peaked fitness landscape. We show that the error catastrophe occurs when $\mu$, the product of sequence length and per base pair mismatch probability, exceeds $2 \ln[2/(1 + 1/k)]$, where $k > 1$ is the first-order growth rate constant of the viable "master" sequence (with all other sequences having a first-order growth rate constant of 1). This is in contrast to the result of $\ln k$ for conservative replication. In particular, as $k \to \infty$, the error catastrophe is never reached for conservative replication, while for semiconservative replication the critical $\mu$ approaches $2 \ln 2$. Semiconservative replication is therefore considerably less robust than conservative replication to the effect of replication errors. We also show that the mean equilibrium fitness of a semiconservatively replicating system is given by $k(2e^{-\mu/2} - 1)$ below the error catastrophe, in contrast to the standard result of $k e^{-\mu}$ for conservative replication (derived by Kimura and Maruyama in 1966). From this result it is readily shown that semiconservative replication is necessary to account for the observation that, at sufficiently high mutagen concentrations, faster replicating cells will die more quickly than more slowly replicating cells. Thus, in contrast to Eigen's original model, the semiconservative quasispecies equations are able to provide a mathematical basis for explaining the efficacy of mutagens as chemotherapeutic agents.
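The closed-form results quoted in this abstract are easy to evaluate numerically. The short Python sketch below only transcribes the four expressions given above (the two error-catastrophe thresholds and the two mean equilibrium fitness formulas); the function names and the sample values of k are illustrative assumptions and do not come from the paper.

```python
import math


def critical_mu_semiconservative(k: float) -> float:
    """Threshold 2 ln[2 / (1 + 1/k)] for semiconservative replication."""
    return 2.0 * math.log(2.0 / (1.0 + 1.0 / k))


def critical_mu_conservative(k: float) -> float:
    """Threshold ln k for conservative replication."""
    return math.log(k)


def mean_fitness_semiconservative(k: float, mu: float) -> float:
    """k (2 e^{-mu/2} - 1), stated to hold below the error catastrophe."""
    return k * (2.0 * math.exp(-mu / 2.0) - 1.0)


def mean_fitness_conservative(k: float, mu: float) -> float:
    """k e^{-mu}, the standard result for conservative replication."""
    return k * math.exp(-mu)


if __name__ == "__main__":
    # Hypothetical growth-rate constants, chosen only to show the trend.
    for k in (2.0, 10.0, 1.0e6):
        semi = critical_mu_semiconservative(k)
        cons = critical_mu_conservative(k)
        print(f"k={k:g}  semiconservative={semi:.4f}  conservative={cons:.4f}")
```

As a consistency check on the transcription, both fitness expressions evaluate to 1 (the growth rate constant of the non-master sequences) when mu equals the corresponding threshold, and the semiconservative threshold stays below 2 ln 2 while the conservative one grows without bound as k increases.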
|
142
|
Deeds EJ, Shakhnovich B, Shakhnovich EI. Proteomic traces of speciation. J Mol Biol 2004; 336:695-706. [PMID: 15095981 DOI: 10.1016/j.jmb.2003.12.066] [Citation(s) in RCA: 13] [Impact Index Per Article: 0.7] [Received: 08/15/2003] [Revised: 11/01/2003] [Accepted: 12/19/2003] [Indexed: 10/26/2022]
Abstract
Recent work has shown that the network of structural similarity between protein domains exhibits a power-law distribution of edges per node. The scale-free nature of this graph, termed the protein domain universe graph or PDUG, may be reproduced via a divergent model of structural evolution. The performance of this model, however, does not preclude the existence of a successful convergent model. To further resolve the issue of protein structural evolution, we explore the predictions of both convergent and divergent models directly. We show that when nodes from the PDUG are partitioned into subgraphs on the basis of their occurrence in the proteomes of particular organisms, these subgraphs exhibit a scale-free nature as well. We explore a simple convergent model of structural evolution and find that the implications of this model are inconsistent with features of these organismal subgraphs. Importantly, we find that biased convergent models are inconsistent with our data. We find that when speciation mechanisms are added to a simple divergent model, subgraphs similar to the organismal subgraphs are produced, demonstrating that dynamic models can easily explain the distributions of structural similarity that exist within proteomes. We show that speciation events must be included in a divergent model of structural evolution to account for the non-random overlap of structural proteomes. These findings have implications for the long-standing debate over convergent and divergent models of protein structural evolution, and for the study of the evolution of organisms as a whole.
|
143
|
Hubner IA, Oliveberg M, Shakhnovich EI. Simulation, experiment, and evolution: understanding nucleation in protein S6 folding. Proc Natl Acad Sci U S A 2004; 101:8354-9. [PMID: 15150413 PMCID: PMC420398 DOI: 10.1073/pnas.0401672101] [Citation(s) in RCA: 51] [Impact Index Per Article: 2.6] [Indexed: 11/18/2022] Open
Abstract
In this study, we explore nucleation and the transition state ensemble of the ribosomal protein S6 using a Monte Carlo (MC) Go model in conjunction with restraints from experiment. The results are analyzed in the context of extensive experimental and evolutionary data. The roles of individual residues in the folding nucleus are identified, and the order of events in the S6 folding mechanism is explored in detail. Interpretation of our results agrees with, and extends the utility of, experiments that shift phi-values by modulating denaturant concentration and presents strong evidence for the realism of the mechanistic details in our MC Go model and the structural interpretation of experimental phi-values. We also observe plasticity in the contacts of the hydrophobic core that support the specific nucleus. For S6, which binds to RNA and protein after folding, this plasticity may result from the conformational flexibility required to achieve biological function. These results present a theoretical and conceptual picture that is relevant in understanding the mechanism of nucleation in protein folding.
|
144
|
Tiana G, Shakhnovich BE, Dokholyan NV, Shakhnovich EI. Imprint of evolution on protein structures. Proc Natl Acad Sci U S A 2004; 101:2846-51. [PMID: 14970345 PMCID: PMC365708 DOI: 10.1073/pnas.0306638101] [Citation(s) in RCA: 48] [Impact Index Per Article: 2.4] [Received: 10/16/2003] [Accepted: 12/22/2003] [Indexed: 11/18/2022] Open
Abstract
We attempt to understand the evolutionary origin of protein folds by simulating their divergent evolution with a three-dimensional lattice model. Starting from an initial seed lattice structure, evolution of model proteins progresses by sequence duplication and subsequent point mutations. A new gene's ability to fold into a stable and unique structure is tested each time through direct kinetic folding simulations. Where possible, the algorithm accepts the new sequence and structure and thus a "new protein structure" is born. During the course of each run, this model evolutionary algorithm provides several thousand new proteins with diverse structures. Analysis of evolved structures shows that later evolved structures are more designable than seed structures, as judged by a recently developed structural determinant of protein designability as well as by direct estimates of designability for selected structures obtained by thermodynamic sampling of their sequence space. We test the significance of this trend, predicted on lattice models, on real proteins and show that protein domains that are found only in eukaryotic organisms feature statistically significantly higher designability than their prokaryotic counterparts. These results present a fundamental view on protein evolution, highlighting the relative roles of structural selection and evolutionary dynamics in the genesis of modern proteins.
|
145
|
Hubner IA, Shimada J, Shakhnovich EI. Commitment and Nucleation in the Protein G Transition State. J Mol Biol 2004; 336:745-61. [PMID: 15095985 DOI: 10.1016/j.jmb.2003.12.032] [Citation(s) in RCA: 58] [Impact Index Per Article: 2.9] [Received: 09/17/2003] [Revised: 11/19/2003] [Accepted: 12/08/2003] [Indexed: 10/26/2022]
Abstract
An accurate characterization of the transition state ensemble (TSE) is central to furthering our understanding of the protein folding reaction. We have extensively tested a recently reported method for studying a protein's TSE, utilizing phi-value data from protein engineering experiments and computational studies as restraints in all-atom Monte Carlo (MC) simulations. The validity of interpreting experimental phi-values as the fraction of native contacts made by a residue in the TSE was explored, revealing that this definition is unable to uniquely specify a TSE. The identification of protein G's second hairpin in both pre- and post-transition conformations demonstrates that high experimental phi-values do not guarantee a residue's importance in the TSE. An analysis of simulations based on structures restrained by experimental phi-values is necessary to yield this result, which is not obvious from a simplistic interpretation of individual phi-values. The TSE that we obtain corresponds to a single, specific nucleation event, characterized by six residues common to all three observed, convergent folding pathways. The same specific nucleus was independently identified from computational and experimental data, and from "Conservation of Conservation" analysis in the protein G fold. When associated strictly with complete nucleus formation and concomitant chain collapse, folding is a well-defined two-state event. Once the nucleus has formed, the folding reaction enters a slow relaxation process associated with side-chain packing and small, local backbone rearrangements. A detailed analysis of phi-values and their relationship to the transition state ensemble allows us to construct a unified theoretical model of protein G folding.
|
146
|
Tannenbaum E, Shakhnovich EI. Error and repair catastrophes: A two-dimensional phase diagram in the quasispecies model. PHYSICAL REVIEW. E, STATISTICAL, NONLINEAR, AND SOFT MATTER PHYSICS 2004; 69:011902. [PMID: 14995642 DOI: 10.1103/physreve.69.011902] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.5] [Received: 06/09/2003] [Revised: 09/04/2003] [Indexed: 05/24/2023]
Abstract
This paper develops a two-gene, single fitness peak model for determining the equilibrium distribution of genotypes in a unicellular population which is capable of genetic damage repair. The first gene, denoted by sigma(via), yields a viable organism with first-order growth rate constant k>1 if it is equal to some target "master" sequence sigma(via,0). The second gene, denoted by sigma(rep), yields an organism capable of genetic repair if it is equal to some target "master" sequence sigma(rep,0). This model is analytically solvable in the limit of infinite sequence length, and gives an equilibrium distribution which depends on mu ≡ L·epsilon, the product of sequence length and per base pair replication error probability, and epsilon(r), the probability of repair failure per base pair. The equilibrium distribution is shown to exist in one of the three possible "phases." In the first phase, the population is localized about the viability and repairing master sequences. As epsilon(r) exceeds the fraction of deleterious mutations, the population undergoes a "repair" catastrophe, in which the equilibrium distribution is still localized about the viability master sequence, but is spread ergodically over the sequence subspace defined by the repair gene. Below the repair catastrophe, the distribution undergoes the error catastrophe when mu exceeds ln k/epsilon(r), while above the repair catastrophe, the distribution undergoes the error catastrophe when mu exceeds ln k/f(del), where f(del) denotes the fraction of deleterious mutations.
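The phase boundaries stated in this abstract can be read as a simple decision rule. The following is a minimal sketch that only restates those thresholds as written in the abstract; the function name, argument names, and phase labels are illustrative assumptions and do not come from the paper. Here `mu` stands for the product of sequence length and per base pair replication error probability, `eps_r` for the repair failure probability, `k` for the growth rate constant, and `f_del` for the fraction of deleterious mutations.

```python
import math


def classify_phase(mu, eps_r, k, f_del):
    """Classify the equilibrium phase from the thresholds quoted in the abstract.

    All names and labels are hypothetical; only the inequalities are taken
    from the abstract text.
    """
    if k <= 1:
        raise ValueError("the abstract assumes a growth rate constant k > 1")
    if not (0 < eps_r <= 1) or not (0 < f_del <= 1):
        raise ValueError("eps_r and f_del must lie in (0, 1]")

    # Repair catastrophe: repair failure probability exceeds the
    # fraction of deleterious mutations.
    above_repair_catastrophe = eps_r > f_del

    # Error catastrophe threshold on mu, as quoted for each side
    # of the repair catastrophe.
    if above_repair_catastrophe:
        mu_critical = math.log(k) / f_del
    else:
        mu_critical = math.log(k) / eps_r

    if mu > mu_critical:
        return "error_catastrophe"
    if above_repair_catastrophe:
        return "repair_catastrophe"
    return "localized"
```

For example, `classify_phase(1.0, 0.1, 10, 0.5)` falls in the phase localized about both master sequences, because `eps_r` is below `f_del` and `mu` is below `ln(10)/0.1`. The two thresholds coincide when `eps_r` equals `f_del`, so the rule is continuous across the repair catastrophe line.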
|
147
|
Deeds EJ, Dokholyan NV, Shakhnovich EI. Protein evolution within a structural space. Biophys J 2003; 85:2962-72. [PMID: 14581198 PMCID: PMC1303574 DOI: 10.1016/s0006-3495(03)74716-x] [Citation(s) in RCA: 25] [Impact Index Per Article: 1.2] [Received: 06/02/2003] [Accepted: 07/28/2003] [Indexed: 10/21/2022] Open
Abstract
Understanding of the evolutionary origins of protein structures represents a key component of the understanding of molecular evolution as a whole. Here we seek to elucidate how the features of an underlying protein structural "space" might impact protein structural evolution. We approach this question using lattice polymers as a completely characterized model of this space. We develop a measure of structural comparison of lattice structures that is analogous to the one used to understand structural similarities between real proteins. We use this measure of structural relatedness to create a graph of lattice structures and compare this graph (in which nodes are lattice structures and edges are defined using structural similarity) to the graph obtained for real protein structures. We find that the graph obtained from all compact lattice structures exhibits a distribution of structural neighbors per node consistent with a random graph. We also find that subgraphs of 3500 nodes chosen either at random or according to physical constraints also represent random graphs. We develop a divergent evolution model based on the lattice space which produces graphs that, within certain parameter regimes, recapitulate the scale-free behavior observed in similar graphs of real protein structures.
|
148
|
Pei J, Dokholyan NV, Shakhnovich EI, Grishin NV. Using protein design for homology detection and active site searches. Proc Natl Acad Sci U S A 2003; 100:11361-6. [PMID: 12975528 PMCID: PMC208762 DOI: 10.1073/pnas.2034878100] [Citation(s) in RCA: 25] [Impact Index Per Article: 1.2] [Received: 03/24/2003] [Indexed: 11/18/2022] Open
Abstract
We describe a method of designing artificial sequences that resemble naturally occurring sequences in terms of their compatibility with a template structure and its functional constraints. The design procedure is a Monte Carlo simulation of the amino acid substitution process. The selective fixation of substitutions is dictated by a simple scoring function derived from the template structure and a multiple alignment of its homologs. Designed sequences represent an enlargement of sequence space around native sequences. We show that the use of designed sequences improves the performance of profile-based homology detection. The difference in position-specific conservation between designed sequences and native sequences is helpful for prediction of functionally important residues. Our sequence selection criteria in evolutionary simulations introduce amino acid substitution rate variation among sites in a natural way, providing a better model to test phylogenetic methods.
|
149
|
Tannenbaum E, Deeds EJ, Shakhnovich EI. Equilibrium distribution of mutators in the single fitness peak model. PHYSICAL REVIEW LETTERS 2003; 91:138105. [PMID: 14525341 DOI: 10.1103/physrevlett.91.138105] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.4] [Received: 04/25/2003] [Indexed: 05/24/2023]
Abstract
This Letter develops an analytically tractable model for determining the equilibrium distribution of mismatch repair deficient strains in unicellular populations. The approach is based on the single fitness peak model, which has been used in Eigen's quasispecies equations in order to understand various aspects of evolutionary dynamics. As with the quasispecies model, our model for mutator-nonmutator equilibrium undergoes a phase transition in the limit of infinite sequence length. This "repair catastrophe" occurs at a critical repair error probability of epsilon(r)=L(via)/L, where L(via) denotes the length of the genome controlling viability, while L denotes the overall length of the genome. The repair catastrophe therefore occurs when the repair error probability exceeds the fraction of deleterious mutations. Our model also gives a quantitative estimate for the equilibrium fraction of mutators in Escherichia coli.
|
150
|
Abstract
The processes by which protein side chains reach equilibrium during a folding reaction are investigated using both lattice and all-atom simulations. We find that rates of side-chain relaxation exhibit a distribution over the protein structure, with the fastest relaxing side chains located in positions kinetically important for folding. Traversal of the major folding transition state corresponds to the freezing of a small number of side chains, belonging to the folding nucleus, whereas the rest of the protein proceeds toward equilibrium via backbone fluctuations around the native fold. The postnucleation processes by which side chains relax are characterized by very slow dynamics and many barrier crossings, and thus resemble the behavior of a glass.
|