1
Martin NS, Schaper S, Camargo CQ, Louis AA. Non-Poissonian Bursts in the Arrival of Phenotypic Variation Can Strongly Affect the Dynamics of Adaptation. Mol Biol Evol 2024; 41:msae085. [PMID: 38693911 PMCID: PMC11156200 DOI: 10.1093/molbev/msae085] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/08/2023] [Revised: 03/01/2024] [Accepted: 04/17/2024] [Indexed: 05/03/2024]
Abstract
Modeling the rate at which adaptive phenotypes appear in a population is a key to predicting evolutionary processes. Given random mutations, should this rate be modeled by a simple Poisson process, or is a more complex dynamics needed? Here we use analytic calculations and simulations of evolving populations on explicit genotype-phenotype maps to show that the introduction of novel phenotypes can be "bursty" or overdispersed. In other words, a novel phenotype either appears multiple times in quick succession or not at all for many generations. These bursts are fundamentally caused by statistical fluctuations and other structure in the map from genotypes to phenotypes. Their strength depends on population parameters, being highest for "monomorphic" populations with low mutation rates. They can also be enhanced by additional inhomogeneities in the mapping from genotypes to phenotypes. We mainly investigate the effect of bursts using the well-studied genotype-phenotype map for RNA secondary structure, but find similar behavior in a lattice protein model and in Richard Dawkins's biomorphs model of morphological development. Bursts can profoundly affect adaptive dynamics. Most notably, they imply that fitness differences play a smaller role in determining which phenotype fixes than would be the case for a Poisson process without bursts.
Affiliation(s)
- Nora S Martin
  - Rudolf Peierls Centre for Theoretical Physics, University of Oxford, Oxford OX1 3PU, UK
- Steffen Schaper
  - Rudolf Peierls Centre for Theoretical Physics, University of Oxford, Oxford OX1 3PU, UK
- Chico Q Camargo
  - Rudolf Peierls Centre for Theoretical Physics, University of Oxford, Oxford OX1 3PU, UK
  - Faculty of Environment, Science and Economy, University of Exeter, Exeter EX4 4QF, UK
- Ard A Louis
  - Rudolf Peierls Centre for Theoretical Physics, University of Oxford, Oxford OX1 3PU, UK
2
Martin NS, Ahnert SE. The Boltzmann distributions of molecular structures predict likely changes through random mutations. Biophys J 2023; 122:4467-4475. [PMID: 37897043 PMCID: PMC10698324 DOI: 10.1016/j.bpj.2023.10.024] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/14/2023] [Revised: 08/19/2023] [Accepted: 10/20/2023] [Indexed: 10/29/2023]
Abstract
New folded molecular structures can only evolve after arising through mutations. This aspect is modeled using genotype-phenotype maps, which connect sequence changes through mutations to changes in molecular structures. Previous work has shown that the likelihood of appearing through mutations can differ by orders of magnitude from structure to structure and that this can affect the outcomes of evolutionary processes. Thus, we focus on the phenotypic mutation probabilities φ_qp, i.e., the likelihood that a random mutation changes structure p into structure q. For both RNA secondary structures and the HP protein model, we show that a simple biophysical principle can explain and predict how this likelihood depends on the new structure q: φ_qp is high if sequences that fold into p as the minimum-free-energy structure are likely to have q as an alternative structure with high Boltzmann frequency. This generalizes the existing concept of plastogenetic congruence from individual sequences to the entire neutral spaces of structures. Our result helps us understand why some structural changes are more likely than others, may be useful for estimating these likelihoods via sampling, and makes a connection to alternative structures with high Boltzmann frequency, which could be relevant in evolutionary processes.
Affiliation(s)
- Nora S Martin
  - Rudolf Peierls Centre for Theoretical Physics, University of Oxford, Oxford, United Kingdom; Theory of Condensed Matter Group, Cavendish Laboratory, University of Cambridge, Cambridge, United Kingdom; Sainsbury Laboratory, University of Cambridge, Cambridge, United Kingdom
- Sebastian E Ahnert
  - Department of Chemical Engineering and Biotechnology, University of Cambridge, Cambridge, United Kingdom; The Alan Turing Institute, London, United Kingdom
3
Salazar-Ciudad I, Cano-Fernández H. Evo-devo beyond development: Generalizing evo-devo to all levels of the phenotypic evolution. Bioessays 2023; 45:e2200205. [PMID: 36739577 DOI: 10.1002/bies.202200205] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/24/2022] [Revised: 12/25/2022] [Accepted: 01/12/2023] [Indexed: 02/06/2023]
Abstract
A foundational idea of evo-devo is that morphological variation is not isotropic, that is, it does not occur in all directions. Instead, some directions of morphological variation are more likely than others from DNA-level variation and these largely depend on development. We argue that this evo-devo perspective should apply not only to morphology but to evolution at all phenotypic levels. At other phenotypic levels there is no development, but there are processes that can be seen, in analogy to development, as constructing the phenotype (e.g., protein folding, learning for behavior, etc.). We argue that to explain the direction of evolution two types of arguments need to be combined: generative arguments about which phenotypic variation arises in each generation and selective arguments about which of it passes to the next generation. We explain how a full consideration of the two types of arguments improves the explanatory power of evolutionary theory. Also see the video abstract here: https://youtu.be/Egbvma_uaKc.
Affiliation(s)
- Isaac Salazar-Ciudad
  - Centre de Recerca Matemàtica, Cerdanyola del Vallès, Spain; Genomics, Bioinformatics and Evolution, Departament de Genètica i Microbiologia, Universitat Autònoma de Barcelona, Barcelona, Spain
- Hugo Cano-Fernández
  - Genomics, Bioinformatics and Evolution, Departament de Genètica i Microbiologia, Universitat Autònoma de Barcelona, Barcelona, Spain
4
The structure of genotype-phenotype maps makes fitness landscapes navigable. Nat Ecol Evol 2022; 6:1742-1752. [PMID: 36175543 DOI: 10.1038/s41559-022-01867-z] [Citation(s) in RCA: 9] [Impact Index Per Article: 4.5] [Received: 09/21/2021] [Accepted: 08/01/2022] [Indexed: 11/09/2022]
Abstract
Fitness landscapes are often described in terms of 'peaks' and 'valleys', indicating an intuitive low-dimensional landscape of the kind encountered in everyday experience. The space of genotypes, however, is extremely high dimensional, which results in counter-intuitive structural properties of genotype-phenotype maps. Here we show that these properties, such as the presence of pervasive neutral networks, make fitness landscapes navigable. For three biologically realistic genotype-phenotype map models (RNA secondary structure, protein tertiary structure and protein complexes), we find that, even under random fitness assignment, fitness maxima can be reached from almost any other phenotype without passing through fitness valleys. This in turn indicates that true fitness valleys are very rare. By considering evolutionary simulations between pairs of real examples of functional RNA sequences, we show that accessible paths are also likely to be used under evolutionary dynamics. Our findings have broad implications for the prediction of natural evolutionary outcomes and for directed evolution.
5
Takahashi T, Chikenji G, Tokita K. Lattice protein design using Bayesian learning. Phys Rev E 2021; 104:014404. [PMID: 34412286 DOI: 10.1103/physreve.104.014404] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 03/17/2020] [Accepted: 06/11/2021] [Indexed: 01/01/2023]
Abstract
Protein design is the inverse of three-dimensional (3D) structure prediction as an approach to elucidating the relationship between 3D structures and amino acid sequences. In general, the computation of the protein design involves a double loop: a loop for amino acid sequence changes and a loop for an exhaustive conformational search for each amino acid sequence. Herein, we propose a novel statistical mechanical design method using Bayesian learning, which can design lattice proteins without the exhaustive conformational search. We consider a thermodynamic hypothesis of the evolution of proteins and apply it to the prior distribution of amino acid sequences. Furthermore, we take the water effect into account in view of the grand canonical picture. As a result, on applying the 2D lattice hydrophobic-polar (HP) model, our design method successfully finds an amino acid sequence for which the target conformation has a unique ground state. However, the performance was not as good for the 3D lattice HP models compared to the 2D models. The performance of the 3D model improves on using 20-letter lattice proteins. Furthermore, we find a strong linearity between the chemical potential of water and the number of surface residues, thereby revealing the relationship between protein structure and the effect of water molecules. The advantage of our method is that it greatly reduces computation time, because it does not require long calculations for the partition function corresponding to an exhaustive conformational search. As our method uses a general form of Bayesian learning and statistical mechanics and is not limited to lattice proteins, the results presented here elucidate some heuristics used successfully in previous protein design methods.
Affiliation(s)
- Tomoei Takahashi
  - Graduate School of Informatics, Nagoya University, Nagoya 464-8601, Japan
- George Chikenji
  - Graduate School of Engineering, Nagoya University, Nagoya 464-8603, Japan
- Kei Tokita
  - Graduate School of Informatics, Nagoya University, Nagoya 464-8601, Japan
6
Manrubia S, Cuesta JA, Aguirre J, Ahnert SE, Altenberg L, Cano AV, Catalán P, Diaz-Uriarte R, Elena SF, García-Martín JA, Hogeweg P, Khatri BS, Krug J, Louis AA, Martin NS, Payne JL, Tarnowski MJ, Weiß M. From genotypes to organisms: State-of-the-art and perspectives of a cornerstone in evolutionary dynamics. Phys Life Rev 2021; 38:55-106. [PMID: 34088608 DOI: 10.1016/j.plrev.2021.03.004] [Citation(s) in RCA: 36] [Impact Index Per Article: 12.0] [Received: 11/24/2020] [Accepted: 03/01/2021] [Indexed: 12/21/2022]
Abstract
Understanding how genotypes map onto phenotypes, fitness, and eventually organisms is arguably the next major missing piece in a fully predictive theory of evolution. We refer to this generally as the problem of the genotype-phenotype map. Though we are still far from achieving a complete picture of these relationships, our current understanding of simpler questions, such as the structure induced in the space of genotypes by sequences mapped to molecular structures, has revealed important facts that deeply affect the dynamical description of evolutionary processes. Empirical evidence supporting the fundamental relevance of features such as phenotypic bias is mounting as well, while the synthesis of conceptual and experimental progress leads to questioning current assumptions on the nature of evolutionary dynamics, with cancer progression models or synthetic biology approaches being notable examples. This work delves with a critical and constructive attitude into our current knowledge of how genotypes map onto molecular phenotypes and organismal functions, and discusses theoretical and empirical avenues to broaden and improve this comprehension. As a final goal, this community should aim at deriving an updated picture of evolutionary processes soundly relying on the structural properties of genotype spaces, as revealed by modern techniques of molecular and functional analysis.
Affiliation(s)
- Susanna Manrubia
  - Department of Systems Biology, Centro Nacional de Biotecnología (CSIC), Madrid, Spain; Grupo Interdisciplinar de Sistemas Complejos (GISC), Madrid, Spain
- José A Cuesta
  - Grupo Interdisciplinar de Sistemas Complejos (GISC), Madrid, Spain; Departamento de Matemáticas, Universidad Carlos III de Madrid, Leganés, Spain; Instituto de Biocomputación y Física de Sistemas Complejos (BiFi), Universidad de Zaragoza, Spain; UC3M-Santander Big Data Institute (IBiDat), Getafe, Madrid, Spain
- Jacobo Aguirre
  - Grupo Interdisciplinar de Sistemas Complejos (GISC), Madrid, Spain; Centro de Astrobiología, CSIC-INTA, ctra. de Ajalvir km 4, 28850 Torrejón de Ardoz, Madrid, Spain
- Sebastian E Ahnert
  - Department of Chemical Engineering and Biotechnology, University of Cambridge, Philippa Fawcett Drive, Cambridge CB3 0AS, UK; The Alan Turing Institute, British Library, 96 Euston Road, London NW1 2DB, UK
- Alejandro V Cano
  - Institute of Integrative Biology, ETH Zurich, Zurich, Switzerland; Swiss Institute of Bioinformatics, Lausanne, Switzerland
- Pablo Catalán
  - Grupo Interdisciplinar de Sistemas Complejos (GISC), Madrid, Spain; Departamento de Matemáticas, Universidad Carlos III de Madrid, Leganés, Spain
- Ramon Diaz-Uriarte
  - Department of Biochemistry, Universidad Autónoma de Madrid, Madrid, Spain; Instituto de Investigaciones Biomédicas "Alberto Sols" (UAM-CSIC), Madrid, Spain
- Santiago F Elena
  - Instituto de Biología Integrativa de Sistemas, I2SysBio (CSIC-UV), València, Spain; The Santa Fe Institute, Santa Fe, NM, USA
- Paulien Hogeweg
  - Theoretical Biology and Bioinformatics Group, Utrecht University, the Netherlands
- Bhavin S Khatri
  - The Francis Crick Institute, London, UK; Department of Life Sciences, Imperial College London, London, UK
- Joachim Krug
  - Institute for Biological Physics, University of Cologne, Köln, Germany
- Ard A Louis
  - Rudolf Peierls Centre for Theoretical Physics, University of Oxford, Oxford, UK
- Nora S Martin
  - Theory of Condensed Matter Group, Cavendish Laboratory, University of Cambridge, Cambridge, UK; Sainsbury Laboratory, University of Cambridge, Cambridge, UK
- Joshua L Payne
  - Institute of Integrative Biology, ETH Zurich, Zurich, Switzerland; Swiss Institute of Bioinformatics, Lausanne, Switzerland
- Marcel Weiß
  - Theory of Condensed Matter Group, Cavendish Laboratory, University of Cambridge, Cambridge, UK; Sainsbury Laboratory, University of Cambridge, Cambridge, UK
7
Farris ACK, Seaton DT, Landau DP. Effects of lattice constraints in coarse-grained protein models. J Chem Phys 2021; 154:084903. [PMID: 33639740 DOI: 10.1063/5.0038184] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/14/2022]
Abstract
We compare and contrast folding behavior in several coarse-grained protein models, both on- and off-lattice, in an attempt to uncover the effect of lattice constraints in these kinds of models. Using modern, extended ensemble Monte Carlo methods (Wang-Landau sampling, multicanonical sampling, replica-exchange Wang-Landau sampling, and replica-exchange multicanonical sampling), we investigate the thermodynamic and structural behavior of the protein Crambin within the context of the hydrophobic-polar, hydrophobic-"neutral"-polar (H0P), and semi-flexible H0P model frameworks. We uncover the folding process in all cases; all models undergo, at least, the two major structural transitions observed in nature: the coil-globule collapse and the folding transition. As the complexity of the model increases, these two major transitions begin to split into multi-step processes, wherein the lattice coarse-graining has a significant impact on the details of these processes. The results show that the level of structural coarse-graining is coupled to the level of interaction coarse-graining.
Affiliation(s)
- Alfred C K Farris
  - Department of Physics and Astronomy, Oxford College of Emory University, Oxford, Georgia 30054, USA
- Daniel T Seaton
  - Open Learning, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, USA
- David P Landau
  - Center for Simulational Physics, Department of Physics and Astronomy, The University of Georgia, Athens, Georgia 30602, USA
8
Catalán P, Wagner A, Manrubia S, Cuesta JA. Adding levels of complexity enhances robustness and evolvability in a multilevel genotype-phenotype map. J R Soc Interface 2019; 15:rsif.2017.0516. [PMID: 29321269 DOI: 10.1098/rsif.2017.0516] [Citation(s) in RCA: 15] [Impact Index Per Article: 3.0] [Received: 07/17/2017] [Accepted: 12/01/2017] [Indexed: 01/24/2023]
Abstract
Robustness and evolvability are the main properties that account for the stability and accessibility of phenotypes. They have been studied in a number of computational genotype-phenotype maps. In this paper, we study a metabolic genotype-phenotype map defined in toyLIFE, a multilevel computational model that represents a simplified cellular biology. toyLIFE includes several levels of phenotypic expression, from proteins to regulatory networks to metabolism. Our results show that toyLIFE shares many similarities with other seemingly unrelated computational genotype-phenotype maps. Thus, toyLIFE shows a high degeneracy in the mapping from genotypes to phenotypes, as well as a highly skewed distribution of phenotypic abundances. The neutral networks associated with abundant phenotypes are highly navigable, and common phenotypes are close to each other in genotype space. All of these properties are remarkable, as toyLIFE is built on a version of the HP protein-folding model that is neither robust nor evolvable: phenotypes cannot be mutually accessed through point mutations. In addition, both robustness and evolvability increase with the number of genes in a genotype. Therefore, our results suggest that adding levels of complexity to the mapping of genotypes to phenotypes and increasing genome size enhances both these properties.
Affiliation(s)
- Pablo Catalán
  - Grupo Interdisciplinar de Sistemas Complejos (GISC), Madrid, Spain; Departamento de Matematicas, Universidad Carlos III de Madrid, Madrid, Spain
- Andreas Wagner
  - Department of Evolutionary Biology and Environmental Studies, University of Zurich, Zurich, Switzerland; Santa Fe Institute, Santa Fe, NM, USA; Swiss Institute of Bioinformatics, Zurich, Switzerland
- Susanna Manrubia
  - Grupo Interdisciplinar de Sistemas Complejos (GISC), Madrid, Spain; Programa de Biología de Sistemas, Centro Nacional de Biotecnologia, Madrid, Spain
- José A Cuesta
  - Grupo Interdisciplinar de Sistemas Complejos (GISC), Madrid, Spain; Departamento de Matematicas, Universidad Carlos III de Madrid, Madrid, Spain; Instituto de Biocomputación y Física de Sistemas Complejos (BIFI), Universidad de Zaragoza, Zaragoza, Spain; Institute of Financial Big Data (IFiBiD), Universidad Carlos III de Madrid, UC3M-BS, Madrid, Spain
9
Shi G, Wüst T, Landau DP. Elucidating thermal behavior, native contacts, and folding funnels of simple lattice proteins using replica exchange Wang-Landau sampling. J Chem Phys 2018; 149:164913. [PMID: 30384708 DOI: 10.1063/1.5026256] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/15/2022]
Abstract
We studied the folding behavior of two coarse-grained lattice models, the HP (hydrophobic-polar) model and the semi-flexible H0P model, whose 124-monomer-long sequences were derived from the protein Ribonuclease A. Taking advantage of advanced parallel computing techniques, we applied replica exchange Wang-Landau sampling and calculated the density of states over the models' entire energy ranges to high accuracy. We then determined both energetic and structural quantities in order to elucidate the folding behavior of each model completely. As a result of sufficiently long sequences and model complexity, yet computational accessibility, we were able to depict distinct free energy folding funnels for both models. In particular, we found that the HP model folds in a single-step process with a very highly degenerate native state and relatively flat low temperature folding funnel minimum. By contrast, the semi-flexible H0P model folds via a multi-step process and the native state is almost four orders of magnitude less degenerate than that for the HP model. In addition, for the H0P model, the bottom of the free energy folding funnel remains rough, even at low temperatures.
Affiliation(s)
- Guangjie Shi
  - Center for Simulational Physics, The University of Georgia, Athens, Georgia 30602-0002, USA
- Thomas Wüst
  - Scientific IT Services, ETH Zurich, 8092 Zurich, Switzerland
- David P Landau
  - Center for Simulational Physics, The University of Georgia, Athens, Georgia 30602-0002, USA
10
Farris ACK, Shi G, Wüst T, Landau DP. The role of chain-stiffness in lattice protein models: A replica-exchange Wang-Landau study. J Chem Phys 2018; 149:125101. [PMID: 30278675 DOI: 10.1063/1.5045482] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.7] [Indexed: 11/14/2022]
Abstract
Using Monte Carlo simulations, we investigate simple, physically motivated extensions to the hydrophobic-polar lattice protein model for the small (46 amino acid) protein Crambin. We use two-dimensional replica-exchange Wang-Landau sampling to study the effects of a bond angle stiffness parameter on the folding and uncover a new step in the collapse process for particular values of this stiffness parameter. A physical interpretation of the folding is developed by analysis of changes in structural quantities, and the free energy landscape is explored. For these special values of stiffness, we find non-degenerate ground states, a property that is consistent with behavior of real proteins, and we use these unique ground states to elucidate the formation of native contacts during the folding process. Through this analysis, we conclude that chain-stiffness is particularly influential in the low energy, low temperature regime of the folding process once the lattice protein has partially collapsed.
Affiliation(s)
- Alfred C K Farris
  - Center for Simulational Physics, Department of Physics and Astronomy, The University of Georgia, Athens, Georgia 30602, USA
- Guangjie Shi
  - Center for Simulational Physics, Department of Physics and Astronomy, The University of Georgia, Athens, Georgia 30602, USA
- Thomas Wüst
  - Scientific IT Services, ETH Zürich, 8092 Zürich, Switzerland
- David P Landau
  - Center for Simulational Physics, Department of Physics and Astronomy, The University of Georgia, Athens, Georgia 30602, USA
11
García-Martín JA, Catalán P, Manrubia S, Cuesta JA. Statistical theory of phenotype abundance distributions: A test through exact enumeration of genotype spaces. EPL 2018; 123:28001. [DOI: 10.1209/0295-5075/123/28001] [Citation(s) in RCA: 18] [Impact Index Per Article: 3.0] [Indexed: 11/20/2022]
12
Aguirre J, Catalán P, Cuesta JA, Manrubia S. On the networked architecture of genotype spaces and its critical effects on molecular evolution. Open Biol 2018; 8:180069. [PMID: 29973397 PMCID: PMC6070719 DOI: 10.1098/rsob.180069] [Citation(s) in RCA: 27] [Impact Index Per Article: 4.5] [Received: 04/18/2018] [Accepted: 06/12/2018] [Indexed: 12/26/2022]
Abstract
Evolutionary dynamics is often viewed as a subtle process of change accumulation that causes a divergence among organisms and their genomes. However, this interpretation is an inheritance of a gradualistic view that has been challenged at the macroevolutionary, ecological and molecular level. Actually, when the complex architecture of genotype spaces is taken into account, the evolutionary dynamics of molecular populations becomes intrinsically non-uniform, sharing deep qualitative and quantitative similarities with slowly driven physical systems: nonlinear responses analogous to critical transitions, sudden state changes or hysteresis, among others. Furthermore, the phenotypic plasticity inherent to genotypes transforms classical fitness landscapes into multiscapes where adaptation in response to an environmental change may be very fast. The quantitative nature of adaptive molecular processes is deeply dependent on a network-of-networks multilayered structure of the map from genotype to function that we begin to unveil.
Affiliation(s)
- Jacobo Aguirre
  - Grupo Interdisciplinar de Sistemas Complejos (GISC), Madrid, Spain
  - Programa de Biología de Sistemas, Centro Nacional de Biotecnología (CSIC), Madrid, Spain
- Pablo Catalán
  - Grupo Interdisciplinar de Sistemas Complejos (GISC), Madrid, Spain
  - Departamento de Matemáticas, Universidad Carlos III de Madrid, Leganés, Madrid, Spain
- José A Cuesta
  - Grupo Interdisciplinar de Sistemas Complejos (GISC), Madrid, Spain
  - Departamento de Matemáticas, Universidad Carlos III de Madrid, Leganés, Madrid, Spain
  - Instituto de Biocomputación y Física de Sistemas Complejos (BIFI), Universidad de Zaragoza, Zaragoza, Spain
  - UC3M-BS Institute of Financial Big Data (IFiBiD), Universidad Carlos III de Madrid, Getafe, Madrid, Spain
- Susanna Manrubia
  - Grupo Interdisciplinar de Sistemas Complejos (GISC), Madrid, Spain
  - Programa de Biología de Sistemas, Centro Nacional de Biotecnología (CSIC), Madrid, Spain
13
Manrubia S, Cuesta JA. Distribution of genotype network sizes in sequence-to-structure genotype-phenotype maps. J R Soc Interface 2017; 14:rsif.2016.0976. [PMID: 28424303 DOI: 10.1098/rsif.2016.0976] [Citation(s) in RCA: 24] [Impact Index Per Article: 3.4] [Received: 12/05/2016] [Accepted: 03/22/2017] [Indexed: 01/10/2023]
Abstract
An essential quantity to ensure evolvability of populations is the navigability of the genotype space. Navigability, understood as the ease with which alternative phenotypes are reached, relies on the existence of sufficiently large and mutually attainable genotype networks. The size of genotype networks (e.g. the number of RNA sequences folding into a particular secondary structure or the number of DNA sequences coding for the same protein structure) is astronomically large in all functional molecules investigated: an exhaustive experimental or computational study of all RNA folds or all protein structures becomes impossible even for moderately long sequences. Here, we analytically derive the distribution of genotype network sizes for a hierarchy of models which successively incorporate features of increasingly realistic sequence-to-structure genotype-phenotype maps. The main feature of these models relies on the characterization of each phenotype through a prototypical sequence whose sites admit a variable fraction of letters of the alphabet. Our models interpolate between two limit distributions: a power-law distribution, when the ordering of sites in the prototypical sequence is strongly constrained, and a lognormal distribution, as suggested for RNA, when different orderings of the same set of sites yield different phenotypes. Our main result is the qualitative and quantitative identification of those features of sequence-to-structure maps that lead to different distributions of genotype network sizes.
Affiliation(s)
- Susanna Manrubia
  - Grupo Interdisciplinar de Sistemas Complejos (GISC), Madrid, Spain; Departamento de Biología de Sistemas, Centro Nacional de Biotecnología (CSIC), Madrid, Spain
- José A Cuesta
  - Grupo Interdisciplinar de Sistemas Complejos (GISC), Madrid, Spain; Departamento de Matemáticas, Universidad Carlos III de Madrid, Leganés, Madrid, Spain; Instituto de Biocomputación y Física de Sistemas Complejos (BIFI), Universidad de Zaragoza, Zaragoza, Spain; UC3M-BS Institute of Financial Big Data (IFiBiD), Universidad Carlos III de Madrid, Getafe, Madrid, Spain
14
Tian P, Best RB. How Many Protein Sequences Fold to a Given Structure? A Coevolutionary Analysis. Biophys J 2017; 113:1719-1730. [PMID: 29045866 PMCID: PMC5647607 DOI: 10.1016/j.bpj.2017.08.039] [Citation(s) in RCA: 29] [Impact Index Per Article: 4.1] [Received: 04/24/2017] [Revised: 08/03/2017] [Accepted: 08/08/2017] [Indexed: 12/23/2022]
Abstract
Quantifying the relationship between protein sequence and structure is key to understanding the protein universe. A fundamental measure of this relationship is the total number of amino acid sequences that can fold to a target protein structure, known as the "sequence capacity," which has been suggested as a proxy for how designable a given protein fold is. Although sequence capacity has been extensively studied using lattice models and theory, numerical estimates for real protein structures are currently lacking. In this work, we have quantitatively estimated the sequence capacity of 10 proteins with a variety of different structures using a statistical model based on residue-residue co-evolution to capture the variation of sequences from the same protein family. Remarkably, we find that even for the smallest protein folds, such as the WW domain, the number of foldable sequences is extremely large, exceeding the Avogadro constant. In agreement with earlier theoretical work, the calculated sequence capacity is positively correlated with the size of the protein, or better, the density of contacts. This allows the absolute sequence capacity of a given protein to be approximately predicted from its structure. On the other hand, the relative sequence capacity, i.e., normalized by the total number of possible sequences, is an extremely tiny number and is strongly anti-correlated with the protein length. Thus, although there may be more foldable sequences for larger proteins, it will be much harder to find them. Lastly, we have correlated the evolutionary age of proteins in the CATH database with their sequence capacity as predicted by our model. The results suggest a trade-off between the opposing requirements of high designability and the likelihood of a novel fold emerging by chance.
Affiliation(s)
- Pengfei Tian
  - Laboratory of Chemical Physics, National Institute of Diabetes and Digestive and Kidney Diseases, National Institutes of Health, Bethesda, Maryland
- Robert B Best
  - Laboratory of Chemical Physics, National Institute of Diabetes and Digestive and Kidney Diseases, National Institutes of Health, Bethesda, Maryland
15
|
Foldamer hypothesis for the growth and sequence differentiation of prebiotic polymers. Proc Natl Acad Sci U S A 2017; 114:E7460-E7468. [PMID: 28831002 DOI: 10.1073/pnas.1620179114] [Citation(s) in RCA: 37] [Impact Index Per Article: 5.3] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/16/2022] Open
Abstract
It is not known how life originated. It is thought that prebiotic processes were able to synthesize short random polymers. However, then, how do short-chain molecules spontaneously grow longer? Also, how would random chains grow more informational and become autocatalytic (i.e., increasing their own concentrations)? We study the folding and binding of random sequences of hydrophobic (H) and polar (P) monomers in a computational model. We find that even short hydrophobic polar (HP) chains can collapse into relatively compact structures, exposing hydrophobic surfaces. In this way, they act as primitive versions of today's protein catalysts, elongating other such HP polymers as ribosomes would now do. Such foldamer catalysts are shown to form an autocatalytic set, through which short chains grow into longer chains that have particular sequences. An attractive feature of this model is that it does not overconverge to a single solution; it gives ensembles that could further evolve under selection. This mechanism describes how specific sequences and conformations could contribute to the chemistry-to-biology (CTB) transition.
|
16
|
Catalán P, Arias CF, Cuesta JA, Manrubia S. Adaptive multiscapes: an up-to-date metaphor to visualize molecular adaptation. Biol Direct 2017; 12:7. [PMID: 28245845 PMCID: PMC5331743 DOI: 10.1186/s13062-017-0178-1] [Citation(s) in RCA: 18] [Impact Index Per Article: 2.6] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/08/2016] [Accepted: 02/11/2017] [Indexed: 01/08/2023] Open
Abstract
Background: Wright’s metaphor of the fitness landscape has shaped and conditioned our view of the adaptation of populations for almost a century. Since its inception, and including criticism raised by Wright himself, the concept has been surrounded by controversy. Among others, the debate stems from the intrinsic difficulty to capture important features of the space of genotypes, such as its high dimensionality or the existence of abundant ridges, in a visually appealing two-dimensional picture. Two additional currently widespread observations come to further constrain the applicability of the original metaphor: the very skewed distribution of phenotype sizes (which may actively prevent, due to entropic effects, the achievement of fitness maxima), and functional promiscuity (i.e. the existence of secondary functions which entail partial adaptation to environments never encountered before by the population). Results: Here we revise some of the shortcomings of the fitness landscape metaphor and propose a new “scape” formed by interconnected layers, each layer containing the phenotypes viable in a given environment. Different phenotypes within a layer are accessible through mutations with selective value, while neutral mutations cause displacements of populations within a phenotype. A different environment is represented as a separated layer, where phenotypes may have new fitness values, other phenotypes may be viable, and the same genotype may yield a different phenotype, representing genotypic promiscuity. This scenario explicitly includes the many-to-many structure of the genotype-to-phenotype map. A number of empirical observations regarding the adaptation of populations in the light of adaptive multiscapes are reviewed. Conclusions: Several shortcomings of Wright’s visualization of fitness landscapes can be overcome through adaptive multiscapes. Relevant aspects of population adaptation, such as neutral drift, functional promiscuity or environment-dependent fitness, as well as entropic trapping and the concomitant impossibility to reach fitness peaks are visualized at once. Adaptive multiscapes should aid in the qualitative understanding of the multiple pathways involved in evolutionary dynamics. Reviewers: This article was reviewed by Eugene Koonin and Ricard Solé.
Affiliation(s)
- Pablo Catalán
- Grupo Interdisciplinar de Sistemas Complejos (GISC), Madrid, Spain; Departamento de Matemáticas, Universidad Carlos III de Madrid, Madrid, Spain
- Clemente F Arias
- Grupo Interdisciplinar de Sistemas Complejos (GISC), Madrid, Spain
- Jose A Cuesta
- Grupo Interdisciplinar de Sistemas Complejos (GISC), Madrid, Spain; Departamento de Matemáticas, Universidad Carlos III de Madrid, Madrid, Spain; Institute for Biocomputation and Physics of Complex Systems, Zaragoza, Spain; UC3M-BS Institute of Financial Big Data (IFiBiD), Madrid, Spain
- Susanna Manrubia
- Grupo Interdisciplinar de Sistemas Complejos (GISC), Madrid, Spain; National Biotechnology Centre (CSIC), c/ Darwin 3, Madrid, 28049, Spain
|
17
|
Yang X, Lu ZY. Control globular structure formation of a copolymer chain through inverse design. J Chem Phys 2016; 144:224902. [PMID: 27306020 DOI: 10.1063/1.4953576] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 01/30/2023] Open
Abstract
A copolymer chain in dilute solution can exhibit various globular structures with characteristic morphologies, which makes it a potentially useful candidate for artificial materials design. However, the chain has a huge conformation space and may not naturally form the globular structure we desire. An ideal way to control globular structure formation should be inverse design, i.e., starting from the target structure and finding out what kind of polymers can effectively generate it. To accomplish this, we propose an inverse design procedure, which is combined with Wang-Landau Monte Carlo to fully and precisely explore the huge conformation space of the chain. Starting from a desired target structure, all the geometrically possible sequences are exactly enumerated. Interestingly, reasonable interaction strengths are obtained and found to be not specified for only one sequence. Instead, they can be combined with many other sequences and also achieve a relatively high yield for target structure, although these sequences may be rather different. These results confirm the possibility of controlling globular structure formation of a copolymer chain through inverse design and pave the way for targeted materials design.
Affiliation(s)
- Xi Yang
- State Key Laboratory of Supramolecular Structure and Materials, Institute of Theoretical Chemistry, Jilin University, Changchun 130021, China
- Zhong-Yuan Lu
- State Key Laboratory of Supramolecular Structure and Materials, Institute of Theoretical Chemistry, Jilin University, Changchun 130021, China
|
18
|
Greenbury SF, Schaper S, Ahnert SE, Louis AA. Genetic Correlations Greatly Increase Mutational Robustness and Can Both Reduce and Enhance Evolvability. PLoS Comput Biol 2016; 12:e1004773. [PMID: 26937652 PMCID: PMC4777517 DOI: 10.1371/journal.pcbi.1004773] [Citation(s) in RCA: 43] [Impact Index Per Article: 5.4] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/24/2015] [Accepted: 01/24/2016] [Indexed: 11/18/2022] Open
Abstract
Mutational neighbourhoods in genotype-phenotype (GP) maps are widely believed to be more likely to share characteristics than expected from random chance. Such genetic correlations should strongly influence evolutionary dynamics. We explore and quantify these intuitions by comparing three GP maps (a model for RNA secondary structure, the HP model for protein tertiary structure, and the Polyomino model for protein quaternary structure) to a simple random null model that maintains the number of genotypes mapping to each phenotype, but assigns genotypes randomly. The mutational neighbourhood of a genotype in these GP maps is much more likely to contain genotypes mapping to the same phenotype than in the random null model. Such neutral correlations can be quantified by the robustness to mutations, which can be many orders of magnitude larger than that of the null model, and crucially, above the critical threshold for the formation of large neutral networks of mutationally connected genotypes which enhance the capacity for the exploration of phenotypic novelty. Thus neutral correlations increase evolvability. We also study non-neutral correlations. Compared to the null model: i) if a particular (non-neutral) phenotype is found once in the 1-mutation neighbourhood of a genotype, then the chance of finding that phenotype multiple times in this neighbourhood is larger than expected; ii) if two genotypes are connected by a single neutral mutation, then their respective non-neutral 1-mutation neighbourhoods are more likely to be similar; iii) if a genotype maps to a folding or self-assembling phenotype, then its non-neutral neighbours are less likely to be a potentially deleterious non-folding or non-assembling phenotype. Non-neutral correlations of type i) and ii) reduce the rate at which new phenotypes can be found by neutral exploration, and so may diminish evolvability, while non-neutral correlations of type iii) may instead facilitate evolutionary exploration and so increase evolvability.
Affiliation(s)
- Sam F. Greenbury
- Theory of Condensed Matter Group, Cavendish Laboratory, University of Cambridge, Cambridge, United Kingdom
- Steffen Schaper
- Rudolf Peierls Centre for Theoretical Physics, University of Oxford, Oxford, United Kingdom
- Sebastian E. Ahnert
- Theory of Condensed Matter Group, Cavendish Laboratory, University of Cambridge, Cambridge, United Kingdom
- Ard A. Louis
- Rudolf Peierls Centre for Theoretical Physics, University of Oxford, Oxford, United Kingdom
|
19
|
Martins PHL, Bachmann M. Interlocking order parameter fluctuations in structural transitions between adsorbed polymer phases. Phys Chem Chem Phys 2016; 18:2143-51. [PMID: 26690091 DOI: 10.1039/c5cp05038c] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.6] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/21/2022]
Abstract
By means of contact-density chain-growth simulations of a simple coarse-grained lattice model for a polymer grafted at a solid homogeneous substrate, we investigate the complementary behavior of the numbers of surface-monomer and monomer-monomer contacts under various solvent and thermal conditions. This pair of contact numbers represents an appropriate set of order parameters that enables the distinct discrimination of significantly different compact phases of polymer adsorption. Depending on the transition scenario, these order parameters can interlock in perfect cooperation. The analysis helps understand the transitions from compact filmlike adsorbed polymer conformations into layered morphologies and dissolved adsorbed structures, respectively, in more detail.
Affiliation(s)
- Paulo H L Martins
- Instituto de Física, Universidade Federal de Mato Grosso, 78060-900 Cuiabá, MT, Brazil
|
20
|
Exhaustive Analysis of a Genotype Space Comprising 10^15 Central Carbon Metabolisms Reveals an Organization Conducive to Metabolic Innovation. PLoS Comput Biol 2015; 11:e1004329. [PMID: 26252881 PMCID: PMC4529314 DOI: 10.1371/journal.pcbi.1004329] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.3] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/02/2014] [Accepted: 04/28/2015] [Indexed: 11/24/2022] Open
Abstract
All biological evolution takes place in a space of possible genotypes and their phenotypes. The structure of this space defines the evolutionary potential and limitations of an evolving system. Metabolism is one of the most ancient and fundamental evolving systems, sustaining life by extracting energy from extracellular nutrients. Here we study metabolism’s potential for innovation by analyzing an exhaustive genotype-phenotype map for a space of 10^15 metabolisms that encodes all possible subsets of 51 reactions in central carbon metabolism. Using flux balance analysis, we predict the viability of these metabolisms on 10 different carbon sources which give rise to 1024 potential metabolic phenotypes. Although viable metabolisms with any one phenotype comprise a tiny fraction of genotype space, their absolute numbers exceed 10^9 for some phenotypes. Metabolisms with any one phenotype typically form a single network of genotypes that extends far or all the way through metabolic genotype space, where any two genotypes can be reached from each other through a series of single reaction changes. The minimal distance of genotype networks associated with different phenotypes is small, such that one can reach metabolisms with novel phenotypes – viable on new carbon sources – through one or few genotypic changes. Exceptions to these principles exist for those metabolisms whose complexity (number of reactions) is close to the minimum needed for viability. Increasing metabolic complexity enhances the potential for both evolutionary conservation and evolutionary innovation. Genotype-phenotype mapping is one of the ultimate goals of computational systems biology, and can provide new insights into the function and evolution of biological systems. We present a comprehensive genotype-phenotype map for a space of metabolic genotypes that comprises more than 10^15 central carbon metabolisms. 
Only one in a million of these metabolisms can sustain life on any one of 10 carbon sources we consider, but these viable metabolisms form connected genotype networks that extend far through genotype space. In addition, they render multiple novel metabolic phenotypes in their immediate neighborhood accessible through small evolutionary changes that require only the alteration of single metabolic reactions. The map we construct reveals an organization of core metabolism that simultaneously facilitates evolutionary conservation of existing metabolic phenotypes, and the origination of novel metabolic traits that allow viability on novel carbon sources. Such metabolic innovation is essential, particularly for organisms that experience unexpected environmental changes, and that explore or invade new habitats.
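The sizes quoted in this abstract can be checked by simple counting. The sketch below assumes that a genotype is a presence/absence choice for each of the 51 reactions and that a phenotype is a viable/non-viable flag for each of the 10 carbon sources; the variable names are illustrative and not taken from the paper.

```python
# Counting sketch under the assumptions stated above (names are hypothetical).
n_reactions = 51
n_carbon_sources = 10

# Every subset of the 51 reactions is one metabolic genotype.
n_genotypes = 2 ** n_reactions

# Every viable/non-viable pattern over the 10 carbon sources is one phenotype.
n_phenotypes = 2 ** n_carbon_sources

# "One in a million" viable on a given carbon source, taken at face value.
approx_viable = n_genotypes // 10 ** 6

print(f"genotypes:  {n_genotypes:.3e}")
print(f"phenotypes: {n_phenotypes}")
print(f"viable per carbon source (rough): {approx_viable:.3e}")
```

The genotype count is of order 10^15 and the phenotype count is 1024, matching the abstract; the rough viable count is of order 10^9, consistent with the statement that absolute numbers exceed 10^9 for some phenotypes.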
|
21
|
Hidaka T, Shimada A, Nakata Y, Kodama H, Kurihara H, Tokihiro T, Ihara S. Simple model of pH-induced protein denaturation. PHYSICAL REVIEW. E, STATISTICAL, NONLINEAR, AND SOFT MATTER PHYSICS 2015; 92:012709. [PMID: 26274205 DOI: 10.1103/physreve.92.012709] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Received: 11/19/2014] [Indexed: 06/04/2023]
Abstract
The pH-induced conformational changes of proteins are systematically studied in the framework of a hydrophobic-polar (HP) model, in which proteins are dramatically simplified as chains of hydrophobic (H) and polar (P) beads on a lattice. We express the electrostatic interaction, the principal driving force of pH-induced unfolding that is not included in the conventional HP model, as the repulsive energy term between P monomers. As a result of the exact enumeration of all of the 14- to 18-mers, it is found that lowest-energy states in many sequences change from single "native" conformations to multiple sets of "denatured" conformations with an increase in the electrostatic repulsion. The switching of the lowest-energy states occurs in quite a similar way to real proteins: it is almost always between two states, while in a small fraction of ≥16-mers it is between three states. We also calculate the structural fluctuations for all of the denatured states and find that the denatured states contain a broad range of incompletely unfolded conformations, similar to "molten globule" states referred to in acid or alkaline denatured real proteins. These results show that the proposed model provides a simple physical picture of pH-induced protein denaturation.
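A minimal sketch of the kind of energy function this abstract describes: the conventional HP contact energy plus a repulsive term between P monomers. Several details are assumptions made for illustration only: a 2D square lattice, a repulsion that acts only between non-bonded nearest-neighbour P-P pairs, and the names `hp_energy`, `eps_hh`, and `eps_pp`. The abstract does not specify the functional form of the repulsion.

```python
from itertools import combinations


def hp_energy(seq, coords, eps_hh=-1.0, eps_pp=0.0):
    """Energy of one lattice conformation (illustrative, not the paper's code).

    seq    : string of 'H' and 'P'
    coords : list of integer (x, y) lattice sites, one per monomer
    eps_hh : energy per H-H contact (attractive in the conventional HP model)
    eps_pp : energy per P-P contact (repulsive when > 0; assumed contact form)
    """
    if len(seq) != len(coords) or len(set(coords)) != len(coords):
        raise ValueError("need one distinct lattice site per monomer")
    energy = 0.0
    for i, j in combinations(range(len(seq)), 2):
        if j - i < 2:
            continue  # monomers bonded along the chain are not contacts
        (xi, yi), (xj, yj) = coords[i], coords[j]
        if abs(xi - xj) + abs(yi - yj) != 1:
            continue  # not nearest neighbours on the lattice
        if seq[i] == "H" and seq[j] == "H":
            energy += eps_hh
        elif seq[i] == "P" and seq[j] == "P":
            energy += eps_pp
    return energy


# A 4-mer folded around a unit square: monomers 0 and 3 are in contact.
square = [(0, 0), (1, 0), (1, 1), (0, 1)]
print(hp_energy("HPPH", square))              # one H-H contact
print(hp_energy("PHHP", square, eps_pp=0.5))  # one P-P contact
```

Raising `eps_pp` in such a function penalizes compact conformations that bring P monomers together, which is the mechanism the abstract invokes for the switch from a single native state to multiple denatured states.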
Affiliation(s)
- T Hidaka
- Graduate School of Medicine and Faculty of Medicine, The University of Tokyo, Hongo, Tokyo 113-0033, Japan
- A Shimada
- Graduate School of Medicine and Faculty of Medicine, The University of Tokyo, Hongo, Tokyo 113-0033, Japan
- Y Nakata
- Institute for Biology and Mathematics of Dynamic Cellular Processes (iBMath), The University of Tokyo, Komaba, Tokyo 153-8904, Japan
- Graduate School of Mathematical Sciences, The University of Tokyo, Komaba, Tokyo 153-8902, Japan
- H Kodama
- Institute for Biology and Mathematics of Dynamic Cellular Processes (iBMath), The University of Tokyo, Komaba, Tokyo 153-8904, Japan
- Graduate School of Mathematical Sciences, The University of Tokyo, Komaba, Tokyo 153-8902, Japan
- H Kurihara
- Graduate School of Medicine and Faculty of Medicine, The University of Tokyo, Hongo, Tokyo 113-0033, Japan
- Institute for Biology and Mathematics of Dynamic Cellular Processes (iBMath), The University of Tokyo, Komaba, Tokyo 153-8904, Japan
- Core Research for Evolutional Science and Technology (CREST), Japan Science and Technology Agency (JST), Chiyoda-ku, Tokyo 102-0076, Japan
- T Tokihiro
- Institute for Biology and Mathematics of Dynamic Cellular Processes (iBMath), The University of Tokyo, Komaba, Tokyo 153-8904, Japan
- Graduate School of Mathematical Sciences, The University of Tokyo, Komaba, Tokyo 153-8902, Japan
- Core Research for Evolutional Science and Technology (CREST), Japan Science and Technology Agency (JST), Chiyoda-ku, Tokyo 102-0076, Japan
- S Ihara
- Institute for Biology and Mathematics of Dynamic Cellular Processes (iBMath), The University of Tokyo, Komaba, Tokyo 153-8904, Japan
- Graduate School of Mathematical Sciences, The University of Tokyo, Komaba, Tokyo 153-8902, Japan
- Research Center for Advanced Science and Technology, The University of Tokyo, Komaba, Tokyo 153-8904, Japan
|
22
|
Sikosek T, Chan HS. Biophysics of protein evolution and evolutionary protein biophysics. J R Soc Interface 2015; 11:20140419. [PMID: 25165599 DOI: 10.1098/rsif.2014.0419] [Citation(s) in RCA: 150] [Impact Index Per Article: 16.7] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 01/05/2023] Open
Abstract
The study of molecular evolution at the level of protein-coding genes often entails comparing large datasets of sequences to infer their evolutionary relationships. Despite the importance of a protein's structure and conformational dynamics to its function and thus its fitness, common phylogenetic methods embody minimal biophysical knowledge of proteins. To underscore the biophysical constraints on natural selection, we survey effects of protein mutations, highlighting the physical basis for marginal stability of natural globular proteins and how requirement for kinetic stability and avoidance of misfolding and misinteractions might have affected protein evolution. The biophysical underpinnings of these effects have been addressed by models with an explicit coarse-grained spatial representation of the polypeptide chain. Sequence-structure mappings based on such models are powerful conceptual tools that rationalize mutational robustness, evolvability, epistasis, promiscuous function performed by 'hidden' conformational states, resolution of adaptive conflicts and conformational switches in the evolution from one protein fold to another. Recently, protein biophysics has been applied to derive more accurate evolutionary accounts of sequence data. Methods have also been developed to exploit sequence-based evolutionary information to predict biophysical behaviours of proteins. The success of these approaches demonstrates a deep synergy between the fields of protein biophysics and protein evolution.
Affiliation(s)
- Tobias Sikosek
- Department of Biochemistry, University of Toronto, Toronto, Ontario, Canada M5S 1A8; Department of Molecular Genetics, University of Toronto, Toronto, Ontario, Canada M5S 1A8; Department of Physics, University of Toronto, Toronto, Ontario, Canada M5S 1A8
- Hue Sun Chan
- Department of Biochemistry, University of Toronto, Toronto, Ontario, Canada M5S 1A8; Department of Molecular Genetics, University of Toronto, Toronto, Ontario, Canada M5S 1A8; Department of Physics, University of Toronto, Toronto, Ontario, Canada M5S 1A8
|
23
|
Ferrada E. The amino acid alphabet and the architecture of the protein sequence-structure map. I. Binary alphabets. PLoS Comput Biol 2014; 10:e1003946. [PMID: 25473967 PMCID: PMC4256021 DOI: 10.1371/journal.pcbi.1003946] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/07/2014] [Accepted: 09/26/2014] [Indexed: 11/19/2022] Open
Abstract
The correspondence between protein sequences and structures, or sequence-structure map, relates to fundamental aspects of structural, evolutionary and synthetic biology. The specifics of the mapping, such as the fraction of accessible sequences and structures, or the sequences' ability to fold fast, are dictated by the type of interactions between the monomers that compose the sequences. The set of possible interactions between monomers is encapsulated by the potential energy function. In this study, I explore the impact of the relative forces of the potential on the architecture of the sequence-structure map. My observations rely on simple exact models of proteins and random samples of the space of potential energy functions of binary alphabets. I adopt a graph perspective and study the distribution of viable sequences and the structures they produce, as networks of sequences connected by point mutations. I observe that the relative proportion of attractive, neutral and repulsive forces defines types of potentials, that induce sequence-structure maps of vastly different architectures. I characterize the properties underlying these differences and relate them to the structure of the potential. Among these properties are the expected number and relative distribution of sequences associated to specific structures and the diversity of structures as a function of sequence divergence. I study the types of binary potentials observed in natural amino acids and show that there is a strong bias towards only some types of potentials, a bias that seems to characterize the folding code of natural proteins. I discuss implications of these observations for the architecture of the sequence-structure map of natural proteins, the construction of random libraries of peptides, and the early evolution of the natural amino acid alphabet.
Affiliation(s)
- Evandro Ferrada
- Santa Fe Institute, Santa Fe, New Mexico, United States of America
|
24
|
Shi G, Vogel T, Wüst T, Li YW, Landau DP. Effect of single-site mutations on hydrophobic-polar lattice proteins. PHYSICAL REVIEW. E, STATISTICAL, NONLINEAR, AND SOFT MATTER PHYSICS 2014; 90:033307. [PMID: 25314564 DOI: 10.1103/physreve.90.033307] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Received: 04/10/2014] [Indexed: 06/04/2023]
Abstract
We developed a heuristic method for determining the ground-state degeneracy of hydrophobic-polar (HP) lattice proteins, based on Wang-Landau and multicanonical sampling. It is applied during comprehensive studies of single-site mutations in specific HP proteins with different sequences. The effects in which we are interested include structural changes in ground states, changes of ground-state energy, degeneracy, and thermodynamic properties of the system. With respect to mutations, both extremely sensitive and insensitive positions in the HP sequence have been found. That is, ground-state energies and degeneracies, as well as other thermodynamic and structural quantities, may be either largely unaffected or may change significantly due to mutation.
Affiliation(s)
- Guangjie Shi
- Center for Simulational Physics, The University of Georgia, Athens, Georgia 30602, USA
- Thomas Vogel
- Theoretical Division (T-1), Los Alamos National Laboratory, Los Alamos, New Mexico 87545, USA
- Thomas Wüst
- Scientific IT Services, ETH Zürich IT Services, 8092 Zürich, Switzerland
- Ying Wai Li
- National Center for Computational Sciences, Oak Ridge National Laboratory, Oak Ridge, Tennessee 37831, USA
- David P Landau
- Center for Simulational Physics, The University of Georgia, Athens, Georgia 30602, USA
|
25
|
Greenbury SF, Johnston IG, Louis AA, Ahnert SE. A tractable genotype-phenotype map modelling the self-assembly of protein quaternary structure. J R Soc Interface 2014; 11:20140249. [PMID: 24718456 PMCID: PMC4006268 DOI: 10.1098/rsif.2014.0249] [Citation(s) in RCA: 49] [Impact Index Per Article: 4.9] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/24/2022] Open
Abstract
The mapping between biological genotypes and phenotypes is central to the study of biological evolution. Here, we introduce a rich, intuitive and biologically realistic genotype–phenotype (GP) map that serves as a model of self-assembling biological structures, such as protein complexes, and remains computationally and analytically tractable. Our GP map arises naturally from the self-assembly of polyomino structures on a two-dimensional lattice and exhibits a number of properties: redundancy (genotypes vastly outnumber phenotypes), phenotype bias (genotypic redundancy varies greatly between phenotypes), genotype component disconnectivity (phenotypes consist of disconnected mutational networks) and shape space covering (most phenotypes can be reached in a small number of mutations). We also show that the mutational robustness of phenotypes scales very roughly logarithmically with phenotype redundancy and is positively correlated with phenotypic evolvability. Although our GP map describes the assembly of disconnected objects, it shares many properties with other popular GP maps for connected units, such as models for RNA secondary structure or the hydrophobic-polar (HP) lattice model for protein tertiary structure. The remarkable fact that these important properties similarly emerge from such different models suggests the possibility that universal features underlie a much wider class of biologically realistic GP maps.
Affiliation(s)
- Sam F Greenbury
- Theory of Condensed Matter Group, Cavendish Laboratory, University of Cambridge, Cambridge, UK
|
26
|
Giaquinta E, Pozzi L. An Effective Exact Algorithm and a New Upper Bound for the Number of Contacts in the Hydrophobic-Polar Two-Dimensional Lattice Model. J Comput Biol 2013; 20:593-609. [DOI: 10.1089/cmb.2012.0266] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.4] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/13/2022] Open
Affiliation(s)
- Emanuele Giaquinta
- Department of Computer Science, University of Helsinki, Helsinki, Finland
- Laura Pozzi
- Faculty of Informatics, University of Lugano (USI), Lugano, Switzerland
|
27
|
Ferrada E, Wagner A. A comparison of genotype-phenotype maps for RNA and proteins. Biophys J 2012; 102:1916-25. [PMID: 22768948 DOI: 10.1016/j.bpj.2012.01.047] [Citation(s) in RCA: 32] [Impact Index Per Article: 2.7] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/07/2011] [Revised: 01/19/2012] [Accepted: 01/27/2012] [Indexed: 02/04/2023] Open
Abstract
The relationship between the genotype (sequence) and the phenotype (structure) of macromolecules affects their ability to evolve new structures and functions. We here compare the genotype space organization of proteins and RNA molecules to identify differences that may affect this ability. To this end, we computationally study the genotype-phenotype relationship for short RNA and lattice proteins of a reduced monomer alphabet size, to make exhaustive analysis and direct comparison of their genotype spaces feasible. We find that many fewer protein molecules than RNA molecules fold, but they fold into many more structures than RNA. In consequence, protein phenotypes have smaller genotype networks whose member genotypes tend to be more similar than for RNA phenotypes. Neighborhoods in sequence space of a given radius around an RNA molecule contain more novel structures than for protein molecules. We compare this property to evidence from natural RNA and protein molecules, and conclude that RNA genotype space may be more conducive to the evolution of new structure phenotypes.
Affiliation(s)
- Evandro Ferrada
- Institute of Evolutionary Biology and Environmental Studies, University of Zurich, Zurich, Switzerland
|
28
|
Narasimhan SL, Rajarajan AK, Vardharaj L. HP-sequence design for lattice proteins—An exact enumeration study on diamond as well as square lattice. J Chem Phys 2012; 137:115102. [DOI: 10.1063/1.4752479] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.5] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 02/06/2023] Open
|
29
|
Sikosek T, Bornberg-Bauer E, Chan HS. Evolutionary dynamics on protein bi-stability landscapes can potentially resolve adaptive conflicts. PLoS Comput Biol 2012; 8:e1002659. [PMID: 23028272 PMCID: PMC3441461 DOI: 10.1371/journal.pcbi.1002659] [Citation(s) in RCA: 25] [Impact Index Per Article: 2.1] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/08/2012] [Accepted: 07/12/2012] [Indexed: 11/18/2022] Open
Abstract
Experimental studies have shown that some proteins exist in two alternative native-state conformations. It has been proposed that such bi-stable proteins can potentially function as evolutionary bridges at the interface between two neutral networks of protein sequences that fold uniquely into the two different native conformations. Under adaptive conflict scenarios, bi-stable proteins may be of particular advantage if they simultaneously provide two beneficial biological functions. However, computational models that simulate protein structure evolution do not yet recognize the importance of bi-stability. Here we use a biophysical model to analyze sequence space to identify bi-stable or multi-stable proteins with two or more equally stable native-state structures. The inclusion of such proteins enhances phenotype connectivity between neutral networks in sequence space. Consideration of the sequence space neighborhood of bridge proteins revealed that bi-stability decreases gradually with each mutation that takes the sequence further away from an exactly bi-stable protein. With relaxed selection pressures, we found that bi-stable proteins in our model are highly successful under simulated adaptive conflict. Inspired by these model predictions, we developed a method to identify real proteins in the PDB with bridge-like properties, and have verified a clear bi-stability gradient for a series of mutants studied by Alexander et al. (Proc Nat Acad Sci USA 2009, 106:21149–21154) that connect two sequences that fold uniquely into two different native structures via a bridge-like intermediate mutant sequence. Based on these findings, new testable predictions for future studies on protein bi-stability and evolution are discussed. Proteins are essential molecules for performing a majority of functions in all biological systems. These functions often depend on the three-dimensional structures of proteins. 
Here, we investigate a fundamental question in molecular evolution: how can proteins acquire new advantageous structures via mutations while not sacrificing their existing structures that are still needed? Some authors have suggested that the same protein may adopt two or more alternative structures, switch between them and thus perform different functions with each of the alternative structures. Intuitively, such a protein could provide an evolutionary compromise between conflicting demands for existing and new protein structures. Yet no theoretical study has systematically tackled the biophysical basis of such compromises during evolutionary processes. Here we devise a model of evolution that specifically recognizes protein molecules that can exist in several different stable structures. Our model demonstrates that proteins can indeed utilize multiple structures to satisfy conflicting evolutionary requirements. In light of these results, we identify data from known protein structures that are consistent with our predictions and suggest novel directions for future investigation.
Affiliation(s)
- Tobias Sikosek
- Evolutionary Bioinformatics Group, Institute for Evolution and Biodiversity, University of Münster, Münster, Germany.
|
30
|
Moreno-Hernández S, Levitt M. Comparative modeling and protein-like features of hydrophobic-polar models on a two-dimensional lattice. Proteins 2012; 80:1683-93. [PMID: 22411636 DOI: 10.1002/prot.24067] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.2] [Received: 11/15/2011] [Revised: 02/26/2012] [Accepted: 03/03/2012] [Indexed: 11/07/2022]
Abstract
Lattice models of proteins have been extensively used to study protein thermodynamics, folding dynamics, and evolution. Our study considers two different hydrophobic-polar (HP) models on the 2D square lattice: the purely HP model and a model where a compactness-favoring term is added. We exhaustively enumerate all the possible structures in our models and perform the study of their corresponding folds, HP arrangements in space and shapes. The two models considered differ greatly in their numbers of structures, folds, arrangements, and shapes. Despite their differences, both lattice models have distinctive protein-like features: (1) Shapes are compact in both models, especially when a compactness-favoring energy term is added. (2) The residue composition is independent of the chain length and is very close to 50% hydrophobic in both models, as we observe in real proteins. (3) Comparative modeling works well in both models, particularly in the more compact one. The fact that our models show protein-like features suggests that lattice models incorporate the fundamental physical principles of proteins. Our study supports the use of lattice models to study questions about proteins that require exactness and extensive calculations, such as protein design and evolution, which are often too complex and computationally demanding to be addressed with more detailed models.
Affiliation(s)
- Sergio Moreno-Hernández
- Department of Structural Biology, Stanford University School of Medicine, Stanford, CA 94305, USA
|
31
|
Holzgräfe C, Irbäck A, Troein C. Mutation-induced fold switching among lattice proteins. J Chem Phys 2011; 135:195101. [DOI: 10.1063/1.3660691] [Citation(s) in RCA: 28] [Impact Index Per Article: 2.2] [Indexed: 11/14/2022] Open
|
32
|
Srivastava S, Patton Y, Fisher DW, Wood GR. Cotranslational protein folding and terminus hydrophobicity. Adv Bioinformatics 2011; 2011:176813. [PMID: 21687643 PMCID: PMC3112501 DOI: 10.1155/2011/176813] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.4] [Received: 12/27/2010] [Revised: 02/28/2011] [Accepted: 03/24/2011] [Indexed: 11/18/2022] Open
Abstract
Peptides fold on a time scale that is much smaller than the time required for synthesis, whence all proteins potentially fold cotranslationally to some degree (followed by additional folding events after release from the ribosome). In this paper, in three different ways, we find that cotranslational folding success is associated with higher hydrophobicity at the N-terminus than at the C-terminus. First, we fold simple HP models on a square lattice and observe that HP sequences that fold better cotranslationally than from a fully extended state exhibit a positive difference (N-C) in terminus hydrophobicity. Second, we examine real proteins using a previously established measure of potential cotranslationality known as ALR (Average Logarithmic Ratio of the extent of previous contacts) and again find a correlation with the difference in terminus hydrophobicity. Finally, we use the cotranslational protein structure prediction program SAINT and again find that such an approach to folding is more successful for proteins with higher N-terminus than C-terminus hydrophobicity. All results indicate that cotranslational folding is promoted in part by a hydrophobic start and a less hydrophobic finish to the sequence.
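The N-C difference in terminus hydrophobicity mentioned above can be illustrated for HP strings with a minimal sketch. The abstract does not state how the study measured hydrophobicity at each terminus, so the function name, the fixed `window` length, and the use of the H fraction are all assumptions made for illustration only.

```python
def terminus_hydrophobicity_difference(seq, window=5):
    """Fraction of 'H' in the first `window` residues minus the fraction
    in the last `window` residues of an HP string.

    `window` is a hypothetical parameter, not taken from the cited study.
    A positive value means the N-terminus is the more hydrophobic end.
    """
    if window <= 0 or 2 * window > len(seq):
        raise ValueError("window must be positive and at most half the chain length")
    n_term = seq[:window]    # residues synthesized first
    c_term = seq[-window:]   # residues synthesized last
    return (n_term.count("H") - c_term.count("H")) / window


# N-terminal window "HHHP" has 3 H, C-terminal window "PHPP" has 1 H.
print(terminus_hydrophobicity_difference("HHHPHPPHPP", window=4))  # 0.5
```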
Affiliation(s)
- Sheenal Srivastava
- Department of Statistics, Macquarie University, Sydney, NSW 2109, Australia
- Yumi Patton
- Department of Statistics, Macquarie University, Sydney, NSW 2109, Australia
- David W. Fisher
- Department of Statistics, Macquarie University, Sydney, NSW 2109, Australia
- Graham R. Wood
- Department of Statistics, Macquarie University, Sydney, NSW 2109, Australia
|
33
|
Chan HS. Short-Range Contact Preferences and Long-Range Indifference: Is Protein Folding Stoichiometry Driven? J Biomol Struct Dyn 2011; 28:603-5; discussion 669-674. [DOI: 10.1080/073911011010524960] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Indexed: 10/28/2022]
|
34
|
Statistical theory of neutral protein evolution by random site mutations. J Chem Sci 2009. [DOI: 10.1007/s12039-009-0105-9] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/20/2022]
|
35
|
Khoo A, Iwaki T, Shew CY, Yoshikawa K. Preferential positioning of a nanoparticle bound to a polymer: Exact enumeration of a self-avoiding walk chain model. J Chem Phys 2009. [DOI: 10.1063/1.3216571] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/14/2022] Open
|
36
|
Mann M, Maticzka D, Saunders R, Backofen R. Classifying proteinlike sequences in arbitrary lattice protein models using LatPack. HFSP Journal 2008; 2:396-404. [PMID: 19436498 DOI: 10.2976/1.3027681] [Citation(s) in RCA: 11] [Impact Index Per Article: 0.7] [Received: 08/18/2008] [Accepted: 10/23/2008] [Indexed: 01/06/2023]
Abstract
Knowledge of a protein's three-dimensional native structure is vital in determining its chemical properties and functionality. However, experimental methods to determine structure are very costly and time-consuming. Computational approaches such as folding simulations and structure prediction algorithms are quicker and cheaper but lack consistent accuracy. This currently restricts extensive computational studies to abstract protein models. It is thus essential that simplifications induced by the models do not negate scientific value. Key to this is the use of thoroughly defined proteinlike sequences. In such cases abstract models can allow for the investigation of important biological questions. Here, we present a procedure to generate and classify proteinlike sequence data sets. Our LatPack tools and the approach in general are applicable to arbitrary lattice protein models. Identification is based on thermodynamic kinetic features and incorporates the sequential assembly of proteins by addressing cotranslational folding. We demonstrate the approach in the widely used unrestricted 3D-cubic HP-model. The resulting sequence set is the first large data set for this model exhibiting the proteinlike properties required. Our data tools are freely available and can be used to investigate protein-related problems.
|
37
|
Abstract
The amino acid composition of intrinsically disordered proteins and protein segments characteristically differs from that of ordered proteins. This observation forms the basis of several disorder prediction methods. These, however, usually perform worse for smaller proteins (or segments) than for larger ones. We show that the regions of amino acid composition space corresponding to ordered and disordered proteins overlap with each other, and the extent of the overlap (the "twilight zone") is larger for short than for long chains. To explain this finding, we used two-dimensional lattice model proteins containing hydrophobic, polar, and charged monomers and revealed the relation among chain length, amino acid composition, and disorder. Because the number of chain configurations exponentially grows with chain length, a larger fraction of longer chains can reach a low-energy, ordered state than do shorter chains. The amount of information carried by the amino acid composition about whether a protein or segment is (dis)ordered grows with increasing chain length. Smaller proteins rely more on specific interactions for stability, which limits the possible accuracy of disorder prediction methods. For proteins in the "twilight zone", size can determine order, as illustrated by the example of two-state homodimers.
|
38
|
Peto M, Kloczkowski A, Jernigan RL. Shape-dependent designability studies of lattice proteins. J Phys Condens Matter 2007; 19:285220-285230. [PMID: 18079979 PMCID: PMC2134837 DOI: 10.1088/0953-8984/19/28/285220] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.2] [Indexed: 05/25/2023]
Abstract
One important problem in computational structural biology is protein designability, that is, why protein sequences are not random strings of amino acids but instead show regular patterns that encode protein structures. Many previous studies that have attempted to solve the problem have relied upon reduced models of proteins. In particular, the 2D square and the 3D cubic lattices together with reduced amino acid alphabet models have been examined extensively and have led to interesting results that shed some light on evolutionary relationships among proteins. Here we perform designability studies on the 2D square lattice and explore the effects of variable overall shapes on protein designability using a binary hydrophobic-polar (HP) amino acid alphabet. Because we rely on a simple energy function that counts the total number of H-H interactions between non-sequential residues, we restrict our studies to protein shapes that have the same number of residues and also a constant number of non-bonded contacts. We have found that there is a marked difference in the designability between various protein shapes, with some of them accounting for a significantly larger share of the total foldable sequences.
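The energy function and the notion of designability described above can be sketched for very short chains. This is an illustrative toy, not the authors' procedure: it enumerates all self-avoiding conformations on the square lattice rather than a restricted set of shapes with a constant number of contacts, and the function names (`conformations`, `hh_contacts`, `designability`) are hypothetical. A sequence is counted toward a structure only if that structure is its unique lowest-energy conformation.

```python
from collections import Counter
from itertools import product

STEPS = ((1, 0), (-1, 0), (0, 1), (0, -1))


def conformations(n):
    """Self-avoiding walks of n >= 2 monomers, one per rotation/reflection class."""
    found = []

    def grow(path, turned):
        if len(path) == n:
            found.append(tuple(path))
            return
        x, y = path[-1]
        for dx, dy in STEPS:
            if not turned and dy < 0:
                continue  # skip the mirror image of an upward first turn
            nxt = (x + dx, y + dy)
            if nxt in path:
                continue  # self-avoidance
            path.append(nxt)
            grow(path, turned or dy != 0)
            path.pop()

    grow([(0, 0), (1, 0)], False)  # first bond fixed along +x
    return found


def hh_contacts(seq, conf):
    """Number of H-H lattice contacts between residues that are not chain neighbours."""
    index_at = {site: i for i, site in enumerate(conf)}
    count = 0
    for i, (x, y) in enumerate(conf):
        if seq[i] != "H":
            continue
        for dx, dy in ((1, 0), (0, 1)):  # look right and up so each pair is counted once
            j = index_at.get((x + dx, y + dy))
            if j is not None and abs(i - j) > 1 and seq[j] == "H":
                count += 1
    return count


def designability(n):
    """Map each structure to the number of HP sequences with it as unique ground state."""
    confs = conformations(n)
    counts = Counter()
    for seq in product("HP", repeat=n):
        energies = [-hh_contacts(seq, c) for c in confs]  # E = -(number of H-H contacts)
        best = min(energies)
        if energies.count(best) == 1:
            counts[confs[energies.index(best)]] += 1
    return counts


if __name__ == "__main__":
    for structure, n_seqs in designability(4).items():
        print(structure, n_seqs)
```

For a four-residue chain only the U-shaped conformation can form a contact (between the first and last residues), so the four sequences of the form H??H all design that single structure.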
Affiliation(s)
- Myron Peto
- Department of Biochemistry, Biophysics and Molecular Biology, Iowa State University, Ames, IA 50011-3020
|
39
|
Nanda V, Andrianarijaona A, Narayanan C. The role of protein homochirality in shaping the energy landscape of folding. Protein Sci 2007; 16:1667-75. [PMID: 17600146 PMCID: PMC2203351 DOI: 10.1110/ps.072867007] [Citation(s) in RCA: 18] [Impact Index Per Article: 1.1] [Indexed: 10/23/2022]
Abstract
The homochirality, or isotacticity, of the natural amino acids facilitates the formation of regular secondary structures such as alpha-helices and beta-sheets. However, many examples exist in nature where novel polypeptide topologies use both l- and d-amino acids. In this study, we explore how stereochemistry of the polypeptide backbone influences basic properties such as compactness and the size of fold space by simulating both lattice and all-atom polypeptide chains. We formulate a rectangular lattice chain model in both two and three dimensions, where monomers are chiral, having the effect of restricting local conformation. Syndiotactic chains with alternating chirality of adjacent monomers have a very large ensemble of accessible conformations characterized predominantly by extended structures. Isotactic chains on the other hand, have far fewer possible conformations and a significant fraction of these are compact. Syndiotactic chains are often unable to access maximally compact states available to their isotactic counterparts of the same length. Similar features are observed in all-atom models of isotactic versus syndiotactic polyalanine. Our results suggest that protein isotacticity has evolved to increase the enthalpy of chain collapse by facilitating compact helical states and to reduce the entropic cost of folding by restricting the size of the unfolded ensemble of competing states.
Affiliation(s)
- Vikas Nanda
- Center for Advanced Biotechnology and Medicine, Department of Biochemistry, Robert Wood Johnson Medical School, University of Medicine and Dentistry of New Jersey, Piscataway, New Jersey 08854, USA.
|
40
|
Abstract
It has been proposed that proteins fold by a process called "Zipping and Assembly" (Z&A). Zipping refers to the growth of local substructures within the chain, and assembly refers to the coming together of already-formed pieces. Our interest here is in whether Z&A is a general method that can fold most of sequence space, to global minima, efficiently. Using the HP model, we can address this question by enumerating full conformation and sequence spaces. We find that Z&A reaches the global energy minimum native states, even though it searches only a very small fraction of conformational space, for most sequences in the full sequence space. We find that Z&A, a mechanism-based search, is more efficient in our tests than the replica exchange search method. Folding efficiency is increased for chains having: (a) small loop-closure steps, consistent with observations by Plaxco et al. (1998; 277:985-994) that folding rates correlate with contact order, (b) neither too few nor too many nucleation sites per chain, and (c) assembly steps that do not occur too early in the folding process. We find that the efficiency increases with chain length, although our range of chain lengths is limited. We believe these insights may be useful for developing faster protein conformational search algorithms.
Affiliation(s)
- Vincent A Voelz
- Graduate Group in Biophysics, University of California at San Francisco, San Francisco, California 94143, USA
|
41
|
Rashin AA, Rashin AHL. Surface hydrophobic groups, stability, and flip-flopping in lattice proteins. Proteins 2007; 66:321-41. [PMID: 17096417 DOI: 10.1002/prot.21169] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.2] [Indexed: 11/11/2022]
Abstract
Two-dimensional lattice protein models were studied in two approximations of the conformational equilibrium to elucidate the role of surface hydrophobic groups in their stabilities. We demonstrate that stability of any compactly folded sequence is determined by its ability to "flip-flop" (refold) into alternative compact structures. The degree of stability required for folded sequences determines the average numbers of surface hydrophobic groups in stable lattice structures which are in good agreement with ratios of core to surface hydrophobic groups in real proteins. However, the average destabilization of the native structure per surface hydrophobic group is small (0-0.25 kcal/mol), often disagrees with the free energies derived from the ratios of core to surface hydrophobic groups in the same structures, and has a combinatorial entropic nature independent of the strength of structure stabilizing interactions. This suggests that the free energies derived from the core to surface ratios of hydrophobic groups in real proteins have little to do with folding thermodynamics. On average, sequences with highly stable native structures are the least hydrophobic. The results suggest that in designing novel stable proteins hydrophobic groups on the surface should be avoided to reduce the possibility of flip-flopping. The average stability of highly designable structures is never higher than that of some low designability structures, contrary to the accepted view. In the equilibrium approximation with alternative compact and partially unfolded structures, the requirement of high stability selects a unique 5 x 5 structure formed by only a few sequences, suggesting much stronger sequence selectivity than commonly thought.
|
42
|
Abstract
An important puzzle in structural biology is the question of how proteins are able to fold so quickly into their unique native structures. There is much evidence that protein folding is hierarchic. In that case, folding routes are not linear, but have a tree structure. Trees are commonly used to represent the grammatical structure of natural language sentences, and chart parsing algorithms efficiently search the space of all possible trees for a given input string. Here we show that one such method, the CKY algorithm, can be useful both for providing novel insight into the physical protein folding process, and for computational protein structure prediction. As proof of concept, we apply this algorithm to the HP lattice model of proteins. Our algorithm identifies all direct folding route trees to the native state and allows us to construct a simple model of the folding process. Despite its simplicity, our model provides an account for the fact that folding rates depend only on the topology of the native state but not on sequence composition.
Affiliation(s)
- Julia Hockenmaier
- Institute for Research in Cognitive Science and Department of Computer and Information Science, University of Pennsylvania, Philadelphia, PA 19104-6228, USA.
|
43
|
Dias CL, Grant M. Designable structures are easy to unfold. Phys Rev E Stat Nonlin Soft Matter Phys 2006; 74:042902. [PMID: 17155116 DOI: 10.1103/physreve.74.042902] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.1] [Received: 10/18/2005] [Indexed: 05/12/2023]
Abstract
We study the structural stability of models of proteins for which the selected folds are unusually stable to mutation, that is, designable. A two-dimensional hydrophobic-polar lattice model was used to determine designable folds and these folds were investigated through Langevin dynamics. We find that the phase diagram of these proteins depends on their designability. In particular, highly designable folds are found to be weaker, i.e., easier to unfold, than low designable ones. We expect this to be related to protein flexibility.
Affiliation(s)
- Cristiano L Dias
- Physics Department, Rutherford Building, McGill University, 3600 rue University, Montréal, Québec H3A 2T8, Canada
|
44
|
Bloom JD, Drummond DA, Arnold FH, Wilke CO. Structural determinants of the rate of protein evolution in yeast. Mol Biol Evol 2006; 23:1751-61. [PMID: 16782762 DOI: 10.1093/molbev/msl040] [Citation(s) in RCA: 148] [Impact Index Per Article: 8.2] [Indexed: 11/14/2022] Open
Abstract
We investigate how a protein's structure influences the rate at which its sequence evolves. Our basic hypothesis is that proteins with highly designable structures (structures that are encoded by many sequences) will evolve more rapidly. Recent theoretical advances argue that structures with a higher density of interresidue contacts are more designable, and we show that high contact density is correlated with an increased rate of sequence evolution in yeast. In addition, we investigate the correlations between the rate of sequence evolution and several other structural descriptors, carefully controlling for the strong effect of expression level on evolutionary rate. Overall, we find that the structural descriptors that we consider appear to explain roughly 10% of the variation in rates of protein evolution in yeast. We also show that despite the well-known trend for buried residues to be more conserved, proteins with a higher fraction of buried residues, nonetheless, tend to evolve their sequences more rapidly. We suggest that this effect is due to the increased designability of structures with more buried residues. Our results provide evidence that protein structure plays an important role in shaping the rate of sequence evolution and provide evidence to support recent theoretical advances linking structural designability to contact density.
Affiliation(s)
- Jesse D Bloom
- Division of Chemistry and Chemical Engineering, California Institute of Technology, Pasadena, California, USA
|
45
|
Abstract
In this article, we explore the information content of molecular force-field calculations. We make use of exhaustive lattice models of molecular conformations and reduced alphabet sequences to determine the relative resolving power of pairwise interaction-based force fields. We find that sequence-specific interactions that operate over longer distances offer greater amounts of information than nearest-neighbor or non-sequence-specific interactions. In a companion article in this issue, we explored the information content of sequence alignment procedures and the calculation of gap penalties. Both articles have implications for protein and nucleic-acid computations.
Affiliation(s)
- Tiba Aynechi
- Graduate Group in Biophysics, and Department of Pharmaceutical Chemistry, University of California-San Francisco, San Francisco, CA 94143, USA
|
46
|
Huang L, Ma X, Liang H. What is the origin of those common structures of protein-model chains? Polymer 2006. [DOI: 10.1016/j.polymer.2005.11.036] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/25/2022]
|
47
|
Williams PD, Pollock DD, Goldstein RA. Selective advantage of recombination in evolving protein populations: a lattice model study. International Journal of Modern Physics C 2006; 17:75-90. [PMID: 25473139 PMCID: PMC4249953 DOI: 10.1142/s0129183106008959] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Indexed: 06/04/2023]
Abstract
Recent research has attempted to clarify the contributions of several mutational processes, such as substitutions or homologous recombination. Simplistic, tractable protein models, which determine the compact native structure phenotype from the sequence genotype, are well-suited to such studies. In this paper, we use a lattice-protein model to examine the effects of point mutation and homologous recombination on evolving populations of proteins. We find that while the majority of mutation and recombination events are neutral or deleterious, recombination is far more likely to be beneficial. This results in a faster increase in fitness during evolution, although the final fitness level is not significantly changed. This transient advantage provides an evolutionary advantage to subpopulations that undergo recombination, allowing fixation of recombination to occur in the population.
Affiliation(s)
- Paul D. Williams
- Department of Chemistry, University of Michigan, Ann Arbor, Michigan, 48109, USA
- David D. Pollock
- Department of Biological Sciences, Louisiana State University, Baton Rouge, Louisiana, 70803, USA
- Richard A. Goldstein
- Mathematical Biology, National Institute for Medical Research, The Ridgeway, Mill Hill, London MW7 1AA, UK
|
48
|
Zhang XS, Wang Y, Zhan ZW, Wu LY, Chen L. Exploring protein's optimal HP configurations by self-organizing mapping. J Bioinform Comput Biol 2005; 3:385-400. [PMID: 15852511 DOI: 10.1142/s0219720005001107] [Citation(s) in RCA: 13] [Impact Index Per Article: 0.7] [Received: 03/16/2004] [Revised: 09/03/2004] [Accepted: 09/09/2004] [Indexed: 11/18/2022]
Abstract
The self-organizing map (SOM) has been used in protein folding prediction when the HP model is employed. Existing work uses a square-like lattice with l = m × n points to represent the optimal compact structure of a sequence of l amino acids. In this paper, a general sequence of l' amino acids is self-organized in a two-dimensional lattice with l (> l') points. The resulting minimum configuration then has a flexible shape, in contrast to a compact structure confined to the lattice. To achieve this extension, a new SOM technique is proposed to deal with the difficulty of asymmetric input and output spaces. New competition rules are introduced in the training phase, and a local search method is applied to overcome the multi-mapping phenomenon. Several HP benchmark examples with up to 36 amino acids are tested to verify the effectiveness of the proposed approach.
Affiliation(s)
- Xiang-Sun Zhang
- Institute of Applied Mathematics, Academy of Mathematics and Systems Science, CAS, Beijing 100080, China.
|
49
|
Wilke CO, Bloom JD, Drummond DA, Raval A. Predicting the tolerance of proteins to random amino acid substitution. Biophys J 2005; 89:3714-20. [PMID: 16150971 PMCID: PMC1366941 DOI: 10.1529/biophysj.105.062125] [Citation(s) in RCA: 38] [Impact Index Per Article: 2.0] [Indexed: 11/18/2022] Open
Abstract
We have recently proposed a thermodynamic model that predicts the tolerance of proteins to random amino acid substitutions. Here we test this model against extensive simulations with compact lattice proteins, and find that the overall performance of the model is very good. We also derive an approximate analytic expression for the fraction of mutant proteins that fold stably to the native structure, Pf(m), as a function of the number of amino acid substitutions m, and present several methods to estimate the asymptotic behavior of Pf(m) for large m. We test the accuracy of all approximations against our simulation results, and find good overall agreement between the approximations and the simulation measurements.
Affiliation(s)
- Claus O Wilke
- Keck Graduate Institute of Applied Life Sciences, Claremont, California, USA.
|
50
|
Schiemann R, Bachmann M, Janke W. Exact sequence analysis for three-dimensional hydrophobic-polar lattice proteins. J Chem Phys 2005; 122:114705. [PMID: 15836241 DOI: 10.1063/1.1814941] [Citation(s) in RCA: 17] [Impact Index Per Article: 0.9] [Indexed: 11/14/2022] Open
Abstract
We have exactly enumerated all sequences and conformations of hydrophobic-polar (HP) proteins with chains of up to 19 monomers on the simple cubic lattice. For two variants of the HP model, where only two types of monomers are distinguished, we determined and statistically analyzed designing sequences, i.e., sequences that have a nondegenerate ground state. Furthermore we were interested in characteristic thermodynamic properties of HP proteins with designing sequences. In order to be able to perform these exact studies, we applied an efficient enumeration method based on contact sets.
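To give a sense of what exact enumeration on the simple cubic lattice involves, the sketch below counts self-avoiding walks by plain depth-first search. It is a naive illustration only and is not the contact-set method the abstract refers to; the function name `count_saws` is hypothetical. The rapid growth of the counts shows why chains of 19 monomers (18 bonds) call for a more efficient scheme.

```python
MOVES = ((1, 0, 0), (-1, 0, 0), (0, 1, 0), (0, -1, 0), (0, 0, 1), (0, 0, -1))


def count_saws(steps):
    """Count self-avoiding walks with `steps` bonds on the simple cubic lattice,
    starting from the origin and counting symmetry-related walks separately."""

    def extend(pos, visited, remaining):
        if remaining == 0:
            return 1
        total = 0
        for dx, dy, dz in MOVES:
            nxt = (pos[0] + dx, pos[1] + dy, pos[2] + dz)
            if nxt not in visited:  # enforce self-avoidance
                visited.add(nxt)
                total += extend(nxt, visited, remaining - 1)
                visited.remove(nxt)  # backtrack
        return total

    origin = (0, 0, 0)
    return extend(origin, {origin}, steps)


if __name__ == "__main__":
    for n in range(1, 5):
        print(n, count_saws(n))  # 6, 30, 150, 726
```

With four bonds, the 6 × 5 × 5 × 5 = 750 non-reversing walks include 24 closed squares that return to the origin, which leaves 726 self-avoiding walks.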
|