1
Martin NS, Schaper S, Camargo CQ, Louis AA. Non-Poissonian Bursts in the Arrival of Phenotypic Variation Can Strongly Affect the Dynamics of Adaptation. Mol Biol Evol 2024; 41:msae085. [PMID: 38693911 PMCID: PMC11156200 DOI: 10.1093/molbev/msae085] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/08/2023] [Revised: 03/01/2024] [Accepted: 04/17/2024] [Indexed: 05/03/2024]
Abstract
Modeling the rate at which adaptive phenotypes appear in a population is a key to predicting evolutionary processes. Given random mutations, should this rate be modeled by a simple Poisson process, or is a more complex dynamics needed? Here we use analytic calculations and simulations of evolving populations on explicit genotype-phenotype maps to show that the introduction of novel phenotypes can be "bursty" or overdispersed. In other words, a novel phenotype either appears multiple times in quick succession or not at all for many generations. These bursts are fundamentally caused by statistical fluctuations and other structure in the map from genotypes to phenotypes. Their strength depends on population parameters, being highest for "monomorphic" populations with low mutation rates. They can also be enhanced by additional inhomogeneities in the mapping from genotypes to phenotypes. We mainly investigate the effect of bursts using the well-studied genotype-phenotype map for RNA secondary structure, but find similar behavior in a lattice protein model and in Richard Dawkins's biomorphs model of morphological development. Bursts can profoundly affect adaptive dynamics. Most notably, they imply that fitness differences play a smaller role in determining which phenotype fixes than would be the case for a Poisson process without bursts.
Affiliation(s)
- Nora S Martin
  - Rudolf Peierls Centre for Theoretical Physics, University of Oxford, Oxford OX1 3PU, UK
- Steffen Schaper
  - Rudolf Peierls Centre for Theoretical Physics, University of Oxford, Oxford OX1 3PU, UK
- Chico Q Camargo
  - Rudolf Peierls Centre for Theoretical Physics, University of Oxford, Oxford OX1 3PU, UK
  - Faculty of Environment, Science and Economy, University of Exeter, Exeter EX4 4QF, UK
- Ard A Louis
  - Rudolf Peierls Centre for Theoretical Physics, University of Oxford, Oxford OX1 3PU, UK
2
Prabh N, Tautz D. Frequent lineage-specific substitution rate changes support an episodic model for protein evolution. G3: Genes, Genomes, Genetics 2021; 11:6372692. [PMID: 34542594 PMCID: PMC8664490 DOI: 10.1093/g3journal/jkab333] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/06/2021] [Accepted: 09/13/2021] [Indexed: 12/04/2022]
Abstract
Since the inception of the molecular clock model for sequence evolution, the investigation of protein divergence has revolved around the question of a more or less constant change of amino acid sequences, with specific overall rates for each family. Although anomalies in clock-like divergence are well known, the assumption of a constant decay rate for a given protein family is usually taken as the null model for protein evolution. However, systematic tests of this null model at a genome-wide scale have lagged behind, despite the databases’ enormous growth. We focus here on divergence rate comparisons between very closely related lineages since this allows clear orthology assignments by synteny and reliable alignments, which are crucial for determining substitution rate changes. We generated a high-confidence dataset of syntenic orthologs from four ape species, including humans. We find that despite the appearance of an overall clock-like substitution pattern, several hundred protein families show lineage-specific acceleration and deceleration in divergence rates, or combinations of both in different lineages. Hence, our analysis uncovers a rather dynamic history of substitution rate changes, even between these closely related lineages, implying that one should expect that a large fraction of proteins will have had a history of episodic rate changes in deeper phylogenies. Furthermore, each of the lineages has a separate set of particularly fast diverging proteins. The genes with the highest percentage of branch-specific substitutions are ADCYAP1 in the human lineage (9.7%), CALU in chimpanzees (7.1%), SLC39A14 in the internal branch leading to humans and chimpanzees (4.1%), RNF128 in gorillas (9%), and S100Z in gibbons (15.2%). The mutational pattern in ADCYAP1 suggests a biased mutation process, possibly through asymmetric gene conversion effects. 
We conclude that a null model of constant change can be problematic for predicting the evolutionary trajectories of individual proteins.
Affiliation(s)
- Neel Prabh
  - Department of Evolutionary Genetics, Max Planck Institute for Evolutionary Biology, August-Thienemann-Str. 2, 24306 Plön, Germany
- Diethard Tautz
  - Department of Evolutionary Genetics, Max Planck Institute for Evolutionary Biology, August-Thienemann-Str. 2, 24306 Plön, Germany
3
Park S, Kumar P, Shi A, Mou B. Population genetics and genome-wide association studies provide insights into the influence of selective breeding on genetic variation in lettuce. The Plant Genome 2021; 14:e20086. [PMID: 33629537 DOI: 10.1002/tpg2.20086] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Received: 09/22/2020] [Accepted: 12/17/2020] [Indexed: 05/10/2023]
Abstract
Genetic diversity is an important resource in crop breeding to improve cultivars with desirable traits. Selective breeding can lead to a reduction of genetic diversity. However, our understanding on this subject remains limited in lettuce (Lactuca sativa L.). Genotyping-by-sequencing (GBS) can provide a reduced version of the genome as a cost-effective method to identify genetic variants across the genome. We genotyped a diverse set of 441 lettuce accessions using the GBS method. Phylogenetic and population genetic analyses indicated substantial genetic divergence among four horticultural types of lettuce: butterhead, crisphead, leaf, and romaine. Genetic-diversity estimates between and within the four types indicated that the crisphead type was the most differentiated from other types, whereas its population was the most homogenous with the slowest linkage disequilibrium (LD) decay among the four types. These results suggested that crisphead lettuces had relatively less genetic variation across the genome as well as low gene flow from other types. We identified putative selective sweep regions that showed low genetic variation in the crisphead type. Genome-wide association study (GWAS) and quantitative trait loci (QTL) analyses provided evidence that these genomic regions were, in part, associated with delayed bolting, implicating the positive selection of delayed bolting in reducing variation. Our findings enhance the current understanding of genetic diversity and the impacts of selective breeding on patterning genetic variation in lettuce.
Affiliation(s)
- Sunchung Park
  - USDA-Agricultural Research Service, Crop Improvement and Protection Research Unit, Salinas, CA, 93905, USA
- Pawan Kumar
  - USDA-Agricultural Research Service, Crop Improvement and Protection Research Unit, Salinas, CA, 93905, USA
- Ainong Shi
  - Department of Horticulture, University of Arkansas, Fayetteville, AR, 72701, USA
- Beiquan Mou
  - USDA-Agricultural Research Service, Crop Improvement and Protection Research Unit, Salinas, CA, 93905, USA
4
Genetic variation in the Mauritian cynomolgus macaque population reflects variation in the human population. Gene 2021; 787:145648. [PMID: 33848572 DOI: 10.1016/j.gene.2021.145648] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/12/2020] [Revised: 03/23/2021] [Accepted: 04/07/2021] [Indexed: 11/21/2022]
Abstract
The cynomolgus macaque is an important species for preclinical research; however, the extent of genetic variation in this population and its similarity to the human population are not well understood. Exome sequencing was conducted for 101 cynomolgus macaques to characterize genetic variation. The variant distribution frequency was 7.81 variants per kilobase across the sequenced regions, with a total of 2,770,009 single nucleotide variants identified from 2,996,041 loci. A large portion (85.6%) had minor allele frequencies greater than 5%. Enriched pathways for genes with high genetic diversity (≥10 variants per kilobase) were those involving signaling peptides and immune response. Compared to humans, the variant distribution frequency and nucleotide diversity in the macaque exome were approximately 4 times greater; however, the ratio of non-synonymous to synonymous variants was similar (0.735 and 0.831, respectively). Understanding genetic variability in cynomolgus macaques will enable better interpretation and human translation of phenotypic variability in this species.
5
Schneider D, Ramos AG, Córdoba‐Aguilar A. Multigenerational experimental simulation of climate change on an economically important insect pest. Ecol Evol 2020; 10:12893-12909. [PMID: 33304502 PMCID: PMC7713942 DOI: 10.1002/ece3.6847] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.6] [Received: 06/30/2020] [Revised: 08/19/2020] [Accepted: 08/25/2020] [Indexed: 12/22/2022]
Abstract
Long-term multigenerational experimental simulations of climate change on insect pests of economically and socially important crops are crucial to anticipate challenges for feeding humanity in the not-so-far future. The Mexican bean weevil, Zabrotes subfasciatus, is a worldwide pest that attacks the seeds of the common bean, Phaseolus vulgaris, in crops and storage. We designed a long-term (i.e., over 10 generations) experimental simulation of climate change by increasing temperature and CO2 air concentration in controlled conditions according to model predictions for 2100. Higher temperature and CO2 concentrations favored the pest's egg-to-adult development survival, even at high female fecundity. They also induced a reduction of fat storage and an increase of protein content but did not alter body size. After 10 generations of simulation, genetic adaptation was detected for total lipid content only; however, other traits showed signs of such a process. Future experimental designs and methods similar to ours are key for studying long-term effects of climate change through multigenerational experimental designs.
Affiliation(s)
- David Schneider
  - Departamento de Ecología Evolutiva, Instituto de Ecología, Universidad Nacional Autónoma de México, México, Mexico
- Alejandra G. Ramos
  - Facultad de Ciencias, Universidad Autónoma de Baja California, Ensenada, Mexico
- Alex Córdoba‐Aguilar
  - Departamento de Ecología Evolutiva, Instituto de Ecología, Universidad Nacional Autónoma de México, México, Mexico
6
Rizzato F, Zamuner S, Pagnani A, Laio A. A common root for coevolution and substitution rate variability in protein sequence evolution. Sci Rep 2019; 9:18032. [PMID: 31792239 PMCID: PMC6888882 DOI: 10.1038/s41598-019-53958-w] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.5] [Received: 07/30/2019] [Accepted: 10/25/2019] [Indexed: 11/09/2022]
Abstract
We introduce a simple model that describes the average occurrence of point variations in a generic protein sequence. This model is based on the idea that mutations are more likely to be fixed at sites in contact with others that have mutated in the recent past. Therefore, we extend the usual assumptions made in protein coevolution by introducing a time damping on the effect of a substitution on its surroundings, which makes correlated substitutions happen in avalanches localized in space and time. The model correctly predicts the average correlation of substitutions as a function of their distance along the sequence. At the same time, it predicts an among-site distribution of the number of substitutions per site highly compatible with a negative binomial, consistent with experimental data. The promising outcomes achieved with this model encourage the application of the same ideas in the field of pairwise and multiple sequence alignment.
Affiliation(s)
- Francesca Rizzato
  - Scuola Internazionale Superiore di Studi Avanzati (SISSA), Trieste, Italy
- Stefano Zamuner
  - Scuola Internazionale Superiore di Studi Avanzati (SISSA), Trieste, Italy
- Andrea Pagnani
  - DISAT, Politecnico di Torino, Torino, Italy
  - Italian Institute for Genomic Medicine (IIGM), Torino, Italy
  - Istituto Nazionale di Fisica Nucleare (INFN) Sezione di Torino, Torino, Italy
- Alessandro Laio
  - Scuola Internazionale Superiore di Studi Avanzati (SISSA), Trieste, Italy
  - The Abdus Salam International Centre for Theoretical Physics (ICTP), Trieste, Italy
7
How Often Do Protein Genes Navigate Valleys of Low Fitness? Genes (Basel) 2019; 10:genes10040283. [PMID: 30965625 PMCID: PMC6523826 DOI: 10.3390/genes10040283] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/07/2019] [Revised: 03/27/2019] [Accepted: 04/02/2019] [Indexed: 11/17/2022]
Abstract
To escape from local fitness peaks, a population must navigate across valleys of low fitness. How these transitions occur, and what role they play in adaptation, have been subjects of active interest in evolutionary genetics for almost a century. However, to our knowledge, this problem has never been addressed directly by considering the evolution of a gene, or group of genes, as a whole, including the complex effects of fitness interactions among multiple loci. Here, we use a precise model of protein fitness to compute the probability P(s, Δt) that an allele, randomly sampled from a population at time t, has crossed a fitness valley of depth s during an interval from t − Δt to t in the immediate past. We study populations of model genes evolving under equilibrium conditions consistent with those in mammalian mitochondria. From this data, we estimate that genes encoding small protein motifs navigate fitness valleys of depth 2Ns ≳ 30 with probability P ≳ 0.1 on a time scale of human evolution, where N is the (mitochondrial) effective population size. The results are consistent with recent findings for Watson-Crick switching in mammalian mitochondrial tRNA molecules.
8
Echave J, Wilke CO. Biophysical Models of Protein Evolution: Understanding the Patterns of Evolutionary Sequence Divergence. Annu Rev Biophys 2017; 46:85-103. [PMID: 28301766 DOI: 10.1146/annurev-biophys-070816-033819] [Citation(s) in RCA: 75] [Impact Index Per Article: 9.4] [Indexed: 12/30/2022]
Abstract
For decades, rates of protein evolution have been interpreted in terms of the vague concept of functional importance. Slowly evolving proteins or sites within proteins were assumed to be more functionally important and thus subject to stronger selection pressure. More recently, biophysical models of protein evolution, which combine evolutionary theory with protein biophysics, have completely revolutionized our view of the forces that shape sequence divergence. Slowly evolving proteins have been found to evolve slowly because of selection against toxic misfolding and misinteractions, linking their rate of evolution primarily to their abundance. Similarly, most slowly evolving sites in proteins are not directly involved in function, but mutating these sites has a large impact on protein structure and stability. In this article, we review the studies in the emerging field of biophysical protein evolution that have shaped our current understanding of sequence divergence patterns. We also propose future research directions to develop this nascent field.
Affiliation(s)
- Julian Echave
  - Escuela de Ciencia y Tecnología, Universidad Nacional de San Martín, 1650 San Martín, Buenos Aires, Argentina
  - Consejo Nacional de Investigaciones Científicas y Técnicas (CONICET), Argentina
- Claus O Wilke
  - Department of Integrative Biology, The University of Texas at Austin, Texas 78712
9
Nelson ED, Grishin NV. Evolution of off-lattice model proteins under ligand binding constraints. Phys Rev E 2016; 94:022410. [PMID: 27627338 DOI: 10.1103/physreve.94.022410] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.1] [Received: 03/10/2016] [Indexed: 12/12/2022]
Abstract
We investigate protein evolution using an off-lattice polymer model evolved to imitate the behavior of small enzymes. Model proteins evolve through mutations to nucleotide sequences (including insertions and deletions) and are selected to fold and maintain a specific binding site compatible with a model ligand. We show that this requirement is, in itself, sufficient to maintain an ordered folding domain, and we compare it to the requirement of folding an ordered (but otherwise unrestricted) domain. We measure rates of amino acid change as a function of local environment properties such as solvent exposure, packing density, and distance from the active site, as well as overall rates of sequence and structure change, both along and among model lineages in star phylogenies. The model recapitulates essentially all of the behavior found in protein phylogenetic analyses, and predicts that amino acid substitution rates vary linearly with distance from the binding site.
Affiliation(s)
- Erik D Nelson
  - Howard Hughes Medical Institute, University of Texas Southwestern Medical Center, 6001 Forest Park Blvd., Room ND10.124, Dallas, Texas 75235-9050, USA
- Nick V Grishin
  - Howard Hughes Medical Institute, University of Texas Southwestern Medical Center, 6001 Forest Park Blvd., Room ND10.124, Dallas, Texas 75235-9050, USA
10
Houchmandzadeh B, Vallade M. A Simple, General Result for the Variance of Substitution Number in Molecular Evolution. Mol Biol Evol 2016; 33:1858-69. [PMID: 27189545 PMCID: PMC4915360 DOI: 10.1093/molbev/msw063] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Indexed: 11/14/2022]
Abstract
The number of substitutions (of nucleotides, amino acids, etc.) that take place during the evolution of a sequence is a stochastic variable of fundamental importance in the field of molecular evolution. Although the mean number of substitutions during molecular evolution of a sequence can be estimated for a given substitution model, no simple solution exists for the variance of this random variable. We show in this article that the computation of the variance is as simple as that of the mean number of substitutions for both short and long times. Apart from its fundamental importance, this result can be used to investigate the dispersion index R, that is, the ratio of the variance to the mean substitution number, which is of prime importance in the neutral theory of molecular evolution. By investigating large classes of substitution models, we demonstrate that although R≥1, to obtain R significantly larger than unity necessitates in general additional hypotheses on the structure of the substitution model.
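The dispersion index R described in this abstract can be illustrated with a short numerical sketch. The Python below is not taken from the article: the function names, the simulated rates, and the use of a two-rate mixture to produce overdispersion are all assumptions chosen for illustration. It computes R as the sample variance divided by the mean, first for Poisson substitution counts (R close to 1) and then for counts whose underlying rate differs between lineages (R well above 1).

```python
import random
import statistics


def dispersion_index(counts):
    """Return R = variance / mean for a list of substitution counts."""
    mean = statistics.mean(counts)
    if mean == 0:
        raise ValueError("dispersion index is undefined when the mean is zero")
    return statistics.variance(counts) / mean  # sample variance (n - 1)


def poisson_count(rate, rng):
    """Draw one Poisson(rate) count by counting exponential waits within unit time."""
    count = 0
    elapsed = rng.expovariate(rate)
    while elapsed < 1.0:
        count += 1
        elapsed += rng.expovariate(rate)
    return count


def simulate(rates, n_lineages, seed=0):
    """Simulate one count per lineage; each lineage picks its rate at random from rates."""
    rng = random.Random(seed)
    return [poisson_count(rng.choice(rates), rng) for _ in range(n_lineages)]


if __name__ == "__main__":
    constant = simulate([5.0], 2000)    # single rate: plain Poisson process
    mixed = simulate([1.0, 9.0], 2000)  # hypothetical rate heterogeneity
    print("R (constant rate):", round(dispersion_index(constant), 2))
    print("R (mixed rates):  ", round(dispersion_index(mixed), 2))
```

For the two-rate mixture the expected value is R = 1 + Var(rate)/E(rate) = 1 + 16/5 = 4.2, which shows how extra structure in the substitution process, beyond a single constant rate, pushes R above unity.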
11
Manrubia S, Cuesta JA. Evolution on neutral networks accelerates the ticking rate of the molecular clock. J R Soc Interface 2015; 12:20141010. [PMID: 25392402 DOI: 10.1098/rsif.2014.1010] [Citation(s) in RCA: 20] [Impact Index Per Article: 2.0] [Indexed: 11/12/2022]
Abstract
Large sets of genotypes give rise to the same phenotype, because phenotypic expression is highly redundant. Accordingly, a population can accept mutations without altering its phenotype, as long as the genotype mutates into another one on the same set. By linking every pair of genotypes that are mutually accessible through mutation, genotypes organize themselves into neutral networks (NNs). These networks are known to be heterogeneous and assortative, and these properties affect the evolutionary dynamics of the population. By studying the dynamics of populations on NNs with arbitrary topology, we analyse the effect of assortativity, of NN (phenotype) fitness and of network size. We find that the probability that the population leaves the network is smaller the longer the time spent on it. This progressive 'phenotypic entrapment' entails a systematic increase in the overdispersion of the process with time and an acceleration in the fixation rate of neutral mutations. We also quantify the variation of these effects with the size of the phenotype and with its fitness relative to that of neighbouring alternatives.
Affiliation(s)
- Susanna Manrubia
  - Grupo Interdisciplinar de Sistemas Complejos (GISC), Madrid, Spain
  - Systems Biology Programme, National Centre for Biotechnology (CSIC), c/ Darwin 3, 28049 Madrid, Spain
- José A Cuesta
  - Grupo Interdisciplinar de Sistemas Complejos (GISC), Madrid, Spain
  - Department of Mathematics, Universidad Carlos III de Madrid, 28911 Leganés, Madrid, Spain
  - Instituto de Biocomputación y Física de Sistemas Complejos (BIFI), Universidad de Zaragoza, 50009 Zaragoza, Spain
12
Nelson ED, Grishin NV. Anomalous diffusion in neutral evolution of model proteins. Physical Review E, Statistical, Nonlinear, and Soft Matter Physics 2015; 91:060701. [PMID: 26172648 DOI: 10.1103/physreve.91.060701] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.2] [Received: 02/10/2015] [Indexed: 06/04/2023]
Abstract
Protein evolution is frequently explored using minimalist polymer models; however, little attention has been given to the problem of structural drift, or diffusion. Here, we study neutral evolution of small protein motifs using an off-lattice heteropolymer model in which individual monomers interact as low-resolution amino acids. In contrast to most earlier models, both the length and folded structure of the polymers are permitted to change. To describe structural change, we compute the mean-square distance (MSD) between monomers in homologous folds separated by n neutral mutations. We find that structural change is episodic, and, averaged over lineages (for example, those extending from a single sequence), exhibits a power-law dependence on n. We show that this exponent depends on the alignment method used, and we analyze the distribution of waiting times between neutral mutations. The latter are more disperse than for models required to maintain a specific fold, but exhibit a similar power-law tail.
Affiliation(s)
- Erik D Nelson
  - Howard Hughes Medical Institute, University of Texas Southwestern Medical Center, 6001 Forest Park Blvd., Room ND10.124, Dallas, Texas 75235-9050, USA
- Nick V Grishin
  - Howard Hughes Medical Institute, University of Texas Southwestern Medical Center, 6001 Forest Park Blvd., Room ND10.124, Dallas, Texas 75235-9050, USA
13
Faure G, Koonin EV. Universal distribution of mutational effects on protein stability, uncoupling of protein robustness from sequence evolution and distinct evolutionary modes of prokaryotic and eukaryotic proteins. Phys Biol 2015; 12:035001. [PMID: 25927823 DOI: 10.1088/1478-3975/12/3/035001] [Citation(s) in RCA: 22] [Impact Index Per Article: 2.2] [Indexed: 01/04/2023]
Abstract
Robustness to destabilizing effects of mutations is thought of as a key factor of protein evolution. The connections between two measures of robustness, the relative core size and the computationally estimated effect of mutations on protein stability (ΔΔG), protein abundance and the selection pressure on protein-coding genes (dN/dS) were analyzed for the organisms with a large number of available protein structures including four eukaryotes, two bacteria and one archaeon. The distribution of the effects of mutations in the core on protein stability is universal and indistinguishable in eukaryotes and bacteria, centered at slightly destabilizing amino acid replacements, and with a heavy tail of more strongly destabilizing replacements. The distribution of mutational effects in the hyperthermophilic archaeon Thermococcus gammatolerans is significantly shifted toward strongly destabilizing replacements which is indicative of stronger constraints that are imposed on proteins in hyperthermophiles. The median effect of mutations is strongly, positively correlated with the relative core size, in evidence of the congruence between the two measures of protein robustness. However, both measures show only limited correlations to the expression level and selection pressure on protein-coding genes. Thus, the degree of robustness reflected in the universal distribution of mutational effects appears to be a fundamental, ancient feature of globular protein folds whereas the observed variations are largely neutral and uncoupled from short term protein evolution. A weak anticorrelation between protein core size and selection pressure is observed only for surface residues in prokaryotes but a stronger anticorrelation is observed for all residues in eukaryotic proteins. This substantial difference between proteins of prokaryotes and eukaryotes is likely to stem from the demonstrable higher compactness of prokaryotic proteins.
Affiliation(s)
- Guilhem Faure
  - National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD 20894, USA
14
McCandlish DM, Stoltzfus A. Modeling evolution using the probability of fixation: history and implications. Quarterly Review of Biology 2014; 89:225-52. [PMID: 25195318 DOI: 10.1086/677571] [Citation(s) in RCA: 123] [Impact Index Per Article: 11.2] [Indexed: 11/03/2022]
Abstract
Many models of evolution calculate the rate of evolution by multiplying the rate at which new mutations originate within a population by a probability of fixation. Here we review the historical origins, contemporary applications, and evolutionary implications of these "origin-fixation" models, which are widely used in evolutionary genetics, molecular evolution, and phylogenetics. Origin-fixation models were first introduced in 1969, in association with an emerging view of "molecular" evolution. Early origin-fixation models were used to calculate an instantaneous rate of evolution across a large number of independently evolving loci; in the 1980s and 1990s, a second wave of origin-fixation models emerged to address a sequence of fixation events at a single locus. Although origin fixation models have been applied to a broad array of problems in contemporary evolutionary research, their rise in popularity has not been accompanied by an increased appreciation of their restrictive assumptions or their distinctive implications. We argue that origin-fixation models constitute a coherent theory of mutation-limited evolution that contrasts sharply with theories of evolution that rely on the presence of standing genetic variation. A major unsolved question in evolutionary biology is the degree to which these models provide an accurate approximation of evolution in natural populations.
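The product described in the first sentence of this abstract can be written as a brief sketch. The code below is illustrative only and is not taken from the review: it assumes a haploid population of size n, a per-individual mutation rate mu, and Kimura's diffusion approximation for the fixation probability of a single new mutant with selection coefficient s. The function names are hypothetical.

```python
import math


def fixation_probability(s, n):
    """Assumed diffusion approximation for one new mutant in a haploid population of size n."""
    if s == 0:
        return 1.0 / n  # neutral limit
    # (1 - exp(-2s)) / (1 - exp(-2ns)), written with expm1 for numerical stability
    return math.expm1(-2.0 * s) / math.expm1(-2.0 * n * s)


def origin_fixation_rate(n, mu, s):
    """Rate of evolution = (rate at which new mutations originate) * (fixation probability)."""
    origin_rate = n * mu
    return origin_rate * fixation_probability(s, n)


if __name__ == "__main__":
    for s in (-0.01, 0.0, 0.01):
        print(f"s = {s:+.2f}: rate = {origin_fixation_rate(1000, 1e-6, s):.3e}")
```

In the neutral case the population size cancels and the rate equals mu, while a beneficial mutation with s = 0.01 fixes at roughly 2s times the origin rate. Both results rely on the mutation-limited regime that the abstract identifies as a restrictive assumption of these models.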
15
Snir S, Wolf YI, Koonin EV. Universal pacemaker of genome evolution in animals and fungi and variation of evolutionary rates in diverse organisms. Genome Biol Evol 2014; 6:1268-78. [PMID: 24812293 PMCID: PMC4079209 DOI: 10.1093/gbe/evu091] [Citation(s) in RCA: 18] [Impact Index Per Article: 1.6] [Indexed: 01/26/2023]
Abstract
Gene evolution is traditionally considered within the framework of the molecular clock (MC) model whereby each gene is characterized by an approximately constant rate of evolution. Recent comparative analysis of numerous phylogenies of prokaryotic genes has shown that a different model of evolution, denoted the Universal PaceMaker (UPM), which postulates conservation of relative, rather than absolute evolutionary rates, yields a better fit to the phylogenetic data. Here, we show that the UPM model is a better fit than the MC for genome wide sets of phylogenetic trees from six species of Drosophila and nine species of yeast, with extremely high statistical significance. Unlike the prokaryotic phylogenies that include distant organisms and multiple horizontal gene transfers, these are simple data sets that cover groups of closely related organisms and consist of gene trees with the same topology as the species tree. The results indicate that both lineage-specific and gene-specific rates are important in genome evolution but the lineage-specific contribution is greater. Similar to the MC, the gene evolution rates under the UPM are strongly overdispersed, approximately 2-fold compared with the expectation from sampling error alone. However, we show that neither Drosophila nor yeast genes form distinct clusters in the tree space. Thus, the gene-specific deviations from the UPM, although substantial, are uncorrelated and most likely depend on selective factors that are largely unique to individual genes. Thus, the UPM appears to be a key feature of genome evolution across the history of cellular life.
Affiliation(s)
- Sagi Snir
  - Department of Evolutionary and Environmental Biology and The Institute of Evolution, University of Haifa, Israel
- Yuri I Wolf
  - National Center for Biotechnology Information, NLM, National Institutes of Health, Bethesda, MD
- Eugene V Koonin
  - National Center for Biotechnology Information, NLM, National Institutes of Health, Bethesda, MD
16
Wolf YI, Snir S, Koonin EV. Stability along with extreme variability in core genome evolution. Genome Biol Evol 2013; 5:1393-402. [PMID: 23821522 PMCID: PMC3730350 DOI: 10.1093/gbe/evt098] [Citation(s) in RCA: 26] [Impact Index Per Article: 2.2] [Indexed: 01/27/2023]
Abstract
The shape of the distribution of evolutionary distances between orthologous genes in pairs of closely related genomes is universal throughout the entire range of cellular life forms. The near invariance of this distribution across billions of years of evolution can be accounted for by the Universal Pace Maker (UPM) model of genome evolution that yields a significantly better fit to the phylogenetic data than the Molecular Clock (MC) model. Unlike the MC, the UPM model does not assume constant gene-specific evolutionary rates but rather postulates that, in each evolving lineage, the evolutionary rates of all genes change (approximately) in unison although the pacemakers of different lineages are not necessarily synchronized. Here, we dissect the nearly constant evolutionary rate distribution by comparing the genome-wide relative rates of evolution of individual genes in pairs or triplets of closely related genomes from diverse bacterial and archaeal taxa. We show that, although the gene-specific relative rate is an important feature of genome evolution that explains more than half of the variance of the evolutionary distances, the ranges of relative rate variability are extremely broad even for universal genes. Because of this high variance, the gene-specific rate is a poor predictor of the conservation rank for any gene in any particular lineage.
Affiliation(s)
- Yuri I Wolf
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, Maryland, USA
|
17
|
Çetinbaş M, Shakhnovich EI. Catalysis of protein folding by chaperones accelerates evolutionary dynamics in adapting cell populations. PLoS Comput Biol 2013; 9:e1003269. [PMID: 24244114 PMCID: PMC3820506 DOI: 10.1371/journal.pcbi.1003269] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.3] [Received: 04/24/2013] [Accepted: 08/23/2013] [Indexed: 11/19/2022] Open
Abstract
Although molecular chaperones are essential components of protein homeostatic machinery, their mechanism of action and impact on adaptation and evolutionary dynamics remain controversial. Here we developed a physics-based ab initio multi-scale model of a living cell for population dynamics simulations to elucidate the effect of chaperones on adaptive evolution. The 6-loci genomes of model cells encode model proteins, whose folding and interactions in cellular milieu can be evaluated exactly from their genome sequences. A genotype-phenotype relationship that is based on a simple yet non-trivially postulated protein-protein interaction (PPI) network determines the cell division rate. Model proteins can exist in native and molten globule states and participate in functional and all possible promiscuous non-functional PPIs. We find that an active chaperone mechanism, whereby chaperones directly catalyze protein folding, has a significant impact on the cellular fitness and the rate of evolutionary dynamics, while passive chaperones, which just maintain misfolded proteins in soluble complexes, have a negligible effect on the fitness. We find that by partially releasing the constraint on protein stability, active chaperones promote a deeper exploration of sequence space to strengthen functional PPIs, and diminish the non-functional PPIs. A key experimentally testable prediction emerging from our analysis is that down-regulation of chaperones that catalyze protein folding significantly slows down the adaptation dynamics.
Molecular chaperones or heat-shock proteins are essential components of protein homeostatic machinery in all three domains of life, whose role is not only to prevent protein aggregation but also to catalyze the protein folding process by decreasing the energetic barrier for folding. Importantly, chaperones have often been implicated as phenotypic capacitors since they buffer the deleterious effects of mutations, promote genetic diversity, and thus speed up adaptive evolution. Here we explore computationally the consequences of chaperone activity in cytoplasm via long-time evolutionary dynamics simulations. We use a 6-loci multi-scale model of cell populations, where the fitness of each cell is determined from its genome, based on statistical mechanical principles of protein folding and protein-protein interactions. We find that by catalyzing protein folding chaperones buffer the deleterious effect of mutations on folding stability and thus open up a sequence space for efficient and simultaneous optimization of multiple molecular traits determining the cellular fitness. As a result, chaperones dramatically accelerate adaptation dynamics.
Affiliation(s)
- Murat Çetinbaş
- Department of Chemistry and Chemical Biology, Harvard University, Cambridge, Massachusetts, United States of America
- Eugene I. Shakhnovich
- Department of Chemistry and Chemical Biology, Harvard University, Cambridge, Massachusetts, United States of America
|
18
|
McCandlish DM. On the findability of genotypes. Evolution 2013; 67:2592-603. [PMID: 24033169 DOI: 10.1111/evo.12128] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.2] [Received: 09/19/2012] [Accepted: 03/14/2013] [Indexed: 02/02/2023]
Abstract
Can we define a measure that describes how easy or difficult it is for a population to evolve to a specific genotype? For populations evolving under weak mutation on a time-invariant fitness landscape, I argue that one appropriate measure is the expected waiting time, starting from equilibrium, for a population to become fixed for a given genotype. Under this definition for the "findability" of genotypes, I show that for any pair of genotypes (1) a population at equilibrium is always more likely to fix at the more findable before the less findable genotype and (2) the expected time to evolve from the more findable to the less findable genotype is always greater than the expected time to evolve in the opposite direction. Although increasing the fitness of a genotype always increases its findability, in general there is no simple relationship between the rank ordering of genotypes by fitness and the rank ordering of genotypes by findability. I also present a method for quantifying the relative contributions of mutation, selection, substitution rate, and probability of reversion to a genotype's findability.
Affiliation(s)
- David M McCandlish
- Biology Department, Duke University, Box 90338, Durham, North Carolina, 27708; Current Address: Lynch Labs, Room 204K, Department of Biology, University of Pennsylvania, Philadelphia, Pennsylvania, 19104.
|
19
|
Snir S, Wolf YI, Koonin EV. Universal pacemaker of genome evolution. PLoS Comput Biol 2012; 8:e1002785. [PMID: 23209393 PMCID: PMC3510094 DOI: 10.1371/journal.pcbi.1002785] [Citation(s) in RCA: 32] [Impact Index Per Article: 2.5] [Received: 06/12/2012] [Accepted: 10/02/2012] [Indexed: 11/18/2022] Open
Abstract
A fundamental observation of comparative genomics is that the distribution of evolution rates across the complete sets of orthologous genes in pairs of related genomes remains virtually unchanged throughout the evolution of life, from bacteria to mammals. The most straightforward explanation for the conservation of this distribution appears to be that the relative evolution rates of all genes remain nearly constant, or in other words, that evolutionary rates of different genes are strongly correlated within each evolving genome. This correlation could be explained by a model that we denoted Universal PaceMaker (UPM) of genome evolution. The UPM model posits that the rate of evolution changes synchronously across genome-wide sets of genes in all evolving lineages. Alternatively, however, the correlation between the evolutionary rates of genes could be a simple consequence of molecular clock (MC). We sought to differentiate between the MC and UPM models by fitting thousands of phylogenetic trees for bacterial and archaeal genes to supertrees that reflect the dominant trend of vertical descent in the evolution of archaea and bacteria and that were constrained according to the two models. The goodness of fit for the UPM model was better than the fit for the MC model, with overwhelming statistical significance, although similarly to the MC, the UPM is strongly overdispersed. Thus, the results of this analysis reveal a universal, genome-wide pacemaker of evolution that could have been in operation throughout the history of life.
Affiliation(s)
- Sagi Snir
- Department of Evolutionary and Environmental Biology and The Institute of Evolution, University of Haifa Mount Carmel, Haifa, Israel
|
20
|
Yuan Q, Zhou Z, Lindell SG, Higley JD, Ferguson B, Thompson RC, Lopez JF, Suomi SJ, Baghal B, Baker M, Mash DC, Barr CS, Goldman D. The rhesus macaque is three times as diverse but more closely equivalent in damaging coding variation as compared to the human. BMC Genet 2012; 13:52. [PMID: 22747632 PMCID: PMC3426462 DOI: 10.1186/1471-2156-13-52] [Citation(s) in RCA: 30] [Impact Index Per Article: 2.3] [Received: 11/18/2011] [Accepted: 05/18/2012] [Indexed: 11/23/2022] Open
Abstract
Background: As a model organism in biomedicine, the rhesus macaque (Macaca mulatta) is the most widely used nonhuman primate. Although a draft genome sequence was completed in 2007, there has been no systematic genome-wide comparison of genetic variation of this species to humans. Comparative analysis of functional and nonfunctional diversity in this highly abundant and adaptable non-human primate could inform its use as a model for human biology, and could reveal how variation in population history and size alters patterns and levels of sequence variation in primates.
Results: We sequenced the mRNA transcriptome and H3K4me3-marked DNA regions in hippocampus from 14 humans and 14 rhesus macaques. Using equivalent methodology and sampling spaces, we identified 462,802 macaque SNPs, most of which were novel and disproportionately located in the functionally important genomic regions we had targeted in the sequencing. At least one SNP was identified in each of 16,797 annotated macaque genes. Accuracy of macaque SNP identification was conservatively estimated to be >90%. Comparative analyses using SNPs equivalently identified in the two species revealed that rhesus macaque has approximately three times higher SNP density and average nucleotide diversity as compared to the human. Based on this level of diversity, the effective population size of the rhesus macaque is approximately 80,000 which contrasts with an effective population size of less than 10,000 for humans. Across five categories of genomic regions, intergenic regions had the highest SNP density and average nucleotide diversity and CDS (coding sequences) the lowest, in both humans and macaques. Although there are more coding SNPs (cSNPs) per individual in macaques than in humans, the ratio of dN/dS is significantly lower in the macaque. Furthermore, the number of damaging nonsynonymous cSNPs (have damaging effects on protein functions from PolyPhen-2 prediction) in the macaque is more closely equivalent to that of the human.
Conclusions: This large panel of newly identified macaque SNPs enriched for functionally significant regions considerably expands our knowledge of genetic variation in the rhesus macaque. Comparative analysis reveals that this widespread, highly adaptable species is approximately three times as diverse as the human but more closely equivalent in damaging variation.
Affiliation(s)
- Qiaoping Yuan
- Laboratory of Neurogenetics, National Institute on Alcohol Abuse and Alcoholism, NIH, Bethesda, MD 20892, USA
|
21
|
Epistasis increases the rate of conditionally neutral substitution in an adapting population. Genetics 2011; 187:1139-52. [PMID: 21288876 DOI: 10.1534/genetics.110.125997] [Citation(s) in RCA: 16] [Impact Index Per Article: 1.1] [Indexed: 02/08/2023] Open
Abstract
Kimura observed that the rate of neutral substitution should equal the neutral mutation rate. This classic result is central to our understanding of molecular evolution, and it continues to influence phylogenetics, genomics, and the interpretation of evolution experiments. By demonstrating that neutral mutations substitute at a rate independent of population size and selection at linked sites, Kimura provided an influential justification for the idea of a molecular clock and emphasized the importance of genetic drift in shaping molecular evolution. But when epistasis among sites is common, as numerous empirical studies suggest, do neutral mutations substitute according to Kimura's expectation? Here we study simulated, asexual populations of RNA molecules, and we observe that conditionally neutral mutations--i.e., mutations that do not alter the fitness of the individual in which they arise, but that may alter the fitness effects of subsequent mutations--substitute much more often than expected while a population is adapting. We quantify these effects using a simple population-genetic model that elucidates how the substitution rate at conditionally neutral sites depends on the population size, mutation rate, strength of selection, and prevalence of epistasis. We discuss the implications of these results for our understanding of the molecular clock, and for the interpretation of molecular variation in laboratory and natural populations.
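Kimura's classic result quoted in this abstract follows from a one-line calculation. As a minimal sketch, assume a haploid, asexual population of constant size $N$ and a neutral mutation rate $\mu$ per genome per generation; both symbols are introduced here for illustration and do not appear in the abstract.

```latex
% Sketch under stated assumptions: haploid, constant size N, strictly neutral mutations.
% N*mu new neutral mutations arise each generation; each one fixes with probability 1/N.
\[
  k \;=\; \underbrace{N\mu}_{\text{new neutral mutations per generation}}
      \times \underbrace{\frac{1}{N}}_{\text{fixation probability}}
  \;=\; \mu .
\]
```

The population size cancels, which is why the neutral substitution rate $k$ is expected to be independent of $N$; for a diploid population the same cancellation occurs with $2N$ in place of $N$. The calculation treats each mutation as unconditionally neutral, so it does not cover the conditionally neutral, epistatic case that the abstract goes on to study.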
|
22
|
Yang JR, Zhuang SM, Zhang J. Impact of translational error-induced and error-free misfolding on the rate of protein evolution. Mol Syst Biol 2011; 6:421. [PMID: 20959819 PMCID: PMC2990641 DOI: 10.1038/msb.2010.78] [Citation(s) in RCA: 76] [Impact Index Per Article: 5.4] [Received: 05/11/2010] [Accepted: 08/31/2010] [Indexed: 11/26/2022] Open
Abstract
Theoretical calculations suggest that, in addition to translational error-induced protein misfolding, a non-negligible fraction of misfolded proteins are error free. We propose that the anticorrelation between the expression level of a protein and its rate of sequence evolution be explained by an overarching protein-misfolding-avoidance hypothesis that includes selection against both error-induced and error-free protein misfolding, and verify this model by a molecular-level evolutionary simulation. We provide strong empirical evidence for the protein-misfolding-avoidance hypothesis, including a positive correlation between protein expression level and stability, enrichment of misfolding-minimizing codons and amino acids in highly expressed genes, and stronger evolutionary conservation of residues in which nonsynonymous changes are more likely to increase protein misfolding.
The rate of protein sequence evolution has long been of central interest to molecular evolutionists. Different proteins of the same species evolve at vastly different rates, which is commonly explained by a variation in functional constraint among different proteins (Kimura and Ohta, 1974). However, it is unclear how to quantify the functional constraint of a protein from the knowledge of its function. In the past decade, various types of genomic data from model organisms have been examined to look for the determinants of the rate of protein sequence evolution. The most unexpected discovery was a very strong anticorrelation between the expression level and evolutionary rate of a protein (E–R anticorrelation) (Pal et al, 2001). The prevailing explanation of the E–R anticorrelation is the translational robustness hypothesis (Drummond et al, 2005). This hypothesis posits that mistranslation induces protein misfolding, which is toxic to cells (Figure 1). Consequently, highly expressed proteins are under stronger pressures to be translationally robust and thus are more constrained in sequence evolution. However, the impact of the other source of misfolded proteins, translational error-free proteins (Figure 1), has not been evaluated. By theoretical calculation, computer simulation, and empirical data analysis, we examined the role of selection against both error-induced and error-free protein misfolding in creating the E–R correlation. Our theoretical calculations suggested that a non-negligible fraction of misfolded proteins are error free. We estimated that when a protein is not very stable, on average ∼20% of misfolded molecules are error free. However, when a protein is very stable, this fraction reduces to ∼5%, which is probably a result of natural selection against protein misfolding. 
We conducted a molecular-level evolutionary simulation (Figure 2A) using three different schemes: error-induced misfolding only, error-free misfolding only, and both types of misfolding. As expected, results from the first simulation are similar to those from a previous study that considers only error-induced misfolding (Drummond and Wilke, 2008). Interestingly, the second and third simulations can also generate the same patterns, including a positive correlation between the protein expression level and the unfolding energy (ΔG) of the error-free protein (Figure 2B), a negative correlation between the expression level and the fraction of protein molecules that misfold after being mistranslated (Figure 2C), a negative correlation between ΔG and the evolutionary rate (Figure 2D), and a negative correlation between the expression level and the evolutionary rate (i.e., the E–R anticorrelation) (Figure 2E). Furthermore, we found that selection against protein misfolding is more effective in reducing error-free misfolding than error-induced misfolding. Based on these results, we propose that an overarching protein-misfolding-avoidance hypothesis that includes both sources of misfolding is superior to the prevailing translational robustness hypothesis, which considers only error-induced misfolding. We tested three key predictions of the protein-misfolding-avoidance hypotheses using yeast data. First, we showed that, consistent with our prediction, a positive correlation exists between the protein expression level and stability, which is measured by the unfolding energy or melting temperature. In addition, protein expression level is negatively correlated with protein aggregation propensity. Second, we found that codons minimizing protein misfolding are used more frequently in highly expressed proteins than in lowly expressed ones. 
Third, we showed that, within the same protein, amino acid residues in which random nonsynonymous mutations are more likely to increase protein misfolding are evolutionarily more conserved. Together, these results provide unambiguous evidence that avoidance of both error-induced and error-free protein misfolding is a major source of the E–R anticorrelation and that protein stability and mistranslation have important roles in protein evolution. What determines the rate of protein evolution is a fundamental question in biology. Recent genomic studies revealed a surprisingly strong anticorrelation between the expression level of a protein and its rate of sequence evolution. This observation is currently explained by the translational robustness hypothesis in which the toxicity of translational error-induced protein misfolding selects for higher translational robustness of more abundant proteins, which constrains sequence evolution. However, the impact of error-free protein misfolding has not been evaluated. We estimate that a non-negligible fraction of misfolded proteins are error free and demonstrate by a molecular-level evolutionary simulation that selection against protein misfolding results in a greater reduction of error-free misfolding than error-induced misfolding. Thus, an overarching protein-misfolding-avoidance hypothesis that includes both sources of misfolding is superior to the translational robustness hypothesis. We show that misfolding-minimizing amino acids are preferentially used in highly abundant yeast proteins and that these residues are evolutionarily more conserved than other residues of the same proteins. These findings provide unambiguous support to the role of protein-misfolding-avoidance in determining the rate of protein sequence evolution.
Affiliation(s)
- Jian-Rong Yang
- Key Laboratory of Gene Engineering of the Ministry of Education, State Key Laboratory of Biocontrol, School of Life Sciences, Sun Yat-sen University, Guangzhou, PR China
|
23
|
Bhattacherjee A, Biswas P. Neutrality and evolvability of designed protein sequences. Phys Rev E Stat Nonlin Soft Matter Phys 2010; 82:011906. [PMID: 20866647 DOI: 10.1103/physreve.82.011906] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.2] [Received: 12/17/2009] [Revised: 03/25/2010] [Indexed: 05/29/2023]
Abstract
The effect of foldability on a protein's evolvability is analyzed by a two-pronged approach consisting of a self-consistent mean-field theory and Monte Carlo simulations. Theory and simulation models representing protein sequences with binary patterning of amino acid residues compatible with a particular foldability criterion are used. This generalized foldability criterion is derived using the high temperature cumulant expansion approximating the free energy of folding. The effect of cumulative point mutations on these designed proteins is studied under neutral conditions. The robustness, a protein's ability to tolerate random point mutations, is determined with a selective pressure of stability (ΔΔG) for the theory-designed sequences, which are found to be more robust than Monte Carlo and mean-field-biased Monte Carlo generated sequences. The results show that this foldability criterion selects viable protein sequences more effectively than the Monte Carlo method, which has a marked effect on how the selective pressure shapes the evolutionary sequence space. These observations may impact de novo sequence design and its applications in protein engineering.
|
24
|
Sammet SG, Bastolla U, Porto M. Comparison of translation loads for standard and alternative genetic codes. BMC Evol Biol 2010; 10:178. [PMID: 20546599 PMCID: PMC2909233 DOI: 10.1186/1471-2148-10-178] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.4] [Received: 08/07/2009] [Accepted: 06/14/2010] [Indexed: 11/25/2022] Open
Abstract
Background: The (almost) universality of the genetic code is one of the most intriguing properties of cellular life. Nevertheless, several variants of the standard genetic code have been observed, which differ in one or several of 64 codon assignments and occur mainly in mitochondrial genomes and in nuclear genomes of some bacterial and eukaryotic parasites. These variants are usually considered to be the result of non-adaptive evolution. It has been shown that the standard genetic code is preferential to randomly assembled codes for its ability to reduce the effects of errors in protein translation.
Results: Using a genotype-to-phenotype mapping based on a quantitative model of protein folding, we compare the standard genetic code to seven of its naturally occurring variants with respect to the fitness loss associated to mistranslation and mutation. These fitness losses are computed through computer simulations of protein evolution with mutations that are either neutral or lethal, and different mutation biases, which influence the balance between unfolding and misfolding stability. We show that the alternative codes may produce significantly different mutation and translation loads, particularly for genomes evolving with a rather large mutation bias. Most of the alternative genetic codes are found to be disadvantageous to the standard code, in agreement with the view that the change of genetic code is a mutationally driven event. Nevertheless, one of the studied alternative genetic codes is predicted to be preferable to the standard code for a broad range of mutation biases.
Conclusions: Our results show that, with one exception, the standard genetic code is generally better able to reduce the translation load than the naturally occurring variants studied here. Besides this exception, some of the other alternative genetic codes are predicted to be better adapted for extreme mutation biases. Hence, the fixation of alternative genetic codes might be a neutral or nearly-neutral event in the majority of the cases, but adaptation cannot be excluded for some of the studied cases.
Affiliation(s)
- Stefanie Gabriele Sammet
- Institut für Festkörperphysik, Technische Universität Darmstadt, Hochschulstr. 8, 64289 Darmstadt, Germany
|
26
|
Mendez R, Fritsche M, Porto M, Bastolla U. Mutation bias favors protein folding stability in the evolution of small populations. PLoS Comput Biol 2010; 6:e1000767. [PMID: 20463869 PMCID: PMC2865504 DOI: 10.1371/journal.pcbi.1000767] [Citation(s) in RCA: 37] [Impact Index Per Article: 2.5] [Received: 09/25/2009] [Accepted: 03/30/2010] [Indexed: 11/29/2022] Open
Abstract
Mutation bias in prokaryotes varies from extreme adenine and thymine (AT) in obligatory endosymbiotic or parasitic bacteria to extreme guanine and cytosine (GC), for instance in actinobacteria. GC mutation bias deeply influences the folding stability of proteins, making proteins on the average less hydrophobic and therefore less stable with respect to unfolding but also less susceptible to misfolding and aggregation. We study a model where proteins evolve subject to selection for folding stability under given mutation bias, population size, and neutrality. We find a non-neutral regime where, for any given population size, there is an optimal mutation bias that maximizes fitness. Interestingly, this optimal GC usage is small for small populations, large for intermediate populations and around 50% for large populations. This result is robust with respect to the definition of the fitness function and to the protein structures studied. Our model suggests that small populations evolving with small GC usage eventually accumulate a significant selective advantage over populations evolving without this bias. This provides a possible explanation to the observation that most species adopting obligatory intracellular lifestyles with a consequent reduction of effective population size shifted their mutation spectrum towards AT. The model also predicts that large GC usage is optimal for intermediate population size. To test these predictions we estimated the effective population sizes of bacterial species using the optimal codon usage coefficients computed by dos Reis et al. and the synonymous to non-synonymous substitution ratio computed by Daubin and Moran. We found that the population sizes estimated in these ways are significantly smaller for species with small and large GC usage compared to species with no bias, which supports our prediction.
Affiliation(s)
- Raul Mendez
- Centro de Biología Molecular “Severo Ochoa”, Consejo Superior de Investigaciones Científicas and Universidad Autónoma de Madrid, Madrid, Spain
- Miriam Fritsche
- Institut für Festkörperphysik, Technische Universität Darmstadt, Darmstadt, Germany
- Markus Porto
- Institut für Festkörperphysik, Technische Universität Darmstadt, Darmstadt, Germany
- Ugo Bastolla
- Centro de Biología Molecular “Severo Ochoa”, Consejo Superior de Investigaciones Científicas and Universidad Autónoma de Madrid, Madrid, Spain
|
27
|
Noirel J, Simonson T. Neutral evolution of proteins: The superfunnel in sequence space and its relation to mutational robustness. J Chem Phys 2009; 129:185104. [PMID: 19045432 DOI: 10.1063/1.2992853] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.4] [Indexed: 11/14/2022] Open
Abstract
Following Kimura's neutral theory of molecular evolution [M. Kimura, The Neutral Theory of Molecular Evolution (Cambridge University Press, Cambridge, 1983) (reprinted in 1986)], it has become common to assume that the vast majority of viable mutations of a gene confer little or no functional advantage. Yet, in silico models of protein evolution have shown that mutational robustness of sequences could be selected for, even in the context of neutral evolution. The evolution of a biological population can be seen as a diffusion on the network of viable sequences. This network is called a "neutral network." Depending on the mutation rate μ and the population size N, the biological population can evolve purely randomly (μN ≪ 1) or it can evolve in such a way as to select for sequences of higher mutational robustness (μN ≫ 1). The stringency of the selection depends not only on the product μN but also on the exact topology of the neutral network, the special arrangement of which was named "superfunnel." Even though the relation between mutation rate, population size, and selection was thoroughly investigated, a study of the salient topological features of the superfunnel that could affect the strength of the selection was wanting. This question is addressed in this study. We use two different models of proteins: on lattice and off lattice. We compare neutral networks computed using these models to random networks. From this, we identify two important factors of the topology that determine the stringency of the selection for mutationally robust sequences. First, the presence of highly connected nodes ("hubs") in the network increases the selection for mutationally robust sequences. Second, the stringency of the selection increases when the correlation between a sequence's mutational robustness and its neighbors' increases. The latter finding relates a global characteristic of the neutral network to a local one, which is attainable through experiments or molecular modeling.
Affiliation(s)
- Josselin Noirel
- Laboratoire de Biochimie, Ecole Polytechnique, Route de Saclay, Palaiseau 91128 Cedex, France.
|
28
|
|
29
|
Abstract
Is genetic evolution predictable? Evolutionary developmental biologists have argued that, at least for morphological traits, the answer is a resounding yes. Most mutations causing morphological variation are expected to reside in the cis-regulatory, rather than the coding, regions of developmental genes. This "cis-regulatory hypothesis" has recently come under attack. In this review, we first describe and critique the arguments that have been proposed in support of the cis-regulatory hypothesis. We then test the empirical support for the cis-regulatory hypothesis with a comprehensive survey of mutations responsible for phenotypic evolution in multicellular organisms. Cis-regulatory mutations currently represent approximately 22% of 331 identified genetic changes although the number of cis-regulatory changes published annually is rapidly increasing. Above the species level, cis-regulatory mutations altering morphology are more common than coding changes. Also, above the species level cis-regulatory mutations predominate for genes not involved in terminal differentiation. These patterns imply that the simple question "Do coding or cis-regulatory mutations cause more phenotypic evolution?" hides more interesting phenomena. Evolution in different kinds of populations and over different durations may result in selection of different kinds of mutations. Predicting the genetic basis of evolution requires a comprehensive synthesis of molecular developmental biology and population genetics.
Affiliation(s)
- David L Stern
- Department of Ecology and Evolutionary Biology, Princeton University, Princeton, New Jersey 08544, USA.
|
30
|
Drummond DA, Wilke CO. Mistranslation-induced protein misfolding as a dominant constraint on coding-sequence evolution. Cell 2008; 134:341-52. [PMID: 18662548 PMCID: PMC2696314 DOI: 10.1016/j.cell.2008.05.042] [Citation(s) in RCA: 811] [Impact Index Per Article: 47.7] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/31/2008] [Revised: 04/21/2008] [Accepted: 05/21/2008] [Indexed: 12/30/2022]
Abstract
Strikingly consistent correlations between rates of coding-sequence evolution and gene expression levels are apparent across taxa, but the biological causes behind the selective pressures on coding-sequence evolution remain controversial. Here, we demonstrate conserved patterns of simple covariation between sequence evolution, codon usage, and mRNA level in E. coli, yeast, worm, fly, mouse, and human that suggest that all observed trends stem largely from a unified underlying selective pressure. In metazoans, these trends are strongest in tissues composed of neurons, whose structure and lifetime confer extreme sensitivity to protein misfolding. We propose, and demonstrate using a molecular-level evolutionary simulation, that selection against toxicity of misfolded proteins generated by ribosome errors suffices to create all of the observed covariation. The mechanistic model of molecular evolution that emerges yields testable biochemical predictions, calls into question the use of nonsynonymous-to-synonymous substitution ratios (Ka/Ks) to detect functional selection, and suggests how mistranslation may contribute to neurodegenerative disease.
Affiliation(s)
- D Allan Drummond
- FAS Center for Systems Biology, Harvard University, Cambridge, MA 02138, USA.
|
31
|
Abstract
Although protein evolution can be approximated as a "molecular evolutionary clock," it is well known that sequence change departs from a clock-like Poisson expectation. Through studying the deviations from a molecular clock, insight can be gained into the forces shaping evolution at the level of proteins. Generally, substitution patterns that show greater variance than the Poisson expectation are said to be "overdispersed." Overdispersion of sequence change may result from temporal variation in the rate at which amino acid substitutions occur on a phylogeny. By comparing the genomes of four species of yeast, five species of Drosophila, and five species of mammals, we show that the extent of overdispersion shows a strong negative correlation with the effective population size of these organisms. Yeast proteins show very little overdispersion, while mammalian proteins show substantial overdispersion. Additionally, X-linked genes, which have reduced effective population size, have gene products that show increased overdispersion in both Drosophila and mammals. Our research suggests that mutational robustness is more pervasive in organisms with large population sizes and that robustness acts to stabilize the molecular evolutionary clock of sequence change.
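The overdispersion described in this abstract is commonly quantified by the index of dispersion, the ratio of the variance to the mean of substitution counts, which equals 1 for a Poisson process. The following is a minimal sketch of that ratio only, not the method used in the cited study; the function name `index_of_dispersion` and the example counts are hypothetical and chosen purely for illustration.

```python
from statistics import mean, variance


def index_of_dispersion(counts):
    """Return the variance-to-mean ratio of a list of substitution counts.

    A Poisson process has an expected ratio of 1; values well above 1
    indicate overdispersion (a "bursty" or erratic clock).
    """
    m = mean(counts)
    if m == 0:
        raise ValueError("mean count is zero; the ratio is undefined")
    # statistics.variance is the sample variance (n - 1 denominator).
    return variance(counts) / m


# Hypothetical substitution counts on four lineages (illustrative only).
steady_counts = [4, 6, 5, 5]    # counts cluster tightly around the mean
bursty_counts = [0, 0, 10, 10]  # same mean, much larger spread

print(index_of_dispersion(steady_counts))
print(index_of_dispersion(bursty_counts))
```

Both example lists have the same mean, so the difference in the ratio comes entirely from the spread of the counts. With so few lineages the estimate is very noisy, and a real analysis would also need to correct for lineage-specific rate effects.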
|
32
|
Five Drosophila genomes reveal nonneutral evolution and the signature of host specialization in the chemoreceptor superfamily. Genetics 2008; 177:1395-416. [PMID: 18039874 DOI: 10.1534/genetics.107.078683] [Citation(s) in RCA: 160] [Impact Index Per Article: 9.4] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/18/2022] Open
Abstract
The insect chemoreceptor superfamily comprises the olfactory receptor (Or) and gustatory receptor (Gr) multigene families. These families give insects the ability to smell and taste chemicals in the environment and are thus rich resources for linking molecular evolutionary and ecological processes. Although dramatic differences in family size among distant species and high divergence among paralogs have led to the belief that the two families evolve rapidly, a lack of evolutionary data over short time scales has frustrated efforts to identify the major forces shaping this evolution. Here, we investigate patterns of gene loss/gain, divergence, and polymorphism in the entire repertoire of approximately 130 chemoreceptor genes from five closely related species of Drosophila that share a common ancestor within the past 12 million years. We demonstrate that the overall evolution of the Or and Gr families is nonneutral. We also show that selection regimes differ both between the two families as wholes and within each family among groups of genes with varying functions, patterns of expression, and phylogenetic histories. Finally, we find that the independent evolution of host specialization in Drosophila sechellia and D. erecta is associated with a fivefold acceleration of gene loss and increased rates of amino acid evolution at receptors that remain intact. Gene loss appears to primarily affect Grs that respond to bitter compounds while elevated Ka/Ks is most pronounced in the subset of Ors that are expressed in larvae. Our results provide strong evidence that the observed phenomena result from the invasion of a novel ecological niche and present a unique synthesis of molecular evolutionary analyses with ecological data.
|
33
|
Raval A. Molecular clock on a neutral network. Phys Rev Lett 2007; 99:138104. [PMID: 17930643 DOI: 10.1103/physrevlett.99.138104] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.5] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Received: 05/28/2007] [Indexed: 05/25/2023]
Abstract
The number of fixed mutations accumulated in an evolving population often displays a variance that is significantly larger than the mean (the overdispersed molecular clock). By examining a generic evolutionary process on a neutral network of high-fitness genotypes, we establish a formalism for computing all cumulants of the full probability distribution of accumulated mutations in terms of graph properties of the neutral network, and use the formalism to prove overdispersion of the molecular clock. We further show that significant overdispersion arises naturally in evolution when the neutral network is highly sparse, exhibits large global fluctuations in neutrality, and small local fluctuations in neutrality. The results are also relevant for elucidating aspects of neutral network topology from empirical measurements of the substitution process.
Affiliation(s)
- Alpan Raval
- Keck Graduate Institute of Applied Life Sciences, 535 Watson Drive, Claremont, California 91711, USA.
|
34
|
Abstract
Naturally evolving proteins gradually accumulate mutations while continuing to fold to stable structures. This process of neutral evolution is an important mode of genetic change and forms the basis for the molecular clock. We present a mathematical theory that predicts the number of accumulated mutations, the index of dispersion, and the distribution of stabilities in an evolving protein population from knowledge of the stability effects (ΔΔG values) for single mutations. Our theory quantitatively describes how neutral evolution leads to marginally stable proteins and provides formulas for calculating how fluctuations in stability can overdisperse the molecular clock. It also shows that the structural influences on the rate of sequence evolution observed in earlier simulations can be calculated using just the single-mutation ΔΔG values. We consider both the case when the product of the population size and mutation rate is small and the case when this product is large, and show that in the latter case the proteins evolve excess mutational robustness that is manifested by extra stability and an increase in the rate of sequence evolution. All our theoretical predictions are confirmed by simulations with lattice proteins. Our work provides a mathematical foundation for understanding how protein biophysics shapes the process of evolution.
Affiliation(s)
- Jesse D Bloom
- Division of Chemistry and Chemical Engineering, California Institute of Technology, Pasadena, California 91125, USA.
|
35
|
Forster R, Adami C, Wilke CO. Selection for mutational robustness in finite populations. J Theor Biol 2006; 243:181-90. [PMID: 16901510 DOI: 10.1016/j.jtbi.2006.06.020] [Citation(s) in RCA: 31] [Impact Index Per Article: 1.6] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/01/2005] [Revised: 06/07/2006] [Accepted: 06/23/2006] [Indexed: 01/08/2023]
Abstract
We investigate the evolutionary dynamics of a finite population of RNA sequences replicating on a neutral network. Despite the lack of differential fitness between viable sequences, we observe typical properties of adaptive evolution, such as increase of mean fitness over time and punctuated-equilibrium transitions, after initial mutation-selection balance has been reached. We find that a product of population size and mutation rate of approximately 30 or larger is sufficient to generate selection pressure for mutational robustness, even if the population size is orders of magnitude smaller than the neutral network on which the population resides. Our results show that quasispecies effects and neutral drift can occur concurrently, and that the relative importance of each is determined by the product of population size and mutation rate.
Affiliation(s)
- Robert Forster
- Digital Life Laboratory, California Institute of Technology, Pasadena, CA 91125, USA
|
36
|
Bastolla U, Porto M, Roman HE, Vendruscolo M. A protein evolution model with independent sites that reproduces site-specific amino acid distributions from the Protein Data Bank. BMC Evol Biol 2006; 6:43. [PMID: 16737532 PMCID: PMC1570368 DOI: 10.1186/1471-2148-6-43] [Citation(s) in RCA: 33] [Impact Index Per Article: 1.7] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/17/2005] [Accepted: 05/31/2006] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND Since thermodynamic stability is a global property of proteins that has to be conserved during evolution, the selective pressure at a given site of a protein sequence depends on the amino acids present at other sites. However, models of molecular evolution that aim at reconstructing the evolutionary history of macromolecules become computationally intractable if such correlations between sites are explicitly taken into account. RESULTS We introduce an evolutionary model with sites evolving independently under a global constraint on the conservation of structural stability. This model consists of a selection process, which depends on two hydrophobicity parameters that can be computed from protein sequences without any fit, and a mutation process for which we consider various models. It reproduces quantitatively the results of Structurally Constrained Neutral (SCN) simulations of protein evolution in which the stability of the native state is explicitly computed and conserved. We then compare the predicted site-specific amino acid distributions with those sampled from the Protein Data Bank (PDB). The parameters of the mutation model, whose number varies between zero and five, are fitted from the data. The mean correlation coefficient between predicted and observed site-specific amino acid distributions is larger than <r> = 0.70 for a mutation model with no free parameters and no genetic code. In contrast, considering only the mutation process with no selection yields a mean correlation coefficient of <r> = 0.56 with three fitted parameters. The mutation model that best fits the data takes into account increased mutation rate at CpG dinucleotides, yielding <r> = 0.90 with five parameters. CONCLUSION The effective selection process that we propose reproduces well amino acid distributions as observed in the protein sequences in the PDB. Its simplicity makes it very promising for likelihood calculations in phylogenetic studies. 
Interestingly, in this approach the mutation process influences the effective selection process, i.e. selection and mutation must be entangled in order to obtain effectively independent sites. This interdependence between mutation and selection reflects the deep influence that mutation has on the evolutionary process: The bias in the mutation influences the thermodynamic properties of the evolving proteins, in agreement with comparative studies of bacterial proteomes, and it also influences the rate of accepted mutations.
Affiliation(s)
- Ugo Bastolla
- Centro de Biología Molecular "Severo Ochoa", (CSIC-UAM), Cantoblanco, 28049 Madrid, Spain
- Markus Porto
- Institut für Festkörperphysik, Technische Universität Darmstadt, Hochschulstr. 8, 64289 Darmstadt, Germany
- H Eduardo Roman
- Dipartimento di Fisica, Università di Milano Bicocca, Piazza della Scienza 3, 20126 Milano, Italy
- Michele Vendruscolo
- Department of Chemistry, University of Cambridge, Lensfield Road, Cambridge CB2 1EW, UK
|
37
|
Abstract
Recent work has shown that expression level is the main predictor of a gene's evolutionary rate and that more highly expressed genes evolve slower. A possible explanation for this observation is selection for proteins that fold properly despite mistranslation, in short selection for translational robustness. Translational robustness leads to the somewhat paradoxical prediction that highly expressed genes are extremely tolerant to missense substitutions but nevertheless evolve very slowly. Here, we study a simple theoretical model of translational robustness that allows us to gain analytic insight into how this paradoxical behavior arises.
Affiliation(s)
- Claus O Wilke
- Section of Integrative Biology and Center for Computational Biology and Bioinformatics, University of Texas, Austin 78712, USA.
|
38
|
Wilke CO, Bloom JD, Drummond DA, Raval A. Predicting the tolerance of proteins to random amino acid substitution. Biophys J 2005; 89:3714-20. [PMID: 16150971 PMCID: PMC1366941 DOI: 10.1529/biophysj.105.062125] [Citation(s) in RCA: 35] [Impact Index Per Article: 1.8] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/18/2022] Open
Abstract
We have recently proposed a thermodynamic model that predicts the tolerance of proteins to random amino acid substitutions. Here we test this model against extensive simulations with compact lattice proteins, and find that the overall performance of the model is very good. We also derive an approximate analytic expression for the fraction of mutant proteins that fold stably to the native structure, Pf(m), as a function of the number of amino acid substitutions m, and present several methods to estimate the asymptotic behavior of Pf(m) for large m. We test the accuracy of all approximations against our simulation results, and find good overall agreement between the approximations and the simulation measurements.
Affiliation(s)
- Claus O Wilke
- Keck Graduate Institute of Applied Life Sciences, Claremont, California, USA.
|
39
|
Wilke CO. Quasispecies theory in the context of population genetics. BMC Evol Biol 2005; 5:44. [PMID: 16107214 PMCID: PMC1208876 DOI: 10.1186/1471-2148-5-44] [Citation(s) in RCA: 182] [Impact Index Per Article: 9.1] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/16/2005] [Accepted: 08/17/2005] [Indexed: 12/17/2022] Open
Abstract
BACKGROUND A number of recent papers have cast doubt on the applicability of the quasispecies concept to virus evolution, and have argued that population genetics is a more appropriate framework to describe virus evolution than quasispecies theory. RESULTS I review the pertinent literature, and demonstrate for a number of cases that the quasispecies concept is equivalent to the concept of mutation-selection balance developed in population genetics, and that there is no disagreement between the population genetics of haploid, asexually-replicating organisms and quasispecies theory. CONCLUSION Since quasispecies theory and mutation-selection balance are two sides of the same medal, the discussion about which is more appropriate to describe virus evolution is moot. In future work on virus evolution, we would do good to focus on the important questions, such as whether we can develop accurate, quantitative models of virus evolution, and to leave aside discussions about the relative merits of perfectly equivalent concepts.
Affiliation(s)
- Claus O Wilke
- Keck Graduate Institute of Applied Life Sciences, 535 Watson Drive, Claremont, California 91711, USA
- Digital Life Laboratory, California Institute of Technology, Mail Code 136-93, Pasadena, California 91125, USA
|
40
|
Drummond DA, Silberg JJ, Meyer MM, Wilke CO, Arnold FH. On the conservative nature of intragenic recombination. Proc Natl Acad Sci U S A 2005; 102:5380-5. [PMID: 15809422 PMCID: PMC556249 DOI: 10.1073/pnas.0500729102] [Citation(s) in RCA: 77] [Impact Index Per Article: 3.9] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/27/2005] [Indexed: 11/18/2022] Open
Abstract
Intragenic recombination rapidly creates protein sequence diversity compared with random mutation, but little is known about the relative effects of recombination and mutation on protein function. Here, we compare recombination of the distantly related beta-lactamases PSE-4 and TEM-1 to mutation of PSE-4. We show that, among beta-lactamase variants containing the same number of amino acid substitutions, variants created by recombination retain function with a significantly higher probability than those generated by random mutagenesis. We present a simple model that accurately captures the differing effects of mutation and recombination in real and simulated proteins with only four parameters: (i) the amino acid sequence distance between parents, (ii) the number of substitutions, (iii) the average probability that random substitutions will preserve function, and (iv) the average probability that substitutions generated by recombination will preserve function. Our results expose a fundamental functional enrichment in regions of protein sequence space accessible by recombination and provide a framework for evaluating whether the relative rates of mutation and recombination observed in nature reflect the underlying imbalance in their effects on protein function.
Affiliation(s)
- D Allan Drummond
- Program in Computation and Neural Systems, Division of Chemistry and Chemical Engineering, California Institute of Technology, Pasadena, CA 91125, USA
|
41
|
Bastolla U, Porto M, Roman HE, Vendruscolo M. Looking at structure, stability, and evolution of proteins through the principal eigenvector of contact matrices and hydrophobicity profiles. Gene 2005; 347:219-30. [PMID: 15777696 DOI: 10.1016/j.gene.2004.12.015] [Citation(s) in RCA: 16] [Impact Index Per Article: 0.8] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/15/2004] [Revised: 11/29/2004] [Accepted: 12/10/2004] [Indexed: 11/28/2022]
Abstract
We review and further develop an analytical model that describes how thermodynamic constraints on the stability of the native state influence protein evolution in a site-specific manner. To this end, we represent both protein sequences and protein structures as vectors: structures are represented by the principal eigenvector (PE) of the protein contact matrix, a quantity that resembles closely the effective connectivity of each site; sequences are represented through the "interactivity" of each amino acid type, using novel parameters that are correlated with hydropathy scales. These interactivity parameters are more strongly correlated than the other hydropathy scales that we examine with: (1) the change upon mutations of the unfolding free energy of proteins with two-states thermodynamics; (2) genomic properties as the genome-size and the genome-wide GC content; (3) the main eigenvectors of the substitution matrices. The evolutionary average of the interactivity vector correlates very strongly with the PE of a protein structure. Using this result, we derive an analytic expression for site-specific distributions of amino acids across protein families in the form of Boltzmann distributions whose "inverse temperature" is a function of the PE component. We show that our predictions are in agreement with site-specific amino acid distributions obtained from the Protein Data Bank, and we determine the mutational model that best fits the observed site-specific amino acid distributions. Interestingly, the optimal model almost minimizes the rate at which deleterious mutations are eliminated by natural selection.
Affiliation(s)
- Ugo Bastolla
- Centro de Astrobiología, INTA-CSIC, c.tra de Ajalvir km.4, E-28850, Torrejón de Ardoz, Madrid, Spain.
|