1
Leigh S, Thorpe P, Snook RR, Ritchie MG. Sexual selection, genomic evolution and population fitness in Drosophila pseudoobscura. Proc Biol Sci 2025; 292:20242744. [PMID: 40169023 PMCID: PMC11961267 DOI: 10.1098/rspb.2024.2744] [Received: 11/14/2024] [Revised: 02/25/2025] [Accepted: 03/07/2025]
Abstract
Sexual selection shapes the genome in unique ways. It is also likely to have significant fitness consequences, such as purging deleterious mutations from the genome or, conversely, maintaining genetic load in a population via sexual conflict. Here, we examined the influence of sexual selection on genomic variation potentially underlying population fitness, using experimentally evolved Drosophila pseudoobscura populations. Sexual selection was manipulated by keeping replicate lines under elevated polyandry or strict monogamy for approximately 200 generations, followed by individual-based sequencing. Using nucleotide diversity (π), fixation index (Fst) and recombination rate measures, we confirmed that signatures of selection were not dispersed but mainly localized to chromosome 3 and the X chromosome. Overall mutational load was similar between lines, but our analysis of the distribution of fitness effects revealed considerable variation between lines and chromosomes. Furthermore, we found that the distribution of transposable elements differs between the lines, with a higher load in monogamous lines. Our results suggest that complex interactions between purifying selection and sexual conflict are shaping the genome, particularly on chromosome 3 and the sex chromosome; sexual selection influences divergence across chromosomes, but in a more complex way than proposed by simple 'purging' of deleterious loci.
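Because π and Fst carry most of the chromosome-level comparison in this abstract, a minimal sketch of both statistics may help. It is not the authors' pipeline: it assumes biallelic sites, the same number of sampled chromosomes at every site, and no missing data, and it uses Hudson's Fst in its ratio-of-averages form. The function names and allele counts are hypothetical.

```python
from typing import Sequence


def site_pi(alt: int, n: int) -> float:
    """Unbiased per-site heterozygosity 2pq * n / (n - 1) for n sampled chromosomes."""
    p = alt / n
    return 2 * p * (1 - p) * n / (n - 1)


def mean_pi(alts: Sequence[int], n: int) -> float:
    """Average per-site nucleotide diversity over the supplied sites."""
    return sum(site_pi(a, n) for a in alts) / len(alts)


def hudson_fst(alt1: Sequence[int], n1: int, alt2: Sequence[int], n2: int) -> float:
    """Hudson's Fst as a ratio of summed numerators and denominators across sites."""
    num = 0.0
    den = 0.0
    for a1, a2 in zip(alt1, alt2):
        p1, p2 = a1 / n1, a2 / n2
        # squared frequency difference, corrected for within-sample noise
        num += (p1 - p2) ** 2 - p1 * (1 - p1) / (n1 - 1) - p2 * (1 - p2) / (n2 - 1)
        # between-population diversity
        den += p1 * (1 - p2) + p2 * (1 - p1)
    return num / den


# Two sites fixed for different alleles in the two lines give Fst = 1.
print(hudson_fst([0, 0], 20, [20, 20], 20))  # 1.0
```

Lines with identical allele frequencies give a slightly negative value because of the sampling correction; averaging numerators and denominators separately keeps low-diversity sites from dominating the estimate.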
Affiliation(s)
- Stewart Leigh
- Centre for Biological Diversity, School of Biology, University of St Andrews, St Andrews, Fife, UK
- Peter Thorpe
- Centre for Biological Diversity, School of Biology, University of St Andrews, St Andrews, Fife, UK
- The Data Analysis Group, School of Life Sciences, University of Dundee, Dundee, UK
- Rhonda R. Snook
- Department of Zoology, Stockholms Universitet, Stockholm, Sweden
- Michael G. Ritchie
- Centre for Biological Diversity, School of Biology, University of St Andrews, St Andrews, Fife, UK
2
Latrille T, Joseph J, Hartasánchez DA, Salamin N. Estimating the proportion of beneficial mutations that are not adaptive in mammals. PLoS Genet 2024; 20:e1011536. [PMID: 39724093 PMCID: PMC11709321 DOI: 10.1371/journal.pgen.1011536] [Received: 04/29/2024] [Revised: 01/08/2025] [Accepted: 12/10/2024]
Abstract
Mutations can be beneficial by bringing innovation to their bearer, allowing them to adapt to environmental change. These mutations are typically unpredictable since they respond to an unforeseen change in the environment. However, mutations can also be beneficial because they are simply restoring a state of higher fitness that was lost due to genetic drift in a stable environment. In contrast to adaptive mutations, these beneficial non-adaptive mutations can be predicted if the underlying fitness landscape is stable and known. The contribution of such non-adaptive mutations to molecular evolution has been largely neglected, mainly because their detection is very challenging. Here, we reconstructed protein-coding gene fitness landscapes shared between mammals, using mutation-selection models and multi-species alignments across 87 mammals. These fitness landscapes allowed us to predict the fitness effect of polymorphisms found in 28 mammalian populations. Using methods that quantify selection at the population level, we confirmed that beneficial non-adaptive mutations are indeed positively selected in extant populations. Our work confirms that deleterious substitutions are accumulating in mammals and are being reverted, generating a balance in which genomes are damaged and restored simultaneously at different loci. We observe that beneficial non-adaptive mutations represent between 15% and 45% of all beneficial mutations in 24 of 28 populations analyzed, suggesting that a substantial part of ongoing positive selection in mammals is not driven solely by adaptation to environmental change.
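The idea that a beneficial mutation can be predicted from a stable landscape can be made concrete with a site-specific fitness profile. The sketch below assumes a mutation-selection setting in which each amino acid at a site has a scaled fitness F (4Ne times log fitness), the scaled selection coefficient of a change is the difference in F, and the substitution rate relative to neutral is S / (1 - exp(-S)). The three-amino-acid profile is invented; this is not the authors' implementation.

```python
import math

# Hypothetical site-specific scaled fitness profile (F = 4 * Ne * log fitness);
# the values are illustrative only.
PROFILE = {"L": 2.0, "I": 1.5, "V": 0.0}


def scaled_selection(profile: dict[str, float], current: str, mutant: str) -> float:
    """Scaled selection coefficient S of a mutation, read off a fixed landscape."""
    return profile[mutant] - profile[current]


def relative_substitution_rate(s: float) -> float:
    """Fixation probability relative to a neutral mutation: S / (1 - exp(-S))."""
    if abs(s) < 1e-8:
        return 1.0  # limit as S -> 0
    return s / (1.0 - math.exp(-s))


# A site that drifted to the less fit V: mutating back to L is beneficial
# (S = 2) even though the landscape itself has not changed.
s_back = scaled_selection(PROFILE, "V", "L")
print(s_back, round(relative_substitution_rate(s_back), 3))  # 2.0 2.313
```

Because the rate for S divided by the rate for -S equals exp(S), the same fixed profile that lets a deleterious change drift in also predicts the favored reversion.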
Affiliation(s)
- Thibault Latrille
- Department of Computational Biology, Université de Lausanne, Lausanne, Switzerland
- Julien Joseph
- Laboratoire de Biométrie et Biologie Evolutive, UMR5558, Université Lyon 1, Villeurbanne, France
- Nicolas Salamin
- Department of Computational Biology, Université de Lausanne, Lausanne, Switzerland
3
Kuo YP, Carja O. Evolutionary graph theory beyond single mutation dynamics: on how network-structured populations cross fitness landscapes. Genetics 2024; 227:iyae055. [PMID: 38639307 PMCID: PMC11151934 DOI: 10.1093/genetics/iyae055] [Received: 02/07/2024] [Revised: 03/28/2024] [Accepted: 04/01/2024]
Abstract
Spatially resolved datasets are revolutionizing knowledge in molecular biology, yet are under-utilized for questions in evolutionary biology. To gain insight from these large-scale datasets of spatial organization, we need mathematical representations and modeling techniques that both capture their complexity and allow for mathematical tractability. Evolutionary graph theory uses the mathematical representation of networks as a proxy for heterogeneous population structure and has started to reshape our understanding of how spatial structure can direct evolutionary dynamics. However, previous results were derived for the case of a single new mutation appearing in the population, and the role of network structure in shaping fitness landscape crossing is still poorly understood. Here we study how network-structured populations cross fitness landscapes and show that even a simple extension to a two-mutational landscape can exhibit complex evolutionary dynamics that cannot be predicted using previous single-mutation results. We show how our results can be intuitively understood through the lens of how the two main evolutionary properties of a network, the amplification and acceleration factors, change the expected fate of the intermediate mutant in the population, and we further discuss how to link these models to spatially resolved datasets of cellular organization.
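Amplification is usually judged by comparing a graph's fixation probability with the well-mixed (complete-graph) baseline. As a hedged single-mutation sketch, the code below runs a Monte Carlo birth-death (Bd) Moran process on an arbitrary adjacency list and checks it against the closed-form well-mixed result. The update rule, uniform placement of the first mutant, and all names are assumptions; the paper's two-mutation landscape is not modeled.

```python
import random


def complete_graph(n: int) -> list[list[int]]:
    """Adjacency lists of a complete graph without self-loops (well-mixed baseline)."""
    return [[j for j in range(n) if j != i] for i in range(n)]


def fixation_probability_complete(r: float, n: int) -> float:
    """Closed-form Moran fixation probability of one mutant with relative fitness r."""
    if r == 1.0:
        return 1.0 / n
    return (1 - 1 / r) / (1 - 1 / r ** n)


def simulate_bd(adj: list[list[int]], r: float, trials: int, seed: int = 0) -> float:
    """Monte Carlo fixation probability under a birth-death Moran process.

    adj[i] lists the neighbours of node i. Each step, a parent is chosen with
    probability proportional to fitness and its offspring replaces a uniformly
    chosen neighbour. The initial mutant is placed uniformly at random.
    """
    rng = random.Random(seed)
    n = len(adj)
    nodes = range(n)
    fixed = 0
    for _ in range(trials):
        mutant = [False] * n
        mutant[rng.randrange(n)] = True
        count = 1
        while 0 < count < n:
            weights = [r if m else 1.0 for m in mutant]
            parent = rng.choices(nodes, weights=weights)[0]
            child = rng.choice(adj[parent])
            if mutant[child] != mutant[parent]:
                # the offspring's type overwrites the neighbour's type
                count += 1 if mutant[parent] else -1
                mutant[child] = mutant[parent]
        fixed += count == n
    return fixed / trials


# The closed form gives 0.339; the simulated estimate should land close to it.
estimate = simulate_bd(complete_graph(10), r=1.5, trials=3000)
print(round(fixation_probability_complete(1.5, 10), 3), round(estimate, 3))
```

Swapping `complete_graph` for another adjacency list (a star, for example) and comparing the estimate with the well-mixed value is one way to see whether a structure amplifies or suppresses selection.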
Affiliation(s)
- Yang Ping Kuo
- Computational Biology Department, School of Computer Science, Carnegie Mellon University, Pittsburgh, PA 15232, USA
- Oana Carja
- Computational Biology Department, School of Computer Science, Carnegie Mellon University, Pittsburgh, PA 15232, USA
4
Joseph J. Increased Positive Selection in Highly Recombining Genes Does not Necessarily Reflect an Evolutionary Advantage of Recombination. Mol Biol Evol 2024; 41:msae107. [PMID: 38829800 PMCID: PMC11173204 DOI: 10.1093/molbev/msae107] [Received: 01/30/2024] [Revised: 04/08/2024] [Accepted: 05/28/2024]
Abstract
It is commonly thought that the long-term advantage of meiotic recombination is to dissipate genetic linkage, allowing natural selection to act independently on different loci. It is thus theoretically expected that genes with higher recombination rates evolve under more effective selection. On the other hand, recombination is often associated with GC-biased gene conversion (gBGC), which theoretically interferes with selection by promoting the fixation of deleterious GC alleles. To test these predictions, several studies assessed whether selection was more effective in highly recombining genes (due to dissipation of genetic linkage) or less effective (due to gBGC), assuming a fixed distribution of fitness effects (DFE) for all genes. In this study, I directly derive the DFE from a gene's evolutionary history (shaped by mutation, selection, drift, and gBGC) under empirical fitness landscapes. I show that genes that have experienced high levels of gBGC are less fit and thus have more opportunities for beneficial mutations. Even a small decrease in the genome-wide intensity of gBGC leads to the fixation of these beneficial mutations, particularly in highly recombining genes. This results in increased positive selection in highly recombining genes that is not caused by more effective selection. Additionally, I show that the death of a recombination hotspot can lead to a higher dN/dS than its birth, but with substitution patterns biased towards AT, and only at selected positions. Controlling for a substitution bias towards GC is therefore not sufficient to rule out the contribution of gBGC to signatures of accelerated evolution. Finally, although gBGC does not affect the fixation probability of GC-conservative mutations, I show that by altering the DFE, gBGC can also significantly affect nonsynonymous GC-conservative substitution patterns.
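The argument rests on gBGC behaving like selection that favors G/C alleles. Under the common modeling assumption that gBGC adds a scaled coefficient B to weak-to-strong (A/T to G/C) mutations, subtracts it from strong-to-weak ones, and leaves GC-conservative mutations untouched, the sketch below shows how a deleterious mutation can fix faster than a neutral one. The parameter values and names are illustrative, not estimates from the paper.

```python
import math


def relative_fixation_rate(s_scaled: float) -> float:
    """Fixation probability relative to neutral for a scaled coefficient S: S / (1 - exp(-S))."""
    if abs(s_scaled) < 1e-8:
        return 1.0  # neutral limit
    return s_scaled / (1.0 - math.exp(-s_scaled))


def effective_coefficient(s_scaled: float, b_scaled: float, kind: str) -> float:
    """Combine selection (S) with gBGC (B) according to the mutation's direction.

    kind is "WS" (A/T -> G/C), "SW" (G/C -> A/T) or "conservative"
    (no change in GC content, so gBGC does not act).
    """
    if kind == "WS":
        return s_scaled + b_scaled
    if kind == "SW":
        return s_scaled - b_scaled
    if kind == "conservative":
        return s_scaled
    raise ValueError(f"unknown mutation kind: {kind}")


# A mildly deleterious W->S mutation (S = -2) under strong gBGC (B = 3)
s_eff = effective_coefficient(-2.0, 3.0, "WS")
print(s_eff, round(relative_fixation_rate(s_eff), 3))  # 1.0 1.582
```

The "conservative" branch mirrors the abstract's point that gBGC leaves the fixation probability of GC-conservative mutations unchanged; any effect on those substitutions has to come indirectly, through the DFE.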
Affiliation(s)
- Julien Joseph
- Laboratoire de Biométrie et Biologie Evolutive, Université Lyon 1, CNRS, UMR 5558, Villeurbanne, France
5
Soni V, Pfeifer SP, Jensen JD. The Effects of Mutation and Recombination Rate Heterogeneity on the Inference of Demography and the Distribution of Fitness Effects. Genome Biol Evol 2024; 16:evae004. [PMID: 38207127 PMCID: PMC10834165 DOI: 10.1093/gbe/evae004] [Received: 08/31/2023] [Revised: 12/12/2023] [Accepted: 01/07/2024]
Abstract
Disentangling the effects of demography and selection has remained a focal point of population genetic analysis. Knowledge about mutation and recombination is essential in this endeavor; however, despite clear evidence that both mutation and recombination rates vary across genomes, it is common practice to model both rates as fixed. In this study, we quantify how this unaccounted-for rate heterogeneity may impact inference using common approaches for inferring selection (DFE-alpha, Grapes, and polyDFE) and/or demography (fastsimcoal2 and δaδi). We demonstrate that, if not properly modeled, this heterogeneity can increase uncertainty in the estimation of demographic and selective parameters and in some scenarios may result in misleading inference. These results highlight the importance of quantifying the fundamental evolutionary parameters of mutation and recombination before using population genomic data to quantify the effects of genetic drift (i.e. as modulated by demographic history) and selection; or, at the least, that the effects of uncertainty in these parameters can and should be directly modeled in downstream inference.
Affiliation(s)
- Vivak Soni
- School of Life Sciences, Center for Evolution & Medicine, Arizona State University, Tempe, AZ, USA
- Susanne P Pfeifer
- School of Life Sciences, Center for Evolution & Medicine, Arizona State University, Tempe, AZ, USA
- Jeffrey D Jensen
- School of Life Sciences, Center for Evolution & Medicine, Arizona State University, Tempe, AZ, USA
6
Li LL, Xiao Y, Wang X, He ZH, Lv YW, Hu XS. The Ka/Ks and πa/πs Ratios under Different Models of Gametophytic and Sporophytic Selection. Genome Biol Evol 2023; 15:evad151. [PMID: 37561000 PMCID: PMC10443736 DOI: 10.1093/gbe/evad151] [Received: 01/11/2023] [Revised: 08/06/2023] [Accepted: 08/08/2023]
Abstract
Alternation of generations in the plant life cycle provides a biological basis for natural selection occurring in the gametophyte phase, the sporophyte phase, or both. Divergent biphasic selection could yield distinct evolutionary rates for phase-specific or pleiotropic genes. Here, we analyze models of antagonistic and synergistic selection between alternative generations in terms of the ratio of nonsynonymous to synonymous divergence (Ka/Ks). For pleiotropic genes, the effects of biphasic selection are opposite under antagonistic selection but cumulative under synergistic selection. Under antagonistic selection with additive and comparable strengths of biphasic allelic selection, the absolute Ka/Ks for the gametophyte gene equals that for the sporophyte gene under outcrossing but is smaller under a mixed mating system. The same pattern is predicted for Ka/Ks under synergistic selection. Selfing reduces the efficacy of gametophytic selection. Other processes, including pollen and seed flow and genetic drift, also reduce selection efficacy. The polymorphism (πa) at a nonsynonymous site is affected by the joint effects of selfing with gametophytic or sporophytic selection. Likewise, the ratio of nonsynonymous to synonymous polymorphism (πa/πs) is affected by the same joint effects. Gene flow and genetic drift have opposite effects on πa or πa/πs when interacting with gametophytic and sporophytic selection. We discuss the implications of this theory for detecting natural selection in terms of Ka/Ks and for interpreting the evolutionary divergence among gametophyte-specific, sporophyte-specific, and pleiotropic genes.
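Since the model predictions are expressed through Ka/Ks, a reminder of how the ratio is computed from sequence data may help. This hedged sketch assumes the numbers of nonsynonymous and synonymous sites and differences have already been counted (for example with a Nei-Gojobori-style method) and applies a Jukes-Cantor correction to each proportion; the counts are invented and the function names are hypothetical.

```python
import math


def jukes_cantor(p: float) -> float:
    """Jukes-Cantor corrected distance d = -3/4 * ln(1 - 4p/3)."""
    if not 0.0 <= p < 0.75:
        raise ValueError("p-distance must lie in [0, 0.75) for the correction")
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)


def ka_ks(nonsyn_diffs: float, nonsyn_sites: float, syn_diffs: float, syn_sites: float) -> float:
    """Ka/Ks from counted differences and sites, each corrected separately."""
    ka = jukes_cantor(nonsyn_diffs / nonsyn_sites)
    ks = jukes_cantor(syn_diffs / syn_sites)
    if ks == 0.0:
        raise ValueError("Ks is zero; the ratio is undefined")
    return ka / ks


# Invented counts: 6 differences over 300 nonsynonymous sites, 10 over 100 synonymous.
print(round(ka_ks(6, 300, 10, 100), 3))  # 0.189
```

The correction matters more for the faster-evolving synonymous class, so the corrected ratio falls below the raw ratio of proportions (0.2 here). The same per-site logic applies to πa/πs, with within-species pairwise differences in place of divergence.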
Affiliation(s)
- Ling-Ling Li
- College of Forestry and Landscape Architecture, South China Agricultural University, Guangzhou 510642, China
- Guangdong Key Laboratory for Innovative Development and Utilization of Forest Plant Germplasm, South China Agricultural University, Guangzhou 510642, China
- Yu Xiao
- College of Forestry and Landscape Architecture, South China Agricultural University, Guangzhou 510642, China
- Guangdong Key Laboratory for Innovative Development and Utilization of Forest Plant Germplasm, South China Agricultural University, Guangzhou 510642, China
- Xi Wang
- College of Forestry and Landscape Architecture, South China Agricultural University, Guangzhou 510642, China
- Guangdong Key Laboratory for Innovative Development and Utilization of Forest Plant Germplasm, South China Agricultural University, Guangzhou 510642, China
- Zi-Han He
- College of Forestry and Landscape Architecture, South China Agricultural University, Guangzhou 510642, China
- Guangdong Key Laboratory for Innovative Development and Utilization of Forest Plant Germplasm, South China Agricultural University, Guangzhou 510642, China
- Yan-Wen Lv
- College of Forestry and Landscape Architecture, South China Agricultural University, Guangzhou 510642, China
- Guangdong Key Laboratory for Innovative Development and Utilization of Forest Plant Germplasm, South China Agricultural University, Guangzhou 510642, China
- Xin-Sheng Hu
- College of Forestry and Landscape Architecture, South China Agricultural University, Guangzhou 510642, China
- Guangdong Key Laboratory for Innovative Development and Utilization of Forest Plant Germplasm, South China Agricultural University, Guangzhou 510642, China
7
Desbiez-Piat A, Le Rouzic A, Tenaillon MI, Dillmann C. Interplay between extreme drift and selection intensities favors the fixation of beneficial mutations in selfing maize populations. Genetics 2021; 219:6339583. [PMID: 34849881 DOI: 10.1093/genetics/iyab123] [Received: 05/07/2021] [Accepted: 07/21/2021]
Abstract
Population and quantitative genetic models provide useful approximations to predict long-term selection responses sustaining phenotypic shifts, and the underlying multilocus adaptive dynamics. Although these models are valid across a broad range of parameters, their use for understanding the adaptive dynamics of small selfing populations undergoing strong selection intensity (hereafter, the High Drift-High Selection regime, HDHS) remains to be explored. The Saclay Divergent Selection Experiments (DSEs) on maize flowering time provide an interesting example of populations evolving under HDHS, with significant selection responses over 20 generations in two directions. We combined experimental data from the Saclay DSEs, forward individual-based simulations, and theoretical predictions to dissect the evolutionary mechanisms at play in the observed selection responses. We asked two main questions: How do mutations arise, spread, and reach fixation in populations evolving under HDHS? How does the interplay between drift and selection influence the observed phenotypic shifts? We showed that the long-lasting response to selection in small populations is due to the rapid fixation of mutations occurring during the generations of selection. Among fixed mutations, we also found a clear signal of enrichment for beneficial mutations, revealing a limited cost of selection. Both environmental stochasticity and variation in selection coefficients likely contributed to exacerbating mutational effects, thereby helping selection to act on, and fix, small-effect mutations. Together, our results highlight that despite the small number of polymorphic loci expected under HDHS, adaptive variation is continuously fueled by a vast mutational target. We discuss our results in the context of breeding and the long-term survival of small selfing populations.
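The HDHS regime pits strong drift against strong selection in very small populations. As a rough, hypothetical illustration (not the authors' forward simulator), the sketch treats a fully selfing population as haploid, applies selection and then binomial sampling each generation, and compares the simulated fixation probability of a single beneficial copy with the diffusion approximation for initial frequency 1/n. Population size, selection coefficient, and names are illustrative assumptions.

```python
import math
import random


def wf_fixation(n: int, s: float, trials: int, seed: int = 0) -> float:
    """Monte Carlo fixation probability of one new copy in a haploid Wright-Fisher population."""
    rng = random.Random(seed)
    fixed = 0
    for _ in range(trials):
        count = 1
        while 0 < count < n:
            p = count / n
            # selection (relative fitness 1 + s) shifts the expected frequency ...
            p_sel = p * (1 + s) / (1 + p * s)
            # ... then drift: binomial sampling of n offspring
            count = sum(rng.random() < p_sel for _ in range(n))
        fixed += count == n
    return fixed / trials


def diffusion_haploid(n: int, s: float) -> float:
    """Diffusion approximation (1 - exp(-2s)) / (1 - exp(-2ns)) for initial frequency 1/n."""
    if s == 0:
        return 1.0 / n
    return (1 - math.exp(-2 * s)) / (1 - math.exp(-2 * n * s))


# Diffusion value is 0.185; the simulated value should be close to it.
sim = wf_fixation(20, 0.1, trials=5000)
print(round(diffusion_haploid(20, 0.1), 3), round(sim, 3))
```

Even with only 20 individuals, a mutation with s = 0.1 fixes several times more often than a neutral one (1/n = 0.05), which is the kind of drift-selection interplay the abstract describes.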
Affiliation(s)
- Arnaud Desbiez-Piat
- Université Paris-Saclay, INRAE, CNRS, AgroParisTech, GQE-Le Moulon, 91190 Gif-sur-Yvette, France
- Arnaud Le Rouzic
- Université Paris-Saclay, CNRS, IRD, UMR Évolution, Génomes, Comportement et Écologie, 91120 Gif-sur-Yvette, France
- Maud I Tenaillon
- Université Paris-Saclay, INRAE, CNRS, AgroParisTech, GQE-Le Moulon, 91190 Gif-sur-Yvette, France
- Christine Dillmann
- Université Paris-Saclay, INRAE, CNRS, AgroParisTech, GQE-Le Moulon, 91190 Gif-sur-Yvette, France
8
Galtier N, Rousselle M. How Much Does Ne Vary Among Species? Genetics 2020; 216:559-572. [PMID: 32839240 PMCID: PMC7536855 DOI: 10.1534/genetics.120.303622] [Received: 05/26/2020] [Accepted: 08/20/2020]
Abstract
Genetic drift is an important evolutionary force whose strength is inversely proportional to Ne, the effective population size. The impact of drift on genome diversity and evolution is known to vary among species, but quantifying this effect is a difficult task. Here we assess the magnitude of variation in drift power among species of animals via its effect on the mutation load, which also requires inferring the distribution of fitness effects of deleterious mutations. To this end, we analyze the nonsynonymous (amino-acid changing) and synonymous (amino-acid conservative) allele frequency spectra in a large sample of metazoan species, with a focus on the primates vs. fruit flies contrast. We show that a Gamma model of the distribution of fitness effects is not suitable due to strong differences in estimated shape parameters among taxa, while adding a class of lethal mutations essentially solves the problem. Using the Gamma + lethal model and assuming that the mean deleterious effect of nonsynonymous mutations is shared among species, we estimate that the power of drift varies by a factor of at least 500 between large-Ne and small-Ne species of animals, i.e., an order of magnitude more than the among-species variation in genetic diversity. Our results are relevant to Lewontin's paradox while further questioning the meaning of the Ne parameter in population genomics.
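The Gamma + lethal model can be made concrete with a small simulation. Following the abstract's assumption that the mean deleterious effect is shared among species, only Ne changes between the two cases below, and the share of mutations whose scaled effect |4 Ne s| falls below 1 (a conventional "drift-dominated" cutoff, assumed here) shrinks as Ne grows. Shape, mean, lethal fraction, and Ne values are illustrative, and lethal mutations are coded as s = -1.

```python
import random


def sample_dfe(k: int, shape: float, mean_s: float, p_lethal: float, seed: int = 0) -> list[float]:
    """Draw k selection coefficients from a Gamma + lethal DFE (hypothetical parameters).

    With probability p_lethal a mutation is lethal (coded s = -1); otherwise
    -s follows a Gamma distribution with the given shape and mean.
    """
    rng = random.Random(seed)
    draws = []
    for _ in range(k):
        if rng.random() < p_lethal:
            draws.append(-1.0)
        else:
            # random.gammavariate takes (shape, scale); mean = shape * scale
            draws.append(-rng.gammavariate(shape, mean_s / shape))
    return draws


def fraction_drift_dominated(s_values: list[float], ne: float) -> float:
    """Share of mutations with |4 * Ne * s| < 1."""
    return sum(abs(4 * ne * s) < 1 for s in s_values) / len(s_values)


dfe = sample_dfe(20000, shape=0.3, mean_s=0.01, p_lethal=0.05)
for ne in (1e3, 1e6):
    print(ne, round(fraction_drift_dominated(dfe, ne), 3))
```

With a small Gamma shape much of the mass sits near zero, so the drift-dominated share responds strongly to Ne; the lethal class, by contrast, is never affected by drift, which is one reason adding it changes the fitted shape.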
Affiliation(s)
- Nicolas Galtier
- Institute of Evolution Sciences of Montpellier (ISEM), CNRS, University of Montpellier, IRD, EPHE, 34095 Montpellier, France
- Marjolaine Rousselle
- Institute of Evolution Sciences of Montpellier (ISEM), CNRS, University of Montpellier, IRD, EPHE, 34095 Montpellier, France
- Bioinformatics Research Centre, Aarhus University, DK Aarhus, Denmark
9
Amei A, Zhou S. Inferring the distribution of selective effects from a time inhomogeneous model. PLoS One 2019; 14:e0194709. [PMID: 30657757 PMCID: PMC6338356 DOI: 10.1371/journal.pone.0194709] [Received: 10/09/2017] [Accepted: 03/08/2018]
Abstract
We have developed a Poisson random field model for estimating the distribution of selective effects of newly arisen nonsynonymous mutations that could be observed as polymorphism or divergence in samples from two related species, under the assumption that the two species' populations are not at mutation-selection-drift equilibrium. The model is applied to 91 Drosophila genes by comparing levels of polymorphism in an African population of D. melanogaster with divergence to a reference strain of D. simulans. Based on the difference in gene expression level between testes and ovaries, the 91 genes were classified as 33 male-biased, 28 female-biased, and 30 sex-unbiased genes. Under a Bayesian framework, Markov chain Monte Carlo simulations are used to sample key parameters of the model, in which the distribution of selective effects is assumed to be Gaussian with a mean that may differ from gene to gene. Based on our estimates, the majority of newly arisen nonsynonymous mutations that could contribute to polymorphism or divergence in Drosophila species are mildly deleterious, with a mean scaled selection coefficient of -2.81, while almost 86% of the fixed differences between species are driven by positive selection. Only 16.6% of the nonsynonymous mutations observed in sex-unbiased genes are under positive selection, compared with 30% in male-biased and 46% in female-biased genes. We also estimated that D. melanogaster and D. simulans may have diverged 1.72 million years ago.
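The Poisson random field idea can be illustrated with the equilibrium Sawyer-Hartl frequency density. This is a simplification: the paper's model is explicitly time-inhomogeneous, and this sketch is not. It assumes gamma is a population-scaled selection coefficient (2Ns in the convention used here), that derived alleles are binomially sampled into n chromosomes, and that a midpoint rule is accurate enough for the integral. The value -2.81 is borrowed from the abstract only as an illustrative magnitude, since the paper's scaling convention may differ.

```python
import math


def prf_density(x: float, gamma: float, theta: float = 1.0) -> float:
    """Equilibrium density of sites at derived-allele frequency x (Sawyer-Hartl form)."""
    if abs(gamma) < 1e-10:
        return theta / x  # neutral limit
    num = 1.0 - math.exp(-2.0 * gamma * (1.0 - x))
    den = (1.0 - math.exp(-2.0 * gamma)) * x * (1.0 - x)
    return theta * num / den


def expected_sfs(n: int, gamma: float, theta: float = 1.0, steps: int = 20000) -> list[float]:
    """Expected number of sites with i derived copies in a sample of n, i = 1..n-1."""
    h = 1.0 / steps
    sfs = []
    for i in range(1, n):
        binom = math.comb(n, i)
        total = 0.0
        for k in range(steps):
            x = (k + 0.5) * h  # midpoint rule never evaluates x = 0 or x = 1
            total += binom * x ** i * (1.0 - x) ** (n - i) * prf_density(x, gamma, theta)
        sfs.append(total * h)
    return sfs


neutral = expected_sfs(10, 0.0)
deleterious = expected_sfs(10, -2.81)
print([round(v, 3) for v in neutral[:3]])  # [1.0, 0.5, 0.333]
```

Under neutrality the expected count in class i reduces to theta / i; with a negative gamma the spectrum shrinks overall and shifts toward rare variants, which is the signal a PRF-based method fits.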
Affiliation(s)
- Amei Amei
- Department of Mathematical Sciences, University of Nevada, Las Vegas, Nevada, United States of America
- Shilei Zhou
- 54 Crescent Ave, Apt G, Dorchester, Massachusetts, United States of America
10
Savisaar R, Hurst LD. Exonic splice regulation imposes strong selection at synonymous sites. Genome Res 2018; 28:1442-1454. [PMID: 30143596 PMCID: PMC6169883 DOI: 10.1101/gr.233999.117] [Received: 01/11/2018] [Accepted: 07/31/2018]
Abstract
What proportion of coding sequence nucleotides have roles in splicing, and how strong is the selection that maintains them? Despite a large body of research into exonic splice regulatory signals, these questions have not been answered. This is because, to our knowledge, previous investigations have not explicitly disentangled the frequency of splice regulatory elements from the strength of the evolutionary constraint under which they evolve. Current data are consistent both with a scenario of weak and diffuse constraint, enveloping large swaths of sequence, and with well-defined pockets of strong purifying selection. In the former case, natural selection on exonic splice enhancers (ESEs) might primarily act as a slight modifier of codon usage bias. In the latter, mutations that disrupt ESEs are likely to have large fitness and, potentially, clinical effects. To distinguish between these scenarios, we used several different methods to determine the distribution of selection coefficients for new mutations within ESEs. The analyses converged to suggest that ∼15%-20% of fourfold degenerate sites are part of functional ESEs. Most of these sites are under strong evolutionary constraint. Therefore, exonic splice regulation does not simply impose a weak bias that gently nudges coding sequence evolution in a particular direction. Rather, the selection to preserve these motifs is a strong force that severely constrains the evolution of a substantial proportion of coding nucleotides. Thus, synonymous mutations that disrupt ESEs should be considered a potentially common cause of single-locus genetic disorders.
Affiliation(s)
- Rosina Savisaar
- The Milner Centre for Evolution, Department of Biology and Biochemistry, University of Bath, Bath BA2 7AY, United Kingdom
- Laurence D Hurst
- The Milner Centre for Evolution, Department of Biology and Biochemistry, University of Bath, Bath BA2 7AY, United Kingdom
11
Inference of Distribution of Fitness Effects and Proportion of Adaptive Substitutions from Polymorphism Data. Genetics 2017; 207:1103-1119. [PMID: 28951530 PMCID: PMC5676230 DOI: 10.1534/genetics.117.300323] [Received: 07/05/2017] [Accepted: 09/13/2017]
Abstract
The distribution of fitness effects (DFE) encompasses the fraction of deleterious, neutral, and beneficial mutations. It shapes the evolutionary trajectory of populations, as well as the rate of adaptive molecular evolution (α). Inferring the DFE and α from patterns of polymorphism, as given through the site frequency spectrum (SFS), and from divergence data has been a longstanding goal of evolutionary genetics. A widespread assumption shared by previous inference methods is that beneficial mutations contribute only negligibly to the polymorphism data. Hence, a DFE comprising only deleterious mutations tends to be estimated from SFS data, and α is then predicted by contrasting the SFS with divergence data from an outgroup. We develop a hierarchical probabilistic framework that extends previous methods to infer the DFE and α from polymorphism data alone. We use extensive simulations to examine the performance of our method. While an outgroup is still needed to obtain an unfolded SFS, we show that both a DFE comprising deleterious and beneficial mutations and α can be inferred without using divergence data. We also show that not accounting for the contribution of beneficial mutations to polymorphism data leads to substantially biased estimates of the DFE and α. We compare our framework with one of the most widely used inference methods available and apply it to a recently published chimpanzee exome data set.
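For contrast with the framework described above, the simplest form of the divergence-based contrast that earlier approaches rely on fits in a few lines. This is the classic McDonald-Kreitman-style estimator alpha = 1 - (Ds * Pn) / (Dn * Ps), with an optional cutoff that drops rare polymorphism classes (a common heuristic, assumed here). It is not the hierarchical method of the paper, and all counts are invented.

```python
def alpha_mk(dn: int, ds: int, pn: int, ps: int) -> float:
    """Proportion of adaptive nonsynonymous substitutions, 1 - (Ds * Pn) / (Dn * Ps)."""
    if dn <= 0 or ps <= 0:
        raise ValueError("Dn and Ps must be positive")
    return 1.0 - (ds * pn) / (dn * ps)


def alpha_from_sfs(dn: int, ds: int, sfs_n: list[int], sfs_s: list[int], min_count: int = 1) -> float:
    """Same estimator, counting only polymorphisms with at least min_count derived copies.

    sfs_n[0] and sfs_s[0] hold singletons, sfs_n[1] doubletons, and so on.
    """
    pn = sum(sfs_n[min_count - 1:])
    ps = sum(sfs_s[min_count - 1:])
    return alpha_mk(dn, ds, pn, ps)


print(round(alpha_mk(60, 100, 20, 80), 3))  # 0.583
print(round(alpha_from_sfs(60, 100, [14, 3, 3], [40, 20, 20], min_count=2), 3))  # 0.75
```

Dropping singletons raises the estimate here because slightly deleterious nonsynonymous variants are enriched at low frequency; this sensitivity to how polymorphism is treated is the kind of bias a full DFE model is meant to handle.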
12
Huber CD, Kim BY, Marsden CD, Lohmueller KE. Determining the factors driving selective effects of new nonsynonymous mutations. Proc Natl Acad Sci U S A 2017; 114:4465-4470. [PMID: 28400513 PMCID: PMC5410820 DOI: 10.1073/pnas.1619508114]
Abstract
The distribution of fitness effects (DFE) of new mutations plays a fundamental role in evolutionary genetics. However, the extent to which the DFE differs across species has yet to be systematically investigated. Furthermore, the biological mechanisms determining the DFE in natural populations remain unclear. Here, we show that theoretical models emphasizing different biological factors in determining the DFE, such as protein stability, back-mutations, species complexity, and mutational robustness, make distinct predictions about how the DFE will differ between species. Analyzing amino acid-changing variants from natural populations in a comparative population genomic framework, we find that humans have a higher proportion of strongly deleterious mutations than Drosophila melanogaster. Furthermore, when comparing the DFE across yeast, Drosophila, mice, and humans, the average selection coefficient becomes more deleterious with increasing species complexity. Last, pleiotropic genes have a DFE that is less variable than that of nonpleiotropic genes. Of four categories of theoretical models, only Fisher's geometrical model (FGM) is consistent with our findings. FGM assumes that multiple phenotypes are under stabilizing selection, with the number of phenotypes defining the complexity of the organism. Our results suggest that long-term population size and the cost of complexity drive the evolution of the DFE, with many implications for evolutionary and medical genomics.
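Fisher's geometric model, the only model the abstract retains, is easy to simulate. A minimal sketch under stated assumptions: log fitness is -|z|^2 / 2 around an optimum at the origin, mutations add isotropic Gaussian noise with a per-trait standard deviation that does not change with the number of traits, and s is the log-fitness difference. Under these assumptions the expected s is -n_traits * sigma**2 / 2, so mean effects become more deleterious with complexity. How mutation size should scale with complexity is itself a modeling choice that the abstract does not specify; all names and values are hypothetical.

```python
import random


def sample_fgm_s(n_traits: int, dist_to_opt: float, sigma: float, k: int, seed: int = 0) -> list[float]:
    """Selection coefficients of k random mutations under Fisher's geometric model.

    Fitness is Gaussian around an optimum at the origin, log w(z) = -|z|^2 / 2,
    so s = log w(z + dz) - log w(z). Mutations add isotropic Gaussian noise.
    """
    rng = random.Random(seed)
    z = [dist_to_opt] + [0.0] * (n_traits - 1)  # wild type at distance d from optimum
    base = sum(v * v for v in z)
    out = []
    for _ in range(k):
        z_mut = [v + rng.gauss(0.0, sigma) for v in z]
        out.append(-(sum(v * v for v in z_mut) - base) / 2.0)
    return out


def average(values: list[float]) -> float:
    return sum(values) / len(values)


# Expected means: -0.01 with 2 traits, -0.1 with 20 traits (sigma = 0.1).
mean_s_2traits = average(sample_fgm_s(2, 0.5, 0.1, 20000))
mean_s_20traits = average(sample_fgm_s(20, 0.5, 0.1, 20000))
print(round(mean_s_2traits, 3), round(mean_s_20traits, 3))
```

The expectation follows from E|z + dz|^2 = |z|^2 + n_traits * sigma^2, because the cross term 2 z . dz has mean zero.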
Affiliation(s)
- Christian D Huber
- Department of Ecology and Evolutionary Biology, University of California, Los Angeles, CA 90095
- Bernard Y Kim
- Department of Ecology and Evolutionary Biology, University of California, Los Angeles, CA 90095
- Clare D Marsden
- Department of Ecology and Evolutionary Biology, University of California, Los Angeles, CA 90095
- Kirk E Lohmueller
- Department of Ecology and Evolutionary Biology, University of California, Los Angeles, CA 90095
- Interdepartmental Program in Bioinformatics, University of California, Los Angeles, CA 90095
- Department of Human Genetics, David Geffen School of Medicine, University of California, Los Angeles, CA 90095
13
Savisaar R, Hurst LD. Estimating the prevalence of functional exonic splice regulatory information. Hum Genet 2017; 136:1059-1078. [PMID: 28405812 PMCID: PMC5602102 DOI: 10.1007/s00439-017-1798-3] [Received: 01/26/2017] [Accepted: 04/04/2017]
Abstract
In addition to coding information, human exons contain sequences necessary for correct splicing. These elements are known to be under purifying selection and their disruption can cause disease. However, the density of functional exonic splicing information remains profoundly uncertain. Several groups have experimentally investigated how mutations at different exonic positions affect splicing. They have found splice information to be distributed widely in exons, with one estimate putting the proportion of splicing-relevant nucleotides at >90%. These results suggest that splicing could place a major pressure on exon evolution. However, analyses of sequence conservation have concluded that the need to preserve splice regulatory signals only slightly constrains exon evolution, with a resulting decrease in the average human rate of synonymous evolution of only 1–4%. Why do these two lines of research come to such different conclusions? Among other reasons, we suggest that the methods are measuring different things: one assays the density of sites that affect splicing, the other the density of sites whose effects on splicing are visible to selection. In addition, the experimental methods typically consider short exons, thereby enriching for nucleotides close to the splice junction, such sites being enriched for splice-control elements. By contrast, in part owing to correction for nucleotide composition biases and to the assumption that constraint only operates on exon ends, the conservation-based methods can be overly conservative.
Affiliation(s)
- Rosina Savisaar
- The Milner Centre for Evolution, Department of Biology and Biochemistry, University of Bath, Bath, BA2 7AY, UK.
- Laurence D Hurst
- The Milner Centre for Evolution, Department of Biology and Biochemistry, University of Bath, Bath, BA2 7AY, UK
14
DNA sequence diversity and the efficiency of natural selection in animal mitochondrial DNA. Heredity (Edinb) 2016; 118:88-95. [PMID: 27827387 DOI: 10.1038/hdy.2016.108] [Citation(s) in RCA: 17] [Impact Index Per Article: 1.9] [Received: 03/11/2016] [Revised: 09/07/2016] [Accepted: 09/19/2016] [Indexed: 12/21/2022]
Abstract
Selection is expected to be more efficient in species that are more diverse because both the efficiency of natural selection and DNA sequence diversity are expected to depend upon the effective population size. We explore this relationship across a data set of 751 mammal species for which we have mitochondrial polymorphism data. We introduce a method by which we can examine the relationship between our measure of the efficiency of natural selection, the nonsynonymous relative to the synonymous nucleotide site diversity (πN/πS), and synonymous nucleotide diversity (πS), avoiding the statistical non-independence between the two quantities. We show that these two variables are strongly negatively and linearly correlated on a log scale. The slope is such that as πS doubles, πN/πS is reduced by 34%. We show that the slope of this relationship differs between the two phylogenetic groups for which we have the most data, rodents and bats, and that it also differs between species with high and low body mass, and between those with high and low mass-specific metabolic rate.
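A worked reading of the reported slope, assuming (as the abstract states) a straight-line relationship between log(πN/πS) and log(πS); the intercept a and slope b are labels introduced here, not values reported in the paper. A 34% reduction per doubling of πS means πN/πS is multiplied by 0.66 each time πS is multiplied by 2:

```latex
\log\!\left(\frac{\pi_N}{\pi_S}\right) = a + b\,\log \pi_S
\quad\Longrightarrow\quad
\frac{(\pi_N/\pi_S)\big|_{2\pi_S}}{(\pi_N/\pi_S)\big|_{\pi_S}} = 2^{\,b} = 0.66
\quad\Longrightarrow\quad
b = \frac{\ln 0.66}{\ln 2} \approx -0.60
```

The slope does not depend on the base of the logarithm, because changing the base rescales both axes by the same factor.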
15
Khatri BS, Goldstein RA. A coarse-grained biophysical model of sequence evolution and the population size dependence of the speciation rate. J Theor Biol 2015; 378:56-64. [PMID: 25936759 PMCID: PMC4457359 DOI: 10.1016/j.jtbi.2015.04.027] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.0] [Received: 10/03/2014] [Revised: 02/20/2015] [Accepted: 04/20/2015] [Indexed: 11/29/2022]
Abstract
Speciation is fundamental to understanding the huge diversity of life on Earth. Although still controversial, empirical evidence suggests that the rate of speciation is larger for smaller populations. Here, we explore a biophysical model of speciation by developing a simple coarse-grained theory of transcription factor-DNA binding and how their co-evolution in two geographically isolated lineages leads to incompatibilities. To develop a tractable analytical theory, we derive a Smoluchowski equation for the dynamics of binding energy evolution that accounts for the fact that natural selection acts on phenotypes, but variation arises from mutations in sequences; the Smoluchowski equation includes selection due to both gradients in fitness and gradients in sequence entropy, which is the logarithm of the number of sequences that correspond to a particular binding energy. This simple consideration predicts that smaller populations develop incompatibilities more quickly in the weak mutation regime; this trend arises as sequence entropy poises smaller populations closer to incompatible regions of phenotype space. These results suggest a generic coarse-grained approach to evolutionary stochastic dynamics, allowing realistic modelling at the phenotypic level.
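The sequence-entropy term can be made concrete with a toy count. This is a sketch under an assumption not taken from the abstract: binding energy is treated as proportional to the number of mismatches k between a site of length L and an optimal site over 4 nucleotides, so the number of sequences at mismatch k is C(L, k)·3^k and the sequence entropy is its natural logarithm. The function name `sequence_entropy` is hypothetical.

```python
import math


def sequence_entropy(length: int, mismatches: int, alphabet: int = 4) -> float:
    """Natural log of the number of sequences that differ from a reference
    site of `length` letters at exactly `mismatches` positions
    (hypothetical mismatch-counting model of binding energy)."""
    if not 0 <= mismatches <= length:
        raise ValueError("mismatches must lie between 0 and length")
    # Choose which positions mismatch, then pick one of (alphabet - 1) letters at each
    count = math.comb(length, mismatches) * (alphabet - 1) ** mismatches
    return math.log(count)


if __name__ == "__main__":
    L = 10
    for k in range(L + 1):
        print(f"k={k:2d}  S(k)={sequence_entropy(L, k):6.2f}")
```

For L = 10 the count peaks at k = 8, close to the 7.5 mismatches expected for a random sequence, so most sequences correspond to weak binding; this is the kind of entropic pull the abstract describes as poising smaller populations closer to incompatible regions.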
Affiliation(s)
- Bhavin S Khatri
- The Francis Crick Institute, Mill Hill Laboratory, The Ridgeway, London NW7 1AA, UK; Division of Infection & Immunity, University College London, London WC1E 6BT, UK.
- Richard A Goldstein
- Division of Infection & Immunity, University College London, London WC1E 6BT, UK.
16
Causes of natural variation in fitness: evidence from studies of Drosophila populations. Proc Natl Acad Sci U S A 2015; 112:1662-9. [PMID: 25572964 DOI: 10.1073/pnas.1423275112] [Citation(s) in RCA: 130] [Impact Index Per Article: 13.0] [Indexed: 11/18/2022]
Abstract
DNA sequencing has revealed high levels of variability within most species. Statistical methods based on population genetics theory have been applied to the resulting data and suggest that most mutations affecting functionally important sequences are deleterious but subject to very weak selection. Quantitative genetic studies have provided information on the extent of genetic variation within populations in traits related to fitness and the rate at which variability in these traits arises by mutation. This paper attempts to combine the available information from applications of the two approaches to populations of the fruitfly Drosophila in order to estimate some important parameters of genetic variation, using a simple population genetics model of mutational effects on fitness components. Analyses based on this model suggest the existence of a class of mutations with much larger fitness effects than those inferred from sequence variability and that contribute most of the standing variation in fitness within a population caused by the input of mildly deleterious mutations. However, deleterious mutations explain only part of this standing variation, and other processes such as balancing selection appear to make a large contribution to genetic variation in fitness components in Drosophila.
17
Ramos-Onsins SE, Burgos-Paz W, Manunza A, Amills M. Mining the pig genome to investigate the domestication process. Heredity (Edinb) 2014; 113:471-84. [PMID: 25074569 PMCID: PMC4815588 DOI: 10.1038/hdy.2014.68] [Citation(s) in RCA: 22] [Impact Index Per Article: 2.0] [Received: 12/06/2013] [Revised: 05/22/2014] [Accepted: 06/09/2014] [Indexed: 12/11/2022]
Abstract
Pig domestication began around 9000 YBP in the Fertile Crescent and Far East, involving marked morphological and genetic changes that occurred in a relatively short window of time. Identifying the alleles that drove the behavioural and physiological transformation of wild boars into pigs through artificial selection constitutes a formidable challenge that can only be faced from an interdisciplinary perspective. Indeed, although basic facts regarding the demography of pig domestication and dispersal have been uncovered, the biological substrate of these processes remains enigmatic. Considerable hope has been placed on new approaches, based on next-generation sequencing, which allow whole-genome variation to be analyzed at the population level. In this review, we provide an outline of the current knowledge on pig domestication by considering both archaeological and genetic data. Moreover, we discuss several potential scenarios of genome evolution under the complex mixture of demography and selection forces at play during domestication. Finally, we highlight several technical and methodological approaches that may represent significant advances in resolving the conundrum of livestock domestication.
Affiliation(s)
- S E Ramos-Onsins
- Department of Animal Genetics, Center for Research in Agricultural Genomics (CRAG) CSIC-IRTA-UAB-UB, Edifici CRAG, Campus Universitat Autònoma Barcelona, Bellaterra, Spain
- W Burgos-Paz
- Department of Animal Genetics, Center for Research in Agricultural Genomics (CRAG) CSIC-IRTA-UAB-UB, Edifici CRAG, Campus Universitat Autònoma Barcelona, Bellaterra, Spain
- A Manunza
- Department of Animal Genetics, Center for Research in Agricultural Genomics (CRAG) CSIC-IRTA-UAB-UB, Edifici CRAG, Campus Universitat Autònoma Barcelona, Bellaterra, Spain
- M Amills
- Department of Animal Genetics, Center for Research in Agricultural Genomics (CRAG) CSIC-IRTA-UAB-UB, Edifici CRAG, Campus Universitat Autònoma Barcelona, Bellaterra, Spain
18
McCandlish DM, Epstein CL, Plotkin JB. Formal properties of the probability of fixation: identities, inequalities and approximations. Theor Popul Biol 2014; 99:98-113. [PMID: 25450112 DOI: 10.1016/j.tpb.2014.11.004] [Citation(s) in RCA: 16] [Impact Index Per Article: 1.5] [Received: 04/10/2014] [Revised: 11/03/2014] [Accepted: 11/11/2014] [Indexed: 12/22/2022]
Abstract
The formula for the probability of fixation of a new mutation is widely used in theoretical population genetics and molecular evolution. Here we derive a series of identities, inequalities and approximations for the exact probability of fixation of a new mutation under the Moran process (equivalent results hold for the approximate probability of fixation under the Wright-Fisher process, after an appropriate change of variables). We show that the logarithm of the fixation probability has particularly simple behavior when the selection coefficient is measured as a difference of Malthusian fitnesses, and we exploit this simplicity to derive inequalities and approximations. We also present a comprehensive comparison of both existing and new approximations for the fixation probability, highlighting those approximations that induce a reversible Markov chain when used to describe the dynamics of evolution under weak mutation. To demonstrate the power of these results, we consider the classical problem of determining the total substitution rate across an ensemble of biallelic loci and prove that, at equilibrium, a strict majority of substitutions are due to drift rather than selection.
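For reference, the exact fixation probability of a single new mutant in a Moran population of size N with relative fitness r is (1 - 1/r)/(1 - 1/r^N), or 1/N when r = 1. The sketch below assumes the Malthusian parameterization r = e^s mentioned in the abstract, evaluates the formula in a numerically stable way, and checks one standard identity, ρ(s)/ρ(-s) = e^{(N-1)s}. The function name is hypothetical and this is not the authors' code.

```python
import math


def moran_fixation_probability(s: float, n: int) -> float:
    """Fixation probability of one new mutant in a Moran population of size n.

    Relative fitness is r = exp(s), so s is a difference of Malthusian
    fitnesses. A neutral mutant (s == 0) fixes with probability 1/n.
    """
    if n < 1:
        raise ValueError("population size must be at least 1")
    if s == 0.0:
        return 1.0 / n
    if s > 0:
        # (1 - e^-s) / (1 - e^-ns); expm1 keeps precision for small s
        return math.expm1(-s) / math.expm1(-n * s)
    # s < 0: factor out e^((n-1)s) so that e^(-ns) is never formed (no overflow)
    return math.exp((n - 1) * s) * math.expm1(s) / math.expm1(n * s)


if __name__ == "__main__":
    n = 100
    for s in (-0.01, 0.0, 0.01):
        print(f"s={s:+.2f}  rho={moran_fixation_probability(s, n):.6f}")
    # Identity check: rho(s) / rho(-s) should equal exp((n - 1) * s)
    s = 0.01
    ratio = moran_fixation_probability(s, n) / moran_fixation_probability(-s, n)
    print(ratio, math.exp((n - 1) * s))
```

The two branches are the same formula; the rearranged s < 0 branch simply avoids overflow when N·|s| is large.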
Affiliation(s)
- David M McCandlish
- Department of Biology, University of Pennsylvania, Philadelphia, PA, United States.
- Charles L Epstein
- Department of Mathematics, University of Pennsylvania, Philadelphia, PA, United States
- Joshua B Plotkin
- Department of Biology, University of Pennsylvania, Philadelphia, PA, United States
19
Approximation to the distribution of fitness effects across functional categories in human segregating polymorphisms. PLoS Genet 2014; 10:e1004697. [PMID: 25375159 PMCID: PMC4222666 DOI: 10.1371/journal.pgen.1004697] [Citation(s) in RCA: 40] [Impact Index Per Article: 3.6] [Received: 01/06/2014] [Accepted: 08/22/2014] [Indexed: 02/03/2023]
Abstract
Quantifying the proportion of polymorphic mutations that are deleterious or neutral is of fundamental importance to our understanding of evolution, disease genetics and the maintenance of variation genome-wide. Here, we develop an approximation to the distribution of fitness effects (DFE) of segregating single-nucleotide mutations in humans. Unlike previous methods, we do not assume that synonymous mutations are neutral or not strongly selected, and we do not rely on fitting the DFE of all new nonsynonymous mutations to a single probability distribution, which is poorly motivated on a biological level. We rely on a previously developed method that utilizes a variety of published annotations (including conservation scores, protein deleteriousness estimates and regulatory data) to score all mutations in the human genome based on how likely they are to be affected by negative selection, controlling for mutation rate. We map this and other conservation scores to a scale of fitness coefficients via maximum likelihood using diffusion theory and a Poisson random field model on SNP data. Our method serves to approximate the deleterious DFE of mutations that are segregating, regardless of their genomic consequence. We can then compare the proportion of mutations that are negatively selected or neutral across various categories, including different types of regulatory sites. We observe that the distribution of intergenic polymorphisms is highly peaked at neutrality, while the distribution of nonsynonymous polymorphisms has a second peak at [Formula: see text]. Other types of polymorphisms have shapes that fall roughly in between these two. We find that transcriptional start sites, strong CTCF-enriched elements and enhancers are the regulatory categories with the largest proportion of deleterious polymorphisms.
20
McCandlish DM, Stoltzfus A. Modeling evolution using the probability of fixation: history and implications. Q Rev Biol 2014; 89:225-52. [PMID: 25195318 DOI: 10.1086/677571] [Citation(s) in RCA: 123] [Impact Index Per Article: 11.2] [Indexed: 11/03/2022]
Abstract
Many models of evolution calculate the rate of evolution by multiplying the rate at which new mutations originate within a population by a probability of fixation. Here we review the historical origins, contemporary applications, and evolutionary implications of these "origin-fixation" models, which are widely used in evolutionary genetics, molecular evolution, and phylogenetics. Origin-fixation models were first introduced in 1969, in association with an emerging view of "molecular" evolution. Early origin-fixation models were used to calculate an instantaneous rate of evolution across a large number of independently evolving loci; in the 1980s and 1990s, a second wave of origin-fixation models emerged to address a sequence of fixation events at a single locus. Although origin fixation models have been applied to a broad array of problems in contemporary evolutionary research, their rise in popularity has not been accompanied by an increased appreciation of their restrictive assumptions or their distinctive implications. We argue that origin-fixation models constitute a coherent theory of mutation-limited evolution that contrasts sharply with theories of evolution that rely on the presence of standing genetic variation. A major unsolved question in evolutionary biology is the degree to which these models provide an accurate approximation of evolution in natural populations.
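A minimal origin-fixation sketch, assuming a haploid Wright-Fisher population of size N (effective size equal to census size) and Kimura's diffusion approximation for the fixation probability, (1 - e^{-2s})/(1 - e^{-2Ns}): the substitution rate is the number of new mutations per generation, Nμ, multiplied by that probability. The function names are illustrative, and the choice of a haploid Wright-Fisher model is an assumption of this example rather than something the review prescribes.

```python
import math


def kimura_fixation_probability(s: float, n: int) -> float:
    """Diffusion approximation for fixation of one new mutant (initial frequency
    1/n) in a haploid Wright-Fisher population of size n, with Ne = n."""
    if s == 0.0:
        return 1.0 / n
    if s > 0:
        return math.expm1(-2.0 * s) / math.expm1(-2.0 * n * s)
    # Same formula rearranged for s < 0 to avoid overflow in exp(-2ns)
    return math.exp(2.0 * (n - 1) * s) * math.expm1(2.0 * s) / math.expm1(2.0 * n * s)


def origin_fixation_rate(mu: float, s: float, n: int) -> float:
    """Substitutions per generation: origin rate (n * mu) times fixation probability."""
    return n * mu * kimura_fixation_probability(s, n)


if __name__ == "__main__":
    mu, n = 1e-8, 10_000
    for s in (-1e-3, -1e-4, 0.0, 1e-4, 1e-3):
        print(f"s={s:+.0e}  rate/mu={origin_fixation_rate(mu, s, n) / mu:.4g}")
```

With s = 0 the rate reduces to μ, the familiar neutral result; beneficial mutations substitute faster than μ and deleterious ones slower.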
21
De Silva DR, Nichols R, Elgar G. Purifying selection in deeply conserved human enhancers is more consistent than in coding sequences. PLoS One 2014; 9:e103357. [PMID: 25062004 PMCID: PMC4111549 DOI: 10.1371/journal.pone.0103357] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.7] [Received: 08/01/2013] [Accepted: 07/01/2014] [Indexed: 12/30/2022]
Abstract
Comparison of polymorphism at synonymous and non-synonymous sites in protein-coding DNA can provide evidence for selective constraint. Non-coding DNA that forms part of the regulatory landscape presents more of a challenge since there is not such a clear-cut distinction between sites under stronger and weaker selective constraint. Here, we consider putative regulatory elements termed Conserved Non-coding Elements (CNEs) defined by their high level of sequence identity across all vertebrates. Some mutations in these regions have been implicated in developmental disorders; we analyse CNE polymorphism data to investigate whether such deleterious effects are widespread in humans. Single nucleotide variants from the HapMap and 1000 Genomes Projects were mapped across nearly 2000 CNEs. In the 1000 Genomes data we find a significant excess of rare derived alleles in CNEs relative to coding sequences; this pattern is absent in HapMap data, apparently obscured by ascertainment bias. The distribution of polymorphism within CNEs is not uniform; we could identify two categories of sites by exploiting deep vertebrate alignments: stretches that are non-variant, and those that have at least one substitution. The conserved category has fewer polymorphic sites and a greater excess of rare derived alleles, which can be explained by a large proportion of sites under strong purifying selection within humans, higher than that for non-synonymous sites in most protein-coding regions, and comparable to that at the strongly conserved trans-dev genes. Conversely, the more evolutionarily labile CNE sites have an allele frequency distribution not significantly different from non-synonymous sites. Future studies should exploit genome-wide re-sequencing to obtain better coverage in selected non-coding regions, given the likelihood that mutations in evolutionarily conserved enhancer sequences are deleterious. Discovery pipelines should validate non-coding variants to aid in identifying causal and risk-enhancing variants in complex disorders, in contrast to the current focus on exome sequencing.
Affiliation(s)
- Dilrini R. De Silva
- Systems Biology, MRC National Institute for Medical Research, Mill Hill, London, United Kingdom
- School of Biological and Chemical Sciences, Queen Mary University of London, London, United Kingdom
- Richard Nichols
- School of Biological and Chemical Sciences, Queen Mary University of London, London, United Kingdom
- Greg Elgar
- Systems Biology, MRC National Institute for Medical Research, Mill Hill, London, United Kingdom
22
Background selection as baseline for nucleotide variation across the Drosophila genome. PLoS Genet 2014; 10:e1004434. [PMID: 24968283 PMCID: PMC4072542 DOI: 10.1371/journal.pgen.1004434] [Citation(s) in RCA: 92] [Impact Index Per Article: 8.4] [Received: 10/01/2013] [Accepted: 04/28/2014] [Indexed: 11/21/2022]
Abstract
The constant removal of deleterious mutations by natural selection causes a reduction in neutral diversity and efficacy of selection at genetically linked sites (a process called Background Selection, BGS). Population genetic studies, however, often ignore BGS effects when investigating demographic events or the presence of other types of selection. To obtain a more realistic evolutionary expectation that incorporates the unavoidable consequences of deleterious mutations, we generated high-resolution landscapes of variation across the Drosophila melanogaster genome under a BGS scenario independent of polymorphism data. We find that BGS plays a significant role in shaping levels of variation across the entire genome, including long introns and intergenic regions distant from annotated genes. We also find that a very large percentage of the observed variation in diversity across autosomes can be explained by BGS alone, up to 70% across individual chromosome arms at 100-kb scale, thus indicating that BGS predictions can be used as baseline to infer additional types of selection and demographic events. This approach allows detecting several outlier regions with signal of recent adaptive events and selective sweeps. The use of a BGS baseline, however, is particularly appropriate to investigate the presence of balancing selection and our study exposes numerous genomic regions with the predicted signature of higher polymorphism than expected when a BGS context is taken into account. Importantly, we show that these conclusions are robust to the mutation and selection parameters of the BGS model. Finally, analyses of protein evolution, together with previous comparisons of genetic maps between Drosophila species, suggest temporally variable recombination landscapes and, thus, local BGS effects that may differ between extant and past phases. Because genome-wide BGS and temporal changes in linkage effects can skew approaches to estimate demographic and selective events, future analyses should incorporate BGS predictions and capture local recombination variation across genomes and along lineages.
The removal of deleterious mutations from natural populations has potential consequences for patterns of variation across genomes. Population genetic analyses, however, often assume that such effects are negligible across recombining regions of species like Drosophila. We use simple models of purifying selection and current knowledge of recombination rates and gene distribution across the genome to obtain a baseline of variation predicted by the constant input and removal of deleterious mutations. We find that purifying selection alone can explain a major fraction of the observed variance in nucleotide diversity across the genome. The use of a baseline of variation predicted by linkage to deleterious mutations as null expectation exposes genomic regions under other selective regimes, including more regions showing the signature of balancing selection than would be evident when using traditional approaches. Our study also indicates that most, if not all, nucleotides across the D. melanogaster genome are significantly influenced by the removal of deleterious mutations, even when located in the middle of highly recombining regions and distant from genes. Additionally, the study of rates of protein evolution confirms previous analyses suggesting that the recombination landscape across the genome has changed in the recent history of D. melanogaster. All these reported factors can skew current analyses designed to capture demographic events or estimate the strength and frequency of adaptive mutations, and illustrate the need for new and more realistic theoretical and modeling approaches to study naturally occurring genetic variation.
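One widely used approximation for the BGS reduction, which may differ from the exact model used in the study above, treats each linked selected element i independently: B = π/π0 = exp(-Σ u_i t_i / (t_i + r_i(1 - t_i))^2), where u_i is the deleterious mutation rate of the element, t_i = h·s its heterozygous selection coefficient, and r_i its recombination fraction with the focal neutral site. The sketch below evaluates this expression; the function name and all inputs are hypothetical.

```python
import math
from typing import Iterable, Tuple


def background_selection_b(elements: Iterable[Tuple[float, float, float]]) -> float:
    """Expected neutral diversity relative to no selection, B = pi / pi0.

    Each element is (u, t, r): deleterious mutation rate, heterozygous
    selection coefficient t = h * s, and recombination fraction to the
    focal neutral site. Effects of separate elements multiply.
    """
    exponent = 0.0
    for u, t, r in elements:
        if t <= 0:
            raise ValueError("selection coefficient must be positive")
        exponent += u * t / (t + r * (1.0 - t)) ** 2
    return math.exp(-exponent)


if __name__ == "__main__":
    # Hypothetical: 20 linked elements, each with u = 1e-5 and t = 0.01,
    # placed at increasing recombination distances from the focal site.
    for spacing in (0.0, 1e-4, 1e-3, 1e-2):
        elements = [(1e-5, 0.01, k * spacing) for k in range(1, 21)]
        print(f"spacing={spacing:g}  B={background_selection_b(elements):.4f}")
```

With no recombination each element contributes u/t to the exponent; as recombination increases, B moves toward 1, which is why BGS predictions depend so strongly on the local recombination landscape.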
23
Pleiotropy can be effectively estimated without counting phenotypes through the rank of a genotype-phenotype map. Genetics 2014; 197:1357-63. [PMID: 24899162 DOI: 10.1534/genetics.114.164673] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.4] [Indexed: 11/18/2022]
Abstract
Although pleiotropy, the capacity of a gene to affect multiple phenotypes, is well known as a common gene property, estimating it quantitatively remains a great challenge, simply because of phenotypic complexity. Not surprisingly, it is hard for general readers to understand how gene pleiotropy can be effectively estimated from genetic data without counting phenotypes. In this article we discuss in detail the Gu-2007 method, which estimates pleiotropy from protein sequence analysis. We show that this method actually estimates the rank (K) of the genotype-phenotype map, which can be written concisely as K = min(r, Pmin), where Pmin is the minimum pleiotropy among all legitimate measures, including the fitness components, and r is the rank of the mutational effects at an amino acid site. Taken together, the effective gene pleiotropy (Ke) estimated by the Gu-2007 method has the following meanings: (i) Ke is an estimate of K = min(r, Pmin), the rank of a genotype-phenotype map; (ii) Ke is an estimate of the minimum pleiotropy Pmin only if Pmin < r; (iii) the Gu-2007 method attempts to estimate the pleiotropy of amino acid sites, a conservative proxy for the true gene pleiotropy; (iv) with a sufficiently large phylogeny, such that the rank of mutational effects at an amino acid site approaches r = 19, one can estimate Pmin between 1 and 19; and (v) Ke is a conservative estimate of K because fitness components that are only slightly affected are effectively removed by the estimation procedure. In addition, we conclude that mutational pleiotropy (the number of traits affected by a single mutation) cannot be estimated without knowing the phenotypes.
24
Chan CHS, Hamblin S, Tanaka MM. The effects of linkage on comparative estimators of selection. BMC Evol Biol 2013; 13:244. [PMID: 24199711 PMCID: PMC3828407 DOI: 10.1186/1471-2148-13-244] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.2] [Received: 04/16/2013] [Accepted: 10/29/2013] [Indexed: 11/10/2022]
Abstract
Background: A major goal of molecular evolution is to determine how natural selection has shaped the evolution of a gene. One approach, taken by methods such as KA/KS and the McDonald-Kreitman (MK) test, is to compare the frequency of non-synonymous and synonymous changes. These methods, however, rely on the assumption that a change in frequency of one mutation will not affect changes in frequency of other mutations.
Results: We demonstrate that linkage between sites can bias measures of selection based on synonymous and non-synonymous changes. Using forward simulation of a Wright-Fisher process, we show that hitch-hiking of deleterious mutations with advantageous mutations can lead to overestimation of the number of adaptive substitutions, while background selection and clonal interference can distort the site frequency spectrum to obscure the signal for positive selection. We present three diagnostics for detecting these effects of linked selection and apply them to the human influenza (H3N2) hemagglutinin gene.
Conclusion: Various forms of linked selection have characteristic effects on MK-type statistics. The extent of background selection, hitch-hiking and clonal interference can be evaluated using the diagnostic statistics presented here. The diagnostics can also be used to determine how well we expect the MK statistics to perform and whether one form of the statistic may be preferable to another.
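For readers unfamiliar with MK-type statistics, a minimal sketch of the usual point estimate of the proportion of adaptive nonsynonymous substitutions, α = 1 - (Ds·Pn)/(Dn·Ps), computed from the four counts of a McDonald-Kreitman table. The class name and the counts are hypothetical; the abstract's point is that linked selection can bias exactly this kind of estimate, so it is shown here as the quantity being biased, not as a corrected method.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MKTable:
    """Counts for a McDonald-Kreitman test (D = fixed differences, P = polymorphisms)."""
    dn: int  # nonsynonymous fixed differences
    ds: int  # synonymous fixed differences
    pn: int  # nonsynonymous polymorphisms
    ps: int  # synonymous polymorphisms

    def alpha(self) -> float:
        """Estimated fraction of nonsynonymous substitutions driven by positive selection."""
        if self.dn == 0 or self.ps == 0:
            raise ValueError("alpha is undefined when Dn or Ps is zero")
        return 1.0 - (self.ds * self.pn) / (self.dn * self.ps)


if __name__ == "__main__":
    table = MKTable(dn=40, ds=20, pn=10, ps=15)  # hypothetical counts
    print(f"alpha = {table.alpha():.3f}")
```

Slightly deleterious nonsynonymous polymorphisms inflate Pn and so pull α downward, whereas, per the abstract, deleterious mutations hitch-hiking to fixation with advantageous ones inflate Dn and push α upward.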
Affiliation(s)
- Carmen H S Chan
- School of Biotechnology and Biomolecular Sciences, University of New South Wales, Sydney, NSW, Australia.
25
Abstract
Knowing the distribution of fitness effects (DFE) of new mutations is important for several topics in evolutionary genetics. Existing computational methods with which to infer the DFE based on DNA polymorphism data have frequently assumed that the DFE can be approximated by a unimodal distribution, such as a lognormal or a gamma distribution. However, if the true DFE departs substantially from the assumed distribution (e.g., if the DFE is multimodal), this could lead to misleading inferences about its properties. We conducted simulations to test the performance of parametric and nonparametric discretized distribution models to infer the properties of the DFE for cases in which the true DFE is unimodal, bimodal, or multimodal. We found that lognormal and gamma distribution models can perform poorly in recovering the properties of the distribution if the true DFE is bimodal or multimodal, whereas discretized distribution models perform better. If there is a sufficient amount of data, the discretized models can detect a multimodal DFE and can accurately infer the mean effect and the average fixation probability of a new deleterious mutation. We fitted several models for the DFE of amino acid-changing mutations using whole-genome polymorphism data from Drosophila melanogaster and the house mouse subspecies Mus musculus castaneus. A lognormal DFE best explains the data for D. melanogaster, whereas we find evidence for a bimodal DFE in M. m. castaneus.
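To illustrate why the shape of the DFE matters, a sketch assuming a discretized DFE over scaled selection coefficients S = 4Ne·s and the standard diffusion result that a new mutation's fixation probability relative to a neutral one is S/(1 - e^{-S}). Two hypothetical DFEs with the same mean S = -10, one unimodal and one bimodal, give very different average fixation probabilities, which is the kind of property a misspecified unimodal model can get wrong. Function names and numbers are illustrative, not from the study.

```python
import math
from typing import Sequence, Tuple


def relative_fixation(S: float) -> float:
    """Fixation probability relative to a neutral mutation, S / (1 - exp(-S)),
    with S = 4 * Ne * s (diffusion approximation, large population)."""
    if S == 0.0:
        return 1.0
    if S > 0:
        return S / -math.expm1(-S)
    # Equivalent form for S < 0 that cannot overflow
    return S * math.exp(S) / math.expm1(S)


def summarize_dfe(values: Sequence[float], probs: Sequence[float]) -> Tuple[float, float]:
    """Mean S and mean relative fixation probability of a discretized DFE."""
    if len(values) != len(probs) or not math.isclose(sum(probs), 1.0):
        raise ValueError("need one probability per class, summing to 1")
    mean_S = sum(v * p for v, p in zip(values, probs))
    mean_fix = sum(relative_fixation(v) * p for v, p in zip(values, probs))
    return mean_S, mean_fix


if __name__ == "__main__":
    unimodal = summarize_dfe([-10.0], [1.0])
    bimodal = summarize_dfe([0.0, -20.0], [0.5, 0.5])
    print("unimodal: mean S = %.1f, mean relative fixation = %.3g" % unimodal)
    print("bimodal:  mean S = %.1f, mean relative fixation = %.3g" % bimodal)
```

Both DFEs have the same mean effect, but the bimodal one fixes new mutations roughly a thousand times more often, because its nearly neutral class dominates the average.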
26
Gingold H, Dahan O, Pilpel Y. Dynamic changes in translational efficiency are deduced from codon usage of the transcriptome. Nucleic Acids Res 2012; 40:10053-63. [PMID: 22941644 PMCID: PMC3488229 DOI: 10.1093/nar/gks772] [Citation(s) in RCA: 46] [Impact Index Per Article: 3.5] [Indexed: 01/07/2023]
Abstract
Translation of a gene is assumed to be efficient if the supply of the tRNAs that translate it is high. Yet high-abundance tRNAs are often also at high demand since they correspond to preferred codons in genomes. Thus to fully model translational efficiency one must gauge the supply-to-demand ratio of the tRNAs that are required by the transcriptome at a given time. The tRNAs’ supply is often approximated by their gene copy number in the genome. Yet neither the demand for each tRNA nor the extent to which its concentration changes across environmental conditions has been extensively examined. Here we compute changes in the codon usage of the transcriptome across different conditions in several organisms by inspecting conventional mRNA expression data. We find recurring dynamics of codon usage in the transcriptome in multiple stressful conditions. In particular, codons that are translated by rare tRNAs become over-represented in the transcriptome in response to stresses. These results raise the possibility that the tRNA pool might dynamically change upon stress to support efficient translation of stress-transcribed genes. Alternatively, stress genes may be typically translated with low efficiency, presumably due to lack of sufficient evolutionary optimization pressure on their codon usage.
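A minimal sketch of the supply-to-demand idea, under simplifying assumptions that are ours rather than the paper's: each codon is read by a single tRNA species (wobble pairing is ignored), supply is approximated by tRNA gene copy number as the abstract describes, and demand is the expression-weighted codon usage of the transcriptome. All function names, gene names, sequences, and numbers are hypothetical.

```python
from collections import Counter
from typing import Dict, Mapping


def codon_demand(cds: Mapping[str, str], expression: Mapping[str, float]) -> Dict[str, float]:
    """Fraction of transcriptome codons that are each codon, weighting every
    gene's codon counts by its mRNA level."""
    demand: Counter = Counter()
    for gene, seq in cds.items():
        weight = expression.get(gene, 0.0)
        usable = len(seq) - len(seq) % 3  # ignore a trailing partial codon
        for i in range(0, usable, 3):
            demand[seq[i:i + 3].upper()] += weight
    total = sum(demand.values())
    return {codon: count / total for codon, count in demand.items()} if total else {}


def supply_demand_ratio(trna_copies: Mapping[str, int], demand: Mapping[str, float]) -> Dict[str, float]:
    """Normalized tRNA supply divided by normalized demand, per codon."""
    total_supply = sum(trna_copies.values())
    if total_supply == 0:
        raise ValueError("no tRNA genes given")
    return {codon: (trna_copies.get(codon, 0) / total_supply) / share
            for codon, share in demand.items() if share > 0}


if __name__ == "__main__":
    cds = {"housekeeping": "AAAAAAAAG", "stress_gene": "AAGAAGAAA"}
    trna_copies = {"AAA": 9, "AAG": 1}  # AAG is read by a rare tRNA
    for label, expr in (("baseline", {"housekeeping": 10.0, "stress_gene": 1.0}),
                        ("stress", {"housekeeping": 1.0, "stress_gene": 10.0})):
        ratios = supply_demand_ratio(trna_copies, codon_demand(cds, expr))
        print(label, {codon: round(r, 2) for codon, r in sorted(ratios.items())})
```

In this toy run the rare-tRNA codon AAG makes up a larger share of the transcriptome under "stress", so its supply-to-demand ratio drops, mirroring the pattern the abstract reports for codons read by rare tRNAs.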
Affiliation(s)
- Hila Gingold
- Department of Molecular Genetics, Weizmann Institute of Science, Rehovot 76100, Israel
27
Betancourt AJ, Blanco-Martin B, Charlesworth B. The relation between the neutrality index for mitochondrial genes and the distribution of mutational effects on fitness. Evolution 2012; 66:2427-38. [PMID: 22834742 DOI: 10.1111/j.1558-5646.2012.01628.x] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.7] [Indexed: 01/24/2023]
Abstract
We explore factors affecting patterns of polymorphism and divergence (as captured by the neutrality index) at mammalian mitochondrial loci. To do this, we develop a population genetic model that incorporates a fraction of neutral amino acid sites, mutational bias, and a probability distribution of selection coefficients against new nonsynonymous mutations. We confirm, by reanalyzing publicly available datasets, that the mitochondrial cyt-b gene shows a broad range of neutrality indices across mammalian taxa, and explore the biological factors that can explain this observation. We find that observed patterns of differences in the neutrality index, polymorphism, and divergence are not caused by differences in mutational bias. They can, however, be explained by a combination of a small fraction of neutral amino acid sites, weak selection acting on most amino acid mutations, and differences in effective population size among taxa.
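For reference, the neutrality index is conventionally computed from the four counts of a McDonald-Kreitman table, with P and D denoting polymorphisms and fixed differences at nonsynonymous (n) and synonymous (s) sites. Values above 1 indicate an excess of nonsynonymous polymorphism relative to divergence, as expected when many segregating amino acid mutations are weakly deleterious, which is the situation the abstract invokes.

```latex
\mathrm{NI} = \frac{P_n / P_s}{D_n / D_s},
\qquad
\alpha = 1 - \mathrm{NI} = 1 - \frac{D_s\,P_n}{D_n\,P_s}
```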
Affiliation(s)
- Andrea J Betancourt
- Institut für Populationsgenetik, Vetmeduni Vienna, Veterinärplatz 1, 1210 Wien, Austria.
28
Estimating the distribution of selection coefficients from phylogenetic data using sitewise mutation-selection models. Genetics 2011; 190:1101-15. [PMID: 22209901 PMCID: PMC3296245 DOI: 10.1534/genetics.111.136432] [Citation(s) in RCA: 97] [Impact Index Per Article: 6.9] [Indexed: 12/12/2022]
Abstract
Estimation of the distribution of selection coefficients of mutations is a long-standing issue in molecular evolution. In addition to population-based methods, the distribution can be estimated from DNA sequence data by phylogenetic-based models. Previous models have generally found unimodal distributions where the probability mass is concentrated between mildly deleterious and nearly neutral mutations. Here we use a sitewise mutation–selection phylogenetic model to estimate the distribution of selection coefficients among novel and fixed mutations (substitutions) in a data set of 244 mammalian mitochondrial genomes and a set of 401 PB2 proteins from influenza. We find a bimodal distribution of selection coefficients for novel mutations in both the mitochondrial data set and for the influenza protein evolving in its natural reservoir, birds. Most of the mutations are strongly deleterious with the rest of the probability mass concentrated around mildly deleterious to neutral mutations. The distribution of the coefficients among substitutions is unimodal and symmetrical around nearly neutral substitutions for both data sets at adaptive equilibrium. About 0.5% of the nonsynonymous mutations and 14% of the nonsynonymous substitutions in the mitochondrial proteins are advantageous, with 0.5% and 24% observed for the influenza protein. Following a host shift of influenza from birds to humans, however, we find among novel mutations in PB2 a trimodal distribution with a small mode of advantageous mutations.
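The distinction between selection coefficients among new mutations and among substitutions can be sketched with a toy single-site mutation-selection model. The assumptions are ours, not details of the study: equal mutation rates between all states, a scaled fitness F for each state, the commonly used fixation factor S/(1 - e^{-S}) with S = F_b - F_a, and an equilibrium in which state frequencies are proportional to e^F. Under these assumptions the substitution-weighted distribution is symmetric around S = 0, echoing the abstract's finding at adaptive equilibrium, while new mutations are mostly deleterious. The function names, states, and fitness values are hypothetical.

```python
import math
from itertools import permutations
from typing import Dict, List, Tuple

Pairs = List[Tuple[float, float]]  # (selection coefficient S, weight)


def fixation_factor(S: float) -> float:
    """Relative fixation probability S / (1 - exp(-S)) for scaled selection S."""
    if S == 0.0:
        return 1.0
    if S > 0:
        return S / -math.expm1(-S)
    return S * math.exp(S) / math.expm1(S)  # overflow-free form for S < 0


def selection_spectra(fitness: Dict[str, float]) -> Tuple[Pairs, Pairs]:
    """Weighted selection coefficients among new mutations and among
    substitutions at one site at equilibrium, with equal mutation rates."""
    top = max(fitness.values())
    z = sum(math.exp(f - top) for f in fitness.values())
    freq = {a: math.exp(f - top) / z for a, f in fitness.items()}  # equilibrium ~ e^F
    mutations: Pairs = []
    substitutions: Pairs = []
    for a, b in permutations(fitness, 2):
        S = fitness[b] - fitness[a]
        mutations.append((S, freq[a]))                           # mutation arises from state a
        substitutions.append((S, freq[a] * fixation_factor(S)))  # ...and then goes to fixation
    return mutations, substitutions


def weighted_mean(pairs: Pairs) -> float:
    total = sum(w for _, w in pairs)
    return sum(S * w for S, w in pairs) / total


if __name__ == "__main__":
    fitness = {"A": 5.0, "B": 0.0, "C": 0.0}  # hypothetical: A is the preferred state
    muts, subs = selection_spectra(fitness)
    print(f"mean S among new mutations: {weighted_mean(muts):+.3f}")
    print(f"mean S among substitutions: {weighted_mean(subs):+.3f}")
```

The symmetry follows from detailed balance: each substitution a to b is exactly as frequent as its reverse, so positive and negative S cancel among substitutions even though most new mutations arising from the preferred state are deleterious.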
29
Wilson DJ, Hernandez RD, Andolfatto P, Przeworski M. A population genetics-phylogenetics approach to inferring natural selection in coding sequences. PLoS Genet 2011; 7:e1002395. [PMID: 22144911 PMCID: PMC3228810 DOI: 10.1371/journal.pgen.1002395] [Citation(s) in RCA: 73] [Impact Index Per Article: 5.2] [Received: 08/20/2010] [Accepted: 10/08/2011] [Indexed: 01/23/2023]
Abstract
Through an analysis of polymorphism within and divergence between species, we can hope to learn about the distribution of selective effects of mutations in the genome, changes in the fitness landscape that occur over time, and the location of sites involved in key adaptations that distinguish modern-day species. We introduce a novel method for the analysis of variation in selection pressures within and between species, spatially along the genome and temporally between lineages. We model codon evolution explicitly using a joint population genetics-phylogenetics approach that we developed for the construction of multiallelic models with mutation, selection, and drift. Our approach has the advantage of performing direct inference on coding sequences, inferring ancestral states probabilistically, utilizing allele frequency information, and generalizing to multiple species. We use a Bayesian sliding window model for intragenic variation in selection coefficients that efficiently combines information across sites and captures spatial clustering within the genome. To demonstrate the utility of the method, we infer selective pressures acting in Drosophila melanogaster and D. simulans from polymorphism and divergence data for 100 X-linked coding regions.
Affiliation(s)
- Daniel J Wilson
- Department of Human Genetics and Department of Ecology and Evolution, University of Chicago, Chicago, Illinois, USA.
30
Schneider A, Charlesworth B, Eyre-Walker A, Keightley PD. A method for inferring the rate of occurrence and fitness effects of advantageous mutations. Genetics 2011; 189:1427-37. [PMID: 21954160 PMCID: PMC3241409 DOI: 10.1534/genetics.111.131730] [Citation(s) in RCA: 84] [Impact Index Per Article: 6.0] [Received: 06/16/2011] [Accepted: 09/24/2011] [Indexed: 11/18/2022]
Abstract
The distribution of fitness effects (DFE) of new mutations is of fundamental importance in evolutionary genetics. Recently, methods have been developed for inferring the DFE that use information from the allele frequency distributions of putatively neutral and selected nucleotide polymorphic variants in a population sample. Here, we extend an existing maximum-likelihood method that estimates the DFE under the assumption that mutational effects are unconditionally deleterious, by including a fraction of positively selected mutations. We allow one or more classes of positive selection coefficients in the model and estimate both the fraction of mutations that are advantageous and the strength of selection acting on them. We show by simulations that the method is capable of recovering the parameters of the DFE under a range of conditions. We apply the method to two data sets on multiple protein-coding genes from African populations of Drosophila melanogaster. We use a probabilistic reconstruction of the ancestral states of the polymorphic sites to distinguish derived from ancestral alleles at polymorphic nucleotide sites. In both data sets, we see a significant improvement in the fit when a category of positively selected amino acid mutations is included, but no further improvement if additional categories are added. We estimate that between 1% and 2% of new nonsynonymous mutations in D. melanogaster are positively selected, with a scaled selection coefficient of ∼2.5, where the scaled coefficient is the product of the effective population size, N(e), and the strength of selection on heterozygous carriers.
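To give the scaled coefficient a concrete meaning, here is an illustrative calculation that is not reported in the abstract. Assume a diploid Wright–Fisher population with census size equal to N_e, and additive effects, so heterozygotes have fitness 1 + s_h and homozygotes 1 + 2s_h. Under Kimura's diffusion approximation, a new mutation then fixes at the following rate relative to a neutral one:

```latex
\frac{u(s_h)}{u(0)} \approx \frac{4N_e s_h}{1 - e^{-4N_e s_h}}, \qquad u(0) = \frac{1}{2N_e}

N_e s_h = 2.5 \;\Rightarrow\; \frac{u(s_h)}{u(0)} \approx \frac{10}{1 - e^{-10}} \approx 10.0
```

Under these assumptions, an advantageous mutation of the estimated strength would fix roughly ten times as often as a neutral one.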
Affiliation(s)
- Adrian Schneider
- Institute of Evolutionary Biology, School of Biological Sciences, University of Edinburgh, Edinburgh EH9 3JT, United Kingdom
- Brian Charlesworth
- Institute of Evolutionary Biology, School of Biological Sciences, University of Edinburgh, Edinburgh EH9 3JT, United Kingdom
- Adam Eyre-Walker
- School of Life Sciences, University of Sussex, Brighton BN1 9QG, United Kingdom
- Peter D. Keightley
- Institute of Evolutionary Biology, School of Biological Sciences, University of Edinburgh, Edinburgh EH9 3JT, United Kingdom
31
Abstract
Population genomics is the study of the amount and causes of genome-wide variability in natural populations, a topic that has been under discussion since Darwin. This paper first briefly reviews the early development of molecular approaches to the subject: the pioneering unbiased surveys of genetic variability at multiple loci by means of gel electrophoresis and restriction enzyme mapping. The results of surveys of genome-wide variability based on DNA resequencing are then discussed. Studies of the extent to which variability for different classes of variants (non-synonymous, synonymous and non-coding) is affected by natural selection, or by other directional forces such as biased gene conversion, are also described. Finally, the effects of deleterious mutations on population fitness and the possible role of Hill–Robertson interference in shaping patterns of sequence variability are discussed.
32
Lourenço J, Galtier N, Glémin S. Complexity, pleiotropy, and the fitness effect of mutations. Evolution 2011; 65:1559-71. [DOI: 10.1111/j.1558-5646.2011.01237.x] [Citation(s) in RCA: 51] [Impact Index Per Article: 3.6] [Indexed: 11/30/2022]
33
Abstract
The distribution of fitness effects (DFE) of mutations is of fundamental importance for understanding evolutionary dynamics and complex diseases and for conserving threatened species. DFEs estimated from DNA sequences have rarely been subject to direct experimental tests. We used a bacterial system in which the fitness effects of a large number of defined single mutations in two ribosomal proteins were measured with high sensitivity. The obtained DFE appears to be unimodal, where most mutations (120 out of 126) are weakly deleterious and the remaining ones are potentially neutral. The DFEs for synonymous and nonsynonymous substitutions are similar, suggesting that in some genes, strong fitness constraints are present at the level of the messenger RNA.
Affiliation(s)
- Peter A Lind
- Department of Medical Biochemistry and Microbiology, Uppsala University, SE-75123 Uppsala, Sweden
34
Ratnakumar A, Mousset S, Glémin S, Berglund J, Galtier N, Duret L, Webster MT. Detecting positive selection within genomes: the problem of biased gene conversion. Philos Trans R Soc Lond B Biol Sci 2010; 365:2571-80. [PMID: 20643747 DOI: 10.1098/rstb.2010.0007] [Citation(s) in RCA: 102] [Impact Index Per Article: 6.8] [Indexed: 01/21/2023]
Abstract
The identification of loci influenced by positive selection is a major goal of evolutionary genetics. A popular approach is to perform genome-wide scans of alignments to find regions evolving at accelerated rates on a particular branch of a phylogenetic tree. However, positive selection is not the only process that can lead to accelerated evolution. Notably, GC-biased gene conversion (gBGC) is a recombination-associated process that results in the biased fixation of G and C nucleotides. This process can potentially generate bursts of nucleotide substitutions within hotspots of meiotic recombination. Here, we analyse the results of a scan for positive selection on genes on branches across the primate phylogeny. We show that genes identified as targets of positive selection have a significant tendency to exhibit the genomic signature of gBGC. Using a maximum-likelihood framework, we estimate that more than 20 per cent of cases of a significantly elevated ratio of non-synonymous to synonymous substitution rates (dN/dS), particularly on shorter branches, could be due to gBGC. We demonstrate that in some cases, gBGC can lead to very high dN/dS (more than 2). Our results indicate that gBGC significantly affects the evolution of coding sequences in primates, often leading to patterns of evolution that can be mistaken for positive selection.
Affiliation(s)
- Abhirami Ratnakumar
- Department of Medical Biochemistry and Microbiology, Uppsala University, Box 582, 751 23 Uppsala, Sweden
35
Elyashiv E, Bullaughey K, Sattath S, Rinott Y, Przeworski M, Sella G. Shifts in the intensity of purifying selection: an analysis of genome-wide polymorphism data from two closely related yeast species. Genome Res 2010; 20:1558-73. [PMID: 20817943 DOI: 10.1101/gr.108993.110] [Citation(s) in RCA: 67] [Impact Index Per Article: 4.5] [Indexed: 11/25/2022]
Abstract
How much does the intensity of purifying selection vary among populations and species? How uniform are the shifts in selective pressures across the genome? To address these questions, we took advantage of a recent, whole-genome polymorphism data set from two closely related species of yeast, Saccharomyces cerevisiae and S. paradoxus, paying close attention to the population structure within these species. We found that the average intensity of purifying selection on amino acid sites varies markedly among populations and between species. As expected in the presence of extensive weakly deleterious mutations, the effect of purifying selection is substantially weaker on single nucleotide polymorphisms (SNPs) segregating within populations than on SNPs fixed between population samples. Also in accordance with a Nearly Neutral model, the variation in the intensity of purifying selection across populations corresponds almost perfectly to simple measures of their effective size. As a first step toward understanding the processes generating these patterns, we sought to tease apart the relative importance of systematic, genome-wide changes in the efficacy of selection, such as those expected from demographic processes, and of gene-specific changes, which may be expected after a shift in selective pressures. For that purpose, we developed a new model for the evolution of purifying selection between populations and inferred its parameters from the genome-wide data using a likelihood approach. We found that most, but not all, changes seem to be explained by systematic shifts in the efficacy of selection. One population, the sake-derived strains of S. cerevisiae, however, also shows extensive gene-specific changes, plausibly associated with domestication. These findings have important implications for our understanding of purifying selection as well as for estimates of the rate of molecular adaptation in yeast and in other species.
Affiliation(s)
- Eyal Elyashiv
- Department of Evolution, Systematics, and Ecology, Hebrew University of Jerusalem, Jerusalem 91905, Israel
36
Keightley PD, Eyre-Walker A. What can we learn about the distribution of fitness effects of new mutations from DNA sequence data? Philos Trans R Soc Lond B Biol Sci 2010; 365:1187-93. [PMID: 20308093 DOI: 10.1098/rstb.2009.0266] [Citation(s) in RCA: 64] [Impact Index Per Article: 4.3] [Indexed: 11/12/2022]
Abstract
We investigate several questions concerning the inference of the distribution of fitness effects (DFE) of new mutations from the distribution of nucleotide frequencies in a population sample. If a fixed sequencing effort is available, we find that the optimum strategy is to sequence a modest number of alleles (approx. 10). If full genome information is available, the accuracy of parameter estimates increases as the number of alleles sequenced increases, but with diminishing returns. It is unlikely that the DFE for single genes can be reliably estimated in organisms such as humans and Drosophila, unless genes are very large and we sequence hundreds or perhaps thousands of alleles. We consider models involving several discrete classes of mutations in which the selection strength and density apportioned to each class can vary. Models with three classes fit almost as well as four class models unless many hundreds of alleles are sequenced. Large numbers of alleles need to be sequenced to accurately estimate the distribution's mean and variance. Estimating complex DFEs may therefore be difficult. Finally, we examine models involving slightly advantageous mutations. We show that the distribution of the absolute strength of selection is well estimated if mutations are assumed to be unconditionally deleterious.
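The discrete-class models mentioned above describe the DFE as a few point masses, each with a selection strength and a density (the fraction of new mutations in that class). Below is a minimal sketch of how the DFE's mean and variance follow from such a model. The three classes and their values are hypothetical. Inferring them from frequency data, the hard part the abstract discusses, is not shown.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class MutationClass:
    scaled_s: float  # scaled selection strength of the class (e.g. N_e * s), hypothetical
    density: float   # fraction of new mutations falling in this class

def dfe_mean_variance(classes: list[MutationClass]) -> tuple[float, float]:
    """Mean and variance of a discrete DFE given as point masses."""
    total = sum(c.density for c in classes)
    if abs(total - 1.0) > 1e-9:
        raise ValueError(f"densities must sum to 1, got {total}")
    mean = sum(c.density * c.scaled_s for c in classes)
    second_moment = sum(c.density * c.scaled_s ** 2 for c in classes)
    return mean, second_moment - mean ** 2

# Hypothetical three-class model: nearly neutral, moderate, strongly deleterious.
three_class = [
    MutationClass(scaled_s=-0.1, density=0.3),
    MutationClass(scaled_s=-10.0, density=0.3),
    MutationClass(scaled_s=-1000.0, density=0.4),
]
mean, var = dfe_mean_variance(three_class)
print(f"mean = {mean:.2f}, variance = {var:.1f}")
```

In this toy model, both moments are dominated by the strongly selected class, and such mutations are rarely seen segregating in a sample. That offers one plausible intuition for why many alleles are needed to estimate the mean and variance well.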
Affiliation(s)
- Peter D Keightley
- Institute of Evolutionary Biology, School of Biological Sciences, University of Edinburgh, West Mains Road, Edinburgh EH9 3JT, UK.
37
Estimating the parameters of selection on nonsynonymous mutations in Drosophila pseudoobscura and D. miranda. Genetics 2010; 185:1381-96. [PMID: 20516497 DOI: 10.1534/genetics.110.117614] [Citation(s) in RCA: 55] [Impact Index Per Article: 3.7] [Indexed: 11/18/2022]
Abstract
We present the results of surveys of diversity in sets of >40 X-linked and autosomal loci in samples from natural populations of Drosophila miranda and D. pseudoobscura, together with their sequence divergence from D. affinis. Mean silent site diversity in D. miranda is approximately one-quarter of that in D. pseudoobscura; mean X-linked silent diversity is about three-quarters of that for the autosomes in both species. Estimates of the distribution of selection coefficients against heterozygous, deleterious nonsynonymous mutations from two different methods suggest a wide distribution, with coefficients of variation greater than one, and with the average segregating amino acid mutation being subject to only very weak selection. Only a small fraction of new amino acid mutations behave as effectively neutral, however. A large fraction of amino acid differences between D. pseudoobscura and D. affinis appear to have been fixed by positive natural selection, using three different methods of estimation; estimates between D. miranda and D. affinis are more equivocal. Sources of bias in the estimates, especially those arising from selection on synonymous mutations and from the choice of genes, are discussed and corrections for these applied. Overall, the results show that both purifying selection and positive selection on nonsynonymous mutations are pervasive.
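A coefficient of variation greater than one is what a gamma-distributed DFE with shape parameter below one produces, because the CV of a gamma distribution is 1/√shape. The sketch below assumes a gamma DFE with made-up parameters, not the estimates from this study. It checks that relation by simulation with Python's `random.gammavariate`. It also reports the fraction of draws with N_e·s below 1, one common rough threshold for "effectively neutral".

```python
import math
import random

def gamma_cv(shape: float) -> float:
    """Coefficient of variation of a gamma distribution: sd / mean = 1 / sqrt(shape)."""
    return 1.0 / math.sqrt(shape)

def simulate_dfe(shape: float, mean_nes: float, n: int, seed: int = 1) -> list[float]:
    """Draw n scaled selection strengths (magnitudes of N_e * s) from a gamma
    distribution with the given shape and mean."""
    rng = random.Random(seed)
    scale = mean_nes / shape  # gammavariate(alpha, beta) has mean alpha * beta
    return [rng.gammavariate(shape, scale) for _ in range(n)]

shape, mean_nes = 0.3, 500.0  # hypothetical parameters
draws = simulate_dfe(shape, mean_nes, n=100_000)
sample_mean = sum(draws) / len(draws)
sample_sd = math.sqrt(sum((x - sample_mean) ** 2 for x in draws) / len(draws))
empirical_cv = sample_sd / sample_mean
frac_neutral = sum(x < 1.0 for x in draws) / len(draws)
print(f"theoretical CV = {gamma_cv(shape):.2f}, simulated CV = {empirical_cv:.2f}")
print(f"fraction with N_e*s < 1: {frac_neutral:.3f}")
```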
38
Durand E, Tenaillon MI, Ridel C, Coubriche D, Jamin P, Jouanne S, Ressayre A, Charcosset A, Dillmann C. Standing variation and new mutations both contribute to a fast response to selection for flowering time in maize inbreds. BMC Evol Biol 2010; 10:2. [PMID: 20047647 PMCID: PMC2837650 DOI: 10.1186/1471-2148-10-2] [Citation(s) in RCA: 18] [Impact Index Per Article: 1.2] [Received: 08/03/2009] [Accepted: 01/04/2010] [Indexed: 11/10/2022]
Abstract
BACKGROUND In order to investigate the rate and limits of the response to selection from highly inbred genetic material and evaluate the respective contributions of standing variation and new mutations, we conducted a divergent selection experiment from maize inbred lines in open-field conditions over 7 years. Two maize commercial seed lots considered as inbred lines, F252 and MBS847, constituted two biological replicates of the experiment. In each replicate, we derived an Early and a Late population by selecting and selfing the earliest and the latest individuals, respectively, to produce the next generation. RESULTS All populations, except the Early MBS847, responded to selection despite a small number of generations and a small effective population size. Part of the response can be attributed to standing genetic variation in the initial seed lot. Indeed, we identified one polymorphism initially segregating in the F252 seed lot at a candidate locus for flowering time, which explained 35% of the trait variation within the Late F252 population. However, the model that best explained our data takes into account both residual polymorphism in the initial seed lots and a constant input of heritable genetic variation by new (epi)mutations. Under this model, values of mutational heritability range from 0.013 to 0.025, at the upper end of the range reported in other species. CONCLUSIONS Our study reports a long-term divergent selection experiment for a complex trait, flowering time, conducted on maize in open-field conditions. Starting from highly inbred material, we created within a few generations populations that differ strikingly from the initial seed lot in flowering time while preserving most of the phenotypic characteristics of the initial inbred. Such material is unique for studying the dynamics of the response to selection and its determinants.
In addition to the fixation of a standing beneficial mutation associated with a large phenotypic effect, a constant input of genetic variance by new mutations has likely contributed to the response. We discuss our results in the context of the evolution and mutational dynamics of populations characterized by a small effective population size.
Affiliation(s)
- Eléonore Durand
- INRA, UMR de Génétique Végétale, INRA/CNRS/Univ Paris-Sud/ AgroParistech, Ferme du Moulon, F-91190 Gif sur Yvette, France
39
Measuring the rates of spontaneous mutation from deep and large-scale polymorphism data. Genetics 2009; 182:1219-32. [PMID: 19528323 DOI: 10.1534/genetics.109.105692] [Citation(s) in RCA: 44] [Impact Index Per Article: 2.8] [Indexed: 11/18/2022]
Abstract
The rates and patterns of spontaneous mutation are fundamental parameters of molecular evolution. Current methodology either tries to measure such rates and patterns directly in mutation-accumulation experiments or tries to infer them indirectly from levels of divergence or polymorphism. While experimental approaches are constrained by the low rate at which new mutations occur, indirect approaches suffer from their underlying assumption that mutations are effectively neutral. Here I present a maximum-likelihood approach to estimate mutation rates from large-scale polymorphism data. It is demonstrated that the method is not sensitive to demography and the distribution of selection coefficients among mutations when applied to mutations at sufficiently low population frequencies. With the many large-scale sequencing projects currently underway, for instance, the 1000 Genomes Project in humans, plenty of the required low-frequency polymorphism data will shortly become available. My method will allow for an accurate and unbiased inference of mutation rates and patterns from such data sets at high spatial resolution. I discuss how the assessment of several long-standing problems of evolutionary biology would benefit from the availability of accurate mutation rate estimates.
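The link between low-frequency variants and the mutation rate is already visible in the textbook standard neutral model. There, the expected number of sites with derived-allele count i in a sample of n sequences is θ/i, where θ = 4N_e·μ for the region in diploids. The sketch below compares Watterson's estimator, which uses all segregating sites, with an estimator that uses singletons only. It assumes that textbook model, and the helper names are hypothetical. It is not the maximum-likelihood method of the abstract, which is designed to be insensitive to demography and selection when restricted to low-frequency mutations.

```python
def watterson_constant(n: int) -> float:
    """a_n = sum_{i=1}^{n-1} 1/i for a sample of n sequences."""
    return sum(1.0 / i for i in range(1, n))

def theta_watterson(sfs: list[float]) -> float:
    """Watterson's estimator from an unfolded SFS, where sfs[i - 1] is the number
    of sites with derived-allele count i (i = 1..n-1)."""
    n = len(sfs) + 1
    return sum(sfs) / watterson_constant(n)

def theta_singletons(sfs: list[float]) -> float:
    """Singleton-only estimator: under the standard neutral model E[xi_1] = theta."""
    return sfs[0]

# Expected neutral SFS for theta = 10 and a sample of n = 5: xi_i = theta / i.
theta, n = 10.0, 5
expected_sfs = [theta / i for i in range(1, n)]
print(theta_watterson(expected_sfs), theta_singletons(expected_sfs))
```

On real data the spectrum departs from θ/i under demographic change and selection, so the two estimators can disagree. This is one reason a dedicated method like the one described above is needed.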
40
Abstract
Human genes responsible for inherited diseases are important for understanding human disease. We investigated the degree of polymorphism and divergence in human disease genes to elucidate the effect of natural selection on them. In particular, the effect of disease dominance was incorporated into the analysis. Both dominant disease genes (DDG) and recessive disease genes (RDG) had a higher mutation rate per site and encoded longer proteins than nondisease genes, which exposed the disease genes to a faster flux of new mutations. Using an unbiased polymorphism dataset, we found that, proportionally, RDG harbor more nonsynonymous polymorphisms than DDG. We estimated the selection intensity on the disease genes using polymorphism and divergence data and determined whether the different patterns of polymorphism and divergence between DDG and RDG could be explained by the difference in dominance alone. Even after the dominance effect was considered, the selection intensity on RDG was significantly different from that on DDG, suggesting that the deleterious effects of dominant and recessive disease mutations are fundamentally different.
41
Zhang L, Watson LT. Analysis of the fitness effect of compensatory mutations. HFSP J 2008; 3:47-54. [PMID: 19649156 PMCID: PMC2689613 DOI: 10.2976/1.2990075] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.5] [Received: 06/05/2008] [Indexed: 11/19/2022]
Abstract
This paper extends previous work on the Darwinian evolutionary fitness effect of the fixation of deleterious mutations by incorporating compensatory mutations, which are mutations (deleterious by themselves) that ameliorate other deleterious mutations, thus reducing the genetic load of populations. Since having compensatory mutations essentially changes the distributional shapes of deleterious mutations, the effect of compensatory mutations is studied by comparing distributions of deleterious mutations without compensatory mutations to those with compensatory mutations. The effect of effective population size (N(e)), fitness distributional shape, and mutation rate on population fitness reduction is studied. Results indicate that, first, the smaller a population's N(e), the larger the effect of compensatory mutations on fitness recovery, and the compensatory effect increases sharply with decreasing N(e). Second, the larger the squared coefficient of variation in the fitness effect of deleterious mutations, the larger the effect of compensatory mutations. Third, for fixed N(e), the higher the rate of deleterious mutations, the more effective compensatory mutation is in fitness recovery, and this effect is more pronounced for smaller N(e).
Affiliation(s)
- Liqing Zhang
- Department of Computer Science, Virginia Polytechnic Institute and State University, Blacksburg, Virginia 24061
- Layne T. Watson
- Departments of Computer Science and Mathematics, Virginia Polytechnic Institute and State University, Blacksburg, Virginia 24061
42
Abstract
The distribution of genetic polymorphisms in a population contains information about evolutionary processes. The Poisson random field (PRF) model uses the polymorphism frequency spectrum to infer the mutation rate and the strength of directional selection. The PRF model relies on an infinite-sites approximation that is reasonable for most eukaryotic populations but becomes problematic when the mutation rate is large (≳ 0.05). Here, we show that at the large mutation rates characteristic of microbes and viruses, the infinite-sites approximation of the PRF model induces systematic biases that lead it to underestimate negative selection pressures and mutation rates and to erroneously infer positive selection. We introduce two new methods that extend inference of selection pressures and mutation rates to this high-mutation-rate regime: a finite-site modification of the PRF model and a new technique based on diffusion theory. Our methods can be used to infer not only a "weighted average" of selection pressures acting on a gene sequence, but also the distribution of selection pressures across sites. We evaluate the accuracy of our methods, as well as that of the original PRF approach, by comparison with Wright-Fisher simulations.
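For reference, the infinite-sites PRF model that the abstract starts from gives an expected sample frequency spectrum. For a sample of n sequences, it integrates the Sawyer–Hartl frequency density θ(1 − e^(−2γ(1−x))) / ((1 − e^(−2γ)) x(1 − x)) against binomial sampling. The sketch below evaluates that integral numerically for a scaled selection coefficient γ; conventions for γ differ by factors of 2 between papers. It shows only the original infinite-sites model, not the finite-site or diffusion methods introduced above, and the parameter values are hypothetical.

```python
import math

def prf_density_factor(x: float, gamma: float) -> float:
    """Selection factor (1 - e^{-2*gamma*(1-x)}) / (1 - e^{-2*gamma}) of the PRF density.

    Tends to (1 - x) as gamma -> 0, recovering the neutral density theta / x.
    Uses expm1 for accuracy; very large |gamma| (hundreds) would overflow.
    """
    if gamma == 0.0:
        return 1.0 - x
    return math.expm1(-2.0 * gamma * (1.0 - x)) / math.expm1(-2.0 * gamma)

def prf_expected_sfs(n: int, theta: float, gamma: float, steps: int = 5000) -> list[float]:
    """Expected number of sites with derived-allele count i (i = 1..n-1) in a sample
    of n, integrating binomial sampling against the PRF density (midpoint rule)."""
    h = 1.0 / steps
    sfs = []
    for i in range(1, n):
        binom = math.comb(n, i)
        total = 0.0
        for k in range(steps):
            x = (k + 0.5) * h
            # x^i (1-x)^(n-i) / (x (1-x)) simplified, so the integrand stays finite at 0 and 1
            total += binom * x ** (i - 1) * (1.0 - x) ** (n - i - 1) * prf_density_factor(x, gamma)
        sfs.append(theta * total * h)
    return sfs

neutral = prf_expected_sfs(10, theta=10.0, gamma=0.0)
purifying = prf_expected_sfs(10, theta=10.0, gamma=-5.0)
for i, (a, b) in enumerate(zip(neutral, purifying), start=1):
    print(f"i = {i}: neutral {a:.3f}, gamma = -5: {b:.3f}")
```

The neutral column reproduces θ/i. With γ = −5, every class is depleted, and the high-frequency classes are depleted most strongly.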
43
Divergence and polymorphism under the nearly neutral theory of molecular evolution. J Mol Evol 2008; 67:418-26. [DOI: 10.1007/s00239-008-9146-9] [Citation(s) in RCA: 65] [Impact Index Per Article: 3.8] [Received: 01/11/2008] [Revised: 05/26/2008] [Accepted: 07/14/2008] [Indexed: 11/26/2022]
44
Abstract
Animal mitochondrial genomes have high rates of sequence evolution, and should decay from the accumulation of deleterious mutations. But the purging of mutant mtDNAs in a pedigree of "mutator mice" reveals the speed and power of purifying selection to maintain mitochondrial function.
45
Keightley PD, Halligan DL. Analysis and implications of mutational variation. Genetica 2008; 136:359-69. [PMID: 18663587 DOI: 10.1007/s10709-008-9304-4] [Citation(s) in RCA: 37] [Impact Index Per Article: 2.2] [Received: 07/16/2008] [Accepted: 07/16/2008] [Indexed: 11/25/2022]
Abstract
Variation from new mutations is important for several questions in quantitative genetics. Key parameters are the genomic mutation rate and the distribution of effects of mutations (DEM), which determine the amount of new quantitative variation that arises per generation from mutation (V(M)). Here, we review methods and empirical results concerning mutation accumulation (MA) experiments that have shed light on properties of mutations affecting quantitative traits. Surprisingly, most data on fitness traits from laboratory assays of MA lines indicate that the DEM is platykurtic in form (i.e., substantially less leptokurtic than an exponential distribution), and imply that most variation is produced by mutations of moderate to large effect. This finding contrasts with results from MA or mutagenesis experiments in which mutational changes to the DNA can be assayed directly, which imply that the vast majority of mutations have very small phenotypic effects, and that the distribution has a leptokurtic form. We compare these findings with recent approaches that attempt to infer the DEM for fitness based on comparing the frequency spectra of segregating nucleotide polymorphisms at putatively neutral and selected sites in population samples. When applied to data for humans and Drosophila, these analyses also indicate that the DEM is strongly leptokurtic. However, by combining the resultant estimates of parameters of the DEM with estimates of the mutation rate per nucleotide, the predicted V(M) for fitness is only a tiny fraction of V(M) observed in MA experiments. This discrepancy can be explained if we postulate that a few deleterious mutations of large effect contribute most of the mutational variation observed in MA experiments and that such mutations segregate at very low frequencies in natural populations, and effectively are never seen in population samples.
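"Platykurtic" and "leptokurtic" here refer to excess kurtosis, the standardised fourth central moment minus 3. An exponential distribution has excess kurtosis 6, and a gamma distribution with shape k has 6/k, so a DEM that is "substantially less leptokurtic than an exponential" has a value well below 6. The sketch below estimates excess kurtosis from a sample of mutational effects. It uses the plain moment estimator without small-sample correction for simplicity, and the simulated effects are hypothetical stand-ins, not data from MA experiments.

```python
import random

def excess_kurtosis(values: list[float]) -> float:
    """Moment estimator m4 / m2**2 - 3 (no small-sample bias correction)."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m4 = sum((v - mean) ** 4 for v in values) / n
    return m4 / m2 ** 2 - 3.0

rng = random.Random(42)
exponential_effects = [rng.expovariate(1.0) for _ in range(200_000)]  # reference, population value 6
uniform_effects = [rng.uniform(0.0, 1.0) for _ in range(200_000)]     # platykurtic, population value -1.2
print(f"exponential: {excess_kurtosis(exponential_effects):.2f}")
print(f"uniform:     {excess_kurtosis(uniform_effects):.2f}")
```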
Affiliation(s)
- Peter D Keightley
- Institute of Evolutionary Biology, School of Biological Sciences, University of Edinburgh, West Mains Road, Edinburgh, EH9 3JT, UK.
46
Patterns of molecular evolution in Caenorhabditis preclude ancient origins of selfing. Genetics 2008; 178:2093-104. [PMID: 18430935 DOI: 10.1534/genetics.107.085787] [Citation(s) in RCA: 77] [Impact Index Per Article: 4.5] [Indexed: 11/18/2022]
Abstract
The evolution of self-fertilization can mediate pronounced changes in genomes as a by-product of a drastic reduction in effective population size and the concomitant accumulation of slightly deleterious mutations by genetic drift. In the nematode genus Caenorhabditis, a highly selfing lifestyle has evolved twice independently, thus providing an opportunity to test for the effects of mode of reproduction on patterns of molecular evolution on a genomic scale. Here we contrast rates of nucleotide substitution and codon usage bias among thousands of orthologous groups of genes in six species of Caenorhabditis, including the classic model organism Caenorhabditis elegans. Despite evidence that weak selection on synonymous codon usage is pervasive in the history of all species in this genus, we find little difference among species in the patterns of codon usage bias and in replacement-site substitution. Applying a model of relaxed selection on codon usage to the C. elegans and C. briggsae lineages suggests that self-fertilization is unlikely to have evolved more than approximately 4 million years ago, which is less than a quarter of the time since they shared a common ancestor with outcrossing species. We conclude that the profound changes in mating behavior, physiology, and developmental mechanisms that accompanied the transition from an obligately outcrossing to a primarily selfing mode of reproduction evolved in the not-too-distant past.
47
Boyko AR, Williamson SH, Indap AR, Degenhardt JD, Hernandez RD, Lohmueller KE, Adams MD, Schmidt S, Sninsky JJ, Sunyaev SR, White TJ, Nielsen R, Clark AG, Bustamante CD. Assessing the evolutionary impact of amino acid mutations in the human genome. PLoS Genet 2008; 4:e1000083. [PMID: 18516229 PMCID: PMC2377339 DOI: 10.1371/journal.pgen.1000083] [Citation(s) in RCA: 471] [Impact Index Per Article: 27.7] [Received: 07/04/2007] [Accepted: 04/29/2008] [Indexed: 11/19/2022]
Abstract
Quantifying the distribution of fitness effects among newly arising mutations in the human genome is key to resolving important debates in medical and evolutionary genetics. Here, we present a method for inferring this distribution using Single Nucleotide Polymorphism (SNP) data from a population with non-stationary demographic history (such as that of modern humans). Application of our method to 47,576 coding SNPs found by direct resequencing of 11,404 protein-coding genes in 35 individuals (20 European Americans and 15 African Americans) allows us to assess the relative contribution of demographic and selective effects to patterning amino acid variation in the human genome. We find evidence of an ancient population expansion in the sample with African ancestry and a relatively recent bottleneck in the sample with European ancestry. After accounting for these demographic effects, we find strong evidence for great variability in the selective effects of new amino acid replacing mutations. In both populations, the patterns of variation are consistent with a leptokurtic distribution of selection coefficients (e.g., gamma or log-normal) peaked near neutrality. Specifically, we predict 27-29% of amino acid changing (nonsynonymous) mutations are neutral or nearly neutral (|s|<0.01%), 30-42% are moderately deleterious (0.01%<|s|<1%), and nearly all the remainder are highly deleterious or lethal (|s|>1%). Our results are consistent with 10-20% of amino acid differences between humans and chimpanzees having been fixed by positive selection with the remainder of differences being neutral or nearly neutral. Our analysis also predicts that many of the alleles identified via whole-genome association mapping may be selectively neutral or (formerly) positively selected, implying that deleterious genetic variation affecting disease phenotype may be missed by this widely used approach for mapping genes underlying complex traits.
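The three bins quoted above can be written as a small classifier over selection coefficients. The sketch below assumes s ≤ 0, meaning neutral or deleterious. It assigns exact boundary values to the more strongly selected class, a choice the abstract's strict inequalities leave open. The sample coefficients are invented for illustration.

```python
from collections import Counter

NEARLY_NEUTRAL_MAX = 1e-4  # |s| < 0.01%
MODERATE_MAX = 1e-2        # 0.01% <= |s| < 1% (boundary values go to the stronger class)

def classify(s: float) -> str:
    """Bin a non-positive selection coefficient by magnitude, using the thresholds in the abstract."""
    if s > 0:
        raise ValueError("this sketch assumes s <= 0 (neutral or deleterious)")
    a = abs(s)
    if a < NEARLY_NEUTRAL_MAX:
        return "neutral or nearly neutral"
    if a < MODERATE_MAX:
        return "moderately deleterious"
    return "highly deleterious or lethal"

def bin_proportions(coefficients: list[float]) -> dict[str, float]:
    """Fraction of coefficients falling in each bin."""
    counts = Counter(classify(s) for s in coefficients)
    return {label: c / len(coefficients) for label, c in counts.items()}

# Hypothetical selection coefficients (negative = deleterious).
sample = [-5e-6, -2e-5, -3e-4, -8e-3, -0.05, -0.2, -1.0, -1e-3]
print(bin_proportions(sample))
```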
Affiliation(s)
- Adam R. Boyko
- Department of Biological Statistics and Computational Biology, Cornell University, Ithaca, New York, United States of America
- Department of Molecular Biology and Genetics, Cornell University, Ithaca, New York, United States of America
- Scott H. Williamson
- Department of Biological Statistics and Computational Biology, Cornell University, Ithaca, New York, United States of America
- Amit R. Indap
- Department of Biological Statistics and Computational Biology, Cornell University, Ithaca, New York, United States of America
- Jeremiah D. Degenhardt
- Department of Biological Statistics and Computational Biology, Cornell University, Ithaca, New York, United States of America
- Ryan D. Hernandez
- Department of Biological Statistics and Computational Biology, Cornell University, Ithaca, New York, United States of America
- Kirk E. Lohmueller
- Department of Biological Statistics and Computational Biology, Cornell University, Ithaca, New York, United States of America
- Department of Molecular Biology and Genetics, Cornell University, Ithaca, New York, United States of America
- Mark D. Adams
- Department of Genetics, BRB-624, Case Western Reserve University, Cleveland, Ohio, United States of America
- Steffen Schmidt
- Division of Genetics, Department of Medicine, Brigham & Women's Hospital and Harvard Medical School, Boston, Massachusetts, United States of America
- John J. Sninsky
- Celera Diagnostics, Alameda, California, United States of America
- Shamil R. Sunyaev
- Division of Genetics, Department of Medicine, Brigham & Women's Hospital and Harvard Medical School, Boston, Massachusetts, United States of America
- Thomas J. White
- Celera Diagnostics, Alameda, California, United States of America
- Rasmus Nielsen
- Center for Comparative Genomics, University of Copenhagen, Copenhagen, Denmark
- Andrew G. Clark
- Department of Molecular Biology and Genetics, Cornell University, Ithaca, New York, United States of America
- Carlos D. Bustamante
- Department of Biological Statistics and Computational Biology, Cornell University, Ithaca, New York, United States of America
48
Loewe L, Lamatsch DK. Quantifying the threat of extinction from Muller's ratchet in the diploid Amazon molly (Poecilia formosa). BMC Evol Biol 2008; 8:88. [PMID: 18366680 PMCID: PMC2292145 DOI: 10.1186/1471-2148-8-88] [Citation(s) in RCA: 51] [Impact Index Per Article: 3.0] [Received: 06/28/2007] [Accepted: 03/19/2008] [Indexed: 11/10/2022]
Abstract
BACKGROUND The Amazon molly (Poecilia formosa) is a small unisexual fish that has been suspected to be threatened with extinction by the stochastic accumulation of slightly deleterious mutations that Muller's ratchet causes in non-recombining populations. However, no detailed quantification of the extent of this threat is available. RESULTS Here we quantify genomic decay in this fish using a simple model of Muller's ratchet, run with the most realistic parameter combinations available on the evolution@home global computing system. We also describe simple extensions of the standard model of Muller's ratchet that allow us to deal with selfing diploids, triploids and mitotic recombination. We show that Muller's ratchet creates a threat of extinction for the Amazon molly for many biologically realistic parameter combinations. In most cases, extinction is expected to occur within a time frame shorter than previous estimates of the age of the species, leading to a genomic decay paradox. CONCLUSION How then does the Amazon molly survive? Several biological processes could, individually or in combination, solve this genomic decay paradox, including paternal leakage of undamaged DNA from sexual sister species, compensatory mutations and many others. More research is needed to quantify the contribution of these potential solutions to the survival of the Amazon molly and other (ancient) asexual species.
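The "simple model of Muller's ratchet" mentioned above is usually stated as follows: a fixed-size asexual population, a Poisson number of new deleterious mutations (mean U) per genome per generation, multiplicative fitness (1 - s)^k for a genome carrying k mutations, and no recombination or back mutation. The Python sketch below simulates that textbook model and also computes Haigh's expected size of the mutation-free class, n0 = N exp(-U/s). It is an illustration under those assumptions, not the evolution@home code; the parameter values and helper names are hypothetical, and the paper's extensions for selfing diploids, triploids and mitotic recombination are omitted.

```python
import math
import random


def poisson(rng: random.Random, lam: float) -> int:
    """Poisson draw by Knuth's multiplication method (adequate for small lam)."""
    threshold = math.exp(-lam)
    k, p = 0, rng.random()
    while p > threshold:
        k += 1
        p *= rng.random()
    return k


def least_loaded_class_size(n: int, u: float, s: float) -> float:
    """Haigh's deterministic expectation n0 = N * exp(-U / s) for the number of
    mutation-free individuals; the smaller n0, the faster the ratchet turns."""
    return n * math.exp(-u / s)


def simulate_ratchet(n: int, u: float, s: float, generations: int,
                     seed: int = 1) -> list:
    """Wright-Fisher simulation of an asexual, non-recombining population.

    Each genome is summarised by its count k of deleterious mutations. With no
    back mutation and no recombination the minimum load can only stay the same
    or rise; each rise is a click of the ratchet. Returns the minimum load
    recorded after every generation.
    """
    rng = random.Random(seed)
    loads = [0] * n                          # mutations carried by each genome
    min_history = []
    for _ in range(generations):
        # Selection: parents drawn with replacement, weighted by (1 - s)^k.
        weights = [(1.0 - s) ** k for k in loads]
        parents = rng.choices(loads, weights=weights, k=n)
        # Mutation: each offspring gains a Poisson(U) number of new mutations.
        loads = [k + poisson(rng, u) for k in parents]
        min_history.append(min(loads))
    return min_history


# Hypothetical parameters: N = 200, U = 0.1 per genome per generation, s = 0.01.
print("expected n0:", least_loaded_class_size(200, 0.1, 0.01))
history = simulate_ratchet(n=200, u=0.1, s=0.01, generations=500)
print("minimum load after 500 generations:", history[-1])
```

Because offspring can only gain mutations, the minimum load in the returned history never decreases; a small n0 (here well below one individual) is the regime in which the least-loaded class is repeatedly lost by drift.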
Affiliation(s)
- Laurence Loewe
- Institute of Evolutionary Biology, School of Biological Sciences, University of Edinburgh, Ashworth Laboratories, King's Buildings, Edinburgh EH9 3JT, UK
- Centre for Systems Biology Edinburgh, School of Biological Sciences, University of Edinburgh, Darwin Building, King's Buildings, Edinburgh EH9 3JU, UK
- Dunja K Lamatsch
- Universität Würzburg, Institute of Physiological Chemistry I, Biocenter, Würzburg, 97074 Würzburg, Germany
- Freshwater Biology, Royal Belgian Institute of Natural Sciences, Vautierstraat 29, B-1000 Brussels, Belgium
- University of Sheffield, Department of Animal and Plant Sciences, Alfred Denny Building, Western Bank, Sheffield, S10 2TN, UK
- Austrian Academy of Sciences, Institute for Limnology, Mondseestrasse 9, 5310 Mondsee, Austria
49
Joint inference of the distribution of fitness effects of deleterious mutations and population demography based on nucleotide polymorphism frequencies. Genetics 2008; 177:2251-61. [PMID: 18073430 DOI: 10.1534/genetics.107.080663] [Citation(s) in RCA: 277] [Impact Index Per Article: 16.3] [Indexed: 11/18/2022]
Abstract
The distribution of fitness effects of new mutations (DFE) is important for addressing several questions in genetics, including the nature of quantitative variation and the evolutionary fate of small populations. Properties of the DFE can be inferred by comparing the distributions of the frequencies of segregating nucleotide polymorphisms at selected and neutral sites in a population sample, but demographic changes alter the spectrum of allele frequencies at both neutral and selected sites and so can bias estimates of the DFE if not accounted for. We have developed a maximum-likelihood approach, based on the expected allele-frequency distribution generated by transition matrix methods, to estimate parameters of the DFE while simultaneously estimating parameters of a demographic model that allows a population size change at some time in the past. We tested the method using simulations and found that it accurately recovers simulated parameter values, even if the simulated demography differs substantially from that assumed in our analysis. We use our method to estimate parameters of the DFE for amino acid-changing mutations in humans and Drosophila melanogaster. For a model of unconditionally deleterious mutations, with effects sampled from a gamma distribution, the mean estimate for the distribution shape parameter is approximately 0.2 for human populations, which implies that the DFE is strongly leptokurtic. For Drosophila populations, we estimate that the shape parameter is approximately 0.35. Differences in the shape of the distribution and the mean selection coefficient between humans and Drosophila result in significantly more strongly deleterious mutations in Drosophila than in humans and, conversely, significantly fewer nearly neutral mutations.
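As a worked illustration of "strongly leptokurtic", the textbook moments of the gamma model (shape β, scale θ) can be applied to the two shape estimates in the abstract. Only standard gamma formulas are used; θ is left symbolic because the abstract does not report the mean selection coefficient, and γ₂ denotes excess kurtosis (0 for a normal distribution).

```latex
f(s) = \frac{s^{\beta-1}\, e^{-s/\theta}}{\Gamma(\beta)\,\theta^{\beta}}, \quad s > 0,
\qquad \mathbb{E}[s] = \beta\theta,
\qquad \gamma_2 = \frac{6}{\beta}

\beta_{\text{human}} = 0.2 \;\Rightarrow\; \gamma_2 = \frac{6}{0.2} = 30,
\qquad
\beta_{\textit{Drosophila}} = 0.35 \;\Rightarrow\; \gamma_2 = \frac{6}{0.35} \approx 17.1
```

Both values are far above 0, and because β < 1 the factor s^(β-1) diverges as s approaches 0, so much of the mass sits close to neutrality while a long tail extends to large effects.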
50
Charlesworth J, Eyre-Walker A. The McDonald-Kreitman test and slightly deleterious mutations. Mol Biol Evol 2008; 25:1007-15. [PMID: 18195052 DOI: 10.1093/molbev/msn005] [Citation(s) in RCA: 143] [Impact Index Per Article: 8.4] [Indexed: 11/13/2022]
Abstract
It is possible to estimate the proportion of substitutions that are due to adaptive evolution using the numbers of silent and nonsilent polymorphisms and substitutions in a McDonald-Kreitman-type analysis. Unfortunately, this estimate of adaptive evolution is biased downward by the segregation of slightly deleterious mutations. It has been suggested that one way to cope with the effects of these slightly deleterious mutations is to remove low-frequency polymorphisms from the analysis. We investigate the performance of this method theoretically. We show that although removing low-frequency polymorphisms does indeed reduce the bias in the estimate of adaptive evolution, the estimate is always downwardly biased, often to the extent that one would not be able to detect adaptive evolution even if it existed. The method is reasonably satisfactory only if the rate of adaptive evolution is high and the distribution of fitness effects for slightly deleterious mutations is very leptokurtic. Our analysis suggests that adaptive evolution could be quite prevalent in humans (>8%) and still not be detectable using current methodologies. Our analysis also suggests that the level of adaptive evolution has probably been underestimated, possibly substantially, in both bacteria and Drosophila.
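The estimate discussed above is commonly computed as alpha = 1 - (Ds * Pn) / (Dn * Ps), where D and P count fixed differences and polymorphisms at nonsynonymous (n) and synonymous (s) sites. The Python sketch below assumes that estimator and implements the low-frequency filter as a simple minimum derived-allele frequency (kept if frequency >= cutoff); the function name, the cutoff value and the counts in the example are hypothetical.

```python
from typing import Sequence


def mk_alpha(dn: int, ds: int,
             pn_freqs: Sequence[float], ps_freqs: Sequence[float],
             min_freq: float = 0.0) -> float:
    """Proportion of nonsynonymous substitutions attributed to adaptive
    evolution, alpha = 1 - (Ds * Pn) / (Dn * Ps).

    dn, ds: fixed nonsynonymous / synonymous differences between species.
    pn_freqs, ps_freqs: derived-allele frequencies of nonsynonymous /
        synonymous polymorphisms in the sample.
    min_freq: polymorphisms below this frequency are discarded before
        counting (an assumed form of the low-frequency filter; 0.0 keeps all).
    """
    pn = sum(1 for f in pn_freqs if f >= min_freq)
    ps = sum(1 for f in ps_freqs if f >= min_freq)
    if dn == 0 or ps == 0:
        raise ValueError("alpha is undefined when Dn or Ps is zero")
    return 1.0 - (ds * pn) / (dn * ps)


# Hypothetical data: derived-allele frequencies of segregating sites.
pn_freqs = [0.02, 0.03, 0.04, 0.05, 0.20, 0.30]  # nonsynonymous, many rare
ps_freqs = [0.02, 0.10, 0.20, 0.30, 0.40, 0.50]  # synonymous
print(mk_alpha(12, 10, pn_freqs, ps_freqs))                # all polymorphisms
print(mk_alpha(12, 10, pn_freqs, ps_freqs, min_freq=0.1))  # rare variants removed
```

In this toy data set the rare nonsynonymous variants (plausibly slightly deleterious) depress alpha, and filtering them raises the estimate; the abstract's point is that such filtering reduces the downward bias but, in general, does not eliminate it.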
Affiliation(s)
- Jane Charlesworth
- Centre for the Study of Evolution, University of Sussex, Brighton, United Kingdom