1
Becher H, Charlesworth B. A model of Hill-Robertson interference caused by purifying selection in a nonrecombining genome. Genetics 2025; 230:iyaf048. [PMID: 40120130 PMCID: PMC12059647 DOI: 10.1093/genetics/iyaf048] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/04/2025] [Accepted: 03/13/2025] [Indexed: 03/25/2025] Open
Abstract
A new approach to modeling the effects of Hill-Robertson interference on levels of adaptation and patterns of variability in a nonrecombining genome or genomic region is described. The model assumes a set of L diallelic sites subject to reversible mutations between beneficial and deleterious alleles, with the same selection coefficient at each site. The assumption of reversibility allows the system to reach a stable statistical equilibrium with respect to the frequencies of deleterious mutations, in contrast to many previous models that assume irreversible mutations to deleterious alleles. The model is therefore appropriate for understanding the long-term properties of nonrecombining genomes such as Y chromosomes, and is applicable to haploid genomes or to diploid genomes when there is intermediate dominance with respect to the effects of mutations on fitness. Approximations are derived for the equilibrium frequencies of deleterious mutations, the effective population size that controls the fixation probabilities of mutations at sites under selection, the nucleotide site diversity at neutral sites located within the nonrecombining region, and the site frequency spectrum for segregating neutral variants. The approximations take into account the effects of linkage disequilibrium on the genetic variance at sites under selection. Comparisons with published and new computer simulation results show that the approximations are sufficiently accurate to be useful, and can provide insights into a wider range of parameter sets than is accessible by simulation. The relevance of the findings to data on nonrecombining genome regions is discussed.
Affiliation(s)
- Hannes Becher
- Royal (Dick) School of Veterinary Science, The Roslin Institute, The University of Edinburgh, Midlothian EH25 9RG, UK
- Brian Charlesworth
- School of Biological Sciences, Institute of Ecology and Evolution, The University of Edinburgh, Edinburgh EH9 3FL, UK
2
Strütt S, Excoffier L, Peischl S. A generalized structured coalescent for purifying selection without recombination. Genetics 2025; 229:iyaf013. [PMID: 39862229 DOI: 10.1093/genetics/iyaf013] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/07/2024] [Revised: 12/18/2024] [Accepted: 12/30/2024] [Indexed: 01/27/2025] Open
Abstract
Purifying selection is a critical factor in shaping genetic diversity. Current theoretical models mostly address scenarios of either very weak or strong selection, leaving a significant gap in our knowledge. The effects of purifying selection on patterns of genomic diversity remain poorly understood when selection against deleterious mutations is weak to moderate, particularly when recombination is limited or absent. In this study, we extend an existing approach, the fitness-class coalescent, to incorporate arbitrary levels of purifying selection in haploid populations. This model offers a comprehensive framework for exploring the influence of purifying selection in a wide range of demographic scenarios. Moreover, our research reveals potential sources of qualitative and quantitative biases in demographic inference, highlighting the significant risk of attributing genetic patterns to past demographic events rather than purifying selection. This work expands our understanding of the complex interplay between selection, drift, and population dynamics, and how purifying selection distorts demographic inference.
Affiliation(s)
- Stefan Strütt
- Interfaculty Bioinformatics Unit, University of Bern, Baltzerstrasse 6, Bern 3012, Switzerland
- Computational and Molecular Population Genetics Lab, Institute of Ecology and Evolution, University of Bern, Baltzerstrasse 6, Bern 3012, Switzerland
- Laurent Excoffier
- Computational and Molecular Population Genetics Lab, Institute of Ecology and Evolution, University of Bern, Baltzerstrasse 6, Bern 3012, Switzerland
- Stephan Peischl
- Interfaculty Bioinformatics Unit, University of Bern, Baltzerstrasse 6, Bern 3012, Switzerland
- Swiss Institute of Bioinformatics, Lausanne 1015, Switzerland
3
Daigle A, Johri P. Hill-Robertson interference may bias the inference of fitness effects of new mutations in highly selfing species. Evolution 2025; 79:342-363. [PMID: 39565285 PMCID: PMC11879154 DOI: 10.1093/evolut/qpae168] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/02/2024] [Revised: 11/12/2024] [Accepted: 11/18/2024] [Indexed: 11/21/2024]
Abstract
The accurate estimation of the distribution of fitness effects (DFE) of new mutations is critical for population genetic inference but remains a challenging task. While various methods have been developed for DFE inference using the site frequency spectrum of putatively neutral and selected sites, their applicability in species with diverse life history traits and complex demographic scenarios is not well understood. Selfing is common among eukaryotic species and can lead to decreased effective recombination rates, increasing the effects of selection at linked sites, including interference between selected alleles. We employ forward simulations to investigate the limitations of current DFE estimation approaches in the presence of selfing and other model violations, such as linkage, departures from semidominance, population structure, and uneven sampling. We find that distortions of the site frequency spectrum due to Hill-Robertson interference in highly selfing populations lead to mis-inference of the deleterious DFE of new mutations. Specifically, when inferring the distribution of selection coefficients, there is an overestimation of nearly neutral and strongly deleterious mutations and an underestimation of mildly deleterious mutations when interference between selected alleles is pervasive. In addition, the presence of cryptic population structure with low rates of migration and uneven sampling across subpopulations leads to the false inference of a deleterious DFE skewed towards effectively neutral/mildly deleterious mutations. Finally, the proportion of adaptive substitutions estimated at high rates of selfing is substantially overestimated. Our observations apply broadly to species and genomic regions with little/no recombination and where interference might be pervasive.
Affiliation(s)
- Austin Daigle
- Department of Biology, University of North Carolina, Chapel Hill, NC, United States
- Department of Genetics, University of North Carolina, Chapel Hill, NC, United States
- Curriculum in Bioinformatics and Computational Biology, University of North Carolina, Chapel Hill, NC, United States
- Parul Johri
- Department of Biology, University of North Carolina, Chapel Hill, NC, United States
- Department of Genetics, University of North Carolina, Chapel Hill, NC, United States
- Integrative Program for Biological & Genome Sciences, University of North Carolina, Chapel Hill, NC, United States
4
Koelle K, Rasmussen DA. Phylodynamics beyond neutrality: the impact of incomplete purifying selection on viral phylogenies and inference. Philos Trans R Soc Lond B Biol Sci 2025; 380:20230314. [PMID: 39976414 PMCID: PMC11867112 DOI: 10.1098/rstb.2023.0314] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 06/01/2024] [Revised: 10/07/2024] [Accepted: 11/04/2024] [Indexed: 02/21/2025] Open
Abstract
Viral phylodynamics focuses on using sequence data to make inferences about the population dynamics of viral diseases. These inferences commonly include estimation of growth rates, reproduction numbers and times of most recent common ancestor. With few exceptions, existing phylodynamic inference approaches assume that all observed and ancestral viral genetic variation is fitness-neutral. This assumption is commonly violated, with a large body of analyses indicating that fitness varies substantially among genotypes circulating in viral populations. Here, we focus on fitness variation arising from deleterious mutations, asking whether incomplete purifying selection of deleterious mutations has the potential to bias phylodynamic inference. We use simulations of an exponentially growing population to explore how incomplete purifying selection distorts tree shape and shifts the distribution of mutations over trees. We find that incomplete purifying selection strongly shapes the distribution of mutations while only weakly impacting tree shape. Despite incomplete purifying selection shifting the distribution of deleterious mutations, we find little discernible bias in estimates of viral growth rates and times of the most recent common ancestor. Our results reassuringly indicate that existing phylodynamic inference approaches that assume neutrality may nevertheless yield accurate epidemiological estimates in the face of incomplete purifying selection. More work is needed to assess the robustness of these findings to alternative epidemiological parametrizations. This article is part of the theme issue '"A mathematical theory of evolution": phylogenetic models dating back 100 years'.
Affiliation(s)
- Katia Koelle
- Department of Biology, Emory University, Atlanta, GA 30322, USA
- Emory Center of Excellence for Influenza Research and Response (CEIRR), Atlanta, GA 30322, USA
- David A. Rasmussen
- Department of Entomology and Plant Pathology, North Carolina State University, Raleigh, NC 27607, USA
- Bioinformatics Research Center, North Carolina State University, Raleigh, NC 27607, USA
5
Daigle A, Johri P. Hill-Robertson interference may bias the inference of fitness effects of new mutations in highly selfing species. bioRxiv: the preprint server for biology 2024:2024.02.06.579142. [PMID: 38370745 PMCID: PMC10871249 DOI: 10.1101/2024.02.06.579142] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 02/20/2024]
Abstract
The accurate estimation of the distribution of fitness effects (DFE) of new mutations is critical for population genetic inference but remains a challenging task. While various methods have been developed for DFE inference using the site frequency spectrum of putatively neutral and selected sites, their applicability in species with diverse life history traits and complex demographic scenarios is not well understood. Selfing is common among eukaryotic species and can lead to decreased effective recombination rates, increasing the effects of selection at linked sites, including interference between selected alleles. We employ forward simulations to investigate the limitations of current DFE estimation approaches in the presence of selfing and other model violations, such as linkage, departures from semidominance, population structure, and uneven sampling. We find that distortions of the site frequency spectrum due to Hill-Robertson interference in highly selfing populations lead to mis-inference of the deleterious DFE of new mutations. Specifically, when inferring the distribution of selection coefficients, there is an overestimation of nearly neutral and strongly deleterious mutations and an underestimation of mildly deleterious mutations when interference between selected alleles is pervasive. In addition, the presence of cryptic population structure with low rates of migration and uneven sampling across subpopulations leads to the false inference of a deleterious DFE skewed towards effectively neutral/mildly deleterious mutations. Finally, the proportion of adaptive substitutions estimated at high rates of selfing is substantially overestimated. Our observations apply broadly to species and genomic regions with little/no recombination and where interference might be pervasive.
Affiliation(s)
- Austin Daigle
- Department of Biology, University of North Carolina, Chapel Hill, NC 27599
- Department of Genetics, University of North Carolina, Chapel Hill, NC 27599
- Curriculum in Bioinformatics and Computational Biology, University of North Carolina, Chapel Hill, NC 27599
- Parul Johri
- Department of Biology, University of North Carolina, Chapel Hill, NC 27599
- Department of Genetics, University of North Carolina, Chapel Hill, NC 27599
- Integrative Program for Biological & Genome Sciences, University of North Carolina, Chapel Hill, NC 27599
6
Soni V, Jensen JD. Temporal challenges in detecting balancing selection from population genomic data. G3 (Bethesda, Md.) 2024; 14:jkae069. [PMID: 38551137 DOI: 10.1093/g3journal/jkae069] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/21/2023] [Revised: 12/21/2023] [Accepted: 03/19/2024] [Indexed: 04/28/2024]
Abstract
The role of balancing selection in maintaining genetic variation remains an open question in population genetics. Recent years have seen numerous studies identifying candidate loci potentially experiencing balancing selection, most predominantly in human populations. There are however numerous alternative evolutionary processes that may leave similar patterns of variation, thereby potentially confounding inference, and the expected signatures of balancing selection additionally change in a temporal fashion. Here we use forward-in-time simulations to quantify expected statistical power to detect balancing selection using both site frequency spectrum- and linkage disequilibrium-based methods under a variety of evolutionarily realistic null models. We find that whilst site frequency spectrum-based methods have little power immediately after a balanced mutation begins segregating, power increases with time since the introduction of the balanced allele. Conversely, linkage disequilibrium-based methods have considerable power whilst the allele is young, and power dissipates rapidly as the time since introduction increases. Taken together, this suggests that site frequency spectrum-based methods are most effective at detecting long-term balancing selection (>25N generations since the introduction of the balanced allele) whilst linkage disequilibrium-based methods are effective over much shorter timescales (<1N generations), thereby leaving a large time frame over which current methods have little power to detect the action of balancing selection. Finally, we investigate the extent to which alternative evolutionary processes may mimic these patterns, and demonstrate the need for caution in attempting to distinguish the signatures of balancing selection from those of both neutral processes (e.g. population structure and admixture) as well as of alternative selective processes (e.g. partial selective sweeps).
Affiliation(s)
- Vivak Soni
- School of Life Sciences, Center for Evolution & Medicine, Arizona State University, Tempe, AZ 85281, USA
- Jeffrey D Jensen
- School of Life Sciences, Center for Evolution & Medicine, Arizona State University, Tempe, AZ 85281, USA
7
Buffalo V, Kern AD. A quantitative genetic model of background selection in humans. PLoS Genet 2024; 20:e1011144. [PMID: 38507461 PMCID: PMC10984650 DOI: 10.1371/journal.pgen.1011144] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/17/2023] [Revised: 04/01/2024] [Accepted: 01/19/2024] [Indexed: 03/22/2024] Open
Abstract
Across the human genome, there are large-scale fluctuations in genetic diversity caused by the indirect effects of selection. This "linked selection signal" reflects the impact of selection according to the physical placement of functional regions and recombination rates along chromosomes. Previous work has shown that purifying selection acting against the steady influx of new deleterious mutations at functional portions of the genome shapes patterns of genomic variation. To date, statistical efforts to estimate purifying selection parameters from linked selection models have relied on classic Background Selection theory, which is only applicable when new mutations are so deleterious that they cannot fix in the population. Here, we develop a statistical method based on a quantitative genetics view of linked selection, that models how polygenic additive fitness variance distributed along the genome increases the rate of stochastic allele frequency change. By jointly predicting the equilibrium fitness variance and substitution rate due to both strong and weakly deleterious mutations, we estimate the distribution of fitness effects (DFE) and mutation rate across three geographically distinct human samples. While our model can accommodate weaker selection, we find evidence of strong selection operating similarly across all human samples. Although our quantitative genetic model of linked selection fits better than previous models, substitution rates of the most constrained sites disagree with observed divergence levels. We find that a model incorporating selective interference better predicts observed divergence in conserved regions, but overall our results suggest uncertainty remains about the processes generating fitness variation in humans.
Affiliation(s)
- Vince Buffalo
- Department of Integrative Biology, University of California, Berkeley, Berkeley, California, United States of America
- Institute of Ecology and Evolution and Department of Biology, University of Oregon, Eugene, Oregon, United States of America
- Andrew D. Kern
- Institute of Ecology and Evolution and Department of Biology, University of Oregon, Eugene, Oregon, United States of America
8
Soni V, Pfeifer SP, Jensen JD. The Effects of Mutation and Recombination Rate Heterogeneity on the Inference of Demography and the Distribution of Fitness Effects. Genome Biol Evol 2024; 16:evae004. [PMID: 38207127 PMCID: PMC10834165 DOI: 10.1093/gbe/evae004] [Citation(s) in RCA: 9] [Impact Index Per Article: 9.0] [Received: 08/31/2023] [Revised: 12/12/2023] [Accepted: 01/07/2024] [Indexed: 01/13/2024] Open
Abstract
Disentangling the effects of demography and selection has remained a focal point of population genetic analysis. Knowledge about mutation and recombination is essential in this endeavor; however, despite clear evidence that both mutation and recombination rates vary across genomes, it is common practice to model both rates as fixed. In this study, we quantify how this unaccounted-for rate heterogeneity may impact inference using common approaches for inferring selection (DFE-alpha, Grapes, and polyDFE) and/or demography (fastsimcoal2 and δaδi). We demonstrate that, if not properly modeled, this heterogeneity can increase uncertainty in the estimation of demographic and selective parameters and in some scenarios may result in misleading inference. These results highlight the importance of quantifying the fundamental evolutionary parameters of mutation and recombination before utilizing population genomic data to quantify the effects of genetic drift (i.e. as modulated by demographic history) and selection; or, at the least, that the effects of uncertainty in these parameters can and should be directly modeled in downstream inference.
Affiliation(s)
- Vivak Soni
- School of Life Sciences, Center for Evolution & Medicine, Arizona State University, Tempe, AZ, USA
- Susanne P Pfeifer
- School of Life Sciences, Center for Evolution & Medicine, Arizona State University, Tempe, AZ, USA
- Jeffrey D Jensen
- School of Life Sciences, Center for Evolution & Medicine, Arizona State University, Tempe, AZ, USA
9
Soni V, Pfeifer SP, Jensen JD. The effects of mutation and recombination rate heterogeneity on the inference of demography and the distribution of fitness effects. bioRxiv: the preprint server for biology 2023:2023.11.11.566703. [PMID: 38014252 PMCID: PMC10680612 DOI: 10.1101/2023.11.11.566703] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/29/2023]
Abstract
Disentangling the effects of demography and selection has remained a focal point of population genetic analysis. Knowledge about mutation and recombination is essential in this endeavour; however, despite clear evidence that both mutation and recombination rates vary across genomes, it is common practice to model both rates as fixed. In this study, we quantify how this unaccounted-for rate heterogeneity may impact inference using common approaches for inferring selection (DFE-alpha, Grapes, and polyDFE) and/or demography (fastsimcoal2 and δaδi). We demonstrate that, if not properly modelled, this heterogeneity can increase uncertainty in the estimation of demographic and selective parameters and in some scenarios may result in misleading inference. These results highlight the importance of quantifying the fundamental evolutionary parameters of mutation and recombination prior to utilizing population genomic data to quantify the effects of genetic drift (i.e., as modulated by demographic history) and selection; or, at the least, that the effects of uncertainty in these parameters can and should be directly modelled in downstream inference.
Affiliation(s)
- Vivak Soni
- Arizona State University, School of Life Sciences, Center for Evolution & Medicine
- Susanne P. Pfeifer
- Arizona State University, School of Life Sciences, Center for Evolution & Medicine
- Jeffrey D. Jensen
- Arizona State University, School of Life Sciences, Center for Evolution & Medicine
10
Moinet A, Schlichta F, Peischl S, Excoffier L. Strong neutral sweeps occurring during a population contraction. Genetics 2022; 220:6529544. [PMID: 35171980 PMCID: PMC8982045 DOI: 10.1093/genetics/iyac021] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 08/24/2021] [Accepted: 01/22/2022] [Indexed: 11/14/2022] Open
Abstract
A strong reduction in diversity around a specific locus is often interpreted as a recent rapid fixation of a positively selected allele, a phenomenon called a selective sweep. Rapid fixation of neutral variants can however lead to a similar reduction in local diversity, especially when the population experiences changes in population size, e.g. bottlenecks or range expansions. The fact that demographic processes can lead to signals of nucleotide diversity very similar to signals of selective sweeps is at the core of an ongoing discussion about the roles of demography and natural selection in shaping patterns of neutral variation. Here, we quantitatively investigate the shape of such neutral valleys of diversity under a simple model of a single population size change, and we compare it to signals of a selective sweep. We analytically describe the expected shape of such "neutral sweeps" and show that selective sweep valleys of diversity are, for the same fixation time, wider than neutral valleys. On the other hand, it is always possible to parametrize our model to find a neutral valley that has the same width as a given selected valley. Our findings provide further insight into how simple demographic models can create valleys of genetic diversity similar to those attributed to positive selection.
Affiliation(s)
- Antoine Moinet
- Interfaculty Bioinformatics Unit, University of Bern, Bern 3012, Switzerland
- Swiss Institute of Bioinformatics, Lausanne 1015, Switzerland
- Computational and Molecular Population Genetics Lab, Institute of Ecology and Evolution, University of Bern, Baltzerstrasse 6, 3012 Bern, Switzerland
- Flávia Schlichta
- Swiss Institute of Bioinformatics, Lausanne 1015, Switzerland
- Computational and Molecular Population Genetics Lab, Institute of Ecology and Evolution, University of Bern, Baltzerstrasse 6, 3012 Bern, Switzerland
- Stephan Peischl (corresponding author)
- Interfaculty Bioinformatics Unit, University of Bern, Bern 3012, Switzerland
- Swiss Institute of Bioinformatics, Lausanne 1015, Switzerland
- Laurent Excoffier
- Swiss Institute of Bioinformatics, Lausanne 1015, Switzerland
- Computational and Molecular Population Genetics Lab, Institute of Ecology and Evolution, University of Bern, Baltzerstrasse 6, 3012 Bern, Switzerland
11
Johri P, Riall K, Becher H, Excoffier L, Charlesworth B, Jensen JD. The Impact of Purifying and Background Selection on the Inference of Population History: Problems and Prospects. Mol Biol Evol 2021; 38:2986-3003. [PMID: 33591322 PMCID: PMC8233493 DOI: 10.1093/molbev/msab050] [Citation(s) in RCA: 57] [Impact Index Per Article: 14.3] [Indexed: 12/17/2022] Open
Abstract
Current procedures for inferring population history generally assume complete neutrality—that is, they neglect both direct selection and the effects of selection on linked sites. We here examine how the presence of direct purifying selection and background selection may bias demographic inference by evaluating two commonly-used methods (MSMC and fastsimcoal2), specifically studying how the underlying shape of the distribution of fitness effects and the fraction of directly selected sites interact with demographic parameter estimation. The results show that, even after masking functional genomic regions, background selection may cause the mis-inference of population growth under models of both constant population size and decline. This effect is amplified as the strength of purifying selection and the density of directly selected sites increases, as indicated by the distortion of the site frequency spectrum and levels of nucleotide diversity at linked neutral sites. We also show how simulated changes in background selection effects caused by population size changes can be predicted analytically. We propose a potential method for correcting for the mis-inference of population growth caused by selection. By treating the distribution of fitness effect as a nuisance parameter and averaging across all potential realizations, we demonstrate that even directly selected sites can be used to infer demographic histories with reasonable accuracy.
Affiliation(s)
- Parul Johri
- School of Life Sciences, Arizona State University, Tempe, AZ, USA
- Kellen Riall
- School of Life Sciences, Arizona State University, Tempe, AZ, USA
- Hannes Becher
- Institute of Evolutionary Biology, School of Biological Sciences, University of Edinburgh, Edinburgh, United Kingdom
- Laurent Excoffier
- Institute of Ecology and Evolution, University of Berne, Berne, Switzerland
- Swiss Institute of Bioinformatics, Lausanne, Switzerland
- Brian Charlesworth
- Institute of Evolutionary Biology, School of Biological Sciences, University of Edinburgh, Edinburgh, United Kingdom
- Jeffrey D Jensen
- School of Life Sciences, Arizona State University, Tempe, AZ, USA
12
Johri P, Riall K, Becher H, Excoffier L, Charlesworth B, Jensen JD. The impact of purifying and background selection on the inference of population history: problems and prospects. bioRxiv: the preprint server for biology 2021. [PMID: 33501439 PMCID: PMC7836109 DOI: 10.1101/2020.04.28.066365] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.5] [Indexed: 12/12/2022]
Abstract
Current procedures for inferring population history generally assume complete neutrality - that is, they neglect both direct selection and the effects of selection on linked sites. We here examine how the presence of direct purifying selection and background selection may bias demographic inference by evaluating two commonly-used methods (MSMC and fastsimcoal2), specifically studying how the underlying shape of the distribution of fitness effects (DFE) and the fraction of directly selected sites interact with demographic parameter estimation. The results show that, even after masking functional genomic regions, background selection may cause the mis-inference of population growth under models of both constant population size and decline. This effect is amplified as the strength of purifying selection and the density of directly selected sites increases, as indicated by the distortion of the site frequency spectrum and levels of nucleotide diversity at linked neutral sites. We also show how simulated changes in background selection effects caused by population size changes can be predicted analytically. We propose a potential method for correcting for the mis-inference of population growth caused by selection. By treating the DFE as a nuisance parameter and averaging across all potential realizations, we demonstrate that even directly selected sites can be used to infer demographic histories with reasonable accuracy.
Affiliation(s)
- Parul Johri
- School of Life Sciences, Arizona State University, Tempe, AZ 85287, USA
- Kellen Riall
- School of Life Sciences, Arizona State University, Tempe, AZ 85287, USA
- Hannes Becher
- Institute of Evolutionary Biology, School of Biological Sciences, University of Edinburgh, EH9 3FL, United Kingdom
- Laurent Excoffier
- Institute of Ecology and Evolution, University of Berne, Berne 3012, Switzerland
- Swiss Institute of Bioinformatics, Lausanne 1015, Switzerland
- Brian Charlesworth
- Institute of Evolutionary Biology, School of Biological Sciences, University of Edinburgh, EH9 3FL, United Kingdom
- Jeffrey D Jensen
- School of Life Sciences, Arizona State University, Tempe, AZ 85287, USA
13
Andor N, Lau BT, Catalanotti C, Sathe A, Kubit M, Chen J, Blaj C, Cherry A, Bangs CD, Grimes SM, Suarez CJ, Ji HP. Joint single cell DNA-seq and RNA-seq of gastric cancer cell lines reveals rules of in vitro evolution. NAR Genom Bioinform 2020; 2:lqaa016. [PMID: 32215369 PMCID: PMC7079336 DOI: 10.1093/nargab/lqaa016] [Citation(s) in RCA: 55] [Impact Index Per Article: 11.0] [Received: 10/10/2019] [Revised: 02/16/2020] [Accepted: 03/09/2020] [Indexed: 01/01/2023] Open
Abstract
Cancer cell lines are not homogeneous nor are they static in their genetic state and biological properties. Genetic, transcriptional and phenotypic diversity within cell lines contributes to the lack of experimental reproducibility frequently observed in tissue-culture-based studies. While cancer cell line heterogeneity has been generally recognized, there are no studies which quantify the number of clones that coexist within cell lines and their distinguishing characteristics. We used a single-cell DNA sequencing approach to characterize the cellular diversity within nine gastric cancer cell lines and integrated this information with single-cell RNA sequencing. Overall, we sequenced the genomes of 8824 cells, identifying between 2 and 12 clones per cell line. Using the transcriptomes of more than 28 000 single cells from the same cell lines, we independently corroborated 88% of the clonal structure determined from single cell DNA analysis. For one of these cell lines, we identified cell surface markers that distinguished two subpopulations and used flow cytometry to sort these two clones. We identified substantial proportions of replicating cells in each cell line, assigned these cells to subclones detected among the G0/G1 population and used the proportion of replicating cells per subclone as a surrogate of each subclone's growth rate.
Affiliation(s)
- Noemi Andor
  - Integrated Mathematical Oncology, Moffitt Cancer Center, Tampa, 33612 FL, USA
- Billy T Lau
  - Stanford Genome Technology Center, Stanford University, Palo Alto, 94304 CA, USA
- Anuja Sathe
  - Division of Oncology, Department of Medicine, Stanford University School of Medicine, Stanford, 94305 CA, USA
- Matthew Kubit
  - Division of Oncology, Department of Medicine, Stanford University School of Medicine, Stanford, 94305 CA, USA
- Jiamin Chen
  - Division of Oncology, Department of Medicine, Stanford University School of Medicine, Stanford, 94305 CA, USA
- Cristina Blaj
  - Department of Molecular and Cell Biology, University of California, Berkeley, 94720 CA, USA
- Athena Cherry
  - Department of Pathology, Stanford University School of Medicine, Stanford, 94305 CA, USA
- Charles D Bangs
  - Department of Pathology, Stanford University School of Medicine, Stanford, 94305 CA, USA
- Susan M Grimes
  - Stanford Genome Technology Center, Stanford University, Palo Alto, 94304 CA, USA
- Carlos J Suarez
  - Department of Pathology, Stanford University School of Medicine, Stanford, 94305 CA, USA
- Hanlee P Ji
  - Stanford Genome Technology Center, Stanford University, Palo Alto, 94304 CA, USA
  - Division of Oncology, Department of Medicine, Stanford University School of Medicine, Stanford, 94305 CA, USA
|
14
|
Johri P, Charlesworth B, Jensen JD. Toward an Evolutionarily Appropriate Null Model: Jointly Inferring Demography and Purifying Selection. Genetics 2020; 215:173-192. [PMID: 32152045 PMCID: PMC7198275 DOI: 10.1534/genetics.119.303002] [Citation(s) in RCA: 107] [Impact Index Per Article: 21.4] [Received: 12/18/2019] [Accepted: 03/05/2020] [Indexed: 01/27/2023] Open
Abstract
The question of the relative evolutionary roles of adaptive and nonadaptive processes has been a central debate in population genetics for nearly a century. While advances have been made in the theoretical development of the underlying models, and statistical methods for estimating their parameters from large-scale genomic data, a framework for an appropriate null model remains elusive. A model incorporating evolutionary processes known to be in constant operation, genetic drift (as modulated by the demographic history of the population) and purifying selection, is lacking. Without such a null model, the role of adaptive processes in shaping within- and between-population variation may not be accurately assessed. Here, we investigate how population size changes and the strength of purifying selection affect patterns of variation at "neutral" sites near functional genomic components. We propose a novel statistical framework for jointly inferring the contribution of the relevant selective and demographic parameters. By means of extensive performance analyses, we quantify the utility of the approach, identify the most important statistics for parameter estimation, and compare the results with existing methods. Finally, we reanalyze genome-wide population-level data from a Zambian population of Drosophila melanogaster, and find that it has experienced a much slower rate of population growth than was inferred when the effects of purifying selection were neglected. Our approach represents an appropriate null model, against which the effects of positive selection can be assessed.
Affiliation(s)
- Parul Johri
  - School of Life Sciences, Arizona State University, Tempe, Arizona 85287
- Brian Charlesworth
  - Institute of Evolutionary Biology, School of Biological Sciences, University of Edinburgh, EH9 3FL, United Kingdom
- Jeffrey D Jensen
  - School of Life Sciences, Arizona State University, Tempe, Arizona 85287
|
15
|
Woerner AE, Veeramah KR, Watkins JC, Hammer MF. The Role of Phylogenetically Conserved Elements in Shaping Patterns of Human Genomic Diversity. Mol Biol Evol 2020; 35:2284-2295. [PMID: 30113695 DOI: 10.1093/molbev/msy145] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.8] [Indexed: 01/19/2023] Open
Abstract
Evolutionary genetic studies have shown a positive correlation between levels of nucleotide diversity and either rates of recombination or genetic distance to genes. Both positive-directional and purifying selection have been offered as the source of these correlations via genetic hitchhiking and background selection, respectively. Phylogenetically conserved elements (CEs) are short (∼100 bp), widely distributed (comprising ∼5% of genome), sequences that are often found far from genes. While the function of many CEs is unknown, CEs also are associated with reduced diversity at linked sites. Using high coverage (>80×) whole genome data from two human populations, the Yoruba and the CEU, we perform fine scale evaluations of diversity, rates of recombination, and linkage to genes. We find that the local rate of recombination has a stronger effect on levels of diversity than linkage to genes, and that these effects of recombination persist even in regions far from genes. Our whole genome modeling demonstrates that, rather than recombination or GC-biased gene conversion, selection on sites within or linked to CEs better explains the observed genomic diversity patterns. A major implication is that very few sites in the human genome are predicted to be free of the effects of selection. These sites, which we refer to as the human "neutralome," comprise only 1.2% of the autosomes and 5.1% of the X chromosome. Demographic analysis of the neutralome reveals larger population sizes and lower rates of growth for ancestral human populations than inferred by previous analyses.
Affiliation(s)
- August E Woerner
  - ARL Division of Biotechnology, University of Arizona, Tucson, AZ
  - Center for Human Identification, University of North Texas Health Science Center, Fort Worth, TX
- Krishna R Veeramah
  - Department of Ecology and Evolution, Stony Brook University, Stony Brook, NY
- Michael F Hammer
  - ARL Division of Biotechnology, University of Arizona, Tucson, AZ
|
16
|
An ancestral process with selection in an ecological community. J Theor Biol 2019; 466:128-144. [PMID: 30586554 DOI: 10.1016/j.jtbi.2018.12.032] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Received: 06/08/2018] [Revised: 12/17/2018] [Accepted: 12/21/2018] [Indexed: 11/20/2022]
Abstract
An ecological community is a geographical area composed of two or more species. The ancestral histories of individuals from the same and different species in an ecological community may be interconnected due to direct and indirect interactions. Here, we present a model of the ancestral history of an ecological community that is built upon the framework of coalescent and ancestral graph theory. The model includes selection, whereby the fitness of an ancestral lineage is a function of both its abiotic environment and interactions with individuals from its biotic environment. The model also allows for metacommunity structure. We first define a forward-time percolation process characterizing the evolution of an ecological community and then present its corresponding backward-time graphical model in the limit of large population sizes. Next, we present expectations of properties of phenotypes in the graph. These expectations give insight into the structure of phenotypic variation and trait-environment covariances across local communities, including the effects of drift, intra and inter-species genealogical structure and the sampling effects of selection. In addition, we derive expectations for multivariate phenotypic diversity in a community assuming neutrality and compare this to expectations with stabilizing selection.
|
17
|
The Effects on Neutral Variability of Recurrent Selective Sweeps and Background Selection. Genetics 2019; 212:287-303. [PMID: 30923166 DOI: 10.1534/genetics.119.301951] [Citation(s) in RCA: 36] [Impact Index Per Article: 6.0] [Received: 01/18/2019] [Accepted: 03/19/2019] [Indexed: 12/11/2022] Open
Abstract
Levels of variability and rates of adaptive evolution may be affected by hitchhiking, the effect of selection on evolution at linked sites. Hitchhiking can be caused either by "selective sweeps" or by background selection, involving the spread of new favorable alleles or the elimination of deleterious mutations, respectively. Recent analyses of population genomic data have fitted models where both these processes act simultaneously, to infer the parameters of selection. Here, we investigate the consequences of relaxing a key assumption of some of these studies, that the time occupied by a selective sweep is negligible compared with the neutral coalescent time. We derive a new expression for the expected level of neutral variability in the presence of recurrent selective sweeps and background selection. We also derive approximate integral expressions for the effects of recurrent selective sweeps. The accuracy of the theoretical predictions was tested against multilocus simulations, with selection, recombination, and mutation parameters that are realistic for Drosophila melanogaster. In the presence of crossing over, there is approximate agreement between the theoretical and simulation results. We show that the observed relationships between the rate of crossing over, and the level of synonymous site diversity and rate of adaptive evolution in Drosophila are probably mainly caused by background selection, whereas selective sweeps and population size changes are needed to produce the observed distortions of the site frequency spectrum.
|
18
|
The Effect of Strong Purifying Selection on Genetic Diversity. Genetics 2018; 209:1235-1278. [PMID: 29844134 DOI: 10.1534/genetics.118.301058] [Citation(s) in RCA: 149] [Impact Index Per Article: 21.3] [Received: 04/20/2018] [Accepted: 05/25/2018] [Indexed: 12/15/2022] Open
Abstract
Purifying selection reduces genetic diversity, both at sites under direct selection and at linked neutral sites. This process, known as background selection, is thought to play an important role in shaping genomic diversity in natural populations. Yet despite its importance, the effects of background selection are not fully understood. Previous theoretical analyses of this process have taken a backward-time approach based on the structured coalescent. While they provide some insight, these methods are either limited to very small samples or are computationally prohibitive. Here, we present a new forward-time analysis of the trajectories of both neutral and deleterious mutations at a nonrecombining locus. We find that strong purifying selection leads to remarkably rich dynamics: neutral mutations can exhibit sweep-like behavior, and deleterious mutations can reach substantial frequencies even when they are guaranteed to eventually go extinct. Our analysis of these dynamics allows us to calculate analytical expressions for the full site frequency spectrum. We find that whenever background selection is strong enough to lead to a reduction in genetic diversity, it also results in substantial distortions to the site frequency spectrum, which can mimic the effects of population expansions or positive selection. Because these distortions are most pronounced in the low and high frequency ends of the spectrum, they become particularly important in larger samples, but may have small effects in smaller samples. We also apply our forward-time framework to calculate other quantities, such as the ultimate fates of polymorphisms or the fitnesses of their ancestral backgrounds.
|
19
|
Adams RH, Schield DR, Card DC, Castoe TA. Assessing the Impacts of Positive Selection on Coalescent-Based Species Tree Estimation and Species Delimitation. Syst Biol 2018; 67:1076-1090. [DOI: 10.1093/sysbio/syy034] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.7] [Received: 06/20/2017] [Accepted: 05/05/2018] [Indexed: 11/13/2022] Open
Affiliation(s)
- Richard H Adams
  - Department of Biology, University of Texas at Arlington, 501 S. Nedderman Dr., Arlington, TX 76019, USA
- Drew R Schield
  - Department of Biology, University of Texas at Arlington, 501 S. Nedderman Dr., Arlington, TX 76019, USA
- Daren C Card
  - Department of Biology, University of Texas at Arlington, 501 S. Nedderman Dr., Arlington, TX 76019, USA
- Todd A Castoe
  - Department of Biology, University of Texas at Arlington, 501 S. Nedderman Dr., Arlington, TX 76019, USA
|
20
|
Santiago E, Caballero A. Joint Prediction of the Effective Population Size and the Rate of Fixation of Deleterious Mutations. Genetics 2016; 204:1267-1279. [PMID: 27672094 PMCID: PMC5105856 DOI: 10.1534/genetics.116.188250] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.3] [Received: 02/17/2016] [Accepted: 09/20/2016] [Indexed: 11/18/2022] Open
Abstract
Mutation, genetic drift, and selection are considered the main factors shaping genetic variation in nature. There is a lack, however, of general predictions accounting for the mutual interrelation between these factors. In the context of the background selection model, we provide a set of equations for the joint prediction of the effective population size and the rate of fixation of deleterious mutations, which are applicable both to sexual and asexual species. For a population of N haploid individuals and a model of deleterious mutations with effect s appearing with rate U in a genome L Morgans long, the asymptotic effective population size (Ne) and the average number of generations (T) between consecutive fixations can be approximated by [Formula: see text] and [Formula: see text] The solution is applicable to Muller's ratchet, providing satisfactory approximations to the rate of accumulation of mutations for a wide range of parameters. We also obtain predictions of the effective size accounting for the expected nucleotide diversity. Predictions for sexual populations allow for outlining the general conditions where mutational meltdown occurs. The equations can be extended to any distribution of mutational effects and the consideration of hotspots of recombination, showing that Ne is rather insensitive and not proportional to changes in N for many combinations of parameters. This could contribute to explain the observed small differences in levels of polymorphism between species with very different census sizes.
Affiliation(s)
- Enrique Santiago
  - Departamento de Biología Funcional, Facultad de Biología, Universidad de Oviedo, 33071 Oviedo, Spain
- Armando Caballero
  - Departamento de Bioquímica, Genética e Inmunología, Facultad de Biología, Universidad de Vigo, 36310 Vigo, Spain
|
21
|
Lapierre M, Blin C, Lambert A, Achaz G, Rocha EPC. The Impact of Selection, Gene Conversion, and Biased Sampling on the Assessment of Microbial Demography. Mol Biol Evol 2016; 33:1711-25. [PMID: 26931140 PMCID: PMC4915353 DOI: 10.1093/molbev/msw048] [Citation(s) in RCA: 48] [Impact Index Per Article: 5.3] [Indexed: 12/28/2022] Open
Abstract
Recent studies have linked demographic changes and epidemiological patterns in bacterial populations using coalescent-based approaches. We identified 26 studies using skyline plots and found that 21 inferred overall population expansion. This surprising result led us to analyze the impact of natural selection, recombination (gene conversion), and sampling biases on demographic inference using skyline plots and site frequency spectra (SFS). Forward simulations based on biologically relevant parameters from Escherichia coli populations showed that theoretical arguments on the detrimental impact of recombination and especially natural selection on the reconstructed genealogies cannot be ignored in practice. In fact, both processes systematically lead to spurious interpretations of population expansion in skyline plots (and in SFS for selection). Weak purifying selection, and especially positive selection, had important effects on skyline plots, showing patterns akin to those of population expansions. State-of-the-art techniques to remove recombination further amplified these biases. We simulated three common sampling biases in microbiological research: uniform, clustered, and mixed sampling. Alone, or together with recombination and selection, they further mislead demographic inferences producing almost any possible skyline shape or SFS. Interestingly, sampling sub-populations also affected skyline plots and SFS, because the coalescent rates of populations and their sub-populations had different distributions. This study suggests that extreme caution is needed to infer demographic changes solely based on reconstructed genealogies. We suggest that the development of novel sampling strategies and the joint analyses of diverse population genetic methods are strictly necessary to estimate demographic changes in populations where selection, recombination, and biased sampling are present.
Affiliation(s)
- Marguerite Lapierre
  - Atelier de Bioinformatique, UMR7205 ISYEB, MNHN-UPMC-CNRS-EPHE, Muséum National d'Histoire Naturelle, Paris, France
  - Collège de France, Center for Interdisciplinary Research in Biology (CIRB), CNRS UMR 7241, Paris, France
- Camille Blin
  - Sorbonne Universités, UPMC Univ Paris06, IFD, 4 Place Jussieu, Paris Cedex05, France
  - Institut Pasteur, Microbial Evolutionary Genomics, Paris, France
  - CNRS, UMR3525, Paris, France
- Amaury Lambert
  - Collège de France, Center for Interdisciplinary Research in Biology (CIRB), CNRS UMR 7241, Paris, France
  - UPMC Univ Paris 06, Laboratoire de Probabilités et Modèles Aléatoires (LPMA), CNRS UMR 7599, Paris, France
- Guillaume Achaz
  - Atelier de Bioinformatique, UMR7205 ISYEB, MNHN-UPMC-CNRS-EPHE, Muséum National d'Histoire Naturelle, Paris, France
  - Collège de France, Center for Interdisciplinary Research in Biology (CIRB), CNRS UMR 7241, Paris, France
- Eduardo P C Rocha
  - Institut Pasteur, Microbial Evolutionary Genomics, Paris, France
  - CNRS, UMR3525, Paris, France
|
22
|
The Effects of Background and Interference Selection on Patterns of Genetic Variation in Subdivided Populations. Genetics 2015; 201:1539-54. [PMID: 26434720 DOI: 10.1534/genetics.115.178558] [Citation(s) in RCA: 18] [Impact Index Per Article: 1.8] [Received: 06/04/2015] [Accepted: 09/24/2015] [Indexed: 11/18/2022] Open
Abstract
It is well known that most new mutations that affect fitness exert deleterious effects and that natural populations are often composed of subpopulations (demes) connected by gene flow. To gain a better understanding of the joint effects of purifying selection and population structure, we focus on a scenario where an ancestral population splits into multiple demes and study neutral diversity patterns in regions linked to selected sites. In the background selection regime of strong selection, we first derive analytic equations for pairwise coalescent times and FST as a function of time after the ancestral population splits into two demes and then construct a flexible coalescent simulator that can generate samples under complex models such as those involving multiple demes or nonconservative migration. We have carried out extensive forward simulations to show that the new methods can accurately predict diversity patterns both in the nonequilibrium phase following the split of the ancestral population and in the equilibrium between mutation, migration, drift, and selection. In the interference selection regime of many tightly linked selected sites, forward simulations provide evidence that neutral diversity patterns obtained from both the nonequilibrium and equilibrium phases may be virtually indistinguishable for models that have identical variance in fitness, but are nonetheless different with respect to the number of selected sites and the strength of purifying selection. This equivalence in neutral diversity patterns suggests that data collected from subdivided populations may have limited power for differentiating among the selective pressures to which closely linked selected sites are subject.
|
23
|
Jackson BC, Campos JL, Zeng K. The effects of purifying selection on patterns of genetic differentiation between Drosophila melanogaster populations. Heredity (Edinb) 2014; 114:163-74. [PMID: 25227256 PMCID: PMC4270736 DOI: 10.1038/hdy.2014.80] [Citation(s) in RCA: 25] [Impact Index Per Article: 2.3] [Received: 02/21/2014] [Revised: 06/16/2014] [Accepted: 07/22/2014] [Indexed: 01/21/2023] Open
Abstract
Using the data provided by the Drosophila Population Genomics Project, we investigate factors that affect the genetic differentiation between Rwandan and French populations of D. melanogaster. By examining within-population polymorphisms, we show that sites in long introns (especially those >2000 bp) have significantly lower π (nucleotide diversity) and more low-frequency variants (as measured by Tajima's D, minor allele frequencies, and prevalence of variants that are private to one of the two populations) than short introns, suggesting a positive relationship between intron length and selective constraint. A similar analysis of protein-coding polymorphisms shows that 0-fold (degenerate) sites in more conserved genes are under stronger purifying selection than those in less conserved genes. There is limited evidence that selection on codon bias has an effect on differentiation (as measured by FST) at 4-fold (degenerate) sites, and 4-fold sites and sites in 8–30 bp of short introns ⩽65 bp have comparable FST values. Consistent with the expected effect of purifying selection, sites in long introns and 0-fold sites in conserved genes are less differentiated than those in short introns and less conserved genes, respectively. Genes in non-crossover regions (for example, the fourth chromosome) have very high FST values at both 0-fold and 4-fold degenerate sites, which is probably because of the large reduction in within-population diversity caused by tight linkage between many selected sites. Our analyses also reveal subtle statistical properties of FST, which arise when information from multiple single nucleotide polymorphisms is combined and can lead to the masking of important signals of selection.
Affiliation(s)
- B C Jackson
  - Department of Animal and Plant Sciences, University of Sheffield, Sheffield, UK
- J L Campos
  - Institute of Evolutionary Biology, School of Biological Sciences, University of Edinburgh, Edinburgh, UK
- K Zeng
  - Department of Animal and Plant Sciences, University of Sheffield, Sheffield, UK
|
24
|
Abstract
Mounting evidence suggests that natural populations can harbor extensive fitness diversity with numerous genomic loci under selection. It is also known that genealogical trees for populations under selection are quantifiably different from those expected under neutral evolution and described statistically by Kingman's coalescent. While differences in the statistical structure of genealogies have long been used as a test for the presence of selection, the full extent of the information that they contain has not been exploited. Here we demonstrate that the shape of the reconstructed genealogical tree for a moderately large number of random genomic samples taken from a fitness diverse, but otherwise unstructured, asexual population can be used to predict the relative fitness of individuals within the sample. To achieve this we define a heuristic algorithm, which we test in silico, using simulations of a Wright-Fisher model for a realistic range of mutation rates and selection strength. Our inferred fitness ranking is based on a linear discriminator that identifies rapidly coalescing lineages in the reconstructed tree. Inferred fitness ranking correlates strongly with actual fitness, with a genome in the top 10% ranked being in the top 20% fittest with false discovery rate of 0.1-0.3, depending on the mutation/selection parameters. The ranking also enables us to predict the genotypes that future populations inherit from the present one. While the inference accuracy increases monotonically with sample size, samples of 200 nearly saturate the performance. We propose that our approach can be used for inferring relative fitness of genomes obtained in single-cell sequencing of tumors and in monitoring viral outbreaks.
|
25
|
Good BH, Walczak AM, Neher RA, Desai MM. Genetic diversity in the interference selection limit. PLoS Genet 2014; 10:e1004222. [PMID: 24675740 PMCID: PMC3967937 DOI: 10.1371/journal.pgen.1004222] [Citation(s) in RCA: 74] [Impact Index Per Article: 6.7] [Received: 08/22/2013] [Accepted: 01/22/2014] [Indexed: 01/23/2023] Open
Abstract
Pervasive natural selection can strongly influence observed patterns of genetic variation, but these effects remain poorly understood when multiple selected variants segregate in nearby regions of the genome. Classical population genetics fails to account for interference between linked mutations, which grows increasingly severe as the density of selected polymorphisms increases. Here, we describe a simple limit that emerges when interference is common, in which the fitness effects of individual mutations play a relatively minor role. Instead, similar to models of quantitative genetics, molecular evolution is determined by the variance in fitness within the population, defined over an effectively asexual segment of the genome (a "linkage block"). We exploit this insensitivity in a new "coarse-grained" coalescent framework, which approximates the effects of many weakly selected mutations with a smaller number of strongly selected mutations that create the same variance in fitness. This approximation generates accurate and efficient predictions for silent site variability when interference is common. However, these results suggest that there is reduced power to resolve individual selection pressures when interference is sufficiently widespread, since a broad range of parameters possess nearly identical patterns of silent site variability.
Affiliation(s)
- Benjamin H. Good
  - Departments of Organismic and Evolutionary Biology and of Physics, Harvard University, Cambridge, Massachusetts, United States of America
  - FAS Center for Systems Biology, Harvard University, Cambridge, Massachusetts, United States of America
- Richard A. Neher
  - Max Planck Institute for Developmental Biology, Tübingen, Germany
- Michael M. Desai
  - Departments of Organismic and Evolutionary Biology and of Physics, Harvard University, Cambridge, Massachusetts, United States of America
  - FAS Center for Systems Biology, Harvard University, Cambridge, Massachusetts, United States of America
|
26
|
Neher RA. Genetic Draft, Selective Interference, and Population Genetics of Rapid Adaptation. Annual Review of Ecology, Evolution, and Systematics 2013. [DOI: 10.1146/annurev-ecolsys-110512-135920] [Citation(s) in RCA: 134] [Impact Index Per Article: 11.2] [Indexed: 11/09/2022]
Affiliation(s)
- Richard A. Neher
  - Max Planck Institute for Developmental Biology, Tübingen 72070, Germany
|
27
|
Rasmussen DA, Boni MF, Koelle K. Reconciling phylodynamics with epidemiology: the case of dengue virus in southern Vietnam. Mol Biol Evol 2013; 31:258-71. [PMID: 24150038 PMCID: PMC3907054 DOI: 10.1093/molbev/mst203] [Citation(s) in RCA: 36] [Impact Index Per Article: 3.0] [Indexed: 12/22/2022] Open
Abstract
Coalescent methods are widely used to infer the demographic history of populations from gene genealogies. These approaches—often referred to as phylodynamic methods—have proven especially useful for reconstructing the dynamics of rapidly evolving viral pathogens. Yet, population dynamics inferred from viral genealogies often differ widely from those observed from other sources of epidemiological data, such as hospitalization records. We demonstrate how a modeling framework that allows for the direct fitting of mechanistic epidemiological models to genealogies can be used to test different hypotheses about what ecological factors cause phylodynamic inferences to differ from observed dynamics. We use this framework to test different hypotheses about why dengue serotype 1 (DENV-1) population dynamics in southern Vietnam inferred using existing phylodynamic methods differ from hospitalization data. Specifically, we consider how factors such as seasonality, vector dynamics, and spatial structure can affect inferences drawn from genealogies. The coalescent models we derive to take into account vector dynamics and spatial structure reveal that these ecological complexities can substantially affect coalescent rates among lineages. We show that incorporating these additional ecological complexities into coalescent models can also greatly improve estimates of historical population dynamics and lead to new insights into the factors shaping viral genealogies.
|
28
|
Abstract
Purifying selection at many linked sites alters patterns of molecular evolution, reducing overall diversity and distorting the shapes of genealogies. Recombination attenuates these effects; however, purifying selection can significantly distort genealogies even for substantial recombination rates. Here, we show that when selection and/or recombination are sufficiently strong, the genealogy at any single site can be described by a time-dependent effective population size, Ne(t), which has a simple analytic form. Our results illustrate how recombination reduces distortions in genealogies and allow us to quantitatively describe the shapes of genealogies in the presence of strong purifying selection and recombination. We also analyze the effects of a distribution of selection coefficients across the genome.
|
29
|
Purifying selection causes widespread distortions of genealogical structure on the human X chromosome. Genetics 2013; 194:485-92. [PMID: 23589459 DOI: 10.1534/genetics.113.152074] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.6] [Indexed: 12/28/2022] Open
Abstract
The extent to which selective forces shape patterns of genetic and genealogical variation is unknown in many species. Recent theoretical models have suggested that even relatively weak purifying selection may produce significant distortions in gene genealogies, but few studies have sought to quantify this effect in humans. Here, we employ a reconstruction method based on the ancestral recombination graph to infer genealogies across the length of the human X chromosome and to examine time to most recent common ancestor (TMRCA) and measures of tree imbalance at both broad and very fine scales. In agreement with theory, TMRCA is significantly reduced and genealogies are significantly more imbalanced in coding regions and introns when compared to intergenic regions, and these effects are increased in areas of greater evolutionary constraint. These distortions are present at multiple scales, and chromosomal regions as broad as 5 Mb show a significant negative correlation in TMRCA with exon density. We also show that areas of recent TMRCA are significantly associated with the disease-causing potential of site as measured by the MutationTaster prediction algorithm. Together, these findings suggest that purifying selection has significantly distorted human genealogical structure on both broad and fine scales and that few chromosomal regions escape selection-induced distortions.
|
30
|
Good BH, Desai MM. Fluctuations in fitness distributions and the effects of weak linked selection on sequence evolution. Theor Popul Biol 2013; 85:86-102. [PMID: 23337315 DOI: 10.1016/j.tpb.2013.01.005] [Citation(s) in RCA: 18] [Impact Index Per Article: 1.5] [Received: 11/30/2012] [Revised: 01/02/2013] [Accepted: 01/11/2013] [Indexed: 02/02/2023]
Abstract
Evolutionary dynamics and patterns of molecular evolution are strongly influenced by selection on linked regions of the genome, but our quantitative understanding of these effects remains incomplete. Recent work has focused on predicting the distribution of fitness within an evolving population, and this forms the basis for several methods that leverage the fitness distribution to predict the patterns of genetic diversity when selection is strong. However, in weakly selected populations random fluctuations due to genetic drift are more severe, and neither the distribution of fitness nor the sequence diversity within the population are well understood. Here, we briefly review the motivations behind the fitness-distribution picture, and summarize the general approaches that have been used to analyze this distribution in the strong-selection regime. We then extend these approaches to the case of weak selection, by outlining a perturbative treatment of selection at a large number of linked sites. This allows us to quantify the stochastic behavior of the fitness distribution and yields exact analytical predictions for the sequence diversity and substitution rate in the limit that selection is weak.
Affiliation(s)
- Benjamin H Good
- Department of Organismic and Evolutionary Biology, Department of Physics, and FAS Center for Systems Biology, Harvard University, United States
|
31
|
|
32
|
Abstract
The genetic diversity of a species is shaped by its recent evolutionary history and can be used to infer demographic events or selective sweeps. Most inference methods are based on the null hypothesis that natural selection is a weak or infrequent evolutionary force. However, many species, particularly pathogens, are under continuous pressure to adapt in response to changing environments. A statistical framework for inference from diversity data of such populations is currently lacking. Towards this goal, we explore the properties of genealogies in a model of continual adaptation in asexual populations. We show that lineages trace back to a small pool of highly fit ancestors, in which almost simultaneous coalescence of more than two lineages frequently occurs. Whereas such multiple mergers are unlikely under the neutral coalescent, they create a unique genetic footprint in adapting populations. The site frequency spectrum of derived neutral alleles, for example, is nonmonotonic and has a peak at high frequencies, whereas Tajima's D becomes more and more negative with increasing sample size. Because multiple merger coalescents emerge in many models of rapid adaptation, we argue that they should be considered as a null model for adapting populations.
|
33
|
Wu S, Koelle K, Rodrigo A. Coalescent entanglement and the conditional dependence of the times to common ancestry of mutually exclusive pairs of individuals. J Hered 2012; 104:86-91. [PMID: 23077234 DOI: 10.1093/jhered/ess074] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Indexed: 11/13/2022] Open
Abstract
The Kingman coalescent is a continuous-time diffusion approximation of the times to common ancestry of a sample of individuals drawn from a Wright-Fisher population. Here, we use the coalescent to answer a simple question: if we know the ancestry of 2 randomly sampled individuals in the population, what does it tell us about the ancestry of 2 other randomly sampled individuals? We show that there is a conditional dependency between the times to common ancestry between pairs of randomly sampled individuals. We call this "coalescent entanglement," and we demonstrate its effects through simulation. The effects of entanglement extend beyond the coalescent to phylogenetic birth-death processes in general. Entanglement also exerts its effects when the pairs of individuals chosen share no common lineages in the paths that connect the individuals in each pair.
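As a rough illustration of the Kingman coalescent that this abstract builds on, the sketch below draws the waiting times between coalescence events for a sample of n lineages. It assumes the standard time scaling in which k lineages coalesce at rate k(k-1)/2; the function names `simulate_coalescent_times` and `expected_tmrca` are hypothetical and do not come from the paper. It shows only the basic process, not the entanglement analysis itself.

```python
import random


def simulate_coalescent_times(n, rng):
    """Return the waiting times while k = n, n-1, ..., 2 lineages remain.

    Time is measured in coalescent units (an assumption of this sketch),
    so the waiting time with k lineages is exponential with rate k(k-1)/2.
    """
    times = []
    for k in range(n, 1, -1):
        rate = k * (k - 1) / 2
        times.append(rng.expovariate(rate))
    return times


def expected_tmrca(n):
    """Expected time to the most recent common ancestor: 2 * (1 - 1/n)."""
    return 2.0 * (1.0 - 1.0 / n)


# Compare the simulated mean TMRCA with the analytic expectation.
rng = random.Random(1)
replicates = [sum(simulate_coalescent_times(10, rng)) for _ in range(20000)]
mean_tmrca = sum(replicates) / len(replicates)
print(mean_tmrca, expected_tmrca(10))
```

The simulated mean should sit close to the analytic value; studying dependence between pairs would additionally require tracking which lineages merge at each event.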
Affiliation(s)
- Steven Wu
- Department of Biology, Duke University, Box 90338, Durham, NC 27708, USA
|
34
|
Irwin DE. Local Adaptation along Smooth Ecological Gradients Causes Phylogeographic Breaks and Phenotypic Clustering. Am Nat 2012; 180:35-49. [DOI: 10.1086/666002] [Citation(s) in RCA: 49] [Impact Index Per Article: 3.8] [Indexed: 11/03/2022]
|
35
|
Nicolaisen LE, Desai MM. Distortions in genealogies due to purifying selection. Mol Biol Evol 2012; 29:3589-600. [PMID: 22729750 DOI: 10.1093/molbev/mss170] [Citation(s) in RCA: 30] [Impact Index Per Article: 2.3] [Indexed: 12/12/2022] Open
Abstract
Purifying selection can substantially alter patterns of molecular evolution. Its main effect is to reduce overall levels of genetic variation, leading to a reduced effective population size. However, it also distorts genealogies relative to neutral expectations. A structured coalescent method has been used to describe this effect, and forms the basis for numerical methods and simulations. In this study, we extend this approach by making the additional approximation that lineages may be treated independently, which is valid only in the strong selection regime. We show that in this regime, the distortions due to purifying selection can be described by a time-dependent effective population size and mutation rate, confirming earlier intuition. We calculate simple analytical expressions for these functions, Ne(t) and Ue(t). These results allow us to describe the structure of genealogies in a population under strong purifying selection as equivalent to a purely neutral population with varying population size and mutation rate, thereby enabling the use of neutral methods of inference and estimation for populations in the strong selection regime.
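To make the idea of a time-dependent effective population size concrete, here is a minimal sketch of how a given Ne(t) shapes the distribution of pairwise coalescence times. It assumes a haploid, discrete-generation model in which two lineages coalesce in generation t (counted backwards from the present) with probability 1/Ne(t). The function names and the step-shaped `shrinking` history are hypothetical illustrations, not the analytic Ne(t) derived in the paper.

```python
def pairwise_coalescence_pmf(ne_of_t, max_generations):
    """Return P(T = t) for t = 1..max_generations.

    ne_of_t(t) gives the effective size t generations back in time
    (assumed haploid, so the per-generation coalescence chance is 1/Ne).
    """
    pmf = []
    surviving = 1.0  # probability the pair has not yet coalesced
    for t in range(1, max_generations + 1):
        p = 1.0 / ne_of_t(t)
        pmf.append(surviving * p)
        surviving *= 1.0 - p
    return pmf


def mean_coalescence_time(ne_of_t, max_generations):
    """Mean pairwise coalescence time, truncated at max_generations."""
    pmf = pairwise_coalescence_pmf(ne_of_t, max_generations)
    return sum(t * q for t, q in enumerate(pmf, start=1))


def constant(t):
    """Effective size that stays at 1000 throughout."""
    return 1000.0


def shrinking(t):
    """Hypothetical history: Ne drops from 1000 to 250 beyond 200 generations."""
    return 1000.0 if t <= 200 else 250.0


print(mean_coalescence_time(constant, 20000))
print(mean_coalescence_time(shrinking, 20000))
```

An effective size that declines going back in time shortens the mean coalescence time relative to the constant case, which is the qualitative signature the abstract attributes to purifying selection.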
|
36
|
Charlesworth B. The effects of deleterious mutations on evolution at linked sites. Genetics 2012; 190:5-22. [PMID: 22219506 PMCID: PMC3249359 DOI: 10.1534/genetics.111.134288] [Citation(s) in RCA: 201] [Impact Index Per Article: 15.5] [Received: 08/26/2011] [Accepted: 11/04/2011] [Indexed: 01/14/2023] Open
Abstract
The process of evolution at a given site in the genome can be influenced by the action of selection at other sites, especially when these are closely linked to it. Such selection reduces the effective population size experienced by the site in question (the Hill-Robertson effect), reducing the level of variability and the efficacy of selection. In particular, deleterious variants are continually being produced by mutation and then eliminated by selection at sites throughout the genome. The resulting reduction in variability at linked neutral or nearly neutral sites can be predicted from the theory of background selection, which assumes that deleterious mutations have such large effects that their behavior in the population is effectively deterministic. More weakly selected mutations can accumulate by Muller's ratchet after a shutdown of recombination, as in an evolving Y chromosome. Many functionally significant sites are probably so weakly selected that Hill-Robertson interference undermines the effective strength of selection upon them, when recombination is rare or absent. This leads to large departures from deterministic equilibrium and smaller effects on linked neutral sites than under background selection or Muller's ratchet. Evidence is discussed that is consistent with the action of these processes in shaping genome-wide patterns of variation and evolution.
Affiliation(s)
- Brian Charlesworth
- Institute of Evolutionary Biology, School of Biological Sciences, University of Edinburgh, Edinburgh EH9 3JT, United Kingdom.
|
37
|
The structure of allelic diversity in the presence of purifying selection. Theor Popul Biol 2011; 81:144-57. [PMID: 22198521 DOI: 10.1016/j.tpb.2011.12.002] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.6] [Received: 09/13/2011] [Revised: 12/05/2011] [Accepted: 12/06/2011] [Indexed: 11/24/2022]
Abstract
In the absence of selection, the structure of equilibrium allelic diversity is described by the elegant sampling formula of Ewens. This formula has helped to shape our expectations of empirical patterns of molecular variation. Along with coalescent theory, it provides statistical techniques for rejecting the null model of neutrality. However, we still do not fully understand the statistics of the allelic diversity expected in the presence of natural selection. Earlier work has described the effects of strongly deleterious mutations linked to many neutral sites, and allelic variation in models where offspring fitness is unrelated to parental fitness, but it has proven difficult to understand allelic diversity in the presence of purifying selection at many linked sites. Here, we study the population genetics of infinitely many perfectly linked sites, some neutral and some deleterious. Our approach is based on studying the lineage structure within each class of individuals of similar fitness in the deleterious mutation-selection balance. Consistent with previous observations, we find that for moderate and weak selection pressures, the patterns of allelic diversity cannot be described by a neutral model for any choice of the effective population size. We compute precisely how purifying selection at many linked sites distorts the patterns of allelic diversity, by developing expressions for the likelihood of any configuration of allelic types in a sample analogous to the Ewens sampling formula.
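For reference, the neutral Ewens sampling formula that this abstract takes as its baseline can be evaluated directly. The sketch below assumes the usual parameterization, in which `counts[j-1]` is the number of allelic types seen exactly j times in the sample and `theta` is the scaled mutation rate; the function name `ewens_probability` is a hypothetical choice, and the code covers only the neutral case, not the paper's extension to purifying selection.

```python
from math import factorial


def ewens_probability(counts, theta):
    """Neutral Ewens sampling probability of an allelic configuration.

    counts[j-1] is the number of allelic types represented exactly j
    times; the sample size n is sum(j * counts[j-1]).
    """
    n = sum(j * a for j, a in enumerate(counts, start=1))
    # Rising factorial theta * (theta + 1) * ... * (theta + n - 1).
    rising = 1.0
    for i in range(n):
        rising *= theta + i
    prob = factorial(n) / rising
    for j, a in enumerate(counts, start=1):
        prob *= theta ** a / (j ** a * factorial(a))
    return prob


# Sample of size 3: three singletons, one singleton plus one doubleton,
# or a single type seen three times. The probabilities sum to one.
configs = [[3, 0, 0], [1, 1, 0], [0, 0, 1]]
print([ewens_probability(c, 2.5) for c in configs])
```

Summing over every configuration of a fixed sample size gives a quick sanity check on the normalization.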
|
38
|
Kühnert D, Wu CH, Drummond AJ. Phylogenetic and epidemic modeling of rapidly evolving infectious diseases. Infect Genet Evol 2011; 11:1825-41. [PMID: 21906695 PMCID: PMC7106223 DOI: 10.1016/j.meegid.2011.08.005] [Citation(s) in RCA: 51] [Impact Index Per Article: 3.6] [Received: 04/01/2011] [Revised: 08/09/2011] [Accepted: 08/09/2011] [Indexed: 12/23/2022]
Abstract
Epidemic modeling of infectious diseases has a long history in both theoretical and empirical research. However, the recent explosion of genetic data has revealed the rapid rate of evolution that many populations of infectious agents undergo and has underscored the need to consider both evolutionary and ecological processes on the same time scale. Mathematical epidemiology has applied dynamical models to study infectious epidemics, but these models have tended not to exploit, or take into account, evolutionary changes and their effect on the ecological processes and population dynamics of the infectious agent. On the other hand, statistical phylogenetics has increasingly been applied to the study of infectious agents. This approach is based on phylogenetics, molecular clocks, genealogy-based population genetics and phylogeography. Bayesian Markov chain Monte Carlo and related computational tools have been the primary source of advances in these statistical phylogenetic approaches. Recently, the first tentative steps have been taken to reconcile these two theoretical approaches. We survey the Bayesian phylogenetic approach to epidemic modeling of infectious diseases, describe the contrasts it provides to mathematical epidemiology, and emphasize the significance of the future unification of these two fields.
|
39
|
The structure of genealogies in the presence of purifying selection: a fitness-class coalescent. Genetics 2011; 190:753-79. [PMID: 22135349 DOI: 10.1534/genetics.111.134544] [Citation(s) in RCA: 43] [Impact Index Per Article: 3.1] [Indexed: 11/18/2022] Open
Abstract
Compared to a neutral model, purifying selection distorts the structure of genealogies and hence alters the patterns of sampled genetic variation. Although these distortions may be common in nature, our understanding of how we expect purifying selection to affect patterns of molecular variation remains incomplete. Genealogical approaches such as coalescent theory have proven difficult to generalize to situations involving selection at many linked sites, unless selection pressures are extremely strong. Here, we introduce an effective coalescent theory (a "fitness-class coalescent") to describe the structure of genealogies in the presence of purifying selection at many linked sites. We use this effective theory to calculate several simple statistics describing the expected patterns of variation in sequence data, both at the sites under selection and at linked neutral sites. Our analysis combines a description of the allele frequency spectrum in the presence of purifying selection with the structured coalescent approach of Kaplan et al. (1988), to trace the ancestry of individuals through the distribution of fitnesses within the population. We also derive our results using a more direct extension of the structured coalescent approach of Hudson and Kaplan (1994). We find that purifying selection leads to patterns of genetic variation that are related but not identical to a neutrally evolving population in which population size has varied in a specific way in the past.
|
40
|
Lohmueller KE, Albrechtsen A, Li Y, Kim SY, Korneliussen T, Vinckenbosch N, Tian G, Huerta-Sanchez E, Feder AF, Grarup N, Jørgensen T, Jiang T, Witte DR, Sandbæk A, Hellmann I, Lauritzen T, Hansen T, Pedersen O, Wang J, Nielsen R. Natural selection affects multiple aspects of genetic variation at putatively neutral sites across the human genome. PLoS Genet 2011; 7:e1002326. [PMID: 22022285 PMCID: PMC3192825 DOI: 10.1371/journal.pgen.1002326] [Citation(s) in RCA: 110] [Impact Index Per Article: 7.9] [Received: 02/03/2011] [Accepted: 08/16/2011] [Indexed: 12/30/2022] Open
Abstract
A major question in evolutionary biology is how natural selection has shaped patterns of genetic variation across the human genome. Previous work has documented a reduction in genetic diversity in regions of the genome with low recombination rates. However, it is unclear whether other summaries of genetic variation, like allele frequencies, are also correlated with recombination rate and whether these correlations can be explained solely by negative selection against deleterious mutations or whether positive selection acting on favorable alleles is also required. Here we attempt to address these questions by analyzing three different genome-wide resequencing datasets from European individuals. We document several significant correlations between different genomic features. In particular, we find that average minor allele frequency and diversity are reduced in regions of low recombination and that human diversity, human-chimp divergence, and average minor allele frequency are reduced near genes. Population genetic simulations show that either positive natural selection acting on favorable mutations or negative natural selection acting against deleterious mutations can explain these correlations. However, models with strong positive selection on nonsynonymous mutations and little negative selection predict a stronger negative correlation between neutral diversity and nonsynonymous divergence than observed in the actual data, supporting the importance of negative, rather than positive, selection throughout the genome. Further, we show that the widespread presence of weakly deleterious alleles, rather than a small number of strongly positively selected mutations, is responsible for the correlation between neutral genetic diversity and recombination rate. This work suggests that natural selection has affected multiple aspects of linked neutral variation throughout the human genome and that positive selection is not required to explain these observations.
Affiliation(s)
- Kirk E Lohmueller
- Department of Integrative Biology, University of California Berkeley, Berkeley, California, United States of America.
|
41
|
The joint effects of background selection and genetic recombination on local gene genealogies. Genetics 2011; 189:251-66. [PMID: 21705759 DOI: 10.1534/genetics.111.130575] [Citation(s) in RCA: 51] [Impact Index Per Article: 3.6] [Indexed: 11/18/2022] Open
Abstract
Background selection, the effects of the continual removal of deleterious mutations by natural selection on variability at linked sites, is potentially a major determinant of DNA sequence variability. However, the joint effects of background selection and genetic recombination on the shape of the neutral gene genealogy have proved hard to study analytically. The only existing formula concerns the mean coalescent time for a pair of alleles, making it difficult to assess the importance of background selection from genome-wide data on sequence polymorphism. Here we develop a structured coalescent model of background selection with recombination and implement it in a computer program that efficiently generates neutral gene genealogies for an arbitrary sample size. We check the validity of the structured coalescent model against forward-in-time simulations and show that it accurately captures the effects of background selection. The model produces more accurate predictions of the mean coalescent time than the existing formula and supports the conclusion that the effect of background selection is greater in the interior of a deleterious region than at its boundaries. The level of linkage disequilibrium between sites is elevated by background selection, to an extent that is well summarized by a change in effective population size. The structured coalescent model is readily extendable to more realistic situations and should prove useful for analyzing genome-wide polymorphism data.
|
42
|
Ho SYW, Lanfear R, Bromham L, Phillips MJ, Soubrier J, Rodrigo AG, Cooper A. Time-dependent rates of molecular evolution. Mol Ecol 2011; 20:3087-101. [PMID: 21740474 DOI: 10.1111/j.1365-294x.2011.05178.x] [Citation(s) in RCA: 364] [Impact Index Per Article: 26.0] [Indexed: 12/15/2022]
Abstract
For over half a century, it has been known that the rate of morphological evolution appears to vary with the time frame of measurement. Rates of microevolutionary change, measured between successive generations, were found to be far higher than rates of macroevolutionary change inferred from the fossil record. More recently, it has been suggested that rates of molecular evolution are also time dependent, with the estimated rate depending on the timescale of measurement. This followed surprising observations that estimates of mutation rates, obtained in studies of pedigrees and laboratory mutation-accumulation lines, exceeded long-term substitution rates by an order of magnitude or more. Although a range of studies have provided evidence for such a pattern, the hypothesis remains relatively contentious. Furthermore, there is ongoing discussion about the factors that can cause molecular rate estimates to be dependent on time. Here we present an overview of our current understanding of time-dependent rates. We provide a summary of the evidence for time-dependent rates in animals, bacteria and viruses. We review the various biological and methodological factors that can cause rates to be time dependent, including the effects of natural selection, calibration errors, model misspecification and other artefacts. We also describe the challenges in calibrating estimates of molecular rates, particularly on the intermediate timescales that are critical for an accurate characterization of time-dependent rates. This has important consequences for the use of molecular-clock methods to estimate timescales of recent evolutionary events.
Affiliation(s)
- Simon Y W Ho
- Centre for Macroevolution and Macroecology, Evolution Ecology & Genetics, Research School of Biology, Australian National University, Canberra, ACT, Australia.
|
43
|
O'Fallon BD. A method for accurate inference of population size from serially sampled genealogies distorted by selection. Mol Biol Evol 2011; 28:3171-81. [PMID: 21680870 DOI: 10.1093/molbev/msr153] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.2] [Indexed: 12/15/2022] Open
Abstract
The serial coalescent extends traditional coalescent theory to include genealogies in which not all individuals were sampled at the same time. Inference in this framework is powerful because population size and evolutionary rate may be estimated independently. However, when the sequences in question are affected by selection acting at many sites, the genealogies may differ significantly from their neutral expectation, and inference of demographic parameters may become inaccurate. I demonstrate that this inaccuracy is severe when the mutation rate and strength of selection are jointly large, and I develop a new likelihood calculation that, while approximate, improves the accuracy of population size estimates. When used in a Bayesian parameter estimation context, the new calculation allows for estimation of the shape of the pairwise coalescent rate function and can be used to detect the presence of selection acting at many sites in a sequence. Using the new method, I investigate two sets of dengue virus sequences from Puerto Rico and Thailand, and show that both genealogies are likely to have been distorted by selection.
|
44
|
Cartwright RA, Lartillot N, Thorne JL. History can matter: non-Markovian behavior of ancestral lineages. Syst Biol 2011; 60:276-90. [PMID: 21398626 DOI: 10.1093/sysbio/syr012] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.4] [Indexed: 11/14/2022] Open
Abstract
Although most of the important evolutionary events in the history of biology can only be studied via interspecific comparisons, it is challenging to apply the rich body of population genetic theory to the study of interspecific genetic variation. Probabilistic modeling of the substitution process would ideally be derived from first principles of population genetics, allowing a quantitative connection to be made between the parameters describing mutation, selection, drift, and the patterns of interspecific variation. There has been progress in reconciling population genetics and interspecific evolution for the case where mutation rates are sufficiently low, but when mutation rates are higher, reconciliation has been hampered due to complications from how the loss or fixation of new mutations can be influenced by linked nonneutral polymorphisms (i.e., the Hill-Robertson effect). To investigate the generation of interspecific genetic variation when concurrent fitness-affecting polymorphisms are common and the Hill-Robertson effect is thereby potentially strong, we used the Wright-Fisher model of population genetics to simulate very many generations of mutation, natural selection, and genetic drift. This was done so that the chronological history of advantageous, deleterious, and neutral substitutions could be traced over time along the ancestral lineage. Our simulations show that the process by which a nonrecombining sequence changes over time can markedly deviate from the Markov assumption that is ubiquitous in molecular phylogenetics. In particular, we find tendencies for advantageous substitutions to be followed by deleterious ones and for deleterious substitutions to be followed by advantageous ones. Such non-Markovian patterns reflect the fact that the fate of the ancestral lineage depends not only on its current allelic state but also on gene copies not belonging to the ancestral lineage. 
Although our simulations describe nonrecombining sequences, we conclude by discussing how non-Markovian behavior of the ancestral lineage is plausible even when recombination rates are not low. As a result, we believe that increased attention needs to be devoted to the robustness of evolutionary inference procedures that rely upon the Markov assumption.
Affiliation(s)
- Reed A Cartwright
- Department of Genetics, Bioinformatics Research Center, North Carolina State University, Raleigh, NC 27695-7566, USA
|
45
|
Contact heterogeneity and phylodynamics: how contact networks shape parasite evolutionary trees. Interdiscip Perspect Infect Dis 2010; 2011:238743. [PMID: 21151699 PMCID: PMC2995904 DOI: 10.1155/2011/238743] [Citation(s) in RCA: 20] [Impact Index Per Article: 1.3] [Received: 06/04/2010] [Accepted: 10/15/2010] [Indexed: 12/12/2022] Open
Abstract
The inference of population dynamics from molecular sequence data is becoming an important new method for the surveillance of infectious diseases. Here, we examine how heterogeneity in contact shapes the genealogies of parasitic agents. Using extensive simulations, we find that contact heterogeneity can have a strong effect on how the structure of genealogies reflects epidemiologically relevant quantities such as the proportion of a population that is infected. Comparing the simulations to BEAST reconstructions, we also find that contact heterogeneity can increase the number of sequence isolates required to estimate these quantities over the course of an epidemic. Our results suggest that data about contact-network structure will be required in addition to sequence data for accurate estimation of a parasitic agent's genealogy. We conclude that network models will be important for progress in this area.
|
46
|
O'Fallon BD. A method to correct for the effects of purifying selection on genealogical inference. Mol Biol Evol 2010; 27:2406-16. [PMID: 20513741 DOI: 10.1093/molbev/msq132] [Citation(s) in RCA: 12] [Impact Index Per Article: 0.8] [Indexed: 12/20/2022] Open
Abstract
Accurate reconstruction of the divergence times among individuals is an essential step toward inferring population parameters from genetic data. However, our ability to reconstruct accurate genealogies is often thwarted by the evolutionary forces we hope to detect, most prominently natural selection. Here, I demonstrate that purifying selection acting at many linked sites can systematically bias current methods of genealogical reconstruction, and I present a new method that corrects for this bias by allowing a class of sites to have a time-dependent rate. The parameters influencing the time dependency can be estimated from the data, allowing for a general method to detect the presence of selected sites and correcting for their distortion of the apparent mutation rate. The method works well under a variety of scenarios, including gamma-distributed selection coefficients as well as entirely neutral evolution. I also compare the performance of the new method to relaxed clock models, and I demonstrate the method on a data set from the mitochondrion of the North Atlantic whale-"louse" Cyamus ovalis.
|
47
|
Gene genealogies strongly distorted by weakly interfering mutations in constant environments. Genetics 2009; 184:529-45. [PMID: 19966069 DOI: 10.1534/genetics.109.103556] [Citation(s) in RCA: 48] [Impact Index Per Article: 3.0] [Indexed: 11/18/2022] Open
Abstract
Neutral nucleotide diversity does not scale with population size as expected, and this "paradox of variation" is especially severe for animal mitochondria. Adaptive selective sweeps are often proposed as a major cause, but a plausible alternative is selection against large numbers of weakly deleterious mutations subject to Hill-Robertson interference. The mitochondrial genealogies of several species of whale lice (Amphipoda: Cyamus) are consistently too short relative to neutral-theory expectations, and they are also distorted in shape (branch-length proportions) and topology (relative sister-clade sizes). This pattern is not easily explained by adaptive sweeps or demographic history, but it can be reproduced in models of interference among forward and back mutations at large numbers of sites on a nonrecombining chromosome. A coalescent simulation algorithm was used to study this model over a wide range of parameter values. The genealogical distortions are all maximized when the selection coefficients are of critical intermediate sizes, such that Muller's ratchet begins to turn. In this regime, linked neutral nucleotide diversity becomes nearly insensitive to N. Mutations of this size dominate the dynamics even if there are also large numbers of more strongly and more weakly selected sites in the genome. A genealogical perspective on Hill-Robertson interference leads directly to a generalized background-selection model in which the effective population size is progressively reduced going back in time from the present.
|