1
Hobolth A, Boitard S, Futschik A, Leblois R. A matrix-analytical sampling formula for time-homogeneous coalescent processes under the infinite sites mutation model. Theor Popul Biol 2025; 163:62-79. [PMID: 40180224 DOI: 10.1016/j.tpb.2025.03.002] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/23/2024] [Revised: 01/31/2025] [Accepted: 03/02/2025] [Indexed: 04/05/2025]
Abstract
In this paper we develop a general framework for calculating the probability of a genetic sample under a time-homogeneous coalescent process and the infinite sites mutation model. The evolutionary model that we consider can be characterized as a two-step procedure: A coalescent process that describes the ancestral relatedness of the samples and a sprinkling of mutations in separate sites on the ancestral tree according to a Poisson process. The coalescent process is defined using multivariate phase-type theory. The requirements are a rate matrix that determines the transition rates between the ancestral states, an initial state probability vector, and a reward matrix that informs about the characteristics of the ancestral states. For example, the reward matrix could contain information about the number of singleton, doubleton or higher-order lineages in the ancestral states. We analyze the probability generating function for the evolutionary model as a function of the initial state probability vector, the transition rate matrix, the reward matrix, and the mutation rate. The matrix-analytical expression of the probability generating function allows us to develop a general method for calculating the probability of a population genetic data set. We demonstrate that the method is computationally attractive for a small number of mutations and provide a simple and easy-to-implement algorithm for determining the probability of a sample from the evolutionary model. The method is computationally stable and only involves a single inverse matrix operation, matrix multiplications and matrix additions. We provide comprehensive understanding of the procedure by detailed calculations and discussions of several elementary examples. These examples include different sample representations (labeled samples and the site frequency spectrum) and different demographic and genetic models (the structured coalescent and the Beta-coalescent). 
We apply the sampling formula to calculate probabilities of spectra for the Kingman coalescent and the Beta-coalescent. Even for a small number of samples and mutations we find that the probabilities for spectra vary in huge orders of magnitudes. We compare the probabilities of the spectra to the values of Tajima's D-statistics, and find that the D-statistic is a poor predictor for the probability of a spectrum. Finally, we investigate how the probabilities of the spectra vary with the parametrization of the Beta-coalescent.
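The phase-type ingredients named in the abstract (an initial state vector, a rate matrix and a reward vector) can be illustrated with a minimal sketch. It is not the paper's sampling formula: it only evaluates the standard phase-type identity E[reward] = α(−S)⁻¹r for the Kingman coalescent, with states indexed by the number of ancestral lineages. The function names and the restriction to Kingman rates are assumptions made here for illustration.

```python
from fractions import Fraction
from math import comb


def kingman_rate_matrix(n):
    """Sub-intensity matrix S over transient states n, n-1, ..., 2 lineages."""
    size = n - 1
    S = [[Fraction(0)] * size for _ in range(size)]
    for i in range(size):
        k = n - i  # number of lineages in state i
        S[i][i] = Fraction(-comb(k, 2))
        if i + 1 < size:
            S[i][i + 1] = Fraction(comb(k, 2))
    return S


def solve(A, b):
    """Solve A x = b exactly by Gauss-Jordan elimination on Fractions."""
    m = len(A)
    M = [list(A[i]) + [Fraction(b[i])] for i in range(m)]
    for col in range(m):
        pivot = next(r for r in range(col, m) if M[r][col] != 0)
        M[col], M[pivot] = M[pivot], M[col]
        lead = M[col][col]
        M[col] = [v / lead for v in M[col]]
        for r in range(m):
            if r != col and M[r][col] != 0:
                factor = M[r][col]
                M[r] = [vr - factor * vc for vr, vc in zip(M[r], M[col])]
    return [M[i][m] for i in range(m)]


def expected_reward(n, reward):
    """alpha (-S)^{-1} r with alpha = (1, 0, ..., 0): start with n lineages."""
    S = kingman_rate_matrix(n)
    neg_S = [[-v for v in row] for row in S]
    x = solve(neg_S, reward)  # x = (-S)^{-1} r
    return x[0]               # alpha picks out the first component


# Reward = number of lineages in each state gives the expected total branch length.
print(expected_reward(4, [4, 3, 2]))
# Reward = 1 in every state gives the expected time to the most recent common ancestor.
print(expected_reward(4, [1, 1, 1]))
```

Exact rational arithmetic is used so that the results can be compared directly with the closed forms 2·Σ_{k=1}^{n-1} 1/k for the total branch length and 2(1 − 1/n) for the TMRCA.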
Affiliation(s)
- Asger Hobolth
  - Department of Mathematics, Aarhus University, Aarhus, Denmark.
- Simon Boitard
  - CBGP, INRAE, CIRAD, IRD, Institut Agro, Univ Montpellier, Montpellier, France.
- Andreas Futschik
  - Institute of Applied Statistics, Johannes Kepler University, Linz, Austria.
- Raphael Leblois
  - CBGP, INRAE, CIRAD, IRD, Institut Agro, Univ Montpellier, Montpellier, France.
2
Patel RA, Weiß CL, Zhu H, Mostafavi H, Simons YB, Spence JP, Pritchard JK. Characterizing selection on complex traits through conditional frequency spectra. Genetics 2025; 229:iyae210. [PMID: 39691067 PMCID: PMC12005249 DOI: 10.1093/genetics/iyae210] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/24/2024] [Revised: 11/18/2024] [Accepted: 12/03/2024] [Indexed: 12/19/2024] [Open access]
Abstract
Natural selection on complex traits is difficult to study in part due to the ascertainment inherent to genome-wide association studies (GWAS). The power to detect a trait-associated variant in GWAS is a function of its frequency and effect size - but for traits under selection, the effect size of a variant determines the strength of selection against it, constraining its frequency. Recognizing the biases inherent to GWAS ascertainment, we propose studying the joint distribution of allele frequencies across populations, conditional on the frequencies in the GWAS cohort. Before considering these conditional frequency spectra, we first characterized the impact of selection and non-equilibrium demography on allele frequency dynamics forwards and backwards in time. We then used these results to understand conditional frequency spectra under realistic human demography. Finally, we investigated empirical conditional frequency spectra for GWAS variants associated with 106 complex traits, finding compelling evidence for either stabilizing or purifying selection. Our results provide insights into polygenic score portability and other properties of variants ascertained with GWAS, highlighting the utility of conditional frequency spectra.
Affiliation(s)
- Roshni A Patel
  - Department of Genetics, Stanford University, Stanford, CA 94305, USA
  - Department of Quantitative and Computational Biology, University of Southern California, Los Angeles, CA 90089, USA
- Clemens L Weiß
  - Stanford Cancer Institute Core, Stanford University, Stanford, CA 94305, USA
- Huisheng Zhu
  - Department of Biology, Stanford University, Stanford, CA 94305, USA
- Hakhamanesh Mostafavi
  - Center for Human Genetics and Genomics, New York University School of Medicine, New York, NY 10016, USA
  - Division of Biostatistics, Department of Population Health, New York University School of Medicine, New York, NY 10016, USA
- Yuval B Simons
  - Department of Medicine, University of Chicago, Chicago, IL 60637, USA
- Jeffrey P Spence
  - Department of Genetics, Stanford University, Stanford, CA 94305, USA
- Jonathan K Pritchard
  - Department of Genetics, Stanford University, Stanford, CA 94305, USA
  - Department of Biology, Stanford University, Stanford, CA 94305, USA
3
Fenton EF, Rice DP, Novembre J, Desai MM. Detecting deviations from Kingman coalescence using 2-site frequency spectra. Genetics 2025; 229:iyaf023. [PMID: 39919046 PMCID: PMC12005255 DOI: 10.1093/genetics/iyaf023] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/23/2024] [Accepted: 01/24/2025] [Indexed: 02/09/2025] [Open access]
Abstract
Demographic inference methods in population genetics typically assume that the ancestry of a sample can be modeled by the Kingman coalescent. A defining feature of this stochastic process is that it generates genealogies that are binary trees: no more than 2 ancestral lineages may coalesce at the same time. However, this assumption breaks down under several scenarios. For example, pervasive natural selection and extreme variation in offspring number can both generate genealogies with "multiple-merger" events in which more than 2 lineages coalesce instantaneously. Therefore, detecting violations of the Kingman assumptions (e.g. due to multiple mergers) is important both for understanding which forces have shaped the diversity of a population and for avoiding fitting misspecified models to data. Current methods to detect deviations from Kingman coalescence in genomic data rely primarily on the site frequency spectrum (SFS). However, the signatures of some non-Kingman processes (e.g. multiple mergers) in the SFS are also consistent with a Kingman coalescent with a time-varying population size. Here, we present a new statistical test for determining whether the Kingman coalescent with any population size history is consistent with population data. Our approach is based on information contained in the 2-site joint frequency spectrum (2-SFS) for pairs of linked sites, which has a different dependence on the topologies of genealogies than the SFS. Our statistical test is global in the sense that it can detect when the genome-wide genetic diversity is inconsistent with the Kingman model, rather than detecting outlier regions, as in selection scan methods. We validate this test using simulations and then apply it to demonstrate that genomic diversity data from Drosophila melanogaster is inconsistent with the Kingman coalescent.
Affiliation(s)
- Eliot F Fenton
  - Department of Physics, Harvard University, Cambridge, MA 02138, USA
- Daniel P Rice
  - Media Laboratory, Massachusetts Institute of Technology, Cambridge, MA 02139, USA
  - SecureBio, Cambridge, MA 02138, USA
- John Novembre
  - Department of Human Genetics, University of Chicago, Chicago, IL 60637, USA
  - Department of Ecology & Evolution, University of Chicago, Chicago, IL 60637, USA
- Michael M Desai
  - Department of Physics, Harvard University, Cambridge, MA 02138, USA
  - Department of Organismic and Evolutionary Biology, Harvard University, Cambridge, MA 02138, USA
4
Wences AH, Peñaloza L, Steinrücken M, Siri-Jégousse A. The TMRCA of general genealogies in populations of variable size. bioRxiv 2024:2024.09.19.613917. [PMID: 39386540 PMCID: PMC11463648 DOI: 10.1101/2024.09.19.613917] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/12/2024]
Abstract
We study the time to the most recent common ancestor of a sample of finite size in a wide class of genealogical models for populations with variable size. This is made possible by recently developed results on inhomogeneous phase-type random variables, allowing us to obtain the density and the moments of the TMRCA of time-dependent coalescent processes in terms of matrix formulas. We also provide matrix simplifications permitting a more straightforward calculation. With these results, the TMRCA provides an explicative variable to distinguish different evolutionary scenarios.
Affiliation(s)
- Lizbeth Peñaloza
  - Instituto de Investigación de Matemáticas y Actuaría, Universidad del Mar, campus Huatulco, México
- Arno Siri-Jégousse
  - Instituto de Investigaciones en Matemáticas Aplicadas y Sistemas, Universidad Nacional Autónoma de México, México
5
Patel RA, Weiß CL, Zhu H, Mostafavi H, Simons YB, Spence JP, Pritchard JK. Conditional frequency spectra as a tool for studying selection on complex traits in biobanks. bioRxiv 2024:2024.06.15.599126. [PMID: 38948697 PMCID: PMC11212903 DOI: 10.1101/2024.06.15.599126] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 07/02/2024]
Abstract
Natural selection on complex traits is difficult to study in part due to the ascertainment inherent to genome-wide association studies (GWAS). The power to detect a trait-associated variant in GWAS is a function of frequency and effect size - but for traits under selection, the effect size of a variant determines the strength of selection against it, constraining its frequency. To account for GWAS ascertainment, we propose studying the joint distribution of allele frequencies across populations, conditional on the frequencies in the GWAS cohort. Before considering these conditional frequency spectra, we first characterized the impact of selection and non-equilibrium demography on allele frequency dynamics forwards and backwards in time. We then used these results to understand conditional frequency spectra under realistic human demography. Finally, we investigated empirical conditional frequency spectra for GWAS variants associated with 106 complex traits, finding compelling evidence for either stabilizing or purifying selection. Our results provide insight into polygenic score portability and other properties of variants ascertained with GWAS, highlighting the utility of conditional frequency spectra.
Affiliation(s)
- Roshni A. Patel
  - Department of Genetics, Stanford University School of Medicine, Stanford, CA
- Clemens L. Weiß
  - Stanford Cancer Institute Core, Stanford University School of Medicine, Stanford, CA
- Huisheng Zhu
  - Department of Biology, Stanford University, Stanford, CA
- Hakhamanesh Mostafavi
  - Center for Human Genetics and Genomics, New York University School of Medicine, New York, NY
  - Division of Biostatistics, Department of Population Health, New York University School of Medicine, New York, NY
- Jeffrey P. Spence
  - Department of Genetics, Stanford University School of Medicine, Stanford, CA
- Jonathan K. Pritchard
  - Department of Genetics, Stanford University School of Medicine, Stanford, CA
  - Department of Biology, Stanford University, Stanford, CA
6
Davison C, Tallman S, de Ste-Croix M, Antonio M, Oggioni MR, Kwambana-Adams B, Freund F, Beleza S. Long-term evolution of Streptococcus mitis and Streptococcus pneumoniae leads to higher genetic diversity within rather than between human populations. PLoS Genet 2024; 20:e1011317. [PMID: 38843312 PMCID: PMC11185502 DOI: 10.1371/journal.pgen.1011317] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/22/2023] [Revised: 06/18/2024] [Accepted: 05/23/2024] [Indexed: 06/19/2024] [Open access]
Abstract
Evaluation of the apportionment of genetic diversity of human bacterial commensals within and between human populations is an important step in the characterization of their evolutionary potential. Recent studies showed a correlation between the genomic diversity of human commensal strains and that of their host, but the strength of this correlation and of the geographic structure among human populations is a matter of debate. Here, we studied the genomic diversity and evolution of the phylogenetically related oro-nasopharyngeal healthy-carriage Streptococcus mitis and Streptococcus pneumoniae, whose lifestyles range from stricter commensalism to high pathogenic potential. A total of 119 S. mitis genomes showed higher within- and among-host variation than 810 S. pneumoniae genomes in European, East Asian and African populations. Summary statistics of the site-frequency spectrum for synonymous and non-synonymous variation and ABC modelling showed this difference to be due to higher ancestral bacterial population effective size (Ne) in S. mitis, whose genomic variation has been maintained close to mutation-drift equilibrium across (at least many) generations, whereas S. pneumoniae has been expanding from a smaller ancestral bacterial population. Strikingly, both species show limited differentiation among human populations. As genetic differentiation is inversely proportional to the product of effective population size and migration rate (Nem), we argue that large Ne have led to similar differentiation patterns, even if m is very low for S. mitis. We conclude that more diversity within than among human populations and limited population differentiation must be common features of the human microbiome due to large Ne.
Affiliation(s)
- Charlotte Davison
  - Department of Genetics and Genome Biology, University of Leicester, Leicester, United Kingdom
- Sam Tallman
  - Department of Genetics and Genome Biology, University of Leicester, Leicester, United Kingdom
- Megan de Ste-Croix
  - Department of Genetics and Genome Biology, University of Leicester, Leicester, United Kingdom
- Martin Antonio
  - Medical Research Council Unit The Gambia at the London School of Hygiene & Tropical Medicine, Fajara, The Gambia
  - Centre for Epidemic Preparedness and Response, London School of Hygiene & Tropical Medicine, London, United Kingdom
  - Department of Infection Biology, Faculty of Infectious and Tropical Diseases, London School of Hygiene & Tropical Medicine, London, United Kingdom
- Marco R. Oggioni
  - Department of Genetics and Genome Biology, University of Leicester, Leicester, United Kingdom
  - Department of Pharmacy and Biotechnology, University of Bologna, Bologna, Italy
- Brenda Kwambana-Adams
  - Medical Research Council Unit The Gambia at the London School of Hygiene & Tropical Medicine, Fajara, The Gambia
  - Department of Clinical Sciences, Liverpool School of Tropical Medicine, Liverpool, United Kingdom
  - Malawi Liverpool Wellcome Programme, Blantyre, Malawi
  - Division of Infection and Immunity, University College London, London, United Kingdom
- Fabian Freund
  - Department of Genetics and Genome Biology, University of Leicester, Leicester, United Kingdom
- Sandra Beleza
  - Department of Genetics and Genome Biology, University of Leicester, Leicester, United Kingdom
7
Mikula LC, Vogl C. The expected sample allele frequencies from populations of changing size via orthogonal polynomials. Theor Popul Biol 2024; 157:55-85. [PMID: 38552964 DOI: 10.1016/j.tpb.2024.03.005] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/05/2023] [Revised: 03/24/2024] [Accepted: 03/26/2024] [Indexed: 04/11/2024]
Abstract
In this article, discrete and stochastic changes in (effective) population size are incorporated into the spectral representation of a biallelic diffusion process for drift and small mutation rates. A forward algorithm inspired by Hidden-Markov-Model (HMM) literature is used to compute exact sample allele frequency spectra for three demographic scenarios: single changes in (effective) population size, boom-bust dynamics, and stochastic fluctuations in (effective) population size. An approach for fully agnostic demographic inference from these sample allele spectra is explored, and sufficient statistics for stepwise changes in population size are found. Further, convergence behaviours of the polymorphic sample spectra for population size changes on different time scales are examined and discussed within the context of inference of the effective population size. Joint visual assessment of the sample spectra and the temporal coefficients of the spectral decomposition of the forward diffusion process is found to be important in determining departure from equilibrium. Stochastic changes in (effective) population size are shown to shape sample spectra particularly strongly.
Affiliation(s)
- Lynette Caitlin Mikula
  - Centre for Biological Diversity, School of Biology, University of St. Andrews, St Andrews KY16 9TH, UK.
- Claus Vogl
  - Department of Biomedical Sciences and Pathobiology, Vetmeduni Vienna, Veterinärplatz 1, A-1210 Wien, Austria; Vienna Graduate School of Population Genetics, Vetmeduni Vienna, Veterinärplatz 1, A-1210 Wien, Austria.
8
Diamantidis D, Fan WTL, Birkner M, Wakeley J. Bursts of coalescence within population pedigrees whenever big families occur. Genetics 2024; 227:iyae030. [PMID: 38408329 DOI: 10.1093/genetics/iyae030] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/23/2024] [Revised: 01/23/2024] [Accepted: 02/18/2024] [Indexed: 02/28/2024] [Open access]
Abstract
We consider a simple diploid population-genetic model with potentially high variability of offspring numbers among individuals. Specifically, against a backdrop of Wright-Fisher reproduction and no selection, there is an additional probability that a big family occurs, meaning that a pair of individuals has a number of offspring on the order of the population size. We study how the pedigree of the population generated under this model affects the ancestral genetic process of a sample of size two at a single autosomal locus without recombination. Our population model is of the type for which multiple-merger coalescent processes have been described. We prove that the conditional distribution of the pairwise coalescence time given the random pedigree converges to a limit law as the population size tends to infinity. This limit law may or may not be the usual exponential distribution of the Kingman coalescent, depending on the frequency of big families. But because it includes the number and times of big families, it differs from the usual multiple-merger coalescent models. The usual multiple-merger coalescent models are seen as describing the ancestral process marginal to, or averaging over, the pedigree. In the limiting ancestral process conditional on the pedigree, the intervals between big families can be modeled using the Kingman coalescent but each big family causes a discrete jump in the probability of coalescence. Analogous results should hold for larger samples and other population models. We illustrate these results with simulations and additional analysis, highlighting their implications for inference and understanding of multilocus data.
Affiliation(s)
- Wai-Tong Louis Fan
  - Department of Mathematics, Indiana University, Bloomington, IN 47405, USA
  - Department of Organismic and Evolutionary Biology, Harvard University, Cambridge, MA 02138, USA
- Matthias Birkner
  - Institut für Mathematik, Johannes-Gutenberg-Universität, 55099 Mainz, Germany
- John Wakeley
  - Department of Organismic and Evolutionary Biology, Harvard University, Cambridge, MA 02138, USA
9
Spence JP, Zeng T, Mostafavi H, Pritchard JK. Scaling the discrete-time Wright-Fisher model to biobank-scale datasets. Genetics 2023; 225:iyad168. [PMID: 37724741 PMCID: PMC10627256 DOI: 10.1093/genetics/iyad168] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/01/2023] [Revised: 06/01/2023] [Accepted: 09/08/2023] [Indexed: 09/21/2023] [Open access]
Abstract
The discrete-time Wright-Fisher (DTWF) model and its diffusion limit are central to population genetics. These models can describe the forward-in-time evolution of allele frequencies in a population resulting from genetic drift, mutation, and selection. Computing likelihoods under the diffusion process is feasible, but the diffusion approximation breaks down for large samples or in the presence of strong selection. Existing methods for computing likelihoods under the DTWF model do not scale to current exome sequencing sample sizes in the hundreds of thousands. Here, we present a scalable algorithm that approximates the DTWF model with provably bounded error. Our approach relies on two key observations about the DTWF model. The first is that transition probabilities under the model are approximately sparse. The second is that transition distributions for similar starting allele frequencies are extremely close as distributions. Together, these observations enable approximate matrix-vector multiplication in linear (as opposed to the usual quadratic) time. We prove similar properties for Hypergeometric distributions, enabling fast computation of likelihoods for subsamples of the population. We show theoretically and in practice that this approximation is highly accurate and can scale to population sizes in the tens of millions, paving the way for rigorous biobank-scale inference. Finally, we use our results to estimate the impact of larger samples on estimating selection coefficients for loss-of-function variants. We find that increasing sample sizes beyond existing large exome sequencing cohorts will provide essentially no additional information except for genes with the most extreme fitness effects.
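For orientation, the exact computation that the abstract describes as too slow can be sketched as follows. This is the naive quadratic-time DTWF update, not the authors' linear-time approximation. The haploid selection parametrization and the function names are assumptions introduced here for illustration.

```python
from math import comb


def dtwf_transition_row(N, i, s=0.0):
    """Binomial transition probabilities from i copies to j = 0..N copies."""
    p = i / N
    # Illustrative haploid selection: carriers have relative fitness 1 + s.
    p = p * (1 + s) / (p * (1 + s) + (1 - p))
    return [comb(N, j) * p**j * (1 - p) ** (N - j) for j in range(N + 1)]


def dtwf_step(dist, N, s=0.0):
    """Propagate a distribution over allele counts by one generation.

    Cost is O(N^2): every occupied starting count needs a full Binomial row.
    """
    new = [0.0] * (N + 1)
    for i, mass in enumerate(dist):
        if mass == 0.0:
            continue
        for j, pij in enumerate(dtwf_transition_row(N, i, s)):
            new[j] += mass * pij
    return new


# Start with exactly 5 copies in a population of 20 and take one neutral step.
N = 20
start = [0.0] * (N + 1)
start[5] = 1.0
after = dtwf_step(start, N)
mean_count = sum(j * q for j, q in enumerate(after))
```

Under neutrality the expected allele count is unchanged by a step, which gives a quick sanity check on any faster approximation of the same update.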
Affiliation(s)
- Jeffrey P Spence
  - Department of Genetics, Stanford University, Stanford, CA 94305, USA
- Tony Zeng
  - Department of Genetics, Stanford University, Stanford, CA 94305, USA
- Jonathan K Pritchard
  - Department of Genetics, Stanford University, Stanford, CA 94305, USA
  - Department of Biology, Stanford University, Stanford, CA 94305, USA
10
Spence JP, Zeng T, Mostafavi H, Pritchard JK. Scaling the Discrete-time Wright Fisher model to biobank-scale datasets. bioRxiv 2023:2023.05.19.541517. [PMID: 37293115 PMCID: PMC10245735 DOI: 10.1101/2023.05.19.541517] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Indexed: 06/10/2023]
Abstract
The Discrete-Time Wright Fisher (DTWF) model and its large population diffusion limit are central to population genetics. These models describe the forward-in-time evolution of the frequency of an allele in a population and can include the fundamental forces of genetic drift, mutation, and selection. Computing likelihoods under the diffusion process is feasible, but the diffusion approximation breaks down for large sample sizes or in the presence of strong selection. Unfortunately, existing methods for computing likelihoods under the DTWF model do not scale to current exome sequencing sample sizes in the hundreds of thousands. Here we present an algorithm that approximates the DTWF model with provably bounded error and runs in time linear in the size of the population. Our approach relies on two key observations about Binomial distributions. The first is that Binomial distributions are approximately sparse. The second is that Binomial distributions with similar success probabilities are extremely close as distributions, allowing us to approximate the DTWF Markov transition matrix as a very low rank matrix. Together, these observations enable matrix-vector multiplication in linear (as opposed to the usual quadratic) time. We prove similar properties for Hypergeometric distributions, enabling fast computation of likelihoods for subsamples of the population. We show theoretically and in practice that this approximation is highly accurate and can scale to population sizes in the billions, paving the way for rigorous biobank-scale population genetic inference. Finally, we use our results to estimate how increasing sample sizes will improve the estimation of selection coefficients acting on loss-of-function variants. We find that increasing sample sizes beyond existing large exome sequencing cohorts will provide essentially no additional information except for genes with the most extreme fitness effects.
Affiliation(s)
- Tony Zeng
  - Department of Genetics, Stanford University
- Jonathan K. Pritchard
  - Department of Genetics, Stanford University
  - Department of Biology, Stanford University
11
Freund F, Kerdoncuff E, Matuszewski S, Lapierre M, Hildebrandt M, Jensen JD, Ferretti L, Lambert A, Sackton TB, Achaz G. Interpreting the pervasive observation of U-shaped Site Frequency Spectra. PLoS Genet 2023; 19:e1010677. [PMID: 36952570 PMCID: PMC10072462 DOI: 10.1371/journal.pgen.1010677] [Citation(s) in RCA: 6] [Impact Index Per Article: 3.0] [Received: 05/13/2022] [Revised: 04/04/2023] [Accepted: 02/22/2023] [Indexed: 03/25/2023] [Open access]
Abstract
The standard neutral model of molecular evolution has traditionally been used as the null model for population genomics. We gathered a collection of 45 genome-wide site frequency spectra from a diverse set of species, most of which display an excess of low and high frequency variants compared to the expectation of the standard neutral model, resulting in U-shaped spectra. We show that multiple merger coalescent models often provide a better fit to these observations than the standard Kingman coalescent. Hence, in many circumstances these under-utilized models may serve as the more appropriate reference for genomic analyses. We further discuss the underlying evolutionary processes that may result in the widespread U-shape of frequency spectra.
Affiliation(s)
- Fabian Freund
  - Institute of Plant Breeding, Seed Science and Population Genetics, University of Hohenheim, Stuttgart, Germany
  - Department of Genetics and Genome Biology, University of Leicester, Leicester, United Kingdom
- Elise Kerdoncuff
  - Department of Genetics, University of California, Berkeley, California, United States of America
  - Informatics Group, Harvard University, Cambridge, Massachusetts, United States of America
- Marguerite Lapierre
  - Informatics Group, Harvard University, Cambridge, Massachusetts, United States of America
- Jeffrey D Jensen
  - Center for Evolution & Medicine, School of Life Sciences, Arizona State University, Tempe, Arizona, United States of America
- Luca Ferretti
  - Big Data Institute, Li Ka Shing Centre for Health Information and Discovery, Nuffield Department of Medicine, University of Oxford, Oxford, United Kingdom
- Amaury Lambert
  - Institut de Biologie de l'ENS (IBENS), École Normale Supérieure, Paris, France
  - Informatics Group, Harvard University, Cambridge, Massachusetts, United States of America
- Timothy B Sackton
  - Éco-anthropologie, Muséum National d'Histoire Naturelle, Université Paris-Cité, Paris, France
- Guillaume Achaz
  - Informatics Group, Harvard University, Cambridge, Massachusetts, United States of America
  - SMILE group, Center for Interdisciplinary Research in Biology (CIRB), Collège de France, Paris, France
12
Dilber E, Terhorst J. Robust detection of natural selection using a probabilistic model of tree imbalance. Genetics 2022; 220:6511494. [PMID: 35100408 PMCID: PMC8893258 DOI: 10.1093/genetics/iyac009] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/06/2021] [Accepted: 12/16/2021] [Indexed: 01/21/2023] [Open access]
Abstract
Neutrality tests such as Tajima's D and Fay and Wu's H are standard implements in the population genetics toolbox. One of their most common uses is to scan the genome for signals of natural selection. However, it is well understood that D and H are confounded by other evolutionary forces, in particular population expansion, that may be unrelated to selection. Because they are not model-based, it is not clear how to deconfound these tests in a principled way. In this article, we derive new likelihood-based methods for detecting natural selection, which are robust to fluctuations in effective population size. At the core of our method is a novel probabilistic model of tree imbalance, which generalizes Kingman's coalescent to allow certain aberrant tree topologies to arise more frequently than is expected under neutrality. We derive a frequency spectrum-based estimator that can be used in place of D, and also extend to the case where genealogies are first estimated. We benchmark our methods on real and simulated data, and provide an open source software implementation.
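As a point of reference for the statistic that this abstract aims to replace, the sketch below computes the classical Tajima's D from an unfolded site frequency spectrum. It is the textbook statistic, not the authors' new estimator. The function name and the input convention (entry i-1 holds the number of sites with i derived copies) are assumptions made here.

```python
from math import sqrt


def tajimas_d(sfs):
    """Classical Tajima's D from an unfolded SFS for n = len(sfs) + 1 samples."""
    n = len(sfs) + 1
    S = sum(sfs)  # number of segregating sites
    if S == 0:
        raise ValueError("Tajima's D is undefined without segregating sites")
    a1 = sum(1 / i for i in range(1, n))
    a2 = sum(1 / i**2 for i in range(1, n))
    b1 = (n + 1) / (3 * (n - 1))
    b2 = 2 * (n * n + n + 3) / (9 * n * (n - 1))
    c1 = b1 - 1 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1**2
    e1 = c1 / a1
    e2 = c2 / (a1**2 + a2)
    # Mean pairwise diversity: a site with i derived copies separates i*(n-i) pairs.
    pairs = n * (n - 1) / 2
    pi = sum(i * (n - i) * count for i, count in enumerate(sfs, start=1)) / pairs
    return (pi - S / a1) / sqrt(e1 * S + e2 * S * (S - 1))


neutral_like = tajimas_d([6, 3, 2])      # counts proportional to 1/i
singleton_excess = tajimas_d([11, 0, 0])
intermediate_excess = tajimas_d([0, 11, 0])
```

A spectrum proportional to 1/i makes the two diversity estimates agree, so D is zero up to rounding. An excess of singletons pushes D negative and an excess of intermediate-frequency variants pushes it positive.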
Affiliation(s)
- Enes Dilber
  - Department of Statistics, University of Michigan, Ann Arbor, MI 48109, USA
- Jonathan Terhorst
  - Department of Statistics, University of Michigan, Ann Arbor, MI 48109, USA (corresponding author)
13
DeWitt WS, Harris KD, Ragsdale AP, Harris K. Nonparametric coalescent inference of mutation spectrum history and demography. Proc Natl Acad Sci U S A 2021; 118:e2013798118. [PMID: 34016747 PMCID: PMC8166128 DOI: 10.1073/pnas.2013798118] [Citation(s) in RCA: 19] [Impact Index Per Article: 4.8] [Indexed: 11/18/2022] [Open access]
Abstract
As populations boom and bust, the accumulation of genetic diversity is modulated, encoding histories of living populations in present-day variation. Many methods exist to decode these histories, and all must make strong model assumptions. It is typical to assume that mutations accumulate uniformly across the genome at a constant rate that does not vary between closely related populations. However, recent work shows that mutational processes in human and great ape populations vary across genomic regions and evolve over time. This perturbs the mutation spectrum (relative mutation rates in different local nucleotide contexts). Here, we develop theoretical tools in the framework of Kingman's coalescent to accommodate mutation spectrum dynamics. We present mutation spectrum history inference (mushi), a method to perform nonparametric inference of demographic and mutation spectrum histories from allele frequency data. We use mushi to reconstruct trajectories of effective population size and mutation spectrum divergence between human populations, identify mutation signatures and their dynamics in different human populations, and calibrate the timing of a previously reported mutational pulse in the ancestors of Europeans. We show that mutation spectrum histories can be placed in a well-studied theoretical setting and rigorously inferred from genomic variation data, like other features of evolutionary history.
Affiliation(s)
- William S DeWitt
  - Department of Genome Sciences, University of Washington, Seattle, WA 98195
  - Computational Biology Program, Fred Hutchinson Cancer Research Center, Seattle, WA 98109
- Kameron Decker Harris
  - Paul G. Allen School of Computer Science & Engineering, University of Washington, Seattle, WA 98195
  - Department of Biology, University of Washington, Seattle, WA 98195
- Aaron P Ragsdale
  - National Laboratory of Genomics for Biodiversity, Unit of Advanced Genomics, Centro de Investigación y de Estudios Avanzados del Instituto Politécnico Nacional, Irapuato, Mexico 36821
- Kelley Harris
  - Department of Genome Sciences, University of Washington, Seattle, WA 98195
  - Computational Biology Program, Fred Hutchinson Cancer Research Center, Seattle, WA 98109
|
14
|
Freund F, Siri-Jégousse A. The minimal observable clade size of exchangeable coalescents. Braz J Probab Stat 2021. [DOI: 10.1214/20-bjps480] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 11/19/2022]
Affiliation(s)
- Fabian Freund
  - Crop Plant Biodiversity and Breeding Informatics Group (350b), Institute of Plant Breeding, Seed Science and Population Genetics, University of Hohenheim, Fruwirthstrasse 21, 70599 Stuttgart, Germany
- Arno Siri-Jégousse
  - Departamento de Probabilidad y Estadística, IIMAS, Universidad Nacional Autónoma de México, Mexico City, Mexico
|
15
|
Sapra A, Jaksik R, Mehta H, Biesiadny S, Kimmel M, Corey SJ. Effect of the unfolded protein response and oxidative stress on mutagenesis in CSF3R: a model for evolution of severe congenital neutropenia to myelodysplastic syndrome/acute myeloid leukemia. Mutagenesis 2020; 35:381-389. [PMID: 33511998 DOI: 10.1093/mutage/geaa027] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Received: 12/17/2019] [Accepted: 09/21/2020] [Indexed: 11/14/2022] Open
Abstract
Severe congenital neutropenia (SCN) is a rare blood disorder characterised by abnormally low levels of circulating neutrophils. The most common recurrent mutations that cause SCN involve neutrophil elastase (ELANE). The treatment of choice for SCN is the administration of granulocyte-colony stimulating factor (G-CSF), which increases the neutrophil number and improves survival and quality of life. Long-term survival is, however, linked to the development of myelodysplastic syndrome/acute myeloid leukemia (MDS/AML). About 70% of MDS/AML patients acquire nonsense mutations affecting the cytoplasmic domain of CSF3R (the G-CSF receptor). About 70% of SCN patients with AML harbour additional mutations in RUNX1. We hypothesised that this coding region of CSF3R constitutes a hotspot vulnerable to mutations resulting from excessive oxidative stress or endoplasmic reticulum (ER) stress. We used the murine Ba/F3 cell line to measure the effect of induced oxidative or ER stress on the mutation rate in our hypothesised hotspot of the exogenous human CSF3R, the corresponding region in the endogenous Csf3r, and Runx1. Ba/F3 cells transduced with the cDNA for the partial C-terminal of CSF3R fused in-frame with a green fluorescent protein (GFP) tag were subjected to stress-inducing treatment for 30 days (~51 doubling times). The amplicon-based targeted deep sequencing data for the day 15 and day 30 samples show that although increased mutagenesis was observed in all three genes of interest (partial CSF3R, Csf3r and Runx1), there were more mutations in the GFP region than in the partial CSF3R region. Our findings also indicate that there is no correlation between the stress-inducing chemical treatments and mutagenesis in Ba/F3 cells.
Our data suggest that oxidative or ER stress induction does not promote genomic instability affecting the partial C-terminal of the transduced CSF3R, the endogenous Csf3r and the endogenous Runx1 in Ba/F3 cells that could account for these targets being mutational hotspots. We conclude that there are other mechanisms for acquiring the mutations of CSF3R that help drive the evolution of SCN to MDS/AML.
Affiliation(s)
- Adya Sapra
  - Department of Pediatrics, Cancer Biology, and Translational Hematology and Oncology Research, Cleveland Clinic, Cleveland, OH, USA
- Roman Jaksik
  - Department of Systems Biology and Engineering, Silesian University of Technology, Gliwice, Poland
- Hrishikesh Mehta
  - Department of Pediatrics, Cancer Biology, and Translational Hematology and Oncology Research, Cleveland Clinic, Cleveland, OH, USA
- Sara Biesiadny
  - Department of Statistics, Rice University, Houston, TX, USA
- Marek Kimmel
  - Department of Systems Biology and Engineering, Silesian University of Technology, Gliwice, Poland
  - Department of Statistics, Rice University, Houston, TX, USA
- Seth J Corey
  - Department of Systems Biology and Engineering, Silesian University of Technology, Gliwice, Poland
|
16
|
Cannings models, population size changes and multiple-merger coalescents. J Math Biol 2020; 80:1497-1521. [PMID: 32008102 PMCID: PMC7052052 DOI: 10.1007/s00285-020-01470-5] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Received: 02/07/2019] [Revised: 01/10/2020] [Indexed: 11/17/2022]
Abstract
Multiple-merger coalescents, e.g. Λ-n-coalescents, have been proposed as models of the genealogy of n sampled individuals for a range of populations whose genealogical structures are not captured well by Kingman's n-coalescent. Λ-n-coalescents can be seen as the limit process of the discrete genealogies of Cannings models with fixed population size, when time is rescaled and population size N → ∞. As established for Kingman's n-coalescent, moderate population size fluctuations in the discrete population model should be reflected by a time-change of the limit coalescent. For Λ-n-coalescents, this has been explicitly shown for only a limited subclass of Λ-n-coalescents and exponentially growing populations. This article gives a more general construction of time-changed Λ-n-coalescents as limits of specific Cannings models with rather arbitrary time changes.
|
17
|
Diehl CS, Kersting G. Tree lengths for general Λ-coalescents and the asymptotic site frequency spectrum around the Bolthausen–Sznitman coalescent. Ann Appl Probab 2019. [DOI: 10.1214/19-aap1462] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Indexed: 11/19/2022]
|
18
|
Abstract
A variety of methods based on coalescent theory have been developed to infer demographic history from gene sequences sampled from natural populations. The 'skyline plot' and related approaches are commonly employed as flexible prior distributions for phylogenetic trees in the Bayesian analysis of pathogen gene sequences. In this work we extend the classic and generalized skyline plot methods to phylogenies that contain one or more multifurcations (i.e. hard polytomies). We use the theory of Λ-coalescents (specifically, Beta(2 - α, α)-coalescents) to develop the 'multifurcating skyline plot', which estimates a piecewise constant function of effective population size through time, conditional on a time-scaled multifurcating phylogeny. We implement a smoothing procedure and extend the method to serially sampled (heterochronous) data, but we do not address here the problem of estimating trees with multifurcations from gene sequence alignments. We validate our estimator on simulated data using maximum likelihood and find that parameters of the Beta(2 - α, α)-coalescent process can be estimated accurately. Furthermore, we apply the multifurcating skyline plot to simulated trees generated by tracking transmissions in an individual-based model of epidemic superspreading. We find that high levels of superspreading are consistent with the high-variance assumptions underlying Λ-coalescents and that the estimated parameters of the Λ-coalescent model contain information about the degree of superspreading.
Affiliation(s)
- Patrick Hoscheit
  - MaIAGE, INRA, Université Paris-Saclay, Domaine de Vilvert, Jouy-en-Josas 78350, France
- Oliver G Pybus
  - Department of Zoology, University of Oxford, Peter Medawar Building, South Parks Road, Oxford OX1 3SY, UK
|
19
|
Gnedin A, Iksanov A, Marynych A, Möhle M. The collision spectrum of Λ-coalescents. Ann Appl Probab 2018. [DOI: 10.1214/18-aap1409] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Indexed: 11/19/2022]
|
20
|
Cayuela H, Rougemont Q, Prunier JG, Moore JS, Clobert J, Besnard A, Bernatchez L. Demographic and genetic approaches to study dispersal in wild animal populations: A methodological review. Mol Ecol 2018; 27:3976-4010. [DOI: 10.1111/mec.14848] [Citation(s) in RCA: 85] [Impact Index Per Article: 12.1] [Received: 05/09/2018] [Revised: 08/17/2018] [Accepted: 08/19/2018] [Indexed: 12/31/2022]
Affiliation(s)
- Hugo Cayuela
  - Institut de Biologie Intégrative et des Systèmes (IBIS); Université Laval; Québec City, Québec, Canada
- Quentin Rougemont
  - Institut de Biologie Intégrative et des Systèmes (IBIS); Université Laval; Québec City, Québec, Canada
- Jérôme G. Prunier
  - Station d'Ecologie Théorique et Expérimentale; Unité Mixte de Recherche (UMR) 5321; Centre National de la Recherche Scientifique (CNRS); Université Paul Sabatier (UPS); Moulis, France
- Jean-Sébastien Moore
  - Institut de Biologie Intégrative et des Systèmes (IBIS); Université Laval; Québec City, Québec, Canada
- Jean Clobert
  - Station d'Ecologie Théorique et Expérimentale; Unité Mixte de Recherche (UMR) 5321; Centre National de la Recherche Scientifique (CNRS); Université Paul Sabatier (UPS); Moulis, France
- Aurélien Besnard
  - CNRS; PSL Research University; EPHE; UM, SupAgro, IRD; INRA; UMR 5175 CEFE; Montpellier, France
- Louis Bernatchez
  - Institut de Biologie Intégrative et des Systèmes (IBIS); Université Laval; Québec City, Québec, Canada
|
21
|
Koskela J, Jenkins PA, Spanò D. Bayesian non-parametric inference for Λ-coalescents: Posterior consistency and a parametric method. Bernoulli 2018. [DOI: 10.3150/16-bej923] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.4] [Indexed: 11/19/2022]
|
22
|
Coalescent Processes with Skewed Offspring Distributions and Nonequilibrium Demography. Genetics 2017; 208:323-338. [PMID: 29127263 DOI: 10.1534/genetics.117.300499] [Citation(s) in RCA: 33] [Impact Index Per Article: 4.1] [Received: 05/12/2017] [Accepted: 10/30/2017] [Indexed: 11/18/2022] Open
Abstract
Nonequilibrium demography impacts coalescent genealogies, leaving detectable, well-studied signatures of variation. However, similar genomic footprints are also expected under models of large reproductive skew, posing a serious problem when trying to make inference. Furthermore, current approaches consider only one of the two processes at a time, neglecting any genomic signal that could arise from their simultaneous effects, preventing the possibility of jointly inferring parameters relating to both offspring distribution and population history. Here, we develop an extended Moran model with exponential population growth, and demonstrate that the underlying ancestral process converges to a time-inhomogeneous psi-coalescent. However, by applying a nonlinear change of time scale, analogous to the Kingman coalescent, we find that the ancestral process can be rescaled to its time-homogeneous analog, allowing the process to be simulated quickly and efficiently. Furthermore, we derive analytical expressions for the expected site-frequency spectrum under the time-inhomogeneous psi-coalescent, and develop an approximate-likelihood framework for the joint estimation of the coalescent and growth parameters. By means of extensive simulation, we demonstrate that both can be estimated accurately from whole-genome data. In addition, not accounting for demography can lead to serious biases in the inferred coalescent model, with broad implications for genomic studies ranging from ecology to conservation biology. Finally, we use our method to analyze sequence data from Japanese sardine populations, and find evidence of high variation in individual reproductive success, but few signs of a recent demographic expansion.
|
23
|
Inferring the Joint Demographic History of Multiple Populations: Beyond the Diffusion Approximation. Genetics 2017; 206:1549-1567. [PMID: 28495960 DOI: 10.1534/genetics.117.200493] [Citation(s) in RCA: 150] [Impact Index Per Article: 18.8] [Received: 01/25/2017] [Accepted: 04/26/2017] [Indexed: 12/18/2022] Open
Abstract
Understanding variation in allele frequencies across populations is a central goal of population genetics. Classical models for the distribution of allele frequencies, using forward simulation, coalescent theory, or the diffusion approximation, have been applied extensively for demographic inference, medical study design, and evolutionary studies. Here we propose a tractable model of ordinary differential equations for the evolution of allele frequencies that is closely related to the diffusion approximation but avoids many of its limitations and approximations. We show that the approach is typically faster, more numerically stable, and more easily generalizable than the state-of-the-art software implementation of the diffusion approximation. We present a number of applications to human sequence data, including demographic inference with a five-population joint frequency spectrum and a discussion of the robustness of the out-of-Africa model inference to the choice of modern population.
|
24
|
Eldon B, Riquet F, Yearsley J, Jollivet D, Broquet T. Current hypotheses to explain genetic chaos under the sea. Curr Zool 2016; 62:551-566. [PMID: 29491945 PMCID: PMC5829445 DOI: 10.1093/cz/zow094] [Citation(s) in RCA: 38] [Impact Index Per Article: 4.2] [Received: 04/03/2016] [Accepted: 08/27/2016] [Indexed: 01/07/2023] Open
Abstract
Chaotic genetic patchiness (CGP) refers to surprising patterns of spatial and temporal genetic structure observed in some marine species at a scale where genetic variation should be efficiently homogenized by gene flow via larval dispersal. Here we review and discuss 4 mechanisms that could generate such unexpected patterns: selection, sweepstakes reproductive success, collective dispersal, and temporal shifts in local population dynamics. First, we review examples where genetic differentiation at specific loci was driven by diversifying selection, which was historically the first process invoked to explain CGP. Second, we turn to neutral demographic processes that may drive genome-wide effects, and whose effects on CGP may be enhanced when they act together. We discuss how sweepstakes reproductive success accelerates genetic drift and can thus generate genetic structure, provided that gene flow is not too strong. Collective dispersal is another mechanism whereby genetic structure can be maintained regardless of dispersal intensity, because it may prevent larval cohorts from becoming entirely mixed. Theoretical analyses of both the sweepstakes and the collective dispersal ideas are presented. Finally, we discuss an idea that has received less attention than the other ones just mentioned, namely temporal shifts in local population dynamics.
Affiliation(s)
- Bjarki Eldon
  - Museum für Naturkunde Berlin, Leibniz Institut für Evolutions- und Biodiversitätsforschung, Berlin 10115, Germany
- Florentine Riquet
  - Université Montpellier 2, Place Eugène Bataillon, 34095 Montpellier Cedex 5, France
  - ISEM - CNRS, UMR 5554, SMEL, 2 rue des Chantiers, Sète 34200, France
- Jon Yearsley
  - School of Biology and Environmental Science and UCD Earth Institute, University College Dublin, Belfield, Dublin 4, Ireland
- Didier Jollivet
  - Centre National de la Recherche Scientifique, Team Adaptation and Biology of Invertebrates in Extreme Environments, Station Biologique de Roscoff, Roscoff 29680, France
  - Sorbonne Universités, Université Pierre et Marie Curie, Unité Mixte de Recherche 7144, Station Biologique de Roscoff, Roscoff 29680, France
- Thomas Broquet
  - Sorbonne Universités, Université Pierre et Marie Curie, Unité Mixte de Recherche 7144, Station Biologique de Roscoff, Roscoff 29680, France
  - Centre National de la Recherche Scientifique, Team Diversity and Connectivity of Coastal Marine Landscapes, Station Biologique de Roscoff, Roscoff 29680, France
|
25
|
Gao F, Keinan A. Explosive genetic evidence for explosive human population growth. Curr Opin Genet Dev 2016; 41:130-139. [PMID: 27710906 PMCID: PMC5161661 DOI: 10.1016/j.gde.2016.09.002] [Citation(s) in RCA: 21] [Impact Index Per Article: 2.3] [Received: 06/01/2016] [Revised: 08/26/2016] [Accepted: 09/11/2016] [Indexed: 11/19/2022]
Abstract
The advent of next-generation sequencing technology has allowed the collection of vast amounts of genetic variation data. A recurring discovery from studying larger and larger samples of individuals has been the extreme, previously unexpected, excess of very rare genetic variants, which has been shown to be mostly due to the recent explosive growth of human populations. Here, we review recent literature that inferred recent changes in population size in different human populations and with different methodologies, with many pointing to recent explosive growth, especially in European populations for which more data has been available. We also review the state-of-the-art methods and software for the inference of historical population size changes that led to these discoveries. Finally, we discuss the implications of recent population growth on personalized genomics, on purifying selection in the non-equilibrium state it entails and, as a consequence, on the genetic architecture underlying complex disease and the performance of mapping methods in discovering rare variants that contribute to complex disease risk.
Affiliation(s)
- Feng Gao
  - Department of Biological Statistics and Computational Biology, Ithaca, NY 14850, United States
- Alon Keinan
  - Department of Biological Statistics and Computational Biology, Ithaca, NY 14850, United States
|
26
|
The site-frequency spectrum associated with Ξ-coalescents. Theor Popul Biol 2016; 110:36-50. [DOI: 10.1016/j.tpb.2016.04.002] [Citation(s) in RCA: 24] [Impact Index Per Article: 2.7] [Received: 10/19/2015] [Revised: 04/12/2016] [Accepted: 04/13/2016] [Indexed: 11/24/2022]
|
27
|
Inference Methods for Multiple Merger Coalescents. Evol Biol 2016. [DOI: 10.1007/978-3-319-41324-2_20] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/21/2022]
|