Reference Citation Analysis

For: Polanski A, Kimmel M. New Explicit Expressions for Relative Frequencies of Single-Nucleotide Polymorphisms With Application to Statistical Inference on Population Growth. Genetics 2003;165:427-36. [PMID: 14504247 PMCID: PMC1462751 DOI: 10.1093/genetics/165.1.427] [Citation(s) in RCA: 86] [Impact Index Per Article: 4.1] [Indexed: 11/12/2022] Open

Cited by Other Article(s)

Hobolth A, Rivas-González I, Bladt M, Futschik A. Phase-type distributions in mathematical population genetics: An emerging framework. Theor Popul Biol 2024;157:14-32. [PMID: 38460602 DOI: 10.1016/j.tpb.2024.03.001] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/04/2023] [Revised: 02/29/2024] [Accepted: 03/04/2024] [Indexed: 03/11/2024]

Abstract

A phase-type distribution is the time to absorption in a continuous- or discrete-time Markov chain. Phase-type distributions can be used as a general framework to calculate key properties of the standard coalescent model and many of its extensions. Here, the 'phases' in the phase-type distribution correspond to states in the ancestral process. For example, the time to the most recent common ancestor and the total branch length are phase-type distributed. Furthermore, the site frequency spectrum follows a multivariate discrete phase-type distribution and the joint distribution of total branch lengths in the two-locus coalescent-with-recombination model is multivariate phase-type distributed. In general, phase-type distributions provide a powerful mathematical framework for coalescent theory because they are analytically tractable using matrix manipulations. The purpose of this review is to explain the phase-type theory and demonstrate how the theory can be applied to derive basic properties of coalescent models. These properties can then be used to obtain insight into the ancestral process, or they can be applied for statistical inference. In particular, we show the relation between classical first-step analysis of coalescent models and phase-type calculations. We also show how reward transformations in phase-type theory lead to easy calculation of covariances and correlation coefficients between e.g. tree height, tree length, external branch length, and internal branch length. Furthermore, we discuss how these quantities can be used for statistical inference based on estimating equations. Providing an alternative to previous work based on the Laplace transform, we derive likelihoods for small-size coalescent trees based on phase-type theory. Overall, our main aim is to demonstrate that phase-type distributions provide a convenient general set of tools to understand aspects of coalescent models that are otherwise difficult to derive. 
Throughout the review, we emphasize the versatility of the phase-type framework, which is also illustrated by our accompanying R-code. All our analyses and figures can be reproduced from code available on GitHub.
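
As an illustration of the matrix calculations this abstract refers to, the sketch below computes the expected tree height and expected total tree length of the standard coalescent from a phase-type representation. It is not taken from the review or its R code. It assumes Kingman's coalescent with time scaled so that each pair of lineages coalesces at rate 1, and the function names (`coalescent_subintensity`, `solve`, `expected_reward`) are hypothetical. The states are the numbers of lineages n, n-1, ..., 2, and a reward vector r gives the expectation as alpha (-S)^{-1} r, where alpha puts all mass on the starting state.

```python
from fractions import Fraction


def coalescent_subintensity(n):
    """Sub-intensity matrix S for a sample of size n (assumed Kingman coalescent).

    Transient states are the lineage counts n, n-1, ..., 2, in that order.
    State k is left at rate k(k-1)/2 and always moves to k-1
    (from state 2 the chain is absorbed at the common ancestor).
    """
    m = n - 1
    S = [[Fraction(0)] * m for _ in range(m)]
    for idx, k in enumerate(range(n, 1, -1)):
        rate = Fraction(k * (k - 1), 2)
        S[idx][idx] = -rate
        if idx + 1 < m:
            S[idx][idx + 1] = rate
    return S


def solve(A, b):
    """Solve A x = b by Gauss-Jordan elimination (exact with Fractions)."""
    m = len(A)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(m):
        pivot = next(r for r in range(col, m) if M[r][col] != 0)
        M[col], M[pivot] = M[pivot], M[col]
        M[col] = [v / M[col][col] for v in M[col]]
        for r in range(m):
            if r != col and M[r][col] != 0:
                factor = M[r][col]
                M[r] = [vr - factor * vc for vr, vc in zip(M[r], M[col])]
    return [M[i][m] for i in range(m)]


def expected_reward(n, rewards):
    """Expected accumulated reward alpha (-S)^{-1} r with alpha = (1, 0, ..., 0)."""
    S = coalescent_subintensity(n)
    neg_S = [[-v for v in row] for row in S]
    x = solve(neg_S, [Fraction(r) for r in rewards])
    return x[0]  # alpha puts all mass on the state with n lineages


n = 5
# Reward 1 per unit time in every state gives the time to the most recent common ancestor.
tree_height = expected_reward(n, [1] * (n - 1))
# Reward k while k lineages exist gives the total branch length.
tree_length = expected_reward(n, list(range(n, 1, -1)))
print(tree_height, tree_length)
```

For this model the results can be checked against the classical closed forms 2(1 - 1/n) for the expected height and 2(1 + 1/2 + ... + 1/(n-1)) for the expected total length.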

Mikula LC, Vogl C. The expected sample allele frequencies from populations of changing size via orthogonal polynomials. Theor Popul Biol 2024;157:55-85. [PMID: 38552964 DOI: 10.1016/j.tpb.2024.03.005] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/05/2023] [Revised: 03/24/2024] [Accepted: 03/26/2024] [Indexed: 04/11/2024]

Gerard D. Bayesian tests for random mating in polyploids. Mol Ecol Resour 2023;23:1812-1822. [PMID: 37578636 DOI: 10.1111/1755-0998.13856] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/02/2022] [Revised: 07/24/2023] [Accepted: 08/03/2023] [Indexed: 08/15/2023]

Spence JP, Zeng T, Mostafavi H, Pritchard JK. Scaling the discrete-time Wright-Fisher model to biobank-scale datasets. Genetics 2023;225:iyad168. [PMID: 37724741 PMCID: PMC10627256 DOI: 10.1093/genetics/iyad168] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/01/2023] [Revised: 06/01/2023] [Accepted: 09/08/2023] [Indexed: 09/21/2023] Open

Johnson B, Shuai Y, Schweinsberg J, Curtius K. cloneRate: fast estimation of single-cell clonal dynamics using coalescent theory. Bioinformatics 2023;39:btad561. [PMID: 37699006 PMCID: PMC10534056 DOI: 10.1093/bioinformatics/btad561] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/09/2023] [Revised: 08/25/2023] [Indexed: 09/14/2023] Open

Hu W, Hao Z, Du P, Di Vincenzo F, Manzi G, Cui J, Fu YX, Pan YH, Li H. Genomic inference of a severe human bottleneck during the Early to Middle Pleistocene transition. Science 2023;381:979-984. [PMID: 37651513 DOI: 10.1126/science.abq7487] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/28/2022] [Accepted: 07/11/2023] [Indexed: 09/02/2023]

Affiliation(s)

Wangjie Hu: CAS Key Laboratory of Computational Biology, Shanghai Institute of Nutrition and Health, University of Chinese Academy of Sciences, Chinese Academy of Sciences, Shanghai, China; Key Laboratory of Brain Functional Genomics of Ministry of Education, School of Life Science, East China Normal University, Shanghai, China
Ziqian Hao: College of Artificial Intelligence and Big Data for Medical Sciences, Shandong First Medical University & Shandong Academy of Medical Sciences, Jinan, China
Pengyuan Du: CAS Key Laboratory of Computational Biology, Shanghai Institute of Nutrition and Health, University of Chinese Academy of Sciences, Chinese Academy of Sciences, Shanghai, China; College of Artificial Intelligence and Big Data for Medical Sciences, Shandong First Medical University & Shandong Academy of Medical Sciences, Jinan, China
Fabio Di Vincenzo: Natural History Museum, University of Florence, Florence, Italy
Giorgio Manzi: Department of Environmental Biology, Sapienza University of Rome, Rome, Italy
Jialong Cui: Key Laboratory of Brain Functional Genomics of Ministry of Education, School of Life Science, East China Normal University, Shanghai, China
Yun-Xin Fu: Department of Biostatistics and Data Science, School of Public Health, University of Texas Health Science Center at Houston, Houston, TX, USA; Key Laboratory for Conservation and Utilization of Bioresources, Yunnan University, Kunming, China
Yi-Hsuan Pan: Key Laboratory of Brain Functional Genomics of Ministry of Education, School of Life Science, East China Normal University, Shanghai, China
Haipeng Li: CAS Key Laboratory of Computational Biology, Shanghai Institute of Nutrition and Health, University of Chinese Academy of Sciences, Chinese Academy of Sciences, Shanghai, China; Center for Excellence in Animal Evolution and Genetics, Chinese Academy of Sciences, Kunming, China

Spence JP, Zeng T, Mostafavi H, Pritchard JK. Scaling the Discrete-time Wright Fisher model to biobank-scale datasets. bioRxiv 2023:2023.05.19.541517. [PMID: 37293115 PMCID: PMC10245735 DOI: 10.1101/2023.05.19.541517] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Indexed: 06/10/2023]

Abstract

The Discrete-Time Wright Fisher (DTWF) model and its large population diffusion limit are central to population genetics. These models describe the forward-in-time evolution of the frequency of an allele in a population and can include the fundamental forces of genetic drift, mutation, and selection. Computing likelihoods under the diffusion process is feasible, but the diffusion approximation breaks down for large sample sizes or in the presence of strong selection. Unfortunately, existing methods for computing likelihoods under the DTWF model do not scale to current exome sequencing sample sizes in the hundreds of thousands. Here we present an algorithm that approximates the DTWF model with provably bounded error and runs in time linear in the size of the population. Our approach relies on two key observations about Binomial distributions. The first is that Binomial distributions are approximately sparse. The second is that Binomial distributions with similar success probabilities are extremely close as distributions, allowing us to approximate the DTWF Markov transition matrix as a very low rank matrix. Together, these observations enable matrix-vector multiplication in linear (as opposed to the usual quadratic) time. We prove similar properties for Hypergeometric distributions, enabling fast computation of likelihoods for subsamples of the population. We show theoretically and in practice that this approximation is highly accurate and can scale to population sizes in the billions, paving the way for rigorous biobank-scale population genetic inference. Finally, we use our results to estimate how increasing sample sizes will improve the estimation of selection coefficients acting on loss-of-function variants. We find that increasing sample sizes beyond existing large exome sequencing cohorts will provide essentially no additional information except for genes with the most extreme fitness effects.
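
To make the first observation in this abstract concrete, the sketch below builds one row of a DTWF transition matrix and measures how much probability lies near its mean. It shows only the naive quadratic-time baseline, not the authors' algorithm. It assumes the simplest neutral case with N gene copies and no mutation or selection, and the names `dtwf_row` and `step` are hypothetical.

```python
from math import comb


def dtwf_row(N, i):
    """Transition probabilities from i copies to j = 0..N copies.

    Assumes a neutral model with N gene copies: the next generation is
    Binomial(N, i/N), with no mutation or selection.
    """
    p = i / N
    return [comb(N, j) * p**j * (1 - p) ** (N - j) for j in range(N + 1)]


def step(N, v):
    """Advance a distribution over allele counts by one generation.

    This is the naive vector-matrix product, quadratic in N.
    """
    out = [0.0] * (N + 1)
    for i, mass in enumerate(v):
        if mass == 0.0:
            continue
        for j, pij in enumerate(dtwf_row(N, i)):
            out[j] += mass * pij
    return out


N, i = 200, 100
row = dtwf_row(N, i)

# Approximate sparsity: almost all of the row's mass sits within a few
# standard deviations of the mean, so most entries are negligible.
sd = (N * (i / N) * (1 - i / N)) ** 0.5
window = [j for j in range(N + 1) if abs(j - i) <= 6 * sd]
mass_in_window = sum(row[j] for j in window)
print(len(window), "of", N + 1, "entries carry mass", mass_in_window)
```

The window width of six standard deviations is an arbitrary choice for illustration; the abstract's error bounds are not reproduced here.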

A class of identifiable phylogenetic birth-death models. Proc Natl Acad Sci U S A 2022;119:e2119513119. [PMID: 35994663 PMCID: PMC9436344 DOI: 10.1073/pnas.2119513119] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Indexed: 11/30/2022] Open

Kurpas MK, Kimmel M. Modes of Selection in Tumors as Reflected by Two Mathematical Models and Site Frequency Spectra. Front Ecol Evol 2022;10:889438. [PMID: 37333691 PMCID: PMC10275603 DOI: 10.3389/fevo.2022.889438] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Indexed: 03/28/2024] Open

Dilber E, Terhorst J. Robust detection of natural selection using a probabilistic model of tree imbalance. Genetics 2022;220:6511494. [PMID: 35100408 PMCID: PMC8893258 DOI: 10.1093/genetics/iyac009] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/06/2021] [Accepted: 12/16/2021] [Indexed: 01/21/2023] Open

Good BH. Linkage disequilibrium between rare mutations. Genetics 2022;220:6503502. [PMID: 35100407 PMCID: PMC8982034 DOI: 10.1093/genetics/iyac004] [Citation(s) in RCA: 9] [Impact Index Per Article: 4.5] [Received: 10/07/2021] [Accepted: 12/21/2021] [Indexed: 01/13/2023] Open

Abstract

The statistical associations between mutations, collectively known as linkage disequilibrium, encode important information about the evolutionary forces acting within a population. Yet in contrast to single-site analogues like the site frequency spectrum, our theoretical understanding of linkage disequilibrium remains limited. In particular, little is currently known about how mutations with different ages and fitness costs contribute to expected patterns of linkage disequilibrium, even in simple settings where recombination and genetic drift are the major evolutionary forces. Here, I introduce a forward-time framework for predicting linkage disequilibrium between pairs of neutral and deleterious mutations as a function of their present-day frequencies. I show that the dynamics of linkage disequilibrium become much simpler in the limit that mutations are rare, where they admit a simple heuristic picture based on the trajectories of the underlying lineages. I use this approach to derive analytical expressions for a family of frequency-weighted linkage disequilibrium statistics as a function of the recombination rate, the frequency scale, and the additive and epistatic fitness costs of the mutations. I find that the frequency scale can have a dramatic impact on the shapes of the resulting linkage disequilibrium curves, reflecting the broad range of time scales over which these correlations arise. I also show that the differences between neutral and deleterious linkage disequilibrium are not purely driven by differences in their mutation frequencies and can instead display qualitative features that are reminiscent of epistasis. I conclude by discussing the implications of these results for recent linkage disequilibrium measurements in bacteria. This forward-time approach may provide a useful framework for predicting linkage disequilibrium across a range of evolutionary scenarios.

Nadachowska-Brzyska K, Konczal M, Babik W. Navigating the temporal continuum of effective population size. Methods Ecol Evol 2021. [DOI: 10.1111/2041-210x.13740] [Citation(s) in RCA: 6] [Impact Index Per Article: 2.0] [Indexed: 12/20/2022]

DeWitt WS, Harris KD, Ragsdale AP, Harris K. Nonparametric coalescent inference of mutation spectrum history and demography. Proc Natl Acad Sci U S A 2021;118:e2013798118. [PMID: 34016747 PMCID: PMC8166128 DOI: 10.1073/pnas.2013798118] [Citation(s) in RCA: 16] [Impact Index Per Article: 5.3] [Indexed: 11/18/2022] Open

Zeng K, Charlesworth B, Hobolth A. Studying models of balancing selection using phase-type theory. Genetics 2021;218:6237896. [PMID: 33871627 DOI: 10.1093/genetics/iyab055] [Citation(s) in RCA: 7] [Impact Index Per Article: 2.3] [Received: 02/01/2021] [Accepted: 03/25/2021] [Indexed: 11/15/2022] Open

Johri P, Riall K, Becher H, Excoffier L, Charlesworth B, Jensen JD. The Impact of Purifying and Background Selection on the Inference of Population History: Problems and Prospects. Mol Biol Evol 2021;38:2986-3003. [PMID: 33591322 PMCID: PMC8233493 DOI: 10.1093/molbev/msab050] [Citation(s) in RCA: 38] [Impact Index Per Article: 12.7] [Indexed: 12/17/2022] Open

Chen H. A Computational Approach for Modeling the Allele Frequency Spectrum of Populations with Arbitrarily Varying Size. Genomics Proteomics & Bioinformatics 2020;17:635-644. [PMID: 32173599 PMCID: PMC7212486 DOI: 10.1016/j.gpb.2019.06.002] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 11/28/2018] [Revised: 06/04/2019] [Accepted: 08/02/2019] [Indexed: 11/25/2022]

Zeng K, Jackson BC, Barton HJ. Methods for Estimating Demography and Detecting Between-Locus Differences in the Effective Population Size and Mutation Rate. Mol Biol Evol 2019;36:423-433. [PMID: 30428070 PMCID: PMC6409433 DOI: 10.1093/molbev/msy212] [Citation(s) in RCA: 14] [Impact Index Per Article: 2.8] [Indexed: 12/16/2022] Open

Mura M, Feillet C, Bertolusso R, Delaunay F, Kimmel M. Mathematical modelling reveals unexpected inheritance and variability patterns of cell cycle parameters in mammalian cells. PLoS Comput Biol 2019;15:e1007054. [PMID: 31158226 PMCID: PMC6564046 DOI: 10.1371/journal.pcbi.1007054] [Citation(s) in RCA: 12] [Impact Index Per Article: 2.4] [Received: 07/23/2018] [Revised: 06/13/2019] [Accepted: 04/26/2019] [Indexed: 01/12/2023] Open

Abstract

The cell cycle is the fundamental process of cell populations, it is regulated by environmental cues and by intracellular checkpoints. Cell cycle variability in clonal cell population is caused by stochastic processes such as random partitioning of cellular components to progeny cells at division and random interactions among biomolecules in cells. One of the important biological questions is how the dynamics at the cell cycle scale, which is related to family dependencies between the cell and its descendants, affects cell population behavior in the long-run. We address this question using a “mechanistic” model, built based on observations of single cells over several cell generations, and then extrapolated in time. We used cell pedigree observations of NIH 3T3 cells including FUCCI markers, to determine patterns of inheritance of cell-cycle phase durations and single-cell protein dynamics. Based on that information we developed a hybrid mathematical model, involving bifurcating autoregression to describe stochasticity of partitioning and inheritance of cell-cycle-phase times, and an ordinary differential equation system to capture single-cell protein dynamics. Long-term simulations, concordant with in vitro experiments, demonstrated the model reproduced the main features of our data and had homeostatic properties. Moreover, heterogeneity of cell cycle may have important consequences during population development. We discovered an effect similar to genetic drift, amplified by family relationships among cells. In consequence, the progeny of a single cell with a short cell cycle time had a high probability of eventually dominating the population, due to the heritability of cell-cycle phases. 
Patterns of epigenetic heritability in proliferating cells are important for understanding long-term trends of cell populations which are either required to provide the influx of maturing cells (such as hematopoietic stem cells) or which started proliferating uncontrollably (such as cancer cells).

All cells in multicellular organisms obey orchestrated sequences of signals to ensure developmental and homeostatic fitness under a variety of external stimuli. However, there also exist self-perpetuating stem-cell populations, the function of which is to provide a steady supply of differentiated progenitors that in turn ensure persistence of organism functions. This “cell production engine” is an important element of biological homeostasis. A similar process, albeit distorted in many respects, plays a major role in cancer development; here the robustness of homeostasis contributes to difficulty in eradication of malignancy. An important role in homeostasis seems to be played by generation of heterogeneity among cell phenotypes, which then can be shaped by selection and other genetic forces. In the present paper, we present a model of a cultured cell population, which factors in relationships among related cells and the dynamics of cell growth and important proteins regulating cell division. We find that the model not only maintains homeostasis, but that it also responds to perturbations in a manner that is similar to that exhibited by the Wright-Fisher model of population genetics. The model-cell population can become dominated by the progeny of the fittest individuals, without invoking advantageous mutations. If confirmed, this may provide an alternative mode of evolution of cell populations.

Beichman AC, Huerta-Sanchez E, Lohmueller KE. Using Genomic Data to Infer Historic Population Dynamics of Nonmodel Organisms. Annual Review of Ecology Evolution and Systematics 2018. [DOI: 10.1146/annurev-ecolsys-110617-062431] [Citation(s) in RCA: 89] [Impact Index Per Article: 14.8] [Indexed: 11/09/2022]

The Wright-Fisher site frequency spectrum as a perturbation of the coalescent's. Theor Popul Biol 2018;124:81-92. [PMID: 30308178 DOI: 10.1016/j.tpb.2018.09.005] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/07/2018] [Revised: 09/22/2018] [Accepted: 09/28/2018] [Indexed: 11/24/2022]

Geometry of the Sample Frequency Spectrum and the Perils of Demographic Inference. Genetics 2018;210:665-682. [PMID: 30064984 DOI: 10.1534/genetics.118.300733] [Citation(s) in RCA: 18] [Impact Index Per Article: 3.0] [Received: 03/29/2018] [Accepted: 07/30/2018] [Indexed: 11/18/2022] Open

Reppell M, Zöllner S. An efficient algorithm for generating the internal branches of a Kingman coalescent. Theor Popul Biol 2018;122:57-66. [PMID: 28709926 PMCID: PMC5764821 DOI: 10.1016/j.tpb.2017.05.002] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Received: 12/22/2016] [Revised: 05/19/2017] [Accepted: 05/26/2017] [Indexed: 01/16/2023]

Waltoft BL, Hobolth A. Non-parametric estimation of population size changes from the site frequency spectrum. Stat Appl Genet Mol Biol 2018;17:sagmb-2017-0061. [PMID: 29886455 DOI: 10.1515/sagmb-2017-0061] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.7] [Indexed: 11/15/2022]

Melfi A, Viswanath D. Single and simultaneous binary mergers in Wright-Fisher genealogies. Theor Popul Biol 2018;121:60-71. [PMID: 29655651 DOI: 10.1016/j.tpb.2018.04.001] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.0] [Received: 10/13/2017] [Revised: 03/29/2018] [Accepted: 04/04/2018] [Indexed: 11/25/2022]

Coalescent Processes with Skewed Offspring Distributions and Nonequilibrium Demography. Genetics 2017;208:323-338. [PMID: 29127263 DOI: 10.1534/genetics.117.300499] [Citation(s) in RCA: 33] [Impact Index Per Article: 4.7] [Received: 05/12/2017] [Accepted: 10/30/2017] [Indexed: 11/18/2022] Open

Comparison of Single Genome and Allele Frequency Data Reveals Discordant Demographic Histories. G3-Genes Genomes Genetics 2017;7:3605-3620. [PMID: 28893846 PMCID: PMC5677151 DOI: 10.1534/g3.117.300259] [Citation(s) in RCA: 40] [Impact Index Per Article: 5.7] [Indexed: 02/03/2023]

Ohtsuki H, Innan H. Forward and backward evolutionary processes and allele frequency spectrum in a cancer cell population. Theor Popul Biol 2017;117:43-50. [PMID: 28866007 DOI: 10.1016/j.tpb.2017.08.006] [Citation(s) in RCA: 18] [Impact Index Per Article: 2.6] [Received: 05/09/2017] [Revised: 08/08/2017] [Accepted: 08/23/2017] [Indexed: 01/04/2023]

Accuracy of Demographic Inferences from the Site Frequency Spectrum: The Case of the Yoruba Population. Genetics 2017;206:439-449. [PMID: 28341655 DOI: 10.1534/genetics.116.192708] [Citation(s) in RCA: 54] [Impact Index Per Article: 7.7] [Received: 09/30/2016] [Accepted: 03/23/2017] [Indexed: 01/23/2023] Open

Kamm JA, Terhorst J, Song YS. Efficient computation of the joint sample frequency spectra for multiple populations. J Comput Graph Stat 2017;26:182-194. [PMID: 28239248 DOI: 10.1080/10618600.2016.1159212] [Citation(s) in RCA: 30] [Impact Index Per Article: 4.3] [Indexed: 10/22/2022]

Polanski A, Szczesna A, Garbulowski M, Kimmel M. Coalescence computations for large samples drawn from populations of time-varying sizes. PLoS One 2017;12:e0170701. [PMID: 28170404 PMCID: PMC5295683 DOI: 10.1371/journal.pone.0170701] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.9] [Received: 07/12/2016] [Accepted: 01/09/2017] [Indexed: 11/19/2022] Open

Terhorst J, Kamm JA, Song YS. Robust and scalable inference of population history from hundreds of unphased whole genomes. Nat Genet 2017;49:303-309. [PMID: 28024154 PMCID: PMC5470542 DOI: 10.1038/ng.3748] [Citation(s) in RCA: 368] [Impact Index Per Article: 52.6] [Received: 09/16/2016] [Accepted: 11/23/2016] [Indexed: 12/20/2022]

Davis A, Gao R, Navin N. Tumor evolution: Linear, branching, neutral or punctuated? Biochim Biophys Acta Rev Cancer 2017;1867:151-161. [PMID: 28110020 DOI: 10.1016/j.bbcan.2017.01.003] [Citation(s) in RCA: 173] [Impact Index Per Article: 24.7] [Received: 10/19/2016] [Revised: 01/14/2017] [Accepted: 01/16/2017] [Indexed: 02/08/2023]

Gao F, Keinan A. Explosive genetic evidence for explosive human population growth. Curr Opin Genet Dev 2016;41:130-139. [PMID: 27710906 PMCID: PMC5161661 DOI: 10.1016/j.gde.2016.09.002] [Citation(s) in RCA: 24] [Impact Index Per Article: 3.0] [Received: 06/01/2016] [Revised: 08/26/2016] [Accepted: 09/11/2016] [Indexed: 11/19/2022]

Spence JP, Kamm JA, Song YS. The Site Frequency Spectrum for General Coalescents. Genetics 2016;202:1549-61. [PMID: 26883445 PMCID: PMC4827730 DOI: 10.1534/genetics.115.184101] [Citation(s) in RCA: 25] [Impact Index Per Article: 3.1] [Received: 10/26/2015] [Accepted: 02/10/2016] [Indexed: 01/25/2023] Open

Burden CJ, Simon H. Genetic drift in populations governed by a Galton-Watson branching process. Theor Popul Biol 2016;109:63-74. [PMID: 27018000 DOI: 10.1016/j.tpb.2016.03.002] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.5] [Received: 09/28/2015] [Revised: 01/18/2016] [Accepted: 03/15/2016] [Indexed: 11/26/2022]

Inference Methods for Multiple Merger Coalescents. Evol Biol 2016. [DOI: 10.1007/978-3-319-41324-2_20] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/21/2022]

Inference of Super-exponential Human Population Growth via Efficient Computation of the Site Frequency Spectrum for Generalized Models. Genetics 2015;202:235-45. [PMID: 26450922 PMCID: PMC4701087 DOI: 10.1534/genetics.115.180570] [Citation(s) in RCA: 25] [Impact Index Per Article: 2.8] [Received: 07/07/2015] [Accepted: 09/28/2015] [Indexed: 01/08/2023] Open

Inferring Bottlenecks from Genome-Wide Samples of Short Sequence Blocks. Genetics 2015;201:1157-69. [PMID: 26341659 PMCID: PMC4649642 DOI: 10.1534/genetics.115.179861] [Citation(s) in RCA: 34] [Impact Index Per Article: 3.8] [Received: 06/24/2015] [Accepted: 09/01/2015] [Indexed: 01/02/2023] Open

Chen H. Population genetic studies in the genomic sequencing era. Dong Wu Xue Yan Jiu = Zoological Research 2015;36:223-32. [PMID: 26228473 DOI: 10.13918/j.issn.2095-8137.2015.4.223] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.2] [Indexed: 11/01/2022]

Chen H, Hey J, Chen K. Inferring Very Recent Population Growth Rate from Population-Scale Sequencing Data: Using a Large-Sample Coalescent Estimator. Mol Biol Evol 2015;32:2996-3011. [PMID: 26187437 DOI: 10.1093/molbev/msv158] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.2] [Indexed: 12/20/2022] Open

Fundamental limits on the accuracy of demographic inference based on the sample frequency spectrum. Proc Natl Acad Sci U S A 2015;112:7677-82. [PMID: 26056264 DOI: 10.1073/pnas.1503717112] [Citation(s) in RCA: 63] [Impact Index Per Article: 7.0] [Indexed: 01/25/2023] Open

Leveraging ancestry to improve causal variant identification in exome sequencing for monogenic disorders. Eur J Hum Genet 2015;24:113-9. [PMID: 25898925 DOI: 10.1038/ejhg.2015.68] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/31/2014] [Revised: 03/01/2015] [Accepted: 03/10/2015] [Indexed: 01/18/2023] Open

Exploring population size changes using SNP frequency spectra. Nat Genet 2015;47:555-9. [PMID: 25848749 PMCID: PMC4414822 DOI: 10.1038/ng.3254] [Citation(s) in RCA: 235] [Impact Index Per Article: 26.1] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/03/2014] [Accepted: 02/26/2015] [Indexed: 02/05/2023]

Can the site-frequency spectrum distinguish exponential population growth from multiple-merger coalescents? Genetics 2015;199:841-56. [PMID: 25575536 DOI: 10.1534/genetics.114.173807] [Citation(s) in RCA: 57] [Impact Index Per Article: 6.3] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/31/2022] Open

Bhaskar A, Wang YXR, Song YS. Efficient inference of population size histories and locus-specific mutation rates from large-sample genomic variation data. Genome Res 2015;25:268-79. [PMID: 25564017 PMCID: PMC4315300 DOI: 10.1101/gr.178756.114] [Citation(s) in RCA: 62] [Impact Index Per Article: 6.9] [Indexed: 01/23/2023]

Bhaskar A, Song YS. Descartes' rule of signs and the identifiability of population demographic models from genomic variation data. Ann Stat 2014;42:2469-2493. [PMID: 28018011 DOI: 10.1214/14-aos1264] [Citation(s) in RCA: 53] [Impact Index Per Article: 5.3] [Indexed: 01/22/2023]

Abstract

The sample frequency spectrum (SFS) is a widely-used summary statistic of genomic variation in a sample of homologous DNA sequences. It provides a highly efficient dimensional reduction of large-scale population genomic data and its mathematical dependence on the underlying population demography is well understood, thus enabling the development of efficient inference algorithms. However, it has been recently shown that very different population demographies can actually generate the same SFS for arbitrarily large sample sizes. Although in principle this nonidentifiability issue poses a thorny challenge to statistical inference, the population size functions involved in the counterexamples are arguably not so biologically realistic. Here, we revisit this problem and examine the identifiability of demographic models under the restriction that the population sizes are piecewise-defined where each piece belongs to some family of biologically-motivated functions. Under this assumption, we prove that the expected SFS of a sample uniquely determines the underlying demographic model, provided that the sample is sufficiently large. We obtain a general bound on the sample size sufficient for identifiability; the bound depends on the number of pieces in the demographic model and also on the type of population size function in each piece. In the cases of piecewise-constant, piecewise-exponential and piecewise-generalized-exponential models, which are often assumed in population genomic inferences, we provide explicit formulas for the bounds as simple functions of the number of pieces. Lastly, we obtain analogous results for the "folded" SFS, which is often used when there is ambiguity as to which allelic type is ancestral. Our results are proved using a generalization of Descartes' rule of signs for polynomials to the Laplace transform of piecewise continuous functions.

Distortion of genealogical properties when the sample is very large. Proc Natl Acad Sci U S A 2014;111:2385-90. [PMID: 24469801 DOI: 10.1073/pnas.1322709111] [Citation(s) in RCA: 57] [Impact Index Per Article: 5.7] [Indexed: 11/18/2022]

General triallelic frequency spectrum under demographic models with variable population size. Genetics 2013;196:295-311. [PMID: 24214345 DOI: 10.1534/genetics.113.158584] [Citation(s) in RCA: 25] [Impact Index Per Article: 2.3] [Indexed: 01/06/2023]

Factors influencing ascertainment bias of microsatellite allele sizes: impact on estimates of mutation rates. Genetics 2013;195:563-72. [PMID: 23946335 DOI: 10.1534/genetics.113.154161] [Citation(s) in RCA: 28] [Impact Index Per Article: 2.5] [Indexed: 11/18/2022]

Abstract

Microsatellite loci play an important role as markers for identification, disease gene mapping, and evolutionary studies. Mutation rate, which is of fundamental importance, can be obtained from interspecies comparisons, which, however, are subject to ascertainment bias. This bias arises, for example, when a locus is selected on the basis of its large allele size in one species (cognate species 1), in which it is first discovered. This bias is reflected in average allele length in any noncognate species 2 being smaller than that in species 1. This phenomenon was observed in various pairs of species, including comparisons of allele sizes in human and chimpanzee. Various mechanisms were proposed to explain observed differences in mean allele lengths between two species. Here, we examine the framework of a single-step asymmetric and unrestricted stepwise mutation model with genetic drift. Analysis is based on coalescent theory. Analytical results are confirmed by simulations using the simuPOP software. The mechanism of ascertainment bias in this model is a tighter correlation of allele sizes within a cognate species 1 than of allele sizes in two different species 1 and 2. We present computations of the expected average allele size difference, given the mutation rate, population sizes of species 1 and 2, time of separation of species 1 and 2, and the age of the allele. We show that when the past demographic histories of the cognate and noncognate taxa are different, the rate and directionality of mutations affect the allele sizes in the two taxa differently from the simple effect of ascertainment bias. This effect may exaggerate or reverse the effect of difference in mutation rates. 
We reanalyze literature data, which indicate that despite the bias, the microsatellite mutation rate estimate in the ancestral population is consistently greater than that in either human or chimpanzee and the mutation rate estimate in human exceeds or equals that in chimpanzee with the rate of allele length expansion in human being greater than that in chimpanzee. We also demonstrate that population bottlenecks and expansions in the recent human history have little impact on our conclusions.

Lachance J, Tishkoff SA. SNP ascertainment bias in population genetic analyses: why it is important, and how to correct it. Bioessays 2013;35:780-6. [PMID: 23836388 DOI: 10.1002/bies.201300014] [Citation(s) in RCA: 193] [Impact Index Per Article: 17.5] [Indexed: 12/12/2022]