1
Smith TB, Weissman DB. Isolation by distance in populations with power-law dispersal. G3 (Bethesda, Md.) 2023; 13:jkad023. [PMID: 36718551; PMCID: PMC10085794; DOI: 10.1093/g3journal/jkad023] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 07/11/2022] [Accepted: 01/07/2023] [Indexed: 02/01/2023]
Abstract
Limited dispersal of individuals between generations results in isolation by distance, in which individuals further apart in space tend to be less related. Classic models of isolation by distance assume that dispersal distances are drawn from a thin-tailed distribution and predict that the proportion of the genome that is identical by descent between a pair of individuals should decrease exponentially with the spatial separation between them. However, in many natural populations, individuals occasionally disperse over very long distances. In this work, we use mathematical analysis and coalescent simulations to study the effect of long-range (power-law) dispersal on patterns of isolation by distance. We find that it leads to power-law decay of identity-by-descent at large distances with the same exponent as dispersal. We also find that broad power-law dispersal produces another, shallow power-law decay of identity-by-descent at short distances. These results suggest that the distribution of long-range dispersal events could be estimated from sequencing large population samples taken from a wide range of spatial scales.
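The contrast drawn in this abstract between exponential and power-law decay of identity by descent can be illustrated with a log-log slope. The sketch below is not the authors' method: the helper `loglog_slope`, the exponent `beta`, the `length_scale`, and the synthetic noise-free values are all hypothetical and chosen only for illustration. A power law gives a constant log-log slope equal to minus its exponent, whereas an exponential gives a slope that keeps steepening with distance.

```python
import math


def loglog_slope(distances, ibd):
    """Least-squares slope of log(ibd) against log(distance)."""
    xs = [math.log(d) for d in distances]
    ys = [math.log(v) for v in ibd]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx


# Synthetic, noise-free values (assumptions for illustration, not data from the paper).
distances = [10.0 * 2 ** k for k in range(8)]  # 10, 20, ..., 1280
beta = 2.5            # hypothetical power-law exponent
length_scale = 100.0  # hypothetical decay length for the exponential case

power_law_ibd = [d ** (-beta) for d in distances]
exponential_ibd = [math.exp(-d / length_scale) for d in distances]

# Power law: one slope describes all distances.
slope_power = loglog_slope(distances, power_law_ibd)

# Exponential: the log-log slope differs between short and long distances.
slope_exp_near = loglog_slope(distances[:4], exponential_ibd[:4])
slope_exp_far = loglog_slope(distances[4:], exponential_ibd[4:])

print(slope_power, slope_exp_near, slope_exp_far)
```

With real data the identity-by-descent estimates would be noisy, so a single fitted slope would need uncertainty estimates and samples spanning a wide range of spatial scales, as the abstract notes.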
Affiliation(s)
- Tyler B Smith
  - Department of Physics, Emory University, Atlanta, Georgia 30322, USA
- Daniel B Weissman
  - Corresponding author: Department of Physics, Emory University, Atlanta, Georgia 30322, USA
2
Bisschop G, Lohse K, Setter D. Sweeps in time: leveraging the joint distribution of branch lengths. Genetics 2021; 219:iyab119. [PMID: 34849880; PMCID: PMC8633083; DOI: 10.1093/genetics/iyab119] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Received: 06/11/2021] [Accepted: 07/10/2021] [Indexed: 11/14/2022]
Abstract
Current methods of identifying positively selected regions in the genome are limited in two key ways: the underlying models cannot account for the timing of adaptive events and the comparison between models of selective sweeps and sequence data is generally made via simple summaries of genetic diversity. Here, we develop a tractable method of describing the effect of positive selection on the genealogical histories in the surrounding genome, explicitly modeling both the timing and context of an adaptive event. In addition, our framework allows us to go beyond analyzing polymorphism data via the site frequency spectrum or summaries thereof and instead leverage information contained in patterns of linked variants. Tests on both simulations and a human data example, as well as a comparison to SweepFinder2, show that even with very small sample sizes, our analytic framework has higher power to identify old selective sweeps and to correctly infer both the time and strength of selection. Finally, we derived the marginal distribution of genealogical branch lengths at a locus affected by selection acting at a linked site. This provides a much-needed link between our analytic understanding of the effects of sweeps on sequence variation and recent advances in simulation and heuristic inference procedures that allow researchers to examine the sequence of genealogical histories along the genome.
Affiliation(s)
- Gertjan Bisschop
  - Institute of Evolutionary Biology, University of Edinburgh, Edinburgh EH9 3FL, UK
- Konrad Lohse
  - Institute of Evolutionary Biology, University of Edinburgh, Edinburgh EH9 3FL, UK
- Derek Setter
  - Institute of Evolutionary Biology, University of Edinburgh, Edinburgh EH9 3FL, UK
3
Excoffier L, Marchi N, Marques DA, Matthey-Doret R, Gouy A, Sousa VC. fastsimcoal2: demographic inference under complex evolutionary scenarios. Bioinformatics 2021; 37:4882-4885. [PMID: 34164653; PMCID: PMC8665742; DOI: 10.1093/bioinformatics/btab468] [Citation(s) in RCA: 174] [Impact Index Per Article: 43.5] [Received: 04/19/2021] [Revised: 06/11/2021] [Accepted: 06/22/2021] [Indexed: 01/25/2023]
Abstract
Motivation: fastsimcoal2 extends fastsimcoal, a continuous-time coalescent-based genetic simulation program, by enabling the estimation of demographic parameters under very complex scenarios from the site frequency spectrum under a maximum-likelihood framework. Results: Other improvements include multi-threading, handling of population inbreeding, extended input file syntax facilitating the description of complex demographic scenarios, and more efficient simulations of sparsely structured populations and of large chromosomes. Availability and implementation: fastsimcoal2 is freely available at http://cmpg.unibe.ch/software/fastsimcoal2/. It includes console versions for Linux, Windows and MacOS, additional scripts for the analysis and visualization of simulated and estimated scenarios, as well as detailed documentation and ready-to-use examples.
Affiliation(s)
- Laurent Excoffier
  - Computational and Molecular Population Genetics Lab, Institute of Ecology and Evolution, University of Bern, 3012 Bern, Switzerland
  - Swiss Institute of Bioinformatics, 1015 Lausanne, Switzerland
- Nina Marchi
  - Computational and Molecular Population Genetics Lab, Institute of Ecology and Evolution, University of Bern, 3012 Bern, Switzerland
  - Swiss Institute of Bioinformatics, 1015 Lausanne, Switzerland
- David Alexander Marques
  - Life Science Division, Natural History Museum Basel, 4051 Basel, Switzerland
  - Aquatic Ecology and Evolution, Institute of Ecology and Evolution, University of Bern, 3012 Bern, Switzerland
  - Department of Fish Ecology and Evolution, EAWAG Swiss Federal Institute of Aquatic Science and Technology, Center for Ecology, Evolution and Biogeochemistry, 6047 Kastanienbaum, Switzerland
- Remi Matthey-Doret
  - Computational and Molecular Population Genetics Lab, Institute of Ecology and Evolution, University of Bern, 3012 Bern, Switzerland
  - Swiss Institute of Bioinformatics, 1015 Lausanne, Switzerland
- Alexandre Gouy
  - Computational and Molecular Population Genetics Lab, Institute of Ecology and Evolution, University of Bern, 3012 Bern, Switzerland
  - Gouy Data Consulting, 1026 Denges, Switzerland
- Vitor C Sousa
  - Computational and Molecular Population Genetics Lab, Institute of Ecology and Evolution, University of Bern, 3012 Bern, Switzerland
  - cE3c - Centre for Ecology, Evolution and Environmental Changes, Faculdade de Ciências da Universidade de Lisboa, University of Lisbon, Campo Grande, 1749-016, Lisbon, Portugal
4
Inference of Historical Population-Size Changes with Allele-Frequency Data. G3: Genes, Genomes, Genetics 2020; 10:211-223. [PMID: 31699776; PMCID: PMC6945023; DOI: 10.1534/g3.119.400854] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.6] [Indexed: 01/27/2023]
Abstract
With up to millions of nearly neutral polymorphisms now being routinely sampled in population-genomic surveys, it is possible to estimate the site-frequency spectrum of such sites with high precision. Each frequency class reflects a mixture of potentially unique demographic histories, which can be revealed using theory for the probability distributions of the starting and ending points of branch segments over all possible coalescence trees. Such distributions are completely independent of past population history, which only influences the segment lengths, providing the basis for estimating average population sizes separating tree-wide coalescence events. The history of population-size change experienced by a sample of polymorphisms can then be dissected in a model-flexible fashion, and extension of this theory allows estimation of the mean and full distribution of long-term effective population sizes and ages of alleles of specific frequencies. Here, we outline the basic theory underlying the conceptual approach, develop and test an efficient statistical procedure for parameter estimation, and apply this to multiple population-genomic datasets for the microcrustacean Daphnia pulex.
5
Mather N, Traves SM, Ho SYW. A practical introduction to sequentially Markovian coalescent methods for estimating demographic history from genomic data. Ecol Evol 2020; 10:579-589. [PMID: 31988743; PMCID: PMC6972798; DOI: 10.1002/ece3.5888] [Citation(s) in RCA: 67] [Impact Index Per Article: 13.4] [Received: 05/28/2019] [Revised: 10/11/2019] [Accepted: 11/12/2019] [Indexed: 12/31/2022]
Abstract
A common goal of population genomics and molecular ecology is to reconstruct the demographic history of a species of interest. A pair of powerful tools based on the sequentially Markovian coalescent have been developed to infer past population sizes using genome sequences. These methods are most useful when sequences are available for only a limited number of genomes and when the aim is to study ancient demographic events. The results of these analyses can be difficult to interpret accurately, because doing so requires some understanding of their theoretical basis and of their sensitivity to confounding factors. In this practical review, we explain some of the key concepts underpinning the pairwise and multiple sequentially Markovian coalescent methods (PSMC and MSMC, respectively). We relate these concepts to the use and interpretation of these methods, and we explain how the choice of different parameter values by the user can affect the accuracy and precision of the inferences. Based on our survey of 100 PSMC studies and 30 MSMC studies, we describe how the two methods are used in practice. Readers of this article will become familiar with the principles, practice, and interpretation of the sequentially Markovian coalescent for inferring demographic history.
Affiliation(s)
- Niklas Mather
  - School of Life and Environmental Sciences, University of Sydney, Sydney, NSW, Australia
- Samuel M. Traves
  - School of Life and Environmental Sciences, University of Sydney, Sydney, NSW, Australia
- Simon Y. W. Ho
  - School of Life and Environmental Sciences, University of Sydney, Sydney, NSW, Australia
6
Genetic variation across trophic levels: A test of the correlation between population size and genetic diversity in sympatric desert lizards. PLoS One 2019; 14:e0224040. [PMID: 31805058; PMCID: PMC6894812; DOI: 10.1371/journal.pone.0224040] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Received: 05/23/2019] [Accepted: 10/03/2019] [Indexed: 01/15/2023]
Abstract
Understanding the causes of genetic variation in real populations has been elusive. Competing theories claim that neutral vs. selective processes have a greater influence on the genetic variation within a population. A key difference among theories is the relationship between population size and genetic diversity. Our study tests this empirically by sampling two species of herbivorous lizards (Dipsosaurus dorsalis and Sauromalus ater) and two species of carnivorous lizards (Crotaphytus bicinctores and Gambelia wislizenii) that vary in population size at the same locality, and comparing metrics of genetic diversity. Contrary to neutral expectations, results from four independent loci showed levels of diversity were usually higher for species with smaller population sizes. This suggests that selective processes may be having an important impact on intraspecific diversity in this reptile community, although tests showed little evidence for selection on the loci sequenced for this study. It is also possible that idiosyncratic histories of the focal species may be overriding predictions from simple neutral models. If future studies show that lack of correlation between population size and genetic diversity is common, methods using genetic diversity to estimate population parameters like population size or time to common ancestor should be used with caution, as these estimates are based on neutral theory predictions.
7
Parag KV, Pybus OG. Robust Design for Coalescent Model Inference. Syst Biol 2019; 68:730-743. [PMID: 30726979; DOI: 10.1093/sysbio/syz008] [Citation(s) in RCA: 16] [Impact Index Per Article: 2.7] [Received: 08/22/2018] [Revised: 01/28/2019] [Accepted: 02/04/2019] [Indexed: 11/08/2023]
Abstract
The coalescent process describes how changes in the size or structure of a population influence the genealogical patterns of sequences sampled from that population. The estimation of (effective) population size changes from genealogies that are reconstructed from these sampled sequences is an important problem in many biological fields. Often, population size is characterized by a piecewise-constant function, with each piece serving as a population size parameter to be estimated. Estimation quality depends on both the statistical coalescent inference method employed, and on the experimental protocol, which controls variables such as the sampling of sequences through time and space, or the transformation of model parameters. While there is an extensive literature on coalescent inference methodology, there is comparatively little work on experimental design. The research that does exist is largely simulation-based, precluding the development of provable or general design theorems. We examine three key design problems: temporal sampling of sequences under the skyline demographic coalescent model, spatio-temporal sampling under the structured coalescent model, and time discretization for sequentially Markovian coalescent models. In all cases, we prove that 1) working in the logarithm of the parameters to be inferred (e.g., population size) and 2) distributing informative coalescent events uniformly among these log-parameters, is uniquely robust. "Robust" means that the total and maximum uncertainty of our parameter estimates are minimized, and made insensitive to their unknown (true) values. This robust design theorem provides rigorous justification for several existing coalescent experimental design decisions and leads to usable guidelines for future empirical or simulation-based investigations. Given its persistence among models, this theorem may form the basis of an experimental design paradigm for coalescent inference.
Affiliation(s)
- Kris V Parag
  - Department of Zoology, University of Oxford, Oxford OX1 3SY, UK
- Oliver G Pybus
  - Department of Zoology, University of Oxford, Oxford OX1 3SY, UK
8
Beichman AC, Huerta-Sanchez E, Lohmueller KE. Using Genomic Data to Infer Historic Population Dynamics of Nonmodel Organisms. Annual Review of Ecology, Evolution, and Systematics 2018. [DOI: 10.1146/annurev-ecolsys-110617-062431] [Citation(s) in RCA: 89] [Impact Index Per Article: 12.7] [Indexed: 11/09/2022]
Abstract
Genome sequence data are now being routinely obtained from many nonmodel organisms. These data contain a wealth of information about the demographic history of the populations from which they originate. Many sophisticated statistical inference procedures have been developed to infer the demographic history of populations from this type of genomic data. In this review, we discuss the different statistical methods available for inference of demography, providing an overview of the underlying theory and logic behind each approach. We also discuss the types of data required and the pros and cons of each method. We then discuss how these methods have been applied to a variety of nonmodel organisms. We conclude by presenting some recommendations for researchers looking to use genomic data to infer demographic history.
Affiliation(s)
- Annabel C. Beichman
  - Department of Ecology and Evolutionary Biology, University of California, Los Angeles, California 90095, USA
- Emilia Huerta-Sanchez
  - Department of Molecular and Cell Biology, University of California, Merced, California 95343, USA
  - Current affiliation: Department of Ecology and Evolutionary Biology, Brown University, Providence, Rhode Island 02912, USA
- Kirk E. Lohmueller
  - Department of Ecology and Evolutionary Biology, University of California, Los Angeles, California 90095, USA
  - Interdepartmental Program in Bioinformatics and Department of Human Genetics, David Geffen School of Medicine, University of California, Los Angeles, California 90095, USA
9
Hallatschek O. Selection-Like Biases Emerge in Population Models with Recurrent Jackpot Events. Genetics 2018; 210:1053-1073. [PMID: 30171032; PMCID: PMC6218241; DOI: 10.1534/genetics.118.301516] [Citation(s) in RCA: 21] [Impact Index Per Article: 3.0] [Received: 01/16/2018] [Accepted: 08/26/2018] [Indexed: 11/18/2022]
Abstract
Evolutionary dynamics driven out of equilibrium by growth, expansion, or adaptation often generate a characteristically skewed distribution of descendant numbers: the earliest, the most advanced, or the fittest ancestors have exceptionally large numbers of descendants, which Luria and Delbrück called "jackpot" events. Here, I show that recurrent jackpot events generate a deterministic median bias favoring majority alleles, which is akin to positive frequency-dependent selection (proportional to the log ratio of the frequencies of mutant and wild-type alleles). This fictitious selection force results from the fact that majority alleles tend to sample deeper into the tail of the descendant distribution. The flip side of this sampling effect is the rare occurrence of large frequency hikes in favor of minority alleles, which ensures that the allele frequency dynamics remains neutral in expectation, unless genuine selection is present. The resulting picture of a selection-like bias compensated by rare big jumps allows for an intuitive understanding of allele frequency trajectories and enables the exact calculation of transition densities for a range of important scenarios, including population-size variations and different forms of natural selection. As a general signature of evolution by rare events, fictitious selection hampers the establishment of new beneficial mutations, counteracts balancing selection, and confounds methods to infer selection from data over limited timescales.
Affiliation(s)
- Oskar Hallatschek
  - Department of Physics, University of California, Berkeley, California 94720
  - Department of Integrative Biology, University of California, Berkeley, California 94720
10
Beeravolu CR, Hickerson MJ, Frantz LAF, Lohse K. ABLE: blockwise site frequency spectra for inferring complex population histories and recombination. Genome Biol 2018; 19:145. [PMID: 30253810; PMCID: PMC6156964; DOI: 10.1186/s13059-018-1517-y] [Citation(s) in RCA: 15] [Impact Index Per Article: 2.1] [Received: 01/12/2018] [Accepted: 08/22/2018] [Indexed: 01/08/2023]
Abstract
We introduce ABLE (Approximate Blockwise Likelihood Estimation), a novel simulation-based composite likelihood method that uses the blockwise site frequency spectrum to jointly infer past demography and recombination. ABLE is explicitly designed for a wide variety of data from unphased diploid genomes to genome-wide multi-locus data (for example, RADSeq) and can also accommodate arbitrarily large samples. We use simulations to demonstrate the accuracy of this method to infer complex histories of divergence and gene flow and reanalyze whole genome data from two species of orangutan. ABLE is available for download at https://github.com/champost/ABLE.
Affiliation(s)
- Champak R Beeravolu
  - Biology Department, The City College of New York, New York, 10031, NY, USA
  - Department of Evolutionary Biology and Environmental Studies, University of Zurich, Zurich, 8057, Switzerland
- Michael J Hickerson
  - Biology Department, The City College of New York, New York, 10031, NY, USA
  - The Graduate Center, The City University of New York, New York, 10016, NY, USA
  - Division of Invertebrate Zoology, American Museum of Natural History, New York, 10024, NY, USA
- Laurent A F Frantz
  - Paleogenomics and Bio-Archaeology Research Network, Research Laboratory for Archeology and History of Art, University of Oxford, Oxford, OX1 3QY, UK
  - School of Biological and Chemical Sciences, Queen Mary University of London, London, E1 4NS, UK
- Konrad Lohse
  - Institute of Evolutionary Biology, University of Edinburgh, King's Buildings, Edinburgh, EH9 3FL, UK
11
Grusea S, Rodríguez W, Pinchon D, Chikhi L, Boitard S, Mazet O. Coalescence times for three genes provide sufficient information to distinguish population structure from population size changes. J Math Biol 2018; 78:189-224. [DOI: 10.1007/s00285-018-1272-4] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.7] [Received: 01/12/2018] [Revised: 06/19/2018] [Indexed: 01/27/2023]
12
Simon A, Duranton M. Digest: Demographic inferences accounting for selection at linked sites. Evolution 2018; 72:1330-1332. [PMID: 29766494; DOI: 10.1111/evo.13504] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.6] [Received: 04/18/2018] [Accepted: 05/07/2018] [Indexed: 01/08/2023]
Abstract
Complex demography and selection at linked sites can generate spurious signatures of divergent selection. Unfortunately, many attempts at demographic inference consider overly simple models and neglect the effect of selection at linked sites. In this issue, Rougemont and Bernatchez (2018) applied an approximate Bayesian computation (ABC) framework that accounts for indirect selection to reveal a complex history of secondary contacts in Atlantic salmon (Salmo salar) that might explain a high rate of latitudinal clines in this species.
Affiliation(s)
- Alexis Simon
  - Institut des Sciences de l'Evolution-Montpellier, Université de Montpellier, CNRS-IRD-EPHE-UM, France
- Maud Duranton
  - Institut des Sciences de l'Evolution-Montpellier, Université de Montpellier, CNRS-IRD-EPHE-UM, France
13
Comparison of Single Genome and Allele Frequency Data Reveals Discordant Demographic Histories. G3: Genes, Genomes, Genetics 2017; 7:3605-3620. [PMID: 28893846; PMCID: PMC5677151; DOI: 10.1534/g3.117.300259] [Citation(s) in RCA: 47] [Impact Index Per Article: 5.9] [Indexed: 02/03/2023]
Abstract
Inference of demographic history from genetic data is a primary goal of population genetics of model and nonmodel organisms. Whole genome-based approaches such as the pairwise/multiple sequentially Markovian coalescent methods use genomic data from one to four individuals to infer the demographic history of an entire population, while site frequency spectrum (SFS)-based methods use the distribution of allele frequencies in a sample to reconstruct the same historical events. Although both methods are extensively used in empirical studies and perform well on data simulated under simple models, there have been only limited comparisons of them in more complex and realistic settings. Here we use published demographic models based on data from three human populations (Yoruba, descendants of northwest-Europeans, and Han Chinese) as an empirical test case to study the behavior of both inference procedures. We find that several of the demographic histories inferred by the whole genome-based methods do not predict the genome-wide distribution of heterozygosity, nor do they predict the empirical SFS. However, using simulated data, we also find that the whole genome methods can reconstruct the complex demographic models inferred by SFS-based methods, suggesting that the discordant patterns of genetic variation are not attributable to a lack of statistical power, but may reflect unmodeled complexities in the underlying demography. More generally, our findings indicate that demographic inference from a small number of genomes, routine in genomic studies of nonmodel organisms, should be interpreted cautiously, as these models cannot recapitulate other summaries of the data.
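For readers unfamiliar with the site frequency spectrum mentioned in this abstract, the following minimal sketch shows how an unfolded SFS can be tabulated from a small haplotype matrix. The function name `site_frequency_spectrum`, the 0/1 coding (0 = ancestral, 1 = derived, which assumes the ancestral state is known), and the toy data are all hypothetical and not taken from the cited study.

```python
from collections import Counter


def site_frequency_spectrum(haplotypes):
    """Unfolded SFS for n haplotypes coded 0 (ancestral) / 1 (derived).

    Entry i - 1 of the result is the number of sites at which the derived
    allele is carried by exactly i haplotypes, for i = 1 .. n - 1.
    Sites where no haplotype or every haplotype carries the derived allele
    are not polymorphic in the sample and are left out.
    """
    n = len(haplotypes)
    n_sites = len(haplotypes[0])
    derived_counts = Counter(
        sum(hap[site] for hap in haplotypes) for site in range(n_sites)
    )
    return [derived_counts.get(i, 0) for i in range(1, n)]


# Toy data (an assumption for illustration): 4 haplotypes (rows), 6 sites (columns).
haplotypes = [
    [0, 1, 0, 1, 1, 0],
    [0, 1, 0, 0, 1, 0],
    [1, 1, 0, 0, 1, 0],
    [0, 0, 0, 0, 1, 1],
]

sfs = site_frequency_spectrum(haplotypes)
print(sfs)  # [3, 0, 1]: three singletons, no doubletons, one tripleton
```

SFS-based inference methods compare a vector like this with the spectrum expected under a candidate demographic model. When the ancestral state is unknown, a folded spectrum based on minor-allele counts is typically used instead.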