1
Liu JJ, Edge MD. Error rates in QST-FST comparisons depend on genetic architecture and estimation procedures. Genetics 2025; 229:iyaf034. [PMID: 40036848 PMCID: PMC12005246 DOI: 10.1093/genetics/iyaf034] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/28/2024] [Revised: 02/12/2025] [Accepted: 02/21/2025] [Indexed: 03/06/2025]
Abstract
Genetic and phenotypic variation among populations is one of the fundamental subjects of evolutionary genetics. One question that arises often in data on natural populations is whether differentiation among populations on a particular trait might be caused in part by natural selection. For the past several decades, researchers have used QST-FST approaches to compare the amount of trait differentiation among populations on one or more traits (measured by the statistic QST) with differentiation on genome-wide genetic variants (measured by FST). Theory says that under neutrality, FST and QST should be approximately equal in expectation, so QST values much larger than FST are consistent with local adaptation driving subpopulations' trait values apart, and QST values much smaller than FST are consistent with stabilizing selection on similar optima. At the same time, investigators have differed in their definitions of genome-wide FST (such as "ratio of averages" vs. "average of ratios" versions of FST) and in their definitions of the variance components in QST. Here, we show that these details matter. Different versions of FST and QST have different interpretations in terms of coalescence time, and comparing incompatible statistics can lead to elevated type I error rates, with some choices leading to type I error rates near one when the nominal rate is 5%. We conduct simulations under varying genetic architectures and forms of population structure and show how they affect the distribution of QST. When many loci influence the trait, our simulations support procedures grounded in a coalescent-based framework for neutral phenotypic differentiation.
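The distinction between "ratio of averages" and "average of ratios" versions of genome-wide FST can be illustrated with a minimal sketch. It assumes biallelic loci, equally sized subpopulations, known allele frequencies, and the simple per-locus definition FST = (HT - HS) / HT; the function names and the example frequencies are hypothetical and are not taken from the paper, whose estimators may differ.

```python
import numpy as np

def per_locus_components(p):
    """Return per-locus numerators (HT - HS) and denominators (HT).

    p has shape (n_loci, n_pops) and holds the frequency of one allele at
    each biallelic locus in each subpopulation (assumed equal in size).
    """
    p = np.asarray(p, dtype=float)
    h_s = np.mean(2 * p * (1 - p), axis=1)   # mean within-subpopulation heterozygosity
    p_bar = p.mean(axis=1)                   # pooled allele frequency
    h_t = 2 * p_bar * (1 - p_bar)            # total heterozygosity
    return h_t - h_s, h_t

def fst_ratio_of_averages(p):
    """Sum numerators and denominators over loci, then take one ratio."""
    num, den = per_locus_components(p)
    return num.sum() / den.sum()

def fst_average_of_ratios(p):
    """Compute FST at each polymorphic locus, then average the ratios."""
    num, den = per_locus_components(p)
    keep = den > 0                           # skip monomorphic loci
    return np.mean(num[keep] / den[keep])

# Hypothetical allele frequencies: 4 loci in 2 subpopulations.
freqs = np.array([
    [0.50, 0.50],
    [0.10, 0.30],
    [0.02, 0.00],
    [0.90, 0.60],
])

roa = fst_ratio_of_averages(freqs)
aor = fst_average_of_ratios(freqs)
print(f"ratio of averages: {roa:.4f}")
print(f"average of ratios: {aor:.4f}")
```

The two summaries weight loci differently: the ratio of averages is dominated by loci with high total heterozygosity, whereas the average of ratios gives every polymorphic locus equal weight, so the two values generally differ on the same data.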
Affiliation(s)
- Junjian J Liu
- Department of Quantitative and Computational Biology, University of Southern California, Los Angeles, CA 90089, USA
- Michael D Edge
- Department of Quantitative and Computational Biology, University of Southern California, Los Angeles, CA 90089, USA
2
Liu X, Ahsan Z, Rosenberg NA. Using mathematical constraints to explain narrow ranges for allele-sharing dissimilarities. bioRxiv 2024:2024.11.19.624404. [PMID: 39605376 PMCID: PMC11601660 DOI: 10.1101/2024.11.19.624404] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/29/2024]
Abstract
Allele-sharing dissimilarity (ASD) statistics are measures of genetic differentiation for pairs of individuals or populations. Given the allele-frequency distributions of two populations (possibly the same population), the expected value of an ASD statistic is computed by evaluating the expectation of the pairwise dissimilarity between two individuals drawn at random, each from its associated allele-frequency distribution. For each of two ASD statistics, which we term D1 and D2, we investigate the extent to which the expected ASD is constrained by allele frequencies in the two populations; in other words, how is the magnitude of the measure bounded as a function of the frequency of the most frequent allelic type? We first consider dissimilarity of a population with itself, obtaining bounds on expected ASD in terms of the frequency of the most frequent allelic type in the population. We then examine pairs of populations that might or might not possess the same most frequent allelic type. Across the unit interval for the frequency of the most frequent allelic type, the expected allele-sharing dissimilarity has a range that is more restricted than the [0, 1] interval. The mathematical constraints on expected ASD assist in explaining a pattern observed empirically in human populations, namely that when averaging across loci, allele-sharing dissimilarities between pairs of individuals often tend to vary only within a relatively narrow range.
Affiliation(s)
- Xiran Liu
- Institute for Computational and Mathematical Engineering, Stanford University, Stanford, CA 94305
- Zarif Ahsan
- Department of Biology, Stanford University, Stanford, CA 94305
3
Liu JJ, Edge MD. Error rates in QST-FST comparisons depend on genetic architecture and estimation procedures. bioRxiv 2024:2024.10.28.620737. [PMID: 39553965 PMCID: PMC11565820 DOI: 10.1101/2024.10.28.620737] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/19/2024]
Abstract
Genetic and phenotypic variation among populations is one of the fundamental subjects of evolutionary genetics. One question that arises often in data on natural populations is whether differentiation among populations on a particular trait might be caused in part by natural selection. For the past several decades, researchers have used QST-FST approaches to compare the amount of trait differentiation among populations on one or more traits (measured by the statistic QST) with differentiation on genome-wide genetic variants (measured by FST). Theory says that under neutrality, FST and QST should be approximately equal in expectation, so QST values much larger than FST are consistent with local adaptation driving subpopulations' trait values apart, and QST values much smaller than FST are consistent with stabilizing selection on similar optima. At the same time, investigators have differed in their definitions of genome-wide FST (such as "ratio of averages" vs. "average of ratios" versions of FST) and in their definitions of the variance components in QST. Here, we show that these details matter. Different versions of FST and QST have different interpretations in terms of coalescence time, and comparing incompatible statistics can lead to elevated type I error rates, with some choices leading to type I error rates near one when the nominal rate is 5%. We conduct simulations under varying genetic architectures and forms of population structure and show how they affect the distribution of QST. When many loci influence the trait, our simulations support procedures grounded in a coalescent-based framework for neutral phenotypic differentiation.
Affiliation(s)
- Junjian J. Liu
- Department of Quantitative and Computational Biology, University of Southern California
- Michael D. Edge
- Department of Quantitative and Computational Biology, University of Southern California
4
Harris KD, Greenbaum G. DORA: an interactive map for the visualization and analysis of ancient human DNA and associated data. Nucleic Acids Res 2024; 52:W54-W60. [PMID: 38742634 PMCID: PMC11223807 DOI: 10.1093/nar/gkae373] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/28/2024] [Revised: 04/17/2024] [Accepted: 04/25/2024] [Indexed: 05/16/2024]
Abstract
The ability to sequence ancient genomes has revolutionized the way we study evolutionary history by providing access to the most important aspect of evolution: time. Until recently, studying human demography, ecology, biology, and history using population genomic inference relied on contemporary genomic datasets. Over the past decade, the availability of human ancient DNA (aDNA) has increased rapidly, almost doubling every year, opening the way for spatiotemporal studies of ancient human populations. However, the multidimensionality of aDNA, with genotypes having temporal, spatial and genomic coordinates, and integrating multiple sources of data, poses a challenge for developing meta-analysis pipelines. To address this challenge, we developed a publicly available interactive tool, DORA, which integrates multiple data types, genomic and non-genomic, in a unified interface. This web-based tool enables browsing sample metadata alongside additional layers of information, such as population structure, climatic data, and unpublished samples. Users can perform analyses on genotypes of these samples, or export sample subsets for external analyses. DORA integrates analyses and visualizations in a single intuitive interface, resolving the technical issues of combining datasets from different sources and formats, and allowing researchers to focus on the scientific questions that can be addressed through analysis of aDNA datasets.
Affiliation(s)
- Keith D Harris
- Department of Ecology, Evolution and Behavior, The Hebrew University of Jerusalem, Givat Ram, 9190401 Jerusalem, Israel
- Gili Greenbaum
- Department of Ecology, Evolution and Behavior, The Hebrew University of Jerusalem, Givat Ram, 9190401 Jerusalem, Israel
5
Ye Z, Pfrender ME, Lynch M. Evolutionary Genomics of Sister Species Differing in Effective Population Sizes and Recombination Rates. Genome Biol Evol 2023; 15:evad202. [PMID: 37946625 PMCID: PMC10664402 DOI: 10.1093/gbe/evad202] [Citation(s) in RCA: 5] [Impact Index Per Article: 2.5] [Received: 05/31/2023] [Revised: 10/16/2023] [Accepted: 11/02/2023] [Indexed: 11/12/2023]
Abstract
Studies of closely related species with known ecological differences provide exceptional opportunities for understanding the genetic mechanisms of evolution. In this study, we compared population-genomics data between Daphnia pulex and Daphnia pulicaria, two reproductively compatible sister species experiencing ecological speciation, the first largely confined to intermittent ponds and the second to permanent lakes in the same geographic region. Daphnia pulicaria has lower genome-wide nucleotide diversity, a smaller effective population size, a higher incidence of private alleles, and substantially more linkage disequilibrium than D. pulex. Positively selected genes in D. pulicaria are enriched in potentially aging-related categories such as cellular homeostasis, which may explain the extended life span in D. pulicaria. We also found that opsin-related genes, which may mediate photoperiodic responses, are under different selection pressures in these two species. Genes involved in mitochondrial functions, ribosomes, and responses to environmental stimuli are found to be under positive selection in both species. Additionally, we found that the two species have similar average evolutionary rates at the DNA-sequence level, although approximately 160 genes have significantly different rates in the two lineages. Our results provide insights into the physiological traits that differ within this regionally sympatric sister-species pair that occupies unique microhabitats.
Affiliation(s)
- Zhiqiang Ye
- Hubei Key Laboratory of Genetic Regulation and Integrative Biology, School of Life Sciences, Central China Normal University, Wuhan, China
- Michael E Pfrender
- Department of Biological Sciences, University of Notre Dame, Notre Dame, Indiana, USA
- Michael Lynch
- Biodesign Center for Mechanisms of Evolution, Arizona State University, Tempe, Arizona, USA
6
Morrison ML, Rosenberg NA. Mathematical bounds on Shannon entropy given the abundance of the ith most abundant taxon. J Math Biol 2023; 87:76. [PMID: 37884812 PMCID: PMC10603011 DOI: 10.1007/s00285-023-01997-3] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 03/23/2023] [Revised: 08/21/2023] [Accepted: 09/14/2023] [Indexed: 10/28/2023]
Abstract
The measurement of diversity is a central component of studies in ecology and evolution, with broad uses spanning multiple biological scales. Studies of diversity conducted in population genetics and ecology make use of analogous concepts and even employ equivalent mathematical formulas. For the Shannon entropy statistic, recent developments in the mathematics of diversity in population genetics have produced mathematical constraints on the statistic in relation to the frequency of the most frequent allele. These results have characterized the ways in which standard measures depend on the highest-frequency class in a discrete probability distribution. Here, we extend mathematical constraints on the Shannon entropy in relation to entries in specific positions in a vector of species abundances, listed in decreasing order. We illustrate the new mathematical results using abundance data from examples involving coral reefs and sponge microbiomes. The new results update the understanding of the relationship of a standard measure to the abundance vectors from which it is calculated, potentially contributing to improved interpretation of numerical measurements of biodiversity.
Affiliation(s)
- Maike L Morrison
- Department of Biology, Stanford University, Stanford, CA, 94305, USA
- Noah A Rosenberg
- Department of Biology, Stanford University, Stanford, CA, 94305, USA
7
Morrison ML, Alcala N, Rosenberg NA. FSTruct: An FST-based tool for measuring ancestry variation in inference of population structure. Mol Ecol Resour 2022; 22:2614-2626. [PMID: 35596736 PMCID: PMC9544611 DOI: 10.1111/1755-0998.13647] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/01/2021] [Revised: 03/09/2022] [Accepted: 05/13/2022] [Indexed: 11/30/2022]
Abstract
In model-based inference of population structure from individual-level genetic data, individuals are assigned membership coefficients in a series of statistical clusters generated by clustering algorithms. Distinct patterns of variability in membership coefficients can be produced for different groups of individuals, for example, representing different predefined populations, sampling sites or time periods. Such variability can be difficult to capture in a single numerical value; membership coefficient vectors are multivariate and potentially incommensurable across predefined groups, as the number of clusters over which individuals are distributed can vary among groups of interest. Further, two groups might share few clusters in common, so that membership coefficient vectors are concentrated on different clusters. We introduce a method for measuring the variability of membership coefficients of individuals in a predefined group, making use of an analogy between variability across individuals in membership coefficient vectors and variation across populations in allele frequency vectors. We show that in a model in which membership coefficient vectors in a population follow a Dirichlet distribution, the measure increases linearly with a parameter describing the variance of a specified component of the membership vector and does not depend on its mean. We apply the approach, which makes use of a normalized FST statistic, to data on inferred population structure in three example scenarios. We also introduce a bootstrap test for equivalence of two or more predefined groups in their level of membership coefficient variability. Our methods are implemented in the R package FSTruct.
Affiliation(s)
- Nicolas Alcala
- Rare Cancers Genomics Team (RCG), Genomic Epidemiology Branch (GEM), International Agency for Research on Cancer/World Health Organisation (IARC/WHO), Lyon, France
8
Abstract
The ways in which genetic variation is distributed within and among populations is a key determinant of the evolutionary features of a species. However, most comprehensive studies of these features have been restricted to studies of subdivision in settings known to have been driven by local adaptation, leaving our understanding of the natural dispersion of allelic variation less than ideal. Here, we present a geographic population-genomic analysis of 10 populations of the freshwater microcrustacean Daphnia pulex, an emerging model system in evolutionary genomics. These populations exhibit a pattern of moderate isolation-by-distance, with an average migration rate of 0.6 individuals per generation, and average effective population sizes of ∼650,000 individuals. Most populations contain numerous private alleles, and genomic scans highlight the presence of islands of excessively high population subdivision for more common alleles. A large fraction of such islands of population divergence likely reflect historical neutral changes, including rare stochastic migration and hybridization events. The data do point to local adaptive divergence, although the precise nature of the relevant variation is diffuse and cannot be associated with particular loci, despite the very large sample sizes involved in this study. In contrast, an analysis of between-species divergence highlights positive selection operating on a large set of genes with functions nearly nonoverlapping with those involved in local adaptation, in particular ribosome structure, mitochondrial bioenergetics, light reception and response, detoxification, and gene regulation. These results set the stage for using D. pulex as a model for understanding the relationship between molecular and cellular evolution in the context of natural environments.
Affiliation(s)
- Takahiro Maruki
- Biodesign Center for Mechanisms of Evolution, Arizona State University, Tempe, AZ 85287, USA
- Zhiqiang Ye
- Biodesign Center for Mechanisms of Evolution, Arizona State University, Tempe, AZ 85287, USA
- Michael Lynch
- Biodesign Center for Mechanisms of Evolution, Arizona State University, Tempe, AZ 85287, USA
9
Alcala N, Rosenberg NA. Mathematical constraints on FST: multiallelic markers in arbitrarily many populations. Philos Trans R Soc Lond B Biol Sci 2022; 377:20200414. [PMID: 35430885 PMCID: PMC9014193 DOI: 10.1098/rstb.2020.0414] [Citation(s) in RCA: 6] [Impact Index Per Article: 2.0] [Received: 07/22/2021] [Accepted: 10/23/2021] [Indexed: 11/12/2022]
Abstract
Interpretations of values of the FST measure of genetic differentiation rely on an understanding of its mathematical constraints. Previously, it has been shown that FST values computed from a biallelic locus in a set of multiple populations and FST values computed from a multiallelic locus in a pair of populations are mathematically constrained as a function of the frequency of the allele that is most frequent across populations. We generalize from these cases to report here the mathematical constraint on FST given the frequency M of the most frequent allele at a multiallelic locus in a set of multiple populations. Using coalescent simulations of an island model of migration with an infinitely-many-alleles mutation model, we argue that the joint distribution of FST and M helps in disentangling the separate influences of mutation and migration on FST. Finally, we show that our results explain a puzzling pattern of microsatellite differentiation: the lower FST in an interspecific comparison between humans and chimpanzees than in the comparison of chimpanzee populations. We discuss the implications of our results for the use of FST. This article is part of the theme issue 'Celebrating 50 years since Lewontin's apportionment of human diversity'.
Affiliation(s)
- Nicolas Alcala
- Rare Cancers Genomics Team (RCG), Genetic Epidemiology Branch (GEM), International Agency for Research on Cancer/World Health Organization, Lyon 69008, France
- Noah A. Rosenberg
- Department of Biology, Stanford University, Stanford, CA 94305-5020, USA
10
Sentinella AT, Moles AT, Bragg JG, Rossetto M, Sherwin WB. Detecting steps in spatial genetic data: Which diversity measures are best? PLoS One 2022; 17:e0265110. [PMID: 35287164 PMCID: PMC8920294 DOI: 10.1371/journal.pone.0265110] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/08/2021] [Accepted: 02/23/2022] [Indexed: 12/05/2022]
Abstract
Accurately detecting sudden changes, or steps, in genetic diversity across landscapes is important for locating barriers to gene flow, identifying selectively important loci, and defining management units. However, there are many metrics that researchers could use to detect steps and little information on which might be the most robust. Our study aimed to determine the best measure/s for genetic step detection along linear gradients using biallelic single nucleotide polymorphism (SNP) data. We tested the ability to differentiate between linear and step-like gradients in genetic diversity, using a range of diversity measures derived from the q-profile, including allelic richness, Shannon Information, GST, and Jost-D, as well as Bray-Curtis dissimilarity. To determine the properties of each measure, we repeated simulations of different intensities of step and allele proportion ranges, with varying genome sample size, number of loci, and number of localities. We found that alpha diversity (within-locality) based measures were ineffective at detecting steps. Further, allelic richness-based beta (between-locality) measures (e.g., Jaccard and Sørensen dissimilarity) were not reliable for detecting steps, but instead detected departures from fixation. The beta diversity measures best able to detect steps were: Shannon Information based measures, GST based measures, a Jost-D related measure, and Bray-Curtis dissimilarity. No one measure was best overall, with a trade-off between those measures with high step detection sensitivity (GST and Bray-Curtis) and those that minimised false positives (a variant of Shannon Information). Therefore, when detecting steps, we recommend understanding the differences between measures and using a combination of approaches.
Affiliation(s)
- Alexander T. Sentinella
- Evolution & Ecology Research Centre, School of Biological, Earth and Environmental Sciences, UNSW Sydney, Sydney, NSW, Australia
- Angela T. Moles
- Evolution & Ecology Research Centre, School of Biological, Earth and Environmental Sciences, UNSW Sydney, Sydney, NSW, Australia
- Jason G. Bragg
- Evolution & Ecology Research Centre, School of Biological, Earth and Environmental Sciences, UNSW Sydney, Sydney, NSW, Australia
- Research Centre for Ecosystem Resilience, Australian Institute of Botanical Science, The Royal Botanic Garden Sydney, Sydney, NSW, Australia
- Maurizio Rossetto
- Research Centre for Ecosystem Resilience, Australian Institute of Botanical Science, The Royal Botanic Garden Sydney, Sydney, NSW, Australia
- William B. Sherwin
- Evolution & Ecology Research Centre, School of Biological, Earth and Environmental Sciences, UNSW Sydney, Sydney, NSW, Australia
11
Gay L, Dhinaut J, Jullien M, Vitalis R, Navascués M, Ranwez V, Ronfort J. Evolution of flowering time in a selfing annual plant: Roles of adaptation and genetic drift. Ecol Evol 2022; 12:e8555. [PMID: 35127051 PMCID: PMC8794724 DOI: 10.1002/ece3.8555] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 12/07/2021] [Accepted: 12/10/2021] [Indexed: 11/10/2022]
Abstract
Resurrection studies are a useful tool to measure how phenotypic traits have changed in populations through time. If these trait modifications correlate with the environmental changes that occurred during the time period, it suggests that the phenotypic changes could be a response to selection. Selfing, through its reduction of effective size, could challenge the ability of a population to adapt to environmental changes. Here, we used a resurrection study to test for adaptation in a selfing population of Medicago truncatula, by comparing the genetic composition and flowering times across 22 generations. We found evidence for evolution toward earlier flowering times by about two days and a peculiar genetic structure, typical of highly selfing populations, where some multilocus genotypes (MLGs) are persistent through time. We used the change in frequency of the MLGs through time as a multilocus fitness measure and built a selection gradient that suggests evolution toward earlier flowering times. Yet, a simulation model revealed that the observed change in flowering time could be explained by drift alone, provided the effective size of the population is small enough (<150). These analyses suffer from the difficulty to estimate the effective size in a highly selfing population, where effective recombination is severely reduced.
Affiliation(s)
- Laurène Gay
- CIRAD, INRAE, Institut Agro, UMR AGAP Institut, Univ Montpellier, Montpellier, France
- Julien Dhinaut
- CIRAD, INRAE, Institut Agro, UMR AGAP Institut, Univ Montpellier, Montpellier, France
- Present address: Evolutionary Biology and Ecology of Algae, UPMC University of Paris VI, UC, UACH, UMI 3614, CNRS, Sorbonne Universités, Roscoff, France
- Margaux Jullien
- CIRAD, INRAE, Institut Agro, UMR AGAP Institut, Univ Montpellier, Montpellier, France
- Present address: INRA, Univ. Paris‐Sud, CNRS, AgroParisTech, GQE – Le Moulon, Université Paris‐Saclay, Gif‐sur‐Yvette, France
- Renaud Vitalis
- CIRAD, INRAE, Institut Agro, IRD, CBGP, Univ Montpellier, Montpellier, France
- Vincent Ranwez
- CIRAD, INRAE, Institut Agro, UMR AGAP Institut, Univ Montpellier, Montpellier, France
- Joëlle Ronfort
- CIRAD, INRAE, Institut Agro, UMR AGAP Institut, Univ Montpellier, Montpellier, France
12
Bertram J. Allele frequency divergence reveals ubiquitous influence of positive selection in Drosophila. PLoS Genet 2021; 17:e1009833. [PMID: 34591854 PMCID: PMC8509871 DOI: 10.1371/journal.pgen.1009833] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.5] [Received: 07/13/2021] [Revised: 10/12/2021] [Accepted: 09/22/2021] [Indexed: 12/04/2022]
Abstract
Resolving the role of natural selection is a basic objective of evolutionary biology. It is generally difficult to detect the influence of selection because ubiquitous non-selective stochastic change in allele frequencies (genetic drift) degrades evidence of selection. As a result, selection scans typically only identify genomic regions that have undergone episodes of intense selection. Yet it seems likely such episodes are the exception; the norm is more likely to involve subtle, concurrent selective changes at a large number of loci. We develop a new theoretical approach that uncovers a previously undocumented genome-wide signature of selection in the collective divergence of allele frequencies over time. Applying our approach to temporally resolved allele frequency measurements from laboratory and wild Drosophila populations, we quantify the selective contribution to allele frequency divergence and find that selection has substantial effects on much of the genome. We further quantify the magnitude of the total selection coefficient (a measure of the combined effects of direct and linked selection) at a typical polymorphic locus, and find this to be large (of order 1%) even though most mutations are not directly under selection. We find that selective allele frequency divergence is substantially elevated at intermediate allele frequencies, which we argue is most parsimoniously explained by positive, not negative, selection. Thus, in these populations most mutations are far from evolving neutrally in the short term (tens of generations), including mutations with neutral fitness effects, and the result cannot be explained simply as an ongoing purging of deleterious mutations.
Affiliation(s)
- Jason Bertram
- Environmental Resilience Institute, Indiana University, Bloomington, Indiana, United States of America
- Department of Biology, Indiana University, Bloomington, Indiana, United States of America
13
Boca SM, Huang L, Rosenberg NA. On the heterozygosity of an admixed population. J Math Biol 2020; 81:1217-1250. [PMID: 33034736 DOI: 10.1007/s00285-020-01531-9] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.6] [Received: 10/11/2019] [Revised: 08/04/2020] [Indexed: 01/18/2023]
Abstract
In this study, we consider admixed populations through their expected heterozygosity, a measure of genetic diversity. A population is termed admixed if its members possess recent ancestry from two or more separate sources. As a result of the fusion of source populations with different genetic variants, admixed populations can exhibit high levels of genetic diversity, reflecting contributions of their multiple ancestral groups. For a model of an admixed population derived from K source populations, we obtain a relationship between its heterozygosity and its proportions of admixture from the various source populations. We show that the heterozygosity of the admixed population is at least as great as that of the least heterozygous source population, and that it potentially exceeds the heterozygosities of all of the source populations. The admixture proportions that maximize the heterozygosity possible for an admixed population formed from a specified set of source populations are also obtained under specific conditions. We examine the special case of [Formula: see text] source populations in detail, characterizing the maximal admixture in terms of the heterozygosities of the two source populations and the value of [Formula: see text] between them. In this case, the heterozygosity of the admixed population exceeds the maximal heterozygosity of the source groups if the divergence between them, measured by [Formula: see text], is large enough, namely above a certain bound that is a function of the heterozygosities of the source groups. We present applications to simulated data as well as to data from human admixture scenarios, providing results useful for interpreting the properties of genetic variability in admixed populations.
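The claim that an admixed population's heterozygosity is at least that of the least heterozygous source, and can exceed all sources, can be checked numerically with a short sketch. It assumes a single locus and that the admixed allele frequencies are the admixture-weighted average of the source frequencies; this model, the function names, and the example frequencies are illustrative assumptions, not details given in the abstract.

```python
import numpy as np

def heterozygosity(freqs):
    """Expected heterozygosity 1 - sum(p_i^2) for one allele-frequency vector."""
    freqs = np.asarray(freqs, dtype=float)
    return 1.0 - np.sum(freqs ** 2)

def admixed_heterozygosity(source_freqs, admixture):
    """Heterozygosity of a population mixing K sources.

    source_freqs: shape (K, n_alleles), each row sums to 1.
    admixture:    shape (K,), nonnegative proportions summing to 1.
    Assumes admixed frequencies are the weighted average of the sources.
    """
    source_freqs = np.asarray(source_freqs, dtype=float)
    admixture = np.asarray(admixture, dtype=float)
    mixed = admixture @ source_freqs
    return heterozygosity(mixed)

# Hypothetical example: two sources with very different allele frequencies.
sources = np.array([
    [0.9, 0.1],
    [0.2, 0.8],
])
h_sources = [heterozygosity(row) for row in sources]
h_admixed = admixed_heterozygosity(sources, [0.5, 0.5])

print("source heterozygosities:", h_sources)
print("admixed heterozygosity: ", h_admixed)

# Scan admixture proportions to check the lower bound numerically.
grid = np.linspace(0.0, 1.0, 101)
h_grid = [admixed_heterozygosity(sources, [a, 1.0 - a]) for a in grid]
print("minimum over grid:", min(h_grid))
```

In this example the equal mixture is more heterozygous than either source because the sources are strongly differentiated; with similar sources the admixed value would instead fall between the two source values.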
Affiliation(s)
- Simina M Boca
- Department of Oncology, Department of Biostatistics, Bioinformatics and Biomathematics, Innovation Center for Biomedical Informatics, Georgetown University Medical Center, Washington, DC, 20007, USA
- Lucy Huang
- Bioinformatics Graduate Program, University of Michigan, Ann Arbor, MI, 48109, USA
- Noah A Rosenberg
- Department of Biology, Stanford University, Stanford, CA, 94305, USA
14
Lefèvre F, Gallais A. Partitioning heterozygosity in subdivided populations: Some misuses of Nei's decomposition and an alternative probabilistic approach. Mol Ecol 2020; 29:2957-2962. [PMID: 32594582 DOI: 10.1111/mec.15527] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.6] [Received: 12/10/2019] [Revised: 06/12/2020] [Accepted: 06/12/2020] [Indexed: 01/05/2023]
Abstract
Nei's decomposition of total expected heterozygosity in subdivided populations into within- and between-subpopulation components, HS and DST, respectively, is a classical tool in the conservation and management of genetic resources. Reviewing why this is not a decomposition into independent terms of within- and between-subpopulation gene diversity, we illustrate how this approach can be misleading because it overemphasizes the within-subpopulation component compared to Jost's nonadditive decomposition based on gene diversity indices. Using probabilistic partitioning of the total expected heterozygosity into independent within- and between-subpopulation contributions, we show that the contribution of the within-subpopulation expected heterozygosity to the total expected heterozygosity is not HS, as suggested by Nei's decomposition, but HS/s, with s being the number of subpopulations. Finally, we compare three possible approaches of decomposing total heterozygosity in subdivided populations (i.e., Nei's decomposition, Jost's approach, and probabilistic partitioning) with regard to independence between terms and sensitivity to unequal subpopulation sizes. For the conservation and management of genetic resources, we recommend using probabilistic partitioning and Jost's differentiation parameter rather than Nei's decomposition.
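The contrast between Nei's decomposition and the HS/s contribution mentioned in this abstract can be sketched numerically. The sketch below is one reading of the abstract, not the authors' code: it assumes s equally weighted subpopulations and two genes sampled with replacement from the total population, so that both genes come from the same subpopulation with probability 1/s. The function names and example frequencies are hypothetical.

```python
from itertools import permutations


def nei_components(subpops):
    """Return (HS, HT, DST) for equally weighted subpopulations at one locus."""
    s = len(subpops)
    n_alleles = len(subpops[0])
    # HS: mean within-subpopulation expected heterozygosity.
    h_s = sum(1.0 - sum(p * p for p in pop) for pop in subpops) / s
    # HT: expected heterozygosity of the pooled (mean) allele frequencies.
    mean = [sum(pop[i] for pop in subpops) / s for i in range(n_alleles)]
    h_t = 1.0 - sum(p * p for p in mean)
    return h_s, h_t, h_t - h_s


def probabilistic_components(subpops):
    """Split HT by whether two sampled genes share a subpopulation (needs s >= 2)."""
    s = len(subpops)
    h_s, _, _ = nei_components(subpops)
    # Mean probability that genes from two different subpopulations differ.
    pairs = list(permutations(subpops, 2))
    h_between = sum(
        1.0 - sum(p * q for p, q in zip(a, b)) for a, b in pairs
    ) / len(pairs)
    within = h_s / s                      # same subpopulation, probability 1/s
    between = (1.0 - 1.0 / s) * h_between # different subpopulations
    return within, between


# Hypothetical example: two subpopulations, one biallelic locus.
pops = [[0.8, 0.2], [0.4, 0.6]]
h_s, h_t, d_st = nei_components(pops)
within, between = probabilistic_components(pops)
print(h_s, h_t, d_st, within, between)
```

With these frequencies, Nei's within-subpopulation term HS is 0.40 out of a total of 0.48, whereas the same-subpopulation contribution under the sampling argument is HS/s = 0.20, and the two probabilistic terms still sum to HT.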
Affiliation(s)
- François Lefèvre
- Ecologie des Forêts Méditerranéennes, URFM, INRAE, Avignon, France
- André Gallais
- UMR Génétique Quantitative et Evolution, INRAE-UPS-CNRS, Gif-sur-Yvette, France
|
15
|
Mamoozadeh NR, Graves JE, McDowell JR. Genome-wide SNPs resolve spatiotemporal patterns of connectivity within striped marlin (Kajikia audax), a broadly distributed and highly migratory pelagic species. Evol Appl 2020; 13:677-698. [PMID: 32211060 PMCID: PMC7086058 DOI: 10.1111/eva.12892] [Citation(s) in RCA: 12] [Impact Index Per Article: 2.4] [Received: 06/21/2019] [Revised: 09/23/2019] [Accepted: 10/16/2019] [Indexed: 01/04/2023] Open
Abstract
Genomic methodologies offer unprecedented opportunities for statistically robust studies of species broadly distributed in environments conducive to high gene flow, providing valuable information for wildlife conservation and management. Here, we sequence restriction site-associated DNA to characterize genome-wide single nucleotide polymorphisms (SNPs) in a broadly distributed and highly migratory large pelagic fish, striped marlin (Kajikia audax). Assessment of over 4,000 SNPs resolved spatiotemporal patterns of genetic connectivity throughout the species range in the Pacific and, for the first time, Indian oceans. Individual-based cluster analyses identified six genetically distinct populations corresponding with the western Indian, eastern Indian, western South Pacific, and eastern central Pacific oceans, as well as two populations in the North Pacific Ocean (FST = 0.0137-0.0819). FST outlier analyses identified a subset of SNPs (n = 59) putatively under the influence of natural selection and capable of resolving populations separated by comparatively high degrees of genetic differentiation. Temporal collections available for some regions demonstrated the stability of allele frequencies over three to five generations of striped marlin. Relative migration rates reflected lower levels of genetic connectivity between Indian Ocean populations (mR ≤ 0.37) compared with most populations in the Pacific Ocean (mR ≥ 0.57) and highlight the importance of the western South Pacific in facilitating gene flow between ocean basins. Collectively, our results provide novel insights into rangewide population structure for striped marlin and highlight substantial inconsistencies between genetically distinct populations and stocks currently recognized for fisheries management.
More broadly, we demonstrate that species capable of long-distance dispersal in environments lacking obvious physical barriers to movement can display substantial population subdivision that persists over multiple generations and that may be facilitated by both neutral and adaptive processes. Importantly, surveys of genome-wide markers enable inference of population-level relationships using sample sizes practical for large pelagic fishes of conservation concern.
Affiliation(s)
- Nadya R. Mamoozadeh
- Department of Fisheries Science, Virginia Institute of Marine Science, William & Mary, Gloucester Point, Virginia
- John E. Graves
- Department of Fisheries Science, Virginia Institute of Marine Science, William & Mary, Gloucester Point, Virginia
- Jan R. McDowell
- Department of Fisheries Science, Virginia Institute of Marine Science, William & Mary, Gloucester Point, Virginia
|
16
|
Kang JTL, Rosenberg NA. Mathematical Properties of Linkage Disequilibrium Statistics Defined by Normalization of the Coefficient D = pAB - pApB. Hum Hered 2020; 84:127-143. [PMID: 32045910 DOI: 10.1159/000504171] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.4] [Received: 07/16/2019] [Accepted: 10/10/2019] [Indexed: 11/19/2022] Open
Abstract
BACKGROUND: Many statistics for measuring linkage disequilibrium (LD) take the form of a normalization of the LD coefficient D. Different normalizations produce statistics with different ranges, interpretations, and arguments favoring their use. METHODS: Here, to compare the mathematical properties of these normalizations, we consider 5 of these normalized statistics, describing their upper bounds, the mean values of their maxima over the set of possible allele frequency pairs, and the size of the allele frequency regions accessible given specified values of the statistics. RESULTS: We produce detailed characterizations of these properties for the statistics d and ρ, analogous to computations previously performed for r². We examine the relationships among the statistics, uncovering conditions under which some of them have close connections. CONCLUSION: The results contribute insight into LD measurement, particularly the understanding of differences in the features of different LD measures when computed on the same data.
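To make the idea of normalizing D concrete, here is a minimal sketch that computes D = pAB - pApB together with two widely used normalizations, D' and r². These two are chosen only because their definitions are standard; the sketch does not attempt the statistics d and ρ characterized in the paper, and the function name and example frequencies are hypothetical.

```python
def ld_statistics(p_ab, p_a, p_b):
    """Return (D, D', r^2) for two biallelic loci.

    p_ab: frequency of the haplotype carrying allele A and allele B.
    p_a, p_b: frequencies of allele A and allele B (both strictly in (0, 1)).
    """
    if not (0.0 < p_a < 1.0 and 0.0 < p_b < 1.0):
        raise ValueError("both loci must be polymorphic")

    d = p_ab - p_a * p_b

    # D' divides D by the largest magnitude it can reach given the
    # allele frequencies; the bound depends on the sign of D.
    if d >= 0:
        d_max = min(p_a * (1.0 - p_b), (1.0 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1.0 - p_a) * (1.0 - p_b))
    d_prime = d / d_max  # signed; some authors report the absolute value

    # r^2 divides D^2 by the product of the two allelic variances.
    r2 = d * d / (p_a * (1.0 - p_a) * p_b * (1.0 - p_b))
    return d, d_prime, r2


# Hypothetical example frequencies.
d, d_prime, r2 = ld_statistics(p_ab=0.4, p_a=0.5, p_b=0.5)
print(d, d_prime, r2)
```

The same value of D gives different normalized values (here D' = 0.6 and r² = 0.36), which illustrates why different normalizations have different ranges and interpretations.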
Affiliation(s)
- Jonathan T L Kang
- Department of Biology, Stanford University, Stanford, California, USA
- Noah A Rosenberg
- Department of Biology, Stanford University, Stanford, California, USA
|
17
|
Abstract
Climate shifts are key drivers of ecosystem change. Despite the critical importance of Antarctica and the Southern Ocean for global climate, the extent of climate-driven ecological change in this region remains controversial. In particular, the biological effects of changing sea ice conditions are poorly understood. We hypothesize that rapid postglacial reductions in sea ice drove biological shifts across multiple widespread Southern Ocean species. We test for demographic shifts driven by climate events over recent millennia by analyzing population genomic datasets spanning 3 penguin genera (Eudyptes, Pygoscelis, and Aptenodytes). Demographic analyses for multiple species (macaroni/royal, eastern rockhopper, Adélie, gentoo, king, and emperor) currently inhabiting southern coastlines affected by heavy sea ice conditions during the Last Glacial Maximum (LGM) yielded genetic signatures of near-simultaneous population expansions associated with postglacial warming. Populations of the ice-adapted emperor penguin are inferred to have expanded slightly earlier than those of species requiring ice-free terrain. These concerted high-latitude expansion events contrast with relatively stable or declining demographic histories inferred for 4 penguin species (northern rockhopper, western rockhopper, Fiordland crested, and Snares crested) that apparently persisted throughout the LGM in ice-free habitats. Limited genetic structure detected in all ice-affected species across the vast Southern Ocean may reflect both rapid postglacial colonization of subantarctic and Antarctic shores, in addition to recent genetic exchange among populations. Together, these analyses highlight dramatic, ecosystem-wide responses to past Southern Ocean climate change and suggest potential for further shifts as warming continues.
|
18
|
Mehta RS, Feder AF, Boca SM, Rosenberg NA. The Relationship Between Haplotype-Based FST and Haplotype Length. Genetics 2019; 213:281-295. [PMID: 31285255 PMCID: PMC6727796 DOI: 10.1534/genetics.119.302430] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.7] [Received: 03/06/2019] [Accepted: 06/29/2019] [Indexed: 11/18/2022] Open
Abstract
The population-genetic statistic [Formula: see text] is used widely to describe allele frequency distributions in subdivided populations. The increasing availability of DNA sequence data has recently enabled computations of [Formula: see text] from sequence-based "haplotype loci." At the same time, theoretical work has revealed that [Formula: see text] has a strong dependence on the underlying genetic diversity of a locus from which it is computed, with high diversity constraining values of [Formula: see text] to be low. In the case of haplotype loci, for which two haplotypes that are distinct over a specified length along a chromosome are treated as distinct alleles, genetic diversity is influenced by haplotype length: longer haplotype loci have the potential for greater genetic diversity. Here, we study the dependence of [Formula: see text] on haplotype length. Using a model in which a haplotype locus is sequentially incremented by one biallelic locus at a time, we show that increasing the length of the haplotype locus can either increase or decrease the value of [Formula: see text], and usually decreases it. We compute [Formula: see text] on haplotype loci in human populations, finding a close correspondence between the observed values and our theoretical predictions. We conclude that effects of haplotype length are valuable to consider when interpreting [Formula: see text] calculated on haplotypic data.
Affiliation(s)
- Rohan S Mehta
- Department of Biology, Stanford University, Stanford, California 94305
- Alison F Feder
- Department of Biology, Stanford University, Stanford, California 94305
- Department of Integrative Biology, University of California, Berkeley, California 94720
- Simina M Boca
- Innovation Center for Biomedical Informatics, Georgetown University, Washington, DC 20007
- Noah A Rosenberg
- Department of Biology, Stanford University, Stanford, California 94305
|
19
|
Alcala N, Rosenberg NA. GST', Jost's D, and FST are similarly constrained by allele frequencies: A mathematical, simulation, and empirical study. Mol Ecol 2019; 28:1624-1636. [PMID: 30589985 PMCID: PMC6821915 DOI: 10.1111/mec.15000] [Citation(s) in RCA: 14] [Impact Index Per Article: 2.3] [Received: 05/07/2018] [Revised: 12/11/2018] [Accepted: 12/17/2018] [Indexed: 01/15/2023]
Abstract
Statistics GST' and Jost's D have been proposed for replacing FST as measures of genetic differentiation. A principal argument in favour of these statistics is the independence of their maximal values with respect to the subpopulation heterozygosity HS, a property not shared by FST. Nevertheless, it has been unclear if these alternative differentiation measures are constrained by other aspects of the allele frequencies. Here, for biallelic markers, we study the mathematical properties of the maximal values of GST' and D, comparing them to those of FST. We show that GST' and D exhibit the same peculiar frequency-dependence phenomena as FST, including a maximal value as a function of the frequency of the most frequent allele that lies well below one. Although the functions describing GST', D, and FST in terms of the frequency of the most frequent allele are different, the allele frequencies that maximize them are identical. Moreover, we show using coalescent simulations that when taking into account the specific maximal values of the three statistics, their behaviours become similar across a large range of migration rates. We use our results to explain two empirical patterns: the similar values of the three statistics among North American wolves, and the low D values compared to GST' and FST in Atlantic salmon. The results suggest that the three statistics are often predictably similar, so that they can make quite similar contributions to data analysis. When they are not similar, the difference can be understood in relation to features of genetic diversity.
Affiliation(s)
- Nicolas Alcala
- Department of Biology, Stanford University, Stanford, California
- Noah A Rosenberg
- Department of Biology, Stanford University, Stanford, California
|
20
|
Brandt DYC, César J, Goudet J, Meyer D. The Effect of Balancing Selection on Population Differentiation: A Study with HLA Genes. G3 (Bethesda, Md.) 2018; 8:2805-2815. [PMID: 29950428 PMCID: PMC6071603 DOI: 10.1534/g3.118.200367] [Citation(s) in RCA: 26] [Impact Index Per Article: 3.7] [Received: 04/30/2018] [Accepted: 06/21/2018] [Indexed: 01/10/2023]
Abstract
Balancing selection is defined as a class of selective regimes that maintain polymorphism above what is expected under neutrality. Theory predicts that balancing selection reduces population differentiation, as measured by FST. However, balancing selection regimes in which different sets of alleles are maintained in different populations could increase population differentiation. To tackle the connection between balancing selection and population differentiation, we investigated population differentiation at the HLA genes, which constitute the most striking example of balancing selection in humans. We found that population differentiation of single nucleotide polymorphisms (SNPs) at the HLA genes is on average lower than that of SNPs in other genomic regions. We show that these results require using a computation that accounts for the dependence of FST on allele frequencies. However, in pairs of closely related populations, where genome-wide differentiation is low, differentiation at HLA is higher than in other genomic regions. Such increased population differentiation at HLA genes for recently diverged population pairs was reproduced in simulations of overdominant selection, as long as the fitness of the homozygotes differs between the diverging populations. The results give insight into a possible "divergent overdominance" mechanism for the nature of balancing selection on HLA genes across human populations.
Affiliation(s)
- Débora Y C Brandt
- Departamento de Genética e Biologia Evolutiva, Universidade de São Paulo, São Paulo, SP, Brazil
- Jônatas César
- Departamento de Genética e Biologia Evolutiva, Universidade de São Paulo, São Paulo, SP, Brazil
- Jérôme Goudet
- Department of Ecology and Evolution, University of Lausanne, Lausanne, Switzerland
- Swiss Institute of Bioinformatics, University of Lausanne, Lausanne, Switzerland
- Diogo Meyer
- Departamento de Genética e Biologia Evolutiva, Universidade de São Paulo, São Paulo, SP, Brazil
|