51
|
Thornton KR. Polygenic Adaptation to an Environmental Shift: Temporal Dynamics of Variation Under Gaussian Stabilizing Selection and Additive Effects on a Single Trait. Genetics 2019; 213:1513-1530. [PMID: 31653678 PMCID: PMC6893385 DOI: 10.1534/genetics.119.302662] [Citation(s) in RCA: 39] [Impact Index Per Article: 6.5] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/26/2019] [Accepted: 10/21/2019] [Indexed: 11/26/2022] Open
Abstract
Predictions about the effect of natural selection on patterns of linked neutral variation are largely based on models involving the rapid fixation of unconditionally beneficial mutations. However, when phenotypes adapt to a new optimum trait value, the strength of selection on individual mutations decreases as the population adapts. Here, I use explicit forward simulations of a single trait with additive-effect mutations adapting to an "optimum shift." Detectable "hitchhiking" patterns are only apparent if (i) the optimum shifts are large with respect to equilibrium variation for the trait, (ii) mutation rates to large-effect mutations are low, and (iii) large-effect mutations rapidly increase in frequency and eventually reach fixation, which typically occurs after the population reaches the new optimum. For the parameters simulated here, partial sweeps do not appreciably affect patterns of linked variation, even when the mutations are strongly selected. The contribution of new mutations vs. standing variation to fixation depends on the mutation rate affecting trait values. Given the fixation of a strongly selected variant, patterns of hitchhiking are similar on average for the two classes of sweeps because sweeps from standing variation involving large-effect mutations are rare when the optimum shifts. The distribution of effect sizes of new mutations has little effect on the time to reach the new optimum, but reducing the mutational variance increases the magnitude of hitchhiking patterns. In general, populations reach the new optimum prior to the completion of any sweeps, and the times to fixation are longer for this model than for standard models of directional selection. The long fixation times are due to a combination of declining selection pressures during adaptation and the possibility of interference among weakly selected sites for traits with high mutation rates.
Collapse
Affiliation(s)
- Kevin R Thornton
- Department of Ecology and Evolutionary Biology, University of California, Irvine, California 92697
| |
Collapse
|
52
|
Satta Y, Zheng W, Nishiyama KV, Iwasaki RL, Hayakawa T, Fujito NT, Takahata N. Two-dimensional site frequency spectrum for detecting, classifying and dating incomplete selective sweeps. Genes Genet Syst 2019; 94:283-300. [DOI: 10.1266/ggs.19-00012] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.8] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/23/2022] Open
Affiliation(s)
- Yoko Satta
- School of Advanced Sciences, SOKENDAI (The Graduate University for Advanced Studies)
| | - Wanjing Zheng
- School of Advanced Sciences, SOKENDAI (The Graduate University for Advanced Studies)
| | - Kumiko V. Nishiyama
- School of Advanced Sciences, SOKENDAI (The Graduate University for Advanced Studies)
| | - Risa L. Iwasaki
- School of Advanced Sciences, SOKENDAI (The Graduate University for Advanced Studies)
| | - Toshiyuki Hayakawa
- Graduate School of Systems Life Sciences and Faculty of Arts and Science, Kyushu University
| | - Naoko T. Fujito
- Institute for Human Genetics and Department of Epidemiology and Biostatistics, University of California
| | - Naoyuki Takahata
- School of Advanced Sciences, SOKENDAI (The Graduate University for Advanced Studies)
| |
Collapse
|
53
|
Huss E, Pfaffelhuber P. Genealogical distances under low levels of selection. Theor Popul Biol 2019; 131:2-11. [PMID: 31759974 DOI: 10.1016/j.tpb.2019.10.002] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/13/2018] [Revised: 08/30/2019] [Accepted: 10/02/2019] [Indexed: 10/25/2022]
Abstract
For a panmictic population of constant size evolving under neutrality, Kingman's coalescent describes the genealogy of a population sample in equilibrium. However, for genealogical trees under selection, not even expectations for most basic quantities like height and length of the resulting random tree are known. Here, we give an analytic expression for the distribution of the total tree length of a sample of size n under low levels of selection in a two-alleles model. We can prove that trees are shorter than under neutrality under genic selection and if the beneficial mutant has dominance h<1∕2, but longer for h>1∕2. The difference from neutrality is O(α2) for genic selection with selection intensity α and O(α) for other modes of dominance.
Collapse
Affiliation(s)
- Elisabeth Huss
- Abteilung für Mathematische Stochastik, Albert-Ludwigs University of Freiburg, Ernst-Zermelo-Straße 1, D - 79104 Freiburg, Germany
| | - Peter Pfaffelhuber
- Abteilung für Mathematische Stochastik, Albert-Ludwigs University of Freiburg, Ernst-Zermelo-Straße 1, D - 79104 Freiburg, Germany.
| |
Collapse
|
54
|
Adaptation in structured populations and fuzzy boundaries between hard and soft sweeps. PLoS Comput Biol 2019; 15:e1007426. [PMID: 31710623 PMCID: PMC6872172 DOI: 10.1371/journal.pcbi.1007426] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.7] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/23/2019] [Revised: 11/21/2019] [Accepted: 09/20/2019] [Indexed: 11/19/2022] Open
Abstract
Selective sweeps, the genetic footprint of positive selection, have been extensively studied in the past decades, with dozens of methods developed to identify swept regions. However, these methods suffer from both false positive and false negative reports, and the candidates identified with different methods are often inconsistent with each other. We propose that a biological cause of this problem can be population subdivision, and a technical cause can be incomplete, or inaccurate, modeling of the dynamic process associated with sweeps. Here we used simulations to show how these effects interact and potentially cause bias. In particular, we show that sweeps maybe misclassified as either hard or soft, when the true time stage of a sweep and that implied, or pre-supposed, by the model do not match. We call this "temporal misclassification". Similarly, "spatial misclassification (softening)" can occur when hard sweeps, which are imported by migration into a new subpopulation, are falsely identified as soft. This can easily happen in case of local adaptation, i.e. when the sweeping allele is not under positive selection in the new subpopulation, and the underlying model assumes panmixis instead of substructure. The claim that most sweeps in the evolutionary history of humans were soft, may have to be reconsidered in the light of these findings.
Collapse
|
55
|
Stern AJ, Wilton PR, Nielsen R. An approximate full-likelihood method for inferring selection and allele frequency trajectories from DNA sequence data. PLoS Genet 2019; 15:e1008384. [PMID: 31518343 PMCID: PMC6760815 DOI: 10.1371/journal.pgen.1008384] [Citation(s) in RCA: 90] [Impact Index Per Article: 15.0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/29/2019] [Revised: 09/25/2019] [Accepted: 08/26/2019] [Indexed: 12/24/2022] Open
Abstract
Most current methods for detecting natural selection from DNA sequence data are limited in that they are either based on summary statistics or a composite likelihood, and as a consequence, do not make full use of the information available in DNA sequence data. We here present a new importance sampling approach for approximating the full likelihood function for the selection coefficient. Our method CLUES treats the ancestral recombination graph (ARG) as a latent variable that is integrated out using previously published Markov Chain Monte Carlo (MCMC) methods. The method can be used for detecting selection, estimating selection coefficients, testing models of changes in the strength of selection, estimating the time of the start of a selective sweep, and for inferring the allele frequency trajectory of a selected or neutral allele. We perform extensive simulations to evaluate the method and show that it uniformly improves power to detect selection compared to current popular methods such as nSL and SDS, and can provide reliable inferences of allele frequency trajectories under many conditions. We also explore the potential of our method to detect extremely recent changes in the strength of selection. We use the method to infer the past allele frequency trajectory for a lactase persistence SNP (MCM6) in Europeans. We also infer the trajectory of a SNP (EDAR) in Han Chinese, finding evidence that this allele's age is much older than previously claimed. We also study a set of 11 pigmentation-associated variants. Several genes show evidence of strong selection particularly within the last 5,000 years, including ASIP, KITLG, and TYR. However, selection on OCA2/HERC2 seems to be much older and, in contrast to previous claims, we find no evidence of selection on TYRP1.
Collapse
Affiliation(s)
- Aaron J. Stern
- Graduate Group in Computation Biology, University of California, Berkeley, Berkeley, California, United States of America
| | - Peter R. Wilton
- Department of Integrative Biology, University of California, Berkeley, Berkeley, California, United States of America
| | - Rasmus Nielsen
- Department of Integrative Biology, University of California, Berkeley, Berkeley, California, United States of America
- Department of Statistics, University of California, Berkeley, Berkeley, California, United States of America
| |
Collapse
|
56
|
Flagel L, Brandvain Y, Schrider DR. The Unreasonable Effectiveness of Convolutional Neural Networks in Population Genetic Inference. Mol Biol Evol 2019; 36:220-238. [PMID: 30517664 PMCID: PMC6367976 DOI: 10.1093/molbev/msy224] [Citation(s) in RCA: 117] [Impact Index Per Article: 19.5] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/28/2022] Open
Abstract
Population-scale genomic data sets have given researchers incredible amounts of information from which to infer evolutionary histories. Concomitant with this flood of data, theoretical and methodological advances have sought to extract information from genomic sequences to infer demographic events such as population size changes and gene flow among closely related populations/species, construct recombination maps, and uncover loci underlying recent adaptation. To date, most methods make use of only one or a few summaries of the input sequences and therefore ignore potentially useful information encoded in the data. The most sophisticated of these approaches involve likelihood calculations, which require theoretical advances for each new problem, and often focus on a single aspect of the data (e.g., only allele frequency information) in the interest of mathematical and computational tractability. Directly interrogating the entirety of the input sequence data in a likelihood-free manner would thus offer a fruitful alternative. Here, we accomplish this by representing DNA sequence alignments as images and using a class of deep learning methods called convolutional neural networks (CNNs) to make population genetic inferences from these images. We apply CNNs to a number of evolutionary questions and find that they frequently match or exceed the accuracy of current methods. Importantly, we show that CNNs perform accurate evolutionary model selection and parameter estimation, even on problems that have not received detailed theoretical treatments. Thus, when applied to population genetic alignments, CNNs are capable of outperforming expert-derived statistical methods and offer a new path forward in cases where no likelihood approach exists.
Collapse
Affiliation(s)
- Lex Flagel
- Monsanto Company, Chesterfield, MO
- Department of Plant and Microbial Biology, University of Minnesota, St. Paul, MN
| | - Yaniv Brandvain
- Department of Plant and Microbial Biology, University of Minnesota, St. Paul, MN
| | - Daniel R Schrider
- Department of Genetics, University of North Carolina, Chapel Hill, NC
| |
Collapse
|
57
|
Abstract
Identifying genomic locations of natural selection from sequence data is an ongoing challenge in population genetics. Current methods utilizing information combined from several summary statistics typically assume no correlation of summary statistics regardless of the genomic location from which they are calculated. However, due to linkage disequilibrium, summary statistics calculated at nearby genomic positions are highly correlated. We introduce an approach termed Trendsetter that accounts for the similarity of statistics calculated from adjacent genomic regions through trend filtering, while reducing the effects of multicollinearity through regularization. Our penalized regression framework has high power to detect sweeps, is capable of classifying sweep regions as either hard or soft, and can be applied to other selection scenarios as well. We find that Trendsetter is robust to both extensive missing data and strong background selection, and has comparable power to similar current approaches. Moreover, the model learned by Trendsetter can be viewed as a set of curves modeling the spatial distribution of summary statistics in the genome. Application to human genomic data revealed positively selected regions previously discovered such as LCT in Europeans and EDAR in East Asians. We also identified a number of novel candidates and show that populations with greater relatedness share more sweep signals.
Collapse
Affiliation(s)
- Mehreen R Mughal
- Bioinformatics and Genomics at the Huck Institutes of the Life Sciences, Pennsylvania State University, University Park, PA
| | - Michael DeGiorgio
- Departments of Biology and Statistics, Pennsylvania State University,University Park, PA
- Institute for CyberScience, Pennsylvania State University, University Park, PA
| |
Collapse
|
58
|
Harris RB, Sackman A, Jensen JD. On the unfounded enthusiasm for soft selective sweeps II: Examining recent evidence from humans, flies, and viruses. PLoS Genet 2018; 14:e1007859. [PMID: 30592709 PMCID: PMC6336318 DOI: 10.1371/journal.pgen.1007859] [Citation(s) in RCA: 63] [Impact Index Per Article: 9.0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/30/2018] [Revised: 01/17/2019] [Accepted: 11/28/2018] [Indexed: 12/13/2022] Open
Abstract
Since the initial description of the genomic patterns expected under models of positive selection acting on standing genetic variation and on multiple beneficial mutations—so-called soft selective sweeps—researchers have sought to identify these patterns in natural population data. Indeed, over the past two years, large-scale data analyses have argued that soft sweeps are pervasive across organisms of very different effective population size and mutation rate—humans, Drosophila, and HIV. Yet, others have evaluated the relevance of these models to natural populations, as well as the identifiability of the models relative to other known population-level processes, arguing that soft sweeps are likely to be rare. Here, we look to reconcile these opposing results by carefully evaluating three recent studies and their underlying methodologies. Using population genetic theory, as well as extensive simulation, we find that all three examples are prone to extremely high false-positive rates, incorrectly identifying soft sweeps under both hard sweep and neutral models. Furthermore, we demonstrate that well-fit demographic histories combined with rare hard sweeps serve as the more parsimonious explanation. These findings represent a necessary response to the growing tendency of invoking parameter-heavy, assumption-laden models of pervasive positive selection, and neglecting best practices regarding the construction of proper demographic null models. A long-standing debate in evolutionary biology revolves around the role of selective vs. stochastic processes in driving molecular evolution and shaping genetic variation. With the advent of genomics, genome-wide polymorphism data have been utilized to characterize these processes, with a major interest in describing the fraction of genomic variation shaped by positive selection. These genomic scans were initially focused around a hard sweep model, in which selection acts upon rare, newly arising beneficial mutations. Recent years have seen the description of sweeps occurring from both standing and rapidly recurring beneficial mutations, collectively known as soft sweeps. However, common to both hard and soft sweeps is the difficulty in distinguishing these effects from neutral demographic patterns, and disentangling these processes has remained an important field of study within population genetics. Despite this, there is a recent and troubling tendency to neglect these demographic considerations, and to naively fit sweep models to genomic data. Recent realizations of such efforts have resulted in the claim that soft sweeps play a dominant role in shaping genomic variation and in driving adaptation across diverse branches of the tree of life. Here, we reanalyze these findings and demonstrate that a more careful consideration of neutral processes results in highly differing conclusions.
Collapse
Affiliation(s)
- Rebecca B. Harris
- School of Life Sciences, Arizona State University, Tempe, AZ, United States of America
| | - Andrew Sackman
- School of Life Sciences, Arizona State University, Tempe, AZ, United States of America
| | - Jeffrey D. Jensen
- School of Life Sciences, Arizona State University, Tempe, AZ, United States of America
- * E-mail:
| |
Collapse
|
59
|
Bourgeois Y, Stritt C, Walser JC, Gordon SP, Vogel JP, Roulin AC. Genome-wide scans of selection highlight the impact of biotic and abiotic constraints in natural populations of the model grass Brachypodium distachyon. THE PLANT JOURNAL : FOR CELL AND MOLECULAR BIOLOGY 2018; 96:438-451. [PMID: 30044522 DOI: 10.1111/tpj.14042] [Citation(s) in RCA: 17] [Impact Index Per Article: 2.4] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Received: 01/16/2018] [Revised: 06/20/2018] [Accepted: 07/17/2018] [Indexed: 06/08/2023]
Abstract
Grasses are essential plants for ecosystem functioning. Quantifying the selective pressures that act on natural variation in grass species is therefore essential regarding biodiversity maintenance. In this study, we investigate the selection pressures that act on two distinct populations of the grass model Brachypodium distachyon without prior knowledge about the traits under selection. We took advantage of whole-genome sequencing data produced for 44 natural accessions of B. distachyon and used complementary genome-wide selection scans (GWSS) methods to detect genomic regions under balancing and positive selection. We show that selection is shaping genetic diversity at multiple temporal and spatial scales in this species, and affects different genomic regions across the two populations. Gene ontology annotation of candidate genes reveals that pathogens may constitute important factors of positive and balancing selection in B. distachyon. We eventually cross-validated our results with quantitative trait locus data available for leaf-rust resistance in this species and demonstrate that, when paired with classical trait mapping, GWSS can help pinpointing candidate genes for further molecular validation. Thanks to a near base-perfect reference genome and the large collection of freely available natural accessions collected across its natural range, B. distachyon appears as a prime system for studies in ecology, population genomics and evolutionary biology.
Collapse
Affiliation(s)
- Yann Bourgeois
- New York University Abu Dhabi, PO Box 129188, Saadiyat Island, Abu Dhabi, United Arab Emirates
| | - Christoph Stritt
- Institute of Plant and Microbial Biology, University of Zürich, Zollikerstrasse 107, 8008, Zürich, Switzerland
| | - Jean-Claude Walser
- Genetic Diversity Centre, ETH Zürich, Universitätstrasse 16, Zurich, Switzerland
| | - Sean P Gordon
- DOE Joint Genome Institute, Walnut Creek, CA, 94598, USA
| | - John P Vogel
- DOE Joint Genome Institute, Walnut Creek, CA, 94598, USA
| | - Anne C Roulin
- Institute of Plant and Microbial Biology, University of Zürich, Zollikerstrasse 107, 8008, Zürich, Switzerland
| |
Collapse
|
60
|
Fujito NT, Satta Y, Hayakawa T, Takahata N. A new inference method for detecting an ongoing selective sweep. Genes Genet Syst 2018; 93:149-161. [DOI: 10.1266/ggs.18-00008] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.6] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/23/2022] Open
Affiliation(s)
- Naoko T. Fujito
- School of Advanced Sciences, SOKENDAI (The Graduate University for Advanced Studies)
| | - Yoko Satta
- School of Advanced Sciences, SOKENDAI (The Graduate University for Advanced Studies)
| | - Toshiyuki Hayakawa
- Graduate School of Systems Life Sciences, Kyushu University
- Faculty of Arts and Science, Kyushu University
| | - Naoyuki Takahata
- School of Advanced Sciences, SOKENDAI (The Graduate University for Advanced Studies)
| |
Collapse
|
61
|
Alachiotis N, Pavlidis P. RAiSD detects positive selection based on multiple signatures of a selective sweep and SNP vectors. Commun Biol 2018; 1:79. [PMID: 30271960 PMCID: PMC6123745 DOI: 10.1038/s42003-018-0085-8] [Citation(s) in RCA: 80] [Impact Index Per Article: 11.4] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/29/2017] [Accepted: 06/05/2018] [Indexed: 12/16/2022] Open
Abstract
Selective sweeps leave distinct signatures locally in genomes, enabling the detection of loci that have undergone recent positive selection. Multiple signatures of a selective sweep are known, yet each neutrality test only identifies a single signature. We present RAiSD (Raised Accuracy in Sweep Detection), an open-source software that implements a novel, to our knowledge, and parameter-free detection mechanism that relies on multiple signatures of a selective sweep via the enumeration of SNP vectors. RAiSD achieves higher sensitivity and accuracy than the current state of the art, while the computational complexity is greatly reduced, allowing up to 1000 times faster processing than widely used tools, and negligible memory requirements. Nikolaos Alachiotis and Pavlos Pavlidis present RAiSD, a computational method for identifying multiple signatures of selective sweeps using single nucleotide polymorphism vectors. They show that RAiSD has higher sensitivity and accuracy with reduced computational complexity than current methods.
Collapse
Affiliation(s)
- Nikolaos Alachiotis
- Institute of Computer Science, Foundation for Research and Technology-Hellas, Nikolaou Plastira 100, 70013, Heraklion, Crete, Greece.
| | - Pavlos Pavlidis
- Institute of Computer Science, Foundation for Research and Technology-Hellas, Nikolaou Plastira 100, 70013, Heraklion, Crete, Greece.
| |
Collapse
|
62
|
Satta Y, Fujito NT, Takahata N. Nonequilibrium Neutral Theory for Hitchhikers. Mol Biol Evol 2018; 35:1362-1365. [PMID: 29722819 DOI: 10.1093/molbev/msy093] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/14/2022] Open
Abstract
Selective sweep is a phenomenon of reduced variation at presumably neutrally evolving sites (hitchhikers) in the genome that is caused by the spread of a selected allele at a linked focal site, and is widely used to test for action of positive selection. Nonetheless, selective sweep may also provide an unprecedented opportunity for studying nonequilibrium properties of the neutral variation itself. We have demonstrated this possibility in relation to ancient selective sweep for modern human-specific changes and ongoing selective sweep for local population-specific changes.
Collapse
Affiliation(s)
- Yoko Satta
- Department of Evolutionary Studies of Biosystems, School of Advanced Sciences, SOKENDAI (The Graduate University for Advanced Studies), Hayama, Kanagawa, Japan
| | - Naoko T Fujito
- Department of Evolutionary Studies of Biosystems, School of Advanced Sciences, SOKENDAI (The Graduate University for Advanced Studies), Hayama, Kanagawa, Japan
| | - Naoyuki Takahata
- Department of Evolutionary Studies of Biosystems, School of Advanced Sciences, SOKENDAI (The Graduate University for Advanced Studies), Hayama, Kanagawa, Japan
| |
Collapse
|
63
|
Kern AD, Schrider DR. diploS/HIC: An Updated Approach to Classifying Selective Sweeps. G3 (BETHESDA, MD.) 2018; 8:1959-1970. [PMID: 29626082 PMCID: PMC5982824 DOI: 10.1534/g3.118.200262] [Citation(s) in RCA: 88] [Impact Index Per Article: 12.6] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Subscribe] [Scholar Register] [Received: 02/26/2018] [Accepted: 04/04/2018] [Indexed: 11/18/2022]
Abstract
Identifying selective sweeps in populations that have complex demographic histories remains a difficult problem in population genetics. We previously introduced a supervised machine learning approach, S/HIC, for finding both hard and soft selective sweeps in genomes on the basis of patterns of genetic variation surrounding a window of the genome. While S/HIC was shown to be both powerful and precise, the utility of S/HIC was limited by the use of phased genomic data as input. In this report we describe a deep learning variant of our method, diploS/HIC, that uses unphased genotypes to accurately classify genomic windows. diploS/HIC is shown to be quite powerful even at moderate to small sample sizes.
Collapse
Affiliation(s)
- Andrew D Kern
- Department of Genetics, Rutgers University, Piscataway, NJ 08854
| | | |
Collapse
|
64
|
Cheng X, Xu C, DeGiorgio M. Fast and robust detection of ancestral selective sweeps. Mol Ecol 2017; 26:6871-6891. [DOI: 10.1111/mec.14416] [Citation(s) in RCA: 21] [Impact Index Per Article: 2.6] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/15/2017] [Revised: 10/16/2017] [Accepted: 10/23/2017] [Indexed: 01/01/2023]
Affiliation(s)
- Xiaoheng Cheng
- Huck Institutes of Life Sciences; Pennsylvania State University; University Park PA USA
- Department of Biology; Pennsylvania State University; University Park PA USA
| | - Cheng Xu
- Huck Institutes of Life Sciences; Pennsylvania State University; University Park PA USA
| | - Michael DeGiorgio
- Department of Biology; Pennsylvania State University; University Park PA USA
- Department of Statistics; Pennsylvania State University; University Park PA USA
- Institute for CyberScience; Pennsylvania State University; University Park PA USA
| |
Collapse
|
65
|
Xue AT, Hickerson MJ. multi-dice: r package for comparative population genomic inference under hierarchical co-demographic models of independent single-population size changes. Mol Ecol Resour 2017; 17:e212-e224. [PMID: 28449263 PMCID: PMC5724483 DOI: 10.1111/1755-0998.12686] [Citation(s) in RCA: 22] [Impact Index Per Article: 2.8] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/06/2017] [Revised: 03/14/2017] [Accepted: 04/14/2017] [Indexed: 01/25/2023]
Abstract
Population genetic data from multiple taxa can address comparative phylogeographic questions about community-scale response to environmental shifts, and a useful strategy to this end is to employ hierarchical co-demographic models that directly test multi-taxa hypotheses within a single, unified analysis. This approach has been applied to classical phylogeographic data sets such as mitochondrial barcodes as well as reduced-genome polymorphism data sets that can yield 10,000s of SNPs, produced by emergent technologies such as RAD-seq and GBS. A strategy for the latter had been accomplished by adapting the site frequency spectrum to a novel summarization of population genomic data across multiple taxa called the aggregate site frequency spectrum (aSFS), which potentially can be deployed under various inferential frameworks including approximate Bayesian computation, random forest and composite likelihood optimization. Here, we introduce the r package multi-dice, a wrapper program that exploits existing simulation software for flexible execution of hierarchical model-based inference using the aSFS, which is derived from reduced genome data, as well as mitochondrial data. We validate several novel software features such as applying alternative inferential frameworks, enforcing a minimal threshold of time surrounding co-demographic pulses and specifying flexible hyperprior distributions. In sum, multi-dice provides comparative analysis within the familiar R environment while allowing a high degree of user customization, and will thus serve as a tool for comparative phylogeography and population genomics.
Collapse
Affiliation(s)
- Alexander T. Xue
- Department of Biology: Subprogram in Ecology, Evolutionary Biology, and BehaviorCity College and Graduate Center of City University of New YorkNew YorkNYUSA
| | - Michael J. Hickerson
- Department of Biology: Subprogram in Ecology, Evolutionary Biology, and BehaviorCity College and Graduate Center of City University of New YorkNew YorkNYUSA
- Division of Invertebrate ZoologyAmerican Museum of Natural HistoryNew YorkNYUSA
| |
Collapse
|
66
|
Abstract
The degree to which adaptation in recent human evolution shapes genetic variation remains controversial. This is in part due to the limited evidence in humans for classic "hard selective sweeps", wherein a novel beneficial mutation rapidly sweeps through a population to fixation. However, positive selection may often proceed via "soft sweeps" acting on mutations already present within a population. Here, we examine recent positive selection across six human populations using a powerful machine learning approach that is sensitive to both hard and soft sweeps. We found evidence that soft sweeps are widespread and account for the vast majority of recent human adaptation. Surprisingly, our results also suggest that linked positive selection affects patterns of variation across much of the genome, and may increase the frequencies of deleterious mutations. Our results also reveal insights into the role of sexual selection, cancer risk, and central nervous system development in recent human evolution.
Collapse
Affiliation(s)
- Daniel R. Schrider
- Department of Genetics, Rutgers University, Piscataway, NJ
- Human Genetics Institute of New Jersey, Rutgers University, Piscataway, NJ
| | - Andrew D. Kern
- Department of Genetics, Rutgers University, Piscataway, NJ
- Human Genetics Institute of New Jersey, Rutgers University, Piscataway, NJ
| |
Collapse
|
67
|
Schrider DR, Shanku AG, Kern AD. Effects of Linked Selective Sweeps on Demographic Inference and Model Selection. Genetics 2016; 204:1207-1223. [PMID: 27605051 PMCID: PMC5105852 DOI: 10.1534/genetics.116.190223] [Citation(s) in RCA: 107] [Impact Index Per Article: 11.9] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/07/2016] [Accepted: 09/02/2016] [Indexed: 01/06/2023] Open
Abstract
The availability of large-scale population genomic sequence data has resulted in an explosion in efforts to infer the demographic histories of natural populations across a broad range of organisms. As demographic events alter coalescent genealogies, they leave detectable signatures in patterns of genetic variation within and between populations. Accordingly, a variety of approaches have been designed to leverage population genetic data to uncover the footprints of demographic change in the genome. The vast majority of these methods make the simplifying assumption that the measures of genetic variation used as their input are unaffected by natural selection. However, natural selection can dramatically skew patterns of variation not only at selected sites, but at linked, neutral loci as well. Here we assess the impact of recent positive selection on demographic inference by characterizing the performance of three popular methods through extensive simulation of data sets with varying numbers of linked selective sweeps. In particular, we examined three different demographic models relevant to a number of species, finding that positive selection can bias parameter estimates of each of these models-often severely. We find that selection can lead to incorrect inferences of population size changes when none have occurred. Moreover, we show that linked selection can lead to incorrect demographic model selection, when multiple demographic scenarios are compared. We argue that natural populations may experience the amount of recent positive selection required to skew inferences. These results suggest that demographic studies conducted in many species to date may have exaggerated the extent and frequency of population size changes.
Collapse
Affiliation(s)
- Daniel R Schrider
- Department of Genetics, Rutgers University, Piscataway, New Jersey 08854
- Human Genetics Institute of New Jersey, Rutgers University, Piscataway, New Jersey 08554
| | - Alexander G Shanku
- Department of Genetics, Rutgers University, Piscataway, New Jersey 08854
- Institute for Quantitative Biomedicine, Rutgers University, Piscataway, New Jersey 08554
| | - Andrew D Kern
- Department of Genetics, Rutgers University, Piscataway, New Jersey 08854
- Human Genetics Institute of New Jersey, Rutgers University, Piscataway, New Jersey 08554
| |
Collapse
|