1. Peng D, Mulder OJ, Edge MD. Evaluating ARG-estimation methods in the context of estimating population-mean polygenic score histories. Genetics 2025; 229:iyaf033. [PMID: 40048614 PMCID: PMC12005257 DOI: 10.1093/genetics/iyaf033] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/07/2025] [Revised: 02/12/2025] [Accepted: 02/15/2025] [Indexed: 03/12/2025] Open
Abstract
Scalable methods for estimating marginal coalescent trees across the genome present new opportunities for studying evolution and have generated considerable excitement, with new methods extending scalability to thousands of samples. Benchmarking of the available methods has revealed general tradeoffs between accuracy and scalability, but performance in downstream applications has not always been easily predictable from general performance measures, suggesting that specific features of the ancestral recombination graph (ARG) may be important for specific downstream applications of estimated ARGs. To exemplify this point, we benchmark ARG estimation methods with respect to a specific set of methods for estimating the historical time course of a population-mean polygenic score (PGS) using the marginal coalescent trees encoded by the ARG. Here, we examine the performance in simulation of seven ARG estimation methods: ARGweaver, RENT+, Relate, tsinfer+tsdate, ARG-Needle, ASMC-clust, and SINGER, using their estimated coalescent trees and examining bias, mean squared error, confidence interval coverage, and Type I and II error rates of the downstream methods. Although it does not scale to the sample sizes attainable by other new methods, SINGER produced the most accurate estimated PGS histories in many instances, even when Relate, tsinfer+tsdate, ARG-Needle, and ASMC-clust used samples 10 or more times as large as those used by SINGER. In general, the best choice of method depends on the number of samples available and the historical time period of interest. In particular, the unprecedented sample sizes allowed by Relate, tsinfer+tsdate, ARG-Needle, and ASMC-clust are of greatest importance when the recent past is of interest; further back in time, most of the tree has coalesced, and differences in contemporary sample size are less salient.
Affiliation(s)
- Dandan Peng
  - Department of Quantitative and Computational Biology, University of Southern California, 1050 Childs Way, Los Angeles, CA 90098, USA
- Obadiah J Mulder
  - Department of Quantitative and Computational Biology, University of Southern California, 1050 Childs Way, Los Angeles, CA 90098, USA
- Michael D Edge
  - Department of Quantitative and Computational Biology, University of Southern California, 1050 Childs Way, Los Angeles, CA 90098, USA
2. Patin E, Quintana-Murci L. Tracing the Evolution of Human Immunity Through Ancient DNA. Annu Rev Immunol 2025; 43:57-82. [PMID: 39705165 DOI: 10.1146/annurev-immunol-082323-024638] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 12/22/2024]
Abstract
Infections have imposed strong selection pressures throughout human evolution, making the study of natural selection's effects on immunity genes highly complementary to disease-focused research. This review discusses how ancient DNA studies, which have revolutionized evolutionary genetics, increase our understanding of the evolution of human immunity. These studies have shown that interbreeding between modern humans and Neanderthals or Denisovans has influenced present-day immune responses, particularly to viruses. Additionally, ancient genomics enables the tracking of how human immunity has evolved across cultural transitions, highlighting strong selection since the Bronze Age in Europe (<4,500 years) and potential genetic adaptations to epidemics raging during the Middle Ages and the European colonization of the Americas. Furthermore, ancient genomic studies suggest that the genetic risk for noninfectious immune disorders has gradually increased over millennia because alleles associated with increased risk for autoimmunity and inflammation once conferred resistance to infections. The challenge now is to extend these findings to diverse, non-European populations and to provide a more global understanding of the evolution of human immunity.
Affiliation(s)
- Etienne Patin
  - Institut Pasteur, Université Paris Cité, CNRS UMR 2000, Human Evolutionary Genetics Unit, Paris, France
- Lluis Quintana-Murci
  - Human Genomics and Evolution, Collège de France, Paris, France
  - Institut Pasteur, Université Paris Cité, CNRS UMR 2000, Human Evolutionary Genetics Unit, Paris, France
3. Yang Y, Durbin R, Iversen AKN, Lawson DJ. Sparse haplotype-based fine-scale local ancestry inference at scale reveals recent selection on immune responses. Nat Commun 2025; 16:2742. [PMID: 40113767 PMCID: PMC11926123 DOI: 10.1038/s41467-025-57601-3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/09/2024] [Accepted: 02/12/2025] [Indexed: 03/22/2025] Open
Abstract
Increasingly efficient methods for inferring the ancestral origin of genome regions are needed to gain insights into genetic function and history as biobanks grow in scale. Here we describe two near-linear time algorithms to learn ancestry harnessing the strengths of a Positional Burrows-Wheeler Transform. SparsePainter is a faster, sparse replacement of previous model-based 'chromosome painting' algorithms to identify recently shared haplotypes, whilst PBWTpaint uses further approximations to obtain lightning-fast estimation optimized for genome-wide relatedness estimation. The computational efficiency gains of these tools for fine-scale local ancestry inference offer the possibility to analyse large-scale genomic datasets using different approaches. Application to the UK Biobank shows that haplotypes better represent ancestries than principal components, whilst linkage-disequilibrium of ancestry identifies signals of recent changes to population-specific selection for many genomic regions associated with immune responses, suggesting avenues for understanding the pathogen-immune system interplay on a historical timescale.
Affiliation(s)
- Yaoling Yang
  - Department of Statistical Sciences, School of Mathematics, University of Bristol, Bristol, UK
  - MRC Integrative Epidemiology Unit, Population Health Sciences, University of Bristol, Bristol, UK
- Richard Durbin
  - Department of Genetics, University of Cambridge, Cambridge, UK
- Astrid K N Iversen
  - Nuffield Department of Clinical Neurosciences, John Radcliffe Hospital, University of Oxford, Oxford, UK
- Daniel J Lawson
  - Department of Statistical Sciences, School of Mathematics, University of Bristol, Bristol, UK
  - MRC Integrative Epidemiology Unit, Population Health Sciences, University of Bristol, Bristol, UK
4. Huang Y, Carmi S, Ringbauer H. Estimating effective population size trajectories from time-series identity-by-descent segments. Genetics 2025; 229:iyae212. [PMID: 39854269 PMCID: PMC11912830 DOI: 10.1093/genetics/iyae212] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/07/2024] [Accepted: 12/12/2024] [Indexed: 01/26/2025] Open
Abstract
Long, identical haplotypes shared between pairs of individuals, known as identity-by-descent (IBD) segments, result from recently shared co-ancestry. Various methods have been developed to utilize IBD sharing for demographic inference in contemporary DNA data. Recent methodological advances have extended the screening for IBD segments to ancient DNA (aDNA) data, making demographic inference based on IBD also possible for aDNA. However, aDNA data typically have varying sampling times, but most demographic inference methods for modern data assume that sampling is contemporaneous. Here, we present Ttne (Time-Transect Ne), which models time-transect sampling to infer recent effective population size trajectories. Using simulations, we show that utilizing IBD sharing in time series increased resolution to infer recent fluctuations in effective population sizes compared with methods that only use contemporaneous samples. To account for IBD detection errors common in empirical analyses, we implemented an approach to estimate and model IBD detection errors. Finally, we applied Ttne to two aDNA time transects: individuals associated with the Copper Age Corded Ware Culture and Medieval England. In both cases, we found evidence of a growing population, a signal consistent with archaeological records.
Affiliation(s)
- Yilei Huang
  - Department of Archaeogenetics, Max Planck Institute for Evolutionary Anthropology, Leipzig 04317, Germany
  - Bioinformatics Group, Institute of Computer Science, Universität Leipzig, Leipzig 04109, Germany
- Shai Carmi
  - Braun School of Public Health and Community Medicine, Hebrew University of Jerusalem, Jerusalem 9112102, Israel
- Harald Ringbauer
  - Department of Archaeogenetics, Max Planck Institute for Evolutionary Anthropology, Leipzig 04317, Germany
5. Shastry V, Berg JJ. Allele ages provide limited information about the strength of negative selection. Genetics 2025; 229:iyae211. [PMID: 39698825 PMCID: PMC11912868 DOI: 10.1093/genetics/iyae211] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/19/2024] [Accepted: 12/12/2024] [Indexed: 12/20/2024] Open
Abstract
For many problems in population genetics, it is useful to characterize the distribution of fitness effects (DFE) of de novo mutations among a certain class of sites. A DFE is typically estimated by fitting an observed site frequency spectrum (SFS) to an expected SFS given a hypothesized distribution of selection coefficients and demographic history. The development of tools to infer gene trees from haplotype alignments, along with ancient DNA resources, provides us with additional information about the frequency trajectories of segregating mutations. Here, we ask how useful this additional information is for learning about the DFE, using the joint distribution on allele frequency and age to summarize information about the trajectory. To this end, we introduce an accurate and efficient numerical method for computing the density on the age of a segregating variant found at a given sample frequency, given the strength of selection and an arbitrarily complex population size history. We then use this framework to show that the unconditional age distribution of negatively selected alleles is very closely approximated by reweighting the neutral age distribution in terms of the negatively selected SFS, suggesting that allele ages provide little information about the DFE beyond that already contained in the present day frequency. To confirm this prediction, we extended the standard Poisson random field method to incorporate the joint distribution of frequency and age in estimating selection coefficients, and test its performance using simulations. We find that when the full SFS is observed and the true allele ages are known, including ages in the estimation provides only small increases in the accuracy of estimated selection coefficients. However, if only sites with frequencies above a certain threshold are observed, then the true ages can provide substantial information about the selection coefficients, especially when the selection coefficient is large. 
When ages are estimated from haplotype data using state-of-the-art tools, uncertainty about the age abrogates most of the additional information in the fully observed SFS case, while the neutral prior assumed in these tools when estimating ages induces a downward bias in the case of the thresholded SFS.
Affiliation(s)
- Vivaswat Shastry
  - Committee on Genetics, Genomics and Systems Biology, University of Chicago, Chicago, IL 60637, USA
- Jeremy J Berg
  - Department of Human Genetics, University of Chicago, Chicago, IL 60637, USA
6. Barton AR, Santander CG, Skoglund P, Moltke I, Reich D, Mathieson I. Insufficient evidence for natural selection associated with the Black Death. Nature 2025; 638:E19-E22. [PMID: 39972236 PMCID: PMC11938207 DOI: 10.1038/s41586-024-08496-5] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Received: 02/13/2023] [Accepted: 12/05/2024] [Indexed: 02/21/2025]
Affiliation(s)
- Alison R Barton
  - Department of Human Evolutionary Biology, Harvard University, Cambridge, MA, USA
- Cindy G Santander
  - Department of Biology, University of Copenhagen, Copenhagen, Denmark
- Pontus Skoglund
  - Ancient Genomics Laboratory, The Francis Crick Institute, London, UK
- Ida Moltke
  - Department of Biology, University of Copenhagen, Copenhagen, Denmark
- David Reich
  - Department of Human Evolutionary Biology, Harvard University, Cambridge, MA, USA
  - Department of Genetics, Harvard Medical School, Boston, MA, USA
  - Broad Institute of MIT and Harvard, Cambridge, MA, USA
  - Howard Hughes Medical Institute, Harvard Medical School, Boston, MA, USA
- Iain Mathieson
  - Department of Genetics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA
7. Vilgalys TP, Klunk J, Demeure CE, Cheng X, Shiratori M, Madej J, Beau R, Elli D, Patino MI, Redfern R, DeWitte SN, Gamble JA, Boldsen JL, Carmichael A, Varlik N, Eaton K, Grenier JC, Golding GB, Devault A, Rouillard JM, Yotova V, Sindeaux R, Ye CJ, Bikaran M, Dumaine A, Brinkworth JF, Missiakas D, Rouleau GA, Steinrücken M, Pizarro-Cerdá J, Poinar HN, Barreiro LB. Reply to: Insufficient evidence for natural selection associated with the Black Death. Nature 2025; 638:E23-E29. [PMID: 39972229 DOI: 10.1038/s41586-024-08497-4] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Indexed: 02/21/2025]
Affiliation(s)
- Tauras P Vilgalys
  - Section of Genetic Medicine, Department of Medicine, University of Chicago, Chicago, IL, USA
- Jennifer Klunk
  - McMaster Ancient DNA Centre, Departments of Anthropology, Biology and Biochemistry, McMaster University, Hamilton, Ontario, Canada
  - Daicel Arbor Biosciences, Ann Arbor, MI, USA
- Christian E Demeure
  - Institut Pasteur, Université Paris Cité, CNRS UMR6047, Yersinia Research Unit, Microbiology Department, Paris, France
- Xiaoheng Cheng
  - Department of Ecology and Evolution, University of Chicago, Chicago, IL, USA
- Mari Shiratori
  - Section of Genetic Medicine, Department of Medicine, University of Chicago, Chicago, IL, USA
- Julien Madej
  - Institut Pasteur, Université Paris Cité, CNRS UMR6047, Yersinia Research Unit, Microbiology Department, Paris, France
- Rémi Beau
  - Institut Pasteur, Université Paris Cité, CNRS UMR6047, Yersinia Research Unit, Microbiology Department, Paris, France
- Derek Elli
  - Department of Microbiology, Ricketts Laboratory, University of Chicago, Lemont, IL, USA
- Maria I Patino
  - Section of Genetic Medicine, Department of Medicine, University of Chicago, Chicago, IL, USA
- Rebecca Redfern
  - Centre for Human Bioarchaeology, Museum of London, London, UK
- Sharon N DeWitte
  - Department of Anthropology, University of South Carolina, Columbia, SC, USA
- Julia A Gamble
  - Department of Anthropology, University of Manitoba, Winnipeg, Manitoba, Canada
- Jesper L Boldsen
  - Unit of Anthropology (ADBOU), Department of Forensic Medicine, University of Southern Denmark, Odense, Denmark
- Ann Carmichael
  - History Department, Indiana University, Bloomington, IN, USA
- Nükhet Varlik
  - Department of History, Rutgers University, Newark, NJ, USA
- Katherine Eaton
  - McMaster Ancient DNA Centre, Departments of Anthropology, Biology and Biochemistry, McMaster University, Hamilton, Ontario, Canada
- Jean-Christophe Grenier
  - Montreal Heart Institute, Faculty of Medicine, Université de Montréal, Montreal, Quebec, Canada
- G Brian Golding
  - McMaster Ancient DNA Centre, Departments of Anthropology, Biology and Biochemistry, McMaster University, Hamilton, Ontario, Canada
- Jean-Marie Rouillard
  - Daicel Arbor Biosciences, Ann Arbor, MI, USA
  - Department of Chemical Engineering, University of Michigan Ann Arbor, Ann Arbor, MI, USA
- Vania Yotova
  - Centre Hospitalier Universitaire Sainte-Justine, Montreal, Quebec, Canada
- Renata Sindeaux
  - Centre Hospitalier Universitaire Sainte-Justine, Montreal, Quebec, Canada
- Chun Jimmie Ye
  - Division of Rheumatology, Department of Medicine, University of California, San Francisco, CA, USA
  - Institute for Human Genetics, University of California, San Francisco, CA, USA
- Matin Bikaran
  - Division of Rheumatology, Department of Medicine, University of California, San Francisco, CA, USA
  - Institute for Human Genetics, University of California, San Francisco, CA, USA
- Anne Dumaine
  - Section of Genetic Medicine, Department of Medicine, University of Chicago, Chicago, IL, USA
- Jessica F Brinkworth
  - Department of Anthropology, University of Illinois at Urbana-Champaign, Urbana, IL, USA
  - Carl R. Woese Institute for Genomic Biology, University of Illinois at Urbana-Champaign, Urbana, IL, USA
- Dominique Missiakas
  - Department of Microbiology, Ricketts Laboratory, University of Chicago, Lemont, IL, USA
- Guy A Rouleau
  - Montreal Neurological Institute-Hospital, McGill University, Montreal, Quebec, Canada
- Matthias Steinrücken
  - Department of Ecology and Evolution, University of Chicago, Chicago, IL, USA
  - Department of Human Genetics, University of Chicago, Chicago, IL, USA
- Javier Pizarro-Cerdá
  - Institut Pasteur, Université Paris Cité, CNRS UMR6047, Yersinia Research Unit, Microbiology Department, Paris, France
- Hendrik N Poinar
  - McMaster Ancient DNA Centre, Departments of Anthropology, Biology and Biochemistry, McMaster University, Hamilton, Ontario, Canada
  - Michael G. DeGroote Institute of Infectious Disease Research, McMaster University, Hamilton, Ontario, Canada
  - Humans and the Microbiome Program, Canadian Institute for Advanced Research, Toronto, Ontario, Canada
- Luis B Barreiro
  - Section of Genetic Medicine, Department of Medicine, University of Chicago, Chicago, IL, USA
  - Committee on Genetics, Genomics, and Systems Biology, University of Chicago, Chicago, IL, USA
  - Committee on Immunology, University of Chicago, Chicago, IL, USA
8. Temple SD, Browning SR. Multiple-testing corrections in selection scans using identity-by-descent segments. bioRxiv 2025:2025.01.29.635528. [PMID: 39975073 PMCID: PMC11838353 DOI: 10.1101/2025.01.29.635528] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 02/21/2025]
Abstract
Failing to correct for multiple testing in selection scans can lead to false discoveries of recent genetic adaptations. The scanning statistics in selection studies are often too complicated to theoretically derive a genome-wide significance level or empirically validate control of the family-wise error rate (FWER). By modeling the autocorrelation of identity-by-descent (IBD) rates, we propose a computationally efficient method to determine genome-wide significance levels in an IBD-based scan for recent positive selection. In whole genome simulations, we show that our method has approximate control of the FWER and can adapt to the spacing of tests along the genome. We also show that these scans can have more than fifty percent power to reject the null model in hard sweeps with a selection coefficient s ≥ 0.01 and a sweeping allele frequency between twenty-five and seventy-five percent. A few human genes and gene complexes have statistically significant excesses of IBD segments in thousands of samples of African, European, and South Asian ancestry groups from the Trans-Omics for Precision Medicine project and the United Kingdom Biobank. Among the significant loci, many signals of recent selection are shared across ancestry groups. One shared selection signal at a skeletal cell development gene is extremely strong in African ancestry samples.
Affiliation(s)
- Seth D. Temple
  - Department of Statistics, University of Washington, Seattle, Washington, USA
  - Department of Statistics, University of Michigan, Ann Arbor, Michigan, USA
  - Michigan Institute for Data Science, University of Michigan, Ann Arbor, Michigan, USA
- Sharon R. Browning
  - Department of Biostatistics, University of Washington, Seattle, Washington, USA
9. Anderson NW, Kirk L, Schraiber JG, Ragsdale AP. A path integral approach for allele frequency dynamics under polygenic selection. Genetics 2025; 229:1-63. [PMID: 39531638 PMCID: PMC12086674 DOI: 10.1093/genetics/iyae182] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/16/2024] [Revised: 10/11/2024] [Accepted: 10/16/2024] [Indexed: 11/16/2024] Open
Abstract
Many phenotypic traits have a polygenic genetic basis, making it challenging to learn their genetic architectures and predict individual phenotypes. One promising avenue to resolve the genetic basis of complex traits is through evolve-and-resequence (E&R) experiments, in which laboratory populations are exposed to some selective pressure and trait-contributing loci are identified by extreme frequency changes over the course of the experiment. However, small laboratory populations will experience substantial random genetic drift, and it is difficult to determine whether selection played a role in a given allele frequency change (AFC). Predicting AFCs under drift and selection, even for alleles contributing to simple, monogenic traits, has remained a challenging problem. Recently, there have been efforts to apply the path integral, a method borrowed from physics, to solve this problem. So far, this approach has been limited to genic selection, and is therefore inadequate to capture the complexity of quantitative, highly polygenic traits that are commonly studied. Here, we extend one of these path integral methods, the perturbation approximation, to selection scenarios that are of interest to quantitative genetics. We derive analytic expressions for the transition probability (i.e. the probability that an allele will change in frequency from x to y in time t) of an allele contributing to a trait subject to stabilizing selection, as well as that of an allele contributing to a trait rapidly adapting to a new phenotypic optimum. We use these expressions to characterize the use of AFC to test for selection, as well as explore optimal design choices for E&R experiments to uncover the genetic architecture of polygenic traits under selection.
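Illustrative note: the transition probability defined in this abstract (the probability that an allele moves from frequency x to y in time t) can be computed by brute force for a small population, which gives a reference point for approximations. The sketch below is not the authors' path integral method. It assumes a discrete-generation Wright-Fisher model with simple genic selection, and the function names and parameters (`two_n`, `s`) are hypothetical choices made for illustration.

```python
from math import comb


def wf_transition_matrix(two_n, s=0.0):
    """One-generation Wright-Fisher transition matrix (assumed genic selection).

    Entry [i][j] is the probability of j allele copies in the next generation
    given i copies now, among two_n gene copies, with relative fitness 1 + s.
    """
    matrix = []
    for i in range(two_n + 1):
        x = i / two_n
        # Expected frequency after selection, before binomial drift sampling.
        p = x * (1 + s) / (1 + s * x)
        row = [comb(two_n, j) * p**j * (1 - p) ** (two_n - j)
               for j in range(two_n + 1)]
        matrix.append(row)
    return matrix


def transition_probability(two_n, i_start, j_end, generations, s=0.0):
    """Probability of going from i_start to j_end copies in `generations` steps."""
    matrix = wf_transition_matrix(two_n, s)
    dist = [0.0] * (two_n + 1)
    dist[i_start] = 1.0
    for _ in range(generations):
        # Propagate the copy-number distribution forward by one generation.
        dist = [sum(dist[i] * matrix[i][j] for i in range(two_n + 1))
                for j in range(two_n + 1)]
    return dist[j_end]
```

The cost of each step grows with the square of the number of gene copies, so this direct calculation is only practical for small populations; analytic approximations address larger ones.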
Affiliation(s)
- Nathan W Anderson
  - Department of Integrative Biology, University of Wisconsin-Madison, Madison, WI 53706, USA
- Lloyd Kirk
  - Department of Integrative Biology, University of Wisconsin-Madison, Madison, WI 53706, USA
- Joshua G Schraiber
  - Department of Quantitative and Computational Biology, University of Southern California, Los Angeles, CA 90089, USA
- Aaron P Ragsdale
  - Department of Integrative Biology, University of Wisconsin-Madison, Madison, WI 53706, USA
10. Cheng X, Steinrücken M. diplo-locus: A lightweight toolkit for inference and simulation of time-series genetic data under general diploid selection. bioRxiv 2024:2023.10.12.562101. [PMID: 37905072 PMCID: PMC10614779 DOI: 10.1101/2023.10.12.562101] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/02/2023]
Abstract
Whole-genome time-series allele frequency data are becoming more prevalent as ancient DNA (aDNA) sequences and data from evolve-and-resequence (E&R) experiments are generated at a rapid pace. Such data present unprecedented opportunities to elucidate the dynamics of genetic variation under selection. However, despite many methods to infer parameters of selection models from allele frequency trajectories available in the literature, few provide user-friendly implementations for large-scale empirical applications. Here, we present diplo-locus, an open-source Python package that provides functionality to simulate and perform inference from time-series data under the Wright-Fisher diffusion with general diploid selection. The package includes Python modules as well as command-line tools and is available at: https://github.com/steinrue/diplo_locus.
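Illustrative note: to make "time-series data under general diploid selection" concrete, the sketch below simulates an allele frequency trajectory and draws samples at chosen time points. It does not use the diplo_locus API, which is not described in this listing. It assumes a discrete-generation Wright-Fisher model (the diffusion is its large-population limit) with genotype fitnesses 1, 1 + s_het and 1 + s_hom, and every function and parameter name here is hypothetical.

```python
import random


def next_freq_expected(x, s_het, s_hom):
    """Allele frequency after selection with fitnesses 1, 1+s_het, 1+s_hom."""
    w_het, w_hom = 1.0 + s_het, 1.0 + s_hom
    mean_w = (1 - x) ** 2 + 2 * x * (1 - x) * w_het + x * x * w_hom
    return (x * x * w_hom + x * (1 - x) * w_het) / mean_w


def simulate_trajectory(x0, n_diploid, generations, s_het, s_hom, rng):
    """Simulate frequencies for `generations` steps, starting from x0."""
    two_n = 2 * n_diploid
    freqs = [x0]
    x = x0
    for _ in range(generations):
        p = next_freq_expected(x, s_het, s_hom)
        # Binomial drift: draw 2N gene copies for the next generation.
        copies = sum(rng.random() < p for _ in range(two_n))
        x = copies / two_n
        freqs.append(x)
    return freqs


def sample_time_series(freqs, sample_times, n_haploid, rng):
    """Return (time, derived-allele count, sample size) at each sampling time."""
    return [(t, sum(rng.random() < freqs[t] for _ in range(n_haploid)), n_haploid)
            for t in sample_times]
```

For example, `simulate_trajectory(0.2, 50, 30, 0.01, 0.02, random.Random(1))` returns 31 frequencies, and passing that list to `sample_time_series` with times `[0, 10, 30]` mimics sparse sampling through time of the kind found in aDNA or E&R data.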
Affiliation(s)
- Xiaoheng Cheng
  - Department of Ecology and Evolution, University of Chicago, Chicago, IL, USA
- Matthias Steinrücken
  - Department of Ecology and Evolution, University of Chicago, Chicago, IL, USA
  - Department of Human Genetics, University of Chicago, Chicago, IL, USA
11. Pandey D, Harris M, Garud NR, Narasimhan VM. Leveraging ancient DNA to uncover signals of natural selection in Europe lost due to admixture or drift. Nat Commun 2024; 15:9772. [PMID: 39532856 PMCID: PMC11557891 DOI: 10.1038/s41467-024-53852-8] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/11/2023] [Accepted: 10/23/2024] [Indexed: 11/16/2024] Open
Abstract
Large ancient DNA (aDNA) studies offer the chance to examine genomic changes over time, providing direct insights into human evolution. While recent studies have used time-stratified aDNA for selection scans, most focus on single-locus methods. We conducted a multi-locus genotype scan on 708 samples spanning 7000 years of European history. We show that the G12 statistic, originally designed for unphased diploid data, can effectively detect selection in aDNA processed to create 'pseudo-haplotypes'. In simulations and at known positive control loci (e.g., lactase persistence), G12 outperforms the allele frequency-based selection statistic, SweepFinder2, previously used on aDNA. Applying our approach, we identified 14 candidate regions of selection across four time periods, with half the signals detectable only in the earliest period. Our findings suggest that selective events in European prehistory, including from the onset of animal domestication, have been obscured by neutral processes like genetic drift and demographic shifts such as admixture.
Affiliation(s)
- Devansh Pandey
  - Department of Integrative Biology, The University of Texas at Austin, Austin, TX, USA
- Mariana Harris
  - Department of Computational Medicine, University of California, Los Angeles, CA, USA
- Nandita R Garud
  - Department of Ecology and Evolutionary Biology, University of California, Los Angeles, CA, USA
  - Department of Human Genetics, University of California, Los Angeles, CA, USA
- Vagheesh M Narasimhan
  - Department of Integrative Biology, The University of Texas at Austin, Austin, TX, USA
  - Department of Statistics and Data Science, The University of Texas at Austin, Austin, TX, USA
12. Temple SD, Waples RK, Browning SR. Modeling recent positive selection using identity-by-descent segments. Am J Hum Genet 2024; 111:2510-2529. [PMID: 39362217 PMCID: PMC11568764 DOI: 10.1016/j.ajhg.2024.08.023] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/20/2024] [Revised: 08/29/2024] [Accepted: 08/30/2024] [Indexed: 10/05/2024] Open
Abstract
Recent positive selection can result in an excess of long identity-by-descent (IBD) haplotype segments overlapping a locus. The statistical methods that we propose here address three major objectives in studying selective sweeps: scanning for regions of interest, identifying possible sweeping alleles, and estimating a selection coefficient s. First, we implement a selection scan to locate regions with excess IBD rates. Second, we estimate the allele frequency and location of an unknown sweeping allele by aggregating over variants that are more abundant in an inferred outgroup with excess IBD rate versus the rest of the sample. Third, we propose an estimator for the selection coefficient and quantify uncertainty using the parametric bootstrap. Comparing against state-of-the-art methods in extensive simulations, we show that our methods are more precise at estimating s when s≥0.015. We also show that our 95% confidence intervals contain s in nearly 95% of our simulations. We apply these methods to study positive selection in European ancestry samples from the Trans-Omics for Precision Medicine project. We analyze eight loci where IBD rates are more than four standard deviations above the genome-wide median, including LCT where the maximum IBD rate is 35 standard deviations above the genome-wide median. Overall, we present robust and accurate approaches to study recent adaptive evolution without knowing the identity of the causal allele or using time series data.
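Illustrative note: the scan step described in this abstract flags loci whose IBD rate lies more than four standard deviations above the genome-wide median. A minimal sketch of that thresholding rule follows. It assumes the per-locus IBD rates have already been computed and uses a plain population standard deviation; the authors' actual pipeline may estimate the spread differently, and the function name and `n_sd` parameter are hypothetical.

```python
from statistics import median, pstdev


def flag_excess_ibd(ibd_rates, n_sd=4.0):
    """Return indices of loci whose IBD rate exceeds median + n_sd * SD."""
    center = median(ibd_rates)
    spread = pstdev(ibd_rates)  # assumed spread estimate, not the paper's
    cutoff = center + n_sd * spread
    return [i for i, rate in enumerate(ibd_rates) if rate > cutoff]
```

Because a strong outlier also inflates the standard deviation, this simple rule is conservative. It is a heuristic cutoff rather than a multiple-testing-corrected significance level.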
Affiliation(s)
- Seth D Temple
  - Department of Statistics, University of Washington, Seattle, WA, USA
- Ryan K Waples
  - Department of Biostatistics, University of Washington, Seattle, WA, USA
- Sharon R Browning
  - Department of Biostatistics, University of Washington, Seattle, WA, USA
13. Laval G, Patin E, Quintana-Murci L, Kerner G. Deep estimation of the intensity and timing of natural selection from ancient genomes. Mol Ecol Resour 2024; 24:e14015. [PMID: 39215552 DOI: 10.1111/1755-0998.14015] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/24/2024] [Revised: 07/22/2024] [Accepted: 08/15/2024] [Indexed: 09/04/2024]
Abstract
Leveraging past allele frequencies has proven to be key for identifying the impact of natural selection across time. However, this approach suffers from imprecise estimations of the intensity (s) and timing (T) of selection, particularly when ancient samples are scarce in specific epochs. Here, we aimed to bypass the computation of allele frequencies across arbitrarily defined past epochs and refine the estimations of selection parameters by implementing convolutional neural network (CNN) algorithms that directly use ancient genotypes sampled across time. Using computer simulations, we first show that genotype-based CNNs consistently outperform an approximate Bayesian computation (ABC) approach based on past allele frequency trajectories, regardless of the selection model assumed and the number of available ancient genotypes. When applying this method to empirical data from modern and ancient Europeans, we replicated the reported increased number of selection events in post-Neolithic Europe, independently of the continental subregion studied. Furthermore, we substantially refined the ABC-based estimations of s and T for a set of positively and negatively selected variants, including iconic cases of positive selection and experimentally validated disease-risk variants. Our CNN predictions support a history of recent positive and negative selection targeting variants associated with host defence against pathogens, aligning with previous work that highlights the significant impact of infectious diseases, such as tuberculosis, in Europe. These findings collectively demonstrate that detecting the footprints of natural selection on ancient genomes is crucial for unravelling the history of severe human diseases.
Affiliation(s)
- Guillaume Laval
  - Human Evolutionary Genetics Unit, Institut Pasteur, Université Paris Cité, CNRS UMR2000, Paris, France
- Etienne Patin
  - Human Evolutionary Genetics Unit, Institut Pasteur, Université Paris Cité, CNRS UMR2000, Paris, France
- Lluis Quintana-Murci
  - Human Evolutionary Genetics Unit, Institut Pasteur, Université Paris Cité, CNRS UMR2000, Paris, France
  - Chair of Human Genomics and Evolution, Collège de France, Paris, France
- Gaspard Kerner
  - Human Evolutionary Genetics Unit, Institut Pasteur, Université Paris Cité, CNRS UMR2000, Paris, France
|
14
|
Bolognini D, Halgren A, Lou RN, Raveane A, Rocha JL, Guarracino A, Soranzo N, Chin CS, Garrison E, Sudmant PH. Recurrent evolution and selection shape structural diversity at the amylase locus. Nature 2024; 634:617-625. [PMID: 39232174 PMCID: PMC11485256 DOI: 10.1038/s41586-024-07911-1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/29/2023] [Accepted: 08/06/2024] [Indexed: 09/06/2024]
Abstract
The adoption of agriculture triggered a rapid shift towards starch-rich diets in human populations [1]. Amylase genes facilitate starch digestion, and increased amylase copy number has been observed in some modern human populations with high-starch intake [2], although evidence of recent selection is lacking [3,4]. Here, using 94 long-read haplotype-resolved assemblies and short-read data from approximately 5,600 contemporary and ancient humans, we resolve the diversity and evolutionary history of structural variation at the amylase locus. We find that amylase genes have higher copy numbers in agricultural populations than in fishing, hunting and pastoral populations. We identify 28 distinct amylase structural architectures and demonstrate that nearly identical structures have arisen recurrently on different haplotype backgrounds throughout recent human history. AMY1 and AMY2A genes each underwent multiple duplication/deletion events with mutation rates up to more than 10,000-fold the single-nucleotide polymorphism mutation rate, whereas AMY2B gene duplications share a single origin. Using a pangenome-based approach, we infer structural haplotypes across thousands of humans identifying extensively duplicated haplotypes at higher frequency in modern agricultural populations. Leveraging 533 ancient human genomes, we find that duplication-containing haplotypes (with more gene copies than the ancestral haplotype) have rapidly increased in frequency over the past 12,000 years in West Eurasians, suggestive of positive selection. Together, our study highlights the potential effects of the agricultural revolution on human genomes and the importance of structural variation in human adaptation.
Affiliation(s)
- Alma Halgren
  - Department of Integrative Biology, University of California Berkeley, Berkeley, CA, USA
- Runyang Nicolas Lou
  - Department of Integrative Biology, University of California Berkeley, Berkeley, CA, USA
- Joana L Rocha
  - Department of Integrative Biology, University of California Berkeley, Berkeley, CA, USA
- Andrea Guarracino
  - Department of Genetics, Genomics, and Informatics, University of Tennessee Health Science Center, Memphis, TN, USA
- Nicole Soranzo
  - Human Technopole, Milan, Italy
  - Wellcome Sanger Institute, Hinxton, UK
  - National Institute for Health Research Blood and Transplant Research Unit in Donor Health and Genomics, University of Cambridge, Cambridge, UK
  - Department of Haematology, Cambridge Biomedical Campus, Cambridge, UK
  - British Heart Foundation Centre of Research Excellence, University of Cambridge, Cambridge, UK
- Chen-Shan Chin
  - Foundation for Biological Data Science, Belmont, CA, USA
- Erik Garrison
  - Department of Genetics, Genomics, and Informatics, University of Tennessee Health Science Center, Memphis, TN, USA
- Peter H Sudmant
  - Department of Integrative Biology, University of California Berkeley, Berkeley, CA, USA
  - Center for Computational Biology, University of California Berkeley, Berkeley, CA, USA
|
15
|
Scheib CL, Hui R, Rose AK, D’Atanasio E, Inskip SA, Dittmar J, Cessford C, Griffith SJ, Solnik A, Wiseman R, Neil B, Biers T, Harknett SJ, Sasso S, Biagini SA, Runfeldt G, Duhig C, Evans C, Metspalu M, Millett MJ, O’Connell TC, Robb JE, Kivisild T. Low Genetic Impact of the Roman Occupation of Britain in Rural Communities. Mol Biol Evol 2024; 41:msae168. [PMID: 39268685 PMCID: PMC11393495 DOI: 10.1093/molbev/msae168] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/04/2023] [Revised: 07/25/2024] [Accepted: 08/07/2024] [Indexed: 09/17/2024] Open
Abstract
The Roman period saw the empire expand across Europe and the Mediterranean, including much of what is today Great Britain. While there is written evidence of high mobility into and out of Britain for administrators, traders, and the military, the impact of imperialism on local, rural population structure, kinship, and mobility is invisible in the textual record. The extent of genetic change that occurred in Britain during the Roman military occupation remains underexplored. Here, using genome-wide data from 52 ancient individuals from eight sites in Cambridgeshire covering the period of Roman occupation, we show low levels of genetic ancestry differentiation between Romano-British sites and indications of larger populations than in the Bronze Age and Neolithic. We find no evidence of long-distance migration from elsewhere in the Empire, though we do find one case of possible temporary mobility within a family unit during the Late Romano-British period. We also show that the present-day patterns of genetic ancestry composition in Britain emerged after the Roman period.
Affiliation(s)
- Christiana L Scheib
  - Estonian Biocentre, Institute of Genomics, University of Tartu, Tartu 51010, Estonia
  - St John's College, University of Cambridge, Cambridge CB2 1TP, UK
  - McDonald Institute for Archaeological Research, University of Cambridge, Cambridge CB2 3ER, UK
- Ruoyun Hui
  - McDonald Institute for Archaeological Research, University of Cambridge, Cambridge CB2 3ER, UK
  - Alan Turing Institute, British Library, London NW1 2DB, UK
- Alice K Rose
  - McDonald Institute for Archaeological Research, University of Cambridge, Cambridge CB2 3ER, UK
- Eugenia D’Atanasio
  - Institute of Molecular Biology and Pathology, IBPM CNR, Rome 00185, Italy
- Sarah A Inskip
  - McDonald Institute for Archaeological Research, University of Cambridge, Cambridge CB2 3ER, UK
  - School of Archaeology and Ancient History, University of Leicester, University Road, Leicester LE1 7RH, UK
- Jenna Dittmar
  - McDonald Institute for Archaeological Research, University of Cambridge, Cambridge CB2 3ER, UK
- Craig Cessford
  - Cambridge Archaeological Unit, Department of Archaeology, University of Cambridge, Cambridge CB3 0DT, UK
- Samuel J Griffith
  - Estonian Biocentre, Institute of Genomics, University of Tartu, Tartu 51010, Estonia
- Anu Solnik
  - Core Facility, Institute of Genomics, University of Tartu, Tartu 51010, Estonia
- Rob Wiseman
  - Core Facility, Institute of Genomics, University of Tartu, Tartu 51010, Estonia
- Benjamin Neil
  - Core Facility, Institute of Genomics, University of Tartu, Tartu 51010, Estonia
- Trish Biers
  - Department of Archaeology, University of Cambridge, Cambridge CB2 3DZ, UK
- Stefania Sasso
  - Estonian Biocentre, Institute of Genomics, University of Tartu, Tartu 51010, Estonia
- Simone A Biagini
  - Institut de Biologia Evolutiva (UPF-CSIC), Departament de Medicina i Ciències de la Vida, Universitat Pompeu Fabra, Parc de Recerca Biomèdica de Barcelona, 08003 Barcelona, Spain
  - Department of Human Genetics, KU Leuven, 3000 Leuven, Belgium
- Corinne Duhig
  - Wolfson College, University of Cambridge, Cambridge CB3 9BB, UK
- Christopher Evans
  - Department of Archaeology, University of Cambridge, Cambridge CB2 3DZ, UK
- Mait Metspalu
  - Estonian Biocentre, Institute of Genomics, University of Tartu, Tartu 51010, Estonia
- Martin J Millett
  - Faculty of Classics, University of Cambridge, Cambridge CB3 9DA, UK
- Tamsin C O’Connell
  - Department of Archaeology, University of Cambridge, Cambridge CB2 3DZ, UK
- John E Robb
  - Department of Archaeology, University of Cambridge, Cambridge CB2 3DZ, UK
- Toomas Kivisild
  - Estonian Biocentre, Institute of Genomics, University of Tartu, Tartu 51010, Estonia
  - McDonald Institute for Archaeological Research, University of Cambridge, Cambridge CB2 3ER, UK
  - Department of Human Genetics, KU Leuven, 3000 Leuven, Belgium
|
16
|
Vaughn AH, Nielsen R. Fast and Accurate Estimation of Selection Coefficients and Allele Histories from Ancient and Modern DNA. Mol Biol Evol 2024; 41:msae156. [PMID: 39078618 PMCID: PMC11321360 DOI: 10.1093/molbev/msae156] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/16/2023] [Revised: 07/02/2024] [Accepted: 07/10/2024] [Indexed: 07/31/2024] Open
Abstract
We here present CLUES2, a full-likelihood method to infer natural selection from sequence data that is an extension of the method CLUES. We make several substantial improvements to the CLUES method that greatly increase both its applicability and its speed. We add the ability to use ancestral recombination graphs on ancient data as emissions to the underlying hidden Markov model, which enables CLUES2 to use both temporal and linkage information to make estimates of selection coefficients. We also fully implement the ability to estimate distinct selection coefficients in different epochs, which allows for the analysis of changes in selective pressures through time, as well as selection with dominance. In addition, we greatly increase the computational efficiency of CLUES2 over CLUES using several approximations to the forward-backward algorithms and develop a new way to reconstruct historic allele frequencies by integrating over the uncertainty in the estimation of the selection coefficients. We illustrate the accuracy of CLUES2 through extensive simulations and validate the importance sampling framework for integrating over the uncertainty in the inference of gene trees. We also show that CLUES2 is well-calibrated by showing that under the null hypothesis, the distribution of log-likelihood ratios follows a χ² distribution with the appropriate degrees of freedom. We run CLUES2 on a set of recently published ancient human data from Western Eurasia and test for evidence of changing selection coefficients through time. We find significant evidence of changing selective pressures in several genes correlated with the introduction of agriculture to Europe and the ensuing dietary and demographic shifts of that time. In particular, our analysis supports previous hypotheses of strong selection on lactase persistence during periods of ancient famines and attenuated selection in more modern periods.
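The calibration claim in this abstract rests on a standard likelihood-ratio test. The sketch below is not CLUES2 code; it only illustrates how a log-likelihood ratio can be converted to a p-value under a χ² null, assuming the test statistic is twice the log-likelihood difference and the degrees of freedom equal the number of freely estimated selection coefficients. The function names `chi2_sf` and `likelihood_ratio_test` are hypothetical.

```python
import math


def chi2_sf(x: float, df: int) -> float:
    """Upper-tail probability P(X > x) for a chi-square variable with integer df."""
    if df < 1:
        raise ValueError("df must be a positive integer")
    if x <= 0:
        return 1.0
    z = x / 2.0
    if df % 2 == 1:
        a = 0.5
        q = math.erfc(math.sqrt(z))  # regularized upper gamma Q(1/2, z)
    else:
        a = 1.0
        q = math.exp(-z)  # regularized upper gamma Q(1, z)
    # Recurrence: Q(a + 1, z) = Q(a, z) + z**a * exp(-z) / Gamma(a + 1)
    while a < df / 2.0:
        q += math.exp(a * math.log(z) - z - math.lgamma(a + 1.0))
        a += 1.0
    return q


def likelihood_ratio_test(loglik_null: float, loglik_alt: float, n_free_params: int):
    """Return (statistic, p_value); statistic assumed to be 2 * (logL_alt - logL_null)."""
    stat = 2.0 * (loglik_alt - loglik_null)
    return stat, chi2_sf(stat, n_free_params)


if __name__ == "__main__":
    # Hypothetical log-likelihoods for a neutral model and a one-epoch selection model.
    stat, p = likelihood_ratio_test(-1050.0, -1045.0, n_free_params=1)
    print(f"statistic = {stat:.2f}, p = {p:.4g}")
```

With several epochs, each carrying its own selection coefficient, `n_free_params` would be the number of epochs under this assumed setup.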
Affiliation(s)
- Andrew H Vaughn
  - Center for Computational Biology, University of California, Berkeley, CA 94720, USA
- Rasmus Nielsen
  - Departments of Integrative Biology and Statistics, University of California, Berkeley, CA 94720, USA
  - Center for GeoGenetics, University of Copenhagen, Copenhagen DK-1350, Denmark
|
17
|
Zeng T, Spence JP, Mostafavi H, Pritchard JK. Bayesian estimation of gene constraint from an evolutionary model with gene features. Nat Genet 2024; 56:1632-1643. [PMID: 38977852 DOI: 10.1038/s41588-024-01820-9] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/02/2023] [Accepted: 05/29/2024] [Indexed: 07/10/2024]
Abstract
Measures of selective constraint on genes have been used for many applications, including clinical interpretation of rare coding variants, disease gene discovery and studies of genome evolution. However, widely used metrics are severely underpowered at detecting constraints for the shortest ~25% of genes, potentially causing important pathogenic mutations to be overlooked. Here we developed a framework combining a population genetics model with machine learning on gene features to enable accurate inference of an interpretable constraint metric, s_het. Our estimates outperform existing metrics for prioritizing genes important for cell essentiality, human disease and other phenotypes, especially for short genes. Our estimates of selective constraint should have wide utility for characterizing genes relevant to human disease. Finally, our inference framework, GeneBayes, provides a flexible platform that can improve the estimation of many gene-level properties, such as rare variant burden or gene expression differences.
Affiliation(s)
- Tony Zeng
  - Department of Genetics, Stanford University, Stanford, CA, USA
- Hakhamanesh Mostafavi
  - Department of Genetics, Stanford University, Stanford, CA, USA
  - Department of Population Health, New York University, New York, NY, USA
- Jonathan K Pritchard
  - Department of Genetics, Stanford University, Stanford, CA, USA
  - Department of Biology, Stanford University, Stanford, CA, USA
|
18
|
Naseri A, Zhi D, Zhang S. Discovery of runs-of-homozygosity diplotype clusters and their associations with diseases in UK Biobank. eLife 2024; 13:e81698. [PMID: 38905121 PMCID: PMC11249732 DOI: 10.7554/elife.81698] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/08/2022] [Accepted: 06/20/2024] [Indexed: 06/23/2024] Open
Abstract
Runs-of-homozygosity (ROH) segments, contiguous homozygous regions in a genome, were traditionally linked to families and inbred populations. However, a growing literature suggests that ROHs are ubiquitous in outbred populations. Still, most existing genetic studies of ROH in populations are limited to aggregated ROH content across the genome, which does not offer the resolution for mapping causal loci. This limitation is mainly due to a lack of methods for the efficient identification of shared ROH diplotypes. Here, we present a new method, ROH-DICE (runs-of-homozygous diplotype cluster enumerator), to find large ROH diplotype clusters, that is, sufficiently long ROHs shared by a sufficient number of individuals, in large cohorts. ROH-DICE identified over 1 million ROH diplotypes that span over 100 single nucleotide polymorphisms (SNPs) and are shared by more than 100 UK Biobank participants. Moreover, we found significant associations of clustered ROH diplotypes across the genome with various self-reported diseases, with the strongest associations found between the extended human leukocyte antigen (HLA) region and autoimmune disorders. We found an association between a diplotype covering the homeostatic iron regulator (HFE) gene and hemochromatosis, even though the well-known causal SNP was not directly genotyped or imputed. Using a genome-wide scan, we identified a putative association between carriers of an ROH diplotype in chromosome 4 and an increase in mortality among COVID-19 patients (p-value = 1.82 × 10⁻¹¹). In summary, our ROH-DICE method, by calling out large ROH diplotypes in a large outbred population, enables further population-genetic investigation into the demographic history of large populations. More importantly, our method enables a new genome-wide mapping approach for finding disease-causing loci with multi-marker recessive effects at a population scale.
Affiliation(s)
- Ardalan Naseri
  - Department of Computer Science, University of Central Florida, Orlando, United States
- Degui Zhi
  - Center for Precision Health, School of Biomedical Informatics, The University of Texas Health Science Center at Houston, Houston, United States
- Shaojie Zhang
  - Department of Computer Science, University of Central Florida, Orlando, United States
|
19
|
Anderson NW, Kirk L, Schraiber JG, Ragsdale AP. A Path Integral Approach for Allele Frequency Dynamics Under Polygenic Selection. BIORXIV : THE PREPRINT SERVER FOR BIOLOGY 2024:2024.06.14.599114. [PMID: 38915613 PMCID: PMC11195211 DOI: 10.1101/2024.06.14.599114] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/26/2024]
Abstract
Many phenotypic traits have a polygenic genetic basis, making it challenging to learn their genetic architectures and predict individual phenotypes. One promising avenue to resolve the genetic basis of complex traits is through evolve-and-resequence experiments, in which laboratory populations are exposed to some selective pressure and trait-contributing loci are identified by extreme frequency changes over the course of the experiment. However, small laboratory populations will experience substantial random genetic drift, and it is difficult to determine whether selection played a role in a given allele frequency change. Predicting how much allele frequencies change under drift and selection had remained an open problem well into the 21st century, even for alleles contributing to simple, monogenic traits. Recently, there have been efforts to apply the path integral, a method borrowed from physics, to solve this problem. So far, this approach has been limited to genic selection, and is therefore inadequate to capture the complexity of quantitative, highly polygenic traits that are commonly studied. Here we extend one of these path integral methods, the perturbation approximation, to selection scenarios that are of interest to quantitative genetics. In particular, we derive analytic expressions for the transition probability (i.e., the probability that an allele will change in frequency from x to y in time t) of an allele contributing to a trait subject to stabilizing selection, as well as that of an allele contributing to a trait rapidly adapting to a new phenotypic optimum. We use these expressions to characterize the use of allele frequency change to test for selection, as well as explore optimal design choices for evolve-and-resequence experiments to uncover the genetic architecture of polygenic traits under selection.
Affiliation(s)
- Nathan W. Anderson
  - Department of Integrative Biology, University of Wisconsin-Madison, Madison, WI, 53706, USA
- Lloyd Kirk
  - Department of Integrative Biology, University of Wisconsin-Madison, Madison, WI, 53706, USA
- Joshua G. Schraiber
  - Department of Quantitative and Computational Biology, University of Southern California, Los Angeles, CA, 90089, USA
- Aaron P. Ragsdale
  - Department of Integrative Biology, University of Wisconsin-Madison, Madison, WI, 53706, USA
|
20
|
Zeng T, Spence JP, Mostafavi H, Pritchard JK. Bayesian estimation of gene constraint from an evolutionary model with gene features. BIORXIV : THE PREPRINT SERVER FOR BIOLOGY 2024:2023.05.19.541520. [PMID: 37292653 PMCID: PMC10245655 DOI: 10.1101/2023.05.19.541520] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/10/2023]
Abstract
Measures of selective constraint on genes have been used for many applications including clinical interpretation of rare coding variants, disease gene discovery, and studies of genome evolution. However, widely-used metrics are severely underpowered at detecting constraint for the shortest ∼25% of genes, potentially causing important pathogenic mutations to be overlooked. We developed a framework combining a population genetics model with machine learning on gene features to enable accurate inference of an interpretable constraint metric, s_het. Our estimates outperform existing metrics for prioritizing genes important for cell essentiality, human disease, and other phenotypes, especially for short genes. Our new estimates of selective constraint should have wide utility for characterizing genes relevant to human disease. Finally, our inference framework, GeneBayes, provides a flexible platform that can improve estimation of many gene-level properties, such as rare variant burden or gene expression differences.
Affiliation(s)
- Tony Zeng
  - Department of Genetics, Stanford University, Stanford CA
- Jonathan K. Pritchard
  - Department of Genetics, Stanford University, Stanford CA
  - Department of Biology, Stanford University, Stanford CA
|
21
|
Poyraz L, Colbran LL, Mathieson I. Predicting Functional Consequences of Recent Natural Selection in Britain. Mol Biol Evol 2024; 41:msae053. [PMID: 38466119 PMCID: PMC10962637 DOI: 10.1093/molbev/msae053] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/16/2023] [Revised: 02/02/2024] [Accepted: 03/01/2024] [Indexed: 03/12/2024] Open
Abstract
Ancient DNA can directly reveal the contribution of natural selection to human genomic variation. However, while the analysis of ancient DNA has been successful at identifying genomic signals of selection, inferring the phenotypic consequences of that selection has been more difficult. Most trait-associated variants are noncoding, so we expect that a large proportion of the phenotypic effects of selection will also act through noncoding variation. Since we cannot measure gene expression directly in ancient individuals, we used an approach (Joint-Tissue Imputation [JTI]) developed to predict gene expression from genotype data. We tested for changes in the predicted expression of 17,384 protein coding genes over a time transect of 4,500 years using 91 present-day and 616 ancient individuals from Britain. We identified 28 genes at seven genomic loci with significant (false discovery rate [FDR] < 0.05) changes in predicted expression levels in this time period. We compared the results from our transcriptome-wide scan to a genome-wide scan based on estimating per-single nucleotide polymorphism (SNP) selection coefficients from time series data. At five previously identified loci, our approach allowed us to highlight small numbers of genes with evidence for significant shifts in expression from peaks that in some cases span tens of genes. At two novel loci (SLC44A5 and NUP85), we identify selection on gene expression not captured by scans based on genomic signatures of selection. Finally, we show how classical selection statistics (iHS and SDS) can be combined with JTI models to incorporate functional information into scans that use present-day data alone. These results demonstrate the potential of this type of information to explore both the causes and consequences of natural selection.
Affiliation(s)
- Lin Poyraz
  - Department of Genetics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA
  - Department of Computational Biology, Cornell University, Ithaca, NY, USA
- Laura L Colbran
  - Department of Genetics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA
- Iain Mathieson
  - Department of Genetics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA
|
22
|
Simon A, Coop G. The contribution of gene flow, selection, and genetic drift to five thousand years of human allele frequency change. Proc Natl Acad Sci U S A 2024; 121:e2312377121. [PMID: 38363870 PMCID: PMC10907250 DOI: 10.1073/pnas.2312377121] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/19/2023] [Accepted: 01/09/2024] [Indexed: 02/18/2024] Open
Abstract
Genomic time series from experimental evolution studies and ancient DNA datasets offer us a chance to directly observe the interplay of various evolutionary forces. We show how the genome-wide variance in allele frequency change between two time points can be decomposed into the contributions of gene flow, genetic drift, and linked selection. In closed populations, the contribution of linked selection is identifiable because it creates covariances between time intervals, and genetic drift does not. However, repeated gene flow between populations can also produce directionality in allele frequency change, creating covariances. We show how to accurately separate the fraction of variance in allele frequency change due to admixture and linked selection in a population receiving gene flow. We use two human ancient DNA datasets, spanning around 5,000 y, as time transects to quantify the contributions to the genome-wide variance in allele frequency change. We find that a large fraction of genome-wide change is due to gene flow. In both cases, after correcting for known major gene flow events, we do not observe a signal of genome-wide linked selection. Thus despite the known role of selection in shaping long-term polymorphism levels, and an increasing number of examples of strong selection on single loci and polygenic scores from ancient DNA, it appears to be gene flow and drift, and not selection, that are the main determinants of recent genome-wide allele frequency change. Our approach should be applicable to the growing number of contemporary and ancient temporal population genomics datasets.
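The decomposition described in this abstract builds on a simple identity: the variance of the total allele frequency change across loci equals the sum of the per-interval variances plus the sum of the covariances between distinct intervals. The following sketch illustrates only that bookkeeping on toy data. It does not implement the authors' correction for gene flow, and the function names and the `freqs[t][l]` layout are assumptions made for this example.

```python
from itertools import product


def mean(values):
    return sum(values) / len(values)


def covariance(xs, ys):
    """Population covariance across loci (divides by the number of loci)."""
    mx, my = mean(xs), mean(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / len(xs)


def decompose_total_variance(freqs):
    """freqs[t][l] is the allele frequency at time point t and locus l.

    Returns (total_variance, sum_of_interval_variances, sum_of_covariances).
    """
    # Per-interval allele frequency changes, one list of loci per interval.
    deltas = [
        [later - earlier for earlier, later in zip(freqs[t], freqs[t + 1])]
        for t in range(len(freqs) - 1)
    ]
    total_change = [last - first for first, last in zip(freqs[0], freqs[-1])]
    total_var = covariance(total_change, total_change)
    var_sum = sum(covariance(d, d) for d in deltas)
    # Off-diagonal terms: covariances between changes in different intervals.
    cov_sum = sum(
        covariance(deltas[s], deltas[t])
        for s, t in product(range(len(deltas)), repeat=2)
        if s != t
    )
    return total_var, var_sum, cov_sum


if __name__ == "__main__":
    # Toy data: three time points, four loci (made-up frequencies).
    toy = [
        [0.10, 0.50, 0.90, 0.30],
        [0.20, 0.40, 0.80, 0.35],
        [0.25, 0.45, 0.70, 0.50],
    ]
    total, variances, covariances = decompose_total_variance(toy)
    print(total, variances + covariances)
```

In a closed population with drift alone, the covariance term is expected to be near zero, which is why nonzero covariances are informative about linked selection or, as the abstract notes, repeated gene flow.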
Affiliation(s)
- Alexis Simon
  - Center for Population Biology, University of California, Davis, CA 95616
  - Department of Evolution and Ecology, University of California, Davis, CA 95616
- Graham Coop
  - Center for Population Biology, University of California, Davis, CA 95616
  - Department of Evolution and Ecology, University of California, Davis, CA 95616
|
23
|
Simon A, Coop G. The contribution of gene flow, selection, and genetic drift to five thousand years of human allele frequency change. BIORXIV : THE PREPRINT SERVER FOR BIOLOGY 2024:2023.07.11.548607. [PMID: 37503227 PMCID: PMC10370008 DOI: 10.1101/2023.07.11.548607] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 07/29/2023]
Abstract
Genomic time series from experimental evolution studies and ancient DNA datasets offer us a chance to directly observe the interplay of various evolutionary forces. We show how the genome-wide variance in allele frequency change between two time points can be decomposed into the contributions of gene flow, genetic drift, and linked selection. In closed populations, the contribution of linked selection is identifiable because it creates covariances between time intervals, and genetic drift does not. However, repeated gene flow between populations can also produce directionality in allele frequency change, creating covariances. We show how to accurately separate the fraction of variance in allele frequency change due to admixture and linked selection in a population receiving gene flow. We use two human ancient DNA datasets, spanning around 5,000 years, as time transects to quantify the contributions to the genome-wide variance in allele frequency change. We find that a large fraction of genome-wide change is due to gene flow. In both cases, after correcting for known major gene flow events, we do not observe a signal of genome-wide linked selection. Thus despite the known role of selection in shaping long-term polymorphism levels, and an increasing number of examples of strong selection on single loci and polygenic scores from ancient DNA, it appears to be gene flow and drift, and not selection, that are the main determinants of recent genome-wide allele frequency change. Our approach should be applicable to the growing number of contemporary and ancient temporal population genomics datasets.
Affiliation(s)
- Alexis Simon
  - Center for Population Biology, University of California, Davis, CA 95616
  - Department of Evolution and Ecology, University of California, Davis, CA 95616
- Graham Coop
  - Center for Population Biology, University of California, Davis, CA 95616
  - Department of Evolution and Ecology, University of California, Davis, CA 95616
|
24
|
Gao Z. Unveiling recent and ongoing adaptive selection in human populations. PLoS Biol 2024; 22:e3002469. [PMID: 38236800 PMCID: PMC10796035 DOI: 10.1371/journal.pbio.3002469] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/22/2024] Open
Abstract
Genome-wide scans for signals of selection have become a routine part of the analysis of population genomic variation datasets and have resulted in compelling evidence of selection during recent human evolution. This Essay spotlights methodological innovations that have enabled the detection of selection over very recent timescales, even in contemporary human populations. By harnessing large-scale genomic and phenotypic datasets, these new methods use different strategies to uncover connections between genotype, phenotype, and fitness. This Essay outlines the rationale and key findings of each strategy, discusses challenges in interpretation, and describes opportunities to improve detection and understanding of ongoing selection in human populations.
Affiliation(s)
- Ziyue Gao
  - Department of Genetics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, Pennsylvania, United States of America
|
25
|
Spence JP, Zeng T, Mostafavi H, Pritchard JK. Scaling the discrete-time Wright-Fisher model to biobank-scale datasets. Genetics 2023; 225:iyad168. [PMID: 37724741 PMCID: PMC10627256 DOI: 10.1093/genetics/iyad168] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/01/2023] [Revised: 06/01/2023] [Accepted: 09/08/2023] [Indexed: 09/21/2023] Open
Abstract
The discrete-time Wright-Fisher (DTWF) model and its diffusion limit are central to population genetics. These models can describe the forward-in-time evolution of allele frequencies in a population resulting from genetic drift, mutation, and selection. Computing likelihoods under the diffusion process is feasible, but the diffusion approximation breaks down for large samples or in the presence of strong selection. Existing methods for computing likelihoods under the DTWF model do not scale to current exome sequencing sample sizes in the hundreds of thousands. Here, we present a scalable algorithm that approximates the DTWF model with provably bounded error. Our approach relies on two key observations about the DTWF model. The first is that transition probabilities under the model are approximately sparse. The second is that transition distributions for similar starting allele frequencies are extremely close as distributions. Together, these observations enable approximate matrix-vector multiplication in linear (as opposed to the usual quadratic) time. We prove similar properties for Hypergeometric distributions, enabling fast computation of likelihoods for subsamples of the population. We show theoretically and in practice that this approximation is highly accurate and can scale to population sizes in the tens of millions, paving the way for rigorous biobank-scale inference. Finally, we use our results to estimate the impact of larger samples on estimating selection coefficients for loss-of-function variants. We find that increasing sample sizes beyond existing large exome sequencing cohorts will provide essentially no additional information except for genes with the most extreme fitness effects.
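For orientation, the exact DTWF update that this abstract says is too costly at scale can be written in a few lines. The sketch below is the naive dense computation, not the authors' approximation: each transition row is a binomial distribution, and one generation is a vector-matrix product that is quadratic in the number of gene copies. The genic selection formula, the parameter `s`, and the function names are assumptions made for illustration, and the direct use of `math.comb` is only suitable for small populations.

```python
import math


def dtwf_transition_row(n_copies: int, current_count: int, s: float = 0.0):
    """P(next count = j | current count) for j = 0..n_copies under binomial sampling."""
    x = current_count / n_copies
    # Assumed genic selection: the focal allele has relative fitness 1 + s.
    p = x * (1.0 + s) / (1.0 + s * x)
    return [
        math.comb(n_copies, j) * p**j * (1.0 - p) ** (n_copies - j)
        for j in range(n_copies + 1)
    ]


def step(distribution, n_copies: int, s: float = 0.0):
    """Advance the allele-count distribution one generation (dense, quadratic time)."""
    nxt = [0.0] * (n_copies + 1)
    for i, mass in enumerate(distribution):
        if mass == 0.0:
            continue  # skip states that carry no probability
        row = dtwf_transition_row(n_copies, i, s)
        for j, prob in enumerate(row):
            nxt[j] += mass * prob
    return nxt


if __name__ == "__main__":
    n = 100
    row = dtwf_transition_row(n, 20)
    # Nearly all of the mass sits within five standard deviations of the mean,
    # which is the kind of approximate sparsity the abstract refers to.
    sd = math.sqrt(n * 0.2 * 0.8)
    lo, hi = int(20 - 5 * sd), int(20 + 5 * sd)
    print(sum(row[lo : hi + 1]))
```

Both observations in the abstract can be explored with this toy version: rows are concentrated around their mean, and rows for neighbouring starting counts are close to one another as distributions.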
Affiliation(s)
- Jeffrey P Spence
  - Department of Genetics, Stanford University, Stanford, CA 94305, USA
- Tony Zeng
  - Department of Genetics, Stanford University, Stanford, CA 94305, USA
- Jonathan K Pritchard
  - Department of Genetics, Stanford University, Stanford, CA 94305, USA
  - Department of Biology, Stanford University, Stanford, CA 94305, USA
|
26
|
Poyraz L, Colbran LL, Mathieson I. Predicting functional consequences of recent natural selection in Britain. BIORXIV : THE PREPRINT SERVER FOR BIOLOGY 2023:2023.10.16.562549. [PMID: 37904954 PMCID: PMC10614889 DOI: 10.1101/2023.10.16.562549] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/02/2023]
Abstract
Ancient DNA can directly reveal the contribution of natural selection to human genomic variation. However, while the analysis of ancient DNA has been successful at identifying genomic signals of selection, inferring the phenotypic consequences of that selection has been more difficult. Most trait-associated variants are non-coding, so we expect that a large proportion of the phenotypic effects of selection will also act through non-coding variation. Since we cannot measure gene expression directly in ancient individuals, we used an approach (Joint-Tissue Imputation; JTI) developed to predict gene expression from genotype data. We tested for changes in the predicted expression of 17,384 protein-coding genes over a time transect of 4,500 years using 91 present-day and 616 ancient individuals from Britain. We identified 28 genes at seven genomic loci with significant (FDR < 0.05) changes in predicted expression levels in this time period. We compared the results from our transcriptome-wide scan to a genome-wide scan based on estimating per-SNP selection coefficients from time series data. At five previously identified loci, our approach allowed us to highlight small numbers of genes with evidence for significant shifts in expression from peaks that in some cases span tens of genes. At two novel loci (SLC44A5 and NUP85), we identify selection on gene expression not captured by scans based on genomic signatures of selection. Finally, we show how classical selection statistics (iHS and SDS) can be combined with JTI models to incorporate functional information into scans that use present-day data alone. These results demonstrate the potential of this type of information to explore both the causes and consequences of natural selection.
Affiliation(s)
- Lin Poyraz
  - Department of Genetics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA
  - Department of Computational Biology, Cornell University, Ithaca, NY, USA
- Laura L. Colbran
  - Department of Genetics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA
- Iain Mathieson
  - Department of Genetics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA
|
27
|
Mikhailova SV. Problems with studying directional natural selection in humans. Vavilovskii Zhurnal Genet Selektsii 2023; 27:684-693. [PMID: 38023807 PMCID: PMC10643113 DOI: 10.18699/vjgb-23-79] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/29/2023] [Revised: 07/03/2023] [Accepted: 07/03/2023] [Indexed: 12/01/2023] Open
Abstract
The review describes the main methods for assessing directional selection in human populations. These include bioinformatic analysis of DNA sequences via detection of linkage disequilibrium and of deviations from the random distribution of frequencies of genetic variants, demographic and anthropometric studies based on a search for a correlation between fertility and phenotypic traits, genome-wide association studies on fertility along with genetic loci and polygenic risk scores, and a comparison of allele frequencies between generations (in modern samples and in those obtained from burials). Each approach has its limitations and is applicable to different periods in the evolution of Homo sapiens. The main sources of error in such studies are thought to be sample stratification, the small number of studies on nonwhite populations, the impossibility of a complete comparison of the associations found and functionally significant causative variants, and the difficulty of taking into account all nongenetic determinants of fertility in contemporary populations. The results obtained by various methods indicate that the direction of human adaptation to new food products has not changed during evolution since the Neolithic; many variants of immunity genes associated with inflammatory and autoimmune diseases in modern populations have undergone positive selection over the past 2-3 thousand years owing to the spread of bacterial and viral infections. For some genetic variants and polygenic traits, an alteration of the direction of natural selection in Europe has been documented, e.g., for those associated with an immune response and cognitive abilities. Examination of the correlation between fertility and educational attainment yields conflicting results. In modern populations, to a greater extent than previously, there is selection for variants of genes responsible for social adaptation and behavioral phenotypes. In particular, several articles have shown a positive correlation of fertility with polygenic risk scores of attention deficit/hyperactivity disorder.
Affiliation(s)
- S V Mikhailova
  - Institute of Cytology and Genetics of the Siberian Branch of the Russian Academy of Sciences, Novosibirsk, Russia
|
28
|
Zeng T, Spence JP, Mostafavi H, Pritchard JK. Bayesian estimation of gene constraint from an evolutionary model with gene features. RESEARCH SQUARE 2023:rs.3.rs-3012879. [PMID: 37398424 PMCID: PMC10312940 DOI: 10.21203/rs.3.rs-3012879/v1] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Indexed: 07/04/2023]
Abstract
Measures of selective constraint on genes have been used for many applications including clinical interpretation of rare coding variants, disease gene discovery, and studies of genome evolution. However, widely used metrics are severely underpowered at detecting constraint for the shortest ~25% of genes, potentially causing important pathogenic mutations to be overlooked. We developed a framework combining a population genetics model with machine learning on gene features to enable accurate inference of an interpretable constraint metric, s_het. Our estimates outperform existing metrics for prioritizing genes important for cell essentiality, human disease, and other phenotypes, especially for short genes. Our new estimates of selective constraint should have wide utility for characterizing genes relevant to human disease. Finally, our inference framework, GeneBayes, provides a flexible platform that can improve estimation of many gene-level properties, such as rare variant burden or gene expression differences.
Affiliation(s)
- Tony Zeng
  - Department of Genetics, Stanford University, Stanford CA
- Jonathan K. Pritchard
  - Department of Genetics, Stanford University, Stanford CA
  - Department of Biology, Stanford University, Stanford CA
|
29
|
Spence JP, Zeng T, Mostafavi H, Pritchard JK. Scaling the Discrete-time Wright Fisher model to biobank-scale datasets. BIORXIV : THE PREPRINT SERVER FOR BIOLOGY 2023:2023.05.19.541517. [PMID: 37293115 PMCID: PMC10245735 DOI: 10.1101/2023.05.19.541517] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Indexed: 06/10/2023]
Abstract
The Discrete-Time Wright Fisher (DTWF) model and its large population diffusion limit are central to population genetics. These models describe the forward-in-time evolution of the frequency of an allele in a population and can include the fundamental forces of genetic drift, mutation, and selection. Computing likelihoods under the diffusion process is feasible, but the diffusion approximation breaks down for large sample sizes or in the presence of strong selection. Unfortunately, existing methods for computing likelihoods under the DTWF model do not scale to current exome sequencing sample sizes in the hundreds of thousands. Here we present an algorithm that approximates the DTWF model with provably bounded error and runs in time linear in the size of the population. Our approach relies on two key observations about Binomial distributions. The first is that Binomial distributions are approximately sparse. The second is that Binomial distributions with similar success probabilities are extremely close as distributions, allowing us to approximate the DTWF Markov transition matrix as a very low rank matrix. Together, these observations enable matrix-vector multiplication in linear (as opposed to the usual quadratic) time. We prove similar properties for Hypergeometric distributions, enabling fast computation of likelihoods for subsamples of the population. We show theoretically and in practice that this approximation is highly accurate and can scale to population sizes in the billions, paving the way for rigorous biobank-scale population genetic inference. Finally, we use our results to estimate how increasing sample sizes will improve the estimation of selection coefficients acting on loss-of-function variants. We find that increasing sample sizes beyond existing large exome sequencing cohorts will provide essentially no additional information except for genes with the most extreme fitness effects.
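As a rough illustration of the structure the abstract describes, the following Python sketch builds the exact (dense, quadratic-time) DTWF update for a small population and measures how little probability mass a Binomial transition row places far from its mean. It is not the authors' algorithm: the function names, the genic-selection and symmetric-mutation parameterization, and the six-standard-deviation window are all assumptions chosen for illustration.

```python
import math


def binom_pmf(n, k, p):
    """Probability of exactly k successes under Binomial(n, p)."""
    return math.comb(n, k) * p**k * (1.0 - p) ** (n - k)


def dtwf_row(two_n, i, s=0.0, mu=0.0):
    """Transition distribution starting from i allele copies out of two_n.

    Hypothetical parameterization: selection of strength s against the
    allele, then symmetric mutation at rate mu, then Binomial sampling.
    """
    x = i / two_n
    x_sel = x * (1.0 - s) / (1.0 - s * x)
    p = x_sel * (1.0 - mu) + (1.0 - x_sel) * mu
    return [binom_pmf(two_n, j, p) for j in range(two_n + 1)]


def step(two_n, v, s=0.0, mu=0.0):
    """Advance the allele-count distribution v by one generation.

    This is the naive dense vector-matrix product, quadratic in two_n.
    """
    out = [0.0] * (two_n + 1)
    for i, mass in enumerate(v):
        if mass == 0.0:
            continue
        for j, t in enumerate(dtwf_row(two_n, i, s, mu)):
            out[j] += mass * t
    return out


def mass_outside_window(two_n, i, width_sd=6.0):
    """Neutral transition mass lying more than width_sd SDs from the mean."""
    p = i / two_n
    mean = two_n * p
    sd = math.sqrt(two_n * p * (1.0 - p))
    lo, hi = mean - width_sd * sd, mean + width_sd * sd
    row = dtwf_row(two_n, i)
    return sum(t for j, t in enumerate(row) if j < lo or j > hi)


if __name__ == "__main__":
    two_n = 200
    v0 = [0.0] * (two_n + 1)
    v0[20] = 1.0  # start with 20 copies of the allele
    v1 = step(two_n, v0)
    print("total mass after one generation:", sum(v1))
    print("mass outside 6 SD window:", mass_outside_window(two_n, 20))
```

In this toy setting, nearly all of each row's mass sits in a narrow band around the mean, which is the kind of approximate sparsity the abstract refers to. The sketch makes no attempt at the low-rank or linear-time construction itself.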
Affiliation(s)
- Tony Zeng
  - Department of Genetics, Stanford University
- Jonathan K. Pritchard
  - Department of Genetics, Stanford University
  - Department of Biology, Stanford University
|
30
|
Vilgalys TP, Klunk J, Demeure CE, Cheng X, Shiratori M, Madej J, Beau R, Elli D, Patino MI, Redfern R, DeWitte SN, Gamble JA, Boldsen JL, Carmichael A, Varlik N, Eaton K, Grenier JC, Golding GB, Devault A, Rouillard JM, Yotova V, Sindeaux R, Ye CJ, Bikaran M, Dumaine A, Brinkworth JF, Missiakas D, Rouleau GA, Steinrücken M, Pizarro-Cerdá J, Poinar HN, Barreiro LB. Reply to Barton et al: signatures of natural selection during the Black Death. BIORXIV : THE PREPRINT SERVER FOR BIOLOGY 2023:2023.04.06.535944. [PMID: 37066254 PMCID: PMC10104142 DOI: 10.1101/2023.04.06.535944] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Indexed: 04/18/2023]
Abstract
Barton et al. [1] raise several statistical concerns regarding our original analyses [2] that highlight the challenge of inferring natural selection using ancient genomic data. We show here that these concerns have limited impact on our original conclusions. Specifically, we recover the same signature of enrichment for high F_ST values at the immune loci relative to putatively neutral sites after switching the allele frequency estimation method to a maximum likelihood approach, filtering to only consider known human variants, and down-sampling our data to the same mean coverage across sites. Furthermore, using permutations, we show that the rs2549794 variant near ERAP2 continues to emerge as the strongest candidate for selection (p = 1.2×10^-5), falling below the Bonferroni-corrected significance threshold recommended by Barton et al. Importantly, the evidence for selection on ERAP2 is further supported by functional data demonstrating the impact of the ERAP2 genotype on the immune response to Y. pestis and by epidemiological data from an independent group showing that the putatively selected allele during the Black Death protects against severe respiratory infection in contemporary populations.
Affiliation(s)
- Tauras P Vilgalys
  - Section of Genetic Medicine, Department of Medicine, University of Chicago, Chicago, IL, USA
- Jennifer Klunk
  - McMaster Ancient DNA Centre, Departments of Anthropology, Biology and Biochemistry, McMaster University, Hamilton, Ontario, Canada L8S4L9
  - Daicel Arbor Biosciences, Ann Arbor, MI, USA
- Christian E Demeure
  - Institut Pasteur, Université Paris Cité, CNRS UMR6047, Yersinia Research Unit, Microbiology Department, F-75015 Paris, France
- Xiaoheng Cheng
  - Department of Ecology and Evolution, University of Chicago, Chicago, IL, USA
- Mari Shiratori
  - Section of Genetic Medicine, Department of Medicine, University of Chicago, Chicago, IL, USA
- Julien Madej
  - Institut Pasteur, Université Paris Cité, CNRS UMR6047, Yersinia Research Unit, Microbiology Department, F-75015 Paris, France
- Rémi Beau
  - Institut Pasteur, Université Paris Cité, CNRS UMR6047, Yersinia Research Unit, Microbiology Department, F-75015 Paris, France
- Derek Elli
  - Department of Microbiology, Ricketts Laboratory, University of Chicago, Lemont, IL, USA
- Maria I Patino
  - Section of Genetic Medicine, Department of Medicine, University of Chicago, Chicago, IL, USA
- Rebecca Redfern
  - Centre for Human Bioarchaeology, Museum of London, London, UK, EC2Y 5HN
- Sharon N DeWitte
  - Department of Anthropology, University of South Carolina, Columbia, SC, USA
- Julia A Gamble
  - Department of Anthropology, University of Manitoba, Winnipeg, Manitoba, R3T2N2
- Jesper L Boldsen
  - Department of Forensic Medicine, Unit of Anthropology (ADBOU), University of Southern Denmark, Odense S, 5260, Denmark
- Ann Carmichael
  - History Department, Indiana University, Bloomington, IN, USA
- Nükhet Varlik
  - Department of History, Rutgers University-Newark, NJ, USA
- Katherine Eaton
  - McMaster Ancient DNA Centre, Departments of Anthropology, Biology and Biochemistry, McMaster University, Hamilton, Ontario, Canada L8S4L9
- Jean-Christophe Grenier
  - Montreal Heart Institute, Faculty of Medicine, Université de Montréal, Montréal, Quebec, Canada, H1T 1C7
- G Brian Golding
  - McMaster Ancient DNA Centre, Departments of Anthropology, Biology and Biochemistry, McMaster University, Hamilton, Ontario, Canada L8S4L9
- Jean-Marie Rouillard
  - Daicel Arbor Biosciences, Ann Arbor, MI, USA
  - Department of Chemical Engineering, University of Michigan Ann Arbor, Ann Arbor, MI, USA
- Vania Yotova
  - Centre Hospitalier Universitaire Sainte-Justine, Montréal, Quebec, Canada, H3T 1C5
- Renata Sindeaux
  - Centre Hospitalier Universitaire Sainte-Justine, Montréal, Quebec, Canada, H3T 1C5
- Chun Jimmie Ye
  - Division of Rheumatology, Department of Medicine, University of California, San Francisco, CA, USA
  - Institute for Human Genetics, University of California, San Francisco, CA, USA
- Matin Bikaran
  - Division of Rheumatology, Department of Medicine, University of California, San Francisco, CA, USA
  - Institute for Human Genetics, University of California, San Francisco, CA, USA
- Anne Dumaine
  - Section of Genetic Medicine, Department of Medicine, University of Chicago, Chicago, IL, USA
- Jessica F Brinkworth
  - Department of Anthropology, University of Illinois Urbana-Champaign, Urbana, IL, USA
  - Carl R Woese Institute for Genomic Biology, University of Illinois at Urbana-Champaign, Urbana, IL, USA
- Dominique Missiakas
  - Department of Microbiology, Ricketts Laboratory, University of Chicago, Lemont, IL, USA
- Guy A Rouleau
  - Montreal Neurological Institute-Hospital, McGill University, Montréal, Quebec, Canada, H3A 2B4
- Matthias Steinrücken
  - Department of Ecology and Evolution, University of Chicago, Chicago, IL, USA
  - Department of Human Genetics, University of Chicago, Chicago, IL, USA
- Javier Pizarro-Cerdá
  - Institut Pasteur, Université Paris Cité, CNRS UMR6047, Yersinia Research Unit, Microbiology Department, F-75015 Paris, France
- Hendrik N Poinar
  - McMaster Ancient DNA Centre, Departments of Anthropology, Biology and Biochemistry, McMaster University, Hamilton, Ontario, Canada L8S4L9
  - Department of Ecology and Evolution, University of Chicago, Chicago, IL, USA
  - Department of Microbiology, Ricketts Laboratory, University of Chicago, Lemont, IL, USA
- Luis B Barreiro
  - Section of Genetic Medicine, Department of Medicine, University of Chicago, Chicago, IL, USA
  - Centre for Human Bioarchaeology, Museum of London, London, UK, EC2Y 5HN
  - Department of Anthropology, University of South Carolina, Columbia, SC, USA
  - Department of Anthropology, University of Manitoba, Winnipeg, Manitoba, R3T2N2
|
31
|
Barton AR, Santander CG, Skoglund P, Moltke I, Reich D, Mathieson I. Insufficient evidence for natural selection associated with the Black Death. BIORXIV : THE PREPRINT SERVER FOR BIOLOGY 2023:2023.03.14.532615. [PMID: 36993413 PMCID: PMC10055098 DOI: 10.1101/2023.03.14.532615] [Citation(s) in RCA: 6] [Impact Index Per Article: 3.0] [Indexed: 05/07/2023]
Abstract
Klunk et al. analyzed ancient DNA data from individuals in London and Denmark before, during and after the Black Death [1], and argued that allele frequency changes at immune genes were too large to be produced by random genetic drift and thus must reflect natural selection. They also identified four specific variants that they claimed show evidence of selection, including at ERAP2, for which they estimate a selection coefficient of 0.39, several times larger than any selection coefficient on a common human variant reported to date. Here we show that these claims are unsupported for four reasons. First, the signal of enrichment of large allele frequency changes in immune genes comparing people in London before and after the Black Death disappears after an appropriate randomization test is carried out: the P value increases by ten orders of magnitude and is no longer significant. Second, a technical error in the estimation of allele frequencies means that none of the four originally reported loci actually pass the filtering thresholds. Third, the filtering thresholds do not adequately correct for multiple testing. Finally, in the case of the ERAP2 variant rs2549794, which Klunk et al. show experimentally may be associated with a host interaction with Y. pestis, we find no evidence of significant frequency change either in the data that Klunk et al. report, or in published data spanning 2,000 years. While it remains plausible that immune genes were subject to natural selection during the Black Death, the magnitude of this selection and which specific genes may have been affected remain unknown.
Affiliation(s)
- Alison R. Barton
  - Department of Human Evolutionary Biology, Harvard University, Cambridge, MA 02138, USA
- Cindy G. Santander
  - Department of Biology, University of Copenhagen, Copenhagen, DK-2200, Denmark
- Pontus Skoglund
  - Ancient Genomics Laboratory, The Francis Crick Institute, London NW1 1AT, UK
- Ida Moltke
  - Department of Biology, University of Copenhagen, Copenhagen, DK-2200, Denmark
- David Reich
  - Department of Human Evolutionary Biology, Harvard University, Cambridge, MA 02138, USA
  - Department of Genetics, Harvard Medical School, Boston, MA 02115, USA
  - Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA
  - Howard Hughes Medical Institute, Harvard Medical School, Boston, MA 02115, USA
- Iain Mathieson
  - Department of Genetics, Perelman School of Medicine, University of Pennsylvania, Philadelphia PA 19104, USA
|
32
|
Fluctuating selection and the determinants of genetic variation. Trends Genet 2023; 39:491-504. [PMID: 36890036 DOI: 10.1016/j.tig.2023.02.004] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Received: 08/18/2022] [Revised: 02/01/2023] [Accepted: 02/07/2023] [Indexed: 03/08/2023]
Abstract
Recent studies of cosmopolitan Drosophila populations have found hundreds to thousands of genetic loci with seasonally fluctuating allele frequencies, bringing temporally fluctuating selection to the forefront of the historical debate surrounding the maintenance of genetic variation in natural populations. Numerous mechanisms have been explored in this longstanding area of research, but these exciting empirical findings have prompted several recent theoretical and experimental studies that seek to better understand the drivers, dynamics, and genome-wide influence of fluctuating selection. In this review, we evaluate the latest evidence for multilocus fluctuating selection in Drosophila and other taxa, highlighting the role of potential genetic and ecological mechanisms in maintaining these loci and their impacts on neutral genetic variation.
|