1
Burban E, Tenaillon MI, Glémin S. RIDGE, a tool tailored to detect gene flow barriers across species pairs. Mol Ecol Resour 2024; 24:e13944. [PMID: 38419376 DOI: 10.1111/1755-0998.13944] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/15/2023] [Revised: 01/19/2024] [Accepted: 02/05/2024] [Indexed: 03/02/2024]
Abstract
Characterizing the processes underlying reproductive isolation between diverging lineages is central to understanding speciation. Here, we present RIDGE (Reproductive Isolation Detection using Genomic polymorphisms), a tool tailored for quantifying gene flow barrier proportion and identifying the relevant genomic regions. RIDGE relies on an Approximate Bayesian Computation with a model-averaging approach to accommodate diverse scenarios of lineage divergence. It captures heterogeneity in effective migration rate along the genome while accounting for variation in linked selection and recombination. The barrier detection test relies on numerous summary statistics to compute a Bayes factor, offering a robust statistical framework that facilitates cross-species comparisons. Simulations revealed RIDGE's efficiency in capturing signals of ongoing migration. Model averaging proved particularly valuable in scenarios of high model uncertainty where no migration or migration homogeneity can be wrongly assumed, typically for recent divergence times <0.1 2Ne generations. Applying RIDGE to four published crow data sets, we first validated our tool by identifying a well-known large genomic region associated with mate choice patterns. Second, while we identified a significant overlap of outlier loci using RIDGE and traditional genomic scans, our results suggest that a substantial portion of previously identified outliers are likely false positives. Outlier detection relies on allele differentiation, relative measures of divergence and the count of shared polymorphisms and fixed differences. Our analyses also highlight the value of incorporating multiple summary statistics including our newly developed outlier ones that can be useful in challenging detection conditions.
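The Bayes factor mentioned in this abstract can be read through the general identity "Bayes factor = posterior odds / prior odds". The sketch below illustrates only that identity; it is not RIDGE's implementation, and the function name, the prior, and the example probabilities are hypothetical.

```python
def bayes_factor(posterior_prob, prior_prob):
    """Bayes factor for a hypothesis from its posterior and prior probability.

    Generic identity: BF = posterior odds / prior odds. The inputs used
    below are made-up numbers, not values produced by RIDGE.
    """
    for p in (posterior_prob, prior_prob):
        if not 0.0 < p < 1.0:
            raise ValueError("probabilities must lie strictly between 0 and 1")
    posterior_odds = posterior_prob / (1.0 - posterior_prob)
    prior_odds = prior_prob / (1.0 - prior_prob)
    return posterior_odds / prior_odds


# Hypothetical locus: "barrier" status has prior probability 0.1
# and posterior probability 0.9.
bf_locus = bayes_factor(posterior_prob=0.9, prior_prob=0.1)
```

With these made-up numbers the evidence ratio is about 81 to 1 in favour of the barrier hypothesis. Because the prior odds are divided out, the ratio is less tied to one data set's prior than a raw posterior probability would be.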
Affiliation(s)
- Ewen Burban
- University of Rennes, CNRS, ECOBIO-UMR 6553, Rennes, France
- Maud I Tenaillon
- University Paris-Saclay, INRAE, CNRS, AgroParisTech, GQE-Le Moulon, Gif-sur-Yvette, France
- Sylvain Glémin
- University of Rennes, CNRS, ECOBIO-UMR 6553, Rennes, France
- Department of Ecology and Genetics, Evolutionary Biology Centre, Uppsala University, Uppsala, Sweden
2
Rollins RE, Margos G, Brachmann A, Krebs S, Mouchet A, Dingemanse NJ, Laatamna A, Reghaissia N, Fingerle V, Metzler D, Becker NS, Chitimia-Dobler L. German Ixodes inopinatus samples may not actually represent this tick species. Int J Parasitol 2023; 53:751-761. [PMID: 37516335 DOI: 10.1016/j.ijpara.2023.06.007] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/16/2023] [Revised: 06/05/2023] [Accepted: 06/06/2023] [Indexed: 07/31/2023]
Abstract
Ticks are important vectors of human and animal pathogens, but many questions remain unanswered regarding their taxonomy. Molecular sequencing methods have allowed research to start understanding the evolutionary history of even closely related tick species. Ixodes inopinatus is considered a sister species and highly similar to Ixodes ricinus, an important vector of many tick-borne pathogens in Europe, but identification between these species remains ambiguous with disagreement on the geographic extent of I. inopinatus. In 2018-2019, 1583 ticks were collected from breeding great tits (Parus major) in southern Germany, of which 45 were later morphologically identified as I. inopinatus. We aimed to confirm morphological identification using molecular tools. Utilizing two genetic markers (16S rRNA, TROSPA) and whole genome sequencing of specific ticks (n = 8), we were able to determine that German samples, morphologically identified as I. inopinatus, genetically represent I. ricinus regardless of previous morphological identification, and most likely are not I. ricinus/I. inopinatus hybrids. Further, our results showed that the entire mitochondrial genome, let alone singular mitochondrial genes (i.e., 16S), is unable to distinguish between I. ricinus and I. inopinatus. Our results suggest that I. inopinatus is geographically isolated as a species (northern Africa and potentially southern Spain and Portugal) and brings into question whether I. inopinatus exists in central Europe. Our results highlight the probable existence of I. inopinatus and the power of utilizing genomic data in answering questions regarding tick taxonomy.
Affiliation(s)
- Robert E Rollins
- Institute of Avian Research "Vogelwarte Helgoland", Wilhelmshaven, Germany.
- Gabriele Margos
- National Reference Center for Borrelia, Bayerisches Landesamt für Gesundheit und Lebensmittelsicherheit, Oberschleißheim, Germany
- Andreas Brachmann
- Genetics, Faculty of Biology, LMU Munich, Planegg-Martinsried, Germany
- Stefan Krebs
- Gene Center, Laboratory for Functional Genome Analysis, LMU Munich, Munich, Germany
- Alexia Mouchet
- Behavioural Ecology Group, LMU Munich/Department of Biology, Planegg-Martinsried, Germany; IDEEV UMR Evolution, Génomes, Comportement, Ecologie, IRD, CNRS, Université Paris-Saclay, Gif-sur-Yvette, France
- Niels J Dingemanse
- Behavioural Ecology Group, LMU Munich/Department of Biology, Planegg-Martinsried, Germany
- AbdElkarim Laatamna
- Faculty of Nature and Life Sciences, University of Djelfa, Moudjbara Road, BP 3117, Djelfa, Algeria
- Nassiba Reghaissia
- Laboratory of Sciences and Living Techniques, Institute of Agronomic and Veterinary Sciences, University of Souk Ahras, Annaba Road 41000, Souk Ahras, Algeria
- Volker Fingerle
- National Reference Center for Borrelia, Bayerisches Landesamt für Gesundheit und Lebensmittelsicherheit, Oberschleißheim, Germany
- Dirk Metzler
- Division of Evolutionary Biology, Faculty of Biology, LMU Munich, Germany
- Noémie S Becker
- Division of Evolutionary Biology, Faculty of Biology, LMU Munich, Germany
3
Carvalho J, Morales HE, Faria R, Butlin RK, Sousa VC. Integrating Pool-seq uncertainties into demographic inference. Mol Ecol Resour 2023; 23:1737-1755. [PMID: 37475177 DOI: 10.1111/1755-0998.13834] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/02/2022] [Revised: 06/16/2023] [Accepted: 06/30/2023] [Indexed: 07/22/2023]
Abstract
Next-generation sequencing of pooled samples (Pool-seq) is a popular method to assess genome-wide diversity patterns in natural and experimental populations. However, Pool-seq is associated with specific sources of noise, such as unequal individual contributions. Consequently, using Pool-seq for the reconstruction of evolutionary history has remained underexplored. Here we describe a novel Approximate Bayesian Computation (ABC) method to infer demographic history, explicitly modelling Pool-seq sources of error. By jointly modelling Pool-seq data, demographic history and the effects of selection due to barrier loci, we obtain estimates of demographic history parameters accounting for technical errors associated with Pool-seq. Our ABC approach is computationally efficient as it relies on simulating subsets of loci (rather than the whole-genome) and on using relative summary statistics and relative model parameters. Our simulation study results indicate Pool-seq data allows distinction between general scenarios of ecotype formation (single versus parallel origin) and to infer relevant demographic parameters (e.g. effective sizes and split times). We exemplify the application of our method to Pool-seq data from the rocky-shore gastropod Littorina saxatilis, sampled on a narrow geographical scale at two Swedish locations where two ecotypes (Wave and Crab) are found. Our model choice and parameter estimates show that ecotypes formed before colonization of the two locations (i.e. single origin) and are maintained despite gene flow. These results indicate that demographic modelling and inference can be successful based on pool-sequencing using ABC, contributing to the development of suitable null models that allow for a better understanding of the genetic basis of divergent adaptation.
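For readers unfamiliar with ABC, a minimal rejection-sampling sketch may help. It is a generic illustration and not the method described in this abstract: the toy model (a normal distribution with unknown mean and unit variance), the uniform prior, the summary statistic (the sample mean), and the tolerance are all assumptions chosen for brevity.

```python
import random
import statistics


def abc_rejection(observed, n_draws=20000, tolerance=0.1, seed=1):
    """Toy ABC rejection sampler for the mean of a Normal(mu, 1) model.

    Hypothetical setup: uniform prior on mu over [-5, 5]; the summary
    statistic is the sample mean; a prior draw is kept when its simulated
    summary lies within `tolerance` of the observed summary.
    """
    rng = random.Random(seed)
    obs_summary = statistics.fmean(observed)
    accepted = []
    for _ in range(n_draws):
        mu = rng.uniform(-5.0, 5.0)                    # draw from the prior
        sim = [rng.gauss(mu, 1.0) for _ in observed]   # simulate a data set
        if abs(statistics.fmean(sim) - obs_summary) <= tolerance:
            accepted.append(mu)                        # keep close matches
    return accepted


# Made-up "observed" data generated with a true mean of 2.0.
data_rng = random.Random(42)
observed = [data_rng.gauss(2.0, 1.0) for _ in range(50)]
posterior = abc_rejection(observed)
posterior_mean = statistics.fmean(posterior)
```

The accepted draws approximate the posterior of the parameter. Tightening the tolerance improves the approximation at the cost of fewer accepted simulations, which is one reason practical ABC tools rely on well-chosen summary statistics.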
Affiliation(s)
- João Carvalho
- cE3c - Centre for Ecology, Evolution and Environmental Changes & CHANGE - Global Change and Sustainability Institute, Departamento de Biologia Animal, Faculdade de Ciências, Universidade de Lisboa, Campo Grande, Portugal
- Hernán E Morales
- Section for Hologenomics, Globe Institute, University of Copenhagen, Copenhagen, Denmark
- Rui Faria
- CIBIO - Centro de Investigação em Biodiversidade e Recursos Genéticos, InBIO, Laboratório Associado, Universidade do Porto, Vairão, Portugal
- BIOPOLIS Program in Genomics, Biodiversity and Land Planning, CIBIO, Vairão, Portugal
- Roger K Butlin
- Ecology and Evolutionary Biology, School of Biosciences, University of Sheffield, Sheffield, UK
- Department of Marine Sciences, University of Gothenburg, Gothenburg, Sweden
- Vítor C Sousa
- cE3c - Centre for Ecology, Evolution and Environmental Changes & CHANGE - Global Change and Sustainability Institute, Departamento de Biologia Animal, Faculdade de Ciências, Universidade de Lisboa, Campo Grande, Portugal
4
Korfmann K, Abu Awad D, Tellier A. Weak seed banks influence the signature and detectability of selective sweeps. J Evol Biol 2023; 36:1282-1294. [PMID: 37551039 DOI: 10.1111/jeb.14204] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/02/2023] [Revised: 06/20/2023] [Accepted: 06/27/2023] [Indexed: 08/09/2023]
Abstract
Seed banking (or dormancy) is a widespread bet-hedging strategy, generating a form of population overlap, which decreases the magnitude of genetic drift. The methodological complexity of integrating this trait implies it is ignored when developing tools to detect selective sweeps. But, as dormancy lengthens the ancestral recombination graph (ARG), increasing times to fixation, it can change the genomic signatures of selection. To detect genes under positive selection in seed banking species it is important to (1) determine whether the efficacy of selection is affected, and (2) predict the patterns of nucleotide diversity at and around positively selected alleles. We present the first tree sequence-based simulation program integrating a weak seed bank to examine the dynamics and genomic footprints of beneficial alleles in a finite population. We find that seed banking does not affect the probability of fixation and confirm expectations of increased times to fixation. We also confirm earlier findings that, for strong selection, the times to fixation are not scaled by the inbreeding effective population size in the presence of seed banks, but are shorter than would be expected. As seed banking increases the effective recombination rate, footprints of sweeps appear narrower around the selected sites and due to the scaling of the ARG are detectable for longer periods of time. The developed simulation tool can be used to predict the footprints of selection and draw statistical inference of past evolutionary events in plants, invertebrates, or fungi with seed banks.
Affiliation(s)
- Kevin Korfmann
- Department of Life Science Systems, School of Life Sciences, Technical University of Munich, München, Germany
- Diala Abu Awad
- Department of Life Science Systems, School of Life Sciences, Technical University of Munich, München, Germany
- Université Paris-Saclay, INRAE, CNRS, AgroParisTech, GQE-Le Moulon, Gif-sur-Yvette, France
- Aurélien Tellier
- Department of Life Science Systems, School of Life Sciences, Technical University of Munich, München, Germany
5
Wei K, Silva-Arias GA, Tellier A. Selective sweeps linked to the colonization of novel habitats and climatic changes in a wild tomato species. The New Phytologist 2023; 237:1908-1921. [PMID: 36419182 DOI: 10.1111/nph.18634] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/12/2022] [Accepted: 11/16/2022] [Indexed: 06/16/2023]
Abstract
Positive selection is the driving force underpinning local adaptation and leaves footprints of selective sweeps on the underlying major genes. Quantifying the timing of selection and revealing the genetic bases of adaptation in plant species occurring in steep and varying environmental gradients are crucial to predict a species' ability to colonize new niches. We use whole-genome sequence data from six populations across three different habitats of the wild tomato species Solanum chilense to infer the past demographic history and search for genes under strong positive selection. We then correlate current and past climatic projections with the demographic history, allele frequencies, the age of selection events and distribution shifts. Several selective sweeps occur at regulatory networks involved in root-hair development in low altitude and response to photoperiod and vernalization in high-altitude populations. These sweeps appear to occur in a concerted fashion in a given regulatory gene network at particular periods of substantial climatic change. Using a unique combination of genome scans and modelling of past climatic data, we quantify the timing of selection at genes likely underpinning local adaptation to semiarid habitats.
Affiliation(s)
- Kai Wei
- Population Genetics, Department of Life Science Systems, School of Life Sciences, Technical University of Munich, Liesel-Beckmann Strasse 2, 85354, Freising, Germany
- Gustavo A Silva-Arias
- Population Genetics, Department of Life Science Systems, School of Life Sciences, Technical University of Munich, Liesel-Beckmann Strasse 2, 85354, Freising, Germany
- Aurélien Tellier
- Population Genetics, Department of Life Science Systems, School of Life Sciences, Technical University of Munich, Liesel-Beckmann Strasse 2, 85354, Freising, Germany
6
Lees JA, Tonkin-Hill G, Yang Z, Corander J. Mandrake: visualizing microbial population structure by embedding millions of genomes into a low-dimensional representation. Philos Trans R Soc Lond B Biol Sci 2022; 377:20210237. [PMID: 35989601 PMCID: PMC9393562 DOI: 10.1098/rstb.2021.0237] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Indexed: 12/26/2022] Open
Abstract
In less than a decade, population genomics of microbes has progressed from the effort of sequencing dozens of strains to thousands, or even tens of thousands of strains in a single study. There are now hundreds of thousands of genomes available even for a single bacterial species, and the number of genomes is expected to continue to increase at an accelerated pace given the advances in sequencing technology and widespread genomic surveillance initiatives. This explosion of data calls for innovative methods to enable rapid exploration of the structure of a population based on different data modalities, such as multiple sequence alignments, assemblies and estimates of gene content across different genomes. Here, we present Mandrake, an efficient implementation of a dimensional reduction method tailored for the needs of large-scale population genomics. Mandrake is capable of visualizing population structure from millions of whole genomes, and we illustrate its usefulness with several datasets representing major pathogens. Our method is freely available both as an analysis pipeline (https://github.com/johnlees/mandrake) and as a browser-based interactive application (https://gtonkinhill.github.io/mandrake-web/). This article is part of a discussion meeting issue ‘Genomic population structures of microbial pathogens’.
Affiliation(s)
- John A Lees
- MRC Centre for Global Infectious Disease Analysis, School of Public Health, Imperial College London, London W2 1PG, UK; European Molecular Biology Laboratory, European Bioinformatics Institute EMBL-EBI, Hinxton CB10 1SD, UK
- Zhirong Yang
- Department of Computer Science, Norwegian University of Science and Technology, 7491 Trondheim, Norway; Aalto University, 02150 Espoo, Finland
- Jukka Corander
- Department of Biostatistics, University of Oslo, 0317 Oslo, Norway; Parasites and Microbes, Wellcome Sanger Institute, Cambridge CB10 1SA, UK; Helsinki Institute for Information Technology HIIT, Department of Mathematics and Statistics, University of Helsinki, 00100 Helsinki, Finland
7
Bendall EE, Bagley RK, Sousa VC, Linnen CR. Faster-haplodiploid evolution under divergence-with-gene-flow: simulations and empirical data from pine-feeding hymenopterans. Mol Ecol 2022; 31:2348-2366. [PMID: 35231148 DOI: 10.1111/mec.16410] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 10/28/2021] [Revised: 02/10/2022] [Accepted: 02/21/2022] [Indexed: 11/28/2022]
Abstract
Although haplodiploidy is widespread in nature, the evolutionary consequences of this mode of reproduction are not well characterized. Here, we examine how genome-wide hemizygosity and a lack of recombination in haploid males affects genomic differentiation in populations that diverge via natural selection while experiencing gene flow. First, we simulated diploid and haplodiploid "genomes" (500-kb loci) evolving under an isolation-with-migration model with mutation, drift, selection, migration, and recombination; and examined differentiation at neutral sites both tightly and loosely linked to a divergently selected site. So long as there is divergent selection and migration, sex-limited hemizygosity and recombination cause elevated differentiation (i.e., produce a "faster-haplodiploid effect") in haplodiploid populations relative to otherwise equivalent diploid populations, for both recessive and codominant mutations. Second, we used genome-wide SNP data to model divergence history and describe patterns of genomic differentiation between sympatric populations of Neodiprion lecontei and N. pinetum, a pair of pine sawfly species (order: Hymenoptera; family: Diprionidae) that are specialized on different pine hosts. These analyses support a history of continuous gene exchange throughout divergence and reveal a pattern of heterogeneous genomic differentiation that is consistent with divergent selection on many unlinked loci. Third, using simulations of haplodiploid and diploid populations evolving according to the estimated divergence history of N. lecontei and N. pinetum, we found that divergent selection would lead to higher differentiation in haplodiploids. Based on these results, we hypothesize that haplodiploids undergo divergence-with-gene-flow and sympatric speciation more readily than diploids.
Affiliation(s)
- Emily E Bendall
- Department of Biology, University of Kentucky, Lexington, Kentucky, 40506, USA; Department of Microbiology and Immunology, University of Michigan, Ann Arbor, Michigan, 48109, USA
- Robin K Bagley
- Department of Biology, University of Kentucky, Lexington, Kentucky, 40506, USA; Department of Evolution, Ecology, and Organismal Biology, The Ohio State University at Lima, Lima, OH, 45804, USA
- Vitor C Sousa
- CE3C - Centre for Ecology, Evolution and Environmental Changes, Department of Animal Biology, Faculdade de Ciências da Universidade de Lisboa, University of Lisbon, Campo Grande 1749-016, Lisboa, Portugal
- Catherine R Linnen
- Department of Biology, University of Kentucky, Lexington, Kentucky, 40506, USA
8
Dittberner H, Tellier A, de Meaux J. Approximate Bayesian computation untangles signatures of contemporary and historical hybridization between two endangered species. Mol Biol Evol 2022; 39:6516021. [PMID: 35084503 PMCID: PMC8826969 DOI: 10.1093/molbev/msac015] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/12/2022] Open
Abstract
Contemporary gene flow, when resumed after a period of isolation, can have crucial consequences for endangered species, as it can both increase the supply of adaptive alleles and erode local adaptation. Determining the history of gene flow and thus the importance of contemporary hybridization, however, is notoriously difficult. Here, we focus on two endangered plant species, Arabis nemorensis and A. sagittata, which hybridize naturally in a sympatric population located on the banks of the Rhine. Using reduced genome sequencing, we determined the phylogeography of the two taxa but report only a unique sympatric population. Molecular variation in chloroplast DNA indicated that A. sagittata is the principal receiver of gene flow. Applying classical D-statistics and its derivatives to whole-genome data of 35 accessions, we detect gene flow not only in the sympatric population but also among allopatric populations. Using an Approximate Bayesian computation approach, we identify the model that best describes the history of gene flow between these taxa. This model shows that low levels of gene flow have persisted long after speciation. Around 10 000 years ago, gene flow stopped and a period of complete isolation began. Eventually, a hotspot of contemporary hybridization was formed in the unique sympatric population. Occasional sympatry may have helped protect these lineages from extinction in spite of their extremely low diversity.
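The classical D-statistic (ABBA-BABA test) referred to in this abstract compares two discordant site patterns across four taxa: D = (ABBA - BABA) / (ABBA + BABA). The sketch below computes it from single-haplotype site patterns. It is a simplified illustration, not the authors' pipeline; the taxon labels (P1, P2, P3, outgroup), the 0/1 allele coding, and the toy sites are assumptions, and real analyses typically work from allele frequencies and assess significance with a block jackknife.

```python
def patterson_d(sites):
    """Patterson's D from biallelic sites coded 0 (ancestral) / 1 (derived).

    Each site is a tuple (p1, p2, p3, outgroup). Only sites where the
    outgroup is ancestral and P3 is derived can be ABBA or BABA.
    """
    abba = baba = 0
    for p1, p2, p3, out in sites:
        if out != 0 or p3 != 1:
            continue                      # pattern cannot be ABBA or BABA
        if p1 == 0 and p2 == 1:
            abba += 1                     # derived allele shared by P2 and P3
        elif p1 == 1 and p2 == 0:
            baba += 1                     # derived allele shared by P1 and P3
    if abba + baba == 0:
        raise ValueError("no ABBA or BABA sites found")
    return (abba - baba) / (abba + baba)


# Hypothetical toy data: three ABBA sites, one BABA site, two uninformative.
toy_sites = [
    (0, 1, 1, 0),
    (0, 1, 1, 0),
    (0, 1, 1, 0),
    (1, 0, 1, 0),
    (1, 1, 1, 0),
    (0, 0, 1, 0),
]
d_value = patterson_d(toy_sites)
```

Under incomplete lineage sorting alone the two patterns are expected in equal numbers, so D is near zero; an excess of one pattern (here D = 0.5 on the toy data) is the signal usually interpreted as gene flow.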
Affiliation(s)
- Hannes Dittberner
- Institute of Plant Sciences, University of Cologne, Zülpicher str. 47b, Germany
- Aurelien Tellier
- Department of Life Science Systems, Technical University of Munich, Freising, Germany
- Juliette de Meaux
- Institute of Plant Sciences, University of Cologne, Zülpicher str. 47b, Germany
9
Mueller JC, Botero-Delgadillo E, Espíndola-Hernández P, Gilsenan C, Ewels P, Gruselius J, Kempenaers B. Local selection signals in the genome of Blue tits emphasize regulatory and neuronal evolution. Mol Ecol 2022; 31:1504-1514. [PMID: 34995389 DOI: 10.1111/mec.16345] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/31/2021] [Revised: 11/18/2021] [Accepted: 12/15/2021] [Indexed: 11/30/2022]
Abstract
Understanding the genomic landscape of adaptation is central to the understanding of microevolution in wild populations. Genomic targets of selection and the underlying genomic mechanisms of adaptation can be elucidated by genome-wide scans for past selective sweeps or by scans for direct fitness associations. We sequenced and assembled 150 haplotypes of 75 Blue tits (Cyanistes caeruleus) of a single central-European population by a linked-read technology. We used these genome data in combination with coalescent simulations (1) to estimate an historical effective population size of ~250,000, which recently declined to ~10,000, and (2) to identify genome-wide distributed selective sweeps of beneficial variants most likely originating from standing genetic variation (soft sweeps). The genes linked to these soft sweeps, but also the ones linked to hard sweeps based on new beneficial mutants, showed a significant enrichment for functions associated with gene expression and transcription regulation. This emphasizes the importance of regulatory evolution in the population's adaptive history. Soft sweeps were further enriched for genes related to axon and synapse development, indicating the significance of neuronal connectivity changes in the brain potentially linked to behavioural adaptations. A previous scan of heterozygosity-fitness correlations revealed a consistent negative effect on arrival date at the breeding site for a single microsatellite in the MDGA2 gene. Here, we used the haplotype structure around this microsatellite to explain the effect as a local and direct outbreeding effect of a gene involved in synapse development.
Affiliation(s)
- Jakob C Mueller
- Department of Behavioural Ecology and Evolutionary Genetics, Max Planck Institute for Ornithology, Seewiesen, Germany
- Esteban Botero-Delgadillo
- Department of Behavioural Ecology and Evolutionary Genetics, Max Planck Institute for Ornithology, Seewiesen, Germany
- Pamela Espíndola-Hernández
- Department of Behavioural Ecology and Evolutionary Genetics, Max Planck Institute for Ornithology, Seewiesen, Germany
- Carol Gilsenan
- Department of Behavioural Ecology and Evolutionary Genetics, Max Planck Institute for Ornithology, Seewiesen, Germany
- Phil Ewels
- Science for Life Laboratory (SciLifeLab), Department of Biochemistry and Biophysics, Stockholm University, Stockholm, Sweden
- Joel Gruselius
- Science for Life Laboratory, Department of Biosciences and Nutrition, Karolinska Institutet, Stockholm, Sweden; current address: Vanadis Diagnostics, PerkinElmer, Sollentuna, Sweden
- Bart Kempenaers
- Department of Behavioural Ecology and Evolutionary Genetics, Max Planck Institute for Ornithology, Seewiesen, Germany
10
Baumdicker F, Bisschop G, Goldstein D, Gower G, Ragsdale AP, Tsambos G, Zhu S, Eldon B, Ellerman EC, Galloway JG, Gladstein AL, Gorjanc G, Guo B, Jeffery B, Kretzschmar WW, Lohse K, Matschiner M, Nelson D, Pope NS, Quinto-Cortés CD, Rodrigues MF, Saunack K, Sellinger T, Thornton K, van Kemenade H, Wohns AW, Wong Y, Gravel S, Kern AD, Koskela J, Ralph PL, Kelleher J. Efficient ancestry and mutation simulation with msprime 1.0. Genetics 2021; 220:6460344. [PMID: 34897427 PMCID: PMC9176297 DOI: 10.1093/genetics/iyab229] [Citation(s) in RCA: 84] [Impact Index Per Article: 28.0] [Received: 09/22/2021] [Accepted: 12/03/2021] [Indexed: 11/13/2022] Open
Abstract
Stochastic simulation is a key tool in population genetics, since the models involved are often analytically intractable and simulation is usually the only way of obtaining ground-truth data to evaluate inferences. Because of this, a large number of specialized simulation programs have been developed, each filling a particular niche, but with largely overlapping functionality and a substantial duplication of effort. Here, we introduce msprime version 1.0, which efficiently implements ancestry and mutation simulations based on the succinct tree sequence data structure and the tskit library. We summarize msprime’s many features, and show that its performance is excellent, often many times faster and more memory efficient than specialized alternatives. These high-performance features have been thoroughly tested and validated, and built using a collaborative, open source development model, which reduces duplication of effort and promotes software quality via community engagement.
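To give a sense of what an ancestry simulation involves, here is a pure-Python sketch of the standard (Kingman) coalescent for a single non-recombining locus. It does not use or imitate the msprime API; the function name, sample size, and replicate count are illustrative assumptions, and time is measured in coalescent units.

```python
import random


def simulate_tmrca(n_samples, rng):
    """Time to the most recent common ancestor of `n_samples` lineages.

    While k lineages remain, the waiting time to the next coalescence is
    exponential with rate k * (k - 1) / 2 (in coalescent time units).
    """
    total_time = 0.0
    k = n_samples
    while k > 1:
        rate = k * (k - 1) / 2
        total_time += rng.expovariate(rate)   # wait until two lineages merge
        k -= 1                                # one fewer lineage remains
    return total_time


rng = random.Random(2021)
n = 10
replicates = [simulate_tmrca(n, rng) for _ in range(20000)]
mean_tmrca = sum(replicates) / len(replicates)
expected_tmrca = 2 * (1 - 1 / n)   # known analytical expectation
```

Comparing the simulated mean with the analytical expectation 2(1 - 1/n) is the kind of ground-truth check the abstract alludes to. Dedicated simulators add recombination, demography, mutation models and efficient tree storage on top of this basic process.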
Affiliation(s)
- Franz Baumdicker
- Cluster of Excellence "Controlling Microbes to Fight Infections", Mathematical and Computational Population Genetics, University of Tübingen, 72076 Tübingen, Germany
- Gertjan Bisschop
- Institute of Evolutionary Biology, The University of Edinburgh, EH9 3FL, UK
- Daniel Goldstein
- Khoury College of Computer Sciences, Northeastern University, MA 02115, USA; No affiliation
- Graham Gower
- Lundbeck GeoGenetics Centre, Globe Institute, University of Copenhagen, 1350 Copenhagen K, Denmark
- Aaron P Ragsdale
- Department of Integrative Biology, University of Wisconsin-Madison, WI 53706, USA
- Georgia Tsambos
- Melbourne Integrative Genomics, School of Mathematics and Statistics, University of Melbourne, Victoria, 3010, Australia
- Sha Zhu
- Big Data Institute, Li Ka Shing Centre for Health Information and Discovery, University of Oxford, OX3 7LF, UK
- Bjarki Eldon
- Leibniz Institute for Evolution and Biodiversity Science, Museum für Naturkunde Berlin, 10115, Germany
- Jared G Galloway
- Institute of Ecology and Evolution, Department of Biology, University of Oregon, OR 97403-5289, USA; Computational Biology Program, Fred Hutchinson Cancer Research Center, Seattle, WA 98102, USA
- Ariella L Gladstein
- Department of Genetics, University of North Carolina at Chapel Hill, NC 27599-7264, USA; Embark Veterinary, Inc., Boston, MA 02111, USA
- Gregor Gorjanc
- The Roslin Institute and Royal (Dick) School of Veterinary Studies, University of Edinburgh, EH25 9RG, UK
- Bing Guo
- Institute for Genome Sciences, University of Maryland School of Medicine, Baltimore, MD, 21201, USA
- Ben Jeffery
- Big Data Institute, Li Ka Shing Centre for Health Information and Discovery, University of Oxford, OX3 7LF, UK
- Warren W Kretzschmar
- Center for Hematology and Regenerative Medicine, Karolinska Institute, 141 83 Huddinge, Sweden
- Konrad Lohse
- Institute of Evolutionary Biology, The University of Edinburgh, EH9 3FL, UK
- Dominic Nelson
- Department of Human Genetics, McGill University, Montréal, QC H3A 0C7, Canada
- Nathaniel S Pope
- Department of Entomology, Pennsylvania State University, PA 16802, USA
- Consuelo D Quinto-Cortés
- National Laboratory of Genomics for Biodiversity (LANGEBIO), Unit of Advanced Genomics, CINVESTAV, Irapuato, Mexico
- Murillo F Rodrigues
- Institute of Ecology and Evolution, Department of Biology, University of Oregon, OR 97403-5289, USA
- Kumar Saunack
- IIT Bombay, Powai, Mumbai 400 076, Maharashtra, India
- Thibaut Sellinger
- Professorship for Population Genetics, Department of Life Science Systems, Technical University of Munich, 85354 Freising, Germany
- Kevin Thornton
- Ecology and Evolutionary Biology, University of California, Irvine, CA 92697, USA
- Anthony W Wohns
- Big Data Institute, Li Ka Shing Centre for Health Information and Discovery, University of Oxford, OX3 7LF, UK; Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA
- Yan Wong
- Big Data Institute, Li Ka Shing Centre for Health Information and Discovery, University of Oxford, OX3 7LF, UK
- Simon Gravel
- Department of Human Genetics, McGill University, Montréal, QC H3A 0C7, Canada
- Andrew D Kern
- Institute of Ecology and Evolution, Department of Biology, University of Oregon, OR 97403-5289, USA
- Jere Koskela
- Department of Statistics, University of Warwick, CV4 7AL, UK
- Peter L Ralph
- Institute of Ecology and Evolution, Department of Biology, University of Oregon, OR 97403-5289, USA; Department of Mathematics, University of Oregon, OR 97403-5289, USA
- Jerome Kelleher
- Big Data Institute, Li Ka Shing Centre for Health Information and Discovery, University of Oxford, OX3 7LF, UK
11
Isshiki M, Naka I, Kimura R, Nishida N, Furusawa T, Natsuhara K, Yamauchi T, Nakazawa M, Ishida T, Inaoka T, Matsumura Y, Ohtsuka R, Ohashi J. Admixture with indigenous people helps local adaptation: admixture-enabled selection in Polynesians. BMC Ecol Evol 2021; 21:179. [PMID: 34551727 PMCID: PMC8456657 DOI: 10.1186/s12862-021-01900-y] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 01/12/2021] [Accepted: 08/25/2021] [Indexed: 01/08/2023] Open
Abstract
Background: Homo sapiens have experienced admixture many times in the last few thousand years. To examine how admixture affects local adaptation, we investigated genomes of modern Polynesians, who are shaped through admixture between Austronesian-speaking people from Southeast Asia (Asian-related ancestors) and indigenous people in Near Oceania (Papuan-related ancestors). Methods: In this study local ancestry was estimated across the genome in Polynesians (23 Tongan subjects) to find the candidate regions of admixture-enabled selection contributed by Papuan-related ancestors. Results: The mean proportion of Papuan-related ancestry across the Polynesian genome was estimated as 24.6% (SD = 8.63%), and two genomic regions, the extended major histocompatibility complex (xMHC) region on chromosome 6 and the ATP-binding cassette transporter sub-family C member 11 (ABCC11) gene on chromosome 16, showed proportions of Papuan-related ancestry more than 5 SD greater than the mean (> 67.8%). The coalescent simulation under the assumption of selective neutrality suggested that such signals of Papuan-related ancestry enrichment were caused by positive selection after admixture (false discovery rate = 0.045). The ABCC11 harbors a nonsynonymous SNP, rs17822931, which affects apocrine secretory cell function. The approximate Bayesian computation indicated that, in Polynesian ancestors, a strong positive selection (s = 0.0217) acted on the ancestral allele of rs17822931 derived from Papuan-related ancestors. Conclusions: Our results suggest that admixture with Papuan-related ancestors contributed to the rapid local adaptation of Polynesian ancestors. Considering frequent admixture events in human evolution history, the acceleration of local adaptation through admixture should be a common event in humans. Supplementary Information: The online version contains supplementary material available at 10.1186/s12862-021-01900-y.
Affiliation(s)
- Mariko Isshiki
- Department of Biological Sciences, Graduate School of Science, The University of Tokyo, Tokyo, 113-0033, Japan
- Izumi Naka
- Department of Biological Sciences, Graduate School of Science, The University of Tokyo, Tokyo, 113-0033, Japan
- Ryosuke Kimura
- Department of Human Biology and Anatomy, Graduate School of Medicine, University of the Ryukyus, Nishihara, 903-0125, Japan
- Nao Nishida
- Genome Medical Science Project, Research Center for Hepatitis and Immunology, National Center for Global Health and Medicine, Chiba, 272-8516, Japan
- Takuro Furusawa
- Graduate School of Asian and African Area Studies, Kyoto University, Kyoto, 606-8501, Japan
- Kazumi Natsuhara
- Department of International Health and Nursing, Faculty of Nursing, Toho University, Tokyo, 143-0015, Japan
- Taro Yamauchi
- Faculty of Health Sciences, Hokkaido University, Sapporo, 060-0812, Japan
- Minato Nakazawa
- Graduate School of Health Sciences, Kobe University, Kobe, 654-0142, Japan
- Takafumi Ishida
- Department of Biological Sciences, Graduate School of Science, The University of Tokyo, Tokyo, 113-0033, Japan
- Tsukasa Inaoka
- Department of Human Ecology, Faculty of Agriculture, Saga University, Saga, 840-8502, Japan
- Yasuhiro Matsumura
- Faculty of Health and Nutrition, Bunkyo University, Chigasaki, 253-8550, Japan
- Jun Ohashi
- Department of Biological Sciences, Graduate School of Science, The University of Tokyo, Tokyo, 113-0033, Japan
12
Weller CA, Tilk S, Rajpurohit S, Bergland AO. Accurate, ultra-low coverage genome reconstruction and association studies in Hybrid Swarm mapping populations. G3-GENES GENOMES GENETICS 2021; 11:6156828. [PMID: 33677482 PMCID: PMC8759814 DOI: 10.1093/g3journal/jkab062] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 06/11/2020] [Accepted: 02/19/2021] [Indexed: 11/27/2022]
Abstract
Genetic association studies seek to uncover the link between genotype and phenotype, and often utilize inbred reference panels as a replicable source of genetic variation. However, inbred reference panels can differ substantially from wild populations in their genotypic distribution, patterns of linkage disequilibrium, and nucleotide diversity. As a result, associations discovered using inbred reference panels may not reflect the genetic basis of phenotypic variation in natural populations. To address this problem, we evaluated a mapping population design where dozens to hundreds of inbred lines are outbred for a few generations, which we call the Hybrid Swarm. The Hybrid Swarm approach has likely remained underutilized relative to pre-sequenced inbred lines due to the costs of genome-wide genotyping. To reduce sequencing costs and make the Hybrid Swarm approach feasible, we developed a computational pipeline that reconstructs accurate whole genomes from ultra-low-coverage (0.05X) sequence data in Hybrid Swarm populations derived from ancestors with phased haplotypes. We evaluate reconstructions using genetic variation from the Drosophila Genetic Reference Panel as well as variation from neutral simulations. We compared the power and precision of Genome-Wide Association Studies using the Hybrid Swarm, inbred lines, recombinant inbred lines (RILs), and highly outbred populations across a range of allele frequencies, effect sizes, and genetic architectures. Our simulations show that these different mapping panels vary in their power and precision, largely depending on the architecture of the trait. The Hybrid Swarm and RILs outperform inbred lines for quantitative traits, but not for monogenic ones. Taken together, our results demonstrate the feasibility of the Hybrid Swarm as a cost-effective method of fine-scale genetic mapping.
Affiliation(s)
- Cory A Weller
- Department of Biology, University of Virginia, Charlottesville, VA 22904, USA
- Susanne Tilk
- Department of Biology, Stanford University, Stanford, CA 94305, USA
- Subhash Rajpurohit
- Department of Biological and Life Sciences, Ahmedabad University, Ahmedabad 380009, India
- Alan O Bergland
- Department of Biology, University of Virginia, Charlottesville, VA 22904, USA
13
Schaefer NK, Shapiro B, Green RE. An ancestral recombination graph of human, Neanderthal, and Denisovan genomes. SCIENCE ADVANCES 2021; 7:eabc0776. [PMID: 34272242 PMCID: PMC8284891 DOI: 10.1126/sciadv.abc0776] [Citation(s) in RCA: 32] [Impact Index Per Article: 10.7] [Received: 04/04/2020] [Accepted: 06/03/2021] [Indexed: 05/02/2023]
Abstract
Many humans carry genes from Neanderthals, a legacy of past admixture. Existing methods detect this archaic hominin ancestry within human genomes using patterns of linkage disequilibrium or direct comparison to Neanderthal genomes. Each of these methods is limited in sensitivity and scalability. We describe a new ancestral recombination graph inference algorithm that scales to large genome-wide datasets and demonstrate its accuracy on real and simulated data. We then generate a genome-wide ancestral recombination graph including human and archaic hominin genomes. From this, we generate a map within human genomes of archaic ancestry and of genomic regions not shared with archaic hominins either by admixture or incomplete lineage sorting. We find that only 1.5 to 7% of the modern human genome is uniquely human. We also find evidence of multiple bursts of adaptive changes specific to modern humans within the past 600,000 years involving genes related to brain development and function.
Affiliation(s)
- Nathan K Schaefer
- Howard Hughes Medical Institute, University of California, Santa Cruz, Santa Cruz, CA 95064, USA
- Department of Ecology and Evolutionary Biology, University of California, Santa Cruz, Santa Cruz, CA 95064, USA
- Genomics Institute, University of California, Santa Cruz, Santa Cruz, CA 95064, USA
- Beth Shapiro
- Howard Hughes Medical Institute, University of California, Santa Cruz, Santa Cruz, CA 95064, USA
- Department of Ecology and Evolutionary Biology, University of California, Santa Cruz, Santa Cruz, CA 95064, USA
- Genomics Institute, University of California, Santa Cruz, Santa Cruz, CA 95064, USA
- Richard E Green
- Genomics Institute, University of California, Santa Cruz, Santa Cruz, CA 95064, USA
- Department of Biomolecular Engineering, University of California, Santa Cruz, Santa Cruz, CA 95064, USA
14
Cui R, Tyers AM, Malubhoy ZJ, Wisotsky S, Valdesalici S, Henriette E, Kosakovsky Pond SL, Valenzano DR. Ancestral transoceanic colonization and recent population reduction in a nonannual killifish from the Seychelles archipelago. Mol Ecol 2021; 30:3610-3623. [PMID: 33998095 DOI: 10.1111/mec.15982] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Received: 06/26/2020] [Revised: 04/29/2021] [Accepted: 05/10/2021] [Indexed: 12/28/2022]
Abstract
Whether freshwater fish colonize remote islands following tectonic or transoceanic dispersal remains an evolutionary puzzle. Integrating dating of known tectonic events with phylogenomics and current species distribution, we find that killifish species distribution is not explained by dispersal through tectonic drift alone. Investigating the colonization of a nonannual killifish (golden panchax, Pachypanchax playfairii) on the Seychelles islands, we found genetic support for transoceanic dispersal and experimentally discovered an adaptation to complete tolerance to seawater. At the macroevolutionary scale, despite their long-lasting isolation, nonannual golden panchax show stronger genome-wide purifying selection than annual killifishes from continental Africa. However, progressive decline in effective population size over a more recent timescale has probably led to the segregation of slightly deleterious mutations across golden panchax populations, which represents a potential threat for species preservation in the long term.
Affiliation(s)
- Rongfeng Cui
- Max Planck Institute for Biology of Ageing, Cologne, Germany; School of Ecology, Sun Yat-sen University, Guangzhou, China
- Sadie Wisotsky
- Department of Biology, Institute for Genomics and Evolutionary Medicine, Temple University, Temple, CA, USA
- Elvina Henriette
- Island Biodiversity Conservation Centre, University of Seychelles, Anse Royale, Mahe, Seychelles
- Sergei L Kosakovsky Pond
- Department of Biology, Institute for Genomics and Evolutionary Medicine, Temple University, Temple, CA, USA
- Dario Riccardo Valenzano
- Max Planck Institute for Biology of Ageing, Cologne, Germany; CECAD, University of Cologne, Cologne, Germany
15
Bourgeois YXC, Warren BH. An overview of current population genomics methods for the analysis of whole-genome resequencing data in eukaryotes. Mol Ecol 2021; 30:6036-6071. [PMID: 34009688 DOI: 10.1111/mec.15989] [Citation(s) in RCA: 16] [Impact Index Per Article: 5.3] [Received: 12/30/2020] [Revised: 04/26/2021] [Accepted: 05/11/2021] [Indexed: 01/01/2023]
Abstract
Characterizing the population history of a species and identifying loci underlying local adaptation is crucial in functional ecology, evolutionary biology, conservation and agronomy. The constant improvement of high-throughput sequencing techniques has facilitated the production of whole genome data in a wide range of species. Population genomics now provides tools to better integrate selection into a historical framework, and take into account selection when reconstructing demographic history. However, this improvement has come with a profusion of analytical tools that can confuse and discourage users. Such confusion limits the amount of information effectively retrieved from complex genomic data sets, and impairs the diffusion of the most recent analytical tools into fields such as conservation biology. It may also lead to redundancy among methods. To address these issues, we propose an overview of more than 100 state-of-the-art methods that can deal with whole genome data. We summarize the strategies they use to infer demographic history and selection, and discuss some of their limitations. A website listing these methods is available at www.methodspopgen.com.
Affiliation(s)
- Ben H Warren
- Institut de Systématique, Evolution, Biodiversité (ISYEB), Muséum National d'Histoire Naturelle, CNRS, Sorbonne Université, EPHE, UA, CP 51, Paris, France
16
Sellinger TPP, Abu-Awad D, Tellier A. Limits and convergence properties of the sequentially Markovian coalescent. Mol Ecol Resour 2021; 21:2231-2248. [PMID: 33978324 DOI: 10.1111/1755-0998.13416] [Citation(s) in RCA: 7] [Impact Index Per Article: 2.3] [Received: 12/04/2020] [Revised: 04/19/2021] [Accepted: 04/29/2021] [Indexed: 02/07/2023]
Abstract
Several methods based on the sequentially Markovian coalescent (SMC) make use of full genome sequence data from samples to infer population demographic history including past changes in population size, admixture, migration events and population structure. More recently, the original theoretical framework has been extended to allow the simultaneous estimation of population size changes along with other life history traits such as selfing or seed banking. The latter developments enhance the applicability of SMC methods to nonmodel species. Although convergence proofs have been given using simulated data in a few specific cases, an in-depth investigation of the limitations of SMC methods is lacking. In order to explore such limits, we first develop a tool inferring the best case convergence of SMC methods assuming the true underlying coalescent genealogies are known. This tool can be used to quantify the amount and type of information that can be confidently retrieved from given data sets prior to the analysis of the real data. Second, we assess the inference accuracy when the assumptions of SMC approaches are violated due to departures from the model, namely the presence of transposable elements, variable recombination and mutation rates along the sequence, and SNP calling errors. Third, we deliver a new interpretation of SMC methods by highlighting the importance of the transition matrix, which we argue can be used as a set of summary statistics in other statistical inference methods, uncoupling the SMC from hidden Markov models (HMMs). We finally offer recommendations to better apply SMC methods and build adequate data sets under budget constraints.
Affiliation(s)
- Diala Abu-Awad
- Department of Life Science Systems, Technical University of Munich, Munchen, Germany
- Aurélien Tellier
- Department of Life Science Systems, Technical University of Munich, Munchen, Germany
17
Mualim K, Theunert C, Slatkin M. Estimation of coalescence probabilities and population divergence times from SNP data. Heredity (Edinb) 2021; 127:1-9. [PMID: 33934123 PMCID: PMC8249664 DOI: 10.1038/s41437-021-00435-8] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 10/06/2020] [Revised: 04/07/2021] [Accepted: 04/08/2021] [Indexed: 12/02/2022] Open
Abstract
We present a method called the G(A|B) method for estimating coalescence probabilities within population lineages from genome sequences when one individual is sampled from each population. Population divergence times can be estimated from these coalescence probabilities if additional assumptions about the history of population sizes are made. Our method is based on a method presented by Rasmussen et al. (2014) to test whether an archaic genome is from a population directly ancestral to a present-day population. The G(A|B) method does not require distinguishing ancestral from derived alleles or assumptions about demographic history before population divergence. We discuss the relationship of our method to two similar methods, one introduced by Green et al. (2010) and called the F(A|B) method and the other introduced by Schlebusch et al. (2017) and called the TT method. When our method is applied to individuals from three or more populations, it provides a test of whether the population history is treelike because coalescence probabilities are additive on a tree. We illustrate the use of our method by applying it to three high-coverage archaic genomes, two Neanderthals (Vindija and Altai) and a Denisovan.
Affiliation(s)
- Kristy Mualim
- Department of Genetics, Stanford University School of Medicine, Stanford, CA, USA
- Christoph Theunert
- Department of Integrative Biology, University of California, Berkeley, CA, USA; mewedo Ltd., Leipzig, Germany
- Montgomery Slatkin
- Department of Integrative Biology, University of California, Berkeley, CA, USA
18
Liu X. Human Prehistoric Demography Revealed by the Polymorphic Pattern of CpG Transitions. Mol Biol Evol 2021; 37:2691-2698. [PMID: 32369585 DOI: 10.1093/molbev/msaa112] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Indexed: 01/10/2023] Open
Abstract
The prehistoric demography of human populations is an essential piece of information for illustrating our evolution. Despite its importance and the advancement of ancient DNA studies, our knowledge of human evolution is still limited, which is also the case for relatively recent population dynamics during and around the Holocene. Here, we inferred detailed demographic histories from 1 to 40 ka for 24 population samples using an improved model-flexible method with 36 million genome-wide noncoding CpG sites. Our results showed many population growth events that were likely due to the Neolithic Revolution (i.e., the shift from hunting and gathering to agriculture and settlement). Our results help to provide a clearer picture of human prehistoric demography, confirming the significant impact of agriculture on population expansion, and provide new hypotheses and directions for future research.
Affiliation(s)
- Xiaoming Liu
- USF Genomics & College of Public Health, University of South Florida, Tampa, FL
19
Henderson D, Zhu S(J), Cole CB, Lunter G. Demographic inference from multiple whole genomes using a particle filter for continuous Markov jump processes. PLoS One 2021; 16:e0247647. [PMID: 33651801 PMCID: PMC7924771 DOI: 10.1371/journal.pone.0247647] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 05/04/2020] [Accepted: 02/10/2021] [Indexed: 12/12/2022] Open
Abstract
Demographic events shape a population's genetic diversity, a process described by the coalescent-with-recombination model that relates demography and genetics by an unobserved sequence of genealogies along the genome. As the space of genealogies over genomes is large and complex, inference under this model is challenging. Formulating the coalescent-with-recombination model as a continuous-time and -space Markov jump process, we develop a particle filter for such processes, and use waypoints that under appropriate conditions allow the problem to be reduced to the discrete-time case. To improve inference, we generalise the Auxiliary Particle Filter for discrete-time models, and use Variational Bayes to model the uncertainty in parameter estimates for rare events, avoiding biases seen with Expectation Maximization. Using real and simulated genomes, we show that past population sizes can be accurately inferred over a larger range of epochs than was previously possible, opening the possibility of jointly analyzing multiple genomes under complex demographic models. Code is available at https://github.com/luntergroup/smcsmc.
Affiliation(s)
- Sha (Joe) Zhu
- Wellcome Centre for Human Genetics, Oxford, United Kingdom
- Big Data Institute, Oxford, United Kingdom
- Christopher B. Cole
- MRC Weatherall Institute of Molecular Medicine, John Radcliffe Hospital, Headington, Oxford, United Kingdom
- Gerton Lunter
- MRC Weatherall Institute of Molecular Medicine, John Radcliffe Hospital, Headington, Oxford, United Kingdom
- Department of Epidemiology, University Medical Center Groningen, University of Groningen, Groningen, The Netherlands
20
Gu Z, Pan S, Lin Z, Hu L, Dai X, Chang J, Xue Y, Su H, Long J, Sun M, Ganusevich S, Sokolov V, Sokolov A, Pokrovsky I, Ji F, Bruford MW, Dixon A, Zhan X. Climate-driven flyway changes and memory-based long-distance migration. Nature 2021; 591:259-264. [PMID: 33658718 DOI: 10.1038/s41586-021-03265-0] [Citation(s) in RCA: 26] [Impact Index Per Article: 8.7] [Received: 01/22/2020] [Accepted: 01/20/2021] [Indexed: 01/31/2023]
Abstract
Millions of migratory birds occupy seasonally favourable breeding grounds in the Arctic, but we know little about the formation, maintenance and future of the migration routes of Arctic birds and the genetic determinants of migratory distance. Here we established a continental-scale migration system that used satellite tracking to follow 56 peregrine falcons (Falco peregrinus) from 6 populations that breed in the Eurasian Arctic, and resequenced 35 genomes from 4 of these populations. The breeding populations used five migration routes across Eurasia, which were probably formed by longitudinal and latitudinal shifts in their breeding grounds during the transition from the Last Glacial Maximum to the Holocene epoch. Contemporary environmental divergence between the routes appears to maintain their distinctiveness. We found that the gene ADCY8 is associated with population-level differences in migratory distance. We investigated the regulatory mechanism of this gene, and found that long-term memory was the most likely selective agent for divergence in ADCY8 among the peregrine populations. Global warming is predicted to influence migration strategies and diminish the breeding ranges of peregrine populations of the Eurasian Arctic. Harnessing ecological interactions and evolutionary processes to study climate-driven changes in migration can facilitate the conservation of migratory birds.
Affiliation(s)
- Zhongru Gu
- Key Laboratory of Animal Ecology and Conservation Biology, Institute of Zoology, Chinese Academy of Sciences, Beijing, China; Cardiff University-Institute of Zoology Joint Laboratory for Biocomplexity Research, Chinese Academy of Sciences, Beijing, China; University of the Chinese Academy of Sciences, Beijing, China
- Shengkai Pan
- Key Laboratory of Animal Ecology and Conservation Biology, Institute of Zoology, Chinese Academy of Sciences, Beijing, China; Cardiff University-Institute of Zoology Joint Laboratory for Biocomplexity Research, Chinese Academy of Sciences, Beijing, China
- Zhenzhen Lin
- Key Laboratory of Animal Ecology and Conservation Biology, Institute of Zoology, Chinese Academy of Sciences, Beijing, China; Cardiff University-Institute of Zoology Joint Laboratory for Biocomplexity Research, Chinese Academy of Sciences, Beijing, China
- Li Hu
- Key Laboratory of Animal Ecology and Conservation Biology, Institute of Zoology, Chinese Academy of Sciences, Beijing, China; Cardiff University-Institute of Zoology Joint Laboratory for Biocomplexity Research, Chinese Academy of Sciences, Beijing, China; University of the Chinese Academy of Sciences, Beijing, China
- Xiaoyang Dai
- School of Biological Sciences, University of Bristol, Bristol, UK
- Jiang Chang
- State Key Laboratory of Environmental Criteria and Risk Assessment, Chinese Research Academy of Environmental Sciences, Beijing, China
- Yuanchao Xue
- Key Laboratory of RNA Biology, CAS Center for Excellence in Biomacromolecules, Institute of Biophysics, Chinese Academy of Sciences, Beijing, China
- Han Su
- Key Laboratory of Animal Ecology and Conservation Biology, Institute of Zoology, Chinese Academy of Sciences, Beijing, China; Cardiff University-Institute of Zoology Joint Laboratory for Biocomplexity Research, Chinese Academy of Sciences, Beijing, China; University of the Chinese Academy of Sciences, Beijing, China
- Juan Long
- Key Laboratory of Animal Ecology and Conservation Biology, Institute of Zoology, Chinese Academy of Sciences, Beijing, China; Cardiff University-Institute of Zoology Joint Laboratory for Biocomplexity Research, Chinese Academy of Sciences, Beijing, China; University of the Chinese Academy of Sciences, Beijing, China
- Mengru Sun
- University of the Chinese Academy of Sciences, Beijing, China; Key Laboratory of RNA Biology, CAS Center for Excellence in Biomacromolecules, Institute of Biophysics, Chinese Academy of Sciences, Beijing, China
- Vasiliy Sokolov
- Institute of Plant and Animal Ecology, Ural Division Russian Academy of Sciences, Ekaterinburg, Russia
- Aleksandr Sokolov
- Arctic Research Station of the Institute of Plant and Animal Ecology, Ural Division Russian Academy of Sciences, Labytnangi, Russia
- Ivan Pokrovsky
- Arctic Research Station of the Institute of Plant and Animal Ecology, Ural Division Russian Academy of Sciences, Labytnangi, Russia; Department of Migration, Max Planck Institute of Animal Behavior, Radolfzell, Germany; Laboratory of Ornithology, Institute of Biological Problems of the North FEB RAS, Magadan, Russia
- Fen Ji
- State Key Laboratory of Stem Cell and Reproductive Biology, Institute of Zoology, Chinese Academy of Sciences, Beijing, China
- Michael W Bruford
- Cardiff University-Institute of Zoology Joint Laboratory for Biocomplexity Research, Chinese Academy of Sciences, Beijing, China; School of Biosciences and Sustainable Places Institute, Cardiff University, Cardiff, UK
- Andrew Dixon
- Cardiff University-Institute of Zoology Joint Laboratory for Biocomplexity Research, Chinese Academy of Sciences, Beijing, China; Emirates Falconers' Club, Abu Dhabi, United Arab Emirates; Reneco International Wildlife Consultants, Abu Dhabi, United Arab Emirates; International Wildlife Consultants, Carmarthen, UK
- Xiangjiang Zhan
- Key Laboratory of Animal Ecology and Conservation Biology, Institute of Zoology, Chinese Academy of Sciences, Beijing, China; Cardiff University-Institute of Zoology Joint Laboratory for Biocomplexity Research, Chinese Academy of Sciences, Beijing, China; University of the Chinese Academy of Sciences, Beijing, China; Center for Excellence in Animal Evolution and Genetics, Chinese Academy of Sciences, Kunming, China
21
Nell LA. jackalope: A swift, versatile phylogenomic and high-throughput sequencing simulator. Mol Ecol Resour 2020; 20:1132-1140. [DOI: 10.1111/1755-0998.13173] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Received: 05/25/2019] [Revised: 02/28/2020] [Accepted: 04/15/2020] [Indexed: 11/30/2022]
Affiliation(s)
- Lucas A. Nell
- Department of Integrative Biology, University of Wisconsin, Madison, WI, USA
22
Linck EB, Celi JE, Sheldon KS. Panmixia across elevation in thermally sensitive Andean dung beetles. Ecol Evol 2020; 10:4143-4155. [PMID: 32489637 PMCID: PMC7244805 DOI: 10.1002/ece3.6185] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Received: 02/18/2020] [Accepted: 02/20/2020] [Indexed: 11/17/2022] Open
Abstract
Janzen's seasonality hypothesis predicts that organisms inhabiting environments with limited climatic variability will evolve a reduced thermal tolerance breadth compared with organisms experiencing greater climatic variability. In turn, narrow tolerance breadth may select against dispersal across strong temperature gradients, such as those found across elevation. This can result in narrow elevational ranges and generate a pattern of isolation by environment or neutral genetic differentiation correlated with environmental variables that are independent of geographic distance. We tested for signatures of isolation by environment across elevation using genome-wide SNP data from five species of Andean dung beetles (subfamily Scarabaeinae) with well-characterized, narrow thermal physiologies, and narrow elevational distributions. Contrary to our expectations, we found no evidence of population genetic structure associated with elevation and little signal of isolation by environment. Further, elevational ranges for four of five species appear to be at equilibrium and show no decay of genetic diversity at range limits. Taken together, these results suggest physiological constraints on dispersal may primarily operate outside of a stable realized niche and point to a lower bound on the spatial scale of local adaptation.
Affiliation(s)
- Ethan B. Linck
- Department of Ecology & Evolutionary Biology, University of Tennessee, Knoxville, Knoxville, TN, USA
- Jorge E. Celi
- Biogeography and Spatial Ecology Research Group, Universidad Regional Amazónica Ikiam, Tena, Ecuador
- Kimberly S. Sheldon
- Department of Ecology & Evolutionary Biology, University of Tennessee, Knoxville, Knoxville, TN, USA
23
Isshiki M, Naka I, Watanabe Y, Nishida N, Kimura R, Furusawa T, Natsuhara K, Yamauchi T, Nakazawa M, Ishida T, Eddie R, Ohtsuka R, Ohashi J. Admixture and natural selection shaped genomes of an Austronesian-speaking population in the Solomon Islands. Sci Rep 2020; 10:6872. [PMID: 32327716 PMCID: PMC7181741 DOI: 10.1038/s41598-020-62866-3] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Received: 01/31/2019] [Accepted: 03/16/2020] [Indexed: 11/24/2022] Open
Abstract
People in the Solomon Islands today are considered to have derived from Asian- and Papuan-related ancestors. Papuan-related ancestors colonized Near Oceania about 47,000 years ago, and Asian-related ancestors were Austronesian (AN)-speaking population, called Lapita, who migrated from Southeast Asia about 3,500 years ago. These two ancestral populations admixed in Near Oceania before the expansion of Lapita people into Remote Oceania. To understand the impact of the admixture on the adaptation of AN-speaking Melanesians in Near Oceania, we performed the genome-wide single nucleotide polymorphism (SNP) analysis of 21 individuals from Munda, the main town of the New Georgia Islands in the western Solomon Islands. Population samples from Munda were genetically similar to other Solomon Island population samples. The analysis of genetic contribution from the two different ancestries to the Munda genome revealed significantly higher proportions of Asian- and Papuan-related ancestries in the region containing the annexin A1 (ANXA1) gene (Asian component > 82.6%) and in the human leukocyte antigen (HLA) class II region (Papuan component > 85.4%), respectively. These regions were suspected to have undergone natural selection since the time of admixture. Our results suggest that admixture had affected adaptation of AN-speaking Melanesians in the Solomon Islands.
Affiliation(s)
- Mariko Isshiki
- Department of Biological Sciences, Graduate School of Science, The University of Tokyo, Tokyo, 113-0033, Japan
- Izumi Naka
- Department of Biological Sciences, Graduate School of Science, The University of Tokyo, Tokyo, 113-0033, Japan
- Yusuke Watanabe
- Department of Biological Sciences, Graduate School of Science, The University of Tokyo, Tokyo, 113-0033, Japan
- Nao Nishida
- Genome Medical Science Project, Research Center for Hepatitis and Immunology, National Center for Global Health and Medicine, Chiba, 272-8516, Japan
- Ryosuke Kimura
- Department of Human Biology and Anatomy, Graduate School of Medicine, University of the Ryukyus, Nishihara, 903-0125, Japan
- Takuro Furusawa
- Graduate School of Asian and African Area Studies, Kyoto University, Kyoto, 606-8501, Japan
- Kazumi Natsuhara
- Department of International Health and Nursing, Faculty of Nursing, Toho University, Tokyo, 143-8540, Japan
- Taro Yamauchi
- Faculty of Health Sciences, Hokkaido University, Sapporo, 060-0812, Japan
- Minato Nakazawa
- Graduate School of Health Sciences, Kobe University, Kobe, 654-0142, Japan
- Takafumi Ishida
- Department of Biological Sciences, Graduate School of Science, The University of Tokyo, Tokyo, 113-0033, Japan
- Ricky Eddie
- National Gizo Hospital, Ministry of Health and Medical Services, P.O. Box 36, Gizo, Solomon Islands
- Jun Ohashi
- Department of Biological Sciences, Graduate School of Science, The University of Tokyo, Tokyo, 113-0033, Japan
|
24
|
Sellinger TPP, Abu Awad D, Moest M, Tellier A. Inference of past demography, dormancy and self-fertilization rates from whole genome sequence data. PLoS Genet 2020; 16:e1008698. [PMID: 32251472 PMCID: PMC7173940 DOI: 10.1371/journal.pgen.1008698] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.5] [Received: 07/16/2019] [Revised: 04/21/2020] [Accepted: 02/24/2020] [Indexed: 02/04/2023] Open
Abstract
Several methods based on the Sequential Markovian coalescence (SMC) have been developed that make use of genome sequence data to uncover population demographic history, which is of interest in its own right and is a key requirement to generate a null model for selection tests. While these methods can be applied to all kinds of species, the underlying assumptions are sexual reproduction in each generation and non-overlapping generations. However, in many plants, invertebrates, fungi and other taxa, those assumptions are often violated due to different ecological and life history traits, such as self-fertilization or long-term dormant structures (seed or egg-banking). We develop a novel SMC-based method to infer 1) the rates/parameters of dormancy and of self-fertilization, and 2) the populations' past demographic history. Using simulated data sets, we demonstrate the accuracy of our method for a wide range of demographic scenarios and for sequence lengths from one to 30 Mb using four sampled genomes. Finally, we apply our method to a Swedish and a German population of Arabidopsis thaliana, demonstrating a selfing rate of ca. 0.87 and the absence of any detectable seed-bank. In contrast, we show that the water flea Daphnia pulex exhibits a long-lived egg-bank of three to 18 generations. In conclusion, we here present a novel method to infer accurate demographies and life-history traits for species with selfing and/or seed/egg-banks. Finally, we provide recommendations for the use of SMC-based methods for non-model organisms, highlighting the importance of the per site and the effective ratios of recombination over mutation.
Affiliation(s)
- Diala Abu Awad
- Department of Population Genetics, Technische Universitaet Muenchen, Freising, Germany
- Markus Moest
- Department of Ecology, University of Innsbruck, Innsbruck, Austria
- Aurélien Tellier
- Department of Population Genetics, Technische Universitaet Muenchen, Freising, Germany
|
25
|
Bergström A, McCarthy SA, Hui R, Almarri MA, Ayub Q, Danecek P, Chen Y, Felkel S, Hallast P, Kamm J, Blanché H, Deleuze JF, Cann H, Mallick S, Reich D, Sandhu MS, Skoglund P, Scally A, Xue Y, Durbin R, Tyler-Smith C. Insights into human genetic variation and population history from 929 diverse genomes. Science 2020; 367:eaay5012. [PMID: 32193295 PMCID: PMC7115999 DOI: 10.1126/science.aay5012] [Citation(s) in RCA: 353] [Impact Index Per Article: 88.3] [Received: 06/26/2019] [Accepted: 02/04/2020] [Indexed: 12/17/2022]
Abstract
Genome sequences from diverse human groups are needed to understand the structure of genetic variation in our species and the history of, and relationships between, different populations. We present 929 high-coverage genome sequences from 54 diverse human populations, 26 of which are physically phased using linked-read sequencing. Analyses of these genomes reveal an excess of previously undocumented common genetic variation private to southern Africa, central Africa, Oceania, and the Americas, but an absence of such variants fixed between major geographical regions. We also find deep and gradual population separations within Africa, contrasting population size histories between hunter-gatherer and agriculturalist groups in the past 10,000 years, and a contrast between single Neanderthal but multiple Denisovan source populations contributing to present-day human populations.
Affiliation(s)
- Anders Bergström
- Wellcome Sanger Institute, Hinxton CB10 1SA, UK
- The Francis Crick Institute, London NW1 1AT, UK
- Shane A McCarthy
- Wellcome Sanger Institute, Hinxton CB10 1SA, UK
- Department of Genetics, University of Cambridge, Cambridge CB2 3EH, UK
- Ruoyun Hui
- Department of Genetics, University of Cambridge, Cambridge CB2 3EH, UK
- McDonald Institute for Archaeological Research, University of Cambridge, Cambridge CB2 3ER, UK
- Qasim Ayub
- Wellcome Sanger Institute, Hinxton CB10 1SA, UK
- Monash University Malaysia Genomics Facility, Tropical Medicine and Biology Multidisciplinary Platform, 47500 Bandar Sunway, Malaysia
- School of Science, Monash University Malaysia, 47500 Bandar Sunway, Malaysia
- Yuan Chen
- Wellcome Sanger Institute, Hinxton CB10 1SA, UK
- Sabine Felkel
- Wellcome Sanger Institute, Hinxton CB10 1SA, UK
- Institute of Animal Breeding and Genetics, University of Veterinary Medicine Vienna, Vienna 1210, Austria
- Pille Hallast
- Wellcome Sanger Institute, Hinxton CB10 1SA, UK
- Institute of Biomedicine and Translational Medicine, University of Tartu, Tartu 50411, Estonia
- Jack Kamm
- Wellcome Sanger Institute, Hinxton CB10 1SA, UK
- Department of Genetics, University of Cambridge, Cambridge CB2 3EH, UK
- Chan Zuckerberg Biohub, San Francisco, CA 94158, USA
- Hélène Blanché
- Centre d'Etude du Polymorphisme Humain, Fondation Jean Dausset, 75010 Paris, France
- GENMED Labex, ANR-10-LABX-0013 Paris, France
- Jean-François Deleuze
- Centre d'Etude du Polymorphisme Humain, Fondation Jean Dausset, 75010 Paris, France
- GENMED Labex, ANR-10-LABX-0013 Paris, France
- Howard Cann
- Centre d'Etude du Polymorphisme Humain, Fondation Jean Dausset, 75010 Paris, France
- Swapan Mallick
- Department of Genetics, Harvard Medical School, Boston, MA 02115, USA
- Broad Institute of Harvard and MIT, Cambridge, MA 02142, USA
- David Reich
- Department of Genetics, Harvard Medical School, Boston, MA 02115, USA
- Broad Institute of Harvard and MIT, Cambridge, MA 02142, USA
- Manjinder S Sandhu
- Wellcome Sanger Institute, Hinxton CB10 1SA, UK
- Department of Medicine, University of Cambridge, Cambridge CB2 0QQ, UK
- Aylwyn Scally
- Department of Genetics, University of Cambridge, Cambridge CB2 3EH, UK
- Yali Xue
- Wellcome Sanger Institute, Hinxton CB10 1SA, UK
- Richard Durbin
- Wellcome Sanger Institute, Hinxton CB10 1SA, UK
- Department of Genetics, University of Cambridge, Cambridge CB2 3EH, UK
|
26
|
Barbieri C, Barquera R, Arias L, Sandoval JR, Acosta O, Zurita C, Aguilar-Campos A, Tito-Álvarez AM, Serrano-Osuna R, Gray RD, Mafessoni F, Heggarty P, Shimizu KK, Fujita R, Stoneking M, Pugach I, Fehren-Schmitz L. The Current Genomic Landscape of Western South America: Andes, Amazonia, and Pacific Coast. Mol Biol Evol 2020; 36:2698-2713. [PMID: 31350885 PMCID: PMC6878948 DOI: 10.1093/molbev/msz174] [Citation(s) in RCA: 33] [Impact Index Per Article: 8.3] [Indexed: 12/12/2022] Open
Abstract
Studies of Native South American genetic diversity have helped to shed light on the peopling and differentiation of the continent, but available data are sparse for the major ecogeographic domains. These include the Pacific Coast, a potential early migration route; the Andes, home to the most expansive complex societies and to one of the most widely spoken indigenous language families of the continent (Quechua); and Amazonia, with its understudied population structure and rich cultural diversity. Here, we explore the genetic structure of 176 individuals from these three domains, genotyped with the Affymetrix Human Origins array. We infer multiple sources of ancestry within the Native American ancestry component; one with clear predominance on the Coast and in the Andes, and at least two distinct substrates in neighboring Amazonia, including a previously undetected ancestry characteristic of northern Ecuador and Colombia. Amazonian populations are also involved in recent gene-flow with each other and across ecogeographic domains, which does not accord with the traditional view of small, isolated groups. Long-distance genetic connections between speakers of the same language family suggest that indigenous languages here were spread not by cultural contact alone. Finally, Native American populations admixed with post-Columbian European and African sources at different times, with few cases of prolonged isolation. With our results we emphasize the importance of including understudied regions of the continent in high-resolution genetic studies, and we illustrate the potential of SNP chip arrays for informative regional-scale analysis.
Affiliation(s)
- Chiara Barbieri
- Department of Linguistic and Cultural Evolution, Max Planck Institute for the Science of Human History, Jena, Germany
- Department of Evolutionary Biology and Environmental Studies, University of Zurich, Zurich, Switzerland
- Rodrigo Barquera
- Department of Archaeogenetics, Max Planck Institute for the Science of Human History, Jena, Germany
- Leonardo Arias
- Department of Evolutionary Genetics, Max Planck Institute for Evolutionary Anthropology, Leipzig, Germany
- José R Sandoval
- Centro de Investigación de Genética y Biología Molecular (CIGBM), Universidad de San Martín de Porres, Lima, Peru
- Oscar Acosta
- Centro de Investigación de Genética y Biología Molecular (CIGBM), Universidad de San Martín de Porres, Lima, Peru
- Camilo Zurita
- Cátedra de Inmunología, Facultad de Medicina, Universidad Central del Ecuador, Quito, Ecuador
- Zurita & Zurita Laboratorios, Unidad de Investigaciones en Biomedicina, Quito, Ecuador
- Abraham Aguilar-Campos
- Clinical Laboratory, Unidad Médica de Alta Especialidad (UMAE) # 2, Instituto Mexicano del Seguro Social (IMSS), Ciudad Obregón, Sonora, Mexico
- Ana M Tito-Álvarez
- Carrera de Enfermería, Facultad de Ciencias de la Salud, Universidad de Las Américas, Quito, Ecuador
- Ricardo Serrano-Osuna
- Clinical Laboratory, Unidad Médica de Alta Especialidad (UMAE) # 2, Instituto Mexicano del Seguro Social (IMSS), Ciudad Obregón, Sonora, Mexico
- Russell D Gray
- Department of Linguistic and Cultural Evolution, Max Planck Institute for the Science of Human History, Jena, Germany
- Fabrizio Mafessoni
- Department of Evolutionary Genetics, Max Planck Institute for Evolutionary Anthropology, Leipzig, Germany
- Paul Heggarty
- Department of Linguistic and Cultural Evolution, Max Planck Institute for the Science of Human History, Jena, Germany
- Kentaro K Shimizu
- Department of Evolutionary Biology and Environmental Studies, University of Zurich, Zurich, Switzerland
- Ricardo Fujita
- Centro de Investigación de Genética y Biología Molecular (CIGBM), Universidad de San Martín de Porres, Lima, Peru
- Mark Stoneking
- Department of Evolutionary Genetics, Max Planck Institute for Evolutionary Anthropology, Leipzig, Germany
- Irina Pugach
- Department of Evolutionary Genetics, Max Planck Institute for Evolutionary Anthropology, Leipzig, Germany
- Lars Fehren-Schmitz
- UCSC Paleogenomics, Department of Anthropology, University of California, Santa Cruz, CA
- Genomics Institute, University of California, Santa Cruz, CA
|
27
|
Cottin A, Penaud B, Glaszmann JC, Yahiaoui N, Gautier M. Simulation-Based Evaluation of Three Methods for Local Ancestry Deconvolution of Non-model Crop Species Genomes. G3 (BETHESDA, MD.) 2020; 10:569-579. [PMID: 31862786 PMCID: PMC7003078 DOI: 10.1534/g3.119.400873] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Received: 11/08/2019] [Accepted: 12/12/2019] [Indexed: 11/30/2022]
Abstract
Hybridizations between species and subspecies represented major steps in the history of many crop species. Such events generally lead to genomes with mosaic patterns of chromosomal segments of various origins that may be assessed by local ancestry inference methods. However, these methods have mainly been developed in the context of human population genetics with implicit assumptions that may not always fit plant models. The purpose of this study was to evaluate the suitability of three state-of-the-art inference methods (SABER, ELAI and WINPOP) for local ancestry inference under scenarios that can be encountered in plant species. For this, we developed an R package to simulate genotyping data under such scenarios. The tested inference methods performed similarly well as far as representatives of source populations were available. As expected, the higher the level of differentiation between ancestral source populations and the lower the number of generations since admixture, the more accurate were the results. Interestingly, the accuracy of the methods was only marginally affected by i) the number of ancestries (up to six tested); ii) the sample design (i.e., unbalanced representation of source populations); and iii) the reproduction mode (e.g., selfing, vegetative propagation). If a source population was not represented in the data set, no bias was observed in inference accuracy for regions originating from represented sources and regions from the missing source were assigned differently depending on the methods. Overall, the selected ancestry inference methods may be used for crop plant analysis if all ancestral sources are known.
Affiliation(s)
- Aurélien Cottin
- CIRAD, UMR AGAP, F-34398 Montpellier, France
- AGAP, Univ. Montpellier, CIRAD, INRAE, Montpellier SupAgro, Montpellier, France
- Benjamin Penaud
- CIRAD, UMR AGAP, F-34398 Montpellier, France
- AGAP, Univ. Montpellier, CIRAD, INRAE, Montpellier SupAgro, Montpellier, France
- Jean-Christophe Glaszmann
- CIRAD, UMR AGAP, F-34398 Montpellier, France
- AGAP, Univ. Montpellier, CIRAD, INRAE, Montpellier SupAgro, Montpellier, France
- Nabila Yahiaoui
- CIRAD, UMR AGAP, F-34398 Montpellier, France
- AGAP, Univ. Montpellier, CIRAD, INRAE, Montpellier SupAgro, Montpellier, France
|
28
|
Abstract
Coalescent simulation is a fundamental tool in modern population genetics. The msprime library provides unprecedented scalability in terms of both the simulations that can be performed and the efficiency with which the results can be processed. We show how coalescent models for population structure and demography can be constructed using a simple Python API, as well as how we can process the results of such simulations to efficiently calculate statistics of interest. We illustrate msprime's flexibility by implementing a simple (but functional) approximate Bayesian computation inference method in just a few tens of lines of code.
Affiliation(s)
- Jerome Kelleher
- Big Data Institute, Li Ka Shing Centre for Health Information and Discovery, University of Oxford, Oxford, UK
- Konrad Lohse
- Institute of Evolutionary Biology, University of Edinburgh, Edinburgh, UK
|
29
|
Lunter G. Haplotype matching in large cohorts using the Li and Stephens model. Bioinformatics 2019; 35:798-806. [PMID: 30165547 PMCID: PMC6394399 DOI: 10.1093/bioinformatics/bty735] [Citation(s) in RCA: 11] [Impact Index Per Article: 2.2] [Received: 03/07/2018] [Revised: 05/16/2018] [Accepted: 08/23/2018] [Indexed: 12/28/2022] Open
Abstract
Motivation: The Li and Stephens model, which approximates the coalescent describing the pattern of variation in a population, underpins a range of key tools and results in genetics. Although highly efficient compared to the coalescent, standard implementations of this model still cannot deal with the very large reference cohorts that are starting to become available, and practical implementations use heuristics to achieve reasonable runtimes. Results: Here I describe a new, exact algorithm ('fastLS') that implements the Li and Stephens model and achieves runtimes independent of the size of the reference cohort. Key to achieving this runtime is the use of the Burrows-Wheeler transform, allowing the algorithm to efficiently identify partial haplotype matches across a cohort. I show that the proposed data structure is very similar to, and generalizes, Durbin's positional Burrows-Wheeler transform.
Affiliation(s)
- Gerton Lunter
- University of Oxford, Wellcome Centre for Human Genetics, Oxford, UK
|
30
|
Barroso GV, Puzović N, Dutheil JY. Inference of recombination maps from a single pair of genomes and its application to ancient samples. PLoS Genet 2019; 15:e1008449. [PMID: 31725722 PMCID: PMC6879166 DOI: 10.1371/journal.pgen.1008449] [Citation(s) in RCA: 22] [Impact Index Per Article: 4.4] [Received: 06/17/2019] [Revised: 11/26/2019] [Accepted: 09/30/2019] [Indexed: 12/11/2022] Open
Abstract
Understanding the causes and consequences of recombination landscape evolution is a fundamental goal in genetics that requires recombination maps from across the tree of life. Such maps can be obtained from population genomic datasets, but require large sample sizes. Alternative methods are therefore necessary to research organisms where such datasets cannot be generated easily, such as non-model or ancient species. Here we extend the sequentially Markovian coalescent model to jointly infer demography and the spatial variation in recombination rate. Using extensive simulations and sequence data from humans, fruit-flies and a fungal pathogen, we demonstrate that iSMC accurately infers recombination maps under a wide range of scenarios, remarkably even from a single pair of unphased genomes. We exploit this possibility and reconstruct the recombination maps of ancient hominins. We report that the ancient and modern maps are correlated in a manner that reflects the established phylogeny of Neanderthals, Denisovans, and modern human populations.
Affiliation(s)
- Gustavo V. Barroso
- Max Planck Institute for Evolutionary Biology, Department of Evolutionary Genetics, August-Thienemann-Straße, Plön, Germany
- Nataša Puzović
- Max Planck Institute for Evolutionary Biology, Department of Evolutionary Genetics, August-Thienemann-Straße, Plön, Germany
- Julien Y. Dutheil
- Max Planck Institute for Evolutionary Biology, Department of Evolutionary Genetics, August-Thienemann-Straße, Plön, Germany
|
31
|
Rogers AR. Legofit: estimating population history from genetic data. BMC Bioinformatics 2019; 20:526. [PMID: 31660852 PMCID: PMC6819480 DOI: 10.1186/s12859-019-3154-1] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.6] [Received: 02/25/2019] [Accepted: 10/14/2019] [Indexed: 12/20/2022] Open
Abstract
BACKGROUND: Our current understanding of archaic admixture in humans relies on statistical methods with large biases, whose magnitudes depend on the sizes and separation times of ancestral populations. To avoid these biases, it is necessary to estimate these parameters simultaneously with those describing admixture. Genetic estimates of population histories also confront problems of statistical identifiability: different models or different combinations of parameter values may fit the data equally well. To deal with this problem, we need methods of model selection and model averaging, which are lacking from most existing software. RESULTS: The Legofit software package allows simultaneous estimation of parameters describing admixture, and the sizes and separation times of ancestral populations. It includes facilities for data manipulation, estimation, analysis of residuals, model selection, and model averaging. CONCLUSIONS: Legofit uses genetic data to study the history of a subdivided population. It is unaffected by recent history and can therefore focus on the deep history of population size, subdivision, and admixture. It outperforms several statistical methods that have been widely used to study population history and should be useful in any species for which DNA sequence data is available from several populations.
Affiliation(s)
- Alan R Rogers
- Department of Anthropology, University of Utah, Gardner Commons, Salt Lake City, USA
|
32
|
Hey J, Wang K. The effect of undetected recombination on genealogy sampling and inference under an isolation-with-migration model. Mol Ecol Resour 2019; 19:1593-1609. [PMID: 31479562 DOI: 10.1111/1755-0998.13083] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.2] [Received: 01/18/2019] [Revised: 07/22/2019] [Accepted: 07/24/2019] [Indexed: 11/30/2022]
Abstract
Many methods for fitting demographic models to data sets of aligned sequences rely upon an assumption that the data have a branching coalescent history without recombination within regions or loci. To mitigate the effects of the failure of this assumption, a common approach is to filter data and sample regions that pass the four-gamete criterion for recombination, an approach that allows data to run, but that is expected to detect only a minority of recombination events. A series of empirical tests of this approach were conducted using computer simulations with and without recombination for a variety of isolation-with-migration (IM) model for two and three populations. Only the IMa3 program was used, but the general results should apply to related genealogy-sampling-based methods for IM models or subsets of IM models. It was found that the details of sampling intervals that pass a four-gamete filter have a moderate effect, and that schemes that use the longest intervals, or that use overlapping intervals, gave poorer results. A simple approach of using a random nonoverlapping interval returned the smallest difference between results with and without recombination, with the mean difference between parameter estimates usually less than 20% of the true value (usually much less). However, the posterior probability distributions for migration rates were flatter with recombination, suggesting that filtering based on the four-gamete criterion, while necessary for methods like these, leads to reduced resolution on migration. A distinct, alternative approach, of using a finite sites mutation model and not filtering the data, performed quite poorly.
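The four-gamete criterion mentioned in this abstract can be illustrated with a minimal sketch. It assumes phased haplotypes at biallelic sites, encoded as strings of `0` and `1`; the function name `passes_four_gamete` is hypothetical and this is not the filter implemented in IMa3. Under an infinite-sites mutation model, seeing all four allele combinations at a pair of sites implies at least one recombination event between them, so a region passes only if no pair of sites shows all four.

```python
from itertools import combinations


def passes_four_gamete(haplotypes):
    """Return True if no pair of sites shows all four gametes.

    `haplotypes` is a list of equal-length strings of '0'/'1'
    (an assumed encoding for this illustration).
    """
    if not haplotypes:
        return True
    n_sites = len(haplotypes[0])
    for i, j in combinations(range(n_sites), 2):
        # Collect the distinct two-site allele combinations seen in the sample.
        gametes = {(h[i], h[j]) for h in haplotypes}
        if len(gametes) == 4:
            # 00, 01, 10 and 11 are all present: recombination is inferred.
            return False
    return True
```

A sample such as `["00", "01", "11"]` passes, while `["00", "01", "10", "11"]` fails. As the abstract notes, passing this check does not rule out recombination, since the criterion is expected to detect only a minority of events.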
Affiliation(s)
- Jody Hey
- Center for Computational Genetics and Genomics, Department of Biology, Temple University, Philadelphia, PA, USA
- Katherine Wang
- Center for Computational Genetics and Genomics, Department of Biology, Temple University, Philadelphia, PA, USA
|
33
|
Zhou Y, Tian X, Browning BL, Browning SR. POPdemog: visualizing population demographic history from simulation scripts. Bioinformatics 2019; 34:2854-2855. [PMID: 29590339 DOI: 10.1093/bioinformatics/bty184] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.0] [Received: 09/22/2017] [Accepted: 03/23/2018] [Indexed: 11/12/2022] Open
Abstract
Summary: We present POPdemog, an R package which converts coalescent simulation program input parameters into a visual representation of the demographic model. This package is useful for preparing figures, for checking that demographic simulation parameters have been correctly specified, and for understanding demographic models that other researchers have used to simulate genetic data. The POPdemog package supports the ms, msa, msHot, MaCS, msprime, scrm and Cosi2 programs, and includes options for customizing the output figures. Availability and implementation: The POPdemog package and its tutorial can be freely downloaded from https://github.com/YingZhou001/POPdemog. Supplementary information: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Ying Zhou
- Department of Biostatistics, University of Washington, WA, USA
- Xiaowen Tian
- Department of Biostatistics, University of Washington, WA, USA
- Brian L Browning
- Department of Biostatistics, University of Washington, WA, USA
- Division of Medical Genetics, Department of Medicine, University of Washington, WA, USA
|
34
|
Steinrücken M, Kamm J, Spence JP, Song YS. Inference of complex population histories using whole-genome sequences from multiple populations. Proc Natl Acad Sci U S A 2019; 116:17115-17120. [PMID: 31387977 PMCID: PMC6708337 DOI: 10.1073/pnas.1905060116] [Citation(s) in RCA: 36] [Impact Index Per Article: 7.2] [Indexed: 11/18/2022] Open
Abstract
There has been much interest in analyzing genome-scale DNA sequence data to infer population histories, but inference methods developed hitherto are limited in model complexity and computational scalability. Here we present an efficient, flexible statistical method, diCal2, that can use whole-genome sequence data from multiple populations to infer complex demographic models involving population size changes, population splits, admixture, and migration. Applying our method to data from Australian, East Asian, European, and Papuan populations, we find that the population ancestral to Australians and Papuans started separating from East Asians and Europeans about 100,000 y ago, and that the separation of East Asians and Europeans started about 50,000 y ago, with pervasive gene flow between all pairs of populations.
Affiliation(s)
- Matthias Steinrücken
- Department of Ecology and Evolution, University of Chicago, Chicago, IL 60637
- Department of Human Genetics, University of Chicago, Chicago, IL 60637
- Jack Kamm
- Department of Statistics, University of California, Berkeley, CA 94720
- Chan Zuckerberg Biohub, San Francisco, CA 94158
- Jeffrey P Spence
- Computational Biology Graduate Group, University of California, Berkeley, CA 94720
- Yun S Song
- Department of Statistics, University of California, Berkeley, CA 94720
- Chan Zuckerberg Biohub, San Francisco, CA 94158
- Computer Science Division, University of California, Berkeley, CA 94720
|
35
|
Platt A, Pivirotto A, Knoblauch J, Hey J. An estimator of first coalescent time reveals selection on young variants and large heterogeneity in rare allele ages among human populations. PLoS Genet 2019; 15:e1008340. [PMID: 31425500 PMCID: PMC6715256 DOI: 10.1371/journal.pgen.1008340] [Citation(s) in RCA: 12] [Impact Index Per Article: 2.4] [Received: 04/25/2019] [Revised: 08/29/2019] [Accepted: 08/01/2019] [Indexed: 11/18/2022] Open
Abstract
Allele age has long been a focus of population genetic research, primarily because it can be an important clue to the fitness effects of an allele. By virtue of their effects on fitness, alleles under directional selection are expected to be younger than neutral alleles of the same frequency. We developed a new coalescent-based estimator of a close proxy for allele age, the time when a copy of an allele first shares common ancestry with other chromosomes in a sample not carrying that allele. The estimator performs well, including for the very rarest of alleles that occur just once in a sample, with a bias that is typically negative. The estimator is mostly insensitive to population demography and to factors that can arise in population genomic pipelines, including the statistical phasing of chromosomes. Applications to 1000 Genomes Data and UK10K genome data confirm predictions that singleton alleles that alter proteins are significantly younger than those that do not, with a greater difference in the larger UK10K dataset, as expected. The 1000 Genomes populations varied markedly in their distributions for singleton allele ages, suggesting that these distributions can be used to inform models of demographic history, including recent events that are only revealed by their impacts on the ages of very rare alleles.
Affiliation(s)
- Alexander Platt
- Center for Computational Genetics and Genomics, Dept. Biology, Temple University, Philadelphia, Pennsylvania, United States of America
- Alyssa Pivirotto
- Center for Computational Genetics and Genomics, Dept. Biology, Temple University, Philadelphia, Pennsylvania, United States of America
- Jared Knoblauch
- Center for Computational Genetics and Genomics, Dept. Biology, Temple University, Philadelphia, Pennsylvania, United States of America
- Jody Hey
- Center for Computational Genetics and Genomics, Dept. Biology, Temple University, Philadelphia, Pennsylvania, United States of America
|
36
|
Cui R, Medeiros T, Willemsen D, Iasi LN, Collier GE, Graef M, Reichard M, Valenzano DR. Relaxed Selection Limits Lifespan by Increasing Mutation Load. Cell 2019; 178:385-399.e20. [DOI: 10.1016/j.cell.2019.06.004] [Citation(s) in RCA: 58] [Impact Index Per Article: 11.6] [Received: 12/22/2018] [Revised: 03/18/2019] [Accepted: 06/03/2019] [Indexed: 02/07/2023]
|
37
|
Tonkin-Hill G, Lees JA, Bentley SD, Frost SDW, Corander J. Fast hierarchical Bayesian analysis of population structure. Nucleic Acids Res 2019; 47:5539-5549. [PMID: 31076776 PMCID: PMC6582336 DOI: 10.1093/nar/gkz361] [Citation(s) in RCA: 119] [Impact Index Per Article: 23.8] [Received: 03/25/2019] [Accepted: 04/29/2019] [Indexed: 12/16/2022] Open
Abstract
We present fastbaps, a fast solution to the genetic clustering problem. Fastbaps rapidly identifies an approximate fit to a Dirichlet process mixture model (DPM) for clustering multilocus genotype data. Our efficient model-based clustering approach is able to cluster datasets 10-100 times larger than the existing model-based methods, which we demonstrate by analyzing an alignment of over 110 000 sequences of HIV-1 pol genes. We also provide a method for rapidly partitioning an existing hierarchy in order to maximize the DPM model marginal likelihood, allowing us to split phylogenetic trees into clades and subclades using a population genomic model. Extensive tests on simulated data as well as a diverse set of real bacterial and viral datasets show that fastbaps provides comparable or improved solutions to previous model-based methods, while being significantly faster. The method is made freely available under an open source MIT licence as an easy to use R package at https://github.com/gtonkinhill/fastbaps.
Affiliation(s)
- Gerry Tonkin-Hill
- Parasites and Microbes, Wellcome Sanger Institute, Cambridge, CB10 1SA, UK
| | - John A Lees
- Department of Microbiology, New York University School of Medicine, NY 10016, USA
| | - Stephen D Bentley
- Parasites and Microbes, Wellcome Sanger Institute, Cambridge, CB10 1SA, UK
| | - Simon D W Frost
- Department of Veterinary Medicine, University of Cambridge, Cambridge, CB3 0ES, UK
- The Alan Turing Institute, London, NW1 2DB, UK
| | - Jukka Corander
- Parasites and Microbes, Wellcome Sanger Institute, Cambridge, CB10 1SA, UK
- Department of Biostatistics, University of Oslo, Blindern 0317, Norway
- Helsinki Institute for Information Technology HIIT, Department of Mathematics and Statistics, University of Helsinki, Aalto FI-00076, Finland
|
38
|
Peyrégne S, Slon V, Mafessoni F, de Filippo C, Hajdinjak M, Nagel S, Nickel B, Essel E, Le Cabec A, Wehrberger K, Conard NJ, Kind CJ, Posth C, Krause J, Abrams G, Bonjean D, Di Modica K, Toussaint M, Kelso J, Meyer M, Pääbo S, Prüfer K. Nuclear DNA from two early Neandertals reveals 80,000 years of genetic continuity in Europe. Science Advances 2019; 5:eaaw5873. [PMID: 31249872 PMCID: PMC6594762 DOI: 10.1126/sciadv.aaw5873] [Citation(s) in RCA: 31] [Impact Index Per Article: 6.2] [Received: 01/07/2019] [Accepted: 05/22/2019] [Indexed: 06/09/2023]
Abstract
Little is known about the population history of Neandertals over the hundreds of thousands of years of their existence. We retrieved nuclear genomic sequences from two Neandertals, one from Hohlenstein-Stadel Cave in Germany and the other from Scladina Cave in Belgium, who lived around 120,000 years ago. Despite the deeply divergent mitochondrial lineage present in the former individual, both Neandertals are genetically closer to later Neandertals from Europe than to a roughly contemporaneous individual from Siberia. That the Hohlenstein-Stadel and Scladina individuals lived around the time of their most recent common ancestor with later Neandertals suggests that all later Neandertals trace at least part of their ancestry back to these early European Neandertals.
Affiliation(s)
- Stéphane Peyrégne
- Department of Evolutionary Genetics, Max Planck Institute for Evolutionary Anthropology, Deutscher Platz 6, Leipzig 04103, Germany
| | - Viviane Slon
- Department of Evolutionary Genetics, Max Planck Institute for Evolutionary Anthropology, Deutscher Platz 6, Leipzig 04103, Germany
| | - Fabrizio Mafessoni
- Department of Evolutionary Genetics, Max Planck Institute for Evolutionary Anthropology, Deutscher Platz 6, Leipzig 04103, Germany
| | - Cesare de Filippo
- Department of Evolutionary Genetics, Max Planck Institute for Evolutionary Anthropology, Deutscher Platz 6, Leipzig 04103, Germany
| | - Mateja Hajdinjak
- Department of Evolutionary Genetics, Max Planck Institute for Evolutionary Anthropology, Deutscher Platz 6, Leipzig 04103, Germany
| | - Sarah Nagel
- Department of Evolutionary Genetics, Max Planck Institute for Evolutionary Anthropology, Deutscher Platz 6, Leipzig 04103, Germany
| | - Birgit Nickel
- Department of Evolutionary Genetics, Max Planck Institute for Evolutionary Anthropology, Deutscher Platz 6, Leipzig 04103, Germany
| | - Elena Essel
- Department of Evolutionary Genetics, Max Planck Institute for Evolutionary Anthropology, Deutscher Platz 6, Leipzig 04103, Germany
| | - Adeline Le Cabec
- Department of Human Evolution, Max Planck Institute for Evolutionary Anthropology, Deutscher Platz 6, Leipzig 04103, Germany
| | | | - Nicholas J. Conard
- Department of Early Prehistory and Quaternary Ecology, University of Tübingen, Schloss Hohentübingen, Tübingen 72070, Germany
| | - Claus Joachim Kind
- State Office for Cultural Heritage Baden-Württemberg, Berliner Strasse 12, Esslingen 73728, Germany
| | - Cosimo Posth
- Max Planck Institute for the Science of Human History, Kahlaische Strasse 10, Jena 07745, Germany
| | - Johannes Krause
- Max Planck Institute for the Science of Human History, Kahlaische Strasse 10, Jena 07745, Germany
| | | | | | | | | | - Janet Kelso
- Department of Evolutionary Genetics, Max Planck Institute for Evolutionary Anthropology, Deutscher Platz 6, Leipzig 04103, Germany
| | - Matthias Meyer
- Department of Evolutionary Genetics, Max Planck Institute for Evolutionary Anthropology, Deutscher Platz 6, Leipzig 04103, Germany
| | - Svante Pääbo
- Department of Evolutionary Genetics, Max Planck Institute for Evolutionary Anthropology, Deutscher Platz 6, Leipzig 04103, Germany
| | - Kay Prüfer
- Department of Evolutionary Genetics, Max Planck Institute for Evolutionary Anthropology, Deutscher Platz 6, Leipzig 04103, Germany
- Max Planck Institute for the Science of Human History, Kahlaische Strasse 10, Jena 07745, Germany
|
39
|
Hermann P, Heissl A, Tiemann-Boege I, Futschik A. LDJump: Estimating variable recombination rates from population genetic data. Mol Ecol Resour 2019; 19:623-638. [PMID: 30666785 PMCID: PMC6519033 DOI: 10.1111/1755-0998.12994] [Citation(s) in RCA: 15] [Impact Index Per Article: 3.0] [Received: 06/22/2018] [Revised: 12/13/2018] [Accepted: 01/11/2019] [Indexed: 11/27/2022]
Abstract
As recombination plays an important role in evolution, its estimation and the identification of hotspot positions is of considerable interest. We propose a novel approach for estimating population recombination rates based on genotyping or sequence data that involves a sequential multiscale change point estimator. Our method also permits demography to be taken into account. It uses several summary statistics within a regression model fitted on suitable scenarios. Our proposed method is accurate, computationally fast, and provides a parsimonious solution by ensuring a type I error control against too many changes in the recombination rate. An application to human genome data suggests a good congruence between our estimated and experimentally identified hotspots. Our method is implemented in the R‐package LDJump, which is freely available at https://github.com/PhHermann/LDJump.
Affiliation(s)
- Philipp Hermann
- Department of Applied Statistics, Johannes Kepler University Linz, Linz, Austria
| | - Angelika Heissl
- Institute of Biophysics, Johannes Kepler University Linz, Linz, Austria
| | | | - Andreas Futschik
- Department of Applied Statistics, Johannes Kepler University Linz, Linz, Austria
|
40
|
Mueller JC, Kuhl H, Boerno S, Tella JL, Carrete M, Kempenaers B. Evolution of genomic variation in the burrowing owl in response to recent colonization of urban areas. Proc Biol Sci 2019; 285:rspb.2018.0206. [PMID: 29769357 PMCID: PMC5966595 DOI: 10.1098/rspb.2018.0206] [Citation(s) in RCA: 29] [Impact Index Per Article: 5.8] [Received: 01/29/2018] [Accepted: 04/16/2018] [Indexed: 11/12/2022]
Abstract
When a species successfully colonizes an urban habitat it can be expected that its population rapidly adapts to the new environment but also experiences demographic perturbations. It is, therefore, essential to gain an understanding of the population structure and the demographic history of the urban and neighbouring rural populations before studying adaptation at the genome level. Here, we investigate populations of the burrowing owl (Athene cunicularia), a species that colonized South American cities just a few decades ago. We assembled a high-quality genome of the burrowing owl and re-sequenced 137 owls from three urban-rural population pairs at 17-fold median sequencing coverage per individual. Our data indicate that each city was independently colonized by a limited number of founders and that restricted gene flow occurred between neighbouring urban and rural populations, but not between urban populations of different cities. Using long-range linkage disequilibrium statistics in an approximate Bayesian computation approach, we estimated consistently lower population sizes in the recent past for the urban populations in comparison to the rural ones. The current urban populations all show reduced standing variation in rare single nucleotide polymorphisms (SNPs), but with different subsets of rare SNPs in different cities. This lowers the potential for local adaptation based on rare variants and makes it harder to detect consistent signals of selection in the genome.
Affiliation(s)
- Jakob C Mueller
- Department of Behavioural Ecology & Evolutionary Genetics, Max Planck Institute for Ornithology, Seewiesen, Germany
| | - Heiner Kuhl
- Sequencing Core Facility, Max Planck Institute for Molecular Genetics, Berlin, Germany
| | - Stefan Boerno
- Sequencing Core Facility, Max Planck Institute for Molecular Genetics, Berlin, Germany
| | - Jose L Tella
- Department of Conservation Biology, Estación Biológica de Doñana - CSIC, Sevilla, Spain
| | - Martina Carrete
- Department of Conservation Biology, Estación Biológica de Doñana - CSIC, Sevilla, Spain
- Department of Physical, Chemical and Natural Systems, University Pablo de Olavide, Sevilla, Spain
| | - Bart Kempenaers
- Department of Behavioural Ecology & Evolutionary Genetics, Max Planck Institute for Ornithology, Seewiesen, Germany
|
41
|
Dutheil JY, Hobolth A. Ancestral Population Genomics. Methods Mol Biol 2019; 1910:555-589. [PMID: 31278677 DOI: 10.1007/978-1-4939-9074-0_18] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/09/2023]
Abstract
Borrowing both from population genetics and phylogenetics, the field of population genomics emerged as full genomes of several closely related species became available. Provided we can properly model sequence evolution within populations undergoing speciation events, this resource enables us to estimate key population genetics parameters such as ancestral population sizes and split times. Furthermore, we can enhance our understanding of the recombination process and investigate various selective forces. With the advent of resequencing technologies, genome-wide patterns of diversity in extant populations have now come to complement this picture, offering increasing power to study more recent genetic history. We discuss the basic models of genomes in populations, including speciation models for closely related species. A major point in our discussion is that only a few complete genomes contain much information about the whole population. The reason is that recombination unlinks genomic regions, and therefore a few genomes contain many segments with distinct histories. The challenge of population genomics is to decode this mosaic of histories in order to infer scenarios of demography and selection. We survey modeling strategies for understanding genetic variation in ancestral populations and species. The underlying models build on the coalescent with recombination process and introduce further assumptions to scale the analyses to genomic data sets.
Affiliation(s)
- Julien Y Dutheil
- Department of Evolutionary Genetics, Max Planck Institute of Evolutionary Biology, Plön, Germany.
| | - Asger Hobolth
- Bioinformatics Research Center (BiRC), Aarhus University, Aarhus, Denmark
|
42
|
FST between archaic and present-day samples. Heredity (Edinb) 2018; 122:711-718. [PMID: 30538303 DOI: 10.1038/s41437-018-0169-8] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.8] [Received: 11/02/2018] [Accepted: 11/07/2018] [Indexed: 12/13/2022]
Abstract
The increasing abundance of DNA sequences obtained from fossils calls for new population genetics theory that takes account of both the temporal and spatial separation of samples. Here, we exploit the relationship between Wright's FST and average coalescence times to develop an analytic theory describing how FST depends on both the distance and time separating pairs of sampled genomes. We apply this theory to several simple models of population history. If there is a time series of samples, partial population replacement creates a discontinuity in pairwise FST values. The magnitude of the discontinuity depends on the extent of replacement. In stepping-stone models, pairwise FST values between archaic and present-day samples reflect both the spatial and temporal separation. At long distances, an isolation by distance pattern dominates. At short distances, the time separation dominates. Analytic predictions fit patterns generated by simulations. We illustrate our results with applications to archaic samples from European human populations. We compare present-day samples with a pair of archaic samples taken before and after a replacement event.
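The link between FST and coalescence times mentioned in this abstract can be illustrated with the classical low-mutation approximation FST ≈ (T − T_w) / T, where T_w is the mean coalescence time of two lineages drawn from the same sample group and T is the mean over all pairs. The sketch below assumes two equally weighted groups, so that T is the average of the within-group and between-group times; the function name and this simplification are illustrative assumptions and are not the paper's own expressions for archaic samples.

```python
def fst_from_coalescence_times(t_within: float, t_between: float) -> float:
    """Approximate pairwise FST from mean coalescence times.

    t_within:  mean coalescence time of two lineages from the same group
    t_between: mean coalescence time of two lineages from different groups
    Assumes two equally weighted groups (an assumption of this sketch).
    """
    if t_within <= 0 or t_between <= 0:
        raise ValueError("coalescence times must be positive")
    # Mean coalescence time over all pairs under equal weighting.
    t_total = 0.5 * (t_within + t_between)
    return (t_total - t_within) / t_total


# No excess between-group coalescence time gives no differentiation.
fst_equal = fst_from_coalescence_times(1.0, 1.0)
# Between-group lineages take three times as long to coalesce.
fst_diverged = fst_from_coalescence_times(1.0, 3.0)
```

Under this approximation, anything that lengthens between-group coalescence times relative to within-group times, whether spatial distance or a time gap between samples, raises FST.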
|
43
|
Palkopoulou E, Lipson M, Mallick S, Nielsen S, Rohland N, Baleka S, Karpinski E, Ivancevic AM, To TH, Kortschak RD, Raison JM, Qu Z, Chin TJ, Alt KW, Claesson S, Dalén L, MacPhee RDE, Meller H, Roca AL, Ryder OA, Heiman D, Young S, Breen M, Williams C, Aken BL, Ruffier M, Karlsson E, Johnson J, Di Palma F, Alfoldi J, Adelson DL, Mailund T, Munch K, Lindblad-Toh K, Hofreiter M, Poinar H, Reich D. A comprehensive genomic history of extinct and living elephants. Proc Natl Acad Sci U S A 2018; 115:E2566-E2574. [PMID: 29483247 PMCID: PMC5856550 DOI: 10.1073/pnas.1720554115] [Citation(s) in RCA: 93] [Impact Index Per Article: 15.5] [Indexed: 12/16/2022]
Abstract
Elephantids are the world's most iconic megafaunal family, yet there is no comprehensive genomic assessment of their relationships. We report a total of 14 genomes, including 2 from the American mastodon, which is an extinct elephantid relative, and 12 spanning all three extant and three extinct elephantid species including an ∼120,000-y-old straight-tusked elephant, a Columbian mammoth, and woolly mammoths. Earlier genetic studies modeled elephantid evolution via simple bifurcating trees, but here we show that interspecies hybridization has been a recurrent feature of elephantid evolution. We found that the genetic makeup of the straight-tusked elephant, previously placed as a sister group to African forest elephants based on lower coverage data, in fact comprises three major components. Most of the straight-tusked elephant's ancestry derives from a lineage related to the ancestor of African elephants while its remaining ancestry consists of a large contribution from a lineage related to forest elephants and another related to mammoths. Columbian and woolly mammoths also showed evidence of interbreeding, likely following a latitudinal cline across North America. While hybridization events have shaped elephantid history in profound ways, isolation also appears to have played an important role. Our data reveal nearly complete isolation between the ancestors of the African forest and savanna elephants for ∼500,000 y, providing compelling justification for the conservation of forest and savanna elephants as separate species.
Affiliation(s)
- Eleftheria Palkopoulou
- Department of Genetics, Harvard Medical School, Boston, MA 02115
- Broad Institute of MIT and Harvard, Cambridge, MA 02142
| | - Mark Lipson
- Department of Genetics, Harvard Medical School, Boston, MA 02115
| | - Swapan Mallick
- Department of Genetics, Harvard Medical School, Boston, MA 02115
- Broad Institute of MIT and Harvard, Cambridge, MA 02142
| | - Svend Nielsen
- Bioinformatics Research Centre, Aarhus University, DK-8000 Aarhus, Denmark
| | - Nadin Rohland
- Department of Genetics, Harvard Medical School, Boston, MA 02115
| | - Sina Baleka
- Unit of General Zoology-Evolutionary Adaptive Genomics, Institute of Biochemistry and Biology, Faculty of Mathematics and Life Sciences, University of Potsdam, 14476 Potsdam, Germany
| | - Emil Karpinski
- McMaster Ancient DNA Centre, Department of Anthropology, McMaster University, Hamilton, ON L8S 4L9, Canada
- Department of Biology, McMaster University, Hamilton, ON L8S 4K1, Canada
- Department of Biochemistry, McMaster University, Hamilton, ON L8S 4L8, Canada
- The Michael G. DeGroote Institute for Infectious Disease Research, McMaster University, Hamilton, ON L8S 4L8, Canada
| | - Atma M Ivancevic
- Department of Genetics and Evolution, School of Biological Sciences, The University of Adelaide, Adelaide, 5005 SA, Australia
| | - Thu-Hien To
- Department of Genetics and Evolution, School of Biological Sciences, The University of Adelaide, Adelaide, 5005 SA, Australia
| | - R Daniel Kortschak
- Department of Genetics and Evolution, School of Biological Sciences, The University of Adelaide, Adelaide, 5005 SA, Australia
| | - Joy M Raison
- Department of Genetics and Evolution, School of Biological Sciences, The University of Adelaide, Adelaide, 5005 SA, Australia
| | - Zhipeng Qu
- Department of Genetics and Evolution, School of Biological Sciences, The University of Adelaide, Adelaide, 5005 SA, Australia
| | - Tat-Jun Chin
- School of Computer Science, The University of Adelaide, 5005 SA, Australia
| | - Kurt W Alt
- Center of Natural and Cultural Human History, Danube Private University, A-3500 Krems, Austria
- Department of Biomedical Engineering, University Hospital Basel, University of Basel, CH-4123 Basel, Switzerland
- Integrative Prehistory and Archaeological Science, University of Basel, CH-4055 Basel, Switzerland
| | | | - Love Dalén
- Department of Bioinformatics and Genetics, Swedish Museum of Natural History, SE-10405 Stockholm, Sweden
| | - Ross D E MacPhee
- Division of Vertebrate Zoology/Mammalogy, American Museum of Natural History, New York, NY 10024
| | - Harald Meller
- State Office for Heritage Management and Archaeology, 06114 Halle (Saale), Germany
| | - Alfred L Roca
- Department of Animal Sciences, University of Illinois at Urbana-Champaign, Urbana, IL 61801
- Institute for Genomic Biology, University of Illinois at Urbana-Champaign, Urbana, IL 61801
| | - Oliver A Ryder
- Institute for Conservation Research, San Diego Zoo, Escondido, CA 92027
| | - David Heiman
- Broad Institute of MIT and Harvard, Cambridge, MA 02142
| | - Sarah Young
- Broad Institute of MIT and Harvard, Cambridge, MA 02142
| | - Matthew Breen
- Department of Molecular Biomedical Sciences, College of Veterinary Medicine, North Carolina State University, Raleigh, NC 27607
| | - Christina Williams
- Department of Molecular Biomedical Sciences, College of Veterinary Medicine, North Carolina State University, Raleigh, NC 27607
| | - Bronwen L Aken
- European Molecular Biology Laboratory, European Bioinformatics Institute, Hinxton, CB10 1SD Cambridge, United Kingdom
- Wellcome Sanger Institute, Hinxton, CB10 1SD Cambridge, United Kingdom
| | - Magali Ruffier
- European Molecular Biology Laboratory, European Bioinformatics Institute, Hinxton, CB10 1SD Cambridge, United Kingdom
- Wellcome Sanger Institute, Hinxton, CB10 1SD Cambridge, United Kingdom
| | - Elinor Karlsson
- Broad Institute of MIT and Harvard, Cambridge, MA 02142
- Program in Bioinformatics and Integrative Biology, University of Massachusetts Medical School, Worcester, MA 01655
| | | | | | | | - David L Adelson
- Department of Genetics and Evolution, School of Biological Sciences, The University of Adelaide, Adelaide, 5005 SA, Australia
| | - Thomas Mailund
- Bioinformatics Research Centre, Aarhus University, DK-8000 Aarhus, Denmark
| | - Kasper Munch
- Bioinformatics Research Centre, Aarhus University, DK-8000 Aarhus, Denmark
| | - Kerstin Lindblad-Toh
- Broad Institute of MIT and Harvard, Cambridge, MA 02142
- Science for Life Laboratory, Department of Medical Biochemistry and Microbiology, Uppsala University, 751 23 Uppsala, Sweden
| | - Michael Hofreiter
- Unit of General Zoology-Evolutionary Adaptive Genomics, Institute of Biochemistry and Biology, Faculty of Mathematics and Life Sciences, University of Potsdam, 14476 Potsdam, Germany
| | - Hendrik Poinar
- McMaster Ancient DNA Centre, Department of Anthropology, McMaster University, Hamilton, ON L8S 4L9, Canada
- Department of Biology, McMaster University, Hamilton, ON L8S 4K1, Canada
- Department of Biochemistry, McMaster University, Hamilton, ON L8S 4L8, Canada
- The Michael G. DeGroote Institute for Infectious Disease Research, McMaster University, Hamilton, ON L8S 4L8, Canada
| | - David Reich
- Department of Genetics, Harvard Medical School, Boston, MA 02115
- Broad Institute of MIT and Harvard, Cambridge, MA 02142
- Howard Hughes Medical Institute, Harvard Medical School, Boston, MA 02115
|
44
|
Stukenbrock EH, Dutheil JY. Fine-Scale Recombination Maps of Fungal Plant Pathogens Reveal Dynamic Recombination Landscapes and Intragenic Hotspots. Genetics 2018; 208:1209-1229. [PMID: 29263029 PMCID: PMC5844332 DOI: 10.1534/genetics.117.300502] [Citation(s) in RCA: 45] [Impact Index Per Article: 7.5] [Received: 11/12/2017] [Accepted: 12/15/2017] [Indexed: 11/18/2022]
Abstract
Meiotic recombination is an important driver of evolution. Variability in the intensity of recombination across chromosomes can affect sequence composition, nucleotide variation, and rates of adaptation. In many organisms, recombination events are concentrated within short segments termed recombination hotspots. The variation in recombination rate and the positions of recombination hotspots can be studied using population genomics data and statistical methods. In this study, we conducted population genomics analyses to address the evolution of recombination in two closely related fungal plant pathogens: the prominent wheat pathogen Zymoseptoria tritici and a sister species infecting wild grasses, Z. ardabiliae. We specifically addressed whether recombination landscapes, including hotspot positions, are conserved in the two recently diverged species and whether recombination contributes to rapid evolution of pathogenicity traits. We conducted a detailed simulation analysis to assess the performance of methods of recombination rate estimation based on patterns of linkage disequilibrium, in particular in the context of high nucleotide diversity. Our analyses reveal overall high recombination rates, a lack of suppressed recombination in centromeres, and significantly lower recombination rates on chromosomes that are known to be accessory. The comparison of the recombination landscapes of the two species reveals a strong correlation of recombination rate at the megabase scale, but little correlation at smaller scales. The recombination landscapes in both pathogen species are dominated by frequent recombination hotspots across the genome including coding regions, suggesting a strong impact of recombination on gene evolution. A significant but small fraction of these hotspots colocalize between the two species, suggesting that hotspot dynamics contribute to the overall pattern of fast evolving recombination in these species.
Affiliation(s)
- Eva H Stukenbrock
- Environmental Genomics, Max Planck Institute for Evolutionary Biology, 24306 Plön, Germany
- Environmental Genomics, Christian-Albrechts University of Kiel, 24118, Germany
| | - Julien Y Dutheil
- Evolutionary Genetics, Max Planck Institute for Evolutionary Biology, 24306 Plön, Germany
- Institut des Sciences de L'Évolution de Montpellier, Centre National de la Recherche Scientifique, Université Montpellier 2, 34095, France
|
45
|
Elleouet JS, Aitken SN. Exploring Approximate Bayesian Computation for inferring recent demographic history with genomic markers in nonmodel species. Mol Ecol Resour 2018; 18:525-540. [DOI: 10.1111/1755-0998.12758] [Citation(s) in RCA: 14] [Impact Index Per Article: 2.3] [Received: 08/08/2017] [Accepted: 01/16/2018] [Indexed: 01/11/2023]
Affiliation(s)
- Joane S. Elleouet
- Department of Forest and Conservation Sciences; Faculty of Forestry; University of British Columbia; Vancouver BC Canada
| | - Sally N. Aitken
- Department of Forest and Conservation Sciences; Faculty of Forestry; University of British Columbia; Vancouver BC Canada
|
46
|
Prüfer K, de Filippo C, Grote S, Mafessoni F, Korlević P, Hajdinjak M, Vernot B, Skov L, Hsieh P, Peyrégne S, Reher D, Hopfe C, Nagel S, Maricic T, Fu Q, Theunert C, Rogers R, Skoglund P, Chintalapati M, Dannemann M, Nelson BJ, Key FM, Rudan P, Kućan Ž, Gušić I, Golovanova LV, Doronichev VB, Patterson N, Reich D, Eichler EE, Slatkin M, Schierup MH, Andrés AM, Kelso J, Meyer M, Pääbo S. A high-coverage Neandertal genome from Vindija Cave in Croatia. Science 2017; 358:655-658. [PMID: 28982794 PMCID: PMC6185897 DOI: 10.1126/science.aao1887] [Citation(s) in RCA: 310] [Impact Index Per Article: 44.3] [Received: 06/26/2017] [Accepted: 09/27/2017] [Indexed: 12/30/2022]
Abstract
To date, the only Neandertal genome that has been sequenced to high quality is from an individual found in Southern Siberia. We sequenced the genome of a female Neandertal from ~50,000 years ago from Vindija Cave, Croatia, to ~30-fold genomic coverage. She carried 1.6 differences per 10,000 base pairs between the two copies of her genome, fewer than present-day humans, suggesting that Neandertal populations were of small size. Our analyses indicate that she was more closely related to the Neandertals that mixed with the ancestors of present-day humans living outside of sub-Saharan Africa than the previously sequenced Neandertal from Siberia, allowing 10 to 20% more Neandertal DNA to be identified in present-day humans, including variants involved in low-density lipoprotein cholesterol concentrations, schizophrenia, and other diseases.
Affiliation(s)
- Kay Prüfer
- Max Planck Institute for Evolutionary Anthropology, 04103 Leipzig, Germany.
| | - Cesare de Filippo
- Max Planck Institute for Evolutionary Anthropology, 04103 Leipzig, Germany
| | - Steffi Grote
- Max Planck Institute for Evolutionary Anthropology, 04103 Leipzig, Germany
| | - Fabrizio Mafessoni
- Max Planck Institute for Evolutionary Anthropology, 04103 Leipzig, Germany
| | - Petra Korlević
- Max Planck Institute for Evolutionary Anthropology, 04103 Leipzig, Germany
| | - Mateja Hajdinjak
- Max Planck Institute for Evolutionary Anthropology, 04103 Leipzig, Germany
| | - Benjamin Vernot
- Max Planck Institute for Evolutionary Anthropology, 04103 Leipzig, Germany
| | - Laurits Skov
- Bioinformatics Research Centre, Aarhus University, DK-8000 Aarhus C, Denmark
| | - Pinghsun Hsieh
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA 98195, USA
| | - Stéphane Peyrégne
- Max Planck Institute for Evolutionary Anthropology, 04103 Leipzig, Germany
| | - David Reher
- Max Planck Institute for Evolutionary Anthropology, 04103 Leipzig, Germany
| | - Charlotte Hopfe
- Max Planck Institute for Evolutionary Anthropology, 04103 Leipzig, Germany
| | - Sarah Nagel
- Max Planck Institute for Evolutionary Anthropology, 04103 Leipzig, Germany
| | - Tomislav Maricic
- Max Planck Institute for Evolutionary Anthropology, 04103 Leipzig, Germany
| | - Qiaomei Fu
- Key Laboratory of Vertebrate Evolution and Human Origins of Chinese Academy of Sciences, Institute of Vertebrate Paleontology and Paleoanthropology, Chinese Academy of Sciences, Beijing 100044, China
| | - Christoph Theunert
- Max Planck Institute for Evolutionary Anthropology, 04103 Leipzig, Germany
- Department of Integrative Biology, University of California, Berkeley, CA 94720-3140, USA
| | - Rebekah Rogers
- Department of Integrative Biology, University of California, Berkeley, CA 94720-3140, USA
| | - Pontus Skoglund
- Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA
| | | | - Michael Dannemann
- Max Planck Institute for Evolutionary Anthropology, 04103 Leipzig, Germany
| | - Bradley J Nelson
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA 98195, USA
| | - Felix M Key
- Max Planck Institute for Evolutionary Anthropology, 04103 Leipzig, Germany
| | - Pavao Rudan
- Anthropology Center of the Croatian Academy of Sciences and Arts, 10000 Zagreb, Croatia
| | - Željko Kućan
- Anthropology Center of the Croatian Academy of Sciences and Arts, 10000 Zagreb, Croatia
| | - Ivan Gušić
- Anthropology Center of the Croatian Academy of Sciences and Arts, 10000 Zagreb, Croatia
| | | | | | - Nick Patterson
- Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA
| | - David Reich
- Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA
- Department of Genetics, Harvard Medical School, Boston, MA 02115, USA
- Howard Hughes Medical Institute, Harvard Medical School, Boston, MA 02115, USA
| | - Evan E Eichler
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA 98195, USA
- Howard Hughes Medical Institute, University of Washington, Seattle, WA 98195, USA
| | - Montgomery Slatkin
- Department of Integrative Biology, University of California, Berkeley, CA 94720-3140, USA
| | - Mikkel H Schierup
- Bioinformatics Research Centre, Aarhus University, DK-8000 Aarhus C, Denmark
| | - Aida M Andrés
- Max Planck Institute for Evolutionary Anthropology, 04103 Leipzig, Germany
| | - Janet Kelso
- Max Planck Institute for Evolutionary Anthropology, 04103 Leipzig, Germany
| | - Matthias Meyer
- Max Planck Institute for Evolutionary Anthropology, 04103 Leipzig, Germany
| | - Svante Pääbo
- Max Planck Institute for Evolutionary Anthropology, 04103 Leipzig, Germany.
|
47
|
Murray KD, Webers C, Ong CS, Borevitz J, Warthmann N. kWIP: The k-mer weighted inner product, a de novo estimator of genetic similarity. PLoS Comput Biol 2017; 13:e1005727. [PMID: 28873405 PMCID: PMC5600398 DOI: 10.1371/journal.pcbi.1005727] [Citation(s) in RCA: 31] [Impact Index Per Article: 4.4] [Received: 10/04/2016] [Revised: 09/15/2017] [Accepted: 08/21/2017] [Indexed: 11/18/2022]
Abstract
Modern genomics techniques generate overwhelming quantities of data. Extracting population genetic variation demands computationally efficient methods to determine genetic relatedness between individuals (or “samples”) in an unbiased manner, preferably de novo. Rapid estimation of genetic relatedness directly from sequencing data has the potential to overcome reference genome bias, and to verify that individuals belong to the correct genetic lineage before conclusions are drawn using mislabelled, or misidentified samples. We present the k-mer Weighted Inner Product (kWIP), an assembly-, and alignment-free estimator of genetic similarity. kWIP combines a probabilistic data structure with a novel metric, the weighted inner product (WIP), to efficiently calculate pairwise similarity between sequencing runs from their k-mer counts. It produces a distance matrix, which can then be further analysed and visualised. Our method does not require prior knowledge of the underlying genomes and applications include establishing sample identity and detecting mix-up, non-obvious genomic variation, and population structure. We show that kWIP can reconstruct the true relatedness between samples from simulated populations. By re-analysing several published datasets we show that our results are consistent with marker-based analyses. kWIP is written in C++, licensed under the GNU GPL, and is available from https://github.com/kdmurray91/kwip.
Affiliation(s)
- Kevin D. Murray
  - Research School of Biology, The Australian National University, Canberra, Australia
  - * E-mail: (KDM); (NW)
- Christfried Webers
  - Data61, CSIRO, Canberra, Australia
  - Research School of Computer Science, The Australian National University, Canberra, Australia
- Cheng Soon Ong
  - Data61, CSIRO, Canberra, Australia
  - Research School of Computer Science, The Australian National University, Canberra, Australia
- Justin Borevitz
  - Research School of Biology, The Australian National University, Canberra, Australia
- Norman Warthmann
  - Research School of Biology, The Australian National University, Canberra, Australia
  - * E-mail: (KDM); (NW)
48
Peyrégne S, Boyle MJ, Dannemann M, Prüfer K. Detecting ancient positive selection in humans using extended lineage sorting. Genome Res 2017; 27:1563-1572. [PMID: 28720580 PMCID: PMC5580715 DOI: 10.1101/gr.219493.116] [Citation(s) in RCA: 70] [Impact Index Per Article: 10.0] [Received: 12/09/2016] [Accepted: 07/05/2017] [Indexed: 01/20/2023]
Abstract
Natural selection that affected modern humans early in their evolution has likely shaped some of the traits that set present-day humans apart from their closest extinct and living relatives. The ability to detect ancient natural selection in the human genome could provide insights into the molecular basis for these human-specific traits. Here, we introduce a method for detecting ancient selective sweeps by scanning for extended genomic regions where our closest extinct relatives, Neandertals and Denisovans, fall outside of the present-day human variation. Regions that are unusually long indicate the presence of lineages that reached fixation in the human population faster than expected under neutral evolution. Using simulations, we show that the method is able to detect ancient events of positive selection and that it can differentiate those from background selection. Applying our method to the 1000 Genomes data set, we find evidence for ancient selective sweeps favoring regulatory changes and present a list of genomic regions that are predicted to underlie positively selected human specific traits.
Affiliation(s)
- Stéphane Peyrégne
  - Department of Evolutionary Genetics, Max Planck Institute for Evolutionary Anthropology, 04103 Leipzig, Germany
- Michael James Boyle
  - Department of Evolutionary Genetics, Max Planck Institute for Evolutionary Anthropology, 04103 Leipzig, Germany
- Michael Dannemann
  - Department of Evolutionary Genetics, Max Planck Institute for Evolutionary Anthropology, 04103 Leipzig, Germany
- Kay Prüfer
  - Department of Evolutionary Genetics, Max Planck Institute for Evolutionary Anthropology, 04103 Leipzig, Germany
49
Abstract
Bacteria can exchange and acquire new genetic material from other organisms directly and via the environment. This process, known as bacterial recombination, has a strong impact on the evolution of bacteria, for example, leading to the spread of antibiotic resistance across clades and species, and to the avoidance of clonal interference. Recombination hinders phylogenetic and transmission inference because it creates patterns of substitutions (homoplasies) inconsistent with the hypothesis of a single evolutionary tree. Bacterial recombination is typically modeled as statistically akin to gene conversion in eukaryotes, i.e., using the coalescent with gene conversion (CGC). However, this model can be very computationally demanding as it needs to account for the correlations of evolutionary histories of even distant loci. So, with the increasing popularity of whole genome sequencing, the need has emerged for a faster approach to model and simulate bacterial genome evolution. We present a new model that approximates the coalescent with gene conversion: the bacterial sequential Markov coalescent (BSMC). Our approach is based on a similar idea to the sequential Markov coalescent (SMC), an approximation of the coalescent with crossover recombination. However, bacterial recombination poses hurdles to a sequential Markov approximation, as it leads to strong correlations and linkage disequilibrium across very distant sites in the genome. Our BSMC overcomes these difficulties, and shows a considerable reduction in computational demand compared to the exact CGC, and very similar patterns in simulated data. We implemented our BSMC model within new simulation software FastSimBac. In addition to the decreased computational demand compared to previous bacterial genome evolution simulators, FastSimBac provides more general options for evolutionary scenarios, allowing population structure with migration, speciation, population size changes, and recombination hotspots. FastSimBac is available from https://bitbucket.org/nicofmay/fastsimbac, and is distributed as open source under the terms of the GNU General Public License. Lastly, we use the BSMC within an Approximate Bayesian Computation (ABC) inference scheme, and suggest that parameters simulated under the exact CGC can correctly be recovered, further showcasing the accuracy of the BSMC. With this ABC we infer recombination rate, mutation rate, and recombination tract length of Bacillus cereus from a whole genome alignment.
Affiliation(s)
- Nicola De Maio
  - Institute for Emerging Infections, Oxford Martin School, University of Oxford, Oxford, OX1 3PA, United Kingdom
  - Nuffield Department of Medicine, University of Oxford, Oxford, OX1 3PA, United Kingdom
- Daniel J Wilson
  - Institute for Emerging Infections, Oxford Martin School, University of Oxford, Oxford, OX1 3PA, United Kingdom
  - Nuffield Department of Medicine, University of Oxford, Oxford, OX1 3PA, United Kingdom
  - Wellcome Trust Centre for Human Genetics, University of Oxford, Oxford, OX1 3PA, United Kingdom
50
Kamm JA, Terhorst J, Song YS. Efficient computation of the joint sample frequency spectra for multiple populations. J Comput Graph Stat 2017; 26:182-194. [PMID: 28239248 DOI: 10.1080/10618600.2016.1159212] [Citation(s) in RCA: 30] [Impact Index Per Article: 4.3] [Indexed: 10/22/2022]
Abstract
A wide range of studies in population genetics have employed the sample frequency spectrum (SFS), a summary statistic which describes the distribution of mutant alleles at a polymorphic site in a sample of DNA sequences and provides a highly efficient dimensional reduction of large-scale population genomic variation data. Recently, there has been much interest in analyzing the joint SFS data from multiple populations to infer parameters of complex demographic histories, including variable population sizes, population split times, migration rates, admixture proportions, and so on. SFS-based inference methods require accurate computation of the expected SFS under a given demographic model. Although much methodological progress has been made, existing methods suffer from numerical instability and high computational complexity when multiple populations are involved and the sample size is large. In this paper, we present new analytic formulas and algorithms that enable accurate, efficient computation of the expected joint SFS for thousands of individuals sampled from hundreds of populations related by a complex demographic model with arbitrary population size histories (including piecewise-exponential growth). Our results are implemented in a new software package called momi (MOran Models for Inference). Through an empirical study we demonstrate our improvements to numerical stability and computational complexity.
Affiliation(s)
- John A Kamm
  - Department of Statistics, University of California, Berkeley
- Yun S Song
  - Departments of EECS, Statistics, and Integrative Biology, University of California, Berkeley