1
Shastry V, Musiani M, Novembre J. Jointly representing long-range genetic similarity and spatially heterogeneous isolation-by-distance. bioRxiv 2025:2025.02.10.637386. [PMID: 39990319 PMCID: PMC11844421 DOI: 10.1101/2025.02.10.637386] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 02/25/2025]
Abstract
Isolation-by-distance patterns in genetic variation are a widespread feature of the geographic structure of genetic variation in many species, and many methods have been developed to illuminate such patterns in genetic data. However, long-range genetic similarities also exist, often as a result of rare or episodic long-range gene flow. Jointly characterizing patterns of isolation-by-distance and long-range genetic similarity in genetic data is an open data analysis challenge that, if resolved, could help produce more complete representations of the geographic structure of genetic data in any given species. Here, we present a computationally tractable method that identifies long-range genetic similarities in a background of spatially heterogeneous isolation-by-distance variation. The method uses a coalescent-based framework, and models long-range genetic similarity in terms of directional events with source fractions describing the fraction of ancestry at a location tracing back to a remote source. The method produces geographic maps annotated with inferred long-range edges, as well as maps of uncertainty in the geographic location of each source of long-range gene flow. We have implemented the method in a package called FEEMSmix (an extension to FEEMS from Marcus et al., 2021), and validated its implementation using simulations representative of typical data applications. We also apply this method to two empirical data sets. In a data set of over 4,000 humans (Homo sapiens) across Afro-Eurasia, we recover many known signals of long-distance dispersal from recent centuries. Similarly, in a data set of over 100 gray wolves (Canis lupus) across North America, we identify several previously unknown long-range connections, some of which were attributable to recording errors in sampling locations. Therefore, beyond identifying genuine long-range dispersals, our approach also serves as a useful tool for quality control in spatial genetic studies.
Affiliation(s)
- Vivaswat Shastry
  - Committee on Genetics, Genomics and Systems Biology, University of Chicago, Chicago, IL, USA
- Marco Musiani
  - Department of Biological, Geological, and Environmental Sciences, University of Bologna, Bologna, Italy
- John Novembre
  - Department of Human Genetics, University of Chicago, Chicago, IL, USA
2
Williams MP, Flegontov P, Maier R, Huber CD. Testing times: disentangling admixture histories in recent and complex demographies using ancient DNA. Genetics 2024; 228:iyae110. [PMID: 39013011 PMCID: PMC11373510 DOI: 10.1093/genetics/iyae110] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/08/2024] [Revised: 04/08/2024] [Accepted: 06/11/2024] [Indexed: 07/18/2024] Open
Abstract
Our knowledge of human evolutionary history has been greatly advanced by paleogenomics. Since the 2020s, the study of ancient DNA has increasingly focused on reconstructing the recent past. However, the accuracy of paleogenomic methods in resolving questions of historical and archaeological importance amidst the increased demographic complexity and decreased genetic differentiation remains an open question. We evaluated the performance and behavior of two commonly used methods, qpAdm and the f3-statistic, on admixture inference under a diversity of demographic models and data conditions. We performed two complementary simulation approaches-firstly exploring a wide demographic parameter space under four simple demographic models of varying complexities and configurations using branch-length data from two chromosomes-and secondly, we analyzed a model of Eurasian history composed of 59 populations using whole-genome data modified with ancient DNA conditions such as SNP ascertainment, data missingness, and pseudohaploidization. We observe that population differentiation is the primary factor driving qpAdm performance. Notably, while complex gene flow histories influence which models are classified as plausible, they do not reduce overall performance. Under conditions reflective of the historical period, qpAdm most frequently identifies the true model as plausible among a small candidate set of closely related populations. To increase the utility for resolving fine-scaled hypotheses, we provide a heuristic for further distinguishing between candidate models that incorporates qpAdm model P-values and f3-statistics. Finally, we demonstrate a significant performance increase for qpAdm using whole-genome branch-length f2-statistics, highlighting the potential for improved demographic inference that could be achieved with future advancements in f-statistic estimations.
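As a rough illustration of the f3-statistic named in the abstract above, the following Python sketch computes the simplest frequency-based estimate, f3(C; A, B), as the mean over SNPs of (c - a)(c - b), where a, b, and c are allele frequencies in two candidate sources and a target. The function name `f3_statistic` and the toy frequencies are hypothetical. The sketch assumes known population frequencies and omits the finite-sample bias correction and block-jackknife standard errors that real analyses use, so it should not be read as the procedure evaluated by the authors.

```python
from statistics import fmean


def f3_statistic(target, source1, source2):
    """Naive f3(target; source1, source2) from per-SNP allele frequencies.

    Assumption: inputs are equal-length lists of population allele
    frequencies in [0, 1]; no finite-sample correction is applied.
    """
    if not (len(target) == len(source1) == len(source2)):
        raise ValueError("all frequency lists must have the same length")
    if not target:
        raise ValueError("at least one SNP is required")
    products = [(c - a) * (c - b) for c, a, b in zip(target, source1, source2)]
    return fmean(products)


# Toy data (hypothetical): the target is an equal mixture of the two sources.
a = [0.1, 0.5, 0.9, 0.3]
b = [0.7, 0.2, 0.4, 0.8]
c = [0.5 * (x + y) for x, y in zip(a, b)]

# A negative value is the classic signal that the target is admixed
# between populations related to the two sources.
print(f3_statistic(c, a, b))
```

In this toy case the target frequencies lie exactly between the two sources at every SNP, so each per-SNP product is non-positive and the statistic is negative.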
Affiliation(s)
- Matthew P Williams
  - Department of Biology, Pennsylvania State University, University Park, PA 16802, USA
- Pavel Flegontov
  - Department of Biology and Ecology, University of Ostrava, Ostrava 701 03, Czechia
  - Department of Human Evolutionary Biology, Harvard University, Cambridge, MA 02138, USA
- Robert Maier
  - Department of Human Evolutionary Biology, Harvard University, Cambridge, MA 02138, USA
- Christian D Huber
  - Department of Biology, Pennsylvania State University, University Park, PA 16802, USA
3
Przelomska NAS, Diaz RA, Ávila FA, Ballen GA, Cortés-B R, Kistler L, Chitwood DH, Charitonidou M, Renner SS, Pérez-Escobar OA, Antonelli A. Morphometrics and Phylogenomics of Coca (Erythroxylum spp.) Illuminate Its Reticulate Evolution, With Implications for Taxonomy. Mol Biol Evol 2024; 41:msae114. [PMID: 38982580 PMCID: PMC11233275 DOI: 10.1093/molbev/msae114] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/08/2023] [Revised: 05/01/2024] [Accepted: 05/10/2024] [Indexed: 07/11/2024] Open
Abstract
South American coca (Erythroxylum coca and E. novogranatense) has been a keystone crop for many Andean and Amazonian communities for at least 8,000 years. However, over the last half-century, global demand for its alkaloid cocaine has driven intensive agriculture of this plant and placed it in the center of armed conflict and deforestation. To monitor the changing landscape of coca plantations, the United Nations Office on Drugs and Crime collects annual data on their areas of cultivation. However, attempts to delineate areas in which different varieties are grown have failed due to limitations around identification. In the absence of flowers, identification relies on leaf morphology, yet the extent to which this is reflected in taxonomy is uncertain. Here, we analyze the consistency of the current naming system of coca and its four closest wild relatives (the "coca clade"), using morphometrics, phylogenomics, molecular clocks, and population genomics. We include name-bearing type specimens of coca's closest wild relatives E. gracilipes and E. cataractarum. Morphometrics of 342 digitized herbarium specimens show that leaf shape and size fail to reliably discriminate between species and varieties. However, the statistical analyses illuminate that rounder and more obovate leaves of certain varieties could be associated with the subtle domestication syndrome of coca. Our phylogenomic data indicate extensive gene flow involving E. gracilipes which, combined with morphometrics, supports E. gracilipes being retained as a single species. Establishing a robust evolutionary-taxonomic framework for the coca clade will facilitate the development of cost-effective genotyping methods to support reliable identification.
Affiliation(s)
- Natalia A S Przelomska
  - School of Biological Sciences, University of Portsmouth, Portsmouth PO1 2DY, UK
  - Royal Botanic Gardens, Kew, Richmond, Surrey TW9 3AE, UK
  - Department of Anthropology, National Museum of Natural History, Smithsonian Institution, Washington DC 20560, USA
- Rudy A Diaz
  - Royal Botanic Gardens, Kew, Richmond, Surrey TW9 3AE, UK
- Gustavo A Ballen
  - Instituto de Biociências, Universidade Estadual Paulista, Botucatu, São Paulo, Brazil
  - School of Biological and Behavioural Sciences, Queen Mary University of London, London E1 4NS, UK
- Rocío Cortés-B
  - Herbario Forestal Universidad Distrital, Campus El Vivero, CR 5E 15-82 Bogotá, Colombia
- Logan Kistler
  - Department of Anthropology, National Museum of Natural History, Smithsonian Institution, Washington DC 20560, USA
- Daniel H Chitwood
  - Department of Horticulture, Michigan State University, East Lansing, MI 48824, USA
  - Department of Computational Mathematics, Science & Engineering, Michigan State University, East Lansing, MI 48824, USA
- Martha Charitonidou
  - Department of Biological Applications and Technology, University of Ioannina, 45110 Ioannina, Greece
- Susanne S Renner
  - Department of Biology, Washington University, Saint Louis, MO 63130, USA
- Alexandre Antonelli
  - Royal Botanic Gardens, Kew, Richmond, Surrey TW9 3AE, UK
  - Gothenburg Global Biodiversity Centre, Department of Biological and Environmental Sciences, University of Gothenburg, SE 41319 Göteborg, Sweden
  - Department of Biology, University of Oxford, Oxford OX1 3RB, UK
4
Reyna-Blanco CS, Caduff M, Galimberti M, Leuenberger C, Wegmann D. Inference of Locus-Specific Population Mixtures from Linked Genome-Wide Allele Frequencies. Mol Biol Evol 2024; 41:msae137. [PMID: 38958167 PMCID: PMC11255385 DOI: 10.1093/molbev/msae137] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/06/2023] [Revised: 06/26/2024] [Accepted: 06/27/2024] [Indexed: 07/04/2024] Open
Abstract
Admixture between populations and species is common in nature. Since the influx of new genetic material might be either facilitated or hindered by selection, variation in mixture proportions along the genome is expected in organisms undergoing recombination. Various graph-based models have been developed to better understand these evolutionary dynamics of population splits and mixtures. However, current models assume a single mixture rate for the entire genome and do not explicitly account for linkage. Here, we introduce TreeSwirl, a novel method for inferring branch lengths and locus-specific mixture proportions by using genome-wide allele frequency data, assuming that the admixture graph is known or has been inferred. TreeSwirl builds upon TreeMix that uses Gaussian processes to estimate the presence of gene flow between diverged populations. However, in contrast to TreeMix, our model infers locus-specific mixture proportions employing a hidden Markov model that accounts for linkage. Through simulated data, we demonstrate that TreeSwirl can accurately estimate locus-specific mixture proportions and handle complex demographic scenarios. It also outperforms related D- and f-statistics in terms of accuracy and sensitivity to detect introgressed loci.
Affiliation(s)
- Carlos S Reyna-Blanco
  - Department of Biology, University of Fribourg, Fribourg 1700, Switzerland
  - Swiss Institute of Bioinformatics, Fribourg 1700, Switzerland
- Madleina Caduff
  - Department of Biology, University of Fribourg, Fribourg 1700, Switzerland
  - Swiss Institute of Bioinformatics, Fribourg 1700, Switzerland
- Marco Galimberti
  - Department of Biology, University of Fribourg, Fribourg 1700, Switzerland
  - Swiss Institute of Bioinformatics, Fribourg 1700, Switzerland
  - Department of Psychiatry, Yale University School of Medicine, New Haven, CT, USA
  - Veterans Affairs Connecticut Healthcare System, West Haven, CT, USA
- Daniel Wegmann
  - Department of Biology, University of Fribourg, Fribourg 1700, Switzerland
  - Swiss Institute of Bioinformatics, Fribourg 1700, Switzerland
5
Yang C, Zhang X, Yan S, Yang S, Wu B, You F, Cui Y, Xie N, Wang Z, Jin L, Xu S, Zhang M. Large-scale lexical and genetic alignment supports a hybrid model of Han Chinese demic and cultural diffusions. Nat Hum Behav 2024; 8:1163-1176. [PMID: 38740988 DOI: 10.1038/s41562-024-01886-9] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 08/29/2023] [Accepted: 04/11/2024] [Indexed: 05/16/2024]
Abstract
The Han Chinese history is shaped by substantial demographic activities and sociocultural transmissions. However, it remains challenging to assess the contributions of demic and cultural diffusion to Han culture and language, primarily due to the lack of rigorous examination of genetic-linguistic congruence. Here we digitized a large-scale linguistic inventory comprising 1,018 lexical traits across 926 dialect varieties. Using phylogenetic analysis and admixture inference, we revealed a north-south gradient of lexical differences that probably resulted from historical migrations. Furthermore, we quantified extensive horizontal language transfers and pinpointed central China as a dialectal melting pot. Integrating genetic data from 30,408 Han Chinese individuals, we compared the lexical and genetic landscapes across 26 provinces. Our results support a hybrid model where demic diffusion predominantly impacts central China, while cultural diffusion and language assimilation occur in southwestern and coastal regions, respectively. This interdisciplinary study sheds light on the complex social-genetic history of the Han Chinese.
Affiliation(s)
- Chengkun Yang
  - State Key Laboratory of Genetic Engineering, Center for Evolutionary Biology, Human Phenome Institute, Zhangjiang Fudan International Innovation Center, School of Life Science, Fudan University, Shanghai, China
- Xiaoxi Zhang
  - School of Life Science and Technology, ShanghaiTech University, Shanghai, China
  - Key Laboratory of Computational Biology, Shanghai Institute of Nutrition and Health, University of Chinese Academy of Sciences, Chinese Academy of Sciences, Shanghai, China
- Shi Yan
  - School of Ethnology and Sociology, Minzu University of China, Beijing, China
- Sizhe Yang
  - State Key Laboratory of Genetic Engineering, Center for Evolutionary Biology, Human Phenome Institute, Zhangjiang Fudan International Innovation Center, School of Life Science, Fudan University, Shanghai, China
- Baihui Wu
  - State Key Laboratory of Genetic Engineering, Center for Evolutionary Biology, Human Phenome Institute, Zhangjiang Fudan International Innovation Center, School of Life Science, Fudan University, Shanghai, China
- Fengshuo You
  - Department of Chinese Language and Literature, Fudan University, Shanghai, China
- Yue Cui
  - Department of Cultural Heritage and Museology, Fudan University, Shanghai, China
- Ni Xie
  - Department of Linguistics and Modern Languages, The Chinese University of Hong Kong, Hong Kong, China
- Zhiyi Wang
  - Department of Chinese Language and Literature, Fudan University, Shanghai, China
- Li Jin
  - State Key Laboratory of Genetic Engineering, Center for Evolutionary Biology, Human Phenome Institute, Zhangjiang Fudan International Innovation Center, School of Life Science, Fudan University, Shanghai, China
- Shuhua Xu
  - State Key Laboratory of Genetic Engineering, Center for Evolutionary Biology, Human Phenome Institute, Zhangjiang Fudan International Innovation Center, School of Life Science, Fudan University, Shanghai, China
  - School of Life Science and Technology, ShanghaiTech University, Shanghai, China
  - Key Laboratory of Computational Biology, Shanghai Institute of Nutrition and Health, University of Chinese Academy of Sciences, Chinese Academy of Sciences, Shanghai, China
- Menghan Zhang
  - Institute of Modern Languages and Linguistics, Fudan University, Shanghai, China
  - Research Institute of Intelligent Complex Systems, Fudan University, Shanghai, China
  - Ministry of Education Key Laboratory of Contemporary Anthropology, Fudan University, Shanghai, China
6
Senczuk G, Macrì M, Di Civita M, Mastrangelo S, Del Rosario Fresno M, Capote J, Pilla F, Delgado JV, Amills M, Martínez A. The demographic history and adaptation of Canarian goat breeds to environmental conditions through the use of genome-wide SNP data. Genet Sel Evol 2024; 56:2. [PMID: 38172652 PMCID: PMC10763158 DOI: 10.1186/s12711-023-00869-0] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/04/2023] [Accepted: 12/18/2023] [Indexed: 01/05/2024] Open
Abstract
BACKGROUND The presence of goats in the Canary Islands dates back to the late 1st millennium BC, which coincides with the colonization by the Amazigh settlers. However, the exact geographic origin of Canarian goats is uncertain since the Amazigh peoples were distributed over a wide spatial range. Nowadays, three Canarian breeds (Palmera, Majorera and Tinerfeña) are officially recognized, along with two distinct South and North Tinerfeña ecotypes, with the South Tinerfeña and Majorera goats thriving in arid and dry semi-desertic environments and the Palmera and North Tinerfeña goats adapted to humid and temperate areas that are influenced by trade winds. Genotypes for 224 Canarian goats were generated using the Illumina Goat single nucleotide polymorphism (SNP)50 BeadChip. By merging these data with the genotypes from 1007 individuals of African and Southern European ancestry, our aim was to ascertain the geographic origin of the Canarian goats and identify genes associated with adaptation to diverse environmental conditions.
RESULTS The diversity indices of the Canarian breeds align with most of those of the analyzed local breeds from Africa and Europe, except for the Palmera goats that showed lower levels of genetic variation. The Canarian breeds demonstrate a significant genetic differentiation compared to other populations, which indicates a history of prolonged geographic isolation. Moreover, the phylogenetic reconstruction indicated that the ancestry of the Canarian goats is fundamentally North African rather than West African. The ADMIXTURE and the TreeMix analyses showed no evidence of gene flow between Canarian goats and other continental breeds. The analysis of runs of homozygosity (ROH) identified 13 ROH islands while the window-based FST method detected 25 genomic regions under selection. Major signals of selection were found on Capra hircus (CHI) chromosomes 6, 7, and 10 using various comparisons and methods.
CONCLUSIONS This genome-wide analysis sheds new light on the evolutionary history of the four breeds that inhabit the Canary Islands. Our findings suggest a North African origin of the Canarian goats. In addition, within the genomic regions highlighted by the ROH and FST approaches, several genes related to body size and heat tolerance were identified.
Affiliation(s)
- Gabriele Senczuk
  - Department of Agricultural, Environmental and Food Sciences, University of Molise, 86100, Campobasso, Italy
- Martina Macrì
  - Animal Breeding Consulting S.L., 14014, Córdoba, Spain
  - Universidad de Córdoba, 14071, Córdoba, Spain
- Marika Di Civita
  - Department of Agricultural, Environmental and Food Sciences, University of Molise, 86100, Campobasso, Italy
- Salvatore Mastrangelo
  - Department of Agricultural, Food and Forest Sciences, University of Palermo, 90128, Palermo, Italy
- Juan Capote
  - Instituto Canario de Investigaciones Científicas, 38260, Tenerife, Spain
- Fabio Pilla
  - Department of Agricultural, Environmental and Food Sciences, University of Molise, 86100, Campobasso, Italy
- Marcel Amills
  - CRAG, CSIC-IRTA-UAB-UB, Universitat Autònoma de Barcelona, 08193, Bellaterra, Spain
7
Williams MP, Flegontov P, Maier R, Huber CD. Testing Times: Challenges in Disentangling Admixture Histories in Recent and Complex Demographies. bioRxiv 2023:2023.11.13.566841. [PMID: 38014190 PMCID: PMC10680674 DOI: 10.1101/2023.11.13.566841] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/29/2023]
Abstract
Paleogenomics has expanded our knowledge of human evolutionary history. Since the 2020s, the study of ancient DNA has increased its focus on reconstructing the recent past. However, the accuracy of paleogenomic methods in answering questions of historical and archaeological importance amidst the increased demographic complexity and decreased genetic differentiation within the historical period remains an open question. We used two simulation approaches to evaluate the limitations and behavior of commonly used methods, qpAdm and the f3-statistic, on admixture inference. The first is based on branch-length data simulated from four simple demographic models of varying complexities and configurations. The second is an analysis of a model of Eurasian history composed of 59 populations, using whole-genome data modified with ancient DNA conditions such as SNP ascertainment, data missingness, and pseudo-haploidization. We show that under conditions resembling historical populations, qpAdm can identify a small candidate set of true sources and populations closely related to them. However, in typical ancient DNA conditions, qpAdm is unable to further distinguish between them, limiting its utility for resolving fine-scaled hypotheses. Notably, we find that complex gene-flow histories generally lead to improvements in the performance of qpAdm and observe no bias in the estimation of admixture weights. We offer a heuristic for admixture inference that incorporates the admixture weight estimates and P-values of qpAdm models, together with f3-statistics, to enhance the power to distinguish between multiple plausible candidates. Finally, we highlight the future potential of qpAdm through whole-genome branch-length f2-statistics, demonstrating the improved demographic inference that could be achieved with advancements in f-statistic estimations.
Affiliation(s)
- Matthew P. Williams
  - Pennsylvania State University, Department of Biology, University Park, PA 16802, USA
- Pavel Flegontov
  - Department of Biology and Ecology, Faculty of Science, University of Ostrava, Ostrava, Czechia
  - Department of Human Evolutionary Biology, Harvard University, Cambridge, MA, USA
- Robert Maier
  - Department of Human Evolutionary Biology, Harvard University, Cambridge, MA, USA
- Christian D. Huber
  - Pennsylvania State University, Department of Biology, University Park, PA 16802, USA
8
Chen X, Cornille A, An N, Xing L, Ma J, Zhao C, Wang Y, Han M, Zhang D. The East Asian wild apples, Malus baccata (L.) Borkh and Malus hupehensis (Pamp.) Rehder., are additional contributors to the genomes of cultivated European and Chinese varieties. Mol Ecol 2023; 32:5125-5139. [PMID: 35510734 DOI: 10.1111/mec.16485] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 09/19/2021] [Revised: 04/09/2022] [Accepted: 04/17/2022] [Indexed: 11/29/2022]
Abstract
The domestication process in long-lived plant perennials differs dramatically from that of annuals, with a huge amount of genetic exchange between crop and wild populations. Though apple is a major fruit crop grown worldwide, the contribution of wild apple species to the genetic makeup of the cultivated apple genome remains a topic of intense study. We used population genomics approaches to investigate the contributions of several wild apple species to European and Chinese rootstock and dessert genomes, with a focus on the extent of wild-crop gene flow. Population genetic structure inferences revealed that the East Asian wild apples, Malus baccata (L.) Borkh and M. hupehensis (Pamp.), form a single panmictic group, and that the European dessert and rootstock apples form a specific gene pool whereas the Chinese dessert and rootstock apples were a mixture of three wild gene pools, suggesting different evolutionary histories of European and Chinese apple varieties. Coalescent-based inferences and gene flow estimates indicated that M. baccata - M. hupehensis contributed to the genome of both European and Chinese cultivated apples through wild-to-crop introgressions, and not as an initial contributor as previously supposed. We also confirmed the contribution through wild-to-crop introgressions of Malus sylvestris Mill. to the cultivated apple genome. Apple tree domestication is therefore one example in woody perennials that involved gene flow from several wild species from multiple geographical areas. This study provides an example of a complex protracted process of domestication in long-lived plant perennials, and is a starting point for apple breeding programmes.
Affiliation(s)
- Xilong Chen
  - College of Horticulture, Yangling Sub-Center of National Center for Apple Improvement, Northwest A&F University, Yangling, Shaanxi, China
  - Université Paris Saclay, INRAE, CNRS, AgroParisTech, GQE - Le Moulon, Gif-sur-Yvette, France
- Amandine Cornille
  - Université Paris Saclay, INRAE, CNRS, AgroParisTech, GQE - Le Moulon, Gif-sur-Yvette, France
- Na An
  - College of Life Sciences, Northwest A&F University, Yangling, Shaanxi, China
- Libo Xing
  - College of Horticulture, Yangling Sub-Center of National Center for Apple Improvement, Northwest A&F University, Yangling, Shaanxi, China
- Juanjuan Ma
  - College of Horticulture, Yangling Sub-Center of National Center for Apple Improvement, Northwest A&F University, Yangling, Shaanxi, China
- Caiping Zhao
  - College of Horticulture, Yangling Sub-Center of National Center for Apple Improvement, Northwest A&F University, Yangling, Shaanxi, China
- Yibin Wang
  - College of Horticulture, Yangling Sub-Center of National Center for Apple Improvement, Northwest A&F University, Yangling, Shaanxi, China
- Mingyu Han
  - College of Horticulture, Yangling Sub-Center of National Center for Apple Improvement, Northwest A&F University, Yangling, Shaanxi, China
- Dong Zhang
  - College of Horticulture, Yangling Sub-Center of National Center for Apple Improvement, Northwest A&F University, Yangling, Shaanxi, China
9
Flegontov P, Işıldak U, Maier R, Yüncü E, Changmai P, Reich D. Modeling of African population history using f-statistics is biased when applying all previously proposed SNP ascertainment schemes. PLoS Genet 2023; 19:e1010931. [PMID: 37676865 PMCID: PMC10508636 DOI: 10.1371/journal.pgen.1010931] [Citation(s) in RCA: 6] [Impact Index Per Article: 3.0] [Received: 01/22/2023] [Revised: 09/19/2023] [Accepted: 08/21/2023] [Indexed: 09/09/2023] Open
Abstract
f-statistics have emerged as a first line of analysis for making inferences about demographic history from genome-wide data. Not only are they guaranteed to allow robust tests of the fits of proposed models of population history to data when analyzing full genome sequencing data-that is, all single nucleotide polymorphisms (SNPs) in the individuals being analyzed-but they are also guaranteed to allow robust tests of models for SNPs ascertained as polymorphic in a population that is an outgroup in a phylogenetic sense to all groups being analyzed. True "outgroup ascertainment" is in practice impossible in humans because our species has arisen from a substructured ancestral population that does not descend from a homogeneous ancestral population going back many hundreds of thousands of years into the past. However, initial studies suggested that non-outgroup-ascertainment schemes might produce robust enough results using f-statistics, and that motivated widespread fitting of models to data using non-outgroup-ascertained SNP panels such as the "Affymetrix Human Origins array" which has been genotyped on thousands of modern individuals from hundreds of populations, or the "1240k" in-solution enrichment reagent which has been the source of about 70% of published genome-wide data for ancient humans. In this study, we show that while analyses of population history using such panels work well for studies of relationships among non-African populations and one African outgroup, when co-modeling more than one sub-Saharan African and/or archaic human groups (Neanderthals and Denisovans), fitting of f-statistics to such SNP sets is expected to frequently lead to false rejection of true demographic histories, and failure to reject incorrect models. Analyzing panels of SNPs polymorphic in archaic humans, which has been suggested as a solution for the ascertainment problem, has limited statistical power and retains important biases. 
However, by carrying out simulations of diverse demographic histories, we show that bias in inferences based on f-statistics can be minimized by ascertaining on variants common in a union of diverse African groups; such ascertainment retains high statistical power while allowing co-analysis of archaic and modern groups.
Affiliation(s)
- Pavel Flegontov
  - Department of Human Evolutionary Biology, Harvard University, Cambridge, Massachusetts, United States of America
  - Department of Biology and Ecology, Faculty of Science, University of Ostrava, Ostrava, Czechia
  - Kalmyk Research Center of the Russian Academy of Sciences, Elista, Russia
- Ulaş Işıldak
  - Department of Biology and Ecology, Faculty of Science, University of Ostrava, Ostrava, Czechia
- Robert Maier
  - Department of Human Evolutionary Biology, Harvard University, Cambridge, Massachusetts, United States of America
- Eren Yüncü
  - Department of Biology and Ecology, Faculty of Science, University of Ostrava, Ostrava, Czechia
- Piya Changmai
  - Department of Biology and Ecology, Faculty of Science, University of Ostrava, Ostrava, Czechia
- David Reich
  - Department of Human Evolutionary Biology, Harvard University, Cambridge, Massachusetts, United States of America
  - Department of Genetics, Harvard Medical School, Boston, Massachusetts, United States of America
  - Howard Hughes Medical Institute, Harvard Medical School, Boston, Massachusetts, United States of America
  - Broad Institute of Harvard and MIT, Cambridge, Massachusetts, United States of America
10
Moorjani P, Hellenthal G. Methods for Assessing Population Relationships and History Using Genomic Data. Annu Rev Genomics Hum Genet 2023; 24:305-332. [PMID: 37220313 PMCID: PMC11040641 DOI: 10.1146/annurev-genom-111422-025117] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/25/2023]
Abstract
Genetic data contain a record of our evolutionary history. The availability of large-scale datasets of human populations from various geographic areas and timescales, coupled with advances in the computational methods to analyze these data, has transformed our ability to use genetic data to learn about our evolutionary past. Here, we review some of the widely used statistical methods to explore and characterize population relationships and history using genomic data. We describe the intuition behind commonly used approaches, their interpretation, and important limitations. For illustration, we apply some of these techniques to genome-wide autosomal data from 929 individuals representing 53 worldwide populations that are part of the Human Genome Diversity Project. Finally, we discuss the new frontiers in genomic methods to learn about population history. In sum, this review highlights the power (and limitations) of DNA to infer features of human evolutionary history, complementing the knowledge gleaned from other disciplines, such as archaeology, anthropology, and linguistics.
Affiliation(s)
- Priya Moorjani
  - Department of Molecular and Cell Biology and Center for Computational Biology, University of California, Berkeley, California, USA
- Garrett Hellenthal
  - UCL Genetics Institute and Research Department of Genetics, Evolution, and Environment, University College London, London, United Kingdom
|
11
|
Buswell VG, Ellis JS, Huml JV, Wragg D, Barnett MW, Brown A, Knight ME. When One's Not Enough: Colony Pool-Seq Outperforms Individual-Based Methods for Assessing Introgression in Apis mellifera mellifera. INSECTS 2023; 14:insects14050421. [PMID: 37233049 DOI: 10.3390/insects14050421] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/10/2023] [Revised: 04/22/2023] [Accepted: 04/24/2023] [Indexed: 05/27/2023]
Abstract
The human management of honey bees (Apis mellifera) has resulted in the widespread introduction of subspecies outside of their native ranges. One well known example of this is Apis mellifera mellifera, native to Northern Europe, which has now been significantly introgressed by the introduction of C lineage honey bees. Introgression has consequences for species in terms of future adaptive potential and long-term viability. However, estimating introgression in colony-living haplodiploid species is challenging. Previous studies have estimated introgression using individual workers, individual drones, multiple drones, and pooled workers. Here, we compare introgression estimates via three genetic approaches: SNP array, individual RAD-seq, and pooled colony RAD-seq. We also compare two statistical approaches: a maximum likelihood cluster program (ADMIXTURE) and an incomplete lineage sorting model (ABBA BABA). Overall, individual approaches resulted in lower introgression estimates than pooled colonies when using ADMIXTURE. However, the pooled colony ABBA BABA approach resulted in generally lower introgression estimates than all three ADMIXTURE estimates. These results highlight that sometimes one individual is not enough to assess colony-level introgression, and future studies that do use colony pools should not be solely dependent on clustering programs for introgression estimates.
Affiliation(s)
- Victoria G Buswell
  - School of Biological and Marine Sciences, University of Plymouth, Drake Circus, Plymouth PL4 8AA, UK
  - Information and Computational Sciences, The James Hutton Institute, Dundee DD2 5DA, UK
- Jonathan S Ellis
  - School of Biological and Marine Sciences, University of Plymouth, Drake Circus, Plymouth PL4 8AA, UK
- J Vanessa Huml
  - School of Biological and Marine Sciences, University of Plymouth, Drake Circus, Plymouth PL4 8AA, UK
- David Wragg
  - The Roslin Institute and Royal (Dick) School of Veterinary Studies, University of Edinburgh, Easter Bush Campus, Roslin EH25 9RG, UK
  - Beebytes Analytics CIC, Roslin Innovation Centre, Easter Bush Campus, Roslin EH25 9RG, UK
- Mark W Barnett
  - Beebytes Analytics CIC, Roslin Innovation Centre, Easter Bush Campus, Roslin EH25 9RG, UK
- Andrew Brown
  - B4, Newton Farm Metherell, Cornwall, Callington PL17 8DQ, UK
- Mairi E Knight
  - School of Biological and Marine Sciences, University of Plymouth, Drake Circus, Plymouth PL4 8AA, UK
|
12
|
Nielsen SV, Vaughn AH, Leppälä K, Landis MJ, Mailund T, Nielsen R. Bayesian inference of admixture graphs on Native American and Arctic populations. PLoS Genet 2023; 19:e1010410. [PMID: 36780565 PMCID: PMC9956672 DOI: 10.1371/journal.pgen.1010410] [Citation(s) in RCA: 6] [Impact Index Per Article: 3.0] [Received: 09/06/2022] [Revised: 02/24/2023] [Accepted: 01/23/2023] [Indexed: 02/15/2023] Open
Abstract
Admixture graphs are mathematical structures that describe the ancestry of populations in terms of divergence and merging (admixing) of ancestral populations as a graph. An admixture graph consists of a graph topology, branch lengths, and admixture proportions. The branch lengths and admixture proportions can be estimated using numerous numerical optimization methods, but inferring the topology involves a combinatorial search for which no polynomial algorithm is known. In this paper, we present a reversible jump MCMC algorithm for sampling high-probability admixture graphs and show that this approach works well both as a heuristic search for a single best-fitting graph and for summarizing shared features extracted from posterior samples of graphs. We apply the method to 11 Native American and Siberian populations and exploit the shared structure of high-probability graphs to characterize the relationship between Saqqaq, Inuit, Koryaks, and Athabascans. Our analyses show that the Saqqaq is not a good proxy for the previously identified gene flow from Arctic people into the Na-Dene speaking Athabascans.
Affiliation(s)
- Svend V. Nielsen
  - Bioinformatics Research Centre, Aarhus University, Aarhus, Denmark
- Andrew H. Vaughn
  - Center for Computational Biology, University of California Berkeley, Berkeley, California, United States of America
- Kalle Leppälä
  - Bioinformatics Research Centre, Aarhus University, Aarhus, Denmark
  - Research Unit of Mathematical Sciences, University of Oulu, Oulu, Finland
- Michael J. Landis
  - Department of Biology, Washington University in St. Louis, St. Louis, Missouri, United States of America
- Thomas Mailund
  - Bioinformatics Research Centre, Aarhus University, Aarhus, Denmark
- Rasmus Nielsen
  - Departments of Integrative Biology and Statistics, University of California Berkeley, Berkeley, California, United States of America
  - Center for GeoGenetics, University of Copenhagen, Copenhagen, Denmark
|
13
|
Song M, Wang X, Zhao C, Qian X, Lang M, Hou Y, Song F. Inference of population structure and admixture proportion from Y chromosomal data of Chinese population. Electrophoresis 2022; 43:2351-2362. [PMID: 35973689 DOI: 10.1002/elps.202200041] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/18/2022] [Revised: 07/14/2022] [Accepted: 08/11/2022] [Indexed: 12/14/2022]
Abstract
In the past two decades, Y chromosome data has been generated for human population genetic studies. These Y chromosome datasets were produced with various testing methods and markers and are thus difficult to combine for a comprehensive analysis. In this study, we combine four human Y chromosomal datasets of Han, Tibetan, Hui, and Li ethnic groups. The dataset contains 27 microsatellites and 137 single nucleotide polymorphisms that these populations share in common. We assembled a single dataset containing 2439 individuals from 25 nationwide populations in China. A systematic analysis of genetic distance and clustering was performed. To determine the gene flow of the studied population with worldwide populations, we modeled the ancestry informative markers. The reference panel was regarded as a mixture of South Asian (SAS), East Asian (EAS), European (EUR), African (AFR), and American (AMR) populations from 1000 Genomes data of Y chromosome using nonlinear data-fitting. We then calculated the admixture proportion of these four studied populations with 26 worldwide populations. The results showed that the Han and Hui have great genetic affinity, and Hui is the most admixed ethnic group, with 61.53% EAS, 34.65% SAS, 1.91% AFR, 1.56% AMR, and 0.04% EUR ancestry component (the AMR is highly admixed and thus should be ignored). All the other three ethnic groups contained more than 97% EAS ancestry component. The Li is the least admixed population in this study. The combined dataset in this study is the largest of this kind reported to date and proposes reference population data for use in future paternal genetic studies and forensic genealogical identification.
Affiliation(s)
- Mengyuan Song
  - Department of Laboratory Medicine, West China Hospital, Sichuan University; Med+Molecular Diagnostics Institute of West China Hospital/West China School of Medicine, Chengdu, P. R. China
  - Institute of Forensic Medicine, West China School of Basic Medical Sciences & Forensic Medicine, Sichuan University, Chengdu, P. R. China
- Xindi Wang
  - Institute of Forensic Medicine, West China School of Basic Medical Sciences & Forensic Medicine, Sichuan University, Chengdu, P. R. China
- Chenxi Zhao
  - College of Computer Science, Sichuan University, Chengdu, P. R. China
- Xiaoqin Qian
  - Institute of Forensic Medicine, West China School of Basic Medical Sciences & Forensic Medicine, Sichuan University, Chengdu, P. R. China
- Min Lang
  - Law School, Sichuan University, Chengdu, P. R. China
- Yiping Hou
  - Institute of Forensic Medicine, West China School of Basic Medical Sciences & Forensic Medicine, Sichuan University, Chengdu, P. R. China
- Feng Song
  - Institute of Forensic Medicine, West China School of Basic Medical Sciences & Forensic Medicine, Sichuan University, Chengdu, P. R. China
|
14
|
Fujimoto S, Yaguchi H, Myosho T, Aoyama H, Sato Y, Kimura R. Population admixtures in medaka inferred by multiple arbitrary amplicon sequencing. Sci Rep 2022; 12:19989. [PMID: 36411327 PMCID: PMC9678866 DOI: 10.1038/s41598-022-24498-7] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/18/2022] [Accepted: 11/16/2022] [Indexed: 11/23/2022] Open
Abstract
Cost-effective genotyping can be achieved by sequencing PCR amplicons. Short 3-10 base primers can arbitrarily amplify thousands of loci using only a few primers. To improve the sequencing efficiency of the multiple arbitrary amplicon sequencing (MAAS) approach, we designed new primers and examined their efficiency in sequencing and genotyping. To demonstrate the effectiveness of our method, we applied it to examining the population structure of the small freshwater fish, medaka (Oryzias latipes). We obtained 2987 informative SNVs with no missing genotype calls for 67 individuals from 15 wild populations and three artificial strains. The estimated phylogenetic and population genetic structures of the wild populations were consistent with previous studies, corroborating the accuracy of our genotyping method. We also attempted to reconstruct the genetic backgrounds of a commercial orange mutant strain, Himedaka, which has caused a genetic disturbance in wild populations. Our admixture analysis focusing on Himedaka showed that at least two wild populations had contributed genetically to the nuclear genome of this mutant strain. Our genotyping methods and results will be useful in quantitative assessments of genetic disturbance by this commercially available strain.
Affiliation(s)
- Shingo Fujimoto
  - Graduate School of Medicine, University of the Ryukyus, Nishihara, Okinawa 903-0125 Japan
  - Present Address: Research Laboratory Center, Faculty of Medicine, University of the Ryukyus, Nishihara, Okinawa 903-0213 Japan
  - Tropical Biosphere Research Center, University of the Ryukyus, Nishihara, Okinawa 903-0213 Japan
- Hajime Yaguchi
  - Tropical Biosphere Research Center, University of the Ryukyus, Nishihara, Okinawa 903-0213 Japan
  - Present Address: Department of Bioscience, School of Science and Technology, Kwansei Gakuin University, Nishihara, Hyogo 669-1330 Japan
- Taijun Myosho
  - Laboratory of Molecular Reproductive Biology, Institute for Environmental Sciences, University of Shizuoka, Nishihara, 422-8526 Japan
- Hiroaki Aoyama
  - Center for Strategic and Research Center, University of the Ryukyus, Nishihara, Okinawa 903-0213 Japan
  - Research Planning Office, University of the Ryukyus, Nishihara, Okinawa 903-0213 Japan
- Yukuto Sato
  - Present Address: Research Laboratory Center, Faculty of Medicine, University of the Ryukyus, Nishihara, Okinawa 903-0213 Japan
  - Center for Strategic and Research Center, University of the Ryukyus, Nishihara, Okinawa 903-0213 Japan
- Ryosuke Kimura
  - Graduate School of Medicine, University of the Ryukyus, Nishihara, Okinawa 903-0125 Japan
|
15
|
Gunn JC, Berkman LK, Koppelman J, Taylor AT, Brewer SK, Long JM, Eggert LS. Genomic divergence, local adaptation, and complex demographic history may inform management of a popular sportfish species complex. Ecol Evol 2022; 12:e9370. [PMID: 36225830 PMCID: PMC9534746 DOI: 10.1002/ece3.9370] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/10/2022] [Revised: 08/29/2022] [Accepted: 09/05/2022] [Indexed: 11/05/2022] Open
Abstract
The Neosho Bass (Micropterus velox), a former subspecies of the keystone top-predator and globally popular Smallmouth Bass (M. dolomieu), is endemic and narrowly restricted to small, clear streams of the Arkansas River Basin in the Central Interior Highlands (CIH) ecoregion, USA. Previous studies have detected some morphological, genetic, and genomic differentiation between the Neosho and Smallmouth Basses; however, the extent of neutral and adaptive divergence and patterns of intraspecific diversity are poorly understood. Furthermore, lineage diversification has likely been impacted by gene flow in some Neosho populations, which may be due to a combination of natural biogeographic processes and anthropogenic introductions. We assessed: (1) lineage divergence, (2) local directional selection (adaptive divergence), and (3) demographic history among Smallmouth Bass populations in the CIH using population genomic analyses of 50,828 single-nucleotide polymorphisms (SNPs) obtained through ddRAD-seq. Neosho and Smallmouth Bass formed monophyletic clades with 100% bootstrap support. We identified two major lineages within each species. We discovered six Neosho Bass populations (two nonadmixed and four admixed) and three nonadmixed Smallmouth Bass populations. We detected 29 SNPs putatively under directional selection in the Neosho range, suggesting populations may be locally adapted. Two populations were admixed via recent asymmetric secondary contact, perhaps after anthropogenic introduction. Two other populations were likely admixed via combinations of ancient and recent processes. These species comprise independently evolving lineages, some having experienced historical and natural admixture. These results may be critical for management of Neosho Bass as a distinct species and may aid in the conservation of other species with complex biogeographic histories.
Affiliation(s)
- Joe C. Gunn
  - Division of Biological Sciences, University of Missouri, Columbia, Missouri, USA
- Andrew T. Taylor
  - Department of Biology, University of Central Oklahoma, Edmond, Oklahoma, USA
  - Department of Biology, University of North Georgia, Dahlonega, Georgia, USA
- Shannon K. Brewer
  - U.S. Geological Survey, Alabama Cooperative Fish and Wildlife Research Unit, School of Fisheries, Aquaculture, and Aquatic Sciences, Auburn University, Auburn, Alabama, USA
- James M. Long
  - U.S. Geological Survey, Oklahoma Cooperative Fish and Wildlife Research Unit, Department of Natural Resource Ecology and Management, Oklahoma State University, Stillwater, Oklahoma, USA
- Lori S. Eggert
  - Division of Biological Sciences, University of Missouri, Columbia, Missouri, USA
|
16
|
Zhang W, Wang H, Brandt DYC, Hu B, Sheng J, Wang M, Luo H, Li Y, Guo S, Sheng B, Zeng Q, Peng K, Zhao D, Jian S, Wu D, Wang J, Zhao G, Ren J, Shi W, van Esch JHM, Klingunga S, Nielsen R, Hong Y. The genetic architecture of phenotypic diversity in the Betta fish (Betta splendens). SCIENCE ADVANCES 2022; 8:eabm4955. [PMID: 36129976 PMCID: PMC9491723 DOI: 10.1126/sciadv.abm4955] [Citation(s) in RCA: 10] [Impact Index Per Article: 3.3] [Received: 09/21/2021] [Accepted: 08/03/2022] [Indexed: 05/28/2023]
Abstract
The Betta fish displays a remarkable variety of phenotypes selected during domestication. However, the genetic basis underlying these traits remains largely unexplored. Here, we report a high-quality genome assembly and resequencing of 727 individuals representing diverse morphotypes of the Betta fish. We show that current breeds have a complex domestication history with extensive introgression with wild species. Using a genome-wide association study, we identify the genetic basis of multiple traits, including coloration patterns, the "Dumbo" phenotype with pectoral fin outgrowth, extraordinary enlargement of body size that we map to a major locus on chromosome 8, the sex determination locus that we map to dmrt1, and the long-fin phenotype that maps to the locus containing kcnj15. We also identify a polygenic signal related to aggression, involving multiple neural system-related genes such as esyt2, apbb2, and pank2. Our study provides a resource for developing the Betta fish as a genetic model for morphological and behavioral research in vertebrates.
Affiliation(s)
- Wanchang Zhang
  - School of Life Sciences, Nanchang University, Nanchang 330031, China
- Hongru Wang
  - Department of Integrative Biology, University of California, Berkeley, Berkeley, CA 94720, USA
- Débora Y. C. Brandt
  - Department of Integrative Biology, University of California, Berkeley, Berkeley, CA 94720, USA
- Beijuan Hu
  - School of Life Sciences, Nanchang University, Nanchang 330031, China
- Junqing Sheng
  - School of Life Sciences, Nanchang University, Nanchang 330031, China
- Mengnan Wang
  - School of Life Sciences, Nanchang University, Nanchang 330031, China
- Haijiang Luo
  - School of Life Sciences, Nanchang University, Nanchang 330031, China
- Yahui Li
  - Department of Molecular, Cell and Systems Biology, University of California, Riverside, Riverside, CA 92521, USA
- Shujie Guo
  - School of Life Sciences, Nanchang University, Nanchang 330031, China
- Bin Sheng
  - School of Life Sciences, Nanchang University, Nanchang 330031, China
- Qi Zeng
  - School of Life Sciences, Nanchang University, Nanchang 330031, China
- Kou Peng
  - School of Life Sciences, Nanchang University, Nanchang 330031, China
- Daxian Zhao
  - School of Life Sciences, Nanchang University, Nanchang 330031, China
- Shaoqing Jian
  - School of Life Sciences, Nanchang University, Nanchang 330031, China
- Di Wu
  - School of Life Sciences, Nanchang University, Nanchang 330031, China
- Junhua Wang
  - School of Life Sciences, Nanchang University, Nanchang 330031, China
- Guang Zhao
  - School of Life Sciences, Nanchang University, Nanchang 330031, China
- Jun Ren
  - College of Animal Science, South China Agricultural University, Guangzhou 510642, China
- Wentian Shi
  - Faculty of Philosophy, University of Tübingen, Tübingen 72074, Germany
- Joep H. M. van Esch
  - Biology and Medical Laboratory Research, Rotterdam University of Applied Sciences, Rotterdam 3015, Netherlands
- Sirawut Klingunga
  - Aquatic Molecular Genetics and Biotechnology Research Team, National Center for Genetic Engineering and Biotechnology, National Science and Technology Development Agency (NSTDA), Pathum Thani 12120, Thailand
- Rasmus Nielsen
  - Department of Integrative Biology, University of California, Berkeley, Berkeley, CA 94720, USA
  - Globe Institute, University of Copenhagen, Copenhagen DK-1165, Denmark
- Yijiang Hong
  - School of Life Sciences, Nanchang University, Nanchang 330031, China
  - Key Laboratory of Aquatic Resources and Utilization, Nanchang University, Nanchang 330031, China
|
17
|
Kim MS, Naidoo D, Hazra U, Quiver MH, Chen WC, Simonti CN, Kachambwa P, Harlemon M, Agalliu I, Baichoo S, Fernandez P, Hsing AW, Jalloh M, Gueye SM, Niang L, Diop H, Ndoye M, Snyper NY, Adusei B, Mensah JE, Abrahams AOD, Biritwum R, Adjei AA, Adebiyi AO, Shittu O, Ogunbiyi O, Adebayo S, Aisuodionoe-Shadrach OI, Nwegbu MM, Ajibola HO, Oluwole OP, Jamda MA, Singh E, Pentz A, Joffe M, Darst BF, Conti DV, Haiman CA, Spies PV, van der Merwe A, Rohan TE, Jacobson J, Neugut AI, McBride J, Andrews C, Petersen LN, Rebbeck TR, Lachance J. Testing the generalizability of ancestry-specific polygenic risk scores to predict prostate cancer in sub-Saharan Africa. Genome Biol 2022; 23:194. [PMID: 36100952 PMCID: PMC9472407 DOI: 10.1186/s13059-022-02766-z] [Citation(s) in RCA: 27] [Impact Index Per Article: 9.0] [Received: 09/30/2021] [Accepted: 09/05/2022] [Indexed: 12/24/2022] Open
Abstract
BACKGROUND: Genome-wide association studies do not always replicate well across populations, limiting the generalizability of polygenic risk scores (PRS). Despite higher incidence and mortality rates of prostate cancer in men of African descent, much of what is known about cancer genetics comes from populations of European descent. To understand how well genetic predictions perform in different populations, we evaluated test characteristics of PRS from three previous studies using data from the UK Biobank and a novel dataset of 1298 prostate cancer cases and 1333 controls from Ghana, Nigeria, Senegal, and South Africa.
RESULTS: Allele frequency differences cause predicted risks of prostate cancer to vary across populations. However, natural selection is not the primary driver of these differences. Comparing continental datasets, we find that polygenic predictions of case vs. control status are more effective for European individuals (AUC 0.608-0.707, OR 2.37-5.71) than for African individuals (AUC 0.502-0.585, OR 0.95-2.01). Furthermore, PRS that leverage information from African Americans yield modest AUC and odds ratio improvements for sub-Saharan African individuals. These improvements were larger for West Africans than for South Africans. Finally, we find that existing PRS are largely unable to predict whether African individuals develop aggressive forms of prostate cancer, as specified by higher tumor stages or Gleason scores.
CONCLUSIONS: Genetic predictions of prostate cancer perform poorly if the study sample does not match the ancestry of the original GWAS. PRS built from European GWAS may be inadequate for application in non-European populations and perpetuate existing health disparities.
Affiliation(s)
- Michelle S Kim
  - School of Biological Sciences, Georgia Institute of Technology, 950 Atlantic Dr, Atlanta, GA, 30332, USA
- Daphne Naidoo
  - Centre for Proteomic and Genomic Research, Cape Town, South Africa
- Ujani Hazra
  - School of Biological Sciences, Georgia Institute of Technology, 950 Atlantic Dr, Atlanta, GA, 30332, USA
- Melanie H Quiver
  - School of Biological Sciences, Georgia Institute of Technology, 950 Atlantic Dr, Atlanta, GA, 30332, USA
- Wenlong C Chen
  - Sydney Brenner Institute for Molecular Bioscience, Faculty of Health Sciences, University of the Witwatersrand, Johannesburg, South Africa
  - National Cancer Registry, National Health Laboratory Service, Johannesburg, South Africa
- Corinne N Simonti
  - School of Biological Sciences, Georgia Institute of Technology, 950 Atlantic Dr, Atlanta, GA, 30332, USA
- Maxine Harlemon
  - School of Biological Sciences, Georgia Institute of Technology, 950 Atlantic Dr, Atlanta, GA, 30332, USA
- Ilir Agalliu
  - Department of Epidemiology and Population Health, Albert Einstein College of Medicine, Bronx, NY, USA
- Pedro Fernandez
  - Faculty of Medicine and Health Sciences, Stellenbosch University, Cape Town, South Africa
- Ann W Hsing
  - Stanford Cancer Institute, Stanford University, Stanford, CA, USA
- Lamine Niang
  - Universite Cheikh Anta Diop de Dakar, Dakar, Senegal
- Medina Ndoye
  - Universite Cheikh Anta Diop de Dakar, Dakar, Senegal
- James E Mensah
  - Korle-Bu Teaching Hospital and University of Ghana Medical School, Accra, Ghana
- Afua O D Abrahams
  - Korle-Bu Teaching Hospital and University of Ghana Medical School, Accra, Ghana
- Richard Biritwum
  - Korle-Bu Teaching Hospital and University of Ghana Medical School, Accra, Ghana
- Andrew A Adjei
  - Department of Pathology, University of Ghana Medical School, Accra, Ghana
- Sikiru Adebayo
  - College of Medicine, University of Ibadan, Ibadan, Nigeria
- Maxwell M Nwegbu
  - College of Health Sciences, University of Abuja and University of Abuja Teaching Hospital, Abuja, Nigeria
- Hafees O Ajibola
  - College of Health Sciences, University of Abuja and University of Abuja Teaching Hospital, Abuja, Nigeria
- Olabode P Oluwole
  - College of Health Sciences, University of Abuja and University of Abuja Teaching Hospital, Abuja, Nigeria
- Mustapha A Jamda
  - College of Health Sciences, University of Abuja and University of Abuja Teaching Hospital, Abuja, Nigeria
- Elvira Singh
  - National Cancer Registry, National Health Laboratory Service, Johannesburg, South Africa
- Audrey Pentz
  - Non-Communicable Diseases Research Division, Wits Health Consortium (PTY) Ltd, Johannesburg, South Africa
- Maureen Joffe
  - Non-Communicable Diseases Research Division, Wits Health Consortium (PTY) Ltd, Johannesburg, South Africa
  - MRC Developmental Pathways to Health Research Unit, Department of Pediatrics, Faculty of Health Sciences, University of Witwatersrand, Johannesburg, South Africa
- Burcu F Darst
  - Keck School of Medicine, University of Southern California, Los Angeles, CA, USA
- David V Conti
  - Keck School of Medicine, University of Southern California, Los Angeles, CA, USA
- Christopher A Haiman
  - Keck School of Medicine, University of Southern California, Los Angeles, CA, USA
- Petrus V Spies
  - Faculty of Medicine and Health Sciences, Stellenbosch University, Cape Town, South Africa
- André van der Merwe
  - Faculty of Medicine and Health Sciences, Stellenbosch University, Cape Town, South Africa
- Thomas E Rohan
  - Department of Epidemiology and Population Health, Albert Einstein College of Medicine, Bronx, NY, USA
- Judith Jacobson
  - Herbert Irving Comprehensive Cancer Center, Columbia University, New York, NY, USA
- Alfred I Neugut
  - Herbert Irving Comprehensive Cancer Center, Columbia University, New York, NY, USA
- Jo McBride
  - Centre for Proteomic and Genomic Research, Cape Town, South Africa
- Timothy R Rebbeck
  - Dana-Farber Cancer Institute, Boston, MA, USA
  - Harvard T.H. Chan School of Public Health, Boston, MA, USA
- Joseph Lachance
  - School of Biological Sciences, Georgia Institute of Technology, 950 Atlantic Dr, Atlanta, GA, 30332, USA
|
18
|
Peter BM. A geometric relationship of F2, F3 and F4-statistics with principal component analysis. Philos Trans R Soc Lond B Biol Sci 2022; 377:20200413. [PMID: 35430884 PMCID: PMC9014194 DOI: 10.1098/rstb.2020.0413] [Citation(s) in RCA: 15] [Impact Index Per Article: 5.0] [Indexed: 12/16/2022] Open
Abstract
Principal component analysis (PCA) and F-statistics sensu Patterson are two of the most widely used population genetic tools to study human genetic variation. Here, I derive explicit connections between the two approaches and show that these two methods are closely related. F-statistics have a simple geometrical interpretation in the context of PCA, and orthogonal projections are a key concept to establish this link. I show that for any pair of populations, any population that is admixed as determined by an F3-statistic will lie inside a circle on a PCA plot. Furthermore, the F4-statistic is closely related to an angle measurement, and will be zero if the differences between pairs of populations intersect at a right angle in PCA space. I illustrate my results on two examples, one of Western Eurasian, and one of global human diversity. In both examples, I find that the first few PCs are sufficient to approximate most F-statistics, and that PCA plots are effective at predicting F-statistics. Thus, while F-statistics are commonly understood in terms of discrete populations, the geometric perspective illustrates that they can be viewed in a framework of populations that vary in a more continuous manner.
This article is part of the theme issue ‘Celebrating 50 years since Lewontin's apportionment of human diversity’.
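The geometric statements in this abstract can be made concrete with a short worked derivation. The sketch below assumes the usual Patterson-style definitions, which the abstract itself does not spell out; the symbols $a, b, c, d, x \in \mathbb{R}^S$ (allele-frequency vectors of populations $A, B, C, D, X$ over $S$ SNPs), $m$ and $r$ are notation introduced here for illustration only.

```latex
\begin{align}
F_2(A,B)     &= \tfrac{1}{S}\,\lVert a-b\rVert^2, \\
F_3(X;A,B)   &= \tfrac{1}{S}\,\langle x-a,\; x-b\rangle, \\
F_4(A,B;C,D) &= \tfrac{1}{S}\,\langle a-b,\; c-d\rangle .
\end{align}
% F_4 is an inner product, so it vanishes exactly when (a-b) and (c-d) are orthogonal.
% For F_3, write m = (a+b)/2 and r = \lVert a-b\rVert/2. Then
\begin{equation}
\langle x-a,\; x-b\rangle
  = \bigl\langle (x-m)+\tfrac{b-a}{2},\; (x-m)-\tfrac{b-a}{2}\bigr\rangle
  = \lVert x-m\rVert^2 - r^2 ,
\end{equation}
% so F_3(X;A,B) < 0 if and only if x lies strictly inside the sphere whose diameter is the segment from a to b.
```

Read this way, a negative F3 places the target population inside the sphere spanned by the two source populations; because projecting onto the leading PCs cannot increase distances, the point also falls inside the corresponding circle on a two-dimensional PCA plot, as the abstract describes.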
Affiliation(s)
- Benjamin M. Peter
  - Max-Planck-Institute for Evolutionary Anthropology, Leipzig 04103, Germany
|
19
|
Gautier M, Vitalis R, Flori L, Estoup A. ƒ-statistics estimation and admixture graph construction with Pool-Seq or allele count data using the R package poolfstat. Mol Ecol Resour 2021; 22:1394-1416. [PMID: 34837462 DOI: 10.1111/1755-0998.13557] [Citation(s) in RCA: 24] [Impact Index Per Article: 6.0] [Received: 05/28/2021] [Revised: 09/16/2021] [Accepted: 11/08/2021] [Indexed: 11/27/2022]
Abstract
By capturing various patterns of the structuring of genetic variation across populations, f-statistics have proved highly effective for the inference of demographic history. Such statistics are defined as covariance of SNP allele frequency differences among sets of populations without requiring haplotype information and are hence particularly relevant for the analysis of pooled sequencing (Pool-Seq) data. We here propose a reinterpretation of the F (and D) parameters in terms of probability of gene identity and derive from this unified definition unbiased estimators for both Pool-Seq data and standard allele count data obtained from individual genotypes. We implemented these estimators in a new version of the R package poolfstat, which now includes a wide range of inference methods: (i) three-population test of admixture; (ii) four-population test of treeness; (iii) F4-ratio estimation of admixture rates; and (iv) fitting, visualization and (semi-automatic) construction of admixture graphs. A comprehensive evaluation of the methods implemented in poolfstat on both simulated Pool-Seq (with various sequencing coverages and error rates) and allele count data confirmed the accuracy of these approaches, even for the most cost-effective Pool-Seq design involving relatively low sequencing coverages. We further analyzed a real Pool-Seq data set comprising 14 populations of the invasive species Drosophila suzukii, which allowed refining both the demographic history of native populations and the invasion routes followed by this emblematic pest. Our new package poolfstat provides the community with a user-friendly and efficient all-in-one tool to unravel complex population genetic histories from large-size Pool-Seq or allele count SNP data.
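To illustrate what "covariance of SNP allele frequency differences" means in practice, here is a minimal Python sketch of naive plug-in f2, f3 and f4 computed from population allele frequencies. It is not the poolfstat implementation and does not reproduce the unbiased estimators described in the abstract: it assumes known frequencies and ignores sampling and sequencing noise. The function names and the toy frequencies are hypothetical.

```python
import numpy as np

# Naive plug-in f-statistics from per-SNP allele frequencies.
# Assumption: frequencies are known exactly (no finite-sample bias correction).

def f2(pa, pb):
    """Mean squared allele-frequency difference between populations A and B."""
    return float(np.mean((pa - pb) ** 2))

def f3(pc, pa, pb):
    """Mean of (c - a) * (c - b); negative values suggest C is admixed between A and B."""
    return float(np.mean((pc - pa) * (pc - pb)))

def f4(pa, pb, pc, pd):
    """Mean of (a - b) * (c - d); values near zero are consistent with treeness."""
    return float(np.mean((pa - pb) * (pc - pd)))

# Toy allele frequencies at four SNPs (hypothetical values).
p_a = np.array([0.1, 0.4, 0.8, 0.3])
p_b = np.array([0.5, 0.2, 0.6, 0.9])
# Population C is constructed as an exact 50/50 mixture of A and B.
p_c = 0.5 * p_a + 0.5 * p_b

f2_ab = f2(p_a, p_b)
f3_c_ab = f3(p_c, p_a, p_b)  # negative for this exact 50/50 mixture
print(f2_ab, f3_c_ab)
```

For an exact mixture with proportion 0.5 and no subsequent drift, f3(C; A, B) equals minus one quarter of f2(A, B), which is why the three-population test reads a significantly negative f3 as evidence of admixture. Real data would also need a standard error, typically from a block jackknife, before any such interpretation.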
Affiliation(s)
- Mathieu Gautier
- CBGP, INRAE, CIRAD, IRD, Montpellier SupAgro, Univ Montpellier, Montpellier, France
| | - Renaud Vitalis
- CBGP, INRAE, CIRAD, IRD, Montpellier SupAgro, Univ Montpellier, Montpellier, France
| | - Laurence Flori
- SelMet, INRAE, CIRAD, Montpellier SupAgro, Montpellier, France
| | - Arnaud Estoup
- CBGP, INRAE, CIRAD, IRD, Montpellier SupAgro, Univ Montpellier, Montpellier, France
| |
|
20
|
Kitada S, Nakamichi R, Kishino H. Understanding population structure in an evolutionary context: population-specific FST and pairwise FST. G3-GENES GENOMES GENETICS 2021; 11:6364900. [PMID: 34549777 PMCID: PMC8527463 DOI: 10.1093/g3journal/jkab316] [Citation(s) in RCA: 12] [Impact Index Per Article: 3.0] [Received: 01/20/2021] [Accepted: 08/27/2021] [Indexed: 01/04/2023]
Abstract
Populations are shaped by their history. It is crucial to interpret population structure in an evolutionary context. Pairwise FST measures population structure, whereas population-specific FST measures deviation from the ancestral population. To understand the current population structure and a population’s history of range expansion, we propose a representation method that overlays population-specific FST estimates on a sampling location map, and on an unrooted neighbor-joining tree and a multi-dimensional scaling plot inferred from a pairwise FST distance matrix. We examined the usefulness of our procedure using simulations that mimicked population colonization from an ancestral population and by analyzing published human, Atlantic cod, and wild poplar data. Our results demonstrated that population-specific FST values identify the source population and trace the evolutionary history of its derived populations. Conversely, pairwise FST values represent the current population structure. By integrating the results of both estimators, we obtained a new picture of the population structure that incorporates evolutionary history. The generalized least squares estimate of genome-wide population-specific FST indicated that the wild poplar population expanded its distribution to the north, where daylight hours are long in summer, to coastal areas with abundant rainfall, and to the south where summers are dry. Genomic data highlight the power of the bias-corrected moment estimators of FST, whether global, pairwise, or population-specific, that provide unbiased estimates of FST. All FST moment estimators described in this paper have reasonable processing times and are useful in population genomics studies.
Affiliation(s)
- Shuichi Kitada
- Tokyo University of Marine Science and Technology, Tokyo 108-8477, Japan
| | | | - Hirohisa Kishino
- Graduate School of Agriculture and Life Sciences, The University of Tokyo, Tokyo 113-8657, Japan; The Research Institute of Evolutionary Biology, Tokyo 138-0098, Japan
| |
|
21
|
Ioannidis AG, Blanco-Portillo J, Sandoval K, Hagelberg E, Barberena-Jonas C, Hill AVS, Rodríguez-Rodríguez JE, Fox K, Robson K, Haoa-Cardinali S, Quinto-Cortés CD, Miquel-Poblete JF, Auckland K, Parks T, Sofro ASM, Ávila-Arcos MC, Sockell A, Homburger JR, Eng C, Huntsman S, Burchard EG, Gignoux CR, Verdugo RA, Moraga M, Bustamante CD, Mentzer AJ, Moreno-Estrada A. Paths and timings of the peopling of Polynesia inferred from genomic networks. Nature 2021; 597:522-526. [PMID: 34552258 PMCID: PMC9710236 DOI: 10.1038/s41586-021-03902-8] [Citation(s) in RCA: 21] [Impact Index Per Article: 5.3] [Received: 01/09/2020] [Accepted: 08/12/2021] [Indexed: 02/08/2023]
Abstract
Polynesia was settled in a series of extraordinary voyages across an ocean spanning one third of the Earth [1], but the sequences of islands settled remain unknown and their timings disputed. Currently, several centuries separate the dates suggested by different archaeological surveys [2-4]. Here, using genome-wide data from merely 430 modern individuals from 21 key Pacific island populations and novel ancestry-specific computational analyses, we unravel the detailed genetic history of this vast, dispersed island network. Our reconstruction of the branching Polynesian migration sequence reveals a serial founder expansion, characterized by directional loss of variants, that originated in Samoa and spread first through the Cook Islands (Rarotonga), then to the Society (Tōtaiete mā) Islands (11th century), the western Austral (Tuha'a Pae) Islands and Tuāmotu Archipelago (12th century), and finally to the widely separated, but genetically connected, megalithic statue-building cultures of the Marquesas (Te Henua 'Enana) Islands in the north, Raivavae in the south, and Easter Island (Rapa Nui), the easternmost of the Polynesian islands, settled in approximately AD 1200 via Mangareva.
Affiliation(s)
- Alexander G Ioannidis
- Institute for Computational and Mathematical Engineering, Stanford University, Stanford, CA, USA.
- National Laboratory of Genomics for Biodiversity (LANGEBIO)-Advanced Genomics Unit (UGA), CINVESTAV, Irapuato, Guanajuato, Mexico.
| | - Javier Blanco-Portillo
- National Laboratory of Genomics for Biodiversity (LANGEBIO)-Advanced Genomics Unit (UGA), CINVESTAV, Irapuato, Guanajuato, Mexico
| | - Karla Sandoval
- National Laboratory of Genomics for Biodiversity (LANGEBIO)-Advanced Genomics Unit (UGA), CINVESTAV, Irapuato, Guanajuato, Mexico
| | | | - Carmina Barberena-Jonas
- National Laboratory of Genomics for Biodiversity (LANGEBIO)-Advanced Genomics Unit (UGA), CINVESTAV, Irapuato, Guanajuato, Mexico
| | - Adrian V S Hill
- Wellcome Centre for Human Genetics, University of Oxford, Roosevelt Drive, Oxford, UK
- The Jenner Institute, Nuffield Department of Medicine, University of Oxford, Oxford, UK
| | - Juan Esteban Rodríguez-Rodríguez
- National Laboratory of Genomics for Biodiversity (LANGEBIO)-Advanced Genomics Unit (UGA), CINVESTAV, Irapuato, Guanajuato, Mexico
| | - Keolu Fox
- Department of Anthropology, University of California San Diego, La Jolla, CA, USA
| | - Kathryn Robson
- MRC Weatherall Institute of Molecular Medicine, University of Oxford, Oxford, UK
| | | | - Consuelo D Quinto-Cortés
- National Laboratory of Genomics for Biodiversity (LANGEBIO)-Advanced Genomics Unit (UGA), CINVESTAV, Irapuato, Guanajuato, Mexico
| | | | - Kathryn Auckland
- Wellcome Centre for Human Genetics, University of Oxford, Roosevelt Drive, Oxford, UK
| | - Tom Parks
- Wellcome Centre for Human Genetics, University of Oxford, Roosevelt Drive, Oxford, UK
| | - Abdul Salam M Sofro
- Department of Biochemistry, Faculty of Medicine, Yayasan Rumah Sakit Islam (YARSI) University, Cempaka Putih, Jakarta, Indonesia
| | - María C Ávila-Arcos
- International Laboratory for Human Genome Research (LIIGH), UNAM Juriquilla, Queretaro, Mexico
| | - Alexandra Sockell
- Center for Computational, Evolutionary and Human Genomics (CEHG), Stanford University, Stanford, CA, USA
| | - Julian R Homburger
- Center for Computational, Evolutionary and Human Genomics (CEHG), Stanford University, Stanford, CA, USA
| | - Celeste Eng
- Program in Pharmaceutical Sciences and Pharmacogenomics, Department of Medicine, University of California San Francisco, San Francisco, CA, USA
| | - Scott Huntsman
- Program in Pharmaceutical Sciences and Pharmacogenomics, Department of Medicine, University of California San Francisco, San Francisco, CA, USA
| | - Esteban G Burchard
- Program in Pharmaceutical Sciences and Pharmacogenomics, Department of Medicine, University of California San Francisco, San Francisco, CA, USA
| | - Christopher R Gignoux
- Division of Biomedical Informatics and Personalized Medicine, University of Colorado, Denver, CO, USA
| | - Ricardo A Verdugo
- Human Genetics Program, Institute of Biomedical Sciences, Faculty of Medicine, University of Chile, Santiago, Chile
- Translational Oncology Department, Faculty of Medicine, University of Chile, Santiago, Chile
| | - Mauricio Moraga
- Human Genetics Program, Institute of Biomedical Sciences, Faculty of Medicine, University of Chile, Santiago, Chile
- Department of Anthropology, Faculty of Social Sciences, University of Chile, Santiago, Chile
| | - Carlos D Bustamante
- Center for Computational, Evolutionary and Human Genomics (CEHG), Stanford University, Stanford, CA, USA
- Department of Biomedical Data Science, Stanford University, Stanford, CA, USA
| | - Alexander J Mentzer
- Wellcome Centre for Human Genetics, University of Oxford, Roosevelt Drive, Oxford, UK
- Big Data Institute, Li Ka Shing Centre for Health Information and Discovery, University of Oxford, Oxford, UK
| | - Andrés Moreno-Estrada
- National Laboratory of Genomics for Biodiversity (LANGEBIO)-Advanced Genomics Unit (UGA), CINVESTAV, Irapuato, Guanajuato, Mexico.
| |
|
22
|
Molloy EK, Durvasula A, Sankararaman S. Advancing admixture graph estimation via maximum likelihood network orientation. Bioinformatics 2021; 37:i142-i150. [PMID: 34252951 PMCID: PMC8336447 DOI: 10.1093/bioinformatics/btab267] [Citation(s) in RCA: 16] [Impact Index Per Article: 4.0] [Accepted: 04/21/2021] [Indexed: 11/18/2022] Open
Abstract
Motivation: Admixture, the interbreeding between previously distinct populations, is a pervasive force in evolution. The evolutionary history of populations in the presence of admixture can be modeled by augmenting phylogenetic trees with additional nodes that represent admixture events. While enabling a more faithful representation of evolutionary history, admixture graphs present formidable inferential challenges, and there is an increasing need for methods that are accurate, fully automated and computationally efficient. One key challenge arises from the size of the space of admixture graphs. Given that exhaustively evaluating all admixture graphs can be prohibitively expensive, heuristics have been developed to enable efficient search over this space. One heuristic, implemented in the popular method TreeMix, consists of adding edges to a starting tree while optimizing a suitable objective function.
Results: Here, we present a demographic model (with one admixed population incident to a leaf) where TreeMix and any other starting-tree-based maximum likelihood heuristic using its likelihood function is guaranteed to get stuck in a local optimum and return an incorrect network topology. To address this issue, we propose a new search strategy that we term maximum likelihood network orientation (MLNO). We augment TreeMix with an exhaustive search for an MLNO, referring to this approach as OrientAGraph. In evaluations including previously published admixture graphs, OrientAGraph outperformed TreeMix on 4/8 models (there are no differences in the other cases). Overall, OrientAGraph found graphs with higher likelihood scores and topological accuracy while remaining computationally efficient. Lastly, our study reveals several directions for improving maximum likelihood admixture graph estimation.
Availability and implementation: OrientAGraph is available on Github (https://github.com/sriramlab/OrientAGraph) under the GNU General Public License v3.0.
Supplementary information: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Erin K Molloy
- Department of Computer Science, University of California, Los Angeles, LA 90095, USA; Institute for Advanced Computer Studies, University of Maryland, College Park, College Park, MD 20740, USA
| | - Arun Durvasula
- Department of Human Genetics, David Geffen School of Medicine, University of California, Los Angeles, LA 90095, USA
| | - Sriram Sankararaman
- Department of Computer Science, University of California, Los Angeles, LA 90095, USA; Department of Human Genetics, David Geffen School of Medicine, University of California, Los Angeles, LA 90095, USA; Bioinformatics Interdepartmental Program, University of California, Los Angeles, LA 90095, USA; Department of Computational Medicine, University of California, Los Angeles, LA 90095, USA
| |
|
23
|
Wu Y. Inference of population admixture network from local gene genealogies: a coalescent-based maximum likelihood approach. Bioinformatics 2021; 36:i326-i334. [PMID: 32657366 PMCID: PMC7355278 DOI: 10.1093/bioinformatics/btaa465] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.8] [Indexed: 11/28/2022] Open
Abstract
Motivation: Population admixture is an important subject in population genetics. Inferring population demographic history with admixture under the so-called admixture network model from population genetic data is an established problem in genetics. Existing admixture network inference approaches work with single genetic polymorphisms. While these methods are usually very fast, they do not fully utilize the information [e.g. linkage disequilibrium (LD)] contained in population genetic data.
Results: In this article, we develop a new admixture network inference method called GTmix. Different from existing methods, GTmix works with local gene genealogies that can be inferred from population haplotypes. Local gene genealogies represent the evolutionary history of sampled haplotypes and contain the LD information. GTmix performs coalescent-based maximum likelihood inference of admixture networks with inferred local genealogies based on the well-known multispecies coalescent (MSC) model. GTmix utilizes various techniques to speed up the likelihood computation on the MSC model and the optimal network search. Our simulations show that GTmix can infer more accurate admixture networks with much smaller data than existing methods, even when these existing methods are given much larger data. GTmix is reasonably efficient and can analyze population genetic datasets of current interests.
Availability and implementation: The program GTmix is available for download at: https://github.com/yufengwudcs/GTmix.
Supplementary information: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Yufeng Wu
- Department of Computer Science and Engineering, University of Connecticut, Storrs, CT 06269, USA
| |
|
24
|
Fortes-Lima CA, Laurent R, Thouzeau V, Toupance B, Verdu P. Complex genetic admixture histories reconstructed with Approximate Bayesian Computation. Mol Ecol Resour 2021; 21:1098-1117. [PMID: 33452723 PMCID: PMC8247995 DOI: 10.1111/1755-0998.13325] [Citation(s) in RCA: 9] [Impact Index Per Article: 2.3] [Received: 10/04/2019] [Revised: 12/11/2020] [Accepted: 01/07/2021] [Indexed: 01/19/2023]
Abstract
Admixture is a fundamental evolutionary process that has influenced genetic patterns in numerous species. Maximum‐likelihood approaches based on allele frequencies and linkage‐disequilibrium have been extensively used to infer admixture processes from genome‐wide data sets, mostly in human populations. Nevertheless, complex admixture histories, beyond one or two pulses of admixture, remain methodologically challenging to reconstruct. We developed an Approximate Bayesian Computation (ABC) framework to reconstruct highly complex admixture histories from independent genetic markers. We built the software package MetHis to simulate independent SNPs or microsatellites in a two‐way admixed population for scenarios with multiple admixture pulses, monotonically decreasing or increasing recurring admixture, or combinations of these scenarios. MetHis allows users to draw model‐parameter values from prior distributions set by the user, and, for each simulation, MetHis can calculate numerous summary statistics describing genetic diversity patterns and moments of the distribution of individual admixture fractions. We coupled MetHis with existing machine‐learning ABC algorithms and investigated the admixture history of admixed populations. Results showed that random forest ABC scenario‐choice could accurately distinguish among most complex admixture scenarios, and errors were mainly found in regions of the parameter space where scenarios were highly nested, and, thus, biologically similar. We focused on African American and Barbadian populations as two study‐cases. We found that neural network ABC posterior parameter estimation was accurate and reasonably conservative under complex admixture scenarios. For both admixed populations, we found that monotonically decreasing contributions over time, from Europe and Africa, explained the observed data more accurately than multiple admixture pulses. This approach will allow for reconstructing detailed admixture histories when maximum‐likelihood methods are intractable.
Affiliation(s)
- Cesar A Fortes-Lima
- UMR7206 Eco-anthropologie, CNRS, Muséum National d'Histoire Naturelle, Université de Paris, Paris, France; Sub-department of Human Evolution, Department of Organismal Biology, Evolutionary Biology Centre, Uppsala University, Uppsala, Sweden
| | - Romain Laurent
- UMR7206 Eco-anthropologie, CNRS, Muséum National d'Histoire Naturelle, Université de Paris, Paris, France
| | - Valentin Thouzeau
- UMR7534 Centre de Recherche en Mathématiques de la Décision, CNRS, Université Paris-Dauphine, PSL University, Paris, France; Laboratoire de Sciences Cognitives et Psycholinguistique, Département d'Etudes Cognitives, ENS, PSL University, EHESS, CNRS, Paris, France
| | - Bruno Toupance
- UMR7206 Eco-anthropologie, CNRS, Muséum National d'Histoire Naturelle, Université de Paris, Paris, France
| | - Paul Verdu
- UMR7206 Eco-anthropologie, CNRS, Muséum National d'Histoire Naturelle, Université de Paris, Paris, France
| |
|
25
|
Yan J, Patterson N, Narasimhan VM. miqoGraph: fitting admixture graphs using mixed-integer quadratic optimization. Bioinformatics 2020; 37:2488-2490. [PMID: 33247708 DOI: 10.1093/bioinformatics/btaa988] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.8] [Received: 08/20/2020] [Revised: 10/25/2020] [Accepted: 11/16/2020] [Indexed: 11/14/2022] Open
Abstract
Summary: Admixture graphs represent the genetic relationship between a set of populations through splits, drift and admixture. In this article, we present the Julia package miqoGraph, which uses mixed-integer quadratic optimization to fit topology, drift lengths and admixture proportions simultaneously. Through applications of miqoGraph to both simulated and real data, we show that integer optimization can greatly speed up and automate what is usually an arduous manual process.
Availability and implementation: https://github.com/juliayyan/PhylogeneticTrees.jl.
Supplementary information: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Julia Yan
- Operations Research Center, Massachusetts Institute of Technology, Cambridge, MA 02139, USA
| | - Nick Patterson
- Department of Genetics, Harvard Medical School, Boston, MA, 02115, USA
| | - Vagheesh M Narasimhan
- Department of Genetics, Harvard Medical School, Boston, MA, 02115, USA; Department of Integrative Biology, The University of Texas at Austin; Department of Statistics and Data Science, The University of Texas at Austin
| |
|
26
|
Lipson M. Applying f4-statistics and admixture graphs: Theory and examples. Mol Ecol Resour 2020; 20:1658-1667. [PMID: 32717097 PMCID: PMC11563031 DOI: 10.1111/1755-0998.13230] [Citation(s) in RCA: 20] [Impact Index Per Article: 4.0] [Received: 02/11/2020] [Accepted: 07/02/2020] [Indexed: 01/25/2023]
Abstract
A popular approach to learning about admixture from population genetic data is by computing the allele-sharing summary statistics known as f-statistics. Compared to some methods in population genetics, f-statistics are relatively simple, but interpreting them can still be complicated at times. In addition, f-statistics can be used to build admixture graphs (multi-population trees allowing for admixture events), which provide more explicit and thorough modelling capabilities but are correspondingly more complex to work with. Here, I discuss some of these issues to provide users of these tools with a basic guide for protocols and procedures. My focus is on the kinds of conclusions that can or cannot be drawn from the results of f4-statistics and admixture graphs, illustrated with real-world examples involving human populations.
Affiliation(s)
- Mark Lipson
- Department of Genetics, Harvard Medical School, Boston, MA, USA
- Department of Human Evolutionary Biology, Harvard University, Cambridge, MA, USA
| |
|
27
|
Elhaik E, Ryan DM. Pair Matcher (PaM): fast model-based optimization of treatment/case-control matches. Bioinformatics 2020; 35:2243-2250. [PMID: 30445488 PMCID: PMC6596890 DOI: 10.1093/bioinformatics/bty946] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.8] [Received: 08/08/2017] [Revised: 11/03/2018] [Accepted: 11/15/2018] [Indexed: 11/22/2022] Open
Abstract
Motivation: In clinical trials, individuals are matched using demographic criteria, paired and then randomly assigned to treatment and control groups to determine a drug’s efficacy. A chief cause for the irreproducibility of results across pilot to Phase-III trials is population stratification bias caused by the uneven distribution of ancestries in the treatment and control groups.
Results: Pair Matcher (PaM) addresses stratification bias by optimizing pairing assignments a priori and/or a posteriori to the trial using both genetic and demographic criteria. Using simulated and real datasets, we show that PaM identifies ideal and near-ideal pairs that are more genetically homogeneous than those identified based on competing methods, including the commonly used principal component analysis (PCA). Homogenizing the treatment (or case) and control groups can be expected to improve the accuracy and reproducibility of the trial or genetic study. PaM’s ancestral inferences also allow characterizing responders and developing a precision medicine approach to treatment.
Availability and implementation: PaM is freely available via R (https://github.com/eelhaik/PAM) and a web-interface at http://elhaik-matcher.sheffield.ac.uk/ElhaikLab/.
Supplementary information: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Eran Elhaik
- Department of Animal and Plant Sciences, University of Sheffield, Sheffield, UK; INSIGNEO Institute for In Silico Medicine, University of Sheffield, Sheffield, UK
| | - Desmond M Ryan
- Department of Animal and Plant Sciences, University of Sheffield, Sheffield, UK
| |
|
28
|
Dehasque M, Ávila‐Arcos MC, Díez‐del‐Molino D, Fumagalli M, Guschanski K, Lorenzen ED, Malaspinas A, Marques‐Bonet T, Martin MD, Murray GGR, Papadopulos AST, Therkildsen NO, Wegmann D, Dalén L, Foote AD. Inference of natural selection from ancient DNA. Evol Lett 2020; 4:94-108. [PMID: 32313686 PMCID: PMC7156104 DOI: 10.1002/evl3.165] [Citation(s) in RCA: 38] [Impact Index Per Article: 7.6] [Received: 10/21/2019] [Revised: 01/13/2020] [Accepted: 02/02/2020] [Indexed: 01/01/2023] Open
Abstract
Evolutionary processes, including selection, can be indirectly inferred based on patterns of genomic variation among contemporary populations or species. However, this often requires unrealistic assumptions of ancestral demography and selective regimes. Sequencing ancient DNA from temporally spaced samples can inform about past selection processes, as time series data allow direct quantification of population parameters collected before, during, and after genetic changes driven by selection. In this Comment and Opinion, we advocate for the inclusion of temporal sampling and the generation of paleogenomic datasets in evolutionary biology, and highlight some of the recent advances that have yet to be broadly applied by evolutionary biologists. In doing so, we consider the expected signatures of balancing, purifying, and positive selection in time series data, and detail how this can advance our understanding of the chronology and tempo of genomic change driven by selection. However, we also recognize the limitations of such data, which can suffer from postmortem damage, fragmentation, low coverage, and typically low sample size. We therefore highlight the many assumptions and considerations associated with analyzing paleogenomic data and the assumptions associated with analytical methods.
Affiliation(s)
- Marianne Dehasque
- Centre for Palaeogenetics, 10691 Stockholm, Sweden
- Department of Bioinformatics and Genetics, Swedish Museum of Natural History, 10405 Stockholm, Sweden
- Department of Zoology, Stockholm University, 10691 Stockholm, Sweden
| | - María C. Ávila‐Arcos
- International Laboratory for Human Genome Research (LIIGH), UNAM Juriquilla, Queretaro 76230, Mexico
| | - David Díez‐del‐Molino
- Centre for Palaeogenetics, 10691 Stockholm, Sweden
- Department of Zoology, Stockholm University, 10691 Stockholm, Sweden
| | - Matteo Fumagalli
- Department of Life Sciences, Silwood Park Campus, Imperial College London, Ascot SL5 7PY, United Kingdom
| | - Katerina Guschanski
- Animal Ecology, Department of Ecology and Genetics, Science for Life Laboratory, Uppsala University, 75236 Uppsala, Sweden
| | | | - Anna‐Sapfo Malaspinas
- Department of Computational Biology, University of Lausanne, 1015 Lausanne, Switzerland
- SIB Swiss Institute of Bioinformatics, 1015 Lausanne, Switzerland
| | - Tomas Marques‐Bonet
- Institut de Biologia Evolutiva (CSIC‐Universitat Pompeu Fabra), Parc de Recerca Biomèdica de Barcelona, Barcelona, Spain
- National Centre for Genomic Analysis-Centre for Genomic Regulation, Barcelona Institute of Science and Technology, 08028 Barcelona, Spain
- Institucio Catalana de Recerca i Estudis Avançats, 08010 Barcelona, Spain
- Institut Català de Paleontologia Miquel Crusafont, Universitat Autònoma de Barcelona, Cerdanyola del Vallès, Spain
| | - Michael D. Martin
- Department of Natural History, NTNU University Museum, Norwegian University of Science and Technology (NTNU), Trondheim, Norway
| | - Gemma G. R. Murray
- Department of Veterinary Medicine, University of Cambridge, Cambridge CB2 1TN, United Kingdom
| | - Alexander S. T. Papadopulos
- Molecular Ecology and Fisheries Genetics Laboratory, School of Biological Sciences, Bangor University, Bangor LL57 2UW, United Kingdom
| | | | - Daniel Wegmann
- Department of Biology, Université de Fribourg, 1700 Fribourg, Switzerland
- Swiss Institute of Bioinformatics, Fribourg, Switzerland
| | - Love Dalén
- Centre for Palaeogenetics, 10691 Stockholm, Sweden
- Department of Bioinformatics and Genetics, Swedish Museum of Natural History, 10405 Stockholm, Sweden
| | - Andrew D. Foote
- Molecular Ecology and Fisheries Genetics Laboratory, School of Biological Sciences, Bangor University, Bangor LL57 2UW, United Kingdom
| |
|
29
|
Meier JI, Stelkens RB, Joyce DA, Mwaiko S, Phiri N, Schliewen UK, Selz OM, Wagner CE, Katongo C, Seehausen O. The coincidence of ecological opportunity with hybridization explains rapid adaptive radiation in Lake Mweru cichlid fishes. Nat Commun 2019; 10:5391. [PMID: 31796733 PMCID: PMC6890737 DOI: 10.1038/s41467-019-13278-z] [Citation(s) in RCA: 57] [Impact Index Per Article: 9.5] [Received: 04/28/2019] [Accepted: 10/22/2019] [Indexed: 01/26/2023] Open
Abstract
The process of adaptive radiation was classically hypothesized to require isolation of a lineage from its source (no gene flow) and from related species (no competition). Alternatively, hybridization between species may generate genetic variation that facilitates adaptive radiation. Here we study haplochromine cichlid assemblages in two African Great Lakes to test these hypotheses. Greater biotic isolation (fewer lineages) predicts fewer constraints by competition and hence more ecological opportunity in Lake Bangweulu, whereas opportunity for hybridization predicts increased genetic potential in Lake Mweru. In Lake Bangweulu, we find no evidence for hybridization but also no adaptive radiation. We show that the Bangweulu lineages also colonized Lake Mweru, where they hybridized with Congolese lineages and then underwent multiple adaptive radiations that are strikingly complementary in ecology and morphology. Our data suggest that the presence of several related lineages does not necessarily prevent adaptive radiation, although it constrains the trajectories of morphological diversification. It might instead facilitate adaptive radiation when hybridization generates genetic variation, without which radiation may start much later, progress more slowly or never occur.
Affiliation(s)
- Joana I Meier
- Division of Aquatic Ecology & Evolution, Institute of Ecology and Evolution, University of Bern, Baltzerstr. 6, CH-3012, Bern, Switzerland
- Department of Fish Ecology and Evolution, Centre of Ecology, Evolution and Biogeochemistry (CEEB), Eawag Swiss Federal Institute of Aquatic Science and Technology, Seestrasse 79, CH-6047, Kastanienbaum, Switzerland
- Department of Zoology, University of Cambridge, Downing Street, Cambridge, CB2 3EJ, UK
- St John's College, University of Cambridge, St John's Street, Cambridge, CB2 1TP, UK
| | - Rike B Stelkens
- Division of Aquatic Ecology & Evolution, Institute of Ecology and Evolution, University of Bern, Baltzerstr. 6, CH-3012, Bern, Switzerland
- Department of Fish Ecology and Evolution, Centre of Ecology, Evolution and Biogeochemistry (CEEB), Eawag Swiss Federal Institute of Aquatic Science and Technology, Seestrasse 79, CH-6047, Kastanienbaum, Switzerland
- Division of Population Genetics, Department of Zoology, Stockholm University, Svante Arrheniusväg 18 B, 106 91, Stockholm, Sweden
| | - Domino A Joyce
- Evolutionary and Ecological Genomics Group, Department of Biological and Marine Sciences, University of Hull, Hull, HU6 7RX, UK
| | - Salome Mwaiko
- Division of Aquatic Ecology & Evolution, Institute of Ecology and Evolution, University of Bern, Baltzerstr. 6, CH-3012, Bern, Switzerland
- Department of Fish Ecology and Evolution, Centre of Ecology, Evolution and Biogeochemistry (CEEB), Eawag Swiss Federal Institute of Aquatic Science and Technology, Seestrasse 79, CH-6047, Kastanienbaum, Switzerland
| | - Numel Phiri
- Department of Biological Sciences, University of Zambia, Lusaka, Zambia
| | - Ulrich K Schliewen
- SNSB-Bavarian State Collection of Zoology, Münchhausenstrasse 21, 81247, Munich, Germany
| | - Oliver M Selz
- Division of Aquatic Ecology & Evolution, Institute of Ecology and Evolution, University of Bern, Baltzerstr. 6, CH-3012, Bern, Switzerland
- Department of Fish Ecology and Evolution, Centre of Ecology, Evolution and Biogeochemistry (CEEB), Eawag Swiss Federal Institute of Aquatic Science and Technology, Seestrasse 79, CH-6047, Kastanienbaum, Switzerland
| | - Catherine E Wagner
- Division of Aquatic Ecology & Evolution, Institute of Ecology and Evolution, University of Bern, Baltzerstr. 6, CH-3012, Bern, Switzerland
- Department of Fish Ecology and Evolution, Centre of Ecology, Evolution and Biogeochemistry (CEEB), Eawag Swiss Federal Institute of Aquatic Science and Technology, Seestrasse 79, CH-6047, Kastanienbaum, Switzerland
- Biodiversity Institute and Department of Botany, University of Wyoming, Laramie, WY, 82071, USA
| | - Cyprian Katongo
- Department of Biological Sciences, University of Zambia, Lusaka, Zambia
| | - Ole Seehausen
- Division of Aquatic Ecology & Evolution, Institute of Ecology and Evolution, University of Bern, Baltzerstr. 6, CH-3012, Bern, Switzerland.
- Department of Fish Ecology and Evolution, Centre of Ecology, Evolution and Biogeochemistry (CEEB), Eawag Swiss Federal Institute of Aquatic Science and Technology, Seestrasse 79, CH-6047, Kastanienbaum, Switzerland.
| |
|
30
|
Abstract
Interspecific hybridization is the process where closely related species mate and produce offspring with admixed genomes. The genomic revolution has shown that hybridization is common, and that it may represent an important source of novel variation. Although most interspecific hybrids are sterile or less fit than their parents, some may survive and reproduce, enabling the transfer of adaptive variants across the species boundary, and even result in the formation of novel evolutionary lineages. There are two main variants of hybrid species genomes: allopolyploid, which have one full chromosome set from each parent species, and homoploid, which are a mosaic of the parent species genomes with no increase in chromosome number. The establishment of hybrid species requires the development of reproductive isolation against parental species. Allopolyploid species often have strong intrinsic reproductive barriers due to differences in chromosome number, and homoploid hybrids can become reproductively isolated from the parent species through assortment of genetic incompatibilities. However, both types of hybrids can become further reproductively isolated, gaining extrinsic isolation barriers, by exploiting novel ecological niches, relative to their parents. Hybrids represent the merging of divergent genomes and thus face problems arising from incompatible combinations of genes. Thus hybrid genomes are highly dynamic and undergo rapid evolutionary change, including genome stabilization in which selection against incompatible combinations results in fixation of compatible ancestry block combinations within the hybrid species. The potential for rapid adaptation or speciation makes hybrid genomes a particularly exciting subject in evolutionary biology. Here we summarize how introgressed alleles or hybrid species can establish and how the resulting hybrid genomes evolve.
Affiliation(s)
- Anna Runemark
- Department of Biology, Lund University, Lund, Sweden
| | - Mario Vallejo-Marin
- Biological and Environmental Sciences, University of Stirling, Stirling, Scotland, United Kingdom
| | - Joana I. Meier
- St John's College, Cambridge, Cambridge, United Kingdom
- Department of Zoology, University of Cambridge, Cambridge, United Kingdom
| |
|
31
|
Refoyo-Martínez A, da Fonseca RR, Halldórsdóttir K, Árnason E, Mailund T, Racimo F. Identifying loci under positive selection in complex population histories. Genome Res 2019; 29:1506-1520. [PMID: 31362936 PMCID: PMC6724678 DOI: 10.1101/gr.246777.118] [Citation(s) in RCA: 31] [Impact Index Per Article: 5.2] [Received: 11/26/2018] [Accepted: 07/23/2019] [Indexed: 12/24/2022]
Abstract
Detailed modeling of a species' history is of prime importance for understanding how natural selection operates over time. Most methods designed to detect positive selection along sequenced genomes, however, use simplified representations of past histories as null models of genetic drift. Here, we present the first method that can detect signatures of strong local adaptation across the genome using arbitrarily complex admixture graphs, which are typically used to describe the history of past divergence and admixture events among any number of populations. The method, called graph-aware retrieval of selective sweeps (GRoSS), has good power to detect loci in the genome with strong evidence for past selective sweeps and can also identify which branch of the graph was most affected by the sweep. As evidence of its utility, we apply the method to bovine, codfish, and human population genomic data containing panels of multiple populations related in complex ways. We find new candidate genes for important adaptive functions, including immunity and metabolism in understudied human populations, as well as muscle mass, milk production, and tameness in specific bovine breeds. We are also able to pinpoint the emergence of large regions of differentiation owing to inversions in the history of Atlantic codfish.
Affiliation(s)
- Alba Refoyo-Martínez
- Lundbeck GeoGenetics Centre, The Globe Institute, Faculty of Health and Medical Sciences, University of Copenhagen, Copenhagen 1350, Denmark
| | - Rute R da Fonseca
- Centre for Macroecology, Evolution and Climate, The Globe Institute, Faculty of Health and Medical Sciences, University of Copenhagen, Copenhagen 2100, Denmark
| | - Katrín Halldórsdóttir
- Faculty of Life and Environmental Sciences, University of Iceland, Reykjavík 107, Iceland
| | - Einar Árnason
- Faculty of Life and Environmental Sciences, University of Iceland, Reykjavík 107, Iceland
- Department of Organismic and Evolutionary Biology, Harvard University, Cambridge, Massachusetts 02138, USA
| | - Thomas Mailund
- Bioinformatics Research Centre, Aarhus University, Aarhus 8000, Denmark
| | - Fernando Racimo
- Lundbeck GeoGenetics Centre, The Globe Institute, Faculty of Health and Medical Sciences, University of Copenhagen, Copenhagen 1350, Denmark
| |
|
32
|
Flagel L, Brandvain Y, Schrider DR. The Unreasonable Effectiveness of Convolutional Neural Networks in Population Genetic Inference. Mol Biol Evol 2019; 36:220-238. [PMID: 30517664 PMCID: PMC6367976 DOI: 10.1093/molbev/msy224] [Citation(s) in RCA: 110] [Impact Index Per Article: 18.3] [Indexed: 12/28/2022]
Abstract
Population-scale genomic data sets have given researchers incredible amounts of information from which to infer evolutionary histories. Concomitant with this flood of data, theoretical and methodological advances have sought to extract information from genomic sequences to infer demographic events such as population size changes and gene flow among closely related populations/species, construct recombination maps, and uncover loci underlying recent adaptation. To date, most methods make use of only one or a few summaries of the input sequences and therefore ignore potentially useful information encoded in the data. The most sophisticated of these approaches involve likelihood calculations, which require theoretical advances for each new problem, and often focus on a single aspect of the data (e.g., only allele frequency information) in the interest of mathematical and computational tractability. Directly interrogating the entirety of the input sequence data in a likelihood-free manner would thus offer a fruitful alternative. Here, we accomplish this by representing DNA sequence alignments as images and using a class of deep learning methods called convolutional neural networks (CNNs) to make population genetic inferences from these images. We apply CNNs to a number of evolutionary questions and find that they frequently match or exceed the accuracy of current methods. Importantly, we show that CNNs perform accurate evolutionary model selection and parameter estimation, even on problems that have not received detailed theoretical treatments. Thus, when applied to population genetic alignments, CNNs are capable of outperforming expert-derived statistical methods and offer a new path forward in cases where no likelihood approach exists.
Affiliation(s)
- Lex Flagel
- Monsanto Company, Chesterfield, MO
- Department of Plant and Microbial Biology, University of Minnesota, St. Paul, MN
| | - Yaniv Brandvain
- Department of Plant and Microbial Biology, University of Minnesota, St. Paul, MN
| | - Daniel R Schrider
- Department of Genetics, University of North Carolina, Chapel Hill, NC
| |
|
33
|
Advances in Computational Methods for Phylogenetic Networks in the Presence of Hybridization. BIOINFORMATICS AND PHYLOGENETICS 2019. [DOI: 10.1007/978-3-030-10837-3_13] [Citation(s) in RCA: 37] [Impact Index Per Article: 6.2] [Indexed: 12/12/2022]
|
34
|
Soraggi S, Wiuf C. General theory for stochastic admixture graphs and F-statistics. Theor Popul Biol 2018; 125:56-66. [PMID: 30562538 DOI: 10.1016/j.tpb.2018.12.002] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.7] [Received: 07/18/2018] [Revised: 11/17/2018] [Accepted: 12/03/2018] [Indexed: 10/27/2022]
Abstract
We provide a general mathematical framework based on the theory of graphical models to study admixture graphs. Admixture graphs are used to describe the ancestral relationships between past and present populations, allowing for population merges and migration events, by means of gene flow. We give various mathematical properties of admixture graphs with particular focus on properties of the so-called F-statistics. Also the Wright-Fisher model is studied and a general expression for the loss of heterozygosity is derived.
Affiliation(s)
- Samuele Soraggi
- Department of Mathematical Sciences, University of Copenhagen, Denmark
| | - Carsten Wiuf
- Department of Mathematical Sciences, University of Copenhagen, Denmark.
| |
|
35
|
Wangkumhang P, Hellenthal G. Statistical methods for detecting admixture. Curr Opin Genet Dev 2018; 53:121-127. [PMID: 30245220 DOI: 10.1016/j.gde.2018.08.002] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.7] [Received: 05/01/2018] [Revised: 08/03/2018] [Accepted: 08/09/2018] [Indexed: 10/28/2022]
Abstract
The increasing availability of large-scale autosomal genetic variation data sampled from world-wide geographic areas, coupled with advances in the statistical methodology to analyse these data, is showcasing the power of DNA as a major tool to gain insights into the demographic history of humans and other organisms. Here we review statistical techniques that shed light on a specific aspect of demography: the detection and description of admixture events where two or more genetically distinct groups intermixed at one or more times in the past. In particular we give an overview of some of the widely used methods to identify and describe admixture events using autosomal DNA from unrelated individuals, with a particular focus on analysing biallelic Single-Nucleotide-Polymorphism (SNP) markers.
Affiliation(s)
- Pongsakorn Wangkumhang
- University College London Genetics Institute (UGI), Department of Genetics, Evolution and Environment, University College London, London, United Kingdom
| | - Garrett Hellenthal
- University College London Genetics Institute (UGI), Department of Genetics, Evolution and Environment, University College London, London, United Kingdom.
| |
|
36
|
Abstract
Signatures of recent historical admixture are ubiquitous in human populations. We present a mechanistic model of admixture with two source populations, encompassing recurrent admixture periods and study the distribution of admixture fractions for finite but arbitrary genome size. We provide simulation-based methods to estimate the introgression parameters and discuss the implications of reaching stationarity on estimability of parameters when there are recurrent admixture events with different rates.
Affiliation(s)
- Erkan Ozge Buzbas
- Department of Statistical Science, University of Idaho, United States.
| | - Paul Verdu
- CNRS/MNHN/Université Paris Diderot/Sorbonne Paris Cité, France
| |
|
37
|
Whole-genome analysis of Mustela erminea finds that pulsed hybridization impacts evolution at high latitudes. Commun Biol 2018; 1:51. [PMID: 30271934 PMCID: PMC6123727 DOI: 10.1038/s42003-018-0058-y] [Citation(s) in RCA: 15] [Impact Index Per Article: 2.1] [Received: 11/02/2017] [Accepted: 04/20/2018] [Indexed: 01/19/2023]
Abstract
At high latitudes, climatic shifts hypothetically initiate recurrent episodes of divergence by isolating populations in glacial refugia, ice-free regions that enable terrestrial species persistence. Upon glacial recession, populations subsequently expand and often come into contact with other independently diverging populations, resulting in gene flow. To understand how recurrent periods of isolation and contact may have impacted evolution at high latitudes, we investigated introgression dynamics in the stoat (Mustela erminea), a Holarctic mammalian carnivore, using whole-genome sequences. We identify two spatio-temporally distinct episodes of introgression coincident with large-scale climatic shifts: contemporary introgression in a mainland contact zone and ancient contact ~200 km south of the contemporary zone, in the archipelagos along North America’s North Pacific Coast. Repeated episodes of gene flow highlight the central role of cyclic climates in structuring high-latitude diversity, through refugial divergence and introgressive hybridization. When introgression is followed by allopatric isolation (e.g., insularization) it may ultimately expedite divergence. Jocelyn Colella et al. report whole-genome sequences of 10 stoats (Mustela erminea) from four regions of glacial refugia. They find evidence for two past introgressive events between lineages that coincide with interglacial periods, a pattern that may extend to other high-latitude species.
|
38
|
Wang Y, Lu D, Chung YJ, Xu S. Genetic structure, divergence and admixture of Han Chinese, Japanese and Korean populations. Hereditas 2018; 155:19. [PMID: 29636655 PMCID: PMC5889524 DOI: 10.1186/s41065-018-0057-5] [Citation(s) in RCA: 79] [Impact Index Per Article: 11.3] [Received: 01/19/2018] [Accepted: 03/23/2018] [Indexed: 12/25/2022]
Abstract
Background: Han Chinese, Japanese and Korean, the three major ethnic groups of East Asia, share many similarities in appearance, language and culture, but their genetic relationships, divergence times and subsequent genetic exchanges have not been well studied. Results: We conducted a genome-wide study and evaluated the population structure of 182 Han Chinese, 90 Japanese and 100 Korean individuals, together with the data of 630 individuals representing 8 populations worldwide. Our analyses revealed that Han Chinese, Japanese and Korean populations have distinct genetic makeup and can be well distinguished based on either the genome-wide data or a panel of ancestry informative markers (AIMs). Their genetic structure corresponds well to their geographical distributions, indicating geographical isolation played a critical role in driving population differentiation in East Asia. The most recent common ancestor of the three populations was dated back to 3000 to 3600 years ago. Our analyses also revealed substantial admixture within the three populations which occurred subsequent to initial splits, and distinct gene introgression from surrounding populations, of which the northern ancestral component is dominant. Conclusions: These estimations and findings facilitate understanding of the population history and mechanisms of human genetic diversity in East Asia, and have implications for both evolutionary and medical studies.
Affiliation(s)
- Yuchen Wang
- Chinese Academy of Sciences (CAS) Key Laboratory of Computational Biology, Max Planck Independent Research Group on Population Genomics, CAS-MPG Partner Institute for Computational Biology (PICB), Shanghai Institutes for Biological Sciences, Chinese Academy of Sciences, Shanghai, 200031 China.,University of Chinese Academy of Sciences, Beijing, 100049 China
| | - Dongsheng Lu
- Chinese Academy of Sciences (CAS) Key Laboratory of Computational Biology, Max Planck Independent Research Group on Population Genomics, CAS-MPG Partner Institute for Computational Biology (PICB), Shanghai Institutes for Biological Sciences, Chinese Academy of Sciences, Shanghai, 200031 China.,University of Chinese Academy of Sciences, Beijing, 100049 China
| | - Yeun-Jun Chung
- Integrated Research Center for Genome Polymorphism, Department of Microbiology, The Catholic University Medical College, Seoul, Socho-gu 137-701 South Korea
| | - Shuhua Xu
- Chinese Academy of Sciences (CAS) Key Laboratory of Computational Biology, Max Planck Independent Research Group on Population Genomics, CAS-MPG Partner Institute for Computational Biology (PICB), Shanghai Institutes for Biological Sciences, Chinese Academy of Sciences, Shanghai, 200031 China.,University of Chinese Academy of Sciences, Beijing, 100049 China.,School of Life Science and Technology ShanghaiTech University, Shanghai, 201210 China.,Center for Excellence in Animal Evolution and Genetics, Chinese Academy of Sciences, Kunming, 650223 China.,Collaborative Innovation Center of Genetics and Development, Shanghai, 200438 China
| |
|
39
|
Racimo F, Berg JJ, Pickrell JK. Detecting Polygenic Adaptation in Admixture Graphs. Genetics 2018; 208:1565-1584. [PMID: 29348143 PMCID: PMC5887149 DOI: 10.1534/genetics.117.300489] [Citation(s) in RCA: 73] [Impact Index Per Article: 10.4] [Received: 11/08/2017] [Accepted: 01/16/2018] [Indexed: 01/09/2023]
Abstract
An open question in human evolution is the importance of polygenic adaptation: adaptive changes in the mean of a multifactorial trait due to shifts in allele frequencies across many loci. In recent years, several methods have been developed to detect polygenic adaptation using loci identified in genome-wide association studies (GWAS). Though powerful, these methods suffer from limited interpretability: they can detect which sets of populations have evidence for polygenic adaptation, but are unable to reveal where in the history of multiple populations these processes occurred. To address this, we created a method to detect polygenic adaptation in an admixture graph, which is a representation of the historical divergences and admixture events relating different populations through time. We developed a Markov chain Monte Carlo (MCMC) algorithm to infer branch-specific parameters reflecting the strength of selection in each branch of a graph. Additionally, we developed a set of summary statistics that are fast to compute and can indicate which branches are most likely to have experienced polygenic adaptation. We show via simulations that this method, which we call PolyGraph, has good power to detect polygenic adaptation, and applied it to human population genomic data from around the world. We also provide evidence that variants associated with several traits, including height, educational attainment, and self-reported unibrow, have been influenced by polygenic adaptation in different populations during human evolution.
Affiliation(s)
- Fernando Racimo
- New York Genome Center, New York, New York 10013
- Centre for GeoGenetics, Natural History Museum of Denmark, University of Copenhagen, 1350, Denmark
| | - Jeremy J Berg
- Department of Biological Sciences, Columbia University, New York, New York 10027
| | - Joseph K Pickrell
- New York Genome Center, New York, New York 10013
- Department of Biological Sciences, Columbia University, New York, New York 10027
| |
|
40
|
Pugach I, Duggan AT, Merriwether DA, Friedlaender FR, Friedlaender JS, Stoneking M. The Gateway from Near into Remote Oceania: New Insights from Genome-Wide Data. Mol Biol Evol 2018; 35:871-886. [PMID: 29301001 PMCID: PMC5889034 DOI: 10.1093/molbev/msx333] [Citation(s) in RCA: 25] [Impact Index Per Article: 3.6] [Indexed: 01/27/2023]
Abstract
A widely accepted two-wave scenario of human settlement of Oceania involves the first out-of-Africa migration circa 50,000 years ago (ya), and the more recent Austronesian expansion, which reached the Bismarck Archipelago by 3,450 ya. Whereas earlier genetic studies provided evidence for extensive sex-biased admixture between the incoming and the indigenous populations, some archaeological, linguistic, and genetic evidence indicates a more complicated picture of settlement. To study regional variation in Oceania in more detail, we have compiled a genome-wide data set of 823 individuals from 72 populations (including 50 populations from Oceania) and over 620,000 autosomal single nucleotide polymorphisms (SNPs). We show that the initial dispersal of people from the Bismarck Archipelago into Remote Oceania occurred in a "leapfrog" fashion, completely by-passing the main chain of the Solomon Islands, and that the colonization of the Solomon Islands proceeded in a bidirectional manner. Our results also support a divergence between western and eastern Solomons, in agreement with the sharp linguistic divide known as the Tryon-Hackman line. We also report substantial post-Austronesian gene flow across the Solomons. In particular, Santa Cruz (in Remote Oceania) exhibits extraordinarily high levels of Papuan ancestry that cannot be explained by a simple bottleneck/founder event scenario. Finally, we use simulations to show that discrepancies between different methods for dating admixture likely reflect different sensitivities of the methods to multiple admixture events from the same (or similar) sources. Overall, this study points to the importance of fine-scale sampling to understand the complexities of human population history.
Affiliation(s)
- Irina Pugach
- Department of Evolutionary Genetics, Max Planck Institute for Evolutionary Anthropology, Leipzig, Germany
| | - Ana T Duggan
- Department of Anthropology, McMaster University, Hamilton, Canada
| | | | | | | | - Mark Stoneking
- Department of Evolutionary Genetics, Max Planck Institute for Evolutionary Anthropology, Leipzig, Germany
| |
|
41
|
Leppälä K, Nielsen SV, Mailund T. admixturegraph: an R package for admixture graph manipulation and fitting. Bioinformatics 2018; 33:1738-1740. [PMID: 28158333 PMCID: PMC5447235 DOI: 10.1093/bioinformatics/btx048] [Citation(s) in RCA: 58] [Impact Index Per Article: 8.3] [Received: 08/23/2016] [Accepted: 01/24/2017] [Indexed: 11/13/2022]
Abstract
Summary: Admixture graphs generalize phylogenetic trees by allowing genetic lineages to merge as well as split. In this paper we present the R package admixturegraph containing tools for building and visualizing admixture graphs, for fitting graph parameters to genetic data, for visualizing goodness of fit and for evaluating the relative goodness of fit between different graphs. Availability and Implementation: GitHub: https://github.com/mailund/admixture_graph and CRAN: https://cran.r-project.org/web/packages/admixturegraph.
Affiliation(s)
- Kalle Leppälä
- Bioinformatics Research Centre, Aarhus University, Aarhus, Denmark
| | - Svend V Nielsen
- Bioinformatics Research Centre, Aarhus University, Aarhus, Denmark
| | - Thomas Mailund
- Bioinformatics Research Centre, Aarhus University, Aarhus, Denmark
| |
|
42
|
Leonardi M, Librado P, Der Sarkissian C, Schubert M, Alfarhan AH, Alquraishi SA, Al-Rasheid KAS, Gamba C, Willerslev E, Orlando L. Evolutionary Patterns and Processes: Lessons from Ancient DNA. Syst Biol 2018; 66:e1-e29. [PMID: 28173586 PMCID: PMC5410953 DOI: 10.1093/sysbio/syw059] [Citation(s) in RCA: 37] [Impact Index Per Article: 5.3] [Received: 03/26/2016] [Revised: 06/04/2016] [Accepted: 06/06/2016] [Indexed: 12/02/2022]
Abstract
Ever since its emergence in 1984, the field of ancient DNA has struggled to overcome the challenges related to the decay of DNA molecules in the fossil record. With the recent development of high-throughput DNA sequencing technologies and molecular techniques tailored to ultra-damaged templates, it has now come of age, merging together approaches in phylogenomics, population genomics, epigenomics, and metagenomics. Leveraging on complete temporal sample series, ancient DNA provides direct access to the most important dimension in evolution—time, allowing a wealth of fundamental evolutionary processes to be addressed at unprecedented resolution. This review taps into the most recent findings in ancient DNA research to present analyses of ancient genomic and metagenomic data.
Affiliation(s)
- Michela Leonardi
- Centre for GeoGenetics, Natural History Museum of Denmark, University of Copenhagen, Øster Voldgade, Copenhagen, Denmark
| | - Pablo Librado
- Centre for GeoGenetics, Natural History Museum of Denmark, University of Copenhagen, Øster Voldgade, Copenhagen, Denmark
| | - Clio Der Sarkissian
- Centre for GeoGenetics, Natural History Museum of Denmark, University of Copenhagen, Øster Voldgade, Copenhagen, Denmark
| | - Mikkel Schubert
- Centre for GeoGenetics, Natural History Museum of Denmark, University of Copenhagen, Øster Voldgade, Copenhagen, Denmark
| | - Ahmed H Alfarhan
- Zoology Department, College of Science, King Saud University, Riyadh, Saudi Arabia
| | - Saleh A Alquraishi
- Zoology Department, College of Science, King Saud University, Riyadh, Saudi Arabia
| | | | - Cristina Gamba
- Centre for GeoGenetics, Natural History Museum of Denmark, University of Copenhagen, Øster Voldgade, Copenhagen, Denmark
| | - Eske Willerslev
- Centre for GeoGenetics, Natural History Museum of Denmark, University of Copenhagen, Øster Voldgade, Copenhagen, Denmark.,Zoology Department, College of Science, King Saud University, Riyadh, Saudi Arabia
| | - Ludovic Orlando
- Centre for GeoGenetics, Natural History Museum of Denmark, University of Copenhagen, Øster Voldgade, Copenhagen, Denmark.,Université de Toulouse, University Paul Sabatier (UPS), Laboratoire AMIS, Toulouse, France
| |
|
43
|
|
44
|
Zhao YX, Yang J, Lv FH, Hu XJ, Xie XL, Zhang M, Li WR, Liu MJ, Wang YT, Li JQ, Liu YG, Ren YL, Wang F, Hehua EE, Kantanen J, Arjen Lenstra J, Han JL, Li MH. Genomic Reconstruction of the History of Native Sheep Reveals the Peopling Patterns of Nomads and the Expansion of Early Pastoralism in East Asia. Mol Biol Evol 2017. [PMID: 28645168 PMCID: PMC5850515 DOI: 10.1093/molbev/msx181] [Citation(s) in RCA: 74] [Impact Index Per Article: 9.3] [Indexed: 01/26/2023]
Abstract
China has a rich resource of native sheep (Ovis aries) breeds associated with historical movements of several nomadic societies. However, the history of sheep and the associated nomadic societies in ancient China remains poorly understood. Here, we studied the genomic diversity of Chinese sheep using genome-wide SNPs, mitochondrial and Y-chromosomal variations in > 1,000 modern samples. Population genomic analyses combined with archeological records and historical ethnic demographics data revealed genetic signatures of the origins, secondary expansions and admixtures, of Chinese sheep thereby revealing the peopling patterns of nomads and the expansion of early pastoralism in East Asia. Originating from the Mongolian Plateau ∼5,000‒5,700 years ago, Chinese sheep were inferred to spread in the upper and middle reaches of the Yellow River ∼3,000‒5,000 years ago following the expansions of the Di-Qiang people. Afterwards, sheep were then inferred to reach the Qinghai-Tibetan and Yunnan-Kweichow plateaus ∼2,000‒2,600 years ago by following the north-to-southwest routes of the Di-Qiang migration. We also unveiled two subsequent waves of migrations of fat-tailed sheep into northern China, which were largely commensurate with the migrations of ancestors of Hui Muslims eastward and Mongols southward during the 12th‒13th centuries. Furthermore, we revealed signs of argali introgression into domestic sheep, extensive historical mixtures among domestic populations and strong artificial selection for tail type and other traits, reflecting various breeding strategies by nomadic societies in ancient China.
Affiliation(s)
- Yong-Xin Zhao
- CAS Key Laboratory of Animal Ecology and Conservation Biology, Institute of Zoology, Chinese Academy of Sciences (CAS), Beijing, China.,University of Chinese Academy of Sciences (UCAS), Beijing, China
| | - Ji Yang
- CAS Key Laboratory of Animal Ecology and Conservation Biology, Institute of Zoology, Chinese Academy of Sciences (CAS), Beijing, China
| | - Feng-Hua Lv
- CAS Key Laboratory of Animal Ecology and Conservation Biology, Institute of Zoology, Chinese Academy of Sciences (CAS), Beijing, China
| | - Xiao-Ju Hu
- CAS Key Laboratory of Animal Ecology and Conservation Biology, Institute of Zoology, Chinese Academy of Sciences (CAS), Beijing, China.,University of Chinese Academy of Sciences (UCAS), Beijing, China
| | - Xing-Long Xie
- CAS Key Laboratory of Animal Ecology and Conservation Biology, Institute of Zoology, Chinese Academy of Sciences (CAS), Beijing, China.,University of Chinese Academy of Sciences (UCAS), Beijing, China
| | - Min Zhang
- CAS Key Laboratory of Animal Ecology and Conservation Biology, Institute of Zoology, Chinese Academy of Sciences (CAS), Beijing, China.,School of Life Sciences, University of Science and Technology of China, Hefei, China
| | - Wen-Rong Li
- Animal Biotechnological Research Center, Xinjiang Academy of Animal Science, Urumqi, China
| | - Ming-Jun Liu
- Animal Biotechnological Research Center, Xinjiang Academy of Animal Science, Urumqi, China
| | - Yu-Tao Wang
- College of Life and Geographic Sciences, Kashgar University, Kashgar, China
| | - Jin-Quan Li
- College of Animal Science, Inner Mongolia Agricultural University, Hohhot, China
| | - Yong-Gang Liu
- College of Animal Science and Technology, Yunnan Agricultural University, Kunming, China
| | - Yan-Ling Ren
- Shandong Binzhou Academy of Animal Science and Veterinary Medicine, Binzhou, China
| | - Feng Wang
- Institute of Sheep and Goat Science, Nanjing Agricultural University, Nanjing, China
| | - EEr Hehua
- Grass-Feeding Livestock Engineering Technology Research Center, Ningxia Academy of Agriculture and Forestry Sciences, Yinchuan, China
| | - Juha Kantanen
- Green Technology, Natural Resources Institute Finland (Luke), Jokioinen, Finland.,Department of Environmental and Biological Sciences, University of Eastern Finland, Kuopio, Finland
| | | | - Jian-Lin Han
- CAAS-ILRI Joint Laboratory on Livestock and Forage Genetic Resources, Institute of Animal Science, Chinese Academy of Agricultural Sciences (CAAS), Beijing, China.,Livestock Genetics Program, International Livestock Research Institute (ILRI), Nairobi, Kenya
| | - Meng-Hua Li
- CAS Key Laboratory of Animal Ecology and Conservation Biology, Institute of Zoology, Chinese Academy of Sciences (CAS), Beijing, China
| |
|
45
|
The genomic landscape of Nepalese Tibeto-Burmans reveals new insights into the recent peopling of Southern Himalayas. Sci Rep 2017; 7:15512. [PMID: 29138459 PMCID: PMC5686152 DOI: 10.1038/s41598-017-15862-z] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.4] [Received: 01/13/2017] [Accepted: 10/24/2017] [Indexed: 12/17/2022] Open
Abstract
While much research attention has focused on demographic processes that enabled human diffusion on the Tibetan plateau, little is known about more recent colonization of Southern Himalayas. In particular, the history of migrations, admixture and/or isolation of populations speaking Tibeto-Burman languages, which is supposed to be quite complex and to have reshaped patterns of genetic variation on both sides of the Himalayan arc, remains only partially elucidated. We thus described the genomic landscape of previously unsurveyed Tibeto-Burman (i.e. Sherpa and Tamang) and Indo-Aryan communities from remote Nepalese valleys. Exploration of their genomic relationships with South/East Asian populations provided evidence for Tibetan admixture with low-altitude East Asians and for Sherpa isolation. We also showed that the other Southern Himalayan Tibeto-Burmans derived East Asian ancestry not from the Tibetan/Sherpa lineage, but from low-altitude ancestors who migrated from China plausibly across Northern India/Myanmar, having experienced extensive admixture that reshuffled the ancestral Tibeto-Burman gene pool. These findings improved the understanding of the impact of gene flow/drift on the evolution of high-altitude Himalayan peoples and shed light on migration events that drove colonization of the southern Himalayan slopes, as well as on the role played by different Tibeto-Burman groups in such a complex demographic scenario.
|
46
|
Distinguishing Among Modes of Convergent Adaptation Using Population Genomic Data. Genetics 2017; 207:1591-1619. [PMID: 29046403 DOI: 10.1534/genetics.117.300417] [Citation(s) in RCA: 75] [Impact Index Per Article: 9.4] [Received: 03/22/2017] [Accepted: 09/30/2017] [Indexed: 11/18/2022] Open
Abstract
Geographically separated populations can convergently adapt to the same selection pressure. Convergent evolution at the level of a gene may arise via three distinct modes. The selected alleles can (1) have multiple independent mutational origins, (2) be shared due to shared ancestral standing variation, or (3) spread throughout subpopulations via gene flow. We present a model-based, statistical approach that utilizes genomic data to detect cases of convergent adaptation at the genetic level, identify the loci involved and distinguish among these modes. To understand the impact of convergent positive selection on neutral diversity at linked loci, we make use of the fact that hitchhiking can be modeled as an increase in the variance in neutral allele frequencies around a selected site within a population. We build on coalescent theory to show how shared hitchhiking events between subpopulations act to increase covariance in allele frequencies between subpopulations at loci near the selected site, and extend this theory under different models of migration and selection on the same standing variation. We incorporate this hitchhiking effect into a multivariate normal model of allele frequencies that also accounts for population structure. Based on this theory, we present a composite-likelihood-based approach that utilizes genomic data to identify loci involved in convergence, and distinguishes among alternate modes of convergent adaptation. We illustrate our method on genome-wide polymorphism data from two distinct cases of convergent adaptation. First, we investigate the adaptation for copper toxicity tolerance in two populations of the common yellow monkey flower, Mimulus guttatus. We show that selection has occurred on an allele that has been standing in these populations prior to the onset of copper mining in this region. Lastly, we apply our method to data from four populations of the killifish, Fundulus heteroclitus, that show very rapid convergent adaptation for tolerance to industrial pollutants. Here, we identify a single locus at which both independent mutation events and selection on an allele shared via gene flow, either slightly before or during selection, play a role in adaptation across the species' range.
|
47
|
Botigué LR, Song S, Scheu A, Gopalan S, Pendleton AL, Oetjens M, Taravella AM, Seregély T, Zeeb-Lanz A, Arbogast RM, Bobo D, Daly K, Unterländer M, Burger J, Kidd JM, Veeramah KR. Ancient European dog genomes reveal continuity since the Early Neolithic. Nat Commun 2017; 8:16082. [PMID: 28719574 PMCID: PMC5520058 DOI: 10.1038/ncomms16082] [Citation(s) in RCA: 116] [Impact Index Per Article: 14.5] [Received: 02/27/2017] [Accepted: 05/25/2017] [Indexed: 12/19/2022] Open
Abstract
Europe has played a major role in dog evolution, harbouring the oldest uncontested Palaeolithic remains and having been the centre of modern dog breed creation. Here we sequence the genomes of an Early and End Neolithic dog from Germany, including a sample associated with an early European farming community. Both dogs demonstrate continuity with each other and predominantly share ancestry with modern European dogs, contradicting a previously suggested Late Neolithic population replacement. We find no genetic evidence to support the recent hypothesis proposing dual origins of dog domestication. By calibrating the mutation rate using our oldest dog, we narrow the timing of dog domestication to 20,000-40,000 years ago. Interestingly, we do not observe the extreme copy number expansion of the AMY2B gene characteristic of modern dogs that has previously been proposed as an adaptation to a starch-rich diet driven by the widespread adoption of agriculture in the Neolithic.
Affiliation(s)
- Laura R Botigué
- Department of Ecology and Evolution, Stony Brook University, Stony Brook, New York 11794-5245, USA
| | - Shiya Song
- Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, Michigan 48109, USA
| | - Amelie Scheu
- Palaeogenetics Group, Johannes Gutenberg-University Mainz, 55099 Mainz, Germany; Smurfit Institute of Genetics, Trinity College Dublin, Dublin 2, Ireland
| | - Shyamalika Gopalan
- Department of Ecology and Evolution, Stony Brook University, Stony Brook, New York 11794-5245, USA
| | - Amanda L Pendleton
- Department of Human Genetics, University of Michigan, Ann Arbor, Michigan 48109, USA
| | - Matthew Oetjens
- Department of Human Genetics, University of Michigan, Ann Arbor, Michigan 48109, USA
| | - Angela M Taravella
- Department of Human Genetics, University of Michigan, Ann Arbor, Michigan 48109, USA
| | - Timo Seregély
- Department of Prehistoric Archaeology, Institute of Archaeology, Heritage Sciences and Art History, University of Bamberg, 96045 Bamberg, Germany
| | - Andrea Zeeb-Lanz
- Generaldirektion Kulturelles Erbe Rheinland-Pfalz, Direktion Landesarchäologie, Außenstelle Speyer, 67346 Speyer, Germany
| | | | - Dean Bobo
- Department of Ecology and Evolution, Stony Brook University, Stony Brook, New York 11794-5245, USA
| | - Kevin Daly
- Smurfit Institute of Genetics, Trinity College Dublin, Dublin 2, Ireland
| | - Martina Unterländer
- Palaeogenetics Group, Johannes Gutenberg-University Mainz, 55099 Mainz, Germany
| | - Joachim Burger
- Palaeogenetics Group, Johannes Gutenberg-University Mainz, 55099 Mainz, Germany
| | - Jeffrey M Kidd
- Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, Michigan 48109, USA; Department of Human Genetics, University of Michigan, Ann Arbor, Michigan 48109, USA
| | - Krishna R Veeramah
- Department of Ecology and Evolution, Stony Brook University, Stony Brook, New York 11794-5245, USA
| |
|
48
|
Lipson M, Reich D. A Working Model of the Deep Relationships of Diverse Modern Human Genetic Lineages Outside of Africa. Mol Biol Evol 2017; 34:889-902. [PMID: 28074030 PMCID: PMC5400393 DOI: 10.1093/molbev/msw293] [Citation(s) in RCA: 26] [Impact Index Per Article: 3.3] [Indexed: 01/27/2023] Open
Abstract
A major topic of interest in human prehistory is how the large-scale genetic structure of modern populations outside of Africa was established. Demographic models have been developed that capture the relationships among small numbers of populations or within particular geographical regions, but constructing a phylogenetic tree with gene flow events for a wide diversity of non-Africans remains a difficult problem. Here, we report a model that provides a good statistical fit to allele-frequency correlation patterns among East Asians, Australasians, Native Americans, and ancient western and northern Eurasians, together with archaic human groups. The model features a primary eastern/western bifurcation dating to at least 45,000 years ago, with Australasians nested inside the eastern clade, and a parsimonious set of admixture events. While our results still represent a simplified picture, they provide a useful summary of deep Eurasian population history that can serve as a null model for future studies and a baseline for further discoveries.
Affiliation(s)
- Mark Lipson
- Department of Genetics, Harvard Medical School, Boston, MA
| | - David Reich
- Department of Genetics, Harvard Medical School, Boston, MA
- Medical and Population Genetics Program, Broad Institute of MIT and Harvard, Cambridge, MA
- Howard Hughes Medical Institute, Harvard Medical School, Boston, MA
| |
|
49
|
Skoglund P, Reich D. A genomic view of the peopling of the Americas. Curr Opin Genet Dev 2016; 41:27-35. [PMID: 27507099 PMCID: PMC5161672 DOI: 10.1016/j.gde.2016.06.016] [Citation(s) in RCA: 49] [Impact Index Per Article: 5.4] [Received: 06/14/2016] [Revised: 06/23/2016] [Accepted: 06/25/2016] [Indexed: 10/21/2022]
Abstract
Whole-genome studies have documented that most Native American ancestry stems from a single population that diversified within the continent more than twelve thousand years ago. However, this shared ancestry hides a more complex history whereby at least four distinct streams of Eurasian migration have contributed to present-day and prehistoric Native American populations. Whole genome studies enhanced by technological breakthroughs in ancient DNA now provide evidence of a sequence of events involving initial migrations from a structured Northeast Asian source population with differential relatedness to present-day Australasian populations, followed by a divergence into northern and southern Native American lineages. During the Holocene, new migrations from Asia introduced the Saqqaq/Dorset Paleoeskimo population to the North American Arctic ∼4500 years ago, ancestry that is potentially connected with ancestry found in Athabaskan-speakers today. This was then followed by a major new population turnover in the high Arctic involving Thule-related peoples who are the ancestors of present-day Inuit. We highlight several open questions that could be addressed through future genomic research.
Affiliation(s)
- Pontus Skoglund
- Harvard Medical School, Boston, MA, USA; Broad Institute of MIT and Harvard, Cambridge, MA, USA; Department of Archaeology and Classical History, Stockholm, Sweden.
| | - David Reich
- Harvard Medical School, Boston, MA, USA; Broad Institute of MIT and Harvard, Cambridge, MA, USA; Howard Hughes Medical Institute, Boston, MA, USA
| |
|
50
|
Novembre J, Peter BM. Recent advances in the study of fine-scale population structure in humans. Curr Opin Genet Dev 2016; 41:98-105. [PMID: 27662060 DOI: 10.1016/j.gde.2016.08.007] [Citation(s) in RCA: 31] [Impact Index Per Article: 3.4] [Received: 07/06/2016] [Revised: 08/18/2016] [Accepted: 08/24/2016] [Indexed: 01/17/2023]
Abstract
Empowered by modern genotyping and large samples, population structure can be accurately described and quantified even when it only explains a fraction of a percent of total genetic variance. This is especially relevant and interesting for humans, where fine-scale population structure can both confound disease-mapping studies and reveal the history of migration and divergence that shaped our species' diversity. Here we review notable recent advances in the detection, use, and understanding of population structure. Our work addresses multiple areas where substantial progress is being made: improved statistics and models for better capturing differentiation, admixture, and the spatial distribution of variation; computational speed-ups that allow methods to scale to modern data; and advances in haplotypic modeling that have wide ranging consequences for the analysis of population structure. We conclude by outlining four important open challenges: the limitations of discrete population models, uncertainty in individual origins, the incorporation of both fine-scale structure and ancient DNA in parametric models, and the development of efficient computational tools, particularly for haplotype-based methods.
Affiliation(s)
- John Novembre
- Department of Human Genetics, University of Chicago, IL 60636, United States; Department of Ecology and Evolutionary Biology, University of Chicago, IL 60636, United States
| | - Benjamin M Peter
- Department of Human Genetics, University of Chicago, IL 60636, United States
| |
|