1
Langdon QK, Groh JS, Aguillon SM, Powell DL, Gunn T, Payne C, Baczenas JJ, Donny A, Dodge TO, Du K, Schartl M, Ríos-Cárdenas O, Gutierrez-Rodríguez C, Morris M, Schumer M. Genome evolution is surprisingly predictable after initial hybridization. bioRxiv 2023:2023.12.21.572897. [PMID: 38187753 PMCID: PMC10769416 DOI: 10.1101/2023.12.21.572897] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Indexed: 01/09/2024]
Abstract
Over the past two decades, evolutionary biologists have come to appreciate that hybridization, or genetic exchange between distinct lineages, is remarkably common - not just in particular lineages but in taxonomic groups across the tree of life. As a result, the genomes of many modern species harbor regions inherited from related species. This observation has raised fundamental questions about the degree to which the genomic outcomes of hybridization are repeatable and the degree to which natural selection drives such repeatability. However, a lack of appropriate systems to answer these questions has limited empirical progress in this area. Here, we leverage independently formed hybrid populations between the swordtail fish Xiphophorus birchmanni and X. cortezi to address this fundamental question. We find that local ancestry in one hybrid population is remarkably predictive of local ancestry in another, demographically independent hybrid population. Applying newly developed methods, we can attribute much of this repeatability to strong selection in the earliest generations after initial hybridization. We complement these analyses with time-series data that demonstrates that ancestry at regions under selection has remained stable over the past ~40 generations of evolution. Finally, we compare our results to the well-studied X. birchmanni×X. malinche hybrid populations and conclude that deeper evolutionary divergence has resulted in stronger selection and higher repeatability in patterns of local ancestry in hybrids between X. birchmanni and X. cortezi.
Affiliation(s)
- Quinn K. Langdon
- Department of Biology, Stanford University
- Centro de Investigaciones Científicas de las Huastecas “Aguazarca”, A.C
- Gladstone Institute of Virology, Gladstone Institutes, San Francisco, California
- Jeffrey S. Groh
- Center for Population Biology and Department of Evolution and Ecology, University of California, Davis
- Stepfanie M. Aguillon
- Department of Biology, Stanford University
- Centro de Investigaciones Científicas de las Huastecas “Aguazarca”, A.C
- Department of Ecology and Evolutionary Biology, University of California, Los Angeles
- Daniel L. Powell
- Department of Biology, Stanford University
- Centro de Investigaciones Científicas de las Huastecas “Aguazarca”, A.C
- Theresa Gunn
- Department of Biology, Stanford University
- Centro de Investigaciones Científicas de las Huastecas “Aguazarca”, A.C
- Cheyenne Payne
- Department of Biology, Stanford University
- Centro de Investigaciones Científicas de las Huastecas “Aguazarca”, A.C
- Alex Donny
- Department of Biology, Stanford University
- Centro de Investigaciones Científicas de las Huastecas “Aguazarca”, A.C
- Tristram O. Dodge
- Department of Biology, Stanford University
- Centro de Investigaciones Científicas de las Huastecas “Aguazarca”, A.C
- Kang Du
- Xiphophorus Genetic Stock Center, Texas State University San Marcos
- Manfred Schartl
- Xiphophorus Genetic Stock Center, Texas State University San Marcos
- Developmental Biochemistry, Biocenter, University of Würzburg
- Molly Schumer
- Department of Biology, Stanford University
- Centro de Investigaciones Científicas de las Huastecas “Aguazarca”, A.C
- Freeman Hrabowski Fellow, Howard Hughes Medical Institute
2
Tan T, Atkinson EG. Strategies for the Genomic Analysis of Admixed Populations. Annu Rev Biomed Data Sci 2023; 6:105-127. [PMID: 37127050 PMCID: PMC10871708 DOI: 10.1146/annurev-biodatasci-020722-014310] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/03/2023]
Abstract
Admixed populations constitute a large portion of global human genetic diversity, yet they are often left out of genomics analyses. This exclusion is problematic, as it leads to disparities in the understanding of the genetic structure and history of diverse cohorts and the performance of genomic medicine across populations. Admixed populations have particular statistical challenges, as they inherit genomic segments from multiple source populations, which is the primary reason they have historically been excluded from genetic studies. In recent years, however, an increasing number of statistical methods and software tools have been developed to account for and leverage admixture in the context of genomics analyses. Here, we provide a survey of such computational strategies for the informed consideration of admixture to allow for the well-calibrated inclusion of mixed ancestry populations in large-scale genomics studies, and we detail persisting gaps in existing tools.
Affiliation(s)
- Taotao Tan
- Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, Texas, USA
- Elizabeth G Atkinson
- Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, Texas, USA
3
Witt KE, Funk A, Añorve-Garibay V, Fang LL, Huerta-Sánchez E. The Impact of Modern Admixture on Archaic Human Ancestry in Human Populations. Genome Biol Evol 2023; 15:evad066. [PMID: 37103242 PMCID: PMC10194819 DOI: 10.1093/gbe/evad066] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/08/2022] [Revised: 03/07/2023] [Accepted: 04/17/2023] [Indexed: 04/28/2023] Open
Abstract
Admixture, the genetic merging of parental populations resulting in mixed ancestry, has occurred frequently throughout the course of human history. Numerous admixture events have occurred between human populations across the world, which have shaped genetic ancestry in modern humans. For example, populations in the Americas are often mosaics of different ancestries due to recent admixture events as part of European colonization. Admixed individuals also often have introgressed DNA from Neanderthals and Denisovans that may have come from multiple ancestral populations, which may affect how archaic ancestry is distributed across an admixed genome. In this study, we analyzed admixed populations from the Americas to assess whether the proportion and location of admixed segments due to recent admixture impact an individual's archaic ancestry. We identified a positive correlation between non-African ancestry and archaic alleles, as well as a slight increase of Denisovan alleles in Indigenous American segments relative to European segments in admixed genomes. We also identify several genes as candidates for adaptive introgression, based on archaic alleles present at high frequency in admixed American populations but low frequency in East Asian populations. These results provide insights into how recent admixture events between modern humans redistributed archaic ancestry in admixed genomes.
Affiliation(s)
- Kelsey E Witt
- Ecology, Evolution, and Organismal Biology, Brown University, Providence, Rhode Island
- Center for Computational Molecular Biology, Brown University, Providence, Rhode Island
- Alyssa Funk
- Center for Computational Molecular Biology, Brown University, Providence, Rhode Island
- Molecular Biology, Cell Biology, & Biochemistry, Brown University, Providence, Rhode Island
- Valeria Añorve-Garibay
- Center for Computational Molecular Biology, Brown University, Providence, Rhode Island
- Licenciatura en Ciencias Genómicas, Escuela Nacional de Estudios Superiores Unidad Juriquilla, Universidad Nacional Autónoma de México, Querétaro, Mexico
- Laboratorio Internacional de Investigación sobre el Genoma Humano, Universidad Nacional Autónoma de México, Querétaro, Mexico
- Lesly Lopez Fang
- Department of Life & Environmental Sciences, University of California, Merced, California, United States of America
- Emilia Huerta-Sánchez
- Ecology, Evolution, and Organismal Biology, Brown University, Providence, Rhode Island
- Center for Computational Molecular Biology, Brown University, Providence, Rhode Island
4
Zhang R, Ni X, Yuan K, Pan Y, Xu S. MultiWaverX: modeling latent sex-biased admixture history. Brief Bioinform 2022; 23:6590437. [PMID: 35598333 DOI: 10.1093/bib/bbac179] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/18/2022] [Revised: 04/18/2022] [Accepted: 04/20/2022] [Indexed: 11/13/2022] Open
Abstract
Sex-biased gene flow has been common in the demographic history of modern humans. However, the lack of sophisticated methods for delineating the detailed sex-biased admixture process prevents insights into complex admixture history and thus our understanding of the evolutionary mechanisms of genetic diversity. Here, we present a novel algorithm, MultiWaverX, for modeling complex admixture history with sex-biased gene flow. Systematic simulations showed that MultiWaverX is a powerful tool for modeling complex admixture history and inferring sex-biased gene flow. Application of MultiWaverX to empirical data of 17 typical admixed populations in America, Central Asia, and the Middle East revealed sex-biased admixture histories that were largely consistent with the historical records. Notably, fine-scale admixture process reconstruction enabled us to recognize latent sex-biased gene flow in certain populations that would likely be overlooked by much of the routine analysis with commonly used methods. An outstanding example in the real world is the Kazakh population that experienced complex admixture with sex-biased gene flow but in which the overall signature has been canceled due to biased gene flow from an opposite direction.
Affiliation(s)
- Rui Zhang
- Key Laboratory of Computational Biology, Shanghai Institute of Nutrition and Health, University of Chinese Academy of Sciences, Chinese Academy of Sciences, Shanghai 200031, China
- Xumin Ni
- School of Mathematics and Statistics, Beijing Jiaotong University, Beijing, 100044, China
- Kai Yuan
- Key Laboratory of Computational Biology, Shanghai Institute of Nutrition and Health, University of Chinese Academy of Sciences, Chinese Academy of Sciences, Shanghai 200031, China
- Yuwen Pan
- Key Laboratory of Computational Biology, Shanghai Institute of Nutrition and Health, University of Chinese Academy of Sciences, Chinese Academy of Sciences, Shanghai 200031, China
- Shuhua Xu
- Department of Liver Surgery and Transplantation Liver Cancer Institute, Zhongshan Hospital, Fudan University, Shanghai 200032, China
- State Key Laboratory of Genetic Engineering, Collaborative Innovation Center of Genetics and Development, Center for Evolutionary Biology, School of Life Sciences, Fudan University, Shanghai 200438, China
- Human Phenome Institute, Zhangjiang Fudan International Innovation Center, and Ministry of Education Key Laboratory of Contemporary Anthropology, Fudan University, Shanghai 201203, China
- Center for Excellence in Animal Evolution and Genetics, Chinese Academy of Sciences, Kunming 650223, China
- Jiangsu Key Laboratory of Phylogenomics and Comparative Genomics, School of Life Sciences, Jiangsu Normal University, Xuzhou, 221116, China
- Henan Institute of Medical and Pharmaceutical Sciences, Zhengzhou University, Zhengzhou 450052, China
- School of Life Science and Technology, ShanghaiTech University, Shanghai 201210, China
5
Summo M, Comte A, Martin G, Perelle P, Weitz EM, Droc G, Rouard M. GeMo: a web-based platform for the visualization and curation of genome ancestry mosaics. Database (Oxford) 2022; 2022:6645005. [PMID: 35849014 PMCID: PMC9290862 DOI: 10.1093/database/baac057] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/06/2022] [Revised: 06/17/2022] [Accepted: 06/30/2022] [Indexed: 11/12/2022]
Abstract
In silico chromosome painting is a technique by which contributions of distinct genetic groups are represented along chromosomes of hybrid individuals. This type of analysis is used to study the mechanisms by which these individuals were formed. Such techniques are well adapted to identify genetic groups contributing to these individuals as well as hybridization events. It can also be used to follow chromosomal recombinations that occurred naturally or were generated by selective breeding. Here, we present GeMo, a novel interactive web-based and user-oriented interface to visualize in a linear-based fashion results of in silico chromosome painting. To facilitate data input generation, a script to execute analytical commands is provided and an interactive data curation mode is supported to ensure consistency of the automated procedure. GeMo contains preloaded datasets from published studies on crop domestication but can be applied to other purposes, such as breeding programs. Although only applied so far on plants, GeMo can handle data from animals as well. Database URL: https://gemo.southgreen.fr/
Affiliation(s)
- Marilyne Summo
- CIRAD, UMR AGAP Institut, Montpellier 34398, France
- UMR AGAP Institut, Univ Montpellier, CIRAD, INRAE, Institut Agro, Montpellier, 34398, France
- French Institute of Bioinformatics (IFB)—South Green Bioinformatics Platform, Bioversity, CIRAD, INRAE, IRD, Montpellier 34398, France
- Aurore Comte
- French Institute of Bioinformatics (IFB)—South Green Bioinformatics Platform, Bioversity, CIRAD, INRAE, IRD, Montpellier 34398, France
- IRD, CIRAD, INRAE, Institut Agro, PHIM Plant Health Institute, Montpellier University, Montpellier 34398, France
- Guillaume Martin
- CIRAD, UMR AGAP Institut, Montpellier 34398, France
- UMR AGAP Institut, Univ Montpellier, CIRAD, INRAE, Institut Agro, Montpellier, 34398, France
- French Institute of Bioinformatics (IFB)—South Green Bioinformatics Platform, Bioversity, CIRAD, INRAE, IRD, Montpellier 34398, France
- Pierrick Perelle
- CIRAD, UMR AGAP Institut, Montpellier 34398, France
- UMR AGAP Institut, Univ Montpellier, CIRAD, INRAE, Institut Agro, Montpellier, 34398, France
- Eric M Weitz
- Data Sciences Platform, Broad Institute of MIT and Harvard, 105 Broadway, Cambridge, MA 02142, USA
- Gaëtan Droc
- CIRAD, UMR AGAP Institut, Montpellier 34398, France
- UMR AGAP Institut, Univ Montpellier, CIRAD, INRAE, Institut Agro, Montpellier, 34398, France
- French Institute of Bioinformatics (IFB)—South Green Bioinformatics Platform, Bioversity, CIRAD, INRAE, IRD, Montpellier 34398, France
- Mathieu Rouard
- French Institute of Bioinformatics (IFB)—South Green Bioinformatics Platform, Bioversity, CIRAD, INRAE, IRD, Montpellier 34398, France
- Bioversity International, Parc Scientifique Agropolis II, 34397, Montpellier, France
6
Aase K, Jensen H, Muff S. Genomic estimation of quantitative genetic parameters in wild admixed populations. Methods Ecol Evol 2022. [DOI: 10.1111/2041-210x.13810] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Indexed: 11/27/2022]
Affiliation(s)
- Kenneth Aase
- Centre for Biodiversity Dynamics, Department of Biology, Norwegian University of Science and Technology, Trondheim, Norway
- Henrik Jensen
- Centre for Biodiversity Dynamics, Department of Biology, Norwegian University of Science and Technology, Trondheim, Norway
- Stefanie Muff
- Centre for Biodiversity Dynamics, Department of Biology, Norwegian University of Science and Technology, Trondheim, Norway
- Department of Mathematical Sciences, Norwegian University of Science and Technology, Trondheim, Norway
7
van Eeden G, Uren C, van der Spuy G, Tromp G, Möller M. Local ancestry inference in heterogeneous populations-Are recent recombination events more relevant? Brief Bioinform 2021; 22:6337894. [PMID: 34343255 DOI: 10.1093/bib/bbab300] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 05/04/2021] [Revised: 06/29/2021] [Accepted: 07/14/2021] [Indexed: 12/11/2022] Open
Abstract
To date, numerous software tools have been developed to infer recombination maps. Many of these software tools infer the recombination rate from linkage disequilibrium, and therefore they infer recombination many generations into the past. Other recently developed methods rely on the inference of recent recombination events to determine the recombination rate, such as identity by descent- and local ancestry inference (LAI)-based tools. Methods that mainly use recent recombination events to infer the recombination rate might be more relevant for certain analyses like LAI. We therefore describe a protocol for creating high-resolution, population-specific recombination maps using methods that mainly use recent recombination events and a method that uses recent and distant recombination events for recombination rate inference. Subsequently, we compared the effect of using maps inferred by these two paradigms on LAI accuracy.
Affiliation(s)
- Gian van der Spuy
- Department of Molecular Biology and Human Genetics, Stellenbosch University, South Africa
- Gerard Tromp
- South African Tuberculosis Bioinformatics Initiative (SATBBI), South Africa
- Marlo Möller
- Department of Molecular Biology and Human Genetics, Stellenbosch University, South Africa
8
Genetic Ancestry Inference and Its Application for the Genetic Mapping of Human Diseases. Int J Mol Sci 2021; 22:ijms22136962. [PMID: 34203440 PMCID: PMC8269095 DOI: 10.3390/ijms22136962] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Received: 06/10/2021] [Revised: 06/24/2021] [Accepted: 06/25/2021] [Indexed: 12/21/2022] Open
Abstract
Admixed populations arise when two or more ancestral populations interbreed. As a result of this admixture, the genome of admixed populations is defined by tracts of variable size inherited from these parental groups and has particular genetic features that provide valuable information about their demographic history. Diverse methods can be used to derive the ancestry apportionment of admixed individuals, and such inferences can be leveraged for the discovery of genetic loci associated with diseases and traits, therefore having important biomedical implications. In this review article, we summarize the most common methods of global and local genetic ancestry estimation and discuss the use of admixture mapping studies in human diseases.
9
Wu J, Liu Y, Zhao Y. Systematic Review on Local Ancestor Inference From a Mathematical and Algorithmic Perspective. Front Genet 2021; 12:639877. [PMID: 34108987 PMCID: PMC8181461 DOI: 10.3389/fgene.2021.639877] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 12/10/2020] [Accepted: 04/12/2021] [Indexed: 11/20/2022] Open
Abstract
Genotypic data provide deep insights into the population history and medical genetics. The local ancestry inference (LAI) (also termed local ancestry deconvolution) method uses the hidden Markov model (HMM) to solve the mathematical problem of ancestry reconstruction based on genomic data. HMM is combined with other statistical models and machine learning techniques for particular genetic tasks in a series of computer tools. In this article, we surveyed the mathematical structure, application characteristics, historical development, and benchmark analysis of the LAI method in detail, which will help researchers better understand and further develop LAI methods. Firstly, we extensively explore the mathematical structure of each model and its characteristic applications. Next, we use bibliometrics to show detailed model application fields and list articles to elaborate on the historical development. LAI publications had experienced a peak period during 2006-2016 and had kept on moving in the following years. The efficiency, accuracy, and stability of the existing models were evaluated by the benchmark. We find that phased data had higher accuracy in comparison with unphased data. We summarize these models with their distinct advantages and disadvantages. The Loter model uses dynamic programming to obtain a globally optimal solution with its parameter-free advantage. Aligned bases can be used directly in the Seqmix model if the genotype is hard to call. This research may help model developers to realize current challenges, develop more advanced models, and enable scholars to select appropriate models according to given populations and datasets.
Affiliation(s)
- Jie Wu
- State Key Laboratory of Agrobiotechnology, China Agricultural University, Beijing, China
- Institute of Chinese Materia Medica, China Academy of Chinese Medical Sciences, Beijing, China
- Yangxiu Liu
- State Key Laboratory of Agrobiotechnology, China Agricultural University, Beijing, China
- Yiqiang Zhao
- State Key Laboratory of Agrobiotechnology, China Agricultural University, Beijing, China
10
Molinaro L, Marnetto D, Mondal M, Ongaro L, Yelmen B, Lawson DJ, Montinaro F, Pagani L. A Chromosome-Painting-Based Pipeline to Infer Local Ancestry under Limited Source Availability. Genome Biol Evol 2021; 13:6135079. [PMID: 33585906 PMCID: PMC8085126 DOI: 10.1093/gbe/evab025] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Accepted: 02/04/2021] [Indexed: 01/09/2023] Open
Abstract
Contemporary individuals are the combination of genetic fragments inherited from ancestors belonging to multiple populations, as the result of migration and admixture. Isolating and characterizing these layers are crucial to the understanding of the genetic history of a given population. Ancestry deconvolution approaches make use of a large amount of source individuals, therefore constraining the performance of Local Ancestry Inferences when only few genomes are available from a given population. Here we present WINC, a local ancestry framework derived from the combination of ChromoPainter and NNLS approaches, as a method to retrieve local genetic assignments when only a few reference individuals are available. The framework is aided by a score assignment based on source differentiation to maximize the amount of sequences retrieved and is capable of retrieving accurate ancestry assignments when only two individuals for source populations are used.
Affiliation(s)
- Ludovica Molinaro
- Estonian Biocentre, Institute of Genomics, University of Tartu, Estonia
- Institute of Molecular and Cell Biology, University of Tartu, Estonia
- Davide Marnetto
- Estonian Biocentre, Institute of Genomics, University of Tartu, Estonia
- Mayukh Mondal
- Estonian Biocentre, Institute of Genomics, University of Tartu, Estonia
- Linda Ongaro
- Estonian Biocentre, Institute of Genomics, University of Tartu, Estonia
- Institute of Molecular and Cell Biology, University of Tartu, Estonia
- Burak Yelmen
- Estonian Biocentre, Institute of Genomics, University of Tartu, Estonia
- Institute of Molecular and Cell Biology, University of Tartu, Estonia
- Daniel John Lawson
- Medical Research Council Integrative Epidemiology Unit, Department of Population Health Sciences, Bristol Medical School, University of Bristol, United Kingdom
- Francesco Montinaro
- Estonian Biocentre, Institute of Genomics, University of Tartu, Estonia
- Department of Biology-Genetics, University of Bari, Italy
- Luca Pagani
- Estonian Biocentre, Institute of Genomics, University of Tartu, Estonia
- Department of Biology, University of Padova, Italy
11
Atkinson EG, Maihofer AX, Kanai M, Martin AR, Karczewski KJ, Santoro ML, Ulirsch JC, Kamatani Y, Okada Y, Finucane HK, Koenen KC, Nievergelt CM, Daly MJ, Neale BM. Tractor uses local ancestry to enable the inclusion of admixed individuals in GWAS and to boost power. Nat Genet 2021; 53:195-204. [PMID: 33462486 PMCID: PMC7867648 DOI: 10.1038/s41588-020-00766-y] [Citation(s) in RCA: 96] [Impact Index Per Article: 32.0] [Received: 05/15/2020] [Accepted: 12/15/2020] [Indexed: 12/26/2022]
Abstract
Admixed populations are routinely excluded from genomic studies due to concerns over population structure. Here, we present a statistical framework and software package, Tractor, to facilitate the inclusion of admixed individuals in association studies by leveraging local ancestry. We test Tractor with simulated and empirical two-way admixed African-European cohorts. Tractor generates accurate ancestry-specific effect-size estimates and P values, can boost genome-wide association study (GWAS) power and improves the resolution of association signals. Using a local ancestry-aware regression model, we replicate known hits for blood lipids, discover novel hits missed by standard GWAS and localize signals closer to putative causal variants.
Affiliation(s)
- Elizabeth G Atkinson
- Analytic and Translational Genetics Unit, Department of Medicine, Massachusetts General Hospital, Boston, MA, USA
- Stanley Center for Psychiatric Research, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Adam X Maihofer
- Department of Psychiatry, University of California, San Diego, La Jolla, CA, USA
- Masahiro Kanai
- Analytic and Translational Genetics Unit, Department of Medicine, Massachusetts General Hospital, Boston, MA, USA
- Stanley Center for Psychiatric Research, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Program in Bioinformatics and Integrative Genomics, Harvard Medical School, Boston, MA, USA
- Department of Statistical Genetics, Graduate School of Medicine, Osaka University, Suita, Japan
- Alicia R Martin
- Analytic and Translational Genetics Unit, Department of Medicine, Massachusetts General Hospital, Boston, MA, USA
- Stanley Center for Psychiatric Research, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Konrad J Karczewski
- Analytic and Translational Genetics Unit, Department of Medicine, Massachusetts General Hospital, Boston, MA, USA
- Stanley Center for Psychiatric Research, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Marcos L Santoro
- Stanley Center for Psychiatric Research, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Departamento de Psiquiatria, Universidade Federal de São Paulo, São Paulo, Brazil
- Departamento de Morfologia e Genética, Universidade Federal de São Paulo, São Paulo, Brazil
- Jacob C Ulirsch
- Analytic and Translational Genetics Unit, Department of Medicine, Massachusetts General Hospital, Boston, MA, USA
- Stanley Center for Psychiatric Research, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Program in Biological and Biomedical Sciences, Harvard Medical School, Boston, MA, USA
- Yoichiro Kamatani
- Laboratory of Complex Trait Genomics, Department of Computational Biology and Medical Sciences, Graduate School of Frontier Sciences, The University of Tokyo, Tokyo, Japan
- Yukinori Okada
- Department of Statistical Genetics, Graduate School of Medicine, Osaka University, Suita, Japan
- Laboratory of Statistical Immunology, Immunology Frontier Research Center (WPI-IFReC), Osaka University, Suita, Japan
- Integrated Frontier Research for Medical Science Division, Institute for Open and Transdisciplinary Research Initiatives, Osaka University, Suita, Japan
- Hilary K Finucane
- Analytic and Translational Genetics Unit, Department of Medicine, Massachusetts General Hospital, Boston, MA, USA
- Stanley Center for Psychiatric Research, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Karestan C Koenen
- Stanley Center for Psychiatric Research, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Department of Epidemiology, Harvard T.H. Chan School of Public Health, Boston, MA, USA
- Mark J Daly
- Analytic and Translational Genetics Unit, Department of Medicine, Massachusetts General Hospital, Boston, MA, USA
- Stanley Center for Psychiatric Research, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Institute for Molecular Medicine Finland, University of Helsinki, Helsinki, Finland
- Benjamin M Neale
- Analytic and Translational Genetics Unit, Department of Medicine, Massachusetts General Hospital, Boston, MA, USA
- Stanley Center for Psychiatric Research, Broad Institute of MIT and Harvard, Cambridge, MA, USA
12
Parfenchyk MS, Kotava SA. The Theoretical Framework for the Panels of DNA Markers Formation in the Forensic Determination of an Individual Ancestral Origin. RUSS J GENET+ 2021. [DOI: 10.1134/s1022795421010105] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 01/07/2023]
13
Local ancestry inference provides insight into Tilapia breeding programmes. Sci Rep 2020; 10:18613. [PMID: 33122794 PMCID: PMC7596482 DOI: 10.1038/s41598-020-75744-9] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Received: 06/16/2020] [Accepted: 10/12/2020] [Indexed: 11/30/2022] Open
Abstract
Tilapia is one of the most commercially valuable species in aquaculture with over 5 million tonnes of Nile tilapia, Oreochromis niloticus, produced worldwide every year. It has become increasingly important to keep track of the inheritance of the selected traits under continuous improvement (e.g. growth rate, size at maturity or genetic gender), as selective breeding has also resulted in genes that can hitchhike as part of the process. The goal of this study was to generate a Local Ancestry Inference workflow that harnessed existing tilapia genotyping-by-sequencing studies, such as Double Digest RAD-seq derived Single-Nucleotide Polymorphism markers. We developed a workflow and implemented a suite of tools to resolve the local ancestry of each chromosomal locus based on reference panels of tilapia species of known origin. We used tilapia species, wild populations and breeding programmes to validate our methods. The precision of the pipeline was evaluated on the basis of its ability to identify the genetic makeup of samples of known ancestry. The easy and inexpensive application of local ancestry inference in breeding programmes will facilitate the monitoring of the genetic profile of individuals of interest, the tracking of the movement of genes from parents to offspring and the detection of hybrids and their origin.
|
14
|
Schubert R, Andaleon A, Wheeler HE. Comparing local ancestry inference models in populations of two- and three-way admixture. PeerJ 2020; 8:e10090. [PMID: 33072440 PMCID: PMC7537619 DOI: 10.7717/peerj.10090] [Citation(s) in RCA: 11] [Impact Index Per Article: 2.8] [Received: 06/25/2020] [Accepted: 09/13/2020] [Indexed: 12/23/2022] Open
Abstract
Local ancestry estimation infers the regional ancestral origin of chromosomal segments in admixed populations using reference populations and a variety of statistical models. Integrating local ancestry into complex trait genetics has the potential to increase detection of genetic associations and improve genetic prediction models in understudied admixed populations, including African Americans and Hispanics. Five methods for local ancestry estimation that have been used in human complex trait genetics are LAMP-LD (2012), RFMix (2013), ELAI (2014), Loter (2018), and MOSAIC (2019). As users rather than developers, we sought to perform direct comparisons of accuracy, runtime, memory usage, and usability of these software tools to determine which is best for incorporation into association study pipelines. We find that in the majority of cases RFMix has the highest median accuracy with the ranking of the remaining software dependent on the ancestral architecture of the population tested. Additionally, we estimate the O(n) of both memory and runtime for each software and find that for both time and memory most software increase linearly with respect to sample size. The only exception is RFMix, which increases quadratically with respect to runtime and linearly with respect to memory. Effective local ancestry estimation tools are necessary to increase diversity and prevent population disparities in human genetics studies. RFMix performs the best across methods, however, depending on application, other methods perform just as well with the benefit of shorter runtimes. Scripts used to format data, run software, and estimate accuracy can be found at https://github.com/WheelerLab/LAI_benchmarking.
Affiliation(s)
- Ryan Schubert
- Department of Mathematics and Statistics, Loyola University Chicago, Chicago, IL, United States of America; Department of Biology, Loyola University Chicago, Chicago, IL, United States of America; Program in Bioinformatics, Loyola University Chicago, Chicago, IL, United States of America
- Angela Andaleon
- Department of Biology, Loyola University Chicago, Chicago, IL, United States of America; Program in Bioinformatics, Loyola University Chicago, Chicago, IL, United States of America
- Heather E Wheeler
- Department of Biology, Loyola University Chicago, Chicago, IL, United States of America; Program in Bioinformatics, Loyola University Chicago, Chicago, IL, United States of America; Department of Public Health Sciences, Loyola University Chicago, Maywood, IL, United States of America
|
15
|
Cottin A, Penaud B, Glaszmann JC, Yahiaoui N, Gautier M. Simulation-Based Evaluation of Three Methods for Local Ancestry Deconvolution of Non-model Crop Species Genomes. G3 (BETHESDA, MD.) 2020; 10:569-579. [PMID: 31862786 PMCID: PMC7003078 DOI: 10.1534/g3.119.400873] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Received: 11/08/2019] [Accepted: 12/12/2019] [Indexed: 11/30/2022]
Abstract
Hybridizations between species and subspecies represented major steps in the history of many crop species. Such events generally lead to genomes with mosaic patterns of chromosomal segments of various origins that may be assessed by local ancestry inference methods. However, these methods have mainly been developed in the context of human population genetics with implicit assumptions that may not always fit plant models. The purpose of this study was to evaluate the suitability of three state-of-the-art inference methods (SABER, ELAI and WINPOP) for local ancestry inference under scenarios that can be encountered in plant species. For this, we developed an R package to simulate genotyping data under such scenarios. The tested inference methods performed similarly well as far as representatives of source populations were available. As expected, the higher the level of differentiation between ancestral source populations and the lower the number of generations since admixture, the more accurate were the results. Interestingly, the accuracy of the methods was only marginally affected by i) the number of ancestries (up to six tested); ii) the sample design (i.e., unbalanced representation of source populations); and iii) the reproduction mode (e.g., selfing, vegetative propagation). If a source population was not represented in the data set, no bias was observed in inference accuracy for regions originating from represented sources and regions from the missing source were assigned differently depending on the methods. Overall, the selected ancestry inference methods may be used for crop plant analysis if all ancestral sources are known.
Affiliation(s)
- Aurélien Cottin
- CIRAD, UMR AGAP, F-34398 Montpellier, France
- AGAP, Univ. Montpellier, CIRAD, INRAE, Montpellier SupAgro, Montpellier, France
- Benjamin Penaud
- CIRAD, UMR AGAP, F-34398 Montpellier, France
- AGAP, Univ. Montpellier, CIRAD, INRAE, Montpellier SupAgro, Montpellier, France
- Jean-Christophe Glaszmann
- CIRAD, UMR AGAP, F-34398 Montpellier, France
- AGAP, Univ. Montpellier, CIRAD, INRAE, Montpellier SupAgro, Montpellier, France
- Nabila Yahiaoui
- CIRAD, UMR AGAP, F-34398 Montpellier, France
- AGAP, Univ. Montpellier, CIRAD, INRAE, Montpellier SupAgro, Montpellier, France
|
16
|
Geza E, Mulder NJ, Chimusa ER, Mazandu GK. FRANC: a unified framework for multi-way local ancestry deconvolution with high density SNP data. Brief Bioinform 2019. [DOI: 10.1093/bib/bbz117] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/13/2022] Open
Abstract
Several thousand genomes have been completed with millions of variants identified in the human deoxyribonucleic acid sequences. These genomic variations, especially those introduced by admixture, significantly contribute to a remarkable phenotypic variability with medical and/or evolutionary implications. Elucidating local ancestry estimates is necessary for a better understanding of genomic variation patterns throughout modern human evolution and adaptive processes, and consequences in human heredity and health. However, existing local ancestry deconvolution tools are accessible as individual scripts, each requiring input and producing output in its own complex format. This limits the user’s ability to retrieve local ancestry estimates. We introduce a unified framework for multi-way local ancestry inference, FRANC, integrating eight existing state-of-the-art local ancestry deconvolution tools. FRANC is an adaptable, expandable and portable tool that manipulates tool-specific inputs, deconvolutes ancestry and standardizes tool-specific results. To facilitate both medical and population genetics studies, FRANC requires convenient and easy to manipulate input files and allows users to choose output formats to ease their use in further potential local ancestry deconvolution applications.
Affiliation(s)
- Ephifania Geza
- Computational Biology Division, Department of Integrative Biomedical Sciences, Institute of Infectious Disease and Molecular Medicine, University of Cape Town, Health Sciences Campus, Anzio Rd, Observatory, 7925, South Africa
- Nicola J Mulder
- Computational Biology Division, Department of Integrative Biomedical Sciences, Institute of Infectious Disease and Molecular Medicine, University of Cape Town, Health Sciences Campus, Anzio Rd, Observatory, 7925, South Africa
- Emile R Chimusa
- Computational Biology Division, Department of Integrative Biomedical Sciences, Institute of Infectious Disease and Molecular Medicine, University of Cape Town, Health Sciences Campus, Anzio Rd, Observatory, 7925, South Africa
- Gaston K Mazandu
- Computational Biology Division, Department of Integrative Biomedical Sciences, Institute of Infectious Disease and Molecular Medicine, University of Cape Town, Health Sciences Campus, Anzio Rd, Observatory, 7925, South Africa
|
17
|
Duranton M, Bonhomme F, Gagnaire P. The spatial scale of dispersal revealed by admixture tracts. Evol Appl 2019; 12:1743-1756. [PMID: 31548854 PMCID: PMC6752141 DOI: 10.1111/eva.12829] [Citation(s) in RCA: 15] [Impact Index Per Article: 3.0] [Received: 10/05/2018] [Accepted: 05/28/2019] [Indexed: 12/11/2022] Open
Abstract
Evaluating species dispersal across the landscape is essential to design appropriate management and conservation actions. However, technical difficulties often preclude direct measures of individual movement, while indirect genetic approaches rely on assumptions that sometimes limit their application. Here, we show that the temporal decay of admixture tracts lengths can be used to assess genetic connectivity within a population introgressed by foreign haplotypes. We present a proof-of-concept approach based on local ancestry inference in a high gene flow marine fish species, the European sea bass (Dicentrarchus labrax). Genetic admixture in the contact zone between Atlantic and Mediterranean sea bass lineages allows the introgression of Atlantic haplotype tracts within the Mediterranean Sea. Once introgressed, blocks of foreign ancestry are progressively eroded by recombination as they diffuse from the western to the eastern Mediterranean basin, providing a means to estimate dispersal. By comparing the length distributions of Atlantic tracts between two Mediterranean populations located at different distances from the contact zone, we estimated the average per-generation dispersal distance within the Mediterranean lineage to less than 50 km. Using simulations, we showed that this approach is robust to a range of demographic histories and sample sizes. Our results thus support that the length of admixture tracts can be used together with a recombination clock to estimate genetic connectivity in species for which the neutral migration-drift balance is not informative or simply does not exist.
Affiliation(s)
- Maud Duranton
- ISEM, Univ Montpellier, CNRS, EPHE, IRD, Montpellier, France
|
18
|
Santos JD, Chebotarov D, McNally KL, Bartholomé J, Droc G, Billot C, Glaszmann JC. Fine Scale Genomic Signals of Admixture and Alien Introgression among Asian Rice Landraces. Genome Biol Evol 2019; 11:1358-1373. [PMID: 31002105 PMCID: PMC6499253 DOI: 10.1093/gbe/evz084] [Citation(s) in RCA: 19] [Impact Index Per Article: 3.8] [Accepted: 04/11/2019] [Indexed: 12/26/2022] Open
Abstract
Modern rice cultivars are adapted to a range of environmental conditions and human preferences. At the root of this diversity is a marked genetic structure, owing to multiple foundation events. Admixture and recurrent introgression from wild sources have played upon this base to produce the myriad adaptations existing today. Genome-wide studies bring support to this idea, but understanding the history and nature of particular genetic adaptations requires the identification of specific patterns of genetic exchange. In this study, we explore the patterns of haplotype similarity along the genomes of a subset of rice cultivars available in the 3,000 Rice Genomes data set. We begin by establishing a custom method of classification based on a combination of dimensionality reduction and kernel density estimation. Through simulations, the behavior of this classifier is studied under scenarios of varying genetic divergence, admixture, and alien introgression. Finally, the method is applied to local haplotypes along the genome of a Core set of Asian Landraces. Taking the Japonica, Indica, and cAus groups as references, we find evidence of reciprocal introgressions covering 2.6% of reference genomes on average. Structured signals of introgression among reference accessions are discussed. We extend the analysis to elucidate the genetic structure of the group circum-Basmati: we delimit regions of Japonica, cAus, and Indica origin, as well as regions outlier to these groups (13% on average). Finally, the approach used highlights regions of partial to complete loss of structure that can be attributed to selective pressures during domestication.
Affiliation(s)
- João D Santos
- UMR AGAP, CIRAD, Montpellier, France
- UMR AGAP, Université de Montpellier, France
- Dmytro Chebotarov
- International Rice Research Institute (IRRI), Los Baños, Philippines
- Kenneth L McNally
- International Rice Research Institute (IRRI), Los Baños, Philippines
- Jérôme Bartholomé
- UMR AGAP, CIRAD, Montpellier, France
- UMR AGAP, Université de Montpellier, France
- International Rice Research Institute (IRRI), Los Baños, Philippines
- Gaëtan Droc
- UMR AGAP, CIRAD, Montpellier, France
- UMR AGAP, Université de Montpellier, France
- Claire Billot
- UMR AGAP, CIRAD, Montpellier, France
- UMR AGAP, Université de Montpellier, France
|