1
Bengtsson C, Stålhammar H, Thomasen JR, Fikse WF, Strandberg E, Eriksson S, Johnsson M. Simulation of long-term impact of dairy cattle mating programmes using genomic information at the herd level. Animal 2025; 19:101498. [PMID: 40252276 DOI: 10.1016/j.animal.2025.101498] [Received: 10/11/2024] [Revised: 03/20/2025] [Accepted: 03/20/2025] [Indexed: 04/21/2025] Open
Abstract
Genotyping provides breeders with new information at the single nucleotide polymorphism level that can be used in mating programmes. This study used stochastic simulation to explore the long-term effects of genomic mating allocations combining economic scores and linear programming at the level of commercial herds. The economic scores included genetic level, a favourable monogenic trait (polledness), a recessive genetic defect, and parent relationships. The results showed that compared with only maximising genetic level, including genomic or pedigree relationship in the economic score lowered the rate of pedigree and genomic inbreeding with minimal effect on genetic gain. Including the cost of a recessive genetic defect in the score almost eliminated the risk of expression. We set the start allele frequency of polledness to ∼12%, and the value of polledness varied in the different scenarios (€0, €10, €50, and €100). Including an economic value for polledness of ≥ €50 in the economic score increased the frequency of polled animals by up to 0.037 per generation, without negatively impacting other comparison criteria. The use of genomic relationships was favourable for the rate of genomic inbreeding and performed as well as pedigree relationships concerning the rate of pedigree inbreeding. Limiting the number of females per bull and herd to a maximum of 5% instead of 10% also decreased the rate of inbreeding. The 5% females per bull and herd constraint lowered the variation in carrier frequency for genetic defects, which reduced the risk of mating two carriers of an unknown genetic defect in future generations after the widespread use of carriers in previous generations. However, the 10% females per bull constraint accelerated the increase in the polled allele.
Therefore, planning matings with genomic information at the herd level involves important risk management decisions, such as balancing the trade-off between using fewer bulls to increase the polled allele frequency more quickly and using more bulls to reduce the rate of inbreeding and the variation in carrier frequency for genetic defects.
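The abstract above describes scoring each candidate mating economically and then allocating bulls to cows under a cap on females per bull. A minimal Python sketch of that idea follows. It is not the authors' implementation: the score formula, the cost constants and the animal records are hypothetical, and an exhaustive search over a four-cow example stands in for the linear programming step.

```python
from itertools import product

# Hypothetical economic weights (illustrative only, not values from the study).
POLLED_VALUE = 50.0   # euro value of a polled calf
DEFECT_COST = 500.0   # euro cost of a calf affected by the recessive defect
REL_COST = 1000.0     # euro cost per unit of parent relationship


def economic_score(bull, cow, relationship):
    """Expected economic value of one mating under the assumed weights."""
    genetic_level = 0.5 * (bull["ebv"] + cow["ebv"])  # parent average
    # Recessive defect: two carriers produce an affected calf with probability 1/4.
    p_affected = 0.25 if bull["carrier"] and cow["carrier"] else 0.0
    # Polled is dominant: the calf is horned only if both parents transmit a horned allele.
    p_horned = (1 - bull["polled_alleles"] / 2) * (1 - cow["polled_alleles"] / 2)
    return (genetic_level
            - REL_COST * relationship
            - DEFECT_COST * p_affected
            + POLLED_VALUE * (1 - p_horned))


def allocate(bulls, cows, rel, max_per_bull):
    """Try every bull-per-cow plan and keep the best one that respects the cap."""
    best_total, best_plan = float("-inf"), None
    for plan in product(range(len(bulls)), repeat=len(cows)):
        if any(plan.count(b) > max_per_bull for b in range(len(bulls))):
            continue  # violates the females-per-bull constraint
        total = sum(economic_score(bulls[b], cows[c], rel[b][c])
                    for c, b in enumerate(plan))
        if total > best_total:
            best_total, best_plan = total, plan
    return best_plan, best_total


# Made-up animals: bull 0 has the higher EBV but carries the defect and is horned.
bulls = [
    {"ebv": 120.0, "carrier": True, "polled_alleles": 0},
    {"ebv": 110.0, "carrier": False, "polled_alleles": 1},
]
cows = [
    {"ebv": 100.0, "carrier": True, "polled_alleles": 0},
    {"ebv": 95.0, "carrier": False, "polled_alleles": 0},
    {"ebv": 105.0, "carrier": False, "polled_alleles": 0},
    {"ebv": 90.0, "carrier": True, "polled_alleles": 0},
]
rel = [  # rel[bull][cow]: made-up parent relationships
    [0.05, 0.10, 0.02, 0.03],
    [0.04, 0.01, 0.06, 0.02],
]

plan, total = allocate(bulls, cows, rel, max_per_bull=2)
print(plan, total)
```

In this toy herd the cap forces both bulls to be used, and the defect cost steers the carrier bull away from the carrier cows. A real herd would need a proper linear or integer programming solver, since exhaustive search grows exponentially with the number of cows.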
Affiliation(s)
- C Bengtsson
  - VikingGenetics, VikingGenetics Sweden AB, 53294 Skara, Sweden; Dept. of Animal Biosciences, Swedish University of Agricultural Sciences, Box 7023, 75007 Uppsala, Sweden
- H Stålhammar
  - VikingGenetics, VikingGenetics Sweden AB, 53294 Skara, Sweden
- J R Thomasen
  - VikingGenetics, VikingGenetics Sweden AB, 53294 Skara, Sweden; Aarhus University, Center for Quantitative Genetics and Genomics, C. F. Møllers Allé 3, 8000 Aarhus, Denmark
- W F Fikse
  - Växa, Box 288, 75105 Uppsala, Sweden
- E Strandberg
  - Dept. of Animal Biosciences, Swedish University of Agricultural Sciences, Box 7023, 75007 Uppsala, Sweden
- S Eriksson
  - Dept. of Animal Biosciences, Swedish University of Agricultural Sciences, Box 7023, 75007 Uppsala, Sweden
- M Johnsson
  - Dept. of Animal Biosciences, Swedish University of Agricultural Sciences, Box 7023, 75007 Uppsala, Sweden; Beijer Laboratory for Animal Science, Swedish University of Agricultural Sciences, Box 7024, 750 07 Uppsala, Sweden
2
Yuan C, Gillon A, Gualdrón Duarte JL, Takeda H, Coppieters W, Georges M, Druet T. Evaluation of genomic selection models using whole genome sequence data and functional annotation in Belgian Blue cattle. Genet Sel Evol 2025; 57:10. [PMID: 40038647 PMCID: PMC11881496 DOI: 10.1186/s12711-025-00955-5] [Received: 07/04/2024] [Accepted: 02/10/2025] [Indexed: 03/06/2025] Open
Abstract
BACKGROUND The availability of large cohorts of whole-genome sequenced individuals, combined with functional annotation, is expected to provide opportunities to improve the accuracy of genomic selection (GS). However, such benefits have not often been observed in initial applications. The reference population for GS in Belgian Blue Cattle (BBC) continues to grow. Combined with the availability of reference panels of sequenced individuals, it provides an opportunity to evaluate GS models using whole genome sequence (WGS) data and functional annotation. RESULTS Here, we used data from 16,508 cows, with phenotypes for five muscular development traits and imputed at the WGS level, in combination with in silico functional annotation and catalogs of putative regulatory variants obtained from experimental data. We first evaluated GS models using the entire WGS data, with or without functional annotation. At this marker density, we were able to run two approaches, assuming either a highly polygenic architecture (GBLUP) or allowing some variants to have larger effects (BayesRR-RC, a Bayesian mixture model), and observed an increased reliability compared to the official GBLUP model at medium marker density (on average 0.016 and 0.018 for GBLUP and BayesRR-RC, respectively). When functional annotation was used, we observed slightly higher reliabilities with an extension of GBLUP that included multiple polygenic terms (one per functional group), while reliabilities decreased with BayesRR-RC. We then used large subsets of variants selected based on functional information or with a linkage disequilibrium (LD) pruning approach, which allowed us to evaluate two additional approaches, BayesCπ and Bayesian Sparse Linear Mixed Model (BSLMM). Reliabilities were higher for these panels than for the WGS data, with the highest accuracies obtained when markers were selected based on functional information.
In our setting, BSLMM systematically achieved higher reliabilities than other methods. CONCLUSIONS GS with large panels of functional variants selected from WGS data allowed a significant increase in reliability compared to the official genomic evaluation approach. However, the benefits of using WGS and functional data remained modest, indicating that there is still room for improvement, for example by further refining the functional annotation in the BBC breed.
Affiliation(s)
- Can Yuan
  - Unit of Animal Genomics, GIGA-R & Faculty of Veterinary Medicine, University of Liège, Avenue de l'Hôpital, 1, 4000, Liège, Belgium
- Alain Gillon
  - Walloon Breeders Association, Rue Des Champs Elysées, 4, 5590, Ciney, Belgium
- Haruko Takeda
  - Unit of Animal Genomics, GIGA-R & Faculty of Veterinary Medicine, University of Liège, Avenue de l'Hôpital, 1, 4000, Liège, Belgium
- Wouter Coppieters
  - Unit of Animal Genomics, GIGA-R & Faculty of Veterinary Medicine, University of Liège, Avenue de l'Hôpital, 1, 4000, Liège, Belgium
- Michel Georges
  - Unit of Animal Genomics, GIGA-R & Faculty of Veterinary Medicine, University of Liège, Avenue de l'Hôpital, 1, 4000, Liège, Belgium
- Tom Druet
  - Unit of Animal Genomics, GIGA-R & Faculty of Veterinary Medicine, University of Liège, Avenue de l'Hôpital, 1, 4000, Liège, Belgium
3
Vanvanhossou SFU, Yin T, Gorjanc G, König S. Evaluation of crossbreeding strategies for improved adaptation and productivity in African smallholder cattle farms. Genet Sel Evol 2025; 57:6. [PMID: 39979829 PMCID: PMC11844127 DOI: 10.1186/s12711-025-00952-8] [Received: 07/27/2024] [Accepted: 01/23/2025] [Indexed: 02/22/2025] Open
Abstract
BACKGROUND Crossbreeding is successfully implemented worldwide to improve animal productivity and adaptability. However, recurrent failures of crossbreeding programmes in African countries imply the need to design effective strategies for the predominant smallholder production systems. METHODS A comprehensive simulation procedure mimicked body weight (BWL) and tick count (TCL) incidence in a local taurine cattle breed and in an exotic indicine beef cattle breed (BWE and TCE, respectively). The two breeds were crossed to produce F1 and rotational animals. Additionally, synthetic breeds were created by applying four schemes defined as farm bull (FB), intra-village bull (IVB), exchanged-village bull (EVB), and population-wide bull (PWB) scheme. These schemes reflect different strategies to select and allocate bulls to smallholder farms. The different crosses were compared with the local breed over 20 generations by varying the genetic correlation between the traits (rg = -0.4, 0, 0.4), genotype-by-environment effects (GxE) between local and exotic environment (rg×e = 0.4, 0.6, 0.8), and the relative emphasis of TCL compared to BWL in a selection index (SI_TCL10%, SI_TCL30%, SI_TCL50%). RESULTS Regardless of rg and rg×e, EVB achieved the highest phenotypic and genetic gains for BWL and TCL over the 20 generations with SI_TCL50%. However, EVB displayed lower phenotypic means than F1 crosses in the first seven generations due to the loss of heterosis. Additive genetic variances were generally larger in synthetic crosses than in F1 and local animals, explaining the larger responses to selection. In addition, the EVB was the most effective strategy to stabilize inbreeding and retain heterosis in the advanced generations of synthetic animals. Low emphasis on TCL (SI_TCL30%, SI_TCL10%) resulted in negative phenotypic gain for TCL in synthetic animals when rg = -0.4.
In contrast to F1 and rotational crosses, GxE effects did not affect phenotypic gain in synthetic crosses. CONCLUSIONS The study demonstrates opportunities for long-term genetic improvement of adaptive and productive performances in smallholder cattle farms using synthetic breeding. Extensive exchange of semen between villages or regions controls inbreeding and additionally contributes to increasing genetic gain. Furthermore, the definition of a suitable selection index prevents antagonistic selection responses caused by negative correlations between traits and GxE effects.
Affiliation(s)
- Tong Yin
  - Institute of Animal Breeding and Genetics, Justus-Liebig-University Gießen, 35390, Gießen, Germany
- Gregor Gorjanc
  - The Roslin Institute and Royal (Dick) School of Veterinary Studies, University of Edinburgh, Edinburgh, UK
- Sven König
  - Institute of Animal Breeding and Genetics, Justus-Liebig-University Gießen, 35390, Gießen, Germany
4
Pardo AM, Legarra A, Vitezica ZG, Forneris NS, Maizon DO, Munilla S. On the ability of the LR method to detect bias when there is pedigree misspecification and lack of connectedness. Genet Sel Evol 2024; 56:74. [PMID: 39574003 PMCID: PMC11583403 DOI: 10.1186/s12711-024-00943-1] [Received: 01/17/2024] [Accepted: 10/31/2024] [Indexed: 11/24/2024] Open
Abstract
BACKGROUND Cross-validation techniques in genetic evaluations encounter limitations due to the unobservable nature of breeding values and the challenge of validating estimated breeding values (EBVs) against pre-corrected phenotypes, challenges which the Linear Regression (LR) method addresses as an alternative. Furthermore, beef cattle genetic evaluation programs confront challenges with connectedness among herds and pedigree errors. The objective of this work was to evaluate the LR method's performance under pedigree errors and weak connectedness typical in beef cattle genetic evaluations, through simulation. METHODS We simulated a beef cattle population resembling the Argentinean Brangus, including a quantitative trait selected over six pseudo-generations with a heritability of 0.4. This study considered various scenarios, including: 25% and 40% pedigree errors (PE-25 and PE-40), weak and strong connectedness among herds (WCO and SCO, respectively), and a benchmark scenario (BEN) with complete pedigree and optimal herd connections. RESULTS Over six pseudo-generations of selection, genetic gain was simulated to be under- and over-estimated in PE-40 and WCO, respectively, contrary to the BEN scenario which was unbiased. In genetic evaluations with PE-25 and PE-40, true biases of -0.13 and -0.18 genetic standard deviations were simulated, respectively. In the BEN scenario, the LR method accurately estimated bias; however, in the PE-25 and PE-40 scenarios, it overestimated biases by 0.17 and 0.25 genetic standard deviations, respectively. In herds facing WCO, significant true bias due to confounding environmental and genetic effects was simulated, and the corresponding LR statistic failed to accurately estimate the magnitude and direction of this bias. On average, true dispersion values were close to one for BEN, PE-40, SCO and WCO, showing no significant inflation or deflation, and the values were accurately estimated by LR.
However, PE-25 exhibited inflation of EBVs and was slightly underestimated by LR. Accuracies and reliabilities showed good agreement between true and LR estimated values for the scenarios evaluated. CONCLUSIONS The LR method demonstrated limitations in identifying biases induced by incomplete pedigrees, including scenarios with as much as 40% pedigree errors, or lack of connectedness, but it was effective in assessing dispersion, and population accuracies and reliabilities even in the challenging scenarios addressed.
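The abstract above does not spell out the LR statistics. As a hedged illustration, the Python sketch below uses the definitions commonly given for the LR method, which compares EBVs of the same animals from a "partial" (earlier) and a "whole" (later) data set. The function name and the toy EBVs are assumptions, and the formulas should be checked against the paper before use.

```python
from math import sqrt


def mean(xs):
    return sum(xs) / len(xs)


def cov(xs, ys):
    """Sample covariance (n - 1 denominator)."""
    mx, my = mean(xs), mean(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (len(xs) - 1)


def lr_statistics(ebv_partial, ebv_whole):
    """LR-style comparison of partial-data and whole-data EBVs for the same animals."""
    # Bias: difference in mean EBV between the two evaluations (expected 0).
    bias = mean(ebv_partial) - mean(ebv_whole)
    # Dispersion: slope of whole-data EBVs on partial-data EBVs (expected 1).
    dispersion = cov(ebv_whole, ebv_partial) / cov(ebv_partial, ebv_partial)
    # Correlation between the two sets of EBVs, which relates to accuracy.
    correlation = cov(ebv_partial, ebv_whole) / sqrt(
        cov(ebv_partial, ebv_partial) * cov(ebv_whole, ebv_whole))
    return bias, dispersion, correlation


# Toy EBVs constructed so that whole = 0.5 + 0.5 * partial.
ebv_partial = [0.0, 1.0, 2.0, 3.0]
ebv_whole = [0.5, 1.0, 1.5, 2.0]

bias, dispersion, correlation = lr_statistics(ebv_partial, ebv_whole)
print(bias, dispersion, correlation)
```

With these made-up values the slope is below one, which under this definition indicates that the partial-data EBVs are inflated (over-dispersed) relative to the whole-data EBVs.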
Affiliation(s)
- Alan M Pardo
  - Estación Experimental Agropecuaria Balcarce, Instituto Nacional de Tecnología Agropecuaria (INTA), B7620, Balcarce, Argentina
  - Facultad de Ciencias Agrarias, Universidad Nacional de Mar del Plata, B7620, Balcarce, Argentina
- Natalia S Forneris
  - Facultad de Agronomía, Universidad de Buenos Aires, C1417DSQ, Buenos Aires, Argentina
  - Instituto de Investigaciones en Producción Animal (INPA), CONICET-Universidad de Buenos Aires, C1427CWO, Buenos Aires, Argentina
- Daniel O Maizon
  - Estación Experimental Agropecuaria Anguil, Instituto Nacional de Tecnología Agropecuaria (INTA), L6326, Anguil, Argentina
- Sebastián Munilla
  - Facultad de Agronomía, Universidad de Buenos Aires, C1417DSQ, Buenos Aires, Argentina
  - Instituto de Investigaciones en Producción Animal (INPA), CONICET-Universidad de Buenos Aires, C1427CWO, Buenos Aires, Argentina
5
Fortuna GM, Zumbach B, Johnsson M, Pocrnic I, Gorjanc G. Accounting for the nuclear and mito genome in dairy cattle breeding-A simulation study. JDS Communications 2024; 5:572-576. [PMID: 39650025 PMCID: PMC11624359 DOI: 10.3168/jdsc.2023-0522] [Received: 11/21/2023] [Accepted: 04/19/2024] [Indexed: 12/11/2024]
Abstract
Mitochondria play a significant role in numerous cellular processes through proteins encoded by both the nuclear genome (nDNA) and mito genome (mDNA), and increasing evidence shows that traits of interest might be affected by mito-nuclear interactions. Whereas the variation in nDNA is influenced by mutations and recombination of parental genomes, the variation in mDNA is solely driven by mutations. In addition, mDNA is inherited in a haploid form, from the dam. Cattle populations show substantial variation in mDNA between and within breeds. Past research suggests that variation in mDNA accounts for 1% to 5% of the phenotypic variation in dairy traits. Here we simulated a dairy cattle breeding program to assess the impact of accounting for mDNA variation in pedigree-based and genome-based genetic evaluations on the accuracy of EBVs for mDNA and nDNA components. We also examined the impact of alternative definitions of breeding values on genetic gain, including nDNA and mDNA components that both affect phenotype expression, but mDNA is inherited only maternally. We found that accounting for mDNA variation increased accuracy between +0.01 and +0.03 for different categories of animals, especially for young bulls (+0.03) and females without genotype data (between +0.01 and +0.03). Different scenarios of modeling and breeding value definition affected genetic gain. The standard approach of ignoring mDNA variation achieved competitive genetic gain. Modeling but not selecting on mDNA expectedly reduced genetic gain, whereas optimal use of mDNA variation recovered the genetic gain.
Affiliation(s)
- B.J. Zumbach
  - Division of Plant Breeding Methodology, Georg-August-Universität Göttingen, 37075, Göttingen, Germany
- M. Johnsson
  - Department of Animal Biosciences, Swedish University of Agricultural Sciences, 750 07 Uppsala, Sweden
- I. Pocrnic
  - The University of Edinburgh, The Roslin Institute, EH25 9RG, Edinburgh, United Kingdom
- G. Gorjanc
  - The University of Edinburgh, The Roslin Institute, EH25 9RG, Edinburgh, United Kingdom
6
Rossi C, Sinding MHS, Mullin VE, Scheu A, Erven JAM, Verdugo MP, Daly KG, Ciucani MM, Mattiangeli V, Teasdale MD, Diquelou D, Manin A, Bangsgaard P, Collins M, Lord TC, Zeibert V, Zorzin R, Vinter M, Timmons Z, Kitchener AC, Street M, Haruda AF, Tabbada K, Larson G, Frantz LAF, Gehlen B, Alhaique F, Tagliacozzo A, Fornasiero M, Pandolfi L, Karastoyanova N, Sørensen L, Kiryushin K, Ekström J, Mostadius M, Grandal-d'Anglade A, Vidal-Gorosquieta A, Benecke N, Kropp C, Grushin SP, Gilbert MTP, Merts I, Merts V, Outram AK, Rosengren E, Kosintsev P, Sablin M, Tishkin AA, Makarewicz CA, Burger J, Bradley DG. The genomic natural history of the aurochs. Nature 2024; 635:136-141. [PMID: 39478219 DOI: 10.1038/s41586-024-08112-6] [Received: 05/03/2024] [Accepted: 09/25/2024] [Indexed: 11/04/2024]
Abstract
Now extinct, the aurochs (Bos primigenius) was a keystone species in prehistoric Eurasian and North African ecosystems, and the progenitor of cattle (Bos taurus), domesticates that have provided people with food and labour for millennia [1]. Here we analysed 38 ancient genomes and found 4 distinct population ancestries in the aurochs (European, Southwest Asian, North Asian and South Asian), each of which has dynamic trajectories that have responded to changes in climate and human influence. Similarly to Homo heidelbergensis, aurochsen first entered Europe around 650 thousand years ago [2], but early populations left only trace ancestry, with both North Asian and European B. primigenius genomes coalescing during the most recent glaciation. North Asian and European populations then appear separated until mixing after the climate amelioration of the early Holocene. European aurochsen endured the more severe bottleneck during the Last Glacial Maximum, retreating to southern refugia before recolonizing from Iberia. Domestication involved the capture of a small number of individuals from the Southwest Asian aurochs population, followed by early and pervasive male-mediated admixture involving each ancestral strain of aurochs after domestic stocks dispersed beyond their cradle of origin.
Affiliation(s)
- Conor Rossi
  - Smurfit Institute of Genetics, Trinity College Dublin, Dublin, Ireland
- Victoria E Mullin
  - Smurfit Institute of Genetics, Trinity College Dublin, Dublin, Ireland
- Amelie Scheu
  - Smurfit Institute of Genetics, Trinity College Dublin, Dublin, Ireland
  - Palaeogenetics Group, Institute of Organismic and Molecular Evolution (iomE), Johannes Gutenberg University Mainz, Mainz, Germany
- Jolijn A M Erven
  - Smurfit Institute of Genetics, Trinity College Dublin, Dublin, Ireland
  - Groningen Institute of Archaeology, University of Groningen, Groningen, The Netherlands
- Kevin G Daly
  - Smurfit Institute of Genetics, Trinity College Dublin, Dublin, Ireland
  - School of Agriculture and Food Science, University College Dublin, Dublin, Ireland
- Marta Maria Ciucani
  - Globe Institute, Faculty of Health and Medical Sciences, University of Copenhagen, Copenhagen, Denmark
- Matthew D Teasdale
  - Smurfit Institute of Genetics, Trinity College Dublin, Dublin, Ireland
  - Bioinformatics Support Unit, Faculty of Medical Sciences, Newcastle University, Newcastle upon Tyne, UK
- Deborah Diquelou
  - Smurfit Institute of Genetics, Trinity College Dublin, Dublin, Ireland
- Aurélie Manin
  - Palaeogenomics and Bio-Archaeology Research Network, Research Laboratory for Archaeology and History of Art, University of Oxford, Oxford, UK
- Pernille Bangsgaard
  - Globe Institute, Faculty of Health and Medical Sciences, University of Copenhagen, Copenhagen, Denmark
- Matthew Collins
  - Globe Institute, Faculty of Health and Medical Sciences, University of Copenhagen, Copenhagen, Denmark
  - McDonald Institute for Archaeological Research, University of Cambridge, Cambridge, UK
- Viktor Zeibert
  - Institute of Archaeology and Steppe Civilizations, Al-Farabi Kazakh National University, Almaty, Kazakhstan
- Roberto Zorzin
  - Sezione di Geologia e Paleontologia, Museo Civico di Storia Naturale di Verona, Verona, Italy
- Zena Timmons
  - Department of Natural Sciences, National Museums Scotland, Edinburgh, UK
- Andrew C Kitchener
  - Department of Natural Sciences, National Museums Scotland, Edinburgh, UK
  - School of Geosciences, University of Edinburgh, Edinburgh, UK
- Martin Street
  - LEIZA, Archaeological Research Centre and Museum for Human Behavioural Evolution, Schloss Monrepos, Neuwied, Germany
- Ashleigh F Haruda
  - Palaeogenomics and Bio-Archaeology Research Network, Research Laboratory for Archaeology and History of Art, University of Oxford, Oxford, UK
- Kristina Tabbada
  - Palaeogenomics and Bio-Archaeology Research Network, Research Laboratory for Archaeology and History of Art, University of Oxford, Oxford, UK
- Greger Larson
  - Palaeogenomics and Bio-Archaeology Research Network, Research Laboratory for Archaeology and History of Art, University of Oxford, Oxford, UK
- Laurent A F Frantz
  - Palaeogenomics Group, Institute of Palaeoanatomy, Domestication Research and the History of Veterinary Medicine, Ludwig-Maximilians-Universität, Munich, Germany
  - School of Biological and Chemical Sciences, Queen Mary University of London, London, UK
- Birgit Gehlen
  - Institute for Prehistory and Protohistory, University of Cologne, Cologne, Germany
- Francesca Alhaique
  - Bioarchaeology Service, Museo delle Civiltà, Piazza Guglielmo Marconi, Rome, Italy
- Antonio Tagliacozzo
  - Bioarchaeology Service, Museo delle Civiltà, Piazza Guglielmo Marconi, Rome, Italy
- Luca Pandolfi
  - Dipartimento di Scienze della Terra, Università di Pisa, Pisa, Italy
- Nadezhda Karastoyanova
  - Department of Paleontology and Mineralogy, National Museum of Natural History, Bulgarian Academy of Sciences, Sofia, Bulgaria
- Kirill Kiryushin
  - Department of Recreational Geography, Service, Tourism and Hospitality, Institute of Geography, Altai State University, Barnaul, Russian Federation
- Jonas Ekström
  - The Biological Museum, Lund University, Arkivcentrum Syd, Lund, Sweden
- Maria Mostadius
  - The Biological Museum, Lund University, Arkivcentrum Syd, Lund, Sweden
- Norbert Benecke
  - German Archaeological Institute, Central Department, Berlin, Germany
- Claus Kropp
  - Lauresham Laboratory for Experimental Archaeology, UNESCO-Welterbestätte Kloster Lorsch, Lorsch, Germany
- Sergei P Grushin
  - Department of Archaeology, Ethnography and Museology, Altai State University, Barnaul, Russian Federation
- M Thomas P Gilbert
  - Globe Institute, Faculty of Health and Medical Sciences, University of Copenhagen, Copenhagen, Denmark
- Ilja Merts
  - Toraighyrov University, Joint Research Center for Archeological Studies, Pavlodar, Kazakhstan
- Viktor Merts
  - Toraighyrov University, Joint Research Center for Archeological Studies, Pavlodar, Kazakhstan
- Alan K Outram
  - Department of Archaeology and History, University of Exeter, Exeter, UK
- Erika Rosengren
  - Department of Archaeology and Ancient History, Lund University, Lund, Sweden
  - Centre for Palaeogenetics, Stockholm, Sweden
  - Lund University Historical Museum, Lund, Sweden
- Pavel Kosintsev
  - Paleoecology Laboratory, Institute of Plant and Animal Ecology, Ural Branch of the Russian Academy of Sciences, Ekaterinburg, Russian Federation
  - Department of History, Institute of Humanities, Ural Federal University, Ekaterinburg, Russian Federation
- Mikhail Sablin
  - Zoological Institute of the Russian Academy of Sciences, Saint Petersburg, Russian Federation
- Alexey A Tishkin
  - Department of Archaeology, Ethnography and Museology, Altai State University, Barnaul, Russian Federation
- Cheryl A Makarewicz
  - Archaeology Stable Isotope Laboratory, Institute of Pre- and Protohistoric Archaeology, University of Kiel, Kiel, Germany
  - University of Haifa, Haifa, Israel
- Joachim Burger
  - Palaeogenetics Group, Institute of Organismic and Molecular Evolution (iomE), Johannes Gutenberg University Mainz, Mainz, Germany
- Daniel G Bradley
  - Smurfit Institute of Genetics, Trinity College Dublin, Dublin, Ireland
7
Yuan C, Gualdrón Duarte JL, Takeda H, Georges M, Druet T. Evaluation of heritability partitioning approaches in livestock populations. BMC Genomics 2024; 25:690. [PMID: 39003468 PMCID: PMC11246585 DOI: 10.1186/s12864-024-10600-y] [Received: 12/15/2023] [Accepted: 07/08/2024] [Indexed: 07/15/2024] Open
Abstract
BACKGROUND Heritability partitioning approaches estimate the contribution of different functional classes, such as coding or regulatory variants, to the genetic variance. This information allows a better understanding of the genetic architecture of complex traits, including complex diseases, but can also help improve the accuracy of genomic selection in livestock species. However, methods have mainly been tested on human genomic data, whereas livestock populations have specific characteristics, such as high levels of relatedness, small effective population size or long-range levels of linkage disequilibrium. RESULTS Here, we used data from 14,762 cows, imputed at the whole-genome sequence level for 11,537,240 variants, to simulate traits in a typical livestock population and evaluate the accuracy of two state-of-the-art heritability partitioning methods, GREML and a Bayesian mixture model. In simulations where a single functional class had increased contribution to heritability, we observed that the estimators were unbiased but had low precision. When causal variants were enriched in variants with low (< 0.05) or high (> 0.20) minor allele frequency or low (below 1st quartile) or high (above 3rd quartile) linkage disequilibrium scores, it was necessary to partition the genetic variance into multiple classes defined on the basis of allele frequencies or LD scores to obtain unbiased results. When multiple functional classes had variable contributions to heritability, estimators showed higher levels of variation and confounding between certain categories was observed. In addition, estimators from small categories were particularly imprecise. However, the estimates and their ranking were still informative about the contribution of the classes. We also demonstrated that using methods that estimate the contribution of a single category at a time, a commonly used approach, results in an overestimation. 
Finally, we applied the methods to phenotypes for muscular development and height and estimated that, on average, variants in open chromatin regions had a higher contribution to the genetic variance (> 45%), while variants in coding regions had the strongest individual effects (> 25-fold enrichment on average). Conversely, variants in intergenic or intronic regions showed lower levels of enrichment (0.2 and 0.6-fold on average, respectively). CONCLUSIONS Heritability partitioning approaches should be used cautiously in livestock populations, in particular for small categories. Two-component approaches that fit only one functional category at a time lead to biased estimators and should not be used.
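The fold-enrichment figures quoted above can be read as a class's share of genetic variance divided by its share of variants. The abstract does not state this definition, so the Python sketch below is an assumption based on how enrichment is usually computed in heritability partitioning. The class names, variant counts and variance components are made up.

```python
def enrichment_by_class(n_variants, variance):
    """Fold enrichment per class: share of genetic variance / share of variants."""
    total_n = sum(n_variants.values())
    total_var = sum(variance.values())
    return {
        cls: (variance[cls] / total_var) / (n_variants[cls] / total_n)
        for cls in n_variants
    }


# Hypothetical variant counts and estimated genetic variance components per class.
n_variants = {"coding": 1_000, "open_chromatin": 19_000, "intergenic": 80_000}
variance = {"coding": 0.25, "open_chromatin": 0.55, "intergenic": 0.20}

fold = enrichment_by_class(n_variants, variance)
for cls, value in fold.items():
    print(f"{cls}: {value:.2f}-fold")
```

A small class such as the hypothetical coding set can show a very large fold enrichment while still explaining less of the total variance than a larger class, which is why the abstract reports both quantities.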
Affiliation(s)
- Can Yuan
  - Unit of Animal Genomics, GIGA-R & Faculty of Veterinary Medicine, University of Liège, Avenue de L'Hôpital, 1, 4000, Liège, Belgium
- Haruko Takeda
  - Unit of Animal Genomics, GIGA-R & Faculty of Veterinary Medicine, University of Liège, Avenue de L'Hôpital, 1, 4000, Liège, Belgium
- Michel Georges
  - Unit of Animal Genomics, GIGA-R & Faculty of Veterinary Medicine, University of Liège, Avenue de L'Hôpital, 1, 4000, Liège, Belgium
- Tom Druet
  - Unit of Animal Genomics, GIGA-R & Faculty of Veterinary Medicine, University of Liège, Avenue de L'Hôpital, 1, 4000, Liège, Belgium
8
Wong Y, Ignatieva A, Koskela J, Gorjanc G, Wohns AW, Kelleher J. A general and efficient representation of ancestral recombination graphs. bioRxiv: the preprint server for biology 2024:2023.11.03.565466. [PMID: 37961279 PMCID: PMC10635123 DOI: 10.1101/2023.11.03.565466] [Indexed: 11/15/2023]
Abstract
As a result of recombination, adjacent nucleotides can have different paths of genetic inheritance and therefore the genealogical trees for a sample of DNA sequences vary along the genome. The structure capturing the details of these intricately interwoven paths of inheritance is referred to as an ancestral recombination graph (ARG). Classical formalisms have focused on mapping coalescence and recombination events to the nodes in an ARG. This approach is out of step with modern developments, which do not represent genetic inheritance in terms of these events or explicitly infer them. We present a simple formalism that defines an ARG in terms of specific genomes and their intervals of genetic inheritance, and show how it generalises these classical treatments and encompasses the outputs of recent methods. We discuss nuances arising from this more general structure, and argue that it forms an appropriate basis for a software standard in this rapidly growing field.
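The idea of defining an ARG through genomes and their intervals of inheritance can be illustrated with a small Python sketch. It is a toy data structure under stated assumptions, not the representation proposed in the paper: the `Edge` class, the `local_tree` helper, the node ids and the node times are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Edge:
    """Genome `child` inherits the half-open interval [left, right) from genome `parent`."""
    left: float
    right: float
    parent: int
    child: int


def local_tree(edges, position):
    """Return the {child: parent} map of the genealogical tree at one genome position."""
    return {e.child: e.parent for e in edges if e.left <= position < e.right}


# Genomes 0-2 are samples and 3-5 are ancestors; times increase into the past.
node_time = {0: 0.0, 1: 0.0, 2: 0.0, 3: 1.0, 4: 1.5, 5: 3.0}

# A sequence of length 10 with one change of tree at position 5.
edges = [
    Edge(0, 5, parent=3, child=0),
    Edge(0, 5, parent=3, child=1),
    Edge(0, 5, parent=5, child=3),
    Edge(0, 5, parent=5, child=2),
    Edge(5, 10, parent=4, child=1),
    Edge(5, 10, parent=4, child=2),
    Edge(5, 10, parent=5, child=4),
    Edge(5, 10, parent=5, child=0),
]

left_tree = local_tree(edges, 2.0)   # samples 0 and 1 are closest relatives here
right_tree = local_tree(edges, 7.0)  # samples 1 and 2 are closest relatives here
print(left_tree)
print(right_tree)
```

No coalescence or recombination event is stored explicitly. The change of tree at position 5 follows from which intervals each genome inherits from which parent.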
Affiliation(s)
- Yan Wong
  - Big Data Institute, Li Ka Shing Centre for Health Information and Discovery, University of Oxford, UK
- Anastasia Ignatieva
  - School of Mathematics and Statistics, University of Glasgow, UK
  - Department of Statistics, University of Oxford, UK
- Jere Koskela
  - School of Mathematics, Statistics and Physics, Newcastle University, UK
  - Department of Statistics, University of Warwick, UK
- Gregor Gorjanc
  - The Roslin Institute and Royal (Dick) School of Veterinary Studies, University of Edinburgh, UK
- Anthony W. Wohns
  - Broad Institute of MIT and Harvard, Cambridge, USA
  - Department of Genetics, Stanford University School of Medicine, Stanford, USA
- Jerome Kelleher
  - Big Data Institute, Li Ka Shing Centre for Health Information and Discovery, University of Oxford, UK
9
|
Benjamin NR, Crooijmans RPMA, Jordan LR, Bolt CR, Schook LB, Schachtschneider KM, Groenen MAM, Roca AL. Swine global genomic resources: insights into wild and domesticated populations. Mamm Genome 2023; 34:520-530. [PMID: 37805667 DOI: 10.1007/s00335-023-10012-5] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/02/2023] [Accepted: 07/25/2023] [Indexed: 10/09/2023]
Abstract
Suids, both domesticated and wild, are found on all continents except for Antarctica and provide valuable food resources for humans in addition to serving as important models for biomedical research. Continuing advances in genome sequencing have allowed researchers to compare the genomes from diverse populations of suids helping to clarify their evolution and dispersal. Further analysis of these samples may provide clues to improve disease resistance/resilience and productivity in domestic suids as well as better ways of classifying and conserving genetic diversity within wild and captive suids. Collecting samples from diverse populations of suids is resource intensive and may negatively impact endangered populations. Here we catalog extensive tissue and DNA samples from suids in collections in both Europe and North America. We include samples that have previously been used for whole genome sequencing, targeted DNA sequencing, RNA sequencing, and reduced representation bisulfite sequencing (RRBS). This work provides an important centralized resource for researchers who wish to access published databases.
Affiliation(s)
- Neal R Benjamin
  - The Program in Ecology, Evolution and Conservation Biology, University of Illinois Urbana-Champaign, Urbana, IL, USA
- Luke R Jordan
  - Department of Radiology, University of Illinois at Chicago, Chicago, IL, USA
- Courtni R Bolt
  - Department of Animal Sciences, University of Illinois Urbana-Champaign, Urbana, IL, USA
- Lawrence B Schook
  - Department of Animal Sciences, University of Illinois Urbana-Champaign, Urbana, IL, USA
  - National Center for Supercomputing Applications, University of Illinois at Chicago, Chicago, IL, USA
- Kyle M Schachtschneider
  - National Center for Supercomputing Applications, University of Illinois at Chicago, Chicago, IL, USA
  - Department of Biochemistry and Molecular Genetics, University of Illinois at Chicago, Chicago, IL, USA
  - Department of Radiology, University of Illinois at Chicago, Chicago, IL, USA
- Martien A M Groenen
  - Animal Breeding and Genomics, Wageningen University and Research, Wageningen, The Netherlands
- Alfred L Roca
  - The Program in Ecology, Evolution and Conservation Biology, University of Illinois Urbana-Champaign, Urbana, IL, USA
  - Department of Animal Sciences, University of Illinois Urbana-Champaign, Urbana, IL, USA
|
10
|
Martchenko D, Shafer ABA. Contrasting whole-genome and reduced representation sequencing for population demographic and adaptive inference: an alpine mammal case study. Heredity (Edinb) 2023; 131:273-281. [PMID: 37532838 PMCID: PMC10539292 DOI: 10.1038/s41437-023-00643-4] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Received: 10/22/2022] [Revised: 07/22/2023] [Accepted: 07/22/2023] [Indexed: 08/04/2023] Open
Abstract
Genomes capture the adaptive and demographic history of a species, but the choice of sequencing strategy and sample size can impact such inferences. We compared whole genome and reduced representation sequencing approaches to study the population demographic and adaptive signals of the North American mountain goat (Oreamnos americanus). We applied the restriction site-associated DNA sequencing (RADseq) approach to 254 individuals and whole genome resequencing (WGS) approach to 35 individuals across the species range at mid-level coverage (9X) and to 5 individuals at high coverage (30X). We used ANGSD to estimate the genotype likelihoods and estimated the effective population size (Ne), population structure, and explicitly modelled the demographic history with δaδi and MSMC2. The data sets were overall concordant in supporting a glacial induced vicariance and extremely low Ne in mountain goats. We evaluated a set of climatic variables and geographic location as predictors of genetic diversity using redundancy analysis. A moderate proportion of total variance (36% for WGS and 21% for RADseq data sets) was explained by geography and climate variables; both data sets support a large impact of drift and some degree of local adaptation. The empirical similarities of WGS and RADseq presented herein reassuringly suggest that both approaches will recover large demographic and adaptive signals in a population; however, WGS offers several advantages over RADseq, such as inferring adaptive processes and calculating runs-of-homozygosity estimates. Considering the predicted climate-induced changes in alpine environments and the genetically depauperate mountain goat, the long-term adaptive capabilities of this enigmatic species are questionable.
Affiliation(s)
- Daria Martchenko
  - Environmental and Life Sciences Graduate Program, Trent University, 2140 East Bank Drive, Peterborough, ON, K9J 7B8, Canada
- Aaron B A Shafer
  - Environmental and Life Sciences Graduate Program, Trent University, 2140 East Bank Drive, Peterborough, ON, K9J 7B8, Canada
  - Department of Forensics & Environmental and Life Sciences Graduate Program, Trent University, 2140 East Bank Drive, Peterborough, ON, K9J 7B8, Canada
|
11
|
Burnett HA, Bieker VC, Le Moullec M, Peeters B, Rosvold J, Pedersen ÅØ, Dalén L, Loe LE, Jensen H, Hansen BB, Martin MD. Contrasting genomic consequences of anthropogenic reintroduction and natural recolonization in high-arctic wild reindeer. Evol Appl 2023; 16:1531-1548. [PMID: 37752961 PMCID: PMC10519417 DOI: 10.1111/eva.13585] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 01/12/2023] [Revised: 07/27/2023] [Accepted: 08/01/2023] [Indexed: 09/28/2023] Open
Abstract
Anthropogenic reintroduction can supplement natural recolonization in reestablishing a species' distribution and abundance. However, both reintroductions and recolonizations can give rise to founder effects that reduce genetic diversity and increase inbreeding, potentially causing the accumulation of genetic load and reduced fitness. Most current populations of the endemic high-arctic Svalbard reindeer (Rangifer tarandus platyrhynchus) originate from recent reintroductions or recolonizations following regional extirpations due to past overharvesting. We investigated and compared the genomic consequences of these two paths to reestablishment using whole-genome shotgun sequencing of 100 Svalbard reindeer across their range. We found little admixture between reintroduced and natural populations. Two reintroduced populations, each founded by 12 individuals around four decades (i.e. 8 reindeer generations) ago, formed two distinct genetic clusters. Compared to the source population, these populations showed only small decreases in genome-wide heterozygosity and increases in inbreeding and lengths of runs of homozygosity. In contrast, the two naturally recolonized populations without admixture possessed much lower heterozygosity, higher inbreeding and longer runs of homozygosity, possibly caused by serial population founder effects and/or fewer or more genetically related founders than in the reintroduction events. Naturally recolonized populations can thus be more vulnerable to the accumulation of genetic load than reintroduced populations. This suggests that in some organisms even small-scale reintroduction programs based on genetically diverse source populations can be more effective than natural recolonization in establishing genetically diverse populations. These findings warrant particular attention in the conservation and management of populations and species threatened by habitat fragmentation and loss.
Affiliation(s)
- Hamish A. Burnett
  - Centre for Biodiversity Dynamics, Department of Biology, Norwegian University of Science and Technology (NTNU), Trondheim, Norway
  - Department of Natural History, NTNU University Museum, Norwegian University of Science and Technology (NTNU), Trondheim, Norway
- Vanessa C. Bieker
  - Centre for Biodiversity Dynamics, Department of Biology, Norwegian University of Science and Technology (NTNU), Trondheim, Norway
  - Department of Natural History, NTNU University Museum, Norwegian University of Science and Technology (NTNU), Trondheim, Norway
- Mathilde Le Moullec
  - Centre for Biodiversity Dynamics, Department of Biology, Norwegian University of Science and Technology (NTNU), Trondheim, Norway
- Bart Peeters
  - Centre for Biodiversity Dynamics, Department of Biology, Norwegian University of Science and Technology (NTNU), Trondheim, Norway
- Jørgen Rosvold
  - Department of Terrestrial Biodiversity, Norwegian Institute for Nature Research (NINA), Trondheim, Norway
- Love Dalén
  - Centre for Palaeogenetics, Stockholm, Sweden
  - Department of Bioinformatics and Genetics, Swedish Museum of Natural History, Stockholm, Sweden
  - Department of Zoology, Stockholm University, Stockholm, Sweden
- Leif Egil Loe
  - Faculty of Environmental Sciences and Natural Resource Management, Norwegian University of Life Sciences, Aas, Norway
- Henrik Jensen
  - Centre for Biodiversity Dynamics, Department of Biology, Norwegian University of Science and Technology (NTNU), Trondheim, Norway
- Brage B. Hansen
  - Centre for Biodiversity Dynamics, Department of Biology, Norwegian University of Science and Technology (NTNU), Trondheim, Norway
  - Department of Terrestrial Ecology, Norwegian Institute for Nature Research (NINA), Trondheim, Norway
- Michael D. Martin
  - Centre for Biodiversity Dynamics, Department of Biology, Norwegian University of Science and Technology (NTNU), Trondheim, Norway
  - Department of Natural History, NTNU University Museum, Norwegian University of Science and Technology (NTNU), Trondheim, Norway
|
12
|
Lauterbur ME, Cavassim MIA, Gladstein AL, Gower G, Pope NS, Tsambos G, Adrion J, Belsare S, Biddanda A, Caudill V, Cury J, Echevarria I, Haller BC, Hasan AR, Huang X, Iasi LNM, Noskova E, Obsteter J, Pavinato VAC, Pearson A, Peede D, Perez MF, Rodrigues MF, Smith CCR, Spence JP, Teterina A, Tittes S, Unneberg P, Vazquez JM, Waples RK, Wohns AW, Wong Y, Baumdicker F, Cartwright RA, Gorjanc G, Gutenkunst RN, Kelleher J, Kern AD, Ragsdale AP, Ralph PL, Schrider DR, Gronau I. Expanding the stdpopsim species catalog, and lessons learned for realistic genome simulations. eLife 2023; 12:RP84874. [PMID: 37342968 DOI: 10.7554/elife.84874] [Citation(s) in RCA: 7] [Impact Index Per Article: 3.5] [Indexed: 06/23/2023] Open
Abstract
Simulation is a key tool in population genetics for both methods development and empirical research, but producing simulations that recapitulate the main features of genomic datasets remains a major obstacle. Today, more realistic simulations are possible thanks to large increases in the quantity and quality of available genetic data, and the sophistication of inference and simulation software. However, implementing these simulations still requires substantial time and specialized knowledge. These challenges are especially pronounced for simulating genomes for species that are not well-studied, since it is not always clear what information is required to produce simulations with a level of realism sufficient to confidently answer a given question. The community-developed framework stdpopsim seeks to lower this barrier by facilitating the simulation of complex population genetic models using up-to-date information. The initial version of stdpopsim focused on establishing this framework using six well-characterized model species (Adrion et al., 2020). Here, we report on major improvements made in the new release of stdpopsim (version 0.2), which includes a significant expansion of the species catalog and substantial additions to simulation capabilities. Features added to improve the realism of the simulated genomes include non-crossover recombination and provision of species-specific genomic annotations. Through community-driven efforts, we expanded the number of species in the catalog more than threefold and broadened coverage across the tree of life. During the process of expanding the catalog, we have identified common sticking points and developed the best practices for setting up genome-scale simulations. We describe the input data required for generating a realistic simulation, suggest good practices for obtaining the relevant information from the literature, and discuss common pitfalls and major considerations. 
These improvements to stdpopsim aim to further promote the use of realistic whole-genome population genetic simulations, especially in non-model organisms, making them available, transparent, and accessible to everyone.
Affiliation(s)
- M Elise Lauterbur
  - Department of Ecology and Evolutionary Biology, University of Arizona, Tucson, United States
- Maria Izabel A Cavassim
  - Department of Ecology and Evolutionary Biology, University of California, Los Angeles, Los Angeles, United States
- Graham Gower
  - Section for Molecular Ecology and Evolution, Globe Institute, University of Copenhagen, Copenhagen, Denmark
- Nathaniel S Pope
  - Institute of Ecology and Evolution, University of Oregon, Eugene, United States
- Georgia Tsambos
  - School of Mathematics and Statistics, University of Melbourne, Melbourne, Australia
- Jeffrey Adrion
  - Institute of Ecology and Evolution, University of Oregon, Eugene, United States
  - Ancestry DNA, San Francisco, United States
- Saurabh Belsare
  - Institute of Ecology and Evolution, University of Oregon, Eugene, United States
- Victoria Caudill
  - Institute of Ecology and Evolution, University of Oregon, Eugene, United States
- Jean Cury
  - Universite Paris-Saclay, CNRS, INRIA, Laboratoire Interdisciplinaire des Sciences du Numerique, Orsay, France
- Benjamin C Haller
  - Department of Computational Biology, Cornell University, Ithaca, United States
- Ahmed R Hasan
  - Department of Cell and Systems Biology, University of Toronto, Toronto, Canada
  - Department of Biology, University of Toronto Mississauga, Mississauga, Canada
- Xin Huang
  - Department of Evolutionary Anthropology, University of Vienna, Vienna, Austria
  - Human Evolution and Archaeological Sciences (HEAS), University of Vienna, Vienna, Austria
- Ekaterina Noskova
  - Computer Technologies Laboratory, ITMO University, St Petersburg, Russian Federation
- Jana Obsteter
  - Agricultural Institute of Slovenia, Department of Animal Science, Ljubljana, Slovenia
- Alice Pearson
  - Department of Genetics, University of Cambridge, Cambridge, United Kingdom
  - Department of Zoology, University of Cambridge, Cambridge, United Kingdom
- David Peede
  - Department of Ecology, Evolution, and Organismal Biology, Brown University, Providence, United States
  - Center for Computational Molecular Biology, Brown University, Providence, United States
- Manolo F Perez
  - Department of Genetics and Evolution, Federal University of Sao Carlos, Sao Carlos, Brazil
- Murillo F Rodrigues
  - Institute of Ecology and Evolution, University of Oregon, Eugene, United States
- Chris C R Smith
  - Institute of Ecology and Evolution, University of Oregon, Eugene, United States
- Jeffrey P Spence
  - Department of Genetics, Stanford University School of Medicine, Stanford, United States
- Anastasia Teterina
  - Institute of Ecology and Evolution, University of Oregon, Eugene, United States
- Silas Tittes
  - Institute of Ecology and Evolution, University of Oregon, Eugene, United States
- Per Unneberg
  - Department of Cell and Molecular Biology, National Bioinformatics Infrastructure Sweden, Science for Life Laboratory, Uppsala University, Uppsala, Sweden
- Juan Manuel Vazquez
  - Department of Integrative Biology, University of California, Berkeley, Berkeley, United States
- Ryan K Waples
  - Department of Biostatistics, University of Washington, Seattle, United States
- Yan Wong
  - Big Data Institute, Li Ka Shing Centre for Health Information and Discovery, University of Oxford, Oxford, United Kingdom
- Franz Baumdicker
  - Cluster of Excellence - Controlling Microbes to Fight Infections, Eberhard Karls Universität Tübingen, Tübingen, Germany
- Reed A Cartwright
  - School of Life Sciences and The Biodesign Institute, Arizona State University, Tempe, United States
- Gregor Gorjanc
  - The Roslin Institute and Royal (Dick) School of Veterinary Studies, University of Edinburgh, Edinburgh, United Kingdom
- Ryan N Gutenkunst
  - Department of Molecular and Cellular Biology, University of Arizona, Tucson, United States
- Jerome Kelleher
  - Big Data Institute, Li Ka Shing Centre for Health Information and Discovery, University of Oxford, Oxford, United Kingdom
- Andrew D Kern
  - Institute of Ecology and Evolution, University of Oregon, Eugene, United States
- Aaron P Ragsdale
  - Department of Integrative Biology, University of Wisconsin-Madison, Madison, United States
- Peter L Ralph
  - Institute of Ecology and Evolution, University of Oregon, Eugene, United States
  - Department of Mathematics, University of Oregon, Eugene, United States
- Daniel R Schrider
  - Department of Genetics, University of North Carolina at Chapel Hill, Chapel Hill, United States
- Ilan Gronau
  - Efi Arazi School of Computer Science, Reichman University, Herzliya, Israel
|
13
|
Oliveira TP, Obšteter J, Pocrnic I, Heslot N, Gorjanc G. A method for partitioning trends in genetic mean and variance to understand breeding practices. Genet Sel Evol 2023; 55:36. [PMID: 37268883 DOI: 10.1186/s12711-023-00804-3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/26/2022] [Accepted: 04/17/2023] [Indexed: 06/04/2023] Open
Abstract
BACKGROUND In breeding programmes, the observed genetic change is a sum of the contributions of different selection paths represented by groups of individuals. Quantifying these sources of genetic change is essential for identifying the key breeding actions and optimizing breeding programmes. However, it is difficult to disentangle the contribution of individual paths due to the inherent complexity of breeding programmes. Here we extend the previously developed method for partitioning genetic mean by paths of selection to work both with the mean and variance of breeding values. METHODS First, we extended the partitioning method to quantify the contribution of different paths to genetic variance assuming that the breeding values are known. Second, we combined the partitioning method with the Markov Chain Monte Carlo approach to draw samples from the posterior distribution of breeding values and use these samples for computing the point and interval estimates of partitions for the genetic mean and variance. We implemented the method in the R package AlphaPart. We demonstrated the method with a simulated cattle breeding programme. RESULTS We show how to quantify the contribution of different groups of individuals to genetic mean and variance and that the contributions of different selection paths to genetic variance are not necessarily independent. Finally, we observed that the partitioning method under the pedigree-based model has some limitations, which suggests the need for a genomic extension. CONCLUSIONS We presented a partitioning method to quantify sources of change in genetic mean and variance in breeding programmes. The method can help breeders and researchers understand the dynamics in genetic mean and variance in a breeding programme. The developed method for partitioning genetic mean and variance is a powerful method for understanding how different selection paths interact within a breeding programme and how they can be optimised.
Affiliation(s)
- Thiago P Oliveira
  - The Roslin Institute and Royal (Dick) School of Veterinary Studies, University of Edinburgh, Edinburgh, UK
- Jana Obšteter
  - Agricultural Institute of Slovenia, Ljubljana, Slovenia
- Ivan Pocrnic
  - The Roslin Institute and Royal (Dick) School of Veterinary Studies, University of Edinburgh, Edinburgh, UK
- Gregor Gorjanc
  - The Roslin Institute and Royal (Dick) School of Veterinary Studies, University of Edinburgh, Edinburgh, UK
|
14
|
Wang Y, Zhao Z, Miao X, Wang Y, Qian X, Chen L, Wang C, Li S. eSMC: a statistical model to infer admixture events from individual genomics data. BMC Genomics 2022; 23:827. [PMCID: PMC9748406 DOI: 10.1186/s12864-022-09033-2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/27/2022] [Accepted: 11/21/2022] [Indexed: 12/15/2022] Open
Abstract
Background
Inferring historical population admixture events yields essential insights into a species' demographic history. Methods are available to infer admixture events in demographic history from extant genetic data from multiple sources. Because ancient population genetic data are scarce, a method for admixture inference from a single source has been lacking. The Pairwise Sequentially Markovian Coalescent (PSMC) estimates the historical effective population size from the genome of a single individual, based on the distribution of the time to the most recent common ancestor between the diploid's alleles. However, PSMC does not infer admixture events.
Results
Here, we propose eSMC, an extended PSMC model for admixture inference from a single source. We evaluated the model's performance on both in silico and real data. We simulated population admixture events at admixture times ranging from 5 kya to 100 kya (5 years/generation) with population admixture ratios of 1:1, 2:1, 3:1, and 4:1. The root mean square error is ±7.61 kya across all experiments. We then applied our method to infer historical admixture events in human, donkey, and goat populations. The estimated admixture times for both Han and Tibetan individuals range from 60 kya to 80 kya (25 years/generation), while the estimated admixture times for domesticated donkeys and goats range from 40 kya to 60 kya (8 years/generation) and from 40 kya to 100 kya (6 years/generation), respectively. The estimated admixture times were concordant with the time that domestication occurred in human history.
Conclusion
Our eSMC effectively infers the time of the most recent admixture event in history from a single individual’s genomics data. The source code of eSMC is hosted at https://github.com/zachary-zzc/eSMC.
|
15
|
Pocrnic I, Lindgren F, Tolhurst D, Herring WO, Gorjanc G. Optimisation of the core subset for the APY approximation of genomic relationships. Genet Sel Evol 2022; 54:76. [PMID: 36418945 PMCID: PMC9682752 DOI: 10.1186/s12711-022-00767-x] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/06/2022] [Accepted: 10/31/2022] [Indexed: 11/24/2022] Open
Abstract
BACKGROUND By entering the era of mega-scale genomics, we are facing many computational issues with standard genomic evaluation models due to their dense data structure and cubic computational complexity. Several scalable approaches have been proposed to address this challenge, such as the Algorithm for Proven and Young (APY). In APY, genotyped animals are partitioned into core and non-core subsets, which induces a sparser inverse of the genomic relationship matrix. This partitioning is often done at random. While APY is a good approximation of the full model, random partitioning can make results unstable, possibly affecting accuracy or even reranking animals. Here we present a stable optimisation of the core subset by choosing animals with the most informative genotype data. METHODS We derived a novel algorithm for optimising the core subset based on a conditional genomic relationship matrix or a conditional single nucleotide polymorphism (SNP) genotype matrix. We compared the accuracy of genomic predictions with different core subsets for simulated and real pig data sets. The core subsets were constructed (1) at random, (2) based on the diagonal of the genomic relationship matrix, (3) at random with weights from (2), or (4) based on the novel conditional algorithm. To understand the different core subset constructions, we visualise the population structure of the genotyped animals with linear Principal Component Analysis and non-linear Uniform Manifold Approximation and Projection. RESULTS All core subset constructions performed equally well when the number of core animals captured most of the variation in the genomic relationships, both in simulated and real data sets. When the number of core animals was not sufficiently large, there was substantial variability in the results with the random construction but no variability with the conditional construction. 
Visualisation of the population structure and chosen core animals showed that the conditional construction spreads core animals across the whole domain of genotyped animals in a repeatable manner. CONCLUSIONS Our results confirm that the size of the core subset in APY is critical. Furthermore, the results show that the core subset can be optimised with the conditional algorithm that achieves an optimal and repeatable spread of core animals across the domain of genotyped animals.
Affiliation(s)
- Ivan Pocrnic
  - The Roslin Institute and Royal (Dick) School of Veterinary Studies, The University of Edinburgh, Easter Bush Campus, Edinburgh, EH25 9RG, UK
- Finn Lindgren
  - School of Mathematics, The University of Edinburgh, The King’s Buildings, Edinburgh, EH9 3FD, UK
- Daniel Tolhurst
  - The Roslin Institute and Royal (Dick) School of Veterinary Studies, The University of Edinburgh, Easter Bush Campus, Edinburgh, EH25 9RG, UK
- William O. Herring
  - Genus PIC, 100 Bluegrass Commons Blvd., Suite 2200, Hendersonville, TN 37075, USA
- Gregor Gorjanc
  - The Roslin Institute and Royal (Dick) School of Veterinary Studies, The University of Edinburgh, Easter Bush Campus, Edinburgh, EH25 9RG, UK
|
16
|
Cockerill CA, Hasselgren M, Dussex N, Dalén L, von Seth J, Angerbjörn A, Wallén JF, Landa A, Eide NE, Flagstad Ø, Ehrich D, Sokolov A, Sokolova N, Norén K. Genomic Consequences of Fragmentation in the Endangered Fennoscandian Arctic Fox (Vulpes lagopus). Genes (Basel) 2022; 13:2124. [PMID: 36421799 PMCID: PMC9690288 DOI: 10.3390/genes13112124] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Received: 09/07/2022] [Revised: 10/06/2022] [Accepted: 10/10/2022] [Indexed: 11/17/2022] Open
Abstract
Accelerating climate change is causing severe habitat fragmentation in the Arctic, threatening the persistence of many cold-adapted species. The Scandinavian arctic fox (Vulpes lagopus) population is highly fragmented: once part of a continuous, circumpolar distribution, it has struggled to recover from a demographic bottleneck in the late 19th century. The future persistence of the entire Scandinavian population is highly dependent on the northernmost Fennoscandian subpopulations (Scandinavia and the Kola Peninsula) to provide a link to the viable Siberian population. By analyzing 43 arctic fox genomes, we quantified genomic variation and inbreeding in these populations. Signatures of genome erosion increased from Siberia to northern Sweden, indicating a stepping-stone model of connectivity. In northern Fennoscandia, runs of homozygosity (ROH) were on average ~1.47-fold longer than ROH found in Siberia, stretching almost entire scaffolds. Moreover, consistent with recent inbreeding, northern Fennoscandia harbored more homozygous deleterious mutations, whereas Siberia had more in a heterozygous state. This study underlines the value of documenting genome erosion following population fragmentation to identify areas requiring conservation priority. With the increasing fragmentation and isolation of Arctic habitats due to global warming, understanding the genomic and demographic consequences is vital for maintaining evolutionary potential and preventing local extinctions.
Affiliation(s)
- Malin Hasselgren
  - Department of Zoology, Stockholm University, 10691 Stockholm, Sweden
- Nicolas Dussex
  - Department of Zoology, Stockholm University, 10691 Stockholm, Sweden
  - Centre for Palaeogenetics, Svante Arrhenius väg 20C, 10691 Stockholm, Sweden
  - Department of Bioinformatics and Genetics, Swedish Museum of Natural History, 11418 Stockholm, Sweden
- Love Dalén
  - Department of Zoology, Stockholm University, 10691 Stockholm, Sweden
  - Centre for Palaeogenetics, Svante Arrhenius väg 20C, 10691 Stockholm, Sweden
  - Department of Bioinformatics and Genetics, Swedish Museum of Natural History, 11418 Stockholm, Sweden
- Johanna von Seth
  - Department of Zoology, Stockholm University, 10691 Stockholm, Sweden
  - Centre for Palaeogenetics, Svante Arrhenius väg 20C, 10691 Stockholm, Sweden
- Anders Angerbjörn
  - Department of Zoology, Stockholm University, 10691 Stockholm, Sweden
- Johan F. Wallén
  - Department of Zoology, Stockholm University, 10691 Stockholm, Sweden
- Arild Landa
  - Norwegian Institute for Nature Research, 7485 Trondheim, Norway
- Nina E. Eide
  - Norwegian Institute for Nature Research, 7485 Trondheim, Norway
- Dorothee Ehrich
  - Department of Arctic and Marine Biology, UiT Arctic University of Tromsø, 9037 Tromsø, Norway
- Aleksandr Sokolov
  - Arctic Research Station of Institute of Plant and Animal Ecology, Ural Branch, Russian Academy of Sciences, Zelenaya Gorka Str. 21, 629400 Labytnangi, Russia
- Natalya Sokolova
  - Arctic Research Station of Institute of Plant and Animal Ecology, Ural Branch, Russian Academy of Sciences, Zelenaya Gorka Str. 21, 629400 Labytnangi, Russia
- Karin Norén
  - Department of Zoology, Stockholm University, 10691 Stockholm, Sweden
|
17
|
Omer EA, Hinrichs D, Addo S, Roessler R. Development of a breeding program for improving the milk yield performance of Butana cattle under smallholder production conditions using a stochastic simulation approach. J Dairy Sci 2022; 105:5261-5270. [DOI: 10.3168/jds.2021-21307] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/17/2021] [Accepted: 01/20/2022] [Indexed: 11/19/2022]
|
18
|
Benchmarking phasing software with a whole-genome sequenced cattle pedigree. BMC Genomics 2022; 23:130. [PMID: 35164677 PMCID: PMC8845340 DOI: 10.1186/s12864-022-08354-6] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 10/28/2021] [Accepted: 01/24/2022] [Indexed: 12/30/2022] Open
Abstract
Background Accurate haplotype reconstruction is required in many applications in quantitative and population genomics. Different phasing methods are available but their accuracy must be evaluated for samples with different properties (population structure, marker density, etc.). We herein took advantage of whole-genome sequence data available for a Holstein cattle pedigree containing 264 individuals, including 98 trios, to evaluate several population-based phasing methods. This data represents a typical example of a livestock population, with low effective population size, high levels of relatedness and long-range linkage disequilibrium. Results After stringent filtering of our sequence data, we evaluated several population-based phasing programs including one or more versions of AlphaPhase, ShapeIT, Beagle, Eagle and FImpute. To that end we used 98 individuals having both parents sequenced for validation. Their haplotypes reconstructed based on Mendelian segregation rules were considered the gold standard to assess the performance of population-based methods in two scenarios. In the first one, only these 98 individuals were phased, while in the second one, all the 264 sequenced individuals were phased simultaneously, ignoring the pedigree relationships. We assessed phasing accuracy based on switch error counts (SEC) and rates (SER), lengths of correctly phased haplotypes and the probability that there is no phasing error between a pair of SNPs as a function of their distance. For most evaluated metrics or scenarios, the best software was either ShapeIT4.1 or Beagle5.2, both methods resulting in particularly high phasing accuracies. For instance, ShapeIT4.1 achieved a median SEC of 50 per individual and a mean haplotype block length of 24.1 Mb (scenario 2). These statistics are remarkable since the methods were evaluated with a map of 8,400,000 SNPs, and this corresponds to only one switch error every 40,000 phased informative markers. 
When more relatives were included in the data (scenario 2), FImpute3.0 reconstructed extremely long segments without errors. Conclusions We report extremely high phasing accuracies in a typical livestock sample. ShapeIT4.1 and Beagle5.2 proved to be the most accurate, particularly for phasing long segments and in the first scenario. Nevertheless, most tools achieved high accuracy at short distances and would be suitable for applications requiring only local haplotypes.
|
19
|
Nadachowska‐Brzyska K, Konczal M, Babik W. Navigating the temporal continuum of effective population size. Methods Ecol Evol 2021. [DOI: 10.1111/2041-210x.13740] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.5] [Indexed: 12/20/2022]
Affiliation(s)
| | | | - Wieslaw Babik
- Jagiellonian University in Kraków Faculty of Biology Institute of Environmental Sciences Kraków Poland
|
20
|
Ferrante JA, Smith CH, Thompson LM, Hunter ME. Genome-wide SNP analysis of three moose subspecies at the southern range limit in the contiguous United States. Conserv Genet 2021. [DOI: 10.1007/s10592-021-01402-w] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/30/2022]
Abstract
Genome-wide evaluations of genetic diversity and population structure are important for informing management and conservation of trailing-edge populations. North American moose (Alces alces) are declining along portions of the southern edge of their range due to disease, species interactions, and marginal habitat, all of which may be exacerbated by climate change. We employed a genotyping by sequencing (GBS) approach in an effort to collect baseline information on the genetic variation of moose inhabiting the species’ southern range periphery in the contiguous United States. We identified 1920 single nucleotide polymorphisms (SNPs) from 155 moose representing three subspecies from five states: A. a. americana (New Hampshire), A. a. andersoni (Minnesota), and A. a. shirasi (Idaho, Montana, and Wyoming). Molecular analyses supported three geographically isolated clusters, congruent with currently recognized subspecies. Additionally, while moderately low genetic diversity was observed, there was little evidence of inbreeding. Results also indicated > 20% shared ancestry proportions between A. a. shirasi samples from northern Montana and A. a. andersoni samples from Minnesota, indicating a putative hybrid zone warranting further investigation. GBS has proven to be a simple and effective method for genome-wide SNP discovery in moose and provides robust data for informing herd management and conservation priorities. With increasing disease, predation, and climate related pressure on range edge moose populations in the United States, the use of SNP data to identify gene flow between subspecies may prove a powerful tool for moose management and recovery, particularly if hybrid moose are more able to adapt.
|
21
|
Zhang Y, Mao F, Mu H, Huang M, Bao Y, Wang L, Wong NK, Xiao S, Dai H, Xiang Z, Ma M, Xiong Y, Zhang Z, Zhang L, Song X, Wang F, Mu X, Li J, Ma H, Zhang Y, Zheng H, Simakov O, Yu Z. The genome of Nautilus pompilius illuminates eye evolution and biomineralization. Nat Ecol Evol 2021; 5:927-938. [PMID: 33972735 PMCID: PMC8257504 DOI: 10.1038/s41559-021-01448-6] [Citation(s) in RCA: 42] [Impact Index Per Article: 10.5] [Received: 05/22/2020] [Accepted: 03/22/2021] [Indexed: 02/06/2023]
Abstract
Nautilus is the sole surviving externally shelled cephalopod from the Palaeozoic. It is unique within cephalopod genealogy and critical to understanding the evolutionary novelties of cephalopods. Here, we present a complete Nautilus pompilius genome as a fundamental genomic reference on cephalopod innovations, such as the pinhole eye and biomineralization. Nautilus shows a compact, minimalist genome with few encoding genes and slow evolutionary rates in both non-coding and coding regions among known cephalopods. Importantly, multiple genomic innovations including gene losses, independent contraction and expansion of specific gene families and their associated regulatory networks likely moulded the evolution of the nautilus pinhole eye. The conserved molluscan biomineralization toolkit and lineage-specific repetitive low-complexity domains are essential to the construction of the nautilus shell. The nautilus genome constitutes a valuable resource for reconstructing the evolutionary scenarios and genomic innovations that shape the extant cephalopods.
Affiliation(s)
- Yang Zhang
- Key Laboratory of Tropical Marine Bio-resources and Ecology and Guangdong Provincial Key Laboratory of Applied Marine Biology, South China Sea Institute of Oceanology, Chinese Academy of Sciences, Guangzhou, China
- Innovation Academy of South China Sea Ecology and Environmental Engineering, Chinese Academy of Sciences, Guangzhou, China
- Southern Marine Science and Engineering Guangdong Laboratory, Guangzhou, China
| | - Fan Mao
- Key Laboratory of Tropical Marine Bio-resources and Ecology and Guangdong Provincial Key Laboratory of Applied Marine Biology, South China Sea Institute of Oceanology, Chinese Academy of Sciences, Guangzhou, China
- Innovation Academy of South China Sea Ecology and Environmental Engineering, Chinese Academy of Sciences, Guangzhou, China
- Southern Marine Science and Engineering Guangdong Laboratory, Guangzhou, China
| | - Huawei Mu
- MOE Key Laboratory for Membraneless Organelles and Cellular Dynamics, Hefei National Laboratory for Physical Sciences at the Microscale, CAS Key Laboratory of Brain Function and Disease, School of Life Sciences, Division of Life Sciences and Medicine, University of Science and Technology of China, Hefei, China
| | - Minwei Huang
- Key Laboratory of Tropical Marine Bio-resources and Ecology and Guangdong Provincial Key Laboratory of Applied Marine Biology, South China Sea Institute of Oceanology, Chinese Academy of Sciences, Guangzhou, China
- Innovation Academy of South China Sea Ecology and Environmental Engineering, Chinese Academy of Sciences, Guangzhou, China
- Southern Marine Science and Engineering Guangdong Laboratory, Guangzhou, China
| | - Yongbo Bao
- Zhejiang Key Laboratory of Aquatic Germplasm Resources, College of Biological and Environmental Sciences, Zhejiang Wanli University, Ningbo, China
| | - Lili Wang
- Biomarker Technologies Corporation, Beijing, China
| | - Nai-Kei Wong
- Key Laboratory of Tropical Marine Bio-resources and Ecology and Guangdong Provincial Key Laboratory of Applied Marine Biology, South China Sea Institute of Oceanology, Chinese Academy of Sciences, Guangzhou, China
| | - Shu Xiao
- Key Laboratory of Tropical Marine Bio-resources and Ecology and Guangdong Provincial Key Laboratory of Applied Marine Biology, South China Sea Institute of Oceanology, Chinese Academy of Sciences, Guangzhou, China
- Innovation Academy of South China Sea Ecology and Environmental Engineering, Chinese Academy of Sciences, Guangzhou, China
- Southern Marine Science and Engineering Guangdong Laboratory, Guangzhou, China
| | - He Dai
- Biomarker Technologies Corporation, Beijing, China
| | - Zhiming Xiang
- Key Laboratory of Tropical Marine Bio-resources and Ecology and Guangdong Provincial Key Laboratory of Applied Marine Biology, South China Sea Institute of Oceanology, Chinese Academy of Sciences, Guangzhou, China
- Innovation Academy of South China Sea Ecology and Environmental Engineering, Chinese Academy of Sciences, Guangzhou, China
- Southern Marine Science and Engineering Guangdong Laboratory, Guangzhou, China
| | - Mingli Ma
- Biomarker Technologies Corporation, Beijing, China
| | - Yuanyan Xiong
- State Key Laboratory of Biocontrol, College of Life Sciences, Sun Yat-sen University, Guangzhou, China
| | - Ziwei Zhang
- State Key Laboratory of Biocontrol, College of Life Sciences, Sun Yat-sen University, Guangzhou, China
| | - Lvping Zhang
- Key Laboratory of Tropical Marine Bio-resources and Ecology and Guangdong Provincial Key Laboratory of Applied Marine Biology, South China Sea Institute of Oceanology, Chinese Academy of Sciences, Guangzhou, China
- Innovation Academy of South China Sea Ecology and Environmental Engineering, Chinese Academy of Sciences, Guangzhou, China
- Southern Marine Science and Engineering Guangdong Laboratory, Guangzhou, China
| | - Xiaoyuan Song
- MOE Key Laboratory for Membraneless Organelles and Cellular Dynamics, Hefei National Laboratory for Physical Sciences at the Microscale, CAS Key Laboratory of Brain Function and Disease, School of Life Sciences, Division of Life Sciences and Medicine, University of Science and Technology of China, Hefei, China
| | - Fan Wang
- Biomarker Technologies Corporation, Beijing, China
| | - Xiyu Mu
- Biomarker Technologies Corporation, Beijing, China
| | - Jun Li
- Key Laboratory of Tropical Marine Bio-resources and Ecology and Guangdong Provincial Key Laboratory of Applied Marine Biology, South China Sea Institute of Oceanology, Chinese Academy of Sciences, Guangzhou, China
- Innovation Academy of South China Sea Ecology and Environmental Engineering, Chinese Academy of Sciences, Guangzhou, China
- Southern Marine Science and Engineering Guangdong Laboratory, Guangzhou, China
| | - Haitao Ma
- Key Laboratory of Tropical Marine Bio-resources and Ecology and Guangdong Provincial Key Laboratory of Applied Marine Biology, South China Sea Institute of Oceanology, Chinese Academy of Sciences, Guangzhou, China
- Innovation Academy of South China Sea Ecology and Environmental Engineering, Chinese Academy of Sciences, Guangzhou, China
- Southern Marine Science and Engineering Guangdong Laboratory, Guangzhou, China
| | - Yuehuan Zhang
- Key Laboratory of Tropical Marine Bio-resources and Ecology and Guangdong Provincial Key Laboratory of Applied Marine Biology, South China Sea Institute of Oceanology, Chinese Academy of Sciences, Guangzhou, China
- Innovation Academy of South China Sea Ecology and Environmental Engineering, Chinese Academy of Sciences, Guangzhou, China
- Southern Marine Science and Engineering Guangdong Laboratory, Guangzhou, China
| | | | - Oleg Simakov
- Department of Neuroscience and Developmental Biology, University of Vienna, Vienna, Austria
| | - Ziniu Yu
- Key Laboratory of Tropical Marine Bio-resources and Ecology and Guangdong Provincial Key Laboratory of Applied Marine Biology, South China Sea Institute of Oceanology, Chinese Academy of Sciences, Guangzhou, China
- Innovation Academy of South China Sea Ecology and Environmental Engineering, Chinese Academy of Sciences, Guangzhou, China
- Southern Marine Science and Engineering Guangdong Laboratory, Guangzhou, China
|
22
|
Fang Y, Hao X, Xu Z, Sun H, Zhao Q, Cao R, Zhang Z, Ma P, Sun Y, Qi Z, Wei Q, Wang Q, Pan Y. Genome-Wide Detection of Runs of Homozygosity in Laiwu Pigs Revealed by Sequencing Data. Front Genet 2021; 12:629966. [PMID: 33995477 PMCID: PMC8116706 DOI: 10.3389/fgene.2021.629966] [Citation(s) in RCA: 17] [Impact Index Per Article: 4.3] [Received: 11/16/2020] [Accepted: 03/08/2021] [Indexed: 11/13/2022]
Abstract
The Laiwu pig, distinguished by its high intramuscular fat content of 7-9%, is an indigenous pig breed of China, and recent studies have also found that Laiwu pigs show high resistance to porcine circovirus type 2. However, with the introduction of commercial varieties, the population of Laiwu pigs has declined, and some lineages have disappeared, which could result in inbreeding. Runs of homozygosity (ROH) are a good measure of individual inbreeding status and are also commonly used to detect selection signatures and thereby map candidate genes associated with economically important traits. In this study, we used data from Genotyping by Genome Reducing and Sequencing to investigate the number, length, coverage, and distribution patterns of ROH in 93 Chinese Laiwu pigs and identified genomic regions with a high ROH frequency. The average inbreeding coefficient calculated from the pedigree was 0.021, whereas that estimated from all detected ROH segments was 0.133. A total of 7,508 ROH segments longer than 1 Mb were detected, covering 13.4% of the whole genome; their average length was 3.76 Mb, and short segments (1-5 Mb) dominated. Across individuals, ROH coverage ranged from 0.56 to 36.86%. Across chromosomes, SSC6 had the largest number of ROH (n = 688) and SSC12 the lowest (n = 215). Thirteen ROH islands were detected in our study, and 86 genes were found within those regions. Some of these genes were correlated with economically important traits, such as meat quality (ECI1, LRP12, NDUFA4L2, GIL1, and LYZ), immunity capacity (IL23A, STAT2, STAT6, TBK1, IFNG, and ITH2), production (DCSTAMP, RDH16, and GDF11), and reproduction (ODF1 and CDK2). A total of six significant Gene Ontology terms and nine significant Kyoto Encyclopedia of Genes and Genomes (KEGG) pathways were identified, most of which were correlated with disease resistance and biosynthesis processes, and one KEGG pathway was related to lipid metabolism.
In addition, we aligned all of the ROH islands to the pig quantitative trait loci (QTL) database and finally found eight QTL related to the intramuscular fat trait. These results may help us understand the characteristics of Laiwu pigs and provide insight for future breeding strategies.
Affiliation(s)
- Yifei Fang
- Department of Animal Science, School of Agriculture and Biology, Shanghai Jiao Tong University, Shanghai, China
| | - Xinyu Hao
- Department of Animal Science, School of Agriculture and Biology, Shanghai Jiao Tong University, Shanghai, China
| | - Zhong Xu
- Department of Animal Science, School of Agriculture and Biology, Shanghai Jiao Tong University, Shanghai, China
| | - Hao Sun
- Department of Animal Science, School of Agriculture and Biology, Shanghai Jiao Tong University, Shanghai, China
| | - Qingbo Zhao
- Department of Animal Science, School of Agriculture and Biology, Shanghai Jiao Tong University, Shanghai, China
| | - Rui Cao
- Department of Animal Science, School of Agriculture and Biology, Shanghai Jiao Tong University, Shanghai, China
| | - Zhe Zhang
- Department of Animal Science, College of Animal Sciences, Zhejiang University, Hangzhou, China
| | - Peipei Ma
- Department of Animal Science, School of Agriculture and Biology, Shanghai Jiao Tong University, Shanghai, China
| | | | | | | | - Qishan Wang
- Department of Animal Science, College of Animal Sciences, Zhejiang University, Hangzhou, China
| | - Yuchun Pan
- Department of Animal Science, College of Animal Sciences, Zhejiang University, Hangzhou, China
|
23
|
Yang HC, Chen CW, Lin YT, Chu SK. Genetic ancestry plays a central role in population pharmacogenomics. Commun Biol 2021; 4:171. [PMID: 33547344 PMCID: PMC7864978 DOI: 10.1038/s42003-021-01681-6] [Citation(s) in RCA: 18] [Impact Index Per Article: 4.5] [Received: 10/27/2019] [Accepted: 01/06/2021] [Indexed: 12/12/2022]
Abstract
Recent studies have pointed out the essential role of genetic ancestry in population pharmacogenetics. In this study, we analyzed the whole-genome sequencing data from The 1000 Genomes Project (Phase 3) and the pharmacogenetic information from Drug Bank, PharmGKB, PharmaADME, and Biotransformation. Here we show that ancestry-informative markers are enriched in pharmacogenetic loci, suggesting that trans-ancestry differentiation must be carefully considered in population pharmacogenetics studies. Ancestry-informative pharmacogenetic loci are located in both protein-coding and non-protein-coding regions, illustrating that a whole-genome analysis is necessary for an unbiased examination over pharmacogenetic loci. Finally, those ancestry-informative pharmacogenetic loci that target multiple drugs are often a functional variant, which reflects their importance in biological functions and pathways. In summary, we develop an efficient algorithm for an ultrahigh-dimensional principal component analysis. We create genetic catalogs of ancestry-informative markers and genes. We explore pharmacogenetic patterns and establish a high-accuracy prediction panel of genetic ancestry. Moreover, we construct a genetic ancestry pharmacogenomic database Genetic Ancestry PhD (http://hcyang.stat.sinica.edu.tw/databases/genetic_ancestry_phd/). Hsin-Chou Yang et al. examine population structure in several genomic databases and identify that pharmacogenetic loci are enriched for markers of genetic ancestry. Their results suggest that genetic ancestry must be carefully considered in population pharmacogenetics studies.
Affiliation(s)
- Hsin-Chou Yang
- Institute of Statistical Science, Academia Sinica, Taipei, Taiwan
- Institute of Statistics, National Cheng Kung University, Tainan, Taiwan
- Institute of Public Health, National Yang-Ming University, Taipei, Taiwan
| | - Chia-Wei Chen
- Institute of Statistical Science, Academia Sinica, Taipei, Taiwan
| | - Yu-Ting Lin
- Institute of Statistical Science, Academia Sinica, Taipei, Taiwan
| | - Shih-Kai Chu
- Institute of Statistical Science, Academia Sinica, Taipei, Taiwan
|
24
|
Selle ML, Steinsland I, Powell O, Hickey JM, Gorjanc G. Spatial modelling improves genetic evaluation in smallholder breeding programs. Genet Sel Evol 2020; 52:69. [PMID: 33198636 PMCID: PMC7670695 DOI: 10.1186/s12711-020-00588-w] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.4] [Received: 06/01/2020] [Accepted: 11/03/2020] [Indexed: 01/13/2023]
Abstract
BACKGROUND Breeders and geneticists use statistical models to separate genetic and environmental effects on phenotype. A common way to separate these effects is to model a descriptor of an environment, a contemporary group or herd, and account for genetic relationship between animals across environments. However, separating the genetic and environmental effects in smallholder systems is challenging due to small herd sizes and weak genetic connectedness across herds. We hypothesised that accounting for spatial relationships between nearby herds can improve genetic evaluation in smallholder systems. Furthermore, geographically referenced environmental covariates are increasingly available and could model underlying sources of spatial relationships. The objective of this study was therefore, to evaluate the potential of spatial modelling to improve genetic evaluation in dairy cattle smallholder systems. METHODS We performed simulations and real dairy cattle data analysis to test our hypothesis. We modelled environmental variation by estimating herd and spatial effects. Herd effects were considered independent, whereas spatial effects had distance-based covariance between herds. We compared these models using pedigree or genomic data. RESULTS The results show that in smallholder systems (i) standard models do not separate genetic and environmental effects accurately, (ii) spatial modelling increases the accuracy of genetic evaluation for phenotyped and non-phenotyped animals, (iii) environmental covariates do not substantially improve the accuracy of genetic evaluation beyond simple distance-based relationships between herds, (iv) the benefit of spatial modelling was largest when separating the genetic and environmental effects was challenging, and (v) spatial modelling was beneficial when using either pedigree or genomic data. CONCLUSIONS We have demonstrated the potential of spatial modelling to improve genetic evaluation in smallholder systems. 
This improvement is driven by establishing environmental connectedness between herds, which enhances separation of genetic and environmental effects. We suggest routine spatial modelling in genetic evaluations, particularly for smallholder systems. Spatial modelling could also have a major impact in studies of human and wild populations.
Affiliation(s)
- Maria L Selle
- Department of Mathematical Sciences, Norwegian University of Science and Technology, Trondheim, Norway
| | - Ingelin Steinsland
- Department of Mathematical Sciences, Norwegian University of Science and Technology, Trondheim, Norway
| | - Owen Powell
- The Roslin Institute and Royal (Dick) School of Veterinary Studies, University of Edinburgh, Edinburgh, UK
| | - John M Hickey
- The Roslin Institute and Royal (Dick) School of Veterinary Studies, University of Edinburgh, Edinburgh, UK
| | - Gregor Gorjanc
- The Roslin Institute and Royal (Dick) School of Veterinary Studies, University of Edinburgh, Edinburgh, UK
|
25
|
Sanchez T, Cury J, Charpiat G, Jay F. Deep learning for population size history inference: Design, comparison and combination with approximate Bayesian computation. Mol Ecol Resour 2020; 21:2645-2660. [DOI: 10.1111/1755-0998.13224] [Citation(s) in RCA: 27] [Impact Index Per Article: 5.4] [Received: 01/17/2020] [Revised: 06/19/2020] [Accepted: 07/02/2020] [Indexed: 12/28/2022]
Affiliation(s)
- Théophile Sanchez
- Laboratoire de Recherche en Informatique CNRS UMR 8623 Université Paris‐Saclay Orsay France
| | - Jean Cury
- Laboratoire de Recherche en Informatique CNRS UMR 8623 Université Paris‐Saclay Orsay France
| | - Guillaume Charpiat
- Laboratoire de Recherche en Informatique CNRS UMR 8623 Université Paris‐Saclay Orsay France
| | - Flora Jay
- Laboratoire de Recherche en Informatique CNRS UMR 8623 Université Paris‐Saclay Orsay France
|
26
|
Estimates of Autozygosity Through Runs of Homozygosity in Farmed Coho Salmon. Genes (Basel) 2020; 11:genes11050490. [PMID: 32365758 PMCID: PMC7290985 DOI: 10.3390/genes11050490] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.0] [Received: 04/03/2020] [Revised: 04/23/2020] [Accepted: 04/24/2020] [Indexed: 11/17/2022]
Abstract
The characterization of runs of homozygosity (ROH) using high-density single nucleotide polymorphisms (SNPs) allows inferences to be made about the past demographic history of animal populations, and genomic ROH analysis has become a common approach to characterize inbreeding. We aimed to analyze and characterize ROH patterns and compare different genomic and pedigree-based methods to estimate the inbreeding coefficient in two pure lines (POP A and B) and one recently admixed line (POP C) of coho salmon (Oncorhynchus kisutch) breeding nuclei, genotyped using a 200 K Affymetrix Axiom® myDesign Custom SNP Array. A larger number and greater mean length of ROH were found for the two “pure” lines, whereas the recently admixed line (POP C) showed the lowest number and smallest mean length of ROH. The ROH analysis for different length classes suggests that in all three coho salmon lines the genome is largely composed of a high number of short segments (<4 Mb), and for POP C no segment >16 Mb was found. The number of ROH, their mean length and the inbreeding values were highly variable across chromosomes, probably as a consequence of artificial selection. Pedigree-based inbreeding values tended to underestimate genomic-based inbreeding levels, which in turn varied depending on the method used for estimation. The high positive correlations between different genomic-based inbreeding coefficients suggest that they are consistent and may be more accurate than pedigree-based methods, given that they capture information from past and more recent demographic events, even when there are no pedigree records available.
|
27
|
Abstract
The domestication of animals led to a major shift in human subsistence patterns, from a hunter-gatherer to a sedentary agricultural lifestyle, which ultimately resulted in the development of complex societies. Over the past 15,000 years, the phenotype and genotype of multiple animal species, such as dogs, pigs, sheep, goats, cattle and horses, have been substantially altered during their adaptation to the human niche. Recent methodological innovations, such as improved ancient DNA extraction methods and next-generation sequencing, have enabled the sequencing of whole ancient genomes. These genomes have helped reconstruct the process by which animals entered into domestic relationships with humans and were subjected to novel selection pressures. Here, we discuss and update key concepts in animal domestication in light of recent contributions from ancient genomics.
|
28
|
Ros-Freixedes R, Whalen A, Gorjanc G, Mileham AJ, Hickey JM. Evaluation of sequencing strategies for whole-genome imputation with hybrid peeling. Genet Sel Evol 2020; 52:18. [PMID: 32248818 PMCID: PMC7132986 DOI: 10.1186/s12711-020-00537-7] [Citation(s) in RCA: 12] [Impact Index Per Article: 2.4] [Received: 10/29/2019] [Accepted: 03/27/2020] [Indexed: 11/26/2022]
Abstract
BACKGROUND For assembling large whole-genome sequence datasets for routine use in research and breeding, the sequencing strategy should be adapted to the methods that will be used later for variant discovery and imputation. In this study, we used simulation to explore the impact that the sequencing strategy and level of sequencing investment have on the overall accuracy of imputation using hybrid peeling, a pedigree-based imputation method that is well suited for large livestock populations. METHODS We simulated marker array and whole-genome sequence data for 15 populations with simulated or real pedigrees that had different structures. In these populations, we evaluated the effect on imputation accuracy of seven methods for selecting which individuals to sequence, the generation of the pedigree to which the sequenced individuals belonged, the use of variable or uniform coverage, and the trade-off between the number of sequenced individuals and their sequencing coverage. For each population, we considered four levels of investment in sequencing that were proportional to the size of the population. RESULTS Imputation accuracy depended greatly on pedigree depth. The distribution of the sequenced individuals across the generations of the pedigree underlay the performance of the different methods used to select individuals to sequence and it was critical for achieving high imputation accuracy in both early and late generations. Imputation accuracy was highest with a uniform coverage across the sequenced individuals of 2× rather than variable coverage. An investment equivalent to the cost of sequencing 2% of the population at 2× provided high imputation accuracy. The gain in imputation accuracy from additional investment decreased with larger populations and higher levels of investment. However, to achieve the same imputation accuracy, a proportionally greater investment must be used in the smaller populations compared to the larger ones. 
CONCLUSIONS Suitable sequencing strategies for subsequent imputation with hybrid peeling involve sequencing ~2% of the population at a uniform coverage 2×, distributed preferably across all generations of the pedigree, except for the few earliest generations that lack genotyped ancestors. Such sequencing strategies are beneficial for generating whole-genome sequence data in populations with deep pedigrees of closely related individuals.
Affiliation(s)
- Roger Ros-Freixedes
- The Roslin Institute and Royal (Dick) School of Veterinary Studies, The University of Edinburgh, Easter Bush, Midlothian, Scotland, UK
- Departament de Ciència Animal, Universitat de Lleida-Agrotecnio Center, Lleida, Spain
| | - Andrew Whalen
- The Roslin Institute and Royal (Dick) School of Veterinary Studies, The University of Edinburgh, Easter Bush, Midlothian, Scotland, UK
| | - Gregor Gorjanc
- The Roslin Institute and Royal (Dick) School of Veterinary Studies, The University of Edinburgh, Easter Bush, Midlothian, Scotland, UK
| | | | - John M. Hickey
- The Roslin Institute and Royal (Dick) School of Veterinary Studies, The University of Edinburgh, Easter Bush, Midlothian, Scotland, UK
|
29
|
Leitwein M, Duranton M, Rougemont Q, Gagnaire PA, Bernatchez L. Using Haplotype Information for Conservation Genomics. Trends Ecol Evol 2020; 35:245-258. [DOI: 10.1016/j.tree.2019.10.012] [Citation(s) in RCA: 34] [Impact Index Per Article: 6.8] [Received: 08/01/2019] [Revised: 10/18/2019] [Accepted: 10/28/2019] [Indexed: 12/19/2022]
|
30
|
Jay F, Boitard S, Austerlitz F. An ABC Method for Whole-Genome Sequence Data: Inferring Paleolithic and Neolithic Human Expansions. Mol Biol Evol 2020; 36:1565-1579. [PMID: 30785202 DOI: 10.1093/molbev/msz038] [Citation(s) in RCA: 20] [Impact Index Per Article: 4.0] [Indexed: 12/24/2022]
Abstract
Species generally undergo a complex demographic history consisting, in particular, of multiple changes in population size. Genome-wide sequencing data are potentially highly informative for reconstructing this demographic history. A crucial point is to extract the relevant information from these very large data sets. Here, we design an approach for inferring past demographic events from a moderate number of fully sequenced genomes. Our new approach uses Approximate Bayesian Computation, a simulation-based statistical framework that allows 1) identifying the best demographic scenario among several competing scenarios and 2) estimating the best-fitting parameters under the chosen scenario. Approximate Bayesian Computation relies on the computation of summary statistics. Using a cross-validation approach, we show that statistics such as the lengths of haplotypes shared between individuals, or the decay of linkage disequilibrium with distance, can be combined with classical statistics (e.g., heterozygosity and Tajima's D) to accurately infer complex demographic scenarios including bottlenecks and expansion periods. We also demonstrate the importance of simultaneously estimating the genotyping error rate. Applying our method on genome-wide human-sequence databases, we finally show that a model consisting in a bottleneck followed by a Paleolithic and a Neolithic expansion is the most relevant for Eurasian populations.
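The simulate-and-compare logic of Approximate Bayesian Computation described in this abstract can be illustrated with a minimal rejection sampler. This is only a toy sketch: the model (normal data with an unknown mean), the uniform prior, the single summary statistic, and all function names are assumptions chosen for illustration and are not the demographic models or statistics used by the authors.

```python
import random
import statistics


def abc_rejection(observed_stat, prior_sample, simulate_stat, tolerance, n_draws, rng):
    """Keep prior draws whose simulated summary statistic lands near the observed one."""
    accepted = []
    for _ in range(n_draws):
        theta = prior_sample(rng)             # 1. draw a parameter from the prior
        sim_stat = simulate_stat(theta, rng)  # 2. simulate data and reduce it to a summary
        if abs(sim_stat - observed_stat) <= tolerance:
            accepted.append(theta)            # 3. accept draws close to the observation
    return accepted


# Hypothetical toy model: 50 normal observations with unknown mean theta.
def prior_sample(rng):
    return rng.uniform(0.0, 10.0)


def simulate_stat(theta, rng):
    # Summary statistic: the sample mean of the simulated data.
    return statistics.fmean(rng.gauss(theta, 1.0) for _ in range(50))


rng = random.Random(1)
posterior = abc_rejection(4.0, prior_sample, simulate_stat, 0.2, 5000, rng)
print(len(posterior), round(statistics.fmean(posterior), 2))
```

The accepted draws approximate the posterior of the parameter given the observed summary. In a real analysis the summary would be a vector (for example haplotype lengths, linkage disequilibrium decay, heterozygosity), and the tolerance would be tuned by cross-validation.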
Affiliation(s)
- Flora Jay
  - Laboratoire EcoAnthropologie et Ethnobiologie, CNRS/MNHN/Université Paris Diderot, Paris, France
  - Laboratoire de Recherche en Informatique, CNRS/Université Paris-Sud/Université Paris-Saclay, Orsay, France
- Simon Boitard
  - GenPhySE, Université de Toulouse, INRA, INPT, INP-ENVT, Castanet Tolosan, France
- Frédéric Austerlitz
  - Laboratoire EcoAnthropologie et Ethnobiologie, CNRS/MNHN/Université Paris Diderot, Paris, France
|
31
|
Abstract
Genome-wide single nucleotide polymorphism (SNP) arrays can be used to explore homozygosity segments, where two haplotypes inherited from the parents are identical. In this study, we identified a total of 27,358 runs of homozygosity (ROH) with an average of 153 ROH events per animal in Chinese local cattle. The sizes of ROH events varied considerably ranging from 0.5 to 66 Mb, with an average length of 1.22 Mb. The highest average proportion of the genome covered by ROH (~11.54% of the cattle genome) was found in Nanda cattle (NDC) from South China, whereas the lowest average proportion (~3.1%) was observed in Yanhuang cattle (YHC). The average estimated FROH ranged from 0.03 in YHC to 0.12 in NDC. For each of three ROH classes with different sizes (Small 0.5-1 Mb, Medium 1-5 Mb and Large >5 Mb), the numbers and total lengths of ROH per individual showed considerable differences across breeds. Moreover, we obtained 993 to 3603 ROH hotspots (which were defined where ROH frequency at a SNP within each breed exceeded the 1% threshold) among eight cattle breeds. Our results also revealed several candidate genes embedded with ROH hotspots which may be related to environmental conditions and local adaptation. In conclusion, we generated baselines for homozygosity patterns in diverse Chinese cattle breeds. Our results suggested that selection has, at least partially, played a role with other factors in shaping the genomic patterns of ROH in Chinese local cattle and might provide valuable insights for understanding the genetic basis of economic and adaptive traits.
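As a rough illustration of the quantities reported in this abstract, the sketch below bins ROH into the three size classes quoted above and computes FROH as the summed ROH length divided by the genome length covered. The autosomal length of 2,500 Mb, the treatment of the shared class boundaries (1 Mb counted as Small, 5 Mb as Medium), and the function names are assumptions for illustration and are not taken from the study.

```python
from collections import Counter


def roh_class(length_mb):
    """Size classes from the abstract: Small 0.5-1 Mb, Medium 1-5 Mb, Large >5 Mb."""
    if length_mb < 0.5:
        return None          # shorter than the minimum ROH length considered
    if length_mb <= 1.0:
        return "Small"       # boundary handling at 1 Mb is an assumption
    if length_mb <= 5.0:
        return "Medium"      # boundary handling at 5 Mb is an assumption
    return "Large"


def f_roh(roh_lengths_mb, genome_length_mb):
    """Genomic inbreeding coefficient: fraction of the genome covered by ROH."""
    return sum(roh_lengths_mb) / genome_length_mb


# Hypothetical animal carrying the reported averages: 153 ROH of 1.22 Mb each.
ASSUMED_AUTOSOME_MB = 2500.0  # assumed genome length, not stated in the abstract
lengths = [1.22] * 153
print(round(f_roh(lengths, ASSUMED_AUTOSOME_MB), 3))        # 0.075
print(Counter(roh_class(x) for x in [0.7, 1.22, 3.0, 66.0]))
```

Under these assumptions the average animal would have an FROH of about 0.075, which sits inside the reported breed range of 0.03 to 0.12.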
|
32
|
Weldenegodguad M, Popov R, Pokharel K, Ammosov I, Ming Y, Ivanova Z, Kantanen J. Whole-Genome Sequencing of Three Native Cattle Breeds Originating From the Northernmost Cattle Farming Regions. Front Genet 2019; 9:728. [PMID: 30687392 PMCID: PMC6336893 DOI: 10.3389/fgene.2018.00728] [Citation(s) in RCA: 40] [Impact Index Per Article: 6.7] [Received: 08/15/2018] [Accepted: 12/22/2018] [Indexed: 12/30/2022] Open
Abstract
Northern Fennoscandia and the Sakha Republic in the Russian Federation represent the northernmost regions on Earth where cattle farming has been traditionally practiced. In this study, we performed whole-genome sequencing to genetically characterize three rare native breeds Eastern Finncattle, Western Finncattle and Yakutian cattle adapted to these northern Eurasian regions. We examined the demographic history, genetic diversity and unfolded loci under natural or artificial selection. On average, we achieved 13.01-fold genome coverage after mapping the sequencing reads on the bovine reference genome (UMD 3.1) and detected a total of 17.45 million single nucleotide polymorphisms (SNPs) and 1.95 million insertions-deletions (indels). We observed that the ancestral species (Bos primigenius) of Eurasian taurine cattle experienced two notable prehistorical declines in effective population size associated with dramatic climate changes. The modern Yakutian cattle exhibited a higher level of within-population variation in terms of number of SNPs and nucleotide diversity than the contemporary European taurine breeds. This result is in contrast to the results of marker-based cattle breed diversity studies, indicating assortment bias in previous analyses. Our results suggest that the effective population size of the ancestral Asiatic taurine cattle may have been higher than that of the European cattle. Alternatively, our findings could indicate the hybrid origins of the Yakutian cattle ancestries and possibly the lack of intensive artificial selection. We identified a number of genomic regions under selection that may have contributed to the adaptation to the northern and subarctic environments, including genes involved in disease resistance, sensory perception, cold adaptation and growth. By characterizing the native breeds, we were able to obtain new information on cattle genomes and on the value of the adapted breeds for the conservation of cattle genetic resources.
Affiliation(s)
- Melak Weldenegodguad
  - Department of Production Systems, Natural Resources Institute Finland (Luke), Helsinki, Finland
  - Department of Environmental and Biological Sciences, University of Eastern Finland, Kuopio, Finland
- Ruslan Popov
  - Yakutian Research Institute of Agriculture (FGBNU Yakutskij NIISH), Yakutsk, Russia
- Kisun Pokharel
  - Department of Production Systems, Natural Resources Institute Finland (Luke), Helsinki, Finland
- Innokentyi Ammosov
  - Board of Agricultural Office of Eveno-Bytantaj Region, Batagay-Alyta, Russia
- Yao Ming
  - BGI-Genomics, BGI-Shenzhen, Shenzhen, China
- Zoya Ivanova
  - Yakutian Research Institute of Agriculture (FGBNU Yakutskij NIISH), Yakutsk, Russia
- Juha Kantanen
  - Department of Production Systems, Natural Resources Institute Finland (Luke), Helsinki, Finland
|
33
|
Pitt D, Bruford MW, Barbato M, Orozco‐terWengel P, Martínez R, Sevane N. Demography and rapid local adaptation shape Creole cattle genome diversity in the tropics. Evol Appl 2019; 12:105-122. [PMID: 30622639 PMCID: PMC6304683 DOI: 10.1111/eva.12641] [Citation(s) in RCA: 40] [Impact Index Per Article: 6.7] [Received: 12/30/2017] [Revised: 04/12/2018] [Accepted: 04/14/2018] [Indexed: 02/06/2023] Open
Abstract
The introduction of Iberian cattle in the Americas after Columbus' arrival imposed high selection pressures on a limited number of animals over a brief period of time. Knowledge of the genomic regions selected during this process may help in enhancing climatic resilience and sustainable animal production. We first determined taurine and indicine contributions to the genomic structure of modern Creole cattle. Second, we inferred their demographic history using approximate Bayesian computation (ABC), linkage disequilibrium (LD) and Ne Slope (NeS) analysis. Third, we performed whole genome scans for selection signatures based on cross-population extended haplotype homozygosity (XP-EHH) and population differentiation (FST) to disentangle the genetic mechanisms involved in adaptation and phenotypic change by a rapid and major environmental transition. To tackle these questions, we combined SNP array data (~54,000 SNPs) in Creole breeds with their modern putative Iberian ancestors. Reconstruction of the population history of Creoles from the end of the 15th century indicated a major demographic expansion until the introduction of zebu and commercial breeds into the Americas ~180 years ago, coinciding with a drastic Ne contraction. NeS analysis provided insights into short-term complexity in population change and depicted a decrease/expansion episode at the end of the ABC-inferred expansion, as well as several additional fluctuations in Ne with the attainment of the current small Ne only towards the end of the 20th century. Selection signatures for tropical adaptation pinpointed the thermoregulatory slick hair coat region, identifying a new candidate gene (GDNF), as well as novel candidate regions involved in immune function, behavioural processes, iron metabolism and adaptation to new feeding conditions.
The outcomes from this study will help in future-proofing farm animal genetic resources (FAnGR) by providing molecular tools that allow selection for improved cattle performance, resilience and welfare under climate change.
Affiliation(s)
- Daniel Pitt
  - School of Biosciences, Cardiff University, Cardiff, UK
- Michael W. Bruford
  - School of Biosciences, Cardiff University, Cardiff, UK
  - Sustainable Places Research Institute, Cardiff University, Cardiff, UK
- Mario Barbato
  - Institute of Zootechnics, Università Cattolica del Sacro Cuore, Piacenza, Italy
- Rodrigo Martínez
  - Centro de investigaciones Tibaitatá, Corporación Colombiana De Investigación Agropecuaria (Corpoica), Bogotá, Colombia
|
34
|
Whalen A, Ros-Freixedes R, Wilson DL, Gorjanc G, Hickey JM. Hybrid peeling for fast and accurate calling, phasing, and imputation with sequence data of any coverage in pedigrees. Genet Sel Evol 2018; 50:67. [PMID: 30563452 PMCID: PMC6299538 DOI: 10.1186/s12711-018-0438-2] [Citation(s) in RCA: 30] [Impact Index Per Article: 4.3] [Received: 04/03/2018] [Accepted: 12/11/2018] [Indexed: 12/31/2022] Open
Abstract
BACKGROUND In this paper, we extend multi-locus iterative peeling to provide a computationally efficient method for calling, phasing, and imputing sequence data of any coverage in small or large pedigrees. Our method, called hybrid peeling, uses multi-locus iterative peeling to estimate shared chromosome segments between parents and their offspring at a subset of loci, and then uses single-locus iterative peeling to aggregate genomic information across multiple generations at the remaining loci. RESULTS Using a synthetic dataset, we first analysed the performance of hybrid peeling for calling and phasing genotypes in disconnected families, which contained only a focal individual and its parents and grandparents. Second, we analysed the performance of hybrid peeling for calling and phasing genotypes in the context of a full general pedigree. Third, we analysed the performance of hybrid peeling for imputing whole-genome sequence data to non-sequenced individuals in the population. We found that hybrid peeling substantially increased the number of called and phased genotypes by leveraging sequence information on related individuals. The calling rate and accuracy increased when the full pedigree was used compared to a reduced pedigree of just parents and grandparents. Finally, hybrid peeling imputed accurately whole-genome sequence to non-sequenced individuals. CONCLUSIONS We believe that this algorithm will enable the generation of low cost and high accuracy whole-genome sequence data in many pedigreed populations. We make this algorithm available as a standalone program called AlphaPeel.
Affiliation(s)
- Andrew Whalen
  - The Roslin Institute and Royal (Dick) School of Veterinary Studies, The University of Edinburgh, Midlothian, Scotland, UK
- Roger Ros-Freixedes
  - The Roslin Institute and Royal (Dick) School of Veterinary Studies, The University of Edinburgh, Midlothian, Scotland, UK
- David L. Wilson
  - The Roslin Institute and Royal (Dick) School of Veterinary Studies, The University of Edinburgh, Midlothian, Scotland, UK
- Gregor Gorjanc
  - The Roslin Institute and Royal (Dick) School of Veterinary Studies, The University of Edinburgh, Midlothian, Scotland, UK
- John M. Hickey
  - The Roslin Institute and Royal (Dick) School of Veterinary Studies, The University of Edinburgh, Midlothian, Scotland, UK
|
35
|
Beichman AC, Huerta-Sanchez E, Lohmueller KE. Using Genomic Data to Infer Historic Population Dynamics of Nonmodel Organisms. ANNUAL REVIEW OF ECOLOGY EVOLUTION AND SYSTEMATICS 2018. [DOI: 10.1146/annurev-ecolsys-110617-062431] [Citation(s) in RCA: 89] [Impact Index Per Article: 12.7] [Indexed: 11/09/2022]
Abstract
Genome sequence data are now being routinely obtained from many nonmodel organisms. These data contain a wealth of information about the demographic history of the populations from which they originate. Many sophisticated statistical inference procedures have been developed to infer the demographic history of populations from this type of genomic data. In this review, we discuss the different statistical methods available for inference of demography, providing an overview of the underlying theory and logic behind each approach. We also discuss the types of data required and the pros and cons of each method. We then discuss how these methods have been applied to a variety of nonmodel organisms. We conclude by presenting some recommendations for researchers looking to use genomic data to infer demographic history.
Affiliation(s)
- Annabel C. Beichman
  - Department of Ecology and Evolutionary Biology, University of California, Los Angeles, California 90095, USA
- Emilia Huerta-Sanchez
  - Department of Molecular and Cell Biology, University of California, Merced, California 95343, USA
  - Current affiliation: Department of Ecology and Evolutionary Biology, Brown University, Providence, Rhode Island 02912, USA
- Kirk E. Lohmueller
  - Department of Ecology and Evolutionary Biology, University of California, Los Angeles, California 90095, USA
  - Interdepartmental Program in Bioinformatics and Department of Human Genetics, David Geffen School of Medicine, University of California, Los Angeles, California 90095, USA
|
36
|
Genomic Prediction Using Individual-Level Data and Summary Statistics from Multiple Populations. Genetics 2018; 210:53-69. [PMID: 30021793 DOI: 10.1534/genetics.118.301109] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.4] [Received: 05/03/2018] [Accepted: 07/16/2018] [Indexed: 01/27/2023] Open
Abstract
This study presents a method for genomic prediction that uses individual-level data and summary statistics from multiple populations. Genome-wide markers are nowadays widely used to predict complex traits, and genomic prediction using multi-population data are an appealing approach to achieve higher prediction accuracies. However, sharing of individual-level data across populations is not always possible. We present a method that enables integration of summary statistics from separate analyses with the available individual-level data. The data can either consist of individuals with single or multiple (weighted) phenotype records per individual. We developed a method based on a hypothetical joint analysis model and absorption of population-specific information. We show that population-specific information is fully captured by estimated allele substitution effects and the accuracy of those estimates, i.e., the summary statistics. The method gives identical result as the joint analysis of all individual-level data when complete summary statistics are available. We provide a series of easy-to-use approximations that can be used when complete summary statistics are not available or impractical to share. Simulations show that approximations enable integration of different sources of information across a wide range of settings, yielding accurate predictions. The method can be readily extended to multiple-traits. In summary, the developed method enables integration of genome-wide data in the individual-level or summary statistics from multiple populations to obtain more accurate estimates of allele substitution effects and genomic predictions.
|
37
|
Inferring sex-specific demographic history from SNP data. PLoS Genet 2018; 14:e1007191. [PMID: 29385127 PMCID: PMC5809101 DOI: 10.1371/journal.pgen.1007191] [Citation(s) in RCA: 17] [Impact Index Per Article: 2.4] [Received: 07/26/2017] [Revised: 02/12/2018] [Accepted: 01/08/2018] [Indexed: 12/04/2022] Open
Abstract
The relative female and male contributions to demography are of great importance to better understand the history and dynamics of populations. While earlier studies relied on uniparental markers to investigate sex-specific questions, the increasing amount of sequence data now enables us to take advantage of tens to hundreds of thousands of independent loci from autosomes and the X chromosome. Here, we develop a novel method to estimate effective sex ratios or ESR (defined as the female proportion of the effective population) from allele count data for each branch of a rooted tree topology that summarizes the history of the populations of interest. Our method relies on Kimura’s time-dependent diffusion approximation for genetic drift, and is based on a hierarchical Bayesian model to integrate over the allele frequencies along the branches. We show via simulations that parameters are inferred robustly, even under scenarios that violate some of the model assumptions. Analyzing bovine SNP data, we infer a strongly female-biased ESR in both dairy and beef cattle, as expected from the underlying breeding scheme. Conversely, we observe a strongly male-biased ESR in early domestication times, consistent with an easier taming and management of cows, and/or introgression from wild auroch males, that would both cause a relative increase in male effective population size. In humans, analyzing a subsample of non-African populations, we find a male-biased ESR in Oceanians that may reflect complex marriage patterns in Aboriginal Australians. Because our approach relies on allele count data, it may be applied on a wide range of species. The history of populations and their social organization is often intricate due to breeding structures, migration patterns or population bottlenecks. Estimation of the female proportion of the effective population (sex ratio) is therefore important to better understand this underlying social structure and dynamics. 
This question has been mainly investigated so far by comparing genetic variation of mitochondrial DNA and the Y chromosome, two uniparentally inherited markers that reflect the demographic history of females and males, respectively. To overcome the intrinsic limitations of these genetic markers, and to take advantage of the increasing amount of sequence data, we propose a new approach that uses large numbers of independent polymorphisms from autosomes and the X chromosome to estimate sex ratios, throughout the history of populations. This method allows us to confirm a strongly female-biased sex ratio in modern dairy and beef cattle breeds. Yet, we find a strongly male-biased sex ratio during domestication times, consistent with an easier taming and management of cows, and/or introgression from wild auroch males. Analyzing human data from a sample of non-African populations, we find a male bias in Oceanians, possibly indicating complex marriage patterns among Aboriginal Australian groups.
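The link between the effective sex ratio and the contrast of autosomal versus X-linked diversity can be illustrated with the standard textbook expressions for effective size with separate sexes, N_A = 4·Nm·Nf/(Nm + Nf) for autosomes and N_X = 9·Nm·Nf/(4·Nm + 2·Nf) for the X chromosome. The sketch below is only an illustration under those idealised assumptions (random mating, no selection); it does not reproduce the hierarchical Bayesian model of the paper, and the function names are hypothetical.

```python
def effective_sex_ratio(n_f, n_m):
    """ESR as defined in the abstract: female proportion of the effective population."""
    return n_f / (n_f + n_m)


def autosomal_ne(n_f, n_m):
    """Textbook autosomal effective size with n_f breeding females and n_m breeding males."""
    return 4.0 * n_f * n_m / (n_f + n_m)


def x_linked_ne(n_f, n_m):
    """Textbook X-linked effective size for the same numbers of breeders."""
    return 9.0 * n_f * n_m / (4.0 * n_m + 2.0 * n_f)


def x_to_autosome_ratio(esr):
    """Ratio N_X / N_A written as a function of the ESR alone."""
    return 9.0 / (8.0 * (2.0 - esr))


# Hypothetical female-biased scheme: 300 breeding females, 100 breeding males.
esr = effective_sex_ratio(300, 100)
print(esr, x_linked_ne(300, 100) / autosomal_ne(300, 100), x_to_autosome_ratio(esr))
```

With an even sex ratio (ESR = 0.5) the ratio equals 0.75. It rises towards 1.125 as the ESR approaches 1 and falls towards 0.5625 as it approaches 0, which is why comparing X-linked and autosomal drift carries information about sex-biased demography.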
|
38
|
Blant A, Kwong M, Szpiech ZA, Pemberton TJ. Weighted likelihood inference of genomic autozygosity patterns in dense genotype data. BMC Genomics 2017; 18:928. [PMID: 29191164 PMCID: PMC5709839 DOI: 10.1186/s12864-017-4312-3] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.1] [Received: 05/23/2017] [Accepted: 11/16/2017] [Indexed: 12/14/2022] Open
Abstract
Background Genomic regions of autozygosity (ROA) arise when an individual is homozygous for haplotypes inherited identical-by-descent from ancestors shared by both parents. Over the past decade, they have gained importance for understanding evolutionary history and the genetic basis of complex diseases and traits. However, methods to infer ROA in dense genotype data have not evolved in step with advances in genome technology that now enable us to rapidly create large high-resolution genotype datasets, limiting our ability to investigate their constituent ROA patterns. Methods We report a weighted likelihood approach for inferring ROA in dense genotype data that accounts for autocorrelation among genotyped positions and the possibilities of unobserved mutation and recombination events, and variability in the confidence of individual genotype calls in whole genome sequence (WGS) data. Results Forward-time genetic simulations under two demographic scenarios that reflect situations where inbreeding and its effect on fitness are of interest suggest this approach is better powered than existing state-of-the-art methods to infer ROA at marker densities consistent with WGS and popular microarray genotyping platforms used in human and non-human studies. Moreover, we present evidence that suggests this approach is able to distinguish ROA arising via consanguinity from ROA arising via endogamy. Using subsets of The 1000 Genomes Project Phase 3 data we show that, relative to WGS, intermediate and long ROA are captured robustly with popular microarray platforms, while detection of short ROA is more variable and improves with marker density. Worldwide ROA patterns inferred from WGS data are found to accord well with those previously reported on the basis of microarray genotype data. 
Finally, we highlight the potential of this approach to detect genomic regions enriched for autozygosity signals in one group relative to another based upon comparisons of per-individual autozygosity likelihoods instead of inferred ROA frequencies. Conclusions This weighted likelihood ROA inference approach can assist population- and disease-geneticists working with a wide variety of data types and species to explore ROA patterns and to identify genomic regions with differential ROA signals among groups, thereby advancing our understanding of evolutionary history and the role of recessive variation in phenotypic variation and disease.
Affiliation(s)
- Alexandra Blant
  - Department of Biochemistry and Medical Genetics, University of Manitoba, Winnipeg, MB, Canada
- Michelle Kwong
  - Department of Biochemistry and Medical Genetics, University of Manitoba, Winnipeg, MB, Canada
- Zachary A Szpiech
  - Department of Bioengineering and Therapeutic Sciences, University of California, San Francisco, CA, USA
- Trevor J Pemberton
  - Department of Biochemistry and Medical Genetics, University of Manitoba, Winnipeg, MB, Canada
|
39
|
Inferring Individual Inbreeding and Demographic History from Segments of Identity by Descent in Ficedula Flycatcher Genome Sequences. Genetics 2017; 205:1319-1334. [PMID: 28100590 DOI: 10.1534/genetics.116.198861] [Citation(s) in RCA: 54] [Impact Index Per Article: 6.8] [Received: 11/30/2016] [Accepted: 01/11/2017] [Indexed: 01/25/2023] Open
Abstract
Individual inbreeding and historical demography can be estimated by analyzing runs of homozygosity (ROH), which are indicative of chromosomal segments of identity by descent (IBD). Such analyses have so far been rare in natural populations due to limited genomic resources. We analyzed ROH in whole genome sequences from 287 Ficedula flycatchers representing four species, with the objectives of evaluating the causes of genome-wide variation in the abundance of ROH and inferring historical demography. ROH were clearly more abundant in genomic regions with low recombination rate. However, this pattern was substantially weaker when ROH were mapped using genetic rather than physical single nucleotide polymorphism (SNP) coordinates in the genome. Empirical results and simulations suggest that high ROH abundance in regions of low recombination was partly caused by increased power to detect the very long IBD segments typical of regions with a low recombination rate. Simulations also showed that hard selective sweeps (but not soft sweeps or background selection) likely contributed to variation in the abundance of ROH across the genome. Comparisons of the abundance of ROH among several study populations indicated that the Spanish pied flycatcher population had the smallest historical effective population size (Ne) for this species, and that a putatively recently founded island (Baltic) population had the smallest historical Ne among the collared flycatchers. Analysis of pairwise IBD in Baltic collared flycatchers indicated that this population was founded <60 generations ago. This study provides a rare genomic glimpse into demographic history and the mechanisms underlying the genome-wide distribution of ROH.
|
40
|
Peripolli E, Munari DP, Silva MVGB, Lima ALF, Irgang R, Baldi F. Runs of homozygosity: current knowledge and applications in livestock. Anim Genet 2016; 48:255-271. [PMID: 27910110 DOI: 10.1111/age.12526] [Citation(s) in RCA: 207] [Impact Index Per Article: 23.0] [Accepted: 10/23/2016] [Indexed: 12/17/2022]
Abstract
This review presents a broader approach to the implementation and study of runs of homozygosity (ROH) in animal populations, focusing on identifying and characterizing ROH and their practical implications. ROH are continuous homozygous segments that are common in individuals and populations. The ability of these homozygous segments to give insight into a population's genetic events makes them a useful tool that can provide information about the demographic evolution of a population over time. Furthermore, ROH provide useful information about the genetic relatedness among individuals, helping to minimize the inbreeding rate and also helping to expose deleterious variants in the genome. The frequency, size and distribution of ROH in the genome are influenced by factors such as natural and artificial selection, recombination, linkage disequilibrium, population structure, mutation rate and inbreeding level. Calculating the inbreeding coefficient from molecular information from ROH (FROH) is more accurate for estimating autozygosity and for detecting both past and more recent inbreeding effects than are estimates from pedigree data (FPED). The better results of FROH suggest that FROH can be used to infer information about the history and inbreeding levels of a population in the absence of genealogical information. The selection of superior animals has produced large phenotypic changes and has reshaped the ROH patterns in various regions of the genome. Additionally, selection increases homozygosity around the target locus, and deleterious variants are seen to occur more frequently in ROH regions. Studies involving ROH are increasingly common and provide valuable information about how the genome's architecture can disclose a population's genetic background.
By revealing the molecular changes in populations over time, genome-wide information is crucial to understanding antecedent genome architecture and, therefore, to maintaining diversity and fitness in endangered livestock breeds.
Affiliation(s)
- E Peripolli
  - Departamento de Zootecnia, Faculdade de Ciências Agrárias e Veterinárias, UNESP Univ Estadual Paulista Júlio de Mesquita Filho, Jaboticabal, 14884-900, Brazil
- D P Munari
  - Departamento de Ciências Exatas, Faculdade de Ciências Agrárias e Veterinárias, UNESP Univ Estadual Paulista Júlio de Mesquita Filho, Jaboticabal, 14884-900, Brazil
  - Conselho Nacional de Desenvolvimento Científico e Tecnológico (CNPQ), Lago Sul, 71605-001, Brazil
- M V G B Silva
  - Conselho Nacional de Desenvolvimento Científico e Tecnológico (CNPQ), Lago Sul, 71605-001, Brazil
  - Embrapa Gado de Leite, Juiz de Fora, 36038-330, Brazil
- A L F Lima
  - Departamento de Zootecnia e Desenvolvimento Rural, Centro de Ciências Agrárias, Universidade Federal de Santa Catarina, Florianópolis, 88034-000, Brazil
- R Irgang
  - Departamento de Zootecnia e Desenvolvimento Rural, Centro de Ciências Agrárias, Universidade Federal de Santa Catarina, Florianópolis, 88034-000, Brazil
- F Baldi
  - Departamento de Zootecnia, Faculdade de Ciências Agrárias e Veterinárias, UNESP Univ Estadual Paulista Júlio de Mesquita Filho, Jaboticabal, 14884-900, Brazil
  - Conselho Nacional de Desenvolvimento Científico e Tecnológico (CNPQ), Lago Sul, 71605-001, Brazil
|
41
|
Gautier M, Moazami-Goudarzi K, Levéziel H, Parinello H, Grohs C, Rialle S, Kowalczyk R, Flori L. Deciphering the Wisent Demographic and Adaptive Histories from Individual Whole-Genome Sequences. Mol Biol Evol 2016; 33:2801-2814. [PMID: 27436010 PMCID: PMC5062319 DOI: 10.1093/molbev/msw144] [Citation(s) in RCA: 29] [Impact Index Per Article: 3.2] [Indexed: 11/21/2022] Open
Abstract
As the largest European herbivore, the wisent (Bison bonasus) is emblematic of the continent wildlife but has unclear origins. Here, we infer its demographic and adaptive histories from two individual whole-genome sequences via a detailed comparative analysis with bovine genomes. We estimate that the wisent and bovine species diverged from 1.7 × 10⁶ to 850,000 years before present (YBP) through a speciation process involving an extended period of limited gene flow. Our data further support the occurrence of more recent secondary contacts, posterior to the Bos taurus and Bos indicus divergence (∼150,000 YBP), between the wisent and (European) taurine cattle lineages. Although the wisent and bovine population sizes experienced a similar sharp decline since the Last Glacial Maximum, we find that the wisent demography remained more fluctuating during the Pleistocene. This is in agreement with a scenario in which wisents responded to successive glaciations by habitat fragmentation rather than southward and eastward migration as for the bovine ancestors. We finally detect 423 genes under positive selection between the wisent and bovine lineages, which shed a new light on the genome response to different living conditions (temperature, available food resource, and pathogen exposure) and on the key gene functions altered by the domestication process.
Affiliation(s)
- Mathieu Gautier
  - CBGP, INRA, CIRAD, IRD, Supagro, Montferrier-sur-Lez, France
  - IBC, Institut de Biologie Computationnelle, Montpellier, France
- Hugues Parinello
  - MGX-Montpellier GenomiX, c/o Institut de Génomique Fonctionnelle, Montpellier, France
- Cécile Grohs
  - GABI, INRA, AgroParisTech, Université Paris-Saclay, Jouy-en-Josas, France
- Stéphanie Rialle
  - MGX-Montpellier GenomiX, c/o Institut de Génomique Fonctionnelle, Montpellier, France
- Rafał Kowalczyk
  - Mammal Research Institute, Polish Academy of Sciences, Białowieża, Poland
- Laurence Flori
  - GABI, INRA, AgroParisTech, Université Paris-Saclay, Jouy-en-Josas, France
  - INTERTRYP, CIRAD, IRD, Montpellier, France
|
42
|
Iso-Touru T, Tapio M, Vilkki J, Kiseleva T, Ammosov I, Ivanova Z, Popov R, Ozerov M, Kantanen J. Genetic diversity and genomic signatures of selection among cattle breeds from Siberia, eastern and northern Europe. Anim Genet 2016; 47:647-657. [DOI: 10.1111/age.12473] [Citation(s) in RCA: 38] [Impact Index Per Article: 4.2] [Accepted: 05/31/2016] [Indexed: 12/31/2022]
Affiliation(s)
- T. Iso-Touru
  - Green Technology; Natural Resources Institute Finland (Luke); Jokioinen 31600 Finland
- M. Tapio
  - Green Technology; Natural Resources Institute Finland (Luke); Jokioinen 31600 Finland
- J. Vilkki
  - Green Technology; Natural Resources Institute Finland (Luke); Jokioinen 31600 Finland
- T. Kiseleva
  - All-Russian Research Institute for Farm Animal Genetics and Breeding; Russian Academy of Sciences; 55-a Moskovskoe Shosse St. Petersburg-Pushkin 199601 Russia
- I. Ammosov
  - Board of Agricultural Office of Eveno-Bytantaj Region; Batagay-Alyta 678580 The Sakha Republic (Yakutsk) Russia
- Z. Ivanova
  - Yakutian Research Institute of Agriculture; Yakutsk Sakha 677007 Russia
- R. Popov
  - Yakutian Research Institute of Agriculture; Yakutsk Sakha 677007 Russia
- M. Ozerov
  - Green Technology; Natural Resources Institute Finland (Luke); Jokioinen 31600 Finland
  - Department of Biology; University of Turku; Turku 20014 Finland
- J. Kantanen
  - Green Technology; Natural Resources Institute Finland (Luke); Jokioinen 31600 Finland
  - Department of Environmental and Biological Sciences; University of Eastern Finland; PO Box 1627 Kuopio 70211 Finland
|
43
|
Fleming DS, Koltes JE, Markey AD, Schmidt CJ, Ashwell CM, Rothschild MF, Persia ME, Reecy JM, Lamont SJ. Genomic analysis of Ugandan and Rwandan chicken ecotypes using a 600 k genotyping array. BMC Genomics 2016; 17:407. [PMID: 27230772 PMCID: PMC4882793 DOI: 10.1186/s12864-016-2711-5] [Citation(s) in RCA: 25] [Impact Index Per Article: 2.8] [Received: 01/01/2016] [Accepted: 05/06/2016] [Indexed: 02/07/2023] Open
Abstract
Background: Indigenous populations of animals have developed unique adaptations to their local environments, which may include factors such as response to thermal stress, drought, pathogens and suboptimal nutrition. The survival and subsequent evolution within these local environments can be the result of both natural and artificial selection driving the acquisition of favorable traits, which over time leave genomic signatures in a population. This study’s goals are to characterize genomic diversity and identify selection signatures in chickens from equatorial Africa to identify genomic regions that may confer adaptive advantages of these ecotypes to their environments. Results: Indigenous chickens from Uganda (n = 72) and Rwanda (n = 100), plus Kuroilers (n = 24, an Indian breed imported to Africa), were genotyped using the Axiom® 600 k Chicken Genotyping Array. Indigenous ecotypes were defined based upon location of sampling within Africa. The results revealed the presence of admixture among the Ugandan, Rwandan, and Kuroiler populations. Genes within runs of homozygosity consensus regions are linked to gene ontology (GO) terms related to lipid metabolism, immune functions and stress-mediated responses (FDR < 0.15). The genes within regions of signatures of selection are enriched for GO terms related to health and oxidative stress processes. Key genes in these regions had anti-oxidant, apoptosis, and inflammation functions. Conclusions: The study suggests that these populations have alleles under selective pressure from their environment, which may aid in adaptation to harsh environments. The correspondence in gene ontology terms connected to stress-mediated processes across the populations could be related to the similarity of environments or an artifact of the detected admixture. Electronic supplementary material: The online version of this article (doi:10.1186/s12864-016-2711-5) contains supplementary material, which is available to authorized users.
Affiliation(s)
- J E Koltes
  - Iowa State University, Ames, IA, USA
  - University of Arkansas, Fayetteville, AR, USA
- C M Ashwell
  - North Carolina State University, Raleigh, NC, USA
- M E Persia
  - Virginia Polytechnic University, Blacksburg, VA, USA
- J M Reecy
  - Iowa State University, Ames, IA, USA
|
44
|
Uncovering Adaptation from Sequence Data: Lessons from Genome Resequencing of Four Cattle Breeds. Genetics 2016; 203:433-50. [PMID: 27017625 DOI: 10.1534/genetics.115.181594] [Citation(s) in RCA: 55] [Impact Index Per Article: 6.1] [Received: 08/18/2015] [Accepted: 03/03/2016] [Indexed: 01/25/2023] Open
Abstract
Detecting the molecular basis of adaptation is one of the major questions in population genetics. With the advance in sequencing technologies, nearly complete interrogation of genome-wide polymorphisms in multiple populations is becoming feasible in some species, with the expectation that it will extend quickly to new ones. Here, we investigate the advantages of sequencing for the detection of adaptive loci in multiple populations, exploiting a recently published data set in cattle (Bos taurus). We used two different approaches to detect statistically significant signals of positive selection: a within-population approach aimed at identifying hard selective sweeps and a population-differentiation approach that can capture other selection events such as soft or incomplete sweeps. We show that the two methods are complementary in that they indeed capture different kinds of selection signatures. Our study confirmed some of the well-known adaptive loci in cattle (e.g., MC1R, KIT, GHR, PLAG1, NCAPG/LCORL) and detected some new ones (e.g., ARL15, PRLR, CYP19A1, PPM1L). Compared to genome scans based on medium- or high-density SNP data, we found that sequencing offered an increased detection power and a higher resolution in the localization of selection signatures. In several cases, we could even pinpoint the underlying causal adaptive mutation or at least a very small number of possible candidates (e.g., MC1R, PLAG1). Our results on these candidates suggest that a vast majority of adaptive mutations are likely to be regulatory rather than protein-coding variants.
|
45
|
Boitard S, Rodríguez W, Jay F, Mona S, Austerlitz F. Inferring Population Size History from Large Samples of Genome-Wide Molecular Data - An Approximate Bayesian Computation Approach. PLoS Genet 2016; 12:e1005877. [PMID: 26943927 PMCID: PMC4778914 DOI: 10.1371/journal.pgen.1005877] [Citation(s) in RCA: 107] [Impact Index Per Article: 11.9] [Received: 03/31/2015] [Accepted: 01/27/2016] [Indexed: 12/02/2022] Open
Abstract
Inferring the ancestral dynamics of effective population size is a long-standing question in population genetics, which can now be tackled much more accurately thanks to the massive genomic data available in many species. Several promising methods that take advantage of whole-genome sequences have been recently developed in this context. However, they can only be applied to rather small samples, which limits their ability to estimate recent population size history. Besides, they can be very sensitive to sequencing or phasing errors. Here we introduce a new approximate Bayesian computation approach named PopSizeABC that allows estimating the evolution of the effective population size through time, using a large sample of complete genomes. This sample is summarized using the folded allele frequency spectrum and the average zygotic linkage disequilibrium at different bins of physical distance, two classes of statistics that are widely used in population genetics and can be easily computed from unphased and unpolarized SNP data. Our approach provides accurate estimations of past population sizes, from the very first generations before present back to the expected time to the most recent common ancestor of the sample, as shown by simulations under a wide range of demographic scenarios. When applied to samples of 15 or 25 complete genomes in four cattle breeds (Angus, Fleckvieh, Holstein and Jersey), PopSizeABC revealed a series of population declines, related to historical events such as domestication or modern breed creation. We further highlight that our approach is robust to sequencing errors, provided summary statistics are computed from SNPs with common alleles.
Molecular data sampled from extant individuals contains considerable information about their demographic history. In particular, one classical question in population genetics is to reconstruct past population size changes from such data. Relating these changes to various climatic, geological or anthropogenic events allows characterizing the main factors driving genetic diversity and can have major outcomes for conservation. Until recently, mostly very simple histories, including one or two population size changes, could be estimated from genetic data. This has changed with the sequencing of entire genomes in many species, and several methods allow now inferring complex histories consisting of several tens of population size changes. However, analyzing entire genomes, while accounting for recombination, remains a statistical and numerical challenge. These methods, therefore, can only be applied to small samples with a few diploid genomes. We overcome this limitation by using an approximate estimation approach, where observed genomes are summarized using a small number of statistics related to allele frequencies and linkage disequilibrium. In contrast to previous approaches, we show that our method allows us to reconstruct also the most recent part (the last 100 generations) of the population size history. As an illustration, we apply it to large samples of whole-genome sequences in four cattle breeds.
Affiliation(s)
- Simon Boitard
  - Institut de Systématique, Évolution, Biodiversité ISYEB - UMR 7205 - CNRS & MNHN & UPMC & EPHE, Ecole Pratique des Hautes Etudes, Sorbonne Universités, Paris, France
  - GABI, INRA, AgroParisTech, Université Paris-Saclay, Jouy-en-Josas, France
- Willy Rodríguez
  - UMR CNRS 5219, Institut de Mathématiques de Toulouse, Université de Toulouse, Toulouse, France
- Flora Jay
  - UMR 7206 Eco-anthropologie et Ethnobiologie, Muséum National d’Histoire Naturelle, CNRS, Université Paris Diderot, Paris, France
  - LRI, Paris-Sud University, CNRS UMR 8623, Orsay, France
- Stefano Mona
  - Institut de Systématique, Évolution, Biodiversité ISYEB - UMR 7205 - CNRS & MNHN & UPMC & EPHE, Ecole Pratique des Hautes Etudes, Sorbonne Universités, Paris, France
- Frédéric Austerlitz
  - UMR 7206 Eco-anthropologie et Ethnobiologie, Muséum National d’Histoire Naturelle, CNRS, Université Paris Diderot, Paris, France
|
46
|
Nadachowska-Brzyska K, Burri R, Smeds L, Ellegren H. PSMC analysis of effective population sizes in molecular ecology and its application to black-and-white Ficedula flycatchers. Mol Ecol 2016; 25:1058-72. [PMID: 26797914 PMCID: PMC4793928 DOI: 10.1111/mec.13540] [Citation(s) in RCA: 180] [Impact Index Per Article: 20.0] [Received: 03/20/2015] [Revised: 12/15/2015] [Accepted: 01/07/2016] [Indexed: 12/12/2022]
Abstract
Climatic fluctuations during the Quaternary period governed the demography of species and contributed to population differentiation and ultimately speciation. Studies of these past processes have previously been hindered by a lack of means and genetic data to model changes in effective population size (Ne) through time. However, based on diploid genome sequences of high quality, the recently developed pairwise sequentially Markovian coalescent (PSMC) can estimate trajectories of changes in Ne over considerable time periods. We applied this approach to resequencing data from nearly 200 genomes of four species and several populations of the Ficedula species complex of black-and-white flycatchers. Ne curves of Atlas, collared, pied and semicollared flycatcher converged 1-2 million years ago (Ma) at an Ne of ≈ 200 000, likely reflecting the time when all four species last shared a common ancestor. Subsequent separate Ne trajectories are consistent with lineage splitting and speciation. All species showed evidence of population growth up until 100-200 thousand years ago (kya), followed by decline and then start of a new phase of population expansion. However, timing and amplitude of changes in Ne differed among species, and for pied flycatcher, the temporal dynamics of Ne differed between Spanish birds and central/northern European populations. This cautions against extrapolation of demographic inference between lineages and calls for adequate sampling to provide representative pictures of the coalescence process in different species or populations. We also empirically evaluate criteria for proper inference of demographic histories using PSMC and arrive at recommendations of using sequencing data with a mean genome coverage of ≥18X, a per-site filter of ≥10 reads and no more than 25% of missing data.
Affiliation(s)
- Krystyna Nadachowska-Brzyska
  - Department of Evolutionary Biology, Evolutionary Biology Centre, Uppsala University, Norbyvägen 18D, SE-752 36, Uppsala, Sweden
- Reto Burri
  - Department of Evolutionary Biology, Evolutionary Biology Centre, Uppsala University, Norbyvägen 18D, SE-752 36, Uppsala, Sweden
- Linnéa Smeds
  - Department of Evolutionary Biology, Evolutionary Biology Centre, Uppsala University, Norbyvägen 18D, SE-752 36, Uppsala, Sweden
- Hans Ellegren
  - Department of Evolutionary Biology, Evolutionary Biology Centre, Uppsala University, Norbyvägen 18D, SE-752 36, Uppsala, Sweden
|
47
|
Neolithic and Bronze Age migration to Ireland and establishment of the insular Atlantic genome. Proc Natl Acad Sci U S A 2015; 113:368-73. [PMID: 26712024 DOI: 10.1073/pnas.1518445113] [Citation(s) in RCA: 118] [Impact Index Per Article: 11.8] [Indexed: 11/18/2022] Open
Abstract
The Neolithic and Bronze Age transitions were profound cultural shifts catalyzed in parts of Europe by migrations, first of early farmers from the Near East and then Bronze Age herders from the Pontic Steppe. However, a decades-long, unresolved controversy is whether population change or cultural adoption occurred at the Atlantic edge, within the British Isles. We address this issue by using the first whole genome data from prehistoric Irish individuals. A Neolithic woman (3343-3020 cal BC) from a megalithic burial (10.3× coverage) possessed a genome of predominantly Near Eastern origin. She had some hunter-gatherer ancestry but belonged to a population of large effective size, suggesting a substantial influx of early farmers to the island. Three Bronze Age individuals from Rathlin Island (2026-1534 cal BC), including one high coverage (10.5×) genome, showed substantial Steppe genetic heritage indicating that the European population upheavals of the third millennium manifested all of the way from southern Siberia to the western ocean. This turnover invites the possibility of accompanying introduction of Indo-European, perhaps early Celtic, language. Irish Bronze Age haplotypic similarity is strongest within modern Irish, Scottish, and Welsh populations, and several important genetic variants that today show maximal or very high frequencies in Ireland appear at this horizon. These include those coding for lactase persistence, blue eye color, Y chromosome R1b haplotypes, and the hemochromatosis C282Y allele; to our knowledge, the first detection of a known Mendelian disease variant in prehistory. These findings together suggest the establishment of central attributes of the Irish genome 4,000 y ago.
|
48
|
Chamberlain AJ, Vander Jagt CJ, Hayes BJ, Khansefid M, Marett LC, Millen CA, Nguyen TTT, Goddard ME. Extensive variation between tissues in allele specific expression in an outbred mammal. BMC Genomics 2015; 16:993. [PMID: 26596891 PMCID: PMC4657355 DOI: 10.1186/s12864-015-2174-0] [Citation(s) in RCA: 62] [Impact Index Per Article: 6.2] [Received: 07/31/2015] [Accepted: 10/31/2015] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND: Allele specific gene expression (ASE), with the paternal allele more expressed than the maternal allele or vice versa, appears to be a common phenomenon in humans and mice. In other species the extent of ASE is unknown, and even in humans and mice there are several outstanding questions. These include: to what extent is ASE tissue specific; how often does the direction of allele expression imbalance reverse between tissues; how often is only one of the two alleles expressed; is there a genome-wide bias towards expression of the paternal or maternal allele; and finally, do genes that are nearby on a chromosome share the same direction of ASE? Here we use gene expression data (RNASeq) from 18 tissues from a single cow to investigate each of these questions in turn, and then validate some of these findings in two tissues from 20 cows. RESULTS: Between 40 and 100 million sequence reads were generated per tissue across three replicate samples for each of the eighteen tissues from the single cow (the discovery dataset). A bovine gene expression atlas was created (the first from RNASeq data), and differentially expressed genes in each tissue were identified. To analyse ASE, we had access to unambiguously phased genotypes for all heterozygous variants in the cow's whole genome sequence, where these variants were homozygous in the whole genome sequence of her sire, and as a result we were able to map reads to parental genomes, to determine SNP and genes showing ASE in each tissue. In total 25,251 heterozygous SNP within 7985 genes were tested for ASE in at least one tissue. ASE was pervasive: 89 % of genes tested had significant ASE in at least one tissue. This large proportion of genes displaying ASE was confirmed in the two tissues in a validation dataset. For individual tissues the proportion of genes showing significant ASE varied from as low as 8-16 % of those tested in thymus to as high as 71-82 % of those tested in lung. There were a number of cases where the direction of allele expression imbalance reversed between tissues. For example the gene SPTY2D1 showed almost complete paternal allele expression in kidney and thymus, and almost complete maternal allele expression in the brain caudal lobe and brain cerebellum. Mono allelic expression (MAE) was common, with 1349 of 4856 genes (28 %) tested with more than one heterozygous SNP showing MAE. Across all tissues, 54.17 % of all genes with ASE favoured the paternal allele. Genes that are closely linked on the chromosome were more likely to show higher expression of the same allele (paternal or maternal) than expected by chance. We identified several long runs of neighbouring genes that showed either paternal or maternal ASE; one example was five adjacent genes (GIMAP8, GIMAP7 copy1, GIMAP4, GIMAP7 copy 2 and GIMAP5) that showed almost exclusive paternal expression in brain caudal lobe. CONCLUSIONS: Investigating the extent of ASE across 18 bovine tissues in one cow and two tissues in 20 cows demonstrated 1) ASE is pervasive in cattle, 2) the ASE is often MAE but ranges from MAE to slight overexpression of the major allele, 3) the ASE is most often tissue specific and that more than half the time displays divergent allele specific expression patterns across tissues, 4) across all genes there is a slight bias towards expression of the paternal allele and 5) genes expressing the same parental allele are clustered together more than expected by chance, and there are several runs of large numbers of genes expressing the same parental allele.
Affiliation(s)
- Amanda J Chamberlain
  - Department of Economic Development, Jobs, Transport and Resources, Agribiosciences Building, 5 Ring Rd, Bundoora, Australia.
  - Dairy Futures Cooperative Research Centre, Agribiosciences Building, 5 Ring Rd, Bundoora, Australia.
- Christy J Vander Jagt
  - Department of Economic Development, Jobs, Transport and Resources, Agribiosciences Building, 5 Ring Rd, Bundoora, Australia.
  - Dairy Futures Cooperative Research Centre, Agribiosciences Building, 5 Ring Rd, Bundoora, Australia.
- Benjamin J Hayes
  - Department of Economic Development, Jobs, Transport and Resources, Agribiosciences Building, 5 Ring Rd, Bundoora, Australia.
  - Dairy Futures Cooperative Research Centre, Agribiosciences Building, 5 Ring Rd, Bundoora, Australia.
  - La Trobe University, Agribiosciences Building, 5 Ring Rd, Bundoora, Australia.
- Majid Khansefid
  - Department of Economic Development, Jobs, Transport and Resources, Agribiosciences Building, 5 Ring Rd, Bundoora, Australia.
  - Dairy Futures Cooperative Research Centre, Agribiosciences Building, 5 Ring Rd, Bundoora, Australia.
  - Institute of Land and Food, University of Melbourne, Royal Parade, Parkville, Australia.
- Leah C Marett
  - Department of Economic Development, Jobs, Transport and Resources, 1301 Hazeldean Rd, Ellinbank, Australia.
- Catriona A Millen
  - Dairy Futures Cooperative Research Centre, Agribiosciences Building, 5 Ring Rd, Bundoora, Australia.
  - Institute of Land and Food, University of Melbourne, Royal Parade, Parkville, Australia.
- Thuy T T Nguyen
  - Department of Economic Development, Jobs, Transport and Resources, Agribiosciences Building, 5 Ring Rd, Bundoora, Australia.
- Michael E Goddard
  - Department of Economic Development, Jobs, Transport and Resources, Agribiosciences Building, 5 Ring Rd, Bundoora, Australia.
  - Institute of Land and Food, University of Melbourne, Royal Parade, Parkville, Australia.
|
49
|
Metzger J, Karwath M, Tonda R, Beltran S, Águeda L, Gut M, Gut IG, Distl O. Runs of homozygosity reveal signatures of positive selection for reproduction traits in breed and non-breed horses. BMC Genomics 2015; 16:764. [PMID: 26452642 PMCID: PMC4600213 DOI: 10.1186/s12864-015-1977-3] [Citation(s) in RCA: 83] [Impact Index Per Article: 8.3] [Received: 07/12/2015] [Accepted: 10/03/2015] [Indexed: 11/24/2022] Open
Abstract
Background: Modern horses represent heterogeneous populations specifically selected for appearance and performance. Genomic regions under high selective pressure show characteristic runs of homozygosity (ROH) which represent a low genetic diversity. This study aims at detecting the number and functional distribution of ROHs in different horse populations using next generation sequencing data. Methods: Next generation sequencing was performed for two Sorraia, one Dülmen Horse, one Arabian, one Saxon-Thuringian Heavy Warmblood, one Thoroughbred and four Hanoverian. After quality control, reads were mapped to the reference genome EquCab2.70. ROH detection was performed using PLINK, version 1.07 for a trimmed dataset with 11,325,777 SNPs and a mean read depth of 12. Stretches with homozygous genotypes of >40 kb as well as >400 kb were defined as ROHs. SNPs within consensus ROHs were tested for neutrality. Functional classification was done for genes annotated within ROHs using PANTHER gene list analysis and functional variants were tested for their distribution among breed or non-breed groups. Results: ROH detection was performed using whole genome sequences of ten horses of six populations representing various breed types and non-breed horses. In total, an average number of 3492 ROHs were detected in windows of a minimum of 50 consecutive homozygous SNPs and an average number of 292 ROHs in windows of 500 consecutive homozygous SNPs. Functional analyses of private ROHs in each horse revealed a high frequency of genes affecting cellular, metabolic, developmental, immune system and reproduction processes. In non-breed horses, 198 ROHs in 50-SNP windows and seven ROHs in 500-SNP windows showed an enrichment of genes involved in reproduction, embryonic development, energy metabolism, muscle and cardiac development whereas all seven breed horses revealed only three common ROHs in 50-SNP windows harboring the fertility-related gene YES1. In the Hanoverian, a total of 18 private ROHs could be shown to be located in the region of genes potentially involved in neurologic control, signaling, glycogen balance and reproduction. Comparative analysis of homozygous stretches common in all ten horses displayed three ROHs which were all located in the region of KITLG, the ligand of KIT known to be involved in melanogenesis, haematopoiesis and gametogenesis. Conclusions: The results of this study give a comprehensive insight into the frequency and number of ROHs in various horses and their potential influence on population diversity and selection pressures. Comparisons of breed and non-breed horses suggest a significant artificial as well as natural selection pressure on reproduction performance in all types of horse populations. Electronic supplementary material: The online version of this article (doi:10.1186/s12864-015-1977-3) contains supplementary material, which is available to authorized users.
Affiliation(s)
- Julia Metzger
  - Institute for Animal Breeding and Genetics, University of Veterinary Medicine Hannover, Bünteweg 17p, 30559, Hannover, Germany.
- Matthias Karwath
  - Lower Saxony State Office for the Environment, Agriculture and Geology, Unit 74, Animal Breeding and Hygiene, Schlossallee 1, 01468, Moritzburg, Germany.
- Raul Tonda
  - Centro Nacional de Análisis Genómico, Parc Científic de Barcelona, Torre I Baldiri Reixac, 4, 08028, Barcelona, Spain.
- Sergi Beltran
  - Centro Nacional de Análisis Genómico, Parc Científic de Barcelona, Torre I Baldiri Reixac, 4, 08028, Barcelona, Spain.
- Lídia Águeda
  - Centro Nacional de Análisis Genómico, Parc Científic de Barcelona, Torre I Baldiri Reixac, 4, 08028, Barcelona, Spain.
- Marta Gut
  - Centro Nacional de Análisis Genómico, Parc Científic de Barcelona, Torre I Baldiri Reixac, 4, 08028, Barcelona, Spain.
- Ivo Glynne Gut
  - Centro Nacional de Análisis Genómico, Parc Científic de Barcelona, Torre I Baldiri Reixac, 4, 08028, Barcelona, Spain.
- Ottmar Distl
  - Institute for Animal Breeding and Genetics, University of Veterinary Medicine Hannover, Bünteweg 17p, 30559, Hannover, Germany.
|
50
|
Inference of Super-exponential Human Population Growth via Efficient Computation of the Site Frequency Spectrum for Generalized Models. Genetics 2015; 202:235-45. [PMID: 26450922 PMCID: PMC4701087 DOI: 10.1534/genetics.115.180570] [Citation(s) in RCA: 25] [Impact Index Per Article: 2.5] [Received: 07/07/2015] [Accepted: 09/28/2015] [Indexed: 01/08/2023] Open
Abstract
The site frequency spectrum (SFS) and other genetic summary statistics are at the heart of many population genetic studies. Previous studies have shown that human populations have undergone a recent epoch of fast growth in effective population size. These studies assumed that growth is exponential, and the ensuing models leave an excess amount of extremely rare variants. This suggests that human populations might have experienced a recent growth with speed faster than exponential. Recent studies have introduced a generalized growth model where the growth speed can be faster or slower than exponential. However, only simulation approaches were available for obtaining summary statistics under such generalized models. In this study, we provide expressions to accurately and efficiently evaluate the SFS and other summary statistics under generalized models, which we further implement in publicly available software. Investigating the power to infer deviation of growth from being exponential, we observed that adequate sample sizes facilitate accurate inference; e.g., a sample of 3000 individuals with the amount of data expected from exome sequencing allows observing and accurately estimating growth with speed deviating by ≥10% from that of exponential. Applying our inference framework to data from the NHLBI Exome Sequencing Project, we found that a model with a generalized growth epoch fits the observed SFS significantly better than the equivalent model with exponential growth (P-value = 3.85 × 10⁻⁶). The estimated growth speed significantly deviates from exponential (P-value ≪ 10⁻¹²), with the best-fit estimate being of growth speed 12% faster than exponential.
|