1
Déru V, Tiezzi F, VanRaden PM, Lozada-Soto EA, Toghiani S, Maltecca C. Imputation accuracy from low- to medium-density SNP chips for US crossbred dairy cattle. J Dairy Sci 2024; 107:398-411. [PMID: 37641298 DOI: 10.3168/jds.2023-23250] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/10/2023] [Accepted: 06/16/2023] [Indexed: 08/31/2023]
Abstract
This study aimed to evaluate imputation accuracy (IA) by marker (IAm) and by individual (IAi) in US crossbred dairy cattle. Holstein × Jersey crossbreds were used to evaluate IA from a low- (7K) to a medium-density (50K) SNP chip. Crossbred animals, as well as their sires (53), dams (77), and maternal grandsires (63), were all genotyped with a 78K SNP chip. Seven different scenarios of reference populations were tested, in which some scenarios used different family relationships and others added random unrelated purebred and crossbred individuals to those different family relationship scenarios. The same scenarios were tested on Holstein and Jersey purebred animals to compare these outcomes against those attained in crossbred animals. The genotype imputation was performed with findhap (version 4) software (VanRaden, 2015). There were no significant differences in IA results depending on whether the sire of imputed individuals was Holstein and the dam was Jersey, or vice versa. The IA increased significantly with the addition of related individuals in the reference population, from 86.70 ± 0.06% when only sires or dams were included in the reference population to 90.09 ± 0.06% when sire (S), dam (D), and maternal grandsire genomic data were combined in the reference population. In all scenarios including related individuals in the reference population, IAm and IAi were significantly higher in purebred Jersey and Holstein animals than in crossbreds, ranging from 90.75 ± 0.06 to 94.02 ± 0.06%, and from 90.88 ± 0.11 to 94.04 ± 0.10%, respectively. Additionally, a scenario called SPB+DLD (where PB indicates purebred and LD indicates low density), similar to the genomic evaluations performed on US crossbred dairy cattle, was tested. In this scenario, the information from the 5 evaluated breeds (Ayrshire, Brown Swiss, Guernsey, Holstein, and Jersey) genotyped with a 50K SNP chip and genomic information from the dams genotyped with a 7K SNP chip were combined in the reference population, and the IAm and IAi were 80.87 ± 0.06% and 80.85 ± 0.08%, respectively. Adding random unrelated genotyped individuals to the reference population reduced IA for both purebred and crossbred cows, except for scenario SPB+DLD, where adding crossbreds to the reference population increased IA values. Our findings demonstrate that IA for US Holstein × Jersey crossbreds ranged from 85 to 90%, and emphasize the importance of designing and defining the reference population for improved IA.
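The distinction between accuracy by individual (IAi) and accuracy by marker (IAm) can be illustrated with a minimal sketch. It assumes that accuracy is the percentage of imputed genotypes matching the true genotypes and that genotypes are coded 0/1/2; the abstract does not state the exact definition used, and the function names and toy data below are hypothetical.

```python
from typing import List, Sequence

# Rows are animals, columns are markers; genotypes coded as 0/1/2 allele counts.
Genotypes = List[List[int]]


def percent_match(true_vals: Sequence[int], imputed_vals: Sequence[int]) -> float:
    """Percentage of positions where the imputed genotype equals the true one."""
    matches = sum(t == i for t, i in zip(true_vals, imputed_vals))
    return 100.0 * matches / len(true_vals)


def accuracy_by_individual(true: Genotypes, imputed: Genotypes) -> List[float]:
    """One value per animal (row): an IAi-style summary."""
    return [percent_match(t_row, i_row) for t_row, i_row in zip(true, imputed)]


def accuracy_by_marker(true: Genotypes, imputed: Genotypes) -> List[float]:
    """One value per marker (column): an IAm-style summary."""
    true_cols = list(zip(*true))
    imputed_cols = list(zip(*imputed))
    return [percent_match(t, i) for t, i in zip(true_cols, imputed_cols)]


# Toy data (made up for illustration): 3 animals, 4 markers.
true_geno = [
    [0, 1, 2, 1],
    [2, 2, 0, 1],
    [1, 0, 0, 2],
]
imputed_geno = [
    [0, 1, 2, 0],
    [2, 1, 0, 1],
    [1, 0, 0, 2],
]

ia_i = accuracy_by_individual(true_geno, imputed_geno)
ia_m = accuracy_by_marker(true_geno, imputed_geno)
print(ia_i)
print(ia_m)
```

The same matrix of matches is summarized along rows or along columns, so the two measures have similar means but can differ in spread.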
Affiliation(s)
- Vanille Déru
  - Department of Animal Science, North Carolina State University, Raleigh, NC 27607
- Francesco Tiezzi
  - Department of Agriculture, Food, Environment and Forestry, University of Florence, Florence, 50144, Italy
- Paul M VanRaden
  - USDA, Agricultural Research Service, Animal Genomics and Improvement Laboratory, Beltsville, MD 20705-2350
- Sajjad Toghiani
  - USDA, Agricultural Research Service, Animal Genomics and Improvement Laboratory, Beltsville, MD 20705-2350
- Christian Maltecca
  - Department of Animal Science, North Carolina State University, Raleigh, NC 27607
2
See GM, Fix JS, Schwab CR, Spangler ML. Imputation of non-genotyped F1 dams to improve genetic gain in swine crossbreeding programs. J Anim Sci 2022; 100:6572187. [PMID: 35451025 PMCID: PMC9126202 DOI: 10.1093/jas/skac148] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/23/2021] [Accepted: 04/20/2022] [Indexed: 11/12/2022]
Abstract
This study investigated using imputed genotypes from non-genotyped animals which were not in the pedigree for the purpose of genetic selection and improving genetic gain for economically relevant traits. Simulations were used to mimic a 3-breed crossbreeding system that resembled a modern swine breeding scheme. The simulation consisted of three purebred (PB) breeds A, B, and C each with 25 and 425 mating males and females, respectively. Males from A and females from B were crossed to produce AB females (n = 1,000), which were crossed with males from C to produce crossbreds (CB; n = 10,000). The genome consisted of three chromosomes with 300 quantitative trait loci and ~9,000 markers. Lowly heritable reproductive traits were simulated for A, B, and AB (h2 = 0.2, 0.2, and 0.15, respectively), whereas a moderately heritable carcass trait was simulated for C (h2 = 0.4). Genetic correlations between reproductive traits in A, B, and AB were moderate (rg = 0.65). The goal trait of the breeding program was AB performance. Selection was practiced for four generations where AB and CB animals were first produced in generations 1 and 2, respectively. Non-genotyped AB dams were imputed using FImpute beginning in generation 2. Genotypes of PB and CB were used for imputation. Imputation strategies differed by three factors: 1) AB progeny genotyped per generation (2, 3, 4, or 6), 2) known or unknown mates of AB dams, and 3) genotyping rate of females from breeds A and B (0% or 100%). PB selection candidates from A and B were selected using estimated breeding values for AB performance, whereas candidates from C were selected by phenotype. Response to selection using imputed genotypes of non-genotyped animals was then compared to the scenarios where true AB genotypes (trueGeno) or no AB genotypes/phenotypes (noGeno) were used in genetic evaluations. The simulation was replicated 20 times. 
The average increase in genotype concordance between unknown and known sire imputation strategies was 0.22. Genotype concordance increased as the number of genotyped CB increased with little additional gain beyond 9 progeny. When mates of AB were known and more than 4 progeny were genotyped per generation, the phenotypic response in AB did not differ (P > 0.05) from trueGeno yet was greater (P < 0.05) than noGeno. Imputed genotypes of non-genotyped animals can be used to increase performance when 4 or more progeny are genotyped and sire pedigrees of CB animals are known.
Affiliation(s)
- Garrett M See
  - Department of Animal Science, University of Nebraska - Lincoln, Lincoln, NE 68588, USA
- Matthew L Spangler
  - Department of Animal Science, University of Nebraska - Lincoln, Lincoln, NE 68588, USA
3
Campos GS, Cardoso FF, Gomes CCG, Domingues R, de Almeida Regitano LC, de Sena Oliveira MC, de Oliveira HN, Carvalheiro R, Albuquerque LG, Miller S, Misztal I, Lourenco D. Development of genomic predictions for Angus cattle in Brazil incorporating genotypes from related American sires. J Anim Sci 2022; 100:6507787. [PMID: 35031806 PMCID: PMC8867558 DOI: 10.1093/jas/skac009] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 07/06/2021] [Accepted: 01/12/2022] [Indexed: 11/24/2022]
Abstract
Genomic prediction has become the new standard for genetic improvement programs, and currently, there is a desire to implement this technology for the evaluation of Angus cattle in Brazil. Thus, the main objective of this study was to assess the feasibility of evaluating young Brazilian Angus (BA) bulls and heifers for 12 routinely recorded traits using single-step genomic BLUP (ssGBLUP) with and without genotypes from American Angus (AA) sires. The second objective was to obtain estimates of effective population size (Ne) and linkage disequilibrium (LD) in the Brazilian Angus population. The dataset contained phenotypic information for up to 277,661 animals belonging to the Promebo breeding program, pedigree for 362,900, of which 1,386 were genotyped for 50k, 77k, and 150k single nucleotide polymorphism (SNP) panels. After imputation and quality control, 61,666 SNPs were available for the analyses. In addition, genotypes from 332 American Angus (AA) sires widely used in Brazil were retrieved from the AA Association database to be used for genomic predictions. Bivariate animal models were used to estimate variance components, traditional EBV, and genomic EBV (GEBV). Validation was carried out with the linear regression method (LR) using young-genotyped animals born between 2013 and 2015 without phenotypes in the reduced dataset and with records in the complete dataset. Validation animals were further split into progeny of BA and AA sires to evaluate if their progenies would benefit by including genotypes from AA sires. The Ne was 254 based on pedigree and 197 based on LD, and the average LD (±SD) and distance between adjacent single nucleotide polymorphisms (SNPs) across all chromosomes were 0.27 (±0.27) and 40743.68 bp, respectively. Prediction accuracies with ssGBLUP outperformed BLUP for all traits, improving accuracies by, on average, 16% for BA young bulls and heifers. 
The GEBV prediction accuracies ranged from 0.37 (total maternal for weaning weight and tick count) to 0.54 (yearling precocity) across all traits, and dispersion (LR coefficients) fluctuated between 0.92 and 1.06. Inclusion of genotyped sires from the AA improved GEBV accuracies by 2%, on average, compared to using only the BA reference population. Our study indicated that genomic information could help us to improve GEBV accuracies and hence genetic progress in the Brazilian Angus population. The inclusion of genotypes from American Angus sires heavily used in Brazil only marginally increased the GEBV accuracies for selection candidates.
There was a desire to implement genomic selection for Angus cattle in Brazil since the technology has been proven to increase genetic gain in animal breeding programs. Single-step genomic best linear unbiased prediction (ssGBLUP), which simultaneously combines pedigree and genomic information, was used to estimate individuals’ genomic breeding values (GEBV) or genetic merit. Genomic selection can accelerate genetic progress by increasing accuracy, especially in young animals without progeny. The accuracy of GEBV can also be improved by combining data from other countries to increase the reference population (i.e., genotyped and phenotyped animals) in small, genotyped populations. Thus, the main objective of this study was to evaluate the accuracy of GEBV for young Brazilian Angus (BA) bulls and heifers with ssGBLUP, with or without the genotypes from American Angus sires. The accuracies with ssGBLUP were higher than those from traditional BLUP (EBV calculated from pedigree), improving accuracies by, on average, 16% for young bulls and heifers. Including genotypes from American Angus sires heavily used in Brazil only marginally increased the GEBV accuracies for selection candidates.
Affiliation(s)
- Gabriel Soares Campos
  - Department of Animal and Dairy Science, University of Georgia, 30602, Athens, GA, USA
- Henrique Nunes de Oliveira
  - Departamento de Zootecnia, Faculdade de Ciências Agrárias e Veterinárias, Universidade Estadual Paulista, 14884-900, Jaboticabal, SP, Brazil
- Roberto Carvalheiro
  - Departamento de Zootecnia, Faculdade de Ciências Agrárias e Veterinárias, Universidade Estadual Paulista, 14884-900, Jaboticabal, SP, Brazil
- Lucia Galvão Albuquerque
  - Departamento de Zootecnia, Faculdade de Ciências Agrárias e Veterinárias, Universidade Estadual Paulista, 14884-900, Jaboticabal, SP, Brazil
- Ignacy Misztal
  - Department of Animal and Dairy Science, University of Georgia, 30602, Athens, GA, USA
- Daniela Lourenco
  - Department of Animal and Dairy Science, University of Georgia, 30602, Athens, GA, USA
4
Lashmar SF, Berry DP, Pierneef R, Muchadeyi FC, Visser C. Assessing single-nucleotide polymorphism selection methods for the development of a low-density panel optimized for imputation in South African Drakensberger beef cattle. J Anim Sci 2021; 99:6226920. [PMID: 33860324 DOI: 10.1093/jas/skab118] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/11/2020] [Accepted: 04/14/2021] [Indexed: 11/13/2022]
Abstract
A major obstacle in applying genomic selection (GS) to uniquely adapted local breeds in less-developed countries has been the cost of genotyping at high densities of single-nucleotide polymorphisms (SNP). Cost reduction can be achieved by imputing genotypes from lower to higher densities. Locally adapted breeds tend to be admixed and exhibit a high degree of genomic heterogeneity thus necessitating the optimization of SNP selection for downstream imputation. The aim of this study was to quantify the achievable imputation accuracy for a sample of 1,135 South African (SA) Drakensberger cattle using several custom-derived lower-density panels varying in both SNP density and how the SNP were selected. From a pool of 120,608 genotyped SNP, subsets of SNP were chosen (1) at random, (2) with even genomic dispersion, (3) by maximizing the mean minor allele frequency (MAF), (4) using a combined score of MAF and linkage disequilibrium (LD), (5) using a partitioning-around-medoids (PAM) algorithm, and finally (6) using a hierarchical LD-based clustering algorithm. Imputation accuracy to higher density improved as SNP density increased; animal-wise imputation accuracy defined as the within-animal correlation between the imputed and actual alleles ranged from 0.625 to 0.990 when 2,500 randomly selected SNP were chosen vs. a range of 0.918 to 0.999 when 50,000 randomly selected SNP were used. At a panel density of 10,000 SNP, the mean (standard deviation) animal-wise allele concordance rate was 0.976 (0.018) vs. 0.982 (0.014) when the worst (i.e., random) as opposed to the best (i.e., combination of MAF and LD) SNP selection strategy was employed. A difference of 0.071 units was observed between the mean correlation-based accuracy of imputed SNP categorized as low (0.01 < MAF ≤ 0.1) vs. high MAF (0.4 < MAF ≤ 0.5). Greater mean imputation accuracy was achieved for SNP located on autosomal extremes when these regions were populated with more SNP. 
The presented results suggested that genotype imputation can be a practical cost-saving strategy for indigenous breeds such as the SA Drakensberger. Based on the results, a genotyping panel consisting of ~10,000 SNP selected based on a combination of MAF and LD would suffice in achieving a <3% imputation error rate for a breed characterized by genomic admixture on the condition that these SNP are selected based on breed-specific selection criteria.
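Three of the simpler SNP selection strategies listed in this abstract (random, even genomic dispersion, and highest MAF) can be sketched as follows. This is an illustrative sketch only, not the authors' implementation: the function names, the nearest-to-target rule for even spacing, and the toy data are assumptions, and the LD-based and clustering strategies are not shown.

```python
import random
from typing import List, Sequence


def minor_allele_frequency(genotype_column: Sequence[int]) -> float:
    """MAF of one SNP from 0/1/2 genotypes (count of one allele per animal)."""
    p = sum(genotype_column) / (2 * len(genotype_column))
    return min(p, 1 - p)


def select_random(n_snp: int, k: int, seed: int = 0) -> List[int]:
    """Strategy (1): k SNP indices drawn at random without replacement."""
    rng = random.Random(seed)
    return sorted(rng.sample(range(n_snp), k))


def select_evenly_spaced(positions: Sequence[float], k: int) -> List[int]:
    """Strategy (2): for k >= 2 equally spaced target positions, take the
    closest SNP that has not been chosen yet (an assumed, simple rule)."""
    order = sorted(range(len(positions)), key=lambda i: positions[i])
    first, last = positions[order[0]], positions[order[-1]]
    step = (last - first) / (k - 1)
    chosen: List[int] = []
    for j in range(k):
        target = first + j * step
        best = min(
            (i for i in order if i not in chosen),
            key=lambda i: abs(positions[i] - target),
        )
        chosen.append(best)
    return sorted(chosen)


def select_highest_maf(genotypes: Sequence[Sequence[int]], k: int) -> List[int]:
    """Strategy (3): the k SNP with the highest MAF (rows = animals)."""
    mafs = [minor_allele_frequency(col) for col in zip(*genotypes)]
    ranked = sorted(range(len(mafs)), key=lambda j: mafs[j], reverse=True)
    return sorted(ranked[:k])


# Toy data (made up): 6 SNP on one chromosome, 4 animals.
positions = [100, 150, 900, 2000, 2100, 4000]
genotypes = [
    [0, 1, 2, 0, 2, 1],
    [0, 1, 2, 1, 2, 0],
    [0, 2, 2, 1, 1, 1],
    [1, 0, 2, 1, 2, 0],
]

print(select_random(len(positions), 3))
print(select_evenly_spaced(positions, 3))
print(select_highest_maf(genotypes, 3))
```

In practice each candidate panel would then be masked out of the full genotypes, imputed back, and scored, which is how strategies of this kind are compared.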
Affiliation(s)
- Simon F Lashmar
  - Department of Animal Sciences, University of Pretoria, Private Bag X20, Hatfield 0028, South Africa
- Donagh P Berry
  - Department of Animal Sciences, University of Pretoria, Private Bag X20, Hatfield 0028, South Africa; Animal and Grassland Research and Innovation Centre, Teagasc, Moorepark, Fermoy, Co. Cork, Ireland
- Rian Pierneef
  - Biotechnology Platform, Agricultural Research Council, Private Bag X5, Onderstepoort 0110, South Africa
- Farai C Muchadeyi
  - Biotechnology Platform, Agricultural Research Council, Private Bag X5, Onderstepoort 0110, South Africa
- Carina Visser
  - Department of Animal Sciences, University of Pretoria, Private Bag X20, Hatfield 0028, South Africa
5
Hou L, Liang W, Xu G, Huang B, Zhang X, Hu CY, Wang C. Accuracy of genomic prediction using mixed low-density marker panels. Anim Prod Sci 2020. [DOI: 10.1071/an18503] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Indexed: 11/23/2022]
Abstract
A low-density single-nucleotide polymorphism (LD-SNP) panel is one effective way to reduce the cost of genomic selection in animal breeding. The present study proposes a new type of LD-SNP panel called the mixed low-density (MLD) panel, which simultaneously considers SNPs with a substantial effect, estimated by Bayes method B (BayesB) across many traits, and an evenly spaced distribution. Simulated and real data were used to compare the imputation accuracy and genomic-selection accuracy of the two types of LD-SNP panels. The genotype imputation results for simulated data showed that the number of quantitative trait loci (QTL) had a limited influence on imputation accuracy, and only for MLD panels. The evenly spaced (ELD) panel was not affected by QTL. For real data, ELD performed slightly better than MLD when the panel contained 500 or 1000 SNPs. However, this advantage vanished quickly as the density increased. The genomic selection results for simulated data using BayesB showed that MLD performed much better than ELD when the number of QTL was 100. For real data, MLD also outperformed ELD in growth and carcass traits when using BayesB. In conclusion, the MLD strategy is superior to ELD in genomic selection under most situations.
6
Herry F, Hérault F, Picard Druet D, Varenne A, Burlot T, Le Roy P, Allais S. Design of low density SNP chips for genotype imputation in layer chicken. BMC Genet 2018; 19:108. [PMID: 30514201 PMCID: PMC6278067 DOI: 10.1186/s12863-018-0695-7] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.5] [Received: 03/06/2018] [Accepted: 11/14/2018] [Indexed: 01/08/2023]
Abstract
BACKGROUND The main goal of selection is to achieve genetic gain for a population by choosing the best breeders among a set of selection candidates. Since 2013, the use of a high density genotyping chip (600K Affymetrix® Axiom® HD genotyping array) for chicken has enabled the implementation of genomic selection in layer and broiler breeding, but the genotyping costs remain high for routine use on a large number of selection candidates. It was therefore considered worthwhile to develop a low density genotyping chip that would lower costs. To this end, various simulation studies have been conducted to find the best way to select a set of SNPs for low density genotyping of two laying hen lines.
RESULTS To design low density SNP chips, two methodologies, based on equidistance (EQ) or on linkage disequilibrium (LD), were compared. Imputation accuracy was assessed as the mean correlation between true and imputed genotypes. The results showed that correlations were more sensitive to false imputation of SNPs with low Minor Allele Frequency (MAF) when the EQ methodology was used. An increase in imputation accuracy was obtained when SNP density was increased, either through an increase in the number of selected windows on a chromosome or through a rise in the LD threshold. Moreover, the results varied depending on the type of chromosome (macro- or micro-chromosome). The LD methodology made it possible to optimize the number of SNPs by reducing the SNP density on macro-chromosomes and increasing it on micro-chromosomes. Imputation accuracy also increased when the size of the reference population was increased. Conversely, imputation accuracy decreased when the degree of kinship between reference and candidate populations was reduced. Finally, adding the selection candidates' dams to the reference population, in addition to their sires, gave better imputation results.
CONCLUSIONS Whatever the SNP chip, methodology, and scenario studied, highly accurate imputations were obtained, with mean correlations higher than 0.83. The key to achieving good imputation results is to take the chicken lines' LD into account when designing a low density SNP chip, and to include the candidates' direct parents in the reference population.
Affiliation(s)
- Florian Herry
  - NOVOGEN, 5 rue des Compagnons, Secteur du Vau Ballier, 22960, Plédran, France; PEGASE, INRA, Agrocampus Ouest, 16 Le Clos, 35590, Saint-Gilles, France
- Frédéric Hérault
  - PEGASE, INRA, Agrocampus Ouest, 16 Le Clos, 35590, Saint-Gilles, France
- Amandine Varenne
  - NOVOGEN, 5 rue des Compagnons, Secteur du Vau Ballier, 22960, Plédran, France
- Thierry Burlot
  - NOVOGEN, 5 rue des Compagnons, Secteur du Vau Ballier, 22960, Plédran, France
- Pascale Le Roy
  - PEGASE, INRA, Agrocampus Ouest, 16 Le Clos, 35590, Saint-Gilles, France
- Sophie Allais
  - PEGASE, INRA, Agrocampus Ouest, 16 Le Clos, 35590, Saint-Gilles, France
7
Aliloo H, Mrode R, Okeyo AM, Ni G, Goddard ME, Gibson JP. The feasibility of using low-density marker panels for genotype imputation and genomic prediction of crossbred dairy cattle of East Africa. J Dairy Sci 2018; 101:9108-9127. [PMID: 30077450 DOI: 10.3168/jds.2018-14621] [Citation(s) in RCA: 18] [Impact Index Per Article: 3.0] [Received: 02/22/2018] [Accepted: 05/26/2018] [Indexed: 11/19/2022]
Abstract
Cost-effective high-density (HD) genotypes of livestock species can be obtained by genotyping a proportion of the population using an HD panel and the remainder using a cheaper low-density panel, and then imputing the missing genotypes that are not directly assayed in the low-density panel. The efficacy of genotype imputation can largely be affected by the structure and history of the specific target population and it should be checked before incorporating imputation in routine genotyping practices. Here, we investigated the efficacy of imputation in crossbred dairy cattle populations of East Africa using 4 different commercial single nucleotide polymorphism (SNP) panels, 3 reference populations, and 3 imputation algorithms. We found that Minimac and a reference population, which included a mixture of crossbred and ancestral purebred animals, provided the highest imputation accuracy compared with other scenarios of imputation. The accuracies of imputation, measured as the correlation between real and imputed genotypes averaged across SNP, were around 0.76 and 0.94 for 7K and 40K SNP, respectively, when imputed up to a 770K panel. We also presented a method to maximize the imputation accuracy of low-density panels, which relies on the pairwise (co)variances between SNP and the minor allele frequency of SNP. The performance of the developed method was tested in a 5-fold cross-validation process where various densities of SNP were selected using the (co)variance method and also by alternative SNP selection methods and then imputed up to the HD panel. The (co)variance method provided the highest imputation accuracies at almost all marker densities, with accuracies being up to 0.19 higher than the random selection of SNP. The accuracies of imputation from 7K and 40K panels selected using the (co)variance method were around 0.80 and 0.94, respectively. The presented method also achieved higher accuracy of genomic prediction at lower densities of selected SNP. 
The squared correlation between genomic breeding values estimated using imputed genotypes and those from the real 770K HD panel was 0.95 when the accuracy of imputation was 0.64. The presented method for SNP selection is straightforward in its application and can ensure high accuracies in genotype imputation of crossbred dairy populations in East Africa.
Affiliation(s)
- H Aliloo
  - School of Environmental and Rural Science, University of New England, Armidale, NSW 2350, Australia
- R Mrode
  - International Livestock Research Institute (ILRI), PO Box 30709, Nairobi, Kenya; Scotland's Rural College, Easter Bush, Midlothian EH25 9RG, Scotland, United Kingdom
- A M Okeyo
  - International Livestock Research Institute (ILRI), PO Box 30709, Nairobi, Kenya
- G Ni
  - School of Environmental and Rural Science, University of New England, Armidale, NSW 2350, Australia
- M E Goddard
  - Agriculture Victoria, AgriBio, Centre for AgriBioscience, 5 Ring Road, Bundoora, VIC 3083, Australia; Faculty of Veterinary and Agricultural Sciences, Department of Agriculture and Food Systems, The University of Melbourne, Parkville, VIC 3010, Australia
- J P Gibson
  - School of Environmental and Rural Science, University of New England, Armidale, NSW 2350, Australia
8
Lencz T, Yu J, Palmer C, Carmi S, Ben-Avraham D, Barzilai N, Bressman S, Darvasi A, Cho JH, Clark LN, Gümüş ZH, Joseph V, Klein R, Lipkin S, Offit K, Ostrer H, Ozelius LJ, Peter I, Atzmon G, Pe'er I. High-depth whole genome sequencing of an Ashkenazi Jewish reference panel: enhancing sensitivity, accuracy, and imputation. Hum Genet 2018; 137:343-55. [PMID: 29705978 DOI: 10.1007/s00439-018-1886-z] [Citation(s) in RCA: 14] [Impact Index Per Article: 2.3] [Received: 01/06/2018] [Accepted: 04/21/2018] [Indexed: 12/31/2022]
Abstract
While increasingly large reference panels for genome-wide imputation have been recently made available, the degree to which imputation accuracy can be enhanced by population-specific reference panels remains an open question. Here, we sequenced at full-depth (≥ 30×), across two platforms (Illumina X Ten and Complete Genomics, Inc.), a moderately large (n = 738) cohort of samples drawn from the Ashkenazi Jewish population. We developed a series of quality control steps to optimize sensitivity, specificity, and comprehensiveness of variant calls in the reference panel, and then tested the accuracy of imputation against target cohorts drawn from the same population. Quality control (QC) thresholds for the Illumina X Ten platform were identified that permitted highly accurate calling of single nucleotide variants across 94% of the genome. QC procedures also identified numerous regions that are poorly mapped using current reference or alternate assemblies. After stringent QC, the population-specific reference panel produced more accurate and comprehensive imputation results relative to publicly available, large cosmopolitan reference panels, especially in the range of rare variants that may be most critical to further progress in mapping of complex phenotypes. The population-specific reference panel also permitted enhanced filtering of clinically irrelevant variants from personal genomes.
9
Larmer SG, Sargolzaei M, Brito LF, Ventura RV, Schenkel FS. Novel methods for genotype imputation to whole-genome sequence and a simple linear model to predict imputation accuracy. BMC Genet 2017; 18:120. [PMID: 29281958 PMCID: PMC5746022 DOI: 10.1186/s12863-017-0588-1] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.0] [Received: 08/07/2017] [Accepted: 12/15/2017] [Indexed: 11/10/2022]
Abstract
Background: Accurate imputation plays a major role in genomic studies of livestock industries, where the number of genotyped or sequenced animals is limited by costs. This study explored methods to create an ideal reference population for imputation to Next Generation Sequencing data in cattle.
Methods: Methods for clustering of animals for imputation were explored, using 1000 Bull Genomes Project sequence data on 1146 animals from a variety of beef and dairy breeds. Imputation from 50 K to 777 K was first carried out to choose an ideal clustering method, using ADMIXTURE or PLINK clustering algorithms with either genotypes or reconstructed haplotypes.
Results: Due to efficiency, accuracy and ease of use, clustering with PLINK using haplotypes as quasi-genotypes was chosen as the most advantageous grouping method. It was found that using a clustered population slightly decreased computing time, while maintaining accuracy across the population. Although overall accuracy remained the same, a slight increase in accuracy was observed for groups of animals in some breeds (primarily purebred beef cattle from breeds with fewer sequenced animals) and for other groups, primarily crossbred animals, a slight decrease in accuracy was observed. However, it was noted that some animals in each breed were poorly imputed across all methods. When imputed sequences were included in the reference population to aid imputation of poorly imputed animals, a small increase in overall accuracy was observed for nearly every individual in the population. Two models were created to predict imputation accuracy: a complete model using all information available, including Euclidean distances from genotypes and haplotypes, pedigree information, and clustering groups, and a simple model using only breed and a Euclidean distance matrix as predictors. Both models were successful in predicting imputation accuracy, with correlations between predicted and true imputation accuracy, as measured by concordance rate, of 0.87 and 0.83, respectively.
Conclusions: A clustering methodology can be very useful to subgroup cattle for efficient genotype imputation. In addition, accuracy of genotype imputation from medium to high-density Single Nucleotide Polymorphism (SNP) chip panels to whole-genome sequence can be predicted well using a simple linear model defined in this study.
Affiliation(s)
- Steven G Larmer
  - Centre for Genetic Improvement of Livestock, Department of Animal Biosciences, University of Guelph, 50 Stone Road East, Guelph, ON, N1G 2W1, Canada; The Semex Alliance, 5653 Highway 6 North, Guelph, ON, N1H 6J2, Canada
- Mehdi Sargolzaei
  - Centre for Genetic Improvement of Livestock, Department of Animal Biosciences, University of Guelph, 50 Stone Road East, Guelph, ON, N1G 2W1, Canada; The Semex Alliance, 5653 Highway 6 North, Guelph, ON, N1H 6J2, Canada
- Luiz F Brito
  - Centre for Genetic Improvement of Livestock, Department of Animal Biosciences, University of Guelph, 50 Stone Road East, Guelph, ON, N1G 2W1, Canada
- Ricardo V Ventura
  - Centre for Genetic Improvement of Livestock, Department of Animal Biosciences, University of Guelph, 50 Stone Road East, Guelph, ON, N1G 2W1, Canada; Bringing Intelligence Opportunities, 294 Mill St. East, Elora, ON, N0B 1S0, Canada
- Flávio S Schenkel
  - Centre for Genetic Improvement of Livestock, Department of Animal Biosciences, University of Guelph, 50 Stone Road East, Guelph, ON, N1G 2W1, Canada
10
Oliveira Júnior GA, Chud TCS, Ventura RV, Garrick DJ, Cole JB, Munari DP, Ferraz JBS, Mullart E, DeNise S, Smith S, da Silva MVGB. Genotype imputation in a tropical crossbred dairy cattle population. J Dairy Sci 2017; 100:9623-9634. [PMID: 28987572 DOI: 10.3168/jds.2017-12732] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.0] [Received: 02/14/2017] [Accepted: 08/16/2017] [Indexed: 11/19/2022]
Abstract
The objective of this study was to investigate different strategies for genotype imputation in a population of crossbred Girolando (Gyr × Holstein) dairy cattle. The data set consisted of 478 Girolando, 583 Gyr, and 1,198 Holstein sires genotyped at high density with the Illumina BovineHD (Illumina, San Diego, CA) panel, which includes ∼777K markers. The accuracy of imputation from low (20K) and medium densities (50K and 70K) to the HD panel density and from low to 50K density was investigated. Seven scenarios using different reference populations (RPop), considering the Girolando, Gyr, and Holstein breeds separately or in combination, were tested for imputing genotypes of 166 randomly chosen Girolando animals. Genotype imputation was performed using FImpute. Imputation accuracy was measured as the correlation between observed and imputed genotypes (CORR) and also as the proportion of genotypes that were imputed correctly (CR). This is the first paper on imputation accuracy in a Girolando population. The sample-specific imputation accuracies ranged from 0.38 to 0.97 (CORR) and from 0.49 to 0.96 (CR) when imputing from low and medium densities to HD, and from 0.41 to 0.95 (CORR) and from 0.50 to 0.94 (CR) for imputation from 20K to 50K. The CORRanim exceeded 0.96 (for 50K and 70K panels) when only Girolando animals were included in RPop (S1). We found smaller CORRanim when Gyr (S2) was used instead of Holstein (S3) as RPop. The same behavior was observed between S4 (Gyr + Girolando) and S5 (Holstein + Girolando) because the target animals were more related to the Holstein population than to the Gyr population. The highest imputation accuracies were observed for scenarios including Girolando animals in the reference population, whereas using only Gyr animals resulted in low imputation accuracies, suggesting that the haplotypes segregating in the Girolando population had a greater effect on accuracy than the purebred haplotypes.
All chromosomes had similar imputation accuracies (CORRsnp) within each scenario. Crossbred animals (Girolando) must be included in the reference population to provide the best imputation accuracies.
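The two accuracy measures named in this abstract, the correlation between observed and imputed genotypes (CORR) and the proportion of genotypes imputed correctly (CR), can be illustrated with a minimal Python sketch. It assumes genotypes are coded as allele counts 0/1/2 for one animal; the function names and the example vectors are hypothetical and do not come from the study or from FImpute.

```python
from math import sqrt


def concordance_rate(observed, imputed):
    """Proportion of genotypes that were imputed correctly (CR)."""
    if len(observed) != len(imputed) or not observed:
        raise ValueError("genotype vectors must be non-empty and of equal length")
    matches = sum(1 for o, i in zip(observed, imputed) if o == i)
    return matches / len(observed)


def genotype_correlation(observed, imputed):
    """Pearson correlation between observed and imputed genotypes (CORR)."""
    n = len(observed)
    if n != len(imputed) or n < 2:
        raise ValueError("need two equal-length vectors with at least 2 genotypes")
    mean_o = sum(observed) / n
    mean_i = sum(imputed) / n
    cov = sum((o - mean_o) * (i - mean_i) for o, i in zip(observed, imputed))
    var_o = sum((o - mean_o) ** 2 for o in observed)
    var_i = sum((i - mean_i) ** 2 for i in imputed)
    if var_o == 0 or var_i == 0:
        # Correlation is undefined when either vector has no variation.
        return float("nan")
    return cov / sqrt(var_o * var_i)


# Hypothetical genotypes (allele counts) for one animal at eight masked SNPs.
observed = [0, 1, 2, 1, 0, 2, 1, 1]
imputed = [0, 1, 2, 2, 0, 2, 1, 0]

cr = concordance_rate(observed, imputed)
corr = genotype_correlation(observed, imputed)
print(f"CR = {cr:.3f}, CORR = {corr:.3f}")
```

CR counts exact matches only, whereas CORR also reflects how far a wrong call is from the true genotype, which is one reason the two measures can rank scenarios differently.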
Affiliation(s)
- Gerson A Oliveira Júnior: Departamento de Medicina Veterinária, Universidade de São Paulo (USP), Faculdade de Zootecnia e Engenharia de Alimentos, Pirassununga, SP, 13635-900, Brazil
- Tatiane C S Chud: Departamento de Ciências Exatas, Universidade Estadual Paulista (Unesp), Faculdade de Ciências Agrárias e Veterinárias, Jaboticabal, SP, 14884-900, Brazil
- Ricardo V Ventura: Beef Improvement Opportunities, Guelph, ON N1K1E5, Canada; Centre for Genetic Improvement of Livestock, University of Guelph, Guelph, ON N1G2W1, Canada
- Dorian J Garrick: Department of Animal Science, Iowa State University, Ames 50011-3150
- John B Cole: Animal Genomics and Improvement Laboratory, Agricultural Research Service, United States Department of Agriculture, Beltsville, MD, 20705-2350
- Danísio P Munari: Departamento de Ciências Exatas, Universidade Estadual Paulista (Unesp), Faculdade de Ciências Agrárias e Veterinárias, Jaboticabal, SP, 14884-900, Brazil
- José B S Ferraz: Departamento de Medicina Veterinária, Universidade de São Paulo (USP), Faculdade de Zootecnia e Engenharia de Alimentos, Pirassununga, SP, 13635-900, Brazil
|
11
|
Judge MM, Purfield DC, Sleator RD, Berry DP. The impact of multi-generational genotype imputation strategies on imputation accuracy and subsequent genomic predictions. J Anim Sci 2017; 95:1489-1501. [PMID: 28464096 DOI: 10.2527/jas.2016.1212] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Indexed: 11/13/2022]
Abstract
The objective of the present study was to quantify, using simulations, the impact of successive generations of genotype imputation on genomic predictions. The impact of using a small reference population of true genotypes versus a larger reference population of imputed genotypes on the accuracy of genomic predictions was also investigated. After construction of a founder population, high-density (HD) genotypes (n = 43,500 single nucleotide polymorphisms, SNP) were simulated across 25 generations (n = 46,800 per generation); a low-density genotype panel (n = 3,000 SNP) was developed from these HD genotypes, which was then used to impute genotypes using 7 alternative imputation strategies. Both low (0.03) and moderately (0.35) heritable phenotypes were simulated. Direct genomic values (DGV) were estimated using imputed genotypes from the investigated scenarios, and the accuracy of predicting the simulated true breeding values (TBV) was expressed relative to the accuracy when the true genotypes were used. Mean allele concordance rate and the rate of change in mean allele concordance per generation differed between the imputation strategies investigated. Imputation was most accurate when the true HD genotypes of sires and 50% of the dams of the generation being imputed were included in the reference population; the average allele concordance rate for this scenario across generations was 0.9707. The strongest correlation between the TBV and DGV of the last generation was obtained when the reference population included sequentially imputed HD genotypes of all previous generations, plus the true HD genotypes of all sires of the previous generations (0.987 as efficient as when the true genotypes were used in the reference population). With a moderate heritability, the correlation between the TBV and the DGV using a small reference population of accurate genotypes was, on average, 0.07 units stronger compared to DGV generated using a larger population of imputed genotypes.
When the heritability was low, the accuracy of genomic predictions benefited from a larger reference population, even if SNP were imputed. The impact on the accuracy of genomic predictions from the accumulation of imputation errors across generations indicates the need to routinely generate HD genotypes on influential animals to reduce the accumulation of imputation errors over generations.
|
12
|
|
13
|
Piccoli ML, Brito LF, Braccini J, Cardoso FF, Sargolzaei M, Schenkel FS. Genomic predictions for economically important traits in Brazilian Braford and Hereford beef cattle using true and imputed genotypes. BMC Genet 2017; 18:2. [PMID: 28100165 PMCID: PMC5241971 DOI: 10.1186/s12863-017-0475-9] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.6] [Received: 09/10/2016] [Accepted: 01/13/2017] [Indexed: 12/30/2022]
Abstract
Background Genomic selection (GS) has played an important role in cattle breeding programs. However, genotyping prices are still a challenge for implementation of GS in beef cattle and there is still a lack of information about the use of low-density Single Nucleotide Polymorphisms (SNP) chip panels for genomic predictions in breeds such as Brazilian Braford and Hereford. Therefore, this study investigated the effect of using imputed genotypes on the accuracy of genomic predictions for twenty economically important traits in Brazilian Braford and Hereford beef cattle. Various scenarios composed of different percentages of animals with imputed genotypes and different sizes of the training population were compared. De-regressed EBVs (estimated breeding values) were used as pseudo-phenotypes in a Genomic Best Linear Unbiased Prediction (GBLUP) model using two different mimicked panels derived from the 50 K (8 K and 15 K SNP panels), which were subsequently imputed to the 50 K panel. In addition, genomic prediction accuracies generated from a 777 K SNP panel (imputed from the 50 K SNP panel) were presented as an alternative scenario. Results The accuracy of genomic breeding values averaged over the twenty traits ranged from 0.38 to 0.40 across the different scenarios. The average losses in expected genomic estimated breeding values (GEBV) accuracy (accuracy obtained from the inverse of the mixed model equations) relative to the true 50 K genotypes ranged from −0.0007 to −0.0012 and from −0.0002 to −0.0005 when using the 50 K imputed from the 8 K or 15 K, respectively. When using the imputed 777 K panel, the average loss in expected GEBV accuracy was −0.0021. The average gain in expected EBV accuracy by including genomic information when compared to simple BLUP was between 0.02 and 0.03 across scenarios and traits. Conclusions The percentage of animals with imputed genotypes in the training population did not significantly influence the validation accuracy.
However, the size of the training population played a major role in the accuracies of genomic predictions in this population. The losses in the expected accuracies of GEBV due to imputation of genotypes were lower when using the 50 K SNP chip panel imputed from the 15 K compared to the one imputed from the 8 K SNP chip panel.
Affiliation(s)
- Mario L Piccoli: Departamento de Zootecnia, Universidade Federal do Rio Grande do Sul, Porto Alegre, Brazil; GenSys Consultores Associados S/S, Porto Alegre, Brazil; Centre for Genetic Improvement of Livestock, University of Guelph, Guelph, Canada
- Luiz F Brito: Centre for Genetic Improvement of Livestock, University of Guelph, Guelph, Canada
- José Braccini: Departamento de Zootecnia, Universidade Federal do Rio Grande do Sul, Porto Alegre, Brazil; Conselho Nacional de Desenvolvimento Científico e Tecnológico, Brasília, Brazil
- Fernando F Cardoso: Embrapa Pecuária Sul, Bagé, Brazil; Conselho Nacional de Desenvolvimento Científico e Tecnológico, Brasília, Brazil
- Mehdi Sargolzaei: Centre for Genetic Improvement of Livestock, University of Guelph, Guelph, Canada; The Semex Alliance, Guelph, Canada
- Flávio S Schenkel: Centre for Genetic Improvement of Livestock, University of Guelph, Guelph, Canada
|
14
|
Ventura RV, Miller SP, Dodds KG, Auvray B, Lee M, Bixley M, Clarke SM, McEwan JC. Assessing accuracy of imputation using different SNP panel densities in a multi-breed sheep population. Genet Sel Evol 2016; 48:71. [PMID: 27663120 PMCID: PMC5035503 DOI: 10.1186/s12711-016-0244-7] [Citation(s) in RCA: 36] [Impact Index Per Article: 4.5] [Received: 10/19/2015] [Accepted: 08/31/2016] [Indexed: 11/10/2022]
Abstract
BACKGROUND Genotype imputation is a key element of the implementation of genomic selection within the New Zealand sheep industry, but many factors can influence imputation accuracy. Our objective was to provide practical directions on the implementation of imputation strategies in a multi-breed sheep population genotyped with three single nucleotide polymorphism (SNP) panels: 5K, 50K and HD (600K SNPs). RESULTS Imputation from 5K to HD was slightly better (0.6 %) than imputation from 5K to 50K. Two-step imputation from 5K to 50K and then from 50K to HD outperformed direct imputation from 5K to HD. A slight loss in imputation accuracy was observed when a large fixed reference population was used compared to a smaller within-breed reference (including all 50K genotypes on animals from different breeds excluding those in the validation set i.e. to be imputed), but only for a few animals across all imputation scenarios from 5K to 50K. However, a major gain in imputation accuracy for a large proportion of animals (purebred and crossbred), justified the use of a fixed and large reference dataset for all situations. This study also investigated the loss in imputation accuracy specifically for SNPs located at the ends of each chromosome, and showed that only chromosome 26 had an overall imputation (5K to 50K) accuracy for 100 SNPs at each end higher than 60 % (r2). Most of the chromosomes displayed reduced imputation accuracy at least at one of their ends. Prediction of imputation accuracy based on the relatedness of low-density genotypes to those of the reference dataset, before imputation (without running an imputation software) was also investigated. FIMPUTE V2.2 outperformed BEAGLE 3.3.2 across all imputation scenarios. 
CONCLUSIONS Imputation accuracy in sheep breeds can be improved by following a set of recommendations on SNP panels, software, strategies of imputation (one- or two-step imputation), and choice of the animals to be genotyped using both high- and low-density SNP panels. We present a method that predicts imputation accuracy for individual animals at the low-density level, before running imputation, which can be used to restrict genomic prediction only to the animals that can be imputed with sufficient accuracy.
Affiliation(s)
- Ricardo V Ventura: Centre for Genetic Improvement of Livestock, University of Guelph, Guelph, ON, N1G2W1, Canada; Beef Improvement Opportunities, Guelph, ON, N1K1E5, Canada
- Stephen P Miller: Centre for Genetic Improvement of Livestock, University of Guelph, Guelph, ON, N1G2W1, Canada; Invermay Agricultural Centre, AgResearch Limited, Mosgiel, 9053, New Zealand
- Ken G Dodds: Invermay Agricultural Centre, AgResearch Limited, Mosgiel, 9053, New Zealand
- Benoit Auvray: Department of Mathematics and Statistics, University of Otago, Dunedin, 9016, New Zealand
- Michael Lee: Department of Mathematics and Statistics, University of Otago, Dunedin, 9016, New Zealand
- Matthew Bixley: Invermay Agricultural Centre, AgResearch Limited, Mosgiel, 9053, New Zealand
- Shannon M Clarke: Invermay Agricultural Centre, AgResearch Limited, Mosgiel, 9053, New Zealand
- John C McEwan: Invermay Agricultural Centre, AgResearch Limited, Mosgiel, 9053, New Zealand
|
15
|
Sevillano CA, Vandenplas J, Bastiaansen JWM, Calus MPL. Empirical determination of breed-of-origin of alleles in three-breed cross pigs. Genet Sel Evol 2016; 48:55. [PMID: 27491547 PMCID: PMC4973529 DOI: 10.1186/s12711-016-0234-9] [Citation(s) in RCA: 25] [Impact Index Per Article: 3.1] [Received: 02/17/2016] [Accepted: 07/27/2016] [Indexed: 01/01/2023]
Abstract
Background Although breeding programs for pigs and poultry aim at improving crossbred performance, they mainly use training populations that consist of purebred animals. For some traits, e.g. residual feed intake, the genetic correlation between purebred and crossbred performance is low and thus including crossbred animals in the training population is required. With crossbred animals, the effects of single nucleotide polymorphisms (SNPs) may be breed-specific because linkage disequilibrium patterns between a SNP and a quantitative trait locus (QTL), and allele frequencies and allele substitution effects of a QTL may differ between breeds. To estimate the breed-specific effects of alleles in a crossbred population, the breed-of-origin of alleles in crossbred animals must be known. This study was aimed at investigating the performance of an approach that assigns breed-of-origin of alleles in real data of three-breed cross pigs. Genotypic data were available for 14,187 purebred, 1354 F1, and 1723 three-breed cross pigs. Results On average, 93.0 % of the alleles of three-breed cross pigs were assigned a breed-of-origin without using pedigree information and 94.6 % with using pedigree information. The assignment percentage could be improved by allowing a percentage (fr) of the copies of a haplotype to be observed in a purebred population different from the assigned breed-of-origin. Changing fr from 0 to 20 %, increased assignment of breed-of-origin by 0.6 and 0.7 % when pedigree information was and was not used, respectively, which indicates the benefit of setting fr to 20 %. Conclusions Breed-of-origin of alleles of three-breed cross pigs can be derived empirically without the need for pedigree information, with 93.7 % of the alleles assigned a breed-of-origin. Pedigree information is useful to reduce computation time and can slightly increase the percentage of assignments. 
Knowledge of the breed-of-origin of alleles allows the use of models that implement breed-specific effects of SNP alleles in genomic prediction, with the aim of improving selection of purebred animals for crossbred offspring performance.
Affiliation(s)
- Claudia A Sevillano: Animal Breeding and Genomics Centre, Wageningen University, PO Box 338, 6700 AH, Wageningen, The Netherlands; Topigs Norsvin, PO Box 43, 6640 AA, Beuningen, The Netherlands
- Jeremie Vandenplas: Animal Breeding and Genomics Centre, Wageningen UR Livestock Research, 6700 AH, Wageningen, The Netherlands
- John W M Bastiaansen: Animal Breeding and Genomics Centre, Wageningen University, PO Box 338, 6700 AH, Wageningen, The Netherlands
- Mario P L Calus: Animal Breeding and Genomics Centre, Wageningen UR Livestock Research, 6700 AH, Wageningen, The Netherlands
|
16
|
Lu D, Akanno EC, Crowley JJ, Schenkel F, Li H, De Pauw M, Moore SS, Wang Z, Li C, Stothard P, Plastow G, Miller SP, Basarab JA. Accuracy of genomic predictions for feed efficiency traits of beef cattle using 50K and imputed HD genotypes. J Anim Sci 2016; 94:1342-53. [DOI: 10.2527/jas.2015-0126] [Citation(s) in RCA: 24] [Impact Index Per Article: 3.0] [Indexed: 11/13/2022]
Affiliation(s)
- D. Lu: Livestock Gentec, Department of Agricultural, Food and Nutritional Science, University of Alberta, Edmonton, AB, Canada; AgResearch, Invermay Agricultural Centre, Post Box 50034, Mosgiel 9053, New Zealand
- E. C. Akanno: Livestock Gentec, Department of Agricultural, Food and Nutritional Science, University of Alberta, Edmonton, AB, Canada
- J. J. Crowley: Livestock Gentec, Department of Agricultural, Food and Nutritional Science, University of Alberta, Edmonton, AB, Canada; Canadian Beef Breeds Council, Calgary, AB T2E 7H7, Canada
- F. Schenkel: Centre for Genetic Improvement of Livestock, Department of Animal and Poultry Sciences, University of Guelph, ON, Canada
- H. Li: Centre for Genetic Improvement of Livestock, Department of Animal and Poultry Sciences, University of Guelph, ON, Canada
- M. De Pauw: Livestock Gentec, Department of Agricultural, Food and Nutritional Science, University of Alberta, Edmonton, AB, Canada
- S. S. Moore: Livestock Gentec, Department of Agricultural, Food and Nutritional Science, University of Alberta, Edmonton, AB, Canada; Centre for Animal Science, Queensland Alliance for Agriculture and Food Innovation, University of Queensland, St Lucia, Queensland, Australia
- Z. Wang: Livestock Gentec, Department of Agricultural, Food and Nutritional Science, University of Alberta, Edmonton, AB, Canada
- C. Li: Livestock Gentec, Department of Agricultural, Food and Nutritional Science, University of Alberta, Edmonton, AB, Canada; Centre for Genetic Improvement of Livestock, Department of Animal and Poultry Sciences, University of Guelph, ON, Canada; Lacombe Research Centre, Agriculture and Agri-Food Canada, 6000 C & E Trail, Lacombe, AB, Canada
- P. Stothard: Livestock Gentec, Department of Agricultural, Food and Nutritional Science, University of Alberta, Edmonton, AB, Canada
- G. Plastow: Livestock Gentec, Department of Agricultural, Food and Nutritional Science, University of Alberta, Edmonton, AB, Canada
- S. P. Miller: Livestock Gentec, Department of Agricultural, Food and Nutritional Science, University of Alberta, Edmonton, AB, Canada; AgResearch, Invermay Agricultural Centre, Post Box 50034, Mosgiel 9053, New Zealand; Centre for Genetic Improvement of Livestock, Department of Animal and Poultry Sciences, University of Guelph, ON, Canada
- J. A. Basarab: Livestock Gentec, Department of Agricultural, Food and Nutritional Science, University of Alberta, Edmonton, AB, Canada; Lacombe Research Centre, Alberta Agriculture and Forestry, 6000 C & E Trail, Lacombe, AB, Canada
|
17
|
Jattawa D, Elzo MA, Koonawootrittriron S, Suwanasopee T. Imputation Accuracy from Low to Moderate Density Single Nucleotide Polymorphism Chips in a Thai Multibreed Dairy Cattle Population. Asian-Australas J Anim Sci 2016; 29:464-70. [PMID: 26949946 PMCID: PMC4782080 DOI: 10.5713/ajas.15.0291] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.4] [Received: 04/02/2015] [Revised: 07/31/2015] [Accepted: 08/24/2015] [Indexed: 11/27/2022]
Abstract
The objective of this study was to investigate the accuracy of imputation from low density (LDC) to moderate density SNP chips (MDC) in a Thai Holstein-Other multibreed dairy cattle population. Dairy cattle with complete pedigree information (n = 1,244) from 145 dairy farms were genotyped with GeneSeek GGP20K (n = 570), GGP26K (n = 540) and GGP80K (n = 134) chips. After checking for single nucleotide polymorphism (SNP) quality, 17,779 SNP markers in common between the GGP20K, GGP26K, and GGP80K were used to represent MDC. Animals were divided into two groups, a reference group (n = 912) and a test group (n = 332). The SNP markers chosen for the test group were those located in positions corresponding to GeneSeek GGP9K (n = 7,652). The LDC to MDC genotype imputation was carried out using three different software packages, namely Beagle 3.3 (population-based algorithm), FImpute 2.2 (combined family- and population-based algorithms) and Findhap 4 (combined family- and population-based algorithms). Imputation accuracies within and across chromosomes were calculated as ratios of correctly imputed SNP markers to overall imputed SNP markers. Imputation accuracy for the three software packages ranged from 76.79% to 93.94%. FImpute had higher imputation accuracy (93.94%) than Findhap (84.64%) and Beagle (76.79%). Imputation accuracies were similar and consistent across chromosomes for FImpute, but not for Findhap and Beagle. Most chromosomes that showed either high (73%) or low (80%) imputation accuracies were the same chromosomes that had above and below average linkage disequilibrium (LD; defined here as the correlation between pairs of adjacent SNP within chromosomes less than or equal to 1 Mb apart). Results indicated that FImpute was more suitable than Findhap and Beagle for genotype imputation in this Thai multibreed population. Perhaps additional increments in imputation accuracy could be achieved by increasing the completeness of pedigree information.
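The accuracy definition used in this abstract (the ratio of correctly imputed SNP markers to all imputed SNP markers, within and across chromosomes) can be sketched as follows. This is an illustrative Python example only: the record layout, function name, and toy data are assumptions, not the authors' pipeline or the output format of Beagle, FImpute, or Findhap.

```python
from collections import defaultdict


def accuracy_by_chromosome(records):
    """Return (per-chromosome accuracy, overall accuracy).

    `records` is an iterable of (chromosome, true_genotype, imputed_genotype)
    tuples, one per imputed marker call. Markers that were genotyped on the
    low-density chip (not imputed) should be excluded before calling this.
    """
    correct = defaultdict(int)
    total = defaultdict(int)
    for chrom, true_gt, imputed_gt in records:
        total[chrom] += 1
        if true_gt == imputed_gt:
            correct[chrom] += 1
    per_chrom = {chrom: correct[chrom] / total[chrom] for chrom in total}
    n_total = sum(total.values())
    overall = sum(correct.values()) / n_total if n_total else float("nan")
    return per_chrom, overall


# Hypothetical calls: four imputed markers on chromosome 1, two on chromosome 2.
records = [
    (1, 0, 0), (1, 1, 1), (1, 2, 1), (1, 2, 2),
    (2, 1, 1), (2, 0, 0),
]

per_chrom, overall = accuracy_by_chromosome(records)
for chrom in sorted(per_chrom):
    print(f"chromosome {chrom}: {per_chrom[chrom]:.2%}")
print(f"overall: {overall:.2%}")
```

The overall figure is computed from pooled counts rather than by averaging the per-chromosome ratios, so chromosomes with more imputed markers carry more weight.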
Affiliation(s)
- Mauricio A. Elzo: Department of Animal Sciences, University of Florida, Gainesville, FL 32611-0910, USA
|
18
|
Heidaritabar M, Calus MPL, Vereijken A, Groenen MAM, Bastiaansen JWM. Accuracy of imputation using the most common sires as reference population in layer chickens. BMC Genet 2015; 16:101. [PMID: 26282557 PMCID: PMC4539854 DOI: 10.1186/s12863-015-0253-5] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.2] [Received: 12/10/2014] [Accepted: 07/10/2015] [Indexed: 11/24/2022]
Abstract
BACKGROUND Genotype imputation has become a standard practice in modern genetic research to increase genome coverage and improve the accuracy of genomic selection (GS) and genome-wide association studies (GWAS). We assessed accuracies of imputing 60K genotype data from lower density single nucleotide polymorphism (SNP) panels using a small set of the most common sires in a population of 2140 white layer chickens. Several factors affecting imputation accuracy were investigated, including the size of the reference population, the level of the relationship between the reference and validation populations, and minor allele frequency (MAF) of the SNP being imputed. RESULTS The accuracy of imputation was assessed with different scenarios using 22 and 62 carefully selected reference animals (Ref(22) and Ref(62)). Animal-specific imputation accuracy corrected for gene content was moderate on average (~ 0.80) in most scenarios and low in the 3K to 60K scenario. Maximum average accuracies were 0.90 and 0.93 for the most favourable scenario for Ref(22) and Ref(62) respectively, when SNPs were masked independent of their MAF. SNPs with low MAF were more difficult to impute, and the larger reference population considerably improved the imputation accuracy for these rare SNPs. When Ref(22) was used for imputation, the average imputation accuracy decreased by 0.04 when validation population was two instead of one generation away from the reference and increased again by 0.05 when validation was three generations away. Selecting the reference animals from the most common sires, compared with random animals from the population, considerably improved imputation accuracy for low MAF SNPs, but gave only limited improvement for other MAF classes. 
The allelic R² measure from Beagle software was found to be a good predictor of imputation reliability (correlation ~ 0.8) when the density of validation panel was very low (3K) and the MAF of the SNP and the size of the reference population were not extremely small. CONCLUSIONS Even with a very small number of animals in the reference population, reasonable accuracy of imputation can be achieved. Selecting a set of the most common sires, rather than selecting random animals for the reference population, improves the imputation accuracy of rare alleles, which may be a benefit when imputing with whole genome re-sequencing data.
Affiliation(s)
- Marzieh Heidaritabar: Animal Breeding and Genomics Centre, Wageningen University, P.O. Box 338, 6700 AH, Wageningen, the Netherlands
- Mario P L Calus: Animal Breeding and Genomics Centre, Wageningen UR Livestock Research, P.O. Box 338, 6700 AH, Wageningen, the Netherlands
- Addie Vereijken: Hendrix Genetics Research, Technology and Services B.V., P.O. Box 114, 5830 AC, Boxmeer, the Netherlands
- Martien A M Groenen: Animal Breeding and Genomics Centre, Wageningen University, P.O. Box 338, 6700 AH, Wageningen, the Netherlands
- John W M Bastiaansen: Animal Breeding and Genomics Centre, Wageningen University, P.O. Box 338, 6700 AH, Wageningen, the Netherlands
|
19
|
Chud TCS, Ventura RV, Schenkel FS, Carvalheiro R, Buzanskas ME, Rosa JO, Mudadu MDA, da Silva MVGB, Mokry FB, Marcondes CR, Regitano LCA, Munari DP. Strategies for genotype imputation in composite beef cattle. BMC Genet 2015; 16:99. [PMID: 26250698 PMCID: PMC4527250 DOI: 10.1186/s12863-015-0251-7] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.4] [Received: 03/09/2015] [Accepted: 07/09/2015] [Indexed: 11/23/2022]
Abstract
Background Genotype imputation has been used to increase genomic information, allow more animals in genome-wide analyses, and reduce genotyping costs. In Brazilian beef cattle production, many animals result from crossbreeding, which may alter linkage disequilibrium patterns. Thus, the challenge is to obtain accurately imputed genotypes in crossbred animals. The objective of this study was to evaluate the best fitting and most accurate imputation strategy on the MA genetic group (the progeny of a Charolais sire mated with crossbred Canchim X Zebu cows) and Canchim cattle. The data set contained 400 animals (born between 1999 and 2005) genotyped with the Illumina BovineHD panel. Imputation accuracy of genotypes from the Illumina-Bovine3K (3K), Illumina-BovineLD (6K), GeneSeek-Genomic-Profiler (GGP) BeefLD (GGP9K), GGP-IndicusLD (GGP20Ki), Illumina-BovineSNP50 (50K), GGP-IndicusHD (GGP75Ki), and GGP-BeefHD (GGP80K) to Illumina-BovineHD (HD) SNP panels was investigated. Seven scenarios for reference and target populations were tested; the animals were grouped according to birth year (S1), genetic groups (S2 and S3), genetic groups and birth year (S4 and S5), gender (S6), and gender and birth year (S7). Analyses were performed using FImpute and BEAGLE software and computation run-time was recorded. Genotype imputation accuracy was measured by concordance rate (CR) and allelic R square (R²). Results The highest imputation accuracy scenario consisted of a reference population with males and females and a target population with young females. Among the SNP panels in the tested scenarios, the 50K, GGP75Ki, and GGP80K were the most adequate to impute to HD in Canchim cattle. FImpute reduced the computation run-time needed to impute genotypes by 20 to 100 times when compared to BEAGLE. Conclusion The genotyping panels possessing at least 50 thousand markers are suitable for genotype imputation to HD with acceptable accuracy.
The FImpute algorithm demonstrated a higher efficiency of imputed markers, especially in lower density panels. These considerations may help to increase genotypic information, reduce genotyping costs, and aid in genomic selection evaluations in crossbred animals.
Affiliation(s)
- Tatiane C S Chud: Departamento de Ciências Exatas, UNESP - Univ Estadual Paulista "Júlio de Mesquita Filho", Jaboticabal, SP, Brazil
- Ricardo V Ventura: Beef Improvement Opportunities, Guelph, ON, Canada; University of Guelph, Guelph, ON, Canada
- Roberto Carvalheiro: Departamento de Zootecnia, UNESP - Univ Estadual Paulista "Júlio de Mesquita Filho", Jaboticabal, SP, Brazil
- Marcos E Buzanskas: Departamento de Ciências Exatas, UNESP - Univ Estadual Paulista "Júlio de Mesquita Filho", Jaboticabal, SP, Brazil
- Jaqueline O Rosa: Departamento de Ciências Exatas, UNESP - Univ Estadual Paulista "Júlio de Mesquita Filho", Jaboticabal, SP, Brazil
- Fabiana B Mokry: Department of Genetics and Evolution, Federal University of São Carlos, São Carlos, SP, Brazil
- Cintia R Marcondes: Embrapa Southeast Livestock - Brazilian Corporation of Agricultural Research, São Carlos, SP, Brazil
- Luciana C A Regitano: Embrapa Southeast Livestock - Brazilian Corporation of Agricultural Research, São Carlos, SP, Brazil
- Danísio P Munari: Departamento de Ciências Exatas, UNESP - Univ Estadual Paulista "Júlio de Mesquita Filho", Jaboticabal, SP, Brazil
|
20
|
Xiang T, Ma P, Ostersen T, Legarra A, Christensen OF. Imputation of genotypes in Danish purebred and two-way crossbred pigs using low-density panels. Genet Sel Evol 2015; 47:54. [PMID: 26122927 PMCID: PMC4486706 DOI: 10.1186/s12711-015-0134-4] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.1] [Received: 07/28/2014] [Accepted: 06/13/2015] [Indexed: 01/30/2023]
Abstract
Background: Genotype imputation is commonly used as an initial step in genomic selection because the accuracy of genomic selection does not decline when accurately imputed genotypes replace actual genotypes, while the genotyping cost is lower. Performance of imputation has rarely been investigated in crossbred animals and, in particular, in pigs. The extent and pattern of linkage disequilibrium differ in crossbred versus purebred animals, which may impact the performance of imputation. In this study, we first compared different scenarios of imputation from 5 K to 8 K single nucleotide polymorphisms (SNPs) in genotyped Danish Landrace and Yorkshire and crossbred Landrace-Yorkshire datasets and, second, compared imputation from 8 K to 60 K SNPs in genotyped purebred and simulated crossbred datasets. All imputations were done using Beagle version 3.3.2. We then investigated the reasons that could explain the differences observed.
Results: Genotype imputation performs as well in crossbred animals as in purebred animals when both parental breeds are included in the reference population. When the reference population is very large, it is not necessary to combine the two breeds in the reference population to impute the genotypes of purebred animals, because a within-breed reference population can provide a very high level of imputation accuracy (correct rate ≥ 0.99, correlation ≥ 0.95). However, to obtain similar imputation accuracies for crossbred animals, a reference population that combines animals of both parental pure breeds is required. Imputation accuracies are higher when a larger proportion of haplotypes is shared between the reference population and the validation (imputed) population.
Conclusions: The results from both real data and pedigree-based simulated data demonstrate that genotype imputation from low-density panels to medium-density panels is highly accurate in both purebred and crossbred pigs. In crossbred pigs, combining the parental purebred animals in the reference population is necessary to obtain high imputation accuracy. Electronic supplementary material: The online version of this article (doi:10.1186/s12711-015-0134-4) contains supplementary material, which is available to authorized users.
Affiliation(s)
- Tao Xiang: Center for Quantitative Genetics and Genomics, Department of Molecular Biology and Genetics, Aarhus University, Tjele, DK-8830, Denmark; INRA, UR1388 GenPhySE, CS-52627, Castanet-Tolosan, F-31326, France.
- Peipei Ma: Center for Quantitative Genetics and Genomics, Department of Molecular Biology and Genetics, Aarhus University, Tjele, DK-8830, Denmark.
- Tage Ostersen: Pig Research Centre, Danish Agricultural and Food Council, Copenhagen, DK-1609, Denmark.
- Andres Legarra: INRA, UR1388 GenPhySE, CS-52627, Castanet-Tolosan, F-31326, France.
- Ole F Christensen: Center for Quantitative Genetics and Genomics, Department of Molecular Biology and Genetics, Aarhus University, Tjele, DK-8830, Denmark.
|
21
|
Ogawa S, Matsuda H, Taniguchi Y, Watanabe T, Takasuga A, Sugimoto Y, Iwaisaki H. Accuracy of imputation of single nucleotide polymorphism marker genotypes from low-density panels in Japanese Black cattle. Anim Sci J 2015; 87:3-12. [PMID: 26032028 DOI: 10.1111/asj.12393] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.2] [Received: 11/17/2014] [Accepted: 11/18/2014] [Indexed: 12/25/2022]
Abstract
Using target and reference fattened steer populations, the performance of genotype imputation using lower-density marker panels in Japanese Black cattle was evaluated. Population imputation was performed using BEAGLE software. Genotype information for approximately 40,000 single nucleotide polymorphism (SNP) markers from the Illumina BovineSNP50 BeadChip was available, and imputation accuracy was assessed as the average concordance rate of the genotypes while varying the density of equally spaced SNPs and the number of individuals in the reference population. Two additional statistics were also calculated as indicators of imputation performance. The concordance rates tended to be lower for SNPs with greater minor allele frequencies and for those located near the ends of the chromosomes. Longer autosomes yielded greater imputation accuracies than shorter ones. When SNPs were selected based on linkage disequilibrium information, relative imputation accuracy was slightly improved. When 3000 and 10,000 equally spaced SNPs were used, the imputation accuracies were greater than 90% and approximately 97%, respectively. These results indicate that combining genotyping using a lower-density SNP chip with genotype imputation based on a population of individuals genotyped using a higher-density SNP chip is a cost-effective and valid approach for genomic prediction.
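The low-density panels in a study of this kind are obtained by keeping a subset of equally spaced markers and masking the rest before imputation. The following is a minimal sketch of that masking step, assuming that "equally spaced" means equally spaced by marker index along the map order (the abstract does not say whether spacing was by index or by physical position). The function names `equally_spaced_indices` and `mask_to_low_density` are hypothetical and do not come from the paper or from BEAGLE.

```python
def equally_spaced_indices(n_markers, n_keep):
    """Return n_keep marker indices spread evenly over 0 .. n_markers - 1.

    Assumption: spacing is by marker index, not by base-pair position.
    Integer floor division keeps the indices distinct when n_keep <= n_markers.
    """
    if not 2 <= n_keep <= n_markers:
        raise ValueError("need 2 <= n_keep <= n_markers")
    return [(i * (n_markers - 1)) // (n_keep - 1) for i in range(n_keep)]


def mask_to_low_density(genotypes, kept_indices):
    """Keep genotypes at kept_indices and set all others to None (masked)."""
    kept = set(kept_indices)
    return [g if j in kept else None for j, g in enumerate(genotypes)]


# Toy example: one animal, 10 markers coded as 0/1/2 allele counts.
animal = [0, 1, 2, 1, 0, 2, 1, 1, 0, 2]
kept = equally_spaced_indices(len(animal), 4)
low_density = mask_to_low_density(animal, kept)
print(kept)         # [0, 3, 6, 9]
print(low_density)  # [0, None, None, 1, None, None, 1, None, None, 2]
```

The masked positions are the ones an imputation program would be asked to fill in, and the original values at those positions serve as the truth when concordance is computed.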
Affiliation(s)
- Yukio Taniguchi: Graduate School of Agriculture, Kyoto University, Kyoto, Japan
|
22
|
Piccoli ML, Braccini J, Cardoso FF, Sargolzaei M, Larmer SG, Schenkel FS. Accuracy of genome-wide imputation in Braford and Hereford beef cattle. BMC Genet 2014; 15:157. [PMID: 25543517 PMCID: PMC4300607 DOI: 10.1186/s12863-014-0157-9] [Citation(s) in RCA: 22] [Impact Index Per Article: 2.2] [Received: 07/06/2014] [Accepted: 12/18/2014] [Indexed: 12/31/2022]
Abstract
Background: Strategies for imputing genotypes from the Illumina-Bovine3K, Illumina-BovineLD (6K), BeefLD-GGP (8K), a non-commercial 15K and IndicusLD-GGP (20K) panels to either the Illumina-BovineSNP50 (50K) or the Illumina-BovineHD (777K) SNP panel, as well as for imputing from 50K, GGP-IndicusHD (90iK) and GGP-BeefHD (90tK) to 777K, were investigated. Imputation of low density (<50K) genotypes to 777K was carried out in either one or two steps. Imputation of ungenotyped parents (n = 37 sires) with four or more offspring to the 50K panel was also assessed. There were 2,946 Braford, 664 Hereford and 88 Nellore animals, of which 71, 59 and 88, respectively, were genotyped with the 777K panel, while all others had 50K genotypes. The reference population comprised 2,735 animals for 50K and 175 bulls for 777K. The low density panels were simulated by masking genotypes in the 50K or 777K panel for animals born in 2011. Analyses were performed using both Beagle and FImpute software. Genotype imputation accuracy was measured by concordance rate and allelic R2 between true and imputed genotypes.
Results: The concordance rate using FImpute, averaged across all simulated low density panels, was 0.943 for imputation to 50K and 0.921 for imputation to 777K, compared with 0.927 and 0.895 using Beagle. The allelic R2 was 0.912 and 0.866 for imputation to 50K and to 777K using FImpute, respectively, and 0.890 and 0.826 using Beagle. One-step and two-step imputation to 777K produced average concordance rates of 0.806 and 0.892 and allelic R2 of 0.674 and 0.819, respectively. Imputation of low density panels to 50K, with the exception of 3K, had overall concordance rates greater than 0.940 and allelic R2 greater than 0.919. Ungenotyped animals were imputed to the 50K panel with an average concordance rate of 0.950 by FImpute.
Conclusion: FImpute was more accurate than Beagle for imputation to both 50K and 777K. Two-step imputation outperformed one-step imputation for imputing to 777K. Ungenotyped animals that have four or more offspring can have their 50K genotypes accurately inferred using FImpute. All low density panels, except the 3K, can be used to impute to the 50K using FImpute or Beagle with high concordance rate and allelic R2.
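The two accuracy measures named in this abstract can be illustrated on a toy example. The sketch below assumes genotypes coded as 0/1/2 allele counts, takes the concordance rate as the fraction of masked genotypes imputed exactly, and takes allelic R2 as the squared Pearson correlation between true and imputed allele counts. That reading of allelic R2 is an assumption; the abstract does not give the formula, and the function names are hypothetical rather than part of FImpute or Beagle.

```python
from math import sqrt


def concordance_rate(true, imputed):
    """Fraction of genotypes whose imputed value equals the true value."""
    matches = sum(t == i for t, i in zip(true, imputed))
    return matches / len(true)


def pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def allelic_r2(true, imputed):
    """Squared correlation between true and imputed allele counts (assumed definition)."""
    return pearson(true, imputed) ** 2


# Toy data: eight masked genotypes for one marker, two of them imputed wrongly.
true_genotypes = [0, 1, 2, 1, 0, 2, 1, 1]
imputed_genotypes = [0, 1, 2, 2, 0, 2, 1, 0]

print(concordance_rate(true_genotypes, imputed_genotypes))      # 0.75
print(round(allelic_r2(true_genotypes, imputed_genotypes), 4))  # 0.6667
```

In this toy case two wrong calls out of eight lower the concordance rate to 0.75 but lower the squared correlation further, which matches the pattern in the reported results, where allelic R2 is consistently below the concordance rate.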
Affiliation(s)
- Mario L Piccoli: Departamento de Zootecnia, Universidade Federal do Rio Grande do Sul, Porto Alegre, Brazil; GenSys Consultores Associados S/S, Porto Alegre, Brazil; Centre for Genetic Improvement of Livestock, University of Guelph, Guelph, ON, Canada.
- José Braccini: Departamento de Zootecnia, Universidade Federal do Rio Grande do Sul, Porto Alegre, Brazil; National Council for Scientific and Technological Development, Brasília, Brazil.
- Fernando F Cardoso: Embrapa Southern Region Animal Husbandry, Bagé, Brazil; National Council for Scientific and Technological Development, Brasília, Brazil.
- Medhi Sargolzaei: Centre for Genetic Improvement of Livestock, University of Guelph, Guelph, ON, Canada; The Semex Alliance, Guelph, ON, Canada.
- Steven G Larmer: Centre for Genetic Improvement of Livestock, University of Guelph, Guelph, ON, Canada.
- Flávio S Schenkel: Centre for Genetic Improvement of Livestock, University of Guelph, Guelph, ON, Canada.
|
23
|
Calus MP, Bouwman AC, Hickey JM, Veerkamp RF, Mulder HA. Evaluation of measures of correctness of genotype imputation in the context of genomic prediction: a review of livestock applications. Animal 2014; 8:1743-53. [PMID: 25045914 DOI: 10.1017/S1751731114001803] [Citation(s) in RCA: 57] [Impact Index Per Article: 5.7] [Indexed: 11/06/2022]
Abstract
In livestock, many studies have reported the results of imputation to 50k single nucleotide polymorphism (SNP) genotypes for animals that are genotyped with low-density SNP panels. The objective of this paper is to review different measures of correctness of imputation, and to evaluate their utility depending on the purpose of the imputed genotypes. Across studies, imputation accuracy, computed as the correlation between true and imputed genotypes, and imputation error rate, which counts the number of incorrectly imputed alleles, are commonly used measures of imputation correctness. Based on the nature of both measures and results reported in the literature, imputation accuracy appears to be a more useful measure of the correctness of imputation than imputation error rate, because imputation accuracy does not depend on minor allele frequency (MAF), whereas imputation error rate does. Therefore, imputation accuracy can be better compared across loci with different MAF. Imputation accuracy depends on the ability to identify the correct haplotype of a SNP, but many other factors have been identified as well, including the number of genotyped immediate ancestors, the number of animals with genotypes at the high-density panel, the SNP density on the low- and high-density panels, the MAF of the imputed SNP, and whether the imputed SNP is located at the end of a chromosome. Some of these factors directly contribute to the linkage disequilibrium between imputed SNP and SNP on the low-density panel. When imputation accuracy is assessed as a predictor for the accuracy of subsequent genomic prediction, we recommend that: (1) individual-specific imputation accuracies should be used that are computed after centring and scaling both true and imputed genotypes; and (2) imputation of gene dosage is preferred over imputation of the most likely genotype, as this increases accuracy and reduces bias of the imputed genotypes and the subsequent genomic predictions.
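Recommendation (1) in this abstract can be made concrete with a short sketch. It assumes that each locus is centred by 2p and scaled by sqrt(2p(1-p)), with the allele frequency p estimated from the true genotypes, and that accuracy for each animal is the Pearson correlation across loci between its standardised true genotypes and standardised imputed dosages. The abstract does not specify the centring and scaling constants, so these choices and the function names are illustrative assumptions.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def individual_accuracies(true_genotypes, imputed_dosages):
    """Per-animal correlation after centring and scaling each locus.

    Rows are animals, columns are loci. Assumption: each locus is centred
    by 2p and scaled by sqrt(2p(1-p)), with p estimated from the true
    genotypes; monomorphic loci are skipped because they cannot be scaled.
    """
    n_animals = len(true_genotypes)
    n_loci = len(true_genotypes[0])
    freqs = [sum(row[j] for row in true_genotypes) / (2 * n_animals)
             for j in range(n_loci)]
    usable = [j for j, p in enumerate(freqs) if 0.0 < p < 1.0]

    def standardise(row):
        return [(row[j] - 2 * freqs[j]) / sqrt(2 * freqs[j] * (1 - freqs[j]))
                for j in usable]

    return [pearson(standardise(t), standardise(d))
            for t, d in zip(true_genotypes, imputed_dosages)]


# Toy data: 4 animals x 5 loci; imputed values are gene dosages in [0, 2].
true_genotypes = [
    [0, 1, 2, 0, 1],
    [1, 1, 0, 0, 2],
    [2, 0, 1, 1, 2],
    [1, 2, 1, 0, 1],
]
imputed_dosages = [
    [0.0, 1.0, 2.0, 0.0, 1.0],  # imputed perfectly
    [1.0, 0.8, 0.2, 0.1, 1.9],
    [1.6, 0.3, 1.0, 0.4, 2.0],
    [1.2, 1.7, 0.9, 0.1, 1.3],
]

accuracies = individual_accuracies(true_genotypes, imputed_dosages)
for animal, acc in enumerate(accuracies):
    print(animal, round(acc, 3))
```

Using dosages rather than best-guess genotypes follows recommendation (2). Standardising by locus before correlating means that a rare-allele locus contributes on the same scale as a common-allele locus, which is the reason the abstract gives for preferring correlation-based accuracy over error rates.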
|