1
Short communication: Accuracy of whole-genome sequence imputation in Angus cattle using within-breed and multi breed reference populations. Animal 2024; 18:101087. [PMID: 38364656 DOI: 10.1016/j.animal.2024.101087] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/04/2023] [Revised: 01/16/2024] [Accepted: 01/19/2024] [Indexed: 02/18/2024] Open
Abstract
Genotype imputation is a standard approach used in the field of genetics. It can be used to fill in missing genotypes or to increase genotype density. Accurate imputed genotypes are required for downstream analyses. In this study, the accuracy of whole-genome sequence imputation for Angus beef cattle was examined using two different ways to form the reference panel, a within-breed reference population and a multi breed reference population. A stepwise imputation was conducted by imputing medium-density (50k) genotypes to high-density, and then to the whole genome sequence (WGS). The reference population consisted of animals with WGS information from the 1 000 Bull Genomes project. The within-breed reference panel comprised 396 Angus cattle, while an additional 2 380 Taurine cattle were added to the reference population for the multi breed reference scenario. Imputation accuracies were variant-wise average accuracies from a 10-fold cross-validation and expressed as concordance rates (CR) and Pearson's correlations (PR). The two imputation scenarios achieved moderate to high imputation accuracies ranging from 0.896 to 0.966 for CR and from 0.779 to 0.834 for PR. The accuracies from two different scenarios were similar, except for PR from WGS imputation, where the within-breed scenario outperformed the multi breed scenario. The result indicated that including a large number of animals from other breeds in the reference panel to impute purebred Angus did not improve the accuracy and may negatively impact the results. In conclusion, the imputed WGS in Angus cattle can be obtained with high accuracy using a within-breed reference panel.
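The two accuracy measures named in this abstract, concordance rate (CR) and Pearson's correlation (PR) averaged over variants, can be illustrated with a minimal sketch. It assumes genotypes are coded as 0/1/2 allele counts in an animals-by-variants array; the function name `variant_wise_accuracy` and the handling of monomorphic variants (skipped in the PR average) are illustrative assumptions, not details taken from the study.

```python
import numpy as np

def variant_wise_accuracy(true_geno, imputed_geno):
    """Return (mean CR, mean PR) across variants.

    Both inputs are (n_animals, n_variants) arrays of 0/1/2 allele counts
    (an assumed coding; the study does not specify its data layout).
    """
    t = np.asarray(true_geno, dtype=float)
    i = np.asarray(imputed_geno, dtype=float)

    # Concordance rate: share of animals with an identical genotype, per variant.
    cr = (t == i).mean(axis=0)

    # Pearson correlation between true and imputed genotypes, per variant.
    t_c = t - t.mean(axis=0)
    i_c = i - i.mean(axis=0)
    num = (t_c * i_c).sum(axis=0)
    den = np.sqrt((t_c ** 2).sum(axis=0) * (i_c ** 2).sum(axis=0))
    with np.errstate(divide="ignore", invalid="ignore"):
        # Correlation is undefined when a variant is constant; mark it NaN.
        pr = np.where(den > 0, num / den, np.nan)

    # Assumption: undefined correlations are left out of the average.
    return cr.mean(), np.nanmean(pr)
```

CR tends to look high for rare variants because most animals carry the common homozygote, which is one reason studies such as this one report both measures.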
2
Evaluation of low-density SNP panels and imputation for cost-effective genomic selection in four aquaculture species. Front Genet 2023; 14:1194266. [PMID: 37252666 PMCID: PMC10213886 DOI: 10.3389/fgene.2023.1194266] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/26/2023] [Accepted: 04/26/2023] [Indexed: 05/31/2023] Open
Abstract
Genomic selection can accelerate genetic progress in aquaculture breeding programmes, particularly for traits measured on siblings of selection candidates. However, it is not widely implemented in most aquaculture species, and remains expensive due to high genotyping costs. Genotype imputation is a promising strategy that can reduce genotyping costs and facilitate the broader uptake of genomic selection in aquaculture breeding programmes. Genotype imputation can predict ungenotyped SNPs in populations genotyped at a low-density (LD), using a reference population genotyped at a high-density (HD). In this study, we used datasets of four aquaculture species (Atlantic salmon, turbot, common carp and Pacific oyster), phenotyped for different traits, to investigate the efficacy of genotype imputation for cost-effective genomic selection. The four datasets had been genotyped at HD, and eight LD panels (300-6,000 SNPs) were generated in silico. SNPs were selected to be: i) evenly distributed according to physical position, ii) selected to minimise the linkage disequilibrium between adjacent SNPs, or iii) randomly selected. Imputation was performed with three different software packages (AlphaImpute2, FImpute v.3 and findhap v.4). The results revealed that FImpute v.3 was faster and achieved higher imputation accuracies. Imputation accuracy increased with increasing panel density for both SNP selection methods, reaching correlations greater than 0.95 in the three fish species and 0.80 in Pacific oyster. In terms of genomic prediction accuracy, the LD and the imputed panels performed similarly, reaching values very close to the HD panels, except in the Pacific oyster dataset, where the LD panel performed better than the imputed panel.
In the fish species, when LD panels were used for genomic prediction without imputation, selection of markers based on either physical or genetic distance (instead of randomly) resulted in a high prediction accuracy, whereas imputation achieved near maximal prediction accuracy independently of the LD panel, showing higher reliability. Our results suggest that, in fish species, well-selected LD panels may achieve near maximal genomic selection prediction accuracy, and that the addition of imputation will result in maximal accuracy independently of the LD panel. These strategies represent effective and affordable methods to incorporate genomic selection into most aquaculture settings.
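One of the in silico panel designs described above picks SNPs evenly by physical position. The sketch below shows one plausible way to do that for a single chromosome: place equally spaced target coordinates and take the nearest available SNP to each. The function name `select_evenly_spaced` and the nearest-SNP rule are assumptions for illustration; the abstract does not describe the authors' exact procedure.

```python
import numpy as np

def select_evenly_spaced(positions, n_select):
    """Return indices (into the sorted positions) of roughly evenly spaced SNPs.

    positions: base-pair coordinates of the HD SNPs on one chromosome
    (at least two SNPs are assumed). n_select: desired LD panel size.
    """
    positions = np.sort(np.asarray(positions))

    # Equally spaced target coordinates spanning the chromosome.
    targets = np.linspace(positions[0], positions[-1], n_select)

    # For each target, locate the neighbouring SNPs on either side.
    idx = np.searchsorted(positions, targets)
    idx = np.clip(idx, 1, len(positions) - 1)
    left = positions[idx - 1]
    right = positions[idx]

    # Keep whichever neighbour is closer to the target.
    idx = np.where(targets - left <= right - targets, idx - 1, idx)

    # Two targets can map to the same SNP in sparse regions, so the
    # result may hold fewer than n_select markers.
    return np.unique(idx)
```

For example, `select_evenly_spaced(np.arange(0, 1000, 10), 10)` returns the indices of the SNPs at positions 0, 110, 220, and so on up to 990.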
3
Comparing Methods to Select Candidates for Re-Genotyping to Impute Higher-Density Genotype Data in a Japanese Black Cattle Population: A Case Study. Animals (Basel) 2023; 13:ani13040638. [PMID: 36830425 PMCID: PMC9951718 DOI: 10.3390/ani13040638] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/19/2022] [Revised: 02/04/2023] [Accepted: 02/10/2023] [Indexed: 02/15/2023] Open
Abstract
The MCA and MCG methods have been proposed as optimization methods to identify the best animals for dense genotyping when constructing a reference population for genotype imputation; they use the pedigree-based additive genetic relationship matrix (A matrix) and the genomic relationship matrix (G matrix), respectively. We assessed the performance of MCA and MCG methods using 575 Japanese Black cows. Pedigree data were provided to trace back up to five generations, and the A matrix was constructed while varying the pedigree depth from 1 to 5 (five MCA methods). Genotype information on 36,426 single-nucleotide polymorphisms was used to calculate the G matrix based on VanRaden's methods 1 and 2 (two MCG methods). The MCG always selected one cow per iteration, while MCA sometimes selected multiple cows. The number of commonly selected cows between the MCA and MCG methods was generally lower than that between different MCA methods or between different MCG methods. For the studied population, MCG appeared to be more reasonable than MCA in selecting cows as a reference population for higher-density genotype imputation to perform genomic prediction and a genome-wide association study.
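The G matrices mentioned above follow VanRaden's methods 1 and 2, which can be sketched in a few lines. Method 1 scales ZZ' by the summed expected heterozygosity, 2Σp(1−p); method 2 weights each SNP by the reciprocal of its own 2p(1−p) and divides by the number of SNPs. The sketch assumes 0/1/2 genotype coding and estimates allele frequencies from the sample itself; dropping monomorphic SNPs in method 2 is also an assumption made here to avoid division by zero. The MCA/MCG selection step is not reproduced.

```python
import numpy as np

def vanraden_g(genotypes, method=1):
    """Genomic relationship matrix from an (n_animals, n_snps) 0/1/2 array."""
    M = np.asarray(genotypes, dtype=float)

    # Assumption: allele frequencies are estimated from the genotyped sample.
    p = M.mean(axis=0) / 2.0
    Z = M - 2.0 * p                # centre each SNP column
    het = 2.0 * p * (1.0 - p)      # expected heterozygosity per SNP

    if method == 1:
        # G = Z Z' / (2 * sum p(1 - p))
        return Z @ Z.T / het.sum()
    if method == 2:
        # G = Z D Z' with D_jj = 1 / (m * 2 p_j (1 - p_j))
        keep = het > 0             # assumption: drop monomorphic SNPs
        Zs = Z[:, keep] / np.sqrt(het[keep])
        return Zs @ Zs.T / keep.sum()
    raise ValueError("method must be 1 or 2")
```

Method 2 gives more weight to low-frequency SNPs than method 1, so the two matrices can rank animals differently, which fits with the abstract treating them as two distinct MCG methods.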
4
An imputation-based genome-wide association study for growth and fatness traits in Sujiang pigs. Animal 2022; 16:100591. [PMID: 35872387 DOI: 10.1016/j.animal.2022.100591] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/23/2021] [Revised: 06/15/2022] [Accepted: 06/16/2022] [Indexed: 11/01/2022] Open
Abstract
Sujiang pigs are a synthetic breed derived from Jiangquhai, Fengjing, and Duroc pigs. In this study, we sequenced the genome of 62 pigs with a coverage depth of 10× to 20×, including 27 Sujiang and 35 founder breed pigs, and we collected 360 global pigs' genome sequence data from public databases including 39 Duroc pigs. We obtained a high-quality variant dataset of 365 Sujiang pigs by imputing the porcine 80 K single nucleotide polymorphism (SNP) Beadchip to the whole-genome scale with a total of 422 pigs as a reference panel. A dataset of 365 imputed Sujiang pigs was used to perform single-trait genome-wide association study (GWAS) and meta-analyses for growth and fatness traits. Single-trait GWAS identified 1 907, 18, and 14 SNPs surpassing the suggestively significant threshold for backfat thickness, chest circumference, and chest width, respectively. Meta-analyses identified 2 400 genome-wide significant SNPs and 520 suggestively significant SNPs for backfat thickness and chest circumference, and 719 genome-wide significant SNPs and 1 225 suggestively significant SNPs for all seven traits. According to the meta-analysis of backfat thickness and chest circumference, a remarkable region of 2.69 Mb on Sus scrofa chromosome 4 containing FAM110B, IMPAD1, LYN, MOS, PENK, PLAG1, SDR16C5 and XKR4 was identified as a candidate region. The haplotype heat map of the 2.69 Mb region verified that Sujiang pigs were derived from Duroc and Chinese indigenous pigs, especially Jiangquhai pigs. The Kruskal-Wallis test showed that haplotypes of the 2.69 Mb region significantly affected backfat thickness and chest circumference traits. We then focused on PLAG1, an important growth-related gene, and identified two synonymous SNPs with obvious differences among different breeds in the PLAG1 gene.
We then performed genotyping of 365 Sujiang, 150 Duroc, 95 Jiangquhai, and 100 Fengjing pigs to confirm the above result and verified that the two variants significantly affected phenotypes of growth and fatness traits. Our findings not only provide insights into the genetic architecture of porcine growth and fatness traits but also provide potential markers for selective breeding of these traits in Sujiang pigs.
5
Genotyping, the Usefulness of Imputation to Increase SNP Density, and Imputation Methods and Tools. METHODS IN MOLECULAR BIOLOGY (CLIFTON, N.J.) 2022; 2467:113-138. [PMID: 35451774 DOI: 10.1007/978-1-0716-2205-6_4] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Indexed: 12/19/2022]
Abstract
Imputation has become a standard practice in modern genetic research to increase genome coverage and improve accuracy of genomic selection and genome-wide association study, as a large number of samples can be genotyped at lower density (and lower cost) and imputed up to denser marker panels or to sequence level, using information from a limited reference population. Most genotype imputation algorithms use information from relatives and population linkage disequilibrium. A number of imputation software packages were developed originally for human genetics and, more recently, for animal and plant genetics considering pedigree information and very sparse SNP arrays or genotyping-by-sequencing data. In comparison to human populations, the population structures in farmed species and their limited effective sizes allow accurate imputation of high-density genotypes or sequences from very low-density SNP panels and a limited set of reference individuals. Whatever the imputation method, the imputation accuracy, measured by the correct imputation rate or the correlation between true and imputed genotypes, increased with the increasing relatedness of the individual to be imputed with its denser genotyped ancestors and as its own genotype density increased. Increasing the imputation accuracy pushes up the genomic selection accuracy whatever the genomic evaluation method. Given the marker densities, the most important factors affecting imputation accuracy are clearly the size of the reference population and the relationship between individuals in the reference and target populations.
6
Comparison of the choice of animals for re-sequencing in two maternal pig lines. Genet Sel Evol 2022; 54:16. [PMID: 35183111 PMCID: PMC8858453 DOI: 10.1186/s12711-022-00706-w] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/25/2021] [Accepted: 01/31/2022] [Indexed: 11/10/2022] Open
Abstract
Next-generation sequencing is a promising approach for the detection of causal variants within previously identified quantitative trait loci. Because of the costs of re-sequencing experiments, this application is currently mainly restricted to subsets of animals from already genotyped populations. Imputation from a lower to a higher marker density could represent a useful complementary approach. An analysis of the literature shows that several strategies are available to select animals for re-sequencing. This study demonstrates an animal selection workflow under practical conditions. Our approach considers different data sources and limited resources such as budget and availability of sampling material. The workflow combines previously described approaches and makes use of genotype and pedigree information from a Landrace and Large White population. Genotypes were phased and haplotypes were accurately estimated with AlphaPhase. Then, AlphaSeqOpt was used to optimize selection of animals for re-sequencing, reflecting the existing diversity of haplotypes. AlphaSeqOpt and ENDOG were used to select individuals based on pedigree information and by taking into account key animals that represent the genetic diversity of the populations. After the best selection criteria were determined, a subset of 57 animals was selected for subsequent re-sequencing. In order to evaluate and assess the advantage of this procedure, imputation accuracy was assessed by setting a set of single nucleotide polymorphism (SNP) chip genotypes to missing. Accuracy values were compared to those of alternative selection scenarios and the results showed the clear benefits of a targeted selection within this practical-driven approach. Especially imputation of low-frequency markers benefits from the combined approach described here. Accuracy was increased by up to 12% compared to a randomized or exclusively haplotype-based selection of sequencing candidates.
7
Assessing single-nucleotide polymorphism selection methods for the development of a low-density panel optimized for imputation in South African Drakensberger beef cattle. J Anim Sci 2021; 99:6226920. [PMID: 33860324 DOI: 10.1093/jas/skab118] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/11/2020] [Accepted: 04/14/2021] [Indexed: 11/13/2022] Open
Abstract
A major obstacle in applying genomic selection (GS) to uniquely adapted local breeds in less-developed countries has been the cost of genotyping at high densities of single-nucleotide polymorphisms (SNP). Cost reduction can be achieved by imputing genotypes from lower to higher densities. Locally adapted breeds tend to be admixed and exhibit a high degree of genomic heterogeneity thus necessitating the optimization of SNP selection for downstream imputation. The aim of this study was to quantify the achievable imputation accuracy for a sample of 1,135 South African (SA) Drakensberger cattle using several custom-derived lower-density panels varying in both SNP density and how the SNP were selected. From a pool of 120,608 genotyped SNP, subsets of SNP were chosen (1) at random, (2) with even genomic dispersion, (3) by maximizing the mean minor allele frequency (MAF), (4) using a combined score of MAF and linkage disequilibrium (LD), (5) using a partitioning-around-medoids (PAM) algorithm, and finally (6) using a hierarchical LD-based clustering algorithm. Imputation accuracy to higher density improved as SNP density increased; animal-wise imputation accuracy defined as the within-animal correlation between the imputed and actual alleles ranged from 0.625 to 0.990 when 2,500 randomly selected SNP were chosen vs. a range of 0.918 to 0.999 when 50,000 randomly selected SNP were used. At a panel density of 10,000 SNP, the mean (standard deviation) animal-wise allele concordance rate was 0.976 (0.018) vs. 0.982 (0.014) when the worst (i.e., random) as opposed to the best (i.e., combination of MAF and LD) SNP selection strategy was employed. A difference of 0.071 units was observed between the mean correlation-based accuracy of imputed SNP categorized as low (0.01 < MAF ≤ 0.1) vs. high MAF (0.4 < MAF ≤ 0.5). Greater mean imputation accuracy was achieved for SNP located on autosomal extremes when these regions were populated with more SNP. 
The presented results suggested that genotype imputation can be a practical cost-saving strategy for indigenous breeds such as the SA Drakensberger. Based on the results, a genotyping panel consisting of ~10,000 SNP selected based on a combination of MAF and LD would suffice in achieving a <3% imputation error rate for a breed characterized by genomic admixture on the condition that these SNP are selected based on breed-specific selection criteria.
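Several results in this abstract depend on minor allele frequency (MAF), both as a SNP selection criterion and as a way to group SNPs when reporting accuracy. A minimal sketch of the MAF calculation and of the two classes quoted above (low: 0.01 < MAF ≤ 0.1; high: 0.4 < MAF ≤ 0.5) follows. It assumes 0/1/2 allele-count coding with no missing genotypes; the function names are illustrative, not taken from the study.

```python
import numpy as np

def minor_allele_frequency(genotypes):
    """MAF per SNP from an (n_animals, n_snps) array of 0/1/2 allele counts."""
    freq = np.asarray(genotypes, dtype=float).mean(axis=0) / 2.0
    # The minor allele is whichever of the two alleles is rarer.
    return np.minimum(freq, 1.0 - freq)

def maf_class_mask(maf, lower, upper):
    """Boolean mask for SNPs with lower < MAF <= upper."""
    maf = np.asarray(maf)
    return (maf > lower) & (maf <= upper)

# Class boundaries as quoted in the abstract.
LOW_MAF = (0.01, 0.1)
HIGH_MAF = (0.4, 0.5)
```

A per-class accuracy can then be obtained by averaging any SNP-wise accuracy vector over `maf_class_mask(maf, *LOW_MAF)` and `maf_class_mask(maf, *HIGH_MAF)` separately.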
8
Assessing the accuracy of imputation in the Gyr breed using different SNP panels. Genome 2021; 64:893-899. [PMID: 34057850 DOI: 10.1139/gen-2020-0081] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/22/2022]
Abstract
The aim of this study was to evaluate the accuracy of imputation in a Gyr population using two medium-density panels (Bos taurus - Bos indicus) and to test whether the inclusion of the Nellore breed increases the imputation accuracy in the Gyr population. The database consisted of 289 Gyr females from Brazil genotyped with the GGP Bovine LDv4 chip containing 30 000 SNPs and 158 Gyr females from Colombia genotyped with the GGP indicus chip containing 35 000 SNPs. A customized chip was created that contained the information of 9109 SNPs (9K) to test the imputation accuracy in Gyr populations; 604 Nellore animals with information of LD SNPs tested in the scenarios were included in the reference population. Four scenarios were tested: LD9K_30KGIR, LD9K_35INDGIR, LD9K_30KGIR_NEL, and LD9K_35INDGIR_NEL. Principal component analysis (PCA) was computed for the genomic matrix and sample-specific imputation accuracies were calculated using Pearson's correlation (CS) and the concordance rate (CR) for imputed genotypes. The results of PCA of the Colombian and Brazilian Gyr populations demonstrated the genomic relationship between the two populations. The CS and CR ranged from 0.88 to 0.94 and from 0.93 to 0.96, respectively. Among the scenarios tested, the highest CS (0.94) was observed for the LD9K_30KGIR scenario. The present results highlight the importance of the choice of chip for imputation in the Gyr breed. However, the variation in SNPs may reduce the imputation accuracy even when the chip of the Bos indicus subspecies is used.
9
Imputation accuracy to whole-genome sequence in Nellore cattle. Genet Sel Evol 2021; 53:27. [PMID: 33711929 PMCID: PMC7953568 DOI: 10.1186/s12711-021-00622-5] [Citation(s) in RCA: 11] [Impact Index Per Article: 3.7] [Received: 08/08/2020] [Accepted: 03/05/2021] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND A cost-effective strategy to explore the complete DNA sequence in animals for genetic evaluation purposes is to sequence key ancestors of a population, followed by imputation mechanisms to infer marker genotypes that were not originally reported in a target population of animals genotyped with single nucleotide polymorphism (SNP) panels. The feasibility of this process relies on the accuracy of the genotype imputation in that population, particularly for potential causal mutations which may be at low frequency and either within genes or regulatory regions. The objective of the present study was to investigate the imputation accuracy to the sequence level in a Nellore beef cattle population, including that for variants in annotation classes which are more likely to be functional. METHODS Information of 151 key sequenced Nellore sires were used to assess the imputation accuracy from bovine HD BeadChip SNP (~ 777 k) to whole-genome sequence. The choice of the sires aimed at optimizing the imputation accuracy of a genotypic database, comprised of about 10,000 genotyped Nellore animals. Genotype imputation was performed using two computational approaches: FImpute3 and Minimac4 (after using Eagle for phasing). The accuracy of the imputation was evaluated using a fivefold cross-validation scheme and measured by the squared correlation between observed and imputed genotypes, calculated by individual and by SNP. SNPs were classified into a range of annotations, and the accuracy of imputation within each annotation classification was also evaluated. RESULTS High average imputation accuracies per animal were achieved using both FImpute3 (0.94) and Minimac4 (0.95). On average, common variants (minor allele frequency (MAF) > 0.03) were more accurately imputed by Minimac4 and low-frequency variants (MAF ≤ 0.03) were more accurately imputed by FImpute3. 
The inherent Minimac4 Rsq imputation quality statistic appears to be a good indicator of the empirical Minimac4 imputation accuracy. Both software packages provided high average SNP-wise imputation accuracy for all classes of biological annotations. CONCLUSIONS Our results indicate that imputation to whole-genome sequence is feasible in Nellore beef cattle since high imputation accuracies per individual are expected. SNP-wise imputation accuracy is software-dependent, especially for rare variants. The accuracy of imputation appears to be relatively independent of annotation classification.
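The evaluation scheme in this abstract, a fivefold cross-validation scored by the squared correlation between observed and imputed genotypes per individual and per SNP, can be sketched as follows. The imputation step itself (FImpute3 or Minimac4) is not reproduced; the sketch covers only the fold assignment and the scoring. The helper names, the random fold assignment, and the 0/1/2 coding are assumptions for illustration.

```python
import numpy as np

def kfold_indices(n_animals, k=5, seed=0):
    """Split animal indices into k near-equal random folds (assumed scheme)."""
    rng = np.random.default_rng(seed)
    return np.array_split(rng.permutation(n_animals), k)

def squared_correlation(observed, imputed, axis):
    """Squared Pearson correlation between observed and imputed genotypes.

    axis=1 gives one value per individual (across SNPs);
    axis=0 gives one value per SNP (across individuals).
    """
    o = np.asarray(observed, dtype=float)
    i = np.asarray(imputed, dtype=float)
    o_c = o - o.mean(axis=axis, keepdims=True)
    i_c = i - i.mean(axis=axis, keepdims=True)
    num = (o_c * i_c).sum(axis=axis) ** 2
    den = (o_c ** 2).sum(axis=axis) * (i_c ** 2).sum(axis=axis)
    with np.errstate(divide="ignore", invalid="ignore"):
        # Undefined when either vector is constant; reported as NaN.
        return np.where(den > 0, num / den, np.nan)
```

In each round, the animals in one fold would have their sequence genotypes masked down to the array SNPs, be imputed from the remaining folds, and then be scored with `squared_correlation` against their held-out sequence genotypes.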
10
Marker selection and genomic prediction of economically important traits using imputed high-density genotypes for 5 breeds of dairy cattle. J Dairy Sci 2021; 104:4478-4485. [PMID: 33612229 DOI: 10.3168/jds.2020-19260] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 07/09/2020] [Accepted: 11/22/2020] [Indexed: 11/19/2022]
Abstract
Marker sets used in US dairy genomic predictions were previously expanded by including high-density (HD) or sequence markers with the largest effects for Holstein breed only. Other non-Holstein breeds lacked enough HD genotyped animals to be used as a reference population at that time, and thus were not included in the genomic prediction. Recently, numbers of non-Holstein breeds genotyped using HD panels reached an acceptable level for imputation and marker selection, allowing HD genomic prediction and HD marker selection for Holstein plus 4 other breeds. Genotypes for 351,461 Holsteins, 347,570 Jerseys, 42,346 Brown Swiss, 9,364 Ayrshires (including Red dairy cattle), and 4,599 Guernseys were imputed to the HD marker list that included 643,059 SNP. The separate HD reference populations included Illumina BovineHD (San Diego, CA) genotypes for 4,012 Holsteins, 407 Jerseys, 181 Brown Swiss, 527 Ayrshires, and 147 Guernseys. The 643,059 variants included the HD SNP and all 79,254 (80K) genetic markers and QTL used in routine national genomic evaluations. Before imputation, approximately 91 to 97% of genotypes were unknown for each breed; after imputation, 1.1% of Holstein, 3.2% of Jersey, 6.7% of Brown Swiss, 4.8% of Ayrshire, and 4.2% of Guernsey alleles remained unknown due to lower density haplotypes that had no matching HD haplotype. The higher remaining missing rates in non-Holstein breeds are mainly due to fewer HD genotyped animals in the imputation reference populations. Allele effects for up to 39 traits were estimated separately within each breed using phenotypic reference populations that included up to 6,157 Jersey males and 110,130 Jersey females. Correlations of HD with 80K genomic predictions for young animals averaged 0.986, 0.989, 0.985, 0.992, and 0.978 for Jersey, Ayrshire, Brown Swiss, Guernsey, and Holstein breeds, respectively. 
Correlations were highest for yield traits (about 0.991) and lowest for foot angle and rear legs-side view (0.981 and 0.982, respectively). Some HD effects were more than twice as large as the largest 80K SNP effect, and HD markers had larger effects than nearby 80K markers for many breed-trait combinations. Previous studies selected and included markers with large effects for Holstein traits; the newly selected HD markers should also improve non-Holstein and crossbred genomic predictions and were added to official US genomic predictions in April 2020.
11
Imputation for sequencing variants preselected to a customized low-density chip. Sci Rep 2020; 10:9524. [PMID: 32533087 PMCID: PMC7293337 DOI: 10.1038/s41598-020-66523-7] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Received: 11/11/2019] [Accepted: 05/19/2020] [Indexed: 12/27/2022] Open
Abstract
Sequencing variants preselected from association analyses and bioinformatics analyses could improve genomic prediction. In this study, the imputation of sequencing SNPs preselected from major dairy breeds in Denmark-Finland-Sweden (DFS) and France (FRA) was investigated for both contemporary animals and old bulls in Danish Jersey. For contemporary animals, a two-step imputation which first imputed to 54 K and then to 54 K + DFS + FRA SNPs achieved the highest accuracy. Correlations between observed and imputed genotypes were 91.6% for DFS SNPs and 87.6% for FRA SNPs, while concordance rates were 96.6% for DFS SNPs and 93.5% for FRA SNPs. The SNPs with lower minor allele frequency (MAF) tended to have lower correlations but higher concordance rates. For old bulls, imputation for DFS and FRA SNPs was relatively accurate even for bulls without progeny (correlations higher than 97.2% and concordance rates higher than 98.4%). For contemporary animals, given limited imputation accuracy of preselected sequencing SNPs especially for SNPs with low MAF, it would be a good strategy to directly genotype preselected sequencing SNPs with a customized SNP chip. For old bulls, given high imputation accuracy for preselected sequencing SNPs with all MAF ranges, it would be unnecessary to re-genotype preselected sequencing SNPs.
12
Genomic Analysis Using Bayesian Methods under Different Genotyping Platforms in Korean Duroc Pigs. Animals (Basel) 2020; 10:ani10050752. [PMID: 32344859 PMCID: PMC7277155 DOI: 10.3390/ani10050752] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.3] [Received: 03/17/2020] [Revised: 04/16/2020] [Accepted: 04/22/2020] [Indexed: 12/03/2022] Open
Abstract
Simple Summary This study investigated the informative regions and the efficiency of genomic predictions for backfat thickness, days to 90 kg body weight, loin muscle area, and lean percentage in Korean Duroc pigs. Several regions of the genome were identified and a significant marker was found near the MC4R gene for growth and production-related traits. No differences in genomic accuracy were identified on the basis of the Bayesian approaches in these four growth and production-related traits. The genomic accuracy is improved by using deregressed estimated breeding values including parental information as a response variable in Korean Duroc pigs. Abstract Genomic evaluation has been widely applied to several species using commercial single nucleotide polymorphism (SNP) genotyping platforms. This study investigated the informative genomic regions and the efficiency of genomic prediction by using two Bayesian approaches (BayesB and BayesC) under two moderate-density SNP genotyping panels in Korean Duroc pigs. Growth and production records of 1026 individuals were genotyped using two medium-density, SNP genotyping platforms: Illumina60K and GeneSeek80K. These platforms consisted of 61,565 and 68,528 SNP markers, respectively. The deregressed estimated breeding values (DEBVs) derived from estimated breeding values (EBVs) and their reliabilities were taken as response variables. Two Bayesian approaches were implemented to perform the genome-wide association study (GWAS) and genomic prediction. Multiple significant regions for days to 90 kg (DAYS), loin muscle area (LMA), and lean percent (PCL) were detected. The most significant SNP marker, located near the MC4R gene, was detected using GeneSeek80K. Accuracy of genomic predictions was higher using the GeneSeek80K SNP panel for DAYS (Δ2%) and LMA (Δ2–3%) with two response variables, with no gains in accuracy by the Bayesian approaches in four growth and production-related traits.
Genomic prediction is best derived from DEBVs including parental information as a response variable between two DEBVs regardless of the genotyping platform and the Bayesian method for genomic prediction accuracy in Korean Duroc pig breeding.
13
Comparisons of improved genomic predictions generated by different imputation methods for genotyping by sequencing data in livestock populations. J Anim Sci Biotechnol 2020; 11:3. [PMID: 31921417 PMCID: PMC6947967 DOI: 10.1186/s40104-019-0407-9] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.3] [Received: 07/03/2019] [Accepted: 11/26/2019] [Indexed: 11/16/2022] Open
Abstract
Background Genotyping by sequencing (GBS) still has problems with missing genotypes. Imputation is important for using GBS for genomic predictions, especially for low depths, due to the large number of missing genotypes. Minor allele frequency (MAF) is widely used as a marker data editing criterion for genomic predictions. In this study, three imputation methods (Beagle, IMPUTE2 and FImpute software) based on four MAF editing criteria were investigated with regard to imputation accuracy of missing genotypes and accuracy of genomic predictions, based on simulated data of livestock population. Results Four MAFs (no MAF limit, MAF ≥ 0.001, MAF ≥ 0.01 and MAF ≥ 0.03) were used for editing marker data before imputation. Beagle, IMPUTE2 and FImpute software were applied to impute the original GBS. Additionally, IMPUTE2 also imputed the expected genotype dosage after genotype correction (GcIM). The reliability of genomic predictions was calculated using GBS and imputed GBS data. The results showed that imputation accuracies were the same for the three imputation methods, except for the data of sequencing read depth (depth) = 2, where FImpute had a slightly lower imputation accuracy than Beagle and IMPUTE2. GcIM was observed to be the best for all of the imputations at depth = 4, 5 and 10, but the worst for depth = 2. For genomic prediction, retaining more SNPs with no MAF limit resulted in higher reliability. As the depth increased to 10, the prediction reliabilities approached those using true genotypes in the GBS loci. Beagle and IMPUTE2 had the largest increases in prediction reliability of 5 percentage points, and FImpute gained 3 percentage points at depth = 2. The best prediction was observed at depth = 4, 5 and 10 using GcIM, but the worst prediction was also observed using GcIM at depth = 2. Conclusions The current study showed that imputation accuracies were relatively low for GBS with low depths and high for GBS with high depths.
Imputation resulted in larger gains in the reliability of genomic predictions for GBS with lower depths. These results suggest that the application of IMPUTE2, based on a corrected GBS (GcIM) to improve genomic predictions for higher depths, and FImpute software could be a good alternative for routine imputation.
|
14
|
Evaluation of imputation accuracy using the combination of two high-density panels in Nelore beef cattle. Sci Rep 2019; 9:17920. [PMID: 31784673 PMCID: PMC6884513 DOI: 10.1038/s41598-019-54382-w] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Received: 02/15/2019] [Accepted: 11/12/2019] [Indexed: 11/17/2022] Open
Abstract
This study compared imputation from lower-density commercial and customized panels to high-density panels and a combined panel (Illumina and Affymetrix) in Nelore beef cattle. Additionally, linkage disequilibrium and haplotype block conformation were estimated in individual high-density panels and compared with corresponding values in the combined panel after imputation. Overall, 814 animals were genotyped using BovineHD BeadChip (IllumHD), and 93 of these animals were also genotyped using the Axiom Genome-Wide BOS 1 Array Plate (AffyHD). In general, customization considering linkage disequilibrium and minor allele frequency had the highest accuracies. The IllumHD panel had higher values of linkage disequilibrium for short distances between SNPs than AffyHD and the combined panel. The combined panel had an increased number of small haplotype blocks. The use of a combined panel is recommended due to its increased density and number of haplotype blocks, which in turn increase the probability of a marker being close to a quantitative trait locus of interest. Considering common SNPs between IllumHD and AffyHD for the customization of a low-density panel increases the imputation accuracy for IllumHD, AffyHD and the combined panel.
|
15
|
Generating High Density, Low Cost Genotype Data in Soybean [Glycine max (L.) Merr.]. G3 (Bethesda) 2019; 9:2153-2160. [PMID: 31072870 PMCID: PMC6643887 DOI: 10.1534/g3.119.400093] [Citation(s) in RCA: 13] [Impact Index Per Article: 2.6] [Received: 02/13/2019] [Accepted: 05/01/2019] [Indexed: 11/18/2022]
Abstract
Obtaining genome-wide genotype information for millions of SNPs in soybean [Glycine max (L.) Merr.] often involves completely resequencing a line at 5X or greater coverage. Currently, hundreds of soybean lines have been resequenced at high depth levels with their data deposited in the NCBI Short Read Archive. This publicly available dataset may be leveraged as an imputation reference panel in combination with skim (low coverage) sequencing of new soybean genotypes to economically obtain high-density SNP information. Ninety-nine soybean lines resequenced at an average of 17.1X were used to generate a reference panel, with over 10 million SNPs called using GATK's Haplotype Caller tool. Whole genome resequencing at approximately 1X depth was performed on 114 previously ungenotyped experimental soybean lines. Coverages down to 0.1X were analyzed by randomly subsetting raw reads from the original 1X sequence data. SNPs discovered in the reference panel were genotyped in the experimental lines after aligning to the soybean reference genome, and missing markers imputed using Beagle 4.1. Sequencing depth of the experimental lines could be reduced to 0.3X while still retaining an accuracy of 97.8%. Accuracy was inversely related to minor allele frequency, and highly correlated with marker linkage disequilibrium. The high accuracy of skim sequencing combined with imputation provides a low cost method for obtaining dense genotypic information that can be used for various genomics applications in soybean.
|
16
|
Optimizing Selection of the Reference Population for Genotype Imputation From Array to Sequence Variants. Front Genet 2019; 10:510. [PMID: 31214246 PMCID: PMC6554347 DOI: 10.3389/fgene.2019.00510] [Citation(s) in RCA: 11] [Impact Index Per Article: 2.2] [Received: 01/15/2019] [Accepted: 05/10/2019] [Indexed: 11/29/2022] Open
Abstract
Imputation of high-density genotypes to whole-genome sequences (WGS) is a cost-effective method to increase the density of available markers within a population. Imputed genotypes have been successfully used for genomic selection and discovery of variants associated with traits of interest for the population. To allow for the use of imputed genotypes for genomic analyses, accuracy of imputation must be high. Accuracy of imputation is influenced by multiple factors, such as size and composition of the reference group, and the allele frequency of variants included. Understanding the use of imputed WGSs prior to the generation of the reference population is important, as accurate imputation might be more focused, for instance, on common or on rare variants. The aim of this study was to present and evaluate new methods to select animals for sequencing relying on a previously genotyped population. The Genetic Diversity Index method optimizes the number of unique haplotypes in the future reference population, while the Highly Segregating Haplotype selection method targets haplotype alleles found throughout the majority of the population of interest. First the WGSs of a dairy cattle population were simulated. The simulated sequences mimicked the linkage disequilibrium level and the variants’ frequency distribution observed in currently available Holstein sequences. Then, reference populations of different sizes, in which animals were selected using both novel methods proposed here as well as two other methods presented in previous studies, were created. Finally, accuracies of imputation obtained with different reference populations were compared against each other. The novel methods were found to have overall accuracies of imputation of more than 0.85. Accuracies of imputation of rare variants reached values above 0.50. 
In conclusion, if imputed sequences are to be used for discovery of novel associations between variants and traits of interest in the population, animals carrying novel information should be selected and, consequently, the Genetic Diversity Index method proposed here may be used. If sequences are to be used to impute the overall genotyped population, a reference population consisting of carriers of common haplotypes, selected using the proposed Highly Segregating Haplotype method, is recommended.
|
17
|
Genome-wide association studies revealed candidate genes for tail fat deposition and body size in the Hulun Buir sheep. J Anim Breed Genet 2019; 136:362-370. [PMID: 31045295 DOI: 10.1111/jbg.12402] [Citation(s) in RCA: 31] [Impact Index Per Article: 6.2] [Received: 08/09/2018] [Revised: 03/25/2019] [Accepted: 03/28/2019] [Indexed: 01/01/2023]
Abstract
Fat-tailed sheep have a unique characteristic of depositing fat in their tails. In the present study, we conducted genome-wide association studies (GWAS) on traits related to tail fat deposition and body size in the Hulun Buir sheep. A total number of 300 individuals belonging to two fat-tailed lines of the Hulun Buir sheep breed genotyped with the Ovine Infinium HD SNP BeadChip were included in the current study. Two mixed models, one for continuous and one for binary phenotypic traits, were employed to analyse ten traits, that is, body length (BL), body height (BH), chest girth (CG), tail length (TL), tail width (TW), tail circumference (TC), carcass weight (CW), tail fat weight (TF), ratio of CW to TF (RCT) and tail type (TT). We identified 7, 6, 7, 2, 10 and 1 SNPs significantly associated with traits TF, CW, RCT, TW, TT and CG, respectively. Their associated genomic regions harboured 42 positional candidate genes. Out of them, 13 candidate genes including SMURF2, FBF1, DTNBP1, SETD7 and RBM11 have been associated with fat metabolism in sheep. The RBM11 gene has already been identified in a previous study on signatures of selection in this specific sheep population. Two more genes, that is, SMARCA5 and GAB1 were associated with body size in sheep. The present study has identified candidate genes that might be implicated in tail fat deposition and body size in sheep.
|
18
|
Uncovering Genomic Regions Associated With 36 Agro-Morphological Traits in Indian Spring Wheat Using GWAS. Front Plant Sci 2019; 10:527. [PMID: 31134105 PMCID: PMC6511880 DOI: 10.3389/fpls.2019.00527] [Citation(s) in RCA: 13] [Impact Index Per Article: 2.6] [Received: 12/20/2018] [Accepted: 04/04/2019] [Indexed: 05/13/2023]
Abstract
Wheat genetic improvement by integration of advanced genomic technologies is one way of improving productivity. To facilitate the breeding of economically important traits in wheat, SNP loci and underlying candidate genes associated with 36 agro-morphological traits were studied in a diverse panel of 404 genotypes. By using the Breeders' 35K Axiom array in a comprehensive genome-wide association study covering 4364.79 cM of the wheat genome and applying a compressed mixed linear model, a total of 146 SNPs (-log10 P ≥ 4) were found to be associated with 23 of the 36 traits studied, explaining 3.7-47.0% of phenotypic variance. To reveal this, a subset of 260 genotypes was characterized phenotypically for six quantitative traits [days to heading (DTH), days to maturity (DTM), plant height (PH), spike length (SL), awn length (Awn_L), and leaf length (Leaf_L)] under five environments. Gene annotations mined ∼38 putative candidate genes, which were confirmed using tissue- and stage-specific gene expression data from RNA-Seq. We observed strong co-localized loci for four traits (glume pubescence, SL, PH, and awn color) on chromosome 1B (24.64 cM), annotated with five putative candidate genes. This study led to the discovery of hitherto unreported loci for some less explored traits (such as leaf sheath wax, awn attitude, and glume pubescence), besides refining the chromosomal regions of known loci associated with the traits. This study provides valuable information on the genetic loci and their potential genes underlying traits such as awn characters, which are being considered as important contributors toward yield enhancement.
|
19
|
Finding the Optimal Imputation Strategy for Small Cattle Populations. Front Genet 2019; 10:52. [PMID: 30833959 PMCID: PMC6387911 DOI: 10.3389/fgene.2019.00052] [Citation(s) in RCA: 24] [Impact Index Per Article: 4.8] [Received: 10/12/2018] [Accepted: 01/21/2019] [Indexed: 01/08/2023] Open
Abstract
The imputation from lower density SNP chip genotypes to whole-genome sequence level is an established approach to generate high density genotypes for many individuals. Imputation accuracy is dependent on many factors and for small cattle populations such as the endangered German Black Pied cattle (DSN), determining the optimal imputation strategy is especially challenging since only a low number of high density genotypes is available. In this paper, the accuracy of imputation was explored with regard to (1) phasing of the target population and the reference panel for imputation, (2) comparison of a 1-step imputation approach, where 50 k genotypes are directly imputed to sequence level, to a 2-step imputation approach that used an intermediate step imputing first to 700 k and subsequently to sequence level, (3) the software tools Beagle and Minimac, and (4) the size and composition of the reference panel for imputation. Analyses were performed for 30 DSN and 30 Holstein Friesian cattle available from the 1000 Bull Genomes Project. Imputation accuracy was assessed using a leave-one-out cross validation procedure. We observed that phasing of the target populations and the reference panels affects the imputation accuracy significantly. Minimac reached higher accuracy when imputing using small reference panels, while Beagle performed better with larger reference panels. In contrast to previous research, we found that when a low number of animals is available at the intermediate imputation step, the 1-step imputation approach yielded higher imputation accuracy compared to a 2-step imputation. Overall, the size of the reference panel for imputation is the most important factor leading to higher imputation accuracy, although using a larger reference panel consisting of a related but different breed (Holstein Friesian) significantly reduced imputation accuracy. Our findings provide specific recommendations for populations with a limited number of high density genotyped or sequenced animals available such as DSN. The overall recommendations when imputing a small population are to (1) use a large reference panel of the same breed, (2) use a large reference panel consisting of diverse breeds, or (3) when a large reference panel is not available, use a smaller same-breed reference panel without including a different related breed.
|
20
|
Genotype imputation accuracy in multiple equine breeds from medium- to high-density genotypes. J Anim Breed Genet 2018; 135:420-431. [DOI: 10.1111/jbg.12358] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.3] [Received: 01/16/2018] [Revised: 08/17/2018] [Accepted: 08/24/2018] [Indexed: 01/27/2023]
|
21
|
Accuracy of imputation of single-nucleotide polymorphism marker genotypes for water buffaloes (Bubalus bubalis) using different reference population sizes and imputation tools. Livest Sci 2018. [DOI: 10.1016/j.livsci.2018.08.009] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Indexed: 01/09/2023]
|
22
|
Genotype imputation from various low-density SNP panels and its impact on accuracy of genomic breeding values in pigs. Animal 2018; 12:2235-2245. [PMID: 29706144 DOI: 10.1017/s175173111800085x] [Citation(s) in RCA: 13] [Impact Index Per Article: 2.2] [Indexed: 11/05/2022] Open
Abstract
The uptake of genomic selection (GS) by the swine industry is still limited by the costs of genotyping. A feasible alternative to overcome this challenge is to genotype animals using an affordable low-density (LD) single nucleotide polymorphism (SNP) chip panel followed by accurate imputation to a high-density panel. Therefore, the main objective of this study was to screen incremental densities of LD panels in order to systematically identify one that balances the tradeoffs among imputation accuracy, prediction accuracy of genomic estimated breeding values (GEBVs), and genotype density (directly associated with genotyping costs). Genotypes using the Illumina Porcine60K BeadChip were available for 1378 Duroc (DU), 2361 Landrace (LA) and 3192 Yorkshire (YO) pigs. In addition, pseudo-phenotypes (de-regressed estimated breeding values) for five economically important traits were provided for the analysis. The reference population for genotype imputation consisted of 931 DU, 1631 LA and 2103 YO animals and the remaining individuals were included in the validation population of each breed. An LD panel of 3000 evenly spaced SNPs (LD3K) yielded high imputation accuracy rates: 93.78% (DU), 97.07% (LA) and 97.00% (YO) and high correlations (>0.97) between the predicted GEBVs using the actual 60 K SNP genotypes and the imputed 60 K SNP genotypes for all traits and breeds. The imputation accuracy was influenced by the reference population size as well as the amount of parental genotype information available in the reference population. However, parental genotype information became less important when the LD panel had at least 3000 SNPs. The correlation of the GEBVs directly increased with an increase in imputation accuracy. When genotype information for both parents was available, a panel of 300 SNPs (imputed to 60 K) yielded GEBV predictions highly correlated (⩾0.90) with genomic predictions obtained based on the true 60 K panel, for all traits and breeds. For a small reference population with no parents in the reference population, the use of a panel at least as dense as the LD3K is recommended and, when both parents are in the reference population, a panel as small as the LD300 might be a feasible option. These findings are of great importance for the development of LD panels for swine in order to reduce genotyping costs, increase the uptake of GS and, therefore, optimize the profitability of the swine industry.
|
23
|
Imputation from SNP chip to sequence: a case study in a Chinese indigenous chicken population. J Anim Sci Biotechnol 2018; 9:30. [PMID: 29581880 PMCID: PMC5861640 DOI: 10.1186/s40104-018-0241-5] [Citation(s) in RCA: 27] [Impact Index Per Article: 4.5] [Received: 08/06/2017] [Accepted: 01/26/2018] [Indexed: 11/24/2022] Open
Abstract
Background Genome-wide association studies and genomic predictions are thought to be optimized by using whole-genome sequence (WGS) data. However, sequencing thousands of individuals of interest is expensive. Imputation from SNP panels to WGS data is an attractive and less expensive approach to obtain WGS data. The aims of this study were to investigate the accuracy of imputation and to provide insight into the design and execution of genotype imputation. Results We genotyped 450 chickens with a 600 K SNP array, and sequenced 24 key individuals by whole genome re-sequencing. Accuracy of imputation from putative 60 K and 600 K array data to WGS data was 0.620 and 0.812 for Beagle, and 0.810 and 0.914 for FImpute, respectively. By increasing the sequencing cost from 24X to 144X, the imputation accuracy increased from 0.525 to 0.698 for Beagle and from 0.654 to 0.823 for FImpute. With fixed sequence depth (12X), increasing the number of sequenced animals from 1 to 24 improved accuracy from 0.421 to 0.897 for FImpute and from 0.396 to 0.777 for Beagle. Using optimally selected key individuals resulted in a higher imputation accuracy compared with using randomly selected individuals as a reference population for re-sequencing. With fixed reference population size (24), imputation accuracy increased from 0.654 to 0.875 for FImpute and from 0.512 to 0.762 for Beagle as the sequencing depth increased from 1X to 12X. With a given total cost of genotyping, accuracy increased with the size of the reference population for FImpute, but the pattern was not valid for Beagle, which showed the highest accuracy at six-fold coverage for the scenarios used in this study. Conclusions In conclusion, we comprehensively investigated the impacts of several key factors on genotype imputation. Generally, increasing sequencing cost gave a higher imputation accuracy. However, with a fixed sequencing cost, an optimal imputation strategy enhances the performance of WGP and GWAS. An optimal imputation strategy should comprehensively take into consideration the size of the reference population, imputation algorithms, marker density, population structure of the target population, and methods to select key individuals. This work sheds additional light on how to design and execute genotype imputation for livestock populations.
|
24
|
Genotype Imputation and Accuracy Evaluation in Racing Quarter Horses Genotyped Using Different Commercial SNP Panels. J Equine Vet Sci 2017. [DOI: 10.1016/j.jevs.2017.07.012] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.7] [Indexed: 11/30/2022]
|
25
|
Linkage disequilibrium among commonly genotyped SNP variants detected from bull sequence. Anim Genet 2017; 48:516-522. [DOI: 10.1111/age.12579] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.4] [Accepted: 05/21/2017] [Indexed: 11/29/2022]
|
26
|
Justification for setting the individual animal genotype call rate threshold at eighty-five percent. J Anim Sci 2017; 94:4558-4569. [PMID: 27898963 DOI: 10.2527/jas.2016-0802] [Citation(s) in RCA: 17] [Impact Index Per Article: 2.4] [Indexed: 12/20/2022] Open
Abstract
Data quality of SNP arrays impacts the accuracy and precision of downstream data analyses. One such quality control measure often imposed is a threshold on individual animal call rate. Different call rate thresholds have been applied across studies; little is known, however, about the impact of these thresholds on the quality of the genotype data. The objective of the present study was to investigate the effect of different call rate thresholds on the integrity of the genotypes but also to quantify the contribution of different factors to the variability in animal call rate. Data included 142,342 samples genotyped on a custom Illumina genotype panel from 141,591 dairy and beef cattle; the number of Illumina SNP on the panel was 14,371. The mean animal call rate across all samples was 99.09%; 487 animals had both a low call rate (<99%) and a subsequent high call rate (≥99%) after resampling and regenotyping. Several factors were associated (P < 0.001) with individual call rate including animal sex, the sampling herd, the date of genotyping, the genotyping plate, and the plate well. The genotype and allele concordance between the genotypes of the 487 low- and high-call rate individuals improved at a diminishing rate as mean animal call rate increased. Mean genotype and allele concordance rates of 0.987 and 0.997, respectively, existed when animal call rate was between 85 and 90%, increasing to 0.998 and 0.999, respectively, when animal call rate was between 95 and <99%. The mean within-animal allele concordance rate of rare variants (i.e., minor allele frequency < 0.05) between low and high genotype call rate animals increased when animal call rate improved; an allele concordance rate of 1.00 was achieved when animal call rate was between 85 and <99%. 
The accuracy of imputation of the nonobserved genotypes in the low-call rate animals improved as animal call rate increased; the mean genotype concordance rate of the imputed nonobserved SNP was 0.41 when animal call rate was <40% but increased to 0.95 when animal call rate was between 95 and <99%. Parentage validation, determined by the count of opposing homozygotes in a parent-progeny pair, was unreliable when animal call rate was <85%. Therefore, to ensure the provision of high-quality genotypes while also considering the cost and inconvenience of resampling and regenotyping, we suggest a minimum animal call rate threshold of 85%.
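Two of the quantities in this abstract, the per-animal call rate and the count of opposing homozygotes in a parent-progeny pair, can be sketched as follows. This is an illustrative sketch only, assuming genotypes coded 0/1/2 with NaN for a missing call; the function names and toy values are hypothetical, and only the 85% threshold comes from the abstract.

```python
import numpy as np

def animal_call_rate(genotypes):
    """Fraction of non-missing calls per animal (rows); NaN marks a missing call."""
    return 1.0 - np.isnan(genotypes).mean(axis=1)

def opposing_homozygotes(parent, progeny):
    """Count markers where parent and progeny are opposite homozygotes (0 vs 2)."""
    both_called = ~np.isnan(parent) & ~np.isnan(progeny)
    opposing = np.abs(parent - progeny) == 2
    return int(np.sum(both_called & opposing))

# Toy data (assumption): 2 animals genotyped at 4 markers.
genotypes = np.array([
    [0, 1, 2, 1],
    [2, np.nan, 0, 1],
], dtype=float)

call_rate = animal_call_rate(genotypes)
passes = call_rate >= 0.85  # threshold suggested in the abstract
print(call_rate, passes)

parent = np.array([0, 2, 1, np.nan, 2], dtype=float)
progeny = np.array([2, 2, 0, 0, 0], dtype=float)
print(opposing_homozygotes(parent, progeny))  # markers 1 and 5 conflict
```

Markers with a missing call in either animal are skipped in the count, which is one reason a low call rate leaves fewer informative markers for parentage validation.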
|
27
|
Imputation-Based Whole-Genome Sequence Association Study Rediscovered the Missing QTL for Lumbar Number in Sutai Pigs. Sci Rep 2017; 7:615. [PMID: 28377593 PMCID: PMC5429657 DOI: 10.1038/s41598-017-00729-0] [Citation(s) in RCA: 30] [Impact Index Per Article: 4.3] [Received: 10/21/2016] [Accepted: 03/09/2017] [Indexed: 02/02/2023] Open
Abstract
Resequencing a number of individuals of various breeds as a reference population and imputing the whole-genome sequences of individuals that were genotyped with medium-density chips to perform an association study is a very efficient strategy. Previously, we performed a genome-wide association study (GWAS) of lumbar number using 60K SNPs from the porcine Illumina chips in 418 Sutai pigs and did not detect any significant signals. Therefore, we imputed the whole-genome sequences of 418 Sutai individuals from 403 deeply resequenced reference individuals and performed association tests. We identified a quantitative trait locus (QTL) for lumbar number in SSC1 with a P value of 9.01E-18 that was close to the potential causative gene NR6A1. The result of the association test conditioned on the top SNP indicated that only one QTL was responsible for this trait in SSC1. The linkage disequilibrium (LD) drop test result for the condition of the reported potential causative mutation (c.575T > C missense mutation of NR6A1) indicated that this mutation was probably not the underlying mutation that affected lumbar number in our study. As the first trial of imputed whole-genome sequence GWAS in swine, this study shows that the approach can also be powerful for investigating complex traits in pigs, as in humans and cattle.
|
28
|
Genotype Imputation To Improve the Cost-Efficiency of Genomic Selection in Farmed Atlantic Salmon. G3 (Bethesda) 2017; 7:1377-1383. [PMID: 28250015 PMCID: PMC5386885 DOI: 10.1534/g3.117.040717] [Citation(s) in RCA: 54] [Impact Index Per Article: 7.7] [Indexed: 02/07/2023]
Abstract
Genomic selection uses genome-wide marker information to predict breeding values for traits of economic interest, and is more accurate than pedigree-based methods. The development of high density SNP arrays for Atlantic salmon has enabled genomic selection in selective breeding programs, alongside high-resolution association mapping of the genetic basis of complex traits. However, in sibling testing schemes typical of salmon breeding programs, trait records are available on many thousands of fish with close relationships to the selection candidates. Therefore, routine high density SNP genotyping may be prohibitively expensive. One means of reducing genotyping cost is the use of genotype imputation, where selected key animals (e.g., breeding program parents) are genotyped at high density, and the majority of individuals (e.g., performance tested fish and selection candidates) are genotyped at much lower density, followed by imputation to high density. The main objectives of the current study were to assess the feasibility and accuracy of genotype imputation in the context of a salmon breeding program. The specific aims were: (i) to measure the accuracy of genotype imputation using medium (25 K) and high (78 K) density mapped SNP panels, by masking varying proportions of the genotypes and assessing the correlation between the imputed genotypes and the true genotypes; and (ii) to assess the efficacy of imputed genotype data in genomic prediction of key performance traits (sea lice resistance and body weight). Imputation accuracies of up to 0.90 were observed using the simple two-generation pedigree dataset, and moderately high accuracy (0.83) was possible even with very low density SNP data (∼250 SNPs). The performance of genomic prediction using imputed genotype data was comparable to using true genotype data, and both were superior to pedigree-based prediction. 
These results demonstrate that the genotype imputation approach used in this study can provide a cost-effective method for generating robust genome-wide SNP data for genomic prediction in Atlantic salmon. Genotype imputation approaches are likely to form a critical component of cost-efficient genomic selection programs to improve economically important traits in aquaculture.
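The masking-based evaluation described in aim (i) can be sketched in a few lines. This is a hedged illustration, not the study's pipeline: the data are simulated, the function names are hypothetical, and the marker-mean fill is only a stand-in for a real imputation program.

```python
import numpy as np

def mask_genotypes(genotypes, proportion, rng):
    """Hide a random proportion of calls; returns the masked copy and the mask."""
    mask = rng.random(genotypes.shape) < proportion
    masked = genotypes.astype(float)  # astype returns a copy
    masked[mask] = np.nan
    return masked, mask

def impute_column_mean(masked):
    """Placeholder imputer (assumption): fill each missing call with its marker mean."""
    marker_means = np.nanmean(masked, axis=0)
    filled = masked.copy()
    rows, cols = np.nonzero(np.isnan(filled))
    filled[rows, cols] = marker_means[cols]
    return filled

def imputation_accuracy(true_genotypes, imputed, mask):
    """Pearson correlation between true and imputed values at masked calls only."""
    return float(np.corrcoef(true_genotypes[mask], imputed[mask])[0, 1])

# Simulated data (assumption): 200 animals, 50 markers, genotypes coded 0/1/2.
rng = np.random.default_rng(1)
allele_freq = rng.uniform(0.1, 0.9, size=50)
true_genotypes = rng.binomial(2, allele_freq, size=(200, 50))

masked, mask = mask_genotypes(true_genotypes, proportion=0.3, rng=rng)
imputed = impute_column_mean(masked)
accuracy = imputation_accuracy(true_genotypes, imputed, mask)
print(f"masked calls: {mask.sum()}, correlation: {accuracy:.3f}")
```

Restricting the correlation to masked calls matters: including the unmasked calls, which are copied unchanged, would inflate the reported accuracy.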
|
29
|
Evaluation of the accuracy of imputed sequence variant genotypes and their utility for causal variant detection in cattle. Genet Sel Evol 2017; 49:24. [PMID: 28222685 PMCID: PMC5320806 DOI: 10.1186/s12711-017-0301-x] [Citation(s) in RCA: 71] [Impact Index Per Article: 10.1] [Received: 11/14/2016] [Accepted: 02/14/2017] [Indexed: 12/11/2022] Open
Abstract
Background The availability of dense genotypes and whole-genome sequence variants from various sources offers the opportunity to compile large datasets consisting of tens of thousands of individuals with genotypes at millions of polymorphic sites that may enhance the power of genomic analyses. The imputation of missing genotypes ensures that all individuals have genotypes for a shared set of variants. Results We evaluated the accuracy of imputation from dense genotypes to whole-genome sequence variants in 249 Fleckvieh and 450 Holstein cattle using Minimac and FImpute. The sequence variants of a subset of the animals were reduced to the variants that were included on the Illumina BovineHD genotyping array and subsequently inferred in silico using either within- or multi-breed reference populations. The accuracy of imputation varied considerably across chromosomes and dropped at regions where the bovine genome contains segmental duplications. Depending on the imputation strategy, the correlation between imputed and true genotypes ranged from 0.898 to 0.952. The accuracy of imputation was higher with Minimac than FImpute particularly for variants with a low minor allele frequency. Using a multi-breed reference population increased the accuracy of imputation, particularly when FImpute was used to infer genotypes. When the sequence variants were imputed using Minimac, the true genotypes were more correlated to predicted allele dosages than best-guess genotypes. The computing costs to impute 23,256,743 sequence variants in 6958 animals were ten-fold higher with Minimac than FImpute. Association studies with imputed sequence variants revealed seven quantitative trait loci (QTL) for milk fat percentage. Two causal mutations in the DGAT1 and GHR genes were the most significantly associated variants at two QTL on chromosomes 14 and 20 when Minimac was used to infer genotypes. 
Conclusions The population-based imputation of millions of sequence variants in large cohorts is computationally feasible and provides accurate genotypes. However, the accuracy of imputation is low in regions where the genome contains large segmental duplications or the coverage with array-derived single nucleotide polymorphisms is poor. Using a reference population that includes individuals from many breeds increases the accuracy of imputation particularly at low-frequency variants. Considering allele dosages rather than best-guess genotypes as explanatory variables is advantageous to detect causal mutations in association studies with imputed sequence variants.
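The distinction between allele dosages and best-guess genotypes in this abstract can be made concrete with a small sketch. It assumes an imputation program reports, for each genotype, posterior probabilities for 0, 1 and 2 copies of the alternate allele; the probabilities and true genotypes below are invented for illustration.

```python
import numpy as np

def allele_dosage(probs):
    """Expected allele count: P(1 copy) + 2 * P(2 copies)."""
    return probs[:, 1] + 2.0 * probs[:, 2]

def best_guess(probs):
    """Most probable genotype (0, 1 or 2) for each row."""
    return probs.argmax(axis=1)

# Invented posterior probabilities (assumption) for five imputed genotypes.
probs = np.array([
    [0.90, 0.10, 0.00],
    [0.10, 0.80, 0.10],
    [0.05, 0.45, 0.50],
    [0.55, 0.45, 0.00],
    [0.00, 0.20, 0.80],
])
true_genotypes = np.array([0, 1, 2, 1, 2])

dosage = allele_dosage(probs)
calls = best_guess(probs)

concordance = float(np.mean(calls == true_genotypes))
r_dosage = float(np.corrcoef(true_genotypes, dosage)[0, 1])
r_calls = float(np.corrcoef(true_genotypes, calls)[0, 1])
print(f"concordance: {concordance:.2f}")  # 4 of 5 best-guess calls match
print(f"r (dosage): {r_dosage:.3f}, r (best guess): {r_calls:.3f}")
```

The fourth genotype shows why dosages can carry more information: the best-guess call rounds a 0.55/0.45 split to 0, whereas the dosage of 0.45 preserves the uncertainty.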
|
30
|
A comparison of different algorithms for phasing haplotypes using Holstein cattle genotypes and pedigree data. J Dairy Sci 2017; 100:2837-2849. [PMID: 28161175 DOI: 10.3168/jds.2016-11590] [Citation(s) in RCA: 18] [Impact Index Per Article: 2.6] [Received: 06/11/2016] [Accepted: 12/09/2016] [Indexed: 01/25/2023]
Abstract
Phasing genotypes to haplotypes is becoming increasingly important due to its applications in the study of diseases, population and evolutionary genetics, imputation, and so on. Several studies have focused on the development of computational methods that infer haplotype phase from population genotype data. The aim of this study was to compare phasing algorithms implemented in Beagle, Findhap, FImpute, Impute2, and ShapeIt2 software using 50k and 777k (HD) genotyping data. Six scenarios were considered: no-parents, sire-progeny pairs, sire-dam-progeny trios, each with and without pedigree information in Holstein cattle. Algorithms were compared with respect to their phasing accuracy and computational efficiency. In the studied population, Beagle and FImpute were more accurate than other phasing algorithms. Across scenarios, phasing accuracies for Beagle and FImpute were 99.49-99.90% and 99.44-99.99% for 50k, respectively, and 99.90-99.99% and 99.87-99.99% for HD, respectively. Generally, FImpute resulted in higher accuracy when genotypic information of at least one parent was available. In the absence of parental genotypes and pedigree information, Beagle and Impute2 (with double the default number of states) were slightly more accurate than FImpute. Findhap gave high phasing accuracy when parents' genotypes and pedigree information were available. In terms of computing time, Findhap was the fastest algorithm followed by FImpute. FImpute was 30 to 131, 87 to 786, and 353 to 1,400 times faster across scenarios than Beagle, ShapeIt2, and Impute2, respectively. In summary, FImpute and Beagle were the most accurate phasing algorithms. Moreover, the low computational requirement of FImpute makes it an attractive algorithm for phasing genotypes of large livestock populations.
|
31
|
Whole-genome sequence-based genomic prediction in laying chickens with different genomic relationship matrices to account for genetic architecture. Genet Sel Evol 2017; 49:8. [PMID: 28093063 PMCID: PMC5238523 DOI: 10.1186/s12711-016-0277-y] [Citation(s) in RCA: 41] [Impact Index Per Article: 5.9] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/10/2015] [Accepted: 12/05/2016] [Indexed: 11/10/2022] Open
Abstract
Background With the availability of next-generation sequencing technologies, genomic prediction based on whole-genome sequencing (WGS) data is now feasible in animal breeding schemes and was expected to lead to higher predictive ability, since such data may contain all genomic variants including causal mutations. Our objective was to compare prediction ability with high-density (HD) array data and WGS data in a commercial brown layer line with genomic best linear unbiased prediction (GBLUP) models using various approaches to weight single nucleotide polymorphisms (SNPs). Methods A total of 892 chickens from a commercial brown layer line were genotyped with 336 K segregating SNPs (array data) that included 157 K genic SNPs (i.e. SNPs in or around a gene). For these individuals, genome-wide sequence information was imputed based on data from re-sequencing runs of 25 individuals, leading to 5.2 million (M) imputed SNPs (WGS data), including 2.6 M genic SNPs. De-regressed proofs (DRP) for eggshell strength, feed intake and laying rate were used as quasi-phenotypic data in genomic prediction analyses. Four weighting factors for building a trait-specific genomic relationship matrix were investigated: identical weights, −(log10P) from genome-wide association study results, squares of SNP effects from random regression BLUP, and variable selection based weights (known as BLUP|GA). Predictive ability was measured as the correlation between DRP and direct genomic breeding values in five replications of a fivefold cross-validation. Results Averaged over the three traits, the highest predictive ability (0.366 ± 0.075) was obtained when only genic SNPs from WGS data were used. Predictive abilities with genic SNPs and all SNPs from HD array data were 0.361 ± 0.072 and 0.353 ± 0.074, respectively. 
Prediction with −(log10P) or squares of SNP effects as weighting factors for building a genomic relationship matrix or BLUP|GA did not increase accuracy, compared to that with identical weights, regardless of the SNP set used. Conclusions Our results show that little or no benefit was gained when using all imputed WGS data to perform genomic prediction compared to using HD array data regardless of the weighting factors tested. However, using only genic SNPs from WGS data had a positive effect on prediction ability. Electronic supplementary material The online version of this article (doi:10.1186/s12711-016-0277-y) contains supplementary material, which is available to authorized users.
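The trait-specific genomic relationship matrix described in this abstract can be illustrated with a minimal sketch. It assumes a VanRaden-style form, G = Z D Z' / Σ 2p(1−p)d, where Z holds genotypes centred by twice the allele frequency and D is a diagonal matrix of SNP weights. The abstract does not state the exact formula used in the study, so the scaling, the function name `weighted_grm`, the use of sample allele frequencies, and the toy data are all assumptions made for illustration.

```python
def weighted_grm(genotypes, weights=None):
    """Weighted genomic relationship matrix (assumed VanRaden-style form).

    genotypes: list of rows (individuals), each a list of 0/1/2 allele counts.
    weights:   one non-negative weight per SNP; None means identical weights.
    """
    n = len(genotypes)
    m = len(genotypes[0])
    if weights is None:
        weights = [1.0] * m  # identical weights for every SNP
    # Allele frequencies are estimated from the sample itself (an assumption).
    freqs = [sum(row[j] for row in genotypes) / (2.0 * n) for j in range(m)]
    # Centre each genotype by twice the allele frequency of its SNP.
    z = [[row[j] - 2.0 * freqs[j] for j in range(m)] for row in genotypes]
    # Weighted expected heterozygosity, used as the scaling factor.
    scale = sum(2.0 * p * (1.0 - p) * w for p, w in zip(freqs, weights))
    if scale == 0:
        raise ValueError("all SNPs are monomorphic or have zero weight")
    return [
        [sum(z[a][j] * weights[j] * z[b][j] for j in range(m)) / scale
         for b in range(n)]
        for a in range(n)
    ]


# Hypothetical toy data: 3 individuals, 3 SNPs.
toy_genotypes = [[0, 1, 2], [1, 1, 0], [2, 0, 1]]
g_identical = weighted_grm(toy_genotypes)
# Hypothetical weights, e.g. squared SNP effects or -log10(p) values.
g_weighted = weighted_grm(toy_genotypes, weights=[0.5, 2.0, 1.0])
```

With this scaling, multiplying all weights by a constant leaves G unchanged, so only the relative weights of the SNPs matter.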
|
32
|
Evaluation of developed low-density genotype panels for imputation to higher density in independent dairy and beef cattle populations. J Anim Sci 2016; 94:949-62. [PMID: 27065257 DOI: 10.2527/jas.2015-0044] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.6] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 01/05/2023] Open
Abstract
The objective of this study was to develop, using alternative algorithms, low-density SNP genotyping panels (384 to 12,000 SNP), which can be accurately imputed to higher-density panels across independent cattle populations. Single nucleotide polymorphisms were selected based on genomic characteristics (i.e., linkage disequilibrium [LD], minor allele frequency [MAF], and genomic distance) in a population of 1,267 Holstein-Friesian animals genotyped on the Illumina Bovine50 Beadchip (54,001 SNP). Single nucleotide polymorphism selection methods included 1) random; 2) equidistant location; 3) combination of SNP MAF and LD structure while maintaining relatively equal genomic distance between adjacent SNP; 4) a combination of high MAF, genomic distance between selected and candidate SNP, and correlation between genotypes of selected and candidate SNP; and 5) a machine learning algorithm. The panels were validated separately in 1) a population of 750 Holstein-Friesian animals with masked genotypes to reflect the lower-density SNP densities under investigation (1,249 animals with complete genotypes included in reference population) and 2) a population of 359 Limousin and Charolais cattle with high (777,962 SNP)-density genotypes (1,918 animals with complete genotypes included in the reference population). Irrespective of SNP selection method, imputation accuracy in both populations improved at a diminishing rate as the number of SNP included in the lower-density genotype panel increased. Additionally, the variability in mean imputation accuracy per individual decreased as the panel density increased. The SNP selection method had a major impact on the mean allele concordance rate, although its impact diminished as the panel density increased. Imputation accuracy for SNP selected using a combination of high SNP MAF, LD structure, and relatively equal genomic distance between SNP outperformed all other selection methods in densities < 12,000 SNP. 
Using this method of SNP selection, the correlation between the imputed and actual genotypes for the 3,000 SNP panel was 0.90 and 0.96 when applied to the beef and dairy populations, respectively; the respective correlations for the 6,000 SNP panel were 0.95 and 0.98. It is necessary to include between 3,000 and 6,000 SNP in a low-density panel to achieve adequate imputation accuracy to either medium density (approximately 50,000 SNP in the dairy population) or high density (approximately 700,000 SNP in the beef population) across diverse and independent populations.
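Of the SNP selection methods listed in this abstract, the equidistant one is the simplest to sketch. The following is a hypothetical illustration, not the authors' implementation: it places evenly spaced targets along a chromosome and picks the nearest unused SNP for each target. The function name `select_equidistant` and the toy positions are assumptions.

```python
from bisect import bisect_left


def select_equidistant(positions, n_select):
    """Pick up to n_select SNP positions nearest to evenly spaced targets."""
    positions = sorted(positions)
    if n_select <= 0:
        return []
    if n_select >= len(positions):
        return positions
    if n_select == 1:
        return [positions[len(positions) // 2]]
    start, end = positions[0], positions[-1]
    step = (end - start) / (n_select - 1)
    chosen, used = [], set()
    for k in range(n_select):
        target = start + k * step
        idx = bisect_left(positions, target)
        # Only the neighbours on either side of the target can be nearest.
        candidates = [i for i in (idx - 1, idx)
                      if 0 <= i < len(positions) and i not in used]
        if not candidates:
            continue  # both neighbours already taken; panel ends up smaller
        best = min(candidates, key=lambda i: abs(positions[i] - target))
        used.add(best)
        chosen.append(positions[best])
    return chosen


# Hypothetical base-pair positions on one chromosome.
toy_positions = [0, 10, 20, 30, 40, 50, 60, 70, 80, 90, 100]
panel = select_equidistant(toy_positions, 3)
```

The other methods named in the abstract additionally use MAF and LD, which would require genotype data and are not covered by this sketch.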
|
33
|
Short communication: Imputation of markers on the bovine X chromosome. J Dairy Sci 2016; 99:7313-7318. [DOI: 10.3168/jds.2016-11160] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.5] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/11/2016] [Accepted: 06/03/2016] [Indexed: 11/19/2022]
|
34
|
Application of reproductive technologies to improve dairy cattle genomic selection. Russian Journal of Genetics: Applied Research 2016. [DOI: 10.1134/s207905971603014x] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.5] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/22/2022]
|
35
|
A 0.5-Mbp deletion on bovine chromosome 23 is a strong candidate for stillbirth in Nordic Red cattle. Genet Sel Evol 2016; 48:35. [PMID: 27091210 PMCID: PMC4835938 DOI: 10.1186/s12711-016-0215-z] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.9] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/21/2015] [Accepted: 04/11/2016] [Indexed: 11/24/2022] Open
Abstract
Background A whole-genome association study of 4631 progeny-tested Nordic Red dairy cattle bulls using imputed next-generation sequencing data revealed a major quantitative trait locus (QTL) that affects birth index (BI) on Bos taurus autosome (BTA) 23. We analyzed this QTL to identify which of the component traits of BI are affected and understand its molecular basis. Results A genome-wide scan of BI in Nordic Red dairy cattle detected major QTL on BTA6, 14 and 23. The strongest associated single nucleotide polymorphism (SNP) on BTA23 was located at 13,313,896 bp with −log10(p) = 50.63. Analyses of component traits showed that the QTL had a large effect on stillbirth. Based on the 10 most strongly associated SNPs with stillbirth, we constructed a haplotype. Among this haplotype’s alleles, HAPQTL had a large negative effect on stillbirth. No animals were found to be homozygous for HAPQTL. Analysis of stillbirth records that were categorized by carrier status for HAPQTL of the sire and maternal grandsire suggested that this haplotype had a recessive mode of inheritance. Illumina BovineHD BeadChip genotypes and genotype intensity data indicated a chromosomal deletion between 12.28 and 12.81 Mbp on BTA23. An independent set of Illumina Bovine50k BeadChip genotypes identified a recessive lethal haplotype that spanned the deleted region. Conclusions A deleted region of approximately 500 kb that spans three genes on BTA23 was identified and is a strong candidate QTL with a large effect on BI by increasing stillbirth. Electronic supplementary material The online version of this article (doi:10.1186/s12711-016-0215-z) contains supplementary material, which is available to authorized users.
|
36
|
Imputation Accuracy from Low to Moderate Density Single Nucleotide Polymorphism Chips in a Thai Multibreed Dairy Cattle Population. ASIAN-AUSTRALASIAN JOURNAL OF ANIMAL SCIENCES 2016; 29:464-70. [PMID: 26949946 PMCID: PMC4782080 DOI: 10.5713/ajas.15.0291] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.4] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Subscribe] [Scholar Register] [Received: 04/02/2015] [Revised: 07/31/2015] [Accepted: 08/24/2015] [Indexed: 11/27/2022]
Abstract
The objective of this study was to investigate the accuracy of imputation from low density (LDC) to moderate density SNP chips (MDC) in a Thai Holstein-Other multibreed dairy cattle population. Dairy cattle with complete pedigree information (n = 1,244) from 145 dairy farms were genotyped with GeneSeek GGP20K (n = 570), GGP26K (n = 540) and GGP80K (n = 134) chips. After checking for single nucleotide polymorphism (SNP) quality, 17,779 SNP markers in common between the GGP20K, GGP26K, and GGP80K were used to represent MDC. Animals were divided into two groups, a reference group (n = 912) and a test group (n = 332). The SNP markers chosen for the test group were those located in positions corresponding to GeneSeek GGP9K (n = 7,652). The LDC to MDC genotype imputation was carried out using three different software packages, namely Beagle 3.3 (population-based algorithm), FImpute 2.2 (combined family- and population-based algorithms) and Findhap 4 (combined family- and population-based algorithms). Imputation accuracies within and across chromosomes were calculated as ratios of correctly imputed SNP markers to overall imputed SNP markers. Imputation accuracy for the three software packages ranged from 76.79% to 93.94%. FImpute had higher imputation accuracy (93.94%) than Findhap (84.64%) and Beagle (76.79%). Imputation accuracies were similar and consistent across chromosomes for FImpute, but not for Findhap and Beagle. Most chromosomes that showed either high (73%) or low (80%) imputation accuracies were the same chromosomes that had above and below average linkage disequilibrium (LD; defined here as the correlation between pairs of adjacent SNP within chromosomes less than or equal to 1 Mb apart). Results indicated that FImpute was more suitable than Findhap and Beagle for genotype imputation in this Thai multibreed population. Perhaps additional increments in imputation accuracy could be achieved by increasing the completeness of pedigree information.
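The accuracy measure in this abstract, the ratio of correctly imputed SNP markers to all imputed SNP markers, computed within and across chromosomes, can be sketched as follows. This is a minimal illustration under stated assumptions: genotypes are coded as 0/1/2 allele counts, and the function names and toy data are hypothetical rather than taken from the study.

```python
from collections import defaultdict


def concordance_rate(true_genotypes, imputed_genotypes):
    """Fraction of imputed genotype calls that equal the true calls."""
    if len(true_genotypes) != len(imputed_genotypes):
        raise ValueError("genotype vectors must have equal length")
    if not true_genotypes:
        raise ValueError("no genotypes to compare")
    correct = sum(t == i for t, i in zip(true_genotypes, imputed_genotypes))
    return correct / len(true_genotypes)


def concordance_by_chromosome(records):
    """records: iterable of (chromosome, true_genotype, imputed_genotype)."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for chrom, true_g, imputed_g in records:
        total[chrom] += 1
        correct[chrom] += int(true_g == imputed_g)
    return {chrom: correct[chrom] / total[chrom] for chrom in total}


# Hypothetical toy data: masked SNPs for one animal.
true_calls = [0, 1, 2, 1, 0, 2, 1, 1]
imputed_calls = [0, 1, 2, 2, 0, 2, 1, 0]
overall = concordance_rate(true_calls, imputed_calls)  # 6 of 8 calls agree
```

Grouping the same comparison by chromosome, as in `concordance_by_chromosome`, is what allows accuracy to be set against chromosome-level LD as the abstract describes.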
|
37
|
Impact of imputation methods on the amount of genetic variation captured by a single-nucleotide polymorphism panel in soybeans. BMC Bioinformatics 2016; 17:55. [PMID: 26830693 PMCID: PMC4736474 DOI: 10.1186/s12859-016-0899-7] [Citation(s) in RCA: 17] [Impact Index Per Article: 2.1] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/08/2015] [Accepted: 01/19/2016] [Indexed: 02/08/2023] Open
Abstract
BACKGROUND Success in genome-wide association studies and marker-assisted selection depends on good phenotypic and genotypic data. The more complete these data are, the more powerful the results of analysis will be. Nevertheless, there are next-generation technologies that seek to provide genotypic information in spite of great proportions of missing data. The procedures these technologies use to impute genetic data, therefore, greatly affect downstream analyses. This study aims to (1) compare the genetic variance in a single-nucleotide polymorphism panel of soybean with missing data imputed using various methods, (2) evaluate the imputation accuracy and post-imputation quality associated with these methods, and (3) evaluate the impact of imputation method on heritability and the accuracy of genome-wide prediction of soybean traits. The imputation methods we evaluated were as follows: multivariate mixed model, hidden Markov model, logical algorithm, k-nearest neighbor, singular value decomposition, and random forest. We used raw genotypes from the SoyNAM project and the following phenotypes: plant height, days to maturity, grain yield, and seed protein composition. RESULTS We propose an imputation method based on multivariate mixed models using pedigree information. Our comparison of methods indicates that heritability of traits can be affected by the imputation method. Genotypes with missing values imputed with methods that make use of genealogic information can favor genetic analysis of highly polygenic traits, but not genome-wide prediction accuracy. The genotypic matrix captured the highest amount of genetic variance when missing loci were imputed by the method proposed in this paper. CONCLUSIONS We concluded that hidden Markov models and random forest imputation are more suitable for studies that aim to analyze highly heritable traits, while pedigree-based methods can be used to best analyze traits with low heritability. 
Despite the notable contribution to heritability, advantages in genomic prediction were not observed by changing the imputation method. We identified significant differences across imputation methods in a dataset missing 20 % of the genotypic values. This means that genotypic data from genotyping technologies that provide a high proportion of missing values, such as GBS, should be handled carefully because the imputation method will impact downstream analysis.
|
38
|
Multi-generational imputation of single nucleotide polymorphism marker genotypes and accuracy of genomic selection. Animal 2016; 10:1077-85. [PMID: 27076192 DOI: 10.1017/s1751731115002906] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/13/2022] Open
Abstract
Availability of high-density single nucleotide polymorphism (SNP) genotyping platforms provided unprecedented opportunities to enhance breeding programmes in livestock, poultry and plant species, and to better understand the genetic basis of complex traits. This genomic information can be used to estimate genomic breeding values (GEBVs), which are more accurate than conventional breeding values. The superiority of genomic selection is possible only when high-density SNP panels are used to track genes and QTLs affecting the trait. Unfortunately, even with the continuous decrease in genotyping costs, only a small fraction of the population has been genotyped with these high-density panels. It is often the case that a larger portion of the population is genotyped with low-density and low-cost SNP panels and then imputed to a higher density. Accuracy of SNP genotype imputation tends to be high when minimum requirements are met. Nevertheless, a certain rate of genotype imputation errors is unavoidable. Thus, it is reasonable to assume that the accuracy of GEBVs will be affected by imputation errors, especially by their cumulative effects over time. To evaluate the impact of multi-generational selection on the accuracy of SNP genotype imputation and the reliability of resulting GEBVs, a simulation was carried out under varying updating of the reference population, distance between the reference and testing sets, and the approach used for the estimation of GEBVs. Using fixed reference populations, imputation accuracy decayed by about 0.5% per generation. In fact, after 25 generations, the accuracy was only 7% lower than the first generation. When the reference population was updated by either 1% or 5% of the top animals in the previous generations, decay of imputation accuracy was substantially reduced. These results indicate that low-density panels are useful, especially when the generational interval between reference and testing population is small. 
As the generational interval increases, the imputation accuracies decay, although not at an alarming rate. In the absence of updating of the reference population, accuracy of GEBVs decays substantially in one or two generations at the rate of 20% to 25% per generation. When the reference population is updated by 1% or 5% every generation, the decay in accuracy was 8% to 11% after seven generations using true and imputed genotypes. These results indicate that imputed genotypes provide a viable alternative, even after several generations, as long as the reference and training populations are appropriately updated to reflect the genetic change in the population.
|
39
|
Comparison of genetic evaluations for milk yield and fat yield using a polygenic model and three genomic–polygenic models with different sets of SNP genotypes in Thai multibreed dairy cattle. Livest Sci 2015. [DOI: 10.1016/j.livsci.2015.10.008] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/17/2022]
|
40
|
Abstract
Background Despite ongoing reductions in the cost of sequencing technologies, whole genome SNP genotype imputation is often used as an alternative for obtaining abundant SNP genotypes for genome wide association studies. Several existing genotype imputation methods can be efficient for this purpose, while achieving various levels of imputation accuracy. Recent empirical results have shown that the two-step imputation may improve accuracy by imputing the low density genotyped study animals to a medium density array first and then to the target density. We are interested in building a series of staircase arrays that lead the low density array to the high density array or even the whole genome, such that genotype imputation along these staircases can achieve the highest accuracy. Results For genotype imputation from a lower density to a higher density, we first show how to select untyped SNPs to construct a medium density array. Subsequently, we determine for each selected SNP those untyped SNPs to be imputed in the add-one two-step imputation, and lastly how the clusters of imputed genotype are pieced together as the final imputation result. We design extensive empirical experiments using several hundred sequenced and genotyped animals to demonstrate that our novel two-step piecemeal imputation always achieves an improvement compared to the one-step imputation by the state-of-the-art methods Beagle and FImpute. Using the two-step piecemeal imputation, we present some preliminary success on whole genome SNP genotype imputation for genotyped animals via a series of staircase arrays. Conclusions From a low SNP density to the whole genome, intermediate pseudo-arrays can be computationally constructed by selecting the most informative SNPs for untyped SNP genotype imputation. Such pseudo-array staircases are able to impute more accurately than the classic one-step imputation.
|
41
|
Comparison among three variant callers and assessment of the accuracy of imputation from SNP array data to whole-genome sequence level in chicken. BMC Genomics 2015; 16:824. [PMID: 26486989 PMCID: PMC4618161 DOI: 10.1186/s12864-015-2059-2] [Citation(s) in RCA: 17] [Impact Index Per Article: 1.9] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/09/2015] [Accepted: 10/09/2015] [Indexed: 12/21/2022] Open
Abstract
BACKGROUND The technical progress in the last decade has made it possible to sequence millions of DNA reads in a relatively short time frame. Several variant callers based on different algorithms have emerged and have made it possible to extract single nucleotide polymorphisms (SNPs) out of the whole-genome sequence. Often, only a few individuals of a population are sequenced completely and imputation is used to obtain genotypes for all sequence-based SNP loci for other individuals, which have been genotyped for a subset of SNPs using a genotyping array. METHODS First, we compared the sets of variants detected with different variant callers, namely GATK, freebayes and SAMtools, and checked the quality of genotypes of the called variants in a set of 50 fully sequenced white and brown layers. Second, we assessed the imputation accuracy (measured as the correlation between imputed and true genotype per SNP and per individual, and genotype conflict between father-progeny pairs) when imputing from high density SNP array data to whole-genome sequence using data from around 1000 individuals from six different generations. Three different imputation programs (Minimac, FImpute and IMPUTE2) were checked in different validation scenarios. RESULTS There were 1,741,573 SNPs detected by all three callers on the studied chromosomes 3, 6, and 28, which was 71.6 % (81.6 %, 88.0 %) of SNPs detected by GATK (SAMtools, freebayes) in total. Genotype concordance (GC) defined as the proportion of individuals whose array-derived genotypes are the same as the sequence-derived genotypes over all non-missing SNPs on the array were 0.98 (GATK), 0.97 (freebayes) and 0.98 (SAMtools). Furthermore, the percentage of variants that had high values (>0.9) for another three measures (non-reference sensitivity, non-reference genotype concordance and precision) were 90 (88, 75) for GATK (SAMtools, freebayes). 
With all imputation programs, correlation between original and imputed genotypes was >0.95 on average with randomly masked 1000 SNPs from the SNP array and >0.85 for a leave-one-out cross-validation within sequenced individuals. CONCLUSIONS Performance of all variant callers studied was very good in general, particularly for GATK and SAMtools. FImpute performed slightly worse than Minimac and IMPUTE2 in terms of genotype correlation, especially for SNPs with low minor allele frequency, while it had lowest numbers in Mendelian conflicts in available father-progeny pairs. Correlations of real and imputed genotypes remained constantly high even if individuals to be imputed were several generations away from the sequenced individuals.
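The correlation-based accuracy in this abstract is reported both per SNP and per individual. The sketch below shows one way these two summaries could be computed from a genotype matrix; it assumes rows are individuals and columns are SNPs coded 0/1/2. The function names and toy matrices are hypothetical, and returning `None` for a vector without variation is a choice made here, not something stated in the abstract.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation; None if either vector has no variation."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return None  # correlation is undefined for a constant vector
    return sxy / sqrt(sxx * syy)


def per_snp_correlation(true_matrix, imputed_matrix):
    """One correlation per column (SNP), computed across individuals."""
    true_cols = zip(*true_matrix)
    imputed_cols = zip(*imputed_matrix)
    return [pearson(t, i) for t, i in zip(true_cols, imputed_cols)]


def per_individual_correlation(true_matrix, imputed_matrix):
    """One correlation per row (individual), computed across SNPs."""
    return [pearson(t, i) for t, i in zip(true_matrix, imputed_matrix)]


# Hypothetical toy data: 4 individuals x 3 SNPs, one imputation error.
true_matrix = [[0, 1, 2], [1, 1, 0], [2, 1, 1], [0, 1, 2]]
imputed_matrix = [[0, 1, 2], [1, 1, 0], [2, 1, 2], [0, 1, 2]]
snp_r = per_snp_correlation(true_matrix, imputed_matrix)
animal_r = per_individual_correlation(true_matrix, imputed_matrix)
```

In the toy data the second SNP is monomorphic, so its correlation is undefined; this illustrates why SNPs with very low minor allele frequency are harder to evaluate with a correlation than with a concordance rate.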
|
42
|
Accuracy of imputation using the most common sires as reference population in layer chickens. BMC Genet 2015; 16:101. [PMID: 26282557 PMCID: PMC4539854 DOI: 10.1186/s12863-015-0253-5] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.2] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/10/2014] [Accepted: 07/10/2015] [Indexed: 11/24/2022] Open
Abstract
BACKGROUND Genotype imputation has become a standard practice in modern genetic research to increase genome coverage and improve the accuracy of genomic selection (GS) and genome-wide association studies (GWAS). We assessed accuracies of imputing 60K genotype data from lower density single nucleotide polymorphism (SNP) panels using a small set of the most common sires in a population of 2140 white layer chickens. Several factors affecting imputation accuracy were investigated, including the size of the reference population, the level of the relationship between the reference and validation populations, and minor allele frequency (MAF) of the SNP being imputed. RESULTS The accuracy of imputation was assessed with different scenarios using 22 and 62 carefully selected reference animals (Ref(22) and Ref(62)). Animal-specific imputation accuracy corrected for gene content was moderate on average (~ 0.80) in most scenarios and low in the 3K to 60K scenario. Maximum average accuracies were 0.90 and 0.93 for the most favourable scenario for Ref(22) and Ref(62) respectively, when SNPs were masked independent of their MAF. SNPs with low MAF were more difficult to impute, and the larger reference population considerably improved the imputation accuracy for these rare SNPs. When Ref(22) was used for imputation, the average imputation accuracy decreased by 0.04 when validation population was two instead of one generation away from the reference and increased again by 0.05 when validation was three generations away. Selecting the reference animals from the most common sires, compared with random animals from the population, considerably improved imputation accuracy for low MAF SNPs, but gave only limited improvement for other MAF classes. 
The allelic R(2) measure from Beagle software was found to be a good predictor of imputation reliability (correlation ~ 0.8) when the density of validation panel was very low (3K) and the MAF of the SNP and the size of the reference population were not extremely small. CONCLUSIONS Even with a very small number of animals in the reference population, reasonable accuracy of imputation can be achieved. Selecting a set of the most common sires, rather than selecting random animals for the reference population, improves the imputation accuracy of rare alleles, which may be a benefit when imputing with whole genome re-sequencing data.
|
43
|
Strategies for genotype imputation in composite beef cattle. BMC Genet 2015; 16:99. [PMID: 26250698 PMCID: PMC4527250 DOI: 10.1186/s12863-015-0251-7] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.4] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/09/2015] [Accepted: 07/09/2015] [Indexed: 11/23/2022] Open
Abstract
Background Genotype imputation has been used to increase genomic information, allow more animals in genome-wide analyses, and reduce genotyping costs. In Brazilian beef cattle production, many animals result from crossbreeding, which may alter linkage disequilibrium patterns. Thus, the challenge is to obtain accurately imputed genotypes in crossbred animals. The objective of this study was to evaluate the best fitting and most accurate imputation strategy on the MA genetic group (the progeny of a Charolais sire mated with crossbred Canchim X Zebu cows) and Canchim cattle. The data set contained 400 animals (born between 1999 and 2005) genotyped with the Illumina BovineHD panel. Imputation accuracy of genotypes from the Illumina-Bovine3K (3K), Illumina-BovineLD (6K), GeneSeek-Genomic-Profiler (GGP) BeefLD (GGP9K), GGP-IndicusLD (GGP20Ki), Illumina-BovineSNP50 (50K), GGP-IndicusHD (GGP75Ki), and GGP-BeefHD (GGP80K) to Illumina-BovineHD (HD) SNP panels was investigated. Seven scenarios for reference and target populations were tested; the animals were grouped according to birth year (S1), genetic groups (S2 and S3), genetic groups and birth year (S4 and S5), gender (S6), and gender and birth year (S7). Analyses were performed using FImpute and BEAGLE software and computation run-time was recorded. Genotype imputation accuracy was measured by concordance rate (CR) and allelic R square (R2). Results The highest imputation accuracy scenario consisted of a reference population with males and females and a target population with young females. Among the SNP panels in the tested scenarios, the 50K, GGP75Ki and GGP80K were the most adequate to impute to HD in Canchim cattle. FImpute reduced the computation run-time for imputation by a factor of 20 to 100 compared with BEAGLE. Conclusion The genotyping panels possessing at least 50 thousand markers are suitable for genotype imputation to HD with acceptable accuracy. 
The FImpute algorithm demonstrated a higher efficiency of imputed markers, especially in lower density panels. These considerations may help to increase genotypic information, reduce genotyping costs, and aid in genomic selection evaluations in crossbred animals. Electronic supplementary material The online version of this article (doi:10.1186/s12863-015-0251-7) contains supplementary material, which is available to authorized users.
|
44
|
Strategies for single nucleotide polymorphism (SNP) genotyping to enhance genotype imputation in Gyr (Bos indicus) dairy cattle: Comparison of commercially available SNP chips. J Dairy Sci 2015; 98:4969-89. [DOI: 10.3168/jds.2014-9213] [Citation(s) in RCA: 24] [Impact Index Per Article: 2.7] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/09/2014] [Accepted: 03/22/2015] [Indexed: 01/15/2023]
|
45
|
Effect of reference population size and available ancestor genotypes on imputation of Mexican Holstein genotypes. J Dairy Sci 2015; 98:3478-84. [PMID: 25771055 DOI: 10.3168/jds.2014-9132] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.1] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/20/2014] [Accepted: 02/02/2015] [Indexed: 11/19/2022]
Abstract
The effects of reference population size and the availability of information from genotyped ancestors on the accuracy of imputation of single nucleotide polymorphisms (SNP) were investigated for Mexican Holstein cattle. Three scenarios for reference population size were examined: (1) a local population of 2,011 genotyped Mexican Holsteins, (2) animals in scenario 1 plus 866 Holsteins in the US genotype database (GDB) with genotyped Mexican daughters, and (3) animals in scenario 1 and all US GDB Holsteins (338,073). Genotypes from 4 chip densities (2 low density, 1 mid density, and 1 high density) were imputed using findhap (version 3) to the 45,195 markers on the mid-density chip. Imputation success was determined by comparing the numbers of SNP with 1 or 2 alleles missing and the numbers of differently predicted SNP (conflicts) among the 3 scenarios. Imputation accuracy improved as chip density and numbers of genotyped ancestors increased, and the percentage of SNP with 1 missing allele was greater than that for 2 missing alleles for all scenarios. The largest numbers of conflicts were found between scenarios 1 and 3. The inclusion of information from direct ancestors (dam or sire) with US GDB genotypes in the imputation of Mexican Holstein genotypes increased imputation accuracy by 1 percentage point for low-density genotypes and by 0.5 percentage points for high-density genotypes, which was about half the gain found with information from all US GDB Holsteins. A larger reference population and the availability of genotyped ancestors improved imputation; animals with genotyped parents in a large reference population had higher imputation accuracy than those with no or few genotyped relatives in a small reference population. For small local populations, including genotypes from other related populations can aid in improving imputation accuracy.
|
46
|
Accuracy of genome-wide imputation in Braford and Hereford beef cattle. BMC Genet 2014; 15:157. [PMID: 25543517 PMCID: PMC4300607 DOI: 10.1186/s12863-014-0157-9] [Citation(s) in RCA: 22] [Impact Index Per Article: 2.2] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/06/2014] [Accepted: 12/18/2014] [Indexed: 12/31/2022] Open
Abstract
Background: Strategies for imputing genotypes from the Illumina-Bovine3K, Illumina-BovineLD (6K), BeefLD-GGP (8K), a non-commercial-15K and IndicusLD-GGP (20K) to either the Illumina-BovineSNP50 (50K) or the Illumina-BovineHD (777K) SNP panel, as well as for imputing from 50K, GGP-IndicusHD (90iK) and GGP-BeefHD (90tK) to 777K, were investigated. Imputation of low-density (<50K) genotypes to 777K was carried out in either one or two steps. Imputation of ungenotyped parents (n = 37 sires) with four or more offspring to the 50K panel was also assessed. There were 2,946 Braford, 664 Hereford and 88 Nellore animals, of which 71, 59 and 88, respectively, were genotyped with the 777K panel, while all others had 50K genotypes. The reference population comprised 2,735 animals and 175 bulls for 50K and 777K, respectively. The low-density panels were simulated by masking genotypes in the 50K or 777K panel for animals born in 2011. Analyses were performed using both Beagle and FImpute software. Genotype imputation accuracy was measured by concordance rate and allelic R² between true and imputed genotypes. Results: The average concordance rate using FImpute was 0.943 and 0.921, averaged across all simulated low-density panels, for imputation to 50K or to 777K, respectively, in comparison with 0.927 and 0.895 using Beagle. The allelic R² was 0.912 and 0.866 for imputation to 50K or to 777K using FImpute, respectively, and 0.890 and 0.826 using Beagle. One-step and two-step imputation to 777K produced average concordance rates of 0.806 and 0.892 and allelic R² of 0.674 and 0.819, respectively. Imputation of low-density panels to 50K, with the exception of 3K, had overall concordance rates greater than 0.940 and allelic R² greater than 0.919. Ungenotyped animals were imputed to the 50K panel with an average concordance rate of 0.950 by FImpute. Conclusion: FImpute was more accurate than Beagle for imputation both to 50K and to 777K. Two-step imputation outperformed one-step imputation for imputing to 777K. Ungenotyped animals that have four or more offspring can have their 50K genotypes accurately inferred using FImpute. All low-density panels, except the 3K, can be used to impute to the 50K using FImpute or Beagle with high concordance rate and allelic R².
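The two accuracy measures used in this abstract can be sketched as follows. This is a minimal illustration under stated assumptions: genotypes are coded as counts of one allele (0, 1 or 2), and allelic R² is taken to be the squared Pearson correlation between true and imputed allele counts. The function names are hypothetical, and the authors' exact computation (for example, per-SNP versus per-animal averaging) is not given in the abstract.

```python
def concordance_rate(true_genotypes, imputed_genotypes):
    """Fraction of genotype calls (coded 0/1/2) that match exactly."""
    if len(true_genotypes) != len(imputed_genotypes):
        raise ValueError("genotype lists must have the same length")
    matches = sum(t == i for t, i in zip(true_genotypes, imputed_genotypes))
    return matches / len(true_genotypes)


def allelic_r2(true_genotypes, imputed_genotypes):
    """Squared Pearson correlation between true and imputed allele counts.

    Returns NaN when either vector has zero variance (e.g. a monomorphic SNP).
    """
    n = len(true_genotypes)
    mean_t = sum(true_genotypes) / n
    mean_i = sum(imputed_genotypes) / n
    cov = sum((t - mean_t) * (i - mean_i)
              for t, i in zip(true_genotypes, imputed_genotypes))
    var_t = sum((t - mean_t) ** 2 for t in true_genotypes)
    var_i = sum((i - mean_i) ** 2 for i in imputed_genotypes)
    if var_t == 0 or var_i == 0:
        return float("nan")
    return cov * cov / (var_t * var_i)


# Toy example: 8 masked genotypes, 2 of which are imputed incorrectly.
true_calls = [0, 1, 2, 1, 0, 2, 1, 1]
imputed_calls = [0, 1, 2, 1, 0, 1, 1, 2]

cr = concordance_rate(true_calls, imputed_calls)   # 6 of 8 calls match
r2 = allelic_r2(true_calls, imputed_calls)
```

Concordance rate tends to look favourable for SNPs with a low minor allele frequency, because guessing the common homozygote is usually right; a correlation-based measure penalises that, which is one plausible reason the allelic R² values in the abstract sit below the concordance rates.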
|
47
|
Abstract
Genotype imputation is routinely applied in a large number of cattle breeds. Imputation has become a necessity because of the large number of SNP arrays with variable density (currently, from 2900 to 777,962 SNPs). Although many authors have studied the effect of different statistical methods on imputation accuracy, the impact of a (likely) change in the reference genome assembly on imputation from lower to higher density has not been determined so far. In this work, 1021 Italian Simmental SNP genotypes were remapped on the three most recent reference genome assemblies. Four imputation methods were used to assess the impact of an update in the reference genome. As expected, the four methods behaved differently, with large differences in accuracy. Updating SNP coordinates to each of the three tested cattle reference genome assemblies produced only slight variation in imputation results within each method.
|
48
|
Imputation of missing genotypes from low- to high-density SNP panel in different population designs. Anim Genet 2014; 46:1-7. [PMID: 25431355 DOI: 10.1111/age.12236] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.9] [Accepted: 09/10/2014] [Indexed: 01/28/2023]
Abstract
Imputation of missing genotypes, in particular from low density to high density, is an important issue in genomic selection and genome-wide association studies. Given the marker densities, the most important factors affecting imputation accuracy are the size of the reference population and the relationship between individuals in the reference (genotyped with high-density panel) and study (genotyped with low-density panel) populations. In this study, we investigated the imputation accuracies when the reference population (genotyped with Illumina BovineSNP50 SNP panel) contained sires, halfsibs, or both sires and halfsibs of the individuals in the study population (genotyped with Illumina BovineLD SNP panel) using three imputation programs (fimpute v2.2, findhap v2, and beagle v3.3.2). Two criteria, correlation between true and imputed genotypes and missing rate after imputation, were used to evaluate the performance of the three programs in different scenarios. Our results showed that fimpute performed the best in all cases, with correlations from 0.921 to 0.978 when imputing from sires to their daughters or between halfsibs. In general, the accuracies of imputing between halfsibs or from sires to their daughters were higher than were those imputing between non-halfsibs or from sires to non-daughters. Including both sires and halfsibs in the reference population did not improve the imputation performance in comparison with when only including halfsibs in the reference population for all the three programs.
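The two evaluation criteria in this abstract, correlation between true and imputed genotypes and missing rate after imputation, can be sketched together. The sketch assumes genotypes coded 0/1/2 with `None` for a genotype the program left uncalled, and it computes the correlation only over called genotypes; both choices, and the function names, are assumptions made for illustration rather than details taken from the study.

```python
from math import sqrt


def missing_rate(imputed_genotypes):
    """Fraction of genotypes still uncalled (None) after imputation."""
    missing = sum(1 for g in imputed_genotypes if g is None)
    return missing / len(imputed_genotypes)


def genotype_correlation(true_genotypes, imputed_genotypes):
    """Pearson correlation between true and imputed genotypes.

    Positions where the imputed genotype is None are excluded (assumption).
    Returns NaN if either vector has zero variance after exclusion.
    """
    pairs = [(t, i) for t, i in zip(true_genotypes, imputed_genotypes)
             if i is not None]
    n = len(pairs)
    mean_t = sum(t for t, _ in pairs) / n
    mean_i = sum(i for _, i in pairs) / n
    cov = sum((t - mean_t) * (i - mean_i) for t, i in pairs)
    var_t = sum((t - mean_t) ** 2 for t, _ in pairs)
    var_i = sum((i - mean_i) ** 2 for _, i in pairs)
    if var_t == 0 or var_i == 0:
        return float("nan")
    return cov / sqrt(var_t * var_i)


# Toy example: one genotype left uncalled, one called incorrectly.
true_calls = [0, 1, 2, 2, 1, 0]
imputed_calls = [0, 1, 2, None, 1, 1]

rate = missing_rate(imputed_calls)
corr = genotype_correlation(true_calls, imputed_calls)
```

Reporting both numbers matters because a program can raise its correlation by declining to call difficult genotypes; the missing rate exposes that trade-off.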
|
49
|
Abstract
Background: Genotype imputation from low-density (LD) to high-density single nucleotide polymorphism (SNP) chips is an important step before applying genomic selection, since denser chips tend to provide more reliable genomic predictions. Imputation methods rely partially on linkage disequilibrium between markers to infer unobserved genotypes. Bos indicus cattle (e.g. Nelore breed) are characterized, in general, by lower levels of linkage disequilibrium between genetic markers at short distances, compared to taurine breeds. Thus, it is important to evaluate the accuracy of imputation to better define which imputation method and chip are most appropriate for genomic applications in indicine breeds. Methods: Accuracy of genotype imputation in Nelore cattle was evaluated using different LD chips, imputation software and sets of animals. Twelve commercial and customized LD chips with densities ranging from 7 K to 75 K were tested. Customized LD chips were virtually designed taking into account minor allele frequency, linkage disequilibrium and distance between markers. Software programs FImpute and BEAGLE were applied to impute genotypes. From 995 bulls and 1247 cows that were genotyped with the Illumina® BovineHD chip (HD), 793 sires composed the reference set, and the remaining 202 younger sires and all the cows composed two separate validation sets for which genotypes were masked except for the SNPs of the LD chip that were to be tested. Results: Imputation accuracy increased with the SNP density of the LD chip. However, the gain in accuracy with LD chips with more than 15 K SNPs was relatively small because accuracy was already high at this density. Commercial and customized LD chips with equivalent densities presented similar results. FImpute outperformed BEAGLE for all LD chips and validation sets. Regardless of the imputation software used, accuracy tended to increase as the relatedness between imputed and reference animals increased, especially for the 7 K chip. Conclusions: If the Illumina® BovineHD is considered as the target chip for genomic applications in the Nelore breed, cost-effectiveness can be improved by genotyping part of the animals with a chip containing around 15 K useful SNPs and imputing their high-density missing genotypes with FImpute.
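The validation design described in the Methods (masking high-density genotypes except for the SNPs on the LD chip being tested) can be sketched as below. The dictionary layout, SNP identifiers, and function names are hypothetical and chosen only to make the masking step concrete; the study's actual file formats and pipeline are not described in the abstract.

```python
def mask_to_ld_chip(hd_genotypes, ld_chip_snps):
    """Simulate an LD-chip genotype from an HD genotype.

    hd_genotypes: dict mapping SNP id -> genotype (0/1/2) for one animal.
    ld_chip_snps: set of SNP ids present on the LD chip under test.
    SNPs not on the LD chip are set to None (masked).
    """
    return {snp: (genotype if snp in ld_chip_snps else None)
            for snp, genotype in hd_genotypes.items()}


def masked_snps(hd_genotypes, ld_chip_snps):
    """SNP ids that were masked, i.e. those to score after imputation."""
    return [snp for snp in hd_genotypes if snp not in ld_chip_snps]


# Toy example: 4 HD SNPs, 2 of which are on the simulated LD chip.
hd_animal = {"snp1": 0, "snp2": 1, "snp3": 2, "snp4": 1}
ld_chip = {"snp1", "snp3"}

ld_animal = mask_to_ld_chip(hd_animal, ld_chip)
to_score = masked_snps(hd_animal, ld_chip)
```

Keeping the original HD genotypes alongside the masked copy is what makes accuracy measurable: after imputation, only the SNPs in `to_score` are compared against their true values.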
|
50
|
Imputation of sequence level genotypes in the Franches-Montagnes horse breed. Genet Sel Evol 2014; 46:63. [PMID: 25927638 PMCID: PMC4180851 DOI: 10.1186/s12711-014-0063-7] [Citation(s) in RCA: 24] [Impact Index Per Article: 2.4] [Received: 03/12/2014] [Accepted: 09/11/2014] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND A cost-effective strategy to increase the density of available markers within a population is to sequence a small proportion of the population and impute whole-genome sequence data for the remaining population. Increased densities of typed markers are advantageous for genome-wide association studies (GWAS) and genomic predictions. METHODS We obtained genotypes for 54 602 SNPs (single nucleotide polymorphisms) in 1077 Franches-Montagnes (FM) horses and Illumina paired-end whole-genome sequencing data for 30 FM horses and 14 Warmblood horses. After variant calling, the sequence-derived SNP genotypes (~13 million SNPs) were used for genotype imputation with the software programs Beagle, Impute2 and FImpute. RESULTS The mean imputation accuracy of FM horses using Impute2 was 92.0%. Imputation accuracy using Beagle and FImpute was 74.3% and 77.2%, respectively. In addition, for Impute2 we determined the imputation accuracy of all individual horses in the validation population, which ranged from 85.7% to 99.8%. The subsequent inclusion of Warmblood sequence data further increased the correlation between true and imputed genotypes for most horses, especially for horses with a high level of admixture. The final imputation accuracy of the horses ranged from 91.2% to 99.5%. CONCLUSIONS Using Impute2, the imputation accuracy was higher than 91% for all horses in the validation population, which indicates that direct imputation of 50k SNP-chip data to sequence level genotypes is feasible in the FM population. The individual imputation accuracy depended mainly on the applied software and the level of admixture.
|