Reference Citation Analysis: Find an Article, Find a Category, Find a Journal, Find a Scholar

For: Brøndum RF, Su G, Janss L, Sahana G, Guldbrandtsen B, Boichard D, Lund MS. Quantitative trait loci markers derived from whole genome sequence data increases the reliability of genomic prediction. J Dairy Sci 2015;98:4107-16. [PMID: 25892697 DOI: 10.3168/jds.2014-9005] [Citation(s) in RCA: 106] [Impact Index Per Article: 11.8] [Reference Citation Analysis] [What about the content of this article? (0)] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/22/2014] [Accepted: 03/12/2015] [Indexed: 12/30/2022]

For:	Brøndum RF, Su G, Janss L, Sahana G, Guldbrandtsen B, Boichard D, Lund MS. Quantitative trait loci markers derived from whole genome sequence data increases the reliability of genomic prediction. J Dairy Sci 2015;98:4107-16. [PMID: 25892697 DOI: 10.3168/jds.2014-9005] [Citation(s) in RCA: 106] [Impact Index Per Article: 11.8] [Reference Citation Analysis] [What about the content of this article? (0)] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/22/2014] [Accepted: 03/12/2015] [Indexed: 12/30/2022]

Number

Cited by Other Article(s)

de Las Heras-Saldana S, Lopez BI, Moghaddar N, Park W, Park JE, Chung KY, Lim D, Lee SH, Shin D, van der Werf JHJ. Use of gene expression and whole-genome sequence information to improve the accuracy of genomic prediction for carcass traits in Hanwoo cattle. Genet Sel Evol 2020;52:54. [PMID: 32993481 PMCID: PMC7525992 DOI: 10.1186/s12711-020-00574-2] [Citation(s) in RCA: 17] [Impact Index Per Article: 4.3] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/27/2020] [Accepted: 09/18/2020] [Indexed: 12/21/2022] Open

Abstract

Background

In this study, we assessed the accuracy of genomic prediction for carcass weight (CWT), marbling score (MS), eye muscle area (EMA) and back fat thickness (BFT) in Hanwoo cattle when using genomic best linear unbiased prediction (GBLUP), weighted GBLUP (wGBLUP), and a BayesR model. For these models, we investigated the potential gain from using pre-selected single nucleotide polymorphisms (SNPs) from a genome-wide association study (GWAS) on imputed sequence data and from gene expression information. We used data on 13,717 animals with carcass phenotypes and imputed sequence genotypes that were split in an independent GWAS discovery set of varying size and a remaining set for validation of prediction. Expression data were used from a Hanwoo gene expression experiment based on 45 animals.

Results

Using a larger number of animals in the reference set increased the accuracy of genomic prediction whereas a larger independent GWAS discovery dataset improved identification of predictive SNPs. Using pre-selected SNPs from GWAS in GBLUP improved accuracy of prediction by 0.02 for EMA and up to 0.05 for BFT, CWT, and MS, compared to a 50 k standard SNP array that gave accuracies of 0.50, 0.47, 0.58, and 0.47, respectively. Accuracy of prediction of BFT and CWT increased when BayesR was applied with the 50 k SNP array (0.02 and 0.03, respectively) and was further improved by combining the 50 k array with the top-SNPs (0.06 and 0.04, respectively). By contrast, using BayesR resulted in limited improvement for EMA and MS. wGBLUP did not improve accuracy but increased prediction bias. Based on the RNA-seq experiment, we identified informative expression quantitative trait loci, which, when used in GBLUP, improved the accuracy of prediction slightly, i.e. between 0.01 and 0.02. SNPs that were located in genes, the expression of which was associated with differences in trait phenotype, did not contribute to a higher prediction accuracy.

Conclusions

Our results show that, in Hanwoo beef cattle, when SNPs are pre-selected from GWAS on imputed sequence data, the accuracy of prediction improves only slightly whereas the contribution of SNPs that are selected based on gene expression is not significant. The benefit of statistical models to prioritize selected SNPs for estimating genomic breeding values is trait-specific and depends on the genetic architecture of each trait.

Collapse

Teng J, Huang S, Chen Z, Gao N, Ye S, Diao S, Ding X, Yuan X, Zhang H, Li J, Zhang Z. Optimizing genomic prediction model given causal genes in a dairy cattle population. J Dairy Sci 2020;103:10299-10310. [PMID: 32952023 DOI: 10.3168/jds.2020-18233] [Citation(s) in RCA: 9] [Impact Index Per Article: 2.3] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/20/2020] [Accepted: 07/07/2020] [Indexed: 01/15/2023]

Abstract

As genotypic data are moving from SNP chip toward whole-genome sequence, the accuracy of genomic prediction (GP) exhibits a marginal gain, although all genetic variation, including causal genes, are contained in whole-genome sequence data. Meanwhile, genetic analyses on complex traits, such as genome-wide association studies, have identified an increasing number of genomic regions, including potential causal genes, which would be reliable prior knowledge for GP. Many studies have tried to improve the performance of GP by modifying the prediction model to incorporate prior knowledge. Although several plausible results have been obtained from model modification or strategy optimization, most of them were validated in a specific empirical population with a limited variety of genetic architecture for complex traits. An alternative approach is to use simulated genetic architecture with known causal genes (e.g., simulated causative SNP) to evaluate different GP models with given causal genes. Our objectives were to (1) evaluate the performance of GP under a variety of genetic architectures with a subset of known causal genes and (2) compare different GP models modified by highlighting causal genes and different strategies to weight causal genes. In this study, we simulated pseudo-phenotypes under a variety of genetic architectures based on the real genotypes and phenotypes of a dairy cattle population. Besides classical genomic best linear unbiased prediction, we evaluated 3 modified GP models that highlight causal genes as follows: (1) by treating them as fixed effects, (2) by treating them as a separate random component, and (3) by combining them into the genomic relationship matrix as random effects. Our results showed that highlighting the known causal genes, which explained a considerable proportion of genetic variance in the GP models, increased the predictive accuracy. Combining all given causal genes into the genomic relationship matrix was the optimal strategy under all the scenarios validated, and treating causal genes as a separate random component is also recommended, when more than 20% of genetic variance was explained by known causal genes. Moreover, assigning differential weights to each causal gene further improved the predictive accuracy.

Collapse

Affiliation(s)

Jinyan Teng Guangdong Provincial Key Lab of Agro-Animal Genomics and Molecular Breeding, College of Animal Science, South China Agricultural University, Guangzhou 510642, China
Shuwen Huang Guangdong Provincial Key Lab of Agro-Animal Genomics and Molecular Breeding, College of Animal Science, South China Agricultural University, Guangzhou 510642, China
Zitao Chen Guangdong Provincial Key Lab of Agro-Animal Genomics and Molecular Breeding, College of Animal Science, South China Agricultural University, Guangzhou 510642, China
Ning Gao State Key Laboratory of Biocontrol, School of Life Sciences, Sun Yat-sen University, North Third Road, Guangzhou Higher Education Mega Center, Guangzhou 510006, China
Shaopan Ye Guangdong Provincial Key Lab of Agro-Animal Genomics and Molecular Breeding, College of Animal Science, South China Agricultural University, Guangzhou 510642, China
Shuqi Diao Guangdong Provincial Key Lab of Agro-Animal Genomics and Molecular Breeding, College of Animal Science, South China Agricultural University, Guangzhou 510642, China
Xiangdong Ding National Engineering Laboratory for Animal Breeding, Laboratory of Animal Genetics, Breeding and Reproduction, Ministry of Agriculture, College of Animal Science and Technology, China Agricultural University, Beijing 100193, China
Xiaolong Yuan Guangdong Provincial Key Lab of Agro-Animal Genomics and Molecular Breeding, College of Animal Science, South China Agricultural University, Guangzhou 510642, China
Hao Zhang Guangdong Provincial Key Lab of Agro-Animal Genomics and Molecular Breeding, College of Animal Science, South China Agricultural University, Guangzhou 510642, China
Jiaqi Li Guangdong Provincial Key Lab of Agro-Animal Genomics and Molecular Breeding, College of Animal Science, South China Agricultural University, Guangzhou 510642, China
Zhe Zhang Guangdong Provincial Key Lab of Agro-Animal Genomics and Molecular Breeding, College of Animal Science, South China Agricultural University, Guangzhou 510642, China.

Collapse

Shabalina T, Yin T, König S. Survival analyses in Holstein cows considering direct disease diagnoses and specific SNP marker effects. J Dairy Sci 2020;103:8257-8273. [DOI: 10.3168/jds.2020-18174] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/08/2020] [Accepted: 05/07/2020] [Indexed: 12/11/2022]

Liu A, Lund MS, Boichard D, Karaman E, Guldbrandtsen B, Fritz S, Aamand GP, Nielsen US, Sahana G, Wang Y, Su G. Weighted single-step genomic best linear unbiased prediction integrating variants selected from sequencing data by association and bioinformatics analyses. Genet Sel Evol 2020;52:48. [PMID: 32799816 PMCID: PMC7429790 DOI: 10.1186/s12711-020-00568-0] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/18/2019] [Accepted: 08/07/2020] [Indexed: 11/30/2022] Open

Abstract

Background

Sequencing data enable the detection of causal loci or single nucleotide polymorphisms (SNPs) highly linked to causal loci to improve genomic prediction. However, until now, studies on integrating such SNPs using a single-step genomic best linear unbiased prediction (ssGBLUP) model are scarce. We investigated the integration of sequencing SNPs selected by association (1262 SNPs) and bioinformatics (2359 SNPs) analyses into the currently used 54K-SNP chip, using three ssGBLUP models which make different assumptions on the distribution of SNP effects: a basic ssGBLUP model, a so-called featured ssGBLUP (ssFGBLUP) model that considered selected sequencing SNPs as a feature genetic component, and a weighted ssGBLUP (ssWGBLUP) model in which the genomic relationship matrix was weighted by the SNP variances estimated from a Bayesian whole-genome regression model, with every 1, 30, or 100 adjacent SNPs within a chromosome region sharing the same variance. We used data on milk production and female fertility in Danish Jersey. In total, 15,823 genotyped and 528,981‬ non-genotyped females born between 1990 and 2013 were used as reference population and 7415 genotyped females and 33,040 non-genotyped females born between 2014 and 2016 were used as validation population.

Results

With basic ssGBLUP, integrating SNPs selected from sequencing data improved prediction reliabilities for milk and protein yields, but resulted in limited or no improvement for fat yield and female fertility. Model performances depended on the SNP set used. When using ssWGBLUP with the 54K SNPs, reliabilities for milk and protein yields improved by 0.028 for genotyped animals and by 0.006 for non-genotyped animals compared with ssGBLUP. However, with the SNP set that included SNPs selected from sequencing data, no statistically significant difference in prediction reliability was observed between the three ssGBLUP models.

Conclusions

In summary, when using 54K SNPs, a ssWGBLUP model with a common weight on the SNPs in a given region is a feasible approach for single-trait genetic evaluation. Integrating relevant SNPs selected from sequencing data into the standard SNP chip can improve the reliability of genomic prediction. Based on such SNP data, a basic ssGBLUP model was suggested since no significant improvement was observed from using alternative models such as ssWGBLUP and ssFGBLUP.

Collapse

Lourenco D, Legarra A, Tsuruta S, Masuda Y, Aguilar I, Misztal I. Single-Step Genomic Evaluations from Theory to Practice: Using SNP Chips and Sequence Data in BLUPF90. Genes (Basel) 2020;11:E790. [PMID: 32674271 PMCID: PMC7397237 DOI: 10.3390/genes11070790] [Citation(s) in RCA: 60] [Impact Index Per Article: 15.0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Download PDF] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/19/2020] [Revised: 07/03/2020] [Accepted: 07/06/2020] [Indexed: 11/16/2022] Open

Konstantinov KV, Goddard ME. Application of multivariate single-step SNP best linear unbiased predictor model and revised SNP list for genomic evaluation of dairy cattle in Australia. J Dairy Sci 2020;103:8305-8316. [PMID: 32622609 DOI: 10.3168/jds.2020-18242] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/21/2020] [Accepted: 04/21/2020] [Indexed: 11/19/2022]

Abstract

The objectives of this study were (1) to evaluate the computational feasibility of the multitrait test-day single-step SNP-BLUP (ssSNP-BLUP) model using phenotypic records of genotyped and nongenotyped animals, and (2) to compare accuracies (coefficient of determination; R²) and bias of genomic estimated breeding values (GEBV) and de-regressed proofs as response variables in 3 Australian dairy cattle breeds (i.e., Holstein, Jersey, and Red breeds). Additive genomic random regression coefficients for milk, fat, protein yield and somatic cell score were predicted in the first, second, and third lactation. The predicted coefficients were used to derive 305-d GEBV and were compared with the traditional parent averages obtained from a BLUP model without genomic information. Cow fertility traits were evaluated from the 5-trait repeatability model (i.e., calving interval, days from calving to first service, pregnancy diagnosis, first service nonreturn rate, and lactation length). The de-regressed proofs were only for calving interval. Our results showed that ssSNP-BLUP using multitrait test-day model increased reliability and reduced bias of breeding values of young animals when compared with parent average from traditional BLUP in Australian Holsten, Jersey, and Red breeds. The use of a custom selection of approximately 46,000 SNP (custom XT SNP list) increased the reliability of GEBV compared with the results obtained using the commercial Illumina 50K chip (Illumina, San Diego, CA). The use of the second preconditioner substantially improved the convergence rate of the preconditioned conjugate gradient method, but further work is needed to improve the efficiency of the computation of the Kronecker matrix product by vector. Application of ssSNP-BLUP to multitrait random regression models is computationally feasible.

Collapse

Liu A, Lund MS, Boichard D, Mao X, Karaman E, Fritz S, Aamand GP, Wang Y, Su G. Imputation for sequencing variants preselected to a customized low-density chip. Sci Rep 2020;10:9524. [PMID: 32533087 PMCID: PMC7293337 DOI: 10.1038/s41598-020-66523-7] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/11/2019] [Accepted: 05/19/2020] [Indexed: 12/27/2022] Open

Warburton CL, Engle BN, Ross EM, Costilla R, Moore SS, Corbet NJ, Allen JM, Laing AR, Fordyce G, Lyons RE, McGowan MR, Burns BM, Hayes BJ. Use of whole-genome sequence data and novel genomic selection strategies to improve selection for age at puberty in tropically-adapted beef heifers. Genet Sel Evol 2020;52:28. [PMID: 32460805 PMCID: PMC7251835 DOI: 10.1186/s12711-020-00547-5] [Citation(s) in RCA: 16] [Impact Index Per Article: 4.0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/09/2019] [Accepted: 05/15/2020] [Indexed: 12/14/2022] Open

Abstract

Background

In tropically-adapted beef heifers, application of genomic prediction for age at puberty has been limited due to low prediction accuracies. Our aim was to investigate novel methods of pre-selecting whole-genome sequence (WGS) variants and alternative analysis methodologies; including genomic best linear unbiased prediction (GBLUP) with multiple genomic relationship matrices (MGRM) and Bayesian (BayesR) analyses, to determine if prediction accuracy for age at puberty can be improved.

Methods

Genotypes and phenotypes were obtained from two research herds. In total, 868 Brahman and 960 Tropical Composite heifers were recorded in the first population and 3695 Brahman, Santa Gertrudis and Droughtmaster heifers were recorded in the second population. Genotypes were imputed to 23 million whole-genome sequence variants. Eight strategies were used to pre-select variants from genome-wide association study (GWAS) results using conditional or joint (COJO) analyses. Pre-selected variants were included in three models, GBLUP with a single genomic relationship matrix (SGRM), GBLUP MGRM and BayesR. Five-way cross-validation was used to test the effect of marker panel density (6 K, 50 K and 800 K), analysis model, and inclusion of pre-selected WGS variants on prediction accuracy.

Results

In all tested scenarios, prediction accuracies for age at puberty were highest in BayesR analyses. The addition of pre-selected WGS variants had little effect on the accuracy of prediction when BayesR was used. The inclusion of WGS variants that were pre-selected using a meta-analysis with COJO analyses by chromosome, fitted in a MGRM model, had the highest prediction accuracies in the GBLUP analyses, regardless of marker density. When the low-density (6 K) panel was used, the prediction accuracy of GBLUP was equal (0.42) to that with the high-density panel when only six additional sequence variants (identified using meta-analysis COJO by chromosome) were included.

Conclusions

While BayesR consistently outperforms other methods in terms of prediction accuracies, reasonable improvements in accuracy can be achieved when using GBLUP and low-density panels with the inclusion of a relatively small number of highly relevant WGS variants.

Collapse

Raymond B, Wientjes YCJ, Bouwman AC, Schrooten C, Veerkamp RF. A deterministic equation to predict the accuracy of multi-population genomic prediction with multiple genomic relationship matrices. Genet Sel Evol 2020;52:21. [PMID: 32345213 PMCID: PMC7189707 DOI: 10.1186/s12711-020-00540-y] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/11/2019] [Accepted: 04/14/2020] [Indexed: 11/10/2022] Open

Abstract

BACKGROUND

A multi-population genomic prediction (GP) model in which important pre-selected single nucleotide polymorphisms (SNPs) are differentially weighted (MPMG) has been shown to result in better prediction accuracy than a multi-population, single genomic relationship matrix ([Formula: see text]) GP model (MPSG) in which all SNPs are weighted equally. Our objective was to underpin theoretically the advantages and limits of the MPMG model over the MPSG model, by deriving and validating a deterministic prediction equation for its accuracy.

METHODS

Using selection index theory, we derived an equation to predict the accuracy of estimated total genomic values of selection candidates from population [Formula: see text] ([Formula: see text]), when individuals from two populations, [Formula: see text] and [Formula: see text], are combined in the training population and two [Formula: see text], made respectively from pre-selected and remaining SNPs, are fitted simultaneously in MPMG. We used simulations to validate the prediction equation in scenarios that differed in the level of genetic correlation between populations, heritability, and proportion of genetic variance explained by the pre-selected SNPs. Empirical accuracy of the MPMG model in each scenario was calculated and compared to the predicted accuracy from the equation.

RESULTS

In general, the derived prediction equation resulted in accurate predictions of [Formula: see text] for the scenarios evaluated. Using the prediction equation, we showed that an important advantage of the MPMG model over the MPSG model is its ability to benefit from the small number of independent chromosome segments ([Formula: see text]) due to the pre-selected SNPs, both within and across populations, whereas for the MPSG model, there is only a single value for [Formula: see text], calculated based on all SNPs, which is very large. However, this advantage is dependent on the pre-selected SNPs that explain some proportion of the total genetic variance for the trait.

CONCLUSIONS

We developed an equation that gives insight into why, and under which conditions the MPMG outperforms the MPSG model for GP. The equation can be used as a deterministic tool to assess the potential benefit of combining information from different populations, e.g., different breeds or lines for GP in livestock or plants, or different groups of people based on their ethnic background for prediction of disease risk scores.

Collapse

Misztal I, Lourenco D, Legarra A. Current status of genomic evaluation. J Anim Sci 2020;98:skaa101. [PMID: 32267923 PMCID: PMC7183352 DOI: 10.1093/jas/skaa101] [Citation(s) in RCA: 64] [Impact Index Per Article: 16.0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/29/2020] [Accepted: 04/07/2020] [Indexed: 12/14/2022] Open

Abstract

Early application of genomic selection relied on SNP estimation with phenotypes or de-regressed proofs (DRP). Chips of 50k SNP seemed sufficient for an accurate estimation of SNP effects. Genomic estimated breeding values (GEBV) were composed of an index with parent average, direct genomic value, and deduction of a parental index to eliminate double counting. Use of SNP selection or weighting increased accuracy with small data sets but had minimal to no impact with large data sets. Efforts to include potentially causative SNP derived from sequence data or high-density chips showed limited or no gain in accuracy. After the implementation of genomic selection, EBV by BLUP became biased because of genomic preselection and DRP computed based on EBV required adjustments, and the creation of DRP for females is hard and subject to double counting. Genomic selection was greatly simplified by single-step genomic BLUP (ssGBLUP). This method based on combining genomic and pedigree relationships automatically creates an index with all sources of information, can use any combination of male and female genotypes, and accounts for preselection. To avoid biases, especially under strong selection, ssGBLUP requires that pedigree and genomic relationships are compatible. Because the inversion of the genomic relationship matrix (G) becomes costly with more than 100k genotyped animals, large data computations in ssGBLUP were solved by exploiting limited dimensionality of genomic data due to limited effective population size. With such dimensionality ranging from 4k in chickens to about 15k in cattle, the inverse of G can be created directly (e.g., by the algorithm for proven and young) at a linear cost. Due to its simplicity and accuracy, ssGBLUP is routinely used for genomic selection by the major chicken, pig, and beef industries. Single step can be used to derive SNP effects for indirect prediction and for genome-wide association studies, including computations of the P-values. Alternative single-step formulations exist that use SNP effects for genotyped or for all animals. Although genomics is the new standard in breeding and genetics, there are still some problems that need to be solved. This involves new validation procedures that are unaffected by selection, parameter estimation that accounts for all the genomic data used in selection, and strategies to address reduction in genetic variances after genomic selection was implemented.

Collapse

Sehgal D, Rosyara U, Mondal S, Singh R, Poland J, Dreisigacker S. Incorporating Genome-Wide Association Mapping Results Into Genomic Prediction Models for Grain Yield and Yield Stability in CIMMYT Spring Bread Wheat. FRONTIERS IN PLANT SCIENCE 2020;11:197. [PMID: 32194596 PMCID: PMC7064468 DOI: 10.3389/fpls.2020.00197] [Citation(s) in RCA: 34] [Impact Index Per Article: 8.5] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Subscribe] [Scholar Register] [Received: 10/05/2019] [Accepted: 02/11/2020] [Indexed: 05/21/2023]

Liu T, Luo C, Ma J, Wang Y, Shu D, Su G, Qu H. High-Throughput Sequencing With the Preselection of Markers Is a Good Alternative to SNP Chips for Genomic Prediction in Broilers. Front Genet 2020;11:108. [PMID: 32174971 PMCID: PMC7056902 DOI: 10.3389/fgene.2020.00108] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/12/2019] [Accepted: 01/30/2020] [Indexed: 11/13/2022] Open

Abstract

The choice of a genetic marker genotyping platform is important for genomic prediction in livestock and poultry. High-throughput sequencing can produce more genetic markers, but the genotype quality is lower than that obtained with single nucleotide polymorphism (SNP) chips. The aim of this study was to compare the accuracy of genomic prediction between high-throughput sequencing and SNP chips in broilers. In this study, we developed a new SNP marker screening method, the pre-marker-selection (PMS) method, to determine whether an SNP marker can be used for genomic prediction. We also compared a method which preselection marker based results from genome-wide association studies (GWAS). With the two methods, we analysed body weight at the12^th week (BW) and feed conversion ratio (FCR) in a local broiler population. A total of 395 birds were selected from the F2 generation of the population, and 10X specific-locus amplified fragment sequencing (SLAF-seq) and the Illumina Chicken 60K SNP Beadchip were used for genotyping. The genomic best linear unbiased prediction method (GBLUP) was used to predict the genomic breeding values. The accuracy of genomic prediction was validated by the leave-one-out cross-validation method. Without SNP marker screening, the accuracies of the genomic estimated breeding value (GEBV) of BW and FCR were 0.509 and 0.249, respectively, when using SLAF-seq, and the accuracies were 0.516 and 0.232, respectively, when using the SNP chip. With SNP marker screening by the PMS method, the accuracies of GEBV of the two traits were 0.671 and 0.499, respectively, when using SLAF-seq, and 0.605 and 0.422, respectively, when using the SNP chip. Our SNP marker screening method led to an increase of prediction accuracy by 0.089-0.250. With SNP marker screening by the GWAS method, the accuracies of genomic prediction for the two traits were also improved, but the gains of accuracy were less than the gains with PMS method for all traits. The results from this study indicate that our PMS method can improve the accuracy of GEBV, and that more accurate genomic prediction can be obtained from an increased number of genomic markers when using high-throughput sequencing in local broiler populations. Due to its lower genotyping cost, high-throughput sequencing could be a good alternative to SNP chips for genomic prediction in breeding programmes of local broiler populations.

Collapse

Groß C, Derks M, Megens HJ, Bosse M, Groenen MAM, Reinders M, de Ridder D. pCADD: SNV prioritisation in Sus scrofa. Genet Sel Evol 2020;52:4. [PMID: 32033531 PMCID: PMC7006094 DOI: 10.1186/s12711-020-0528-9] [Citation(s) in RCA: 14] [Impact Index Per Article: 3.5] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/29/2019] [Accepted: 01/28/2020] [Indexed: 01/22/2023] Open

Abstract

BACKGROUND

In animal breeding, identification of causative genetic variants is of major importance and high economical value. Usually, the number of candidate variants exceeds the number of variants that can be validated. One way of prioritizing probable candidates is by evaluating their potential to have a deleterious effect, e.g. by predicting their consequence. Due to experimental difficulties to evaluate variants that do not cause an amino-acid substitution, other prioritization methods are needed. For human genomes, the prediction of deleterious genomic variants has taken a step forward with the introduction of the combined annotation dependent depletion (CADD) method. In theory, this approach can be applied to any species. Here, we present pCADD (p for pig), a model to score single nucleotide variants (SNVs) in pig genomes.

RESULTS

To evaluate whether pCADD captures sites with biological meaning, we used transcripts from miRNAs and introns, sequences from genes that are specific for a particular tissue, and the different sites of codons, to test how well pCADD scores differentiate between functional and non-functional elements. Furthermore, we conducted an assessment of examples of non-coding and coding SNVs, which are causal for changes in phenotypes. Our results show that pCADD scores discriminate between functional and non-functional sequences and prioritize functional SNVs, and that pCADD is able to score the different positions in a codon relative to their redundancy. Taken together, these results indicate that based on pCADD scores, regions with biological relevance can be identified and distinguished according to their rate of adaptation.

CONCLUSIONS

We present the ability of pCADD to prioritize SNVs in the pig genome with respect to their putative deleteriousness, in accordance to the biological significance of the region in which they are located. We created scores for all possible SNVs, coding and non-coding, for all autosomes and the X chromosome of the pig reference sequence Sscrofa11.1, proposing a toolbox to prioritize variants and evaluate sequences to highlight new sites of interest to explain biological functions that are relevant to animal breeding.

Collapse

Oliveira Júnior GA, Santos DJA, Cesar ASM, Boison SA, Ventura RV, Perez BC, Garcia JF, Ferraz JBS, Garrick DJ. Fine mapping of genomic regions associated with female fertility in Nellore beef cattle based on sequence variants from segregating sires. J Anim Sci Biotechnol 2019;10:97. [PMID: 31890201 PMCID: PMC6913038 DOI: 10.1186/s40104-019-0403-0] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/04/2019] [Accepted: 11/11/2019] [Indexed: 12/26/2022] Open

Abstract

Background

Impaired fertility in cattle limits the efficiency of livestock production systems. Unraveling the genetic architecture of fertility traits would facilitate their improvement by selection. In this study, we characterized SNP chip haplotypes at QTL blocks then used whole-genome sequencing to fine map genomic regions associated with reproduction in a population of Nellore (Bos indicus) heifers.

Methods

The dataset comprised of 1337 heifers genotyped using a GeneSeek® Genomic Profiler panel (74677 SNPs), representing the daughters from 78 sires. After performing marker quality control, 64800 SNPs were retained. Haplotypes carried by each sire at six previously identified QTL on BTAs 5, 14 and 18 for heifer pregnancy and BTAs 8, 11 and 22 for antral follicle count were constructed using findhap software. The significance of the contrasts between the effects of every two paternally-inherited haplotype alleles were used to identify sires that were heterozygous at each QTL. Whole-genome sequencing data localized to the haplotypes from six sires and 20 other ancestors were used to identify sequence variants that were concordant with the haplotype contrasts. Enrichment analyses were applied to these variants using KEGG and MeSH libraries.

Results

A total of six (BTA 5), six (BTA 14) and five (BTA 18) sires were heterozygous for heifer pregnancy QTL whereas six (BTA 8), fourteen (BTA 11), and five (BTA 22) sires were heterozygous for number of antral follicles’ QTL. Due to inadequate representation of many haplotype alleles in the sequenced animals, fine mapping analysis could only be reliably performed for the QTL on BTA 5 and 14, which had 641 and 3733 concordant candidate sequence variants, respectively. The KEGG “Circadian rhythm” and “Neurotrophin signaling pathway” were significantly associated with the genes in the QTL on BTA 5 whereas 32 MeSH terms were associated with the QTL on BTA 14. Among the concordant sequence variants, 0.2% and 0.3% were classified as missense variants for BTAs 5 and 14, respectively, highlighting the genes MTERF2, RTMB, ENSBTAG00000037306 (miRNA), ENSBTAG00000040351, PRKDC, and RGS20. The potential causal mutations found in the present study were associated with biological processes such as oocyte maturation, embryo development, placenta development and response to reproductive hormones.

Conclusions

The identification of heterozygous sires by positionally phasing SNP chip data and contrasting haplotype effects for previously detected QTL can be used for fine mapping to identify potential causal mutations and candidate genes. Genomic variants on genes MTERF2, RTBC, miRNA ENSBTAG00000037306, ENSBTAG00000040351, PRKDC, and RGS20, which are known to have influence on reproductive biological processes, were detected.

Collapse

Genomic prediction based on selected variants from imputed whole-genome sequence data in Australian sheep populations. Genet Sel Evol 2019;51:72. [PMID: 31805849 PMCID: PMC6896509 DOI: 10.1186/s12711-019-0514-2] [Citation(s) in RCA: 37] [Impact Index Per Article: 7.4] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/03/2019] [Accepted: 11/25/2019] [Indexed: 12/13/2022] Open

Abstract

Background

Whole-genome sequence (WGS) data could contain information on genetic variants at or in high linkage disequilibrium with causative mutations that underlie the genetic variation of polygenic traits. Thus far, genomic prediction accuracy has shown limited increase when using such information in dairy cattle studies, in which one or few breeds with limited diversity predominate. The objective of our study was to evaluate the accuracy of genomic prediction in a multi-breed Australian sheep population of relatively less related target individuals, when using information on imputed WGS genotypes.

Methods

Between 9626 and 26,657 animals with phenotypes were available for nine economically important sheep production traits and all had WGS imputed genotypes. About 30% of the data were used to discover predictive single nucleotide polymorphism (SNPs) based on a genome-wide association study (GWAS) and the remaining data were used for training and validation of genomic prediction. Prediction accuracy using selected variants from imputed sequence data was compared to that using a standard array of 50k SNP genotypes, thereby comparing genomic best linear prediction (GBLUP) and Bayesian methods (BayesR/BayesRC). Accuracy of genomic prediction was evaluated in two independent populations that were each lowly related to the training set, one being purebred Merino and the other crossbred Border Leicester x Merino sheep.

Results

A substantial improvement in prediction accuracy was observed when selected sequence variants were fitted alongside 50k genotypes as a separate variance component in GBLUP (2GBLUP) or in Bayesian analysis as a separate category of SNPs (BayesRC). From an average accuracy of 0.27 in both validation sets for the 50k array, the average absolute increase in accuracy across traits with 2GBLUP was 0.083 and 0.073 for purebred and crossbred animals, respectively, whereas with BayesRC it was 0.102 and 0.087. The average gain in accuracy was smaller when selected sequence variants were treated in the same category as 50k SNPs. Very little improvement over 50k prediction was observed when using all WGS variants.

Conclusions

Accuracy of genomic prediction in diverse sheep populations increased substantially by using variants selected from whole-genome sequence data based on an independent multi-breed GWAS, when compared to genomic prediction using standard 50K genotypes.

Collapse

Hayes BJ, Daetwyler HD. 1000 Bull Genomes Project to Map Simple and Complex Genetic Traits in Cattle: Applications and Outcomes. Annu Rev Anim Biosci 2019;7:89-102. [PMID: 30508490 DOI: 10.1146/annurev-animal-020518-115024] [Citation(s) in RCA: 184] [Impact Index Per Article: 36.8] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/09/2022]

Liu Q, Hobbs HA, Domier LL. Genome-wide association study of the seed transmission rate of soybean mosaic virus and associated traits using two diverse population panels. TAG. THEORETICAL AND APPLIED GENETICS. THEORETISCHE UND ANGEWANDTE GENETIK 2019;132:3413-3424. [PMID: 31630210 DOI: 10.1007/s00122-019-03434-w] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.4] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Received: 05/10/2019] [Accepted: 09/17/2019] [Indexed: 06/10/2023]

Abstract

KEY MESSAGE

Genome-wide association analyses identified candidates for genes involved in restricting virus movement into embryonic tissues, suppressing virus-induced seed coat mottling and preserving yield in soybean plants infected with soybean mosaic virus. Soybean mosaic virus (SMV) causes significant reductions in soybean yield and seed quality. Because seedborne infections can serve as primary sources of inoculum for SMV infections, resistance to SMV seed transmission provides a means to limit the impacts of SMV. In this study, two diverse population panels, Pop1 and Pop2, composed of 409 and 199 soybean plant introductions, respectively, were evaluated for SMV seed transmission rate, seed coat mottling, and seed yield from SMV-infected plants. The phenotypic data and genotypic data from the SoySNP50K dataset were analyzed using GAPIT and rrBLUP. For SMV seed transmission rate, a single locus was identified on chromosome 9 in Pop1. For SMV-induced seed coat mottling, loci were identified on chromosome 9 in Pop1 and on chromosome 3 in Pop2. For seed yield from SMV-infected plants, a single locus was identified on chromosome 3 in Pop2 that was within the map interval of a previously described quantitative trait locus for seed number. The high linkage disequilibrium regions surrounding the markers on chromosomes 3 and 9 contained a predicted nonsense-mediated RNA decay gene, multiple pectin methylesterase inhibitor genes (involved in restricting virus movement), two chalcone synthase genes, and a homolog of the yeast Rtf1 gene (involved in RNA-mediated transcriptional gene silencing). The results of this study provided additional insight into the genetic architecture of these three important traits, suggested candidate genes for downstream functional validation, and suggested that genomic prediction would outperform marker-assisted selection for two of the four trait-marker associations.

Collapse

Song H, Ye S, Jiang Y, Zhang Z, Zhang Q, Ding X. Using imputation-based whole-genome sequencing data to improve the accuracy of genomic prediction for combined populations in pigs. Genet Sel Evol 2019;51:58. [PMID: 31638889 PMCID: PMC6805481 DOI: 10.1186/s12711-019-0500-8] [Citation(s) in RCA: 39] [Impact Index Per Article: 7.8] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/03/2019] [Accepted: 10/07/2019] [Indexed: 11/17/2022] Open

Abstract

BACKGROUND

For genomic selection in populations with a small reference population, combining populations of the same breed or populations of related breeds is an effective way to increase the size of the reference population. However, genomic predictions based on single nucleotide polymorphism (SNP)-chip genotype data using combined populations with different genetic backgrounds or from different breeds have not shown a clear advantage over using within-population or within-breed predictions. The increasing availability of whole-genome sequencing (WGS) data provides new opportunities for combined population genomic prediction. Our objective was to investigate the accuracy of genomic prediction using imputation-based WGS data from combined populations in pigs. Using 80K SNP panel genotypes, WGS genotypes, or genotypes on WGS variants that were pruned based on linkage disequilibrium (LD), three methods [genomic best linear unbiased prediction (GBLUP), single-step (ss)GBLUP, and genomic feature (GF)BLUP] were implemented with different prior information to identify the best method to improve the accuracy of genomic prediction for combined populations in pigs.

RESULTS

In total, 2089 and 2043 individuals with production and reproduction phenotypes, respectively, from three Yorkshire populations with different genetic backgrounds were genotyped with the PorcineSNP80 panel. Imputation accuracy from 80K to WGS variants reached 92%. The results showed that use of the WGS data compared to the 80K SNP panel did not increase the accuracy of genomic prediction in a single population, but using WGS data with LD pruning and GFBLUP with prior information did yield higher accuracy than the 80K SNP panel. For the 80K SNP panel genotypes, using the combined population resulted in a slight improvement, no change, or even a slight decrease in accuracy in comparison with the single population for GBLUP and ssGBLUP, while accuracy increased by 1 to 2.4% when using WGS data. Notably, the GFBLUP method did not perform well for both the combined population and the single populations.

CONCLUSIONS

The use of WGS data was beneficial for combined population genomic prediction. Simply increasing the number of SNPs to the WGS level did not increase accuracy for a single population, while using pruned WGS data based on LD and GFBLUP with prior information could yield higher accuracy than the 80K SNP panel.

Collapse

Fragomeni BO, Lourenco DAL, Legarra A, VanRaden PM, Misztal I. Alternative SNP weighting for single-step genomic best linear unbiased predictor evaluation of stature in US Holsteins in the presence of selected sequence variants. J Dairy Sci 2019;102:10012-10019. [PMID: 31495612 DOI: 10.3168/jds.2019-16262] [Citation(s) in RCA: 23] [Impact Index Per Article: 4.6] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/07/2019] [Accepted: 07/16/2019] [Indexed: 11/19/2022]

Abstract

Causal variants inferred from sequence data analysis are expected to increase accuracy of genomic selection. In this work we evaluated the gain in reliability of genomic predictions, for stature in US Holsteins, when adding selected sequence variants to a pre-existent SNP chip. Two prediction methods were tested: de-regressed proofs assuming heterogeneous (genomic BLUP; GBLUP) residual variances and by single-step GBLUP (ssGBLUP) using actual phenotypes. Phenotypic data included 3,999,631 records for stature on 3,027,304 Holstein cows. Genotypes on 54,087 SNP markers (54k) were available for 26,877 bulls. Additionally, 16,648 selected sequence variants were combined with the 54k markers, for a total of 70,735 (70k) markers. In all methods, SNP in the genomic relationship matrix (G) were unweighted or weighted iteratively, with weights derived either by SNP effects squared or by a nonlinear method that resembles BayesA (nonlinear A). Reliability of genomic predictions were obtained by cross validation. With unweighted G derived from 54k markers, the reliabilities (× 100) were 72.4 for GBLUP and 75.3 for ssGBLUP. With unweighted G derived from 70k markers, the reliabilities were 73.4 and 76.0, respectively. Weighting by nonlinear A changed reliabilities to 73.3, and 75.9, respectively. Addition of selected sequence variants had a small effect on reliabilities. Weighting by quadratic functions reduced reliabilities. Weighting by nonlinear A increased reliabilities for GBLUP but had only a small effect in ssGBLUP. Reliabilities for direct genomic values extracted from ssGBLUP using unweighted G with 54k were higher than reliabilities by any GBLUP. Thus, ssGBLUP seems to capture more information than GBLUP and there is less room for extra reliability. Improvements in GBLUP may be because the weights in G change the covariance structure, which can explain a proportion of the variance that is accounted for when a heterogeneous residual variance is assumed by considering a different number of daughters per bull.

Collapse

GWAS for Meat and Carcass Traits Using Imputed Sequence Level Genotypes in Pooled F2-Designs in Pigs. G3-GENES GENOMES GENETICS 2019;9:2823-2834. [PMID: 31296617 PMCID: PMC6723123 DOI: 10.1534/g3.119.400452] [Citation(s) in RCA: 32] [Impact Index Per Article: 6.4] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Subscribe] [Scholar Register] [Indexed: 02/06/2023]

Yang F, Chen F, Li L, Yan L, Badri T, Lv C, Yu D, Zhang M, Jang X, Li J, Yuan L, Wang G, Li H, Li J, Cai Y. Three Novel Players: PTK2B, SYK, and TNFRSF21 Were Identified to Be Involved in the Regulation of Bovine Mastitis Susceptibility via GWAS and Post-transcriptional Analysis. Front Immunol 2019;10:1579. [PMID: 31447828 PMCID: PMC6691815 DOI: 10.3389/fimmu.2019.01579] [Citation(s) in RCA: 17] [Impact Index Per Article: 3.4] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/28/2019] [Accepted: 06/24/2019] [Indexed: 12/25/2022] Open

Abstract

Bovine mastitis is a common inflammatory disease caused by multiple factors in early lactation or dry period. Genome wide association studies (GWAS) can provide a convenient and effective strategy for understanding the biological basis of mastitis and better prevention. 2b-RADseq is a high-throughput sequencing technique that offers a powerful method for genome-wide genetic marker development and genotyping. In this study, single nucleotide polymorphisms (SNPs) of the immune-regulated gene correlative with mastitis were screened and identified by two stage association analysis via GWAS-2b-RADseq in Chinese Holstein cows. We have screened 10,058 high quality SNPs from 7,957,920 tags and calculated their allele frequencies. Twenty-seven significant SNPs were co-labeled in two GWAS analysis models [Bayesian (P < 0.001) and Logistic regression (P < 0.01)], and only three SNPs (rs75762330, C > T, PIC = 0.2999; rs88640083, A > G, PIC = 0.1676; rs20438858, G > A, PIC = 0.3366) were annotated to immune-regulated genes (PTK2B, SYK, and TNFRSF21). Identified three SNPs are located in non-coding regions with low or moderate genetic polymorphisms. However, independent sample population validation (Case-control study) data showed that three important SNPs (rs75762330, P < 0.025, OR > 1; rs88640083, P < 0.005, OR > 1; rs20438858, P < 0.001, OR < 1) were significantly associated with clinical mastitis trait. Importantly, PTK2B and SYK expression was down-regulated in both peripheral blood leukocytes (PBLs) of clinical mastitis cows and in vitro LPS (E. coli)-stimulated bovine mammary epithelial cells, while TNFRSF21 was up-regulated. Under the same conditions, expression of Toll-like receptor 4 (TLR4), AKT1, and pro-inflammatory factors (IL-1β and IL-8) were also up-regulated. Interestingly, network analysis indicated that PTK2B and SYK are co-expressed in innate immune signaling pathway of Chinese Holstein. Taken together, these results provided strong evidence for the study of SNPs in bovine mastitis, and revealed the role of SYK, PTK2B, and TNFRSF21 in bovine mastitis susceptibility/tolerance.

Collapse

Affiliation(s)

Fan Yang Anhui Provincial Key Lab of the Conservation and Exploitation of Biological Resources, College of Life Sciences, Anhui Normal University, Wuhu, China College of Animal Science and Technology, Nanjing Agricultural University, Nanjing, China
Fanghui Chen College of Animal Science and Technology, Nanjing Agricultural University, Nanjing, China
Lili Li National Animal Husbandry Station, Beijing, China
Li Yan Department of Radiation Oncology, Linyi People Hospital, Linyi, China
Tarig Badri College of Animal Science and Technology, Nanjing Agricultural University, Nanjing, China
Chenglong Lv College of Animal Science and Technology, Nanjing Agricultural University, Nanjing, China
Daolun Yu Anhui Provincial Key Lab of the Conservation and Exploitation of Biological Resources, College of Life Sciences, Anhui Normal University, Wuhu, China
Manling Zhang Anhui Provincial Key Lab of the Conservation and Exploitation of Biological Resources, College of Life Sciences, Anhui Normal University, Wuhu, China
Xiaojun Jang Anhui Provincial Key Lab of the Conservation and Exploitation of Biological Resources, College of Life Sciences, Anhui Normal University, Wuhu, China
Jie Li Anhui Provincial Key Lab of the Conservation and Exploitation of Biological Resources, College of Life Sciences, Anhui Normal University, Wuhu, China
Lu Yuan Anhui Provincial Key Lab of the Conservation and Exploitation of Biological Resources, College of Life Sciences, Anhui Normal University, Wuhu, China
Genlin Wang College of Animal Science and Technology, Nanjing Agricultural University, Nanjing, China
Honglin Li Department of Biochemistry and Molecular Biology, Medical College of Georgia, Augusta University, Augusta, GA, United States
Jun Li Anhui Provincial Key Lab of the Conservation and Exploitation of Biological Resources, College of Life Sciences, Anhui Normal University, Wuhu, China
Yafei Cai College of Animal Science and Technology, Nanjing Agricultural University, Nanjing, China

Collapse

Ye S, Gao N, Zheng R, Chen Z, Teng J, Yuan X, Zhang H, Chen Z, Zhang X, Li J, Zhang Z. Strategies for Obtaining and Pruning Imputed Whole-Genome Sequence Data for Genomic Prediction. Front Genet 2019;10:673. [PMID: 31379929 PMCID: PMC6650575 DOI: 10.3389/fgene.2019.00673] [Citation(s) in RCA: 17] [Impact Index Per Article: 3.4] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/10/2019] [Accepted: 06/27/2019] [Indexed: 11/13/2022] Open

Abstract

Genomic prediction with imputed whole-genome sequencing (WGS) data is an attractive approach to improve predictive ability with low cost. However, high accuracy has not been realized using this method in livestock. In this study, we imputed 435 individuals from 600K single nucleotide polymorphism (SNP) chip data to WGS data using different reference panels. We also investigated the prediction accuracy of genomic best linear unbiased prediction (GBLUP) using imputed WGS data from different reference panels, linkage disequilibrium (LD)-based marker pruning, and pre-selected variants based on Genome-wide association society (GWAS) results. Results showed that the imputation accuracies from 600K to WGS data were 0.873 ± 0.038, 0.906 ± 0.036, and 0.979 ± 0.010 for the internal, external, and combined reference panels, respectively. In most traits of chickens, the prediction accuracy of imputed WGS data obtained from the internal reference panel was greater than or equal to that of the combined reference panel; the external reference panel had the lowest prediction accuracy. Compared with 600K chip data, GBLUP with imputed WGS data had only a small increase (1-3%) in prediction accuracy. Using only variants selected from imputed WGS data based on GWAS results resulted in almost no increase for most traits and even increased the bias of the regression coefficient. The impact of the degree of LD of selected and remaining variants on prediction accuracy was different. For average daily gain (ADG), residual feed intake (RFI), intestine length (IL), and body weight in 91 days (BW91), the accuracy of GBLUP increased as the degree of LD of selected variants decreased, but the opposite relationship occurred for the remaining variants. But for breast muscle weight (BMW) and average daily feed intake (ADFI), the accuracy of GBLUP increased as the degree of LD of selected variants increased, and the degree of LD of remaining variants had a small effect on prediction accuracy. Overall, the optimal imputation strategy to obtain WGS data for genomic prediction should consider the relationship between selected individuals and target population individuals to avoid heterogeneity of imputation. LD-based marker pruning can be used to improve the accuracy of genomic prediction using imputed WGS data.

Collapse

Improvement of genomic prediction by integrating additional single nucleotide polymorphisms selected from imputed whole genome sequencing data. Heredity (Edinb) 2019;124:37-49. [PMID: 31278370 PMCID: PMC6906477 DOI: 10.1038/s41437-019-0246-7] [Citation(s) in RCA: 27] [Impact Index Per Article: 5.4] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/30/2019] [Revised: 05/11/2019] [Accepted: 06/17/2019] [Indexed: 11/10/2022] Open

Al Kalaldeh M, Gibson J, Duijvesteijn N, Daetwyler HD, MacLeod I, Moghaddar N, Lee SH, van der Werf JHJ. Using imputed whole-genome sequence data to improve the accuracy of genomic prediction for parasite resistance in Australian sheep. Genet Sel Evol 2019;51:32. [PMID: 31242855 PMCID: PMC6595562 DOI: 10.1186/s12711-019-0476-4] [Citation(s) in RCA: 22] [Impact Index Per Article: 4.4] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/05/2018] [Accepted: 06/18/2019] [Indexed: 01/16/2023] Open

Abstract

Background

This study aimed at (1) comparing the accuracies of genomic prediction for parasite resistance in sheep based on whole-genome sequence (WGS) data to those based on 50k and high-density (HD) single nucleotide polymorphism (SNP) panels; (2) investigating whether the use of variants within quantitative trait loci (QTL) regions that were selected from regional heritability mapping (RHM) in an independent dataset improved the accuracy more than variants selected from genome-wide association studies (GWAS); and (3) comparing the prediction accuracies between variants selected from WGS data to variants selected from the HD SNP panel.

Results

The accuracy of genomic prediction improved marginally from 0.16 ± 0.02 and 0.18 ± 0.01 when using all the variants from 50k and HD genotypes, respectively, to 0.19 ± 0.01 when using all the variants from WGS data. Fitting a GRM from the selected variants alongside a GRM from the 50k SNP genotypes improved the prediction accuracy substantially compared to fitting the 50k SNP genotypes alone. The gain in prediction accuracy was slightly more pronounced when variants were selected from WGS data compared to when variants were selected from the HD panel. When sequence variants that passed the GWAS \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$- log_{10} (p\,value)$$\end{document}-log10(pvalue) threshold of 3 across the entire genome were selected, the prediction accuracy improved by 5% (up to 0.21 ± 0.01), whereas when selection was limited to sequence variants that passed the same GWAS \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$- log_{10} (p\,value)$$\end{document}-log10(pvalue) threshold of 3 in regions identified by RHM, the accuracy improved by 9% (up to 0.25 ± 0.01).

Conclusions

Our results show that through careful selection of sequence variants from the QTL regions, the accuracy of genomic prediction for parasite resistance in sheep can be improved. These findings have important implications for genomic prediction in sheep.

Collapse

Ma P, Lund MS, Aamand GP, Su G. Use of a Bayesian model including QTL markers increases prediction reliability when test animals are distant from the reference population. J Dairy Sci 2019;102:7237-7247. [PMID: 31155255 DOI: 10.3168/jds.2018-15815] [Citation(s) in RCA: 11] [Impact Index Per Article: 2.2] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/09/2018] [Accepted: 03/31/2019] [Indexed: 01/23/2023]

Abstract

Relatedness between reference and test animals has an important effect on the reliability of genomic prediction for test animals. Because genomic prediction has been widely applied in practical cattle breeding and bulls have been selected according to genomic breeding value without progeny testing, the sires or grandsires of candidates might not have phenotypic information and might not be in the reference population when the candidates are selected. The objective of this study was to investigate the decreasing trend of the reliability of genomic prediction given distant reference populations, using genomic best linear unbiased prediction (GBLUP) and Bayesian variable selection models with or without including the quantitative trait locus (QTL) markers detected from sequencing data. The data used in this study consisted of 22,242 bulls genotyped using the 54K SNP array from EuroGenomics. Among them, 1,444 Danish bulls born from 2006 to 2010 were selected as test animals. Different reference populations with varying relationships to test animals were created according to pedigree-based relationships. The reference individuals having a relationship with one or more test animals higher than 0.4 (scenario ρ < 0.4), 0.2 (ρ < 0.2), or 0.1 (ρ < 0.1, where ρ = relationship coefficient) were removed from reference sets; these represented the distance between reference and test animals being 2 generations, 3 generations, and 4 generations, respectively. Imputed whole-genome sequencing data of bulls from Denmark were used to conduct a genome-wide association study (GWAS). A small number of significant variants (QTL markers) from the GWAS were added to the array data. To compare the effects of different models, the basic GBLUP model, a Bayesian selection variable model, a GBLUP model with 2 components of genetic effects, and a Bayesian model with pooled array data and QTL markers were used for estimating genomic estimated breeding values (GEBV) of test animals. The reliability of genomic prediction decreased when the test animals were more generations away from the reference population. The reliability of genomic prediction was 0.461 for 1 generation away and 0.396 for 3 generations away, with the same number of individuals in the reference set, using a GBLUP model with chip markers only. The results showed that using the Bayesian method and QTL markers improved the reliability of genomic prediction in all scenarios of relationship between test and reference animals, in a range of 1.3% and 65.1% (4 generations away with only 841 individuals in the reference set). However, most gains were for predictions of milk yield and fat yield. There was little improvement for predictions of protein yield and mastitis, and no improvement for prediction of fertility, except for scenario ρ < 0.1, in which there was a large improvement for predictions of all traits. On the other hand, models including more than 10% polygenic effect decreased prediction reliability when the relationship between test and reference animals was distant.

Collapse

Gebreyesus G, Bovenhuis H, Lund MS, Poulsen NA, Sun D, Buitenhuis B. Reliability of genomic prediction for milk fatty acid composition by using a multi-population reference and incorporating GWAS results. Genet Sel Evol 2019;51:16. [PMID: 31029078 PMCID: PMC6487064 DOI: 10.1186/s12711-019-0460-z] [Citation(s) in RCA: 16] [Impact Index Per Article: 3.2] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/19/2018] [Accepted: 04/10/2019] [Indexed: 01/01/2023] Open

Abstract

Background

Large-scale phenotyping for detailed milk fatty acid (FA) composition is difficult due to expensive and time-consuming analytical techniques. Reliability of genomic prediction is often low for traits that are expensive/difficult to measure and for breeds with a small reference population size. An effective method to increase reference population size could be to combine datasets from different populations. Prediction models might also benefit from incorporation of information on the biological underpinnings of quantitative traits. Genome-wide association studies (GWAS) show that genomic regions on Bos taurus chromosomes (BTA) 14, 19 and 26 underlie substantial proportions of the genetic variation in milk FA traits. Genomic prediction models that incorporate such results could enable improved prediction accuracy in spite of limited reference population sizes. In this study, we combine gas chromatography quantified FA samples from the Chinese, Danish and Dutch Holstein populations and implement a genomic feature best linear unbiased prediction (GFBLUP) model that incorporates variants on BTA14, 19 and 26 as genomic features for which random genetic effects are estimated separately. Prediction reliabilities were compared to those estimated with traditional GBLUP models.

Results

Predictions using a multi-population reference and a traditional GBLUP model resulted in average gains in prediction reliability of 10% points in the Dutch, 8% points in the Danish and 1% point in the Chinese populations compared to predictions based on population-specific references. Compared to the traditional GBLUP, implementation of the GFBLUP model with a multi-population reference led to further increases in prediction reliability of up to 38% points in the Dutch, 23% points in the Danish and 13% points in the Chinese populations. Prediction reliabilities from the GFBLUP model were moderate to high across the FA traits analyzed.

Conclusions

Our study shows that it is possible to predict genetic merits for milk FA traits with reasonable accuracy by combining related populations of a breed and using models that incorporate GWAS results. Our findings indicate that international collaborations that facilitate access to multi-population datasets could be highly beneficial to the implementation of genomic selection for detailed milk composition traits.

Electronic supplementary material

The online version of this article (10.1186/s12711-019-0460-z) contains supplementary material, which is available to authorized users.

Collapse

Nani JP, Rezende FM, Peñagaricano F. Predicting male fertility in dairy cattle using markers with large effect and functional annotation data. BMC Genomics 2019;20:258. [PMID: 30940077 PMCID: PMC6444482 DOI: 10.1186/s12864-019-5644-y] [Citation(s) in RCA: 29] [Impact Index Per Article: 5.8] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/21/2018] [Accepted: 03/25/2019] [Indexed: 11/22/2022] Open

Abstract

Background

Fertility is among the most important economic traits in dairy cattle. Genomic prediction for cow fertility has received much attention in the last decade, while bull fertility has been largely overlooked. The goal of this study was to assess genomic prediction of dairy bull fertility using markers with large effect and functional annotation data. Sire conception rate (SCR) was used as a measure of service sire fertility. Dataset consisted of 11.5 k U.S. Holstein bulls with SCR records and about 300 k single nucleotide polymorphism (SNP) markers. The analyses included the use of both single-kernel and multi-kernel predictive models fitting either all SNPs, markers with large effect, or markers with presumed functional roles, such as non-synonymous, synonymous, or non-coding regulatory variants.

Results

The entire set of SNPs yielded predictive correlations of 0.340. Five markers located on chromosomes BTA8, BTA9, BTA13, BTA17, and BTA27 showed marked dominance effects. Interestingly, the inclusion of these five major markers as fixed effects in the predictive models increased predictive correlations to 0.403, representing an increase in accuracy of about 19% compared with the standard model. Single-kernel models fitting functional SNP classes outperformed their counterparts using random sets of SNPs, suggesting that the predictive power of these functional variants is driven in part by their biological roles. Multi-kernel models fitting all the functional SNP classes together with the five major markers exhibited predictive correlations around 0.405.

Conclusions

The inclusion of markers with large effect markedly improved the prediction of dairy sire fertility. Functional variants exhibited higher predictive ability than random variants, but did not outperform the standard whole-genome approach. This research is the foundation for the development of novel strategies that could help the dairy industry make accurate genome-guided selection decisions on service sire fertility.

Collapse

Voss-Fels KP, Cooper M, Hayes BJ. Accelerating crop genetic gains with genomic selection. TAG. THEORETICAL AND APPLIED GENETICS. THEORETISCHE UND ANGEWANDTE GENETIK 2019;132:669-686. [PMID: 30569365 DOI: 10.1007/s00122-018-3270-8] [Citation(s) in RCA: 119] [Impact Index Per Article: 23.8] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Received: 09/09/2018] [Accepted: 12/12/2018] [Indexed: 05/05/2023]

van den Berg S, Vandenplas J, van Eeuwijk FA, Bouwman AC, Lopes MS, Veerkamp RF. Imputation to whole-genome sequence using multiple pig populations and its use in genome-wide association studies. Genet Sel Evol 2019;51:2. [PMID: 30678638 PMCID: PMC6346588 DOI: 10.1186/s12711-019-0445-y] [Citation(s) in RCA: 41] [Impact Index Per Article: 8.2] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/12/2018] [Accepted: 01/10/2019] [Indexed: 11/10/2022] Open

Abstract

Background

Use of whole-genome sequence data (WGS) is expected to improve identification of quantitative trait loci (QTL). However, this requires imputation to WGS, often with a limited number of sequenced animals for the target population. The objective of this study was to investigate imputation to WGS in two pig lines using a multi-line reference population and, subsequently, to investigate the effect of using these imputed WGS (iWGS) for GWAS.

Methods

Phenotypes and genotypes were available on 12,184 Large White pigs (LW-line) and 4943 Dutch Landrace pigs (DL-line). Imputed 660 K and 80 K genotypes for the LW-line and DL-line, respectively, were imputed to iWGS using Beagle v.4.1. Since only 32 LW-line and 12 DL-line boars were sequenced, 142 animals from eight commercial lines were added. GWAS were performed for each line using the 80 K and 660 K SNPs, the genotype scores of iWGS SNPs that had an imputation accuracy (Beagle R²) higher than 0.6, and the dosage scores of all iWGS SNPs.

Results

For the DL-line (LW-line), imputation of 80 K genotypes to iWGS resulted in an average Beagle R² of 0.39 (0.49). After quality control, 2.5 × 10⁶ (3.5 × 10⁶) SNPs had a Beagle R² higher than 0.6, resulting in an average Beagle R² of 0.83 (0.93). Compared to the 80 K and 660 K genotypes, using iWGS led to the identification of 48.9 and 64.4% more QTL regions, for the DL-line and LW-line, respectively, and the most significant SNPs in the QTL regions explained a higher proportion of phenotypic variance. Using dosage instead of genotype scores improved the identification of QTL, because the model accounted for uncertainty of imputation, and all SNPs were used in the analysis.

Conclusions

Imputation to WGS using the multi-line reference population resulted in relatively poor imputation, especially when imputing from 80 K (DL-line). In spite of the poor imputation accuracies, using iWGS instead of a lower density SNP chip increased the number of detected QTL and the estimated proportion of phenotypic variance explained by these QTL, especially when dosage scores were used instead of genotype scores. Thus, iWGS, even with poor imputation accuracy, can be used to identify possible interesting regions for fine mapping.

Electronic supplementary material

The online version of this article (10.1186/s12711-019-0445-y) contains supplementary material, which is available to authorized users.

Collapse

Bolormaa S, Chamberlain AJ, Khansefid M, Stothard P, Swan AA, Mason B, Prowse-Wilkins CP, Duijvesteijn N, Moghaddar N, van der Werf JH, Daetwyler HD, MacLeod IM. Accuracy of imputation to whole-genome sequence in sheep. Genet Sel Evol 2019;51:1. [PMID: 30654735 PMCID: PMC6337865 DOI: 10.1186/s12711-018-0443-5] [Citation(s) in RCA: 36] [Impact Index Per Article: 7.2] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/19/2018] [Accepted: 12/18/2018] [Indexed: 12/12/2022] Open

Abstract

Background

The use of whole-genome sequence (WGS) data for genomic prediction and association studies is highly desirable because the causal mutations should be present in the data. The sequencing of 935 sheep from a range of breeds provides the opportunity to impute sheep genotyped with single nucleotide polymorphism (SNP) arrays to WGS. This study evaluated the accuracy of imputation from SNP genotypes to WGS using this reference population of 935 sequenced sheep.

Results

The accuracy of imputation from the Ovine Infinium^® HD BeadChip SNP (~ 500 k) to WGS was assessed for three target breeds: Merino, Poll Dorset and F1 Border Leicester × Merino. Imputation accuracy was highest for the Poll Dorset breed, although there were more Merino individuals in the sequenced reference population than Poll Dorset individuals. In addition, empirical imputation accuracies were higher (by up to 1.7%) when using larger multi-breed reference populations compared to using a smaller single-breed reference population. The mean accuracy of imputation across target breeds using the Minimac3 or the FImpute software was 0.94. The empirical imputation accuracy varied considerably across the genome; six chromosomes carried regions of one or more Mb with a mean imputation accuracy of < 0.7. Imputation accuracy in five variant annotation classes ranged from 0.87 (missense) up to 0.94 (intronic variants), where lower accuracy corresponded to higher proportions of rare alleles. The imputation quality statistic reported from Minimac3 (R²) had a clear positive relationship with the empirical imputation accuracy. Therefore, by first discarding imputed variants with an R² below 0.4, the mean empirical accuracy across target breeds increased to 0.97. Although accuracy of genomic prediction was less affected by filtering on R² in a multi-breed population of sheep with imputed WGS, the genomic heritability clearly tended to be lower when using variants with an R² ≤ 0.4.

Conclusions

The mean imputation accuracy was high for all target breeds and was increased by combining smaller breed sets into a multi-breed reference. We found that the Minimac3 software imputation quality statistic (R²) was a useful indicator of empirical imputation accuracy, enabling removal of very poorly imputed variants before downstream analyses.

Electronic supplementary material

The online version of this article (10.1186/s12711-018-0443-5) contains supplementary material, which is available to authorized users.

Collapse

Affiliation(s)

Sunduimijid Bolormaa Agriculture Victoria, AgriBio, Centre for AgriBioscience, 5 Ring Rd, Bundoora, VIC, 3083, Australia. .,Cooperative Research Centre for Sheep Industry Innovation, Armidale, NSW, 2351, Australia.
Amanda J Chamberlain Agriculture Victoria, AgriBio, Centre for AgriBioscience, 5 Ring Rd, Bundoora, VIC, 3083, Australia
Majid Khansefid Agriculture Victoria, AgriBio, Centre for AgriBioscience, 5 Ring Rd, Bundoora, VIC, 3083, Australia.,Cooperative Research Centre for Sheep Industry Innovation, Armidale, NSW, 2351, Australia
Paul Stothard Faculty of Agricultural, Life and Environmental Sciences, University of Alberta, Edmonton, AB, T6G 2R3, Canada
Andrew A Swan Cooperative Research Centre for Sheep Industry Innovation, Armidale, NSW, 2351, Australia.,Animal Genetics and Breeding Unit, University of New England, Armidale, NSW, 2351, Australia
Brett Mason Agriculture Victoria, AgriBio, Centre for AgriBioscience, 5 Ring Rd, Bundoora, VIC, 3083, Australia
Claire P Prowse-Wilkins Agriculture Victoria, AgriBio, Centre for AgriBioscience, 5 Ring Rd, Bundoora, VIC, 3083, Australia
Naomi Duijvesteijn Cooperative Research Centre for Sheep Industry Innovation, Armidale, NSW, 2351, Australia.,School of Environmental and Rural Science, University of New England, Armidale, NSW, 2351, Australia
Nasir Moghaddar Cooperative Research Centre for Sheep Industry Innovation, Armidale, NSW, 2351, Australia.,School of Environmental and Rural Science, University of New England, Armidale, NSW, 2351, Australia
Julius H van der Werf Cooperative Research Centre for Sheep Industry Innovation, Armidale, NSW, 2351, Australia.,School of Environmental and Rural Science, University of New England, Armidale, NSW, 2351, Australia
Hans D Daetwyler Agriculture Victoria, AgriBio, Centre for AgriBioscience, 5 Ring Rd, Bundoora, VIC, 3083, Australia.,Cooperative Research Centre for Sheep Industry Innovation, Armidale, NSW, 2351, Australia.,School of Applied Systems Biology, La Trobe University, Bundoora, VIC, 3086, Australia
Iona M MacLeod Agriculture Victoria, AgriBio, Centre for AgriBioscience, 5 Ring Rd, Bundoora, VIC, 3083, Australia.,Cooperative Research Centre for Sheep Industry Innovation, Armidale, NSW, 2351, Australia

Collapse

Harnessing genomic information for livestock improvement. Nat Rev Genet 2018;20:135-156. [DOI: 10.1038/s41576-018-0082-2] [Citation(s) in RCA: 154] [Impact Index Per Article: 25.7] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/19/2022]

Zhang Q, Sahana G, Su G, Guldbrandtsen B, Lund MS, Calus MPL. Impact of rare and low-frequency sequence variants on reliability of genomic prediction in dairy cattle. Genet Sel Evol 2018;50:62. [PMID: 30458700 PMCID: PMC6247626 DOI: 10.1186/s12711-018-0432-8] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.7] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/12/2018] [Accepted: 11/14/2018] [Indexed: 11/05/2022] Open

Abstract

Background

Availability of whole-genome sequence data for a large number of cattle and efficient imputation methodologies open a new opportunity to include rare and low-frequency variants (RLFV) in genomic prediction in dairy cattle. The objective of this study was to examine the impact of including RLFV that are within genes and selected from whole-genome sequence variants, on the reliability of genomic prediction for fertility, health and longevity in dairy cattle.

Results

All genic RLFV with a minor allele frequency lower than 0.05 were extracted from imputed sequence data and subsets were created using different strategies. These subsets were subsequently combined with Illumina 50 k single nucleotide polymorphism (SNP) data and used for genomic prediction. Reliability of prediction obtained by using 50 k SNP data alone was used as reference value and absolute changes in reliabilities are referred to as changes in percentage points. Adding a component that included either all the genic or a subset of selected RLFV into the model in addition to the 50 k component changed the reliability of predictions by − 2.2 to 1.1%, i.e. hardly no change in reliability of prediction was found, regardless of how the RLFV were selected. In addition to these empirical analyses, a simulation study was performed to evaluate the potential impact of adding RLFV in the model on the reliability of prediction. Three sets of causal RLFV (containing 21,468, 1348 and 235 RLFV) that were randomly selected from different numbers of genes were generated and accounted for 10% additional genetic variance of the estimated variance explained by the 50 k SNPs. When genic RLFV based on mapping results were included in the prediction model, reliabilities improved by up to 4.0% and when the causal RLFV were included they improved by up to 6.8%.

Conclusions

Using selected RLFV from whole-genome sequence data had only a small impact on the empirical reliability of genomic prediction in dairy cattle. Our simulations revealed that for sequence data to bring a benefit, the key is to identify causal RLFV.

Electronic supplementary material

The online version of this article (10.1186/s12711-018-0432-8) contains supplementary material, which is available to authorized users.

Collapse

Giuffra E, Tuggle CK. Functional Annotation of Animal Genomes (FAANG): Current Achievements and Roadmap. Annu Rev Anim Biosci 2018;7:65-88. [PMID: 30427726 DOI: 10.1146/annurev-animal-020518-114913] [Citation(s) in RCA: 94] [Impact Index Per Article: 15.7] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/09/2022]

Raymond B, Bouwman AC, Wientjes YCJ, Schrooten C, Houwing-Duistermaat J, Veerkamp RF. Genomic prediction for numerically small breeds, using models with pre-selected and differentially weighted markers. Genet Sel Evol 2018;50:49. [PMID: 30314431 PMCID: PMC6186145 DOI: 10.1186/s12711-018-0419-5] [Citation(s) in RCA: 22] [Impact Index Per Article: 3.7] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/20/2018] [Accepted: 10/01/2018] [Indexed: 01/22/2023] Open

Abstract

BACKGROUND

Genomic prediction (GP) accuracy in numerically small breeds is limited by the small size of the reference population. Our objective was to test a multi-breed multiple genomic relationship matrices (GRM) GP model (MBMG) that weighs pre-selected markers separately, uses the remaining markers to explain the remaining genetic variance that can be explained by markers, and weighs information of breeds in the reference population by their genetic correlation with the validation breed.

METHODS

Genotype and phenotype data were used on 595 Jersey bulls from New Zealand and 5503 Holstein bulls from the Netherlands, all with deregressed proofs for stature. Different sets of markers were used, containing either pre-selected markers from a meta-genome-wide association analysis on stature, remaining markers or both. We implemented a multi-breed bivariate GREML model in which we fitted either a single multi-breed GRM (MBSG), or two distinct multi-breed GRM (MBMG), one made with pre-selected markers and the other with remaining markers. Accuracies of predicting stature for Jersey individuals using the multi-breed models (Holstein and Jersey combined reference population) was compared to those obtained using either the Jersey (within-breed) or Holstein (across-breed) reference population. All the models were subsequently fitted in the analysis of simulated phenotypes, with a simulated genetic correlation between breeds of 1, 0.5, and 0.25.

RESULTS

The MBMG model always gave better prediction accuracies for stature compared to MBSG, within-, and across-breed GP models. For example, with MBSG, accuracies obtained by fitting 48,912 unselected markers (0.43), 357 pre-selected markers (0.38) or a combination of both (0.43), were lower than accuracies obtained by fitting pre-selected and unselected markers in separate GRM in MBMG (0.49). This improvement was further confirmed by results from a simulation study, with MBMG performing on average 23% better than MBSG with all markers fitted.

CONCLUSIONS

With the MBMG model, it is possible to use information from numerically large breeds to improve prediction accuracy of numerically small breeds. The superiority of MBMG is mainly due to its ability to use information on pre-selected markers, explain the remaining genetic variance and weigh information from a different breed by the genetic correlation between breeds.

Collapse

Cai Z, Guldbrandtsen B, Lund MS, Sahana G. Prioritizing candidate genes post-GWAS using multiple sources of data for mastitis resistance in dairy cattle. BMC Genomics 2018;19:656. [PMID: 30189836 PMCID: PMC6127918 DOI: 10.1186/s12864-018-5050-x] [Citation(s) in RCA: 45] [Impact Index Per Article: 7.5] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/18/2018] [Accepted: 08/31/2018] [Indexed: 12/31/2022] Open

Abstract

Background

Improving resistance to mastitis, one of the costliest diseases in dairy production, has become an important objective in dairy cattle breeding. However, mastitis resistance is influenced by many genes involved in multiple processes, including the response to infection, inflammation, and post-infection healing. Low genetic heritability, environmental variations, and farm management differences further complicate the identification of links between genetic variants and mastitis resistance. Consequently, studies of the genetics of variation in mastitis resistance in dairy cattle lack agreement about the responsible genes.

Results

We associated 15,552,968 imputed whole-genome sequencing markers for 5147 Nordic Holstein cattle with mastitis resistance in a genome-wide association study (GWAS). Next, we augmented P-values for markers in genes in the associated regions using Gene Ontology terms, Kyoto Encyclopedia of Genes and Genomes pathway analysis, and mammalian phenotype database. To confirm results of gene-based analyses, we used gene expression data from E. coli-challenged cow udders. We identified 22 independent quantitative trait loci (QTL) that collectively explained 14% of the variance in breeding values for resistance to clinical mastitis (CM). Using association test statistics with multiple pieces of independent information on gene function and differential expression during bacterial infection, we suggested putative causal genes with biological relevance for 12 QTL affecting resistance to CM in dairy cattle.

Conclusion

Combining information on the nearest positional genes, gene-based analyses, and differential gene expression data from RNA-seq, we identified putative causal genes (candidate genes with biological evidence) in QTL for mastitis resistance in Nordic Holstein cattle. The same strategy can be applied for other traits.

Electronic supplementary material

The online version of this article (10.1186/s12864-018-5050-x) contains supplementary material, which is available to authorized users.

Collapse

Kroezen V, Schenkel F, Miglior F, Baes C, Squires E. Candidate gene association analyses for ketosis resistance in Holsteins. J Dairy Sci 2018;101:5240-5249. [DOI: 10.3168/jds.2017-13374] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.3] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/21/2017] [Accepted: 02/14/2018] [Indexed: 11/19/2022]

Zhang C, Kemp RA, Stothard P, Wang Z, Boddicker N, Krivushin K, Dekkers J, Plastow G. Genomic evaluation of feed efficiency component traits in Duroc pigs using 80K, 650K and whole-genome sequence variants. Genet Sel Evol 2018;50:14. [PMID: 29625549 PMCID: PMC5889553 DOI: 10.1186/s12711-018-0387-9] [Citation(s) in RCA: 38] [Impact Index Per Article: 6.3] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/01/2017] [Accepted: 03/27/2018] [Indexed: 12/30/2022] Open

Abstract

BACKGROUND

Increasing marker density was proposed to have potential to improve the accuracy of genomic prediction for quantitative traits; whole-sequence data is expected to give the best accuracy of prediction, since all causal mutations that underlie a trait are expected to be included. However, in cattle and chicken, this assumption is not supported by empirical studies. Our objective was to compare the accuracy of genomic prediction of feed efficiency component traits in Duroc pigs using single nucleotide polymorphism (SNP) panels of 80K, imputed 650K, and whole-genome sequence variants using GBLUP, BayesB and BayesRC methods, with the ultimate purpose to determine the optimal method to increase genetic gain for feed efficiency in pigs.

RESULTS

Phenotypes of average daily feed intake (ADFI), average daily gain (ADG), ultrasound backfat depth (FAT), and loin muscle depth (LMD) were available for 1363 Duroc boars from a commercial breeding program. Genotype imputation accuracies reached 92.1% from 80K to 650K and 85.6% from 650K to whole-genome sequence variants. Average accuracies across methods and marker densities of genomic prediction of ADFI, FAT, LMD and ADG were 0.40, 0.65, 0.30 and 0.15, respectively. For ADFI and FAT, BayesB outperformed GBLUP, but increasing marker density had little advantage for genomic prediction. For ADG and LMD, GBLUP outperformed BayesB, while BayesRC based on whole-genome sequence data gave the best accuracies and reached up to 0.35 for LMD and 0.25 for ADG.

CONCLUSIONS

Use of genomic information was beneficial for prediction of ADFI and FAT but not for that of ADG and LMD compared to pedigree-based estimates. BayesB based on 80K SNPs gave the best genomic prediction accuracy for ADFI and FAT, while BayesRC based on whole-genome sequence data performed best for ADG and LMD. We suggest that these differences between traits in the effect of marker density and method on accuracy of genomic prediction are mainly due to the underlying genetic architecture of the traits.

Collapse

Song H, Li L, Ma P, Zhang S, Su G, Lund MS, Zhang Q, Ding X. Short communication: Improving the accuracy of genomic prediction of body conformation traits in Chinese Holsteins using markers derived from high-density marker panels. J Dairy Sci 2018;101:5250-5254. [PMID: 29550139 DOI: 10.3168/jds.2017-13456] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.7] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/07/2017] [Accepted: 11/25/2017] [Indexed: 01/02/2023]

Affiliation(s)

H Song Laboratory of Animal Genetics, Breeding and Reproduction, Ministry of Agriculture of China, National Engineering Laboratory for Animal Breeding, College of Animal Science and Technology, China Agricultural University, Beijing 100193, P.R. China
L Li Laboratory of Animal Genetics, Breeding and Reproduction, Ministry of Agriculture of China, National Engineering Laboratory for Animal Breeding, College of Animal Science and Technology, China Agricultural University, Beijing 100193, P.R. China
P Ma Laboratory of Animal Genetics, Breeding and Reproduction, Ministry of Agriculture of China, National Engineering Laboratory for Animal Breeding, College of Animal Science and Technology, China Agricultural University, Beijing 100193, P.R. China; Department of Molecular Biology and Genetics, Aarhus University, DK-8830 Tjele, Denmark; Department of Animal Science, School of Agriculture and Biology, Shanghai Jiao Tong University, Shanghai 200240, P.R. China
S Zhang Laboratory of Animal Genetics, Breeding and Reproduction, Ministry of Agriculture of China, National Engineering Laboratory for Animal Breeding, College of Animal Science and Technology, China Agricultural University, Beijing 100193, P.R. China
G Su Department of Molecular Biology and Genetics, Aarhus University, DK-8830 Tjele, Denmark
M S Lund Department of Molecular Biology and Genetics, Aarhus University, DK-8830 Tjele, Denmark
Q Zhang Laboratory of Animal Genetics, Breeding and Reproduction, Ministry of Agriculture of China, National Engineering Laboratory for Animal Breeding, College of Animal Science and Technology, China Agricultural University, Beijing 100193, P.R. China
X Ding Laboratory of Animal Genetics, Breeding and Reproduction, Ministry of Agriculture of China, National Engineering Laboratory for Animal Breeding, College of Animal Science and Technology, China Agricultural University, Beijing 100193, P.R. China.

Collapse

Jardim JG, Guldbrandtsen B, Lund MS, Sahana G. Association analysis for udder index and milking speed with imputed whole-genome sequence variants in Nordic Holstein cattle. J Dairy Sci 2017;101:2199-2212. [PMID: 29274975 DOI: 10.3168/jds.2017-12982] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.9] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/04/2017] [Accepted: 10/30/2017] [Indexed: 12/26/2022]

Pausch H, Emmerling R, Gredler-Grandl B, Fries R, Daetwyler HD, Goddard ME. Meta-analysis of sequence-based association studies across three cattle breeds reveals 25 QTL for fat and protein percentages in milk at nucleotide resolution. BMC Genomics 2017;18:853. [PMID: 29121857 PMCID: PMC5680815 DOI: 10.1186/s12864-017-4263-8] [Citation(s) in RCA: 42] [Impact Index Per Article: 6.0] [Reference Citation Analysis] [Abstract] [Key Words] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/02/2017] [Accepted: 11/02/2017] [Indexed: 11/25/2022] Open

Abstract

Background

Genotyping and whole-genome sequencing data have been generated for hundreds of thousands of cattle. International consortia used these data to compile imputation reference panels that facilitate the imputation of sequence variant genotypes for animals that have been genotyped using dense microarrays. Association studies with imputed sequence variant genotypes allow for the characterization of quantitative trait loci (QTL) at nucleotide resolution particularly when individuals from several breeds are included in the mapping populations.

Results

We imputed genotypes for 28 million sequence variants in 17,229 cattle of the Braunvieh, Fleckvieh and Holstein breeds in order to compile large mapping populations that provide high power to identify QTL for milk production traits. Association tests between imputed sequence variant genotypes and fat and protein percentages in milk uncovered between six and thirteen QTL (P < 1e-8) per breed. Eight of the detected QTL were significant in more than one breed. We combined the results across breeds using meta-analysis and identified a total of 25 QTL including six that were not significant in the within-breed association studies. Two missense mutations in the ABCG2 (p.Y581S, rs43702337, P = 4.3e-34) and GHR (p.F279Y, rs385640152, P = 1.6e-74) genes were the top variants at QTL on chromosomes 6 and 20. Another known causal missense mutation in the DGAT1 gene (p.A232K, rs109326954, P = 8.4e-1436) was the second top variant at a QTL on chromosome 14 but its allelic substitution effects were inconsistent across breeds. It turned out that the conflicting allelic substitution effects resulted from flaws in the imputed genotypes due to the use of a multi-breed reference population for genotype imputation.

Conclusions

Many QTL for milk production traits segregate across breeds and across-breed meta-analysis has greater power to detect such QTL than within-breed association testing. Association testing between imputed sequence variant genotypes and phenotypes of interest facilitates identifying causal mutations provided the accuracy of imputation is high. However, true causal mutations may remain undetected when the imputed sequence variant genotypes contain flaws. It is highly recommended to validate the effect of known causal variants in order to assess the ability to detect true causal mutations in association studies with imputed sequence variants.

Electronic supplementary material

The online version of this article (10.1186/s12864-017-4263-8) contains supplementary material, which is available to authorized users.

Collapse

van den Berg I, Bowman PJ, MacLeod IM, Hayes BJ, Wang T, Bolormaa S, Goddard ME. Multi-breed genomic prediction using Bayes R with sequence data and dropping variants with a small effect. Genet Sel Evol 2017;49:70. [PMID: 28934948 PMCID: PMC5609075 DOI: 10.1186/s12711-017-0347-9] [Citation(s) in RCA: 20] [Impact Index Per Article: 2.9] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/04/2017] [Accepted: 09/13/2017] [Indexed: 11/26/2022] Open

Abstract

Background

The increasing availability of whole-genome sequence data is expected to increase the accuracy of genomic prediction. However, results from simulation studies and analysis of real data do not always show an increase in accuracy from sequence data compared to high-density (HD) single nucleotide polymorphism (SNP) chip genotypes. In addition, the sheer number of variants makes analysis of all variants and accurate estimation of all effects computationally challenging. Our objective was to find a strategy to approximate the analysis of whole-sequence data with a Bayesian variable selection model. Using a simulated dataset, we applied a Bayes R hybrid model to analyse whole-sequence data, test the effect of dropping a proportion of variants during the analysis, and test how the analysis can be split into separate analyses per chromosome to reduce the elapsed computing time. We also investigated the effect of imputation errors on prediction accuracy. Subsequently, we applied the approach to a dataset that contained imputed sequences and records for production and fertility traits for 38,492 Holstein, Jersey, Australian Red and crossbred bulls and cows.

Results

With the simulated dataset, we found that prediction accuracy was highly increased for a breed that was not represented in the training population for sequence data compared to HD SNP data. Either dropping part of the variants during the analysis or splitting the analysis into separate analyses per chromosome decreased accuracy compared to analysing whole-sequence data. First, dropping variants from each chromosome and reanalysing the retained variants together resulted in an accuracy similar to that obtained when analysing whole-sequence data. Adding imputation errors decreased prediction accuracy, especially for errors in the validation population. With real data, using sequence variants resulted in accuracies that were similar to those obtained with the HD SNPs.

Conclusions

We present an efficient approach to approximate analysis of whole-sequence data with a Bayesian variable selection model. The lack of increase in prediction accuracy when applied to real data could be due to imputation errors, which demonstrates the importance of developing more accurate methods of imputation or directly genotyping sequence variants that have a major effect in the prediction equation.

Electronic supplementary material

The online version of this article (doi:10.1186/s12711-017-0347-9) contains supplementary material, which is available to authorized users.

Collapse

Fang L, Sahana G, Ma P, Su G, Yu Y, Zhang S, Lund MS, Sørensen P. Use of biological priors enhances understanding of genetic architecture and genomic prediction of complex traits within and between dairy cattle breeds. BMC Genomics 2017;18:604. [PMID: 28797230 PMCID: PMC5553760 DOI: 10.1186/s12864-017-4004-z] [Citation(s) in RCA: 37] [Impact Index Per Article: 5.3] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/18/2016] [Accepted: 08/02/2017] [Indexed: 02/08/2023] Open

Abstract

Background

A better understanding of the genetic architecture underlying complex traits (e.g., the distribution of causal variants and their effects) may aid in the genomic prediction. Here, we hypothesized that the genomic variants of complex traits might be enriched in a subset of genomic regions defined by genes grouped on the basis of “Gene Ontology” (GO), and that incorporating this independent biological information into genomic prediction models might improve their predictive ability.

Results

Four complex traits (i.e., milk, fat and protein yields, and mastitis) together with imputed sequence variants in Holstein (HOL) and Jersey (JER) cattle were analysed. We first carried out a post-GWAS analysis in a HOL training population to assess the degree of enrichment of the association signals in the gene regions defined by each GO term. We then extended the genomic best linear unbiased prediction model (GBLUP) to a genomic feature BLUP (GFBLUP) model, including an additional genomic effect quantifying the joint effect of a group of variants located in a genomic feature. The GBLUP model using a single random effect assumes that all genomic variants contribute to the genomic relationship equally, whereas GFBLUP attributes different weights to the individual genomic relationships in the prediction equation based on the estimated genomic parameters. Our results demonstrate that the immune-relevant GO terms were more associated with mastitis than milk production, and several biologically meaningful GO terms improved the prediction accuracy with GFBLUP for the four traits, as compared with GBLUP. The improvement of the genomic prediction between breeds (the average increase across the four traits was 0.161) was more apparent than that it was within the HOL (the average increase across the four traits was 0.020).

Conclusions

Our genomic feature modelling approaches provide a framework to simultaneously explore the genetic architecture and genomic prediction of complex traits by taking advantage of independent biological knowledge.

Electronic supplementary material

The online version of this article (doi:10.1186/s12864-017-4004-z) contains supplementary material, which is available to authorized users.

Collapse

Fragomeni BO, Lourenco DAL, Masuda Y, Legarra A, Misztal I. Incorporation of causative quantitative trait nucleotides in single-step GBLUP. Genet Sel Evol 2017;49:59. [PMID: 28747171 PMCID: PMC5530494 DOI: 10.1186/s12711-017-0335-0] [Citation(s) in RCA: 51] [Impact Index Per Article: 7.3] [Reference Citation Analysis] [Abstract] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/28/2017] [Accepted: 07/17/2017] [Indexed: 11/23/2022] Open

Abstract

Background

Much effort is put into identifying causative quantitative trait nucleotides (QTN) in animal breeding, empowered by the availability of dense single nucleotide polymorphism (SNP) information. Genomic selection using traditional SNP information is easily implemented for any number of genotyped individuals using single-step genomic best linear unbiased predictor (ssGBLUP) with the algorithm for proven and young (APY). Our aim was to investigate whether ssGBLUP is useful for genomic prediction when some or all QTN are known.

Methods

Simulations included 180,000 animals across 11 generations. Phenotypes were available for all animals in generations 6 to 10. Genotypes for 60,000 SNPs across 10 chromosomes were available for 29,000 individuals. The genetic variance was fully accounted for by 100 or 1000 biallelic QTN. Raw genomic relationship matrices (GRM) were computed from (a) unweighted SNPs, (b) unweighted SNPs and causative QTN, (c) SNPs and causative QTN weighted with results obtained with genome-wide association studies, (d) unweighted SNPs and causative QTN with simulated weights, (e) only unweighted causative QTN, (f–h) as in (b–d) but using only the top 10% causative QTN, and (i) using only causative QTN with simulated weight. Predictions were computed by pedigree-based BLUP (PBLUP) and ssGBLUP. Raw GRM were blended with 1 or 5% of the numerator relationship matrix, or 1% of the identity matrix. Inverses of GRM were obtained directly or with APY.

Results

Accuracy of breeding values for 5000 genotyped animals in the last generation with PBLUP was 0.32, and for ssGBLUP it increased to 0.49 with an unweighted GRM, 0.53 after adding unweighted QTN, 0.63 when QTN weights were estimated, and 0.89 when QTN weights were based on true effects known from the simulation. When the GRM was constructed from causative QTN only, accuracy was 0.95 and 0.99 with blending at 5 and 1%, respectively. Accuracies simulating 1000 QTN were generally lower, with a similar trend. Accuracies using the APY inverse were equal or higher than those with a regular inverse.

Conclusions

Single-step GBLUP can account for causative QTN via a weighted GRM. Accuracy gains are maximum when variances of causative QTN are known and blending is at 1%.

Collapse

Wu X, Guldbrandtsen B, Nielsen US, Lund MS, Sahana G. Association analysis for young stock survival index with imputed whole-genome sequence variants in Nordic Holstein cattle. J Dairy Sci 2017;100:6356-6370. [PMID: 28551195 DOI: 10.3168/jds.2017-12688] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.1] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/04/2017] [Accepted: 04/05/2017] [Indexed: 01/09/2023]

Abstract

Identification of the genetic variants associated with calf survival in dairy cattle will aid in the elimination of harmful mutations from the cattle population and the reduction of calf and young stock mortality rates. We used de-regressed estimated breeding values for the young stock survival (YSS) index as response variables in a genome-wide association study with imputed whole-genome sequence variants. A total of 4,610 bulls with estimated breeding values were genotyped with the Illumina BovineSNP50 (Illumina, San Diego, CA) single nucleotide polymorphism (SNP) genotyping array. Genotypes were imputed to whole-genome sequence variants. After quality control, 15,419,550 SNP on 29 Bos taurus autosomes (BTA) were used for association analysis. A modified mixed-model association analysis was used for a genome scan, followed by a linear mixed-model analysis for selected genetic variants. We identified 498 SNP on BTA5 and BTA18 that were associated with the YSS index in Nordic Holstein. The SNP rs440345507 (Chr5:94721790) on BTA5 was the putative causal mutation affecting YSS. Two haplotype-based models were used to identify haplotypes with the largest detrimental effects on YSS index. For each association signal, 1 haplotype region with harmful effects and the lead associated SNP were identified. Detected haplotypes on BTA5 and BTA18 explained 1.16 and 1.20%, respectively, of genetic variance for the YSS index. We examined whether YSS quantitative trait loci (QTL) on BTA5 and BTA18 were associated with stillbirth. YSS QTL on BTA18 overlapped a QTL region for stillbirth, but most likely 2 different causal variants were responsible for these 2 QTL. Four component traits of the YSS index, defined by sex and age, were analyzed separately by the modified mixed-model approach. The same genomic regions were associated with both bull and heifer calf mortality. Several genes (EPS8, LOC100138951, and KLK family genes) contained a lead associated SNP or were included in haplotypes with large detrimental effects on YSS in Nordic Holstein cattle.

Collapse

Lopes MS, Bovenhuis H, van Son M, Nordbø Ø, Grindflek EH, Knol EF, Bastiaansen JWM. Using markers with large effect in genetic and genomic predictions. J Anim Sci 2017;95:59-71. [PMID: 28177367 DOI: 10.2527/jas.2016.0754] [Citation(s) in RCA: 15] [Impact Index Per Article: 2.1] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/13/2022] Open

Abstract

The first attempts of applying marker-assisted selection (MAS) in animal breeding were not very successful because the identification of markers closely linked to QTL using low-density microsatellite panels was difficult. More recently, the use of high-density SNP panels in genome-wide association studies (GWAS) have increased the power and precision of identifying markers linked to QTL, which offer new possibilities for MAS. However, when GWAS started to be performed, the focus of many breeders had already shifted from the use of MAS to the application of genomic selection (using all available markers without any preselection of markers linked to QTL). In this study, we aimed to evaluate the prediction accuracy of a MAS approach that accounts for GWAS findings in the prediction models by including the most significant SNP from GWAS as a fixed effect in the marker-assisted BLUP (MA-BLUP) and marker-assisted genomic BLUP (MA-GBLUP) prediction models. A second aim was to compare the prediction accuracies from the marker-assisted models with those obtained from a Bayesian variable selection (BVS) model. To compare the prediction accuracies of traditional BLUP, MA-BLUP, genomic BLUP (GBLUP), MA-GBLUP, and BVS, we applied these models to the trait "number of teats" in 4 distinct pig populations, for validation of the results. The most significant SNP in each population was located at approximately 103.50 Mb on chromosome 7. Applying MAS by accounting for the most significant SNP in the prediction models resulted in improved prediction accuracy for number of teats in all evaluated populations compared with BLUP and GBLUP. Using MA-BLUP instead of BLUP, the increase in prediction accuracy ranged from 0.021 to 0.124, whereas using MA-GBLUP instead of GBLUP, the increase in prediction accuracy ranged from 0.003 to 0.043. The BVS model resulted in similar or higher prediction accuracies than MA-GBLUP. For the trait number of teats, BLUP resulted in the lowest prediction accuracies whereas the highest were observed when applying MA-GBLUP or BVS. In the same data set, MA-BLUP can yield similar or superior accuracies compared with GBLUP. The superiority of MA-GBLUP over traditional GBLUP is more pronounced when training populations are smaller and when relationships between training and validation populations are smaller. Marker-assisted GBLUP did not outperform BVS but does have implementation advantages in large-scale evaluations.

Collapse

Selecting sequence variants to improve genomic predictions for dairy cattle. Genet Sel Evol 2017;49:32. [PMID: 28270096 PMCID: PMC5339980 DOI: 10.1186/s12711-017-0307-4] [Citation(s) in RCA: 81] [Impact Index Per Article: 11.6] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/27/2016] [Accepted: 02/27/2017] [Indexed: 01/26/2023] Open

Abstract

Background

Millions of genetic variants have been identified by population-scale sequencing projects, but subsets of these variants are needed for routine genomic predictions or genotyping arrays. Methods for selecting sequence variants were compared using simulated sequence genotypes and real July 2015 data from the 1000 Bull Genomes Project.

Methods

Candidate sequence variants for 444 Holstein animals were combined with high-density (HD) imputed genotypes for 26,970 progeny-tested Holstein bulls. Test 1 included single nucleotide polymorphisms (SNPs) for 481,904 candidate sequence variants. Test 2 also included 249,966 insertions-deletions (InDels). After merging sequence variants with 312,614 HD SNPs and editing steps, Tests 1 and 2 included 762,588 and 1,003,453 variants, respectively. Imputation quality from findhap software was assessed with 404 of the sequenced animals in the reference population and 40 randomly chosen animals for validation. Their sequence genotypes were reduced to the subset of genotypes that were in common with HD genotypes and then imputed back to sequence. Predictions were tested for 33 traits using 2015 data of 3983 US validation bulls with daughters that were first phenotyped after August 2011.

Results

The average percentage of correctly imputed variants across all chromosomes was 97.2 for Test 1 and 97.0 for Test 2. Total time required to prepare, edit, impute, and estimate the effects of sequence variants for 27,235 bulls was about 1 week using less than 33 threads. Many sequence variants had larger estimated effects than nearby HD SNPs, but prediction reliability improved only by 0.6 percentage points in Test 1 when sequence SNPs were added to HD SNPs and by 0.4 percentage points in Test 2 when sequence SNPs and InDels were included. However, selecting the 16,648 candidate SNPs with the largest estimated effects and adding them to the 60,671 SNPs used in routine evaluations improved reliabilities by 2.7 percentage points.

Conclusions

Reliabilities for genomic predictions improved when selected sequence variants were added; gains were similar for simulated and real data for the same population, and larger than previous gains obtained by adding HD SNPs. With many genotyped animals, many data sources, and millions of variants, computing strategies must efficiently balance costs of imputation, selection, and prediction to obtain subsets of markers that provide the highest accuracy.

Electronic supplementary material

The online version of this article (doi:10.1186/s12711-017-0307-4) contains supplementary material, which is available to authorized users.

Collapse

Le TH, Christensen OF, Nielsen B, Sahana G. Genome-wide association study for conformation traits in three Danish pig breeds. Genet Sel Evol 2017;49:12. [PMID: 28118822 PMCID: PMC5259967 DOI: 10.1186/s12711-017-0289-2] [Citation(s) in RCA: 35] [Impact Index Per Article: 5.0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/20/2016] [Accepted: 01/12/2017] [Indexed: 02/07/2023] Open

Abstract

Background

Selection for sound conformation has been widely used as a primary approach to reduce lameness and leg weakness in pigs. Identification of genomic regions that affect conformation traits would help to improve selection accuracy for these lowly to moderately heritable traits. Our objective was to identify genetic factors that underlie leg and back conformation traits in three Danish pig breeds by performing a genome-wide association study followed by meta-analyses.

Methods

Data on four conformation traits (front leg, back, hind leg and overall conformation) for three Danish pig breeds (23,898 Landrace, 24,130 Yorkshire and 16,524 Duroc pigs) were used for association analyses. Estimated effects of single nucleotide polymorphisms (SNPs) from single-trait association analyses were combined in two meta-analyses: (1) a within-breed meta-analysis for multiple traits to examine if there are pleiotropic genetic variants within a breed; and (2) an across-breed meta-analysis for a single trait to examine if the same quantitative trait loci (QTL) segregate across breeds. SNP annotation was implemented through Sus scrofa Build 10.2 on Ensembl to search for candidate genes.

Results

Among the 14, 12 and 13 QTL that were detected in the single-trait association analyses for the three breeds, the most significant SNPs explained 2, 2.3 and 11.4% of genetic variance for back quality in Landrace, overall conformation in Yorkshire and back quality in Duroc, respectively. Several candidate genes for these QTL were also identified, i.e. LRPPRC, WRAP73, VRTN and PPARD likely control conformation traits through the regulation of bone and muscle development, and IGF2BP2, GH1, CCND2 and MSH2 can have an influence through growth-related processes. Meta-analyses not only confirmed many significant SNPs from single-trait analyses with higher significance levels, but also detected several additional associated SNPs and suggested QTL with possible pleiotropic effects.

Conclusions

Our results imply that conformation traits are complex and may be partly controlled by genes that are involved in bone and skeleton development, muscle and fat metabolism, and growth processes. A reliable list of QTL and candidate genes was provided that can be used in fine-mapping and marker assisted selection to improve conformation traits in pigs.

Electronic supplementary material

The online version of this article (doi:10.1186/s12711-017-0289-2) contains supplementary material, which is available to authorized users.

Collapse

Ni G, Cavero D, Fangmann A, Erbe M, Simianer H. Whole-genome sequence-based genomic prediction in laying chickens with different genomic relationship matrices to account for genetic architecture. Genet Sel Evol 2017;49:8. [PMID: 28093063 PMCID: PMC5238523 DOI: 10.1186/s12711-016-0277-y] [Citation(s) in RCA: 41] [Impact Index Per Article: 5.9] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/10/2015] [Accepted: 12/05/2016] [Indexed: 11/10/2022] Open

Abstract

Background

With the availability of next-generation sequencing technologies, genomic prediction based on whole-genome sequencing (WGS) data is now feasible in animal breeding schemes and was expected to lead to higher predictive ability, since such data may contain all genomic variants including causal mutations. Our objective was to compare prediction ability with high-density (HD) array data and WGS data in a commercial brown layer line with genomic best linear unbiased prediction (GBLUP) models using various approaches to weight single nucleotide polymorphisms (SNPs).

Methods

A total of 892 chickens from a commercial brown layer line were genotyped with 336 K segregating SNPs (array data) that included 157 K genic SNPs (i.e. SNPs in or around a gene). For these individuals, genome-wide sequence information was imputed based on data from re-sequencing runs of 25 individuals, leading to 5.2 million (M) imputed SNPs (WGS data), including 2.6 M genic SNPs. De-regressed proofs (DRP) for eggshell strength, feed intake and laying rate were used as quasi-phenotypic data in genomic prediction analyses. Four weighting factors for building a trait-specific genomic relationship matrix were investigated: identical weights, −(log₁₀P) from genome-wide association study results, squares of SNP effects from random regression BLUP, and variable selection based weights (known as BLUP|GA). Predictive ability was measured as the correlation between DRP and direct genomic breeding values in five replications of a fivefold cross-validation.

Results

Averaged over the three traits, the highest predictive ability (0.366 ± 0.075) was obtained when only genic SNPs from WGS data were used. Predictive abilities with genic SNPs and all SNPs from HD array data were 0.361 ± 0.072 and 0.353 ± 0.074, respectively. Prediction with −(log₁₀P) or squares of SNP effects as weighting factors for building a genomic relationship matrix or BLUP|GA did not increase accuracy, compared to that with identical weights, regardless of the SNP set used.

Conclusions

Our results show that little or no benefit was gained when using all imputed WGS data to perform genomic prediction compared to using HD array data regardless of the weighting factors tested. However, using only genic SNPs from WGS data had a positive effect on prediction ability.

Electronic supplementary material

The online version of this article (doi:10.1186/s12711-016-0277-y) contains supplementary material, which is available to authorized users.

Collapse

Veerkamp RF, Bouwman AC, Schrooten C, Calus MPL. Genomic prediction using preselected DNA variants from a GWAS with whole-genome sequence data in Holstein-Friesian cattle. Genet Sel Evol 2016;48:95. [PMID: 27905878 PMCID: PMC5134274 DOI: 10.1186/s12711-016-0274-1] [Citation(s) in RCA: 67] [Impact Index Per Article: 8.4] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/14/2016] [Accepted: 11/24/2016] [Indexed: 11/10/2022] Open

Abstract

Background

Whole-genome sequence data is expected to capture genetic variation more completely than common genotyping panels. Our objective was to compare the proportion of variance explained and the accuracy of genomic prediction by using imputed sequence data or preselected SNPs from a genome-wide association study (GWAS) with imputed whole-genome sequence data.

Methods

Phenotypes were available for 5503 Holstein–Friesian bulls. Genotypes were imputed up to whole-genome sequence (13,789,029 segregating DNA variants) by using run 4 of the 1000 bull genomes project. The program GCTA was used to perform GWAS for protein yield (PY), somatic cell score (SCS) and interval from first to last insemination (IFL). From the GWAS, subsets of variants were selected and genomic relationship matrices (GRM) were used to estimate the variance explained in 2087 validation animals and to evaluate the genomic prediction ability. Finally, two GRM were fitted together in several models to evaluate the effect of selected variants that were in competition with all the other variants.

Results

The GRM based on full sequence data explained only marginally more genetic variation than that based on common SNP panels: for PY, SCS and IFL, genomic heritability improved from 0.81 to 0.83, 0.83 to 0.87 and 0.69 to 0.72, respectively. Sequence data also helped to identify more variants linked to quantitative trait loci and resulted in clearer GWAS peaks across the genome. The proportion of total variance explained by the selected variants combined in a GRM was considerably smaller than that explained by all variants (less than 0.31 for all traits). When selected variants were used, accuracy of genomic predictions decreased and bias increased.

Conclusions

Although 35 to 42 variants were detected that together explained 13 to 19% of the total variance (18 to 23% of the genetic variance) when fitted alone, there was no advantage in using dense sequence information for genomic prediction in the Holstein data used in our study. Detection and selection of variants within a single breed are difficult due to long-range linkage disequilibrium. Stringent selection of variants resulted in more biased genomic predictions, although this might be due to the training population being the same dataset from which the selected variants were identified.

Electronic supplementary material

The online version of this article (doi:10.1186/s12711-016-0274-1) contains supplementary material, which is available to authorized users.

Collapse

100

van den Berg I, Boichard D, Lund MS. Sequence variants selected from a multi-breed GWAS can improve the reliability of genomic predictions in dairy cattle. Genet Sel Evol 2016;48:83. [PMID: 27809758 PMCID: PMC5095991 DOI: 10.1186/s12711-016-0259-0] [Citation(s) in RCA: 43] [Impact Index Per Article: 5.4] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/29/2015] [Accepted: 10/19/2016] [Indexed: 01/01/2023] Open