Reference Citation Analysis

For: Wang T, Chen YPP, Bowman PJ, Goddard ME, Hayes BJ. A hybrid expectation maximisation and MCMC sampling algorithm to implement Bayesian mixture model based genomic prediction and QTL mapping. BMC Genomics 2016;17:744. [PMID: 27654580 PMCID: PMC5031345 DOI: 10.1186/s12864-016-3082-7] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.6] [Received: 03/23/2016] [Accepted: 09/10/2016] [Indexed: 11/23/2022] Open

Cited by Other Article(s)

Zhao T, Wang F, Mott R, Dekkers J, Cheng H. Using encrypted genotypes and phenotypes for collaborative genomic analyses to maintain data confidentiality. Genetics 2024;226:iyad210. [PMID: 38085098 PMCID: PMC11090459 DOI: 10.1093/genetics/iyad210] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/09/2023] [Accepted: 11/13/2023] [Indexed: 03/08/2024] Open

Ma H, Li H, Ge F, Zhao H, Zhu B, Zhang L, Gao H, Xu L, Li J, Wang Z. Improving Genomic Predictions in Multi-Breed Cattle Populations: A Comparative Analysis of BayesR and GBLUP Models. Genes (Basel) 2024;15:253. [PMID: 38397242 PMCID: PMC10887749 DOI: 10.3390/genes15020253] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/14/2024] [Revised: 02/09/2024] [Accepted: 02/16/2024] [Indexed: 02/25/2024] Open

Abstract

Numerous studies have shown that combining populations from similar or closely related genetic breeds improves the accuracy of genomic predictions (GP). Diverse Bayesian and genomic best linear unbiased prediction (GBLUP) models have been developed and extensively tested to explore multi-breed genomic selection (GS) in livestock, ultimately establishing them as successful approaches for predicting genomic estimated breeding value (GEBV). This study aimed to assess the effectiveness of using BayesR and GBLUP models with linkage disequilibrium (LD)-weighted genomic relationship matrices (GRMs) for genomic prediction in three different beef cattle breeds to identify the best approach for enhancing the accuracy of multi-breed genomic selection in beef cattle. Additionally, a comparison was conducted to evaluate the predictive precision of different marker densities and genetic correlations among the three breeds of beef cattle. The GRM between Yunling cattle (YL) and other breeds demonstrated modest affinity and highlighted a notable genetic concordance of 0.87 between Chinese Wagyu (WG) and Huaxi (HX) cattle. In the within-breed GS, BayesR demonstrated an advantage over GBLUP. The prediction accuracies for HX cattle using the BayesR model were 0.52 with BovineHD BeadChip data (HD) and 0.46 with whole-genome sequencing data (WGS). In comparison to the GBLUP model, the accuracy increased by 26.8% for HD data and 9.5% for WGS data. For WG and YL, BayesR doubled the within-breed prediction accuracy to 14.3% from 7.1%, outperforming GBLUP across both HD and WGS datasets. Moreover, analyzing multiple breeds using genomic selection showed that BayesR consistently outperformed GBLUP in terms of predictive accuracy, especially when using WGS. For instance, in a mixed reference population of HX and WG, BayesR achieved a significant accuracy of 0.53 using WGS for HX, which was a substantial enhancement over the accuracies obtained with GBLUP models.
The research further highlights the benefit of including various breeds in the reference group, leading to enhanced accuracy in predictions and emphasizing the importance of comprehensive genomic selection methods. Our research findings indicate that BayesR exhibits superior performance compared to GBLUP in multi-breed genomic prediction accuracy, achieving a maximum improvement of 33.3%, especially in genetically diverse breeds. The improvement can be attributed to the effective utilization of higher single nucleotide polymorphism (SNP) marker density by BayesR, resulting in enhanced prediction accuracy. This evidence conclusively demonstrates the significant impact of BayesR on enhancing genomic predictions in diverse cattle populations, underscoring the crucial role of genetic relatedness in selection methodologies. In parallel, subsequent studies should focus on refining GRM and exploring alternative models for GP.

Wolc A, Dekkers JCM. Application of Bayesian genomic prediction methods to genome-wide association analyses. Genet Sel Evol 2022;54:31. [PMID: 35562659 PMCID: PMC9103490 DOI: 10.1186/s12711-022-00724-8] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 01/28/2022] [Accepted: 04/27/2022] [Indexed: 11/19/2022] Open

Naserkheil M, Mehrban H, Lee D, Park MN. Genome-wide Association Study for Carcass Primal Cut Yields Using Single-step Bayesian Approach in Hanwoo Cattle. Front Genet 2021;12:752424. [PMID: 34899840 PMCID: PMC8662546 DOI: 10.3389/fgene.2021.752424] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.5] [Received: 08/03/2021] [Accepted: 11/02/2021] [Indexed: 12/30/2022] Open

Abstract

The importance of meat and carcass quality is growing in beef cattle production to meet both producer and consumer demands. Primal cut yields, which reflect the body compositions of carcass, could determine the carcass grade and, consequently, command premium prices. Despite its importance, there have been few genome-wide association studies on these traits. This study aimed to identify genomic regions and putative candidate genes related to 10 primal cut traits, including tenderloin, sirloin, striploin, chuck, brisket, top round, bottom round, shank, flank, and rib in Hanwoo cattle using a single-step Bayesian regression (ssBR) approach. After genomic data quality control, 43,987 SNPs from 3,745 genotyped animals were available, of which 3,467 had phenotypic records for the analyzed traits. A total of 16 significant genomic regions (1-Mb window) were identified, of which five large-effect quantitative trait loci (QTLs) located on chromosomes 6 at 38–39 Mb, 11 at 21–22 Mb, 14 at 6–7 Mb and 26–27 Mb, and 19 at 26–27 Mb were associated with more than one trait, while the remaining 11 QTLs were trait-specific. These significant regions were harbored by 154 genes, among which TOX, FAM184B, SPP1, IBSP, PKD2, SDCBP, PIGY, LCORL, NCAPG, and ABCG2 were noteworthy. Enrichment analysis revealed biological processes and functional terms involved in growth and lipid metabolism, such as growth (GO:0040007), muscle structure development (GO:0061061), skeletal system development (GO:0001501), animal organ development (GO:0048513), lipid metabolic process (GO:0006629), response to lipid (GO:0033993), metabolic pathways (bta01100), focal adhesion (bta04510), ECM–receptor interaction (bta04512), fat digestion and absorption (bta04975), and Rap1 signaling pathway (bta04015) being the most significant for the carcass primal cut traits. 
Thus, identification of quantitative trait loci regions and plausible candidate genes will aid in a better understanding of the genetic and biological mechanisms regulating carcass primal cut yields.

Zhao T, Fernando R, Cheng H. Interpretable artificial neural networks incorporating Bayesian alphabet models for genome-wide prediction and association studies. G3 (Bethesda) 2021;11:jkab228. [PMID: 34499126 PMCID: PMC8496266 DOI: 10.1093/g3journal/jkab228] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.5] [Received: 04/13/2021] [Accepted: 06/22/2021] [Indexed: 01/05/2023]

Abstract

In conventional linear models for whole-genome prediction and genome-wide association studies (GWAS), it is usually assumed that the relationship between genotypes and phenotypes is linear. Bayesian neural networks have been used to account for non-linearity such as complex genetic architectures. Here, we introduce a method named NN-Bayes, where "NN" stands for neural networks, and "Bayes" stands for Bayesian Alphabet models, including a collection of Bayesian regression models such as BayesA, BayesB, BayesC, and Bayesian LASSO. NN-Bayes incorporates Bayesian Alphabet models into non-linear neural networks via hidden layers between single-nucleotide polymorphisms (SNPs) and observed traits. Thus, NN-Bayes attempts to improve the performance of genome-wide prediction and GWAS by accommodating non-linear relationships between the hidden nodes and the observed trait, while maintaining genomic interpretability through the Bayesian regression models that connect the SNPs to the hidden nodes. For genomic interpretability, the posterior distribution of marker effects in NN-Bayes is inferred by Markov chain Monte Carlo approaches and used for inference of association through posterior inclusion probabilities and window posterior probability of association. In simulation studies with dominance and epistatic effects, performance of NN-Bayes was significantly better than conventional linear models for both GWAS and whole-genome prediction, and the differences on prediction accuracy were substantial in magnitude. In real-data analyses, for the soy dataset, NN-Bayes achieved significantly higher prediction accuracies than conventional linear models, and results from other four different species showed that NN-Bayes had similar prediction performance to linear models, which is potentially due to the small sample size. Our NN-Bayes is optimized for high-dimensional genomic data and implemented in an open-source package called "JWAS." 
NN-Bayes can lead to greater use of Bayesian neural networks to account for non-linear relationships due to its interpretability and computational performance.

Joukhadar R, Hollaway G, Shi F, Kant S, Forrest K, Wong D, Petkowski J, Pasam R, Tibbits J, Bariana H, Bansal U, Spangenberg G, Daetwyler H, Gendall T, Hayden M. Genome-wide association reveals a complex architecture for rust resistance in 2300 worldwide bread wheat accessions screened under various Australian conditions. Theor Appl Genet 2020;133:2695-2712. [PMID: 32504212 DOI: 10.1007/s00122-020-03626-9] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.2] [Received: 09/19/2019] [Accepted: 05/25/2020] [Indexed: 05/13/2023]

Affiliation(s)

Reem Joukhadar: Agriculture Victoria, AgriBio, Centre for AgriBiosciences, Bundoora, VIC, Australia; Department of Animal, Plant and Soil Sciences, La Trobe University, Bundoora, VIC, Australia
Grant Hollaway: Agriculture Victoria, Natimuk Road, Horsham, VIC, 3401, Australia
Fan Shi: Agriculture Victoria, AgriBio, Centre for AgriBiosciences, Bundoora, VIC, Australia
Surya Kant: Agriculture Victoria, Natimuk Road, Horsham, VIC, 3401, Australia
Kerrie Forrest: Agriculture Victoria, AgriBio, Centre for AgriBiosciences, Bundoora, VIC, Australia
Debbie Wong: Agriculture Victoria, AgriBio, Centre for AgriBiosciences, Bundoora, VIC, Australia
Joanna Petkowski: Agriculture Victoria, AgriBio, Centre for AgriBiosciences, Bundoora, VIC, Australia
Raj Pasam: Agriculture Victoria, AgriBio, Centre for AgriBiosciences, Bundoora, VIC, Australia
Josquin Tibbits: Agriculture Victoria, AgriBio, Centre for AgriBiosciences, Bundoora, VIC, Australia
Harbans Bariana: Faculty of Agriculture and Environment, Plant Breeding Institute-Cobbitty, The University of Sydney, PMB4011, Narellan, NSW, 2567, Australia
Urmil Bansal: Faculty of Agriculture and Environment, Plant Breeding Institute-Cobbitty, The University of Sydney, PMB4011, Narellan, NSW, 2567, Australia
German Spangenberg: Agriculture Victoria, AgriBio, Centre for AgriBiosciences, Bundoora, VIC, Australia; School of Applied Systems Biology, La Trobe University, Bundoora, VIC, Australia
Hans Daetwyler: Agriculture Victoria, AgriBio, Centre for AgriBiosciences, Bundoora, VIC, Australia; School of Applied Systems Biology, La Trobe University, Bundoora, VIC, Australia
Tony Gendall: Department of Animal, Plant and Soil Sciences, La Trobe University, Bundoora, VIC, Australia
Matthew Hayden: Agriculture Victoria, AgriBio, Centre for AgriBiosciences, Bundoora, VIC, Australia; School of Applied Systems Biology, La Trobe University, Bundoora, VIC, Australia

Haile-Mariam M, MacLeod IM, Bolormaa S, Schrooten C, O'Connor E, de Jong G, Daetwyler HD, Pryce JE. Value of sharing cow reference population between countries on reliability of genomic prediction for milk yield traits. J Dairy Sci 2019;103:1711-1728. [PMID: 31864746 DOI: 10.3168/jds.2019-17170] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.0] [Received: 06/26/2019] [Accepted: 10/24/2019] [Indexed: 01/08/2023]

Abstract

Increasing the reliability of genomic prediction (GP) of economic traits in the pasture-based dairy production systems of New Zealand (NZ) and Australia (AU) is important to both countries. This study assessed if sharing cow phenotype and genotype data of NZ and AU improves the reliability of GP for NZ bulls. Data from approximately 32,000 NZ genotyped cows and their contemporaries were included in the May 2018 routine genetic evaluation of the Australian Dairy cattle in an attempt to provide consistent phenotypes for both countries. After the genetic evaluation, deregressed proofs of cows were calculated for milk yield traits. The April 2018 multiple across-country evaluation of Interbull was also used to calculate deregressed proofs for bulls on the NZ scale. Approximately 1,178 Jersey (Jer) and 6,422 Holstein (Hol) bulls had genotype and phenotype data. In addition to NZ cows, phenotype data of close to 60,000 genotyped Australian (AU) cows from the same genetic evaluation run as NZ cows were used. All AU and NZ females were genotyped using low-density SNP chips (<10K SNP) and were imputed first to 50K and then to ∼600K (referred to as high density; HD). We used up to 98,000 animals in the reference populations, both by expanding the NZ reference set (cow, bull, single breed to multi-breed set) and by adding AU cows. Reliabilities of GP were calculated for 508 Jer and 1,251 Hol bulls whose sires are not included in the reference set (RS) to ensure that real differences are not masked by close relationships. The GP was tested using 50K or high-density SNP chip using genomic BLUP in bivariate (considering country as a trait) or single trait models. The RS that gave the highest reliability for each breed were also tested using a hybrid GP method that combines expectation maximization with Bayes R. 
The addition of the AU cows to an NZ RS that included either NZ cows only, or cows and bulls, improved the reliability of GP for both NZ Hol and Jer validation bulls for all traits. Using single breed reference populations also increased reliability when NZ crossbred cows were added to reference populations that included only purebred NZ bulls and cows and AU cows. The full multi-breed RS (all NZ cows and bulls and AU cows) provided similar reliabilities in NZ Hol bulls, when compared with the single breed reference with crossbred NZ cows. For Jer validation bulls, the RS that included Jer cows and bulls and crossbred cows from NZ and Jer cows from AU was marginally better than the all-breed, all-country RS. In terms of reliability, the advantage of the HD SNP chip was small but captured more of the genomic variance than the 50K, particularly for Hol. The expectation maximization Bayes R GP method was slightly (up to 3 percentage points) better than genomic BLUP. We conclude that GP of milk production traits in NZ bulls improves by up to 7 percentage points in reliability by expanding the NZ reference population to include AU cows.

Hayes BJ, Daetwyler HD. 1000 Bull Genomes Project to Map Simple and Complex Genetic Traits in Cattle: Applications and Outcomes. Annu Rev Anim Biosci 2019;7:89-102. [PMID: 30508490 DOI: 10.1146/annurev-animal-020518-115024] [Citation(s) in RCA: 229] [Impact Index Per Article: 38.2] [Indexed: 11/09/2022]

van den Berg I, Hayes BJ, Chamberlain AJ, Goddard ME. Overlap between eQTL and QTL associated with production traits and fertility in dairy cattle. BMC Genomics 2019;20:291. [PMID: 30987590 PMCID: PMC6466667 DOI: 10.1186/s12864-019-5656-7] [Citation(s) in RCA: 24] [Impact Index Per Article: 4.0] [Received: 09/19/2018] [Accepted: 03/29/2019] [Indexed: 01/26/2023] Open

Abstract

Background

Identifying causative mutations or genes through which quantitative trait loci (QTL) act has proven very difficult. Using information such as gene expression may help to identify genes and mutations underlying QTL. Our objective was to identify regions associated both with production traits or fertility and with gene expression, in dairy cattle. We used three different approaches to discover QTL that are also expression QTL (eQTL): 1) estimate the correlation between local genomic estimated breeding values (GEBV) and gene expression, 2) investigate whether the 300 intervals explaining most genetic variance for a trait contain more eQTL than 300 randomly selected intervals, and 3) a colocalisation analysis. Phenotypes and genotypes up to sequence level of 35,775 dairy bulls and cows were used for QTL mapping, and gene expression and genotypes of 131 cows were used to identify eQTL.

Results

With all three approaches, we identified some overlap between eQTL and QTL, though the majority of QTL in our dataset did not seem to be eQTL. The most significant associations between QTL and eQTL were found for intervals on chromosome 18, where local GEBV for all traits showed a strong association with the expression of FUK and DDX19B. Intervals whose local GEBV for a trait correlated highly significantly with the expression of a nearby gene explained only a very small part of the genetic variance for that trait. It is likely that part of these correlations was due to linkage disequilibrium (LD) in the interval. While the 300 intervals explaining most genetic variance explained most of the GEBV variance, they contained only slightly more eQTL than 300 randomly selected intervals that explained a minimal portion of the GEBV variance. Furthermore, some variants showed a high colocalisation probability, but this was the case for only a few variants.

Conclusions

Several reasons may have contributed to the low level of overlap between QTL and eQTL detected in our study, including a lack of power in the eQTL study and long-range LD making it difficult to separate QTL and eQTL. Furthermore, it may be that eQTL explain only a small fraction of QTL.

Electronic supplementary material

The online version of this article (10.1186/s12864-019-5656-7) contains supplementary material, which is available to authorized users.

van den Berg I, Meuwissen THE, MacLeod IM, Goddard ME. Predicting the effect of reference population on the accuracy of within, across, and multibreed genomic prediction. J Dairy Sci 2019;102:3155-3174. [PMID: 30738664 DOI: 10.3168/jds.2018-15231] [Citation(s) in RCA: 30] [Impact Index Per Article: 5.0] [Received: 06/18/2018] [Accepted: 12/08/2018] [Indexed: 01/24/2023]

Abstract

Genomic prediction is widely used to select candidates for breeding. Size and composition of the reference population are important factors influencing prediction accuracy. In Holstein dairy cattle, large reference populations are used, but this is difficult to achieve in numerically small breeds and for traits that are not routinely recorded. The prediction accuracy is usually estimated using cross-validation, requiring the full data set. It would be useful to have a method to predict the benefit of multibreed reference populations that does not require the availability of the full data set. Our objective was to study the effect of the size and breed composition of the reference population on the accuracy of genomic prediction using genomic BLUP and Bayes R. We also examined the effect of trait heritability and validation breed on prediction accuracy. Using these empirical results, we investigated the use of a formula to predict the effect of the size and composition of the reference population on the accuracy of genomic prediction. Phenotypes were simulated in a data set containing real genotypes of imputed sequence variants for 22,752 dairy bulls and cows, including Holstein, Jersey, Red Holstein, and Australian Red cattle. Different reference populations were constructed, varying in size and composition, to study within-breed, multibreed, and across-breed prediction. Phenotypes were simulated varying in heritability, number of chromosomes, and number of quantitative trait loci. Genomic prediction was carried out using genomic BLUP and Bayes R. We used either the genomic relationship matrix (GRM) to estimate the number of independent chromosomal segments and subsequently to predict accuracy, or the accuracies obtained from single-breed reference populations to predict the accuracies of larger or multibreed reference populations. 
Using the GRM overestimated the accuracy; this overestimation was likely due to close relationships among some of the reference animals. Consequently, the GRM could not be used to predict the accuracy of genomic prediction reliably. However, a method using the prediction accuracies obtained by cross-validation using a small, single-breed reference population predicted the accuracy using a multibreed reference population well and slightly overestimated the accuracy for a larger reference population of the same breed, but gave a reasonably close estimate of the accuracy for a multibreed reference population. This method could be useful for making decisions regarding the size and composition of the reference population.

GWAS by GBLUP: Single and Multimarker EMMAX and Bayes Factors, with an Example in Detection of a Major Gene for Horse Gait. G3 (Bethesda) 2018;8:2301-2308. [PMID: 29748199 PMCID: PMC6027892 DOI: 10.1534/g3.118.200336] [Citation(s) in RCA: 21] [Impact Index Per Article: 3.0] [Indexed: 12/31/2022]

Abstract

Bayesian models for genomic prediction and association mapping are being increasingly used in genetic analysis of quantitative traits. Given a point estimate of variance components, the popular methods SNP-BLUP and GBLUP result in joint estimates of the effect of all markers on the analyzed trait; single and multiple marker frequentist tests (EMMAX) can be constructed from these estimates. Indeed, BLUP methods can be seen simultaneously as Bayesian or frequentist methods. So far there is no formal method to produce Bayesian statistics from GBLUP. Here we show that the Bayes Factor (BF), a commonly admitted statistical procedure, can be computed as the ratio of two normal densities: the first, of the estimate of the marker effect over its posterior standard deviation; the second, of the null hypothesis (a value of 0 over the prior standard deviation). We extend the BF to pool evidence from several markers and from several traits. We analyze, with our method and existing methods, a real data set of 630 horses genotyped for 41,711 polymorphic SNPs for the trait “outcome of the qualification test” (which addresses gait, or ambling, of horses), for which a known major gene exists. In the horse data, single marker EMMAX shows a significant effect at the right place at Bonferroni level. The BF points to the same location although with low numerical values. The strength of evidence combining information from several consecutive markers increases using the BF and decreases using EMMAX, which comes from a fundamental difference in the Bayesian and frequentist schools of hypothesis testing. We conclude that our BF method complements frequentist EMMAX analyses because it provides a better pooling of evidence across markers, although its use for primary detection is unclear due to the lack of defined rejection thresholds.
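
As an illustration of the "ratio of two normal densities" idea described in the abstract above, the following is a minimal sketch of one possible reading: a Savage-Dickey style density ratio, assuming a zero-mean normal prior on the marker effect and a normal approximation to its posterior. The function names, the argument names, and the direction of the ratio (prior density at 0 divided by posterior density at 0) are assumptions made here for illustration; they are not taken from the cited paper and may differ from the authors' exact formulation.

```python
import math


def normal_pdf(x, mean, sd):
    """Density of a normal distribution N(mean, sd^2) evaluated at x."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))


def bayes_factor_nonzero_effect(effect_hat, posterior_sd, prior_sd):
    """Hypothetical helper: Savage-Dickey style Bayes factor for a marker.

    Assumes (illustration only) prior N(0, prior_sd^2) and an approximate
    posterior N(effect_hat, posterior_sd^2). Values above 1 favour a
    non-zero marker effect.
    """
    prior_at_zero = normal_pdf(0.0, 0.0, prior_sd)
    posterior_at_zero = normal_pdf(0.0, effect_hat, posterior_sd)
    return prior_at_zero / posterior_at_zero


if __name__ == "__main__":
    # Made-up numbers, purely to show the call.
    print(bayes_factor_nonzero_effect(effect_hat=0.3, posterior_sd=0.1, prior_sd=0.2))
```

Under these assumptions the ratio simplifies to (posterior_sd / prior_sd) · exp(½ (effect_hat / posterior_sd)²), so the evidence grows with the standardized effect estimate and is discounted when the posterior is barely narrower than the prior.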

van den Berg I, Bowman PJ, MacLeod IM, Hayes BJ, Wang T, Bolormaa S, Goddard ME. Multi-breed genomic prediction using Bayes R with sequence data and dropping variants with a small effect. Genet Sel Evol 2017;49:70. [PMID: 28934948 PMCID: PMC5609075 DOI: 10.1186/s12711-017-0347-9] [Citation(s) in RCA: 24] [Impact Index Per Article: 3.0] [Received: 04/04/2017] [Accepted: 09/13/2017] [Indexed: 11/26/2022] Open

Abstract

Background

The increasing availability of whole-genome sequence data is expected to increase the accuracy of genomic prediction. However, results from simulation studies and analysis of real data do not always show an increase in accuracy from sequence data compared to high-density (HD) single nucleotide polymorphism (SNP) chip genotypes. In addition, the sheer number of variants makes analysis of all variants and accurate estimation of all effects computationally challenging. Our objective was to find a strategy to approximate the analysis of whole-sequence data with a Bayesian variable selection model. Using a simulated dataset, we applied a Bayes R hybrid model to analyse whole-sequence data, test the effect of dropping a proportion of variants during the analysis, and test how the analysis can be split into separate analyses per chromosome to reduce the elapsed computing time. We also investigated the effect of imputation errors on prediction accuracy. Subsequently, we applied the approach to a dataset that contained imputed sequences and records for production and fertility traits for 38,492 Holstein, Jersey, Australian Red and crossbred bulls and cows.

Results

With the simulated dataset, we found that prediction accuracy was highly increased for a breed that was not represented in the training population for sequence data compared to HD SNP data. Either dropping part of the variants during the analysis or splitting the analysis into separate analyses per chromosome decreased accuracy compared to analysing whole-sequence data. First, dropping variants from each chromosome and reanalysing the retained variants together resulted in an accuracy similar to that obtained when analysing whole-sequence data. Adding imputation errors decreased prediction accuracy, especially for errors in the validation population. With real data, using sequence variants resulted in accuracies that were similar to those obtained with the HD SNPs.

Conclusions

We present an efficient approach to approximate analysis of whole-sequence data with a Bayesian variable selection model. The lack of increase in prediction accuracy when applied to real data could be due to imputation errors, which demonstrates the importance of developing more accurate methods of imputation or directly genotyping sequence variants that have a major effect in the prediction equation.

Electronic supplementary material

The online version of this article (doi:10.1186/s12711-017-0347-9) contains supplementary material, which is available to authorized users.

Wang T, Chen YPP, MacLeod IM, Pryce JE, Goddard ME, Hayes BJ. Application of a Bayesian non-linear model hybrid scheme to sequence data for genomic prediction and QTL mapping. BMC Genomics 2017;18:618. [PMID: 28810831 PMCID: PMC5558724 DOI: 10.1186/s12864-017-4030-x] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.5] [Received: 03/05/2017] [Accepted: 08/07/2017] [Indexed: 11/10/2022] Open

Abstract

Background

Using whole genome sequence data might improve genomic prediction accuracy, when compared with high-density SNP arrays, and could lead to identification of causal mutations affecting complex traits. For some traits, the most accurate genomic predictions are achieved with non-linear Bayesian methods. However, as the number of variants and the size of the reference population increase, the computational time required to implement these Bayesian methods (typically with Markov Chain Monte Carlo sampling) becomes unfeasibly long.

Results

Here, we applied a new method, HyB_BR (for Hybrid BayesR), which implements a mixture model of normal distributions and hybridizes an Expectation-Maximization (EM) algorithm followed by Markov Chain Monte Carlo (MCMC) sampling, to genomic prediction in a large dairy cattle population with imputed whole genome sequence data. The imputed whole genome sequence data included 994,019 variant genotypes of 16,214 Holstein and Jersey bulls and cows. Traits included fat yield, milk volume, protein kg, fat% and protein% in milk, as well as fertility and heat tolerance. HyB_BR achieved genomic prediction accuracies as high as the full MCMC implementation of BayesR, both for predicting a validation set of Holstein and Jersey bulls (multi-breed prediction) and a validation set of Australian Red bulls (across-breed prediction). HyB_BR had a ten-fold reduction in compute time, compared with the MCMC implementation of BayesR (48 hours versus 594 hours). We also demonstrate that in many cases HyB_BR identified sequence variants with a high posterior probability of affecting the milk production or fertility traits that were similar to those identified in BayesR. For heat tolerance, both HyB_BR and BayesR found variants in or close to promising candidate genes associated with this trait and not detected by previous studies.

Conclusions

The results demonstrate that HyB_BR is a feasible method for simultaneous genomic prediction and QTL mapping with whole genome sequence in large reference populations.

Pausch H, MacLeod IM, Fries R, Emmerling R, Bowman PJ, Daetwyler HD, Goddard ME. Evaluation of the accuracy of imputed sequence variant genotypes and their utility for causal variant detection in cattle. Genet Sel Evol 2017;49:24. [PMID: 28222685 PMCID: PMC5320806 DOI: 10.1186/s12711-017-0301-x] [Citation(s) in RCA: 71] [Impact Index Per Article: 8.9] [Received: 11/14/2016] [Accepted: 02/14/2017] [Indexed: 12/11/2022] Open

Abstract

Background

The availability of dense genotypes and whole-genome sequence variants from various sources offers the opportunity to compile large datasets consisting of tens of thousands of individuals with genotypes at millions of polymorphic sites that may enhance the power of genomic analyses. The imputation of missing genotypes ensures that all individuals have genotypes for a shared set of variants.

Results

We evaluated the accuracy of imputation from dense genotypes to whole-genome sequence variants in 249 Fleckvieh and 450 Holstein cattle using Minimac and FImpute. The sequence variants of a subset of the animals were reduced to the variants that were included on the Illumina BovineHD genotyping array and subsequently inferred in silico using either within- or multi-breed reference populations. The accuracy of imputation varied considerably across chromosomes and dropped at regions where the bovine genome contains segmental duplications. Depending on the imputation strategy, the correlation between imputed and true genotypes ranged from 0.898 to 0.952. The accuracy of imputation was higher with Minimac than FImpute particularly for variants with a low minor allele frequency. Using a multi-breed reference population increased the accuracy of imputation, particularly when FImpute was used to infer genotypes. When the sequence variants were imputed using Minimac, the true genotypes were more correlated to predicted allele dosages than best-guess genotypes. The computing costs to impute 23,256,743 sequence variants in 6958 animals were ten-fold higher with Minimac than FImpute. Association studies with imputed sequence variants revealed seven quantitative trait loci (QTL) for milk fat percentage. Two causal mutations in the DGAT1 and GHR genes were the most significantly associated variants at two QTL on chromosomes 14 and 20 when Minimac was used to infer genotypes.

Conclusions

The population-based imputation of millions of sequence variants in large cohorts is computationally feasible and provides accurate genotypes. However, the accuracy of imputation is low in regions where the genome contains large segmental duplications or the coverage with array-derived single nucleotide polymorphisms is poor. Using a reference population that includes individuals from many breeds increases the accuracy of imputation particularly at low-frequency variants. Considering allele dosages rather than best-guess genotypes as explanatory variables is advantageous to detect causal mutations in association studies with imputed sequence variants.

Electronic supplementary material

The online version of this article (doi:10.1186/s12711-017-0301-x) contains supplementary material, which is available to authorized users.
