1
Zhao T, Wang F, Mott R, Dekkers J, Cheng H. Using encrypted genotypes and phenotypes for collaborative genomic analyses to maintain data confidentiality. Genetics 2024; 226:iyad210. [PMID: 38085098 PMCID: PMC11090459 DOI: 10.1093/genetics/iyad210] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/09/2023] [Accepted: 11/13/2023] [Indexed: 03/08/2024]
Abstract
To adhere to and capitalize on the benefits of the FAIR (findable, accessible, interoperable, and reusable) principles in agricultural genome-to-phenome studies, it is crucial to address privacy and intellectual property issues that prevent sharing and reuse of data in research and industry. Direct sharing of genotype and phenotype data is often prohibited due to intellectual property and privacy concerns. Thus, there is a pressing need for encryption methods that obscure confidential aspects of the data, without affecting the outcomes of certain statistical analyses. A homomorphic encryption method for genotypes and phenotypes (HEGP) has been proposed for single-marker regression in genome-wide association studies (GWAS) using linear mixed models with Gaussian errors. This methodology permits frequentist likelihood-based parameter estimation and inference. In this paper, we extend HEGP to broader applications in genome-to-phenome analyses. We show that HEGP is suited to commonly used linear mixed models for genetic analyses of quantitative traits including genomic best linear unbiased prediction (GBLUP) and ridge-regression best linear unbiased prediction (RR-BLUP), as well as Bayesian variable selection methods (e.g. those in Bayesian Alphabet), for genetic parameter estimation, genomic prediction, and GWAS. By advancing the capabilities of HEGP, we offer researchers and industry professionals a secure and efficient approach for collaborative genomic analyses while preserving data confidentiality.
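The claim that encryption can leave "the outcomes of certain statistical analyses" unchanged can be illustrated with a toy sketch. It assumes the masking is an orthogonal transformation applied to both phenotypes and design columns, which preserves inner products and therefore least-squares estimates. The single Householder reflection, the function names, and the simulated data below are illustrative assumptions, not the HEGP implementation.

```python
import random

def householder_apply(v, x):
    """Apply the orthogonal reflection Q = I - 2 v v^T / (v^T v) to vector x."""
    scale = 2.0 * sum(vi * xi for vi, xi in zip(v, x)) / sum(vi * vi for vi in v)
    return [xi - scale * vi for vi, xi in zip(v, x)]

def dot(a, b):
    return sum(ai * bi for ai, bi in zip(a, b))

def ols_two_columns(c1, c2, y):
    """Least-squares fit of y ~ b1*c1 + b2*c2 via the 2x2 normal equations."""
    a11, a12, a22 = dot(c1, c1), dot(c1, c2), dot(c2, c2)
    r1, r2 = dot(c1, y), dot(c2, y)
    det = a11 * a22 - a12 * a12
    return ((a22 * r1 - a12 * r2) / det, (a11 * r2 - a12 * r1) / det)

rng = random.Random(1)
n = 50
ones = [1.0] * n                                   # intercept column
genotype = [float(rng.choice([0, 1, 2])) for _ in range(n)]
phenotype = [2.0 + 0.5 * g + rng.gauss(0.0, 1.0) for g in genotype]

# Hypothetical secret key: the vector defining the reflection Q.
key = [rng.gauss(0.0, 1.0) for _ in range(n)]

def mask(x):
    return householder_apply(key, x)

plain = ols_two_columns(ones, genotype, phenotype)
masked = ols_two_columns(mask(ones), mask(genotype), mask(phenotype))
print(plain)
print(masked)
```

The two printed coefficient pairs agree up to floating-point rounding, because the normal equations depend only on inner products, which an orthogonal map preserves. Note that the intercept column must be masked too; after masking it is no longer a column of ones.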
Affiliation(s)
- Tianjing Zhao
  - Department of Animal Science, University of California, Davis, CA 95616, USA
  - Department of Animal Science, University of Nebraska-Lincoln, Lincoln, NE 68583, USA
- Fangyi Wang
  - Department of Plant Sciences, University of California, Davis, CA 95616, USA
- Richard Mott
  - Genetics Institute, University College London, London, WC1E 6BT, UK
- Jack Dekkers
  - Department of Animal Science, Iowa State University, Ames, IA 50011, USA
- Hao Cheng
  - Department of Animal Science, University of California, Davis, CA 95616, USA
2
Ma H, Li H, Ge F, Zhao H, Zhu B, Zhang L, Gao H, Xu L, Li J, Wang Z. Improving Genomic Predictions in Multi-Breed Cattle Populations: A Comparative Analysis of BayesR and GBLUP Models. Genes (Basel) 2024; 15:253. [PMID: 38397242 PMCID: PMC10887749 DOI: 10.3390/genes15020253] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/14/2024] [Revised: 02/09/2024] [Accepted: 02/16/2024] [Indexed: 02/25/2024]
Abstract
Numerous studies have shown that combining populations from similar or closely related genetic breeds improves the accuracy of genomic predictions (GP). Diverse Bayesian and genomic best linear unbiased prediction (GBLUP) models have been developed and extensively tested for multi-breed genomic selection (GS) in livestock, establishing them as successful approaches for predicting genomic estimated breeding values (GEBV). This study aimed to assess the effectiveness of using BayesR and GBLUP models with linkage disequilibrium (LD)-weighted genomic relationship matrices (GRMs) for genomic prediction in three different beef cattle breeds to identify the best approach for enhancing the accuracy of multi-breed genomic selection in beef cattle. Additionally, a comparison was conducted to evaluate the predictive precision of different marker densities and genetic correlations among the three breeds of beef cattle. The GRM between Yunling cattle (YL) and other breeds demonstrated modest affinity and highlighted a notable genetic concordance of 0.87 between Chinese Wagyu (WG) and Huaxi (HX) cattle. In the within-breed GS, BayesR demonstrated an advantage over GBLUP. The prediction accuracies for HX cattle using the BayesR model were 0.52 with BovineHD BeadChip data (HD) and 0.46 with whole-genome sequencing data (WGS). In comparison to the GBLUP model, the accuracy increased by 26.8% for HD data and 9.5% for WGS data. For WG and YL, BayesR doubled the within-breed prediction accuracy to 14.3% from 7.1%, outperforming GBLUP across both HD and WGS datasets. Moreover, analyzing multiple breeds using genomic selection showed that BayesR consistently outperformed GBLUP in terms of predictive accuracy, especially when using WGS. For instance, in a mixed reference population of HX and WG, BayesR achieved a significant accuracy of 0.53 using WGS for HX, which was a substantial enhancement over the accuracies obtained with GBLUP models.
The research further highlights the benefit of including various breeds in the reference group, leading to enhanced accuracy in predictions and emphasizing the importance of comprehensive genomic selection methods. Our research findings indicate that BayesR exhibits superior performance compared to GBLUP in multi-breed genomic prediction accuracy, achieving a maximum improvement of 33.3%, especially in genetically diverse breeds. The improvement can be attributed to the effective utilization of higher single nucleotide polymorphism (SNP) marker density by BayesR, resulting in enhanced prediction accuracy. This evidence conclusively demonstrates the significant impact of BayesR on enhancing genomic predictions in diverse cattle populations, underscoring the crucial role of genetic relatedness in selection methodologies. In parallel, subsequent studies should focus on refining GRM and exploring alternative models for GP.
Affiliation(s)
- Haoran Ma
  - Institute of Animal Sciences, Chinese Academy of Agricultural Sciences, Beijing 100193, China
- Hongwei Li
  - Institute of Animal Sciences, Chinese Academy of Agricultural Sciences, Beijing 100193, China
  - Department of Agricultural, Food and Nutritional Science, University of Alberta, Edmonton, AB 510632, Canada
- Fei Ge
  - Institute of Animal Sciences, Chinese Academy of Agricultural Sciences, Beijing 100193, China
- Huqiong Zhao
  - College of Animal Science, Shanxi Agricultural University, Jinzhong 030801, China
- Bo Zhu
  - Institute of Animal Sciences, Chinese Academy of Agricultural Sciences, Beijing 100193, China
- Lupei Zhang
  - Institute of Animal Sciences, Chinese Academy of Agricultural Sciences, Beijing 100193, China
- Huijiang Gao
  - Institute of Animal Sciences, Chinese Academy of Agricultural Sciences, Beijing 100193, China
- Lingyang Xu
  - Institute of Animal Sciences, Chinese Academy of Agricultural Sciences, Beijing 100193, China
- Junya Li
  - Institute of Animal Sciences, Chinese Academy of Agricultural Sciences, Beijing 100193, China
- Zezhao Wang
  - Institute of Animal Sciences, Chinese Academy of Agricultural Sciences, Beijing 100193, China
3
Wolc A, Dekkers JCM. Application of Bayesian genomic prediction methods to genome-wide association analyses. Genet Sel Evol 2022; 54:31. [PMID: 35562659 PMCID: PMC9103490 DOI: 10.1186/s12711-022-00724-8] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 01/28/2022] [Accepted: 04/27/2022] [Indexed: 11/19/2022]
Abstract
Background: Bayesian genomic prediction methods were developed to simultaneously fit all genotyped markers to a set of available phenotypes for prediction of breeding values for quantitative traits, allowing for differences in the genetic architecture (distribution of marker effects) of traits. These methods also provide a flexible and reliable framework for genome-wide association (GWA) studies. The objective here was to review developments in Bayesian hierarchical and variable selection models for GWA analyses.
Results: By fitting all genotyped markers simultaneously, Bayesian GWA methods implicitly account for population structure and the multiple-testing problem of classical single-marker GWA. Implemented using Markov chain Monte Carlo methods, Bayesian GWA methods allow for control of error rates using probabilities obtained from posterior distributions. Power of GWA studies using Bayesian methods can be enhanced by using informative priors based on previous association studies, gene expression analyses, or functional annotation information. Applied to multiple traits, Bayesian GWA analyses can give insight into pleiotropic effects by multi-trait, structural equation, or graphical models. Bayesian methods can also be used to combine genomic, transcriptomic, proteomic, and other -omics data to infer causal genotype to phenotype relationships and to suggest external interventions that can improve performance.
Conclusions: Bayesian hierarchical and variable selection methods provide a unified and powerful framework for genomic prediction, GWA, integration of prior information, and integration of information from other -omics platforms to identify causal mutations for complex quantitative traits.
Affiliation(s)
- Anna Wolc
  - Department of Animal Science, Iowa State University, 806 Stange Road, 239 Kildee Hall, Ames, IA, 50010, USA
  - Hy-Line International, 2583 240th Street, Dallas Center, IA, 50063, USA
- Jack C M Dekkers
  - Department of Animal Science, Iowa State University, 806 Stange Road, 239 Kildee Hall, Ames, IA, 50010, USA
4
Naserkheil M, Mehrban H, Lee D, Park MN. Genome-wide Association Study for Carcass Primal Cut Yields Using Single-step Bayesian Approach in Hanwoo Cattle. Front Genet 2021; 12:752424. [PMID: 34899840 PMCID: PMC8662546 DOI: 10.3389/fgene.2021.752424] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.5] [Received: 08/03/2021] [Accepted: 11/02/2021] [Indexed: 12/30/2022]
Abstract
The importance of meat and carcass quality is growing in beef cattle production to meet both producer and consumer demands. Primal cut yields, which reflect carcass body composition, could determine the carcass grade and, consequently, command premium prices. Despite their importance, there have been few genome-wide association studies on these traits. This study aimed to identify genomic regions and putative candidate genes related to 10 primal cut traits, including tenderloin, sirloin, striploin, chuck, brisket, top round, bottom round, shank, flank, and rib in Hanwoo cattle using a single-step Bayesian regression (ssBR) approach. After genomic data quality control, 43,987 SNPs from 3,745 genotyped animals were available, of which 3,467 had phenotypic records for the analyzed traits. A total of 16 significant genomic regions (1-Mb window) were identified, of which five large-effect quantitative trait loci (QTLs) located on chromosomes 6 at 38–39 Mb, 11 at 21–22 Mb, 14 at 6–7 Mb and 26–27 Mb, and 19 at 26–27 Mb were associated with more than one trait, while the remaining 11 QTLs were trait-specific. These significant regions harbored 154 genes, among which TOX, FAM184B, SPP1, IBSP, PKD2, SDCBP, PIGY, LCORL, NCAPG, and ABCG2 were noteworthy. Enrichment analysis revealed biological processes and functional terms involved in growth and lipid metabolism, such as growth (GO:0040007), muscle structure development (GO:0061061), skeletal system development (GO:0001501), animal organ development (GO:0048513), lipid metabolic process (GO:0006629), response to lipid (GO:0033993), metabolic pathways (bta01100), focal adhesion (bta04510), ECM–receptor interaction (bta04512), fat digestion and absorption (bta04975), and Rap1 signaling pathway (bta04015) being the most significant for the carcass primal cut traits.
Thus, identification of quantitative trait loci regions and plausible candidate genes will aid in a better understanding of the genetic and biological mechanisms regulating carcass primal cut yields.
Affiliation(s)
- Masoumeh Naserkheil
  - Animal Breeding and Genetics Division, National Institute of Animal Science, Cheonan-si, South Korea
- Hossein Mehrban
  - Department of Animal Science, Shahrekord University, Shahrekord, Iran
- Deukmin Lee
  - Department of Animal Life and Environment Sciences, Hankyong National University, Anseong-si, South Korea
- Mi Na Park
  - Animal Breeding and Genetics Division, National Institute of Animal Science, Cheonan-si, South Korea
5
Zhao T, Fernando R, Cheng H. Interpretable artificial neural networks incorporating Bayesian alphabet models for genome-wide prediction and association studies. G3 (Bethesda) 2021; 11:jkab228. [PMID: 34499126 PMCID: PMC8496266 DOI: 10.1093/g3journal/jkab228] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.5] [Received: 04/13/2021] [Accepted: 06/22/2021] [Indexed: 01/05/2023]
Abstract
In conventional linear models for whole-genome prediction and genome-wide association studies (GWAS), it is usually assumed that the relationship between genotypes and phenotypes is linear. Bayesian neural networks have been used to account for non-linearity such as complex genetic architectures. Here, we introduce a method named NN-Bayes, where "NN" stands for neural networks, and "Bayes" stands for Bayesian Alphabet models, including a collection of Bayesian regression models such as BayesA, BayesB, BayesC, and Bayesian LASSO. NN-Bayes incorporates Bayesian Alphabet models into non-linear neural networks via hidden layers between single-nucleotide polymorphisms (SNPs) and observed traits. Thus, NN-Bayes attempts to improve the performance of genome-wide prediction and GWAS by accommodating non-linear relationships between the hidden nodes and the observed trait, while maintaining genomic interpretability through the Bayesian regression models that connect the SNPs to the hidden nodes. For genomic interpretability, the posterior distribution of marker effects in NN-Bayes is inferred by Markov chain Monte Carlo approaches and used for inference of association through posterior inclusion probabilities and window posterior probability of association. In simulation studies with dominance and epistatic effects, performance of NN-Bayes was significantly better than that of conventional linear models for both GWAS and whole-genome prediction, and the differences in prediction accuracy were substantial in magnitude. In real-data analyses, for the soy dataset, NN-Bayes achieved significantly higher prediction accuracies than conventional linear models, and results from four other species showed that NN-Bayes had similar prediction performance to linear models, which is potentially due to the small sample size. Our NN-Bayes is optimized for high-dimensional genomic data and implemented in an open-source package called "JWAS."
NN-Bayes can lead to greater use of Bayesian neural networks to account for non-linear relationships due to its interpretability and computational performance.
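The inference step mentioned in this abstract, posterior inclusion probabilities computed from Markov chain Monte Carlo output, can be sketched as follows. The sample matrix is made up for illustration, and the window statistic is a simplified stand-in (the share of samples with at least one non-zero effect in the window), not necessarily the window posterior probability of association as defined by the authors or as computed in JWAS.

```python
def posterior_inclusion_probabilities(samples):
    """Fraction of MCMC samples in which each marker effect is non-zero."""
    n_samples = len(samples)
    n_markers = len(samples[0])
    return [
        sum(1 for s in samples if s[j] != 0.0) / n_samples
        for j in range(n_markers)
    ]

def window_inclusion_probability(samples, start, stop):
    """Fraction of samples with any non-zero effect among markers [start, stop)."""
    hits = sum(1 for s in samples if any(e != 0.0 for e in s[start:stop]))
    return hits / len(samples)

# Hypothetical MCMC output: one row per sample, one column per marker.
samples = [
    [0.0, 0.3, 0.0, 0.0],
    [0.0, 0.0, 0.2, 0.0],
    [0.0, 0.4, 0.0, 0.0],
    [0.0, 0.0, 0.0, 0.0],
]

pip = posterior_inclusion_probabilities(samples)
win = window_inclusion_probability(samples, 1, 3)
print(pip)  # [0.0, 0.5, 0.25, 0.0]
print(win)  # 0.75
```

The window value exceeds either single-marker probability here because the signal moves between two adjacent markers across samples, which is the usual motivation for window-level summaries when markers are in linkage disequilibrium.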
Affiliation(s)
- Tianjing Zhao
  - Department of Animal Science, University of California Davis, Davis, CA 95616, USA
  - Integrative Genetics and Genomics Graduate Group, University of California Davis, Davis, CA 95616, USA
- Rohan Fernando
  - Department of Animal Science, Iowa State University, Ames, IA 50011, USA
- Hao Cheng
  - Department of Animal Science, University of California Davis, Davis, CA 95616, USA
6
Joukhadar R, Hollaway G, Shi F, Kant S, Forrest K, Wong D, Petkowski J, Pasam R, Tibbits J, Bariana H, Bansal U, Spangenberg G, Daetwyler H, Gendall T, Hayden M. Genome-wide association reveals a complex architecture for rust resistance in 2300 worldwide bread wheat accessions screened under various Australian conditions. Theor Appl Genet 2020; 133:2695-2712. [PMID: 32504212 DOI: 10.1007/s00122-020-03626-9] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.2] [Received: 09/19/2019] [Accepted: 05/25/2020] [Indexed: 05/13/2023]
Abstract
We utilized 2300 wheat accessions including worldwide landraces, cultivars and primary synthetic-derived germplasm with three Australian cultivars: Annuello, Yitpi and Correll, to investigate field-based resistance to leaf (Lr) rust, stem (Sr) rust and stripe (Yr) rust diseases across a range of Australian wheat agri-production zones. Generally, the resistance in the modern Australian cultivars, synthetic derivatives, South and North American materials outperformed other geographical subpopulations. Different environments for each trait showed significant correlations, with average r values of 0.53, 0.23 and 0.66 for Lr, Sr and Yr, respectively. Single-trait genome-wide association studies (GWAS) revealed several environment-specific and multi-environment quantitative trait loci (QTL). Multi-trait GWAS confirmed a cluster of Yr QTL on chromosome 3B within a 4.4-cM region. Linkage disequilibrium and comparative mapping showed that at least three Yr QTL exist within the 3B cluster including the durable rust resistance gene Yr30. An Sr/Lr QTL on chromosome 3D was found mainly in the synthetic-derived germplasm from Annuello background which is known to carry the Agropyron elongatum 3D translocation involving the Sr24/Lr24 resistance locus. Interestingly, estimating the SNP effects using a BayesR method showed that the highest 1% of QTL effects (excluding GWAS QTL) were significantly correlated across environments, with average r values of 0.26, 0.16 and 0.55 for Lr, Sr and Yr, respectively. These results indicate the importance of small effect QTL in achieving durable rust resistance which can be captured using genomic selection.
Affiliation(s)
- Reem Joukhadar
  - Agriculture Victoria, AgriBio, Centre for AgriBiosciences, Bundoora, VIC, Australia
  - Department of Animal, Plant and Soil Sciences, La Trobe University, Bundoora, VIC, Australia
- Grant Hollaway
  - Agriculture Victoria, Natimuk Road, Horsham, VIC, 3401, Australia
- Fan Shi
  - Agriculture Victoria, AgriBio, Centre for AgriBiosciences, Bundoora, VIC, Australia
- Surya Kant
  - Agriculture Victoria, Natimuk Road, Horsham, VIC, 3401, Australia
- Kerrie Forrest
  - Agriculture Victoria, AgriBio, Centre for AgriBiosciences, Bundoora, VIC, Australia
- Debbie Wong
  - Agriculture Victoria, AgriBio, Centre for AgriBiosciences, Bundoora, VIC, Australia
- Joanna Petkowski
  - Agriculture Victoria, AgriBio, Centre for AgriBiosciences, Bundoora, VIC, Australia
- Raj Pasam
  - Agriculture Victoria, AgriBio, Centre for AgriBiosciences, Bundoora, VIC, Australia
- Josquin Tibbits
  - Agriculture Victoria, AgriBio, Centre for AgriBiosciences, Bundoora, VIC, Australia
- Harbans Bariana
  - Faculty of Agriculture and Environment, Plant Breeding Institute-Cobbitty, The University of Sydney, PMB4011, Narellan, NSW, 2567, Australia
- Urmil Bansal
  - Faculty of Agriculture and Environment, Plant Breeding Institute-Cobbitty, The University of Sydney, PMB4011, Narellan, NSW, 2567, Australia
- German Spangenberg
  - Agriculture Victoria, AgriBio, Centre for AgriBiosciences, Bundoora, VIC, Australia
  - School of Applied Systems Biology, La Trobe University, Bundoora, VIC, Australia
- Hans Daetwyler
  - Agriculture Victoria, AgriBio, Centre for AgriBiosciences, Bundoora, VIC, Australia
  - School of Applied Systems Biology, La Trobe University, Bundoora, VIC, Australia
- Tony Gendall
  - Department of Animal, Plant and Soil Sciences, La Trobe University, Bundoora, VIC, Australia
- Matthew Hayden
  - Agriculture Victoria, AgriBio, Centre for AgriBiosciences, Bundoora, VIC, Australia
  - School of Applied Systems Biology, La Trobe University, Bundoora, VIC, Australia
7
Haile-Mariam M, MacLeod IM, Bolormaa S, Schrooten C, O'Connor E, de Jong G, Daetwyler HD, Pryce JE. Value of sharing cow reference population between countries on reliability of genomic prediction for milk yield traits. J Dairy Sci 2019; 103:1711-1728. [PMID: 31864746 DOI: 10.3168/jds.2019-17170] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.0] [Received: 06/26/2019] [Accepted: 10/24/2019] [Indexed: 01/08/2023]
Abstract
Increasing the reliability of genomic prediction (GP) of economic traits in the pasture-based dairy production systems of New Zealand (NZ) and Australia (AU) is important to both countries. This study assessed if sharing cow phenotype and genotype data of NZ and AU improves the reliability of GP for NZ bulls. Data from approximately 32,000 NZ genotyped cows and their contemporaries were included in the May 2018 routine genetic evaluation of the Australian Dairy cattle in an attempt to provide consistent phenotypes for both countries. After the genetic evaluation, deregressed proofs of cows were calculated for milk yield traits. The April 2018 multiple across-country evaluation of Interbull was also used to calculate deregressed proofs for bulls on the NZ scale. Approximately 1,178 Jersey (Jer) and 6,422 Holstein (Hol) bulls had genotype and phenotype data. In addition to NZ cows, phenotype data of close to 60,000 genotyped Australian (AU) cows from the same genetic evaluation run as NZ cows were used. All AU and NZ females were genotyped using low-density SNP chips (<10K SNP) and were imputed first to 50K and then to ∼600K (referred to as high density; HD). We used up to 98,000 animals in the reference populations, both by expanding the NZ reference set (cow, bull, single breed to multi-breed set) and by adding AU cows. Reliabilities of GP were calculated for 508 Jer and 1,251 Hol bulls whose sires are not included in the reference set (RS) to ensure that real differences are not masked by close relationships. The GP was tested using 50K or high-density SNP chip using genomic BLUP in bivariate (considering country as a trait) or single trait models. The RS that gave the highest reliability for each breed were also tested using a hybrid GP method that combines expectation maximization with Bayes R. 
The addition of the AU cows to an NZ RS that included either NZ cows only, or cows and bulls, improved the reliability of GP for both NZ Hol and Jer validation bulls for all traits. Using single breed reference populations also increased reliability when NZ crossbred cows were added to reference populations that included only purebred NZ bulls and cows and AU cows. The full multi-breed RS (all NZ cows and bulls and AU cows) provided similar reliabilities in NZ Hol bulls, when compared with the single breed reference with crossbred NZ cows. For Jer validation bulls, the RS that included Jer cows and bulls and crossbred cows from NZ and Jer cows from AU was marginally better than the all-breed, all-country RS. In terms of reliability, the advantage of the HD SNP chip was small but captured more of the genomic variance than the 50K, particularly for Hol. The expectation maximization Bayes R GP method was slightly (up to 3 percentage points) better than genomic BLUP. We conclude that GP of milk production traits in NZ bulls improves by up to 7 percentage points in reliability by expanding the NZ reference population to include AU cows.
Affiliation(s)
- M Haile-Mariam
  - Agriculture Victoria, Department of Jobs, Precincts and Regions, Bundoora, VIC 3083, Australia
- I M MacLeod
  - Agriculture Victoria, Department of Jobs, Precincts and Regions, Bundoora, VIC 3083, Australia
- S Bolormaa
  - Agriculture Victoria, Department of Jobs, Precincts and Regions, Bundoora, VIC 3083, Australia
- G de Jong
  - CRV, 6800 AL Arnhem, the Netherlands
- H D Daetwyler
  - Agriculture Victoria, Department of Jobs, Precincts and Regions, Bundoora, VIC 3083, Australia
  - School of Applied Systems Biology, La Trobe University, Bundoora, VIC 3083, Australia
- J E Pryce
  - Agriculture Victoria, Department of Jobs, Precincts and Regions, Bundoora, VIC 3083, Australia
  - School of Applied Systems Biology, La Trobe University, Bundoora, VIC 3083, Australia
8
Hayes BJ, Daetwyler HD. 1000 Bull Genomes Project to Map Simple and Complex Genetic Traits in Cattle: Applications and Outcomes. Annu Rev Anim Biosci 2019; 7:89-102. [PMID: 30508490 DOI: 10.1146/annurev-animal-020518-115024] [Citation(s) in RCA: 229] [Impact Index Per Article: 38.2] [Indexed: 11/09/2022]
Abstract
The 1000 Bull Genomes Project is a collection of whole-genome sequences from 2,703 individuals capturing a significant proportion of the world's cattle diversity. So far, 84 million single-nucleotide polymorphisms (SNPs) and 2.5 million small insertion deletions have been identified in the collection, a very high level of genetic diversity. The project has greatly accelerated the identification of deleterious mutations for a range of genetic diseases, as well as for embryonic lethals. The rate of identification of causal mutations for complex traits has been slower, reflecting the typically small effect size of these mutations and the fact that many are likely in as-yet-unannotated regulatory regions. Both the deleterious mutations that have been identified and the mutations associated with complex trait variation have been included in low-cost SNP array designs, and these arrays are being genotyped in tens of thousands of dairy and beef cattle, enabling management of deleterious mutations in these populations as well as genomic selection.
Affiliation(s)
- Ben J Hayes
  - Queensland Alliance for Agriculture and Food Innovation, The University of Queensland, St Lucia, Queensland 4067, Australia
  - Agriculture Victoria Research, AgriBio, Bundoora, Victoria 3083, Australia
- Hans D Daetwyler
  - Agriculture Victoria Research, AgriBio, Bundoora, Victoria 3083, Australia
  - School of Applied Systems Biology, La Trobe University, Bundoora, Victoria 3083, Australia
9
van den Berg I, Hayes BJ, Chamberlain AJ, Goddard ME. Overlap between eQTL and QTL associated with production traits and fertility in dairy cattle. BMC Genomics 2019; 20:291. [PMID: 30987590 PMCID: PMC6466667 DOI: 10.1186/s12864-019-5656-7] [Citation(s) in RCA: 24] [Impact Index Per Article: 4.0] [Received: 09/19/2018] [Accepted: 03/29/2019] [Indexed: 01/26/2023]
Abstract
Background: Identifying causative mutations or genes through which quantitative trait loci (QTL) act has proven very difficult. Using information such as gene expression may help to identify genes and mutations underlying QTL. Our objective was to identify regions associated both with production traits or fertility and with gene expression, in dairy cattle. We used three different approaches to discover QTL that are also expression QTL (eQTL): 1) estimate the correlation between local genomic estimated breeding values (GEBV) and gene expression, 2) investigate whether the 300 intervals explaining most genetic variance for a trait contain more eQTL than 300 randomly selected intervals, and 3) a colocalisation analysis. Phenotypes and genotypes up to sequence level of 35,775 dairy bulls and cows were used for QTL mapping, and gene expression and genotypes of 131 cows were used to identify eQTL.
Results: With all three approaches, we identified some overlap between eQTL and QTL, though the majority of QTL in our dataset did not seem to be eQTL. The most significant associations between QTL and eQTL were found for intervals on chromosome 18, where local GEBV for all traits showed a strong association with the expression of FUK and DDX19B. Intervals whose local GEBV for a trait correlated highly significantly with the expression of a nearby gene explained only a very small part of the genetic variance for that trait. It is likely that part of these correlations was due to linkage disequilibrium (LD) in the interval. While the 300 intervals explaining most genetic variance explained most of the GEBV variance, they contained only slightly more eQTL than 300 randomly selected intervals that explained a minimal portion of the GEBV variance. Furthermore, some variants showed a high colocalisation probability, but this was the case for only a few variants.
Conclusions: Several reasons may have contributed to the low level of overlap between QTL and eQTL detected in our study, including a lack of power in the eQTL study and long-range LD making it difficult to separate QTL and eQTL. Furthermore, it may be that eQTL explain only a small fraction of QTL.
Affiliation(s)
- I van den Berg
  - Faculty of Veterinary & Agricultural Science, University of Melbourne, Parkville, Victoria, Australia
  - Agriculture Victoria, AgriBio, Centre for AgriBioscience, Bundoora, Victoria, Australia
- B J Hayes
  - Agriculture Victoria, AgriBio, Centre for AgriBioscience, Bundoora, Victoria, Australia
  - Queensland Alliance for Agriculture and Food Innovation, Centre for Animal Science, University of Queensland, St Lucia, Queensland, 4067, Australia
- A J Chamberlain
  - Agriculture Victoria, AgriBio, Centre for AgriBioscience, Bundoora, Victoria, Australia
- M E Goddard
  - Faculty of Veterinary & Agricultural Science, University of Melbourne, Parkville, Victoria, Australia
  - Agriculture Victoria, AgriBio, Centre for AgriBioscience, Bundoora, Victoria, Australia
10
van den Berg I, Meuwissen THE, MacLeod IM, Goddard ME. Predicting the effect of reference population on the accuracy of within, across, and multibreed genomic prediction. J Dairy Sci 2019; 102:3155-3174. [PMID: 30738664 DOI: 10.3168/jds.2018-15231] [Citation(s) in RCA: 30] [Impact Index Per Article: 5.0] [Received: 06/18/2018] [Accepted: 12/08/2018] [Indexed: 01/24/2023]
Abstract
Genomic prediction is widely used to select candidates for breeding. Size and composition of the reference population are important factors influencing prediction accuracy. In Holstein dairy cattle, large reference populations are used, but this is difficult to achieve in numerically small breeds and for traits that are not routinely recorded. The prediction accuracy is usually estimated using cross-validation, requiring the full data set. It would be useful to have a method to predict the benefit of multibreed reference populations that does not require the availability of the full data set. Our objective was to study the effect of the size and breed composition of the reference population on the accuracy of genomic prediction using genomic BLUP and Bayes R. We also examined the effect of trait heritability and validation breed on prediction accuracy. Using these empirical results, we investigated the use of a formula to predict the effect of the size and composition of the reference population on the accuracy of genomic prediction. Phenotypes were simulated in a data set containing real genotypes of imputed sequence variants for 22,752 dairy bulls and cows, including Holstein, Jersey, Red Holstein, and Australian Red cattle. Different reference populations were constructed, varying in size and composition, to study within-breed, multibreed, and across-breed prediction. Phenotypes were simulated varying in heritability, number of chromosomes, and number of quantitative trait loci. Genomic prediction was carried out using genomic BLUP and Bayes R. We used either the genomic relationship matrix (GRM) to estimate the number of independent chromosomal segments and subsequently to predict accuracy, or the accuracies obtained from single-breed reference populations to predict the accuracies of larger or multibreed reference populations. 
Predicting accuracy from the GRM overestimated it, probably because of close relationships among some of the reference animals. Consequently, the GRM could not be used to predict the accuracy of genomic prediction reliably. However, a method based on the prediction accuracies obtained by cross-validation in a small, single-breed reference population gave a reasonably close estimate of the accuracy for a multibreed reference population and only slightly overestimated the accuracy for a larger reference population of the same breed. This method could be useful for making decisions regarding the size and composition of the reference population.
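The abstract does not state which formula was used. As a minimal sketch, assume the widely used deterministic expression r = sqrt(N h² / (N h² + Me)), where N is the reference population size, h² the heritability, and Me the number of independent chromosomal segments. The function names below are hypothetical, and both ways of obtaining Me (from the variance of off-diagonal GRM elements, or by back-solving from an accuracy observed in a small reference population) are illustrative assumptions rather than the authors' exact procedure.

```python
import math


def expected_accuracy(n_ref, h2, m_e):
    """Assumed deterministic accuracy: r = sqrt(N h2 / (N h2 + Me))."""
    return math.sqrt(n_ref * h2 / (n_ref * h2 + m_e))


def me_from_grm_offdiag(offdiag_values):
    """Approximate Me as 1 / var(off-diagonal genomic relationships).

    This is one common approximation (an assumption here); close relatives
    inflate the variance and therefore lower Me, which raises the
    predicted accuracy.
    """
    n = len(offdiag_values)
    mean = sum(offdiag_values) / n
    var = sum((g - mean) ** 2 for g in offdiag_values) / n
    return 1.0 / var


def me_from_observed_accuracy(n_ref, h2, r_obs):
    """Back-solve Me from an accuracy observed with a small reference set.

    Rearranging r^2 = N h2 / (N h2 + Me) gives Me = N h2 (1 - r^2) / r^2.
    r_obs must be the correlation with the true genetic value.
    """
    r2 = r_obs ** 2
    return n_ref * h2 * (1.0 - r2) / r2


# Hypothetical numbers: calibrate Me on 1,000 animals, then extrapolate.
m_e = me_from_observed_accuracy(n_ref=1000, h2=0.5, r_obs=0.4)
for n in (1000, 5000, 20000):
    print(n, round(expected_accuracy(n, 0.5, m_e), 3))
```

If the observed accuracy is a correlation with phenotypes rather than with true genetic values, it would first need to be divided by the square root of the heritability before back-solving.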
Affiliation(s)
- I van den Berg
  - Faculty of Veterinary & Agricultural Science, University of Melbourne, 3010 Parkville, Victoria, Australia; Agriculture Victoria, AgriBio, Centre for AgriBioscience, 3083 Bundoora, Victoria, Australia
- T H E Meuwissen
  - Department of Animal and Aquacultural Sciences, Norwegian University of Life Sciences, 1432 Ås, Norway
- I M MacLeod
  - Agriculture Victoria, AgriBio, Centre for AgriBioscience, 3083 Bundoora, Victoria, Australia
- M E Goddard
  - Faculty of Veterinary & Agricultural Science, University of Melbourne, 3010 Parkville, Victoria, Australia; Agriculture Victoria, AgriBio, Centre for AgriBioscience, 3083 Bundoora, Victoria, Australia
|
11
|
GWAS by GBLUP: Single and Multimarker EMMAX and Bayes Factors, with an Example in Detection of a Major Gene for Horse Gait. G3-GENES GENOMES GENETICS 2018; 8:2301-2308. [PMID: 29748199 PMCID: PMC6027892 DOI: 10.1534/g3.118.200336] [Citation(s) in RCA: 21] [Impact Index Per Article: 3.0] [Indexed: 12/31/2022]
Abstract
Bayesian models for genomic prediction and association mapping are increasingly used in the genetic analysis of quantitative traits. Given a point estimate of variance components, the popular methods SNP-BLUP and GBLUP yield joint estimates of the effects of all markers on the analyzed trait; single- and multiple-marker frequentist tests (EMMAX) can be constructed from these estimates. Indeed, BLUP methods can be seen simultaneously as Bayesian or frequentist methods. So far there has been no formal method to produce Bayesian statistics from GBLUP. Here we show that the Bayes Factor (BF), a widely accepted statistical procedure, can be computed as the ratio of two normal densities: the first, of the estimate of the marker effect over its posterior standard deviation; the second, of the null hypothesis (a value of 0 over the prior standard deviation). We extend the BF to pool evidence from several markers and from several traits. Using our method and existing ones, we analyze a real data set of 630 horses genotyped for 41,711 polymorphic SNPs for the trait “outcome of the qualification test” (which addresses gait, or ambling, of horses), for which a known major gene exists. In the horse data, single-marker EMMAX shows a significant effect at the expected location at the Bonferroni level. The BF points to the same location, although with low numerical values. The strength of evidence obtained by combining information from several consecutive markers increases with the BF and decreases with EMMAX, which reflects a fundamental difference between the Bayesian and frequentist schools of hypothesis testing. We conclude that our BF method complements frequentist EMMAX analyses because it provides a better pooling of evidence across markers, although its use for primary detection is unclear because defined rejection thresholds are lacking.
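The abstract describes the BF only in words. As a rough sketch, the code below uses the Savage-Dickey density ratio, a standard way to get a Bayes factor for a point null when prior and posterior are both normal: the prior density at zero divided by the posterior density at zero. Whether this matches the authors' exact formulation is an assumption, and the function names and input values are hypothetical.

```python
import math


def normal_pdf(x, mean, sd):
    """Density of a normal distribution with the given mean and sd at x."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))


def savage_dickey_bf10(effect_hat, posterior_sd, prior_sd):
    """Bayes factor for H1 (effect != 0) against H0 (effect == 0).

    Assumes a N(0, prior_sd^2) prior and a N(effect_hat, posterior_sd^2)
    posterior for the marker effect (illustrative assumption).
    """
    prior_at_zero = normal_pdf(0.0, 0.0, prior_sd)
    posterior_at_zero = normal_pdf(0.0, effect_hat, posterior_sd)
    return prior_at_zero / posterior_at_zero


# Hypothetical marker: estimate 0.03, posterior sd 0.01, prior sd 0.02.
print(savage_dickey_bf10(0.03, 0.01, 0.02))
```

Values above 1 favour a non-zero marker effect. When the estimate is exactly zero, the ratio reduces to posterior_sd / prior_sd, so a posterior that is tighter than the prior around zero counts as evidence for the null.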
|
12
|
van den Berg I, Bowman PJ, MacLeod IM, Hayes BJ, Wang T, Bolormaa S, Goddard ME. Multi-breed genomic prediction using Bayes R with sequence data and dropping variants with a small effect. Genet Sel Evol 2017; 49:70. [PMID: 28934948 PMCID: PMC5609075 DOI: 10.1186/s12711-017-0347-9] [Citation(s) in RCA: 24] [Impact Index Per Article: 3.0] [Received: 04/04/2017] [Accepted: 09/13/2017] [Indexed: 11/26/2022] Open
Abstract
Background: The increasing availability of whole-genome sequence data is expected to increase the accuracy of genomic prediction. However, results from simulation studies and analysis of real data do not always show an increase in accuracy from sequence data compared to high-density (HD) single nucleotide polymorphism (SNP) chip genotypes. In addition, the sheer number of variants makes analysis of all variants and accurate estimation of all effects computationally challenging. Our objective was to find a strategy to approximate the analysis of whole-sequence data with a Bayesian variable selection model. Using a simulated dataset, we applied a Bayes R hybrid model to analyse whole-sequence data, test the effect of dropping a proportion of variants during the analysis, and test how the analysis can be split into separate analyses per chromosome to reduce the elapsed computing time. We also investigated the effect of imputation errors on prediction accuracy. Subsequently, we applied the approach to a dataset that contained imputed sequences and records for production and fertility traits for 38,492 Holstein, Jersey, Australian Red and crossbred bulls and cows. Results: With the simulated dataset, we found that, for a breed that was not represented in the training population, prediction accuracy was considerably higher with sequence data than with HD SNP data. Either dropping part of the variants during the analysis or splitting the analysis into separate analyses per chromosome decreased accuracy compared to analysing whole-sequence data. However, first dropping variants from each chromosome and then reanalysing the retained variants together resulted in an accuracy similar to that obtained when analysing whole-sequence data. Adding imputation errors decreased prediction accuracy, especially for errors in the validation population. With real data, using sequence variants resulted in accuracies that were similar to those obtained with the HD SNPs.
Conclusions: We present an efficient approach to approximate analysis of whole-sequence data with a Bayesian variable selection model. The lack of increase in prediction accuracy when applied to real data could be due to imputation errors, which demonstrates the importance of developing more accurate methods of imputation or directly genotyping sequence variants that have a major effect in the prediction equation.
Affiliation(s)
- Irene van den Berg
  - Faculty of Veterinary and Agricultural Science, University of Melbourne, Parkville, VIC, Australia
- Phil J Bowman
  - Agriculture Victoria, AgriBio, Centre for AgriBioscience, Bundoora, VIC, 3083, Australia; School of Applied Systems Biology, La Trobe University, Bundoora, VIC, 3083, Australia
- Iona M MacLeod
  - Agriculture Victoria, AgriBio, Centre for AgriBioscience, Bundoora, VIC, 3083, Australia
- Ben J Hayes
  - Agriculture Victoria, AgriBio, Centre for AgriBioscience, Bundoora, VIC, 3083, Australia; Queensland Alliance for Agriculture and Food Innovation, Centre for Animal Science, University of Queensland, St Lucia, QLD, Australia
- Tingting Wang
  - Agriculture Victoria, AgriBio, Centre for AgriBioscience, Bundoora, VIC, 3083, Australia
- Sunduimijid Bolormaa
  - Agriculture Victoria, AgriBio, Centre for AgriBioscience, Bundoora, VIC, 3083, Australia
- Mike E Goddard
  - Faculty of Veterinary and Agricultural Science, University of Melbourne, Parkville, VIC, Australia; Agriculture Victoria, AgriBio, Centre for AgriBioscience, Bundoora, VIC, 3083, Australia
|
13
|
Wang T, Chen YPP, MacLeod IM, Pryce JE, Goddard ME, Hayes BJ. Application of a Bayesian non-linear model hybrid scheme to sequence data for genomic prediction and QTL mapping. BMC Genomics 2017; 18:618. [PMID: 28810831 PMCID: PMC5558724 DOI: 10.1186/s12864-017-4030-x] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.5] [Received: 03/05/2017] [Accepted: 08/07/2017] [Indexed: 11/10/2022] Open
Abstract
Background: Using whole genome sequence data might improve genomic prediction accuracy, when compared with high-density SNP arrays, and could lead to identification of causal mutations affecting complex traits. For some traits, the most accurate genomic predictions are achieved with non-linear Bayesian methods. However, as the number of variants and the size of the reference population increase, the computational time required to implement these Bayesian methods (typically with Markov Chain Monte Carlo sampling) becomes infeasibly long. Results: Here, we applied a new method, HyB_BR (for Hybrid BayesR), which implements a mixture model of normal distributions and hybridizes an Expectation-Maximization (EM) algorithm followed by Markov Chain Monte Carlo (MCMC) sampling, to genomic prediction in a large dairy cattle population with imputed whole genome sequence data. The imputed whole genome sequence data included 994,019 variant genotypes of 16,214 Holstein and Jersey bulls and cows. Traits included fat yield, milk volume, protein kg, fat% and protein% in milk, as well as fertility and heat tolerance. HyB_BR achieved genomic prediction accuracies as high as the full MCMC implementation of BayesR, both for predicting a validation set of Holstein and Jersey bulls (multi-breed prediction) and a validation set of Australian Red bulls (across-breed prediction). HyB_BR had a ten-fold reduction in compute time, compared with the MCMC implementation of BayesR (48 hours versus 594 hours). We also demonstrate that in many cases HyB_BR identified sequence variants with a high posterior probability of affecting the milk production or fertility traits that were similar to those identified in BayesR. For heat tolerance, both HyB_BR and BayesR found variants in or close to promising candidate genes associated with this trait and not detected by previous studies.
Conclusions: The results demonstrate that HyB_BR is a feasible method for simultaneous genomic prediction and QTL mapping with whole genome sequence in large reference populations.
Affiliation(s)
- Tingting Wang
  - School of Engineering and Mathematical Sciences, La Trobe University, Melbourne, VIC, 3083, Australia; Agriculture Victoria, AgriBio, Centre for AgriBioscience, Melbourne, VIC, 3083, Australia; Dairy Futures Cooperative Research Centre, Melbourne, VIC, 3083, Australia
- Yi-Ping Phoebe Chen
  - School of Engineering and Mathematical Sciences, La Trobe University, Melbourne, VIC, 3083, Australia
- Iona M MacLeod
  - Agriculture Victoria, AgriBio, Centre for AgriBioscience, Melbourne, VIC, 3083, Australia; Dairy Futures Cooperative Research Centre, Melbourne, VIC, 3083, Australia
- Jennie E Pryce
  - Agriculture Victoria, AgriBio, Centre for AgriBioscience, Melbourne, VIC, 3083, Australia; Dairy Futures Cooperative Research Centre, Melbourne, VIC, 3083, Australia; School of Applied Systems Biology, La Trobe University, Melbourne, VIC, 3083, Australia
- Michael E Goddard
  - Agriculture Victoria, AgriBio, Centre for AgriBioscience, Melbourne, VIC, 3083, Australia; Dairy Futures Cooperative Research Centre, Melbourne, VIC, 3083, Australia; Faculty of Veterinary and Agricultural Science, University of Melbourne, Melbourne, VIC, 3010, Australia
- Ben J Hayes
  - Agriculture Victoria, AgriBio, Centre for AgriBioscience, Melbourne, VIC, 3083, Australia; Dairy Futures Cooperative Research Centre, Melbourne, VIC, 3083, Australia; Queensland Alliance for Agriculture and Food Innovation, Centre for Animal Science, University of Queensland, St Lucia, Brisbane, QLD, 4072, Australia
|
14
|
Pausch H, MacLeod IM, Fries R, Emmerling R, Bowman PJ, Daetwyler HD, Goddard ME. Evaluation of the accuracy of imputed sequence variant genotypes and their utility for causal variant detection in cattle. Genet Sel Evol 2017; 49:24. [PMID: 28222685 PMCID: PMC5320806 DOI: 10.1186/s12711-017-0301-x] [Citation(s) in RCA: 71] [Impact Index Per Article: 8.9] [Received: 11/14/2016] [Accepted: 02/14/2017] [Indexed: 12/11/2022] Open
Abstract
Background: The availability of dense genotypes and whole-genome sequence variants from various sources offers the opportunity to compile large datasets consisting of tens of thousands of individuals with genotypes at millions of polymorphic sites that may enhance the power of genomic analyses. The imputation of missing genotypes ensures that all individuals have genotypes for a shared set of variants. Results: We evaluated the accuracy of imputation from dense genotypes to whole-genome sequence variants in 249 Fleckvieh and 450 Holstein cattle using Minimac and FImpute. The sequence variants of a subset of the animals were reduced to the variants that were included on the Illumina BovineHD genotyping array and subsequently inferred in silico using either within- or multi-breed reference populations. The accuracy of imputation varied considerably across chromosomes and dropped at regions where the bovine genome contains segmental duplications. Depending on the imputation strategy, the correlation between imputed and true genotypes ranged from 0.898 to 0.952. The accuracy of imputation was higher with Minimac than with FImpute, particularly for variants with a low minor allele frequency. Using a multi-breed reference population increased the accuracy of imputation, particularly when FImpute was used to infer genotypes. When the sequence variants were imputed using Minimac, the true genotypes were more strongly correlated with predicted allele dosages than with best-guess genotypes. The computing costs to impute 23,256,743 sequence variants in 6958 animals were ten-fold higher with Minimac than with FImpute. Association studies with imputed sequence variants revealed seven quantitative trait loci (QTL) for milk fat percentage. Two causal mutations in the DGAT1 and GHR genes were the most significantly associated variants at two QTL on chromosomes 14 and 20 when Minimac was used to infer genotypes.
Conclusions: The population-based imputation of millions of sequence variants in large cohorts is computationally feasible and provides accurate genotypes. However, the accuracy of imputation is low in regions where the genome contains large segmental duplications or the coverage with array-derived single nucleotide polymorphisms is poor. Using a reference population that includes individuals from many breeds increases the accuracy of imputation, particularly at low-frequency variants. Considering allele dosages rather than best-guess genotypes as explanatory variables is advantageous to detect causal mutations in association studies with imputed sequence variants.
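The accuracy measure in this abstract, the correlation between imputed and true genotypes, and the contrast between allele dosages and best-guess genotypes can be illustrated with a small sketch. It assumes that an imputation program reports, per animal and variant, three genotype probabilities for 0, 1, or 2 copies of the alternate allele. The helper names and the toy data are hypothetical and do not come from the study.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (non-constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def dosage(probs):
    """Expected allele count from genotype probabilities (p0, p1, p2)."""
    return probs[1] + 2.0 * probs[2]


def best_guess(probs):
    """Most probable genotype (0, 1, or 2); ties resolve to the lower count."""
    return max(range(3), key=lambda g: probs[g])


# Toy example: true genotypes and imputed probabilities for five animals.
true_genotypes = [0, 1, 2, 1, 0]
imputed_probs = [
    (0.90, 0.10, 0.00),
    (0.20, 0.70, 0.10),
    (0.00, 0.40, 0.60),
    (0.50, 0.45, 0.05),  # uncertain call: best guess is 0, truth is 1
    (0.80, 0.20, 0.00),
]

r_dosage = pearson(true_genotypes, [dosage(p) for p in imputed_probs])
r_best = pearson(true_genotypes, [best_guess(p) for p in imputed_probs])
print(r_dosage, r_best)
```

The dosage keeps the uncertainty of an ambiguous call as a fractional value, whereas the best-guess genotype rounds it to a hard call, which is one intuition for why dosages can track the true genotypes more closely.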
Affiliation(s)
- Hubert Pausch
  - Agriculture Victoria, AgriBio, Centre for AgriBiosciences, Bundoora, VIC, 3083, Australia
- Iona M MacLeod
  - Agriculture Victoria, AgriBio, Centre for AgriBiosciences, Bundoora, VIC, 3083, Australia
- Ruedi Fries
  - Chair of Animal Breeding, Technische Universitaet Muenchen, 85354, Freising, Germany
- Reiner Emmerling
  - Institute of Animal Breeding, Bavarian State Research Center for Agriculture, 85586, Grub, Germany
- Phil J Bowman
  - Agriculture Victoria, AgriBio, Centre for AgriBiosciences, Bundoora, VIC, 3083, Australia; School of Applied Systems Biology, La Trobe University, Bundoora, VIC, 3083, Australia
- Hans D Daetwyler
  - Agriculture Victoria, AgriBio, Centre for AgriBiosciences, Bundoora, VIC, 3083, Australia; School of Applied Systems Biology, La Trobe University, Bundoora, VIC, 3083, Australia
- Michael E Goddard
  - Agriculture Victoria, AgriBio, Centre for AgriBiosciences, Bundoora, VIC, 3083, Australia; Faculty of Veterinary and Agricultural Sciences, University of Melbourne, Melbourne, VIC, 3010, Australia
|