1
Schneider H, Krizanac AM, Falker-Gieske C, Heise J, Tetens J, Thaller G, Bennewitz J. Genomic dissection of the correlation between milk yield and various health traits using functional and evolutionary information about imputed sequence variants of 34,497 German Holstein cows. BMC Genomics 2024; 25:265. [PMID: 38461236 PMCID: PMC11385139 DOI: 10.1186/s12864-024-10115-6] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/17/2023] [Accepted: 02/13/2024] [Indexed: 03/11/2024]
Abstract
BACKGROUND Over the last decades, many studies have investigated the genomic connection between milk production and health traits in dairy cattle. Incorporating functional information in genomic analyses has been shown to improve both the understanding of the biological and molecular mechanisms shaping complex traits and the accuracy of genomic prediction, especially in small populations and across-breed settings. Still, little is known about the contribution of different functional and evolutionary genome partitioning subsets to milk production and dairy health. Thus, we performed uni- and bivariate analyses of milk yield (MY) and eight health traits using a set of ~34,497 German Holstein cows with 50K chip genotypes and ~17 million imputed sequence variants divided into 27 subsets depending on their functional and evolutionary annotation. In the bivariate analysis, eight trait combinations were analysed, each contrasting MY with one health trait. Two genomic relationship matrices (GRM) were included, one consisting of the 50K chip variants and one consisting of each set of subset variants, to obtain subset heritabilities and genetic correlations. In addition, 50K chip heritabilities and genetic correlations were estimated applying only the 50K GRM. RESULTS In general, 50K chip heritabilities were larger than the subset heritabilities. The largest heritabilities were found for MY: 0.4358 for the 50K chip heritability and 0.2757 for the subset heritability. Whereas all 50K genetic correlations were negative, subset genetic correlations were both positive and negative (ranging from -0.9324 between MY and mastitis to 0.6662 between MY and digital dermatitis). The subsets containing variants annotated as noncoding related, splice sites, untranslated regions, metabolic quantitative trait loci, and young variants ranked highest in terms of their contribution to the traits' genetic variance. We were able to show that linkage disequilibrium between subset variants and adjacent variants did not cause these subsets' high effect. CONCLUSION Our results confirm the connection of milk production and health traits in dairy cattle via the animals' metabolic state. In addition, they highlight the potential of including functional information in genomic analyses, which helps to dissect the extent and direction of the observed traits' connection in more detail.
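The abstract above fits two genomic relationship matrices (GRM), one built from the 50K chip variants and one from each annotation subset. As a minimal sketch of how such a matrix can be computed, the following assumes VanRaden's first method (genotypes centred by twice the allele frequency and scaled by 2Σp(1−p)). The abstract does not state which GRM definition the authors used, and the function name `vanraden_grm` and the toy genotypes are hypothetical.

```python
def vanraden_grm(genotypes):
    """Build a GRM from per-animal genotype rows coded 0/1/2.

    Assumption (not stated in the abstract): VanRaden's method 1,
    G = ZZ' / (2 * sum(p * (1 - p))), where Z holds genotypes
    centred by twice the allele frequency p of each SNP.
    """
    n_animals = len(genotypes)
    n_snps = len(genotypes[0])
    # Frequency of the counted allele at each SNP.
    freqs = [sum(row[j] for row in genotypes) / (2 * n_animals)
             for j in range(n_snps)]
    scale = 2 * sum(p * (1 - p) for p in freqs)
    if scale == 0:
        raise ValueError("all SNPs are monomorphic; the GRM is undefined")
    centred = [[row[j] - 2 * freqs[j] for j in range(n_snps)]
               for row in genotypes]
    return [[sum(a * b for a, b in zip(zi, zk)) / scale for zk in centred]
            for zi in centred]


# Hypothetical toy data: 3 animals x 4 SNPs. The first two columns stand in
# for chip variants, the last two for one annotation subset.
toy = [
    [0, 1, 2, 1],
    [1, 1, 0, 2],
    [2, 0, 1, 0],
]
grm_chip = vanraden_grm([row[:2] for row in toy])
grm_subset = vanraden_grm([row[2:] for row in toy])
```

In a two-GRM model of the kind described, both matrices would enter the mixed model as separate random effects, so that the variance attributed to the subset can be read off its own variance component.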
Affiliation(s)
- Helen Schneider
  - Institute of Animal Science, University of Hohenheim, 70599, Stuttgart, Germany
- Ana-Marija Krizanac
  - Department of Animal Sciences, University of Göttingen, 37077, Göttingen, Germany
- Johannes Heise
  - Vereinigte Informationssysteme Tierhaltung w.V. (VIT), 27283, Verden, Germany
- Jens Tetens
  - Department of Animal Sciences, University of Göttingen, 37077, Göttingen, Germany
- Georg Thaller
  - Institute of Animal Breeding and Husbandry, Christian-Albrechts University of Kiel, 24098, Kiel, Germany
- Jörn Bennewitz
  - Institute of Animal Science, University of Hohenheim, 70599, Stuttgart, Germany
2
Id-Lahoucine S, Cánovas A, Legarra A, Casellas J. Transmission ratio distortion regions in the context of genomic evaluation and their effects on reproductive traits in cattle. J Dairy Sci 2023; 106:7786-7798. [PMID: 37210358 DOI: 10.3168/jds.2022-23062] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/22/2022] [Accepted: 04/19/2023] [Indexed: 05/22/2023]
Abstract
Transmission ratio distortion (TRD), which is a deviation from Mendelian expectations, has been associated with basic mechanisms of life such as sperm and ova fertility and viability at developmental stages of the reproductive cycle. In this study, different models including TRD regions were tested for different reproductive traits [days from first service to conception (FSTC), number of services, first service nonreturn rate (NRR), and stillbirth (SB)]. Thus, in addition to a basic model with systematic and random effects, including genetic effects modeled through a genomic relationship matrix, we developed 2 additional models: one including a second genomic relationship matrix based on TRD regions, and one including TRD regions as a random effect assuming heterogeneous variances. The analyses were performed with 10,623 cows and 1,520 bulls genotyped for 47,910 SNPs, 590 TRD regions, and a number of records ranging from 9,587 (FSTC) to 19,667 (SB). The results of this study showed the ability of TRD regions to capture some additional genetic variance for some traits; however, this did not translate into higher accuracy for genomic prediction. This could be explained by the nature of TRD itself, which may arise in different stages of the reproductive cycle. Nevertheless, important effects of TRD regions were found on SB (31 regions) and NRR (18 regions) when comparing at-risk versus control matings, especially for regions with an allelic TRD pattern. In particular, the probability of observing a nonpregnant cow increased by up to 27% for specific TRD regions, and the probability of observing stillbirth increased by up to 254%. These results support the relevance of several TRD regions for some reproductive traits, especially those with allelic patterns, which have not received as much attention as recessive TRD patterns.
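To make "deviation from Mendelian expectations" concrete, the sketch below tests whether a heterozygous parent transmits its two alleles in the expected 1:1 ratio, using a two-sided exact binomial test. This is only an illustration of the allelic TRD idea. The abstract does not describe how the 590 TRD regions were detected, and the function name and counts are hypothetical.

```python
from math import comb


def trd_binomial_test(n_allele_a, n_allele_b):
    """Two-sided exact binomial test of 1:1 allele transmission.

    n_allele_a, n_allele_b: number of offspring that inherited allele A
    or allele B from heterozygous (A/B) parents.
    Returns (transmission ratio of allele A, p-value).
    """
    n = n_allele_a + n_allele_b
    if n == 0:
        raise ValueError("no informative offspring")
    # Probability of each possible count of allele A under Mendelian 1:1.
    probs = [comb(n, k) / 2 ** n for k in range(n + 1)]
    observed = probs[n_allele_a]
    # Sum the probabilities of all outcomes at least as unlikely as observed.
    p_value = min(1.0, sum(p for p in probs if p <= observed * (1 + 1e-12)))
    return n_allele_a / n, p_value


# Hypothetical counts: a balanced region and a strongly distorted one.
ratio_ok, p_ok = trd_binomial_test(10, 10)
ratio_trd, p_trd = trd_binomial_test(18, 2)
```

With thousands of genotyped parent-offspring pairs, a test like this would be run per region and corrected for multiple testing; that step is omitted here.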
Affiliation(s)
- S Id-Lahoucine
  - Centre for Genetic Improvement of Livestock, Department of Animal Biosciences, University of Guelph, Guelph N1G 2W1, ON, Canada
- A Cánovas
  - Centre for Genetic Improvement of Livestock, Department of Animal Biosciences, University of Guelph, Guelph N1G 2W1, ON, Canada
- A Legarra
  - INRAE, UR631 SAGA, BP 52627, 32326 Castanet-Tolosan, France
- J Casellas
  - Departament de Ciència Animal i dels Aliments, Universitat Autònoma de Barcelona, Bellaterra 08193, Barcelona, Spain
3
Kriaridou C, Tsairidou S, Fraslin C, Gorjanc G, Looseley ME, Johnston IA, Houston RD, Robledo D. Evaluation of low-density SNP panels and imputation for cost-effective genomic selection in four aquaculture species. Front Genet 2023; 14:1194266. [PMID: 37252666 PMCID: PMC10213886 DOI: 10.3389/fgene.2023.1194266] [Citation(s) in RCA: 5] [Impact Index Per Article: 2.5] [Received: 03/26/2023] [Accepted: 04/26/2023] [Indexed: 05/31/2023]
Abstract
Genomic selection can accelerate genetic progress in aquaculture breeding programmes, particularly for traits measured on siblings of selection candidates. However, it is not widely implemented in most aquaculture species, and remains expensive due to high genotyping costs. Genotype imputation is a promising strategy that can reduce genotyping costs and facilitate the broader uptake of genomic selection in aquaculture breeding programmes. Genotype imputation can predict ungenotyped SNPs in populations genotyped at low density (LD), using a reference population genotyped at high density (HD). In this study, we used datasets of four aquaculture species (Atlantic salmon, turbot, common carp and Pacific oyster), phenotyped for different traits, to investigate the efficacy of genotype imputation for cost-effective genomic selection. The four datasets had been genotyped at HD, and eight LD panels (300-6,000 SNPs) were generated in silico. SNPs were (i) evenly distributed according to physical position, (ii) selected to minimise the linkage disequilibrium between adjacent SNPs, or (iii) randomly selected. Imputation was performed with three different software packages (AlphaImpute2, FImpute v.3 and findhap v.4). The results revealed that FImpute v.3 was faster and achieved higher imputation accuracies. Imputation accuracy increased with increasing panel density for both SNP selection methods, reaching correlations greater than 0.95 in the three fish species and 0.80 in Pacific oyster. In terms of genomic prediction accuracy, the LD and the imputed panels performed similarly, reaching values very close to the HD panels, except in the Pacific oyster dataset, where the LD panel performed better than the imputed panel. In the fish species, when LD panels were used for genomic prediction without imputation, selection of markers based on either physical or genetic distance (instead of randomly) resulted in a high prediction accuracy, whereas imputation achieved near maximal prediction accuracy independently of the LD panel, showing higher reliability. Our results suggest that, in fish species, well-selected LD panels may achieve near maximal genomic selection prediction accuracy, and that the addition of imputation will result in maximal accuracy independently of the LD panel. These strategies represent effective and affordable methods to incorporate genomic selection into most aquaculture settings.
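The in silico low-density panels described above can be illustrated with a short sketch covering two of the three selection strategies: even spacing by physical position and random selection. The nearest-SNP rule, the function names and the toy positions are assumptions for illustration; the abstract does not give the authors' exact procedure, and the linkage-disequilibrium-based strategy is not shown.

```python
import random


def evenly_spaced_panel(positions, n_markers):
    """Pick n_markers SNP indices spread evenly over the physical map.

    positions: sorted base-pair positions of the HD SNPs on one chromosome.
    For each equally spaced target position, the nearest SNP that has not
    been chosen yet is selected (an assumed rule, for illustration).
    """
    if not 0 < n_markers <= len(positions):
        raise ValueError("n_markers must be between 1 and the number of SNPs")
    start, end = positions[0], positions[-1]
    if n_markers == 1:
        targets = [(start + end) / 2]
    else:
        step = (end - start) / (n_markers - 1)
        targets = [start + i * step for i in range(n_markers)]
    available = set(range(len(positions)))
    chosen = []
    for target in targets:
        best = min(available, key=lambda i: (abs(positions[i] - target), i))
        chosen.append(best)
        available.remove(best)
    return sorted(chosen)


def random_panel(n_snps, n_markers, seed=0):
    """Pick n_markers SNP indices uniformly at random, without replacement."""
    rng = random.Random(seed)
    return sorted(rng.sample(range(n_snps), n_markers))


# Hypothetical HD map with 8 SNPs; build two 4-SNP LD panels from it.
hd_positions = [100, 150, 160, 400, 420, 700, 900, 1000]
ld_even = evenly_spaced_panel(hd_positions, 4)
ld_random = random_panel(len(hd_positions), 4)
```

The genotypes at the selected indices would then be kept and the rest masked, which gives the input to the imputation software and the truth against which imputed genotypes are compared.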
Affiliation(s)
- Christina Kriaridou
  - The Roslin Institute and Royal (Dick) School of Veterinary Studies, University of Edinburgh, Edinburgh, United Kingdom
- Smaragda Tsairidou
  - Global Academy of Agriculture and Food Systems, University of Edinburgh, Edinburgh, United Kingdom
- Clémence Fraslin
  - The Roslin Institute and Royal (Dick) School of Veterinary Studies, University of Edinburgh, Edinburgh, United Kingdom
- Gregor Gorjanc
  - The Roslin Institute and Royal (Dick) School of Veterinary Studies, University of Edinburgh, Edinburgh, United Kingdom
- Ross D. Houston
  - The Roslin Institute and Royal (Dick) School of Veterinary Studies, University of Edinburgh, Edinburgh, United Kingdom
  - Benchmark Genetics, Penicuik, United Kingdom
- Diego Robledo
  - The Roslin Institute and Royal (Dick) School of Veterinary Studies, University of Edinburgh, Edinburgh, United Kingdom
4
Wolf MJ, Neumann GB, Kokuć P, Yin T, Brockmann GA, König S, May K. Genetic evaluations for endangered dual-purpose German Black Pied cattle using 50K SNPs, a breed-specific 200K chip, and whole-genome sequencing. J Dairy Sci 2023; 106:3345-3358. [PMID: 37028956 DOI: 10.3168/jds.2022-22665] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/17/2022] [Accepted: 12/16/2022] [Indexed: 04/09/2023]
Abstract
Genetic evaluations of local cattle breeds are hampered by small reference groups or biased by the use of SNP effects estimated in other, large populations. Against this background, there is a lack of studies addressing the possible advantage of whole-genome sequences (WGS) or consideration of specific variants from WGS data in genomic predictions for local breeds with small population size. Consequently, the aim of this study was to compare genetic parameters and accuracies of genomic estimated breeding values (GEBV) for 305-d production traits, fat-to-protein ratio (FPR), and somatic cell score (SCS) at the first test date after calving and conformation traits of the endangered German Black Pied cattle (DSN) breed using 4 different marker panels: (1) the commercial 50K Illumina BovineSNP50 BeadChip, (2) a customized 200K chip designed for DSN (DSN200K) which considers the most important variants for DSN from WGS, (3) randomly generated 200K chips based on WGS data, and (4) a WGS panel. The same number of animals was considered for all marker panel analyses (i.e., 1,811 genotyped or sequenced cows for conformation traits, 2,383 cows for lactation production traits, and 2,420 cows for FPR and SCS). Mixed models for the estimation of genetic parameters directly included the respective genomic relationship matrix from the different marker panels plus the trait-specific fixed effects. For the calculation of GEBV accuracies, we applied repeated random subsampling validation. In the process of separate cross-validations per trait, we created a validation set including 20% of cows with masked phenotypes, and a training set comprising 80% of the cows. The cows were selected randomly in a procedure with 10 replicates considering replacements in the different scenarios. The accuracy was defined as the correlation between the direct GEBV and the phenotypes with subtracted corresponding fixed effects for the cows in the validation set. For FPR and SCS, as well as for lactation production traits, heritabilities were largest based on WGS data, but the increase compared with the 50K or DSN200K applications was quite small, in the range from 0.01 to 0.03. Also, for most of the conformation traits, heritabilities were largest based on WGS and DSN200K data, but the increase was in the range of the corresponding standard error. Accordingly, GEBV accuracies for most of the studied traits were highest based on WGS data or when utilizing the DSN200K chip, but the accuracy differences across the marker panels were quite small and nonsignificant. In conclusion, WGS data and the DSN200K chip only contributed to minor improvements in genomic predictions, still justifying the use of the commercial 50K chip. Nevertheless, WGS and the DSN200K chip harbor breed-specific variants, which are valuable for studying causal genetic mechanisms in the endangered DSN population.
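The validation scheme in this abstract (10 replicates, 20% of cows masked, accuracy as the correlation between prediction and corrected phenotype) can be sketched as follows. The GBLUP step is not implemented: `predict` is a hypothetical callback standing in for it, and the toy predictor is a simple regression on a made-up score, used only so the example runs.

```python
import random


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def repeated_subsampling_accuracy(ids, corrected_pheno, predict,
                                  n_reps=10, validation_share=0.2, seed=1):
    """Mean validation correlation over n_reps random training/validation splits.

    predict(train_ids, valid_ids) must return one prediction per validation
    animal using training phenotypes only (placeholder for the GEBV model).
    """
    rng = random.Random(seed)
    n_valid = max(2, round(len(ids) * validation_share))
    accuracies = []
    for _ in range(n_reps):
        valid = rng.sample(ids, n_valid)  # phenotypes of these cows are masked
        valid_set = set(valid)
        train = [i for i in ids if i not in valid_set]
        predictions = predict(train, valid)
        observed = [corrected_pheno[i] for i in valid]
        accuracies.append(pearson(predictions, observed))
    return sum(accuracies) / n_reps


# Hypothetical toy data: a per-animal score and a phenotype that tracks it.
scores = {i: float(i) for i in range(20)}
pheno = {i: 2.0 * i + 0.5 * ((7 * i) % 5 - 2) for i in range(20)}


def toy_predict(train_ids, valid_ids):
    # Least-squares line fitted on training animals only.
    n = len(train_ids)
    mx = sum(scores[i] for i in train_ids) / n
    my = sum(pheno[i] for i in train_ids) / n
    slope = (sum((scores[i] - mx) * (pheno[i] - my) for i in train_ids)
             / sum((scores[i] - mx) ** 2 for i in train_ids))
    return [my + slope * (scores[i] - mx) for i in valid_ids]


mean_accuracy = repeated_subsampling_accuracy(list(range(20)), pheno, toy_predict)
```

Because each replicate draws its validation set independently, the same cow can appear in the validation sets of several replicates, which is one reading of "considering replacements" in the abstract.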
Affiliation(s)
- Manuel J Wolf
  - Institute of Animal Breeding and Genetics, Justus-Liebig-University Gießen, 35390 Gießen, Germany
- Guilherme B Neumann
  - Animal Breeding Biology and Molecular Genetics, Albrecht Daniel Thaer-Institute for Agricultural and Horticultural Sciences, Humboldt Universität zu Berlin, 10115 Berlin, Germany
- Paula Kokuć
  - Animal Breeding Biology and Molecular Genetics, Albrecht Daniel Thaer-Institute for Agricultural and Horticultural Sciences, Humboldt Universität zu Berlin, 10115 Berlin, Germany
- Tong Yin
  - Institute of Animal Breeding and Genetics, Justus-Liebig-University Gießen, 35390 Gießen, Germany
- Gudrun A Brockmann
  - Animal Breeding Biology and Molecular Genetics, Albrecht Daniel Thaer-Institute for Agricultural and Horticultural Sciences, Humboldt Universität zu Berlin, 10115 Berlin, Germany
- Sven König
  - Institute of Animal Breeding and Genetics, Justus-Liebig-University Gießen, 35390 Gießen, Germany
- Katharina May
  - Institute of Animal Breeding and Genetics, Justus-Liebig-University Gießen, 35390 Gießen, Germany
5
Oppong RF, Boutin T, Campbell A, McIntosh AM, Porteous D, Hayward C, Haley CS, Navarro P, Knott S. SNP and Haplotype Regional Heritability Mapping (SNHap-RHM): Joint Mapping of Common and Rare Variation Affecting Complex Traits. Front Genet 2022; 12:791712. [PMID: 35069690 PMCID: PMC8770330 DOI: 10.3389/fgene.2021.791712] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/08/2021] [Accepted: 12/14/2021] [Indexed: 11/13/2022]
Abstract
We describe a genome-wide analytical approach, SNP and Haplotype Regional Heritability Mapping (SNHap-RHM), that provides regional estimates of the heritability across locally defined regions in the genome. This approach utilises relationship matrices that are based on sharing of SNP and haplotype alleles at local haplotype blocks delimited by recombination boundaries in the genome. We implemented the approach on simulated data and show that the haplotype-based regional GRMs capture variation that is complementary to that captured by SNP-based regional GRMs, thus justifying the fitting of the two GRMs jointly in a single analysis (SNHap-RHM). SNHap-RHM captures regions in the genome contributing to the phenotypic variation that existing genome-wide analysis methods may fail to capture. We further demonstrate that there are real benefits to be gained from this approach by applying it to real data from about 20,000 individuals from the Generation Scotland: Scottish Family Health Study. We analysed height and major depressive disorder (MDD). We identified seven genomic regions that are genome-wide significant for height, and three regions significant at a suggestive threshold (p-value < 1 × 10⁻⁵) for MDD. These significant regions have genes mapped to within 400 kb of them. The genes mapped for height have been reported to be associated with height in humans. Similarly, those mapped for MDD have been reported to be associated with major depressive disorder and other psychiatric phenotypes. The results show that SNHap-RHM presents an exciting new opportunity to analyse complex traits by allowing the joint mapping of novel genomic regions tagged by either SNPs or haplotypes, potentially leading to the recovery of some of the "missing" heritability.
Affiliation(s)
- Richard F. Oppong
  - Longitudinal Studies Section, Translational Gerontology Branch, National Institute on Aging, National Institutes of Health, Baltimore, MD, United States
  - Institute of Evolutionary Biology, School of Biological Sciences, The University of Edinburgh, Edinburgh, United Kingdom
- Thibaud Boutin
  - MRC Human Genetics Unit, Institute of Genetics and Cancer, The University of Edinburgh, Edinburgh, United Kingdom
- Archie Campbell
  - Centre for Genomic and Experimental Medicine, Institute of Genetics and Cancer, The University of Edinburgh, Edinburgh, United Kingdom
- Andrew M. McIntosh
  - Division of Psychiatry, The University of Edinburgh, Edinburgh, United Kingdom
- David Porteous
  - Centre for Genomic and Experimental Medicine, Institute of Genetics and Cancer, The University of Edinburgh, Edinburgh, United Kingdom
- Caroline Hayward
  - MRC Human Genetics Unit, Institute of Genetics and Cancer, The University of Edinburgh, Edinburgh, United Kingdom
- Chris S. Haley
  - MRC Human Genetics Unit, Institute of Genetics and Cancer, The University of Edinburgh, Edinburgh, United Kingdom
  - The Roslin Institute and Royal (Dick) School of Veterinary Studies, The University of Edinburgh, Edinburgh, United Kingdom
- Pau Navarro
  - MRC Human Genetics Unit, Institute of Genetics and Cancer, The University of Edinburgh, Edinburgh, United Kingdom
- Sara Knott
  - Institute of Evolutionary Biology, School of Biological Sciences, The University of Edinburgh, Edinburgh, United Kingdom
6
Fernandes Júnior GA, Carvalheiro R, de Oliveira HN, Sargolzaei M, Costilla R, Ventura RV, Fonseca LFS, Neves HHR, Hayes BJ, de Albuquerque LG. Imputation accuracy to whole-genome sequence in Nellore cattle. Genet Sel Evol 2021; 53:27. [PMID: 33711929 PMCID: PMC7953568 DOI: 10.1186/s12711-021-00622-5] [Citation(s) in RCA: 21] [Impact Index Per Article: 5.3] [Received: 08/08/2020] [Accepted: 03/05/2021] [Indexed: 11/10/2022]
Abstract
BACKGROUND A cost-effective strategy to explore the complete DNA sequence in animals for genetic evaluation purposes is to sequence key ancestors of a population, followed by imputation mechanisms to infer marker genotypes that were not originally reported in a target population of animals genotyped with single nucleotide polymorphism (SNP) panels. The feasibility of this process relies on the accuracy of the genotype imputation in that population, particularly for potential causal mutations which may be at low frequency and either within genes or regulatory regions. The objective of the present study was to investigate the imputation accuracy to the sequence level in a Nellore beef cattle population, including that for variants in annotation classes which are more likely to be functional. METHODS Information from 151 key sequenced Nellore sires was used to assess the imputation accuracy from bovine HD BeadChip SNP (~777 k) to whole-genome sequence. The choice of the sires aimed at optimizing the imputation accuracy of a genotypic database comprising about 10,000 genotyped Nellore animals. Genotype imputation was performed using two computational approaches: FImpute3 and Minimac4 (after using Eagle for phasing). The accuracy of the imputation was evaluated using a fivefold cross-validation scheme and measured by the squared correlation between observed and imputed genotypes, calculated by individual and by SNP. SNPs were classified into a range of annotations, and the accuracy of imputation within each annotation classification was also evaluated. RESULTS High average imputation accuracies per animal were achieved using both FImpute3 (0.94) and Minimac4 (0.95). On average, common variants (minor allele frequency (MAF) > 0.03) were more accurately imputed by Minimac4 and low-frequency variants (MAF ≤ 0.03) were more accurately imputed by FImpute3. The inherent Minimac4 Rsq imputation quality statistic appears to be a good indicator of the empirical Minimac4 imputation accuracy. Both software packages provided high average SNP-wise imputation accuracy for all classes of biological annotations. CONCLUSIONS Our results indicate that imputation to whole-genome sequence is feasible in Nellore beef cattle since high imputation accuracies per individual are expected. SNP-wise imputation accuracy is software-dependent, especially for rare variants. The accuracy of imputation appears to be relatively independent of annotation classification.
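The accuracy measure named in the METHODS, the squared correlation between observed and imputed genotypes calculated by individual and by SNP, can be sketched as below. The function names, the decision to return `None` for vectors without variance, and the toy genotype matrices are assumptions for illustration and are not taken from the study.

```python
def squared_correlation(x, y):
    """Squared Pearson correlation; None if either vector has no variance."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return None
    return sxy ** 2 / (sxx * syy)


def imputation_accuracy(observed, imputed):
    """Return (per-animal r2 list, per-SNP r2 list).

    observed, imputed: animals x SNPs matrices (genotypes 0/1/2 or dosages)
    for the animals whose sequence genotypes were masked in one fold.
    """
    per_animal = [squared_correlation(o, i)
                  for o, i in zip(observed, imputed)]
    per_snp = [squared_correlation(o, i)
               for o, i in zip(zip(*observed), zip(*imputed))]
    return per_animal, per_snp


# Hypothetical toy fold: 3 animals x 4 SNPs.
observed = [[0, 1, 2, 1], [2, 1, 0, 1], [1, 2, 1, 0]]
imputed = [[0, 1, 2, 1], [2, 1, 0, 2], [1, 1, 1, 0]]
per_animal, per_snp = imputation_accuracy(observed, imputed)
```

The per-SNP value is undefined when a variant is imputed as monomorphic in the validation animals, which is most likely for rare variants and is one reason per-SNP and per-animal summaries can diverge.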
Affiliation(s)
- Roberto Carvalheiro
  - School of Agricultural and Veterinarian Sciences, UNESP, Jaboticabal, SP, 14884-900, Brazil; National Council for Scientific and Technological Development, CNPq, Brasília, DF, 71605-001, Brazil
- Henrique N de Oliveira
  - School of Agricultural and Veterinarian Sciences, UNESP, Jaboticabal, SP, 14884-900, Brazil; National Council for Scientific and Technological Development, CNPq, Brasília, DF, 71605-001, Brazil
- Mehdi Sargolzaei
  - Ontario Veterinary College, UG, Guelph, Canada; Select Sires Inc., Plain City, OH, USA
- Roy Costilla
  - Queensland Alliance for Agriculture and Food Innovation, UQ, Brisbane, QLD, 4072, Australia
- Ricardo V Ventura
  - School of Veterinary Medicine and Animal Science, USP, Pirassununga, SP, 13635-900, Brazil
- Larissa F S Fonseca
  - School of Agricultural and Veterinarian Sciences, UNESP, Jaboticabal, SP, 14884-900, Brazil
- Ben J Hayes
  - Queensland Alliance for Agriculture and Food Innovation, UQ, Brisbane, QLD, 4072, Australia
- Lucia G de Albuquerque
  - School of Agricultural and Veterinarian Sciences, UNESP, Jaboticabal, SP, 14884-900, Brazil; National Council for Scientific and Technological Development, CNPq, Brasília, DF, 71605-001, Brazil
7
Zhang Q, Sahana G, Su G, Guldbrandtsen B, Lund MS, Calus MPL. Impact of rare and low-frequency sequence variants on reliability of genomic prediction in dairy cattle. Genet Sel Evol 2018; 50:62. [PMID: 30458700 PMCID: PMC6247626 DOI: 10.1186/s12711-018-0432-8] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.1] [Received: 03/12/2018] [Accepted: 11/14/2018] [Indexed: 11/05/2022]
Abstract
Background Availability of whole-genome sequence data for a large number of cattle and efficient imputation methodologies open a new opportunity to include rare and low-frequency variants (RLFV) in genomic prediction in dairy cattle. The objective of this study was to examine the impact of including RLFV that are within genes and selected from whole-genome sequence variants on the reliability of genomic prediction for fertility, health and longevity in dairy cattle. Results All genic RLFV with a minor allele frequency lower than 0.05 were extracted from imputed sequence data and subsets were created using different strategies. These subsets were subsequently combined with Illumina 50 k single nucleotide polymorphism (SNP) data and used for genomic prediction. Reliability of prediction obtained by using 50 k SNP data alone was used as the reference value, and absolute changes in reliabilities are referred to as changes in percentage points. Adding a component that included either all the genic or a subset of selected RLFV into the model in addition to the 50 k component changed the reliability of predictions by −2.2 to 1.1%, i.e. hardly any change in reliability of prediction was found, regardless of how the RLFV were selected. In addition to these empirical analyses, a simulation study was performed to evaluate the potential impact of adding RLFV in the model on the reliability of prediction. Three sets of causal RLFV (containing 21,468, 1348 and 235 RLFV) that were randomly selected from different numbers of genes were generated and accounted for additional genetic variance equal to 10% of the estimated variance explained by the 50 k SNPs. When genic RLFV based on mapping results were included in the prediction model, reliabilities improved by up to 4.0% and when the causal RLFV were included they improved by up to 6.8%. Conclusions Using selected RLFV from whole-genome sequence data had only a small impact on the empirical reliability of genomic prediction in dairy cattle. Our simulations revealed that for sequence data to bring a benefit, the key is to identify causal RLFV.
Affiliation(s)
- Qianqian Zhang
  - Department of Molecular Biology and Genetics, Center for Quantitative Genetics and Genomics, Aarhus University, Tjele, Denmark; Wageningen University and Research, Animal Breeding and Genomics, Wageningen, The Netherlands; Department of Veterinary and Animal Sciences, Faculty of Health and Medical Sciences, University of Copenhagen, Copenhagen, Denmark
- Goutam Sahana
  - Department of Molecular Biology and Genetics, Center for Quantitative Genetics and Genomics, Aarhus University, Tjele, Denmark
- Guosheng Su
  - Department of Molecular Biology and Genetics, Center for Quantitative Genetics and Genomics, Aarhus University, Tjele, Denmark
- Bernt Guldbrandtsen
  - Department of Molecular Biology and Genetics, Center for Quantitative Genetics and Genomics, Aarhus University, Tjele, Denmark
- Mogens Sandø Lund
  - Department of Molecular Biology and Genetics, Center for Quantitative Genetics and Genomics, Aarhus University, Tjele, Denmark
- Mario P L Calus
  - Wageningen University and Research, Animal Breeding and Genomics, Wageningen, The Netherlands
8
Zeng J, Garrick D, Dekkers J, Fernando R. A nested mixture model for genomic prediction using whole-genome SNP genotypes. PLoS One 2018; 13:e0194683. [PMID: 29561877 PMCID: PMC5862491 DOI: 10.1371/journal.pone.0194683] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.9] [Received: 05/20/2017] [Accepted: 03/07/2018] [Indexed: 11/19/2022]
Abstract
Genomic prediction exploits single nucleotide polymorphisms (SNPs) across the whole genome for predicting genetic merit of selection candidates. In most models for genomic prediction, e.g. BayesA, B, C, R and GBLUP, independence of SNP effects is assumed. However, SNP effects are expected to be locally dependent given the presence of a nearby QTL because SNPs surrounding the QTL do not segregate independently. A consequence of ignoring this dependence is that SNPs with small effects may be overly shrunk, e.g. effects from markers with high minor allele frequencies (MAF) that flank QTL with low MAF. A nested mixture model (BayesN) is developed to account for the dependence of effects of SNPs that are closely linked, where the effects of SNPs in every non-overlapping genomic window a priori follow a point mass at zero for all SNPs or a mixture of some SNPs with nonzero effects and others with zero effects. It can be regarded as a parsimonious alternative to the existing antedependence model, antiBayesB, which allows a nonstationary dependence of SNP effects. Illumina 777K BovineHD genotypes from 948 Angus cattle were used to simulate 5,000 offspring, with 4,000 used for training and 1,000 for validation. Scenarios with 300 common (MAF > 0.05) or rare (MAF < 0.05) QTL randomly selected from segregating SNPs were replicated 8 times. SNPs corresponding to QTL were masked from a 600k panel comprising SNPs with MAF > 0.05 or a 50k evenly spaced subset of these. Compared with BayesB and a modified antiBayesB, BayesN improved the accuracy of prediction by up to 2.0% with 50k SNPs and up to 7.0% with 600k SNPs, with most improvements occurring in the rare QTL scenario. Computing time was reduced by up to 60% with 50k SNPs and up to 75% with 600k SNPs. BayesN is an accurate and computationally efficient method for genomic prediction with whole-genome SNPs, especially for traits with rare QTL.
Affiliation(s)
- Jian Zeng
  - Institute for Molecular Bioscience, The University of Queensland, Brisbane, Queensland, Australia
- Dorian Garrick
  - School of Agriculture, Massey University, Palmerston North, New Zealand
- Jack Dekkers
  - Department of Animal Science, Iowa State University, Ames, Iowa, United States of America
- Rohan Fernando
  - Department of Animal Science, Iowa State University, Ames, Iowa, United States of America
9
Frischknecht M, Meuwissen THE, Bapst B, Seefried FR, Flury C, Garrick D, Signer-Hasler H, Stricker C, Bieber A, Fries R, Russ I, Sölkner J, Bagnato A, Gredler-Grandl B. Short communication: Genomic prediction using imputed whole-genome sequence variants in Brown Swiss Cattle. J Dairy Sci 2017; 101:1292-1296. [PMID: 29153527 DOI: 10.3168/jds.2017-12890] [Citation(s) in RCA: 22] [Impact Index Per Article: 2.8] [Received: 03/20/2017] [Accepted: 09/28/2017] [Indexed: 01/27/2023]
Abstract
The accuracy of genomic prediction determines response to selection. It has been hypothesized that accuracy of genomic breeding values can be increased by a higher density of variants. We used imputed whole-genome sequence data and various single nucleotide polymorphism (SNP) selection criteria to estimate genomic breeding values in Brown Swiss cattle. The extreme scenarios were 50K SNP chip data and whole-genome sequence data, with intermediate scenarios using linkage disequilibrium-pruned whole-genome sequence variants, only variants predicted to be missense, or the top 50K variants from genome-wide association studies. We estimated genomic breeding values for 3 traits (somatic cell score, nonreturn rate in heifers, and stature) and found differences in accuracy levels between traits. However, among different SNP sets, accuracy was very similar. In our analyses, accuracy with sequence data was marginally higher than with 50K for 1 trait and lower for the other traits. We concluded that the inclusion of imputed whole-genome sequence data does not lead to increased accuracy of genomic prediction with the methods used.
Affiliation(s)
- Mirjam Frischknecht
- Qualitas AG, Zug 6300, Switzerland; School of Agricultural, Forest and Food Sciences (HAFL), Bern University of Applied Sciences, Zollikofen 3052, Switzerland
- Theodorus H E Meuwissen
- Department of Animal and Aquacultural Sciences, Norwegian University of Life Sciences, Ås 1432, Norway
- Christine Flury
- School of Agricultural, Forest and Food Sciences (HAFL), Bern University of Applied Sciences, Zollikofen 3052, Switzerland
- Dorian Garrick
- Institute of Veterinary, Animal & Biomedical Sciences, Massey University, Palmerston North 4442, New Zealand
- Heidi Signer-Hasler
- School of Agricultural, Forest and Food Sciences (HAFL), Bern University of Applied Sciences, Zollikofen 3052, Switzerland
-
- Interbull Center, Uppsala 75007, Sweden
- Anna Bieber
- Department of Animal Sciences, Research Institute of Organic Agriculture (FiBL), Frick 5070, Switzerland
- Ruedi Fries
- Chair of Animal Breeding, Technische Universität München, Freising-Weihenstephan 85354, Germany
- Ingolf Russ
- Tierzuchtforschung e.V., Poing-Grub 85586, Germany
- Johann Sölkner
- Department of Sustainable Agricultural Systems, Division of Livestock Sciences, University of Natural Resources and Life Sciences, Wien 1180, Austria
- Alessandro Bagnato
- Department of Veterinary Sciences and Technologies for Food Safety, University of Milan, Milano 20133, Italy
|
10
|
Evaluation of the potential use of a meta-population for genomic selection in autochthonous beef cattle populations. Animal 2017; 12:1350-1357. [PMID: 29094666 DOI: 10.1017/s175173111700283x] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Indexed: 11/07/2022] Open
Abstract
This study investigated the potential application of genomic selection under a multi-breed scheme in the Spanish autochthonous beef cattle populations using a simulation study that replicates the structure of linkage disequilibrium obtained from a sample of 25 triplets of sire/dam/offspring per population and using the BovineHD Beadchip. Purebred and combined reference sets were used for the genomic evaluation, and several scenarios of different genetic architecture of the trait were investigated. The single-breed evaluations yielded the highest within-breed accuracies. Across-breed accuracies were low but positive on average, confirming the genetic connectedness between the populations. When the same genotyping effort was split across several populations, accuracies were lower than with single-breed evaluation, but showed a small advantage over small purebred reference sets in the accuracies of subsequent generations. In addition, the genetic architecture of the trait had no relevant effect on accuracy, with the exception of rare variants, which yielded slightly lower results and a greater loss of predictive ability over the generations.
|
11
|
Zhang Q, Calus MPL, Guldbrandtsen B, Lund MS, Sahana G. Contribution of rare and low-frequency whole-genome sequence variants to complex traits variation in dairy cattle. Genet Sel Evol 2017; 49:60. [PMID: 28764638 PMCID: PMC5539983 DOI: 10.1186/s12711-017-0336-z] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.4] [Received: 02/03/2017] [Accepted: 07/24/2017] [Indexed: 11/26/2022] Open
Abstract
Background Whole-genome sequencing and imputation methodologies have enabled the study of the effects of genomic variants with low to very low minor allele frequency (MAF) on variation in complex traits. Our objective was to estimate the proportion of variance explained by imputed sequence variants classified according to their MAF compared with the variance explained by the pedigree-based additive genetic relationship matrix for 17 traits in Nordic Holstein dairy cattle. Results Imputed sequence variants were grouped into seven classes according to their MAF (0.001–0.01, 0.01–0.05, 0.05–0.1, 0.1–0.2, 0.2–0.3, 0.3–0.4 and 0.4–0.5). The total contribution of all imputed sequence variants to variance in deregressed estimated breeding values or proofs (DRP) for different traits ranged from 0.41 [standard error (SE) = 0.026] for temperament to 0.87 (SE = 0.011) for milk yield. The contribution of rare variants (MAF < 0.01) to the total DRP variance explained by all imputed sequence variants was relatively small (a maximum of 12.5% for the health index). Rare and low-frequency variants (MAF < 0.05) contributed a larger proportion of the explained DRP variances (>13%) for health-related traits than for production traits (<11%). However, a substantial proportion of these variance estimates across different MAF classes had large SE, especially when the variance explained by a MAF class was small. The proportion of DRP variance that was explained by all imputed whole-genome sequence variants improved slightly compared with variance explained by the 50 k Illumina markers, which are routinely used in bovine genomic prediction. However, the proportion of DRP variance explained by imputed sequence variants was lower than that explained by pedigree relationships, ranging from 1.5% for milk yield to 37.9% for the health index. 
Conclusions Imputed sequence variants explained more of the variance in DRP than the 50 k markers for most traits, but explained less variance than that captured by pedigree-based relationships. Although in humans partitioning variants into groups based on MAF and linkage disequilibrium was used to estimate heritability without bias, many of our bovine estimates had a high SE. For a reliable estimate of the explained DRP variance for different MAF classes, larger sample sizes are needed.
Affiliation(s)
- Qianqian Zhang
- Center for Quantitative Genetics and Genomics, Department of Molecular Biology and Genetics, Aarhus University, DK-8830, Tjele, Denmark; Animal Breeding and Genomics, Wageningen University & Research, 6700AH, Wageningen, The Netherlands
- Mario P L Calus
- Animal Breeding and Genomics, Wageningen University & Research, 6700AH, Wageningen, The Netherlands
- Bernt Guldbrandtsen
- Center for Quantitative Genetics and Genomics, Department of Molecular Biology and Genetics, Aarhus University, DK-8830, Tjele, Denmark
- Mogens Sandø Lund
- Center for Quantitative Genetics and Genomics, Department of Molecular Biology and Genetics, Aarhus University, DK-8830, Tjele, Denmark
- Goutam Sahana
- Center for Quantitative Genetics and Genomics, Department of Molecular Biology and Genetics, Aarhus University, DK-8830, Tjele, Denmark
|
12
|
Pausch H, MacLeod IM, Fries R, Emmerling R, Bowman PJ, Daetwyler HD, Goddard ME. Evaluation of the accuracy of imputed sequence variant genotypes and their utility for causal variant detection in cattle. Genet Sel Evol 2017; 49:24. [PMID: 28222685 PMCID: PMC5320806 DOI: 10.1186/s12711-017-0301-x] [Citation(s) in RCA: 71] [Impact Index Per Article: 8.9] [Received: 11/14/2016] [Accepted: 02/14/2017] [Indexed: 12/11/2022] Open
Abstract
Background The availability of dense genotypes and whole-genome sequence variants from various sources offers the opportunity to compile large datasets consisting of tens of thousands of individuals with genotypes at millions of polymorphic sites that may enhance the power of genomic analyses. The imputation of missing genotypes ensures that all individuals have genotypes for a shared set of variants. Results We evaluated the accuracy of imputation from dense genotypes to whole-genome sequence variants in 249 Fleckvieh and 450 Holstein cattle using Minimac and FImpute. The sequence variants of a subset of the animals were reduced to the variants that were included on the Illumina BovineHD genotyping array and subsequently inferred in silico using either within- or multi-breed reference populations. The accuracy of imputation varied considerably across chromosomes and dropped at regions where the bovine genome contains segmental duplications. Depending on the imputation strategy, the correlation between imputed and true genotypes ranged from 0.898 to 0.952. The accuracy of imputation was higher with Minimac than FImpute particularly for variants with a low minor allele frequency. Using a multi-breed reference population increased the accuracy of imputation, particularly when FImpute was used to infer genotypes. When the sequence variants were imputed using Minimac, the true genotypes were more correlated to predicted allele dosages than best-guess genotypes. The computing costs to impute 23,256,743 sequence variants in 6958 animals were ten-fold higher with Minimac than FImpute. Association studies with imputed sequence variants revealed seven quantitative trait loci (QTL) for milk fat percentage. Two causal mutations in the DGAT1 and GHR genes were the most significantly associated variants at two QTL on chromosomes 14 and 20 when Minimac was used to infer genotypes. 
Conclusions The population-based imputation of millions of sequence variants in large cohorts is computationally feasible and provides accurate genotypes. However, the accuracy of imputation is low in regions where the genome contains large segmental duplications or the coverage with array-derived single nucleotide polymorphisms is poor. Using a reference population that includes individuals from many breeds increases the accuracy of imputation particularly at low-frequency variants. Considering allele dosages rather than best-guess genotypes as explanatory variables is advantageous to detect causal mutations in association studies with imputed sequence variants.
Affiliation(s)
- Hubert Pausch
- Agriculture Victoria, AgriBio, Centre for AgriBiosciences, Bundoora, VIC, 3083, Australia
- Iona M MacLeod
- Agriculture Victoria, AgriBio, Centre for AgriBiosciences, Bundoora, VIC, 3083, Australia
- Ruedi Fries
- Chair of Animal Breeding, Technische Universitaet Muenchen, 85354, Freising, Germany
- Reiner Emmerling
- Institute of Animal Breeding, Bavarian State Research Center for Agriculture, 85586, Grub, Germany
- Phil J Bowman
- Agriculture Victoria, AgriBio, Centre for AgriBiosciences, Bundoora, VIC, 3083, Australia; School of Applied Systems Biology, La Trobe University, Bundoora, VIC, 3083, Australia
- Hans D Daetwyler
- Agriculture Victoria, AgriBio, Centre for AgriBiosciences, Bundoora, VIC, 3083, Australia; School of Applied Systems Biology, La Trobe University, Bundoora, VIC, 3083, Australia
- Michael E Goddard
- Agriculture Victoria, AgriBio, Centre for AgriBiosciences, Bundoora, VIC, 3083, Australia; Faculty of Veterinary and Agricultural Sciences, University of Melbourne, Melbourne, VIC, 3010, Australia
|