101
Using Sequence Variants in Linkage Disequilibrium with Causative Mutations to Improve Across-Breed Prediction in Dairy Cattle: A Simulation Study. G3-GENES GENOMES GENETICS 2016; 6:2553-61. [PMID: 27317779 PMCID: PMC4978908 DOI: 10.1534/g3.116.027730] [Citation(s) in RCA: 36] [Impact Index Per Article: 4.5] [Indexed: 12/30/2022]
Abstract
Sequence data are expected to increase the reliability of genomic prediction by containing causative mutations directly, especially in cases where low linkage disequilibrium between markers and causative mutations limits prediction reliability, such as across-breed prediction in dairy cattle. In practice, the causative mutations are unknown, and prediction with only variants in perfect linkage disequilibrium with the causative mutations is not realistic, leading to a reduced reliability compared to knowing the causative variants. Our objective was to use sequence data to investigate the potential benefits of sequence data for the prediction of genomic relationships, and consequently reliability of genomic breeding values. We used sequence data from five dairy cattle breeds, and a larger number of imputed sequences for two of the five breeds. We focused on the influence of linkage disequilibrium between markers and causative mutations, and assumed that a fraction of the causative mutations was shared across breeds and had the same effect across breeds. By comparing the loss in reliability of different scenarios, varying the distance between markers and causative mutations, using either all genome wide markers from commercial SNP chips, or only the markers closest to the causative mutations, we demonstrate the importance of using only variants very close to the causative mutations, especially for across-breed prediction. Rare variants improved prediction only if they were very close to rare causative mutations, and all causative mutations were rare. Our results show that sequence data can potentially improve genomic prediction, but careful selection of markers is essential.
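The role of linkage disequilibrium described in this abstract can be made concrete with a small calculation. The sketch below is illustrative only and is not taken from the study: it computes the usual r² statistic between a marker and a causative mutation from phased 0/1 haplotypes. The function name `ld_r2` and the toy haplotypes are assumptions made up for this example.

```python
def ld_r2(marker, causal):
    """Squared correlation (r^2) between two biallelic loci.

    Both arguments are equal-length lists of 0/1 alleles,
    one entry per phased haplotype.
    """
    n = len(marker)
    if n == 0 or n != len(causal):
        raise ValueError("need two non-empty haplotype vectors of equal length")
    p_a = sum(marker) / n                                   # allele frequency at the marker
    p_b = sum(causal) / n                                   # allele frequency at the causal locus
    p_ab = sum(m * c for m, c in zip(marker, causal)) / n   # frequency of the 1-1 haplotype
    d = p_ab - p_a * p_b                                    # disequilibrium coefficient D
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denom == 0:
        return 0.0  # a monomorphic locus carries no LD information (convention used here)
    return d * d / denom


# Toy haplotypes, invented for illustration.
causal = [1, 1, 0, 0, 1, 0, 0, 0]
near_marker = [1, 1, 0, 0, 1, 0, 0, 0]  # identical pattern: perfect LD
far_marker = [1, 0, 1, 0, 1, 0, 1, 0]   # only weakly associated

print(ld_r2(near_marker, causal))
print(ld_r2(far_marker, causal))
```

With these toy vectors the first marker has r² = 1 and the second has r² = 1/15 (about 0.067), which is the kind of contrast the abstract refers to when it distinguishes variants very close to causative mutations from more distant markers.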
102
Calus MPL, Bouwman AC, Schrooten C, Veerkamp RF. Efficient genomic prediction based on whole-genome sequence data using split-and-merge Bayesian variable selection. Genet Sel Evol 2016; 48:49. [PMID: 27357580 PMCID: PMC4926307 DOI: 10.1186/s12711-016-0225-x] [Citation(s) in RCA: 31] [Impact Index Per Article: 3.9] [Received: 02/17/2016] [Accepted: 06/16/2016] [Indexed: 12/22/2022] Open
Abstract
BACKGROUND Use of whole-genome sequence data is expected to increase persistency of genomic prediction across generations and breeds but affects model performance and requires increased computing time. In this study, we investigated whether the split-and-merge Bayesian stochastic search variable selection (BSSVS) model could overcome these issues. BSSVS is performed first on subsets of sequence-based variants and then on a merged dataset containing variants selected in the first step.
RESULTS We used a dataset that included 4,154,064 variants after editing and de-regressed proofs for 3415 reference and 2138 validation bulls for somatic cell score, protein yield and interval first to last insemination. In the first step, BSSVS was performed on 106 subsets each containing ~39,189 variants. In the second step, 1060 up to 472,492 variants, selected from the first step, were included to estimate the accuracy of genomic prediction. Accuracies were at best equal to those achieved with the commonly used Bovine 50k-SNP chip, although the number of variants within a few well-known quantitative trait loci regions was considerably enriched. When variant selection and the final genomic prediction were performed on the same data, predictions were biased. Predictions computed as the average of the predictions computed for each subset achieved the highest accuracies, i.e. 0.5 to 1.1 % higher than the accuracies obtained with the 50k-SNP chip, and yielded the least biased predictions. Finally, the accuracy of genomic predictions obtained when all sequence-based variants were included was similar or up to 1.4 % lower compared to that based on the average predictions across the subsets. By applying parallelization, the split-and-merge procedure was completed in 5 days, while the standard analysis including all sequence-based variants took more than three months.
CONCLUSIONS The split-and-merge approach splits one large computational task into many much smaller ones, which allows the use of parallel processing and thus efficient genomic prediction based on whole-genome sequence data. The split-and-merge approach did not improve prediction accuracy, probably because we used data on a single breed for which relationships between individuals were high. Nevertheless, the split-and-merge approach may have potential for applications on data from multiple breeds.
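The two-step structure described in this abstract (select within subsets, then merge the selected variants) can be sketched as follows. This is a minimal illustration, not the authors' implementation: BSSVS itself is replaced by a deliberately simple stand-in score (absolute covariance between a variant and the phenotype), the interleaved way of forming subsets is an assumption, and all function names and the toy data are hypothetical.

```python
def split_indices(n_variants, n_subsets):
    """Assign variant indices to interleaved subsets (the splitting rule is an assumption)."""
    return [list(range(s, n_variants, n_subsets)) for s in range(n_subsets)]


def abs_covariance(genotypes, phenotypes, j):
    """Stand-in selection score: |cov(variant j, phenotype)|. This is NOT BSSVS."""
    n = len(phenotypes)
    g = [row[j] for row in genotypes]
    g_mean = sum(g) / n
    y_mean = sum(phenotypes) / n
    return abs(sum((gi - g_mean) * (yi - y_mean) for gi, yi in zip(g, phenotypes)) / n)


def select_in_subset(genotypes, phenotypes, subset, keep):
    """Step 1 for one subset: keep the `keep` best-scoring variants."""
    ranked = sorted(subset, key=lambda j: abs_covariance(genotypes, phenotypes, j), reverse=True)
    return ranked[:keep]


def split_and_merge(genotypes, phenotypes, n_subsets, keep_per_subset):
    """Run the selection on every subset, then merge the selected variant indices."""
    n_variants = len(genotypes[0])
    merged = []
    # Each subset is processed independently, so this loop could run in parallel.
    for subset in split_indices(n_variants, n_subsets):
        merged.extend(select_in_subset(genotypes, phenotypes, subset, keep_per_subset))
    return sorted(merged)


# Toy data (invented): 6 animals x 6 variants, genotypes coded 0/1/2.
genotypes = [
    [0, 1, 0, 1, 2, 0],
    [1, 1, 0, 0, 2, 2],
    [0, 0, 0, 1, 2, 1],
    [1, 0, 2, 0, 2, 0],
    [0, 1, 2, 1, 2, 2],
    [1, 1, 2, 0, 2, 1],
]
# Phenotype built from variant 2 (effect 1.0) and variant 5 (effect 0.5).
phenotypes = [0.0, 1.0, 0.5, 2.0, 3.0, 2.5]

print(split_and_merge(genotypes, phenotypes, n_subsets=2, keep_per_subset=1))
```

On the toy data the merged set is `[2, 5]`, the two variants used to build the phenotype. In the procedure the abstract describes, the merged variants would then go into a second model fit. The subsets share no state, which is what makes the parallel processing mentioned in the abstract possible.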
Affiliation(s)
- Mario P L Calus: Animal Breeding and Genomics Centre, Wageningen UR Livestock Research, PO Box 338, 6700 AH, Wageningen, The Netherlands.
- Aniek C Bouwman: Animal Breeding and Genomics Centre, Wageningen UR Livestock Research, PO Box 338, 6700 AH, Wageningen, The Netherlands.
- Roel F Veerkamp: Animal Breeding and Genomics Centre, Wageningen UR Livestock Research, PO Box 338, 6700 AH, Wageningen, The Netherlands.
103
Zhang Q, Guldbrandtsen B, Thomasen JR, Lund MS, Sahana G. Genome-wide association study for longevity with whole-genome sequencing in 3 cattle breeds. J Dairy Sci 2016; 99:7289-7298. [PMID: 27289149 DOI: 10.3168/jds.2015-10697] [Citation(s) in RCA: 28] [Impact Index Per Article: 3.5] [Received: 11/30/2015] [Accepted: 05/04/2016] [Indexed: 01/05/2023]
Abstract
Longevity is an important economic trait in dairy production. Improvements in longevity could increase the average number of lactations per cow, thereby affecting the profitability of the dairy cattle industry. Improved longevity for cows reduces the replacement cost of stock and enables animals to achieve the highest production period. Moreover, longevity is an indirect indicator of animal welfare. Using whole-genome sequencing variants in 3 dairy cattle breeds, we carried out an association study and identified 7 genomic regions in Holstein and 5 regions in Red Dairy Cattle that were associated with longevity. Meta-analyses of 3 breeds revealed 2 significant genomic regions, located on chromosomes 6 (META-CHR6-88MB) and 18 (META-CHR18-58MB). META-CHR6-88MB overlaps with 2 known genes: neuropeptide G-protein coupled receptor (NPFFR2; 89,052,210-89,059,348 bp) and vitamin D-binding protein precursor (GC; 88,695,940-88,739,180 bp). The NPFFR2 gene was previously identified as a candidate gene for mastitis resistance. META-CHR18-58MB overlaps with zinc finger protein 717 (ZNF717; 58,130,465-58,141,877 bp) and zinc finger protein 613 (ZNF613; 58,115,782-58,117,110 bp), which have been associated with calving difficulties. Information on longevity-associated genomic regions could be used to find causal genes/variants influencing longevity and exploited to improve the reliability of genomic prediction.
Affiliation(s)
- Qianqian Zhang: Center for Quantitative Genetics and Genomics, Department of Molecular Biology and Genetics, Aarhus University, DK-8830 Tjele, Denmark; Animal Breeding and Genomics Centre, Wageningen UR Livestock Research, 6700 AH Wageningen, the Netherlands.
- Bernt Guldbrandtsen: Center for Quantitative Genetics and Genomics, Department of Molecular Biology and Genetics, Aarhus University, DK-8830 Tjele, Denmark.
- Jørn Rind Thomasen: Center for Quantitative Genetics and Genomics, Department of Molecular Biology and Genetics, Aarhus University, DK-8830 Tjele, Denmark; VikingGenetics, Assentoft, DK-8960 Randers, Denmark.
- Mogens Sandø Lund: Center for Quantitative Genetics and Genomics, Department of Molecular Biology and Genetics, Aarhus University, DK-8830 Tjele, Denmark.
- Goutam Sahana: Center for Quantitative Genetics and Genomics, Department of Molecular Biology and Genetics, Aarhus University, DK-8830 Tjele, Denmark.
104
Wiggans GR, Cooper TA, VanRaden PM, Van Tassell CP, Bickhart DM, Sonstegard TS. Increasing the number of single nucleotide polymorphisms used in genomic evaluation of dairy cattle. J Dairy Sci 2016; 99:4504-4511. [PMID: 27040793 DOI: 10.3168/jds.2015-10456] [Citation(s) in RCA: 30] [Impact Index Per Article: 3.8] [Received: 09/28/2015] [Accepted: 02/14/2016] [Indexed: 11/19/2022]
Abstract
GeneSeek (Neogen Corp., Lexington, KY) designed a new version of the GeneSeek Genomic Profiler HD BeadChip for Dairy Cattle, which originally had >77,000 single nucleotide polymorphisms (SNP). A set of >140,000 SNP was selected that included all SNP on the existing GeneSeek chip, all SNP used in US national genomic evaluations, SNP that were possible functional mutations, and other informative SNP. Because SNP with a lower minor allele frequency might track causative variants better, 30,000 more SNP were selected from the Illumina BovineHD Genotyping BeadChip (Illumina Inc., San Diego, CA) by choosing SNP to maximize differences in minor allele frequency between a SNP being considered for the new chip and the 2 SNP that flanked it. Single-gene tests were included if their location was known and bioinformatics indicated relevance for dairy cattle. To determine which SNP from the new chip should be included in genomic evaluations, genotypes available from chips already in use were used to impute and evaluate the SNP set. Effects for 134,511 usable SNP were estimated for all breed-trait combinations; SNP with the largest absolute values for effects were selected (5,000 for Holsteins, 1,000 for Jerseys, and 500 each for Brown Swiss and Ayrshires for each trait). To increase overlap with the 60,671 SNP currently used for genomic evaluation, 12,094 more SNP with the largest effects were added. After removing SNP with many parent-progeny conflicts, 84,937 SNP remained. Three cutoff studies were conducted with 3 SNP sets to determine reliability gain over that for parent average when evaluations based on August 2011 data were used to predict December 2014 performance. Across all traits, mean Holstein reliability gains were 32.5, 33.4, and 32.0 percentage points for 60,671, 84,937, and 134,511 SNP, respectively. 
After genotypes from the new chip became available, the proposed set was reduced from 84,937 to 77,321 SNP to remove SNP that were not included during manufacture, reduce computing time, and improve imputation performance. The set of 77,321 SNP was evaluated using August 2011 data to predict April 2015 performance. Reliability gain over 60,671 SNP was 1.4 percentage points across traits for Holsteins. Improvement over 84,937 SNP was partially the result of 4 mo of additional data and genotypes from the new chip. Revision of the SNP set used for genomic evaluation is expected to be an ongoing process to increase evaluation accuracy.
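The selection step in this abstract, keeping the SNP with the largest absolute estimated effects for each trait, can be illustrated with a short sketch. It is a simplified reading of the abstract, not the authors' pipeline: combining the per-trait selections by taking their union is an assumption, and the function names, SNP identifiers, and effect values are hypothetical.

```python
def top_snps_by_abs_effect(effects, n_top):
    """Return the IDs of the n_top SNP with the largest |effect| for one trait."""
    ranked = sorted(effects, key=lambda snp: abs(effects[snp]), reverse=True)
    return set(ranked[:n_top])


def select_across_traits(effects_by_trait, n_top):
    """Union of the per-trait selections (combining by union is an assumption)."""
    selected = set()
    for effects in effects_by_trait.values():
        selected |= top_snps_by_abs_effect(effects, n_top)
    return selected


# Invented SNP effects for two traits within one breed.
effects_by_trait = {
    "milk": {"snp1": 0.8, "snp2": -1.2, "snp3": 0.1, "snp4": 0.05, "snp5": 0.0},
    "fat": {"snp1": -0.02, "snp2": 0.3, "snp3": -0.9, "snp4": 0.4, "snp5": 0.01},
}

print(sorted(select_across_traits(effects_by_trait, n_top=2)))
```

Here `snp1` and `snp2` are chosen for the first trait and `snp3` and `snp4` for the second, so `snp5`, which has a negligible effect on both traits, is the only SNP left out. Ranking by absolute value means SNP with large negative effects are kept as readily as those with large positive ones.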
Affiliation(s)
- G R Wiggans: Animal Genomics and Improvement Laboratory, Agricultural Research Service, USDA, Beltsville, MD 20705-2350.
- T A Cooper: Animal Genomics and Improvement Laboratory, Agricultural Research Service, USDA, Beltsville, MD 20705-2350.
- P M VanRaden: Animal Genomics and Improvement Laboratory, Agricultural Research Service, USDA, Beltsville, MD 20705-2350.
- C P Van Tassell: Animal Genomics and Improvement Laboratory, Agricultural Research Service, USDA, Beltsville, MD 20705-2350.
- D M Bickhart: Animal Genomics and Improvement Laboratory, Agricultural Research Service, USDA, Beltsville, MD 20705-2350.
- T S Sonstegard: Animal Genomics and Improvement Laboratory, Agricultural Research Service, USDA, Beltsville, MD 20705-2350.
105
Review: Opportunities and challenges for small populations of dairy cattle in the era of genomics. Animal 2016; 10:1050-60. [PMID: 26957010 DOI: 10.1017/s1751731116000410] [Citation(s) in RCA: 18] [Impact Index Per Article: 2.3] [Indexed: 12/28/2022] Open
Abstract
In modern dairy cattle breeding, genomic breeding programs have the potential to increase efficiency and genetic gain. At the same time, the requirements and the availability of genotypes and phenotypes present a challenge. The set-up of a large enough reference population for genomic prediction is problematic for numerically small breeds but also for hard-to-measure traits. The first part of this study is a review of the current literature on strategies to overcome the lack of reference data. One solution is the use of combined reference populations from different breeds, different countries, or different research populations. Results reveal that the level of relationship between the merged populations is the most important factor. Compiling closely related populations facilitates the accurate estimation of marker effects and thus results in high accuracies of genomic prediction. Consequently, mixed reference populations of the same breed, but from different countries, are more promising than combining different breeds, especially if those are more distantly related. The use of female reference information has the potential to enlarge the reference population size. Including females is advisable for small populations and difficult traits, and may be combined with genotyping females and imputing those that are un-genotyped. The efficient use of imputation for un-genotyped individuals requires a set of genotyped related animals and well-considered strategies for choosing which animals to genotype and phenotype. Small populations have to find ways to derive additional advantages from the cost-intensive establishment of genomic breeding schemes. Possible solutions may be the use of genomic information for inbreeding control, parentage verification, within-herd selection, adjusted mating plans or conservation strategies.
The second part of the paper deals with the issue of high-quality phenotypes against the background of new, difficult and hard-to-measure traits. The use of contracted herds for phenotyping is recommended, as additional traits, beyond the standard traits used in dairy cattle breeding, can be measured at set moments in time. This can be undertaken even for the recording of health traits, thus resulting in complete contemporary groups for health traits. Future traits to be recorded and used in genomic breeding programs will, at least partly, be traits for which traditional selection based on widespread phenotyping is not possible. Phenotyping sufficient numbers of animals to enable genomic selection will rely on cooperation between scientists from different disciplines and may require multidisciplinary approaches.
106
VanRaden PM. Practical implications for genetic modeling in the genomics era. J Dairy Sci 2016; 99:2405-2412. [PMID: 26778313 DOI: 10.3168/jds.2015-10038] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.6] [Received: 06/30/2015] [Accepted: 11/16/2015] [Indexed: 11/19/2022]
Abstract
Genetic models convert data into estimated breeding values and other information useful to breeders. The goal is to provide accurate and timely predictions of the future performance for each animal (or embryo). Modeling involves defining traits, editing raw data, removing environmental effects, including genetic by environmental interactions and correlations among traits, and accounting for nonadditive inheritance or nonnormal distributions. Data include phenotypes and pedigrees during the last century and genotypes within the last decade. The genomic data can include single nucleotide polymorphisms, quantitative trait loci, insertions, deletions, and haplotypes. Subsets must be selected to reduce computation because total numbers of variants that can be imputed have increased rapidly from thousands to millions. Current computation using 60,671 markers takes just a few days. Nonlinear models can account for the nonnormal distribution of genomic effects, but reliability is usually better than that of linear models only for traits influenced by major genes. Numbers of genotyped animals have also increased rapidly in the joint North American database from a few thousand in 2009 to over 1 million in 2015. Most are young females and will contribute to estimating allele effects in the future, but only about 150,000 have phenotypes so far. Genomic preselection can bias traditional animal models because Mendelian sampling of phenotyped progeny and mates is no longer expected to average zero; however, estimates of bias are small in current US data. Single-step models that combine pedigree and genomic relationships can account for preselection, but approximations are required for affordable computation. Traditional animal models may include all breeds and crossbreds, but most genomic evaluations are still computed within breed. Models that include inbreeding, heterosis, dominance, and interactions can improve predictions for individual matings. 
Multitrait genomic models may be preferred for traits with many missing records or when foreign records are included as pseudo-observations, but most countries use multitrait traditional evaluations followed by single-trait genomic evaluations. Genomic reliabilities are about 70% for the more heritable traits. Researchers must choose from many available models and explain how the models work so that breeders can more confidently apply the predictions in their selection programs.
Affiliation(s)
- P M VanRaden: Animal Genomics and Improvement Laboratory, Agricultural Research Service, USDA, Beltsville, MD 20705-2350.
107
Meuwissen T, Hayes B, Goddard M. Genomic selection: A paradigm shift in animal breeding. Anim Front 2016. [DOI: 10.2527/af.2016-0002] [Citation(s) in RCA: 223] [Impact Index Per Article: 27.9] [Indexed: 01/20/2023] Open
Affiliation(s)
- Ben Hayes: Department of Economic Development, Jobs, Transport and Resources and Dairy Futures Cooperative Research Centre, Agribio, 5 Ring Road, Bundoora, VIC 3083, Australia; School of Applied Systems Biology, La Trobe University, Bundoora, VIC 3083, Australia.
- Mike Goddard: Department of Economic Development, Jobs, Transport and Resources and Dairy Futures Cooperative Research Centre, Agribio, 5 Ring Road, Bundoora, VIC 3083, Australia; Faculty of Veterinary and Agricultural Sciences, University of Melbourne, Parkville, Australia.
108
109
Raven LA, Cocks BG, Kemper KE, Chamberlain AJ, Vander Jagt CJ, Goddard ME, Hayes BJ. Targeted imputation of sequence variants and gene expression profiling identifies twelve candidate genes associated with lactation volume, composition and calving interval in dairy cattle. Mamm Genome 2015; 27:81-97. [DOI: 10.1007/s00335-015-9613-8] [Citation(s) in RCA: 36] [Impact Index Per Article: 4.0] [Received: 07/26/2015] [Accepted: 10/28/2015] [Indexed: 10/22/2022]
110
Kadri NK, Guldbrandtsen B, Lund MS, Sahana G. Genetic dissection of milk yield traits and mastitis resistance quantitative trait loci on chromosome 20 in dairy cattle. J Dairy Sci 2015; 98:9015-25. [PMID: 26409972 DOI: 10.3168/jds.2015-9599] [Citation(s) in RCA: 20] [Impact Index Per Article: 2.2] [Received: 03/20/2015] [Accepted: 07/25/2015] [Indexed: 11/19/2022]
Abstract
Intense selection to increase milk yield has had negative consequences for mastitis incidence in dairy cattle. Due to low heritability of mastitis resistance and an unfavorable genetic correlation with milk yield, a reduction in mastitis through traditional breeding has been difficult to achieve. Here, we examined quantitative trait loci (QTL) that segregate for clinical mastitis and milk yield on Bos taurus autosome 20 (BTA20) to determine whether both traits are affected by a single polymorphism (pleiotropy) or by multiple closely linked polymorphisms. In the latter but not the former situation, undesirable genetic correlation could potentially be broken by selecting animals that have favorable variants for both traits. First, we performed a within-breed association study using a haplotype-based method in Danish Holstein cattle (HOL). Next, we analyzed Nordic Red dairy cattle (RDC) and Danish Jersey cattle (JER) with the goal of determining whether these QTL identified in Holsteins were segregating across breeds. Genotypes for 12,566 animals (5,966 HOL, 5,458 RDC, and 1,142 JER) were determined by using the Illumina Bovine SNP50 BeadChip (50K; Illumina, San Diego, CA), which identifies 1,568 single nucleotide polymorphisms on BTA20. Data were combined, phased, and clustered into haplotype states, followed by within- and across-breed haplotype-based association analyses using a linear mixed model. Association signals for both clinical mastitis and milk yield peaked in the 26- to 40-Mb region on BTA20 in HOL. Single-variant association analyses were carried out in the QTL region using whole sequence level variants imputed from references of 2,036 HD genotypes (BovineHD BeadChip; Illumina) and 242 whole-genome sequences. The milk QTL were also segregating in RDC and JER on the BTA20-targeted region; however, an indication of differences in the causal factor(s) was observed across breeds. 
A previously reported F279Y mutation (rs385640152) within the growth hormone receptor gene showed strong association with milk, fat, and protein yields. In HOL, the highest peaks for milk yield and susceptibility to mastitis were separated by over 3.5 Mb (3.8 Mb by haplotype analysis, 3.6 Mb by single nucleotide polymorphism analysis), suggesting separate genetic variants for the traits. Further analysis yielded 2 candidate mutations for the mastitis QTL, at 33,642,072 bp (rs378947583) in an intronic region of the caspase recruitment domain protein 6 gene and 35,969,994 bp (rs133596506) in an intronic region of the leukemia-inhibitory factor receptor gene. These findings suggest that it may be possible to separate these beneficial and detrimental genetic factors through targeted selective breeding.
Affiliation(s)
- Naveen K Kadri: Center for Quantitative Genetics and Genomics, Department of Molecular Biology and Genetics, Aarhus University, 8830 Tjele, Denmark.
- Bernt Guldbrandtsen: Center for Quantitative Genetics and Genomics, Department of Molecular Biology and Genetics, Aarhus University, 8830 Tjele, Denmark.
- Mogens S Lund: Center for Quantitative Genetics and Genomics, Department of Molecular Biology and Genetics, Aarhus University, 8830 Tjele, Denmark.
- Goutam Sahana: Center for Quantitative Genetics and Genomics, Department of Molecular Biology and Genetics, Aarhus University, 8830 Tjele, Denmark.