51
de Las Heras-Saldana S, Lopez BI, Moghaddar N, Park W, Park JE, Chung KY, Lim D, Lee SH, Shin D, van der Werf JHJ. Use of gene expression and whole-genome sequence information to improve the accuracy of genomic prediction for carcass traits in Hanwoo cattle. Genet Sel Evol 2020; 52:54. [PMID: 32993481 PMCID: PMC7525992 DOI: 10.1186/s12711-020-00574-2] [Citation(s) in RCA: 17] [Impact Index Per Article: 4.3] [Received: 05/27/2020] [Accepted: 09/18/2020] [Indexed: 12/21/2022]
Abstract
Background In this study, we assessed the accuracy of genomic prediction for carcass weight (CWT), marbling score (MS), eye muscle area (EMA) and back fat thickness (BFT) in Hanwoo cattle using genomic best linear unbiased prediction (GBLUP), weighted GBLUP (wGBLUP), and a BayesR model. For these models, we investigated the potential gain from using single nucleotide polymorphisms (SNPs) pre-selected from a genome-wide association study (GWAS) on imputed sequence data and from gene expression information. We used data on 13,717 animals with carcass phenotypes and imputed sequence genotypes, which were split into an independent GWAS discovery set of varying size and a remaining set for validation of prediction. Expression data came from a Hanwoo gene expression experiment on 45 animals. Results Using a larger number of animals in the reference set increased the accuracy of genomic prediction, whereas a larger independent GWAS discovery dataset improved the identification of predictive SNPs. Using pre-selected SNPs from GWAS in GBLUP improved the accuracy of prediction by 0.02 for EMA and by up to 0.05 for BFT, CWT, and MS, compared to a standard 50 k SNP array that gave accuracies of 0.50, 0.47, 0.58, and 0.47, respectively. Accuracy of prediction of BFT and CWT increased when BayesR was applied with the 50 k SNP array (by 0.02 and 0.03, respectively) and improved further when the 50 k array was combined with the top SNPs (by 0.06 and 0.04, respectively). By contrast, using BayesR resulted in limited improvement for EMA and MS. wGBLUP did not improve accuracy but increased prediction bias. Based on the RNA-seq experiment, we identified informative expression quantitative trait loci, which, when used in GBLUP, improved the accuracy of prediction slightly, i.e. by 0.01 to 0.02. SNPs located in genes whose expression was associated with differences in trait phenotype did not contribute to a higher prediction accuracy.
Conclusions Our results show that, in Hanwoo beef cattle, when SNPs are pre-selected from GWAS on imputed sequence data, the accuracy of prediction improves only slightly whereas the contribution of SNPs that are selected based on gene expression is not significant. The benefit of statistical models to prioritize selected SNPs for estimating genomic breeding values is trait-specific and depends on the genetic architecture of each trait.
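The abstract compares GBLUP with weighted GBLUP (wGBLUP) but does not say how the weights enter the model. One common construction, assumed here, is VanRaden's method 1 genomic relationship matrix with a per-SNP weight d_j:

G = Z D Z' / Σ 2p_j(1 - p_j)d_j

In this formula, Z is the column-centred 0/1/2 genotype matrix and D = diag(d). The NumPy sketch below is illustrative only. The function name and the weights are hypothetical, and the study's own weighting scheme may differ.

```python
import numpy as np


def weighted_vanraden_g(M, weights=None):
    """Weighted genomic relationship matrix (VanRaden method 1 with per-SNP weights).

    M: (n_animals, n_snps) genotypes coded 0/1/2 (copies of the counted allele).
    weights: hypothetical per-SNP weights d_j; None (all ones) gives the plain GBLUP G.
    """
    M = np.asarray(M, dtype=float)
    d = np.ones(M.shape[1]) if weights is None else np.asarray(weights, dtype=float)
    p = M.mean(axis=0) / 2.0                  # allele frequencies estimated from the sample
    Z = M - 2.0 * p                           # centre each SNP column
    scale = np.sum(2.0 * p * (1.0 - p) * d)   # weighted sum of 2p(1-p)
    return (Z * d) @ Z.T / scale              # Z D Z' / scale, with D = diag(d)


# Toy example: 5 animals, 200 random SNPs, arbitrary illustrative weights.
rng = np.random.default_rng(1)
M = rng.integers(0, 3, size=(5, 200))
G = weighted_vanraden_g(M)
G_w = weighted_vanraden_g(M, weights=rng.uniform(0.5, 2.0, size=200))
```

With all weights equal to 1, this reduces to the ordinary GBLUP matrix. Multiplying every weight by the same constant leaves G unchanged, so only the relative weights matter.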
Affiliation(s)
- Bryan Irvine Lopez
- Animal Genomics and Bioinformatics Division, National Institute of Animal Science, Rural Development Administration, Wanju, 55365, Republic of Korea
- Nasir Moghaddar
- School of Environmental and Rural Science, University of New England, Armidale, NSW 2351, Australia
- Woncheoul Park
- Animal Genomics and Bioinformatics Division, National Institute of Animal Science, Rural Development Administration, Wanju, 55365, Republic of Korea
- Jong-Eun Park
- Animal Genomics and Bioinformatics Division, National Institute of Animal Science, Rural Development Administration, Wanju, 55365, Republic of Korea
- Ki Y Chung
- Department of Beef Science, Korea National College of Agriculture and Fisheries, Jeonju, Republic of Korea
- Dajeong Lim
- Animal Genomics and Bioinformatics Division, National Institute of Animal Science, Rural Development Administration, Wanju, 55365, Republic of Korea
- Seung H Lee
- Division of Animal and Dairy Science, Chungnam National University, Daejeon, 34148, Republic of Korea
- Donghyun Shin
- The Animal Molecular Genetics and Breeding Centre, Jeonbuk National University, Jeonju, 54896, Republic of Korea
- Julius H J van der Werf
- School of Environmental and Rural Science, University of New England, Armidale, NSW 2351, Australia
52
Teng J, Huang S, Chen Z, Gao N, Ye S, Diao S, Ding X, Yuan X, Zhang H, Li J, Zhang Z. Optimizing genomic prediction model given causal genes in a dairy cattle population. J Dairy Sci 2020; 103:10299-10310. [PMID: 32952023 DOI: 10.3168/jds.2020-18233] [Citation(s) in RCA: 9] [Impact Index Per Article: 2.3] [Received: 01/20/2020] [Accepted: 07/07/2020] [Indexed: 01/15/2023]
Abstract
As genotypic data move from SNP chips toward whole-genome sequence, the accuracy of genomic prediction (GP) shows only a marginal gain, although all genetic variation, including causal genes, is contained in whole-genome sequence data. Meanwhile, genetic analyses of complex traits, such as genome-wide association studies, have identified an increasing number of genomic regions, including potential causal genes, which could serve as reliable prior knowledge for GP. Many studies have tried to improve the performance of GP by modifying the prediction model to incorporate prior knowledge. Although model modification or strategy optimization has produced several plausible results, most of these were validated in a specific empirical population with a limited variety of genetic architectures for complex traits. An alternative approach is to use simulated genetic architectures with known causal genes (e.g., simulated causative SNPs) to evaluate different GP models given those causal genes. Our objectives were to (1) evaluate the performance of GP under a variety of genetic architectures with a subset of known causal genes and (2) compare different GP models modified by highlighting causal genes and different strategies to weight causal genes. In this study, we simulated pseudo-phenotypes under a variety of genetic architectures based on the real genotypes and phenotypes of a dairy cattle population. Besides classical genomic best linear unbiased prediction, we evaluated 3 modified GP models that highlight causal genes as follows: (1) by treating them as fixed effects, (2) by treating them as a separate random component, and (3) by combining them into the genomic relationship matrix as random effects. Our results showed that highlighting the known causal genes in the GP models increased predictive accuracy when those genes explained a considerable proportion of the genetic variance.
Combining all given causal genes into the genomic relationship matrix was the optimal strategy in all validated scenarios. Treating causal genes as a separate random component is also recommended when known causal genes explain more than 20% of the genetic variance. Moreover, assigning differential weights to each causal gene further improved the predictive accuracy.
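Of the three modified models listed, the first treats known causal genes as fixed effects. The sketch below implements that option under stated assumptions:

- the model is y = Xb + g + e, with g ~ N(0, Gσ²g) and e ~ N(0, Iσ²e);
- the variance ratio λ = σ²e/σ²g is known;
- G is positive definite.

It estimates the fixed effects by generalized least squares and then computes the GBLUP of g. The function name and inputs are hypothetical. This is a minimal sketch, not the authors' implementation.

```python
import numpy as np


def gblup_with_fixed_snps(y, G, causal, lam):
    """GBLUP in which genotypes of known causal SNPs enter as fixed covariates.

    y: (n,) phenotypes; G: (n, n) positive-definite genomic relationship matrix;
    causal: (n, k) genotypes (0/1/2) of the known causal SNPs (hypothetical input);
    lam: variance ratio sigma_e^2 / sigma_g^2, assumed known.
    Returns b (intercept followed by k causal-SNP effects) and g (genomic values).
    """
    y = np.asarray(y, dtype=float)
    n = y.shape[0]
    X = np.column_stack([np.ones(n), causal])          # intercept + causal SNPs
    V = G + lam * np.eye(n)                            # Var(y) / sigma_g^2
    Vinv_X = np.linalg.solve(V, X)
    Vinv_y = np.linalg.solve(V, y)
    b = np.linalg.solve(X.T @ Vinv_X, X.T @ Vinv_y)    # GLS estimate of fixed effects
    g = G @ np.linalg.solve(V, y - X @ b)              # BLUP of genomic values
    return b, g
```

At the solution, b and g satisfy Henderson's mixed model equations [X'X, X'; X, I + λG⁻¹][b; g] = [X'y; y], which gives a convenient check. The random-component and weighted-matrix options would instead move the causal SNPs into G or into a second relationship matrix.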
Affiliation(s)
- Jinyan Teng
- Guangdong Provincial Key Lab of Agro-Animal Genomics and Molecular Breeding, College of Animal Science, South China Agricultural University, Guangzhou 510642, China
- Shuwen Huang
- Guangdong Provincial Key Lab of Agro-Animal Genomics and Molecular Breeding, College of Animal Science, South China Agricultural University, Guangzhou 510642, China
- Zitao Chen
- Guangdong Provincial Key Lab of Agro-Animal Genomics and Molecular Breeding, College of Animal Science, South China Agricultural University, Guangzhou 510642, China
- Ning Gao
- State Key Laboratory of Biocontrol, School of Life Sciences, Sun Yat-sen University, North Third Road, Guangzhou Higher Education Mega Center, Guangzhou 510006, China
- Shaopan Ye
- Guangdong Provincial Key Lab of Agro-Animal Genomics and Molecular Breeding, College of Animal Science, South China Agricultural University, Guangzhou 510642, China
- Shuqi Diao
- Guangdong Provincial Key Lab of Agro-Animal Genomics and Molecular Breeding, College of Animal Science, South China Agricultural University, Guangzhou 510642, China
- Xiangdong Ding
- National Engineering Laboratory for Animal Breeding, Laboratory of Animal Genetics, Breeding and Reproduction, Ministry of Agriculture, College of Animal Science and Technology, China Agricultural University, Beijing 100193, China
- Xiaolong Yuan
- Guangdong Provincial Key Lab of Agro-Animal Genomics and Molecular Breeding, College of Animal Science, South China Agricultural University, Guangzhou 510642, China
- Hao Zhang
- Guangdong Provincial Key Lab of Agro-Animal Genomics and Molecular Breeding, College of Animal Science, South China Agricultural University, Guangzhou 510642, China
- Jiaqi Li
- Guangdong Provincial Key Lab of Agro-Animal Genomics and Molecular Breeding, College of Animal Science, South China Agricultural University, Guangzhou 510642, China
- Zhe Zhang
- Guangdong Provincial Key Lab of Agro-Animal Genomics and Molecular Breeding, College of Animal Science, South China Agricultural University, Guangzhou 510642, China
53
Shabalina T, Yin T, König S. Survival analyses in Holstein cows considering direct disease diagnoses and specific SNP marker effects. J Dairy Sci 2020; 103:8257-8273. [DOI: 10.3168/jds.2020-18174] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Received: 01/08/2020] [Accepted: 05/07/2020] [Indexed: 12/11/2022]
54
Liu A, Lund MS, Boichard D, Karaman E, Guldbrandtsen B, Fritz S, Aamand GP, Nielsen US, Sahana G, Wang Y, Su G. Weighted single-step genomic best linear unbiased prediction integrating variants selected from sequencing data by association and bioinformatics analyses. Genet Sel Evol 2020; 52:48. [PMID: 32799816 PMCID: PMC7429790 DOI: 10.1186/s12711-020-00568-0] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.0] [Received: 09/18/2019] [Accepted: 08/07/2020] [Indexed: 11/30/2022]
Abstract
Background Sequencing data enable the detection of causal loci or single nucleotide polymorphisms (SNPs) highly linked to causal loci to improve genomic prediction. However, until now, studies on integrating such SNPs using a single-step genomic best linear unbiased prediction (ssGBLUP) model are scarce. We investigated the integration of sequencing SNPs selected by association (1262 SNPs) and bioinformatics (2359 SNPs) analyses into the currently used 54K-SNP chip, using three ssGBLUP models which make different assumptions on the distribution of SNP effects: a basic ssGBLUP model, a so-called featured ssGBLUP (ssFGBLUP) model that considered selected sequencing SNPs as a feature genetic component, and a weighted ssGBLUP (ssWGBLUP) model in which the genomic relationship matrix was weighted by the SNP variances estimated from a Bayesian whole-genome regression model, with every 1, 30, or 100 adjacent SNPs within a chromosome region sharing the same variance. We used data on milk production and female fertility in Danish Jersey. In total, 15,823 genotyped and 528,981 non-genotyped females born between 1990 and 2013 were used as reference population and 7415 genotyped females and 33,040 non-genotyped females born between 2014 and 2016 were used as validation population. Results With basic ssGBLUP, integrating SNPs selected from sequencing data improved prediction reliabilities for milk and protein yields, but resulted in limited or no improvement for fat yield and female fertility. Model performances depended on the SNP set used. When using ssWGBLUP with the 54K SNPs, reliabilities for milk and protein yields improved by 0.028 for genotyped animals and by 0.006 for non-genotyped animals compared with ssGBLUP. However, with the SNP set that included SNPs selected from sequencing data, no statistically significant difference in prediction reliability was observed between the three ssGBLUP models. 
Conclusions In summary, when using 54K SNPs, a ssWGBLUP model with a common weight on the SNPs in a given region is a feasible approach for single-trait genetic evaluation. Integrating relevant SNPs selected from sequencing data into the standard SNP chip can improve the reliability of genomic prediction. For such SNP data, a basic ssGBLUP model is recommended, since alternative models such as ssWGBLUP and ssFGBLUP gave no significant improvement.
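In the ssWGBLUP model, G is weighted by SNP variances, and every 1, 30 or 100 adjacent SNPs within a chromosome region share the same variance. The sketch below approximates that idea with a swapped-in, post hoc step: it averages per-SNP variance estimates within consecutive blocks of `window` SNPs, restarting at each chromosome. In the study, the shared variance may instead have been estimated directly inside the Bayesian regression. The function name is hypothetical, and SNPs are assumed sorted by chromosome and position.

```python
import numpy as np


def window_shared_variances(chrom, snp_var, window):
    """Give every block of `window` adjacent SNPs on a chromosome their mean variance.

    chrom: chromosome label per SNP (SNPs sorted by chromosome and position).
    snp_var: per-SNP variance estimates, e.g. from a Bayesian regression.
    window: number of adjacent SNPs sharing one variance (1 keeps per-SNP values).
    """
    chrom = np.asarray(chrom)
    snp_var = np.asarray(snp_var, dtype=float)
    shared = np.empty_like(snp_var)
    for c in np.unique(chrom):
        idx = np.flatnonzero(chrom == c)          # this chromosome's SNPs, in order
        for start in range(0, idx.size, window):
            block = idx[start:start + window]     # the last block may be shorter
            shared[block] = snp_var[block].mean()
    return shared


# Two chromosomes, windows of 2 SNPs: returns [2.0, 2.0, 5.0, 3.0, 3.0]
print(window_shared_variances([1, 1, 1, 2, 2], [1.0, 3.0, 5.0, 2.0, 4.0], window=2))
```

The resulting values could then serve as the SNP weights of a weighted genomic relationship matrix. Blocks never straddle two chromosomes, which matches the "within a chromosome region" wording of the abstract.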
Affiliation(s)
- Aoxing Liu
- Center for Quantitative Genetics and Genomics, Aarhus University, 8830, Tjele, Denmark
- Mogens Sandø Lund
- Center for Quantitative Genetics and Genomics, Aarhus University, 8830, Tjele, Denmark
- Didier Boichard
- INRAE, AgroParisTech, GABI, Université Paris-Saclay, 78350, Jouy-en-Josas, France
- Emre Karaman
- Center for Quantitative Genetics and Genomics, Aarhus University, 8830, Tjele, Denmark
- Bernt Guldbrandtsen
- Center for Quantitative Genetics and Genomics, Aarhus University, 8830, Tjele, Denmark
- Sebastien Fritz
- INRAE, AgroParisTech, GABI, Université Paris-Saclay, 78350, Jouy-en-Josas, France; ALLICE, 75012, Paris, France
- Goutam Sahana
- Center for Quantitative Genetics and Genomics, Aarhus University, 8830, Tjele, Denmark
- Yachun Wang
- Key Laboratory of Animal Genetics, Breeding and Reproduction, MARA; National Engineering Laboratory for Animal Breeding, College of Animal Science and Technology, China Agricultural University, 100193, Beijing, P.R. China
- Guosheng Su
- Center for Quantitative Genetics and Genomics, Aarhus University, 8830, Tjele, Denmark
55
Lourenco D, Legarra A, Tsuruta S, Masuda Y, Aguilar I, Misztal I. Single-Step Genomic Evaluations from Theory to Practice: Using SNP Chips and Sequence Data in BLUPF90. Genes (Basel) 2020; 11:E790. [PMID: 32674271 PMCID: PMC7397237 DOI: 10.3390/genes11070790] [Citation(s) in RCA: 60] [Impact Index Per Article: 15.0] [Received: 06/19/2020] [Revised: 07/03/2020] [Accepted: 07/06/2020] [Indexed: 11/16/2022]
Abstract
Single-step genomic evaluation has become a standard procedure in livestock breeding, mainly because it combines all available pedigree, phenotype, and genotype information into one single evaluation without the need for post-analysis processing. Therefore, incorporating data on genotyped and non-genotyped animals in this method is straightforward. Since 2009, two main implementations of single-step have been proposed. One is single-step genomic best linear unbiased prediction (ssGBLUP), which uses single nucleotide polymorphisms (SNPs) to construct the genomic relationship matrix. The other is single-step Bayesian regression (ssBR), which is a marker effect model. Under the same assumptions, both models are equivalent. In this review, we focus solely on ssGBLUP. ssGBLUP was implemented in the BLUPF90 software suite in 2009. Since then, several changes have been made to make it flexible to any model, number of traits, number of phenotypes, and number of genotyped animals. Single-step GBLUP from the BLUPF90 software suite has been used for genomic evaluations worldwide. In this review, we present theoretical developments and numerical examples of ssGBLUP using SNP data ranging from regular chips to sequence data.
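The core of ssGBLUP is a combined relationship matrix H. Its inverse has the well-known form

H⁻¹ = A⁻¹ + [[0, 0], [0, G⁻¹ - A22⁻¹]]

In this formula, A is the pedigree relationship matrix, A22 is its block for genotyped animals, and G is the genomic relationship matrix. The dense NumPy sketch below builds H⁻¹ for a toy example. It makes no attempt to mirror the BLUPF90 implementation, which must scale to large data. Common refinements such as scaling or blending G with A22 are also omitted. The function name is hypothetical.

```python
import numpy as np


def single_step_h_inverse(A, G, genotyped):
    """Dense sketch of the ssGBLUP inverse relationship matrix H^-1.

    A: (n, n) pedigree relationship matrix for all animals.
    G: (k, k) genomic relationship matrix of the genotyped animals, ordered like
       `genotyped`, assumed already adjusted so that it is invertible.
    genotyped: indices (into A) of the k genotyped animals.
    """
    genotyped = np.asarray(genotyped)
    block = np.ix_(genotyped, genotyped)
    A22 = A[block]                                   # pedigree block of genotyped animals
    H_inv = np.linalg.inv(A)                         # start from A^-1
    H_inv[block] += np.linalg.inv(G) - np.linalg.inv(A22)  # add G^-1 - A22^-1
    return H_inv
```

A quick sanity check of the construction is that inverting H⁻¹ gives back exactly G in the genotyped block. Also, if G equals A22, H⁻¹ collapses to the ordinary pedigree A⁻¹.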
Affiliation(s)
- Daniela Lourenco
- Department of Animal and Dairy Science, University of Georgia, Athens, GA 30602, USA
- Andres Legarra
- Institut National de la Recherche Agronomique, UMR1388 GenPhySE, 31326 Castanet Tolosan, France
- Shogo Tsuruta
- Department of Animal and Dairy Science, University of Georgia, Athens, GA 30602, USA
- Yutaka Masuda
- Department of Animal and Dairy Science, University of Georgia, Athens, GA 30602, USA
- Ignacio Aguilar
- Instituto Nacional de Investigación Agropecuaria (INIA), 11500 Montevideo, Uruguay
- Ignacy Misztal
- Department of Animal and Dairy Science, University of Georgia, Athens, GA 30602, USA
56
Konstantinov KV, Goddard ME. Application of multivariate single-step SNP best linear unbiased predictor model and revised SNP list for genomic evaluation of dairy cattle in Australia. J Dairy Sci 2020; 103:8305-8316. [PMID: 32622609 DOI: 10.3168/jds.2020-18242] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Received: 01/21/2020] [Accepted: 04/21/2020] [Indexed: 11/19/2022]
Abstract
The objectives of this study were (1) to evaluate the computational feasibility of the multitrait test-day single-step SNP-BLUP (ssSNP-BLUP) model using phenotypic records of genotyped and nongenotyped animals, and (2) to compare accuracies (coefficient of determination; R2) and bias of genomic estimated breeding values (GEBV) and de-regressed proofs as response variables in 3 Australian dairy cattle breeds (i.e., Holstein, Jersey, and Red breeds). Additive genomic random regression coefficients for milk, fat, and protein yield and somatic cell score were predicted in the first, second, and third lactations. The predicted coefficients were used to derive 305-d GEBV, which were compared with the traditional parent averages obtained from a BLUP model without genomic information. Cow fertility traits (i.e., calving interval, days from calving to first service, pregnancy diagnosis, first service nonreturn rate, and lactation length) were evaluated with a 5-trait repeatability model. De-regressed proofs were used only for calving interval. Our results showed that ssSNP-BLUP using a multitrait test-day model increased reliability and reduced bias of breeding values of young animals compared with parent averages from traditional BLUP in Australian Holstein, Jersey, and Red breeds. A custom selection of approximately 46,000 SNP (custom XT SNP list) increased the reliability of GEBV compared with the commercial Illumina 50K chip (Illumina, San Diego, CA). The second preconditioner substantially improved the convergence rate of the preconditioned conjugate gradient method. However, further work is needed to compute the Kronecker matrix product by a vector more efficiently. Application of ssSNP-BLUP to multitrait random regression models is computationally feasible.
Affiliation(s)
- K V Konstantinov
- DataGene Limited, Agriculture Victoria, AgriBio Centre for AgriBusiness, 5 Ring Rd., Bundoora, Victoria 3083, Australia.
- M E Goddard
- Melbourne School of Land and Environment, University of Melbourne, Parkville, Victoria 3010, Australia
57
Liu A, Lund MS, Boichard D, Mao X, Karaman E, Fritz S, Aamand GP, Wang Y, Su G. Imputation for sequencing variants preselected to a customized low-density chip. Sci Rep 2020; 10:9524. [PMID: 32533087 PMCID: PMC7293337 DOI: 10.1038/s41598-020-66523-7] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Received: 11/11/2019] [Accepted: 05/19/2020] [Indexed: 12/27/2022]
Abstract
Sequencing variants preselected by association and bioinformatics analyses can improve genomic prediction. In this study, the imputation of sequencing SNPs preselected from major dairy breeds in Denmark-Finland-Sweden (DFS) and France (FRA) was investigated for both contemporary animals and old bulls in Danish Jersey. For contemporary animals, a two-step imputation, which first imputed to 54 K and then to 54 K + DFS + FRA SNPs, achieved the highest accuracy. Correlations between observed and imputed genotypes were 91.6% for DFS SNPs and 87.6% for FRA SNPs, while concordance rates were 96.6% for DFS SNPs and 93.5% for FRA SNPs. SNPs with lower minor allele frequency (MAF) tended to have lower correlations but higher concordance rates. For old bulls, imputation of DFS and FRA SNPs was relatively accurate even for bulls without progeny (correlations higher than 97.2% and concordance rates higher than 98.4%). For contemporary animals, imputation accuracy of preselected sequencing SNPs was limited, especially for SNPs with low MAF. Directly genotyping these SNPs with a customized SNP chip would therefore be a good strategy. For old bulls, given the high imputation accuracy for preselected sequencing SNPs across all MAF ranges, re-genotyping them would be unnecessary.
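The abstract notes that low-MAF SNPs tend to show lower correlations but higher concordance rates. The sketch below computes both metrics per SNP and illustrates why. It assumes concordance is the share of rounded imputed calls that match the observed genotype, and that correlation is the Pearson correlation between observed and imputed genotypes. The study's exact definitions are not given in the abstract, and the function name is hypothetical.

```python
import numpy as np


def imputation_accuracy(observed, imputed):
    """Per-SNP concordance rate and correlation between observed and imputed genotypes.

    observed, imputed: (n_animals, n_snps) genotypes coded 0/1/2; imputed values may
    be dosages, which are rounded to the nearest genotype for concordance only.
    A SNP with no variation in either matrix yields nan for its correlation.
    """
    observed = np.asarray(observed, dtype=float)
    imputed = np.asarray(imputed, dtype=float)
    concordance = (np.rint(imputed) == observed).mean(axis=0)  # share of identical calls
    obs_c = observed - observed.mean(axis=0)
    imp_c = imputed - imputed.mean(axis=0)
    correlation = (obs_c * imp_c).sum(axis=0) / np.sqrt(
        (obs_c ** 2).sum(axis=0) * (imp_c ** 2).sum(axis=0))
    return concordance, correlation


# Rare SNP: 5 of 100 animals are heterozygous, but only 2 are imputed correctly.
obs = np.zeros((100, 1))
imp = np.zeros((100, 1))
obs[:5, 0] = 1
imp[:2, 0] = 1
conc, corr = imputation_accuracy(obs, imp)
```

In this toy case, concordance is 0.97 because the many homozygous calls are trivially right, while the correlation is only about 0.62 because most minor-allele carriers were missed. This mirrors the MAF pattern reported in the abstract.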
Affiliation(s)
- Aoxing Liu
- Center for Quantitative Genetics and Genomics, Aarhus University, 8830, Tjele, Denmark; Key Laboratory of Animal Genetics, Breeding and Reproduction, MARA; National Engineering Laboratory for Animal Breeding, College of Animal Science and Technology, China Agricultural University, 100193, Beijing, P.R. China
- Mogens Sandø Lund
- Center for Quantitative Genetics and Genomics, Aarhus University, 8830, Tjele, Denmark
- Didier Boichard
- GABI, INRA, AgroParisTech, Université Paris Saclay, 78350, Jouy-en-Josas, France
- Xiaowei Mao
- Key Laboratory of Vertebrate Evolution and Human Origins, Institute of Vertebrate Paleontology and Paleoanthropology, Chinese Academy of Sciences, 100044, Beijing, P.R. China; CAS Center for Excellence in Life and Paleoenvironment, 100044, Beijing, P.R. China
- Emre Karaman
- Center for Quantitative Genetics and Genomics, Aarhus University, 8830, Tjele, Denmark
- Sebastien Fritz
- GABI, INRA, AgroParisTech, Université Paris Saclay, 78350, Jouy-en-Josas, France; ALLICE, 75012, Paris, France
- Yachun Wang
- Key Laboratory of Animal Genetics, Breeding and Reproduction, MARA; National Engineering Laboratory for Animal Breeding, College of Animal Science and Technology, China Agricultural University, 100193, Beijing, P.R. China
- Guosheng Su
- Center for Quantitative Genetics and Genomics, Aarhus University, 8830, Tjele, Denmark
58
Warburton CL, Engle BN, Ross EM, Costilla R, Moore SS, Corbet NJ, Allen JM, Laing AR, Fordyce G, Lyons RE, McGowan MR, Burns BM, Hayes BJ. Use of whole-genome sequence data and novel genomic selection strategies to improve selection for age at puberty in tropically-adapted beef heifers. Genet Sel Evol 2020; 52:28. [PMID: 32460805 PMCID: PMC7251835 DOI: 10.1186/s12711-020-00547-5] [Citation(s) in RCA: 16] [Impact Index Per Article: 4.0] [Received: 12/09/2019] [Accepted: 05/15/2020] [Indexed: 12/14/2022]
Abstract
Background In tropically-adapted beef heifers, application of genomic prediction for age at puberty has been limited by low prediction accuracies. Our aim was to investigate whether prediction accuracy for age at puberty can be improved by novel methods of pre-selecting whole-genome sequence (WGS) variants and by alternative analysis methodologies. These included genomic best linear unbiased prediction (GBLUP) with multiple genomic relationship matrices (MGRM) and Bayesian (BayesR) analyses. Methods Genotypes and phenotypes were obtained from two research herds. In total, 868 Brahman and 960 Tropical Composite heifers were recorded in the first population and 3695 Brahman, Santa Gertrudis and Droughtmaster heifers were recorded in the second population. Genotypes were imputed to 23 million whole-genome sequence variants. Eight strategies were used to pre-select variants from genome-wide association study (GWAS) results using conditional and joint (COJO) analyses. Pre-selected variants were included in three models: GBLUP with a single genomic relationship matrix (SGRM), GBLUP MGRM and BayesR. Five-fold cross-validation was used to test the effect of marker panel density (6 K, 50 K and 800 K), analysis model, and inclusion of pre-selected WGS variants on prediction accuracy. Results In all tested scenarios, prediction accuracies for age at puberty were highest in BayesR analyses. The addition of pre-selected WGS variants had little effect on the accuracy of prediction when BayesR was used. Among the GBLUP analyses, the highest prediction accuracies were obtained by fitting, in an MGRM model, WGS variants pre-selected using a meta-analysis with COJO analyses by chromosome, regardless of marker density. When the low-density (6 K) panel was supplemented with only six additional sequence variants (identified using meta-analysis COJO by chromosome), GBLUP reached the same prediction accuracy (0.42) as with the high-density panel.
Conclusions While BayesR consistently outperforms other methods in terms of prediction accuracies, reasonable improvements in accuracy can be achieved when using GBLUP and low-density panels with the inclusion of a relatively small number of highly relevant WGS variants.
Affiliation(s)
- Christie L Warburton
- Centre for Animal Science, Queensland Alliance for Agriculture and Food Innovation, University of Queensland, St. Lucia, QLD, Australia
- Bailey N Engle
- Centre for Animal Science, Queensland Alliance for Agriculture and Food Innovation, University of Queensland, St. Lucia, QLD, Australia
- Elizabeth M Ross
- Centre for Animal Science, Queensland Alliance for Agriculture and Food Innovation, University of Queensland, St. Lucia, QLD, Australia
- Roy Costilla
- Centre for Animal Science, Queensland Alliance for Agriculture and Food Innovation, University of Queensland, St. Lucia, QLD, Australia
- Stephen S Moore
- Centre for Animal Science, Queensland Alliance for Agriculture and Food Innovation, University of Queensland, St. Lucia, QLD, Australia
- Nicholas J Corbet
- School of Health, Medical and Applied Sciences, Central Queensland University, Rockhampton, QLD, Australia
- Jack M Allen
- Agricultural Business Research Institute, University of New England, Armidale, NSW, Australia
- Alan R Laing
- Formerly Department of Agriculture and Fisheries, Ayr, QLD, Australia
- Geoffry Fordyce
- Centre for Animal Science, Queensland Alliance for Agriculture and Food Innovation, University of Queensland, St. Lucia, QLD, Australia
- Russell E Lyons
- School of Veterinary Science, The University of Queensland, St Lucia, QLD, Australia; Neogen, University of Queensland, Gatton, QLD, Australia
- Michael R McGowan
- School of Veterinary Science, The University of Queensland, St Lucia, QLD, Australia
- Brian M Burns
- Formerly Department of Agriculture and Fisheries, Rockhampton, QLD, Australia
- Ben J Hayes
- Centre for Animal Science, Queensland Alliance for Agriculture and Food Innovation, University of Queensland, St. Lucia, QLD, Australia
59
Raymond B, Wientjes YCJ, Bouwman AC, Schrooten C, Veerkamp RF. A deterministic equation to predict the accuracy of multi-population genomic prediction with multiple genomic relationship matrices. Genet Sel Evol 2020; 52:21. [PMID: 32345213 PMCID: PMC7189707 DOI: 10.1186/s12711-020-00540-y] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Received: 09/11/2019] [Accepted: 04/14/2020] [Indexed: 11/10/2022]
Abstract
BACKGROUND A multi-population genomic prediction (GP) model in which important pre-selected single nucleotide polymorphisms (SNPs) are differentially weighted (MPMG) has been shown to result in better prediction accuracy than a multi-population, single genomic relationship matrix (GRM) GP model (MPSG) in which all SNPs are weighted equally. Our objective was to underpin theoretically the advantages and limits of the MPMG model over the MPSG model by deriving and validating a deterministic prediction equation for its accuracy. METHODS Using selection index theory, we derived an equation to predict the accuracy of estimated total genomic values of selection candidates from one population. In this setting, individuals from two populations are combined in the training population, and two GRMs, made respectively from pre-selected and remaining SNPs, are fitted simultaneously in MPMG. We used simulations to validate the prediction equation in scenarios that differed in the level of genetic correlation between populations, heritability, and proportion of genetic variance explained by the pre-selected SNPs. Empirical accuracy of the MPMG model in each scenario was calculated and compared to the predicted accuracy from the equation. RESULTS In general, the derived prediction equation accurately predicted this accuracy for the scenarios evaluated. Using the prediction equation, we showed that an important advantage of the MPMG model over the MPSG model is its ability to benefit from the small number of independent chromosome segments (Me) due to the pre-selected SNPs, both within and across populations. By contrast, the MPSG model has only a single, very large value of Me, calculated from all SNPs.
However, this advantage depends on the pre-selected SNPs explaining some proportion of the total genetic variance for the trait. CONCLUSIONS We developed an equation that gives insight into why, and under which conditions, the MPMG model outperforms the MPSG model for GP. The equation can be used as a deterministic tool to assess the potential benefit of combining information from different populations for GP. Examples include different breeds or lines in livestock or plants, or different groups of people based on their ethnic background for prediction of disease risk scores.
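The MPMG equation itself is not reproduced in the abstract. To show why a small number of independent chromosome segments (Me) helps, the sketch below uses a different, simpler and well-known formula: the single-population, single-GRM deterministic accuracy of Daetwyler et al. (2008),

r = √(N h² / (N h² + Me))

In this formula, N is the training size and h² is the heritability. This is a swapped-in illustration, not the equation derived in the paper. The function name is hypothetical.

```python
import math


def expected_accuracy(n_train, h2, me):
    """Daetwyler et al. (2008) deterministic accuracy, sqrt(N h2 / (N h2 + Me)).

    n_train: number of training individuals; h2: heritability;
    me: effective number of independent chromosome segments.
    """
    return math.sqrt(n_train * h2 / (n_train * h2 + me))


# Same training size and heritability, small vs. large Me.
small_me = expected_accuracy(5000, 0.3, 1000)    # about 0.77
large_me = expected_accuracy(5000, 0.3, 50000)   # about 0.17
```

Holding N and h² fixed, shrinking Me raises the expected accuracy sharply. This is the intuition behind fitting a separate GRM for a small set of pre-selected SNPs. As the abstract notes, that gain only materializes if those SNPs explain some of the genetic variance.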
Affiliation(s)
- Biaty Raymond
- Animal Breeding and Genomics, Wageningen University and Research, P.O. Box 338, 6700 AH, Wageningen, The Netherlands; Biometris, Wageningen University and Research, 6700AA, Wageningen, The Netherlands
- Yvonne C J Wientjes
- Animal Breeding and Genomics, Wageningen University and Research, P.O. Box 338, 6700 AH, Wageningen, The Netherlands
- Aniek C Bouwman
- Animal Breeding and Genomics, Wageningen University and Research, P.O. Box 338, 6700 AH, Wageningen, The Netherlands
- Roel F Veerkamp
- Animal Breeding and Genomics, Wageningen University and Research, P.O. Box 338, 6700 AH, Wageningen, The Netherlands
60
Misztal I, Lourenco D, Legarra A. Current status of genomic evaluation. J Anim Sci 2020; 98:skaa101. [PMID: 32267923 PMCID: PMC7183352 DOI: 10.1093/jas/skaa101] [Citation(s) in RCA: 64] [Impact Index Per Article: 16.0] [Received: 01/29/2020] [Accepted: 04/07/2020] [Indexed: 12/14/2022]
Abstract
Early applications of genomic selection relied on estimating SNP effects from phenotypes or de-regressed proofs (DRP). Chips with 50k SNP seemed sufficient for accurate estimation of SNP effects. Genomic estimated breeding values (GEBV) were computed as an index of parent average and direct genomic value, with a parental index deducted to eliminate double counting. Use of SNP selection or weighting increased accuracy with small data sets but had minimal to no impact with large data sets. Efforts to include potentially causative SNP derived from sequence data or high-density chips showed limited or no gain in accuracy. After genomic selection was implemented, several problems arose. EBV from BLUP became biased because of genomic preselection, DRP computed from EBV required adjustments, and creating DRP for females is difficult and subject to double counting. Genomic selection was greatly simplified by single-step genomic BLUP (ssGBLUP). This method is based on combining genomic and pedigree relationships. It automatically creates an index with all sources of information, can use any combination of male and female genotypes, and accounts for preselection. To avoid biases, especially under strong selection, ssGBLUP requires that pedigree and genomic relationships are compatible. Inverting the genomic relationship matrix (G) becomes costly with more than 100k genotyped animals. Large-scale computations in ssGBLUP were therefore made feasible by exploiting the limited dimensionality of genomic data, which results from limited effective population size. With this dimensionality ranging from 4k in chickens to about 15k in cattle, the inverse of G can be created directly (e.g., by the algorithm for proven and young) at a linear cost. Due to its simplicity and accuracy, ssGBLUP is routinely used for genomic selection by the major chicken, pig, and beef industries. Single step can be used to derive SNP effects for indirect prediction and for genome-wide association studies, including computation of P-values.
Alternative single-step formulations exist that use SNP effects for genotyped or for all animals. Although genomics is the new standard in breeding and genetics, there are still some problems that need to be solved. This involves new validation procedures that are unaffected by selection, parameter estimation that accounts for all the genomic data used in selection, and strategies to address reduction in genetic variances after genomic selection was implemented.
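The genomic relationship matrix G discussed above is commonly built with VanRaden's first method. That method centres the 0/1/2 allele-count matrix by twice the allele frequency and scales by 2·Σp(1-p). The sketch below is a minimal pure-Python illustration of that construction, assuming this standard formulation. The function name `vanraden_g` is hypothetical. The sketch does not cover the algorithm for proven and young or the blending with pedigree relationships used in ssGBLUP. Production evaluations use optimized linear-algebra libraries rather than nested loops.

```python
from typing import List

Matrix = List[List[float]]


def vanraden_g(genotypes: List[List[int]]) -> Matrix:
    """Genomic relationship matrix G = ZZ' / (2 * sum_j p_j (1 - p_j)).

    genotypes: one row per animal, one column per SNP, coded 0/1/2
    (count of the counted allele).
    """
    n_animals = len(genotypes)
    n_snps = len(genotypes[0])

    # Allele frequency of each SNP, estimated from the genotyped animals.
    freqs = [sum(row[j] for row in genotypes) / (2 * n_animals) for j in range(n_snps)]

    # Centre each genotype by its expectation 2p.
    z = [[row[j] - 2 * freqs[j] for j in range(n_snps)] for row in genotypes]

    # This scaling makes G comparable to the pedigree relationship matrix.
    scale = 2 * sum(p * (1 - p) for p in freqs)
    if scale == 0:
        raise ValueError("all SNPs are monomorphic")

    return [
        [sum(a * b for a, b in zip(z[i], z[k])) / scale for k in range(n_animals)]
        for i in range(n_animals)
    ]


if __name__ == "__main__":
    g = vanraden_g([[0, 1, 2], [2, 1, 0]])
    for row in g:
        print([round(x, 3) for x in row])
```

With two animals of opposite genotypes at every SNP, G has equal diagonal elements and a negative off-diagonal element of the same size. This is the expected pattern for animals that share fewer alleles than average.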
Affiliation(s)
- Ignacy Misztal
- Department of Animal and Dairy Science, University of Georgia, Athens, GA
- Daniela Lourenco
- Department of Animal and Dairy Science, University of Georgia, Athens, GA
- Andres Legarra
- Department of Animal Genetics, Institut National de la Recherche Agronomique, Castanet-Tolosan, France
61
Sehgal D, Rosyara U, Mondal S, Singh R, Poland J, Dreisigacker S. Incorporating Genome-Wide Association Mapping Results Into Genomic Prediction Models for Grain Yield and Yield Stability in CIMMYT Spring Bread Wheat. Front Plant Sci 2020; 11:197. [PMID: 32194596 PMCID: PMC7064468 DOI: 10.3389/fpls.2020.00197] [Citation(s) in RCA: 34] [Impact Index Per Article: 8.5] [Received: 10/05/2019] [Accepted: 02/11/2020] [Indexed: 05/21/2023]
Abstract
Untangling the genetic architecture of grain yield (GY) and yield stability is important for optimizing genomics-assisted selection strategies in wheat. We investigated this in depth using a large set of 4,302 advanced bread wheat lines that were genotyped with genotyping-by-sequencing markers and phenotyped under contrasting (irrigated and stress) environments. A haplotype-based genome-wide association study (GWAS) identified 58 associations with GY and 15 with the superiority index Pi (a measure of stability). Sixteen associations with GY were "environment-specific", two of which, on chromosomes 3B and 6B, had large effects, and 8 associations were consistent across environments and trials. For Pi, 8 associations were on chromosomes 4B and 7B, indicating 'hot spot' regions for stability. Epistatic interactions contributed an additional 5-9% of variation on average. We further explored whether integrating consistent and robust associations identified in GWAS as fixed effects in prediction models improves prediction accuracy. For GY, the model that included the haplotype-based GWAS loci as fixed effects increased prediction accuracy by up to 9-10%, whereas for Pi this approach provided no advantage. This is the first report of integrating the genetic architecture of GY and yield stability into prediction models in wheat.
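The superiority index Pi used above as the stability measure is usually attributed to Lin and Binns (1988). Assuming that definition, Pi for a line is the sum, over n environments, of the squared difference between the line's yield and the best yield in that environment, divided by 2n. Lower values therefore mark lines that stay close to the top performer everywhere. The sketch below is a minimal illustration. The function name is hypothetical and the yields in the usage example are toy values invented for illustration, not data from the study.

```python
from typing import Dict, List


def superiority_index(yields: Dict[str, List[float]]) -> Dict[str, float]:
    """Lin-Binns superiority index P_i = sum_j (X_ij - M_j)^2 / (2n).

    yields maps each line to its yields in the same n environments.
    """
    lines = list(yields)
    n_env = len(yields[lines[0]])
    # M_j: the maximum yield observed among all lines in environment j.
    best = [max(yields[line][j] for line in lines) for j in range(n_env)]
    return {
        line: sum((yields[line][j] - best[j]) ** 2 for j in range(n_env)) / (2 * n_env)
        for line in lines
    }


if __name__ == "__main__":
    # Toy yields in two environments, invented for illustration only.
    toy = {"A": [6.0, 3.0], "B": [5.0, 4.0], "C": [4.0, 2.0]}
    print(superiority_index(toy))
```

Lines A and B each fall short of the best line in only one environment, so they receive the same small Pi. Line C trails in both environments and gets a much larger value.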
Affiliation(s)
- Deepmala Sehgal
- Global Wheat Program, International Maize and Wheat Improvement Center, Texcoco, Mexico
- Umesh Rosyara
- Global Wheat Program, International Maize and Wheat Improvement Center, Texcoco, Mexico
- Suchismita Mondal
- Global Wheat Program, International Maize and Wheat Improvement Center, Texcoco, Mexico
- Ravi Singh
- Global Wheat Program, International Maize and Wheat Improvement Center, Texcoco, Mexico
- Jesse Poland
- Department of Plant Pathology, Kansas State University, Manhattan, KS, United States
- Susanne Dreisigacker
- Global Wheat Program, International Maize and Wheat Improvement Center, Texcoco, Mexico
62
Liu T, Luo C, Ma J, Wang Y, Shu D, Su G, Qu H. High-Throughput Sequencing With the Preselection of Markers Is a Good Alternative to SNP Chips for Genomic Prediction in Broilers. Front Genet 2020; 11:108. [PMID: 32174971 PMCID: PMC7056902 DOI: 10.3389/fgene.2020.00108] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Received: 09/12/2019] [Accepted: 01/30/2020] [Indexed: 11/13/2022]
Abstract
The choice of genotyping platform is important for genomic prediction in livestock and poultry. High-throughput sequencing can produce more genetic markers, but the genotype quality is lower than that obtained with single nucleotide polymorphism (SNP) chips. The aim of this study was to compare the accuracy of genomic prediction between high-throughput sequencing and SNP chips in broilers. We developed a new SNP marker screening method, the pre-marker-selection (PMS) method, to determine whether an SNP marker can be used for genomic prediction. We also compared it with a method that preselects markers based on results from genome-wide association studies (GWAS). With the two methods, we analysed body weight at the 12th week (BW) and feed conversion ratio (FCR) in a local broiler population. A total of 395 birds were selected from the F2 generation of the population, and 10X specific-locus amplified fragment sequencing (SLAF-seq) and the Illumina Chicken 60K SNP BeadChip were used for genotyping. The genomic best linear unbiased prediction (GBLUP) method was used to predict genomic breeding values, and the accuracy of genomic prediction was validated by leave-one-out cross-validation. Without SNP marker screening, the accuracies of the genomic estimated breeding values (GEBV) for BW and FCR were 0.509 and 0.249, respectively, with SLAF-seq, and 0.516 and 0.232, respectively, with the SNP chip. With SNP marker screening by the PMS method, the accuracies of GEBV for the two traits were 0.671 and 0.499, respectively, with SLAF-seq, and 0.605 and 0.422, respectively, with the SNP chip. Our SNP marker screening method therefore increased prediction accuracy by 0.089-0.250. With SNP marker screening by the GWAS method, the accuracies of genomic prediction for the two traits also improved, but the gains were smaller than those obtained with the PMS method for all traits.
The results from this study indicate that our PMS method can improve the accuracy of GEBV, and that more accurate genomic prediction can be obtained from an increased number of genomic markers when using high-throughput sequencing in local broiler populations. Due to its lower genotyping cost, high-throughput sequencing could be a good alternative to SNP chips for genomic prediction in breeding programmes of local broiler populations.
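The 0.089-0.250 range quoted above follows directly from the reported accuracies. The short check below recomputes the gain of PMS screening over no screening for each trait and platform. It uses only the numbers given in the abstract.

```python
# GEBV accuracies reported in the abstract: (without screening, with PMS screening).
reported = {
    ("BW", "SLAF-seq"): (0.509, 0.671),
    ("FCR", "SLAF-seq"): (0.249, 0.499),
    ("BW", "SNP chip"): (0.516, 0.605),
    ("FCR", "SNP chip"): (0.232, 0.422),
}

# Absolute gain in accuracy, rounded to the three decimals used in the abstract.
gains = {key: round(after - before, 3) for key, (before, after) in reported.items()}

for (trait, platform), gain in sorted(gains.items()):
    print(f"{trait:>3} {platform:<9} +{gain:.3f}")

print("range:", min(gains.values()), "-", max(gains.values()))
```

The smallest gain (BW on the SNP chip) and the largest (FCR with SLAF-seq) give the quoted range. The gains from PMS are larger with SLAF-seq than with the chip for both traits.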
Affiliation(s)
- Tianfei Liu
- State Key Laboratory of Livestock and Poultry Breeding, Institute of Animal Science, Guangdong Academy of Agricultural Sciences, Guangzhou, China
- Chenglong Luo
- State Key Laboratory of Livestock and Poultry Breeding, Institute of Animal Science, Guangdong Academy of Agricultural Sciences, Guangzhou, China
- Jie Ma
- Guangdong Provincial Key Laboratory of Animal Breeding and Nutrition, Institute of Animal Science, Guangdong Academy of Agricultural Sciences, Guangzhou, China
- Yan Wang
- State Key Laboratory of Livestock and Poultry Breeding, Institute of Animal Science, Guangdong Academy of Agricultural Sciences, Guangzhou, China
- Dingming Shu
- State Key Laboratory of Livestock and Poultry Breeding, Institute of Animal Science, Guangdong Academy of Agricultural Sciences, Guangzhou, China
- Guosheng Su
- Center for Quantitative Genetics and Genomics, Department of Molecular Biology and Genetics, Aarhus University, Tjele, Denmark
- Hao Qu
- State Key Laboratory of Livestock and Poultry Breeding, Institute of Animal Science, Guangdong Academy of Agricultural Sciences, Guangzhou, China
63
Groß C, Derks M, Megens HJ, Bosse M, Groenen MAM, Reinders M, de Ridder D. pCADD: SNV prioritisation in Sus scrofa. Genet Sel Evol 2020; 52:4. [PMID: 32033531 PMCID: PMC7006094 DOI: 10.1186/s12711-020-0528-9] [Citation(s) in RCA: 14] [Impact Index Per Article: 3.5] [Received: 07/29/2019] [Accepted: 01/28/2020] [Indexed: 01/22/2023]
Abstract
BACKGROUND In animal breeding, identification of causative genetic variants is of major importance and high economic value. Usually, the number of candidate variants exceeds the number of variants that can be validated. One way of prioritizing probable candidates is to evaluate their potential to have a deleterious effect, e.g. by predicting their consequence. Because variants that do not cause an amino-acid substitution are experimentally difficult to evaluate, other prioritization methods are needed. For human genomes, the prediction of deleterious genomic variants has taken a step forward with the introduction of the combined annotation dependent depletion (CADD) method. In theory, this approach can be applied to any species. Here, we present pCADD (p for pig), a model to score single nucleotide variants (SNVs) in pig genomes. RESULTS To evaluate whether pCADD captures sites with biological meaning, we used transcripts from miRNAs and introns, sequences from genes that are specific to a particular tissue, and the different sites of codons to test how well pCADD scores differentiate between functional and non-functional elements. Furthermore, we assessed examples of non-coding and coding SNVs that are causal for changes in phenotypes. Our results show that pCADD scores discriminate between functional and non-functional sequences and prioritize functional SNVs, and that pCADD scores the different positions in a codon according to their redundancy. Taken together, these results indicate that, based on pCADD scores, regions with biological relevance can be identified and distinguished according to their rate of adaptation. CONCLUSIONS We demonstrate the ability of pCADD to prioritize SNVs in the pig genome with respect to their putative deleteriousness, in accordance with the biological significance of the region in which they are located.
We created scores for all possible SNVs, coding and non-coding, for all autosomes and the X chromosome of the pig reference sequence Sscrofa11.1, proposing a toolbox to prioritize variants and evaluate sequences to highlight new sites of interest to explain biological functions that are relevant to animal breeding.
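Human CADD reports PHRED-like scaled scores, -10·log10(rank/N). Here rank 1 is the most deleterious of the N scored variants, so a score of 10 marks the top 10% and a score of 20 marks the top 1%. The sketch below converts raw model scores into that scale. It assumes that pCADD scores are scaled in the same way, which the abstract does not state explicitly. The function name and the toy raw scores are hypothetical. Ties are broken by input order for simplicity.

```python
import math
from typing import List


def phred_scale(raw_scores: List[float]) -> List[float]:
    """Convert raw scores to PHRED-like scores -10 * log10(rank / N).

    A higher raw score means "more likely deleterious"; rank 1 is the highest
    raw score. Ties keep their input order (Python's sort is stable).
    """
    n = len(raw_scores)
    order = sorted(range(n), key=lambda i: raw_scores[i], reverse=True)
    scaled = [0.0] * n
    for rank, idx in enumerate(order, start=1):
        scaled[idx] = -10 * math.log10(rank / n)
    return scaled


if __name__ == "__main__":
    raw = [float(x) for x in range(100)]  # 100 toy raw scores; 99.0 is the highest
    scaled = phred_scale(raw)
    print(round(scaled[99], 2), round(scaled[90], 2))
```

A rank-based transform like this makes scores comparable across model versions, because it depends only on the ordering of variants and not on the raw output scale of the classifier.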
Affiliation(s)
- Christian Groß
- Delft Bioinformatics Lab, University of Technology Delft, 2600GA, Delft, The Netherlands; Bioinformatics Group, Wageningen University & Research, 6708 PB, Wageningen, The Netherlands
- Martijn Derks
- Animal Breeding and Genomics, Wageningen University & Research, Wageningen, The Netherlands
- Hendrik-Jan Megens
- Animal Breeding and Genomics, Wageningen University & Research, Wageningen, The Netherlands
- Mirte Bosse
- Animal Breeding and Genomics, Wageningen University & Research, Wageningen, The Netherlands
- Martien A M Groenen
- Animal Breeding and Genomics, Wageningen University & Research, Wageningen, The Netherlands
- Marcel Reinders
- Delft Bioinformatics Lab, University of Technology Delft, 2600GA, Delft, The Netherlands
- Dick de Ridder
- Bioinformatics Group, Wageningen University & Research, 6708 PB, Wageningen, The Netherlands
64
Oliveira Júnior GA, Santos DJA, Cesar ASM, Boison SA, Ventura RV, Perez BC, Garcia JF, Ferraz JBS, Garrick DJ. Fine mapping of genomic regions associated with female fertility in Nellore beef cattle based on sequence variants from segregating sires. J Anim Sci Biotechnol 2019; 10:97. [PMID: 31890201 PMCID: PMC6913038 DOI: 10.1186/s40104-019-0403-0] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.0] [Received: 06/04/2019] [Accepted: 11/11/2019] [Indexed: 12/26/2022]
Abstract
Background Impaired fertility in cattle limits the efficiency of livestock production systems. Unraveling the genetic architecture of fertility traits would facilitate their improvement by selection. In this study, we characterized SNP chip haplotypes at QTL blocks and then used whole-genome sequencing to fine map genomic regions associated with reproduction in a population of Nellore (Bos indicus) heifers. Methods The dataset comprised 1337 heifers, the daughters of 78 sires, genotyped with a GeneSeek® Genomic Profiler panel (74,677 SNPs). After marker quality control, 64,800 SNPs were retained. Haplotypes carried by each sire at six previously identified QTL, on BTAs 5, 14 and 18 for heifer pregnancy and on BTAs 8, 11 and 22 for antral follicle count, were constructed using the findhap software. The significance of the contrasts between the effects of every two paternally inherited haplotype alleles was used to identify sires that were heterozygous at each QTL. Whole-genome sequencing data localized to the haplotypes from six sires and 20 other ancestors were used to identify sequence variants that were concordant with the haplotype contrasts. Enrichment analyses were applied to these variants using the KEGG and MeSH libraries. Results Six (BTA 5), six (BTA 14) and five (BTA 18) sires were heterozygous for heifer pregnancy QTL, whereas six (BTA 8), fourteen (BTA 11), and five (BTA 22) sires were heterozygous for antral follicle count QTL. Because many haplotype alleles were inadequately represented in the sequenced animals, fine mapping could only be performed reliably for the QTL on BTA 5 and 14, which had 641 and 3733 concordant candidate sequence variants, respectively. The KEGG pathways “Circadian rhythm” and “Neurotrophin signaling pathway” were significantly associated with the genes in the QTL on BTA 5, whereas 32 MeSH terms were associated with the QTL on BTA 14.
Among the concordant sequence variants, 0.2% and 0.3% were classified as missense variants for BTAs 5 and 14, respectively, highlighting the genes MTERF2, RTMB, ENSBTAG00000037306 (miRNA), ENSBTAG00000040351, PRKDC, and RGS20. The potential causal mutations found in the present study were associated with biological processes such as oocyte maturation, embryo development, placenta development and response to reproductive hormones. Conclusions Identifying heterozygous sires by positionally phasing SNP chip data and contrasting haplotype effects for previously detected QTL can be used in fine mapping to identify potential causal mutations and candidate genes. Genomic variants were detected in the genes MTERF2, RTBC, miRNA ENSBTAG00000037306, ENSBTAG00000040351, PRKDC, and RGS20, which are known to influence reproductive biological processes.
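The concordance step described above can be illustrated with a deliberately simplified rule. The rule is an assumption of this sketch, not the authors' exact criterion. A sequence variant is kept as a candidate only if two conditions hold for every sire. Each sire inferred from the haplotype contrast to be heterozygous at the QTL must be heterozygous for the variant. Each sire inferred to be homozygous must be homozygous for it. The function name, sire IDs and genotypes are hypothetical.

```python
from typing import Dict, List


def concordant_variants(
    qtl_het: Dict[str, bool],
    variant_genotypes: Dict[str, Dict[str, int]],
) -> List[str]:
    """Return variants whose sire genotypes match the inferred QTL status.

    qtl_het: sire -> True if the haplotype contrast marks it heterozygous.
    variant_genotypes: variant -> {sire: genotype coded 0/1/2}.
    A sire lacking a sequence genotype makes the variant non-concordant here.
    """
    kept = []
    for variant, calls in variant_genotypes.items():
        matches = all(
            sire in calls and ((calls[sire] == 1) == is_het)
            for sire, is_het in qtl_het.items()
        )
        if matches:
            kept.append(variant)
    return kept


if __name__ == "__main__":
    status = {"sire1": True, "sire2": True, "sire3": False}  # hypothetical sires
    calls = {
        "v1": {"sire1": 1, "sire2": 1, "sire3": 0},
        "v2": {"sire1": 1, "sire2": 0, "sire3": 0},
        "v3": {"sire1": 1, "sire2": 1, "sire3": 2},
    }
    print(concordant_variants(status, calls))
```

A stricter version would also require the alternative allele to sit on the same haplotype (Q or q) in every heterozygous sire. That check needs phased sequence data and is omitted here.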
Affiliation(s)
- Gerson A Oliveira Júnior
- Department of Veterinary Medicine, University of São Paulo (USP), Faculty of Animal Science and Food Engineering, Pirassununga, SP Brazil; Department of Animal Bioscience, Center for Genetic Improvement of Livestock, University of Guelph, Guelph, ON Canada
- Daniel J A Santos
- Department of Animal and Avian Sciences, University of Maryland, College Park, Maryland, USA
- Aline S M Cesar
- Department of Animal Science, University of São Paulo (USP), Piracicaba, SP Brazil
- Solomon A Boison
- Department of Sustainable Agricultural Systems, University of Natural Resources and Life Sciences, Vienna, Austria
- Ricardo V Ventura
- Department of Animal Bioscience, Center for Genetic Improvement of Livestock, University of Guelph, Guelph, ON Canada; Department of Animal Nutrition and Production, School of Veterinary Medicine and Animal Science, University of São Paulo (USP), Pirassununga, Brazil
- Bruno C Perez
- Department of Veterinary Medicine, University of São Paulo (USP), Faculty of Animal Science and Food Engineering, Pirassununga, SP Brazil
- José F Garcia
- Department of Support, Production and Animal Health, School of Veterinary Medicine, São Paulo State University (Unesp), Araçatuba, SP Brazil
- José Bento S Ferraz
- Department of Veterinary Medicine, University of São Paulo (USP), Faculty of Animal Science and Food Engineering, Pirassununga, SP Brazil
- Dorian J Garrick
- School of Agriculture, Massey University, Ruakura Ag Centre, Hamilton, New Zealand
65
Genomic prediction based on selected variants from imputed whole-genome sequence data in Australian sheep populations. Genet Sel Evol 2019; 51:72. [PMID: 31805849 PMCID: PMC6896509 DOI: 10.1186/s12711-019-0514-2] [Citation(s) in RCA: 37] [Impact Index Per Article: 7.4] [Received: 06/03/2019] [Accepted: 11/25/2019] [Indexed: 12/13/2022]
Abstract
Background Whole-genome sequence (WGS) data could contain information on genetic variants that are, or are in high linkage disequilibrium with, causative mutations underlying the genetic variation of polygenic traits. Thus far, using such information has yielded only limited increases in genomic prediction accuracy in dairy cattle studies, in which one or a few breeds with limited diversity predominate. The objective of our study was to evaluate the accuracy of genomic prediction with imputed WGS genotypes in a multi-breed Australian sheep population in which target individuals were relatively distantly related to the training set. Methods Between 9626 and 26,657 animals with phenotypes were available for each of nine economically important sheep production traits, and all had imputed WGS genotypes. About 30% of the data were used to discover predictive single nucleotide polymorphisms (SNPs) in a genome-wide association study (GWAS), and the remaining data were used for training and validation of genomic prediction. Prediction accuracy using selected variants from imputed sequence data was compared with that using a standard 50k SNP array, for both genomic best linear unbiased prediction (GBLUP) and Bayesian methods (BayesR/BayesRC). Accuracy of genomic prediction was evaluated in two independent populations, each with low relatedness to the training set: one purebred Merino and the other crossbred Border Leicester x Merino sheep. Results A substantial improvement in prediction accuracy was observed when selected sequence variants were fitted alongside 50k genotypes as a separate variance component in GBLUP (2GBLUP) or as a separate category of SNPs in Bayesian analysis (BayesRC). From an average accuracy of 0.27 in both validation sets with the 50k array, the average absolute increase in accuracy across traits with 2GBLUP was 0.083 and 0.073 for purebred and crossbred animals, respectively, whereas with BayesRC it was 0.102 and 0.087.
The average gain in accuracy was smaller when selected sequence variants were treated in the same category as the 50k SNPs. Very little improvement over 50k prediction was observed when all WGS variants were used. Conclusions Accuracy of genomic prediction in diverse sheep populations increased substantially when variants selected from whole-genome sequence data by an independent multi-breed GWAS were used, compared with genomic prediction using standard 50k genotypes.
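Fitting the selected sequence variants "as a separate variance component" (2GBLUP) amounts to modelling the total genetic covariance among animals as var1·G1 + var2·G2. In this model G1 is built from the 50k SNPs and G2 from the selected variants, and each term has its own variance. The minimal sketch below forms that covariance. It assumes the two G matrices and the variance components are already available; the values in the usage example are hypothetical. Estimating the variances (e.g. by REML) and solving the mixed-model equations are outside its scope.

```python
from typing import List

Matrix = List[List[float]]


def two_component_covariance(g1: Matrix, g2: Matrix, var1: float, var2: float) -> Matrix:
    """Genetic covariance var1 * G1 + var2 * G2 implied by a two-component GBLUP."""
    if len(g1) != len(g2) or any(len(r1) != len(r2) for r1, r2 in zip(g1, g2)):
        raise ValueError("G1 and G2 must describe the same animals")
    return [
        [var1 * a + var2 * b for a, b in zip(row1, row2)]
        for row1, row2 in zip(g1, g2)
    ]


if __name__ == "__main__":
    g_50k = [[1.0, 0.25], [0.25, 1.0]]  # hypothetical G from 50k SNPs
    g_sel = [[1.0, 0.5], [0.5, 1.0]]    # hypothetical G from selected variants
    print(two_component_covariance(g_50k, g_sel, var1=0.3, var2=0.2))
```

Setting var2 to zero recovers ordinary single-component GBLUP. Treating the selected variants "in the same category" as the 50k SNPs corresponds instead to building one G from the combined marker set, which forces both marker groups to share a single variance.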
66
Hayes BJ, Daetwyler HD. 1000 Bull Genomes Project to Map Simple and Complex Genetic Traits in Cattle: Applications and Outcomes. Annu Rev Anim Biosci 2019; 7:89-102. [PMID: 30508490 DOI: 10.1146/annurev-animal-020518-115024] [Citation(s) in RCA: 184] [Impact Index Per Article: 36.8] [Indexed: 11/09/2022]
Abstract
The 1000 Bull Genomes Project is a collection of whole-genome sequences from 2,703 individuals capturing a significant proportion of the world's cattle diversity. So far, 84 million single-nucleotide polymorphisms (SNPs) and 2.5 million small insertions and deletions have been identified in the collection, a very high level of genetic diversity. The project has greatly accelerated the identification of deleterious mutations for a range of genetic diseases, as well as for embryonic lethals. Causal mutations for complex traits have been identified more slowly, reflecting the typically small effect sizes of these mutations and the fact that many are likely located in as-yet-unannotated regulatory regions. Both the deleterious mutations that have been identified and the mutations associated with complex trait variation have been included in low-cost SNP array designs, and these arrays are being genotyped in tens of thousands of dairy and beef cattle, enabling management of deleterious mutations in these populations as well as genomic selection.
Affiliation(s)
- Ben J Hayes
- Queensland Alliance for Agriculture and Food Innovation, The University of Queensland, St Lucia, Queensland 4067, Australia; Agriculture Victoria Research, AgriBio, Bundoora, Victoria 3083, Australia
- Hans D Daetwyler
- Agriculture Victoria Research, AgriBio, Bundoora, Victoria 3083, Australia; School of Applied Systems Biology, La Trobe University, Bundoora, Victoria 3083, Australia
67
Liu Q, Hobbs HA, Domier LL. Genome-wide association study of the seed transmission rate of soybean mosaic virus and associated traits using two diverse population panels. Theor Appl Genet 2019; 132:3413-3424. [PMID: 31630210 DOI: 10.1007/s00122-019-03434-w] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.4] [Received: 05/10/2019] [Accepted: 09/17/2019] [Indexed: 06/10/2023]
Abstract
KEY MESSAGE Genome-wide association analyses identified candidates for genes involved in restricting virus movement into embryonic tissues, suppressing virus-induced seed coat mottling and preserving yield in soybean plants infected with soybean mosaic virus. Soybean mosaic virus (SMV) causes significant reductions in soybean yield and seed quality. Because seedborne infections can serve as primary sources of inoculum for SMV infections, resistance to SMV seed transmission provides a means to limit the impacts of SMV. In this study, two diverse population panels, Pop1 and Pop2, composed of 409 and 199 soybean plant introductions, respectively, were evaluated for SMV seed transmission rate, seed coat mottling, and seed yield from SMV-infected plants. The phenotypic data and genotypic data from the SoySNP50K dataset were analyzed using GAPIT and rrBLUP. For SMV seed transmission rate, a single locus was identified on chromosome 9 in Pop1. For SMV-induced seed coat mottling, loci were identified on chromosome 9 in Pop1 and on chromosome 3 in Pop2. For seed yield from SMV-infected plants, a single locus was identified on chromosome 3 in Pop2 that was within the map interval of a previously described quantitative trait locus for seed number. The high linkage disequilibrium regions surrounding the markers on chromosomes 3 and 9 contained a predicted nonsense-mediated RNA decay gene, multiple pectin methylesterase inhibitor genes (involved in restricting virus movement), two chalcone synthase genes, and a homolog of the yeast Rtf1 gene (involved in RNA-mediated transcriptional gene silencing). The results of this study provided additional insight into the genetic architecture of these three important traits, suggested candidate genes for downstream functional validation, and suggested that genomic prediction would outperform marker-assisted selection for two of the four trait-marker associations.
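GAPIT and rrBLUP fit mixed models that correct for population structure and kinship. As a much simpler illustration of what a single marker-trait association test computes, the sketch below regresses a phenotype on one marker's 0/1/2 genotype by ordinary least squares. It returns the slope and its t-statistic. It deliberately omits the kinship and structure corrections those packages apply, so it is not a substitute for them. The function name is hypothetical and the data in the usage example are toy values.

```python
import math
from typing import List, Tuple


def single_marker_test(genotypes: List[int], phenotypes: List[float]) -> Tuple[float, float]:
    """Ordinary least squares of phenotype on genotype; returns (slope, t-statistic)."""
    n = len(genotypes)
    if n != len(phenotypes) or n < 3:
        raise ValueError("need at least 3 paired observations")
    mean_x = sum(genotypes) / n
    mean_y = sum(phenotypes) / n
    sxx = sum((x - mean_x) ** 2 for x in genotypes)
    if sxx == 0:
        raise ValueError("marker is monomorphic")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(genotypes, phenotypes))
    slope = sxy / sxx  # estimated allele-substitution effect
    intercept = mean_y - slope * mean_x
    # Residual variance with n - 2 degrees of freedom gives the slope's standard error.
    rss = sum((y - intercept - slope * x) ** 2 for x, y in zip(genotypes, phenotypes))
    se = math.sqrt(rss / (n - 2) / sxx)
    t_stat = math.copysign(math.inf, slope) if se == 0 else slope / se
    return slope, t_stat


if __name__ == "__main__":
    # Toy data: each extra copy of the allele adds about 1 unit to the trait.
    print(single_marker_test([0, 0, 1, 1, 2, 2], [1.0, 1.2, 2.0, 2.2, 3.0, 3.2]))
```

In a genome-wide scan, this test would be repeated for every marker and the resulting P-values corrected for multiple testing.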
Affiliation(s)
- Qiong Liu
- Department of Crop Sciences, University of Illinois, Urbana, IL, 61801, USA
- Houston A Hobbs
- Department of Crop Sciences, University of Illinois, Urbana, IL, 61801, USA
- Leslie L Domier
- Soybean/Maize Germplasm, Pathology, and Genetics Research Unit, United States Department of Agriculture-Agricultural Research Service, Urbana, IL, 61801, USA
68
Song H, Ye S, Jiang Y, Zhang Z, Zhang Q, Ding X. Using imputation-based whole-genome sequencing data to improve the accuracy of genomic prediction for combined populations in pigs. Genet Sel Evol 2019; 51:58. [PMID: 31638889 PMCID: PMC6805481 DOI: 10.1186/s12711-019-0500-8] [Citation(s) in RCA: 39] [Impact Index Per Article: 7.8] [Received: 04/03/2019] [Accepted: 10/07/2019] [Indexed: 11/17/2022]
Abstract
BACKGROUND For genomic selection in populations with a small reference population, combining populations of the same breed or populations of related breeds is an effective way to increase the size of the reference population. However, genomic predictions based on single nucleotide polymorphism (SNP)-chip genotype data using combined populations with different genetic backgrounds or from different breeds have not shown a clear advantage over using within-population or within-breed predictions. The increasing availability of whole-genome sequencing (WGS) data provides new opportunities for combined population genomic prediction. Our objective was to investigate the accuracy of genomic prediction using imputation-based WGS data from combined populations in pigs. Using 80K SNP panel genotypes, WGS genotypes, or genotypes on WGS variants that were pruned based on linkage disequilibrium (LD), three methods [genomic best linear unbiased prediction (GBLUP), single-step (ss)GBLUP, and genomic feature (GF)BLUP] were implemented with different prior information to identify the best method to improve the accuracy of genomic prediction for combined populations in pigs. RESULTS In total, 2089 and 2043 individuals with production and reproduction phenotypes, respectively, from three Yorkshire populations with different genetic backgrounds were genotyped with the PorcineSNP80 panel. Imputation accuracy from 80K to WGS variants reached 92%. The results showed that use of the WGS data compared to the 80K SNP panel did not increase the accuracy of genomic prediction in a single population, but using WGS data with LD pruning and GFBLUP with prior information did yield higher accuracy than the 80K SNP panel. For the 80K SNP panel genotypes, using the combined population resulted in a slight improvement, no change, or even a slight decrease in accuracy in comparison with the single population for GBLUP and ssGBLUP, while accuracy increased by 1 to 2.4% when using WGS data. 
Notably, the GFBLUP method performed poorly for both the combined population and the single populations. CONCLUSIONS The use of WGS data was beneficial for combined-population genomic prediction. Simply increasing the number of SNPs to the WGS level did not increase accuracy for a single population, whereas using LD-pruned WGS data and GFBLUP with prior information could yield higher accuracy than the 80K SNP panel.
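LD pruning of the kind mentioned above is typically done greedily within sliding windows, as in PLINK's `--indep-pairwise`: when two SNPs in a window have r² above a threshold, one of them is dropped. The sketch below is a simplified, hypothetical version of that procedure, not the pipeline used in the study. It scans SNPs in map order and computes genotype-based r² (the squared Pearson correlation of 0/1/2 codes) against SNPs already kept within a fixed window. A SNP is kept only if none of those r² values exceeds the threshold. The window size and threshold are illustrative defaults, not the study's settings.

```python
from typing import List, Optional


def r_squared(a: List[int], b: List[int]) -> Optional[float]:
    """Squared Pearson correlation of two genotype vectors (None if either is monomorphic)."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    if va == 0 or vb == 0:
        return None
    return cov * cov / (va * vb)


def ld_prune(snps: List[List[int]], window: int = 50, threshold: float = 0.9) -> List[int]:
    """Return indices of SNPs kept; snps are in map order, one genotype vector per SNP."""
    kept: List[int] = []
    for i, geno in enumerate(snps):
        redundant = False
        for k in kept:
            if i - k >= window:  # only compare with kept SNPs inside the window
                continue
            r2 = r_squared(geno, snps[k])
            if r2 is not None and r2 > threshold:
                redundant = True
                break
        if not redundant:
            kept.append(i)
    return kept


if __name__ == "__main__":
    # Toy genotypes for 4 animals: SNP 1 duplicates SNP 0 and SNP 2 mirrors it.
    snps = [[0, 1, 2, 1], [0, 1, 2, 1], [2, 1, 0, 1], [0, 0, 1, 2]]
    print(ld_prune(snps))
```

Monomorphic SNPs pass through unchanged in this sketch. In practice they would normally be removed by a minor-allele-frequency filter before pruning.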
Affiliation(s)
- Hailiang Song
- Key Laboratory of Animal Genetics and Breeding of Ministry of Agriculture, National Engineering Laboratory of Animal Breeding, College of Animal Science and Technology, China Agricultural University, Beijing, China
- Shaopan Ye
- Guangdong Provincial Key Lab of Agro-Animal Genomics and Molecular Breeding, National Engineering Research Centre for Breeding Swine Industry, College of Animal Science, South China Agricultural University, Guangzhou, Guangdong China
- Yifan Jiang
- Key Laboratory of Animal Genetics and Breeding of Ministry of Agriculture, National Engineering Laboratory of Animal Breeding, College of Animal Science and Technology, China Agricultural University, Beijing, China
- Zhe Zhang
- Guangdong Provincial Key Lab of Agro-Animal Genomics and Molecular Breeding, National Engineering Research Centre for Breeding Swine Industry, College of Animal Science, South China Agricultural University, Guangzhou, Guangdong China
- Qin Zhang
- Shandong Provincial Key Laboratory of Animal Biotechnology and Disease Control and Prevention, Shandong Agricultural University, Taian, China
- Xiangdong Ding
- Key Laboratory of Animal Genetics and Breeding of Ministry of Agriculture, National Engineering Laboratory of Animal Breeding, College of Animal Science and Technology, China Agricultural University, Beijing, China
69
Fragomeni BO, Lourenco DAL, Legarra A, VanRaden PM, Misztal I. Alternative SNP weighting for single-step genomic best linear unbiased predictor evaluation of stature in US Holsteins in the presence of selected sequence variants. J Dairy Sci 2019; 102:10012-10019. [PMID: 31495612 DOI: 10.3168/jds.2019-16262] [Citation(s) in RCA: 23] [Impact Index Per Article: 4.6] [Received: 01/07/2019] [Accepted: 07/16/2019] [Indexed: 11/19/2022]
Abstract
Causal variants inferred from sequence data analysis are expected to increase the accuracy of genomic selection. In this work, we evaluated the gain in reliability of genomic predictions for stature in US Holsteins when selected sequence variants were added to a pre-existing SNP chip. Two prediction methods were tested: genomic BLUP (GBLUP) on de-regressed proofs assuming heterogeneous residual variances, and single-step GBLUP (ssGBLUP) using actual phenotypes. Phenotypic data included 3,999,631 records for stature on 3,027,304 Holstein cows. Genotypes on 54,087 SNP markers (54k) were available for 26,877 bulls. Additionally, 16,648 selected sequence variants were combined with the 54k markers, for a total of 70,735 (70k) markers. In all methods, SNPs in the genomic relationship matrix (G) were either unweighted or weighted iteratively, with weights derived either from squared SNP effects or by a nonlinear method that resembles BayesA (nonlinear A). Reliabilities of genomic predictions were obtained by cross-validation. With unweighted G derived from 54k markers, the reliabilities (× 100) were 72.4 for GBLUP and 75.3 for ssGBLUP. With unweighted G derived from 70k markers, the reliabilities were 73.4 and 76.0, respectively. Weighting by nonlinear A changed reliabilities to 73.3 and 75.9, respectively. Adding selected sequence variants had a small effect on reliabilities. Weighting by quadratic functions reduced reliabilities. Weighting by nonlinear A increased reliabilities for GBLUP but had only a small effect in ssGBLUP. Reliabilities for direct genomic values extracted from ssGBLUP using unweighted G with 54k markers were higher than reliabilities from any GBLUP. Thus, ssGBLUP seems to capture more information than GBLUP, leaving less room for extra reliability.
Improvements in GBLUP may be because the weights in G change the covariance structure, which can explain a proportion of the variance that is accounted for when a heterogeneous residual variance is assumed by considering a different number of daughters per bull.
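The two weighting schemes compared above can be sketched under common formulations, which are assumptions here rather than details taken from the abstract. Quadratic weights are proportional to the squared SNP effect estimates. VanRaden-style "nonlinear A" weights are 1.125 raised to the power (|û_i|/sd(û) - 2). The base 1.125, the use of the population standard deviation, and the absence of any cap or normalization are all assumptions of this sketch. In an iterative scheme, such weights would form the diagonal matrix D of a weighted G = ZDZ'/k. Implementations typically rescale the weights before building G, and that step is omitted here. The function names are hypothetical.

```python
import statistics
from typing import List


def quadratic_weights(effects: List[float]) -> List[float]:
    """Weights proportional to squared SNP effect estimates (u_i ** 2)."""
    return [u * u for u in effects]


def nonlinear_a_weights(effects: List[float], base: float = 1.125) -> List[float]:
    """Nonlinear A weights base ** (|u_i| / sd(u) - 2), using the population SD."""
    sd = statistics.pstdev(effects)
    if sd == 0:
        raise ValueError("SNP effects have zero spread")
    return [base ** (abs(u) / sd - 2) for u in effects]


if __name__ == "__main__":
    effects = [2.0, -2.0, 0.0]  # toy SNP effect estimates
    print(quadratic_weights(effects))
    print([round(w, 4) for w in nonlinear_a_weights(effects)])
```

The contrast between the two schemes is the point of the comparison. Quadratic weights give SNPs with near-zero estimated effects almost no weight, so they shrink G toward a few large-effect SNPs. Nonlinear A changes weights only gradually: a SNP with an effect of zero still keeps 1.125⁻² ≈ 0.79 of the base weight.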
Affiliation(s)
- B O Fragomeni
- Department of Animal Science, University of Connecticut, Storrs-Mansfield 06269
- D A L Lourenco
- Department of Animal and Dairy Science, University of Georgia, Athens 30602
- A Legarra
- Institut National de la Recherche Agronomique, UMR1388 GenPhySE, Castanet Tolosan, France 31326
- P M VanRaden
- Animal Genomics and Improvement Laboratory, Agricultural Research Service, USDA, Beltsville, MD 20705
- I Misztal
- Department of Animal and Dairy Science, University of Georgia, Athens 30602
70
GWAS for Meat and Carcass Traits Using Imputed Sequence Level Genotypes in Pooled F2-Designs in Pigs. G3 (Bethesda) 2019; 9:2823-2834. [PMID: 31296617 PMCID: PMC6723123 DOI: 10.1534/g3.119.400452] [Citation(s) in RCA: 32] [Impact Index Per Article: 6.4] [Indexed: 02/06/2023]
Abstract
In order to gain insight into the genetic architecture of economically important traits in pigs and to derive suitable genetic markers to improve these traits in breeding programs, many studies have been conducted to map quantitative trait loci. Shortcomings of these studies were low mapping resolution, large confidence intervals for quantitative trait loci positions, and large linkage disequilibrium blocks. Here, we overcome these shortcomings by pooling four large F2 designs to produce smaller linkage disequilibrium blocks, and by resequencing the founder generation at high coverage and the F1 generation at low coverage for subsequent imputation of the F2 generation to whole-genome sequence marker density. This led to the discovery of more than 32 million variants, 8 million of which had not been previously reported. Pooling the four F2 designs enabled us to perform a joint genome-wide association study, which led to the identification of numerous significantly associated variant clusters on chromosomes 1, 2, 4, 7, 17 and 18 for the growth and carcass traits average daily gain, back fat thickness, meat fat ratio, and carcass length. We not only confirmed previously reported quantitative trait loci but also discovered new ones. As a result, several new candidate genes are discussed, among them BMP2 (bone morphogenetic protein 2), which we recently discovered in a related study. Variant effect prediction revealed that 15 high-impact variants for the traits back fat thickness, meat fat ratio and carcass length were among the statistically significantly associated variants.
71
Yang F, Chen F, Li L, Yan L, Badri T, Lv C, Yu D, Zhang M, Jang X, Li J, Yuan L, Wang G, Li H, Li J, Cai Y. Three Novel Players: PTK2B, SYK, and TNFRSF21 Were Identified to Be Involved in the Regulation of Bovine Mastitis Susceptibility via GWAS and Post-transcriptional Analysis. Front Immunol 2019; 10:1579. [PMID: 31447828 PMCID: PMC6691815 DOI: 10.3389/fimmu.2019.01579] [Received: 02/28/2019] [Accepted: 06/24/2019]
Abstract
Bovine mastitis is a common inflammatory disease caused by multiple factors in early lactation or the dry period. Genome-wide association studies (GWAS) can provide a convenient and effective strategy for understanding the biological basis of mastitis and for improving its prevention. 2b-RADseq is a high-throughput sequencing technique that offers a powerful method for genome-wide genetic marker development and genotyping. In this study, single nucleotide polymorphisms (SNPs) in immune-regulated genes associated with mastitis were screened and identified by a two-stage association analysis via GWAS-2b-RADseq in Chinese Holstein cows. We screened 10,058 high-quality SNPs from 7,957,920 tags and calculated their allele frequencies. Twenty-seven SNPs were significant in both GWAS models [Bayesian (P < 0.001) and logistic regression (P < 0.01)], and only three of them (rs75762330, C > T, PIC = 0.2999; rs88640083, A > G, PIC = 0.1676; rs20438858, G > A, PIC = 0.3366) were annotated to immune-regulated genes (PTK2B, SYK, and TNFRSF21). The three identified SNPs are located in non-coding regions and show low or moderate polymorphism. However, validation in an independent population (case-control study) showed that the three SNPs (rs75762330, P < 0.025, OR > 1; rs88640083, P < 0.005, OR > 1; rs20438858, P < 0.001, OR < 1) were significantly associated with clinical mastitis. Importantly, PTK2B and SYK expression was down-regulated both in peripheral blood leukocytes (PBLs) of cows with clinical mastitis and in bovine mammary epithelial cells stimulated in vitro with LPS (E. coli), whereas TNFRSF21 was up-regulated. Under the same conditions, expression of Toll-like receptor 4 (TLR4), AKT1, and the pro-inflammatory factors IL-1β and IL-8 was also up-regulated. Interestingly, network analysis indicated that PTK2B and SYK are co-expressed in the innate immune signaling pathway of Chinese Holstein cows.
Taken together, these results provide strong evidence supporting the study of SNPs in bovine mastitis and reveal the roles of SYK, PTK2B, and TNFRSF21 in bovine mastitis susceptibility/tolerance.
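The PIC values and odds ratios quoted in this abstract can be related to simple formulas. A minimal sketch, assuming the usual biallelic form of the polymorphism information content, PIC = 1 − (p² + q²) − 2p²q², and a plain 2 × 2 case-control odds ratio; the abstract does not state which formulas the authors used, and the function names and counts below are hypothetical.

```python
def pic_biallelic(p):
    """Polymorphism information content of a biallelic marker with allele frequency p."""
    q = 1.0 - p
    return 1.0 - (p ** 2 + q ** 2) - 2.0 * p ** 2 * q ** 2


def odds_ratio(cases_risk, cases_other, controls_risk, controls_other):
    """Odds ratio for carrying the risk allele/genotype, from a 2 x 2 case-control table."""
    return (cases_risk * controls_other) / (cases_other * controls_risk)


# PIC peaks at 0.375 for a biallelic SNP (p = 0.5); values below 0.25 are commonly
# classed as low polymorphism and 0.25-0.5 as moderate.
print(round(pic_biallelic(0.5), 4))     # 0.375
# Hypothetical counts: OR > 1 means the allele is more common among mastitis cases.
print(odds_ratio(10, 20, 5, 40))        # 4.0
```

Under this formula, a PIC of 0.1676 falls in the low class and 0.2999 and 0.3366 in the moderate class, consistent with the "low or moderate" description.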
Affiliation(s)
- Fan Yang
- Anhui Provincial Key Lab of the Conservation and Exploitation of Biological Resources, College of Life Sciences, Anhui Normal University, Wuhu, China
- College of Animal Science and Technology, Nanjing Agricultural University, Nanjing, China
- Fanghui Chen
- College of Animal Science and Technology, Nanjing Agricultural University, Nanjing, China
- Lili Li
- National Animal Husbandry Station, Beijing, China
- Li Yan
- Department of Radiation Oncology, Linyi People Hospital, Linyi, China
- Tarig Badri
- College of Animal Science and Technology, Nanjing Agricultural University, Nanjing, China
- Chenglong Lv
- College of Animal Science and Technology, Nanjing Agricultural University, Nanjing, China
- Daolun Yu
- Anhui Provincial Key Lab of the Conservation and Exploitation of Biological Resources, College of Life Sciences, Anhui Normal University, Wuhu, China
- Manling Zhang
- Anhui Provincial Key Lab of the Conservation and Exploitation of Biological Resources, College of Life Sciences, Anhui Normal University, Wuhu, China
- Xiaojun Jang
- Anhui Provincial Key Lab of the Conservation and Exploitation of Biological Resources, College of Life Sciences, Anhui Normal University, Wuhu, China
- Jie Li
- Anhui Provincial Key Lab of the Conservation and Exploitation of Biological Resources, College of Life Sciences, Anhui Normal University, Wuhu, China
- Lu Yuan
- Anhui Provincial Key Lab of the Conservation and Exploitation of Biological Resources, College of Life Sciences, Anhui Normal University, Wuhu, China
- Genlin Wang
- College of Animal Science and Technology, Nanjing Agricultural University, Nanjing, China
- Honglin Li
- Department of Biochemistry and Molecular Biology, Medical College of Georgia, Augusta University, Augusta, GA, United States
- Jun Li
- Anhui Provincial Key Lab of the Conservation and Exploitation of Biological Resources, College of Life Sciences, Anhui Normal University, Wuhu, China
- Yafei Cai
- College of Animal Science and Technology, Nanjing Agricultural University, Nanjing, China
72
Ye S, Gao N, Zheng R, Chen Z, Teng J, Yuan X, Zhang H, Chen Z, Zhang X, Li J, Zhang Z. Strategies for Obtaining and Pruning Imputed Whole-Genome Sequence Data for Genomic Prediction. Front Genet 2019; 10:673. [PMID: 31379929 PMCID: PMC6650575 DOI: 10.3389/fgene.2019.00673] [Received: 03/10/2019] [Accepted: 06/27/2019]
Abstract
Genomic prediction with imputed whole-genome sequencing (WGS) data is an attractive approach to improve predictive ability at low cost. However, high accuracy has not been realized with this approach in livestock. In this study, we imputed 435 individuals from 600K single nucleotide polymorphism (SNP) chip data to WGS data using different reference panels. We also investigated the prediction accuracy of genomic best linear unbiased prediction (GBLUP) using imputed WGS data from different reference panels, linkage disequilibrium (LD)-based marker pruning, and variants pre-selected based on genome-wide association study (GWAS) results. Imputation accuracies from 600K to WGS data were 0.873 ± 0.038, 0.906 ± 0.036, and 0.979 ± 0.010 for the internal, external, and combined reference panels, respectively. For most chicken traits, the prediction accuracy with imputed WGS data obtained from the internal reference panel was greater than or equal to that obtained with the combined reference panel; the external reference panel gave the lowest prediction accuracy. Compared with 600K chip data, GBLUP with imputed WGS data yielded only a small increase (1-3%) in prediction accuracy. Using only variants selected from imputed WGS data based on GWAS results yielded almost no increase for most traits and even increased the bias of the regression coefficient. The degree of LD of selected and remaining variants affected prediction accuracy differently. For average daily gain (ADG), residual feed intake (RFI), intestine length (IL), and body weight at 91 days (BW91), the accuracy of GBLUP increased as the degree of LD of selected variants decreased, whereas the opposite relationship held for the remaining variants. For breast muscle weight (BMW) and average daily feed intake (ADFI), by contrast, the accuracy of GBLUP increased as the degree of LD of selected variants increased, and the degree of LD of remaining variants had little effect on prediction accuracy.
Overall, the optimal imputation strategy to obtain WGS data for genomic prediction should consider the relationship between selected individuals and target population individuals to avoid heterogeneity of imputation. LD-based marker pruning can be used to improve the accuracy of genomic prediction using imputed WGS data.
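LD-based pruning removes markers that are highly correlated with markers already retained nearby. A minimal sketch of greedy pairwise pruning, loosely in the spirit of PLINK's `--indep-pairwise` but simplified; the function name, window size and r² threshold are illustrative assumptions, not the settings used in the study.

```python
import numpy as np


def ld_prune(M, window=50, r2_max=0.8):
    """Greedy LD pruning of SNP columns in map order.

    A SNP is kept only if its squared correlation (r^2) with every SNP already kept
    within the previous `window` positions is at most `r2_max`. Monomorphic SNPs are
    dropped. Returns the indices of the retained columns.
    """
    M = np.asarray(M, dtype=float)
    keep = []
    for j in range(M.shape[1]):
        x = M[:, j]
        if x.std() == 0:                      # no information, and r^2 would be undefined
            continue
        neighbours = [k for k in keep if j - k < window]
        if all(np.corrcoef(x, M[:, k])[0, 1] ** 2 <= r2_max for k in neighbours):
            keep.append(j)
    return keep


# Columns: a SNP, an exact copy of it (r^2 = 1), a weakly correlated SNP, a monomorphic SNP.
M = np.array([[0, 0, 2, 1],
              [1, 1, 2, 1],
              [2, 2, 0, 1],
              [0, 0, 0, 1],
              [1, 1, 1, 1],
              [2, 2, 1, 1]])
print(ld_prune(M))   # [0, 2]
```

Lowering `r2_max` prunes more aggressively and leaves a set of variants with weaker mutual LD, which is the quantity whose degree the study varied for selected and remaining variants.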
Affiliation(s)
- Shaopan Ye
- Guangdong Provincial Key Lab of Agro-Animal Genomics and Molecular Breeding, College of Animal Science, South China Agricultural University, Guangzhou, China
- Ning Gao
- State Key Laboratory of Biocontrol, School of Life Sciences, Sun Yat-sen University, Guangzhou, China
- Rongrong Zheng
- Guangdong Provincial Key Lab of Agro-Animal Genomics and Molecular Breeding, College of Animal Science, South China Agricultural University, Guangzhou, China
- Zitao Chen
- Guangdong Provincial Key Lab of Agro-Animal Genomics and Molecular Breeding, College of Animal Science, South China Agricultural University, Guangzhou, China
- Jinyan Teng
- Guangdong Provincial Key Lab of Agro-Animal Genomics and Molecular Breeding, College of Animal Science, South China Agricultural University, Guangzhou, China
- Xiaolong Yuan
- Guangdong Provincial Key Lab of Agro-Animal Genomics and Molecular Breeding, College of Animal Science, South China Agricultural University, Guangzhou, China
- Hao Zhang
- Guangdong Provincial Key Lab of Agro-Animal Genomics and Molecular Breeding, College of Animal Science, South China Agricultural University, Guangzhou, China
- Zanmou Chen
- Guangdong Provincial Key Lab of Agro-Animal Genomics and Molecular Breeding, College of Animal Science, South China Agricultural University, Guangzhou, China
- Xiquan Zhang
- Guangdong Provincial Key Lab of Agro-Animal Genomics and Molecular Breeding, College of Animal Science, South China Agricultural University, Guangzhou, China
- Jiaqi Li
- Guangdong Provincial Key Lab of Agro-Animal Genomics and Molecular Breeding, College of Animal Science, South China Agricultural University, Guangzhou, China
- Zhe Zhang
- Guangdong Provincial Key Lab of Agro-Animal Genomics and Molecular Breeding, College of Animal Science, South China Agricultural University, Guangzhou, China
73
Improvement of genomic prediction by integrating additional single nucleotide polymorphisms selected from imputed whole genome sequencing data. Heredity (Edinb) 2019; 124:37-49. [PMID: 31278370 PMCID: PMC6906477 DOI: 10.1038/s41437-019-0246-7] [Received: 01/30/2019] [Revised: 05/11/2019] [Accepted: 06/17/2019]
Abstract
The availability of whole-genome sequencing (WGS) data enables the discovery of causative single nucleotide polymorphisms (SNPs) or of SNPs in high linkage disequilibrium with causative SNPs. This study investigated the effects on genomic prediction in Danish Jersey of integrating SNPs selected from imputed WGS data into 54K chip data. The WGS SNPs, mainly including peaks of quantitative trait loci, structural variants, regulatory regions of genes, and SNPs within genes with strong effects predicted with the Variant Effect Predictor, had been selected in previous analyses for dairy breeds in Denmark–Finland–Sweden (DFS) and France (FRA). Animals genotyped with the 54K chip, a standard LD chip, or a customized LD chip (covering the selected WGS SNPs and the SNPs in the standard LD chip) were imputed to 54K together with the DFS and FRA SNPs. Genomic best linear unbiased prediction (GBLUP) and Bayesian four-distribution mixture models, treating the 54K and selected WGS SNPs either as one genetic component (a one-component model) or as two separate genetic components (a two-component model), were used to predict breeding values. For milk production traits and mastitis, both the DFS (0.025) and FRA (0.029) sets of additional WGS SNPs improved reliabilities, and including all selected WGS SNPs generally achieved the highest improvement in reliability (0.034). A Bayesian four-distribution model yielded higher reliabilities than a GBLUP model for milk and protein, but the extra gains in reliability from using selected WGS SNPs were smaller for the Bayesian four-distribution model than for the GBLUP model. Generally, no significant difference was observed between one-component and two-component models, except for GBLUP models for milk.
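A two-component GBLUP fits one random genetic effect per SNP set, for instance one relationship matrix from chip SNPs and one from selected sequence SNPs. A minimal NumPy sketch under stated assumptions: variance components are treated as known (in practice they would be estimated, e.g. by REML), the relationship matrices are simulated stand-ins with a simple scaling, and the function name is hypothetical. The Bayesian four-distribution models are not shown.

```python
import numpy as np


def gblup_two_component(y_train, train, G1, G2, var_g1, var_g2, var_e):
    """GBLUP with two genomic components: y = 1*mu + g1 + g2 + e.

    G1, G2: relationship matrices over ALL animals (training + validation),
            e.g. G1 from chip SNPs and G2 from selected sequence SNPs.
    train:  indices of the animals with phenotypes y_train.
    Returns the GLS estimate of the mean and the predicted g1 + g2 for every animal.
    """
    train = np.asarray(train)
    y = np.asarray(y_train, dtype=float)
    n = len(train)
    V = (var_g1 * G1[np.ix_(train, train)]
         + var_g2 * G2[np.ix_(train, train)]
         + var_e * np.eye(n))
    ones = np.ones(n)
    mu = ones @ np.linalg.solve(V, y) / (ones @ np.linalg.solve(V, ones))
    alpha = np.linalg.solve(V, y - mu)
    # Cov(g1 + g2 of all animals, y_train) = var_g1 * G1[:, train] + var_g2 * G2[:, train]
    gebv = (var_g1 * G1[:, train] + var_g2 * G2[:, train]) @ alpha
    return mu, gebv


# Simulated example: 30 animals, the first 20 with phenotypes, the rest as validation.
rng = np.random.default_rng(1)
M = rng.integers(0, 3, size=(30, 40)).astype(float)
Z = M - M.mean(axis=0)
G1 = Z[:, :20] @ Z[:, :20].T / 20 + 1e-3 * np.eye(30)   # stand-in "chip" component
G2 = Z[:, 20:] @ Z[:, 20:].T / 20 + 1e-3 * np.eye(30)   # stand-in "selected SNP" component
train = np.arange(20)
y = rng.normal(size=20)
mu, gebv = gblup_two_component(y, train, G1, G2, 0.3, 0.2, 0.5)
```

With the variances fixed, this model gives exactly the same predictions as a single-component GBLUP whose G is the variance-weighted combination (0.3·G1 + 0.2·G2)/0.5, which offers some intuition for why one- and two-component models can perform similarly; the components differ mainly through how their variances are estimated.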
74
Al Kalaldeh M, Gibson J, Duijvesteijn N, Daetwyler HD, MacLeod I, Moghaddar N, Lee SH, van der Werf JHJ. Using imputed whole-genome sequence data to improve the accuracy of genomic prediction for parasite resistance in Australian sheep. Genet Sel Evol 2019; 51:32. [PMID: 31242855 PMCID: PMC6595562 DOI: 10.1186/s12711-019-0476-4] [Received: 09/05/2018] [Accepted: 06/18/2019]
Abstract
Background This study aimed at (1) comparing the accuracies of genomic prediction for parasite resistance in sheep based on whole-genome sequence (WGS) data to those based on 50k and high-density (HD) single nucleotide polymorphism (SNP) panels; (2) investigating whether the use of variants within quantitative trait loci (QTL) regions that were selected from regional heritability mapping (RHM) in an independent dataset improved the accuracy more than variants selected from genome-wide association studies (GWAS); and (3) comparing the prediction accuracies of variants selected from WGS data with those of variants selected from the HD SNP panel. Results The accuracy of genomic prediction improved marginally from 0.16 ± 0.02 and 0.18 ± 0.01 when using all the variants from 50k and HD genotypes, respectively, to 0.19 ± 0.01 when using all the variants from WGS data. Fitting a genomic relationship matrix (GRM) from the selected variants alongside a GRM from the 50k SNP genotypes improved the prediction accuracy substantially compared to fitting the 50k SNP genotypes alone. The gain in prediction accuracy was slightly more pronounced when variants were selected from WGS data than when they were selected from the HD panel. When sequence variants that passed the GWAS $-\log_{10}(p\text{-value})$ threshold of 3 across the entire genome were selected, the prediction accuracy improved by 5% (up to 0.21 ± 0.01), whereas when selection was limited to sequence variants that passed the same GWAS $-\log_{10}(p\text{-value})$ threshold of 3 in regions identified by RHM, the accuracy improved by 9% (up to 0.25 ± 0.01). Conclusions Our results show that through careful selection of sequence variants from the QTL regions, the accuracy of genomic prediction for parasite resistance in sheep can be improved. These findings have important implications for genomic prediction in sheep.
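The two selection strategies compared here, a genome-wide $-\log_{10}(p)$ threshold versus the same threshold restricted to RHM regions, can be sketched as a simple filter. The tuple layout, the function name and the example coordinates are hypothetical, and "passed the threshold" is assumed here to mean strictly greater than 3; p-values must be positive.

```python
import math


def select_variants(variants, threshold=3.0, regions=None):
    """Keep variants whose -log10(p) exceeds `threshold`.

    variants: iterable of (chrom, pos, p_value) with p_value > 0.
    regions:  optional list of (chrom, start, end) intervals (e.g. QTL regions from RHM);
              when given, only variants inside one of them are kept.
    """
    selected = []
    for chrom, pos, p in variants:
        if -math.log10(p) <= threshold:
            continue
        if regions is not None and not any(
            c == chrom and start <= pos <= end for c, start, end in regions
        ):
            continue
        selected.append((chrom, pos, p))
    return selected


variants = [("1", 100, 1e-4), ("1", 500, 1e-2), ("2", 50, 1e-5)]
print(select_variants(variants))                            # [('1', 100, 0.0001), ('2', 50, 1e-05)]
print(select_variants(variants, regions=[("1", 0, 200)]))   # [('1', 100, 0.0001)]
```

Restricting to regions discards genome-wide hits outside independently mapped QTL, which trades some signal for fewer false positives; the abstract reports that this restriction gave the larger accuracy gain.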
Affiliation(s)
- Mohammad Al Kalaldeh
- Cooperative Research Centre for Sheep Industry Innovation, Armidale, NSW, 2351, Australia; School of Environmental and Rural Science, University of New England, Armidale, NSW, 2351, Australia
- John Gibson
- Cooperative Research Centre for Sheep Industry Innovation, Armidale, NSW, 2351, Australia; School of Environmental and Rural Science, University of New England, Armidale, NSW, 2351, Australia
- Naomi Duijvesteijn
- Cooperative Research Centre for Sheep Industry Innovation, Armidale, NSW, 2351, Australia; School of Environmental and Rural Science, University of New England, Armidale, NSW, 2351, Australia
- Hans D Daetwyler
- Cooperative Research Centre for Sheep Industry Innovation, Armidale, NSW, 2351, Australia; Centre for AgriBioscience, Agriculture Victoria, Bundoora, VIC, 3083, Australia; School of Applied Systems Biology, La Trobe University, Bundoora, VIC, 3083, Australia
- Iona MacLeod
- Cooperative Research Centre for Sheep Industry Innovation, Armidale, NSW, 2351, Australia; Centre for AgriBioscience, Agriculture Victoria, Bundoora, VIC, 3083, Australia
- Nasir Moghaddar
- Cooperative Research Centre for Sheep Industry Innovation, Armidale, NSW, 2351, Australia; School of Environmental and Rural Science, University of New England, Armidale, NSW, 2351, Australia
- Sang Hong Lee
- Australian Centre for Precision Health, University of South Australia Cancer Research Institute, University of South Australia, Adelaide, SA, 5000, Australia
- Julius H J van der Werf
- Cooperative Research Centre for Sheep Industry Innovation, Armidale, NSW, 2351, Australia; School of Environmental and Rural Science, University of New England, Armidale, NSW, 2351, Australia
75
Ma P, Lund MS, Aamand GP, Su G. Use of a Bayesian model including QTL markers increases prediction reliability when test animals are distant from the reference population. J Dairy Sci 2019; 102:7237-7247. [PMID: 31155255 DOI: 10.3168/jds.2018-15815] [Received: 10/09/2018] [Accepted: 03/31/2019]
Abstract
Relatedness between reference and test animals has an important effect on the reliability of genomic prediction for test animals. Because genomic prediction has been widely applied in practical cattle breeding and bulls have been selected on genomic breeding values without progeny testing, the sires or grandsires of candidates might not have phenotypic information and might not be in the reference population when the candidates are selected. The objective of this study was to investigate how the reliability of genomic prediction decreases with increasingly distant reference populations, using genomic best linear unbiased prediction (GBLUP) and Bayesian variable selection models with or without the quantitative trait locus (QTL) markers detected from sequencing data. The data consisted of 22,242 bulls genotyped with the 54K SNP array from EuroGenomics. Among them, 1,444 Danish bulls born from 2006 to 2010 were selected as test animals. Reference populations with varying relationships to the test animals were created on the basis of pedigree relationships. Reference individuals with a relationship higher than 0.4 (scenario ρ < 0.4), 0.2 (ρ < 0.2), or 0.1 (ρ < 0.1, where ρ = relationship coefficient) with one or more test animals were removed from the reference sets; these scenarios corresponded to distances of 2, 3, and 4 generations between reference and test animals, respectively. Imputed whole-genome sequencing data of bulls from Denmark were used to conduct a genome-wide association study (GWAS). A small number of significant variants (QTL markers) from the GWAS were added to the array data. To compare the effects of different models, a basic GBLUP model, a Bayesian variable selection model, a GBLUP model with 2 components of genetic effects, and a Bayesian model with pooled array data and QTL markers were used to compute genomic estimated breeding values (GEBV) of test animals.
The reliability of genomic prediction decreased as the test animals were more generations away from the reference population. Using a GBLUP model with chip markers only and the same number of individuals in the reference set, the reliability of genomic prediction was 0.461 for 1 generation away and 0.396 for 3 generations away. Using the Bayesian method and QTL markers improved the reliability of genomic prediction in all scenarios of relationship between test and reference animals, by 1.3% to 65.1% (the latter for 4 generations away with only 841 individuals in the reference set). However, most gains were for predictions of milk yield and fat yield. There was little improvement for predictions of protein yield and mastitis, and no improvement for prediction of fertility, except in scenario ρ < 0.1, in which predictions of all traits improved considerably. On the other hand, models including more than 10% polygenic effect decreased prediction reliability when the relationship between test and reference animals was distant.
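Building the distant reference sets requires pedigree relationships and a filter on them. A minimal sketch, assuming ρ is the additive (numerator) relationship computed with the standard tabular method, and that reference animals related to any test animal by more than the threshold are dropped; the function names and the toy pedigree are hypothetical, and the abstract does not specify the exact relationship measure used.

```python
import numpy as np


def pedigree_relationships(sires, dams):
    """Numerator relationship matrix A by the tabular method.

    sires, dams: parent index for each animal, or None if unknown.
    Animals must be ordered so that parents precede their offspring.
    """
    n = len(sires)
    A = np.zeros((n, n))
    for i in range(n):
        s, d = sires[i], dams[i]
        # Diagonal: 1 plus the inbreeding coefficient (half the parents' relationship).
        A[i, i] = 1.0 + (0.5 * A[s, d] if s is not None and d is not None else 0.0)
        for j in range(i):
            a = 0.0
            if s is not None:
                a += 0.5 * A[j, s]
            if d is not None:
                a += 0.5 * A[j, d]
            A[i, j] = A[j, i] = a
    return A


def distant_reference(A, reference, test, max_rel):
    """Drop reference animals related to any test animal by more than max_rel."""
    return [r for r in reference if all(A[r, t] <= max_rel for t in test)]


# 0 and 1 are founders, 2 is their offspring, 3 is a grandchild of 0 and 1 (unknown dam),
# 4 is an unrelated founder; animal 3 is the test animal.
sires = [None, None, 0, 2, None]
dams = [None, None, 1, None, None]
A = pedigree_relationships(sires, dams)
print(distant_reference(A, [0, 1, 2, 4], [3], 0.4))   # [0, 1, 4]
print(distant_reference(A, [0, 1, 2, 4], [3], 0.2))   # [4]
```

Each tighter threshold removes closer relatives (here the parent first, then the grandparents), which is how the scenarios push the reference population further from the test animals in generations.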
Affiliation(s)
- Peipei Ma
- Department of Animal Science, School of Agriculture and Biology, Shanghai Jiao Tong University, Shanghai 200240, P.R. China; Center for Quantitative Genetics and Genomics, Department of Molecular Biology and Genetics, Aarhus University, DK-8830, Aarhus, Denmark
- Mogens S Lund
- Center for Quantitative Genetics and Genomics, Department of Molecular Biology and Genetics, Aarhus University, DK-8830, Aarhus, Denmark
- Gert P Aamand
- NAV Nordic Cattle Genetic Evaluation, DK-8200, Aarhus, Denmark
- Guosheng Su
- Center for Quantitative Genetics and Genomics, Department of Molecular Biology and Genetics, Aarhus University, DK-8830, Aarhus, Denmark
76
Gebreyesus G, Bovenhuis H, Lund MS, Poulsen NA, Sun D, Buitenhuis B. Reliability of genomic prediction for milk fatty acid composition by using a multi-population reference and incorporating GWAS results. Genet Sel Evol 2019; 51:16. [PMID: 31029078 PMCID: PMC6487064 DOI: 10.1186/s12711-019-0460-z] [Received: 11/19/2018] [Accepted: 04/10/2019]
Abstract
Background Large-scale phenotyping for detailed milk fatty acid (FA) composition is difficult because the analytical techniques are expensive and time-consuming. Reliability of genomic prediction is often low for traits that are expensive or difficult to measure and for breeds with a small reference population. An effective way to increase reference population size could be to combine datasets from different populations. Prediction models might also benefit from incorporating information on the biological underpinnings of quantitative traits. Genome-wide association studies (GWAS) show that genomic regions on Bos taurus chromosomes (BTA) 14, 19 and 26 underlie substantial proportions of the genetic variation in milk FA traits. Genomic prediction models that incorporate such results could improve prediction accuracy despite limited reference population sizes. In this study, we combine gas chromatography-quantified FA samples from the Chinese, Danish and Dutch Holstein populations and implement a genomic feature best linear unbiased prediction (GFBLUP) model that treats variants on BTA14, 19 and 26 as genomic features for which random genetic effects are estimated separately. Prediction reliabilities were compared with those estimated with traditional GBLUP models. Results Predictions using a multi-population reference and a traditional GBLUP model resulted in average gains in prediction reliability of 10 percentage points in the Dutch, 8 percentage points in the Danish and 1 percentage point in the Chinese populations compared with predictions based on population-specific references. Compared with traditional GBLUP, the GFBLUP model with a multi-population reference led to further increases in prediction reliability of up to 38 percentage points in the Dutch, 23 percentage points in the Danish and 13 percentage points in the Chinese populations. Prediction reliabilities from the GFBLUP model were moderate to high across the FA traits analyzed.
Conclusions Our study shows that it is possible to predict genetic merits for milk FA traits with reasonable accuracy by combining related populations of a breed and using models that incorporate GWAS results. Our findings indicate that international collaborations that facilitate access to multi-population datasets could be highly beneficial to the implementation of genomic selection for detailed milk composition traits.
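A GFBLUP-style model fits separate random effects for a genomic feature (here, SNPs on BTA14, 19 and 26) and for the remaining SNPs, which first requires one relationship matrix per SNP set. A minimal sketch, assuming VanRaden scaling for each set; the function names and the SNP map are hypothetical, and fitting the two matrices (e.g. with REML-estimated variances) is not shown.

```python
import numpy as np


def vanraden_parts(M):
    """Return centred genotypes and the VanRaden scaling constant 2 * sum p(1 - p)."""
    M = np.asarray(M, dtype=float)
    p = M.mean(axis=0) / 2.0
    return M - 2.0 * p, 2.0 * np.sum(p * (1.0 - p))


def feature_grms(M, chrom, feature_chroms=(14, 19, 26)):
    """Split SNPs into a genomic feature (SNPs on the given chromosomes) and the rest,
    and build one VanRaden G for each set. Returns (G_feature, G_rest, s_feature, s_rest)."""
    M = np.asarray(M, dtype=float)
    in_feature = np.isin(np.asarray(chrom), feature_chroms)
    Zf, sf = vanraden_parts(M[:, in_feature])
    Zr, sr = vanraden_parts(M[:, ~in_feature])
    return Zf @ Zf.T / sf, Zr @ Zr.T / sr, sf, sr


# Simulated genotypes with a hypothetical map of 10 SNPs on each of six chromosomes.
rng = np.random.default_rng(2)
M = rng.integers(0, 3, size=(8, 60))
chrom = np.repeat([1, 14, 19, 26, 5, 7], 10)
Gf, Gr, sf, sr = feature_grms(M, chrom)
```

Because centring is done per SNP, the G built from all SNPs equals (s_f·G_f + s_r·G_r)/(s_f + s_r): standard GBLUP implicitly fixes the relative weight of the feature, whereas GFBLUP lets the data estimate a separate variance for it.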
Affiliation(s)
- Grum Gebreyesus
- Department of Molecular Biology and Genetics, Center for Quantitative Genetics and Genomics, Aarhus University, Blichers Allé 20, P.O. Box 50, 8830, Tjele, Denmark; Animal Breeding and Genomics Centre, Wageningen University, P.O. Box 338, 6700 AH, Wageningen, The Netherlands
- Henk Bovenhuis
- Animal Breeding and Genomics Centre, Wageningen University, P.O. Box 338, 6700 AH, Wageningen, The Netherlands
- Mogens S Lund
- Department of Molecular Biology and Genetics, Center for Quantitative Genetics and Genomics, Aarhus University, Blichers Allé 20, P.O. Box 50, 8830, Tjele, Denmark
- Nina A Poulsen
- Department of Food Science, Aarhus University, Blichers Allé 20, P.O. Box 50, 8830, Tjele, Denmark
- Dongxiao Sun
- Laboratory of Animal Genetics, Breeding and Reproduction, Ministry of Agriculture of China, National Engineering Laboratory for Animal Breeding, College of Animal Science and Technology, China Agricultural University, Beijing, 100193, China
- Bart Buitenhuis
- Department of Molecular Biology and Genetics, Center for Quantitative Genetics and Genomics, Aarhus University, Blichers Allé 20, P.O. Box 50, 8830, Tjele, Denmark
77
Nani JP, Rezende FM, Peñagaricano F. Predicting male fertility in dairy cattle using markers with large effect and functional annotation data. BMC Genomics 2019; 20:258. [PMID: 30940077 PMCID: PMC6444482 DOI: 10.1186/s12864-019-5644-y] [Received: 10/21/2018] [Accepted: 03/25/2019]
Abstract
Background Fertility is among the most important economic traits in dairy cattle. Genomic prediction for cow fertility has received much attention in the last decade, while bull fertility has been largely overlooked. The goal of this study was to assess genomic prediction of dairy bull fertility using markers with large effect and functional annotation data. Sire conception rate (SCR) was used as a measure of service sire fertility. The dataset consisted of 11.5k U.S. Holstein bulls with SCR records and about 300k single nucleotide polymorphism (SNP) markers. The analyses included both single-kernel and multi-kernel predictive models fitting either all SNPs, markers with large effect, or markers with presumed functional roles, such as non-synonymous, synonymous, or non-coding regulatory variants. Results The entire set of SNPs yielded a predictive correlation of 0.340. Five markers located on chromosomes BTA8, BTA9, BTA13, BTA17, and BTA27 showed marked dominance effects. Interestingly, including these five major markers as fixed effects in the predictive models increased the predictive correlation to 0.403, an increase in accuracy of about 19% compared with the standard model. Single-kernel models fitting functional SNP classes outperformed their counterparts using random sets of SNPs, suggesting that the predictive power of these functional variants is driven in part by their biological roles. Multi-kernel models fitting all the functional SNP classes together with the five major markers exhibited predictive correlations of around 0.405. Conclusions The inclusion of markers with large effect markedly improved the prediction of dairy sire fertility. Functional variants exhibited higher predictive ability than random variants but did not outperform the standard whole-genome approach.
This research is the foundation for the development of novel strategies that could help the dairy industry make accurate genome-guided selection decisions on service sire fertility.
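Two small calculations help unpack the results above: how a fixed-effect marker can capture dominance, and where the "about 19%" figure comes from. A minimal sketch, assuming the common additive (−1, 0, 1) and dominance (0, 1, 0) genotype coding; the abstract does not say how the five major markers were coded, and the function names are hypothetical.

```python
import numpy as np


def additive_dominance(genotypes):
    """Code 0/1/2 allele counts as additive (-1, 0, 1) and dominance (0, 1, 0) covariates."""
    g = np.asarray(genotypes)
    return g - 1, (g == 1).astype(int)


def relative_gain(r_base, r_new):
    """Relative increase of a predictive correlation over a baseline."""
    return (r_new - r_base) / r_base


a, d = additive_dominance([0, 1, 2, 1])
print(a.tolist(), d.tolist())                   # [-1, 0, 1, 0] [0, 1, 0, 1]
print(round(relative_gain(0.340, 0.403), 3))    # 0.185, i.e. about 19%
```

The dominance covariate is non-zero only for heterozygotes, so a marker whose heterozygotes deviate from the midpoint of the two homozygotes gets an effect that a purely additive SNP model cannot represent.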
Affiliation(s)
- Juan Pablo Nani
- Department of Animal Sciences, University of Florida, 2250 Shealy Drive, Gainesville, FL, 32611, USA; Estación Experimental Agropecuaria Rafaela, Instituto Nacional de Tecnología Agropecuaria, 22-2300, Rafaela, SF, Argentina
- Fernanda M Rezende
- Department of Animal Sciences, University of Florida, 2250 Shealy Drive, Gainesville, FL, 32611, USA; Faculdade de Medicina Veterinária, Universidade Federal de Uberlândia, Uberlândia, MG, 38410-337, Brazil
- Francisco Peñagaricano
- Department of Animal Sciences, University of Florida, 2250 Shealy Drive, Gainesville, FL, 32611, USA; University of Florida Genetics Institute, University of Florida, Gainesville, FL, 32610, USA
78
Voss-Fels KP, Cooper M, Hayes BJ. Accelerating crop genetic gains with genomic selection. Theor Appl Genet 2019; 132:669-686. [PMID: 30569365 DOI: 10.1007/s00122-018-3270-8] [Received: 09/09/2018] [Accepted: 12/12/2018]
Abstract
Genomic prediction based on additive genetic effects can accelerate genetic gain. There are opportunities for further improvement by including non-additive effects that access untapped sources of genetic diversity. Several studies have reported a worrying gap between the projected global future demand for plant-based products and the current annual rates of production increase, indicating that enhancing the rate of genetic gain might be critical for future food security. Therefore, new breeding technologies and strategies are required to significantly boost genetic improvement of future crop cultivars. Genomic selection (GS) has delivered considerable genetic gain in animal breeding and is becoming an essential component of many modern plant breeding programmes as well. In this paper, we review the lessons learned from implementing GS in livestock and the impact of GS on crop breeding, and discuss important features for the success of GS under different breeding scenarios. We highlight major challenges associated with GS including rapid genotyping, phenotyping, genotype-by-environment interaction and non-additivity and give examples for opportunities to overcome these issues. Finally, the potential of combining GS with other modern technologies in order to maximise the rate of crop genetic improvement is discussed, including the potential of increasing prediction accuracy by integration of crop growth models in GS frameworks.
Affiliation(s)
- Kai Peter Voss-Fels
- Queensland Alliance for Agriculture and Food Innovation, The University of Queensland, St Lucia, QLD, 4072, Australia
- Mark Cooper
- Queensland Alliance for Agriculture and Food Innovation, The University of Queensland, St Lucia, QLD, 4072, Australia
- Ben John Hayes
- Queensland Alliance for Agriculture and Food Innovation, The University of Queensland, St Lucia, QLD, 4072, Australia.
79
van den Berg S, Vandenplas J, van Eeuwijk FA, Bouwman AC, Lopes MS, Veerkamp RF. Imputation to whole-genome sequence using multiple pig populations and its use in genome-wide association studies. Genet Sel Evol 2019; 51:2. [PMID: 30678638 PMCID: PMC6346588 DOI: 10.1186/s12711-019-0445-y] [Received: 10/12/2018] [Accepted: 01/10/2019]
Abstract
Background Use of whole-genome sequence data (WGS) is expected to improve identification of quantitative trait loci (QTL). However, this requires imputation to WGS, often with a limited number of sequenced animals for the target population. The objective of this study was to investigate imputation to WGS in two pig lines using a multi-line reference population and, subsequently, to investigate the effect of using these imputed WGS (iWGS) for GWAS. Methods Phenotypes and genotypes were available on 12,184 Large White pigs (LW-line) and 4943 Dutch Landrace pigs (DL-line). Imputed 660 K and 80 K genotypes for the LW-line and DL-line, respectively, were imputed to iWGS using Beagle v.4.1. Since only 32 LW-line and 12 DL-line boars were sequenced, 142 animals from eight commercial lines were added. GWAS were performed for each line using the 80 K and 660 K SNPs, the genotype scores of iWGS SNPs that had an imputation accuracy (Beagle R2) higher than 0.6, and the dosage scores of all iWGS SNPs. Results For the DL-line (LW-line), imputation of 80 K genotypes to iWGS resulted in an average Beagle R2 of 0.39 (0.49). After quality control, 2.5 × 10⁶ (3.5 × 10⁶) SNPs had a Beagle R2 higher than 0.6, resulting in an average Beagle R2 of 0.83 (0.93). Compared to the 80 K and 660 K genotypes, using iWGS led to the identification of 48.9 and 64.4% more QTL regions, for the DL-line and LW-line, respectively, and the most significant SNPs in the QTL regions explained a higher proportion of phenotypic variance. Using dosage instead of genotype scores improved the identification of QTL, because the model accounted for uncertainty of imputation, and all SNPs were used in the analysis. Conclusions Imputation to WGS using the multi-line reference population resulted in relatively poor imputation, especially when imputing from 80 K (DL-line). 
In spite of the poor imputation accuracies, using iWGS instead of a lower-density SNP chip increased the number of detected QTL and the estimated proportion of phenotypic variance explained by these QTL, especially when dosage scores were used instead of genotype scores. Thus, iWGS, even with poor imputation accuracy, can be used to identify potentially interesting regions for fine mapping.
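The difference between the genotype scores and dosage scores used for GWAS in this study can be illustrated with a minimal Python sketch. It assumes, hypothetically, that each imputed SNP comes with three genotype probabilities P(0), P(1) and P(2) for the number of copies of the alternative allele; it does not parse Beagle output. The dosage is the expected allele count and so retains the imputation uncertainty, whereas the genotype score keeps only the most probable genotype. The 0.6 cut-off mirrors the Beagle R2 threshold quoted in the abstract; all function names are illustrative.

```python
from typing import Dict, List, Tuple

# (P(0 copies), P(1 copy), P(2 copies)) of the alternative allele; assumed input format
GenotypeProbs = Tuple[float, float, float]


def dosage_score(probs: GenotypeProbs) -> float:
    """Expected alternative-allele count: 0*P(0) + 1*P(1) + 2*P(2)."""
    _, p1, p2 = probs
    return p1 + 2.0 * p2


def genotype_score(probs: GenotypeProbs) -> int:
    """Hard call: the genotype (0, 1 or 2) with the highest probability."""
    return max(range(3), key=lambda g: probs[g])


def snps_for_genotype_scores(r2_by_snp: Dict[str, float], min_r2: float = 0.6) -> List[str]:
    """Keep SNPs whose imputation R2 is strictly higher than min_r2."""
    return [snp for snp, r2 in r2_by_snp.items() if r2 > min_r2]


if __name__ == "__main__":
    uncertain = (0.1, 0.5, 0.4)
    print(dosage_score(uncertain))    # about 1.3: the uncertainty is retained
    print(genotype_score(uncertain))  # 1: the 0.4 chance of two copies is discarded
    print(snps_for_genotype_scores({"snp_a": 0.83, "snp_b": 0.39}))
```

Because a dosage can be computed for every imputed SNP, no R2 filter is needed in that branch, which is consistent with the abstract's statement that all SNPs were used when dosage scores were analysed.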
Affiliation(s)
- Sanne van den Berg
- Animal Breeding and Genomics, Wageningen University and Research, P.O. Box 338, 6700 AH, Wageningen, The Netherlands; Biometris, Wageningen University and Research, P.O. Box 16, 6700 AA, Wageningen, The Netherlands
- Jérémie Vandenplas
- Animal Breeding and Genomics, Wageningen University and Research, P.O. Box 338, 6700 AH, Wageningen, The Netherlands
- Fred A van Eeuwijk
- Biometris, Wageningen University and Research, P.O. Box 16, 6700 AA, Wageningen, The Netherlands
- Aniek C Bouwman
- Animal Breeding and Genomics, Wageningen University and Research, P.O. Box 338, 6700 AH, Wageningen, The Netherlands
- Marcos S Lopes
- Topigs Norsvin Research Center, 6640 AA, Beuningen, The Netherlands; Topigs Norsvin, Curitiba, 80420-190, Brazil
- Roel F Veerkamp
- Animal Breeding and Genomics, Wageningen University and Research, P.O. Box 338, 6700 AH, Wageningen, The Netherlands.
80
Bolormaa S, Chamberlain AJ, Khansefid M, Stothard P, Swan AA, Mason B, Prowse-Wilkins CP, Duijvesteijn N, Moghaddar N, van der Werf JH, Daetwyler HD, MacLeod IM. Accuracy of imputation to whole-genome sequence in sheep. Genet Sel Evol 2019; 51:1. [PMID: 30654735 PMCID: PMC6337865 DOI: 10.1186/s12711-018-0443-5] [Received: 06/19/2018] [Accepted: 12/18/2018]
Abstract
Background The use of whole-genome sequence (WGS) data for genomic prediction and association studies is highly desirable because the causal mutations should be present in the data. The sequencing of 935 sheep from a range of breeds provides the opportunity to impute sheep genotyped with single nucleotide polymorphism (SNP) arrays to WGS. This study evaluated the accuracy of imputation from SNP genotypes to WGS using this reference population of 935 sequenced sheep. Results The accuracy of imputation from the Ovine Infinium® HD BeadChip SNP (~ 500 k) to WGS was assessed for three target breeds: Merino, Poll Dorset and F1 Border Leicester × Merino. Imputation accuracy was highest for the Poll Dorset breed, although there were more Merino individuals in the sequenced reference population than Poll Dorset individuals. In addition, empirical imputation accuracies were higher (by up to 1.7%) when using larger multi-breed reference populations compared to using a smaller single-breed reference population. The mean accuracy of imputation across target breeds using the Minimac3 or the FImpute software was 0.94. The empirical imputation accuracy varied considerably across the genome; six chromosomes carried regions of one or more Mb with a mean imputation accuracy of < 0.7. Imputation accuracy in five variant annotation classes ranged from 0.87 (missense) up to 0.94 (intronic variants), where lower accuracy corresponded to higher proportions of rare alleles. The imputation quality statistic reported from Minimac3 (R2) had a clear positive relationship with the empirical imputation accuracy. Therefore, by first discarding imputed variants with an R2 below 0.4, the mean empirical accuracy across target breeds increased to 0.97. Although accuracy of genomic prediction was less affected by filtering on R2 in a multi-breed population of sheep with imputed WGS, the genomic heritability clearly tended to be lower when using variants with an R2 ≤ 0.4. 
Conclusions The mean imputation accuracy was high for all target breeds and was increased by combining smaller breed sets into a multi-breed reference. We found that the Minimac3 software imputation quality statistic (R2) was a useful indicator of empirical imputation accuracy, enabling removal of very poorly imputed variants before downstream analyses.
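The two quantities compared in this abstract can be sketched in Python under stated assumptions. The empirical imputation accuracy of a variant is taken here to be the Pearson correlation between true (sequenced) and imputed genotypes across validation animals; this is one common definition, but it is an assumption rather than a detail given in the abstract. Variants are discarded when their Minimac3 R2 is below 0.4, as described above. Reading Minimac3 output files is not shown, and the function names are hypothetical.

```python
import math
from typing import Dict, List, Sequence


def empirical_accuracy(true_g: Sequence[float], imputed_g: Sequence[float]) -> float:
    """Pearson correlation between true and imputed genotypes (0/1/2 or dosages)."""
    n = len(true_g)
    if n != len(imputed_g) or n < 2:
        raise ValueError("need two equally long vectors with at least two animals")
    mt = sum(true_g) / n
    mi = sum(imputed_g) / n
    sti = sum((t - mt) * (i - mi) for t, i in zip(true_g, imputed_g))
    stt = sum((t - mt) ** 2 for t in true_g)
    sii = sum((i - mi) ** 2 for i in imputed_g)
    if stt == 0.0 or sii == 0.0:
        return float("nan")  # undefined when either vector has no variance
    return sti / math.sqrt(stt * sii)


def keep_well_imputed(minimac_r2: Dict[str, float], min_r2: float = 0.4) -> List[str]:
    """Discard variants whose imputation quality statistic R2 is below min_r2."""
    return [v for v, r2 in minimac_r2.items() if r2 >= min_r2]
```

With real data, per-variant accuracies would be averaged (for example per breed, chromosome region or annotation class) and compared with the R2 values. That comparison would show whether the positive relationship reported above holds in a new dataset.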
Affiliation(s)
- Sunduimijid Bolormaa
- Agriculture Victoria, AgriBio, Centre for AgriBioscience, 5 Ring Rd, Bundoora, VIC, 3083, Australia; Cooperative Research Centre for Sheep Industry Innovation, Armidale, NSW, 2351, Australia.
- Amanda J Chamberlain
- Agriculture Victoria, AgriBio, Centre for AgriBioscience, 5 Ring Rd, Bundoora, VIC, 3083, Australia
- Majid Khansefid
- Agriculture Victoria, AgriBio, Centre for AgriBioscience, 5 Ring Rd, Bundoora, VIC, 3083, Australia; Cooperative Research Centre for Sheep Industry Innovation, Armidale, NSW, 2351, Australia
- Paul Stothard
- Faculty of Agricultural, Life and Environmental Sciences, University of Alberta, Edmonton, AB, T6G 2R3, Canada
- Andrew A Swan
- Cooperative Research Centre for Sheep Industry Innovation, Armidale, NSW, 2351, Australia; Animal Genetics and Breeding Unit, University of New England, Armidale, NSW, 2351, Australia
- Brett Mason
- Agriculture Victoria, AgriBio, Centre for AgriBioscience, 5 Ring Rd, Bundoora, VIC, 3083, Australia
- Claire P Prowse-Wilkins
- Agriculture Victoria, AgriBio, Centre for AgriBioscience, 5 Ring Rd, Bundoora, VIC, 3083, Australia
- Naomi Duijvesteijn
- Cooperative Research Centre for Sheep Industry Innovation, Armidale, NSW, 2351, Australia; School of Environmental and Rural Science, University of New England, Armidale, NSW, 2351, Australia
- Nasir Moghaddar
- Cooperative Research Centre for Sheep Industry Innovation, Armidale, NSW, 2351, Australia; School of Environmental and Rural Science, University of New England, Armidale, NSW, 2351, Australia
- Julius H van der Werf
- Cooperative Research Centre for Sheep Industry Innovation, Armidale, NSW, 2351, Australia; School of Environmental and Rural Science, University of New England, Armidale, NSW, 2351, Australia
- Hans D Daetwyler
- Agriculture Victoria, AgriBio, Centre for AgriBioscience, 5 Ring Rd, Bundoora, VIC, 3083, Australia; Cooperative Research Centre for Sheep Industry Innovation, Armidale, NSW, 2351, Australia; School of Applied Systems Biology, La Trobe University, Bundoora, VIC, 3086, Australia
- Iona M MacLeod
- Agriculture Victoria, AgriBio, Centre for AgriBioscience, 5 Ring Rd, Bundoora, VIC, 3083, Australia; Cooperative Research Centre for Sheep Industry Innovation, Armidale, NSW, 2351, Australia
81
82
Zhang Q, Sahana G, Su G, Guldbrandtsen B, Lund MS, Calus MPL. Impact of rare and low-frequency sequence variants on reliability of genomic prediction in dairy cattle. Genet Sel Evol 2018; 50:62. [PMID: 30458700 PMCID: PMC6247626 DOI: 10.1186/s12711-018-0432-8] [Received: 03/12/2018] [Accepted: 11/14/2018]
Abstract
Background Availability of whole-genome sequence data for a large number of cattle and efficient imputation methodologies open a new opportunity to include rare and low-frequency variants (RLFV) in genomic prediction in dairy cattle. The objective of this study was to examine the impact of including RLFV that are within genes and selected from whole-genome sequence variants on the reliability of genomic prediction for fertility, health and longevity in dairy cattle. Results All genic RLFV with a minor allele frequency lower than 0.05 were extracted from imputed sequence data and subsets were created using different strategies. These subsets were subsequently combined with Illumina 50 k single nucleotide polymorphism (SNP) data and used for genomic prediction. Reliability of prediction obtained by using 50 k SNP data alone was used as the reference value and absolute changes in reliabilities are referred to as changes in percentage points. Adding a component that included either all the genic or a subset of selected RLFV into the model in addition to the 50 k component changed the reliability of predictions by −2.2 to 1.1%, i.e. almost no change in reliability of prediction was found, regardless of how the RLFV were selected. In addition to these empirical analyses, a simulation study was performed to evaluate the potential impact of adding RLFV in the model on the reliability of prediction. Three sets of causal RLFV (containing 21,468, 1348 and 235 RLFV) that were randomly selected from different numbers of genes were generated and accounted for additional genetic variance equal to 10% of the estimated variance explained by the 50 k SNPs. When genic RLFV based on mapping results were included in the prediction model, reliabilities improved by up to 4.0% and when the causal RLFV were included they improved by up to 6.8%. Conclusions Using selected RLFV from whole-genome sequence data had only a small impact on the empirical reliability of genomic prediction in dairy cattle. 
Our simulations revealed that for sequence data to bring a benefit, the key is to identify causal RLFV.
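The first selection step described in this abstract, extracting genic variants with a minor allele frequency (MAF) below 0.05, can be sketched in Python. The sketch assumes genotypes coded as 0/1/2 allele counts and gene intervals given as hypothetical (chromosome, start, end) tuples with inclusive coordinates, and it treats monomorphic variants (MAF of 0) as non-variants. None of these conventions come from the study, and the subsequent subset strategies and the genomic prediction itself are not shown.

```python
from typing import Dict, List, Sequence, Tuple

Gene = Tuple[str, int, int]               # (chromosome, start, end), inclusive; assumed
Variant = Tuple[str, int, Sequence[int]]  # (chromosome, position, 0/1/2 genotypes)


def minor_allele_frequency(genotypes: Sequence[int]) -> float:
    """MAF from allele counts: p = sum(g) / (2n), MAF = min(p, 1 - p)."""
    p = sum(genotypes) / (2.0 * len(genotypes))
    return min(p, 1.0 - p)


def in_any_gene(chrom: str, pos: int, genes: Sequence[Gene]) -> bool:
    """True if the position falls inside at least one gene interval."""
    return any(c == chrom and start <= pos <= end for c, start, end in genes)


def select_genic_rlfv(variants: Dict[str, Variant], genes: Sequence[Gene],
                      max_maf: float = 0.05) -> List[str]:
    """Names of segregating variants inside a gene with 0 < MAF < max_maf."""
    selected = []
    for name, (chrom, pos, genotypes) in variants.items():
        maf = minor_allele_frequency(genotypes)
        if 0.0 < maf < max_maf and in_any_gene(chrom, pos, genes):
            selected.append(name)
    return selected
```

A linear scan over gene intervals is enough to show the logic; for millions of imputed variants, an interval tree or a sorted sweep per chromosome would be the usual choice.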
Affiliation(s)
- Qianqian Zhang
- Department of Molecular Biology and Genetics, Center for Quantitative Genetics and Genomics, Aarhus University, Tjele, Denmark; Wageningen University and Research, Animal Breeding and Genomics, Wageningen, The Netherlands; Department of Veterinary and Animal Sciences, Faculty of Health and Medical Sciences, University of Copenhagen, Copenhagen, Denmark.
- Goutam Sahana
- Department of Molecular Biology and Genetics, Center for Quantitative Genetics and Genomics, Aarhus University, Tjele, Denmark
- Guosheng Su
- Department of Molecular Biology and Genetics, Center for Quantitative Genetics and Genomics, Aarhus University, Tjele, Denmark
- Bernt Guldbrandtsen
- Department of Molecular Biology and Genetics, Center for Quantitative Genetics and Genomics, Aarhus University, Tjele, Denmark
- Mogens Sandø Lund
- Department of Molecular Biology and Genetics, Center for Quantitative Genetics and Genomics, Aarhus University, Tjele, Denmark
- Mario P L Calus
- Wageningen University and Research, Animal Breeding and Genomics, Wageningen, The Netherlands
83
Giuffra E, Tuggle CK. Functional Annotation of Animal Genomes (FAANG): Current Achievements and Roadmap. Annu Rev Anim Biosci 2018; 7:65-88. [PMID: 30427726 DOI: 10.1146/annurev-animal-020518-114913]
Abstract
Functional annotation of genomes is a prerequisite for contemporary basic and applied genomic research, yet farmed animal genomics is deficient in such annotation. To address this, the FAANG (Functional Annotation of Animal Genomes) Consortium is producing genome-wide data sets on RNA expression, DNA methylation, and chromatin modification, as well as chromatin accessibility and interactions. In addition to informing our understanding of genome function, including comparative approaches to elucidate constrained sequence or epigenetic elements, these annotation maps will improve the precision and sensitivity of genomic selection for animal improvement. A scientific community-driven effort has already created a coordinated data collection and analysis enterprise crucial for the success of this global effort. Although it is early in this continuing process, functional data have already been produced and application to genetic improvement reported. The functional annotation delivered by the FAANG initiative will add value and utility to the greatly improved genome sequences being established for domesticated animal species.
Affiliation(s)
- Elisabetta Giuffra
- Génétique Animale et Biologie Intégrative (GABI), Institut National de la Recherche Agronomique (INRA), AgroParisTech, Université Paris Saclay, 78350 Jouy-en-Josas, France;
84
Raymond B, Bouwman AC, Wientjes YCJ, Schrooten C, Houwing-Duistermaat J, Veerkamp RF. Genomic prediction for numerically small breeds, using models with pre-selected and differentially weighted markers. Genet Sel Evol 2018; 50:49. [PMID: 30314431 PMCID: PMC6186145 DOI: 10.1186/s12711-018-0419-5] [Received: 06/20/2018] [Accepted: 10/01/2018]
Abstract
BACKGROUND Genomic prediction (GP) accuracy in numerically small breeds is limited by the small size of the reference population. Our objective was to test a multi-breed multiple genomic relationship matrices (GRM) GP model (MBMG) that weights pre-selected markers separately, uses the remaining markers to explain the remaining genetic variance that can be explained by markers, and weights information from breeds in the reference population by their genetic correlation with the validation breed. METHODS Genotype and phenotype data were used for 595 Jersey bulls from New Zealand and 5503 Holstein bulls from the Netherlands, all with deregressed proofs for stature. Different sets of markers were used, containing either pre-selected markers from a meta-genome-wide association analysis on stature, remaining markers or both. We implemented a multi-breed bivariate GREML model in which we fitted either a single multi-breed GRM (MBSG), or two distinct multi-breed GRM (MBMG), one made with pre-selected markers and the other with remaining markers. Accuracies of predicting stature for Jersey individuals using the multi-breed models (Holstein and Jersey combined reference population) were compared to those obtained using either the Jersey (within-breed) or Holstein (across-breed) reference population. All the models were subsequently fitted in the analysis of simulated phenotypes, with a simulated genetic correlation between breeds of 1, 0.5, and 0.25. RESULTS The MBMG model always gave better prediction accuracies for stature compared to MBSG, within-, and across-breed GP models. For example, with MBSG, accuracies obtained by fitting 48,912 unselected markers (0.43), 357 pre-selected markers (0.38) or a combination of both (0.43) were lower than accuracies obtained by fitting pre-selected and unselected markers in separate GRM in MBMG (0.49). 
This improvement was further confirmed by results from a simulation study, with MBMG performing on average 23% better than MBSG with all markers fitted. CONCLUSIONS With the MBMG model, it is possible to use information from numerically large breeds to improve prediction accuracy of numerically small breeds. The superiority of MBMG is mainly due to its ability to use information on pre-selected markers, explain the remaining genetic variance and weight information from a different breed by the genetic correlation between breeds.
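The core idea of MBMG is to build one genomic relationship matrix from the pre-selected markers and a second one from the remaining markers, and this step can be sketched in Python. The abstract does not state how the GRMs were computed. This sketch assumes VanRaden's first method: markers are coded 0/1/2, centred by twice the allele frequency and scaled by 2Σp(1−p). The function names are hypothetical. Fitting the two matrices as separate random effects in a bivariate GREML model, with breeds linked through their genetic correlation, requires dedicated REML software and is not shown.

```python
from typing import List, Sequence, Tuple

Matrix = List[List[float]]


def vanraden_grm(M: Sequence[Sequence[float]]) -> Matrix:
    """G = ZZ' / (2 * sum p(1-p)), with Z = M - 2p (animals x markers, coded 0/1/2)."""
    n, m = len(M), len(M[0])
    p = [sum(row[j] for row in M) / (2.0 * n) for j in range(m)]
    denom = 2.0 * sum(pj * (1.0 - pj) for pj in p)
    if denom == 0.0:
        raise ValueError("all markers are monomorphic")
    Z = [[row[j] - 2.0 * p[j] for j in range(m)] for row in M]
    return [[sum(za * zb for za, zb in zip(Z[a], Z[b])) / denom for b in range(n)]
            for a in range(n)]


def two_grms(M: Sequence[Sequence[float]], selected: Sequence[int]) -> Tuple[Matrix, Matrix]:
    """One GRM from the pre-selected marker columns, one from all remaining columns."""
    chosen = set(selected)
    sel_cols = [j for j in range(len(M[0])) if j in chosen]
    rest_cols = [j for j in range(len(M[0])) if j not in chosen]

    def columns(cols: List[int]) -> List[List[float]]:
        return [[row[j] for j in cols] for row in M]

    return vanraden_grm(columns(sel_cols)), vanraden_grm(columns(rest_cols))
```

In this study the first matrix would come from a few hundred markers (357 pre-selected) and the second from tens of thousands. At that scale a vectorised array library would be preferable to these nested loops, but the arithmetic stays the same.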
Affiliation(s)
- Biaty Raymond
- Animal Breeding and Genomics, Wageningen University and Research, P.O. Box 338, 6700 AH Wageningen, The Netherlands
- Biometris, Wageningen University and Research, 6700 AA Wageningen, The Netherlands
- Aniek C. Bouwman
- Animal Breeding and Genomics, Wageningen University and Research, P.O. Box 338, 6700 AH Wageningen, The Netherlands
- Yvonne C. J. Wientjes
- Animal Breeding and Genomics, Wageningen University and Research, P.O. Box 338, 6700 AH Wageningen, The Netherlands
- Jeanine Houwing-Duistermaat
- Department of Medical Statistics and Bioinformatics, Leiden University Medical Centre, 2333 ZC Leiden, The Netherlands
- School of Mathematics, Faculty of Mathematics and Physical Sciences, University of Leeds, Leeds, LS2 9JT UK
- Roel F. Veerkamp
- Animal Breeding and Genomics, Wageningen University and Research, P.O. Box 338, 6700 AH Wageningen, The Netherlands
85
Cai Z, Guldbrandtsen B, Lund MS, Sahana G. Prioritizing candidate genes post-GWAS using multiple sources of data for mastitis resistance in dairy cattle. BMC Genomics 2018; 19:656. [PMID: 30189836 PMCID: PMC6127918 DOI: 10.1186/s12864-018-5050-x] [Received: 06/18/2018] [Accepted: 08/31/2018]
Abstract
Background Improving resistance to mastitis, one of the costliest diseases in dairy production, has become an important objective in dairy cattle breeding. However, mastitis resistance is influenced by many genes involved in multiple processes, including the response to infection, inflammation, and post-infection healing. Low genetic heritability, environmental variations, and farm management differences further complicate the identification of links between genetic variants and mastitis resistance. Consequently, studies of the genetics of variation in mastitis resistance in dairy cattle lack agreement about the responsible genes. Results We associated 15,552,968 imputed whole-genome sequencing markers for 5147 Nordic Holstein cattle with mastitis resistance in a genome-wide association study (GWAS). Next, we augmented P-values for markers in genes in the associated regions using Gene Ontology terms, Kyoto Encyclopedia of Genes and Genomes pathway analysis, and mammalian phenotype database. To confirm results of gene-based analyses, we used gene expression data from E. coli-challenged cow udders. We identified 22 independent quantitative trait loci (QTL) that collectively explained 14% of the variance in breeding values for resistance to clinical mastitis (CM). Using association test statistics with multiple pieces of independent information on gene function and differential expression during bacterial infection, we suggested putative causal genes with biological relevance for 12 QTL affecting resistance to CM in dairy cattle. Conclusion Combining information on the nearest positional genes, gene-based analyses, and differential gene expression data from RNA-seq, we identified putative causal genes (candidate genes with biological evidence) in QTL for mastitis resistance in Nordic Holstein cattle. The same strategy can be applied for other traits. 
Affiliation(s)
- Zexi Cai
- Center for Quantitative Genetics and Genomics, Department of Molecular Biology and Genetics, Aarhus University, 8830, Tjele, Denmark.
- Bernt Guldbrandtsen
- Center for Quantitative Genetics and Genomics, Department of Molecular Biology and Genetics, Aarhus University, 8830, Tjele, Denmark
- Mogens Sandø Lund
- Center for Quantitative Genetics and Genomics, Department of Molecular Biology and Genetics, Aarhus University, 8830, Tjele, Denmark
- Goutam Sahana
- Center for Quantitative Genetics and Genomics, Department of Molecular Biology and Genetics, Aarhus University, 8830, Tjele, Denmark
86
Kroezen V, Schenkel F, Miglior F, Baes C, Squires E. Candidate gene association analyses for ketosis resistance in Holsteins. J Dairy Sci 2018; 101:5240-5249. [DOI: 10.3168/jds.2017-13374] [Received: 06/21/2017] [Accepted: 02/14/2018]
87
Zhang C, Kemp RA, Stothard P, Wang Z, Boddicker N, Krivushin K, Dekkers J, Plastow G. Genomic evaluation of feed efficiency component traits in Duroc pigs using 80K, 650K and whole-genome sequence variants. Genet Sel Evol 2018; 50:14. [PMID: 29625549 PMCID: PMC5889553 DOI: 10.1186/s12711-018-0387-9] [Received: 07/01/2017] [Accepted: 03/27/2018]
Abstract
BACKGROUND Increasing marker density was proposed to have potential to improve the accuracy of genomic prediction for quantitative traits; whole-sequence data is expected to give the best accuracy of prediction, since all causal mutations that underlie a trait are expected to be included. However, in cattle and chicken, this assumption is not supported by empirical studies. Our objective was to compare the accuracy of genomic prediction of feed efficiency component traits in Duroc pigs using single nucleotide polymorphism (SNP) panels of 80K, imputed 650K, and whole-genome sequence variants using GBLUP, BayesB and BayesRC methods, with the ultimate purpose to determine the optimal method to increase genetic gain for feed efficiency in pigs. RESULTS Phenotypes of average daily feed intake (ADFI), average daily gain (ADG), ultrasound backfat depth (FAT), and loin muscle depth (LMD) were available for 1363 Duroc boars from a commercial breeding program. Genotype imputation accuracies reached 92.1% from 80K to 650K and 85.6% from 650K to whole-genome sequence variants. Average accuracies across methods and marker densities of genomic prediction of ADFI, FAT, LMD and ADG were 0.40, 0.65, 0.30 and 0.15, respectively. For ADFI and FAT, BayesB outperformed GBLUP, but increasing marker density had little advantage for genomic prediction. For ADG and LMD, GBLUP outperformed BayesB, while BayesRC based on whole-genome sequence data gave the best accuracies and reached up to 0.35 for LMD and 0.25 for ADG. CONCLUSIONS Use of genomic information was beneficial for prediction of ADFI and FAT but not for that of ADG and LMD compared to pedigree-based estimates. BayesB based on 80K SNPs gave the best genomic prediction accuracy for ADFI and FAT, while BayesRC based on whole-genome sequence data performed best for ADG and LMD. 
We suggest that these differences between traits in the effect of marker density and method on accuracy of genomic prediction are mainly due to the underlying genetic architecture of the traits.
Affiliation(s)
- Chunyan Zhang
- Department of Agricultural, Food and Nutritional Science, University of Alberta, Edmonton, AB, T6G 2R3, Canada
- Paul Stothard
- Department of Agricultural, Food and Nutritional Science, University of Alberta, Edmonton, AB, T6G 2R3, Canada
- Zhiquan Wang
- Department of Agricultural, Food and Nutritional Science, University of Alberta, Edmonton, AB, T6G 2R3, Canada
- Kirill Krivushin
- Department of Agricultural, Food and Nutritional Science, University of Alberta, Edmonton, AB, T6G 2R3, Canada
- Jack Dekkers
- Department of Animal Science, Iowa State University, Ames, IA, 50011, USA
- Graham Plastow
- Department of Agricultural, Food and Nutritional Science, University of Alberta, Edmonton, AB, T6G 2R3, Canada.
88
Song H, Li L, Ma P, Zhang S, Su G, Lund MS, Zhang Q, Ding X. Short communication: Improving the accuracy of genomic prediction of body conformation traits in Chinese Holsteins using markers derived from high-density marker panels. J Dairy Sci 2018; 101:5250-5254. [PMID: 29550139 DOI: 10.3168/jds.2017-13456] [Received: 07/07/2017] [Accepted: 11/25/2017]
Abstract
This study investigated the efficiency of genomic prediction after adding markers identified by a genome-wide association study (GWAS), using a data set of high-density (HD) markers imputed from 54K markers in Chinese Holsteins. Among 3,056 Chinese Holsteins with imputed HD data, 2,401 individuals born before October 1, 2009, were used for GWAS and as a reference population for genomic prediction, and the 220 younger cows were used as a validation population. In total, 1,403, 1,536, and 1,383 significant single nucleotide polymorphisms (SNP; false discovery rate at 0.05) associated with conformation final score, mammary system, and feet and legs were identified, respectively. These significant SNP explained about 2 to 3% of the genetic variance of the 3 traits. Only a very small proportion of the significant SNP identified by GWAS was included in the 54K marker panel. Three new marker sets (54K+) were produced by adding the significant SNP obtained by a linear mixed model for each trait to the 54K marker panel. Genomic breeding values were predicted using a Bayesian variable selection (BVS) model. The accuracies of genomic breeding values by BVS based on the 54K+ data were 2.0 to 5.2% higher than those based on the 54K data. The imputed HD markers yielded 1.4% higher accuracy on average (BVS) than the 54K data. Both the 54K+ and HD data generated lower bias of genomic prediction, and the 54K+ data yielded the lowest bias in all situations. Our results show that the imputed HD data were not very useful for improving the accuracy of genomic prediction and that adding the significant markers derived from the imputed HD marker panel could improve the accuracy of genomic prediction and decrease the bias of genomic prediction.
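Building the 54K+ marker sets amounts to selecting GWAS hits at a false discovery rate of 0.05 and appending those not already on the 54K panel. The abstract does not name the FDR procedure. The Python sketch below assumes the Benjamini-Hochberg step-up procedure, and the SNP identifiers and function names are hypothetical.

```python
from typing import Dict, List, Sequence


def benjamini_hochberg(pvalues: Dict[str, float], fdr: float = 0.05) -> List[str]:
    """Return the SNPs declared significant by the Benjamini-Hochberg step-up procedure."""
    ranked = sorted(pvalues.items(), key=lambda item: item[1])
    m = len(ranked)
    cutoff = 0
    for rank, (_, p) in enumerate(ranked, start=1):
        if p <= rank * fdr / m:
            cutoff = rank  # largest rank meeting its bound; all smaller ranks are kept too
    return [snp for snp, _ in ranked[:cutoff]]


def augment_panel(panel: Sequence[str], significant: Sequence[str]) -> List[str]:
    """Panel markers followed by significant SNPs that are not already on the panel."""
    on_panel = set(panel)
    return list(panel) + [snp for snp in significant if snp not in on_panel]


if __name__ == "__main__":
    gwas = {"s1": 0.001, "s2": 0.008, "s3": 0.034, "s4": 0.036, "s5": 0.6}
    hits = benjamini_hochberg(gwas)
    print(hits)
    print(augment_panel(["chip_1", "s2"], hits))
```

In the toy p-values, s3 exceeds its own bound (3 × 0.05 / 5 = 0.03) but is still retained, because s4, ranked after it, meets its bound of 0.04. This is the step-up property that distinguishes the procedure from a per-SNP threshold.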
Affiliation(s)
- H Song
- Laboratory of Animal Genetics, Breeding and Reproduction, Ministry of Agriculture of China, National Engineering Laboratory for Animal Breeding, College of Animal Science and Technology, China Agricultural University, Beijing 100193, P.R. China
- L Li
- Laboratory of Animal Genetics, Breeding and Reproduction, Ministry of Agriculture of China, National Engineering Laboratory for Animal Breeding, College of Animal Science and Technology, China Agricultural University, Beijing 100193, P.R. China
- P Ma
- Laboratory of Animal Genetics, Breeding and Reproduction, Ministry of Agriculture of China, National Engineering Laboratory for Animal Breeding, College of Animal Science and Technology, China Agricultural University, Beijing 100193, P.R. China; Department of Molecular Biology and Genetics, Aarhus University, DK-8830 Tjele, Denmark; Department of Animal Science, School of Agriculture and Biology, Shanghai Jiao Tong University, Shanghai 200240, P.R. China
- S Zhang
- Laboratory of Animal Genetics, Breeding and Reproduction, Ministry of Agriculture of China, National Engineering Laboratory for Animal Breeding, College of Animal Science and Technology, China Agricultural University, Beijing 100193, P.R. China
- G Su
- Department of Molecular Biology and Genetics, Aarhus University, DK-8830 Tjele, Denmark
- M S Lund
- Department of Molecular Biology and Genetics, Aarhus University, DK-8830 Tjele, Denmark
- Q Zhang
- Laboratory of Animal Genetics, Breeding and Reproduction, Ministry of Agriculture of China, National Engineering Laboratory for Animal Breeding, College of Animal Science and Technology, China Agricultural University, Beijing 100193, P.R. China
- X Ding
- Laboratory of Animal Genetics, Breeding and Reproduction, Ministry of Agriculture of China, National Engineering Laboratory for Animal Breeding, College of Animal Science and Technology, China Agricultural University, Beijing 100193, P.R. China.
89
Jardim JG, Guldbrandtsen B, Lund MS, Sahana G. Association analysis for udder index and milking speed with imputed whole-genome sequence variants in Nordic Holstein cattle. J Dairy Sci 2017; 101:2199-2212. [PMID: 29274975 DOI: 10.3168/jds.2017-12982] [Received: 04/04/2017] [Accepted: 10/30/2017]
Abstract
Genome-wide association testing facilitates the identification of genetic variants associated with complex traits. Mapping genes that promote genetic resistance to mastitis could reduce the cost of antibiotic use and enhance animal welfare and milk production by improving outcomes of breeding for udder health. Using imputed whole-genome sequence variants, we carried out association studies for 2 traits related to udder health, namely udder index and milking speed, in Nordic Holstein cattle. A total of 4,921 bulls genotyped with the BovineSNP50 BeadChip array were imputed to high-density genotypes (Illumina BovineHD BeadChip, Illumina, San Diego, CA) and, subsequently, to whole-genome sequence variants. An association analysis was carried out using a linear mixed model. Phenotypes used in the association analyses were deregressed breeding values. Multitrait meta-analysis was carried out for these 2 traits. We identified 10 and 8 chromosomes harboring markers that were significantly associated with udder index and milking speed, respectively. The strongest association signals were observed on chromosome 20 for udder index and chromosome 19 for milking speed. Multitrait meta-analysis identified 13 chromosomes harboring associated markers for the combination of udder index and milking speed. The associated region on chromosome 20 overlapped with previously reported quantitative trait loci for similar traits in other cattle populations. Moreover, this region was located close to the FYB gene, which is involved in platelet activation and controls IL-2 expression; FYB is a strong candidate gene for udder health and worthy of further investigation.
Affiliation(s)
- Júlia Gazzoni Jardim
- Department of Molecular Biology and Genetics, Center for Quantitative Genetics and Genomics, Aarhus University, 8830 Tjele, Denmark; Laboratory of Reproduction and Animal Breeding, State University of North Fluminense Darcy Ribeiro, Av. Alberto Lamego, 2000 Parque California, Campos dos Goytacazes, RJ, 28013-602, Brazil
- Bernt Guldbrandtsen
- Department of Molecular Biology and Genetics, Center for Quantitative Genetics and Genomics, Aarhus University, 8830 Tjele, Denmark
- Mogens Sandø Lund
- Department of Molecular Biology and Genetics, Center for Quantitative Genetics and Genomics, Aarhus University, 8830 Tjele, Denmark
- Goutam Sahana
- Department of Molecular Biology and Genetics, Center for Quantitative Genetics and Genomics, Aarhus University, 8830 Tjele, Denmark
90
Pausch H, Emmerling R, Gredler-Grandl B, Fries R, Daetwyler HD, Goddard ME. Meta-analysis of sequence-based association studies across three cattle breeds reveals 25 QTL for fat and protein percentages in milk at nucleotide resolution. BMC Genomics 2017; 18:853. [PMID: 29121857 PMCID: PMC5680815 DOI: 10.1186/s12864-017-4263-8] [Received: 06/02/2017] [Accepted: 11/02/2017]
Abstract
Background Genotyping and whole-genome sequencing data have been generated for hundreds of thousands of cattle. International consortia used these data to compile imputation reference panels that facilitate the imputation of sequence variant genotypes for animals that have been genotyped using dense microarrays. Association studies with imputed sequence variant genotypes allow the characterization of quantitative trait loci (QTL) at nucleotide resolution, particularly when individuals from several breeds are included in the mapping populations. Results We imputed genotypes for 28 million sequence variants in 17,229 cattle of the Braunvieh, Fleckvieh and Holstein breeds in order to compile large mapping populations that provide high power to identify QTL for milk production traits. Association tests between imputed sequence variant genotypes and fat and protein percentages in milk uncovered between six and thirteen QTL (P < 1e-8) per breed. Eight of the detected QTL were significant in more than one breed. We combined the results across breeds using meta-analysis and identified a total of 25 QTL, including six that were not significant in the within-breed association studies. Two missense mutations in the ABCG2 (p.Y581S, rs43702337, P = 4.3e-34) and GHR (p.F279Y, rs385640152, P = 1.6e-74) genes were the top variants at QTL on chromosomes 6 and 20. Another known causal missense mutation in the DGAT1 gene (p.A232K, rs109326954, P = 8.4e-1436) was the second top variant at a QTL on chromosome 14, but its allelic substitution effects were inconsistent across breeds. The conflicting allelic substitution effects turned out to result from flaws in the imputed genotypes, caused by the use of a multi-breed reference population for genotype imputation. Conclusions Many QTL for milk production traits segregate across breeds, and across-breed meta-analysis has greater power to detect such QTL than within-breed association testing.
Association testing between imputed sequence variant genotypes and phenotypes of interest facilitates the identification of causal mutations, provided that the accuracy of imputation is high. However, true causal mutations may remain undetected when the imputed sequence variant genotypes contain flaws. It is highly recommended to validate the effect of known causal variants in order to assess the ability to detect true causal mutations in association studies with imputed sequence variants.
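The abstract does not say which meta-analysis method was used to combine results across breeds. As a hedged illustration, the sketch below shows a standard fixed-effect inverse-variance meta-analysis for one variant. It also computes Cochran's Q, a heterogeneity statistic that is large when per-breed allele substitution effects disagree, as reported for DGAT1. The function names and inputs are hypothetical, and the per-breed effects must refer to the same reference allele.

```python
import math


def ivw_meta(effects, ses):
    """Fixed-effect inverse-variance meta-analysis of one variant across breeds.

    effects: per-breed allele substitution effect estimates (same allele coding)
    ses:     their standard errors
    Returns (pooled_effect, pooled_se, z, two_sided_p).
    """
    weights = [1.0 / (s * s) for s in ses]
    total = sum(weights)
    pooled = sum(w * b for w, b in zip(weights, effects)) / total
    pooled_se = math.sqrt(1.0 / total)
    z = pooled / pooled_se
    p = math.erfc(abs(z) / math.sqrt(2.0))  # two-sided standard normal tail
    return pooled, pooled_se, z, p


def cochran_q(effects, ses, pooled):
    """Cochran's Q: sum of squared standardized deviations from the pooled effect."""
    return sum(((b - pooled) / s) ** 2 for b, s in zip(effects, ses))
```

Effects of opposite sign cancel in the pooled estimate but inflate Q. This is why a large Q for a known causal variant can point to the kind of imputation problem the authors describe.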
Affiliation(s)
- Hubert Pausch
- Animal Genomics, Institute of Agricultural Sciences, ETH Zurich, 8092, Zurich, Switzerland; Agriculture Research Division, Agriculture Victoria, Department of Economic Development, Jobs, Transport and Resources, AgriBio, VIC, 3083, Australia
- Reiner Emmerling
- Institute of Animal Breeding, Bavarian State Research Center for Agriculture, 85586, Grub, Germany
- Ruedi Fries
- Animal Breeding, Technische Universitaet Muenchen, 85354, Freising, Germany
- Hans D Daetwyler
- Agriculture Research Division, Agriculture Victoria, Department of Economic Development, Jobs, Transport and Resources, AgriBio, VIC, 3083, Australia; School of Applied Systems Biology, La Trobe University, Bundoora, VIC, 3083, Australia
- Michael E Goddard
- Agriculture Research Division, Agriculture Victoria, Department of Economic Development, Jobs, Transport and Resources, AgriBio, VIC, 3083, Australia; Faculty of Veterinary and Agricultural Sciences, University of Melbourne, Melbourne, VIC, 3010, Australia
91
van den Berg I, Bowman PJ, MacLeod IM, Hayes BJ, Wang T, Bolormaa S, Goddard ME. Multi-breed genomic prediction using Bayes R with sequence data and dropping variants with a small effect. Genet Sel Evol 2017; 49:70. [PMID: 28934948 PMCID: PMC5609075 DOI: 10.1186/s12711-017-0347-9] [Received: 04/04/2017] [Accepted: 09/13/2017]
Abstract
Background The increasing availability of whole-genome sequence data is expected to increase the accuracy of genomic prediction. However, results from simulation studies and analyses of real data do not always show an increase in accuracy from sequence data compared to high-density (HD) single nucleotide polymorphism (SNP) chip genotypes. In addition, the sheer number of variants makes analysis of all variants and accurate estimation of all effects computationally challenging. Our objective was to find a strategy to approximate the analysis of whole-sequence data with a Bayesian variable selection model. Using a simulated dataset, we applied a Bayes R hybrid model to analyse whole-sequence data, tested the effect of dropping a proportion of variants during the analysis, and tested how the analysis can be split into separate analyses per chromosome to reduce the elapsed computing time. We also investigated the effect of imputation errors on prediction accuracy. Subsequently, we applied the approach to a dataset that contained imputed sequences and records for production and fertility traits for 38,492 Holstein, Jersey, Australian Red and crossbred bulls and cows. Results With the simulated dataset, prediction accuracy for a breed that was not represented in the training population was markedly higher with sequence data than with HD SNP data. Either dropping part of the variants during the analysis or splitting the analysis into separate analyses per chromosome decreased accuracy compared to analysing whole-sequence data. However, first dropping variants from each chromosome and then reanalysing the retained variants together resulted in an accuracy similar to that obtained when analysing whole-sequence data. Adding imputation errors decreased prediction accuracy, especially for errors in the validation population. With real data, using sequence variants resulted in accuracies similar to those obtained with the HD SNPs.
Conclusions We present an efficient approach to approximate the analysis of whole-sequence data with a Bayesian variable selection model. The lack of increase in prediction accuracy when the approach was applied to real data could be due to imputation errors, which demonstrates the importance of developing more accurate imputation methods or of directly genotyping sequence variants that have a major effect in the prediction equation.
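BayesR assigns each variant to one of several normal mixture components. In the wider BayesR literature these commonly have variances of 0, 0.0001, 0.001 and 0.01 times the genetic variance. That setting is an assumption here, because the abstract does not state it. The sketch below shows only the per-variant step of assigning posterior probabilities to the components, with the effect integrated out. It is not the authors' hybrid algorithm. The `keep_variant` rule for dropping small-effect variants is a hypothetical illustration, and all names are made up for the example.

```python
import math

# Assumed BayesR component variances, as fractions of the genetic variance.
VARIANCE_FRACTIONS = [0.0, 1e-4, 1e-3, 1e-2]


def component_probabilities(r, xtx, sigma2_e, sigma2_g, pi):
    """Posterior membership probabilities of one variant in each component.

    r:        x' y*, with y* the phenotypes corrected for all other effects
    xtx:      x' x for the variant's centred genotype column
    sigma2_e: residual variance; sigma2_g: additive genetic variance
    pi:       prior mixing proportions, all > 0, one per component
    """
    log_terms = []
    for frac, prior in zip(VARIANCE_FRACTIONS, pi):
        if frac == 0.0:
            log_lik = 0.0  # reference component: no effect
        else:
            s2 = frac * sigma2_g
            # Log marginal likelihood ratio versus the null component,
            # with the effect integrated out under N(0, s2).
            log_lik = (-0.5 * math.log(1.0 + s2 * xtx / sigma2_e)
                       + 0.5 * r * r / (sigma2_e * (xtx + sigma2_e / s2)))
        log_terms.append(math.log(prior) + log_lik)
    top = max(log_terms)  # subtract the maximum for numerical stability
    weights = [math.exp(t - top) for t in log_terms]
    total = sum(weights)
    return [w / total for w in weights]


def keep_variant(probs, threshold=0.05):
    """Hypothetical dropping rule: keep if P(non-zero effect) exceeds threshold."""
    return 1.0 - probs[0] > threshold
```

In a full Gibbs sampler, a component would be drawn from these probabilities in every iteration, followed by the effect itself. Averaging the non-zero membership over iterations gives the quantity a rule like `keep_variant` could threshold.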
Affiliation(s)
- Irene van den Berg
- Faculty of Veterinary and Agricultural Science, University of Melbourne, Parkville, VIC, Australia
- Phil J Bowman
- Agriculture Victoria, AgriBio, Centre for AgriBioscience, Bundoora, VIC, 3083, Australia; School of Applied Systems Biology, La Trobe University, Bundoora, VIC, 3083, Australia
- Iona M MacLeod
- Agriculture Victoria, AgriBio, Centre for AgriBioscience, Bundoora, VIC, 3083, Australia
- Ben J Hayes
- Agriculture Victoria, AgriBio, Centre for AgriBioscience, Bundoora, VIC, 3083, Australia; Queensland Alliance for Agriculture and Food Innovation, Centre for Animal Science, University of Queensland, St Lucia, QLD, Australia
- Tingting Wang
- Agriculture Victoria, AgriBio, Centre for AgriBioscience, Bundoora, VIC, 3083, Australia
- Sunduimijid Bolormaa
- Agriculture Victoria, AgriBio, Centre for AgriBioscience, Bundoora, VIC, 3083, Australia
- Mike E Goddard
- Faculty of Veterinary and Agricultural Science, University of Melbourne, Parkville, VIC, Australia; Agriculture Victoria, AgriBio, Centre for AgriBioscience, Bundoora, VIC, 3083, Australia
92
Fang L, Sahana G, Ma P, Su G, Yu Y, Zhang S, Lund MS, Sørensen P. Use of biological priors enhances understanding of genetic architecture and genomic prediction of complex traits within and between dairy cattle breeds. BMC Genomics 2017; 18:604. [PMID: 28797230 PMCID: PMC5553760 DOI: 10.1186/s12864-017-4004-z] [Received: 12/18/2016] [Accepted: 08/02/2017]
Abstract
Background A better understanding of the genetic architecture underlying complex traits (e.g., the distribution of causal variants and their effects) may aid genomic prediction. Here, we hypothesized that the genomic variants affecting complex traits might be enriched in a subset of genomic regions defined by genes grouped on the basis of “Gene Ontology” (GO), and that incorporating this independent biological information into genomic prediction models might improve their predictive ability. Results Four complex traits (i.e., milk, fat and protein yields, and mastitis) together with imputed sequence variants in Holstein (HOL) and Jersey (JER) cattle were analysed. We first carried out a post-GWAS analysis in a HOL training population to assess the degree of enrichment of the association signals in the gene regions defined by each GO term. We then extended the genomic best linear unbiased prediction model (GBLUP) to a genomic feature BLUP (GFBLUP) model, including an additional genomic effect quantifying the joint effect of a group of variants located in a genomic feature. The GBLUP model using a single random effect assumes that all genomic variants contribute equally to the genomic relationship, whereas GFBLUP attributes different weights to the individual genomic relationships in the prediction equation based on the estimated genomic parameters. Our results demonstrate that the immune-relevant GO terms were more associated with mastitis than with milk production, and several biologically meaningful GO terms improved the prediction accuracy with GFBLUP for the four traits, as compared with GBLUP. The improvement in genomic prediction between breeds (average increase across the four traits: 0.161) was more pronounced than within HOL (average increase across the four traits: 0.020).
Conclusions Our genomic feature modelling approaches provide a framework to simultaneously explore the genetic architecture and genomic prediction of complex traits by taking advantage of independent biological knowledge.
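GFBLUP, as described above, splits the genomic effect into two random effects. One, g_f ~ N(0, G_f σ²_f), covers variants inside a genomic feature such as the genes of one GO term. The other, g_r ~ N(0, G_r σ²_r), covers the remaining variants. The sketch below builds the two relationship matrices. It assumes VanRaden-style centring, allele frequencies from the same animals, and scaling of each matrix by its own SNP subset; the paper's exact scaling is not given in the abstract. All function names are hypothetical, and fitting the model is not shown.

```python
def allele_freqs(genotypes):
    """Frequency of the counted allele for each SNP (genotypes coded 0/1/2)."""
    n, m = len(genotypes), len(genotypes[0])
    return [sum(row[j] for row in genotypes) / (2.0 * n) for j in range(m)]


def grm(genotypes, snp_indices, freqs):
    """VanRaden-style GRM from a subset of SNPs: G = Z Z' / (2 * sum p(1 - p))."""
    n = len(genotypes)
    scale = 2.0 * sum(freqs[j] * (1.0 - freqs[j]) for j in snp_indices)
    if scale == 0.0:
        raise ValueError("no polymorphic SNPs in this subset")
    # Centre each genotype by twice the allele frequency of its SNP.
    z = [[row[j] - 2.0 * freqs[j] for j in snp_indices] for row in genotypes]
    return [[sum(a * b for a, b in zip(z[i], z[k])) / scale for k in range(n)]
            for i in range(n)]


def feature_split(genotypes, feature_snps):
    """Return (G_feature, G_rest) for SNPs inside and outside a genomic feature."""
    freqs = allele_freqs(genotypes)
    inside = sorted(set(feature_snps))
    inside_set = set(inside)
    outside = [j for j in range(len(freqs)) if j not in inside_set]
    return grm(genotypes, inside, freqs), grm(genotypes, outside, freqs)
```

Because the two unscaled cross-products Z_f Z_f' and Z_r Z_r' add up to Z Z', GFBLUP can reweight the feature's share of the relationships. GBLUP, by contrast, fixes that share in proportion to the number of SNPs.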
Affiliation(s)
- Lingzhao Fang
- Department of Molecular Biology and Genetics, Center for Quantitative Genetics and Genomics, Aarhus University, 8830, Tjele, Denmark; Key Laboratory of Animal Genetics, Breeding and Reproduction, Ministry of Agriculture & National Engineering Laboratory for Animal Breeding, College of Animal Science and Technology, China Agricultural University, Beijing, 100193, China
- Goutam Sahana
- Department of Molecular Biology and Genetics, Center for Quantitative Genetics and Genomics, Aarhus University, 8830, Tjele, Denmark
- Peipei Ma
- Department of Molecular Biology and Genetics, Center for Quantitative Genetics and Genomics, Aarhus University, 8830, Tjele, Denmark
- Guosheng Su
- Department of Molecular Biology and Genetics, Center for Quantitative Genetics and Genomics, Aarhus University, 8830, Tjele, Denmark
- Ying Yu
- Key Laboratory of Animal Genetics, Breeding and Reproduction, Ministry of Agriculture & National Engineering Laboratory for Animal Breeding, College of Animal Science and Technology, China Agricultural University, Beijing, 100193, China
- Shengli Zhang
- Key Laboratory of Animal Genetics, Breeding and Reproduction, Ministry of Agriculture & National Engineering Laboratory for Animal Breeding, College of Animal Science and Technology, China Agricultural University, Beijing, 100193, China
- Mogens Sandø Lund
- Department of Molecular Biology and Genetics, Center for Quantitative Genetics and Genomics, Aarhus University, 8830, Tjele, Denmark
- Peter Sørensen
- Department of Molecular Biology and Genetics, Center for Quantitative Genetics and Genomics, Aarhus University, 8830, Tjele, Denmark
93
Fragomeni BO, Lourenco DAL, Masuda Y, Legarra A, Misztal I. Incorporation of causative quantitative trait nucleotides in single-step GBLUP. Genet Sel Evol 2017; 49:59. [PMID: 28747171 PMCID: PMC5530494 DOI: 10.1186/s12711-017-0335-0] [Received: 03/28/2017] [Accepted: 07/17/2017]
Abstract
Background Much effort is put into identifying causative quantitative trait nucleotides (QTN) in animal breeding, empowered by the availability of dense single nucleotide polymorphism (SNP) information. Genomic selection using traditional SNP information is easily implemented for any number of genotyped individuals using single-step genomic best linear unbiased predictor (ssGBLUP) with the algorithm for proven and young (APY). Our aim was to investigate whether ssGBLUP is useful for genomic prediction when some or all QTN are known. Methods Simulations included 180,000 animals across 11 generations. Phenotypes were available for all animals in generations 6 to 10. Genotypes for 60,000 SNPs across 10 chromosomes were available for 29,000 individuals. The genetic variance was fully accounted for by 100 or 1000 biallelic QTN. Raw genomic relationship matrices (GRM) were computed from (a) unweighted SNPs, (b) unweighted SNPs and causative QTN, (c) SNPs and causative QTN weighted with results obtained with genome-wide association studies, (d) unweighted SNPs and causative QTN with simulated weights, (e) only unweighted causative QTN, (f–h) as in (b–d) but using only the top 10% causative QTN, and (i) using only causative QTN with simulated weight. Predictions were computed by pedigree-based BLUP (PBLUP) and ssGBLUP. Raw GRM were blended with 1 or 5% of the numerator relationship matrix, or 1% of the identity matrix. Inverses of GRM were obtained directly or with APY. Results Accuracy of breeding values for 5000 genotyped animals in the last generation with PBLUP was 0.32, and for ssGBLUP it increased to 0.49 with an unweighted GRM, 0.53 after adding unweighted QTN, 0.63 when QTN weights were estimated, and 0.89 when QTN weights were based on true effects known from the simulation. When the GRM was constructed from causative QTN only, accuracy was 0.95 and 0.99 with blending at 5 and 1%, respectively. 
Accuracies when simulating 1000 QTN were generally lower, with a similar trend. Accuracies using the APY inverse were equal to or higher than those with a regular inverse. Conclusions Single-step GBLUP can account for causative QTN via a weighted GRM. Accuracy gains are largest when variances of causative QTN are known and blending is at 1%.
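The weighted and blended GRMs listed in the Methods can be sketched as G_w = Z D Z' / (2 * sum p(1 - p)), with D a diagonal matrix of SNP or QTN weights, followed by G_b = (1 - alpha) G_w + alpha * A22 or (1 - alpha) G_w + alpha * I. The sketch assumes VanRaden-style centring and weights rescaled to average 1. Normalisation conventions vary, and the abstract does not state the one used. APY inversion is not shown, and the function names are hypothetical.

```python
def weighted_grm(genotypes, weights):
    """Weighted GRM G = Z D Z' / (2 * sum p(1 - p)) with D = diag(weights).

    Weights are rescaled to average 1, so unit weights give the unweighted GRM.
    """
    n, m = len(genotypes), len(genotypes[0])
    p = [sum(row[j] for row in genotypes) / (2.0 * n) for j in range(m)]
    z = [[row[j] - 2.0 * p[j] for j in range(m)] for row in genotypes]
    mean_w = sum(weights) / m
    d = [w / mean_w for w in weights]
    scale = 2.0 * sum(pj * (1.0 - pj) for pj in p)
    return [[sum(z[i][j] * d[j] * z[k][j] for j in range(m)) / scale
             for k in range(n)] for i in range(n)]


def blend(g, other, alpha):
    """Blend a raw GRM with another matrix (A22 or I): (1 - alpha) G + alpha * other."""
    n = len(g)
    return [[(1.0 - alpha) * g[i][k] + alpha * other[i][k] for k in range(n)]
            for i in range(n)]


def identity(n):
    """n x n identity matrix as nested lists."""
    return [[1.0 if i == k else 0.0 for k in range(n)] for i in range(n)]
```

Blending makes a singular GRM invertible. In the example, the third animal has an all-zero row before blending. A smaller alpha keeps more of the QTN-weighted information, which is consistent with the higher accuracies the abstract reports at 1% blending.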
Affiliation(s)
- Breno O Fragomeni
- Edgar L. Rhodes Center for Animal and Dairy Science, University of Georgia, Athens, GA, USA
- Daniela A L Lourenco
- Edgar L. Rhodes Center for Animal and Dairy Science, University of Georgia, Athens, GA, USA
- Yutaka Masuda
- Edgar L. Rhodes Center for Animal and Dairy Science, University of Georgia, Athens, GA, USA
- Andres Legarra
- GenPhySE, INRA, INPT, INP-ENVT, Université de Toulouse, 31326, Castanet-Tolosan, France
- Ignacy Misztal
- Edgar L. Rhodes Center for Animal and Dairy Science, University of Georgia, Athens, GA, USA
94
Wu X, Guldbrandtsen B, Nielsen US, Lund MS, Sahana G. Association analysis for young stock survival index with imputed whole-genome sequence variants in Nordic Holstein cattle. J Dairy Sci 2017; 100:6356-6370. [PMID: 28551195 DOI: 10.3168/jds.2017-12688] [Received: 02/04/2017] [Accepted: 04/05/2017]
Abstract
Identification of the genetic variants associated with calf survival in dairy cattle will aid in the elimination of harmful mutations from the cattle population and the reduction of calf and young stock mortality rates. We used de-regressed estimated breeding values for the young stock survival (YSS) index as response variables in a genome-wide association study with imputed whole-genome sequence variants. A total of 4,610 bulls with estimated breeding values were genotyped with the Illumina BovineSNP50 (Illumina, San Diego, CA) single nucleotide polymorphism (SNP) genotyping array. Genotypes were imputed to whole-genome sequence variants. After quality control, 15,419,550 SNP on 29 Bos taurus autosomes (BTA) were used for association analysis. A modified mixed-model association analysis was used for a genome scan, followed by a linear mixed-model analysis for selected genetic variants. We identified 498 SNP on BTA5 and BTA18 that were associated with the YSS index in Nordic Holstein. The SNP rs440345507 (Chr5:94721790) on BTA5 was the putative causal mutation affecting YSS. Two haplotype-based models were used to identify haplotypes with the largest detrimental effects on YSS index. For each association signal, 1 haplotype region with harmful effects and the lead associated SNP were identified. Detected haplotypes on BTA5 and BTA18 explained 1.16 and 1.20%, respectively, of genetic variance for the YSS index. We examined whether YSS quantitative trait loci (QTL) on BTA5 and BTA18 were associated with stillbirth. YSS QTL on BTA18 overlapped a QTL region for stillbirth, but most likely 2 different causal variants were responsible for these 2 QTL. Four component traits of the YSS index, defined by sex and age, were analyzed separately by the modified mixed-model approach. The same genomic regions were associated with both bull and heifer calf mortality. 
Several genes (EPS8, LOC100138951, and KLK family genes) contained a lead associated SNP or were included in haplotypes with large detrimental effects on YSS in Nordic Holstein cattle.
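Statements such as "haplotypes on BTA5 and BTA18 explained 1.16 and 1.20% of genetic variance" rest on a variance share. For a single biallelic additive locus, the textbook form of that share is 2p(1 - p)α² / σ²_g. The abstract does not say how the haplotype-based models computed it, so treating a haplotype as a biallelic carrier/non-carrier locus is an assumption here. The sketch and its function name are illustrative only.

```python
def share_of_genetic_variance(p, alpha, sigma2_g):
    """Fraction of additive genetic variance explained by one biallelic locus.

    p:        frequency of the counted allele (or haplotype, under the assumption above)
    alpha:    allele substitution effect on the trait scale
    sigma2_g: total additive genetic variance of the trait
    """
    return 2.0 * p * (1.0 - p) * alpha ** 2 / sigma2_g
```

The formula shows why a haplotype with a large detrimental effect can still explain only a small share of variance: a rare haplotype has a small 2p(1 - p) term.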
Affiliation(s)
- Xiaoping Wu
- Center for Quantitative Genetics and Genomics, Department of Molecular Biology and Genetics, Aarhus University, 8830 Tjele, Denmark
- Bernt Guldbrandtsen
- Center for Quantitative Genetics and Genomics, Department of Molecular Biology and Genetics, Aarhus University, 8830 Tjele, Denmark
- Ulrik Sander Nielsen
- Livestock Innovation, SEGES, Danish Agricultural and Food Council F.m.b.A, 8200 Aarhus, Denmark
- Mogens Sandø Lund
- Center for Quantitative Genetics and Genomics, Department of Molecular Biology and Genetics, Aarhus University, 8830 Tjele, Denmark
- Goutam Sahana
- Center for Quantitative Genetics and Genomics, Department of Molecular Biology and Genetics, Aarhus University, 8830 Tjele, Denmark
95
Lopes MS, Bovenhuis H, van Son M, Nordbø Ø, Grindflek EH, Knol EF, Bastiaansen JWM. Using markers with large effect in genetic and genomic predictions. J Anim Sci 2017; 95:59-71. [PMID: 28177367 DOI: 10.2527/jas.2016.0754]
Abstract
The first attempts at applying marker-assisted selection (MAS) in animal breeding were not very successful because identifying markers closely linked to QTL with low-density microsatellite panels was difficult. More recently, the use of high-density SNP panels in genome-wide association studies (GWAS) has increased the power and precision of identifying markers linked to QTL, offering new possibilities for MAS. However, by the time GWAS became common, many breeders had already shifted their focus from MAS to genomic selection (using all available markers without any preselection of markers linked to QTL). In this study, we aimed to evaluate the prediction accuracy of a MAS approach that accounts for GWAS findings by including the most significant SNP from GWAS as a fixed effect in the marker-assisted BLUP (MA-BLUP) and marker-assisted genomic BLUP (MA-GBLUP) prediction models. A second aim was to compare the prediction accuracies from the marker-assisted models with those obtained from a Bayesian variable selection (BVS) model. To compare the prediction accuracies of traditional BLUP, MA-BLUP, genomic BLUP (GBLUP), MA-GBLUP, and BVS, we applied these models to the trait "number of teats" in 4 distinct pig populations to validate the results. The most significant SNP in each population was located at approximately 103.50 Mb on chromosome 7. Accounting for the most significant SNP in the prediction models improved prediction accuracy for number of teats in all evaluated populations compared with BLUP and GBLUP. Using MA-BLUP instead of BLUP, the increase in prediction accuracy ranged from 0.021 to 0.124, whereas using MA-GBLUP instead of GBLUP, the increase ranged from 0.003 to 0.043. The BVS model resulted in similar or higher prediction accuracies than MA-GBLUP.
For the trait number of teats, BLUP resulted in the lowest prediction accuracies, whereas the highest were observed when applying MA-GBLUP or BVS. In the same data set, MA-BLUP can yield similar or superior accuracies compared with GBLUP. The superiority of MA-GBLUP over traditional GBLUP is more pronounced when training populations are smaller and when relationships between training and validation populations are weaker. Marker-assisted GBLUP did not outperform BVS but does have implementation advantages in large-scale evaluations.
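MA-GBLUP, as described above, adds the genotype of the most significant GWAS SNP as a fixed covariate to GBLUP. The model is y = 1μ + sβ + u + e, with u ~ N(0, G σ²_u) and e ~ N(0, I σ²_e). The minimal sketch below assumes one record per animal, known variance components and a positive-definite G. It uses generalized least squares followed by BLUP of u, which gives the same solutions as Henderson's mixed model equations. The function names are hypothetical.

```python
def solve(a, b):
    """Solve a x = b for square a by Gaussian elimination with partial pivoting."""
    n = len(a)
    m = [[float(v) for v in row] + [float(b[i])] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def ma_gblup(y, snp, g, sigma2_u, sigma2_e):
    """Fit y = mu + snp * beta + u + e; return ([mu, beta], u_hat).

    V = G sigma2_u + I sigma2_e; b = (X'V^-1 X)^-1 X'V^-1 y;
    u_hat = sigma2_u * G V^-1 (y - X b).
    """
    n = len(y)
    v = [[sigma2_u * g[i][k] + (sigma2_e if i == k else 0.0) for k in range(n)]
         for i in range(n)]
    x = [[1.0, float(s)] for s in snp]  # intercept + top-SNP genotype (0/1/2)
    vinv_y = solve(v, y)
    vinv_x = [solve(v, [row[c] for row in x]) for c in range(2)]  # columns of V^-1 X
    xtvx = [[sum(x[i][r] * vinv_x[c][i] for i in range(n)) for c in range(2)]
            for r in range(2)]
    xtvy = [sum(x[i][r] * vinv_y[i] for i in range(n)) for r in range(2)]
    b = solve(xtvx, xtvy)
    resid = [y[i] - (b[0] * x[i][0] + b[1] * x[i][1]) for i in range(n)]
    vinv_r = solve(v, resid)
    u = [sigma2_u * sum(g[i][k] * vinv_r[k] for k in range(n)) for i in range(n)]
    return b, u
```

Fitting the SNP as a fixed effect estimates its effect without the shrinkage GBLUP applies to all markers. This matches the abstract's observation that the gain is larger with small training populations, where that shrinkage is strongest.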
96
Selecting sequence variants to improve genomic predictions for dairy cattle. Genet Sel Evol 2017; 49:32. [PMID: 28270096 PMCID: PMC5339980 DOI: 10.1186/s12711-017-0307-4] [Received: 09/27/2016] [Accepted: 02/27/2017]
Abstract
Background Millions of genetic variants have been identified by population-scale sequencing projects, but subsets of these variants are needed for routine genomic predictions or genotyping arrays. Methods for selecting sequence variants were compared using simulated sequence genotypes and real July 2015 data from the 1000 Bull Genomes Project. Methods Candidate sequence variants for 444 Holstein animals were combined with high-density (HD) imputed genotypes for 26,970 progeny-tested Holstein bulls. Test 1 included single nucleotide polymorphisms (SNPs) for 481,904 candidate sequence variants. Test 2 also included 249,966 insertions-deletions (InDels). After merging sequence variants with 312,614 HD SNPs and editing steps, Tests 1 and 2 included 762,588 and 1,003,453 variants, respectively. Imputation quality of the findhap software was assessed with 404 of the sequenced animals in the reference population and 40 randomly chosen animals for validation. The sequence genotypes of the validation animals were reduced to the subset of genotypes in common with HD genotypes and then imputed back to sequence. Predictions were tested for 33 traits using 2015 data of 3983 US validation bulls with daughters that were first phenotyped after August 2011. Results The average percentage of correctly imputed variants across all chromosomes was 97.2 for Test 1 and 97.0 for Test 2. Total time required to prepare, edit, impute, and estimate the effects of sequence variants for 27,235 bulls was about 1 week using fewer than 33 threads. Many sequence variants had larger estimated effects than nearby HD SNPs, but prediction reliability improved by only 0.6 percentage points in Test 1 when sequence SNPs were added to HD SNPs and by 0.4 percentage points in Test 2 when sequence SNPs and InDels were included. However, selecting the 16,648 candidate SNPs with the largest estimated effects and adding them to the 60,671 SNPs used in routine evaluations improved reliabilities by 2.7 percentage points.
Conclusions Reliabilities for genomic predictions improved when selected sequence variants were added; gains were similar for simulated and real data for the same population, and larger than previous gains obtained by adding HD SNPs. With many genotyped animals, many data sources, and millions of variants, computing strategies must efficiently balance the costs of imputation, selection, and prediction to obtain subsets of markers that provide the highest accuracy.
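The imputation check in the Methods reduces validation animals' sequence genotypes to the HD subset, imputes them back, and compares the result with the truth. That check comes down to a concordance count over the masked variants. The minimal sketch below assumes genotypes coded 0/1/2 and counts concordance per genotype. How findhap itself reports accuracy may differ, and the function name is hypothetical.

```python
def percent_correct(true_genotypes, imputed_genotypes, masked):
    """Percentage of masked genotypes that were imputed correctly.

    true_genotypes, imputed_genotypes: per-animal lists of genotype codes (0/1/2)
    masked: indices of variants hidden before imputation (i.e. not on the HD panel)
    """
    checked = correct = 0
    for truth, guess in zip(true_genotypes, imputed_genotypes):
        for j in masked:
            checked += 1
            correct += truth[j] == guess[j]  # True counts as 1
    return 100.0 * correct / checked
```

Only masked variants are scored. The HD genotypes that were kept are correct by construction and would inflate the figure.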
97
Le TH, Christensen OF, Nielsen B, Sahana G. Genome-wide association study for conformation traits in three Danish pig breeds. Genet Sel Evol 2017; 49:12. [PMID: 28118822 PMCID: PMC5259967 DOI: 10.1186/s12711-017-0289-2] [Received: 08/20/2016] [Accepted: 01/12/2017]
Abstract
Background Selection for sound conformation has been widely used as a primary approach to reduce lameness and leg weakness in pigs. Identification of genomic regions that affect conformation traits would help to improve selection accuracy for these lowly to moderately heritable traits. Our objective was to identify genetic factors that underlie leg and back conformation traits in three Danish pig breeds by performing a genome-wide association study followed by meta-analyses. Methods Data on four conformation traits (front leg, back, hind leg and overall conformation) for three Danish pig breeds (23,898 Landrace, 24,130 Yorkshire and 16,524 Duroc pigs) were used for association analyses. Estimated effects of single nucleotide polymorphisms (SNPs) from single-trait association analyses were combined in two meta-analyses: (1) a within-breed meta-analysis for multiple traits to examine if there are pleiotropic genetic variants within a breed; and (2) an across-breed meta-analysis for a single trait to examine if the same quantitative trait loci (QTL) segregate across breeds. SNP annotation was performed with Sus scrofa Build 10.2 on Ensembl to search for candidate genes. Results Among the 14, 12 and 13 QTL that were detected in the single-trait association analyses for the three breeds, the most significant SNPs explained 2, 2.3 and 11.4% of genetic variance for back quality in Landrace, overall conformation in Yorkshire and back quality in Duroc, respectively. Several candidate genes for these QTL were also identified, i.e. LRPPRC, WRAP73, VRTN and PPARD likely control conformation traits through the regulation of bone and muscle development, and IGF2BP2, GH1, CCND2 and MSH2 can have an influence through growth-related processes. Meta-analyses not only confirmed many significant SNPs from single-trait analyses with higher significance levels, but also detected several additional associated SNPs and suggested QTL with possible pleiotropic effects.
Conclusions Our results imply that conformation traits are complex and may be partly controlled by genes that are involved in bone and skeleton development, muscle and fat metabolism, and growth processes. A reliable list of QTL and candidate genes is provided that can be used in fine-mapping and marker-assisted selection to improve conformation traits in pigs.
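The across-breed meta-analysis is not specified in the abstract. A common scheme when breeds differ in size is the sample-size-weighted Z-score method, the approach METAL offers as an alternative to inverse-variance weighting. The sketch below assumes that scheme and per-breed Z statistics aligned to the same reference allele. It is an illustration, not the authors' method, and the function name is hypothetical.

```python
import math


def weighted_z_meta(z_scores, sample_sizes):
    """Sample-size-weighted Z-score meta-analysis of one SNP across breeds.

    z_scores:     signed per-breed Z statistics (same reference allele in every breed)
    sample_sizes: number of animals analysed in each breed
    Returns (combined_z, two_sided_p).
    """
    weights = [math.sqrt(n) for n in sample_sizes]
    combined = (sum(w * z for w, z in zip(weights, z_scores))
                / math.sqrt(sum(w * w for w in weights)))
    p = math.erfc(abs(combined) / math.sqrt(2.0))  # two-sided normal tail
    return combined, p
```

With the breed sizes reported above, the call would be `weighted_z_meta([z_landrace, z_yorkshire, z_duroc], [23898, 24130, 16524])`, where the three Z values come from the single-trait analyses. A QTL segregating in the same direction in several breeds gains significance, while opposite signs cancel.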
Affiliation(s)
- Thu H Le
- Department of Molecular Biology and Genetics, Center for Quantitative Genetics and Genomics, Aarhus University, Tjele, Denmark; Department of Animal Breeding and Genetics, Swedish University of Agricultural Sciences, Uppsala, Sweden
- Ole F Christensen
- Department of Molecular Biology and Genetics, Center for Quantitative Genetics and Genomics, Aarhus University, Tjele, Denmark
- Bjarne Nielsen
- SEGES Pig Research Centre, Axeltorv, Copenhagen, Denmark
- Goutam Sahana
- Department of Molecular Biology and Genetics, Center for Quantitative Genetics and Genomics, Aarhus University, Tjele, Denmark
98
Ni G, Cavero D, Fangmann A, Erbe M, Simianer H. Whole-genome sequence-based genomic prediction in laying chickens with different genomic relationship matrices to account for genetic architecture. Genet Sel Evol 2017; 49:8. [PMID: 28093063 PMCID: PMC5238523 DOI: 10.1186/s12711-016-0277-y] [Received: 12/10/2015] [Accepted: 12/05/2016]
Abstract
Background With the availability of next-generation sequencing technologies, genomic prediction based on whole-genome sequencing (WGS) data is now feasible in animal breeding schemes and was expected to lead to higher predictive ability, since such data may contain all genomic variants including causal mutations. Our objective was to compare prediction ability with high-density (HD) array data and WGS data in a commercial brown layer line with genomic best linear unbiased prediction (GBLUP) models using various approaches to weight single nucleotide polymorphisms (SNPs). Methods A total of 892 chickens from a commercial brown layer line were genotyped with 336 K segregating SNPs (array data) that included 157 K genic SNPs (i.e. SNPs in or around a gene). For these individuals, genome-wide sequence information was imputed based on data from re-sequencing runs of 25 individuals, leading to 5.2 million (M) imputed SNPs (WGS data), including 2.6 M genic SNPs. De-regressed proofs (DRP) for eggshell strength, feed intake and laying rate were used as quasi-phenotypic data in genomic prediction analyses. Four weighting factors for building a trait-specific genomic relationship matrix were investigated: identical weights, −(log10P) from genome-wide association study results, squares of SNP effects from random regression BLUP, and variable selection based weights (known as BLUP|GA). Predictive ability was measured as the correlation between DRP and direct genomic breeding values in five replications of a fivefold cross-validation. Results Averaged over the three traits, the highest predictive ability (0.366 ± 0.075) was obtained when only genic SNPs from WGS data were used. Predictive abilities with genic SNPs and all SNPs from HD array data were 0.361 ± 0.072 and 0.353 ± 0.074, respectively. 
Prediction with −(log10P) or squares of SNP effects as weighting factors for building a genomic relationship matrix or BLUP|GA did not increase accuracy, compared to that with identical weights, regardless of the SNP set used. Conclusions Our results show that little or no benefit was gained when using all imputed WGS data to perform genomic prediction compared to using HD array data regardless of the weighting factors tested. However, using only genic SNPs from WGS data had a positive effect on prediction ability. Electronic supplementary material The online version of this article (doi:10.1186/s12711-016-0277-y) contains supplementary material, which is available to authorized users.
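The trait-specific genomic relationship matrix described in this abstract can be illustrated with a small sketch. This is not the authors' code. It assumes the common VanRaden-style construction G = ZDZ'/Σ2p(1−p), with genotypes coded 0/1/2, Z centred by twice the allele frequency, and D a diagonal matrix of SNP weights rescaled to a mean of 1 (an assumption: the abstract does not say how the weights were normalised). Only the −(log10P) and squared-SNP-effect weights are shown; BLUP|GA is not sketched. All function names and the toy genotypes are hypothetical.

```python
import math


def weights_from_pvalues(pvalues):
    """-log10(P) weights from GWAS p-values (more significant SNP -> larger weight)."""
    return [-math.log10(pv) for pv in pvalues]


def weights_from_effects(effects):
    """Squared SNP effects, e.g. effects estimated by random regression BLUP."""
    return [b * b for b in effects]


def weighted_grm(genotypes, weights):
    """Weighted G = Z D Z' / sum_j 2 p_j (1 - p_j).

    genotypes: rows = animals, columns = SNPs, coded 0/1/2.
    weights:   one non-negative weight per SNP, not all zero; rescaled here to
               mean 1 (assumption) so that identical weights give the unweighted G.
    Assumes at least one SNP is polymorphic, otherwise the scaling factor is 0.
    """
    n, m = len(genotypes), len(genotypes[0])
    p = [sum(row[j] for row in genotypes) / (2.0 * n) for j in range(m)]
    mean_w = sum(weights) / m
    d = [w / mean_w for w in weights]  # diagonal of D
    scale = sum(2 * pj * (1 - pj) for pj in p)
    # Centre each genotype by 2p, so every column of Z sums to zero
    z = [[row[j] - 2 * p[j] for j in range(m)] for row in genotypes]
    return [[sum(z[i][k] * d[k] * z[l][k] for k in range(m)) / scale
             for l in range(n)]
            for i in range(n)]


# Toy example: 3 animals x 3 SNPs (hypothetical data)
geno = [[0, 1, 2], [1, 1, 0], [2, 0, 1]]
g_equal = weighted_grm(geno, [1.0, 1.0, 1.0])  # identical weights
g_gwas = weighted_grm(geno, weights_from_pvalues([0.01, 1.0, 0.1]))
```

Rescaling the weights to a mean of 1 keeps the overall scale of G comparable across weighting schemes; without it, the −(log10P) and squared-effect matrices would differ from the unweighted one by an arbitrary constant as well as by their relative SNP emphasis.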
Affiliation(s)
- Guiyan Ni
  - Animal Breeding and Genetics Group, Georg-August-Universität, Göttingen, Germany
- Anna Fangmann
  - Animal Breeding and Genetics Group, Georg-August-Universität, Göttingen, Germany
- Malena Erbe
  - Animal Breeding and Genetics Group, Georg-August-Universität, Göttingen, Germany; Institute for Animal Breeding, Bavarian State Research Centre for Agriculture, Grub, Germany
- Henner Simianer
  - Animal Breeding and Genetics Group, Georg-August-Universität, Göttingen, Germany
99
Veerkamp RF, Bouwman AC, Schrooten C, Calus MPL. Genomic prediction using preselected DNA variants from a GWAS with whole-genome sequence data in Holstein-Friesian cattle. Genet Sel Evol 2016; 48:95. [PMID: 27905878 PMCID: PMC5134274 DOI: 10.1186/s12711-016-0274-1] [Citation(s) in RCA: 67] [Impact Index Per Article: 8.4] [Received: 07/14/2016] [Accepted: 11/24/2016] [Indexed: 11/10/2022]
Abstract
Background Whole-genome sequence data is expected to capture genetic variation more completely than common genotyping panels. Our objective was to compare the proportion of variance explained and the accuracy of genomic prediction by using imputed sequence data or preselected SNPs from a genome-wide association study (GWAS) with imputed whole-genome sequence data. Methods Phenotypes were available for 5503 Holstein–Friesian bulls. Genotypes were imputed up to whole-genome sequence (13,789,029 segregating DNA variants) by using run 4 of the 1000 Bull Genomes Project. The program GCTA was used to perform GWAS for protein yield (PY), somatic cell score (SCS) and interval from first to last insemination (IFL). From the GWAS, subsets of variants were selected and genomic relationship matrices (GRM) were used to estimate the variance explained in 2087 validation animals and to evaluate the genomic prediction ability. Finally, two GRM were fitted together in several models to evaluate the effect of selected variants that were in competition with all the other variants. Results The GRM based on full sequence data explained only marginally more genetic variation than that based on common SNP panels: for PY, SCS and IFL, genomic heritability improved from 0.81 to 0.83, 0.83 to 0.87 and 0.69 to 0.72, respectively. Sequence data also helped to identify more variants linked to quantitative trait loci and resulted in clearer GWAS peaks across the genome. The proportion of total variance explained by the selected variants combined in a GRM was considerably smaller than that explained by all variants (less than 0.31 for all traits). When selected variants were used, accuracy of genomic predictions decreased and bias increased. 
Conclusions Although 35 to 42 variants were detected that together explained 13 to 19% of the total variance (18 to 23% of the genetic variance) when fitted alone, there was no advantage in using dense sequence information for genomic prediction in the Holstein data used in our study. Detection and selection of variants within a single breed are difficult due to long-range linkage disequilibrium. Stringent selection of variants resulted in more biased genomic predictions, although this might be due to the training population being the same dataset from which the selected variants were identified. Electronic supplementary material The online version of this article (doi:10.1186/s12711-016-0274-1) contains supplementary material, which is available to authorized users.
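The preselection strategy in this abstract can be sketched roughly as follows: rank variants by GWAS p-value, keep the top k, and build one GRM from the selected variants and a second GRM from all remaining variants, so that both can be fitted together as separate random effects in a REML/GBLUP model (the model fitting is not shown). This is a minimal illustration under stated assumptions, not the authors' pipeline: the VanRaden-style scaling, the `top_k` cut-off, the helper names and the toy data are all hypothetical. Because the abstract suggests the extra bias may stem from selecting variants in the same data used for training, the p-values here are assumed to come from a separate discovery set.

```python
def split_by_gwas(pvalues, top_k):
    """Return (indices of the top_k most significant variants, indices of the rest)."""
    order = sorted(range(len(pvalues)), key=lambda j: pvalues[j])
    return sorted(order[:top_k]), sorted(order[top_k:])


def grm(genotypes, columns):
    """VanRaden-style G built only from the given variant columns (genotypes coded 0/1/2).

    Assumes at least one of the chosen columns is polymorphic.
    """
    n = len(genotypes)
    p = {j: sum(row[j] for row in genotypes) / (2.0 * n) for j in columns}
    scale = sum(2 * p[j] * (1 - p[j]) for j in columns)
    # Centred genotypes restricted to the chosen variants
    z = [[row[j] - 2 * p[j] for j in columns] for row in genotypes]
    return [[sum(a * b for a, b in zip(z[i], z[l])) / scale for l in range(n)]
            for i in range(n)]


# Toy example: 3 animals x 4 variants (hypothetical data)
geno = [[0, 1, 2, 0], [1, 1, 0, 2], [2, 0, 1, 1]]
gwas_p = [0.5, 1e-8, 0.03, 1e-5]  # hypothetical p-values from an independent discovery set
top, rest = split_by_gwas(gwas_p, top_k=2)
g_top = grm(geno, top)    # GRM from preselected variants
g_rest = grm(geno, rest)  # GRM from all other variants
```

Keeping the remaining variants in their own matrix, rather than discarding them, is what lets the selected variants "compete" with the rest of the genome for variance, as described in the Methods.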
Affiliation(s)
- Roel F Veerkamp
  - Animal Breeding and Genomics Centre, Wageningen UR Livestock Research, P.O. Box 338, 6700 AH, Wageningen, The Netherlands; Department of Animal and Aquacultural Sciences, Norwegian University of Life Sciences, P.O. Box 5003, 1432, Ås, Norway
- Aniek C Bouwman
  - Animal Breeding and Genomics Centre, Wageningen UR Livestock Research, P.O. Box 338, 6700 AH, Wageningen, The Netherlands
- Mario P L Calus
  - Animal Breeding and Genomics Centre, Wageningen UR Livestock Research, P.O. Box 338, 6700 AH, Wageningen, The Netherlands
100
van den Berg I, Boichard D, Lund MS. Sequence variants selected from a multi-breed GWAS can improve the reliability of genomic predictions in dairy cattle. Genet Sel Evol 2016; 48:83. [PMID: 27809758 PMCID: PMC5095991 DOI: 10.1186/s12711-016-0259-0] [Citation(s) in RCA: 43] [Impact Index Per Article: 5.4] [Received: 12/29/2015] [Accepted: 10/19/2016] [Indexed: 01/01/2023]
Abstract
Background Sequence data can potentially increase the reliability of genomic predictions, because such data include causative mutations instead of relying on linkage disequilibrium (LD) between causative mutations and prediction variants. However, the location of the causative mutations is not known, and the presence of many variants that are in low LD with the causative mutations may reduce prediction reliability. Our objective was to investigate whether the use of variants at quantitative trait loci (QTL) that are identified in a multi-breed genome-wide association study (GWAS) for milk, fat and protein yield would increase the reliability of within- and multi-breed genomic predictions in Holstein, Jersey and Danish Red cattle. A wide range of scenarios that test different strategies to select prediction markers, for both within-breed and multi-breed prediction, were compared. Results For all breeds and traits, the use of variants selected from a multi-breed GWAS resulted in substantial increases in prediction reliabilities compared to within-breed prediction using a 50 K SNP array. Reliabilities depended highly on the choice of the prediction markers, and the scenario that led to the highest reliability varied between breeds and traits. While genomic correlations across breeds were low for genome-wide sequence variants, the effects of the QTL variants that yielded the highest reliabilities were highly correlated across breeds. Conclusions Our results show that the use of sequence variants, which are located near peaks of QTL that are detected in a multi-breed GWAS, can increase reliability of genomic predictions. Electronic supplementary material The online version of this article (doi:10.1186/s12711-016-0259-0) contains supplementary material, which is available to authorized users.
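The abstract does not spell out the exact rule used to pick variants "near peaks of QTL". One generic way to reduce GWAS output to a short list of peak variants is greedy distance-based clumping: sort the significant variants by p-value and keep a variant only if no stronger variant already kept lies on the same chromosome within a fixed window. The sketch below assumes that rule; the p-value threshold, the window size, the function name and the toy variants are hypothetical and are not values from the study.

```python
def pick_peak_variants(variants, p_threshold, window_bp):
    """Greedy distance-based clumping of GWAS hits.

    variants: list of (chromosome, position_bp, pvalue) tuples.
    Returns one lead variant per peak, most significant first.
    """
    # Only variants passing the (hypothetical) significance threshold, strongest first
    candidates = sorted((v for v in variants if v[2] < p_threshold), key=lambda v: v[2])
    peaks = []
    for chrom, pos, pval in candidates:
        # Keep this variant only if no stronger peak on the same chromosome is within the window
        if all(pk_chrom != chrom or abs(pk_pos - pos) > window_bp
               for pk_chrom, pk_pos, _ in peaks):
            peaks.append((chrom, pos, pval))
    return peaks


# Toy multi-breed GWAS results (hypothetical)
variants = [
    ("1", 1_000_000, 1e-10),
    ("1", 1_200_000, 1e-12),
    ("1", 5_000_000, 1e-9),
    ("2", 1_100_000, 1e-11),
    ("1", 9_000_000, 0.2),
]
peaks = pick_peak_variants(variants, p_threshold=1e-6, window_bp=500_000)
```

Here the chromosome 1 variant at 1.0 Mb is dropped because the stronger hit at 1.2 Mb already represents that peak, and the variant at 9.0 Mb never enters because it does not pass the threshold.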
Affiliation(s)
- Irene van den Berg
  - Department of Molecular Biology and Genetics, Faculty of Science and Technology, Center for Quantitative Genetics and Genomics, Aarhus University, 8830, Tjele, Denmark; GABI, INRA, AgroParisTech, Université Paris Saclay, 78350, Jouy-en-Josas, France
- Didier Boichard
  - GABI, INRA, AgroParisTech, Université Paris Saclay, 78350, Jouy-en-Josas, France
- Mogens S Lund
  - Department of Molecular Biology and Genetics, Faculty of Science and Technology, Center for Quantitative Genetics and Genomics, Aarhus University, 8830, Tjele, Denmark