1
Nilson SM, Burke JM, Murdoch BM, Morgan JLM, Lewis RM. Pedigree diversity and implications for genetic selection of Katahdin sheep. J Anim Breed Genet 2024; 141:304-316. [PMID: 38108572 DOI: 10.1111/jbg.12842] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/02/2023] [Revised: 11/30/2023] [Accepted: 12/02/2023] [Indexed: 12/19/2023]
Abstract
The Katahdin hair breed gained popularity in the United States as a low-input, prolific breed with a propensity to exhibit parasite resistance. With the introduction of genomically enhanced estimated breeding values (GEBV) to the Katahdin genetic evaluation, defining the diversity present in the breed is pertinent. Utilizing pedigree records (n = 92,030) from 1984 to 2019 from the National Sheep Improvement Program, our objectives were to (i) estimate the completeness and quality of the pedigree, (ii) calculate diversity statistics for the whole pedigree and relevant reference subpopulations and (iii) assess the impact of current diversity on genomic selection. Reference 1 was Katahdins born from 2017 to 2019 (n = 23,494), while reference 2 was a subset with at least three generations of Katahdin ancestry (n = 9327). The completeness of the whole pedigree, and the pedigrees of reference 1 and reference 2, were above 50% through the fourth, fifth and seventh generation of ancestors, respectively. Effective population size (Ne) averaged 111 animals with a range from 42.2 to 451.0. The average generation interval was 2.9 years for the whole pedigree and reference 1, and 2.8 years for reference 2. The mean individual inbreeding and average relatedness coefficients were 1.62% and 0.91%, 1.74% and 0.90% and 2.94% and 1.46% for the whole pedigree, reference 1, and reference 2, respectively. There were over 300 effective founders in the whole pedigree and reference 1, with 169 in reference 2. The effective number of ancestors was over 150 for the whole pedigree and reference 1, and 67 for reference 2. Prediction accuracies increased as the reference population grew from 1k to 7.5k and plateaued at 15k animals. Given the large number of founders and ancestors contributing to the base genetic variation in the breed, the Ne is sufficient to maintain diversity while achieving progress with selection. Stable low rates of inbreeding and relatedness suggest that incorporating genetic conservation in breeding decisions is currently not of high priority. Current Ne suggests that with limited genotyping, high levels of accuracy for genomic prediction can be achieved. However, intense selection on GEBV may cause loss of genetic diversity long term.
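As a rough orientation for the figures above, effective population size and the per-generation rate of inbreeding are commonly linked by the textbook relation below. The abstract does not say which estimator the authors used, so the formula and the symbols \Delta F and F_t are assumptions introduced here for illustration only.

```latex
N_e = \frac{1}{2\,\Delta F},
\qquad
\Delta F = \frac{F_t - F_{t-1}}{1 - F_{t-1}}
```

Under this relation, the reported average Ne of 111 would correspond to \Delta F = 1/(2 x 111), roughly 0.0045, or about 0.45% per generation.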
Affiliation(s)
- Sara M Nilson
  - Department of Animal Science, University of Nebraska-Lincoln, Lincoln, Nebraska, USA
- Joan M Burke
  - USDA, ARS, Dale Bumpers Small Farms Research Center, Booneville, Arkansas, USA
- Brenda M Murdoch
  - Department of Animal, Veterinary and Food Science, University of Idaho, Moscow, Idaho, USA
- Ronald M Lewis
  - Department of Animal Science, University of Nebraska-Lincoln, Lincoln, Nebraska, USA
2
Alemu A, Åstrand J, Montesinos-López OA, Isidro Y Sánchez J, Fernández-González J, Tadesse W, Vetukuri RR, Carlsson AS, Ceplitis A, Crossa J, Ortiz R, Chawade A. Genomic selection in plant breeding: Key factors shaping two decades of progress. Molecular Plant 2024; 17:552-578. [PMID: 38475993 DOI: 10.1016/j.molp.2024.03.007] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/03/2023] [Revised: 01/22/2024] [Accepted: 03/08/2024] [Indexed: 03/14/2024]
Abstract
Genomic selection, the application of genomic prediction (GP) models to select candidate individuals, has significantly advanced in the past two decades, effectively accelerating genetic gains in plant breeding. This article provides a holistic overview of key factors that have influenced GP in plant breeding during this period. We delved into the pivotal roles of training population size and genetic diversity, and their relationship with the breeding population, in determining GP accuracy. Special emphasis was placed on optimizing training population size. We explored its benefits and the associated diminishing returns beyond an optimum size, while considering the balance between resource allocation and maximizing prediction accuracy through current optimization algorithms. The density and distribution of single-nucleotide polymorphisms, level of linkage disequilibrium, genetic complexity, trait heritability, statistical machine-learning methods, and non-additive effects are the other vital factors. Using wheat, maize, and potato as examples, we summarize the effect of these factors on the accuracy of GP for various traits. The search for high accuracy in GP (theoretically reaching one when using Pearson's correlation as a metric) is an active research area, and accuracy is as yet far from optimal for various traits. We hypothesize that with ultra-high sizes of genotypic and phenotypic datasets, effective training population optimization methods and support from other omics approaches (transcriptomics, metabolomics and proteomics) coupled with deep-learning algorithms could overcome the boundaries of current limitations to achieve the highest possible prediction accuracy, making genomic selection an effective tool in plant breeding.
Affiliation(s)
- Admas Alemu
  - Department of Plant Breeding, Swedish University of Agricultural Sciences, Alnarp, Sweden
- Johanna Åstrand
  - Department of Plant Breeding, Swedish University of Agricultural Sciences, Alnarp, Sweden
  - Lantmännen Lantbruk, Svalöv, Sweden
- Julio Isidro Y Sánchez
  - Centro de Biotecnología y Genómica de Plantas (CBGP, UPM-INIA), Universidad Politécnica de Madrid (UPM) - Instituto Nacional de Investigación y Tecnología Agraria y Alimentaria (INIA), Campus de Montegancedo-UPM, 28223 Madrid, Spain
- Javier Fernández-González
  - Centro de Biotecnología y Genómica de Plantas (CBGP, UPM-INIA), Universidad Politécnica de Madrid (UPM) - Instituto Nacional de Investigación y Tecnología Agraria y Alimentaria (INIA), Campus de Montegancedo-UPM, 28223 Madrid, Spain
- Wuletaw Tadesse
  - International Center for Agricultural Research in the Dry Areas (ICARDA), Rabat, Morocco
- Ramesh R Vetukuri
  - Department of Plant Breeding, Swedish University of Agricultural Sciences, Alnarp, Sweden
- Anders S Carlsson
  - Department of Plant Breeding, Swedish University of Agricultural Sciences, Alnarp, Sweden
- José Crossa
  - International Maize and Wheat Improvement Center (CIMMYT), Km 45, Carretera México-Veracruz, Texcoco, México 52640, Mexico
- Rodomiro Ortiz
  - Department of Plant Breeding, Swedish University of Agricultural Sciences, Alnarp, Sweden
- Aakash Chawade
  - Department of Plant Breeding, Swedish University of Agricultural Sciences, Alnarp, Sweden
3
Fernández-González J, Haquin B, Combes E, Bernard K, Allard A, Isidro Y Sánchez J. Maximizing efficiency in sunflower breeding through historical data optimization. Plant Methods 2024; 20:42. [PMID: 38493115 PMCID: PMC10943787 DOI: 10.1186/s13007-024-01151-0] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/21/2023] [Accepted: 01/30/2024] [Indexed: 03/18/2024]
Abstract
Genomic selection (GS) has become an increasingly popular tool in plant breeding programs, propelled by declining genotyping costs, an increase in computational power, and rediscovery of the best linear unbiased prediction methodology over the past two decades. This development has led to an accumulation of extensive historical datasets with genotypic and phenotypic information, triggering the question of how to best utilize these datasets. Here, we investigate whether all available data or a subset should be used to calibrate GS models for across-year predictions in a 7-year dataset of a commercial hybrid sunflower breeding program. We employed a multi-objective optimization approach to determine the ideal years to include in the training set (TRS). Next, for a given combination of TRS years, we further optimized the TRS size and its genetic composition. We developed the Min_GRM size optimization method which consistently found the optimal TRS size, reducing dimensionality by 20% with an approximately 1% loss in predictive ability. Additionally, the Tails_GEGVs algorithm displayed potential, outperforming the use of all data by using just 60% of it for grain yield, a high-complexity, low-heritability trait. Moreover, maximizing the genetic diversity of the TRS resulted in a consistent predictive ability across the entire range of genotypic values in the test set. Interestingly, the Tails_GEGVs algorithm, due to its ability to leverage heterogeneity, enhanced predictive performance for key hybrids with extreme genotypic values. Our study provides new insights into the optimal utilization of historical data in plant breeding programs, resulting in improved GS model predictive ability.
Affiliation(s)
- Javier Fernández-González
  - Centro de Biotecnología y Genómica de Plantas (CBGP, UPM-INIA)-Instituto Nacional de Investigación y Tecnología Agraria y Alimentaria (INIA), Universidad Politécnica de Madrid (UPM), Campus de Montegancedo-UPM, Pozuelo de Alarcón, Madrid, 28223, Spain
- Julio Isidro Y Sánchez
  - Centro de Biotecnología y Genómica de Plantas (CBGP, UPM-INIA)-Instituto Nacional de Investigación y Tecnología Agraria y Alimentaria (INIA), Universidad Politécnica de Madrid (UPM), Campus de Montegancedo-UPM, Pozuelo de Alarcón, Madrid, 28223, Spain
4
Schneider H, Krizanac AM, Falker-Gieske C, Heise J, Tetens J, Thaller G, Bennewitz J. Genomic dissection of the correlation between milk yield and various health traits using functional and evolutionary information about imputed sequence variants of 34,497 German Holstein cows. BMC Genomics 2024; 25:265. [PMID: 38461236 DOI: 10.1186/s12864-024-10115-6] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/17/2023] [Accepted: 02/13/2024] [Indexed: 03/11/2024] Open
Abstract
BACKGROUND: Over recent decades, many studies have investigated the genomic connection between milk production and health traits in dairy cattle. Incorporating functional information in genomic analyses has been shown to improve both the understanding of biological and molecular mechanisms shaping complex traits and the accuracies of genomic prediction, especially in small populations and across-breed settings. Still, little is known about the contribution of different functional and evolutionary genome partitioning subsets to milk production and dairy health. Thus, we performed a uni- and a bivariate analysis of milk yield (MY) and eight health traits using a set of ~34,497 German Holstein cows with 50K chip genotypes and ~17 million imputed sequence variants divided into 27 subsets depending on their functional and evolutionary annotation. In the bivariate analysis, eight trait combinations were observed, each contrasting MY with one health trait. Two genomic relationship matrices (GRM) were included, one consisting of the 50K chip variants and one consisting of each set of subset variants, to obtain subset heritabilities and genetic correlations. In addition, 50K chip heritabilities and genetic correlations were estimated applying only the 50K GRM. RESULTS: In general, 50K chip heritabilities were larger than the subset heritabilities. The largest heritabilities were found for MY: 0.4358 for the 50K chip and 0.2757 for the subsets. Whereas all 50K genetic correlations were negative, subset genetic correlations were both positive and negative (ranging from -0.9324 between MY and mastitis to 0.6662 between MY and digital dermatitis). The subsets containing variants annotated as noncoding related, splice sites, untranslated regions, metabolic quantitative trait loci, and young variants ranked highest in terms of their contribution to the traits' genetic variance. We were able to show that linkage disequilibrium between subset variants and adjacent variants did not cause these subsets' high effect. CONCLUSION: Our results confirm the connection of milk production and health traits in dairy cattle via the animals' metabolic state. In addition, they highlight the potential of including functional information in genomic analyses, which helps to dissect the extent and direction of the observed traits' connection in more detail.
Affiliation(s)
- Helen Schneider
  - Institute of Animal Science, University of Hohenheim, 70599, Stuttgart, Germany
- Ana-Marija Krizanac
  - Department of Animal Sciences, University of Göttingen, 37077, Göttingen, Germany
- Johannes Heise
  - Vereinigte Informationssysteme Tierhaltung w.V. (VIT), 27283, Verden, Germany
- Jens Tetens
  - Department of Animal Sciences, University of Göttingen, 37077, Göttingen, Germany
- Georg Thaller
  - Institute of Animal Breeding and Husbandry, Christian-Albrechts University of Kiel, 24098, Kiel, Germany
- Jörn Bennewitz
  - Institute of Animal Science, University of Hohenheim, 70599, Stuttgart, Germany
5
Fernández-González J, Akdemir D, Isidro Y Sánchez J. A comparison of methods for training population optimization in genomic selection. TAG. Theoretical and Applied Genetics. Theoretische und Angewandte Genetik 2023; 136:30. [PMID: 36892603 PMCID: PMC9998580 DOI: 10.1007/s00122-023-04265-6] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/30/2022] [Accepted: 11/21/2022] [Indexed: 06/18/2023]
Abstract
Maximizing CDmean and Avg_GRM_self were the best criteria for training set optimization. A training set size of 50-55% (targeted) or 65-85% (untargeted) is needed to obtain 95% of the accuracy. With the advent of genomic selection (GS) as a widespread breeding tool, mechanisms to efficiently design an optimal training set for GS models have become more relevant, since they allow maximizing the accuracy while minimizing the phenotyping costs. The literature describes many training set optimization methods, but a comprehensive comparison among them is lacking. This work aimed to provide an extensive benchmark of optimization methods and optimal training set size by testing a wide range of them in seven datasets covering six different species, different genetic architectures, population structures and heritabilities, and with several GS models, in order to provide guidelines for their application in breeding programs. Our results showed that targeted optimization (which uses information from the test set) performed better than untargeted optimization (which does not use test set data), especially when heritability was low. The mean coefficient of determination was the best targeted method, although it was computationally intensive. Minimizing the average relationship within the training set was the best strategy for untargeted optimization. Regarding the optimal training set size, maximum accuracy was obtained when the training set was the entire candidate set. Nevertheless, 50-55% of the candidate set was enough to reach 95-100% of the maximum accuracy in the targeted scenario, while 65-85% was needed for untargeted optimization. Our results also suggested that a diverse training set makes GS robust against population structure, while including clustering information was less effective. The choice of the GS model did not have a significant influence on the prediction accuracies.
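The untargeted criterion described above (minimizing the average relationship within the training set) can be illustrated with a small sketch. This is not the authors' optimizer: the greedy heuristic, the function names `avg_relationship` and `greedy_min_relationship`, and the toy relationship matrix are all assumptions made for illustration, and a greedy search is not guaranteed to find the global optimum.

```python
import numpy as np

def avg_relationship(G, idx):
    """Mean off-diagonal relationship among the individuals in idx."""
    idx = list(idx)
    n = len(idx)
    if n < 2:
        return 0.0
    sub = G[np.ix_(idx, idx)]
    return float((sub.sum() - np.trace(sub)) / (n * (n - 1)))

def greedy_min_relationship(G, size):
    """Greedy heuristic (hypothetical helper): start from the least related
    pair, then repeatedly add the candidate with the smallest summed
    relationship to the individuals already chosen."""
    n = G.shape[0]
    if not 2 <= size <= n:
        raise ValueError("size must be between 2 and the number of candidates")
    off = G - np.diag(np.diag(G))                # zero out self-relationships
    masked = off + np.diag(np.full(n, np.inf))   # exclude the diagonal from argmin
    i, j = np.unravel_index(np.argmin(masked), masked.shape)
    chosen = [int(i), int(j)]
    while len(chosen) < size:
        remaining = [k for k in range(n) if k not in chosen]
        cost = off[np.ix_(remaining, chosen)].sum(axis=1)
        chosen.append(remaining[int(np.argmin(cost))])
    return sorted(chosen)

# Toy relationship matrix: two families of three full sibs (made-up values).
family = np.full((3, 3), 0.5) + 0.5 * np.eye(3)
G_toy = np.block([[family, np.zeros((3, 3))],
                  [np.zeros((3, 3)), family]])
selected = greedy_min_relationship(G_toy, 2)
print(selected, avg_relationship(G_toy, selected))
```

On the toy matrix the sketch picks one individual from each family, which is the behaviour one would expect from a diversity-maximizing criterion.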
Affiliation(s)
- Javier Fernández-González
  - Centro de Biotecnología y Genómica de Plantas (CBGP, UPM-INIA), Universidad Politécnica de Madrid (UPM) - Instituto Nacional de Investigación y Tecnología Agraria y Alimentaria (INIA), Campus de Montegancedo-UPM, 28223, Madrid, Spain
- Deniz Akdemir
  - CIBMTR (Center for International Blood and Marrow Transplant Research), National Marrow Donor Program/Be The Match, Minneapolis, USA
- Julio Isidro Y Sánchez
  - Centro de Biotecnología y Genómica de Plantas (CBGP, UPM-INIA), Universidad Politécnica de Madrid (UPM) - Instituto Nacional de Investigación y Tecnología Agraria y Alimentaria (INIA), Campus de Montegancedo-UPM, 28223, Madrid, Spain
6
Angarita Barajas BK, Cantet RJC, Steibel JP, Schrauf MF, Forneris NS. Heritability estimates and predictive ability for pig meat quality traits using identity-by-state and identity-by-descent relationships in an F2 population. J Anim Breed Genet 2023; 140:13-27. [PMID: 36300585 DOI: 10.1111/jbg.12742] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/28/2021] [Accepted: 10/05/2022] [Indexed: 12/13/2022]
Abstract
Genomic relationships can be computed with dense genome-wide genotypes through different methods, either based on identity-by-state (IBS) or identity-by-descent (IBD). The latter has been shown to increase the accuracy of both estimated relationships and predicted breeding values. However, it is not clear whether an IBD approach would achieve greater heritability ($h^2$) and predictive ability ($\hat{r}_{y,\hat{y}}$) than its IBS counterpart for data with low-depth pedigrees. Here, we compare both approaches in terms of the estimates of $h^2$ and $\hat{r}_{y,\hat{y}}$, using data on meat quality and carcass traits recorded in experimental crossbred pigs, with a pedigree constrained to only three generations. Three animal models were fitted which differed in the relationship matrix: an IBS model ($G_{IBS}$), an IBD (defined within the known pedigree) model ($G_{IBD}$), and a pedigree model ($A_{22}$). In 9 of 20 traits, the estimates of $\sigma_u^2$ and $h^2$ were 1.2-2.9 times greater with the $G_{IBS}$ and $G_{IBD}$ models than with $A_{22}$, whereas for all traits both parameters were similar between the genomic models. The $\hat{r}_{y,\hat{y}}$ of the genomic models was higher compared to $A_{22}$. A small increase in $\hat{r}_{y,\hat{y}}$ was found with $G_{IBS}$ when compared to $G_{IBD}$, most likely due to the former recovering sizeable relationships among founder F0 animals.
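To make the idea of an IBS-type genomic relationship matrix concrete, the following is a minimal sketch of one widely used estimator (VanRaden's first method, G = ZZ' / (2 Σ p(1-p))). The abstract does not state which IBS estimator the authors used, so the choice of estimator, the function name `vanraden_grm`, and the toy genotypes are assumptions for illustration only.

```python
import numpy as np

def vanraden_grm(M):
    """Genomic relationship matrix from genotypes coded 0/1/2.

    M has one row per individual and one column per marker; allele
    frequencies are estimated from the sample itself (an assumption).
    """
    M = np.asarray(M, dtype=float)
    p = M.mean(axis=0) / 2.0               # frequency of the counted allele
    Z = M - 2.0 * p                        # centre each marker column
    denom = 2.0 * np.sum(p * (1.0 - p))    # scales G towards the pedigree scale
    if denom == 0.0:
        raise ValueError("all markers are monomorphic")
    return Z @ Z.T / denom

# Toy genotypes: 3 individuals x 3 markers (made-up values).
M_toy = np.array([[0, 1, 2],
                  [1, 1, 0],
                  [2, 0, 1]])
G_ibs = vanraden_grm(M_toy)
print(np.round(G_ibs, 3))
```

Because the marker columns are centred with sample frequencies, each row of the resulting matrix sums to zero, which is a quick sanity check on the computation.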
Affiliation(s)
- Rodolfo J C Cantet
  - Instituto de Investigaciones en Producción Animal (INPA-CONICET-UBA), Buenos Aires, Argentina
  - Departamento de Producción Animal, Facultad de Agronomía, Universidad de Buenos Aires, Buenos Aires, Argentina
- Juan P Steibel
  - Department of Animal Science, Michigan State University, East Lansing, Michigan, USA
  - Department of Fisheries and Wildlife, Michigan State University, East Lansing, Michigan, USA
- Matias F Schrauf
  - Departamento de Métodos Cuantitativos y Sistemas de Información, Facultad de Agronomía, Universidad de Buenos Aires, Buenos Aires, Argentina
  - Animal Breeding & Genomics, Wageningen Livestock Research, Wageningen University & Research, Wageningen, The Netherlands
- Natalia S Forneris
  - Instituto de Investigaciones en Producción Animal (INPA-CONICET-UBA), Buenos Aires, Argentina
  - Departamento de Producción Animal, Facultad de Agronomía, Universidad de Buenos Aires, Buenos Aires, Argentina
7
Wang B, Li P, Hou L, Zhou W, Tao W, Liu C, Liu K, Niu P, Zhang Z, Li Q, Su G, Huang R. Genome-wide association study and genomic prediction for intramuscular fat content in Suhuai pigs using imputed whole-genome sequencing data. Evol Appl 2022; 15:2054-2066. [DOI: 10.1111/eva.13496] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/28/2022] [Revised: 08/22/2022] [Accepted: 10/04/2022] [Indexed: 11/29/2022] Open
Affiliation(s)
- Binbin Wang
  - Key Laboratory in Nanjing for Evaluation and Utilization of Pigs Resources, Ministry of Agriculture and Rural Areas of China, Institute of Swine Science, Nanjing Agricultural University, Nanjing, China
  - Center for Quantitative Genetics and Genomics, Aarhus University, Aarhus, Denmark
  - Huaian Academy, Nanjing Agricultural University, China
- Pinghua Li
  - Key Laboratory in Nanjing for Evaluation and Utilization of Pigs Resources, Ministry of Agriculture and Rural Areas of China, Institute of Swine Science, Nanjing Agricultural University, Nanjing, China
  - Huaian Academy, Nanjing Agricultural University, China
- Liming Hou
  - Key Laboratory in Nanjing for Evaluation and Utilization of Pigs Resources, Ministry of Agriculture and Rural Areas of China, Institute of Swine Science, Nanjing Agricultural University, Nanjing, China
  - Huaian Academy, Nanjing Agricultural University, China
- Wuduo Zhou
  - Key Laboratory in Nanjing for Evaluation and Utilization of Pigs Resources, Ministry of Agriculture and Rural Areas of China, Institute of Swine Science, Nanjing Agricultural University, Nanjing, China
- Wei Tao
  - Key Laboratory in Nanjing for Evaluation and Utilization of Pigs Resources, Ministry of Agriculture and Rural Areas of China, Institute of Swine Science, Nanjing Agricultural University, Nanjing, China
  - Huaian Academy, Nanjing Agricultural University, China
- Chenxi Liu
  - Key Laboratory in Nanjing for Evaluation and Utilization of Pigs Resources, Ministry of Agriculture and Rural Areas of China, Institute of Swine Science, Nanjing Agricultural University, Nanjing, China
  - Huaian Academy, Nanjing Agricultural University, China
- Kaiyue Liu
  - Key Laboratory in Nanjing for Evaluation and Utilization of Pigs Resources, Ministry of Agriculture and Rural Areas of China, Institute of Swine Science, Nanjing Agricultural University, Nanjing, China
  - Huaian Academy, Nanjing Agricultural University, China
- Peipei Niu
  - Huaian Academy, Nanjing Agricultural University, China
- Qiang Li
  - Huaiyin Xinhuai Pig Breeding Farm of Huaian City, China
- Guosheng Su
  - Center for Quantitative Genetics and Genomics, Aarhus University, Aarhus, Denmark
- Ruihua Huang
  - Key Laboratory in Nanjing for Evaluation and Utilization of Pigs Resources, Ministry of Agriculture and Rural Areas of China, Institute of Swine Science, Nanjing Agricultural University, Nanjing, China
  - Huaian Academy, Nanjing Agricultural University, China
8
Mancin E, Mota LFM, Tuliozi B, Verdiglione R, Mantovani R, Sartori C. Improvement of Genomic Predictions in Small Breeds by Construction of Genomic Relationship Matrix Through Variable Selection. Front Genet 2022; 13:814264. [PMID: 35664297 PMCID: PMC9158133 DOI: 10.3389/fgene.2022.814264] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/13/2021] [Accepted: 03/22/2022] [Indexed: 11/13/2022] Open
Abstract
Genomic selection has been increasingly implemented in the animal breeding industry, and it is becoming a routine method in many livestock breeding contexts. However, its use is still limited in several small-population local breeds, which are, nonetheless, an important source of genetic variability of great economic value. A major roadblock for their genomic selection is accuracy when population size is limited: to improve breeding value accuracy, variable selection models that assume heterogeneous variance have been proposed over the last few years. However, while these models might outperform traditional and genomic predictions in terms of accuracy, they also carry a proportional increase of breeding value bias and dispersion. These mutual increases are especially striking when genomic selection is performed with a low number of phenotypes and high shrinkage value, which is precisely the situation that arises with small local breeds. In our study, we tested several alternative methods to improve the accuracy of genomic selection in a small population. First, we investigated the impact of using only a subset of informative markers on prediction accuracy, bias, and dispersion. We used different algorithms to select them, such as recursive feature elimination, penalized regression, and XGBoost. We compared our results with the predictions of pedigree-based BLUP, single-step genomic BLUP, and weighted single-step genomic BLUP in different simulated populations obtained by combining various parameters in terms of number of QTLs and effective population size. We also investigated these approaches on a real data set belonging to the small local Rendena breed. Our results show that the accuracy of GBLUP in small-sized populations increased when performed with SNPs selected via variable selection methods both in simulated and real data sets. In addition, the use of variable selection models, especially those using XGBoost, in our real data set did not impact bias and the dispersion of estimated breeding values. We have discussed possible explanations for our results and how our study can help estimate breeding values for future genomic selection in small breeds.
Affiliation(s)
- Enrico Mancin
  - Department of Agronomy, Food, Natural Resources, Animals and Environment, University of Padua, Legnaro, Italy
- Lucio Flavio Macedo Mota
  - Department of Agronomy, Food, Natural Resources, Animals and Environment, University of Padua, Legnaro, Italy
- Beniamino Tuliozi
  - Department of Agronomy, Food, Natural Resources, Animals and Environment, University of Padua, Legnaro, Italy
- Rina Verdiglione
  - Department of Agronomy, Food, Natural Resources, Animals and Environment, University of Padua, Legnaro, Italy
- Roberto Mantovani
  - Department of Agronomy, Food, Natural Resources, Animals and Environment, University of Padua, Legnaro, Italy
- Cristina Sartori
  - Department of Agronomy, Food, Natural Resources, Animals and Environment, University of Padua, Legnaro, Italy
9
Dzievit MJ, Guo T, Li X, Yu J. Comprehensive analytical and empirical evaluation of genomic prediction across diverse accessions in maize. The Plant Genome 2021; 14:e20160. [PMID: 34661990 DOI: 10.1002/tpg2.20160] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 05/08/2021] [Accepted: 08/23/2021] [Indexed: 06/13/2023]
Abstract
Efficiently exploiting natural genetic diversity captured by accessions stored in genebanks is crucial to genetic improvement of major crops. Selecting accessions of interest from genebanks has traditionally required information from extensive and expensive evaluation; however, low-cost genotyping combined with genomic prediction has enabled us to generate predicted genetic merits for the entire set with targeted phenotypic evaluation of representative subsets. To explore this general approach, analytical assessment and empirical validation of the maize (Zea mays L.) association population (MAP) as a training population were conducted in the present study. Cross-validation within the MAP revealed mostly modest to strong predictive ability for 36 traits, generally in parallel with the square root of heritability. The MAP was then used to train the prediction models to generate genomic estimated breeding values (GEBVs) for the Ames Diversity Panel. Empirical validation conducted for nine traits across two validation populations confirmed the accuracy level indicated by the cross-validation of the training population. An upper bound for reliability (U value) was calculated for the accessions in the prediction population using genotypic data. The group of accessions with high U values generally had high predictive ability, even though the range of observed trait values was similar to that of the group of accessions with low U values. Our comprehensive analysis validated the general approach of turbocharging genebanks with genomics and genomic prediction. In addition, breeders and researchers can consider both GEBVs and U values to balance the needs of improving specific traits and broadening genetic diversity when selecting accessions from genebanks.
Affiliation(s)
- Tingting Guo
  - Dep. of Agronomy, Iowa State Univ., Ames, IA, 50011, USA
- Xianran Li
  - Dep. of Agronomy, Iowa State Univ., Ames, IA, 50011, USA
- Jianming Yu
  - Dep. of Agronomy, Iowa State Univ., Ames, IA, 50011, USA
10
Vojgani E, Pook T, Martini JWR, Hölker AC, Mayer M, Schön CC, Simianer H. Accounting for epistasis improves genomic prediction of phenotypes with univariate and bivariate models across environments. TAG. Theoretical and Applied Genetics. Theoretische und Angewandte Genetik 2021; 134:2913-2930. [PMID: 34115154 PMCID: PMC8354961 DOI: 10.1007/s00122-021-03868-1] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 10/08/2020] [Accepted: 05/24/2021] [Indexed: 06/12/2023]
Abstract
The accuracy of genomic prediction of phenotypes can be increased by including the top-ranked pairwise SNP interactions in the prediction model. We compared the predictive ability of various prediction models for a maize dataset derived from 910 doubled haploid lines from two European landraces (Kemater Landmais Gelb and Petkuser Ferdinand Rot), which were tested at six locations in Germany and Spain. The compared models were Genomic Best Linear Unbiased Prediction (GBLUP) as an additive model, Epistatic Random Regression BLUP (ERRBLUP) accounting for all pairwise SNP interactions, and selective Epistatic Random Regression BLUP (sERRBLUP) accounting for a selected subset of pairwise SNP interactions. These models were compared in both univariate and bivariate statistical settings for predictions within and across environments. Our results indicate that incorporating all pairwise SNP interactions into the univariate/bivariate model (ERRBLUP) is not superior in predictive ability to the respective additive model (GBLUP). However, incorporating only a selected subset of interactions with the highest effect variances in univariate/bivariate sERRBLUP can increase predictive ability significantly compared to the univariate/bivariate GBLUP. Overall, bivariate models consistently outperform univariate models in predictive ability. Across all studied traits, locations and landraces, the increase in prediction accuracy from univariate GBLUP to univariate sERRBLUP ranged from 5.9 to 112.4 percent, with an average increase of 47 percent. For bivariate models, the change from bivariate GBLUP to bivariate sERRBLUP ranged from -0.3 to +27.9 percent, with an average increase of 11 percent. This considerable increase in predictive ability achieved by sERRBLUP may be of interest for "sparse testing" approaches in which only a subset of the lines/hybrids of interest is observed at each location.
Affiliation(s)
- Elaheh Vojgani
  - Center for Integrated Breeding Research, Animal Breeding and Genetics Group, University of Goettingen, Goettingen, Germany
- Torsten Pook
  - Center for Integrated Breeding Research, Animal Breeding and Genetics Group, University of Goettingen, Goettingen, Germany
- Johannes W R Martini
  - International Maize and Wheat Improvement Center (CIMMYT), Texcoco, State of Mexico, Mexico
- Armin C Hölker
  - Plant Breeding, TUM School of Life Sciences Weihenstephan, Technical University of Munich, Freising, Germany
- Manfred Mayer
  - Plant Breeding, TUM School of Life Sciences Weihenstephan, Technical University of Munich, Freising, Germany
- Chris-Carolin Schön
  - Plant Breeding, TUM School of Life Sciences Weihenstephan, Technical University of Munich, Freising, Germany
- Henner Simianer
  - Center for Integrated Breeding Research, Animal Breeding and Genetics Group, University of Goettingen, Goettingen, Germany
11
|
McGaugh SE, Lorenz AJ, Flagel LE. The utility of genomic prediction models in evolutionary genetics. Proc Biol Sci 2021; 288:20210693. [PMID: 34344180 PMCID: PMC8334854 DOI: 10.1098/rspb.2021.0693] [Citation(s) in RCA: 7] [Impact Index Per Article: 2.3] [Received: 03/23/2021] [Accepted: 07/15/2021] [Indexed: 12/25/2022]
Abstract
Variation in complex traits is the result of contributions from many loci of small effect. Based on this principle, genomic prediction methods are used to make predictions of breeding value for an individual using genome-wide molecular markers. Genomic prediction models have been used in plant and animal breeding for almost two decades to increase rates of genetic improvement and reduce the length of artificial selection experiments. However, evolutionary genomics studies have been slow to incorporate this technique to select individuals for breeding in a conservation context or to learn more about the genetic architecture of traits, the genetic value of missing individuals or microevolution of breeding values. Here, we outline the utility of genomic prediction and provide an overview of the methodology. We highlight opportunities to apply genomic prediction in evolutionary genetics of wild populations and the best practices when using these methods on field-collected phenotypes.
Affiliation(s)
- Suzanne E. McGaugh
  - Ecology, Evolution, and Behavior, University of Minnesota, 140 Gortner Lab, 1479 Gortner Avenue, Saint Paul, MN 55108, USA
- Aaron J. Lorenz
  - Agronomy and Plant Genetics, University of Minnesota, 411 Borlaug Hall, 1991 Upper Buford Circle, Saint Paul, MN 55108, USA
- Lex E. Flagel
  - Plant and Microbial Biology, University of Minnesota, 140 Gortner Lab, 1479 Gortner Avenue, Saint Paul, MN 55108, USA
  - Bayer Crop Science, 700 W Chesterfield Parkway, Chesterfield, MO 63017, USA
|
12
|
Dekkers JCM, Su H, Cheng J. Predicting the accuracy of genomic predictions. Genet Sel Evol 2021; 53:55. [PMID: 34187354 PMCID: PMC8244147 DOI: 10.1186/s12711-021-00647-w] [Citation(s) in RCA: 9] [Impact Index Per Article: 3.0] [Received: 11/10/2020] [Accepted: 06/11/2021] [Indexed: 11/22/2022]
Abstract
Background Mathematical models are needed for the design of breeding programs using genomic prediction. While deterministic models for selection on pedigree-based estimates of breeding values (PEBV) are available, these have not been fully developed for genomic selection, with a key missing component being the accuracy of genomic EBV (GEBV) of selection candidates. Here, a deterministic method was developed to predict this accuracy within a closed breeding population based on the accuracy of GEBV and PEBV in the reference population and the distance of selection candidates from their closest ancestors in the reference population. Methods The accuracy of GEBV was modeled as a combination of the accuracy of PEBV and of EBV based on genomic relationships deviated from pedigree (DEBV). Loss of the accuracy of DEBV from the reference to the target population was modeled based on the effective number of independent chromosome segments in the reference population (Me). Measures of Me derived from the inverse of the variance of relationships and from the accuracies of GEBV and PEBV in the reference population, derived using either a Fisher information or a selection index approach, were compared by simulation. Results Using simulation, both the Fisher and the selection index approach correctly predicted accuracy in the target population over time, both with and without selection. The index approach, however, resulted in estimates of Me that were less affected by heritability, reference size, and selection, and which are, therefore, more appropriate as a population parameter. The variance of relationships underpredicted Me and was greatly affected by selection. A leave-one-out cross-validation approach was proposed to estimate required accuracies of EBV in the reference population. Aspects of the methods were validated using real data. Conclusions A deterministic method was developed to predict the accuracy of GEBV in selection candidates in a closed breeding population. 
The population parameter Me that is required for these predictions can be derived from an available reference data set, and applied to other reference data sets and traits for that population. This method can be used to evaluate the benefit of genomic prediction and to optimize genomic selection breeding programs. Supplementary Information The online version contains supplementary material available at 10.1186/s12711-021-00647-w.
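To give a feel for how reference size, heritability and Me interact, the sketch below evaluates the widely used deterministic approximation r = sqrt(N h² / (N h² + Me)), commonly attributed to Daetwyler and colleagues. This formula is an assumption brought in for illustration; it is not the method developed in the paper above, and the function name and all numeric inputs are hypothetical.

```python
import math


def expected_accuracy(n_ref: int, h2: float, me: float) -> float:
    """Approximate GEBV accuracy as sqrt(N*h2 / (N*h2 + Me)).

    n_ref: number of animals in the reference population (N)
    h2:    heritability of the trait, between 0 and 1
    me:    effective number of independent chromosome segments (Me)
    """
    if n_ref < 0 or not 0.0 <= h2 <= 1.0 or me <= 0:
        raise ValueError("need n_ref >= 0, 0 <= h2 <= 1 and me > 0")
    signal = n_ref * h2
    return math.sqrt(signal / (signal + me))


# Hypothetical values, chosen only to show the shape of the curve.
for n in (1_000, 5_000, 20_000):
    print(n, round(expected_accuracy(n, h2=0.3, me=1_000.0), 3))
```

Under this approximation, accuracy rises with reference size and heritability and falls as Me grows, which is why Me is treated as a population parameter worth estimating.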
Affiliation(s)
- Jack C M Dekkers
  - Department of Animal Science, Iowa State University, Ames, Iowa, USA
- Hailin Su
  - Department of Animal Science, Iowa State University, Ames, Iowa, USA
- Jian Cheng
  - Department of Animal Science, Iowa State University, Ames, Iowa, USA
|
13
|
Salek Ardestani S, Jafarikia M, Sargolzaei M, Sullivan B, Miar Y. Genomic Prediction of Average Daily Gain, Back-Fat Thickness, and Loin Muscle Depth Using Different Genomic Tools in Canadian Swine Populations. Front Genet 2021; 12:665344. [PMID: 34149806 PMCID: PMC8209496 DOI: 10.3389/fgene.2021.665344] [Citation(s) in RCA: 6] [Impact Index Per Article: 2.0] [Received: 02/07/2021] [Accepted: 04/15/2021] [Indexed: 12/12/2022]
Abstract
Improvement of prediction accuracy of estimated breeding values (EBVs) can lead to increased profitability for swine breeding companies. This study was performed to compare the accuracy of different popular genomic prediction methods and traditional best linear unbiased prediction (BLUP) for future performance of back-fat thickness (BFT), average daily gain (ADG), and loin muscle depth (LMD) in Canadian Duroc, Landrace, and Yorkshire swine breeds. In this study, 17,019 pigs were genotyped using Illumina 60K and Affymetrix 50K panels. After quality control and imputation steps, a total of 41,304, 48,580, and 49,102 single-nucleotide polymorphisms remained for Duroc (n = 6,649), Landrace (n = 5,362), and Yorkshire (n = 5,008) breeds, respectively. The breeding values of animals in the validation groups (n = 392–774) were predicted before performance test using BLUP, BayesC, BayesCπ, genomic BLUP (GBLUP), and single-step GBLUP (ssGBLUP) methods. The prediction accuracies were obtained using the correlation between the predicted breeding values and their deregressed EBVs (dEBVs) after performance test. The genomic prediction methods showed higher prediction accuracies than traditional BLUP for all scenarios. Although the accuracies of genomic prediction methods were not significantly (P > 0.05) different, ssGBLUP was the most accurate method for Duroc-ADG, Duroc-LMD, Landrace-BFT, Landrace-ADG, and Yorkshire-BFT scenarios, and BayesCπ was the most accurate method for Duroc-BFT, Landrace-LMD, and Yorkshire-ADG scenarios. Furthermore, BayesCπ method was the least biased method for Duroc-LMD, Landrace-BFT, Landrace-ADG, Yorkshire-BFT, and Yorkshire-ADG scenarios. Our findings can be beneficial for accelerating the genetic progress of BFT, ADG, and LMD in Canadian swine populations by selecting more accurate and unbiased genomic prediction methods.
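The validation described above scores each method by the correlation between predicted breeding values and dEBVs. A minimal sketch of that calculation follows. The slope of dEBV regressed on the prediction is included as one common way to express bias; that choice is an assumption here, since the abstract does not say how bias was measured. Function names and numbers are hypothetical.

```python
from statistics import mean


def _centered_sums(x, y):
    """Return (Sxy, Sxx, Syy) for two equally long numeric sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("inputs must have equal length of at least 2")
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy, sxx, syy


def prediction_accuracy(predicted, debv):
    """Pearson correlation between predictions and dEBVs."""
    sxy, sxx, syy = _centered_sums(predicted, debv)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for constant input")
    return sxy / (sxx * syy) ** 0.5


def regression_slope(predicted, debv):
    """Slope of dEBV on prediction; values near 1 suggest little bias."""
    sxy, sxx, _ = _centered_sums(predicted, debv)
    if sxx == 0:
        raise ValueError("slope is undefined for constant predictions")
    return sxy / sxx


# Hypothetical validation animals.
gebv = [0.8, -0.1, 0.3, 1.2, -0.6]
debv = [1.0, 0.2, 0.1, 1.5, -0.9]
print(prediction_accuracy(gebv, debv), regression_slope(gebv, debv))
```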
Affiliation(s)
- Mohsen Jafarikia
  - Canadian Centre for Swine Improvement, Ottawa, ON, Canada
  - Centre for Genetic Improvement of Livestock (CGIL), Department of Animal Biosciences, University of Guelph, Guelph, ON, Canada
- Mehdi Sargolzaei
  - Department of Pathobiology, University of Guelph, Guelph, ON, Canada
  - Select Sires Inc., Plain City, OH, United States
- Brian Sullivan
  - Canadian Centre for Swine Improvement, Ottawa, ON, Canada
- Younes Miar
  - Department of Animal Science and Aquaculture, Dalhousie University, Truro, NS, Canada
|
14
|
Karaman E, Su G, Croue I, Lund MS. Genomic prediction using a reference population of multiple pure breeds and admixed individuals. Genet Sel Evol 2021; 53:46. [PMID: 34058971 PMCID: PMC8168010 DOI: 10.1186/s12711-021-00637-y] [Citation(s) in RCA: 15] [Impact Index Per Article: 5.0] [Received: 04/09/2020] [Accepted: 05/11/2021] [Indexed: 11/10/2022]
Abstract
BACKGROUND In dairy cattle populations in which crossbreeding has been used, animals show some level of diversity in their origins. In rotational crossbreeding, for instance, crossbred dams are mated with purebred sires from different pure breeds, and the genetic composition of crossbred animals is an admixture of the breeds included in the rotation. How to use the data of such individuals in genomic evaluations is still an open question. In this study, we aimed at providing methodologies for the use of data from crossbred individuals with an admixed genetic background together with data from multiple pure breeds, for the purpose of genomic evaluations for both purebred and crossbred animals. A three-breed rotational crossbreeding system was mimicked using simulations based on animals genotyped with the 50 K single nucleotide polymorphism (SNP) chip. RESULTS For purebred populations, within-breed genomic predictions generally led to higher accuracies than those from multi-breed predictions using combined data of pure breeds. Adding admixed population's (MIX) data to the combined pure breed data considering MIX as a different breed led to higher accuracies. When prediction models were able to account for breed origin of alleles, accuracies were generally higher than those from combining all available data, depending on the correlation of quantitative trait loci (QTL) effects between the breeds. Accuracies varied when using SNP effects from any of the pure breeds to predict the breeding values of MIX. Using those breed-specific SNP effects that were estimated separately in each pure breed, while accounting for breed origin of alleles for the selection candidates of MIX, generally improved the accuracies. Models that are able to accommodate MIX data with the breed origin of alleles approach generally led to higher accuracies than models without breed origin of alleles, depending on the correlation of QTL effects between the breeds. 
CONCLUSIONS Combining all available data, pure breeds' and admixed population's data, in a multi-breed reference population is beneficial for the estimation of breeding values for pure breeds with a small reference population. For MIX, such an approach can lead to higher accuracies than considering breed origin of alleles for the selection candidates, and using breed-specific SNP effects estimated separately in each pure breed. Accuracies can be further improved by including MIX data in the reference population of multiple breeds while considering the breed origin of alleles. Our findings are relevant for breeding programs in which crossbreeding is systematically applied, and also for populations that involve different subpopulations and between which exchange of genetic material is routine practice.
Affiliation(s)
- Emre Karaman
  - Center for Quantitative Genetics and Genomics, Aarhus University, 8830, Tjele, Denmark
- Guosheng Su
  - Center for Quantitative Genetics and Genomics, Aarhus University, 8830, Tjele, Denmark
- Mogens S Lund
  - Center for Quantitative Genetics and Genomics, Aarhus University, 8830, Tjele, Denmark
|
15
|
Cesarani A, Biffani S, Garcia A, Lourenco D, Bertolini G, Neglia G, Misztal I, Macciotta NPP. Genomic investigation of milk production in Italian buffalo. Italian Journal of Animal Science 2021. [DOI: 10.1080/1828051x.2021.1902404] [Citation(s) in RCA: 9] [Impact Index Per Article: 3.0] [Indexed: 01/18/2023]
Affiliation(s)
- Alberto Cesarani
  - Department of Animal and Dairy Science, University of Georgia, Athens, GA, USA
- Stefano Biffani
  - Consiglio Nazionale delle Ricerche (CNR), Istituto di biologia e biotecnologia agraria (IBBA), Milano, Italy
- Andre Garcia
  - Department of Animal and Dairy Science, University of Georgia, Athens, GA, USA
- Daniela Lourenco
  - Department of Animal and Dairy Science, University of Georgia, Athens, GA, USA
- Giacomo Bertolini
  - Associazione Nazionale Allevatori Specie Bufalina (ANASB), Caserta, Italy
- Gianluca Neglia
  - Dipartimento di Medicina Veterinaria e Produzioni Animali, II University of Naples, Napoli, Italy
- Ignacy Misztal
  - Department of Animal and Dairy Science, University of Georgia, Athens, GA, USA
|
16
|
Vojgani E, Pook T, Simianer H. Phenotype Prediction Under Epistasis. Methods Mol Biol 2021; 2212:105-120. [PMID: 33733353 DOI: 10.1007/978-1-0716-0947-7_8] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 04/13/2023]
Abstract
Reliable methods of phenotype prediction from genomic data play an increasingly important role in many areas of plant and animal breeding. Thus, developing methods that enhance prediction accuracy is of major interest. Here, we provide three methods for this purpose: (1) Genomic Best Linear Unbiased Prediction (GBLUP) as a model just accounting for additive SNP effects; (2) Epistatic Random Regression BLUP (ERRBLUP) as a full epistatic model which incorporates all pairwise SNP interactions, and (3) selective Epistatic Random Regression BLUP (sERRBLUP) as an epistatic model which incorporates a subset of pairwise SNP interactions selected based on their absolute effect sizes or the effect variances, which is computed based on solutions from the ERRBLUP model. We compared the predictive ability obtained from GBLUP, ERRBLUP, and sERRBLUP with genotypes from a publicly available wheat dataset and respective simulated phenotypes. Results showed that sERRBLUP provides a substantial increase in prediction accuracy compared to the other methods when the optimal proportion of SNP interactions is kept in the model, especially when an optimal proportion of SNP interactions is selected based on the SNP interaction effect sizes. All methods described here are implemented in the R-package EpiGP, which is able to process large-scale genomic data in a computationally efficient way.
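The selection step that distinguishes sERRBLUP from ERRBLUP, keeping only a proportion of pairwise SNP interactions ranked by absolute effect size, can be sketched in a few lines. This is a plain-Python illustration of the ranking idea only. It does not use or reproduce the EpiGP package, and the function name, SNP labels and effect values are hypothetical.

```python
import math


def select_top_interactions(effects: dict, proportion: float) -> list:
    """Return the SNP pairs with the largest absolute estimated effects.

    effects:    maps a (snp_a, snp_b) pair to its estimated interaction effect
    proportion: share of interactions to keep, in (0, 1]
    """
    if not 0.0 < proportion <= 1.0:
        raise ValueError("proportion must be in (0, 1]")
    n_keep = max(1, math.ceil(proportion * len(effects)))
    ranked = sorted(effects, key=lambda pair: abs(effects[pair]), reverse=True)
    return ranked[:n_keep]


# Hypothetical interaction effects, e.g. taken from a full epistatic model.
effects = {
    ("snp1", "snp2"): 0.02,
    ("snp1", "snp3"): -0.40,
    ("snp2", "snp3"): 0.15,
    ("snp2", "snp4"): -0.01,
}
top = select_top_interactions(effects, proportion=0.5)
print(top)
```

In practice the proportion kept is itself a tuning choice, which matches the abstract's point that gains depend on retaining an optimal share of interactions.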
Affiliation(s)
- Elaheh Vojgani
  - Center for Integrated Breeding Research, Animal Breeding and Genetics Group, Department of Animal Sciences, University of Goettingen, Goettingen, Germany
- Torsten Pook
  - Center for Integrated Breeding Research, Animal Breeding and Genetics Group, Department of Animal Sciences, University of Goettingen, Goettingen, Germany
- Henner Simianer
  - Center for Integrated Breeding Research, Animal Breeding and Genetics Group, Department of Animal Sciences, University of Goettingen, Goettingen, Germany
|
17
|
Amini F, Franco FR, Hu G, Wang L. The look ahead trace back optimizer for genomic selection under transparent and opaque simulators. Sci Rep 2021; 11:4124. [PMID: 33602979 PMCID: PMC7893003 DOI: 10.1038/s41598-021-83567-5] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Received: 10/01/2020] [Accepted: 02/02/2021] [Indexed: 11/29/2022]
Abstract
Recent advances in genomic selection (GS) have demonstrated the importance of not only the accuracy of genomic prediction but also the intelligence of selection strategies. The look ahead selection algorithm, for example, has been found to significantly outperform the widely used truncation selection approach in terms of genetic gain, thanks to its strategy of selecting breeding parents that may not necessarily be elite themselves but have the best chance of producing elite progeny in the future. This paper presents the look ahead trace back algorithm as a new variant of the look ahead approach, which introduces several improvements to further accelerate genetic gain especially under imperfect genomic prediction. Perhaps an even more significant contribution of this paper is the design of opaque simulators for evaluating the performance of GS algorithms. These simulators are partially observable, explicitly capture both additive and non-additive genetic effects, and simulate uncertain recombination events more realistically. In contrast, most existing GS simulation settings are transparent, either explicitly or implicitly allowing the GS algorithm to exploit certain critical information that may not be possible in actual breeding programs. Comprehensive computational experiments were carried out using a maize data set to compare a variety of GS algorithms under four simulators with different levels of opacity. These results reveal how differently the same GS algorithm would interact with different simulators, suggesting the need for continued research in the design of more realistic simulators. As long as GS algorithms continue to be trained in silico rather than in planta, the best way to avoid disappointing discrepancies between their simulated and actual performances may be to make the simulator as akin to the complex and opaque nature as possible.
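For context, the truncation selection baseline mentioned above simply keeps the candidates with the highest GEBV. The sketch below shows only that baseline, under the assumption that each candidate has a single GEBV. It is not the look ahead or look ahead trace back algorithm, and the function name and values are hypothetical.

```python
def truncation_select(gebv: dict, n_parents: int) -> list:
    """Return the IDs of the n_parents candidates with the highest GEBV."""
    if not 0 < n_parents <= len(gebv):
        raise ValueError("n_parents must be between 1 and the number of candidates")
    # Rank candidate IDs by their GEBV, best first, and cut at n_parents.
    return sorted(gebv, key=gebv.get, reverse=True)[:n_parents]


# Hypothetical candidates and GEBVs.
candidates = {"line_A": 1.8, "line_B": 0.4, "line_C": 2.3, "line_D": -0.6}
parents = truncation_select(candidates, n_parents=2)
print(parents)
```

Because this rule looks only at the candidates' own values, it cannot favour parents whose progeny might outperform them, which is the gap the look ahead family of methods targets.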
Affiliation(s)
- Fatemeh Amini
  - Department of Industrial and Manufacturing Systems Engineering, Iowa State University, Ames, IA, 50011, USA
- Felipe Restrepo Franco
  - Department of Industrial and Manufacturing Systems Engineering, Iowa State University, Ames, IA, 50011, USA
- Guiping Hu
  - Department of Industrial and Manufacturing Systems Engineering, Iowa State University, Ames, IA, 50011, USA
- Lizhi Wang
  - Department of Industrial and Manufacturing Systems Engineering, Iowa State University, Ames, IA, 50011, USA
|
18
|
Xiang R, MacLeod IM, Daetwyler HD, de Jong G, O’Connor E, Schrooten C, Chamberlain AJ, Goddard ME. Genome-wide fine-mapping identifies pleiotropic and functional variants that predict many traits across global cattle populations. Nat Commun 2021; 12:860. [PMID: 33558518 PMCID: PMC7870883 DOI: 10.1038/s41467-021-21001-0] [Citation(s) in RCA: 40] [Impact Index Per Article: 13.3] [Received: 02/18/2020] [Accepted: 11/23/2020] [Indexed: 02/08/2023]
Abstract
The difficulty in finding causative mutations has hampered their use in genomic prediction. Here, we present a methodology to fine-map potentially causal variants genome-wide by integrating the functional, evolutionary and pleiotropic information of variants using GWAS, variant clustering and Bayesian mixture models. Our analysis of 17 million sequence variants in 44,000+ Australian dairy cattle for 34 traits suggests, on average, one pleiotropic QTL existing in each 50 kb chromosome-segment. We selected a set of 80k variants representing potentially causal variants within each chromosome segment to develop a bovine XT-50K genotyping array. The custom array contains many pleiotropic variants with biological functions, including splicing QTLs and variants at conserved sites across 100 vertebrate species. This biology-informed custom array outperformed the standard array in predicting genetic value of multiple traits across populations in independent datasets of 90,000+ dairy cattle from the USA, Australia and New Zealand.
Affiliation(s)
- Ruidong Xiang
  - Faculty of Veterinary and Agricultural Science, The University of Melbourne, Parkville, VIC, Australia
  - Agriculture Victoria, AgriBio, Centre for AgriBiosciences, Bundoora, VIC, Australia
- Iona M. MacLeod
  - Agriculture Victoria, AgriBio, Centre for AgriBiosciences, Bundoora, VIC, Australia
- Hans D. Daetwyler
  - Agriculture Victoria, AgriBio, Centre for AgriBiosciences, Bundoora, VIC, Australia
  - School of Applied Systems Biology, La Trobe University, Bundoora, VIC, Australia
- Amanda J. Chamberlain
  - Agriculture Victoria, AgriBio, Centre for AgriBiosciences, Bundoora, VIC, Australia
- Michael E. Goddard
  - Faculty of Veterinary and Agricultural Science, The University of Melbourne, Parkville, VIC, Australia
  - Agriculture Victoria, AgriBio, Centre for AgriBiosciences, Bundoora, VIC, Australia
|
19
|
Farooq M, van Dijk ADJ, Nijveen H, Aarts MGM, Kruijer W, Nguyen TP, Mansoor S, de Ridder D. Prior Biological Knowledge Improves Genomic Prediction of Growth-Related Traits in Arabidopsis thaliana. Front Genet 2021; 11:609117. [PMID: 33552126 PMCID: PMC7855462 DOI: 10.3389/fgene.2020.609117] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 09/22/2020] [Accepted: 12/21/2020] [Indexed: 01/11/2023]
Abstract
Prediction of growth-related complex traits is highly important for crop breeding. Photosynthesis efficiency and biomass are direct indicators of overall plant performance and therefore even minor improvements in these traits can result in significant breeding gains. Crop breeding for complex traits has been revolutionized by technological developments in genomics and phenomics. Capitalizing on the growing availability of genomics data, genome-wide marker-based prediction models allow for efficient selection of the best parents for the next generation without the need for phenotypic information. Until now such models mostly predict the phenotype directly from the genotype and fail to make use of relevant biological knowledge. It is an open question to what extent the use of such biological knowledge is beneficial for improving genomic prediction accuracy and reliability. In this study, we explored the use of publicly available biological information for genomic prediction of photosynthetic light use efficiency (ΦPSII) and projected leaf area (PLA) in Arabidopsis thaliana. To explore the use of various types of knowledge, we mapped genomic polymorphisms to Gene Ontology (GO) terms and transcriptomics-based gene clusters, and applied these in a Genomic Feature Best Linear Unbiased Predictor (GFBLUP) model, which is an extension to the traditional Genomic BLUP (GBLUP) benchmark. Our results suggest that incorporation of prior biological knowledge can improve genomic prediction accuracy for both ΦPSII and PLA. The improvement achieved depends on the trait, type of knowledge and trait heritability. Moreover, transcriptomics offers complementary evidence to the Gene Ontology for improvement when used to define functional groups of genes. In conclusion, prior knowledge about trait-specific groups of genes can be directly translated into improved genomic prediction.
Affiliation(s)
- Muhammad Farooq
  - Bioinformatics Group, Wageningen University, Wageningen, Netherlands
  - Molecular Virology and Gene Silencing Lab, Agricultural Biotechnology Division, National Institute for Biotechnology and Genetic Engineering (NIBGE), Punjab, Pakistan
- Aalt D. J. van Dijk
  - Bioinformatics Group, Wageningen University, Wageningen, Netherlands
  - Biometris, Wageningen University, Wageningen, Netherlands
- Harm Nijveen
  - Bioinformatics Group, Wageningen University, Wageningen, Netherlands
- Mark G. M. Aarts
  - Laboratory of Genetics, Wageningen University, Wageningen, Netherlands
- Willem Kruijer
  - Biometris, Wageningen University, Wageningen, Netherlands
- Thu-Phuong Nguyen
  - Laboratory of Genetics, Wageningen University, Wageningen, Netherlands
- Shahid Mansoor
  - Molecular Virology and Gene Silencing Lab, Agricultural Biotechnology Division, National Institute for Biotechnology and Genetic Engineering (NIBGE), Punjab, Pakistan
- Dick de Ridder
  - Bioinformatics Group, Wageningen University, Wageningen, Netherlands
|
20
|
Naserkheil M, Lee DH, Mehrban H. Improving the accuracy of genomic evaluation for linear body measurement traits using single-step genomic best linear unbiased prediction in Hanwoo beef cattle. BMC Genet 2020; 21:144. [PMID: 33267771 PMCID: PMC7709290 DOI: 10.1186/s12863-020-00928-1] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.0] [Received: 05/13/2020] [Accepted: 10/27/2020] [Indexed: 12/20/2022]
Abstract
BACKGROUND Recently, there has been a growing interest in the genetic improvement of body measurement traits in farm animals. They are widely used as predictors of performance, longevity, and production traits, and it is worthwhile to investigate the prediction accuracies of genomic selection for these traits. In genomic prediction, the single-step genomic best linear unbiased prediction (ssGBLUP) method allows the inclusion of information from genotyped and non-genotyped relatives in the analysis. Hence, we aimed to compare the prediction accuracy obtained from a pedigree-based BLUP only on genotyped animals (PBLUP-G), a traditional pedigree-based BLUP (PBLUP), a genomic BLUP (GBLUP), and a single-step genomic BLUP (ssGBLUP) method for the following 10 body measurement traits at yearling age of Hanwoo cattle: body height (BH), body length (BL), chest depth (CD), chest girth (CG), chest width (CW), hip height (HH), hip width (HW), rump length (RL), rump width (RW), and thurl width (TW). The data set comprised 13,067 phenotypic records for body measurement traits and 1523 genotyped animals with 34,460 single-nucleotide polymorphisms. The accuracy for each trait and model was estimated only for genotyped animals using five-fold cross-validation. RESULTS The accuracies ranged from 0.02 to 0.19, 0.22 to 0.42, 0.21 to 0.44, and 0.36 to 0.55 as assessed using the PBLUP-G, PBLUP, GBLUP, and ssGBLUP methods, respectively. The average predictive accuracies across traits were 0.13 for PBLUP-G, 0.34 for PBLUP, 0.33 for GBLUP, and 0.45 for ssGBLUP. Our results demonstrated that, averaged across all traits, ssGBLUP outperformed PBLUP and GBLUP by 33 and 43%, respectively, in terms of prediction accuracy. Moreover, the lowest root mean square error was obtained with the ssGBLUP method. 
CONCLUSIONS Our findings suggest that considering the ssGBLUP model may be a promising way to ensure acceptable accuracy of predictions for body measurement traits, especially for improving the prediction accuracy of selection candidates in ongoing Hanwoo breeding programs.
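The five-fold cross-validation used to estimate accuracy can be sketched as a partition of the genotyped animals into five validation folds. The abstract does not describe how animals were assigned to folds, so the random assignment, the seed and the function name below are assumptions. Only the count of 1523 genotyped animals comes from the text, and the integer IDs are placeholders.

```python
import random


def k_fold_splits(ids, k=5, seed=0):
    """Yield (training_ids, validation_ids) pairs for k-fold cross-validation."""
    shuffled = list(ids)
    if k < 2 or k > len(shuffled):
        raise ValueError("k must be between 2 and the number of animals")
    random.Random(seed).shuffle(shuffled)
    # Deal animals into k folds round-robin so fold sizes differ by at most one.
    folds = [shuffled[i::k] for i in range(k)]
    for i, validation in enumerate(folds):
        training = [a for j, fold in enumerate(folds) if j != i for a in fold]
        yield training, validation


# Placeholder IDs for the 1523 genotyped animals.
animal_ids = range(1523)
for fold_no, (train, valid) in enumerate(k_fold_splits(animal_ids), start=1):
    print(fold_no, len(train), len(valid))
```

In each round the validation animals' phenotypes would be masked, the model fitted on the rest, and accuracy computed on the masked animals.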
Affiliation(s)
- Masoumeh Naserkheil
  - Department of Animal Science, University College of Agriculture and Natural Resources, University of Tehran, P.O. Box: 4111, Karaj, 77871-31587, Iran
- Deuk Hwan Lee
  - Department of Animal Life and Environment Sciences, Hankyong National University, Jungang-ro 327, Anseong-si, Gyeonggi-do, South Korea
- Hossein Mehrban
  - Department of Animal Science, Shahrekord University, P.O. Box: 115, Shahrekord, 88186-34141, Iran
|
21
|
Yu X, Leiboff S, Li X, Guo T, Ronning N, Zhang X, Muehlbauer GJ, Timmermans MC, Schnable PS, Scanlon MJ, Yu J. Genomic prediction of maize microphenotypes provides insights for optimizing selection and mining diversity. Plant Biotechnology Journal 2020; 18:2456-2465. [PMID: 32452105 PMCID: PMC7680549 DOI: 10.1111/pbi.13420] [Citation(s) in RCA: 15] [Impact Index Per Article: 3.8] [Received: 09/22/2019] [Revised: 05/05/2020] [Accepted: 05/13/2020] [Indexed: 05/25/2023]
Abstract
Effective evaluation of millions of crop genetic stocks is an essential component of exploiting genetic diversity to achieve global food security. By leveraging genomics and data analytics, genomic prediction is a promising strategy to efficiently explore the potential of these gene banks by starting with phenotyping a small designed subset. Reliable genomic predictions have enhanced selection of many macroscopic phenotypes in plants and animals. However, the use of genomic prediction strategies for analysis of microscopic phenotypes is limited. Here, we exploited the power of genomic prediction for eight maize traits related to the shoot apical meristem (SAM), the microscopic stem cell niche that generates all the above-ground organs of the plant. With 435,713 genome-wide single-nucleotide polymorphisms (SNPs), we predicted SAM morphology traits for 2687 diverse maize inbreds based on a model trained from 369 inbreds. An empirical validation experiment with 488 inbreds obtained a prediction accuracy of 0.37-0.57 across eight traits. In addition, we show that a significantly higher prediction accuracy was achieved by leveraging the U value (upper bound for reliability) that quantifies the genomic relationships of the validation set with the training set. Our findings suggest that double selection considering both prediction and reliability can be implemented in choosing selection candidates for phenotyping when exploring new diversity is desired. In this case, individuals with less extreme predicted values and moderate reliability values can be considered. Our study extends the turbocharging of gene banks via genomic prediction from macrophenotypes into the microphenotypic space.
Affiliation(s)
- Xiaoqing Yu
  - Department of Agronomy, Iowa State University, Ames, IA, USA
- Samuel Leiboff
  - Plant Biology Section, School of Integrative Plant Science, Cornell University, Ithaca, NY, USA
- Xianran Li
  - Department of Agronomy, Iowa State University, Ames, IA, USA
- Tingting Guo
  - Department of Agronomy, Iowa State University, Ames, IA, USA
- Natalie Ronning
  - Plant Biology Section, School of Integrative Plant Science, Cornell University, Ithaca, NY, USA
- Xiaoyu Zhang
  - Department of Plant Biology, University of Georgia, Athens, GA, USA
- Gary J. Muehlbauer
  - Department of Agronomy and Plant Genetics, University of Minnesota, St. Paul, MN, USA
- Michael J. Scanlon
  - Plant Biology Section, School of Integrative Plant Science, Cornell University, Ithaca, NY, USA
- Jianming Yu
  - Department of Agronomy, Iowa State University, Ames, IA, USA
|
22
|
Garcia ALS, Masuda Y, Tsuruta S, Miller S, Misztal I, Lourenco D. Indirect predictions with a large number of genotyped animals using the algorithm for proven and young. J Anim Sci 2020; 98:5831156. [PMID: 32374831 PMCID: PMC7263398 DOI: 10.1093/jas/skaa154] [Citation(s) in RCA: 9] [Impact Index Per Article: 2.3] [Received: 02/11/2020] [Accepted: 04/30/2020] [Indexed: 11/21/2022]
Abstract
Reliable single-nucleotide polymorphism (SNP) effects from genomic best linear unbiased prediction (GBLUP) and single-step GBLUP (ssGBLUP) are needed to calculate indirect predictions (IP) for young genotyped animals and animals not included in official evaluations. Obtaining reliable SNP effects and IP requires a minimum number of animals and when a large number of genotyped animals are available, the algorithm for proven and young (APY) may be needed. Thus, the objectives of this study were to evaluate IP with an increasingly larger number of genotyped animals and to determine the minimum number of animals needed to compute reliable SNP effects and IP. Genotypes and phenotypes for birth weight, weaning weight, and postweaning gain were provided by the American Angus Association. The number of animals with phenotypes was more than 3.8 million. Genotyped animals were assigned to three cumulative year-classes: born until 2013 (N = 114,937), born until 2014 (N = 183,847), and born until 2015 (N = 280,506). A three-trait model was fitted using the APY algorithm with 19,021 core animals under two scenarios: 1) core 2013 (random sample of animals born until 2013) used for all year-classes and 2) core 2014 (random sample of animals born until 2014) used for year-class 2014 and core 2015 (random sample of animals born until 2015) used for year-class 2015. GBLUP used phenotypes from genotyped animals only, whereas ssGBLUP used all available phenotypes. SNP effects were predicted using genomic estimated breeding values (GEBV) from either all genotyped animals or only core animals. The correlations between GEBV from GBLUP and IP obtained using SNP effects from core 2013 were ≥0.99 for animals born in 2013 but as low as 0.07 for animals born in 2014 and 2015. Conversely, the correlations between GEBV from ssGBLUP and IP were ≥0.99 for animals born in all years. 
IP predictive abilities computed with GEBV from ssGBLUP and SNP predictions based on only core animals were as high as those based on all genotyped animals. The correlations between GEBV and IP from ssGBLUP were ≥0.76, ≥0.90, and ≥0.98 when SNP effects were computed using 2k, 5k, and 15k core animals. Suitable IP based on GEBV from GBLUP can be obtained when SNP predictions are based on an appropriate number of core animals, but a considerable decline in IP accuracy can occur in subsequent years. Conversely, IP from ssGBLUP based on large numbers of phenotypes from non-genotyped animals have persistent accuracy over time.
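As a rough illustration of the indirect-prediction idea described in this abstract, the sketch below back-solves SNP effects from GEBV and applies them to animals outside the evaluation. It assumes a VanRaden-type genomic relationship matrix G = ZZ'/k with k = 2Σp(1-p); the toy sizes, allele frequencies, placeholder GEBV, and the function name `snp_effects_from_gebv` are illustrative assumptions, not values or code from the study.

```python
import numpy as np

def snp_effects_from_gebv(Z, G, gebv, k):
    """Back-solve SNP effects from GEBV: a_hat = Z' G^{-1} g_hat / k (assumed form)."""
    return Z.T @ np.linalg.solve(G, gebv) / k

rng = np.random.default_rng(0)
n_ref, n_young, n_snp = 20, 5, 200            # toy sizes, not from the study
p = rng.uniform(0.1, 0.9, n_snp)              # assumed base allele frequencies

# Genotypes coded 0/1/2, then centered by twice the allele frequency.
M_ref = rng.binomial(2, p, size=(n_ref, n_snp)).astype(float)
M_young = rng.binomial(2, p, size=(n_young, n_snp)).astype(float)
Z_ref, Z_young = M_ref - 2 * p, M_young - 2 * p

k = 2 * np.sum(p * (1 - p))                   # VanRaden scaling constant
G = Z_ref @ Z_ref.T / k                       # genomic relationship matrix
gebv_ref = rng.normal(size=n_ref)             # placeholder GEBV, not real output

a_hat = snp_effects_from_gebv(Z_ref, G, gebv_ref, k)
ip_ref = Z_ref @ a_hat                        # reproduces the GEBV used as input
ip_young = Z_young @ a_hat                    # indirect predictions for new animals
```

With an unblended, invertible G, the indirect predictions for the reference animals equal their GEBV by construction. Production software typically blends or tunes G before inversion, so the match is close rather than exact.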
Affiliation(s)
- Andre L S Garcia
  - Department of Animal and Dairy Science, University of Georgia, Athens, GA
- Yutaka Masuda
  - Department of Animal and Dairy Science, University of Georgia, Athens, GA
- Shogo Tsuruta
  - Department of Animal and Dairy Science, University of Georgia, Athens, GA
- Ignacy Misztal
  - Department of Animal and Dairy Science, University of Georgia, Athens, GA
- Daniela Lourenco
  - Department of Animal and Dairy Science, University of Georgia, Athens, GA
|
23
|
Gualdrón Duarte JL, Gori AS, Hubin X, Lourenco D, Charlier C, Misztal I, Druet T. Performances of Adaptive MultiBLUP, Bayesian regressions, and weighted-GBLUP approaches for genomic predictions in Belgian Blue beef cattle. BMC Genomics 2020; 21:545. [PMID: 32762654 PMCID: PMC7430838 DOI: 10.1186/s12864-020-06921-3] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Received: 04/29/2020] [Accepted: 07/17/2020] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND Genomic selection has been successfully implemented in many livestock and crop species. The genomic best linear unbiased predictor (GBLUP) approach, assigning equal variance to all SNP effects, is one of the reference methods. When large-effect variants contribute to complex traits, it has been shown that genomic prediction methods that assign a higher variance to subsets of SNP effects can achieve higher prediction accuracy. We herein compared the efficiency of several such approaches, including the Adaptive MultiBLUP (AM-BLUP) that uses local genomic relationship matrices (GRM) to automatically identify and weight genomic regions with large effects, to predict genetic merit in Belgian Blue beef cattle. RESULTS We used a population of approximately 10,000 genotyped cows and their phenotypes for 14 traits, mostly related to muscular development and body dimensions. According to the trait, we found that 4 to 25% of the genetic variance could be associated with 2 to 12 genomic regions harbouring large-effect variants. Noteworthy, three previously identified recessive deleterious variants presented heterozygote advantage and were among the most significant SNPs for several traits. The AM-BLUP resulted in increased reliability of genomic predictions compared to GBLUP (+ 2%), but Bayesian methods proved more efficient (+ 3%). Overall, the reliability gains remained thus limited although higher gains were observed for skin thickness, a trait affected by two genomic regions having particularly large effects. Higher accuracies than those from the original AM-BLUP were achieved when applying the Bayesian Sparse Linear Mixed Model to pre-select groups of SNPs with large effects and subsequently use their estimated variance to build a weighted GRM. Finally, the single-step GBLUP performed best and could be further improved (+ 3% prediction accuracy) by using these weighted GRM. 
CONCLUSIONS The AM-BLUP is an attractive method to automatically identify and weight genomic regions with large effects on complex traits. However, the method was less accurate than Bayesian methods. Overall, weighted methods achieved modest accuracy gains compared to GBLUP. Nevertheless, the computational efficiency of the AM-BLUP might be valuable at higher marker density, including with whole-genome sequencing data. Furthermore, weighted GRM are particularly useful to account for large variance loci in the single-step GBLUP.
Affiliation(s)
- José Luis Gualdrón Duarte
  - Unit of Animal Genomics, GIGA-R, 11 Avenue de l'Hôpital (B34), University of Liège, 4000, Liège, Belgium
- Ann-Stephan Gori
  - Innovation Department, Elevéo asbl and Inovéo, Awé Group, 5590, Ciney, Belgium
- Xavier Hubin
  - Innovation Department, Elevéo asbl and Inovéo, Awé Group, 5590, Ciney, Belgium
- Daniela Lourenco
  - Department of Animal and Dairy Science, University of Georgia, 425 River Rd, Athens, GA, 30602, USA
- Carole Charlier
  - Unit of Animal Genomics, GIGA-R, 11 Avenue de l'Hôpital (B34), University of Liège, 4000, Liège, Belgium
- Ignacy Misztal
  - Department of Animal and Dairy Science, University of Georgia, 425 River Rd, Athens, GA, 30602, USA
- Tom Druet
  - Unit of Animal Genomics, GIGA-R, 11 Avenue de l'Hôpital (B34), University of Liège, 4000, Liège, Belgium
|
24
|
Lourenco D, Legarra A, Tsuruta S, Masuda Y, Aguilar I, Misztal I. Single-Step Genomic Evaluations from Theory to Practice: Using SNP Chips and Sequence Data in BLUPF90. Genes (Basel) 2020; 11:E790. [PMID: 32674271 PMCID: PMC7397237 DOI: 10.3390/genes11070790] [Citation(s) in RCA: 60] [Impact Index Per Article: 15.0] [Received: 06/19/2020] [Revised: 07/03/2020] [Accepted: 07/06/2020] [Indexed: 11/16/2022] Open
Abstract
Single-step genomic evaluation became a standard procedure in livestock breeding, and the main reason is the ability to combine all pedigree, phenotypes, and genotypes available into one single evaluation, without the need of post-analysis processing. Therefore, the incorporation of data on genotyped and non-genotyped animals in this method is straightforward. Since 2009, two main implementations of single-step were proposed. One is called single-step genomic best linear unbiased prediction (ssGBLUP) and uses single nucleotide polymorphism (SNP) to construct the genomic relationship matrix; the other is the single-step Bayesian regression (ssBR), which is a marker effect model. Under the same assumptions, both models are equivalent. In this review, we focus solely on ssGBLUP. The implementation of ssGBLUP into the BLUPF90 software suite was done in 2009, and since then, several changes were made to make ssGBLUP flexible to any model, number of traits, number of phenotypes, and number of genotyped animals. Single-step GBLUP from the BLUPF90 software suite has been used for genomic evaluations worldwide. In this review, we will show theoretical developments and numerical examples of ssGBLUP using SNP data from regular chips to sequence data.
Affiliation(s)
- Daniela Lourenco
  - Department of Animal and Dairy Science, University of Georgia, Athens, GA 30602, USA
- Andres Legarra
  - Institut National de la Recherche Agronomique, UMR1388 GenPhySE, 31326 Castanet Tolosan, France
- Shogo Tsuruta
  - Department of Animal and Dairy Science, University of Georgia, Athens, GA 30602, USA
- Yutaka Masuda
  - Department of Animal and Dairy Science, University of Georgia, Athens, GA 30602, USA
- Ignacio Aguilar
  - Instituto Nacional de Investigación Agropecuaria (INIA), 11500 Montevideo, Uruguay
- Ignacy Misztal
  - Department of Animal and Dairy Science, University of Georgia, Athens, GA 30602, USA
|
25
|
Ben Zaabza H, Mäntysaari EA, Strandén I. Using Monte Carlo method to include polygenic effects in calculation of SNP-BLUP model reliability. J Dairy Sci 2020; 103:5170-5182. [PMID: 32253036 DOI: 10.3168/jds.2019-17255] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.8] [Received: 07/12/2019] [Accepted: 02/04/2020] [Indexed: 11/19/2022]
Abstract
An SNP-BLUP model is computationally scalable even for large numbers of genotyped animals. When genetic variation cannot be completely captured by SNP markers, a more accurate model is obtained by fitting a residual polygenic effect (RPG) as well. However, inclusion of the RPG effect increases the size of the SNP-BLUP mixed model equations (MME) by the number of genotyped animals. Consequently, the calculation of model reliabilities requiring elements of the inverted MME coefficient matrix becomes more computationally challenging with increasing numbers of genotyped animals. We present a Monte Carlo (MC)-based sampling method to estimate the reliability of the SNP-BLUP model including the RPG effect, where the MME size depends on the number of markers and MC samples. We compared reliabilities calculated using different RPG proportions and different MC sample sizes in analyzing 2 data sets. Data set 1 (data set 2) contained 19,757 (222,619) genotyped animals, with 11,729 (50,240) SNP markers, and 231,186 (13.35 million) pedigree animals. Correlations between the correct and the MC-calculated reliabilities were above 98% even with 5,000 MC samples and an 80% RPG proportion in both data sets. However, more MC samples were needed to achieve a small maximum absolute difference and mean squared error, particularly when the RPG proportion exceeded 20%. The computing time for MC SNP-BLUP was shorter than for GBLUP. In conclusion, the MC-based approach can be an effective strategy for calculating SNP-BLUP model reliability with an RPG effect included.
Affiliation(s)
- H Ben Zaabza
  - Natural Resources Institute Finland (Luke), FI-31600 Jokioinen, Finland
- E A Mäntysaari
  - Natural Resources Institute Finland (Luke), FI-31600 Jokioinen, Finland
- I Strandén
  - Natural Resources Institute Finland (Luke), FI-31600 Jokioinen, Finland
|
26
|
Misztal I, Lourenco D, Legarra A. Current status of genomic evaluation. J Anim Sci 2020; 98:skaa101. [PMID: 32267923 PMCID: PMC7183352 DOI: 10.1093/jas/skaa101] [Citation(s) in RCA: 64] [Impact Index Per Article: 16.0] [Received: 01/29/2020] [Accepted: 04/07/2020] [Indexed: 12/14/2022] Open
Abstract
Early application of genomic selection relied on SNP estimation with phenotypes or de-regressed proofs (DRP). Chips of 50k SNP seemed sufficient for an accurate estimation of SNP effects. Genomic estimated breeding values (GEBV) were composed of an index with parent average, direct genomic value, and deduction of a parental index to eliminate double counting. Use of SNP selection or weighting increased accuracy with small data sets but had minimal to no impact with large data sets. Efforts to include potentially causative SNP derived from sequence data or high-density chips showed limited or no gain in accuracy. After the implementation of genomic selection, EBV by BLUP became biased because of genomic preselection and DRP computed based on EBV required adjustments, and the creation of DRP for females is hard and subject to double counting. Genomic selection was greatly simplified by single-step genomic BLUP (ssGBLUP). This method based on combining genomic and pedigree relationships automatically creates an index with all sources of information, can use any combination of male and female genotypes, and accounts for preselection. To avoid biases, especially under strong selection, ssGBLUP requires that pedigree and genomic relationships are compatible. Because the inversion of the genomic relationship matrix (G) becomes costly with more than 100k genotyped animals, large data computations in ssGBLUP were solved by exploiting limited dimensionality of genomic data due to limited effective population size. With such dimensionality ranging from 4k in chickens to about 15k in cattle, the inverse of G can be created directly (e.g., by the algorithm for proven and young) at a linear cost. Due to its simplicity and accuracy, ssGBLUP is routinely used for genomic selection by the major chicken, pig, and beef industries. Single step can be used to derive SNP effects for indirect prediction and for genome-wide association studies, including computations of the P-values. 
Alternative single-step formulations exist that use SNP effects for genotyped or for all animals. Although genomics is the new standard in breeding and genetics, there are still some problems that need to be solved. This involves new validation procedures that are unaffected by selection, parameter estimation that accounts for all the genomic data used in selection, and strategies to address reduction in genetic variances after genomic selection was implemented.
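The abstract mentions that the inverse of G can be built directly with the algorithm for proven and young (APY). A minimal sketch of the commonly cited APY block formula follows, assuming a dense NumPy matrix and a user-chosen set of core animals; the function name `apy_inverse` and the toy matrix are illustrative assumptions, not BLUPF90 code.

```python
import numpy as np

def apy_inverse(G, core):
    """Sparse-structured approximate inverse of G using the APY block formula.

    Core animals get a full inverse block; each noncore animal is treated as
    conditionally independent of other noncore animals given the core set.
    """
    n = G.shape[0]
    core = np.asarray(core)
    noncore = np.setdiff1d(np.arange(n), core)

    Gcc_inv = np.linalg.inv(G[np.ix_(core, core)])
    Gcn = G[np.ix_(core, noncore)]
    P = Gcc_inv @ Gcn                                  # regression of noncore on core
    # Diagonal of M_nn: g_ii - g_ic Gcc^{-1} g_ci for each noncore animal i.
    m = np.diag(G)[noncore] - np.einsum("ij,ij->j", Gcn, P)
    m_inv = 1.0 / m

    out = np.zeros_like(G, dtype=float)
    out[np.ix_(core, core)] = Gcc_inv + (P * m_inv) @ P.T
    out[np.ix_(core, noncore)] = -P * m_inv
    out[np.ix_(noncore, core)] = (-P * m_inv).T
    out[noncore, noncore] = m_inv                      # diagonal noncore block
    return out

# Toy positive-definite matrix standing in for G (not real genomic data).
rng = np.random.default_rng(1)
A = rng.normal(size=(6, 6))
G_toy = A @ A.T + 6 * np.eye(6)
G_inv_apy = apy_inverse(G_toy, core=[0, 1, 2])
```

Only the core block needs a dense inversion; the noncore part costs one small solve per animal, which is where the linear scaling mentioned in the abstract comes from. With a single noncore animal the formula reduces to the exact block inverse, which is a convenient sanity check.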
Affiliation(s)
- Ignacy Misztal
  - Department of Animal and Dairy Science, University of Georgia, Athens, GA
- Daniela Lourenco
  - Department of Animal and Dairy Science, University of Georgia, Athens, GA
- Andres Legarra
  - Department of Animal Genetics, Institut National de la Recherche Agronomique, Castanet-Tolosan, France
|
27
|
Karaman E, Lund MS, Su G. Multi-trait single-step genomic prediction accounting for heterogeneous (co)variances over the genome. Heredity (Edinb) 2020; 124:274-287. [PMID: 31641237 PMCID: PMC6972913 DOI: 10.1038/s41437-019-0273-4] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.0] [Received: 04/03/2019] [Revised: 09/05/2019] [Accepted: 09/06/2019] [Indexed: 11/23/2022] Open
Abstract
Widely used genomic prediction models may not properly account for heterogeneous (co)variance structure across the genome. Models such as BayesA and BayesB assume locus-specific variance, which are highly influenced by the prior for (co)variance of single nucleotide polymorphism (SNP) effect, regardless of the size of data. Models such as BayesC or GBLUP assume a common (co)variance for a proportion (BayesC) or all (GBLUP) of the SNP effects. In this study, we propose a multi-trait Bayesian whole genome regression method (BayesN0), which is based on grouping a number of predefined SNPs to account for heterogeneous (co)variance structure across the genome. This model was also implemented in single-step Bayesian regression (ssBayesN0). For practical implementation, we considered multi-trait single-step SNPBLUP models, using (co)variance estimates from BayesN0 or ssBayesN0. Genotype data were simulated using haplotypes on first five chromosomes of 2200 Danish Holstein cattle, and phenotypes were simulated for two traits with heritabilities 0.1 or 0.4, assuming 200 quantitative trait loci (QTL). We compared prediction accuracy from different prediction models and different region sizes (one SNP, 100 SNPs, one chromosome or whole genome). In general, highest accuracies were obtained when 100 adjacent SNPs were grouped together. The ssBayesN0 improved accuracies over BayesN0, and using (co)variance estimates from ssBayesN0 generally yielded higher accuracies than using (co)variance estimates from BayesN0, for the 100 SNPs region size. Our results suggest that it could be a good strategy to estimate (co)variance components from ssBayesN0, and then to use those estimates in genomic prediction using multi-trait single-step SNPBLUP, in routine genomic evaluations.
Affiliation(s)
- Emre Karaman
  - Center for Quantitative Genetics and Genomics, Aarhus University, 8830, Tjele, Denmark
- Mogens S Lund
  - Center for Quantitative Genetics and Genomics, Aarhus University, 8830, Tjele, Denmark
- Guosheng Su
  - Center for Quantitative Genetics and Genomics, Aarhus University, 8830, Tjele, Denmark
|
28
|
Pocrnic I, Lourenco DAL, Masuda Y, Misztal I. Accuracy of genomic BLUP when considering a genomic relationship matrix based on the number of the largest eigenvalues: a simulation study. Genet Sel Evol 2019; 51:75. [PMID: 31830899 PMCID: PMC6907194 DOI: 10.1186/s12711-019-0516-0] [Citation(s) in RCA: 19] [Impact Index Per Article: 3.8] [Received: 06/06/2019] [Accepted: 12/04/2019] [Indexed: 12/20/2022] Open
Abstract
Background The dimensionality of genomic information is limited by the number of independent chromosome segments (Me), which is a function of the effective population size. This dimensionality can be determined approximately by singular value decomposition of the gene content matrix, by eigenvalue decomposition of the genomic relationship matrix (GRM), or by the number of core animals in the algorithm for proven and young (APY) that maximizes the accuracy of genomic prediction. In the latter, core animals act as proxies to linear combinations of Me. Field studies indicate that a moderate accuracy of genomic selection is achieved with a small dataset, but that further improvement of the accuracy requires much more data. When only one quarter of the optimal number of core animals are used in the APY algorithm, the accuracy of genomic selection is only slightly below the optimal value. This suggests that genomic selection works on clusters of Me. Results The simulation included datasets with different population sizes and amounts of phenotypic information. Computations were done by genomic best linear unbiased prediction (GBLUP) with selected eigenvalues and corresponding eigenvectors of the GRM set to zero. About four eigenvalues in the GRM explained 10% of the genomic variation, and less than 2% of the total eigenvalues explained 50% of the genomic variation. With limited phenotypic information, the accuracy of GBLUP was close to the peak where most of the smallest eigenvalues were set to zero. With a large amount of phenotypic information, accuracy increased as smaller eigenvalues were added. Conclusions A small amount of phenotypic data is sufficient to estimate only the effects of the largest eigenvalues and the associated eigenvectors that contain a large fraction of the genomic information, and a very large amount of data is required to estimate the remaining eigenvalues that account for a limited amount of genomic information. 
Core animals in the APY algorithm act as proxies of almost the same number of eigenvalues. By using an eigenvalues-based approach, it was possible to explain why the moderate accuracy of genomic selection based on small datasets only increases slowly as more data are added.
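To make the eigenvalue argument in this abstract concrete, the sketch below counts how many of the largest eigenvalues of a relationship matrix are needed to explain a given share of its total variation. The function name `n_eigenvalues_for_fraction` and the simulated matrix are illustrative assumptions; this is not the authors' simulation.

```python
import numpy as np

def n_eigenvalues_for_fraction(G, fraction):
    """Smallest number of largest eigenvalues whose sum is >= fraction of the trace."""
    eigvals = np.linalg.eigvalsh(G)[::-1]          # symmetric matrix, sorted descending
    cumulative = np.cumsum(eigvals) / np.sum(eigvals)
    count = int(np.searchsorted(cumulative, fraction) + 1)
    return min(count, len(eigvals))                # guard against rounding at 1.0

# Toy centered genotype matrix with assumed allele frequencies.
rng = np.random.default_rng(2)
p = rng.uniform(0.1, 0.9, 500)
Z = rng.binomial(2, p, size=(100, 500)) - 2 * p
G_sim = Z @ Z.T / (2 * np.sum(p * (1 - p)))

for frac in (0.10, 0.50, 0.90, 0.98):
    print(frac, n_eigenvalues_for_fraction(G_sim, frac))
```

In a real population with a small effective size the eigenvalue spectrum decays much faster than in this unstructured toy example, which is the property the abstract relies on.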
Affiliation(s)
- Ivan Pocrnic
  - Department of Animal and Dairy Science, University of Georgia, Athens, GA, 30602, USA
- Daniela A L Lourenco
  - Department of Animal and Dairy Science, University of Georgia, Athens, GA, 30602, USA
- Yutaka Masuda
  - Department of Animal and Dairy Science, University of Georgia, Athens, GA, 30602, USA
- Ignacy Misztal
  - Department of Animal and Dairy Science, University of Georgia, Athens, GA, 30602, USA
|
29
|
Fragomeni BO, Lourenco DAL, Legarra A, VanRaden PM, Misztal I. Alternative SNP weighting for single-step genomic best linear unbiased predictor evaluation of stature in US Holsteins in the presence of selected sequence variants. J Dairy Sci 2019; 102:10012-10019. [PMID: 31495612 DOI: 10.3168/jds.2019-16262] [Citation(s) in RCA: 23] [Impact Index Per Article: 4.6] [Received: 01/07/2019] [Accepted: 07/16/2019] [Indexed: 11/19/2022]
Abstract
Causal variants inferred from sequence data analysis are expected to increase accuracy of genomic selection. In this work we evaluated the gain in reliability of genomic predictions, for stature in US Holsteins, when adding selected sequence variants to a pre-existent SNP chip. Two prediction methods were tested: genomic BLUP (GBLUP) using de-regressed proofs and assuming heterogeneous residual variances, and single-step GBLUP (ssGBLUP) using actual phenotypes. Phenotypic data included 3,999,631 records for stature on 3,027,304 Holstein cows. Genotypes on 54,087 SNP markers (54k) were available for 26,877 bulls. Additionally, 16,648 selected sequence variants were combined with the 54k markers, for a total of 70,735 (70k) markers. In all methods, SNP in the genomic relationship matrix (G) were unweighted or weighted iteratively, with weights derived either by SNP effects squared or by a nonlinear method that resembles BayesA (nonlinear A). Reliabilities of genomic predictions were obtained by cross validation. With unweighted G derived from 54k markers, the reliabilities (× 100) were 72.4 for GBLUP and 75.3 for ssGBLUP. With unweighted G derived from 70k markers, the reliabilities were 73.4 and 76.0, respectively. Weighting by nonlinear A changed reliabilities to 73.3, and 75.9, respectively. Addition of selected sequence variants had a small effect on reliabilities. Weighting by quadratic functions reduced reliabilities. Weighting by nonlinear A increased reliabilities for GBLUP but had only a small effect in ssGBLUP. Reliabilities for direct genomic values extracted from ssGBLUP using unweighted G with 54k were higher than reliabilities by any GBLUP. Thus, ssGBLUP seems to capture more information than GBLUP and there is less room for extra reliability. 
Improvements in GBLUP may be because the weights in G change the covariance structure, which can explain a proportion of the variance that is accounted for when a heterogeneous residual variance is assumed by considering a different number of daughters per bull.
Affiliation(s)
- B O Fragomeni
  - Department of Animal Science, University of Connecticut, Storrs-Mansfield 06269
- D A L Lourenco
  - Department of Animal and Dairy Science, University of Georgia, Athens 30602
- A Legarra
  - Institut National de la Recherche Agronomique, UMR1388 GenPhySE, Castanet Tolosan, France 31326
- P M VanRaden
  - Animal Genomics and Improvement Laboratory, Agricultural Research Service, USDA, Beltsville, MD 20705
- I Misztal
  - Department of Animal and Dairy Science, University of Georgia, Athens 30602
|
30
|
Improvement of genomic prediction by integrating additional single nucleotide polymorphisms selected from imputed whole genome sequencing data. Heredity (Edinb) 2019; 124:37-49. [PMID: 31278370 PMCID: PMC6906477 DOI: 10.1038/s41437-019-0246-7] [Citation(s) in RCA: 27] [Impact Index Per Article: 5.4] [Received: 01/30/2019] [Revised: 05/11/2019] [Accepted: 06/17/2019] [Indexed: 11/10/2022] Open
Abstract
The availability of whole genome sequencing (WGS) data enables the discovery of causative single nucleotide polymorphisms (SNPs) or SNPs in high linkage disequilibrium with causative SNPs. This study investigated effects of integrating SNPs selected from imputed WGS data into the data of 54K chip on genomic prediction in Danish Jersey. The WGS SNPs, mainly including peaks of quantitative trait loci, structure variants, regulatory regions of genes, and SNPs within genes with strong effects predicted with variant effect predictor, were selected in previous analyses for dairy breeds in Denmark–Finland–Sweden (DFS) and France (FRA). Animals genotyped with 54K chip, standard LD chip, and customized LD chip which covered selected WGS SNPs and SNPs in the standard LD chip, were imputed to 54K together with DFS and FRA SNPs. Genomic best linear unbiased prediction (GBLUP) and Bayesian four-distribution mixture models considering 54K and selected WGS SNPs as one (a one-component model) or two separate genetic components (a two-component model) were used to predict breeding values. For milk production traits and mastitis, both DFS (0.025) and FRA (0.029) sets of additional WGS SNPs improved reliabilities, and inclusions of all selected WGS SNPs generally achieved highest improvements of reliabilities (0.034). A Bayesian four-distribution model yielded higher reliabilities than a GBLUP model for milk and protein, but extra gains in reliabilities from using selected WGS SNPs were smaller for a Bayesian four-distribution model than a GBLUP model. Generally, no significant difference was observed between one-component and two-component models, except for using GBLUP models for milk.
|
31
|
Hao Y, Wang H, Yang X, Zhang H, He C, Li D, Li H, Wang G, Wang J, Fu J. Genomic Prediction using Existing Historical Data Contributing to Selection in Biparental Populations: A Study of Kernel Oil in Maize. THE PLANT GENOME 2019; 12. [PMID: 30951098 DOI: 10.3835/plantgenome2018.05.0025] [Citation(s) in RCA: 12] [Impact Index Per Article: 2.4] [Indexed: 05/05/2023]
Abstract
Maize (Zea mays L.) kernel oil provides high-quality nutrition for animal feed and human health. A certain number of maize breeding programs seek to enhance oil concentration and composition. Genomic selection (GS), which entails selection based on genomic estimated breeding values (GEBVs), has proven to be efficient in breeding programs. Here, we estimate the robustness of predictions for the oil traits of maize kernels in biparental recombination inbred lines (RILs) using a GS model built based on an association population. Most statistical models, including ridge regression-best linear unbiased prediction (RR-BLUP), showed high prediction accuracy in the training population through a cross validation procedure. The training population size was more important than marker density and a statistical model for prediction performance. Using the optimized GS model, prediction of the biparental RIL population showed medium-high prediction accuracy (0.68) compared with prediction using only oil associated markers (0.43). The potential to apply the GS model to another RIL population that is genetically less related to the training population was also examined, showing promising prediction accuracy in the top selected lines. Our results proved that genomic prediction using existing data is robust for the prediction of polygenic traits with moderate to high heritability.
|
32
|
Impact of genotyping strategy on the accuracy of genomic prediction in simulated populations of purebred swine. Animal 2019; 13:1804-1810. [PMID: 30616709 DOI: 10.1017/s1751731118003567] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Indexed: 11/06/2022] Open
Abstract
Single-step genomic BLUP (ssGBLUP) has been widely used in genomic evaluation due to relatively higher prediction accuracy and simplicity of use. The prediction accuracy from ssGBLUP depends on the amount of information available concerning both genotype and phenotype. This study investigated how information on genotype and phenotype that had been acquired from previous generations influences the prediction accuracy of ssGBLUP, and thus we sought an optimal balance about genotypic and phenotypic information to achieve a cost-effective and computationally efficient genomic evaluation. We generated two genetically correlated traits (h2 = 0.35 for trait A, h2 = 0.10 for trait B and genetic correlation 0.20) as well as two distinct populations mimicking purebred swine. Phenotypic and genotypic information in different numbers of previous generations and different genotyping rates for each litter were set to generate different datasets. Prediction accuracy was evaluated by correlating genomic estimated breeding values with true breeding values for genotyped animals in the last generation. The results revealed a negligible impact of previous generations that lacked genotyped animals on the prediction accuracy. Phenotypic and genotypic data, including the most recent three to four generations with a genotyping rate of 40% or 50% for each litter, could lead to asymptotic maximum prediction accuracy for genotyped animals in the last generation. Single-step genomic best linear unbiased prediction yielded an optimal balance about genotypic and phenotypic information to ensure a cost-effective and computationally efficient genomic evaluation of populations of polytocous animals such as purebred pigs.
|
33
|
Genomic Prediction Using Multi-trait Weighted GBLUP Accounting for Heterogeneous Variances and Covariances Across the Genome. G3-GENES GENOMES GENETICS 2018; 8:3549-3558. [PMID: 30194089 PMCID: PMC6222589 DOI: 10.1534/g3.118.200673] [Citation(s) in RCA: 20] [Impact Index Per Article: 3.3] [Indexed: 11/24/2022]
Abstract
Implicit assumption of common (co)variance for all loci in multi-trait Genomic Best Linear Unbiased Prediction (GBLUP) results in a genomic relationship matrix (G) that is common to all traits. When this assumption is violated, Bayesian whole genome regression methods may be superior to GBLUP by accounting for unequal (co)variance for all loci or genome regions. This study aimed to develop a strategy to improve the accuracy of GBLUP for multi-trait genomic prediction, using (co)variance estimates of SNP effects from Bayesian whole genome regression methods. Five generations (G1-G5, test populations) of genotype data were available by simulations based on data of 2,200 Danish Holstein cows (G0, reference population). Two correlated traits with heritabilities of 0.1 or 0.4, and a genetic correlation of 0.45 were generated. First, SNP effects and breeding values were estimated using BayesAS method, assuming (co)variance was the same for SNPs within a genome region, and different between regions. Region size was set as one SNP, 100 SNPs, a whole chromosome or whole genome. Second, posterior (co)variances of SNP effects were used to weight SNPs in construction of G matrices. In general, region size of 100 SNPs led to highest prediction accuracies using BayesAS, and wGBLUP outperformed GBLUP at this region size. Our results suggest that when genetic architectures of traits favor Bayesian methods, the accuracy of multi-trait GBLUP can be as high as the Bayesian method if SNPs are weighted by the Bayesian posterior (co)variances.
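The weighting step described in this abstract can be sketched for the single-trait case as a weighted relationship matrix G_w = Z D Z' / k, where D is a diagonal matrix of SNP weights. The scaling used here (weights rescaled to mean 1, k = 2Σp(1-p)) is one common convention and an assumption of this sketch; the function name `weighted_grm` and the gamma-distributed stand-in weights are hypothetical and do not reproduce the BayesAS posterior (co)variances used in the study.

```python
import numpy as np

def weighted_grm(M, p, weights=None):
    """Weighted genomic relationship matrix Z D Z' / k (single-trait sketch).

    M: genotypes coded 0/1/2 (animals x SNPs); p: allele frequencies;
    weights: per-SNP weights, e.g. estimated SNP variances (assumed input).
    """
    Z = M - 2 * p
    if weights is None:
        d = np.ones(M.shape[1])
    else:
        d = np.asarray(weights, dtype=float)
        d = d / d.mean()                       # rescale so the average weight is 1
    k = 2 * np.sum(p * (1 - p))
    return (Z * d) @ Z.T / k                   # column-wise scaling equals Z @ diag(d)

rng = np.random.default_rng(3)
p_w = rng.uniform(0.1, 0.9, 300)
M_w = rng.binomial(2, p_w, size=(10, 300)).astype(float)
snp_var = rng.gamma(shape=1.0, scale=1.0, size=300)   # stand-in for posterior variances

G_plain = weighted_grm(M_w, p_w)
G_weighted = weighted_grm(M_w, p_w, snp_var)
```

Equal weights recover the unweighted matrix, so the weighted version can be dropped into an existing GBLUP pipeline without changing anything else.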
|
34
|
Cheng H, Kizilkaya K, Zeng J, Garrick D, Fernando R. Genomic Prediction from Multiple-Trait Bayesian Regression Methods Using Mixture Priors. Genetics 2018. [PMID: 29514861 DOI: 10.1534/genetics.118.300650/-/dc1] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Indexed: 05/01/2023]
Abstract
Bayesian multiple-regression methods incorporating different mixture priors for marker effects are used widely in genomic prediction. Improvement in prediction accuracies from using those methods, such as BayesB, BayesC, and BayesCπ, have been shown in single-trait analyses with both simulated and real data. These methods have been extended to multi-trait analyses, but only under the restrictive assumption that a locus simultaneously affects all the traits or none of them. This assumption is not biologically meaningful, especially in multi-trait analyses involving many traits. In this paper, we develop and implement a more general multi-trait BayesCΠ and BayesB methods allowing a broader range of mixture priors. Our methods allow a locus to affect any combination of traits, e.g., in a 5-trait analysis, the "restrictive" model only allows two situations, whereas ours allow all 32 situations. Further, we compare our methods to single-trait methods and the "restrictive" multi-trait formulation using real and simulated data. In the real data analysis, higher prediction accuracies were observed from both our new broad-based multi-trait methods and the "restrictive" formulation. The broad-based and restrictive multi-trait methods showed similar prediction accuracies. In the simulated data analysis, higher prediction accuracies to the "restrictive" method were observed from our general multi-trait methods for intermediate training population size. The software tool JWAS offers open-source routines to perform these analyses.
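The "two situations versus 32 situations" count in this abstract follows from simple enumeration: with t traits, a locus can have a zero or non-zero effect on each trait, giving 2^t inclusion patterns, of which only "all zero" and "all non-zero" are allowed under the restrictive model. A minimal sketch, with a hypothetical helper name and no connection to the JWAS code base:

```python
from itertools import product

def inclusion_patterns(n_traits, restrictive=False):
    """Enumerate which traits a locus may affect (1 = non-zero effect).

    restrictive=True keeps only the two patterns in which the locus
    affects all traits or none of them.
    """
    all_patterns = list(product((0, 1), repeat=n_traits))
    if restrictive:
        return [p for p in all_patterns if len(set(p)) == 1]
    return all_patterns

general = inclusion_patterns(5)
restricted = inclusion_patterns(5, restrictive=True)
print(len(general), len(restricted))
```

The number of patterns doubles with each added trait, which is why the restrictive assumption becomes harder to justify as more traits enter the analysis.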
Affiliation(s)
- Hao Cheng: Department of Animal Science, University of California Davis, California 95616
- Kadir Kizilkaya: Department of Animal Science, Adnan Menderes University, 9100 Aydin, Turkey
- Jian Zeng: Program in Complex Trait Genomics, Institute for Molecular Bioscience, University of Queensland, St Lucia, QLD 4072, Australia
- Dorian Garrick: School of Agriculture, Massey University, Palmerston North 4442 New Zealand
- Rohan Fernando: Department of Animal Science, Iowa State University, Ames, Iowa 50011-1050
|
35
|
Zeng J, Garrick D, Dekkers J, Fernando R. A nested mixture model for genomic prediction using whole-genome SNP genotypes. PLoS One 2018; 13:e0194683. [PMID: 29561877 PMCID: PMC5862491 DOI: 10.1371/journal.pone.0194683] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.8] [Received: 05/20/2017] [Accepted: 03/07/2018] [Indexed: 11/19/2022]
Abstract
Genomic prediction exploits single nucleotide polymorphisms (SNPs) across the whole genome for predicting genetic merit of selection candidates. In most models for genomic prediction, e.g. BayesA, B, C, R and GBLUP, independence of SNP effects is assumed. However, SNP effects are expected to be locally dependent given the presence of a nearby QTL because SNPs surrounding the QTL do not segregate independently. A consequence of ignoring this dependence is that SNPs with small effects may be overly shrunk, e.g. effects from markers with high minor allele frequencies (MAF) that flank QTL with low MAF. A nested mixture model (BayesN) is developed to account for the dependence of effects of SNPs that are closely linked, where the effects of SNPs in every non-overlapping genomic window a priori follow a point mass at zero for all SNPs or a mixture of some SNPs with nonzero effects and others with zero effects. It can be regarded as a parsimonious alternative to the existing antedependence model, antiBayesB, which allow a nonstationary dependence of SNP effects. Illumina 777K BovineHD genotypes from 948 Angus cattle were used to simulate 5,000 offspring, with 4,000 used for training and 1,000 for validation. Scenarios with 300 common (MAF > 0.05) or rare (MAF < 0.05) QTL randomly selected from segregating SNPs were replicated 8 times. SNPs corresponding to QTL were masked from a 600k panel comprising SNPs with MAF > 0.05 or a 50k evenly spaced subset of these. Compared with BayesB and a modified antiBayesB, BayesN improved the accuracy of prediction up to 2.0% with 50k SNPs and up to 7.0% with 600k SNPs, most improvements occurring in the rare QTL scenario. Computing time was reduced up to 60% with 50k SNPs and up to 75% with 600k SNPs. BayesN is an accurate and computationally efficient method for genomic prediction with whole-genome SNPs, especially for traits with rare QTL.
Affiliation(s)
- Jian Zeng: Institute for Molecular Bioscience, The University of Queensland, Brisbane, Queensland, Australia
- Dorian Garrick: School of Agriculture, Massey University, Palmerston North, New Zealand
- Jack Dekkers: Department of Animal Science, Iowa State University, Ames, Iowa, United States of America
- Rohan Fernando: Department of Animal Science, Iowa State University, Ames, Iowa, United States of America
|
37
|
Morota G. ShinyGPAS: interactive genomic prediction accuracy simulator based on deterministic formulas. Genet Sel Evol 2017; 49:91. [PMID: 29262775 PMCID: PMC5738850 DOI: 10.1186/s12711-017-0368-4] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.3] [Received: 07/03/2017] [Accepted: 12/11/2017] [Indexed: 11/10/2022]
Abstract
BACKGROUND Deterministic formulas for the accuracy of genomic predictions highlight the relationships among prediction accuracy and potential factors influencing prediction accuracy prior to performing computationally intensive cross-validation. Visualizing such deterministic formulas in an interactive manner may lead to a better understanding of how genetic factors control prediction accuracy. RESULTS The software to simulate deterministic formulas for genomic prediction accuracy was implemented in R and encapsulated as a web-based Shiny application. Shiny genomic prediction accuracy simulator (ShinyGPAS) simulates various deterministic formulas and delivers dynamic scatter plots of prediction accuracy versus genetic factors impacting prediction accuracy, while requiring only mouse navigation in a web browser. ShinyGPAS is available at: https://chikudaisei.shinyapps.io/shinygpas/ . CONCLUSION ShinyGPAS is a shiny-based interactive genomic prediction accuracy simulator using deterministic formulas. It can be used for interactively exploring potential factors that influence prediction accuracy in genome-enabled prediction, simulating achievable prediction accuracy prior to genotyping individuals, or supporting in-class teaching. ShinyGPAS is open source software and it is hosted online as a freely available web-based resource with an intuitive graphical user interface.
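As an illustration of what a deterministic accuracy formula looks like, the sketch below evaluates one widely cited form, usually attributed to Daetwyler and colleagues: r = sqrt(N h² / (N h² + Me)), where N is the training population size, h² the heritability, and Me the number of independent chromosome segments. The abstract does not list which formulas ShinyGPAS implements or how it parameterizes them, so this is an assumption for illustration only; the function name is hypothetical.

```python
import math

def expected_accuracy(n_train, h2, m_e):
    """Expected genomic prediction accuracy under the assumed formula
    r = sqrt(N * h2 / (N * h2 + Me)).

    n_train: number of phenotyped and genotyped training individuals (N)
    h2:      heritability of the trait, between 0 and 1
    m_e:     number of independent chromosome segments (Me), > 0
    """
    if n_train < 0 or not 0.0 <= h2 <= 1.0 or m_e <= 0:
        raise ValueError("require n_train >= 0, 0 <= h2 <= 1, m_e > 0")
    signal = n_train * h2
    return math.sqrt(signal / (signal + m_e))

# Explore how accuracy responds to training size before genotyping anyone.
for n in (500, 1000, 5000, 20000):
    print(n, round(expected_accuracy(n, h2=0.3, m_e=1000), 3))
```

Under this form, accuracy rises with N and h² and falls as Me grows, with diminishing returns at large N.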
Affiliation(s)
- Gota Morota: Department of Animal Science, University of Nebraska-Lincoln, PO Box 830908, Lincoln, NE, 68583-0908, USA.
|
38
|
Factors affecting GEBV accuracy with single-step Bayesian models. Heredity (Edinb) 2017; 120:100-109. [PMID: 29167557 DOI: 10.1038/s41437-017-0010-9] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.0] [Received: 06/15/2017] [Revised: 09/04/2017] [Accepted: 09/14/2017] [Indexed: 12/23/2022]
Abstract
A single-step approach to obtain genomic prediction was first proposed in 2009. Many studies have investigated the components of GEBV accuracy in genomic selection. However, it is still unclear how the population structure and the relationships between training and validation populations influence GEBV accuracy in terms of single-step analysis. Here, we explored the components of GEBV accuracy in single-step Bayesian analysis with a simulation study. Three scenarios with various numbers of QTL (5, 50, and 500) were simulated. Three models were implemented to analyze the simulated data: single-step genomic best linear unbiased prediction (GBLUP; SSGBLUP), single-step BayesA (SS-BayesA), and single-step BayesB (SS-BayesB). According to our results, GEBV accuracy was influenced by the relationships between the training and validation populations more significantly for ungenotyped animals than for genotyped animals. SS-BayesA/BayesB showed an obvious advantage over SSGBLUP with the scenarios of 5 and 50 QTL. SS-BayesB model obtained the lowest accuracy with the 500 QTL in the simulation. SS-BayesA model was the most efficient and robust considering all QTL scenarios. Generally, both the relationships between training and validation populations and LD between markers and QTL contributed to GEBV accuracy in the single-step analysis, and the advantages of single-step Bayesian models were more apparent when the trait is controlled by fewer QTL.
|
39
|
Lourenco DAL, Fragomeni BO, Bradford HL, Menezes IR, Ferraz JBS, Aguilar I, Tsuruta S, Misztal I. Implications of SNP weighting on single-step genomic predictions for different reference population sizes. J Anim Breed Genet 2017; 134:463-471. [PMID: 28833593 DOI: 10.1111/jbg.12288] [Citation(s) in RCA: 26] [Impact Index Per Article: 3.7] [Received: 02/28/2017] [Accepted: 07/19/2017] [Indexed: 01/20/2023]
Abstract
We investigated the importance of SNP weighting in populations with 2,000 to 25,000 genotyped animals. Populations were simulated with two effective sizes (20 or 100) and three numbers of QTL (10, 50 or 500). Pedigree information was available for six generations; phenotypes were recorded for the four middle generations. Animals from the last three generations were genotyped for 45,000 SNP. Single-step genomic BLUP (ssGBLUP) and weighted ssGBLUP (WssGBLUP) were used to estimate genomic EBV using a genomic relationship matrix (G). The WssGBLUP performed better in small genotyped populations; however, any advantage for WssGBLUP was reduced or eliminated when more animals were genotyped. WssGBLUP had greater resolution for genome-wide association (GWA) as did increasing the number of genotyped animals. For few QTL, accuracy was greater for WssGBLUP than ssGBLUP; however, for many QTL, accuracy was the same for both methods. The largest genotyped set was used to assess the dimensionality of genomic information (number of effective SNP). The number of effective SNP was considerably less in weighted G than in unweighted G. Once the number of independent SNP is well represented in the genotyped population, the impact of SNP weighting becomes less important.
Affiliation(s)
- D A L Lourenco: Department of Animal and Dairy Science, University of Georgia, Athens, GA, USA
- B O Fragomeni: Department of Animal and Dairy Science, University of Georgia, Athens, GA, USA
- H L Bradford: Department of Animal and Dairy Science, University of Georgia, Athens, GA, USA
- I R Menezes: FZEA, University of Sao Paulo, Pirassununga, SP, Brazil
- J B S Ferraz: FZEA, University of Sao Paulo, Pirassununga, SP, Brazil
- I Aguilar: Instituto Nacional de Investigacion Agropecuaria, Canelones, Uruguay
- S Tsuruta: Department of Animal and Dairy Science, University of Georgia, Athens, GA, USA
- I Misztal: Department of Animal and Dairy Science, University of Georgia, Athens, GA, USA
|
40
|
Fragomeni BO, Lourenco DAL, Masuda Y, Legarra A, Misztal I. Incorporation of causative quantitative trait nucleotides in single-step GBLUP. Genet Sel Evol 2017; 49:59. [PMID: 28747171 PMCID: PMC5530494 DOI: 10.1186/s12711-017-0335-0] [Citation(s) in RCA: 51] [Impact Index Per Article: 7.3] [Received: 03/28/2017] [Accepted: 07/17/2017] [Indexed: 11/23/2022]
Abstract
Background Much effort is put into identifying causative quantitative trait nucleotides (QTN) in animal breeding, empowered by the availability of dense single nucleotide polymorphism (SNP) information. Genomic selection using traditional SNP information is easily implemented for any number of genotyped individuals using single-step genomic best linear unbiased predictor (ssGBLUP) with the algorithm for proven and young (APY). Our aim was to investigate whether ssGBLUP is useful for genomic prediction when some or all QTN are known. Methods Simulations included 180,000 animals across 11 generations. Phenotypes were available for all animals in generations 6 to 10. Genotypes for 60,000 SNPs across 10 chromosomes were available for 29,000 individuals. The genetic variance was fully accounted for by 100 or 1000 biallelic QTN. Raw genomic relationship matrices (GRM) were computed from (a) unweighted SNPs, (b) unweighted SNPs and causative QTN, (c) SNPs and causative QTN weighted with results obtained with genome-wide association studies, (d) unweighted SNPs and causative QTN with simulated weights, (e) only unweighted causative QTN, (f–h) as in (b–d) but using only the top 10% causative QTN, and (i) using only causative QTN with simulated weight. Predictions were computed by pedigree-based BLUP (PBLUP) and ssGBLUP. Raw GRM were blended with 1 or 5% of the numerator relationship matrix, or 1% of the identity matrix. Inverses of GRM were obtained directly or with APY. Results Accuracy of breeding values for 5000 genotyped animals in the last generation with PBLUP was 0.32, and for ssGBLUP it increased to 0.49 with an unweighted GRM, 0.53 after adding unweighted QTN, 0.63 when QTN weights were estimated, and 0.89 when QTN weights were based on true effects known from the simulation. When the GRM was constructed from causative QTN only, accuracy was 0.95 and 0.99 with blending at 5 and 1%, respectively. Accuracies simulating 1000 QTN were generally lower, with a similar trend. Accuracies using the APY inverse were equal or higher than those with a regular inverse. Conclusions Single-step GBLUP can account for causative QTN via a weighted GRM. Accuracy gains are maximum when variances of causative QTN are known and blending is at 1%.
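The blending step mentioned in the Methods can be sketched as a convex combination of the raw GRM with a second matrix, G_blended = (1 - w) G + w B, where B is either the numerator relationship matrix for the genotyped animals or the identity matrix and w is 0.01 or 0.05. This is the conventional form of blending and is assumed here; the abstract does not spell out the formula. The function name is hypothetical.

```python
import numpy as np

def blend_grm(G, other, weight):
    """Blend a raw genomic relationship matrix with another matrix.

    G:      raw GRM, shape (n, n)
    other:  matrix to blend in (e.g. pedigree relationships or identity)
    weight: proportion given to `other`, e.g. 0.01 or 0.05
    """
    G = np.asarray(G, dtype=float)
    other = np.asarray(other, dtype=float)
    if G.shape != other.shape:
        raise ValueError("matrices must have the same shape")
    if not 0.0 <= weight <= 1.0:
        raise ValueError("weight must lie between 0 and 1")
    return (1.0 - weight) * G + weight * other

# Toy case: a singular raw GRM becomes invertible after blending
# with 1% of the identity matrix.
G_raw = np.array([[1.0, 1.0], [1.0, 1.0]])
G_blended = blend_grm(G_raw, np.eye(2), 0.01)
G_inverse = np.linalg.inv(G_blended)
```

Blending is typically done because a raw GRM is often singular or poorly conditioned, and the inverse is what enters the single-step mixed model equations.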
Affiliation(s)
- Breno O Fragomeni: Edgar L. Rhodes Center for Animal and Dairy Science, University of Georgia, Athens, GA, USA.
- Daniela A L Lourenco: Edgar L. Rhodes Center for Animal and Dairy Science, University of Georgia, Athens, GA, USA
- Yutaka Masuda: Edgar L. Rhodes Center for Animal and Dairy Science, University of Georgia, Athens, GA, USA
- Andres Legarra: GenPhySE, INRA, INPT, INP-ENVT, Université de Toulouse, 31326, Castanet-Tolosan, France
- Ignacy Misztal: Edgar L. Rhodes Center for Animal and Dairy Science, University of Georgia, Athens, GA, USA
|
41
|
Neyhart JL, Tiede T, Lorenz AJ, Smith KP. Evaluating Methods of Updating Training Data in Long-Term Genomewide Selection. G3 (Bethesda) 2017; 7:1499-1510. [PMID: 28315831 PMCID: PMC5427505 DOI: 10.1534/g3.117.040550] [Citation(s) in RCA: 31] [Impact Index Per Article: 4.4] [Received: 02/14/2017] [Accepted: 03/10/2017] [Indexed: 12/22/2022]
Abstract
Genomewide selection is hailed for its ability to facilitate greater genetic gains per unit time. Over breeding cycles, the requisite linkage disequilibrium (LD) between quantitative trait loci and markers is expected to change as a result of recombination, selection, and drift, leading to a decay in prediction accuracy. Previous research has identified the need to update the training population using data that may capture new LD generated over breeding cycles; however, optimal methods of updating have not been explored. In a barley (Hordeum vulgare L.) breeding simulation experiment, we examined prediction accuracy and response to selection when updating the training population each cycle with the best predicted lines, the worst predicted lines, both the best and worst predicted lines, random lines, criterion-selected lines, or no lines. In the short term, we found that updating with the best predicted lines or the best and worst predicted lines resulted in high prediction accuracy and genetic gain, but in the long term, all methods (besides not updating) performed similarly. We also examined the impact of including all data in the training population or only the most recent data. Though patterns among update methods were similar, using a smaller but more recent training population provided a slight advantage in prediction accuracy and genetic gain. In an actual breeding program, a breeder might desire to gather phenotypic data on lines predicted to be the best, perhaps to evaluate possible cultivars. Therefore, our results suggest that an optimal method of updating the training population is also very practical.
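The update strategies compared in this abstract can be sketched as a selection rule over predicted values. The sketch covers the "best", "worst", "best and worst", "random" and "no lines" options; the criterion-selected option is omitted because the abstract does not define the criterion. The function name, argument names and the even split between best and worst lines are assumptions, not the authors' implementation.

```python
import random

def choose_update_lines(predicted, n_add, method, seed=None):
    """Pick lines to phenotype and add to the training population.

    predicted: dict mapping line name -> predicted genotypic value
    n_add:     number of lines to add this breeding cycle
    method:    "best", "worst", "best_and_worst", "random" or "none"
    """
    if not 0 <= n_add <= len(predicted):
        raise ValueError("n_add must be between 0 and the number of lines")
    # Rank lines from highest to lowest predicted value.
    ranked = sorted(predicted, key=predicted.get, reverse=True)
    if method == "none" or n_add == 0:
        return []
    if method == "best":
        return ranked[:n_add]
    if method == "worst":
        return ranked[-n_add:]
    if method == "best_and_worst":
        n_worst = n_add // 2              # assumed even split
        n_best = n_add - n_worst
        return ranked[:n_best] + (ranked[-n_worst:] if n_worst else [])
    if method == "random":
        return random.Random(seed).sample(ranked, n_add)
    raise ValueError(f"unknown method: {method}")

predictions = {"a": 3.0, "b": 1.0, "c": 2.0, "d": 5.0, "e": 4.0}
print(choose_update_lines(predictions, 2, "best_and_worst"))
```

Selecting the best predicted lines matches what a breeder would phenotype anyway, which is the practical point the abstract closes on.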
Affiliation(s)
- Jeffrey L Neyhart: Department of Agronomy and Plant Genetics, University of Minnesota, St. Paul, Minnesota 55108
- Tyler Tiede: Department of Agronomy and Plant Genetics, University of Minnesota, St. Paul, Minnesota 55108
- Aaron J Lorenz: Department of Agronomy and Plant Genetics, University of Minnesota, St. Paul, Minnesota 55108
- Kevin P Smith: Department of Agronomy and Plant Genetics, University of Minnesota, St. Paul, Minnesota 55108
|
42
|
Application of Whole-Genome Prediction Methods for Genome-Wide Association Studies: A Bayesian Approach. J Agric Biol Environ Stat 2017. [DOI: 10.1007/s13253-017-0277-6] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.0] [Indexed: 10/19/2022]
|
43
|
Affiliation(s)
- Patrick J Brown: Associate Professor in the Department of Crop Sciences, University of Illinois, Urbana, Illinois 61801, USA
|
44
|
Yu X, Li X, Guo T, Zhu C, Wu Y, Mitchell SE, Roozeboom KL, Wang D, Wang ML, Pederson GA, Tesso TT, Schnable PS, Bernardo R, Yu J. Genomic prediction contributing to a promising global strategy to turbocharge gene banks. Nat Plants 2016; 2:16150. [PMID: 27694945 DOI: 10.1038/nplants.2016.150] [Citation(s) in RCA: 121] [Impact Index Per Article: 15.1] [Received: 05/24/2016] [Accepted: 08/31/2016] [Indexed: 05/18/2023]
Abstract
The 7.4 million plant accessions in gene banks are largely underutilized due to various resource constraints, but current genomic and analytic technologies are enabling us to mine this natural heritage. Here we report a proof-of-concept study to integrate genomic prediction into a broad germplasm evaluation process. First, a set of 962 biomass sorghum accessions were chosen as a reference set by germplasm curators. With high throughput genotyping-by-sequencing (GBS), we genetically characterized this reference set with 340,496 single nucleotide polymorphisms (SNPs). A set of 299 accessions was selected as the training set to represent the overall diversity of the reference set, and we phenotypically characterized the training set for biomass yield and other related traits. Cross-validation with multiple analytical methods using the data of this training set indicated high prediction accuracy for biomass yield. Empirical experiments with a 200-accession validation set chosen from the reference set confirmed high prediction accuracy. The potential to apply the prediction model to broader genetic contexts was also examined with an independent population. Detailed analyses on prediction reliability provided new insights into strategy optimization. The success of this project illustrates that a global, cost-effective strategy may be designed to assess the vast amount of valuable germplasm archived in 1,750 gene banks.
Affiliation(s)
- Xiaoqing Yu: Iowa State University, Ames, Iowa 50011, USA
- Xianran Li: Iowa State University, Ames, Iowa 50011, USA
- Yuye Wu: Kansas State University, Manhattan, Kansas 66506, USA
- Donghai Wang: Kansas State University, Manhattan, Kansas 66506, USA
- Ming Li Wang: US Department of Agriculture, Agricultural Research Service (USDA-ARS), Griffin, Georgia 30223, USA
- Gary A Pederson: US Department of Agriculture, Agricultural Research Service (USDA-ARS), Griffin, Georgia 30223, USA
- Rex Bernardo: University of Minnesota, St. Paul, Minnesota 55108, USA
- Jianming Yu: Iowa State University, Ames, Iowa 50011, USA
|