1. Ferrão MAG, da Fonseca AFA, Volpi PS, de Souza LC, Comério M, Filho ACV, Riva-Souza EM, Munoz PR, Ferrão RG, Ferrão LFV. Genomic-assisted breeding for climate-smart coffee. The Plant Genome 2024; 17:e20321. [PMID: 36946358 DOI: 10.1002/tpg2.20321] [Received: 08/31/2022] [Revised: 01/25/2023] [Accepted: 02/12/2023] [Indexed: 06/18/2023]
Abstract
Coffee is a universal beverage that drives a multi-industry market on a global basis. Today, the sustainability of coffee production is threatened by accelerating climate change. In this work, we propose the implementation of genomic-assisted breeding for climate-smart coffee in Coffea canephora, a species adapted to higher temperatures and more resilient to biotic and abiotic stresses. After evaluating two populations over multiple harvests and under severe drought conditions, we dissected the genetic architecture of yield, disease resistance, and quality-related traits. By integrating genome-wide association studies and diallel analyses, our contribution is fourfold: (i) we identified a set of molecular markers with major effects associated with disease resistance and post-harvest traits, while yield and plant architecture showed a polygenic background; (ii) we demonstrated the relevance of nonadditive gene action and projected hybrid vigor when genotypes from geographically distinct botanical groups are crossed; (iii) we estimated medium-to-high heritability for most traits, indicating potential for fast genetic progress; and (iv) we provided a first step toward implementing molecular breeding to accelerate improvement in C. canephora. Altogether, this work is a blueprint for how quantitative genetics and genomics can assist coffee breeding and support the supply chain in the face of current global change.
Affiliation(s)
- Maria Amélia G Ferrão
- Instituto Capixaba de Pesquisa, Assistência Técnica e Extensão Rural-Incaper, ES, Brazil
- Empresa Brasileira de Pesquisa Agropecuária-Embrapa Café, Brasília, Brazil
- Aymbire F A da Fonseca
- Instituto Capixaba de Pesquisa, Assistência Técnica e Extensão Rural-Incaper, ES, Brazil
- Empresa Brasileira de Pesquisa Agropecuária-Embrapa Café, Brasília, Brazil
- Paulo S Volpi
- Instituto Capixaba de Pesquisa, Assistência Técnica e Extensão Rural-Incaper, ES, Brazil
- Lucimara C de Souza
- Instituto Capixaba de Pesquisa, Assistência Técnica e Extensão Rural-Incaper, ES, Brazil
- Marcone Comério
- Instituto Capixaba de Pesquisa, Assistência Técnica e Extensão Rural-Incaper, ES, Brazil
- Abraão C Verdin Filho
- Instituto Capixaba de Pesquisa, Assistência Técnica e Extensão Rural-Incaper, ES, Brazil
- Elaine M Riva-Souza
- Instituto Capixaba de Pesquisa, Assistência Técnica e Extensão Rural-Incaper, ES, Brazil
- Patricio R Munoz
- Blueberry Breeding and Genomics Lab, Horticultural Sciences Department, University of Florida, Gainesville, FL, USA
- Romário G Ferrão
- Instituto Capixaba de Pesquisa, Assistência Técnica e Extensão Rural-Incaper, ES, Brazil
- Multivix Group, ES, Brazil
- Luís Felipe V Ferrão
- Blueberry Breeding and Genomics Lab, Horticultural Sciences Department, University of Florida, Gainesville, FL, USA

2. Lanzl T, Melchinger AE, Schön CC. Influence of the mating design on the additive genetic variance in plant breeding populations. TAG. Theoretical and Applied Genetics. Theoretische und angewandte Genetik 2023; 136:236. [PMID: 37906322 PMCID: PMC10618341 DOI: 10.1007/s00122-023-04447-2] [Received: 03/13/2023] [Accepted: 08/14/2023] [Indexed: 11/02/2023]
Abstract
KEY MESSAGE Mating designs determine the realized additive genetic variance in a population sample. Deflated or inflated variances can lead to reduced or overly optimistic assessments of future selection gains. The additive genetic variance σ²_A inherent to a breeding population is a major determinant of short- and long-term genetic gain. When σ²_A is estimated from experimental data, not only the additive variances at individual loci (QTL) but also the covariances between QTL pairs contribute to the estimate. Thus, estimates of σ²_A depend on the genetic structure of the data source and vary between population samples. Here, we provide a theoretical framework for calculating the expectation and variance of σ²_A from genotypic data of a given population sample. In addition, we simulated breeding populations derived from different numbers of parents (P = 2, 4, 8, 16) and crossed according to three different mating designs (disjoint, factorial, and half-diallel crosses). We calculated the variance of σ²_A and of the parameter b, which reflects the covariance component in σ²_A standardized by the genic variance. Our results show that mating designs resulting in large biparental families derived from few disjoint crosses carry a high risk of generating progenies that exhibit strong covariances between QTL pairs on different chromosomes. We discuss the consequences of the resulting deflated or inflated σ²_A estimates for phenotypic and genome-based selection, as well as for applying the usefulness criterion in selection. We show that even a single round of recombination can effectively break negative and positive covariances between QTL pairs induced by the mating design. We suggest obtaining reliable estimates of σ²_A and its components in a population sample by applying statistical methods that differ in their treatment of QTL covariances.
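The genic/covariance split described in this abstract can be illustrated with a small sample calculation. For additive QTL effects a_k and genotype codes x_k (0, 1, 2), the sample variance of the genetic values g = Σ a_k x_k equals a genic part, Σ a_k² Var(x_k), plus a covariance part, Σ_{k≠l} a_k a_l Cov(x_k, x_l). The Python sketch below computes both and, as an assumption based only on the abstract's wording, takes b as the covariance part divided by the genic part. It is not the authors' framework for the expectation and variance of σ²_A, and the function name is hypothetical.

```python
from typing import Sequence, Tuple


def covariance_decomposition(
    genotypes: Sequence[Sequence[float]], effects: Sequence[float]
) -> Tuple[float, float, float]:
    """Split the sample variance of additive genetic values g = sum_k a_k * x_k.

    genotypes: one row per individual, one column per QTL (allele counts 0/1/2).
    effects:   additive effect a_k of each QTL.
    Returns (genic, covariance, b), where genic + covariance equals the sample
    variance of g and b = covariance / genic (an assumed reading of b).
    """
    n, m = len(genotypes), len(effects)
    means = [sum(row[k] for row in genotypes) / n for k in range(m)]

    def cov(k: int, l: int) -> float:
        # Divide-by-n covariance of the genotype codes at QTL k and QTL l.
        return sum((row[k] - means[k]) * (row[l] - means[l]) for row in genotypes) / n

    genic = sum(effects[k] ** 2 * cov(k, k) for k in range(m))
    covariance = sum(
        effects[k] * effects[l] * cov(k, l)
        for k in range(m)
        for l in range(m)
        if k != l
    )
    if genic == 0:
        raise ValueError("no QTL segregates in this sample")
    return genic, covariance, covariance / genic


# Hypothetical sample: two QTL perfectly negatively associated (repulsion).
repulsion = [[2, 0], [0, 2], [2, 0], [0, 2]]
print(covariance_decomposition(repulsion, [1.0, 1.0]))  # (2.0, -2.0, -1.0)
```

In this toy sample each QTL segregates, yet the two parts cancel and the realized additive variance is zero (the deflated case). Coding the same loci in coupling, [[2, 2], [0, 0], [2, 2], [0, 0]], gives b = +1 and a total of twice the genic value (the inflated case).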
Affiliation(s)
- Tobias Lanzl
- Plant Breeding, TUM School of Life Sciences, Technical University of Munich, 85354, Freising, Germany
- Albrecht E Melchinger
- Plant Breeding, TUM School of Life Sciences, Technical University of Munich, 85354, Freising, Germany
- Institute of Plant Breeding, Seed Science and Population Genetics, University of Hohenheim, 70599, Stuttgart, Germany
- Chris-Carolin Schön
- Plant Breeding, TUM School of Life Sciences, Technical University of Munich, 85354, Freising, Germany.

3. Jiménez NP, Feldmann MJ, Famula RA, Pincot DDA, Bjornson M, Cole GS, Knapp SJ. Harnessing underutilized gene bank diversity and genomic prediction of cross usefulness to enhance resistance to Phytophthora cactorum in strawberry. The Plant Genome 2023; 16:e20275. [PMID: 36480594 DOI: 10.1002/tpg2.20275] [Received: 07/29/2022] [Accepted: 09/19/2022] [Indexed: 05/10/2023]
Abstract
The development of strawberry (Fragaria × ananassa Duchesne ex Rozier) cultivars resistant to Phytophthora crown rot (PhCR), a devastating disease caused by the soil-borne pathogen Phytophthora cactorum (Lebert & Cohn) J. Schröt., has been challenging partly because the resistance phenotypes are quantitative and only moderately heritable. To develop deeper insights into the genetics of resistance and build the foundation for applying genomic selection, a genetically diverse training population was screened for resistance to California isolates of the pathogen. Here we show that genetic gains in breeding for resistance to PhCR have been negligible (3% of the cultivars tested were highly resistant and none surpassed early 20th century cultivars). Narrow-sense genomic heritability for PhCR resistance ranged from 0.41 to 0.75 among training population individuals. Using multivariate genome-wide association studies (GWAS), we identified a large-effect locus (predicted to be RPc2) that explained 43.6-51.6% of the genetic variance, was necessary but not sufficient for resistance, and was associated with calcium channel and other candidate genes with known plant defense functions. The addition of underutilized gene bank resources to our training population doubled additive genetic variance, increased the accuracy of genomic selection, and enabled the discovery of individuals carrying favorable alleles that are either rare or not present in modern cultivars. The incorporation of an RPc2-associated single-nucleotide polymorphism (SNP) as a fixed effect increased genomic prediction accuracy from 0.40 to 0.55. Finally, we show that parent selection using genomic-estimated breeding values, genetic variances, and cross usefulness holds promise for enhancing resistance to PhCR in strawberry.
Affiliation(s)
- Nicolás P Jiménez
- Dep. of Plant Sciences, Univ. of California, One Shields Ave, Davis, CA, 95616, USA
- Mitchell J Feldmann
- Dep. of Plant Sciences, Univ. of California, One Shields Ave, Davis, CA, 95616, USA
- Randi A Famula
- Dep. of Plant Sciences, Univ. of California, One Shields Ave, Davis, CA, 95616, USA
- Dominique D A Pincot
- Dep. of Plant Sciences, Univ. of California, One Shields Ave, Davis, CA, 95616, USA
- Marta Bjornson
- Dep. of Plant Sciences, Univ. of California, One Shields Ave, Davis, CA, 95616, USA
- Glenn S Cole
- Dep. of Plant Sciences, Univ. of California, One Shields Ave, Davis, CA, 95616, USA
- Steven J Knapp
- Dep. of Plant Sciences, Univ. of California, One Shields Ave, Davis, CA, 95616, USA

4. Nandudu L, Kawuki R, Ogbonna A, Kanaabi M, Jannink JL. Genetic dissection of cassava brown streak disease in a genomic selection population. Frontiers in Plant Science 2023; 13:1099409. [PMID: 36714759 PMCID: PMC9880483 DOI: 10.3389/fpls.2022.1099409] [Received: 11/15/2022] [Accepted: 12/28/2022] [Indexed: 06/18/2023]
Abstract
Introduction: Cassava brown streak disease (CBSD) is a major threat to food security in East and Central Africa. Breeding for resistance to CBSD is the most economical and sustainable way of addressing this challenge. Methods: This study sought to (1) assess CBSD incidence and severity, (2) identify genomic regions associated with CBSD traits, and (3) identify candidate genes in these regions, in the Cycle 2 (C2) population of the National Crops Resources Research Institute. Results: A total of 302 diverse clones were screened, revealing a CBSD incidence of 44% across growing seasons. Severity scores for foliar and root symptoms ranged from 1.28 to 1.99 and from 1.75 to 2.28, respectively, across seasons. Broad-sense heritability ranged from low to high (0.15–0.96), while narrow-sense heritability ranged from low to moderate (0.03–0.61). In the univariate GWAS analysis, five QTLs on chromosomes 1, 13, and 18, explaining approximately 19% of the phenotypic variation, were identified for CBSD severity at 3 months after planting. Multivariate GWAS identified 17 QTLs that were consistent with the univariate analysis, including additional QTLs on chromosome 6. Seventy-seven genes were identified in these regions, with functions such as catalytic activity, ATP-dependent activity, binding, response to stimulus, translation regulator activity, and transporter activity. Discussion: These results suggest that variation in virulence in the C2 population is largely genetic, and that annotated genes in these QTL regions may play critical roles in virus initiation and replication, thus increasing susceptibility to CBSD.
Affiliation(s)
- Leah Nandudu
- Section of Plant Breeding and Genetics, School of Integrative Plant Sciences, Cornell University, Ithaca, NY, United States
- Root crops Department National Crops Resources Research Institute (NaCRRI), Kampala, Uganda
- Robert Kawuki
- Root crops Department National Crops Resources Research Institute (NaCRRI), Kampala, Uganda
- Alex Ogbonna
- Section of Plant Breeding and Genetics, School of Integrative Plant Sciences, Cornell University, Ithaca, NY, United States
- Michael Kanaabi
- Root crops Department National Crops Resources Research Institute (NaCRRI), Kampala, Uganda
- Jean-Luc Jannink
- Section of Plant Breeding and Genetics, School of Integrative Plant Sciences, Cornell University, Ithaca, NY, United States
- US Department of Agriculture, Agricultural Research Service (USDA-ARS), Ithaca, NY, United States

5. Rembe M, Zhao Y, Wendler N, Oldach K, Korzun V, Reif JC. The Potential of Genome-Wide Prediction to Support Parental Selection, Evaluated with Data from a Commercial Barley Breeding Program. Plants 2022; 11:plants11192564. [PMID: 36235430 PMCID: PMC9571379 DOI: 10.3390/plants11192564] [Received: 08/29/2022] [Revised: 09/18/2022] [Accepted: 09/23/2022] [Indexed: 11/29/2022]
Abstract
Parental selection stands at the beginning of every breeding program and contributes significantly to its success. The value of a cross is reflected in the potential of its progeny population. Breeders invest substantial resources in evaluating progeny to select the best-performing genotypes as candidates for variety development. Several proposals have been made to use genomics to support parental selection, but these have mostly been evaluated through theoretical considerations or simulation studies; evaluations using experimental data are rare. In this study, we tested the potential of genomic prediction for predicting the progeny mean, variance, and usefulness criterion using data from an applied winter barley breeding population. For three traits with genetic architectures of varying complexity (ear emergence, plant height, and grain yield), the progeny mean, variance, and usefulness criterion were predicted and validated in scenarios resembling the situations in which these tools would be used in plant breeding. While the population mean could be predicted with moderate to high prediction abilities of 0.64, 0.21, and 0.39 for ear emergence, plant height, and grain yield, respectively, predicting the family variance appeared difficult, as reflected in low prediction abilities of 0.41, 0.11, and 0.14 for ear emergence, plant height, and grain yield, respectively. We have shown that identifying superior crosses remains a challenging task and suggest that the success of predicting the usefulness criterion depends strongly on the complexity of the underlying trait.
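The usefulness criterion evaluated in this study combines the predicted progeny mean and variance of a cross. A minimal Python sketch, assuming the common form UC = μ + i·h·σ_G with normally distributed progeny, truncation selection of the top fraction p, and h = 1 (selection on true genetic values). In practice μ and σ_G would come from a genomic prediction model that is not shown here; the function names and cross values are hypothetical.

```python
from statistics import NormalDist

_STD_NORMAL = NormalDist()  # standard normal: mean 0, sd 1


def selection_intensity(p: float) -> float:
    """Selection intensity i for truncation selection of the top fraction p of a normal distribution."""
    if not 0.0 < p < 1.0:
        raise ValueError("p must lie strictly between 0 and 1")
    z = _STD_NORMAL.inv_cdf(1.0 - p)  # truncation point on the standard normal scale
    return _STD_NORMAL.pdf(z) / p     # standardized mean of the selected upper tail


def usefulness(mean: float, genetic_sd: float, p: float, h: float = 1.0) -> float:
    """UC = mean + i * h * genetic_sd; h = 1 assumes selection on true genetic values."""
    return mean + selection_intensity(p) * h * genetic_sd


# Hypothetical crosses: B has the lower mean but twice the genetic spread.
uc_a = usefulness(mean=10.0, genetic_sd=1.0, p=0.1)
uc_b = usefulness(mean=9.5, genetic_sd=2.0, p=0.1)
print(round(selection_intensity(0.1), 3), uc_b > uc_a)  # 1.755 True
```

With 10% of the progeny selected, the cross with the lower mean but larger genetic standard deviation ranks higher, which is why inaccurate predictions of family variance can carry over into the ranking of crosses.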
Affiliation(s)
- Maximilian Rembe
- Department of Breeding Research, Leibniz Institute of Plant Genetics and Crop Plant Research (IPK), D-06466 Gatersleben, Germany
- Yusheng Zhao
- Department of Breeding Research, Leibniz Institute of Plant Genetics and Crop Plant Research (IPK), D-06466 Gatersleben, Germany
- Neele Wendler
- KWS LOCHOW GmbH, Ferdinand-von-Lochow-Str. 5, 29303 Bergen, Germany
- Klaus Oldach
- KWS LOCHOW GmbH, Ferdinand-von-Lochow-Str. 5, 29303 Bergen, Germany
- Viktor Korzun
- KWS SAAT SE & Co. KGaA, Grimsehlstr. 31, 37574 Einbeck, Germany
- Jochen C. Reif
- Department of Breeding Research, Leibniz Institute of Plant Genetics and Crop Plant Research (IPK), D-06466 Gatersleben, Germany

6. Lupi AS, Sumpter NA, Leask MP, O'Sullivan J, Fadason T, de Los Campos G, Merriman TR, Reynolds RJ, Vazquez AI. Local genetic covariance between serum urate and kidney function estimated with Bayesian multitrait models. G3 (Bethesda, Md.) 2022; 12:6649732. [PMID: 35876900 PMCID: PMC9434310 DOI: 10.1093/g3journal/jkac158] [Received: 11/04/2021] [Accepted: 06/05/2022] [Indexed: 11/13/2022]
Abstract
Hyperuricemia (serum urate >6.8 mg/dl) is associated with several cardiometabolic and renal diseases, such as gout and chronic kidney disease. Previous studies have examined the shared genetic basis of chronic kidney disease and hyperuricemia in humans either with single-variant tests or by estimating whole-genome genetic correlations between the traits. Individual variants typically explain a small fraction of the genetic correlation between traits, so studies at available sample sizes lack power to map pleiotropic loci. Whole-genome estimates, on the other hand, indicate a moderate genetic correlation between these traits. While useful for explaining the comorbidity of these traits, whole-genome genetic correlation estimates do not shed light on which regions may be implicated in their shared genetic basis. To fill the gap between these two approaches, we used local Bayesian multitrait models to estimate the genetic covariance between serum urate and a marker of chronic kidney disease (estimated glomerular filtration rate) in specific genomic regions. We identified 134 overlapping linkage disequilibrium windows with statistically significant covariance estimates, 49 with positive and 85 with negative directionality, the latter being consistent with the direction of the overall genetic covariance. The 134 significant windows condensed to 64 genetically distinct shared loci, which validated 17 previously identified shared loci with consistent directionality and revealed 22 novel pleiotropic genes. Finally, to examine potential biological mechanisms for these shared loci, we used colocalization analyses to identify a subset of the genomic windows that are associated with gene expression. The regions identified by our local Bayesian multitrait model approach may help explain the association between chronic kidney disease and hyperuricemia.
Affiliation(s)
- Alexa S Lupi
- Department of Epidemiology and Biostatistics, Michigan State University, East Lansing, MI 48824, USA; Institute for Quantitative Health Science and Engineering, Systems Biology, Michigan State University, East Lansing, MI 48824, USA
- Nicholas A Sumpter
- Department of Medicine, The University of Alabama at Birmingham, Birmingham, AL 35294, USA
- Megan P Leask
- Department of Medicine, The University of Alabama at Birmingham, Birmingham, AL 35294, USA; Department of Biochemistry, University of Otago, Dunedin 9016, New Zealand
- Justin O'Sullivan
- Liggins Institute, The University of Auckland, Auckland 1142, New Zealand
- Tayaza Fadason
- Liggins Institute, The University of Auckland, Auckland 1142, New Zealand
- Gustavo de Los Campos
- Department of Epidemiology and Biostatistics, Michigan State University, East Lansing, MI 48824, USA; Institute for Quantitative Health Science and Engineering, Systems Biology, Michigan State University, East Lansing, MI 48824, USA; Department of Statistics and Probability, Michigan State University, East Lansing, MI 48824, USA
- Tony R Merriman
- Department of Medicine, The University of Alabama at Birmingham, Birmingham, AL 35294, USA
- Richard J Reynolds
- Department of Medicine, The University of Alabama at Birmingham, Birmingham, AL 35294, USA
- Ana I Vazquez
- Department of Epidemiology and Biostatistics, Michigan State University, East Lansing, MI 48824, USA; Institute for Quantitative Health Science and Engineering, Systems Biology, Michigan State University, East Lansing, MI 48824, USA

7. Hansen PB, Ruud AK, de los Campos G, Malinowska M, Nagy I, Svane SF, Thorup-Kristensen K, Jensen JD, Krusell L, Asp T. Integration of DNA Methylation and Transcriptome Data Improves Complex Trait Prediction in Hordeum vulgare. Plants 2022; 11:plants11172190. [PMID: 36079572 PMCID: PMC9459846 DOI: 10.3390/plants11172190] [Received: 05/23/2022] [Revised: 08/19/2022] [Accepted: 08/21/2022] [Indexed: 11/30/2022]
Abstract
Whole-genome multi-omics profiles contain valuable information for the characterization and prediction of complex traits in plants. In this study, we evaluate multi-omics models to predict four complex traits in barley (Hordeum vulgare): grain yield, thousand kernel weight, protein content, and nitrogen uptake. Genomic, transcriptomic, and DNA methylation data were obtained from 75 spring barley lines tested in the RadiMax semi-field phenomics facility under control and water-scarce treatments. By integrating multi-omics data at the genomic, transcriptomic, and DNA methylation regulatory levels, a higher proportion of phenotypic variance was explained (0.72–0.91) than with genomic models alone (0.55–0.86). The correlation between predictions and phenotypes ranged from 0.17 to 0.28 for control plants and from 0.23 to 0.37 for water-scarce plants, and the increase in accuracy over models using genomic information alone was significant for nitrogen uptake and protein content. Adding transcriptomic and DNA methylation information to the prediction models explained more of the phenotypic variance attributed to the environment for grain yield and nitrogen uptake, and more of the non-additive genetic effects for thousand kernel weight and protein content. Our results show the feasibility of multi-omics prediction for complex traits in barley.
Affiliation(s)
- Pernille Bjarup Hansen
- Center for Quantitative Genetics and Genomics, Aarhus University, 4200 Slagelse, Denmark
- Correspondence: P.B.H.; T.A.; Tel.: +45-87158243 (T.A.)
- Anja Karine Ruud
- Center for Quantitative Genetics and Genomics, Aarhus University, 4200 Slagelse, Denmark
- Gustavo de los Campos
- Departments of Epidemiology & Biostatistics and Statistics & Probability, Institute for Quantitative Health Science and Engineering, Michigan State University, East Lansing, MI 48824, USA
- Marta Malinowska
- Center for Quantitative Genetics and Genomics, Aarhus University, 4200 Slagelse, Denmark
- Istvan Nagy
- Center for Quantitative Genetics and Genomics, Aarhus University, 4200 Slagelse, Denmark
- Simon Fiil Svane
- Section for Crop Sciences, Department of Plant and Environmental Sciences, Copenhagen University, 2630 Taastrup, Denmark
- Kristian Thorup-Kristensen
- Section for Crop Sciences, Department of Plant and Environmental Sciences, Copenhagen University, 2630 Taastrup, Denmark
- Lene Krusell
- Sejet Plant Breeding, Nørremarksvej 67, 8700 Horsens, Denmark
- Torben Asp
- Center for Quantitative Genetics and Genomics, Aarhus University, 4200 Slagelse, Denmark

8. Pérez-Rodríguez P, de Los Campos G. Multi-trait Bayesian Shrinkage and Variable Selection Models with the BGLR R-package. Genetics 2022; 222:6655691. [PMID: 35924977 PMCID: PMC9434216 DOI: 10.1093/genetics/iyac112] [Received: 05/11/2022] [Accepted: 07/14/2022] [Indexed: 12/02/2022]
Abstract
The BGLR R package implements various types of single-trait shrinkage/variable selection Bayesian regressions. The package was first released in 2014 and has since become widely used software in genomic studies. We recently developed functionality for multitrait models. The implementation allows users to include an arbitrary number of random-effects terms. For each set of predictors, users can choose diffuse, Gaussian, and Gaussian–spike–slab multivariate priors. Unlike other software packages for multitrait genomic regressions, BGLR offers many specifications for (co)variance parameters (unstructured, diagonal, factor analytic, and recursive). Samples from the posterior distribution of the models implemented in the multitrait function are generated using a Gibbs sampler, implemented by combining code written in the R and C programming languages. In this article, we provide an overview of the models and methods implemented in BGLR's multitrait function, present examples that illustrate the use of the package, and benchmark the performance of the software.
Affiliation(s)
- Paulino Pérez-Rodríguez
- Colegio de Postgraduados, CP 56230, Montecillos, Estado de México, México; Department of Epidemiology and Biostatistics, Michigan State University, East Lansing, MI 48824, USA
- Gustavo de Los Campos
- Department of Epidemiology and Biostatistics, Michigan State University, East Lansing, MI 48824, USA; Department of Statistics and Probability, Michigan State University, East Lansing, MI 48824, USA; Institute for Quantitative Health Science and Engineering, Michigan State University, East Lansing, MI 48824, USA

9. Feldmann MJ, Piepho HP, Knapp SJ. Average semivariance directly yields accurate estimates of the genomic variance in complex trait analyses. G3 Genes|Genomes|Genetics 2022; 12:6571389. [PMID: 35442424 PMCID: PMC9157152 DOI: 10.1093/g3journal/jkac080] [Received: 12/27/2021] [Accepted: 03/17/2022] [Indexed: 11/23/2022]
Abstract
Many important traits in plants, animals, and microbes are polygenic and challenging to improve through traditional marker-assisted selection. Genomic prediction addresses this by incorporating all genetic data in a mixed model framework. The primary method for predicting breeding values is genomic best linear unbiased prediction, which uses the realized genomic relationship or kinship matrix (K) to connect genotype to phenotype. Genomic relationship matrices share information among entries to estimate the genetic values of observed entries and predict those of unobserved entries. Key parameters of such models are the genomic variance (σ²_g), that is, the variance of a trait associated with a genome-wide sample of DNA polymorphisms, and the genomic heritability (h²_g); however, the seminal papers introducing different forms of K often do not discuss their effects on the model-estimated variance components despite their importance in genetic research and breeding. Here, we discuss the effect of several standard methods for calculating the genomic relationship matrix on estimates of σ²_g and h²_g. With current approaches, we found that the genomic variance tends to be either overestimated or underestimated depending on the scaling and centering applied to the marker matrix (Z), the value of the average diagonal element of K, and the assortment of alleles and heterozygosity (H) in the observed population. Using the average semivariance, we propose a new matrix, K_ASV, that directly yields accurate estimates of σ²_g and h²_g in the observed population and produces best linear unbiased predictors equivalent to those of routine methods in plants and animals.
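For a genetic effect with covariance σ²K among n entries, the average semivariance (ASV) is σ² · [tr(K) − (1/n)·1ᵀK1] / (n − 1). A minimal Python sketch of the idea, assuming K_ASV can be understood as a marker kernel rescaled so that this bracketed factor equals 1, in which case the estimated variance component is itself the ASV of the genetic values under the model. Because tr(K) − (1/n)·1ᵀK1 = tr(ZᵀPZ) for K = ZZᵀ and centering matrix P, the rescaled kernel is (n − 1)ZZᵀ / tr(ZᵀPZ). The paper should be consulted for the exact definition; the helper names and toy data are hypothetical.

```python
from typing import List, Sequence

Matrix = List[List[float]]


def average_semivariance(K: Sequence[Sequence[float]]) -> float:
    """ASV factor of a kernel: (tr(K) - (sum of all elements) / n) / (n - 1)."""
    n = len(K)
    trace = sum(K[i][i] for i in range(n))
    grand_total = sum(sum(row) for row in K)
    return (trace - grand_total / n) / (n - 1)


def kernel_asv(Z: Sequence[Sequence[float]]) -> Matrix:
    """Rescale the raw kernel Z Z^T so that its average semivariance factor equals 1."""
    n, m = len(Z), len(Z[0])
    K = [[sum(Z[i][k] * Z[j][k] for k in range(m)) for j in range(n)] for i in range(n)]
    asv = average_semivariance(K)
    if asv <= 0:
        raise ValueError("markers do not vary across entries")
    return [[K[i][j] / asv for j in range(n)] for i in range(n)]


# Toy marker matrix: 4 entries x 3 SNPs coded 0/1/2 (hypothetical data).
Z = [[0, 1, 2], [1, 1, 0], [2, 0, 1], [1, 2, 2]]
print(round(average_semivariance(kernel_asv(Z)), 6))  # 1.0
```

Because the ASV factor is linear in K, dividing by it always yields a kernel with factor 1, regardless of how Z was coded; this is what makes the variance component interpretable on the scale of the observed population.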
Affiliation(s)
- Mitchell J Feldmann
- Department of Plant Sciences, University of California, Davis, CA 95616, USA
- Hans-Peter Piepho
- Biostatistics Unit, Institute of Crop Science, University of Hohenheim, 70593 Stuttgart, Germany
- Steven J Knapp
- Department of Plant Sciences, University of California, Davis, CA 95616, USA

10. Ogawa S, Matsuda H, Taniguchi Y, Watanabe T, Sugimoto Y, Iwaisaki H. Estimation of the autosomal contribution to total additive genetic variability of carcass traits in Japanese Black cattle. Anim Sci J 2022; 93:e13710. [PMID: 35416392 DOI: 10.1111/asj.13710] [Received: 01/21/2022] [Revised: 02/18/2022] [Accepted: 03/18/2022] [Indexed: 11/29/2022]
Abstract
We attempted to estimate the additive genetic variance explained by each autosome, using genotype data of 33,657 single nucleotide polymorphism (SNP) markers in 2271 Japanese Black fattened steers. The traits were cold carcass weight, ribeye area, rib thickness, subcutaneous fat thickness, estimated yield percentage, and marbling score. Two mixed linear models were used: model 1 incorporated a single genomic relationship matrix (G matrix) constructed from all available SNPs, and model 2 incorporated two G matrices, one constructed from the SNPs on a single autosome and one from those on the remaining autosomes. Genomic heritabilities estimated with model 1 were moderate to high. Under model 2, the proportions of the total additive genetic variance explained by the individual autosomes summed to more than 90%. For carcass weight, the proportions explained by Bos taurus autosomes 6, 8, and 14 were higher than those explained by the remaining autosomes. In some cases, the estimated proportion was close to 0. Although further careful investigation is required, the results obtained from model 2 could provide novel insight into the genetic architecture of carcass traits in Japanese Black cattle, such as the heritability per chromosome.
Affiliation(s)
- Yukio Taniguchi
- Graduate School of Agriculture, Kyoto University, Kyoto, Japan
- Yoshikazu Sugimoto
- Shirakawa Institute of Animal Genetics, Japan Livestock Technology Association, Tokyo, Japan

11. Burch KS, Hou K, Ding Y, Wang Y, Gazal S, Shi H, Pasaniuc B. Partitioning gene-level contributions to complex-trait heritability by allele frequency identifies disease-relevant genes. Am J Hum Genet 2022; 109:692-709. [PMID: 35271803 PMCID: PMC9069080 DOI: 10.1016/j.ajhg.2022.02.012] [Received: 08/19/2021] [Accepted: 02/15/2022] [Indexed: 11/15/2022]
Abstract
Recent work has shown that SNP heritability, which is dominated by low-effect common variants, may not be the most relevant quantity for localizing high-effect/critical disease genes. Here, we introduce methods to estimate the proportion of phenotypic variance explained by a given assignment of SNPs to a single gene ("gene-level heritability"). We partition gene-level heritability by minor allele frequency (MAF) to find genes whose gene-level heritability is explained exclusively by "low-frequency/rare" variants (0.5% ≤ MAF < 1%). Applying our method to ∼16K protein-coding genes and 25 quantitative traits in the UK Biobank (N = 290K "White British"), we find that, on average across traits, ∼2.5% of nonzero-heritability genes have a rare-variant component and only ∼0.8% (327 gene-trait pairs) have heritability exclusively from rare variants. Of these 327 gene-trait pairs, 114 (35%) were not detected by existing gene-level association testing methods. The additional genes we identify are significantly enriched for known disease genes, and we find several examples of genes that have been previously implicated in phenotypically related Mendelian disorders. Notably, the rare-variant component of gene-level heritability exhibits trends different from those of common-variant gene-level heritability. For example, while total gene-level heritability increases with gene length, the rare-variant component is significantly larger among shorter genes; the cumulative distributions of gene-level heritability also vary across traits and reveal differences in the relative contributions of rare/common variants to overall gene-level polygenicity. While nonzero gene-level heritability does not imply causality, if interpreted in the correct context, gene-level heritability can reveal useful insights into complex-trait genetic architecture.
Affiliation(s)
- Kathryn S Burch
- Bioinformatics Interdepartmental Program, University of California, Los Angeles, Los Angeles, CA 90095, USA; Department of Computational Medicine, David Geffen School of Medicine at University of California, Los Angeles, Los Angeles, CA 90095, USA.
- Kangcheng Hou
- Bioinformatics Interdepartmental Program, University of California, Los Angeles, Los Angeles, CA 90095, USA; Department of Computational Medicine, David Geffen School of Medicine at University of California, Los Angeles, Los Angeles, CA 90095, USA
- Yi Ding
- Bioinformatics Interdepartmental Program, University of California, Los Angeles, Los Angeles, CA 90095, USA; Department of Computational Medicine, David Geffen School of Medicine at University of California, Los Angeles, Los Angeles, CA 90095, USA
- Yifei Wang
- Department of Pathology and Laboratory Medicine, David Geffen School of Medicine at University of California, Los Angeles, Los Angeles, CA 90095, USA
- Steven Gazal
- Center for Genetic Epidemiology, Department of Population and Public Health Sciences, Keck School of Medicine, University of Southern California, Los Angeles, CA 90033, USA
- Huwenbo Shi
- Department of Epidemiology, Harvard T.H. Chan School of Public Health, Boston, MA 02115, USA; Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA; OMNI Bioinformatics, Genentech, 1 DNA Way, South San Francisco, CA 94080, USA
- Bogdan Pasaniuc
- Bioinformatics Interdepartmental Program, University of California, Los Angeles, Los Angeles, CA 90095, USA; Department of Pathology and Laboratory Medicine, David Geffen School of Medicine at University of California, Los Angeles, Los Angeles, CA 90095, USA; Department of Human Genetics, David Geffen School of Medicine at University of California, Los Angeles, Los Angeles, CA 90095, USA; Department of Computational Medicine, David Geffen School of Medicine at University of California, Los Angeles, Los Angeles, CA 90095, USA.
12
Jung M, Keller B, Roth M, Aranzana MJ, Auwerkerken A, Guerra W, Al-Rifaï M, Lewandowski M, Sanin N, Rymenants M, Didelot F, Dujak C, Font i Forcada C, Knauf A, Laurens F, Studer B, Muranty H, Patocchi A. Genetic architecture and genomic predictive ability of apple quantitative traits across environments. HORTICULTURE RESEARCH 2022; 9:uhac028. [PMID: 35184165 PMCID: PMC8976694 DOI: 10.1093/hr/uhac028] [Citation(s) in RCA: 8] [Impact Index Per Article: 4.0] [Received: 06/25/2021] [Revised: 12/09/2021] [Accepted: 01/11/2022] [Indexed: 06/14/2023]
Abstract
Implementation of genomic tools is desirable to increase the efficiency of apple breeding. Recently, the multi-environment apple reference population (apple REFPOP) proved useful for rediscovering loci, estimating genomic predictive ability, and studying genotype by environment interactions (G × E). So far, only two phenological traits were investigated using the apple REFPOP, although the population may be valuable when dissecting genetic architecture and reporting predictive abilities for additional key traits in apple breeding. Here we show contrasting genetic architecture and genomic predictive abilities for 30 quantitative traits across up to six European locations using the apple REFPOP. A total of 59 stable and 277 location-specific associations were found using GWAS, 69.2% of which are novel when compared with 41 reviewed publications. Average genomic predictive abilities of 0.18-0.88 were estimated using main-effect univariate, main-effect multivariate, multi-environment univariate, and multi-environment multivariate models. The G × E accounted for up to 24% of the phenotypic variability. This most comprehensive genomic study in apple in terms of trait-environment combinations provided knowledge of trait biology and prediction models that can be readily applied for marker-assisted or genomic selection, thus facilitating increased breeding efficiency.
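Genomic predictive ability is usually reported as the correlation between predicted and observed values for individuals left out of model training. The sketch below is a stand-in, not any of the four models used in the study. It runs k-fold cross-validation with ridge regression on 0/1/2 marker codes (an RR-BLUP-like model) using simulated data. The function names and the fixed shrinkage parameter `lam` are hypothetical.

```python
import numpy as np

def ridge_fit(X, y, lam):
    """Ridge regression on column-centered markers (RR-BLUP-like shrinkage)."""
    mu = X.mean(axis=0)
    Xc = X - mu
    b = np.linalg.solve(Xc.T @ Xc + lam * np.eye(X.shape[1]), Xc.T @ (y - y.mean()))
    return mu, y.mean(), b

def predictive_ability(X, y, lam=1.0, k=5, seed=0):
    """Mean Pearson correlation between predicted and observed values over k folds."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(y)), k)
    rs = []
    for test in folds:
        train = np.setdiff1d(np.arange(len(y)), test)
        mu, y0, b = ridge_fit(X[train], y[train], lam)
        pred = y0 + (X[test] - mu) @ b   # predict the held-out individuals
        rs.append(np.corrcoef(pred, y[test])[0, 1])
    return float(np.mean(rs))

# Simulated data: 300 individuals, 50 markers, highly heritable toy trait.
rng = np.random.default_rng(42)
X = rng.integers(0, 3, size=(300, 50)).astype(float)
y = X @ rng.normal(size=50) + rng.normal(size=300)
pa = predictive_ability(X, y)
print(round(pa, 2))
```

Real multi-environment analyses add environment effects and genotype-by-environment terms to the model; the cross-validation loop stays the same.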
Affiliation(s)
- Michaela Jung
- Agroscope, Breeding Research Group, 8820 Wädenswil, Switzerland
- Molecular Plant Breeding, Institute of Agricultural Sciences, ETH Zurich, 8092 Zurich, Switzerland
- Beat Keller
- Agroscope, Breeding Research Group, 8820 Wädenswil, Switzerland
- Molecular Plant Breeding, Institute of Agricultural Sciences, ETH Zurich, 8092 Zurich, Switzerland
- Morgane Roth
- Agroscope, Breeding Research Group, 8820 Wädenswil, Switzerland
- GAFL, INRAE, 84140 Montfavet, France
- Maria José Aranzana
- IRTA (Institut de Recerca i Tecnologia Agroalimentàries), 08140 Caldes de Montbui, Barcelona, Spain
- Centre for Research in Agricultural Genomics (CRAG) CSIC-IRTA-UAB-UB, Campus UAB, 08193 Bellaterra, Barcelona, Spain
- Mehdi Al-Rifaï
- Univ Angers, Institut Agro, INRAE, IRHS, SFR QuaSaV, F-49000 Angers, France
- Mariusz Lewandowski
- The National Institute of Horticultural Research, Konstytucji 3 Maja 1/3, 96-100 Skierniewice, Poland
- Marijn Rymenants
- Better3fruit N.V., 3202 Rillaar, Belgium
- Laboratory for Plant Genetics and Crop Improvement, KU Leuven, B-3001 Leuven, Belgium
- Christian Dujak
- Centre for Research in Agricultural Genomics (CRAG) CSIC-IRTA-UAB-UB, Campus UAB, 08193 Bellaterra, Barcelona, Spain
- Carolina Font i Forcada
- IRTA (Institut de Recerca i Tecnologia Agroalimentàries), 08140 Caldes de Montbui, Barcelona, Spain
- Andrea Knauf
- Agroscope, Breeding Research Group, 8820 Wädenswil, Switzerland
- Molecular Plant Breeding, Institute of Agricultural Sciences, ETH Zurich, 8092 Zurich, Switzerland
- François Laurens
- Univ Angers, Institut Agro, INRAE, IRHS, SFR QuaSaV, F-49000 Angers, France
- Bruno Studer
- Molecular Plant Breeding, Institute of Agricultural Sciences, ETH Zurich, 8092 Zurich, Switzerland
- Hélène Muranty
- Univ Angers, Institut Agro, INRAE, IRHS, SFR QuaSaV, F-49000 Angers, France
- Andrea Patocchi
- Agroscope, Breeding Research Group, 8820 Wädenswil, Switzerland
13
Khanal P, Tempelman RJ. The use of milk Fourier-transform mid-infrared spectroscopy to diagnose pregnancy and determine spectral regional associations with pregnancy in US dairy cows. J Dairy Sci 2022; 105:3209-3221. [DOI: 10.3168/jds.2021-21079] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/28/2021] [Accepted: 12/21/2021] [Indexed: 11/19/2022]
14
Lara LADC, Pocrnic I, Oliveira TDP, Gaynor RC, Gorjanc G. Temporal and genomic analysis of additive genetic variance in breeding programmes. Heredity (Edinb) 2022; 128:21-32. [PMID: 34912044 PMCID: PMC8733024 DOI: 10.1038/s41437-021-00485-y] [Citation(s) in RCA: 7] [Impact Index Per Article: 3.5] [Received: 09/01/2020] [Revised: 10/24/2021] [Accepted: 11/01/2021] [Indexed: 11/13/2022]
Abstract
Genetic variance is a central parameter in quantitative genetics and breeding. Assessing changes in genetic variance over time and across the genome is therefore of high interest. Here, we extend a previously proposed framework for temporal analysis of genetic variance using the pedigree-based model to a new framework for temporal and genomic analysis of genetic variance using marker-based models. To this end, we describe the theory of partitioning genetic variance into genic variance and within-chromosome and between-chromosome linkage-disequilibrium, and how to estimate these variance components from a marker-based model fitted to observed phenotype and marker data. The new framework involves three steps: (i) fitting a marker-based model to data, (ii) sampling realisations of marker effects from the fitted model and, for each sample, calculating realisations of genetic values, and (iii) calculating the variance of sampled genetic values by time and genome partitions. Analysing time partitions indicates breeding programme sustainability, while analysing genome partitions indicates contributions from chromosomes and chromosome pairs and linkage-disequilibrium. We demonstrate the framework with a simulated breeding programme involving a complex trait. Results show good concordance between simulated and estimated variances, provided that the fitted model captures the genetic complexity of the trait. We observe a reduction of genetic variance due to selection and drift changing allele frequencies, and due to selection inducing negative linkage-disequilibrium.
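Steps (ii) and (iii) and the genome partition can be sketched as follows. The sketch assumes the draws of marker effects from step (i) are already available; random stand-ins are used here. For each draw $b$, the variance of genetic values $g = Xb$ equals $b^\top C b$, where $C$ is the marker covariance matrix. The diagonal terms give the genic variance, and the off-diagonal terms give the linkage-disequilibrium contributions. Those are split here by whether the two markers share a chromosome. The function name is hypothetical. A temporal analysis would call it separately on the rows of X that belong to each generation.

```python
import numpy as np

def partition_genetic_variance(X, chrom, effect_samples):
    """Posterior-mean partition of var(X @ b) into genic and LD components (hypothetical helper).

    X: (n, m) marker genotypes (e.g. 0/1/2) of the individuals in one time partition.
    chrom: (m,) chromosome label of each marker.
    effect_samples: (S, m) sampled marker effects (assumed to come from a fitted model).
    """
    C = np.cov(X, rowvar=False, bias=True)           # (m, m) marker covariances, divisor n
    same_chrom = chrom[:, None] == chrom[None, :]    # marker pairs on the same chromosome
    off_diag = ~np.eye(len(chrom), dtype=bool)
    out = {"total": [], "genic": [], "ld_within": [], "ld_between": []}
    for b in effect_samples:
        contrib = C * np.outer(b, b)                 # entry (j, k) = cov(x_j, x_k) * b_j * b_k
        out["total"].append(contrib.sum())           # equals np.var(X @ b)
        out["genic"].append(np.trace(contrib))       # sum_j var(x_j) * b_j^2
        out["ld_within"].append(contrib[same_chrom & off_diag].sum())
        out["ld_between"].append(contrib[~same_chrom].sum())
    return {k: float(np.mean(v)) for k, v in out.items()}

rng = np.random.default_rng(1)
X = rng.integers(0, 3, size=(200, 6)).astype(float)
chrom = np.array([1, 1, 1, 2, 2, 2])
samples = rng.normal(0.0, 0.5, size=(100, 6))  # stand-in for posterior draws of marker effects
res = partition_genetic_variance(X, chrom, samples)
print(res)
```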
Affiliation(s)
- Letícia A de C Lara
- The Roslin Institute and Royal (Dick) School of Veterinary Studies, The University of Edinburgh, Edinburgh, UK.
- Ivan Pocrnic
- The Roslin Institute and Royal (Dick) School of Veterinary Studies, The University of Edinburgh, Edinburgh, UK
- Thiago de P Oliveira
- The Roslin Institute and Royal (Dick) School of Veterinary Studies, The University of Edinburgh, Edinburgh, UK
- R Chris Gaynor
- The Roslin Institute and Royal (Dick) School of Veterinary Studies, The University of Edinburgh, Edinburgh, UK
- Gregor Gorjanc
- The Roslin Institute and Royal (Dick) School of Veterinary Studies, The University of Edinburgh, Edinburgh, UK
15
Wolfe MD, Chan AW, Kulakow P, Rabbi I, Jannink JL. Genomic mating in outbred species: predicting cross usefulness with additive and total genetic covariance matrices. Genetics 2021; 219:6363799. [PMID: 34740244 PMCID: PMC8570794 DOI: 10.1093/genetics/iyab122] [Citation(s) in RCA: 9] [Impact Index Per Article: 3.0] [Received: 05/10/2021] [Accepted: 07/13/2021] [Indexed: 11/14/2022]
Abstract
Diverse crops are both outbred and clonally propagated. Breeders typically use truncation selection of parents and invest significant time, land, and money evaluating the progeny of crosses to find exceptional genotypes. We developed and tested genomic mate selection criteria suitable for organisms of arbitrary homozygosity level where the full-sibling progeny are of direct interest as future parents and/or cultivars. We extended cross variance and covariance prediction to include dominance effects and predicted the multivariate selection index genetic variance of crosses based on haplotypes of proposed parents, marker effects, and recombination frequencies. We combined the predicted mean and variance into usefulness criteria for parent and variety development. We present an empirical study of cassava (Manihot esculenta), a staple tropical root crop. We assessed the potential to predict the multivariate genetic distribution (means, variances, and trait covariances) of 462 cassava families in terms of additive and total value using cross-validation. Most variance (89%) and covariance (70%) prediction accuracy estimates were greater than zero. The usefulness of crosses was accurately predicted, with good correspondence between the predicted and the actual mean performance of the family members that breeders selected for advancement as new parents and candidate varieties. We also used a directional dominance model to quantify significant inbreeding depression for most traits. We predicted 47,083 possible crosses of 306 parents and contrasted them to those previously tested to show how mate selection can reveal the new potential within the germplasm. We enable breeders to consider the potential of crosses to produce future parents (progeny with top breeding values) and varieties (progeny with top own performance).
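A usefulness criterion that combines a predicted family mean and variance is commonly written $UC = \mu + i\sigma$. Here $\sigma$ is the predicted within-family genetic standard deviation and $i$ is the standardized selection intensity for the fraction of progeny kept. The sketch below takes the predicted means and variances as given inputs. The paper derives them from parental haplotypes, marker effects, recombination frequencies and dominance effects, which is not reproduced here. The two crosses and their values are hypothetical.

```python
from statistics import NormalDist

def selection_intensity(prop_selected):
    """Standardized selection intensity for truncation selection of the top fraction."""
    std = NormalDist()
    z = std.inv_cdf(1.0 - prop_selected)  # truncation point on the standard normal
    return std.pdf(z) / prop_selected

def usefulness(pred_mean, pred_var, prop_selected):
    """Usefulness criterion: predicted family mean + i * predicted within-family SD."""
    return pred_mean + selection_intensity(prop_selected) * pred_var ** 0.5

# Hypothetical predicted (mean, variance) for two candidate crosses
crosses = {"A x B": (10.0, 1.0), "C x D": (9.5, 4.0)}
ranked = sorted(crosses, key=lambda c: usefulness(*crosses[c], 0.05), reverse=True)
print(ranked)  # ['C x D', 'A x B']
```

With 5% of progeny selected ($i \approx 2.06$), the cross with the lower mean but larger variance ranks first. A mean-only ranking of parents would miss this.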
Affiliation(s)
- Marnin D Wolfe
- Section on Plant Breeding and Genetics, School of Integrative Plant Sciences, Cornell University, Ithaca, NY 14850, USA
- Ariel W Chan
- Section on Plant Breeding and Genetics, School of Integrative Plant Sciences, Cornell University, Ithaca, NY 14850, USA
- Peter Kulakow
- International Institute of Tropical Agriculture (IITA), Ibadan, Nigeria
- Ismail Rabbi
- International Institute of Tropical Agriculture (IITA), Ibadan, Nigeria
- Jean-Luc Jannink
- Section on Plant Breeding and Genetics, School of Integrative Plant Sciences, Cornell University, Ithaca, NY 14850, USA; USDA-ARS, Ithaca, NY 14850, USA
16
Dissecting the Genetic Architecture of Biofuel-Related Traits in a Sorghum Breeding Population. G3-GENES GENOMES GENETICS 2020; 10:4565-4577. [PMID: 33051261 PMCID: PMC7718745 DOI: 10.1534/g3.120.401582] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/18/2022]
Abstract
In sorghum [Sorghum bicolor (L.) Moench], hybrid cultivars for the biofuel industry are desired. Along with selection based on testcross performance, evaluation of the breeding population per se is also important for the success of hybrid breeding. In addition to additive genetic effects, non-additive (i.e., dominance and epistatic) effects are expected to contribute to the performance of early generations. Unfortunately, studies on early generations in sorghum breeding programs are limited. In this study, we analyzed a breeding population for bioenergy sorghum, which was previously developed based on testcross performance, to compare genomic selection models both trained on and evaluated for the per se performance of the 3rd generation S0 individuals. Of over 200 ancestral inbred accessions in the base population, only 13 founders contributed to the 3rd generation as progenitors. Compared to the founders, the per se performance of the population was improved for target traits. The total genetic variance within the S0 generation progenies themselves for all traits was mainly additive, although non-additive variances contributed to each trait to some extent. For genomic selection, linear regression models explicitly considering all genetic components showed a higher predictive ability than other linear and non-linear models. Although the number and effect distribution of underlying loci differed among the traits, the influence of priors for marker effects was relatively small. These results indicate the importance of considering non-additive effects for dissecting the genetic architecture of early breeding generations and predicting the performance per se.
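To illustrate why models with non-additive components can predict better, here is a minimal sketch that codes the same markers additively (0/1/2) and for dominance, then fits ridge regression on each design. Using a heterozygote indicator as the dominance code is one common choice and is assumed here. Epistasis is omitted, and the same shrinkage is applied to both effect types for simplicity. The toy trait is simulated to be purely dominant, so the gap between the two models is exaggerated compared with the mainly additive traits in the study. All names are hypothetical.

```python
import numpy as np

def additive_dominance(G):
    """Centered additive (0/1/2) and dominance (heterozygote indicator) codings."""
    A = G - G.mean(axis=0)
    H = (G == 1).astype(float)
    return A, H - H.mean(axis=0)

def ridge_predict(Z_train, y_train, Z_test, lam=1.0):
    """Ridge regression fit on training rows, predictions for test rows."""
    k = Z_train.shape[1]
    b = np.linalg.solve(Z_train.T @ Z_train + lam * np.eye(k),
                        Z_train.T @ (y_train - y_train.mean()))
    return y_train.mean() + Z_test @ b

rng = np.random.default_rng(7)
n, m = 400, 30
G = rng.binomial(2, 0.5, size=(n, m)).astype(float)
A, D = additive_dominance(G)
y = D @ rng.normal(size=m) + rng.normal(scale=0.5, size=n)  # toy trait: dominance only
train, test = slice(0, 300), slice(300, n)
AD = np.hstack([A, D])
r_a = np.corrcoef(ridge_predict(A[train], y[train], A[test]), y[test])[0, 1]
r_ad = np.corrcoef(ridge_predict(AD[train], y[train], AD[test]), y[test])[0, 1]
print(round(r_a, 2), round(r_ad, 2))
```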
17
Keller B, Ariza-Suarez D, de la Hoz J, Aparicio JS, Portilla-Benavides AE, Buendia HF, Mayor VM, Studer B, Raatz B. Genomic Prediction of Agronomic Traits in Common Bean (Phaseolus vulgaris L.) Under Environmental Stress. FRONTIERS IN PLANT SCIENCE 2020; 11:1001. [PMID: 32774338 PMCID: PMC7381332 DOI: 10.3389/fpls.2020.01001] [Citation(s) in RCA: 37] [Impact Index Per Article: 9.3] [Received: 03/16/2020] [Accepted: 06/18/2020] [Indexed: 05/19/2023]
Abstract
In plant and animal breeding, genomic prediction models are established to select new lines based on genomic data, without the need for laborious phenotyping. Prediction models can be trained on recent or historic phenotypic data and increasingly available genotypic data. This enables the adoption of genomic selection also in under-used legume crops such as common bean. Beans are an important staple food in the tropics and mainly grown by smallholders under limiting environmental conditions such as drought or low soil fertility. Therefore, genotype-by-environment interactions (G × E) are an important consideration when developing new bean varieties. However, G × E are often not considered in genomic prediction models nor are these models implemented in current bean breeding programs. Here we show the prediction abilities of four agronomic traits in common bean under various environmental stresses based on twelve field trials. The dataset includes 481 elite breeding lines characterized by 5,820 SNP markers. Prediction abilities over all twelve trials ranged between 0.6 and 0.8 for yield and days to maturity, respectively, predicting new lines into new seasons. In all four evaluated traits, the prediction abilities reached about 50-80% of the maximum accuracies given by phenotypic correlations and heritability. Predictions under drought and low phosphorus stress were up to 10 and 20% improved when G × E were included in the model, respectively. Our results demonstrate the potential of genomic selection to increase the genetic gain in common bean breeding. Prediction abilities improved when more phenotypic data was available and G × E could be accounted for. Furthermore, the developed models allowed us to predict genotypic performance under different environmental stresses. This will be a key factor in the development of common bean varieties adapted to future challenging conditions.
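One common way to express the comparison in the abstract is to divide the predictive ability $r_{\hat{y},y}$ by $\sqrt{h^2}$. This is an assumption here, not a formula taken from the paper. The quantity $\sqrt{h^2}$ is the usual upper bound on the correlation between genetic predictions and observed phenotypes. With the hypothetical values $r_{\hat{y},y} = 0.6$ and $h^2 = 0.64$:

$$
\frac{r_{\hat{y},y}}{\sqrt{h^2}} = \frac{0.6}{\sqrt{0.64}} = \frac{0.6}{0.8} = 0.75,
$$

that is, the prediction reaches 75% of its ceiling. The abstract also cites phenotypic correlations (presumably between trials) as part of the bound, so the authors' exact ceiling may differ.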
Affiliation(s)
- Beat Keller
- Bean Program, Agrobiodiversity Area, International Center for Tropical Agriculture (CIAT), Cali, Colombia
- Molecular Plant Breeding, Institute of Agricultural Sciences, ETH Zurich, Zurich, Switzerland
- Daniel Ariza-Suarez
- Bean Program, Agrobiodiversity Area, International Center for Tropical Agriculture (CIAT), Cali, Colombia
- Juan de la Hoz
- Bean Program, Agrobiodiversity Area, International Center for Tropical Agriculture (CIAT), Cali, Colombia
- Johan Steven Aparicio
- Bean Program, Agrobiodiversity Area, International Center for Tropical Agriculture (CIAT), Cali, Colombia
- Hector Fabio Buendia
- Bean Program, Agrobiodiversity Area, International Center for Tropical Agriculture (CIAT), Cali, Colombia
- Victor Manuel Mayor
- Bean Program, Agrobiodiversity Area, International Center for Tropical Agriculture (CIAT), Cali, Colombia
- Bruno Studer
- Molecular Plant Breeding, Institute of Agricultural Sciences, ETH Zurich, Zurich, Switzerland
- Bodo Raatz
- Bean Program, Agrobiodiversity Area, International Center for Tropical Agriculture (CIAT), Cali, Colombia
18
Wang L, Janss LL, Madsen P, Henshall J, Huang CH, Marois D, Alemu S, Sørensen AC, Jensen J. Effect of genomic selection and genotyping strategy on estimation of variance components in animal models using different relationship matrices. Genet Sel Evol 2020; 52:31. [PMID: 32527317 PMCID: PMC7291515 DOI: 10.1186/s12711-020-00550-w] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.0] [Received: 04/30/2019] [Accepted: 06/02/2020] [Indexed: 11/21/2022]
Abstract
Background: The traditional way to estimate variance components (VC) is based on the animal model using a pedigree-based relationship matrix (A) (A-AM). After genomic selection was introduced into breeding programs, it was anticipated that VC estimates from A-AM would be biased because the effect of selection based on genomic information is not captured. The single-step method (H-AM), which uses an H matrix as the (co)variance matrix, can be used as an alternative to estimate VC. Here, we compared VC estimates from A-AM and H-AM and investigated the effect of genomic selection, genotyping strategy and genotyping proportion on the estimation of VC from the two methods, by analyzing a dataset from a commercial broiler line and a simulated dataset that mimicked the broiler population.
Results: VC estimates from H-AM were severely overestimated with a high proportion of selective genotyping, and overestimation increased as the proportion of genotyping increased in the analysis of both commercial and simulated data. This bias in H-AM estimates arises when selective genotyping is used to construct the H-matrix, regardless of whether or not selective genotyping is applied in the selection process. For simulated populations under genomic selection, estimates of genetic variance from A-AM were also significantly overestimated when the effect of genomic selection was strong. Our results suggest that VC estimates from H-AM under random genotyping have the expected values. Predicted breeding values from H-AM were inflated when VC estimates were biased, and inflation differed between genotyped and ungenotyped animals, which can lead to suboptimal selection decisions.
Conclusions: We conclude that VC estimates from H-AM are biased with selective genotyping, but are close to expected values with random genotyping. VC estimates from A-AM in populations under genomic selection are also biased, but to a much lesser degree. Therefore, we recommend the use of H-AM with random genotyping to estimate VC for populations under genomic selection. Our results indicate that it is still possible to use selective genotyping in selection, but then VC estimation should avoid using genotypes from only one side of the phenotype distribution. Hence, a dual genotyping strategy may be needed to address both selection and VC estimation.
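Single-step models usually work with the inverse of the H matrix. In the standard single-step formulation this is $H^{-1} = A^{-1} + \begin{bmatrix} 0 & 0 \\ 0 & G^{-1} - A_{22}^{-1} \end{bmatrix}$, where $A_{22}$ is the block of A for the genotyped animals. Below is a minimal sketch that assumes G is already computed and invertible. Production pipelines usually blend or rescale G first, and that step is omitted here. The function name, the three-animal pedigree and the G values are hypothetical. Which animals enter G (random versus selective genotyping) is the factor the study varies.

```python
import numpy as np

def h_inverse(A, G, genotyped):
    """Single-step H^-1 = A^-1 + [[0, 0], [0, G^-1 - A22^-1]] (sketch).

    A: (n, n) pedigree relationship matrix of all animals.
    G: (k, k) genomic relationship matrix of the genotyped animals, ordered as `genotyped`.
    genotyped: positions of the genotyped animals within A.
    """
    idx = np.ix_(genotyped, genotyped)
    H_inv = np.linalg.inv(A)
    H_inv[idx] += np.linalg.inv(G) - np.linalg.inv(A[idx])  # correction on the genotyped block only
    return H_inv

# Hypothetical pedigree: animals 0 and 1 are unrelated founders, 2 is their offspring.
A = np.array([[1.0, 0.0, 0.5],
              [0.0, 1.0, 0.5],
              [0.5, 0.5, 1.0]])
genotyped = [1, 2]
G = np.array([[1.05, 0.45],
              [0.45, 0.98]])  # made-up genomic relationships for animals 1 and 2
H_inv = h_inverse(A, G, genotyped)
print(H_inv)
```

If G equals $A_{22}$, the correction vanishes and $H^{-1}$ reduces to $A^{-1}$. This makes a quick consistency check.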
Affiliation(s)
- Lei Wang
- Center for Quantitative Genetics and Genomics, Aarhus University, Aarhus, Denmark.
- Luc L Janss
- Center for Quantitative Genetics and Genomics, Aarhus University, Aarhus, Denmark
- Per Madsen
- Center for Quantitative Genetics and Genomics, Aarhus University, Aarhus, Denmark
- Setegn Alemu
- Center for Quantitative Genetics and Genomics, Aarhus University, Aarhus, Denmark
- A C Sørensen
- Center for Quantitative Genetics and Genomics, Aarhus University, Aarhus, Denmark
- Just Jensen
- Center for Quantitative Genetics and Genomics, Aarhus University, Aarhus, Denmark
19
Allier A, Teyssèdre S, Lehermeier C, Charcosset A, Moreau L. Genomic prediction with a maize collaborative panel: identification of genetic resources to enrich elite breeding programs. TAG. THEORETICAL AND APPLIED GENETICS. THEORETISCHE UND ANGEWANDTE GENETIK 2020; 133:201-215. [PMID: 31595338 DOI: 10.1007/s00122-019-03451-9] [Citation(s) in RCA: 13] [Impact Index Per Article: 3.3] [Received: 05/20/2019] [Accepted: 09/28/2019] [Indexed: 05/02/2023]
Abstract
Collaborative diversity panels and genomic prediction seem relevant to identify and harness genetic resources for polygenic trait-specific enrichment of elite germplasms. In plant breeding, genetic diversity is important to maintain the pace of genetic gain and the ability to respond to new challenges in a context of climatic and social expectation changes. Many genetic resources are accessible to breeders but cannot all be considered for broadening the genetic diversity of elite germplasm. This study presents the use of genomic predictions trained on a collaborative diversity panel, which assembles genetic resources and elite lines, to identify resources to enrich an elite germplasm. A maize collaborative panel (386 lines) was considered to estimate genome-wide marker effects. Relevant predictive abilities (0.40-0.55) were observed on a large population of private elite materials, which supported the interest of such a collaborative panel for diversity management perspectives. Grain-yield estimated marker effects were used to select a donor that best complements an elite recipient at individual loci or haplotype segments, or that is expected to give the best-performing progeny with the elite. Among existing and new criteria that were compared, some gave more weight to the donor-elite complementarity than to the donor value, and appeared more adapted to long-term objective. We extended this approach to the selection of a set of donors complementing an elite population. We defined a crossing plan between identified donors and elite recipients. Our results illustrated how collaborative projects based on diversity panels including both public resources and elite germplasm can contribute to a better characterization of genetic resources in view of their use to enrich elite germplasm.
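Selecting a donor that complements an elite recipient at individual loci can be illustrated with one simple criterion. For each locus, take whichever of the donor or elite allele has the higher estimated effect, sum these values, and subtract the elite's own value. This per-locus criterion is an assumption for illustration and not necessarily one of the criteria compared in the paper. It ignores linkage, haplotype segments and expected progeny performance. Lines are assumed homozygous and coded 0/1. The function names and toy data are hypothetical.

```python
import numpy as np

def complementarity_gain(donor, elite, effects):
    """Value of the best per-locus combination of donor and elite alleles minus the elite value.

    donor, elite: (m,) homozygous lines coded 0/1 (1 = allele whose effect is in `effects`).
    effects: (m,) estimated marker effects, e.g. for grain yield.
    """
    v_donor = donor * effects
    v_elite = elite * effects
    return float(np.sum(np.maximum(v_donor, v_elite) - v_elite))

def best_donor(donors, elite, effects):
    """Index of the donor with the largest complementarity gain, plus all gains."""
    gains = [complementarity_gain(d, elite, effects) for d in donors]
    return int(np.argmax(gains)), gains

effects = np.array([1.0, -0.5, 2.0, 0.3])
elite = np.array([1, 1, 0, 0])
donors = [np.array([0, 0, 1, 0]), np.array([1, 1, 0, 1]), np.array([0, 0, 0, 0])]
idx, gains = best_donor(donors, elite, effects)
print(idx, gains)  # 0 [2.5, 0.3, 0.5]
```

Donor 0 wins because it supplies the large favourable effect at the third locus and lacks the unfavourable allele the elite carries at the second.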
Affiliation(s)
- Antoine Allier
- GQE - Le Moulon, INRA, Univ. Paris-Sud, CNRS, AgroParisTech, Université Paris-Saclay, 91190, Gif-sur-Yvette, France
- RAGT2n, Genetics and Analytics Unit, 12510, Druelle, France
- Alain Charcosset
- GQE - Le Moulon, INRA, Univ. Paris-Sud, CNRS, AgroParisTech, Université Paris-Saclay, 91190, Gif-sur-Yvette, France
- Laurence Moreau
- GQE - Le Moulon, INRA, Univ. Paris-Sud, CNRS, AgroParisTech, Université Paris-Saclay, 91190, Gif-sur-Yvette, France.
20
Gao H, Madsen P, Aamand GP, Thomasen JR, Sørensen AC, Jensen J. Bias in estimates of variance components in populations undergoing genomic selection: a simulation study. BMC Genomics 2019; 20:956. [PMID: 31818251 PMCID: PMC6902321 DOI: 10.1186/s12864-019-6323-8] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.2] [Received: 06/20/2019] [Accepted: 11/22/2019] [Indexed: 01/07/2023]
Abstract
Background: After the extensive implementation of genomic selection (GS), the choice of the statistical model and data used to estimate variance components (VCs) remains unclear. A primary concern is that VCs estimated from a traditional pedigree-based animal model (P-AM) will be biased due to ignoring the impact of GS. The objectives of this study were to examine the effects of GS on estimates of VC in the analysis of different sets of phenotypes and to investigate VC estimation using different methods. Data were simulated to resemble the Danish Jersey population. The simulation included three phases: (1) a historical phase; (2) 20 years of conventional breeding; and (3) 15 years of GS. The three scenarios based on different sets of phenotypes for VC estimation were as follows: (1) Pheno1: phenotypes from only the conventional phase (1–20 years); (2) Pheno1 + 2: phenotypes from both the conventional phase and GS phase (1–35 years); (3) Pheno2: phenotypes from only the GS phase (21–35 years). Single-step genomic BLUP (ssGBLUP), a single-step Bayesian regression model (ssBR), and P-AM were applied. Two base populations were defined: the first was the founder population referred to by the pedigree-based relationship (P-base); the second was the base population referred to by the current genotyped population (G-base).
Results: In general, both the ssGBLUP and ssBR models with all the phenotypic and genotypic information (Pheno1 + 2) yielded biased estimates of additive genetic variance compared to the P-base model. When the phenotypes from the conventional breeding phase were excluded (Pheno2), P-AM led to underestimation of the genetic variance of P-base. Compared to the VCs of G-base, when phenotypes from the conventional breeding phase (Pheno2) were ignored, the ssBR model yielded unbiased estimates of the total genetic variance and marker-based genetic variance, whereas the residual variance was overestimated.
Conclusions: The results show that neither of the single-step models (ssGBLUP and ssBR) can precisely estimate the VCs for populations undergoing GS. Overall, the best solution for obtaining unbiased estimates of VCs is to use P-AM with phenotypes from the conventional phase or phenotypes from both the conventional and GS phases.
Affiliation(s)
- Hongding Gao
- Center for Quantitative Genetics and Genomics, Department of Molecular Biology and Genetics, Aarhus University, DK-8830, Tjele, Denmark; Nordic Cattle Genetic Evaluation, DK-8200, Aarhus, Denmark.
- Per Madsen
- Center for Quantitative Genetics and Genomics, Department of Molecular Biology and Genetics, Aarhus University, DK-8830, Tjele, Denmark
- Anders Christian Sørensen
- Center for Quantitative Genetics and Genomics, Department of Molecular Biology and Genetics, Aarhus University, DK-8830, Tjele, Denmark
- Just Jensen
- Center for Quantitative Genetics and Genomics, Department of Molecular Biology and Genetics, Aarhus University, DK-8830, Tjele, Denmark
21
Inferring trait-specific similarity among individuals from molecular markers and phenotypes with Bayesian regression. Theor Popul Biol 2019; 132:47-59. [PMID: 31830483 DOI: 10.1016/j.tpb.2019.11.008] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.2] [Received: 04/18/2019] [Revised: 11/21/2019] [Accepted: 11/22/2019] [Indexed: 12/20/2022]
Abstract
Modeling covariance structure based on genetic similarity between pairs of relatives plays an important role in evolutionary, quantitative and statistical genetics. Historically, genetic similarity between individuals has been quantified from pedigrees via the probability that randomly chosen homologous alleles between individuals are identical by descent (IBD). At present, however, many genetic analyses rely on molecular markers, with realized measures of genomic similarity replacing IBD-based expected similarities. Animal and plant breeders, for example, now employ marker-based genomic relationship matrices between individuals in prediction models and in estimation of genome-based heritability coefficients. Phenotypes convey information about genetic similarity as well. For instance, if phenotypic values are at least partially the result of the action of quantitative trait loci, one would expect the former to inform about the latter, as in genome-wide association studies. Statistically, a non-trivial conditional distribution of unknown genetic similarities, given phenotypes, is to be expected. A Bayesian formalism is presented here that applies to whole-genome regression methods where some genetic similarity matrix, e.g., a genomic relationship matrix, can be defined. Our Bayesian approach, based on phenotypes and markers, converts prior (markers only) expected similarity into trait-specific posterior similarity. A simulation illustrates situations under which effective Bayesian learning from phenotypes occurs. Pinus and wheat data sets were used to demonstrate applicability of the concept in practice. The methodology applies to a wide class of Bayesian linear regression models, it extends to the multiple-trait domain, and can also be used to develop phenotype-guided similarity kernels in prediction problems.
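The prior, markers-only similarity mentioned above is often a genomic relationship matrix. The sketch below uses VanRaden's first method, which is one widely used choice. It is assumed here as an example of such a matrix and is not taken from the paper. Allele frequencies are estimated from the genotypes themselves. The Bayesian step that turns this prior into trait-specific posterior similarity is not reproduced.

```python
import numpy as np

def vanraden_g(X):
    """Genomic relationship matrix, VanRaden method 1.

    X: (n, m) genotypes coded 0/1/2 copies of the counted allele.
    """
    p = X.mean(axis=0) / 2.0            # allele frequencies estimated from X
    Z = X - 2.0 * p                     # center each marker by twice its frequency
    return Z @ Z.T / (2.0 * np.sum(p * (1.0 - p)))

X = np.array([[0, 1, 2],
              [2, 1, 0]], dtype=float)
G = vanraden_g(X)
print(G)
```

In this two-individual toy example the pair shares no alleles at the two informative markers, so the off-diagonal similarity is negative (-4/3). A Bayesian update from phenotypes would then adjust such entries trait by trait.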
22
Macedo FL, Reverter A, Legarra A. Behavior of the Linear Regression method to estimate bias and accuracies with correct and incorrect genetic evaluation models. J Dairy Sci 2019; 103:529-544. [PMID: 31704008 DOI: 10.3168/jds.2019-16603] [Citation(s) in RCA: 20] [Impact Index Per Article: 4.0] [Received: 03/12/2019] [Accepted: 09/13/2019] [Indexed: 11/19/2022]
Abstract
Bias in genetic evaluations has been a constant concern in animal genetics. Interest in this topic has increased in recent years, since many studies have detected overestimation (bias) in estimated breeding values (EBV). Detecting the existence of bias, and the realized accuracy of predictions, is therefore important, yet this is difficult when studying small data sets or breeds. In this study, we tested by simulation the recently presented Linear Regression (LR) method for estimating the bias, slope, and accuracy of pedigree EBV. The LR method computes statistics by comparing EBV from a data set containing old, partial information with EBV from a data set containing all information (old and new, a whole data set) for the same individuals. The method proposes an estimator of bias ($\hat{\Delta}_p$), an estimator of slope ($\hat{b}_p$), and 3 estimators related to accuracies: the ratio between accuracies, the reliability of the partial data set ($\widehat{acc}^2_p$), and the ratio of reliabilities ($\hat{\rho}^2_{p,w}$). We simulated a dairy scheme for low (0.10) and moderate (0.30) heritabilities. In both cases, we checked the behavior of the estimators for 3 scenarios: (1) when the evaluation model is the same as the model used to simulate the data; (2) when the evaluation model uses an incorrect heritability; and (3) when the data include an environmental trend. For scenarios in which the evaluation model was correct, the LR method was capable of correctly estimating bias, slope, and accuracies, with better performance for higher heritability [i.e., corr($b_p$, $\hat{b}_p$) was 0.45 for $h^2$ = 0.10 and 0.59 for $h^2$ = 0.30]. When incorrect heritabilities were used in the evaluation model, the bias was correctly estimated in direction but not in magnitude. Similarly, the magnitudes of bias and slope were underestimated in scenarios with environmental trends in the data, except for cases in which contemporary groups were random and greatly shrunken.
In general, accuracies were well estimated in all scenarios. The LR method is capable of checking bias and accuracy in all cases, if the evaluation model is reasonably correct or robust, and its estimations are more precise with more information (e.g., high heritability). If the model uses an incorrect heritability or a hidden trend exists in the data, it is still possible to estimate the direction and existence of bias and slope but not always their magnitudes.
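The LR statistics that need only the two EBV vectors can be sketched in a few lines. This is a minimal illustration assuming the definitions of Legarra and Reverter (2018): bias as mean(partial) minus mean(whole), slope as the regression of whole on partial EBV, the partial-whole correlation as an estimate of the ratio of accuracies, and the regression of partial on whole EBV as an estimate of the ratio of reliabilities. The function and key names are hypothetical, and the reliability of the partial data set is omitted because it also requires the genetic variance and average inbreeding.

```python
from statistics import fmean


def _cov(x, y):
    """Sample covariance of two equal-length sequences."""
    mx, my = fmean(x), fmean(y)
    return sum((a - mx) * (b - my) for a, b in zip(x, y)) / (len(x) - 1)


def lr_statistics(ebv_partial, ebv_whole):
    """LR-method statistics for the same validation individuals (hypothetical helper).

    ebv_partial: EBV computed from the partial (old) data set.
    ebv_whole:   EBV computed from the whole (old + new) data set.
    """
    if len(ebv_partial) != len(ebv_whole) or len(ebv_partial) < 3:
        raise ValueError("need two equal-length EBV vectors with at least 3 entries")
    var_p = _cov(ebv_partial, ebv_partial)
    var_w = _cov(ebv_whole, ebv_whole)
    cov_wp = _cov(ebv_whole, ebv_partial)
    return {
        # Bias: mean(partial) - mean(whole); expected to be 0 without bias.
        "bias": fmean(ebv_partial) - fmean(ebv_whole),
        # Slope (dispersion): regression of whole on partial EBV; 1 means no dispersion bias.
        "slope": cov_wp / var_p,
        # Correlation partial vs whole; estimates the ratio of accuracies acc_p / acc_w.
        "ratio_accuracies": cov_wp / (var_p * var_w) ** 0.5,
        # Regression of partial on whole EBV; estimates the ratio of reliabilities.
        "ratio_reliabilities": cov_wp / var_w,
    }
```

Because only first and second moments of the two EBV vectors are needed, the statistics can be recomputed cheaply for any validation subset without refitting the evaluation model.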
Affiliation(s)
- F L Macedo
- INRA, GenPhySE, Castanet-Tolosan 31320, France; Facultad de Veterinaria, Universidad de la República, 11600 Montevideo, Uruguay.
- A Reverter
- CSIRO Agriculture and Food, St. Lucia 4067, Australia
- A Legarra
- INRA, GenPhySE, Castanet-Tolosan 31320, France
23
Allier A, Lehermeier C, Charcosset A, Moreau L, Teyssèdre S. Improving Short- and Long-Term Genetic Gain by Accounting for Within-Family Variance in Optimal Cross-Selection. Front Genet 2019; 10:1006. [PMID: 31737033 PMCID: PMC6828944 DOI: 10.3389/fgene.2019.01006] [Citation(s) in RCA: 20] [Impact Index Per Article: 4.0] [Received: 05/10/2019] [Accepted: 09/20/2019] [Indexed: 12/30/2022]
Abstract
The implementation of genomic selection in recurrent breeding programs raises the concern that a higher inbreeding rate could compromise long-term genetic gain. An optimized mating strategy that maximizes progeny performance while maintaining diversity for long-term genetic gain is therefore essential. The optimal cross-selection approach aims to identify the set of crosses that maximizes the expected genetic value of the progeny under a constraint on genetic diversity in the progeny. Optimal cross-selection usually does not account for within-family selection, i.e., the fact that only a selected fraction of each family is used as parents of the next generation. In this study, we consider within-family variance, accounting for linkage disequilibrium between quantitative trait loci, to predict the expected mean performance and the expected genetic diversity in the selected progeny of a set of crosses. These predictions rely on the usefulness criterion parental contribution (UCPC) method. We compared UCPC-based optimal cross-selection with the standard optimal cross-selection approach in a long-term simulated recurrent genomic selection breeding program with overlapping generations. UCPC-based optimal cross-selection was more efficient than optimal cross-selection at converting genetic diversity into short- and long-term genetic gains. We also showed that, with UCPC-based optimal cross-selection, long-term genetic gain can be increased with only a limited reduction in short-term commercial genetic gain.
Affiliation(s)
- Antoine Allier
- GQE-Le Moulon, INRA, Univ. Paris-Sud, CNRS, AgroParisTech, Université Paris-Saclay, Gif-sur-Yvette, France
- Genetics and Analytics Unit, RAGT2n, Druelle, France
- Alain Charcosset
- GQE-Le Moulon, INRA, Univ. Paris-Sud, CNRS, AgroParisTech, Université Paris-Saclay, Gif-sur-Yvette, France
- Laurence Moreau
- GQE-Le Moulon, INRA, Univ. Paris-Sud, CNRS, AgroParisTech, Université Paris-Saclay, Gif-sur-Yvette, France
24
Schreck N, Piepho HP, Schlather M. Best Prediction of the Additive Genomic Variance in Random-Effects Models. Genetics 2019; 213:379-394. [PMID: 31383770 PMCID: PMC6781909 DOI: 10.1534/genetics.119.302324] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.2] [Received: 03/04/2019] [Accepted: 07/30/2019] [Indexed: 12/26/2022]
Abstract
The additive genomic variance in linear models with random marker effects can be defined as a random variable, in accordance with classical quantitative genetics theory. Common approaches to estimating the genomic variance in random-effects linear models based on genomic marker data can be regarded as estimating the unconditional (or prior) expectation of this random additive genomic variance, and they neglect the contribution of linkage disequilibrium (LD). We introduce a novel best prediction (BP) approach for the additive genomic variance in both the current and the base population in the framework of genomic prediction using the genomic best linear unbiased prediction (gBLUP) method. The resulting best predictor is the conditional (or posterior) expectation of the additive genomic variance given the additional information in the phenotypic data, and it is structurally in accordance with the genomic equivalent of the classical additive genetic variance in random-effects models. In particular, the best predictor includes the contribution of (marker) LD to the additive genomic variance and may fully recover the LD contribution that is missed under the assumptions of statistical frameworks such as the random-effects model. We derive an empirical best predictor (eBP) and compare its performance with common approaches to estimating the additive genomic variance in random-effects models on commonly used genomic datasets.
Affiliation(s)
- Nicholas Schreck
- Research Group on Stochastics and its Applications, School of Business Informatics and Mathematics, University of Mannheim, 68159, Germany
- Hans-Peter Piepho
- Biostatistics Unit, Institute of Crop Science, University of Hohenheim, 70593 Stuttgart, Germany
- Martin Schlather
- Research Group on Stochastics and its Applications, School of Business Informatics and Mathematics, University of Mannheim, 68159, Germany
- Animal Breeding and Genetics Group, Center for Integrated Breeding Research, University of Goettingen, 37075, Germany
25
Michel S, Löschenberger F, Ametz C, Pachler B, Sparry E, Bürstmayr H. Simultaneous selection for grain yield and protein content in genomics-assisted wheat breeding. Theor Appl Genet 2019; 132:1745-1760. [PMID: 30810763 PMCID: PMC6531418 DOI: 10.1007/s00122-019-03312-5] [Citation(s) in RCA: 14] [Impact Index Per Article: 2.8] [Received: 09/17/2018] [Accepted: 02/15/2019] [Indexed: 05/10/2023]
Abstract
Large genetic improvement can be achieved by simultaneous genomic selection for grain yield and protein content when different breeding strategies are combined in the form of selection indices. Genomic selection has been implemented in many national and international breeding programmes in recent years. Numerous studies have shown the potential of this new breeding tool; few, however, have taken into account simultaneous selection for multiple traits, although this is common practice in breeding programmes. Simultaneously improving grain yield and protein content is a major challenge in wheat breeding because of a severe negative trade-off between the two traits. Accordingly, this study investigated the potential and limits of multi-trait selection for this particular trait complex, using the vast phenotypic and genomic data collected in an applied wheat breeding programme. Two breeding strategies based on various genomic selection indices were compared, which aimed (1) to select high-protein genotypes with acceptable yield potential and (2) to develop high-yielding varieties while maintaining protein content. The prediction accuracy of preliminary yield trials could be strongly improved by combining phenotypic and genomic information in a genomics-assisted selection approach, which surpassed both genomics-based and classical phenotypic selection methods, both for single-trait predictions and in genomic index selection across years. Furthermore, the genomic selection indices mitigated the negative trade-off between grain yield and protein content, leading to a substantial selection response for protein yield, i.e. total seed nitrogen content. This suggests that it is feasible to develop varieties that combine superior yield potential with comparably high protein content, thus using available nitrogen resources more efficiently.
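To illustrate how a selection index can trade off two negatively correlated traits, the classical Smith-Hazel weights b = P⁻¹Ga can be computed for two traits. This is a generic textbook sketch, not the specific genomic selection indices compared in the study: P (phenotypic covariance of the index traits), G (genetic covariance between index traits and goal traits) and the economic weights a are hypothetical inputs, and applying the same algebra to genomic predictions as index traits is an assumption here.

```python
def smith_hazel_weights(P, G, a):
    """Smith-Hazel index weights b = P^-1 G a for two traits (hypothetical helper).

    P: 2x2 phenotypic (co)variance matrix of the index traits.
    G: 2x2 genetic (co)variance matrix between index traits and goal traits.
    a: economic weights of the two goal traits.
    """
    (p11, p12), (p21, p22) = P
    det = p11 * p22 - p12 * p21
    if abs(det) < 1e-12:
        raise ValueError("phenotypic covariance matrix is singular")
    # Explicit inverse of a 2x2 matrix.
    inv = ((p22 / det, -p12 / det), (-p21 / det, p11 / det))
    # Vector G a.
    ga = (G[0][0] * a[0] + G[0][1] * a[1], G[1][0] * a[0] + G[1][1] * a[1])
    # b = P^-1 (G a).
    return (
        inv[0][0] * ga[0] + inv[0][1] * ga[1],
        inv[1][0] * ga[0] + inv[1][1] * ga[1],
    )


def index_value(b, phenotypes):
    """Index score I = b1 * x1 + b2 * x2 for one candidate."""
    return b[0] * phenotypes[0] + b[1] * phenotypes[1]
```

Candidates are then ranked on `index_value`, so a genotype with slightly lower yield but higher protein content can outrank a pure high-yielder when the weights favour protein.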
Affiliation(s)
- Sebastian Michel
- Department of Agrobiotechnology, IFA-Tulln, University of Natural Resources and Life Sciences Vienna, Konrad-Lorenz-Str. 20, 3430, Tulln, Austria.
- Christian Ametz
- Saatzucht Donau GesmbH & CoKG, Saatzuchtstrasse 11, 2301, Probstdorf, Austria
- Bernadette Pachler
- Saatzucht Donau GesmbH & CoKG, Saatzuchtstrasse 11, 2301, Probstdorf, Austria
- Ellen Sparry
- C&M Seeds, 6180 5th Line, Palmerston, ON, N0G 2P0, Canada
- Hermann Bürstmayr
- Department of Agrobiotechnology, IFA-Tulln, University of Natural Resources and Life Sciences Vienna, Konrad-Lorenz-Str. 20, 3430, Tulln, Austria
26
Usefulness Criterion and Post-selection Parental Contributions in Multi-parental Crosses: Application to Polygenic Trait Introgression. G3 (Bethesda) 2019; 9:1469-1479. [PMID: 30819823 PMCID: PMC6505154 DOI: 10.1534/g3.119.400129] [Citation(s) in RCA: 26] [Impact Index Per Article: 5.2] [Indexed: 01/05/2023]
Abstract
Predicting the usefulness of crosses in terms of expected genetic gain and genetic diversity is of interest to secure progeny performance and to maintain long-term genetic gain in plant breeding. A wide range of crossing schemes is possible, including large biparental crosses, backcrosses, four-way crosses, and synthetic populations. In silico progeny simulations together with genome-based prediction of quantitative traits can be used to guide mating decisions. However, the large number of multi-parental combinations can hinder the use of simulations in practice. Analytical solutions have been proposed recently to predict the distribution of a quantitative trait in the progeny of biparental crosses using information on recombination frequency and linkage disequilibrium between loci. Here, we extend this approach to obtain the progeny distribution of more complex crosses involving two to four parents. By treating agronomic traits and parental genome contributions as jointly multivariate normally distributed traits, the usefulness criterion parental contribution (UCPC) makes it possible to (i) evaluate the expected genetic gain for agronomic traits and, at the same time, (ii) evaluate parental genome contributions to the selected fraction of progeny. We validate and illustrate UCPC in the context of multiple allele introgression from a donor into one or several elite recipients in maize (Zea mays L.). Recommendations regarding the value of two-way crosses, three-way crosses, and backcrosses were derived depending on the donor performance. We believe that the computationally efficient UCPC approach can be useful for mate selection and allocation in many plant and animal breeding contexts.
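The usefulness criterion that UCPC builds on is commonly written as UC = μ + iσ, where μ and σ are the mean and standard deviation of a cross's progeny genetic values and i is the selection intensity of the selected fraction. Below is a minimal single-trait sketch assuming a normal progeny distribution, truncation selection, and selection directly on genetic values; it does not model parental genome contributions, which are the multivariate extension described above. Function names are hypothetical.

```python
from statistics import NormalDist


def selection_intensity(p):
    """Standardized selection intensity i for truncation selection of the top fraction p."""
    if not 0 < p < 1:
        raise ValueError("p must be in (0, 1)")
    std = NormalDist()
    threshold = std.inv_cdf(1 - p)  # truncation point on the standard normal scale
    return std.pdf(threshold) / p   # mean of the selected tail


def usefulness(mean, sd, p):
    """Usefulness criterion UC = mean + i(p) * sd of a cross's progeny distribution."""
    return mean + selection_intensity(p) * sd
```

For example, `usefulness(9.5, 2.0, 0.05)` exceeds `usefulness(10.0, 1.0, 0.05)`: a cross with a lower progeny mean but a larger progeny variance can be the more useful one when only the best fraction is kept.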
27
Allier A, Teyssèdre S, Lehermeier C, Claustres B, Maltese S, Melkior S, Moreau L, Charcosset A. Assessment of breeding programs sustainability: application of phenotypic and genomic indicators to a North European grain maize program. Theor Appl Genet 2019; 132:1321-1334. [PMID: 30666392 DOI: 10.1007/s00122-019-03280-w] [Citation(s) in RCA: 11] [Impact Index Per Article: 2.2] [Received: 09/25/2018] [Accepted: 01/07/2019] [Indexed: 06/09/2023]
Abstract
We review and propose easily implemented and affordable indicators to assess the genetic diversity and the potential of a breeding population, and propose solutions for its long-term management. Successful plant breeding programs rely on balancing short-term goals, to develop competitive cultivars, against long-term goals, to improve and maintain diversity in the genetic pool. Indicators of the sustainability of response to selection in breeding pools are of key importance in this context. We reviewed and proposed sets of indicators based on temporal phenotypic and genotypic data and applied them to an early grain maize program involving two breeding pools (Dent and Flint) selected in a reciprocal manner. Both breeding populations showed significant positive genetic gain, summing up to 1.43 qx/ha/year, but contrasting trends in genetic variance. Advances in high-throughput genotyping permitted the identification of regions of low diversity, mainly located in pericentromeric regions. Observed changes in genetic diversity were varied, reflecting a complex breeding system. We estimated the impact of linkage disequilibrium (LD) and of allelic diversity on the additive genetic variance at genome-wide and chromosome-wide scales. Consistent with theoretical expectations under directional selection, we found a negative contribution of LD to genetic variance, which was unevenly distributed among chromosomes. This suggests different chromosome selection histories and underlines the value of recombining specific chromosome regions. All three sets of indicators make use of in-house data and are easy to implement in any breeding program in the era of genomic selection.
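The LD contribution to additive genetic variance follows from writing the variance of genetic values g = Σₖ aₖxₖ as Σₖ aₖ² Var(xₖ) (the genic variance) plus Σₖ≠ₗ aₖaₗ Cov(xₖ, xₗ) (the LD term). The sketch below computes both parts from a genotype matrix and marker effects that are assumed known; in practice the effects would be estimated, and the study's exact estimators may differ. The function name is hypothetical.

```python
from statistics import fmean


def variance_components(genotypes, effects):
    """Split the variance of additive genetic values into genic variance and LD contribution.

    genotypes: list of individuals, each a list of allele dosages (0/1/2) at m loci.
    effects:   list of m additive allele-substitution effects (assumed known).
    """
    m, n = len(effects), len(genotypes)
    # Centre dosages locus by locus.
    means = [fmean(ind[k] for ind in genotypes) for k in range(m)]
    centred = [[ind[k] - means[k] for k in range(m)] for ind in genotypes]

    def cov(k, l):
        # Sample covariance of dosages at loci k and l.
        return sum(row[k] * row[l] for row in centred) / (n - 1)

    genic = sum(effects[k] ** 2 * cov(k, k) for k in range(m))
    ld = sum(
        effects[k] * effects[l] * cov(k, l)
        for k in range(m)
        for l in range(m)
        if k != l
    )
    return {"genic": genic, "ld": ld, "total": genic + ld}
```

A negative `ld` value means that loci with effects of the same sign tend to be in repulsion, so the realized additive variance falls below its genic value, which is the pattern the abstract reports under directional selection.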
Affiliation(s)
- Antoine Allier
- GQE ‑ Le Moulon, INRA, Univ. Paris‑Sud, CNRS, AgroParisTech, Université Paris-Saclay, 91190, Gif-sur-Yvette, France
- RAGT2n, Genetics and Analytics Unit, 12510, Druelle, France
- Laurence Moreau
- GQE ‑ Le Moulon, INRA, Univ. Paris‑Sud, CNRS, AgroParisTech, Université Paris-Saclay, 91190, Gif-sur-Yvette, France
- Alain Charcosset
- GQE ‑ Le Moulon, INRA, Univ. Paris‑Sud, CNRS, AgroParisTech, Université Paris-Saclay, 91190, Gif-sur-Yvette, France.
28
Modeling Heterogeneity in the Genetic Architecture of Ethnically Diverse Groups Using Random Effect Interaction Models. Genetics 2019; 211:1395-1407. [PMID: 30796011 PMCID: PMC6456318 DOI: 10.1534/genetics.119.301909] [Citation(s) in RCA: 22] [Impact Index Per Article: 4.4] [Received: 09/13/2018] [Accepted: 01/24/2019] [Indexed: 01/08/2023]
Abstract
In humans, most genome-wide association studies have been conducted using data from Caucasians, and many of the reported findings have not replicated in other populations. This lack of replication may be due to statistical issues (small sample sizes or confounding) or, more fundamentally, to differences in the genetic architecture of traits between ethnically diverse subpopulations. What aspects of the genetic architecture of traits vary between subpopulations, and how can this be quantified? We study effect heterogeneity using Bayesian random effect interaction models. The proposed methodology can be applied using shrinkage and variable selection methods, and produces useful information about effect heterogeneity in the form of whole-genome summaries (e.g., the proportions of variance of a complex trait explained by a set of SNPs and the average correlation of effects) as well as SNP-specific attributes. Using simulations, we show that the proposed methodology yields (nearly) unbiased estimates when the sample size is not too small relative to the number of SNPs used. We then used the methodology to analyze four complex human traits (standing height, high-density lipoprotein, low-density lipoprotein, and serum urate levels) in European-Americans (EAs) and African-Americans (AAs). The estimated correlations of effects between the two subpopulations were well below unity for all traits, ranging from 0.50 to 0.73. The extent of effect heterogeneity varied between traits and SNP sets. Height showed smaller differences in SNP effects between AAs and EAs, whereas HDL, a trait highly influenced by lifestyle, exhibited a greater extent of effect heterogeneity. For all traits, we observed substantial variability in effect heterogeneity across SNPs, suggesting that effect heterogeneity varies between regions of the genome.
29
Alves FC, Granato ÍSC, Galli G, Lyra DH, Fritsche-Neto R, de Los Campos G. Bayesian analysis and prediction of hybrid performance. Plant Methods 2019; 15:14. [PMID: 30774704 PMCID: PMC6366084 DOI: 10.1186/s13007-019-0388-x] [Citation(s) in RCA: 26] [Impact Index Per Article: 5.2] [Received: 02/16/2018] [Accepted: 01/16/2019] [Indexed: 05/02/2023]
Abstract
BACKGROUND The selection of hybrids is an essential step in maize breeding. However, evaluating a large number of hybrids in field trials can be extremely costly. Genomic models can instead be used to predict the expected performance of untested genotypes. Bayesian models offer a very flexible framework for hybrid prediction. The Bayesian methodology can be used with parametric and semi-parametric assumptions for additive and non-additive effects. Furthermore, samples from the posterior distribution of Bayesian models can be used to estimate the variance due to general and specific combining abilities even in cases where additive and non-additive effects are not mutually orthogonal. However, the use of Bayesian models for the analysis and prediction of hybrid performance has remained fairly limited. RESULTS We provided an overview of Bayesian parametric and semi-parametric genomic models for prediction of agronomic traits in maize hybrids and discussed how these models can be used to decompose the genotypic variance into components due to general and specific combining ability. We applied the methodology to data from 906 single-cross tropical maize hybrids derived from a convergent population. Our results show that: (1) non-additive effects make a sizable contribution to the genetic variance of grain yield; however, the relative importance of non-additive effects was much smaller for ear and plant height; (2) genomic prediction can achieve relatively high accuracy in predicting phenotypes of untested hybrids and in pre-screening. CONCLUSIONS Genomic prediction can be a useful tool for pre-screening hybrids and could contribute to improving the efficiency and efficacy of maize hybrid breeding programs. The Bayesian framework offers a great deal of flexibility in modeling hybrid performance.
The methodology can be used to estimate important genetic parameters and render predictions of the expected hybrid performance as well as measures of uncertainty about such predictions.
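For intuition about the GCA/SCA decomposition (hybrid value = overall mean + GCA of each parent + SCA of the pair), the sketch below applies the classical least-squares decomposition to a complete, balanced female × male factorial of hybrid means. It is not the Bayesian genomic model of the study, which estimates these components from marker data and handles non-orthogonal cases; the function name and the requirement of a complete table are assumptions of this sketch.

```python
from statistics import fmean


def gca_sca(table):
    """Classical GCA/SCA decomposition for a complete female x male factorial of hybrids.

    table[i][j] is the mean performance of the hybrid between female line i and male line j.
    Returns (grand_mean, gca_female, gca_male, sca) such that
    table[i][j] == grand_mean + gca_female[i] + gca_male[j] + sca[i][j].
    """
    rows, cols = len(table), len(table[0])
    mu = fmean(v for row in table for v in row)
    # GCA: deviation of each parent's marginal mean from the grand mean.
    gca_f = [fmean(row) - mu for row in table]
    gca_m = [fmean(table[i][j] for i in range(rows)) - mu for j in range(cols)]
    # SCA: what remains of each hybrid after removing both GCA effects.
    sca = [[table[i][j] - mu - gca_f[i] - gca_m[j] for j in range(cols)] for i in range(rows)]
    return mu, gca_f, gca_m, sca
```

A table whose SCA entries are all near zero is explained by additive (GCA) effects alone; sizeable SCA entries point to the kind of non-additive contribution the abstract reports for grain yield.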
Affiliation(s)
- Filipe Couto Alves
- Department of Epidemiology and Biostatistics, Michigan State University, 775 Woodlot Dr. Office 1315, East Lansing, USA
- Ítalo Stefanine Correa Granato
- Department of Genetics, "Luiz de Queiroz" College of Agriculture, University of São Paulo, Avenida Pádua Dias, No 11, Piracicaba, São Paulo, Brazil
- Giovanni Galli
- Department of Genetics, "Luiz de Queiroz" College of Agriculture, University of São Paulo, Avenida Pádua Dias, No 11, Piracicaba, São Paulo, Brazil
- Danilo Hottis Lyra
- Department of Computational and Analytical Sciences, Rothamsted Research, Harpenden, UK
- Roberto Fritsche-Neto
- Department of Genetics, "Luiz de Queiroz" College of Agriculture, University of São Paulo, Avenida Pádua Dias, No 11, Piracicaba, São Paulo, Brazil
- Gustavo de Los Campos
- Departments of Epidemiology and Biostatistics, Statistics and Probability and Institute of Quantitative Health Science and Engineering, Michigan State University, 775 Woodlot Dr. Office 1311, East Lansing, USA
30
Legarra A, Reverter A. Semi-parametric estimates of population accuracy and bias of predictions of breeding values and future phenotypes using the LR method. Genet Sel Evol 2018; 50:53. [PMID: 30400768 PMCID: PMC6219059 DOI: 10.1186/s12711-018-0426-6] [Citation(s) in RCA: 101] [Impact Index Per Article: 16.8] [Received: 03/08/2018] [Accepted: 10/15/2018] [Indexed: 11/29/2022]
Abstract
Background Cross-validation tools are used increasingly to validate and compare genetic evaluation methods but analytical properties of cross-validation methods are rarely described. There is also a lack of cross-validation tools for complex problems such as prediction of indirect effects (e.g. maternal effects) or for breeding schemes with small progeny group sizes. Results We derive the expected value of several quadratic forms by comparing genetic evaluations including “partial” and “whole” data. We propose statistics that compare genetic evaluations including “partial” and “whole” data based on differences in means, covariance, and correlation, and term the use of these statistics “method LR” (from linear regression). Contrary to common belief, the regression of true on estimated breeding values is (on expectation) lower than 1 for small or related validation sets, due to family structures. For validation sets that are sufficiently large, we show that these statistics yield estimators of bias, slope or dispersion, and population accuracy for estimated breeding values. Similar results hold for prediction of future phenotypes although we show that estimates of bias, slope or dispersion using prediction of future phenotypes are sensitive to incorrect heritabilities or precorrection for fixed effects. We present an example for a set of 2111 Brahman beef cattle for which, in repeated partitioning of the data into training and validation sets, there is very good agreement of statistics of method LR with prediction of future phenotypes. Conclusions Analytical properties of cross-validation measures are presented. We present a new method named LR for cross-validation that is automatic, easy to use, and which yields the quantities of interest. The method compares predictions based on partial and whole data, which results in estimates of accuracy and biases. Prediction of observed records may yield biased results due to precorrection or use of incorrect heritabilities. 
Electronic supplementary material The online version of this article (10.1186/s12711-018-0426-6) contains supplementary material, which is available to authorized users.
Affiliation(s)
- Andres Legarra
- INRA, UMR1388 GenPhySE, 31326, Castanet-Tolosan, France.
- Antonio Reverter
- CSIRO Agriculture and Food, 306 Carmody Rd., St. Lucia, QLD, 4067, Australia
31
Molenaar H, Boehm R, Piepho HP. Phenotypic Selection in Ornamental Breeding: It's Better to Have the BLUPs Than to Have the BLUEs. Front Plant Sci 2018; 9:1511. [PMID: 30455707 PMCID: PMC6230591 DOI: 10.3389/fpls.2018.01511] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.8] [Received: 04/13/2018] [Accepted: 09/26/2018] [Indexed: 05/13/2023]
Abstract
Plant breeders always face the challenge of selecting the best individuals. Selection methods are required that maximize selection gain based on available data. When several crosses have been made, the BLUP procedure achieves this by combining phenotypic data with information on pedigree relationships via an index, known as family-index selection. The index, estimated based on the intra-class correlation coefficient, exploits the relationship among individuals within a family relative to other families in the population. An intra-class correlation coefficient of one indicates that individual performance can be fully explained by the family background, whereas an intra-class correlation coefficient of zero indicates that the performance of individuals is independent of the family background. When the intra-class correlation coefficient is one, family-index selection is considered; when it is zero, individual selection is considered. The main difference between individual and family-index selection lies in the adjustment of the individual's estimated effect according to the intra-class correlation coefficient, which only family-index selection provides. Two examples serve to illustrate the application of the BLUP method. The efficiency of individual and family-index selection was evaluated in terms of the heritability obtained from linear mixed models implementing the selection methods, by suitably defining the treatment factor as the sum of individual and family effects. Family-index selection was found to be at least as efficient as individual selection in Dianthus caryophyllus L., and it outperformed individual selection for flower size in standard carnation and vase life in mini carnation. Family-index selection was superior to individual selection in Pelargonium zonale in cases when heritability was low.
Hence, the pedigree-based BLUP procedure can enhance selection efficiency for production-related traits in P. zonale and for shelf-life-related traits in D. caryophyllus L.
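The role of the intra-class correlation can be seen in the simplest one-way random model, y = μ + f + e, where the BLUP of a family effect is the family-mean deviation multiplied by nt / (1 + (n − 1)t), with t the intra-class correlation and n the family size. This is a sketch assuming μ and t are known; the study's mixed models additionally include individual effects, so it only illustrates the shrinkage step. Function names are hypothetical.

```python
from statistics import fmean


def family_shrinkage(t, n):
    """Weight on a family's mean deviation under y = mu + f + e, with t = var(f) / (var(f) + var(e))."""
    if not 0 <= t <= 1 or n < 1:
        raise ValueError("need 0 <= t <= 1 and n >= 1")
    return n * t / (1 + (n - 1) * t)


def blup_family_effects(families, mu, t):
    """BLUP of each family effect, treating mu and t as known.

    families: mapping from family name to the list of its members' phenotypes.
    """
    return {
        name: family_shrinkage(t, len(values)) * (fmean(values) - mu)
        for name, values in families.items()
    }
```

With t = 0 every family effect is shrunk to zero, so only the individual's own record matters (individual selection); with t = 1 the family mean is taken at face value, matching the two limiting cases described in the abstract. Larger families are shrunk less because their means are estimated more precisely.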
Affiliation(s)
- Heike Molenaar
- Biostatistics Unit, Institute of Crop Science, University of Hohenheim, Stuttgart, Germany
- Hans-Peter Piepho
- Biostatistics Unit, Institute of Crop Science, University of Hohenheim, Stuttgart, Germany
32
de Los Campos G, Vazquez AI, Hsu S, Lello L. Complex-Trait Prediction in the Era of Big Data. Trends Genet 2018; 34:746-754. [PMID: 30139641 PMCID: PMC6150788 DOI: 10.1016/j.tig.2018.07.004] [Citation(s) in RCA: 33] [Impact Index Per Article: 5.5] [Received: 04/17/2018] [Revised: 07/09/2018] [Accepted: 07/16/2018] [Indexed: 01/18/2023]
Abstract
Accurate prediction of complex traits requires using a large number of DNA variants. Advances in statistical and machine learning methodology enable the identification of complex patterns in high-dimensional settings. However, training these highly parameterized methods requires very large data sets. Until recently, such data sets were not available. But the situation is changing rapidly as very large biomedical data sets comprising individual genotype-phenotype data for hundreds of thousands of individuals become available in public and private domains. We argue that the convergence of advances in methodology and the advent of Big Genomic Data will enable unprecedented improvements in complex-trait prediction; we review theory and evidence supporting our claim and discuss challenges and opportunities that Big Data will bring to complex-trait prediction.
Affiliation(s)
- Gustavo de Los Campos
- Department of Epidemiology and Biostatistics, Michigan State University, East Lansing, MI 48824, USA; Department of Statistics and Probability, Michigan State University, East Lansing, MI 48824, USA; Institute for Quantitative Health Science and Engineering, Michigan State University, East Lansing, MI 48824, USA.
- Ana Ines Vazquez
- Department of Epidemiology and Biostatistics, Michigan State University, East Lansing, MI 48824, USA; Department of Statistics and Probability, Michigan State University, East Lansing, MI 48824, USA
- Stephen Hsu
- Department of Physics and Astronomy, Michigan State University, East Lansing, MI 48824, USA; Cognitive Genomics Laboratory, BGI, Shenzhen 518083, China
- Louis Lello
- Department of Physics and Astronomy, Michigan State University, East Lansing, MI 48824, USA
33
Enciso-Rodriguez F, Douches D, Lopez-Cruz M, Coombs J, de Los Campos G. Genomic Selection for Late Blight and Common Scab Resistance in Tetraploid Potato (Solanum tuberosum). G3 (Bethesda) 2018; 8:2471-2481. [PMID: 29794167 PMCID: PMC6027896 DOI: 10.1534/g3.118.200273] [Citation(s) in RCA: 33] [Impact Index Per Article: 5.5] [Indexed: 11/29/2022]
Abstract
Potato (Solanum tuberosum) is a staple food crop and is considered one of the main sources of carbohydrates worldwide. Late blight (Phytophthora infestans) and common scab (Streptomyces scabies) are two of the primary production constraints faced by potato farming. Previous studies have identified a few resistance genes for both late blight and common scab; however, these genes explain only a limited fraction of the heritability of these diseases. Genomic selection has been demonstrated to be an effective methodology for breeding value prediction in many major crops (e.g., maize and wheat). However, the technology has received little attention in potato breeding. We present the first genomic selection study involving late blight and common scab in tetraploid potato. Our data comprise 4,110 single nucleotide polymorphisms (SNPs) and phenotypic field evaluations for late blight (n = 1,763) and common scab (n = 3,885) collected over seven and nine years, respectively. We report moderately high genomic heritability estimates (0.46 ± 0.04 and 0.45 ± 0.017 for late blight and common scab, respectively). The extent of genotype-by-year interaction was high for late blight and low for common scab. Our assessment of prediction accuracy demonstrates the applicability of genomic prediction to tetraploid potato breeding. For both traits, we found that more than 90% of the genetic variance could be captured with an additive model. For common scab, the highest prediction accuracy was achieved using an additive model. For late blight, small but statistically significant gains in prediction accuracy were achieved using a model that accounted for both additive and dominance effects. Using whole-genome regression models, we identified SNPs located in previously reported hotspot regions for late blight and on genes associated with systemic disease resistance responses, as well as a new locus for common scab located in a WRKY transcription factor.
Affiliation(s)
- Gustavo de Los Campos
- Department of Epidemiology & Biostatistics
- Department of Statistics & Probability
- Institute for Quantitative Health Science and Engineering, Michigan State University, East Lansing, Michigan, 48824
34
Gianola D, Cecchinato A, Naya H, Schön CC. Prediction of Complex Traits: Robust Alternatives to Best Linear Unbiased Prediction. Front Genet 2018; 9:195. [PMID: 29951082 PMCID: PMC6008589 DOI: 10.3389/fgene.2018.00195] [Citation(s) in RCA: 12] [Impact Index Per Article: 2.0] [Received: 02/03/2018] [Accepted: 05/14/2018] [Indexed: 12/05/2022]
Abstract
A widely used method for prediction of complex traits in animal and plant breeding is “genomic best linear unbiased prediction” (GBLUP). In a quantitative genetics setting, BLUP is a linear regression of phenotypes on a pedigree or on a genomic relationship matrix, depending on the type of input information available. Normality of the distributions of random effects and of model residuals is not required for BLUP, but a Gaussian assumption is made implicitly. A potential downside is that Gaussian linear regressions are sensitive to outliers, whether genetic or environmental in origin. We present robust alternatives to BLUP that are simple to implement (relative to a fully Bayesian analysis), using a linear model with residual t or Laplace distributions instead of a Gaussian one, and evaluate the methods with milk yield records on Italian Brown Swiss cattle, grain yield data in inbred wheat lines, and three traits measured on accessions of Arabidopsis thaliana. The methods do not use Markov chain Monte Carlo sampling, and model hyper-parameters, viewed here as regularization “knobs,” are tuned via cross-validation. Uncertainty of predictions is evaluated by bootstrapping or by random reconstruction of training and testing sets. For several traits (e.g., test-day milk yield in cows, flowering time and FRIGIDA expression in Arabidopsis), the best predictions were often those obtained with the robust methods. The results are encouraging and stimulate further investigation and generalization.
Affiliation(s)
- Daniel Gianola
- Department of Animal Sciences, University of Wisconsin-Madison, Madison, WI, United States; Department of Dairy Science, University of Wisconsin-Madison, Madison, WI, United States; Department of Plant Sciences, TUM School of Life Sciences, Technical University of Munich, Munich, Germany; Department of Agronomy, Food, Natural Resources, Animals and Environment, Università degli Studi di Padova, Padova, Italy; Institut Pasteur de Montevideo, Montevideo, Uruguay
- Alessio Cecchinato
- Department of Agronomy, Food, Natural Resources, Animals and Environment, Università degli Studi di Padova, Padova, Italy
- Hugo Naya
- Institut Pasteur de Montevideo, Montevideo, Uruguay
- Chris-Carolin Schön
- Department of Plant Sciences, TUM School of Life Sciences, Technical University of Munich, Munich, Germany
35
Genetic Gain Increases by Applying the Usefulness Criterion with Improved Variance Prediction in Selection of Crosses. Genetics 2017; 207:1651-1661. [PMID: 29038144 DOI: 10.1534/genetics.117.300403] [Citation(s) in RCA: 33] [Impact Index Per Article: 4.7] [Received: 06/30/2017] [Accepted: 10/10/2017] [Indexed: 11/18/2022]
Abstract
A crucial step in plant breeding is the selection and combination of parents to form new crosses. Genome-based prediction guides the selection of high-performing parental lines in many crop breeding programs, which ensures a high mean performance of progeny. To ensure maximum selection progress, a new cross should also provide a large progeny variance. The usefulness concept, as a measure of the gain that can be obtained from a specific cross, accounts for variation in progeny variance. Here, it is shown that genetic gain can be considerably increased when crosses are selected based on their genomic usefulness criterion rather than on mean genomic estimated breeding values. An efficient and improved method is suggested to predict the genetic variance of a cross from Markov chain Monte Carlo samples of marker effects from a whole-genome regression model. In simulations representing selection procedures in crop breeding programs, the performance of this novel approach is compared with existing methods, such as selection based on mean genomic estimated breeding values and optimal haploid values. In all cases, higher genetic gain was obtained than with previously suggested methods. When 1% of progenies per cross were selected, the genetic gain based on the estimated usefulness criterion increased by 0.14 genetic standard deviations compared with selection based on mean genomic estimated breeding values. Analytical derivations of the progeny genotypic variance-covariance matrix based on parental genotypes and genetic map information make simulations of progeny dispensable and allow fast implementation in large-scale breeding programs.
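The usefulness of a cross is commonly written as U = μ + i·h·σ_G, where μ is the expected progeny mean, σ_G the progeny genetic standard deviation, i the selection intensity, and h the square root of heritability; the sketch below takes h = 1, i.e. selection on genetic values. It is an illustration under stated assumptions, not the paper's derivation: it assumes doubled-haploid progeny from the F1 of two homozygous parents, markers coded 0/2, map positions in Morgans, and the Haldane map function, so the covariance between loci j and k is (x1j − x2j)(x1k − x2k)/4 · (1 − 2r_jk). Following the idea in the abstract, the progeny variance is averaged over posterior (MCMC) samples of marker effects rather than computed from a single point estimate. The function names are hypothetical.

```python
import numpy as np
from statistics import NormalDist


def dh_progeny_cov(p1, p2, chrom, pos_morgan):
    """Genotypic covariance matrix of doubled-haploid progeny from the F1 of two inbred parents.

    p1, p2: parental genotypes coded 0/2 (homozygous); chrom: chromosome label per marker;
    pos_morgan: map position in Morgans. Assumes the Haldane map function, so
    1 - 2r = exp(-2d) for linked loci and 0 for loci on different chromosomes.
    """
    half_diff = (np.asarray(p1, float) - np.asarray(p2, float)) / 2.0
    chrom = np.asarray(chrom)
    pos = np.asarray(pos_morgan, float)
    same_chr = chrom[:, None] == chrom[None, :]
    dist = np.abs(pos[:, None] - pos[None, :])
    link = np.where(same_chr, np.exp(-2.0 * dist), 0.0)
    return np.outer(half_diff, half_diff) * link


def usefulness(p1, p2, chrom, pos_morgan, effect_samples, alpha=0.01):
    """Usefulness U = mean + i * sd for one cross, selecting the top fraction alpha.

    effect_samples: array of shape (S, p) with posterior samples of marker effects.
    Returns (U, progeny mean, progeny genetic variance).
    """
    B = np.atleast_2d(np.asarray(effect_samples, float))
    cov = dh_progeny_cov(p1, p2, chrom, pos_morgan)
    midparent = (np.asarray(p1, float) + np.asarray(p2, float)) / 2.0
    mean = float(np.mean(B @ midparent))
    # Average b_s' Sigma b_s over samples instead of plugging in the posterior mean of b
    var = float(np.mean(np.einsum("sj,jk,sk->s", B, cov, B)))
    nd = NormalDist()
    z = nd.inv_cdf(1.0 - alpha)
    intensity = nd.pdf(z) / alpha  # selection intensity under truncation selection
    return mean + intensity * np.sqrt(var), mean, var


# Two segregating markers 10 cM apart on one chromosome, a single effect sample
U, m, v = usefulness([2, 2], [0, 0], chrom=[1, 1], pos_morgan=[0.0, 0.1],
                     effect_samples=[[1.0, 1.0]])
# m = 2.0 and v = 2 + 2 * exp(-0.2), about 3.64
```

Averaging b'Σb over samples, rather than evaluating it at the posterior mean of b, adds the posterior variance of the marker effects to the predicted cross variance, which is one way to account for uncertainty in the estimated effects.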
36
Abstract
Modern biobanks that collect genotype-phenotype information from hundreds of thousands of individuals bring unprecedented opportunities for genomic... Despite the important discoveries reported by genome-wide association (GWA) studies, for most traits and diseases the prediction R-squared (R-sq.) achieved with genetic scores remains considerably lower than the trait heritability. Modern biobanks will soon deliver unprecedentedly large biomedical data sets: Will the advent of big data close the gap between the trait heritability and the proportion of variance that can be explained by a genomic predictor? We addressed this question using Bayesian methods and a data analysis approach that produces a surface response relating prediction R-sq. with sample size and model complexity (e.g., number of SNPs). We applied the methodology to data from the interim release of the UK Biobank. Focusing on human height as a model trait and using 80,000 records for model training, we achieved a prediction R-sq. in testing (n = 22,221) of 0.24 (95% C.I.: 0.23–0.25). Our estimates show that prediction R-sq. increases with sample size, reaching an estimated plateau at values that ranged from 0.1 to 0.37 for models using 500 and 50,000 (GWA-selected) SNPs, respectively. Soon much larger data sets will become available. Using the estimated surface response, we forecast that larger sample sizes will lead to further improvements in prediction R-sq. We conclude that big data will lead to a substantial reduction of the gap between trait heritability and the proportion of interindividual differences that can be explained with a genomic predictor. However, even with the power of big data, for complex traits we anticipate that the gap between prediction R-sq. and trait heritability will not be fully closed.
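The saturating relationship between prediction R-sq. and training sample size can be illustrated with a deterministic expectation. The sketch below does not reproduce the surface response fitted in the study. Instead, it swaps in the widely used formula of Daetwyler et al. (2008), in which the squared accuracy is N·h²/(N·h² + M_e) for N training records and M_e effectively independent chromosome segments, so the expected R-sq. is h² times that quantity. Using the share of variance captured by a chosen SNP set (`h2_captured`, an assumption of this sketch) in place of the full trait heritability caps the plateau below h², in the spirit of the lower plateau reported for models with fewer SNPs. All parameter values are illustrative, not estimates from the UK Biobank data.

```python
def expected_r2(n_train, h2_captured, m_e):
    """Expected prediction R^2 under a Daetwyler-style deterministic formula.

    n_train: number of training records; h2_captured: share of phenotypic variance
    the predictor can capture (assumed); m_e: number of effective chromosome segments.
    """
    accuracy_sq = n_train * h2_captured / (n_train * h2_captured + m_e)
    return h2_captured * accuracy_sq


if __name__ == "__main__":
    # Illustrative values only: R^2 rises with n and approaches h2_captured
    for n in (10_000, 100_000, 1_000_000, 10_000_000):
        print(n, round(expected_r2(n, h2_captured=0.5, m_e=60_000), 3))
```

The formula makes the diminishing returns explicit: each tenfold increase in N moves R-sq. a smaller step toward h2_captured, which never exceeds the trait heritability.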