1
Alemu A, Åstrand J, Montesinos-López OA, Isidro Y Sánchez J, Fernández-González J, Tadesse W, Vetukuri RR, Carlsson AS, Ceplitis A, Crossa J, Ortiz R, Chawade A. Genomic selection in plant breeding: Key factors shaping two decades of progress. MOLECULAR PLANT 2024; 17:552-578. [PMID: 38475993 DOI: 10.1016/j.molp.2024.03.007] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/03/2023] [Revised: 01/22/2024] [Accepted: 03/08/2024] [Indexed: 03/14/2024]
Abstract
Genomic selection, the application of genomic prediction (GP) models to select candidate individuals, has significantly advanced in the past two decades, effectively accelerating genetic gains in plant breeding. This article provides a holistic overview of key factors that have influenced GP in plant breeding during this period. We delved into the pivotal roles of training population size and genetic diversity, and their relationship with the breeding population, in determining GP accuracy. Special emphasis was placed on optimizing training population size. We explored its benefits and the associated diminishing returns beyond an optimum size. This was done while considering the balance between resource allocation and maximizing prediction accuracy through current optimization algorithms. The density and distribution of single-nucleotide polymorphisms, level of linkage disequilibrium, genetic complexity, trait heritability, statistical machine-learning methods, and non-additive effects are the other vital factors. Using wheat, maize, and potato as examples, we summarize the effect of these factors on the accuracy of GP for various traits. The search for high accuracy in GP (theoretically reaching one when Pearson's correlation is used as the metric) is an active research area, and accuracy is as yet far from optimal for various traits. We hypothesize that ultra-large genotypic and phenotypic datasets, effective training population optimization methods, and support from other omics approaches (transcriptomics, metabolomics and proteomics), coupled with deep-learning algorithms, could overcome the boundaries of current limitations to achieve the highest possible prediction accuracy, making genomic selection an effective tool in plant breeding.
Affiliation(s)
- Admas Alemu
- Department of Plant Breeding, Swedish University of Agricultural Sciences, Alnarp, Sweden.
- Johanna Åstrand
- Department of Plant Breeding, Swedish University of Agricultural Sciences, Alnarp, Sweden; Lantmännen Lantbruk, Svalöv, Sweden
- Julio Isidro Y Sánchez
- Centro de Biotecnología y Genómica de Plantas (CBGP, UPM-INIA), Universidad Politécnica de Madrid (UPM) - Instituto Nacional de Investigación y Tecnología Agraria y Alimentaria (INIA), Campus de Montegancedo-UPM, 28223 Madrid, Spain
- Javier Fernández-González
- Centro de Biotecnología y Genómica de Plantas (CBGP, UPM-INIA), Universidad Politécnica de Madrid (UPM) - Instituto Nacional de Investigación y Tecnología Agraria y Alimentaria (INIA), Campus de Montegancedo-UPM, 28223 Madrid, Spain
- Wuletaw Tadesse
- International Center for Agricultural Research in the Dry Areas (ICARDA), Rabat, Morocco
- Ramesh R Vetukuri
- Department of Plant Breeding, Swedish University of Agricultural Sciences, Alnarp, Sweden
- Anders S Carlsson
- Department of Plant Breeding, Swedish University of Agricultural Sciences, Alnarp, Sweden
- José Crossa
- International Maize and Wheat Improvement Center (CIMMYT), Km 45, Carretera México-Veracruz, Texcoco, México 52640, Mexico
- Rodomiro Ortiz
- Department of Plant Breeding, Swedish University of Agricultural Sciences, Alnarp, Sweden.
- Aakash Chawade
- Department of Plant Breeding, Swedish University of Agricultural Sciences, Alnarp, Sweden
2
Fernández-González J, Haquin B, Combes E, Bernard K, Allard A, Isidro Y Sánchez J. Maximizing efficiency in sunflower breeding through historical data optimization. PLANT METHODS 2024; 20:42. [PMID: 38493115 PMCID: PMC10943787 DOI: 10.1186/s13007-024-01151-0] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/21/2023] [Accepted: 01/30/2024] [Indexed: 03/18/2024]
Abstract
Genomic selection (GS) has become an increasingly popular tool in plant breeding programs, propelled by declining genotyping costs, an increase in computational power, and rediscovery of the best linear unbiased prediction methodology over the past two decades. This development has led to an accumulation of extensive historical datasets with genotypic and phenotypic information, triggering the question of how to best utilize these datasets. Here, we investigate whether all available data or a subset should be used to calibrate GS models for across-year predictions in a 7-year dataset of a commercial hybrid sunflower breeding program. We employed a multi-objective optimization approach to determine the ideal years to include in the training set (TRS). Next, for a given combination of TRS years, we further optimized the TRS size and its genetic composition. We developed the Min_GRM size optimization method which consistently found the optimal TRS size, reducing dimensionality by 20% with an approximately 1% loss in predictive ability. Additionally, the Tails_GEGVs algorithm displayed potential, outperforming the use of all data by using just 60% of it for grain yield, a high-complexity, low-heritability trait. Moreover, maximizing the genetic diversity of the TRS resulted in a consistent predictive ability across the entire range of genotypic values in the test set. Interestingly, the Tails_GEGVs algorithm, due to its ability to leverage heterogeneity, enhanced predictive performance for key hybrids with extreme genotypic values. Our study provides new insights into the optimal utilization of historical data in plant breeding programs, resulting in improved GS model predictive ability.
Affiliation(s)
- Javier Fernández-González
- Centro de Biotecnologia y Genómica de Plantas (CBGP, UPM-INIA)-Instituto Nacional de Investigación y Tecnologia Agraria y Alimentaria (INIA), Universidad Politécnica de Madrid (UPM), Campus de Montegancedo-UPM, Pozuelo de Alarcón, Madrid, 28223, Spain.
- Julio Isidro Y Sánchez
- Centro de Biotecnologia y Genómica de Plantas (CBGP, UPM-INIA)-Instituto Nacional de Investigación y Tecnologia Agraria y Alimentaria (INIA), Universidad Politécnica de Madrid (UPM), Campus de Montegancedo-UPM, Pozuelo de Alarcón, Madrid, 28223, Spain.
3
Bartholomé J, Frouin J, Brottier L, Cao TV, Boisnard A, Ahmadi N, Courtois B. Genomic selection for salinity tolerance in japonica rice. PLoS One 2023; 18:e0291833. [PMID: 37756295 PMCID: PMC10530037 DOI: 10.1371/journal.pone.0291833] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/23/2023] [Accepted: 09/06/2023] [Indexed: 09/29/2023]
Abstract
Improving plant performance in salinity-prone conditions is a significant challenge in breeding programs. Genomic selection is currently integrated into many plant breeding programs as a tool for increasing selection intensity and precision for complex traits and for reducing breeding cycle length. A rice reference panel (RP) of 241 Oryza sativa L. japonica accessions genotyped with 20,255 SNPs grown in control and mild salinity stress conditions was evaluated at the vegetative stage for eight morphological traits and ion mass fractions (Na and K). Weak to strong genotype-by-condition interactions were found for the traits considered. Cross-validation showed that the predictive ability of genomic prediction methods ranged from 0.25 to 0.64 for multi-environment models with morphological traits and from 0.05 to 0.40 for indices of stress response and ion mass fractions. The performances of a breeding population (BP) comprising 393 japonica accessions were predicted with models trained on the RP. For validation of the predictive performances of the models, a subset of 41 accessions was selected from the BP and phenotyped under the same experimental conditions as the RP. The predictive abilities estimated on this subset ranged from 0.00 to 0.66 for the multi-environment models, depending on the traits, and were strongly correlated with the predictive abilities on cross-validation in the RP in salt condition (r = 0.69). We show here that genomic selection is efficient for predicting the salt stress tolerance of breeding lines. Genomic selection could improve the efficiency of rice breeding strategies for salinity-prone environments.
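The cross-validated predictive ability reported in this abstract is, in general terms, the correlation between predicted and observed phenotypes in held-out folds. The following is a minimal sketch of that idea, not the authors' pipeline: it assumes ridge regression on marker codes as a stand-in model, and the function names (`ridge_predict`, `cv_predictive_ability`), the penalty `lam`, and the simulated data are all hypothetical choices made for illustration.

```python
import numpy as np


def ridge_predict(x_train, y_train, x_test, lam):
    """Fit ridge regression on centered markers and predict x_test.

    Uses the dual form beta = X'(XX' + lam*I)^-1 y, which solves an
    n x n system and is convenient when markers outnumber individuals.
    """
    mu_x = x_train.mean(axis=0)
    mu_y = y_train.mean()
    xc = x_train - mu_x
    gram = xc @ xc.T
    alpha = np.linalg.solve(gram + lam * np.eye(gram.shape[0]), y_train - mu_y)
    beta = xc.T @ alpha
    return mu_y + (x_test - mu_x) @ beta


def cv_predictive_ability(x, y, n_folds=5, lam=1.0, seed=0):
    """Mean Pearson correlation between predictions and observations over folds."""
    rng = np.random.default_rng(seed)
    order = rng.permutation(len(y))
    folds = np.array_split(order, n_folds)
    cors = []
    for test_idx in folds:
        train_idx = np.setdiff1d(order, test_idx)
        pred = ridge_predict(x[train_idx], y[train_idx], x[test_idx], lam)
        cors.append(np.corrcoef(pred, y[test_idx])[0, 1])
    return float(np.mean(cors))


# Simulated example (assumption: 200 individuals, 300 markers coded 0/1/2).
rng = np.random.default_rng(42)
markers = rng.integers(0, 3, size=(200, 300)).astype(float)
effects = rng.normal(size=300)
genetic = markers @ effects
noise = rng.normal(scale=0.65 * genetic.std(), size=200)
phenotype = genetic + noise
ability = cv_predictive_ability(markers, phenotype, n_folds=5, lam=100.0)
```

Here `ability` plays the role of the predictive ability values quoted above; validating on an independent subset, as the study did with 41 accessions, would replace the fold loop with a single train/test split.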
Affiliation(s)
- Jérôme Bartholomé
- UMR AGAP Institut, CIRAD, Cali, Colombia
- UMR AGAP Institut, Institut Agro, Univ Montpellier, CIRAD, INRAE, Montpellier, France
- Alliance Bioversity-CIAT, Recta Palmira Cali, Colombia
- Julien Frouin
- UMR AGAP Institut, Institut Agro, Univ Montpellier, CIRAD, INRAE, Montpellier, France
- CIRAD, UMR AGAP Institut, Montpellier, France
- Laurent Brottier
- UMR AGAP Institut, Institut Agro, Univ Montpellier, CIRAD, INRAE, Montpellier, France
- CIRAD, UMR AGAP Institut, Montpellier, France
- Tuong-Vi Cao
- UMR AGAP Institut, Institut Agro, Univ Montpellier, CIRAD, INRAE, Montpellier, France
- CIRAD, UMR AGAP Institut, Montpellier, France
- Nourollah Ahmadi
- UMR AGAP Institut, Institut Agro, Univ Montpellier, CIRAD, INRAE, Montpellier, France
- CIRAD, UMR AGAP Institut, Montpellier, France
- Brigitte Courtois
- UMR AGAP Institut, Institut Agro, Univ Montpellier, CIRAD, INRAE, Montpellier, France
- CIRAD, UMR AGAP Institut, Montpellier, France
4
Fernández-González J, Akdemir D, Isidro Y Sánchez J. A comparison of methods for training population optimization in genomic selection. TAG. THEORETICAL AND APPLIED GENETICS. THEORETISCHE UND ANGEWANDTE GENETIK 2023; 136:30. [PMID: 36892603 PMCID: PMC9998580 DOI: 10.1007/s00122-023-04265-6] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/30/2022] [Accepted: 11/21/2022] [Indexed: 06/18/2023]
Abstract
Maximizing CDmean and Avg_GRM_self were the best criteria for training set optimization. A training set size of 50-55% (targeted) or 65-85% (untargeted) is needed to obtain 95% of the accuracy. With the advent of genomic selection (GS) as a widespread breeding tool, mechanisms to efficiently design an optimal training set for GS models became more relevant, since they allow maximizing the accuracy while minimizing the phenotyping costs. The literature describes many training set optimization methods, but a comprehensive comparison among them is lacking. This work aimed to provide an extensive benchmark of optimization methods and optimal training set sizes by testing a wide range of them in seven datasets covering six species, different genetic architectures, population structures, and heritabilities, and with several GS models, in order to provide guidelines for their application in breeding programs. Our results showed that targeted optimization (which uses information from the test set) performed better than untargeted optimization (which does not use test set data), especially when heritability was low. The mean coefficient of determination (CDmean) was the best targeted method, although it was computationally intensive. Minimizing the average relationship within the training set was the best strategy for untargeted optimization. Regarding the optimal training set size, maximum accuracy was obtained when the training set was the entire candidate set. Nevertheless, 50-55% of the candidate set was enough to reach 95-100% of the maximum accuracy in the targeted scenario, while 65-85% was needed for untargeted optimization. Our results also suggested that a diverse training set makes GS robust against population structure, while including clustering information was less effective. The choice of the GS model did not have a significant influence on the prediction accuracies.
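The untargeted strategy named in this abstract, minimizing the average relationship within the training set, can be illustrated with a small sketch. This is not the authors' Avg_GRM_self implementation: it assumes a VanRaden-style genomic relationship matrix built from 0/1/2 marker codes and swaps in a simple greedy heuristic for the search, and the function names (`build_grm`, `greedy_min_relationship`, `mean_offdiag`) are hypothetical.

```python
import numpy as np


def build_grm(markers):
    """VanRaden-style genomic relationship matrix from 0/1/2 marker codes.

    Rows are individuals, columns are markers. Assumes at least one
    polymorphic marker so that the scaling denominator is non-zero.
    """
    freq = markers.mean(axis=0) / 2.0
    centered = markers - 2.0 * freq
    denom = 2.0 * np.sum(freq * (1.0 - freq))
    return centered @ centered.T / denom


def greedy_min_relationship(grm, size):
    """Greedily choose `size` individuals with low mutual relationship.

    Starts from the least related pair, then repeatedly adds the candidate
    whose summed relationship to the already selected set is smallest.
    This is a heuristic and does not guarantee the global optimum.
    """
    n = grm.shape[0]
    if size < 2 or size > n:
        raise ValueError("size must be between 2 and the number of individuals")
    off = grm.astype(float).copy()
    np.fill_diagonal(off, np.inf)
    i, j = np.unravel_index(np.argmin(off), off.shape)
    selected = [int(i), int(j)]
    link = grm[:, i] + grm[:, j]
    while len(selected) < size:
        cand = link.astype(float).copy()
        cand[selected] = np.inf  # never pick an individual twice
        k = int(np.argmin(cand))
        selected.append(k)
        link = link + grm[:, k]
    return selected


def mean_offdiag(grm, idx):
    """Average pairwise relationship (off-diagonal mean) among `idx`."""
    sub = grm[np.ix_(idx, idx)]
    m = len(idx)
    return (sub.sum() - np.trace(sub)) / (m * (m - 1))


# Simulated example (assumption: 60 candidates, 500 markers, pick 20).
rng = np.random.default_rng(7)
geno = rng.integers(0, 3, size=(60, 500)).astype(float)
grm = build_grm(geno)
training_set = greedy_min_relationship(grm, 20)
```

A targeted criterion such as CDmean would additionally use the genotypes of the test set, which this sketch deliberately ignores.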
Affiliation(s)
- Javier Fernández-González
- Centro de Biotecnologia y Genómica de Plantas (CBGP, UPM-INIA), Universidad Politécnica de Madrid (UPM) - Instituto Nacional de Investigación y Tecnologia Agraria y Alimentaria (INIA), Campus de Montegancedo-UPM, 28223, Madrid, Spain.
- Deniz Akdemir
- CIBMTR (Center for International Blood and Marrow Transplant Research), National Marrow Donor Program/Be The Match, Minneapolis, USA
- Julio Isidro Y Sánchez
- Centro de Biotecnologia y Genómica de Plantas (CBGP, UPM-INIA), Universidad Politécnica de Madrid (UPM) - Instituto Nacional de Investigación y Tecnologia Agraria y Alimentaria (INIA), Campus de Montegancedo-UPM, 28223, Madrid, Spain.
5
Jeon D, Kang Y, Lee S, Choi S, Sung Y, Lee TH, Kim C. Digitalizing breeding in plants: A new trend of next-generation breeding based on genomic prediction. FRONTIERS IN PLANT SCIENCE 2023; 14:1092584. [PMID: 36743488 PMCID: PMC9892199 DOI: 10.3389/fpls.2023.1092584] [Citation(s) in RCA: 7] [Impact Index Per Article: 7.0] [Received: 11/08/2022] [Accepted: 01/05/2023] [Indexed: 06/18/2023]
Abstract
As the world's population grows and food needs diversify, the demand for cereals and horticultural crops with beneficial traits increases. In order to meet a variety of demands, suitable cultivars and innovative breeding methods need to be developed. Breeding methods have changed over time following the advance of genetics. With the advent of new sequencing technology in the early 21st century, predictive breeding, such as genomic selection (GS), emerged when large-scale genomic information became available. GS shows good predictive ability for the selection of individuals with traits of interest, even for quantitative traits, by using various types of whole-genome-scanning markers, breaking away from the limitations of marker-assisted selection (MAS). In the current review, we briefly describe the history of breeding techniques, each breeding method, various statistical models applied to GS, and methods to increase GS efficiency. Consequently, we propose and define the term digital breeding in this review article. Digital breeding develops predictive breeding methods such as GS to a higher level, aiming to minimize human intervention by automating breeding design, the propagation of breeding populations, and selection in consideration of various environments, climates, and topography during the breeding process. We also classified the phases of digital breeding based on the technologies and methods applied to each phase. This review paper will provide an understanding and a direction for the final evolution of plant breeding in the future.
Affiliation(s)
- Donghyun Jeon
- Plant Computational Genomics Laboratory, Department of Science in Smart Agriculture Systems, Chungnam National University, Daejeon, Republic of Korea
- Yuna Kang
- Plant Computational Genomics Laboratory, Department of Crop Science, Chungnam National University, Daejeon, Republic of Korea
- Solji Lee
- Plant Computational Genomics Laboratory, Department of Crop Science, Chungnam National University, Daejeon, Republic of Korea
- Sehyun Choi
- Plant Computational Genomics Laboratory, Department of Crop Science, Chungnam National University, Daejeon, Republic of Korea
- Yeonjun Sung
- Plant Computational Genomics Laboratory, Department of Science in Smart Agriculture Systems, Chungnam National University, Daejeon, Republic of Korea
- Tae-Ho Lee
- Genomics Division, National Institute of Agricultural Sciences, Jeonju, Republic of Korea
- Changsoo Kim
- Plant Computational Genomics Laboratory, Department of Science in Smart Agriculture Systems, Chungnam National University, Daejeon, Republic of Korea
- Plant Computational Genomics Laboratory, Department of Crop Science, Chungnam National University, Daejeon, Republic of Korea
6
Building a Calibration Set for Genomic Prediction, Characteristics to Be Considered, and Optimization Approaches. METHODS IN MOLECULAR BIOLOGY (CLIFTON, N.J.) 2022; 2467:77-112. [PMID: 35451773 DOI: 10.1007/978-1-0716-2205-6_3] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Indexed: 10/25/2022]
Abstract
The efficiency of genomic selection strongly depends on the prediction accuracy of the genetic merit of candidates. Numerous papers have shown that the composition of the calibration set is a key contributor to prediction accuracy. A poorly defined calibration set can result in low accuracies, whereas an optimized one can considerably increase accuracy compared to random sampling at the same size. Alternatively, optimizing the calibration set can be a way of decreasing the costs of phenotyping by enabling similar levels of accuracy compared to random sampling but with fewer phenotypic units. We present here the different factors that have to be considered when designing a calibration set, and review the different criteria proposed in the literature. We classified these criteria into two groups: model-free criteria based on relatedness, and criteria derived from the linear mixed model. We introduce criteria targeting specific prediction objectives including the prediction of highly diverse panels, biparental families, or hybrids. We also review different ways of updating the calibration set, and different procedures for optimizing phenotyping experimental designs.
7
Bartholomé J, Prakash PT, Cobb JN. Genomic Prediction: Progress and Perspectives for Rice Improvement. Methods Mol Biol 2022; 2467:569-617. [PMID: 35451791 DOI: 10.1007/978-1-0716-2205-6_21] [Citation(s) in RCA: 5] [Impact Index Per Article: 2.5] [Indexed: 06/14/2023]
Abstract
Genomic prediction can be a powerful tool to achieve greater rates of genetic gain for quantitative traits if thoroughly integrated into a breeding strategy. In rice as in other crops, the interest in genomic prediction is very strong with a number of studies addressing multiple aspects of its use, ranging from the more conceptual to the more practical. In this chapter, we review the literature on rice (Oryza sativa) and summarize important considerations for the integration of genomic prediction in breeding programs. The irrigated breeding program at the International Rice Research Institute is used as a concrete example on which we provide data and R scripts to reproduce the analysis but also to highlight practical challenges regarding the use of predictions. The adage "To someone with a hammer, everything looks like a nail" describes a common psychological pitfall that sometimes plagues the integration and application of new technologies to a discipline. We have designed this chapter to help rice breeders avoid that pitfall and appreciate the benefits and limitations of applying genomic prediction, as it is not always the best approach nor the first step to increasing the rate of genetic gain in every context.
Affiliation(s)
- Jérôme Bartholomé
- CIRAD, UMR AGAP Institut, Montpellier, France.
- AGAP Institut, Univ Montpellier, CIRAD, INRAE, Montpellier SupAgro, Montpellier, France.
- Rice Breeding Platform, International Rice Research Institute, Manila, Philippines.
8
Baertschi C, Cao TV, Bartholomé J, Ospina Y, Quintero C, Frouin J, Bouvet JM, Grenier C. Impact of early genomic prediction for recurrent selection in an upland rice synthetic population. G3 (BETHESDA, MD.) 2021; 11:jkab320. [PMID: 34498036 PMCID: PMC8664429 DOI: 10.1093/g3journal/jkab320] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Received: 03/02/2021] [Accepted: 08/16/2021] [Indexed: 11/14/2022]
Abstract
Population breeding through recurrent selection is based on the repetition of evaluation and recombination among best-selected individuals. In this type of breeding strategy, early evaluation of selection candidates combined with genomic prediction could substantially shorten the breeding cycle length, thus increasing the rate of genetic gain. The objective of this study was to optimize early genomic prediction in an upland rice (Oryza sativa L.) synthetic population improved through recurrent selection via shuttle breeding in two sites. To this end, we used genomic prediction on 334 S0 genotypes evaluated with early generation progeny testing (S0:2 and S0:3) across two sites. Four traits were measured (plant height, days to flowering, grain yield, and grain zinc concentration) and the predictive ability was assessed for the target site. For days to flowering and plant height, which correlate well among sites (0.51-0.62), an increase of up to 0.4 in predictive ability was observed when the model was trained using the two sites. For grain zinc concentration, adding the phenotype of the predicted lines in the nontarget site to the model improved the predictive ability (0.51 with two-site and 0.31 with single-site model), whereas for grain yield the gain was less (0.42 with two-site and 0.35 with single-site calibration). Through these results, we found a good opportunity to optimize the genomic recurrent selection scheme and maximize the use of resources by performing early progeny testing in two sites for traits with best expression and/or relevance in each specific environment.
Affiliation(s)
- Cédric Baertschi
- CIRAD, UMR AGAP Institut, F-34398 Montpellier, France
- UMR AGAP Institut, Univ Montpellier, CIRAD, INRAE, Institut Agro, F-34398 Montpellier, France
- Tuong-Vi Cao
- CIRAD, UMR AGAP Institut, F-34398 Montpellier, France
- UMR AGAP Institut, Univ Montpellier, CIRAD, INRAE, Institut Agro, F-34398 Montpellier, France
- Jérôme Bartholomé
- CIRAD, UMR AGAP Institut, F-34398 Montpellier, France
- UMR AGAP Institut, Univ Montpellier, CIRAD, INRAE, Institut Agro, F-34398 Montpellier, France
- Rice Breeding Platform, International Rice Research Institute, Metro Manila, Philippines
- Yolima Ospina
- Alliance Bioversity-CIAT, Recta Palmira Cali, Colombia
- Julien Frouin
- CIRAD, UMR AGAP Institut, F-34398 Montpellier, France
- UMR AGAP Institut, Univ Montpellier, CIRAD, INRAE, Institut Agro, F-34398 Montpellier, France
- Jean-Marc Bouvet
- CIRAD, UMR AGAP Institut, F-34398 Montpellier, France
- UMR AGAP Institut, Univ Montpellier, CIRAD, INRAE, Institut Agro, F-34398 Montpellier, France
- CIRAD, Dispositif de Recherche et d’Enseignement en Partenariat “Forêts et Biodiversité à Madagascar”, Antananarivo, Madagascar
- Cécile Grenier
- CIRAD, UMR AGAP Institut, F-34398 Montpellier, France
- UMR AGAP Institut, Univ Montpellier, CIRAD, INRAE, Institut Agro, F-34398 Montpellier, France
- Alliance Bioversity-CIAT, Recta Palmira Cali, Colombia
9
Wilson S, Malosetti M, Maliepaard C, Mulder HA, Visser RGF, van Eeuwijk F. Training Set Construction for Genomic Prediction in Auto-Tetraploids: An Example in Potato. FRONTIERS IN PLANT SCIENCE 2021; 12:771075. [PMID: 34899794 PMCID: PMC8651708 DOI: 10.3389/fpls.2021.771075] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/05/2021] [Accepted: 10/20/2021] [Indexed: 06/14/2023]
Abstract
Training set construction is an important prerequisite to Genomic Prediction (GP), and while this has been studied in diploids, polyploids have not received the same attention. Polyploidy is a common feature in many crop plants, such as banana and blueberry, as well as potato, which is the third most important crop in the world in terms of food consumption, after rice and wheat. The aim of this study was to investigate the impact of different training set construction methods using a publicly available diversity panel of tetraploid potatoes. Four methods of training set construction were compared: simple random sampling, stratified random sampling, genetic distance sampling and sampling based on the coefficient of determination (CDmean). For stratified random sampling, population structure analyses were carried out in order to define sub-populations, but since sub-populations accounted for only 16.6% of genetic variation, there were negligible differences between stratified and simple random sampling. For genetic distance sampling, four genetic distance measures were compared and, though they performed similarly, Euclidean distance was the most consistent. In the majority of cases the CDmean method was the best sampling method, and compared to simple random sampling it gave improvements of 4-14% in cross-validation scenarios and 2-8% in scenarios with an independent test set, while genetic distance sampling gave improvements of 5.5-10.5% and 0.4-4.5%, respectively. No interaction was found between sampling method and the statistical model for the traits analyzed.
Affiliation(s)
- Stefan Wilson
- Biometris, Wageningen University & Research, Wageningen, Netherlands
- Marcos Malosetti
- Biometris, Wageningen University & Research, Wageningen, Netherlands
- Chris Maliepaard
- Plant Breeding, Wageningen University & Research, Wageningen, Netherlands
- Han A. Mulder
- Wageningen University & Research, Animal Breeding and Genomics, Wageningen, Netherlands
- Fred van Eeuwijk
- Biometris, Wageningen University & Research, Wageningen, Netherlands
10
Li W, Boer MP, Zheng C, Joosen RVL, van Eeuwijk FA. An IBD-based mixed model approach for QTL mapping in multiparental populations. TAG. THEORETICAL AND APPLIED GENETICS. THEORETISCHE UND ANGEWANDTE GENETIK 2021; 134:3643-3660. [PMID: 34342658 PMCID: PMC8519866 DOI: 10.1007/s00122-021-03919-7] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 02/18/2021] [Accepted: 07/16/2021] [Indexed: 05/16/2023]
Abstract
The identity-by-descent (IBD)-based mixed model approach introduced in this study can detect quantitative trait loci (QTLs) referring to the parental origin and simultaneously account for multilevel relatedness of individuals within and across families. This unified approach is proved to be a powerful approach for all kinds of multiparental population (MPP) designs. Multiparental populations (MPPs) have become popular for quantitative trait loci (QTL) detection. Tools for QTL mapping in MPPs are mostly developed for specific MPPs and do not generalize well to other MPPs. We present an IBD-based mixed model approach for QTL mapping in all kinds of MPP designs, e.g., diallel, Nested Association Mapping (NAM), and Multiparental Advanced Generation Intercross (MAGIC) designs. The first step is to compute identity-by-descent (IBD) probabilities using a general Hidden Markov model framework, called reconstructing ancestry blocks bit by bit (RABBIT). Next, functions of IBD information are used as design matrices, or genetic predictors, in a mixed model approach to estimate variance components for multiallelic genetic effects associated with parents. Family-specific residual genetic effects are added, and a polygenic effect is structured by kinship relations between individuals. Case studies of simulated diallel, NAM, and MAGIC designs proved that the advanced IBD-based multi-QTL mixed model approach incorporating both kinship relations and family-specific residual variances (IBD.MQMkin_F) is robust across a variety of MPP designs and allele segregation patterns in comparison to a widely used benchmark association mapping method, and in most cases, outperformed or behaved at least as well as other tools developed for specific MPP designs in terms of mapping power and resolution. Successful analyses of real data cases confirmed the wide applicability of our IBD-based mixed model methodology.
Affiliation(s)
- Wenhao Li
- Biometris, Wageningen University and Research Center, P.O Box 100, 6700 AC, Wageningen, The Netherlands
- Martin P Boer
- Biometris, Wageningen University and Research Center, P.O Box 100, 6700 AC, Wageningen, The Netherlands
- Chaozhi Zheng
- Biometris, Wageningen University and Research Center, P.O Box 100, 6700 AC, Wageningen, The Netherlands
- Ronny V L Joosen
- Rijk Zwaan Breeding B.V., P.O Box 40, 2678 ZG, De Lier, The Netherlands
- Fred A van Eeuwijk
- Biometris, Wageningen University and Research Center, P.O Box 100, 6700 AC, Wageningen, The Netherlands.
11
Isidro y Sánchez J, Akdemir D. Training Set Optimization for Sparse Phenotyping in Genomic Selection: A Conceptual Overview. FRONTIERS IN PLANT SCIENCE 2021; 12:715910. [PMID: 34589099 PMCID: PMC8475495 DOI: 10.3389/fpls.2021.715910] [Citation(s) in RCA: 12] [Impact Index Per Article: 4.0] [Received: 05/27/2021] [Accepted: 08/10/2021] [Indexed: 06/13/2023]
Abstract
Genomic selection (GS) is becoming an essential tool in breeding programs due to its role in increasing genetic gain per unit time. The design of the training set (TRS) is one of the key steps in the implementation of GS in plant and animal breeding programs, mainly because (i) TRS optimization is critical for the efficiency and effectiveness of GS; (ii) breeders test genotypes in multi-year and multi-location trials to select the best-performing ones, and in this framework TRS optimization can help to decrease the number of genotypes to be tested and, therefore, reduce phenotyping cost and time; and (iii) better prediction accuracies can be obtained from an optimally selected TRS than from an arbitrary one. Here, we review the lessons learned from TRS optimization studies in plants and their impact on crop breeding, and discuss important features for the success of TRS optimization under different scenarios. We also review the major challenges associated with the optimization of GS, including population size, the relationship between the training set and the test set (TS), the update of the TRS, and the use of different packages and algorithms for TRS implementation in GS. Finally, we describe general guidelines for improving the rate of genetic improvement by maximizing the use of TRS optimization in the GS framework.
Affiliation(s)
- Julio Isidro y Sánchez
- Centro de Biotecnologia y Genómica de Plantas, Instituto Nacional de Investigación y Tecnologia Agraria y Alimentaria, Universidad Politécnica de Madrid, Campus de Montegancedo, Madrid, Spain
- Deniz Akdemir
- Animal and Crop Science Division, Agriculture and Food Science Centre, University College Dublin, Dublin, Ireland
12
Paril JF, Balding DJ, Fournier-Level A. Optimizing sampling design and sequencing strategy for the genomic analysis of quantitative traits in natural populations. Mol Ecol Resour 2021; 22:137-152. [PMID: 34192415 DOI: 10.1111/1755-0998.13458] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/15/2020] [Revised: 05/02/2021] [Accepted: 06/25/2021] [Indexed: 11/27/2022]
Abstract
Mapping the genes underlying ecologically relevant traits in natural populations is fundamental to develop a molecular understanding of species adaptation. Current sequencing technologies enable the characterization of a species' genetic diversity across the landscape or even over its whole range. The relevant capture of the genetic diversity across the landscape is critical for a successful genetic mapping of traits and there are no clear guidelines on how to achieve an optimal sampling and which sequencing strategy to implement. Here we determine, through simulation, the sampling scheme that maximizes the power to map the genetic basis of a complex trait in an outbreeding species across an idealized landscape and draw genomic predictions for the trait, comparing individual and pool sequencing strategies. Our results show that quantitative trait locus detection power and prediction accuracy are higher when more populations over the landscape are sampled and this is more cost-effectively done with pool sequencing than with individual sequencing. Additionally, we recommend sampling populations from areas of high genetic diversity. As progress in sequencing enables the integration of trait-based functional ecology into landscape genomics studies, these findings will guide study designs allowing direct measures of genetic effects in natural populations across the environment.
Affiliation(s)
- Jefferson F Paril
  - School of Biosciences, The University of Melbourne, Parkville, Victoria, Australia
- David J Balding
  - School of Biosciences, The University of Melbourne, Parkville, Victoria, Australia; Melbourne Integrative Genomics, The University of Melbourne, Parkville, Victoria, Australia; School of Mathematics and Statistics, The University of Melbourne, Parkville, Victoria, Australia
- Alexandre Fournier-Level
  - School of Biosciences, The University of Melbourne, Parkville, Victoria, Australia; Melbourne Integrative Genomics, The University of Melbourne, Parkville, Victoria, Australia
|
13
|
Michel S, Wagner C, Nosenko T, Steiner B, Samad-Zamini M, Buerstmayr M, Mayer K, Buerstmayr H. Merging Genomics and Transcriptomics for Predicting Fusarium Head Blight Resistance in Wheat. Genes (Basel) 2021; 12:114. [PMID: 33477759 PMCID: PMC7832326 DOI: 10.3390/genes12010114] [Citation(s) in RCA: 7] [Impact Index Per Article: 2.3] [Received: 12/17/2020] [Revised: 01/14/2021] [Accepted: 01/16/2021] [Indexed: 01/13/2023] Open
Abstract
Genomic selection with genome-wide distributed molecular markers has evolved into a well-implemented tool in many breeding programs during the last decade. The resistance against Fusarium head blight (FHB) in wheat is probably one of the most thoroughly studied systems within this framework. Aside from the genome, other biological strata like the transcriptome have likewise shown some potential in predictive breeding strategies but have not yet been investigated for the FHB-wheat pathosystem. The aims of this study were thus to compare the potential of genomic with transcriptomic prediction, and to assess the merit of blending incomplete transcriptomic with complete genomic data by the single-step method. A substantial advantage of gene expression data over molecular markers was observed for the prediction of FHB resistance in the studied diversity panel of breeding lines and released cultivars. An increase in prediction ability was likewise found for the single-step predictions, although this can mostly be attributed to an increased accuracy among the RNA-sequenced genotypes. The use of transcriptomics can thus be seen as a complement to already established predictive breeding pipelines with pedigree and genomic data, particularly when more cost-efficient multiplexing techniques for RNA-sequencing become more accessible in the future.
Affiliation(s)
- Sebastian Michel
  - Institute of Biotechnology in Plant Production (IFA-Tulln), University of Natural Resources and Life Sciences Vienna, 3430 Tulln, Austria
- Christian Wagner
  - Institute of Biotechnology in Plant Production (IFA-Tulln), University of Natural Resources and Life Sciences Vienna, 3430 Tulln, Austria
- Tetyana Nosenko
  - PGSB Plant Genome and Systems Biology, Helmholtz Center Munich, German Research Center for Environmental Health, 85764 Neuherberg, Germany
  - Research Unit Environmental Simulation (EUS) at the Institute of Biochemical Plant Pathology (BIOP), Helmholtz Zentrum München, 85764 Neuherberg, Germany
- Barbara Steiner
  - Institute of Biotechnology in Plant Production (IFA-Tulln), University of Natural Resources and Life Sciences Vienna, 3430 Tulln, Austria
- Mina Samad-Zamini
  - Institute of Biotechnology in Plant Production (IFA-Tulln), University of Natural Resources and Life Sciences Vienna, 3430 Tulln, Austria
  - Saatzucht Edelhof GmbH, 3910 Zwettl, Austria
- Maria Buerstmayr
  - Institute of Biotechnology in Plant Production (IFA-Tulln), University of Natural Resources and Life Sciences Vienna, 3430 Tulln, Austria
- Klaus Mayer
  - PGSB Plant Genome and Systems Biology, Helmholtz Center Munich, German Research Center for Environmental Health, 85764 Neuherberg, Germany
- Hermann Buerstmayr
  - Institute of Biotechnology in Plant Production (IFA-Tulln), University of Natural Resources and Life Sciences Vienna, 3430 Tulln, Austria
|
14
|
Brauner PC, Müller D, Molenaar WS, Melchinger AE. Genomic prediction with multiple biparental families. Theor Appl Genet 2020; 133:133-147. [PMID: 31595337 DOI: 10.1007/s00122-019-03445-7] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.8] [Received: 05/03/2019] [Accepted: 09/18/2019] [Indexed: 06/10/2023]
Abstract
For genomic prediction within biparental families using multiple biparental families, combined training sets comprising full-sibs from the same family and half-sib families are recommended to reach high and robust prediction accuracy, whereas inclusion of unrelated families is risky and can have negative effects. In recycling breeding, where elite inbreds are recombined to generate new source material, genomic and phenotypic information from lines of numerous biparental families (BPFs) is commonly available for genomic prediction (GP). For each BPF with a large number of candidates in the prediction set (PS), the training set (TS) can be composed of lines from the same full-sib family or multiple related and unrelated families to increase the TS size. GP was applied to BPFs generated in silico and from two published experiments to evaluate the prediction accuracy of different TS compositions. We compared prediction accuracy for individual pairs of BPFs using as TS either full-sib, half-sib, or unrelated BPFs. While full-sibs yielded highly positive prediction accuracy and half-sibs also mostly positive values, unrelated families often had negative prediction accuracy, and including these families in a combined TS reduced prediction accuracy. By simulations, we demonstrated that optimized TS compositions exist, yielding 5-10% higher prediction accuracy than the TS including all available BPFs. However, identification of poorly predictive families and finding the optimal TS composition with various quantitative-genetic parameters estimated from available data was not successful. Therefore, we suggest omitting unrelated families and combining in the TS full-sib and few half-sib families produced by specific mating designs, with a medium number (~50) of genotypes per family. This helps in balancing high prediction accuracy in GP with a sufficient effective population size of the entire breeding program for securing high short- and long-term selection progress.
Affiliation(s)
- Pedro C Brauner
  - Institute of Plant Breeding, Seed Sciences and Population Genetics, University of Hohenheim, Fruwirthstraße 21, 70599, Stuttgart, Germany
- Dominik Müller
  - Institute of Plant Breeding, Seed Sciences and Population Genetics, University of Hohenheim, Fruwirthstraße 21, 70599, Stuttgart, Germany
- Willem S Molenaar
  - Institute of Plant Breeding, Seed Sciences and Population Genetics, University of Hohenheim, Fruwirthstraße 21, 70599, Stuttgart, Germany
- Albrecht E Melchinger
  - Institute of Plant Breeding, Seed Sciences and Population Genetics, University of Hohenheim, Fruwirthstraße 21, 70599, Stuttgart, Germany
|
15
|
Millet EJ, Kruijer W, Coupel-Ledru A, Alvarez Prado S, Cabrera-Bosquet L, Lacube S, Charcosset A, Welcker C, van Eeuwijk F, Tardieu F. Genomic prediction of maize yield across European environmental conditions. Nat Genet 2019; 51:952-956. [PMID: 31110353 DOI: 10.1038/s41588-019-0414-y] [Citation(s) in RCA: 101] [Impact Index Per Article: 20.2] [Received: 10/30/2018] [Accepted: 04/08/2019] [Indexed: 11/10/2022]
Abstract
The development of germplasm adapted to changing climate is required to ensure food security [1,2]. Genomic prediction is a powerful tool to evaluate many genotypes but performs poorly in contrasting environmental scenarios [3-7] (genotype × environment interaction), in spite of promising results for flowering time [8]. New avenues are opened by the development of sensor networks for environmental characterization in thousands of fields [9,10]. We present a new strategy for germplasm evaluation under genotype × environment interaction. Yield was dissected into grain weight and number, and genotype × environment interaction in these components was modeled as genotypic sensitivity to environmental drivers. Environments were characterized using genotype-specific indices computed from sensor data in each field and the progression of phenology calibrated for each genotype on a phenotyping platform. A whole-genome regression approach for the genotypic sensitivities led to accurate prediction of yield under genotype × environment interaction in a wide range of environmental scenarios, outperforming a benchmark approach.
Affiliation(s)
- Emilie J Millet
  - Biometris, WUR, Wageningen, the Netherlands; LEPSE, INRA, Université Montpellier, SupAgro, Montpellier, France; Biometris, WUR, Wageningen, the Netherlands
- Aude Coupel-Ledru
  - LEPSE, INRA, Université Montpellier, SupAgro, Montpellier, France; University of Bristol, School of Biological Sciences, Bristol, UK
- Santiago Alvarez Prado
  - LEPSE, INRA, Université Montpellier, SupAgro, Montpellier, France; IFEVA and CONICET, Buenos Aires, Argentina
- Sébastien Lacube
  - LEPSE, INRA, Université Montpellier, SupAgro, Montpellier, France
- Alain Charcosset
  - GQE-Le Moulon, INRA, Université Paris-Sud, CNRS, AgroParisTech, Université Paris-Saclay, Gif-sur-Yvette, France
- Claude Welcker
  - LEPSE, INRA, Université Montpellier, SupAgro, Montpellier, France
- François Tardieu
  - LEPSE, INRA, Université Montpellier, SupAgro, Montpellier, France
|
16
|
Mangin B, Rincent R, Rabier CE, Moreau L, Goudemand-Dugue E. Training set optimization of genomic prediction by means of EthAcc. PLoS One 2019; 14:e0205629. [PMID: 30779753 PMCID: PMC6380617 DOI: 10.1371/journal.pone.0205629] [Citation(s) in RCA: 20] [Impact Index Per Article: 4.0] [Received: 09/25/2018] [Accepted: 01/03/2019] [Indexed: 12/17/2022] Open
Abstract
Genomic prediction is a useful tool for plant and animal breeding programs and is starting to be used to predict human diseases as well. A shortcoming that slows down the genomic selection deployment is that the accuracy of the prediction is not known a priori. We propose EthAcc (Estimated THeoretical ACCuracy) as a method for estimating the accuracy given a training set that is genotyped and phenotyped. EthAcc is based on a causal quantitative trait loci model estimated by a genome-wide association study. This estimated causal model is crucial; therefore, we compared different methods to find the one yielding the best EthAcc. The multilocus mixed model was found to perform the best. We compared EthAcc to accuracy estimators that can be derived via a mixed marker model. We showed that EthAcc is the only approach to correctly estimate the accuracy. Moreover, in case of a structured population, in accordance with the achieved accuracy, EthAcc showed that the biggest training set is not always better than a smaller and closer training set. We then performed training set optimization with EthAcc and compared it to CDmean. EthAcc outperformed CDmean on real datasets from sugar beet, maize, and wheat. Nonetheless, its performance was mainly due to the use of an optimal but inaccessible set as a start of the optimization algorithm. EthAcc's precision and algorithm issues prevent it from reaching a good training set with a random start. Despite this drawback, we demonstrated that a substantial gain in accuracy can be obtained by performing training set optimization.
Affiliation(s)
- Brigitte Mangin
  - LIPM, Université de Toulouse, INRA, CNRS, Castanet-Tolosan, France
- Charles-Elie Rabier
  - ISEM, Univ. Montpellier, CNRS, EPHE, IRD, Montpellier, France
  - LIRMM, Univ. Montpellier, CNRS, Montpellier, France
- Laurence Moreau
  - GQE-Le Moulon, INRA, Univ Paris-Sud, CNRS, AgroParisTech, Université Paris-Saclay, Gif-sur-Yvette, France
|
17
|
Design of training populations for selective phenotyping in genomic prediction. Sci Rep 2019; 9:1446. [PMID: 30723226 PMCID: PMC6363789 DOI: 10.1038/s41598-018-38081-6] [Citation(s) in RCA: 48] [Impact Index Per Article: 9.6] [Received: 08/21/2017] [Accepted: 12/10/2018] [Indexed: 11/30/2022] Open
Abstract
Phenotyping is the current bottleneck in plant breeding, especially because next-generation sequencing has decreased genotyping cost more than 100,000-fold in the last 20 years. Therefore, the cost of phenotyping needs to be optimized within a breeding program. When designing the implementation of a genomic selection scheme in the breeding cycle, breeders need to select the optimal method for (1) selecting training populations that maximize genomic prediction accuracy and (2) reducing the cost of phenotyping while improving precision. In this article, we compared methods for selecting training populations under two scenarios: first, when the objective is to select a training population set (TRS) to predict the remaining individuals from the same population (untargeted), and second, when a test set (TS) is first defined and genotyped, and then the TRS is optimized specifically around the TS (targeted). Our results show that optimization methods that include information from the test set (targeted) achieved the highest accuracies, indicating that a priori information from the TS improves genomic predictions. In addition, predictive ability improved especially when the population size was small, which is the scenario targeted to decrease phenotyping cost within breeding programs.
|
18
|
Genomic Prediction Within and Across Biparental Families: Means and Variances of Prediction Accuracy and Usefulness of Deterministic Equations. G3 (Bethesda) 2017; 7:3571-3586. [PMID: 28916649 PMCID: PMC5677162 DOI: 10.1534/g3.117.300076] [Citation(s) in RCA: 24] [Impact Index Per Article: 3.4] [Indexed: 11/22/2022]
Abstract
A major application of genomic prediction (GP) in plant breeding is the identification of superior inbred lines within families derived from biparental crosses. When models for various traits were trained within related or unrelated biparental families (BPFs), experimental studies found substantial variation in prediction accuracy (PA), but little is known about the underlying factors. We used SNP marker genotypes of inbred lines from either elite germplasm or landraces of maize (Zea mays L.) as parents to generate in silico 300 BPFs of doubled-haploid lines. We analyzed PA within each BPF for 50 simulated polygenic traits, using genomic best linear unbiased prediction (GBLUP) models trained with individuals from either full-sib (FSF), half-sib (HSF), or unrelated families (URF) for various sizes (Ntrain) of the training set and different heritabilities (h²). In addition, we modified two deterministic equations for forecasting PA to account for inbreeding and genetic variance unexplained by the training set. Averaged across traits, PA was high within FSF (0.41–0.97) with large variation only for Ntrain < 50 and h² < 0.6. For HSF and URF, PA was on average ∼40–60% lower and varied substantially among different combinations of BPFs used for model training and prediction as well as different traits. As exemplified by HSF results, PA of across-family GP can be very low if causal variants not segregating in the training set account for a sizeable proportion of the genetic variance among predicted individuals. Deterministic equations accurately forecast the PA expected over many traits, yet cannot capture trait-specific deviations. We conclude that model training within BPFs generally yields stable PA, whereas a high level of uncertainty is encountered in across-family GP. Our study shows the extent of variation in PA that must be at least reckoned with in practice and offers a starting point for the design of training sets composed of multiple BPFs.
|
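The GBLUP setup described in the abstract above (train on one set of lines, predict another, and report the correlation as prediction accuracy) can be illustrated with a minimal sketch on simulated data. Everything in it is an assumption made for illustration and is not taken from the study: the 0/1/2 marker coding, the VanRaden-style relationship matrix, the simulation settings, the use of a known heritability, the sample mean standing in for the fixed effect, and the function names `vanraden_g` and `gblup_predict`.

```python
import numpy as np

def vanraden_g(markers):
    """Genomic relationship matrix from a lines x markers matrix coded 0/1/2."""
    p = markers.mean(axis=0) / 2.0                  # allele frequency per marker
    centred = markers - 2.0 * p                     # centre each marker column
    return centred @ centred.T / (2.0 * np.sum(p * (1.0 - p)))

def gblup_predict(G, y_train, train_idx, test_idx, h2):
    """Predict genetic values of test lines, treating h2 as known (assumption)."""
    lam = (1.0 - h2) / h2                           # residual / genetic variance ratio
    G_tt = G[np.ix_(train_idx, train_idx)]          # training x training block
    G_st = G[np.ix_(test_idx, train_idx)]           # test x training block
    mu = y_train.mean()                             # sample mean as a simple fixed effect
    alpha = np.linalg.solve(G_tt + lam * np.eye(len(train_idx)), y_train - mu)
    return mu + G_st @ alpha

# Simulated example: unrelated lines, independent markers, additive trait.
rng = np.random.default_rng(1)
n_lines, n_markers, h2 = 350, 200, 0.6
markers = rng.binomial(2, 0.5, size=(n_lines, n_markers)).astype(float)
effects = rng.normal(size=n_markers)
g_true = (markers - markers.mean(axis=0)) @ effects
noise_sd = np.sqrt(g_true.var() * (1.0 - h2) / h2)  # noise scaled to match h2
y = g_true + rng.normal(scale=noise_sd, size=n_lines)

train_idx = np.arange(300)
test_idx = np.arange(300, n_lines)
G = vanraden_g(markers)
pred = gblup_predict(G, y[train_idx], train_idx, test_idx, h2)

# Prediction accuracy as the correlation with the simulated genetic values.
accuracy = np.corrcoef(pred, g_true[test_idx])[0, 1]
print(f"prediction accuracy: {accuracy:.2f}")
```

Changing which lines go into `train_idx` (for example, a smaller set or a less related set) is the simplest way to see how training set composition affects the reported accuracy in a toy setting like this one.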
19
|
Rincent R, Charcosset A, Moreau L. Predicting genomic selection efficiency to optimize calibration set and to assess prediction accuracy in highly structured populations. Theor Appl Genet 2017; 130:2231-2247. [PMID: 28795202 PMCID: PMC5641287 DOI: 10.1007/s00122-017-2956-7] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.6] [Received: 12/09/2016] [Accepted: 07/26/2017] [Indexed: 05/02/2023]
Abstract
KEY MESSAGE: We propose a criterion to predict genomic selection efficiency for structured populations. This criterion is useful to define an optimal calibration set and to estimate prediction reliability for multiparental populations. Genomic selection refers to the use of genotypic information for predicting the performance of selection candidates. It has been shown that prediction accuracy depends on various parameters including the composition of the calibration set (CS). Assessing the level of accuracy of a given prediction scenario is of the highest importance because it can be used to optimize CS sampling before collecting phenotypes, and once the breeding values are predicted it informs the breeders about the reliability of these predictions. Different criteria were proposed to optimize CS sampling in highly diverse panels, which can be useful to screen collections of genotypes. But plant breeders often work on structured material such as biparental or multiparental populations, for which these criteria are less well suited. We derived from the generalized coefficient of determination (CD) theory different criteria to optimize CS sampling and to assess the reliability associated with predictions in structured populations. These criteria were evaluated on two nested association mapping (NAM) populations and two highly diverse panels of maize. They were efficient to sample optimized CS in most situations. They could also estimate at least partly the reliability associated with predictions between NAM families, but they could not estimate differences in the reliability associated with the predictions of NAM families using the highly diverse panels as calibration sets. We illustrated that the CD criteria could be adapted to various prediction scenarios including inter- and intra-family predictions, resulting in higher prediction accuracies.
Affiliation(s)
- R Rincent
  - INRA, UMR 1095 Génétique, Diversité et Ecophysiologie des Céréales, 5 chemin de Beaulieu, 63100, Clermont-Ferrand, France
  - Université Blaise Pascal, UMR 1095 Génétique, Diversité et Ecophysiologie des Céréales, 63178, Aubière Cedex, France
- A Charcosset
  - UMR de Génétique Végétale, INRA - Université Paris-Sud - CNRS, 91190, Gif-Sur-Yvette, France
- L Moreau
  - UMR de Génétique Végétale, INRA - Université Paris-Sud - CNRS, 91190, Gif-Sur-Yvette, France
|
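The two entries above refer to CD-based criteria (CDmean and criteria derived from the generalized coefficient of determination) without spelling out a formula. As a rough sketch only, the code below uses the form in which the generalized CD of a contrast c is commonly written, CD(c) = c'(G − λ(Z'MZ + λG⁻¹)⁻¹)c / (c'Gc), and averages it over contrasts between each unphenotyped individual and the population mean. The formula, the value of λ, the small ridge added to make G invertible, the random-search selection (a stand-in for the optimization algorithms used in the literature), and the function name `cd_mean` are all assumptions for illustration, not the authors' implementation.

```python
import numpy as np

def cd_mean(G, train_idx, lam):
    """Mean generalized CD of the contrasts 'unphenotyped line vs. population mean'."""
    n = G.shape[0]
    train_idx = np.asarray(train_idx)
    k = len(train_idx)
    Z = np.zeros((k, n))
    Z[np.arange(k), train_idx] = 1.0                # links phenotypes to lines
    M = np.eye(k) - np.ones((k, k)) / k             # projects out the overall mean
    inner = np.linalg.inv(Z.T @ M @ Z + lam * np.linalg.inv(G))
    explained = G - lam * inner                     # numerator matrix of the CD
    test_idx = np.setdiff1d(np.arange(n), train_idx)
    C = -np.ones((n, len(test_idx))) / n            # one contrast per column
    C[test_idx, np.arange(len(test_idx))] += 1.0
    num = np.einsum("ij,ik,kj->j", C, explained, C) # c' (explained) c per contrast
    den = np.einsum("ij,ik,kj->j", C, G, C)         # c' G c per contrast
    return float(np.mean(num / den))

# Toy relationship matrix from simulated markers; the ridge term keeps G invertible.
rng = np.random.default_rng(0)
markers = rng.binomial(2, 0.5, size=(60, 300)).astype(float)
centred = markers - markers.mean(axis=0)
G = centred @ centred.T / centred.shape[1] + 0.05 * np.eye(60)

# Naive random search: score 50 random candidate sets of 20 lines, keep the best.
candidates = [rng.choice(60, size=20, replace=False) for _ in range(50)]
scores = [cd_mean(G, cand, lam=1.0) for cand in candidates]
best = candidates[int(np.argmax(scores))]
print(f"best CDmean among candidates: {max(scores):.3f}")
```

The criterion needs only genotypes and an assumed variance ratio, which is why it can be evaluated before any phenotypes are collected.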
20
|
Garin V, Wimmer V, Mezmouk S, Malosetti M, van Eeuwijk F. How do the type of QTL effect and the form of the residual term influence QTL detection in multi-parent populations? A case study in the maize EU-NAM population. Theor Appl Genet 2017; 130:1753-1764. [PMID: 28547012 PMCID: PMC5511610 DOI: 10.1007/s00122-017-2923-3] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.4] [Received: 11/17/2016] [Accepted: 05/11/2017] [Indexed: 05/25/2023]
Abstract
In the QTL analysis of multi-parent populations, the inclusion of QTLs with various types of effects can lead to a better description of the phenotypic variation and increased power. For the type of QTL effect in QTL models for multi-parent populations (MPPs), various options exist to define them with respect to their origin. They can be modelled as referring to close parental lines or to further away ancestral founder lines. QTL models for MPPs can also be characterized by the homo- or heterogeneity of variance for polygenic effects. The most suitable model for the origin of the QTL effect and the homo- or heterogeneity of polygenic effects may be a function of the genetic distance distribution between the parents of MPPs. We investigated the statistical properties of various QTL detection models for MPPs taking into account the genetic distances between the parents of the MPP. We evaluated models with different assumptions about the QTL effect and the form of the residual term using cross validation. For the EU-NAM data, we showed that it can be useful to mix in the same model QTLs with different types of effects (parental, ancestral, or bi-allelic). The benefit of using cross-specific residual terms to handle the heterogeneity of variance was less obvious for this particular data set.
Affiliation(s)
- Vincent Garin
  - Biometris, Wageningen University and Research Center, P.O. Box 100, 6700 AC, Wageningen, The Netherlands
- Marcos Malosetti
  - Biometris, Wageningen University and Research Center, P.O. Box 100, 6700 AC, Wageningen, The Netherlands
- Fred van Eeuwijk
  - Biometris, Wageningen University and Research Center, P.O. Box 100, 6700 AC, Wageningen, The Netherlands
|
21
|
Neyhart JL, Tiede T, Lorenz AJ, Smith KP. Evaluating Methods of Updating Training Data in Long-Term Genomewide Selection. G3 (Bethesda) 2017; 7:1499-1510. [PMID: 28315831 PMCID: PMC5427505 DOI: 10.1534/g3.117.040550] [Citation(s) in RCA: 31] [Impact Index Per Article: 4.4] [Received: 02/14/2017] [Accepted: 03/10/2017] [Indexed: 12/22/2022]
Abstract
Genomewide selection is hailed for its ability to facilitate greater genetic gains per unit time. Over breeding cycles, the requisite linkage disequilibrium (LD) between quantitative trait loci and markers is expected to change as a result of recombination, selection, and drift, leading to a decay in prediction accuracy. Previous research has identified the need to update the training population using data that may capture new LD generated over breeding cycles; however, optimal methods of updating have not been explored. In a barley (Hordeum vulgare L.) breeding simulation experiment, we examined prediction accuracy and response to selection when updating the training population each cycle with the best predicted lines, the worst predicted lines, both the best and worst predicted lines, random lines, criterion-selected lines, or no lines. In the short term, we found that updating with the best predicted lines or the best and worst predicted lines resulted in high prediction accuracy and genetic gain, but in the long term, all methods (besides not updating) performed similarly. We also examined the impact of including all data in the training population or only the most recent data. Though patterns among update methods were similar, using a smaller but more recent training population provided a slight advantage in prediction accuracy and genetic gain. In an actual breeding program, a breeder might desire to gather phenotypic data on lines predicted to be the best, perhaps to evaluate possible cultivars. Therefore, our results suggest that an optimal method of updating the training population is also very practical.
Affiliation(s)
- Jeffrey L Neyhart
  - Department of Agronomy and Plant Genetics, University of Minnesota, St. Paul, Minnesota 55108
- Tyler Tiede
  - Department of Agronomy and Plant Genetics, University of Minnesota, St. Paul, Minnesota 55108
- Aaron J Lorenz
  - Department of Agronomy and Plant Genetics, University of Minnesota, St. Paul, Minnesota 55108
- Kevin P Smith
  - Department of Agronomy and Plant Genetics, University of Minnesota, St. Paul, Minnesota 55108
|
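The update schemes compared in the Neyhart et al. abstract above (adding the best predicted lines, the worst, both, random lines, or none each cycle) can be sketched as a small selection helper. This is a hypothetical illustration: the function name `select_update`, the method labels, and the even split used for the best-and-worst scheme are assumptions, and the criterion-selected scheme mentioned in the abstract is not implemented here.

```python
import numpy as np

def select_update(predicted, n_add, method, rng=None):
    """Return indices of candidate lines to phenotype and add to the training set."""
    predicted = np.asarray(predicted, dtype=float)
    order = np.argsort(predicted)                   # ascending by predicted value
    if method == "best":
        return order[::-1][:n_add]                  # highest predictions first
    if method == "worst":
        return order[:n_add]                        # lowest predictions first
    if method == "best_worst":
        n_best = n_add // 2                         # assumed split: half best, rest worst
        n_worst = n_add - n_best
        top = order[len(order) - n_best:][::-1]
        return np.concatenate([top, order[:n_worst]])
    if method == "random":
        rng = np.random.default_rng() if rng is None else rng
        return rng.choice(len(predicted), size=n_add, replace=False)
    if method == "none":
        return np.array([], dtype=int)              # training set left unchanged
    raise ValueError(f"unknown method: {method}")

# Toy predicted values for six candidate lines.
predicted = [0.2, 1.5, -0.7, 0.9, 0.1, -1.2]
best_two = select_update(predicted, 2, "best").tolist()
worst_two = select_update(predicted, 2, "worst").tolist()
both_four = select_update(predicted, 4, "best_worst").tolist()
random_three = select_update(predicted, 3, "random", np.random.default_rng(7)).tolist()
print(best_two, worst_two, both_four)
```

In a simulated breeding loop, the returned indices would be phenotyped and appended to the training data before the model is refitted for the next cycle; keeping only the most recent cycles, as the abstract also examines, would be a separate filtering step.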